Artificial and biological agents are often tasked with making early decisions to maximize reward rate. Here, we provide a theory of time-constrained decision-making for a task in which subjects observe a random walk and must use prediction to capitalize on early decisions. We support the theory with (1) the construction of an optimal solution, (2) a qualitative match to neural recordings from non-human primates and to human behavioural data, and (3) a neurally plausible reinforcement learning algorithm. The urgency signal of urgency-gating models of neural decision-making, as measured in non-human primates performing this task, is qualitatively matched by our theory under a minimally biased prior distribution of reward rate, and this prior leads to the observed hyperbolic form of temporal discounting. Our work suggests that the urgency signal, rather than being an ad hoc bias toward early decisions, is an online estimate of return-on-(time)investment.