Complementing all the other answers, it is apt to trace a bit of history germane to the topic at hand.
Edgeworth, in investigating the difference of two means, took a cue from Laplace's empirical work on comparing the means of two sets of $400$ barometric observations taken at different times (cf. $[\rm I]$). He measured the observed divergence to assess whether the excess was "accidental", that is, due to mere chance, or whether there was a "constant cause". He introduced a pre-specified constant (it was $2\sqrt 2$) for his rejection rule. As $[\rm II]$ writes:
... the difference between the two means could not be justified as "accidental" and it would appear to be significant.
Fast forward to Fisher, who formally introduced the concept of the null hypothesis, which is
a statement about the underlying statistical model.
He developed the concept of the probability-value, or p-value, to evaluate the plausibility of the null hypothesis -- that is, with the p-value he measured the extent to which
the sample realization lends credence to the null hypothesis.
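To make the idea concrete, here is a minimal sketch of a Fisherian two-sided p-value for the difference of two means, in the spirit of the Laplace/Edgeworth comparison above. It assumes a large-sample normal approximation; the function name and the illustrative numbers are hypothetical, not taken from either author:

```python
from statistics import NormalDist

def two_sample_z_pvalue(mean1, var1, n1, mean2, var2, n2):
    """Two-sided p-value for the difference of two sample means,
    under a large-sample normal approximation for the test statistic."""
    se = (var1 / n1 + var2 / n2) ** 0.5   # standard error of the difference
    z = (mean1 - mean2) / se              # standardized divergence
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers, loosely evoking two sets of 400 barometric
# observations (values in mmHg; purely illustrative):
p = two_sample_z_pvalue(758.5, 4.0, 400, 758.0, 4.0, 400)
```

A small p here would be read, in Fisher's spirit, as the data lending little credence to the null hypothesis of no real difference -- not as a terminal accept/reject decision.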
Fisher's philosophy was clear: it was inferential. The implicit alternative hypothesis (retroactively) would be any possible statistical model beyond the boundaries of the postulated ones (more on that later).
As mentioned in the last paragraph of Michael Lew's answer, Fisher never wanted the investigation to end with some mechanical rule once the p-value was calculated; rather, the p-value would mark the start of a broader investigation.
This brings us to the decision-making framework of Neyman & Pearson, who didn't like the ad hoc choice of a test statistic and the subsequent use of the p-value. The solution they proposed was a "choice between rival hypotheses", thereby focusing on whether to reject or accept the null hypothesis instead of merely measuring the extent of legitimacy (or lack thereof) that the observed data bestow upon it.
This turns the testing problem into an optimization problem: fix the probability of a type I error ($\alpha$ -- the size) and minimize the probability of a type II error. The test statistic $\tau(\mathbf X),$ based on a distance function from the postulated parameter, leads to rejection of the null hypothesis if it is "significantly different from zero", that is, if $|\tau(\mathbf X)| > c_\alpha,$ where $c_\alpha$ is the constant that determines the rejection region of size $\alpha;$ this $\alpha$ is sometimes termed the significance level.
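The fixed-level rule can be sketched as follows (a toy version, assuming $\tau(\mathbf X)$ is standard normal under the null; the function name is hypothetical):

```python
from statistics import NormalDist

def reject_H0(tau, alpha=0.05):
    """Neyman-Pearson style fixed-level test: reject H0 iff |tau|
    exceeds the two-sided critical value c_alpha of the standard normal."""
    c_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. ~1.96 for alpha=0.05
    return abs(tau) > c_alpha

reject_H0(2.2)              # |2.2| > 1.96: reject
reject_H0(2.2, alpha=0.01)  # |2.2| < 2.576: fail to reject
```

Note that the decision depends only on whether $\tau$ crosses the pre-fixed threshold $c_\alpha$; no graded measure of evidence is reported.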
Thus the concepts of p-value and significance level emanated from two different philosophical frameworks, which can be summed up by comparing their respective alternative hypotheses. If $\boldsymbol\Phi$ represents a specification of the probability model, and we postulate that the true probability distribution $f(\mathbf X)$ belongs to a proper subset, that is, $f\in \boldsymbol\Phi_0\subset \boldsymbol\Phi,$ then in the N-P framework the test would be
$$\mathrm H_0:= f\in \boldsymbol\Phi_0 \quad\textrm{against}\quad \mathrm H_1:= f\in \boldsymbol\Phi\setminus \boldsymbol\Phi_0;$$
whereas in Fisher's framework, it would be
$$\mathrm H_0:= f\in \boldsymbol\Phi_0 \quad\textrm{against}\quad \mathrm H_1:= f\in \boldsymbol{\mathcal P}\setminus \boldsymbol\Phi_0,$$
where $\boldsymbol{\mathcal P}$ represents the collection of all possible statistical models.
Much to the chagrin of both Fisher and Neyman-Pearson, modern-day statisticians justify the use of the p-value in an N-P test (a "monstrous hybrid"), calling it the observed significance level, since
... the critical value $c_\alpha$ depends on the significance level $\alpha$, which is often arbitrary, except for the requirement of being "small"...
As Gigerenzer lamented:
... the significance level is a property of the test itself, irrespective of any observed data, but the p-value is a measure which is inextricably bound up with the specific data under consideration
which again reflects the difference between the above two approaches.
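The hybrid nevertheless "works" arithmetically because, for a given test statistic, the rules "reject when $p < \alpha$" and "reject when $|\tau| > c_\alpha$" coincide. A quick check (assuming a standard-normal $\tau$ under the null; function names hypothetical):

```python
from statistics import NormalDist

def p_value(tau):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(tau)))

def fixed_level_reject(tau, alpha):
    """N-P fixed-level decision: reject iff |tau| > c_alpha."""
    c_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(tau) > c_alpha

# The two decision rules agree for every tau at a given alpha:
alpha = 0.05
for tau in (0.5, 1.5, 1.96, 2.2, 3.0):
    assert (p_value(tau) < alpha) == fixed_level_reject(tau, alpha)
```

The numerical agreement, however, does not dissolve the philosophical difference: the p-value is computed from the data at hand, while $\alpha$ and $c_\alpha$ are fixed before any data are seen.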
---
References:
$\rm[I]$ On the Determination of the Modulus of Errors, F. Y. Edgeworth, $1886,$ Phil. Mag. $21,$ pp. $500$-$507.$
$\rm[II]$ Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Aris Spanos, Cambridge University Press, $1999,$ ch. $14.$