I've always known the wave equation since even before taking a proper PDE lecture, from browsing the internet and watching videos, but I never saw how it came to be derived. Like, why does this describe a wave? $$\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}$$ I decided to read Stein and Shakarchi's book on Fourier Analysis (as it is part one of a tetralogy, and I feel like I really need more insight on this topic), and a nice derivation is found right in the beginning.
The idea is to approximate the wave as a string of finitely many points evenly spaced out. If we consider a point $x_n$ of the wave, we know this point will neighbor $x_{n-1}$ and $x_{n+1}$. If the wave is completely flat, then nothing will happen. If we were to pluck the wave downward at the point $x_n$, now there will be a "tension" between its neighbors, which we shall assume is proportional to how far we're plucking down.
More rigorously, once we release our finger (that was plucking down the string at the point $x_n$) there will be a force acting upwards. Suppose $\rho > 0$ represents the density of the material of the wave/string and the wave is thin enough that we can represent its volume as merely the distance between each point (call it $h > 0$). So the mass of each point is $\rho h$. Using $F = ma$ we know that at a given time $t$, $$F_n(t) = \rho h y_n''(t)$$ where $y_n$ represents the vertical position of the point $x_n$ at the given time $t$. We know this force upwards is caused by how the neighboring points $x_{n-1}$ and $x_{n+1}$ are higher and are "pulling" on $x_n$. It makes sense to assume this force is proportional to the height difference, since whenever $y_{n-1}(t) = y_n(t) = y_{n+1}(t)$ it means the wave is flat and so doesn't move. How much the points pull on each other depends on the tension of the wave/string (which involves how much it's being stretched, like when we tune a guitar string, we tighten it more and more over time, which makes it snap after a long time), which we can call $\tau > 0$, and is inversely proportional to how far apart the points are (the farther apart, the less the height matters). Therefore, we can say $$F_n(t) = \frac \tau h (y_{n-1}(t) - y_n(t)) + \frac \tau h (y_n(t) - y_{n+1}(t))$$ Joining these two equations together, we get $$\rho h y_n''(t) = \frac \tau h (y_{n-1}(t) + y_{n+1}(t) - 2y_n(t))$$ which we rearrange into $$\frac \rho \tau y_n''(t) = \frac {y_{n-1}(t) + y_{n+1}(t) - 2y_n(t)} {h^2}$$ We can think of a wave as a function $u(x, t)$ that depends on both position and time, in the sense that if we want to ask "what is the height of the wave?" we need to know at which point and when. If the wave starts at $0$, then that means $y_0 = 0$, $y_1 = h$, $y_n = nh$, and so on. This means that if we were to define $x_n = x$, then $x_{n-1} = x - h$ and $x_{n+1} = x + h$. Thus, we can rewrite the formula above as $$\frac \rho \tau \frac{\partial^2 u}{\partial t^2}(x, t) = \frac {u(x - h, t) + u(x + h, t) - 2u(x, t)} {h^2}$$ where $y_n''(t)$ was the second derivative at $x$ with respect of $t$ which is time, so that's why it was replaced with $\partial^2 u / \partial t^2$. For the right-hand side, we can make the very simple notation of $$f(x) := u(x, t)$$ which simplifies the argument below.
Theorem. If a function $f$ is two-times differentiable at a point $x$, then $$\lim_{h \to 0} \frac {f(x - h) + f(x + h) - 2f(x)} {h^2} = f''(x)$$
Proof. We see that "when $h=0$" the limit becomes $0/0$, so we can use L'Hopital's rule (on $h$, not on $x$) to derive $$\lim_{h \to 0} \frac {-f'(x - h) + f'(x + h)} {2h}$$ which we now rewrite as $$\begin{alignat*}{1} & \lim_{h \to 0} \frac {-f'(x - h) + f'(x) - f'(x) + f'(x + h)} {2h} \\ = & \lim_{h \to 0} \frac 1 2 \left(\frac{f'(x)-f'(x - h)}{h} + \frac{f'(x+h)-f'(x)}{h}\right) \\ = & \frac 1 2 \left( \lim_{h \to 0} \frac{f'(x)-f'(x - h)}{h} + \lim_{h \to 0} \frac{f'(x+h)-f'(x)}{h}\right) \\ = & \frac 1 2 (f''(x) + f''(x)) = f''(x) \quad \square \end{alignat*}$$
Using this theorem, and knowing that $f''(x)$ is a second-derivative with respect to position, $$f''(x) = \frac{\partial^2 u}{\partial x^2}(x, t)$$ we know what happens when our approximation gets finer and finer. When $h \to 0$, our approximation converges to $$\frac \rho \tau \frac{\partial^2 u}{\partial t^2}(x, t) = \frac{\partial^2 u}{\partial x^2}(x, t)$$ We're honestly done. The only difference from what I showed in the beginning is notation. People often define $$c := \sqrt{{\tau}/{\rho}}$$ because the solution to the wave equation involves the term $\sqrt{{\tau}/{\rho}}$, so it's easier to just write $c$ instead. The math here was not very rigorous, but when we're using math to represent a physical phenomenon, that's to be expected.