THX Logo Theme

Here's a preview of what we're about to create in Python.

What is the THX logo theme?

If you're not familiar, here is the trademark sound:

Following the instructions in the score, here is a simpler version of the THX logo theme. It may not be exactly the same, but it comes close.

Signal Model

For voice $i$, we have the following sinusoid with a time-dependent amplitude and frequency:

$$ x_i(t) = A(t) \sin(2 \pi f_i(t) t) $$

The output is a sum of the voices:

$$ y(t) = \sum_i x_i(t) = A(t) \sum_i \sin(2 \pi f_i(t) t) $$

In discrete time, for sampling rate/frequency $f_s$,

$$ y[n] = \sum_i x_i[n] = A[n] \sum_i \sin(2 \pi f_i[n] n/f_s) $$
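
To make the discrete-time model concrete, here is a minimal sketch synthesizing one voice at a constant frequency (the 440 Hz value is illustrative, not from the score):

```python
import numpy

fs = 44100                          # sampling rate (Hz)
f = 440.0                           # one constant voice frequency (illustrative)
n = numpy.arange(fs)                # one second of sample indices
x = numpy.sin(2*numpy.pi*f*n/fs)    # x[n] = sin(2*pi*f*n/fs)

print(x.shape)  # → (44100,)
```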

Stages

We define four stages in the signal:

  1. During the start, voice frequencies move randomly within a narrow frequency range, and voices are quiet.
  2. During the transition, voices move toward their target frequency, and voices become louder.
  3. During the sustain, voices remain at their target frequency, and voices are loud.
  4. During the decay, voices become quieter.

Here are the times when the start, transition, sustain, and decay stages begin. t_end is the end of the signal.

In [5]:
t_start = 0
t_transition = 4
t_sustain = 10
t_decay = 14
t_end = 16

Define a sample rate:

In [6]:
sr = 44100

Convert these times to units of samples:

In [7]:
n_start = t_start*sr
n_transition = t_transition*sr
n_sustain = t_sustain*sr
n_decay = t_decay*sr
n_end = t_end*sr

Define time axis in units of samples and seconds:

In [8]:
import numpy

n = numpy.arange(n_end)
t = n/sr

Amplitude

For each of the four stages above, we define the amplitude envelope, $A(t)$.

The start stage, sustain stage, and end will have constant amplitudes:

In [9]:
amplitude_start = 0.1
amplitude_sustain = 1
amplitude_end = 0.01

The transition and decay stages will increase or decrease following a geometric/exponential sequence:

In [10]:
amplitude_transition = numpy.geomspace(amplitude_start, amplitude_sustain, n_sustain-n_transition)
amplitude_decay = numpy.geomspace(amplitude_sustain, amplitude_end, n_end-n_decay)
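
The key property of `numpy.geomspace` used here: consecutive samples differ by a constant ratio, so the ramp is a straight line on a log scale. A small sketch with illustrative endpoints:

```python
import numpy

ramp = numpy.geomspace(0.1, 1.0, num=5)
print(ramp)      # endpoints are exact: 0.1 ... 1.0

ratios = ramp[1:] / ramp[:-1]
print(ratios)    # every step multiplies by the same factor, 10**(1/4)
```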

Create the amplitude envelope:

In [11]:
amplitude = numpy.zeros(n_end)
amplitude[n_start:n_transition] = amplitude_start
amplitude[n_transition:n_sustain] = amplitude_transition
amplitude[n_sustain:n_decay] = amplitude_sustain
amplitude[n_decay:n_end] = amplitude_decay

Plot amplitude envelope with a log-y axis:

Frequency

Revisiting the model above, for voice $i$,

$$ x_i[n] \propto \sin(2 \pi f_i[n] n/f_s) $$

Define $\phi_i[n] = f_i[n] n$. Therefore, $ x_i[n] \propto \sin((2 \pi/f_s) \phi_i[n] ) $.

Suppose $f_i$ remains constant over time. Then we have the following update rule:

$$ \phi_i[n] - \phi_i[n-1] = f_i n - f_i(n-1) = f_i $$

$$ \Rightarrow \phi_i[n] = \phi_i[n-1] + f_i $$

Now suppose $f_i[n]$ changes over time. Then we can simply use the following update rule instead:

$$ \phi_i[n] = \phi_i[n-1] + f_i[n] $$

This update rule is simply a cumulative sum of $f_i[n]$!
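
Here is a sketch (with illustrative values) of why the cumulative sum matters: when the frequency changes, the naive expression $\sin(2 \pi f[n] n / f_s)$ jumps in instantaneous phase, while the running sum keeps the phase increment equal to $f[n]$ at every sample:

```python
import numpy

fs = 8000
n = numpy.arange(2*fs)
f = numpy.where(n < 5000, 440.0, 550.0)   # frequency steps up partway through

# Naive: instantaneous phase 2*pi*f[n]*n/fs is discontinuous where f changes.
x_naive = numpy.sin(2*numpy.pi*f*n/fs)

# Update rule: phi[n] = phi[n-1] + f[n], i.e. a cumulative sum.
phi = numpy.cumsum(f)
x_smooth = numpy.sin(2*numpy.pi/fs*phi)

# The per-sample increment of phi is exactly f[n] -- no discontinuity.
assert numpy.allclose(numpy.diff(phi), f[1:])
```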

Let's define the frequencies during the final sustain phase, i.e. the final four measures. We tune the concert-A to be 448 Hz which seems to match the original recording. Then we simply define notes at A and D at different octaves. We use Pythagorean tuning, i.e. perfect fifths have a frequency ratio of 3:2, and octaves have a frequency ratio of 2:1.

In [13]:
f_sustain = [448*x for x in [1/12, 1/8, 1/6, 1/4, 1/3, 1/2, 2/3, 1, 4/3, 2, 8/3, 4, 16/3, 8]] * 3
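
As a sanity check on the ratio list above: every ratio is either an exact power of two (an octave of the 448 Hz A) or a power of two divided by 3 (an octave of D, which in Pythagorean tuning sits a factor of 3 below some octave of A). A sketch using exact fractions:

```python
from fractions import Fraction as F

def is_pow2(x):
    # True if the reduced fraction x is an exact power of two
    # (positive or negative exponent).
    n, d = x.numerator, x.denominator
    return n & (n - 1) == 0 and d & (d - 1) == 0

ratios = [F(1,12), F(1,8), F(1,6), F(1,4), F(1,3), F(1,2), F(2,3),
          F(1), F(4,3), F(2), F(8,3), F(4), F(16,3), F(8)]

a_octaves = [r for r in ratios if is_pow2(r)]      # A's: 448 * 2**k
d_octaves = [r for r in ratios if is_pow2(3*r)]    # D's: 448 * 2**k / 3

print(len(a_octaves), len(d_octaves))  # → 7 7
```

So each copy of the list contributes seven A's and seven D's.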

Total number of voices:

In [14]:
num_voices = len(f_sustain)
num_voices
Out[14]:
42

Then, we create our time-dependent frequencies for each voice.

In [16]:
import librosa
import matplotlib.pyplot as plt

phi = [None]*num_voices
f_lo = librosa.note_to_hz('C3')
fmin = librosa.note_to_hz('C0')  # lower plot limit, about 16.35 Hz
fmax = librosa.note_to_hz('C9')  # upper plot limit, about 8372 Hz
plt.figure(figsize=(15, 9))

for voice in range(num_voices):
    
    # Initialize frequency array.
    f = numpy.zeros(n_end)

    # Initial frequencies are random between C3 and C5.
    f1 = f_lo*4**numpy.random.rand()
    f2 = f_lo*4**numpy.random.rand()
    f_transition = f_lo*4**numpy.random.rand()

    # Define one random time point during the start stage.
    n1 = numpy.random.randint(n_start, n_transition)

    # Assign frequencies to each stage.
    f[n_start:n1] = numpy.geomspace(f1, f2, n1-n_start)
    f[n1:n_transition] = numpy.geomspace(f2, f_transition, n_transition-n1)
    f[n_transition:n_sustain] = numpy.geomspace(f_transition, f_sustain[voice], n_sustain-n_transition)
    f[n_sustain:n_end] = f_sustain[voice]
    
    # Perform the update rule. Easy!
    phi[voice] = numpy.cumsum(f)
    
    # Plot frequency.
    plt.semilogy(t, f)
    
plt.xlabel('Time (seconds)')
plt.ylabel('Frequency (Hertz)')
plt.ylim(fmin, fmax)
Out[16]:
(16.351597831287414, 8372.018089619156)

Synthesize output

Initialize the output signal:

In [17]:
out_signal = numpy.zeros(n_end)

Finally, construct the output signal, one voice at a time:

In [18]:
for voice in range(num_voices):
    out_signal += numpy.sin(2*numpy.pi/sr*phi[voice])

Apply the amplitude envelope, then normalize so the peak magnitude stays just below full scale (using the absolute maximum, in case the negative peak is the larger one):

In [19]:
out_signal *= amplitude
out_signal *= 0.99/numpy.abs(out_signal).max()
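
Scaling by the peak magnitude leaves a little headroom below full scale, which prevents clipping when the samples are later quantized; a minimal sketch with an arbitrary signal:

```python
import numpy

signal = -3.7 * numpy.sin(numpy.linspace(0, 20, 1000))   # arbitrary un-normalized signal

signal *= 0.99 / numpy.abs(signal).max()                 # peak magnitude becomes 0.99

print(round(numpy.abs(signal).max(), 6))  # → 0.99
```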

Listen:

Pretty cool!


How does it compare to the original?

In [22]:
thx, sr_thx = librosa.load('audio/thx_original.mp3')