Pitch shifting
An adventure story of writing a pitch shifter, with incorrect but interesting ideas.
Naive approach
- Task (almost)
As with the equalizers, this screams to be solved via FFT:
- transform the whole signal into the frequency domain
- remap each component \(z_i\) to the new array location \(z_j\) with \(j = i \cdot K\), where \(K\) is the pitch-shift factor
- transform back into the time domain
Mild obstacles
First, we notice that most new array locations \(i \cdot K\) don't fall on integer indices. When transforming a whole 5 min song, the bins are only about 1/300 Hz apart, so we can simply round up or down without any audible problems. If multiple components are mapped to the same location, they can be summed.
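The whole-signal version, with rounding and summing as described above, can be sketched roughly like this. It is a minimal sketch, not a finished tool: the function name `naive_pitch_shift`, the use of NumPy's real FFT, and the choice to drop everything pushed past the Nyquist bin are assumptions made for illustration.

```python
import numpy as np

def naive_pitch_shift(signal, K):
    """Move every FFT component i to index round(i * K), then transform back."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)                  # bins 0 .. n//2 of a real signal
    shifted = np.zeros_like(spectrum)
    src = np.arange(len(spectrum))
    dst = np.rint(src * K).astype(int)              # nearest integer target index
    keep = dst < len(spectrum)                      # assumption: drop bins past Nyquist
    np.add.at(shifted, dst[keep], spectrum[keep])   # colliding components are summed
    return np.fft.irfft(shifted, n)

# Usage: a 1 s, 440 Hz tone shifted by K = 1.5 (bins are 1 Hz apart here).
rate = 8000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440 * t)
shifted = naive_pitch_shift(tone, 1.5)
peak_hz = np.argmax(np.abs(np.fft.rfft(shifted))) * rate / len(shifted)
print(peak_hz)
```

`np.add.at` is used instead of plain indexed assignment because several source bins can land on the same target bin, and those contributions have to accumulate.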
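Things get complicated when the signal comes in small chunks (\(\le\) 1024 samples). The bins are now far apart (about 43 Hz for 1024 samples at a 44.1 kHz sample rate, for example), and suddenly rounding introduces audible detuning and harmonic changes.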
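To get a feeling for the size of the rounding error, here is a small back-of-the-envelope calculation. The sample rate, the shift factor and the 430.7 Hz partial are hypothetical values picked for illustration, and `rounding_detune_cents` is a helper name invented here.

```python
import math

def rounding_detune_cents(i, K):
    """Pitch error in cents caused by rounding the target index i*K to an integer."""
    exact = i * K
    return 1200 * math.log2(round(exact) / exact)

RATE = 44100     # assumed sample rate in Hz
K = 1.06         # roughly one semitone up
freq = 430.7     # hypothetical partial to shift, in Hz

errors = {}
for n_samples in (1024, 300 * RATE):            # short chunk vs. 5 min song
    i = round(freq * n_samples / RATE)          # bin closest to the partial
    errors[n_samples] = rounding_detune_cents(i, K)
    print(n_samples, i, errors[n_samples])
```

For the short chunk the error comes out as a sizeable fraction of a semitone (a semitone is 100 cents), while for the whole song it is a tiny fraction of a cent.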
Simple fix
If the new index \(i \cdot K\) falls between two integers \(j\) and \(j+1\), add the component to both array locations. The weights are \((\frac{1}{2}, \frac{1}{2})\) at the midpoint and go smoothly to \((1,0)\) as the new index approaches an integer.
This is still bad! Now a single sine wave input creates dissonant output, because it is spread over two close frequencies that beat against each other.
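A sketch of this splitting step, assuming linear weights \((1 - f, f)\), where \(f\) is the fractional part of \(i \cdot K\). The text above only fixes the midpoint and the integer case, so the linear ramp in between is an assumption, as is the helper name `split_remap`.

```python
import numpy as np

def split_remap(spectrum, K):
    """Remap bin i to i*K, sharing it between the two nearest integer bins."""
    out = np.zeros_like(spectrum)
    for i, z in enumerate(spectrum):
        pos = i * K
        j = int(np.floor(pos))
        frac = pos - j                     # 0 on an integer index, 0.5 at the midpoint
        if j < len(out):
            out[j] += (1 - frac) * z       # weight for the lower neighbour
        if frac > 0 and j + 1 < len(out):
            out[j + 1] += frac * z         # weight for the upper neighbour
    return out

# Usage: a single component at bin 10, shifted to the off-grid index 10.5.
chunk = np.zeros(513, dtype=complex)       # rfft bins of a 1024-sample chunk
chunk[10] = 1.0
out = split_remap(chunk, 1.05)
print(np.nonzero(out)[0], out[10], out[11])
```

The single input component ends up as two components in neighbouring bins, which is exactly the pair of close frequencies that causes the trouble.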
Fancier version, interesting failure
Actually, what is the perfect output of a sine wave input?
Let's say the input frequency is one of the FFT's frequencies. Then after the FFT, we get an array \((\dots,0,0,z,0,0,\dots)\) with a single non-zero entry.
Of course, the output should again be a sine wave, but one that does not hit an FFT frequency, so the FFT of the output is a very messy array. With some patience, we can compute the components of this array: they follow a sinc-like function (strictly, the periodic sinc, also called the Dirichlet kernel) with its peak at the location \(i \cdot K\) and decaying sidelobes everywhere else.
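This can be checked numerically. The sketch below uses a complex sine \(e^{2 \pi i p n / N}\) instead of a real one to keep the formula short, which is a simplification made here; \(N = 64\) and the target position \(p = 10.5\) are arbitrary example values.

```python
import numpy as np

N = 64
p = 10.5                      # hypothetical target position i*K, between bins 10 and 11
n = np.arange(N)
k = np.arange(N)

tone = np.exp(2j * np.pi * p * n / N)     # complex sine at the off-grid frequency
measured = np.fft.fft(tone)

# Closed form of the geometric sum (valid for non-integer p):
# phase factor times sin(pi d) / sin(pi d / N), with d = p - k.
d = p - k
kernel = np.exp(1j * np.pi * d * (N - 1) / N) * np.sin(np.pi * d) / np.sin(np.pi * d / N)

max_err = np.max(np.abs(measured - kernel))
print(max_err)                            # difference between FFT and closed form
print(np.round(np.abs(measured[6:16]), 2))
```

The magnitudes are largest in the two bins next to \(p\) and fall off slowly on both sides, so no bin of the array is exactly zero.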
So, to create a pitch shifter, we need to take each component in the FFT array, use it to scale a sinc function centred at \(i \cdot K\), sum everything up, and apply the inverse FFT. This could be done with a matrix multiplication. An approximation would just interpolate between more neighbours around the index \(i \cdot K\).
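A rough sketch of the matrix formulation for a single chunk. Several things here are assumptions made for illustration: the names `shift_matrix` and `matrix_pitch_shift`, building each column by FFT'ing the moved sine instead of evaluating the sinc formula, ignoring the Nyquist bin, and not handling components pushed past Nyquist. It only illustrates the idea of this section and is not meant as a working pitch shifter.

```python
import numpy as np

def shift_matrix(N, K):
    """Column i holds the FFT of a unit component moved from bin i to position i*K."""
    n = np.arange(N)[:, None]
    i = np.arange(N // 2)[None, :]                    # positive-frequency bins only
    moved = np.exp(2j * np.pi * (i * K) * n / N)      # sine at the new frequency
    return np.fft.fft(moved, axis=0) / N              # shape (N, N // 2)

def matrix_pitch_shift(chunk, K):
    """Apply the matrix to one real-valued chunk and return the shifted chunk."""
    N = len(chunk)
    z = np.fft.fft(chunk)[: N // 2]
    z[1:] *= 2                 # fold the mirrored negative bins into the positive ones
    return np.fft.ifft(shift_matrix(N, K) @ z).real

# Usage: a cosine on bin 4 of a 64-sample chunk, moved to the off-grid position 6.5.
N = 64
n = np.arange(N)
chunk = np.cos(2 * np.pi * 4 * n / N)
shifted = matrix_pitch_shift(chunk, 1.625)
expected = np.cos(2 * np.pi * 6.5 * n / N)
print(np.max(np.abs(shifted - expected)))
```

The matrix has about \(N^2/2\) entries, which is one reason to consider the cheaper approximation that only touches a few neighbours around \(i \cdot K\).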
But...