This process allows one to change pitch without changing the length of the soundfile,
or to change length without changing pitch. It does this by extracting amplitude and
phase information for 8 to 4096 frequency bands with a bank of filters. If time
stretching is desired, these phase and amplitude envelopes are lengthened (or shortened,
for time compression) and then passed to a bank of oscillators whose frequencies
correspond to those of the filters. For pitch shifting, the envelopes are left untouched
and passed to a bank of oscillators whose frequencies are scaled by the pitch ratio.
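As a rough illustration of the time-stretching step, the following C sketch lengthens or shortens one band's amplitude envelope by linear interpolation. It is not SoundHack's code. The function name `stretch_envelope` is hypothetical, and phase envelopes would need to be unwrapped before being interpolated this way.

```c
#include <stddef.h>

/* Minimal sketch (hypothetical helper, not SoundHack's code): lengthen or
   shorten one band's amplitude envelope by `factor` with linear
   interpolation. factor > 1 stretches (time expansion), factor < 1
   compresses. Returns the number of output frames written (<= max_out). */
size_t stretch_envelope(const double *env, size_t n_in, double factor,
                        double *out, size_t max_out)
{
    if (n_in == 0 || factor <= 0.0 || max_out == 0)
        return 0;
    /* The stretched envelope covers (n_in - 1) * factor frame steps. */
    size_t n_out = (size_t)((double)(n_in - 1) * factor + 0.5) + 1;
    if (n_out > max_out)
        n_out = max_out;
    for (size_t j = 0; j < n_out; j++) {
        double pos = (double)j / factor;   /* read position in input frames */
        size_t i = (size_t)pos;
        if (i >= n_in - 1) {               /* at or past the last frame */
            out[j] = env[n_in - 1];
            continue;
        }
        double frac = pos - (double)i;
        out[j] = (1.0 - frac) * env[i] + frac * env[i + 1];
    }
    return n_out;
}
```

For example, stretching the envelope {0, 1, 2} by a factor of 2 gives the five frames {0, 0.5, 1, 1.5, 2}. A factor of 0.5 gives {0, 2}.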
To use the phase vocoder, set Bands to the number of
filter-oscillator pairs one would like to use. A large number of bands gives
better frequency resolution; a small number of bands gives better time
resolution. The Window menu allows one to choose different pre-FFT windows for
different filtering characteristics. Only the Hamming, von Hann and Kaiser windows give
good results (the others are there only because I wanted to use a single menu
throughout the program for all window selection). The Overlap setting adjusts the
size of the filter window (relative to the number of filter bands) for analysis and
synthesis, and thus the sharpness of the filter. A large setting (4x) gives the
sharpest filter. A sharper filter differentiates better between frequencies that
fall between bands, but responds more slowly to amplitude changes. Click the Time Scale
button for time scaling, or Pitch Scale for pitch scaling. Type the scale factor in
the Scale box. Click on the word Scale (a pop-up menu) to specify time
scaling by the desired length, or pitch scaling by equal-tempered semitones. If one wants
the time expansion factor or the pitch transposition factor to change during processing,
check the Scaling Function box and click the Draw Function... button. This will
bring up the Draw Function dialog, which is described below.
Resynthesis Gating performs a simple spectral gate that lets only some of the
spectral data through. If a band is below the Minimum Amplitude, it is not let
through. Threshold Under Max. cuts off all bands that fall more than the
threshold below the peak band in a given block of samples. So if the peak band has an
amplitude of -7 dB and Threshold Under Max. is set to -40 dB, all bands below
-47 dB will be cut off.
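Here is a minimal sketch of the two gating rules applied to one block. It assumes the band levels have already been converted to dB. `gate_mask` is a hypothetical name, not part of SoundHack.

```c
#include <stddef.h>

/* Minimal sketch of the resynthesis gate for one block of spectral data.
   level_db[] holds each band's level in dB. keep[i] is set to 1 if band i
   passes, 0 if it is cut. A band is cut if it is below min_db, or below
   the loudest band plus under_max_db (under_max_db is negative, e.g. -40). */
void gate_mask(const double *level_db, size_t n,
               double min_db, double under_max_db, int *keep)
{
    if (n == 0)
        return;
    double peak = level_db[0];
    for (size_t i = 1; i < n; i++)
        if (level_db[i] > peak)
            peak = level_db[i];

    double relative_floor = peak + under_max_db;   /* e.g. -7 + -40 = -47 dB */
    for (size_t i = 0; i < n; i++)
        keep[i] = level_db[i] >= min_db && level_db[i] >= relative_floor;
}
```

With the example values above (peak at -7 dB, Threshold Under Max. at -40 dB), a band at -46 dB survives and a band at -48 dB is cut.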
(Command - D) Spectral Dynamics...
This process performs standard dynamics processing (gating, ducking, expansion,
compression) on each spectral band individually. It has individual threshold detection
for each band, so that one band could have the dynamics process active, while another
is inactive. The process can be limited to affect only a specific frequency range. One
can select whether to affect sounds which are above the threshold or sounds which are
below the threshold. The threshold level can be set to one value for all bands, or it
can be set to a different value for each band by reading in and analyzing a soundfile.
This soundfile's amplitude spectrum is then used as the threshold for each band. This is
especially useful if there is a sound that one wants to emphasize or de-emphasize (such
as hiss or hum).
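One plausible way to turn an analyzed soundfile into per-band thresholds is to average each band's amplitude over all analysis frames. SoundHack's exact method is not described here, so treat the averaging as an assumption. `thresholds_from_spectrum` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: derive one threshold per band from an analyzed
   soundfile (for example a stretch of pure hiss or hum). `spectrum` holds
   n_frames frames of n_bands linear amplitudes, stored frame after frame.
   Averaging over frames is an assumption of this sketch. */
void thresholds_from_spectrum(const float *spectrum, size_t n_frames,
                              size_t n_bands, float *threshold)
{
    for (size_t b = 0; b < n_bands; b++) {
        double sum = 0.0;
        for (size_t f = 0; f < n_frames; f++)
            sum += spectrum[f * n_bands + b];
        threshold[b] = n_frames ? (float)(sum / (double)n_frames) : 0.0f;
    }
}
```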
Most controls are self-explanatory. The first pop-up menu allows you to select the type
of process to use: gating/ducking, expansion or compression. The second pop-up sets the
number of filter bands to separate the sound into. 512 is a good compromise for the
number of bands at a 44100 Hz sample rate, as the bands are about 43 Hz apart and the filters
used have a (512*2)/44100, or 0.023 second, delay. In other words, this gives fairly good frequency
resolution (provided no partials are closer than 43 Hz) and not too much time smearing.
The Highest Band and Lowest Band boxes allow one to limit the frequency
range affected. The Gain/Reduction box allows you to set the amount of gain or
reduction for the bands that are past the threshold. For compression and expansion,
this box becomes the ratio. When affecting sounds below the threshold, the compressor
and expander hold the highest level steady and affect lower levels (also known as
"downward" expansion or compression). When the process is set to affect
sounds above the threshold, the compressor and expander hold the threshold level
steady and compress/expand upward from there.
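The figures above can be checked with a few lines of C. The block also sketches the "above threshold" compression/expansion curve. The curve follows the usual static dB mapping: the threshold is held fixed, and the distance above it is divided or multiplied by the ratio. That mapping is an assumption about how SoundHack applies the ratio, and the function names are hypothetical.

```c
/* Band spacing (Hz) for `bands` bands: the FFT has 2 * bands points. */
double band_spacing_hz(int bands, double sample_rate)
{
    return sample_rate / (2.0 * bands);
}

/* Length (seconds) of one analysis window of 2 * bands samples. */
double analysis_delay_s(int bands, double sample_rate)
{
    return (2.0 * bands) / sample_rate;
}

/* Static curve for the "above threshold" mode, all values in dB.
   Levels at or below thr_db pass unchanged. Above it, the distance to
   the threshold is divided by ratio (compression) or multiplied by it
   (expansion), so the threshold level itself stays fixed. */
double dynamics_above_db(double in_db, double thr_db, double ratio, int expand)
{
    if (in_db <= thr_db)
        return in_db;
    double over = in_db - thr_db;
    return thr_db + (expand ? over * ratio : over / ratio);
}
```

For instance, `band_spacing_hz(512, 44100.0)` returns 43.06640625. With a -20 dB threshold and a ratio of 2, a band at -10 dB is compressed to -15 dB or expanded to 0 dB.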
Use Smoothed Amplitude is a way to avoid abrupt gating. Instead of comparing the
input soundfile directly to the threshold, a windowed average of the input is compared
to the threshold. This setting allows one to reduce the "martian voices" effect (a common
problem in spectral dynamics processing) and is particularly nice for hiss and hum
removal. Attack/Decay Time sets how quickly each triggered band opens or
closes. The default is the minimum time for the number of bands used. If this value is
set too slow, transients are lost; if it is set too fast, the soundfile starts to be modulated
(whistling sounds). Threshold Level is where you set the threshold level! If you
check Threshold Relative to Peak Amp., the threshold is now set relative to the
peak amplitude for each block of sound processed. For example, if you are using the
spectral gate, and the loudest frequency band for the current block of sound has an
amplitude of -12 dB and the threshold is set at -40 dB, the gate will be active for
sounds below -52 dB.
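The per-band controls in this paragraph can be sketched as small helpers:
- a moving average for Use Smoothed Amplitude;
- the relative-to-peak threshold;
- a linear attack/decay ramp on each band's gain.

These are illustrative assumptions rather than SoundHack's implementation, and all names are hypothetical.

```c
#include <stddef.h>

/* "Use Smoothed Amplitude": average the last `width` amplitudes of one
   band instead of using the current value alone. history[] is oldest
   first. A width of 0 is treated as 1. */
double smoothed_amp(const double *history, size_t count, size_t width)
{
    if (count == 0)
        return 0.0;
    size_t n = (width == 0) ? 1 : (width < count ? width : count);
    double sum = 0.0;
    for (size_t i = count - n; i < count; i++)
        sum += history[i];
    return sum / (double)n;
}

/* Effective threshold in dB. With "Threshold Relative to Peak Amp."
   checked, the threshold is measured down from the loudest band. */
double effective_threshold_db(double thr_db, double peak_db, int relative)
{
    return relative ? peak_db + thr_db : thr_db;
}

/* Attack/decay: move a band's gain toward 1.0 (open) or 0.0 (closed)
   by at most `step` per block, so bands open and close gradually. */
double ramp_gain(double gain, int open, double step)
{
    if (open)
        return (gain + step > 1.0) ? 1.0 : gain + step;
    return (gain - step < 0.0) ? 0.0 : gain - step;
}
```

`effective_threshold_db(-40.0, -12.0, 1)` gives -52 dB, matching the example above. A larger `step` corresponds to a shorter Attack/Decay Time.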
(Command - V) Varispeed...
As a sample rate converter, this is slower and maybe less accurate than the Sound
Designer II or Alchemy software (then again, maybe not), so this function may not be
useful to those who own that software. However, SoundHack includes a variable sample
rate conversion utility (varispeed). The Varispeed box enables this feature. The
Varispeed Function... button brings up the Draw Function
dialog, giving one control over a 10-octave varispeed. The Quality buttons give
one control over the size of the smoothing filter used, and thus the quality of
interpolation/decimation. The Vary by Scale and Vary by Pitch buttons
allow one to draw a curve for either the scaling factor or the pitch.
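To make the idea of a drawn varispeed curve concrete, here is a minimal variable-rate resampler. It uses plain linear interpolation, which is a deliberate simplification; SoundHack's Quality settings select a smoothing filter instead. The `speed` array (one playback-rate value per output sample, 1.0 meaning unchanged) and the function name are assumptions of this sketch.

```c
#include <stddef.h>

/* Minimal varispeed sketch (linear interpolation, not SoundHack's filter).
   After writing output sample k, the read position advances by speed[k]:
   1.0 leaves the sound unchanged, 2.0 is an octave up, 0.5 an octave down.
   If the curve is shorter than the output, its last value is held.
   Returns the number of output samples written. */
size_t varispeed_linear(const float *in, size_t n_in,
                        const double *speed, size_t n_speed,
                        float *out, size_t max_out)
{
    if (n_in == 0 || n_speed == 0)
        return 0;
    double pos = 0.0;              /* fractional read position in `in` */
    size_t n_out = 0;
    while (n_out < max_out && pos <= (double)(n_in - 1)) {
        size_t i = (size_t)pos;
        double frac = pos - (double)i;
        float next = (i + 1 < n_in) ? in[i + 1] : in[i];
        out[n_out] = (float)((1.0 - frac) * in[i] + frac * next);
        pos += speed[n_out < n_speed ? n_out : n_speed - 1];
        n_out++;
    }
    return n_out;
}
```

A constant speed of 0.5 turns 5 input samples into 9 output samples (an octave down, about twice as long). A speed of 2.0 turns them into 3 samples (an octave up).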
(Command - X) Spectral Extraction...
This function attempts to separate the stable (pitched) and transient (unpitched)
parts of a sound. It does this by measuring the speed of frequency deviation. If the
deviation is too quick, it is marked as unpitched information and output to the transient
soundfile. If it is stable enough, it is marked as pitched information and output to the
stable soundfile.
You can control the separation with this dialog box. Set the Bands to a high
number if the sound being processed is harmonically dense, otherwise keep it around 512.
Setting the number of Frames allows you to set the size of the analysis frame (in
multiples of FFT frames). Set this higher if you are having difficulty separating the
pitched material, lower if you are having difficulty separating the transient material.
The two frequency values specify the amount of change allowed during each analysis
frame. In this example, if the harmonic deviates by more than 5 hertz in 0.035 seconds,
it is put into the transient soundfile; if the harmonic deviates by less than 2 hertz in
0.035 seconds, it is put into the stable soundfile.
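The decision rule described above can be sketched per band as follows. The sketch assumes the deviation is measured as the change in a band's frequency between the start and end of one analysis frame. The text does not say what happens to deviations between the two limits, so this sketch simply leaves them unassigned. The names are hypothetical.

```c
typedef enum { BAND_STABLE, BAND_TRANSIENT, BAND_UNASSIGNED } band_class;

/* Hypothetical sketch: classify one band over an analysis frame from its
   frequency at the start and end of the frame. A change larger than
   max_dev_hz goes to the transient file, a change smaller than min_dev_hz
   to the stable file; anything in between is left unassigned here. */
band_class classify_band(double f_start_hz, double f_end_hz,
                         double min_dev_hz, double max_dev_hz)
{
    double dev = f_end_hz - f_start_hz;
    if (dev < 0.0)
        dev = -dev;                 /* size of the change, either direction */
    if (dev > max_dev_hz)
        return BAND_TRANSIENT;
    if (dev < min_dev_hz)
        return BAND_STABLE;
    return BAND_UNASSIGNED;
}
```

With the example limits of 2 Hz and 5 Hz, a band that moves from 440 Hz to 446 Hz is classed as transient, and one that moves to 441 Hz is classed as stable.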
This function draws heavily on the work of Zack Settel and Cort Lippe on the ISPW
workstation using Max-DSP. Thank you, Zack and Cort, for sharing a great idea.
(Command - -) Spectral Analysis/Resynthesis...
This function will create a Csound or SoundHack format spectral data file from a
soundfile (analysis), or create a soundfile from a spectral data file in either format
(resynthesis). With this, you can create files to be used with Csound's pvoc unit
generator, or use your own programs to process the spectral data directly.
The format of the spectral data file (a limited version of the Csound pvanal format)
is a header, followed by multiple frames of spectral data.
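The sizes in this format follow directly from the number of bands. Here is a minimal sketch, assuming 4-byte floats. With N bands the FFT frame is 2N points, and each stored frame holds N + 1 value pairs (DC through Nyquist). The helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helpers for the spectral file size arithmetic. */

/* FFT size (points) for a given number of bands. */
long spect_frame_size(long bands)
{
    return bands * 2;
}

/* Bytes in one stored frame: (frame_size / 2) + 1 pairs of floats.
   Parenthesized so the shift happens before the + 1. */
long spect_frame_bytes(long frame_size)
{
    return (long)sizeof(float) * (((frame_size >> 1) + 1) << 1);
}

/* Frames per channel in data_bytes of frame data. Stereo frames are
   interleaved left, right, left, right, ... */
long spect_frame_count(long data_bytes, long frame_bytes, long channels)
{
    return data_bytes / (frame_bytes * channels);
}
```

For 512 bands this gives a frame size of 1024 points and 4104 bytes per stored frame.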
Here is a C structure describing the header (every long field is a 32-bit integer in the file):
typedef struct
{
    long magic;         // 517730 for Csound files, 'Erbe' for SoundHack files
    long headBsize;     // byte offset from start to data
                        // (usually sizeof(SpectHeader))
    long dataBsize;     // number of bytes of data not including the header
    long dataFormat;    // (short) format specifier
                        // always 36 for floating point
    float samplingRate;
    long channels;
    long frameSize;     // number of points in FFT (number of bands * 2)
    long frameIncr;     // number of new samples each frame (frames overlap)
    long frameBsize;    // bytes in each file frame
                        // frameBsize = sizeof(float) * (((frameSize >> 1) + 1) << 1);
    long frameFormat;   // this is either 3 for SoundHack files (amplitude & phase)
                        // or 7 for Csound files (amplitude & frequency)
    float minFreq;      // 0.0
    float maxFreq;      // maxFreq = samplingRate / 2.0;
    long freqFormat;    // flag for log/lin frequency (always 1 for linear)
    char info[4];       // extendable byte area
} SpectHeader;
The frames of spectral data that follow the header are organized as follows
(one layout or the other, depending on frameFormat):
// frameFormat == 3, amplitude and phase pairs, SoundHack file
typedef struct
{
    float amplitude;    // from 0.0 to 1.0
    float phase;        // from 0.0 to (2.0 * pi)
} band;
band spectralFrame[(frameSize >> 1) + 1];
// frameFormat == 7, amplitude and frequency pairs, Csound file
typedef struct
{
    float amplitude;    // from 0.0 to 1.0
    float frequency;    // from 0.0 to samplingRate / 2.0
} band;
band spectralFrame[(frameSize >> 1) + 1];
If the spectral file is stereo, the frames are interleaved, first left then right.
Included with SoundHack is the source code for a simple spectral data processor,
which should illustrate how to read and write this format.
(Command - ;) Normalize
This does a simple, no-questions-asked normalization of the front-most soundfile.