SoundHack: Phase Vocoder to Normalize

(Command - P) Phase Vocoder

This process allows one to change pitch without changing the length of the soundfile, or to change length without changing pitch. It does this by extracting amplitude and phase information for between 8 and 4096 frequency bands with a bank of filters. If time stretching is desired, these phase and amplitude envelopes are lengthened (or shortened, for time compression) and then given to a bank of oscillators whose frequencies correspond to those of the filters. For pitch shifting, the envelopes are left untouched and given to a bank of oscillators whose frequencies are scaled by the pitch ratio.

To use the phase vocoder, set the number of Bands to the number of filter-oscillator pairs one would like to use. A large number of bands will give one better frequency resolution; a small number of bands will give one better time resolution. The Window menu allows one to choose different pre-FFT windows for different filtering characteristics. Only the Hamming, von Hann and Kaiser windows will give good results (the others are there only because I wanted to use a single menu throughout the program for all window selection). The Overlap setting adjusts the size of the filter window (relative to the number of filter bands) for analysis and synthesis, and thus the sharpness of the filter. A large setting (4x) will give the sharpest filter. A sharper filter will differentiate better between frequencies which fall between bands, but it responds more slowly to amplitude changes. Click the Time Scale button for time scaling, or Pitch Scale for pitch scaling. Type the scale factor in the Scale box. Click on the word Scale (a pop-up menu) to specify time scaling by the length desired, or pitch scaling by equal-tempered semitones. If one wants the time expansion factor or the pitch transposition factor to change during processing, click the Scaling Function box and then the Draw Function... button. This will bring up the Draw Function dialog, which is described below.

Resynthesis Gating performs a simple spectral gate which lets only some of the spectral data through. If a band is below the Minimum Amplitude, it is not let through. Threshold Under Max. cuts off all bands which lie more than the threshold amount below the peak band in a given block of samples. So if the peak band has an amplitude of -7 dB and Threshold Under Max. is set to -40 dB, all bands below -47 dB will be cut off.

(Command - D) Spectral Dynamics...

This process performs standard dynamics processing (gating, ducking, expansion, compression) on each spectral band individually. It has individual threshold detection for each band, so that one band could have the dynamics process active while another is inactive. The process can be limited to affect only a specific frequency range. One can select whether to affect sounds which are above the threshold or sounds which are below the threshold. The threshold level can be set to one value for all bands, or it can be set to a different value for each band by reading in and analyzing a soundfile. This soundfile's amplitude spectrum is then used as the threshold for each band. This is especially useful if there is a sound that one wants to emphasize or deemphasize (hiss or hum, for example).

Most controls are self-explanatory. The first popup menu allows you to select the type of process to use: gating/ducking, expansion or compression. The second popup sets the number of filter bands to separate the sound into. 512 is a good compromise for the number of bands at a 44100 sample rate, as the bands are about 43 Hz apart and the filters used have a (512*2)/44100 or 0.023 second delay. In other words, a pretty good frequency resolution (provided no partials are closer than 43 Hz) and not too much time smearing.
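
The trade-off for other band counts or sample rates follows the same arithmetic as the 512-band example: the FFT size is twice the number of bands, the band spacing is the sample rate divided by the FFT size, and the delay is the FFT size divided by the sample rate. A small sketch (the function names are made up for illustration):

```c
/* Spacing between adjacent bands in Hz.
 * The FFT size is taken to be 2 * bands, as in the text. */
double band_spacing_hz(double sample_rate, long bands)
{
    return sample_rate / (2.0 * (double)bands);
}

/* Filter delay in seconds: one FFT window's worth of samples. */
double filter_delay_seconds(double sample_rate, long bands)
{
    return (2.0 * (double)bands) / sample_rate;
}
```

At 44100 Hz, doubling the bands to 1024 would halve the spacing to about 21.5 Hz and double the delay to about 0.046 seconds.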

The Highest Band and Lowest Band boxes allow one to limit the frequency range affected. The Gain/Reduction box allows you to set the amount of gain or reduction for the bands which are past the threshold. For compression and expansion this box becomes the ratio. When affecting sounds below the threshold, the compressor and expander hold the highest level steady and affect lower levels (also known as "downward" expansion or compression). When the process is set to affect sounds above the threshold, the compressor and expander hold the threshold level steady and compress/expand up from there.

Use Smoothed Amplitude is a way to avoid abrupt gating. Instead of comparing the input soundfile directly to the threshold, a windowed average of the input is compared to the threshold. This setting allows one to reduce the "martian voices" effect (a common problem in spectral dynamics processing) and is particularly nice for hiss and hum removal. Attack/Decay Time allows one to set the speed at which each triggered band opens or closes. The default is the minimum time for the number of bands used. If this value is set too slow, one loses transients; too fast, and you start modulating the soundfile (whistling sounds). Threshold Level is where you set the threshold level! If you check Threshold Relative to Peak Amp., the threshold is set relative to the peak amplitude for each block of sound processed. For example, if you are using the spectral gate, and the loudest frequency band for the current block of sound has an amplitude of -12 dB and the threshold is set at -40 dB, the gate will be active for sounds below -52 dB.
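
To see why a windowed average avoids abrupt gating, consider one band whose amplitude drops out for a single frame. The sketch below assumes a plain rectangular average over the most recent frames; the actual window shape and length SoundHack uses are not stated here, and the function names are hypothetical.

```c
#include <stddef.h>

/* Rectangular moving average of one band's amplitude over the most
 * recent `window` analysis frames, ending at index `frame`.
 * Fewer frames are averaged near the start of the file.
 * The rectangular shape is an assumption made for illustration. */
double smoothed_amplitude(const float *history, size_t frame, size_t window)
{
    if (window == 0)
        window = 1; /* a zero-length window degenerates to no smoothing */

    size_t start = (frame + 1 > window) ? frame + 1 - window : 0;
    double sum = 0.0;
    for (size_t i = start; i <= frame; i++)
        sum += history[i];
    return sum / (double)(frame + 1 - start);
}

/* Gate decision for one band: nonzero means the band is let through. */
int gate_is_open(const float *history, size_t frame, size_t window,
                 double threshold_amp)
{
    return smoothed_amplitude(history, frame, window) >= threshold_amp;
}
```

With a band history of 0.2, 0.2, 0.0, 0.2 and a threshold of 0.1, a direct comparison (window of 1) closes the gate at the third frame, while a three-frame average keeps it open.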

(Command - V) Varispeed...

As a sample rate converter, this is slower and maybe less accurate than the Sound Designer II or Alchemy software (then again maybe not), so this function may not be useful to those who own that software. However, SoundHack includes a variable sample rate conversion utility (varispeed). The Varispeed box enables this feature. The Varispeed Function... button will bring up the Draw Function dialog, giving one control over a 10-octave varispeed. The Quality buttons give one control over the size of the smoothing filter used, and the resultant quality of interpolation/decimation. The Vary by Scale and Vary by Pitch buttons allow one to draw a curve for either pitch or scaling factor.

(Command - X) Spectral Extraction...

This function attempts to separate the stable (pitched) and transient (unpitched) parts of a sound. It does this by measuring the speed of frequency deviation. If the deviation is too quick, it is marked unpitched information and output to the transient soundfile. If it is too stable, it is marked pitched information and output to the stable soundfile.

You can control the separation with this dialog box. Set the Bands to a high number if the sound being processed is harmonically dense; otherwise keep it around 512. Setting the number of Frames allows you to set the size of the analysis frame (in multiples of FFT frames). Set this higher if you are having difficulty separating the pitched material, lower if you are having difficulty separating the transient material. The two frequency values specify the amount of change allowed during each analysis frame. In this example, if the harmonic deviates by more than 5 hertz in 0.035 seconds, it is put into the transient soundfile; if the harmonic deviates by less than 2 hertz in 0.035 seconds, it is put in the stable soundfile.
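
The decision described above can be sketched as a simple classification of each band per analysis frame. Several things here are assumptions made for illustration: the names are hypothetical, the deviation is measured as the absolute difference between the frequency at the start and end of the analysis frame, and a band whose deviation falls between the two values is marked as unassigned, since the text does not say what happens to it.

```c
#include <math.h>

typedef enum {
    BAND_STABLE,     /* goes to the stable (pitched) soundfile       */
    BAND_TRANSIENT,  /* goes to the transient (unpitched) soundfile  */
    BAND_UNASSIGNED  /* between the two limits: handling not stated  */
} BandClass;

/* Classify one band over one analysis frame.
 * stable_max_hz    : deviations below this are stable    (2.0 in the example)
 * transient_min_hz : deviations above this are transient (5.0 in the example) */
BandClass classify_band(double freq_start_hz, double freq_end_hz,
                        double stable_max_hz, double transient_min_hz)
{
    double deviation = fabs(freq_end_hz - freq_start_hz);

    if (deviation > transient_min_hz)
        return BAND_TRANSIENT;
    if (deviation < stable_max_hz)
        return BAND_STABLE;
    return BAND_UNASSIGNED;
}
```

With the example settings, a harmonic that moves from 440 Hz to 446 Hz within one analysis frame would be classed as transient, and one that moves from 440 Hz to 441 Hz as stable.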

This function draws heavily from the work of Zack Settel and Cort Lippe on the ISPW workstation using Max-DSP. Thank you Zack and Cort for sharing a great idea.

(Command - -) Spectral Analysis/Resynthesis...

This function will create a Csound or SoundHack format spectral data file from a soundfile (analysis) or create a soundfile from either format of spectral data file (resynthesis). With this, you can create files to be used with Csound's pvoc unit generator, or use your own programs to process the spectral data directly. The format of the spectral data file (a limited version of the Csound pvanal format) is a header, followed by multiple frames of spectral data. Here is a C structure describing the header:

typedef struct
{
    long    magic;           // 517730 for Csound files, 'Erbe' for SoundHack files
    long    headBsize;       // byte offset from start to data
                             // (usually sizeof(SpectHeader))
    long    dataBsize;       // number of bytes of data not including the header
    long    dataFormat;      // (short) format specifier
                             // always 36 for floating point
    float   samplingRate;
    long    channels;
    long    frameSize;       // number of points in FFT (number of bands * 2)
    long    frameIncr;       // number of new samples each frame (frames overlap)
    long    frameBsize;      // bytes in each file frame
                             // frameBsize = sizeof(float) * ((frameSize >> 1) + 1) * 2;
    long    frameFormat;     // this is either 3 for SoundHack files (amplitude & phase)
                             // or 7 for Csound files (amplitude & frequency)
    float   minFreq;         // 0.0
    float   maxFreq;         // maxFreq = samplingRate/2.0;
    long    freqFormat;      // flag for log/lin frequency (always 1 for linear)
    char    info[4];         // extendable byte area
} SpectHeader;

The frames of spectral data that follow the header are organized as follows:

// frameFormat == 3, amplitude and phase pairs, SoundHack file
typedef struct
{
    float    amplitude;    // from 0.0 to 1.0
    float    phase;        // from 0.0 to (2.0 * pi)
}   band;

band    spectralFrame[(frameSize >> 1) + 1];

// frameFormat == 7, amplitude and frequency pairs, Csound file
typedef struct
{
    float    amplitude;    // from 0.0 to 1.0
    float    frequency;    // from 0.0 to samplingRate/2.0
}   band;

band    spectralFrame[(frameSize >> 1) + 1];

If the spectral file is stereo, the frames are interleaved, first left then right. Included with SoundHack is the source code for a simple spectral data processor which should illustrate how to read and write this format.
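
For anyone writing their own reader, the sizes implied by the header fields can be checked with a few lines of arithmetic. This is only a sketch with hypothetical function names, not the bundled source code. It assumes 4-byte floats, that frameBsize describes one channel's frame (so a stereo file holds two such frames per time step), and it uses int32_t because the long fields in the header were 32 bits wide on the Macintosh compilers of the time, whereas long is 64 bits on many current systems.

```c
#include <stdint.h>

/* Bands stored per frame: frameSize / 2 + 1 (DC up to Nyquist). */
int32_t bands_per_frame(int32_t frame_size)
{
    return (frame_size >> 1) + 1;
}

/* Bytes in one channel's frame: two floats (a pair) per band. */
int32_t frame_bytes(int32_t frame_size)
{
    return (int32_t)sizeof(float) * bands_per_frame(frame_size) * 2;
}

/* Frames per channel held in data_bsize bytes of interleaved data.
 * Assumes frameBsize is per channel; channels must be at least 1. */
int32_t frames_per_channel(int32_t data_bsize, int32_t frame_size,
                           int32_t channels)
{
    return data_bsize / frame_bytes(frame_size) / channels;
}
```

For a 512-band analysis (frameSize of 1024) this gives 513 bands and 4104 bytes per frame, so a stereo file with 82080 bytes of data would hold 10 frames per channel.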

(Command - ;) Normalize

This does a simple, no-questions-asked normalization of the frontmost soundfile.
