pvoc

Signal Generators: STFT Resynthesis (Vocoding)

  ar      pvoc       ktimpnt, kfmod, ifilcod[, ispecwp, iextractmode, ifreqlim, igatefn]
  ar      vpvoc      ktimpnt, kfmod, ifile[, ispecwp[, ifn]]

Description

Output is an additive set of individually controlled sinusoids, using phase vocoder resynthesis.

Initialization

ifilcod - integer or character string denoting a control file derived from analysis of an audio signal. An integer m denotes the file pvoc.m; a character string (in double quotes) gives a filename, optionally a full pathname. If it is not a full pathname, the file is sought first in the current directory, then in the one given by the environment variable SADIR (if defined). pvoc control files contain breakpoint amplitude and frequency envelope values organized for FFT resynthesis. Memory usage depends on the size of the files involved, which are read and held entirely in memory during computation but are shared by multiple calls (see also lpread).

ispecwp (optional) - if non-zero, attempts to preserve the spectral envelope while its frequency content is varied by kfmod. The default value is zero.

iextractmode (optional) - determines whether spectral extraction is carried out and, if so, whether components whose frequency changes between frames are below ifreqlim or above ifreqlim are discarded. A value of 1 causes pvoc to synthesize only those components where the frequency difference between analysis frames is greater than ifreqlim. A value of 2 causes pvoc to synthesize only those components where the frequency difference between frames is less than ifreqlim. The default values for iextractmode and ifreqlim are 0, in which case a simple resynthesis is done. See examples under pvadd for how to use spectral extraction.

igatefn (optional) - the number of a stored function which will be applied to the amplitudes of the analysis bins before resynthesis takes place. If igatefn is greater than 0, the amplitude of each bin is scaled by igatefn through a simple mapping process. First, the amplitudes of all of the bins in all of the frames in the entire analysis file are compared to determine the maximum amplitude value. This value is then used to create normalized amplitudes, which serve as indices into the stored function igatefn. The maximum amplitude maps to the last point in the function. An amplitude of 0 maps to the first point in the function. Normalized values between 0 and 1 map proportionally to points along the function table. See examples under pvadd for how to use amplitude gating.

ifn (optional) - optional function table containing control information for vpvoc. If ifn = 0, control is derived internally from a previous tableseg or tablexseg unit. Default is 0. (New in Csound version 3.59)
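
As a hedged sketch of how the optional arguments line up, the two instruments below use spectral extraction and amplitude gating respectively. The file name "analysis.pvc", the three-second pointer range, the ifreqlim value of 5 and the shape of the gating table are assumptions chosen for illustration; they are not taken from the pvadd examples. In the score, a hypothetical gating table:

  ; 0 for the lowest quarter of the normalized amplitudes,
  ; then a short ramp up to 1 for everything louder
  f 2 0 512 7 0 128 0 16 1 368 1

In the orchestra:

  instr 1
  ktime   line    0, p3, 3        ; time pointer, in seconds, into file
  ; ispecwp = 0, iextractmode = 1, ifreqlim = 5: keep only components whose
  ; frequency differs by more than ifreqlim between analysis frames
  asig    pvoc    ktime, 1, "analysis.pvc", 0, 1, 5
          out     asig
  endin

  instr 2
  ktime   line    0, p3, 3
  ; iextractmode = 0 and ifreqlim = 0 (no extraction), igatefn = 2:
  ; bin amplitudes are scaled by table 2 before resynthesis
  asig    pvoc    ktime, 1, "analysis.pvc", 0, 0, 0, 2
          out     asig
  endin

With a table of this shape the quietest bins should be silenced and the louder ones left unchanged. Because the optional arguments are positional, the earlier ones have to be written out (here as 0) in order to reach igatefn.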

Performance

pvoc implements signal reconstruction using an FFT-based phase vocoder. The control data stems from a precomputed analysis file with a known frame rate. The passage of time through this file is specified by ktimpnt, which represents the time in seconds. ktimpnt must always be positive, but as a pointer into the analysis file it can move forwards or backwards in time, be stationary, or be discontinuous. kfmod is a control-rate transposition factor: a value of 1 incurs no transposition, 1.5 transposes up a perfect fifth, and .5 down an octave.
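
The pointer and transposition controls can be sketched as follows. This is a minimal illustration, assuming an analysis file named "analysis.pvc" that holds about three seconds of sound; both the name and the length are hypothetical.

  instr 1
  ktime   line    0, p3, 3        ; pointer moves from 0 to 3 seconds over the note
  ; kfmod = 1.5 transposes up a perfect fifth;
  ; an integer ifilcod such as 3 would instead read the file pvoc.3
  asig    pvoc    ktime, 1.5, "analysis.pvc"
          out     asig
  endin

A score line such as

  i 1 0 6

plays the three seconds of analysis data over six seconds, so the sound is stretched to twice its length while the pitch is set independently by kfmod. Running the line from 3 down to 0 would read the file backwards, and holding ktime constant would freeze a single spectral frame.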

This implementation of pvoc was originally written by Dan Ellis. It is based in part on the system of Mark Dolson, but the pre-analysis concept is new. The spectral extraction and amplitude gating (new in Csound version 3.56) were added by Richard Karpen based on functions in SoundHack by Tom Erbe.

vpvoc is identical to pvoc except that it takes the result of a previous tableseg or tablexseg and uses the resulting function table (passed internally to vpvoc) as an envelope over the magnitudes of the analysis data channels. Optionally, a table specified by ifn may be used instead. The result is spectral enveloping. The function size used in the tableseg should be framesize/2, where framesize is the number of bins in the phase vocoder analysis file that is being used by vpvoc. Each location in the table is used to scale a single analysis bin. By using different functions for ifn1, ifn2, etc. in the tableseg, the spectral envelope becomes a dynamically changing one. See also tableseg and tablexseg.

Example

The following example, using vpvoc, shows the use of functions such as

  f 1 0 256 5 .001 128 1 128 .001
  f 2 0 256 5 1 128 .001 128 1
  f 3 0 256 7 1 256 1

to scale the amplitudes of the separate analysis bins.

  ktime   line            0, p3, 3 ; time pointer, in seconds, into file
          tablexseg       1, p3*.5, 2, p3*.5, 3 ; morph table 1 -> 2 -> 3
  apv     vpvoc           ktime, 1, "pvoc.file" ; envelope taken from tablexseg

The result would be a time-varying "spectral envelope" applied to the phase vocoder analysis data. Since this amplifies or attenuates the amount of signal at the frequencies that are paired with the amplitudes which are scaled by these functions, it has the effect of applying very accurate filters to the signal. In this example the first table acts as a band-pass filter. Over the first half of the note's duration the envelope gradually changes into the band-reject shape of the second table, and over the second half it moves toward the flat third table, which leaves the magnitudes unmodified.
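
For comparison, a fixed spectral envelope can be supplied through ifn instead of a preceding tableseg or tablexseg. The sketch below is an assumption-laden variant of the example above: it reuses table 1 and the placeholder file name "pvoc.file", and assumes the table size matches the analysis file as described under Performance.

  instr 2
  ktime   line            0, p3, 3 ; time pointer, in seconds, into file
  ; ispecwp = 0, ifn = 1: table 1 scales the analysis bins for the whole note
  apv     vpvoc           ktime, 1, "pvoc.file", 0, 1
          out             apv
  endin

Here the band-pass shape of table 1 would stay in force throughout, since no tableseg or tablexseg is changing the envelope over time.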

Author

Dan Ellis

Richard Karpen
Seattle, Wash
1997
