Soundgen is an open-source algorithm under active development for synthesizing human non-speech sounds and animal vocalizations in a user-friendly way, using a small number of acoustically meaningful parameters. The tool is primarily intended for psychological and biological research. It can potentially also be used for human-machine interaction, since it provides a straightforward method for reproducing existing emotional vocalizations or generating new ones de novo. The first official release of the R package, soundgen 1.0.0, contains the sound-generating algorithm as well as tools for acoustic analysis.

Bug reports, criticisms and suggestions are all warmly welcome! Soundgen is written and maintained by Andrey Anikin. You can reach me at rty.anik / at / rambler.ru


Essential links

Released R package on CRAN (to install from R: install.packages("soundgen"))
Developmental version and source code on github
Vignette on sound generation (html)
Vignette on acoustic analysis (html)
Automatically generated manual (pdf)
A demo with cat vocalizations (html)

How to install it

The current version is written in R, a popular general-purpose programming language. To install the package, run install.packages('soundgen') in the R environment.

You can also install the most recent developmental version or download the source code from github. To install from github, first install the dependencies:
install.packages(c("devtools", "tuneR", "seewave", "phonTools", "zoo", "shiny", "shinyBS", "reshape2", "mvtnorm", "plyr", "dtw", "grid"))
Then install the package itself with devtools::install_github(), pointing it at the repository linked above.

How to use it

The easiest way to get a feel for the program is to use the interactive Shiny app, which you can open in a browser by typing "soundgen_app()". NB: there is no sound in Safari! Please use Firefox or Chrome. The app is also available online. Server time is currently very limited; for extensive use, please install the package and run the app locally!

A detailed tutorial on sound generation with soundgen is provided in the vignette. To generate sounds from the R console, call the soundgen() function with the desired parameters. Soundgen relies on the "tuneR" package for audio playback, and playback may initially fail, depending on your platform and installed software. If "soundgen(play = TRUE)" throws an error, you may need to change the default player in "tuneR" or install additional software. See the seewave vignette on sound input/output for an in-depth discussion of audio playback in R. Some tips are also available here.
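For example, a minimal session might look like the sketch below (it assumes soundgen is installed; see the vignette for the full list of parameters and their defaults):

```r
library(soundgen)

# generate the default sound (a short [a]-like vowel) as a numeric waveform,
# without attempting audio playback
s <- soundgen(play = FALSE)

# s is a plain numeric vector; once audio playback works, listen with
# soundgen's helper (assuming the default sampling rate of 16000 Hz):
# playme(s, samplingRate = 16000)
```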

How to cite it

Anikin, A. (2017) soundgen: Parametric Voice Synthesis. R package version 1.0.0.


As a BibTeX entry:

@Manual{soundgen,
    title = {soundgen: Parametric Voice Synthesis},
    author = {Andrey Anikin},
    year = {2017},
    note = {R package version 1.0.0},
}

How it works (briefly)

The purpose is to start with a few control parameters (e.g., the intonation contour, the amount of noise, the number of syllables and their duration, etc.) and to generate a corresponding audio stream. Ignoring dependencies between control parameters and the procedure for creating polysyllabic vocalizations, the algorithm for generating a single voiced segment essentially implements the standard source-filter model. The voiced component is generated as a sum of sine waves, one for each harmonic, and the noise component is generated as filtered white noise. Both components are then passed through a frequency filter simulating the effect of the vocal tract. This process can be conceptually divided into three stages:

  1. Generation of the harmonic component (glottal source). We "paint" the spectrogram of the glottal source based on the desired intonation contour and spectral envelope by specifying the frequencies, phases, and amplitudes of a number of sine waves, one for each harmonic of the fundamental frequency. If needed, we also add stochastic and non-linear effects at this stage: jitter and shimmer (random fluctuation in frequency and amplitude), subharmonics, slower random drift of control parameters, etc. Once the spectrogram is complete, we synthesize the corresponding waveform by generating and adding up as many sine waves as there are harmonics in the spectrum.
  2. Generation of the noise component. In addition to harmonic oscillations of the vocal cords, there are other sources of excitation, which may be generated as some form of noise. For example, aspiration noise is synthesized as white noise with some basic rolloff and added to the glottal source before formant filtering. It is similarly straightforward to add other types of noise, which may originate higher up in the vocal tract and thus display a different formant structure (e.g., high-frequency hissing, broadband clicks, etc.).
  3. Spectral filtering (formants and lip radiation). The vocal tract acts as a resonator that modifies the source spectrum by amplifying certain frequencies and dampening others. Just as we "painted" a spectrogram for the acoustic source in (1), we now "paint" a spectral filter with a specified number of stationary or moving formants. We then take a Fast Fourier transform of the generated waveform to convert it back to a spectrogram, multiply the latter by our filter, and then take an inverse Fast Fourier transform to go back to the time domain. This filtering can be applied to harmonic and noise components separately or - for noise sources close to the glottis - the harmonic component and the noise component can be added first and then filtered together.
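The three stages above can be sketched in a few lines of base R. This is a simplified illustration of the general technique, not soundgen's actual implementation; the fundamental frequency, rolloff, formant frequencies, and bandwidths are arbitrary values chosen for the example:

```r
fs  <- 16000                        # sampling rate, Hz
dur <- 0.5                          # duration, s
t   <- seq(0, dur, by = 1 / fs)
f0  <- 220                          # fundamental frequency, Hz

# Stage 1: harmonic component as a sum of sine waves, one per harmonic
# of f0, with amplitudes rolling off as 1/h (a crude source spectrum)
nHarm  <- floor((fs / 2) / f0)
voiced <- rep(0, length(t))
for (h in 1:nHarm) {
  voiced <- voiced + (1 / h) * sin(2 * pi * h * f0 * t)
}

# Stage 2: aspiration noise as white noise, mixed with the glottal
# source before formant filtering
noise <- rnorm(length(t), sd = 0.05)
src   <- voiced / max(abs(voiced)) + noise

# Stage 3: formant filtering in the frequency domain:
# FFT -> multiply by a filter with Gaussian formant peaks -> inverse FFT
spec     <- fft(src)
freqs    <- (seq_along(spec) - 1) * fs / length(spec)
formants <- c(800, 1200)            # two schematic formants, Hz
filt     <- rep(0.1, length(spec))
for (f in formants) {
  # mirror each peak around fs so the filter stays (roughly) symmetric
  # and the inverse FFT comes out (nearly) real
  filt <- filt + exp(-(freqs - f)^2 / (2 * 100^2)) +
                 exp(-(freqs - (fs - f))^2 / (2 * 100^2))
}
sound <- Re(fft(spec * filt, inverse = TRUE)) / length(spec)
sound <- sound / max(abs(sound))    # normalize to [-1, 1]
```

Filtering the summed source (voiced plus noise) in one pass, as here, corresponds to the "noise close to the glottis" case mentioned in step 3; noise generated higher in the vocal tract would instead be filtered separately with its own spectral envelope.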

    This page was last updated on Sep 07, 2017.