In creating the audio files for Virtual Gramophone, the following steps were taken:
- Proper stylus size
- Proper playback speed
- Analogue to digital conversion
- Digital noise reduction
- Digital recording and editing
- RealAudio encoding
During the playback process, it was necessary to take into consideration that (1) groove widths have varied over the years and (2) recordings were not consistently made at a speed of exactly 78-rpm until the mid-1930s. Thus, both the proper stylus size and the proper playback speed had to be determined. It was decided to adjust the playback speed after first recording onto reel-to-reel tape in order to minimize wear on the discs.
1. Proper stylus size
The groove widths of 78s gradually decreased over time owing to refinements in the recording process. Proper contact between stylus and groove walls is essential in obtaining optimum sound quality during reproduction. Thus, it is important to choose a stylus of the proper size and shape.
Also, by choosing the appropriate stylus, one can compensate for different types of wear. For example, a smaller stylus can be used to track lower into the groove to compensate for a record with upper-groove damage. In some instances, tracking different parts of the groove will not make a noticeable difference in the quality of reproduction, but in other cases there will be a remarkable improvement.
2. Proper playback speed
For this project, considering the age and fragility of the recordings, the discs were played at 78-rpm and recorded onto reel-to-reel tape. The speed of the recordings was then adjusted by varying the speed of the tape deck at playback. This way, the records were played only once, and a conservation copy of the disc was made.
Though these records are referred to as 78s, the 78-rpm speed did not become an exact standard until the mid-1930s. Even slight variations in playback speed will change the recording pitch and will make a considerable difference to the timbre of a recording. A difference of roughly 6% in playback speed is equal to a semitone (e.g. the note A becomes an A flat). A "78" recorded at 76.6-rpm must be played at 76.6-rpm for proper reproduction.
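The relationship between playback speed and pitch can be worked through numerically. The sketch below is illustrative only and is not taken from the project's own tooling; the function names are hypothetical. It assumes equal temperament, in which a semitone corresponds to a speed ratio of 2^(1/12), or about 5.9%.

```python
import math

TRANSFER_RPM = 78.0  # speed at which the discs were played onto tape


def tape_speed_factor(recorded_rpm, transfer_rpm=TRANSFER_RPM):
    """Factor by which the tape playback speed must be scaled so that a
    disc transferred at transfer_rpm sounds at its recorded speed."""
    return recorded_rpm / transfer_rpm


def pitch_shift_semitones(speed_ratio):
    """Pitch change, in equal-tempered semitones, caused by a speed ratio."""
    return 12.0 * math.log2(speed_ratio)


# A disc cut at 76.6 rpm but transferred at 78 rpm:
factor = tape_speed_factor(76.6)                # tape must run slightly slow
error = pitch_shift_semitones(78.0 / 76.6)      # sharpness if left uncorrected

print(f"tape speed factor: {factor:.4f}")
print(f"uncorrected pitch error: {error:.2f} semitone sharp")
```

In this example the uncorrected transfer would sound about a third of a semitone sharp, which is small on paper but enough to alter the timbre of a voice.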
Determining the ideal playback speed can be difficult. Having a score indicating the correct key is a good start. Unfortunately, singers had no qualms about transposing a piece to suit their voices, and, of course, not all recorded music is available as sheet music. Also, "concert pitch", now A=440Hz, has varied over the years (and is now in the process of changing throughout the world). In addition, music for stringed instruments tends to favour sharp keys and music for brass instruments flat keys. When a vocal recording was pitched, the singer's diction, resonance, naturalness of tone and vibrato speed were carefully evaluated.
3. Analogue to digital conversion
The tapes were digitized using a 20-bit analogue-to-digital converter. The superiority of 20-bit over 16-bit conversion is clearly audible in low-level signals (room reverberation, quiet passages, etc.) and in the overall naturalness of the sound. It also increases audio accuracy during subsequent digital processing.
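The advantage of the extra four bits can be estimated with the standard textbook figure for an ideal uniform quantizer driven by a full-scale sine wave, roughly 6.02 dB per bit plus 1.76 dB. The sketch below is a theoretical calculation only; real converters fall short of these ideal figures, and the function name is hypothetical.

```python
import math


def ideal_dynamic_range_db(bits):
    """Signal-to-quantization-noise ratio, in dB, of an ideal uniform
    quantizer for a full-scale sine wave (about 6.02 * bits + 1.76)."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


range_16 = ideal_dynamic_range_db(16)
range_20 = ideal_dynamic_range_db(20)

print(f"16-bit: {range_16:.1f} dB")
print(f"20-bit: {range_20:.1f} dB")
print(f"gain from 4 extra bits: {range_20 - range_16:.1f} dB")
```

The roughly 24 dB of additional range sits at the bottom of the scale, which is consistent with the improvement being most audible in quiet passages and reverberation tails.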
4. Digital noise reduction
Three general classes of noise are found on sound recordings: clicks, crackle and hiss. CEDAR (Computer Enhanced Digital Audio Restoration) removes or reduces these imperfections with the De-Clicker DC1, the De-Crackler CR1 and the De-Hisser DH2. These units are based on twin 40-bit floating-point processors that process sound in real time (i.e. there is no waiting while the units are calculating the results).
The benefit of removing high frequency, high-energy transient noises, such as clicks and pops, becomes immediately apparent. The DC1 removes both clicks and any underlying music. It then re-creates the missing sound wave by analyzing pre- and post-click samples and interpolating the results using high-order algorithms. The number of samples that the DC1 examines depends on the length of the click. A short click requires fewer samples (10) than a longer pop (60 to 200) to rebuild the sound wave. The De-Clicker can remove up to 2500 clicks per second per channel in real time.
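The general idea of rebuilding a damaged span from the samples on either side of it can be shown with a deliberately simple stand-in. The sketch below is not the DC1's algorithm, which the text describes as high-order: it detects clicks with a crude amplitude threshold and bridges each one by linear interpolation. The function names and the threshold value are assumptions made for illustration.

```python
def find_clicks(samples, threshold):
    """Return (start, length) runs where |sample| exceeds the threshold."""
    runs, start = [], None
    for i, x in enumerate(samples):
        if abs(x) > threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(samples) - start))
    return runs


def repair_click(samples, start, length):
    """Return a copy with the damaged run replaced by a straight line
    drawn between the good samples just before and just after it."""
    out = list(samples)
    end = start + length                 # index of first good sample after the run
    has_left, has_right = start > 0, end < len(out)
    if not (has_left or has_right):
        return out                       # nothing usable to interpolate from
    left = out[start - 1] if has_left else out[end]
    right = out[end] if has_right else out[start - 1]
    for k in range(length):
        t = (k + 1) / (length + 1)       # fractional position inside the gap
        out[start + k] = left + t * (right - left)
    return out


signal = [0.0, 0.1, 0.2, 5.0, -4.0, 0.5, 0.6]   # two-sample click at index 3
cleaned = signal
for start, length in find_clicks(signal, threshold=1.0):
    cleaned = repair_click(cleaned, start, length)
print([round(x, 3) for x in cleaned])
```

A straight line is only plausible across very short gaps; longer pops need a model of the surrounding waveform, which is why more context samples are examined for longer clicks.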
Crackle is a burst of short, small spikes which is added to the original sound by poor record surface quality, buzzing caused by improperly wired or grounded equipment, or distortion caused by overloading amplifier mixer outputs or digital clipping. These all introduce a harshness to the sound. Crackle is a more subtle and difficult form of noise to remove than a click. The De-Crackler addresses this problem by dividing the input signal into "genuine" signal and "crackle/distortion" signal, and working solely on the signal with crackle. First, the operator determines the level of "crackle/distortion" present in the input signal, then adjusts the amount of crackle that the CR1 is required to remove. Next, the Crackle Mode is set to either Crackle 1 (sharp and well defined) or Crackle 2 ("grungy" and not so well defined). Finally, the signal is recombined. The CR1 must be adjusted by ear and, if not set correctly, can have a detrimental effect on the sound quality.
Hiss is quite obvious to a human listener but is far more difficult for a machine to detect. Therefore, it is harder to remove than a sharp click or crackle. The DH2 removes hiss by analyzing the tonal, transient and ambience content of the signal at hundreds of frequency bands and removing the frequency bands in which it does not detect any musical signal. The operator must first adjust the Noise Level parameter, giving the DH2 a rough idea of the amount of noise present in any given signal. Next, the operator adjusts the Attenuation, which sets a maximum limit on the amount of noise that the DH2 will remove at any given frequency. Finally, the operator adjusts the Brightness algorithm to preserve the appropriate amount of presence by controlling the speed at which the DH2 will remove noise. The DH2 is the most difficult CEDAR box to adjust. If adjusted improperly, it can have a very negative effect on the audio quality.
The net aural effect of removing noise from sound recordings can be spectacular. Judicious use of digital noise reduction can effectively free music from the shortcomings of its recording medium, uncovering details which were once masked by noise.
5. Digital recording and editing
Editing was done on a PC-based digital audio workstation (DAW), Pyramix by Merging Technologies, which combines real-time hard disk recording, digital audio mixing, editing, audio-effects processing, and CD-R mastering. The system is based on a card containing four 32-bit floating-point AT&T DSP 3210 microprocessors, which provides an aggregate peak power of 133 Mflops (million floating-point operations per second). All operations are executed in 32-bit. The recording function records at resolutions from 16 to 32 bits.
Once the selection was recorded onto the hard disk, the DAW displayed the waveform so that the sound waves could be manipulated visually. Editing reel-to-reel tape involves physically cutting the medium with a razor blade, then splicing the sections. This process makes it very difficult, if not impossible, to reverse errors. Digital editing involves placing markers at the beginning and end of sections to be edited. The edit points can be moved, removed or altered at any time. While in place, they instruct the DAW on the next action required, without altering the original file in any way.
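The non-destructive approach can be pictured as a list of markers kept separately from the audio. The sketch below is a minimal illustration of that idea, not a description of how Pyramix stores its edits; the `Region` class and `render` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int  # index of the first sample to keep
    end: int    # index one past the last sample to keep


def render(source, regions):
    """Assemble the edited result by reading the marked regions in order.
    The source recording itself is never modified."""
    out = []
    for region in regions:
        out.extend(source[region.start:region.end])
    return out


recording = list(range(10))                  # stand-in for samples on disk
edit_list = [Region(0, 3), Region(6, 10)]    # skip samples 3 to 5
first_pass = render(recording, edit_list)

edit_list[0].end = 4                         # move a marker: the edit is revised
second_pass = render(recording, edit_list)

print(first_pass)
print(second_pass)
```

Because only the markers change between the two passes, any edit can be revised or undone, in contrast to a razor-blade cut on tape.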
Also, while the equivalent of the blunt tape cut is possible on a DAW, a function called cross-fading also exists. This allows one signal to fade out while another simultaneously fades in at the edit point. This operation can take place within milliseconds (or last as long as each clip) and produces a smooth, seamless edit. Recording levels can be changed and fade-ins and fade-outs can be moved or changed at any time.
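A cross-fade can be sketched as a weighted sum over the overlapping samples. The example below assumes a linear fade curve, which is only one of several shapes a workstation might offer, and the function name is hypothetical.

```python
def crossfade(outgoing, incoming, overlap):
    """Join two clips, blending the last `overlap` samples of the outgoing
    clip with the first `overlap` samples of the incoming clip."""
    if overlap < 0 or overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap must fit inside both clips")
    if overlap == 0:
        return list(outgoing) + list(incoming)   # the blunt cut
    head = list(outgoing[:len(outgoing) - overlap])
    tail = list(incoming[overlap:])
    blended = []
    for k in range(overlap):
        gain_in = (k + 1) / (overlap + 1)        # rises linearly towards 1
        a = outgoing[len(outgoing) - overlap + k]
        b = incoming[k]
        blended.append(a * (1.0 - gain_in) + b * gain_in)
    return head + blended + tail


joined = crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=2)
print([round(x, 3) for x in joined])
```

Setting `overlap` to zero reproduces the blunt cut, while larger values spread the transition over more samples.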
6. RealAudio encoding
RealAudio encoding compresses the audio files so that the information can be "streamed" continuously over the Internet at a rate which the user's modem (Internet connection) can handle. For instance, an uncompressed mono .WAV file of "Meet me in St. Louis" performed by Robert Price (Berliner 1426) is 10,236 KB. Compressed for a 14.4K modem, the file becomes 130 KB (1.3% of the original file size); for a 28.8K modem, 246 KB (2.4%); and for a 56K modem, 485 KB (4.7%). Each selection was compressed three times, for 14.4K, 28.8K and 56K modems.
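The size reductions can be checked directly from the file sizes quoted above. The short calculation below uses only those figures; the variable names are illustrative.

```python
ORIGINAL_KB = 10236  # uncompressed mono .WAV

# Compressed sizes in KB, keyed by target modem speed
compressed_kb = {"14.4K": 130, "28.8K": 246, "56K": 485}

percentages = {
    modem: round(100 * size / ORIGINAL_KB, 1)
    for modem, size in compressed_kb.items()
}

for modem, pct in percentages.items():
    ratio = ORIGINAL_KB / compressed_kb[modem]
    print(f"{modem} modem: {pct}% of original (about {ratio:.0f}:1)")
```

Rounded to one decimal place, the 14.4K version comes to 1.3% of the original size, the 28.8K version to 2.4% and the 56K version to 4.7%.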
For more information about digital audio, consult the article "Digital Audio at the National Library of Canada" (archived).