What follows is an up-to-the-minute account of my progress on my senior thesis.
January 8th, 2004: It's done! The paper is available for download here: Lossless Editing of Lossy-compressed Audio Data.
Per my advisor's recommendation, I am keeping a design journal to track my thoughts and ideas as the project progresses.
Audacity is a free digital audio editor available for Windows, Mac OS X and Linux. I have been on the development team for Audacity for several years now.
We have discovered that a common usage pattern for Audacity is:
Because of the way Audacity (like almost all other audio editors) handles compressed data, this process causes all of the data to be uncompressed when the file is imported and recompressed when a new file is exported. This is undesirable because every time data is compressed with a lossy codec like MP3 or Ogg Vorbis, more data is lost and the quality drops further.
The vast majority of the audio data in the above example is copied verbatim from input to output. For example, imagine that the user imports an MP3, cuts off everything after ten seconds, and saves those first ten seconds to a new file. The new file is just a copy of the first ten seconds of the old file. If there were a way to take the compressed data and simply chop off everything after ten seconds, the whole process could be performed without re-encoding (and without the associated loss of quality).
In uncompressed formats, this kind of operation is trivial. Since the file is just a string of samples, you can copy exactly the samples you want from input to output. To truncate everything after ten seconds, you could just find the sample number that corresponds to the point exactly ten seconds into the file and copy everything before that point.
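As a rough illustration of the uncompressed case, here is a minimal Python sketch that truncates a PCM WAV file by copying samples directly. It uses the standard-library wave module; the function name truncate_wav is made up for this example and is not part of Audacity.

```python
import wave


def truncate_wav(src, dst, seconds):
    """Copy the first `seconds` of PCM audio from src to dst.

    src and dst may be file names or seekable binary file objects.
    The samples are copied byte for byte; nothing is re-encoded.
    Returns the number of sample frames written.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        # The cut point in sample frames is simply time * sample rate.
        frames_to_keep = min(int(seconds * reader.getframerate()),
                             reader.getnframes())
        data = reader.readframes(frames_to_keep)

    with wave.open(dst, "wb") as writer:
        # Reuse the source's channel count, sample width and rate;
        # the frame count in the header is corrected when writing.
        writer.setparams(params)
        writer.writeframes(data)

    return frames_to_keep
```

Calling truncate_wav("in.wav", "out.wav", 10) would keep the first ten seconds (the file names are placeholders). The output is bit-identical to the start of the input, which is exactly the property that is hard to get with lossy-compressed data.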
With compressed formats it is quite a bit trickier. Instead of storing individual samples, they store the data in chunks called "blocks," "frames," or "pages." Each of these blocks can be decoded to produce a string of samples, but a block cannot be trivially split. Generally each block can be moved around as long as it is kept intact, though blocks can also depend on the blocks around them. The details of how these compressed formats work are something I do not yet fully understand.
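To make the block constraint concrete, here is a small Python sketch of the bookkeeping involved. It is a simplified model, not how Ogg Vorbis or MP3 actually lay out data: it assumes each block decodes to a known number of samples and ignores any dependence between neighboring blocks. The function name split_at_block_boundary is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def split_at_block_boundary(block_lengths, cut_sample):
    """Find how much of a truncation can be done by copying whole blocks.

    block_lengths: number of decoded samples in each compressed block.
    cut_sample:    sample index at which the user wants to truncate.

    Returns (whole_blocks, leftover_samples): the number of leading
    blocks that can be copied untouched, and the number of samples past
    the last copied block that would still have to be decoded and
    re-encoded to reach the requested cut point.
    """
    ends = list(accumulate(block_lengths))  # cumulative end of each block
    total = ends[-1] if ends else 0
    cut_sample = max(0, min(cut_sample, total))  # clamp to the stream
    # Count the blocks that end at or before the cut point.
    whole_blocks = bisect_right(ends, cut_sample)
    copied = ends[whole_blocks - 1] if whole_blocks else 0
    return whole_blocks, cut_sample - copied


# Example: four blocks, cut requested at sample 2500.
print(split_at_block_boundary([1024, 1024, 256, 1024], 2500))  # (3, 196)
```

In this toy example the first three blocks (2304 samples) could be copied verbatim, and only the remaining 196 samples fall inside a block that cannot be split. A real codec adds complications this model leaves out, such as blocks that depend on their neighbors.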
My thesis will consist of enhancements to Audacity that will allow a user to preserve as much of the original lossy-compressed material as possible. The unconventional data-storage scheme that Audacity uses (explained by Audacity lead programmer Dominic Mazzoni in his paper "A Fast Data Structure for Disk-Based Audio Editing," CMJ, Summer 2002) will make this project achievable with minimal changes to the data-storage infrastructure.
For my thesis I have decided to implement this functionality for Ogg Vorbis instead of MP3. MP3 is much more widespread, so supporting it could benefit more people. However, Ogg Vorbis is a more modern codec, it is free in every way (so I will not need a patent license to distribute my work), and I have the benefit of access to the developers of the codec through mailing lists. MP3 support could be added later.
My advisor for this thesis is Randy Bentson.