Sunday, March 2, 2014

Manually parsing a wave file in Java

In Java you can read audio data directly from an audio device, an audio file or a URL (see examples here and here) through a standard package. In this article we will instead parse a wave audio file manually to learn about its format, which is rather simple: a header that states the format of the samples (e.g.: stereo, 16 bit, 44100 Hz) followed by the samples themselves. The number on the left of each line is the byte offset from the start of the file.

00 "RIFF"
04 File size in bytes - 8
08 "WAVE"

[Format chunk]
12 "fmt " (format chunk mark)
16  Format chunk size - 8 (16 for PCM)
20  Format tag (1 for PCM)
22  Number of Channels (e.g.: 2)
24  SamplesPerSec (e.g.: 8000Hz)
28  AvgBytesPerSec = (Sample Rate * BytesPerFrame).
32  BlockAlign = BytesPerFrame = (BitsPerSample * Channels)/8 
34  BitsPerSample (e.g.: 8)

[Data chunk]
36 "data"
40 Data chunk size - 8
44 Frame_0: sample_channel0[0], sample_channel1[0], ..
   Frame_1: sample_channel0[1], sample_channel1[1], ..
   Frame_n: sample_channel0[n], sample_channel1[n], ..
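
The layout above can be read field by field. What follows is only a minimal sketch, assuming the canonical 44-byte header shown in the table (a "fmt " chunk of 16 bytes immediately followed by the "data" chunk) and little-endian integers, which is how RIFF files store them. The class name WavHeaderSketch and the Header holder are made up for this illustration; real files may carry extra chunks that this sketch rejects.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WavHeaderSketch {
  // Hypothetical holder for the fields listed in the table above.
  static final class Header {
    int formatTag, channels, sampleRate, avgBytesPerSec;
    int blockAlign, bitsPerSample, dataSize;
  }

  // Reads the 4-character chunk mark stored at the given offset.
  static String tag(byte[] h, int offset) {
    return new String(h, offset, 4, StandardCharsets.US_ASCII);
  }

  // Parses the canonical 44-byte header (assumption: no extra chunks).
  static Header parse(byte[] h) {
    if (h.length < 44) {
      throw new IllegalArgumentException("need at least 44 bytes");
    }
    if (!tag(h, 0).equals("RIFF") || !tag(h, 8).equals("WAVE")
        || !tag(h, 12).equals("fmt ") || !tag(h, 36).equals("data")) {
      throw new IllegalArgumentException("not a canonical 44-byte wave header");
    }
    // RIFF stores multi-byte integers in little-endian order.
    ByteBuffer buf = ByteBuffer.wrap(h).order(ByteOrder.LITTLE_ENDIAN);
    Header r = new Header();
    r.formatTag = buf.getShort(20) & 0xFFFF;
    r.channels = buf.getShort(22) & 0xFFFF;
    r.sampleRate = buf.getInt(24);
    r.avgBytesPerSec = buf.getInt(28);
    r.blockAlign = buf.getShort(32) & 0xFFFF;
    r.bitsPerSample = buf.getShort(34) & 0xFFFF;
    r.dataSize = buf.getInt(40);
    return r;
  }

  public static void main(String[] args) throws IOException {
    byte[] h = new byte[44];
    try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
      in.readFully(h); // fails with EOFException if the file is too short
    }
    Header r = parse(h);
    System.out.println(r.channels + " channel(s), " + r.sampleRate + " Hz, "
        + r.bitsPerSample + " bits, " + r.dataSize + " data bytes");
  }
}
```

Run it with the path of a wave file as the only argument. Checking the four chunk marks first is a cheap way to notice files that do not follow the simple layout before trusting the offsets.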

If you don't believe me, you can check it yourself:

$ arecord -f U8 -c 1 -r 8000 -d 5 pepe.wav
Recording WAVE 'pepe.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
$ ls -l pepe.wav 
-rw-r--r-- 1 xxx xxx 40044 Mar  2 20:14 pepe.wav
$ od -A d -N 44 -w4 -v -t a pepe.wav 
0000000   R   I   F   F
0000008   W   A   V   E
0000012   f   m   t  sp
0000036   d   a   t   a
$ od -A d -N 44 -w4 -v -t u2 pepe.wav 
0000000 18770 17990   RIFF
0000004 40036     0   (file size (40044) - 8)
0000008 16727 17750   WAVE
0000012 28006  8308   fmt
0000016    16     0   16 bytes of format data
0000020     1     1   PCM (format tag 1), 1 channel
0000024  8000     0   8000 samples/sec
0000028  8000     0   8000 bytes/sec
0000032     1     8   8 bits/sample (1 byte/frame)
0000036 24932 24948   data
0000040 40000     0   40000 samples
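
The numbers in the dump are consistent with the formulas in the layout table. As a quick sanity check, here is a small sketch that recomputes them; the class and method names are invented for the illustration, and the constants are the ones printed by ls and od above.

```java
public class DumpCheck {
  // BlockAlign = BytesPerFrame = (BitsPerSample * Channels) / 8
  static int blockAlign(int bitsPerSample, int channels) {
    return bitsPerSample * channels / 8;
  }

  // AvgBytesPerSec = Sample Rate * BytesPerFrame
  static int avgBytesPerSec(int sampleRate, int blockAlign) {
    return sampleRate * blockAlign;
  }

  // Duration of the recording in seconds.
  static double seconds(int dataBytes, int avgBytesPerSec) {
    return (double) dataBytes / avgBytesPerSec;
  }

  public static void main(String[] args) {
    int fileSize = 40044;           // from ls -l
    int dataBytes = fileSize - 44;  // everything after the 44-byte header
    int align = blockAlign(8, 1);   // 8 bits, mono
    int rate = avgBytesPerSec(8000, align);
    System.out.println("RIFF size field: " + (fileSize - 8));
    System.out.println("data bytes:      " + dataBytes);
    System.out.println("bytes/frame:     " + align);
    System.out.println("bytes/sec:       " + rate);
    System.out.println("seconds:         " + seconds(dataBytes, rate));
  }
}
```

The results (40036, 40000, 1, 8000 and 5 seconds) match the dump and the -d 5 option passed to arecord.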

Now here is the point. Suppose that we just want to play with a wave file whose format we already know. Then we can skip the first 44 bytes (the header) and go straight to the data. In particular, if the format of the wave file is "mono 8bit", every byte after the header is one unsigned sample and things get as simple as this:

$ vi
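import java.io.BufferedInputStream;
import java.io.FileInputStream;

public class pepe {
  public static void main(String[] args) {
    int bytes, cursor, unsigned;
    try {
      FileInputStream s = new FileInputStream("./pepe.wav");
      BufferedInputStream b = new BufferedInputStream(s);
      byte[] data = new byte[128];
      b.skip(44); // skip the header (assumes the 44-byte layout shown above)
      cursor = 0;
      // read() fills data[] and returns the number of bytes read, or -1 at end of file
      while ((bytes = b.read(data)) > 0) {
        // do something
        for(int i=0; i<bytes; i++) {
          unsigned = data[i] & 0xFF; // Java bytes are signed, so mask to get 0..255
          System.out.println(cursor + " " + unsigned);
          cursor++; // sample number, used as the X value in the plot
        }
      }
      b.close();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}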
$ javac
$ java pepe > data.txt
$ gnuplot
gnuplot> set size ratio 0.3
gnuplot> plot "data.txt" with lines

Hope that's useful for you.


Salman Khan said...

The picture and the code do not add up. Can you please explain in which variable the data for the line is stored? If it is not stored, how can we access the data of the line?

Hope that you will respond.


Sangorrin said...

Hi sam, not sure I understood you. What do you mean by "the data of the line"?

Salman Khan said...

Please accept my apologies for not being able to explain the issue. First, let me say that your work (this blog) is highly commendable and has pointed me in the right direction on many issues. You have presented the header and data chunk in a very methodical way which is easy to understand. Having said that, I have searched extensively for what you have actually demonstrated graphically using pepe.wav. I am trying to extract a valid audio signal (16 bit, mono (1 channel), duration: 2 sec, sampling rate 44100). This amounts to 44100 x 2 sec = 88200 samples. I need to collect/extract these samples and graph them. I have really reached a dead end. What are these samples? How are they computed?

Now let me point towards your graph. In your graph, the red line represents the audio signal and the y-axis represents the sampling (rate) scale. The x-axis represents the signal data. Each signal is identified as (x,y), i.e., the y-axis (byte, int, float etc.) while the x-axis is the sampling rate from 0 to 40,000. Now in your code the buffer is declared as
byte[] data = new byte[128]; so the length is 128 and the iteration only runs as far as 128 bytes. How have you been able to transform the 128 bytes into 8000 audio samples and plot them?
In summary, I don't need the graph, but I need the RED LINE signal as an array. Can you help me with how to do it?


Sangorrin said...

Hi Sam,

Ok, I think I understood you this time. The samples are just 8 or 16 bit numbers that digitally represent the audio wave's amplitude. So in the example the Y axis consists of numbers from 0 to 255 (the wave amplitude). The X axis represents time. The distance between two consecutive samples is the sample period, in this case 1/8000 s.
About the 128 bytes: if you notice, I am printing the samples' values to the standard output inside a while loop, so the 128-byte buffer is refilled and printed over and over until the file ends. Then, I redirect the stdout to a text file (pepe.txt) which contains all of the samples. If you want the red line as an array, change the size of the data buffer so that you can store all the samples inside.
Do you understand now?
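
Following the suggestion above, one way to get the whole red line as an array is to load the file in one go instead of enlarging the buffer by hand. This is just a sketch under the same assumptions as the article (44-byte header, mono, unsigned 8 bit); the class name AllSamples and the method samples8 are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class AllSamples {
  // Assumption: canonical 44-byte header followed by unsigned 8-bit mono samples.
  static final int HEADER_SIZE = 44;

  // Returns every sample as an int in the range 0..255.
  static int[] samples8(byte[] file) {
    int n = Math.max(0, file.length - HEADER_SIZE);
    int[] out = new int[n];
    for (int i = 0; i < n; i++) {
      out[i] = file[HEADER_SIZE + i] & 0xFF; // mask because Java bytes are signed
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    byte[] file = Files.readAllBytes(Paths.get("./pepe.wav"));
    int[] samples = samples8(file);
    System.out.println(samples.length + " samples");
  }
}
```

For the 40044-byte pepe.wav recorded in the article this should give an array of 40000 values, the same numbers that end up in the second column of data.txt.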


Sangorrin said...

fix: data.txt, not pepe.txt

Salman Khan said...

Great, I understand the y-axis. However, I have trouble understanding "The distance between two consecutive samples is the sample period, in this case 1/8000s." I am unable to replicate your results (the graph). I have been able to obtain the samples

as I read it from the info given above,
0000024 8000 0 8000 samples/sec.

Note: I am using 44100 samples/sec. The total is 44100 x 2 sec = 88200 samples. The values that your (similar) code gives me are in a double[] with only 1490 samples. I don't understand how the 88200 samples are supposed to be retrieved, where this information is stored and how to access it. For example, in Matlab the audioread() function generates the 88200 samples, but in Java it appears to be a nightmare. Any help will be really appreciated.


Sangorrin said...

The numbers on the X axis are just the sample's number. So the first sample is the one that appears at x=0 and so on. You can multiply these numbers by the sampling period if you want the X axis to appear in seconds.

The data is stored in the variable data[], and the function that reads it from the file is read(). The variable bytes gives you the number of bytes read in each call (up to the size of data[]). Also, if you use 16 bits you need to modify the code, because each sample then takes two bytes. Maybe you can try to run it with 8000 Hz/8 bit/mono as in the example first and see if it works. Thanks.
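
For the 16 bit case, a possible modification is sketched below. It assumes mono PCM with the 44-byte header from the article; in wave files 16 bit samples are signed and stored low byte first. The names Pcm16Sketch, decode16 and timeOf are made up for the illustration, and the file path is taken from the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Pcm16Sketch {
  // Decodes signed 16-bit little-endian samples starting at the given offset.
  static short[] decode16(byte[] raw, int offset) {
    int n = Math.max(0, (raw.length - offset) / 2);
    short[] out = new short[n];
    for (int i = 0; i < n; i++) {
      int lo = raw[offset + 2 * i] & 0xFF; // low byte, unsigned
      int hi = raw[offset + 2 * i + 1];    // high byte, keeps the sign
      out[i] = (short) ((hi << 8) | lo);
    }
    return out;
  }

  // Time in seconds of sample number `index` (index times the sample period).
  static double timeOf(int index, int sampleRate) {
    return (double) index / sampleRate;
  }

  public static void main(String[] args) throws IOException {
    byte[] file = Files.readAllBytes(Paths.get(args[0]));
    short[] samples = decode16(file, 44); // assumption: 44-byte header
    int sampleRate = 44100;               // assumption: matches the file
    for (int i = 0; i < samples.length; i++) {
      System.out.println(timeOf(i, sampleRate) + " " + samples[i]);
    }
  }
}
```

With a 2 second, 44100 Hz, mono, 16 bit file the data part is 176400 bytes, so decode16 should return 88200 samples, and the first column of the output is the X axis in seconds.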