Taking Baby Steps with Node.js – Pumping Data Between Streams

April 6th, 2011

Here are the links to the previous installments:

  1. Introduction
  2. Threads vs. Events
  3. Using Non-Standard Modules
  4. Debugging with node-inspector
  5. CommonJS and Creating Custom Modules
  6. Node Version Management with n
  7. Implementing Events
  8. BDD Style Unit Tests with Jasmine-Node Sprinkled With Some Should
  9. “node_modules” Folders

It’s the 10th blog post already in this series on Node.js! In this post we’ll talk about a fairly common scenario when developing applications with Node.js, namely reading data from one stream and sending it to another stream. Suppose we want to develop a simple web application that reads a particular file from disk and sends it to the browser. The following code shows a very simple and naïve implementation that makes this happen.

var http = require('http'),
    fileSystem = require('fs'),
    path = require('path');

http.createServer(function(request, response) {
    var filePath = path.join(__dirname, 'AstronomyCast Ep. 216 - Archaeoastronomy.mp3');
    var stat = fileSystem.statSync(filePath);
    
    response.writeHead(200, {
        'Content-Type': 'audio/mpeg', 
        'Content-Length': stat.size
    });
    
    var readStream = fileSystem.createReadStream(filePath);
    readStream.on('data', function(data) {
        response.write(data);
    });
    
    readStream.on('end', function() {
        response.end();        
    });
})
.listen(2000);

Here we create a read stream for an mp3 file and write every chunk of data it emits to the response stream. When we point our browser to http://localhost:2000, it pretty much behaves as we expect. The mp3 file either starts playing or the browser asks whether the file should be downloaded.

But as I mentioned earlier, this is a pretty naïve implementation. The big issue with this approach is that reading the data from disk through the read stream is usually faster than streaming the data through the HTTP response. So when the data of the mp3 file is read too fast, the write stream is not able to flush the data it is given in a timely manner, and it starts buffering this data in memory. For this simple example this is not really a big deal, but if we want to scale this application to handle lots and lots of requests, then making Node.js hold on to all that buffered data can become an intolerable burden for the application.
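To see this buffering in isolation, here is a minimal sketch that doesn't need an HTTP server or an mp3 file. It assumes a newer Node.js release than the one this post was written for (one that ships stream.Writable and Buffer.from). The slowWriter stream and its tiny 4-byte threshold are made up purely for the demonstration.

```javascript
var stream = require('stream');

// Hypothetical slow destination: it accepts each chunk but only
// acknowledges it on a later tick, so written data has to wait in a buffer.
var slowWriter = new stream.Writable({
    highWaterMark: 4, // tiny threshold in bytes, chosen only for this demo
    write: function(chunk, encoding, callback) {
        setImmediate(callback);
    }
});

// write() reports whether the stream is still below its threshold
var small = slowWriter.write(Buffer.from('ab'));       // 2 bytes pending
var large = slowWriter.write(Buffer.from('cdefghij')); // 10 bytes pending

console.log(small); // true: still below the threshold
console.log(large); // false: the stream is buffering and wants us to wait

slowWriter.on('drain', function() {
    // Emitted once the pending data has been handed off
    console.log('drained');
});
```

The naïve server above simply ignores that return value, which is why nothing stops the read stream from piling more and more data onto the response.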

So, the way to fix this problem is to check whether all the data gets flushed when we send it to the write stream. If this data is being buffered, then we need to pause the read stream. As soon as the buffers are emptied and the write stream gets drained, we can safely resume the data fetching process from the read stream.

var http = require('http'),
    fileSystem = require('fs'),
    path = require('path');

http.createServer(function(request, response) {
    var filePath = path.join(__dirname, 'AstronomyCast Ep. 216 - Archaeoastronomy.mp3');
    var stat = fileSystem.statSync(filePath);
    
    response.writeHead(200, {
        'Content-Type': 'audio/mpeg', 
        'Content-Length': stat.size
    });
    
    var readStream = fileSystem.createReadStream(filePath);
    readStream.on('data', function(data) {
        var flushed = response.write(data);
        // Pause the read stream when the write stream gets saturated
        if(!flushed)
            readStream.pause();
    });
    
    response.on('drain', function() {
        // Resume the read stream when the write stream gets hungry 
        readStream.resume();    
    });
    
    readStream.on('end', function() {
        response.end();        
    });
})
.listen(2000);

This example illustrates a fairly common pattern of throttling data between a read stream and a write stream. This pattern is generally referred to as the “pump pattern”. Because it’s so commonly used, Node.js provides a helper function that takes care of all the goo required to correctly implement this behavior.

var http = require('http'),
    fileSystem = require('fs'),
    path = require('path'),
    util = require('util');

http.createServer(function(request, response) {
    var filePath = path.join(__dirname, 'AstronomyCast Ep. 216 - Archaeoastronomy.mp3');
    var stat = fileSystem.statSync(filePath);
    
    response.writeHead(200, {
        'Content-Type': 'audio/mpeg', 
        'Content-Length': stat.size
    });
    
    var readStream = fileSystem.createReadStream(filePath);
    // We replaced all the event handlers with a simple call to util.pump()
    util.pump(readStream, response);
})
.listen(2000);

Using this utility function certainly cleans up the code and makes it easier to see what is going on, don’t you think? If you’re curious, you might also want to check out the implementation of the util.pump() function.
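If digging through the Node.js source isn’t your thing, the core idea can be sketched in a few lines. Note that this is a simplified, hypothetical helper (I named it pumpSketch) and not the actual util.pump() implementation. It also assumes a newer Node.js release in which writable streams emit a 'finish' event and stream.PassThrough and stream.Writable are available.

```javascript
var stream = require('stream');

// Simplified, hypothetical stand-in for a pump helper: copies everything
// from `source` to `destination` while respecting back-pressure.
function pumpSketch(source, destination, callback) {
    var finished = false;

    // Make sure the callback fires at most once
    function done(error) {
        if (finished) return;
        finished = true;
        if (callback) callback(error);
    }

    source.on('data', function(chunk) {
        // Pause reading when the destination starts buffering
        if (destination.write(chunk) === false) source.pause();
    });

    destination.on('drain', function() {
        // The destination caught up, so reading can continue
        source.resume();
    });

    source.on('end', function() {
        destination.end();
    });

    destination.on('finish', function() { done(); });
    source.on('error', done);
    destination.on('error', done);
}

// Usage with in-memory streams (stand-ins for the file and the response)
var source = new stream.PassThrough();
var received = [];
var destination = new stream.Writable({
    write: function(chunk, encoding, callback) {
        received.push(chunk.toString());
        callback();
    }
});

pumpSketch(source, destination, function(error) {
    console.log(error ? 'failed' : 'done: ' + received.join(''));
});

source.write('hello ');
source.end('world');
```

In later Node.js releases util.pump() was deprecated in favor of the built-in pipe() method, so on a newer version the call in our server would read readStream.pipe(response) instead.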

So get that data flowing already :-).

  • http://tanepiper.com Tane Piper

    Very nice, I did a basic version with express here: https://gist.github.com/f9f1b5b39c74c7dd12b9 – this uses res.sendfile which already handles the stream and pumping stuff.

    One thing I’ve noticed is that the audio tag in HTML still has issues with this: the .mp3 files show as status 200, so I’m getting a valid response, but nothing will play. Clicking on the links in Firefox works fine, though. Chrome also has issues (untested in others).

  • http://tanepiper.com Tane Piper

    A little update to this: I quickly looked up MDC and found that for the audio tag player in Firefox and Chrome you have to use the OGG format instead of MP3. After a little conversion, this now outputs players directly to the page.

  • http://usb3gvn.com USB 3G

    Nice, that’s helpful for me!