
Node.js: Streams for Big Files

If you are working with Node.js and need to handle large files, you may run into performance problems if you try to read the entire file into memory at once. One solution to this problem is to use streams.

What is a stream?

A stream is a way of processing data in a Node.js application. It allows you to read or write data in smaller chunks instead of loading everything into memory at once. Streams can be used to process data from a variety of sources, including files, network connections, and even other processes.
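
Because a network connection is itself a stream, a large file can, for example, be sent straight into an HTTP response without ever loading the whole file into memory. A minimal sketch, assuming a placeholder file name of bigfile.txt and port 3000:

const fs = require('fs');
const http = require('http');

// Stream a potentially huge file straight to the client.
// 'bigfile.txt' and port 3000 are placeholder values.
http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  fs.createReadStream('bigfile.txt').pipe(res);
}).listen(3000);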

How to use streams in Node.js?

In Node.js, there are four types of streams:

  • Readable: used for reading data (e.g. fs.createReadStream())
  • Writable: used for writing data (e.g. fs.createWriteStream())
  • Duplex: used for both reading and writing data (e.g. a TCP socket)
  • Transform: a Duplex stream that modifies data as it passes through (e.g. zlib compression)

To use a stream, you need to create an instance of the appropriate type of stream and then pipe the data from the source (e.g. a file) to the destination (e.g. a network connection) using the pipe() method.
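
Duplex and Transform streams slot into the same pattern. As a rough sketch, the Transform stream below upper-cases text while it is piped from one file to another; input.txt and output.txt are placeholder file names:

const fs = require('fs');
const { Transform } = require('stream');

// A Transform stream that upper-cases each chunk as it passes through.
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  }
});

// 'input.txt' and 'output.txt' are placeholder file names.
fs.createReadStream('input.txt')
  .pipe(upperCase)
  .pipe(fs.createWriteStream('output.txt'));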

Example code:


const fs = require('fs');

// Create a readable stream for the source file and a writable stream for the destination.
const readStream = fs.createReadStream('example.txt');
const writeStream = fs.createWriteStream('output.txt');

// Copy the data from the source to the destination in chunks.
readStream.pipe(writeStream);

// 'finish' fires on the write stream once all data has been flushed to output.txt.
writeStream.on('finish', () => {
  console.log('File has been processed');
});

In this example, we use the createReadStream() method from the fs module to read data from a file called example.txt, and createWriteStream() to create a write stream that writes to a file called output.txt. We then pipe the read stream into the write stream and listen for the write stream's 'finish' event, which fires once all the data has been flushed to output.txt.
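
Note that pipe() on its own does not forward errors between the two streams. If you need error handling, Node's built-in stream.pipeline() is a common choice; a minimal sketch using the same file names:

const fs = require('fs');
const { pipeline } = require('stream');

// pipeline() wires the streams together and reports an error from either side.
pipeline(
  fs.createReadStream('example.txt'),
  fs.createWriteStream('output.txt'),
  err => {
    if (err) {
      console.error('Copy failed:', err);
    } else {
      console.log('File has been processed');
    }
  }
);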

Using streams to process big files

One of the main advantages of using streams is that they allow you to process large files without running out of memory. Instead of trying to read the entire file into memory at once, you can read it in smaller chunks and process each chunk as it comes in.

Example code:


const fs = require('fs');

const readStream = fs.createReadStream('bigfile.txt', {
  highWaterMark: 1024 * 1024 // read up to 1 MB per chunk
});

let bytes = 0;
let chunks = 0;

readStream.on('data', chunk => {
  // Process each chunk as it arrives instead of buffering the whole file.
  bytes += chunk.length;
  chunks += 1;
});

readStream.on('end', () => {
  console.log(`Read ${bytes} bytes in ${chunks} chunks`);
});

In this example, we use the createReadStream() method again, but this time we pass a highWaterMark option of 1 MB. This sets the size of the stream's internal buffer, so each chunk handed to us is at most 1 MB and is processed before the next one is read.

We then listen for the 'data' event, which is triggered each time a chunk of data is read. Instead of concatenating the chunks into one big string (which would pull the entire file back into memory and defeat the purpose of streaming), we keep running totals of the bytes and chunks seen. Finally, we listen for the 'end' event, which is triggered when all of the data has been read, and log the totals to the console.
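
If you want to process a big text file line by line rather than in raw chunks, Node's built-in readline module can wrap the read stream. A minimal sketch, again assuming a placeholder file called bigfile.txt:

const fs = require('fs');
const readline = require('readline');

// Wrap the read stream so data arrives one line at a time.
const rl = readline.createInterface({
  input: fs.createReadStream('bigfile.txt'),
  crlfDelay: Infinity // treat \r\n as a single line break
});

let lineCount = 0;

rl.on('line', line => {
  // Each 'line' event delivers one line of the file without the trailing newline.
  lineCount += 1;
});

rl.on('close', () => {
  console.log(`Processed ${lineCount} lines`);
});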

Conclusion

Streams are a powerful feature of Node.js that allow you to process data in smaller chunks and avoid running out of memory when working with large files. By using streams, you can improve the performance and scalability of your Node.js applications.
