
Node.js Streams and File Handling: Process Large Files Without Running Out of Memory

Learn how Node.js streams work and when to use them. Covers readable, writable, transform streams, piping, backpressure, and practical examples for file and HTTP processing.

Yusuf Seyitoğlu · March 11, 2026 · 9 min read


One of the most important performance concepts in Node.js is streams. When you need to process a 2GB CSV file, read a large HTTP response, or transform data on the fly, streams let you handle it with constant memory usage instead of loading everything into RAM at once.

The Problem with Non-Streaming Approaches

```javascript
import fs from "fs/promises";

// Bad for large files -- loads entire file into memory
const data = await fs.readFile("huge-file.csv", "utf8");
const lines = data.split("\n");
// If the file is 2GB, you need 2GB+ of RAM
```

For small files this is fine. For large files it crashes your process or causes severe memory pressure.

What Are Streams?

Streams process data in chunks as it arrives — you start processing before the entire input is available. Node.js has four stream types:

  • Readable — source of data (file read, HTTP request body)
  • Writable — destination for data (file write, HTTP response)
  • Duplex — both readable and writable (TCP socket)
  • Transform — duplex that modifies data (gzip, encryption)
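A quick way to see the types side by side is with the built-in constructors. The sketch below (names like `source`, `tunnel`, and `sink` are illustrative) wires a Readable through a PassThrough, the simplest Duplex/Transform, into a custom Writable:

```javascript
import { Readable, Writable, PassThrough } from "stream";
import { pipeline } from "stream/promises";

// Readable: wrap any iterable as a data source
const source = Readable.from(["alpha", "beta", "gamma"]);

// PassThrough is the simplest Duplex/Transform: data flows through unchanged
const tunnel = new PassThrough();

// Writable: a sink that collects what arrives
const received = [];
const sink = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    callback();
  },
});

await pipeline(source, tunnel, sink);
console.log(received.join("")); // alphabetagamma
```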

Reading Files with Streams

```javascript
import fs from "fs";
import readline from "readline";

// Process a large CSV line by line -- constant memory usage
async function processLargeCSV(filePath) {
  const fileStream = fs.createReadStream(filePath, { encoding: "utf8" });
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity,
  });

  let lineCount = 0;
  let headerRow = null;

  for await (const line of rl) {
    if (lineCount === 0) {
      headerRow = line.split(",");
    } else {
      const values = line.split(",");
      const record = Object.fromEntries(
        headerRow.map((col, i) => [col.trim(), values[i]?.trim()])
      );
      await processRecord(record); // processRecord: your per-row handler
    }
    lineCount++;
  }

  console.log(`Processed ${lineCount - 1} records`);
}
```

This processes a 10GB file with the same ~50MB memory footprint as a 10KB file.

Piping Streams

The .pipe() method connects a readable stream to a writable stream — data flows through automatically:

```javascript
import fs from "fs";
import zlib from "zlib";

// Compress a large file -- no intermediate buffering
function compressFile(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(inputPath);
    const gzip = zlib.createGzip();
    const writeStream = fs.createWriteStream(outputPath);

    readStream
      .pipe(gzip)        // compress
      .pipe(writeStream) // write to disk
      .on("finish", resolve)
      .on("error", reject);
  });
}

await compressFile("large-log.txt", "large-log.txt.gz");
```

stream.pipeline (preferred over .pipe)

.pipe() does not propagate errors well. Use stream.pipeline for production code:

```javascript
import { pipeline } from "stream/promises";
import fs from "fs";
import zlib from "zlib";
import crypto from "crypto";

// key and iv would come from your key management in real code
const key = crypto.randomBytes(32); // 256-bit key for aes-256-gcm
const iv = crypto.randomBytes(12);  // 96-bit IV, recommended for GCM

// Compress and encrypt in one pipeline
await pipeline(
  fs.createReadStream("data.csv"),
  zlib.createGzip(),
  crypto.createCipheriv("aes-256-gcm", key, iv),
  fs.createWriteStream("data.csv.gz.enc")
);
// Errors from any stage are propagated and cleanup happens automatically
```

Transform Streams

Transform streams modify data as it passes through:

```javascript
import { Transform } from "stream";

// Custom transform: convert CSV rows to JSON objects
class CSVToJSON extends Transform {
  constructor(options = {}) {
    super({ ...options, objectMode: true });
    this.headers = null;
    this.buffer = "";
  }

  _transform(chunk, encoding, callback) {
    this.buffer += chunk.toString();
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // keep incomplete last line

    for (const line of lines) {
      if (!line.trim()) continue;
      if (!this.headers) {
        this.headers = line.split(",").map(h => h.trim());
      } else {
        const values = line.split(",");
        const obj = Object.fromEntries(
          this.headers.map((h, i) => [h, values[i]?.trim() ?? ""])
        );
        this.push(obj);
      }
    }
    callback();
  }

  _flush(callback) {
    if (this.buffer.trim() && this.headers) {
      const values = this.buffer.split(",");
      const obj = Object.fromEntries(
        this.headers.map((h, i) => [h, values[i]?.trim() ?? ""])
      );
      this.push(obj);
    }
    callback();
  }
}
```

Writable Streams

```javascript
import fs from "fs";
import { Writable } from "stream";
import { pipeline } from "stream/promises";

class DatabaseWriter extends Writable {
  constructor(db, options = {}) {
    super({ ...options, objectMode: true });
    this.db = db;
    this.batch = [];
    this.batchSize = 100;
  }

  async _write(record, encoding, callback) {
    this.batch.push(record);
    if (this.batch.length >= this.batchSize) {
      try {
        await this.db.batchInsert("records", this.batch);
        this.batch = [];
        callback();
      } catch (err) {
        callback(err);
      }
    } else {
      callback();
    }
  }

  async _final(callback) {
    if (this.batch.length > 0) {
      try {
        await this.db.batchInsert("records", this.batch);
        callback();
      } catch (err) {
        callback(err);
      }
    } else {
      callback();
    }
  }
}

// Use the pipeline (CSVToJSON is the transform from the previous section)
await pipeline(
  fs.createReadStream("data.csv"),
  new CSVToJSON(),
  new DatabaseWriter(db)
);
```

Backpressure

Backpressure happens when a writable stream cannot keep up with the readable stream. Without handling it, data accumulates in memory. Node.js streams handle this automatically when you use .pipe() or pipeline() — the readable pauses when the writable buffer is full.

If manually controlling flow:

```javascript
import fs from "fs";

const readable = fs.createReadStream("large.txt");
const writable = fs.createWriteStream("output.txt");

readable.on("data", (chunk) => {
  const canContinue = writable.write(chunk);
  if (!canContinue) {
    readable.pause(); // slow down the source
  }
});

writable.on("drain", () => {
  readable.resume(); // writable caught up, resume reading
});
```

Streaming HTTP Responses

Streams are natural for HTTP — start sending the response before you have all the data:

```javascript
import http from "http";
import fs from "fs";
import zlib from "zlib";
import { pipeline } from "stream/promises";

const server = http.createServer(async (req, res) => {
  if (req.url === "/download") {
    res.writeHead(200, {
      "Content-Type": "text/csv",
      "Content-Encoding": "gzip",
      "Content-Disposition": "attachment; filename=report.csv",
    });
    await pipeline(
      fs.createReadStream("large-report.csv"),
      zlib.createGzip(),
      res
    );
  }
});
```

The client starts receiving data immediately — no waiting for the full file to be read.
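A minimal, self-contained sketch of this behavior (the chunk contents and the in-process server are illustrative, using an ephemeral port):

```javascript
import http from "http";

// A server that streams its response in pieces instead of buffering it
const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.write("first chunk\n");
  setTimeout(() => {
    res.write("second chunk\n");
    res.end("done\n");
  }, 20);
});

await new Promise((resolve) => server.listen(0, resolve));
const { port } = server.address();

// The client consumes chunks as they arrive, before the response ends
const body = await new Promise((resolve, reject) => {
  http.get({ port, path: "/" }, async (res) => {
    res.setEncoding("utf8");
    let received = "";
    for await (const chunk of res) {
      received += chunk; // each chunk is usable as soon as it arrives
    }
    resolve(received);
  }).on("error", reject);
});

server.close();
console.log(body);
```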

Async Iteration over Streams

Modern Node.js supports for await...of on readable streams:

```javascript
import fs from "fs";

async function countLines(filePath) {
  const stream = fs.createReadStream(filePath, { encoding: "utf8" });
  let count = 0;
  let remainder = "";

  for await (const chunk of stream) {
    const text = remainder + chunk;
    const lines = text.split("\n");
    remainder = lines.pop();
    count += lines.length;
  }

  if (remainder) count++;
  return count;
}
```

Common Interview Questions

Q: What is the difference between fs.readFile and fs.createReadStream?

fs.readFile loads the entire file into a Buffer in memory before your callback runs. fs.createReadStream emits chunks of data as they are read from disk, keeping memory usage constant. Use readFile for small files (configuration, templates); use streams for large files or when you want to start processing before the full file is available.
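To make the contrast concrete, here is a small sketch (the temp file path and contents are illustrative) showing that both APIs yield the same bytes, just delivered differently:

```javascript
import fs from "fs";
import fsp from "fs/promises";
import os from "os";
import path from "path";

const file = path.join(os.tmpdir(), "readfile-vs-stream.txt");
await fsp.writeFile(file, "line 1\nline 2\nline 3\n");

// fs.readFile: one result, available only after the whole file is read
const whole = await fsp.readFile(file, "utf8");

// fs.createReadStream: chunks arrive incrementally; concatenating them
// reproduces the same content
let streamed = "";
for await (const chunk of fs.createReadStream(file, { encoding: "utf8" })) {
  streamed += chunk;
}

console.log(whole === streamed); // true
```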

Q: What is backpressure in Node.js streams?

Backpressure is the mechanism that prevents a fast producer from overwhelming a slow consumer. When a writable stream's internal buffer fills up, write() returns false, signaling the producer to pause. When the buffer drains, the drain event fires and the producer resumes. pipeline() handles this automatically.
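You can observe this directly with a PassThrough and a deliberately tiny highWaterMark (4 bytes here, chosen only for the demo):

```javascript
import { PassThrough } from "stream";
import { once } from "events";

// A tiny buffer makes backpressure easy to trigger
const stream = new PassThrough({ highWaterMark: 4 });

const first = stream.write("ab");    // 2 bytes fit in the 4-byte buffer
const second = stream.write("cdef"); // this write overflows it
console.log(first, second); // true false

// Consuming the readable side empties the buffer and fires 'drain'
const drained = once(stream, "drain");
stream.resume();
await drained;
console.log("drain fired -- safe to write again");
```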

Q: Why use stream.pipeline instead of .pipe()?

.pipe() does not forward errors from upstream to downstream and does not clean up streams on error β€” you can end up with resource leaks. stream.pipeline propagates errors across all stages and calls destroy() on all streams when any stage fails.
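A short sketch of the difference in practice: a transform that fails mid-stream, with pipeline() surfacing the error and destroying the sink (the error message is illustrative):

```javascript
import { pipeline } from "stream/promises";
import { Readable, Transform, Writable } from "stream";

// A transform that fails on the first chunk
const failing = new Transform({
  transform(chunk, encoding, callback) {
    callback(new Error("boom"));
  },
});

const sink = new Writable({
  write(chunk, encoding, callback) { callback(); },
});

let caught;
try {
  await pipeline(Readable.from(["a", "b"]), failing, sink);
} catch (err) {
  caught = err; // the transform's error surfaces here
}

console.log(caught.message); // boom
console.log(sink.destroyed); // true -- pipeline cleaned up the sink
```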

Practice Node.js on Froquiz

Node.js internals including streams, the event loop, and async patterns are tested in backend interviews. Test your Node.js knowledge on Froquiz across all difficulty levels.

Summary

  • Streams process data in chunks — constant memory regardless of input size
  • Four types: Readable, Writable, Duplex, Transform
  • Use stream.pipeline() instead of .pipe() — it handles errors and cleanup
  • Transform streams modify data as it passes through — ideal for format conversion
  • Backpressure prevents memory buildup when consumers are slower than producers
  • for await...of on readable streams is the cleanest modern API
  • Stream HTTP responses to start sending data before you have the full content
