Connection Pooling Explained: Build a Pool in 30 Lines

Nothing is magic — part of a series on building infrastructure primitives from scratch.

I used to think request queuing and connection pooling were deep infrastructure magic — something libraries did that I could never see. Then I built each piece myself in ~30 lines of TypeScript, and the whole thing turned out to be arrays, counters, and one key insight about what a "connection" actually is.

This post builds the idea bottom-up. By the end, connection pooling should feel obvious — not memorized, but inevitable.

Building Block 1: A connection is just a held reference

When a client calls fetch(), the kernel opens a TCP connection identified by a unique 4-tuple: (client IP, client port, server IP, server port). Ten simultaneous requests from the same laptop use up to ten ephemeral client ports — ten distinct sockets. (Browsers cap this at ~6 per origin and queue the rest — yet another queue. Sequential requests reuse one connection via keep-alive, and HTTP/2 multiplexes everything over a single socket.)

When Node accepts a connection and parses a request from it, it hands your callback a (req, res) pair. Here's the insight that unlocked everything for me:

res is not "response data". It is a handle to one specific open socket.

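There's no ID matching, no "which user does this response belong to" lookup. The correlation is structural: res is the write-end of that exact connection. As long as you hold the reference, you can reply later, even minutes later, and the bytes go down the right socket. (Not indefinitely: the client or a proxy in between may give up first, and Node closes the socket if an inactivity timeout is set via server.timeout. Recent Node versions leave that timeout disabled by default; the 5-minute default, server.requestTimeout, limits how long the client may take to send its request, not how long you may take to answer.)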

Which means: a connection is something you can store. Put res in an array and answer when you're ready. That single trick is what queuing is.
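
A minimal sketch of that trick, assuming Node's built-in http module. The names createParkingServer and releaseAll are hypothetical, chosen only for this illustration:

```typescript
import * as http from "node:http";

// Parked responses: each entry is a live handle to one open socket.
const parked: http.ServerResponse[] = [];

// Hypothetical helper: a server that never answers inline.
// It only stores the handle, so the client simply keeps waiting.
function createParkingServer(): http.Server {
  return http.createServer((_req, res) => {
    parked.push(res);
  });
}

// Answer everyone currently parked, in arrival order.
// Returns how many clients were answered.
function releaseAll(body: string): number {
  let answered = 0;
  while (parked.length > 0) {
    parked.shift()!.end(body); // bytes go down the socket this res belongs to
    answered++;
  }
  return answered;
}
```

Nothing ties a reply to a client except the reference itself: whichever res you call end() on is the socket that receives the bytes.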

Building Block 2: A queue is an array and a counter

Say I only want to process 2 requests at a time, and allow at most 5 to wait in line. I need two limits, a counter, and an array:

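import http from "node:http";

// One parked request: the open response handle plus when it arrived.
type Job = { res: http.ServerResponse; enqueuedAt: number };

const MAX_CONCURRENT = 2;      // how many requests we process at once
const MAX_QUEUE = 5;           // how many can wait before we start rejecting

let active = 0;                // how many are being processed right now
const waiting: Job[] = [];     // everyone else: this array IS the queue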

http.createServer((req, res) => {
  const job = { res, enqueuedAt: Date.now() };

  if (active < MAX_CONCURRENT) {
    handle(job);               // free slot → run now
  } else if (waiting.length < MAX_QUEUE) {
    waiting.push(job);         // ENQUEUE: res sits in memory, client keeps waiting
  } else {
    res.writeHead(503).end();  // queue full → shed load
  }
}).listen(3000);               // port 3000 is an arbitrary choice for this demo

And when a job finishes, pull the next one:

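const WORK_MS = 2000;            // how long one job takes in this demo

// Stand-in for the slow part: resolves after WORK_MS.
const doWork = () => new Promise<void>(resolve => setTimeout(resolve, WORK_MS));

async function handle(job: Job) {
  active++;
  try {
    await doWork();
    job.res.end("done");         // reply NOW: the socket was open this whole time
  } catch {
    job.res.writeHead(500).end(); // work failed: still answer, don't leave the client hanging
  } finally {
    active--;                    // free the slot even if the work threw
    drain();
  }
}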

function drain() {
  while (active < MAX_CONCURRENT && waiting.length > 0) {
    handle(waiting.shift()!);  // DEQUEUE: FIFO, next in line gets the slot
  }
}

Now run this and fire 10 concurrent requests. Assume each job takes 2 seconds of work: with only 2 slots, the server drains the queue in waves of 2, every 2 seconds:

t=0s — requests 1–2 grab the free slots and start immediately. Requests 3–7 land in waiting[]. Requests 8–10 find the queue full and get instant 503s.
t=2s — requests 1–2 finish, freeing both slots. drain() pulls requests 3–4 off the queue. They waited 2s.
t=4s — requests 3–4 finish. Requests 5–6 start. They waited 4s.
t=6s — requests 5–6 finish. Request 7 finally starts. It waited 6s.

That's where the climbing wait times come from: every request ahead of you in the queue costs you a share of a work-cycle. In general, the request at queue position p waits about ceil(p / MAX_CONCURRENT) × WORK_MS. This is the arithmetic behind every "slow server" you've ever hit — your wait grows with how deep in the queue you land, in steps of (work time ÷ worker count). It's also why "add more instances" fixes latency under load: double the workers and the same queue position waits half as long.
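
The timeline and the formula can be checked against each other with a small simulation. This is a sketch under simplifying assumptions: every request arrives at t=0, every job takes exactly the same time, and time is virtual (no real timers). The names simulateBurst and estimatedWaitMs are hypothetical:

```typescript
type Outcome = { id: number; status: "served" | "rejected"; waitedMs: number };

// Formula from the text: wait for the request at queue position p (1-based).
function estimatedWaitMs(position: number, maxConcurrent: number, workMs: number): number {
  return Math.ceil(position / maxConcurrent) * workMs;
}

// Simulates `count` requests that all arrive at t=0, using virtual time.
function simulateBurst(
  count: number,
  maxConcurrent: number,
  maxQueue: number,
  workMs: number,
): Outcome[] {
  const outcomes: Outcome[] = [];
  const waiting: number[] = [];
  let started = 0;

  // Admission: same three-way branch as the server callback.
  for (let id = 1; id <= count; id++) {
    if (started < maxConcurrent) {
      started++;
      outcomes.push({ id, status: "served", waitedMs: 0 });
    } else if (waiting.length < maxQueue) {
      waiting.push(id);
    } else {
      outcomes.push({ id, status: "rejected", waitedMs: 0 });
    }
  }

  // Draining: all jobs take workMs, so slots free up in waves of maxConcurrent.
  let now = 0;
  while (waiting.length > 0) {
    now += workMs;
    for (const id of waiting.splice(0, maxConcurrent)) {
      outcomes.push({ id, status: "served", waitedMs: now });
    }
  }
  return outcomes.sort((a, b) => a.id - b.id);
}
```

With simulateBurst(10, 2, 5, 2000), requests 3 through 7 come back with waits of 2000, 2000, 4000, 4000 and 6000 ms, which is what estimatedWaitMs gives for queue positions 1 through 5. Real traffic is messier (staggered arrivals, uneven job times), so treat the formula as an estimate rather than a guarantee.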

While a request "queues", nothing mystical happens — its res object sits in an array, its TCP socket stays open, and the client's fetch() promise simply hasn't resolved yet.

(Below this array there are more queues you don't manage: the kernel's accept queue, socket buffers, Node's event loop. Same concept at every layer — bytes parked in a buffer, waiting for capacity. A request is never "floating"; it's always parked somewhere specific.)

Building Block 3: Connections are expensive, requests are cheap

One more fact before the payoff. Opening a database connection costs real time:

TCP handshake (a network round trip)
TLS negotiation (another round trip — two on older TLS 1.2)
Authentication (password/SCRAM exchange)
The DB allocating resources

That last step is the killer, and it's a design decision from the 1980s we still live with: Postgres forks an entire OS process per connection. Process isolation was the robust concurrency primitive of that era — but it means every connection costs the database real memory, and a Postgres server comfortably holds hundreds of connections, not tens of thousands.

So connections are slow to create (easily 5–50ms, often more than the query itself) and expensive for the server to hold in bulk. Creating one per query would be like hiring and firing an employee for every task. The obvious move: pay the setup cost once, keep the connection alive, and share it.

But "keep it alive" — where? Same answer as Building Block 1. In an array. An idle DB connection is just an authenticated, open socket wrapped in a JS object, sitting in your process memory doing nothing, ready to be handed out.
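
To put rough numbers on that setup cost, here is a back-of-the-envelope sketch. The 20 ms connect time and 2 ms query time are assumptions picked for illustration, not measurements, and totalMs is a hypothetical helper that models queries running one after another:

```typescript
// Total time for `queries` sequential queries.
// reuse = false: open a fresh connection for every query.
// reuse = true:  open one connection and keep it for all queries.
function totalMs(queries: number, connectMs: number, queryMs: number, reuse: boolean): number {
  const connects = reuse ? 1 : queries; // reused: pay the setup cost once
  return connects * connectMs + queries * queryMs;
}

const perQuery = totalMs(1000, 20, 2, false); // 22000 ms, mostly handshakes
const reused = totalMs(1000, 20, 2, true);    // 2020 ms, almost all real work
```

Under these assumed numbers, connection setup dominates the per-query approach by roughly ten to one.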

The Payoff: A pool is the queue, inverted

Look at what we've built:

Request queue: work waits in an array for a free slot.
Connection pool: connections wait in an array for incoming work.

Same machinery, mirrored. And when the pool runs dry, it flips back into our request queue — callers wait in line for a connection:

const MAX_POOL_SIZE = 10;      // most connections we'll ever open

class Pool {
  private idle: Connection[] = [];                     // connections waiting for work
  private waiters: ((c: Connection) => void)[] = [];   // work waiting for connections
  private total = 0;

  async acquire(): Promise<Connection> {
    if (this.idle.length > 0)
      return this.idle.pop()!;              // reuse: no handshake, ~0ms

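    if (this.total < MAX_POOL_SIZE) {
      this.total++;                         // reserve the slot before awaiting
      try {
        return await createConnection();    // expensive path, done rarely
      } catch (err) {
        this.total--;                       // connect failed: give the slot back
        throw err;
      }
    }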

    // pool exhausted → the caller queues (Building Block 2 again!)
    return new Promise(resolve => this.waiters.push(resolve));
  }

  release(conn: Connection) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(conn);               // hand straight to next in line
    else this.idle.push(conn);              // nobody waiting → back on the shelf
  }
}

Usage:

const conn = await pool.acquire();
try {
  await conn.query("SELECT ...");
} finally {
  pool.release(conn);   // forget this and you have a "connection leak"
}
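
One way to make the leak harder to write is to wrap the acquire/try/finally dance in a helper. This is a sketch, not part of any library: withConnection and the Releasable interface are hypothetical names, and the interface only mirrors the two methods of the toy Pool above.

```typescript
// Minimal shape assumed for this sketch: anything with acquire() and release().
interface Releasable<C> {
  acquire(): Promise<C>;
  release(conn: C): void;
}

// Borrow a connection, run `work` with it, and always give it back,
// whether `work` resolves or throws.
async function withConnection<C, T>(
  pool: Releasable<C>,
  work: (conn: C) => Promise<T>,
): Promise<T> {
  const conn = await pool.acquire();
  try {
    return await work(conn);
  } finally {
    pool.release(conn);
  }
}
```

Callers then write withConnection(pool, conn => conn.query("SELECT ...")) and never touch release() directly.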

That's it. That's connection pooling. Two arrays and a counter.

Everything you've heard about pools now decodes for free:

"Pool exhaustion" — all connections checked out, the waiters array is growing. Usually a missing release() or slow queries hogging connections.
"Acquire timeout" — a waiter that gives up (rejects its promise) after N ms instead of queuing forever.
"Idle timeout" — a janitor that closes sockets sitting unused in idle for too long, so the DB isn't holding a process for nothing.
"Max pool size" — the bound on total, playing the same role MAX_CONCURRENT did in our request queue. (Some pools also take a min size: connections pre-created and kept warm — our toy has no equivalent.)

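The acquire timeout is the one with a subtle trap: a waiter that gives up must also leave the line, or release() will later hand a connection to nobody. A minimal sketch, assuming a fixed set of connections passed in up front; TimedPool, Conn and Waiter are hypothetical names for this illustration:

```typescript
type Conn = { id: number }; // stand-in for a real connection object
type Waiter = { resolve: (conn: Conn) => void; timer: ReturnType<typeof setTimeout> };

class TimedPool {
  private idle: Conn[];
  private waiters: Waiter[] = [];

  constructor(conns: Conn[]) {
    this.idle = [...conns];
  }

  acquire(timeoutMs: number): Promise<Conn> {
    const conn = this.idle.pop();
    if (conn) return Promise.resolve(conn);

    return new Promise<Conn>((resolve, reject) => {
      const waiter: Waiter = {
        resolve,
        timer: setTimeout(() => {
          // Give up: remove ourselves so release() skips this dead waiter.
          this.waiters = this.waiters.filter(w => w !== waiter);
          reject(new Error(`acquire timed out after ${timeoutMs}ms`));
        }, timeoutMs),
      };
      this.waiters.push(waiter);
    });
  }

  release(conn: Conn): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      clearTimeout(waiter.timer); // served in time: cancel the give-up timer
      waiter.resolve(conn);
    } else {
      this.idle.push(conn);
    }
  }

  get waitingCount(): number { return this.waiters.length; }
  get idleCount(): number { return this.idle.length; }
}
```

The two counters at the bottom are only there to make the behaviour observable: after a timeout, waitingCount drops back to zero and the next release() puts the connection on the idle shelf.
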
Proof: the real thing is the same two arrays

Don't take my word for it — open pg-pool/index.js, the pool used by node-postgres. Its constructor has:

this._clients = []        // every connection that exists
this._idle = []           // connections sitting alive, waiting for work
this._pendingQueue = []   // callers waiting because the pool is full

Our toy pool, with underscores. "Holding a connection alive" is literally an object pushed onto _idle:

class IdleItem {
  constructor(client, idleListener, timeoutId) {
    this.client = client              // wraps a live net.Socket to Postgres
    this.idleListener = idleListener  // if the server dies while parked, evict
    this.timeoutId = timeoutId        // idle-timeout janitor
  }
}

The client holds the underlying socket, the array holds the IdleItem, so nothing closes or garbage-collects the connection. A live socket, held by an object, held by an array — the same trick as parking res in our HTTP queue.
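
The same parking idea with an idle timer attached can be sketched in a few lines. This is written in the spirit of IdleItem but is not pg-pool's code: IdleShelf, Parked and the close() method on Conn are assumptions made for the illustration.

```typescript
type Conn = { id: number; close(): void }; // stand-in for a real client
type Parked = { conn: Conn; timer: ReturnType<typeof setTimeout> };

class IdleShelf {
  private idle: Parked[] = [];
  private idleTimeoutMs: number;

  constructor(idleTimeoutMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // Park a connection; if nobody takes it in time, evict and close it.
  park(conn: Conn): void {
    const timer = setTimeout(() => {
      this.idle = this.idle.filter(p => p.conn !== conn);
      conn.close();
    }, this.idleTimeoutMs);
    this.idle.push({ conn, timer });
  }

  // Take the most recently parked connection and cancel its idle timer.
  take(): Conn | undefined {
    const parked = this.idle.pop();
    if (!parked) return undefined;
    clearTimeout(parked.timer);
    return parked.conn;
  }

  get idleCount(): number { return this.idle.length; }
}
```

The array entry is what keeps the connection reachable, and the timer is what eventually lets it go.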

Our drain() is their _pulseQueue(): _pendingQueue.shift() to dequeue the next waiter (FIFO), _idle.pop() to grab a parked connection, newClient() if there's room to grow. Even the production metrics are just array lengths:

get waitingCount() { return this._pendingQueue.length }
get idleCount()    { return this._idle.length }
get totalCount()   { return this._clients.length }

The remaining ~400 lines are exactly the hardening we predicted: idle timeouts, acquire timeouts, maxUses/maxLifetimeSeconds rotation, error listeners on parked clients, double-release protection.

One elegant quirk: reuse is _idle.pop() — LIFO, not FIFO. That looks unfair until you see why: handing out the most-recently-used connection keeps a small hot set busy, while rarely-used connections quietly age toward their idle timeout and close themselves. The pool shrinks to fit actual load. A FIFO pool would keep every connection just-barely-alive forever.
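
A toy simulation makes the difference visible. It assumes the lightest possible load, where each request releases its connection before the next one arrives; connectionsTouched is a hypothetical helper, not anything from pg-pool:

```typescript
// Counts how many distinct connections get used when `rounds` requests
// arrive one at a time, each released before the next is acquired.
function connectionsTouched(
  poolSize: number,
  rounds: number,
  policy: "lifo" | "fifo",
): number {
  const idle: number[] = [];
  for (let id = 0; id < poolSize; id++) idle.push(id);

  const touched = new Set<number>();
  for (let i = 0; i < rounds; i++) {
    // LIFO takes from the end (pop), FIFO takes from the front (shift).
    const conn = policy === "lifo" ? idle.pop()! : idle.shift()!;
    touched.add(conn);
    idle.push(conn); // release: back onto the end of the array
  }
  return touched.size;
}

const lifoTouched = connectionsTouched(10, 100, "lifo"); // 1
const fifoTouched = connectionsTouched(10, 100, "fifo"); // 10
```

Under LIFO the other nine connections are never touched, so their idle timers are free to fire. Under FIFO every connection is used once every ten requests, which keeps resetting all ten timers.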

The bigger lesson

Your typical Node service is queues all the way down: HTTP requests queue for the event loop, handlers queue for pool connections, queries queue inside the database, and TCP backpressure queues bytes all the way back to the client when anything fills up. None of it is magic. At every layer it's the same primitive:

Hold a reference in an array. Hand it out when there's capacity.

Whenever infrastructure feels magical, build the naive 30-line version. The real thing is almost always the naive version plus error handling — and the next time someone says "we're seeing pool exhaustion," you won't picture magic. You'll picture an array named waiters, and it's growing.
