After Day 98, I thought I had virtual threads figured out. They made blocking I/O scalable: just write sequential code, let the JVM handle the unmounting magic, ship it. Problem solved.
Then I asked myself: what are reactive frameworks actually doing? They have been around for a long time, solving this problem well before virtual threads existed. Netty handles millions of connections. Vert.x powers real-time systems. Project Reactor runs high-throughput services. None of them use virtual threads. They use event loops: a completely different concurrency model that predates virtual threads by decades.
Why do both approaches exist? I spent a few weekends building both models from scratch (simple implementations). Here's what I learned.
The Misconception I Had
I thought virtual threads replaced the need for non-blocking I/O and event loops. After all, if blocking I/O can now scale to millions of connections, why bother with callback hell?
Virtual threads work by unmounting when they hit blocking I/O. The carrier thread stays free. Other virtual threads mount and do work. It's brilliant for business logic: database calls, REST APIs, file I/O. Sequential code that scales.
But reactive frameworks don’t work this way. They use event loops: one thread handles thousands of connections by multiplexing I/O events. No mounting. No unmounting. No stack switching. Just a tight loop reading from a Selector.
I needed to understand both models to know when each wins.
Non-Blocking I/O: The Foundation
Blocking I/O wastes threads. Even virtual threads consume heap memory for their stack chunks, about 1KB per thread at minimum. Scale to 500K connections? That's 500MB just for stacks. Plus the mount/unmount overhead (1-5 microseconds per context switch).
Non-blocking I/O takes a different approach: one thread, many connections, explicit multiplexing.
What Is I/O Multiplexing?
I/O multiplexing comes down to kernel-level efficiency: one thread polls multiple file descriptors via system calls like select()/poll()/epoll(), reacting only to ready I/O events instead of blocking per connection.
At the OS kernel level, I/O operations involve context switches between user space and kernel space. Traditional blocking I/O ties one thread to each file descriptor: when you call socket.read(), the thread blocks until data arrives. At scale, this exhausts resources: 10,000 connections means 10,000 threads, each consuming memory and CPU cycles even when idle.
Multiplexing inverts this model. Instead of one thread per connection, one thread monitors many connections. The kernel tells you which connections are ready for I/O, and you react only to those.
Here’s how it works at the kernel level:
The select() system call (or epoll on Linux, kqueue on macOS) takes a set of file descriptors with interest operations (read/write/accept), blocks until at least one file descriptor signals readiness via kernel events, then returns the set of ready file descriptors, all without polling each descriptor individually.
How It Works In Java
Java NIO's Selector wraps this mechanism. On Linux, it uses EPollSelectorImpl, backed by epoll, whose readiness notifications are O(1) in the number of registered descriptors. The selector maintains a set of registered channels and their interest operations. When you call selector.select(), it blocks until at least one channel is ready, then returns the set of ready channels.
Non-blocking channels ensure read()/write() return immediately; they never block. If data isn't ready, read() returns 0 bytes. If the socket buffer is full, write() returns 0 bytes written. This forces applications to re-check readiness via selector keys in the event loop.
ByteBuffer manages data with position/limit/capacity semantics. After reading into a buffer, you call flip() to prepare it for consumption: flip() sets limit = position and position = 0. This is critical: without flip(), you'll read from the wrong position or read garbage data.
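A tiny runnable illustration of those semantics (class name is mine):

```java
import java.nio.ByteBuffer;

// Demonstrates ByteBuffer position/limit/capacity and why flip() matters.
public class FlipDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);   // position=0, limit=8, capacity=8
        buf.put((byte) 'h').put((byte) 'i');       // position=2 after writing two bytes
        buf.flip();                                // limit=2, position=0: ready to consume
        System.out.println(buf.position() + " " + buf.limit()); // 0 2
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        System.out.println(new String(out));       // hi
    }
}
```

Forgetting flip() here would leave position at 2 and limit at 8, so a read would start past the data and pull in uninitialized bytes.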
The Reactor Pattern layers on top of multiplexing. It consists of:
- Acceptor: Handles new connections, registers clients with the selector
- Demultiplexer: The `Selector.select()` call that waits for events
- Dispatcher: Routes `SelectionKey` events to appropriate handlers
- Handler: Business logic that processes the I/O event
In Java reactive frameworks like Netty or Project Reactor, this pattern scales to millions of connections. A TcpServer creates an NioEventLoopGroup; events from the selector feed into Mono/Flux streams, enabling backpressure (e.g. pause reads when consumers are slow).
A single-thread event loop processes sequentially: select → dispatch → callback.
If a handler blocks, it stalls the entire loop; that's why reactive frameworks emphasize non-blocking handlers.
The key lifecycle per channel:
- Register interest operations (e.g., `OP_ACCEPT` for server sockets, `OP_READ` for client sockets)
- `select()` yields a set of ready operations (`readyOps`)
- Process the event (read → flip buffer → handle → set `OP_WRITE` if the write was partial)
- Remove the key from the selected set after processing, so the same event isn't handled twice
Here’s a minimal example showing the pattern:
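The original code block didn't survive formatting, so here is a minimal sketch of the pattern just described: one Selector, non-blocking channels, and an echo handler. The class name, buffer size, and stop mechanism are my own choices, not from the original.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Minimal single-threaded NIO echo server: one Selector, non-blocking channels.
public class EchoEventLoop implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    public EchoEventLoop(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);                  // non-blocking accept
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    public void stop() { running = false; selector.wakeup(); }

    @Override public void run() {
        try {
            while (running) {
                selector.select(200);                     // block until I/O is ready (or timeout)
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();                          // consume the event: avoid re-processing
                    if (key.isAcceptable()) accept();
                    else if (key.isReadable()) echo(key);
                }
            }
            server.close();
            selector.close();
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    private void accept() throws IOException {
        SocketChannel client = server.accept();
        client.configureBlocking(false);
        client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
    }

    private void echo(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        int n = client.read(buf);
        if (n == -1) { key.cancel(); client.close(); return; } // peer closed
        buf.flip();                                       // switch buffer from filling to draining
        while (buf.hasRemaining()) client.write(buf);     // echo back (simplified: spins on full socket buffer)
        buf.clear();
    }
}
```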
This ping-pongs data efficiently. For production systems, you'd add write queues (registering OP_WRITE only when the queue is non-empty) and handle partial reads/writes properly. The diagram below shows how this event loop works.
The magic: one thread handles thousands of connections. The selector blocks only when no I/O is ready. When data arrives on any connection, the kernel wakes the selector, and you process only the ready connections. No wasted threads. No context switching overhead. Just efficient event-driven I/O.
Here’s the core pattern using Java NIO:
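Here's a compact sketch of that core loop shape (names are mine; the demo uses selectNow(), the non-blocking variant, so it terminates instead of looping forever):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.*;

// The core loop shape: register interest, then select → iterate ready keys → dispatch.
public class CorePattern {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));          // ephemeral port for the demo
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // selectNow() returns immediately with the number of ready channels.
        // With no pending connections, that's 0: nothing blocks, nothing waits.
        int ready = selector.selectNow();
        System.out.println("ready channels: " + ready);  // 0 when no client has connected

        // The real loop would be:
        // while (true) {
        //     selector.select();                        // the ONLY blocking call
        //     for (SelectionKey key : selector.selectedKeys()) dispatch(key);
        //     selector.selectedKeys().clear();          // consume processed events
        // }
        server.close();
        selector.close();
    }
}
```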
This is the foundation. One thread handles all connections. The Selector monitors multiple channels. When data arrives, the selector wakes up with ready events. We handle them without blocking.
Key insight: selector.select() is the only blocking call. Everything else (accept(), read(), write()) returns immediately. If data isn't ready, the operation returns zero bytes. No waiting.
For scale: in my test, one thread handled 10,000 concurrent connections using about 50MB. That figure is the whole process (selector, buffers, socket state). With 10K virtual threads you'd have roughly 10-15MB in stack chunks, plus carrier threads and other JVM overhead, and each connection's state still lives on the heap. The fair comparison is total system cost, not "50MB vs 15MB".
Building an Event Loop HTTP Server
The pattern above is raw NIO. Let’s build something more real: an HTTP server using the event loop pattern.
Event loop = infinite loop + selector + event handlers + state machines.
Here’s a production-style implementation:
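The production-style listing didn't survive in this copy, so below is a hedged, single-threaded sketch of the shape such a server takes: each connection is a READING → WRITING state machine driven by selector readiness. It serves a canned 200 response and omits real HTTP parsing; all names and sizes are my own choices.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Sketch of a single-threaded event-loop HTTP server. Each connection is a
// small state machine: READ the request, then WRITE a canned response.
public class EventLoopHttpServer implements Runnable {
    private static final byte[] RESPONSE =
        "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
            .getBytes(StandardCharsets.US_ASCII);

    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    public EventLoopHttpServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    public void stop() { running = false; selector.wakeup(); }

    @Override public void run() {
        try {
            while (running) {
                selector.select(200);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    try {
                        if (key.isAcceptable()) accept();
                        else if (key.isReadable()) read(key);
                        else if (key.isWritable()) write(key);
                    } catch (IOException e) {
                        key.cancel();
                        try { key.channel().close(); } catch (IOException ignored) {}
                    }
                }
            }
            server.close();
            selector.close();
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    private void accept() throws IOException {
        SocketChannel c = server.accept();
        c.configureBlocking(false);
        c.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
    }

    private void read(SelectionKey key) throws IOException {
        SocketChannel c = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        if (c.read(buf) == -1) { key.cancel(); c.close(); return; }
        // Naive end-of-request check: HTTP headers end with a blank line.
        String soFar = new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
        if (soFar.contains("\r\n\r\n")) {
            key.attach(ByteBuffer.wrap(RESPONSE));    // state transition: READING -> WRITING
            key.interestOps(SelectionKey.OP_WRITE);
        }
    }

    private void write(SelectionKey key) throws IOException {
        SocketChannel c = (SocketChannel) key.channel();
        ByteBuffer out = (ByteBuffer) key.attachment();
        c.write(out);                                  // may be a partial write; retried on next event
        if (!out.hasRemaining()) { key.cancel(); c.close(); }  // honors Connection: close
    }
}
```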
This runs on a single thread. I tested it with Bombardier (the Go HTTP benchmarking tool), e.g. `bombardier -c 10000 -d 60s http://localhost:8081/` (flags: `-c` concurrent connections, `-d` test duration).
Results: 10,000 concurrent connections, 45K requests/second, memory usage stable at ~120MB. One thread.
The trick: state machines. Each connection is a state machine (READING → WRITING → READING). The event loop transitions states based on I/O readiness. No blocking. No thread-per-connection.
Virtual Threads vs Event Loops: The Real Trade-offs
I built both models in production. Here’s what actually matters:
| Aspect | Virtual Threads (Blocking I/O) | Event Loops (Non-blocking I/O) |
|---|---|---|
| Programming Model | Sequential, imperative | Callback-based, state machines |
| Memory per connection | ~1KB heap (stack chunk) | ~Few bytes (state machine) |
| CPU overhead | Mount/unmount (1-5μs) | State machine transitions (~100ns) |
| Debuggability | Stack traces work perfectly | Callback hell, fragmented traces |
| Max connections | Millions (heap limited) | Millions (memory limited) |
| Code complexity | Simple, readable | Complex, hard to follow |
| Best for | Business logic, DB queries | High-throughput proxies |
When to Use Virtual Threads
I use virtual threads when:
Complex business logic: Multiple database calls, service calls, branching logic. Sequential code wins. Debugging wins. Maintainability wins.
Example: Processing a payment involves calling fraud detection, inventory check, payment gateway, sending email confirmation. Sequential code with virtual threads is 10x easier to write and debug than callback chains.
Moderate connection counts: 10K-100K concurrent connections. Virtual threads handle this easily. The memory overhead is acceptable. The mount/unmount cost is negligible.
Team velocity: Most developers understand sequential code. Onboarding is faster. Code reviews are easier. Bugs are simpler to fix.
Here’s a basic HTTP server implementation using virtual threads:
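The original listing is missing here; the following is a minimal sketch of what such a server could look like, using the JDK's built-in com.sun.net.httpserver.HttpServer with a virtual-thread-per-task executor (Java 21+). The 10 ms sleep stands in for the blocking DB/REST calls described above; the class name and response body are my own.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

// Sketch: JDK HttpServer where every request runs on its own virtual thread.
// Blocking calls (sleep stands in for a DB/REST call) unmount the virtual
// thread, so the carrier thread stays free for other requests.
public class VirtualThreadHttpServer {
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.createContext("/", exchange -> {
            try {
                Thread.sleep(10);   // simulated blocking work (~10 ms); virtual thread unmounts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server;
    }
}
```

The handler reads top to bottom, exactly like the sequential code the article advocates; scaling comes from the executor, not the handler.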
Total blocking time: ~10ms per request. With platform threads, this ties up a thread for 10ms. With virtual threads, the carrier thread stays free. The virtual thread unmounts at each blocking call. Other virtual threads run.
When to Use Event Loops
I use event loops when:
Ultra-high connection counts: 100K-1M+ connections. Memory matters. Every byte counts. Event loops use ~5KB per connection (mostly the read buffer); virtual threads add ~1KB+ of heap for the stack chunk on top of that, plus JVM overhead.
Simple request/response patterns: API gateways, load balancers, WebSocket servers, streaming proxies. The logic is simple: read request, forward it, write response. State machines work fine here.
Maximum memory efficiency: You’re running on constrained hardware. You need to squeeze every ounce of performance. You can’t afford the mount/unmount overhead.
Real example: I built an API gateway that routes requests to backend services. Peak load: 500K concurrent WebSocket connections. Each connection forwards messages bidirectionally. Minimal state. Event loops won.
The entire gateway ran on 4 CPU cores, 2GB heap. Event loops handled all 500K connections. Virtual threads would’ve used ~500MB just for stacks. Plus mount/unmount overhead on every message.
The Hybrid Approach
Production systems can use both at the same time. Netty uses event loops for network I/O, then dispatches business logic to thread pools (or virtual-thread executors in newer setups).
Pattern:
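The pattern's code block didn't survive formatting; here is one possible sketch (simplified: the virtual-thread worker writes the response directly on the channel rather than handing it back to the loop via a write queue, and it assumes the request fits in one read). Names and details are my own, not from the original repo.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hybrid sketch: the event loop owns the Selector and handles I/O readiness;
// decoded requests are handed to virtual threads for (blocking) business logic.
public class HybridServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private final ExecutorService workers = Executors.newVirtualThreadPerTaskExecutor();
    private volatile boolean running = true;

    public HybridServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    public void stop() { running = false; selector.wakeup(); }

    @Override public void run() {
        try {
            while (running) {
                selector.select(200);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel c = server.accept();
                        c.configureBlocking(false);
                        c.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                    } else if (key.isReadable()) {
                        SocketChannel c = (SocketChannel) key.channel();
                        ByteBuffer buf = (ByteBuffer) key.attachment();
                        if (c.read(buf) == -1) { key.cancel(); c.close(); continue; }
                        buf.flip();
                        String request = StandardCharsets.UTF_8.decode(buf).toString();
                        buf.clear();
                        key.cancel();  // hand the connection off to a worker
                        workers.submit(() -> handle(c, request)); // business logic off the loop
                    }
                }
            }
            server.close();
            selector.close();
            workers.shutdown();
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    // Runs on a virtual thread: free to block (DB calls, REST calls, etc.).
    private void handle(SocketChannel c, String request) {
        try {
            Thread.sleep(10);  // simulated blocking business logic
            ByteBuffer out = StandardCharsets.UTF_8.encode("echo:" + request);
            while (out.hasRemaining()) c.write(out);
            c.close();
        } catch (Exception e) { /* demo: ignore */ }
    }
}
```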
Event loops handle I/O multiplexing. Virtual threads handle business logic. Best of both worlds.
Benchmarks and Repo: Putting Both to the Test
To validate the trade-offs with real numbers, I added a benchmark suite to a small project that runs both implementations side by side. The repo is virtual-thread-eventloop-test and is set up so you can run the same tests and draw your own conclusions.
Repo Layout
Here is the project link: GITHUB. The project contains two HTTP servers and a 4-phase benchmark suite:
- `VirtualThreadsHttpServer` (port 8080): Java 21 `HttpServer` with `Executors.newVirtualThreadPerTaskExecutor()`. Each request runs on a virtual thread and does ~10 ms of simulated blocking work (e.g. a DB/REST call). Simple sequential handler.
- `EventLoopHttpServer` (port 8081): single-thread NIO server with one `Selector`, non-blocking `ServerSocketChannel`/`SocketChannel`, and the same 10 ms of work simulated inside the event loop (no virtual threads). Pure reactor style.
Both servers expose the same JSON endpoint and the same simulated workload so the comparison is about concurrency model, not API shape. Build with Maven; the pom.xml produces two runnable JARs: virtual-thread-app and event-loop-app.
Benchmark Suite (4 Phases)
The benchmarks folder holds a hypothesis-driven suite that measures throughput, latency, and resource use:
| Phase | What it does | Goal |
|---|---|---|
| Phase 1: Baseline | Fixed loads (e.g. 100, 1K, 10K connections), 10s–300s, multiple runs | Establish normal throughput and latency patterns. |
| Phase 2: Progressive stress | Ramp connections from 100 → 50K (e.g. +1K every 30s) | Find where each implementation degrades or fails. |
| Phase 3: Spike | Baseline (1K conn) → spike (10K conn) → back to 1K, repeated cycles | Observe recovery and stability. |
| Phase 4: Endurance | Constant load (e.g. 5K connections) for several hours per server | Check for memory growth and long-term stability. |
Load is generated with Bombardier (a Go-based HTTP benchmarking tool). The repo includes bombardier.exe in the scripts folder for Windows, so you can run the suite natively. The suite can collect JFR, JMX (e.g. VisualVM), and system metrics; the analyze-and-report script turns raw results into CSVs and a FINAL-REPORT.md in benchmark-results/.../analysis/.
Results From a Sample Run
Test machine (from the benchmark run’s config.json): 16 CPU cores, 31.82 GB RAM, 183 GB free disk. Each server ran with 4 GB heap (-Xmx4096m). Bombardier used 14 worker threads. Windows host.
From one full run (Phase 1–3; Phase 2 summary and report):
- Peak throughput: Event Loop ~4,627 req/s vs Virtual Threads ~3,926 req/s, with the event loop ahead under this workload.
- Breaking point (Phase 2): Both hit limits around 15,000 connections in that environment (stress ramp).
- Winner in this setup: Event Loop, for peak RPS, with both degrading at similar connection counts.
So for this “many connections, small fixed delay per request” scenario, the single-thread event loop gave higher throughput, while virtual threads stayed in the same ballpark and remained predictable. Your mileage will depend on hardware, OS, and actual workload (e.g. real DB or HTTP calls).
How to Run It Yourself
- Clone/build: Open the virtual-thread-eventloop-test repo and build with Maven (`mvn package`). Use JDK 21+.
- Start both servers with enough heap (e.g. `-Xmx4096m`) and JMX if you want VisualVM. Virtual threads on 8080, event loop on 8081.
- Run the benchmark orchestrator from the `benchmarks` folder (see `README.md` and `QUICK-REFERENCE.md`). The suite uses Bombardier for load generation (e.g. `bombardier.exe` on Windows); ensure both servers are reachable from the machine running the benchmark.
- Analyze: Run the analysis script to generate `FINAL-REPORT.md` and the CSV summaries under `benchmark-results/.../analysis/`.
The README in benchmarks explains the hypothesis template (predict VT vs EL before running), what to monitor in VisualVM, and how to interpret throughput, latency percentiles, and breaking points. Repeating the suite on your own machine is a good way to see how the two models behave under your constraints.
The Production Decision
Here's my mental model:
Start with virtual threads: For 95% of applications, virtual threads are the right default. Simpler code. Easier debugging. Good enough performance. Your business logic probably involves databases, REST calls, file I/O. Sequential code wins.
Switch to event loops when: You’re building infrastructure. API gateways. Load balancers. Proxies. WebSocket servers. High connection counts with simple logic. Memory is constrained. You need maximum throughput.
Use both when: You’re building a platform. Use event loops for network layer (Netty, Vert.x). Use virtual threads for business logic. This is what modern frameworks do.
I made the mistake of using event loops for business logic in 2015. Callback hell. Debugging nightmares. Three-hour sessions tracing through fragmented stack traces. Never again. Virtual threads solved that problem.
The right tool depends on your constraints. Virtual threads didn’t replace event loops. They made blocking I/O a viable alternative for most use cases. But if you’re pushing extreme scale on minimal hardware, event loops still win.
What I Learned
Virtual threads and event loops solve different problems:
Virtual threads: Make blocking I/O scalable. Keep sequential code readable. Remove the need for thread pool tuning. Perfect for business logic.
Event loops: Maximize connection density. Minimize memory overhead. Handle simple I/O patterns efficiently. Perfect for infrastructure.
Understanding both models gives you the full picture of Java’s I/O concurrency landscape. You can make informed decisions based on your actual constraints, not hype or cargo-culting.
Next time you’re designing a system, ask: What’s the connection pattern? What’s the business logic complexity? What are the memory constraints? Then choose the right model.
Both are tools. Use the right one for the job.