Day 100 - The Day I Stopped Learning Java and Started Learning Systems

Day 65 was the day I realized how little I knew about the underlying structures and concepts, and how rarely I reflected on them.

I’d spent two months writing about Java internals—memory models, GC algorithms, bytecode manipulation. I understood the pieces. But when I tried to debug a production memory leak at work, I couldn’t connect them.

The heap dump showed the leak. The GC logs showed the pressure. The thread dump showed the contention. I couldn’t see how these pieces created the failure.

I stared at that heap dump for hours. I knew what was leaking (a static ConcurrentHashMap holding request contexts). I knew why it was dangerous (objects never garbage collected). But I didn’t understand the system behavior—why the GC pauses got longer, why throughput collapsed under load, why the leak only manifested with specific request patterns.

I’d been learning Java. I needed to learn systems.

For 99 days, I zoomed in. I wrote custom memory allocators using Unsafe (Day 77). I built Java agents that intercept bytecode (Day 90). I compared virtual threads against event loops (Day 99). I fixed parallel streams that left CPU cores idle (Day 100).

By Day 100, I wasn’t thinking about Java anymore. I was thinking about how memory moves, how threads coordinate, and what observability actually means.

This isn’t a summary of 100 posts. It’s what happened when learning JVM internals forced me to think like a systems engineer.


1. Performance Is About Memory, Not Just CPU

Early on, I confidently wrote this code:

@GetMapping("/users")
public List<UserDTO> getActiveUsers() {
    List<User> users = userRepository.findAll(); // 10MB allocation
    return users.stream()
        .filter(User::isActive)
        .map(this::toDTO)
        .collect(toList());
}

It worked fine in development. Small dataset, no problem.

Then we hit production with thousands of users and high request rates.

The service collapsed. Not from CPU load—CPU was barely stressed. From memory pressure.

The GC logs showed frequent minor collections. Every few hundred milliseconds, the world stopped. Not because the algorithm was slow. Because I was flooding the young generation with transient objects.

After studying memory-mapped files (Day 91), off-heap allocation with Unsafe (Day 77), and DirectByteBuffer behavior, I rewrote it:

@GetMapping("/users")
public Flux<UserDTO> getActiveUsers() {
    return userRepository.findAllStream()  // Streaming query
        .filter(User::isActive)
        .map(this::toDTO);
}

Same functionality. Different memory behavior. Dramatically reduced allocation per request. GC frequency dropped. Latency improved.

Memory kills throughput long before CPU does.

Performance engineering is about managing when objects are born, how long they live, and when they die. You’re not optimizing algorithms. You’re managing memory pressure.

After 100 days of studying allocation patterns, GC behavior, and heap dumps, I stopped asking “is this code fast?” I started asking:

  • What’s the per-request allocation footprint?
  • How many transient objects am I creating in the hot path?
  • What’s the object graph breadth?
  • When will these objects become garbage?

Understanding the JVM’s allocation model changed how I design APIs.


2. Concurrency Is About Coordination, Not Threads

Day 95 taught me this the hard way.

I built an image processing pipeline with 4 thread pools, 3 blocking queues, and 2 semaphores:

ExecutorService downloadPool = Executors.newFixedThreadPool(20);
ExecutorService processPool = Executors.newFixedThreadPool(10);
ExecutorService uploadPool = Executors.newFixedThreadPool(15);
ExecutorService notificationPool = Executors.newFixedThreadPool(5);

BlockingQueue<Image> downloadQueue = new ArrayBlockingQueue<>(100);
BlockingQueue<Image> processQueue = new ArrayBlockingQueue<>(50);
BlockingQueue<Image> uploadQueue = new ArrayBlockingQueue<>(75);

Semaphore rateLimiter = new Semaphore(10);
Semaphore memoryLimiter = new Semaphore(5);

It worked. It was also impossible to debug.

When it hung under load, the thread dump showed dozens of threads waiting. But waiting on what? I couldn’t tell. The coordination had become the system. The actual work was just noise.

Then I learned about virtual threads and event loops (Day 99). I thought virtual threads would solve everything. Just replace the thread pools:

ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor();

It didn’t help. The system still hung under load.

The problem wasn’t thread count. It was coordination complexity.

A poorly designed fan-out service that calls 5 downstream services, shares mutable request context, has no cancellation propagation, and no timeout aggregation will fail under load. Doesn’t matter if you use platform threads, virtual threads, or reactive pipelines.

The real bottleneck in high-concurrency systems isn’t thread count. It’s:

  • Ownership ambiguity (who owns this state?)
  • Mutable shared state (what can change while I’m reading it?)
  • Long-lived coordination chains (how many handoffs before completion?)
  • Backpressure mismanagement (what happens when producers outpace consumers?)
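That last point, backpressure, is the easiest to make explicit. A minimal sketch (hypothetical names and drop policy, not the original pipeline): a bounded queue where the producer must decide what happens when the consumer falls behind, instead of blocking forever or buffering without limit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BackpressureDemo {

    // Try to enqueue; on timeout, reject the item instead of blocking forever.
    // The caller now has to handle the rejection -- that's the point.
    static boolean submit(BlockingQueue<String> queue, String item)
            throws InterruptedException {
        return queue.offer(item, 10, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        System.out.println(submit(queue, "a")); // true
        System.out.println(submit(queue, "b")); // true
        System.out.println(submit(queue, "c")); // false: consumer is behind
    }
}
```

The timed offer turns an invisible failure mode (unbounded memory growth, or a producer silently parked) into an explicit decision at the call site.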

I rewrote the pipeline with structured concurrency:

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Future<Image> downloaded = scope.fork(() -> download(url));
    Future<Image> processed = scope.fork(() -> process(downloaded.get()));
    Future<String> uploaded = scope.fork(() -> upload(processed.get()));
    
    scope.join();
    scope.throwIfFailed();
    
    return uploaded.resultNow();
}

Same throughput. Half the code. Zero coordination bugs. The coordination was explicit and scoped.

After studying concurrency primitives (Day 95), building both virtual thread and event loop servers (Day 99), and debugging countless deadlocks:

Threads are scheduling primitives. Coordination is architecture.

Virtual threads didn’t make my coordination problems disappear. They made blocking cheaper. The hard part—designing clear ownership, explicit lifetimes, and deterministic cancellation—that’s still on me.


3. The Heap Dump That Changed Everything

Back to Day 65. That production memory leak.

I finally understood it after writing about memory leak patterns. Here’s what was happening:

public class RequestContextHolder {
    private static final ConcurrentHashMap<String, RequestContext> contexts 
        = new ConcurrentHashMap<>();
    
    public static void set(String requestId, RequestContext context) {
        contexts.put(requestId, context);
    }
    
    public static RequestContext get(String requestId) {
        return contexts.get(requestId);
    }
    
    // BUG: No remove() method
    // Contexts never cleaned up
}

Every request added a context. None were ever removed. Under load, that’s millions of leaked contexts.

The heap dump showed the answer: millions of RequestContext objects, all retained by that one static map, slowly filling the heap.

Under sustained load, the leak filled the heap. Then:

  1. Young gen filled faster (more live objects to scan)
  2. Minor GC pauses increased
  3. Old gen filled (promotion from young gen)
  4. Full GC triggered frequently (multi-second pauses)
  5. Throughput collapsed (threads blocked on GC)
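The unbounded growth is easy to reproduce in miniature. This is a hypothetical simulation of the same pattern, not the production code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LeakSimulation {

    static final Map<String, Object> contexts = new ConcurrentHashMap<>();

    static void handleRequest(String requestId) {
        contexts.put(requestId, new Object()); // context registered...
        // ...work happens, response is sent...
        // BUG: no contexts.remove(requestId) on the way out
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) handleRequest("req-" + i);
        // Retention grows linearly with request count, forever.
        System.out.println("retained contexts: " + contexts.size());
    }
}
```

Every entry stays reachable from a static field, so no GC algorithm can ever reclaim it; the map's size is a direct proxy for leaked memory.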

The fix was obvious once I saw it:

public static void remove(String requestId) {
    contexts.remove(requestId);
}

// In request filter:
try {
    RequestContextHolder.set(requestId, context);
    chain.doFilter(request, response);
} finally {
    RequestContextHolder.remove(requestId);  // Always clean up
}

But the real lesson: observability isn’t postmortem tooling. It’s a runtime contract.

After Day 65, I added metrics to every service:

// Allocation rate per endpoint
registry.gauge("heap.allocation.rate", this, 
    m -> getHeapAllocationRate());

// Object retention by type
registry.gauge("heap.retained.contexts", this,
    m -> contexts.size());

// GC pause distribution
registry.timer("gc.pause.duration");

Now I see problems before they become incidents. Systems evolve toward states you didn’t design. Without observability, you’re flying blind.


4. Virtual Threads Didn’t Kill Event Loops

Day 99: I built both a virtual thread server and an event loop server to compare them.

I thought virtual threads would make event loops obsolete. They didn’t.

For high-connection-count workloads with simple request/response patterns, event loops can still win on memory efficiency. Virtual threads excel at complex business logic with multiple I/O calls.

The real difference isn’t performance—it’s code complexity.

Virtual thread server (readable):

server.createContext("/", exchange -> {
    String data = fetchFromDatabase();  // Blocks, thread unmounts
    String result = callExternalAPI(data);  // Blocks, thread unmounts
    exchange.sendResponseHeaders(200, result.length());
    exchange.getResponseBody().write(result.getBytes());
});

Event loop server (state machine hell):

selector.select();
Iterator<SelectionKey> it = selector.selectedKeys().iterator();
while (it.hasNext()) {
    SelectionKey key = it.next();
    it.remove();  // selected keys must be removed by hand, or they're reprocessed
    if (key.isReadable()) {
        handleRead(key);   // State: READING -> PROCESSING
    } else if (key.isWritable()) {
        handleWrite(key);  // State: PROCESSING -> WRITING
    }
}

Virtual threads didn’t kill event loops. They made blocking I/O viable for most cases.

Use virtual threads when:

  • Complex business logic with multiple I/O calls
  • Team velocity matters (readable code)
  • Connection count < 100K

Use event loops when:

  • Ultra-high connection counts (500K+)
  • Simple request/response patterns
  • Maximum memory efficiency
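The "blocking is cheap now" claim is easy to verify. A minimal sketch (hypothetical names; assumes Java 21+): spawn far more concurrently-sleeping tasks than you would ever dare with platform threads.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {

    // Runs `count` tasks that each block for 10ms on a virtual thread.
    static int runTasks(int count) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        try (var pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(10); // blocks; the virtual thread unmounts
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(10_000)); // 10000
    }
}
```

Ten thousand blocked platform threads would cost gigabytes of stack; ten thousand blocked virtual threads are unremarkable, which is exactly why the readable blocking style becomes viable.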

After building both, I stopped arguing about “which is better.” I started asking “what are my constraints?”


5. The Parallel Stream That Taught Me About Work Stealing

Day 100. The final lesson.

I had a CSV processing job that was mysteriously slow:

List<Record> records = readCSV();

List<Result> results = records.parallelStream()
    .map(this::process)
    .collect(toList());

Multiple cores available. Low CPU utilization. Cores sitting idle.

I thought parallel streams automatically used all cores efficiently. They don’t always.

The default spliterator for ArrayList splits the list in half recursively. But when processing time varies significantly between records, work distribution becomes uneven. Some threads finish early and sit idle while one thread processes the slow records.
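You can see the count-based halving directly (a small standalone demo, not from the original post): ArrayList's spliterator reports exact sizes, and trySplit() hands the first half to another worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 8; i++) records.add(i);

        Spliterator<Integer> right = records.spliterator();
        Spliterator<Integer> left = right.trySplit(); // splits off the first half

        System.out.println(left.estimateSize());  // 4
        System.out.println(right.estimateSize()); // 4
        // Equal COUNTS -- but if records 0-3 each take 100x longer to
        // process than 4-7, one worker does almost all the actual work.
    }
}
```

The split is perfectly balanced by element count and potentially badly unbalanced by work, which is the gap a work-aware spliterator closes.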

I wrote a custom spliterator that split by estimated work (not count):

class WorkBalancedSpliterator implements Spliterator<Record> {
    @Override
    public boolean tryAdvance(Consumer<? super Record> action) {
        if (hasNext()) {
            action.accept(next());
            return true;
        }
        return false;
    }
    
    @Override
    public Spliterator<Record> trySplit() {
        // Split based on estimated processing time, not count
        long estimatedWork = estimateRemainingWork();
        if (estimatedWork < threshold) return null;
        
        return splitByWork();
    }
    
    @Override
    public long estimateSize() {
        return remainingCount();  // records left in this spliterator
    }
    
    @Override
    public int characteristics() {
        return ORDERED | SIZED | NONNULL;
    }
}

Result: much better CPU utilization across all cores.

ForkJoinPool’s work stealing only helps if work is splittable. You need to understand the work distribution, not just throw .parallel() at the problem.

Read the full investigation in Day 100.


What 100 Days Actually Taught Me

I thought I was learning:

  • How the JMM enforces happens-before relationships
  • How G1 manages regions
  • How the JIT speculates and deoptimizes
  • How ForkJoinPool steals work
  • How virtual threads suspend and resume

What I actually learned:

  • Systems are constrained by memory movement, not CPU speed
  • Scalability is constrained by coordination, not thread count
  • Reliability is constrained by observability, not testing
  • Performance is constrained by work distribution, not parallelism

The JVM is a compressed model of distributed systems problems. Scheduling. Memory management. Isolation. Synchronization. Instrumentation.

Studying it deeply forced me to think in systems.


What’s Next

Day 65 broke me because I couldn’t connect the pieces.

Day 100 taught me the pieces were never separate.

Memory leaks cause GC pressure. GC pressure causes thread starvation. Thread starvation causes coordination failures. Coordination failures cause cascading timeouts. It’s all one system.

The next phase isn’t about learning more Java internals. It’s about understanding how systems fail, how they scale, and how to design them to survive production.

That’s what I’ll be exploring in Software Scientist’s Pursuits—not as polished conclusions, but as investigations in progress. The messy process of figuring out how things actually work.

I spent 100 days looking at the JVM through a microscope. Now I’m looking at the whole organism.