Reducing Batch Processing Latency via Parallel Query Execution
Status: Active
Postmortem
Metrics & Impact
- 21% performance improvement with 8 parallel threads
- 44% improvement with 20 threads
- Nearly 50% reduction in total wall-clock data loading time
- Added full query execution observability
- Introduced reusable concurrency infrastructure for batch pipelines
- Implementation now active in three batch jobs with additional rollout planned
- Concurrency model validated to scale safely across a distributed batch worker fleet
Roadblocks
To fully realize the performance gains, the database connection pool needed to be increased from 10 to 20 connections. This required validating that the parallel query execution model would not introduce:
- connection contention
- database instability
- excessive memory consumption

Once the proof of concept demonstrated safe operation, the connection pool was increased to match the thread pool size, unlocking the full performance benefits.
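The contention risk above can be sketched with a toy model. This is a minimal illustration, not the production code: the class and method names are hypothetical, and a `Semaphore` stands in for the database connection pool, so that a worker thread must hold a permit (connection) before its query can run. With the thread pool matched to the connection pool, no worker blocks waiting for a connection.

```java
import java.util.concurrent.*;

public class PoolSizingSketch {
    // Sizes taken from the write-up: the connection pool was raised from
    // 10 to 20 to match a 20-thread executor.
    static final int CONNECTION_POOL_SIZE = 20;
    static final int THREAD_POOL_SIZE = 20;

    // Model the connection pool as a semaphore; returns the number of
    // simulated queries that completed.
    static int runChunk(int threads, int connections) throws InterruptedException {
        Semaphore pool = new Semaphore(connections);
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            executor.submit(() -> {
                try {
                    pool.acquire();            // blocks only when threads > connections
                    try {
                        Thread.sleep(20);      // stand-in for query execution
                    } finally {
                        pool.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        executor.shutdown();
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        int completed = runChunk(THREAD_POOL_SIZE, CONNECTION_POOL_SIZE);
        System.out.println(completed + " simulated queries completed");
    }
}
```

Running the same sketch with `threads` greater than `connections` makes the surplus workers queue on `acquire()`, which is exactly the blocking behavior the proof of concept had to rule out.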
What I Learned
First, thread pool sizing must align with database connection pool capacity. If these two resources are not coordinated, threads can become blocked waiting for available connections, negating the benefits of concurrency.

Second, observability should be implemented alongside performance changes. By adding timing logs early in development, I was able to analyze performance across hosts using Splunk and perform statistical comparisons quickly.

Finally, this work demonstrated that performance improvements are often bounded by downstream systems. During additional testing with 25 threads, the database began exceeding PGA memory limits, confirming that the database was the true performance ceiling for this workload.
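The timing-log approach described above can be sketched as a small wrapper. This is an assumed shape, not the original implementation: the helper and query names are hypothetical, and the key=value log format is one convention that Splunk can index and aggregate per host.

```java
import java.util.function.Supplier;

public class QueryTiming {
    // Hypothetical helper: wraps any query call and emits a structured
    // timing line after it completes, even on failure.
    static <T> T timed(String queryName, Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("query=" + queryName + " elapsed_ms=" + elapsedMs);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real JDBI call returning a row count.
        int rows = timed("load_staging_accounts", () -> 42);
        System.out.println("loaded " + rows + " rows");
    }
}
```

Because the wrapper logs in `finally`, slow or failing queries still produce a timing record, which is what makes cross-host statistical comparison possible.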
Background
As part of a broader initiative to reduce daily batch processing time from approximately 12 hours to under 1 hour, I investigated several batch jobs responsible for loading staging data from one database into memory, processing it, and writing mastered data to a downstream database.

During analysis of one batch job used as a proof of concept, I discovered that the data-loading method executed 17 JDBI queries sequentially on a single thread. Each query completed before the next began, so the wall-clock execution time for a single chunk was effectively the sum of all queries, averaging over 60 seconds to load the required data before processing could even begin.

This implementation introduced two primary problems:
- No concurrency: independent queries were forced to execute sequentially.
- Limited observability: the system lacked logging to measure query performance or diagnose slow execution.

The goal of this work was to validate whether parallel query execution could significantly reduce batch job latency while ensuring thread safety, database stability, and safe deployment practices.
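The sequential-to-parallel change can be sketched as follows. This is a simplified model, not the actual batch job: `runQuery` stands in for one of the 17 independent JDBI queries, and submitting all of them to a fixed thread pool makes the chunk's load time approach the slowest query (plus queueing) rather than the sum of all of them.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

public class ParallelLoadSketch {
    // Stand-in for one independent staging query: sleeps, then returns rows.
    static List<String> runQuery(int id) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.of("rows-for-query-" + id);
    }

    // Submit every independent query at once, then block until all results
    // are available before processing begins.
    static List<String> loadAll(int queryCount, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<String>>> futures = IntStream.range(0, queryCount)
                    .mapToObj(i -> CompletableFuture.supplyAsync(() -> runQuery(i), pool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .flatMap(f -> f.join().stream())
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = loadAll(17, 8);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("loaded " + results.size() + " result sets in " + ms + " ms");
    }
}
```

With 17 simulated 100 ms queries on 8 threads, the sketch finishes in roughly three scheduling waves (~300 ms) instead of the ~1700 ms a sequential loop would take, mirroring the sum-versus-max distinction described above.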