Reducing Batch Processing Latency via Parallel Query Execution
Status: Active
Postmortem
Metrics & Impact
- 21% performance improvement with 8 parallel threads
- 44% improvement with 20 threads
- Nearly 50% reduction in total wall-clock data loading time
- Added full query execution observability
- Introduced reusable concurrency infrastructure for batch pipelines
- Implementation now active in three batch jobs with additional rollout planned
- Concurrency model validated to scale safely across a distributed batch worker fleet
Roadblocks
To fully realize the performance gains, the database connection pool needed to be increased from 10 to 20 connections. This required validating that the parallel query execution model would not introduce:
- connection contention
- database instability
- excessive memory consumption

Once the proof of concept demonstrated safe operation, the connection pool was increased to match the thread pool size, unlocking the full performance benefits.
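The contention risk above can be sketched with a toy model. This is a minimal illustration, not the production code: the class and method names are hypothetical, and a `Semaphore` stands in for the database connection pool, so that a worker thread must hold a permit (connection) before its query can run. With the thread pool matched to the connection pool, no worker blocks waiting for a connection.

```java
import java.util.concurrent.*;

public class PoolSizingSketch {
    // Sizes taken from the write-up: the connection pool was raised from
    // 10 to 20 to match a 20-thread executor.
    static final int CONNECTION_POOL_SIZE = 20;
    static final int THREAD_POOL_SIZE = 20;

    // Model the connection pool as a semaphore; returns the number of
    // simulated queries that completed.
    static int runChunk(int threads, int connections) throws InterruptedException {
        Semaphore pool = new Semaphore(connections);
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            executor.submit(() -> {
                try {
                    pool.acquire();            // blocks only when threads > connections
                    try {
                        Thread.sleep(20);      // stand-in for query execution
                    } finally {
                        pool.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        executor.shutdown();
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        int completed = runChunk(THREAD_POOL_SIZE, CONNECTION_POOL_SIZE);
        System.out.println(completed + " simulated queries completed");
    }
}
```

Running the same sketch with `threads` greater than `connections` makes the surplus workers queue on `acquire()`, which is exactly the blocking behavior the proof of concept had to rule out.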
What I Learned
First, thread pool sizing must align with database connection pool capacity. If these two resources are not coordinated, threads can become blocked waiting for available connections, negating the benefits of concurrency.

Second, observability should be implemented alongside performance changes. By adding timing logs early in development, I was able to analyze performance across hosts using Splunk and perform statistical comparisons quickly.

Finally, this work demonstrated that performance improvements are often bounded by downstream systems. During additional testing with 25 threads, the database began exceeding PGA memory limits, confirming that the database was the true performance ceiling for this workload.
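The timing-log approach described above can be sketched as a small wrapper. This is an assumed shape, not the original implementation: the helper and query names are hypothetical, and the key=value log format is one convention that Splunk can index and aggregate per host.

```java
import java.util.function.Supplier;

public class QueryTiming {
    // Hypothetical helper: wraps any query call and emits a structured
    // timing line after it completes, even on failure.
    static <T> T timed(String queryName, Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("query=" + queryName + " elapsed_ms=" + elapsedMs);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real JDBI call returning a row count.
        int rows = timed("load_staging_accounts", () -> 42);
        System.out.println("loaded " + rows + " rows");
    }
}
```

Because the wrapper logs in `finally`, slow or failing queries still produce a timing record, which is what makes cross-host statistical comparison possible.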
Background
As part of a broader initiative to reduce daily batch processing time from approximately 12 hours to under 1 hour, I investigated several batch jobs responsible for loading staging data from one database into memory, processing it, and writing mastered data to a downstream database.

During analysis of one batch job used as a proof of concept, I discovered that the data-loading method executed 17 JDBI queries sequentially on a single thread. Each query completed before the next began, so the wall-clock execution time for a single chunk was effectively the sum of all queries, averaging over 60 seconds to load the required data before processing could even begin.

This implementation introduced two primary problems:
- No concurrency: independent queries were forced to execute sequentially.
- Limited observability: the system lacked logging to measure query performance or diagnose slow execution.

The goal of this work was to validate whether parallel query execution could significantly reduce batch job latency while ensuring thread safety, database stability, and safe deployment practices.
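The sequential-to-parallel change can be sketched as follows. This is a simplified model, not the actual batch job: `runQuery` stands in for one of the 17 independent JDBI queries, and submitting all of them to a fixed thread pool makes the chunk's load time approach the slowest query (plus queueing) rather than the sum of all of them.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

public class ParallelLoadSketch {
    // Stand-in for one independent staging query: sleeps, then returns rows.
    static List<String> runQuery(int id) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.of("rows-for-query-" + id);
    }

    // Submit every independent query at once, then block until all results
    // are available before processing begins.
    static List<String> loadAll(int queryCount, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<String>>> futures = IntStream.range(0, queryCount)
                    .mapToObj(i -> CompletableFuture.supplyAsync(() -> runQuery(i), pool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .flatMap(f -> f.join().stream())
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = loadAll(17, 8);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("loaded " + results.size() + " result sets in " + ms + " ms");
    }
}
```

With 17 simulated 100 ms queries on 8 threads, the sketch finishes in roughly three scheduling waves (~300 ms) instead of the ~1700 ms a sequential loop would take, mirroring the sum-versus-max distinction described above.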