My Research Journey into Rust & Performance: Solving the 1BRC Challenge ⚡️
I am a software developer, primarily working with Node.js, GraphQL, React, and MongoDB.
A little over a year ago, I got curious about the 1 Billion Row Challenge (1BRC). It seemed like the perfect playground to test Rust’s performance chops — 1 billion weather station measurements, aggregate per-city statistics (min, max, average), and do it as fast as possible.
At that time, I went down a rabbit hole of Rust performance research, experimenting with naïve approaches, multithreading, and low-level optimizations. I never wrote about it back then, but looking back, the lessons are worth sharing. So here’s my journey — from 12 minutes → 2 minutes → 10 seconds.
Stage 1: The Naïve Rust Approach — 12 Minutes ⏳
I began with a straightforward solution:
Load the file into a string.
Split by newline.
Parse each line into city;temperature.
Aggregate results in a HashMap<String, CityStats>.
It was idiomatic Rust, safe, and simple. But it took 12 minutes to finish.
This stage gave me a baseline, but it was clear that high-level string parsing was eating performance alive.
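For reference, a minimal sketch of the kind of baseline I started with (the function name `aggregate_naive` and the exact `CityStats` fields are my reconstruction, not the original code):

```rust
use std::collections::HashMap;

struct CityStats {
    min: f32,
    max: f32,
    sum: f32,
    count: f32,
}

// Naive baseline: treat the whole input as text and parse line by line.
fn aggregate_naive(input: &str) -> HashMap<String, CityStats> {
    let mut map: HashMap<String, CityStats> = HashMap::new();
    for line in input.lines() {
        // Each record has the form "city;temperature".
        let Some((city, value)) = line.split_once(';') else { continue };
        let val: f32 = value.parse().expect("malformed temperature");
        let stats = map.entry(city.to_string()).or_insert(CityStats {
            min: val,
            max: val,
            sum: 0.0,
            count: 0.0,
        });
        stats.count += 1.0;
        stats.sum += val;
        stats.min = stats.min.min(val);
        stats.max = stats.max.max(val);
    }
    map
}
```

Every line here allocates a `String` key and goes through UTF-8 string machinery, which is exactly the cost the later stages attack.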
Stage 2: Embracing Concurrency — 2 Minutes 🚀
My next line of research was parallelism. Rust provides great abstractions like std::thread::scope and Arc<Mutex<T>>, so I divided the file into thread-safe chunks aligned on newline boundaries. Each thread processed its own slice of the file and then merged results into a global HashMap.
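The chunking step is worth making concrete. Here is a sketch of slicing the file on newline boundaries and fanning chunks out with std::thread::scope (function names are mine, and the per-chunk work is stubbed out as a simple line count to keep the example short; the real version runs the aggregation and merges the per-thread maps):

```rust
use std::thread;

// Split `data` into roughly `n` chunks whose boundaries fall on newline
// characters, so no record is cut in half between threads.
fn chunk_on_newlines(data: &[u8], n: usize) -> Vec<&[u8]> {
    let approx = data.len() / n.max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + approx).min(data.len());
        // Advance to the next newline so the chunk ends on a full record.
        while end < data.len() && data[end] != b'\n' {
            end += 1;
        }
        if end < data.len() {
            end += 1; // include the newline itself
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

// Scoped threads may borrow `data` directly; no Arc needed for reads.
fn parallel_count(data: &[u8], n: usize) -> usize {
    let chunks = chunk_on_newlines(data, n);
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || chunk.iter().filter(|&&b| b == b'\n').count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

Because `thread::scope` guarantees the threads finish before the borrow of `data` ends, the chunks can be plain slices rather than owned copies.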
The speedup was dramatic — down to ~2 mins.
This was my first “wow” moment: Rust’s fearless concurrency makes scaling across CPU cores approachable and safe. But something was still bothering me — parsing overhead.
Stage 3: Researching Parsing Costs → Working with Bytes — 10 Seconds ⚡️
I dug deeper into how Rust handles strings and UTF-8. My research led me to an important insight:
Strings are expensive. Bytes are cheap.
Every conversion to String or &str was adding overhead. So I restructured my code to work directly on raw u8 arrays. Instead of treating the file as text, I processed byte slices and converted only when strictly necessary.
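As an illustration of how far byte-level parsing can go, here is a trick many 1BRC solutions use: temperatures in the dataset always carry exactly one decimal digit, so they can be parsed as integer tenths straight from the bytes, skipping both UTF-8 validation and float parsing. (My own snapshot later in this post still goes through from_utf8 and parse::<f32>; this sketch shows the further step, and the function name is mine.)

```rust
// Parse a 1BRC-style temperature like b"-12.3" into tenths of a degree
// (-123). Assumes the well-formed 1BRC format: optional '-', one or two
// integer digits, '.', exactly one fractional digit.
fn parse_temp_tenths(bytes: &[u8]) -> i32 {
    let (neg, rest) = if bytes.first() == Some(&b'-') {
        (true, &bytes[1..])
    } else {
        (false, bytes)
    };
    let mut value: i32 = 0;
    for &b in rest {
        if b != b'.' {
            // Accumulate digits, treating "12.3" as the integer 123.
            value = value * 10 + (b - b'0') as i32;
        }
    }
    if neg { -value } else { value }
}
```

Working in integer tenths also sidesteps floating-point accumulation entirely until the final averages are printed.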
This optimization cut execution time by more than an order of magnitude — from ~2 minutes to ~10 seconds.
At this point, profiling showed something surprising:
~4s = actual computation.
~6s = just loading data from the SSD.
That meant I had reached the I/O limit of my hardware. Any further improvement would require tricks like memory-mapped files (mmap), SIMD parsing, or asynchronous I/O.
Lessons Learned 📚
This wasn’t just about solving a coding challenge — it was a research journey into Rust’s performance model.
Naïve is necessary. My 12-min baseline gave me something to measure against.
Concurrency matters, but parsing dominates. Threads gave me my first big win, but eliminating string parsing was the real breakthrough.
I/O is king. Once your code is fast enough, the bottleneck shifts from CPU to hardware.
Rust shines in performance-critical paths. Working with raw bytes in a safe way is exactly where Rust feels both low-level and empowering.
Code Snapshot: Processing Data with Bytes
Here’s the core of my final approach:
use std::collections::hash_map::Entry;
use std::collections::HashMap;

struct CityStats {
    min: f32,
    max: f32,
    sum: f32,
    count: f32,
}

fn process_data(data: &[u8]) -> HashMap<String, CityStats> {
    let mut map: HashMap<String, CityStats> = HashMap::new();
    // Split on raw bytes; a trailing newline yields an empty final segment,
    // which the if-let below simply skips.
    for segment in data.split(|&byte| byte == b'\n') {
        let mut parts = std::str::from_utf8(segment).unwrap().split(';');
        if let (Some(city), Some(value)) = (parts.next(), parts.next()) {
            let val = value.parse::<f32>().unwrap();
            match map.entry(city.to_string()) {
                Entry::Occupied(mut e) => {
                    let s = e.get_mut();
                    s.count += 1.0;
                    s.sum += val;
                    s.min = s.min.min(val);
                    s.max = s.max.max(val);
                }
                Entry::Vacant(e) => {
                    e.insert(CityStats { min: val, max: val, count: 1.0, sum: val });
                }
            }
        }
    }
    map
}
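One piece the snapshot leaves implicit is how the per-thread maps from Stage 2 get folded into the global result. A sketch of that merge step, reusing the same CityStats fields (the `merge` function name is mine):

```rust
use std::collections::HashMap;

struct CityStats {
    min: f32,
    max: f32,
    sum: f32,
    count: f32,
}

// Fold one thread's partial results into the global map; called once per
// thread after the scoped threads have joined.
fn merge(global: &mut HashMap<String, CityStats>, local: HashMap<String, CityStats>) {
    for (city, s) in local {
        global
            .entry(city)
            .and_modify(|g| {
                g.min = g.min.min(s.min);
                g.max = g.max.max(s.max);
                g.sum += s.sum;
                g.count += s.count;
            })
            .or_insert(s);
    }
}
```

Merging after the threads join avoids contention on a shared Arc<Mutex<HashMap>> during the hot loop; each thread only touches its own local map.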
Closing Thoughts 💡
This project was less about “solving 1BRC” and more about understanding Rust at the performance frontier.
I started with high-level Rust (strings, safe iteration) and ended up optimizing down to raw bytes. Along the way, I learned how multithreading, memory access patterns, and I/O limits interact in real-world workloads.
Right now, my solution runs in 10 seconds, of which about 6 seconds are spent waiting on I/O. That means the core algorithm is blazing fast — and any further speedup requires going beyond CPU optimizations into system-level tricks.
This experience has convinced me: Rust isn’t just about safety. It’s about giving you the tools to write code that’s as fast as your hardware will allow.