Scanner: The Team Accelerating Log Analysis With Rust

This is my interview with Steven Wu, CTO of Scanner. It was awesome to hear all of the nitty-gritty details about how they've architected and optimized Scanner for super fast log search! We also talked about Scanner as a company and the exciting inflection point where the company finds itself. To see jobs available at this and other cool Rust companies, check out our extensive Rust job board.

Drew: I spent some time looking into Scanner before the interview, but the audience is going to be coming in fresh so maybe you can start with your elevator pitch.

Steven: Scanner is a petabyte-scale log search and storage tool. We basically do log search and analysis at very, very large scales for cloud-specific architectures. The way we brand it right now is as a “diet Splunk.” It's a product that's ten times cheaper than Splunk and significantly faster than Splunk. In exchange, it doesn't have a lot of the long-tail features that Splunk has. That's the current trade-off.

Steven: We sell to security people as a product that is SIEM-adjacent. I don't know if that's a familiar term, but SIEM stands for Security Information and Event Management. It's basically the tool a security analyst depends on to keep on top of things. For instance, it does detection and alerting, and it lets you run retrospective searches.

Drew: I'm curious about your background. What put you in a position to work on this problem and how did you end up working on it?

Steven: We are unusual because my cofounder and I are both just engineers with no security background. We spent a really long time not being in security. We actually started in DevOps. At our previous startup, we had a seven-person engineering team, and at some point we ended up with a Splunk bill of over $1 million. If anyone's used Splunk or Datadog or any of those tools, they know those bills scale up really fast. So we were like, “That's crazy. Someone should do something about this.” That's kind of how we got into it. We started in DevOps. We learned along the way, though, that if you're a DevOps person and you have a huge Splunk bill, you just say, “I'm just gonna stop using Splunk.” You reduce your retention to one day, pay less, and if you need data from more than a day ago, you just say, “Sorry, it's not there.” It's painful, but it's not a big deal.

Steven: So we were trying to sell to DevOps for a while, but we ended up in security because it turns out you can’t do that in security. In security, you can't say, “Whoops, it was too expensive to keep the logs.” So that's how we ended up here. And I think we’re pretty unique in the security space. A lot of security companies are started by people with deep domain knowledge in security, but not necessarily in building a database or backend infrastructure. We’re kind of the opposite. We started by saying, “We can build a database that solves this better,” and then went looking for a problem it could solve. If you look at other products in the space, a lot are built on top of ClickHouse or Snowflake. We went against the conventional wisdom—“don’t build your own database”—and actually built our own, then asked, “What problems can this solve?” So I think that’s how we’re different. I don’t know if that makes us better equipped, but it does make us differently equipped—and I think that, in itself, is an edge.

Drew: So the security aspect—which is now the main driver of your business—was kind of a pivot as you figured out where the solution was best applied. Is that right?

Steven: Yeah, basically. It turns out the security guys really need this. The DevOps guys think it’s cool, but they don’t need it. They can just throw away their logs, and it’s not the end of the world.

Drew: Classic startup trajectory—that makes a lot of sense. On that note, Scanner is a startup. I don’t have a great sense of how big you are right now, but I know you’re still in the early phase. How big is the business, and how has it been funded so far?

Steven: We’re a seed-stage, venture-backed company. The team is eight people—we have an offer out, so it might be nine soon. We’re super, super early. We’ve raised a pretty small amount of money so far.

Drew: So it sounds like Scanner is growing really well! Can you tell me a bit about the business trajectory so far? How are things going right now, and have there been any exciting recent milestones?

Steven: It took about a year and a half to build the first iteration of the product. We went pretty deep early on—building a query engine, building a storage and indexing system—so it was a slow start. We probably spent a full year just trying to figure out who to sell to. It’s only in the last two or three quarters that we’ve started to hit a real stride—where people really want to buy the product. That’s also why we’re hiring. We think we’ve found something people want, and they’re buying it. We want to get ahead of that growth so we’re not bottlenecked a year from now due to lack of engineering capacity. The kind of contracts we’re doing require some engineering support, and we really don’t want to be in a position where growth outpaces hiring and we can’t keep up. So we’re starting early.

Drew: That’s super exciting! I hope this comes through in the interview—that you’re at such a cool inflection point, right?

Steven: Yeah, it just happened, and it hasn’t really sunk in for me yet. I think, “Wait a minute, we have revenue. People actually want to buy this. We have a sales pipeline that’s quantifiable.” It still hasn’t fully clicked. I’m still in the mindset of, “I’ll just go hack on something for eight hours and go home.”

Drew: Yeah, so cool! I feel so much secondhand excitement for you!

Steven: Thank you. Yeah, I kind of feel secondhand excitement for myself too, because it still hasn’t totally sunk in.

Drew: So, one of the things that stood out when I was reading about Scanner was what you mentioned earlier: that it’s a lot faster than other tools. So what’s the trick? What’s the speed-up?

Steven: The high-level answer—which sounds kind of stupid—is that we just throw a lot of compute at it. This goes back to being cloud-native. For example, we use S3 for storage. That’s where it starts. S3 is storage-only—there’s no compute tied to it. Compare that to using something like Splunk or Datadog. In those systems, you’ve probably got physical servers or VMs on the backend—indexing servers with local hard drives, maybe some memory caches, and those machines are always on. So if you want fast queries, you need enough compute capacity running 24/7 to serve them. That means you’re effectively paying for computers to sit mostly idle—doing nothing for, say, 23 hours and 15 minutes a day—because you’re not querying constantly.

Steven: What we do instead is take advantage of the cloud's elasticity. The cloud is implicitly multi-tenant, and we can summon compute out of thin air when we need it. So, we use S3 for storage, and then when you run a query, we spin up compute very fast. The system can pull up several thousand cores almost instantly, within seconds of you hitting Enter. That allows us to throw way more compute at the problem than something like Splunk can for a similarly sized cluster—because in Splunk, you’re trying to avoid wasting CPU cycles during downtime. Snowflake is similar—you have to pre-scale your data warehouse, which means you’re again managing physical compute resources. We don’t do that. Our system is built around the idea that when you need 6,000 CPUs, the cloud just gives them to you on demand. Then we run the query. Of course, there’s a lot more going on under the hood—tight inner-loop text search logic, clever trade-offs, and a bunch of cloud-native design choices—but the core idea is: we don’t need to run compute 24/7. That’s a big part of the magic.

Drew: That makes sense. Instead of having compute that's running all the time to prepare for the moment of the query, you just summon a bunch of compute right at query time.

Steven: Yeah, exactly. And it’s interesting because the common wisdom for this kind of thing is that you want a really clever index to minimize how much data you have to read. You want to have a warm cache, keep as much as possible in memory, all of that. We don’t do any of that—because we can’t. Our compute doesn’t exist until the moment of the query. It’s completely ephemeral.

Drew: Ephemeral until the moment that you need it, right?

Steven: Right. So everything is basically a cold start, and that means we make totally different trade-offs. As an example—most indexes, like in Splunk or Elastic or whatever, will say, “Here’s this token, and it shows up in this exact byte range.” So when you search for that token, the system knows, “Oh, I just need to read this one row off disk,” and that’s super fast. Our index doesn’t work like that. Instead, it says, “Here’s this token, and it shows up somewhere in this 100MB page of data.” And the reason is that when you’re reading from S3, it takes something like 50 to 100 milliseconds just to do any read—no matter how small it is. I don’t know if you’ve seen that metaphor where L1 cache is like grabbing something off your desk, L2 cache is like getting it from the fridge, and going over the network is like flying to New York and back. Because of that latency, it almost doesn’t matter how much data we read. It takes 100 milliseconds just to get the first byte, but maybe only 10 milliseconds to read 80 or 100 megabytes after that—especially when it’s compressed. Don’t quote me on the exact numbers, but you get the idea. So, we can use an index that’s like 10x smaller, because a more detailed index just doesn’t help us. That’s the kind of trade-off I mean—we’ve made these unusual technical decisions because of the nature of the system we’re building. That’s probably the clearest example of it.
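To make that trade-off concrete, here is a rough sketch of what a coarse, page-level index can look like in Rust. Everything here is hypothetical and for illustration only; Scanner's actual index format isn't public.

```rust
use std::collections::{HashMap, HashSet};

/// Identifier for a large (~100MB) compressed page of log data sitting in S3.
/// Hypothetical type, purely for illustration.
type PageId = u64;

/// A coarse index: each token maps to the set of pages it appears in,
/// not to exact byte offsets. The index stays roughly an order of
/// magnitude smaller, and a query pays one high-latency S3 read per
/// candidate page, then scans the whole page in memory, which is cheap
/// once the bytes have arrived.
#[derive(Default)]
struct CoarseIndex {
    token_to_pages: HashMap<String, HashSet<PageId>>,
}

impl CoarseIndex {
    /// Record that `token` occurs somewhere in `page`.
    fn add(&mut self, token: &str, page: PageId) {
        self.token_to_pages
            .entry(token.to_owned())
            .or_default()
            .insert(page);
    }

    /// Pages that need to be fetched from S3 and scanned for `token`.
    fn candidate_pages(&self, token: &str) -> impl Iterator<Item = PageId> + '_ {
        self.token_to_pages
            .get(token)
            .into_iter()
            .flat_map(|pages| pages.iter().copied())
    }
}
```

An exact index like the ones Steven mentions would map each token to precise byte ranges instead; the coarse version trades query-side scanning, which is cheap when latency dominates, for a much smaller index.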

Drew: Yeah, that makes a ton of sense. I usually see this kind of thing more at the business level, where you start making a bunch of unique decisions and they all click together into something differentiated. But it’s cool to see that at the technical layer you’re getting that same kind of effect.

Steven: Yeah, and going back to the “build your own database” thing—when you do that, you get to make your own choices. If you’re using something like Snowflake or ClickHouse, those systems have already made a bunch of decisions based on the kinds of problems they are trying to solve—and those problems aren’t exactly the same as ours.

Steven: For example, both Snowflake and ClickHouse are SQL databases. They’re column-oriented, have schemas, support complex joins, and offer transactional guarantees. But when you’re building a log search system, you don’t need any of that. You don’t need read-after-write consistency because you’re not updating logs. Logs are, by definition, append-only and mostly immutable. If you are updating your logs, something’s probably wrong—or you're doing something sketchy. So we can build a system with totally different trade-offs. And those are trade-offs you just can’t make when you’re using off-the-shelf tools. With ClickHouse or Snowflake, it’s a black box—you feed it SQL, it gives you an answer. Sure, you can experiment and figure out what types of queries run faster, but you can’t crack it open and say, "Actually, I don't need ACID compliance" or "Actually, I don't care about transactions at all because I never make updates." You don’t get to say those things—because it’s not your system. And that’s what we think gives us an edge here. By building our own database, we can make different, very intentional trade-offs that improve both performance and user experience.

Drew: Yeah, that makes a ton of sense. You mentioned earlier that Scanner is built on top of S3, and I was just curious about that choice—if you could speak to it a bit.

Steven: Yeah, so I touched on this earlier, but the main reason we use S3 is that it’s really cheap. There’s another, more surprising reason as well. At our scale, where we’re ingesting dozens of terabytes of data a day and have thousands of compute cores spinning up and down quickly, the biggest challenge is actually networking. One of the great things about S3 is that it’s already a distributed system. On the back end, Amazon has already taken care of all the distribution, sharding across data centers, replication, and failover. We don’t need to worry about any of that. A lot of our bottlenecks actually come from network provisioning. When you have 6,000 cores spinning up and they need to communicate with each other and do reads, the network traffic can become a big issue. Even AWS’s network, which scales quickly, takes about 60 seconds to scale up, and by then our cores are already done with their tasks. So we do a lot of micro-optimizations around network routing inside the data center. It’s a bit silly that we have to do this, but it’s a huge performance factor. The benefit of using S3 is that we don’t need to worry about the storage layer’s networking. It’s abstracted away, and we know it works because it’s S3—reliable and widely regarded as the gold standard for cloud storage at scale. There are a lot of competitors to S3 now, and they’re really good, but S3 is still very reliable and inexpensive. If we had to manage our own hard drives, we’d have to handle replication, failover, network availability, and all those details. So that’s the main reason. The downside is that S3 can be slower, but the slowness we encounter is latency, not bandwidth, and for us, bandwidth matters far more than latency.

Drew: S3 is such an interesting product. It’s kind of become an accidental protocol for storage.

Steven: Yeah, that’s actually true. It’s not why we chose it, but we did realize later that a lot of people are already moving their stuff into S3 data lakes. So it actually turned out to be a great fit for us. Our interface just reads from S3, processes the data, and writes it back into S3—all within the user’s AWS account. That makes the whole process a lot easier, and I think that’s a nice benefit. I wish I could say we totally knew that going in, but it was a bit of an accident. Still, it’s been a really good fit.

Drew: Since these interviews are primarily for people looking at companies they might want to work for, I always like to ask: What’s the most interesting engineering problem your team is working on right now? It helps give a sense of the exciting aspects of the job.

Steven: One big thing we’re focused on right now is our query engine. It’s built on something similar to MapReduce but simpler, with the goal of minimizing communication. One major challenge we’re tackling behind the scenes is supporting something like joins. Joins are always a tough problem for query tools, so that’s probably one of the most interesting long-term projects from an academic standpoint.
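As a rough illustration of that shape (the names here are hypothetical, not Scanner's actual engine): each worker maps over its own pages to produce a partial result, and partial results merge with an associative reduce step, so workers never have to talk to each other mid-query.

```rust
/// A partial aggregate produced independently by one worker.
#[derive(Default)]
struct PartialCount {
    matches: u64,
}

impl PartialCount {
    /// Merging partials is associative, so results can be combined
    /// in any order without coordination between workers.
    fn merge(mut self, other: PartialCount) -> PartialCount {
        self.matches += other.matches;
        self
    }
}

/// "Map" phase: scan one page of log lines for a search term.
fn map_page(page: &[String], needle: &str) -> PartialCount {
    PartialCount {
        matches: page.iter().filter(|line| line.contains(needle)).count() as u64,
    }
}

/// "Reduce" phase: fold all the partial results into a final answer.
fn reduce(partials: impl IntoIterator<Item = PartialCount>) -> PartialCount {
    partials
        .into_iter()
        .fold(PartialCount::default(), PartialCount::merge)
}
```

Joins are harder precisely because they break this shape: rows from different pages have to meet somewhere, which reintroduces the communication the design tries to minimize.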

Steven: On a more technical level, we also do a lot of work with zero-copy deserialization. To explain that a bit: a serialized JSON object is laid out completely differently in memory than your in-application data structure. When you deserialize it, the system has to copy a lot of bytes around, and sometimes the strings are escaped, so you have to rewrite them. Zero-copy deserialization is about writing the data into a buffer whose layout matches your structure directly, so when you “deserialize” it, you just pointer-cast into the buffer and it works. The benefit is huge: for us and for other tools, a lot of the compute load comes from serializing and deserializing data, whether it’s coming over the network or from S3. By avoiding that extra step, we can sometimes cut more than half the CPU cost of an operation.
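Here is a minimal sketch of that idea, assuming a simple fixed-layout record header. It is illustrative only; Scanner's actual formats and libraries aren't public, and a real implementation also has to think about endianness, alignment, and validation.

```rust
/// Hypothetical fixed-layout header for a record in a binary buffer.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct RecordHeader {
    timestamp: u64,
    payload_len: u32,
    flags: u32,
}

/// Reinterpret the front of a byte buffer as a RecordHeader without
/// copying any bytes: "deserialization" becomes a pointer cast.
fn read_header(buf: &[u8]) -> Option<&RecordHeader> {
    if buf.len() < std::mem::size_of::<RecordHeader>() {
        return None;
    }
    let ptr = buf.as_ptr() as *const RecordHeader;
    if (ptr as usize) % std::mem::align_of::<RecordHeader>() != 0 {
        return None; // misaligned; a real implementation would handle this case
    }
    // SAFETY: length and alignment are checked above, and every bit
    // pattern is a valid u64/u32, so the reference is sound.
    Some(unsafe { &*ptr })
}
```

Crates like rkyv and bytemuck package this pattern up more safely, but the win is the same either way: the deserialization step stops costing CPU.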

Steven: Another thing we focus on is networking optimizations, like improving how we route traffic between availability zones in AWS VPCs. We also do a lot of text search work—stuff that’s close to regex, but not quite the same. Some of that is off-the-shelf, but a lot of it we build ourselves because our use case isn’t exactly covered by regular expression search capabilities. Lastly, we do a lot of work with approximate data structures. For example, we have our own HyperLogLog implementation. Should I assume people know what that is, or should I explain it?

Drew: Explain it.

Steven: So HyperLogLog is a probabilistic data structure used to solve the distinct count problem. Normally, if you want an exact distinct count, you’d use a set and store everything in it. The downside is that it scales linearly in memory with the number of things you’re counting. For example, if you have a trillion rows in your data set, each with a UUID, you’d need to hold a trillion UUIDs in memory, which can crash your computer. The idea behind HyperLogLog is that instead of storing every value, you hash each one and just keep track of the smallest hash you’ve seen. When you need to know how many distinct items you’ve seen, you look at that smallest hash and count how many leading zeros it has. The number of leading zeros can be used to estimate the distinct count. The math works because, on average, the longest run of leading zeros you’ve seen correlates with the number of distinct things you’ve encountered: if you’ve seen one thing, it will be around one, and if you’ve seen three things, it will be around two. This lets us count trillions of things with very little memory. It’s a space-efficient data structure, and while it’s not novel R&D—there are papers and libraries available for it—we ended up writing our own version, because we want to make specific trade-offs that align with how our query system works. We have to create custom versions of these data structures to match the particular demands of our system. I hope that explanation wasn’t too technical, but that’s the gist.
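For a feel of the mechanism, here is a toy, single-register version of that idea in Rust: track the longest run of leading zeros seen across hashes and invert it to estimate the count. A real HyperLogLog splits the hash across many registers and combines them with a harmonic mean plus bias correction for far better accuracy, and Scanner's custom implementation isn't shown here; this is just the intuition.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy single-register distinct-count estimator. Constant memory,
/// very noisy; purely for illustrating the leading-zeros trick.
#[derive(Default)]
struct LeadingZeroEstimator {
    max_leading_zeros: u32,
    seen_any: bool,
}

impl LeadingZeroEstimator {
    fn insert<T: Hash>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let zeros = hasher.finish().leading_zeros();
        self.max_leading_zeros = self.max_leading_zeros.max(zeros);
        self.seen_any = true;
    }

    /// About 2^k distinct items tend to produce a longest run of
    /// roughly k leading zeros, so invert that to estimate the count.
    fn estimate(&self) -> u64 {
        if !self.seen_any {
            return 0;
        }
        1u64 << self.max_leading_zeros.min(63)
    }
}

fn main() {
    let mut est = LeadingZeroEstimator::default();
    for i in 0..100_000u64 {
        est.insert(&i);
    }
    // Prints a rough power-of-two estimate of 100,000 distinct items.
    println!("estimated distinct count: {}", est.estimate());
}
```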

Drew: No, I think it's great. The type of person who’ll be motivated by this interview is exactly the type of person who will be excited by this kind of work.

Steven: Yeah, exactly. We have a lot of projects like this.

Drew: So, the reason I first found out about Scanner is because you guys are using Rust. I’d love to hear more about how much Rust you’re using and where it fits into your tech stack.

Steven: So, Rust is our entire backend. We don’t use any other backend languages. By "backend," I mean everything: querying, ingestion, and the parser for our custom query language, which also runs in our frontend via WASM. Pretty much everything deployed to a server is a Rust binary. We did have a Node.js microservice for about two months during a migration, but that was the only time we’ve used anything other than Rust for the back end. So yeah, everything’s Rust. I can explain why we chose it, but I think the audience probably already knows.

Drew: I think the reasons are clear, but it’s always good to be reminded. Tell me more about your personal experience with Rust. Were you using it before you started with this?

Steven: I’d done a little hobbyist Rust before starting at Scanner. I’ll admit, in the early days of Scanner, about three or four years ago, there was a lot of cloning and a lot of Arc usage. But now we really love Rust because it’s highly performant and gives us fine-grained control over bytes, which is crucial for us since we do a lot of byte-level work. For example, we use a custom, non-Unicode binary string format to store token boundaries. This gives us faster text searches around token boundaries at query time, which is useful for how our index works. It would be a real pain to do this in higher-level languages, because their ergonomics for byte manipulation can be really bad. In contrast, Rust lets us do this kind of low-level work safely. Some of it requires unsafe code, but Rust’s memory safety features help us manage that risk: outside those small unsafe blocks, we can share memory confidently, knowing the code is safe because it compiles.
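As a small example of the kind of byte-level work that means (the real Scanner format isn't public, so this uses a made-up encoding where a 0x00 sentinel separates tokens):

```rust
/// Hypothetical encoding for illustration: tokens within a page are
/// separated by a 0x00 sentinel byte, so boundary-aware search is plain
/// byte scanning with no UTF-8 decoding and no allocation.
fn tokens(buf: &[u8]) -> impl Iterator<Item = &[u8]> {
    buf.split(|&b| b == 0x00).filter(|t| !t.is_empty())
}

/// Check whether `needle` appears as a whole token in the encoded buffer.
fn contains_token(buf: &[u8], needle: &[u8]) -> bool {
    tokens(buf).any(|t| t == needle)
}

fn main() {
    let page = b"error\x00connection\x00refused\x00";
    assert!(contains_token(page, b"connection"));
    assert!(!contains_token(page, b"connect")); // no partial-token match
}
```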

Steven: Also, I just really enjoy Rust as a language. I love the design and the trade-offs it makes. The fact that things are explicit, like how Option and Result are designed, appeals to me. I love that Option is a monad, and I love that it has methods like map, just like Vec does. It can also be turned into an iterator. There’s a lot of consistency in Rust’s design, and I think it’s one of the best languages I’ve worked with. There are a few parts of async that I’m not fond of, but that’s a conversation for another time.
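A tiny example of the consistency he is describing, using nothing but the standard library: Option has map, behaves like a zero-or-one-element collection via into_iter, and composes with the same iterator adapters Vec does.

```rust
fn main() {
    let maybe_port: Option<&str> = Some("8080");

    // map transforms the value inside the Option without unwrapping it.
    let port: Option<u16> = maybe_port.map(|s| s.parse().unwrap_or(0));

    // An Option converts into an iterator of zero or one items...
    let as_vec: Vec<u16> = port.into_iter().collect();

    // ...and slots into the same iterator pipelines as any other container.
    let configured: Vec<u16> = vec![Some(80), None, Some(443)]
        .into_iter()
        .flatten()
        .collect();

    println!("{:?} {:?} {:?}", port, as_vec, configured);
}
```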

Drew: Yeah, I know a lot of people have gripes about async. I'm really glad you shared all that detail though. There's more depth than I expected. The last set of questions I had prepared is about company culture. How would you describe your current team?

Steven: Well, we're small, so it's a bit hard to describe without talking about our values and culture. What we value most is kindness. On one hand, it's because I think kindness is virtuous, but on the other hand, there are real benefits to it. When everyone on the team is operating in good faith, you can accomplish things you couldn’t otherwise. There are parts of the solution space you can’t explore unless people are all on the same page. At bigger companies, you have to convince a bunch of people to get things done. While there’s some correlation between persuasive ideas and good ones, it’s not perfect. The magic of a startup is that you get to work on ideas that might not be obviously persuasive but are still good. You don’t have to convince a bunch of VPs, which allows for more creativity.

Steven: So, we emphasize kindness, good faith, and empathy because it lets us explore parts of the solution space we wouldn’t if we had a team full of brilliant but difficult people. A useful analogy is the difference between a database and a blockchain. A database doesn’t need to worry about bad actors, but a blockchain does. Because blockchain systems need Byzantine fault tolerance, they’re much slower and more complicated. The same is true for companies. If you don’t need to account for bad faith actors in your team, things become much easier. Big companies often struggle with simple tasks because they have to bring in external consultants just to make the obvious decisions.

Steven: I haven’t actually described the team yet. We’re eight people right now: five senior engineers, one ops person, no dedicated sales person (our CEO handles that), one designer, and a product person. As for culture, it’s not about everyone being super close friends, but rather about caring for each other. It’s a culture where people are empathetic to each other’s emotional needs and engage in good-faith discourse, assuming we’re all on the same page—not just about wanting to make a great product, but also about genuinely caring for one another. That last part is hard to achieve, but once you have it, it makes everything else much easier.

Drew: I really appreciate that you lean into kindness. It’s clear from my interactions with Alex and now with you that it’s important to you guys. So, you’re doing some hiring now and likely more in the future. What do you look for in new hires?

Steven: I touched on some of this with the culture, but the most important thing for us is that we don’t want to hire assholes. We don’t care how smart you are if you’re an asshole. That’s number one. We also have pretty good talent density right now, and that’s an advantage. If you have twice as many people but they produce the same amount of work, there’s a qualitative difference in the kind of work you can do. More people means more overhead, and it’s less nimble. So we want people who are kind, empathetic, and ideally good at coding. I know that’s very generic, but those are the main things. Right now we’re leaning a bit more towards senior people because the work we do can be pretty deep. I wouldn’t say it isn’t junior-friendly, but we do a lot of very tight database work, or tasks where understanding how computers work at multiple levels of the architecture is crucial. We do work where understanding vectorization, SIMD, and things like cache lines matters, because that’s the level we’re operating at sometimes. Of course, we also do tasks like running CRUD queries against databases and printing out results. But the features we’re building tend to be deep, technical ones, so we’re looking for people who can dive into that kind of work. I’m not sure if that answers your question properly, but that’s what we’re looking for.

Drew: That’s a good answer, and I don’t think you need to worry about the elitism piece. There’s an element of not wanting people to be overwhelmed by problems they might face, right? It’s fair to say that these problems could be overwhelming for someone just starting out.

Steven: Yeah.

Drew: With this next question, feel free to speak as much or as little as you're comfortable with, but how do you think about compensation? Things like equity, benefits, that kind of stuff?

Steven: We try to do as well as we can in that regard. One stat I can share is that nearly everyone employed by us is making more at Scanner than they did at their previous job in terms of cash compensation. Now, we obviously can’t match big tech companies like Google. If you’re an L7 at Google, we probably can’t match that. But we don’t want to be stingy. We want to hire good people who are skilled at what they do and compensate them fairly—maybe even slightly more than their worth. A lot of companies deny people raises, and those people end up leaving, meaning the company loses productivity. Why go through that when you can just pay people what they’re worth? It’s cheaper in the long run than trying to stick with low compensation, hiring new people, and losing productivity. So, overall, I think our compensation is pretty good.

Drew: Got it. That makes sense. One thing I noticed is that you’re hybrid in the Bay Area, and a lot of companies are moving to a full return-to-office model. Why have you chosen hybrid?

Steven: The reason we’re hybrid, and not fully remote, is because people want to have a day where they can see each other. Our policy is to have one day—Tuesday—where everyone goes into the office. The rest of the week is flexible. It turns out people enjoy coming in. Remote work is great, but it can also feel lonely. We’ve found that, at our scale, there’s a lot more operational efficiency when people are in the same room. Things like impromptu conversations lead to good ideas and decisions, which wouldn’t be as easy over Zoom. It’s zero overhead to discuss an idea during lunch rather than scheduling a meeting. We didn’t set out to have a hybrid model for principled reasons. It’s more about doing what works for our team. We’re not a “butts-in-seats” company—we sell technology, not office hours. We’re hybrid because it’s what people are comfortable with, and as long as the work gets done, we’re flexible.

Drew: That makes sense. Especially as a startup, there’s that “in the foxhole together” mentality.

Steven: Yeah, exactly. Just having people physically there with you in the foxhole—it’s symbolic. It’s different than calling people over Zoom and trying to connect that way.

Drew: Totally. Okay, last question: Is there anything you wish people knew about Scanner that we didn’t touch on?

Steven: Not really. I think we’ve covered the main points!

Drew: Well thank you Steven!

Steven: No problem!

Know someone we should interview? Let us know: filtra@filtra.io