Building A New Public Cloud With Rust

This is my interview with Senyo Simpson, Senior Software Engineer at Fly.io, a new public cloud company. We talked about the biggest technical challenges Senyo has taken on at Fly, balancing Rust vs. Go, Fly's impressively distributed culture, and leveraging speaking engagements to bootstrap a career with Rust. To see jobs available at this and other cool Rust companies, check out our extensive Rust job board.

Drew: I think a lot of people know what Fly.io is, but probably plenty of people don't. So, maybe we can just start with you explaining what it is.

Senyo: Fly.io is a public cloud. We like to say it's a new public cloud really focused on developer experience. The idea is that when you're using our cloud, a lot of the options and configurations just make sense out of the box. A very good example of that is networking: setting up your VPN and all the configuration so that different compute instances can talk to each other. We've got a model for that that just works. You don't have to, for instance, configure your own VPN or security groups or any of the more low-level pieces of infrastructure. Obviously, over time, we've figured out what works and what doesn't work in the public cloud space. But, overall, Fly is very invested in the developer experience.

Senyo: Within that large umbrella, we have two main sides of the company. We have our Platform as a Service (PaaS) offering, which is what most people will use when they interact with Fly. That's similar to a Heroku or a Railway, where you get a turnkey deployment system. We take care of all the infrastructure for you. On the other side, we have what we call our Machines product, Fly Machines. That's a lower-level compute primitive that you can drive with an HTTP API. So, you can start, stop, and create different compute instances. You can also have things like auto-suspend. And, you can bunch them together in ways that make sense for you.

Senyo: For example, some of our customers might want to have a pool of machines ready for when requests come in. They can then easily route requests to different instances in this pool to achieve whatever objective they have. Others we’ve seen will just create new instances to run an AI code generator. So, you can launch a container (or, specifically in our case, a microVM) that has whatever you want ready; it comes up when a request comes in and spins down once it's done. So, Fly Machines is a much lower-level primitive that gives you more control. The Platform as a Service is built on top of Fly Machines.

Drew: You mentioned at the very beginning that the big focus is on developer experience. As someone who works on the platform, do you build a lot on top of your own platform just to have the experience of how it works?

Senyo: It's generally recommended by the company to run some of your stuff on our platform so that you get more exposure to it. On top of that, I've also been part of certain products that have used the platform directly. For instance, we're doing something called Fly Kubernetes, which is a Kubernetes distribution that runs on our platform. We didn’t build it with much special integration into the system; we just built it on top of the Fly Machines API. So, we were just a regular consumer of Fly Machines. Through those products, which are heavy users of our API, I've gotten to learn the rough edges and what works well. We also ended up building a lot of additional features to meet the requirements that came up in these projects. For example, we now have expanded container support. Before, we didn't have support for multiple containers on one machine.

Drew: I feel like Fly.io fits in this big trend I see of Rust being used for cloud and infrastructure. Some of the large players in the space have made significant, very public investments in Rust. And then there are many companies like Fly as well.

Senyo: Yeah, absolutely.

Drew: From your perspective, what is that about? Is it a performance thing or a reliability thing or all of it?

Senyo: Well, I think for most companies it's a performance thing. Rust just does a really good job when you need high-performance, low-latency services. I think that’s how it started for us. There are aspects of our systems where it just makes sense to use Rust. And then in other places we just use Go. Normally, the thing that makes us take that next step to use Rust is performance.

Drew: So how did you end up working at Fly.io?

Senyo: There was a time I was giving a lot of talks on Rust. Through that, somewhere along the line, the CTO of Fly, Jerome, became mutuals with me on Twitter. I wanted to work somewhere where I could do low-level infrastructure. And, I saw somewhere that Fly pays global salaries. That means that no matter where you are, you get paid the same amount. I thought that was really cool. So, those two things made me interested in working there. At the time, I was on a short sabbatical, looking for my next thing and focusing on my master's. I DM'd Jerome and just asked him, "What is the possibility of me working here?" I'd seen some other people join by initially making open-source contributions to Flyctl, the CLI that we use. Anyway, three weeks went by and he just popped me a DM saying, "Do you want to interview for this role that does Rust development in the company?" I said yes and that was that. Initially, I was doing my master's full-time. So, I started working part-time and then went into a full-time gig after a few months. It felt very lucky.

Drew: Well, I think there's some luck to it, but you said you became mutuals with him because you were doing a lot of talks on Rust, right?

Senyo: Yeah, pretty much.

Drew: That seems important. How did you get into that?

Senyo: I was tweeting a lot about Rust, and I started following the Rust London Meetup account on Twitter. They reached out to me at some point to give a talk. It was during Covid, so everything was online, which made it very easy to give talks anywhere. So that was my first talk. From there, I started noticing that people would mention different venues where you could apply to give a talk. I already had a talk from Rust London, so I just gave that talk at other places.

Drew: The reason I asked about that is because you said it felt like luck, but I feel like you did a lot of things to maximize your luck.

Senyo: Yes, yes. I do believe in that, for sure.

Drew: Giving the talks got you in front of a lot of people and helped start your relationship with Fly. Then, you were brave enough to message Jerome and actually talk about a job. Obviously a lot of our audience is people looking for jobs in Rust, and I think it's good for people to hear these stories about people like you being a little bit more creative.

Senyo: Yeah, absolutely.

Drew: Has Fly hit any exciting recent milestones that you would want to talk about?

Senyo: One that's close to my heart is that we just publicly launched Fly Managed Postgres, our managed Postgres offering on our compute platform. That has taken up a lot of my time for almost a year. Getting to this stage has been a labor of love. That's probably the big company milestone that I would mention, but it also feels like a personal milestone. Obviously, it's not just me; there's a whole team around it. But, I did a lot of the initial work, so it feels closer to home.

Drew: For whatever reason, there's a lot of love in the Rust community for Postgres.

Senyo: Yeah, for sure.

Drew: Did you use Rust heavily in that project?

Senyo: No, that one was basically all Go. The area where we use Rust most is in our proxy. We have a global distributed proxy that routes your request to your instances. And then we also have another global state propagation system called Corrosion. Corrosion basically distributes the state of our platform. For example, if you create new instances, we want to know what region those are in. That gets spread across our platform through Corrosion. That's also performance-sensitive in certain aspects. And then the last thing that uses Rust is the init process. We have a custom init in our Fly Machines. That’s the first process, and it manages everything else. Those are the three places that we use Rust at the moment.
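To make "global state propagation" a little more concrete, here is a minimal sketch of one common technique for this kind of eventually consistent replication: a last-writer-wins map that nodes merge whenever they exchange state. This is a generic illustration under that assumption, not Corrosion's actual design or schema; the Entry fields, the merge function, and the region codes in the usage are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical record of where one machine lives (illustrative fields only).
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub region: String,
    /// Logical version: a higher version wins.
    pub version: u64,
    /// ID of the node that wrote the entry; breaks ties between equal versions.
    pub writer: u32,
}

/// Each node keeps a map from machine ID to the latest entry it has seen.
pub type State = HashMap<String, Entry>;

/// Merge a peer's state into ours with last-writer-wins semantics.
/// Because (version, writer) is a total order, merging in any order
/// converges to the same result on every node.
pub fn merge(local: &mut State, remote: &State) {
    for (id, theirs) in remote {
        let replace = match local.get(id) {
            None => true,
            Some(ours) => (theirs.version, theirs.writer) > (ours.version, ours.writer),
        };
        if replace {
            local.insert(id.clone(), theirs.clone());
        }
    }
}
```

For example, if node A thinks machine m1 is in jnb at version 1 and node B has m1 in ord at version 2, both nodes end up agreeing on ord after they merge each other's state, regardless of who merges first.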

Senyo: How it relates to Managed Postgres is that we needed to have machines that could run multiple Docker images or containers. For context, for those who don’t know, a Fly Machine is actually a microVM. It's backed by Firecracker. So, it's not a container. We can run multiple containers in one of these microVMs. And, we needed that capability for our Managed Postgres solution. Someone else ended up building that whole system, but that also uses Rust and has been a bit of an adventure.

Drew: I hadn't thought about this question before, but does Fly have its own data centers?

Senyo: We have co-location partners. So, we'll put compute in certain regions. We manage all of the software infrastructure, but the actual hardware is managed by the co-location partner. We're in maybe 30 or 40 different regions now.

Drew: You mentioned that the Postgres project has been mostly Go. Is the reason that you use a lot of Go because you feel like Rust takes that little bit of extra effort to do?

Senyo: It's partially that a lot of our Managed Postgres system isn't extremely performance-sensitive. You can think of it like an orchestration tool. We manage all of the replication and failover and all that. Obviously, you could do it with Rust. And, one of the beauties of Rust is you can use it at every layer of the stack if you want to. But, we are quite a big Go shop. Go is kind of the natural tool that we lean to unless we have these requirements that push us over the edge to use Rust. Rust is difficult to learn and it's not as easy to use. Most of the company's logic is written in Go, and Go is easy to learn.

Drew: Do you know what originally led Fly to adopt Rust?

Senyo: The first project that adopted Rust was the proxy. Actually, I think it might have been the init, but the one that made Rust a big feature of our platform was the proxy. We have proxies deployed across all of the regions. When requests come in, they hit the proxy, and the proxy knows whether your application's instance is running in the Johannesburg region or Chicago or Sydney or wherever, depending on where your request comes in. That system has to be performant. You get tons of network requests coming in, and they all need to be proxied to wherever in the world the target instance is running.
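The routing decision described here can be sketched as a lookup: given the region where a request arrived and the set of regions where the app has running instances, pick the one that is closest. This is a minimal sketch assuming a simple table of estimated round-trip times between regions; RoutingTable, its methods, and the latency numbers are hypothetical and not how Fly's proxy is actually implemented.

```rust
use std::collections::HashMap;

pub struct RoutingTable {
    /// App name -> regions that currently have a running instance.
    instances: HashMap<String, Vec<String>>,
    /// (from, to) -> estimated round-trip time in ms (illustrative values only).
    rtt_ms: HashMap<(String, String), u32>,
}

impl RoutingTable {
    pub fn new() -> Self {
        RoutingTable { instances: HashMap::new(), rtt_ms: HashMap::new() }
    }

    pub fn add_instance(&mut self, app: &str, region: &str) {
        self.instances.entry(app.to_string()).or_default().push(region.to_string());
    }

    pub fn set_rtt(&mut self, a: &str, b: &str, ms: u32) {
        // Store both directions so lookups don't depend on argument order.
        self.rtt_ms.insert((a.to_string(), b.to_string()), ms);
        self.rtt_ms.insert((b.to_string(), a.to_string()), ms);
    }

    fn rtt(&self, from: &str, to: &str) -> u32 {
        if from == to {
            return 0; // Serving from the edge region itself is cheapest.
        }
        // Unknown pairs are treated as "very far" rather than unreachable.
        *self.rtt_ms.get(&(from.to_string(), to.to_string())).unwrap_or(&u32::MAX)
    }

    /// Pick the instance region with the lowest estimated RTT from the edge
    /// region where the request arrived, or None if the app has no instances.
    pub fn route(&self, app: &str, edge: &str) -> Option<&str> {
        self.instances
            .get(app)?
            .iter()
            .min_by_key(|region| self.rtt(edge, region))
            .map(|r| r.as_str())
    }
}
```

With instances in jnb and ord, a request landing at an lhr edge goes to ord if ord has the lower estimated RTT, while a request arriving in jnb is served locally. A real proxy would also weigh load, health, and capacity; the sketch only shows the "nearest region" idea.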

Drew: Has the way that you think about Rust at Fly or the relationship that Fly has with Rust changed at all over time?

Senyo: I think it's mostly been that same thing where we use it when we need more performance. The other benefit is that you get a predictable performance profile. You could use a language with a garbage collector for your proxy, and it would probably be pretty fast. But, you end up with pauses when your garbage collector runs, and you can get unstable latencies popping up in your network requests. Obviously, you don't want that for customer workloads. And, you want your system to behave in predictable ways. That's a huge part of why Rust is good for what we do.

Drew: How do you decide who works in what language?

Senyo: Our company is structured in kind of an unusual way, designed to give a lot of autonomy. People kind of choose their own projects. So, if there's a Rust project and a Go project, the people that know Rust are always excited to work on Rust and will kind of gravitate toward those projects. However, we’ve seen that people that would like to work on the proxy but don't know much Rust will tend to work on Go projects for a longer time. I think that speaks to the ramp-up for learning Rust. I think a lot of people have a hard time justifying taking the time to go up that learning curve when they could be productive in Go right away.

Drew: What are some of the most interesting technical problems you've faced lately?

Senyo: The most interesting one that I've worked on personally actually happened in my first three or four months. So, we basically have a mesh network. All of our hosts are connected to each other. So, all of our compute instances can talk to each other directly through what is basically a private networking system. But, this system uses IPv6. It's powered by WireGuard. Long story short, we wanted to add this private networking that got routed through the proxy and then to your instances. But, because of the way everything was working together, you basically couldn't set it up so that a single host could talk to multiple other hosts, because the prefixes were the same. WireGuard doesn't allow you to have multiple endpoints with the same network prefixes. So, we do this weird trick where we have BPF on our hosts that swaps some parts around, which then makes things unique. So, we wanted to do the same thing for this routing through the proxy. For two weeks, every day I would literally sit down to learn about tcpdump and all the tools to monitor when packets go through a system. Eventually, we figured out that we had to redirect the packets manually to the specific interface that a machine was listening on. Then, we figured out a hack: if you reroute the packets through localhost, the kernel will reprocess the packet and route it wherever it's supposed to go. That was two tough weeks of painful debugging, but that feature has gotten a lot of mileage. It’s called Flycast. It was one of the most fun technical problems I've worked on, but also super frustrating. It made me realize that networking as a whole is quite frustrating. Things just disappear, and there's no way to really figure out where your packets have gone.
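The prefix collision is easier to see with a toy model of WireGuard's cryptokey routing, where each allowed-IP prefix belongs to exactly one peer and outgoing packets go to the peer with the longest matching prefix. In this sketch, assigning an identical prefix to a second peer moves it away from the first, so only a more specific (unique) prefix lets both peers stay reachable. It is a simplified model, not WireGuard's implementation (which uses a trie), and it does not model Fly's BPF rewriting; AllowedIps, the host names, and the example prefixes are hypothetical.

```rust
use std::net::Ipv6Addr;

/// Keep only the first `len` bits of an address.
fn mask(addr: Ipv6Addr, len: u8) -> u128 {
    let bits = u128::from(addr);
    if len == 0 { 0 } else { bits & (u128::MAX << (128 - len as u32)) }
}

pub struct AllowedIps {
    /// (network bits, prefix length, peer name)
    entries: Vec<(u128, u8, String)>,
}

impl AllowedIps {
    pub fn new() -> Self {
        AllowedIps { entries: Vec::new() }
    }

    /// Assign a prefix to a peer. If another peer already owns the identical
    /// prefix, that peer loses it: one prefix can only route to one peer.
    pub fn insert(&mut self, net: Ipv6Addr, len: u8, peer: &str) {
        let key = mask(net, len);
        self.entries.retain(|(n, l, _)| !(*n == key && *l == len));
        self.entries.push((key, len, peer.to_string()));
    }

    /// Longest-prefix match: which peer should an outgoing packet go to?
    pub fn lookup(&self, dst: Ipv6Addr) -> Option<&str> {
        self.entries
            .iter()
            .filter(|(n, l, _)| mask(dst, *l) == *n)
            .max_by_key(|(_, l, _)| *l)
            .map(|(_, _, p)| p.as_str())
    }
}
```

If host-a and host-b both claim the same /48, every packet for that range ends up at whichever peer claimed it last; giving host-a a distinct, more specific /64 makes its traffic routable again. Making prefixes unique is the general idea behind the trick described above, whatever the exact rewriting Fly does.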

Drew: So you had to get smart on networking fast.

Senyo: Yeah. I started off from zero knowledge. On top of that, I had to figure out a bit of how WireGuard worked, and I had never written BPF or eBPF in my life. So, that's the job. It’s fun and exciting.

Drew: As we've been talking you've been dropping little details about Fly's culture. It sounds like Fly is a really unique company. Can you lay out those unique things?

Senyo: The biggest thing is that Fly is a highly autonomous culture. You kind of get a top-level directive like “We want to build Managed Postgres” and then teams organize themselves and figure out how they want to solve that problem. We don't do sprints or agile or kanban. You kind of wake up and you know what's important and you figure it out. Or, you pick up work that people have wanted to do but hasn’t been done yet. Over time, you pretty much learn what's important to move a project forward.

Senyo: A good example is that there are some improvements that we want to make to our Platform as a Service. There are like six different high-level things we can do. For example, we can make logs better, we can make pushing images faster, etc. No one will tell me, "Senyo, you're in charge of doing image pulling speed for Docker images." You can just decide, "I think that's an interesting problem", and if there’s buy-in to solve that problem, you can go ahead.

Drew: That's actually really cool.

Senyo: Also, the teams themselves are responsible for the way that they want to work. There could be a team that does agile. Not many engineers love agile. So, almost no one does it. But, you can use GitHub project boards to track your work or set roadmaps for your team. If you decide that that's important for you, there are a lot of levers you can pull. Sometimes, you know, you'll pull certain levers if things are really urgent or need more tracking. When we were building Managed Postgres, there were points where there was so much to do that it was hard to keep visibility over everything. So we'd use a GitHub project board for that period of time, but then like three weeks later it wouldn’t be necessary anymore and would kind of fade away. You do always kind of have a manager above you or a team lead. But, the shape and form is largely up to us as engineers.

Drew: I feel like when people read this their desire to work for Fly is going to go up significantly. It sounds like a dream for an engineer in a lot of ways.

Senyo: Generally, it is. It's interesting, though, because there are people who have joined who just don’t vibe with the culture at all. Some people like a lot more structure than others. One way this shows up is that a lot of people want to be told how long something should take. You can get into this weird zone where maybe you've been doing work for a long time that's not getting finished. Some people are uncomfortable with that. The way that Fly works is that you just communicate around the problem and explain why things are taking a long time. And, usually at that point people will chime in and decide whether it's worth cutting scope or whatever. Some people don't like that autonomy. For a lot of engineers, it makes sense. For a lot of other engineers, it's like, "I just kind of want to be told what to do."

Drew: I think almost every engineer probably thinks they want to work in an environment like that, but maybe only half or something could actually thrive in an environment like that.

Senyo: Yes. That's pretty much a good way to describe it. I think it works, but it works for the people it works for. I also think this type of culture can break down if you have too many people that need a lot of structure.

Drew: The other thing that you mentioned, which I think plays into this, is that Fly is remote. Is that right?

Senyo: Yeah, fully remote.

Drew: How does that play into things?

Senyo: Fly is a fully remote, globally distributed company, so we have people all over. The biggest thing that comes with that is this natural constraint where you cannot communicate synchronously 99% of the time. You can, but it'll usually be an inconvenient time for somebody. So, most of our communication happens async. The nice thing that Fly has done, which I think has made it actually possible, is we use Discourse. We have Slack for messaging, but Discourse is like a forum platform. Any big pieces of work or anything that needs more input from the company gets put on Discourse. Also, every week or every other week, we have a synchronous all-hands. That happens at two different times to hit the different time zones. One week it'll happen at like 5 PM my time, and then the next week it'll be like 2 AM my time. So, you just join whichever one is more convenient for you. There's also no expectation to work U.S. hours. There can be times in a project’s lifecycle where that gets difficult though. So, sometimes I just don't work in the morning and work from the afternoon to the evening because my American counterparts are online at that time.

Drew: It almost sounds like it's kind of similar to open source in a way. You're managing work through written discussions on Discourse. Another way you've already hinted that Fly is different is compensation. Can you speak about that?

Senyo: Yeah, so Fly has the same compensation wherever you live. Each level has a salary and an equity amount, and that's that. It doesn’t vary from location to location, and there's no range. It's a very straightforward compensation system. The four levels are junior, intermediate, senior, and staff. Depending on where you live, if the company can provide benefits, they will. Otherwise, what they do is just add some percentage of compensation on top of your base.

Senyo: I guess I should also mention how PTO works. PTO is unlimited, but you have to take at least two consecutive weeks at some point in the year. It’s really nice that they actually tell you to take time off. I think unlimited PTO usually ends up being whatever the culture at the company is. Luckily, people do take leave at Fly.

Drew: So I imagine that for you to get paid an American salary in South Africa is pretty awesome.

Senyo: Yes. It's super sweet and is one of the big reasons why I enjoy Fly. A lot of people say money is not important, but it really is a factor. I enjoy working at Fly because I get paid a good amount of money relative to the cost of living here and I genuinely enjoy the company, the culture, the people, and the work.

Drew: Is there anything about the culture that we haven't talked about?

Senyo: One thing I will say is that Fly seems to have no fear of complexity. If there's some tooling that we're using that's just not cutting it, people aren't scared to say we can build it ourselves. I used to be of the mindset that we should avoid building things ourselves as far as possible. But, with Fly, if we hit a breaking point and no solutions seem to fit, we'll do our own thing. Part of that is probably being an infrastructure company, but I just really like it.

Drew: That makes a ton of sense for a company that's building a public cloud. To me, it almost feels like building a public cloud company is very bold. It seems almost crazy to think like, "Hey, you know, every big tech company is in this business and we're going to go in here and we're also going to compete." You have to be kind of fearless to do that.

Senyo: Yes, absolutely. I think with building a public cloud, it makes sense that if you can make something that much better by building your own thing, you've got to do it.

Drew: Yeah, exactly. Well thanks so much for talking Senyo!

Senyo: Thank you!

Know someone we should interview? Let us know: filtra@filtra.io