Rewriting CAD in Rust

The following is my interview with Adam Chalmers of Zoo. If you're anything like me, you'll be fascinated to hear how they're creating a new paradigm in CAD. You'll also learn a lot about Adam's time at Cloudflare and why Cloudflare is in love with Rust. These interviews, along with the Rust Jobs Report, are part of our effort to educate about the state of the Rust job market. To see jobs available at this and other cool Rust companies, check out our extensive Rust job board.

Drew: You’re an Australian living in Texas. How did that happen?

Adam: I did a study abroad semester at the University of Texas, and I had a great time. So, I was living in Austin that semester, and I made a bunch of friends. When I graduated from university in Sydney, I had a job offer from Atlassian. That was going to start in February, but I graduated in November. So, I did some traveling. While traveling, I bumped into my friends in Austin and stayed with them. They happened to have a job opening at their company, so I applied. I was still fresh on my interview questions, data structures and algorithms and so forth, so it didn’t sound too bad. I ended up finding out that Americans pay software engineers much better than Australians. There’s just much more demand here. It’s also cheaper to live in Austin than Sydney, which has similar housing problems to New York and San Francisco. I knew that I wanted to end up in the U.S. for work anyway, so when that job came through my friend it just happened faster than I thought.

Drew: I haven’t ever talked to someone about immigration in these interviews. Have you been able to get citizenship?

Adam: I’ve got a green card, but I haven’t been here long enough to get citizenship. I’m not sure I will end up pursuing citizenship. It might be tough. I’m already an Australian and a German citizen. I don’t know if I could get away with a third citizenship.

Drew: Okay. The green card gives you permanent resident status though, right?

Adam: Yeah, exactly. So, citizenship basically would give me the ability to vote. I figure if I want to engage politically there are more high impact ways than just casting one vote. Any kind of political action will probably move more than one vote. So, that works fine.

Drew: Is a green card fairly easy to obtain as a software engineer?

Adam: That was actually through marriage. As a software engineer, it would have been more difficult. However, there is a special visa class that applies to Australians. It’s called the E-3 visa. So, if you’re Australian, you have a tertiary degree, and a job offer that requires that degree, you can usually get that visa. But, there isn’t really an immigration pathway. So, it’s assumed that you would move back to Australia eventually. If I wanted to immigrate through work, I’d have to convert the E-3 to an H-1B and then start applying through the green card lottery. Or, I could become so talented that a company would be willing to go through the legal work to prove that they need me.

Drew: Having studied your blog a bit, it sounds like you have a long history with Rust. How did you get involved with Rust?

Adam: A friend told me I should try Rust. I went through a real programming language dilettante phase where I was picking up all sorts of different languages. This was in 2017, which was also my first year working as a professional after graduation. I was doing Advent of Code. Are you familiar with AoC?

Drew: I’m familiar. I haven’t done it yet.

Adam: They’re a lot of fun, especially if you’re experimenting with new languages. I started doing Advent of Code in Haskell. But, my coworker said I should try out Rust. He was a C++ guy, but he was also very interested in functional programming and really writing bulletproof code. So, he really liked Rust because it gave him performance and correctness. I didn’t care so much about performance, but I liked that I could use all my functional programming idioms. And, unlike pure functional languages, I could also just drop down to using something simple like let mut. You know, sometimes you want to just open up a file and write some bytes out. You don’t want to necessarily learn a whole IO monad system and kludgily tie together a solution. So, I tried Rust then. I thought it was interesting, and I liked it, but I didn’t use it for anything.

Adam: A while later, my same friend left RetailMeNot and went over to Cloudflare. He always talked about how fun it was to work there. So, I moved to Cloudflare and worked for him. At that time, Cloudflare was in the middle of a big push to Rust. For the work there, I had some options of what I could use, but I really liked Rust compared to Go. It allowed me to still use functional programming ideas. I like algebraic data types in particular. And, I could do this all without losing performance or going to a weird system like Haskell where no one else would be able to read my code or I would lose tools like package management.

Drew: That starts to answer one of my other questions. Cloudflare does a huge amount of Rust at this point. What spurred Cloudflare to adopt Rust so heavily?

Adam: I think you’re one of the few people who understands just how much Rust there is at Cloudflare, because you actually have the data. So, I’ve known for ages that Cloudflare was one of the biggest players in Rust, but not a lot of other people did know that. For a long time now, I’ve always told people looking for Rust jobs to check out Cloudflare. So, it's great to see filtra always highlighting that.

Adam: Toward your question specifically, Cloudflare is many different things. There’s a CDN, there’s a DNS system, a DDoS system, and all of these things are really latency sensitive. The whole model of Cloudflare is that it sits between the users of your website and your website itself. So, fundamentally, it's always adding another hop for the packets. So, it’s really important that all the applications in Cloudflare be very latency sensitive. Because of these performance requirements, Cloudflare was mostly written in C++. Then, we had a really bad bug because of C++! There was this vulnerability called Cloudbleed. You can look it up. I think it was similar to Heartbleed, the OpenSSL problem. Basically, Cloudflare was leaking private data from users to the visitors of the website. Obviously, as soon as they understood this, they rushed to mitigate it and understand the damage. The next question was, “How do we prevent things like this from happening again?” Basically, they made the decision that Cloudflare no longer does memory unsafe languages. We rewrote everything we could out of C++ into safer languages. At the time, the only option really was Go. So, Go became essentially the blessed language at Cloudflare for all new projects. There were a few projects at Cloudflare where the Go garbage collection was going to be a genuine concern. The people on those projects sought out alternatives. Around that time, Rust was starting to mature, and some of the teams I mentioned started putting in the effort to use it themselves. Slowly, as the case for it grew, people started building out Rust integration in all of the Cloudflare tooling and such.

Adam: So, because Cloudflare has these dual obsessions around preventing bugs and minimizing latency, Rust really grew there. If you’re trying to optimize on those two fronts, Rust is really the only game in town.

Drew: That’s an incredible story, and I’m surprised I’ve never heard that before. That so precisely explains when and where Rust can really be a no-brainer.

Adam: I think I saw a Reddit comment recently on a Cloudflare article about deprecating Nginx in favor of a home-written Rust proxy. The commenter said something about how every post about Rust in production seems to be something about a proxy. Some Tokio person replied that if you’re a proxy, you really need maximal security and maximal performance. So, Rust is kinda the only game in town.

Drew: What were you doing with Rust at Cloudflare (as much as you’re able to speak to)?

Adam: When I joined, I was on the Cloudflare Tunnel team, which I can only describe as production-grade ngrok. It’s a way to put your services on the internet without exposing any ports, opening anything up, or putting any firewalls on your origins. In fact, you don’t need to have a public IP address. You can totally run it from a Raspberry Pi on your local network. Say you have a little Python server that you want to put on the internet without buying an IP address. You run your Python server on localhost:8080, and then you install cloudflared from some package manager, and it connects to your local service and the Cloudflare CDN. Whenever a request comes in for your website, Cloudflare gets the request and does all of the usual Cloudflare stuff, then it proxies that request over a series of long-lived TCP connections to cloudflared running on your home network. Then, it all goes back up the chain. Because cloudflared makes an outgoing request to the Cloudflare CDN, you don’t have to have any ports open on your server.

Adam: On top of Tunnel, we built a lot of software defined networking stuff. So, you could run multiple instances of cloudflared that all get load-balanced. You can map it to private IPs within your home network. I joined that team just as it was leaving beta. So, the first year was implementing a whole bunch of features to make it usable for real users. Load balancing was the big one. We also added free tiers for people to try it.

Adam: In the second year, we started to get real traffic. That meant we basically spent the second year fighting fires. As part of that, we started using Rust for a monitoring framework. It would periodically start a tunnel and make sure everything was working.

Adam: In the third year, we finally got enough breathing room to rearchitect the system. So, at that point we rewrote the backend in Rust. It was previously in Go. We used Actix web and Diesel to connect to the database.

Adam: In my fourth year at Cloudflare, I switched to a team that was looking for someone who already knew Rust and Cloudflare. Later, I actually got to start my own team. So, we were part of the Gateway team, working on a product doing Data Loss Prevention. This is a product for very large companies who want to have total visibility into traffic being sent out through their network. For example, if someone hacked in and wanted to steal all of your user data, you can basically scan the content of all of the responses leaving your network. And, if anything looks suspicious, you can block it or log it and take action later. So, obviously this is doing TLS interception. It’s something that makes sense for corporate devices, but it's not something you’d want on the broader internet for privacy reasons. So, we wrote yet another proxy in Rust for that! But, this proxy could run various scans on the traffic as it was moving in and out of the network. So, that’s kinda my history at Cloudflare.

Drew: Sounds like you had a great run there!

Adam: I really loved working there. They have such a depth of talent. I could always ask really detailed questions about fundamental systems built decades before and end up finding someone who was literally there when the decisions were made.

Drew: So, my last question about your involvement with Rust. What’s the deal with this new String type you’re trying to standardize? What was it called? Strang?

Adam: (Laughing) Ahhhh, Strang! When paid verification first happened on Twitter, I changed my name from Adam Chalmers to Rust, and I changed my profile picture to the Rust profile picture and started doing what some might call fake news or disinformation, but I considered it just a subtle form of comedy. I did things like announce that Strings weren’t confusing enough so we were adding a new String type called Strang. Or, there was one that was like, “We’ve heard your feedback and we’re now going to allow you to have two mutable borrows of the same data at the same time.” For those who aren’t familiar with Rust out there, the fact that the language doesn’t let you do that is kinda the whole point!

Drew: That is really funny! So were you the one that got the foundation to start rewriting the trademark rules?!

Adam: Oooh, yeah! Totally. (Laughing) I hope not…

Drew: I’m only kidding. I’m sure it wasn’t you.

Adam: (Laughing) If it was me, wow, I am sorry. That spiraled way out of control.

Drew: Okay, we have to talk about Zoo. How did the company get started?

Adam: I love this story, because it’s such a great way for a company to start.

Adam: So, Jess Frazelle is a software engineer with a great background working on things like Docker fundamentals, Kubernetes, the Go language, and stuff like that. She co-founded a company called Oxide, which is a pretty big name for those who follow the Rust space. Oxide makes their own servers and racks. As part of the process of designing the servers, they had to model things. So, they had CAD files where they were modeling these racks in 3D. These were huge CAD files.

Adam: A lot of CAD software out there is built on the same fundamental mathematics that haven’t really changed since the 1970s. It’s called a CAD kernel. One of the problems with these kernels is that they were really designed for a different time. They can’t take advantage of all the cores in modern CPUs, or the massive parallelization from GPUs. And they have a lot of other performance problems. So, for example, when you copy something in a CAD program, sometimes it’s enough to know that the part is identical to a previous part and the program can just make a pointer to the previous part. But, a lot of the time, it's not. Sometimes when you save the file, the file format can’t understand pointers. So, everything has to basically be serialized. Now, when you have racks of thirty-two servers in the case of Oxide, you’re duplicating memory by 32x. It can take over a day to open some of these files! Imagine having to sit and wait for that to open, or having to plan out that far in advance just to open the file.

Adam: When Jess was struggling with this, she called up her friend Jordan Noone, who is the CTO of Relativity Space, which 3D prints rockets. She basically said, “Hey, I hate using this CAD software. I must be doing something wrong. What am I doing wrong?” He said something like, “No, that’s right. This is just the state of CAD.” Both companies were being held back by the state of CAD. So, they wrote this open letter asking who was solving this problem. They got lots of responses back saying that this needed to be solved but none saying that it was being solved. So, Jess started a big research deep dive into CAD. She even published a paper about it with the ACM. Both Jess and Jordan at this point felt like their companies were pretty well established, so they started Zoo.

Drew: That makes so much sense. I was digging around a lot trying to find founding stories and such, and I never found that.

Drew: So, when I was researching, I noticed that Zoo is really creating a new paradigm for CAD, where the designs are code defined and the code that defines the design can be maintained just like any other code. Can you explain how that works a bit?

Adam: Yeah, one big thing is that this company has a lot of software engineers, but the user likely won’t be a software engineer. Rather, our users will be mechanical engineers who’ve done some programming, perhaps in university, but haven’t really touched it a lot outside of that. So, why would we make a code-CAD product if that’s our user? It seems like something software engineers would do, but it doesn’t seem like something hardware engineers want. But, we think that code is really well suited to CAD for a number of reasons. For one, there is no version of git for CAD. I’ve talked to so many of my friends in civil engineering who say that when they join a new project they email someone and ask for a copy of the latest file. Once they’ve made some changes, they email it back, and it ends up being called something like bridge_2_final.stl or something. So, you can’t do any real version control on these files. If you want to see the changes between two files, you basically open them up side by side and eyeball it. If you represent the CAD model in code, you can use the existing tools like git and you get diffs for free. We’ve built a product on top of this that does visual diffing.

Adam: The other big benefit of using code is the amount of repetition in CAD. I mentioned the example of the server rack where you have to duplicate the server thirty-two times. Well, what if you want to adjust some spacing on those servers? All of this is basically just math, and code is a notation for doing math. So, it makes sense to store it as code because you don’t get that serialization step I mentioned before. The code will understand pointers and know not to inflate the memory 32 times.
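To make the "code is a notation for doing math" point concrete, here is a minimal sketch in Rust (not in Zoo's own language). The `Rack` struct, its field names, and the millimeter values are all hypothetical; only the count of thirty-two servers comes from the interview. The idea is that one server description plus a spacing rule replaces thirty-two stored copies, so adjusting the gap is a one-number change.

```rust
/// Hypothetical rack description: one server size plus a spacing rule,
/// instead of thirty-two separately stored copies of the server.
struct Rack {
    servers: u32,
    server_height_mm: f64,
    gap_mm: f64,
}

impl Rack {
    /// Vertical offset of the bottom of each server, computed on demand.
    fn offsets_mm(&self) -> Vec<f64> {
        (0..self.servers)
            .map(|i| f64::from(i) * (self.server_height_mm + self.gap_mm))
            .collect()
    }
}

fn main() {
    // Illustrative numbers only.
    let mut rack = Rack { servers: 32, server_height_mm: 40.0, gap_mm: 4.0 };
    println!("top server starts at {} mm", rack.offsets_mm()[31]);

    // Adjusting the spacing is a single edit; every position follows from it.
    rack.gap_mm = 5.0;
    println!("after respacing: {} mm", rack.offsets_mm()[31]);
}
```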

Adam: Using code does introduce a usability problem. So, what we’re doing is a dual panel layout that’s like a markdown editor where you see the code on the left and the 3D panel on the right. You can update the model either visually or through the code. So, you can do all the visual things you’re used to, but you’ll see the code updating as you go and start to learn it. I think that bidirectional editing is going to be the key for addressing usability.

Drew: That makes so much sense to me. Of course, I’m a software engineer, so it’s probably going to make sense to me! But, I think it makes sense for the problem as well.

Adam: We really think this is going to be worth it. So, we’re trying to make the transition work. I’ve been talking a lot with Josh who is our main solutions engineer. He’s a mechanical engineer, and when I first showed him a demo of the language and that you could do fully procedural design, he was confident that people were going to find that modality powerful enough to deem it worth learning the code.

Adam: That brings me to the other really big reason we wanted to go with code. If you want the computer to help you design something, you can’t just say “make another one of these.” But, if you’re using code you just describe the properties of the object.

Adam: Also, we’re living in the time of really rapid progress in AI. And, you can’t really tell an AI to generate a CAD file, but if you have the model defined in code, you can use Copilot. So, we’ve created our own language for this, and it's incredible to see that Copilot has started to understand some of it. The potential for AI integrations is cool as well.

Drew: What’s your role at Zoo and how did you end up there?

Adam: Well, I’ve followed Jess on Twitter for ages. She posts great tweets. When I was joining Cloudflare, I was looking for information on containers and found a bunch of Jess’s talks about Docker. So, I started following her there. Eventually, I decided it would be awesome to work on anything Jess was working on. At some point, Jess posted that she was hiring Rust engineers to work on the HTTP backends at Zoo, and that sounded perfect for me. So, I started working there pretty quickly after.

Drew: I think there’s something to that strategy you used where you sort of plotted your career based on wanting to work with Jess. I feel like sometimes we underestimate how much of an impact the people we work with have.

Adam: Absolutely. There are definitely people I know through Twitter or other social media who’ve done similar things. I’ve actually been able to place people at Cloudflare by posting job offers on social media. One of my coworkers at Cloudflare got the job just because he saw an “I’m hiring” link at the end of one of my posts.

Adam: I started at Zoo working on the backend servers, which are all Rust. We’re very Rust forward at Zoo. I still mostly work on the backend stuff. But, I do whatever is needed. We use a lot of wasm on the frontend as well, so I’m now doing a lot of wasm compiled from Rust. For example, I’ve been working on programming language tooling lately.

Drew: The language tooling sounds like fun.

Adam: It’s really fun. It’s not something I’ve ever done before, but I really like parsers. I think they’re fun to write.

Drew: They are!

Adam: I wrote a lot of blogs about Nom back in the day. However, I never actually got to use it in production. But, I actually just finished writing the parser and tokenizer in Winnow which is a new fork of Nom.

Adam: I don’t have any background in programming languages, but when I joined Zoo I saw a demo of this code-CAD thing. It took me a couple of weeks to realize how exciting it was. At some point I realized that it was this really interesting new idea, because we were making our own language for it. That meant that we didn’t have a lot of the baggage that a normal programming language has, because we’re not compiling to an executable. We’re basically “compiling” into API calls. So, we didn’t need facilities for things like opening a file and writing bytes or making an API call. It’s basically a pure, functional language. So, it can be really perfectly tuned for just CAD work. This makes the language a really interesting playground for programming language ideas. For example, we aren’t doing this now, but in the future I’d love to have rich measurement types instead of general purpose numbers. So, you could have a distance type that cannot be mixed with a quantity type. For instance, you shouldn’t be able to say that you want two kilometers gears.
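The "rich measurement types" idea can be sketched with Rust newtypes. This is only an illustration of the concept in Rust, not KCL and not anything Zoo has shipped; `Millimeters`, `Count`, and `row_length` are hypothetical names chosen for the example. A distance and a quantity are distinct types, so mixing them up is a compile-time error instead of a wrong number.

```rust
use std::ops::Add;

/// A distance. Wrapping the number stops it being confused with a plain count.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Millimeters(f64);

/// A quantity of discrete things, such as gears.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Count(u32);

/// Distances can be added to distances, and to nothing else.
impl Add for Millimeters {
    type Output = Millimeters;
    fn add(self, rhs: Millimeters) -> Millimeters {
        Millimeters(self.0 + rhs.0)
    }
}

/// Hypothetical helper: total length of `n` gears placed side by side.
fn row_length(gear_width: Millimeters, n: Count) -> Millimeters {
    Millimeters(gear_width.0 * f64::from(n.0))
}

fn main() {
    let width = Millimeters(12.5);
    let gears = Count(4);
    println!("{:?}", row_length(width, gears));

    // Both of these are rejected by the compiler:
    // row_length(gears, width); // arguments swapped: expected Millimeters, found Count
    // let _ = width + gears;    // no `Add<Count>` implementation for `Millimeters`
}
```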

Drew: I always like to ask people for an example of a really hard problem they’ve had to work on recently. Does anything come to mind?

Adam: As I mentioned, I recently rewrote the parser for Zoo Language (KCL). It was originally written in TypeScript, and then we ported it to Rust to make it wasm-capable. So, I’ve been rewriting it to use Winnow, that new fork of Nom. That was probably the longest project I’ve done so far. The parser was doing all of this manual recursion, and converting that over to Winnow took me about two or three weeks. That was hard, but I was lucky that it was a really well tested codebase to begin with. So, I’d just switch over the parser and start parsing a subset of the language, see which tests were failing, and then I’d figure out why and add the next feature. I got to like 90% of the tests passing, and at that point I had implemented all the features, so all that was left were just bugs in my implementation. For the next week or so, every time I fixed one thing I would break another. A lot of that was because KCL, despite being relatively simple, is enough of a general purpose language that the parsing is pretty complex.

Adam: One interesting thing about KCL is that we do a lot of machine generation of code, because whenever a user does something like even just dragging a line around, we have to change the code. This was actually the reason we wanted to do our own programming language in the first place, because we needed an AST that we could modify based on what the user is doing in the visual editor. And, we have to do that all while keeping things similar to the previous source code. For example, when you update something, we don’t want it to remove all of your comments. Most programming languages just remove comments before they compile. In our case, comments are included in the AST. So, now we have to do things like parse comments.
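As a rough illustration of why comments have to live in the AST, here is a toy sketch in Rust. It is not KCL and not Zoo's parser: the two-statement mini-language, the `Node` enum, and the `parse`, `unparse`, and `set_value` functions are all invented for this example. A "visual edit" changes one value in the tree, and regenerating the source keeps the user's comment because the comment is an ordinary node.

```rust
/// Hypothetical, tiny AST: comments are ordinary nodes, not discarded by the parser.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Comment(String),
    Let { name: String, value: f64 },
}

/// Parse one statement per line: either `// text` or `let name = number`.
/// Lines that match neither form are skipped to keep the sketch short.
fn parse(src: &str) -> Vec<Node> {
    src.lines()
        .filter_map(|line| {
            let line = line.trim();
            if let Some(text) = line.strip_prefix("//") {
                Some(Node::Comment(text.trim().to_string()))
            } else if let Some(rest) = line.strip_prefix("let ") {
                let (name, value) = rest.split_once('=')?;
                Some(Node::Let {
                    name: name.trim().to_string(),
                    value: value.trim().parse().ok()?,
                })
            } else {
                None
            }
        })
        .collect()
}

/// Turn the AST back into source text, comments included.
fn unparse(ast: &[Node]) -> String {
    ast.iter()
        .map(|node| match node {
            Node::Comment(text) => format!("// {text}"),
            Node::Let { name, value } => format!("let {name} = {value}"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Stand-in for a visual edit: change the value bound to `target`.
fn set_value(ast: &mut [Node], target: &str, new_value: f64) {
    for node in ast.iter_mut() {
        if let Node::Let { name, value } = node {
            if name.as_str() == target {
                *value = new_value;
            }
        }
    }
}

fn main() {
    let mut ast = parse("// rack width in mm\nlet width = 10");
    set_value(&mut ast, "width", 25.5);
    // The comment survives the machine edit.
    println!("{}", unparse(&ast));
}
```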

Adam: So, anyway, having an AST that’s flexible enough to support not just parsing but code generation as well was more complicated than I expected, especially given that I don’t have any background in this stuff. So, I did a lot of learning on the fly.

Drew: Those are the best and worst projects. It’s so overwhelming, but it’s also so fulfilling.

Adam: Absolutely. So, when we rewrote the lexer, I think it was 10,000x faster. I found a quadratic behavior that when removed accounted for most of the speedup. If you spend much time on the Rust Reddit, you’ll often see people posting something about their Rust code being slower than Go or JS or something. And, one of the common performance footguns is trying to get the nth character of a string. You have to decode the string's UTF-8 into code points to get the character, and that is an O(n) process. So, if you’re in a loop and you call for the nth character n times, that is going to be quadratic. It ends up being better to parse into characters once and then just have those characters in a vector that you can always access. It’s an easy trick, but it can make a dramatic difference. And then in the parser, we got about a 20x speedup just by rewriting it to use Winnow.
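The footgun Adam describes can be shown in a few lines. This is a minimal sketch, not the actual KCL lexer: the two function names and the digit-counting task are made up for illustration. The slow version calls `chars().nth(i)` inside a loop, which re-decodes the string from the start each time; the fast version decodes once into a `Vec<char>` and indexes it.

```rust
/// Quadratic: `chars().nth(i)` walks the UTF-8 string from the start on every call.
fn count_digits_slow(src: &str) -> usize {
    let len = src.chars().count();
    let mut digits = 0;
    for i in 0..len {
        // Each lookup is O(i), so the whole loop is O(n^2).
        if let Some(c) = src.chars().nth(i) {
            if c.is_ascii_digit() {
                digits += 1;
            }
        }
    }
    digits
}

/// Linear: decode once into a vector, then every lookup by position is O(1).
fn count_digits_fast(src: &str) -> usize {
    let chars: Vec<char> = src.chars().collect();
    let mut digits = 0;
    // Indexing by position is deliberate here, to mirror a lexer that
    // needs random access to "the nth character".
    for i in 0..chars.len() {
        if chars[i].is_ascii_digit() {
            digits += 1;
        }
    }
    digits
}

fn main() {
    // Non-ASCII input is why byte indexing alone cannot find the nth character.
    let src = "let ångle = 42 + 7";
    assert_eq!(count_digits_slow(src), count_digits_fast(src));
    println!("digits: {}", count_digits_fast(src));
}
```

Both functions return the same answer; only the cost differs, and the gap grows with the length of the input.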

Drew: That sounds like a really cool project.

Adam: It was. I think I’m actually going to put up a video walkthrough of the code. All of the language tools are open source, so you can go into the Zoo modeling app and start using it. Or, you can also take a look at the code and see how all the AST and parsing stuff is working. I think we’re only the third major user of Winnow, so that walkthrough might be a good tutorial for Winnow also.

Drew: Okay, yeah. I saw that you’ve started posting some videos. What are you calling those?

Adam: I think we’re going to call it Zoo's Rust Club. That’s at least what we’re calling it internally. Every week at Zoo I do an hour of Rust learning for the people at the company. And, we always record them because we have people in different time zones. We realized that these would actually be great standalone content for YouTube or elsewhere. So, they’re on YouTube. We just posted an hour of Winnow 101 basically, which I think is the first video content on Winnow. I’m happy we’re able to share this stuff to let people outside of the company learn from it too.

Drew: One more thing I wanted to ask about Zoo was just about your personal transition. How was it moving from a large corporation to a very new startup?

Adam: It was actually pretty straightforward. The biggest difference was going fully remote. Cloudflare traditionally had a very butts-in-seats approach, but of course the pandemic made them rethink things. The pandemic years ended up being really good for Cloudflare just because of the general internet growth. So after that period, Cloudflare decided they were all-in on remote work. They still have offices if people want to work from them, which I did. I used to go to the office twice a week, and I enjoyed just being on the same whiteboard with my manager. So, I was hybrid there. But, Zoo is entirely remote.

Adam: Being entirely remote has its perks and its downsides. For example, you can hire from anywhere. Kurt Hutten, who I believe was Zoo’s first hire, was a big code-as-CAD user before. He lives in a small town in Australia. So, that was hugely beneficial. Similarly, we were able to hire a guy named David who lives in Cambridge and worked on GPU drivers for Arm. He wrote the Rust gltf crate. glTF is an important format for 3D files. And, we could just hire him where he was as well. I do miss being able to have a conversation in person with peers and sharing the same whiteboard and such. We actually just got back from our first ever Zoo all-hands. So, we all went to Malibu for a week.

Drew: Oh yeah, you mentioned that. That must have been great!

Adam: Yeah, it was a lot of fun. But anyway, the biggest change was going from hybrid to fully remote. In terms of the size, Cloudflare operated in small teams, and the teams did all of their own deployment. So, it felt small. So, moving to Zoo, which is a smaller organization overall, didn’t feel all that different. I’d also seen my team at Cloudflare, the Data Loss Prevention team, go from 2 people to 7 on a startup-ish roadmap.

Adam: The one other thing that is different going to a smaller company is there’s no platform to support you. So, we have one guy who manages our Kubernetes. And, apart from that, we have to manage our own stuff. There’s no database expert team or something like that we can go to. But, there’s also really never been a better time for small teams. Cloud computing has made things much easier, and cloud itself has gotten a lot easier because of CI. We run everything off of GitHub Actions.

Drew: That’s the end of the questions I had prepared, but I always like to ask if there’s anything you wish we had addressed?

Adam: You hit a lot of the stuff I wanted to talk about. There is one thing I’ve been thinking about right now that I’ll mention. It’s about the distinction between traits and enums. So, we’re making an API for CAD. Zoo runs in the cloud so that you don’t have to buy an expensive machine to run it. If you’re going to do that, the API has to be really well documented. So, we have an API server that we generate an OpenAPI schema from. And, we use a library called Dropshot to keep the schema up-to-date and accurate. From that OpenAPI schema, we also generate clients. Since the schema is always correct and the clients always come from the schema, we know the clients are always correct. So, that keeps the backend, the client, and the docs all in sync. It’s very cool. But right now, I’m running into this issue where I have a list of commands that we can accept. It makes sense for that list to be an enum where we have one variant per command. And, this gets used for deserialization. But, I want them to each return a different response type. Enums can’t have a method which returns a different type depending on the variant. For that, you’d need traits, where each implementation of the trait can have a different associated type for its Response. But, then I still need something to provide all the possible types to try deserializing. So, you still need an enum. So, there’s this mismatch that’s been in the back of my head. Perhaps someone reading this interview will have a great idea of what to do instead. I’m curious how other Rust APIs are solving this. In part, this is complicated because of the OpenAPI docs. We need to be able to generate a standard input and output for everything. We have something that’s working for now where we use some macros to fix some of the expressivity problems, but I wish we had a more ergonomic system.
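The enum-versus-trait mismatch Adam describes can be sketched as follows. This is an illustration of the tension, not Zoo's API or their macro-based workaround: every type and field name here (`Extrude`, `Export`, `AnyCommand`, `AnyResponse`, and so on) is hypothetical, and the `run` bodies are placeholders. The trait gives each command its own response type; the enum gives a deserializer one closed list of variants; bridging the two means a second, mirrored enum for responses and a hand-written match.

```rust
/// Hypothetical response types; the real API's types are not shown in the interview.
#[derive(Debug, PartialEq)]
struct ExtrudeResponse {
    solid_id: u32,
}

#[derive(Debug, PartialEq)]
struct ExportResponse {
    bytes: Vec<u8>,
}

/// Trait side: each command names its own response type.
trait Command {
    type Response;
    fn run(&self) -> Self::Response;
}

struct Extrude {
    distance: f64,
}

struct Export {
    format: String,
}

impl Command for Extrude {
    type Response = ExtrudeResponse;
    fn run(&self) -> ExtrudeResponse {
        // Placeholder logic, only so the sketch produces a value.
        ExtrudeResponse { solid_id: self.distance as u32 }
    }
}

impl Command for Export {
    type Response = ExportResponse;
    fn run(&self) -> ExportResponse {
        // Placeholder logic, only so the sketch produces a value.
        ExportResponse { bytes: self.format.as_bytes().to_vec() }
    }
}

/// Enum side: one closed list of variants, which is what a deserializer needs.
enum AnyCommand {
    Extrude(Extrude),
    Export(Export),
}

/// An enum method has a single return type, so the responses need a mirrored enum.
#[derive(Debug, PartialEq)]
enum AnyResponse {
    Extrude(ExtrudeResponse),
    Export(ExportResponse),
}

impl AnyCommand {
    /// The boilerplate that must be kept in sync for every new command.
    fn run(&self) -> AnyResponse {
        match self {
            AnyCommand::Extrude(cmd) => AnyResponse::Extrude(cmd.run()),
            AnyCommand::Export(cmd) => AnyResponse::Export(cmd.run()),
        }
    }
}

fn main() {
    // Statically typed path: the caller gets an ExtrudeResponse directly.
    let typed = Extrude { distance: 3.0 }.run();
    // Dynamic path: the caller gets AnyResponse and must match on it.
    let dynamic = AnyCommand::Export(Export { format: "stl".to_string() }).run();
    println!("{typed:?} {dynamic:?}");
}
```

Adding a command in this arrangement touches four places (the struct, the trait impl, and both enums), which is the kind of repetition macros are typically used to hide.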

Drew: I like that you finished off with a question for the audience. I’ll be very interested to see if someone reads this and has a solution.

Adam: Well, thanks so much for giving me the chance to talk Rust, Drew.

Drew: Sure thing. Thanks for doing it with me.

links:

1. A New Era for Mechanical CAD

2. Zoo Rust Club

Know someone we should interview? Let us know: filtra@filtra.io