PliantDb Serves - another milestone reached

ecton

It's been a productive couple of weeks since I introduced PliantDb. I merged the pull request enabling client/server communications. The journey took a little longer than I had anticipated, but that's for a few reasons. Ultimately, I want to stress something: You can be extremely productive in Rust.

If you want to just learn about how PliantDb's engine works, my previous post goes into more detail. Or, you can listen to my talk at last Saturday's Rust Game Dev Meetup. Today, I'm going to talk more about the process of developing.

My journey from Rust-noob to when I began PliantDb

My Rust journey began a few years ago when I haphazardly threw together a small tool to wait for AWS CloudFormation stacks to reach a "complete" state. The official AWS CLI application allows you to wait for a single state, such as "UPDATE_COMPLETE," but not for one of many states (or any state matching a COMPLETE-like status). So, I wrote a simple tool using rusoto. I liked the idea of Rust, but it didn't click for me yet. Stubborn me didn't actually read the book.

Fast forward to when I'm daydreaming about quitting my day job to pursue game development. At that point, I firmly believed in the idea of why Rust was a big deal, but I still hadn't done anything beyond that simple tool. When I quit my job in November 2019, I had only started trying to dive into Rust full time.

PliantDb's initial commit was on Friday, March 19. I know I began writing code that morning because I kicked off the day by having a conversation with one of my former business partners: "You'll never guess what I'm seriously thinking of doing after we end our call."

When I told him, "I'm going to write my own CouchDB-like database," he protested in the fashion he always would as we debated ideas back when we ran our business together. Within a few minutes, I had sold him on the idea, which gave me the last boost of confidence I needed to embark on what most developers would consider a foolish endeavor.

Tackling async compatibility issues

I settled on sled after evaluating the landscape of available BTree-like data storage layers. It's a complex project, but it's well-tested and fairly widely used. From the initial moments of designing this architecture, I was thinking of how to fit it within sled to utilize its transactions to ensure ACID compliance.

This fundamental decision wasn't without downsides. The primary one is that sled isn't "compatible" with async/await in Rust. What I mean is that if you're trying to integrate it within an app that uses tokio, for example, you either need to operate sled within its own thread pool outside of tokio, or you need to use blocking wrappers such as spawn_blocking. These come with their own downsides: with a long-running blocking task, you have to worry about other tasks on the currently blocked thread not executing.
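To make the first option concrete, here is a minimal sketch of running a blocking store on its own thread and talking to it over channels. It uses only the standard library. StorageRequest and StorageHandle are names invented for this illustration, and a BTreeMap stands in for sled, so none of this is PliantDb's actual code.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// A request sent to the dedicated storage thread (hypothetical type).
enum StorageRequest {
    Insert {
        key: String,
        value: Vec<u8>,
        reply: mpsc::Sender<Option<Vec<u8>>>,
    },
    Get {
        key: String,
        reply: mpsc::Sender<Option<Vec<u8>>>,
    },
}

/// Cheap-to-clone handle that callers keep (hypothetical type).
#[derive(Clone)]
struct StorageHandle {
    requests: mpsc::Sender<StorageRequest>,
}

impl StorageHandle {
    /// Spawns the worker thread that owns the blocking store.
    fn spawn() -> Self {
        let (requests, inbox) = mpsc::channel();
        thread::spawn(move || {
            // Assumption: a BTreeMap stands in for a blocking engine such as sled.
            let mut tree: BTreeMap<String, Vec<u8>> = BTreeMap::new();
            // The loop ends once every StorageHandle has been dropped.
            for request in inbox {
                match request {
                    StorageRequest::Insert { key, value, reply } => {
                        let previous = tree.insert(key, value);
                        // Ignore the error if the caller stopped waiting.
                        let _ = reply.send(previous);
                    }
                    StorageRequest::Get { key, reply } => {
                        let _ = reply.send(tree.get(&key).cloned());
                    }
                }
            }
        });
        Self { requests }
    }

    /// Stores a value and returns the previous one, if any.
    fn insert(&self, key: &str, value: &[u8]) -> Option<Vec<u8>> {
        let (reply, response) = mpsc::channel();
        self.requests
            .send(StorageRequest::Insert {
                key: key.to_string(),
                value: value.to_vec(),
                reply,
            })
            .expect("storage thread stopped");
        response.recv().expect("storage thread dropped the reply")
    }

    /// Looks up a value by key.
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let (reply, response) = mpsc::channel();
        self.requests
            .send(StorageRequest::Get {
                key: key.to_string(),
                reply,
            })
            .expect("storage thread stopped");
        response.recv().expect("storage thread dropped the reply")
    }
}
```

In an async application, the blocking recv() on the caller's side would be replaced by awaiting an async-aware channel, so only the storage thread ever blocks.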

For today, I've made my best guess at the right kind of blocking wrapper for each type of operation, but the long-term goal is to use a new async executor that Daxpedda is working on. It's compatible with tokio, and it already has a concept named block_on_blocking, an optimized form of blocking designed to block more fairly without the 'static lifetime requirement that spawn_blocking imposes. He's about to resume working on the executor, but he was responsible for the QUIC-based networking stack PliantDb is using and is wrapping up a few last requests before moving on.

Complexities of supporting a rich type system over a network

The second major battle was something I hadn't fully comprehended when I started: How do you deal with types in a safe way while only exchanging bytes between a client and server? In my head, I knew serde was going to be a big part of the solution, but I didn't quite realize the levels of indirection I was going to need.

Let's take a look at an example:

db
    .view::<ShapesByNumberOfSides>()
    .with_key(3)
    .query_with_docs()
    .await?

This code could be running on a client talking to a remote database, or using PliantDb locally in a form akin to SQLite. This is meant to be one of the selling points of PliantDb, but making it work is rather tricky. Here's how it works:

db.view::<ShapesByNumberOfSides>() returns a View, which acts as a builder for accessing a view. with_key(3) sets the key field of the View to QueryKey::Matches(3_u32). Finally, query_with_docs() simply calls Connection::query_with_docs().
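The builder flow described above can be sketched roughly as follows. This is a synchronous, simplified illustration: Database, its shapes field, and the single-variant QueryKey are stand-ins I made up for the example, and the real API is async and generic over the view type.

```rust
/// Hypothetical stand-in for `QueryKey`; only one variant is shown.
#[derive(Debug, Clone, PartialEq)]
enum QueryKey<K> {
    Matches(K),
}

/// Hypothetical stand-in for a connection. Each entry is (number of sides, name).
struct Database {
    shapes: Vec<(u32, &'static str)>,
}

/// Builder returned by `Database::view()`. It only collects query options.
struct View<'a> {
    connection: &'a Database,
    key: Option<QueryKey<u32>>,
}

impl Database {
    /// Starts building a query with no key filter.
    fn view(&self) -> View<'_> {
        View {
            connection: self,
            key: None,
        }
    }

    /// The method the builder ultimately delegates to.
    fn query(&self, key: Option<QueryKey<u32>>) -> Vec<&'static str> {
        self.shapes
            .iter()
            .filter(|shape| match &key {
                Some(QueryKey::Matches(wanted)) => shape.0 == *wanted,
                None => true,
            })
            .map(|shape| shape.1)
            .collect()
    }
}

impl<'a> View<'a> {
    /// Records the key to match and hands the builder back for chaining.
    fn with_key(mut self, key: u32) -> Self {
        self.key = Some(QueryKey::Matches(key));
        self
    }

    /// Consumes the builder and runs the query on the connection.
    fn query(self) -> Vec<&'static str> {
        self.connection.query(self.key)
    }
}
```

Because the builder only stores options and hands them to the connection at the end, the same calling code works no matter which kind of connection sits behind it.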

Let's look at it from the Client's perspective. Following it along the route to the server will show the complexities I had to navigate and the power of Rust each step along the way.

On the client, db is a RemoteDatabase<Schema>. This implements Connection, and converts the parameters Option<QueryKey<u32>> and AccessPolicy into Request::Database { database: "dbname", request: DatabaseRequest::Query { view: "view-name", key: Option<QueryKey<Vec<u8>>>, access_policy, with_docs: true } }.
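As a rough sketch of that conversion, the typed key can be turned into bytes before it is placed in the request enum. The enums below are trimmed-down imitations of the ones named above (access_policy is left out), and the KeyBytes trait and query_request function are hypothetical names for this example. Big-endian encoding is my assumption here, chosen because it keeps byte order and numeric order in agreement.

```rust
/// Trimmed-down imitation of `QueryKey`; only one variant is shown.
#[derive(Debug, PartialEq)]
enum QueryKey<K> {
    Matches(K),
}

/// Trimmed-down imitation of the per-database request.
#[derive(Debug, PartialEq)]
enum DatabaseRequest {
    Query {
        view: String,
        key: Option<QueryKey<Vec<u8>>>,
        with_docs: bool,
    },
}

/// Trimmed-down imitation of the top-level request.
#[derive(Debug, PartialEq)]
enum Request {
    Database {
        database: String,
        request: DatabaseRequest,
    },
}

/// Hypothetical trait: converts a typed key to and from bytes.
trait KeyBytes: Sized {
    fn to_key_bytes(&self) -> Vec<u8>;
    fn from_key_bytes(bytes: &[u8]) -> Option<Self>;
}

impl KeyBytes for u32 {
    fn to_key_bytes(&self) -> Vec<u8> {
        // Assumption: big-endian, so sorting bytes sorts the numbers too.
        self.to_be_bytes().to_vec()
    }

    fn from_key_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 4 {
            return None;
        }
        let mut array = [0_u8; 4];
        array.copy_from_slice(bytes);
        Some(u32::from_be_bytes(array))
    }
}

/// Builds the type-free request from typed parameters.
fn query_request<K: KeyBytes>(database: &str, view: &str, key: Option<QueryKey<K>>) -> Request {
    let key = key.map(|QueryKey::Matches(typed)| QueryKey::Matches(typed.to_key_bytes()));
    Request::Database {
        database: database.to_string(),
        request: DatabaseRequest::Query {
            view: view.to_string(),
            key,
            with_docs: true,
        },
    }
}
```

Once the key is a Vec<u8>, the request no longer mentions any user-defined type, which is what lets a single enum describe every view of every schema.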

Once the request is in this enum, it can be sent via QUIC or WebSockets across the wire. The server receives that request, but at the layer where it arrives, the server doesn't have any generic types in its signatures. So, we must design a way to talk to our Storage<Schema> without the <Schema> part!

This is done using an internal trait OpenDatabase, which the server implements for Storage<Schema>. This is the first layer, allowing the network code to invoke query_with_docs which takes the view's name rather than the type of the view. It then looks up an abstracted version of that view which automatically serializes and deserializes across its access points. These are the same conversion mechanisms that were used when initially creating the ViewEntries when indexing these views.
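The general technique here is type erasure through an object-safe trait. The sketch below shows the idea under heavy simplification: every name except OpenDatabase and ShapesByNumberOfSides is invented for the example, the trait's real methods are not reproduced, and documents are plain structs instead of serialized bytes.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

/// A stored document (hypothetical; real documents would be serialized).
struct Shape {
    name: &'static str,
    sides: u32,
}

/// Typed view definition, roughly as user code might write it.
trait View {
    const NAME: &'static str;
    fn key(shape: &Shape) -> u32;
}

struct ShapesByNumberOfSides;

impl View for ShapesByNumberOfSides {
    const NAME: &'static str = "by-sides";
    fn key(shape: &Shape) -> u32 {
        shape.sides
    }
}

/// Object-safe counterpart of `View`: names and bytes only.
trait ErasedView {
    fn name(&self) -> &'static str;
    fn key_bytes(&self, shape: &Shape) -> Vec<u8>;
}

/// Adapter that serializes the typed key on behalf of `V`.
struct Erased<V>(PhantomData<V>);

impl<V: View> ErasedView for Erased<V> {
    fn name(&self) -> &'static str {
        V::NAME
    }
    fn key_bytes(&self, shape: &Shape) -> Vec<u8> {
        V::key(shape).to_be_bytes().to_vec()
    }
}

/// Stand-in for the schema type parameter.
trait Schema {
    fn views() -> Vec<Box<dyn ErasedView>>;
}

struct Shapes;

impl Schema for Shapes {
    fn views() -> Vec<Box<dyn ErasedView>> {
        let view: Box<dyn ErasedView> = Box::new(Erased::<ShapesByNumberOfSides>(PhantomData));
        vec![view]
    }
}

struct Storage<S> {
    documents: Vec<Shape>,
    views: HashMap<&'static str, Box<dyn ErasedView>>,
    _schema: PhantomData<S>,
}

impl<S: Schema> Storage<S> {
    fn new(documents: Vec<Shape>) -> Self {
        let views = S::views().into_iter().map(|view| (view.name(), view)).collect();
        Self { documents, views, _schema: PhantomData }
    }
}

/// Simplified stand-in for the internal trait: no generics in its signatures.
trait OpenDatabase {
    fn query_by_name(&self, view: &str, key: &[u8]) -> Option<Vec<&'static str>>;
}

impl<S: Schema> OpenDatabase for Storage<S> {
    fn query_by_name(&self, view: &str, key: &[u8]) -> Option<Vec<&'static str>> {
        // Look the view up by name, then compare serialized keys.
        let view = self.views.get(view)?;
        Some(
            self.documents
                .iter()
                .filter(|shape| view.key_bytes(shape) == key)
                .map(|shape| shape.name)
                .collect(),
        )
    }
}

/// What the networking layer would hold: the `<Schema>` part is gone.
fn open(documents: Vec<Shape>) -> Box<dyn OpenDatabase> {
    Box::new(Storage::<Shapes>::new(documents))
}
```

The generic parameter is only needed when the storage is constructed. After that, the networking code works with Box<dyn OpenDatabase>, view names, and bytes.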

Finally, once the response is retrieved, the journey happens in reverse, going through Response::Database(DatabaseResponse::ViewMappingsWithDocs()).

To me, it's incredible the lengths that you can go to in Rust to allow transparent handling of native types in user code. One of my initial goals for PliantDb has been achieved: writing local and remote code using async/await without needing to care whether the data is local or not.

Multi-tasking is challenging sometimes, even with Rust

Ultimately, the goal wasn't to provide a WebSocket implementation in the first pass of the server. I had a goal to present at the Game Dev Meetup this past Saturday, and I really wanted to have a working client/server, but Daxpedda and I were having troubles with some of our code. It was becoming tough to isolate whether the networking code was to blame or whether PliantDb was to blame.

That's when I decided to add WebSockets. I was pretty confident I wanted them long-term anyway. Additionally, it gave me a way to verify the server's functionality using a protocol that I was familiar with and that wasn't very complicated. I found bugs with my code in PliantDb pretty quickly, but I was having two peculiar issues.

First, I was becoming more and more confident that the channel library Daxpedda and I fell in love with, flume, was misbehaving, but I couldn't seem to reproduce it outside of the massive PliantDb codebase. I finally called up Daxpedda on Discord and screen shared my debugging session, showing him how the tests succeeded if I retained a channel. If I allowed the sender to drop after successfully sending, sometimes the tests would fail. He agreed, something was odd. It took me a while, but I finally whittled it down to about 30 lines of code and reported the issue. In an amazingly quick fashion, the maintainer fixed the issue and released an update. And for the record, I still fully love and recommend this library if you're mixing async and non-async code using channels. It's a wonderful implementation.

The second issue was that every time I ran my unit test suite as a whole, I would sometimes succeed, but more often than not, after a random number of tests, all of the rest would fail. This ended up being my own stupidity. When I was writing the unit tests for the client, I thought to myself, "If I create one shared server, I can test the server differently by running each client test suite on a single server in its own database." I liked the idea, but I didn't think about the problem of achieving it.

Pro-tip: #[tokio::test] creates a unique tokio runtime for each test

When spawning the server, I was spawning it in a runtime that would dutifully get destroyed once the test completed. Whatever other tests happened to finish before the server was destroyed would get green marks, and the rest would start getting connection refusals.

Of course, this manifested itself in fun ways to my code -- channels just disconnecting all of a sudden, and often I wouldn't have any errors displaying anywhere!

So, remember: when writing async tests, if you spawn into your async runtime, it will not last beyond the current unit test. In this case, I decided to move that style of test to an integration-style test, to keep the "unit" nature more accurate.

One of the neat results of using the same trait to implement the database interface for Client/Server/Local is that a common unit testing suite was able to be written and reused:

pliantdb-core::test_util::define_connection_test_suite!
pliantdb-local::tests
pliantdb-server::tests
pliantdb-client::tests::websockets
pliantdb-client::tests::pliant (the QUIC-based connection tests)

This means that as more database functionality is added, it can be added to the common test suite and automatically tested across all layers of PliantDb. Once clustering support is added, the same suite will be tested there also.
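The reuse pattern can be sketched with a generic test function plus a small macro that stamps out one test per backend. This is an illustration only: the macro borrows the name define_connection_test_suite! but its arguments are my invention, and the Connection trait here is a tiny synchronous stand-in with hypothetical methods.

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous stand-in for the shared connection trait.
trait Connection {
    fn insert(&mut self, id: u64, contents: &str);
    fn get(&self, id: u64) -> Option<String>;
}

/// One backend: keeps documents in memory.
#[derive(Default)]
struct LocalDatabase {
    documents: HashMap<u64, String>,
}

impl Connection for LocalDatabase {
    fn insert(&mut self, id: u64, contents: &str) {
        self.documents.insert(id, contents.to_string());
    }
    fn get(&self, id: u64) -> Option<String> {
        self.documents.get(&id).cloned()
    }
}

/// A second backend that forwards every call, standing in for a network client.
struct RemoteDatabase<C> {
    server: C,
}

impl<C: Connection> Connection for RemoteDatabase<C> {
    fn insert(&mut self, id: u64, contents: &str) {
        self.server.insert(id, contents);
    }
    fn get(&self, id: u64) -> Option<String> {
        self.server.get(id)
    }
}

/// The shared suite: written once, against the trait only.
fn store_and_retrieve<C: Connection>(mut connection: C) {
    assert_eq!(connection.get(1), None);
    connection.insert(1, "triangle");
    assert_eq!(connection.get(1), Some("triangle".to_string()));
}

/// Generates one `#[test]` function per backend (hypothetical arguments).
macro_rules! define_connection_test_suite {
    ($test_name:ident, $connection:expr) => {
        #[test]
        fn $test_name() {
            store_and_retrieve($connection);
        }
    };
}

define_connection_test_suite!(local_suite, LocalDatabase::default());
define_connection_test_suite!(
    remote_suite,
    RemoteDatabase {
        server: LocalDatabase::default()
    }
);
```

Adding a new check to store_and_retrieve, or a new function beside it that the macro also calls, extends the coverage of every backend at once.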

Being Productive

This morning, I decided I wanted to write an example for the PliantDb server. At the end of the day, I wasn't happy with the type of interaction-less result I could make with the current functionality, so I added reduce_grouped(). I marveled with Daxpedda in Discord after looking at the diff: 19 files, +494, -90. It took me about an hour from the point of introducing my first compilation issue to getting it compiling. I added a couple of unit tests into the existing suite, and it all worked.

This is a regular occurrence with Rust and me. Yes, I can tell you about my experiences of having to debug multithreading issues. I can tell you they're just as painful as they are outside of Rust. However, the building blocks of the language itself encourage a design that helps eliminate so many types of runtime issues that you can encounter. You can still have errors in your logic, but I am finding that more often than not: when it compiles, it works.

Let's look at the stats of PliantDb as of tonight, using tokei:

===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Shell                   2           29           24            3            2
 TOML                    8          249          223            1           25
 YAML                    1            8            6            2            0
-------------------------------------------------------------------------------
 Markdown                1           77            0           49           28
 |- BASH                 1            1            1            0            0
 |- Rust                 1           54           40            3           11
 (Total)                            132           41           52           39
-------------------------------------------------------------------------------
 Rust                   61         7974         6699          221         1054
 |- Markdown            39          593            0          563           30
 (Total)                           8567         6699          784         1084
===============================================================================
 Total                  73         8337         6952          276         1109
===============================================================================

According to tokei, I've written 6699 lines of Rust code in this project. The first day of work was Friday, March 19, which puts those stats at roughly 3.5 weeks of work.

I have a pretty well-tested codebase that I'm almost ready to integrate into Cosmic Verge. While I have plenty of work remaining on PliantDb, I'm excited at the prospect of replacing PostgreSQL and Redis in Cosmic Verge, potentially next month.

Interested in PliantDb's development?