Looking into Corax’s posting lists: Part III
We looked into the internals of Corax’s posting list, and in the last post I mentioned that we have a problem with the Baseline of the page. We are by no means the first people to work with posting lists,...
Fight for every byte it takes: Storing raw numbers
I write databases for a living, which means that I’m thinking a lot about persistence. Here is a fun challenge that we went through recently. We have the need to store a list of keys and values and...
Fight for every byte it takes: Variable size data
In my previous post, we stored keys and values as raw numbers inside the 8KB page. That was simple, but wasteful. For many scenarios, we are never going to need to utilize the full 8-byte range for a...
Fight for every byte it takes: Nibbling at the costs
In my last post we implemented variable-sized encoding to be able to pack even more data into the page. We were able to achieve 40% better density because of that. This is pretty awesome, but we would...
Fight for every byte it takes: Fitting 64 values in 4 bits
Moving to nibble encoding gave us a measurable improvement in the density of the entries in the page. The problem is that we have pretty much run out of room to do so. We are currently using a byte per...
Fight for every byte it takes: Optimizing the encoding process
In my previous post, I showed how we use the nibble offload approach to store the size of entries in space that would otherwise be unused. My goal in that post was clarity, so I tried to make sure that...
Fight for every byte it takes: Decoding the entries
In this series so far, we reduced the storage cost of key/value lookups by a lot. And in the last post we optimized the process of encoding the keys and values significantly. This is great, but the...
Integer compression: delta encoding + variable size integers
If you are building a database, the need to work with a list of numbers is commonplace. For example, when building an index, we may want to store all the ids of documents of orders from Europe. You can...
Integer compression: Understanding Simd Compression by Lemire
In the previous post, I showed how you can use integer compression using variable-size integers. That is a really straightforward approach for integer compression, but it isn’t ideal. To start with, it...
Integer compression: Using SIMD bit packing in practice
In the last post, I talked about how the simdcomp library is actually just doing bit-packing. Given a set of numbers, it will put the first N bits in a packed format. That is all. On its own, that...
Integer compression: SIMD bit packing and unusual usages
I talked a bit before about the nature of bit packing and how the simdcomp library isn’t actually doing compression. Why do I care about that, then? Because the simdcomp library provides a very useful...
Integer compression: Understanding FastPFor
FastPFor is an integer compression algorithm that was initially published in 2012. You can read the paper about it here: Decoding billions of integers per second through vectorization. I’ve run into...
Integer compression: The FastPFor code
As I mentioned, I spent quite a lot of time trying to understand the mechanism behind how the FastPFor algorithm works. A large part of that was the fact that I didn’t initially distinguish the...
Integer compression: Porting simdcomp to C#
In the code of the simdcomp library there is a 25KLOC file that does evil things to SIMD registers to get bit packing to work. When I looked at the code the first few dozen times, I had a strong desire...
Integer compression: Adapting FastPFor to RavenDB
In this series so far, I explored several ways that we can implement integer compression. I focused on the FastPFor algorithm and dove deeply into how it works. In the last post, I showed how we can...
Integer compression: Implementing FastPFor encoding in C#
In the previous post I outlined the requirements we have for FastPFor in RavenDB. Now I want to dig into the actual implementation. Here is the shape of the class in question: The process starts when we...
Integer compression: Implementing FastPFor decoding in C#
In the previous post, I discussed FastPFor encoding; now I’m going to focus on how we deal with decoding. Here is the decode struct: Note that this is a struct for performance reasons. We expect that...
Integer compression: FastPFor in C#, results
After this saga, I wanted to close the series with some numbers about the impact of this algorithm. If you’ll recall, I started this whole series discussing variable-sized integers. I was using this...
Generating sequential numbers in a distributed manner
On its face, we have a simple requirement: generate sequential numbers, ensure that there can be no gaps, and do that in a distributed manner. Generating the next number in the sequence is literally as simple as...
Production postmortem: ENOMEM when trying to free memory
We got a support call from a client in the early hours of the morning. They were getting out-of-memory errors from their database and were understandably perturbed by that. They are running on a cloud...