Ayende @ Rahien

Looking into Corax’s posting lists: Part III

April 17, 2023, 5:00 am

We looked into the internals of Corax’s posting list, and in the last post I mentioned that we have a problem with the Baseline of the page. We are by no means the first people to work with posting lists,...

Fight for every byte it takes: Storing raw numbers

April 24, 2023, 5:00 am

I write databases for a living, which means that I’m thinking a lot about persistence. Here is a fun challenge that we went through recently. We need to store a list of keys and values and...

Fight for every byte it takes: Variable size data

April 25, 2023, 5:00 am

In my previous post, we stored keys and values as raw numbers inside the 8KB page. That was simple, but wasteful. In many scenarios, we are never going to need the full 8-byte range for a...
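To illustrate what variable-size storage of a number can look like, here is a minimal sketch using the common LEB128-style varint scheme: 7 data bits per byte, with the high bit flagging that more bytes follow. This scheme is an assumption for illustration; the post's own format may differ, and `encode_varint`/`decode_varint` are hypothetical names.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 data bits per byte (LEB128-style).
    The high bit of each byte signals that more bytes follow."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# A value such as 300 needs 2 bytes instead of a fixed 8.
encoded = encode_varint(300)
print(len(encoded), decode_varint(encoded))  # 2 (300, 2)
```

Values below 128 take a single byte, while a full 64-bit value grows to 10 bytes: the scheme pays a little in the worst case to make the common case of small numbers much cheaper.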

Fight for every byte it takes: Nibbling at the costs

April 26, 2023, 5:00 am

In my last post we implemented variable-sized encoding to be able to pack even more data into the page. We were able to achieve 40% better density because of that. This is pretty awesome, but we would...

Fight for every byte it takes: Fitting 64 values in 4 bits

April 27, 2023, 5:00 am

Moving to nibble encoding gave us a measurable improvement in the density of the entries in the page. The problem is that we have pretty much run out of room to do so. We are currently using a byte per...
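A minimal sketch of nibble encoding, under an assumed layout: each entry starts with one header byte whose high nibble holds the key's length in bytes and whose low nibble holds the value's length, followed by the key and value bytes. Since a 64-bit number needs at most 8 bytes, each length fits in 4 bits. The layout and the names `byte_len`, `encode_entry` and `decode_entry` are hypothetical, not necessarily the format used in the series.

```python
def byte_len(n: int) -> int:
    """Smallest number of bytes (1..8) that can hold an unsigned 64-bit value."""
    if not 0 <= n < 2**64:
        raise ValueError("expected an unsigned 64-bit value")
    return max(1, (n.bit_length() + 7) // 8)


def encode_entry(key: int, value: int) -> bytes:
    """Hypothetical layout: [key_len:4 | value_len:4][key bytes][value bytes]."""
    klen, vlen = byte_len(key), byte_len(value)
    header = (klen << 4) | vlen  # two 4-bit lengths share a single byte
    return (bytes([header])
            + key.to_bytes(klen, "little")
            + value.to_bytes(vlen, "little"))


def decode_entry(data: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Decode one entry at data[pos]; return (key, value, next_pos)."""
    header = data[pos]
    klen, vlen = header >> 4, header & 0x0F
    start = pos + 1
    key = int.from_bytes(data[start:start + klen], "little")
    value = int.from_bytes(data[start + klen:start + klen + vlen], "little")
    return key, value, start + klen + vlen


# Key 1000 needs 2 bytes and value 5 needs 1, so the entry is 4 bytes
# (1 header + 2 + 1) instead of 16 bytes for two raw 64-bit numbers.
entry = encode_entry(1000, 5)
print(len(entry), decode_entry(entry))  # 4 (1000, 5, 4)
```

Compared to spending a full byte on each length, sharing one byte between both lengths saves a byte per entry, which adds up across the many entries in an 8KB page.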

Fight for every byte it takes: Optimizing the encoding process

April 28, 2023, 5:00 am

In my previous post, I showed how we use the nibble offload approach to store the size of entries in space that would otherwise be unused. My goal in that post was clarity, so I tried to make sure that...

Fight for every byte it takes: Decoding the entries

May 1, 2023, 5:00 am

In this series so far, we reduced the storage cost of key/value lookups by a lot. And in the last post we optimized the process of encoding the keys and values significantly. This is great, but the...

Integer compression: delta encoding + variable size integers

June 6, 2023, 5:00 am

If you are building a database, the need to work with a list of numbers is commonplace. For example, when building an index, we may want to store all the document ids of orders from Europe. You can...
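A minimal sketch of how the two techniques combine, assuming the ids are sorted in ascending order: store the difference from the previous id (delta encoding) and write each difference as a LEB128-style varint, so closely spaced ids cost a byte or two each. The function names and the varint flavor are assumptions for illustration.

```python
def encode_ids(sorted_ids: list[int]) -> bytes:
    """Delta-encode ascending ids, then write each delta as a 7-bit varint."""
    out = bytearray()
    prev = 0
    for doc_id in sorted_ids:
        delta = doc_id - prev
        if delta < 0:
            raise ValueError("ids must be sorted in ascending order")
        prev = doc_id
        while delta >= 0x80:               # more than 7 bits left
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)                  # final byte, continuation bit clear
    return bytes(out)


def decode_ids(data: bytes) -> list[int]:
    """Read varint deltas and add them up to recover the original ids."""
    ids = []
    prev = delta = shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:                    # delta continues in the next byte
            shift += 7
            continue
        prev += delta
        ids.append(prev)
        delta = shift = 0
    return ids


ids = [1000, 1003, 1010, 1500]
packed = encode_ids(ids)
print(len(packed), decode_ids(packed))  # 6 [1000, 1003, 1010, 1500]
```

Four ids that would take 32 bytes as raw 64-bit values fit in 6 bytes here. The trade-off is that decoding is sequential: each id depends on every delta before it.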

Integer compression: Understanding Simd Compression by Lemire

June 7, 2023, 5:00 am

In the previous post, I showed how you can implement integer compression using variable-size integers. That is a really straightforward approach for integer compression, but it isn’t ideal. To start with, it...

Integer compression: Using SIMD bit packing in practice

June 8, 2023, 5:00 am

In the last post, I talked about how the simdcomp library is actually just doing bit-packing. Given a set of numbers, it will put the first N bits in a packed format. That is all. On its own, that...
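Stripped of the SIMD machinery, the core idea of bit packing can be sketched in scalar Python: keep the lowest `bits` bits of each value and write them back to back. This is an illustration only; simdcomp interleaves values across SIMD lanes in fixed-size blocks, so its actual byte layout differs, and `bits_needed`, `pack` and `unpack` are hypothetical names.

```python
def bits_needed(values: list[int]) -> int:
    """Bit width of the largest value; all values fit in this many bits."""
    return max(values, default=0).bit_length()


def pack(values: list[int], bits: int) -> bytes:
    """Write the lowest `bits` bits of each value into a little-endian bit stream.
    Higher bits are silently dropped, so the caller must choose `bits` correctly."""
    mask = (1 << bits) - 1
    acc = 0        # pending bits not yet flushed to the output
    acc_bits = 0   # how many pending bits are in `acc`
    out = bytearray()
    for v in values:
        acc |= (v & mask) << acc_bits
        acc_bits += bits
        while acc_bits >= 8:   # flush complete bytes
            out.append(acc & 0xFF)
            acc >>= 8
            acc_bits -= 8
    if acc_bits:               # flush the final partial byte
        out.append(acc & 0xFF)
    return bytes(out)


def unpack(data: bytes, bits: int, count: int) -> list[int]:
    """Read `count` values of `bits` bits each from a stream produced by pack()."""
    mask = (1 << bits) - 1
    stream = int.from_bytes(data, "little")
    return [(stream >> (i * bits)) & mask for i in range(count)]


values = [3, 7, 1, 5, 0, 6, 2, 4]
width = bits_needed(values)   # 3 bits
packed = pack(values, width)
print(width, len(packed), unpack(packed, width, len(values)))
# 3 3 [3, 7, 1, 5, 0, 6, 2, 4]
```

Nothing here chooses the width for you or deals with outliers: a single large value forces every value to be stored at its width.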

Integer compression: SIMD bit packing and unusual usages

June 12, 2023, 5:00 am

I talked a bit before about the nature of bit packing and how the simdcomp library isn’t actually doing compression. Why do I care about that, then? Because the simdcomp library provides a very useful...

Integer compression: Understanding FastPFor

June 13, 2023, 5:00 am

FastPFor is an integer compression algorithm that was initially published in 2012. You can read the paper about it here: Decoding billions of integers per second through vectorization. I’ve run into...

Integer compression: The FastPFor code

June 14, 2023, 5:00 am

As I mentioned, I spent quite a lot of time trying to understand the mechanism behind how the FastPFor algorithm works. A large part of that was the fact that I didn’t initially distinguish the...

Integer compression: Porting simdcomp to C#

June 15, 2023, 5:00 am

In the code of the simdcomp library there is a 25KLOC file that does evil things to SIMD registers to get bit packing to work. When I looked at the code the first few dozen times, I had a strong desire...

Integer compression: Adapting FastPFor to RavenDB

June 16, 2023, 5:00 am

In this series so far, I explored several ways that we can implement integer compression. I focused on the FastPFor algorithm and dove deeply into how it works. In the last post, I showed how we can...

Integer compression: Implementing FastPFor encoding in C#

June 19, 2023, 5:00 am

In the previous post, I outlined the requirements we have for FastPFor in RavenDB. Now I want to dig into the actual implementation. Here is the shape of the class in question: The process starts when we...

Integer compression: Implementing FastPFor decoding in C#

June 20, 2023, 5:00 am

In the previous post, I discussed FastPFor encoding; now I’m going to focus on how we deal with decoding. Here is the decode struct: Note that this is a struct for performance reasons. We expect that...

Integer compression: FastPFor in C#, results

June 21, 2023, 5:00 am

After this saga, I wanted to close the series with some numbers about the impact of this algorithm. If you’ll recall, I started this whole series discussing variable-sized integers. I was using this...

Generating sequential numbers in a distributed manner

June 26, 2023, 5:00 am

On its face, we have a simple requirement: generate sequential numbers, ensure that there can be no gaps, and do that in a distributed manner. Generating the next number in the sequence is literally as simple as...

Production postmortem: ENOMEM when trying to free memory

July 3, 2023, 5:00 am

We got a support call from a client in the early hours of the morning: they were getting out-of-memory errors from their database and were understandably perturbed by that. They are running on a cloud...
