-
Notifications
You must be signed in to change notification settings - Fork 89
PK optimizations #5314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
blp
wants to merge
7
commits into
main
Choose a base branch
from
pk-optimizations
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
PK optimizations #5314
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
d63901f to
b75273c
Compare
Member
Author
|
In fact I see that clippy agrees, it appended a commit that changes some pieces of code to use if-let chains. |
b75273c to
cb780ff
Compare
Member
Author
|
I pushed a revision that adds metrics for the optimizations that this PR introduces, so that one can tell whether it should be a performance benefit. |
cb780ff to
968824b
Compare
Signed-off-by: Ben Pfaff <blp@feldera.com>
Until now, in dbsp::storage::file::writer, the ColumnWriter, DataBlockBuilder, and IndexBlockBuilder haven't needed to be parameterized by their key or auxdata types. An upcoming commit will need that, so this commit prepares for it by parameterizing them. This shouldn't affect the storage format. Signed-off-by: Ben Pfaff <blp@feldera.com>
…ded. Columns 1 (and greater) in a layer file are essentially B-trees over tuples (R,K), where R is the row number and K is the key. Each row in column 0 includes as auxiliary data a range R0..R1 of rows in column 1 (the "row group"), so R in the B-tree allows that range to be looked up. Storing the minimum and maximum keys of child blocks in the index allows efficient lookup of keys in row groups that span multiple data blocks. However, when a row group starts at the beginning of a data block, there's little value in including that key in its index block, because it won't usually help us to avoid reading the data block. It only helps if we're searching for a key less than the minimum, which is rarely the case; more often, we're iterating through the keys, and usually we're doing nested iteration through the column-0 keys too, so that we'd typically have to read the data block for another one of those anyhow. Similarly, when a row group ends at the end of a data block, there's little value in including its last key in the index block. This can be a significant optimization when keys in column 1 are large. In layer files where each row in column 0 has exactly one row in column 1, this optimization means that the column-0 index will have no bounds at all. Other layer files will benefit less. Signed-off-by: Ben Pfaff <blp@feldera.com>
Until now, any row in column 0 had to have at least one row in column 1. This commit allows there to instead be none. Signed-off-by: Ben Pfaff <blp@feldera.com>
Indexed Z-sets are commonly used for updating indexes on primary keys. For this purpose, each key usually has one value (for insertions and deletions) or two values (for updates). Writing this small number of values to a second column maintained separately in the layer file causes I/O to additional data blocks. This commit introduces a new way to store these indexed Z-sets. With this commit, when a key has just one or two values, those values are written inline with the key, which reduces the number of data blocks that need to be fetched to read a key along with its values. This should improve performance. This commit does drop two performance-oriented features from FileIndexedZSet: push_cursor() and fetch(). I don't think that either of these have really been used without explicit enabling via DevTweaks. If they prove important, then we can figure out how to re-implement them with the new format. Signed-off-by: Ben Pfaff <blp@feldera.com>
Add circuit profile metadata for inlined values and omitted bounds. Signed-off-by: Ben Pfaff <blp@feldera.com>
968824b to
3ed076d
Compare
Member
Author
|
I pushed a revision that adds the metrics to the circuit profile instead. Probably a better fit there (thanks @ryzhyk for the suggestion). |
Signed-off-by: feldera-bot <feldera-bot@feldera.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
DBSP core
Related to the core DBSP library
performance
rust
Pull requests that update Rust code
storage
Persistence for internal state in DBSP operators
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR implements two optimizations to indexed Z-set storage.
The first, which is implemented in the final commit, is mostly above the level of the storage file implementation. It is a change to the indexed Z set implementation. In this optimization, if a key K has fewer than a threshold number of (V,R) pairs (or perhaps less than a threshold number of bytes of (V,R) pairs), then we write those (V,R) pairs as auxdata in column 0. Then we avoid any reads to column 1 at all for visiting those keys and their values.
The only change needed to the storage file implementation for this is to allow a key in column 0 to have zero values in column 1 (currently this will panic), which is implemented in the second-to-last commit.
The second, implemented by the third commit, is at the level of the storage file implementation. There is some confusion here about what we really have in column 1. It is essentially a btree over 2-tuples (R,K), where R is the row number in column 1. Each row in column 0 includes an accompanying range of column-1 row numbers for its values, so R allows that range to be looked up. K is the key for column 1, which for an indexed Z-set is the V from each (K,V,R) tuple. Storing it in the b-tree allows efficient searching by value, which is something that the DBSP cursor supports.
However, when there's only one column-1 value for each column-0 key, putting the column-1 values in the b-tree don't help. That means that we can essentially reduce that b-tree's keys from 2-tuples (R,K) into just R.
(Note that R isn't stored on a per-row basis even now, we use a much more efficient representation, since it's just a count.)
Storage format compatibility
With this PR, Feldera can read storage written by earlier versions, but storage written after this PR cannot be read by earlier versions of Feldera.