
Implement Bit-sliced index for CRoaring #435


Open
Smallhi opened this issue Feb 11, 2023 · 5 comments

Comments

@Smallhi

Smallhi commented Feb 11, 2023

There are implementations of bit-sliced indexes in Java and Go. Do we need one in C/C++?

I think implementing a bit-sliced index for CRoaring includes these tasks:

  • Support a mutable bit-sliced index in C
  • Support RangeBitmap in C
  • A range index demo for Postgres using RangeBitmap in C
  • A mutable bit-sliced index demo for Postgres using the bit-sliced index in C
  • A range index demo for the Apache Parquet file format in C/C++
  • A mutable bit-sliced index demo for the Apache Parquet file format using the bit-sliced index in C

@lemire We've implemented a mutable bit-sliced index in C and are using it in Postgres. There are now two implementations of the bit-sliced idea (BSI and RangeBitmap): one is mutable but performs poorly, the other is immutable but performs very well. Could we decide between them, or redesign?
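For readers new to the structure, here is a minimal sketch of what a mutable bit-sliced index stores and why it is mutable: one bitmap per bit of the value, plus an existence bitmap. It assumes values fit in 8 bits and there are at most 64 rows, and uses plain `uint64_t` words in place of roaring bitmaps so it compiles without CRoaring. The names `bsi_t`, `bsi_set` and `bsi_get` are hypothetical and not an existing CRoaring API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BSI_BITS 8  /* assumption: values fit in 8 bits */
#define BSI_ROWS 64 /* assumption: at most 64 rows, one bit per row */

typedef struct {
    uint64_t ebm;              /* existence bitmap: rows that hold a value */
    uint64_t slices[BSI_BITS]; /* slices[j]: rows whose value has bit j set */
} bsi_t;

/* Mutable update: overwrite a row's value by setting or clearing its bit in every slice. */
static void bsi_set(bsi_t *bsi, unsigned row, uint8_t value) {
    assert(row < BSI_ROWS);
    uint64_t mask = UINT64_C(1) << row;
    bsi->ebm |= mask;
    for (int j = 0; j < BSI_BITS; j++) {
        if ((value >> j) & 1)
            bsi->slices[j] |= mask;
        else
            bsi->slices[j] &= ~mask;
    }
}

/* Rebuild a row's value from the slices; returns false if the row holds no value. */
static bool bsi_get(const bsi_t *bsi, unsigned row, uint8_t *value) {
    assert(row < BSI_ROWS);
    uint64_t mask = UINT64_C(1) << row;
    if (!(bsi->ebm & mask))
        return false;
    uint8_t v = 0;
    for (int j = 0; j < BSI_BITS; j++)
        if (bsi->slices[j] & mask)
            v |= (uint8_t)(1u << j);
    *value = v;
    return true;
}
```

With CRoaring, each word would become a `roaring_bitmap_t`, and setting or clearing a row's bit would become `roaring_bitmap_add` / `roaring_bitmap_remove` on every slice.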

cc @richardstartin

References:

  1. https://richardstartin.github.io/posts/range-bitmap-index
  2. https://richardstartin.github.io/posts/range-predicates
  3. https://github.com/RoaringBitmap/RoaringBitmap/tree/master/bsi
  4. Could you merge this library into RoaringBitmap? lemire/BitSliceIndex#1
@lemire
Member

lemire commented Feb 11, 2023

Yes, a pull request to provide this functionality in C is invited.

@patelprateek

@Smallhi: I also have this use case. Any idea when we would be able to merge this implementation?
Also, I have a noob question: why is the bit-sliced index dependent on the RangeBitmap implementation? What exactly is the difference between a range index and a bit-sliced index?

@goldenbean

goldenbean commented Mar 12, 2025

@Smallhi I just made my own header-only BSI implementation based on CRoaring and will upload the whole project soon :) You can check it out if you still need it: https://github.com/goldenbean/BSI/blob/main/roaring64bsi.hh

@Smallhi
Author

Smallhi commented Mar 12, 2025

@goldenbean That's great! Two more suggestions:

  1. I think it would be good if the serialized layout had a magic number, so the format can be upgraded later.

You can learn from here: https://github.com/RoaringBitmap/CRoaring/blob/master/src/roaring_array.c#L551

```c
// This function is endian-sensitive.
size_t ra_portable_serialize(const roaring_array_t *ra, char *buf) {
    char *initbuf = buf;
    uint32_t startOffset = 0;
    bool hasrun = ra_has_run_container(ra);
    if (hasrun) {
        uint32_t cookie = SERIAL_COOKIE | ((uint32_t)(ra->size - 1) << 16);
        memcpy(buf, &cookie, sizeof(cookie));
        buf += sizeof(cookie);
        uint32_t s = (ra->size + 7) / 8;
        uint8_t *bitmapOfRunContainers = (uint8_t *)roaring_calloc(s, 1);
        assert(bitmapOfRunContainers != NULL);  // todo: handle
        for (int32_t i = 0; i < ra->size; ++i) {
            if (get_container_type(ra->containers[i], ra->typecodes[i]) ==
                RUN_CONTAINER_TYPE) {
                bitmapOfRunContainers[i / 8] |= (1 << (i % 8));
            }
        }

....
```
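Following the same idea as `SERIAL_COOKIE` above, here is a minimal sketch of a BSI header that starts with a magic number and a format version, so a reader can reject foreign data and dispatch on older layouts after an upgrade. The layout, the constant values and the function names are all hypothetical, not an existing CRoaring format; like `ra_portable_serialize`, it writes native-endian integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: uint32 magic | uint16 format version | uint16 bit depth */
#define BSI_MAGIC UINT32_C(0x42534931) /* arbitrary value, assumed for illustration */
#define BSI_FORMAT_VERSION 1
#define BSI_HEADER_SIZE (sizeof(uint32_t) + 2 * sizeof(uint16_t))

/* Writes the header into buf (which must hold BSI_HEADER_SIZE bytes); returns bytes written. */
static size_t bsi_write_header(char *buf, uint16_t bit_depth) {
    assert(buf != NULL);
    uint32_t magic = BSI_MAGIC;
    uint16_t version = BSI_FORMAT_VERSION;
    memcpy(buf, &magic, sizeof magic);
    memcpy(buf + sizeof magic, &version, sizeof version);
    memcpy(buf + sizeof magic + sizeof version, &bit_depth, sizeof bit_depth);
    return BSI_HEADER_SIZE;
}

/* Parses the header; returns false for short input, a wrong magic number,
 * or a format version newer than this reader understands. */
static bool bsi_read_header(const char *buf, size_t len,
                            uint16_t *version, uint16_t *bit_depth) {
    uint32_t magic;
    if (len < BSI_HEADER_SIZE)
        return false;
    memcpy(&magic, buf, sizeof magic);
    if (magic != BSI_MAGIC)
        return false; /* not a BSI blob, or a corrupted one */
    memcpy(version, buf + sizeof magic, sizeof *version);
    memcpy(bit_depth, buf + sizeof magic + sizeof *version, sizeof *bit_depth);
    return *version <= BSI_FORMAT_VERSION;
}
```

A later format would bump `BSI_FORMAT_VERSION`, and the reader would branch on `*version` to decode older blobs.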

  2. Try your best to modify bitmaps in place rather than creating new bitmap objects. That greatly improves performance in Postgres aggregation operations.

@goldenbean

The suggestions sound great to me. I will take a close look at them and consolidate the changes before uploading the whole project to GitHub.


Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants