Remove allocation in count_min_sketch::get_hashes #506

Open: cv4g wants to merge 1 commit into apache:master from cv4g:countmin-avoid-allocations-in-get_hashes

cv4g commented Jul 2, 2026

Hi,

the count-min sketch is in use at ClickHouse, and we noticed that count_min_sketch::get_hashes allocates a vector on the hot path. To avoid this, one could either inline its body into the call sites or change it to take a lambda that is invoked on each hash. Would you be interested in merging one of the two approaches?
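To make the lambda variant concrete, here is a minimal sketch of the idea. It is not the DataSketches implementation: `toy_count_min`, `for_each_index`, and the splitmix64-style `mix` function are hypothetical names chosen for illustration, and the double-hashing scheme is an assumption. The point is only that the row indices are handed to a callback one by one, so no vector has to be allocated per call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a count-min sketch (illustration only).
class toy_count_min {
public:
    toy_count_min(uint8_t num_hashes, uint32_t num_buckets)
        : num_hashes_(num_hashes), num_buckets_(num_buckets),
          counts_(static_cast<size_t>(num_hashes) * num_buckets, 0) {
        assert(num_hashes > 0 && num_buckets > 0);
    }

    // Adds weight to one counter per row; no temporary vector of hashes.
    void update(uint64_t item, uint64_t weight = 1) {
        for_each_index(item, [&](size_t idx) { counts_[idx] += weight; });
    }

    // Returns the minimum counter over all rows.
    uint64_t estimate(uint64_t item) const {
        uint64_t best = UINT64_MAX;
        for_each_index(item, [&](size_t idx) {
            if (counts_[idx] < best) best = counts_[idx];
        });
        return best;
    }

private:
    // splitmix64-style finalizer, used here only as an example mixer.
    static uint64_t mix(uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    // Invokes fn(flat_index) once per row instead of returning a vector.
    template <typename F>
    void for_each_index(uint64_t item, F&& fn) const {
        const uint64_t h1 = mix(item);
        const uint64_t h2 = mix(h1) | 1;  // assumed double-hashing step
        for (uint8_t row = 0; row < num_hashes_; ++row) {
            const uint64_t h = h1 + row * h2;  // unsigned wraparound is fine
            fn(static_cast<size_t>(row) * num_buckets_ + h % num_buckets_);
        }
    }

    uint8_t num_hashes_;
    uint32_t num_buckets_;
    std::vector<uint64_t> counts_;
};
```

Because `for_each_index` is a template, the compiler can typically inline the lambda body into the loop, which is what would make this variant comparable to inlining the body by hand at each call site.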

In a microbenchmark, the change speeds up update by a factor of 1.1 to 1.4 and estimate by a factor of 1.7 to 3.5:

   Group                  Time speedup   CPU speedup
  ─────────────────────  ─────────────  ─────────────
   UpdateUInt64           1.440x         1.440x
   UpdateStringBytes      1.118x         1.119x
   EstimateUInt64         3.548x         3.549x
   EstimateStringBytes    1.764x         1.764x

The numbers were produced with the Google Benchmark setup in #507.
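The per-group factors can be cross-checked against the Before and After listings. The following is a minimal sketch that assumes each factor is the geometric mean of the per-size Time ratios; the description does not say how the summary was aggregated, so that is an assumption, and `geomean_speedup` is a hypothetical helper rather than part of the benchmark setup.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of before/after ratios, one ratio per input size.
// Both vectors must have the same, non-zero length.
double geomean_speedup(const std::vector<double>& before_ns,
                       const std::vector<double>& after_ns) {
    assert(before_ns.size() == after_ns.size() && !before_ns.empty());
    double log_sum = 0.0;
    for (size_t i = 0; i < before_ns.size(); ++i) {
        log_sum += std::log(before_ns[i] / after_ns[i]);
    }
    return std::exp(log_sum / static_cast<double>(before_ns.size()));
}
```

Feeding in the four BM_CountMinUpdateUInt64 Time values (19585, 78486, 629480, 1252762 ns before; 13339, 55108, 436977, 877659 ns after) gives a factor of roughly 1.44, in line with the UpdateUInt64 row.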

Before:

Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
BM_CountMinUpdateUInt64/1024              19585 ns        19606 ns        35834 bytes_per_second=398.467M/s items_per_second=52.2279M/s
BM_CountMinUpdateUInt64/4096              78486 ns        78515 ns         9017 bytes_per_second=398.011M/s items_per_second=52.1681M/s
BM_CountMinUpdateUInt64/32768            629480 ns       629521 ns         1103 bytes_per_second=397.127M/s items_per_second=52.0523M/s
BM_CountMinUpdateUInt64/65536           1252762 ns      1252815 ns          554 bytes_per_second=399.101M/s items_per_second=52.311M/s
BM_CountMinUpdateStringBytes/1024         34527 ns        34571 ns        20270 bytes_per_second=722.64M/s items_per_second=29.6201M/s
BM_CountMinUpdateStringBytes/4096        137133 ns       137181 ns         5092 bytes_per_second=739.661M/s items_per_second=29.8584M/s
BM_CountMinUpdateStringBytes/32768      1120784 ns      1120869 ns          632 bytes_per_second=749.194M/s items_per_second=29.2345M/s
BM_CountMinUpdateStringBytes/65536      2250362 ns      2250433 ns          310 bytes_per_second=759.885M/s items_per_second=29.1215M/s
BM_CountMinEstimateUInt64/1024            37893 ns        37893 ns        18530 bytes_per_second=206.172M/s items_per_second=27.0233M/s
BM_CountMinEstimateUInt64/4096           150433 ns       150432 ns         4647 bytes_per_second=207.734M/s items_per_second=27.2282M/s
BM_CountMinEstimateUInt64/32768         1216380 ns      1216377 ns          581 bytes_per_second=205.528M/s items_per_second=26.939M/s
BM_CountMinEstimateUInt64/65536         2419302 ns      2419296 ns          292 bytes_per_second=206.672M/s items_per_second=27.0889M/s
BM_CountMinEstimateStringBytes/1024       54027 ns        54026 ns        12960 bytes_per_second=462.411M/s items_per_second=18.9537M/s
BM_CountMinEstimateStringBytes/4096      215406 ns       215404 ns         3231 bytes_per_second=471.054M/s items_per_second=19.0154M/s
BM_CountMinEstimateStringBytes/32768    1762087 ns      1762084 ns          398 bytes_per_second=476.566M/s items_per_second=18.5962M/s
BM_CountMinEstimateStringBytes/65536    3541045 ns      3541037 ns          197 bytes_per_second=482.929M/s items_per_second=18.5076M/s

After:

Benchmark                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------
BM_CountMinUpdateUInt64/1024              13339 ns        13362 ns        51549 bytes_per_second=584.679M/s items_per_second=76.635M/s
BM_CountMinUpdateUInt64/4096              55108 ns        55126 ns        12651 bytes_per_second=566.888M/s items_per_second=74.3031M/s
BM_CountMinUpdateUInt64/32768            436977 ns       436969 ns         1597 bytes_per_second=572.123M/s items_per_second=74.9893M/s
BM_CountMinUpdateUInt64/65536            877659 ns       877601 ns          808 bytes_per_second=569.735M/s items_per_second=74.6763M/s
BM_CountMinUpdateStringBytes/1024         30439 ns        30472 ns        23111 bytes_per_second=819.845M/s items_per_second=33.6044M/s
BM_CountMinUpdateStringBytes/4096        123387 ns       123418 ns         5643 bytes_per_second=822.14M/s items_per_second=33.188M/s
BM_CountMinUpdateStringBytes/32768      1001801 ns      1001729 ns          711 bytes_per_second=838.299M/s items_per_second=32.7115M/s
BM_CountMinUpdateStringBytes/65536      2029662 ns      2028176 ns          351 bytes_per_second=843.157M/s items_per_second=32.3128M/s
BM_CountMinEstimateUInt64/1024            10599 ns        10598 ns        64017 bytes_per_second=737.185M/s items_per_second=96.6243M/s
BM_CountMinEstimateUInt64/4096            43193 ns        43185 ns        16224 bytes_per_second=723.625M/s items_per_second=94.8469M/s
BM_CountMinEstimateUInt64/32768          339303 ns       339262 ns         2040 bytes_per_second=736.894M/s items_per_second=96.5861M/s
BM_CountMinEstimateUInt64/65536          681292 ns       681216 ns         1031 bytes_per_second=733.981M/s items_per_second=96.2044M/s
BM_CountMinEstimateStringBytes/1024       30358 ns        30355 ns        23095 bytes_per_second=823.022M/s items_per_second=33.7346M/s
BM_CountMinEstimateStringBytes/4096      122105 ns       122090 ns         5731 bytes_per_second=831.082M/s items_per_second=33.5489M/s
BM_CountMinEstimateStringBytes/32768    1003445 ns      1003328 ns          691 bytes_per_second=836.963M/s items_per_second=32.6593M/s
BM_CountMinEstimateStringBytes/65536    2016679 ns      2016441 ns          345 bytes_per_second=848.064M/s items_per_second=32.5008M/s
