Skip to content

fix: avoid signed overflow in truncate integer math#755

Open
fallintoplace wants to merge 1 commit into
apache:mainfrom
fallintoplace:fix/truncate-integer-overflow
Open

fix: avoid signed overflow in truncate integer math#755
fallintoplace wants to merge 1 commit into
apache:mainfrom
fallintoplace:fix/truncate-integer-overflow

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

  • compute integer truncate remainders with a wider signed type to avoid signed overflow
  • preserve modulo behavior for the min-value boundary where the mathematical result falls below the source type
  • add regression coverage for large positive widths and int/long minimum values

Validation

  • cmake -S . -B build-truncate -G Ninja -DICEBERG_BUILD_BUNDLE=OFF -DICEBERG_BUILD_REST=OFF -DICEBERG_BUILD_SHARED=OFF -DICEBERG_ENABLE_UBSAN=ON
  • UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 ctest --test-dir build-truncate -R util_test --output-on-failure

template <typename T>
requires std::is_same_v<T, int32_t> || std::is_same_v<T, int64_t>
static inline T TruncateInteger(T v, int32_t W) {
return v - (((v % W) + W) % W);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was from the spec:

The remainder, v % W, must be positive. For languages where % can produce negative values, the correct truncate function is: v - (((v % W) + W) % W)

I don't think we need to use int128_t since W is alway a int32_t and greater than 0.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the formula is from the spec. The issue is not the spec math, but evaluating it with signed C++ integer types. For int32_t, (v % W) + W can overflow when W is large. For int64_t, the final v - remainder can overflow at INT64_MIN. The wider type keeps the spec formula while avoiding signed UB.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems deviate from the Java implementation with some very large values. Perhaps it is worth raising a discussion in the dev@iceberg.apache.org?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this on Compiler Explorer. For W > INT32_MAX / 2, (v % W) + W can overflow, causing the remainder to become negative, which violates the spec that the remainder must be positive.

For values close to std::numeric_limits<int64_t>::min(), our current implementation produces the same result as the modified version, but it triggers

runtime error: signed integer overflow

when compiled with -fsanitize=undefined.

I think Java has a similar issue with the first int32_t overflow case due to wraparound arithmetic, no idea if it suffers from the signed integer overflow problem that C++ has(maybe not).

If we change this, I'd suggest keep v - (((v % W) + W) % W) as a comment and clarify why the current form.

I agree this is worth discussing on the dev mailing list.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very experienced about this process, how do we proceed?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants