DataForge is a modern C++20 header-only library for building declarative, composable data transformation pipelines.
It provides both push (output) and pull (input) iterator-based interfaces for applying arbitrary chains of conversions, including encoding, decoding, compression, encryption, hashing, and Unicode operations.
Transformations are described using quarks — small, composable objects that can be chained together with the | operator.
```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

#include "dataforge/quark_push_iterator.hpp"
#include "dataforge/quark_pull_iterator.hpp"
#include "dataforge/base_xx/base64.hpp"

using namespace dataforge;

int main() {
    std::string input = "Hello, World!";
    std::string base64_result;

    // Create a pipeline: input bytes → Base64 encoding → output
    auto push_it = quark_push_iterator(int8 | base64, std::back_inserter(base64_result));
    *push_it = input;
    push_it.finish();
    std::cout << "Encoded: " << base64_result << std::endl; // Output: SGVsbG8sIFdvcmxkIQ==

    // Reverse the process: Base64 → decoded bytes
    std::string decoded_result;
    auto pull_it = quark_pull_iterator(base64 | int8, base64_result);
    for (auto span = *pull_it; !span.empty(); span = *++pull_it) {
        std::copy(span.begin(), span.end(), std::back_inserter(decoded_result));
    }
    std::cout << "Decoded: " << decoded_result << std::endl; // Output: Hello, World!
}
```

More complex pipelines can chain multiple transformations:
```cpp
// Example: text → UTF-8 → compression → encryption → Base64
// (`key` holds the AES key and is defined elsewhere)
auto pipeline = utf8 | deflated() | aes(128, key) | base64;
```

📁 See the `examples/` folder for complete working examples, including MD5 hashing, AES encryption, and more advanced use cases.

🧪 For comprehensive algorithm coverage and advanced pipeline patterns, explore the `tests/` directory: it contains hundreds of real-world examples demonstrating every supported algorithm, from basic CRC checksums to complex multi-stage encryption pipelines.
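One plausible reason the push example ends with `push_it.finish()` is that a push-style encoder can only emit complete output groups as bytes arrive: Base64 maps 3 input bytes to 4 characters, so up to two trailing bytes must wait until the end of input is known. The toy sketch below illustrates that buffering with a standard output iterator as the destination. `Base64PushSink` is a hypothetical class written for this illustration, not DataForge's `quark_push_iterator`.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <string_view>

// Toy push-style Base64 encoder (hypothetical; not DataForge's API).
// Complete 4-character groups are written to `out` as soon as 3 bytes
// have been pushed; a 1- or 2-byte tail waits for finish().
template <class OutIt>
class Base64PushSink {
public:
    explicit Base64PushSink(OutIt out) : out_(out) {}

    void push(std::string_view bytes) {
        for (unsigned char c : bytes) {
            buf_[len_++] = c;
            if (len_ == 3) {
                emit_group(3);
                len_ = 0;
            }
        }
    }

    // Flush the buffered tail, padding with '='.
    void finish() {
        if (len_ > 0) {
            emit_group(len_);
            len_ = 0;
        }
    }

private:
    static constexpr char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    void emit_group(std::size_t n) {
        // Pack up to 3 bytes into a 24-bit value; missing bytes count as zero.
        std::uint32_t v = std::uint32_t(buf_[0]) << 16;
        if (n > 1) v |= std::uint32_t(buf_[1]) << 8;
        if (n > 2) v |= std::uint32_t(buf_[2]);
        // n input bytes yield n + 1 significant characters; the rest are '='.
        for (std::size_t i = 0; i < 4; ++i) {
            *out_++ = (i <= n) ? kAlphabet[(v >> (18 - 6 * i)) & 0x3F] : '=';
        }
    }

    OutIt out_;
    unsigned char buf_[3] = {};
    std::size_t len_ = 0;
};
```

Usage mirrors the push example: `Base64PushSink sink(std::back_inserter(s)); sink.push("Hello, World!"); sink.finish();`. Before `finish()`, `s` holds only the complete groups; the padded tail appears afterwards.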
DataForge combines many kinds of data transformation in one consistent framework, whereas each of the libraries compared below covers only a subset of this functionality.
| Feature / Capability | DataForge | Crypto++ | Boost | ICU | range-v3 |
|---|---|---|---|---|---|
| Integer ↔ Bytes + Endian | ✅ | ❌ | ❌ | ❌ | ❌ |
| base16/32/58/64/ascii85/z85 | ✅ | ✅ | ❌ | ❌ | ❌ |
| Custom Base 1 < N < 256 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Checksums (crc, adler, bsd) | ✅ | ✅ (CRC32, Adler32) | ✅ (CRC) | ❌ | ❌ |
| Hashes (MD, SHA, Blake, etc.) | ✅ | ✅ | ❌ | ❌ | ❌ |
| Encryption/Decryption | ✅ | ✅ | ❌ | ❌ | ❌ |
| Compression / Decompression | ✅ | ✅ (Deflate/Gzip/Zlib) | ✅ (Iostreams filters) | ❌ | ❌ |
| Unicode Conversions (UTF) | ✅ | ❌ | ❌ | ✅ | ❌ |
| ICU Charset Conversions | ✅ | ❌ | ❌ | ✅ | ❌ |
| Grapheme Breaking | ✅ | ❌ | ❌ | ✅ | ❌ |
| Header-only | ✅ | ❌ | ✅ | ❌ | ✅ |
| Push/Pull iterator pipelines | ✅ | ❌ | ✅ (filters) | ❌ | ✅ |
Key point: DataForge allows chaining transformations like integer → endian → compression → encryption → base encoding in one declarative pipeline.
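The chaining idea can be illustrated independently of the library. In the toy sketch below, a hypothetical `Stage` type wraps a whole-buffer string transformation and `operator|` composes two stages left to right, so `a | b` runs `a` first and feeds its output to `b`. DataForge's quarks are richer (they stream data through push/pull iterators rather than transforming whole buffers), so treat this only as a model of the composition syntax.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <utility>

// Toy whole-buffer stage (hypothetical; real quarks process data incrementally).
struct Stage {
    std::function<std::string(const std::string&)> fn;
    std::string operator()(const std::string& in) const { return fn(in); }
};

// a | b: apply a, then feed its output into b.
inline Stage operator|(Stage a, Stage b) {
    return Stage{[a = std::move(a), b = std::move(b)](const std::string& in) {
        return b(a(in));
    }};
}

// Example stage: ASCII upper-casing.
inline const Stage upper_case{[](const std::string& in) {
    std::string out = in;
    for (char& c : out) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return out;
}};

// Example stage: bytes → lowercase hexadecimal text (two digits per byte).
inline const Stage hex_encode{[](const std::string& in) {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        out.push_back(digits[c >> 4]);
        out.push_back(digits[c & 0x0F]);
    }
    return out;
}};
```

Order matters: `(upper_case | hex_encode)("ab")` hex-encodes `"AB"`, while `(hex_encode | upper_case)` upper-cases the hex digits.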
- Integer ↔ bytes + endianness
  - Convert sequences of integers of various sizes to/from byte sequences.
  - Configurable little-endian or big-endian representation.
- Base encodings
  - Base16, Base32, Base58, Base64, ASCII85, Z85.
  - Arbitrary base conversion with `1 < N < 256` and a custom alphabet, effectively a positional numeral system transformation.
- Checksums
  - BSD checksum
  - Adler32
  - CRC8, CRC16, CRC32, CRC64
- Hashes
  - MD2, MD4, MD5, MD6
  - RIPEMD, Tiger
  - SHA1, SHA2, SHA3
  - Belt, GOST, Streebog, Whirlpool, Blake
- Encryption / decryption
  - RC2, RC4, RC5, RC6
  - DES, AES, Blowfish
  - Belt, Magma
- Compression / decompression (requires corresponding external libraries)
  - Deflate
  - Bzip2
  - LZ4
  - LZMA, LZMA2
- Unicode conversions
  - UTF-7, UTF-8, UTF-16, UTF-32
- ICU charset conversions (requires ICU library)
  - Any encoding supported by the ICU library
- Grapheme breaking
  - Splits a Unicode string into graphemes according to the Unicode Standard.
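The arbitrary-base item treats the input as one large big-endian number and rewrites it in base N. A textbook way to do that is repeated long division by N, collecting remainders as digits. The sketch below is an independent illustration, not DataForge's implementation: `encode_base` is a hypothetical name, and mapping each leading zero byte to the first alphabet symbol is an assumption borrowed from the Base58 convention. It is quadratic in the input length, which is acceptable for short inputs such as keys or identifiers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: encode `input` (read as a big-endian number) in base
// N = alphabet.size(), with 1 < N < 256. Leading zero bytes become alphabet[0].
std::string encode_base(const std::vector<std::uint8_t>& input, std::string_view alphabet) {
    if (alphabet.size() < 2 || alphabet.size() > 255) {
        throw std::invalid_argument("alphabet size must satisfy 1 < N < 256");
    }
    const auto base = static_cast<std::uint32_t>(alphabet.size());

    std::size_t zeros = 0;
    while (zeros < input.size() && input[zeros] == 0) ++zeros;

    std::vector<std::uint8_t> number(input.begin() + static_cast<std::ptrdiff_t>(zeros), input.end());
    std::string digits;  // least significant digit first
    while (!number.empty()) {
        // One pass of schoolbook long division: number = quotient * base + rem.
        std::uint32_t rem = 0;
        std::vector<std::uint8_t> quotient;
        for (std::uint8_t byte : number) {
            const std::uint32_t acc = rem * 256 + byte;
            const std::uint32_t q = acc / base;  // < 256 because rem < base
            rem = acc % base;
            if (!quotient.empty() || q != 0) quotient.push_back(static_cast<std::uint8_t>(q));
        }
        digits.push_back(alphabet[rem]);
        number = std::move(quotient);
    }
    digits.append(zeros, alphabet[0]);
    return std::string(digits.rbegin(), digits.rend());
}
```

For example, the bytes `{0x01, 0x00}` (the number 256) encode to `"256"` with the alphabet `"0123456789"`, and `{0xde, 0xad}` encode to `"dead"` with `"0123456789abcdef"`.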
The library itself is header-only; nothing needs to be built to use it in your projects.
However, the test suite depends on external libraries (zlib, icu, bzip2, lz4, liblzma, gtest), which are managed via vcpkg.

- Install vcpkg anywhere on your system (if not already installed).
- Set the environment variable `VCPKG_ROOT` to the location of your vcpkg installation.
  - Example (Windows PowerShell):
    ```powershell
    setx VCPKG_ROOT "C:\dev\vcpkg"
    ```
    `setx` only affects processes started afterwards, so restart Visual Studio after setting it.
- Open the Visual Studio solution for tests and build it.
  - On the first build, the project will automatically:
    - Check that `VCPKG_ROOT` is set.
    - Run `$(VCPKG_ROOT)\vcpkg.exe install`, installing all required dependencies from `vcpkg.json` into a local `vcpkg_installed` folder.
    - Configure `INCLUDE` and `LIB` paths to use these locally installed dependencies.
- Run the tests from Visual Studio.

No global vcpkg integration (`vcpkg integrate install`) is required; everything is local to the repository.
The SHA-1 and SHA-2 families support compile-time selectable, hardware-accelerated block processing. Every variant keeps a portable scalar fallback, so acceleration never changes results, only throughput.

A single macro, `DATAFORGE_ACCEL_PROFILE`, selects the ISA tier for all accelerated algorithms. Set it via `-DDATAFORGE_ACCEL_PROFILE=<value>`:
| Value | Name | Description |
|---|---|---|
| `-1` | `AUTO` (default) | Runtime CPU detection: picks the best backend at first use via CPUID / `getauxval`, cached for the lifetime of the process. |
| `0` | `SCALAR` | Pure C++, no ISA intrinsics. Compiles and runs anywhere. |
| `1` | `X86_SHA_NI` | x86 forced: SHA-NI + SSE4.1 for SHA-1/224/256; SSE4.1 for the SHA-512 family. No runtime probing; the CPU must support these. |
| `2` | `X86_AVX512` | x86 forced: SHA-NI for SHA-1/224/256 + AVX-512 message schedule for all SHA-2 variants including SHA-224/256. The CPU must support AVX-512F/VL. |
| `3` | `ARM_NEON` | AArch64 forced: NEON-vectorised SHA-512 message schedule; SHA-1/224/256 use scalar. Runs on any AArch64 CPU. |
| `4` | `ARM_SHA` | AArch64 forced: SHA2 crypto extension for SHA-1/224/256; NEON for SHA-384/512. Requires `HWCAP_SHA2` (e.g. Cortex-A53+crypto, A55, A57…). |
| `5` | `ARM_CRYPTO` | AArch64 forced: SHA2 crypto extension for SHA-1/224/256 and SHA-512 hardware extension for SHA-384/512. Requires both `HWCAP_SHA2` and `HWCAP_SHA512` (ARMv8.2-A+sha3, e.g. Cortex-A55 rev≥1, Apple M-series, Neoverse). |
Note: "Forced" profiles (`1`–`5`) call intrinsics directly without any runtime CPU check. Running a forced binary on a CPU that lacks the required ISA will cause an illegal-instruction signal. A profile targeting the wrong architecture (e.g. `X86_SHA_NI` on ARM) is silently downgraded to `SCALAR`.
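The downgrade rule in the note can be written as a small `constexpr` function. The sketch below is hypothetical: the `Profile` and `Arch` enums mirror the documented macro values but are not DataForge's own types, and treating an unset `DATAFORGE_ACCEL_PROFILE` as `-1` (`AUTO`) follows the table's stated default.

```cpp
// Hypothetical mirror of the documented profile values (not DataForge's own types).
enum class Profile : int {
    Auto = -1, Scalar = 0, X86ShaNi = 1, X86Avx512 = 2,
    ArmNeon = 3, ArmSha = 4, ArmCrypto = 5,
};

enum class Arch { X86, AArch64, Other };

// Documented rule: a forced profile for the wrong architecture is downgraded
// to Scalar. Auto, Scalar, and any other value are returned unchanged.
constexpr Profile effective_profile(Profile requested, Arch target) {
    switch (requested) {
        case Profile::X86ShaNi:
        case Profile::X86Avx512:
            return target == Arch::X86 ? requested : Profile::Scalar;
        case Profile::ArmNeon:
        case Profile::ArmSha:
        case Profile::ArmCrypto:
            return target == Arch::AArch64 ? requested : Profile::Scalar;
        default:
            return requested;
    }
}

// Architecture of the current build, from predefined compiler macros.
constexpr Arch kBuildArch =
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
    Arch::X86;
#elif defined(__aarch64__) || defined(_M_ARM64)
    Arch::AArch64;
#else
    Arch::Other;
#endif

// Assumption for this sketch: an unset macro means AUTO (-1), as documented.
#ifndef DATAFORGE_ACCEL_PROFILE
#define DATAFORGE_ACCEL_PROFILE -1
#endif

constexpr Profile kEffectiveProfile =
    effective_profile(static_cast<Profile>(DATAFORGE_ACCEL_PROFILE), kBuildArch);

static_assert(effective_profile(Profile::X86ShaNi, Arch::AArch64) == Profile::Scalar);
```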
SHA-1 backends:

- x86 SHA-NI: Intel SHA Extensions (`_mm_sha1rnds4` / `_mm_sha1nexte` / `_mm_sha1msg1` / `_mm_sha1msg2`). Fastest single-block path for SHA-1. Used by `X86_SHA_NI`, `X86_AVX512`, and `AUTO` (when CPUID detects SHA).
- ARM SHA1: AArch64 crypto extensions (`vsha1cq` / `vsha1pq` / `vsha1mq` / `vsha1su0q` / `vsha1su1q`). Used by `ARM_CRYPTO` and `AUTO` (when `HWCAP_SHA1` is set in `getauxval(AT_HWCAP)`).
SHA-224/256 backends:

- x86 SHA-NI: Intel SHA Extensions (`_mm_sha256rnds2` / `msg1` / `msg2`). Fastest single-block path. Used by `X86_SHA_NI`, `X86_AVX512`, and `AUTO`.
- x86 AVX-512: vectorised message schedule (AVX-512F + AVX-512VL, `vprord`) for SHA-224/256. Used by `X86_AVX512` only; SHA-NI is faster per block, so AVX-512 is never selected in `AUTO` mode for this family.
- ARM SHA2: AArch64 crypto extensions (`vsha256hq` / `vsha256su*`). Used by `ARM_CRYPTO` and `AUTO` (when `HWCAP_SHA2` is present).
SHA-384/512 backends:

- x86 AVX-512: vectorised message schedule using `vprorq` (64-bit lane rotate), 8-way unrolled compression. Used by `X86_AVX512` and `AUTO` (when AVX-512F/VL is available and the OS has enabled the register state via `XCR0`).
- x86 SSE4.1: the same two-wide XMM schedule, but with 64-bit rotations decomposed into shift-or pairs. Used by `X86_SHA_NI` and `AUTO` (when AVX-512 is absent). Present on virtually all x86-64 hardware since ~2008.
- ARM SHA-512: ARMv8.2-A extension (`vsha512hq` / `vsha512h2q` / `vsha512su0q` / `vsha512su1q`). Used by `ARM_CRYPTO` and `AUTO` (when `HWCAP_SHA512` is set in `getauxval(AT_HWCAP)`).
- ARM NEON: vectorised message schedule using baseline NEON 64-bit lane operations (shift-or pairs). Available on all AArch64 CPUs (Cortex-A53 and later). Used by `ARM_NEON`, `ARM_CRYPTO` (as fallback when the SHA-512 extension is absent), and `AUTO` (when the SHA-512 extension is absent).
On GCC/Clang the intrinsics for each backend are enabled per function via `__attribute__((target(...)))`, so no global `-msha` / `-mavx512*` / `-march=armv8-a+sha2` / `-march=armv8.2-a+sha3` flags are needed to build them; on MSVC they are always available.
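The per-function mechanism can be shown with a small example unrelated to SHA: an SSE4.1 routine compiled via `__attribute__((target("sse4.1")))` without any global `-msse4.1` flag, selected at run time with GCC/Clang's `__builtin_cpu_supports`, and backed by a scalar path that produces identical results. The function names are hypothetical; this sketches the technique rather than DataForge's code.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_X86_TARGET 1
#endif

// Portable scalar reference: four 32-bit multiplies, keeping the low 32 bits.
void mul4_scalar(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out) {
    for (int i = 0; i < 4; ++i) out[i] = a[i] * b[i];
}

#ifdef SKETCH_HAVE_X86_TARGET
// SSE4.1 is enabled for this one function only.
__attribute__((target("sse4.1")))
void mul4_sse41(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out) {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // _mm_mullo_epi32 keeps the low 32 bits of each product, which are the
    // same for signed and unsigned operands, so results match mul4_scalar.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_mullo_epi32(va, vb));
}
#endif

// Dispatch: use SSE4.1 only when the running CPU reports it.
void mul4(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out) {
#ifdef SKETCH_HAVE_X86_TARGET
    if (__builtin_cpu_supports("sse4.1")) {
        mul4_sse41(a, b, out);
        return;
    }
#endif
    mul4_scalar(a, b, out);
}
```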
| Algorithm | x86 | AArch64 |
|---|---|---|
| SHA-1 | SHA-NI → scalar | SHA1 crypto ext → scalar |
| SHA-224/256 | SHA-NI → scalar | SHA2 crypto ext → scalar |
| SHA-384/512/… | AVX-512 → SSE4.1 → scalar | SHA-512 ext → NEON → scalar |
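The `AUTO` profile's "detect at first use, then cache" behaviour for the x86 SHA-384/512 column can be sketched with CPUID and `XGETBV`. The sketch only selects a backend (it does no hashing), covers GCC/Clang on x86 only (other targets report scalar), and uses hypothetical names; the bit positions are the standard ones: CPUID leaf 1 ECX bit 19 (SSE4.1) and bit 27 (OSXSAVE), leaf 7 EBX bit 16 (AVX-512F) and bit 31 (AVX-512VL), and XCR0 mask `0xE6` for OS-enabled AVX-512 state.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_X86_CPUID 1
#endif

enum class Sha512Backend { Scalar, Sse41, Avx512 };

#ifdef SKETCH_X86_CPUID
// XCR0 reports which register state the OS saves and restores.
static std::uint64_t read_xcr0() {
    std::uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
#endif

// Documented x86 order for SHA-384/512: AVX-512 -> SSE4.1 -> scalar.
static Sha512Backend detect_sha512_backend() {
#ifdef SKETCH_X86_CPUID
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return Sha512Backend::Scalar;
    const bool sse41 = (ecx >> 19) & 1u;
    const bool osxsave = (ecx >> 27) & 1u;  // XGETBV is only legal if set

    bool avx512 = false;
    unsigned a7 = 0, b7 = 0, c7 = 0, d7 = 0;
    if (osxsave && __get_cpuid_count(7, 0, &a7, &b7, &c7, &d7)) {
        const bool f = (b7 >> 16) & 1u;   // AVX-512F
        const bool vl = (b7 >> 31) & 1u;  // AVX-512VL
        // XCR0 bits 1-2 (SSE/AVX) and 5-7 (opmask, ZMM) must all be enabled.
        const bool os_zmm = (read_xcr0() & 0xE6u) == 0xE6u;
        avx512 = f && vl && os_zmm;
    }
    if (avx512) return Sha512Backend::Avx512;
    if (sse41) return Sha512Backend::Sse41;
#endif
    return Sha512Backend::Scalar;
}

// Detected once on first call and cached for the lifetime of the process;
// function-local static initialisation is thread-safe since C++11.
Sha512Backend sha512_backend() {
    static const Sha512Backend cached = detect_sha512_backend();
    return cached;
}
```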
```bash
# Default (auto-detect)
cmake -S . -B build

# Force scalar everywhere
cmake -S . -B build -DDATAFORGE_ACCEL_PROFILE=0

# Force x86 SHA-NI + SSE4.1 (profile 1)
cmake -S . -B build -DDATAFORGE_ACCEL_PROFILE=1

# Force x86 AVX-512 for all SHA-2 including SHA-224/256 (profile 2)
cmake -S . -B build -DDATAFORGE_ACCEL_PROFILE=2

# Force AArch64 NEON for SHA-512, scalar for SHA-1/256 (profile 3)
cmake -S . -B build -DDATAFORGE_ACCEL_PROFILE=3

# Force AArch64 SHA2 crypto extension, NEON for SHA-384/512 (profile 4)
cmake -S . -B build -DDATAFORGE_ACCEL_PROFILE=4

# Force AArch64 crypto extensions (SHA2 + SHA-512) (profile 5)
cmake -S . -B build -DDATAFORGE_ACCEL_PROFILE=5
```

Without CMake (direct compiler define):

```bash
g++ ... -DDATAFORGE_ACCEL_PROFILE=2
```

Distributed under the Boost Software License, Version 1.0.
The DataForge library is used in my iOS application on the App Store:

PotoHEX HEX File Viewer & Editor

This application lets you view and edit files at the byte or character level, calculate various hashes, and encode/decode or compress/decompress selected byte regions.
You can support my open-source development by trying the App.
Feedback is welcome!