Generic Binary Keyed Format (*.gbkf)
This is the specification of a compact, extensible binary format designed for fast, schema-flexible storage and transmission of structured data.
Field sizes were chosen to balance storage efficiency against typical enterprise-scale data volumes. In general, the format’s limits far exceed the file sizes most implementations will ever write. For example, the maximum Instance ID is 4,294,967,295.
This document focuses on describing the format itself, so detailed technical choices, tests, examples, and implementations are not covered here. Such documentation can be found at gbkf-format.org.
This version was released on July 27, 2025.
Format Specification
Description:
- Header: Main identifier of the file, essential to read the body.
- Body: Container of all the Keyed-Values.
- Footer: SHA-256 digest used to verify the file integrity.
Remark: The minimum valid GBKF file consists of only the header, with zero keyed-values and no footer. This represents a file with no payload and no integrity check. While the footer is optional, its use is strongly recommended to ensure file integrity.
Header
gbkf:
- Type: ASCII character sequence
- Encoding: 1 byte per character using 7-bit ASCII (ISO 646)
- Size: 4 bytes, 1 byte per character.
- Description: Format Identifier.
gbkf Version:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 1 byte
Values:
- Min: 0
- Max: 255
- Description: Version of the GBKF specification. This document describes version 1.
Specification ID:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 4 bytes
Values:
- Min: 0
- Max: 4,294,967,295
- Default: 0
- Description: Distinguishes top-level specifications built on GBKF. The list of reserved ranges is published separately.
Specification Version:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 2 bytes
Values:
- Min: 0
- Max: 65,535
- Default: 0
- Description: Version of the top-level specification.
Main String Encoding:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 2 bytes
- Values: MIBenum values from the IANA Character Sets registry (iana.org). Some examples are:
- 3: ASCII
- 4: Latin-1
- 106: UTF-8
- Description: Main encoding used for STRING fields. Keeping UTF-8 allows storing "rich text". Where possible, Latin-1 is recommended, as it significantly reduces the size of fixed-size strings.
Secondary String Encoding:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 2 bytes
- Values: MIBenum values from the IANA Character Sets registry (iana.org). Some examples are:
- 3: ASCII
- 4: Latin-1
- 106: UTF-8
- Description: Secondary encoding used for STRING fields. Keeping ASCII for identifiers or simple data significantly reduces storage size, notably for fixed-size strings.
Keys Size:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 1 byte
Values:
- Min: 1
- Max: 255
- Default: 1
- Description: Number of bytes used to store each key. Depending on the implementation, this can also be interpreted as the maximum key length; in any case, the key field must have this same fixed size for every entry in the binary file.
Number of Keyed-Values:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 4 bytes
Values:
- Min: 0
- Max: 4,294,967,295
- Default: 0
- Description: Total number of keyed-values in the body, i.e., before the SHA-256 footer.
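As an illustration, the main header can be packed with Python's `struct` module. This is a minimal sketch under stated assumptions: the identifier bytes are taken to be the ASCII characters `gbkf`, the fields are written in the order listed above, and the function names as well as the example encoding choices (UTF-8 as main, ASCII as secondary) are hypothetical, not defaults defined by this specification.

```python
import struct

# Big-endian layout of the main header, field by field:
# gbkf (4s), gbkf Version (B), Specification ID (I), Specification Version (H),
# Main String Encoding (H), Secondary String Encoding (H), Keys Size (B),
# Number of Keyed-Values (I). ">" disables padding, so the total is 20 bytes.
HEADER_FORMAT = ">4sBIHHHBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 20

# Assumption: the format identifier is the ASCII text "gbkf".
MAGIC = b"gbkf"


def pack_header(spec_id=0, spec_version=0, main_encoding=106,
                secondary_encoding=3, keys_size=1, n_keyed_values=0):
    """Serialize the main header for GBKF specification version 1.

    main_encoding=106 (UTF-8) and secondary_encoding=3 (ASCII) are example
    choices only; struct raises struct.error for out-of-range integers.
    """
    if not 1 <= keys_size <= 255:
        raise ValueError("Keys Size must be between 1 and 255")
    return struct.pack(HEADER_FORMAT, MAGIC, 1, spec_id, spec_version,
                       main_encoding, secondary_encoding, keys_size,
                       n_keyed_values)


def unpack_header(buf):
    """Parse the first 20 bytes of a file into a dict of header fields."""
    (magic, version, spec_id, spec_version, main_enc, secondary_enc,
     keys_size, n_keyed_values) = struct.unpack_from(HEADER_FORMAT, buf, 0)
    if magic != MAGIC:
        raise ValueError("not a GBKF file")
    return {
        "gbkf_version": version,
        "spec_id": spec_id,
        "spec_version": spec_version,
        "main_encoding": main_enc,
        "secondary_encoding": secondary_enc,
        "keys_size": keys_size,
        "n_keyed_values": n_keyed_values,
    }
```

Called with its defaults, `pack_header()` yields the 20-byte header of the minimal valid file described in the remark above (zero keyed-values, no footer).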
Body
The body contains the actual payload, and all of its content is managed through a Keyed-(Instance)-Value system. Each keyed-value starts with a common header, followed by the values' data.
The body can contain from zero to 4,294,967,295 Keyed-(Instance)-Values.
Values-Header
This header is common to all keyed-values. It defines the data and its size, and identifies or groups it internally. It also makes it possible to build a mapping table when reading the file, even though for some types (such as STRING) the header is extended.
Key:
- Type: ASCII character sequence
- Encoding: 1 byte per character using 7-bit ASCII (ISO 646)
- Size: <Keys Size> bytes, as defined in the main header.
- Description: Identifier of the values. Keys provide a human-readable way to identify the kind of data stored.
Instance ID:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 4 Bytes
Values:
- Min: 0
- Max: 4,294,967,295
- Default: 0
- Description: Distinguishes data sharing the same key. This makes it possible to group data across different keys, or to write sequential data under the same key.
Number of Values:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 4 Bytes
Values:
- Min: 0
- Max: 4,294,967,295
- Default: 0
- Description: Number of values held by the key/instance ID pair. Depending on the type, this can also represent a number of bytes (for example, for BLOB).
- Values Type: the type of the stored values, one of the types described below.
Type BLOB
The binary data is stored without transformation or encoding. Each byte is treated as one value, and the "Number of Values" field in the Keyed-Value header represents the total number of bytes in the payload.
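The sketch below writes and reads one BLOB keyed-value, showing how the Values-Header fields are laid out and how Number of Values doubles as the byte count. It assumes that the header fields appear in the order listed and that keys shorter than Keys Size are padded with null bytes, and it omits the Values Type field entirely because its size and codes are not specified here; the function names are hypothetical.

```python
import struct


def write_blob_keyed_value(key, instance_id, data, keys_size):
    """Serialize one BLOB keyed-value: Values-Header followed by raw bytes."""
    key_bytes = key.encode("ascii")  # keys are 7-bit ASCII
    if len(key_bytes) > keys_size:
        raise ValueError("key longer than Keys Size")
    # Assumption: keys shorter than Keys Size are padded with null bytes.
    header = key_bytes.ljust(keys_size, b"\x00")
    # Instance ID and Number of Values (= byte count for BLOB), big-endian uint32.
    # Assumption: the Values Type field is omitted, since its size is not specified.
    header += struct.pack(">II", instance_id, len(data))
    return header + bytes(data)


def read_blob_keyed_value(buf, offset, keys_size):
    """Return (key, instance_id, data, next_offset) for a BLOB at `offset`."""
    key = buf[offset:offset + keys_size].rstrip(b"\x00").decode("ascii")
    instance_id, n_bytes = struct.unpack_from(">II", buf, offset + keys_size)
    start = offset + keys_size + 8
    end = start + n_bytes
    if end > len(buf):
        raise ValueError("truncated BLOB payload")
    return key, instance_id, bytes(buf[start:end]), end
```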
Type BOOLEAN
Booleans are packed in groups of 8 bits, each group written as one byte. Because of that, the number of useful bits in the last byte must also be stored.
Last Byte Useful Booleans Nb:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 1 Byte
- Values: 1 to 8
- Description: Number of useful bits in the last byte.
Values:
- Type: Boolean
- Encoding: Big-endian
- Size: 1 Byte per group of 8 booleans
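One possible way to pack booleans is sketched below. Two details are not fixed by the text above and are assumptions here: the first boolean is stored in the most significant bit of each byte, and Number of Values counts packed bytes rather than individual booleans. The function names are hypothetical.

```python
def pack_booleans(flags):
    """Return (number_of_values, payload) for a BOOLEAN keyed-value.

    payload = Last Byte Useful Booleans Nb (1 byte) + packed bytes.
    Assumptions: the first boolean goes into the most significant bit, and
    Number of Values counts packed bytes rather than booleans.
    """
    if not flags:
        raise ValueError("at least one boolean is required (useful bits: 1 to 8)")
    packed = bytearray()
    for i in range(0, len(flags), 8):
        byte = 0
        for bit, flag in enumerate(flags[i:i + 8]):
            if flag:
                byte |= 0x80 >> bit  # MSB-first (assumed bit order)
        packed.append(byte)
    last_useful = len(flags) - 8 * (len(packed) - 1)  # always 1..8
    return len(packed), bytes([last_useful]) + bytes(packed)


def unpack_booleans(payload, number_of_values):
    """Inverse of pack_booleans under the same assumptions."""
    last_useful = payload[0]
    packed = payload[1:1 + number_of_values]
    flags = []
    for index, byte in enumerate(packed):
        useful = last_useful if index == len(packed) - 1 else 8
        flags.extend(bool(byte & (0x80 >> bit)) for bit in range(useful))
    return flags
```

For ten booleans, this produces two packed bytes and a last-byte count of 2.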
Type STRING
There are two string subtypes: fixed-size and dynamic-size. Both start with the following fields:
Encoding Choice:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 1 Byte
- Values:
- 0 to use the Main Encoding
- 1 to use the Secondary Encoding
- Description: Having two possible encodings makes it possible to store "rich text" while also storing identifiers efficiently with the secondary encoding.
String Type OR Size:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 2 Bytes
- Values:
- 0 for Dynamic-sized strings
- 1 to 65,535 for Fixed-Sized strings
- Description: Indicates whether the subtype is fixed-size or dynamic-size. For fixed-size strings, the value must equal the maximum number of characters, not counting the null character.
Fixed-Size String
Remarks:
- Strings are limited to the maximum size given by the String Type OR Size field.
- Strings shorter than the maximum are terminated with the null character.
- The number of bytes per character is provided by the encoding.
- Latin-1 or ASCII is recommended for fixed-size strings, as it needs 1 byte per character instead of up to 4 bytes with UTF-8.
- This subtype is recommended for database or field identifiers.
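The following sketch encodes the type-specific part of a fixed-size STRING keyed-value (the Encoding Choice and String Type OR Size fields, followed by one slot per string). It assumes a single-byte encoding such as Latin-1 or ASCII, so that every slot is exactly `max_chars` bytes, and that the unused end of a slot is filled with null characters; the function names are hypothetical.

```python
import struct


def encode_fixed_strings(values, max_chars, encoding_choice=1, codec="latin-1"):
    """Encode the type-specific part of a fixed-size STRING keyed-value.

    Layout: Encoding Choice (1 byte), String Type OR Size = max_chars (2 bytes),
    then one slot per string. Assumptions: `codec` is a single-byte encoding,
    so a slot is max_chars bytes, and unused slot bytes are null characters.
    """
    if not 1 <= max_chars <= 65535:
        raise ValueError("fixed size must be between 1 and 65,535")
    out = bytearray(struct.pack(">BH", encoding_choice, max_chars))
    for value in values:
        raw = value.encode(codec)
        if len(raw) > max_chars:
            raise ValueError(f"{value!r} exceeds {max_chars} characters")
        out += raw.ljust(max_chars, b"\x00")  # null-terminate and pad
    return bytes(out)


def decode_fixed_strings(payload, number_of_values, codec="latin-1"):
    """Read back the strings written by encode_fixed_strings."""
    _choice, max_chars = struct.unpack_from(">BH", payload, 0)
    strings = []
    for i in range(number_of_values):
        slot = payload[3 + i * max_chars:3 + (i + 1) * max_chars]
        strings.append(slot.split(b"\x00", 1)[0].decode(codec))
    return strings
```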
Dynamic-Size String
Total number of bytes:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 4 Bytes
- Description:
$\text{Total number of bytes} = \sum_{i=1}^{N} \text{size}(\text{String}_i)$, where $N$ is the Number of Values.
This field gives the combined size of the values, which is needed to move to the next keyed-value.
String Size:
- Type: Unsigned Integer
- Encoding: Big-endian
- Size: 2 Bytes
- Description: Number of bytes needed to store the string, not counting the null character.
Remarks:
- Each string's size is stored immediately before its value.
- The number of bytes per character is provided by the encoding.
- This subtype is recommended for variable-length strings or for UTF-8 encoding.
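A dynamic-size STRING keyed-value can be sketched in the same way. Following the Total number of bytes definition, the total here counts only the encoded string bytes, so a reader that wants to skip the whole keyed-value must add the 2-byte String Size prefixes itself, as `skip_dynamic_strings` does. This interpretation and the function names are assumptions.

```python
import struct


def encode_dynamic_strings(values, encoding_choice=0, codec="utf-8"):
    """Encode the type-specific part of a dynamic-size STRING keyed-value.

    Layout: Encoding Choice (1 byte), String Type OR Size = 0 (2 bytes),
    Total number of bytes (4 bytes), then String Size (2 bytes) + bytes for
    each string. Assumption: the total excludes the String Size prefixes.
    """
    encoded = [value.encode(codec) for value in values]
    for raw in encoded:
        if len(raw) > 65535:
            raise ValueError("string longer than 65,535 bytes")
    total = sum(len(raw) for raw in encoded)
    out = bytearray(struct.pack(">BHI", encoding_choice, 0, total))
    for raw in encoded:
        out += struct.pack(">H", len(raw)) + raw
    return bytes(out)


def skip_dynamic_strings(payload, number_of_values):
    """Offset just past the strings: 7 prefix bytes + total + 2 per string."""
    (total,) = struct.unpack_from(">I", payload, 3)
    return 7 + total + 2 * number_of_values
```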
Type INT/UINT
All integers are written in sequence and share the same structure. The type determines their size and range.
Values:
- Type: Signed (two's complement) or unsigned integer, depending on the type
- Encoding: Big-endian
- Size:
- 1: for uint8 / int8
- 2: for uint16 / int16
- 4: for uint32 / int32
- 8: for uint64 / int64
Ranges:
- int8: −128 to 127
- int16: −32,768 to 32,767
- int32: −2,147,483,648 to 2,147,483,647
- int64: −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
- uint8: 0 to 255
- uint16: 0 to 65,535
- uint32: 0 to 4,294,967,295
- uint64: 0 to 18,446,744,073,709,551,615
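Because every integer type maps onto a fixed-width big-endian representation, Python's `struct` format characters cover all eight cases. A minimal sketch, assuming the type is known by name (the `INT_FORMATS` table and the function names are hypothetical); `struct` itself raises `struct.error` for values outside the ranges listed above.

```python
import struct

# struct format characters for each integer type; with ">" they use the
# standard sizes of 1, 2, 4 and 8 bytes.
INT_FORMATS = {
    "int8": "b", "int16": "h", "int32": "i", "int64": "q",
    "uint8": "B", "uint16": "H", "uint32": "I", "uint64": "Q",
}


def pack_ints(int_type, values):
    """Write integers in sequence, big-endian; out-of-range values raise struct.error."""
    return struct.pack(f">{len(values)}{INT_FORMATS[int_type]}", *values)


def unpack_ints(int_type, payload, number_of_values):
    """Read number_of_values integers of the given type from the payload."""
    fmt = f">{number_of_values}{INT_FORMATS[int_type]}"
    return list(struct.unpack_from(fmt, payload, 0))
```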
Type FLOAT32
Float32 values are read and written in sequence using IEEE 754 single precision, and only finite normalized values are supported.
Values:
- Type: Float32
- Encoding: Big-endian
- Size: 4 Bytes
Ranges:
- Min: -3.4028235e+38
- Max: 3.4028235e+38
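A sketch of a FLOAT32 writer that enforces the "finite normalized" rule by checking each value after it has been rounded to single precision. Treating 0.0 as acceptable is an assumption, since zero is not a normalized number in IEEE 754 terms; the names are hypothetical.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normalized float32


def pack_float32(values):
    """Write values as big-endian IEEE 754 single precision.

    Rejects NaN, infinities, values beyond the float32 range and values that
    would become subnormal. Assumption: zero is accepted.
    """
    out = bytearray()
    for x in values:
        if not math.isfinite(x):
            raise ValueError(f"{x!r} is not finite")
        raw = struct.pack(">f", x)  # raises OverflowError beyond the float32 range
        (y,) = struct.unpack(">f", raw)  # value after rounding to single precision
        if x != 0.0 and abs(y) < FLT_MIN_NORMAL:
            raise ValueError(f"{x!r} is not a normalized float32 value")
        out += raw
    return bytes(out)
```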
Type FLOAT64
Float64 values are read and written in sequence using IEEE 754 double precision, and only finite normalized values are supported.
Values:
- Type: Float64
- Encoding: Big-endian
- Size: 8 Bytes
Ranges:
- Min: -1.7976931348623157e+308
- Max: 1.7976931348623157e+308
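On the reading side, FLOAT64 values can be decoded with `struct.iter_unpack` and validated against the same rule. `sys.float_info.min` is the smallest positive normalized double; accepting 0.0 is again an assumption, and the function name is hypothetical.

```python
import math
import struct
import sys

DBL_MIN_NORMAL = sys.float_info.min  # smallest positive normalized float64


def unpack_float64(payload, number_of_values):
    """Read number_of_values big-endian IEEE 754 doubles and validate them.

    Rejects NaN, infinities and subnormals. Assumption: zero is accepted.
    """
    size = 8 * number_of_values
    if len(payload) < size:
        raise ValueError("truncated FLOAT64 payload")
    values = [x for (x,) in struct.iter_unpack(">d", payload[:size])]
    for x in values:
        if not math.isfinite(x) or (x != 0.0 and abs(x) < DBL_MIN_NORMAL):
            raise ValueError(f"{x!r} is not a finite normalized float64")
    return values
```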
Footer
The footer is a 32-byte SHA-256 digest computed over the header and the body. It allows readers to verify the file's integrity.
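The footer can be produced and checked with `hashlib`. Because the footer is optional, this sketch assumes the caller already knows a footer is present (for instance, because 32 bytes remain after the last keyed-value has been parsed); the function names are hypothetical.

```python
import hashlib

FOOTER_SIZE = 32  # SHA-256 digest length in bytes


def add_footer(header_and_body):
    """Append the SHA-256 digest of the header and body."""
    return header_and_body + hashlib.sha256(header_and_body).digest()


def verify_footer(file_bytes):
    """Check that the last 32 bytes are the SHA-256 of everything before them."""
    if len(file_bytes) < FOOTER_SIZE:
        return False
    content, footer = file_bytes[:-FOOTER_SIZE], file_bytes[-FOOTER_SIZE:]
    return hashlib.sha256(content).digest() == footer
```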