hubvault.storage.chunk

Chunk planning helpers for hubvault.storage.

This module contains the chunk-planning logic used by the local repository backend when a file is large enough to switch from whole-blob storage to chunked pack storage. Phase 10 upgrades this path from fixed-size splitting to FastCDC content-defined chunking and uses blake3 as the planner’s fast digest.

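The module contains:

  • OBJECT_HASH, DEFAULT_CHUNK_SIZE – Module-level constants

  • ChunkDescriptor – Metadata for one logical chunk inside a larger file

  • ChunkPart – A chunk descriptor paired with its payload bytes

  • ChunkPlan – The chunked storage plan for one logical file

  • ChunkStore – Planner that builds deterministic chunk plans

  • sha256_hex(), git_blob_oid(), canonical_lfs_pointer() – Digest and Git LFS pointer helpers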

Example:

>>> store = ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024)
>>> plan = store.plan_bytes(b"abcdefgh")
>>> sum(chunk.logical_size for chunk in plan.chunks)
8
>>> plan.etag == plan.sha256
True
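
To illustrate what content-defined chunking means in the module description, here is a minimal sketch built on a simple gear-style rolling hash. It is not hubvault's planner and not a full FastCDC implementation: it omits FastCDC's normalized chunking and derives its gear table from hashlib rather than blake3. The name cdc_split, the gear table, and the power-of-two assumption for avg_size are all hypothetical choices made for illustration.

```python
import hashlib

# Hypothetical gear table: 256 deterministic pseudo-random 32-bit values.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
    for i in range(256)
]


def cdc_split(data, avg_size=256, min_size=64, max_size=1024):
    """Split data at content-defined boundaries (illustrative sketch only)."""
    mask = avg_size - 1  # assumes avg_size is a power of two
    chunks = []
    start = 0
    fingerprint = 0
    for index, byte in enumerate(data):
        # Gear hash: shift out old bytes, mix in the current one.
        fingerprint = ((fingerprint << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = index - start + 1
        if length < min_size:
            continue  # never cut before the minimum chunk size
        if (fingerprint & mask) == 0 or length >= max_size:
            chunks.append(data[start:index + 1])
            start = index + 1
            fingerprint = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than min_size
    return chunks
```

Because boundaries depend on local content rather than absolute offsets, an insertion near the start of a file tends to change only nearby chunks, which is what makes this approach friendlier to deduplication than fixed-size splitting.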

OBJECT_HASH

hubvault.storage.chunk.OBJECT_HASH = 'sha256'

DEFAULT_CHUNK_SIZE

hubvault.storage.chunk.DEFAULT_CHUNK_SIZE = 4194304

Default target average chunk size in bytes (4 MiB).

ChunkDescriptor

class hubvault.storage.chunk.ChunkDescriptor(chunk_id: str, checksum: str, logical_offset: int, logical_size: int, stored_size: int, compression: str = 'none')[source]

Describe one logical chunk inside a larger file.

Parameters:
  • chunk_id (str) – Internal chunk identifier with an explicit hash prefix

  • checksum (str) – Integrity checksum for the logical chunk payload

  • logical_offset (int) – Starting byte offset within the logical file

  • logical_size (int) – Logical chunk size in bytes

  • stored_size (int) – Stored chunk size in bytes

  • compression (str, optional) – Storage compression label, defaults to "none"

Example:

>>> descriptor = ChunkDescriptor("sha256:" + "a" * 64, "sha256:" + "a" * 64, 0, 4, 4)
>>> descriptor.logical_offset
0
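
The offset and size fields suggest that a file's descriptors are meant to tile it from start to end. The sketch below checks that property on a stand-in type. Both the Descriptor tuple and the covers_file helper are hypothetical, and the assumption that descriptors are contiguous and non-overlapping is an inference from the field descriptions, not something this reference states.

```python
from typing import NamedTuple, Sequence


class Descriptor(NamedTuple):
    """Hypothetical stand-in mirroring the ChunkDescriptor fields."""
    chunk_id: str
    checksum: str
    logical_offset: int
    logical_size: int
    stored_size: int
    compression: str = "none"


def covers_file(descriptors: Sequence[Descriptor], logical_size: int) -> bool:
    """Return True when descriptors tile [0, logical_size) with no gap or overlap."""
    expected_offset = 0
    for descriptor in descriptors:
        if descriptor.logical_offset != expected_offset:
            return False
        expected_offset += descriptor.logical_size
    return expected_offset == logical_size
```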

ChunkPart

class hubvault.storage.chunk.ChunkPart(descriptor: ChunkDescriptor, data: bytes)[source]

Pair a chunk descriptor with its payload bytes.

Parameters:
  • descriptor (ChunkDescriptor) – Logical chunk metadata

  • data (bytes) – Chunk payload bytes

Example:

>>> part = ChunkPart(ChunkDescriptor("sha256:" + "a" * 64, "sha256:" + "a" * 64, 0, 4, 4), b"data")
>>> part.data
b'data'

ChunkPlan

class hubvault.storage.chunk.ChunkPlan(logical_size: int, sha256: str, oid: str, etag: str, pointer_size: int, chunks: Tuple[ChunkDescriptor, ...], parts: Tuple[ChunkPart, ...])[source]

Describe the chunked storage plan for one logical file.

Parameters:
  • logical_size (int) – Logical file size in bytes

  • sha256 (str) – Raw hexadecimal SHA-256 digest of the logical file

  • oid (str) – Git blob OID of the canonical LFS pointer

  • etag (str) – Public ETag value, aligned with the file SHA-256 for LFS mode

  • pointer_size (int) – Size of the canonical LFS pointer in bytes

  • chunks (Tuple[ChunkDescriptor, ...]) – Ordered logical chunk descriptors

  • parts (Tuple[ChunkPart, ...]) – Ordered chunk payloads paired with descriptors

Example:

>>> store = ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024)
>>> plan = store.plan_bytes(b"abcdef")
>>> plan.pointer_size > 0
True
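
The plan fields are related to one another through the canonical LFS pointer. The following sketch shows one way those relationships could be computed with the standard library alone. The function derive_plan_metadata is hypothetical, and the pointer text is assumed to follow the Git LFS v1 pointer layout; hubvault's actual implementation may differ in detail.

```python
import hashlib


def derive_plan_metadata(data: bytes) -> dict:
    """Derive plan-level metadata for a payload (illustrative sketch only)."""
    sha256 = hashlib.sha256(data).hexdigest()
    # Assumed canonical pointer layout: version, oid, size, each newline-terminated.
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:{}\n"
        "size {}\n"
    ).format(sha256, len(data)).encode("utf-8")
    # Git hashes a blob as SHA-1 over "blob <length>\0" followed by the content.
    header = b"blob " + str(len(pointer)).encode("ascii") + b"\0"
    oid = hashlib.sha1(header + pointer).hexdigest()
    return {
        "logical_size": len(data),
        "sha256": sha256,
        "oid": oid,  # OID of the pointer, not of the payload
        "etag": sha256,  # aligned with the file SHA-256 in LFS mode
        "pointer_size": len(pointer),
    }
```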

ChunkStore

class hubvault.storage.chunk.ChunkStore(chunk_size: int = 4194304, min_chunk_size: int | None = None, max_chunk_size: int | None = None)[source]

Build deterministic chunk plans for large file payloads.

Parameters:
  • chunk_size (int, optional) – Target average chunk size in bytes, defaults to DEFAULT_CHUNK_SIZE

  • min_chunk_size (Optional[int], optional) – Optional minimum chunk size, defaults to DEFAULT_MIN_CHUNK_SIZE

  • max_chunk_size (Optional[int], optional) – Optional maximum chunk size, defaults to DEFAULT_MAX_CHUNK_SIZE

Raises:

ValueError – Raised when the chunk-size settings are invalid.

Example:

>>> store = ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024)
>>> store.chunk_size
256
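
This reference does not spell out which chunk-size settings count as invalid. As a hedged illustration, the sketch below enforces one plausible rule: all three sizes are positive integers and the target lies between the minimum and maximum. The helper validate_chunk_sizes and the rule itself are assumptions, not hubvault's documented behavior.

```python
def validate_chunk_sizes(chunk_size, min_chunk_size, max_chunk_size):
    """Raise ValueError unless 0 < min <= target <= max (assumed rule)."""
    sizes = (chunk_size, min_chunk_size, max_chunk_size)
    # bool is a subclass of int, so reject it explicitly.
    if any(isinstance(size, bool) or not isinstance(size, int) for size in sizes):
        raise ValueError("chunk sizes must be integers")
    if min_chunk_size <= 0:
        raise ValueError("min_chunk_size must be positive")
    if not (min_chunk_size <= chunk_size <= max_chunk_size):
        raise ValueError("expected min_chunk_size <= chunk_size <= max_chunk_size")
```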

__init__(chunk_size: int = 4194304, min_chunk_size: int | None = None, max_chunk_size: int | None = None) → None[source]

Initialize the chunk planner.

Parameters:
  • chunk_size (int, optional) – Target average chunk size in bytes, defaults to DEFAULT_CHUNK_SIZE

  • min_chunk_size (Optional[int], optional) – Optional minimum chunk size, defaults to DEFAULT_MIN_CHUNK_SIZE

  • max_chunk_size (Optional[int], optional) – Optional maximum chunk size, defaults to DEFAULT_MAX_CHUNK_SIZE

Returns:

None.

Return type:

None

Raises:

ValueError – Raised when the chunk-size settings are invalid.

Example:

>>> ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024).chunk_size
256

plan_bytes(data: bytes) → ChunkPlan[source]

Split bytes into content-defined chunks and compute public metadata.

Parameters:

data (bytes) – Logical file payload bytes

Returns:

Chunk plan with chunk descriptors and canonical LFS metadata

Return type:

ChunkPlan

Raises:

ValueError – Raised when data is not byte-like.

Example:

>>> store = ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024)
>>> plan = store.plan_bytes(b"abcdefgh")
>>> sum(len(part.data) for part in plan.parts)
8
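
A consumer of a plan will typically want to rebuild the original payload and confirm it matches the plan's metadata. The sketch below does that for an ordered sequence of chunk payloads. The reassemble helper is hypothetical and uses only hashlib.

```python
import hashlib
from typing import Iterable


def reassemble(payloads: Iterable[bytes], expected_sha256: str, expected_size: int) -> bytes:
    """Join ordered chunk payloads and verify them against plan-level metadata."""
    data = b"".join(payloads)
    if len(data) != expected_size:
        raise ValueError("reassembled size does not match the plan")
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("reassembled SHA-256 does not match the plan")
    return data
```

With a real plan, the arguments would presumably be (part.data for part in plan.parts), plan.sha256, and plan.logical_size.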

sha256_hex

hubvault.storage.chunk.sha256_hex(data: bytes) → str[source]

Compute a lowercase hexadecimal SHA-256 digest.

Parameters:

data (bytes) – Input bytes

Returns:

Lowercase hexadecimal digest

Return type:

str

Example:

>>> sha256_hex(b"abc")
'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'
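
An equivalent digest can be computed directly with hashlib. The sketch below shows a one-shot version and an incremental version that suits payloads arriving in pieces. Both function names are hypothetical and neither is claimed to be hubvault's implementation.

```python
import hashlib
from typing import Iterable


def sha256_hex_sketch(data: bytes) -> str:
    """One-shot lowercase hexadecimal SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


def sha256_hex_stream(pieces: Iterable[bytes]) -> str:
    """Incremental digest over pieces; equals the digest of their concatenation."""
    digest = hashlib.sha256()
    for piece in pieces:
        digest.update(piece)
    return digest.hexdigest()
```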

git_blob_oid

hubvault.storage.chunk.git_blob_oid(data: bytes) → str[source]

Compute a Git-compatible blob OID for bytes.

Parameters:

data (bytes) – Blob payload bytes

Returns:

Git SHA-1 blob OID without a prefix

Return type:

str

Example:

>>> len(git_blob_oid(b"abc"))
40
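
Git derives a blob's SHA-1 OID by hashing a short header followed by the content. The sketch below reproduces that scheme with hashlib. The name git_blob_oid_sketch is hypothetical, and the code is shown only to make the format concrete.

```python
import hashlib


def git_blob_oid_sketch(data: bytes) -> str:
    """SHA-1 over b"blob <decimal length>\\0" + data, as Git does for blobs."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# The empty blob has a well-known OID in every SHA-1 Git repository.
print(git_blob_oid_sketch(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```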

canonical_lfs_pointer

hubvault.storage.chunk.canonical_lfs_pointer(file_sha256: str, size: int) → bytes[source]

Build canonical Git LFS pointer bytes for a file.

Parameters:
  • file_sha256 (str) – Raw hexadecimal SHA-256 digest of the logical file

  • size (int) – Logical file size in bytes

Returns:

Canonical pointer payload bytes

Return type:

bytes

Example:

>>> canonical_lfs_pointer("a" * 64, 5).startswith(b"version https://git-lfs.github.com/spec/v1\n")
True
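
For orientation, a Git LFS v1 pointer is a small text file with a version line, an oid line, and a size line. The sketch below builds such a pointer and parses it back. The helpers build_pointer and parse_pointer are hypothetical, and the assumption that hubvault's canonical form contains exactly these three lines is not confirmed by this reference.

```python
import re

# Assumed canonical layout: exactly three newline-terminated lines.
POINTER_RE = re.compile(
    rb"\Aversion https://git-lfs\.github\.com/spec/v1\n"
    rb"oid sha256:([0-9a-f]{64})\n"
    rb"size (\d+)\n\Z"
)


def build_pointer(file_sha256: str, size: int) -> bytes:
    """Build pointer bytes from a hex SHA-256 digest and a byte size."""
    text = "version https://git-lfs.github.com/spec/v1\noid sha256:{}\nsize {}\n"
    return text.format(file_sha256, size).encode("utf-8")


def parse_pointer(pointer: bytes):
    """Return (file_sha256, size) or raise ValueError for a non-matching pointer."""
    match = POINTER_RE.match(pointer)
    if match is None:
        raise ValueError("not a canonical LFS pointer")
    return match.group(1).decode("ascii"), int(match.group(2))
```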