hubvault.storage.chunk
Chunk planning helpers for hubvault.storage.
This module contains the chunk-planning logic used by the local repository
backend when a file is large enough to switch from whole-blob storage to
chunked pack storage. Phase 10 upgrades this path from fixed-size splitting to
FastCDC content-defined chunking and uses blake3 as the planner’s fast
digest.
The module contains:
- ChunkDescriptor - Logical metadata for one chunk in a file
- ChunkPart - Chunk descriptor paired with chunk payload bytes
- ChunkPlan - Full chunk plan and LFS-style public metadata for a file
- ChunkStore - Planner that splits bytes into content-defined chunks
- canonical_lfs_pointer() - Build canonical Git LFS pointer bytes
- git_blob_oid() - Compute a Git-compatible blob OID
Example:
>>> store = ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024)
>>> plan = store.plan_bytes(b"abcdefgh")
>>> sum(chunk.logical_size for chunk in plan.chunks)
8
>>> plan.etag == plan.sha256
True
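The content-defined chunking described in the module summary can be illustrated with a simplified gear-hash splitter. This is a minimal sketch under stated assumptions, not FastCDC itself and not the hubvault implementation: the `cdc_boundaries` name, the seeded gear table, and the single-mask cut test are all hypothetical choices made for illustration.

```python
import random

# ASSUMPTION: a seeded pseudo-random gear table; real FastCDC ships a fixed table.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_boundaries(data, avg_size=256, min_size=64, max_size=1024):
    """Yield (offset, length) pairs that tile ``data`` with content-defined cuts.

    ``avg_size`` is assumed to be a power of two so that ``avg_size - 1``
    can serve as a bit mask for the cut test.
    """
    mask = avg_size - 1
    start = 0
    total = len(data)
    while start < total:
        # Never let a chunk grow past max_size or past the end of the input.
        end = min(start + max_size, total)
        cut = end
        fingerprint = 0
        # Skip the first min_size bytes so that no chunk is cut too early.
        for index in range(start + min_size, end):
            fingerprint = ((fingerprint << 1) + GEAR[data[index]]) & 0xFFFFFFFF
            if fingerprint & mask == 0:
                cut = index + 1
                break
        yield start, cut - start
        start = cut
```

Because each cut depends only on the bytes scanned since the previous boundary, an edit early in a file tends to shift only nearby boundaries, which is the property that makes chunk reuse possible. Fixed-size splitting lacks this property: a one-byte insertion shifts every later chunk.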
OBJECT_HASH
- hubvault.storage.chunk.OBJECT_HASH = 'sha256'
DEFAULT_CHUNK_SIZE
- hubvault.storage.chunk.DEFAULT_CHUNK_SIZE = 4194304
ChunkDescriptor
- class hubvault.storage.chunk.ChunkDescriptor(chunk_id: str, checksum: str, logical_offset: int, logical_size: int, stored_size: int, compression: str = 'none')[source]
Describe one logical chunk inside a larger file.
- Parameters:
chunk_id (str) – Internal chunk identifier with an explicit hash prefix
checksum (str) – Integrity checksum for the logical chunk payload
logical_offset (int) – Starting byte offset within the logical file
logical_size (int) – Logical chunk size in bytes
stored_size (int) – Stored chunk size in bytes
compression (str, optional) – Storage compression label, defaults to "none"
Example:
>>> descriptor = ChunkDescriptor("sha256:" + "a" * 64, "sha256:" + "a" * 64, 0, 4, 4)
>>> descriptor.logical_offset
0
ChunkPart
- class hubvault.storage.chunk.ChunkPart(descriptor: ChunkDescriptor, data: bytes)[source]
Pair a chunk descriptor with its payload bytes.
- Parameters:
descriptor (ChunkDescriptor) – Logical chunk metadata
data (bytes) – Chunk payload bytes
Example:
>>> part = ChunkPart(ChunkDescriptor("sha256:" + "a" * 64, "sha256:" + "a" * 64, 0, 4, 4), b"data")
>>> part.data
b'data'
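The descriptor fields suggest how ordered parts can be stitched back into the logical file. The sketch below uses stand-in `Descriptor` and `Part` tuples that mirror the documented fields rather than importing hubvault. It assumes uncompressed chunks and a checksum of the form `"sha256:" + hex digest of the payload`; the real planner may use a different digest for `chunk_id` or `checksum`.

```python
import hashlib
from typing import NamedTuple, Sequence


class Descriptor(NamedTuple):
    # Stand-in mirroring the documented ChunkDescriptor fields.
    chunk_id: str
    checksum: str
    logical_offset: int
    logical_size: int
    stored_size: int
    compression: str = "none"


class Part(NamedTuple):
    # Stand-in mirroring the documented ChunkPart fields.
    descriptor: Descriptor
    data: bytes


def make_part(offset: int, payload: bytes) -> Part:
    # ASSUMPTION: both identifiers are "sha256:" + hex digest of the payload.
    digest = "sha256:" + hashlib.sha256(payload).hexdigest()
    return Part(Descriptor(digest, digest, offset, len(payload), len(payload)), payload)


def reassemble(parts: Sequence[Part]) -> bytes:
    """Concatenate parts, checking offsets, sizes, and checksums on the way."""
    out = bytearray()
    for part in parts:
        desc = part.descriptor
        if desc.logical_offset != len(out):
            raise ValueError("gap or overlap at offset %d" % desc.logical_offset)
        if desc.logical_size != len(part.data):
            raise ValueError("size mismatch at offset %d" % desc.logical_offset)
        expected = "sha256:" + hashlib.sha256(part.data).hexdigest()
        if desc.checksum != expected:
            raise ValueError("checksum mismatch at offset %d" % desc.logical_offset)
        out += part.data
    return bytes(out)
```

Checking `logical_offset` against the running length catches both missing and reordered chunks before any bytes are trusted.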
ChunkPlan
- class hubvault.storage.chunk.ChunkPlan(logical_size: int, sha256: str, oid: str, etag: str, pointer_size: int, chunks: Tuple[ChunkDescriptor, ...], parts: Tuple[ChunkPart, ...])[source]
Describe the chunked storage plan for one logical file.
- Parameters:
logical_size (int) – Logical file size in bytes
sha256 (str) – Raw hexadecimal SHA-256 digest of the logical file
oid (str) – Git blob OID of the canonical LFS pointer
etag (str) – Public ETag value, aligned with the file SHA-256 for LFS mode
pointer_size (int) – Size of the canonical LFS pointer in bytes
chunks (Tuple[ChunkDescriptor, ...]) – Ordered logical chunk descriptors
parts (Tuple[ChunkPart, ...]) – Ordered chunk payloads paired with descriptors
Example:
>>> store = ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024)
>>> plan = store.plan_bytes(b"abcdef")
>>> plan.pointer_size > 0
True
ChunkStore
- class hubvault.storage.chunk.ChunkStore(chunk_size: int = 4194304, min_chunk_size: int | None = None, max_chunk_size: int | None = None)[source]
Build deterministic chunk plans for large file payloads.
- Parameters:
chunk_size (int, optional) – Target average chunk size in bytes, defaults to DEFAULT_CHUNK_SIZE
min_chunk_size (Optional[int], optional) – Optional minimum chunk size, defaults to DEFAULT_MIN_CHUNK_SIZE
max_chunk_size (Optional[int], optional) – Optional maximum chunk size, defaults to DEFAULT_MAX_CHUNK_SIZE
- Raises:
ValueError – Raised when the chunk-size settings are invalid.
Example:
>>> store = ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024)
>>> store.chunk_size
256
- __init__(chunk_size: int = 4194304, min_chunk_size: int | None = None, max_chunk_size: int | None = None) None[source]
Initialize the chunk planner.
- Parameters:
chunk_size (int, optional) – Target average chunk size in bytes, defaults to DEFAULT_CHUNK_SIZE
min_chunk_size (Optional[int], optional) – Optional minimum chunk size, defaults to DEFAULT_MIN_CHUNK_SIZE
max_chunk_size (Optional[int], optional) – Optional maximum chunk size, defaults to DEFAULT_MAX_CHUNK_SIZE
- Returns:
None.
- Return type:
None
- Raises:
ValueError – Raised when the chunk-size settings are invalid.
Example:
>>> ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024).chunk_size
256
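The documentation does not spell out which chunk-size settings count as invalid. One plausible reading is sketched below: every size must be a positive integer and the three must satisfy min <= average <= max. The `resolve_chunk_sizes` helper and the fallback values derived from the average are assumptions for illustration only; the actual DEFAULT_MIN_CHUNK_SIZE and DEFAULT_MAX_CHUNK_SIZE values are not shown in this page.

```python
from typing import Optional, Tuple

DEFAULT_CHUNK_SIZE = 4194304  # value documented for hubvault.storage.chunk


def resolve_chunk_sizes(
    chunk_size: int = DEFAULT_CHUNK_SIZE,
    min_chunk_size: Optional[int] = None,
    max_chunk_size: Optional[int] = None,
) -> Tuple[int, int, int]:
    """Return (min, average, max) sizes or raise ValueError (hypothetical rules)."""
    if not isinstance(chunk_size, int) or chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # ASSUMPTION: fallbacks derived from the average; the real defaults may differ.
    if min_chunk_size is None:
        min_chunk_size = max(1, chunk_size // 4)
    if max_chunk_size is None:
        max_chunk_size = chunk_size * 4
    for name, value in (("min_chunk_size", min_chunk_size),
                        ("max_chunk_size", max_chunk_size)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError("%s must be a positive integer" % name)
    if not min_chunk_size <= chunk_size <= max_chunk_size:
        raise ValueError("expected min_chunk_size <= chunk_size <= max_chunk_size")
    return min_chunk_size, chunk_size, max_chunk_size
```

Validating once in the constructor keeps the planning loop free of defensive checks.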
- plan_bytes(data: bytes) ChunkPlan[source]
Split bytes into content-defined chunks and compute public metadata.
- Parameters:
data (bytes) – Logical file payload bytes
- Returns:
Chunk plan with chunk descriptors and canonical LFS metadata
- Return type:
ChunkPlan
- Raises:
ValueError – Raised when data is not byte-like.
Example:
>>> store = ChunkStore(chunk_size=256, min_chunk_size=64, max_chunk_size=1024)
>>> plan = store.plan_bytes(b"abcdefgh")
>>> sum(len(part.data) for part in plan.parts)
8
sha256_hex
- hubvault.storage.chunk.sha256_hex(data: bytes) str[source]
Compute a lowercase hexadecimal SHA-256 digest.
- Parameters:
data (bytes) – Input bytes
- Returns:
Lowercase hexadecimal digest
- Return type:
str
Example:
>>> sha256_hex(b"abc")
'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'
git_blob_oid
canonical_lfs_pointer
- hubvault.storage.chunk.canonical_lfs_pointer(file_sha256: str, size: int) bytes[source]
Build canonical Git LFS pointer bytes for a file.
- Parameters:
file_sha256 (str) – Raw hexadecimal SHA-256 digest of the logical file
size (int) – Logical file size in bytes
- Returns:
Canonical pointer payload bytes
- Return type:
bytes
Example:
>>> canonical_lfs_pointer("a" * 64, 5).startswith(b"version https://git-lfs.github.com/spec/v1\n")
True
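To show how the pointer bytes and the blob OID relate, here is a minimal sketch built on the public Git LFS pointer layout (version, oid, size lines) and the standard Git blob hashing scheme. The `sketch_` function names are hypothetical, and the use of SHA-1 for the blob OID is an assumption based on stock Git; hubvault's own `git_blob_oid()` may differ.

```python
import hashlib


def sketch_lfs_pointer(file_sha256: str, size: int) -> bytes:
    # Git LFS pointer layout: version line, oid line, size line, each newline-terminated.
    text = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:%s\n"
        "size %d\n" % (file_sha256, size)
    )
    return text.encode("utf-8")


def sketch_git_blob_oid(data: bytes) -> str:
    # ASSUMPTION: SHA-1 object format, as in stock Git.
    # Git hashes the header "blob <length>\0" followed by the content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def sketch_public_metadata(payload: bytes) -> dict:
    """Derive LFS-style metadata for a payload (hypothetical helper)."""
    file_sha256 = hashlib.sha256(payload).hexdigest()
    pointer = sketch_lfs_pointer(file_sha256, len(payload))
    return {
        "sha256": file_sha256,
        "oid": sketch_git_blob_oid(pointer),  # OID of the pointer, not the payload
        "etag": file_sha256,                  # ETag aligned with the file SHA-256
        "pointer_size": len(pointer),
    }
```

Note that the OID is computed over the pointer bytes rather than the payload, which matches the ChunkPlan description of `oid` as the Git blob OID of the canonical LFS pointer.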