hubvault.repo.backend
Repository backend for the hubvault MVP.
This module implements the local on-disk repository format used by the MVP. The backend is intentionally embedded and file-based so the repository remains self-contained and movable as a normal directory tree.
The module contains:
RepositoryBackend - Internal repository service used by the public API
WINDOWS_RESERVED_NAMES
- hubvault.repo.backend.WINDOWS_RESERVED_NAMES = {'AUX', 'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9', 'CON', 'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9', 'NUL', 'PRN'}
REF_NAME_PATTERN
- hubvault.repo.backend.REF_NAME_PATTERN = re.compile('^[A-Za-z0-9][A-Za-z0-9._/-]*$')
DRIVE_PATTERN
- hubvault.repo.backend.DRIVE_PATTERN = re.compile('^[A-Za-z]:')
GC_ANALYSIS_PACK_ID
- hubvault.repo.backend.GC_ANALYSIS_PACK_ID = 'gc-0000000000000000-00000000'
GIT_OID_PATTERN
- hubvault.repo.backend.GIT_OID_PATTERN = re.compile('^[0-9a-f]{40}$')
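The patterns and the reserved-name set listed above can be exercised on their own. The following is a minimal sketch that copies the documented values; the helper functions (`looks_like_ref_name`, `looks_like_commit_id`, `has_reserved_segment`) are hypothetical names introduced for illustration, and the way the backend actually applies these constants to refs and paths is an assumption.

```python
import re

# Values copied from the constants documented above.
REF_NAME_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._/-]*$")
DRIVE_PATTERN = re.compile(r"^[A-Za-z]:")
GIT_OID_PATTERN = re.compile(r"^[0-9a-f]{40}$")
WINDOWS_RESERVED_NAMES = {"AUX", "CON", "NUL", "PRN"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def looks_like_ref_name(name: str) -> bool:
    """Return True when the name matches REF_NAME_PATTERN."""
    return REF_NAME_PATTERN.match(name) is not None


def looks_like_commit_id(value: str) -> bool:
    """Return True for a 40-character lowercase hexadecimal string."""
    return GIT_OID_PATTERN.match(value) is not None


def has_reserved_segment(path: str) -> bool:
    """Hypothetical check: flag any path segment whose stem is a reserved
    Windows device name, compared case-insensitively."""
    for segment in path.split("/"):
        stem = segment.split(".", 1)[0].upper()
        if stem in WINDOWS_RESERVED_NAMES:
            return True
    return False
```

Under these assumptions, `looks_like_ref_name("feature/x-1.0")` is accepted while a name starting with `-` is rejected, and `DRIVE_PATTERN` matches a leading drive prefix such as `C:`.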
FAILPOINT_ENV
- hubvault.repo.backend.FAILPOINT_ENV = 'HUBVAULT_FAILPOINT'
FAIL_ACTION_ENV
- hubvault.repo.backend.FAIL_ACTION_ENV = 'HUBVAULT_FAIL_ACTION'
FAILPOINT_EXIT_CODE
- hubvault.repo.backend.FAILPOINT_EXIT_CODE = 86
RepositoryBackend
- class hubvault.repo.backend.RepositoryBackend(repo_path: Path)[source]
Internal repository backend for the MVP.
This backend owns the on-disk format, object storage, revision resolution, detached read views, and transaction lifecycle used by hubvault.api.HubVaultApi.
Example:
>>> backend = RepositoryBackend(Path("/tmp/demo-repo"))
>>> isinstance(backend, RepositoryBackend)
True
- __init__(repo_path: Path)[source]
Initialize the repository backend for a root directory.
- Parameters:
repo_path (pathlib.Path) – Filesystem path to the repository root
- Returns:
None.
- Return type:
None
Example:
>>> backend = RepositoryBackend(Path("/tmp/demo-repo"))
>>> backend._repo_path.as_posix().endswith("demo-repo")
True
- create_branch(*, branch: str, revision: str | None = None, exist_ok: bool = False) None[source]
Create a new branch from an existing revision.
The public method name and main parameters intentionally mirror huggingface_hub.HfApi.create_branch(), while omitting remote-only parameters such as repo_id, repo_type, and token.
- Parameters:
branch (str) – Branch name to create
revision (Optional[str], optional) – Starting revision, defaults to the repository default branch
exist_ok (bool, optional) – Whether an existing branch should be accepted
- Returns:
None.
- Return type:
None
- Raises:
ConflictError – Raised when the branch already exists and exist_ok is False.
RevisionNotFoundError – Raised when revision cannot be resolved.
UnsupportedPathError – Raised when branch is invalid.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.create_branch(branch="dev")
...     sorted(ref.name for ref in backend.list_repo_refs().branches)
['dev', 'main']
- create_commit(operations: Sequence[object], commit_message: str, commit_description: str | None = None, revision: str | None = None, parent_commit: str | None = None) CommitInfo[source]
Create a new commit on a branch revision.
The backend stages all new objects in a transaction directory, publishes them atomically, and only then updates the branch ref and reflog.
- Parameters:
operations (Sequence[object]) – Add, delete, or copy operations to apply
commit_message (str) – Commit summary/title to store. When commit_description is omitted, embedded body text after a blank line is preserved and split the same way Git and HF commit listings interpret commit text.
commit_description (Optional[str], optional) – Optional commit description/body
revision (Optional[str], optional) – Branch name that will receive the new commit, defaults to the repository default branch
parent_commit (Optional[str], optional) – Optional expected parent commit for optimistic concurrency checks. When omitted, the current branch head becomes the implicit base revision.
- Returns:
Public metadata for the created commit
- Return type:
CommitInfo
- Raises:
ConflictError – Raised when no operations are supplied, an unsupported operation is provided, or optimistic concurrency checks fail.
EntryNotFoundError – Raised when delete or copy operations refer to missing paths.
RevisionNotFoundError – Raised when the target revision cannot be resolved.
UnsupportedPathError – Raised when revision names or repo paths are invalid.
ValueError – Raised when commit_message is empty.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     commit = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     commit.commit_message
'seed'
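The title/body handling described for commit_message can be illustrated with a small standalone helper. This is a sketch under stated assumptions, not the backend's code: `split_commit_text` is a hypothetical name, and splitting at the first blank line follows the usual Git convention for separating a subject from a body.

```python
from typing import Optional, Tuple


def split_commit_text(
    commit_message: str, commit_description: Optional[str] = None
) -> Tuple[str, str]:
    """Return (title, body) for a commit.

    Assumption: when no explicit description is supplied, the text after the
    first blank line of the message is treated as the body.
    """
    if commit_description is not None:
        title, body = commit_message, commit_description
    else:
        title, _, body = commit_message.partition("\n\n")
    title = title.strip()
    if not title:
        # Mirrors the documented ValueError for an empty commit message.
        raise ValueError("commit_message must not be empty")
    return title, body.strip()
```

With this helper, a message such as `"Fix bug\n\nLonger explanation."` yields the title `"Fix bug"` and the body `"Longer explanation."`, while an explicit description takes precedence over any embedded body.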
- create_repo(default_branch: str = 'main', exist_ok: bool = False, large_file_threshold: int = 16777216) RepoInfo[source]
Create a repository at the configured root path.
This method bootstraps the self-contained on-disk layout, writes the format marker and repository configuration, initializes the default branch, and immediately writes an empty initial commit so the repo has a valid first history entry from the start.
- Parameters:
default_branch (str, optional) – Default branch name to create, defaults to DEFAULT_BRANCH
exist_ok (bool, optional) – Whether an existing repository may be reused, defaults to False
large_file_threshold (int, optional) – File size threshold in bytes at or above which newly added files switch to chunked storage, defaults to LARGE_FILE_THRESHOLD
- Returns:
Public metadata for the created or reused repository
- Return type:
RepoInfo
- Raises:
RepositoryAlreadyExistsError – Raised when the target path already contains a repository or any non-empty directory.
UnsupportedPathError – Raised when default_branch violates repository ref naming rules.
ValueError – Raised when large_file_threshold is not positive.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     info = backend.create_repo()
...     (info.default_branch, info.head is not None)
('main', True)
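The large_file_threshold rule ("at or above") is easy to get off by one, so a minimal sketch may help. The default value is taken from the signature above; the function name `storage_mode` and the labels `"whole"` and `"chunked"` are hypothetical and only illustrate the comparison.

```python
LARGE_FILE_THRESHOLD = 16 * 1024 * 1024  # 16777216 bytes, the documented default


def storage_mode(size: int, large_file_threshold: int = LARGE_FILE_THRESHOLD) -> str:
    """Pick a storage mode for a newly added file of `size` bytes.

    Files at or above the threshold are treated as chunked; smaller files
    are stored whole. The returned labels are illustrative only.
    """
    if large_file_threshold <= 0:
        # Mirrors the documented ValueError for a non-positive threshold.
        raise ValueError("large_file_threshold must be positive")
    return "chunked" if size >= large_file_threshold else "whole"
```

A file of exactly 16777216 bytes falls on the chunked side of the boundary, and one byte less keeps it whole.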
- create_tag(*, tag: str, tag_message: str | None = None, revision: str | None = None, exist_ok: bool = False) None[source]
Create a lightweight tag pointing at a revision.
- Parameters:
tag (str) – Tag name to create
tag_message (Optional[str], optional) – Optional tag message stored in the reflog
revision (Optional[str], optional) – Target revision, defaults to the repository default branch
exist_ok (bool, optional) – Whether an existing tag should be accepted
- Returns:
None.
- Return type:
None
- Raises:
ConflictError – Raised when the tag already exists and exist_ok is False.
RevisionNotFoundError – Raised when the target revision does not resolve to a commit.
UnsupportedPathError – Raised when tag is invalid.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.create_tag(tag="v1")
...     [ref.name for ref in backend.list_repo_refs().tags]
['v1']
- delete_branch(*, branch: str) None[source]
Delete a branch ref from the repository.
The current default branch is protected from deletion, mirroring the practical behavior users expect from the HF Hub.
- Parameters:
branch (str) – Branch name to delete
- Returns:
None.
- Return type:
None
- Raises:
ConflictError – Raised when attempting to delete the default branch.
RevisionNotFoundError – Raised when the branch does not exist.
UnsupportedPathError – Raised when branch is invalid.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.create_branch(branch="dev")
...     backend.delete_branch(branch="dev")
...     [ref.name for ref in backend.list_repo_refs().branches]
['main']
- delete_file(path_in_repo: str, revision: str | None = None, commit_message: str | None = None, commit_description: str | None = None, parent_commit: str | None = None) CommitInfo[source]
Delete a single file through the public commit API.
- Parameters:
path_in_repo (str) – Repo-relative file path
revision (Optional[str], optional) – Target branch name, defaults to the default branch
commit_message (Optional[str], optional) – Optional commit summary
commit_description (Optional[str], optional) – Optional commit description/body
parent_commit (Optional[str], optional) – Optional optimistic-concurrency parent commit
- Returns:
Public commit metadata for the created commit
- Return type:
CommitInfo
- delete_folder(path_in_repo: str, revision: str | None = None, commit_message: str | None = None, commit_description: str | None = None, parent_commit: str | None = None) CommitInfo[source]
Delete a folder subtree through the public commit API.
- Parameters:
path_in_repo (str) – Repo-relative folder path
revision (Optional[str], optional) – Target branch name, defaults to the default branch
commit_message (Optional[str], optional) – Optional commit summary
commit_description (Optional[str], optional) – Optional commit description/body
parent_commit (Optional[str], optional) – Optional optimistic-concurrency parent commit
- Returns:
Public commit metadata for the created commit
- Return type:
CommitInfo
- delete_tag(*, tag: str) None[source]
Delete a tag ref from the repository.
- Parameters:
tag (str) – Tag name to delete
- Returns:
None.
- Return type:
None
- Raises:
RevisionNotFoundError – Raised when the tag does not exist.
UnsupportedPathError – Raised when tag is invalid.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.create_tag(tag="v1")
...     backend.delete_tag(tag="v1")
...     backend.list_repo_refs().tags
[]
- full_verify() VerifyReport[source]
Perform a complete repository verification pass.
The full pass verifies the live ref graph, validates published object containers, inspects visible chunk index segments, and reads chunk payloads through pack storage so corruption can be localized before space-management operations are attempted.
- Returns:
Verification summary for the current repository state
- Return type:
VerifyReport
- Raises:
RepositoryNotFoundError – Raised when the configured root is not a valid repository.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.full_verify().ok
True
- gc(dry_run: bool = False, prune_cache: bool = True) GcReport[source]
Reclaim unreachable objects and compact live chunk storage.
The maintenance pass preserves all currently visible refs, rewrites the live chunk set into a compact pack/index view, and optionally prunes the rebuildable managed cache directories under cache/.
- Parameters:
dry_run (bool, optional) – Whether to compute the result without mutating storage
prune_cache (bool, optional) – Whether rebuildable managed caches should also be removed
- Returns:
Garbage-collection summary
- Return type:
GcReport
- Raises:
RepositoryNotFoundError – Raised when the configured root is not a valid repository.
VerificationError – Raised when the repository fails full verification and therefore cannot be reclaimed safely.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.gc(dry_run=True).dry_run
True
- get_paths_info(paths: Sequence[str] | str, revision: str = 'main') List[RepoFile | RepoFolder][source]
Return public metadata for the requested paths.
This method intentionally follows the main behavior of huggingface_hub.HfApi.get_paths_info(): existing file and folder paths are returned in input order, while missing paths are ignored instead of raising an exception.
- Parameters:
paths (Union[Sequence[str], str]) – Repo-relative path or paths to inspect
revision (str, optional) – Revision to resolve, defaults to DEFAULT_BRANCH
- Returns:
Path metadata in the same order as existing requested paths
- Return type:
List[Union[RepoFile, RepoFolder]]
- Raises:
RevisionNotFoundError – Raised when revision cannot be resolved.
UnsupportedPathError – Raised when any supplied path is invalid.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("nested/demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     type(backend.get_paths_info(["nested", "nested/demo.txt"])[0]).__name__
'RepoFolder'
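The "input order, missing paths ignored" contract of get_paths_info can be shown without a repository. The sketch below is illustrative only: `select_existing` and the `known` mapping are hypothetical stand-ins for the resolved tree of a revision, and the string kinds replace the real RepoFile/RepoFolder objects.

```python
from typing import Dict, List, Sequence, Tuple, Union


def select_existing(
    paths: Union[Sequence[str], str], known: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (path, kind) pairs for the requested paths that exist.

    Results keep the order of the request; unknown paths are skipped
    silently rather than raising.
    """
    if isinstance(paths, str):
        paths = [paths]  # a single path is accepted as a convenience
    return [(path, known[path]) for path in paths if path in known]
```

For a tree containing `nested` and `nested/demo.txt`, requesting `["nested/demo.txt", "missing", "nested"]` would return the file entry first and the folder entry second, with no entry for `missing`.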
- get_storage_overview() StorageOverview[source]
Analyze repository disk usage and safe reclamation opportunities.
The report separates immediately reclaimable space from space that is still retained for rollback history and therefore requires an explicit rewrite such as squash_history() before gc() can release it.
- Returns:
Repository storage overview
- Return type:
StorageOverview
- Raises:
RepositoryNotFoundError – Raised when the configured root is not a valid repository.
IntegrityError – Raised when persisted storage cannot be analyzed safely because the live graph is inconsistent.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.get_storage_overview().total_size >= 0
True
- hf_hub_download(filename: str, revision: str | None = None, local_dir: str | None = None) str[source]
Materialize a detached user view for a file and return its path.
The detached path preserves the repo-relative filename suffix, including parent directories, so callers can work with a normal-looking filesystem path while repository truth remains immutable until explicit commit APIs are used.
- Parameters:
filename (str) – Repo-relative file path to materialize
revision (Optional[str], optional) – Revision to inspect, defaults to the default branch
local_dir (Optional[str], optional) – Optional external target root for the detached view
- Returns:
Filesystem path to a detached readable file view
- Return type:
str
- Raises:
EntryNotFoundError – Raised when the requested file is absent.
RevisionNotFoundError – Raised when revision cannot be resolved.
UnsupportedPathError – Raised when filename is invalid.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("nested/demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     path = backend.hf_hub_download("nested/demo.txt")
...     path.endswith("nested/demo.txt")
True
- list_repo_commits(revision: str = 'main', formatted: bool = False) List[GitCommitInfo][source]
List commit entries reachable from a revision head.
The public method name and the meaningful parameters intentionally match huggingface_hub.HfApi.list_repo_commits() as closely as the local repository design allows. The local API omits remote-only parameters such as repo_id, repo_type, and token because they have no real behavior for an embedded on-disk repository.
The returned list walks the reachable commit DAG from the selected head in a stable parent-first order. This keeps merge commits visible while remaining deterministic for local regression tests.
- Parameters:
revision (str, optional) – Revision or commit ID to inspect, defaults to DEFAULT_BRANCH
formatted (bool, optional) – Whether HTML-formatted title/message fields should be populated
- Returns:
Commit entries ordered from newest to oldest
- Return type:
List[GitCommitInfo]
- Raises:
RepositoryNotFoundError – Raised when the configured root is not a valid repository.
RevisionNotFoundError – Raised when revision cannot be resolved.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     commit = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     history = backend.list_repo_commits()
...     history[0].title
'seed'
- list_repo_files(revision: str = 'main') List[str][source]
List all file paths in a revision.
The result is a flattened, sorted list of repo-relative file paths and intentionally omits directory placeholders.
- Parameters:
revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH
- Returns:
Sorted repo-relative file paths
- Return type:
List[str]
- Raises:
RevisionNotFoundError – Raised when revision cannot be resolved.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.list_repo_files()
['demo.txt']
- list_repo_reflog(ref_name: str, limit: int | None = None) List[ReflogEntry][source]
List reflog entries for a branch or tag.
This is a local repository extension with no direct HF public counterpart. It exists to support audit and recovery workflows for the embedded on-disk repository.
- Parameters:
ref_name (str) – Full ref name such as refs/heads/main or a short branch/tag name when unambiguous
limit (Optional[int], optional) – Optional maximum number of newest entries to return
- Returns:
Reflog entries ordered from newest to oldest
- Return type:
List[ReflogEntry]
- Raises:
ConflictError – Raised when a short ref name is ambiguous across branches and tags.
RevisionNotFoundError – Raised when the ref or reflog does not exist.
ValueError – Raised when limit is negative.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.list_repo_reflog("main")[0].ref_name
'refs/heads/main'
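The short-name resolution and limit rules for list_repo_reflog can be sketched as two small helpers. Everything here is illustrative: the function and exception names are hypothetical stand-ins (for ConflictError and RevisionNotFoundError), the `refs/tags/` prefix follows the Git convention and is assumed rather than documented above, and entries are assumed to be stored oldest first.

```python
from typing import List, Optional, Set


class AmbiguousRefError(Exception):
    """Stand-in for the documented ConflictError."""


class UnknownRefError(Exception):
    """Stand-in for the documented RevisionNotFoundError."""


def resolve_ref_name(ref_name: str, branches: Set[str], tags: Set[str]) -> str:
    """Expand a short branch/tag name to a full ref name."""
    namespaces = (("refs/heads/", branches), ("refs/tags/", tags))
    for prefix, names in namespaces:
        if ref_name.startswith(prefix):
            if ref_name[len(prefix):] in names:
                return ref_name
            raise UnknownRefError(ref_name)
    matches = [prefix + ref_name for prefix, names in namespaces if ref_name in names]
    if len(matches) > 1:
        raise AmbiguousRefError(ref_name)
    if not matches:
        raise UnknownRefError(ref_name)
    return matches[0]


def newest_entries(entries: List[str], limit: Optional[int] = None) -> List[str]:
    """Return entries newest first, optionally truncated to `limit` items."""
    if limit is not None and limit < 0:
        raise ValueError("limit must not be negative")
    newest_first = list(reversed(entries))  # assumes oldest-first storage
    return newest_first if limit is None else newest_first[:limit]
```

Under these assumptions, a short name that exists as both a branch and a tag is rejected as ambiguous, while the full `refs/heads/...` form always selects the branch.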
- list_repo_refs(include_pull_requests: bool = False) GitRefs[source]
List visible branch and tag refs in HF-style form.
The local repository does not support convert refs or pull requests, but keeps the same top-level return shape as huggingface_hub.HfApi.list_repo_refs().
- Parameters:
include_pull_requests (bool, optional) – Whether pull-request refs should be included. The local repository returns [] when requested and None otherwise.
- Returns:
Visible repository refs
- Return type:
GitRefs
- Raises:
RepositoryNotFoundError – Raised when the configured root is not a valid repository.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.list_repo_refs().branches[0].ref
'refs/heads/main'
- list_repo_tree(path_in_repo: str | None = None, recursive: bool = False, revision: str = 'main') List[RepoFile | RepoFolder][source]
List file and folder entries under a repository directory.
This method intentionally follows the main behavior of huggingface_hub.HfApi.list_repo_tree(), including the recursive flag and the use of HF-style RepoFile/RepoFolder return objects.
- Parameters:
path_in_repo (Optional[str], optional) – Repo-relative directory path, or None for the repository root
recursive (bool, optional) – Whether to include descendants recursively
revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH
- Returns:
Sorted metadata entries for direct children
- Return type:
List[Union[RepoFile, RepoFolder]]
- Raises:
EntryNotFoundError – Raised when the requested directory does not exist.
RevisionNotFoundError – Raised when revision cannot be resolved.
UnsupportedPathError – Raised when path_in_repo violates path rules.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("nested/demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     [item.path for item in backend.list_repo_tree("nested")]
['nested/demo.txt']
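The difference between a direct listing and a recursive one can be sketched from a flat list of file paths. This is an illustration, not the backend's tree walk: `list_tree` is a hypothetical helper that returns plain path strings instead of RepoFile/RepoFolder objects and does not model the EntryNotFoundError raised for a missing directory.

```python
from typing import List, Optional


def list_tree(
    files: List[str], path_in_repo: Optional[str] = None, recursive: bool = False
) -> List[str]:
    """List entries under a directory, derived from flat file paths."""
    prefix = "" if not path_in_repo else path_in_repo.rstrip("/") + "/"
    entries = set()
    for file_path in files:
        if not file_path.startswith(prefix):
            continue
        parts = file_path[len(prefix):].split("/")
        # Each leading slice of `parts` names a folder; the full slice is the file.
        depth = len(parts) if recursive else 1
        for i in range(1, depth + 1):
            entries.add(prefix + "/".join(parts[:i]))
    return sorted(entries)
```

Without `recursive`, only the first path component below the requested directory is reported, so a nested folder appears once as a folder entry rather than as its contents.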
- merge(source_revision: str, target_revision: str = 'main', parent_commit: str | None = None, commit_message: str | None = None, commit_description: str | None = None) MergeResult[source]
Merge a source revision into a target branch.
The local repository exposes merge as a first-class public write API. Successful merges return a structured result describing whether the operation created a merge commit, fast-forwarded the target branch, or found the branches already up to date. Conflicts are reported in the result instead of mutating repository state.
- Parameters:
source_revision (str) – Source branch, tag, or commit ID to merge from
target_revision (str, optional) – Target branch name or full branch ref, defaults to DEFAULT_BRANCH
parent_commit (Optional[str], optional) – Optional expected current head for optimistic concurrency on the target branch
commit_message (Optional[str], optional) – Optional merge-commit title
commit_description (Optional[str], optional) – Optional merge-commit description/body
- Returns:
Structured merge result
- Return type:
MergeResult
- Raises:
ConflictError – Raised when parent_commit does not match the current target head.
RevisionNotFoundError – Raised when the source revision or target branch does not exist.
UnsupportedPathError – Raised when target_revision is not a branch ref.
ValueError – Raised when commit_message is explicitly empty.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"base")],
...         commit_message="seed",
...     )
...     backend.create_branch(branch="feature")
...     _ = backend.create_commit(
...         revision="feature",
...         operations=[CommitOperationAdd("feature.txt", b"hello")],
...         commit_message="feature work",
...     )
...     backend.merge("feature").status
'fast-forward'
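The three successful merge outcomes described above can be told apart with ancestry checks on the commit graph. The following is a minimal sketch over a plain parent map; `ancestors` and `classify_merge` are hypothetical helpers, and only the `'fast-forward'` label appears in the documented example, so the other two labels are assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def ancestors(commit: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the commit itself plus every commit reachable through parents."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def classify_merge(
    source_head: str, target_head: str, parents: Dict[str, List[str]]
) -> str:
    """Classify what merging source into target would have to do."""
    if source_head in ancestors(target_head, parents):
        return "already-up-to-date"  # hypothetical label
    if target_head in ancestors(source_head, parents):
        return "fast-forward"  # the target can simply move to the source head
    return "merge-commit"  # hypothetical label: histories have diverged
```

In the documented example the feature branch adds one commit on top of the main head, so main is an ancestor of feature and the merge is a fast-forward.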
- open_file(path_in_repo: str, revision: str = 'main') BufferedReader[source]
Open a file from a revision as a read-only binary stream.
The returned stream is detached from repository storage and backed by an in-memory buffer, so accidental writes through the stream cannot mutate repository truth.
- Parameters:
path_in_repo (str) – Repo-relative file path to open
revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH
- Returns:
Read-only buffered binary stream
- Return type:
io.BufferedReader
- Raises:
EntryNotFoundError – Raised when the requested file is absent.
RevisionNotFoundError – Raised when revision cannot be resolved.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     with backend.open_file("demo.txt") as fileobj:
...         fileobj.read()
b'hello'
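A detached, read-only stream of the kind described for open_file can be built from the standard library alone. This sketch only shows the general technique of wrapping an in-memory buffer; `open_detached` is a hypothetical helper and makes no claim about how the backend constructs its stream.

```python
import io


def open_detached(data: bytes) -> io.BufferedReader:
    """Wrap file bytes in a read-only buffered stream.

    The stream owns a private in-memory copy, so nothing done through it
    can reach the original storage.
    """
    return io.BufferedReader(io.BytesIO(data))
```

Because `io.BufferedReader` does not support writing, a call such as `stream.write(b"x")` raises `io.UnsupportedOperation` instead of changing any data.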
- quick_verify() VerifyReport[source]
Perform a minimal repository consistency check.
The verification pass checks repository format compatibility, validates commit closure for all visible refs, and reports stale detached views as warnings instead of fatal errors.
- Returns:
Verification summary for the current repository state
- Return type:
VerifyReport
- Raises:
RepositoryNotFoundError – Raised when the configured root is not a valid repository.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.quick_verify().ok
True
- read_bytes(path_in_repo: str, revision: str = 'main') bytes[source]
Read a file from a revision into memory.
File bytes are verified against the stored blob checksum before being returned so corruption in detached storage is surfaced immediately.
- Parameters:
path_in_repo (str) – Repo-relative file path to read
revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH
- Returns:
Full file content bytes
- Return type:
bytes
- Raises:
IntegrityError – Raised when persisted blob bytes do not match recorded checksums.
EntryNotFoundError – Raised when the requested file is absent.
RevisionNotFoundError – Raised when revision cannot be resolved.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.read_bytes("demo.txt")
b'hello'
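The verify-before-return behavior described for read_bytes follows a common pattern, sketched here with the standard library. The hash algorithm is an assumption (SHA-256 is used only as an example; the documentation above does not name one), and `read_verified` and `ChecksumMismatch` are hypothetical names, the latter standing in for IntegrityError.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Stand-in for the documented IntegrityError."""


def read_verified(payload: bytes, recorded_sha256: str) -> bytes:
    """Return the payload only if its digest matches the recorded one."""
    actual = hashlib.sha256(payload).hexdigest()
    if actual != recorded_sha256:
        raise ChecksumMismatch(
            f"expected {recorded_sha256}, computed {actual}"
        )
    return payload
```

The check runs before any bytes are handed back, so a caller never observes corrupted content as if it were valid.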
- read_range(path_in_repo: str, start: int, length: int, revision: str = 'main') bytes[source]
Read a byte range from a file in a revision.
For chunked files the backend resolves only the overlapping chunks and avoids reconstructing unrelated file regions.
- Parameters:
path_in_repo (str) – Repo-relative file path to read
start (int) – Starting byte offset in the logical file
length (int) – Number of bytes to read
revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH
- Returns:
Requested byte range, clamped to the file end
- Return type:
bytes
- Raises:
IntegrityError – Raised when persisted blob or pack content does not match recorded checksums.
EntryNotFoundError – Raised when the requested file is absent.
RevisionNotFoundError – Raised when revision cannot be resolved.
ValueError – Raised when start or length is negative.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.read_range("demo.txt", start=1, length=3)
b'ell'
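Resolving only the chunks that overlap a requested range, and clamping at the file end, can be sketched with simple interval arithmetic. The helpers below are hypothetical and operate on in-memory chunks; a real backend would fetch just the selected chunks from pack storage, and its chunk layout is not documented here.

```python
from typing import List, Sequence, Tuple


def overlapping_slices(
    chunk_sizes: Sequence[int], start: int, length: int
) -> List[Tuple[int, int, int]]:
    """Return (chunk_index, lo, hi) slices that cover [start, start + length)."""
    if start < 0 or length < 0:
        raise ValueError("start and length must not be negative")
    end = start + length
    slices = []
    offset = 0  # logical file offset where the current chunk begins
    for index, size in enumerate(chunk_sizes):
        chunk_end = offset + size
        lo, hi = max(start, offset), min(end, chunk_end)
        if lo < hi:
            slices.append((index, lo - offset, hi - offset))
        offset = chunk_end
        if offset >= end:
            break  # later chunks cannot overlap the range
    return slices


def read_range(chunks: Sequence[bytes], start: int, length: int) -> bytes:
    """Assemble the requested bytes from only the overlapping chunks."""
    sizes = [len(chunk) for chunk in chunks]
    return b"".join(
        chunks[i][lo:hi] for i, lo, hi in overlapping_slices(sizes, start, length)
    )
```

A range that runs past the last chunk simply yields fewer bytes, which matches the documented clamping to the file end.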
- repo_info(revision: str | None = None) RepoInfo[source]
Return repository metadata for the selected revision.
The selected revision is only used to resolve the visible head in the returned RepoInfo; repository-wide settings still come from repo.json.
- Parameters:
revision (Optional[str], optional) – Revision whose head should be resolved, defaults to the configured default branch
- Returns:
Current repository metadata view
- Return type:
RepoInfo
- Raises:
RepositoryNotFoundError – Raised when the configured root is not a valid repository.
RevisionNotFoundError – Raised when revision cannot be resolved.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.repo_info().default_branch
'main'
- reset_ref(ref_name: str, to_revision: str) CommitInfo[source]
Reset a branch ref to a target commit.
This method performs a branch-head move under the repository write lock and records the change in the reflog.
- Parameters:
ref_name (str) – Branch name to update
to_revision (str) – Revision or commit ID to resolve as the new head
- Returns:
Public commit metadata for the new branch head
- Return type:
CommitInfo
- Raises:
RevisionNotFoundError – Raised when the target revision cannot be resolved.
UnsupportedPathError – Raised when ref_name violates ref naming rules.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     commit = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.reset_ref("main", commit.oid).oid == commit.oid
True
- snapshot_download(revision: str | None = None, local_dir: str | None = None, allow_patterns: Sequence[str] | str | None = None, ignore_patterns: Sequence[str] | str | None = None) str[source]
Materialize a detached snapshot directory for a revision.
The return value intentionally follows the role of huggingface_hub.snapshot_download() while omitting remote-only parameters that have no local behavior.
- Parameters:
revision (Optional[str], optional) – Revision to inspect, defaults to the default branch
local_dir (Optional[str], optional) – Optional external directory where the detached snapshot should be materialized
allow_patterns (Optional[Union[Sequence[str], str]], optional) – Optional allowlist for repo-relative paths
ignore_patterns (Optional[Union[Sequence[str], str]], optional) – Optional denylist for repo-relative paths
- Returns:
Filesystem path to the snapshot directory
- Return type:
str
- Raises:
RevisionNotFoundError – Raised when revision cannot be resolved.
UnsupportedPathError – Raised when local_dir points into the repository root.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("nested/demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     path = backend.snapshot_download()
...     path.endswith("cache/snapshots/" + path.split("cache/snapshots/")[-1])
True
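Allowlist and denylist filtering of repo-relative paths can be sketched with glob-style matching from the standard library. This is an assumption-laden illustration: `filter_paths` is a hypothetical helper, and the use of `fnmatch` semantics (where `*` also matches `/`) is assumed rather than confirmed for the local backend.

```python
from fnmatch import fnmatch
from typing import List, Optional, Sequence, Union

Patterns = Optional[Union[Sequence[str], str]]


def _as_list(patterns: Patterns) -> Optional[List[str]]:
    """Normalize a single pattern or a sequence of patterns to a list."""
    if patterns is None:
        return None
    return [patterns] if isinstance(patterns, str) else list(patterns)


def filter_paths(
    paths: Sequence[str],
    allow_patterns: Patterns = None,
    ignore_patterns: Patterns = None,
) -> List[str]:
    """Keep paths that match the allowlist (if any) and miss the denylist."""
    allow = _as_list(allow_patterns)
    ignore = _as_list(ignore_patterns)
    selected = []
    for path in paths:
        if allow is not None and not any(fnmatch(path, p) for p in allow):
            continue
        if ignore is not None and any(fnmatch(path, p) for p in ignore):
            continue
        selected.append(path)
    return selected
```

When both arguments are given, a path must pass the allowlist first and is then dropped if any ignore pattern matches it.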
- squash_history(ref_name: str, root_revision: str | None = None, commit_message: str | None = None, commit_description: str | None = None, run_gc: bool = True, prune_cache: bool = False) SquashReport[source]
Rewrite a branch so older history becomes reclaimable.
The rewritten branch keeps the same visible tip snapshot but the chosen root commit becomes a new parentless starting point. Older ancestors on that branch therefore become unreachable from the rewritten ref and can be reclaimed by a follow-up GC pass.
- Parameters:
ref_name (str) – Branch name or full branch ref to rewrite
root_revision (Optional[str], optional) – Oldest commit to preserve on the rewritten branch. When omitted, the current branch head is collapsed into a single new root commit.
commit_message (Optional[str], optional) – Optional replacement title for the rewritten root commit
commit_description (Optional[str], optional) – Optional replacement body for the rewritten root commit
run_gc (bool, optional) – Whether to run gc() immediately after rewriting
prune_cache (bool, optional) – Whether the follow-up GC pass should also prune managed caches
- Returns:
History rewrite summary
- Return type:
SquashReport
- Raises:
ConflictError – Raised when root_revision is not an ancestor of the selected branch head.
RevisionNotFoundError – Raised when the selected branch or revision does not exist.
UnsupportedPathError – Raised when ref_name is not a valid branch ref.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     commit = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     report = backend.squash_history("main", root_revision=commit.oid, run_gc=False)
...     report.ref_name
'refs/heads/main'
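The reachability argument behind squash_history() can be illustrated with a toy commit graph. This is a sketch under stated assumptions, not hubvault's implementation: commit ids are plain strings, history is linear, rewritten commits are marked with a trailing apostrophe to mimic a changed identity, and ValueError stands in for ConflictError.

```python
from typing import Dict, List, Optional, Tuple

# Toy linear history: each commit id maps to its parent (None for a root).
parents: Dict[str, Optional[str]] = {"c1": None, "c2": "c1", "c3": "c2", "c4": "c3"}


def ancestors(head: str, graph: Dict[str, Optional[str]]) -> List[str]:
    """Return head followed by its ancestors, newest first."""
    chain = []
    node: Optional[str] = head
    while node is not None:
        chain.append(node)
        node = graph[node]
    return chain


def squash(
    head: str, root: str, graph: Dict[str, Optional[str]]
) -> Tuple[str, Dict[str, Optional[str]]]:
    """Rewrite head's history so that `root` becomes a parentless commit."""
    chain = ancestors(head, graph)
    if root not in chain:
        # Stand-in for the ConflictError case described above.
        raise ValueError("root is not an ancestor of head")
    kept = chain[: chain.index(root) + 1]  # head ... root
    new_graph = dict(graph)
    parent: Optional[str] = None
    for oid in reversed(kept):  # rebuild from the root upward
        new_oid = oid + "'"  # assumption: a rewritten commit gets a new id
        new_graph[new_oid] = parent
        parent = new_oid
    return parent, new_graph


new_head, new_graph = squash("c4", "c3", parents)
reachable = set(ancestors(new_head, new_graph))
# Everything not reachable from the rewritten ref is a GC candidate,
# provided no other ref still points at it.
reclaimable = set(new_graph) - reachable
print(sorted(reachable), sorted(reclaimable))
```

Passing the head itself as the root collapses the branch into a single parentless commit, which mirrors the documented behavior when root_revision is omitted.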
- upload_file(*, path_or_fileobj: str | Path | bytes | BufferedIOBase, path_in_repo: str, revision: str | None = None, commit_message: str | None = None, commit_description: str | None = None, parent_commit: str | None = None) -> CommitInfo
Upload a single file through the public commit API.
- Parameters:
path_or_fileobj (Union[str, pathlib.Path, bytes, io.BufferedIOBase]) – File content source
path_in_repo (str) – Target repo-relative path
revision (Optional[str], optional) – Target branch name, defaults to the default branch
commit_message (Optional[str], optional) – Optional commit summary
commit_description (Optional[str], optional) – Optional commit description/body
parent_commit (Optional[str], optional) – Optional optimistic-concurrency parent commit
- Returns:
Public commit metadata for the created commit
- Return type:
CommitInfo
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     info = backend.upload_file(path_or_fileobj=b"hello", path_in_repo="demo.txt")
...     info.commit_message
'Upload demo.txt with hubvault'
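The parent_commit parameter is described as an optimistic-concurrency guard. The general idea can be sketched with a toy in-memory branch: a commit that pins a parent is accepted only if the branch head still equals that parent. ToyBranch is hypothetical, and RuntimeError is a stand-in, since this section does not state which error hubvault raises on a mismatch.

```python
from typing import List, Optional


class ToyBranch:
    """Hypothetical in-memory branch; not part of hubvault."""

    def __init__(self) -> None:
        self.history: List[str] = []

    @property
    def head(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def commit(self, oid: str, parent_commit: Optional[str] = None) -> str:
        # When the caller pins a parent, refuse to commit if the branch moved.
        if parent_commit is not None and parent_commit != self.head:
            raise RuntimeError(
                f"expected head {parent_commit!r}, found {self.head!r}"
            )
        self.history.append(oid)
        return oid


branch = ToyBranch()
first = branch.commit("a1")
branch.commit("b2", parent_commit=first)  # head is still "a1": accepted

try:
    branch.commit("c3", parent_commit=first)  # head moved to "b2": rejected
    stale_rejected = False
except RuntimeError:
    stale_rejected = True

print(branch.history, stale_rejected)
```

Leaving parent_commit unset skips the check, so the commit is applied on top of whatever the head currently is.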
- upload_folder(*, folder_path: str | Path, path_in_repo: str | None = None, commit_message: str | None = None, commit_description: str | None = None, revision: str | None = None, parent_commit: str | None = None, allow_patterns: Sequence[str] | str | None = None, ignore_patterns: Sequence[str] | str | None = None, delete_patterns: Sequence[str] | str | None = None) -> CommitInfo
Upload a local folder while preserving its relative layout.
Any nested .git directory is ignored automatically, matching the broad public behavior of huggingface_hub.HfApi.upload_folder().
- Parameters:
folder_path (Union[str, pathlib.Path]) – Local folder to upload
path_in_repo (Optional[str], optional) – Optional target directory in the repo root
commit_message (Optional[str], optional) – Optional commit summary
commit_description (Optional[str], optional) – Optional commit description/body
revision (Optional[str], optional) – Target branch name, defaults to the default branch
parent_commit (Optional[str], optional) – Optional optimistic-concurrency parent commit
allow_patterns (Optional[Union[Sequence[str], str]], optional) – Optional allowlist for local relative paths
ignore_patterns (Optional[Union[Sequence[str], str]], optional) – Optional denylist for local relative paths
delete_patterns (Optional[Union[Sequence[str], str]], optional) – Optional denylist applied to already uploaded repo files beneath path_in_repo before new files are added
- Returns:
Public commit metadata for the created commit
- Return type:
CommitInfo
- Raises:
ValueError – Raised when folder_path is not a local directory.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     repo_root = Path(tmpdir)
...     source = repo_root / "source"
...     source.mkdir()
...     (source / "demo.txt").write_text("hello", encoding="utf-8")
...     backend = RepositoryBackend(repo_root / "repo")
...     _ = backend.create_repo()
...     info = backend.upload_folder(folder_path=source)
...     info.commit_message
'Upload folder using hubvault'
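The interplay between path_in_repo, delete_patterns, and the automatic .git exclusion can be sketched as a small planning step. The helper below is hypothetical and makes two assumptions that this section does not confirm: patterns use fnmatch-style globs, and delete_patterns are matched against paths relative to path_in_repo.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List, Sequence, Tuple


def plan_upload(
    local_files: Sequence[str],
    repo_files: Sequence[str],
    path_in_repo: str = "",
    delete_patterns: Sequence[str] = (),
) -> Tuple[List[str], List[str]]:
    """Hypothetical planner returning (deletes, adds) as repo-relative paths."""
    prefix = path_in_repo.strip("/")

    deletes = []
    for repo_path in repo_files:
        # Only files beneath path_in_repo are candidates for deletion.
        if prefix:
            if not repo_path.startswith(prefix + "/"):
                continue
            relative = repo_path[len(prefix) + 1:]
        else:
            relative = repo_path
        if any(fnmatchcase(relative, p) for p in delete_patterns):
            deletes.append(repo_path)

    adds = []
    for local_path in local_files:
        # Skip any path that has a .git component, at any depth.
        if ".git" in PurePosixPath(local_path).parts:
            continue
        adds.append(f"{prefix}/{local_path}" if prefix else local_path)

    return deletes, adds


local = ["demo.txt", ".git/config", "sub/.git/HEAD", "sub/a.txt"]
repo = ["data/old.bin", "data/keep.txt", "other/old.bin"]
deletes, adds = plan_upload(local, repo, path_in_repo="data", delete_patterns=["*.bin"])
print(deletes, adds)
```

Here other/old.bin survives because it lies outside path_in_repo, even though it matches the delete pattern.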
- upload_large_folder(*, folder_path: str | Path, revision: str | None = None, allow_patterns: Sequence[str] | str | None = None, ignore_patterns: Sequence[str] | str | None = None) -> CommitInfo
Upload a large local folder through one atomic local commit.
The method name intentionally follows huggingface_hub.HfApi.upload_large_folder(), but the local repository keeps the operation atomic and therefore returns a single CommitInfo instead of spreading the upload across multiple commits.
- Parameters:
folder_path (Union[str, pathlib.Path]) – Local folder to upload
revision (Optional[str], optional) – Target branch name, defaults to the default branch
allow_patterns (Optional[Union[Sequence[str], str]], optional) – Optional allowlist for local relative paths
ignore_patterns (Optional[Union[Sequence[str], str]], optional) – Optional denylist for local relative paths
- Returns:
Public commit metadata for the created commit
- Return type:
CommitInfo
- Raises:
ValueError – Raised when folder_path is not a local directory.
Example:
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     repo_root = Path(tmpdir)
...     source = repo_root / "source"
...     source.mkdir()
...     (source / "demo.txt").write_text("hello", encoding="utf-8")
...     backend = RepositoryBackend(repo_root / "repo")
...     _ = backend.create_repo()
...     info = backend.upload_large_folder(folder_path=source)
...     info.commit_message
'Upload large folder using hubvault'
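One way to arrive at a single atomic commit is to enumerate the whole folder up front and only then hand the complete file list to one commit. The sketch below shows that enumeration step using only the standard library. collect_files is a hypothetical helper, and the ValueError it raises merely mirrors the documented behavior for a folder_path that is not a directory; how hubvault walks the folder internally is not described here.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def collect_files(folder_path: Union[str, Path]) -> List[str]:
    """Hypothetical helper: list all files under a folder as POSIX-style relative paths."""
    folder = Path(folder_path)
    if not folder.is_dir():
        raise ValueError(f"{folder} is not a local directory")
    # Sorting makes the resulting file list deterministic across platforms.
    return sorted(
        p.relative_to(folder).as_posix() for p in folder.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmpdir:
    source = Path(tmpdir) / "source"
    (source / "nested").mkdir(parents=True)
    (source / "demo.txt").write_text("hello", encoding="utf-8")
    (source / "nested" / "b.txt").write_text("world", encoding="utf-8")

    files = collect_files(source)

    try:
        collect_files(source / "demo.txt")  # a file, not a directory
        rejected = False
    except ValueError:
        rejected = True

print(files, rejected)
```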