hubvault.repo.backend

Repository backend for the hubvault MVP.

This module implements the local on-disk repository format used by the MVP. The backend is intentionally embedded and file-based so the repository remains self-contained and movable as a normal directory tree.

The module contains the constants documented below and the RepositoryBackend class.

WINDOWS_RESERVED_NAMES

hubvault.repo.backend.WINDOWS_RESERVED_NAMES = {'AUX', 'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9', 'CON', 'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9', 'NUL', 'PRN'}

REF_NAME_PATTERN

hubvault.repo.backend.REF_NAME_PATTERN = re.compile('^[A-Za-z0-9][A-Za-z0-9._/-]*$')

Compiled regular expression object.
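A quick way to see which names this pattern accepts is to run it over a few candidates. The pattern is copied from the value above; the candidate names are made up for illustration, and how the backend applies the pattern is not shown here.

```python
import re

# Pattern copied from REF_NAME_PATTERN above.
REF_NAME_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._/-]*$")

# Illustrative candidates: the last three break the rules
# (leading "-", embedded space, empty string).
candidates = ["main", "release/v1.0", "-dev", "feature branch", ""]
accepted = [name for name in candidates if REF_NAME_PATTERN.match(name)]
```

Only names that start with an ASCII letter or digit and continue with letters, digits, `.`, `_`, `/`, or `-` pass.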

DRIVE_PATTERN

hubvault.repo.backend.DRIVE_PATTERN = re.compile('^[A-Za-z]:')

Compiled regular expression object.
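As an illustration of how WINDOWS_RESERVED_NAMES and DRIVE_PATTERN could be combined, the sketch below rejects repo-relative paths that would not be portable to Windows. The helper name `is_portable_path` and its exact rules are assumptions made for this example, not the backend's actual validation logic.

```python
import re

# Same 22 names as WINDOWS_RESERVED_NAMES above, built compactly.
WINDOWS_RESERVED_NAMES = {"AUX", "CON", "NUL", "PRN"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}
DRIVE_PATTERN = re.compile(r"^[A-Za-z]:")


def is_portable_path(path: str) -> bool:
    """Hypothetical check: True if the path avoids Windows-specific pitfalls."""
    if DRIVE_PATTERN.match(path):
        return False  # "C:..." looks like a drive-qualified path
    for part in path.split("/"):
        # Compare case-insensitively and ignore any extension, since
        # Microsoft advises avoiding names such as "NUL.txt" as well.
        stem = part.split(".", 1)[0].upper()
        if stem in WINDOWS_RESERVED_NAMES:
            return False
    return True
```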

GC_ANALYSIS_PACK_ID

hubvault.repo.backend.GC_ANALYSIS_PACK_ID = 'gc-0000000000000000-00000000'

GIT_OID_PATTERN

hubvault.repo.backend.GIT_OID_PATTERN = re.compile('^[0-9a-f]{40}$')

Compiled regular expression object.

PUBLIC_GIT_AUTHOR_NAME

hubvault.repo.backend.PUBLIC_GIT_AUTHOR_NAME = 'HubVault'

PUBLIC_GIT_AUTHOR_EMAIL

hubvault.repo.backend.PUBLIC_GIT_AUTHOR_EMAIL = 'hubvault@local'

FAILPOINT_ENV

hubvault.repo.backend.FAILPOINT_ENV = 'HUBVAULT_FAILPOINT'

FAIL_ACTION_ENV

hubvault.repo.backend.FAIL_ACTION_ENV = 'HUBVAULT_FAIL_ACTION'

FAILPOINT_EXIT_CODE

hubvault.repo.backend.FAILPOINT_EXIT_CODE = 86
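This page does not describe how the three failpoint constants are consumed. The sketch below shows one common fault-injection pattern they would fit: a named checkpoint that fails only when the environment selects it. The function `maybe_fail` and the action values `"raise"` and `"exit"` are invented for this example.

```python
import os

# Values copied from the constants above.
FAILPOINT_ENV = "HUBVAULT_FAILPOINT"
FAIL_ACTION_ENV = "HUBVAULT_FAIL_ACTION"
FAILPOINT_EXIT_CODE = 86


def maybe_fail(point: str, environ=os.environ) -> None:
    """Hypothetical hook: fail if `point` is the failpoint named in the environment."""
    if environ.get(FAILPOINT_ENV) != point:
        return  # this checkpoint was not selected
    # The action names below are assumptions, not documented values.
    if environ.get(FAIL_ACTION_ENV, "exit") == "raise":
        raise RuntimeError(f"failpoint triggered: {point}")
    raise SystemExit(FAILPOINT_EXIT_CODE)
```

A crash-recovery test could set both variables for a child process and assert on its exit status.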

RepositoryBackend

class hubvault.repo.backend.RepositoryBackend(repo_path: Path)

Internal repository backend for the MVP.

This backend owns the on-disk format, object storage, revision resolution, detached read views, and transaction lifecycle used by hubvault.api.HubVaultApi.

Example:

>>> from pathlib import Path
>>> from hubvault.repo.backend import RepositoryBackend
>>> backend = RepositoryBackend(Path("/tmp/demo-repo"))
>>> isinstance(backend, RepositoryBackend)
True

__init__(repo_path: Path)

Initialize the repository backend for a root directory.

Parameters:

repo_path (pathlib.Path) – Filesystem path to the repository root

Returns:

None.

Return type:

None

Example:

>>> backend = RepositoryBackend(Path("/tmp/demo-repo"))
>>> backend._repo_path.as_posix().endswith("demo-repo")
True

create_branch(*, branch: str, revision: str | None = None, exist_ok: bool = False) -> None

Create a new branch from an existing revision.

The public method name and main parameters intentionally mirror huggingface_hub.HfApi.create_branch(), while omitting remote-only parameters such as repo_id, repo_type, and token.

Parameters:
  • branch (str) – Branch name to create

  • revision (Optional[str], optional) – Starting revision, defaults to the repository default branch

  • exist_ok (bool, optional) – Whether an existing branch should be accepted

Returns:

None.

Return type:

None

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.create_branch(branch="dev")
...     sorted(ref.name for ref in backend.list_repo_refs().branches)
['dev', 'main']

create_commit(operations: Sequence[object], commit_message: str, commit_description: str | None = None, revision: str | None = None, parent_commit: str | None = None) -> CommitInfo

Create a new commit on a branch revision.

The backend stages all new objects in a transaction directory, publishes them atomically, and only then updates the branch ref and reflog.
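The stage-then-publish idea can be sketched with the standard library alone. This is a minimal illustration, assuming a scratch directory inside the repository root and a single file per transaction; the function name and directory prefix are hypothetical, and the real backend's layout and durability steps (such as fsync) are not shown.

```python
import os
import tempfile
from pathlib import Path


def publish_atomically(root: Path, name: str, data: bytes) -> Path:
    """Stage `data` in a scratch directory, then move it into place in one step."""
    staging = Path(tempfile.mkdtemp(prefix="txn-", dir=root))  # hypothetical layout
    staged = staging / name
    staged.write_bytes(data)
    final = root / name
    # os.replace is atomic when source and target share a filesystem,
    # so readers see either no file or the complete file.
    os.replace(staged, final)
    staging.rmdir()
    return final


with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    published = publish_atomically(root, "object.bin", b"payload")
    published_bytes = published.read_bytes()
    leftovers = sorted(p.name for p in root.iterdir())
```

Updating the ref only after the move succeeds means a crash during staging leaves the visible history untouched.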

Parameters:
  • operations (Sequence[object]) – Add, delete, or copy operations to apply

  • commit_message (str) – Commit summary/title to store. When commit_description is omitted, embedded body text after a blank line is preserved and split the same way Git and HF commit listings interpret commit text.

  • commit_description (Optional[str], optional) – Optional commit description/body

  • revision (Optional[str], optional) – Branch name that will receive the new commit, defaults to the repository default branch

  • parent_commit (Optional[str], optional) – Optional expected parent commit for optimistic concurrency checks. When omitted, the current branch head becomes the implicit base revision.
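The title/body handling described for commit_message can be pictured as follows. This is a sketch under the assumption that the first blank line separates the title from the body, as in Git; the helper name `split_commit_text` is hypothetical.

```python
from typing import Optional, Tuple


def split_commit_text(
    commit_message: str, commit_description: Optional[str] = None
) -> Tuple[str, str]:
    """Hypothetical helper returning (title, body) for a commit."""
    if commit_description is not None:
        # An explicit description wins; the message is used as the title.
        return commit_message.strip(), commit_description
    # Otherwise, text after the first blank line is treated as the body.
    title, _, body = commit_message.strip().partition("\n\n")
    return title, body.strip()
```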

Returns:

Public metadata for the created commit

Return type:

CommitInfo

Raises:
  • ConflictError – Raised when no operations are supplied, an unsupported operation is provided, or optimistic concurrency checks fail.

  • EntryNotFoundError – Raised when delete or copy operations refer to missing paths.

  • RevisionNotFoundError – Raised when the target revision cannot be resolved.

  • UnsupportedPathError – Raised when revision names or repo paths are invalid.

  • ValueError – Raised when commit_message is empty.
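The optimistic concurrency check behind parent_commit amounts to a compare-and-set on the branch head. The sketch below uses an in-memory dict and a stand-in exception class; both are assumptions for illustration, not the backend's data structures.

```python
from typing import Dict, Optional


class ConflictError(Exception):
    """Stand-in for the ConflictError named above."""


def advance_head(
    refs: Dict[str, str],
    branch: str,
    new_commit: str,
    parent_commit: Optional[str] = None,
) -> None:
    """Move `branch` to `new_commit` only if the caller's expected parent still matches."""
    current = refs[branch]
    if parent_commit is not None and parent_commit != current:
        raise ConflictError(f"expected {parent_commit}, found {current}")
    refs[branch] = new_commit
```

A caller that read the head before preparing its operations passes that value as parent_commit, so a concurrent writer causes a clean failure rather than a silent overwrite.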

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     commit = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     commit.commit_message
'seed'

create_repo(default_branch: str = 'main', exist_ok: bool = False, large_file_threshold: int = 16777216) -> RepoInfo

Create a repository at the configured root path.

This method bootstraps the self-contained on-disk layout, writes the format marker and repository configuration, initializes the default branch, and immediately writes an empty initial commit so the repo has a valid first history entry from the start.

Parameters:
  • default_branch (str, optional) – Default branch name to create, defaults to DEFAULT_BRANCH

  • exist_ok (bool, optional) – Whether an existing repository may be reused, defaults to False

  • large_file_threshold (int, optional) – File size threshold in bytes at or above which newly added files switch to chunked storage, defaults to LARGE_FILE_THRESHOLD

Returns:

Public metadata for the created or reused repository

Return type:

RepoInfo

Raises:
  • RepositoryAlreadyExistsError – Raised when the target path already contains a repository or is a non-empty directory.

  • UnsupportedPathError – Raised when default_branch violates repository ref naming rules.

  • ValueError – Raised when large_file_threshold is not positive.
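The "at or above" rule for large_file_threshold can be made concrete with a small sketch. The function name and the labels `"whole"` and `"chunked"` are illustrative only; the default value is the one shown in the signature.

```python
LARGE_FILE_THRESHOLD = 16 * 1024 * 1024  # 16777216 bytes (16 MiB)


def storage_kind(size: int, threshold: int = LARGE_FILE_THRESHOLD) -> str:
    """Hypothetical classifier for how a newly added file would be stored."""
    if threshold <= 0:
        raise ValueError("large_file_threshold must be positive")
    # Files exactly at the threshold already count as large.
    return "chunked" if size >= threshold else "whole"
```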

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     info = backend.create_repo()
...     (info.default_branch, info.head is not None)
('main', True)

create_tag(*, tag: str, tag_message: str | None = None, revision: str | None = None, exist_ok: bool = False) -> None

Create a lightweight tag pointing at a revision.

Parameters:
  • tag (str) – Tag name to create

  • tag_message (Optional[str], optional) – Optional tag message stored in the reflog

  • revision (Optional[str], optional) – Target revision, defaults to the repository default branch

  • exist_ok (bool, optional) – Whether an existing tag should be accepted

Returns:

None.

Return type:

None

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.create_tag(tag="v1")
...     [ref.name for ref in backend.list_repo_refs().tags]
['v1']

delete_branch(*, branch: str) -> None

Delete a branch ref from the repository.

The current default branch is protected from deletion, mirroring the practical behavior users expect from the HF Hub.

Parameters:

branch (str) – Branch name to delete

Returns:

None.

Return type:

None

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.create_branch(branch="dev")
...     backend.delete_branch(branch="dev")
...     [ref.name for ref in backend.list_repo_refs().branches]
['main']

delete_file(path_in_repo: str, revision: str | None = None, commit_message: str | None = None, commit_description: str | None = None, parent_commit: str | None = None) -> CommitInfo

Delete a single file through the public commit API.

Parameters:
  • path_in_repo (str) – Repo-relative file path

  • revision (Optional[str], optional) – Target branch name, defaults to the default branch

  • commit_message (Optional[str], optional) – Optional commit summary

  • commit_description (Optional[str], optional) – Optional commit description/body

  • parent_commit (Optional[str], optional) – Optional optimistic-concurrency parent commit

Returns:

Public commit metadata for the created commit

Return type:

CommitInfo

delete_folder(path_in_repo: str, revision: str | None = None, commit_message: str | None = None, commit_description: str | None = None, parent_commit: str | None = None) -> CommitInfo

Delete a folder subtree through the public commit API.

Parameters:
  • path_in_repo (str) – Repo-relative folder path

  • revision (Optional[str], optional) – Target branch name, defaults to the default branch

  • commit_message (Optional[str], optional) – Optional commit summary

  • commit_description (Optional[str], optional) – Optional commit description/body

  • parent_commit (Optional[str], optional) – Optional optimistic-concurrency parent commit

Returns:

Public commit metadata for the created commit

Return type:

CommitInfo

delete_tag(*, tag: str) -> None

Delete a tag ref from the repository.

Parameters:

tag (str) – Tag name to delete

Returns:

None.

Return type:

None

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.create_tag(tag="v1")
...     backend.delete_tag(tag="v1")
...     backend.list_repo_refs().tags
[]

full_verify() -> VerifyReport

Perform a complete repository verification pass.

The full pass verifies the live ref graph, validates published object containers, inspects visible chunk index segments, and reads chunk payloads through pack storage so corruption can be localized before space-management operations are attempted.

Returns:

Verification summary for the current repository state

Return type:

VerifyReport

Raises:

RepositoryNotFoundError – Raised when the configured root is not a valid repository.

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.full_verify().ok
True

gc(dry_run: bool = False, prune_cache: bool = True) -> GcReport

Reclaim unreachable objects and compact live chunk storage.

The maintenance pass preserves all currently visible refs, rewrites the live chunk set into a compact pack/index view, and optionally prunes the rebuildable managed cache directories under cache/.
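The reachability rule that decides what may be reclaimed can be sketched as a mark phase over the commit graph. The dict-based graph and the function name are assumptions for illustration; the real pass also handles trees, blobs, and chunk packs, which are omitted here.

```python
from typing import Dict, List, Set


def unreachable_commits(refs: Dict[str, str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return commit IDs in `parents` that no visible ref can reach."""
    reachable: Set[str] = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in reachable:
            continue
        reachable.add(oid)
        stack.extend(parents.get(oid, []))
    # Anything not marked is a candidate for removal.
    return set(parents) - reachable
```

A dry run would stop after computing this set and report it without deleting anything.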

Parameters:
  • dry_run (bool, optional) – Whether to compute the result without mutating storage

  • prune_cache (bool, optional) – Whether rebuildable managed caches should also be removed

Returns:

Garbage-collection summary

Return type:

GcReport

Raises:
  • RepositoryNotFoundError – Raised when the configured root is not a valid repository.

  • VerificationError – Raised when the repository fails full verification and therefore cannot be reclaimed safely.

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.gc(dry_run=True).dry_run
True

get_paths_info(paths: Sequence[str] | str, revision: str = 'main') -> List[RepoFile | RepoFolder]

Return public metadata for the requested paths.

This method intentionally follows the main behavior of huggingface_hub.HfApi.get_paths_info(): existing file and folder paths are returned in input order, while missing paths are ignored instead of raising an exception.
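The "skip missing paths, keep input order" behavior can be pictured with a plain dictionary lookup. The index contents and the helper name below are made up for the example.

```python
from typing import Dict, List, Sequence, Union


def existing_in_order(paths: Union[Sequence[str], str], index: Dict[str, str]) -> List[str]:
    """Hypothetical lookup: return metadata for known paths, in request order."""
    if isinstance(paths, str):
        paths = [paths]  # a single path is accepted as shorthand
    return [index[p] for p in paths if p in index]


index = {"nested": "folder", "nested/demo.txt": "file"}
found = existing_in_order(["nested/demo.txt", "missing.txt", "nested"], index)
```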

Parameters:
  • paths (Union[Sequence[str], str]) – Repo-relative path or paths to inspect

  • revision (str, optional) – Revision to resolve, defaults to DEFAULT_BRANCH

Returns:

Path metadata in the same order as existing requested paths

Return type:

List[Union[RepoFile, RepoFolder]]

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("nested/demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     type(backend.get_paths_info(["nested", "nested/demo.txt"])[0]).__name__
'RepoFolder'

get_storage_overview() -> StorageOverview

Analyze repository disk usage and safe reclamation opportunities.

The report separates immediately reclaimable space from space that is still retained for rollback history and therefore requires an explicit rewrite such as squash_history() before gc() can release it.

Returns:

Repository storage overview

Return type:

StorageOverview

Raises:
  • RepositoryNotFoundError – Raised when the configured root is not a valid repository.

  • IntegrityError – Raised when persisted storage cannot be analyzed safely because the live graph is inconsistent.

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.get_storage_overview().total_size >= 0
True

hf_hub_download(filename: str, revision: str | None = None, local_dir: str | None = None) -> str

Materialize a detached user view for a file and return its path.

The detached path preserves the repo-relative filename suffix, including parent directories, so callers can work with a normal-looking filesystem path while repository truth remains immutable until explicit commit APIs are used.
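Preserving the repo-relative suffix means writing the bytes under a view root at the same relative location. The following sketch shows that idea only; `materialize_view` and the choice of view root are hypothetical and say nothing about where the backend actually places its views.

```python
import tempfile
from pathlib import Path


def materialize_view(view_root: Path, filename: str, data: bytes) -> str:
    """Write `data` under `view_root`, keeping the repo-relative path as the suffix."""
    target = view_root.joinpath(*filename.split("/"))
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return str(target)


with tempfile.TemporaryDirectory() as view_root:
    view_path = materialize_view(Path(view_root), "nested/demo.txt", b"hello")
    suffix_preserved = Path(view_path).as_posix().endswith("nested/demo.txt")
    content = Path(view_path).read_bytes()
```

Because the view is a copy, editing it does not change the committed file.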

Parameters:
  • filename (str) – Repo-relative file path to materialize

  • revision (Optional[str], optional) – Revision to inspect, defaults to the default branch

  • local_dir (Optional[str], optional) – Optional external target root for the detached view

Returns:

Filesystem path to a detached readable file view

Return type:

str

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("nested/demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     path = backend.hf_hub_download("nested/demo.txt")
...     path.endswith("nested/demo.txt")
True

list_repo_commits(revision: str = 'main', formatted: bool = False) -> List[GitCommitInfo]

List commit entries reachable from a revision head.

The public method name and the meaningful parameters intentionally match huggingface_hub.HfApi.list_repo_commits() as closely as the local repository design allows. The local API omits remote-only parameters such as repo_id, repo_type, and token because they have no real behavior for an embedded on-disk repository.

The returned list walks the reachable commit DAG from the selected head in a stable parent-first order. This keeps merge commits visible while remaining deterministic for local regression tests.
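One way to obtain a deterministic walk that still visits both sides of a merge is a depth-first traversal that follows parents in their recorded order. This is only a sketch of that general technique; the backend's exact ordering may differ, and the graph below is invented.

```python
from typing import Dict, List


def walk_commits(head: str, parents: Dict[str, List[str]]) -> List[str]:
    """Visit every commit reachable from `head` once, first parent first."""
    seen, order, stack = set(), [], [head]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        order.append(oid)
        # Push parents reversed so the first parent is popped next.
        stack.extend(reversed(parents.get(oid, [])))
    return order


# "m" merges "a" and "b", which both descend from "root".
graph = {"m": ["a", "b"], "a": ["root"], "b": ["root"], "root": []}
history = walk_commits("m", graph)
```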

Parameters:
  • revision (str, optional) – Revision or commit ID to inspect, defaults to DEFAULT_BRANCH

  • formatted (bool, optional) – Whether HTML-formatted title/message fields should be populated

Returns:

Commit entries ordered from newest to oldest

Return type:

List[GitCommitInfo]

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     commit = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     history = backend.list_repo_commits()
...     history[0].title
'seed'

list_repo_files(revision: str = 'main') -> List[str]

List all file paths in a revision.

The result is a flattened, sorted list of repo-relative file paths and intentionally omits directory placeholders.
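Flattening a tree into sorted file paths, without entries for the directories themselves, looks roughly like this. The nested-dict tree representation is an assumption chosen to keep the example small.

```python
from typing import Dict, List, Union

Tree = Dict[str, Union[bytes, "Tree"]]


def flatten_files(tree: Tree, prefix: str = "") -> List[str]:
    """Return sorted repo-relative file paths; directories contribute no entry."""
    paths: List[str] = []
    for name, node in tree.items():
        path = f"{prefix}{name}"
        if isinstance(node, dict):
            paths.extend(flatten_files(node, path + "/"))
        else:
            paths.append(path)
    return sorted(paths)


tree = {"nested": {"demo.txt": b"hello"}, "a.txt": b"", "empty": {}}
files = flatten_files(tree)
```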

Parameters:

revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH

Returns:

Sorted repo-relative file paths

Return type:

List[str]

Raises:

RevisionNotFoundError – Raised when revision cannot be resolved.

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.list_repo_files()
['demo.txt']

list_repo_reflog(ref_name: str, limit: int | None = None) -> List[ReflogEntry]

List reflog entries for a branch or tag.

This is a local repository extension with no direct HF public counterpart. It exists to support audit and recovery workflows for the embedded on-disk repository.

Parameters:
  • ref_name (str) – Full ref name such as refs/heads/main or a short branch/tag name when unambiguous

  • limit (Optional[int], optional) – Optional maximum number of newest entries to return

Returns:

Reflog entries ordered from newest to oldest

Return type:

List[ReflogEntry]

Raises:
  • ConflictError – Raised when a short ref name is ambiguous across branches and tags.

  • RevisionNotFoundError – Raised when the ref or reflog does not exist.

  • ValueError – Raised when limit is negative.
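The short-name rule for ref_name can be sketched as a lookup across both namespaces. The exception classes here are stand-ins for the ones named above, and the `refs/tags/` prefix is assumed by analogy with the `refs/heads/` prefix shown in the examples.

```python
from typing import Set


class ConflictError(Exception):
    """Stand-in for the ConflictError named above."""


class RevisionNotFoundError(Exception):
    """Stand-in for the RevisionNotFoundError named above."""


def resolve_ref_name(ref_name: str, branches: Set[str], tags: Set[str]) -> str:
    """Hypothetical resolver: expand a short name to a full ref when unambiguous."""
    if ref_name.startswith("refs/"):
        return ref_name  # already a full ref name
    matches = []
    if ref_name in branches:
        matches.append(f"refs/heads/{ref_name}")
    if ref_name in tags:
        matches.append(f"refs/tags/{ref_name}")
    if len(matches) > 1:
        raise ConflictError(f"{ref_name!r} names both a branch and a tag")
    if not matches:
        raise RevisionNotFoundError(ref_name)
    return matches[0]
```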

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.list_repo_reflog("main")[0].ref_name
'refs/heads/main'

list_repo_refs(include_pull_requests: bool = False) -> GitRefs

List visible branch and tag refs in HF-style form.

The local repository does not support convert refs or pull requests, but keeps the same top-level return shape as huggingface_hub.HfApi.list_repo_refs().

Parameters:

include_pull_requests (bool, optional) – Whether pull-request refs should be included. The local repository returns [] when requested and None otherwise.

Returns:

Visible repository refs

Return type:

GitRefs

Raises:

RepositoryNotFoundError – Raised when the configured root is not a valid repository.

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.list_repo_refs().branches[0].ref
'refs/heads/main'

list_repo_tree(path_in_repo: str | None = None, recursive: bool = False, revision: str = 'main') -> List[RepoFile | RepoFolder]

List file and folder entries under a repository directory.

This method intentionally follows the main behavior of huggingface_hub.HfApi.list_repo_tree(), including the recursive flag and the use of HF-style RepoFile / RepoFolder return objects.

Parameters:
  • path_in_repo (Optional[str], optional) – Repo-relative directory path, or None for the repository root

  • recursive (bool, optional) – Whether to include descendants recursively

  • revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH

Returns:

Sorted metadata entries for direct children, or for all descendants when recursive is True

Return type:

List[Union[RepoFile, RepoFolder]]

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("nested/demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     [item.path for item in backend.list_repo_tree("nested")]
['nested/demo.txt']

merge(source_revision: str, target_revision: str = 'main', parent_commit: str | None = None, commit_message: str | None = None, commit_description: str | None = None) -> MergeResult

Merge a source revision into a target branch.

The local repository exposes merge as a first-class public write API. Successful merges return a structured result describing whether the operation created a merge commit, fast-forwarded the target branch, or found the branches already up to date. Conflicts are reported in the result instead of mutating repository state.
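The three non-conflicting outcomes follow from ancestry between the two heads. The sketch below classifies them on a toy commit graph; only the `'fast-forward'` label appears on this page, so the other two labels and the function names are assumptions.

```python
from typing import Dict, List, Set


def ancestors(oid: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return `oid` together with every commit reachable through its parents."""
    seen: Set[str] = set()
    stack = [oid]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def classify_merge(source: str, target: str, parents: Dict[str, List[str]]) -> str:
    if source in ancestors(target, parents):
        return "already-up-to-date"  # hypothetical label
    if target in ancestors(source, parents):
        return "fast-forward"  # target only needs to move forward
    return "merge-commit"  # hypothetical label: histories diverged
```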

Parameters:
  • source_revision (str) – Source branch, tag, or commit ID to merge from

  • target_revision (str, optional) – Target branch name or full branch ref, defaults to DEFAULT_BRANCH

  • parent_commit (Optional[str], optional) – Optional expected current head for optimistic concurrency on the target branch

  • commit_message (Optional[str], optional) – Optional merge-commit title

  • commit_description (Optional[str], optional) – Optional merge-commit description/body

Returns:

Structured merge result

Return type:

MergeResult

Raises:
  • ConflictError – Raised when parent_commit does not match the current target head.

  • RevisionNotFoundError – Raised when the source revision or target branch does not exist.

  • UnsupportedPathError – Raised when target_revision is not a branch ref.

  • ValueError – Raised when commit_message is explicitly empty.

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"base")],
...         commit_message="seed",
...     )
...     backend.create_branch(branch="feature")
...     _ = backend.create_commit(
...         revision="feature",
...         operations=[CommitOperationAdd("feature.txt", b"hello")],
...         commit_message="feature work",
...     )
...     backend.merge("feature").status
'fast-forward'

open_file(path_in_repo: str, revision: str = 'main') -> BufferedReader

Open a file from a revision as a read-only binary stream.

The returned stream is detached from repository storage and backed by an in-memory buffer, so accidental writes through the stream cannot mutate repository truth.
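A reader over an in-memory copy can be built from standard library pieces, as sketched below. Whether the backend constructs its stream exactly this way is an assumption; the helper name is hypothetical.

```python
import io


def detached_reader(data: bytes) -> io.BufferedReader:
    """Wrap a private in-memory copy of `data` in a buffered reader."""
    return io.BufferedReader(io.BytesIO(data))


with detached_reader(b"hello") as fileobj:
    first = fileobj.read(2)
    rest = fileobj.read()
    try:
        fileobj.write(b"x")  # BufferedReader does not implement write()
        write_rejected = False
    except io.UnsupportedOperation:
        write_rejected = True
```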

Parameters:
  • path_in_repo (str) – Repo-relative file path to open

  • revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH

Returns:

Read-only buffered binary stream

Return type:

io.BufferedReader

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     with backend.open_file("demo.txt") as fileobj:
...         fileobj.read()
b'hello'

quick_verify() -> VerifyReport

Perform a minimal repository consistency check.

The verification pass checks repository format compatibility, validates commit closure for all visible refs, and reports stale detached views as warnings instead of fatal errors.

Returns:

Verification summary for the current repository state

Return type:

VerifyReport

Raises:

RepositoryNotFoundError – Raised when the configured root is not a valid repository.

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.quick_verify().ok
True

read_bytes(path_in_repo: str, revision: str = 'main') -> bytes

Read a file from a revision into memory.

File bytes are verified against the stored blob checksum before being returned so corruption in detached storage is surfaced immediately.
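Verify-before-return can be sketched as a digest comparison. This page does not name the checksum algorithm, so SHA-256 is an assumption here, and `IntegrityError` is a stand-in class.

```python
import hashlib


class IntegrityError(Exception):
    """Stand-in for the backend's IntegrityError."""


def verified(data: bytes, expected_sha256: str) -> bytes:
    """Return `data` only if its digest matches the recorded checksum."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise IntegrityError(f"checksum mismatch: {actual} != {expected_sha256}")
    return data
```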

Parameters:
  • path_in_repo (str) – Repo-relative file path to read

  • revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH

Returns:

Full file content bytes

Return type:

bytes

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.read_bytes("demo.txt")
b'hello'

read_range(path_in_repo: str, start: int, length: int, revision: str = 'main') -> bytes

Read a byte range from a file in a revision.

For chunked files the backend resolves only the overlapping chunks and avoids reconstructing unrelated file regions.
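Selecting only the overlapping chunks comes down to interval arithmetic on chunk offsets. The sketch below works on an in-memory list of chunks, which is an assumption; in the backend the chunks would be fetched from pack storage, and the function name is hypothetical.

```python
from typing import Sequence


def read_range_from_chunks(chunks: Sequence[bytes], start: int, length: int) -> bytes:
    """Return bytes [start, start + length) of the concatenated chunks, clamped to the end."""
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    end = start + length
    out = bytearray()
    offset = 0  # logical offset of the current chunk
    for chunk in chunks:
        chunk_end = offset + len(chunk)
        if chunk_end > start and offset < end:
            # Slice just the part of this chunk that overlaps the request.
            out += chunk[max(start - offset, 0):end - offset]
        offset = chunk_end
        if offset >= end:
            break  # later chunks cannot overlap the request
    return bytes(out)
```

Chunks that end before `start` are skipped without slicing, and the loop stops as soon as the requested range is covered.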

Parameters:
  • path_in_repo (str) – Repo-relative file path to read

  • start (int) – Starting byte offset in the logical file

  • length (int) – Number of bytes to read

  • revision (str, optional) – Revision to inspect, defaults to DEFAULT_BRANCH

Returns:

Requested byte range, clamped to the file end

Return type:

bytes

Raises:
  • IntegrityError – Raised when persisted blob or pack content does not match recorded checksums.

  • EntryNotFoundError – Raised when the requested file is absent.

  • RevisionNotFoundError – Raised when revision cannot be resolved.

  • ValueError – Raised when start or length is negative.

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.read_range("demo.txt", start=1, length=3)
b'ell'

repo_info(revision: str | None = None) -> RepoInfo

Return repository metadata for the selected revision.

The selected revision is only used to resolve the visible head in the returned RepoInfo; repository-wide settings still come from repo.json.

Parameters:

revision (Optional[str], optional) – Revision whose head should be resolved, defaults to the configured default branch

Returns:

Current repository metadata view

Return type:

RepoInfo

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     backend.repo_info().default_branch
'main'

reset_ref(ref_name: str, to_revision: str) -> CommitInfo

Reset a branch ref to a target commit.

This method performs a branch-head move under the repository write lock and records the change in the reflog.

Parameters:
  • ref_name (str) – Branch name to update

  • to_revision (str) – Revision or commit ID to resolve as the new head

Returns:

Public commit metadata for the new branch head

Return type:

CommitInfo

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     commit = backend.create_commit(
...         revision="main",
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     backend.reset_ref("main", commit.oid).oid == commit.oid
True

snapshot_download(revision: str | None = None, local_dir: str | None = None, allow_patterns: Sequence[str] | str | None = None, ignore_patterns: Sequence[str] | str | None = None) -> str

Materialize a detached snapshot directory for a revision.

The return value intentionally follows the role of huggingface_hub.snapshot_download() while omitting remote-only parameters that have no local behavior.

Parameters:
  • revision (Optional[str], optional) – Revision to inspect, defaults to the default branch

  • local_dir (Optional[str], optional) – Optional external directory where the detached snapshot should be materialized

  • allow_patterns (Optional[Union[Sequence[str], str]], optional) – Optional allowlist for repo-relative paths

  • ignore_patterns (Optional[Union[Sequence[str], str]], optional) – Optional denylist for repo-relative paths
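Allowlist and denylist filtering of repo-relative paths can be sketched with shell-style wildcards. Using `fnmatch` semantics is an assumption here, as is the helper name; the sketch only shows how the two lists would interact.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Sequence, Union

Patterns = Optional[Union[Sequence[str], str]]


def _as_list(patterns: Patterns) -> Optional[List[str]]:
    if patterns is None:
        return None
    return [patterns] if isinstance(patterns, str) else list(patterns)


def filter_paths(paths: Sequence[str], allow_patterns: Patterns = None,
                 ignore_patterns: Patterns = None) -> List[str]:
    """Keep paths matching any allow pattern (if given) and no ignore pattern."""
    allow, ignore = _as_list(allow_patterns), _as_list(ignore_patterns)
    kept = []
    for path in paths:
        if allow is not None and not any(fnmatchcase(path, p) for p in allow):
            continue
        if ignore is not None and any(fnmatchcase(path, p) for p in ignore):
            continue
        kept.append(path)
    return kept
```

With these semantics `*` also matches `/`, so `"*.txt"` selects text files at any depth.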

Returns:

Filesystem path to the snapshot directory

Return type:

str

Example:

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     _ = backend.create_commit(
...         operations=[CommitOperationAdd("nested/demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     path = backend.snapshot_download()
...     "cache/snapshots/" in path
True
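
The allow_patterns and ignore_patterns parameters each accept a single pattern or a sequence of patterns. The sketch below shows one plausible way such filters combine. It assumes fnmatch-style glob matching, as huggingface_hub uses; filter_paths is a hypothetical helper rather than part of hubvault, whose actual matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Sequence, Union

Patterns = Optional[Union[Sequence[str], str]]


def _as_list(patterns: Patterns) -> Optional[List[str]]:
    # A bare string is one pattern, not a sequence of characters.
    if patterns is None:
        return None
    if isinstance(patterns, str):
        return [patterns]
    return list(patterns)


def filter_paths(
    paths: Iterable[str],
    allow_patterns: Patterns = None,
    ignore_patterns: Patterns = None,
) -> List[str]:
    """Hypothetical helper: keep paths on the allowlist and off the denylist."""
    allow = _as_list(allow_patterns)
    ignore = _as_list(ignore_patterns)
    kept = []
    for path in paths:
        if allow is not None and not any(fnmatchcase(path, p) for p in allow):
            continue
        if ignore is not None and any(fnmatchcase(path, p) for p in ignore):
            continue
        kept.append(path)
    return kept


selected = filter_paths(
    ["README.md", "nested/demo.txt", "nested/model.bin"],
    allow_patterns="nested/*",
    ignore_patterns=["*.bin"],
)
print(selected)
```

In this sketch the denylist is applied after the allowlist, so a path matching both is excluded.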

squash_history(ref_name: str, root_revision: str | None = None, commit_message: str | None = None, commit_description: str | None = None, run_gc: bool = True, prune_cache: bool = False) → SquashReport

Rewrite a branch so older history becomes reclaimable.

The rewritten branch keeps the same visible tip snapshot but the chosen root commit becomes a new parentless starting point. Older ancestors on that branch therefore become unreachable from the rewritten ref and can be reclaimed by a follow-up GC pass.

Parameters:
  • ref_name (str) – Branch name or full branch ref to rewrite

  • root_revision (Optional[str], optional) – Oldest commit to preserve on the rewritten branch. When omitted, the current branch head is collapsed into a single new root commit.

  • commit_message (Optional[str], optional) – Optional replacement title for the rewritten root commit

  • commit_description (Optional[str], optional) – Optional replacement body for the rewritten root commit

  • run_gc (bool, optional) – Whether to run gc() immediately after rewriting

  • prune_cache (bool, optional) – Whether the follow-up GC pass should also prune managed caches

Returns:

History rewrite summary

Return type:

SquashReport

Raises:

Example:

>>> import tempfile
>>> from pathlib import Path
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     commit = backend.create_commit(
...         operations=[CommitOperationAdd("demo.txt", b"hello")],
...         commit_message="seed",
...     )
...     report = backend.squash_history("main", root_revision=commit.oid, run_gc=False)
...     report.ref_name
'refs/heads/main'
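
The reachability argument behind this method can be sketched on a toy linear history. The model below is an assumption for illustration only: commits are plain string labels mapped to their parent, and the squash simply cuts the root's parent link. In a real content-addressed store the rewritten commits would normally receive new IDs as well.

```python
from typing import Dict, List, Optional


def ancestors(parents: Dict[str, Optional[str]], head: str) -> List[str]:
    """Walk a linear history from head back to its root, newest first."""
    chain = []
    current: Optional[str] = head
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def squash(parents: Dict[str, Optional[str]], head: str, root: str) -> Dict[str, Optional[str]]:
    """Toy rewrite: make root parentless so nothing older is reachable from head."""
    if root not in ancestors(parents, head):
        raise ValueError("root must be an ancestor of head")
    rewritten = dict(parents)
    rewritten[root] = None
    return rewritten


# c1 <- c2 <- c3 <- c4, with c4 as the branch head.
history = {"c1": None, "c2": "c1", "c3": "c2", "c4": "c3"}
graph = squash(history, head="c4", root="c3")
reachable = set(ancestors(graph, "c4"))
reclaimable = sorted(set(history) - reachable)
print(reclaimable)
```

Here c1 and c2 are the commits a follow-up GC pass could reclaim, provided no other ref still points at them.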

upload_file(*, path_or_fileobj: str | Path | bytes | BufferedIOBase, path_in_repo: str, revision: str | None = None, commit_message: str | None = None, commit_description: str | None = None, parent_commit: str | None = None) → CommitInfo

Upload a single file through the public commit API.

Parameters:
  • path_or_fileobj (Union[str, pathlib.Path, bytes, io.BufferedIOBase]) – File content source

  • path_in_repo (str) – Target repo-relative path

  • revision (Optional[str], optional) – Target branch name, defaults to the default branch

  • commit_message (Optional[str], optional) – Optional commit summary

  • commit_description (Optional[str], optional) – Optional commit description/body

  • parent_commit (Optional[str], optional) – Optional optimistic-concurrency parent commit

Returns:

Public commit metadata for the created commit

Return type:

CommitInfo

Example:

>>> import tempfile
>>> from pathlib import Path
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     backend = RepositoryBackend(Path(tmpdir) / "repo")
...     _ = backend.create_repo()
...     info = backend.upload_file(path_or_fileobj=b"hello", path_in_repo="demo.txt")
...     info.commit_message
'Upload demo.txt with hubvault'
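
The path_or_fileobj parameter accepts four input types. A minimal sketch of how such a union might be normalized is shown below. It assumes that a str is a filesystem path (the huggingface_hub convention) and that reading the whole payload into memory is acceptable; read_source is a hypothetical helper, and the real backend may stream instead.

```python
import io
from pathlib import Path
from typing import Union

Source = Union[str, Path, bytes, io.BufferedIOBase]


def read_source(path_or_fileobj: Source) -> bytes:
    """Hypothetical helper: load the upload payload into memory as bytes."""
    if isinstance(path_or_fileobj, bytes):
        # Raw bytes are already the payload.
        return path_or_fileobj
    if isinstance(path_or_fileobj, (str, Path)):
        # Assumption: a str is a local file path, not inline content.
        return Path(path_or_fileobj).read_bytes()
    if isinstance(path_or_fileobj, io.BufferedIOBase):
        # Binary file objects are read from their current position.
        return path_or_fileobj.read()
    raise TypeError("unsupported source type: {}".format(type(path_or_fileobj).__name__))


payload = read_source(io.BytesIO(b"hello"))
print(payload)
```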

upload_folder(*, folder_path: str | Path, path_in_repo: str | None = None, commit_message: str | None = None, commit_description: str | None = None, revision: str | None = None, parent_commit: str | None = None, allow_patterns: Sequence[str] | str | None = None, ignore_patterns: Sequence[str] | str | None = None, delete_patterns: Sequence[str] | str | None = None) → CommitInfo

Upload a local folder while preserving its relative layout.

Any nested .git directory is ignored automatically, matching the broad public behavior of huggingface_hub.HfApi.upload_folder().

Parameters:
  • folder_path (Union[str, pathlib.Path]) – Local folder to upload

  • path_in_repo (Optional[str], optional) – Optional target directory inside the repo, relative to the repo root

  • commit_message (Optional[str], optional) – Optional commit summary

  • commit_description (Optional[str], optional) – Optional commit description/body

  • revision (Optional[str], optional) – Target branch name, defaults to the default branch

  • parent_commit (Optional[str], optional) – Optional optimistic-concurrency parent commit

  • allow_patterns (Optional[Union[Sequence[str], str]], optional) – Optional allowlist for local relative paths

  • ignore_patterns (Optional[Union[Sequence[str], str]], optional) – Optional denylist for local relative paths

  • delete_patterns (Optional[Union[Sequence[str], str]], optional) – Optional patterns matched against files already in the repo beneath path_in_repo; matching files are deleted before new files are added

Returns:

Public commit metadata for the created commit

Return type:

CommitInfo

Raises:

ValueError – Raised when folder_path is not a local directory.

Example:

>>> import tempfile
>>> from pathlib import Path
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     repo_root = Path(tmpdir)
...     source = repo_root / "source"
...     source.mkdir()
...     (source / "demo.txt").write_text("hello", encoding="utf-8")
...     backend = RepositoryBackend(repo_root / "repo")
...     _ = backend.create_repo()
...     info = backend.upload_folder(folder_path=source)
...     info.commit_message
'Upload folder using hubvault'
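
How a folder walk can preserve the relative layout while skipping .git directories is sketched below. plan_folder_upload is a hypothetical helper that only computes target repo paths. It assumes repo paths use forward slashes and that any directory named .git is pruned at every depth; pattern filtering is left out.

```python
import os
import tempfile
from pathlib import Path, PurePosixPath
from typing import List, Optional, Union


def plan_folder_upload(folder_path: Union[str, Path], path_in_repo: Optional[str] = None) -> List[str]:
    """Hypothetical helper: list the repo-relative targets for a folder upload."""
    root = Path(folder_path)
    if not root.is_dir():
        raise ValueError("folder_path must be a local directory: {}".format(root))
    prefix = PurePosixPath(path_in_repo) if path_in_repo else PurePosixPath()
    targets = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk never descends into it.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            relative = (Path(dirpath) / name).relative_to(root)
            targets.append(str(prefix / PurePosixPath(*relative.parts)))
    return targets


with tempfile.TemporaryDirectory() as tmpdir:
    source = Path(tmpdir) / "source"
    (source / ".git").mkdir(parents=True)
    (source / ".git" / "config").write_text("ignored", encoding="utf-8")
    (source / "nested").mkdir()
    (source / "nested" / "demo.txt").write_text("hello", encoding="utf-8")
    (source / "top.txt").write_text("hi", encoding="utf-8")
    planned = plan_folder_upload(source, path_in_repo="data")
print(planned)
```

Sorting directory and file names makes the planned order deterministic across filesystems, which is convenient for tests but not required for correctness.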

upload_large_folder(*, folder_path: str | Path, revision: str | None = None, allow_patterns: Sequence[str] | str | None = None, ignore_patterns: Sequence[str] | str | None = None) → CommitInfo

Upload a large local folder through one atomic local commit.

The method name intentionally follows huggingface_hub.HfApi.upload_large_folder(), but the local repository keeps the operation atomic and therefore returns a single CommitInfo instead of spreading the upload across multiple commits.

Parameters:
  • folder_path (Union[str, pathlib.Path]) – Local folder to upload

  • revision (Optional[str], optional) – Target branch name, defaults to the default branch

  • allow_patterns (Optional[Union[Sequence[str], str]], optional) – Optional allowlist for local relative paths

  • ignore_patterns (Optional[Union[Sequence[str], str]], optional) – Optional denylist for local relative paths

Returns:

Public commit metadata for the created commit

Return type:

CommitInfo

Raises:

ValueError – Raised when folder_path is not a local directory.

Example:

>>> import tempfile
>>> from pathlib import Path
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     repo_root = Path(tmpdir)
...     source = repo_root / "source"
...     source.mkdir()
...     (source / "demo.txt").write_text("hello", encoding="utf-8")
...     backend = RepositoryBackend(repo_root / "repo")
...     _ = backend.create_repo()
...     info = backend.upload_large_folder(folder_path=source)
...     info.commit_message
'Upload large folder using hubvault'