Welcome to hubvault

Overview

hubvault is a local, embedded, API-first repository system for versioning large machine learning artifacts such as weights, datasets, and generated outputs. The public API intentionally feels close to huggingface_hub where that alignment improves usability, while the repository remains completely self-contained on disk.

The shortest accurate description is:

Git-like history and refs
Hugging Face style file APIs
a repository root that remains valid after moving, zipping, or restoring it
explicit write operations and detached read views

What hubvault provides

hubvault currently ships a working local repository surface with:

Git-like commits, trees, refs, tags, reflogs, and merges
Hugging Face style upload/download/list APIs on top of a local repo root
Detached download and snapshot views that cannot corrupt committed data
Chunked large-file storage together with public oid and sha256 metadata
Verification, storage analysis, garbage collection, and history squashing
A git-like CLI exposed as both hubvault and hv
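
As a rough illustration of what chunked storage with a public sha256 can mean, the following standard-library sketch splits a payload into fixed-size chunks, hashes each chunk, and keeps a whole-file digest alongside them. It is not hubvault's implementation: the chunk size, the fixed-size strategy, and the names chunk_bytes and reassemble are assumptions made only for this example.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def chunk_bytes(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks, each paired with its own SHA-256."""
    chunks = []
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        chunks.append((hashlib.sha256(piece).hexdigest(), piece))
    return chunks


def reassemble(chunks) -> bytes:
    """Concatenate chunk payloads back into the original byte string."""
    return b"".join(piece for _, piece in chunks)


payload = b"model-weights"  # 13 bytes
chunks = chunk_bytes(payload)

# The whole-file digest is independent of how the file was chunked.
file_sha256 = hashlib.sha256(payload).hexdigest()

assert reassemble(chunks) == payload
```

The whole-file digest stays the same whatever chunk size is chosen, which is why it can serve as stable public metadata while the chunk layout remains a storage detail.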

Where it fits best

hubvault is designed for deep-learning artifact repositories that should remain useful without first operating heavyweight infrastructure. It is a good fit when you need to persist large model weights, datasets, evaluation outputs, or experiment bundles, but a hosted Hub, a Docker or Kubernetes stack, or an external object storage service such as OSS or S3 would add too much operational cost, would not work offline, or would be constrained by free-tier resource limits.

In that setting, hubvault provides a self-contained local repository with atomic mutations, stable committed data, rollback-oriented recovery, detached read views, verification, garbage collection, storage analysis, and history squashing. The point is not to replace every remote collaboration service; it is to give one directory enough repository semantics to maintain large ML data locally and predictably.

What makes the project different

hubvault is intentionally opinionated about a few things:

The repo root is the artifact. There is no hidden sidecar database or external metadata service.
Read paths are detached views. A file returned by hf_hub_download() is safe to read, and editing it must never mutate committed data.
Writes are explicit. The system does not pretend there is a mutable working tree.
Maintenance is public. Verification, storage analysis, GC, and history squashing are first-class APIs.
Infrastructure stays small. You do not need Docker, Kubernetes, a daemon, an external object store, or a hosted service just to keep a durable artifact repository.
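
The detached-view idea can be sketched with the standard library alone. The example below copies a committed file into a separate view directory under its repo-relative path, then shows that editing the copy leaves the committed bytes untouched. The directory layout and the detached_view helper are hypothetical; hubvault may materialize views differently.

```python
import shutil
import tempfile
from pathlib import Path


def detached_view(committed_file: Path, view_root: Path, repo_relative: str) -> Path:
    """Copy a committed file into a view directory and return the copy's path."""
    target = view_root / repo_relative
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(committed_file, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Stand-in for committed data inside the repository (hypothetical layout).
    committed = root / "store" / "weights.bin"
    committed.parent.mkdir(parents=True)
    committed.write_bytes(b"v1")

    # The reader receives a separate file whose path keeps the repo-relative suffix.
    view = detached_view(committed, root / "views", "models/weights.bin")
    view_suffix_ok = view.as_posix().endswith("models/weights.bin")

    # Editing the view does not touch the committed bytes.
    view.write_bytes(b"edited by the reader")
    committed_after_edit = committed.read_bytes()
```

Because the view is a distinct file, a careless edit costs the reader only their own copy.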

Design constraints

hubvault is built around a few non-negotiable constraints:

Portable repository root: moving or archiving a repo directory must not break it
Atomic writes: interrupted writes are treated as if they never happened
Cross-process locking: writers exclude other readers and writers during publication
Public API first: examples and integrations should go through public models and commands
Cross-platform support: Linux, macOS, and Windows remain first-class targets
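
One common way to get "interrupted writes never happened" behavior for a single file is to write to a temporary file in the same directory and then rename it over the destination. The sketch below shows that general technique with the standard library; it is not taken from hubvault's source, and the atomic_write name is an assumption for this example.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Write data to path so readers see either the old or the new content."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temporary file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before publishing
        os.replace(tmp_name, path)  # atomic swap on POSIX and Windows
    except BaseException:
        # On failure, remove the partial temporary file and leave the target as it was.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "refs" / "main"
    atomic_write(ref, b"commit-a")
    atomic_write(ref, b"commit-b")
    final_content = ref.read_bytes()
    leftover_names = sorted(p.name for p in ref.parent.iterdir())
```

A crash before os.replace leaves only a stray temporary file, which a later cleanup pass can delete without affecting committed state.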

Compatibility

hubvault aligns with Git and Hugging Face where that alignment is user-visible:

commit/tree/blob IDs are Git-style 40-hex OIDs
public file sha256 values are bare 64-hex digests
download paths preserve the original repo-relative suffix
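
For readers unfamiliar with the two identifier shapes, the sketch below computes a Git blob object ID (SHA-1 over a "blob <size>" header, a NUL byte, and the content) and a bare SHA-256 hex digest using only hashlib. Whether hubvault derives its blob OIDs in exactly this way is an assumption; the source only states that the IDs are Git-style 40-hex values and that sha256 values are bare 64-hex digests.

```python
import hashlib


def git_blob_oid(data: bytes) -> str:
    """Return the 40-hex object ID Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def file_sha256(data: bytes) -> str:
    """Return a bare 64-hex SHA-256 digest, with no 'sha256:' style prefix."""
    return hashlib.sha256(data).hexdigest()


content = b"hello\n"
oid = git_blob_oid(content)      # 40 hexadecimal characters
digest = file_sha256(content)    # 64 hexadecimal characters
```

The git_blob_oid result matches what `git hash-object` prints for a file with the same bytes, which makes it easy to cross-check by hand.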

hubvault intentionally differs where local embedded semantics matter:

no remote service or pull request system
no mutable workspace abstraction
read-facing paths are detached views, not writable repository aliases

How to read this documentation

If you are new to the project, the best order is:

read Installation
work through Quick Start
continue with Branch, Tag, and Merge Workflow for branches, tags, and merge behavior
read Service and ASGI Startup when you want the embedded HTTP server or ASGI deployment
continue with Remote Client Usage for the Python remote client
use Bundled Web UI for the bundled browser UI
use CLI Workflow if you prefer a command-line workflow
study Verification, GC, and History Squashing before operating large long-lived repositories
read Repository Structure and How It Works when you need to understand storage layout and safety design

Tutorials

Installation: Install the package, verify both the Python API and the hubvault / hv CLI, and confirm the environment is usable.
Quick Start: Create a repo, make commits, read files, and understand detached download/snapshot views.
Branch, Tag, and Merge Workflow: Work with branches, tags, merge results, commit history, and reflog inspection.
Service and ASGI Startup: Start the embedded HTTP server, use the import-friendly server module, and deploy the ASGI app.
Remote Client Usage: Use hubvault.remote.HubVaultRemoteApi against a running server.
Bundled Web UI: Operate the bundled browser UI, upload queue, and packaged static frontend flow.
CLI Workflow: Use the git-like CLI without assuming Git's mutable workspace model.
Verification, GC, and History Squashing: Decide when to use quick/full verification, GC, and history squashing.
Repository Structure and How It Works: Understand the on-disk layout, object semantics, chunked storage, and atomic transaction model.

API Reference

API Documentation

Design Notes

The implementation roadmap lives in plan/init/ in the repository. Those documents capture the design baseline, compatibility decisions, storage format, atomicity model, and execution phases behind the current implementation.

Those design notes are useful if you need to understand why hubvault differs from Hugging Face or Git in certain places, especially around detached views, explicit write operations, cross-process locking, and rollback-only recovery.

Community and Support

GitHub Repository: https://github.com/HansBug/hubvault
Issue Tracker: https://github.com/HansBug/hubvault/issues
PyPI Package: https://pypi.org/project/hubvault/

License

hubvault is released under the GNU General Public License v3.0. See the LICENSE file for details.