Welcome to hubvault
Overview
hubvault is a local, embedded, API-first repository system for versioning
large machine learning artifacts such as weights, datasets, and generated
outputs. The public API intentionally feels close to huggingface_hub where
that alignment improves usability, while the repository remains completely
self-contained on disk.
The shortest accurate description is:
Git-like history and refs
Hugging Face style file APIs
a repository root that remains valid after moving, zipping, or restoring it
explicit write operations and detached read views
What hubvault provides
hubvault currently ships a working local repository surface with:
Git-like commits, trees, refs, tags, reflogs, and merges
Hugging Face style upload/download/list APIs on top of a local repo root
Detached download and snapshot views that cannot corrupt committed data
Chunked large-file storage together with public oid and sha256 metadata
Verification, storage analysis, garbage collection, and history squashing
A git-like CLI exposed as both hubvault and hv
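Chunked storage and per-file sha256 metadata can be illustrated with a minimal fixed-size chunking sketch. Everything here is an assumption for illustration: the tiny CHUNK_SIZE, the in-memory dict standing in for a chunk store, and the hypothetical chunk_file/reassemble helpers are not hubvault's on-disk format or API. The sketch shows why a public sha256 is useful: after a file is rebuilt from its chunks, the digest can be recomputed to verify the result.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical chunk size, kept tiny for illustration; hubvault's real chunking policy may differ.
CHUNK_SIZE = 4


def chunk_file(data: bytes, chunk_size: int = CHUNK_SIZE) -> Tuple[Dict[str, bytes], List[str], str]:
    """Split data into fixed-size chunks and return (chunk store, manifest, whole-file sha256)."""
    store: Dict[str, bytes] = {}  # content-addressed chunk store: sha256 hex -> chunk bytes
    manifest: List[str] = []      # ordered chunk digests needed to rebuild the file
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store[digest] = chunk     # identical chunks collapse into one entry
        manifest.append(digest)
    return store, manifest, hashlib.sha256(data).hexdigest()


def reassemble(store: Dict[str, bytes], manifest: List[str], expected_sha256: str) -> bytes:
    """Rebuild the file from its chunks and verify it against the recorded sha256."""
    data = b"".join(store[digest] for digest in manifest)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("reassembled file does not match its sha256")
    return data
```

With b"abcdabcdxyz" and a 4-byte chunk size, the manifest lists three chunks but the store holds only two, because the repeated "abcd" chunk is stored once. Real systems typically use much larger chunks.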
Where it fits best
hubvault is designed for deep-learning artifact repositories that should remain useful without first operating heavyweight infrastructure. It is a good fit when you need to persist large model weights, datasets, evaluation outputs, or experiment bundles, but a hosted Hub, a Docker or Kubernetes stack, or an external object storage service such as OSS or S3 would add too much operational cost, would not work offline, or would be constrained by free-tier resource limits.
In that setting, hubvault provides a self-contained local repository with atomic mutations, stable committed data, rollback-oriented recovery, detached read views, verification, garbage collection, storage overview, and history squashing. The point is not to replace every remote collaboration service; it is to give one directory enough repository semantics to maintain large ML data locally and predictably.
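Garbage collection in a content-addressed repository is usually a mark-and-sweep pass: everything reachable from a ref is live, and everything else can be reclaimed. The sketch below models that idea over a hypothetical in-memory object graph. The Objects alias and the reachable/collect_garbage helpers are illustrative only, and the sketch ignores details a real implementation must handle, such as whether reflog entries count as roots, chunk files, and locking.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical object graph: oid -> (kind, oids it references).
# Commits reference a tree and parent commits, trees reference blobs/subtrees, blobs reference nothing.
Objects = Dict[str, Tuple[str, List[str]]]


def reachable(objects: Objects, refs: Dict[str, str]) -> Set[str]:
    """Mark phase: collect every object reachable from any ref."""
    seen: Set[str] = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects[oid][1])
    return seen


def collect_garbage(objects: Objects, refs: Dict[str, str]) -> Set[str]:
    """Sweep phase: delete unreachable objects in place and return their oids."""
    live = reachable(objects, refs)
    dead = set(objects) - live
    for oid in dead:
        del objects[oid]
    return dead
```

In this model, history squashing followed by GC is what actually frees space: once a branch is moved to a new root commit, the older commits and any trees only they referenced become unreachable.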
What makes the project different
hubvault is intentionally opinionated about a few things:
The repo root is the artifact. There is no hidden sidecar database or external metadata service.
Read paths are detached views. A file returned by hf_hub_download() is safe to read, but editing it must not mutate committed truth.
Writes are explicit. The system does not pretend there is a mutable working tree.
Maintenance is public. Verification, storage analysis, GC, and history squashing are first-class APIs.
Infrastructure stays small. You do not need Docker, Kubernetes, a daemon, an external object store, or a hosted service just to keep a durable artifact repository.
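One simple way to get a detached read view is to materialize a copy of the committed bytes outside the object store, at a path that keeps the repo-relative suffix. This is a minimal sketch under that assumption. The store_blob and detached_view helpers and the flat sha256-keyed objects directory are hypothetical, and hubvault's actual cache layout and materialization strategy may differ. A hardlink or symlink into the store would not qualify, since writing through it would change committed data.

```python
import hashlib
import shutil
from pathlib import Path


def store_blob(objects_dir: Path, data: bytes) -> str:
    """Write data into a content-addressed store keyed by its sha256 and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    (objects_dir / digest).write_bytes(data)
    return digest


def detached_view(objects_dir: Path, digest: str, view_root: Path, repo_path: str) -> Path:
    """Copy a stored blob to view_root/<repo_path> so callers never touch the store itself."""
    target = view_root / repo_path  # keeps the repo-relative suffix, e.g. "models/model.bin"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(objects_dir / digest, target)  # an independent copy, not a link
    return target
```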
Design constraints
hubvault is built around a few non-negotiable constraints:
Portable repository root: moving or archiving a repo directory must not break it
Atomic writes: interrupted writes are treated as if they never happened
Cross-process locking: writers exclude other readers and writers during publication
Public API first: examples and integrations should go through public models and commands
Cross-platform support: Linux, macOS, and Windows remain first-class targets
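The atomic-write constraint is commonly implemented with the write-to-temp-then-rename pattern. The sketch below shows that pattern for a single file using only the standard library. atomic_write is a hypothetical helper, and hubvault's real transaction model (multi-file commits, cross-process locks, rollback-only recovery) goes well beyond it.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Publish data at path so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Stage the bytes in a temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before publishing them
        # Replace the target in a single rename (atomic on POSIX filesystems).
        os.replace(tmp_name, path)
    except BaseException:
        # The target was never touched; drop the partial temp file and re-raise.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

If the write fails midway, the target still holds its previous content and the temporary file is removed. After a hard crash a stray .tmp- file may remain, but the target is never half-written.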
Compatibility
hubvault aligns with Git / Hugging Face where that alignment is user-visible:
commit/tree/blob IDs are Git-style 40-hex OIDs
public file sha256 values are bare 64-hex digests
download paths preserve the original repo-relative suffix
hubvault intentionally differs where local embedded semantics matter:
no remote service or pull request system
no mutable workspace abstraction
read-facing paths are detached views, not writable repository aliases
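For reference, Git computes a blob OID as the SHA-1 of a short header plus the content, which yields the 40-hex form; a bare sha256 digest is simply the 64-hex hexdigest with no prefix. This sketch shows both with hashlib. Whether hubvault derives its blob OIDs byte-for-byte the same way as Git is an assumption here, and git_blob_oid/file_sha256 are illustrative names, not hubvault functions.

```python
import hashlib


def git_blob_oid(data: bytes) -> str:
    """Git's blob object id: SHA-1 over b"blob <size>" plus a NUL byte, followed by the content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()  # 40 hex characters


def file_sha256(data: bytes) -> str:
    """A bare sha256 digest: 64 hex characters with no "sha256:" prefix."""
    return hashlib.sha256(data).hexdigest()
```

The two identifiers differ for the same bytes: for an empty file, git_blob_oid returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the same value git hash-object prints), while file_sha256 returns e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.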
How to read this documentation
If you are new to the project, the best order is:
read Installation
work through Quick Start
continue with Branch, Tag, and Merge Workflow for branches, tags, and merge behavior
read Service and ASGI Startup when you want the embedded HTTP server or ASGI deployment
continue with Remote Client Usage for the Python remote client
use Bundled Web UI for the bundled browser UI
use CLI Workflow if you prefer a command-line workflow
study Verification, GC, and History Squashing before operating large long-lived repositories
read Repository Structure and How It Works when you need to understand storage layout and safety design
Tutorials
Installation: Install the package, verify both the Python API and the hubvault/hv CLI, and confirm the environment is usable.
Quick Start: Create a repo, make commits, read files, and understand detached download/snapshot views.
Branch, Tag, and Merge Workflow: Work with branches, tags, merge results, commit history, and reflog inspection.
Service and ASGI Startup: Start the embedded HTTP server, use the import-friendly server module, and deploy the ASGI app.
Remote Client Usage: Use hubvault.remote.HubVaultRemoteApi against a running server.
Bundled Web UI: Operate the bundled browser UI, upload queue, and packaged static frontend flow.
CLI Workflow: Use the git-like CLI without assuming Git’s mutable workspace model.
Verification, GC, and History Squashing: Decide when to use quick/full verification, GC, and history squashing.
Repository Structure and How It Works: Understand the on-disk layout, object semantics, chunked storage, and atomic transaction model.
API Reference
API Documentation
Design Notes
The implementation roadmap lives in plan/init/ in the repository. Those
documents capture the design baseline, compatibility decisions, storage format,
atomicity model, and execution phases behind the current implementation.
Those design notes are useful if you need to understand why hubvault differs from HF or Git in certain places, especially around detached views, explicit write operations, cross-process locking, and rollback-only recovery.
Community and Support
GitHub Repository: https://github.com/HansBug/hubvault
Issue Tracker: https://github.com/HansBug/hubvault/issues
PyPI Package: https://pypi.org/project/hubvault/
License
hubvault is released under the GNU General Public License v3.0. See the LICENSE file for details.