Welcome to hubvault

Overview

hubvault is a local, embedded, API-first repository system for versioning large machine learning artifacts such as weights, datasets, and generated outputs. The public API intentionally feels close to huggingface_hub where that alignment improves usability, while the repository remains completely self-contained on disk.

The shortest accurate description is:

  • Git-like history and refs

  • Hugging Face style file APIs

  • a repository root that remains valid after moving, zipping, or restoring it

  • explicit write operations and detached read views

What hubvault provides

hubvault currently ships a working local repository surface with:

  • Git-like commits, trees, refs, tags, reflogs, and merges

  • Hugging Face style upload/download/list APIs on top of a local repo root

  • Detached download and snapshot views that cannot corrupt committed data

  • Chunked large-file storage together with public oid and sha256 metadata

  • Verification, storage analysis, garbage collection, and history squashing

  • A git-like CLI exposed as both hubvault and hv
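
As an illustration of the chunked-storage bullet above, here is a minimal sketch of how a large file can be split into chunks while still exposing a single whole-file sha256. It is not hubvault's implementation: the chunk size, the helper names `iter_chunks` and `describe_file`, and the per-chunk digests are assumptions made for this example.

```python
import hashlib
from typing import Iterator, List, Tuple

# Hypothetical chunk size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size slices of data (the last may be shorter)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def describe_file(data: bytes, chunk_size: int = CHUNK_SIZE) -> Tuple[str, List[str]]:
    """Return the whole-file sha256 and the sha256 of every chunk."""
    whole = hashlib.sha256()
    chunk_digests = []
    for chunk in iter_chunks(data, chunk_size):
        whole.update(chunk)  # feeding chunks in order equals hashing the full file
        chunk_digests.append(hashlib.sha256(chunk).hexdigest())
    return whole.hexdigest(), chunk_digests


payload = b"0123456789"
file_sha256, chunks = describe_file(payload)
print(file_sha256)   # bare 64-hex digest of the whole payload
print(len(chunks))   # number of chunks the payload was split into
```

The whole-file digest does not depend on where the chunk boundaries fall, which is why a public sha256 can stay stable even if the storage layout changes.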

Where it fits best

hubvault is designed for deep-learning artifact repositories that should remain useful without heavyweight infrastructure. It is a good fit when you need to persist large model weights, datasets, evaluation outputs, or experiment bundles, and a hosted Hub, a Docker or Kubernetes stack, or an external object storage service such as OSS or S3 would cost too much to operate, would not work offline, or would run into free-tier resource limits.

In that setting, hubvault provides a self-contained local repository with atomic mutations, stable committed data, rollback-oriented recovery, detached read views, verification, garbage collection, storage overview, and history squashing. The point is not to replace every remote collaboration service; it is to give one directory enough repository semantics to maintain large ML data locally and predictably.

What makes the project different

hubvault is intentionally opinionated about a few things:

  • The repo root is the artifact. There is no hidden sidecar database or external metadata service.

  • Read paths are detached views. A file returned by hf_hub_download() is safe to read, and editing it does not change committed data.

  • Writes are explicit. The system does not pretend there is a mutable working tree.

  • Maintenance is public. Verification, storage analysis, GC, and history squashing are first-class APIs.

  • Infrastructure stays small. You do not need Docker, Kubernetes, a daemon, an external object store, or a hosted service just to keep a durable artifact repository.
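
The detached-view idea from the list above can be sketched in a few lines. This is a generic illustration under stated assumptions, not hubvault's code: the helper `make_detached_view`, the `store` and `views` directory names, and the copy-based approach are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the sha256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_detached_view(committed: Path, view_dir: Path) -> Path:
    """Copy committed bytes to a separate path (hypothetical helper)."""
    view_dir.mkdir(parents=True, exist_ok=True)
    view = view_dir / committed.name
    shutil.copyfile(committed, view)  # a real copy, not a hard link or symlink
    return view


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    committed = root / "store" / "weights.bin"
    committed.parent.mkdir(parents=True)
    committed.write_bytes(b"committed bytes")
    before = sha256_of(committed)

    view = make_detached_view(committed, root / "views")
    view.write_bytes(b"local edit")  # the caller edits the view

    after = sha256_of(committed)     # the committed copy is unchanged
    view_changed = sha256_of(view) != before

print(before == after, view_changed)
```

Because the view shares no storage with the committed file, an accidental edit costs the caller only their own copy.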

Design constraints

hubvault is built around a few non-negotiable constraints:

  • Portable repository root: moving or archiving a repo directory must not break it

  • Atomic writes: interrupted writes are treated as if they never happened

  • Cross-process locking: writers exclude other readers and writers during publication

  • Public API first: examples and integrations should go through public models and commands

  • Cross-platform support: Linux, macOS, and Windows remain first-class targets
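
One common way to get the "interrupted writes never happened" behaviour is to write to a temporary file in the same directory and publish it with an atomic rename. The sketch below shows that general technique only; hubvault's actual publication protocol is not described here, and `atomic_write`, `failing_chunks`, and the file name `refs.json` are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def atomic_write(target: Path, chunks: Iterable[bytes]) -> None:
    """Publish chunks to target so readers see the old or the new file, never a partial one."""
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), prefix=target.name + ".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            for chunk in chunks:
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before publishing
        os.replace(tmp_name, target)   # rename within one directory replaces atomically
    except BaseException:
        # The write was interrupted: discard the temp file and leave target as it was.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def failing_chunks() -> Iterator[bytes]:
    """Simulate a writer that crashes partway through."""
    yield b"partial"
    raise RuntimeError("simulated crash")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "refs.json"
    atomic_write(target, [b"v1"])
    try:
        atomic_write(target, failing_chunks())
    except RuntimeError:
        pass
    survived = target.read_bytes()
    leftovers = sorted(p.name for p in Path(tmp).iterdir())

print(survived, leftovers)
```

The temporary file lives next to the target because a rename is only atomic within a single filesystem.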

Compatibility

hubvault aligns with Git and Hugging Face where that alignment is user-visible:

  • commit/tree/blob IDs are Git-style 40-hex OIDs

  • public file sha256 values are bare 64-hex digests

  • download paths preserve the original repo-relative suffix
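
To make the two identifier formats above concrete, the following sketch computes a 40-hex blob OID the way Git does (SHA-1 over a `blob <size>\0` header plus the content) and a bare 64-hex sha256. That hubvault derives its blob OIDs with exactly this Git formula is an assumption based on the "Git-style" wording; the function names are hypothetical.

```python
import hashlib


def git_blob_oid(data: bytes) -> str:
    """Hash content the way `git hash-object` does for blobs."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def file_sha256(data: bytes) -> str:
    """Return the bare sha256 hex digest, with no algorithm prefix."""
    return hashlib.sha256(data).hexdigest()


content = b"hello\n"
oid = git_blob_oid(content)
digest = file_sha256(content)

print(oid)          # 40 hex characters
print(len(digest))  # 64
```

The two values answer different questions: the OID identifies an object inside Git-style history, while the sha256 lets a consumer check the downloaded bytes without knowing anything about the object format.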

hubvault intentionally differs where local embedded semantics matter:

  • no remote service or pull request system

  • no mutable workspace abstraction

  • read-facing paths are detached views, not writable repository aliases

How to read this documentation

If you are new to the project, the best order is:

  1. read Installation

  2. work through Quick Start

  3. continue with Branch, Tag, and Merge Workflow for branches, tags, and merge behavior

  4. read Service and ASGI Startup when you want the embedded HTTP server or ASGI deployment

  5. continue with Remote Client Usage for the Python remote client

  6. use Bundled Web UI for the bundled browser UI

  7. use CLI Workflow if you prefer a command-line workflow

  8. study Verification, GC, and History Squashing before operating large long-lived repositories

  9. read Repository Structure and How It Works when you need to understand storage layout and safety design

Design Notes

The implementation roadmap lives in plan/init/ in the repository. Those documents capture the design baseline, compatibility decisions, storage format, atomicity model, and execution phases behind the current implementation.

Those design notes are useful if you need to understand why hubvault differs from Hugging Face or Git in certain places, especially around detached views, explicit write operations, cross-process locking, and rollback-only recovery.

Community and Support

License

hubvault is released under the GNU General Public License v3.0. See the LICENSE file for details.