mirror of
https://github.com/usatiuk/backup.git
synced 2025-10-26 17:37:47 +01:00
103 lines
3.4 KiB
Markdown
# Backup app

## Key features

- Deduplicated backups with additional compression and encryption
    - Files are split into chunks and stored in a repository as lists of pointers to these chunks
    - Chunks are reused across all files in the repository when their hashes match
    - When creating an archive, a file that hasn't changed is reused from a previous archive
- Content-defined chunking based on a rolling hash
    - Unlike splitting files into fixed-size chunks, this still deduplicates well when bytes are inserted or deleted at the beginning or in the middle of a file
- Files, chunks and archives are stored in an object storage called `Repository`
    - The default implementation, `FileRepository`, stores these objects in the file system
    - Possible other implementations: database, cloud object storage…
- Multithreading
- Directories containing a `.nobackup` file are ignored
- Files in a directory and its subdirectories are ignored based on that directory's `.ignore` file: each line contains one regex rule, and any directory entry matching a rule is ignored

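As an illustration of content-defined chunking, here is a minimal Python sketch. It is not the project's implementation: the rolling hash (a polynomial hash over a 48-byte window), the cut mask and the minimum and maximum chunk sizes are all assumptions chosen for the example. Apart from the size limits, a cut point depends only on the bytes inside the window, so inserting data near the start of a file changes only the first chunk or two, while fixed-size splitting shifts every block.

```python
import random

# Illustrative parameters; these are assumptions, not the project's values.
WINDOW = 48               # bytes covered by the rolling hash
BASE = 257                # polynomial base
MOD = (1 << 61) - 1       # Mersenne prime modulus
MASK = (1 << 12) - 1      # cut when the low 12 bits of the hash are zero
MIN_SIZE = 1024           # no cut before a chunk reaches this size...
MAX_SIZE = 16 * 1024      # ...and always cut once it reaches this one

# Weight of the oldest byte in a full window: BASE**(WINDOW - 1) mod MOD
OUT_WEIGHT = pow(BASE, WINDOW - 1, MOD)


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data where a rolling hash of the last WINDOW bytes matches MASK."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: remove the byte that has just left it
            h = (h - data[i - WINDOW] * OUT_WEIGHT) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0  # the next chunk starts with an empty window
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk
    return chunks


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Naive fixed-size splitting, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    original = random.Random(0).randbytes(128 * 1024)
    edited = b"hello" + original  # insert 5 bytes at the very beginning
    for name, split in (("content-defined", cdc_chunks), ("fixed-size", fixed_chunks)):
        before, after = split(original), split(edited)
        reused = set(before) & set(after)
        print(f"{name}: {len(reused)} of {len(before)} chunks reused")
```

On random data, most content-defined chunks survive the 5-byte insertion unchanged, while none of the fixed-size blocks do; the exact counts depend on the data and parameters.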
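The `.nobackup` and `.ignore` rules can be sketched as a recursive directory walk. This Python version is only an illustration under stated assumptions: each non-empty line of `.ignore` is compiled as a regex and tested with `re.search` against an entry's name (not its full path), and rules are inherited by all subdirectories. The project's actual matching may differ, and `make_demo_tree` is a hypothetical helper for the demo.

```python
import re
import tempfile
from pathlib import Path


def files_to_back_up(root: Path, rules: tuple = ()) -> list[Path]:
    """Collect files under root, honouring .nobackup markers and .ignore rules."""
    if (root / ".nobackup").exists():
        return []  # the whole directory (and everything below it) is skipped
    ignore_file = root / ".ignore"
    if ignore_file.is_file():
        # Assumption: one regex per non-empty line, inherited by subdirectories
        lines = ignore_file.read_text().splitlines()
        rules = rules + tuple(re.compile(line) for line in lines if line.strip())
    result = []
    for entry in sorted(root.iterdir()):
        # Assumption: a rule matches if re.search finds it in the entry's name
        if any(rule.search(entry.name) for rule in rules):
            continue
        if entry.is_dir():
            result.extend(files_to_back_up(entry, rules))
        elif entry.is_file():
            result.append(entry)
    return result


def make_demo_tree(root: Path) -> None:
    """Hypothetical example layout used by the demo below."""
    (root / "sub").mkdir()
    (root / "skip").mkdir()
    (root / ".ignore").write_text("\\.log$\n")  # ignore *.log here and below
    for name in ("a.txt", "b.log", "sub/c.log", "sub/d.txt", "skip/.nobackup", "skip/e.txt"):
        (root / name).write_text("x")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_demo_tree(root)
        for path in files_to_back_up(root):
            print(path.relative_to(root).as_posix())
```

The demo prints `.ignore`, `a.txt` and `sub/d.txt`: both `.log` files are dropped by the inherited rule and `skip/` is excluded by its `.nobackup` marker. In this sketch the `.ignore` file itself is backed up, which may not match the real tool.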
# Quick start

Initialize a repository with compression and encryption. You can also specify other options here, such as `--from`; where possible, they are saved in the repository. Changing the options of an existing repository is not implemented yet.

    sembackup init --repo <target dir> --compression zlib --compression-level 4 --encryption aes --password <password> --salt <random salt>

Run a backup

    sembackup run --from <source dir> --repo <target dir> --password <password>

List available archives

    sembackup list --repo <repo dir> --password <password>

List files in an archive

    sembackup list-files --repo <repo dir> --password <password> --aid <archive id>

Restore a backup

    sembackup restore --repo <repo dir> --password <password> --aid <archive id> --to <destination>

Compare a source dir with the latest archive

    sembackup diff --repo <repo dir> --password <password> --from <source dir>

Compare a source dir with a specific archive

    sembackup diff --repo <repo dir> --password <password> --from <source dir> --aid <archive id>

Compare two archives

    sembackup diff --repo <repo dir> --password <password> --from <source dir> --aid <archive id> --aid2 <archive id>

Compare a subdirectory of the source dir with the latest archive

    sembackup diff --repo <repo dir> --password <password> --from <source dir> --prefix <subdir>

# Data format

All data is represented as objects stored in a repository.

An `Archive` object represents a snapshot of the file system at the moment of its creation and consists of a list of pointers (object ids) to `File` objects.

A `File` object consists of its basic metadata and a list of chunks identified by their ids. Chunks can be shared between multiple files (and within the same file) if their MD5 hashes match.

A `Chunk` object is a binary blob identified by its MD5 hash.

These objects derive from `Object`, which provides a `getKey()` method returning the name for an `Archive`, the path for a `File`, and the MD5 hash for a `Chunk`. `Repository` uses this key to make objects easily accessible.

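The object model above can be illustrated with a small in-memory Python sketch. The class names follow the description and `get_key()` plays the role of `getKey()`, but `MemoryRepository`, `put`, `find`, `store_file`, the sequential ids, the `size` metadata field and the fixed 4-byte chunks are hypothetical simplifications (the real tool uses content-defined chunking and persistent storage). The point is that looking a chunk up by its MD5 key before storing it keeps identical data only once, whether it repeats across files or within one file.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Object:
    id: int = -1  # assigned by the repository when the object is stored

    def get_key(self) -> str:  # plays the role of getKey()
        raise NotImplementedError


@dataclass
class Chunk(Object):
    data: bytes = b""

    def get_key(self) -> str:
        return hashlib.md5(self.data).hexdigest()


@dataclass
class File(Object):
    path: str = ""
    size: int = 0  # stand-in for "basic metadata" (assumed field)
    chunk_ids: list[int] = field(default_factory=list)

    def get_key(self) -> str:
        return self.path


@dataclass
class Archive(Object):
    name: str = ""
    file_ids: list[int] = field(default_factory=list)

    def get_key(self) -> str:
        return self.name


class MemoryRepository:
    """Hypothetical in-memory stand-in for Repository."""

    def __init__(self) -> None:
        self.objects: dict[int, Object] = {}
        self.by_key: dict[tuple[str, str], int] = {}  # (type name, key) -> id

    def put(self, obj: Object) -> int:
        obj.id = len(self.objects)
        self.objects[obj.id] = obj
        self.by_key[(type(obj).__name__, obj.get_key())] = obj.id
        return obj.id

    def find(self, kind: type, key: str) -> Optional[int]:
        return self.by_key.get((kind.__name__, key))


def store_file(repo: MemoryRepository, path: str, data: bytes, chunk_size: int = 4) -> int:
    """Store a file, reusing every chunk whose MD5 hash is already in the repository."""
    chunk_ids = []
    for start in range(0, len(data), chunk_size):  # fixed-size split keeps the demo short
        piece = data[start:start + chunk_size]
        existing = repo.find(Chunk, hashlib.md5(piece).hexdigest())
        chunk_ids.append(existing if existing is not None else repo.put(Chunk(data=piece)))
    return repo.put(File(path=path, size=len(data), chunk_ids=chunk_ids))


if __name__ == "__main__":
    repo = MemoryRepository()
    file_ids = [store_file(repo, "a.txt", b"abcdabcdXYZW"), store_file(repo, "b.txt", b"XYZWabcd")]
    repo.put(Archive(name="first", file_ids=file_ids))
    chunks = [o for o in repo.objects.values() if isinstance(o, Chunk)]
    print(len(chunks), "unique chunks")  # 2 unique chunks
```

Here the two files share their chunks, so the repository holds only two `Chunk` objects, and `a.txt` refers to the same chunk id twice.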
In the default (and so far only) repository implementation, `FileRepository`, these objects are grouped together into files of approximately `repo-target` MB (128 by default). A key-value index of the stored objects is written to `index`, and an `offsets` file records the location and offset of each object in the file system.
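The packing described above can be sketched as follows. This Python illustration is hypothetical: `PackWriter`, the numbered pack-file names, interpreting 128 MB as 128 * 1024 * 1024 bytes, and keeping `index` and `offsets` as in-memory dictionaries (rather than the real `index` and `offsets` files, whose format is not described here) are all assumptions. It only shows the idea: objects are appended to the current file until it would exceed the target size, and each object's file number, offset and length are recorded so it can be read back with a single seek.

```python
import tempfile
from pathlib import Path


class PackWriter:
    """Hypothetical writer appending objects to numbered files of about target_bytes each."""

    def __init__(self, root: Path, target_bytes: int = 128 * 1024 * 1024) -> None:
        self.root = root
        self.target = target_bytes
        self.current = 0                                      # pack file being filled
        self.index: dict[str, int] = {}                       # key -> object id
        self.offsets: dict[int, tuple[int, int, int]] = {}    # id -> (pack, offset, length)

    def _pack_path(self, number: int) -> Path:
        return self.root / str(number)

    def write(self, key: str, payload: bytes) -> int:
        path = self._pack_path(self.current)
        size = path.stat().st_size if path.exists() else 0
        if size > 0 and size + len(payload) > self.target:
            # The current pack would overflow: start a new one
            self.current += 1
            path = self._pack_path(self.current)
            size = 0
        with path.open("ab") as f:
            f.write(payload)
        object_id = len(self.offsets)
        self.offsets[object_id] = (self.current, size, len(payload))
        self.index[key] = object_id
        return object_id

    def read(self, object_id: int) -> bytes:
        pack, offset, length = self.offsets[object_id]
        with self._pack_path(pack).open("rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        writer = PackWriter(Path(tmp), target_bytes=10)  # tiny target forces several packs
        for i in range(5):
            writer.write(f"object-{i}", bytes([65 + i]) * 4)
        print(writer.offsets)
        print(writer.read(writer.index["object-3"]))  # b'DDDD'
```

With a 10-byte target, each pack holds at most two 4-byte objects, so five objects end up spread across three pack files.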