librsync  2.0.2
format.md
# File formats {#page_formats}

## Generalities

`librsync` and `rdiff` use two file formats: the *signature* file,
which summarizes a data file, and the *delta* file, which describes
the edits that turn one data file into another.

librsync treats data files as opaque bytes; it does not know or care
about any format inside them.

All integers are big-endian.
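
As an illustration of the byte order, here is a minimal sketch of decoding and encoding a 32-bit big-endian integer. The helper names `read_be32` and `write_be32` are hypothetical and are not part of the librsync API.

```c
#include <stdint.h>

/* Hypothetical helper (not a librsync function): decode a big-endian
 * 32-bit unsigned integer from four consecutive bytes. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: encode v as four bytes, most significant first. */
static void write_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

For example, the bytes `72 73 02 36` (hex) decode to `0x72730236`. Shifting individual bytes, rather than casting the buffer to an integer pointer, gives the same result regardless of the host's own byte order.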

## Magic numbers

All librsync files start with a `uint32` magic number identifying them. These are declared in `librsync.h`:

```
/** A delta file. At present, there's only one delta format. **/
RS_DELTA_MAGIC = 0x72730236, /* r s \2 6 */

/**
 * A signature file with MD4 signatures. Backward compatible with
 * librsync < 1.0, but strongly deprecated because it creates a security
 * vulnerability on files containing partly untrusted data. See
 * <https://github.com/librsync/librsync/issues/5>.
 **/
RS_MD4_SIG_MAGIC = 0x72730136, /* r s \1 6 */

/**
 * A signature file using the BLAKE2 hash. Supported from librsync 1.0.
 **/
RS_BLAKE2_SIG_MAGIC = 0x72730137 /* r s \1 7 */
```
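
A reader might use the magic number to decide how to interpret the rest of a file. The sketch below is one way to do that, assuming the magic has already been decoded from its four big-endian bytes. The `file_kind` enum and `classify_magic` function are hypothetical names introduced for illustration; only the numeric values come from `librsync.h`.

```c
#include <stdint.h>

/* Hypothetical classification of a librsync file by its magic number. */
enum file_kind {
    KIND_UNKNOWN,
    KIND_DELTA,
    KIND_SIG_MD4,
    KIND_SIG_BLAKE2
};

/* Map an already-decoded magic number to a file kind.
 * The constants mirror the values declared in librsync.h. */
static enum file_kind classify_magic(uint32_t magic)
{
    switch (magic) {
    case 0x72730236u: return KIND_DELTA;      /* RS_DELTA_MAGIC */
    case 0x72730136u: return KIND_SIG_MD4;    /* RS_MD4_SIG_MAGIC */
    case 0x72730137u: return KIND_SIG_BLAKE2; /* RS_BLAKE2_SIG_MAGIC */
    default:          return KIND_UNKNOWN;
    }
}
```

Code that links against librsync would normally compare against the `RS_*_MAGIC` constants directly rather than repeating the literals.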

## Signatures

Signatures consist of a header followed by a number of block
signatures.

Each block signature gives signature hashes for one block of
`block_len` bytes from the input data file. The final data block
may be shorter. The number of blocks in the signature is therefore

    ceil(input_len/block_len)
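
The ceiling can be computed with integer arithmetic alone. The following is a minimal sketch; `sig_block_count` is a hypothetical name, and returning 0 for a zero `block_len` is an assumption made here only to avoid dividing by zero.

```c
#include <stdint.h>

/* Hypothetical helper: number of block signatures for an input of
 * input_len bytes, i.e. ceil(input_len / block_len). */
static uint64_t sig_block_count(uint64_t input_len, uint32_t block_len)
{
    if (block_len == 0)
        return 0; /* assumption: treat a zero block length as "no blocks" */
    /* Full blocks, plus one short final block if there is a remainder. */
    return input_len / block_len + (input_len % block_len != 0);
}
```

For instance, a 10000-byte input with a `block_len` of 2048 yields 5 blocks: four full blocks and a final block of 1808 bytes. An empty input yields 0 blocks.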

The signature header is (see `rs_sig_s_header`):

    u32 magic;          // either RS_MD4_SIG_MAGIC or RS_BLAKE2_SIG_MAGIC
    u32 block_len;      // bytes per block
    u32 strong_sum_len; // bytes per strong sum in each block
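
Since the header is three big-endian `u32` fields, it occupies 12 bytes. The sketch below shows one way to parse it from a byte buffer. The `sig_header` struct and `parse_sig_header` function are hypothetical and are not librsync's own parser, and rejecting unknown magic numbers is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SIG_HEADER_SIZE 12 /* three u32 fields */

/* Hypothetical in-memory form of the signature header. */
struct sig_header {
    uint32_t magic;
    uint32_t block_len;
    uint32_t strong_sum_len;
};

/* Parse a signature header from buf. Returns 0 on success, or -1 if the
 * buffer is too short or the magic is not a known signature magic. */
static int parse_sig_header(const unsigned char *buf, size_t len,
                            struct sig_header *out)
{
    uint32_t field[3];

    if (len < SIG_HEADER_SIZE)
        return -1;

    /* Decode each field as a big-endian 32-bit integer. */
    for (int i = 0; i < 3; i++) {
        const unsigned char *p = buf + 4 * i;
        field[i] = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    }

    /* RS_MD4_SIG_MAGIC or RS_BLAKE2_SIG_MAGIC */
    if (field[0] != 0x72730136u && field[0] != 0x72730137u)
        return -1;

    out->magic = field[0];
    out->block_len = field[1];
    out->strong_sum_len = field[2];
    return 0;
}
```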

Each block signature contains a rolling (weak) checksum, used to find
moved data, and a strong hash, used to confirm that a candidate match
is correct. The weak checksum is computed as in `rollsum.c`. The strong
hash is either MD4 or BLAKE2, depending on the magic number.

To make signatures smaller, at the cost of a greater chance of collisions,
`strong_sum_len` in the header can be set below the full hash length. The
strong sum is then truncated after computation, keeping only its leftmost
`strong_sum_len` bytes.
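
The truncation step can be sketched as a plain prefix copy. This assumes that truncation keeps the leading bytes of the full digest and discards the rest. `truncate_strong_sum` is a hypothetical helper, not a librsync function, and the digest itself is assumed to have been computed elsewhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first strong_sum_len bytes of a full
 * digest into out. Assumes truncation keeps the leading bytes.
 * Returns the number of bytes written, or 0 if strong_sum_len exceeds
 * the length of the full digest. */
static size_t truncate_strong_sum(const unsigned char *full, size_t full_len,
                                  size_t strong_sum_len, unsigned char *out)
{
    if (strong_sum_len > full_len)
        return 0;
    memcpy(out, full, strong_sum_len);
    return strong_sum_len;
}
```

Shorter strong sums reduce the signature size by the same number of bytes per block, which is where the trade-off against collision resistance comes from.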

Each block signature has this format (see `rs_sig_do_block`):

    u32 weak_sum;
    u8[strong_sum_len] strong_sum;
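
Each block signature therefore occupies `4 + strong_sum_len` bytes. Assuming the block signatures are stored back to back immediately after the 12-byte header, with no padding or trailer (an assumption of this sketch, not something stated above), offsets and total size follow from simple arithmetic. The function names are hypothetical.

```c
#include <stdint.h>

#define SIG_HEADER_SIZE 12u /* magic + block_len + strong_sum_len */
#define WEAK_SUM_SIZE 4u    /* u32 weak_sum */

/* Hypothetical helper: byte offset of block signature number `index`
 * (counting from 0), assuming entries are packed after the header. */
static uint64_t sig_block_offset(uint64_t index, uint32_t strong_sum_len)
{
    return SIG_HEADER_SIZE + index * (WEAK_SUM_SIZE + (uint64_t)strong_sum_len);
}

/* Hypothetical helper: total signature file size for block_count
 * block signatures, under the same packing assumption. */
static uint64_t sig_file_size(uint64_t block_count, uint32_t strong_sum_len)
{
    return sig_block_offset(block_count, strong_sum_len);
}
```

With a `strong_sum_len` of 32, the third block signature (index 2) would start at byte 84, and a signature with 5 blocks and a `strong_sum_len` of 8 would be 72 bytes long.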

## Delta files

TODO(https://github.com/librsync/librsync/issues/46): Document delta format.