mirror of
https://github.com/trapexit/mergerfs.git
synced 2025-06-24 22:51:26 +08:00
76 lines
3.9 KiB
Markdown
76 lines
3.9 KiB
Markdown
# inodecalc
|
|
|
|
Inodes (`st_ino`) are unique identifiers within a filesystem. Each
|
|
mounted filesystem has device ID (st_dev) as well and together they
|
|
can uniquely identify a file on the whole of the system. Entries on
|
|
the same device with the same inode are in fact references to the same
|
|
underlying file. It is a many to one relationship between names and an
|
|
inode. Directories, however, do not have multiple links on most
|
|
systems due to the complexity they add.
|
|
|
|
FUSE allows the server (mergerfs) to set inode values but not device
|
|
IDs. Creating an inode value is somewhat complex in mergerfs' case as
|
|
files aren't really in its control. If a policy changes what directory
|
|
or file is to be selected or something changes out of band it becomes
|
|
unclear what value should be used. Most software does not to care what
|
|
the values are but those that do often break if a value changes
|
|
unexpectedly. The tool find will abort a directory walk if it sees a
|
|
directory inode change. NFS can return stale handle errors if the
|
|
inode changes out of band. File dedup tools will usually leverage
|
|
device ids and inodes as a shortcut in searching for duplicate files
|
|
and would resort to full file comparisons should it find different
|
|
inode values.
|
|
|
|
mergerfs offers multiple ways to calculate the inode in hopes of
|
|
covering different usecases.
|
|
|
|
* `passthrough`: Passes through the underlying inode value. Mostly
|
|
intended for testing as using this does not address any of the
|
|
problems mentioned above and could confuse file deduplication
|
|
software as inodes from different filesystems can be the same.
|
|
* `path-hash`: Hashes the relative path of the entry in question. The
|
|
underlying file's values are completely ignored. This means the
|
|
inode value will always be the same for that file path. This is
|
|
useful when using NFS and you make changes out of band such as copy
|
|
data between branches. This also means that entries that do point to
|
|
the same file will not be recognizable via inodes. That does not
|
|
mean hard links don't work. They will.
|
|
* `path-hash32`: 32bit version of path-hash.
|
|
* `devino-hash`: Hashes the device id and inode of the underlying
|
|
entry. This won't prevent issues with NFS should the policy pick a
|
|
different file or files move out of band but will present the same
|
|
inode for underlying files that do too.
|
|
* `devino-hash32`: 32bit version of devino-hash.
|
|
* `hybrid-hash`: Performs path-hash on directories and devino-hash on
|
|
other file types. Since directories can't have hard links the static
|
|
value won't make a difference and the files will get values useful
|
|
for finding duplicates. Probably the best to use if not using
|
|
NFS. As such it is the default.
|
|
* `hybrid-hash32`: 32bit version of hybrid-hash.
|
|
|
|
32bit versions are provided as there is some software which does not
|
|
handle 64bit inodes well.
|
|
|
|
While there is a risk of hash collision in tests of a couple of
|
|
million entries there were zero collisions. Unlike a typical
|
|
filesystem FUSE filesystems can reuse inodes and not refer to the same
|
|
entry. The internal identifier used to reference a file in FUSE is
|
|
different from the inode value presented. The former is the nodeid and
|
|
is actually a tuple of 2 64bit values: nodeid and generation. This
|
|
tuple is not client facing. The inode that is presented to the client
|
|
is passed through the kernel uninterpreted.
|
|
|
|
From FUSE docs for `use_ino`:
|
|
|
|
> Honor the st_ino field in the functions getattr() and
|
|
> fill_dir(). This value is used to fill in the st_ino field
|
|
> in the stat(2), lstat(2), fstat(2) functions and the d_ino
|
|
> field in the readdir(2) function. The filesystem does not
|
|
> have to guarantee uniqueness, however some applications
|
|
> rely on this value being unique for the whole filesystem.
|
|
> Note that this does *not* affect the inode that libfuse
|
|
> and the kernel use internally (also called the "nodeid").
|
|
|
|
**NOTE:** As of version 2.35.0 the use_ino option has been
|
|
removed. mergerfs should always be managing inode values.
|