Why I wrote it? I have got ~18M files ZFS data storage, where even
"find /storage" takes considerable amount of time, up to an hour.
So I have to use separate indexed database and search against it.
-locate family of utilities does exactly that. But none of them are
+The "locate" family of utilities does exactly that. But none of them are
able to detect a few seldom made changes to the dataset, without
-traversing through the whole dataset anyway, taking much IO.
+traversing through the whole dataset anyway, consuming much IO.
Fortunately ZFS design with Merkle trees is able to show us the
difference quickly and without notable IO. "zfs diff" command's
to update its database with zfs-diff's output.
Why this utility is so relatively complicated? Initially it kept all
-database in memory, but that took 2-3 GiBs of memory, that is huge
+database in memory, but that took 2-3 GiB of memory, which is a huge
amount. Moreover it fully loads it to perform any basic searches. So
current implementation uses temporary files and heavy use of data
streaming. Database in my case takes less than 128MiB of data. And
Argument to -update is the prefix stripped from each filename of
diff's output.
+
+Sometimes the "+ and M lists are out of sync" error may be raised.
+Unfortunately, you have to fix it manually. It may arise when you create
+a completely new dataset and "zfs diff" shows an "M"odification of its
+root directory without a corresponding "+" (creation) entry. You can
+manually add the "+" entry to the list you feed to stdin.
func dbCommit(dbPath string, tmp *os.File) {
umask := syscall.Umask(0)
syscall.Umask(umask)
- if err := os.Chmod(tmp.Name(), os.FileMode(0666&^umask)); err != nil {
+ if err := os.Chmod(tmp.Name(), os.FileMode(0o666&^umask)); err != nil {
log.Fatalln(err)
}
if err := os.Rename(tmp.Name(), dbPath); err != nil {