glocate -- ZFS-diff-friendly locate-like utility
+This utility keeps a database of the filesystem hierarchy and quickly
+displays any part of it, like ordinary *locate utilities. Unlike them,
+it can consume zfs-diff's output and apply the changes to an existing
+database.
+
+Why did I write it? Indexing, even a plain "find /big", can take a
+considerable amount of time, an hour or so, and costs many I/O
+operations. But my home NAS sees relatively few changes per day. The
+only quick way to determine exactly what was modified is to traverse
+ZFS's Merkle trees and find the difference between snapshots.
+Fortunately, the zfs-diff command does exactly that and provides
+fairly machine-friendly output.
+
+Why is this utility so complicated? Initially it kept the whole
+database in memory, but that takes 2-3 GiB, which is a huge amount.
+Moreover, it had to load the database fully to perform even a basic
+search. So the current implementation uses temporary files and relies
+heavily on data streaming.
+
+Its storage format is trivial. Each record consists of:
+
+* 16-bit BE size of the name that follows
+* the name of the entity (file, directory, symbolic link, etc.) itself.
+ A directory has a trailing "/"
+* a single byte indicating the current file's depth
+* 64-bit BE mtime in seconds
+* 64-bit BE size: the file's size, or for a directory the sum of all
+ its files and directories
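
The record layout above can be sketched in Python as follows. This is
only an illustration, not glocate's own code: the function names are
hypothetical, and treating mtime and size as unsigned integers is an
assumption that the description does not state.

```python
import struct

# Fixed-size tail of a record: depth (1 byte), mtime and size (8 bytes each),
# all big-endian. Unsigned mtime/size is an assumption of this sketch.
TAIL = struct.Struct(">BQQ")


def write_record(out, name: bytes, depth: int, mtime: int, size: int) -> None:
    """Serialize one record in the field order listed above."""
    out.write(struct.pack(">H", len(name)))  # 16-bit BE length of the name
    out.write(name)                          # directories end with b"/"
    out.write(TAIL.pack(depth, mtime, size))


def read_record(inp):
    """Return (name, depth, mtime, size), or None at end of stream."""
    head = inp.read(2)
    if not head:
        return None
    (name_len,) = struct.unpack(">H", head)
    name = inp.read(name_len)
    depth, mtime, size = TAIL.unpack(inp.read(TAIL.size))
    return name, depth, mtime, size
```

Every record is self-delimiting, so the database can be read strictly
front to back without any index.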
+
+Its indexing algorithm is as follows:
+
+* traverse the whole filesystem hierarchy in *sorted* order. All
+ records are written to a temporary file without directory sizes,
+ because those are not known in advance during the walk
+* during the walk, remember each directory's total size in memory
+* read all records back from that temporary file and write them to
+ another one, replacing directory sizes with the remembered ones
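
A minimal Python sketch of this two-pass idea, under several
assumptions: names are stored as basenames plus depth, a directory's
size counts only the entries below it, and an in-memory list stands in
for the temporary file. The function names are hypothetical.

```python
import os


def walk_sorted(top, dir_sizes, depth=0):
    """First pass: yield (name, depth, mtime, size) in sorted order.

    Directory records are yielded with size 0. Each directory reserves a
    slot in dir_sizes, which is filled in once its subtree has been walked.
    Returns the total size of everything below `top`.
    """
    total = 0
    with os.scandir(top) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        st = entry.stat(follow_symlinks=False)
        if entry.is_dir(follow_symlinks=False):
            slot = len(dir_sizes)
            dir_sizes.append(0)
            yield (entry.name + "/", depth, int(st.st_mtime), 0)
            sub = yield from walk_sorted(entry.path, dir_sizes, depth + 1)
            dir_sizes[slot] = sub
            total += sub
        else:
            yield (entry.name, depth, int(st.st_mtime), st.st_size)
            total += st.st_size
    return total


def index(top):
    """Second pass: replay the records, patching in the directory sizes."""
    dir_sizes = []
    first_pass = list(walk_sorted(top, dir_sizes))  # stands in for the temp file
    sizes = iter(dir_sizes)  # slots are in the same order as directory records
    for name, depth, mtime, size in first_pass:
        if name.endswith("/"):
            size = next(sizes)
        yield name, depth, mtime, size
```

Only one integer per directory is kept in memory here; the records
themselves would go to disk in a real implementation.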
+
+Searching is trivial:
+
+* there is no actual searching, just sequential streaming through the
+ whole database file
+* if some root is specified, then the program outputs only the
+ hierarchy under that path and exits once it is finished
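
The streaming approach can be sketched in Python like this. It assumes
that records carry basenames and a depth, so full paths are rebuilt
from a stack of directory names. The function name and the tuple shape
are hypothetical.

```python
def search(records, root=""):
    """Yield full paths from a stream of (name, depth, mtime, size) records.

    With a non-empty root (e.g. "some/sub/path") only that subtree is
    yielded, and the stream is abandoned as soon as the subtree ends.
    """
    prefix = root.rstrip("/")
    stack = []      # directory names on the path to the current record
    found = False
    for name, depth, _mtime, _size in records:
        del stack[depth:]               # drop components deeper than this record
        path = "".join(stack) + name
        if name.endswith("/"):
            stack.append(name)
        if prefix and not (path == prefix or path.startswith(prefix + "/")):
            if found:
                return                  # we have left the requested subtree
            continue
        found = True
        yield path
```

The early exit relies on the sorted, depth-first record order: all
entries of one subtree are adjacent in the stream.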
+
+The updating algorithm is as follows:
+
+* read all [-+MR] actions from zfs-diff, validating the whole format
+* each file's "R" becomes a "-" and a "+" action
+* if there are directory "R"s, then collect them and stream through the
+ current database to determine every path entity that has to get a
+ "-" and a "+"
+* each "+" also adds an entry to the list of "M"s
+* sort all "-", "+" and "M" filenames in ascending order
+* get the entity's information for each "M" (remembering its size and
+ mtime)
+* stream the current database records, writing them to a temporary file
+* if a record exists in the "-"-list, then skip it
+* if any "+" in the *sorted* list has precedence over (sorts before)
+ the record read from the database, then insert it into the stream,
+ taking size and mtime information from the "M"-list
+* if any "M" exists for the record read, then use it to alter the record
+* all that time the directory size calculation also runs, the same one
+ used during indexing
+* create another temporary file to copy the records to, with updated
+ directory sizes
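
A simplified Python sketch of the core of this update, under several
assumptions. Records are (path, mtime, size) tuples with full relative
paths sorted as plain strings, which is not the basename-plus-depth
layout of the real database. Directory renames are only collected, and
directory sizes are not recomputed. The input is assumed to be the
tab-separated "zfs diff -FH" form: action, file type, path, and a
second path for "R". All function names are hypothetical.

```python
def parse_diff(lines, strip=""):
    """Split zfs-diff lines into sorted "-", "+", "M" lists and directory "R"s."""
    dels, adds, mods, dir_renames = set(), set(), set(), []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        action = fields[0]
        want = 4 if action == "R" else 3
        if action not in ("-", "+", "M", "R") or len(fields) != want:
            raise ValueError(f"unexpected zfs-diff line: {line!r}")
        ftype = fields[1]
        paths = [p.removeprefix(strip) for p in fields[2:]]
        if action == "-":
            dels.add(paths[0])
        elif action == "M":
            mods.add(paths[0])
        elif action == "+":
            adds.add(paths[0])
            mods.add(paths[0])          # every "+" also needs size and mtime
        elif ftype == "/":
            dir_renames.append((paths[0], paths[1]))  # expanded against the db later
        else:                           # a file "R" becomes "-" old, "+" new
            dels.add(paths[0])
            adds.add(paths[1])
            mods.add(paths[1])
    return sorted(dels), sorted(adds), sorted(mods), dir_renames


def apply_changes(db, dels, adds, info):
    """Merge a sorted record stream with the sorted change lists.

    info maps every "M" path (which includes every "+") to (mtime, size).
    """
    dels = set(dels)
    pending = iter(adds)
    nxt = next(pending, None)
    for path, mtime, size in db:
        while nxt is not None and nxt < path:
            yield (nxt, *info[nxt])     # "+" that sorts before this record
            nxt = next(pending, None)
        if path in dels:
            continue
        if nxt == path:                 # "+" for an existing path: treat as "M"
            nxt = next(pending, None)
        if path in info:
            mtime, size = info[path]
        yield (path, mtime, size)
    while nxt is not None:              # "+" entries past the end of the db
        yield (nxt, *info[nxt])
        nxt = next(pending, None)
```

Because both inputs are sorted the same way, the merge needs a single
sequential pass over the database and never holds it in memory.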
+
+How to use it?
+
+ $ zfs snap big@snap1
+ $ cd /big ; glocate -db /tmp/glocate.db -index
+
+ $ glocate -db /tmp/glocate.db
+ [list of all files]
+
+ $ glocate -db /tmp/glocate.db -machine
+ [machine parseable list of files with sizes and mtimes]
+
+ $ glocate -db /tmp/glocate.db -tree
+ [pretty tree-like list of files with sizes and mtimes]
+
+ $ glocate -db /tmp/glocate.db some/sub/path
+ [just a part of the whole hierarchy]
+
+and update it carefully:
+
+ $ zfs snap big@snap2
+ $ zfs diff -FH big@snap1 big@snap2 | glocate -db /tmp/glocate.db -strip /big/ -update
+
glocate is copylefted free software: see the file COPYING for copying
conditions.