glocate -- ZFS-diff-friendly locate-like utility
+This utility keeps a database of the filesystem hierarchy and quickly
+displays any part of it, like ordinary *locate utilities. Unlike them,
+it can consume zfs-diff's output and apply the changes to an existing
+database.
+
+Why did I write it? Indexing, even a plain "find /big", can take a
+considerable amount of time, an hour or so, and costs many I/O
+operations. But my home NAS sees relatively few changes per day. The
+only quick way to determine exactly what was modified is to traverse
+ZFS's Merkle trees and find the difference between snapshots.
+Fortunately, the zfs-diff command does exactly that and produces fairly
+machine-friendly output.
+
+Why is this utility so complicated? Initially it kept the whole database
+in memory, but that takes 2-3 GiB, which is a huge amount. Moreover, it
+had to load the database fully to perform even a basic search. So the
+current implementation uses temporary files and relies heavily on data
+streaming.
+
+Its storage format is simple: a Zstandard-compressed list of records,
+each consisting of:
+
+* 16-bit BE size of the following name
+* entity (file, directory, symbolic link, etc.) name itself.
+ A directory name has a trailing "/"
+* single byte indicating the current file's depth
+* 64-bit BE mtime in seconds
+* 64-bit BE size of the file, or of the directory (sum of all files and
+ directories inside it)
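+
+A minimal sketch of that record layout in Python, only to illustrate the
+field order. It is not glocate's code: the function names are
+hypothetical, the integers are assumed to be unsigned, and the Zstandard
+layer is left out.
+

```python
import io
import struct


def pack_record(name: bytes, depth: int, mtime: int, size: int) -> bytes:
    """Serialize one record: name length, name, depth, mtime, size."""
    # ">H"   = 16-bit big-endian length of the name
    # ">BQQ" = 1-byte depth, 64-bit BE mtime, 64-bit BE size (no padding)
    # Assumption: all integers are unsigned; the text does not say.
    return struct.pack(">H", len(name)) + name + struct.pack(">BQQ", depth, mtime, size)


def read_record(stream):
    """Read one record from a binary stream, or return None at end of data."""
    head = stream.read(2)
    if not head:
        return None
    (name_len,) = struct.unpack(">H", head)
    name = stream.read(name_len)
    depth, mtime, size = struct.unpack(">BQQ", stream.read(17))
    return name, depth, mtime, size


if __name__ == "__main__":
    data = pack_record(b"music/", 0, 1700000000, 4096)
    print(read_record(io.BytesIO(data)))
```

+
+Because every record carries its own name length, a reader can stream
+records one by one without holding the whole database in memory.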
+
+Its indexing algorithm is as follows:
+
+* traverse the whole filesystem hierarchy in *sorted* order. All
+ records are written to a temporary file, without directory sizes,
+ because those are not known in advance during the walk
+* during the walk, remember each directory's total size in memory
+* read all records back from that temporary file, writing them to
+ another one, but replacing directory sizes with the remembered ones
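+
+The two passes above can be sketched as follows. This is an
+illustration under assumptions, not the actual implementation: records
+are simplified to hypothetical (path, size) pairs held in a list instead
+of temporary files, and a directory's own size is assumed to be zero, so
+its total is the sum of the files below it.
+

```python
def fill_directory_sizes(records):
    """records: list of (path, size) in walk order; directories end with "/"."""
    totals = {}
    # Pass 1: the walk. Directory sizes are unknown here, so only
    # accumulate each file's size into all of its ancestor directories.
    for path, size in records:
        if path.endswith("/"):
            totals.setdefault(path, 0)
            continue
        parts = path.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            ancestor = "/".join(parts[:i]) + "/"
            totals[ancestor] = totals.get(ancestor, 0) + size
    # Pass 2: re-read the records, replacing directory sizes with the
    # remembered totals; file records pass through unchanged.
    return [(p, totals[p] if p.endswith("/") else s) for p, s in records]


if __name__ == "__main__":
    walk = [("a/", 0), ("a/b/", 0), ("a/b/x", 10), ("a/y", 5), ("z", 1)]
    for record in fill_directory_sizes(walk):
        print(record)
```

+
+Only the per-directory totals stay in memory; the records themselves
+can be streamed, which is the point of the two-file approach.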
+
+Searching is trivial:
+
+* searching is performed on each record streamed from the database
+* if -root is specified, the search stops as soon as that part of the
+ hierarchy is over
+* by default all elements are printed, unless you provide a single
+ argument X, which becomes a "*X*" pattern matched against lowercased
+ path elements
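+
+A rough sketch of that streaming search, under assumptions: records are
+reduced to plain path strings, the function name is hypothetical, the
+argument itself is also lowercased, and "*X*" is treated as a substring
+test on each path element.
+

```python
def search(paths, root=None, pattern=None):
    """Yield matching paths from an iterable given in database order."""
    needle = pattern.lower() if pattern else None
    prefix = root.rstrip("/") + "/" if root else None
    inside = False
    for path in paths:
        if prefix is not None:
            if path.startswith(prefix):
                inside = True
            elif inside:
                # The requested hierarchy part is over: stop streaming.
                break
            else:
                # Not yet reached the requested root.
                continue
        # "*X*" on lowercased path elements, treated as a substring test.
        if needle is None or any(needle in part.lower() for part in path.split("/")):
            yield path


if __name__ == "__main__":
    db = ["docs/", "docs/a.txt", "music/", "music/Blasphemy-2001/", "video/"]
    for hit in search(db, root="music", pattern="blasphemy"):
        print(hit)
```

+
+The early stop relies on the database being sorted, so that everything
+below a root forms one contiguous run of records.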
+
+The updating algorithm is as follows:
+
+* read all [-+MR] actions from zfs-diff, validating the whole format
+* each file "R" (rename) becomes a "-" and a "+" action
+* if there are directory "R"s, collect them and stream the current
+ database to determine every path entity that needs a "-" and a "+"
+* each "+" also adds an entry to the list of "M"s
+* sort all "-", "+" and "M" filenames in ascending order
+* get the entity's information for each "M" (remembering its size and
+ mtime)
+* stream the current database records, writing them to a temporary file
+* if a record exists in the "-"-list, skip it
+* if any "+" in the *sorted* list precedes the record from the
+ database, insert it into the stream, taking size and mtime
+ information from the "M"-list
+* if an "M" exists for the record being read, use it to alter the record
+* all the while, the directory size calculation also runs, the same one
+ used during indexing
+* create another temporary file and copy the records into it with
+ updated directory sizes
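+
+A simplified sketch of the action parsing and the sorted merge, not the
+real implementation. Assumptions: input lines are tab-separated
+"action, type, path[, new path]" as printed by "zfs diff -FH", with "/"
+as the directory type; paths are compared element by element; escaped
+characters in names, directory renames, sizes and mtimes are not
+handled. All function names are hypothetical.
+

```python
def sort_key(path):
    # Assumption: ordering is by path elements, as in a sorted tree walk.
    return path.rstrip("/").split("/")


def parse_actions(lines, prefix):
    """Turn zfs-diff lines into sets of names to delete, add and re-stat."""
    deletes, adds, mods = set(), set(), set()

    def name_of(ftype, path):
        if not path.startswith(prefix):
            raise ValueError("path outside prefix: " + path)
        name = path[len(prefix):]
        return name + "/" if ftype == "/" and name else name

    for line in lines:
        fields = line.rstrip("\n").split("\t")
        action = fields[0]
        if action not in ("-", "+", "M", "R") or len(fields) != (4 if action == "R" else 3):
            raise ValueError("bad zfs-diff line: " + repr(line))
        ftype = fields[1]
        if action == "R" and ftype == "/":
            raise NotImplementedError("directory renames need a database pass")
        name = name_of(ftype, fields[2])
        if not name:
            continue  # the dataset root itself
        if action in ("-", "R"):
            deletes.add(name)
        if action == "R":
            name = name_of(ftype, fields[3])  # a rename is "-" old, "+" new
        if action in ("+", "R"):
            adds.add(name)
        if action in ("+", "R", "M"):
            mods.add(name)  # every "+" also needs size and mtime
    return deletes, adds, mods


def apply_actions(db_paths, deletes, adds):
    """Stream sorted database paths, skipping "-" and merging in "+"."""
    pending = sorted(adds, key=sort_key)
    i = 0
    for path in db_paths:
        while i < len(pending) and sort_key(pending[i]) < sort_key(path):
            yield pending[i]
            i += 1
        if i < len(pending) and pending[i] == path:
            i += 1  # already present; the record is kept below
        elif path in deletes:
            continue
        yield path
    yield from pending[i:]


if __name__ == "__main__":
    diff = ["+\tF\t/big/music/new.flac\n", "-\tF\t/big/old.txt\n"]
    dels, news, _ = parse_actions(diff, "/big/")
    print(list(apply_actions(["music/", "old.txt"], dels, news)))
```
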
+
+How to use it?
+
+ $ zfs snap big@snap1
+ $ cd /big ; glocate -db /tmp/glocate.db -index
+
+ $ glocate -db /tmp/glocate.db
+ [list of all files]
+
+ $ glocate -db /tmp/glocate.db -machine
+ [machine parseable list of files with sizes and mtimes]
+
+ $ glocate -db /tmp/glocate.db -tree
+ [pretty tree-like list of files with sizes and mtimes]
+
+ $ glocate -db /tmp/glocate.db -root music
+ [just the music part of the hierarchy]
+
+ $ glocate -db /tmp/glocate.db -root music blasphemy | grep "/$"
+ music/Blasphemy-2001-Gods_Of_War_+_Blood_Upon_The_Altar/
+ music/Cryptopsy-1994-Blasphemy_Made_Flesh/
+ music/Infernal_Blasphemy-2005-Unleashed/
+ music/Ravenous-Assembled_In_Blasphemy/
+ music/Sect_Of_Execration-2002-Baptized_Through_Blasphemy/
+ music/Spectral_Blasphemy-2012-Blasphmemial_Catastrophic/
+
+Then update it carefully, passing the prefix to strip to -update:
+
+ $ zfs snap big@snap2
+ $ zfs diff -FH big@snap1 big@snap2 | glocate -db /tmp/glocate.db -update /big/
+
glocate is copylefted free software: see the file COPYING for copying
conditions.