glocate -- ZFS-diff-friendly locate-like utility

This utility keeps a database of the filesystem hierarchy and quickly
displays any part of it, like ordinary *locate utilities. Unlike them,
it can also consume zfs-diff's output and apply the changes to an
existing database.

Why did I write it? Indexing, even a plain "find /big", can take a
considerable amount of time, an hour or so, with many I/O operations
spent. But my home NAS sees relatively few changes each day. The only
quick way to determine exactly what was modified is to traverse ZFS's
Merkle trees and find the difference between snapshots. Fortunately,
the zfs-diff command does exactly that and provides pretty
machine-friendly output.

Why is this utility so complicated? Initially it kept the whole database
in memory, but that takes 2-3 GiB, which is a huge amount. Moreover, it
had to load the database fully to perform even a basic search. So the
current implementation uses temporary files and relies heavily on data
streaming.

Its storage format is trivial:

* 16-bit BE size of the following name
* entity (file, directory, symbolic link, etc.) name itself.
  A directory name has a trailing "/"
* single byte indicating current file's depth
* 64-bit BE mtime seconds
* 64-bit BE file or directory (sum of all files and directories) size
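
A minimal Python sketch of reading and writing one such record may make
the layout easier to picture. It assumes the fields appear on disk in
exactly the order listed above, that all integers are unsigned, and
that names are UTF-8 encoded; none of that is confirmed here, and the
function names are purely illustrative.

```python
import io
import struct

# Assumed layout, following the list above in order:
#   u16 BE name length | name bytes | u8 depth | u64 BE mtime | u64 BE size
# Unsigned integers and UTF-8 names are assumptions of this sketch.
TAIL = struct.Struct(">BQQ")  # depth, mtime, size: 17 bytes, no padding


def write_record(out, name, depth, mtime, size):
    """Append one record to a binary stream."""
    raw = name.encode("utf-8")
    out.write(struct.pack(">H", len(raw)))
    out.write(raw)
    out.write(TAIL.pack(depth, mtime, size))


def read_record(inp):
    """Return (name, depth, mtime, size), or None at end of stream."""
    head = inp.read(2)
    if not head:
        return None
    (length,) = struct.unpack(">H", head)
    name = inp.read(length).decode("utf-8")
    depth, mtime, size = TAIL.unpack(inp.read(TAIL.size))
    return name, depth, mtime, size


if __name__ == "__main__":
    buf = io.BytesIO()
    write_record(buf, "docs/", 0, 1700000000, 4096)
    buf.seek(0)
    print(read_record(buf))
```

With this layout every record costs 19 bytes plus the name, and the
16-bit length prefix caps a name at 65535 bytes.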

Its indexing algorithm is as follows:

* traverse the whole filesystem hierarchy in *sorted* order. All
  records are written to a temporary file without directory sizes,
  because those are not known in advance during the walk
* during the walk, remember each directory's total size in memory
* read all records back from that temporary file, writing them to
  another one, but replacing directory sizes with the remembered ones
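
The two passes above can be sketched in Python roughly as follows. This
is only an illustration under several assumptions: records are stored
as JSON lines holding full relative paths instead of the binary format,
entries are sorted by name within each directory, and a directory's
size is the sum of everything beneath it (its own inode is not
counted). The names first_pass and second_pass are made up for the
example.

```python
import json
import os
import tempfile


def first_pass(top, out):
    """Walk `top` in sorted order, writing records with directory sizes
    left as 0. Returns {directory path: total size}, kept in memory."""
    totals = {}

    def visit(path, rel, depth):
        total = 0
        for entry in sorted(os.scandir(path), key=lambda e: e.name):
            st = entry.stat(follow_symlinks=False)
            child = rel + entry.name
            if entry.is_dir(follow_symlinks=False):
                child += "/"  # directories carry a trailing slash
                out.write(json.dumps([child, depth, int(st.st_mtime), 0]) + "\n")
                size = visit(entry.path, child, depth + 1)
            else:
                size = st.st_size
                out.write(json.dumps([child, depth, int(st.st_mtime), size]) + "\n")
            total += size
        totals[rel] = total
        return total

    visit(top, "", 0)
    return totals


def second_pass(inp, out, totals):
    """Copy records, replacing each directory size with the remembered total."""
    for line in inp:
        name, depth, mtime, size = json.loads(line)
        if name.endswith("/"):
            size = totals[name]
        out.write(json.dumps([name, depth, mtime, size]) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryFile("w+") as t1, tempfile.TemporaryFile("w+") as t2:
        totals = first_pass(".", t1)
        t1.seek(0)
        second_pass(t1, t2, totals)
        t2.seek(0)
        for line in t2:
            print(line, end="")
```

Only the per-directory totals stay in memory; the records themselves
pass through the temporary files.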

Searching is trivial:

* there is no actual searching, just streaming through the whole
  database file sequentially
* if some root is specified, then the program outputs only that root's
  hierarchy and exits once it is finished
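
As an illustration, such a streaming filter might look like the Python
generator below. It assumes each record starts with a full path, that
the specified root names a directory, and that every subtree is
contiguous in the stream (which the sorted walk should guarantee). The
function name search is hypothetical.

```python
def search(records, root=""):
    """Yield the records under `root` and stop reading the stream as
    soon as that subtree ends. An empty root yields everything."""
    if root and not root.endswith("/"):
        root += "/"  # assumption: the root is a directory
    inside = False
    for rec in records:
        if rec[0].startswith(root):
            inside = True
            yield rec
        elif inside:
            return  # subtree finished, the rest need not be read


if __name__ == "__main__":
    db = [("a/", 0), ("a/x", 1), ("b/", 0), ("b/y", 1), ("c", 0)]
    for path, depth in search(iter(db), "b"):
        print(path)
```

Because the input is consumed lazily, memory use stays constant no
matter how large the database is.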

The updating algorithm is as follows:

* read all [-+MR] actions from zfs-diff, validating the whole format
* each file's "R" becomes a "-" and a "+" action
* if there are directory "R"s, then collect them and stream through the
  current database to determine every path entity that has to be "-"ed
  and "+"ed
* each "+" also adds an entry to the list of "M"s
* sort all "-", "+" and "M" filenames in ascending order
* get the entity's information for each "M" (remembering its size and
  mtime)
* stream the current database records, writing them to a temporary file
* if a record exists in the "-"-list, then skip it
* if any "+" in the *sorted* list precedes the record read from the
  database, then insert it into the stream, taking its size and mtime
  from the "M"-list
* if any "M" exists for the record read, then use it to alter the record
* all that time the directory size calculating algorithm also works,
  the same one used during indexing
* create another temporary file to copy the records with updated
  directory sizes
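
The core of these steps is a merge of the sorted change lists into the
sorted database stream. The Python sketch below shows that idea only.
It assumes records are (path, mtime, size) tuples ordered by plain
string comparison of full paths, it handles renames of files but not of
directories, and it leaves out the directory size recalculation. Both
function names are hypothetical.

```python
def split_actions(actions):
    """actions: iterable of (kind, path, new_path_or_None), files only.
    Returns (removed set, sorted added list, sorted modified list)."""
    removed, added, modified = set(), set(), set()
    for kind, path, new in actions:
        if kind == "R":  # a file rename is a "-" plus a "+"
            removed.add(path)
            added.add(new)
            modified.add(new)
        elif kind == "-":
            removed.add(path)
        elif kind == "+":  # every "+" also needs its size and mtime
            added.add(path)
            modified.add(path)
        elif kind == "M":
            modified.add(path)
        else:
            raise ValueError("unknown action: %r" % (kind,))
    return removed, sorted(added), sorted(modified)


def apply_changes(db, removed, added, info):
    """Merge changes into a sorted stream of (path, mtime, size).
    info maps each modified path to its fresh (mtime, size)."""
    i = 0
    for path, mtime, size in db:
        # Emit every added path that sorts before the current record.
        while i < len(added) and added[i] < path:
            yield (added[i], *info[added[i]])
            i += 1
        if i < len(added) and added[i] == path:
            # "+" for a path the database already has: fresh data wins.
            yield (path, *info[path])
            i += 1
            continue
        if path in removed:
            continue
        if path in info:
            mtime, size = info[path]
        yield (path, mtime, size)
    for path in added[i:]:  # additions sorting after the last record
        yield (path, *info[path])


if __name__ == "__main__":
    removed, added, modified = split_actions(
        [("M", "b", None), ("R", "c", "a"), ("-", "d", None)])
    info = {p: (100, 7) for p in modified}  # stand-in for real stat data
    db = [("b", 1, 1), ("c", 2, 2), ("d", 3, 3), ("f", 4, 4)]
    for rec in apply_changes(db, removed, added, info):
        print(rec)
```

Neither the old nor the new database is ever held in memory as a whole;
only the change lists are, and they are small when few files change.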

How to use it?

    $ zfs snap big@snap1
    $ cd /big ; glocate -db /tmp/glocate.db -index

    $ glocate -db /tmp/glocate.db
    [list of all files]

    $ glocate -db /tmp/glocate.db -machine
    [machine parseable list of files with sizes and mtimes]

    $ glocate -db /tmp/glocate.db -tree
    [pretty tree-like list of files with sizes and mtimes]

    $ glocate -db /tmp/glocate.db some/sub/path
    [just a part of the whole hierarchy]

and update it carefully:

    $ zfs snap big@snap2
    $ zfs diff -FH big@snap1 big@snap2 | glocate -db /tmp/glocate.db -strip /big/ -update

glocate is copylefted free software: see the file COPYING for copying
conditions.