Slightly refactored documentation

diff --git a/FORMAT b/FORMAT
new file mode 100644
index 0000000..6becd33
--- /dev/null
+++ b/FORMAT
@@ -0,0 +1,49 @@
+The storage format is simple: a Zstandard-compressed list of records, each holding:
+
+* 16-bit BE (big-endian) size of the name that follows
+* the entity's name (file, directory, symbolic link, etc.).
+  A directory name has a trailing "/"
+* a single byte giving the entity's depth in the hierarchy
+* 64-bit BE mtime, in seconds
+* 64-bit BE size: file size, or for a directory the sum of all its files and directories
+
+Index algorithm:
+
+* traverse the whole filesystem hierarchy in *sorted* order. All
+  records are written to a temporary file without directory sizes,
+  because those are not yet known during the walk
+* during the walk, keep each directory's total size in memory
+* read all records back from that temporary file and write them to
+  another one, filling in the remembered directory sizes
+
+Search is trivial:
+
+* searching is performed on each record streamed from the database
+* if -root is specified, the search stops once that part of the
+  hierarchy is over
+* by default all elements are printed, unless you provide a single
+  argument X, which becomes the pattern "*X*" matched against lowercased
+  path elements
+
+Update algorithm:
+
+* read all [-+MR] actions from "zfs diff -FH", validating the whole
+  format
+* each "R" for a file becomes a "-" and a "+" action
+* if there are "R"s for directories, stream the current database and
+  get each file entity under those directories, creating the
+  corresponding "-" and "+" actions
+* each "+" also adds an entry to the list of "M"s
+* sort all "-", "+" and "M" filenames in ascending order
+* get the entity's information for each "M" (remembering its size and mtime)
+* stream the current database records, writing them to a temporary file,
+  taking into account that:
+  * if the record exists in the "-"-list, skip it
+  * if any "+" in the *sorted* list sorts before the record read from
+    the database, insert it into the stream first, taking its size and
+    mtime information from the "M"-list
+  * if an "M" exists for the record just read, use it to update that record
+* all the while, the same directory size calculation used during
+  indexing runs alongside
+* create another temporary file and copy the records into it with
+  updated directory sizes
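
The record layout described in FORMAT can be illustrated with a short Python sketch. It is not taken from the glocate sources: the names Record, pack_record and read_records are hypothetical, both 64-bit fields are assumed to be unsigned, and the Zstandard layer is left out, so the functions work on the already decompressed record stream.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple


class Record(NamedTuple):
    """One database record (hypothetical type, for illustration only)."""
    name: bytes  # entity name; a directory name ends with b"/"
    depth: int   # single byte
    mtime: int   # seconds
    size: int    # file size, or total size for a directory


# 16-bit big-endian length prefix for the name.
NAME_LEN = struct.Struct(">H")
# Fixed tail after the name: depth (1 byte), mtime and size (64-bit BE).
# Assumption: both 64-bit fields are unsigned.
TAIL = struct.Struct(">BQQ")


def pack_record(rec: Record) -> bytes:
    """Serialize one record in the field order listed in FORMAT."""
    return (
        NAME_LEN.pack(len(rec.name))
        + rec.name
        + TAIL.pack(rec.depth, rec.mtime, rec.size)
    )


def read_records(stream: BinaryIO) -> Iterator[Record]:
    """Yield records from an uncompressed stream until it is exhausted."""
    while True:
        head = stream.read(NAME_LEN.size)
        if not head:
            return  # clean end of stream
        if len(head) != NAME_LEN.size:
            raise ValueError("truncated name length")
        (name_len,) = NAME_LEN.unpack(head)
        body = stream.read(name_len + TAIL.size)
        if len(body) != name_len + TAIL.size:
            raise ValueError("truncated record")
        depth, mtime, size = TAIL.unpack(body[name_len:])
        yield Record(body[:name_len], depth, mtime, size)


if __name__ == "__main__":
    sample = [
        Record(b"etc/", 0, 1700000000, 12),
        Record(b"hosts", 1, 1700000001, 12),
    ]
    blob = b"".join(pack_record(r) for r in sample)
    for rec in read_records(io.BytesIO(blob)):
        print(rec)
```

Because every record carries its own name length, a reader can stream the database record by record without loading it whole, which is what the search and update steps rely on.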