X-Git-Url: http://www.git.stargrave.org/?p=glocate.git;a=blobdiff_plain;f=FORMAT;fp=FORMAT;h=6becd33abe9fdf803b8be52985845656a1b7c331a389e6e2ac02143e6858a8ef;hp=0000000000000000000000000000000000000000000000000000000000000000;hb=ecb9a41c73d8d7d8c75215c57d5d91f6c476098d756489036bedb8f136e8fb9e;hpb=e5a64361a0537c82d3fbc21b986fc2815f394571f46a0e4128ced789a6712f36

diff --git a/FORMAT b/FORMAT
new file mode 100644
index 0000000..6becd33
--- /dev/null
+++ b/FORMAT
@@ -0,0 +1,49 @@
+Storage format is simple: Zstandard-compressed list of records:
+
+* 16-bit BE size of the following name
+* entity (file, directory, symbolic link, etc) name itself.
+  Directory has trailing "/"
+* single byte indicating current file's depth
+* 64-bit BE mtime seconds
+* 64-bit BE file or directory (sum of all files and directories) size
+
+Index algorithm:
+
+* traverse over all filesystem hierarchy in a *sorted* order. All
+  records are written to temporary file, without directory sizes,
+  because they are not known in advance during the walk
+* during the walk, remember in memory each directory's total size
+* read all records from that temporary file, writing to another one,
+  replacing directory sizes with ones remembered
+
+Search is trivial:
+
+* searching is performed on each record streamed from the database
+* if -root is specified, then search will stop after that hierarchy
+  part is over
+* by default all elements are printed, unless you provide a single
+  argument that becomes "*X*" pattern matched on case-lowered path
+  elements
+
+Update algorithm:
+
+* read all [-+MR] actions from "zfs diff -FH", validating the whole
+  format
+* each "R" for the file becomes "-" and "+" actions
+* if there are "R"s for directories, then stream current database and
+  get each file entity for those directories, making "-" and "+"
+  actions correspondingly
+* each "+" also adds an entry to the list of "M"s
+* sort all "-", "+" and "M" filenames in ascending order
+* get entity's information for each "M" (remembering its size and mtime)
+* stream current database records, writing them to temporary file,
+  taking into account, that:
+  * if record exists in "-"-list, then skip it
+  * if any "+" exists in the *sorted* list, that has precedence over
+    the record from database, then insert it into the stream, taking
+    size and mtime information from "M"-list
+  * if any "M" exists for the read record, then use it to alter it
+* all that time, directory size calculating algorithm, same used during
+  the index procedure, also works in parallel
+* create another temporary file to copy the records with actualized
+  directory sizes
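
Reviewer note: the record layout described in FORMAT can be illustrated with a minimal Python sketch. It is not taken from the glocate sources. The names Record, encode_record and decode_records are hypothetical, and the sketch assumes that names are UTF-8 bytes and that mtime and size are unsigned, which FORMAT does not state. Zstandard compression is left out, so the functions operate on the already decompressed record stream.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Fixed-size tail of every record: 1-byte depth, 64-bit BE mtime, 64-bit BE size.
# Assumption: mtime and size are treated as unsigned; FORMAT does not say.
TAIL = struct.Struct(">BQQ")


class Record(NamedTuple):
    name: str   # directories carry a trailing "/"
    depth: int  # single byte, so 0..255
    mtime: int  # seconds
    size: int   # for a directory: the summed size of its contents


def encode_record(rec: Record) -> bytes:
    """Serialize one record: 16-bit BE name length, name, then the tail."""
    raw = rec.name.encode("utf-8")  # assumption: names are UTF-8 bytes
    if len(raw) > 0xFFFF:
        raise ValueError("name does not fit into a 16-bit length")
    return struct.pack(">H", len(raw)) + raw + TAIL.pack(rec.depth, rec.mtime, rec.size)


def decode_records(stream: BinaryIO) -> Iterator[Record]:
    """Yield records one by one from a decompressed stream until EOF."""
    while True:
        head = stream.read(2)
        if not head:
            return  # clean end of stream
        if len(head) != 2:
            raise ValueError("truncated name length")
        (name_len,) = struct.unpack(">H", head)
        body = stream.read(name_len + TAIL.size)
        if len(body) != name_len + TAIL.size:
            raise ValueError("truncated record")
        depth, mtime, size = TAIL.unpack(body[name_len:])
        yield Record(body[:name_len].decode("utf-8"), depth, mtime, size)


if __name__ == "__main__":
    # Depth values here are arbitrary; FORMAT does not define where counting starts.
    blob = b"".join(
        encode_record(r)
        for r in (Record("usr/", 0, 1700000000, 4096), Record("usr/ls", 1, 1700000000, 4096))
    )
    for rec in decode_records(io.BytesIO(blob)):
        print(rec)
```

Because each record is self-delimiting, a reader can stream through the database without any index, which matches the way the search and update steps are described.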