--- /dev/null
+ sgodup -- file deduplication utility
+ ====================================
+
+DESCRIPTION AND USAGE
+
+sgodup is a utility for detecting duplicate files. You supply two
+directories, the base one and one with possible duplicates; the utility
+finds the duplicate files and replaces them with links. It is designed
+for very high performance.
+
+There are just a few arguments:
+
+-basedir -- directory with files that are possible link targets
+ -dupdir -- directory with possible duplicates, which are replaced
+            with links to basedir's files
+ -action -- * print: just print the duplicate file's path to stdout,
+              together with the relative path to basedir's
+              corresponding file
+            * symlink: create a symbolic link with a relative path to
+              basedir's corresponding file
+            * hardlink: create a hard link instead
+  -chmod -- if specified, chmod files in basedir and dupdir during
+            the scan phase. An octal representation is expected
+  -fsync -- fsync directories where linking occurs
+
+There are three stages:
+
+* basedir directory scan: collect all *regular* file paths, sizes and
+  inodes. If -chmod is specified, it is applied immediately. Empty
+  files are ignored
+* dupdir directory scan: same as above. If no basedir file has the
+  same size, the dupdir file is skipped (it obviously cannot be a
+  duplicate). If some basedir file has the same inode, the dupdir file
+  is skipped too, because it is already hardlinked
+* deduplication stage: for each dupdir file, find the basedir files
+  with the same size and compare their contents to determine whether
+  the dupdir one is a duplicate. If so, perform the specified action.
+  There are two separate queues and processing cycles:
+
+  * small files, up to 4 KiB (one disk sector): files are fully read
+    and compared in memory
+  * large files (everything else): the first 4 KiB of each file are
+    read and compared in memory. If they differ, the file is not a
+    duplicate. Otherwise each file is read fully and sequentially in
+    128 KiB chunks and its BLAKE2b-512 digest is calculated for
+    comparison
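+
+The three stages above can be sketched roughly as follows. This is an
+illustrative Python sketch only, not sgodup's own code: the names scan,
+candidates, head, digest and is_duplicate are hypothetical, and the
+separate queues, -chmod handling and progress reporting are left out.
+

```python
import hashlib
import os
import stat
from collections import defaultdict

SECTOR = 4 * 1024   # prefix that is compared in memory
CHUNK = 128 * 1024  # read size used while hashing large files


def scan(root):
    """Map file size -> [(path, inode)] for non-empty regular files."""
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                by_size[st.st_size].append((path, st.st_ino))
    return by_size


def candidates(base, dup):
    """Yield (dup_path, base_paths) pairs that need a content comparison."""
    for size, dup_files in dup.items():
        base_files = base.get(size)
        if not base_files:
            continue  # no basedir file of this size: cannot be a duplicate
        base_inodes = {ino for _path, ino in base_files}
        for path, ino in dup_files:
            if ino in base_inodes:
                continue  # same inode: already hardlinked
            yield path, [p for p, _ino in base_files]


def head(path):
    """Return the first SECTOR bytes of the file."""
    with open(path, "rb") as f:
        return f.read(SECTOR)


def digest(path):
    """BLAKE2b-512 digest of the file, read in CHUNK-sized pieces."""
    h = hashlib.blake2b()  # default digest size is 64 bytes (512 bits)
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.digest()


def is_duplicate(a, b):
    """Compare two files that are already known to have the same size."""
    if head(a) != head(b):
        return False
    if os.path.getsize(a) <= SECTOR:
        return True  # small file: the prefix was the whole file
    return digest(a) == digest(b)
```

+A caller would run scan() on both directories, feed the results to
+candidates(), and call is_duplicate() for each pair before performing
+the chosen action.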
+
+Progress is shown at each stage: how many files are counted/processed,
+the total size of the files, and how much space is deduplicated.
+
+ 2020/03/19 22:57:07 processing basedir...
+ 2020/03/19 22:57:07 464,329 / 0 (0%) files scanned
+ 2020/03/19 22:57:07 534 GiB / 0 B (0%)
+ 2020/03/19 22:57:12 processing dupdir...
+ 2020/03/19 22:57:12 362,245 / 0 (0%) files scanned
+ 2020/03/19 22:57:12 362 GiB / 0 B (0%)
+ 2020/03/19 22:57:17 deduplicating...
+ 2020/03/19 22:58:18 8,193 / 362,245 (2%) files processed
+ 2020/03/19 22:58:18 7.7 GiB / 362 GiB (2%) deduplicated
+ [...]
+ 2020/03/20 11:17:20 321,123 files deduplicated
+
+It is safe to specify the same directory as both basedir and dupdir.
+
+SAFETY AND CONSISTENCY
+
+POSIX has no single call that atomically replaces a regular file with
+a symbolic or hard link, so the file is removed first and the link is
+created afterwards. sgodup takes care that those two calls cannot be
+interrupted by a signal (TERM, INT). Any other failure, however, could
+stop the program after the file is removed but before the link is
+created, and the file would be lost!
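+
+The remove-then-link sequence can be illustrated with the Python
+sketch below. It is an assumption-laden example, not sgodup's actual
+implementation: replace_with_symlink is a hypothetical helper, the
+signal blocking uses Python's signal.pthread_sigmask (Unix only), and
+the optional directory fsync only mimics what -fsync is described as
+doing.
+

```python
import os
import signal


def replace_with_symlink(dup_path, base_path, fsync=False):
    """Replace dup_path with a relative symlink pointing at base_path."""
    dup_dir = os.path.dirname(os.path.realpath(dup_path))
    target = os.path.relpath(os.path.realpath(base_path), dup_dir)
    blocked = {signal.SIGTERM, signal.SIGINT}
    # Defer TERM/INT delivery so a signal cannot split the two calls.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, blocked)
    try:
        os.remove(dup_path)
        # Any failure from here on leaves the file removed with no link.
        os.symlink(target, dup_path)
        if fsync:
            fd = os.open(dup_dir, os.O_RDONLY)
            try:
                os.fsync(fd)  # flush the directory entry change
            finally:
                os.close(fd)
    finally:
        # Restore the previous mask; pending signals are delivered now.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return target
```

+Blocking the signals narrows the window in which the file can be lost,
+but it does not close it: an I/O error or a crash between the two
+calls still leaves no file and no link.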
+
+It is recommended to use a filesystem with snapshot capability, so
+that you can roll back and restore a removed file. Alternatively, run
+with "-action print" beforehand to collect the list of duplicates and
+keep it as a log for possible recovery.
+
+There are no warranties and no defined behaviour if the directories
+the utility is working with (or the files within them) are modified
+while it runs.
+
+LICENCE
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, version 3 of the License.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.