Download link for 0.1.0 release

diff --git a/doc/warcs.texi b/doc/warcs.texi
index b20b120e301679292c05fd1bcf991f827ee3675c..7408e9137fd85c8141717d5290c9ed454364627b 100644
--- a/doc/warcs.texi
+++ b/doc/warcs.texi
@@ -24,13 +24,11 @@ Zstandard compressed WARC, as in
  Multi-frame format is properly indexed. Dictionary at the beginning
  is also supported.
  
-It is processed with with @command{unzstd} (@file{cmd/unzstd/unzstd})
+It is processed with @command{unzstd} (@file{cmd/zstd/unzstd})
  utility. It eats compressed stream from @code{stdin}, outputs
  decompressed data to @code{stdout}, and prints each frame size with
  corresponding decompressed data size to 3rd file descriptor (if it is
-opened). You can adjust path to it with
-@code{-X go.stargrave.org/tofuproxy/warc.UnZSTDPath} command line option
-during building.
+opened).
  
  @end table
  
@@ -80,17 +78,17 @@ save in-memory index to the disk as @file{....idx.gob} files. During
  the next load, if those files exists, they are used as index immediately,
  without expensive WARC parsing.
  
-@code{redo warc-extract.cmd} utility uses exactly the same code for
-parsing WARCs. It can be used to check if WARCs can be successfully
+@code{cmd/warc-extract/warc-extract} utility uses exactly the same code
+for parsing WARCs. It can be used to check if WARCs can be successfully
  loaded, to list all URIs after, to extract some specified URI and to
-pre-generate @file{.idx.gob} indexes.
+pre-generate @file{.idx.gob} indices.
  
  @example
-$ warc-extract.cmd -idx \
+$ cmd/warc-extract/warc-extract -idx \
      smth.warc-00000.warc.gz \
      smth.warc-00001.warc.gz \
      smth.warc-00002.warc.gz
-$ warc-extract.cmd -uri http://some/uri \
+$ cmd/warc-extract/warc-extract -uri http://some/uri \
      smth.warc-00000.warc.gz \
      smth.warc-00001.warc.gz \
      smth.warc-00002.warc.gz
@@ -101,9 +99,8 @@ from any kind of already existing WARCs. It has better compression ratio
  and much higher decompression speed, than @file{.warc.gz}.
  
  @example
-$ redo cmd/enzstd/enzstd
-$ ./warc-extract.cmd -for-enzstd /path/to.warc.gz |
-    cmd/enzstd/enzstd > /path/to.warc.zst
+$ cmd/warc-extract/warc-extract -for-enzstd /path/to.warc.gz |
+    cmd/zstd/enzstd > /path/to.warc.zst
  @end example
  
  @url{https://www.gnu.org/software/wget/, GNU Wget} can be easily used to
@@ -114,3 +111,6 @@ $ wget ... [--page-requisites] [--recursive] \
      --no-warc-keep-log --no-warc-digests [--warc-max-size=XXX] \
      --warc-file smth.warc ...
  @end example
+
+Or use the even simpler @url{https://git.jordan.im/crawl/tree/README.md, crawl}
+utility, which is also written in Go.
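The documentation says that unzstd prints each frame's compressed size and the matching decompressed size to file descriptor 3, but only when that descriptor is open. Below is a minimal Go sketch of such a side channel. It assumes one "compressed decompressed" text line per frame. The names `frameInfo`, `formatFrameIndex` and `emitFrameIndex` are hypothetical and are not taken from tofuproxy's source.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// frameInfo pairs the compressed size of one zstd frame with the size of
// the data it decompresses to. Hypothetical type, for illustration only.
type frameInfo struct {
	compressed   int64
	decompressed int64
}

// formatFrameIndex renders one "compressed decompressed" line per frame.
// The line format is an assumption, not unzstd's documented output.
func formatFrameIndex(frames []frameInfo) string {
	var b strings.Builder
	for _, f := range frames {
		fmt.Fprintf(&b, "%d %d\n", f.compressed, f.decompressed)
	}
	return b.String()
}

// emitFrameIndex writes the index to descriptor 3 only if the calling
// process actually opened it. Otherwise it does nothing.
func emitFrameIndex(frames []frameInfo) error {
	fd3 := os.NewFile(3, "fd3")
	if fd3 == nil {
		return nil
	}
	// Stat fails (EBADF on Unix) when descriptor 3 is not open.
	if _, err := fd3.Stat(); err != nil {
		return nil
	}
	_, err := fd3.WriteString(formatFrameIndex(frames))
	return err
}
```

With this sketch, a caller would collect the index through a shell redirection such as `cmd/zstd/unzstd < smth.warc.zst > smth.warc 3> frames.txt`. If `3>` is omitted, descriptor 3 is normally left closed and the index is skipped. Checking with `Stat` keeps the tool quiet when nobody is listening, and it avoids adding a command-line flag.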