X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;ds=sidebyside;f=doc%2Fwarcs.texi;fp=doc%2Fwarcs.texi;h=757f344c82c255fbaa0ac418248af254484b3364;hb=457df95b08ca88100d31b7108124d459b82eb7ac;hp=7408e9137fd85c8141717d5290c9ed454364627b;hpb=e3a943d5379613732c48fbd9941b0fee8ff23477;p=tofuproxy.git

diff --git a/doc/warcs.texi b/doc/warcs.texi
index 7408e91..757f344 100644
--- a/doc/warcs.texi
+++ b/doc/warcs.texi
@@ -38,7 +38,7 @@ opened).
 Load WARCs:

 @example
-$ tee fifos/add-warcs < warcs.txt
+$ tee fifos/add-warcs fifos/del-warcs
+$ echo another.warc >fifos/del-warcs
 @end example

 One possibility that @file{smth.warc-00002.warc.gz} has no URIs is that
@@ -73,7 +73,7 @@ it contains continuation segmented records.
 @end itemize

 Loading of WARC involves its whole reading and remembering where is each
-URI response is located. You can @code{echo SAVE > fifos/add-warcs} to
+URI response is located. You can @code{echo SAVE >fifos/add-warcs} to
 save in-memory index to the disk as @file{....idx.gob} files. During the
 next load, if those files exists, they are used as index immediately,
 without expensive WARC parsing.
@@ -100,7 +100,7 @@ and much higher decompression speed, than @file{.warc.gz}.

 @example
 $ cmd/warc-extract/warc-extract -for-enzstd /path/to.warc.gz |
-    cmd/zstd/enzstd > /path/to.warc.zst
+    cmd/zstd/enzstd >/path/to.warc.zst
 @end example

 @url{https://www.gnu.org/software/wget/, GNU Wget} can be easily used to