From e7b1de1343a276afdc4be044177bc1ef8fabb022 Mon Sep 17 00:00:00 2001
From: Sergey Matveev
Date: Sun, 20 Feb 2022 18:51:49 +0300
Subject: [PATCH] Sectioned usage

---
 doc/index.texi             |   3 +-
 doc/usage.texi             | 237 -------------------------------------
 doc/usage/browse.texi      |  54 +++++++++
 doc/usage/build.texi       |   9 ++
 doc/usage/clear.texi       |  30 +++++
 doc/usage/download.texi    |  13 ++
 doc/usage/encs.texi        |  32 +++++
 doc/usage/feedsdir.texi    |  22 ++++
 doc/usage/index.texi       |  16 +++
 doc/usage/install.texi     |  13 ++
 doc/usage/news.texi        |  15 +++
 doc/usage/options.texi     |   8 ++
 doc/usage/parse.texi       |  15 +++
 doc/usage/reindex.texi     |  12 ++
 doc/usage/search.texi      |  24 ++++
 doc/{ => usage}/warcs.texi |   4 +-
 16 files changed, 266 insertions(+), 241 deletions(-)
 delete mode 100644 doc/usage.texi
 create mode 100644 doc/usage/browse.texi
 create mode 100644 doc/usage/build.texi
 create mode 100644 doc/usage/clear.texi
 create mode 100644 doc/usage/download.texi
 create mode 100644 doc/usage/encs.texi
 create mode 100644 doc/usage/feedsdir.texi
 create mode 100644 doc/usage/index.texi
 create mode 100644 doc/usage/install.texi
 create mode 100644 doc/usage/news.texi
 create mode 100644 doc/usage/options.texi
 create mode 100644 doc/usage/parse.texi
 create mode 100644 doc/usage/reindex.texi
 create mode 100644 doc/usage/search.texi
 rename doc/{ => usage}/warcs.texi (93%)

diff --git a/doc/index.texi b/doc/index.texi
index 6f0b17f..a8b603c 100644
--- a/doc/index.texi
+++ b/doc/index.texi
@@ -49,7 +49,6 @@ copy in mailbox and another copy in @command{mu}'s Xapian database.
 
 @include storage.texi
 @include mail.texi
-@include usage.texi
-@include warcs.texi
+@include usage/index.texi
 
 @bye
diff --git a/doc/usage.texi b/doc/usage.texi
deleted file mode 100644
index 328201a..0000000
--- a/doc/usage.texi
+++ /dev/null
@@ -1,237 +0,0 @@
-@node Usage
-@unnumbered Usage
-
-How @strong{I} use it:
-
-@table @asis
-
-@item Get its source code
-
-@example
-$ git clone git://git.stargrave.org/feeder.git
-$ cd feeder
-@end example
-
-@item Compile @command{feed2mdir} utility
-
-@example
-$ ( cd cmd/feed2mdir ; go build )
-@end example
-
-@item Create feeds state directories
-
-You can create feeds subdirectories under @file{feeds/} manually:
-
-@example
-$ mkdir -p feeds/my_first_feed/@{cur,new,tmp@}
-$ echo http://example.com/feed.atom > feeds/my_first_feed/url
-@end example
-
-or convert Newsboat @file{urls} file (containing many lines with URLs)
-with @command{urls2feeds.zsh} to subdirectories hierarchy:
-
-@example
-$ ./urls2feeds.zsh < ~/.newsboat/urls
-$ cat feeds/blog.stargrave.org_russian_feed.atom/url
-http://blog.stargrave.org/russian/feed.atom
-@end example
-
-@command{urls2feeds.zsh} won't touch already existing directories and will
-warn if some of them disappeared from @file{urls}.
-
-@item Check configuration options
-
-@file{cmd/env.rc} contains list of various options you can override by
-environment variables, like @command{curl}, @command{wget},
-@command{zstd}, @command{parallel} command invocations,
-@code{User-Agent}, number of download/parse jobs run in parallel and so on.
-
-@item Download your feed(s) data
-
-@example
-$ cmd/download.sh feeds/blog.stargrave.org_russian_feed.atom
-$ ./feeds-download.zsh # to invoke parallel downloading of everything
-@end example
-
-Probably you want to change its default @env{$PROXY} value. It uses
-@command{curl}, that is aware of @code{If-Modified-Since} and
-@code{ETag} headers, compressed content encodings and HTTP redirections.
-If you want to see verbose output, then set @env{FEEDER_CURL_VERBOSE=1}.
-
-@item Parse your feeds
-
-@example
-$ cmd/parse.sh feeds/blog.stargrave.org_russian_feed.atom
-$ ./feeds-parse.zsh # to parse all feeds in parallel
-@end example
-
-@item Download-n-parse
-
-You can also download and parse the feeds immediately:
-
-@example
-$ ./feeds-dnp.zsh
-@end example
-
-@item Quick overview of the news:
-
-@example
-$ ./feeds-news.zsh
-habr.com_ru_rss_interesting: 7
-habr.com_ru_rss_news: 3
-lobste.rs_rss: 3
-naked-science.ru_?yandex_feed=news: 1
-planet.fsfe.org_atom.xml: 1
-www.astronews.ru_astronews.xml: 1
-www.darkside.ru_news_rss: 5
-@end example
-
-@item Run Mutt
-
-@example
-$ ./feeds-browse.sh
-@end example
-
-That will read all feeds titles and create @file{mutt.rc} sourceable
-configuration file with predefined helpers and @code{mailboxes}
-commands.
-
-That configuration contains @code{auto_view text/html}, that expects
-proper @file{mailcap} configuration file with @code{text/html} entry to
-exists. Mutt has some built-in default search paths for, but you can
-override them with @env{$MAILCAPS} environment variable. There is
-example @file{contrib/mailcap}.
-
-Mutt will be started in mailboxes browser mode (I will skip many entries):
-
-@verbatim
-  1 N [  1|101] 2021-02-17 20:41 Cryptology ePrint Archive/
-  3   [  0|  8] 2021-12-02 19:28 Thoughts/
- 32   [  0|  8] 2021-02-17 19:32 apenwarr/
-101   [ 10| 50] 2021-02-14 13:40 Блог Stargrave на русском comments/
-102   [  0| 51] 2021-02-17 19:37 Блог Stargrave на русском/
-316   [  0| 44] 2021-02-17 19:33 Eaten By A Grue: Infocom, Text Adventures, and Interactive Fiction/
-@end verbatim
-
-ePrint has new entries since last downloading/parsing. Stargrave's blog
-comments have nothing new, but still ten unread entries.
-
-If we open "Eaten By A Grue" mailbox, then will see its entries:
-
-@verbatim
-  1 [2021-01-30 11:00] Zork Zero: The Revenge of Megaboz (0,8K)
-  2 [2021-06-12 11:01] Journey: The Quest Begins (0,8K)
-  3 [2021-04-28 11:00] Eaten By A Cruise (0,8K)
-[...]
----Mutt: feeds/monsterfeet.com_grue.rss [Nachr:44 60K]---
-@end verbatim
-
-@item Press @code{q} to return to mailbox browser again
-
-This is made for convenience, because you will often switch your
-mailboxes (feeds), but @code{q} quits Mutt by default.
-
-@item Press @code{A} to mark all messages read
-
-And again this is made for convenience. It will mark both new
-(@strong{N}) and old-but-unread (@strong{O}) messages as read. You will
-see left tag-marks near each message to understand what was touched.
-
-@item Press @code{o} to open links and enclosures URLs
-
-Do it in pager mode and you message will be piped to
-@command{cmd/x-urlview.sh}, that will show all @code{X-URL}
-and @code{X-Enclosure} links.
-
-@item Index your messages
-
-@example
-$ ./feeds-index.sh
-@end example
-
-That will create @file{mu/} and @file{search/} directories and run
-@command{mu index} indexing, that is safely can be done incrementally
-after each download/parse cycle.
-
-@item Search something
-
-Press @code{} in Mutt's index and enter your mu/Xapian search query.
-Let's search for articles mentioning
-@url{https://en.wikipedia.org/wiki/Planetfall, Planetfall} during
-2019-2021 period: @code{Planetfall date:2019..2021}. @command{mu} will
-create symbolic links in @file{search/} subdirectory to the message. Press
-@code{} to switch that mailbox:
-
-@verbatim
-  1 [2021-12-20 07:08] Missed Classic: Stationfall - When Food Dispensers Attack (The Adventurers Guild)
-  2 [2021-11-20 04:52] Missed Classic 102: Stationfall - Introduction (1987) (The Adventurers Guild)
-  3 [2021-11-19 17:54] Boffo Games (The Digital Antiquarian)
-  4 [2021-10-30 23:05] Missed Classic 100: The Manhole (1988) (The Adventurers Guild)
-  5 [2020-05-17 22:16] Round 04 Reveal (unWinnable State)
-  6 [2020-05-16 22:29] Round 03 Reveal (unWinnable State)
-  7 [2020-04-20 11:00] Planetfall (Eaten By A Grue: Infocom, Text Adventures, and Interactive Fiction)
-  8 [2020-04-09 11:00] Beyond Zork (Eaten By A Grue: Infocom, Text Adventures, and Interactive Fiction)
--%-Mutt: =search [Nachr:8 215K]---
-@end verbatim
-
-Pay attention that there is different index format, lacking unnecessary
-message flags display and adding name of the feed in parenthesis.
-
-@item Cleanup excess number of messages
-
-By default (@env{$FEEDER_MAX_ITEMS}) only 100 entries are processed.
-Parser only appends them, but does not remove obsolete ones.
-
-@example
-$ ./feeds-clear.zsh
-$ cmd/clear.zsh feeds/FEED # to clear single feed
-@end example
-
-will clear everything exceeding the quantity limit. You can set that
-limit on per-feed basis. For example @code{echo 50 > feed/FEED/max}.
-0 means no limit and keep all the messages.
-
-Pay attention that @file{new/} directory is not touched, so you won't
-loose completely new and unread messages when you are on vacation and
-left @command{cron}-ed workers.
-
-@item If you want to clean download state
-
-@example
-$ cmd/download-clean.sh feed/FEED
-@end example
-
-@anchor{Enclosures}
-@item Download enclosures
-
-Many feeds include links to so-called enclosures, like audio files for
-podcasts. While you mail is not processed by MUA, its @file{new/}
-messages still there, you can run enclosure downloading process, that
-uses @url{https://www.gnu.org/software/wget/, GNU Wget}. Each
-enclosure's filename is more or less filesystem-friendly with the
-current timestamp in it.
-
-@example
-$ ./feeds-encs.zsh
-[...]
-monsterfeet.com_grue.rss/encs/20220218-152822-traffic.libsyn.com_monsterfeet_grue_018.mp3
-www.astronews.ru_astronews.xml/encs/20220219-115710-www.astronews.ru_news_2022_20220216125238.jpg
-[...]
-$ file feeds/**/encs/*/
-monsterfeet.com_grue.rss/encs/20220218-152822-traffic.libsyn.com_monsterfeet_grue_018.mp3:
- Audio file with ID3 version 2.2.0, contains:MPEG ADTS, layer III, v1, 96 kbps, 44.1 kHz, Monaural
-www.astronews.ru_astronews.xml/encs/20220219-115710-www.astronews.ru_news_2022_20220216125238.jpg:
- JPEG image data, JFIF standard 1.01, ...
-@end example
-
-@command{feeds-encs.zsh} does not parallelize jobs, because enclosure are
-often heavy enough to satiate your Internet link. @command{wget}'s
-progress is also printed both to stderr and @file{feeds/FEED/encs.log}.
-
-Of course you can also download only single feed's enclosures:
-
-@example
-$ cmd/encs.zsh path/to/FEED [optional overriden destination directory]
-@end example
-
-@end table
diff --git a/doc/usage/browse.texi b/doc/usage/browse.texi
new file mode 100644
index 0000000..756e626
--- /dev/null
+++ b/doc/usage/browse.texi
@@ -0,0 +1,54 @@
+@node Browse
+@section Browse
+
+Generate @file{mutt.rc} and run it with:
+
+@example
+$ ./feeds-browse.sh
+@end example
+
+@file{mutt.rc} should contain all feeds' mailboxes with human-readable
+labels/titles.
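+
+Roughly speaking, that file boils down to a list of @code{mailboxes}
+entries, one per feed directory, something like the following sketch
+(the actual generated file also carries the titles and helper macros,
+so treat this only as an illustration):
+
+@example
+mailboxes feeds/blog.stargrave.org_russian_feed.atom
+mailboxes feeds/monsterfeet.com_grue.rss
+@end example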
+
+It also contains @code{auto_view text/html}, which expects a proper
+@file{mailcap} configuration file with a @code{text/html} entry to
+exist. Mutt has some built-in default search paths for it, but you can
+override them with the @env{$MAILCAPS} environment variable. There is
+an example in @file{contrib/mailcap}.
+
+Mutt will be started in mailbox browser mode (I will skip many entries):
+
+@verbatim
+  1 N [  1|101] 2021-02-17 20:41 Cryptology ePrint Archive/
+  3   [  0|  8] 2021-12-02 19:28 Thoughts/
+ 32   [  0|  8] 2021-02-17 19:32 apenwarr/
+101   [ 10| 50] 2021-02-14 13:40 Блог Stargrave на русском comments/
+102   [  0| 51] 2021-02-17 19:37 Блог Stargrave на русском/
+316   [  0| 44] 2021-02-17 19:33 Eaten By A Grue: Infocom, Text Adventures, and Interactive Fiction/
+@end verbatim
+
+ePrint has new entries since the last downloading/parsing. Stargrave's blog
+comments have nothing new, but still ten unread entries.
+
+If we open the "Eaten By A Grue" mailbox, then we will see its entries:
+
+@verbatim
+  1 [2021-01-30 11:00] Zork Zero: The Revenge of Megaboz (0,8K)
+  2 [2021-06-12 11:01] Journey: The Quest Begins (0,8K)
+  3 [2021-04-28 11:00] Eaten By A Cruise (0,8K)
+[...]
+---Mutt: feeds/monsterfeet.com_grue.rss [Nachr:44 60K]---
+@end verbatim
+
+Press @code{q} to return to the mailbox browser again. This is made for
+convenience, because you will often switch your mailboxes (feeds), but
+@code{q} quits Mutt by default.
+
+Press @code{A} to mark all messages read. Again, this is made for
+convenience. It will mark both new (@strong{N}) and old-but-unread
+(@strong{O}) messages as read. You will see tag-marks on the left of
+each message showing what was touched.
+
+Press @code{o} in pager mode to open link and enclosure URLs. Your
+message will be piped to @command{cmd/x-urlview.sh}, which will show all
+@code{X-URL} and @code{X-Enclosure} links.
diff --git a/doc/usage/build.texi b/doc/usage/build.texi
new file mode 100644
index 0000000..a5ec5da
--- /dev/null
+++ b/doc/usage/build.texi
@@ -0,0 +1,9 @@
+@node Build
+@section Build
+
+Compile the @command{feed2mdir} utility with the @url{https://go.dev/, Go} compiler.
+It may also download several dependencies.
+
+@example
+$ ( cd cmd/feed2mdir ; go build )
+@end example
diff --git a/doc/usage/clear.texi b/doc/usage/clear.texi
new file mode 100644
index 0000000..d525aba
--- /dev/null
+++ b/doc/usage/clear.texi
@@ -0,0 +1,30 @@
+@node Clear
+@section Clear
+
+Clear the excess messages with:
+
+@example
+$ ./feeds-clear.zsh
+$ cmd/clear.zsh feeds/FEED # to clear single feed
+@end example
+
+By default (@env{$FEEDER_MAX_ITEMS}) only 100 entries are processed.
+The parser only appends posts, but does not remove obsolete ones.
+
+You can set that limit on a per-feed basis:
+
+@example
+$ echo 50 > feed/FEED/max
+@end example
+
+@strong{0} means no limit: keep all the messages.
+
+Pay attention that the @file{new/} directory is not touched, so you
+won't lose completely new and unread messages when you are on vacation
+and have left @command{cron}-ed workers running.
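+
+For example, such @command{cron}-ed workers could look like the
+following sketch (the schedule and the checkout path are assumptions,
+adjust them to your setup):
+
+@example
+*/30 * * * * cd $HOME/feeder && ./feeds-dnp.zsh
+0 5 * * *    cd $HOME/feeder && ./feeds-clear.zsh
+@end example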
+
+To clean the download state for some reason:
+
+@example
+$ cmd/download-clean.sh feed/FEED
+@end example
diff --git a/doc/usage/download.texi b/doc/usage/download.texi
new file mode 100644
index 0000000..e7cae31
--- /dev/null
+++ b/doc/usage/download.texi
@@ -0,0 +1,13 @@
+@node Download
+@section Download
+
+Download your feed data with:
+
+@example
+$ cmd/download.sh feeds/blog.stargrave.org_russian_feed.atom
+$ ./feeds-download.zsh # to invoke parallel downloading of everything
+@end example
+
+It uses @command{curl}, which is aware of @code{If-Modified-Since} and
+@code{ETag} headers, compressed content encodings and HTTP redirections.
+If you want to see verbose output, then set @env{FEEDER_CURL_VERBOSE=1}.
diff --git a/doc/usage/encs.texi b/doc/usage/encs.texi
new file mode 100644
index 0000000..77cc9af
--- /dev/null
+++ b/doc/usage/encs.texi
@@ -0,0 +1,32 @@
+@node Enclosures
+@section Enclosures
+
+Many feeds include links to so-called enclosures, like audio files for
+podcasts. While your mail is not yet processed by the MUA, its @file{new/}
+messages are still there, so you can run the enclosure downloading process,
+which uses @url{https://www.gnu.org/software/wget/, GNU Wget}. Each
+enclosure's filename is more or less filesystem-friendly and contains the
+current timestamp.
+
+@example
+$ ./feeds-encs.zsh
+[...]
+monsterfeet.com_grue.rss/encs/20220218-152822-traffic.libsyn.com_monsterfeet_grue_018.mp3
+www.astronews.ru_astronews.xml/encs/20220219-115710-www.astronews.ru_news_2022_20220216125238.jpg
+[...]
+$ file feeds/**/encs/*
+monsterfeet.com_grue.rss/encs/20220218-152822-traffic.libsyn.com_monsterfeet_grue_018.mp3:
+ Audio file with ID3 version 2.2.0, contains:MPEG ADTS, layer III, v1, 96 kbps, 44.1 kHz, Monaural
+www.astronews.ru_astronews.xml/encs/20220219-115710-www.astronews.ru_news_2022_20220216125238.jpg:
+ JPEG image data, JFIF standard 1.01, ...
+@end example
+
+@command{feeds-encs.zsh} does not parallelize jobs, because enclosures are
+often heavy enough to saturate your Internet link. @command{wget}'s
+progress is also printed both to stderr and to @file{feeds/FEED/encs.log}.
+
+Of course you can also download only a single feed's enclosures:
+
+@example
+$ cmd/encs.zsh path/to/FEED [optional overridden destination directory]
+@end example
diff --git a/doc/usage/feedsdir.texi b/doc/usage/feedsdir.texi
new file mode 100644
index 0000000..0518009
--- /dev/null
+++ b/doc/usage/feedsdir.texi
@@ -0,0 +1,22 @@
+@node FeedsDir
+@section Feeds directory
+
+Create feeds state directories under @file{feeds/}. You can do it
+manually:
+
+@example
+$ mkdir -p feeds/my_first_feed/@{cur,new,tmp@}
+$ echo http://example.com/feed.atom > feeds/my_first_feed/url
+@end example
+
+Or you can convert a Newsboat @file{urls} file (containing many lines
+with URLs) into that subdirectories hierarchy with @command{urls2feeds.zsh}:
+
+@example
+$ ./urls2feeds.zsh < ~/.newsboat/urls
+$ cat feeds/blog.stargrave.org_russian_feed.atom/url
+http://blog.stargrave.org/russian/feed.atom
+@end example
+
+@command{urls2feeds.zsh} won't touch already existing directories and will
+warn if any of them disappears from @file{urls}.
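+
+For reference, a Newsboat @file{urls} file is just plain text with one
+feed URL at the start of each line, so a minimal input would look like
+this (URLs taken from the examples above):
+
+@example
+http://example.com/feed.atom
+http://blog.stargrave.org/russian/feed.atom
+@end example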
diff --git a/doc/usage/index.texi b/doc/usage/index.texi
new file mode 100644
index 0000000..d585174
--- /dev/null
+++ b/doc/usage/index.texi
@@ -0,0 +1,16 @@
+@node Usage
+@unnumbered Usage
+
+@include usage/install.texi
+@include usage/build.texi
+@include usage/feedsdir.texi
+@include usage/options.texi
+@include usage/download.texi
+@include usage/parse.texi
+@include usage/news.texi
+@include usage/browse.texi
+@include usage/reindex.texi
+@include usage/search.texi
+@include usage/clear.texi
+@include usage/encs.texi
+@include usage/warcs.texi
diff --git a/doc/usage/install.texi b/doc/usage/install.texi
new file mode 100644
index 0000000..ffe5dc0
--- /dev/null
+++ b/doc/usage/install.texi
@@ -0,0 +1,13 @@
+@node Install
+@section Install
+
+Currently I am too lazy to create tarballs. Just get its source code:
+
+@example
+$ git clone git://git.stargrave.org/feeder.git
+@end example
+
+You can also use
+@url{https://git.stargrave.org/feeder.git},
+@url{http://git.stargrave.org/feeder.git},
+@url{http://y.git.stargrave.org/feeder.git} URLs instead.
diff --git a/doc/usage/news.texi b/doc/usage/news.texi
new file mode 100644
index 0000000..8bcc8df
--- /dev/null
+++ b/doc/usage/news.texi
@@ -0,0 +1,15 @@
+@node NewPosts
+@section NewPosts
+
+Quick overview of feeds with new posts:
+
+@example
+$ ./feeds-news.zsh
+habr.com_ru_rss_interesting: 7
+habr.com_ru_rss_news: 3
+lobste.rs_rss: 3
+naked-science.ru_?yandex_feed=news: 1
+planet.fsfe.org_atom.xml: 1
+www.astronews.ru_astronews.xml: 1
+www.darkside.ru_news_rss: 5
+@end example
diff --git a/doc/usage/options.texi b/doc/usage/options.texi
new file mode 100644
index 0000000..29d775b
--- /dev/null
+++ b/doc/usage/options.texi
@@ -0,0 +1,8 @@
+@node Options
+@section Options
+
+There are a few configuration options taken from @file{cmd/env.rc}. You can
+override them either with environment variables, or by editing that file
+directly. You can override the @command{curl}, @command{wget},
+@command{zstd} and @command{parallel} command invocations, the
+@code{User-Agent}, the number of download/parse jobs run in parallel and so on.
diff --git a/doc/usage/parse.texi b/doc/usage/parse.texi
new file mode 100644
index 0000000..874dc4d
--- /dev/null
+++ b/doc/usage/parse.texi
@@ -0,0 +1,15 @@
+@node Parse
+@section Parse
+
+Parse your feeds with:
+
+@example
+$ cmd/parse.sh feeds/blog.stargrave.org_russian_feed.atom
+$ ./feeds-parse.zsh # to parse all feeds in parallel
+@end example
+
+You can also download and parse the feeds at once:
+
+@example
+$ ./feeds-dnp.zsh
+@end example
diff --git a/doc/usage/reindex.texi b/doc/usage/reindex.texi
new file mode 100644
index 0000000..2686bf9
--- /dev/null
+++ b/doc/usage/reindex.texi
@@ -0,0 +1,12 @@
+@node Reindex
+@section Reindex
+
+(re)Index your messages with:
+
+@example
+$ ./feeds-index.sh
+@end example
+
+That will create the @file{mu/} and @file{search/} directories and run
+@command{mu index} indexing, which can safely be done incrementally
+after each download/parse cycle.
diff --git a/doc/usage/search.texi b/doc/usage/search.texi
new file mode 100644
index 0000000..7468bd1
--- /dev/null
+++ b/doc/usage/search.texi
@@ -0,0 +1,24 @@
+@node Search
+@section Search
+
+Press @code{} in Mutt's index and enter your mu/Xapian search query.
+Let's search for articles mentioning
+@url{https://en.wikipedia.org/wiki/Planetfall, Planetfall} during the
+2019-2021 period: @code{Planetfall date:2019..2021}. @command{mu} will
+create symbolic links to the found messages in the @file{search/} subdirectory.
+Press @code{} to switch to that mailbox:
+
+@verbatim
+  1 [2021-12-20 07:08] Missed Classic: Stationfall - When Food Dispensers Attack (The Adventurers Guild)
+  2 [2021-11-20 04:52] Missed Classic 102: Stationfall - Introduction (1987) (The Adventurers Guild)
+  3 [2021-11-19 17:54] Boffo Games (The Digital Antiquarian)
+  4 [2021-10-30 23:05] Missed Classic 100: The Manhole (1988) (The Adventurers Guild)
+  5 [2020-05-17 22:16] Round 04 Reveal (unWinnable State)
+  6 [2020-05-16 22:29] Round 03 Reveal (unWinnable State)
+  7 [2020-04-20 11:00] Planetfall (Eaten By A Grue: Infocom, Text Adventures, and Interactive Fiction)
+  8 [2020-04-09 11:00] Beyond Zork (Eaten By A Grue: Infocom, Text Adventures, and Interactive Fiction)
+-%-Mutt: =search [Nachr:8 215K]---
+@end verbatim
+
+Pay attention that the index format here is different: it lacks the
+unnecessary message flags and adds the name of the feed in parentheses.
diff --git a/doc/warcs.texi b/doc/usage/warcs.texi
similarity index 93%
rename from doc/warcs.texi
rename to doc/usage/warcs.texi
index 0c179f5..77ccb6f 100644
--- a/doc/warcs.texi
+++ b/doc/usage/warcs.texi
@@ -1,5 +1,5 @@
 @node WARCs
-@unnumbered WARCs
+@section WARCs
 
 Similarly to @ref{Enclosures, enclosures} downloading, you may run
 downloading of @code{X-URL} URLs, pointing to the article itself. If it
@@ -28,7 +28,7 @@ $ for w (feeds/*/warcs/*.warc) print $w:a > path/to/tofuproxy/fifos/add-warcs
 @end example
 
 And then visit @url{http://warc/} URL (when @command{tofuproxy} already
 acts as a proxy) to view and visit existing URLs.
 
-Of course you can also download only single feed's enclosures:
+Of course you can download only a single feed:
 
 @example
 $ cmd/warcs.zsh path/to/FEED [optional overriden destination directory]
-- 
2.44.0