Eric Wong [Sun, 12 Sep 2021 07:47:15 +0000 (07:47 +0000)]
init: set a useful description
"Unnamed repository" for v1 inboxes was misleading, and having a
non-existent description for v2 was equally annoying, so set a
short description based on the primary address.
We remove descriptions when setting up new test inboxes to
preserve the behavior of the t/lei-mirror.t test case.
Eric Wong [Sun, 12 Sep 2021 07:47:12 +0000 (07:47 +0000)]
new public-inbox-{clone,fetch} commands
Setting up and maintaining git-only mirrors of v2 inboxes is
complex since multiple commands are required to clone and fetch
into epochs.
Unlike grokmirror, these commands do not require any
configuration. Instead, they rely on existing git config files
and work like "git clone --mirror" and "git fetch",
respectively.
Like grokmirror, they use manifest.js.gz, but only on a
per-inbox basis so users won't have to clone every inbox of a
large instance nor edit config files to include/exclude inboxes
they're interested in.
Eric Wong [Sat, 11 Sep 2021 23:30:46 +0000 (23:30 +0000)]
www: use ->ALL for per-inbox manifest.js.gz, too
With 11 epochs on LKML, the lkml/manifest.js.gz response time
goes from around 60ms to around 10ms, a significant improvement.
And improve test coverage while we're at it.
When generating per-inbox manifests, we were forgetting to
clean up per-epoch "git cat-file --batch" processes. Our
previous method of generating modified times was also stupidly
inefficient, so replace the pipeline with a single
"git for-each-ref" invocation.
Eric Wong [Sat, 11 Sep 2021 08:33:19 +0000 (08:33 +0000)]
lei q|lcat: support "-f reply" output format
When composing replies in "git format-patch" cover letters,
I'd been relying on "lei q -f text ...", but that still requires
several steps to make it suitable for composing a reply:
* s/^/> / to quote the body
* drop existing In-Reply-To+References
* s/^Message-ID:/In-Reply-To:/;
* add an attribute line
...
"lei q -f reply" takes care of most of that and users will
only have to trim "From " lines, unnecessary results and
over-quoted text (and trimming is likely less error-prone
than doing all the steps above manually).
This should also be a good replacement for
"git format-patch --in-reply-to=...", since copying long
Message-IDs can be error-prone (and this lets you include
quoted text in replies).
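The manual steps listed above can be sketched roughly as follows.
This is only an illustration in Python, not the code behind
"lei q -f reply": the to_reply() name and the exact attribution
wording are assumptions, and only single-part text messages are
handled.

```python
from email import message_from_string


def to_reply(raw):
    """Build a reply skeleton from one raw message (hypothetical helper)."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        raise ValueError("sketch only handles single-part messages")
    out = []
    # s/^Message-ID:/In-Reply-To:/; the old In-Reply-To and References
    # headers are dropped simply by never copying them.
    mid = msg["Message-ID"]
    if mid:
        out.append("In-Reply-To: " + mid)
    out.append("")
    # attribution line (wording is an assumption)
    out.append("%s wrote:" % msg.get("From", "someone"))
    # s/^/> / to quote the body
    for line in msg.get_payload().splitlines():
        out.append("> " + line)
    return "\n".join(out) + "\n"
```

Trimming over-quoted text is still left to the user, as with the
real command.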
Eric Wong [Sat, 11 Sep 2021 00:19:17 +0000 (00:19 +0000)]
lei: normalize whitespace in remote queries
Having redundant "+" in URLs is ugly and can hurt cacheability
of queries. Even with "quoted phrase searches", Xapian seems
unaffected by redundant spaces, so just normalize the ASCII
white spaces to ' ' (%20) when fed via STDIN or saved-search
config file.
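A minimal sketch of the normalization, in Python for illustration.
The function names are hypothetical, and trimming leading/trailing
space is an assumption not stated above.

```python
import re
from urllib.parse import quote

# ASCII white space only; other Unicode spaces are left untouched
_ASCII_WS = re.compile(r"[ \t\n\r\f\v]+")


def normalize_query(q):
    """Collapse each run of ASCII white space into a single ' '."""
    return _ASCII_WS.sub(" ", q).strip(" ")  # strip() is an assumption


def query_param(q):
    """Percent-encode the normalized query; ' ' becomes %20, not '+'."""
    return quote(normalize_query(q), safe="")
```

Two queries differing only in spacing then map to the same URL,
which is what helps cacheability.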
Eric Wong [Fri, 10 Sep 2021 13:10:04 +0000 (13:10 +0000)]
INSTALL: depend on URI rather than URI::Escape
As far as I can tell, URI::Escape has always been a part of the
`URI' package (aka "distribution" on CPAN) and not distributed
separately (unlike URI::Escape::XS). So avoid confusing users
with `URI::Escape' and just document `URI' instead.
Along the same lines, we depend on the `Plack' package rather
than Plack::Util or Plack::Builder, after all.
Eric Wong [Fri, 10 Sep 2021 11:46:53 +0000 (11:46 +0000)]
lei up: only delay non-zero "# $NR written to ..."
"# 0 written to $FOLDER" messages aren't important to the
user, so we can show them in real time and allow them to
be lost in the terminal scroll. When >0 messages are
written to a folder, we'll show them last so a user
will know which folders to open with their MUA.
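Roughly, the idea looks like the sketch below (Python, for
illustration only; the WrittenReport class and its method names are
hypothetical and do not mirror lei internals).

```python
import sys


class WrittenReport:
    """Show "# 0 written" lines at once; hold non-zero ones until the end."""

    def __init__(self, err=sys.stderr):
        self.err = err
        self.delayed = []

    def written(self, nr, folder):
        line = "# %d written to %s\n" % (nr, folder)
        if nr == 0:
            self.err.write(line)       # unimportant, may scroll away
        else:
            self.delayed.append(line)  # important, shown last

    def finish(self):
        for line in self.delayed:
            self.err.write(line)
        self.delayed.clear()
```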
Eric Wong [Fri, 10 Sep 2021 09:08:49 +0000 (09:08 +0000)]
lei: do not read ~/.netrc by default
Since ~/.netrc isn't widely used by most (if any) NNTP and IMAP
clients, we won't read it by default for lei. AFAIK, ~/.netrc
is mainly used by FTP clients (e.g. ftp(1) and lftp(1)). wget uses
it by default for HTTP(S) (and FTP), but curl does not.
To avoid breaking stable release use cases, public-inbox-watch
continues to read ~/.netrc by default.
The --netrc switch is supported by all existing lei commands
which may use curl.
Eric Wong [Fri, 10 Sep 2021 09:15:36 +0000 (09:15 +0000)]
lei add-external --mirror: quiet unlink error on ENOENT
If the mirror.done file doesn't exist for unlink, it's because
we already got another error, so don't confuse users by noting
an unlink error since the ENOENT is expected in the face of
other errors.
Eric Wong [Fri, 10 Sep 2021 05:51:00 +0000 (05:51 +0000)]
lei add-external --mirror: deduce paths for PSGI mount prefixes
The current manifest.js.gz generation in WWW doesn't account for
PSGI mount prefixes (and grokmirror 1.x appears to work fine).
In other words, <https://yhbt.net/lore/lkml/manifest.js.gz>
currently has keys like "/lkml/git/0.git" and not
"/lore/lkml/git/0.git" where "/lore" is the PSGI mount prefix.
This works fine with the prefix accounted for in my grokmirror
(1.x) repos.conf like this:
site = https://yhbt.net/lore/
manifest = https://yhbt.net/lore/manifest.js.gz
Adding the PSGI mount prefix in manifest.js.gz is probably not
desirable since it would force the prefix into the locally
cloned path by grokmirror, and all the cloned directories
would have the remote PSGI mount prefix prepended to the
toplevel.
So, "lei add-external --mirror" needs to account for PSGI
mount prefixes by deducing the prefix based on available keys
in the manifest.js.gz hash table.
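One way to deduce the prefix, sketched in Python. This is an
assumption about the approach, not the actual lei code, and
deduce_prefix() is a hypothetical name: strip leading components
from the URL path until some manifest key matches what is left.

```python
def deduce_prefix(url_path, manifest):
    """Return the PSGI mount prefix, "" if there is none, or None."""
    parts = [p for p in url_path.split("/") if p]
    for i in range(len(parts)):
        candidate = "/" + "/".join(parts[i:])
        for key in manifest:
            # a key is either the inbox itself or an epoch beneath it
            if key == candidate or key.startswith(candidate + "/"):
                return ("/" + "/".join(parts[:i])) if i else ""
    return None  # no key corresponds to this inbox
```

With keys like "/lkml/git/0.git" and a URL path of "/lore/lkml/",
the first candidate "/lore/lkml" matches nothing, the second
candidate "/lkml" does, so the deduced prefix is "/lore".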
Eric Wong [Thu, 9 Sep 2021 05:25:05 +0000 (05:25 +0000)]
net_reader: support Mail::IMAPClient Ignoresizeerrors
Some proprietary servers may do wacky things and give the
wrong size, so Mail::IMAPClient has a knob for this which
we can expose to users to work around this.
Eric Wong [Thu, 9 Sep 2021 05:25:03 +0000 (05:25 +0000)]
net_reader: combine Net::NNTP and IMAPClient args
Since these are keyed by IMAP and NNTP URIs which can never
conflict, it simplifies our internals to keep them in one big
hash since we'll add POP3 and JMAP client support.
Eric Wong [Thu, 9 Sep 2021 05:25:02 +0000 (05:25 +0000)]
net_reader: imap_opt => cfg_opt
Since our internal IMAP options are keyed by URI section,
there's no need to have separate hashes for NNTP and IMAP
options since the URI already distinguishes them.
This will make future changes to support POP3 and JMAP and
arg caching with lei/store easier.
Eric Wong [Thu, 9 Sep 2021 05:25:01 +0000 (05:25 +0000)]
net_reader: nntp_opt => cfg_opt
Since our internal NNTP options are keyed by URI section,
there's no need to have separate hashes for NNTP and IMAP
options since the URI already distinguishes them.
This will make future changes to support POP3 and JMAP and
arg caching with lei/store easier.
Eric Wong [Thu, 9 Sep 2021 05:25:00 +0000 (05:25 +0000)]
net_reader: preserve memoized IMAPClient arg for SOCKS
Multiple invocations of mic_new may happen in long-lived
processes, so do not let mic_new make irreversible changes
to the cached args when using a SOCKS proxy.
Eric Wong [Tue, 7 Sep 2021 22:42:08 +0000 (22:42 +0000)]
doc: acknowledge the MMDF mailbox format
While I don't currently see a point in supporting MMDF, we'll
still acknowledge it since mutt actually supports it. Expand a
bit on MH while we're at it, since MH seems at least relevant.
Eric Wong [Tue, 7 Sep 2021 14:05:48 +0000 (14:05 +0000)]
news_www: avoid uninitialized variables
PATH_INFO may not have enough slashes for newsgroup name in the
URL at all, so ensure we don't try to further process requests
which have no chance of having a newsgroup name.
Eric Wong [Tue, 7 Sep 2021 11:32:10 +0000 (11:32 +0000)]
lei up: support --all for IMAP folders
Since "lei up" is expected to be a heavily-used command,
better support for IMAP seems like a reasonable idea.
This is inefficient because we waste an IMAP(S) TCP connection:
it dies when an auth-only LeiUp worker process dies. Still,
it's better than not working at all right now.
Eric Wong [Mon, 6 Sep 2021 12:58:03 +0000 (12:58 +0000)]
lei_auth: simplify users
There's no need to alias net_merge_all in each WQ class
which uses LeiAuth, `$obj->$sub' works even when `$sub'
is a fully-qualified subroutine name with `::' in it.
perlobj(1) documents it under "Method Call Variations".
Eric Wong [Sat, 4 Sep 2021 21:36:58 +0000 (21:36 +0000)]
lei_to_mail+mbox_reader: fix handling of empty/bogus emails
We may be handling invalid mboxes, so just return no objects in
that case. While "lei q" on HTTP(S) externals expects a gzipped
mboxrd, there's always a chance something else gzipped can be
sent to us.
There are also changes to lei_to_mail to better handle emails
which lack a body and/or headers (e.g. t/solve/bare.patch).
Eric Wong [Fri, 3 Sep 2021 08:54:27 +0000 (08:54 +0000)]
lei: fix read/write IMAP access
xt/net_writer-imap.t was completely broken in recent months and
I completely forgot this test. net->add_url still only accepts
bare scalars (and not scalar refs), so we must set that up
properly. Furthermore, our changes to do FLAGS-only
synchronization in lei of old messages were causing us to not
handle FLAGS properly for the test.
Eric Wong [Fri, 3 Sep 2021 08:54:26 +0000 (08:54 +0000)]
lei_xsearch: avoid false-positives on externals w/ L: and kw:
We need to use LeiSearch->qparse_new to handle (and filter out)
"L:" and "kw:" search prefixes to avoid hitting false positives
when externals are involved. Unfortunately, this doesn't work
for remote HTTP(S) externals, but those aren't enabled by
default.
Eric Wong [Fri, 3 Sep 2021 08:54:24 +0000 (08:54 +0000)]
lei up --all: avoid double-close on shared STDOUT
This is merely to avoid perl setting errors internally which
were not user visible. The double-close wasn't a problem in
practice since we open a new file handle for the mbox or
mbox.gz anyways, so the new t/lei-up.t test case shows no
regressions nor fixes.
Eric Wong [Fri, 3 Sep 2021 08:54:22 +0000 (08:54 +0000)]
lei: ->child_error less error-prone
I was calling "child_error(1, ...)" in a few places where I meant
to be calling "child_error(1 << 8, ...)" and inadvertently
triggering SIGHUP in script/lei. Since giving a zero exit code
to child_error makes no sense, just allow falsy values to
default to 1 << 8.
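To spell out why the shift matters: in a wait(2)-style status (as
with Perl's $?), the low 7 bits hold the terminating signal and the
exit code lives in the high byte. The Python sketch below only
illustrates that layout; child_status() and describe() are
hypothetical names, not the lei API.

```python
def child_status(status=0):
    """Falsy values default to exit code 1, encoded as 1 << 8."""
    return status or (1 << 8)


def describe(status):
    """Decode a wait(2)-style status word."""
    sig = status & 0x7F   # low 7 bits: terminating signal
    code = status >> 8    # high byte: exit code
    if sig:
        return "killed by signal %d" % sig
    return "exited with code %d" % code
```

So a bare 1 reads as "killed by signal 1" (SIGHUP), while 1 << 8
reads as "exited with code 1".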
Eric Wong [Fri, 3 Sep 2021 08:54:21 +0000 (08:54 +0000)]
lei/store: quiet down link(2) warnings
ENOENT can be too common due to timing and concurrent access
from MUAs and "lei export-kw", and other mail synchronization
tools (e.g. mbsync and offlineimap).
Eric Wong [Fri, 3 Sep 2021 08:54:20 +0000 (08:54 +0000)]
lei: dump errors to syslog, and not to CLI
Dumping errors from the previous run can often get lost, so just
spew to syslog since it's a standard place to put errors that
don't make it to a client. Note: we don't rely on $SIG{__WARN__}
since some of the Net:: stuff will write directly to STDERR
(as will external processes).
Eric Wong [Thu, 2 Sep 2021 22:36:47 +0000 (22:36 +0000)]
tests: "make check-run" favors reliability over speed
Sharing a single lei-daemon across multiple processes still
exhibits reliability problems, and reliably checking
lei-daemon's inotify internals seems impossible without.
Even without lei-daemon sharing, "make check-run" is a few
seconds faster than "make check" for me.
Eric Wong [Thu, 2 Sep 2021 10:17:58 +0000 (10:17 +0000)]
lei: propagate keyword changes from lei/store
This works with existing inotify/EVFILT_VNODE functionality to
propagate changes made from one Maildir to another Maildir.
I chose the lei/store worker process to handle this since
propagating changes back into lei-daemon on a massive scale
could lead to dead-locking while both processes are attempting
to write to each other. Eliminating IPC overhead is a nice
side effect, but could hurt performance if Maildirs are slow.
The code for "lei export-kw" is significantly revamped to match
the new code used in the "lei/store" daemon. It should be more
correct w.r.t. corner-cases and stale entries, but perhaps
better tests need to be written.
squashed:
t/lei-auto-watch: increase delay for FreeBSD kevent
My FreeBSD VM seems to need longer for this test than inotify
under Linux, likely because the kevent support code needs to be
more complicated.
Eric Wong [Thu, 2 Sep 2021 10:17:56 +0000 (10:17 +0000)]
lei_mail_sync: do not use transactions
For lei-index to work in parallel with MUA access and upcoming
inotify-based updates, mail_sync.sqlite3 needs to always be
up-to-date to read-only worker processes (ahead of everything
else). So rely on the default auto-commit behavior and hope
SQLite WAL can reduce some of the overheads involved with
writes.
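For illustration, the same combination (auto-commit plus WAL) looks
like this with Python's sqlite3 module. The table name and the
open_sync_db() helper are hypothetical and unrelated to the real
mail_sync.sqlite3 schema.

```python
import sqlite3


def open_sync_db(path):
    """Open a database where every write is visible to readers at once."""
    # isolation_level=None disables the module's implicit transactions,
    # so each statement is committed as soon as it finishes
    db = sqlite3.connect(path, isolation_level=None)
    # WAL lets readers proceed while a writer is active
    db.execute("PRAGMA journal_mode=WAL")
    db.execute("CREATE TABLE IF NOT EXISTS seen (name TEXT PRIMARY KEY)")
    return db
```

A separate read-only connection sees each INSERT immediately, with
no explicit commit from the writer.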
Eric Wong [Wed, 1 Sep 2021 00:17:32 +0000 (00:17 +0000)]
extindex: --gc removes messages from over, too
While messages from removed inboxes were removed from Xapian
search, --gc failed to remove messages from over.sqlite3
entirely. They no longer show up in the topic summary view.
Eric Wong [Mon, 30 Aug 2021 23:44:53 +0000 (23:44 +0000)]
www_text/mirror: spell out "external index" and "public inbox"
"extindex" and "public-inbox" are project-specific terms which
are probably unsuitable for folks who are seeing this for the
first time.
Use "public inbox" when referring to actual public inboxes,
since "public-inbox" is merely the name for this particular
implementation and others have adopted the same concept (IMHO
the concept is more important than any particular
implementation).
Eric Wong [Mon, 30 Aug 2021 23:44:52 +0000 (23:44 +0000)]
www_stream: extra link to mirroring information in the footer
This may be redundant with the "mirror" link at the top right,
but maybe people will miss one. Properly capitalize the
"Code repositories" text while we're at it.
Eric Wong [Sat, 28 Aug 2021 11:50:07 +0000 (11:50 +0000)]
www_stream: description header links to top $INBOX_URL
Making the inbox description link back to the most recent
per-inbox topics from text/ and $OID/s/ URLs seems useful,
rather than keeping the description up there.
Followup-to: 6c853f5256f3a324 ("www: improve navigation around contemporary threads")
Eric Wong [Sat, 28 Aug 2021 11:50:06 +0000 (11:50 +0000)]
www: move mirror instructions to /text/
This makes the mirroring and code retrieval instructions less
obstructive. Relying on WwwText means we only use our Linkify
module to make hrefs of full URLs; making relative and shortened
hrefs off-limits; hopefully this isn't too much of a problem.
coderepo information remains duplicated on every page since
(IMHO) coderepos are an important feature; but nobody besides me
has ever bothered to configure coderepos, so I suppose it's
fine...
Eric Wong [Fri, 27 Aug 2021 12:08:45 +0000 (12:08 +0000)]
www_listing: fix odd "locate inbox" cases
Searching inboxes with an empty query no longer gives 500 errors
due to Xapian. Also, improve the error message when no inboxes
match, since saying no inboxes exist yet is wrong.
Eric Wong [Thu, 26 Aug 2021 12:33:34 +0000 (12:33 +0000)]
www: avoid incorrect instructions for extindex
There's no way to clone an extindex, since there's no git
storage associated with them. So attempt to link to the
HTML listing of public-inboxes, instead.
Eric Wong [Thu, 26 Aug 2021 12:33:33 +0000 (12:33 +0000)]
www_stream: sh-friendly .onion URLs wrapping
The long v3 .onion URL was causing havoc on small mobile
displays, so extract "hostname" into a variable which can
still be used as a Bourne shell snippet.
While we're at it, include "torsocks" in the git command used
for .onion URLs since that's the (near)-universal wrapper for
Tor-ifying things (like git) which are dynamically linked to
libc.
Eric Wong [Thu, 26 Aug 2021 12:33:31 +0000 (12:33 +0000)]
get rid of unnecessary bytes::length usage
The only place where we could return wide characters with -httpd
was the raw $INBOX_DIR/description text, which is now converted
to octets.
All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode,
so length() and bytes::length() are equivalent on reads. For
socket writes, any non-octet data would warn about wide characters
and we are strict in warnings with test_httpd.
All gzipped buffers are also octets, as is PublicInbox::Eml->body,
and anything from PerlIO objects ("git cat-file --batch" output,
filesystems), so bytes::length was unnecessary in all those places.
Eric Wong [Wed, 18 Aug 2021 11:41:02 +0000 (11:41 +0000)]
wwwlisting: support global CSS in HTML view
Since CSS can be overridden by a static webserver on a per-inbox
basis, we need a similar pattern to deal with the instance-wide
WwwListing HTML. "/+/" probably won't conflict with any current
nor future public inbox names.
I don't think it'll cause problems with common linkifiers or URL
extractors, either (and it's unlikely anybody would want to
share URLs of just the CSS in a plain text(-like) format).
Eric Wong [Wed, 25 Aug 2021 08:40:39 +0000 (08:40 +0000)]
lei up: improve --all=local stderr output
The "# $NR written to $DEST ($total matches)" messages are
arguably the most useful output of "lei up --all=local",
but they get intermixed with progress messages from various
workers. Queue up these finalization messages and only spit
them out on ->DESTROY.
Eric Wong [Tue, 24 Aug 2021 22:49:24 +0000 (22:49 +0000)]
imap+nntp: die loudly if ->mm or ->over disappear
While the WWW front-end can gracefully handle ->mm and ->over
disappearing (in most cases), IMAP+NNTP front-ends are completely
dependent on these and failed mysteriously when they go missing
after startup.
These will hopefully make issues like what Konstantin
encountered more obvious:
Eric Wong [Tue, 24 Aug 2021 13:06:39 +0000 (13:06 +0000)]
lei: non-blocking lei/store->done in lei-daemon
This allows client sockets to wait for "done" commits to
lei/store while the daemon reacts asynchronously. The goal
of this change is to keep the script/lei client alive until
lei/store commits changes to the filesystem, but without
blocking the lei-daemon event loop. It depends on Perl
refcounting to close the socket.
This change also highlighted our over-use of "done" requests to
lei/store processes, which is now corrected so we only issue it
on collective socket EOF rather than upon reaping every single
worker.
This also fixes "lei forget-mail-sync" when it is the initial
command.
This took several iterations and much debugging to arrive at the
current implementation:
1. The initial iteration of this change utilized socket passing
from lei-daemon to lei/store, which necessitated switching
from faster pipes to slower Unix sockets.
2. The second iteration switched to registering notification sockets
independently of "done" requests, but that could lead to early
wakeups when "done" was requested by other workers. This
appeared to work most of the time, but suffered races under
high load which were difficult to track down.
Finally, this iteration passes the stringified socket GLOB ref
to lei/store which is echoed back to lei-daemon upon completion
of that particular "done" request.
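The final approach can be pictured with the toy model below. It is
plain Python with no real sockets or IPC, and every name in it is
hypothetical. The point is only that each "done" request carries its
own token, so a completion wakes exactly the client that asked.

```python
class Store:
    """Stand-in for lei/store."""

    def __init__(self):
        self.commits = 0

    def done(self, token):
        self.commits += 1  # stand-in for committing to the filesystem
        return token       # echo the token back unchanged


class Daemon:
    """Stand-in for lei-daemon."""

    def __init__(self):
        self.waiting = {}

    def request_done(self, client):
        # analogous to a stringified socket GLOB ref
        token = "client-%x" % id(client)
        self.waiting[token] = client
        return token

    def on_echo(self, token):
        # Perl would rely on refcounting; the close is explicit here
        client = self.waiting.pop(token, None)
        if client is not None:
            client.close()
```

A "done" requested on behalf of one client therefore cannot wake
another one early, which was the race in the second iteration.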
Eric Wong [Thu, 19 Aug 2021 01:36:38 +0000 (01:36 +0000)]
lei q: make --save the default
Since "lei up" is more often useful than not and incurs negligible
overhead, enable --save by default and allow --no-save to work.
This also fixes a long-standing bug when overwriting --output
destinations with saved searches: dedupe data from previous
searches is reset and no longer influences the new (changed)
search, so results no longer go missing if two sequential
invocations of "lei q --save" point to the same --output.