Eric Wong [Tue, 7 Sep 2021 11:32:10 +0000 (11:32 +0000)]
lei up: support --all for IMAP folders
Since "lei up" is expected to be a heavily-used command,
better support for IMAP seems like a reasonable idea.
This is inefficient: we waste an IMAP(S) TCP connection,
since it dies when an auth-only LeiUp worker process dies, but
it's better than not working at all right now.
Eric Wong [Tue, 7 Sep 2021 11:32:09 +0000 (11:32 +0000)]
lei: dump and clear log at exit
This may be helpful for diagnosing errors in case we missed any.
Eric Wong [Tue, 7 Sep 2021 11:32:08 +0000 (11:32 +0000)]
xt/net_writer_imap: test "lei up" on single IMAP output
That's the minimum, at least...
Eric Wong [Mon, 6 Sep 2021 12:58:03 +0000 (12:58 +0000)]
lei_auth: simplify users
There's no need to alias net_merge_all in each WQ class
which uses LeiAuth, `$obj->$sub' works even when `$sub'
is a fully-qualified subroutine name with `::' in it.
perlobj(1) documents it under "Method Call Variations".
Eric Wong [Mon, 6 Sep 2021 12:58:02 +0000 (12:58 +0000)]
lei_auth: remove net_merge_done1 step
It turns out this step is unnecessary, since SOCK_SEQPACKET
ordering is guaranteed and we know wq_broadcast calls will
always be handled sequentially.
Eric Wong [Mon, 6 Sep 2021 12:58:01 +0000 (12:58 +0000)]
lei_auth: diagram for current behavior
Before making potentially major changes, let's clarify readers'
understanding of how LeiAuth currently works.
Eric Wong [Mon, 6 Sep 2021 07:20:12 +0000 (07:20 +0000)]
lei_search: xsmsg_vmd: retry_reopen properly
The deeper eval was preventing retry_reopen from retrying
with readers and writers working in parallel:
FOO=imaps://example.com/INBOX.huge
lei lcat $FOO -f mboxcl | lei tag -F mboxcl +L:bar -
Fixes: c7bcfe6cd6648ff0 ("lei: diagnostics for /Document \d+ not found/ errors")
Eric Wong [Mon, 6 Sep 2021 07:11:53 +0000 (07:11 +0000)]
net_reader: don't approve/reject credentials w/o "fill"
Credentials sourced via ~/.netrc should not be written to
git-credential.
Eric Wong [Sat, 4 Sep 2021 21:36:58 +0000 (21:36 +0000)]
lei_to_mail+mbox_reader: fix handling of empty/bogus emails
We may be handling invalid mboxes, so just return no objects in
that case. While "lei q" on HTTP(S) externals expects a gzipped
mboxrd, there's always a chance something else gzipped can be
sent to us.
There are also changes to lei_to_mail to better handle emails
which lack a body and/or headers (e.g. t/solve/bare.patch)
Link: https://public-inbox.org/meta/20210903151500.h72mzcpqixgtytjs@meerkat.local/
Eric Wong [Fri, 3 Sep 2021 08:54:27 +0000 (08:54 +0000)]
lei: fix read/write IMAP access
xt/net_writer-imap.t was completely broken in recent months and
I completely forgot this test. net->add_url still only accepts
bare scalars (and not scalar refs), so we must set that up
properly. Furthermore, our changes to do FLAGS-only
synchronization in lei of old messages were causing us to not
handle FLAGS properly for the test.
Eric Wong [Fri, 3 Sep 2021 08:54:26 +0000 (08:54 +0000)]
lei_xsearch: avoid false-positives on externals w/ L: and kw:
We need to use LeiSearch->qparse_new to handle (and filter out)
"L:" and "kw:" search prefixes to avoid hitting false positives
when externals are involved. Unfortunately, this doesn't work
for remote HTTP(S) externals, but those aren't enabled by
default.
Eric Wong [Fri, 3 Sep 2021 08:54:25 +0000 (08:54 +0000)]
lei inspect: support reading eml from --stdin
This can be useful inside mutt since I was diagnosing why
a label ("L:$FOO") search was giving me a false-positive
search result...
Eric Wong [Fri, 3 Sep 2021 08:54:24 +0000 (08:54 +0000)]
lei up --all: avoid double-close on shared STDOUT
This is merely to avoid perl setting errors internally which
were not user visible. The double-close wasn't a problem in
practice since we open a new file handle for the mbox or
mbox.gz anyways, so the new t/lei-up.t test case shows no
regressions nor fixes.
Eric Wong [Fri, 3 Sep 2021 08:54:23 +0000 (08:54 +0000)]
lei: use lei->lms in place of lse->lms in a few places
We can golf out some code and refcounts this way.
Eric Wong [Fri, 3 Sep 2021 08:54:22 +0000 (08:54 +0000)]
lei: ->child_error less error-prone
I was calling "child_error(1, ...)" in a few places where I meant
to be calling "child_error(1 << 8, ...)" and inadvertently
triggering SIGHUP in script/lei. Since giving a zero exit code
to child_error makes no sense, just allow falsy values to
default to 1 << 8.
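To illustrate the status encoding behind this fix, here is a minimal Python sketch (not the lei code; `normalize_status` and `decode` are hypothetical names). In a wait(2)-style status the exit code sits in the high byte and the terminating signal in the low 7 bits, so a bare 1 reads as "killed by signal 1" (SIGHUP) rather than "exited with code 1".

```python
SIGHUP = 1  # signal number for SIGHUP on POSIX systems


def normalize_status(status):
    """Hypothetical helper: a falsy status defaults to exit code 1 (1 << 8)."""
    return status or (1 << 8)


def decode(status):
    """Split a wait(2)-style status into (exit_code, signal_number)."""
    return status >> 8, status & 0x7F


# A bare 1 decodes as signal 1, not as exit code 1.
bare = decode(1)
# 1 << 8 decodes as a plain exit code of 1 with no signal.
shifted = decode(1 << 8)
# Zero makes no sense as an error, so it falls back to exit code 1.
fallback = decode(normalize_status(0))
```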
Eric Wong [Fri, 3 Sep 2021 08:54:21 +0000 (08:54 +0000)]
lei/store: quiet down link(2) warnings
ENOENT can be too common due to timing and concurrent access
from MUAs and "lei export-kw", and other mail synchronization
tools (e.g. mbsync and offlineimap).
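A rough Python sketch of the idea, assuming the only error worth silencing is ENOENT (the `try_link` name and the file names are made up for illustration): if the source file vanished because another process renamed or removed it first, report failure quietly and let any other error propagate.

```python
import os
import tempfile


def try_link(src, dst):
    """Hard-link src to dst; return False quietly if src has vanished."""
    try:
        os.link(src, dst)
        return True
    except FileNotFoundError:
        # ENOENT: a concurrent process renamed or removed src first.
        return False


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "msg")
    with open(src, "w") as fh:
        fh.write("example\n")
    linked = try_link(src, os.path.join(tmp, "msg.link"))
    missing = try_link(os.path.join(tmp, "gone"), os.path.join(tmp, "gone.link"))
```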
Eric Wong [Fri, 3 Sep 2021 08:54:20 +0000 (08:54 +0000)]
lei: dump errors to syslog, and not to CLI
Dumping errors from the previous run can often get lost, so just
spew to syslog since it's a standard place to put errors that
don't make it to a client. Note: we don't rely on $SIG{__WARN__}
since some of the Net:: stuff will write directly to STDERR
(as will external processes).
Eric Wong [Thu, 2 Sep 2021 22:12:59 +0000 (22:12 +0000)]
www: handle name-only publicinbox.*.url entries
Apparently URLs can be configured relatively for HTTP(S) setups,
attempt to support them when linking to cross-posted messages.
This also fixes the top-row (mirror/help/color/Atom feed) links
in /$INBOX_URL/$EXTMSG_MSGID/T/ (and /t/) URLs.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210902191239.cmbxlmjqcsmdzmqp@meerkat.local/
Eric Wong [Thu, 2 Sep 2021 22:36:47 +0000 (22:36 +0000)]
tests: "make check-run" favors reliability over speed
Sharing a single lei-daemon across multiple processes still
exhibits reliability problems, and reliably checking
lei-daemon's inotify internals seems impossible without.
Even without lei-daemon sharing, "make check-run" is a few
seconds faster than "make check" for me.
Eric Wong [Thu, 2 Sep 2021 22:36:46 +0000 (22:36 +0000)]
t/lei-auto-watch: improve test reliability
On slower systems, even a 100ms delay may not be enough;
so loop and retry in hopes of an early exit for faster
systems.
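The loop-and-retry approach can be sketched in Python as follows (an illustration only; `wait_until` and its parameters are hypothetical, not the test's actual helper). Fast systems return on the first successful check, while slow systems get the full timeout instead of a single fixed delay.

```python
import time


def wait_until(check, timeout=5.0, interval=0.01):
    """Poll check() until it returns true or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True  # early exit on faster systems
        if time.monotonic() >= deadline:
            return False  # gave up after the full timeout
        time.sleep(interval)
```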
Eric Wong [Thu, 2 Sep 2021 10:17:58 +0000 (10:17 +0000)]
lei: propagate keyword changes from lei/store
This works with existing inotify/EVFILT_VNODE functionality to
propagate changes made from one Maildir to another Maildir.
I chose the lei/store worker process to handle this since
propagating changes back into lei-daemon on a massive scale
could lead to dead-locking while both processes are attempting
to write to each other. Eliminating IPC overhead is a nice
side effect, but could hurt performance if Maildirs are slow.
The code for "lei export-kw" is significantly revamped to match
the new code used in the "lei/store" daemon. It should be more
correct w.r.t. corner-cases and stale entries, but perhaps
better tests need to be written.
squashed:
t/lei-auto-watch: increase delay for FreeBSD kevent
My FreeBSD VM seems to need longer for this test than inotify
under Linux, likely because the kevent support code needs to be
more complicated.
Eric Wong [Thu, 2 Sep 2021 10:17:57 +0000 (10:17 +0000)]
lei_input: set and prepare watches early
This will be needed as we track changes in real-time, especially
for "lei index" since there's no storage involved.
Eric Wong [Thu, 2 Sep 2021 10:17:56 +0000 (10:17 +0000)]
lei_mail_sync: do not use transactions
For lei-index to work in parallel with MUA access and upcoming
inotify-based updates, mail_sync.sqlite3 needs to always be
up-to-date to read-only worker processes (ahead of everything
else). So rely on the default auto-commit behavior and hope
SQLite WAL can reduce some of the overheads involved with
writes.
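As a sketch of what auto-commit plus WAL looks like from a client's point of view, here is a small Python example using the standard sqlite3 module. The table and file names are invented for illustration, and whether lei enables WAL in exactly this way is an assumption.

```python
import os
import sqlite3
import tempfile


def open_autocommit(path):
    """Open an SQLite DB in autocommit mode and request WAL journaling."""
    # isolation_level=None disables Python's implicit transactions,
    # so each statement is committed as soon as it finishes.
    dbh = sqlite3.connect(path, isolation_level=None)
    mode = dbh.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return dbh, mode


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.sqlite3")
    writer, mode = open_autocommit(path)
    writer.execute("CREATE TABLE folders (name TEXT PRIMARY KEY)")
    writer.execute("INSERT INTO folders VALUES ('maildir:/example')")
    # A second connection sees the row without any explicit commit.
    reader = sqlite3.connect(path)
    rows = reader.execute("SELECT name FROM folders").fetchall()
    reader.close()
    writer.close()
```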
Eric Wong [Wed, 1 Sep 2021 00:17:32 +0000 (00:17 +0000)]
extindex: --gc removes messages from over, too
While messages from removed inboxes were removed from Xapian
search, --gc failed to remove messages from over.sqlite3
entirely. They no longer show up in the topic summary view.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/20210830201723.dehoul4y6gpqf2cp@nitro.local/
Eric Wong [Tue, 31 Aug 2021 22:23:00 +0000 (22:23 +0000)]
extindex: document --all and indexlevel=basic interaction
Cache is expensive, so help users save on storage by documenting
this behavior.
Eric Wong [Tue, 31 Aug 2021 19:38:03 +0000 (19:38 +0000)]
lei up: only show finmsg in top-level lei-daemon
->DESTROY can get triggered in child processes, which
unnecessarily duplicates messages queued up for display
when lei spawns extra workers.
Eric Wong [Tue, 31 Aug 2021 11:21:26 +0000 (11:21 +0000)]
lei/store: correctly delete entries from over
Some of these errors were inadvertently lost due to delayed
error reporting in the past.
Eric Wong [Tue, 31 Aug 2021 11:21:25 +0000 (11:21 +0000)]
lei: fix error reporting from lei/store -> lei-daemon
We must set autoflush to ensure timely notification of clients;
and lei-daemon must not block when waiting on reads in case of
spurious wakeups.
Finally, if no clients are connected to lei-daemon, write to
syslog to ensure the error is visible.
Eric Wong [Tue, 31 Aug 2021 11:21:24 +0000 (11:21 +0000)]
lei_mail_sync: set_src uses binary OIDs
Another step towards moving more of our internals to use binary
OIDs to avoid needless conversions before hitting disk.
Eric Wong [Tue, 31 Aug 2021 11:21:23 +0000 (11:21 +0000)]
lei: refresh watches before MUA spawn for Maildir
If we possibly just wrote or created a Maildir, ensure it's
monitored by the lei watch mechanism.
Eric Wong [Tue, 31 Aug 2021 11:21:22 +0000 (11:21 +0000)]
lei note-event: always flush changes on daemon exit
Because the timer may not fire in time before daemon shutdown.
Eric Wong [Tue, 31 Aug 2021 11:21:21 +0000 (11:21 +0000)]
t/lei-watch: avoid race between glob + readlink
Open file handles in lei-daemon may be unstable so we need to
account for readlink() returning undef.
Eric Wong [Tue, 31 Aug 2021 11:21:20 +0000 (11:21 +0000)]
lei_mail_sync: make rename_folder more robust
We need to account for past canonicalization errors and deal
with cases which violate uniqueness constraints in
mail_sync.sqlite3
Eric Wong [Tue, 31 Aug 2021 11:21:19 +0000 (11:21 +0000)]
lei_mail_sync: simplify group2folders
No need to loop when we can rely on grep.
Eric Wong [Tue, 31 Aug 2021 11:21:18 +0000 (11:21 +0000)]
lei prune-mail-sync: handle --all (no args)
This still needs tests, but I noticed "--all" w/o "local" or
"remote" was not working correctly since split() returned
an empty array.
Eric Wong [Tue, 31 Aug 2021 11:21:17 +0000 (11:21 +0000)]
lei_mail_sync: forget_folder: simplify code
No need to bump refcounts of {dbh} nor declare extra variables
for a rarely-called function.
Eric Wong [Mon, 30 Aug 2021 23:44:54 +0000 (23:44 +0000)]
www_listing: add note about mirroring information
Perhaps this can be expanded to include grokmirror information
in the future. For now, just give a hint about the "mirror"
link for each inbox.
Eric Wong [Mon, 30 Aug 2021 23:44:53 +0000 (23:44 +0000)]
www_text/mirror: spell out "external index" and "public inbox"
"extindex" and "public-inbox" are project-specific terms which
are probably unsuitable for folks who are seeing this for the
first time.
Use "public inbox" when referring to actual public inboxes,
since "public-inbox" is merely the name for this particular
implementation and others have adopted the same concept (IMHO
the concept is more important than any particular
implementation).
Eric Wong [Mon, 30 Aug 2021 23:44:52 +0000 (23:44 +0000)]
www_stream: extra link to mirroring information in the footer
This may be redundant with the "mirror" link at the top right,
but maybe people will miss one. Properly capitalize the
"Code repositories" text while we're at it.
Link: https://public-inbox.org/20210828175827.rgzwqbn7brl56oej@nitro.local/
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Eric Wong [Sat, 28 Aug 2021 11:50:07 +0000 (11:50 +0000)]
www_stream: description header links to top $INBOX_URL
Making the inbox description link back to the most recent
per-inbox topics from text/ and $OID/s/ URLs seems useful,
rather than keeping the description up there.
Followup-to: 6c853f5256f3a324 ("www: improve navigation around contemporary threads")
Eric Wong [Sat, 28 Aug 2021 11:50:06 +0000 (11:50 +0000)]
www: move mirror instructions to /text/
This makes the mirroring and code retrieval instructions less
obstructive. Relying on WwwText means we only use our Linkify
module to make hrefs of full URLs; making relative and shortened
hrefs off-limits; hopefully this isn't too much of a problem.
coderepo information remains duplicated on every page since
(IMHO) coderepos are an important feature; but nobody besides me
has ever bothered to configure coderepos, so I suppose it's
fine...
Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210826132747.6gxuwnhftyf7c6hp@nitro.local/
Eric Wong [Fri, 27 Aug 2021 22:03:02 +0000 (22:03 +0000)]
www: avoid potential auto-vivification on ibx->{url}
This may fix problems with the "all" link disappearing.
Link: https://public-inbox.org/meta/CAMwyc-Tw=v5yT1U1U66GSwwTK8OJXv8_YDu-=oXbZO3tHSnYWw@mail.gmail.com/
Eric Wong [Fri, 27 Aug 2021 12:08:45 +0000 (12:08 +0000)]
www_listing: fix odd "locate inbox" cases
Searching inboxes with an empty query no longer gives 500 errors
due to Xapian. Also, improve the error message when no inboxes
match, since saying no inboxes exist yet is wrong.
Eric Wong [Fri, 27 Aug 2021 12:08:44 +0000 (12:08 +0000)]
www_listing: show ->ALL at top of HTML listing
It's a special case and we can show it in the HTML display
without affecting manifest.js.gz generation.
Eric Wong [Thu, 26 Aug 2021 12:33:38 +0000 (12:33 +0000)]
move ->ids_after from mm to over
Since we favor ->over in WWW and IMAP, move this method to
->over to reduce open files in common cases.
This fixes the /$EXTINDEX_NAME/all.mbox.gz endpoint for extindex
entries (which may get expensive...).
Eric Wong [Thu, 26 Aug 2021 12:33:37 +0000 (12:33 +0000)]
www_text: add coderepo config support for extindex
At least manually configured coderepos "just work"
for extindex, though it probably could be automatic
and inherited from the publicinbox configs.
Eric Wong [Thu, 26 Aug 2021 12:33:36 +0000 (12:33 +0000)]
config: do not parse altid for extindex
There's currently no support for altid with extindex, and
there's likely no legacy precedent for using altid like there is
with single public-inboxes.
Eric Wong [Thu, 26 Aug 2021 12:33:35 +0000 (12:33 +0000)]
www_text: fix example config snippet for extindex
extindex doesn't use the same config stuff as normal
"publicinbox" entries, so we'll need a separate function
for them.
Eric Wong [Thu, 26 Aug 2021 12:33:34 +0000 (12:33 +0000)]
www: avoid incorrect instructions for extindex
There's no way to clone an extindex, since there's no git
storage associated with them. So attempt to link to the
HTML listing of public-inboxes, instead.
Eric Wong [Thu, 26 Aug 2021 12:33:33 +0000 (12:33 +0000)]
www_stream: sh-friendly .onion URLs wrapping
The long v3 .onion URL was causing havoc on small mobile
displays, so extract "hostname" into a variable which can
still be used as a Bourne shell snippet.
While we're at it, include "torsocks" in the git command used
for .onion URLs since that's the (near)-universal wrapper for
Tor-ifying things (like git) which are dynamically linked to
libc.
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210816163654.c6gfzuezhji4l6s7@nitro.local/
Eric Wong [Thu, 26 Aug 2021 12:33:32 +0000 (12:33 +0000)]
ds: use bytes::substr and bytes::length module-wide for now
The use of substr within IO::Handle->write may not be correct if
we have wide characters, so handle it ourselves.
bytes.pm usage is probably better fixed in PublicInbox::NNTP,
but the effort required is higher, so we'll just keep bytes in
DS for now.
Eric Wong [Thu, 26 Aug 2021 12:33:31 +0000 (12:33 +0000)]
get rid of unnecessary bytes::length usage
The only place where we could return wide characters with -httpd
was the raw $INBOX_DIR/description text, which is now converted
to octets.
All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode,
so length() and bytes::length() are equivalent on reads. For
socket writes, any non-octet data would warn about wide characters
and we are strict in warnings with test_httpd.
All gzipped buffers are also octets, as is PublicInbox::Eml->body,
and anything from PerlIO objects ("git cat-file --batch" output,
filesystems), so bytes::length was unnecessary in all those places.
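The distinction between character length and octet length can be shown with a short Python sketch (an analogy only; Python's str/bytes split is stricter than Perl's, and `octet_length` is a hypothetical name). Once text has been converted to octets, the two counts agree, which is why a byte-specific length call adds nothing on binary-mode data.

```python
def octet_length(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


description = "caf\u00e9 inbox"        # contains one non-ASCII character
octets = description.encode("utf-8")  # what would be written to a socket

char_count = len(description)  # counts characters
byte_count = len(octets)       # counts octets; octets have no wide characters
```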
Eric Wong [Wed, 18 Aug 2021 11:41:02 +0000 (11:41 +0000)]
wwwlisting: support global CSS in HTML view
Since CSS can be overridden by a static webserver on a per-inbox
basis, we need a similar pattern to deal with the instance-wide
WwwListing HTML. "/+/" probably won't conflict with any current
nor future public inbox names.
I don't think it'll cause problems with common linkifiers or URL
extractors, either (and it's unlikely anybody would want to
share URLs of just the CSS in a plain text(-like) format).
Eric Wong [Wed, 25 Aug 2021 08:40:40 +0000 (08:40 +0000)]
lei_mail_sync: remove warning message from caller
We can afford to be liberal in what messages we accept
internally, since LeiToMail uses a trailing slash internally.
Eric Wong [Wed, 25 Aug 2021 08:40:39 +0000 (08:40 +0000)]
lei up: improve --all=local stderr output
The "# $NR written to $DEST ($total matches)" messages are
arguably the most useful output of "lei up --all=local",
but they get intermixed with progress messages from various
workers. Queue up these finalization messages and only spit
them out on ->DESTROY.
Eric Wong [Tue, 24 Aug 2021 22:49:24 +0000 (22:49 +0000)]
imap+nntp: die loudly if ->mm or ->over disappear
While the WWW front-end can gracefully handle ->mm and ->over
disappearing (in most cases), IMAP+NNTP front-ends are completely
dependent on these and failed mysteriously when they go missing
after startup.
These will hopefully make issues like what Konstantin
encountered more obvious:
Link: https://public-inbox.org/meta/20210824204855.ejspej4z7r2rpu63@nitro.local/
Eric Wong [Tue, 24 Aug 2021 13:06:39 +0000 (13:06 +0000)]
lei: non-blocking lei/store->done in lei-daemon
This allows client sockets to wait for "done" commits to
lei/store while the daemon reacts asynchronously. The goal
of this change is to keep the script/lei client alive until
lei/store commits changes to the filesystem, but without
blocking the lei-daemon event loop. It depends on Perl
refcounting to close the socket.
This change also highlighted our over-use of "done" requests to
lei/store processes, which is now corrected so we only issue it
on collective socket EOF rather than upon reaping every single
worker.
This also fixes "lei forget-mail-sync" when it is the initial
command.
This took several iterations and much debugging to arrive at the
current implementation:
1. The initial iteration of this change utilized socket passing
from lei-daemon to lei/store, which necessitated switching
from faster pipes to slower Unix sockets.
2. The second iteration switched to registering notification sockets
independently of "done" requests, but that could lead to early
wakeups when "done" was requested by other workers. This
appeared to work most of the time, but suffered races under
high load which were difficult to track down.
Finally, this iteration passes the stringified socket GLOB ref
to lei/store which is echoed back to lei-daemon upon completion
of that particular "done" request.
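The final design can be sketched in Python roughly as follows. This is an illustration under assumptions, not the lei implementation: `DoneTracker`, its methods, and the token format are hypothetical, and `id()` stands in for a stringified reference. The point is that each "done" request carries its own token, so a completion for one client cannot wake another client early.

```python
class DoneTracker:
    """Hypothetical daemon-side bookkeeping for pending "done" requests."""

    def __init__(self):
        self.pending = {}  # token string -> waiting client object

    def request_done(self, client):
        # Use a string unique to this live client object as the token,
        # much as a stringified reference is unique while its referent lives.
        token = f"client:{id(client):#x}"
        self.pending[token] = client
        return token  # would be sent to the store along with "done"

    def on_echo(self, token):
        # The store echoes the token once that specific commit completes;
        # removing the entry drops the daemon's reference to the client.
        return self.pending.pop(token, None)
```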
Eric Wong [Tue, 24 Aug 2021 13:04:06 +0000 (13:04 +0000)]
lei: add missing LeiWatch lazy-load
I'm not sure if this class will actually be needed, but
we need to load it while we're using it.
Eric Wong [Thu, 19 Aug 2021 09:49:34 +0000 (09:49 +0000)]
lei: implicitly watch all Maildirs it knows about
This allows MUA-made flag changes to Maildirs to be instantly
read and acknowledged for future search results.
In the future, it may be used to speed up --augment and
--import-before (the default) with "lei q".
Eric Wong [Thu, 19 Aug 2021 01:36:38 +0000 (01:36 +0000)]
lei q: make --save the default
Since "lei up" is more often useful than not and incurs negligible
overhead; enable --save by default and allow --no-save to work.
This also fixes a long-standing bug when overwriting --output
destinations with saved searches: dedupe data from previous
searches is reset and no longer influences the new (changed)
search, so results no longer go missing if two sequential
invocations of "lei q --save" point to the same --output.
Eric Wong [Tue, 17 Aug 2021 08:52:41 +0000 (08:52 +0000)]
lei forget-mail-sync: rely on lei/store process
As implied in commit 6ff03ba2be9247f1
("lei export-kw: do not write directly to mail_sync.sqlite3"),
modifying mail_sync.sqlite3 directly can lead to conflicts
and making everything go through lei/store is easier.
Eric Wong [Tue, 17 Aug 2021 08:52:40 +0000 (08:52 +0000)]
ipc: remove WQ_MAX_WORKERS
We no longer rely on IO::FDPass, so there's no longer a reason
to limit this internally.
Eric Wong [Tue, 17 Aug 2021 08:52:39 +0000 (08:52 +0000)]
lei: add ->lms shortcut for LeiMailSync
We access this read-only in many places (and will in more),
so provide a shortcut to simplify callers.
Eric Wong [Mon, 16 Aug 2021 23:35:20 +0000 (23:35 +0000)]
view: remove mbox.gz and Atom from topic view
This declutters the topic view since these links seem rarely
used. Atom and mbox.gz links probably make most sense when
users have read the HTML and decide the topic is worth following
or downloading.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210816154444.sj3ks2sikq3x2ywx@nitro.local/
Eric Wong [Mon, 16 Aug 2021 22:42:45 +0000 (22:42 +0000)]
user_content: update + wrap lines for CSS change
Fixes: 86df4acd140d61ab ("Duplicate base css definitions in stylesheets")
Konstantin Ryabitsev [Mon, 16 Aug 2021 14:50:15 +0000 (10:50 -0400)]
Duplicate base css definitions in stylesheets
All pages carry the following inlined css declaration:
<style>pre{white-space:pre-wrap}*{font-size:100%;font-family:monospace}</style>
However, site security policies may deliberately prohibit execution of
inline content such as scripts and stylesheets as an extra layer of
protection against XSS vulnerabilities. For example, with the following
HTTP headers returned by the server, the inline styles above will be
ignored:
Content-Security-Policy: default-src 'self'
This causes public-inbox content to be rendered poorly on mobile devices
due to the default <pre> behaviour. Duplicating this declaration into
the contrib stylesheets makes sure that these styles are applied even
with the strictest security policies in place.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Eric Wong [Mon, 16 Aug 2021 07:29:17 +0000 (07:29 +0000)]
t/run.perl: fix "make check-run" on FreeBSD 11.x
Persistent lei-daemon still leads to ECONNRESET client errors on
FreeBSD, and maxing out the kern.ipc.soacceptqueue sysctl (as
documented in the FreeBSD listen(2) manpage) doesn't seem to
help.
"make check-run" is still 4-5s faster than "make check" on my
FreeBSD VM even after this change, so it's still a worthwhile
improvement.
Eric Wong [Mon, 16 Aug 2021 05:39:33 +0000 (05:39 +0000)]
lei_search: avoid unconditional warning when no exception
Oops, we shouldn't warn on "$@" unless "$@" is truthy.
Fixes: c7bcfe6cd6648ff0 ("lei: diagnostics for /Document \d+ not found/ errors")
Eric Wong [Mon, 16 Aug 2021 05:30:22 +0000 (05:30 +0000)]
www: uninitialized vars due to extindex lacking address
Some messages have From/To/Cc headers munged to be unparseable to
Email::Address::XS, the fallback is to use the default inbox address.
-extindex instances do not have an address of their own, so just fall back to
using 'unknown@example.com' for now.
An example of such a message:
https://yhbt.net/lore/all/20201002154535.28412-1-fw@strlen.de/
Eric Wong [Sat, 14 Aug 2021 07:42:39 +0000 (07:42 +0000)]
www: avoid uninitialized vars from shadowed Message-IDs
For /all/ (extindex) and the like, Message-ID reuse from client
errors or list-injected footers can cause threading weirdness.
Avoid auto-vivification in the mapping table and dereferencing
of unknown messages.
Eric Wong [Sat, 14 Aug 2021 00:29:44 +0000 (00:29 +0000)]
lei: hexdigest mocks account for unwanted headers
PublicInbox::Import never imports @UNWANTED_HEADERS, so ensure
our mock blob OIDs do the same. This ought to prevent
duplicates if the PSGI mboxrd download starts setting
"X-Status: F" like "lei q -tt .."
Eric Wong [Sat, 14 Aug 2021 00:29:43 +0000 (00:29 +0000)]
lei <q|up>: wait on remote mboxrd imports synchronously
This ought to avoid /Document \d+ not found/ errors from Xapian
when seeing a message for the first time by not attempting to
read keywords for totally unseen messages.
Eric Wong [Sat, 14 Aug 2021 00:29:42 +0000 (00:29 +0000)]
lei: diagnostics for /Document \d+ not found/ errors
This may help diagnose "Exception: Document \d+ not found"
errors I'm seeing from "lei up" with HTTPS endpoints.
Eric Wong [Thu, 12 Aug 2021 23:40:27 +0000 (23:40 +0000)]
lei up: note errors if one output destination fails
We can keep going if one (out of multiple) output destinations
fails, but the error needs to be communicated to the caller as an
exit code.
Eric Wong [Thu, 12 Aug 2021 23:40:26 +0000 (23:40 +0000)]
lei up: support multiple output folders w/o --all=local
Being able to update 1 folder, or all (local) folders is
sometimes too limiting, so just allow updating any subset
of local folders.
Eric Wong [Wed, 11 Aug 2021 11:26:18 +0000 (11:26 +0000)]
lei: attempt to canonicalize away "/../" pathnames
As documented, File::Spec->canonpath does not canonicalize
"/../". While we want to do our best to preserve symlinks in
pathnames, leaving "/../" can mislead our inotify|kqueue usage.
Eric Wong [Wed, 11 Aug 2021 11:26:17 +0000 (11:26 +0000)]
lei_saved_search: canonicalized relative save paths
Storing relative paths with '..' in them can be expensive to
resolve when running 'lei up', so prefer storing canonicalized
absolute paths. We only do this for paths with '..' in them,
though, since this can lose symlink info.
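A minimal Python sketch of this policy, assuming *nix pathnames (`canonicalize_save_path` is a hypothetical name, not the lei function): paths are made absolute, and ".." components are collapsed lexically only when present, since lexical collapsing ignores symlinks.

```python
import os.path


def canonicalize_save_path(path, cwd):
    """Return an absolute path, collapsing ".." only when it is present."""
    if not os.path.isabs(path):
        path = os.path.join(cwd, path)
    if ".." in path.split("/"):
        # normpath() is purely lexical and never touches the filesystem,
        # so "dir/../x" becomes "x" even if "dir" is a symlink elsewhere.
        path = os.path.normpath(path)
    return path
```

Paths without ".." are left alone apart from being made absolute, which keeps any symlink components the user typed.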
Eric Wong [Wed, 11 Aug 2021 11:26:16 +0000 (11:26 +0000)]
treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and
it's not worth our time to start. So stop wasting CPU cycles
giving the illusion that we'd care about non-*nix pathnames.
Eric Wong [Sun, 8 Aug 2021 20:07:47 +0000 (20:07 +0000)]
lei_xsearch: improve Xapian open failure messages
Displaying $! can help users diagnose resource limit problems
such as EMFILE/ENFILE/ENOMEM. $@ is currently useful for XS
Search::Xapian and perhaps future versions of the Xapian.pm SWIG
bindings.
Eric Wong [Sun, 8 Aug 2021 01:14:17 +0000 (01:14 +0000)]
searchidx: die on Xapian load errors
Xapian bindings may not be installed or be out-of-date w.r.t. the
Perl version, improve the visibility of errors in those cases.
Cleanup and drop some redundant checks while we're at it.
Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>
Link: https://public-inbox.org/meta/87k0ky5mbd.fsf@toke.dk/
Eric Wong [Sun, 8 Aug 2021 01:14:16 +0000 (01:14 +0000)]
tests: fix test failures when Xapian is missing
We still support usage without Xapian, so ensure our tests
work when Xapian bindings are missing.
Eric Wong [Sun, 8 Aug 2021 01:03:50 +0000 (01:03 +0000)]
httpd: set psgi.url_scheme to 'https' for TLS listeners
For users using the native TLS functionality of -httpd (instead
of using nginx + Plack::Middleware::ReverseProxy),
psgi.url_scheme=http was wrong and would lead to improper
redirects.
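Why the scheme matters for redirects can be sketched with WSGI, Python's counterpart to PSGI, where the equivalent key is `wsgi.url_scheme`. Both functions below are hypothetical illustrations, not -httpd code: an application that builds absolute redirect URLs from the environment will send clients to http:// unless the server sets the scheme per listener.

```python
def redirect_location(environ, path):
    """Build an absolute redirect URL from a WSGI-style environ dict."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", "localhost")
    return f"{scheme}://{host}{path}"


def environ_for_listener(is_tls, host):
    """Hypothetical: what a server might set for each accepted connection."""
    return {
        "wsgi.url_scheme": "https" if is_tls else "http",
        "HTTP_HOST": host,
    }
```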
Eric Wong [Fri, 6 Aug 2021 00:29:52 +0000 (00:29 +0000)]
li2wrap: avoid double-close on Linux::Inotify2 <2.3
LI2Wrap was not working as expected due to the missing bless
to override ->DESTROY. This bug showed up in a message check in
t/lei-q-remote-import.t
Fixes: 7fc6e30aeab9925b ("lei: close inotify FD in forked child")
Eric Wong [Thu, 5 Aug 2021 02:33:40 +0000 (02:33 +0000)]
lei export-kw: workaround race in updating Maildir locations
Inotify updates may simultaneously remove or update the location
of a message, so ensure we at least have knowledge of the new
location if the old one cannot be updated.
Eric Wong [Wed, 4 Aug 2021 10:02:48 +0000 (10:02 +0000)]
extindex: fix boost with partial runs
Boost relies on knowledge of all inboxes in a given config file
to work properly. So while we support indexing a subset of
inboxes, we must still account for boost in inboxes we're not
indexing. So split internal inbox groups into "known" and
"active", where previously we only cared for inboxes which were
being actively indexed.
Furthermore, boost checks need to be applied when a
message arrives in different inboxes across multiple
invocations.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210802204058.vscbxs5q7xyolyu2@nitro.local/
Eric Wong [Wed, 4 Aug 2021 10:02:47 +0000 (10:02 +0000)]
extindex: do not over-account for cross-posted messages
Cross-posted messages don't result in massive writes to the
Xapian DBs like a completely unseen message would, so stop
accounting for their size. This ought to improve performance
for heavily cross-posted setups, but --commit-interval still
has effect.
Eric Wong [Thu, 29 Jul 2021 10:01:31 +0000 (10:01 +0000)]
lei: close inotify FD in forked child
Linux::Inotify2 2.3+ includes an ->fh method to give us the
ability to safely close an FD without hitting EBADF (and
automatically use FD_CLOEXEC).
We'll still need a new wrapper class (LI2Wrap) to handle it for
users of old versions, though.
Link: http://lists.schmorp.de/pipermail/perl/2021q3/thread.html
Eric Wong [Fri, 30 Jul 2021 12:18:55 +0000 (12:18 +0000)]
extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to
v2 inboxes, we'll support -xcpdb (reshard+upgrade) and
-compact all the same to give admins tuning+upgrade
options.
Eric Wong [Fri, 30 Jul 2021 12:18:54 +0000 (12:18 +0000)]
admin: index_inbox: drop unnecessary check
No callers pass an unblessed pathname to index_inbox,
only Inbox object refs.
Eric Wong [Wed, 28 Jul 2021 00:37:19 +0000 (00:37 +0000)]
listener: maximize listen(2) backlog
This helps avoid errors from script/lei dying on ECONNRESET
when a single lei-daemon is serving all tests when run via
"make check-run".
Instead of using some arbitrary limit, use INT_MAX and let
the kernel clamp it (both Linux and FreeBSD do).
There's no need to call listen() in LEI.pm, either, since
Listener->new takes care of it.
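The same trick looks like this in Python (a sketch only, assuming a 32-bit C int; the helper name is made up). Rather than guessing a backlog, the caller asks for the maximum and relies on the kernel to clamp it to its own limit.

```python
import socket

INT_MAX = 2**31 - 1  # assumes a 32-bit C int, as on common platforms


def listen_max_backlog(sock):
    """Request the largest possible backlog and let the kernel clamp it."""
    sock.listen(INT_MAX)
    return sock


srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listen_max_backlog(srv)
port = srv.getsockname()[1]
srv.close()
```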
Eric Wong [Wed, 28 Jul 2021 00:37:18 +0000 (00:37 +0000)]
lei: die on ECONNRESET
ECONNRESET should be rare on a private local socket, and if
we hit it, it's because we're hitting the listen() limit.
Eric Wong [Tue, 27 Jul 2021 10:44:29 +0000 (10:44 +0000)]
treewide: s/sequential_shard/sequential-shard/g
The underscore variant was never documented and maintaining
the difference between the command-line and internal hash
is not worth it.
Eric Wong [Sun, 25 Jul 2021 12:44:23 +0000 (12:44 +0000)]
extindex: support --jobs/-j properly on creation for shard count
This wasn't wired up properly, but Xapian appears to suffer from
I/O amplification problems as DB shards get larger:
https://lists.xapian.org/pipermail/xapian-discuss/2019-February/009727.html
<23640.32170.703368.841021@y.dockes.com>
Of course, we shouldn't have too many shards, either; because
performance problems with too many shards were the entire reason
extindex was created:
https://lists.xapian.org/pipermail/xapian-discuss/2020-August/009823.html
<20200826064728.GA32239@dcvr>
Eric Wong [Sun, 25 Jul 2021 12:03:33 +0000 (12:03 +0000)]
doc: lei-{p2q,rediff}: note implicit --stdin
lei actually uses implicit --stdin everywhere, but I think
these patch-related commands are the most common use of them.
Eric Wong [Sun, 25 Jul 2021 11:15:06 +0000 (11:15 +0000)]
t/lei-watch.t: improve test reliability
On single CPU (and overloaded SMP) systems, we can't rely on
inotify in lei-daemon firing before a "lei note-event done"
client hits it. So force in a single tick() to ensure the
scheduler can yield to lei-daemon and see the inotify wakeup
before "lei note-event done" to commit the write.
Eric Wong [Sun, 25 Jul 2021 10:40:17 +0000 (10:40 +0000)]
init: support git <2.30 for "-c KEY=VALUE" args
It turns out `--fixed-value' is a relatively new git-config(1)
feature in git 2.30+ (December 2020). So use the quotemeta
perlop for now since it seems compatible-enough for POSIX ERE
used by git.
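The escaping idea can be illustrated in Python with `re.escape`, which plays a role similar to Perl's quotemeta. This is an analogy only: Python's regex dialect differs from POSIX ERE, and the URL below is an invented example value. Without escaping, metacharacters such as "." and "+" in a literal value match more than intended.

```python
import re


def literal_pattern(value):
    """Anchor and escape a literal so metacharacters match only themselves."""
    return "^" + re.escape(value) + "$"


value = "https://example.com/a.b+c"
pattern = literal_pattern(value)

# Escaped: only the exact string matches.
exact = re.search(pattern, value) is not None
# Unescaped: "." and "+" act as operators and match a different string.
loose = re.search("^" + value + "$", "https://example.com/aXbbc") is not None
strict = re.search(pattern, "https://example.com/aXbbc") is not None
```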
Eric Wong [Sun, 25 Jul 2021 00:43:32 +0000 (00:43 +0000)]
lei_mail_sync: locations_for API uses oidbin for comparisons
Favor oidbin use internally to reduce internal memory traffic.
Eric Wong [Sun, 25 Jul 2021 00:43:31 +0000 (00:43 +0000)]
lei_inspect: fix typo
Not sure how this wasn't caught, earlier...
Eric Wong [Sun, 25 Jul 2021 00:43:30 +0000 (00:43 +0000)]
lei_search: favor binary OID comparisons
Reduce memory traffic and code, too.
Eric Wong [Sun, 25 Jul 2021 00:43:29 +0000 (00:43 +0000)]
extsearchidx: favor binary comparison in common case
We'll use 20-byte SHA-1 comparisons instead of 40-byte
hex representations for a minor reduction in memory
traffic.
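The size difference is easy to see in a short Python sketch (illustrative only; `oid_to_bin` is a hypothetical name and the hashed input is arbitrary): a SHA-1 is 20 bytes in binary form and 40 characters as hex, so converting once and comparing binary values halves the data being compared.

```python
import hashlib


def oid_to_bin(hex_oid):
    """Convert a 40-character hex SHA-1 into its 20-byte binary form."""
    return bytes.fromhex(hex_oid)


hex_oid = hashlib.sha1(b"example blob").hexdigest()
bin_oid = oid_to_bin(hex_oid)

# Binary comparison is also insensitive to hex letter case.
same = oid_to_bin(hex_oid.upper()) == bin_oid
```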