However, site security policies may deliberately prohibit execution of
inline content such as scripts and stylesheets as an extra layer of
protection against XSS vulnerabilities. For example, with the following
HTTP headers returned by the server, the inline styles above will be
ignored:
Content-Security-Policy: default-src 'self'
This causes public-inbox content to be rendered poorly on mobile devices
due to the default <pre> behaviour. Duplicating this declaration into
the contrib stylesheets makes sure that these styles are applied even
with the strictest security policies in place.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
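A rough illustration of why the header above causes inline styles to be
ignored, written in Python rather than the project's Perl. The function
name and the simplified parsing are assumptions made for this sketch; it
only looks for 'unsafe-inline' and ignores nonce- and hash-sources.

```python
def allows_inline_styles(policy):
    """Return True if a CSP header value would permit inline styles.

    Simplified sketch: nonce- and hash-sources are not considered.
    """
    directives = {}
    for part in policy.split(';'):
        tokens = part.split()
        if tokens:
            # the first occurrence of a directive wins; repeats are ignored
            directives.setdefault(tokens[0].lower(), tokens[1:])
    # style-src falls back to default-src when it is absent
    sources = directives.get('style-src', directives.get('default-src'))
    if sources is None:
        return True  # no applicable directive, so nothing is restricted
    return "'unsafe-inline'" in sources
```

With "default-src 'self'" the function returns False, which matches the
behaviour described in the commit message.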
Eric Wong [Mon, 16 Aug 2021 07:29:17 +0000 (07:29 +0000)]
t/run.perl: fix "make check-run" on FreeBSD 11.x
Persistent lei-daemon still leads to ECONNRESET client errors on
FreeBSD, and maxing out the kern.ipc.soacceptqueue sysctl (as
documented in the FreeBSD listen(2) manpage) doesn't seem to
help.
"make check-run" is still 4-5s faster than "make check" on my
FreeBSD VM even after this change, so it's still a worthwhile
improvement.
Eric Wong [Mon, 16 Aug 2021 05:30:22 +0000 (05:30 +0000)]
www: uninitialized vars due to extindex lacking address
Some messages have From/To/Cc headers munged to be unparseable to
Email::Address::XS; the fallback is to use the default inbox address.
-extindex does not have an address of its own, so just fall back to
using 'unknown@example.com' for now.
An example of such a message:
https://yhbt.net/lore/all/20201002154535.28412-1-fw@strlen.de/
Eric Wong [Sat, 14 Aug 2021 07:42:39 +0000 (07:42 +0000)]
www: avoid uninitialized vars from shadowed Message-IDs
For /all/ (extindex) and the like, Message-ID reuse from client
errors or list-injected footers can cause threading weirdness.
Avoid auto-vivification in the mapping table and dereferencing
of unknown messages.
Eric Wong [Sat, 14 Aug 2021 00:29:44 +0000 (00:29 +0000)]
lei: hexdigest mocks account for unwanted headers
PublicInbox::Import never imports @UNWANTED_HEADERS, so ensure
our mock blob OIDs do the same. This ought to prevent
duplicates if the PSGI mboxrd download starts setting
"X-Status: F" like "lei q -tt .."
Eric Wong [Sat, 14 Aug 2021 00:29:43 +0000 (00:29 +0000)]
lei <q|up>: wait on remote mboxrd imports synchronously
This ought to avoid /Document \d+ not found/ errors from Xapian
when seeing a message for the first time by not attempting to
read keywords for totally unseen messages.
Eric Wong [Wed, 11 Aug 2021 11:26:18 +0000 (11:26 +0000)]
lei: attempt to canonicalize away "/../" pathnames
As documented, File::Spec->canonpath does not canonicalize
"/../". While we want to do our best to preserve symlinks in
pathnames, leaving "/../" can mislead our inotify|kqueue usage.
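As a sketch of the idea (in Python, not the Perl used by lei), a purely
lexical cleanup removes "/../" without consulting the filesystem. The
helper name is hypothetical. Because nothing is resolved on disk, a
symlinked directory in front of ".." is dropped as if it were a plain
directory, which is the symlink trade-off mentioned above.

```python
import posixpath


def collapse_dotdot(path):
    """Lexically remove "/../" and "/./" components from a *nix pathname."""
    # normpath works on the string only; it never calls readlink or stat
    return posixpath.normpath(path)
```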
Eric Wong [Wed, 11 Aug 2021 11:26:17 +0000 (11:26 +0000)]
lei_saved_search: canonicalized relative save paths
Storing relative paths with '..' in them can be expensive to
resolve when running 'lei up', so prefer storing canonicalized
absolute paths. We only do this for paths with '..' in them,
though, since this can lose symlink info.
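The rule described above might look like the following Python sketch. The
function name and the cwd argument are assumptions for illustration; the
real code lives in Perl and may differ in detail.

```python
import posixpath


def saved_search_path(path, cwd):
    """Return the pathname to store for a saved search output."""
    if '..' in path.split('/'):
        # make it absolute and collapse ".." lexically; this may lose
        # symlink information, so it is only done when ".." is present
        return posixpath.normpath(posixpath.join(cwd, path))
    return path  # left untouched to preserve symlinks
```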
Eric Wong [Wed, 11 Aug 2021 11:26:16 +0000 (11:26 +0000)]
treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and
it's not worth our time to start. So stop wasting CPU cycles
giving the illusion that we'd care about non-*nix pathnames.
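For illustration only, a dirname that assumes "/" is the sole separator
can be a single regexp. This Python sketch uses a hypothetical pattern and
function name; it is not the regexp from the patch.

```python
import re

# everything before the last non-empty component, ignoring trailing slashes
_DIRNAME = re.compile(r'\A(.*)/[^/]+/*\Z', re.S)


def nix_dirname(path):
    """Return the directory part of a *nix pathname."""
    if path and not path.strip('/'):
        return '/'  # "/" or "//": the root is its own parent
    m = _DIRNAME.match(path)
    if m is None:
        return '.'  # no slash at all: relative to the current directory
    return m.group(1) or '/'
```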
Eric Wong [Sun, 8 Aug 2021 20:07:47 +0000 (20:07 +0000)]
lei_xsearch: improve Xapian open failure messages
Displaying $! can help users diagnose resource limit problems
such as EMFILE/ENFILE/ENOMEM. $@ is currently useful for XS
Search::Xapian and perhaps future versions of the Xapian.pm SWIG
bindings.
Eric Wong [Sun, 8 Aug 2021 01:14:17 +0000 (01:14 +0000)]
searchidx: die on Xapian load errors
Xapian bindings may not be installed or may be out-of-date w.r.t. the
Perl version; improve the visibility of errors in those cases.
Clean up and drop some redundant checks while we're at it.
Eric Wong [Sun, 8 Aug 2021 01:03:50 +0000 (01:03 +0000)]
httpd: set psgi.url_scheme to 'https' for TLS listeners
For users using the native TLS functionality of -httpd (instead
of using nginx + Plack::Middleware::ReverseProxy),
psgi.url_scheme=http was wrong and would lead to improper
redirects.
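To show why the scheme matters, here is a small Python sketch of how an
application might build an absolute redirect from the request
environment. The environment is modeled as a plain dict and the function
name is hypothetical; only the psgi.url_scheme key comes from the commit
message.

```python
def redirect_location(env, path):
    """Build an absolute redirect URL from a PSGI-style environment."""
    scheme = env.get('psgi.url_scheme', 'http')
    host = env.get('HTTP_HOST') or env['SERVER_NAME']
    return '%s://%s%s' % (scheme, host, path)
```

If a TLS listener reports "http" here, clients are redirected away from
the encrypted URL they originally requested.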
Eric Wong [Thu, 5 Aug 2021 02:33:40 +0000 (02:33 +0000)]
lei export-kw: workaround race in updating Maildir locations
Inotify updates may simultaneously remove or update the location
of a message, so ensure we at least have knowledge of the new
location if the old one cannot be updated.
Eric Wong [Wed, 4 Aug 2021 10:02:48 +0000 (10:02 +0000)]
extindex: fix boost with partial runs
Boost relies on knowledge of all inboxes in a given config file
to work properly. So while we support indexing a subset of
inboxes, we must still account for boost in inboxes we're not
indexing. So split internal inbox groups into "known" and
"active", where previously we only cared for inboxes which were
being actively indexed.
Furthermore, boost checks need to be applied when a
message arrives in different inboxes across multiple
invocations.
Eric Wong [Wed, 4 Aug 2021 10:02:47 +0000 (10:02 +0000)]
extindex: do not over-account for cross-posted messages
Cross-posted messages don't result in massive writes to the
Xapian DBs like a completely unseen message would, so stop
accounting for their size. This ought to improve performance
for heavily cross-posted setups, but --commit-interval still
has effect.
Eric Wong [Fri, 30 Jul 2021 12:18:55 +0000 (12:18 +0000)]
extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to
v2 inboxes, we'll support -xcpdb (reshard+upgrade) and
-compact all the same to give admins tuning+upgrade
options.
Eric Wong [Sun, 25 Jul 2021 11:15:06 +0000 (11:15 +0000)]
t/lei-watch.t: improve test reliability
On single CPU (and overloaded SMP) systems, we can't rely on
inotify in lei-daemon firing before a "lei note-event done"
client hits it. So force in a single tick() to ensure the
scheduler can yield to lei-daemon and see the inotify wakeup
before "lei note-event done" to commit the write.
Eric Wong [Sun, 25 Jul 2021 10:40:17 +0000 (10:40 +0000)]
init: support git <2.30 for "-c KEY=VALUE" args
It turns out `--fixed-value' is a relatively new git-config(1)
feature in git 2.30+ (December 2020). So use the quotemeta
perlop for now since it seems compatible-enough for POSIX ERE
used by git.
Eric Wong [Sun, 25 Jul 2021 00:11:03 +0000 (00:11 +0000)]
extsearchidx: use more appropriate max for dedupe
The over.msgid table may contain ghost Message-IDs and also
Message-IDs of deleted spam messages, so over->max isn't a
good approximation of dedupe progress.
Eric Wong [Sun, 25 Jul 2021 00:11:01 +0000 (00:11 +0000)]
extindex: support --dedupe[=MSGID]
Sometimes I just want to dedupe a single Message-ID to test
something, and this lets me do it.
This patch appears to do what it's supposed to. But it also
appears to be finding duplicates that were previously missed.
That's a good thing, but I wish I understood what seems to be
fixed :x
I'm not sure why the previous ExtSearchIdx.pm (blob 357312b8)
was causing messages to be missed, even, and why this patch
seems to fix it... And it's not infinite looping, either.
Anyways, before this patch, "-extindex --dedupe" was taking ~5
min to no-op every message (after the initial full --dedupe run
which took over a day to run). No-op --dedupes now take just
under 2 hours to scan every single cross-posted message for a
no-op dedupe. The initial dedupe took nearly 44 hours on my
system for <https://yhbt.net/lore/all/> due to SATA-2 TLC SSD
latency on 3 gigantic Xapian shards.
Running --dedupe with this change seems to prevent
/BUG\?.*?not deduplicated properly/ stderr messages from being
triggered by View.pm. Current versions of -extindex do not
seem susceptible to introducing duplicates.
Eric Wong [Fri, 23 Jul 2021 10:56:11 +0000 (10:56 +0000)]
lei: avoid SQLite COUNT() for dedupe
SQLite COUNT() is a slow operation that does a full table scan
with no conditions. There's no need for it, since lei dedupe
only needs to know if it's empty or not to decide between
new/ and cur/ for Maildir outputs.
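A minimal sketch of the cheaper emptiness check, using Python's sqlite3
module instead of Perl DBI. The function name is made up for this
example, and the table name is assumed to come from trusted code since it
is interpolated into the statement.

```python
import sqlite3


def table_is_empty(dbh, table):
    """True if `table` has no rows, without counting every row."""
    # LIMIT 1 lets SQLite stop at the first row it finds
    row = dbh.execute('SELECT 1 FROM "%s" LIMIT 1' % table).fetchone()
    return row is None
```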
Eric Wong [Fri, 23 Jul 2021 10:56:10 +0000 (10:56 +0000)]
t/lei*: check error messages on failures
I just hit an unreproducible failure in t/lei-p2q.t and
lacked $lei_err information to diagnose it. Hopefully
this helps track down odd failures in the future.
Eric Wong [Wed, 21 Jul 2021 14:07:06 +0000 (14:07 +0000)]
lei: auto-refresh watches in config, cancel missing
This makes behavior less surprising on restarts as we no longer
lose state on restarts, so there's no need to manually run "lei
add-watch" to re-enable watches. This also allows us to
transparently handle changes if somebody edits the lei config
file directly or via git-config(1).
Eric Wong [Mon, 19 Jul 2021 08:59:35 +0000 (08:59 +0000)]
lei: start implementing inotify Maildir support
This allows lei to automatically note keyword (message flag)
changes made to a Maildir and propagate it into lei/store:
lei add-watch --state=tag-ro /path/to/Maildir
This doesn't persist across restarts, yet. In the future,
it will be applied automatically to "lei q" output Maildirs
by default (with an option to disable it).
State values of tag-rw, index-<ro|rw>, import-<ro|rw> will all
be supported for Maildir.
This represents a fairly major internal change that's fairly
intrusive, but the whole daemon-oriented design was to
facilitate being able to automatically monitor (and propagate)
Maildir/IMAP flag changes.
Eric Wong [Wed, 21 Jul 2021 14:05:49 +0000 (14:05 +0000)]
extsearch: support publicinbox.*.boost parameter
This behaves identically to the lei external "boost" parameter in
prioritizing raw messages for extindex.
Relying exclusively on the config file order doesn't work well
for mirrors since it's impossible to guarantee config file
ordering via grokmirror hooks.
Config file ordering remains the default if boost is
unconfigured, or in case of ties.
Note: I chose the name "boost" rather than "priority" or "rank"
since I always get confused by whether higher or lower numbers
take precedence when it comes to kernel scheduling. "weight" is
also a part of Xapian API terminology, which we currently do not
expose to configuration (but may in the future).
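The ordering rule can be sketched as a sort on boost with config file
position as the tie-breaker. This Python example assumes that a higher
boost takes precedence and that an unconfigured boost counts as 0; the
function name and input shape are hypothetical.

```python
def index_order(inboxes):
    """Order inbox names by boost, then by config file position.

    inboxes: list of (name, boost_or_None) in config file order.
    """
    ranked = [(-(boost or 0), pos, name)
              for pos, (name, boost) in enumerate(inboxes)]
    return [name for _, _, name in sorted(ranked)]
```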
Eric Wong [Tue, 20 Jul 2021 08:58:58 +0000 (08:58 +0000)]
httpd: fix SIGHUP by invalidating cache on reload
Since we require separate PublicInbox::HTTPD instances for each
listen socket address (in order to support {SERVER_<NAME|PORT>}
for PSGI env), the old cache needed to be invalidated on rare
app refreshes.
SIGHUP has always been broken in -httpd (but not -imapd or
-nntpd) due to this cache.
Update the daemon documentation and 5.10.1-ize some bits while
we're in the area.
Eric Wong [Thu, 8 Jul 2021 08:25:19 +0000 (08:25 +0000)]
extindex: dedupe: reduce SQLite contention and dirty data
Complex queries cause SQLite to block readers for longer than
their retry period. For dedupe, it was also preventing us from
making good use of checkpoints due to the query time.
With many deduplications, checkpoints are necessary to maintain
system health due to having too much data piled up.
Eric Wong [Wed, 7 Jul 2021 23:24:55 +0000 (23:24 +0000)]
extsearchidx: ignore Eml warnings across the board
There's nothing we can do about misformatted emails and headers
we get from untrusted sources. They're too noisy and those
messages already exist in public-inboxes, anyways, so just
keep things quiet so we can spot real problems more easily.
Eric Wong [Tue, 6 Jul 2021 12:42:03 +0000 (12:42 +0000)]
extindex: --gc: avoid SQLite lock conflict on shard cleanup
Xapian shard cleanup only requires read-only access to
over.sqlite3, so avoid opening it with read-write access since
create_tables will hit lock conflicts on "INSERT OR IGNORE"
statements.
Eric Wong [Tue, 6 Jul 2021 12:42:02 +0000 (12:42 +0000)]
extindex: implement --dedupe to fix old extindices
This is intended to fix older indices that had deduplication
bugs for matching content. It'll also make dealing with
future changes to ContentHash easier since that's never
guaranteed stable.
It also supports --dry-run to print changes only without
making them.
Eric Wong [Tue, 6 Jul 2021 12:42:01 +0000 (12:42 +0000)]
eml: relax warn_ignore regexps for current Email::Address::XS
These seem needed with the data I'm currently working on, but I
haven't changed my version of Email::Address::XS since my last
Debian stable upgrade (to buster).
Eric Wong [Fri, 2 Jul 2021 21:02:23 +0000 (21:02 +0000)]
lei import: increase flags search batch size, display progress
IMAP flag-only synchronization doesn't fetch entire messages,
so when a user specifies a batch size for full messages, we can
safely use 10000 times that size for flag searches.
Since I sometimes wonder why nothing happens for several seconds
after starting "lei import $URL", we'll also show some progress
during the flag synchronization phase.
Eric Wong [Fri, 2 Jul 2021 20:42:09 +0000 (20:42 +0000)]
extsearchidx: extra assertions for deduplication flow
I haven't found any bugs from this (still looking for missed
deduplication bugs), and it's a bit shorter and more likely to
catch future bugs. Clean up an unnecessary ->{mid} array copy
while we're at it, too.
Eric Wong [Wed, 30 Jun 2021 17:58:54 +0000 (17:58 +0000)]
searchidx: default BATCH_BYTES to 8MB on 64-bit systems
This default seems closer to reasonable on 64-bit systems which
are the norm these days. 32-bit systems gain 48K so it's an
even 1 MB, but we need to keep 32-bit systems from using too
much since there's still some ancient systems out there with
small inboxes.
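As a sketch of the selection logic, the default can be picked from the
native pointer size. This is Python with hypothetical names; the values
are assumed to be 8 MB and 1 MB in binary units (multiples of 1048576),
as the "even 1 MB" wording suggests.

```python
import struct


def default_batch_bytes(pointer_size=None):
    """Pick a default batch size in bytes from the native pointer size."""
    if pointer_size is None:
        pointer_size = struct.calcsize('P')  # 8 on 64-bit, 4 on 32-bit
    megabytes = 8 if pointer_size >= 8 else 1
    return megabytes * 1024 * 1024
```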
Eric Wong [Fri, 25 Jun 2021 01:06:39 +0000 (01:06 +0000)]
extindex: maintain pack symlinks and use "git multi-pack-index"
This is a fair amount of complexity, but it speeds up
"git cat-file --batch" startup by 3-4% with 50K packfiles
with a hot kernel cache.
This appears extremely sensitive to RAM available to
the kernel page cache with my SATA 2 SSD. Faster storage
and more RAM can bring loading pack.
2.60s vs 2.69s were the best cases on my workstation with and
without the multi-pack-index, however times could be all over
the place (even in the minutes) with more activity on my
workstation.
Getting sub-minute times requires a git patch to speed up
alt_odb_usable():
<https://lore.kernel.org/20210624005806.12079-1-e@80x24.org/>
Otherwise, prepare to wait several minutes.
It's also easier to patch and install git locally since the
git.git build system defaults to prefix=$HOME and dealing with
dynamic linking with libgit2 is more difficult for end users
relying on Inline::C.
libgit2 remains in use for the non-ALL.git case, but maybe it's
not necessary (libgit2 is significantly slower than git in
Debian 10 due to SHA-1 collision checking).
Eric Wong [Wed, 23 Jun 2021 11:14:22 +0000 (07:14 -0400)]
www: do not warn on blank query parameters
Sometimes users (or bots) may lead queries with '&' and
trigger uninitialized variable warnings, just ignore them
and give consumers a $ctx->{qp}->{''} entry.
While we're in the area, pass a regexp rather than scalar string
to the `split' perlop to prevent Perl from recompiling the
regexp on every call.
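The behaviour described above can be sketched in Python as follows. The
separator pattern and function name are assumptions for this example. A
query that starts with '&' simply produces an entry under the empty key
instead of a warning, and the pattern is compiled once rather than on
every call.

```python
import re
from urllib.parse import unquote_plus

_SEP = re.compile(r'[&;]+')  # compiled once at import time


def parse_query(qs):
    """Parse a query string into a dict, tolerating blank parameters."""
    qp = {}
    if qs == '':
        return qp
    for pair in _SEP.split(qs):
        key, _, val = pair.partition('=')
        qp[unquote_plus(key)] = unquote_plus(val)
    return qp
```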
Eric Wong [Wed, 23 Jun 2021 11:14:21 +0000 (07:14 -0400)]
www_listing: start updating for pagination + search
When dealing with thousands of inboxes, displaying all of
them on a single page isn't going to work. So steal some
pagination and search results code from the message search
to generate some basic HTML output that looks good in w3m.
Eric Wong [Wed, 23 Jun 2021 11:14:20 +0000 (07:14 -0400)]
search: make xap_terms easier-to-use and use it more
This allows us to simplify callers throughout, and exceptions
can no longer be silently hidden. MiscSearch now uses xap_terms
for looking up eidx_key terms for a code reduction.
We also simplify LeiStore->_msg_kw for runtime use by moving the
MsetIterator handling into t/lei_store.t test case.
Eric Wong [Tue, 22 Jun 2021 10:04:36 +0000 (10:04 +0000)]
lei: use open() perlop for -C (chdir)
This is for consistency with the open() at initial accept, in
case we hit a code path which expects Perl directory handles
rather than "file handles". Both work with the chdir() perlop
(fchdir(2), in our case).
Eric Wong [Sun, 20 Jun 2021 04:33:19 +0000 (04:33 +0000)]
lei sucks: don't warn or error out on missing dependencies
%INC can hold undef. This can be hit on a Linux machine missing
Linux::Inotify2. Loading PublicInbox::KQNotify is attempted and
PublicInbox/KQNotify.pm always exists, causing the `undef' entry
in %INC when it fails to load IO::KQueue.
Eric Wong [Sat, 19 Jun 2021 03:22:28 +0000 (03:22 +0000)]
view: extra check for redundant messages in HTML view
There appear to be some cases of duplicates appearing due to
-extindex. I haven't nailed down the cause of it, yet, but
this should make things easier for readers using the PSGI
HTML interface in the meantime.
The raw mboxrd remains undeduplicated for now, and the
correct fix/workaround would be some fsck-like mode for
public-inbox-extindex.
Eric Wong [Fri, 18 Jun 2021 21:44:38 +0000 (18:44 -0300)]
scripts: add syscall-list tool for development
We'll be supporting inotify directly as we do with epoll so
Linux users won't have to deal with XS, extra DSOs or install
Linux::Inotify2 (and common::sense) modules.
Eric Wong [Thu, 17 Jun 2021 22:00:47 +0000 (22:00 +0000)]
lei/store: cull redundant docids based on blob OID
I'm not sure how this happened (only once for me in March), but
it should not happen... In any case, we'll operate on the
lowest numbered docid and cull redundant index entries when
lei/store is open for read-write.
This also fixes the normal lei/store removal path to clean up
the xref3 table (since it's not done automatically for
public-facing -eidx due to the multi-list nature of it).
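The culling rule (keep the lowest docid for each blob OID) can be
sketched in Python as below. The input shape and function name are
hypothetical; the real store works against Xapian and SQLite rather than
an in-memory dict.

```python
def cull_redundant(docid2oid):
    """Map each blob OID to (docid to keep, [docids to remove])."""
    by_oid = {}
    for docid, oid in docid2oid.items():
        by_oid.setdefault(oid, []).append(docid)
    plan = {}
    for oid, ids in by_oid.items():
        ids.sort()
        plan[oid] = (ids[0], ids[1:])  # the lowest docid survives
    return plan
```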
Eric Wong [Sun, 13 Jun 2021 18:12:06 +0000 (18:12 +0000)]
lei index+import: reject keywords from R/O IMAP
Since users can't set IMAP flags in read-only IMAP folders,
we won't clobber local flags when importing from IMAP. This
also enables the local_blob fallback used for lei-index to
be used for index deduplication.
Eric Wong [Sat, 12 Jun 2021 00:10:45 +0000 (00:10 +0000)]
net_reader: canonicalize URL args on add_url
This fixes cases when users specify an IMAP or NNTP URL
with standard port numbers explicitly.
In other words, this allows users to use
"lei ls-mail-source nntps://public-inbox.org:563/" and
"lei ls-mail-source imaps://public-inbox.org:993/"
without hitting "BUG:" errors.
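One way to canonicalize such URLs is to drop a port that matches the
scheme's well-known default. The sketch below is Python with
hypothetical names and is not the project's implementation.

```python
from urllib.parse import urlsplit, urlunsplit

# well-known default ports for the schemes discussed above
DEFAULT_PORTS = {'imap': 143, 'imaps': 993, 'nntp': 119, 'nntps': 563}


def canonicalize_url(url):
    """Strip an explicit port when it equals the scheme's default."""
    u = urlsplit(url)
    netloc = u.netloc
    default = DEFAULT_PORTS.get(u.scheme)
    if default is not None and u.port == default:
        suffix = ':%d' % default
        if netloc.endswith(suffix):
            netloc = netloc[:-len(suffix)]
    return urlunsplit((u.scheme, netloc, u.path, u.query, u.fragment))
```

Both spellings of a URL then map to the same key, so lookups keyed on the
URL no longer miss.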
Eric Wong [Fri, 11 Jun 2021 09:42:40 +0000 (09:42 +0000)]
lei ls-mail-source: list IMAP folders and NNTP groups
While other tools can provide the same functionality, having
integration with git-credential is convenient, here. Caching
and completion will be implemented separately.
Eric Wong [Wed, 9 Jun 2021 23:27:50 +0000 (20:27 -0300)]
lei tag: less confusing warning about unimported messages
"unimported" is more meaningful than "missing", here. And
instead of having every worker spew about unimported messages,
we'll accumulate and only print one warning line. This
necessitated altering ->DESTROY behavior and persisting
the client socket within the $lei object itself, not just
the PktOp consumer object.
Eric Wong [Wed, 9 Jun 2021 22:39:24 +0000 (22:39 +0000)]
lei import: support --new-only for IMAP
Taking ~40s to synchronize a ~75K message IMAP folder is
still a lot of time, so support an option to only touch
new messages.
This is similar to "offlineimap -q" (quick) or "mbsync --new"
switches, but lei already accepts "-q" as a shortcut for
--quiet. "--new" could work, but "--new-only" might be more
descriptive (or "--only-new"?), since the default
also fetches new messages.
v2: warn for non-IMAP sources, I'm not sure it's worth it for
Maildir or other sources, yet. It will also make sense
for MH and JMAP once we support them.
Eric Wong [Wed, 9 Jun 2021 07:47:49 +0000 (07:47 +0000)]
lei tag: parallelize Maildir access
Since Maildir isn't guaranteed to have any sort of order, we
can parallelize inputs, here. On a 4-core system, this reduced
one of my tag invocations from 5.5 to 1.4s.
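Since the files are independent, the per-message work can be handed to a
pool of workers in any order. The following Python sketch uses threads
and a SHA-256 digest as a stand-in for the real per-message work; the
names, the keyword mapping and the use of threads are all assumptions for
illustration and say nothing about how lei distributes the work.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

# Maildir info flags (after ":2,") mapped to illustrative keyword names
FLAG2KW = {'D': 'draft', 'F': 'flagged', 'P': 'forwarded',
           'R': 'answered', 'S': 'seen'}


def keywords_for(fname):
    """Extract keywords from a Maildir filename such as "1.host:2,FS"."""
    _, sep, flags = fname.rpartition(':2,')
    if not sep:
        return []  # files in new/ normally carry no flags
    return sorted(FLAG2KW[f] for f in flags if f in FLAG2KW)


def tag_one(path):
    """Stand-in for per-message work: digest the file, read its flags."""
    with open(path, 'rb') as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return digest, keywords_for(os.path.basename(path))


def scan_maildir(maildir, jobs=4):
    """Process every message in new/ and cur/ using `jobs` workers."""
    paths = []
    for sub in ('new', 'cur'):
        d = os.path.join(maildir, sub)
        if os.path.isdir(d):
            paths.extend(os.path.join(d, n) for n in os.listdir(d))
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(tag_one, paths))
```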
Eric Wong [Wed, 9 Jun 2021 10:03:05 +0000 (10:03 +0000)]
lei/store: do eidx_init before creating R/W lms dbh
Sharing lms->{dbh} with eidx shards appears to be the cause of
the "Issuing rollback() due to DESTROY without explicit
disconnect() of DBD::SQLite::db handle" messages I've been
seeing from "lei up".