Sergey Matveev's repositories - public-inbox.git/log
public-inbox.git
Eric Wong [Sat, 13 Feb 2021 02:15:03 +0000 (02:15 +0000)]
examples/cgit-commit-filter: improve quoted text handling

With an example such as:

something before "quoted phrase" something after

Xapian will now see:

[ "something before", "quoted phrase", "something after" ]

whereas before it would see:

[ "something before", "quoted", "phrase", "something after" ]

which should improve the accuracy of search results when looking
up commits by their title (subject).
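
For illustration, here is a minimal Python sketch of the intended split, assuming double quotes are the only phrase delimiters and that whitespace around each piece is trimmed. `split_quoted` is a hypothetical name; the real filter is Perl and may treat edge cases such as unbalanced quotes differently.

```python
import re

def split_quoted(text):
    """Split text into quoted phrases and the unquoted runs between them."""
    parts = []
    # Each match is either a "..." phrase (group 1) or a run of
    # non-quote characters (group 2).
    for m in re.finditer(r'"([^"]*)"|([^"]+)', text):
        chunk = m.group(1) if m.group(1) is not None else m.group(2)
        chunk = chunk.strip()
        if chunk:  # skip whitespace-only runs between phrases
            parts.append(chunk)
    return parts

print(split_quoted('something before "quoted phrase" something after'))
# → ['something before', 'quoted phrase', 'something after']
```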

Kyle Meyer [Sat, 27 Feb 2021 18:03:28 +0000 (13:03 -0500)]
doc: lei-overview: add performance and bash completion sections

Take care of a couple of the items mentioned at
<https://public-inbox.org/meta/20210218202818.GA19443@dcvr>.

Kyle Meyer [Sat, 27 Feb 2021 18:03:27 +0000 (13:03 -0500)]
doc: lei-import: drop markup of "stdin"

stdin isn't placed in C<> elsewhere.

Kyle Meyer [Sat, 27 Feb 2021 18:03:26 +0000 (13:03 -0500)]
doc: lei: update manpages

Catch up with recent developments.

Eric Wong [Fri, 26 Feb 2021 09:41:41 +0000 (22:41 -1100)]
lei_xsearch: more detail about ->xdb call chain

I was just wondering this myself :x

Eric Wong [Fri, 26 Feb 2021 09:41:40 +0000 (22:41 -1100)]
t/lei_store: rename $lst to $sto

`$sto' is prevalent throughout the rest of the lei code,
and `$lst' seems like an abbreviation for "list".

I don't like the noise from commits like this, but I hope the
long-term payoff of being less confusing to new developers is
worth it...

Eric Wong [Fri, 26 Feb 2021 09:41:39 +0000 (22:41 -1100)]
lei import|convert: support mbox locking on reads

In case somebody is writing non-atomically, ensure we
take read locks when opening mbox files for reading.

v2: squash: load MboxLock even for .eml files

Eric Wong [Fri, 26 Feb 2021 09:41:38 +0000 (22:41 -1100)]
lei q: support mbox locking by default

While this diverges from mairix(1) behavior, it's the safer
option.  We'll follow Debian policy by supporting fcntl and
dotlocks by default (in that order).  Users who do not want
locking can use "--lock=none".

This will be used in a read-only capacity for watching
mailboxes for keyword updates via inotify or EVFILT_VNODE.
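
A rough Python sketch of taking an fcntl lock and then a dotlock, in that order, with "none" disabling both. `lock_mbox` and the "<mbox>.lock" dotlock name are assumptions for illustration; unlike a real implementation, this does not retry, time out, or break stale dotlocks.

```python
import fcntl
import os

def lock_mbox(path, methods=("fcntl", "dotlock"), write=False):
    """Take the requested locks in order; return (fd, release)."""
    fd = os.open(path, os.O_RDWR if write else os.O_RDONLY)
    dotlock = None
    try:
        for method in methods:
            if method == "fcntl":
                # Whole-file POSIX record lock: shared for readers,
                # exclusive for writers; blocks until granted.
                fcntl.lockf(fd, fcntl.LOCK_EX if write else fcntl.LOCK_SH)
            elif method == "dotlock":
                # O_EXCL makes creation atomic: it fails with EEXIST
                # if another process already holds "<mbox>.lock".
                lock_fd = os.open(path + ".lock",
                                  os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
                os.close(lock_fd)
                dotlock = path + ".lock"
            elif method != "none":
                raise ValueError("unknown lock method: %s" % method)
    except BaseException:
        # Undo partial locking before propagating the error.
        if dotlock:
            os.unlink(dotlock)
        os.close(fd)
        raise

    def release():
        if dotlock:
            os.unlink(dotlock)
        os.close(fd)  # closing the fd also drops its fcntl lock

    return fd, release
```

Taking the fcntl lock first matches the order stated above; the dotlock mainly helps with programs that only know about dotlocks.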

Eric Wong [Fri, 26 Feb 2021 09:41:37 +0000 (22:41 -1100)]
lei: style fix for $oldset declaration

We want /^sub oldset/ to match to keep editors and
things like ctags happy.

Eric Wong [Thu, 25 Feb 2021 10:11:06 +0000 (10:11 +0000)]
lei q: -tt marks direct hits as "flagged"

This can be used to quickly distinguish messages which were
direct hits when doing thread expansion vs messages that
were merely part of the same thread.

This is NOT mairix-derived behavior, but I occasionally found
it useful when looking at results in an MUA to know whether
a message was a direct hit or not.

This makes "-t" consistent with non-"-t" cases as far as keyword
reading goes.

Eric Wong [Thu, 25 Feb 2021 10:11:05 +0000 (10:11 +0000)]
test_common: io_modes: always support read/write

This avoids warnings when redirecting STDIN to a scalarref
via run_script().

Eric Wong [Thu, 25 Feb 2021 10:11:04 +0000 (10:11 +0000)]
lei import: use --in-format/-F for consistency

Since we recommend $IN_FORMAT:$LOCATION, this is hopefully not
intrusive (not that this is released software, yet).  This is
to be consistent with "lei convert" usage.

We'll keep "-f" only for output formats, since that is how
"lei q" and "lei convert" already use it.

Eric Wong [Thu, 25 Feb 2021 10:11:03 +0000 (10:11 +0000)]
lei convert: support IMAP output and "-F eml" inputs

eml ("message/rfc822" MIME type) is supported by "lei import",
so it probably makes sense to support it via convert, at least
for tests.  IMAP output is already supported by "lei q -o $MFOLDER",
so this only required renaming {nrd} => {net} and initializing
outputs before augment preparation (creating the IMAP folder).

Eric Wong [Wed, 24 Feb 2021 23:37:18 +0000 (05:37 +0600)]
lei q: auto-memoize remote messages into lei/store

This lets users avoid network traffic on subsequent searches at
the expense of local disk space.  --no-import-remote may be
specified to reverse this trade-off for users with little
storage.

Eric Wong [Wed, 24 Feb 2021 23:37:17 +0000 (05:37 +0600)]
lei_external: don't treat IPv6 URLs as globs

IPv6 addresses in URLs are hexadecimal digits and colons inside
brackets, so add some DWIM-ery to ensure we don't attempt to treat
addresses like "http://[dead:beef]/foo/" as a glob.
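
A Python sketch of that DWIM-ery, assuming only a bracketed host directly after "scheme://" (optionally after userinfo) is treated as an IPv6 literal, so a path component like "[abc]" still counts as a glob. `looks_like_glob` and the metacharacter set are hypothetical, not LeiExternal's actual code.

```python
import re

# Characters treated as glob metacharacters (an assumed set).
GLOB_CHARS = re.compile(r"[*?\[\]{}]")
# scheme://[userinfo@][IPv6 literal]: hex digits, dots (for embedded
# IPv4) and at least one colon inside the brackets, right after "//".
IPV6_HOST = re.compile(
    r"^[A-Za-z][A-Za-z0-9+.-]*://(?:[^/@\[]*@)?\[[0-9A-Fa-f.:]*:[0-9A-Fa-f.:]*\]")

def looks_like_glob(location):
    """True if location has glob metacharacters outside an IPv6 host."""
    return GLOB_CHARS.search(IPV6_HOST.sub("", location, count=1)) is not None
```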

Eric Wong [Wed, 24 Feb 2021 11:31:54 +0000 (17:31 +0600)]
net_reader: trim exports and remove unused uri_new

More network things for -watch are isolated in NetReader, now,
so fewer exports are necessary.

Eric Wong [Wed, 24 Feb 2021 11:31:53 +0000 (17:31 +0600)]
watch: switch IMAP and NNTP fetch loops to NetReader

NetReader::<imap|nntp>_each were based on the -watch
code they now replace.

v2: do not warn on EINTR if user quit to fix occasional
    test failure in t/imapd.t

Eric Wong [Wed, 24 Feb 2021 11:31:52 +0000 (17:31 +0600)]
lei <import|convert>: support NNTP sources

We can read NNTP in -watch and Net::NNTP is shipped with Perl5,
so lei import and convert have no excuse not to support NNTP
as a client.

Authentication is not tested yet, but it should be close to
what IMAP is like...

Eric Wong [Wed, 24 Feb 2021 11:31:51 +0000 (17:31 +0600)]
add PublicInbox::URInntps package

We prefer the IANA-registered form of URIs to avoid confusing
users, but the URI package has yet to support it.

cf. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983419

Eric Wong [Mon, 22 Feb 2021 21:38:22 +0000 (03:38 +0600)]
lei: avoid needless env passing to subcommands

We already localize %ENV before calling dispatch(), so
it's needless overhead in spawn() to be checking env for
undef values in those cases.

Eric Wong [Mon, 22 Feb 2021 21:38:21 +0000 (03:38 +0600)]
treewide: avoid "delete local" construct on hashes

Apparently this feature is only in Perl 5.12+, and we're
still on Perl 5.10.

Uwe Kleine-König [Wed, 24 Feb 2021 08:54:56 +0000 (09:54 +0100)]
www: use PublicInbox::WwwStream

This prevents the following problem logged to the webserver's error log:

E: Undefined subroutine &PublicInbox::WwwStream::code_footer called at /usr/share/perl5/PublicInbox/WwwListing.pm line 102.
 in PublicInbox::ConfigIter=ARRAY(0x557aea68b1a8)::each_section at /usr/share/perl5/PublicInbox/ConfigIter.pm line 37.

Fixes: 7a3946ef122e ("www: support listing of inboxes")
Eric Wong [Tue, 23 Feb 2021 10:01:16 +0000 (04:01 -0600)]
lei_to_mail: remove unused OnDestroy import

Eric Wong [Tue, 23 Feb 2021 10:01:15 +0000 (04:01 -0600)]
lei q: reduce default lei2mail workers

While disk I/O is typically buffered for good scheduling,
git blob decoding uses a non-trivial amount of CPU time
and it helps to leave some CPU available for it.

Eric Wong [Tue, 23 Feb 2021 10:01:14 +0000 (04:01 -0600)]
lei: support "-C" to chdir in all sub commands

We'll also support "-C" at the end of most commands to give
users a little more flexibility when building command-lines.
This conflicts with "lei daemon-kill -CHLD", so that's
special-cased since "-C" makes no sense with daemon-kill,
anyways.

Unlike "git show", the to-be-implemented "lei show" will diverge
and enable "--find-copies[=<n>]" by default, so "-C[<n>]" won't
be necessary.

Kyle Meyer [Tue, 23 Feb 2021 03:45:52 +0000 (22:45 -0500)]
doc: lei: favor "-o format:$PATHNAME" over "-f"

The --format argument is redundant and may be dropped entirely.
Update the lei manpages to prefer the format prefix.

cf. https://public-inbox.org/meta/20210217044032.GA17934@dcvr/

Eric Wong [Mon, 22 Feb 2021 11:22:59 +0000 (08:22 -0300)]
lei_auth: trim and remove leftover worker code

LeiAuth is no longer a separate worker process.  Instead, it's
used directly by LeiToMail and LeiImport for sharing auth info
from the first worker to the rest of the workers, using
lei-daemon as a message router.  So drop the old code to reduce
human cognitive load and interpreter memory overhead.

Eric Wong [Mon, 22 Feb 2021 11:22:58 +0000 (08:22 -0300)]
lei convert: inline convert_start

Since we stopped using LeiAuth as a WQ worker, keeping this
around as a single-use sub makes no sense and wastes several
KB of memory.

Eric Wong [Mon, 22 Feb 2021 11:22:57 +0000 (08:22 -0300)]
net_reader: mic_get: reuse connections if cache enabled

We only enable {mic_cached} in WQ workers, and those
aren't expected to fork again going forward.  So caching
here avoids a penalty for the non-augmenting (imap_delete_all)
call with "lei q".

Eric Wong [Mon, 22 Feb 2021 11:22:56 +0000 (08:22 -0300)]
lei q: reduce wasted IMAP connection for auth

We can rework the first lei2mail worker to authenticate, and
then share auth info with the rest of the lei2mail workers.  As
with "lei import", this uses PktOp and lei-daemon to share
updated credentials between the first and subsequent l2m workers.

Eric Wong [Mon, 22 Feb 2021 11:22:55 +0000 (08:22 -0300)]
lei_auth: migrate common auth code from lei_import

lei_to_mail will be able to use this, too.

Eric Wong [Mon, 22 Feb 2021 11:22:54 +0000 (08:22 -0300)]
lei import: no separate auth worker

We'll start sharing auth info from the first worker to the
rest of the workers via wq_broadcast.

This lays the groundwork for getting rid of LeiAuth workers for
authentication work and reducing network round trips required
for IMAP.

Eric Wong [Mon, 22 Feb 2021 11:22:53 +0000 (08:22 -0300)]
lei convert: auth directly from worker process

Since this only has one worker, we can auth directly in that
worker: the convert worker now has access to the script/lei
{sock} for running "git credential".

Eric Wong [Mon, 22 Feb 2021 11:22:52 +0000 (08:22 -0300)]
lei: _lei_cfg: return empty hashref if unconfigured

Existing callers in LeiExternal actually depend on this,
and LeiAuth shouldn't need to be creating a config file
just to do a conversion against an anonymous IMAP server.

Eric Wong [Mon, 22 Feb 2021 11:22:51 +0000 (08:22 -0300)]
lei: keep client {sock} in short-lived workers

For non-persistent workers, there's no harm in keeping the
client socket open.  This means we can avoid dancing around
closing it in PublicInbox::LeiAuth::ipc_atfork_child.
Eventually, other WQ workers will trigger "git credential"
spawning in script/lei directly.

Eric Wong [Mon, 22 Feb 2021 11:22:50 +0000 (08:22 -0300)]
lei_auth: rename {nrd} field to {net} for clarity

We're authing for both reads and writes; this makes it
clear that we support both NetReader and NetWriter.

Eric Wong [Mon, 22 Feb 2021 06:18:55 +0000 (06:18 +0000)]
lei_store: populate ALL.git/alternates with new epochs

Since eidx_init updates ALL.git/objects/info/alternates, we need
to ensure new epochs we create from LeiStore->importer exist
before eidx_init writes alternates.

Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/8735xou0gq.fsf@kyleam.com/
Eric Wong [Sun, 21 Feb 2021 19:59:06 +0000 (01:59 +0600)]
t/lei*: drop $lei->(...) sub

lei() and lei_ok() are superior since they offer prototype
checks and lei_ok() adds another check + description DRY-ness.

The $lei sub was only bound to a variable since it was in
t/lei.t and named subs don't work well with the key2sub()
wrapper.

Eric Wong [Sun, 21 Feb 2021 18:28:15 +0000 (00:28 +0600)]
lei-daemon: prefer graceful shutdowns

We'll keep the daemon alive as long as a script/lei client
remains connected.  This ought to improve user experience
and is in line with what -imapd/-httpd/-nntpd users have
expected over the years.

Eric Wong [Sun, 21 Feb 2021 18:28:07 +0000 (00:28 +0600)]
tests: clean up t/home* consistently

And update t/home2/README while we're at it.

Followup-to: 4ea3975dbed0a533 ("tests: setup_public_inboxes: use IMAP-friendly newsgroups")
Kyle Meyer [Sun, 21 Feb 2021 21:46:11 +0000 (16:46 -0500)]
t/www_listing: require grok-pull version 2 or later

The grok-pull-based tests in www_listing are incompatible with
Grokmirror v2 in two ways: the generated configuration format and the
expected exit codes.  Update the tests to work with v2, and skip them
for earlier versions.

This was tested with the latest release of Grokmirror, v2.0.7.  Note
that the "pull" and "fsck" sections are required even though they're
empty.

Kyle Meyer [Sun, 21 Feb 2021 21:46:10 +0000 (16:46 -0500)]
t/www_listing: reword grok-pull skip message

Make it clear that this skip happens because grok-pull isn't
available at all, since the next commit will add another skip for
older versions of Grokmirror.

Kyle Meyer [Sun, 21 Feb 2021 21:46:09 +0000 (16:46 -0500)]
t/www_listing: correct the number of tests for grok-pull skip

Eric Wong [Sun, 21 Feb 2021 07:41:34 +0000 (07:41 +0000)]
lei2mail: parallel augment for lock-free stores

This lets us make use of multiple cores on IMAP and Maildir
backed by SSD (or better) storage.  This benefits IMAP stores
with high network latency, but may still penalize IMAP servers
with rotational storage.

Eric Wong [Sun, 21 Feb 2021 07:41:33 +0000 (07:41 +0000)]
net_reader: use and accept URIimap objects in more places

This flexibility should save us some code down-the-line.

Eric Wong [Sun, 21 Feb 2021 07:41:32 +0000 (07:41 +0000)]
ipc: support setting a locked number of WQ workers

We can use this to ensure sharded work doesn't do unexpected
things if workers are added/removed.  We currently don't
increase/decrease workers once a workqueue is started, but
non-lei code (-httpd/imapd) may start doing so.

This also fixes a bug where lei2mail workers could not
be adjusted via --jobs on the command-line.

Eric Wong [Sun, 21 Feb 2021 07:41:31 +0000 (07:41 +0000)]
lei q: move augment into lei2mail workers

This is a step which will allow us to parallelize augment
on Maildir and IMAP.

Eric Wong [Sun, 21 Feb 2021 07:41:30 +0000 (07:41 +0000)]
ipc: add wq_broadcast

We'll give workqueues a broadcast mechanism to ensure all
workers see a certain message.  We'll also tag each worker
with {-wq_worker_nr} in preparation for work distribution.

This is intended to avoid extra connection and fork() costs
from LeiAuth in a future commit.

Eric Wong [Sun, 21 Feb 2021 07:41:29 +0000 (07:41 +0000)]
lei q: support IMAP/IMAPS --output destinations

Augment (and dedupe) aren't parallel yet, so they're more sensitive to
high-latency networks.

Eric Wong [Sun, 21 Feb 2021 07:41:28 +0000 (07:41 +0000)]
inbox_writable: require PublicInbox::MdirReader

This wasn't causing known failures, but maybe it was or will in
the future.

Eric Wong [Fri, 19 Feb 2021 19:36:56 +0000 (19:36 +0000)]
t/net_reader-imap: fix under TEST_RUN_MODE=0

PublicInbox::Config isn't loaded elsewhere by this file.

Eric Wong [Fri, 19 Feb 2021 12:09:55 +0000 (05:09 -0700)]
URIimap: overload "" to ->as_string

This interpolation is used by the upstream URI package
and we rely on it elsewhere for HTTP(S) URIs, so save
ourselves some surprises down the line.

Eric Wong [Fri, 19 Feb 2021 12:09:54 +0000 (05:09 -0700)]
net_writer: start implementing IMAP write support

Requiring TEST_IMAP_WRITE_URL to be set to a writable IMAP
server URL isn't ideal, but it works for now until we have time
to set up a mock dovecot/cyrus/etc... instance for testing.

Eric Wong [Fri, 19 Feb 2021 12:09:53 +0000 (05:09 -0700)]
net_reader: handle single-message IMAP mailboxes

Due to an off-by-one error, we were unable to read mailboxes
with only a single message of UID:1.  Without this fix, the
message with UID:1 could only be read after UID:2 was created;
so there's no permanent data loss as long as a new message
showed up.

This affects all releases of public-inbox-watch with IMAP
support, though it probably went unnoticed because single
message inboxes are rare.

Eric Wong [Fri, 19 Feb 2021 12:09:52 +0000 (05:09 -0700)]
tests: require Mail::IMAPClient for IMAP tests

All of our current IMAP code relies on Mail::IMAPClient
at the moment, so ensure we skip those tests on systems
without that module.

Eric Wong [Fri, 19 Feb 2021 12:09:51 +0000 (05:09 -0700)]
lei_to_mail: get rid of empty _post_augment_maildir

We won't have _post_augment_imap when we add IMAP support,
either.

_pre_augment_imap will not exist, either: opening an IMAP(S)
connection can be time-consuming, so we'll roll that into
imap_common_init.

Eric Wong [Fri, 19 Feb 2021 12:09:50 +0000 (05:09 -0700)]
t/lei-externals: favor "-o format:$PATHNAME" over "-f"

It'll be less ambiguous for inputs with "lei convert" and "lei import".

cf. https://public-inbox.org/meta/20210217044032.GA17934@dcvr/

Eric Wong [Fri, 19 Feb 2021 00:58:32 +0000 (00:58 +0000)]
emergency: modernize and reduce syscalls

As with LeiToMail, we'll exclusively rely on O_EXCL and EEXIST
instead of "-f" (stat(2)) for file name collision checking.
Furthermore, we can rely on link(2) error handling instead of
using stat(2) to check the result of link(2).

We'll still keep the hostname in these filenames, but memoize it
on a per-instance basis since hostname changes are rare and we
can assume it won't change between "tmp" and "cur".

We'll also start embedding the PID as {"tmp.$$"} into the field
name to guard against accidental deletion in child processes,
instead of requiring an extra hash lookup.

Finally, avoid multiple getpid(2) syscalls in internal subs
since glibc no longer caches the PID in getpid(3).

We'll also favor constant comparison of $! against EEXIST for
inlining, and stop doing ->autoflush when we only have a single
print + flush.
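
A hypothetical Python analogue of the tmp -> cur delivery described above: O_EXCL detects name collisions at open(2) time and link(2) reports its own EEXIST, so no stat(2) calls are needed. `EmergencyDir` and the PID.SEQ.HOST naming are assumptions for illustration, not PublicInbox::Emergency's actual layout.

```python
import os
import socket

class EmergencyDir:
    """Hypothetical sketch: Maildir-like tmp -> cur delivery."""

    def __init__(self, root):
        self.root = root
        self.host = socket.gethostname()  # memoized once per instance
        for sub in ("tmp", "cur"):
            os.makedirs(os.path.join(root, sub), exist_ok=True)

    def deliver(self, data):
        pid = os.getpid()  # one getpid(2) call per delivery
        for seq in range(1000):
            name = "%d.%d.%s" % (pid, seq, self.host)
            tmp = os.path.join(self.root, "tmp", name)
            try:
                # O_EXCL: creation fails with EEXIST on a collision,
                # so no separate existence check is needed.
                fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            except FileExistsError:
                continue
            with os.fdopen(fd, "wb") as fh:
                fh.write(data)  # single write, flushed on close
            cur = os.path.join(self.root, "cur", name)
            try:
                # link(2) reports EEXIST itself; other errors propagate.
                os.link(tmp, cur)
            except FileExistsError:
                continue  # name taken in cur/: retry with the next seq
            finally:
                os.unlink(tmp)  # the tmp/ entry is never needed afterwards
            return cur
        raise RuntimeError("too many name collisions")
```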

Eric Wong [Fri, 19 Feb 2021 00:58:31 +0000 (00:58 +0000)]
lei_to_mail: Maildir: ensure link(2) succeeds

link(2) may fail with errors other than EEXIST; just bail out
since something is likely seriously wrong.

Eric Wong [Thu, 18 Feb 2021 20:22:25 +0000 (23:22 +0300)]
lei: check for IMAP auth errors

We need to ensure authentication failures and error codes get
propagated to the parent process(es) properly.

v2: update MANIFEST
v3: LeiAuth.pm ->_lei_cfg bit moved to a previous commit

Eric Wong [Thu, 18 Feb 2021 20:22:24 +0000 (23:22 +0300)]
lei: consolidate the bulk of the IPC code

The backends for "lei add-external --mirror", "lei convert", and
"lei import" all share a similar pattern for spawning background
workers.  Hoist out the common parts to slim down our code base
a bit.

The LeiXSearch and LeiToMail workers for "lei q" remain the
odd ducks due to the deep pipelining and parallelization.

Eric Wong [Thu, 18 Feb 2021 20:22:23 +0000 (23:22 +0300)]
lei import: add IMAP and (maildir|mbox*):$PATHNAME support

This makes "lei import" more similar to "lei convert" and
allows importing from disparate sources simultaneously.

We'll also fix some ->child_error usage errors and make
the style of the code more similar to the "lei convert"
code.

v2: fix missing requires

Eric Wong [Thu, 18 Feb 2021 20:22:22 +0000 (23:22 +0300)]
lei convert: mail format conversion sub-command

This will make testing IMAP support for other commands easier, as
it doesn't write to lei/store at all.  Like the pager and MUA,
"git credential" is always spawned by script/lei (and not
lei-daemon) so it has a controlling terminal for password
prompts.

v2: fix missing requires, correct test ordering
v3: ensure config exists for IMAP auth

Eric Wong [Thu, 18 Feb 2021 12:27:09 +0000 (18:27 +0600)]
lei: completion: bash: generalize nospace usage

We'll be completing more options with ':', '//' and '=' in the
future, so make it easier to disable trailing spaces on
completions.

Eric Wong [Wed, 17 Feb 2021 10:07:03 +0000 (09:07 -0100)]
t/lei_to_mail: remove unnecessary arg passing

{zpipe} is contained entirely within the $l2m object, now.

Eric Wong [Wed, 17 Feb 2021 10:07:02 +0000 (09:07 -0100)]
tests: setup_public_inboxes: use IMAP-friendly newsgroups

-imapd won't support newsgroups ending with /\.[0-9]+\z/ since
it reserves those for partitioning inboxes into 50K slices.
So bump the home[0-9]+ version and switch to IMAP-friendly
newsgroup names.
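
A small Python sketch of the name check implied by that regexp; note that Perl's `\z` corresponds to Python's `\Z` (absolute end of string). `imap_friendly` is a hypothetical helper, and -imapd's actual validation may apply more rules.

```python
import re

# Names ending in "." followed by digits are reserved by -imapd for
# 50K-message slices, so such newsgroups are not IMAP-friendly.
RESERVED_SUFFIX = re.compile(r"\.[0-9]+\Z")

def imap_friendly(newsgroup):
    return RESERVED_SUFFIX.search(newsgroup) is None
```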

Eric Wong [Wed, 17 Feb 2021 10:07:01 +0000 (09:07 -0100)]
lei import: move check_input_format to lei

We'll be supporting "lei convert" in a future change; so it
makes sense to share a common internal API for common error
messages.

Eric Wong [Wed, 17 Feb 2021 10:07:00 +0000 (09:07 -0100)]
lei import: start rearranging code for IMAP support

More to come in a later commit; some error handling and failure
modes will be trickier with IMAP due to authentication.

Eric Wong [Wed, 17 Feb 2021 10:06:59 +0000 (09:06 -0100)]
watch: connect to NNTP and IMAP in config order

This is hopefully less surprising to users when they're prompted
for credentials.

Eric Wong [Wed, 17 Feb 2021 10:06:58 +0000 (09:06 -0100)]
watch: move imap_common_init to NetReader

We'll use this in LeiImport and likely other places.

Eric Wong [Wed, 17 Feb 2021 10:06:57 +0000 (09:06 -0100)]
lei: bless config

We'll be needing ->url_match from PublicInbox::Config

Eric Wong [Mon, 15 Feb 2021 07:43:44 +0000 (07:43 +0000)]
lei: fail_handler: use correct exit code

We were shifting in the wrong direction :x

Eric Wong [Mon, 15 Feb 2021 02:36:38 +0000 (02:36 +0000)]
t/psgi_search: fix test around date boundaries

git approxidate won't actually return times in the future,
so "1.{hour,day,year}.from.now" all return the current epoch
time.

So just use "now" and ensure we have a predictable time zone for
testing.

Eric Wong [Thu, 11 Feb 2021 05:57:28 +0000 (12:57 +0700)]
search: query_approxidate: cleanup regexp, more tests

The cleanup doesn't seem to matter; I initially thought I needed
to handle "" (two double quotes) explicitly because that's what
Xapian does to escape a double quote inside a double-quoted
phrase.  It turns out we only need to be able to pass phrases
through to Xapian unmodified, and the existing group of
["\x{201c}\x{201d}] is sufficient for our purposes.

Eric Wong [Fri, 12 Feb 2021 07:05:52 +0000 (00:05 -0700)]
mbox_reader: do not chomp non-blank EOL

It's conceivable some cases won't generate an empty line before
an mboxrd or mboxo From_ line.  Ensure we can handle that case
and don't leave the Eml->{bdy} without a trailing LF character.

And drop an unnecessary alarm import while we're in the area.
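
A Python sketch of the behavior described above for mboxrd input: the trailing newline is removed only when it forms a genuinely blank separator line, so a message that runs straight into the next From_ line keeps its final LF. `split_mboxrd` is hypothetical; it is not MboxReader's code and ignores other mbox variants.

```python
import re

FROM_LINE = re.compile(rb"^From ", re.M)
ESCAPED_FROM = re.compile(rb"^>(>*From )", re.M)

def split_mboxrd(raw):
    """Split mboxrd bytes into raw messages (headers + body)."""
    starts = [m.start() for m in FROM_LINE.finditer(raw)]
    msgs = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(raw)
        chunk = raw[start:end]
        # Drop the From_ line itself.
        chunk = chunk.split(b"\n", 1)[1] if b"\n" in chunk else b""
        # Remove the separator only if it really is a blank line; a
        # message running straight into the next From_ keeps its LF.
        if chunk.endswith(b"\n\n"):
            chunk = chunk[:-1]
        # mboxrd unescaping: ">From " -> "From ", ">>From " -> ">From ".
        msgs.append(ESCAPED_FROM.sub(rb"\1", chunk))
    return msgs
```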

Eric Wong [Fri, 12 Feb 2021 07:05:51 +0000 (00:05 -0700)]
import_mbox: use MboxReader

It supports more mbox variants and its trailing newline
behavior is probably more correct despite the previous change
to PublicInbox::Filter::Vger.

Eric Wong [Fri, 12 Feb 2021 07:05:50 +0000 (00:05 -0700)]
filter/vger: kill trailing newlines aggressively

PublicInbox::MboxReader->(mboxrd|mboxo) only deletes the last
trailing newline, not every single trailing newline like
InboxWritable->import_mbox does.

When testing PublicInbox::MboxReader->mboxrd (next commit) with
scripts/import_vger_from_mbox on the LKML archive I got in 2018 for
v2 development, this difference was responsible for a single
spam message(*) out of 2722831 not being filtered correctly
and producing a different result.

(*) dated 2014-08-25
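
To see why the two readers could disagree, here is a small Python illustration of the two trailing-newline policies and of the filter-side normalization. All three function names are hypothetical stand-ins, not public-inbox APIs.

```python
def drop_last_newline(body):
    """MboxReader-style: remove only the final newline."""
    return body[:-1] if body.endswith("\n") else body

def drop_all_newlines(body):
    """import_mbox-style: remove every trailing newline."""
    return body.rstrip("\n")

def vger_normalize(body):
    """Filter-side fix: strip trailing newlines aggressively."""
    return body.rstrip("\n")

body = "spam spam spam\n\n\n"
# The raw results differ, but the normalized ones agree.
same = vger_normalize(drop_last_newline(body)) == vger_normalize(drop_all_newlines(body))
```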

Eric Wong [Wed, 10 Feb 2021 19:57:59 +0000 (18:57 -0100)]
search: disallow spaces in argv approxidate queries

This is for consistency with --stdin and WWW front ends
which can't distinguish between phrase searches and
prefix ranges used for d:/dt:/rt:.

In any case, I expect users on the lei command-line are more
likely to use `5.days.ago' instead of `"5 days ago"'

Eric Wong [Wed, 10 Feb 2021 19:57:58 +0000 (18:57 -0100)]
search: use git approxidate in WWW and "lei q --stdin"

This greatly improves the usability of d:, dt:, and rt: search
prefixes for users already familiar with git's "approxidate" feature.

That is, users familiar with the --(since|after|until|before)=
options in git-log(1) and similar commands will be able to use
those dates in the WWW UI.

Kyle Meyer [Thu, 11 Feb 2021 04:04:15 +0000 (23:04 -0500)]
doc: lei: update manpages

Catch up with recent developments.

Kyle Meyer [Thu, 11 Feb 2021 04:04:14 +0000 (23:04 -0500)]
doc: add lei-import(1)

Kyle Meyer [Thu, 11 Feb 2021 04:04:13 +0000 (23:04 -0500)]
doc: lei: prefer 'location' and 'dirname'

This follows the help output change in 52342875 (lei help: split out
into separate file, 2021-02-06).

Kyle Meyer [Thu, 11 Feb 2021 04:04:12 +0000 (23:04 -0500)]
doc: lei q: use 'mfolder' as --output placeholder

'mfolder' is familiar to mairix users, and 'path' isn't a good choice
because support will be added for IMAP.

Link: https://public-inbox.org/meta/YCBh62OqkYnr5cqw@dcvr
Eric Wong [Wed, 10 Feb 2021 21:50:48 +0000 (21:50 +0000)]
tests: skip properly with git <2.6

Tested with git 1.8.3.1 on CentOS 7.x

`plan skip_all => ...' doesn't work after some tests have run;
we have to call skip() instead.

Eric Wong [Wed, 10 Feb 2021 09:59:26 +0000 (08:59 -0100)]
search: fix argv handling of quoted phrases

This fixes both an old bug in "lei q" argv handling and one
recent regression introduced with the change to use approxidate.

Field prefixes are also handled correctly inside parenthesized
statements when the field follows "(" without a separation
character.

Fixes: fbb7ccabbf54a405 ("lei q: use git approxidate with d:, dt: and rt: ranges")
Eric Wong [Wed, 10 Feb 2021 08:38:39 +0000 (07:38 -0100)]
lei_external: fix+test handling of escaped braces

While '{' and '}' are rare in path names, somebody may still
use them or deal with software which does (e.g. GNU arch).

Eric Wong [Wed, 10 Feb 2021 07:07:49 +0000 (07:07 +0000)]
net_reader: new package split from -watch

We'll be using some of this for IMAP and NNTP support in lei,
too.  More will need to be done to improve code sharing and
reusability, soon, but this is a start.

Eric Wong [Wed, 10 Feb 2021 07:07:48 +0000 (07:07 +0000)]
lei: note some TODO items (curl, externals)

I don't know if it's worth it to use libcurl directly
(nor the effort to support and maintain tests)

Eric Wong [Wed, 10 Feb 2021 07:07:47 +0000 (07:07 +0000)]
lei ls-external: support --local and --remote

Similar to "lei q", "--local" means only local and "--remote"
means remote only.  I can't think of a reason to have --no-*
variants for these switches.

There are also updates to TestCommon for more common lei
cases.

Eric Wong [Wed, 10 Feb 2021 07:07:46 +0000 (07:07 +0000)]
test_common: support lei-daemon only testing

Daemon-only tests can be significantly faster due to cached
configs; so give developers a chance to test only daemons to
improve productivity.

The differences between daemon and oneshot modes are minimal,
at this point.

Eric Wong [Wed, 10 Feb 2021 07:07:45 +0000 (07:07 +0000)]
lei_external: remove unnecessary Exporter use

We don't need to export for methods which are only called via
"->" or "->can".

Eric Wong [Wed, 10 Feb 2021 07:07:44 +0000 (07:07 +0000)]
lei *external: glob improvements, ls-external filtering

"lei ls-external" now accepts the same glob patterns used
with lei q --{include,only,exclude}.  If no glob is detected, it
will be treated as a literal substring match (like "grep -F").

Inverting matches is also supported ("grep -v").
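
A Python sketch of that matching rule, assuming `*`, `?` and `[` are what mark a pattern as a glob. `filter_externals` is hypothetical; `fnmatch` does no brace expansion and lets `*` match `/`, so lei's own glob handling may well differ.

```python
import fnmatch
import re

def has_glob_chars(pat):
    # Assumed metacharacter set; brace expansion is not handled here.
    return re.search(r"[*?\[]", pat) is not None

def filter_externals(locations, pat, invert=False):
    """Glob match if pat has metacharacters, else substring (grep -F)."""
    if has_glob_chars(pat):
        def hit(loc):
            return fnmatch.fnmatchcase(loc, pat)
    else:
        def hit(loc):
            return pat in loc
    # invert=True keeps the non-matching entries, like "grep -v".
    return [loc for loc in locations if hit(loc) != invert]
```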

Eric Wong [Tue, 9 Feb 2021 08:09:37 +0000 (07:09 -0100)]
tests|lei: fixes for TEST_RUN_MODE=0 and lei oneshot

DESTROY callbacks can clobber $?, so we must take care to
preserve it when exiting.  We'll also try to make an effort to
ensure better DESTROY ordering and delete as much as possible
before x_it finishes.

We also need to load PublicInbox::Config when setting up
public inboxes.

Eric Wong [Tue, 9 Feb 2021 08:09:36 +0000 (07:09 -0100)]
lei: replace "I:"-prefixed info messages with "#"

The "#" is what TAP <https://testanything.org/> uses,
which is also consistent with what our (and many other)
test suites emit.

Eric Wong [Tue, 9 Feb 2021 08:09:35 +0000 (07:09 -0100)]
t/run.perl: drop Cwd dependency

Perl 5.8.8/5.10.0+ can use fchdir(), and we depend on 5.10.1+
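
In Python terms, the idea is to remember the current directory as an open file descriptor and return to it with fchdir(2), instead of recording its path via getcwd (which is what Cwd provides in Perl). `run_in_dir` is a hypothetical helper.

```python
import os

def run_in_dir(path, fn):
    """Run fn() inside path, then return via fchdir(2); no getcwd needed."""
    saved = os.open(".", os.O_RDONLY)  # fd referring to the current directory
    try:
        os.chdir(path)
        return fn()
    finally:
        os.fchdir(saved)  # works even if the old directory was renamed
        os.close(saved)
```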

Eric Wong [Tue, 9 Feb 2021 08:09:34 +0000 (07:09 -0100)]
lei q: prefix --alert ops with ':' instead of '-'

Using dashed keywords confuses the option parser without
"=" signs (and bash completion doesn't yet work with "=").

So use ":" instead of "-" as the prefix for internal ops,
since ":" is just as unlikely to be the first character of
an executable file in a user's $PATH.

Eric Wong [Tue, 9 Feb 2021 08:09:33 +0000 (07:09 -0100)]
use MdirReader in -watch and InboxWritable

MdirReader now handles files in "$MAILDIR/new" properly and
is stricter about what it accepts.  eml_from_path is also
made robust against FIFOs while eliminating TOCTOU races
between stat(2) and open(2) calls.
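
A Python sketch of the usual open-then-fstat pattern, which is one way to get both properties: O_NONBLOCK keeps open(2) from hanging on a FIFO, and checking the already-open descriptor leaves no window between the check and the open. `read_regular_file` is a hypothetical name, not MdirReader's API.

```python
import os
import stat

def read_regular_file(path):
    """Return file contents, or None if missing or not a regular file."""
    try:
        # O_NONBLOCK: opening a FIFO with no writer returns immediately.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except FileNotFoundError:
        return None  # vanished, e.g. moved by another process
    try:
        st = os.fstat(fd)  # inspects the file we actually opened
        if not stat.S_ISREG(st.st_mode):
            return None  # FIFO, device, directory, ...
        with os.fdopen(fd, "rb") as fh:  # fh now owns fd
            fd = -1
            return fh.read()
    finally:
        if fd >= 0:
            os.close(fd)
```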

Eric Wong [Tue, 9 Feb 2021 08:09:32 +0000 (07:09 -0100)]
t/run.perl: fix for >128 tests

We need to explicitly close the write-end of the pipe in workers
to ensure they don't prevent each other from seeing EOF.

Also, make a note to keep using the pipe for now since
Linux <3.14 had broken read(2) semantics when file descriptions
are shared across threads/processes.
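
A minimal Python sketch of the pattern, assuming workers pull fixed-size job records from one shared pipe; the actual t/run.perl protocol may differ, and `run_jobs` and `REC` are hypothetical. Without the `os.close(w)` in each worker, every worker would hold the write end open and block in read(2) forever.

```python
import os

REC = 64  # fixed-size job records, under PIPE_BUF, so writes are atomic

def _worker(r, w, out_w):
    os.close(w)  # drop the inherited write end, or EOF never arrives
    done = 0
    while True:
        rec = os.read(r, REC)  # one whole record per read on modern Linux
        if not rec:  # EOF: every copy of the write end is closed
            break
        done += 1  # a real runner would run the test named in rec here
    os.write(out_w, b"%d\n" % done)  # a single small write

def run_jobs(jobs, nworkers=4):
    """Hand jobs to forked workers over one shared pipe; count completions."""
    recs = [job.encode() for job in jobs]
    if any(len(rec) > REC for rec in recs):
        raise ValueError("job name too long")
    r, w = os.pipe()
    out_r, out_w = os.pipe()
    pids = []
    for _ in range(nworkers):
        pid = os.fork()
        if pid == 0:  # child: never return into the parent's code
            status = 1
            try:
                os.close(out_r)
                _worker(r, w, out_w)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    os.close(r)
    os.close(out_w)
    for rec in recs:
        os.write(w, rec.ljust(REC, b"\0"))
    os.close(w)  # workers closed theirs too, so readers now see EOF
    for pid in pids:
        os.waitpid(pid, 0)
    with os.fdopen(out_r, "rb") as fh:
        return sum(int(line) for line in fh)
```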

Eric Wong [Tue, 9 Feb 2021 08:09:31 +0000 (07:09 -0100)]
lei: split out MdirReader package, lazy-require earlier

We'll do more requires in the top-level lei-daemon process to
save work in workers.  We can also work towards aborting on
user errors in lei-daemon rather than worker processes.

"lei import -f mbox*" is finally tested inside t/lei_to_mail.t

Eric Wong [Tue, 9 Feb 2021 08:09:30 +0000 (07:09 -0100)]
git: ->qx: respect caller's $/ in array context

This could lead to bad results when doing ls-tree -z
for v2 import in case there are multiple files.  In any case,
the `local $/ = "\0"' in Import.pm is also eliminated to
reduce potential confusion and surprises.
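
A Python sketch of the idea: the caller chooses the record separator (like Perl's `$/`), so NUL-terminated output such as `git ls-tree -z` splits correctly even when a path contains a newline. `qx_lines` is hypothetical and, like Perl's readline, keeps the separator on each record.

```python
import subprocess

def qx_lines(cmd, sep=b"\n"):
    """Run cmd and split stdout into records terminated by sep."""
    out = subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout
    recs = out.split(sep)
    # Keep the terminator on each complete record; a final partial
    # record (no trailing sep) is returned as-is.
    return [rec + sep for rec in recs[:-1]] + ([recs[-1]] if recs[-1] else [])
```

For example, `qx_lines(["git", "ls-tree", "-z", "HEAD"], b"\0")` would yield one entry per tree entry, whereas the default `b"\n"` would mis-split any file name containing a newline.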