Eric Wong [Thu, 31 Dec 2020 13:51:52 +0000 (13:51 +0000)]
avoid calling waitpid from children in DESTROY
Objects with DESTROY callbacks get propagated to children, so we
must be careful to not invoke waitpid from children on their
sibling processes. Only parents (and their parents...) can reap
child processes.
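
The rule above can be sketched outside of Perl. The following is a minimal Python illustration, not the project's code: the class name ChildHandle and its attributes are hypothetical. It assumes the object records the PID of the process that created it and refuses to reap from any other process.

```python
import os


class ChildHandle:
    """Hypothetical wrapper: only the spawning process may reap the child."""

    def __init__(self, pid):
        self.pid = pid
        self.owner_pid = os.getpid()  # the only process allowed to waitpid()
        self.status = None

    def reap(self):
        # Nothing left to reap, or we are a forked copy of the owner:
        # from a forked copy's point of view self.pid is a sibling.
        if self.pid is None or os.getpid() != self.owner_pid:
            return False
        _, self.status = os.waitpid(self.pid, 0)
        self.pid = None
        return True

    def __del__(self):
        # Destructor plays the role of a DESTROY callback in this sketch.
        self.reap()
```

A forked copy of such an object sees a different os.getpid() and so skips the waitpid() call entirely.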
Eric Wong [Thu, 31 Dec 2020 13:51:51 +0000 (13:51 +0000)]
lei: avoid Spawn package when starting daemon
Spawn was designed to speed up process spawning inside
long-lived daemons with largish memory usage. It does not help
for short-lived scripts which only exist to start and connect to
a daemon.
This change actually speeds up initial lei startup from
~190ms to ~140ms(!). Normal usage once the daemon is running
is unaffected, at <20ms for help text.
While we're in the area, simplify Cwd error message generation,
too.
Eric Wong [Thu, 31 Dec 2020 13:51:50 +0000 (13:51 +0000)]
syscall: SFD_NONBLOCK can be a constant, again
Since Perl exposes O_NONBLOCK as a constant, we can safely make
SFD_NONBLOCK a constant, too. This is not the case for
SFD_CLOEXEC, since O_CLOEXEC is not exposed by Perl despite
being used internally in the interpreter.
Eric Wong [Thu, 31 Dec 2020 13:51:49 +0000 (13:51 +0000)]
use PublicInbox::DS for dwaitpid
This simplifies our code and provides a more consistent API for
error handling. PublicInbox::DS can be loaded nowadays on all
*BSDs and Linux distros easily without extra packages to
install.
The downside is possibly increased startup time, but it's
probably not as big a problem with lei being a daemon
(and -mda possibly following suit).
Eric Wong [Thu, 31 Dec 2020 13:51:47 +0000 (13:51 +0000)]
searchidxshard: call DS->Reset at worker start
The daemon for the local email interface will be inside
the DS->EventLoop. -watch currently doesn't trigger this
bug since it doesn't enable parallelism, but it may in
the future.
Eric Wong [Thu, 31 Dec 2020 13:51:46 +0000 (13:51 +0000)]
lei_to_mail: open FIFOs O_WRONLY so we block
Opening a FIFO with O_RDWR always succeeds on Linux, which
causes the cat(1) process invoked by t/lei_to_mail.t to get
stuck. Furthermore O_APPEND makes no sense on FIFOs and
perhaps there's some kernel out there which will reject it.
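
The dependence of a write-only open on a reader can be observed directly. This is a small Python sketch under stated assumptions, not the lei_to_mail code: the helper names are made up, and O_NONBLOCK is added only so the "no reader yet" case shows up as ENXIO instead of blocking the demo.

```python
import errno
import os
import tempfile


def try_open_fifo_for_write(path):
    """Return (fd, None) on success or (None, errno) if no reader exists."""
    try:
        # Without O_NONBLOCK this open() would wait until a reader appears.
        return os.open(path, os.O_WRONLY | os.O_NONBLOCK), None
    except OSError as e:
        return None, e.errno


def demo():
    with tempfile.TemporaryDirectory() as d:
        fifo = os.path.join(d, "out.fifo")
        os.mkfifo(fifo)
        _, err_without_reader = try_open_fifo_for_write(fifo)
        reader = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
        fd, err_with_reader = try_open_fifo_for_write(fifo)
        os.close(reader)
        if fd is not None:
            os.close(fd)
        return err_without_reader, err_with_reader
```

With O_RDWR the open cannot fail this way on Linux, because the opener counts as its own reader.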
Eric Wong [Thu, 31 Dec 2020 13:51:40 +0000 (13:51 +0000)]
lei_to_mail: unlink mboxes if not augmenting
This matches mairix(1) behavior and may be safer if there's
concurrent readers on the existing mbox, especially since
we don't currently implement mbox locking (nor does mairix).
Eric Wong [Thu, 31 Dec 2020 13:51:39 +0000 (13:51 +0000)]
ipc: use shutdown(2), base atfork* callback
shutdown(2) on a socket can be preferable if there's multiple
forked processes writing to a single worker and we really want
to shut things down ASAP.
It may also be good to provide an ipc_worker_exit method which
subclasses can override if needed for graceful shutdown. But we
won't need equivalents to atexit(3) since we can rely on DESTROY
handlers given this is Perl5.
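
Why shutdown(2) helps when several processes hold the same socket can be shown with a duplicated descriptor. The sketch below is Python and purely illustrative; os.dup() stands in for the copy a forked process would hold, and the function name is an assumption.

```python
import os
import socket


def eof_after_shutdown():
    a, b = socket.socketpair()
    dup_fd = os.dup(a.fileno())   # stands in for a copy held by another process
    a.shutdown(socket.SHUT_WR)    # acts on the socket, not on one descriptor
    data = b.recv(1)              # peer sees EOF although dup_fd is still open
    os.close(dup_fd)
    a.close()
    b.close()
    return data
```

A plain close() on one descriptor would not deliver EOF to the peer until every copy was closed.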
Eric Wong [Thu, 31 Dec 2020 13:51:36 +0000 (13:51 +0000)]
mid: use defined-or with `push' for uniqueness check
As shown recently in commit a05445fb400108e60ede7d377cf3b26a0392eb24
("config: config_fh_parse: micro-optimize"), relying on
the return value of `push' and defined-or operators can avoid
modifying the hash value scalar with an increment.
Eric Wong [Thu, 31 Dec 2020 13:51:35 +0000 (13:51 +0000)]
lei: rename "extinbox" => "external"
The words "extinbox" and "extindex" are too close and easy to
confuse with the other. Rename "extinbox" to "external", since
these could be IMAP, JMAP or other non-public-inbox search APIs.
Eric Wong [Thu, 31 Dec 2020 13:51:32 +0000 (13:51 +0000)]
ipc: generic IPC dispatch based on Storable
I intend to use this with LeiStore when importing from multiple
slow sources at once (e.g. curl, IMAP, etc). This is because
over.sqlite3 can only have a single writer, and we'll have
several slow readers running in parallel.
Watch and SearchIdxShard should also be able to use this code
in the future, but this will be proven with LeiStore, first.
Eric Wong [Thu, 31 Dec 2020 13:51:30 +0000 (13:51 +0000)]
lei_to_mail: support for non-seekable outputs
Users may wish to pipe output to "git am", "spamc",
or similar, so we need to support those cases and
not bail out on lseek(2) or ftruncate(2) failures.
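
A tolerant rewind-and-truncate step might look like the following Python sketch. It is an illustration only: the function name is hypothetical, and it assumes ESPIPE from lseek(2) is the signal that the output is a pipe, FIFO or socket.

```python
import errno
import os


def truncate_if_seekable(fd):
    """Rewind and truncate fd; return False instead of failing on pipes."""
    try:
        os.lseek(fd, 0, os.SEEK_SET)
    except OSError as e:
        if e.errno == errno.ESPIPE:  # pipe, FIFO or socket: nothing to rewind
            return False
        raise
    os.ftruncate(fd, 0)
    return True
```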
Eric Wong [Thu, 31 Dec 2020 13:51:27 +0000 (13:51 +0000)]
lei_to_mail: start --augment, dedupe, bz2 and xz
--augment will match the mairix(1) option of the same
name to augment existing search results. We'll need
to implement deduplication for a better user experience.
mutt ships with compressed mbox support for bz2 and xz,
at least, so we'll support those out-of-the-box.
Eric Wong [Thu, 31 Dec 2020 13:51:25 +0000 (13:51 +0000)]
lei_to_mail: start atomic and compressed mbox writing
We'll allow using multiple workers to write to a single
mbox (which could be compressed). This can be done
safely with O_APPEND + syswrite for uncompressed files,
and using a lock when piping to pigz/gzip/bzip2/xz.
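
The uncompressed case can be sketched in Python as below. This is not the lei_to_mail implementation; append_message is a hypothetical helper, and it assumes each message is handed to the kernel in a single write call on a descriptor opened with O_APPEND.

```python
import os


def append_message(path, msg: bytes):
    """Append one complete message with a single write on an O_APPEND fd."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # O_APPEND makes the kernel position every write at end-of-file,
        # so concurrent writers do not overwrite each other's data.
        written = os.write(fd, msg)
        if written != len(msg):
            raise OSError("short write; message would be torn")
    finally:
        os.close(fd)
```

Compressors reading from a pipe offer no such guarantee, which is why the message above pairs them with a lock.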
Eric Wong [Fri, 1 Jan 2021 04:51:46 +0000 (04:51 +0000)]
Merge tag 'v1.6.1' into eidx
public-inbox 1.6.1 - minor bugfix release
* tag 'v1.6.1': (31 commits)
public-inbox 1.6.1 - minor bugfix release
import: drop X-Status in addition to Status
eml: fix undefined vars on <Perl 5.28
t/config: test --get-urlmatch for git <2.26
inboxidle: avoid needless syscalls on refresh
inboxidle: clue users into resolving ENOSPC from inotify
inbox: name variable for values loop iterator
public-inbox-v[12]-format.pod: make lexgrog happy
manifest.js.gz: fix per-inbox /$INBOX/manifest.js.gz
Fix manpage section of perl module documentation
t/psgi_v2: ignore warnings on missing P::M::ReverseProxy
daemon: support --daemonize without Net::Server::Daemonize
doc: v2-format: drop repeated word
over: ensure old, merged {tid} is really gone
wwwattach: prevent deep-linking via Referer match
t/eml.t: workaround newer Email::MIME* behavior
nntp: attempt RFC 5536 3.1.5-conformant Path: headers
nntp: delimit Newsgroup: header with commas
tls: epollbit: account for miscellaneous OpenSSL errors
scripts/dupe-finder: restore $dbh variable
...
Eric Wong [Thu, 31 Dec 2020 13:24:36 +0000 (13:24 +0000)]
Merge remote-tracking branch 'origin/master' into lorelei
* origin/master: (58 commits)
ds: flatten + reuse @events, epoll_wait style fixes
ds: simplify EventLoop implementation
check defined return value for localized slurp errors
import: check for git->qx errors, clearer return values
git: qx: avoid extra "local" for scalar context case
search: remove {mset} option for ->mset method
search: remove pointless {relevance} setting
miscsearch: take reopen from Search and use it
extsearch: unconditionally reopen on access
extindex: allow using --all without EXTINDEX_DIR
extindex: add undocumented --no-scan switch
extindex: enable autoflush on STDOUT/STDERR
extindex: various --watch signal handling fixes
extindex: --watch for inotify-based updates
eml: fix undefined vars on <Perl 5.28
t/config: test --get-urlmatch for git <2.26
default to CORE::warn in $SIG{__WARN__} handlers
inbox: name variable for values loop iterator
inboxidle: avoid needless syscalls on refresh
inboxidle: clue users into resolving ENOSPC from inotify
...
Eric Wong [Sat, 26 Dec 2020 11:13:11 +0000 (11:13 +0000)]
lei: rename proposed "query" command to "q", add JSON output
Using "query" as a verb may be confusing when we'll also refer to
them as nouns with the "<ls|rm|mv>-query" sub commands. "query"
is also many characters to type without tab-completion on what I
expect to be one of the most commonly used sub-commands.
Furthermore, "q" is also the common query parameter name used by
our PSGI interface, as is the case with several major web search
engines; so there's an element of familiarity there.
The name "search" was disregarded because "show" could be a
commonly used lei sub-command, too, and typing "se" for
tab-completion may be slow since two-handed typists on QWERTY
keyboards won't be able to use alternating hands.
"f" or "find" could be a possibility here, too; but we're
currently using the term "forget" as a weaker version of
"remove" or "rm", though "ignore" could be substituted for
"forget", perhaps...
Kyle Meyer noted the lack of (proposed) JSON output support
so that's been added to the proposed UI.
Eric Wong [Sun, 27 Dec 2020 20:02:51 +0000 (20:02 +0000)]
lei_xsearch: cross-(inbox|extindex) search
While a single extindex combines multiple inboxes into a single
search index, extindex still requires up-front indexing on items
which can be searched. XSearch has no on-disk footprint itself
and uses Xapian DBs of existing publicinbox and extindex
("extinbox") exclusively.
XSearch still suffers from the multi-shard Xapian scalability
problems which led to the creation of extindex, but I expect the
number of shards to remain relatively low.
I envision users hosting public-inbox instances on their
workstations will only have two extindex combined by this, one
read-only extindex for serving public archives, and one
read-write extindex managed by LeiStore for private mail.
Eric Wong [Thu, 17 Dec 2020 09:14:48 +0000 (09:14 +0000)]
import: drop X-Status in addition to Status
It's actually supported by mutt, dovecot[1], and likely some other
software to augment the Status: header. While dovecot doesn't
expose X-Status to clients, mutt will write 'A' (answered) and
'F' to X-Status (but not T (draft)).
So we'll drop it like we do Status since it's not suitable for
public mail, but sticking it in an @UNWANTED_HEADERS array will
allow us to configure an override if needed.
Consistently returning the equivalent of pollfd.revents in a
portable manner was never worth the effort for us, as we use the
same ->event_step callback regardless of POLLIN/POLLOUT/POLLHUP.
Being a Perl array, @events knows its size and we don't have to return
a maximum index for the caller to iterate on.
We can also avoid redundant integer coercion ("+0") since we
ensure everything is an IV in other places.
Finally, vec() is preferable to ("\0" x $size) for resizing
buffers because it only needs to write the extended portion
and not overwrite the entire buffer.
Eric Wong [Sun, 27 Dec 2020 02:53:06 +0000 (02:53 +0000)]
ds: simplify EventLoop implementation
More importantly, make it easier to find the sub by avoiding
runtime manipulation of subroutine names. There's no point in
avoiding a potential call to _InitPoller in EventLoop since
entering EventLoop is rare.
On the contrary, PublicInbox::DS->new is called often and this
change to avoid entering _InitPoller there may have more
benefits (which may still be unmeasurable).
Eric Wong [Sun, 27 Dec 2020 19:38:29 +0000 (19:38 +0000)]
search: remove {mset} option for ->mset method
The ->mset method always returns a Xapian mset nowadays, so
naming a parameter {mset} is too confusing. As it does with
MiscSearch, setting the {relevance} parameter to -1 now sorts by
ascending docid order. -2 is now supported for descending
docid order, too, since it may be useful for lei users.
Eric Wong [Sun, 27 Dec 2020 11:01:42 +0000 (11:01 +0000)]
miscsearch: take reopen from Search and use it
As with ExtSearch, MiscSearch lacks a janky cleanup timer of
PublicInbox::Inbox objects, leading to info about
inboxes/newsgroups going stale. Fortunately, we don't use
MiscSearch very heavily, yet.
In the future, we may be able to detect new inboxes without
having to SIGHUP or restart daemons using MiscSearch.
Eric Wong [Sun, 27 Dec 2020 11:01:41 +0000 (11:01 +0000)]
extsearch: unconditionally reopen on access
Since ExtSearch lacks the janky cleanup timer of
PublicInbox::Inbox objects, its search results get stale.
Reopen the Xapian DB on every ->search call for now, as
reducing reopen calls doesn't seem worth the complexity.
The Xapian::Database::reopen operation itself takes only ~50us
on my old workstation with 3 shards totaling <200GB. Other
parts of Xapian dominate the search time, so the reopen seems
inconsequential with single-digit shard counts.
Eric Wong [Sat, 26 Dec 2020 10:16:24 +0000 (10:16 +0000)]
extindex: allow using --all without EXTINDEX_DIR
If "--all" is specified to index all inboxes, implicitly choose
the configured [extindex "all"] external index since "--all" is
incompatible with specifying inbox directories on the
command-line.
Eric Wong [Sat, 26 Dec 2020 10:16:23 +0000 (10:16 +0000)]
extindex: add undocumented --no-scan switch
This makes diagnosing --watch problems easier when there's
50K inboxes by avoiding the lengthy scan (which is the reason
--watch exists in the first place).
Eric Wong [Sat, 26 Dec 2020 10:16:22 +0000 (10:16 +0000)]
extindex: enable autoflush on STDOUT/STDERR
With --watch, the output may be redirected to a pipe or socket
which Perl may decide to buffer. Ensure Perl doesn't buffer
these outputs since they can provide real-time status updates
in response to signals or FS activity.
Eric Wong [Sat, 26 Dec 2020 10:16:21 +0000 (10:16 +0000)]
extindex: various --watch signal handling fixes
We need to clobber the SIGUSR1 resync queue on SIGHUP to
invalidate old inbox objects. Furthermore, the lengthy
initial scan needs to ignore signals intended for the
event loop to avoid unexpected behavior. Finally, add
some progress output to inform users on the terminal.
Eric Wong [Sat, 26 Dec 2020 01:44:37 +0000 (01:44 +0000)]
extindex: --watch for inotify-based updates
This reuses existing InboxIdle infrastructure to update external
indices based on per-inbox updates. This is an alternative to
auto-updating external indices via the -index command and also
works with existing uses of -mda and public-inbox-watch.
Using inotify (or EVFILT_VNODE) allows watching thousands of
inboxes without having to scan every single one at every
invocation.
This is especially beneficial in cases where an external index
is not writable to the users writing to per-inbox indices.
Eric Wong [Sat, 26 Dec 2020 12:25:42 +0000 (12:25 +0000)]
eml: fix undefined vars on <Perl 5.28
Encode::MIME::Header::_decode_octets did not correctly default
to Encode::FB_DEFAULT until Encode 2.93 (perl5.git commit
0c541dc5633a341cf44b818014b58e7f8be532e9). Provide the default
again to work with older Perls.
Eric Wong [Sat, 26 Dec 2020 12:30:35 +0000 (12:30 +0000)]
t/config: test --get-urlmatch for git <2.26
While git 1.8.5 learned --get-urlmatch, git did not learn to
match URLs against wildcards until 2.26. So only depend on
1.8.5 for this test since 2.26 is too new.
Eric Wong [Sat, 26 Dec 2020 09:34:39 +0000 (09:34 +0000)]
inboxidle: avoid needless syscalls on refresh
We don't have to replace a bunch of existing watches
with identical new ones. On Linux with Linux::Inotify2
installed, this avoids a storm of inotify_add_watch(2)
and inotify_rm_watch(2) syscalls on SIGHUP with -imapd
and "-extindex --watch".
Eric Wong [Sat, 26 Dec 2020 05:59:22 +0000 (05:59 +0000)]
inboxidle: clue users into resolving ENOSPC from inotify
It may not be obvious to users that an ENOSPC error is from hitting
a (tunable) kernel-imposed limit on inotify watches, and not
some storage device running out of space. Give them a hint
here to reduce our own support burden.
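
One way to attach such a hint is sketched below in Python. The wording of the hint and the function name are assumptions, not the text the project prints; fs.inotify.max_user_watches is the Linux sysctl that controls the per-user watch limit.

```python
import errno


def inotify_error_hint(err: OSError) -> str:
    """Return the error text, with a hint appended for ENOSPC."""
    msg = str(err)
    if err.errno == errno.ENOSPC:
        # From inotify_add_watch(2), ENOSPC means the per-user watch
        # limit was reached, not that a disk is full.
        msg += " (consider raising fs.inotify.max_user_watches via sysctl)"
    return msg
```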
Eric Wong [Sat, 26 Dec 2020 08:12:52 +0000 (08:12 +0000)]
inbox: name variable for values loop iterator
->on_inbox_unlock callbacks could clobber $_, and this seems to
fix a problem with -extindex --watch failing to index some
inboxes after SIGHUP reload.
Uwe Kleine-König [Tue, 22 Dec 2020 17:18:10 +0000 (18:18 +0100)]
public-inbox-v[12]-format.pod: make lexgrog happy
The Debian package linter (lintian) emits the following warning:
W: bad-whatis-entry
N:
N: A manual page should start with a NAME section, which lists the
N: program name and a brief description. The NAME section is used to
N: generate a database that can be queried by commands like apropos and
N: whatis. You are seeing this tag because lexgrog was unable to parse
N: the NAME section.
N:
N: Manual pages for multiple programs, functions, or files should list
N: each separated by a comma and a space, followed by \- and a common
N: description.
N:
N: Listed items may not contain any spaces. A manual page for a two-level
N: command such as fs listacl must look like fs_listacl so the list is
N: read correctly.
N:
N: Refer to the lexgrog(1) manual page, the groff_man(7) manual page, and
N: the groff_mdoc(7) manual page for details.
N:
N: Severity: warning
N:
N: Check: documentation/manual
N:
N: Renamed from: manpage-has-bad-whatis-entry
N:
for public-inbox-v1-format and public-inbox-v2-format.
Adapt the descriptions to make lexgrog and so lintian happy.
Uwe Kleine-König [Fri, 18 Dec 2020 11:56:14 +0000 (12:56 +0100)]
Fix manpage section of perl module documentation
On Debian (at least) perl documentation is supposed to be installed in
section 3pm. With the build system hardcoding this to 3 instead, this
results in a warning by the Debian package linter:
W: public-inbox: wrong-manual-section usr/share/man/man3/PublicInbox::Git.3.gz:74 3 != 3pm
W: public-inbox: wrong-manual-section usr/share/man/man3/PublicInbox::Import.3.gz:74 3 != 3pm
W: public-inbox: wrong-manual-section usr/share/man/man3/PublicInbox::SaPlugin::ListMirror.3.gz:74 3 != 3pm
W: public-inbox: wrong-manual-section ... use --no-tag-display-limit to see all (or pipe to a file/program)
Eric Wong [Wed, 16 Dec 2020 04:39:37 +0000 (04:39 +0000)]
t/psgi_v2: ignore warnings on missing P::M::ReverseProxy
Plack::Test::ExternalServer doesn't depend on
Plack::Middleware::ReverseProxy, so we need to account for
some warnings in stderr if P::M::RP is missing.
Eric Wong [Tue, 15 Dec 2020 11:47:16 +0000 (11:47 +0000)]
daemon: support --daemonize without Net::Server::Daemonize
We don't actually need Net::Server::Daemonize to support
the --daemonize flag, since the daemonize() sub provided
by N::S::D doesn't exactly do the things we want.
Eric Wong [Fri, 4 Dec 2020 12:09:29 +0000 (12:09 +0000)]
over: ensure old, merged {tid} is really gone
We must use the result of link_refs() since it can trigger
merge_threads() and invalidate $old_tid. In case
merge_threads() isn't triggered, link_refs() will return
$old_tid anyways.
When rethreading and allocating new {tid}, we also must update
the row where the now-expired {tid} came from to ensure only the
new {tid} is seen when reindexing subsequent messages in
history. Otherwise, every subsequently reindexed+rethreaded
message could end up getting a new {tid}.
Eric Wong [Mon, 23 Nov 2020 14:15:35 +0000 (14:15 +0000)]
wwwattach: prevent deep-linking via Referer match
This prevents `<img src=' tags from being used to deep-link
image attachments from HTML outside of the current host and
reduces potential for abuse.
Some browsers (e.g. Firefox) favor content detection and will
display images irrespective of the Content-Type header being
"application/octet-stream", and "Content-Disposition: attachment"
doesn't stop them, either.
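
A Referer check of this kind could be sketched as follows in Python. This is only a guess at the shape of the test: the function name is hypothetical, and both the handling of a missing Referer and what the server does when the check fails are assumptions the message does not spell out.

```python
from urllib.parse import urlsplit


def referer_matches_host(referer, host):
    """True only if the Referer header points at the host being served."""
    if not referer:
        return False  # assumption: a missing Referer is treated as no match
    return urlsplit(referer).netloc.lower() == host.lower()
```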
Eric Wong [Sun, 15 Nov 2020 08:56:09 +0000 (08:56 +0000)]
t/eml.t: workaround newer Email::MIME* behavior
Recent (2020) versions of Email::MIME (and/or dependencies)
have different behavior than historical versions which seem
to be less DWIM and perhaps technically more correct. We'll
retain historical behavior for now, since it doesn't seem to
cause real problems and DWIM-ness is often required to make
sense of historical mail.
Tested on a FreeBSD 11.4 VM with the following packages:
Perhaps some NNTP clients would be unhappy with the old value
"y". So use a bit more bandwidth+space to use the server-name
and historical "!not-for-mail" tail-entry to better conform to
a published RFC.
Eric Wong [Fri, 30 Oct 2020 02:13:58 +0000 (02:13 +0000)]
tls: epollbit: account for miscellaneous OpenSSL errors
Apparently they happen (triggered by my -imapd instance), so
bail out by closing the underlying socket rather than stopping
the event loop and daemon process.
Eric Wong [Sun, 20 Sep 2020 01:43:15 +0000 (01:43 +0000)]
config: warn on multiple values for some fields
Our code doesn't support multi-values for these, and having
unexpected arrays leads to unexpected results (e.g. showing
stuff like "ARRAY(0xDEADBEEFADD12E55)" in user interfaces). So
warn and only use the last value (matching git-config(1)
behavior without `--get-all').
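
The "warn and use the last value" behavior can be sketched in Python as below. The function name and the warning text are illustrative assumptions; only the policy itself comes from the message above.

```python
import warnings


def single_value(name, value):
    """Collapse a multi-valued config entry to its last value, with a warning."""
    if isinstance(value, list) and value:
        warnings.warn(f"{name} has multiple values, only using {value[-1]!r}")
        return value[-1]
    return value
```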
Eric Wong [Thu, 17 Sep 2020 21:25:22 +0000 (21:25 +0000)]
doc: txt2pre: more manpage URLs
We host our own -imapd manpage, and we started using a few more
git commands (fast-import for ages). We'll also need to link to
manpages.debian.org and live with long URLs for a few
non-standard manpages in software we reference.
Eric Wong [Sat, 26 Dec 2020 12:25:42 +0000 (12:25 +0000)]
eml: fix undefined vars on <Perl 5.28
Encode::MIME::Header::_decode_octets did not correctly default
to Encode::FB_DEFAULT until Encode 2.93 (perl5.git commit
0c541dc5633a341cf44b818014b58e7f8be532e9). Provide the default
again to work with older Perls.
Eric Wong [Sat, 26 Dec 2020 12:30:35 +0000 (12:30 +0000)]
t/config: test --get-urlmatch for git <2.26
While git 1.8.5 learned --get-urlmatch, git did not learn to
match URLs against wildcards until 2.26. So only depend on
1.8.5 for this test since 2.26 is too new.
Eric Wong [Sat, 26 Dec 2020 01:44:36 +0000 (01:44 +0000)]
default to CORE::warn in $SIG{__WARN__} handlers
As with CORE::die and $SIG{__DIE__}, it turns out CORE::warn is
safe to use inside $SIG{__WARN__} handlers without triggering
infinite recursion. So fall back to reusing CORE::warn instead
of creating a new sub.
Eric Wong [Sat, 26 Dec 2020 08:12:52 +0000 (08:12 +0000)]
inbox: name variable for values loop iterator
->on_inbox_unlock callbacks could clobber $_, and this seems to
fix a problem with -extindex --watch failing to index some
inboxes after SIGHUP reload.
Eric Wong [Sat, 26 Dec 2020 09:34:39 +0000 (09:34 +0000)]
inboxidle: avoid needless syscalls on refresh
We don't have to replace a bunch of existing watches
with identical new ones. On Linux with Linux::Inotify2
installed, this avoids a storm of inotify_add_watch(2)
and inotify_rm_watch(2) syscalls on SIGHUP with -imapd
and "-extindex --watch".
Eric Wong [Sat, 26 Dec 2020 05:59:22 +0000 (05:59 +0000)]
inboxidle: clue users into resolving ENOSPC from inotify
It may not be obvious to users that an ENOSPC error is from hitting
a (tunable) kernel-imposed limit on inotify watches, and not
some storage device running out of space. Give them a hint
here to reduce our own support burden.
Eric Wong [Fri, 25 Dec 2020 10:21:15 +0000 (10:21 +0000)]
index: filter out indexlevel=basic from extindex
extindex users will likely want to use indexlevel=basic for
per-inbox indices, however extindex itself doesn't support basic
index level (yet?). Let's ensure we don't trip up extindex
users who specify "-L basic" on the -index command-line.
Eric Wong [Fri, 25 Dec 2020 10:21:14 +0000 (10:21 +0000)]
v2writable: don't verify tip if reindexing
We only rely on git-rev-parse to resolve symbolic names ("HEAD")
to a SHA-* git commit ID. We'll assume any git commit IDs we
get from SQLite DBs are valid and let "git-log" fail if it
isn't.
Eric Wong [Fri, 25 Dec 2020 10:21:12 +0000 (10:21 +0000)]
index: do not attach inbox to extindex unless updated
We'll count the number of log changes (regardless of index or
unindex) and only attach inboxes to ExtSearchIdx objects when
they get new work. We'll also reduce lock bouncing and only
update external indices after all per-inbox indexing is done.
This also updates existing v2 indexing/unindexing callers
to be more consistent and ensures unindex log entries update
per-inbox last commit information.
Eric Wong [Fri, 25 Dec 2020 10:21:11 +0000 (10:21 +0000)]
extsearchidx: close DB handles after use if FD constrained
Most distros ship with low RLIMIT_NOFILE limits and surprises
may lurk for admins who configure many inboxes. Keep FD usage
under control to avoid EMFILE errors at inopportune times during
reindex.
From what I can tell, this is the only place where extindex can
have unpredictable FD growth when there's thousands of inboxes,
and it's in an extremely rare code path.
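
A check for being FD constrained might be sketched like this in Python. The function name, the reserve of 64 descriptors and the comparison itself are assumptions made for illustration; the message does not say how the project decides.

```python
import resource


def fd_constrained(handles_wanted, reserve=64):
    """Guess whether keeping handles_wanted descriptors open risks EMFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return False
    # Leave some headroom for sockets, pipes and other incidental files.
    return handles_wanted + reserve > soft
```

When this returns true, closing each DB handle right after use trades some reopen cost for a bounded descriptor count.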
Eric Wong [Fri, 25 Dec 2020 10:21:10 +0000 (10:21 +0000)]
extsearchidx: delay SQLite availability checks
This will make attach_inbox faster for no-op calls. It also
helps us avoid races in case msgmap or over.sqlite3 gets
unlinked while -extindex is running.
Eric Wong [Thu, 24 Dec 2020 10:09:19 +0000 (10:09 +0000)]
index: support --fast-noop / -F switch
Note: I'm not sure if it's worth documenting and supporting this
long-term.
We can avoid taking locks for invocations of "index --all"
and rely on high-resolution ctime (struct timespec st_ctim)
comparisons of msgmap.sqlite3 and the packed-refs + refs/heads
directory of the newest epoch.
This cuts public-inbox-index invocations with
"--all --no-update-extindex -L basic" down from 0.92s to 0.31s.
The change with "-L medium" or "-L full" and (default) non-zero
jobs is even more drastic, reducing a 12-13s no-op invocation
down to the same 0.31s.
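
The lock-free no-op test can be sketched with nanosecond ctime values. This Python fragment is an illustration under assumptions: the function names are made up, and the direction of the comparison (index changed no earlier than the newest ref path) is a guess at the intent rather than the project's actual logic.

```python
import os


def newest_ctime_ns(paths):
    """Return the most recent st_ctime (in nanoseconds) among paths."""
    return max(os.stat(p).st_ctime_ns for p in paths)


def looks_up_to_date(index_path, ref_paths):
    """True if the index changed no earlier than any of the ref paths."""
    return os.stat(index_path).st_ctime_ns >= newest_ctime_ns(ref_paths)
```

Here index_path would correspond to msgmap.sqlite3 and ref_paths to packed-refs and the refs/heads directory of the newest epoch.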