Eric Wong [Thu, 12 Dec 2019 21:16:49 +0000 (21:16 +0000)]
daemon: use DESTROY for unlinking --pid-file
This gets rid of the last "END{}" block in our code and cleans
up a (temporary) circular reference.
Furthermore, ensure the cleanup code still works in all
configurations by adding tests and testing both the -W1
(default, 1 worker) and -W0 (no workers) code paths.
Eric Wong [Fri, 29 Nov 2019 12:25:07 +0000 (12:25 +0000)]
msgtime: drop Date::Parse for RFC2822
Date::Parse is not optimized for RFC2822 dates and isn't
packaged on OpenBSD. It's still useful for historical
email when email clients were less conformant, but is
less relevant for new emails.
Eric Wong [Fri, 29 Nov 2019 12:25:06 +0000 (12:25 +0000)]
add msgtime_cmp maintainer test
Changes will be coming for MsgTime to stop depending on
Date::Parse due to lack of package availability on OpenBSD
and suboptimal performance on RFC822 dates.
Eric Wong [Fri, 29 Nov 2019 12:25:05 +0000 (12:25 +0000)]
git: async batch interface
This is a transitionary interface which does NOT require an
event loop. It can be plugged into current synchronous code
without major surgery.
It allows HTTP/1.1 pipelining-like functionality by taking
advantage of predictable and well-specified POSIX pipe semantics
by stuffing multiple git cat-file requests into the --batch pipe.
With xt/git_async_cmp.t and GIANT_GIT_DIR=git.git, the async
interface is 10-25% faster than the synchronous interface since
it can keep the "git cat-file" process busier.
This is expected to improve performance on systems with slower
storage (but multiple cores).
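The pipelining idea above can be sketched outside of Perl. The following
is a minimal illustration in Python, not the project's code: the child
process is a hypothetical line-oriented stand-in for "git cat-file --batch",
and pipelined_requests() is a name invented for this example. All requests
are written before any reply is read, relying on the pipe preserving order.

```python
import subprocess
import sys

# Hypothetical stand-in for "git cat-file --batch": answers each request
# line with one reply line, in the order the requests arrive.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write('got ' + line)\n"
    "    sys.stdout.flush()\n"
)


def pipelined_requests(requests):
    """Write every request up front, then read the replies in order."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Stuff all requests into the pipe before reading anything back,
    # so the child always has work queued.
    for req in requests:
        proc.stdin.write(req + "\n")
    proc.stdin.flush()
    # Pipes are FIFO, so reply N belongs to request N.
    replies = [proc.stdout.readline().rstrip("\n") for _ in requests]
    proc.stdin.close()
    proc.wait()
    proc.stdout.close()
    return replies
```

This only stays deadlock-free while the outstanding requests and replies
fit in the pipe buffers; a real caller would need to bound the number of
requests in flight.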
Eric Wong [Sun, 1 Dec 2019 10:50:19 +0000 (10:50 +0000)]
build: support doc generation w/o GNU make
We can replace the GNU-isms for building docs with Perl5
equivalents. The only downside is the resulting Makefile
gets larger, but that's the price of portability.
Eric Wong [Fri, 29 Nov 2019 10:14:13 +0000 (10:14 +0000)]
spawn: remove support for clearing the env
It's unnecessary code which I'm not sure we ever used. In
retrospect, completely clearing the environment doesn't make
sense for the processes we spawn. We don't need to clobber
individual environment variables in our code, either
(and if we did for tests, we can use 'local').
Eric Wong [Fri, 29 Nov 2019 10:14:11 +0000 (10:14 +0000)]
TODO: update and add a few more items
SpamAssassin has used re2c (via sa-compile) for many years, now,
and it seems to work fine, there. GMime also looks promising
when combined with Inline::C since GMime can operate on mmap-ed
regions.
Given the inevitable demise of many .orgs when prices rise,
supporting a URL rewriter similar to .mailmap makes sense.
And HTTP CONNECT seems like something our -httpd can support
to let firewalled users read over NNTP.
Eric Wong [Fri, 29 Nov 2019 10:14:10 +0000 (10:14 +0000)]
ds: ->Reset initializes $nextq
I haven't noticed this being a problem in practice, but
be consistent with the rest of the singleton stuff.
Since we always call Reset() at load time, only do
initialization in that sub and not at declaration.
Eric Wong [Fri, 29 Nov 2019 10:14:09 +0000 (10:14 +0000)]
t/common: set $0 when running script w/o fork
We can localize changes to $0 so $0 is restored when the
"script" sub is done. This will be helpful when we encounter
a stuck/slow process during our tests (hopefully never!)
Eric Wong [Fri, 29 Nov 2019 10:14:08 +0000 (10:14 +0000)]
t: localize the PI_CONFIG env
We don't want the user's ~/.public-inbox/config to be read from
during tests. I only noticed this because I had a non-existent
pathname for one of my inboxes :x
I've also verified this change by running "inotifywait
~/.public-inbox/config -m" in another terminal while running
"make check"; (perhaps a portable solution could make it
into the test suite).
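For readers unfamiliar with Perl's "local", the effect is a scoped
override that is undone when the enclosing block exits. A rough Python
analogue is sketched below. read_config_path() and run_isolated() are
hypothetical helpers, and the "/dev/null" value is an assumption made for
the example, not necessarily what the test suite uses.

```python
import os
from unittest import mock


def read_config_path():
    """Hypothetical stand-in for code that consults PI_CONFIG."""
    default = os.path.expanduser("~/.public-inbox/config")
    return os.environ.get("PI_CONFIG", default)


def run_isolated(test_fn, config_path="/dev/null"):
    """Run test_fn with PI_CONFIG overridden only for the call's duration."""
    # patch.dict restores os.environ when the block exits, even if
    # test_fn raises, which is roughly what "local $ENV{...}" gives Perl.
    with mock.patch.dict(os.environ, {"PI_CONFIG": config_path}):
        return test_fn()
```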
Eric Wong [Wed, 27 Nov 2019 11:07:18 +0000 (11:07 +0000)]
Makefile.PL: MANIFEST dependency fix
We need to force an update to Makefile (not Makefile.PL) when
MANIFEST changes. Since "Makefile" (aka. "$(FIRST_MAKEFILE)")
is already a single-colon make target; we can't create a
double-colon rule to augment it. So we'll continue using a
"Makefile.PL" rule, but have it recreate the resulting Makefile.
Finally, change the "check" target to use "prove -b" instead of
"prove -l" so we test against "blib/lib", since what's in the
"blib" dir will be installed.
Fixes: 4c20de0694d06ff3 ("Makefile.PL: add dependency on MANIFEST contents")
Eric Wong [Wed, 27 Nov 2019 01:33:33 +0000 (01:33 +0000)]
httpd|nntpd: avoid missed signal wakeups
Our attempt at using a self-pipe in signal handlers was
ineffective, since pure Perl code execution is deferred
and Perl doesn't use an internal self-pipe/eventfd. In
retrospect, I actually prefer the simplicity of Perl in
this regard...
We can use sigprocmask() from Perl, so we can introduce
signalfd(2) and EVFILT_SIGNAL support on Linux and *BSD-based
systems, respectively. These OS primitives allow us to avoid a
race where Perl checks for signals right before epoll_wait() or
kevent() puts the process to sleep.
The (few) systems nowadays without signalfd(2) or IO::KQueue
will now see wakeups every second to avoid missed signals.
Eric Wong [Wed, 27 Nov 2019 01:33:32 +0000 (01:33 +0000)]
dskqxs: fix missing EV_DISPATCH define
Oops, IO::KQueue support was broken due to this missing
constant. Add a new ds-kqxs.t test case to ensure we
test the IO::KQueue path if IO::KQueue is available.
Eric Wong [Mon, 25 Nov 2019 05:24:54 +0000 (05:24 +0000)]
msgtime: deal with strange minutes in TZ offsets
I'm not sure if TZ minute offsets aside from '00' or '30' exist,
but let's just deal with them properly when negative. Examples
taken from various inboxes on lore.kernel.org. These are mostly
messages from spammers, but some are legitimate messages.
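One way negative offsets with non-zero minutes can go wrong is applying
the sign to the hours only. The sketch below, in Python rather than the
project's Perl, shows the arithmetic with the sign applied to both fields.
tz_offset_seconds() is a hypothetical name and is not taken from MsgTime.

```python
import re


def tz_offset_seconds(zone):
    """Convert a "+HHMM" / "-HHMM" zone string to seconds east of UTC."""
    m = re.fullmatch(r"([+-])(\d{2})(\d{2})", zone)
    if m is None:
        raise ValueError("unsupported zone: %r" % zone)
    sign = -1 if m.group(1) == "-" else 1
    hours, minutes = int(m.group(2)), int(m.group(3))
    # The sign covers hours AND minutes: "-0545" is 5h45m west of UTC.
    # Computing int("-05") * 3600 + 45 * 60 would wrongly give 4h15m west.
    return sign * (hours * 3600 + minutes * 60)
```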
Eric Wong [Sun, 24 Nov 2019 00:22:37 +0000 (00:22 +0000)]
tests: move giant inbox/git dependent tests to xt/
xt/ is typically reserved for "eXtended tests" intended for
the maintainers and not ordinary users. Since these require
special configuration and do nothing but waste cycles
during startup, they qualify.
Eric Wong [Sun, 24 Nov 2019 00:22:36 +0000 (00:22 +0000)]
t/perf-*.t: use $ENV{GIANT_INBOX_DIR} consistently
It's more consistent with our current terminology and
"PI_DIR" is already used to override ~/.public-inbox/
(which holds "config" and possibly other files which affect
all inboxes for a particular user, but is not an inbox itself);
so stop advertising GIANT_PI_DIR in skip messages.
Eric Wong [Sun, 24 Nov 2019 00:22:35 +0000 (00:22 +0000)]
tests: quiet down commit graph
Newer versions of git enable the commit graph by default.
Since we blow away our temporary directories every test,
generating graphs is a waste and clutters stderr with
"Computing commit graph generation numbers" messages.
Eric Wong [Sun, 24 Nov 2019 00:22:33 +0000 (00:22 +0000)]
xapcmd: replace Xtmpdirs with File::Temp->newdir
Since we're using Perl 5.10.1 and File::Temp 0.19+, we don't
need Xtmpdirs at all for cleaning up tempdirs on failure and
can just rely on the DESTROY handler provided by File::Temp.
Eric Wong [Sun, 24 Nov 2019 00:22:32 +0000 (00:22 +0000)]
t/nntpd-validate: get rid of threads dependency
Threads are officially discouraged by perl5-porters and prove
problematic with my Perl installation when using run_mode=1
to speed up tests. So just use fork() and pipes to share
results from Net::NNTP.
Eric Wong [Sun, 24 Nov 2019 00:22:31 +0000 (00:22 +0000)]
t/common: start_script replaces spawn_listener
We can shave several hundred milliseconds off tests which spawn
daemons by preloading and avoiding startup time for common
modules which are already loaded in the parent process.
This also gives ENV{TAIL} support to all tests which support
daemons which log to stdout/stderr.
Eric Wong [Sun, 24 Nov 2019 00:22:30 +0000 (00:22 +0000)]
daemon: avoid race when quitting workers
While the master process has a self-pipe to avoid missing
signals, worker processes lack that aside from a pipe to
detect master death.
That pipe doesn't exist when there's no master process,
so it's possible DS::close never finishes because it
never woke up from epoll_wait. So create a pipe on
the worker_quit signal and force it into epoll/kevent
so it wakes up right away.
Eric Wong [Sun, 24 Nov 2019 00:22:28 +0000 (00:22 +0000)]
daemon: use sigprocmask to block signals at startup
`$SIG{FOO} = "IGNORE"' will cause the daemon to miss signals
entirely. Instead, we can use sigprocmask to block signal
delivery until we have our signal handlers set up. This closes a
race where a PID file can be written for an init script and a
signal can then be dropped via "IGNORE".
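The difference between ignoring and blocking can be sketched as follows.
This is a Python illustration of the general technique (Unix only, run
from the main thread), not the daemon's code: startup_with_blocked_signal()
is a hypothetical name, and SIGUSR1 is chosen arbitrarily.

```python
import os
import signal

received = []  # signal numbers seen by the handler


def startup_with_blocked_signal():
    """Block SIGUSR1 during setup; return whether it was held pending."""
    blocked = {signal.SIGUSR1}
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, blocked)

    # Stand-in for "PID file written, handlers not ready yet": a signal
    # arriving here is queued as pending instead of being discarded.
    os.kill(os.getpid(), signal.SIGUSR1)
    was_pending = signal.SIGUSR1 in signal.sigpending()

    # Handlers are now installed; restoring the old mask delivers the
    # pending signal to the new handler.
    signal.signal(signal.SIGUSR1, lambda signum, frame: received.append(signum))
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return was_pending
```

Had the signal been set to "IGNORE" during setup instead, the kill above
would have been lost rather than delivered once the handler was ready.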
Eric Wong [Sun, 24 Nov 2019 00:22:26 +0000 (00:22 +0000)]
t/nntpd-tls: sometimes SSL_connect succeeds quickly
It seems caching can happen within OpenSSL or negotiation
can be delayed in some cases. In any case, don't barf on
PublicInbox::TLS::epollbit() when connect_SSL succeeds
unexpectedly.
Eric Wong [Sun, 24 Nov 2019 00:22:25 +0000 (00:22 +0000)]
t/httpd-corner: wait for worker process death
We need to ensure the worker process is terminated before
starting a new connection, so leave a persistent HTTP/1.1
connection open and wait for the SIGKILL to take effect
and drop the client.
Eric Wong [Sun, 24 Nov 2019 00:22:22 +0000 (00:22 +0000)]
tests: use strict everywhere
The "strict" pragma makes code easier to debug, and we had
undeclared variables as a result in t/watch_maildir_v2.t.
So use it everywhere to be consistent with the rest of our
code.
Eric Wong [Sun, 24 Nov 2019 03:12:37 +0000 (03:12 +0000)]
check for File::Temp 0.19 for ->newdir method
This is distributed with Perl 5.10.1 and onwards, so it should
not be an installation burden for any users. I'm planning to
move away from tempdir() entirely and use File::Temp->newdir to
remove dependencies on END{} blocks.
Eric Wong [Fri, 15 Nov 2019 09:50:43 +0000 (09:50 +0000)]
t/common: introduce run_script wrapper for t/cgi.t
This will give us a consistent interface for running
test scripts in more performant ways while still giving
us a consistent interface to recreate real-world behavior
via spawn() (fork + execve), if needed.
The default run_mode (1) is faster and can run within the test
process with some minor adjustments to our code to avoid global
state.
This avoids the significant overhead of Perl code loading,
parsing and compilation phases.
Eric Wong [Fri, 15 Nov 2019 09:50:41 +0000 (09:50 +0000)]
xapcmd: do not fire END and DESTROY handlers in child
We need to bypass whatever Test::More does with END/DESTROY
handlers for use in long-lived processes. This doesn't affect
any of our normal code since we don't use END/DESTROY for
Xapcmd and its callers.
Eric Wong [Fri, 15 Nov 2019 09:50:40 +0000 (09:50 +0000)]
import: only pass Inbox object to SearchIdx->new
SearchIdx->new no longer accepts a GIT_DIR path as its argument
since commit 585314673236d664729fe3ab2d4fb229d1c0f2d5
("searchidx: require PublicInbox::Inbox (or InboxWritable) ref")
Eric Wong [Fri, 15 Nov 2019 09:50:34 +0000 (09:50 +0000)]
admin: get rid of singleton $CFG var
PublicInbox::Admin::config() just adds an extra layer of
indirection which we barely rely on. So get rid of this
global variable and make it easier to run tests in the
future without relying on global state.
Eric Wong [Sat, 16 Nov 2019 02:34:39 +0000 (02:34 +0000)]
mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
IO::Compress::Gzip is a wrapper around Compress::Raw::Zlib,
anyways, and being able to easily detach buffers to return them
via ->getline is nice. This results in a 1-2% performance
improvement when fetching giant mboxes.
Eric Wong [Thu, 14 Nov 2019 10:57:32 +0000 (10:57 +0000)]
doc: mknews: support Email::MIME <1.930
Email::MIME::header_str is not available until 1.930, so the
rest of our code uses Email::MIME::header for compatibility
with distros, since CentOS 7.x only has 1.926.
Eric Wong [Thu, 14 Nov 2019 01:12:11 +0000 (01:12 +0000)]
inboxwritable: drop {-importer} cyclic reference
InboxWritable caching the result of ->importer leads to a
circular reference, since the returned (V2Writable|Import) object
holds onto the calling InboxWritable object.
With public-inbox-watch, this leads to a memory leak if a user
is reloading via SIGHUP after a message is imported (it would
only become noticeable with SIGHUPs after every message imported).
I would not expect anybody to notice this in real-world
usage. I only noticed this since I was making -xcpdb suitable
for long-lived process use (e.g. "mod_perl style") and a flock
remained unreleased on v1 inboxes after resharding.
WatchMaildir (used by -watch) already handles caching of the
importer object itself, and all of our other real-world uses of
->importer are short-lived or designed for batch scripts, so
there's no need to cache the importer result internally.
Eric Wong [Wed, 13 Nov 2019 07:57:38 +0000 (07:57 +0000)]
xapcmd: localize %SIG changes using "local"
Perl's "local" allows changes to %SIG (and %ENV) to be limited
to its enclosing block. This allows us to get rid of a global
variable and ad-hoc method for restoring signal handlers.
Eric Wong [Thu, 14 Nov 2019 01:03:38 +0000 (01:03 +0000)]
solvergit: use --unidiff-zero with git-apply(1)
I sometimes post context-free documentation patches generated
with "-U0" to reduce size and bandwidth overhead when replacing
URLs or updating copyright notices. git-apply(1) needs the
--unidiff-zero switch to work properly with context-free
patches.
Given our search looks for blob OIDs, and we're never going
to be running the code we regenerate, "--unidiff-zero" ought
to be safe.
We already load PublicInbox::Spawn for which(), so using spawn()
isn't unreasonable. And rely on "skip" to log the omitted test
if w3m is missing, which means we need to update the "&&"
escaping test to be self-referential on the same line.
File::Temp was totally unused, there; and we can use "open ...,undef"
in Perl to easily create anonymous temporary files for use with
spawn().
Eric Wong [Mon, 4 Nov 2019 03:01:34 +0000 (03:01 +0000)]
t/httpd-corner.t: drop unnecessary bytes:: for length()
We don't need to force byte semantics for a buffer we clearly
create (via ->read) with byte semantics. Since we didn't
"use bytes" in t/httpd-corner.t, it was inadvertently made
available by IPC::Run (which goes away, next).
Eric Wong [Mon, 4 Nov 2019 03:01:33 +0000 (03:01 +0000)]
t/*.t: remove IPC::Run dependency for git commands
One small step towards making tests easier-to-run. We can rely
on "local $ENV{GIT_DIR}" for potentially shell-unsafe path
names, and the rest of our path names are relative and don't
contain characters which require escaping.
Eric Wong [Fri, 8 Nov 2019 20:20:17 +0000 (20:20 +0000)]
edit: propagate correct editor exit code
exit($?) is never correct, since ($? >> 8) is needed to extract
the correct exit code, as other information (e.g. the terminating
signal) is encoded in $? in addition to the exit code.
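The encoding is the traditional 16-bit wait status: the exit code sits in
the high byte and the terminating signal in the low bits. A small Python
sketch of the same arithmetic follows. The helper names are made up for
this example, and os.system() is used only because on Unix it returns the
status in that same wait() format.

```python
import os


def exit_code(wait_status):
    """High byte of a wait status, equivalent to Perl's ($? >> 8)."""
    return (wait_status >> 8) & 0xFF


def term_signal(wait_status):
    """Low 7 bits: the signal that killed the child, or 0 if none."""
    return wait_status & 0x7F


def shell_wait_status(command):
    """Run command via /bin/sh and return the raw wait status (Unix)."""
    return os.system(command)
```

A child that exits with code 3 yields a raw status of 768. Passing 768
straight to exit would be truncated to its low 8 bits by the OS, so the
parent would report 0 (success) instead of 3.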
Eric Wong [Mon, 4 Nov 2019 11:13:47 +0000 (11:13 +0000)]
tests: rely on PublicInbox::Git for pathname safety
It's possible (but unlikely) a user will put spaces in TMPDIR
and cause File::Temp::tempdir() to return a temporary directory
with spaces in the filename, making it unsafe for shell
expansion.
PublicInbox::Git didn't exist when t/mda.t was written, and
I just forgot about PublicInbox::Git->qx for t/plack.t :x
Eric Wong [Sun, 3 Nov 2019 06:48:58 +0000 (06:48 +0000)]
searchidxshard: reuse $SIG{__WARN__} callback from Admin
We don't want to define $SIG{__WARN__} in the worker to call an
existing non-default callback. Instead update ->{current_info}
the same way the V2Writable master process does.
I noticed this while reindexing with a large XAPIAN_FLUSH_THRESHOLD
and seeing the wrong epoch on my terminal from a shard because the
shard worker was spawned while reindexing a higher-numbered epoch.