Eric Wong [Sun, 19 Jun 2016 09:59:31 +0000 (09:59 +0000)]
examples/*@.service: wait one day for graceful shutdown
Because sometimes folks will want to download gigantic mboxes
or make large clones over Tor which are not resume-friendly.
Note: the timeout logic in nntpd is somewhat over-aggressive
and can break some large slrnpulls. This ought to be easily
recoverable on the client-side, though, since it's based on
per-message fetches.
Eric Wong [Sun, 19 Jun 2016 06:55:42 +0000 (06:55 +0000)]
http: constrain getline/close responses by time
This allows us to yield control to other clients gracefully if
getline takes too long to generate a chunk. This is more
expensive but should not cost a syscall on modern 64-bit systems.
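A minimal sketch of the time-constrained getline loop, written in Python rather than the project's Perl. The names drain_for and ListBody, the budget value, and the injectable clock are assumptions made for illustration; they are not taken from the commit.

```python
import time


class ListBody:
    """Hypothetical response body exposing the getline/close interface."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.closed = False

    def getline(self):
        # Return the next chunk, or None once the body is exhausted.
        return self._chunks.pop(0) if self._chunks else None

    def close(self):
        self.closed = True


def drain_for(body, write, budget=0.01, clock=time.monotonic):
    """Pull chunks until the body ends or the time budget is spent.

    Returns True when the body is exhausted (and closed), False when the
    caller should service other clients and reschedule this one later.
    """
    deadline = clock() + budget
    while True:
        chunk = body.getline()
        if chunk is None:
            body.close()
            return True
        write(chunk)
        # Check the clock after each chunk so one client cannot
        # monopolize a single-threaded server.
        if clock() >= deadline:
            return False
```

An event loop would call drain_for again on its next pass whenever it returned False.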
Eric Wong [Sun, 19 Jun 2016 04:50:40 +0000 (04:50 +0000)]
mbox: set gzip timestamp to the Unix epoch
This allows consistency between different invocations from
roughly the same period and is no worse for caching than any of
our existing HTML and Atom feeds.
We cannot set the timestamp to the end date since messages
may be added to the repository while we are iterating
(and this streaming mechanism will pick them up).
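As an illustration of the effect (in Python, not the project's Perl), fixing the gzip header timestamp at the Unix epoch makes the compressed bytes depend only on the payload. The helper name gzip_bytes is hypothetical.

```python
import gzip
import io


def gzip_bytes(payload: bytes) -> bytes:
    """Compress payload with the gzip MTIME header field set to 0."""
    buf = io.BytesIO()
    # mtime=0 stores the Unix epoch instead of the current time, so
    # repeated invocations over the same data yield identical output.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(payload)
    return buf.getvalue()
```

Two calls with the same payload return the same bytes, which is what keeps caches consistent between invocations.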
Eric Wong [Sun, 19 Jun 2016 02:13:52 +0000 (02:13 +0000)]
watch_maildir: tighten up path checks
Only mark seen messages as spam; otherwise it could be
too aggressive and cause problems or overtraining.
We wouldn't want a wayward FIFO ruining our day, either :)
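The commit does not show the exact checks, so the following Python sketch is only one plausible reading: accept a path only if it is a regular file under cur/ whose Maildir info suffix carries the S (seen) flag. The function name is_seen_message is hypothetical.

```python
import os
import stat


def is_seen_message(path: str) -> bool:
    """True for a regular file in cur/ whose Maildir flags include S."""
    head, name = os.path.split(path)
    if os.path.basename(head) != "cur":
        return False
    # Maildir flags follow the ":2," marker at the end of the file name.
    _base, sep, flags = name.rpartition(":2,")
    if not sep or "S" not in flags:
        return False
    try:
        # lstat + S_ISREG rejects FIFOs, directories and symlinks.
        return stat.S_ISREG(os.lstat(path).st_mode)
    except OSError:
        return False
```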
Eric Wong [Sat, 18 Jun 2016 23:25:20 +0000 (23:25 +0000)]
watch_maildir: spam removal support
We can support spam removal by watching a special "spam"
Maildir, too. We can run public-inbox-learn as a separate
step, and that command will be improved to support
auto-learning, too.
Eric Wong [Sat, 18 Jun 2016 10:53:32 +0000 (10:53 +0000)]
spawn: try to keep signals blocked in spawned child
While we only want to stop our daemons and gracefully destroy
subprocesses, it is common for 'Ctrl-C' from a terminal to kill
the entire pgroup.
Killing an entire pgroup nukes subprocesses like git-upload-pack
and breaks graceful shutdown on long clones. Make a best effort to
ensure git-upload-pack processes are not broken when somebody
signals an entire process group.
Followup-to: commit 37bf2db81bbbe114d7fc5a00e30d3d5a6fa74de5
("doc: systemd examples should only kill one process")
Eric Wong [Sat, 18 Jun 2016 08:49:05 +0000 (08:49 +0000)]
view: consolidate per-message newline handling
We don't want to blindly append a trailing newline
if the message ends in quoted text leading to a <span>,
as a newline is already added to a <span>...
Eric Wong [Sat, 18 Jun 2016 00:22:34 +0000 (00:22 +0000)]
view: minor tweaks to reduce long lines
Fold addressee fields to better delimit destinations,
reduce horizontal rule <hr /> to reduce scrolling,
and use spaces to indent "git send-email" example.
Eric Wong [Fri, 17 Jun 2016 21:32:59 +0000 (21:32 +0000)]
view: introduce WwwStream interface
This will allow us to commonalize HTML generation in the future
and is the start of moving existing HTML generation to a "pull"
streaming model (from the existing "push" one).
Using the getline/close pull model is superior to the existing
$fh->write streaming as it allows us to throttle response
generation based on backpressure from slow clients.
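A rough Python sketch of the getline/close pull model follows. HtmlStream and pump are hypothetical names, and the HTML layout is invented for the example; only the getline/close interface and the backpressure idea come from the commit message.

```python
import html


class HtmlStream:
    """Pull-style body: the server asks for each chunk when it is ready."""

    def __init__(self, title, messages):
        self._parts = self._generate(title, messages)

    def _generate(self, title, messages):
        yield "<html><head><title>%s</title></head><body>" % html.escape(title)
        for msg in messages:
            yield "<pre>%s</pre>" % html.escape(msg)
        yield "</body></html>"

    def getline(self):
        # Next chunk, or None when the response is complete.
        return next(self._parts, None)

    def close(self):
        self._parts.close()


def pump(stream, write, can_write):
    """Generate output only while the client can accept more data."""
    while can_write():
        chunk = stream.getline()
        if chunk is None:
            stream.close()
            return True
        write(chunk)
    return False  # client is slow: stop generating until it catches up
```

With a push model the generator would keep calling write regardless of the client; here nothing is produced until pump is invoked again.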
Eric Wong [Fri, 17 Jun 2016 01:56:05 +0000 (01:56 +0000)]
import: auto-update index when done
This prevents multiple update processes from stepping over
each other while called under the lock, and also allows the
new -watch process to update the index iff indexing was
desired.
Eric Wong [Fri, 17 Jun 2016 01:23:22 +0000 (01:23 +0000)]
watch: quiet down rejected header matches
People may use this directive because they prefer to merge
several mailing lists into one local mailbox, so there may
be many messages and we should not needlessly clutter logs
for this.
Eric Wong [Fri, 17 Jun 2016 01:12:26 +0000 (01:12 +0000)]
search: increase limit for thread search
Some threads are easily over 100 messages, so the 50 limit is
not enough. It is likely that 1000 messages is not enough,
either, and we will need to tune our threading to handle more
messages and supply options for configurability.
Eric Wong [Thu, 16 Jun 2016 22:45:28 +0000 (22:45 +0000)]
watch: introduce watch directive
This will allow users to run importers off existing mail
accounts where they may not have access to run -mda.
Currently, we only support Maildirs, but IMAP ought to be
doable.
Eric Wong [Wed, 15 Jun 2016 00:14:29 +0000 (00:14 +0000)]
mda: hook up new filter functionality
This removes the Email::Filter dependency as well as the
signature-breaking scrubber code. We now prefer to
reject unacceptable messages and grudgingly (and blindly)
mirror messages we're not the primary endpoint for.
Eric Wong [Wed, 15 Jun 2016 00:14:28 +0000 (00:14 +0000)]
emergency: implement new emergency Maildir delivery
This is transactional and hopefully safer in case we hit SIGSEGV
or SIGKILL during processing, as the tmp/ copy will remain on
the FS even if DESTROY/END handlers are not called.
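A minimal Python sketch of transactional Maildir delivery, assuming the conventional tmp/ then new/ sequence. The function name deliver and the file-name scheme are assumptions for illustration and do not reflect the project's actual code.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()


def deliver(maildir: str, message: bytes) -> str:
    """Write message into tmp/, then link it into new/; return the new path."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    name = "%d.%d_%d.%s" % (time.time(), os.getpid(), next(_counter), host)
    tmp = os.path.join(maildir, "tmp", name)
    new = os.path.join(maildir, "new", name)
    # "x" mode fails if the name already exists instead of overwriting.
    with open(tmp, "xb") as fh:
        fh.write(message)
        fh.flush()
        os.fsync(fh.fileno())
    # If the process dies before this point, the tmp/ copy is still on disk.
    os.link(tmp, new)
    os.unlink(tmp)
    return new
```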
Eric Wong [Wed, 15 Jun 2016 00:14:26 +0000 (00:14 +0000)]
mda: precheck no longer depends on Email::Filter
Email::Filter doesn't offer any functionality we need, here;
and our dependency on Email::Filter will gradually be removed
since it (and Email::LocalDelivery) seem abandoned and we
can have more-fine-grained control by rolling our own Maildir
delivery which can work transactionally.
Eric Wong [Wed, 15 Jun 2016 00:14:25 +0000 (00:14 +0000)]
t/mda: use only Maildir for testing
Remove mbox tests since mbox is unreliable due to raciness
and incompatible implementations. We will drop support for
mbox emergency destinations, soon.
Eric Wong [Tue, 14 Jun 2016 06:54:57 +0000 (06:54 +0000)]
nntp: do not double-encode UTF-8 body
Or whatever the appropriate Perl terminology is...
And we will need to do something appropriate for other
encodings, too. I still barely understand Perl Unicode
despite attempting to understand the docs over the years...
Eric Wong [Mon, 13 Jun 2016 22:56:27 +0000 (22:56 +0000)]
doc: systemd examples should only kill one process
For our daemons, killing only the master process is enough.
Killing the entire control group (as done by default in
systemd) may cause subprocesses such as git to shut down
unexpectedly.
Having systemd kill workers directly will also cause an
immediate shutdown since the master would've already signaled
the workers; and workers will die after two shutdown requests.
Eric Wong [Mon, 13 Jun 2016 04:53:30 +0000 (04:53 +0000)]
examples: systemd socket and service definitions for daemons
Since our daemons are built to take advantage of socket activation,
provide example files to allow systems administrators to hit the
ground running with systemd.
Example init files for other systems greatly appreciated.
Eric Wong [Tue, 7 Jun 2016 12:57:42 +0000 (12:57 +0000)]
Merge branch 'unsubscribe'
* unsubscribe:
unsubscribe.milter: use default postfork dispatcher
unsubscribe: prevent decrypt from showing random crap
examples/unsubscribe-psgi@.service: disable worker processes
unsubscribe: bad URL fixup
unsubscribe: get off mah lawn^H^H^Hist
Eric Wong [Mon, 30 May 2016 04:39:57 +0000 (04:39 +0000)]
http: yield body->getline running time
We cannot let a client monopolize the single-threaded server
even if it can drain the socket buffer faster than we can
emit data.
While we're at it, acknowledge this behavior (which happens
naturally) in httpd/async.
The same idea is present in NNTP for the long_response code.
This is the HTTP followup to:
commit 0d0fde0bff97 ("nntp: introduce long response API for streaming")
commit 79d8bfedcdd2 ("nntp: avoid signals for long responses")
Eric Wong [Mon, 30 May 2016 02:10:36 +0000 (02:10 +0000)]
script/*{mda,learn}: no strict params for Email::MIME::ContentType
User input is imperfect, do not pollute our mail logs with
warnings we cannot fix. This is documented in the
Email::MIME::ContentType manpage so it should remain supported.
Eric Wong [Mon, 30 May 2016 00:51:44 +0000 (00:51 +0000)]
git-http-backend: remove dependency on Plack::Request
Plack::Request is unnecessary overhead for this given the
strictness of git-http-backend. Furthermore, having to make
commit 311c2adc8c63 ("avoid Plack::Request parsing body")
to avoid tempfiles should not have been necessary.
Eric Wong [Sat, 28 May 2016 01:57:14 +0000 (01:57 +0000)]
remove redundant NewsGroup class
Most of its functionality is in the PublicInbox::Inbox class.
While we're at it, we no longer auto-create newsgroup names
based on the inbox name, since newsgroup names probably deserve
some thought when it comes to hierarchy.
Eric Wong [Sat, 28 May 2016 01:57:08 +0000 (01:57 +0000)]
t/plack: ensure we can cascade on common endpoints
We don't serve things like robots.txt, favicon.ico, or
.well-known/ endpoints ourselves, but ensure we can be
used with Plack::App::Cascade for others.
Eric Wong [Fri, 27 May 2016 08:03:31 +0000 (08:03 +0000)]
unsubscribe.milter: use default postfork dispatcher
Let postfix (or sendmail :P) control the concurrency limit
instead of doing it ourselves. This is necessary because SMTP
connections are completely synchronous at this point and a
slow/idle SMTP connection will monopolize the worker process.
Eric Wong [Fri, 27 May 2016 07:23:18 +0000 (07:23 +0000)]
httpd/async: do not needlessly weaken
The restart_read callback has no chance of circular reference,
and weakening $self before we create it can cause $self to
be undefined inside the callback (seen during stress testing).
Eric Wong [Fri, 27 May 2016 05:59:14 +0000 (05:59 +0000)]
httpd/async: prevent circular reference
We must avoid circular references which can cause leaks in
long-running processes. This callback is dangerous since
it may never be called to properly terminate everything.
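The general pattern can be sketched in Python with the weakref module. This is only an analogy: Perl relies on reference counting and weaken(), while CPython also has a cycle collector. The Conn class and on_done callback are hypothetical.

```python
import weakref


class Conn:
    """Object that stores a callback without the callback owning it."""

    def __init__(self):
        self.done = False
        wself = weakref.ref(self)

        def on_done():
            conn = wself()
            if conn is None:
                # The object was already destroyed; nothing to clean up.
                return False
            conn.done = True
            return True

        # self -> on_done -> (weak) self, so no strong cycle is formed.
        self.on_done = on_done
```

The guard for a dead referent matters: a weakened reference can be gone by the time the callback runs.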
Eric Wong [Wed, 25 May 2016 01:44:46 +0000 (01:44 +0000)]
remove Email::Address dependency
git has stricter requirements for ident names (no '<>')
which Email::Address allows.
Even in 1.908, Email::Address also has an incomplete fix for
CVE-2015-7686 with a DoS-able regexp for comments. Since we
don't care for or need all the RFC compliance of Email::Address,
avoiding it entirely may be preferable.
Email::Address will still be installed as a requirement for
Email::MIME, but it is only used by
Email::MIME::header_str_set, which we do not use.
Eric Wong [Tue, 24 May 2016 03:41:53 +0000 (03:41 +0000)]
git-http-backend: use qspawn to limit running processes
Having an excessive number of git-pack-objects processes is
dangerous to the health of the server. Queue up process spawning
for long-running responses and serve them sequentially, instead.
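The queueing idea can be sketched as follows in Python. SpawnQueue, submit and finished are hypothetical names, and the limit is an arbitrary parameter; the real qspawn interface is not shown in the commit message.

```python
from collections import deque


class SpawnQueue:
    """Run at most `limit` jobs at once; queue the rest in FIFO order."""

    def __init__(self, limit=1):
        self.limit = limit
        self.running = 0
        self.waiting = deque()

    def submit(self, start):
        """`start` is a callable that launches one process."""
        if self.running < self.limit:
            self.running += 1
            start()
        else:
            self.waiting.append(start)

    def finished(self):
        """Call when a running process exits; starts the next queued job."""
        self.running -= 1
        if self.waiting:
            self.running += 1
            self.waiting.popleft()()
```

With limit=1, long-running responses are served strictly one after another.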
Eric Wong [Tue, 24 May 2016 03:41:52 +0000 (03:41 +0000)]
http: fix various race conditions
We no longer override Danga::Socket::event_write and instead
re-enable reads by queuing up another callback in the $close
response callback. This is necessary because event_write may not be
completely done writing a response, only the existing buffered data.
Furthermore, the {closed} field can be set at almost any time when
writing, so we must check it before acting on pipelined requests as
well as during write callbacks in more().
Eric Wong [Tue, 24 May 2016 03:41:51 +0000 (03:41 +0000)]
standardize timer-related event-loop code
Standardize the code we have in place to avoid creating too many
timer objects. We do not need exact timers for things that don't
need to be run ASAP, so we can play things fast and loose to avoid
wasting power with unnecessary wakeups.
We only need two classes of timers:
* asap - run this on the next loop tick, after operating on
@Danga::Socket::ToClose to close remaining sockets
* later - run at some point in the future. It could be as
soon as immediately (like "asap"), and as late as 60s into
the future.
In the future, we may support an "emergency" switch to fire "later"
timers immediately.
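A simplified Python model of the two timer classes, assuming a single shared deadline for all "later" callbacks. The Timers class, its tick method, and the injected clock are assumptions; the handling of @Danga::Socket::ToClose is not modeled.

```python
class Timers:
    """Two coarse timer classes: 'asap' and 'later'."""

    LATER_MAX = 60.0  # upper bound on how long a "later" callback waits

    def __init__(self, clock):
        self.clock = clock
        self.asap_queue = []
        self.later_queue = []
        self.later_due = None

    def asap(self, cb):
        self.asap_queue.append(cb)

    def later(self, cb):
        # All "later" callbacks share one deadline, so only one timer
        # (and one wakeup) is needed no matter how many are queued.
        if self.later_due is None:
            self.later_due = self.clock() + self.LATER_MAX
        self.later_queue.append(cb)

    def tick(self):
        """One event-loop iteration: run asap callbacks and due later ones."""
        run, self.asap_queue = self.asap_queue, []
        if self.later_due is not None and self.clock() >= self.later_due:
            run += self.later_queue
            self.later_queue, self.later_due = [], None
        for cb in run:
            cb()
```

A callback queued 30 seconds after the first "later" request fires together with it, which is the "fast and loose" trade-off described above.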
Eric Wong [Mon, 23 May 2016 07:19:45 +0000 (07:19 +0000)]
http: chunk in the server, not middleware
Since PSGI does not require Transfer-Encoding: chunked or
Content-Length, we cannot expect random apps we host to chunk
their responses.
Thus, to improve interoperability, chunk at the HTTP layer like
other PSGI servers do. I'm choosing a more syscall-intensive method
(via multiple send(...MSG_MORE) calls) for now to reduce copy + packet
overhead.
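For reference, HTTP/1.1 chunked framing can be sketched in Python as below. The generator name chunk_frames is hypothetical. Each yielded piece stands in for one write, mirroring the idea of sending the size line, payload and trailer separately rather than copying them into one buffer.

```python
def chunk_frames(body_chunks):
    """Yield the byte strings that frame a body as Transfer-Encoding: chunked."""
    for data in body_chunks:
        if not data:
            # A zero-length chunk would be read as the end of the response.
            continue
        yield b"%x\r\n" % len(data)  # chunk size in hexadecimal
        yield data                   # payload is passed through uncopied
        yield b"\r\n"
    yield b"0\r\n\r\n"               # terminating chunk
```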
Eric Wong [Mon, 23 May 2016 04:01:14 +0000 (04:01 +0000)]
git-http-backend: refactor to support cleanup
We will have clients dropping connections during long clone
and fetch operations; so do not retain references holding
backend processes once we detect a client has dropped.