Eric Wong [Wed, 15 Jun 2016 00:14:29 +0000 (00:14 +0000)]
mda: hook up new filter functionality
This removes the Email::Filter dependency as well as the
signature-breaking scrubber code. We now prefer to
reject unacceptable messages and grudgingly (and blindly)
mirror messages we're not the primary endpoint for.
Eric Wong [Wed, 15 Jun 2016 00:14:28 +0000 (00:14 +0000)]
emergency: implement new emergency Maildir delivery
This is transactional and hopefully safer in case we hit SIGSEGV
or SIGKILL during processing, as the tmp/ copy will remain on
the FS even if DESTROY/END handlers are not called.
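A minimal sketch of the general tmp/-then-new/ Maildir pattern this message relies on, assuming a plain POSIX filesystem. The helper name `deliver_maildir` is hypothetical and this is not the project's Perl code; it only illustrates why a copy survives an abrupt kill.

```python
import os
import tempfile

def deliver_maildir(maildir, raw_message):
    """Write to tmp/ first, then publish under new/ (hypothetical helper)."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Step 1: make the message durable in tmp/ before anything else happens.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.join(maildir, "tmp"))
    with os.fdopen(fd, "wb") as fh:
        fh.write(raw_message)
        fh.flush()
        os.fsync(fh.fileno())
    # Step 2: publish with link(); if the process is killed before this
    # point, the tmp/ copy is still on disk.
    new_path = os.path.join(maildir, "new", os.path.basename(tmp_path))
    os.link(tmp_path, new_path)
    # Step 3: drop the tmp/ name only once new/ holds the message.
    os.unlink(tmp_path)
    return new_path
```

Because no step depends on a destructor or exit handler, a SIGKILL at any point leaves at least one complete copy on disk.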
Eric Wong [Wed, 15 Jun 2016 00:14:26 +0000 (00:14 +0000)]
mda: precheck no longer depends on Email::Filter
Email::Filter doesn't offer any functionality we need here,
and our dependency on Email::Filter will gradually be removed
since it (and Email::LocalDelivery) seem abandoned and we
can have more-fine-grained control by rolling our own Maildir
delivery which can work transactionally.
Eric Wong [Wed, 15 Jun 2016 00:14:25 +0000 (00:14 +0000)]
t/mda: use only Maildir for testing
Remove mbox tests since mbox is unreliable due to raciness
and incompatible implementations. We will drop support for
mbox emergency destinations soon.
Eric Wong [Tue, 14 Jun 2016 06:54:57 +0000 (06:54 +0000)]
nntp: do not double-encode UTF-8 body
Or whatever the appropriate Perl terminology is...
And we will need to do something appropriate for other
encodings, too. I still barely understand Perl Unicode
despite attempting to understand the docs over the years.
Eric Wong [Mon, 13 Jun 2016 22:56:27 +0000 (22:56 +0000)]
doc: systemd examples should only kill one process
For our daemons, killing only the master process is enough.
Killing the entire control group (as done by default in
systemd) may cause subprocesses such as git to shut down
unexpectedly.
Having systemd kill workers directly will also cause an
immediate shutdown since the master would've already signaled
the workers; and workers will die after two shutdown requests.
Eric Wong [Mon, 13 Jun 2016 04:53:30 +0000 (04:53 +0000)]
examples: systemd socket and service definitions for daemons
Since our daemons are built to take advantage of socket activation,
provide example files to allow systems administrators to hit the
ground running with systemd.
Example init files for other systems greatly appreciated.
Eric Wong [Tue, 7 Jun 2016 12:57:42 +0000 (12:57 +0000)]
Merge branch 'unsubscribe'
* unsubscribe:
unsubscribe.milter: use default postfork dispatcher
unsubscribe: prevent decrypt from showing random crap
examples/unsubscribe-psgi@.service: disable worker processes
unsubscribe: bad URL fixup
unsubscribe: get off mah lawn^H^H^Hist
Eric Wong [Mon, 30 May 2016 04:39:57 +0000 (04:39 +0000)]
http: yield body->getline running time
We cannot let a client monopolize the single-threaded server
even if it can drain the socket buffer faster than we can
emit data.
While we're at it, acknowledge this behavior (which happens
naturally) in httpd/async.
The same idea is present in NNTP for the long_response code.
This is the HTTP followup to:
commit 0d0fde0bff97 ("nntp: introduce long response API for streaming")
commit 79d8bfedcdd2 ("nntp: avoid signals for long responses")
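The fairness idea can be sketched as follows, assuming each response body is an iterator standing in for repeated getline calls. `serve_round_robin` is a hypothetical name; the real server is event-driven rather than a simple loop.

```python
from collections import deque

def serve_round_robin(bodies):
    """Emit one line per client per turn (illustrative only)."""
    ready = deque(bodies.items())      # (client name, iterator over lines)
    order = []
    while ready:
        client, body = ready.popleft()
        line = next(body, None)        # one "getline", then yield the loop
        if line is None:
            continue                   # body exhausted; client is done
        order.append((client, line))
        ready.append((client, body))   # go to the back of the queue
    return order
```

A client with a long body gets one line per turn, so a client with a short body is not starved even if the first one drains its socket instantly.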
Eric Wong [Mon, 30 May 2016 02:10:36 +0000 (02:10 +0000)]
script/*{mda,learn}: no strict params for Email::MIME::ContentType
User input is imperfect, do not pollute our mail logs with
warnings we cannot fix. This is documented in the
Email::MIME::ContentType manpage so it should remain supported.
Eric Wong [Mon, 30 May 2016 00:51:44 +0000 (00:51 +0000)]
git-http-backend: remove dependency on Plack::Request
Plack::Request is unnecessary overhead for this given the
strictness of git-http-backend. Furthermore, having to make
commit 311c2adc8c63 ("avoid Plack::Request parsing body")
to avoid tempfiles should not have been necessary.
Eric Wong [Sat, 28 May 2016 01:57:14 +0000 (01:57 +0000)]
remove redundant NewsGroup class
Most of its functionality is in the PublicInbox::Inbox class.
While we're at it, we no longer auto-create newsgroup names
based on the inbox name, since newsgroup names probably deserve
some thought when it comes to hierarchy.
Eric Wong [Sat, 28 May 2016 01:57:08 +0000 (01:57 +0000)]
t/plack: ensure we can cascade on common endpoints
We don't serve things like robots.txt, favicon.ico, or
.well-known/ endpoints ourselves, but ensure we can be
used with Plack::App::Cascade for others.
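A rough sketch of the cascading behavior being tested, assuming apps are plain callables returning a (status, headers, body) triple. The `cascade` function below is an illustration in Python, not Plack's implementation.

```python
def cascade(apps, catch=(404,)):
    """Try each app in order; return the first response not in `catch`."""
    def app(env):
        res = (404, [], [b"Not Found"])   # used when `apps` is empty
        for candidate in apps:
            res = candidate(env)
            if res[0] not in catch:
                break                     # this app handled the request
        return res
    return app
```

An app that answers 404 for robots.txt or favicon.ico simply lets the next app in the list respond.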
Eric Wong [Fri, 27 May 2016 08:03:31 +0000 (08:03 +0000)]
unsubscribe.milter: use default postfork dispatcher
Let postfix (or sendmail :P) control the concurrency limit
instead of doing it ourselves. This is necessary because SMTP
connections are completely synchronous at this point and a
slow/idle SMTP connection will monopolize the worker process.
Eric Wong [Fri, 27 May 2016 07:23:18 +0000 (07:23 +0000)]
httpd/async: do not needlessly weaken
The restart_read callback has no chance of circular reference,
and weakening $self before we create it can cause $self to
be undefined inside the callback (seen during stress testing).
Eric Wong [Fri, 27 May 2016 05:59:14 +0000 (05:59 +0000)]
httpd/async: prevent circular reference
We must avoid circular references which can cause leaks in
long-running processes. This callback is dangerous since
it may never be called to properly terminate everything.
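The leak pattern can be illustrated with a small sketch, assuming an object that stores a callback which refers back to it. The names `Conn` and `make_callback` are hypothetical. Holding only a weak reference breaks the cycle, at the cost that the callback must cope with the object already being gone.

```python
import weakref

class Conn:
    """Stand-in for a long-lived connection object."""
    def __init__(self):
        self.callbacks = []

def make_callback(conn):
    ref = weakref.ref(conn)   # weak: the closure does not keep conn alive
    def cb():
        target = ref()
        # The object may have been freed before the callback fires.
        return "gone" if target is None else "alive"
    return cb
```

Storing `cb` in `conn.callbacks` creates no strong cycle here, since the closure holds only the weak reference.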
Eric Wong [Wed, 25 May 2016 01:44:46 +0000 (01:44 +0000)]
remove Email::Address dependency
git has stricter requirements for ident names than Email::Address:
it forbids '<' and '>', which Email::Address allows.
Even in 1.908, Email::Address also has an incomplete fix for
CVE-2015-7686 with a DoS-able regexp for comments. Since we
don't care for or need all the RFC compliance of Email::Address,
avoiding it entirely may be preferable.
Email::Address will still be installed as a requirement for
Email::MIME, but it is only used by
Email::MIME::header_str_set, which we do not use.
Eric Wong [Tue, 24 May 2016 03:41:53 +0000 (03:41 +0000)]
git-http-backend: use qspawn to limit running processes
Having an excessive amount of git-pack-objects processes is
dangerous to the health of the server. Queue up process spawning
for long-running responses and serve them sequentially, instead.
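A minimal sketch of queued spawning under a concurrency limit, assuming each job is a callable that launches the process. `SpawnQueue` and its methods are hypothetical names, not the qspawn API.

```python
from collections import deque

class SpawnQueue:
    """Run at most `limit` jobs at once; queue the rest (hypothetical)."""
    def __init__(self, limit=1):
        self.limit = limit
        self.running = 0
        self.waiting = deque()

    def start(self, job):
        # Launch immediately if under the limit, otherwise defer.
        if self.running < self.limit:
            self.running += 1
            job()
        else:
            self.waiting.append(job)

    def finish(self):
        # Call when a running process exits; starts the next queued job.
        self.running -= 1
        if self.waiting:
            self.running += 1
            self.waiting.popleft()()
```

With `limit=1` the jobs run strictly one after another, which matches the sequential serving described above.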
Eric Wong [Tue, 24 May 2016 03:41:52 +0000 (03:41 +0000)]
http: fix various race conditions
We no longer override Danga::Socket::event_write and instead
re-enable reads by queuing up another callback in the $close
response callback. This is necessary because event_write may not be
completely done writing a response, only the existing buffered data.
Furthermore, the {closed} field can be set at almost any time when
writing, so we must check it before acting on pipelined requests as
well as during write callbacks in more().
Eric Wong [Tue, 24 May 2016 03:41:51 +0000 (03:41 +0000)]
standardize timer-related event-loop code
Standardize the code we have in place to avoid creating too many
timer objects. We do not need exact timers for things that don't
need to be run ASAP, so we can play things fast and loose to avoid
wasting power with unnecessary wakeups.
We only need two classes of timers:
* asap - run this on the next loop tick, after operating on
@Danga::Socket::ToClose to close remaining sockets
* later - run at some point in the future. It could be as
soon as immediately (like "asap"), and as late as 60s into
the future.
In the future, we may support an "emergency" switch to fire "later"
timers immediately.
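The two timer classes can be sketched like this, assuming the caller passes in the current time. `LooseTimers` is a hypothetical class, and the `emergency` flag models the switch that the message only describes as a future possibility.

```python
class LooseTimers:
    """Two timer classes, 'asap' and 'later' (not the project's API)."""
    LATER_MAX = 60.0  # seconds; upper bound taken from the description

    def __init__(self):
        self.asap = []
        self.later = []
        self.later_deadline = None  # one shared deadline, not one per callback

    def add_asap(self, cb):
        self.asap.append(cb)

    def add_later(self, cb, now):
        self.later.append(cb)
        if self.later_deadline is None:
            self.later_deadline = now + self.LATER_MAX

    def tick(self, now, emergency=False):
        # Run all 'asap' callbacks, plus 'later' callbacks once the shared
        # deadline has passed (or at once when `emergency` is set).
        due, self.asap = self.asap, []
        if self.later and (emergency or now >= self.later_deadline):
            due += self.later
            self.later, self.later_deadline = [], None
        for cb in due:
            cb()
        return len(due)
```

All "later" callbacks share a single deadline, so adding more of them does not add more wakeups.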
Eric Wong [Mon, 23 May 2016 07:19:45 +0000 (07:19 +0000)]
http: chunk in the server, not middleware
Since PSGI does not require Transfer-Encoding: chunked or
Content-Length, we cannot expect random apps we host to chunk
their responses.
Thus, to improve interoperability, chunk at the HTTP layer like
other PSGI servers do. I'm choosing a more syscall-intensive method
(via multiple send(...MSG_MORE) calls) for now to reduce copy + packet
overhead.
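For reference, HTTP/1.1 chunked framing looks roughly like the sketch below. The generator name `chunked` is hypothetical. It concatenates each frame for clarity, whereas the message describes sending the pieces with separate send() calls.

```python
def chunked(body_iter):
    """Yield HTTP/1.1 chunked-encoding frames for each non-empty piece."""
    for piece in body_iter:
        if not piece:
            continue  # a zero-length chunk would end the response early
        # Hex length, CRLF, data, CRLF.
        yield b"%x\r\n" % len(piece) + piece + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk marker plus empty trailer section
```

Framing at the server layer means the hosted app can return plain byte strings without knowing its total length in advance.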
Eric Wong [Mon, 23 May 2016 04:01:14 +0000 (04:01 +0000)]
git-http-backend: refactor to support cleanup
We will have clients dropping connections during long clone
and fetch operations; so do not retain references holding
backend processes once we detect a client has dropped.
Eric Wong [Mon, 23 May 2016 01:17:28 +0000 (01:17 +0000)]
config: use popen_rd when spawning `git config'
We may spawn this in a large server process, so be sure
to take advantage of the optional vfork() support
for folks who set PERL_INLINE_DIRECTORY.
Eric Wong [Sun, 22 May 2016 09:06:03 +0000 (09:06 +0000)]
git-http-backend: switch to async_pass
This simplifies the code somewhat; but it could probably
still be made simpler. It will still need to support command
queueing so that expensive processes can be queued up.
Eric Wong [Sat, 21 May 2016 23:45:27 +0000 (23:45 +0000)]
http: support async_pass for Danga::Socket
This will allow us to minimize buffering after we wait
(possibly a long time) for readability. This also greatly
reduces the amount of Danga::Socket-specific knowledge we
have in our PSGI code, making it easier for others to
understand.
examples/unsubscribe-psgi@.service: disable worker processes
This unsubscribe PSGI endpoint should never incur enough load to
justify using multiple worker processes. If it's unstable and
crashes, systemd can automatically restart it.
Eric Wong [Sat, 21 May 2016 04:12:45 +0000 (04:12 +0000)]
unsubscribe: bad URL fixup
Fixup a comment about s/query string/PATH_INFO/ while
we're at it, as pre-published versions of this used
query strings before I determined it could be harder
to copy+paste URLs with query parameters in them.
Eric Wong [Sat, 21 May 2016 03:03:17 +0000 (03:03 +0000)]
mbox: switch generation over to pull model
This allows us to easily provide gigantic inboxes
with proper backpressure handling for slow clients.
It also eliminates public-inbox-httpd and Danga::Socket-specific
knowledge from this class, making it easier to follow for
those used to generic PSGI applications.
Eric Wong [Sat, 21 May 2016 03:03:16 +0000 (03:03 +0000)]
http: reduce over-buffering for getline responses
By switching to a "pull"-based I/O model for reading
application responses, we should be able to throttle
buffering to slow clients more effectively and avoid
wasting precious RAM.
This will also allow us to move Danga::Socket-specific
knowledge out of the PSGI application and keep it
confined to PublicInbox::HTTP.
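A pull-based write loop can be sketched as below, assuming the server can ask whether the client is ready for more data. The names `pump`, `can_write` and `write` are hypothetical. The body is only asked for more when there is room, so nothing piles up in memory for a slow client.

```python
def pump(body, can_write, write):
    """Pull from `body` only while the client can accept more data."""
    sent = 0
    while can_write():
        chunk = next(body, None)
        if chunk is None:
            return sent, True    # response finished
        write(chunk)
        sent += 1
    return sent, False           # client is slow; resume when writable
```

The caller would invoke `pump` again once the socket becomes writable, picking up where the iterator left off.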
Eric Wong [Wed, 18 May 2016 01:23:05 +0000 (01:23 +0000)]
unsubscribe: get off mah lawn^H^H^Hist
While public-inbox is intended primarily for archival,
SMTP list subscriptions are still in use in most places
and users are likely to want a good unsubscribe mechanism.
HTTP (or HTTPS) links in the List-Unsubscribe header are
often preferable since some users may use an incorrect
email address for mailto: links.
Thus, it is useful to provide an example which generates an
HTTPS link for users to click on. The default .psgi requires
a POST confirmation (as destructive actions with GET are
considered bad practice). However, the "confirm" parameter
may be disabled for a true "one-click" unsubscribe.
The generated URLs are hopefully short enough and both shell
and highlighting-friendly to reduce copy+paste errors.
Eric Wong [Thu, 19 May 2016 10:23:28 +0000 (10:23 +0000)]
msg_iter: workaround broken Email::MIME versions
Email::MIME >= 1.923 and < 1.935 would drop too many newlines
in attachments. This would lead to ugly text files without
a proper trailing newline if using quoted-printable, 7bit, or
8bit. Attachments encoded with base64 were not affected.
These versions of Email::MIME are widely available in Debian 8
(Jessie) and even Ubuntu LTS distros so we will need to support
this workaround for a while.
Eric Wong [Wed, 18 May 2016 20:30:31 +0000 (20:30 +0000)]
msg_iter: new internal API for iterating through MIME
Unlike Email::MIME::walk_parts, this is non-recursive and gives
depth + index offset information about the part for creating
links for later retrieval.
It is intended for read-only access and changes are not
propagated to the parent; however future versions of it
may clobber bodies or the original version as it iterates
to reduce memory overhead.
It is intended for making it easy to locate attachments within a
message in the WWW view.
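A non-recursive walk that reports depth and index might look like the following sketch, written against Python's standard email package rather than Email::MIME. `iter_parts` is a hypothetical name, and the 1-based index within the parent is an assumption of this example.

```python
def iter_parts(msg):
    """Yield (part, depth, index) for each leaf part, without recursion."""
    stack = [(msg, 0, 0)]
    while stack:
        part, depth, index = stack.pop()
        if part.is_multipart():
            children = part.get_payload()   # list of sub-messages
            # Push in reverse so children are visited in document order.
            for i in range(len(children) - 1, -1, -1):
                stack.append((children[i], depth + 1, i + 1))
        else:
            yield part, depth, index
```

The (depth, index) pair is enough to address an attachment again later, for example when building a link to it.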