Sergey Matveev's repositories - public-inbox.git/log
public-inbox.git
Eric Wong [Sat, 28 May 2016 01:57:10 +0000 (01:57 +0000)]
Makefile.PL: allow N to be overridden

Relying on the number of processors isn't a great idea
since some of our tests rely on delays to test blocking
and slow client behavior.

Eric Wong [Sat, 28 May 2016 01:57:09 +0000 (01:57 +0000)]
http: clarify comments about layering violation

It's a low priority, but acknowledge it.

Eric Wong [Sat, 28 May 2016 01:57:08 +0000 (01:57 +0000)]
t/plack: ensure we can cascade on common endpoints

We don't serve things like robots.txt, favicon.ico, or
.well-known/ endpoints ourselves, but ensure we can be
used with Plack::App::Cascade for others.

Eric Wong [Fri, 27 May 2016 08:57:42 +0000 (08:57 +0000)]
config: fix NewsWWW fallback for newsgroups in HTTP URLs

Oops, added a test to prevent regressions while we're at it.

Eric Wong [Fri, 27 May 2016 08:20:59 +0000 (08:20 +0000)]
git-http-backend: close pipe for generic PSGI on errors

The generic PSGI code needs to avoid resource leaks if
smart cloning is disabled (due to resource constraints).

Eric Wong [Fri, 27 May 2016 08:20:58 +0000 (08:20 +0000)]
git-http-backend: move real close to GetlineBody

This makes more sense as it keeps management of rpipe
nice and neat.

Eric Wong [Fri, 27 May 2016 07:23:18 +0000 (07:23 +0000)]
httpd/async: do not needlessly weaken

The restart_read callback has no chance of circular reference,
and weakening $self before we create it can cause $self to
be undefined inside the callback (seen during stress testing).

Fixes: 395406118cb2 ("httpd/async: prevent circular reference")

Eric Wong [Fri, 27 May 2016 05:59:16 +0000 (05:59 +0000)]
git-http-backend: fix aborts for generic PSGI clone

We need to avoid circular references in the generic PSGI layer,
do it by abusing DESTROY.

Eric Wong [Fri, 27 May 2016 05:59:15 +0000 (05:59 +0000)]
http: avoid circular reference for getline responses

Lightly tested, this seems to work when mass-aborting
responses.  Will still need to automate the testing...

Eric Wong [Fri, 27 May 2016 05:59:14 +0000 (05:59 +0000)]
httpd/async: prevent circular reference

We must avoid circular references which can cause leaks in
long-running processes.  This callback is dangerous since
it may never be called to properly terminate everything.

Eric Wong [Wed, 25 May 2016 01:44:46 +0000 (01:44 +0000)]
remove Email::Address dependency

git has stricter requirements for ident names (no '<>')
which Email::Address allows.

Even in 1.908, Email::Address also has an incomplete fix for
CVE-2015-7686 with a DoS-able regexp for comments.  Since we
don't care for or need all the RFC compliance of Email::Address,
avoiding it entirely may be preferable.

Email::Address will still be installed as a requirement for
Email::MIME, but it is only used by
Email::MIME::header_str_set, which we do not use.

Eric Wong [Tue, 24 May 2016 03:41:53 +0000 (03:41 +0000)]
git-http-backend: use qspawn to limit running processes

Having an excessive amount of git-pack-objects processes is
dangerous to the health of the server.  Queue up process spawning
for long-running responses and serve them sequentially, instead.
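
A minimal sketch of the queueing idea, written in Python rather than the
project's Perl. The class name, the `limit` parameter and the callback
shape are hypothetical and are not the qspawn API.

```python
from collections import deque


class SpawnQueue:
    """Hypothetical limiter: start at most `limit` jobs, queue the rest."""

    def __init__(self, limit=1):
        self.limit = limit
        self.running = 0
        self.waiting = deque()

    def submit(self, start):
        # `start` launches the job and is handed a `done` callback that it
        # must invoke once the job (e.g. a child process) has finished.
        if self.running < self.limit:
            self.running += 1
            start(self._done)
        else:
            self.waiting.append(start)

    def _done(self):
        # A slot was freed; hand it to the oldest waiting job, if any.
        self.running -= 1
        if self.waiting:
            self.running += 1
            self.waiting.popleft()(self._done)
```

With `limit=1`, a second job only starts after the first one calls `done`,
which is the sequential behavior described above.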

Eric Wong [Tue, 24 May 2016 03:41:52 +0000 (03:41 +0000)]
http: fix various race conditions

We no longer override Danga::Socket::event_write and instead
re-enable reads by queuing up another callback in the $close
response callback.  This is necessary because event_write may not be
completely done writing a response, only the existing buffered data.

Furthermore, the {closed} field can be set at almost any time when
writing, so we must check it before acting on pipelined requests as
well as during write callbacks in more().

Eric Wong [Tue, 24 May 2016 03:41:51 +0000 (03:41 +0000)]
standardize timer-related event-loop code

Standardize the code we have in place to avoid creating too many
timer objects.  We do not need exact timers for things that don't
need to be run ASAP, so we can play things fast and loose to avoid
wasting power with unnecessary wakeups.

We only need two classes of timers:

* asap - run this on the next loop tick, after operating on
  @Danga::Socket::ToClose to close remaining sockets

* later - run at some point in the future.  It could be as
  soon as immediately (like "asap"), and as late as 60s into
  the future.

In the future, we may support an "emergency" switch to fire "later"
timers immediately.
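
The two timer classes can be sketched roughly as follows. This is Python,
not the project's Perl; `TimerQueue`, `tick` and the injectable `clock` are
hypothetical names, and the `emergency` flag only illustrates the possible
future switch mentioned above.

```python
import time


class TimerQueue:
    """Hypothetical coalescing timers: one "asap" list, one "later" batch."""

    def __init__(self, later_max=60.0, clock=time.monotonic):
        self.later_max = later_max
        self.clock = clock
        self.asap = []
        self.later = []
        self.later_deadline = None

    def add_asap(self, cb):
        self.asap.append(cb)

    def add_later(self, cb):
        # The whole batch shares a single deadline set by its first entry,
        # so no per-callback timer object is created.
        if not self.later:
            self.later_deadline = self.clock() + self.later_max
        self.later.append(cb)

    def tick(self, emergency=False):
        # Called once per event-loop iteration; returns how many ran.
        run, self.asap = self.asap, []
        if self.later and (emergency or self.clock() >= self.later_deadline):
            run += self.later
            self.later = []
            self.later_deadline = None
        for cb in run:
            cb()
        return len(run)
```

Sharing one deadline per batch trades timer precision for fewer wakeups,
which is the trade-off the message argues for.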

Eric Wong [Mon, 23 May 2016 08:21:08 +0000 (08:21 +0000)]
http: avoid uninitialized variable

Oops, really gotta start checking logs in tests :x

Fixes: bb38f0fcce739 ("http: chunk in the server, not middleware")

Eric Wong [Mon, 23 May 2016 07:19:45 +0000 (07:19 +0000)]
http: chunk in the server, not middleware

Since PSGI does not require Transfer-Encoding: chunked or
Content-Length, we cannot expect random apps we host to chunk
their responses.

Thus, to improve interoperability, chunk at the HTTP layer like
other PSGI servers do.  I'm choosing a more syscall-intensive method
(via multiple send(...MSG_MORE) calls) for now to reduce copy + packet
overhead.
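
For reference, the HTTP/1.1 chunk framing itself looks roughly like this.
It is a Python sketch with hypothetical function names; it shows only the
wire format and not the send(...MSG_MORE) write strategy chosen here.

```python
def chunk(data: bytes) -> bytes:
    """Frame one piece of body data as an HTTP/1.1 chunk."""
    if not data:
        # A zero-length chunk marks the end of the body, so empty writes
        # from the application must not be framed.
        return b""
    return b"%x\r\n" % len(data) + data + b"\r\n"


LAST_CHUNK = b"0\r\n\r\n"  # terminating chunk with no trailers


def chunked_body(parts):
    """Yield framed chunks for each non-empty part, then the terminator."""
    for part in parts:
        framed = chunk(part)
        if framed:
            yield framed
    yield LAST_CHUNK
```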

Eric Wong [Mon, 23 May 2016 04:01:14 +0000 (04:01 +0000)]
git-http-backend: refactor to support cleanup

We will have clients dropping connections during long clone
and fetch operations; so do not retain references holding
backend processes once we detect a client has dropped.

Eric Wong [Mon, 23 May 2016 03:57:45 +0000 (03:57 +0000)]
git-http-backend: avoid Plack::Request parsing body

Only check query parameters since there's no useful body
in there.

Eric Wong [Mon, 23 May 2016 01:33:40 +0000 (01:33 +0000)]
TODO: update linkification notes

Some readers will want to use "HTTPS Everywhere" conveniently;
and I will support it.

Eric Wong [Mon, 23 May 2016 01:21:00 +0000 (01:21 +0000)]
git-http-backend: clean up the vestigial process limiter code

This bit is still being redone to support gigantic repos.

Eric Wong [Mon, 23 May 2016 01:17:28 +0000 (01:17 +0000)]
config: use popen_rd when spawning `git config'

We may spawn this in a large server process, so be sure
to take advantage of the optional vfork() support
for folks who set PERL_INLINE_DIRECTORY.

Eric Wong [Mon, 23 May 2016 01:14:32 +0000 (01:14 +0000)]
t/config.t: remove GIT_DIR usage in test

Followup-to: commit 24e0219f364ed402f9136227756e0f196dc651aa
("remove GIT_DIR env usage in favor of --git-dir")

Eric Wong [Mon, 23 May 2016 01:00:15 +0000 (01:00 +0000)]
daemon: ignore SIGWINCH when connected to terminal

Users may change terminal sizes if the process is connected to a
terminal, so we can't reasonably expect SIGWINCH to work as
intended.

Eric Wong [Sun, 22 May 2016 20:59:25 +0000 (20:59 +0000)]
spawn: note we do not use absolute paths within our code

We can't rely on absolute paths when installed on other
systems.

Unfortunately, mlmmj-* requires them, but none of the core
code will use it.

Eric Wong [Sun, 22 May 2016 20:44:34 +0000 (20:44 +0000)]
www: avoid warnings on bad offsets for Xapian

The offset argument must be an integer for Xapian,
however users (or bots) type the darndest things.

AFAIK this has no security implications besides triggering
a warning (which could lead to out-of-space-errors)
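
A sketch of the kind of validation meant here, in Python for illustration.
`parse_offset`, its default and the 10-digit cap are assumptions, not the
project's code.

```python
import re

# Assumption: offsets are plain non-negative decimal integers, capped at
# 10 digits so absurd inputs are rejected before conversion.
_OFFSET_RE = re.compile(r"[0-9]{1,10}")


def parse_offset(raw, default=0):
    """Return the offset as an int, or `default` for anything else."""
    if raw is not None and _OFFSET_RE.fullmatch(raw):
        return int(raw)
    return default
```

Falling back to a default keeps bad input from reaching the search library
and from producing a warning per request.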

Eric Wong [Sun, 22 May 2016 09:06:03 +0000 (09:06 +0000)]
git-http-backend: switch to async_pass

This simplifies the code somewhat; but it could probably
still be made simpler.  It will need to support command
queueing for expensive commands so expensive processes
can be queued up.

Eric Wong [Sun, 22 May 2016 03:58:00 +0000 (03:58 +0000)]
http: rework async_pass support

Unfortunately, the original design did not work because
middleware can wrap the response body and make `async_pass'
invisible to HTTP.pm

Eric Wong [Sun, 22 May 2016 07:59:52 +0000 (07:59 +0000)]
git-http-backend: simplify dumb serving

We can rely entirely on getline + close callbacks
and be compatible with 100% of PSGI servers.

Eric Wong [Sun, 22 May 2016 07:55:50 +0000 (07:55 +0000)]
git-http-backend: remove process limit

We will figure out a different way to avoid overloading...

Eric Wong [Sun, 22 May 2016 07:49:04 +0000 (07:49 +0000)]
t/spawn.t: additional tests for popen_rd

We need to ensure $? is set properly for users.

Eric Wong [Sun, 22 May 2016 06:17:30 +0000 (06:17 +0000)]
http: pass reference to Danga::Socket::write

This can avoid an expensive copy for big strings.

Eric Wong [Sun, 22 May 2016 06:17:29 +0000 (06:17 +0000)]
http: fix typo: write_buf => write_buf_size

Otherwise, we get deep recursion as we keep calling
recursively on giant responses

Eric Wong [Sun, 22 May 2016 00:33:59 +0000 (00:33 +0000)]
http: async getline supports push_back_read

Sometimes we need to read something to ensure it's a successful
response.

Eric Wong [Sat, 21 May 2016 23:45:27 +0000 (23:45 +0000)]
http: support async_pass for Danga::Socket

This will allow us to minimize buffering after we wait
(possibly a long time) for readability.  This also greatly
reduces the amount of Danga::Socket-specific knowledge we
have in our PSGI code, making it easier for others to
understand.

Eric Wong [Sat, 21 May 2016 10:52:18 +0000 (10:52 +0000)]
import: avoid needless git update-server-info

We don't need to update-server-info (or read-tree) if fast
import was spawned for removals and no changes were made.

Eric Wong [Sat, 21 May 2016 10:37:09 +0000 (10:37 +0000)]
daemon: simplify forking

We shouldn't need sigprocmask unless we're running multiple
native threads or using vfork, neither of which is the case,
here.

Eric Wong [Sat, 21 May 2016 05:27:06 +0000 (05:27 +0000)]
localize $/ in more places to avoid potential problems

This hopefully makes the intent of the code clearer, too.
The HTTP use of the numeric reference for getline
caused problems in Git.pm, already.

Eric Wong [Sat, 21 May 2016 03:03:17 +0000 (03:03 +0000)]
mbox: switch generation over to pull model

This allows us to easily provide gigantic inboxes
with proper backpressure handling for slow clients.

It also eliminates public-inbox-httpd and Danga::Socket-specific
knowledge from this class, making it easier to follow for
those used to generic PSGI applications.

Eric Wong [Sat, 21 May 2016 03:03:16 +0000 (03:03 +0000)]
http: reduce over-buffering for getline responses

By switching to a "pull"-based I/O model for reading
application responses, we should be able to throttle
buffering to slow clients more effectively and avoid
wasting precious RAM.

This will also allow us to move Danga::Socket-specific
knowledge out of the PSGI application and keep it
confined to PublicInbox::HTTP.
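
A rough Python sketch of the pull model: the server asks the response body
for the next buffer only while the client can accept more. The body object
follows the PSGI getline/close shape; `pump`, `write` and `writable` are
hypothetical stand-ins for the server side.

```python
class GetlineBody:
    """Response body exposing the getline/close interface."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self.closed = False

    def getline(self):
        # Returns the next buffer, or None once the body is exhausted.
        return next(self._iter, None)

    def close(self):
        self.closed = True


def pump(body, write, writable):
    """Pull from `body` only while the client can take more data."""
    while writable():
        buf = body.getline()
        if buf is None:
            body.close()
            return True   # response finished
        write(buf)
    return False          # client is slow; call pump() again later
```

Because nothing is read from the application until the socket is writable,
at most one buffer per response is held in memory for a slow client.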

Eric Wong [Fri, 20 May 2016 22:35:16 +0000 (22:35 +0000)]
ssoma-replay: use TMPDIR for temporary path

Otherwise, tempfile() will use the current working directory,
which may not be writable.

Eric Wong [Thu, 19 May 2016 22:02:56 +0000 (22:02 +0000)]
www: tighten up allowable filenames for attachments

Having a file start with '.' or '-' can be confusing
for users, so do not allow it.
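
A sketch of such a check, in Python for illustration. The set of allowed
characters is an assumption; only the ban on a leading '.' or '-' comes
from this message.

```python
import re

# Assumption: names consist of ASCII letters, digits, '_', '.' and '-',
# and the first character may be neither '.' nor '-'.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.\-]*")


def is_safe_filename(name: str) -> bool:
    """True if `name` is acceptable as an attachment filename in a URL."""
    return _SAFE_NAME.fullmatch(name) is not None
```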

Eric Wong [Thu, 19 May 2016 21:18:32 +0000 (21:18 +0000)]
view: reduce clutter for attachments w/o description

For attachments without a filename or description, reduce
the amount of precious screen space required to display
a link to it.

Eric Wong [Thu, 19 May 2016 19:23:13 +0000 (19:23 +0000)]
www: validate and check filenames in URLs

We shall ensure links continue working for this.

Eric Wong [Thu, 19 May 2016 10:23:28 +0000 (10:23 +0000)]
msg_iter: workaround broken Email::MIME versions

Email::MIME >= 1.923 and < 1.935 would drop too many newlines
in attachments.  This would lead to ugly text files without
a proper trailing newline if using quoted-printable, 7bit, or
8bit.  Attachments encoded with base64 were not affected.

These versions of Email::MIME are widely available in Debian 8
(Jessie) and even Ubuntu LTS distros so we will need to support
this workaround for a while.

Eric Wong [Thu, 19 May 2016 02:42:05 +0000 (02:42 +0000)]
www: support downloading attachments

This can be useful for lists where the convention is to
attach (rather than inline) patches into the message body.

Eric Wong [Thu, 19 May 2016 00:31:50 +0000 (00:31 +0000)]
switch read-only uses of walk_parts to msg_iter

msg_iter lets us know the index of the attachment,
allowing us to make more sensible labels and, in a future
commit, hyperlinks to download attachments.

Eric Wong [Wed, 18 May 2016 20:30:31 +0000 (20:30 +0000)]
msg_iter: new internal API for iterating through MIME

Unlike Email::MIME::walk_parts, this is non-recursive and gives
depth + index offset information about the part for creating
links for later retrieval

It is intended for read-only access and changes are not
propagated to the parent; however future versions of it
may clobber bodies or the original version as it iterates
to reduce memory overhead.

It is intended for making it easy to locate attachments within a
message in the WWW view.

Eric Wong [Tue, 10 May 2016 19:36:54 +0000 (19:36 +0000)]
view: rely on Email::MIME::body_str for decoding

Or is it "encoding"?  Gah, Perl character set handling
confuses me no matter how many times I RTFM :<

This contains placeholders for attachment downloading
which will be in a future commit.

Eric Wong [Thu, 19 May 2016 08:06:05 +0000 (08:06 +0000)]
nntpd: avoid uninitialized warning

Oops, but at least it was mostly harmless, just ugly.

Followup-to: 9bfe40e7a4ac 'nntp: use "newsgroup" instead of "name"'

Eric Wong [Wed, 18 May 2016 18:58:04 +0000 (18:58 +0000)]
nntpd: reject control characters entirely

There's no place for them in the commands and we don't take
messages; potentially printing them into a log opened in a
terminal is too dangerous.

Hoist out read_til_dot in the test while we're at it.
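
A Python sketch of the rejection rule, for illustration only.
`accept_command` is a hypothetical name, and treating tab as a rejected
control character is an assumption drawn from "entirely" above.

```python
import re

# C0 control characters (including tab, CR, LF) plus DEL.
_CTRL = re.compile(r"[\x00-\x1f\x7f]")


def accept_command(line: str) -> bool:
    """True if a command line is free of control characters."""
    # The CRLF terminator itself is expected, so drop it before checking.
    if line.endswith("\r\n"):
        line = line[:-2]
    return _CTRL.search(line) is None
```

An escape sequence such as `\x1b[2J` is refused before it can reach a log
that someone may view in a terminal.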

Eric Wong [Wed, 18 May 2016 02:48:37 +0000 (02:48 +0000)]
tests: add check-www-inbox script

This can be useful for hammering a live HTTP server
with requests to ensure it does not fall over under
load.

Eric Wong [Wed, 18 May 2016 02:34:46 +0000 (02:34 +0000)]
view: avoid redirect to reply endpoint

Oops, but perhaps the "reply" endpoint should be embedded
into the permalink message view itself to reduce URLs.

Eric Wong [Wed, 18 May 2016 02:27:07 +0000 (02:27 +0000)]
feed: inline feed entry generation

Remove unnecessary wrapper subroutines and constants
which are only used once.

Eric Wong [Tue, 17 May 2016 08:16:47 +0000 (08:16 +0000)]
http: release resources when idle

This lets us release old git processes so unlinked packs
(leftover from repacking) can be released.  This may also
be helpful for Xapian as indices get rebuilt for tuning.

For SQLite (msgmap), there may be no benefit besides
reducing FD pressure.

Followup changes will unify the Inbox and NewsGroup
classes and allow better code-sharing between NNTP and
HTTP classes (as well as the planned POP3 class).

Eric Wong [Tue, 17 May 2016 05:39:06 +0000 (05:39 +0000)]
view: escape Message-ID for "next" link

Oops, we need to escape Message-IDs since they can contain
bad characters such as '%' in them.  '@' actually seems fine
and does not need to be escaped; however, we've been
doing it forever.
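
The escaping can be sketched in Python as below. `mid_href` and the
trailing-slash link shape are hypothetical; leaving '@' unescaped is a
simplification for this sketch and differs from the existing behavior
noted above.

```python
from urllib.parse import quote


def mid_href(mid: str) -> str:
    """Percent-encode a Message-ID for use as a relative link target."""
    # safe="@" keeps '@' readable; '%' and '/' are always encoded.
    return quote(mid, safe="@") + "/"
```

For example, a Message-ID of `20%off@example.com` becomes
`20%25off@example.com/`, so the literal '%' is no longer mistaken for the
start of an escape sequence.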

Eric Wong [Sun, 15 May 2016 06:31:50 +0000 (06:31 +0000)]
www: fix for running under mount paths

We try to avoid issues like these by using relative URLs
in hrefs, but we can't avoid the problem with Location:
for redirects and Atom feeds which are likely to be
rehosted elsewhere.

We also reorder some of the code to work around a weird
issue on the psgi-plack mailing list:
<20160516073750.GA11931@dcvr.yhbt.net>
(Somewhere on https://groups.google.com/group/psgi-plack
 but it's probably not bookmarkable)

Eric Wong [Sun, 15 May 2016 06:31:49 +0000 (06:31 +0000)]
config: allow taking an existing reference

This should make creating test cases easier and faster.

Eric Wong [Sat, 14 May 2016 06:10:36 +0000 (06:10 +0000)]
declare Inbox object for reusability

From the beginning, we've avoided objects here in favor
of faster startup time; but it may not be worth it
since a persistent httpd/nntpd is faster and -mda
isn't hit as often.

Eric Wong [Mon, 16 May 2016 02:31:04 +0000 (02:31 +0000)]
doc: sync ~/.spamassassin/user_prefs with my prod machine

This is what I'm running on public-inbox.org as of today.

Eric Wong [Sun, 15 May 2016 23:30:06 +0000 (23:30 +0000)]
mbox: support /$INBOX/all.mbox.gz endpoint

Allows easily downloading the entire archive without
special tools.  In any case, it's not yet advertised via
HTML until we can test it better.  It'll also support range
queries in the future to avoid wasting bandwidth.

Eric Wong [Sun, 15 May 2016 23:43:20 +0000 (23:43 +0000)]
mbox: consistent header order when decompressed

This should make validating the output easier
when testing between different servers.

Eric Wong [Sun, 15 May 2016 03:44:58 +0000 (03:44 +0000)]
git-http-backend: set cache headers

Mostly stolen from git upstream, these should prevent any caches
such as varnish or squid from acting improperly.

Eric Wong [Sat, 14 May 2016 03:02:42 +0000 (03:02 +0000)]
rename most instances of "list" to "inbox"

A public-inbox is NOT necessarily a mailing list, but it
could serve as an input point for zero, one, or infinite
mailing lists :D

Eric Wong [Sat, 14 May 2016 02:54:09 +0000 (02:54 +0000)]
nntp: use "newsgroup" instead of "name"

This reduces the cognitive overhead for mapping names of
configuration values to internal field names of our classes.
Further changes along these lines coming...

Eric Wong [Sat, 14 May 2016 02:17:47 +0000 (02:17 +0000)]
import ssoma-replay example script I've been using

Unfortunately, most users still prefer their mail delivered
over SMTP; so we'll at least document mlmmj integration for now
until we can popularize pull-based reading over POP3/NNTP/ssoma.

Eric Wong [Sat, 14 May 2016 01:45:29 +0000 (01:45 +0000)]
t/nntpd: test for wide characters and UTF-8 mangling

We'll need to test non-UTF-8 messages at some point, too.
There are lots of legacy-encoded messages in old archives
and I would not bet we behave sanely w.r.t. those.

Eric Wong [Sat, 14 May 2016 01:24:08 +0000 (01:24 +0000)]
t/nntpd: avoid fork+exec for search indexing

The Xapian search index is required for the NNTP server, so
there's no point in calling system() for it like we do in
other tests.  This should speed up the test a small amount.

Eric Wong [Sat, 14 May 2016 01:16:15 +0000 (01:16 +0000)]
build: support eatmydata in "make check" target by default

This should help poor developers who still use rotating disks on
cheap netbooks.

Eric Wong [Fri, 13 May 2016 12:12:41 +0000 (12:12 +0000)]
nntp: fixup "Wide character" warnings

We need Perl to believe everything we send is UTF-8,
make it so, even if it may not be.

Fixes: 265e79ff82ce 'Revert "nntp: proper UTF-8 support (hopefully?)"'

Eric Wong [Sun, 8 May 2016 22:03:16 +0000 (22:03 +0000)]
Revert "nntp: proper UTF-8 support (hopefully?)"

This reverts commit f81ad477cb013d05b9b11fa051a9ebc5983a5be6.

The raw, undecoded body is probably what should be sent over the
wire anyways for clients to deal with.  We'll need this to avoid
deprecation warnings with Perl 5.24+ since we use
send()/recv()/sysread().

Eric Wong [Thu, 12 May 2016 09:32:39 +0000 (09:32 +0000)]
git-http-backend: do not drop connection on successful finish

We can maintain the client HTTP connection if the process exited
with failure as long as we terminated our own response properly.

Eric Wong [Thu, 12 May 2016 09:06:56 +0000 (09:06 +0000)]
import: fallback to email if '<>' exists in author name

git doesn't handle '<' and '>' characters in the author
name at all regardless of quoting, not just matched pairs.
So fall back to using the email as the author name since
the commit info isn't critical, anyways (shallow clones
are fine).
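
The fallback rule can be sketched as follows, in Python for illustration.
`git_author_name` is a hypothetical helper, and also falling back on an
empty name is an assumption.

```python
def git_author_name(name: str, email: str) -> str:
    """Pick an author name that git will accept for a commit."""
    # Any '<' or '>' is unusable, matched or not, so use the email instead.
    if not name or "<" in name or ">" in name:
        return email
    return name
```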

Eric Wong [Thu, 12 May 2016 09:06:28 +0000 (09:06 +0000)]
import: normalize body by stripping trailing newlines

Mbox formatters may add extra newlines at the end of the
message, and that's not relevant for comparing messages
for deletion.
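
A Python sketch of comparing bodies after normalization. The helper names
are hypothetical, and stripping trailing "\r" along with "\n" is an
assumption made for this sketch.

```python
def normalize_body(body: bytes) -> bytes:
    """Drop trailing line terminators before comparing message bodies."""
    return body.rstrip(b"\r\n")


def same_body(a: bytes, b: bytes) -> bool:
    """True if two bodies differ only in trailing newlines."""
    return normalize_body(a) == normalize_body(b)
```

Only the tail is touched, so blank lines inside the body still count as
differences.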

Eric Wong [Fri, 6 May 2016 01:15:31 +0000 (01:15 +0000)]
mbox: sort messages by ascending date

This allows messages to be read in chronological order when
read without a mail client (e.g. with "zcat t.mbox.gz | less")

Eric Wong [Thu, 5 May 2016 20:11:42 +0000 (20:11 +0000)]
t/view: note possibly invalid test...

Ugh, I really need to get off my ass to write automated tests for
an Apache2 + mod_perl config.

Eric Wong [Tue, 3 May 2016 02:34:57 +0000 (02:34 +0000)]
git-http-backend: reduce memory use for clone/fetch

When serving large static files or large packs, we may call
Danga::Socket::write directly to queue up callbacks to resume
reading and defer firing them until the socket is writable.
This prevents us from scheduling writes or buffering until we
know the socket is writable and prevents needless buffering by
Danga::Socket when faced with slow clients.

For smart clones, this comes at the cost of throttling the
output of "git pack-objects" to the speed of the client
connection.  This is probably not ideal, but is the behavior of
the standard git-daemon, too; and is preferable to running the
httpd out-of-memory.  Buffering to the filesystem may be an
option in the future...

Eric Wong [Tue, 3 May 2016 02:52:23 +0000 (02:52 +0000)]
http: move empty string check into write callback

This empty string check is for middlewares such as Deflater
which may write empty strings, not for direct real callers of
Danga::Socket who (presumably) know what they're doing.

Eric Wong [Tue, 3 May 2016 06:20:54 +0000 (06:20 +0000)]
spawnpp: use native perl %ENV outside of mod_perl

We only need to use env(1) under mod_perl; since mod_perl
is uncommon nowadays, support native %ENV for a teeny
speedup for folks uncomfortable with running vfork via
Inline::C snippet.

Eric Wong [Mon, 2 May 2016 07:52:41 +0000 (07:52 +0000)]
t/*.t: reduce -mda calls

Process startup times are atrocious for fast tests and there's far
too much setup involved.  Rely on git-fast-import instead; but
more work is needed in this area.

Eric Wong [Mon, 2 May 2016 07:36:05 +0000 (07:36 +0000)]
t/nntpd.t: stop hard coding message :bytes into test

It limits flexibility and makes it harder to switch
to use PublicInbox::Import.

Eric Wong [Mon, 2 May 2016 04:22:40 +0000 (04:22 +0000)]
nntp: append Archived-At and List-Archive headers

For readers using NNTP, we should do our best to advertise the
clonable HTTP/HTTPS URLs and the message permalink URL for
ease-of-referencing messages, since we don't want the NNTP server
and its sequential article numbers to be relied on.

Eric Wong [Mon, 2 May 2016 03:20:22 +0000 (03:20 +0000)]
view: disable subject threading

Broken threads should be exposed to hopefully encourage people to
use proper mail clients which set In-Reply-To headers.

Eric Wong [Mon, 2 May 2016 01:25:34 +0000 (01:25 +0000)]
http: remove needless binmode call

Unnecessary on *nix, and we won't support systems
which do insane things.

Eric Wong [Mon, 2 May 2016 08:48:46 +0000 (08:48 +0000)]
spawn: proper signal handling for vfork

We cannot afford to fire Perl-level signal handlers in the
vforked child process since they're not designed to run in
the child like that.

Thus we need to block all signals before calling vfork, reset
signal dispositions in the child, and restore the signal mask in
the parent.

ref: https://ewontfix.com/7

Eric Wong [Sun, 1 May 2016 22:18:35 +0000 (22:18 +0000)]
git-http-backend: use real lseek for Content-Range

Since we use sysread, we must use sysseek for symmetry although
PerlIO may be doing a real lseek with "seek", anyways.

Fixes: 310819ea86ac ("git-http-backend: favor sysread for regular files")

Eric Wong [Sun, 1 May 2016 10:14:28 +0000 (10:14 +0000)]
daemon: reduce timer-related allocations

We can reduce the allocation and overhead needed for
Danga::Socket timers for immediately-executed responses by
combining identical timers and reducing anonymous sub creation.

Eric Wong [Sun, 1 May 2016 08:54:10 +0000 (08:54 +0000)]
mda: export @BAD_HEADERS variable

This should allow users to change and add headers as needed.
While we're at it, add the X-Original-To header Postfix likes
to add; it seems like pointless bloat with the existence of
(important) Received: headers.

Eric Wong [Sun, 1 May 2016 01:54:07 +0000 (01:54 +0000)]
linkify: match more URL characters [:,\$] and schemes

Adding ':' (colon), ',' (comma), '$' (dollar sign) and
supporting TLS-enabled schemes: ftps, nntps variants as
well as gopher :D

Eric Wong [Sun, 1 May 2016 01:47:10 +0000 (01:47 +0000)]
linkify: match '~' (tilde) in URLs

Tilde is common for some homepages: http://example.org/~user/
There's probably some other acceptable characters I'm missing.

Eric Wong [Sat, 30 Apr 2016 02:57:40 +0000 (02:57 +0000)]
daemon: graceful shutdown warning and limit removal

git clones may take longer than 30s, much longer...  So prepare
to wait almost indefinitely for sockets to timeout and document
the second signal behavior for immediate shutdown.

While we're at it, move parent death handling to a separate
class to avoid Danga::Socket->AddOtherFds, since that does not
allow proper handling of the parent pipe being closed and would
actually misterminate a worker prematurely.  t/nntpd.t is updated
to illustrate the failure with workers enabled.

We will work to keep memory usage low and let clients take their
time without interrupting them.

Eric Wong [Sat, 30 Apr 2016 02:57:39 +0000 (02:57 +0000)]
http: graceful shutdown for pi-httpd.async callers

git clones may take a long time and it's wrong to
drop connections in the middle of a transaction.

Eric Wong [Sat, 30 Apr 2016 02:02:53 +0000 (02:02 +0000)]
searchmsg: ensure long subject lines are not broken

Noticed when using a long URL in the subject.

Eric Wong [Fri, 29 Apr 2016 20:21:39 +0000 (20:21 +0000)]
http: avoid lseek if no input

This saves us a system call for common GET/HEAD requests
with no upload body.

Eric Wong [Fri, 29 Apr 2016 20:06:14 +0000 (20:06 +0000)]
TODO: add item for .mailmap support

Email addresses get out-of-date, so make sure they're mapped
properly for future readers.  git and linux-kernel already have
an established convention for this, so we will follow it.

Eric Wong [Fri, 29 Apr 2016 03:32:20 +0000 (03:32 +0000)]
http: improve error handling for aborted responses

We need to abort connections properly if a response is prematurely
truncated.  This includes problems with serving static files, since
a clumsy admin or broken FS could return truncated responses and
inadvertently leave a client waiting (since the client saw
"Content-Length" in the header and expected a certain length).

Eric Wong [Mon, 7 Mar 2016 19:10:33 +0000 (19:10 +0000)]
git-http-backend: check EINTR as well as EAGAIN

The blocking PSGI server may cause EINTR to be hit, here.

Eric Wong [Fri, 29 Apr 2016 04:00:24 +0000 (04:00 +0000)]
http: avoid corking on "Content-Length: 0" response

We must use a normal write instead of send(.., MSG_MORE)
when writing responses of "Content-Length: 0" to avoid
the corking effect MSG_MORE provides.  We only want to
cork headers if we will send a non-empty body.

Fixes: c3eeaf664cf0 ("http: clarify intent for persistence")
This needs a proper test.

Eric Wong [Thu, 28 Apr 2016 01:56:08 +0000 (01:56 +0000)]
githttpbackend: clamp to one smart HTTP request at-a-time

Server admins may not be able to afford to have too many
git-pack-objects processes running at once.  Since PSGI
HTTP servers should already be configured to use multiple
processes for other requests; limit concurrency of smart
backends to one; and fall back to dumb responses if we're
already generating a pack.

Eric Wong [Thu, 28 Apr 2016 01:56:07 +0000 (01:56 +0000)]
githttpbackend: fall back to dumb if smart HTTP is off

Using http.getanyfile still keeps the http-backend process
alive, so it's better to break out of that process and
handle serving entirely within the HTTP server.

Eric Wong [Thu, 28 Apr 2016 01:03:31 +0000 (01:03 +0000)]
import: run git-update-server-info when done

We should update $GIT_DIR/info/refs for dumb HTTP clients
whenever we make changes to the repository.  The best place
to update is immediately after making commits.

This fixes a bug where public-inbox-learn did not properly
update $GIT_DIR/info/refs after inserting or removing
messages.