Eric Wong [Sat, 22 Aug 2015 08:00:37 +0000 (08:00 +0000)]
remove XML::Atom::SimpleFeed dependency
We will attempt to generate Atom feeds "by hand" as the
XML::Atom::SimpleFeed API does not support streaming output.
Since email is large and servers are small, this should prevent
wasting memory when we generate larger feeds.
Of course, we hope clients use SAX parsers capable of handling
large streams without slurping.
Eric Wong [Sat, 22 Aug 2015 00:06:45 +0000 (00:06 +0000)]
stream HTML views as much as possible
This should allow progressive rendering on the client and reduce
memory usage on the server. Unfortunately XML::Atom::SimpleFeed
does not yet support streaming, so we may not use it in the
future.
Eric Wong [Thu, 20 Aug 2015 10:17:34 +0000 (10:17 +0000)]
search: preserve References: order in document data
We need proper ordering of References to thread messages
correctly. We would lose this order if we load the terms
from the database, so set it directly document data.
Do not bother with a separate In-Reply-To, since Mail::Thread
just merges the IRT into References. This bumps our schema
version once again.
Eric Wong [Thu, 20 Aug 2015 08:54:32 +0000 (08:54 +0000)]
avoid using header_raw for Message-ID retrieval
This is for consistency with ssoma. I doubt it makes
a difference in practice, but in case somebody decides
any of the Message-ID-containing headers should have
strange characters, we'll decode and attempt to thread
them. This isn't an attack vector, just a way to
make messages thread improperly which is pointless...
Eric Wong [Thu, 20 Aug 2015 02:43:20 +0000 (02:43 +0000)]
index: layout fix + title and Atom feed links at top
Add some spacing between topics to improve readability when
scanning or in case a subject gets too long.
The title and Atom feed may not be highly-visible otherwise.
While we're at it, use the proper "Atom feed" terminology since
some folks may not understand just what "atom" means.
Eric Wong [Thu, 20 Aug 2015 02:32:29 +0000 (02:32 +0000)]
search: bump schema version to 5 for subject_path
In "index: simplify main landing page if search-enabled",
subject normalization went a little farther to drop trailing
'.' characters, so we will need to re-index.
Eric Wong [Thu, 20 Aug 2015 02:30:32 +0000 (02:30 +0000)]
view: reduce memory usage when displaying large threads
We want to minimize the time any large objects or strings
are referenced. We can do threading entirely from the
mini_mime-generated messages and lazilly load full messages
when rendering the display.
Eric Wong [Thu, 20 Aug 2015 02:30:27 +0000 (02:30 +0000)]
use tables for rendering comment nesting
This is more space efficient since we don't need to place padding
bytes in front of every line. While this unfortunately does not
render well on lynx; w3m, links, elinks can all render tables
sanely.
Tables are also superior for long lines which require wrapping
inside <pre> containers.
Eric Wong [Thu, 20 Aug 2015 02:30:25 +0000 (02:30 +0000)]
feed: remove threading from index
We'll be making the index smarter for people with search
support enabled. Otherwise, it'll be chronological and
a bit dumb. At least it'll use less memory.
Eric Wong [Tue, 18 Aug 2015 01:11:05 +0000 (01:11 +0000)]
search: common Subject: normalization for Re: prefixes
Drop German ("Aw:") support since it's non-standard and
is not supported by Mail::Thread and non-English prefixes
are more likely to conflict with prefixes used in Free Software
development where ("subsection:") prefixes are common and English is the
common language.
Anyways we don't filter "Vs: " (Finnish) or "Sv: "
(Norwegian, Swedish, Danish, Icelandic), either.
Eric Wong [Mon, 17 Aug 2015 20:15:31 +0000 (20:15 +0000)]
view: do not recompress already-compressed MID for anchors
This is merely for display, so on the off chance somebody does
send a 40-byte MID with nothing but hexadecimal characters,
the worst that could happen is we repeat an anchor name in the
rendered HTML. This has no impact on git archival or Xapian
indexing.
Eric Wong [Mon, 17 Aug 2015 16:49:31 +0000 (16:49 +0000)]
search: simplify indexing operation
There's no need to make a transaction for each message when doing
incremental indexing against a git repository. While we're at it,
simplify the interface for callers, too and do not auto-create
the Xapian database if it was not explicitly enabled.
Eric Wong [Mon, 17 Aug 2015 07:46:54 +0000 (07:46 +0000)]
mid: compress Message-IDs with '%' in them
Some HTTP servers (apache2 2.2.22-13+deb7u5) on my system
apparently do not handle "%25" correctly. I'm not yet sure if
it's something weird with my rewrite rules or what....
Eric Wong [Mon, 17 Aug 2015 02:41:14 +0000 (02:41 +0000)]
search: use raw headers without MIME decoding
This should be less error-prone in case somebody tries to screw with
us and our thread_id mechanism or somehow waste our resources.
Unfortunately Mail::Thread isn't smart enough for this, yet, so we
may need to downgrade to Email::Simple objects as a workaround.
Or simply not worry about the display so much if somebody is
intentionally trying to make it thread badly/incorrectly.
Eric Wong [Mon, 17 Aug 2015 02:41:10 +0000 (02:41 +0000)]
favor /t/ to /s/, since subjects may change mid-thread
/t/ always falls back to subject path searching anyways,
so there's little lost besides perhaps more readable URLs.
Unfortunately people still use non-compliant mail clients which fail
to set In-Reply-To or References headers :<
Eric Wong [Sat, 15 Aug 2015 23:57:39 +0000 (23:57 +0000)]
view: reply threading adjustment
Give changes in subject their own line to reduce line wrapping,
but avoid showing any redundant subjects by maintaining a hash
of subjects already displayed.
Eric Wong [Tue, 14 Jul 2015 21:01:18 +0000 (21:01 +0000)]
reject HTML loudly and automatically
This should hopefully reduce the delay between when a user fails
to send plain-text to when an admin such as myself notices the
HTML mail in a sea of spam.
Unfortunately, this can lead to backscatter, so avoid doing it
until its passed through spamc, at least.
Eric Wong [Mon, 12 Jan 2015 01:16:04 +0000 (01:16 +0000)]
import_slrnspool: fork a process for each message
This prevents process growth when importing large messages.
Memory growth could be due to the sliding sbrk window in glibc malloc
or a circular reference in the Email::* Perl code somewhere.
Eric Wong [Sun, 11 Jan 2015 11:03:41 +0000 (11:03 +0000)]
import_slrnspool: use ssoma-mda instead
Some mailing lists (e.g. git@vger.kernel.org) accept messages
via Bcc: and possibly other things which get rejected by
the strict PublicInbox::Filter rules. So rely on ssoma-mda
instead.
This prefers a recent revision of ssoma-mda (commit 7fce38e9
onwards) to display subject/author/date information in the
commit message.