Eric Wong [Sun, 30 Aug 2015 01:51:22 +0000 (01:51 +0000)]
view: remove "threadlink" from thread view
We're already inside the thread, and our thread summary inside
/m/$MESSAGE_ID/ is already sufficient to navigate back to the
/t/$MESSAGE_ID/ page. So I think it's sufficient to keep the
/t/$MESSAGE_ID/ page lighter with fewer links and avoid
introducing strange terminology.
In contrast, "permalink" is relatively well-known and
not an alien term to readers:
Eric Wong [Sun, 30 Aug 2015 01:26:46 +0000 (01:26 +0000)]
mid2path: clean MID of angle brackets '<>'
We screwed up and needed to fix URL generation with '<>'
in them. Regardless, users may attempt to copy and paste
URLs with '<>' in them; do not punish them for that.
Eric Wong [Sun, 30 Aug 2015 01:04:31 +0000 (01:04 +0000)]
public-inbox-index: resolve git directory if run inside one
I often forget to pass the correct path to a git directory
or run from inside one. Fortunately git is script-friendly
and allows easily resolving the correct GIT_DIR path.
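A rough Python sketch of the idea (not the project's own code): ask git
for the directory via "git rev-parse --git-dir" instead of guessing.
The helper name resolve_git_dir is hypothetical.

```python
import os
import subprocess


def resolve_git_dir(path="."):
    """Return the absolute GIT_DIR for `path`, as reported by git itself."""
    out = subprocess.run(
        ["git", "rev-parse", "--git-dir"],
        cwd=path,
        check=True,            # raise CalledProcessError outside a repository
        capture_output=True,
        text=True,
    ).stdout.strip()
    # git may print a path relative to `path` (e.g. ".git"), so anchor it there
    return os.path.abspath(os.path.join(path, out))
```

Outside of any repository, git exits non-zero and the call raises, which
lets a caller fall back to requiring an explicit path.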
Eric Wong [Sun, 30 Aug 2015 00:38:05 +0000 (00:38 +0000)]
search: do not index references and inreplyto terms
We no longer need them, as we can rely on index-time thread
resolution and thread merging. This allows us to index less
data and hopefully increase efficiency.
Eric Wong [Sun, 30 Aug 2015 00:22:43 +0000 (00:22 +0000)]
view: display thread outline in single-message view
If Xapian search is available, we can load most of the
entire thread and show a more meaningful navigation tree
than the References: and In-Reply-To: headers. Searching
on those headers themselves is unreliable because it is
possible for clients to omit some references.
Eric Wong [Fri, 28 Aug 2015 00:00:47 +0000 (00:00 +0000)]
search: do not iterate through entire termlist
A document may have many terms, so this hurts performance
if we blindly iterate. Unfortunately, we can't rely on the
order of the termlist just yet, either, so we must repeatedly
restart the search for now until we're ready to bump schema
versions.
Eric Wong [Fri, 28 Aug 2015 00:21:46 +0000 (00:21 +0000)]
GitCatFile: remove unnecessary FD_CLOEXEC setting
Unless some idiot raises $^F, we should not have to care about
the close-on-exec flag. Everything since Perl 3.0 seems to set
it by default, and 5.6 got more consistent about it.
Eric Wong [Thu, 27 Aug 2015 04:34:01 +0000 (04:34 +0000)]
wire up to display non-suffixed Message-ID links
These URLs are preferable in case somebody decides to get cute and
use a suffix we would've used to prevent others from linking to
their message. The common /m/$MESSAGE_ID/ URLs are now 4 characters
shorter so may fit better on terminals.
Eric Wong [Thu, 27 Aug 2015 04:34:00 +0000 (04:34 +0000)]
mid: extract Message-ID from inside '<>'
This is necessary for some mailers which include comment text
in the In-Reply-To header; merely assuming there is nothing
outside of '<>' as we were doing is not enough.
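A minimal Python sketch of this kind of extraction, assuming we only
want the first '<...>' group and fall back to the trimmed input when
there are no brackets. The name mid_clean and the sample values are
illustrative only.

```python
import re

# First run of non-bracket characters enclosed in '<' and '>'
_MID_RE = re.compile(r"<([^<>]+)>")


def mid_clean(value):
    """Return the Message-ID inside '<>', ignoring any surrounding text."""
    m = _MID_RE.search(value)
    if m:
        return m.group(1)
    # No brackets at all: treat the whole (trimmed) value as the Message-ID
    return value.strip()
```

With this, both "<a@b>" and "<a@b> (comment added by a mailer)" yield
"a@b", and a bare "a@b" passes through unchanged.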
Eric Wong [Thu, 27 Aug 2015 04:33:59 +0000 (04:33 +0000)]
wire up shorter, less ambiguous URLs
We will prefer URLs without suffixes for now to avoid ambiguity
in case a Message-ID ends with ".html", ".txt", ".mbox.gz" or
any other suffix we may use.
Static file compatibility is preserved by using a trailing slash
as most servers can/will fall back to an index.html file in this
case.
For raw text files, we will follow gmane's lead with "/raw".
Eric Wong [Mon, 24 Aug 2015 02:25:46 +0000 (02:25 +0000)]
view: refactor $state as a hash
Using hash means we no longer have to document and remember what
every field does. The original array form was insane premature
optimization and crazy. Who wrote that? Oh wait, I was on
drugs :<
Eric Wong [Sun, 23 Aug 2015 20:05:41 +0000 (20:05 +0000)]
cleanup calls to header_obj
Dereference header_obj only once when performance may be
critical, or simplify our code by calling "header" directly on
the Email::{Simple,MIME} object if not.
Eric Wong [Sun, 23 Aug 2015 19:41:28 +0000 (19:41 +0000)]
hopefully fix broken permissions for search
We must preserve the umask for the entirety of the indexing
operation, as Xapian transactions replace entire files
atomically instead of writing them in place.
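To illustrate why the umask matters for atomic replacement, here is a
Python sketch (not Xapian's or the project's code). The helper names
with_umask and replace_atomically are hypothetical. A file that is
replaced by rename gets its mode from the umask in effect when the new
file is created, not from the file it replaces.

```python
import os
from contextlib import contextmanager


@contextmanager
def with_umask(mask):
    """Apply `mask` for the whole block, then restore the previous umask."""
    old = os.umask(mask)
    try:
        yield
    finally:
        os.umask(old)


def replace_atomically(path, data):
    """Write `data` to a fresh file and rename it over `path`."""
    tmp = path + ".tmp"
    # Mode 0o666 is filtered through the umask in effect right now
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    os.replace(tmp, path)
    return path
```

Wrapping the entire operation in "with with_umask(0o027):" keeps every
replaced file at the intended permissions; restoring the umask too
early would let later replacements pick up the wrong mode.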
Eric Wong [Sun, 23 Aug 2015 00:31:28 +0000 (00:31 +0000)]
mbox: clarify our use of the mboxrd variant
Commenting it in the From: line seems appropriate and
reduces compatibility problems in case a MUA cannot handle
trailing comments after the timestamp.
Eric Wong [Sun, 23 Aug 2015 00:02:34 +0000 (00:02 +0000)]
.txt links return an mbox instead
This improves compatibility and allows individual messages
to be concatenated into an existing mbox without further
modifications. "git format-patch" does something similar
(but does not do "From " line escaping(!))
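For reference, "From " line escaping in the mboxrd variant can be
sketched in Python as below. This is an illustration of the format's
quoting rule, not the project's code, and the function names are made
up for the example. Any body line matching /^>*From / gains one leading
'>', which makes the escaping reversible.

```python
import re

# A line that starts with zero or more '>' followed by "From "
_FROM_LINE = re.compile(r"^(>*From )", re.MULTILINE)
_ESCAPED_FROM_LINE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxrd_escape(body):
    """Prefix one '>' to every line matching /^>*From /."""
    return _FROM_LINE.sub(r">\1", body)


def mboxrd_unescape(body):
    """Strip exactly one '>' from every line matching /^>+From /."""
    return _ESCAPED_FROM_LINE.sub(r"\1", body)
```

Header-like lines such as "From: someone" are untouched because the
pattern requires a space directly after "From".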
Eric Wong [Sat, 22 Aug 2015 11:41:24 +0000 (11:41 +0000)]
view: wire up mbox.gz links
To reduce clutter, we will not link to uncompressed versions.
Users should be able to download entire threads for offline
reading; enable this feature for them.
Eric Wong [Sat, 22 Aug 2015 08:00:37 +0000 (08:00 +0000)]
remove XML::Atom::SimpleFeed dependency
We will attempt to generate Atom feeds "by hand" as the
XML::Atom::SimpleFeed API does not support streaming output.
Since email is large and servers are small, this should prevent
wasting memory when we generate larger feeds.
Of course, we hope clients use SAX parsers capable of handling
large streams without slurping.
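A hedged Python sketch of generating a feed "by hand" as a stream: a
generator yields one chunk per entry, so the whole feed never has to
sit in memory. The function name and the entry dict keys are
assumptions, and a valid Atom feed needs more elements (for example
<updated> and <id> on the feed itself) than are shown here.

```python
from xml.sax.saxutils import escape


def atom_chunks(feed_title, entries):
    """Yield an Atom document piece by piece.

    `entries` may be any iterable (including a lazy one) of dicts with
    'title' and 'id' keys; each entry is emitted as soon as it is read.
    """
    yield '<?xml version="1.0" encoding="utf-8"?>\n'
    yield '<feed xmlns="http://www.w3.org/2005/Atom">\n'
    yield "<title>%s</title>\n" % escape(feed_title)
    for entry in entries:
        yield "<entry><title>%s</title><id>%s</id></entry>\n" % (
            escape(entry["title"]),
            escape(entry["id"]),
        )
    yield "</feed>\n"
```

A server can write each yielded chunk to the client as it is produced
instead of joining them into one large string first.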
Eric Wong [Sat, 22 Aug 2015 00:06:45 +0000 (00:06 +0000)]
stream HTML views as much as possible
This should allow progressive rendering on the client and reduce
memory usage on the server. Unfortunately XML::Atom::SimpleFeed
does not yet support streaming, so we may not use it in the
future.
Eric Wong [Thu, 20 Aug 2015 10:17:34 +0000 (10:17 +0000)]
search: preserve References: order in document data
We need proper ordering of References to thread messages
correctly. We would lose this order if we load the terms
from the database, so set it directly in document data.
Do not bother with a separate In-Reply-To, since Mail::Thread
just merges the IRT into References. This bumps our schema
version once again.
Eric Wong [Thu, 20 Aug 2015 08:54:32 +0000 (08:54 +0000)]
avoid using header_raw for Message-ID retrieval
This is for consistency with ssoma. I doubt it makes
a difference in practice, but in case somebody decides
any of the Message-ID-containing headers should have
strange characters, we'll decode and attempt to thread
them. This isn't an attack vector, just a way to
make messages thread improperly which is pointless...
Eric Wong [Thu, 20 Aug 2015 02:43:20 +0000 (02:43 +0000)]
index: layout fix + title and Atom feed links at top
Add some spacing between topics to improve readability when
scanning or in case a subject gets too long.
The title and Atom feed may not be highly-visible otherwise.
While we're at it, use the proper "Atom feed" terminology since
some folks may not understand just what "atom" means.
Eric Wong [Thu, 20 Aug 2015 02:32:29 +0000 (02:32 +0000)]
search: bump schema version to 5 for subject_path
In "index: simplify main landing page if search-enabled",
subject normalization went a little farther to drop trailing
'.' characters, so we will need to re-index.
Eric Wong [Thu, 20 Aug 2015 02:30:32 +0000 (02:30 +0000)]
view: reduce memory usage when displaying large threads
We want to minimize the time any large objects or strings
are referenced. We can do threading entirely from the
mini_mime-generated messages and lazily load full messages
when rendering the display.
Eric Wong [Thu, 20 Aug 2015 02:30:27 +0000 (02:30 +0000)]
use tables for rendering comment nesting
This is more space efficient since we don't need to place padding
bytes in front of every line. While this unfortunately does not
render well on lynx; w3m, links, elinks can all render tables
sanely.
Tables are also superior for long lines which require wrapping
inside <pre> containers.
Eric Wong [Thu, 20 Aug 2015 02:30:25 +0000 (02:30 +0000)]
feed: remove threading from index
We'll be making the index smarter for people with search
support enabled. Otherwise, it'll be chronological and
a bit dumb. At least it'll use less memory.
Eric Wong [Tue, 18 Aug 2015 01:11:05 +0000 (01:11 +0000)]
search: common Subject: normalization for Re: prefixes
Drop German ("Aw:") support since it's non-standard and not
supported by Mail::Thread. Non-English prefixes are also more
likely to conflict with the ("subsection:") prefixes common in
Free Software development, where English is the common language.
Anyways we don't filter "Vs: " (Finnish) or "Sv: "
(Norwegian, Swedish, Danish, Icelandic), either.
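As an illustration only, stripping "Re:" prefixes could look like this
in Python. The function name and the exact pattern are assumptions, not
the project's actual normalization. Only "Re:" (in any case, possibly
repeated) is removed, so "Aw:" and "subsection:" prefixes survive.

```python
import re

# One or more leading "Re:" prefixes, case-insensitive, with optional spaces
_RE_PREFIX = re.compile(r"^\s*(?:re:\s*)+", re.IGNORECASE)


def subject_normalized(subject):
    """Drop leading "Re:" prefixes and surrounding whitespace."""
    return _RE_PREFIX.sub("", subject).strip()
```

Two replies with subjects "Re: RE: search: fix" and "search: fix" then
normalize to the same string, while "Aw: hallo" is left as it is.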
Eric Wong [Mon, 17 Aug 2015 20:15:31 +0000 (20:15 +0000)]
view: do not recompress already-compressed MID for anchors
This is merely for display, so on the off chance somebody does
send a 40-byte MID with nothing but hexadecimal characters,
the worst that could happen is we repeat an anchor name in the
rendered HTML. This has no impact on git archival or Xapian
indexing.
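A small Python sketch of the "do not recompress" check, assuming the
compressed form is a 40-character lowercase SHA-1 hex digest (an
assumption drawn from the "40-byte ... hexadecimal" wording above, not
confirmed here). The name anchor_for_mid is hypothetical.

```python
import hashlib
import re

# Assumed shape of an already-compressed Message-ID: 40 lowercase hex chars
_HEX40 = re.compile(r"\A[0-9a-f]{40}\Z")


def anchor_for_mid(mid):
    """Return a fixed-length anchor name for `mid`."""
    if _HEX40.match(mid):
        # Looks compressed already; hashing again would change the anchor
        return mid
    return hashlib.sha1(mid.encode("utf-8")).hexdigest()
```

The check makes the function idempotent, at the cost of the collision
described above when a real Message-ID happens to be 40 hex characters.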
Eric Wong [Mon, 17 Aug 2015 16:49:31 +0000 (16:49 +0000)]
search: simplify indexing operation
There's no need to make a transaction for each message when doing
incremental indexing against a git repository. While we're at it,
simplify the interface for callers, too, and do not auto-create
the Xapian database if it was not explicitly enabled.
Eric Wong [Mon, 17 Aug 2015 07:46:54 +0000 (07:46 +0000)]
mid: compress Message-IDs with '%' in them
Some HTTP servers (apache2 2.2.22-13+deb7u5) on my system
apparently do not handle "%25" correctly. I'm not yet sure if
it's something weird with my rewrite rules or what....