Sergey Matveev's repositories - public-inbox.git/log
Eric Wong [Tue, 18 Aug 2015 03:17:17 +0000 (03:17 +0000)]
view: close anchor tag correctly before starting another
Noticed by tidy
Eric Wong [Tue, 18 Aug 2015 03:17:16 +0000 (03:17 +0000)]
public-inbox-index: exit with usage if not given an arg
I often forget how to use this myself :x
Eric Wong [Tue, 18 Aug 2015 02:05:32 +0000 (02:05 +0000)]
thread: another workaround for a Mail::Thread bug
Yay for monkey patching!
ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=795913
ref: https://rt.cpan.org/Ticket/Display.html?id=106498
Eric Wong [Tue, 18 Aug 2015 01:13:03 +0000 (01:13 +0000)]
search: bump SCHEMA_VERSION to 4
The following two commits affect indexing behavior, so
change the schema version to avoid compatibility problems
or missing messages:
search: common Subject: normalization for Re: prefixes
search: avoid creating ghosts for circular References
Eric Wong [Tue, 18 Aug 2015 01:11:06 +0000 (01:11 +0000)]
search: expose $PublicInbox::Search::LANG variable
This makes it easier to reconfigure for non-English users
Eric Wong [Tue, 18 Aug 2015 01:11:05 +0000 (01:11 +0000)]
search: common Subject: normalization for Re: prefixes
Drop German ("Aw:") support: it is non-standard and not supported
by Mail::Thread. Non-English prefixes are also more likely to
conflict with the "subsection:" prefixes common in Free Software
development, where English is the common language.
Anyway, we don't filter "Vs: " (Finnish) or "Sv: "
(Norwegian, Swedish, Danish, Icelandic), either.
ref:
https://en.wikipedia.org/wiki/RE_(e-mail)#Abbreviations_in_other_languages
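A minimal sketch of the prefix stripping described above, written in Python rather than the project's Perl. The pattern and the normalize_subject name are hypothetical; it only removes repeated, case-insensitive "Re:" prefixes and deliberately leaves localized prefixes such as "Aw:" or "Sv:" alone.

```python
import re

# Hypothetical pattern: one or more leading "Re:" prefixes (any case),
# each optionally followed by whitespace. "Aw:", "Vs:", "Sv:" are not matched.
_REPLY_PREFIX = re.compile(r"^(?:re:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip surrounding whitespace and any leading Re: prefixes."""
    return _REPLY_PREFIX.sub("", subject.strip()).strip()
```

For example, normalize_subject("Re: RE: re:  [PATCH] fix foo") yields "[PATCH] fix foo", while "Aw: hallo" comes back unchanged.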
Eric Wong [Tue, 18 Aug 2015 01:11:04 +0000 (01:11 +0000)]
search: avoid creating ghosts for circular References
Some mail software incorrectly creates circular references
and causes us to create ghosts before the actual mail doc
is created.
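A sketch, assuming ghosts are placeholder documents created for referenced Message-IDs that are not yet indexed: skipping the message's own Message-ID breaks the circular case. The refs_needing_ghosts helper and its arguments are hypothetical, not public-inbox's actual API.

```python
from typing import Iterable, List, Set


def refs_needing_ghosts(mid: str, references: Iterable[str],
                        known: Set[str]) -> List[str]:
    """Return referenced Message-IDs that still need ghost placeholders.

    A message listing its own Message-ID in References would otherwise
    get a ghost for itself before its real document exists.
    """
    ghosts = []
    for ref in references:
        if ref == mid:
            continue  # circular reference to ourselves: never ghost it
        if ref in known:
            continue  # already indexed as a real message or a ghost
        ghosts.append(ref)
    return ghosts
```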
Eric Wong [Tue, 18 Aug 2015 01:08:28 +0000 (01:08 +0000)]
view: cleaner Message-ID filtering for References
Avoid compiling a weird and potentially fragile regexp every
time and use the same logic as our search module to dedupe
References.
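One way to dedupe References while preserving their order, with the pattern compiled once at module load instead of on every call. This is a Python sketch; dedupe_references and the angle-bracket pattern are assumptions, not the search module's actual code.

```python
import re
from typing import List

# Compiled once at import time rather than on every call.
_MSGID = re.compile(r"<([^<>\s]+)>")


def dedupe_references(header: str) -> List[str]:
    """Extract Message-IDs from a References header, dropping repeats."""
    seen = set()
    out = []
    for mid in _MSGID.findall(header):
        if mid not in seen:
            seen.add(mid)
            out.append(mid)
    return out
```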
Eric Wong [Mon, 17 Aug 2015 20:15:31 +0000 (20:15 +0000)]
view: do not recompress already-compressed MID for anchors
This is merely for display, so on the off chance somebody does
send a 40-byte MID with nothing but hexadecimal characters,
the worst that could happen is we repeat an anchor name in the
rendered HTML. This has no impact on git archival or Xapian
indexing.
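A sketch of the anchor logic, assuming "compressed" means the 40-character hexadecimal SHA-1 of the Message-ID; anchor_id is a hypothetical name. A Message-ID that already looks like a SHA-1 hex digest is used as-is, which is exactly the harmless collision case described above.

```python
import hashlib
import re

# Hypothetical check: 40 hex characters is the shape of a SHA-1 hexdigest.
_SHA1_HEX = re.compile(r"\A[0-9a-fA-F]{40}\Z")


def anchor_id(mid: str) -> str:
    """Return an anchor id for a Message-ID, hashing it unless already hashed."""
    if _SHA1_HEX.match(mid):
        return mid  # looks already compressed; do not hash it again
    return hashlib.sha1(mid.encode("utf-8")).hexdigest()
```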
Eric Wong [Mon, 17 Aug 2015 16:49:31 +0000 (16:49 +0000)]
search: simplify indexing operation
There's no need to make a transaction for each message when doing
incremental indexing against a git repository. While we're at it,
simplify the interface for callers, too, and do not auto-create
the Xapian database if it was not explicitly enabled.
Eric Wong [Mon, 17 Aug 2015 08:19:44 +0000 (08:19 +0000)]
public-inbox-{learn,mda}: automatically sync index
We'll ignore errors, for now, but should eventually warn or
log. And yes, this is a dirty, dirty hack but we'll fix this
ASAP tomorrow.
Eric Wong [Mon, 17 Aug 2015 08:05:03 +0000 (08:05 +0000)]
view: always compress Message-IDs for anchors
Valid URLs do not make valid anchor ids.
Eric Wong [Mon, 17 Aug 2015 07:56:39 +0000 (07:56 +0000)]
search: bump schema version for '%' compression change
Commit 0fea7793b22efd2596983283947ee43687e0cfac
("mid: compress Message-IDs with '%' in them") requires
re-indexing of repositories with '%' in Message-IDs :<
Eric Wong [Mon, 17 Aug 2015 07:46:54 +0000 (07:46 +0000)]
mid: compress Message-IDs with '%' in them
Some HTTP servers (apache2 2.2.22-13+deb7u5) on my system
apparently do not handle "%25" correctly. I'm not yet sure if
it's something weird with my rewrite rules or what....
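A hypothetical mid_compress sketch combining this rule with the SHA-1 fallback for very long Message-IDs. The 40-character threshold (MID_MAX) and the force flag are assumptions for illustration, not confirmed project values.

```python
import hashlib

MID_MAX = 40  # assumed length threshold, for illustration only


def mid_compress(mid: str, force: bool = False) -> str:
    """Replace long or '%'-containing Message-IDs with their SHA-1 hex."""
    if force or len(mid) > MID_MAX or "%" in mid:
        return hashlib.sha1(mid.encode("utf-8")).hexdigest()
    return mid
```

Hashing sidesteps the "%25" handling problem entirely, since a hex digest never contains '%'.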
Eric Wong [Mon, 17 Aug 2015 03:20:44 +0000 (03:20 +0000)]
search: apply mid_compression to subject paths, too
Otherwise we'll be wasting space in our index for long
subjects.
Eric Wong [Mon, 17 Aug 2015 02:41:18 +0000 (02:41 +0000)]
drop bodies and messages ASAP after processing
We can rely on reference counting to lower memory usage for
big messages.
Eric Wong [Mon, 17 Aug 2015 02:41:16 +0000 (02:41 +0000)]
feed: disable the generator statement
No need to waste bandwidth, here
Eric Wong [Mon, 17 Aug 2015 02:41:14 +0000 (02:41 +0000)]
search: use raw headers without MIME decoding
This should be less error-prone in case somebody tries to screw with
us and our thread_id mechanism or somehow waste our resources.
Unfortunately Mail::Thread isn't smart enough for this, yet, so we
may need to downgrade to Email::Simple objects as a workaround.
Or simply not worry about the display so much if somebody is
intentionally trying to make it thread badly/incorrectly.
Eric Wong [Mon, 17 Aug 2015 02:41:13 +0000 (02:41 +0000)]
terminology: replies => followups
Replies are only direct replies, but followups could be any message
further down the thread. The latter is more useful.
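To make the distinction concrete: replies are the direct children of a message, while followups are all of its descendants. A Python sketch over a hypothetical children mapping (Message-ID to an ordered list of Message-IDs):

```python
from typing import Dict, List

Children = Dict[str, List[str]]  # Message-ID -> direct replies, in order


def replies(children: Children, mid: str) -> List[str]:
    """Direct replies only."""
    return list(children.get(mid, []))


def followups(children: Children, mid: str) -> List[str]:
    """Every message further down the thread, in depth-first order."""
    out: List[str] = []
    stack = list(reversed(children.get(mid, [])))
    while stack:
        cur = stack.pop()
        out.append(cur)
        stack.extend(reversed(children.get(cur, [])))
    return out
```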
Eric Wong [Mon, 17 Aug 2015 02:41:12 +0000 (02:41 +0000)]
www: simplify parameter passing to feed
No need to create a new hash when we can reuse the existing one.
Eric Wong [Mon, 17 Aug 2015 02:41:11 +0000 (02:41 +0000)]
WWW: eliminate "top" parameter for feeds
This parameter hasn't been used since commit
5adf8d639e9b5abd4cbac975d70ddc0fb76541fc
("feed: dead code elimination around dropped endpoints").
Eric Wong [Mon, 17 Aug 2015 02:41:10 +0000 (02:41 +0000)]
favor /t/ to /s/, since subjects may change mid-thread
/t/ always falls back to subject path searching anyways,
so there's little lost besides perhaps more readable URLs.
Unfortunately people still use non-compliant mail clients which fail
to set In-Reply-To or References headers :<
Eric Wong [Mon, 17 Aug 2015 02:41:09 +0000 (02:41 +0000)]
feed: remove unnecessary time parameter in index state
We no longer do "smart" time displays as of commit
ea0e8649f90d1fd0850a41c0ca16642faadf4f14
("view: simplify timestamp generation").
In retrospect, that commit also made us more cache-friendly.
Eric Wong [Mon, 17 Aug 2015 02:41:06 +0000 (02:41 +0000)]
skip search test if search support is missing
We will not require Search::Xapian to be installed.
Eric Wong [Mon, 17 Aug 2015 03:11:43 +0000 (03:11 +0000)]
Merge remote-tracking branch 'origin/search'
* origin/search:
view: deduplicate common code for loading search results
SearchMsg: ensure metadata for ghost messages mid
implement /s/$SUBJECT_PATH.html lookups
search: remove unnecessary xpfx export
www: /t/$MESSAGE_ID.html for threads
view: hoist out index_walk function
view: reply threading adjustment
thread: common sorting code
view: display replies in per-message view
search: make search results more OO
extract redundant Message-ID handling code
search: implement index_sync to fixup indexer
initial search backend implementation
Eric Wong [Sun, 16 Aug 2015 20:51:05 +0000 (20:51 +0000)]
view: kill leading empty lines correctly
Was too sleepy to be coding last night :x
Eric Wong [Sun, 16 Aug 2015 09:12:24 +0000 (09:12 +0000)]
view: cleaner killing of leading/trailing whitespace
No point in wasting bytes: even if it gets compressed over
the wire, it'll use more memory when rendering on the
client.
Eric Wong [Sun, 16 Aug 2015 01:42:13 +0000 (01:42 +0000)]
view: hoist out index_walk function
We will reuse it for thread views when powered by Xapian.
Eric Wong [Sun, 16 Aug 2015 08:53:41 +0000 (08:53 +0000)]
view: deduplicate common code for loading search results
More to come later.
Eric Wong [Sun, 16 Aug 2015 08:32:18 +0000 (08:32 +0000)]
SearchMsg: ensure metadata for ghost messages mid
Ghosts have no document data in them.
Perhaps we should just rely on terms for Message-ID
and avoid storing that in the document data...
Eric Wong [Sun, 16 Aug 2015 08:14:40 +0000 (08:14 +0000)]
implement /s/$SUBJECT_PATH.html lookups
Quick-and-dirty wiring up of Subject: paths.
This may prove more memorable and easier to share than
/t/$MESSAGE_ID.html links, but less strict.
This changes our schema version to 1, since we now
use lower-case subject paths.
Eric Wong [Sun, 16 Aug 2015 07:25:11 +0000 (07:25 +0000)]
search: remove unnecessary xpfx export
SearchMsg calls it with the full module path anyways.
Eric Wong [Sun, 16 Aug 2015 02:17:14 +0000 (02:17 +0000)]
www: /t/$MESSAGE_ID.html for threads
This should bring up nearly the entire thread a given
Message-ID is linked to.
Eric Wong [Sun, 16 Aug 2015 01:42:13 +0000 (01:42 +0000)]
view: hoist out index_walk function
We will reuse it for thread views when powered by Xapian.
Eric Wong [Sat, 15 Aug 2015 23:57:39 +0000 (23:57 +0000)]
view: reply threading adjustment
Give changes in subject their own line to reduce line wrapping,
but avoid showing any redundant subjects by maintaining a hash
of subjects already displayed.
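A sketch of the "subjects already displayed" hash, assuming subjects arrive in display order; subjects_to_show is a hypothetical helper that returns None wherever a subject would be redundant.

```python
from typing import Iterable, List, Optional


def subjects_to_show(subjects: Iterable[str]) -> List[Optional[str]]:
    """Keep each subject the first time it appears; blank out repeats."""
    seen = set()
    out: List[Optional[str]] = []
    for subj in subjects:
        if subj in seen:
            out.append(None)  # redundant: already displayed earlier
        else:
            seen.add(subj)
            out.append(subj)
    return out
```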
Eric Wong [Sat, 15 Aug 2015 23:41:21 +0000 (23:41 +0000)]
thread: common sorting code
We'll be sharing the same threading, so it makes sense to sort
replies using the same code and message headers without repeating
ourselves.
This also standardizes on sorting by X-PI-TS (Unix epoch in seconds)
instead of using X-PI-Date inconsistently in two different places.
Eric Wong [Sat, 15 Aug 2015 09:28:34 +0000 (09:28 +0000)]
view: display replies in per-message view
This can be used to quickly scan for replies in a message without
displaying an entire thread.
Eric Wong [Sat, 15 Aug 2015 09:28:33 +0000 (09:28 +0000)]
search: make search results more OO
This will relieve callers of the need to decode the data
we store internally in Xapian
Eric Wong [Sat, 15 Aug 2015 09:28:32 +0000 (09:28 +0000)]
extract redundant Message-ID handling code
Quit repeating ourselves and use a common MID module
instead.
Eric Wong [Sat, 15 Aug 2015 09:28:31 +0000 (09:28 +0000)]
search: implement index_sync to fixup indexer
We need to make the indexer executable and installable
while we're at it.
Eric Wong [Thu, 13 Aug 2015 02:32:22 +0000 (02:32 +0000)]
initial search backend implementation
This shall allow us to search for replies/threads more easily.
Eric Wong [Wed, 12 Aug 2015 22:41:10 +0000 (22:41 +0000)]
view: consistent ordering of Cc: addresses
This fixes a minor test failure in t/cgi.t
Tested with perl 5.18.2-2ubuntu1 on Ubuntu 14.04.3 LTS
Eric Wong [Wed, 5 Aug 2015 23:36:42 +0000 (23:36 +0000)]
view: remove unused $enc_mime Encoding object
Unneeded since commit e022d3377fd2c50fd9931bf96394728958a90bf3
("huge refactor of encoding handling").
Eric Wong [Wed, 5 Aug 2015 23:29:34 +0000 (23:29 +0000)]
view: pass fallback encoding to enc_for
This fixes the fallback to message encoding if the message
itself was not UTF-8
Eric Wong [Sun, 2 Aug 2015 06:35:57 +0000 (06:35 +0000)]
public-inbox-learn: preserve headers for ham injection
We must inject headers properly for injecting ham, otherwise
List-Id headers get dropped.
Eric Wong [Wed, 29 Jul 2015 18:09:41 +0000 (18:09 +0000)]
view: simplify timestamp generation
It seems less ambiguous to parse a consistent timestamp format in
quiet lists where messages are sparse.
Eric Wong [Mon, 20 Jul 2015 21:53:14 +0000 (21:53 +0000)]
feed: extract subroutines for threading
We'll be using this in the future for displaying per-thread
views.
Eric Wong [Tue, 14 Jul 2015 21:09:50 +0000 (21:09 +0000)]
scripts/dc-dlvr.pre: ensure stderr gets back to the MTA
We want to be able to reject errors back to the MTA.
Eric Wong [Tue, 14 Jul 2015 21:01:18 +0000 (21:01 +0000)]
reject HTML loudly and automatically
This should hopefully reduce the delay between when a user fails
to send plain-text to when an admin such as myself notices the
HTML mail in a sea of spam.
Unfortunately, this can lead to backscatter, so avoid doing it
until it's passed through spamc, at least.
Eric Wong [Mon, 6 Jul 2015 21:22:22 +0000 (21:22 +0000)]
feed: compile regexps only once
This avoids some runtime penalties associated with recompiling
a regexp based on a constant local variable.
Eric Wong [Mon, 6 Jul 2015 20:52:33 +0000 (20:52 +0000)]
view: reduce empty <a>, use "id" instead of "name" attributes
This is probably more compliant, and saves us a few bytes
on the uncompressed HTML.
Eric Wong [Mon, 6 Jul 2015 20:11:29 +0000 (20:11 +0000)]
feed: close body tag correctly in index
Oops, noticed by manual inspection. One day we'll run tidy in tests
to validate.
Eric Wong [Fri, 5 Jun 2015 17:45:26 +0000 (17:45 +0000)]
public-inbox-mda: preserve SpamAssassin headers in spam
We want to be able to prioritize spam downstream to check for
borderline cases.
Eric Wong [Wed, 4 Mar 2015 20:50:34 +0000 (20:50 +0000)]
view: fix linkification and quote-folding conflicts
We can't add newlines to links, unfortunately, because
quote-folding is line-based and (being regexp-based) needs
to happen after linkification.
Eric Wong [Mon, 9 Feb 2015 22:33:50 +0000 (22:33 +0000)]
view: generate links for common protocols in browsers
SpamAssassin queries URI blacklists, so it's probably OK
to enable this without being used as a linkfarm.
Eric Wong [Mon, 29 Dec 2014 19:25:50 +0000 (19:25 +0000)]
doc/design_www: remove item for auto-generated links
SpamAssassin queries URI blacklists, so it's probably OK
to start generating links in the future...
Eric Wong [Mon, 12 Jan 2015 01:16:04 +0000 (01:16 +0000)]
import_slrnspool: fork a process for each message
This prevents process growth when importing large messages.
Memory growth could be due to the sliding sbrk window in glibc malloc
or a circular reference in the Email::* Perl code somewhere.
Eric Wong [Sun, 11 Jan 2015 23:58:55 +0000 (23:58 +0000)]
import_slrnspool: load private config key
PublicInbox::Config->lookup won't return unknown keys
Eric Wong [Sun, 11 Jan 2015 23:55:27 +0000 (23:55 +0000)]
import_slrnspool: graceful exit for interruptibility
This should alleviate fears of interrupting the process.
Eric Wong [Sun, 11 Jan 2015 23:13:08 +0000 (23:13 +0000)]
import_slrnspool: make filtering optional
Eric Wong [Sun, 11 Jan 2015 11:03:41 +0000 (11:03 +0000)]
import_slrnspool: use ssoma-mda instead
Some mailing lists (e.g. git@vger.kernel.org) accept messages
via Bcc: and possibly other things which get rejected by
the strict PublicInbox::Filter rules. So rely on ssoma-mda
instead.
This prefers a recent revision of ssoma-mda (commit 7fce38e9
onwards) in order to display subject/author/date information in the
commit message.
Eric Wong [Sun, 11 Jan 2015 10:56:02 +0000 (10:56 +0000)]
.gitignore: relax MYMETA pattern
Newer systems may use .json instead of .yml
Eric Wong [Sun, 11 Jan 2015 08:54:25 +0000 (08:54 +0000)]
filter: handle missing Content-Type
Some mailers may omit the Content-Type header entirely,
so do detection and try to get the message through.
Eric Wong [Sun, 11 Jan 2015 04:46:27 +0000 (04:46 +0000)]
*slrnspool* old gmane archives set Original-To
Apparently it's not a problem with recent archives.
Eric Wong [Sun, 11 Jan 2015 04:43:59 +0000 (04:43 +0000)]
import_slrnspool: fix off-by-one error
We start with zero and only store the next valid ID.
Eric Wong [Sun, 11 Jan 2015 04:28:46 +0000 (04:28 +0000)]
scripts/import_slrnspool: new incremental importer
This allows incremental imports of slrn spools, ideal for
tracking lists via gmane.
Eric Wong [Mon, 22 Dec 2014 01:38:24 +0000 (01:38 +0000)]
doc: generate README.html instead of index.html
This allows us to generate links without caring about discoverability
and remains reasonably WYSIWYG for folks editing our documentation in
their favorite $EDITOR
Eric Wong [Mon, 22 Dec 2014 01:37:58 +0000 (01:37 +0000)]
Documentation/txt2pre: support #fragments and ftp://
Occasionally we'll use these for links.
Eric Wong [Thu, 13 Nov 2014 21:51:42 +0000 (21:51 +0000)]
-learn: nuke HTML portions when training as ham
Sometimes people send HTML email and I forget to fix it up in my
MUA during moderation. Automatically strip out HTML portions
instead.
Eric Wong [Thu, 13 Nov 2014 21:20:29 +0000 (21:20 +0000)]
view: account for filter bugs which leak HTML into the repo
Ugh, apparently there's a (yet-to-be-fixed) bug in the Filter
code which caused an HTML message portion of a multipart message
to be displayed on the web UI. Account for that and nuke it.
Eric Wong [Sun, 2 Nov 2014 23:26:33 +0000 (23:26 +0000)]
view: add time to entries older than one day
This is occasionally useful and we're not as starved for screen
space now that sender+timestamps are on a separate line.
Eric Wong [Sun, 2 Nov 2014 01:44:30 +0000 (01:44 +0000)]
view: rename "permalink" to "threadlink"
These may not be permanent, after all.
Better threading support can be done for message views,
and the current index layout is still too busy and suboptimal.
Eric Wong [Sun, 26 Oct 2014 22:19:33 +0000 (22:19 +0000)]
examples/public-inbox.psgi: add usage to comments
I often forget how to run this
Eric Wong [Sun, 26 Oct 2014 09:45:52 +0000 (09:45 +0000)]
view: show raw message link as "raw"
"original" is a bit misleading, since we strip needless junk
like HTML from messages before it ever hits git.
Eric Wong [Sun, 5 Oct 2014 23:38:43 +0000 (23:38 +0000)]
TODO: add mlmmj integration item
Because some folks will want to receive email.
Eric Wong [Sun, 5 Oct 2014 22:49:06 +0000 (22:49 +0000)]
doc: add TODO to the website
Eric Wong [Sun, 5 Oct 2014 22:26:04 +0000 (22:26 +0000)]
TODO: add a placeholder for search
While we're at it, fix up a typo.
Eric Wong [Sun, 5 Oct 2014 00:23:43 +0000 (00:23 +0000)]
view: tweak attribution line
This reduces unnecessary white space and consistently places
the attribution under the Subject.
Eric Wong [Sat, 4 Oct 2014 02:51:29 +0000 (02:51 +0000)]
view: make the thread index less claustrophobic
At the cost of some vertical whitespace.
More bikeshedding...
Eric Wong [Sat, 4 Oct 2014 02:33:57 +0000 (02:33 +0000)]
public-inbox-init: fix multi-address setup
We must support multi-address mailing lists, so we do not
clobber existing addresses. However, we need to ensure
idempotency and ensure existing addresses are not reset.
Furthermore, we were not parsing the existing config correctly
due to a leaking $/.
Eric Wong [Mon, 22 Sep 2014 18:58:57 +0000 (18:58 +0000)]
view: relax line break detection
Often times any succession of "---" denotes the rest of the
message is too long to review at once.
Eric Wong [Sun, 21 Sep 2014 04:19:30 +0000 (04:19 +0000)]
public-inbox-init: manages the config files
This hopefully allows easier setup.
Eric Wong [Mon, 15 Sep 2014 20:58:41 +0000 (20:58 +0000)]
filter: ensure CRs do not show up in lynx conversions
Unix line endings are LF-only, so do not introduce or preserve
CRLF line endings when reading from lynx.
Eric Wong [Mon, 15 Sep 2014 20:47:33 +0000 (20:47 +0000)]
index: drop signatures from nested output
We have a less-ambiguous "more..." link nowadays if somebody
wants to see the full message.
Eric Wong [Mon, 15 Sep 2014 20:47:02 +0000 (20:47 +0000)]
hval: fixup bad line endings in HTML output
We should do this in filter, too, but sometimes we
prefer to avoid filtering the message at all.
Eric Wong [Mon, 15 Sep 2014 04:16:45 +0000 (04:16 +0000)]
index: add prev/next index navigation
This helps readers jump around more quickly when there are
large messages.
Eric Wong [Mon, 15 Sep 2014 03:17:29 +0000 (03:17 +0000)]
index group state parameters together for generation
This allows us to more-easily group and pass parameters.
Eric Wong [Mon, 15 Sep 2014 03:00:49 +0000 (03:00 +0000)]
view: standalone view shows link to index
Sometimes people come from sharing links and not the index,
so the back button in their browser does not really go back.
Eric Wong [Mon, 15 Sep 2014 02:49:18 +0000 (02:49 +0000)]
view: kill unnecessary assignment
Not sure what I was thinking
Eric Wong [Mon, 15 Sep 2014 02:34:33 +0000 (02:34 +0000)]
view: support SHA-1 of Message-IDs for message links
Some Message-IDs are crazy long, so support SHA-1s for them
instead. This allows shorter URLs to be generated and are
less likely
However, we'll still favor short Message-IDs whenever possible.
Eric Wong [Sun, 14 Sep 2014 23:06:11 +0000 (23:06 +0000)]
index: show short-ish permalinks to messages in threads
This should allow us to reference discussions more easily.
Eric Wong [Sat, 13 Sep 2014 21:50:31 +0000 (21:50 +0000)]
line-wrap generated HTML source around attrs for readability
It's important to keep HTML source readable to folks who prefer
to read raw HTML. This should improve readability of the HTML
source by keeping line length in check without wasting bytes.
Eric Wong [Sun, 7 Sep 2014 22:56:09 +0000 (22:56 +0000)]
feed: (cleanup) avoid redundant ->message call
Avoid redundant subroutine calls as their costs tend to add up.
Eric Wong [Sun, 7 Sep 2014 22:53:15 +0000 (22:53 +0000)]
feed: sort child messages (forward) chronologically
Only root messages should be sorted in reverse chronological order,
child messages should be chronological. This gives precedence to
_topics_, but also to initial replies.
This improves readability when several messages appear at the same
nesting level, and helps git patch series appear correctly as:
[PATCH 0/3] ...
[PATCH 1/3] ...
[PATCH 2/3] ...
[PATCH 3/3] ...
Instead of (what it was previously):
[PATCH 0/3] ...
[PATCH 3/3] ...
[PATCH 2/3] ...
[PATCH 1/3] ...
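A sketch of the ordering rule above: roots newest-first, children oldest-first at every depth. Node, sort_thread and render are hypothetical names; ts stands in for a Unix-epoch timestamp such as X-PI-TS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    subject: str
    ts: int  # Unix epoch seconds
    children: List["Node"] = field(default_factory=list)


def sort_thread(roots: List[Node]) -> List[Node]:
    """Sort roots newest-first and every child list oldest-first."""
    roots.sort(key=lambda n: n.ts, reverse=True)
    stack = list(roots)
    while stack:
        node = stack.pop()
        node.children.sort(key=lambda n: n.ts)
        stack.extend(node.children)
    return roots


def render(nodes: List[Node], depth: int = 0) -> List[str]:
    """Flatten the tree into indented subject lines."""
    lines = []
    for n in nodes:
        lines.append("  " * depth + n.subject)
        lines.extend(render(n.children, depth + 1))
    return lines
```

With a cover letter at ts=100 whose three patches were attached in arbitrary order, render(sort_thread(...)) lists [PATCH 0/3] followed by 1/3, 2/3 and 3/3, with older topics after it.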
Eric Wong [Sun, 7 Sep 2014 07:56:42 +0000 (07:56 +0000)]
view: fixup dropped newline in the last commit
Oops!
Eric Wong [Sun, 7 Sep 2014 07:53:47 +0000 (07:53 +0000)]
view: avoid extraneous space for subject-only messages
Sometimes, the subject says it all.
Eric Wong [Sat, 6 Sep 2014 23:31:40 +0000 (23:31 +0000)]
view: reduce redundant linkage in index
There's no point in having a "(more...)" and "link" pointing
to the same element; replace "link" with "more..." if we've
omitted text from the index.
Eric Wong [Wed, 3 Sep 2014 20:43:33 +0000 (20:43 +0000)]
view: increase context in index views
It's probably better to show too much than too little, even
if this means extra scrolling :<
Otherwise, we end up punishing messages that quote minimally
and end up losing context. Unfortunately, too many people
over-quote nowadays.
Eric Wong [Sun, 31 Aug 2014 02:58:07 +0000 (02:58 +0000)]
view: kill leading whitespace in index
Leading whitespace is pointless, but some folks end up adding
it for some reason.
Eric Wong [Sun, 31 Aug 2014 02:52:27 +0000 (02:52 +0000)]
view: show quotes in index if parent is too old
It's helpful to show context if a message does not
appear on the current index page.