public-inbox.git
tag: v1.0.0
Eric Wong [Thu, 8 Feb 2018 02:25:19 +0000 (02:25 +0000)]
public-inbox 1.0.0

Might as well, this release is mostly to serve as a checkpoint
for the start of new development on v2 stuff mentioned in the
TODO.

Eric Wong [Thu, 8 Feb 2018 02:24:45 +0000 (02:24 +0000)]
MANIFEST: add AUTHORS file

Eric Wong [Wed, 7 Feb 2018 21:14:03 +0000 (21:14 +0000)]
add AUTHORS file

This can be useful for tarball distributions which lack full git
history.

Eric Wong [Wed, 7 Feb 2018 19:56:45 +0000 (19:56 +0000)]
update copyrights for 2018

Using update-copyrights from gnulib

While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.

Eric Wong [Sat, 3 Feb 2018 02:50:46 +0000 (02:50 +0000)]
view: allow expanding directly to "nested" view

Sometimes, it can be desirable to jump directly to the "nested"
view when viewing a thread skeleton.  This makes it possible.
While we're at it, shorten some of the text to ensure it still
fits in 80 columns.

Eric Wong [Tue, 30 Jan 2018 18:17:47 +0000 (18:17 +0000)]
view: close <pre> in reply instructions

We leave the mailto: link out when obfuscating addresses, so
do not stuff the "</pre>" closing tag into it.  Instead,
keep the closing tag in the same context as the opening one,
making it easier to keep track of.

Eric Wong [Mon, 29 Jan 2018 13:04:09 +0000 (13:04 +0000)]
reply: follow obfuscation rules for HTML in sh args

Namely, we do not want to obfuscate the mail address of the
site itself.

Eric Wong [Mon, 29 Jan 2018 11:49:56 +0000 (11:49 +0000)]
view: adjust wording for reply-to-list configs

This makes the wording less confusing when showing archives
for lists where the convention is reply-to-list.
I still hate reply-to-list, but it's still better than no
archives or list at all.

Eric Wong [Fri, 26 Jan 2018 21:54:00 +0000 (21:54 +0000)]
atom: show metadata before message body

This can allow streaming parsers (SAX) to work a little more
efficiently as they can handle/discard all the metadata before
the big content.

Eric Wong [Thu, 25 Jan 2018 06:30:03 +0000 (06:30 +0000)]
doc/design_www: adjust some wording and URLs around CSS

I still hate that CSS is over-used, but colors are useful
and perhaps using them for highlighting won't be too bad;
but user-supplied colors will ALWAYS be supported.

Eric Wong [Tue, 16 Jan 2018 22:18:16 +0000 (22:18 +0000)]
TODO: notes about v2 format for giant archives

Inspired by interest in LKML archival:

https://public-inbox.org/meta/d5546b24-5840-4ae9-d25b-5e3e737ed73b@linuxfoundation.org

Eric Wong [Tue, 16 Jan 2018 05:08:22 +0000 (05:08 +0000)]
hval: only allow domain obfuscation in address

Obfuscating username portions of the email address leads
to having subsequent parts of the address not being obfuscated;
which could mean we show someone else's email entirely.

In other words, obfuscating the username of "john.doe@example.com"
might mean "doe@example.com" is picked up by scanners.

In other news, email address obfuscation is still a horrible
usability issue and only exists to appease misguided people.
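
A minimal sketch of the domain-only obfuscation described above, written in
Python rather than the project's Perl.  The regular expression, the function
name obfuscate_domain and the bullet replacement character are assumptions
made for illustration; this is not the actual hval code.

```python
import re

# Hypothetical pattern: a local part, "@", then a domain with at least one dot.
ADDR_RE = re.compile(r"([\w.+=-]+)@([\w-]+(?:\.[\w-]+)+)")


def obfuscate_domain(text, repl="\u2022"):
    """Replace the last dot of each address's domain; never touch the local part."""

    def _sub(match):
        local, domain = match.group(1), match.group(2)
        head, _, tail = domain.rpartition(".")
        return f"{local}@{head}{repl}{tail}"

    return ADDR_RE.sub(_sub, text)
```

Because the local part is left alone, no shorter but still deliverable
address such as "doe@example.com" remains in the output.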

Eric Wong [Thu, 21 Dec 2017 00:58:01 +0000 (00:58 +0000)]
view: avoid deduping a single word in subject skeletons

It is usually pointless to replace a single word with a '"' character.

Eric Wong [Fri, 8 Dec 2017 20:54:09 +0000 (20:54 +0000)]
search: force large mbox result downloads to POST

This should prevent crawlers (including most robots.txt ignoring
ones) from burning our CPU time without severely compromising
usability for humans.

Eric Wong [Thu, 7 Dec 2017 20:30:07 +0000 (20:30 +0000)]
searchview: nofollow on mbox downloads

Some search results are gigantic, and search engines are
unlikely to be able to handle gzipped mboxes anyways.

Eric Wong [Wed, 29 Nov 2017 09:33:17 +0000 (09:33 +0000)]
search: allow downloading search results as mbox

Allowing downloading of all search results as a gzipped mboxrd
file can be convenient for some users.

Eric Wong [Wed, 29 Nov 2017 09:29:24 +0000 (09:29 +0000)]
view: avoid warning from negative repeat counts

Perl 5.22 started warning about this.

Eric Wong [Wed, 29 Nov 2017 09:23:38 +0000 (09:23 +0000)]
searchview: s/threaded/nested/

We want to be consistent with the view change in
commit b223e6f49debb99b9132bc85d97a065ebcee00b9

Eric Wong [Thu, 16 Nov 2017 19:23:49 +0000 (19:23 +0000)]
watch: use "spam" in commit message for removals

This makes it easy to identify the reason for message removals.

Eric Wong [Thu, 16 Nov 2017 19:21:06 +0000 (19:21 +0000)]
learn: use "spam" as subject for removal commits (part #2)

We need to use the correct subject when doing global scanning,
too.  In fact, the per-recipient spam training path is entirely
redundant at this point.

Eric Wong [Thu, 16 Nov 2017 18:48:39 +0000 (18:48 +0000)]
learn: use "spam" as subject for removal commits

Sometimes an email is an innocent removal "rm" for a
misdirected, off-topic post, while most removed messages are
"spam".  Allow anybody to look at history and easily distinguish
the reason for removing the message.

Eric Wong [Wed, 18 Oct 2017 21:28:52 +0000 (21:28 +0000)]
view: s/threaded/nested/ in view

We always do threading, so perhaps it's not a good name.
"Nested" is probably more appropriate and closer to what
people are used to seeing.

Eric Wong [Wed, 4 Oct 2017 22:54:23 +0000 (22:54 +0000)]
mbox: support inline filename via Content-Disposition header

This is hopefully more sensible than "raw" files from
resulting downloads.

Eric Wong [Tue, 3 Oct 2017 19:43:30 +0000 (19:43 +0000)]
search: try to fill in ghosts when generating thread skeleton

Since we attempt to fill in threads by Subject, our thread
skeletons can cross actual thread IDs, leading to the
possibility of false ghosts showing up in the skeleton.
Try to fill in the ghosts as well as possible by performing
a message lookup.

Eric Wong [Mon, 2 Oct 2017 22:19:16 +0000 (22:19 +0000)]
threading: deal with improperly-terminated References headers

We should not blindly join References and In-Reply-To headers
as a single string, because some messages can have an open
angle brace '<' in References: without a corresponding '>'.
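
To illustrate the problem described above, here is a small Python sketch
(not the project's Perl).  It extracts Message-IDs from each header value
separately instead of joining them first; the names MID_RE and extract_mids
are hypothetical.

```python
import re

# Hypothetical pattern: a Message-ID is "<...>" with no nested angle
# brackets and no whitespace inside.
MID_RE = re.compile(r"<([^<>\s]+)>")


def extract_mids(*header_values):
    """Collect Message-IDs from each header on its own, preserving order."""
    seen = []
    for value in header_values:
        for mid in MID_RE.findall(value or ""):
            if mid not in seen:
                seen.append(mid)
    return seen


# A References value with an unterminated "<b@x", plus a valid In-Reply-To.
references = "<a@x> <b@x"
in_reply_to = "<c@x>"

# Joining first and matching naively swallows the next header's ID.
naive = re.findall(r"<([^>]+)>", references + " " + in_reply_to)
careful = extract_mids(references, in_reply_to)
```

Here naive contains the bogus ID "b@x <c@x", while careful yields only the
two well-formed IDs.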

Eric Wong [Thu, 13 Jul 2017 22:47:29 +0000 (22:47 +0000)]
www: Atom stream respects timezone

Oops, we must not discard the timezone when parsing dates
for the Atom stream.

Eric Wong [Thu, 13 Jul 2017 22:46:29 +0000 (22:46 +0000)]
MANIFEST: add hosted list

Eric Wong [Sun, 2 Jul 2017 21:43:51 +0000 (21:43 +0000)]
doc: add ruby-dev mirror to the list of hosted mirrors

It seems Xapian is not prepared for Japanese, unfortunately.

https://public-inbox.org/meta/20170702213657.GA5312@dcvr/

Eric Wong [Fri, 30 Jun 2017 18:47:33 +0000 (18:47 +0000)]
doc: add a list of hosted archives for external projects

This will hopefully increase visibility of some archives.

Eric Wong [Thu, 29 Jun 2017 21:48:30 +0000 (21:48 +0000)]
view: cull redundant phrases in subjects

There is no need to show the same phrases over and over again
in thread skeletons; it adds to visual noise and makes things
more difficult to read.

Eric Wong [Thu, 29 Jun 2017 07:14:28 +0000 (07:14 +0000)]
scripts/import_maildir: rewrite to use Import

This will be much faster than invoking -mda for every message.

Eric Wong [Thu, 29 Jun 2017 00:10:38 +0000 (00:10 +0000)]
hval: only perform one substitution when obfuscating

Only one substitution character is necessary when obfuscating
email addresses.

Eric Wong [Mon, 26 Jun 2017 18:13:57 +0000 (18:13 +0000)]
msgmap: reduce constant usage

It is needless bloat and doesn't seem to help with readability,
in retrospect, either.

Eric Wong [Mon, 26 Jun 2017 17:44:41 +0000 (17:44 +0000)]
watch: avoid potential race condition while quitting

We must not trigger future activity when initializing
a -watch shutdown.

Eric Wong [Mon, 26 Jun 2017 04:34:13 +0000 (04:34 +0000)]
tests: deal with the removal of '.' from @INC in newer Perl

Oops, this is needed for Perl 5.22 (tested 5.24.1) since '.'
was removed due to security problems.  Fwiw, I consider this
change to Perl an overreaction and do not agree with it.

Eric Wong [Mon, 26 Jun 2017 02:56:03 +0000 (02:56 +0000)]
watch: commit changes to fast-import sooner

We should make changes visible sooner, even during
lengthy scans.

Eric Wong [Sat, 24 Jun 2017 22:26:28 +0000 (22:26 +0000)]
watch: use "self-inotify-tempfile trick" for quit

This should be more reliable and safer as it'll ensure
existing fast-import instances are shut down properly.

Eric Wong [Sat, 24 Jun 2017 07:33:44 +0000 (07:33 +0000)]
watch: improve fairness during full rescans

We need to ensure new messages are being processed
fairly during full rescans, so have the ->scan subroutine
yield and reschedule itself.  Additionally, having a
long-running task inside the signal handler is dangerous
and subject to reentrancy bugs.

Due to the limitations of the Filesys::Notify::Simple interface,
we cannot rely on multiplexing I/O interfaces (select, IO::Poll,
Danga::Socket, etc...) for this.  Forking a separate process
was considered, but it is more expensive for a mostly-idle
process.

So, we use a variant of the "self-pipe trick" via inotify (or
whatever Filesys::Notify::Simple gives us).  Instead of writing
to our own pipe, we write to a file in our own temporary
directory watched by Filesys::Notify::Simple to trigger events
in signal handlers.

Eric Wong [Sat, 24 Jun 2017 00:52:10 +0000 (00:52 +0000)]
spamc: retry on EINTR

Signals can fire on us at any time if we're using blocking sysread.

Eric Wong [Sat, 24 Jun 2017 00:00:04 +0000 (00:00 +0000)]
watch: ensure HUP causes the scanner to be reloaded

Otherwise the old watcher may run indefinitely.

Eric Wong [Mon, 26 Jun 2017 03:05:39 +0000 (03:05 +0000)]
mda: set List-ID correctly according to RFC2919

Oops, due to an old mistake, List-ID was set incorrectly
in the MDA.  This could cause some breakage w.r.t. mail filters.

Eric Wong [Fri, 23 Jun 2017 22:42:34 +0000 (22:42 +0000)]
linkify: handle URLs in parenthesized statements

Sometimes, URLs exist at the end of parenthesized statements,
and we shouldn't unnecessarily capture that.

(example: https://public-inbox.org/ruby-core/20170623032722.GA8124@dcvr/)
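
One way to avoid capturing the closing parenthesis is sketched below in
Python (not the project's Perl).  The pattern URL_RE, the punctuation set
and the function name find_urls are assumptions for illustration, and the
balancing rule is a heuristic rather than the linkify implementation.

```python
import re

# Hypothetical, deliberately loose URL pattern.
URL_RE = re.compile(r"https?://[^\s<>\"]+")
TRAILING_PUNCT = ".,;:!?"


def find_urls(text):
    """Return URLs in text, dropping trailing punctuation and an
    unbalanced closing parenthesis that belongs to the surrounding prose."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        while url:
            last = url[-1]
            if last in TRAILING_PUNCT:
                url = url[:-1]
            elif last == ")" and url.count("(") < url.count(")"):
                url = url[:-1]
            else:
                break
        urls.append(url)
    return urls
```

A URL whose own parentheses are balanced keeps its closing parenthesis;
only the surplus one is dropped.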

Eric Wong [Fri, 23 Jun 2017 20:23:07 +0000 (20:23 +0000)]
allow admins to configure non-obfuscated addresses/domains

We will also treat all known list addresses as non-obfuscated.

By setting publicinbox.noObfuscate in ~/.public-inbox/config,
this will allow users to disable address obfuscation on a
per-domain or per-address basis.

Eric Wong [Fri, 23 Jun 2017 19:41:51 +0000 (19:41 +0000)]
config: assume lists have multiple addresses

This should simplify the rest of our code for handling
the do-not-obfuscate list.

Eric Wong [Fri, 23 Jun 2017 09:51:47 +0000 (09:51 +0000)]
view: add newline before mailto: instructions in reply

This is necessary to retain consistent spacing around bullet
points.

Fixes: 666844ae42b5b17f ("reply: handle address obfuscation :<")

Eric Wong [Fri, 23 Jun 2017 03:39:08 +0000 (03:39 +0000)]
mbox: show application/mbox for obfuscated inboxes

Sigh, yet another place to handle obfuscation for misguided
people who expect it.  Maybe this will do something to prevent
spammers from getting addresses, while still allowing the
"curl $URL | git am" use case to work.

Eric Wong [Fri, 23 Jun 2017 02:25:33 +0000 (02:25 +0000)]
reply: handle address obfuscation :<

We can show users a lightly-obfuscated Bourne shell command
for invoking "git send-email" for address obfuscation.  However,
I'm not sure if the mailto: arg will work effectively since
URL encoding is probably too well-known to be effective.

Eric Wong [Fri, 23 Jun 2017 01:43:21 +0000 (01:43 +0000)]
searchidx: fallback to lookup on pre-set article numbers

Yet another hiccup from reusing pre-set article numbers on
various ruby-lang.org mailing lists.  This was causing messages
to not appear to NNTP readers which use XOVER.

Eric Wong [Fri, 23 Jun 2017 01:19:46 +0000 (01:19 +0000)]
msgmap: ignore duplicates instead of dying

This prevents public-inbox-watch from dying when reloading
(and thus rescanning) already-imported directories.

Eric Wong [Fri, 23 Jun 2017 00:47:23 +0000 (00:47 +0000)]
watchmaildir: deal with rejected (100) messages

The RubyLang filter is strict about what messages it rejects, so
the spam learning path will not auto-train or remove messages
missing X-Mail-Count headers.

Eric Wong [Thu, 22 Jun 2017 21:25:39 +0000 (21:25 +0000)]
test for PublicInbox::Filter::RubyLang

This will make it easier to prevent breakage in the future.

Eric Wong [Thu, 22 Jun 2017 21:04:53 +0000 (21:04 +0000)]
filter/rubylang: reuse altid entry from inbox object

This allows users to DRY up their config a bit and avoid
specifying altid twice when reusing the NNTP-centric msgmap
for [ruby-*:\d+] serial numbers.

My current work-in-progress ~/.public-inbox/config entry
for the ruby-core list is:

------8<-------
[publicinbox "ruby-core"]
address = ruby-core@ruby-lang.org
url = //public-inbox.org/ruby-core
mainrepo = /path/to/ruby-core.git
newsgroup = inbox.comp.lang.ruby.core
watchheader = List-Id:<ruby-core.ruby-lang.org>
altid = serial:ruby-core:file=msgmap.sqlite3
watch = maildir:/path/to/Maildir/.INBOX.ruby
filter = PublicInbox::Filter::RubyLang

Eric Wong [Thu, 22 Jun 2017 19:51:23 +0000 (19:51 +0000)]
msgmap: mid_insert ignores duplicates instead of die-ing

This will allow smoother imports as occasional Message-ID
duplicates happen and the best we can do is ignore the
second one.

Eric Wong [Wed, 21 Jun 2017 23:33:49 +0000 (23:33 +0000)]
add filter for RubyLang lists

Unfortunately, it appears we have to reject this and instead add
support for filtering at View time(*), due to DKIM signatures in
messages from ruby-lang.org.

(*) which may not be worth it

Eric Wong [Tue, 20 Jun 2017 22:06:54 +0000 (22:06 +0000)]
import: fix encoding issues from weird "raw" emails

This seems to allow weirdly-encoded "raw" emails in
  blade.nagaokaut.ac.jp/ruby/ruby-core/*
to be handled without difficulties.

Eric Wong [Fri, 16 Jun 2017 02:03:32 +0000 (02:03 +0000)]
view: implement optional address obfuscation

This is lightly-tested and seems to work.  I'm still
hesitant to support this, but the alternative of receiving death
threats for displaying unobfuscated addresses seems to
be not worth it.

Eric Wong [Wed, 14 Jun 2017 00:10:53 +0000 (00:10 +0000)]
reply: support Reply-To

Reply-To is common and probably should've been supported
since day one, but we won't omit other addresses, either.

Eric Wong [Wed, 14 Jun 2017 00:10:52 +0000 (00:10 +0000)]
replyto parameter support

This allows us to support centralized mailing lists (which suck,
but better than no mailing list at all).

Eric Wong [Wed, 14 Jun 2017 00:10:51 +0000 (00:10 +0000)]
view: split out reply logic into its own module

We'll be adding more reply options for centralized mailing
lists.  So split out the logic so it's easy-to-find.
Organizing code is hard :<

Eric Wong [Thu, 15 Jun 2017 23:07:58 +0000 (23:07 +0000)]
searchidx: remove messages correctly from Xapian index

This fixes a bug introduced in
commit 7eeadcb62729b0efbcb53cd9b7b181897c92cf9a
("search: remove unnecessary abstractions and functionality")

Eric Wong [Wed, 14 Jun 2017 00:14:48 +0000 (00:14 +0000)]
search: allow searching within mail diffs

This can be tied into a repository browser to browse
in-flight topics on a mailing list.

Eric Wong [Wed, 14 Jun 2017 00:14:47 +0000 (00:14 +0000)]
searchidx: switch to accounting by message bytes

Xapian memory usage is tied to the size of the indexed
text, so take the raw message size into account when
deciding when to flush Xapian data.

More importantly, we now flush Xapian before we have it
buffer beyond our maximum; and we do it unconditionally
to prevent even high priority processes from OOM-ing.
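
The accounting scheme above can be sketched roughly as follows, in Python
rather than the project's Perl.  The class name BatchIndexer, the commit
callback and the max_bytes limit are hypothetical stand-ins; the real
indexer talks to Xapian, which this sketch does not model.

```python
class BatchIndexer:
    """Buffer raw messages and flush before the byte budget is exceeded."""

    def __init__(self, commit, max_bytes):
        self.commit = commit          # callable receiving a list of messages
        self.max_bytes = max_bytes    # assumed upper bound on buffered bytes
        self.pending = []
        self.pending_bytes = 0

    def add(self, raw_message: bytes):
        size = len(raw_message)
        # Flush first if adding this message would go past the budget,
        # so the buffer never grows beyond max_bytes (unless a single
        # message is itself larger than the budget).
        if self.pending and self.pending_bytes + size > self.max_bytes:
            self.flush()
        self.pending.append(raw_message)
        self.pending_bytes += size

    def flush(self):
        if self.pending:
            self.commit(self.pending)
            self.pending = []
            self.pending_bytes = 0
```

Counting bytes instead of messages keeps memory use roughly bounded even
when a few messages are far larger than the rest.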

Eric Wong [Wed, 14 Jun 2017 00:14:46 +0000 (00:14 +0000)]
search: remove unnecessary abstractions and functionality

This simplifies the code a bit and reduces the translation
overhead for looking directly at data from tools shipped
with Xapian.

While we're at it, fix thread-all.t :)

Eric Wong [Fri, 12 May 2017 18:49:32 +0000 (18:49 +0000)]
filter/subjecttag: account for missing Subject: header

This is a high indicator of spam (but out-of-scope for this
particular module) but sometimes it is not, and people
legitimately forget to set a Subject: header at all.

Eric Wong [Thu, 25 May 2017 02:24:16 +0000 (02:24 +0000)]
import: reset :raw mode for commit title (subject)

This was necessary for the presence of the 0xa0 byte(*)
in the Subject: of the message at:
http://blade.nagaokaut.ac.jp/ruby/ruby-core/3220

(*) That is 0xa0, not 0x0a ("\n"), so I wonder if the
    nibbles got swapped somehow.

Eric Wong [Tue, 23 May 2017 23:07:24 +0000 (23:07 +0000)]
searchview: retry queries if uri_unescape-able

It is possible to have double-escaped queries when copy and
pasting into browsers, so try to help users work around this
common error by automatically retrying after unescaping once.

Of course, we must inform the user when doing this results in
success, in case they really meant to search for a
double-escaped term which resulted in nothing.

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
  https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
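
The retry behaviour described in this commit can be sketched as below, in
Python rather than the project's Perl.  The function name search_with_retry
and the run_search callback are hypothetical; the second return value is
what a caller might use to tell the user that the query was rewritten.

```python
from urllib.parse import unquote


def search_with_retry(query, run_search):
    """Run a search; if it finds nothing and the query looks URI-escaped,
    retry once with the unescaped form.

    Returns (results, retried_query), where retried_query is None unless
    the unescaped retry is what produced the results.
    """
    results = run_search(query)
    if results:
        return results, None
    unescaped = unquote(query)
    if unescaped != query:
        results = run_search(unescaped)
        if results:
            return results, unescaped
    return [], None
```

Only one round of unescaping is attempted, and only after the literal query
has failed, so a search for a genuinely escaped term still runs first.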

Eric Wong [Tue, 23 May 2017 21:53:57 +0000 (21:53 +0000)]
www: do not mangle characters from search queries

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
  https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/

Eric Wong [Tue, 9 May 2017 20:43:33 +0000 (20:43 +0000)]
www: avoid undefined warnings for query string parsing

Sometimes bots generate malformed queries with sequential
"&" and ";" characters.

Eric Wong [Tue, 9 May 2017 06:30:42 +0000 (06:30 +0000)]
watchmaildir: show $@ in warning message

It should be helpful to know what error happened.

Eric Wong [Tue, 9 May 2017 06:30:41 +0000 (06:30 +0000)]
searchidx: use cached local $@ copy

umask should never fail and set $@, but use the cached local
to be more explicit just in case.

Eric Wong [Sun, 7 May 2017 00:46:46 +0000 (00:46 +0000)]
spamassassin: update example ~/.spamassassin/user_prefs file

This is closer to what I run on the public-inbox.org servers.

Eric Wong [Sun, 7 May 2017 10:49:00 +0000 (10:49 +0000)]
searchidx: fix ghost root vivification

Due to the asynchronous nature of SMTP, it is possible for the
root message of a thread (with no References/In-Reply-To)
to arrive last in a series.  We must preserve the thread_id
of the ghost message in this case, as we do when vivifying
non-root ghosts.

Otherwise, this causes threads to be broken when the root
arrives last.

Eric Wong [Tue, 11 Apr 2017 23:39:54 +0000 (23:39 +0000)]
search: fix help message for searching within quotes

I'm not sure if people use either and it's not in mairix
(where we base our abbreviations off of).  Let's go
with the shorter prefix since it's easier-to-type.

Eric Wong [Wed, 5 Apr 2017 01:41:28 +0000 (01:41 +0000)]
learn: scan all inboxes when learning spam

This matches the behavior of the -watch daemon since
6d534038285ddd760709ba76ea007f9108200097
("watch: watchspam affects all configured inboxes")

Eric Wong [Tue, 4 Apr 2017 18:25:47 +0000 (18:25 +0000)]
watchmaildir: do not reject lowercase flags on Maildir files

Dovecot uses 'a'..'z' (lowercase) to designate keywords
in Maildir flags.  This was preventing certain messages
from being marked as spam.

https://wiki2.dovecot.org/MailboxFormat/Maildir
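
A short Python sketch (not the project's Perl) of parsing the flag suffix
of a Maildir filename so that Dovecot's lowercase keyword letters are
accepted.  The patterns and the function name maildir_flags are
assumptions for illustration; the sample filename is made up.

```python
import re

# Maildir filenames end in ":2," followed by flag letters.  Standard
# flags are uppercase; Dovecot adds lowercase 'a'..'z' for keywords.
FLAGS_RE = re.compile(r":2,([A-Za-z]*)\Z")
UPPER_ONLY_RE = re.compile(r":2,([A-Z]*)\Z")  # the overly strict variant


def maildir_flags(filename):
    """Return the flag letters of a Maildir filename, or None if absent."""
    match = FLAGS_RE.search(filename)
    return match.group(1) if match else None


name = "1491330347.M1P2.host:2,Sab"   # hypothetical file with keywords a, b
flags = maildir_flags(name)
rejected = UPPER_ONLY_RE.search(name) is None
```

With the uppercase-only pattern the sample file would be skipped entirely,
which is the kind of miss the commit describes.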

Eric Wong [Fri, 24 Mar 2017 01:41:11 +0000 (01:41 +0000)]
searchview: show full (&x=t) messages in ascending chronological order

When displaying search results with full messages, it makes
more sense to show them in ascending chronological order when
going by date.  Reverse chronological order makes more sense
for search results which only show the subject.

Eric Wong [Fri, 24 Mar 2017 00:15:08 +0000 (00:15 +0000)]
searchview: add "t" id to link to thread overview

At least for the thread view (&x=t); this will make it
easy to link to the overview.

Eric Wong [Wed, 22 Mar 2017 02:14:19 +0000 (02:14 +0000)]
extmsg: use updated mail-archive.com URL

Apparently mid.mail-archive.com does not support HTTPS,
and the HTTP version redirects to the search query, anyways.

Eric Wong [Tue, 14 Mar 2017 21:23:39 +0000 (21:23 +0000)]
view: escape HTML description name

Otherwise funky filenames can cause HTML injection
vulnerabilities (hope you have JavaScript disabled!)

Eric Wong [Tue, 14 Feb 2017 22:45:15 +0000 (22:45 +0000)]
www: do not unescape PATH_INFO twice

PSGI specs already require PATH_INFO to be unescaped;
so our tests were wrong, too.
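
Why a second unescape is harmful can be shown with a small Python sketch
(the project itself is Perl).  The helper names and the sample Message-ID
are hypothetical; the server side simply models a PSGI-style server that
hands the application an already-decoded PATH_INFO.

```python
from urllib.parse import quote, unquote


def client_request_path(mid):
    """What a client sends: the Message-ID, percent-encoded once."""
    return "/" + quote(mid, safe="@")


def server_path_info(request_path):
    """What the server hands to the application: decoded exactly once."""
    return unquote(request_path)


mid = "100%25done@example.com"            # ID containing a literal "%25"
path_info = server_path_info(client_request_path(mid))
once = path_info[1:]                      # correct: matches the original ID
twice = unquote(path_info)[1:]            # wrong: "%25" collapses to "%"
```

Decoding twice silently changes any ID that contains a literal percent
escape, so the lookup would target a different message.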

Eric Wong [Sun, 12 Feb 2017 02:41:22 +0000 (02:41 +0000)]
t/mime: quiet warnings for old versions of Email::Simple

This is fixed in the newest versions of Email::Simple,
but not the version in Debian jessie (2.203)

Eric Wong [Sat, 11 Feb 2017 23:54:48 +0000 (23:54 +0000)]
handle repeated References and In-Reply-To headers

It seems possible for git-send-email(1) to generate
repeated instances of References and In-Reply-To headers,
as evidenced in:

https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw

This causes a mismatch between how our search indexer threads
and how our HTML view handles threading.  In the future, View.pm
will use the smsg-parsed {references} field and avoid redoing
Email::MIME header parsing.

We will still need to figure out a way to deal with messages
with repeated Message-IDs, at some point, too.

Eric Wong [Wed, 8 Feb 2017 21:41:38 +0000 (21:41 +0000)]
config: do not slurp lines into memory

There's no need to hold everything in memory, here,
since apparently "foreach" will read everything at
once in array context

(for some reason, I thought Perl5 was smart enough
 to avoid creating a temporary array, here...)

Eric Wong [Tue, 7 Feb 2017 22:27:52 +0000 (22:27 +0000)]
TODO: several updates

Always plenty to do while working on this...

Eric Wong [Mon, 6 Feb 2017 21:39:45 +0000 (21:39 +0000)]
search: schema version bump for empty References/In-Reply-To

We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.

Eric Wong [Mon, 6 Feb 2017 21:37:26 +0000 (21:37 +0000)]
Revert "searchidx: reindex clobbers old thread IDs"

Oops, that's broken, too.  I guess the only way to reindex
after fixing the thread detection is to start from scratch.

This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.

Eric Wong [Mon, 6 Feb 2017 21:08:13 +0000 (21:08 +0000)]
searchidx: reindex clobbers old thread IDs

We cannot always reuse thread IDs since our threading
logic may change as bugs are fixed.

Eric Wong [Mon, 6 Feb 2017 19:54:25 +0000 (19:54 +0000)]
searchidx: deal with empty In-Reply-To and References headers

In some messages, these headers exist, but have empty values.
Do not let empty values throw off our search indexer to tie
threads together, as it can make non-sensical threads grouped
to a Message-Id of "" (empty string).

See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.

Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
  <https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>

Eric Wong [Mon, 6 Feb 2017 02:38:37 +0000 (02:38 +0000)]
searchview: increase limit for displaying search results

We are in no danger of excessive buffering or OOM-ing,
the main page for every inbox already loads 200 results;
and thread page views even load 1000!  Increase this to
200 for now.

Eric Wong [Mon, 6 Feb 2017 02:07:24 +0000 (02:07 +0000)]
searchview: clarify numeric summary at bottom

Xapian can only give estimated results when a result limit is
given to it, so make clear it is an estimate to avoid showing
non-sensical ranges when no results are returned.

Eric Wong [Thu, 26 Jan 2017 02:09:36 +0000 (02:09 +0000)]
add filter for Subject: tags

Some mailing lists add annoying tags into the Subject line which
discourages readers from doing proper mail organization on the
client side.  They also waste precious screen space and
attention span.

Remove them from our archives to reduce clutter.

Eric Wong [Wed, 25 Jan 2017 21:39:06 +0000 (21:39 +0000)]
watchmaildir: allow arguments for filters

We'll want to allow some degree of configuration for
various mailing lists.

Eric Wong [Wed, 18 Jan 2017 19:13:09 +0000 (19:13 +0000)]
watchmaildir: limit live importer processes

We don't want to be triggering OOM or swapping on weaker
systems when we have dozens of inboxes as potential targets.

Eric Wong [Thu, 19 Jan 2017 00:31:30 +0000 (00:31 +0000)]
learn: implement "rm" only functionality

Do not consider this interface stable, but I just needed a
way to remove mis-imported multipart messages so
public-inbox-watch could pick them up again from my Maildir.

Eric Wong [Wed, 18 Jan 2017 23:50:57 +0000 (23:50 +0000)]
mime: avoid SUPER usage in Email::MIME subclass

We must call Email::Simple methods directly in our monkey patch
for Email::MIME to call the intended method.  Using SUPER in our
subclass would instead hit a different, unintended method in
Email::MIME.

Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>

Eric Wong [Wed, 11 Jan 2017 10:13:00 +0000 (10:13 +0000)]
inbox: reinstate periodic cleanup of Xapian and SQLite objects

We may need to do this even more aggressively, since the
Xapian database does not always give the latest results.
This time, we'll do it without relying on weak references,
and instead check refcounts.

Eric Wong [Tue, 10 Jan 2017 21:40:37 +0000 (21:40 +0000)]
introduce PublicInbox::MIME wrapper class

This should fix problems with multipart messages where
text/plain parts lack a header.

cf. git clone --mirror https://github.com/rjbs/Email-MIME.git
    refs/pull/28/head

In the future, we may still introduce a streaming
interface to reduce memory usage on large emails.

Eric Wong [Sat, 7 Jan 2017 02:10:23 +0000 (02:10 +0000)]
inbox: properly register cleanup timer for git processes

We still need to cleanup git processes occasionally, since
"git cat-file --batch" does not release old packs (and
git processes are fairly expensive).

For SQLite and Xapian file handles, they should be capable
of managing themselves without too much trouble, so let's
try keeping them for the lifetime of a process.

Eric Wong [Sat, 7 Jan 2017 01:44:52 +0000 (01:44 +0000)]
search: remove subject_summary

Apparently it never actually got used, and the world seems
fine without it, so we can drop it.

While we're at it, consider removing our subject_path
usage from existence, too.  We are not using fancy subject-line
based URLs, here.

Eric Wong [Sat, 7 Jan 2017 01:44:51 +0000 (01:44 +0000)]
searchmsg: favor direct hash access over accessor methods

This is faster, smaller, and more straightforward to me with
fewer layers of indirection.