Sergey Matveev's repositories - public-inbox.git/log
Eric Wong [Mon, 28 Apr 2014 10:25:58 +0000 (10:25 +0000)]
mda: support aliased addresses
This mimics functionality found in -learn. Originally the design
allowed for only one address per-list, but when migrating/hijacking
existing mailing lists, having multiple addresses map to the same
inbox is useful.
Eric Wong [Mon, 28 Apr 2014 07:39:41 +0000 (07:39 +0000)]
feed: disable navigation to previous page
This is unfortunately needed for scalability to long histories.
The design of git requires it to traverse full history to walk
forward in time, since commits can only record past history.
Instead, replace "prev" with a "head" link to zip us back to
the most recent page. Users who wish to go backwards can use
browser history, which should always work on our old-fashioned
HTML pages.
Eric Wong [Mon, 28 Apr 2014 03:27:21 +0000 (03:27 +0000)]
feed: pedantically quote HTML attributes
This is more correct, although it costs us 4 bytes.
Eric Wong [Mon, 28 Apr 2014 04:56:47 +0000 (04:56 +0000)]
allow running as a Plack app without CGI emulation
This might be slightly cleaner, though generating the base URL
now has an ugly condition in it.
Eric Wong [Mon, 28 Apr 2014 04:50:17 +0000 (04:50 +0000)]
uri_escape => uri_escape_utf8
We should be able to deal with URIs with non-ASCII characters in
them. I only found this problem when looking at archives with
non-English spam :x
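The change above can be illustrated with a rough sketch in Python rather than the project's Perl (this is not the project's code, and the `escape_utf8` helper name is hypothetical). Percent-encoding a non-ASCII string means encoding it as UTF-8 first and then escaping each resulting byte:

```python
from urllib.parse import quote, unquote


def escape_utf8(text: str) -> str:
    """Percent-encode text as UTF-8 bytes, keeping only unreserved characters."""
    # safe="" also escapes "/", which matters when the value is a single path segment
    return quote(text, safe="", encoding="utf-8")


escaped = escape_utf8("спам/1@example.com")
# Decoding with the same charset recovers the original string
original = unquote(escaped, encoding="utf-8")
```

Escaping "/" as well is a choice made for this sketch; whether the project's URLs need that is not stated in the commit message.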
Eric Wong [Mon, 28 Apr 2014 02:15:04 +0000 (02:15 +0000)]
cgi: preliminary Plack compatibility
This needs further testing and refactoring, but seems to work
reasonably well.
Eric Wong [Sun, 27 Apr 2014 06:21:57 +0000 (06:21 +0000)]
feeds use XHTML to avoid tag soup
This should work in most browsers; let's find out!
Eric Wong [Sun, 27 Apr 2014 05:04:02 +0000 (05:04 +0000)]
Feed: <id> element must be a URI, oops :x
For each feed element, we'll just use the link since there's
currently no suitable URN.
Eric Wong [Sun, 27 Apr 2014 01:56:39 +0000 (01:56 +0000)]
view: uri_escape subject
This hopefully improves compatibility with mailers.
Eric Wong [Sat, 26 Apr 2014 22:22:35 +0000 (22:22 +0000)]
feed: comment about the %deleted hash
It's not strictly necessary anymore since
commit fa6168c56bdd1cece178b6b852a9b2cba6ce6ffb
("feed: message must exist in current HEAD to show up").
However it can still save us some unnecessary syscalls and
round-trips to the "git cat-file --batch" process, so it's probably
worth the cost of stuffing it in a hash.
Eric Wong [Sat, 26 Apr 2014 21:32:52 +0000 (21:32 +0000)]
cgi: style: return undef => return
Eric Wong [Sat, 26 Apr 2014 09:07:42 +0000 (09:07 +0000)]
feed: message must exist in current HEAD to show up
We do not want recently-deleted spam showing up when looking
in old histories.
Eric Wong [Sat, 26 Apr 2014 07:42:25 +0000 (07:42 +0000)]
spamassassin rule and config updates
While we're at it, add a script for easy editing of user prefs.
We need some human-maintained rules based on the spam we get.
It's an imperfect world, but I'd _much_ rather deal with the
occasional spam than require signup/registration to post.
Eric Wong [Sat, 26 Apr 2014 02:52:50 +0000 (02:52 +0000)]
view: show References: header, too
Some mail user agents use this header, and Mail::Thread uses
it, too, so show it if possible, but only if it's not redundant
to In-Reply-To.
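The redundancy check described above might look roughly like this Python sketch (the project itself is Perl; the function name `extra_references` is hypothetical, and the exact rule the viewer applies is an assumption). References is worth showing only when it names something other than the In-Reply-To target:

```python
def extra_references(in_reply_to: str, references: str) -> list[str]:
    """Return Message-IDs from References: that In-Reply-To: does not already cover."""
    # References: is a whitespace-separated list of <Message-ID> tokens
    seen = {in_reply_to.strip()}
    extra = []
    for mid in references.split():
        if mid not in seen:
            seen.add(mid)
            extra.append(mid)
    return extra
```

An empty result would mean the header adds nothing beyond In-Reply-To and can be omitted from the page.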
Eric Wong [Sat, 26 Apr 2014 02:29:22 +0000 (02:29 +0000)]
view: add per-message HTML footer to encourage replies
This may not work with most mail user agents, however.
Eric Wong [Sat, 26 Apr 2014 01:01:10 +0000 (01:01 +0000)]
huge refactor of encoding handling
Hopefully this simplifies and corrects our usage of Perl encoding
APIs.
Eric Wong [Fri, 25 Apr 2014 07:49:14 +0000 (07:49 +0000)]
cgi: eliminate dead/redundant HTML escaping code
Eric Wong [Thu, 24 Apr 2014 00:21:21 +0000 (00:21 +0000)]
html: refactor header value handling to be OO
This helps us keep track of escaping which needs to be done
for various levels.
Eric Wong [Wed, 23 Apr 2014 10:50:08 +0000 (10:50 +0000)]
t/feed: cleanup test and add test for /f/ link
We already depend on IPC::Run, so just use it in our tests.
Eric Wong [Wed, 23 Apr 2014 01:05:56 +0000 (01:05 +0000)]
feed: add tests for <id> element setting
We need to ensure this works.
This follows commit bd8fd095067b79a0d2a40bbca2b27b923d02b3f8
("feed: fix address when multiple addresses exist").
Eric Wong [Mon, 21 Apr 2014 20:39:54 +0000 (20:39 +0000)]
t/feed: notify user of missing XML::Feed
One of my dev machines did not have XML::Feed so things were
not tested sufficiently.
Eric Wong [Tue, 22 Apr 2014 09:48:53 +0000 (09:48 +0000)]
view: fix link to raw message from /f/ endpoint
Ugh, at least this has a test...
Eric Wong [Tue, 22 Apr 2014 09:24:45 +0000 (09:24 +0000)]
fix quoted URL generation in feeds
While we're at it, make sure strange characters are escaped properly
in Message-IDs. We'll need tests for all this behavior.
Eric Wong [Tue, 22 Apr 2014 09:22:30 +0000 (09:22 +0000)]
view: do not decode Message-ID
This is ridiculous, nobody (including ssoma) ever does this.
Eric Wong [Mon, 21 Apr 2014 19:29:32 +0000 (19:29 +0000)]
feed: fix address when multiple addresses exist
This needs to be cleaned up.
Eric Wong [Mon, 21 Apr 2014 18:33:22 +0000 (18:33 +0000)]
README: add links to try and HTML archives
Eric Wong [Mon, 21 Apr 2014 10:43:59 +0000 (10:43 +0000)]
config: use description file for gitweb
Do not repeat ourselves, just use the same description file
gitweb uses to avoid surprising users.
Eric Wong [Mon, 21 Apr 2014 10:10:14 +0000 (10:10 +0000)]
html/index: fix broken prev/next pagination on short histories
We do not have much history in public-inbox meta, so do
not mislead users with strange navigation elements which
lead nowhere.
Eric Wong [Mon, 21 Apr 2014 10:00:21 +0000 (10:00 +0000)]
feed: there is only one atom feed, with all messages
This is not a blog. All posts, whether replies or not,
carry equal weight.
Eric Wong [Mon, 21 Apr 2014 09:53:53 +0000 (09:53 +0000)]
html/index: do not waste space with non-existent thread roots
Screen real-estate is valuable, and missing roots tend to
be false-positive matches (using Subject, not In-Reply-To
or References).
Eric Wong [Mon, 21 Apr 2014 09:19:13 +0000 (09:19 +0000)]
doc: update design_www and add HACKING file
Document some of the stranger choices I've made.
Eric Wong [Mon, 21 Apr 2014 08:07:53 +0000 (08:07 +0000)]
new scripts for importing slrn spools and maildirs
The old import_gmane_spool script was inflexible,
since we may import from maildir archives as well, so
get everything into maildir, first.
Eric Wong [Mon, 21 Apr 2014 01:45:24 +0000 (01:45 +0000)]
README: fix URL for source code clone
This is an 80x24.org project (more on that at another date).
Eric Wong [Mon, 21 Apr 2014 00:18:33 +0000 (00:18 +0000)]
scripts/dc-dlvr: allow exiting from ~/.dc-dlvr.pre
The ~/.dc-dlvr.pre script for my public-inbox user does this.
Eric Wong [Sun, 20 Apr 2014 23:27:46 +0000 (23:27 +0000)]
use ORIGINAL_RECIPIENT once again
It should be common for a single user to be subscribed to multiple
addresses/lists, so we must use the address before alias expansion.
This partially reverts commit b949afc9edf89dd494cac6255c78b124d58e11a5.
Eric Wong [Sun, 20 Apr 2014 20:40:55 +0000 (20:40 +0000)]
scripts/import_gmane_spool: set git committer date
We normally want committer date to be different so we may
track delivery latencies (which do not differ much).
However, the rules for importing are much different and
tend to screw things up when using time ranges with git-rev-list.
Eric Wong [Sun, 20 Apr 2014 20:35:51 +0000 (20:35 +0000)]
various documentation cleanups
Eric Wong [Sun, 20 Apr 2014 19:31:23 +0000 (19:31 +0000)]
feed: close string ref before returning
Just in case there is an error, this should be more explicit.
Eric Wong [Sun, 20 Apr 2014 11:17:16 +0000 (11:17 +0000)]
cgi: delay some requires
This shaves off nearly 100ms when my Core2Duo is clocked to 800MHz
when rendering a full HTML index.
Eric Wong [Sun, 20 Apr 2014 11:00:06 +0000 (11:00 +0000)]
feed: speed up blob checks if Git.pm is usable
Git::cat_blob is a handy interface to read multiple emails
without incurring fork + exec overhead. Git.pm is GPLv2+,
not GPLv2-only, so we may link to it.
Eric Wong [Sat, 19 Apr 2014 23:23:10 +0000 (23:23 +0000)]
mda: share commit setup code with -learn
We need -learn to do many of the same things as -mda
when we have a false-positive. We also need -learn to
do HTML filtering in case the training user screws up.
Eric Wong [Sat, 19 Apr 2014 23:11:00 +0000 (23:11 +0000)]
move precheck to MDA namespace
We will be combining common code between -learn and -mda
Eric Wong [Sat, 19 Apr 2014 19:19:06 +0000 (19:19 +0000)]
various documentation updates
We have an HTML homepage, OMG!
Eric Wong [Fri, 18 Apr 2014 22:18:17 +0000 (22:18 +0000)]
mda: rename PI_FAILBOX to PI_EMERGENCY
The emergency destination may be Maildir. A Maildir emergency
destination is better for volatile data which is written to
and deleted-from frequently.
Eric Wong [Sat, 19 Apr 2014 10:37:28 +0000 (10:37 +0000)]
cgi: index pages allow iterating some pagination
This allows WWW readers to slowly page through the entire history
of the mailing list.
Eric Wong [Fri, 18 Apr 2014 22:43:59 +0000 (22:43 +0000)]
view: fix regression in standalone /^>$/ lines
The lack of trailing whitespace in quote prefixes threw us
off and caused t/view to fail.
This failure was caused by commit fefea3d7d2484ffbf433aec0dd80026aa7120e07
("ensure per-message short quotes do not get too long")
and not caught before pushing because I failed to run
"make", only "prove" (and not even "prove -l" :x).
Eric Wong [Thu, 17 Apr 2014 22:41:57 +0000 (22:41 +0000)]
ensure per-message short quotes do not get too long
Sometimes a single long word or URL can lengthen the line
too much.
Eric Wong [Thu, 17 Apr 2014 22:23:01 +0000 (22:23 +0000)]
cgi: sort HTML index by most recent date
This is hopefully the most user-friendly method.
Eric Wong [Thu, 17 Apr 2014 22:08:31 +0000 (22:08 +0000)]
view: remove pointless self-linkage
It is pointless to link to ourselves.
Eric Wong [Thu, 17 Apr 2014 22:05:04 +0000 (22:05 +0000)]
view: fix title of HTML views
We need to take care to escape everything properly to avoid
HTML/JS injections.
Eric Wong [Thu, 17 Apr 2014 21:56:12 +0000 (21:56 +0000)]
add example for CGI with Ruby WEBrick
Some people like old-fashioned Ruby, and WEBrick is in the Ruby
standard library, so it is widely available.
Eric Wong [Thu, 17 Apr 2014 21:49:52 +0000 (21:49 +0000)]
view: inline shorter quotes
Breaking up short quote messages at 1 line is too disconcerting,
try 5 lines as proper quotes shouldn't be too long.
Eric Wong [Thu, 17 Apr 2014 21:31:06 +0000 (21:31 +0000)]
cgi: implement suffix-less Message-ID redirects
This may be easier in some cases for copy+paste, but not 100%
reliable in case the .txt and .html suffixes are in the Message-ID
itself.
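The ambiguity mentioned above can be seen in a small Python sketch (not the project's Perl code; `split_suffix` and the exact path layout are assumptions made for illustration). A request without a recognized suffix is the one that would be redirected, but a Message-ID that itself ends in `.txt` or `.html` is indistinguishable from a suffixed request:

```python
SUFFIXES = (".txt", ".html")


def split_suffix(path_tail: str):
    """Split the URL tail into (message_id, suffix); suffix is None if absent."""
    for suffix in SUFFIXES:
        if path_tail.endswith(suffix):
            return path_tail[: -len(suffix)], suffix
    # No recognized suffix: treat the whole tail as the Message-ID
    return path_tail, None


plain = split_suffix("a@example.com")            # suffix-less request
suffixed = split_suffix("a@example.com.html")    # explicit HTML view
ambiguous = split_suffix("notes@host.txt")       # Message-ID or .txt request?
```

In the last case the sketch strips `.txt` even if the real Message-ID is `notes@host.txt`, which is the kind of unreliability the commit message warns about.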
Eric Wong [Thu, 17 Apr 2014 21:30:21 +0000 (21:30 +0000)]
cgi: include HTTP status in error message body
This makes it slightly easier for out-of-the-box curl users since
curl does not report or show the error by default.
Eric Wong [Thu, 17 Apr 2014 20:10:38 +0000 (20:10 +0000)]
HTML: various encoding fixups
Eric Wong [Thu, 17 Apr 2014 03:19:49 +0000 (03:19 +0000)]
Feed: add bug note on memory cycle
This affects users of long-lived processes (FastCGI/Plack)
Eric Wong [Tue, 15 Apr 2014 07:10:17 +0000 (07:10 +0000)]
Revert "cgi: relax path restriction for top-level"
CGI mounts should probably handle this internally. We're reverting
this since it adds too much potential for abuse with fake/extra
prefixes in the URL. We also need to reorder our redirect handling
as a result.
This reverts commit c394de9f2c91c2c5ed1f7832a5a7cc0206120b7f.
Eric Wong [Tue, 15 Apr 2014 07:06:45 +0000 (07:06 +0000)]
cgi: correct links to folded quotes
Lightly tested, but this seems to work OK.
Eric Wong [Tue, 15 Apr 2014 06:52:07 +0000 (06:52 +0000)]
view: finish HTML in per-message pages
Eric Wong [Tue, 15 Apr 2014 06:51:41 +0000 (06:51 +0000)]
cgi: support /all.html page with inline threads
Maybe this increases readability for now.
Eric Wong [Tue, 15 Apr 2014 06:35:52 +0000 (06:35 +0000)]
HTML: use shorter URLs in indices
Long URLs are not needed for HTML pages, but may be for feeds since
they're often resyndicated and not consumed by the browser.
Eric Wong [Tue, 15 Apr 2014 06:18:43 +0000 (06:18 +0000)]
HTML: ensure hrefs are quoted properly
We may be breaking some parsers or allowing more breakage
to slip through without quotes. We waste some bytes, though.
Eric Wong [Tue, 15 Apr 2014 06:00:16 +0000 (06:00 +0000)]
mda: encoding-aware From: for GIT_ authorship
Users with non-US-ASCII compatible names were not showing
up properly in "git log" output.
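A minimal sketch of the idea, in Python instead of the project's Perl: decode any RFC 2047 encoded-words in the From: display name before handing it to git through the environment. The `git_author_env` name and the fallback to the address's local part are assumptions, not the project's behavior:

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def git_author_env(from_header: str) -> dict[str, str]:
    """Build GIT_AUTHOR_* variables from a raw From: header value."""
    raw_name, addr = parseaddr(from_header)
    # Decode RFC 2047 encoded-words such as =?UTF-8?Q?...?= into text
    name = str(make_header(decode_header(raw_name)))
    # Assumption: fall back to the local part when there is no display name
    if not name:
        name = addr.split("@", 1)[0]
    return {"GIT_AUTHOR_NAME": name, "GIT_AUTHOR_EMAIL": addr}
```

The returned mapping could be merged into a copy of `os.environ` and passed as `env=` to `subprocess.run` when invoking git.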
Eric Wong [Tue, 15 Apr 2014 05:51:34 +0000 (05:51 +0000)]
scripts/import_gmane_spool: preserve delivery order
Unfortunately, this means we get rid of parallelization,
as we need to preserve delivery order so HTML indices look
chronological. Order may also affect spam filtering and
training, too.
Eric Wong [Tue, 15 Apr 2014 02:38:07 +0000 (02:38 +0000)]
doc: add notes on scalability
Fortunately, most mailing lists will never grow too large.
Eric Wong [Mon, 14 Apr 2014 08:20:42 +0000 (08:20 +0000)]
doc: fold philosophy into the README
Hopefully this makes the scope and intent of the project clearer.
Eric Wong [Sat, 12 Apr 2014 23:14:54 +0000 (23:14 +0000)]
rename list from "bugs" to "meta"
"bugs" might confuse and limit the discussion, so "meta" it is!
Eric Wong [Sat, 12 Apr 2014 10:01:01 +0000 (10:01 +0000)]
mda: add most RFC 2919 and 2369 headers
These probably make sense even though we do not handle
delivery ourselves. It can aid in searching/filtering/tagging
of messages.
Eric Wong [Sat, 12 Apr 2014 09:59:41 +0000 (09:59 +0000)]
derive -primary_address in config
This may be useful for generating List-Id headers and HTML pages.
Eric Wong [Mon, 14 Apr 2014 07:58:50 +0000 (07:58 +0000)]
cgi: fix up top-level index
We do not have all messages in the top-level index
(and we need to adjust the test while we're at it).
Eric Wong [Mon, 14 Apr 2014 07:53:53 +0000 (07:53 +0000)]
cgi: 301 for list-indices without trailing slash
It is common to type upper-level URLs without the slash, so
redirect users to the correct page for usability.
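As a hedged illustration in Python (the real handler is Perl CGI; `trailing_slash_redirect` and the `list_names` set are hypothetical), the redirect decision could be expressed as:

```python
def trailing_slash_redirect(path_info: str, list_names: set[str]):
    """Return (status, location) for a bare /$LISTNAME, or None otherwise."""
    if not path_info.startswith("/"):
        return None
    name = path_info[1:]
    # Only redirect exact list names; anything with further slashes is left alone
    if name in list_names:
        return 301, path_info + "/"
    return None
```

A 301 tells clients and caches that the slash-terminated URL is the permanent location.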
Eric Wong [Mon, 14 Apr 2014 06:59:04 +0000 (06:59 +0000)]
t/cgi: cleanup: no need for additional block
Not sure what I was thinking...
Eric Wong [Sat, 12 Apr 2014 01:06:58 +0000 (01:06 +0000)]
cgi: ensure we unescape MIDs correctly in URLs
MIDs may have strange characters in them, so we need to handle
escaping/unescaping properly to avoid broken links or worse.
Eric Wong [Sat, 12 Apr 2014 00:52:40 +0000 (00:52 +0000)]
cgi: avoid parsing ENV directly for PATH_INFO
This might make it easier to go to non-CGI things.
Eric Wong [Sat, 12 Apr 2014 00:49:34 +0000 (00:49 +0000)]
cgi: relax path restriction for top-level
We may have something like /foo.cgi/m/$MID.html in there.
Eric Wong [Sat, 12 Apr 2014 00:09:30 +0000 (00:09 +0000)]
cgi: rename to have .cgi suffix
This makes it easier to configure for systems which
determine a script is a CGI script based on suffix.
Eric Wong [Fri, 11 Apr 2014 23:21:51 +0000 (23:21 +0000)]
scripts/import_gmane_spool: misc updates
We may promote this to be a real script, since public-inbox-mda
is idempotent.
Eric Wong [Fri, 11 Apr 2014 22:23:27 +0000 (22:23 +0000)]
view: trim_message_id consistently
Message-IDs could have embedded '<' and '>'
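One way to handle this, sketched in Python rather than the project's Perl (the function body is an assumption about what "trimming" means here, not the actual implementation): strip only the single outermost pair of angle brackets and leave any embedded ones alone.

```python
def trim_message_id(mid: str) -> str:
    """Remove surrounding whitespace and one outer <...> pair, nothing more."""
    mid = mid.strip()
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    return mid
```

Using `str.strip("<>")` instead would also eat brackets that belong to the Message-ID itself, which is why the sketch removes at most one character from each end.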
Eric Wong [Fri, 11 Apr 2014 21:08:50 +0000 (21:08 +0000)]
config: support multiple addresses for an inbox
This makes it possible to gradually migrate to new address in case
of list name changes, and is one step closer to operating in
"stealth hijack mode" :)
Eric Wong [Fri, 11 Apr 2014 20:46:12 +0000 (20:46 +0000)]
add spam/ham learning wrapper script
This is essential for integrating into my inotify-based
spam training setup.
Eric Wong [Fri, 11 Apr 2014 20:32:57 +0000 (20:32 +0000)]
Documentation/design_notes: more updates
Laziness \o/
(Then impatience and hubris :)
Eric Wong [Fri, 11 Apr 2014 19:51:32 +0000 (19:51 +0000)]
filter: clarify regular expression
I often forget the subtleties of Perl regexps and newlines,
so I suspect others do, too. Use explicit capture so it's
more familiar to users of non-Perl regexps.
Eric Wong [Thu, 10 Apr 2014 23:33:48 +0000 (23:33 +0000)]
cgi: wire up HTML pages for messages
These need better tests and verification, but it's something
for now.
Eric Wong [Thu, 10 Apr 2014 20:12:03 +0000 (20:12 +0000)]
cgi: update feed/view and tests for shorter URLs
Code should be consistent with the design docs
(and we will need better tests).
Eric Wong [Thu, 10 Apr 2014 20:06:04 +0000 (20:06 +0000)]
cgi: add one-line descriptions for subroutines
Hopefully it's slightly easier-to-follow this way.
Eric Wong [Thu, 10 Apr 2014 20:02:34 +0000 (20:02 +0000)]
cgi: /$LISTNAME/ and /$LISTNAME/index.html are equal
This prevents ambiguity when switching URLs between static
file servers and CGI.
The /$LISTNAME/index.html URL appearing in the wild is inevitable
because of our static file server support. Worse yet, there's
no easy/consistent way to get all installations to detect and 301
them to the shorter /$LISTNAME/. So we make the CGI support
/$LISTNAME/index.html.
The downside of this is the potential duplicate entry in all caches.
Eric Wong [Thu, 10 Apr 2014 19:26:40 +0000 (19:26 +0000)]
cgi: be strict about UTF-8 encoding in HTML and XML
Hopefully this forces us to generate valid UTF-8 data.
Eric Wong [Fri, 11 Apr 2014 22:17:07 +0000 (22:17 +0000)]
filter: use IPC::Run and improve lynx error handling
We may occasionally encounter horrid HTML which lynx cannot
handle, so improve error reporting.
Eric Wong [Fri, 11 Apr 2014 19:06:07 +0000 (19:06 +0000)]
INSTALL: add lynx to install requirements
We need it for converting HTML, unfortunately.
Eric Wong [Thu, 10 Apr 2014 05:36:58 +0000 (05:36 +0000)]
INSTALL: update with Mail::Thread dependency
While we're at it, sort Makefile.PL and add a note to
update INSTALL, too.
Eric Wong [Thu, 10 Apr 2014 03:02:01 +0000 (03:02 +0000)]
cgi: implement get_mid_txt
This is essential when telling people to use something like:
curl $URL | git am
Eric Wong [Thu, 10 Apr 2014 03:01:02 +0000 (03:01 +0000)]
doc/design_www: add 301 URLs
We'll probably support these so they're easier-to-type and share.
Eric Wong [Thu, 10 Apr 2014 00:33:54 +0000 (00:33 +0000)]
cgi: wire up index + tests
Remove the specified /all.html while we're at it, we only have
/all.atom.xml because it's convenient for feed readers.
Eric Wong [Thu, 10 Apr 2014 00:23:18 +0000 (00:23 +0000)]
cgi: do not specify charset in Atom HTTP header
The feed itself already specifies it in XML, and we risk confusing
clients if XML::Atom::SimpleFeed changes in the future. This also
increases consistency for CGI vs static-file serving.
Eric Wong [Thu, 10 Apr 2014 00:20:55 +0000 (00:20 +0000)]
cgi: remove some redundant logic
We'll be reusing more validation logic for per-message and
per-thread pages.
Eric Wong [Wed, 9 Apr 2014 23:48:17 +0000 (23:48 +0000)]
doc/design_www: shorten URLs as much as possible
Message-IDs are extremely long already, so try to keep them short here.
Eric Wong [Wed, 9 Apr 2014 22:55:36 +0000 (22:55 +0000)]
t/mda: additional precheck tests
We did not test as well as we should have.
Eric Wong [Wed, 9 Apr 2014 22:42:38 +0000 (22:42 +0000)]
mda: set GIT_AUTHOR_DATE in commits as well
While we're at it, write some quick tests.
Eric Wong [Wed, 9 Apr 2014 19:33:06 +0000 (19:33 +0000)]
mda: set GIT_{COMMITTER,AUTHOR}_{NAME,EMAIL} env
This can make it easy to query via "git log --author=..."
without extracting each message.
Eric Wong [Wed, 9 Apr 2014 19:32:18 +0000 (19:32 +0000)]
config: include listname on lookup
We will be using it when setting GIT_COMMITTER_NAME