Sergey Matveev's repositories - public-inbox.git/log
Eric Wong [Sat, 19 Apr 2014 19:19:06 +0000 (19:19 +0000)]
various documentation updates
We have an HTML homepage, OMG!
Eric Wong [Fri, 18 Apr 2014 22:18:17 +0000 (22:18 +0000)]
mda: rename PI_FAILBOX to PI_EMERGENCY
The emergency destination may be a Maildir. A Maildir emergency
destination is better for volatile data which is written to
and deleted from frequently.
Eric Wong [Sat, 19 Apr 2014 10:37:28 +0000 (10:37 +0000)]
cgi: index pages allow iterating some pagination
This allows WWW readers to slowly page through the entire history
of the mailing list.
Eric Wong [Fri, 18 Apr 2014 22:43:59 +0000 (22:43 +0000)]
view: fix regression in standalone /^>$/ lines
The lack of trailing whitespace in quote prefixes threw us
off and caused t/view to fail.
This failure was caused by commit
fefea3d7d2484ffbf433aec0dd80026aa7120e07
("ensure per-message short quotes do not get too long")
and was not caught before pushing because I failed to run
"make"; I only ran "prove" (and not even "prove -l" :x).
Eric Wong [Thu, 17 Apr 2014 22:41:57 +0000 (22:41 +0000)]
ensure per-message short quotes do not get too long
Sometimes a single long word or URL can lengthen the line
too much.
Eric Wong [Thu, 17 Apr 2014 22:23:01 +0000 (22:23 +0000)]
cgi: sort HTML index by most recent date
This is hopefully the most user-friendly method.
Eric Wong [Thu, 17 Apr 2014 22:08:31 +0000 (22:08 +0000)]
view: remove pointless self-linkage
It is pointless to link to ourselves.
Eric Wong [Thu, 17 Apr 2014 22:05:04 +0000 (22:05 +0000)]
view: fix title of HTML views
We need to take care to escape everything properly to avoid
HTML/JS injections.
Eric Wong [Thu, 17 Apr 2014 21:56:12 +0000 (21:56 +0000)]
add example for CGI with Ruby WEBrick
Some people like old-fashioned Ruby, and WEBrick is in the Ruby
standard library, so it is widely available.
Eric Wong [Thu, 17 Apr 2014 21:49:52 +0000 (21:49 +0000)]
view: inline shorter quotes
Breaking up short quotes at 1 line is too disconcerting, so
try 5 lines, as proper quotes shouldn't be too long.
Eric Wong [Thu, 17 Apr 2014 21:31:06 +0000 (21:31 +0000)]
cgi: implement suffix-less Message-ID redirects
This may be easier in some cases for copy+paste, but it is not
100% reliable if a .txt or .html suffix is part of the Message-ID
itself.
Eric Wong [Thu, 17 Apr 2014 21:30:21 +0000 (21:30 +0000)]
cgi: include HTTP status in error message body
This makes it slightly easier for out-of-the-box curl users since
curl does not report or show the error by default.
Eric Wong [Thu, 17 Apr 2014 20:10:38 +0000 (20:10 +0000)]
HTML: various encoding fixups
Eric Wong [Thu, 17 Apr 2014 03:19:49 +0000 (03:19 +0000)]
Feed: add bug note on memory cycle
This affects users of long-lived processes (FastCGI/Plack).
Eric Wong [Tue, 15 Apr 2014 07:10:17 +0000 (07:10 +0000)]
Revert "cgi: relax path restriction for top-level"
CGI mounts should probably handle this internally. We're reverting
this since it adds too much potential for abuse with fake/extra
prefixes in the URL. We also need to reorder our redirect handling
as a result.
This reverts commit c394de9f2c91c2c5ed1f7832a5a7cc0206120b7f.
Eric Wong [Tue, 15 Apr 2014 07:06:45 +0000 (07:06 +0000)]
cgi: correct links to folded quotes
Lightly tested, but this seems to work OK.
Eric Wong [Tue, 15 Apr 2014 06:52:07 +0000 (06:52 +0000)]
view: finish HTML in per-message pages
Eric Wong [Tue, 15 Apr 2014 06:51:41 +0000 (06:51 +0000)]
cgi: support /all.html page with inline threads
Maybe this increases readability for now.
Eric Wong [Tue, 15 Apr 2014 06:35:52 +0000 (06:35 +0000)]
HTML: use shorter URLs in indices
Long URLs are not needed for HTML pages, but may be for feeds since
they're often resyndicated and not consumed by the browser.
Eric Wong [Tue, 15 Apr 2014 06:18:43 +0000 (06:18 +0000)]
HTML: ensure hrefs are quoted properly
We may be breaking some parsers or allowing more breakage
to slip through without quotes. We waste some bytes, though.
Eric Wong [Tue, 15 Apr 2014 06:00:16 +0000 (06:00 +0000)]
mda: encoding-aware From: for GIT_ authorship
Users with non-US-ASCII compatible names were not showing
up properly in "git log" output.
Eric Wong [Tue, 15 Apr 2014 05:51:34 +0000 (05:51 +0000)]
scripts/import_gmane_spool: preserve delivery order
Unfortunately, this means we get rid of parallelization,
as we need to preserve delivery order so HTML indices look
chronological. Order may also affect spam filtering and
training.
Eric Wong [Tue, 15 Apr 2014 02:38:07 +0000 (02:38 +0000)]
doc: add notes on scalability
Fortunately, most mailing lists will never grow too large.
Eric Wong [Mon, 14 Apr 2014 08:20:42 +0000 (08:20 +0000)]
doc: fold philosophy into the README
Hopefully this makes the scope and intent of the project clearer.
Eric Wong [Sat, 12 Apr 2014 23:14:54 +0000 (23:14 +0000)]
rename list from "bugs" to "meta"
"bugs" might confuse and limit the discussion, so "meta" it is!
Eric Wong [Sat, 12 Apr 2014 10:01:01 +0000 (10:01 +0000)]
mda: add most RFC 2919 and 2369 headers
These probably make sense even though we do not handle
delivery ourselves. It can aid in searching/filtering/tagging
of messages.
Eric Wong [Sat, 12 Apr 2014 09:59:41 +0000 (09:59 +0000)]
derive -primary_address in config
This may be useful for generating List-Id headers and HTML pages.
Eric Wong [Mon, 14 Apr 2014 07:58:50 +0000 (07:58 +0000)]
cgi: fix up top-level index
We do not have all messages in the top-level index
(and we need to adjust the test while we're at it).
Eric Wong [Mon, 14 Apr 2014 07:53:53 +0000 (07:53 +0000)]
cgi: 301 for list-indices without trailing slash
It is common to type upper-level URLs without the slash, so
redirect users to the correct page for usability.
Eric Wong [Mon, 14 Apr 2014 06:59:04 +0000 (06:59 +0000)]
t/cgi: cleanup: no need for additional block
Not sure what I was thinking...
Eric Wong [Sat, 12 Apr 2014 01:06:58 +0000 (01:06 +0000)]
cgi: ensure we unescape MIDs correctly in URLs
MIDs may have strange characters in them, so we need to handle
escaping/unescaping properly to avoid broken links or worse.
Eric Wong [Sat, 12 Apr 2014 00:52:40 +0000 (00:52 +0000)]
cgi: avoid parsing ENV directly for PATH_INFO
This might make it easier to go to non-CGI things.
Eric Wong [Sat, 12 Apr 2014 00:49:34 +0000 (00:49 +0000)]
cgi: relax path restriction for top-level
We may have something like /foo.cgi/m/$MID.html in there.
Eric Wong [Sat, 12 Apr 2014 00:09:30 +0000 (00:09 +0000)]
cgi: rename to have .cgi suffix
This makes it easier to configure for systems which
determine a script is a CGI script based on suffix.
Eric Wong [Fri, 11 Apr 2014 23:21:51 +0000 (23:21 +0000)]
scripts/import_gmane_spool: misc updates
We may promote this to be a real script, since public-inbox-mda
is idempotent.
Eric Wong [Fri, 11 Apr 2014 22:23:27 +0000 (22:23 +0000)]
view: trim_message_id consistently
Message-IDs could have embedded '<' and '>'.
Eric Wong [Fri, 11 Apr 2014 21:08:50 +0000 (21:08 +0000)]
config: support multiple addresses for a inbox
This makes it possible to gradually migrate to a new address in case
of list name changes, and is one step closer to operating in
"stealth hijack mode" :)
Eric Wong [Fri, 11 Apr 2014 20:46:12 +0000 (20:46 +0000)]
add spam/ham learning wrapper script
This is essential for integrating into my inotify-based
spam training setup.
Eric Wong [Fri, 11 Apr 2014 20:32:57 +0000 (20:32 +0000)]
Documentation/design_notes: more updates
Laziness \o/
(Then impatience and hubris :)
Eric Wong [Fri, 11 Apr 2014 19:51:32 +0000 (19:51 +0000)]
filter: clarify regular expression
I often forget the subtleties of Perl regexps and newlines,
so I suspect others do, too. Use explicit capture so it's
more familiar to users of non-Perl regexps.
Eric Wong [Thu, 10 Apr 2014 23:33:48 +0000 (23:33 +0000)]
cgi: wire up HTML pages for messages
These need better tests and verification, but it's something
for now.
Eric Wong [Thu, 10 Apr 2014 20:12:03 +0000 (20:12 +0000)]
cgi: update feed/view and tests for shorter URLs
Code should be consistent with the design docs
(and we will need better tests).
Eric Wong [Thu, 10 Apr 2014 20:06:04 +0000 (20:06 +0000)]
cgi: add one-line descriptions for subroutines
Hopefully it's slightly easier to follow this way.
Eric Wong [Thu, 10 Apr 2014 20:02:34 +0000 (20:02 +0000)]
cgi: /$LISTNAME/ and /$LISTNAME/index.html are equal
This prevents ambiguity when switching URLs between static
file servers and CGI.
The /$LISTNAME/index.html URL appearing in the wild is inevitable
because of our static file server support. Worse yet, there's
no easy/consistent way to get all installations to detect and 301
them to the shorter /$LISTNAME/. So we make the CGI support
/$LISTNAME/index.html.
The downside of this is the potential duplicate entry in all caches.
Eric Wong [Thu, 10 Apr 2014 19:26:40 +0000 (19:26 +0000)]
cgi: be strict about UTF-8 encoding in HTML and XML
Hopefully this forces us to generate valid UTF-8 data.
Eric Wong [Fri, 11 Apr 2014 22:17:07 +0000 (22:17 +0000)]
filter: use IPC::Run and improve lynx error handling
We may occasionally encounter horrid HTML which lynx cannot
handle, so improve error reporting.
Eric Wong [Fri, 11 Apr 2014 19:06:07 +0000 (19:06 +0000)]
INSTALL: add lynx to install requirements
We need it for converting HTML, unfortunately.
Eric Wong [Thu, 10 Apr 2014 05:36:58 +0000 (05:36 +0000)]
INSTALL: update with Mail::Thread dependency
While we're at it, sort Makefile.PL and add a note to
update INSTALL, too.
Eric Wong [Thu, 10 Apr 2014 03:02:01 +0000 (03:02 +0000)]
cgi: implement get_mid_txt
This is essential when telling people to use something like:
curl $URL | git am
Eric Wong [Thu, 10 Apr 2014 03:01:02 +0000 (03:01 +0000)]
doc/design_www: add 301 URLs
We'll probably support these so they're easier to type and share.
Eric Wong [Thu, 10 Apr 2014 00:33:54 +0000 (00:33 +0000)]
cgi: wire up index + tests
Remove the specified /all.html while we're at it; we only have
/all.atom.xml because it's convenient for feed readers.
Eric Wong [Thu, 10 Apr 2014 00:23:18 +0000 (00:23 +0000)]
cgi: do not specify charset in Atom HTTP header
The feed itself already specifies it in XML, and we risk confusing
clients if XML::Atom::SimpleFeed changes in the future. This also
increases consistency for CGI vs static-file serving.
Eric Wong [Thu, 10 Apr 2014 00:20:55 +0000 (00:20 +0000)]
cgi: remove some redundant logic
We'll be reusing more validation logic for per-message and
per-thread pages.
Eric Wong [Wed, 9 Apr 2014 23:48:17 +0000 (23:48 +0000)]
doc/design_www: shorten URLs as much as possible
Message-IDs are extremely long already, so try to keep them short here.
Eric Wong [Wed, 9 Apr 2014 22:55:36 +0000 (22:55 +0000)]
t/mda: additional precheck tests
We did not test as well as we should have.
Eric Wong [Wed, 9 Apr 2014 22:42:38 +0000 (22:42 +0000)]
mda: set GIT_AUTHOR_DATE in commits as well
While we're at it, write some quick tests.
Eric Wong [Wed, 9 Apr 2014 19:33:06 +0000 (19:33 +0000)]
mda: set GIT_{COMMITTER,AUTHOR}_{NAME,EMAIL} env
This can make it easy to query via "git log --author=..."
without extracting each message.
Eric Wong [Wed, 9 Apr 2014 19:32:18 +0000 (19:32 +0000)]
config: include listname on lookup
We will be using it when setting GIT_COMMITTER_NAME.
Eric Wong [Wed, 9 Apr 2014 18:07:15 +0000 (18:07 +0000)]
mda: prevent duplicate Message-IDs from appearing
For practical purposes, Message-IDs are unique and duplicates
do not appear unless client software is broken.
Eric Wong [Wed, 9 Apr 2014 18:06:34 +0000 (18:06 +0000)]
Makefile.PL: add parallel test target
These tests were designed to run in parallel.
Eric Wong [Wed, 9 Apr 2014 05:46:59 +0000 (05:46 +0000)]
doc: split out philosophy to a different page
Hopefully this is a little easier to find for clients, as opposed
to admins running servers. While we're at it, expand design_notes.
Eric Wong [Wed, 9 Apr 2014 01:59:06 +0000 (01:59 +0000)]
preliminary HTML index generation
Using JWZ threading might work decently for this.
Haven't checked it in lynx yet.
Eric Wong [Wed, 9 Apr 2014 00:06:53 +0000 (00:06 +0000)]
precheck: stricter checks including min length
We should reject values which are too short to be useful or sane.
Eric Wong [Tue, 8 Apr 2014 23:58:36 +0000 (23:58 +0000)]
precheck: reject messages with no subject
Composers may screw up and leave the subject out, so
reject those messages.
Eric Wong [Tue, 8 Apr 2014 21:57:53 +0000 (21:57 +0000)]
feed: filter out each_recent_blob wrapper
We will need it for HTML indices, too.
Eric Wong [Tue, 8 Apr 2014 20:42:17 +0000 (20:42 +0000)]
design_notes: various updates, including "why git?"
Things to keep in mind when working on this.
Eric Wong [Tue, 8 Apr 2014 09:08:56 +0000 (09:08 +0000)]
scripts/report-spam: explain design decisions
Trying my best to not forget things I wrote years ago.
Eric Wong [Tue, 8 Apr 2014 08:45:25 +0000 (08:45 +0000)]
doc: various cleanups all around
Most notably, the INSTALL is geared towards potential server admins,
whereas the README is also for interested "drive-by" readers.
Eric Wong [Tue, 8 Apr 2014 08:43:18 +0000 (08:43 +0000)]
cgi: cleanup dependencies
We do not need to use CGI::Util internals here.
Eric Wong [Mon, 7 Apr 2014 22:59:10 +0000 (22:59 +0000)]
cgi: make internal interface more Plack-like
This should make it easier to support non-CGI uses,
as well as making it easier to generate static sites.
Eric Wong [Mon, 7 Apr 2014 20:26:42 +0000 (20:26 +0000)]
feed: generate takes a hashref for args
Passing a giant argument list is too error-prone and
hard to document.
Eric Wong [Sun, 6 Apr 2014 07:06:15 +0000 (07:06 +0000)]
doc: design_www note /full/ namespace for messages
We serve the short, abridged-quote version by default since
it is (unfortunately) common practice to over-quote on mailing lists.
Eric Wong [Sun, 6 Apr 2014 07:03:49 +0000 (07:03 +0000)]
view: use "original" rather than "raw"
This wording is probably clearer to everyone, and also used by at
least one other mailing list WWW interface.
Eric Wong [Sun, 6 Apr 2014 06:59:43 +0000 (06:59 +0000)]
view: minor cleanups
Avoid adding extraneous whitespace in HTML content, as it
increases network bandwidth, disk, and memory usage.
Eric Wong [Sun, 6 Apr 2014 06:52:18 +0000 (06:52 +0000)]
feed: reuse view class to display message
This reduces duplicated/similar code and hopefully makes things more
consistent.
Eric Wong [Sun, 6 Apr 2014 06:50:08 +0000 (06:50 +0000)]
view: all content is assumed to be displayable text
Our Filter class now passes through application/octet-stream
if it looks like text (because some mailers suck), so we
cannot trust the specified Content-Type of a message.
Eric Wong [Sat, 5 Apr 2014 10:55:44 +0000 (10:55 +0000)]
Makefile.PL: update dependencies
This is lightly tested.
Eric Wong [Sat, 5 Apr 2014 10:54:24 +0000 (10:54 +0000)]
view: use URI::Escape to escape URIs
CGI::escape is not a documented subroutine, so the chances of
it going away are higher.
Eric Wong [Sat, 5 Apr 2014 10:51:52 +0000 (10:51 +0000)]
feed: remove unnecessary use
We no longer use DateTime::Format::Mail.
Eric Wong [Sat, 5 Apr 2014 07:02:19 +0000 (07:02 +0000)]
feed: use Date::Parse to parse dates
This is a smaller module dependency-wise and should be easier to install
for folks with limited packaging systems or network/disk capacity.
We do not need very powerful date parsing, as bad date formats are
likely the work of spammers.
Eric Wong [Sat, 5 Apr 2014 06:53:19 +0000 (06:53 +0000)]
get a basic CGI feed sender running
We should be able to wire up the rest, soon.
Eric Wong [Sat, 5 Apr 2014 03:17:35 +0000 (03:17 +0000)]
remove failrepo config
We will just use the fallback in Email::Filter to
reduce configuration knobs. Failed messages are failed
messages; do not classify them beyond that.
Eric Wong [Sat, 5 Apr 2014 01:33:46 +0000 (01:33 +0000)]
public-inbox-mda: preliminary manpage documentation
This still needs to be fleshed out.
Eric Wong [Sat, 5 Apr 2014 01:11:08 +0000 (01:11 +0000)]
view: implement quote folding and flesh out tests
Unfortunately, quoting is often excessive, so hide multi-line quotes
by default and provide anchored links to full messages instead.
Eric Wong [Fri, 4 Apr 2014 23:31:46 +0000 (23:31 +0000)]
config: add shortcut for retrieving elements
Hopefully this makes for less ad-hoc hash access in case
our config format changes.
Eric Wong [Fri, 4 Apr 2014 23:28:53 +0000 (23:28 +0000)]
doc: add design documentation for WWW interface
Mainly, start with URL routes, since that's what users usually
see first.
Eric Wong [Fri, 4 Apr 2014 01:42:41 +0000 (01:42 +0000)]
view: update IRP and MID links
We'll go with .html and .txt suffixes on MIDs to benefit
static hosting setups.
Eric Wong [Tue, 1 Apr 2014 23:07:38 +0000 (23:07 +0000)]
flesh out MDA and simplify config setup
We will be reusing the config parsing code for the CGI
script, too.
Eric Wong [Mon, 31 Mar 2014 20:16:19 +0000 (20:16 +0000)]
precheck uses recipient argument
We will also be using the RECIPIENT env in the future, since
that takes aliases into account.
Reducing the possible callsites to check ENV means we can more
easily update the code in the future.
Eric Wong [Mon, 31 Mar 2014 20:15:27 +0000 (20:15 +0000)]
README: various updates
Eric Wong [Fri, 4 Apr 2014 00:39:41 +0000 (00:39 +0000)]
filter: use regexp to check multipart bodies
This should be safer than running file(1), which has had its share
of vulnerabilities this year (early 2014). We really only care about
diffs and maybe short log files here.
Eric Wong [Thu, 3 Apr 2014 20:28:30 +0000 (20:28 +0000)]
filter: possibly keep PGP sigs only (not other types)
We may keep PGP signatures for messages we do not modify.
However, we have no way of verifying them on the server-side.
Eric Wong [Mon, 31 Mar 2014 19:58:24 +0000 (19:58 +0000)]
design_notes: with philosophy
Eric Wong [Fri, 28 Mar 2014 08:22:45 +0000 (08:22 +0000)]
filter: use file(1) to detect mime type if octet-stream
Some mailers do not correctly detect/set the Content-Type header, so
attempt to keep messages based on our server-detected MIME type if
application/octet-stream was specified.
Eric Wong [Thu, 27 Mar 2014 19:38:06 +0000 (19:38 +0000)]
config: revamp API and implement lookup
Eric Wong [Mon, 24 Mar 2014 20:17:11 +0000 (20:17 +0000)]
initial cut at Atom feed generation
This should make it easier for non-ssoma users to follow.
Eric Wong [Tue, 25 Feb 2014 22:26:35 +0000 (22:26 +0000)]
precheck: require Message-ID to be set
Valid emails should not arrive without a Message-ID.
Eric Wong [Tue, 25 Feb 2014 03:01:04 +0000 (03:01 +0000)]
view: add view module to be used for rendering HTML
This is to keep content accessible to search engines.
Eric Wong [Tue, 11 Feb 2014 00:36:06 +0000 (00:36 +0000)]
move pre-spamc checks to PublicInbox->precheck
We may add more checks before we go to spamc.
Eric Wong [Tue, 11 Feb 2014 00:29:13 +0000 (00:29 +0000)]
public-inbox-mda: reject messages without From header