% public-inbox developer manual

=head1 NAME

public-inbox v1 git repository and tree description (aka "ssoma")

=head1 DESCRIPTION

WARNING: this does NOT describe the scalable v2 format used
by public-inbox. Use of ssoma is not recommended for new
installations due to scalability problems.

ssoma uses a git repository to store each email as a git blob.
The tree filename of the blob is based on the SHA-1 hexdigest of
the first Message-ID header. A commit is made for each message
delivered. The commit SHA-1 identifier is used by ssoma clients
to track synchronization state.

=head1 PATHNAMES IN TREES

A Message-ID may be extremely long and may also contain slashes, so using
it as a path name is challenging. Instead, we use the SHA-1 hexdigest
of the Message-ID (excluding the leading "E<lt>" and trailing "E<gt>")
to generate a path name. Leading and trailing white space in the
Message-ID header is ignored for hashing. The first two hex digits of
the digest name a subdirectory, and the remaining 38 name the blob
within it.

A message with Message-ID of: E<lt>20131106023245.GA20224@dcvr.yhbt.netE<gt>

would be stored as: f2/8c6cfd2b0a65f994c3e1be266105413b3d3f63

Thus it is easy to look up the contents of a message matching a given
Message-ID.
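
The path rule above can be sketched in Python as follows. This is a minimal illustration, not code taken from ssoma or public-inbox; the helper name mid2path is hypothetical, and encoding the Message-ID as UTF-8 before hashing is an assumption.

```python
import hashlib


def mid2path(message_id: str) -> str:
    """Return the 2/38 tree path for a Message-ID header value.

    Leading/trailing white space is ignored, then one leading "<" and one
    trailing ">" are dropped before hashing.
    """
    mid = message_id.strip()
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    # SHA-1 hexdigest of the bare Message-ID (UTF-8 encoding is an assumption)
    hexdigest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    # First two hex digits name the subdirectory, the other 38 the blob
    return hexdigest[:2] + "/" + hexdigest[2:]


if __name__ == "__main__":
    print(mid2path(" <20131106023245.GA20224@dcvr.yhbt.net> "))
```

Comparing its output for the example Message-ID with the documented path is a quick way to check that the stripping and encoding assumptions hold.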

=head1 MESSAGE-ID CONFLICTS

public-inbox v1 repositories currently do not resolve conflicting
Message-IDs or messages with multiple Message-IDs.

=head1 HEADERS

The Message-ID header is required.
"Bytes", "Lines" and "Content-Length" headers are stripped and not
allowed, since they can interfere with further processing.
When using ssoma with public-inbox-mda, the "Status" mbox header
is also stripped, as that header makes no sense in a public archive.
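
A minimal Python sketch of these header rules, using the standard email module. The names clean_message, ALWAYS_STRIPPED, MDA_STRIPPED and the via_mda flag are hypothetical; the real public-inbox-mda is written in Perl and may differ in detail.

```python
import email
from email.message import Message

# Headers this document says are stripped from every message
ALWAYS_STRIPPED = ("Bytes", "Lines", "Content-Length")
# Stripped only when delivering through public-inbox-mda
MDA_STRIPPED = ("Status",)


def clean_message(raw: bytes, via_mda: bool = True) -> Message:
    """Parse raw mail, require a Message-ID, and drop disallowed headers."""
    msg = email.message_from_bytes(raw)
    if msg["Message-ID"] is None:
        raise ValueError("Message-ID header is required")
    names = ALWAYS_STRIPPED + (MDA_STRIPPED if via_mda else ())
    for name in names:
        # Deletes every occurrence (case-insensitive); absent headers are fine
        del msg[name]
    return msg


if __name__ == "__main__":
    raw = b"Message-ID: <a@example>\nLines: 3\nStatus: RO\nSubject: hi\n\nbody\n"
    print(clean_message(raw).as_string())
```

Deleting by name removes repeated headers too, so a message carrying two "Lines" headers loses both.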

=head1 LOCKING

An exclusive L<flock(2)> lock on the empty $GIT_DIR/ssoma.lock file
is held for all non-atomic operations.
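
The locking rule might look like this in Python (POSIX only, since it relies on fcntl.flock). The ssoma_lock context manager is a hypothetical name; only the lock file name comes from this document.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def ssoma_lock(git_dir: str):
    """Hold an exclusive flock(2) on $GIT_DIR/ssoma.lock for the with-block."""
    path = os.path.join(git_dir, "ssoma.lock")
    # O_CREAT without O_TRUNC: the file stays empty, it only carries the lock
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release it
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Because flock(2) locks are advisory, the lock only protects the repository if every writer takes it around its index and commit updates.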

=head1 EXAMPLE INPUT FLOW (SERVER-SIDE MDA)

1. A message is delivered to a mail transport agent (MTA)

1a. (optional) Reject/discard spam; this should run before ssoma-mda

1b. (optional) Reject/strip unwanted attachments

ssoma-mda handles all of the following steps once invoked.

2. The mail transport agent invokes ssoma-mda

3. Reads the message via stdin, extracting the Message-ID

4. Acquires an exclusive flock lock on $GIT_DIR/ssoma.lock

5. Creates or updates the blob at the associated 2/38 SHA-1 path

6. Updates the index and commits

7. Releases $GIT_DIR/ssoma.lock

ssoma-mda can also be used as an L<inotify(7)> trigger to monitor maildirs,
and the ability to monitor IMAP mailboxes using IDLE will be available
in the future.
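
The following Python sketch walks through steps 3 to 7 with git plumbing commands against a bare repository. It is an illustration under several assumptions, not ssoma-mda itself: the deliver() and run_git() names, the commit message (the raw Message-ID) and the placeholder committer identity are hypothetical, and $GIT_DIR/ssoma.index is used as the index file, as described for ssoma installations below.

```python
import email
import fcntl
import hashlib
import os
import subprocess
import sys

# Hypothetical placeholder identity so commit-tree works without git config
IDENT = {
    "GIT_AUTHOR_NAME": "ssoma-mda sketch",
    "GIT_AUTHOR_EMAIL": "mda@example.invalid",
    "GIT_COMMITTER_NAME": "ssoma-mda sketch",
    "GIT_COMMITTER_EMAIL": "mda@example.invalid",
}


def run_git(git_dir, *args, data=None, check=True):
    """Run git on a bare $GIT_DIR with ssoma.index as the index file."""
    env = dict(os.environ, GIT_DIR=git_dir,
               GIT_INDEX_FILE=os.path.join(git_dir, "ssoma.index"), **IDENT)
    res = subprocess.run(["git", *args], input=data, env=env,
                         stdout=subprocess.PIPE, check=check)
    return res.stdout.decode().strip() if res.returncode == 0 else None


def mid2path(mid):
    """2/38 SHA-1 path of a Message-ID with white space and <> removed."""
    mid = mid.strip()
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    hexdigest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    return hexdigest[:2] + "/" + hexdigest[2:]


def deliver(git_dir, raw):
    """Steps 3-7: store one raw message and return the new commit id."""
    mid = email.message_from_bytes(raw)["Message-ID"]  # step 3
    if mid is None:
        raise ValueError("Message-ID header is required")
    path = mid2path(mid)
    lock_fd = os.open(os.path.join(git_dir, "ssoma.lock"),
                      os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)  # step 4
        parent = run_git(git_dir, "rev-parse", "-q", "--verify",
                         "HEAD^{commit}", check=False)
        if parent and not os.path.exists(os.path.join(git_dir, "ssoma.index")):
            run_git(git_dir, "read-tree", parent)  # seed a missing index
        # Step 5: write the blob and point the 2/38 path at it
        blob = run_git(git_dir, "hash-object", "-w", "--stdin", data=raw)
        run_git(git_dir, "update-index", "--add", "--cacheinfo",
                f"100644,{blob},{path}")
        # Step 6: write the tree and commit it on top of the previous commit
        tree = run_git(git_dir, "write-tree")
        args = ["commit-tree", "-m", mid.strip()]
        if parent:
            args += ["-p", parent]
        commit = run_git(git_dir, *args, tree)
        run_git(git_dir, "update-ref", "HEAD", commit,
                *([parent] if parent else []))
    finally:
        os.close(lock_fd)  # step 7: closing the descriptor drops the flock
    return commit


if __name__ == "__main__":
    # Step 2: the MTA runs this with GIT_DIR as argv[1] and the mail on stdin
    print(deliver(sys.argv[1], sys.stdin.buffer.read()))
```

Because the whole read-parent, update-index and commit sequence runs under the lock, two concurrent deliveries cannot both build on the same parent commit and silently drop one message.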

=head1 GIT REPOSITORIES (SERVERS)

ssoma uses bare git repositories on both servers and clients.

Using the L<git-init(1)> command with --bare is the recommended method
of creating a git repository on a server:

    git init --bare /path/to/wherever/you/want.git

There are no standardized paths for servers; administrators make
all the choices regarding git repository locations.

Special files in $GIT_DIR on the server:

=over

=item $GIT_DIR/ssoma.lock

An empty file for L<flock(2)> locking.
This is necessary to ensure the index and commits are updated
consistently and that multiple processes running the MDA do not step on
each other.

=item $GIT_DIR/public-inbox/msgmap.sqlite3

SQLite3 database maintaining a stable mapping of Message-IDs to NNTP
article numbers. Used by L<public-inbox-nntpd(1)> and created
and updated by L<public-inbox-index(1)>.

Automatically updated by L<public-inbox-mda(1)>,
L<public-inbox-learn(1)> and L<public-inbox-watch(1)>.

Losing or damaging this file will cause synchronization problems for
NNTP clients. This file is expected to be stable and require no
updates to its schema.

Requires L<DBD::SQLite>.

=item $GIT_DIR/public-inbox/xapian$N/

Xapian database for search indices in the PSGI web UI.

$N is the value of PublicInbox::Search::SCHEMA_VERSION, and
installations may have parallel versions on disk during upgrades
or to roll back upgrades.

This is created and updated by L<public-inbox-index(1)>.

Automatically updated by L<public-inbox-mda(1)>,
L<public-inbox-learn(1)> and L<public-inbox-watch(1)>.

This directory can always be regenerated with L<public-inbox-index(1)>
if it is lost or damaged, so there is no need to back it up unless the
CPU/memory cost of regenerating it outweighs the storage/transfer cost.

Since SCHEMA_VERSION 15 and the development of the v2 format,
the "overview" DB also exists in the xapian directory for v1
repositories. See L<public-inbox-v2-format(5)/OVERVIEW DB>.

=item $GIT_DIR/ssoma.index

This file is no longer used or created by public-inbox, but it is
updated if it exists to remain compatible with ssoma installations.

A git index file used for MDA updates. The normal git index (in
$GIT_DIR/index) is not used at all, as there is typically no working
tree.

=back
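
To illustrate what a "stable mapping" of Message-IDs to NNTP article numbers means for the msgmap.sqlite3 file listed above, here is a Python sketch using the standard sqlite3 module. The table layout is a hypothetical one chosen for the example and is not claimed to match the real msgmap.sqlite3 schema; open_map and article_num are likewise illustrative names.

```python
import sqlite3

# Hypothetical schema for illustration only
SCHEMA = """
CREATE TABLE IF NOT EXISTS msgmap (
    num INTEGER PRIMARY KEY AUTOINCREMENT, -- NNTP article number
    mid TEXT NOT NULL UNIQUE               -- Message-ID without <>
)
"""


def open_map(path):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def article_num(db, mid):
    """Return the article number for mid, assigning the next one if new."""
    with db:  # commits the transaction on normal exit
        row = db.execute("SELECT num FROM msgmap WHERE mid = ?",
                         (mid,)).fetchone()
        if row is not None:
            return row[0]  # known Message-IDs keep their number forever
        cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
        return cur.lastrowid
```

AUTOINCREMENT stops SQLite from reusing a number after the highest row is deleted, so an NNTP client never sees one article number refer to two different messages.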

Each client $GIT_DIR may have multiple mbox/maildir/command targets.
It is possible for a client to extract the mail stored in the git
repository to multiple mboxes for compatibility with a variety of
tools.
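
A client-side sketch in Python: it appends messages to one mbox target and returns the commit SHA-1 to record as synchronization state, as described at the top of this document. The sync_to_mbox and git_out names, having the caller persist the previous commit, and copying only newly added paths are all assumptions of this example.

```python
import mailbox
import subprocess


def git_out(git_dir, *args):
    """Run a git command against git_dir and return its raw stdout bytes."""
    return subprocess.run(["git", "--git-dir", git_dir, *args],
                          stdout=subprocess.PIPE, check=True).stdout


def sync_to_mbox(git_dir, mbox_path, last_commit=None):
    """Append messages added since last_commit; return the commit to save."""
    head = git_out(git_dir, "rev-parse", "--verify",
                   "HEAD^{commit}").decode().strip()
    if head == last_commit:
        return head  # nothing new since the previous run
    if last_commit:
        # Tree-to-tree diff: only paths added between the two commits
        cmd = ["diff-tree", "-r", "--name-only", "--diff-filter=A",
               last_commit, head]
    else:
        cmd = ["ls-tree", "-r", "--name-only", head]  # first run: everything
    paths = git_out(git_dir, *cmd).decode().split()
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        for path in paths:
            box.add(git_out(git_dir, "cat-file", "blob", f"{head}:{path}"))
    finally:
        box.close()  # flushes, unlocks and closes the mbox
    return head
```

git diff-tree compares two trees directly, so this works in a bare repository without a working tree. A client with several targets would keep one saved commit per target.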

=head1 CAVEATS

It is NOT recommended to check out the working directory of a git
repository, as there may be many files.

It is impossible to completely expunge messages, even spam, as git
retains full history. Projects may (with adequate notice) cycle to new
repositories/branches with history cleaned up via L<git-filter-branch(1)>.
This is up to the administrators.

=head1 COPYRIGHT

Copyright 2013-2019 all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<http://www.gnu.org/licenses/agpl-3.0.txt>

=head1 SEE ALSO

L<gitrepository-layout(5)>, L<ssoma(1)>