Internal data structures of public-inbox
----------------------------------------

This is a guide for hackers new to our code base. Do not
consider our internal data structures stable for external
consumers; this document should be updated when internals
change. I recommend reading this document from the source tree,
with the source code easily accessible if you need examples.

This mainly documents in-memory data structures. If you're
interested in the stable on-filesystem formats, see the
public-inbox-config(5), public-inbox-v1-format(5) and
public-inbox-v2-format(5) manpages.

Common abbreviations when used outside of their packages are
documented. `$self' is the common variable name when used
within a package.

PublicInbox::Config
-------------------

PublicInbox::Config is the root class which loads a
public-inbox-config file and instantiates PublicInbox::Inbox,
PublicInbox::WWW, PublicInbox::NNTPD, and other top-level
structures.

Outside of tests, this is typically a singleton.

Per-message classes
-------------------

* PublicInbox::MIME - Email::MIME subclass
  Common abbreviation: $mime
  Used by: PublicInbox::WWW, PublicInbox::SearchIdx

  A representation of an entire email, multipart or not. It's
  a subclass of Email::MIME to work around bugs in old
  Email::MIME versions. An option to use libgmime or libmailutils
  may be supported in the future for performance and memory use.

  This can be a memory hog with big messages and giant
  attachments, so our PublicInbox::WWW interface only keeps
  one object of this class in memory at a time.

  In other words, this is the "meat" of the message, whereas
  $smsg (below) is just the "skeleton".

  Our PublicInbox::V2Writable class may have two objects of this
  type in memory at a time for deduplication.

* PublicInbox::Smsg - small message skeleton
  Used by: PublicInbox::{NNTP,WWW,SearchIdx}
  Common abbreviation: $smsg

  Represents headers shown in NNTP overview and PSGI message
  summaries (thread skeleton).

  This is loaded from either the overview DB (over.sqlite3) or
  the Xapian DB (docdata.glass), though the Xapian docdata
  won't hold NNTP-only fields (Cc:/To:).

  There may be hundreds or thousands of these objects in memory
  at a time, so fields are pruned if unneeded.

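As a rough illustration of pruning, the sketch below models a skeleton
as a plain hashref and drops the NNTP-only fields before it is used for
a thread summary. The field names are assumptions made up for this
example; check PublicInbox/Smsg.pm for the fields actually in use.

```perl
use strict;
use warnings;

# Field names below are illustrative assumptions, not a stable schema.
my $smsg = {
	mid => '20200101000000.1234@example.com',
	subject => 'hello world',
	from => 'Alice <alice@example.com>',
	to => 'list@example.com', # NNTP-only
	cc => 'bob@example.com', # NNTP-only
	ds => 1577836800, # Date: header as epoch seconds
};

# A thread summary has no use for To:/Cc:, so drop them to save
# memory when thousands of skeletons are held at once.
delete @$smsg{qw(to cc)};

my @kept = sort keys %$smsg;
```

With thousands of skeletons in memory, every field dropped this way is
multiplied across the whole result set.
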
* PublicInbox::SearchThread::Msg - subclass of Smsg
  Common abbreviation: $cont or $node
  Used by: PublicInbox::WWW

  The structure we use for a non-recursive[1] variant of
  JWZ's algorithm: <https://www.jwz.org/doc/threading.html>.
  Nowadays, this is a re-blessed $smsg with additional fields.

  As with $smsg objects, there may be hundreds or thousands
  of these objects in memory at a time.

  We also do not use a linked list for storing children as JWZ
  describes, but instead a Perl hashref for {children} which
  becomes an arrayref upon sorting.

  [1] https://rt.cpan.org/Ticket/Display.html?id=116727

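The re-blessing and the hashref-to-arrayref switch can be sketched as
follows. This is a simplified model, not the real implementation:
My::Smsg, My::ThreadMsg, adopt, add_child and order_children are
hypothetical names, and only the {children} key comes from the
description above.

```perl
use strict;
use warnings;

package My::Smsg; # hypothetical stand-in for a message skeleton
sub new {
	my ($class, %fields) = @_;
	bless { %fields }, $class;
}

package My::ThreadMsg; # hypothetical stand-in for a thread node
our @ISA = ('My::Smsg');

# Re-bless an existing skeleton in place and add threading fields.
sub adopt {
	my ($class, $smsg) = @_;
	$smsg->{children} = {}; # Message-ID => node, until sorted
	bless $smsg, $class;
}

sub add_child {
	my ($self, $child) = @_;
	$self->{children}->{$child->{mid}} = $child;
}

# Replace each {children} hashref with a sorted arrayref, using an
# explicit stack instead of recursion.
sub order_children {
	my ($root, $cmp) = @_;
	my @stack = ($root);
	while (defined(my $cur = pop @stack)) {
		my @sorted = sort { $cmp->($a, $b) } values %{$cur->{children}};
		$cur->{children} = \@sorted;
		push @stack, @sorted;
	}
}

package main;

my %node;
for (['root@x', 1], ['b@x', 3], ['a@x', 2], ['c@x', 4]) {
	my ($mid, $ds) = @$_;
	my $smsg = My::Smsg->new(mid => $mid, ds => $ds);
	$node{$mid} = My::ThreadMsg->adopt($smsg);
}
$node{'root@x'}->add_child($node{$_}) for ('b@x', 'a@x');
$node{'a@x'}->add_child($node{'c@x'});

my $by_date = sub { $_[0]->{ds} <=> $_[1]->{ds} };
My::ThreadMsg::order_children($node{'root@x'}, $by_date);
my @order = map { $_->{mid} } @{$node{'root@x'}->{children}};
```

Keying children by Message-ID while the tree is being built makes
duplicate insertions harmless, and the explicit stack keeps deep threads
from exhausting the Perl call stack.
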
Per-inbox classes
-----------------

* PublicInbox::Inbox - represents a single public-inbox
  Common abbreviation: $ibx

  This represents a "publicinbox" section in the config
  file; see public-inbox-config(5) for details.

* PublicInbox::Git - represents a single git repository
  Common abbreviation: $git, $ibx->git

  Each configured "publicinbox" or "coderepo" has one of these.

* PublicInbox::Msgmap - msgmap.sqlite3 read-write interface
  Common abbreviation: $mm, $ibx->mm
  Used everywhere if SQLite is available.

  Each indexed inbox has one of these; see the
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.

* PublicInbox::Over - over.sqlite3 read-only interface
  Common abbreviation: $over, $ibx->over
  Used everywhere if SQLite is available.

  Each indexed inbox has one of these; see the
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.

* PublicInbox::Search - Xapian read-only interface
  Common abbreviation: $srch, $ibx->search
  Used everywhere if Search::Xapian (or Xapian.pm) is available.

  Each indexed inbox has one of these; see the
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.

PublicInbox::WWW
----------------

PublicInbox::WWW is the main PSGI web interface; it uses several
other packages to form our web interface.

PublicInbox::SolverGit
----------------------

This is instantiated from the $INBOX/$BLOB_OID/s/ WWW endpoint
and represents the stages and states for "solving" a blob by
searching for and applying patches. See the code and comments
in PublicInbox/SolverGit.pm for details.

PublicInbox::Qspawn
-------------------

This is instantiated from various WWW endpoints and represents
the stages and states for running and managing subprocesses
in a way which won't exceed configured process limits defined
via "publicinboxlimiter.*" directives in public-inbox-config(5).

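To illustrate the general idea of a process limit, here is a minimal
sketch of a limiter which starts at most a fixed number of jobs and
queues the rest. My::Limiter and its fields are hypothetical and far
simpler than the real code, which also has to manage the subprocesses
themselves.

```perl
use strict;
use warnings;

package My::Limiter; # hypothetical, for illustration only
sub new {
	my ($class, $max) = @_;
	bless { max => $max, running => 0, run_queue => [] }, $class;
}

# $start is a callback which begins the job (e.g. spawns a process)
sub start {
	my ($self, $start) = @_;
	if ($self->{running} < $self->{max}) {
		$self->{running}++;
		$start->();
	} else {
		push @{$self->{run_queue}}, $start; # wait for a free slot
	}
}

# Called when a job finishes; starts the next queued job, if any.
sub finish {
	my ($self) = @_;
	$self->{running}--;
	if (my $next = shift @{$self->{run_queue}}) {
		$self->{running}++;
		$next->();
	}
}

package main;

my $lim = My::Limiter->new(2);
my @started;
for my $job (qw(a b c)) {
	$lim->start(sub { push @started, $job });
}
my $before = scalar @started; # only "a" and "b" have started
$lim->finish; # one job is done, so "c" starts now
```

Queuing callbacks rather than blocking keeps a single-threaded event
loop responsive while the limit is reached.
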
ad-hoc structures shared across packages
----------------------------------------

* $ctx - PublicInbox::WWW app request context
  This holds the PSGI $env as well as any internal variables
  used by various modules of PublicInbox::WWW.

  As with the PSGI $env, there is one per active WWW
  request+response cycle. It does not exist for idle HTTP
  clients.

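The lifetime described above can be pictured with a bare PSGI app
which builds a fresh context hashref for every call. This is only a
sketch: the {env} and {path} keys are assumptions chosen for the
example, not a list of what the real $ctx contains.

```perl
use strict;
use warnings;

# A minimal PSGI app; the keys stored in $ctx are illustrative.
my $app = sub {
	my ($env) = @_;
	my $ctx = { env => $env }; # lives only for this request+response
	$ctx->{path} = $env->{PATH_INFO};
	my $body = "path=$ctx->{path}\n";
	return [ 200, [ 'Content-Type' => 'text/plain' ], [ $body ] ];
};

# Calling the app directly, the way a PSGI server would:
my $res = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/test/' });
```

Since $ctx is a lexical created inside the app, nothing is allocated for
a client which is connected but has no request in flight.
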
daemon classes
--------------

* PublicInbox::NNTP - an NNTP client socket
  Common abbreviation: $nntp
  Used by: PublicInbox::DS, public-inbox-nntpd

  Unlike PublicInbox::HTTP, all of the logic for serving NNTP
  clients is here, including what would be in $ctx on the HTTP
  or WWW side.

  There may be thousands of these since we support thousands of
  NNTP clients.

* PublicInbox::HTTP - an HTTP client socket
  Common abbreviation: $http
  Used by: PublicInbox::DS, public-inbox-httpd

  Unlike PublicInbox::NNTP, this class has no knowledge of any of
  the email or git-specific parts of public-inbox, only PSGI.
  However, it supports APIs and behaviors (e.g. streaming large
  responses) which PublicInbox::WWW may take advantage of.

  There may be thousands of these since we support thousands of
  HTTP clients.

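The PSGI specification lets an app return a callback instead of a
complete response, which is one way a server can stream a large
response piece by piece. A minimal sketch of that form follows;
My::Writer is a toy stand-in for the writer object a PSGI server would
provide and is not part of public-inbox.

```perl
use strict;
use warnings;

# PSGI app using the delayed/streaming response form
my $app = sub {
	my ($env) = @_;
	return sub {
		my ($responder) = @_;
		my $hdr = [ 'Content-Type' => 'text/plain' ];
		# omitting the body asks the server for a writer object
		my $writer = $responder->([ 200, $hdr ]);
		$writer->write("chunk $_\n") for (1..3);
		$writer->close;
	};
};

package My::Writer; # toy stand-in for a server-provided writer
sub new { bless { buf => [], closed => 0 }, shift }
sub write { push @{$_[0]->{buf}}, $_[1] }
sub close { $_[0]->{closed} = 1 }

package main;

my $writer = My::Writer->new;
my $status;
my $cb = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/' });
$cb->(sub { $status = $_[0]->[0]; $writer });
```

A real server would send each written chunk to the client socket as
buffer space allows instead of collecting the chunks in an array.
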
* PublicInbox::Listener - a SOCK_STREAM listen socket (TCP or Unix)
  Used by: PublicInbox::DS, public-inbox-httpd, public-inbox-nntpd
  Common abbreviation: @listeners in PublicInbox::Daemon

  This class calls non-blocking accept(2) or accept4(2) on a
  listen socket to create new PublicInbox::NNTP and
  PublicInbox::HTTP instances.

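For readers unfamiliar with non-blocking accept(2), the sketch below
shows its behavior on a Unix stream socket using only core Perl
modules. It is a standalone demonstration, not code from this project:
IO::Select stands in for the event loop a real daemon would use, and
the socket path is a throwaway temporary file.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;
use IO::Select;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $path = "$dir/sock";
my $lsn = IO::Socket::UNIX->new(Listen => 128, Local => $path,
				Type => SOCK_STREAM())
	or die "listen: $!";
$lsn->blocking(0);

# No client is waiting yet, so accept fails right away with EAGAIN
# instead of putting the process to sleep.
my $would_block = !accept(my $none, $lsn) && $!{EAGAIN};

my $client = IO::Socket::UNIX->new(Peer => $path, Type => SOCK_STREAM())
	or die "connect: $!";

# Wait (up to 5s) until the listen socket is readable, then accept.
my @ready = IO::Select->new($lsn)->can_read(5);
my $conn;
my $accepted = @ready && defined(accept($conn, $lsn));
```

Because the listen socket never blocks, one process can keep servicing
existing clients while waiting for new ones.
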
* PublicInbox::HTTPD
  Common abbreviation: $httpd

  Represents an HTTP daemon which creates PublicInbox::HTTP
  wrappers around client sockets accepted from
  PublicInbox::Listener.

  Since the SERVER_NAME and SERVER_PORT PSGI variables need to be
  exposed for HTTP/1.0 requests when Host: headers are missing,
  there is one of these per Listener socket.

* PublicInbox::HTTPD::Async
  Common abbreviation: $async

  Used for implementing an asynchronous "push" interface for
  slow, expensive responses which may require spawning
  git-http-backend(1), git-apply(1) or other commands.
  This will also be used for dealing with future asynchronous
  operations such as HTTP reverse proxying and slow storage
  retrieval operations.

* PublicInbox::NNTPD
  Common abbreviation: $nntpd

  Represents an NNTP daemon which creates PublicInbox::NNTP
  wrappers around client sockets accepted from
  PublicInbox::Listener.

  This is currently a singleton, but it is associated with a
  given PublicInbox::Config which may be instantiated more than
  once.

* PublicInbox::ParentPipe

  Per-worker process class to detect shutdown of the master
  process. This is not used when -W0 disables worker processes
  in public-inbox-httpd or public-inbox-nntpd.

  This is a per-worker singleton.

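One common way for a worker to notice that its master is gone is to
hold the read end of a pipe whose write end only the master keeps
open. The sketch below demonstrates that mechanism in isolation; it is
an assumption about the general technique rather than a copy of this
class, and a real worker would watch the pipe from its event loop
instead of blocking in sysread.

```perl
use strict;
use warnings;
use POSIX ();

pipe(my $r, my $w) or die "pipe: $!";
my $pid = fork;
defined($pid) or die "fork: $!";
if ($pid == 0) { # worker process
	close $w; # only the master may hold the write end
	# The master never writes, so this returns 0 (EOF) only
	# once every copy of the write end is closed.
	my $n = sysread($r, my $buf, 1);
	POSIX::_exit((defined($n) && $n == 0) ? 0 : 1);
}

# master process
close $r;
close $w; # stands in for the master exiting
waitpid($pid, 0);
my $worker_saw_eof = ($? == 0);
```

The kernel closes the write end even if the master is killed, so the
worker sees EOF without any cooperation from the master.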