Internal data structures of public-inbox
----------------------------------------

This is a guide for hackers new to our code base. Do not
consider our internal data structures stable for external
consumers; this document should be updated when internals
change. I recommend reading this document from the source tree,
with the source code easily accessible if you need examples.

This mainly documents in-memory data structures. If you're
interested in the stable on-filesystem formats, see the
public-inbox-config(5), public-inbox-v1-format(5) and
public-inbox-v2-format(5) manpages.

Common abbreviations when used outside of their packages are
documented. `$self' is the common variable name when used
within their own package.

PublicInbox::Config is the root class which loads a
public-inbox-config file and instantiates PublicInbox::Inbox,
PublicInbox::WWW, PublicInbox::NNTPD, and other top-level
classes.

Outside of tests, this is typically a singleton.

* PublicInbox::MIME - Email::MIME subclass
  Common abbreviation: $mime
  Used by: PublicInbox::WWW, PublicInbox::SearchIdx

  A representation of an entire email, multipart or not. It's
  a subclass of Email::MIME to work around bugs in old
  Email::MIME versions. An option to use libgmime or libmailutils
  may be supported in the future for performance and memory use.

  This can be a memory hog with big messages and giant
  attachments, so our PublicInbox::WWW interface only keeps
  one object of this class in memory at a time.

  In other words, this is the "meat" of the message, whereas
  $smsg (below) is just the "skeleton".

  Our PublicInbox::V2Writable class may have two objects of this
  type in memory at a time for deduplication.

* PublicInbox::SearchMsg - small message skeleton
  Used by: PublicInbox::{NNTP,WWW,SearchIdx}
  Common abbreviation: $smsg

  Represents headers shown in NNTP overview and PSGI message
  summaries (thread skeleton).

  This is loaded from either the overview DB (over.sqlite3) or
  the Xapian DB (docdata.glass), though the Xapian docdata
  won't hold NNTP-only fields (Cc:/To:).

  There may be hundreds or thousands of these objects in memory
  at a time, so fields are pruned if unneeded.

* PublicInbox::SearchThread::Msg - container for message threading
  Common abbreviation: $cont or $node
  Used by: PublicInbox::WWW

  The container we use for a non-recursive[1] variant of
  JWZ's algorithm: <https://www.jwz.org/doc/threading.html>.
  This holds a $smsg and is only used for message threading.
  This wrapper class may go away in the future, with threading
  handled directly by PublicInbox::SearchMsg to save memory.

  As with $smsg objects, there may be hundreds or thousands
  of these objects in memory at a time.

  We also do not use a linked list for storing children as JWZ
  describes, but instead a Perl hashref for {children} which
  becomes an arrayref upon sorting.

  [1] https://rt.cpan.org/Ticket/Display.html?id=116727
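
The {children} handling and the non-recursive traversal described
for PublicInbox::SearchThread::Msg can be sketched roughly as
follows. This is a simplified illustration, not the actual
PublicInbox::SearchThread code: the field names (mid, ds), the
helper subs and the example Message-IDs are all assumptions made
for this example.

```perl
use strict;
use warnings;

# Hypothetical container: {children} starts out as a hashref keyed by
# Message-ID, so adding a child is cheap and duplicates collapse.
sub new_cont {
	my ($mid, $ds) = @_;
	return { mid => $mid, ds => $ds, children => {} };
}

sub add_child {
	my ($parent, $child) = @_;
	$parent->{children}->{$child->{mid}} = $child;
}

# Sort every level by date using an explicit work queue instead of
# recursion; each {children} hashref is replaced by a sorted arrayref.
sub sort_children {
	my ($root) = @_;
	my @todo = ($root);
	while (defined(my $cont = shift @todo)) {
		my @sorted = sort { $a->{ds} <=> $b->{ds} }
				values %{$cont->{children}};
		$cont->{children} = \@sorted;
		push @todo, @sorted;
	}
}

# Walk the sorted tree depth-first with an explicit stack.
sub flatten {
	my ($root) = @_;
	my @out;
	my @stack = ([ 0, $root ]);
	while (my $item = pop @stack) {
		my ($level, $cont) = @$item;
		push @out, ('  ' x $level) . $cont->{mid};
		# reversed so the earliest child is popped first
		push @stack, map { [ $level + 1, $_ ] }
				reverse @{$cont->{children}};
	}
	return @out;
}

my $root = new_cont('root@example', 1);
my $r1 = new_cont('r1@example', 2);
my $r2 = new_cont('r2@example', 3);
my $r3 = new_cont('r3@example', 4);
add_child($root, $r2);
add_child($root, $r1);
add_child($r1, $r3);

sort_children($root);
my @lines = flatten($root);
print "$_\n" for @lines;
```

Deep threads cannot exhaust the Perl call stack this way, since
both passes keep their pending work in ordinary arrays.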

* PublicInbox::Inbox - represents a single public-inbox
  Common abbreviation: $ibx

  This represents a "publicinbox" section in the config
  file, see public-inbox-config(5) for details.

* PublicInbox::Git - represents a single git repository
  Common abbreviation: $git, $ibx->git

  Each configured "publicinbox" or "coderepo" has one of these.

* PublicInbox::Msgmap - msgmap.sqlite3 read-write interface
  Common abbreviation: $mm, $ibx->mm
  Used everywhere if SQLite is available.

  Each indexed inbox has one of these, see
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.

* PublicInbox::Over - over.sqlite3 read-only interface
  Common abbreviation: $over, $ibx->over
  Used everywhere if SQLite is available.

  Each indexed inbox has one of these, see
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.

* PublicInbox::Search - Xapian read-only interface
  Common abbreviation: $srch, $ibx->search
  Used everywhere if Search::Xapian (or Xapian.pm) is available.

  Each indexed inbox has one of these, see
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.
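
Since each inbox has one of each of these objects, accessors such
as $ibx->mm and $ibx->over lend themselves to lazy, memoized
construction. The following is a minimal sketch of that pattern
only, assuming lazy initialization: the ToyInbox package, its
fields and the open_db() helper are hypothetical stand-ins and do
not reflect the real PublicInbox::Inbox implementation.

```perl
use strict;
use warnings;

package ToyInbox; # hypothetical stand-in, not PublicInbox::Inbox

my $opened = 0; # counts how many handles were created

sub new {
	my ($class, %opt) = @_;
	return bless {
		name => $opt{name},
		inboxdir => $opt{inboxdir},
	}, $class;
}

# Stand-in for opening msgmap.sqlite3, over.sqlite3 or a Xapian DB.
sub open_db {
	my ($self, $type) = @_;
	$opened++;
	return { type => $type, dir => $self->{inboxdir} };
}

# Each accessor creates its object on first use and reuses it
# afterwards, so an inbox never holds more than one of each.
sub mm     { $_[0]->{mm}     //= $_[0]->open_db('msgmap') }
sub over   { $_[0]->{over}   //= $_[0]->open_db('over') }
sub search { $_[0]->{search} //= $_[0]->open_db('xapian') }

sub opened { return $opened }

package main;

my $ibx = ToyInbox->new(name => 'test', inboxdir => '/path/to/inbox');
my $mm = $ibx->mm;
my $same = $ibx->mm; # same object, nothing is reopened
print "handles opened: ", ToyInbox::opened(), "\n";
```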

PublicInbox::WWW
----------------

The main PSGI web interface; it uses several other packages to
form our web interface.

PublicInbox::SolverGit
----------------------

This is instantiated from the $INBOX/$BLOB_OID/s/ WWW endpoint
and represents the stages and states for "solving" a blob by
searching for and applying patches. See the code and comments
in PublicInbox/SolverGit.pm.

PublicInbox::Qspawn
-------------------

This is instantiated from various WWW endpoints and represents
the stages and states for running and managing subprocesses
in a way which won't exceed configured process limits defined
via "publicinboxlimiter.*" directives in public-inbox-config(5).
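
The idea of capping concurrent subprocesses can be sketched with a
small queueing limiter. This only illustrates the general
technique: the ToyLimiter package and its methods are hypothetical
and are not the PublicInbox::Qspawn API, and the "jobs" here are
plain callbacks rather than real subprocesses.

```perl
use strict;
use warnings;

package ToyLimiter; # hypothetical, for illustration only

sub new {
	my ($class, $max) = @_;
	return bless { max => $max, running => 0, queue => [] }, $class;
}

# Start the job now if a slot is free, otherwise queue it.
sub start {
	my ($self, $job) = @_;
	if ($self->{running} < $self->{max}) {
		$self->{running}++;
		$job->();
	} else {
		push @{$self->{queue}}, $job;
	}
}

# Called when a job finishes: free its slot, then start the next
# queued job, if any.
sub finish {
	my ($self) = @_;
	$self->{running}--;
	my $next = shift @{$self->{queue}} or return;
	$self->start($next);
}

package main;

my $limiter = ToyLimiter->new(2); # at most two jobs at once
my @started;
for my $n (1..4) {
	$limiter->start(sub { push @started, $n });
}
my $before = "@started"; # jobs 3 and 4 are still queued
$limiter->finish; # one job done, job 3 starts
$limiter->finish; # another job done, job 4 starts
my $after = "@started";
print "before: $before\nafter: $after\n";
```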

ad-hoc structures shared across packages
----------------------------------------

* $ctx - PublicInbox::WWW app request context
  This holds the PSGI $env as well as any internal variables
  used by various modules of PublicInbox::WWW.

  As with the PSGI $env, there is one per active WWW
  request+response cycle. It does not exist for idle HTTP
  clients.
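
The one-context-per-request lifetime of $ctx can be illustrated
with a bare PSGI application. This is a hedged sketch: the {env}
field holding the PSGI environment follows the description above,
while the {note} field, the show_path() handler and the hand-built
$env are assumptions for this example. No PSGI server is needed to
run it.

```perl
use strict;
use warnings;

# Hypothetical endpoint: everything it needs travels in $ctx.
sub show_path {
	my ($ctx) = @_;
	my $body = "path=$ctx->{env}->{PATH_INFO} note=$ctx->{note}\n";
	return [ 200, [ 'Content-Type' => 'text/plain' ], [ $body ] ];
}

# A PSGI app is a code reference taking $env and returning a response.
my $app = sub {
	my ($env) = @_;
	# A fresh $ctx is created for every request and goes out of
	# scope once the response is returned.
	my $ctx = { env => $env, note => 'per-request state' };
	return show_path($ctx);
};

# Simulate a single request with a minimal hand-built PSGI environment.
my $res = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/inbox/' });
print $res->[0], ' ', $res->[2]->[0];
```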

* PublicInbox::NNTP - a NNTP client socket
  Common abbreviation: $nntp
  Used by: PublicInbox::DS, public-inbox-nntpd

  Unlike PublicInbox::HTTP, all of the logic for serving NNTP
  clients is here, including what would be in $ctx on the HTTP
  or WWW side.

  There may be thousands of these since we support thousands of
  NNTP clients.

* PublicInbox::HTTP - a HTTP client socket
  Common abbreviation: $http
  Used by: PublicInbox::DS, public-inbox-httpd

  Unlike PublicInbox::NNTP, this class has no knowledge of any of
  the email or git-specific parts of public-inbox, only PSGI.
  However, it supports APIs and behaviors (e.g. streaming large
  responses) which PublicInbox::WWW may take advantage of.

  There may be thousands of these since we support thousands of
  HTTP clients.

* PublicInbox::Listener - a SOCK_STREAM listen socket (TCP or Unix)
  Used by: PublicInbox::DS, public-inbox-httpd, public-inbox-nntpd
  Common abbreviation: @listeners in PublicInbox::Daemon

  This class calls non-blocking accept(2) or accept4(2) on a
  listen socket to create new PublicInbox::HTTP and
  PublicInbox::NNTP instances.

* PublicInbox::HTTPD
  Common abbreviation: $httpd

  Represents an HTTP daemon which creates PublicInbox::HTTP
  wrappers around client sockets accepted from
  PublicInbox::Listener.

  Since the SERVER_NAME and SERVER_PORT PSGI variables need to be
  exposed for HTTP/1.0 requests when Host: headers are missing,
  there is one of these per Listener socket.

* PublicInbox::HTTPD::Async
  Common abbreviation: $async

  Used for implementing an asynchronous "push" interface for
  slow, expensive responses which may require spawning
  git-http-backend(1), git-apply(1) or other commands.
  This will also be used for dealing with future asynchronous
  operations such as HTTP reverse proxying and slow storage
  retrieval operations.

* PublicInbox::NNTPD
  Common abbreviation: $nntpd

  Represents an NNTP daemon which creates PublicInbox::NNTP
  wrappers around client sockets accepted from
  PublicInbox::Listener.

  This is currently a singleton, but it is associated with a
  given PublicInbox::Config which may be instantiated more than
  once.

* PublicInbox::ParentPipe

  Per-worker process class to detect shutdown of master process.
  This is not used if using -W0 to disable worker processes
  in public-inbox-httpd or public-inbox-nntpd.

  This is a per-worker singleton.
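
Detecting the master's exit through a pipe can be sketched as
below. This is a standalone illustration of the general pipe-EOF
technique, under the assumption that the master holds the only
write end; it is not the actual PublicInbox::ParentPipe code, and
it uses a blocking read in the worker purely for brevity.

```perl
use strict;
use warnings;

pipe(my $r, my $w) or die "pipe: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) { # worker
	close $w; # keep only the read end in the worker
	# sysread blocks until data arrives or every write end is
	# closed; a return value of 0 means EOF: the master is gone.
	my $n = sysread($r, my $buf, 1);
	exit((defined($n) && $n == 0) ? 0 : 1);
}

# master
close $r;
close $w; # stands in for the master process exiting
waitpid($pid, 0);
my $worker_status = $? >> 8;
print "worker noticed master shutdown\n" if $worker_status == 0;
```

The kernel closes the write end when the master exits for any
reason, so the worker is notified even if the master is killed
without a chance to signal its workers.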