The TDL relies on index files to gain high performance on very slow vintage
architecture. Index files assist with the following functions:
- Mapping long titles to 8.3 DOS filenames
- Identifying long titles with a short, unique hash (useful for
tracking "Favorites" across multiple index and file loads)
- Implementing a fast "search as you type" interface
The remainder of this text file documents the file format of the index
files generated by the TDL indexer.
All values are little-endian and unsigned unless otherwise noted.
This specification is current as of 20180701.
=====================================================================
Title Index format:
numEntries: 16-bit word of how many titles we have available
REPEAT (This structure repeats numEntries times)
titleOfs: 32-bit word of offset where each variable-length
record starts
END
REPEAT (This structure repeats numEntries times)
titleID: 16-bit word
titleHash: 16 bytes of the MD5 hash of the title string
titleLen: 1 byte of length of title string
titleStr: titleLen characters of title string
END
Purpose:
Contains the list of titles available for launching.
Implementation notes:
titleID isn't necessary but is included for error-checking.
titleHash is used to uniquely identify the title in a smaller form that
can be tracked across multiple file/index loads. Currently it is a hash
of the title string, but later analysis might change it to something
else (like a hash of the .zip file contents, or a MobyGames gameid).
titleLen is only used for printing the string from memory; it is not used
to decide how many bytes to read from disk. Reading the length first would
require an additional seek operation, and we're trying to minimize seeks on
vintage hardware. Loading a title record will just be a straight 256-byte
load, and the record fragment that comes after what we want will simply be
ignored.
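As a rough illustration of the layout above, the following Python sketch
packs and reads a title index. It assumes that titleOfs is an absolute
offset from the start of the file, that titleID equals the record's
position, and that titles are plain ASCII; none of these is stated by this
specification, and the function names are hypothetical.

```python
import hashlib
import struct


def pack_title_index(titles):
    """Build a title index in memory (hypothetical helper for the example)."""
    records = []
    for title_id, title in enumerate(titles):
        raw = title.encode("ascii")          # assumption: ASCII titles
        records.append(
            struct.pack("<H", title_id)      # titleID
            + hashlib.md5(raw).digest()      # titleHash (16 bytes)
            + bytes([len(raw)])              # titleLen (1 byte)
            + raw                            # titleStr
        )
    # Assumption: offsets are absolute, counted from the start of the file.
    offset = 2 + 4 * len(records)
    table = []
    for record in records:
        table.append(struct.pack("<I", offset))
        offset += len(record)
    return struct.pack("<H", len(records)) + b"".join(table) + b"".join(records)


def read_title(buf, index):
    """Return (titleID, titleHash, title) for the index-th record."""
    (num_entries,) = struct.unpack_from("<H", buf, 0)
    if not 0 <= index < num_entries:
        raise IndexError("title index out of range")
    (ofs,) = struct.unpack_from("<I", buf, 2 + 4 * index)
    (title_id,) = struct.unpack_from("<H", buf, ofs)
    title_hash = buf[ofs + 2:ofs + 18]
    title_len = buf[ofs + 18]
    title = buf[ofs + 19:ofs + 19 + title_len].decode("ascii")
    return title_id, title_hash, title
```

A reader could compare the returned titleID against the requested index as
the error check mentioned in the notes above.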
=====================================================================
Files Index format:
numEntries: 16-bit word of how many files we have available
REPEAT (This structure repeats numEntries times)
fileID: 16-bit word
fileStr: 12-byte string, including dot ("."), null-padded
END
Purpose:
Contains titleID-to-filename mappings.
Implementation Notes:
The dot (".") is redundant, but is included to ensure the entries are
always aligned to word boundaries for speed optimization.
(Vintage systems are slow; we're going to need all the help we can get.)
numEntries and fileID are redundant, but are included for error-checking
purposes.
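The fixed-size records make lookup a simple multiplication. The Python
sketch below shows one way to read a filename for a given titleID. It
assumes that fileID equals both the record position and the titleID, which
this specification implies but does not state outright; the function name
is hypothetical.

```python
import struct

# fileID (16-bit word) followed by the 12-byte, null-padded 8.3 filename.
FILE_RECORD = struct.Struct("<H12s")


def read_filename(buf, title_id):
    """Return the 8.3 filename stored for title_id."""
    (num_entries,) = struct.unpack_from("<H", buf, 0)
    if not 0 <= title_id < num_entries:
        raise IndexError("titleID out of range")
    file_id, raw = FILE_RECORD.unpack_from(buf, 2 + title_id * FILE_RECORD.size)
    # Assumption: fileID mirrors the record position, so it can be used
    # as the error check mentioned in the notes.
    if file_id != title_id:
        raise ValueError("fileID does not match record position")
    return raw.rstrip(b"\0").decode("ascii")
```

Each record is 14 bytes, so with the 2-byte header every record starts on
an even offset.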
=====================================================================
Wordlist index format:
numEntries: 16-bit word of how many words are in the list
REPEAT (This structure repeats numEntries times)
sWord: 16-byte uppercase character sequence
END
Purpose:
Contains all recognized search words/terms.
Implementation Notes:
All words present in all title strings will be converted to uppercase,
truncated to 16 characters, made unique (i.e. duplicates will be
removed), null-padded to 16 bytes so that entries align to 8086
paragraph boundaries, and sorted in ASCII order. This arrangement will
allow for quick searching of words via binary search. Each word's offset
from the beginning of the wordlist serves as its ID.
numEntries is redundant, but is included for error-checking purposes.
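The binary search described above can be sketched in Python as follows.
The sketch returns the zero-based entry index as the word ID; whether the
launcher uses the entry index or the byte offset as the ID is not stated
here, so treat that as an assumption. All function names are hypothetical.

```python
import struct

WORD_SIZE = 16  # one 8086 paragraph per word


def normalize(word):
    """Uppercase, truncate to 16 characters, and null-pad to 16 bytes."""
    return word.upper().encode("ascii")[:WORD_SIZE].ljust(WORD_SIZE, b"\0")


def pack_wordlist(words):
    """Build a wordlist: unique entries in ASCII order after the count."""
    entries = sorted({normalize(w) for w in words})
    return struct.pack("<H", len(entries)) + b"".join(entries)


def find_word(buf, word):
    """Binary-search the wordlist; return the entry index or None."""
    (num_entries,) = struct.unpack_from("<H", buf, 0)
    key = normalize(word)
    lo, hi = 0, num_entries
    while lo < hi:
        mid = (lo + hi) // 2
        start = 2 + mid * WORD_SIZE
        entry = buf[start:start + WORD_SIZE]
        if entry < key:
            lo = mid + 1
        elif entry > key:
            hi = mid
        else:
            return mid
    return None
```

Because the padding byte is zero, comparing the raw 16-byte entries gives
the same ordering as comparing the unpadded words.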
=====================================================================
Search Word Locations index format:
(This is a mapping table that returns all of the titleIDs that contain a
particular search word. At each mappingOfs, mappingLen titleIDs follow.)
numEntries: 16-bit word of how many word mappings we have available
REPEAT (This structure repeats numEntries times)
mappingOfs: 32-bit word of offset where each variable-length
record starts
mappingLen: 16-bit word of how many title match records follow
END
REPEAT (This structure repeats numEntries times, once at each mappingOfs)
titleID: 16-bit word denoting a titleID that contains the word
(titleID, titleID, titleID, etc. for mappingLen times)
titleEndMarker: 16-bit value 0xFFFF (used for error-checking)
END
Purpose:
Contains searchwords-to-title mappings.
Implementation Notes:
Lists of titles to display will be based on combinations of the
information found in this index file.
To minimize IO activity, only the titleIDs are included, and not the
word's found position in the title string. Located words will be found
and highlighted in the launcher running in DOS. (This decision may be
revisited at a later date if the highlighting process is found to be too
slow in DOS.)
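A minimal Python sketch of reading this mapping table follows. It assumes
mappingOfs is an absolute offset from the start of the file and that a
multi-word search intersects the per-word title lists; the text above only
says that title lists are "based on combinations" of this data, so the
intersection is an illustrative choice. Function names are hypothetical.

```python
import struct


def pack_locations(mappings):
    """Build the index from a list of titleID lists, one list per word ID."""
    offset = 2 + 6 * len(mappings)       # header plus (mappingOfs, mappingLen) table
    table, body = [], []
    for ids in mappings:
        table.append(struct.pack("<IH", offset, len(ids)))
        record = struct.pack("<%dH" % (len(ids) + 1), *ids, 0xFFFF)
        body.append(record)
        offset += len(record)
    return struct.pack("<H", len(mappings)) + b"".join(table) + b"".join(body)


def titles_for_word(buf, word_id):
    """Return the titleIDs that contain the word with the given ID."""
    (num_entries,) = struct.unpack_from("<H", buf, 0)
    if not 0 <= word_id < num_entries:
        raise IndexError("word ID out of range")
    ofs, count = struct.unpack_from("<IH", buf, 2 + 6 * word_id)
    ids = struct.unpack_from("<%dH" % (count + 1), buf, ofs)
    if ids[-1] != 0xFFFF:                # titleEndMarker error check
        raise ValueError("missing titleEndMarker")
    return list(ids[:-1])


def titles_for_all_words(buf, word_ids):
    """Assumed combination rule: titles that contain every given word."""
    sets = [set(titles_for_word(buf, w)) for w in word_ids]
    return sorted(set.intersection(*sets)) if sets else []
```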
=====================================================================
Favorites export/import format:
numEntries: 16-bit word of how many favorites are in the file
REPEAT (This structure repeats numEntries times)
titleHash: 16 bytes of the MD5 hash of the title string
END
Purpose:
Contains the list of games the user has "favorited".
Implementation Notes:
The actual favorites list is just an array of bitflags that, if
modified, is saved to disk before each handle() call. However, a user
may want to reload the list of files in the future while still retaining
their favorites. Since the titleIDs will change if this happens, and
our existing favorites won't map to the new indexes, we offer the user a
way to export their favorites and then later import them.
One future improvement is to eliminate the need for export/import by
storing the titleIDs in a block before the titleHashes; that way, the
titleID could be updated whenever a new index load is detected.
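The export/import round trip can be sketched in Python as below. The
sketch assumes the caller already has the list of titleHash values in
titleID order (for example, read from the title index) and that bit 0 of
each metadata flag byte marks a favorite, as in the metadata cache. The
function names are hypothetical.

```python
import hashlib
import struct

FLAG_FAVORITE = 0x01


def export_favorites(title_hashes, flags):
    """Write the hashes of all titles whose favorite bit is set."""
    favorites = [h for h, f in zip(title_hashes, flags) if f & FLAG_FAVORITE]
    return struct.pack("<H", len(favorites)) + b"".join(favorites)


def import_favorites(buf, title_hashes):
    """Return one boolean per title in the new index: is it a favorite?"""
    (num_entries,) = struct.unpack_from("<H", buf, 0)
    wanted = {buf[2 + 16 * i:18 + 16 * i] for i in range(num_entries)}
    return [h in wanted for h in title_hashes]
```

Because matching is done on the hash rather than the titleID, favorites
survive a reload in which the titles are renumbered.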
=====================================================================
Metadata cache:
numEntries: 16-bit word of how many bitflags are in the file
REPEAT (This structure repeats numEntries times)
flags: 1 byte containing flags
END
Flags contained in each byte (bit positions):
76543210
0000000x: Title is marked as a Favorite
000000x0: Title was unpacked into data cache at some point
xxxxxx00: Reserved for future expansion
Purpose:
Contains metadata pertinent to each Title, such as whether or not it is
marked as a "favorite", or how many times it has been run.
Implementation Notes:
Metadata is OR'd into a byte or AND'd out of a byte as necessary. Data is
stored in TITLES.DAT which is flushed to disk before every execution operation,
or at normal program exit.
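The OR/AND handling of the flag byte might look like the following Python
sketch. The constant and function names are hypothetical; only the bit
assignments come from the table above.

```python
import struct

FLAG_FAVORITE = 0x01   # bit 0: title is marked as a Favorite
FLAG_UNPACKED = 0x02   # bit 1: title was unpacked into the data cache


def set_flag(flags, title_id, mask):
    """OR a flag into the byte for title_id."""
    flags[title_id] |= mask


def clear_flag(flags, title_id, mask):
    """AND a flag out of the byte for title_id."""
    flags[title_id] &= ~mask & 0xFF


def pack_metadata(flags):
    """Serialize: 16-bit count followed by one flag byte per title."""
    return struct.pack("<H", len(flags)) + bytes(flags)


def unpack_metadata(buf):
    """Parse the cache back into a mutable array of flag bytes."""
    (num_entries,) = struct.unpack_from("<H", buf, 0)
    data = buf[2:2 + num_entries]
    if len(data) != num_entries:
        raise ValueError("metadata cache is truncated")
    return bytearray(data)
```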
=====================================================================
Colors specification:
Array of unsigned 8-bit byte values, each of which has the following
structure:
76543210
bbbb     - background text color
    ffff - foreground text color
UI attributes to be specified at a later date.
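As a small illustration, the Python sketch below splits and builds such a
byte. It assumes the background color occupies bits 7-4 and the foreground
color bits 3-0, which matches the usual PC text-mode attribute layout but
should be checked against the launcher itself. Function names are
hypothetical.

```python
def split_attribute(attr):
    """Return (background, foreground) from one color byte."""
    return (attr >> 4) & 0x0F, attr & 0x0F


def make_attribute(background, foreground):
    """Combine two 4-bit colors into one color byte."""
    if not (0 <= background <= 15 and 0 <= foreground <= 15):
        raise ValueError("colors must fit in 4 bits")
    return (background << 4) | foreground
```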
=====================================================================
Title audit log (issue #27):
Two data files make up TDL's auditing function:
C:\ACTIVITY.DAT:
activity: One byte for every minute of sampled activity:
FF = Start of file
00 = no activity for that minute
01 = activity detected for that minute
AUDIT.DAT:
REPEAT (This structure repeats for as many audit entries as are in the log)
titleID: 16-bit word
titleHash: 16 bytes of the MD5 hash of the title string
(future use; could be used to migrate audit entries)
startTime: 12-byte Pascal DateTime record (16-bit unsigned words):
DateTime = record
Year,Month,Day,Hour,Min,Sec: Word;
end;
endTime: (same format as startTime)
minsActive: 16-bit unsigned word of minutes (from ACTIVITY.DAT)
If 0, entry is unconfirmed; if non-zero, confirmed
END
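Each AUDIT.DAT entry as described above is a fixed 44 bytes (2 + 16 + 12 +
12 + 2), so the log can be read with a single record format. The Python
sketch below is one possible reader. The dictionary keys and function name
are hypothetical; the two derived fields use the rules that an end time
equal to the start time means the end was never recorded, and that a
minsActive of $FFFF means activity could not be determined.

```python
import struct

# titleID, titleHash, startTime (6 words), endTime (6 words), minsActive
AUDIT_RECORD = struct.Struct("<H16s6H6HH")


def read_audit_log(buf):
    """Parse a whole AUDIT.DAT image into a list of dictionaries."""
    entries = []
    for fields in AUDIT_RECORD.iter_unpack(buf):
        start, end, mins = fields[2:8], fields[8:14], fields[14]
        entries.append({
            "title_id": fields[0],
            "title_hash": fields[1],
            "start": start,              # (Year, Month, Day, Hour, Min, Sec)
            "end": end,
            "mins_active": mins,
            "end_recorded": end != start,
            "activity_known": mins != 0xFFFF,
        })
    return entries
```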
Purpose: (From github MobyGamer/total-dos-launcher/Issue #27)
TDL is in use at various vintage computing festivals and conventions. To
better understand the needs of the public using TDL at these settings, TDL
should be able to provide an audit report after the event. Auditing would
collect the following information:
- timestamp of when a program was launched
- timestamp of when a program returned to TDL, including:
- Confirmed (program exited normally and returned to TDL)
- Unconfirmed (system was rebooted, TDL crashed)
- Each minute the program was actively in use
_(Program activity will be collected by a TSR loaded before TDL. Program
activity is not possible to collect reliably, but one possible implementation
could involve hooking the system timer 18Hz interrupt to wake up once a minute
to check keyboard buffer contents and mouse X/Y location, and if either has
changed since the last sample, note that minute as active. Joystick activity collection
is not possible because joystick port reads are done with interrupts disabled,
although joystick buttons can be read.
As for recording this information, it must be done by a TSR loaded before TDL,
as TDL cannot install any resident code due to its use of swap. The TSR will
communicate via the multiplex interrupt, and hook the keyboard and mouse
interrupts.)_
An audit report can then be generated after the session is over, and would
display the following:
- All programs launched, sorted by launch frequency (i.e. # of times launched)
- All programs launched, sorted by program duration in minutes
- A human-readable log of the entire session (program, start time, end time,
active minutes)
_Note: It is implied that TDL will be in the system's C:\AUTOEXEC.BAT for this
to provide meaningful results. Also, if the system date and time are not set
properly at every system boot from a battery-backed clock, only the launch
counts and activity duration will be accurate._
Implementation:
- Auditing will be enabled via AUDITING=ON in TDL.INI
- AUDIT.DAT with titleID, titleHash, start time, end time, number of minutes
active, confirmed
- New entry added as a game is launched, and updated upon return or a new
start of TDL
- An end time equal to the start time means the end time was never recorded
and must be adjusted; this would be the case if the user rebooted the
computer instead of exiting a program cleanly.
- A number of minutes equal to $FFFF means program activity could not be
determined
- C:\ACTIVITY.DAT with the activity log, one byte per minute
- Is recreated on every new program launch
- 0=not active, 1=active, FF=start of collection, any other value is ignored
- Does nothing if INDOS is set, to prevent crashes
- TDL -s to generate a summary report of whatever is in the AUDIT.DAT
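Turning the C:\ACTIVITY.DAT byte stream into a minsActive value could be
as simple as the Python sketch below. It counts 01 bytes as active
minutes, 00 bytes as idle minutes, and ignores FF and every other value,
as listed above. The function name is hypothetical.

```python
def summarize_activity(data):
    """Return (sampled_minutes, active_minutes) for an ACTIVITY.DAT image."""
    idle = data.count(0x00)      # minutes with no activity
    active = data.count(0x01)    # minutes with activity detected
    return idle + active, active
```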