PREPARING DATA FOR MAPMAKER VERSION 3.0 AND MAPMAKER/QTL VERSION 1.1
(c) Copyright 1992 Whitehead Institute for Biomedical Research
Data files are fully compatible between MAPMAKER Version 3.0 and MAPMAKER/QTL
version 1.1, providing a unified package for mapping both genetic markers as
well as factors controlling quantitative traits in the same populations.
MAPMAKER Version 3.0 can analyze data derived from progeny of several types of
crosses, including:
F2 backcross (e.g. BC1)
F2 intercross
F3 intercross (by self-mating)
Recombinant Inbred Lines (by self or sib-mating)
There are also other types of crosses for which MAPMAKER and MAPMAKER/QTL can
be used, because the genetic model for the cross is identical to that of one of
the above simple crosses. For example, F2 testcrosses and F1 haploid data can
be used, as described below.
Unlike MAPMAKER, however, MAPMAKER/QTL can currently work only with F2
intercross and backcross data. The two programs handle loading and preparing
data files in similar ways, and share files which hold intermediate results.
To get your data into MAPMAKER and MAPMAKER/QTL, the data must first be placed
into a 'raw' file in an appropriate format. You can either maintain your data
in this format, or instead extract it from your working database (such as a
spreadsheet program). MAPMAKER (not MAPMAKER/QTL) must then be used to
'prepare' these files into a processed form ready for analysis; these processed
files are then loadable by either MAPMAKER or MAPMAKER/QTL. These issues are
the topic of the next sections.
SETTING UP A RAW DATA FILE
Raw files are flat ASCII text files which may be generated in many ways,
including: (i) any simple text editor, such as DOS Edit, A/UX's Text Editor, or
Sun's OpenWindows Text Editor; (ii) a word-processor which can export text-only
files; (iii) a spreadsheet or flat-file database which can export "Text Only"
files, such as Excel, Lotus 1-2-3, or FileMaker; or (iv) a program which you
write yourself. The raw data does not need to be stored on the same machine
that you run MAPMAKER on, although you obviously will need some way of
transferring the data. (Bear in mind that text-only formats are very slightly
different on Unix, DOS, and Macintosh -- your software should convert the file
appropriately as it is transferred. Ask your computer support people for
details.)
As a general note, MAPMAKER attempts to be very lenient about how you separate
items in a data file (e.g. spaces, tabs, or sometimes line breaks), and is
generally insensitive to extra spaces, uppercase-lowercase distinctions, and
(after the top two lines) blank lines. However, it is still possible to format
a file in such a way that it confuses MAPMAKER -- if you have trouble, try to
make your MAPMAKER file look more like the included sample file.
The very first line of your raw data file should read like:
data type xxxx
where xxxx is one of the allowed data types, either:
f2 intercross
f2 backcross
f3 self
ri self
ri sib
The second line of the raw file should contain a list of three numbers,
separated by spaces, such as:
46 362 2
The first of these values indicates the number of progeny for which data are
included in the file (in this case, 46). The second indicates the number of
genetic loci for which data are supplied (362). The third indicates the number
of quantitative traits in the data set (here 2, although this may be zero, of
course).
Additional information may be optionally supplied at the end of this line. In
particular, you may specify the coding scheme you use for genotypes. By
default, the codes used for F2 backcross (a.k.a. BC1) data are:
'A' Homozygote for the recurrent parent genotype.
'H' Heterozygote.
'-' Missing data for the individual at this locus.
For F2 intercross data, the default codes are:
'A' Homozygote for the allele from parental strain a of this locus.
'B' Homozygote for the allele from parental strain b of this locus.
'H' Heterozygote carrying both alleles a and b.
'C' Not a homozygote for allele a (either bb or ab genotype.)
'D' Not a homozygote for allele b (either aa or ab genotype.)
'-' Missing data for the individual at this locus
For RI data, the default codes are:
'A' Homozygote for parental genotype a.
'B' Homozygote for parental genotype b.
'-' Missing data for the individual (or line) at this locus.
Also by default, MAPMAKER will match genotype characters in a case-insensitive
manner (that is 'a' and 'A' indicate the same genotypes).
However, you can tell MAPMAKER to use whatever conventions you like, so long as
you use the same conventions for the entire data file. First off, if you follow
the numbers on the second line with the word "case", then MAPMAKER will match
genotype characters in a case sensitive manner (that is 'a' and 'A' can be used
to indicate different genotypes). For example:
46 362 2 case
If you do not wish to use case-sensitive genotypes, do not include the word
"case".
To specify the coding scheme itself, include on the end of the above line the
word "symbols" followed by the coding scheme you wish to use, defined in terms
of the coding scheme above. For example, if you wish to use the following
scheme with an RI data set:
'1' Homozygote for parental genotype a.
'2' Homozygote for parental genotype b.
'0' Missing data for the individual (or line) at this locus.
then you would use a second line like:
46 362 2 symbols 1=A 2=B 0=-
Note that when interpreting this line, MAPMAKER is in fact quite finicky about
spaces and case distinctions (in order to keep MAPMAKER from ever
misunderstanding exactly what you mean). In particular, NO SPACES should
surround the "=" signs.
To use the following scheme with a backcross data set:
'a' Homozygote for parental genotype a.
'A' Heterozygote.
'-' Missing data for the individual (or line) at this locus.
you should use a line like:
46 362 2 case symbols a=A A=H
The main restriction on coding schemes is that the only allowed symbols are
letters, numbers, and the characters '-' and '+'.
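
As an illustration of the rules above, here is a minimal Python sketch that
reads the second header line of a raw file. It is not part of MAPMAKER: the
function name parse_counts_line and the keys of the result are hypothetical,
and the sketch assumes that "case", when present, comes before "symbols", as
in the examples above.

```python
import re

# Default codes that user-defined symbols may be mapped onto (from the lists above).
DEFAULT_CODES = {"A", "B", "H", "C", "D", "-"}
# A user symbol is assumed to be one letter, digit, '-' or '+'.
SYMBOL_OK = re.compile(r"[A-Za-z0-9+-]")


def parse_counts_line(line):
    """Parse e.g. '46 362 2 case symbols a=A A=H' (hypothetical helper)."""
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError("expected three numbers: progeny, loci, traits")
    n_progeny, n_loci, n_traits = (int(t) for t in tokens[:3])

    rest = tokens[3:]
    # Assumption: "case", if given, is the first optional word.
    case_sensitive = bool(rest) and rest[0] == "case"
    if case_sensitive:
        rest = rest[1:]

    symbols = {}
    if rest:
        if rest[0] != "symbols":
            raise ValueError("unexpected token %r" % rest[0])
        for pair in rest[1:]:
            # No spaces may surround '=', so each pair is a single token.
            user, sep, default = pair.partition("=")
            if sep != "=" or not SYMBOL_OK.fullmatch(user) \
                    or default not in DEFAULT_CODES:
                raise ValueError("bad symbol definition %r" % pair)
            symbols[user] = default

    return {"progeny": n_progeny, "loci": n_loci, "traits": n_traits,
            "case": case_sensitive, "symbols": symbols}
```

A definition written with spaces around the "=" sign (such as "1 = A") is
rejected by this sketch, mirroring the restriction described above.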
After the first two header lines, the raw file should then present the genetic
locus data, in the following simple format: For each locus, you list (1) the
name of the locus, preceded by an asterisk ("*"); (2) one or more spaces (or
tabs etc.); and (3) the genotypic data for all individuals, in order. For
example:
*locus1 BA-HHHAAABBB-HHAA
would provide data for a locus named "locus1" with individual #1 having the B
genotype, individual #2 having the A genotype, and so forth. Data for each new
locus should begin on a new line (with blank lines allowed), although the
genetic data for any one locus may be "broken" by any number of spaces, tabs,
and line breaks. This means that, among other things, tab-delimited-text files
(such as those often exported by spreadsheet programs) will work well, for
example:
*L2 B A - H H H A A A B B B - H
There is a system-dependent maximum line length, although it is fairly large
(at least 1,000 characters, where a tab counts as one character).
Locus names should be kept to at most 8 characters, and must be limited to
alphabetic and numeric characters, along with the underscore character ('_')
and periods ('.'). No other characters are allowed (although any dashes in
locus names ('-') will be converted to underscores). Locus names must start
with an alphabetic character (so that they are not confused with locus numbers
in MAPMAKER sequences).
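
If you generate raw files with your own program, you may want to check locus
names before writing them. The following Python sketch applies the naming
rules just described; the function name normalize_locus_name is hypothetical,
and MAPMAKER's own checks may differ in detail.

```python
import re

# Letters, digits, '_' and '.', starting with a letter (rules from the text above).
NAME_OK = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def normalize_locus_name(name):
    """Return a locus name that follows the rules above, or raise ValueError."""
    # Dashes are converted to underscores, as MAPMAKER is described as doing.
    name = name.replace("-", "_")
    if len(name) > 8:
        raise ValueError("locus name longer than 8 characters: %r" % name)
    if not NAME_OK.fullmatch(name):
        raise ValueError("invalid locus name: %r" % name)
    return name
```

For instance, normalize_locus_name("TG50-B") gives "TG50_B", while a name
that starts with a digit is rejected.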
Any quantitative trait data should come after the genetic locus data. These
data follow a similar format, except that the trait values for each individual
must be separated by at least one space, tab, or line break. A dash ('-') alone
indicates missing data. For example:
*weight 6.3 7.7 8.0 6.2 8.6 - 7.5 9.0 5.5 - - 8.4 7.7 7.4 6.9 -
would correspond to a trait named "weight", for which individual #1 has a value
of 6.3, individual #2 has a value of 7.7, and so on. The sixth individual is
missing data for this trait (and will be ignored for all analyses involving
these trait data). As for the genotypes, a new trait should begin on a new
line, and line breaks are allowed. Tab-delimited-text files work well here too.
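
To make the trait format concrete, here is a small Python sketch that reads
one trait entry into a name and a list of values, with None standing in for
missing data. The function name parse_trait_entry is hypothetical, and the
sketch handles only plain numeric traits, not traits defined by an equation.

```python
def parse_trait_entry(text):
    """Split '*name v1 v2 ...' into (name, values); '-' becomes None."""
    # split() treats spaces, tabs and line breaks alike as separators.
    tokens = text.split()
    if not tokens or not tokens[0].startswith("*"):
        raise ValueError("a trait entry must start with '*name'")
    name = tokens[0][1:]
    values = [None if t == "-" else float(t) for t in tokens[1:]]
    return name, values
```

Applied to the "weight" line above, this yields 16 values, of which the
sixth is None.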
Traits may also be specified as functions of other existing trait data. For
example:
*weight1 6.3 7.7 8.0 6.2 8.6 6.9 7.5 9.0
*weight2 6.7 7.9 7.5 6.8 8.0 7.3 7.5 9.5
*mean= (weight1 + weight2)/2
The format of these equations is described under the "make trait" command. Such
traits must be included in the number of traits indicated on the file's second
line.
Note that genetic maps (particularly for MAPMAKER/QTL) are no longer included
in the raw file, as they were with MAPMAKER Version 2.0. Instead, use a ".prep"
initialization file, described below.
Finally, note that comments may be inserted on any line starting with a number
sign character ("#").
An example of a complete raw file follows:
data type f2 intercross
20 5 2
# Joe's tiny data set, 10/21 version.
*locus1 BBBHH-AAABBBHHH-AABA
*locus2 AB-ABHABHAB-ABHABHBH
*locus3 ABBAHHHBHABHABHBBHH-
# Locus3 may be mis-scored in individual 12!
*locus4 ABHABAAAHAB-ABHABHHB
*locus5 ABHABHAA-ABHABHAHHHB
*trait1 6.3 7.7 8.0 6.2 8.8 6.2 4.1 6.5 5.4 7.3
8.7 9.0 5.2 6.8 7.2 7.1 7.6 8.3 8.1 7.5
*trait2 5.5 5.5 5.5 4.5 4.5 4.5 3.5 3.5 3.5 -
5.5 5.5 4.5 4.5 4.5 3.5 5.2 6.8 7.2 7.1
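
Before preparing a file, it can be useful to confirm that every entry has as
many values as the header promises. The Python sketch below is one way to do
this outside MAPMAKER. The function name check_raw_text is hypothetical, and
the sketch makes simplifying assumptions: loci come first, traits are plain
numeric lists (no equations), and comment lines start with "#".

```python
SAMPLE = """data type f2 intercross
20 5 2
# Joe's tiny data set, 10/21 version.
*locus1 BBBHH-AAABBBHHH-AABA
*locus2 AB-ABHABHAB-ABHABHBH
*locus3 ABBAHHHBHABHABHBBHH-
# Locus3 may be mis-scored in individual 12!
*locus4 ABHABAAAHAB-ABHABHHB
*locus5 ABHABHAA-ABHABHAHHHB
*trait1 6.3 7.7 8.0 6.2 8.8 6.2 4.1 6.5 5.4 7.3
8.7 9.0 5.2 6.8 7.2 7.1 7.6 8.3 8.1 7.5
*trait2 5.5 5.5 5.5 4.5 4.5 4.5 3.5 3.5 3.5 -
5.5 5.5 4.5 4.5 4.5 3.5 5.2 6.8 7.2 7.1
"""


def check_raw_text(text):
    """Return a list of problems found in raw-file text (empty if none)."""
    lines = text.splitlines()
    if lines[0].lower().split()[:2] != ["data", "type"]:
        raise ValueError("first line must begin with 'data type'")
    n_progeny, n_loci, n_traits = (int(t) for t in lines[1].split()[:3])

    entries = []  # each entry is [name, list of tokens]
    for line in lines[2:]:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are skipped
        tokens = stripped.split()
        if tokens[0].startswith("*"):
            entries.append([tokens[0][1:], tokens[1:]])
        elif entries:
            entries[-1][1].extend(tokens)  # continuation of the previous entry
        else:
            raise ValueError("data found before the first '*name'")

    problems = []
    if len(entries) != n_loci + n_traits:
        problems.append("expected %d entries, found %d"
                        % (n_loci + n_traits, len(entries)))
    for index, (name, tokens) in enumerate(entries):
        if index < n_loci:
            count = len("".join(tokens))  # one character per individual
        else:
            count = len(tokens)           # one token per individual
        if count != n_progeny:
            problems.append("%s: expected %d values, found %d"
                            % (name, n_progeny, count))
    return problems
```

Calling check_raw_text(SAMPLE) on the example file above returns an empty
list, since all five loci and both traits carry 20 values.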
PREPARING A RAW DATA FILE FOR ANALYSIS
Once your data are in the raw file format, it is easy to process them into a
form usable by MAPMAKER Version 3.0 and MAPMAKER/QTL 1.1. In this version of
the programs, you must do this processing using MAPMAKER's "prepare data"
command (you can not presently prepare a raw file using MAPMAKER/QTL).
Simply put, the "prepare data" command loads the information in your raw data
file into MAPMAKER. Unless told otherwise (see below), MAPMAKER then writes
some new files which are in a slightly different format (you should not ever
modify these files, and thus you should not be concerned about precisely what
this format is.) Your raw file remains unaltered and should be saved as a
backup copy of your data. These new files will serve as the working data set
for MAPMAKER and MAPMAKER/QTL -- both programs will read and write these files
repeatedly to keep the state of your analyses between sessions.
In the process of preparing data, MAPMAKER loads the new data set into its
memory, which is then ready for analysis (earlier versions of MAPMAKER required
you to separately load a data file after it was prepared; this is no longer the
case).
The first files generated get the extensions ".data", ".maps", and ".traits"
(truncated on DOS systems to ".dat", ".map", and ".tra"). The ".data" file
contains the genetic locus data. The ".maps" file contains saved mapping
results along with some MAPMAKER specific information. The ".traits" file
contains the quantitative trait data and several MAPMAKER/QTL specific values.
Other files may also be created while you use MAPMAKER and MAPMAKER/QTL --
these include ".2pt" and ".3pt" files containing MAPMAKER's two-point and
three-point data respectively, and a ".qtls" file (".qtl" on DOS) containing
saved results from MAPMAKER/QTL.
To prepare a raw file, simply start up MAPMAKER, and type the command:
prepare data xxxx
where xxxx is the name of the raw file (with its extension, if it has one). We
recommend that raw files use the extension ".raw", although this is not
required. For example:
prepare data mydata.raw
If you specify a directory for the file name, the prepared files will be placed
in that directory also.
You may now start analyzing your data using any of MAPMAKER's commands. When
you later quit MAPMAKER (or use the "save" command), the files will be updated.
Later, you may resume your analyses by restarting MAPMAKER and re-loading these
files using the "load data" command. For example:
load data mydata
USING AN INITIALIZATION (.PREP) FILE
Whenever you issue the "prepare data" command, MAPMAKER looks for a file with
the same name as the raw data file and the extension ".prep" (on UNIX;
truncated to ".pre" on DOS). If this file is present, it is assumed to contain
MAPMAKER commands, which are automatically executed after the data are
prepared. These "initialization files" serve as a useful way to set up MAPMAKER
in the appropriate state for working with a particular data set. With an
initialization file, every time that data set is prepared (e.g. if you change
genotype data), it is relatively easy to start again where you left off.
When an initialization file is not found, MAPMAKER's default initialization
action is simply to save the working data files (as if the "save data" command
had been typed). When an initialization file is found, MAPMAKER executes its
commands INSTEAD. Thus, if you want MAPMAKER to save the files, you should end
your initialization file with a "save data" command.
Typical actions in an initialization file might be to:
- set various MAPMAKER options or parameters
- declare the names of chromosomes, classes, anchor loci, etc
- set the framework orders of chromosomes, particularly for MAPMAKER/QTL
- precompute two-point data and find linkage groups
- set various named sequences
To load a data set into MAPMAKER/QTL, you need to provide "framework" maps for
any chromosome you wish to scan. When you know a map order for some
chromosomes, it is often convenient to place this in an initialization file in
order to quickly have a data set ready for MAPMAKER/QTL.
If you wish MAPMAKER to calculate the map distances, you can do this with
commands like:
make chromosome chrom2
sequence R45S TG165 TG175 CD35 TG93 CD66 TG50B
framework chrom2
To provide map distances yourself, use a sequence with fixed distances using
MAPMAKER's "=" syntax:
seq R45S =21.9 TG165 =20.7 TG175 =4.4 CD35 =13.2 TG93 =7.3 CD66 =13.6 TG50B
See the discussion of the "sequence" command in the MAPMAKER reference manual
for details. Note that the above map distances would be assumed to be in
centimorgans, using the specified "centimorgan function" (by default, the
Haldane function). Naturally, you do not NEED to declare the map orders in an
initialization file to use MAPMAKER/QTL -- you may issue the same commands
interactively before saving the data and then run MAPMAKER/QTL.
A sample ".prep" file might be:
units cm
cent func kosambi
make chrom chrom1 chrom2 chrom3
seq 1
anchor chrom1
seq 4
anchor chrom2
seq 13
anchor chrom3
error det on
seq all
error prob 0.5
two point
assign
seq R45S TG165 TG175 CD35 TG93 CD66 TG50B
frame chrom2
save
(note the use of command abbreviations here). Another example of a ".prep"
file is supplied with the sample data files included with MAPMAKER.
USING OTHER TYPES OF CROSSES AND MARKERS
MAPMAKER's linkage analysis mechanism is quite general, and in fact can analyze
many varied sorts of data.
For example, one frequently asked question concerns multibanded markers, such
as cDNA RFLPs and RAPDs, particularly in an F2 intercross. In this case, each
band of the marker can be considered a dominant trait, and can be entered using
the C and D notation described above. However, some of the bands may be
allelic, in which case you would gain much power by recoding them as a
codominant (A/B/H) marker. This can be done in two ways: (1) enter each band
as a +/- marker, perform an initial linkage analysis looking for markers that
are recombinationally unseparated and which map together, and recode these as
a codominant locus; or (2) you may be able to use MAPMAKER's "join haplotypes"
feature, discussed in the reference manual.
To enter data for other types of crosses, you need to determine whether the
cross genetically resembles one MAPMAKER already understands, in terms of the
underlying genetic model, or whether one of MAPMAKER's models will provide a
reasonable interpretation (modulo some scaling of likelihoods and distances).
As a simple example, consider an F2 testcross, which is much like a backcross
except that we have:
(a|a x b|b) x c|c
in which case the observable F2 genotypes are a|c and b|c. To code this as a
backcross, simply designate one parent's genotype (a or b) as 'A', the other as
'H', and enter the data with this coding in the normal way. MAPMAKER's
underlying genetic model will be exactly correct and the LOD scores and
distances will be correct. Be careful, however, with +/- markers (such as RAPDs)
to get the parental genotype assignments (a allele vs. b allele) correct!
As another example, imagine F1 haploids of an outbred species, again encoding
the data as a simple backcross. For example, if we cross:
a|b x c|d
then the observable haploid genotypes at any locus are: a, b, c, and d. If
linkage phase is known (that is if we know which chromosome a and b are on, and
which c and d are on, and we can keep this assignment consistent across the
entire data set), then the case is easy: Arbitrarily designate one backcross
class (say 'A') as "a or c", the other ('H') as "b or d", and enter the data
with this coding in the normal way -- MAPMAKER's underlying genetic model again
will be exactly correct and the LOD scores and distances will be correct.
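
The phase-known recoding just described amounts to a simple lookup. The
Python sketch below shows one way to do it. The function name recode_haploid
and its parameters are hypothetical, and it assumes your scored alleles are
the single letters a, b, c and d, with '-' for missing data.

```python
def recode_haploid(alleles, class_a=("a", "c"), class_h=("b", "d"), missing="-"):
    """Recode phase-known haploid alleles as backcross codes 'A', 'H' and '-'."""
    codes = []
    for allele in alleles:
        if allele in class_a:
            codes.append("A")    # arbitrarily chosen first backcross class
        elif allele in class_h:
            codes.append("H")    # the other backcross class
        elif allele == missing:
            codes.append("-")    # missing data is passed through
        else:
            raise ValueError("unexpected allele %r" % allele)
    return "".join(codes)
```

For example, recode_haploid("adc-b") returns "AHA-H", which can then be
placed after the locus name in a backcross raw file.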
Problems arise when true genotypic classes cannot be distinguished, or
(equivalently) when linkage-phase is not known beforehand, as may be the case
with RAPDs and similar markers. In such cases, your only recourse may be to
perform a segregation analysis on the observed genotypes to determine probable
phase assignments, and then code the data as phase known. Other methods may be
available: contact us for details.
COMPATIBILITY WITH PREVIOUS VERSIONS OF MAPMAKER
Users of MAPMAKER version 2.0 (a.k.a. 1.9) will have little trouble getting
their data into MAPMAKER version 3.0, because the file formats are virtually
identical. The only slight difference is in the format of the second line of
the data file header, as described above.
Users of MAPMAKER/QTL version 1.0 (a.k.a. 0.9), however, will have to slightly
modify their files in the way that chromosome orders and maps are included. The
format described above makes this very convenient for the majority of users who
will compute maps in MAPMAKER and then load these results into MAPMAKER/QTL.
Users of MAPMAKER Version 1.0, or MAPMAKER for Macintosh (a.k.a. MAPMAKER-II)
will need to do a little more work, because of both the slightly different
header and the required asterisks before locus names, as described above. F2
backcross data sets, entered into old versions of MAPMAKER as intercrosses,
should in fact be analyzed as true backcrosses in the new version (luckily,
MAPMAKER 3.0's ability to use arbitrary genotype coding schemes, described
above, ensures that you will NOT have to retype all of your genotype data into
MAPMAKER.)
Ver 3b: S. Lincoln 12/92