Analyze File
The file analyze.cfg is used to set up Avida when it runs in analyze mode, which is started by running avida -a. Analyze mode is useful for performing additional tests on genotypes after a run has completed.
This analysis language is basically a simple programming language. The structure of a program involves loading in genotypes in one or more batches, and then either manipulating single batches, or doing comparisons between batches. Currently there can be up to 2000 batches of genotypes, but we will eventually remove this limit.
The rest of this file describes how individual commands work, as well as some notes on other language features, like how to use variables. As a formatting guide, command arguments will be presented between brackets, such as [filename]. If that argument is mandatory, it will be in blue. If it is optional, it will be in green, and (if relevant) a default value will be listed, such as [filename='output.dat'].
Analyze mode provides a number of commands for loading, manipulating, and saving analysis data. In addition to the analyze mode specific commands detailed in the following sections, all of the Avida actions can be called as well.
There are currently four ways to load in genotypes:
- LOAD_ORGANISM [filename]
- Load in a normal single-organism file of the type that is output from Avida. These files contain a lot of organism information in comments, followed by the full genome of the organism with one instruction per line.
- LOAD [filename]
- Load in a file that contains a list of genotypes, one per line, with additional information about those genotypes. Avida now includes a header on such files indicating the values contained in each column.
- LOAD_SEQUENCE [sequence]
- Load in a user-provided sequence as the genotype. Avida has a symbol associated with each instruction; this command is simply followed by a sequence of such symbols that is then translated back into a proper genotype.
- LOAD_MULTI_DETAIL [start-UD] [step-UD] [stop-UD] [dir='./'] [start batch=0]
- Allows the user to load in multiple detail files at once, one per batch. This is helpful when you're trying to do parallel analysis on many detail files, or to create a phylogenetic depth map. Example:
LOAD_MULTI_DETAIL 100 100 100000 ../my_run/run100/
This would load in the files detail_pop.100 through detail_pop.100000 in steps of 100, from the directory of my choosing. Since 1000 files will be loaded and we didn't specify a starting batch, they will be put in batches 0 through 999.
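The mapping from updates to files and batches in the example above can be sketched in Python. This is a model of the documented behavior, not Avida code: multi_detail_plan is a hypothetical helper, and treating stop-UD as inclusive and naming files detail_pop.<update> are taken from the example. The 2000-batch check uses the limit stated at the top of this page.

```python
import posixpath

MAX_BATCHES = 2000  # batch limit quoted at the top of this page


def multi_detail_plan(start_ud, step_ud, stop_ud, directory="./", start_batch=0):
    """Return (batch_id, path) pairs in the order the files would be loaded.

    Assumptions for this sketch: stop_ud is inclusive and files are named
    detail_pop.<update>, as in the LOAD_MULTI_DETAIL example.
    """
    updates = range(start_ud, stop_ud + 1, step_ud)
    if start_batch + len(updates) > MAX_BATCHES:
        raise ValueError("more files than available batches")
    # one file per batch, batches numbered consecutively from start_batch
    return [
        (start_batch + i, posixpath.join(directory, f"detail_pop.{ud}"))
        for i, ud in enumerate(updates)
    ]


plan = multi_detail_plan(100, 100, 100000, "../my_run/run100/")
print(len(plan))  # 1000
print(plan[0])    # (0, '../my_run/run100/detail_pop.100')
print(plan[-1])   # (999, '../my_run/run100/detail_pop.100000')
```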
A future addition to this list is a command that will use the "dominant.dat" file to identify all of the dominant genotypes from a run, and then look up and load their individual genomes from the archive directory.
All of the load commands place the new genotypes into the current batch, which can be set with the SET_BATCH command. Below is the list of control functions that allow you to manipulate the batches.
- SET_BATCH [id]
- Set the batch that is currently active; the initial active batch at the start of a program is 0.
- NAME_BATCH [name]
- Attach a name to the current batch. Some of the printing methods will print data from multiple batches, and we want the data from each batch to be attached to a meaningful identifier.
- PURGE_BATCH [id=current]
- Remove all genotypes in the specified batch (if no argument is given, the current batch is purged).
- DUPLICATE [id1] [id2=current]
- Copy the genotypes from batch id1 into id2. By default, copy id1 into the current batch. Note that DUPLICATE adds to the genotypes already in the target batch rather than replacing them, so purge the target batch first if you want it to contain only the copied genotypes.
- STATUS
- Print out (to the screen) the genotype count of each non-empty batch and identify the currently active batch.
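To make the batch semantics concrete, here is a toy Python model of SET_BATCH, PURGE_BATCH, DUPLICATE, and STATUS. It is not Avida code: BatchModel and its methods are hypothetical names, and a batch is assumed to be simply a list of genotype names.

```python
from collections import defaultdict


class BatchModel:
    """Toy model of analyze-mode batches: batch id -> list of genotype names."""

    def __init__(self):
        self.batches = defaultdict(list)
        self.current = 0  # the initial active batch is 0

    def set_batch(self, batch_id):  # SET_BATCH [id]
        self.current = batch_id

    def load(self, genotypes):  # any LOAD_* command adds to the current batch
        self.batches[self.current].extend(genotypes)

    def purge_batch(self, batch_id=None):  # PURGE_BATCH [id=current]
        target = self.current if batch_id is None else batch_id
        self.batches[target].clear()

    def duplicate(self, id1, id2=None):  # DUPLICATE [id1] [id2=current]
        target = self.current if id2 is None else id2
        # non-destructive: appends to whatever the target already holds
        self.batches[target].extend(list(self.batches[id1]))

    def status(self):  # STATUS: counts of non-empty batches, plus the active batch
        return {b: len(g) for b, g in self.batches.items() if g}, self.current


m = BatchModel()
m.load(["a", "b"])   # batch 0: a, b
m.set_batch(1)
m.load(["c"])        # batch 1: c
m.duplicate(0)       # batch 1: c, a, b
print(m.status())    # ({0: 2, 1: 3}, 1)
m.purge_batch()      # clears batch 1, the current batch
m.duplicate(0)       # batch 1: a, b
print(m.batches[1])  # ['a', 'b']
```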
There are several other commands that will allow you to interact with the analysis mode in some very important ways, but don't actually trigger any analysis tests or output. Below is a list of some of the more important control commands.
- SYSTEM [command]
- Run the given command in the system shell, as if it were typed on the command line. This is particularly useful if you need to unzip files before you can use them, or if you want to delete files no longer in use.
- CLOSE_FILE [filename]
- Close a file. This is useful for instances where your script creates lots of unique files using commands that support reentrant usage, such as DETAIL_AVERAGE.
- INCLUDE [filename]
- Include another file into this one and run its contents immediately. This is useful if you have some pre-written routines that you want to have available in several analysis files. Watch out because there are currently no protections against circular includes.
- INTERACTIVE
- Place Avida analysis into interactive mode so that you can type commands and have them immediately acted upon. You can place this anywhere within the analyze file, so that you can have some processing done before interactive mode starts. You can type quit at any point to continue with the normal processing of the file.
- DEBUG [message]
- ECHO [message]
- These are both echo commands that will print a message (the arguments given) onto the screen. If there are any variables (see below) in the message, they will be translated before printing, so this is a good way of debugging your programs.
Now that we know how to interact with analysis mode and load in genotypes, it's important to be able to manipulate them. The next batch of commands will do basic analysis on genotypes, and allow the user to prune batches to only include those genotypes that are needed.
- RECALCULATE [use_resources=0] [update=-1] [use_random_inputs=0] [env_input.1 env_input.2 env_input.3]
- Run all of the genotypes in the current batch through a test CPU and record the measurements taken (fitness, gestation time, etc.). This overrides any values that may have been loaded in with the genotypes. The use_resources flag signifies whether or not the test cpu will use resources when it runs. If resources are used, the update parameter allows setting resource values from a specific time point in the resource list. For more information on resources, see the summary below. If the use_random_inputs flag is set, then organisms will be provided with new, random input strings for each trace, as they would experience during an actual Avida run. By default, the same inputs are provided every time to organisms in analysis mode. If additional arguments are specified after the value of use_random_inputs, these integers will be used as environmental inputs for the genotype's test cpu recalculation. Manually specified test cpu inputs must conform to the pseudo-random formatting described in cEnvironment::SetupInputs. Phenotypic plasticity information is not available from the RECALCULATE command; for phenotypic plasticity statistics, use RECALC with num_trials X, where X is greater than 1.
- RECALC [use_resources] [use_random_inputs] [update N (N=-1)] [use_manual_inputs input.1 input.2 input.3] [num_trials T (T=1)]
- This command will perform the same operations as RECALCULATE but has a few additional features. Instead of having a specified ordering for inputs to this command, argument order does not matter. To use resources, for instance, use the flag use_resources following RECALC. Arguments with parameters must have the values specified immediately after them. For instance
RECALC update 10 use_resources use_manual_inputs 256948023 870730840 1441302276
will set the update to 10, request the use of resources, and test the genotypes using the inputs 256948023, 870730840, and 1441302276. Manually specified environment inputs must conform to the pseudo-random numbers as described in cEnvironment::SetupInputs. Typically, use_manual_inputs will override use_random_inputs; however, if num_trials is set to greater than one, manual input specification will be overridden and random inputs will be used to gather phenotypic plasticity information, which is then available through the genotype statistics listed at the bottom of this document. Please note that phenotypic plasticity analysis performed by using RECALC will reset genotype statistics to the values for the most likely phenotype. Implicit phenotypic plasticity analysis (e.g. by not calling RECALC, or calling RECALC with num_trials 1) will not re-evaluate the genotype statistics in this manner and instead relies on the initial values or those values from a single recalculation.
- FILTER [stat] [relation] [test_value] [batch=current]
- Perform the given test on all genotypes and remove all those that do not pass. Stat indicates which metric you want to compare, relation is the test to perform (==, !=, <, >, <=, or >=), and test_value is the value to compare it to. For example
FILTER fitness >= 1.5
will keep only those genotypes with a fitness greater than or equal to 1.5. See the section on Genotype Statistics for more information on what keywords can be used here.
- FIND_GENOTYPE [type='num_cpus' ...]
- Remove all genotypes but the one selected. Type indicates which genotype to choose. Options available for type are num_cpus (to choose the genotype with the maximum organismal abundance at time of printing), total_cpus (number of organisms ever of this genotype), fitness, length, or merit. If the type entered is numerical, it is used as an id number to indicate the desired genotype (if no such id exists, a warning will be given). Multiple arguments can be given to this command, in which case all those genotypes in that list will be preserved and the remainder deleted. If no argument is passed for type, it uses max num_cpus as default.
- FIND_ORGANISM [random]
- Picks out a random organism from the population and removes all others. It is different from FIND_GENOTYPE because it takes into account the relative number of organisms within each genotype. To pick more than one organism, list the word 'random' multiple times. This is essentially sampling without replacement from the population.
- FIND_LINEAGE [type="num_cpus"]
- Delete everything except the lineage from the chosen genotype back to the most distant ancestor available. This command will only function properly if parental information was loaded in with the genotypes. Type is the same as the FIND_GENOTYPE command.
- FIND_SEX_LINEAGE [type="num_cpus"] [parent_method="rec_region_size"]
- Delete everything except the lineage from the chosen genotype back to the most distant ancestor available. Similar to FIND_LINEAGE but works in sexual populations. To simplify things, only the maternal lineage plus immediate fathers are saved, i.e. info about the father's parents is discarded. The second option, parent_method, determines which parent is considered the 'mother' in each particular recombination. If parent_method is "rec_region_size" (the default), the 'mother' is the parent contributing more code to the offspring genome; if it is "genome_size", the 'mother' is the parent with the longer genome, no matter how much of it was contributed to the offspring. This command will only function properly if parental information was loaded in with the genotypes. Type is the same as the FIND_GENOTYPE command.
- ALIGN
- Create an alignment of all the genomes' sequences; it will place '_'s in the sequences to show the alignment. Note that FIND_LINEAGE must first be run on the batch for the alignment to be possible.
- SAMPLE_ORGANISMS [fraction] [test_viable=0]
- Keep only fraction of organisms in the current batch. This is done per organism, not per genotype. Thus, genotypes of high abundance may only have their abundance lowered, while genotypes of abundance 1 will either stay or be removed entirely. If test_viable is set to 1, sample only from the viable organisms.
- SAMPLE_GENOTYPES [fraction] [test_viable=0]
- Keep only fraction of genotypes in the current batch. If test_viable is set to 1, sample only from the viable genotypes.
- RENAME [start_id=0]
- Change the id numbers of all the genotypes to start at a given value. Often in long runs we will be dealing with ID's in the millions. In particular, after reducing a batch to a lineage, we will often want to number the genotypes in order from the ancestor to the final one.
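RECALC's order-independent arguments and its input-precedence rules can be modeled in a short Python sketch. It is not Avida's parser. parse_recalc_args, RecalcOptions, and input_mode are hypothetical names. The sketch also assumes that use_manual_inputs takes exactly three integers, matching input.1 input.2 input.3 in the signature.

```python
from dataclasses import dataclass
from typing import List, Optional

FLAGS = {"use_resources", "use_random_inputs"}  # keywords with no value
VALUED = {"update", "num_trials"}               # keywords followed by one value


@dataclass
class RecalcOptions:
    use_resources: bool = False
    use_random_inputs: bool = False
    update: int = -1
    manual_inputs: Optional[List[int]] = None
    num_trials: int = 1

    def input_mode(self) -> str:
        # num_trials > 1 overrides manual inputs: random inputs are used
        # to gather phenotypic plasticity information
        if self.num_trials > 1:
            return "random"
        # otherwise use_manual_inputs overrides use_random_inputs
        if self.manual_inputs is not None:
            return "manual"
        # default: the same inputs every time
        return "random" if self.use_random_inputs else "fixed"


def parse_recalc_args(words: List[str]) -> RecalcOptions:
    """Keyword parsing in which argument order does not matter."""
    opts = RecalcOptions()
    i = 0
    while i < len(words):
        word = words[i]
        if word in FLAGS:
            setattr(opts, word, True)
            i += 1
        elif word in VALUED:
            setattr(opts, word, int(words[i + 1]))  # value must follow immediately
            i += 2
        elif word == "use_manual_inputs":
            inputs = words[i + 1:i + 4]  # assumption: exactly three inputs
            if len(inputs) != 3:
                raise ValueError("use_manual_inputs needs three integers")
            opts.manual_inputs = [int(w) for w in inputs]
            i += 4
        else:
            raise ValueError(f"unknown RECALC argument: {word}")
    return opts


a = parse_recalc_args("update 10 use_resources use_manual_inputs 256948023 870730840 1441302276".split())
b = parse_recalc_args("use_manual_inputs 256948023 870730840 1441302276 use_resources update 10".split())
print(a == b)          # True: argument order does not matter
print(a.input_mode())  # manual
```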
Next, we are going to look at the standard output commands that are used to save information generated in analyze mode.
- PRINT [dir='archive/'] [filename]
- Print the genotypes from the current batch as individual files (one genotype per file) in the directory given. If no filename is specified, the files will be named by the genotype name, with a .gen appended to them. Specifying the filename is useful when printing a single genotype.
- TRACE [dir='archive/'] [use_resources=0] [update=-1] [use_random_inputs=0] [env_input.1 env_input.2 env_input.3]
- Trace all of the genotypes and print a listing of their execution. This will show step-by-step the status of all of the CPU components and the genome during the course of the execution. The filename used for each trace will be the genotype's name with a .trace appended. The use_resources flag signifies whether or not the test cpu will use resources when it runs. If resources are used, the update parameter allows setting resource values from a specific time point in the resource list. For more information on resources, see the summary below. If the use_random_inputs flag is set, then organisms will be provided with new, random input strings for each trace, as they would experience during an actual Avida run. By default, the same inputs are provided every time to organisms in analysis mode. You can manually specify environmental inputs by setting use_random_inputs to 0 and supplying env_input.X values. Manually specified environment inputs must conform to the pseudo-random numbers as described in cEnvironment::SetupInputs.
- PRINT_TASKS [file='tasks.dat']
- This will print out the tasks doable by each genotype, one per line in the output file specified. Note that this information must either have been loaded in, or a RECALCULATE (or RECALC) must have been run to collect it.
- PRINT_PHENOTYPES [file='phenotype.dat'] [total_task_count] [total_task_performance_count]
- Prints phenotypes in the current batch as determined by task-signature. Default statistics for each phenotype are number of organisms, number of genotypes, average genome length, average gestation time, viability, and tasks performed. Using the total_task_count and/or total_task_performance_count flags will add that statistic to the output. (In the case of total_task_performance_count, this will be an average over the members of the phenotype.)
- DETAIL [file='detail.dat'] [format ...]
- Print out all of the stats for each genotype, one per line. The format indicates the layout of columns in the file. If the filename specified ends in .html, html formatting will be used instead of plain text. For the format, see the section on Genotype Statistics below.
- DETAIL_TIMELINE [file='detail_timeline.dat'] [time_step=100] [max_time=100000]
- Details a time-sequence of dump files.
- DETAIL_BATCHES [file='detail_baches.dat'] [format ...]
- Details all batches.
- DETAIL_INDEX [file] [min_batch] [max_batch] [format ...]
- Detail all the batches between min_batch and max_batch.
- DETAIL_AVERAGE [file="detail.dat"] [format ...]
- Detail the current batch, but print out the average for each argument, as opposed to the individual values for each genotype, the way DETAIL would. Arguments are the same as for DETAIL. It takes into account the relative abundance of each genotype in the batch when calculating the averages.
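Because DETAIL_AVERAGE weights by abundance, a common genotype counts for more than a rare one. A minimal Python sketch of such an abundance-weighted mean follows. It assumes abundance is each genotype's num_cpus value (the text says only "relative abundance"), and detail_average is a hypothetical helper, not Avida code.

```python
def detail_average(genotypes, stat):
    """Abundance-weighted mean of one stat over a batch.

    Assumes each genotype is a dict holding its abundance under 'num_cpus'.
    """
    total = sum(g["num_cpus"] for g in genotypes)
    if total == 0:
        raise ValueError("batch has no organisms")
    return sum(g["num_cpus"] * g[stat] for g in genotypes) / total


batch = [
    {"id": 1, "num_cpus": 3, "fitness": 1.0},
    {"id": 2, "num_cpus": 1, "fitness": 2.0},
]
print(detail_average(batch, "fitness"))  # 1.25 (an unweighted mean would give 1.5)
```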
And at last, we have the actual analysis commands that perform tests on the data and output the results.
- ANALYZE_EPISTASIS [file='epistasis.dat'] [num_test=(all)]
- For each genotype in the current batch, test possible double mutants and the single mutations composing them; print both the individual relative fitnesses and the double mutant relative fitness. By default all double mutants are tested. If in a hurry, specify the number to be tested.
- MAP_TASKS [dir="phenotype/"] [flags ...] [format ...]
- Construct a genotype-phenotype array for each genotype in the current batch. The format is the list of stats that you want to include as columns in the array (see Genotype Statistics for more info). Additionally, you can use special format flags; the possible flags are 'html' to print output in HTML format, and 'link_maps' to create HTML links between consecutive genotypes in a lineage. The flag 'use_manual_inputs input.1 input.2 input.3', where the input.X values are integers, allows designated environmental inputs to be used when evaluating the genotype-phenotype mapping. Manually specified environment inputs must conform to the pseudo-random numbers as described in cEnvironment::SetupInputs.
- MAP_MUTATIONS [dir="mutations/"] [flags ...]
- Construct a genome-mutation array for each genotype in the current batch. The format has each line in the genome as a row in the chart, and all available instructions representing the columns. The cells in the chart indicate the fitness were a mutation to occur at the position in the matrix, to the listed instruction. If the 'html' flag is used, the charts will be output in HTML format.
- MAP_DEPTH [filename='depth_map.dat'] [min_batch=0] [max_batch=cur_batch-1]
- This will create a depth map (like those we use for phylogeny visualization) in the filename specified. You can direct which batches to take this from, but by default it will work perfectly after a LOAD_MULTI_DETAIL.
- AVERAGE_MODULATITY [file='modularity.dat'] [task.0 task.1 task.2 task.3 task.4 task.5 task.6 task.7 task.8]
- Calculate several modularity measures, such as how many tasks an instruction is involved in, the number of sites required for each task, etc. The measures are averaged over all the organisms in the current batch that perform any tasks. For the full output list, do
AVERAGE_MODULATITY legend.dat
At the moment it does not support HTML output format and works only with 1- and 2-input tasks.
- HAMMING [file="hamming.dat"] [b1=current] [b2=b1]
- Calculate the Hamming distance between batches b1 and b2. If only one batch is given, the distance is calculated for all pairs within that batch.
- LEVENSTEIN [file='lev.dat'] [batch1] [b2=b1]
- Calculate the Levenshtein distance (edit distance) between batches b1 and b2. This metric is similar to Hamming distance, but calculates the minimum number of single insertions, deletions, and substitutions (mutations) needed to move from one sequence to the other.
- SPECIES [file='species.dat'] [batch1] [batch2] [num_recombinants]
- Calculates the percentage of non-viable recombinants between all pairs of organisms from batches 1 and 2. The number of random recombination events for each pair of organisms is specified by num_recombinants. Recombination is done in the same way as in the birth chamber when divide-sex is executed.
Output: Batch1Name Batch2Name AveDistance Count FailCount
- RECOMBINE [batch1] [batch2] [batch3] [num_recombinants]
- Similar to the SPECIES command, but instead of calculating things on the spot, just create all the recombinant genotypes using organisms from batches 1 and 2 and put them in batch3.
- ANALYZE_REDUNDANCY_BY_INST_FAILURE
- Determine redundancy by calculating the percentage of the lifetimes where fitness is decreased over a range of instruction failure probabilities.
- ANALYZE_COMPLEXITY [mut_rate] [directory] [useResources] [batchFrequency]
- Loops through each genotype in the batch and tests the fitness of each single site mutant. useResources is a flag to set whether the testCPU should use resources when testing the mutants. Calculates probabilities at mutation selection balance, normalizes fitness values, and calculates and outputs complexity based on entropy.
- ANALYZE_LINEAGE_COMPLEXITY
- Loops through each genotype in the batch and calculates the number of positive and neutral mutations (pos_neut_mut) for single and double mutations. Calculates entropy as

$$\mathrm{entropy} = \frac{\log\left(\dfrac{\mathit{pos\_neut\_mut}}{\mathit{num\_insts}^{2}\cdot\mathit{gen\_length}\cdot(\mathit{gen\_length}-1)\cdot 0.5}\right)}{\log(\mathit{num\_insts})}$$

where num_insts is the number of instructions in the instruction set. Calculates and outputs complexity as gen_length - entropy.
- ANALYZE_KNOCKOUTS [file_name] [max_knockouts]
- Loops through all genomes in the batch and tests the removal of each instruction (-2=lethal, -1=detrimental, 0=neutral, 1=beneficial). If max_knockouts is more than one, also tests pairs of knockouts. If the individual knockouts are both harmful but in combination they are neutral or even beneficial, they should not count as information. If the individual knockouts are both neutral (or beneficial?) but in combination they are harmful, they are likely redundant to each other; for now, count them both as information. Outputs the counts of each type of instruction.
- GET_SKELETONS [max_knockouts=2]
- Similar to ANALYZE_KNOCKOUTS; however, instead of just counting the number of non-informative sites, it removes completely non-informative (i.e. neutral) sites, and replaces sites that are informative only as placeholders with the NULL instruction. This creates genotype "skeletons". If max_knockouts is 1, it only tests single knockouts. If max_knockouts is more than 1 (default), it tests double knockouts as well (such as inc and then dec, which are only neutral when removed together). (Development note: GET_SKELETONS can be called in analyze mode to skeletonize the current batch, but if you're building a more complicated command that involves skeletons, take a look at the helper function with the same name, which returns a vector of skeletons.)
- ANALYZE_POP_COMPLEXITY
- Loops through all genotypes in batch and outputs the complexity of each genotype as 1 - entropy.
- ANALYZE_NEWINFO [mutation_rate] [directory]
- Only works for fixed length runs and requires lineage in the batch. Calculates the information of each organism and its parent about the environment and finds if there is a gain or decrease of information. Compares parent and child information at each instruction site to determine if information has been gained or lost. Calculates information the same as ANALYZE_COMPLEXITY.
- ANALYZE_MUTATION_TRACEBACK
- Works best on lineages and requires fixed length genomes. Loops through each genotype to check for mutations from the previous genotype in the lineage and then tests if those mutations are currently adaptive. Prints out the number of beneficial, neutral, detrimental and static sites at the given lineage depth.
- ANALYZE_COMPLEXITY_DELTA
- This command will examine the current population, and sample mutations to see what the distribution of complexity changes is. Only genotypes with a certain abundance (default=3) will be tested to make sure that the organism didn't already have hidden complexity due to a downward step.
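The two distance metrics used by HAMMING and LEVENSTEIN can be sketched in Python. Genomes are treated here as strings of instruction symbols, as in LOAD_SEQUENCE. The function names are illustrative and are not Avida's. Levenshtein distance is computed with the standard dynamic-programming recurrence. Since the text does not say how Avida scores genomes of unequal length under Hamming distance, this sketch refuses them.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        # how Avida scores unequal lengths is not covered here
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-site insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def within_batch(seqs, metric):
    """All-pairs distances inside one batch (the single-batch case)."""
    return [metric(x, y) for x, y in combinations(seqs, 2)]


print(levenshtein("abcde", "abde"))               # 1: one deletion
print(within_batch(["aa", "ab", "bb"], hamming))  # [1, 2, 1]
```

The example pair "abcde"/"abde" shows why the edit distance is useful: a single deletion shifts every later site, so a position-by-position comparison is not even defined for those two genomes.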
These commands build input files for Avida, using the capabilities of analyze mode to automate some tedium.
- WRITE_CLONE [file='clone.dat'] [num_cells=-1]
- Creates a clone population file (as SaveClone) from the current batch, suitable for loading with LoadClone. The starting update is 0 and the archive is empty. num_cells should be the number of cells in the world; the default value of -1 will not be accepted by LoadClone, so be sure to specify the correct number.
Warning: Unlike SaveClone, WRITE_CLONE does not preserve location. Any spatial structure the population had is lost.
- WRITE_INJECT_EVENTS [file='events_inj.cfg'] [start_cell=0] [lineage=0]
- Creates an events file which injects all the genotypes of the current batch at update 0. Injection starts at the given start_cell id and increments upward. All injected organisms are assigned the given lineage label and start with the relevant merit; num_cpus copies of each genotype are injected.
Warning: injection is in the order that the genotypes appear in the batch. This will break any spatial structure your population may have had.
- WRITE_INJECT_INITIAL [file='events_inj.cfg'] [start_cell=0] [lineage=0]
- Creates an events file which injects all the genotypes of the current batch before update 0 (same as WRITE_INJECT_EVENTS, but does not cause "no organism" errors). Injection starts at cell 0 (this is made for reintroduction of single organisms only!) and injects a single organism only. The injected organism is assigned the given lineage label and starts with the relevant merit.
Warning: injection is in the order that the genotypes appear in the batch and does not take into account the original location. This will break any spatial structure your population may have had.
- WRITE_COMPETITION [join_UD=0] [start_merit=50000] [file='events_comp.cfg'] [batch_A=cur_batch-1] [batch_B=cur_batch] [grid_side=-1] [lineage=0]
- Creates an events file which acts much like the one produced by WRITE_INJECT_EVENTS, but injects two populations (from the given batches), separates the populations at update 0 (via SeverGridRow grid_side), and joins them at the given join_UD. Organisms from batch_A are injected starting at cell id 0; organisms from batch_B are injected starting at cell id grid_side*grid_side. (If grid_side is negative, an attempt will be made to infer it from the number of organisms in the population.)
Each population should be square, of grid_side x grid_side dimensions. (You will have to set up the world in avida.cfg to have WORLD_X of grid_side and WORLD_Y of 2*grid_side.) Each population may not be larger than 10,000 organisms. Organisms from batch_A will be assigned the given lineage label; organisms from batch_B will be assigned the given lineage label + 1.
Warning: Like WRITE_INJECT_EVENTS, WRITE_COMPETITION will destroy any spatial structure the populations may have had. Also, since it severs/joins only the grid_side row, it is not suitable for use with WORLD_GEOMETRY values other than 1 (bounded grid).
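The WRITE_COMPETITION layout arithmetic (cell offsets, world size, lineage labels) can be sketched in Python as below. This is a hypothetical helper, not Avida code. In particular, inferring grid_side as the smallest square that holds the larger population is a stand-in rule, since the text says only that "an attempt will be made" to infer it.

```python
import math

MAX_POP = 10_000  # per-population limit stated for WRITE_COMPETITION


def competition_layout(n_a, n_b, grid_side=-1, lineage=0):
    """Cell ids, world size, and lineage labels for two competing populations.

    The inference used when grid_side < 0 (smallest square that fits the
    larger population) is a hypothetical stand-in for Avida's own rule.
    """
    largest = max(n_a, n_b)
    if largest < 1 or largest > MAX_POP:
        raise ValueError("each population must hold 1 to 10,000 organisms")
    if grid_side < 0:
        grid_side = math.isqrt(largest - 1) + 1  # ceil(sqrt(largest))
    area = grid_side * grid_side
    if largest > area:
        raise ValueError("population does not fit in a grid_side x grid_side square")
    return {
        "WORLD_X": grid_side,
        "WORLD_Y": 2 * grid_side,
        "cells_A": range(0, n_a),            # batch_A starts at cell 0
        "cells_B": range(area, area + n_b),  # batch_B starts at grid_side*grid_side
        "lineage_A": lineage,
        "lineage_B": lineage + 1,
    }


layout = competition_layout(100, 90)
print(layout["WORLD_X"], layout["WORLD_Y"])  # 10 20
print(layout["cells_B"][0])                  # 100
```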
This summary is given to help explain the use and constraints for using resources.
When a command specifies the use of resources for the test cpu, it should not affect the state of the test cpu after the command has finished. However, this means that the test cpu is no longer guaranteed to be reentrant. Each command will set up the environment and the resource count in the test cpu with its own environment and resource count. When the command has finished, it will set the test cpu's environment and resource count back to what they were before the command was executed.
Resource usage for the test cpu occurs by setting the environment and then setting up the resource count using the environment. Once the resource count has been set up, it will not change during the use of the test cpu. When an organism performs an IO that completes a task, the concentrations are not changed. This was a design decision, but it is easily changed.
In analyze mode, a new data structure holds a time-ordered list of resource concentrations. This list can be used to set up resources from different time points. By using the update parameter in the RECALCULATE (or RECALC) command, you can use the resource concentrations from a specified time point. If the LOAD_RESOURCES command is not called, the list defaults to a single entry, which is the initial concentrations of the resources specified in the environment configuration file.
- PRINT_TEST_CPU_RESOURCES
- This command first prints whether or not the test cpu is using resources. Then it prints the concentration of each resource.
- LOAD_RESOURCES [file_name="resource.dat"] [resource_cpu_cycle_offset=0]
- This command loads a time oriented list of resource concentrations. The command takes a file name containing this type of data, and defaults to resource.dat. The format of the file must be the same as resource.dat, and each line must be in the correct chronological order with oldest first. The resource_cpu_cycle_offset parameter will influence which update RECALCULATE, etc. will use from the resource.dat file. Specifically, the update specified in RECALCULATE, etc. will have resource_cpu_cycle_offset / AVE_TIME_SLICE (integer division) added to it. Unless you know exactly why you would use this option, you should leave it at its default of 0.
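The resource_cpu_cycle_offset rule above is plain integer arithmetic. The Python sketch below shows it with AVE_TIME_SLICE passed in explicitly. The value 30 is used only for illustration, so substitute your avida.cfg setting. The offset is assumed non-negative, because that is the range in which Python's // agrees with C++'s truncating integer division. resource_update is a hypothetical helper.

```python
def resource_update(update, resource_cpu_cycle_offset, ave_time_slice):
    """Update whose resource.dat row RECALCULATE, etc. would use.

    Offsets are assumed non-negative, so Python's // matches C++ integer division.
    """
    if resource_cpu_cycle_offset < 0 or ave_time_slice <= 0:
        raise ValueError("sketch assumes offset >= 0 and ave_time_slice > 0")
    return update + resource_cpu_cycle_offset // ave_time_slice


print(resource_update(10, 0, 30))    # 10: the default offset changes nothing
print(resource_update(10, 250, 30))  # 18: 250 // 30 == 8
```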
Avida analyze mode recognizes several keywords that correspond to information about genotypes. Several commands (such as DETAIL and MAP) require the use of these as format parameters to specify what genotypic features should be output. Others (such as FILTER) use them to identify specific genotypes that possess certain qualities. Before these commands are used, other processing functions may need to be run.
Allowable formats after a normal load (assuming these values were available from the input file to be loaded in) are:
id (Genome ID) | parent_id (Parent ID) | parent2_id (Second Parent ID in sexual genotypes) |
num_cpus (Number of CPUs) | total_cpus (Total CPUs Ever) | length (Genome Length) |
update_born (Update Born) | update_dead (Update Dead) | depth (Tree Depth) |
lineage (Unique Lineage Label) | sequence (Genome Sequence) | task_list (List of all tasks performed) |
After a RECALCULATE (or RECALC), these additional formats become available:
viable (Is Viable [0/1]) | copy_length (Copied Length) | exe_length (Executed Length) |
merit (Merit) | comp_merit (Computational Merit) | gest_time (Gestation Time) |
efficiency (Replication Efficiency) | fitness (Fitness) | div_type (Divide type used; 1 is default) |
mate_id (Mate Selection ID Number (sexual genotypes)) | executed_flags (Executed Flags) | task_order (Task Performance Order) |
task.n (# of times task number n is done) | task.n:binary (is task n done, 0/1) | total_task_count (# of unique tasks done) |
total_task_performance_count (total # of tasks done) | inst.n (# of times instruction #n is done) | r_tot.n (amount of resource n added to the organism's store in its lifetime) |
r_avail.n (amount of resource n in organism's store) | r_spec.n (# of times specification #n is used) |
For inst.n, a count of 0 is reported if there is no instruction #n.
A note on r_spec.n: This counts nop specifications done by any and all "collect-type" instructions -- that is, any instruction that uses the helper function DoCollect. If more than one such instruction is included in the instruction set, r_spec.n will include specification counts for both instructions without any differentiation. For details on what the specification numbers mean, see cCodeLabel::AsIntUnique.
If a FIND_LINEAGE was done before the RECALCULATE (or RECALC), the parent genotype for each regular genotype will be available, enabling the additional formats:
parent_dist (Parent Distance) | ancestor_dist (Ancestor Distance) |
comp_merit_ratio (Computational Merit Ratio with parent) | efficiency_ratio (Replication Efficiency Ratio with parent) |
fitness_ratio (Fitness Ratio with parent) | parent_muts (Mutations from Parent) |
html.sequence (Genome Sequence in Color; html format) |
If an ALIGN is run, one additional format is available:
alignment (Aligned Sequence) |
If a RECALCULATE (or RECALC) was done before the ALIGN, the following format is available:
alignment_executed_flags (Alignment Executed Flags) |
If tags have been applied to genotypes in analyze mode, an additional format is available:
tag (Genotype Tag) |
A handful of formats will automatically perform landscaping. The landscape will only be run once per organism, even when multiple output variables are used. For enhanced performance on multi-processor/multi-core systems, see the PrecalcLandscape action.
frac_dead (Fraction of Lethal Mutations) | frac_neg (Fraction of Harmful Mutations) |
frac_neut (Fraction of Neutral Mutations) | frac_pos (Fraction of Beneficial Mutations) |
complexity (Physical Complexity of Organism) | land_fitness (Average Mutation Fitness) |
Phenotypic plasticity information is available through a number of different commands. This information will be gathered in one of two ways. If RECALC num_trials X is called, where X is greater than 1, phenotypic plasticity information for each genotype in the batch will be collected. If RECALC is not called, or is called with just one trial (the default for RECALC), then using these commands will request 1000 trials for each genotype to gather plasticity information. Requesting an analysis of phenotypic plasticity in this manner will not re-evaluate other genotype statistics.
num_phen (Number of Phenotypes Identified) | phen_avg_fitness (Weighted Average Fitness) |
num_trials (Number of Phenotype Tests) | phen_entropy (Phenotypic Entropy [bits]) |
phen_max_fit_freq (Maximum Fitness Phenotype Frequency) | phen_max_fitness (Maximum Phenotype Fitness) |
phen_min_fit_freq (Minimum Fitness Phenotype Frequency) | phen_min_fitness (Minimum Phenotype Fitness) |
phen_likely_freq (Most Likely Phenotype Frequency) | phen_likely_fitness (Fitness of the Most Likely Phenotype) |
prob_task.n (Probability of task n being performed) | prob_viable (Probability of genotype viability) |
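The Python sketch below shows how several of these statistics could be derived from repeated test-CPU trials. It assumes they are computed from the empirical phenotype frequencies over num_trials trials, and that phen_entropy is the Shannon entropy in bits (consistent with its "[bits]" unit). This is an illustration, not Avida's implementation. Representing each trial as a (phenotype key, fitness) pair is also an assumption, and plasticity_stats is a hypothetical name.

```python
import math
from collections import Counter


def plasticity_stats(trials):
    """trials: list of (phenotype_key, fitness) pairs, one per test-CPU trial."""
    n = len(trials)
    counts = Counter(key for key, _ in trials)
    fitness_of = dict(trials)  # assumes each phenotype has a single fitness
    freqs = {k: c / n for k, c in counts.items()}
    likely = max(freqs, key=freqs.get)  # most frequent phenotype
    max_fit = max(fitness_of.values())
    min_fit = min(fitness_of.values())
    return {
        "num_trials": n,
        "num_phen": len(counts),
        "phen_entropy": -sum(p * math.log2(p) for p in freqs.values()),
        "phen_avg_fitness": sum(p * fitness_of[k] for k, p in freqs.items()),
        "phen_max_fitness": max_fit,
        "phen_max_fit_freq": sum(p for k, p in freqs.items() if fitness_of[k] == max_fit),
        "phen_min_fitness": min_fit,
        "phen_min_fit_freq": sum(p for k, p in freqs.items() if fitness_of[k] == min_fit),
        "phen_likely_freq": freqs[likely],
        "phen_likely_fitness": fitness_of[likely],
    }


s = plasticity_stats([("A", 1.0)] * 3 + [("B", 2.0)])
print(round(s["phen_entropy"], 3))  # 0.811
print(s["phen_avg_fitness"])        # 1.25
```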
For the moment, all variables can only be a single character (letter or number) and begin with a $ whenever they need to be translated to their value. Lowercase letters are global variables, capital letters are local to a function (described later), and numbers are arguments to a function. A $$ will act as a single dollar sign, if needed.
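The substitution rules just described (single-character names, $ prefix, $$ as a literal dollar sign) can be modeled in Python as below. This is not Avida's translator. translate is a hypothetical name, and raising an error for undefined names is an assumption, since the text does not say what happens in that case.

```python
import string

NAME_CHARS = set(string.ascii_letters + string.digits)


def translate(text, variables):
    """Expand $x references: lowercase = global, uppercase = local, digit = argument.

    '$$' produces a literal '$'. Raising on undefined names is an assumption of
    this sketch; the text does not say what Avida does in that case.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] != "$":
            out.append(text[i])
            i += 1
            continue
        if i + 1 == len(text):
            raise ValueError("'$' at end of text")
        name = text[i + 1]  # names are exactly one character long
        if name == "$":
            out.append("$")
        elif name in NAME_CHARS:
            out.append(str(variables[name]))  # KeyError if undefined
        else:
            raise ValueError(f"invalid variable name after '$': {name!r}")
        i += 2
    return "".join(out)


scope = {"r": "42", "0": "my_func", "1": "fitness"}
print(translate("DETAIL detail_$r.dat $1", scope))  # DETAIL detail_42.dat fitness
print(translate("ECHO cost: $$5", scope))           # ECHO cost: $5
```

Because names are a single character, "$ra" expands $r and leaves "a" as literal text.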
- SET [variable] [value]
- Sets the variable to the given value.
- CONFIG_GET [config variable] [variable]
- Retrieves the value of the supplied configuration variable and places it into the supplied analyze mode variable. For example, 'CONFIG_GET RANDOM_SEED r' will place the value of the random number seed (as specified in the configuration settings) into the variable r.
- CONFIG_SET [config variable] [value]
- Sets the supplied configuration variable to the value specified with the command.
- FOREACH [variable] [value] [value ...]
- Set the variable to each of the values listed, and run the code that follows between here and the next END command once for each of those values.
- FORRANGE [variable] [min_value] [max_value] [step_value=1]
- Set the variable to each of the values between min and max (at steps given), and run the code that follows between here and the next END command, once for each of those values.
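The loop commands can be pictured as unrolling their body (the lines up to END) once per value. In the Python sketch below, forrange_values and expand_loop are hypothetical helpers, not Avida code. Treating max_value as inclusive is an unconfirmed assumption. The substitution is a simplification that uses plain text replacement of $<var> and ignores the $$ escape.

```python
def forrange_values(min_value, max_value, step_value=1):
    """Values FORRANGE assigns; treating max_value as inclusive is an assumption."""
    if step_value <= 0:
        raise ValueError("this sketch only handles positive steps")
    values = []
    v = min_value
    while v <= max_value:
        values.append(v)
        v += step_value
    return values


def expand_loop(var, values, body):
    """Unroll a FOREACH/FORRANGE body (the lines up to END), once per value.

    Simplified: substitutes '$<var>' by plain text replacement and ignores '$$'.
    """
    return [line.replace("$" + var, str(v)) for v in values for line in body]


body = ["SET_BATCH $i", "DETAIL detail_$i.dat fitness"]
for command in expand_loop("i", forrange_values(0, 2), body):
    print(command)
```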
These functions are currently very primitive with fixed inputs of $0 through $9. $0 is always the function name, and then there can be up to 9 other arguments passed through. Once a function is created, it can be run just like any other command.
- FUNCTION [name]
- This will create a function of the given name, including in it all of the commands up until an END is found. These commands will be bound to the function, but are not executed until the function is run as a command. Inside the function, the variables $1 through $9 can be used to access arguments passed in.
Currently there are no conditionals or mathematical commands in this scripting language. These are both planned for the future.