RuleDescriptions
Each rule in the build pipeline corresponds to a processing step, which may require other steps to be executed.
Build everything. This is the default target rule, and produces binary data files required by the GeneMANIA website.
Target rule for producing clean attribute files in text format.
Convert gene-attrib input files with ragged-length records into regular, tall thin tables with multiple records per gene.
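The conversion from ragged records to a tall thin table can be sketched as below. This is a minimal illustration, not the pipeline's own code: the function name `ragged_to_tall` and the tab separator are assumptions, and each input record is assumed to hold a gene symbol followed by any number of attribute symbols.

```python
def ragged_to_tall(lines, sep="\t"):
    """Expand 'gene<sep>attr1<sep>attr2...' records into (gene, attribute) rows.

    Hypothetical helper: the separator and column order are assumptions.
    """
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split(sep)]
        if not fields or not fields[0]:
            continue  # skip blank records
        gene, attributes = fields[0], fields[1:]
        for attribute in attributes:
            if attribute:  # ignore empty trailing fields
                rows.append((gene, attribute))
    return rows


rows = ragged_to_tall(["G1\tA\tB", "G2\tC", ""])
```

A gene with several attributes yields several rows, which is what "multiple records per gene" refers to.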
Convert ragged attrib-gene input files into regular, tall thin tables.
Convert attrib-gene to gene-attrib pairs, common format required for later processing.
Convert gmt-format ragged input files into tall thin tables.
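As a rough sketch of the gmt conversion: a gmt record is tab-separated, with the set name first, a description second, and the member genes in the remaining fields. The function name `gmt_to_tall` and the choice to emit (gene, set name) rows are assumptions made for illustration; the pipeline's actual output columns may differ.

```python
def gmt_to_tall(lines):
    """Turn gmt records (name, description, gene...) into (gene, set_name) rows.

    Hypothetical helper: the output column order is an assumption.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # no member genes on this record
        set_name, genes = fields[0], fields[2:]
        for gene in genes:
            if gene:
                rows.append((gene, set_name))
    return rows


rows = gmt_to_tall(["SET1\tdesc\tG1\tG2", "SET2\tna\tG3"])
```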
Remove records with unrecognized gene symbols from attribute files, producing new cleaned attribute files.
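The filtering step might look like the following sketch. The name `filter_known_genes` is hypothetical, and exact, case-sensitive symbol matching is an assumption; the real rule may normalize symbols before comparing.

```python
def filter_known_genes(rows, known_symbols):
    """Keep only (gene, attribute) rows whose gene symbol is recognized.

    Hypothetical helper: exact string matching is an assumption.
    """
    known = set(known_symbols)
    return [(gene, attribute) for gene, attribute in rows if gene in known]


cleaned = filter_known_genes([("G1", "A"), ("XX", "A"), ("G2", "B")], ["G1", "G2"])
```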
Remove duplicate attributes.
Convert attributes represented by gene and attribute symbols to their equivalent internal GeneMANIA ids.
Assign an internal GeneMANIA id to each attribute group.
Create (attributeid, description) pairs for all attribute ids in the cleaned input, adding empty description fields where no description is given.
Create the generic_db ATTRIBUTE_GROUPS.txt file.
Create the generic_db ATTRIBUTES.txt file, containing the name/description of each attribute in each group.
Copy attribute data files containing internal GeneMANIA ids to generic_db.
Target rule for processing direct networks.
Apply weight cleaning to direct networks, adding an implicit '1' weight if missing, and removing weights <=0.
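The weight-cleaning behaviour described above can be sketched as follows. The function name `clean_weights` and the tab-separated `gene_a, gene_b[, weight]` layout are assumptions; only the two rules (an implicit weight of 1 when missing, and dropping weights <= 0) come from the description.

```python
def clean_weights(lines, sep="\t"):
    """Return (gene_a, gene_b, weight) tuples with cleaned weights.

    Hypothetical helper: the input column layout is an assumption.
    """
    cleaned = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) < 2:
            continue  # not an interaction record
        gene_a, gene_b = fields[0], fields[1]
        # A missing weight column is treated as an implicit weight of 1.
        has_weight = len(fields) > 2 and fields[2] != ""
        weight = float(fields[2]) if has_weight else 1.0
        if weight > 0:  # drop zero and negative weights
            cleaned.append((gene_a, gene_b, weight))
    return cleaned


edges = clean_weights(["A\tB", "A\tC\t0.5", "B\tC\t0", "C\tD\t-2"])
```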
Apply GeneMANIA network normalization to direct networks.
CacheBuilder: convert interaction networks to binary engine format.
AttributeBuilder: convert attributes to binary engine format.
PostSparsifier: filter co-expression networks removing unsupported interactions.
NodeDegreeComputer: count interactions for each gene across the entire organism.
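Conceptually, the degree count is a tally of how many interactions each gene takes part in, summed over every network of the organism. The sketch below is not the NodeDegreeComputer implementation; `node_degrees` is a hypothetical name and the in-memory edge-list representation is an assumption.

```python
from collections import Counter


def node_degrees(networks):
    """Count interactions per gene across all networks.

    `networks` is an iterable of edge lists, each edge a
    (gene_a, gene_b, weight) tuple. Hypothetical helper for illustration.
    """
    degrees = Counter()
    for edges in networks:
        for gene_a, gene_b, _weight in edges:
            degrees[gene_a] += 1
            degrees[gene_b] += 1
    return degrees


degrees = node_degrees([[("A", "B", 1.0)], [("A", "C", 0.5), ("B", "C", 0.2)]])
```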
AnnotationCacheBuilder: load functional annotation data into engine binary format.
FastWeightCacheBuilder: build precomputed data structures for GO-based network weighting.
EnrichmentCategoryBuilder: build data structures for functional enrichment analysis.
DefaultNetworkSelector: select subset of co-expression networks to use as default networks.
NetworkPrecombiner: build combined networks for common queries that do not depend on the query list, such as single-gene queries.
Filter functional annotations removing unrecognized gene symbols.
Filter functional annotation categories by size for enrichment analysis.
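Filtering categories by size can be illustrated with the sketch below. The function name and the `min_size` / `max_size` parameters are hypothetical; the actual size limits used by the pipeline are not stated here.

```python
from collections import defaultdict


def filter_categories_by_size(annotations, min_size, max_size):
    """Keep categories whose number of distinct genes lies in [min_size, max_size].

    `annotations` is an iterable of (category, gene) pairs.
    Hypothetical helper: the thresholds are caller-supplied assumptions.
    """
    genes_by_category = defaultdict(set)
    for category, gene in annotations:
        genes_by_category[category].add(gene)
    return {
        category: genes
        for category, genes in genes_by_category.items()
        if min_size <= len(genes) <= max_size
    }


kept = filter_categories_by_size(
    [("GO:1", "A"), ("GO:1", "B"), ("GO:2", "A")], min_size=2, max_size=10
)
```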
Filter functional annotation categories by size and branch for BP combining.
Filter functional annotation categories by size and branch for MF combining.
Filter functional annotation categories by size and branch for CC combining.
Build generic db file ONTOLOGY_CATEGORIES.txt listing sets of functional annotations available for enrichment analysis.
Build generic db file ONTOLOGIES.txt with names of function categories for display in enrichment analysis.
Copy functional annotation data files for GO based combining to generic db.
Copy functional annotation data files for enrichment analysis to generic db.
Create flag file marking functional annotation data being ready in generic_db.
Create empty generic db file TAGS.txt; network tags are no longer supported.
Create empty generic_db file NETWORK_TAG_ASSOC.txt; network tags are no longer supported.
Create generic db file SCHEMA.txt, listing fields in each file in generic db.
Create generic db file STATISTICS.txt containing interaction total count, and dataset production date.
Create generic db file ORGANISMS.txt containing an organism's descriptive metadata, such as scientific and common names.
Target rule for creating interaction data in generic_db format.
Copy network interaction files to generic_db.
Convert raw identifiers into id/symbol/source triplets, and remove genes with unneeded biotypes.
Target rule for producing cleaned gene identifier files.
Load all identifier input files containing id/symbol/source triplets and produce a single clean file, removing duplicates and clashes.
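One way to picture the cleaning step is sketched below. The description does not define what counts as a clash, so this example assumes that a clash is a symbol mapped to more than one gene id, and that clashing symbols are dropped entirely. Both the rule and the name `clean_identifiers` are assumptions made for illustration only.

```python
from collections import defaultdict


def clean_identifiers(triplets):
    """Merge (id, symbol, source) triplets, dropping duplicates and clashes.

    Assumption: a 'clash' is a symbol that maps to more than one gene id;
    all records for such a symbol are removed.
    """
    # Drop exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(triplets))
    ids_by_symbol = defaultdict(set)
    for gene_id, symbol, _source in unique:
        ids_by_symbol[symbol].add(gene_id)
    return [t for t in unique if len(ids_by_symbol[t[1]]) == 1]


clean = clean_identifiers([
    ("1", "TP53", "Gene Name"),
    ("1", "TP53", "Gene Name"),    # exact duplicate
    ("1", "P04637", "Uniprot ID"),
    ("2", "ABC", "Gene Name"),
    ("3", "ABC", "Synonym"),       # same symbol, different gene id
])
```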
Create table containing descriptions for only the clean gene symbols.
Create generic db file NODES.txt, containing an id record for each unique gene (not symbol) in the system.
Create generic db file GENES.txt containing all recognized identifier symbols.
Create generic db file GENE_DATA.txt containing gene descriptions.
Create generic db file GENE_NAMING_SOURCES.txt enumerating all identifier source types such as Entrez ID, etc.
Target rule for constructing Lucene index files.
Build a config file in format required by index construction program.
Build Lucene index from generic db files containing organism, network, attribute, and functional annotation metadata.
Target rule constructing a table containing network metadata.
Combine metadata from individual network config files into a single tabular file.
Set default values where no network metadata was provided.
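Filling in defaults amounts to overlaying whatever metadata was provided on top of a table of fallback values. In this sketch the field names (`group`, `default_selected`) and the helper name `apply_defaults` are hypothetical, and empty strings are assumed to mean "not provided".

```python
def apply_defaults(record, defaults):
    """Return a copy of `record` with missing or empty fields filled from `defaults`.

    Hypothetical helper: treating "" and None as missing is an assumption.
    """
    filled = dict(defaults)
    filled.update({k: v for k, v in record.items() if v not in (None, "")})
    return filled


row = apply_defaults(
    {"name": "Net1", "group": ""},
    {"group": "other", "default_selected": "0"},
)
```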
Target rule for computing interaction stats for all individual networks.
Compute interaction stats for individual networks.
Target rule for constructing a table combining all individual network stats.
Combine individual network stats files into a single table.
Create an empty pubmed data cache file if none exists.
Retrieve publication metadata from pubmed, where available. Create a new extended network metadata file adding the required fields from pubmed.
Compute network names from publication metadata, if not given explicitly. Apply network name deduplication by adding letters 'A', 'B', etc. to networks with the same name and network group.
Incorporate network interaction counts into network metadata.
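The network name deduplication described above (letters 'A', 'B', and so on for networks sharing a name and group) can be sketched as follows. The helper name `dedup_names`, the space before the letter, and the example names are assumptions; the sketch also only handles up to 26 networks sharing one name.

```python
from collections import Counter
from string import ascii_uppercase


def dedup_names(networks):
    """Append 'A', 'B', ... to names that repeat within the same group.

    `networks` is a list of (group, name) pairs; returns the new names in
    the same order. Hypothetical helper: the suffix format is an assumption.
    """
    totals = Counter(networks)
    seen = Counter()
    names = []
    for key in networks:
        _group, name = key
        if totals[key] > 1:
            names.append(f"{name} {ascii_uppercase[seen[key]]}")
            seen[key] += 1
        else:
            names.append(name)  # unique within its group, left unchanged
    return names


names = dedup_names([
    ("coexp", "Smith-2010"),
    ("coexp", "Smith-2010"),
    ("pi", "Smith-2010"),
    ("coexp", "Lee-2012"),
])
```

Names are only compared within a network group, so the same name in a different group is left alone.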
Create the generic db file NETWORKS.txt, listing all networks.
Create generic db file NETWORK_GROUPS.txt.
Create generic db file NETWORK_METADATA.txt, containing publication references and descriptive data for each network.
Target rule for interaction networks created from profile data.
Convert profiles to networks via Pearson correlation.
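To illustrate the idea, the sketch below computes the Pearson correlation between every pair of gene profiles and keeps pairs above a cutoff as weighted edges. The function names and the `threshold` parameter are hypothetical; the pipeline's own rule for which correlations to keep is not described here and may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def profiles_to_network(profiles, threshold=0.0):
    """Build (gene_a, gene_b, r) edges for pairs with r > threshold.

    `profiles` maps gene -> list of measurements. The threshold rule is an
    assumption used only to keep the example small.
    """
    edges = []
    for (gene_a, pa), (gene_b, pb) in combinations(sorted(profiles.items()), 2):
        r = pearson(pa, pb)
        if r > threshold:
            edges.append((gene_a, gene_b, r))
    return edges


edges = profiles_to_network({"G1": [1, 2, 3], "G2": [2, 4, 6], "G3": [3, 2, 1]})
```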
Apply GeneMANIA network normalization to networks created from profile data.
Target rule for interaction networks created from shared neighbour profile data.
Convert shared neighbour profiles to networks.
Apply GeneMANIA network normalization to networks created from shared neighbour profiles.