
Custom Made Modules

A_hash_file.py - script that hashes the second column using the first column as key
B_hash_mRNA_IDs.py - returns a hash of unique mRNA IDs
C_loadFasta.py - script to load fasta sequences
D_longest_fasta_sequence_header.py - script that returns the header of the longest sequence
E_get_chr_size_gff3.py - script takes a gff3 file and returns max position for each chromosome
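
The idea behind A_hash_file.py can be sketched roughly as follows. This is only an illustration: the function name `hash_file`, the tab delimiter, and the choice to let later duplicate keys overwrite earlier ones are assumptions, not taken from the module itself.

```python
import io


def hash_file(handle, sep="\t"):
    """Return a dict mapping column 1 to column 2 (hypothetical helper)."""
    table = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) < 2:
            continue  # skip lines without a second column
        # Assumption: a repeated key keeps the value seen last.
        table[fields[0]] = fields[1]
    return table


# Small in-memory example instead of a real file.
example = io.StringIO("gene1\tchr1\ngene2\tchr3\n")
lookup = hash_file(example)
print(lookup["gene2"])
```

The same function works on a real file by passing an open file handle in place of the `io.StringIO` object.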

smallRNA Clustering Scripts

4a.py - calculating clusters based on genotype (input IGV file)
4a1.r - for plotting results from previous step
4b.py - find cluster regulation and patterns based on size
4c.py - take the clusters through the inter-/intragenic analysis
4d.py - calculating regulated sequences in the cluster

FASTA Handlers

5.py - make fasta files
5b.py - take out mapping positions from igv file using fasta file containing sequences
6.py - a script for taking out cDNAs from transcripts file i.e. MG20 file
7.py - a script for any fasta file which looks for a pattern and returns its count and the probability of it occurring at random

miRNA Mapping

8a.py - script for replacing 'U' with 'T'
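
A minimal sketch of the U-to-T replacement, assuming FASTA input where header lines start with '>' and should stay untouched. The function name `rna_to_dna` and the example record are made up for illustration; 8a.py itself may handle its input differently.

```python
def rna_to_dna(lines):
    """Yield FASTA lines with U replaced by T in sequence lines only."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # headers are passed through unchanged
        else:
            yield line.replace("U", "T").replace("u", "t")


# Hypothetical record used only for this example.
records = [">example_miRNA", "UGAGGUAGUAGGUUGUAUAGUU"]
converted = list(rna_to_dna(records))
print("\n".join(converted))
```

Leaving headers alone matters because miRNA identifiers can themselves contain the letter U.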

Genome Gap Filling Simulation

9.sh - script for gap filling project
9a.py - for taking out rep element
9b.py - for replacing genome by N
9c.py - for taking out sequence where only one read is mapped
9d.py - take out the rep elements and put these in the genomic region

Gap Filling Real Data

10a.py - take out N-region from the ref_genome
10b.py - remove any additional N-region which might be present in 10Kb flanking region
10c.py -
10q.py - calculating insert_size/distances
10r.py - check if you have all elements
10s.py - make reverse complement multi-fasta
10t.py - make summary output for elements
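
The reverse-complement step listed for 10s.py could look something like the sketch below. The helper names `read_fasta` and `reverse_complement` are assumptions, and only the bases A, C, G, T and N (upper or lower case) are translated; other characters are left unchanged.

```python
# Translation table for the supported bases; anything else is left unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (header, sequence) pairs from multi-FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


fasta = [">seq1", "ATGC", "NN", ">seq2", "ggcc"]
result = [(h, reverse_complement(s)) for h, s in read_fasta(fasta)]
for header, seq in result:
    print(">" + header)
    print(seq)
```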

ShortRan Scripts

12.sh - /plant/2011_week37/
12a.py - /plant/2011_week40/20111004/
12b.py - for taking out sequences with particular pattern
12c.py - make fasta file from profile
12d.py - replace U by T in miRNA database
12e.py - for counting mismatches
12f.py - for miRNA mapping from counting
12g.py - cluster predictions
12h.py - genome region analysis cluster by cluster or position by position
12i.py - for counting regulated sequences in cluster
12j.py - for counting unique size sequences and plotting the distribution from profile files
12k.py - for counting unique size sequences and plotting the distribution from cluster files
12l.py - for making profile from fastq files
12m.py - remove reads which were mapped on repeats
12n.py - for normalizing profiles
12o.py - generate a text file with 0's of size total_no_of_libraries x total_no_of_libraries
12p.py - for plotting expression data between two sets
12r.py - for making sql batch script
12s.py - add annotation to the sequences
12t.py - script for making a MySQL add-column batch file
12u.py - script that makes a fasta file with the abundance in the header, in the format Sequence-xAbundance
12v.py - script for making igv files for clustering and visualization
12w.py - script for filtering reads based on score
12x.py - script for making chromosome length file for miRNA predictions
12y.py - script for making file compatible for tasiRNA predictions
12z.py - script for taking out clusters of ta-siRNAs
12aa.py - script for combining sequences for profile with the mapped genomic sequences which contain genomic annotation
12ab.py - script for making header for the table with library wise abundances
12ac.py - script for making table the library wise abundances
12ad.py - script for plotting abundances
12ae.py - python script for replacing the libraries header
12af.py - miRNA to MySQL database
12ag.py - tasi-RNA to MySQL database
12ah.py - add length column to the MySQL database
12ai.py - make a file with unique abundances
12ai.py -
12aj.py - script for splitting the fastq files by size
12ak.py - make artificial adapters
12al.py - parse the mirdeep 2 output to the mysql supported output
12am.py - script to find the miRNA sequences in profiles - /Users/vgupta/Desktop/script/python
12an.py - script to add an identifier based non-redundancy - /Users/vgupta/Desktop/script/python
mysql_batch - for saving file into the mysql database

gapfillRE

13_20110929_gapfillRE.sh - shell script processing other python scripts data
13_20110929_positive_control.sh - for running the positive control; slightly different, as the input comes from blast
13a.py - for filtering of reads where both ends map to rep elements
13b.py - for making reference compatible, i.e. adding headers, removing small letters
13c.py - for taking out all gap positions
13d.py - take out genomic sequence with flanking regions
13e.py - remove additional N regions around targeted gap
13f.py - take out hanging reads mapping on the flanking region
13g.py - filter out pairs mapped to the flanking region
13h.py - filter out directional reads i.e. for 5'&3', 5', 3'
13i.py - take out the top four candidates suitable for replacing the gap and make a score table
13j.py - reporting for gap regions which have no appropriate rep element for gap
13k.py - pick out best possible element from scores
13l.py - print final list of elements with score
13m.py - count correctly inserted elements(only for positive controls)
13n.py - correct sequence names in fasta file (remove everything after spaces), which cause problems when mapping
13o.py - for taking out a particular fasta sequence
13p.py - script to remove pair mapped
13q.py - take out all the contigs already placed in the pseudomolecule
13r.py - add length of the contigs
13s.py - add distances from 5 prime and 3 prime ends

Bacterial Genome Project With Niels

14.py - for finding a gene in many genomes
14b.py - for finding a gene in many genomes using blast for unannotated genomes
14c.py - script for taking a list of genes and concatenating these by species.

Genome-wide Signatures

15.py - script to process the genome wide signature
15a.py - script to add length and relevant columns

Svend's Data

16.py

SpearmanRank

spr.py - open a file and calculate the Spearman coefficient between all columns
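
For orientation, a Spearman coefficient between every pair of columns can be computed as the Pearson correlation of the ranks. The sketch below is a plain-Python illustration, not the code in spr.py; the column names, the in-memory data, and the use of average ranks for ties are assumptions.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient as Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical columns standing in for the columns of an input file.
columns = {
    "lib1": [1.0, 2.0, 3.0, 4.0],
    "lib2": [10.0, 20.0, 30.0, 45.0],
    "lib3": [4.0, 3.0, 2.0, 1.0],
}
pairwise = {
    (a, b): spearman(columns[a], columns[b])
    for a, b in combinations(sorted(columns), 2)
}
for pair, rho in pairwise.items():
    print(pair, round(rho, 3))
```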

Counting Corrected Reads

correct_read.py - count reads that have been corrected by ECHO

Making Patterns '/_' For Regulations

make_patterns.py - making patterns '/_' for regulations

Using R From Python

18_plot_sv.py - for plotting results obtained from BreakDancer

Yasu's Data

19_filter_markers.py - for filtering positions with the markers and storing these
19_merge_marker.py - script for merging different files based on some columns
19_remove_marker_positions.py - script for removing the existing markers and keeping only new SNPs

28 Accession Data

20_compare_fq_mapping.py - script for comparing read-1 and read-2 mapped files to same reference
20_divide_on_adaptors.py - script for dividing fastq file based on different adapters (demultiplexing)
20_trim_reads.py - script for trimming the fastq reads and quality scores

Fastq Script Kit

20_compare_fq_mapping.py - script for comparing read-1 and read-2 mapped files to same reference
20_divide_on_adaptors.py - script for dividing fastq file based on different adapters (demultiplexing)
20_trim_reads.py - script for trimming the fastq reads and quality scores
20_compare_fq.py - script for counting common reads in two fastq files
20d_count_mapped_fastq_inSam.py - script for counting the reads mapped
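
Trimming reads and quality scores together, as described for 20_trim_reads.py, comes down to slicing lines 2 and 4 of each record with the same coordinates. The sketch below assumes strict four-line FASTQ records (no wrapped sequences); the function name `trim_fastq` and its `start`/`end` parameters are illustrative, not the script's actual interface.

```python
def trim_fastq(lines, start=0, end=None):
    """Yield (header, seq, plus, qual) with seq and qual sliced to [start:end]."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            # The same slice is applied to both so they stay the same length.
            yield header, seq[start:end], plus, qual[start:end]
            record = []


reads = ["@read1", "ACGTACGTAC", "+", "IIIIIHHHHG"]
trimmed = list(trim_fastq(reads, start=2, end=8))
for rec in trimmed:
    print("\n".join(rec))
```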

Function For Making Filtered Fastq File

17_20120212_filter_fastq.py - /Users/vikas0633/Desktop/plant/2012_week7

Genomic Toolkit

21a_remove_chacters.py - this script removes any character other than ATGCN
21b_better_header.py - this script keeps only the 4th field separated by '|'
21c_add_1_start.py - this script can add +1 to start position in a fasta file
21d_take_out_gene.py - this script takes out a sequence from fasta file given correct header name
21d_take_out_gene_list_headers.py - this script takes out a sequence from fasta file given correct header names in a file
21e_gff2gtf.py - this script converts gff, gff3 format to gtf format
gtf_to_gff.pl - this script converts gtf to gff3 format
81_parse.pl - script for calculating N50 value
21f_merge_two_files.py - script for merging two files based on given columns
21g_para_gtf.py - script for calculating exon/intron/transcripts lengths
gff_convert.pl - script for inter-converting different gff formats
intersection of gene models - bedtools intersect
21h_calculate_seq_len.py - take a fasta file and print sequences in decreasing length
21h_plot_seq_len.py - take a fasta file and plot sequence length
21i_RMoutput2GTF.py - take a tab-formatted RMoutput file and parse it to make a gtf file
21j_orf2fasta.py - script takes fasta file and output from orffinder and take out sequences with the orfs
21k_make_input4_glimmerHMM.py - this script takes a gene structure file (gff/gtf) and makes an exon file parsable by glimmerHMM
gff_to_genbank.py - Convert a GFF and associated FASTA file into GenBank format
21l_pileup2GTF.py - script converts a pileup to a gtf file based on the coverage
21m_gff2genestru.py - script creates input for gb format conversion script
21n_overlap_gff.py - takes two or more gff files and merges them where there is an overlap
21n_intersect_gff.py - takes two or more gff files and merges them where there is an intersection
21o_extract_seq_model.py - script takes out sequences/GTF models from given co-ordinate
21p_filter_fasta.py - script to filter fasta file based on the length of the sequences
21q_combine_GTF.py - This is the script for combining various annotations files
21r_make_CDS.py - script to create CDS file from fasta (containing exon sequences generated by bedtools) and GTF/GFF3 file
21s_summary_eval.py - script for summarizing eval output
21t_tau.py - script to add ORF to the gff file
21u_make_gff2.py - script makes gff2 file for the TAU input, same as Stig's 26_parse.pl
21v_format_gff3.py - script to format gff3 file in order to put in MySQL table
21v2_format_gff3.py - script to format gff3 file in order to put in MySQL table, sequel to 21v
21w1_format_fasta.py - fasta file has duplicate entries
21w1_format_orthoMCL.py - format OrthoMCL output
21x_exon_repeat.py - find the exon-repeat overlap
21y_strand_fasta.py - script takes a GFF3 file and correct fasta file if minus strand
21z_foramt_IPR.py - script takes raw output from IPRScan and makes non-redundant gene_ID\annotation
21aa_countMShit_in_GFF.py - script to count the unique MS supported genes
21ab_split_gff.py - script to split sorted GFF file based on contig/sequence/chromosome name
21ac_addType.py - script to add gene type
21ad_makebed.py - script to make bed format file from the given column names
21ae_correct_UTR.py - script to correct the UTR co-ordinates
21af_format_protein_list_headers.py - script to get the corresponding headers between corrected and real fasta file
21ag_cal_CSD_gene_overlap.py - script to calculate the CDS vs gene overlap
21ah_find_longest_isoform.py - script was made for finding longest isoform in the spider protein set
21ah_count_N_between_genes.py - script to count Ns between the genes
21ai_modify_gene_names.py - script to modify gene names based on N counts
21aj_add_mRNA.py - script to add dummy mRNAs if absent
21ak_remove_redundant.py - script to remove the redundant node gene models
21al_correct_strand.py - this script takes strand from CDS and assigns the same to mRNA, exons and UTRs, GFF3 files
21am_update_GFF3_fasta.py - this script updates GFF3 and fasta given a different file
21an_hash_MySQLid.py - this script makes a 2 column table one with Id and another with yes/no
21ak_update_GFF3_IDsOnly.py - this script takes a two-column ID file and replaces these in the GFF3 file
21ao_keep_fasta_ifGFF3.py - script to throw out excessive sequences in fasta file
21ap_TranscriptSummary.py - Summarizes GFF3 transcript-wise
21aq_addGeneStrand.py - Adds the strand to the gene based on the mRNAs strands
21ar_findLongestIsoform_GFF3.py - Find the longest isoform for each gene in a gff3 file
21as_calc5primeCdnaDistance.py - calculate 5' distances of insertions
21at_FindLongestProtein.py - Finds Longest Protein
21au_trim3primeCDS.py - trims the 3 prime ends of CDS
21aw_CallFractionexon.py - calculate the callable fraction on the genome
21ax_LongestProteinCodingIsoform.py - Find the longest protein coding isoform
21ay_countFixDifference.py - Script for counting the Fix Differences in Population genetics
21az_addNRanno.py - Script for adding blast annotations from NR database
21b_better_header.py - Script to fix the fasta headers
21ba_getGeneBasedAlign.py - Script to calculate the gene alignment length from the MAF output
21bb_getGeneBasedAlignLength.py - Script to calculate the gene alignment length from the MAF output
21bc_GenotypicDistance.py - Script to calculate the genotypic distance from the VCF format file
21bd_summerizeArrayData.py - Script to summarize the array data
21be_bMakeHeatMap.py - Script to make the heatmap
21bf_ortho2fasta.py - Script to transfer the ortholog groups to fasta files
21bg_find_fragmented_genemodels.py - Script to find fragmented genemodels
21bi_search_blast.py - Script to blast a list of genes against a database and back
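
Among the entries above, 81_parse.pl is listed as calculating the N50 value. As a reference for what that statistic means, here is a short Python sketch (not a port of the Perl script): N50 is the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


# Hypothetical contig lengths; total is 54, so half is 27.
contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
value = n50(contig_lengths)
print(value)
```

With these lengths the running sum goes 10, 19, 27, so the contig of length 8 is the one that reaches half of the total.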

Transcripts Handlers

22a.py - script for parsing tophat/cufflink generated GTF files against a target (-G cufflink) annotation file
22b.py - normalize transcript profile table
22c.py - script for making plots from profile tables generated using 22b
22d.py - add profile tables to MySQL
22e.py - add annotations to profiles using fasta files
22f.py - add annotations to profiles using two column formatted file
22g.py - script to get pattern frequency from a profile table given a regulation, abundance and score cut-off
22h.py - script for finding complementary pattern between small RNAs and transcripts

MYSQL

23a_mysql_header.py - script for making headers for mysql tables

Blast

24a_filter_blast.py - script for filtering blast results

Python Plots

25a_plot_gene_freq.py - script for plotting gene frequencies across each chromosome

MirDeepP Summary

26_summary_mirDeepP.py - script for taking all the outputs from mirDeepP and putting it together

Spider Project

27_summary_MS_hit.py - script for processing the MS hit text file
27_foramt_fasta_spider.py - script to format the fasta headers according to Thomas's explanations
27_TranscriptsOnScaffold.py - Script to extract all the transcripts on given scaffolds

UNC RNA-seq project

28a_obo_parser.py - script to parse the obo file from geneontology.org
28b_MSU_RAP_ids.py - MSU id parser
28c_gff3_validator.py - Script to validate a gff3 file

29. snpEff data analysis

29a_MakeGeneWideTable.py - script to put the snpEff data together
29b_MakeGeneWideTableUnique.py - script to summarize snpEff data

30. Degradome data analysis

30a_count_5prime_stacks.py - script for counting 5' degradome mappings from BAM file

GABox specific

31a_reformat_gff3.py - script to replace the ref column of gff3 by priority
31b_combine_GTF.py - This is the script for combining various annotations files
31c_TAU.py - script to add ORF to the gff3 file
31d_modify_gene_names.py - script to modify gene names based on N counts
31e_ReplaceWithLongerCodingRegion.py - Script to find the longest protein coding evidence with overlapping exons
31e_2_ReplaceWithLongerCodingRegion.py - Script to find the longest protein coding evidence with overlapping exons
31f_get_CuffBasedGenemodels - script to extract cufflinks based genemodels
31g_MakeGeneModelTable.py - same as 21v2_format_gff3.py
31h_add_FeatureType.py - script to modify GFF3 second column
31i_FixBoundries.py - script to modify GFF3 feature boundaries

General Scripts

100_intersect_columns.py - script to find non-overlapping entries between the two columns
21ab_split_gff.py - script to split sorted GFF file based on contig/sequence/chromosome name
101_filter_fastq_len.py - script to filter a fastq file based on read length
102_flat2fasta_anno.py - script to make fasta file from the MySQL output
103_sort_gff_blocks.py - script to sort GFF3 file blocks
104_intersect_files_column.py - script to print the desired columns given keys from the files
105_match_IDs_from_2gff3_files.py - script will take two gff3 files and print out the corresponding mRNA IDs
106_filter_out_against_genelist.py - this script will filter out the genes which are in the list
107_ParameterGFF3.py - Script to calculate Gene, mRNA, exon, CDS count, Total length and average length
108_filterExactOverlapGFF3.py - script to filter overlapping start/end genemodels
109_AddPhaseGFF3.py - script to add Phase
110_getGene.py - script take a list of genes and extracts the genemodels from the GFF3 file
111_blastoutput_parser.py - script to parse blast output and return a table
112_iprscanout_parser.py - script to parse IPRScan output and return a table
113_validate_GFF3.py - GFF3 validation script
114_validate_Fasta.py - Fasta validation script
115_MapFastq.py - Script to Map Fastq files
116_runGATK.py - script to run GATK analysis
117_addReadGroup.py - script to add readgroup in sam or bam file
118_gaps2bed.py - script takes a fasta file and creates a bed file with gap co-ordinates
129_splitIPR.py - script to split IPR file
130_shuffle_header.py - script to shuffle header of a given file
131_replace_values.py - script to replace all the values in a row
132_translateDNA.py - script to convert DNA to protein
133_snp_genomic_annotation.py - script to calculate the snps genomic distribution
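
To illustrate the kind of conversion 118_gaps2bed.py describes, the sketch below reports runs of N in a sequence as BED intervals. It assumes gaps are runs of 'N' or 'n' and uses BED's 0-based, half-open coordinates; the function name `gaps_to_bed` and the example sequence are hypothetical, and the real script may define gaps differently.

```python
import re


def gaps_to_bed(name, sequence):
    """Return (name, start, end) tuples for each run of N/n in the sequence."""
    # match.start()/match.end() are already 0-based and half-open, like BED.
    return [(name, m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


bed_rows = gaps_to_bed("chr1", "ACGTNNNNACGTnnAC")
for chrom, start, end in bed_rows:
    print(f"{chrom}\t{start}\t{end}")
```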

vikas0633/python

Folders and files

Latest commit

History

Repository files navigation

Custom Made Modules

smallRNA Clustering Scripts

FASTA Handlers

miRNA Mapping

Genome Gap Filling Simulation

Gap Filling Real Data

ShortRan Scripts

gapfillRE

Bactrial Genome Project With Niels

Genome-wide Signatures

Svend's Data

SpearmanRank

Counting Corrected Reads

Making Patterns '/_' For Regulations

Using R From Python

Yasu's Data

28 Accession Data

Fastq Script Kit

Function For Making Filtered Fastq File

Genomic Toolkit

Transcripts Handlers

MYSQL

Blast

Python Plots

MirDeepP Summary

Spider Project

UNC RNA-seq project

29. snpEff data analysis

30. Degradome data analysis

GABox specific

General Scripts

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages