Sequence
A Sequence in Eoulsan is an object that contains the following information:
- A numerical id
- A name
- A description
- An alphabet
- A sequence
The Javadoc of the Sequence class is available here.
// Create a sequence object with default values (id=0, string values are null and alphabet is AMBIGUOUS_DNA_ALPHABET)
Sequence s1 = new Sequence();
// Create a sequence and set the id, name and sequence
Sequence s2 = new Sequence(1, "chr1", "ATGCATGC");
// Here we also add the description
Sequence s3 = new Sequence(1, "chr1", "ATGCATGC", "a test sequence");
There are getters and setters for all the members of the class (id, name, description, alphabet and sequence). The default setters do not check the value being set. For string values, the value set is always trimmed.
Sequence s = new Sequence();
s.setId(5);
int id = s.getId(); // 5
s.setId(-14785);
id = s.getId(); // -14785
s.setName("chr2");
String chr = s.getName(); // chr2
There are two special setters that check the value before changing the corresponding member of the class.
boolean r = s.setNameWithValidation(null); // false
r = s.setNameWithValidation(" "); // false
r = s.setNameWithValidation("chr1"); // true
r = s.setSequenceWithValidation("ATAGQ"); // false with AMBIGUOUS_DNA_ALPHABET_LETTERS
r = s.setSequenceWithValidation("ATAGA"); // true with AMBIGUOUS_DNA_ALPHABET_LETTERS
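The difference between a plain setter and a validating setter can be sketched as follows. This is a minimal illustration, not the Eoulsan code: the `NamedRecord` class is hypothetical, and leaving the current value untouched when validation fails is an assumption.

```java
// Hypothetical stand-in for illustration only; not the Eoulsan Sequence class.
final class NamedRecord {

  private String name;

  // Plain setter: trims the value but performs no validation.
  void setName(String name) {
    this.name = name == null ? null : name.trim();
  }

  // Validating setter: rejects null or blank names and (by assumption)
  // leaves the current value untouched when the check fails.
  boolean setNameWithValidation(String name) {
    if (name == null) {
      return false;
    }
    String trimmed = name.trim();
    if (trimmed.isEmpty()) {
      return false;
    }
    this.name = trimmed;
    return true;
  }

  String getName() {
    return name;
  }
}
```

Returning a boolean lets the caller decide how to react to an invalid value without having to catch an exception.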
The alphabet system of the Sequence object handles several IUPAC alphabets and a read alphabet with the letters "ATGCN". The default alphabets are available in the Alphabets class:
Alphabets.AMBIGUOUS_DNA_ALPHABET
Alphabets.UNAMBIGUOUS_DNA_ALPHABET
Alphabets.AMBIGUOUS_RNA_ALPHABET
Alphabets.UNAMBIGUOUS_RNA_ALPHABET
Alphabets.READ_DNA_ALPHABET
s.setAlphabet(Alphabets.AMBIGUOUS_DNA_ALPHABET);
To get the reverse complement of a sequence, use the reverseComplement() method:
Sequence s = new Sequence();
s.setSequence("ATGC");
String seq1 = s.getSequence(); // ATGC
s.reverseComplement();
String seq2 = s.getSequence(); // GCAT
// Static variant that works directly on strings
String input = "ATGC";
String output = Sequence.reverseComplement(input, Alphabets.AMBIGUOUS_DNA_ALPHABET); // GCAT
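For readers unfamiliar with the operation, a reverse complement can be sketched in a few lines. The `ReverseComplementSketch` class below is hypothetical and only handles the letters A, T, G, C and N; it does not reproduce how Eoulsan supports the full IUPAC alphabets.

```java
// Illustrative sketch only; not the Eoulsan implementation.
final class ReverseComplementSketch {

  // Complement of one base; only A, T, G, C and N are handled here.
  static char complement(char base) {
    switch (base) {
      case 'A': return 'T';
      case 'T': return 'A';
      case 'G': return 'C';
      case 'C': return 'G';
      case 'N': return 'N';
      default:
        throw new IllegalArgumentException("Unsupported base: " + base);
    }
  }

  // Walk the input from its last base to its first, appending complements.
  static String reverseComplement(String sequence) {
    StringBuilder sb = new StringBuilder(sequence.length());
    for (int i = sequence.length() - 1; i >= 0; i--) {
      sb.append(complement(sequence.charAt(i)));
    }
    return sb.toString();
  }
}
```

With this sketch, `ReverseComplementSketch.reverseComplement("ATGC")` returns `"GCAT"`.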
The Sequence class defines a validate() method that checks whether the name of the sequence is valid (not null and length > 0) and whether the sequence string is valid (not null, length > 0 and all the characters of the sequence allowed by the alphabet of the sequence).
new Sequence(0, "seq1", "ATCG").validate(); // true
new Sequence(0, "seq1", "").validate(); // false
new Sequence(0, "seq1", "ATCQ").validate(); // false
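The checks described above can be sketched as a single function. This is an illustration under assumptions: the `ValidationSketch` class is hypothetical, and the alphabet is represented as a plain string of allowed letters (here the IUPAC ambiguous DNA codes) rather than as an Eoulsan alphabet object.

```java
// Illustrative sketch only; not the Eoulsan implementation.
final class ValidationSketch {

  // IUPAC ambiguous DNA codes, used here as an assumed stand-in alphabet.
  static final String AMBIGUOUS_DNA_LETTERS = "ACGTRYSWKMBDHVN";

  static boolean isValid(String name, String sequence, String allowedLetters) {
    // The name must be non-null and non-empty.
    if (name == null || name.isEmpty()) {
      return false;
    }
    // The sequence must be non-null and non-empty.
    if (sequence == null || sequence.isEmpty()) {
      return false;
    }
    // Every character must belong to the alphabet.
    for (int i = 0; i < sequence.length(); i++) {
      if (allowedLetters.indexOf(sequence.charAt(i)) < 0) {
        return false;
      }
    }
    return true;
  }
}
```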
You can get the length of a sequence with the length() method:
Sequence s = new Sequence();
s.setSequence("ATGC");
s.length(); //4
You can also extract a part of the sequence:
Sequence s = new Sequence();
s.setSequence("ATGCATGC");
s.subSequence(2,4); // Note that index starts at 0
s.getSequence(); // GCA
And concatenate sequences:
Sequence s1 = new Sequence();
Sequence s2 = new Sequence();
s1.setSequence("AATT");
s2.setSequence("GGCC");
Sequence s3 = s1.concat(s2);
s3.getSequence(); // AATTGGCC
Convert a sequence to a fasta string:
Sequence s = new Sequence(1, "myseq", "ATGCATGCATGC");
System.out.println(s.toFasta());
>myseq
ATGCATGCATGC
System.out.println(s.toFasta(4));
>myseq
ATGC
ATGC
ATGC
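Line wrapping in FASTA output can be sketched as below, assuming the integer argument is the maximum number of bases per line. The `FastaFormatSketch` class and its signature are hypothetical and are not taken from the Eoulsan sources.

```java
// Illustrative sketch only; not the Eoulsan implementation.
final class FastaFormatSketch {

  // Build a FASTA record, breaking the sequence every `width` characters.
  static String toFasta(String name, String sequence, int width) {
    if (width < 1) {
      throw new IllegalArgumentException("width must be positive: " + width);
    }
    StringBuilder sb = new StringBuilder();
    sb.append('>').append(name).append('\n');
    for (int pos = 0; pos < sequence.length(); pos += width) {
      // The last line may be shorter than the requested width.
      int end = Math.min(pos + width, sequence.length());
      sb.append(sequence, pos, end).append('\n');
    }
    return sb.toString();
  }
}
```

Under this assumption, a 12-base sequence gives three full lines at width 4, and two full lines plus a 2-base remainder at width 5.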
Parsing a fasta sequence:
Sequence s = new Sequence();
s.getName(); // null
s.getSequence(); // null
s.parse(">myseq\nATGCATGCATGC\n");
s.getName(); // myseq
s.getSequence(); // ATGCATGCATGC
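What parsing a single FASTA record involves can be sketched as follows. The `FastaParseSketch` class is hypothetical: it returns the name and the sequence as a two-element array, whereas the real parse() method fills in the Sequence object itself.

```java
// Illustrative sketch only; not the Eoulsan implementation.
final class FastaParseSketch {

  // Returns {name, sequence}, or null when the text is not a FASTA record.
  static String[] parse(String fasta) {
    if (fasta == null || !fasta.startsWith(">")) {
      return null;
    }
    String[] lines = fasta.split("\n");
    // The header line holds the name after the leading '>'.
    String name = lines[0].substring(1).trim();
    // The remaining lines are joined to rebuild a wrapped sequence.
    StringBuilder seq = new StringBuilder();
    for (int i = 1; i < lines.length; i++) {
      seq.append(lines[i].trim());
    }
    return new String[] {name, seq.toString()};
  }
}
```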
The Sequence class also contains other utility methods:
getTm()
getGCPercent()
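As an indication of what a GC content computation involves, here is a minimal sketch. The `GcContentSketch` class is hypothetical, and expressing the result on a 0 to 100 scale is an assumption; the scale and return type of getGCPercent() should be checked in the Javadoc.

```java
// Illustrative sketch only; not the Eoulsan implementation.
final class GcContentSketch {

  // Percentage (0 to 100) of G and C bases; 0 for a null or empty sequence.
  static double gcPercent(String sequence) {
    if (sequence == null || sequence.isEmpty()) {
      return 0.0;
    }
    int gc = 0;
    for (int i = 0; i < sequence.length(); i++) {
      char c = Character.toUpperCase(sequence.charAt(i));
      if (c == 'G' || c == 'C') {
        gc++;
      }
    }
    return 100.0 * gc / sequence.length();
  }
}
```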