Addition of a Lazy Loading Sequence Parser to Biopython’s SeqIO Package
by Evan Parker for Open Bioinformatics Foundation
Biopython’s SeqIO package parses sequence files, from the popular FASTA format to heavily annotated formats such as the GenBank flat file format. Currently, SeqIO parses each record completely before returning a sequence record object. An indexing, lazy-loading parser would instead note where each record sits in the file and parse only the parts that are actually requested, allowing more efficient use of large sequence files such as whole chromosomes or entire genomes.
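
To make the idea concrete, here is a minimal sketch of indexing plus lazy loading for FASTA files, using only the Python standard library. It is not the proposed implementation and not Biopython's API: the names `build_fasta_index` and `LazyFastaRecord` are hypothetical, and the sketch assumes plain ASCII FASTA in which every record starts with a `>` header line whose first token is the record ID.

```python
import os
import tempfile


def build_fasta_index(path):
    """Hypothetical helper: map each record ID to the byte offset of its header."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break  # end of file
            if line.startswith(b">"):
                # Assumption: the ID is the first whitespace-delimited token.
                tokens = line[1:].split()
                record_id = tokens[0].decode("ascii") if tokens else ""
                index[record_id] = offset
    return index


class LazyFastaRecord:
    """Hypothetical record that reads its sequence only on first access."""

    def __init__(self, path, record_id, offset):
        self.path = path
        self.id = record_id
        self._offset = offset
        self._seq = None  # nothing has been read from disk yet

    @property
    def seq(self):
        if self._seq is None:
            parts = []
            with open(self.path, "rb") as handle:
                handle.seek(self._offset)
                handle.readline()  # skip this record's header line
                for line in handle:
                    if line.startswith(b">"):
                        break  # the next record begins here
                    parts.append(line.strip())
            self._seq = b"".join(parts).decode("ascii")  # cache for reuse
        return self._seq


# Usage: write a tiny two-record FASTA file, index it, then load one record.
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">seq1 first\nACGT\nTTAA\n>seq2\nGGCC\n")
    fasta_path = tmp.name

index = build_fasta_index(fasta_path)
record = LazyFastaRecord(fasta_path, "seq2", index["seq2"])
sequence = record.seq  # the seek and read happen only here
os.remove(fasta_path)
```

The index pass reads the file once but stores only offsets, so memory use grows with the number of records rather than with their total length. A real parser for annotated formats such as GenBank would also need to index features within a record, which this sketch does not attempt.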