Biopython‎ > ‎Examples‎ > ‎

Read/write fasta

Biopython - read and write a fasta file


from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

file_in ='gene_seq_in.fasta'
file_out='gene_seq_out.fasta'

with open(file_out, 'w') as f_out:
    for seq_record in SeqIO.parse(open(file_in, mode='r'), 'fasta'):
        # remove .id from .description record (remove all before first space)
        seq_record.description=' '.join(seq_record.description.split()[1:])
        # do something (print or edit seq_record)       
        print('SequenceID = '  + seq_record.id)
        print('Description = ' + seq_record.description + '\n')
        # write new fasta file
        r=SeqIO.write(seq_record, f_out, 'fasta')
        if r!=1: print('Error while writing sequence:  ' +
seq_record.id)

SeqIO read & write

for each sequence, for example
>gi|16445223|ref|NC_002655.2|:1353261-1353530 stx2B Shiga toxin 2 subunit B
ATGAAGAAGATGTTTATGGCGGTTTTATTTGCATTAGCTTCTGTTAATGCAATGGCGGCG
GATTGTGCTAAAGGTAAAATTGAGTTTTCCAAGTATAATGAGGATGACACATTTACAGTG
AAGGTTGACGGGAAAGAATACTGGACCAGTCGCTGGAATCTGCAACCGTTACTGCAAAGT
GCTCAGTTGACAGGAATGACTGTCACAATCAAATCCAGTACCTGTGAATCAGGCTCCGGA
TTTGCTGAAGTGCAGTTTAATAATGACTGA


SeqIO.parse  returns:

# sequence-ID (all before first space)
seq_record.id
'gi|16445223|ref|NC_002655.2|:1353261-1353530'

# description is the complete line after '>'  (includes also the sequence-ID)
seq_record.description
'gi|16445223|ref|NC_002655.2|:1353261-1353530 stx2B Shiga toxin 2 subunit B'

# sequence
seq_record.seq
Seq('ATGAAGAAGATGTTTATGGCGGTTTTATTTGCATTAGCTTCTGTTAATGCAATG...TGA', SingleLetterAlphabet())


SeqIO.write
concatenates id and description:
'>' + seq_record.id + ' ' + seq_record.description

Attention! To avoid that the sequence-ID appears twice (as ID and in the description),
the ID needs to be removed from description record before writing:
seq_record.description=' '.join(seq_record.description.split()[1:])
or alternatively use .replace to remove the sequence-ID from decription
seq_record.description=seq_record.description.replace(seq_record.id+' ','',1)