Read/write fasta

Biopython - read and write a fasta file

from Bio import SeqIO

from Bio.SeqRecord import SeqRecord

file_in ='gene_seq_in.fasta'

file_out='gene_seq_out.fasta'

with open(file_out, 'w') as f_out:

for seq_record in SeqIO.parse(open(file_in, mode='r'), 'fasta'):

# remove .id from .description record (remove all before first space)

seq_record.description=' '.join(seq_record.description.split()[1:])

# do something (print or edit seq_record)

print('SequenceID = ' + seq_record.id)

print('Description = ' + seq_record.description + '\n')

# write new fasta file

r=SeqIO.write(seq_record, f_out, 'fasta')

if r!=1: print('Error while writing sequence: ' + seq_record.id)

SeqIO read & write

for each sequence, for example

>gi|16445223|ref|NC_002655.2|:1353261-1353530 stx2B Shiga toxin 2 subunit B

ATGAAGAAGATGTTTATGGCGGTTTTATTTGCATTAGCTTCTGTTAATGCAATGGCGGCG

GATTGTGCTAAAGGTAAAATTGAGTTTTCCAAGTATAATGAGGATGACACATTTACAGTG

AAGGTTGACGGGAAAGAATACTGGACCAGTCGCTGGAATCTGCAACCGTTACTGCAAAGT

GCTCAGTTGACAGGAATGACTGTCACAATCAAATCCAGTACCTGTGAATCAGGCTCCGGA

TTTGCTGAAGTGCAGTTTAATAATGACTGA

SeqIO.parse returns:

# sequence-ID (all before first space)

seq_record.id

'gi|16445223|ref|NC_002655.2|:1353261-1353530'

# description is the complete line after '>' (includes also the sequence-ID)

seq_record.description

'gi|16445223|ref|NC_002655.2|:1353261-1353530 stx2B Shiga toxin 2 subunit B'

# sequence

seq_record.seq

Seq('ATGAAGAAGATGTTTATGGCGGTTTTATTTGCATTAGCTTCTGTTAATGCAATG...TGA', SingleLetterAlphabet())

SeqIO.write

concatenates id and description:

'>' + seq_record.id + ' ' + seq_record.description

Attention! To avoid that the sequence-ID appears twice (as ID and in the description),

the ID needs to be removed from description record before writing:

seq_record.description=' '.join(seq_record.description.split()[1:])

or alternatively use .replace to remove the sequence-ID from decription

seq_record.description=seq_record.description.replace(seq_record.id+' ','',1)