Biopython - read and write a fasta file
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
file_in ='gene_seq_in.fasta'
file_out='gene_seq_out.fasta'
with open(file_out, 'w') as f_out:
for seq_record in SeqIO.parse(open(file_in, mode='r'), 'fasta'):
# remove .id from .description record (remove all before first space)
seq_record.description=' '.join(seq_record.description.split()[1:])
# do something (print or edit seq_record)
print('SequenceID = ' + seq_record.id)
print('Description = ' + seq_record.description + '\n')
# write new fasta file
r=SeqIO.write(seq_record, f_out, 'fasta')
if r!=1: print('Error while writing sequence: ' +
seq_record
.id)
SeqIO read & write
for each sequence, for example
>gi|16445223|ref|NC_002655.2|:1353261-1353530 stx2B Shiga toxin 2 subunit B
ATGAAGAAGATGTTTATGGCGGTTTTATTTGCATTAGCTTCTGTTAATGCAATGGCGGCG
GATTGTGCTAAAGGTAAAATTGAGTTTTCCAAGTATAATGAGGATGACACATTTACAGTG
AAGGTTGACGGGAAAGAATACTGGACCAGTCGCTGGAATCTGCAACCGTTACTGCAAAGT
GCTCAGTTGACAGGAATGACTGTCACAATCAAATCCAGTACCTGTGAATCAGGCTCCGGA
TTTGCTGAAGTGCAGTTTAATAATGACTGA
SeqIO.parse returns:
# sequence-ID (all before first space)
seq_record.id
'gi|16445223|ref|NC_002655.2|:1353261-1353530'
# description is the complete line after '>' (includes also the sequence-ID)
seq_record.description
'gi|16445223|ref|NC_002655.2|:1353261-1353530 stx2B Shiga toxin 2 subunit B'
# sequence
seq_record.seq
Seq('ATGAAGAAGATGTTTATGGCGGTTTTATTTGCATTAGCTTCTGTTAATGCAATG...TGA', SingleLetterAlphabet())
SeqIO.write
concatenates id and description:
'>' + seq_record.id + ' ' + seq_record.description
Attention! To avoid that the sequence-ID appears twice (as ID and in the description),
the ID needs to be removed from description record before writing:
seq_record.description=' '.join(seq_record.description.split()[1:])
or alternatively use .replace to remove the sequence-ID from decription
seq_record.description
=seq_record.description.replace(
seq_record.id
+' ','',1)