Replace substring
replace all characters "a" with "x"
"a1 a2 a3 a4".replace("a","x")
x1 x2 x3 x4
replace file ending ".txt" with ".csv"
filename = "mydata.txt"
newFilename = filename.replace(".txt",".csv")
mydata.csv
remove file ending ".txt"
filename = "mydata.txt"
rawname = filename.replace(".txt","")
mydata
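Note that replace() substitutes every occurrence of the substring, not only the one at the end of the name. A minimal sketch of changing only the real file ending, using os.path.splitext from the standard library (the filename and variable names are made up for illustration):

```python
import os.path

# hypothetical filename that contains ".txt" twice
filename = "notes.txt.backup.txt"

# replace() changes every occurrence, not only the file ending
replaced = filename.replace(".txt", ".csv")

# splitext() splits off only the last extension
root, ext = os.path.splitext(filename)
new_filename = root + ".csv"

print(replaced)      # notes.csv.backup.csv
print(new_filename)  # notes.txt.backup.csv
```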
Split and merge text strings
split
>>> 'a1,a2,a3'.split(',')
['a1', 'a2', 'a3']
get filename (get first list element)
>>> 'filename.txt.bz2'.split('.')[0]
'filename'
split() without arguments splits on any run of whitespace characters (space, tab \t, newline \n, carriage return \r, ...)
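The difference between the default behaviour and an explicit separator can be sketched as follows (the sample text is an assumption chosen to show the effect):

```python
text = "a1  a2\ta3\n"

# no argument: split on runs of any whitespace, no empty strings in the result
default_parts = text.split()

# explicit single space: every single space is a separator,
# so empty strings can appear and tabs/newlines are left untouched
space_parts = text.split(" ")

# the second argument (maxsplit) limits the number of splits
first, rest = "filename.txt.bz2".split(".", 1)

print(default_parts)  # ['a1', 'a2', 'a3']
print(space_parts)    # ['a1', '', 'a2\ta3\n']
print(first, rest)    # filename txt.bz2
```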
join - concatenate a list of strings into one string
>>> ','.join(['a1', 'a2', 'a3']) # comma separated
'a1,a2,a3'
>>> ' '.join(['a1', 'a2', 'a3']) # space separated
'a1 a2 a3'
Or simply use + to concatenate two or three words. For many words, join() is the better choice.
>>> 'a1' + ' ' + 'a2' + ' ' + 'a3'
'a1 a2 a3'
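join() only accepts strings, so a list of numbers has to be converted first. A small sketch, with an example list assumed for illustration:

```python
numbers = [1, 2, 3]

# join() raises TypeError for non-string items, so convert each item with str()
line = ','.join(str(n) for n in numbers)

print(line)  # 1,2,3
```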
# add prefix, only if not already present
s = 'hello world'
prefix = 'hello'
if not s.startswith(prefix):
    s = prefix + ' ' + s
'hello world'
Sort and convert text strings
length of string
>>> len('hello')
5
sorted list
>>> sorted(['C', 'b', 'd','A'], key=str.lower)
['A', 'b', 'C', 'd']
sort list in descending order
>>> sorted([4, 1, 3, 2], reverse=True)
[4, 3, 2, 1]
convert strings to numbers (and back to a string)
>>> int('23')
23
>>> float('23.1')
23.1
>>> str(23)
'23'
see also: format()
format('hello','>20')
'               hello'
convert all strings in a list into int numbers
s = ['4', '1', '3', '2']
n = [int(x) for x in s]
[4, 1, 3, 2]
see also: List Comprehension
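int() raises ValueError when a string is not a valid integer. One possible way to handle such items in the list conversion is sketched here; to_int_or_none is a hypothetical helper, not a built-in:

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None

s = ['4', '1', 'x', '2']
n = [to_int_or_none(x) for x in s]

print(n)  # [4, 1, None, 2]
```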
Search for substring
Check presence of substring
'day' in 'Friday'
True
get index location of substring
'Friday'.index('day')
3
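index() raises ValueError if the substring is missing, while find() returns -1 instead. A short sketch of both behaviours:

```python
word = 'Friday'

# find() returns the index, or -1 when the substring is absent
print(word.find('day'))  # 3
print(word.find('xyz'))  # -1

# index() raises ValueError when the substring is absent
try:
    word.index('xyz')
except ValueError:
    print('not found')
```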
find all locations of a character (e.g., all positions of the letter 'l' in the word 'hello', as 0-based indices)
s='hello'
[idx for idx, letter in enumerate(s) if letter == 'l']
[2, 3]
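The enumerate() approach compares single characters only. For substrings longer than one character, one option is a loop around str.find(); find_all is a hypothetical helper name used for this sketch:

```python
def find_all(text, sub):
    """Hypothetical helper: start indices of all (possibly overlapping) occurrences of sub."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # continue one character further so overlapping matches are found too
        start = text.find(sub, start + 1)
    return positions

print(find_all('hello', 'l'))     # [2, 3]
print(find_all('banana', 'ana'))  # [1, 3]
```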
check whether any of several substrings is present in a string (e.g., in a filename)
filename = 'sample.fastq.gz'
any(seqType in filename for seqType in ['.fasta','.fa','.fastq','.fq'])
True
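The in test matches anywhere in the name (here '.fa' already matches inside '.fastq'). If only the file ending should count, str.endswith() accepts a tuple of suffixes. A sketch, with the suffix list assumed for illustration:

```python
filename = 'sample.fastq.gz'

# endswith() accepts a tuple of candidate suffixes and checks only the end of the string
suffixes = ('.fastq.gz', '.fq.gz', '.fastq', '.fq')
is_fastq = filename.endswith(suffixes)

print(is_fastq)  # True
```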
Lists
find a substring in list items
dates = ['May 2015','January 2012','December 2015','June 2014']
dates2015 = [s for s in dates if '2015' in s]
['May 2015', 'December 2015']
check if a substring is present in any list item
dates = ['May 2015','January 2012','December 2015','June 2014']
if any('2015' in s for s in dates):
    print('yes, 2015 is included')
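When the first matching item itself is needed rather than a yes/no answer, next() with a default value is one option. A minimal sketch (first_2015 and first_2020 are names chosen for this example):

```python
dates = ['May 2015', 'January 2012', 'December 2015', 'June 2014']

# next() returns the first item produced by the generator, or the default (None) if there is none
first_2015 = next((s for s in dates if '2015' in s), None)
first_2020 = next((s for s in dates if '2020' in s), None)

print(first_2015)  # May 2015
print(first_2020)  # None
```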