Python split a string between different delimiters - python

I would like to split a string like this
str = "$$Node_Type<block=begin>Blabla$$Node_Type<block=end>"
to something like this:
tab = ["$$Node_Type<block=begin>", "Blabla", "$$Node_Type<block=end>"]
but I can also have this:
str = "$$Node_Type1<block=begin>Blabla1$$Node_Type2<block=begin>Blabla2$$Node_Type2<block=end>$$Node_Type1<block=end>"
to something like this:
tab = ["$$Node_Type1<block=begin>", "Blabla1", "$$Node_Type2<block=begin>", "Blabla2", "$$Node_Type2<block=end>", "$$Node_Type1<block=end>"]
The idea at the end is to print it like that
$$Node_Type1<block=begin>
Blabla1
$$Node_Type2<block=begin>
Blabla2
$$Node_Type2<block=end>
$$Node_Type1<block=end>
Does anyone have an idea? Thanks.

You can take advantage of the fact that re.split retains the "splitter" in the results if it's a capturing group, and then track the nesting level as you print each piece:
import re
example = "Hello$$Node_Type1<block=begin>Blabla1$$Node_Type2<block=begin>Blabla2$$Node_Type2<block=end>$$Node_Type1<block=end>"
level = 0
for bit in re.split(r'(\$\$[^>]+>)', example):
    # An end marker closes a block, so dedent before printing it
    if bit.startswith('$$') and bit.endswith('block=end>'):
        level -= 1
    # re.split can yield empty strings between adjacent markers; skip them
    if bit:
        print(' ' * level + bit)
    # A begin marker opens a block, so indent everything after it
    if bit.startswith('$$') and bit.endswith('block=begin>'):
        level += 1
This prints out
Hello
$$Node_Type1<block=begin>
 Blabla1
 $$Node_Type2<block=begin>
  Blabla2
 $$Node_Type2<block=end>
$$Node_Type1<block=end>
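If the flat list from the question is what you need rather than printed output, the same capturing-group split can be filtered for empty strings. This is a minimal sketch; split_blocks is a hypothetical helper name, and it assumes every marker starts with $$ and ends at the first >.

```python
import re

def split_blocks(text):
    """Split text into $$...> markers and the text between them, dropping empty pieces."""
    # The capturing group makes re.split keep the markers in the result.
    return [piece for piece in re.split(r"(\$\$[^>]+>)", text) if piece]

s = "$$Node_Type<block=begin>Blabla$$Node_Type<block=end>"
tab = split_blocks(s)
print("\n".join(tab))
```

Joining the list with "\n" gives one item per line, as in the question.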

Related

Python replace particular substring in string

I need my program to replace a particular - with a variable. I've tried this, but it replaces the whole thing with the variable.
jelenleg = 8 * "-"
x = 0
guess = "c"
jelenleg = jelenleg[x].replace("-", guess)
print(jelenleg)
So I need this to happen:
before: --------
after: c-------
But instead what I get is this: c
You can limit how many occurrences are replaced by passing a count:
jelenleg.replace("-", guess, 1)
This replaces only the first -.
To replace at a particular position, I can't think of anything easier than converting the string to a list, replacing the item, and then joining it back into a string, like this:
jelenleg_list = list(jelenleg) # str --> list
jelenleg_list[x] = guess # replace at pos x
jelenleg = "".join(jelenleg_list) # list --> str
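As an alternative to the list round-trip above, slicing can build the new string directly. A minimal sketch, where replace_at is a hypothetical helper name and the index is assumed to be within the string:

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index swapped for new_char."""
    # Strings are immutable, so build a new one from the parts around index.
    return text[:index] + new_char + text[index + 1:]

jelenleg = 8 * "-"
guess = "c"
jelenleg = replace_at(jelenleg, 0, guess)
print(jelenleg)  # c-------
```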

How do I find and replace after performing math on the string

I have a file with the following strings
input complex_data_BITWIDTH;
output complex_data_(2*BITWIDTH+1);
Let's say BITWIDTH = 8
I want the following output
input complex_data_8;
output complex_data_17;
How can I achieve this in Python with find and replace plus some mathematical operation?
I would recommend looking into the re (regular expression) module for searching and replacing in strings, and the eval() function for evaluating arithmetic held in a string. Note that eval() executes arbitrary code, so only use it on input you trust.
Example (assuming that there are always parentheses around what you want to evaluate) :
import re
BITWIDTH_VAL = 8
string_initial = "something_(BITWIDTH+3)"
string_with_replacement = re.sub("BITWIDTH", str(BITWIDTH_VAL), string_initial)
# note: string_with_replacement is "something_(8+3)"
expression = re.search(r"(\(.*\))", string_with_replacement).group(1)
# note: expression is "(8+3)"
string_evaluated = string_with_replacement.replace(expression, str(eval(expression)))
# note: string_evaluated is "something_11"
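Because eval() will run whatever text sits inside the parentheses, here is a sketch of the same idea with a restricted evaluator built on the ast module. The names expand_line and _eval_node are hypothetical, and it assumes the expressions contain only integers and the operators +, -, * and //, with no nested parentheses.

```python
import ast
import operator
import re

# Only these arithmetic operators are accepted (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

def _eval_node(node):
    """Evaluate a small arithmetic AST made of integers and + - * //."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError("unsupported expression")

def expand_line(line, bitwidth):
    """Substitute BITWIDTH, then replace each (...) group with its computed value."""
    line = line.replace("BITWIDTH", str(bitwidth))

    def evaluate(match):
        tree = ast.parse(match.group(1), mode="eval")
        return str(_eval_node(tree.body))

    # [^()]* keeps each match to one innermost pair of parentheses.
    return re.sub(r"\(([^()]*)\)", evaluate, line)

print(expand_line("input complex_data_BITWIDTH;", 8))
print(expand_line("output complex_data_(2*BITWIDTH+1);", 8))
```

Anything outside the allowed node types raises ValueError instead of being executed.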
If you already know the value to change, you can use two variables: one for the value to search for and one for the new value.
BITWIDTH = 8
NEW_BITWIDTH = 2 * BITWIDTH + 1
string_input = 'complex_data_8;'
string_output = string_input.replace(str(BITWIDTH), str(NEW_BITWIDTH))
If you don't know the value, you need to extract it first and then operate on it:
string_input = 'complex_data_8;'
bitwidth = string_input.split('_')[-1].replace(';', '')
new_bitwidth = 2 * int(bitwidth) + 1
string_output = string_input.replace(bitwidth, str(new_bitwidth))

How can I extract a floating point value from a string, in python 3?

string = "probability is 0.05"
How can I extract the float value 0.05 into a variable? There are many such strings in the file, and I need to find the average probability, so I used a for loop.
My code:
fname = input("enter file name: ")
fh = open(fname)
count = 0
val = 0
for lx in fh:
    if lx.startswith("probability"):
        count = count + 1
        val = val + # here I need to get only the float value that is in the string
print(val)
import re
string = 'probability is 1'
string2 = 'probability is 1.03'
def FindProb(string):
    pattern = re.compile('[0-9]')
    result = pattern.search(string)
    result = result.span()[0]
    prob = string[result:]
    return prob
print(FindProb(string2))
OK, so:
This uses the regular expression library (re, also known as regex).
It sets up a pattern and then searches for it in a string.
This function takes a string, finds the first digit in it, and returns prob, which is the substring from that digit to the end of the string.
If you need to find the probability multiple times, then this might do it:
import re
string = 'probability is 1'
string2 = 'probability is 1.03 blah blah bllah probablity is 0.2 ugggggggggggggggg probablity is 1.0'
def FindProb(string):
    # Matches digits, a period, then more digits, e.g. 1.03
    pattern = re.compile('[0-9]+[.][0-9]+')
    prob = 0
    # finditer walks every match, so a stray period without digits cannot cause a failed search
    for match in pattern.finditer(string):
        prob += float(match.group())
    return prob
print(FindProb(string2))
The caveat is that every number must contain a decimal point, so 1 would have to be written as 1.0, but that shouldn't be too much of a problem. If it is, let me know and I will try to find a way.
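One way around that caveat is to make the fractional part optional in the pattern. The sketch below also computes the average the question asks for; average_probability is a hypothetical name, and it assumes each relevant line starts with "probability" and that the first number on the line is the value (no signs or exponents).

```python
import re

# An integer or a decimal such as 1, 0.05 or 1.03.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def average_probability(lines):
    """Average the first number on each line that starts with 'probability'."""
    values = []
    for line in lines:
        if line.startswith("probability"):
            match = NUMBER.search(line)
            if match:
                values.append(float(match.group()))
    # Return None instead of dividing by zero when nothing matched.
    return sum(values) / len(values) if values else None

sample = ["probability is 0.05\n", "something else 7\n", "probability is 1\n"]
print(average_probability(sample))
```

An open file object can be passed in place of the sample list, since iterating over a file yields its lines.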

How do I remove blank lines from a string in Python?

Let's say I have a variable whose data contains blank lines. How would I remove them without turning everything into one long line?
How would I turn this:
1

2

3
Into this:
1
2
3
Without turning it into this:
123
import os
text = os.linesep.join([s for s in text.splitlines() if s])
You can do this with replace(), for example data.replace('\n\n', '\n'). Note that a single pass only collapses pairs of newlines, so a run of three or more still leaves a blank line behind.
See this example:
data = '1\n\n2\n\n3\n\n'
print(data)
data = data.replace('\n\n', '\n')
print(data)
Output
1

2

3


1
2
3
import re
text = re.sub(r"\n{2,}", "\n", text)
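The answers above treat a line as blank only if it is completely empty. If lines containing just spaces or tabs should go too, filtering on strip() is one option. A minimal sketch, with remove_blank_lines as a hypothetical helper name; it joins with "\n" and does not keep a trailing newline.

```python
def remove_blank_lines(text):
    """Drop empty and whitespace-only lines, keeping the remaining lines in order."""
    return "\n".join(line for line in text.splitlines() if line.strip())

data = "1\n\n2\n   \n\n3\n"
print(remove_blank_lines(data))
```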

Loops for sequence output - python

I've been struggling to figure out a way to get my sequence printed out with a 6-mer in the sequence on separate lines. As so (note the spacing of each line):
atgctagtcatc
 tgctag
  gctagt
   ctagtc
    tagtca
etc
So far, I've been able to get my sequence in string as shown:
from Bio import SeqIO
record = SeqIO.read(open("testSeq.fasta"), "fasta")
sequence = str(record.seq)
However, the only way I could seem to figure out to do the printing of the 6-mers is by:
print sequence
print sequence[0:5]
print "", sequence[1:6]
print "", "", sequence[2:7]
print "", "", "", sequence [3:8]
etc
I feel like there should be an easier way to do this. I've tried this, but it doesn't seem to work:
x = 0
y = 6
for sequence in sequence[x:y]
    print sequence
    x = x + 1
    y = y + 1
Any opinions on how I should be attempting to accomplish this task would be greatly appreciated. I've only been using python for a couple days now and I'm sorry if my question seems simple.
Thank you!!
This should work:
width = 6
# + 1 so the final full-width window is included
for i in range(len(sequence) - width + 1):
    print(" " * i + sequence[i:i+width])
You could try the following (as far as I see you're using python2)
seq = "atgctagtcatc"
spaces = " "
for i in range(0, len(seq)):
    print spaces*i+seq[i:i+6]
Output:
atgcta
 tgctag
  gctagt
   ctagtc
    tagtca
     agtcat
      gtcatc
       tcatc
        catc
         atc
          tc
           c
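The output above includes the shorter tails at the end of the sequence. If only full 6-mers are wanted, the loop can stop at the last position where a whole window still fits. A sketch in Python 3 syntax; sliding_kmers is a hypothetical name, and the sequence is hard-coded here instead of being read with SeqIO.

```python
def sliding_kmers(sequence, width=6):
    """Return every full-length window of `width` characters, indented by its offset."""
    # len(sequence) - width + 1 is the number of complete windows.
    return [" " * i + sequence[i:i + width]
            for i in range(len(sequence) - width + 1)]

seq = "atgctagtcatc"
print(seq)
for line in sliding_kmers(seq):
    print(line)
```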
