I have a string that looks something like this:
my_string='TAG="0000" TAG="1111" TAG="2222"'
What I want to do is simply replace those numbers in my string with randomly generated ones.
I was considering doing something like:
new_string = my_string.replace('0000',str(random.randint(1,1000000)))
This is very easy and it works. Now let's say I want to make it more dynamic (in case I have a very long string with many TAG elements), I want to tell the code: "Each time you find "TAG=" in my_string, replace the following number with a random one". Does anyone have an idea?
Thanks a lot.
You can use re.sub:
import re, random
my_string='TAG="0000" TAG="1111" TAG="2222"'
new_string = re.sub(r'(?<=TAG=")\d+', lambda _: str(random.randint(1, 1000000)), my_string)
Output (the numbers are random, so yours will differ):
'TAG="901888" TAG="940530" TAG="439872"'
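For a long string with many TAG elements, the same idea can be wrapped in a small helper. This is only a sketch: the function name randomize_tags and the optional rng parameter are assumptions added for illustration, not part of the answer above. Passing a seeded random.Random instance makes the result repeatable.

```python
import random
import re

def randomize_tags(text, rng=random):
    # The lookbehind (?<=TAG=") anchors the match right after TAG=",
    # so only the digits that follow are replaced.
    return re.sub(r'(?<=TAG=")\d+',
                  lambda _: str(rng.randint(1, 1000000)),
                  text)

my_string = 'TAG="0000" TAG="1111" TAG="2222"'
# A seeded generator gives the same numbers on every run.
print(randomize_tags(my_string, random.Random(0)))
```

The callable passed to re.sub is invoked once per match, which is why every TAG gets its own random number.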
I'm trying to go through a string and, every time I come across an asterisk (*), replace it with every letter of the alphabet. When another asterisk is hit, the same should happen again at both positions, and so on. If possible, I'd like to save these permutations to a .txt file, or else just print them out. This is what I have, and I don't know where to go from here:
alphabet = "abcdefghijklmnopqrstuvwxyz"
for i in reversed("h*l*o"):
    if i == "*":
        for j in alphabet:
            pass  # stuck here: not sure how to continue
I have run into some more problems with the solutions below that I'm trying to use: I cannot write to a file, as I just get errors.
You can:
1. Count the number of asterisks in the string.
2. Create the product of all letters with as many repetitions as given in (1).
3. Replace each asterisk (in order) with the matching letter:
import string
import itertools
s = "h*l*o"
num_of_asterisks = s.count('*')
for prod in itertools.product(string.ascii_lowercase, repeat=num_of_asterisks):
    it = iter(prod)
    new_s = ''.join(next(it) if c == '*' else c for c in s)
    print(new_s)
Notes:
Instead of creating a string of all letters, just use the string module.
This converts the product's tuples to iterators for easy handling of sequential replacing of each letter.
Uses the join method to create the new string out of the input string.
The above code simply prints each permutation. You can of course replace it with writing to a file or anything else you desire.
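Since the question also asks about saving the results, here is one way the printing could be swapped for writing to a file. It is a minimal sketch: the helper names expand_asterisks and write_expansions and the file name expansions.txt are assumptions chosen for illustration, and the file is placed in the system temp directory only so the example runs anywhere.

```python
import itertools
import os
import string
import tempfile

def expand_asterisks(pattern, alphabet=string.ascii_lowercase):
    # Yield every string obtained by filling each '*' with a letter.
    for prod in itertools.product(alphabet, repeat=pattern.count('*')):
        it = iter(prod)
        yield ''.join(next(it) if c == '*' else c for c in pattern)

def write_expansions(pattern, path):
    # Write one candidate per line and return how many were written.
    count = 0
    with open(path, 'w') as f:
        for candidate in expand_asterisks(pattern):
            f.write(candidate + '\n')
            count += 1
    return count

out_path = os.path.join(tempfile.gettempdir(), "expansions.txt")
count = write_expansions("h*l*o", out_path)
print(count, "strings written to", out_path)
```

Using a generator keeps memory use flat, which matters because the number of results grows as 26 to the power of the number of asterisks.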
Interesting problem. I assume you mean the Cartesian product and not "permutations".
I would use itertools:
import itertools
alphabet = "abcdefghijklmnopqrstuvwxyz"
string = "h*l*o"
# for every combination of N letters
for letters in itertools.product(alphabet, repeat=string.count('*')):
    # iterate over the letters
    letter_iter = iter(letters)
    # replace every * with the next letter
    print(''.join(i if i != '*' else next(letter_iter) for i in string))
I'm learning Python's re module and doing some experiments. I have a question about using an expression; here is the example:
import re
name = 'abc123def456'
m = re.compile('.*[^0-9]').match(name)
print(m.group())
The result is 'abc123def'.
What should I do if I want to remove the digits entirely?
Thank you!
You can extract all runs of letters and concatenate them to keep only the alphabetic characters in the string. See below:
"".join(re.findall("[a-zA-Z]+",name))
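An alternative that may be worth considering is to delete the digits directly with re.sub rather than collecting the letters. The sketch below assumes the goal is only to strip digits; unlike the findall approach, it would keep any other non-letter characters such as punctuation.

```python
import re

name = 'abc123def456'
# Replace every run of digits with the empty string.
letters_only = re.sub(r'\d+', '', name)
print(letters_only)  # abcdef
```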
Hello I am new to python, and I hope you can help me. I have a text file (call it data.txt) with data on gene number with corresponding rs number and some distance measure. The data looks something like this:
rs1982171 55349 40802
rs6088650 55902 38550
rs1655902 3105 12220
rs1013677 55902 0
where the first column is rs number, second column is gene number, and third column is some distance measure. The data is much bigger, but hopefully the above gives you an idea of the dataset. What I want to do is find all the rs numbers that correspond to a certain gene. For example, for the data set above, gene 55902= {rs6088650, rs1013677}. Ideally, I want my code to find all rs numbers corresponding to a given gene. Since I am unable to do that now, I instead wrote a short code that gives the lines that contain the string "55902" in the data.txt file:
import re
data = open("data.txt", "r")
for line in data:
    line = line.rstrip()
    if re.search("55902", line):
        print(line)
The problem with this code is that the output is something like this:
rs6088650 55902 38550
rs1655902 3105 12220
rs1013677 55902 0
I want my code to ignore the string "55902" in the rs number. In other words, I don't want my code to output the second line in the above output, because its gene number is not 55902. I would like my output to be:
rs6088650 55902 38550
rs1013677 55902 0
How can I modify the above code to achieve what I want? Any help would be appreciated. Thanks in advance.
There's no need for regular expressions here, as all you're looking for is a simple static sequence. This line:
if re.search("55902",line):
Could be expressed as:
if "55902" in line:
And if you only want to check the second column, split the line first:
if '55902' in line.split()[1]:
Since you're now already checking the correct column, check for equality rather than membership:
if line.split()[1] == '55902':
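The question's broader goal, finding all rs numbers for a given gene, can be reached by building on the same column split. The following is a sketch under assumptions: the function name rs_by_gene is made up for illustration, the columns are assumed to be whitespace-separated in the order rs number, gene number, distance, and the sample lines are the ones from the question rather than a real file.

```python
from collections import defaultdict

def rs_by_gene(lines):
    # Map each gene number (second column) to the set of rs numbers
    # (first column) that appear with it.
    genes = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rs_number, gene = fields[0], fields[1]
        genes[gene].add(rs_number)
    return genes

sample = [
    "rs1982171 55349 40802",
    "rs6088650 55902 38550",
    "rs1655902 3105 12220",
    "rs1013677 55902 0",
]
genes = rs_by_gene(sample)
print(sorted(genes["55902"]))  # ['rs1013677', 'rs6088650']
```

Because an open file object iterates over its lines, the same function should also accept the result of open("data.txt") in place of the sample list.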
You can use word boundary (\b), to match whole word search:
>>> import re
>>> re.search(r"\b55902\b", "rs1655902 3105 12220")
>>> re.search(r"\b55902\b", "rs6088650 55902 38550")
<_sre.SRE_Match object at 0x7f82594566b0>
if re.search(r"\b55902\b", line):
    ...
You can do this easily with a more powerful regular expression. One possible quick solution is to use a regex of the form:
r'\b55902\b'
The \b are word boundaries.
If you want to use a regex, you can use match or search together with the word boundary \b, as in:
x = " rs1982171 55349 40802".strip()
if re.match(r"\b55349\b", x.split()[1]):
    print(x)
I have strings like
"ABCD_ABCD_6.2.15_3.2"
"ABCD_ABCD_12.22.15_4.323"
"ABCD_ABCD_2.33.15_3.223"
I want to extract following from above
"6.2.15"
"12.22.15"
"2.33.15"
I tried using the indices of the numbers, but I can't rely on them since they vary. The only constant here is the length of the character prefix at the beginning of each string.
Another way would be this regex:
_(\d+.*?)_
import re
m = re.search(r'_(\d+.*?)_', 'ABCD_ABCD_6.2.15_3.2')
m.group(1)
There are a ton of ways to do this. Try:
>>> "ABCD_ABCD_6.2.15_3.2".split("_")[2]
'6.2.15'
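If the extracted field should also be checked to look like dot-separated numbers, a stricter pattern is one option. This is only a sketch: the helper name extract_version is hypothetical, and it assumes the wanted field is the first underscore-delimited piece made up of digits and dots.

```python
import re

def extract_version(s):
    # Capture digits separated by dots, enclosed by underscores.
    m = re.search(r'_(\d+(?:\.\d+)*)_', s)
    return m.group(1) if m else None

for s in ("ABCD_ABCD_6.2.15_3.2",
          "ABCD_ABCD_12.22.15_4.323",
          "ABCD_ABCD_2.33.15_3.223"):
    print(extract_version(s))
```

Returning None when nothing matches avoids the AttributeError that calling group on a failed search would raise.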
I am working with a text file, and I perform the following operations on each string:
def string_operations(string):
    # 1) lowercase
    # 2) remove integers from string
    # 3) remove symbols
    # 4) stemming
After this, I am still left with strings like:
durham 28x23
I see the flaw in my approach but would like to know if there is a good, fast way to identify if there is a numeric value attached with the string.
So in the above example, I want the output to be
durham
Another example:
21st amendment
Should give:
amendment
So how do I deal with this stuff?
If your requirement is "remove any terms that start with a digit", you could do something like this:
def removeNumerics(s):
    return ' '.join([term for term in s.split() if not term[0].isdigit()])
This splits the string on whitespace and then joins with a space all the terms that do not start with a number.
And it works like this:
>>> removeNumerics('21st amendment')
'amendment'
>>> removeNumerics('durham 28x23')
'durham'
If this isn't what you're looking for, consider adding some explicit examples to your question (showing both the initial string and your desired result).
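If terms with a digit anywhere in them should go as well (for example a hypothetical "x28"), a variant along these lines might fit. It is a sketch under that assumed requirement, and the function name remove_terms_with_digits is made up for illustration.

```python
import re

def remove_terms_with_digits(s):
    # Keep only the whitespace-separated terms that contain no digit at all.
    return ' '.join(term for term in s.split() if not re.search(r'\d', term))

print(remove_terms_with_digits('21st amendment'))  # amendment
print(remove_terms_with_digits('durham 28x23'))    # durham
print(remove_terms_with_digits('durham x28'))      # durham
```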