extract first three numbers from a string - python

I have strings like
"ABCD_ABCD_6.2.15_3.2"
"ABCD_ABCD_12.22.15_4.323"
"ABCD_ABCD_2.33.15_3.223"
I want to extract the following from them:
"6.2.15"
"12.22.15"
"2.33.15"
I tried using the indices of the numbers, but I can't because their positions vary. The only constant here is the length of the prefix at the beginning of each string.

Another way would be this regex:
_(\d+.*?)_
import re
m = re.search('_(\\d+.*?)_', 'ABCD_ABCD_6.2.15_3.2')
m.group(1)

There are a ton of ways to do this. Try:
>>> "ABCD_ABCD_6.2.15_3.2".split("_")[2]
'6.2.15'
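If the prefix could itself contain underscores, a pattern that describes the target directly may be more robust. A minimal sketch, assuming the wanted value is always the first run of three dot-separated integers; the helper name first_three_numbers is made up for illustration:

```python
import re

# Three integers separated by dots, e.g. "6.2.15".
THREE_NUMBERS = re.compile(r"\d+\.\d+\.\d+")

def first_three_numbers(text):
    """Return the first 'a.b.c' run in text, or None if there is none."""
    match = THREE_NUMBERS.search(text)
    return match.group(0) if match else None

for s in ("ABCD_ABCD_6.2.15_3.2",
          "ABCD_ABCD_12.22.15_4.323",
          "ABCD_ABCD_2.33.15_3.223"):
    print(first_three_numbers(s))
```

Because the search scans left to right, the trailing "3.2" part is never reached.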

Related

Replacing a part of a string with a randomly generated number

I have a string that looks something like that
my_string='TAG="0000" TAG="1111" TAG="2222"'
what I want to do is simply replace those numbers by randomly generated ones in my string.
I was considering doing something like:
new_string = my_string.replace('0000',str(random.randint(1,1000000)))
This is very easy and it works. Now let's say I want to make it more dynamic (in case I have a very long string with many TAG elements), I want to tell the code: "Each time you find "TAG=" in my_string, replace the following number with a random one". Does anyone have an idea?
Thanks a lot.
You can use re.sub:
import re, random
my_string='TAG="0000" TAG="1111" TAG="2222"'
new_string = re.sub(r'(?<=TAG=")\d+', lambda _: str(random.randint(1, 1000000)), my_string)
Output:
'TAG="901888" TAG="940530" TAG="439872"'
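The same idea can be wrapped in a function. A minimal sketch, assuming every tag value is a run of digits inside double quotes; the helper name randomize_tags and its rng parameter are illustrative additions, not part of the answer above. Passing a seeded random.Random makes the result repeatable.

```python
import random
import re

# Digits that directly follow TAG=" (the lookbehind keeps TAG=" in place).
TAG_VALUE = re.compile(r'(?<=TAG=")\d+')

def randomize_tags(text, rng=random):
    # re.sub calls the function once per match, so each tag gets its own number.
    return TAG_VALUE.sub(lambda _match: str(rng.randint(1, 1000000)), text)

my_string = 'TAG="0000" TAG="1111" TAG="2222"'
new_string = randomize_tags(my_string, random.Random(0))
print(new_string)
```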

Is it possible to search and replace a string with "any" characters?

There are probably several ways to solve this problem, so I'm open to any ideas.
I have a file that contains the string "D133330593". Note: I know the exact position of this string within the file, but I don't know if that helps.
Following this string there are 6 digits, and I need to replace them with 6 other digits.
This is what I have so far:
def editfile():
    f = open(filein,'r')
    filedata = f.read()
    f.close()
    #This is the line that needs help
    newdata = filedata.replace( -TOREPLACE- ,-REPLACER-)
    #Basically what I need is something that lets me say "D133330593******"
    #->"D133330593123456" Note: The following 6 digits don't need to be
    #anything specific, just different from the original 6
    f = open(filein,'w')
    f.write(newdata)
    f.close()
Use the re module to define your pattern, then use the sub() function to replace occurrences of that pattern with your own string.
import re
...
pat = re.compile(r"D133330593\d{6}")
newdata = re.sub(pat, "D133330593abcdef", filedata)
The above defines the pattern as your string ("D133330593") followed by six decimal digits. The next line replaces ALL occurrences of this pattern with the replacement string (the same prefix followed by "abcdef" in this case), if that is what you want.
If you want a different replacement string for each occurrence of the pattern, you could use the count keyword argument of sub(), which limits the number of replacements made, and call it once per occurrence.
Check out this library for more info - https://docs.python.org/3.6/library/re.html
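When each occurrence should get its own replacement, another option is to pass a function as the replacement argument of re.sub; it is called once per match. A minimal sketch, assuming the six characters after the marker are always decimal digits. The helper name replace_six_digits and the choice of random digits are illustrative, not something the question specifies.

```python
import random
import re

def replace_six_digits(text, marker="D133330593", rng=random):
    # re.escape guards against markers that contain regex metacharacters.
    pattern = re.compile(re.escape(marker) + r"(\d{6})")

    def new_digits(match):
        old = match.group(1)
        new = old
        # Draw again until the digits differ from the original six.
        while new == old:
            new = "%06d" % rng.randrange(10 ** 6)
        return marker + new

    return pattern.sub(new_digits, text)

s = "zshisjD133330593090909fdjgsl"
print(replace_six_digits(s))
```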
Let's simplify your problem to you having a string:
s = "zshisjD133330593090909fdjgsl"
and you wanting to replace the 6 characters after "D133330593" with "123456" to produce:
"zshisjD133330593123456fdjgsl"
To achieve this, we first need to find the index of "D133330593". This is done with str.index:
i = s.index("D133330593")
Then replace the next 6 characters, but for this, we should first calculate the length of our string that we want to replace:
l = len("D133330593")
then do the replace:
s[:i+l] + "123456" + s[i+l+6:]
which gives us the desired result of:
'zshisjD133330593123456fdjgsl'
I am sure you can now integrate this into your code to work with a file, but this is the heart of the problem.
Note that using variables as above is the right thing to do as it is the most efficient compared to calculating them on the go. Nevertheless, if your file isn't too long (i.e. efficiency isn't too much of a big deal) you can do the whole process outlined above in one line:
s[:s.index("D133330593")+len("D133330593")] + "123456" + s[s.index("D133330593")+len("D133330593")+6:]
which gives the same result.
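The slicing steps above can be collected into a small function. A minimal sketch; the name replace_after and its parameters are made up here, and it only handles the first occurrence of the marker:

```python
def replace_after(text, marker, replacement):
    """Overwrite the characters that follow the first marker with replacement."""
    # str.index raises ValueError if the marker is not present.
    start = text.index(marker) + len(marker)
    end = start + len(replacement)
    return text[:start] + replacement + text[end:]

s = "zshisjD133330593090909fdjgsl"
print(replace_after(s, "D133330593", "123456"))  # zshisjD133330593123456fdjgsl
```

Computing the index once inside the function avoids the repeated s.index(...) calls of the one-line version.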

How to use re.compile in Python if I want to match alphabetic word only

I'm learning the re module for Python and doing some experiments. I have a question about using an expression; here is the example:
import re
name = 'abc123def456'
m = re.compile('.*[^0-9]').match(name)
print(m.group())
The result is 'abc123def'.
What should I do if I want to remove the numbers entirely?
Thank you!
You can extract all runs of letters and concatenate them to get just the letters in the string. See below:
"".join(re.findall("[a-zA-Z]+",name))
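It may help to see why the original pattern keeps the middle digits: .* is greedy, so it backtracks only as far as the last non-digit character. A short sketch comparing it with two ways of dropping the digits; the re.sub variant is an alternative not mentioned above:

```python
import re

name = "abc123def456"

# Greedy .* gives back only the trailing digits, so "123" stays in the match.
greedy = re.match(r".*[^0-9]", name).group()
print(greedy)        # abc123def

# Keep only the runs of ASCII letters...
letters_only = "".join(re.findall(r"[a-zA-Z]+", name))
# ...or delete every run of digits instead.
no_digits = re.sub(r"\d+", "", name)

print(letters_only)  # abcdef
print(no_digits)     # abcdef
```

The two results differ when the string contains characters that are neither letters nor digits: findall drops them, while the re.sub version keeps them.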

String splitting in python by finding non-zero character

I want to do the following split:
input: 0x0000007c9226fc output: 7c9226fc
input: 0x000000007c90e8ab output: 7c90e8ab
input: 0x000000007c9220fc output: 7c9220fc
I use the following line of code to do this but it does not work!
split = element.rpartition('0')
I got these outputs which are wrong!
input: 0x000000007c90e8ab output: e8ab
input: 0x000000007c9220fc output: fc
What is the fastest way to do this kind of split?
The only idea I have right now is to loop over the string and check each character, but that is a little time-consuming.
I should mention that the number of zeros in input is not fixed.
Each string can be converted to an integer using int() with a base of 16. Then convert back to a string.
for s in '0x000000007c9226fc', '0x000000007c90e8ab', '0x000000007c9220fc':
    print('%x' % int(s, 16))
Output
7c9226fc
7c90e8ab
7c9220fc
input[2:].lstrip('0')
That should do it. The [2:] skips over the leading 0x (which I assume is always there), then the lstrip('0') removes all the zeros from the left side.
In fact, lstrip accepts a set of characters to strip, so we can remove the x and the zeros in one call:
input.lstrip('x0')
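One edge case worth checking: if the value is zero, stripping every leading 0 leaves an empty string. A minimal sketch that guards against this, assuming the input normally starts with 0x; the helper name strip_hex is illustrative:

```python
def strip_hex(text):
    """Drop a leading 0x and any leading zeros, keeping at least one digit."""
    digits = text[2:] if text.lower().startswith("0x") else text
    # An all-zero value would strip down to "", so fall back to "0".
    return digits.lstrip("0") or "0"

for s in ("0x0000007c9226fc", "0x000000007c90e8ab", "0x0000"):
    print(strip_hex(s))
```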
format is handy for this:
>>> print('{:x}'.format(0x000000007c90e8ab))
7c90e8ab
>>> print('{:x}'.format(0x000000007c9220fc))
7c9220fc
In this particular case you can just do
your_input[10:]
You'll most likely want to properly parse this; your idea of splitting on separation of non-zero does not seem safe at all.
Seems to be the XY problem.
If the number of characters in a string is constant then you can use the following code.
input = "0x000000007c9226fc"
output = input[10:]
Documentation
Also, you are using rpartition, which is defined as
str.rpartition(sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.
Since your input can contain multiple 0's and rpartition splits only at the last occurrence, this is what causes the wrong output in your code.
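A short sketch of what rpartition actually returns for one of the inputs from the question, which shows where the unexpected output comes from:

```python
s = "0x000000007c90e8ab"

# rpartition splits at the LAST "0", which sits inside the significant digits.
before, sep, after = s.rpartition("0")
print(after)                      # e8ab
print(before + sep + after == s)  # True
```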
A regular expression for 0x followed by a run of zeros is (0x[0]+); you can then replace the match with an empty string.
import re
st="0x000007c922433434000fc"
reg='(0x[0]+)'
rep=re.sub(reg, '',st)
print(rep)

Regexp matching equal number of the same character on each side of a string

How do you match only equal numbers of the same character (up to 3) on each side of a string in python?
For example, let's say I am trying to match equal signs
=abc= or ==abc== or ===abc===
but not
=abc== or ==abc=
etc.
I figured out how to do each individual case, but can't seem to get all of them.
(={1}(?=abc={1}))abc(={1})
Combining the cases with | for the same character, as in
((={1}(?=abc={1}))|(={2}(?=abc={2})))abc(={1}|={2})
doesn't seem to work.
Use the following regex:
^(=+)abc\1$
Edit:
If you want to allow at most three = signs:
^(={1,3})abc\1$
This is not a regular language. However, you can do it with backreferences:
(=+)[^=]+\1
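A quick check of the backreference approach against the examples from the question. This is a minimal sketch using re.fullmatch in place of the ^ and $ anchors; the sample list is only for illustration:

```python
import re

# \1 must repeat exactly what group 1 captured: one to three "=" signs.
PATTERN = re.compile(r"(={1,3})abc\1")

samples = ["=abc=", "==abc==", "===abc===", "=abc==", "==abc=", "====abc===="]
for s in samples:
    print(s, bool(PATTERN.fullmatch(s)))

results = {s: bool(PATTERN.fullmatch(s)) for s in samples}
```

Only the first three samples match, since the other three have either unequal sides or more than three = signs.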
Assuming the sample is a single string, here's a non-regex approach (one of many):
>>> string="===abc==="
>>> string.replace("abc"," ").split(" ")
['===', '===']
>>> a,b = string.replace("abc"," ").split(" ")
>>> if a == b:
...     print("ok")
...
ok
You said you want to match equal characters on each side, so regardless of what characters, you just need to check a and b are equal.
You are going to want to use a back reference. Check this post for an example:
Regex, single quote or double quote
