How to use Python's re to insert an integer? - python

I'm currently using Python to search through a .config file and look for an integer in a line such as "locationId="225".
It replaces the integer such as 225 with another number of my choosing.
This works fine. However, I'm not sure how to enter my own number if the original .config file is missing a number. Example:
locationID=""
So if the original locationId is missing an integer, I still want to replace it with my new integer.
I have used:
import re
sys.stdout.write(re.sub(r'(locationid=")', r'\1 ' + newtext, line))
but this causes it to output something such as
locationId=" 33"
with a space before the 33. How to I remove the space before the 33 and make it output
locationId="33"
?
I basically just want to know how to remove the space before the number.

The space is coming from your replacement string, r'\1 ', but removing that space causes a problem when you concatenate a number, say, 1, with it. If newtext is 1 then the replacement string becomes r'\11' without the space.
Remove the double quote from the capturing group and add it to the replacement string:
re.sub(r'(locationid=)"', r'\1"' + newtext, line)

Related

Remove numbers from result; Python3

I'm creating a script that querys websites, and my results end up looking something like this
result = "
nameof1stlink
38
nameof2ndlink120
12
nameof3rdlink15
7
nameof4thlin...
k143
43
"
Basically, I want to remove the numbers that come after each line of text. That would be easy for me to do in a pattern, but there is the occasional long string that takes up two separate lines. There's also the matter of needing to keep the numbers in the actual text names.
I was thinking of checking each individual line for string length and just removing those w/o 5 or more letters / numbers, but I wasn't sure if that would work, and I wasn't too sure how to do it either.
Any help from you guys would be great.
Thanks! :)
You could maybe use regex matching, looking for a link-like string (allowing for newlines) followed by a number and a newline, which you'd want to ignore. Then, to accommodate multi-line links, use simple str.replace() to remove any occurrences of the consistent ...\n that occurs when the link is split across multiple lines.
What I have in mind, given the example you've provided, is this:
import re
result = """nameof1stlink
38
nameof2ndlink120
12
nameof3rdlink15
7
nameof4thlin...
k143
43"""
matches = re.findall(r'([A-Za-z0-9\n/_.-]+?)[0-9\n]+[\n\b]', result, flags=re.M)
# match this group '( ) ' ^
# shortest possible ' ? ' (multi-line
# at least one of ' + ' string input)
# these characters ' [A-Za-z0-9\n/_.-] '
# then, at least one ' + '
# digit or newline ' [0-9\n] '
# and ending with \n ' [\n\b]'
# or end-of-string
# matches = ['nameof1stlink', 'nameof2ndlink', 'nameof3rdlink', 'nameof4thlin...\nk']
links = [link.replace('...\n', '') for link in matches]
# links = ['nameof1stlink', 'nameof2ndlink', 'nameof3rdlink', 'nameof4thlink']
I'm not sure what your links look like, but I assumed [A-Za-z0-9/_.-] (alphanumerics plus /, _, ., and -) covers all the standard parts of hyperlinks. And \n needs to be thrown somewhere in there to accommodate for multi-line entries. You can modify this character class depending on what you expect your links to look like.

Is it possible to search and replace a string with "any" characters?

There are probably several ways to solve this problem, so I'm open to any ideas.
I have a file, within that file is the string "D133330593" Note: I do have the exact position within the file this string exists, but I don't know if that helps.
Following this string, there are 6 digits, I need to replace these 6 digits with 6 other digits.
This is what I have so far:
def editfile():
f = open(filein,'r')
filedata = f.read()
f.close()
#This is the line that needs help
newdata = filedata.replace( -TOREPLACE- ,-REPLACER-)
#Basically what I need is something that lets me say "D133330593******"
#->"D133330593123456" Note: The following 6 digits don't need to be
#anything specific, just different from the original 6
f = open(filein,'w')
f.write(newdata)
f.close()
Use the re module to define your pattern and then use the sub() function to substitute occurrence of that pattern with your own string.
import re
...
pat = re.compile(r"D133330593\d{6}")
re.sub(pat, "D133330593abcdef", filedata)
The above defines a pattern as -- your string ("D133330593") followed by six decimal digits. Then the next line replaces ALL occurrences of this pattern with your replacement string ("abcdef" in this case), if that is what you want.
If you want a unique replacement string for each occurrence of pattern, then you could use the count keyword argument in the sub() function, which allows you to specify the number of times the replacement must be done.
Check out this library for more info - https://docs.python.org/3.6/library/re.html
Let's simplify your problem to you having a string:
s = "zshisjD133330593090909fdjgsl"
and you wanting to replace the 6 characters after "D133330593" with "123456" to produce:
"zshisjD133330594123456fdjgsl"
To achieve this, we can first need to find the index of "D133330593". This is done by just using str.index:
i = s.index("D133330593")
Then replace the next 6 characters, but for this, we should first calculate the length of our string that we want to replace:
l = len("D133330593")
then do the replace:
s[:i+l] + "123456" + s[i+l+6:]
which gives us the desired result of:
'zshisjD133330593123456fdjgsl'
I am sure that you can now integrate this into your code to work with a file, but this is how you can do the heart of your problem .
Note that using variables as above is the right thing to do as it is the most efficient compared to calculating them on the go. Nevertheless, if your file isn't too long (i.e. efficiency isn't too much of a big deal) you can do the whole process outlined above in one line:
s[:s.index("D133330593")+len("D133330593")] + "123456" + s[s.index("D133330593")+len("D133330593")+6:]
which gives the same result.

Python Regex: how to not select whitespace before last string?

I am (a newbie,) struggling with separating a database in columns with regex.findall().
I want to separate these Dutch street names into name and number.
Roemer Visscherstraat 15
Vondelstraat 102-huis
For the number I use
\S*$
Which works just fine. For the street name I use
^\S.+[^\S$]
Or: use everything but the last element, which may be a number or a combination of a number and something else.
Problem is: Python then also keeps the last whitespace after the last name, so I get:
'Roemer Visscherstraat '
Any way I can stop this from happening?
Also, Findall returns a list consisting of the bit of database I wanted, and an empty string. How does this happen and can i prevent it somehow?
Thanks so much in advance for you help.
You can rstrip() the name to remove any spaces at the end of it:
>>>'Roemer Visscherstraat '.rstrip()
'Roemer Visscherstraat'
But if the input is similar to the one you posted, you can simply use split() instead of regex, for example:
st = 'Roemer Visscherstraat 15'
data = st.split()
num = st[-1]
name = ' '.join(st[:-1])
print 'Name: {}, Number: {}'.format(name, num)
output:
Name: Roemer Visscherstraat, Number: 15
For the number you should use the following:
\S+$
Using a + instead of a * will ensure that you have at least one character in the match.
For the street name you can use the following:
^.+(?=\s\S+$)
What this does is selects text up until the number.
However, what you may consider doing is using one regex match with capture groups instead. The following would work:
^(.+(?=\s\S+$))\s(\S+$)
In this case, the first capture group gives you the street name, and the second gives you the number.
([^\d]*)\s+(\d.*)
In this regex the first group captures everything before a space and a number and the 2nd group gives the desired number
my assumption is that number would begin with a digit and the name would not have a digit in it
take a look at https://regex101.com/r/eW0UP2/1
Roemer Visscherstraat 15
Full match 0-24 `Roemer Visscherstraat 15`
Group 1. 0-21 `Roemer Visscherstraat`
Group 2. 22-24 `15`
Vondelstraat 102-huis
Full match 24-46 `Vondelstraat 102-huis`
Group 1. 24-37 `Vondelstraat`
Group 2. 38-46 `102-huis`

String splitting in python by finding non-zero character

I want to do the following split:
input: 0x0000007c9226fc output: 7c9226fc
input: 0x000000007c90e8ab output: 7c90e8ab
input: 0x000000007c9220fc output: 7c9220fc
I use the following line of code to do this but it does not work!
split = element.rpartition('0')
I got these outputs which are wrong!
input: 0x000000007c90e8ab output: e8ab
input: 0x000000007c9220fc output: fc
what is the fastest way to do this kind of split?
The only idea for me right now is to make a loop and perform checking but it is a little time consuming.
I should mention that the number of zeros in input is not fixed.
Each string can be converted to an integer using int() with a base of 16. Then convert back to a string.
for s in '0x000000007c9226fc', '0x000000007c90e8ab', '0x000000007c9220fc':
print '%x' % int(s, 16)
Output
7c9226fc
7c90e8ab
7c9220fc
input[2:].lstrip('0')
That should do it. The [2:] skips over the leading 0x (which I assume is always there), then the lstrip('0') removes all the zeros from the left side.
In fact, we can use lstrip ability to remove more than one leading character to simplify:
input.lstrip('x0')
format is handy for this:
>>> print '{:x}'.format(0x000000007c90e8ab)
7c90e8ab
>>> print '{:x}'.format(0x000000007c9220fc)
7c9220fc
In this particular case you can just do
your_input[10:]
You'll most likely want to properly parse this; your idea of splitting on separation of non-zero does not seem safe at all.
Seems to be the XY problem.
If the number of characters in a string is constant then you can use
the following code.
input = "0x000000007c9226fc"
output = input[10:]
Documentation
Also, since you are using rpartitionwhich is defined as
str.rpartition(sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.
Since your input can have multiple 0's, and rpartition only splits the last occurrence this a malfunction in your code.
Regular expression for 0x00000 or its type is (0x[0]+) and than replace it with space.
import re
st="0x000007c922433434000fc"
reg='(0x[0]+)'
rep=re.sub(reg, '',st)
print rep

Retrieve part of string, variable length

I'm trying to learn how to use Regular Expressions with Python. I want to retrieve an ID number (in parentheses) in the end from a string that looks like this:
"This is a string of variable length (561401)"
The ID number (561401 in this example) can be of variable length, as can the text.
"This is another string of variable length (99521199)"
My coding fails:
import re
import selenium
# [Code omitted here, I use selenium to navigate a web page]
result = driver.find_element_by_class_name("class_name")
print result.text # [This correctly prints the whole string "This is a text of variable length (561401)"]
id = re.findall("??????", result.text) # [Not sure what to do here]
print id
This should work for your example:
(?<=\()[0-9]*
?<= Matches something preceding the group you are looking for but doesn't consume it. In this case, I used \(. ( is a special character, so it has to be escaped with \. [0-9] matches any number. The * means match any number of the directly preceding rule, so [0-9]* means match as many numbers as there are.
Solved this thanks to Kaz's link, very useful:
http://regex101.com/
id = re.findall("(\d+)", result.text)
print id[0]
You can use this simple solution :
>>> originString = "This is a string of variable length (561401)"
>>> str1=OriginalString.replace("("," ")
'This is a string of variable length 561401)'
>>> str2=str1.replace(")"," ")
'This is a string of variable length 561401 '
>>> [int(s) for s in string.split() if s.isdigit()]
[561401]
First, I replace parantheses with space. and then I searched the new string for integers.
No need to really use regular expressions here, if it is always at the end and always in parenthesis you can split, extract last element and remove the parenthesis by taking the substring ([1:-1]). Regexes are relatively time expensive.
line = "This is another string of variable length (99521199)"
print line.split()[-1][1:-1]
If you did want to use regular expressions I would do this:
import re
line = "This is another string of variable length (99521199)"
id_match = re.match('.*\((\d+)\)',line)
if id_match:
print id_match.group(1)

Categories