python search string for numbers, put brackets around them - python

I'm trying to search a string for numbers, and when finding them, wrap some chars around them, e.g.
a = "hello, i am 8 years old and have 12 toys"
a = method(a)
print a
"hello, i am \ref{8} years old and have \ref{12} toys"
I've looked at the re (regular expression) library, but cannot seem to find anything helpful... any cool ideas?

This is pretty basic usage of the .sub method:
numbers = re.compile(r'(\d+)')
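a = numbers.sub(r'\\ref{\1}', a)
The parentheses around the \d+ number pattern create a group, and the \1 reference is replaced with the contents of the group. The backslash in front of ref has to be doubled in the replacement string, otherwise \r is read as a carriage return.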
>>> import re
>>> a = "hello, i am 8 years old and have 12 toys"
>>> numbers = re.compile(r'(\d+)')
>>> a = numbers.sub(r'\\ref{\1}', a)
>>> print a
hello, i am \ref{8} years old and have \ref{12} toys
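As a variation on the session above, the same substitution can be sketched without an explicit group, because \g<0> in the replacement refers to the whole match. The function name wrap_numbers is only an illustrative choice, not something from the original answer.

```python
import re

def wrap_numbers(text):
    # \g<0> stands for the entire match, so no parentheses are needed.
    # The doubled backslash produces one literal backslash in the output.
    return re.sub(r'\d+', r'\\ref{\g<0>}', text)

print(wrap_numbers("hello, i am 8 years old and have 12 toys"))
```

Whether \1 or \g<0> reads better is mostly a matter of taste; \g<0> simply avoids having to add a group to the pattern.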

You need to use the re.sub function along these lines:
re.sub(r"(\d+)", my_sub_func, text) # match the numbers here (note that this only matches integers, not decimal numbers)
where my_sub_func is defined like this:
def my_sub_func(match_obj):
    text = match_obj.group(0) # get the digit text here
    new_text = "\\ref{" + text + "}" # build the replacement here
    return new_text
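If decimal numbers should be wrapped too, one possible sketch is to widen the pattern. The pattern below (digits, optionally followed by a dot and more digits) and the names NUMBER, wrap_match and wrap_all are assumptions made for illustration; they are not part of the answer above.

```python
import re

# Assumed pattern: digits, optionally followed by a dot and more digits.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def wrap_match(match_obj):
    # The string returned by a replacement function is inserted as-is,
    # so "\\ref{" here is just a Python literal for \ref{
    return "\\ref{" + match_obj.group(0) + "}"

def wrap_all(text):
    return NUMBER.sub(wrap_match, text)

print(wrap_all("I have 12 toys and 3.5 apples."))
```

This does not try to handle signs, exponents or thousands separators; those would need a more elaborate pattern.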

Related

find the specific part of string between special characters

i am trying to find specific part of the string using regex or something like that.
for example:
string = "hi i am *hadi* and i have &18& year old"
name = regex.find("query")
age = regex.find("query")
print(name,age)
result:
hadi 18
i need the 'hadi' and '18'
Attention: The string is different each time. I need the sentence or
words between ** and &&
Try:
import re
string = "hi i am *hadi* and i have &18& year old"
pattern = r'(?:\*|&)(\w+)(?:\*|&)'
print(re.findall(pattern, string))
Outputs:
['hadi', '18']
You could assign re.findall(pattern, string) to a variable and have a Python list and access the values etc.
Regex demo:
https://regex101.com/r/vIg7lU/1
The \w+ in the regex can be changed to .*? if there is more than numbers and letters. Example: (?:\*|&)(.*?)(?:\*|&) and demo: https://regex101.com/r/RIqLuI/1
this is how i solved my question:
import re
string = "hello. my name is *hadi* and i am ^18^ years old."
name = re.findall(r"\*(.+)\*", string)
age = re.findall(r"\^(.+)\^", string)
print(name[0], age[0])
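One thing to watch with the .+ version above: it is greedy, so if a string contains more than one marked span it will run to the last delimiter. A small sketch of the difference, using a made-up sentence with two names:

```python
import re

# Hypothetical input with two starred spans
text = "my name is *hadi* and my friend is *sara*"

greedy = re.findall(r"\*(.+)\*", text)   # runs from the first * to the last *
lazy = re.findall(r"\*(.+?)\*", text)    # stops at the nearest closing *

print(greedy)
print(lazy)
```

With a single marked span per delimiter, as in the question, both forms give the same result.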

Write a Regex to extract number before '/'

I don't want to use string split because I have numbers 1-99, and a column of string that contain '#/#' somewhere in the text.
How can I write a regex to extract the number 10 in the following example:
He got 10/19 questions right.
Use a lookahead to match on the /, like this:
\d+(?=/)
You may need to escape the / if your implementation uses it as its delimiter.
Live example: https://regex101.com/r/xdT4vq/1
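In Python, the lookahead above could be used roughly like this. The helper name number_before_slash is hypothetical, and returning None when nothing matches is just one possible choice.

```python
import re

def number_before_slash(text):
    # \d+ matches the digits; (?=/) requires a slash right after them
    # without making the slash part of the match.
    match = re.search(r'\d+(?=/)', text)
    return int(match.group()) if match else None

print(number_before_slash("He got 10/19 questions right."))
```

No escaping of / is needed here, since Python patterns are plain strings and do not use / as a delimiter.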
You can still use str.split() if you carefully construct logic around it:
t = "He got 10/19 questions right."
t2 = "He/she got 10/19 questions right"
for q in [t, t2]:
    # split the whole string at spaces, then split each part at "/"
    # only keep parts that contain "/" (but not at the first position)
    # and otherwise consist only of digits
    numbers = [x.split("/") for x in q.split()
               if "/" in x and all(c in "0123456789/" for c in x)
               and not x.startswith("/")]
    if numbers:
        print(numbers[0][0])
Output:
10
10
import re
myString = "He got 10/19 questions right."
oldnumber = re.findall('[0-9]+/', myString) #find one or more digits followed by a slash.
newNumber = oldnumber[0].replace("/","") #get rid of the slash.
print(newNumber)
>>>10
res = re.search(r'(\d+)/\d+', 'He got 10/19 questions right.')
res.groups()
('10',)
Find all numbers before the forward slash, and keep the slash itself out of the result by wrapping only the digits in a capturing group (the parentheses).
>>> import re
>>> myString = 'He got 10/19 questions right.'
>>> stringNumber = re.findall('([0-9]+)/', myString)
>>> stringNumber
['10']
This returns every number that is followed by a forward slash, as a list of strings. If you want integers, map int over the list and turn the result back into a list.
>>> intNumber = list(map(int, stringNumber))
>>> intNumber
[10]

Python regex if all whole words in string [duplicate]

This question already has answers here:
Do regular expressions from the re module support word boundaries (\b)?
(5 answers)
Closed 4 years ago.
I have the following string, and I need to check if
the string contains App2 and iPhone,
but not App and iPhone
I wrote the following:
campaign_keywords = "App2 iPhone"
my_string = "[Love]App2 iPhone Argentina"
pattern = re.compile("r'\b" + campaign_keywords + "\b")
print pattern.search(my_string)
It prints None. Why?
The raw string notation is wrong: the r should not be inside the quotes, and the second \b should also be in a raw string.
The match function tries to match at the start of the string. You need to use search or findall
Difference between re.search and re.match
Example
>>> pattern = re.compile(r"\b" + campaign_keywords + r"\b")
>>> pattern.findall(my_string)
['App2 iPhone']
>>> pattern.match(my_string)
>>> pattern.search(my_string)
<_sre.SRE_Match object at 0x10ca2fbf8>
>>> match = pattern.search(my_string)
>>> match.group()
'App2 iPhone'
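A minimal sketch that puts the corrected pattern into a reusable check. The function name contains_phrase is hypothetical, and the re.escape call is an addition not present in the answer; it only matters if the keywords can contain regex metacharacters.

```python
import re

def contains_phrase(phrase, text):
    # re.escape keeps characters such as "." or "+" in the phrase literal
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
    # search scans the whole string; match would only try position 0
    return pattern.search(text) is not None

my_string = "[Love]App2 iPhone Argentina"
print(contains_phrase("App2 iPhone", my_string))
print(contains_phrase("App iPhone", my_string))
```

The second call fails to match because "App" is followed by "2" in the string, not by the space the phrase requires.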

Python - an extremely odd behavior of function lstrip [duplicate]

This question already has answers here:
Python string.strip stripping too many characters [duplicate]
(3 answers)
Closed 6 years ago.
I have encountered a very odd behavior of built-in function lstrip.
I will explain with a few examples:
print 'BT_NAME_PREFIX=MUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=NUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=PUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=SUV'.lstrip('BT_NAME_PREFIX=') # SUV
print 'BT_NAME_PREFIX=mUV'.lstrip('BT_NAME_PREFIX=') # mUV
As you can see, the function trims one additional character sometimes.
I tried to model the problem, and noticed that it persisted if I:
Changed BT_NAME_PREFIX to BT_NAME_PREFIY
Changed BT_NAME_PREFIX to BT_NAME_PREFIZ
Changed BT_NAME_PREFIX to BT_NAME_PREF
Further attempts have made it even more weird:
print 'BT_NAME=MUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=NUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=PUV'.lstrip('BT_NAME=') # PUV - different than before!!!
print 'BT_NAME=SUV'.lstrip('BT_NAME=') # SUV
print 'BT_NAME=mUV'.lstrip('BT_NAME=') # mUV
Could someone please explain what on earth is going on here?
I know I might as well just use array-slicing, but I would still like to understand this.
Thanks
You're misunderstanding how lstrip works. It treats the characters you pass in as a bag and it strips characters that are in the bag until it finds a character that isn't in the bag.
Consider:
'abc'.lstrip('ba') # 'c'
It is not removing a substring from the start of the string. To do that, you need something like:
if s.startswith(prefix):
s = s[len(prefix):]
e.g.:
>>> s = 'foobar'
>>> prefix = 'foo'
>>> if s.startswith(prefix):
... s = s[len(prefix):]
...
>>> s
'bar'
Or, I suppose you could use a regular expression:
>>> s = 'foobar'
>>> import re
>>> re.sub('^foo', '', s)
'bar'
The argument given to lstrip is a set of characters to remove from the left of a string, one character at a time. The phrase as a whole is not considered, only the individual characters.
S.lstrip([chars]) -> string or unicode
Return a copy of the string S with leading whitespace removed. If
chars is given and not None, remove characters in chars instead. If
chars is unicode, S will be converted to unicode before stripping
You could solve this in a flexible way using regular expressions (the re module):
>>> import re
>>> re.sub('^BT_NAME_PREFIX=', '', 'BT_NAME_PREFIX=MUV')
'MUV'
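To see the two behaviours side by side, here is a small sketch comparing lstrip with a prefix-removal helper built on startswith. The helper name strip_prefix is an illustrative choice; on Python 3.9 and later, the built-in str.removeprefix does the same job.

```python
def strip_prefix(s, prefix):
    # Remove prefix only when s really starts with it, as one whole substring
    if s.startswith(prefix):
        return s[len(prefix):]
    return s

for value in ('BT_NAME_PREFIX=MUV', 'BT_NAME_PREFIX=SUV'):
    stripped = value.lstrip('BT_NAME_PREFIX=')   # character-set behaviour
    removed = strip_prefix(value, 'BT_NAME_PREFIX=')
    print(stripped, removed)
```

For the MUV case lstrip also eats the M, because M is one of the characters in the argument, while the helper leaves it alone.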

Python - Most elegant way to extract a substring, being given left and right borders [duplicate]

This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed 4 years ago.
I have a string - Python :
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
Expected output is :
"Atlantis-GPS-coordinates"
I know that the expected output is ALWAYS surrounded by "/bar/" on the left and "/" on the right :
"/bar/Atlantis-GPS-coordinates/"
Proposed solution would look like :
a = string.find("/bar/")
b = string.find("/",a+5)
output = string[a+5:b]
This works, but I don't like it.
Does someone know a beautiful function or tip ?
You can use split:
>>> string.split("/bar/")[1].split("/")[0]
'Atlantis-GPS-coordinates'
Some efficiency from adding a max split of 1 I suppose:
>>> string.split("/bar/", 1)[1].split("/", 1)[0]
'Atlantis-GPS-coordinates'
Or use partition:
>>> string.partition("/bar/")[2].partition("/")[0]
'Atlantis-GPS-coordinates'
Or a regex:
>>> re.search(r'/bar/([^/]+)', string).group(1)
'Atlantis-GPS-coordinates'
Depends on what speaks to you and your data.
What you have isn't all that bad. I'd write it as:
start = string.find('/bar/') + 5
end = string.find('/', start)
output = string[start:end]
as long as you know that /bar/WHAT-YOU-WANT/ is always going to be present. Otherwise, I would reach for the regular expression knife:
>>> import re
>>> PATTERN = re.compile('^.*/bar/([^/]*)/.*$')
>>> s = '/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/'
>>> match = PATTERN.match(s)
>>> match.group(1)
'Atlantis-GPS-coordinates'
import re
pattern = '(?<=/bar/).+?/'
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
result = re.search(pattern, string)
print string[result.start():result.end() - 1]
# "Atlantis-GPS-coordinates"
That is a Python 2.x example. What it does first is:
1. (?<=/bar/) means only process the following regex if this precedes it (so that /bar/ must be before it)
2. '.+?/' means any amount of characters up until the next '/' char
Hope that helps some.
If you need to do this kind of search a lot, it is better to compile the pattern once for performance; if you only need to do it once, don't bother.
Using re (slower than other solutions):
>>> import re
>>> string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
>>> re.search(r'(?<=/bar/)[^/]+(?=/)', string).group()
'Atlantis-GPS-coordinates'
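If this comes up often, the partition approach from earlier in the thread could be wrapped in a small helper, sketched here. The name between and the choice to return None when a marker is missing are assumptions for illustration, not something any of the answers prescribe.

```python
def between(text, left, right):
    # partition returns (head, sep, tail); sep is '' if the marker is absent
    _, found_left, rest = text.partition(left)
    if not found_left:
        return None
    middle, found_right, _ = rest.partition(right)
    return middle if found_right else None

string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
print(between(string, "/bar/", "/"))
```

Unlike the chained split or find versions, this does not raise or silently slice the wrong range when /bar/ is not in the string.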
