I want to check if a string ends with a decimal of varying numbers, from searching for a while, the closest solution I found was to input values into a tuple and using that as the condition for endswith(). But is there any shorter way instead of inputting every possible combination?
I tried hard coding the end condition but if there are new elements in the list it wont work for those, I also tried using regex it returns other elements together with the decimal elements as well. Any help would be appreciated
list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70"]
for e in list1:
if e.endswith('.0') or e.endswith('.98'):
print 'pass'
Edit: Sorry should have specified that I do not want to have 'qwe -70' to be accepted, only those elements with a decimal point should be accepted
I'd like to propose another solution: using regular expressions to search for an ending decimal.
You can define a regular expression for an ending decimal with the following regex [-+]?[0-9]*\.[0-9]+$.
The regex broken apart:
[-+]?: optional - or + symbol at the beginning
[0-9]*: zero or more digits
\.: required dot
[0-9]+: one or more digits
$: must be at the end of the line
Then we can test the regular expression to see if it matches any of the members in the list:
import re
regex = re.compile('[-+]?[0-9]*\.[0-9]+$')
list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70", "test"]
for e in list1:
if regex.search(e) is not None:
print e + " passes"
else:
print e + " does not pass"
The output for the previous script is the following:
abcd 1.01 passes
zyx 22.98 passes
efgh 3.0 passes
qwe -70 does not pass
test does not pass
Your example data leaves many possibilities open:
Last character is a digit:
e[-1].isdigit()
Everything after the last space is a number:
try:
float(e.rsplit(None, 1)[-1])
except ValueError:
# no number
pass
else:
print "number"
Using regular expressions:
re.match('[.0-9]$', e)
suspects = [x.split() for x in list1] # split by the space in between and get the second item as in your strings
# iterate over to try and cast it to float -- if not it will raise ValueError exception
for x in suspects:
try:
float(x[1])
print "{} - ends with float".format(str(" ".join(x)))
except ValueError:
print "{} - does not ends with float".format(str(" ".join(x)))
## -- End pasted text --
abcd 1.01 - ends with float
zyx 22.98 - ends with float
efgh 3.0 - ends with float
qwe -70 - ends with float
I think this will work for this case:
regex = r"([0-9]+\.[0-9]+)"
list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70"]
for e in list1:
str = e.split(' ')[1]
if re.search(regex, str):
print True #Code for yes condition
else:
print False #Code for no condition
As you correctly guessed, endswith() is not a good way to look at the solution, given that the number of combinations is basically infinite. The way to go is - as many suggested - a regular expression that would match the end of the string to be a decimal point followed by any count of digits. Besides that, keep the code simple, and readable. The strip() is in there just in case one the input string has an extra space at the end, which would unnecessarily complicate the regex.
You can see this in action at: https://eval.in/649155
import re
regex = r"[0-9]+\.[0-9]+$"
list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70"]
for e in list1:
if re.search(regex, e.strip()):
print e, 'pass'
The flowing maybe help:
import re
reg = re.compile(r'^[a-z]+ \-?[0-9]+\.[0-9]+$')
if re.match(reg, the_string):
do something...
else:
do other...
Related
Using python regex, I am trying to match as many number of p as the the digit first matched in pattern.
Sample Input
1pp
2p
3ppp
4ppppppppp
Expected Output
1p
None
3ppp
4pppp
Code Tried
I have tried the following code, where i use named group, and give the name 'dig' to the matched digit, now I want to use dig in repetition {m}. But the following code does not find any match in pattern.
pattern = "2pppp"
reTriple = '((?P<dig>\d)p{(?P=dig)})'
regex = re.compile(reTriple,re.IGNORECASE)
matches = re.finditer(regex,pattern)
I think the problem is that repetition {m} expects an int m, where as dig is a string. But I can't find a way to concatenate an int to string while keeping it int! I tried casting as follows:
reTrip = '((?P<dig>\d)p{%d}'%int('(?P=dig)')+')'
But I get the following error:
ValueError: invalid literal for int() with base 10: '(?P=dig)'
I feel stuck. Can someone please guide.
And its weird that if i instead break reTriple as follows: save the matched digit in a variable first and then concatenate this variable in reTriple, it works, and the expected output is achieved. But this is a work around, and I am looking for a better method.
reTriple = '(?P<dig>\d)'
dig = re.search(reTriple , pattern).group('dig')
reTriple = reTriple + '(p{1,' + dig + '})'
It seems that what you are trying basically comes down to: (\d+)p{\1} where you would use capture group 1 as input for how often you need to match "p". However capture group one seems to be returned as text (not numeric) causing you to find no results. Have a look here for example.
Maybe it helps to split this into two operations. For example:
import re
def val_txt(txt):
i = int(re.search(r'\d+', txt).group(0))
fnd = re.compile(fr'(?i)\d+p{{{i}}}')
if fnd.search(txt):
return fnd.search(txt).group(0)
print(val_txt('2p'))
You can also do pure string operations without depending on any module for the mentioned strings in the question (digits < 10):
def val_txt(txt):
dig = int(txt[0])
rest_val = 'p' * dig
return f'{dig}{rest_val}' if txt[1:1+dig] == rest_val else None
print(val_txt('1ppp'))
# 1p
Hi you can do another approach something like this without regex:
from typing import Union
def test(txt: str, var: str ='p') -> Union[str, None]:
var_count = txt.count(var)
number = int(txt[0:len(txt) - var_count:])
if number <= var_count:
return f'{number}{number * var}'
return None
lets test it
output:
t = ['1pp', '2p', '3ppp', '4ppppppppp', '10pppppppppp']
for i in t:
print(test(i))
1p
None
3ppp
4pppp
10pppppppppp
Here's a single step regex solution which uses a lambda function to check if there are sufficient p's to match the digits at the beginning of the string; if there are it returns the appropriate string (e.g. 1p or 3ppp), otherwise it returns an empty string:
import re
strs = ['1pp',
'2p',
'3ppp',
'4ppppppppp'
]
for s in strs:
print(re.sub(r'^(\d+)(p+).*', lambda m: m.group(1) + m.group(2)[:int(m.group(1))] if len(m.group(2)) >= int(m.group(1)) else '', s))
Output:
1p
3ppp
4pppp
I have text with values like:
this is a value £28.99 (0.28/ml)
I want to remove everything to return the price only so it returns:
£28.99
there could be any number of digits between the £ and .
I think
r"£[0-9]*\.[0-9]{2}"
matches the pattern I want to keep but i'm unsure on how to remove everything else and keep the pattern instead of replacing the pattern like in usual re.sub() cases.
I want to remove everything to return the price only so it returns:
Why not trying to extract the proper information instead?
import re
s = "this is a value £28.99 (0.28/ml)"
m = re.search("£\d*(\.\d+)?",s)
if m:
print(m.group(0))
to find several occurrences use findall or finditer instead of search
You don't care how many digits are before the decimal, so using the zero-or-more matcher was correct. However, you could just rely on the digit class (\d) to provide that more succinctly.
The same is true of after the decimal. You only need two so your limiting the matches to 2 is correct.
The issue then comes in with how you actually capture the value. You can use a capturing group to be sure that you only ever get the value you care about.
Complete regex:
(£\d*.\d{2})
Sample code:
import re
r = re.compile("(£\d*.\d{2})")
match = r.findall("this is a value £28.99 (0.28/ml)")
if match: # may bring back an empty list; check for that here
print(match[0]) # uses the first group, and will print £28.99
If it's a string, you can do something like this:
x = "this is a value £28.99 (0.28/ml)"
x_list = x.split()
for i in x_list:
if "£" in i: #or if i.startswith("£") Credit – Jean-François Fabre
value=i
print(value)
>>>£28.99
You can try:
import re
t = "this is a value £28.99 (0.28/ml)"
r = re.sub(".*(£[\d.]+).*", r"\1", t)
print(r)
Output:
£28.99
Python Demo
The input to this problem is a string and has a specific form. For example if s is a string then inputs can be s='3(a)2(b)' or s='3(aa)2(bbb)' or s='4(aaaa)'. The output should be a string, that is the substring inside the brackets multiplied by numerical substring value the substring inside the brackets follows.
For example,
Input ='3(a)2(b)'
Output='aaabb'
Input='4(aaa)'
Output='aaaaaaaaaaaa'
and similarly for other inputs. The program should print an empty string for wrong or invalid inputs.
This is what I've tried so far
s='3(aa)2(b)'
p=''
q=''
for i in range(0,len(s)):
#print(s[i],end='')
if s[i]=='(':
k=int(s[i-1])
while(s[i+1]!=')'):
p+=(s[i+1])
i+=1
if s[i]==')':
q+=k*p
print(q)
Can anyone tell what's wrong with my code?
A oneliner would be:
''.join(int(y[0])*y[1] for y in (x.split('(') for x in Input.split(')')[:-1]))
It works like this. We take the input, and split on the close paren
In [1]: Input ='3(a)2(b)'
In [2]: a = Input.split(')')[:-1]
In [3]: a
Out[3]: ['3(a', '2(b']
This gives us the integer, character pairs we're looking for, but we need to get rid of the open paren, so for each x in a, we split on the open paren to get a two-element list where the first element is the int (as a string still) and the character. You'll see this in b
In [4]: b = [x.split('(') for x in a]
In [5]: b
Out[5]: [['3', 'a'], ['2', 'b']]
So for each element in b, we need to cast the first element as an integer with int() and multiply by the character.
In [6]: c = [int(y[0])*y[1] for y in b]
In [7]: c
Out[7]: ['aaa', 'bb']
Now we join on the empty string to combine them into one string with
In [8]: ''.join(c)
Out[8]: 'aaabb'
Try this:
a = re.findall(r'[\d]+', s)
b = re.findall(r'[a-zA-Z]+', s)
c = ''
for i, j in zip(a, b):
c+=(int(i)*str(j))
print(c)
Here is how you could do it:
Step 1: Simple case, getting the data out of a really simple template
Let's assume your template string is 3(a). That's the simplest case I could think of. We'll need to extract pieces of information from that string. The first one is the count of chars that will have to be rendered. The second is the char that has to be rendered.
You are in a case where regex are more than suited (hence, the use of re module from python's standard library).
I won't do a full course on regex. You'll have to do that by our own. However, I'll explain quickly the step I used. So, count (the variable that holds the number of times we should render the char to render) is a digit (or several). Hence our first capturing group will be something like (\d+). Then we have a char to extract that is enclosed by parenthesis, hence \((\w+)\) (I actually enable several chars to be rendered at once). So, if we put them together, we get (\d+)\((\w+)\). For testing you can check this out.
Applied to our case, a straight forward use of the re module is:
import re
# Our template
template = '3(a)'
# Run the regex
match = re.search(r'(\d+)\((\w+)\)', template)
if match:
# Get the count from the first capturing group
count = int(match.group(1))
# Get the string to render from the second capturing group
string = match.group(2)
# Print as many times the string as count was given
print count * string
Output:
aaa
Yeah!
Step 2: Full case, with several templates
Okay, we know how to do it for 1 template, how to do the same for several, for instance 3(a)4(b)? Well... How would we do it "by hand"? We'd read the full template from left to right and apply each template one by one. Then this is what we'll do with python!
Hopefully for us the re module has a function just for that: finditer. It does exactly what we described above.
So, we'll do something like:
import re
# Our template
template = '3(a)4(b)'
# Iterate through found templates
for match in re.finditer(r'(\d+)\((\w+)\)', template):
# Get the count from the first capturing group
count = int(match.group(1))
# Get the string to render from the second capturing group
string = match.group(2)
print count * string
Output:
aaa
bbbb
Okay... Just remains the combination of that stuff. We know we can put everything at each step in an array, and then join each items of this array at the end, no?
Let's do it!
import re
template = '3(a)4(b)'
parts = []
for match in re.finditer(r'(\d+)\((\w+)\)', template):
parts.append(int(match.group(1)) * match.group(2))
print ''.join(parts)
Output:
aaabbb
Yeah!
Step 3: Final step, optimization
Because we can always do better, we won't stop. for loops are cool. But what I love (it's personal) about python is that there is so much stuff you can actually just write with one line! Is it the case here? Well yes :).
First we can remove the for loop and the append using a list comprehension:
parts = [int(match.group(1)) * match.group(2) for match in re.finditer(r'(\d+)\((\w+)\)', template)]
rendered = ''.join(parts)
Finally, let's remove the two lines with parts populating and then join and let's do all that in a single line:
import re
template = '3(a)4(b)'
rendered = ''.join(
int(match.group(1)) * match.group(2) \
for match in re.finditer(r'(\d+)\((\w+)\)', template))
print rendered
Output:
aaabbb
Yeah! Still the same output :).
Hope it helped!
The value of 'p' should be refreshed after each iteration.
s='1(aaa)2(bb)'
p=''
q=''
i=0
while i<len(s):
if s[i]=='(':
k=int(s[i-1])
p=''
while(s[i+1]!=')'):
p+=(s[i+1])
i+=1
if s[i]==')':
q+=k*p
i+=1
print(q)
The code is not behaving the way I want it to behave. The problem here is the placement of 'p'. 'p' is the variable that adds the substring inside the ( )s. I'm repeating the process even after sufficient adding is done. Placing 'p' inside the 'if' block will do the job.
s='2(aa)2(bb)'
q=''
for i in range(0,len(s)):
if s[i]=='(':
k=int(s[i-1])
p=''
while(s[i+1]!=')'):
#print(i,'first time')
p+=s[i+1]
i+=1
q+=p*k
#print(i,'second time')
print(q)
what you want is not print substrings . the real purpose is most like to generate text based regular expression or comands.
you can parametrize a function to read it or use something like it:
The python library rstr has the function xeger() to do what you need by using random strings and only returning ones that match:
Example
Install with pip install rstr
In [1]: from __future__ import print_function
In [2]: import rstr
In [3]: for dummy in range(10):
...: print(rstr.xeger(r"(a|b)[cd]{2}\1"))
...:
acca
bddb
adda
bdcb
bccb
bcdb
adca
bccb
bccb
acda
Warning
For complex re patterns this might take a long time to generate any matches.
I would like to construct a reg expression pattern for the following string, and use Python to extract:
str = "hello w0rld how 34 ar3 44 you\n welcome 200 stack000verflow\n"
What I want to do is extract the independent number values and add them which should be 278. A prelimenary python code is:
import re
x = re.findall('([0-9]+)', str)
The problem with the above code is that numbers within a char substring like 'ar3' would show up. Any idea how to solve this?
Why not try something simpler like this?:
str = "hello w0rld how 34 ar3 44 you\n welcome 200 stack000verflow\n"
print sum([int(s) for s in str.split() if s.isdigit()])
# 278
s = re.findall(r"\s\d+\s", a) # \s matches blank spaces before and after the number.
print (sum(map(int, s))) # print sum of all
\d+ matches all digits. This gives the exact expected output.
278
How about this?
x = re.findall('\s([0-9]+)\s', str)
The solutions posted so far only work (if at all) for numbers that are preceded and followed by whitespace. They will fail if a number occurs at the very start or end of the string, or if a number appears at the end of a sentence, for example. This can be avoided using word boundary anchors:
s = "100 bottles of beer on the wall (ignore the 1000s!), now 99, now only 98"
s = re.findall(r"\b\d+\b", a) # \b matches at the start/end of an alphanumeric sequence
print(sum(map(int, s)))
Result: 297
To avoid a partial match
use this:
'^[0-9]*$'
I wondering if there is some way to find only second quotes from each pair in string, that has paired quotes.
So if I have string like '"aaaaa"' or just '""' I want to find only the last '"' from it. If I have '"aaaa""aaaaa"aaaa""' I want only the second, fourth and sixth '"'s. But if I have something like this '"aaaaaaaa' or like this 'aaa"aaa' I don't want to find anything, since there are no paired quotes. If i have '"aaa"aaa"' I want to find only second '"', since the third '"' has no pair.
I've tried to implement lookbehind, but it doesn't work with quantifiers, so my bad attempt was '(?<=\"a*)\"'.
You don't really need regex for this. You can do:
[i for i, c in enumerate(s) if c == '"'][1::2]
To get the index of every other '"'. Example usage:
>>> for s in ['"aaaaa"', '"aaaa""aaaaa"aaaa""', 'aaa"aaa', '"aaa"aaa"']:
print(s, [i for i, c in enumerate(s) if c == '"'][1::2])
"aaaaa" [6]
"aaaa""aaaaa"aaaa"" [5, 12, 18]
aaa"aaa []
"aaa"aaa" [4]
import re
reg = re.compile(r'(?:\").*?(\")')
then
for match in reg.findall('"this is", "my test"'):
print(match)
gives
"
"
If your necessity is to change the second quote you can also match the whole string and put the pattern before the second quote into a capture group. Then making the substitution by the first match group + the substitution string would archive the issue.
For example, this regex will match everything before the second quote and put it into a group
(\"[^"]*)\"
if you replace whole the match (which includes the second quote) by only the value of the capture group (which does not include the second quote), then you would just cut it off.
See the online example
import re
p = re.compile(ur'(\"[^"]*)\"')
test_str = u"\"test1\"test2\"test3\""
subst = r"\1"
result = re.sub(p, subst, test_str)
print result #result -> "test1test2"test3
Please read my answer about why you don't want to use regular expressions for such a problem, even though you can do that kind of non-regular job with it.
Ok then you probably want one of the solutions I give in the linked answer, where you'll want to use a recursive regex to match all the matching pairs.
Edit: the following has been written before the update to the question, which was asking only for second double quotes.
Though if you want to find only second double quotes in a string, you do not need regexps:
>>> s1='aoeu"aoeu'
>>> s2='aoeu"aoeu"aoeu'
>>> s3='aoeu"aoeu"aoeu"aoeu'
>>> def find_second_quote(s):
... pos_quote_1 = s2.find('"')
... if pos_quote_1 == -1:
... return -1
... pos_quote_2 = s[pos_quote_1+1:].find('"')
... if pos_quote_2 == -1:
... return -1
... return pos_quote_1+1+pos_quote_2
...
>>> find_second_quote(s1)
-1
>>> find_second_quote(s2)
4
>>> find_second_quote(s3)
4
>>>
here it either returns -1 if there's no second quote, or the position of the second quote if there is one.
a parser is probably better, but depending on what you want to get out of it, there are other ways. if you need the data between the quotes:
import re
re.findall(r'".*?"', '"aaaa""aaaaa"aaaa""')
['"aaaa"',
'"aaaaa"',
'""']
if you need the indices, you could do it as a generator or other equivalent like this:
def count_quotes(mystr):
count = 0
for i, x in enumerate(mystr):
if x == '"':
count += 1
if count % 2 == 0:
yield i
list(count_quotes('"aaaa""aaaaa"aaaa""'))
[5, 12, 18]