Retrieve exactly 1 digit using regular expression in python - python

I want to print only ages that are less than 10. In this string, only the
value 1 should be printed. Somehow, that is not happening.
I used the following code (regular expressions in Python):
import re
# This is my string
s5 = "The baby is 1 year old, Sri is 45 years old, Ann is 50 years old; their father, Sumo is 78 years old and their grandfather, Kris, is 100 years old"
# print all the single digits from the string
re.findall('[0-9]{1}', s5)
# Out[153]: ['1', '4', '5', '5', '0', '7', '8', '1', '0', '0']
re.findall('\d{1,1}', s5)
# Out[154]: ['1', '4', '5', '5', '0', '7', '8', '1', '0', '0']
re.findall('\d{1}', s5)
# Out[155]: ['1', '4', '5', '5', '0', '7', '8', '1', '0', '0']
The output should be 1 and not all the digits as displayed above.
What am I doing wrong?

You are matching "any one digit", but you want to match "any one digit that is neither preceded nor followed by another digit".
One way to do that is to use lookarounds:
re.findall(r'(?<![0-9])[0-9](?![0-9])', s5)
Possible lookarounds:
(?<!R)S // negative lookbehind: match S that is not preceded by R
(?<=R)S // positive lookbehind: match S that is preceded by R
S(?!R) // negative lookahead: match S that is not followed by R
S(?=R) // positive lookahead: match S that is followed by R
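Maybe a simpler solution is to use a capturing group (). If the regex passed to findall has one capturing group, it returns a list of what the group matched instead of the whole matches. Note that this pattern requires a non-digit character on both sides, so it misses a digit at the very start or end of the string: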
re.findall(r'[^0-9]([0-9])[^0-9]', s5)
Also note that you can replace any [0-9] with \d, the character class for digits.
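To see how the lookaround pattern and the capturing-group pattern differ, here is a minimal sketch. The string `sample` is made up for illustration: it puts single digits at the very start and end of the string, where the capturing-group pattern has no neighbouring character to match.

```python
import re

s5 = ("The baby is 1 year old, Sri is 45 years old, Ann is 50 years old; "
      "their father, Sumo is 78 years old and their grandfather, Kris, "
      "is 100 years old")

# Hypothetical extra input with single digits at both ends of the string.
sample = "1 apple, 22 pears, 3"

lookaround = r'(?<!\d)\d(?!\d)'   # a digit with no digit on either side
capturing = r'\D(\d)\D'           # a digit between two non-digit characters

print(re.findall(lookaround, s5))      # ['1']
print(re.findall(capturing, s5))       # ['1']
print(re.findall(lookaround, sample))  # ['1', '3']
print(re.findall(capturing, sample))   # []
```

Both patterns agree on s5, but only the lookaround version finds the digits that touch the edges of the string, because lookarounds do not consume any characters.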

Try this:
k = re.findall(r'(?<!\S)\d(?!\S)', s5)
print(k)
This also works:
re.findall(r'(?<!\S)\d(?![^\s.,?!])', s5)
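A short sketch of what these two patterns accept, since they rely on whitespace rather than on neighbouring digits. The string `text` is an invented example, not taken from the question.

```python
import re

# Hypothetical input: '2' is followed by a comma, '3' is glued to a letter.
text = "is 1 year, 2, x3 4"

# Digit must be surrounded by whitespace or the string boundaries.
strict = r'(?<!\S)\d(?!\S)'
# Same, but the digit may also be followed by one of . , ? !
lenient = r'(?<!\S)\d(?![^\s.,?!])'

print(re.findall(strict, text))   # ['1', '4']
print(re.findall(lenient, text))  # ['1', '2', '4']
```

The first pattern skips a digit that is directly followed by punctuation, which is why the second variant lists the allowed trailing characters explicitly.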

import re
s = "The baby is 1 year old, Sri is 45 years old, Ann is 50 years old; their father, Sumo is 78 years old and their grandfather, Kris, is 100 years old"
m = re.findall(r'\d+', s)
for i in m:
    if int(i) < 10:
        print(i)
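The idea in the snippet above is to let the regex find every whole number and to do the "less than 10" comparison numerically afterwards. A minimal sketch of the same approach as a reusable function; the name `numbers_below` and its `limit` parameter are made up for this example.

```python
import re

def numbers_below(text, limit):
    """Return every integer found in text that is smaller than limit."""
    # \d+ grabs each full run of digits, so '45' is one number, not '4' and '5'.
    return [int(n) for n in re.findall(r'\d+', text) if int(n) < limit]

s = ("The baby is 1 year old, Sri is 45 years old, Ann is 50 years old; "
     "their father, Sumo is 78 years old and their grandfather, Kris, "
     "is 100 years old")

print(numbers_below(s, 10))  # [1]
```

Comparing integers rather than counting digits also keeps working if the threshold changes, for example to 50.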

Related

How can I use regex to match only one character in Python?

I am trying to process a list of files
file_list = ['.DS_Store', '9', '7', '6', '8', '01', '4', '3', '2', '5']
The goal is to find the files whose names have only one character.
I tried this code:
r = re.compile('[0-9]')
result_list = list(filter(r.match, file_list))
result_list
and got
['9', '7', '6', '8', '01', '4', '3', '2', '5']
where '01' should not be included.
I made a workaround
tmp = []
for i in file_list:
    if len(i) == 1:
        tmp.append(i)
tmp
and I got
['9', '7', '6', '8', '4', '3', '2', '5']
This is exactly what I want, although the method is ugly.
How can I use a regex in Python to do this?
r = re.compile('^[0-9]$')
The ^ matches the beginning of the string and $ matches the end (with re.MULTILINE they match at the start and end of each line instead).
And if you really want it to match any character, not just numbers, it should be
r = re.compile('^.$')
The . in the regex is a single-character wildcard.
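As an alternative to writing the anchors by hand, a pattern object's fullmatch() method only succeeds when the whole string matches. The sketch below also shows one difference worth knowing: $ additionally matches just before a trailing newline, while fullmatch() does not allow it. The names `single_digit` and `anchored` are chosen for this example.

```python
import re

file_list = ['.DS_Store', '9', '7', '6', '8', '01', '4', '3', '2', '5']

# fullmatch() requires the entire string to match, so no ^ or $ is needed.
single_digit = re.compile(r'[0-9]')
result_list = [name for name in file_list if single_digit.fullmatch(name)]
print(result_list)  # ['9', '7', '6', '8', '4', '3', '2', '5']

# '$' also matches right before a newline at the end of the string.
anchored = re.compile(r'^[0-9]$')
print(anchored.match('5\n') is not None)          # True
print(single_digit.fullmatch('5\n') is not None)  # False
```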
Match a string if it's simply any single character appearing at the beginning of the string (^.) right before the end of the string ($):
^.$
Your Python then becomes:
r = re.compile('^.$')
result_list = list(filter(r.match, file_list))
Your workaround is equivalent to
[i for i in file_list if len(i) == 1]
and this approach works for any file name that is exactly one character long, not just digits.

Python regular expression retrieving numbers between two different delimiters

I have the following string
"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
I would like to use regular expressions to extract the groups:
group1 56,7,1
group2 88,9,1
group3 58,8,1
group4 45
group5 100
group6 null
My ultimate goal is to have tuples such as (group1, group2), (group3, group4), (group5, group6). I am not sure if this all can be accomplished with regular expressions.
I have the following regular expression, which gives me partial results:
(?<=h=|d=)(.*?)(?=h=|d=)
The matches have an extra comma at the end, like 56,7,1, which I would like to remove, and d=, does not return a null.
You likely do not need to use regex. A list comprehension and .split() can likely do what you need like:
Code:
def split_it(a_string):
    if not a_string.endswith(','):
        a_string += ','
    return [x.split(',')[:-1] for x in a_string.split('=') if len(x)][1:]
Test Code:
tests = (
"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,",
"h=56,7,1,d=88,9,1,d=,h=58,8,1,d=45,h=100",
)
for test in tests:
    print(split_it(test))
Results:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], ['']]
[['56', '7', '1'], ['88', '9', '1'], [''], ['58', '8', '1'], ['45'], ['100']]
You could match rather than split using the expression
[dh]=([\d,]*),
and grab the first group, see a demo on regex101.com.
That is
[dh]= # d or h, followed by =
([\d,]*) # capture digits and commas, 0+ times
, # require a comma afterwards
In Python:
import re
rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
numbers = [m.group(1) for m in rx.finditer(string)]
print(numbers)
Which yields
['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
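The question's stated goal is tuples such as (group1, group2). A minimal sketch of that last step, building on the match above. It assumes the h= and d= entries strictly alternate, as they do in the question's string, and it uses None to stand in for the "null" group; both choices are assumptions of this example.

```python
import re

rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

# An empty capture becomes None so it can play the role of "null".
values = [m.group(1) or None for m in rx.finditer(string)]

# Pair every even-indexed value (h=) with the following odd-indexed one (d=).
pairs = list(zip(values[0::2], values[1::2]))
print(pairs)
# [('56,7,1', '88,9,1'), ('58,8,1', '45'), ('100', None)]
```

If the input could contain two h= or two d= entries in a row, the pairing would need to look at the key letter as well, which this sketch does not do.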
You can use ([a-z]=)([0-9,]+)(,)?
You just need to pick the group you want by its index.
You could use $ in positive lookahead to match against the end of the string:
import re
input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
groups = []
for x in re.findall('(?<=h=|d=)(.*?)(?=d=|h=|$)', input_str):
    m = x.strip(',')
    if m:
        groups.append(m.split(','))
    else:
        groups.append(None)
print(groups)
Output:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], None]
Here, I have assumed that parameters will only have numerical values. If it is so, then you can try this.
(?<=h=|d=)([0-9,]*)
Hope it helps.

Splitting a string similar to ip addresses using regex in Python

I want to have a regular expression which will split on seeing a '.'(dot)
For example:
Input: '1.2.3.4.5.6'
Output : ['1', '2', '3', '4', '5', '6']
What I have tried:-
>>> pattern = r'(\d+)(\.(\d+))+'
>>> test = '192.168.7.6'
>>> re.findall(pattern, test)
What I get:-
[('192', '.6', '6')]
What I expect from re.findall():-
[('192', '168', '7', '6')]
Could you please help in pointing what is wrong?
My thinking:
In pattern = '(\d+)(\.(\d+))+', the initial (\d+) will find the first number, i.e. 192. Then (\.(\d+))+ will find one or more occurrences of the form '.<number>', i.e. .168, .7 and .6.
[EDIT:]
This is a simplified version of the problem I am solving.
In reality, the input can be-
192.168 dot 7 {dot} 6
and expected output is still [('192', '168', '7', '6')].
Once I figure out the solution to extract .168, .7, .6 like patterns, I can then extend it to dot 168, {dot} 7 like patterns.
Since you only need to find the numbers, the regex \d+ should be enough to find numbers separated by any other token/separator:
re.findall(r"\d+", test)
This should work on any of those cases:
>>> re.findall(r"\d+", "192.168.7.6")
['192', '168', '7', '6']
>>> re.findall(r"\d+", "192.168 dot 7 {dot} 6 | 125 ; 1")
['192', '168', '7', '6', '125', '1']
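As for why the original pattern returns only .6: when a capturing group is repeated, Python's re module keeps just the text from its last repetition. The sketch below shows that, and then a possible re.split() variant for the separators mentioned in the question's edit. The separator list (a dot, the word dot, or {dot}) is an assumption based on that one example.

```python
import re

test = '192.168.7.6'

# The whole address matches, but the repeated group remembers only '.6'.
m = re.match(r'(\d+)(\.(\d+))+', test)
print(m.group(0))   # 192.168.7.6
print(m.groups())   # ('192', '.6', '6')

# Splitting on the separator avoids repeated groups altogether.
print(re.split(r'\.', '1.2.3.4.5.6'))  # ['1', '2', '3', '4', '5', '6']

# Assumed separators: '.', 'dot' or '{dot}', with optional spaces around them.
separator = re.compile(r'\s*(?:\.|\{dot\}|dot)\s*')
parts = separator.split('192.168 dot 7 {dot} 6')
print(parts)        # ['192', '168', '7', '6']
```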

Search for non-zero positive integers in a string using regex (python re)

I'm trying to run this code, to extract the non-zero positive integers from a string in Python:
#python code
import re
positive_patron = re.compile(r'[0-9]*\.?[0-9]+')
string = '''esto si esta en level 0 y extension txt LEVEL0.TXT
2 4 5 6 -12 -43 1 -54s esto si esta en 1 pero es
txt 69 con extension txt y profunidad 2'''
print(positive_patron.findall(string))
This gives the output ['0', '0', '2', '4', '5', '6', '12', '43', '1', '54', '1', '69', '2']
However, I don't want to match 0 or negative numbers, and I want my output as ints, like this: [2,4,5,6,1,1,69,2].
Can anyone tell me how to achieve this?
Use the word boundary escape sequence \b, so it won't match a number that has other alphanumeric characters around it. Also use a negative lookbehind to prohibit a leading -.
positive_patron = re.compile(r'\b(?<!-)\d*\.?\d+\b')
To skip 0, do that with a filter after using the regexp.
numbers = positive_patron.findall(string)
numbers = [int(x) for x in numbers if x != '0']
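Put together, the two steps above can be sketched as follows. This version assumes the input contains only whole numbers, so the optional decimal part is left out of the pattern, and it filters on the integer value rather than on the string '0'. The name `positive_ints` is made up for this example.

```python
import re

string = '''esto si esta en level 0 y extension txt LEVEL0.TXT
2 4 5 6 -12 -43 1 -54s esto si esta en 1 pero es
txt 69 con extension txt y profunidad 2'''

# \b keeps out digits glued to letters (LEVEL0, 54s);
# (?<!-) keeps out numbers with a leading minus sign.
integer_pattern = re.compile(r'\b(?<!-)\d+\b')

# Convert to int and drop zero in one pass.
positive_ints = [int(x) for x in integer_pattern.findall(string) if int(x) > 0]
print(positive_ints)  # [2, 4, 5, 6, 1, 1, 69, 2]
```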

How do I convert strings of digits into lists of digits with proper spacing?

I have a string of digits:
grades= '50 20 1 55 90'
How do I convert this into a list? When I try to use the list() function I get
['5', '0', ' ', '2', '0', ' ', '1', ' ', '5', '5', ' ', '9', '0']
Which makes life hard man. I need them in the format:
[50, 20, 1, 55, 90]
I tried coming up with a solution of my own which would loop through each element, checking if the string was a digit, and then appending them together until it got to a space, which would make the whole appended digit be appended to another list. This seemed overly complex. There must be another way!
Split on whitespace with str.split() and convert each element to an integer:
[int(i) for i in grades.split()]
str.split() with no arguments, or with None as the first argument, splits on runs of whitespace of any length and ignores any leading or trailing whitespace.
Demo:
>>> grades = '50 20 1 55 90'
>>> grades.split()
['50', '20', '1', '55', '90']
>>> [int(i) for i in grades.split()]
[50, 20, 1, 55, 90]
