I am trying to process a list of files:
file_list = ['.DS_Store', '9', '7', '6', '8', '01', '4', '3', '2', '5']
The goal is to find the files whose names are exactly one character long.
I tried this code:
import re
r = re.compile('[0-9]')
result_list = list(filter(r.match, file_list))
result_list
and got
['9', '7', '6', '8', '01', '4', '3', '2', '5']
where '01' should not be included.
I made a workaround
tmp = []
for i in file_list:
    if len(i) == 1:
        tmp.append(i)
tmp
and I got
['9', '7', '6', '8', '4', '3', '2', '5']
This is exactly what I want, although the method is ugly.
How can I use a regex in Python to accomplish this?
r = re.compile('^[0-9]$')
The ^ anchors the match at the beginning of the string and $ anchors it at the end, so nothing else can surround the single digit.
And if you really want it to match any character, not just numbers, it should be
r = re.compile('^.$')
The . in the regex is a single-character wildcard.
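As a side note, the original pattern let '01' through because r.match only anchors at the start of the string, so the leading '0' was enough. A minimal sketch of the difference, assuming Python 3.4+ where fullmatch is available (the names loose and strict are only illustrative):

```python
import re

file_list = ['.DS_Store', '9', '7', '6', '8', '01', '4', '3', '2', '5']
digit = re.compile(r'[0-9]')

# match() only requires the pattern to match at the start of the string,
# so '01' is accepted because its first character is a digit.
loose = [name for name in file_list if digit.match(name)]

# fullmatch() requires the whole string to match, which has the same
# effect here as wrapping the pattern in ^ and $.
strict = [name for name in file_list if digit.fullmatch(name)]

print(loose)
print(strict)
```

Using fullmatch keeps the pattern itself free of anchors, which may be easier to read when the same pattern is reused elsewhere.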
Match a string if it's simply any single character appearing at the beginning of the string (^.) right before the end of the string ($):
^.$
Your Python then becomes:
r = re.compile('^.$')
result_list = list(filter(r.match, file_list))
Your workaround is equivalent to
[i for i in file_list if len(i) == 1]
and this approach works for any file name that is exactly one character long, whatever that character is.
I have the following string
"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
I would like to use regular expressions to extract the groups:
group1 56,7,1
group2 88,9,1
group3 58,8,1
group4 45
group5 100
group6 null
My ultimate goal is to have tuples such as (group1, group2), (group3, group4), (group5, group6). I am not sure if this all can be accomplished with regular expressions.
I have the following regular expression, which gives me partial results:
(?<=h=|d=)(.*?)(?=h=|d=)
The matches have an extra trailing comma, like 56,7,1, which I would like to remove, and d=, does not return a null.
You likely do not need a regex. A list comprehension and .split() can do what you need:
Code:
def split_it(a_string):
    if not a_string.endswith(','):
        a_string += ','
    return [x.split(',')[:-1] for x in a_string.split('=') if len(x)][1:]
Test Code:
tests = (
"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,",
"h=56,7,1,d=88,9,1,d=,h=58,8,1,d=45,h=100",
)
for test in tests:
    print(split_it(test))
Results:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], ['']]
[['56', '7', '1'], ['88', '9', '1'], [''], ['58', '8', '1'], ['45'], ['100']]
You could match rather than split using the expression
[dh]=([\d,]*),
and grab the first group, see a demo on regex101.com.
That is
[dh]=      # d or h, followed by =
([\d,]*)   # capture digits and commas, 0+ times
,          # require a comma afterwards
In Python:
import re
rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
numbers = [m.group(1) for m in rx.finditer(string)]
print(numbers)
Which yields
['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
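The question's ultimate goal was tuples such as (group1, group2). A possible sketch that builds on the matches above, assuming the h and d entries always alternate strictly and that an empty value should become None (the names values and pairs are only illustrative):

```python
import re

string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
rx = re.compile(r'[dh]=([\d,]*),')

# Turn empty captures into None so that "d=," is reported as a null.
values = [m.group(1) or None for m in rx.finditer(string)]

# Pair every even-indexed value (h) with the following odd-indexed one (d).
pairs = list(zip(values[0::2], values[1::2]))
print(pairs)
```

If the input can contain two h entries or two d entries in a row, this pairing would silently misalign, so the keys would need to be captured and checked as well.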
You can use ([a-z]=)([0-9,]+)(,)?
Then you just need to pick the group you want by its index.
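A short sketch of picking a group by index with this pattern. One thing to watch for: [0-9,]+ is greedy, so the second group also swallows the trailing comma and the optional third group stays empty. The rstrip step below is an addition, not part of the suggestion above:

```python
import re

string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
rx = re.compile(r'([a-z]=)([0-9,]+)(,)?')

# group(1) is the key ("h=" or "d="), group(2) holds the values.
raw = [m.group(2) for m in rx.finditer(string)]

# The greedy character class keeps the trailing comma, so strip it.
cleaned = [value.rstrip(',') for value in raw]

print(raw)
print(cleaned)
```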
You could use $ in positive lookahead to match against the end of the string:
import re
input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
groups = []
for x in re.findall('(?<=h=|d=)(.*?)(?=d=|h=|$)', input_str):
    m = x.strip(',')
    if m:
        groups.append(m.split(','))
    else:
        groups.append(None)
print(groups)
Output:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], None]
Here, I have assumed that parameters will only have numerical values. If it is so, then you can try this.
(?<=h=|d=)([0-9,]*)
Hope it helps.
I want to have a regular expression which will split on seeing a '.'(dot)
For example:
Input: '1.2.3.4.5.6'
Output : ['1', '2', '3', '4', '5', '6']
What I have tried:-
>>> import re
>>> pattern = r'(\d+)(\.(\d+))+'
>>> test = '192.168.7.6'
>>> re.findall(pattern, test)
What I get:-
[('192', '.6', '6')]
What I expect from re.findall():-
[('192', '168', '7', '6')]
Could you please point out what is wrong?
My thinking:
In pattern = '(\d+)(\.(\d+))+', the initial (\d+) will find the first number, i.e. 192. Then (\.(\d+))+ will find one or more occurrences of the form '.<number>', i.e. .168, .7 and .6.
[EDIT:]
This is a simplified version of the problem I am solving.
In reality, the input can be-
192.168 dot 7 {dot} 6
and expected output is still [('192', '168', '7', '6')].
Once I figure out the solution to extract .168, .7, .6 like patterns, I can then extend it to dot 168, {dot} 7 like patterns.
Since you only need to find the numbers, the regex \d+ should be enough to find numbers separated by any other token/separator:
re.findall(r"\d+", test)
This should work on any of those cases:
>>> re.findall(r"\d+", "192.168.7.6")
['192', '168', '7', '6']
>>> re.findall(r"\d+", "192.168 dot 7 {dot} 6 | 125 ; 1")
['192', '168', '7', '6', '125', '1']
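As for what went wrong in the original pattern: when a capturing group is repeated, Python's re module keeps only the text from the last repetition, which is why only '.6' and '6' came back. A small sketch that shows this, followed by re.split as one way to actually split on the separator. The separator pattern for ' dot ' and ' {dot} ' is an assumption based on the examples in the edit:

```python
import re

test = '192.168.7.6'

# A repeated group reports only its final iteration, hence '.6' and '6'.
m = re.match(r'(\d+)(\.(\d+))+', test)
print(m.groups())

# Splitting on the separator returns every piece instead.
parts = re.split(r'\.', test)
print(parts)

# Hypothetical extension covering the 'dot' and '{dot}' spellings as well.
separator = re.compile(r'\s*(?:\.|\{dot\}|dot)\s*')
mixed = separator.split('192.168 dot 7 {dot} 6')
print(mixed)
```

Compared with findall on \d+, splitting keeps whatever sits between separators, so it may be the better fit if the pieces are not always purely numeric.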
I'm trying to run this code, to extract the non-zero positive integers from a string in Python:
#python code
import re
positive_patron = re.compile(r'[0-9]*\.?[0-9]+')
string = '''esto si esta en level 0 y extension txt LEVEL0.TXT
2 4 5 6 -12 -43 1 -54s esto si esta en 1 pero es
txt 69 con extension txt y profunidad 2'''
print(positive_patron.findall(string))
This gives the output ['0', '0', '2', '4', '5', '6', '12', '43', '1', '54', '1', '69', '2']
However, I don't want to match 0 or negative numbers, and I want my output as ints, like this: [2,4,5,6,1,1,69,2].
Can anyone tell me how to achieve this?
Use the word boundary escape sequence \b, so it won't match a number that has other alphanumeric characters around it. Also use a negative lookbehind to prohibit a leading -.
positive_patron = re.compile(r'\b(?<!-)\d*\.?\d+\b')
To skip 0, do that with a filter after using the regexp.
numbers = positive_patron.findall(string)
numbers = [int(x) for x in numbers if x != '0']
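Putting the two steps together on the sample text from the question gives roughly the following. The isdigit() guard is an added assumption: the pattern can also match decimals such as '3.5', which int() would reject, and the guard also drops spellings of zero like '00':

```python
import re

text = '''esto si esta en level 0 y extension txt LEVEL0.TXT
2 4 5 6 -12 -43 1 -54s esto si esta en 1 pero es
txt 69 con extension txt y profunidad 2'''

positive_patron = re.compile(r'\b(?<!-)\d*\.?\d+\b')
tokens = positive_patron.findall(text)

# Keep only whole-number tokens greater than zero (assumed requirement);
# decimal tokens are skipped rather than converted.
numbers = [int(t) for t in tokens if t.isdigit() and int(t) > 0]
print(numbers)
```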
I have a string of digits:
grades= '50 20 1 55 90'
How do I convert this into a list? When I try to use the list() function I get
['5', '0', ' ', '2', '0', ' ', '1', ' ', '5', '5', ' ', '9', '0']
which makes life hard. I need them in the format:
[50, 20, 1, 55, 90]
I tried coming up with a solution of my own which would loop through each element, checking if the string was a digit, and then appending them together until it got to a space, which would make the whole appended digit be appended to another list. This seemed overly complex. There must be another way!
Split on whitespace with str.split() and convert each element to an integer:
[int(i) for i in grades.split()]
str.split() with no arguments, or with None as the first argument, splits on runs of whitespace of any length and ignores leading and trailing whitespace.
Demo:
>>> grades = '50 20 1 55 90'
>>> grades.split()
['50', '20', '1', '55', '90']
>>> [int(i) for i in grades.split()]
[50, 20, 1, 55, 90]
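The whitespace behaviour described above matters once the input is less tidy. A small sketch with a made-up messy string (the variable messy is hypothetical) comparing split() against split(' '):

```python
# Hypothetical input with leading spaces, repeated spaces, a tab and a newline.
messy = '  50   20\t1\n55 90 '

# No argument: any run of whitespace is one separator, edges are ignored.
print(messy.split())

# Explicit ' ': every single space is a separator, so empty strings appear
# and the tab and newline are left inside a token.
print(messy.split(' '))

grades = [int(token) for token in messy.split()]
print(grades)
```

With the explicit ' ' separator, int() would fail on the empty strings, which is why the no-argument form is usually the safer choice here.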