How can I use regex to match only one character in Python?

I am trying to process a list of files:
file_list = ['.DS_Store', '9', '7', '6', '8', '01', '4', '3', '2', '5']
The goal is to find the files whose names have only one character.
I tried this code:
import re
r = re.compile('[0-9]')
result_list = list(filter(r.match, file_list))
result_list
and got
['9', '7', '6', '8', '01', '4', '3', '2', '5']
where '01' should not be included.
I made a workaround
tmp = []
for i in file_list:
    if len(i) == 1:
        tmp.append(i)
tmp
and I got
['9', '7', '6', '8', '4', '3', '2', '5']
This is exactly what I want, although the method is ugly.
How can I use regex in Python to do this?

r = re.compile('^[0-9]$')
The ^ matches the beginning of the string and $ matches the end.
And if you really want it to match any character, not just numbers, it should be
r = re.compile('^.$')
The . in the regex is a single-character wildcard.
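A minimal sketch of an alternative, assuming Python 3.4 or later (where fullmatch was added): a compiled pattern's fullmatch method only succeeds when the entire string matches, so the ^ and $ anchors become unnecessary. The last two lines show one subtlety of $: it also matches just before a trailing newline, which fullmatch does not allow.

```python
import re

file_list = ['.DS_Store', '9', '7', '6', '8', '01', '4', '3', '2', '5']

# fullmatch succeeds only if the whole string matches, so no ^ or $ is needed.
single_digit = re.compile(r'[0-9]')
result_list = [name for name in file_list if single_digit.fullmatch(name)]
print(result_list)  # ['9', '7', '6', '8', '4', '3', '2', '5']

# '$' also matches just before a trailing newline; fullmatch does not.
print(bool(re.match(r'^[0-9]$', '7\n')))    # True
print(bool(re.fullmatch(r'[0-9]', '7\n')))  # False
```

Swap [0-9] for . if any single character should qualify, as in the answer above.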

Match a string that consists of any single character: ^. matches one character at the beginning of the string, and $ requires the end of the string to follow right after it:
^.$
Your Python then becomes:
r = re.compile('^.$')
result_list = list(filter(r.match, file_list))

Your workaround is equivalent to this list comprehension:
[i for i in file_list if len(i) == 1]
This works for any file name that has exactly one character, whether or not it is a digit.

Related

How to get all the substrings in string using Regex in Python

I have a string such as: "12345"
Using a regex, how can I get all of its substrings that consist of one to three consecutive characters, to get output such as:
'1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345'
You can use re.findall with a positive lookahead that captures a run of exactly size characters, iterating size from 1 to 3 (the lookahead is zero-width, so the matches may overlap):
[match for size in range(1, 4) for match in re.findall('(?=(.{%d}))' % size, s)]
However, it would be more efficient to use a list comprehension with nested for clauses to iterate through all the sizes and starting indices:
[s[start:start + size] for size in range(1, 4) for start in range(len(s) - size + 1)]
Given s = '12345', both of the above would return:
['1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345']
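Both approaches wrapped as functions, as a sketch with a check that they agree. The function names substrings_regex and substrings_slicing and the max_len parameter are hypothetical, introduced only for illustration.

```python
import re

def substrings_regex(s, max_len=3):
    # The zero-width lookahead lets matches overlap; findall returns the captured text.
    return [m for size in range(1, max_len + 1)
            for m in re.findall('(?=(.{%d}))' % size, s)]

def substrings_slicing(s, max_len=3):
    # Slice every window of each size; no regex involved.
    return [s[start:start + size]
            for size in range(1, max_len + 1)
            for start in range(len(s) - size + 1)]

s = '12345'
print(substrings_regex(s))
print(substrings_regex(s) == substrings_slicing(s))  # True
```

One difference to keep in mind: . does not match a newline by default, so the regex version skips windows containing '\n' while the slicing version does not.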

Append to list from another list

I have a list like
list = ['1,2,3,4,5', '6,7,8,9,10']
I have a problem with the "," in the list, because '1,2,3,4,5' is a single string.
I want to have list2 = ['1','2','3','4'...]
How can I do this?
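Something like this should work (note that it converts each piece to an int):
nums = []
for s in list:  # s rather than str, so the built-in str is not shadowed
    nums = nums + [int(n) for n in s.split(',')]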
You can loop through and split the strings up.
list = ['1,2,3,4,5', '6,7,8,9,10']
result = []
for s in list:
    result += s.split(',')
print(result)
Split each value in the original list on ',' and keep extending a new list with the pieces:
l = []
for x in ['1,2,3,4,5', '6,7,8,9,10']:
    l.extend(x.split(','))
print(l)
Use itertools.chain.from_iterable with map:
from itertools import chain
lst = ['1,2,3,4,5', '6,7,8,9,10']
print(list(chain.from_iterable(map(lambda x: x.split(','), lst))))
# ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
Note that you shouldn't use list as a variable name, because it shadows the built-in.
You can also use a list comprehension:
li = ['1,2,3,4,5', '6,7,8,9,10']
res = [c for s in li for c in s.split(',')]
print(res)
#['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
list2 = []
list2+=(','.join(list).split(','))
','.join(list) produces a string of '1,2,3,4,5,6,7,8,9,10'
','.join(list).split(',') produces ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
The join method concatenates the elements of a list with a delimiter between them; here it returns a single string in which the elements are joined by ','.
The split method does the reverse: it splits a string on a delimiter and returns a list of substrings.
# Without using loops
li = ['1,2,3,4,5', '6,7,8,9,10']
p = ",".join(li).split(",")
#['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
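Most answers above produce a list of strings, while the first suggestion converts each piece with int(n). If integers are what you actually need, a minimal sketch of the same flattening comprehension with the conversion folded in (assuming every piece is a valid integer literal):

```python
li = ['1,2,3,4,5', '6,7,8,9,10']

# Split each comma-separated string and convert every piece to int.
nums = [int(n) for s in li for n in s.split(',')]
print(nums)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```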

Python regular expression retrieving numbers between two different delimiters

I have the following string
"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
I would like to use regular expressions to extract the groups:
group1 56,7,1
group2 88,9,1
group3 58,8,1
group4 45
group5 100
group6 null
My ultimate goal is to have tuples such as (group1, group2), (group3, group4), (group5, group6). I am not sure if this all can be accomplished with regular expressions.
I have the following regular expression, which gives me partial results:
(?<=h=|d=)(.*?)(?=h=|d=)
The matches have an extra trailing comma (for example 56,7,1, instead of 56,7,1), which I would like to remove, and d=, does not return null.
You likely do not need a regex. A list comprehension and .split() can probably do what you need:
Code:
def split_it(a_string):
    if not a_string.endswith(','):
        a_string += ','
    return [x.split(',')[:-1] for x in a_string.split('=') if len(x)][1:]
Test Code:
tests = (
    "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,",
    "h=56,7,1,d=88,9,1,d=,h=58,8,1,d=45,h=100",
)
for test in tests:
    print(split_it(test))
Results:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], ['']]
[['56', '7', '1'], ['88', '9', '1'], [''], ['58', '8', '1'], ['45'], ['100']]
You could match rather than split using the expression
[dh]=([\d,]*),
and grab the first group, see a demo on regex101.com.
That is
[dh]= # d or h, followed by =
([\d,]*) # capture digits and commas, 0+ times
, # require a comma afterwards
In Python:
import re
rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
numbers = [m.group(1) for m in rx.finditer(string)]
print(numbers)
Which yields
['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
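The question's ultimate goal was pairs such as (group1, group2). A minimal sketch building on the same pattern, assuming the h= and d= entries strictly alternate; pair_groups is a hypothetical helper name, and empty captures are turned into None to stand in for "null":

```python
import re

# Same pattern as above: 'd' or 'h', '=', then digits/commas up to a required comma.
rx = re.compile(r'[dh]=([\d,]*),')

def pair_groups(text):
    # Empty captures (e.g. from "d=,") become None.
    values = [m.group(1) or None for m in rx.finditer(text)]
    # Pair consecutive values: (v0, v1), (v2, v3), ...
    return list(zip(values[0::2], values[1::2]))

print(pair_groups("h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"))
# [('56,7,1', '88,9,1'), ('58,8,1', '45'), ('100', None)]
```

Because the pattern requires a trailing comma, an input that does not end with one would lose its last entry; appending ',' first (as split_it does) is one way around that.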
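You can use ([a-z]=)([0-9,]+)(,)?
and then read the numbers from group 2 by index. Note that because [0-9,]+ is greedy, group 2 keeps the trailing comma (56,7,1,), and d=, yields just ','.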
You could use $ in positive lookahead to match against the end of the string:
import re
input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
groups = []
for x in re.findall('(?<=h=|d=)(.*?)(?=d=|h=|$)', input_str):
    m = x.strip(',')
    if m:
        groups.append(m.split(','))
    else:
        groups.append(None)
print(groups)
Output:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], None]
Here I have assumed that the parameters only ever have numeric values. If so, you can try this:
(?<=h=|d=)([0-9,]*)
Hope it helps.

Use index of first and second repeated index in list

There are lots of similar posts out there, but I could not find something that directly matched, or resulted in a solution to, the issue I am dealing with.
I want to use the second instance of a repeated index contained in a list as the index of another list. When the function is executed, I want all numbers from the start of the list up to the first '*' to print after Code1, all numbers between the first '*' and the second '*' to print after Code2, and then all numbers following the second '*' until the end of the list to print after Code3. Example data for digit would be "['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']".
In other words, assuming those digits exist, I want the code below to print User Code: 12345, Pass Code: 6, Pin Code: 789101, all on one line.
print_string += 'User Code: {} '.format(''.join(str(dig) for dig in digit[:digit.index('*')])) + \
'Pass Code: {} '.format(''.join(str(dig) for dig in digit[digit.index('*'):digit.index('*')])) + \
'Pin Code: {} '.format(''.join(str(dig) for dig in digit[digit.index('*'):]))
print(print_string)
Essentially, I would like to use the first asterisk as the right index for User Code, the first asterisk as the left index and the second asterisk as the right index for Pass Code, and the second asterisk as the left index for Pin Code.
I just cannot figure out how to make it find successive asterisks. If there is a simpler way to do this, please let me know!
Given,
L = ['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']
Then
str.join('', L)
will form a string
'12345*6*789101'
which you can split into the three parts
parts = str.join('', L).split('*')
and then pull out what you need
user_code = parts[0]
pass_code = parts[1]
pin = parts[2]
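To answer the literal question of finding the second asterisk, list.index accepts an optional start position, so the second search can begin just after the first hit. A sketch, assuming the list contains at least two '*' entries (index raises ValueError otherwise):

```python
digit = ['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']

first = digit.index('*')              # position of the first '*'
second = digit.index('*', first + 1)  # search again, starting just after the first

user_code = ''.join(digit[:first])
pass_code = ''.join(digit[first + 1:second])
pin_code = ''.join(digit[second + 1:])

print('User Code: {} Pass Code: {} Pin Code: {}'.format(user_code, pass_code, pin_code))
# User Code: 12345 Pass Code: 6 Pin Code: 789101
```

Note the + 1 on the left slice bounds, which keeps the asterisks themselves out of the codes.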
If you have actually got all the digits in a list-like shape in a string,
"['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']"
it might be worth converting it into an actual list first; then you can use the join/split method above.
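One way to do that conversion, as a sketch: ast.literal_eval parses a string containing a Python literal (here a list of strings) without the risks of eval. The three-way unpacking assumes the data contains exactly two '*' separators.

```python
import ast

raw = "['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']"

# literal_eval only accepts Python literals, so arbitrary code in raw is rejected.
digits = ast.literal_eval(raw)
user_code, pass_code, pin = ''.join(digits).split('*')
print(user_code, pass_code, pin)  # 12345 6 789101
```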

Python/Regex - Expansion of Parentheses and Slashes

I'm looking for a way to expand numbers that are separated by slashes. In addition to the slashes, parentheses may be used around some (or all) numbers to indicate a "group" which may be repeated (by the number of times directly following the parentheses) or repeated in reverse (followed by 's' as shown in the second set of examples). Some examples are:
1 -> ['1'] -> No slashes, no parentheses
1/2/3/4 -> ['1', '2', '3', '4'] -> basic example with slashes
1/(2)4/3 -> ['1', '2', '2', '2', '2', '3'] -> 2 gets repeated 4 times
1/(2/3)2/4 -> ['1', '2', '3', '2', '3', '4'] -> 2/3 is repeated 2 times
(1/2/3)2 -> ['1', '2', '3', '1', '2', '3'] -> Entire sequence is repeated twice
(1/2/3)s -> ['1', '2', '3', '3', '2', '1'] -> Entire sequence is repeated in reverse
1/(2/3)s/4 -> ['1', '2', '3', '3', '2', '4'] -> 2/3 is repeated in reverse
In the most general case, there could even be nested parentheses, which I know generally make the use of regex impossible. In the current set of data I need to process, there are no nested sets like this, but I could see potential use for it in the future. For example:
1/(2/(3)2/4)s/5 -> 1/(2/3/3/4)s/5
-> 1/2/3/3/4/4/3/3/2/5
-> ['1', '2', '3', '3', '4', '4', '3', '3', '2', '5']
I know of course that regex cannot do all of this (especially with the repeating/reversing sets of parentheses). But if I can get a regex that at least separates the parenthesized parts from the parts outside parentheses, then I could probably write a loop pretty easily to take care of the rest. So, the regex I'd be looking for would do something like:
1 -> ['1']
1/2/3/4 -> ['1', '2', '3', '4']
1/(2)4/3 -> ['1', '(2)4', '3']
1/(2/3)2/4 -> ['1', '(2/3)2', '4']
1/(2/(3)2/4)s/5 -> ['1', '(2/(3)2/4)s', '5']
And then I could loop on this result and continue expanding any parentheses until I have only digits.
EDIT
I wasn't totally clear in my original post. In my attempt to make the examples as simple as possible, I perhaps oversimplified them. This needs to work for numbers >= 10 as well as negative numbers.
For example:
1/(15/-23)s/4 -> ['1', '(15/-23)s', '4']
-> ['1', '15', '-23', '-23', '15', '4']
Since you are dealing with nested parentheses, regex can't help you much here. It cannot easily convert the string into the list you want at the end.
You would be better off parsing the string yourself. You can try this code, which meets the requirement you stated at the end:
Parsing the string into a list without losing the parentheses:
def parse(s):
    li = []
    open = 0
    closed = False
    start_index = -1
    for index, c in enumerate(s):
        if c == '(':
            if open == 0:
                start_index = index
            open += 1
        elif c == ')':
            open -= 1
            if open == 0:
                closed = True
        elif closed:
            li.append(s[start_index: index + 1])
            closed = False
        elif open == 0 and c.isdigit():
            li.append(c)
    return li
For the string '1/(2/(3)2/4)s/5' this gives the following list:
['1', '(2/(3)2/4)s', '5']
and for the string '1/(15/-23)s/4', as per your changed requirement, it gives:
['1', '(15/-23)s', '4']
Note that outside parentheses this parser appends one digit character at a time, so a top-level number such as 15 or -23 would not be kept whole. Next, the parenthesized elements need to be broken up further into separate list elements.
Expanding the strings with parentheses into individual list elements:
Here a regex does help, because it only has to deal with the innermost parentheses, one pair at a time:
import re
import itertools
def expand(s):
    ''' Group 1 contains the string inside the parentheses
    Group 2 contains the digit or character `s` after the closing parenthesis
    '''
    match = re.search(r'\(([^()]*)\)(\d|s)', s)
    if match:
        group0 = match.group()
        group1 = match.group(1)
        group2 = match.group(2)
        if group2.isdigit():
            # A digit after the closing parenthesis: repeat the string inside
            s = s.replace(group0, ((group1 + '/') * int(group2))[:-1])
        else:
            # 's' after the closing parenthesis: the string inside, then its reverse
            s = s.replace(group0, '/'.join(group1.split('/') + group1.split('/')[::-1]))
        if '(' in s:
            return expand(s)
    return s
li = parse('1/(15/-23)s/4')
for index, s in enumerate(li):
    if '(' in s:
        s = expand(s)
    li[index] = s.split('/')
print(list(itertools.chain(*li)))
This will give you the required result:
['1', '15', '-23', '-23', '15', '4']
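The above code iterates over the list generated by parse(s), and for each element containing parentheses, recursively expands the innermost parentheses until none remain. (The repeat count after a closing parenthesis must be a single digit, since the pattern uses \d.)
Here is another way to do the first step: split at the top-level slashes while keeping each parenthesized group, with its suffix, intact. Because it accumulates characters into a buffer, multi-digit and negative numbers stay whole.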
def expand(string):
    level = 0
    buffer = ""
    container = []
    for char in string:
        if char == "/":
            if level == 0:
                container.append(buffer)
                buffer = ""
            else:
                buffer += char
        elif char == "(":
            level += 1
            buffer += char
        elif char == ")":
            level -= 1
            buffer += char
        else:
            buffer += char
    if buffer != "":
        container.append(buffer)
    return container
Regular expressions are completely the wrong tool for this job. There's a long, drawn-out explanation of why regular expressions are not appropriate; the short version is that they cannot match arbitrarily nested parentheses. A simple recursive parser is easy enough to write that you'd probably be done with it well before you finished debugging your regular expression.
It's a slow day, so I wrote one myself, complete with doctests. (It reads one character at a time, so it handles single-digit values only.)
def parse(s):
    """
    >>> parse('1')
    ['1']
    >>> parse('1/2/3/4')
    ['1', '2', '3', '4']
    >>> parse('1/(2)4/3')
    ['1', '2', '2', '2', '2', '3']
    >>> parse('1/(2/3)2/4')
    ['1', '2', '3', '2', '3', '4']
    >>> parse('(1/2/3)2')
    ['1', '2', '3', '1', '2', '3']
    >>> parse('1/(2/3)s/4')
    ['1', '2', '3', '3', '2', '4']
    >>> parse('(1/2/3)s')
    ['1', '2', '3', '3', '2', '1']
    >>> parse('1/(2/(3)2/4)s/5')
    ['1', '2', '3', '3', '4', '4', '3', '3', '2', '5']
    """
    return _parse(list(s))
def _parse(chars):
    output = []
    while len(chars):
        c = chars.pop(0)
        if c == '/':
            continue
        elif c == '(':
            sub = _parse(chars)
            nextC = chars.pop(0)
            if nextC.isdigit():
                n = int(nextC)
                sub = n * sub
                output.extend(sub)
            elif nextC == 's':
                output.extend(sub)
                output.extend(reversed(sub))
        elif c == ')':
            return output
        else:
            output.append(c)
    return output
if __name__ == "__main__":
    import doctest
    doctest.testmod()
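The recursive parser above consumes one character at a time, so it cannot meet the EDIT's requirement of numbers >= 10 and negative numbers. A sketch of the same recursive idea that tokenizes first, under these assumptions: the only tokens are integers (optionally negative), parentheses, '/', and 's', and every closing parenthesis is followed by either a repeat count or 's'. The names TOKEN_RE and expand_tokens are hypothetical.

```python
import re

# Assumed token set: optionally negative integers, parentheses, '/', and 's'.
TOKEN_RE = re.compile(r'-?\d+|[()/s]')

def expand_tokens(text):
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_seq():
        # Parse items until a ')' or the end of input; return the flat list.
        nonlocal pos
        out = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == '/':
                continue
            if tok == ')':
                return out
            if tok == '(':
                group = parse_seq()      # consumes the matching ')'
                suffix = tokens[pos]     # repeat count or 's'
                pos += 1
                if suffix == 's':
                    out.extend(group + group[::-1])
                else:
                    out.extend(group * int(suffix))
            else:
                out.append(tok)
        return out

    return parse_seq()

print(expand_tokens('1/(15/-23)s/4'))  # ['1', '15', '-23', '-23', '15', '4']
```

Tokenizing up front also means a multi-digit repeat count such as (2)12 is read as 12 rather than 1.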
