Find first x matches with re.findall


I need to limit re.findall to the first 3 matches and then stop.
For example:
text = 'some1 text2 bla3 regex4 python5'
re.findall(r'\d',text)
then I get:
['1', '2', '3', '4', '5']
and I want:
['1', '2', '3']

re.findall returns a list, so the simplest solution is to slice it (note that findall still scans the whole string before the slice is taken):
>>> import re
>>> text = 'some1 text2 bla3 regex4 python5'
>>> re.findall(r'\d', text)[:3] # Get the first 3 items
['1', '2', '3']

To find N matches and stop, you could use re.finditer and itertools.islice:
>>> import itertools as IT
>>> [item.group() for item in IT.islice(re.finditer(r'\d', text), 3)]
['1', '2', '3']
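The finditer/islice approach can be wrapped in a small helper. This is a minimal sketch; the name first_n_matches is made up for illustration, and it returns the whole match via group(), which differs from re.findall when the pattern contains capture groups.

```python
import itertools
import re


def first_n_matches(pattern, text, n):
    """Return at most n matches of pattern in text, scanning lazily."""
    # re.finditer yields match objects one at a time; islice stops
    # after n of them, so the rest of the string is never searched.
    matches = itertools.islice(re.finditer(pattern, text), n)
    return [m.group() for m in matches]


print(first_n_matches(r'\d', 'some1 text2 bla3 regex4 python5', 3))
# ['1', '2', '3']
```

If the text has fewer than n matches, the helper simply returns the ones it found.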

Related

How do I trim specific elements of a list?

I've got a list here that represents a line in a file after I split it:
['[0.111,', '-0.222]', '1', '2', '3']
and I'm trying to trim off the "[" and the "," in the first element and the "]" in the second element. How would I do that? I've started my thought process here, but this code doesn't work:
for line in file:
    line = line.split()
    line[0] = line[1:-1]
    line[1] = line[0:-1]
    print(line2)

You can use re.sub:
from re import sub
s = '[0.111, -0.222] 1 2 3'
s = sub(r'[\[\]]', '', s)
print(s.split())
Output:
['0.111,', '-0.222', '1', '2', '3']
If you would like to remove the comma as well, add it to the character class:
from re import sub
s = '[0.111, -0.222] 1 2 3'
s = sub(r'[\[\],]', '', s)
print(s.split())
Output:
['0.111', '-0.222', '1', '2', '3']

You can use replace to remove the brackets:
lst = ['[0.111,', '-0.222]', '1', '2', '3']
lst2 = [x.replace('[','').replace(']','') for x in lst]
print(lst2)
Output:
['0.111,', '-0.222', '1', '2', '3']
You can also be more specific:
lst2 = [x[1:] if x[0] == '[' else x[:-1] if x[-1] == ']' else x for x in lst]

You could filter only the numeric part of each string with a function like:
def clean(strings):
    def onlyNumeric(c):
        return c.isdigit() or c == '-' or c == '.'
    return list(map(lambda s: "".join(filter(onlyNumeric, s)), strings))
This handles your example (and many other oddities):
>>> clean(['[0.111,', '-0.222]', '1', '2', '3'])
['0.111', '-0.222', '1', '2', '3']
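Because the unwanted characters only appear at the ends of each element, str.strip with a set of characters is another option. A minimal sketch using the list from the question; the variable name trimmed is illustrative.

```python
lst = ['[0.111,', '-0.222]', '1', '2', '3']

# str.strip with an argument removes any of the listed characters
# from both ends of each string and leaves the middle untouched.
trimmed = [x.strip('[],') for x in lst]
print(trimmed)
# ['0.111', '-0.222', '1', '2', '3']
```

Unlike the character filter above, this keeps anything in the middle of an element as it is, so it only suits data where the brackets and commas sit at the edges.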

Python regular expression retrieving numbers between two different delimiters

I have the following string
"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
I would like to use regular expressions to extract the groups:
group1 56,7,1
group2 88,9,1
group3 58,8,1
group4 45
group5 100
group6 null
My ultimate goal is to have tuples such as (group1, group2), (group3, group4), (group5, group6). I am not sure if this all can be accomplished with regular expressions.
I have the following regular expression, which gives me partial results:
(?<=h=|d=)(.*?)(?=h=|d=)
The matches have an extra comma at the end, like 56,7,1, which I would like to remove, and d=, does not return a null.

You likely do not need regex. A list comprehension with .split() can do what you need:
Code:
def split_it(a_string):
    if not a_string.endswith(','):
        a_string += ','
    return [x.split(',')[:-1] for x in a_string.split('=') if len(x)][1:]
Test Code:
tests = (
    "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,",
    "h=56,7,1,d=88,9,1,d=,h=58,8,1,d=45,h=100",
)
for test in tests:
    print(split_it(test))
Results:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], ['']]
[['56', '7', '1'], ['88', '9', '1'], [''], ['58', '8', '1'], ['45'], ['100']]

You could match rather than split using the expression
[dh]=([\d,]*),
and grab the first group, see a demo on regex101.com.
That is
[dh]= # d or h, followed by =
([\d,]*) # capture digits and commas, 0+ times
, # require a comma afterwards
In Python:
import re
rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
numbers = [m.group(1) for m in rx.finditer(string)]
print(numbers)
Which yields
['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
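To get from this list to the (h, d) tuples the question asks for, one option is to pair consecutive captures. This is a minimal sketch that assumes h and d entries strictly alternate, as in the sample string; the names values and pairs are illustrative.

```python
import re

rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

# An empty capture (as for the final "d=,") is turned into None.
values = [m.group(1) or None for m in rx.finditer(string)]

# Pair each even-indexed capture (h) with the odd-indexed one after it (d).
pairs = list(zip(values[::2], values[1::2]))
print(pairs)
# [('56,7,1', '88,9,1'), ('58,8,1', '45'), ('100', None)]
```

If the input could contain two h or two d entries in a row, the pairing would drift, so the key letter would need to be captured as well.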

You can use ([a-z]=)([0-9,]+)(,)?
You only need to select the group you want by index.

You could use $ in positive lookahead to match against the end of the string:
import re
input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
groups = []
for x in re.findall('(?<=h=|d=)(.*?)(?=d=|h=|$)', input_str):
    m = x.strip(',')
    if m:
        groups.append(m.split(','))
    else:
        groups.append(None)
print(groups)
Output:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], None]

Here, I have assumed that the parameters only have numerical values. If so, you can try this:
(?<=h=|d=)([0-9,]*)
Note that each match keeps its trailing comma (for example 56,7,1,), so strip it afterwards.

How to split a string that includes a list (as a string)

For example, I have a string x = '1,test, 2,3,4,[5,6,7],9' and I want to split that into ['1','test','2','3','4','[5,6,7]','9'].
I tried using split(",") but that doesn't work because of the "," inside the list itself.

You can hack csv to do this:
>>> import csv
>>> s='1,test, 2,3,4,[5,6,7],9'
>>> next(csv.reader([s.replace('[','"').replace(']','"')]))
['1', 'test', ' 2', '3', '4', '5,6,7', '9']
And if you want to keep the brackets:
>>> ["[{}]".format(e) if "," in e else e for e in next(csv.reader([s.replace('[','"').replace(']','"')]))]
['1', 'test', ' 2', '3', '4', '[5,6,7]', '9']
Or, use a regex:
>>> import re
>>> re.findall(r'\[[^\]]+\]|[^,]+', s)
['1', 'test', ' 2', '3', '4', '[5,6,7]', '9']
The pattern matches either a bracketed group (\[[^\]]+\]) or, failing that, a run of non-comma characters ([^,]+).
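The results above still carry the leading space in ' 2', while the question asks for '2'. A minimal sketch that strips each piece after matching, assuming the brackets are never nested; the name parts is illustrative.

```python
import re

s = '1,test, 2,3,4,[5,6,7],9'

# Match either a bracketed group or a run of non-comma characters,
# then strip surrounding whitespace from each piece.
parts = [p.strip() for p in re.findall(r'\[[^\]]+\]|[^,]+', s)]
print(parts)
# ['1', 'test', '2', '3', '4', '[5,6,7]', '9']
```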

How to remove whitespace in a list

I can't remove my whitespace in my list.
invoer = "5-9-7-1-7-8-3-2-4-8-7-9"
cijferlijst = []
for cijfer in invoer:
    cijferlijst.append(cijfer.strip('-'))
I tried the following, but it doesn't work. I already made a list from my string and separated everything, but each "-" is now a "".
filter(lambda x: x.strip(), cijferlijst)
filter(str.strip, cijferlijst)
filter(None, cijferlijst)
abc = [x.replace(' ', '') for x in cijferlijst]

Try this:
>>> ''.join(invoer.split('-'))
'597178324879'

If you want the numbers in a string without -, use .replace():
>>> string_list = "5-9-7-1-7-8-3-2-4-8-7-9"
>>> string_list.replace('-', '')
'597178324879'
If you want the numbers as a list of strings, use .split():
>>> string_list.split('-')
['5', '9', '7', '1', '7', '8', '3', '2', '4', '8', '7', '9']
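The loop in the question iterates over the string one character at a time, which is why every "-" ends up as an empty string. Splitting on the separator avoids that, and if actual integers are wanted, each piece can be converted. A minimal sketch reusing the question's variable name invoer; cijfers is an illustrative name.

```python
invoer = "5-9-7-1-7-8-3-2-4-8-7-9"

# Split on the separator first, then convert each piece to an int.
cijfers = [int(c) for c in invoer.split('-')]
print(cijfers)
# [5, 9, 7, 1, 7, 8, 3, 2, 4, 8, 7, 9]
```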

This looks a lot like the following question:
Python: Removing spaces from list objects
The answer there is to use strip instead of replace. Have you tried:
abc = [x.strip(' ') for x in cijferlijst]

python slice set in list

I would like to slice a set within a list, but every time I do so, I get an empty list in return.
What I am trying to accomplish (maybe there is an easier way):
I have a list of sets
each set has 5 items
I would like to check a new set against the list (whether the set already exists in the list)
the first and last items in the set are irrelevant for the comparison, so only positions 2-4 are valid for the search of already existing sets
Here is my code:
result_set = ['1', '2', '3', '4', '5']
result_matrix = []
result_matrix.append(result_set)
Slicing the set is no problem:
print(result_set[1:4])
['2', '3', '4']
But slicing it within the list gives an empty list:
print(result_matrix[:][1:4])
[]
I would expect:
[['2', '3', '4']]

I think this is what you want to do:
>>> target_set = ['2', '3', '4']
>>> any([l for l in result_matrix if target_set == l[1:-1]])
True
>>> target_set = ['1', '2', '3']
>>> any([l for l in result_matrix if target_set == l[1:-1]])
False
Generalising and making that a function:
def is_set_in_matrix(target_set, matrix):
    return any(True for l in matrix if list(target_set) == l[1:-1])
>>> result_matrix = [['1', '2', '3', '4', '5']]
>>> is_set_in_matrix(['1', '2', '3'], result_matrix)
False
>>> is_set_in_matrix(['2', '3', '4'], result_matrix)
True
# a quirk - it also works with strings...
>>> s = '234'
>>> is_set_in_matrix(s, result_matrix)
True
Note that I have used l[1:-1] to ignore the first and last elements of the "set" in the comparison. This is more flexible should you ever need sets of different lengths.

>>> result_set = ['1', '2', '3', '4', '5']
>>> print(result_set[1:4])
['2', '3', '4']
>>> result_matrix.append(result_set[1:4])
>>> result_matrix
[['2', '3', '4']]

result_matrix[:] returns a copy of the whole outer list, so result_matrix[:][1:4] slices the outer list, which has only one element, and yields []. Index the inner list first, then slice it:
>>> result_matrix.append(result_set)
>>> result_matrix[:]
[['1', '2', '3', '4', '5']]
>>> result_matrix[:][0]
['1', '2', '3', '4', '5']
>>> result_matrix[0][1:4]
['2', '3', '4']
Also, as pointed out by falsetru, you can use extend on an empty list to add the items themselves instead of nesting the list:
>>> result_matrix = []
>>> result_matrix.extend(result_set)
>>> result_matrix
['1', '2', '3', '4', '5']
>>> result_matrix[1:4]
['2', '3', '4']
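To get the middle slice of every inner list, which is the result the question expected, a list comprehension is one option. A minimal sketch; the names middles and new_set are illustrative, and new_set is made-up sample data.

```python
result_set = ['1', '2', '3', '4', '5']
result_matrix = [result_set]

# Slice each inner list separately instead of slicing the outer list.
middles = [row[1:4] for row in result_matrix]
print(middles)
# [['2', '3', '4']]

# A new set counts as already present if its middle items match a stored row.
new_set = ['9', '2', '3', '4', '0']
print(new_set[1:4] in middles)
# True
```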
