find elements of string that ends with specific value - python

I have a list of strings
['time_10', 'time_23', 'time_345', 'date_10', 'date_23', 'date_345']
I want to use regular expression to get strings that end with a specific number.
As I understand it, I first have to combine all the strings from the list into one large string, then build some kind of pattern to use with a regular expression.
I would be grateful if you could provide
regex(some_pattern, some_string)
that would return
['time_10', 'date_10']
or just
'time_10, date_10'

str.endswith is enough.
l = ['time_10', 'time_23', 'time_345', 'date_10', 'date_23', 'date_345']
result = [s for s in l if s.endswith('10')]
print(result)
['time_10', 'date_10']
If you insist on using regex,
import re
result = [s for s in l if re.search('10$', s)]
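One thing to watch with both versions: a string such as 'time_110' also ends with '10'. If the number has to match as a whole, a negative lookbehind for a digit can rule that out. This is only a sketch; the helper name ends_with_number and the extra 'time_110' entry are made up for illustration.

```python
import re

# 'time_110' is an extra, hypothetical entry added to show the difference.
names = ['time_10', 'time_23', 'time_345', 'date_10', 'date_23', 'date_345', 'time_110']

def ends_with_number(strings, number):
    """Keep strings whose trailing number is exactly `number`."""
    # (?<!\d) rejects a match that is preceded by another digit,
    # so '110' does not count as ending with the number 10.
    pattern = re.compile(r'(?<!\d)' + re.escape(str(number)) + r'$')
    return [s for s in strings if pattern.search(s)]

result = ends_with_number(names, 10)
print(result)
```

With plain s.endswith('10') the extra 'time_110' entry would be included as well.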

Related

Extract number from a string using a pattern

I have strings like :
's3://bukcet_name/tables/name=moonlight/land/timestamp=2020-06-25 01:00:23.180745/year=2019/month=5'
From it I would like to obtain a tuple containing the year value and the month value as its first and second elements.
('2019', '5')
For now I did this :
([elem.split('=')[-1:][0] for elem in part[0].split('/')[-2:]][0], [elem.split('=')[-1:][0] for elem in part[0].split('/')[-2:]][1])
It isn't very elegant, how could I do better ?
Use re.findall along with the given regex pattern:
import re
matches = re.findall(r'(?i)/year=(\d+)/month=(\d+)', string)
Result:
# print(matches)
[('2019', '5')]
Perhaps regular expressions could do it. I would use regular expressions to capture the strings 'year=2019' and 'month=5' then return the item at index [-1] by splitting these two with the character '='. Hold on, let me open up my Sublime and try to write actual code which suits your specific case.
import re
search_string = 's3://bukcet_name/tables/name=moonlight/land/timestamp=2020-06-25 01:00:23.180745/year=2019/month=5'
string1 = re.findall(r'year=\d+', search_string)
string2 = re.findall(r'month=\d+', search_string)
result = (string1[0].split('=')[-1], string2[0].split('=')[-1])
print(result)
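If only one year/month pair is expected per path, re.search with named groups may read a little more directly than findall plus split. A minimal sketch, assuming the path always contains /year=.../month=... in that order; the function name extract_year_month is hypothetical.

```python
import re

path = ('s3://bukcet_name/tables/name=moonlight/land/'
        'timestamp=2020-06-25 01:00:23.180745/year=2019/month=5')

def extract_year_month(text):
    """Return (year, month) as strings, or None if the pattern is absent."""
    match = re.search(r'/year=(?P<year>\d+)/month=(?P<month>\d+)', text)
    if match is None:
        return None
    return (match.group('year'), match.group('month'))

result = extract_year_month(path)
print(result)
```

Returning None when nothing matches avoids the IndexError that indexing an empty findall result would raise.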

find substring from list - python

I have a list with elements I would like to remove from a string:
Example
list = ['345','DEF', 'QWERTY']
my_string = '12345XYZDEFABCQWERTY'
Is there a way to iterate over the list and find where its elements occur in the string? My final objective is to remove those elements from the string (I don't know if this is the proper way, since strings are immutable).
You could use a regex union :
import re
def delete_substrings_from_string(substrings, text):
    pattern = re.compile('|'.join(map(re.escape, substrings)))
    return re.sub(pattern, '', text)
print(delete_substrings_from_string(['345', 'DEF', 'QWERTY'], '12345XYZDEFABCQWERTY'))
# 12XYZABC
print(delete_substrings_from_string(['AA', 'ZZ'], 'ZAAZ'))
# ZZ
It uses re.escape so that the substrings are matched literally instead of being interpreted as regex syntax.
It uses only one pass so it should be reasonably fast and it ensures that the second example isn't converted to an empty string.
If you want a faster solution, you could build a Trie-based regex out of your substrings.
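One caveat with a regex union: alternatives are tried from left to right, so if one substring is a prefix of another, the shorter one can win and leave part of the longer one behind. Sorting the substrings longest first is one way around that. This is a sketch of that variation, not part of the answer above; the function name is hypothetical.

```python
import re

def delete_substrings_longest_first(substrings, text):
    """Remove every listed substring in one pass, preferring longer ones."""
    # Longest alternatives go first so 'ABC' is tried before 'AB'.
    ordered = sorted(substrings, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, ordered)))
    return pattern.sub('', text)

print(delete_substrings_longest_first(['AB', 'ABC'], 'ABCD'))
print(delete_substrings_longest_first(['345', 'DEF', 'QWERTY'], '12345XYZDEFABCQWERTY'))
```

Without the sort, the pattern 'AB|ABC' would remove only 'AB' from 'ABCD' and leave 'CD'.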

Search for any number of unknown substrings in place of * in a list of string

First of all, sorry if the title isn't very explicit, it's hard for me to formulate it properly. That's also why I haven't found if the question has already been asked, if it has.
So, I have a list of string, and I want to perform a "procedural" search replacing every * in my target-substring by any possible substring.
Here is an example:
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor('mesh_*')
# should return: ['mesh_1_TMP', 'mesh_2_TMP']
In this case where there is just one * I just split each string with * and use startswith() and/or endswith(), so that's ok.
But I don't know how to do the same thing if there are multiple * in the search string.
So my question is, how do I search for any number of unknown substrings in place of * in a list of string?
For example:
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor('*_1_*')
# should return: ['obj_1_mesh', 'mesh_1_TMP']
Hope everything is clear enough. Thanks.
Consider using 'fnmatch' which provides Unix-like file pattern matching. More info here http://docs.python.org/2/library/fnmatch.html
from fnmatch import fnmatch
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor = '*_1_*'
resultSubList = [x for x in strList if fnmatch(x, searchFor)]
This should do the trick
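A small variation on the same idea, wrapped in a function so it can be called like the searchFor in the question. It uses fnmatchcase, which always compares case-sensitively, whereas plain fnmatch normalizes case on some platforms. The name search_for is just illustrative.

```python
from fnmatch import fnmatchcase

str_list = ['obj_1_mesh',
            'obj_2_mesh',
            'obj_TMP',
            'mesh_1_TMP',
            'mesh_2_TMP',
            'meshTMP']

def search_for(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    # '*' matches any run of characters (including none); note that
    # '?' and '[...]' are also special in fnmatch patterns.
    return [name for name in names if fnmatchcase(name, pattern)]

print(search_for('mesh_*', str_list))
print(search_for('*_1_*', str_list))
```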
I would use the regular expression package for this if I were you. You'll have to learn a little bit of regex to make correct search queries, but it's not too bad. '.+' is pretty similar to '*' in this case.
import re
def search_strings(str_list, search_query):
    regex = re.compile(search_query)
    result = []
    for string in str_list:
        match = regex.match(string)
        if match is not None:
            result += [match.group()]
    return result
strList= ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
print(search_strings(strList, '.+_1_.+'))
This should return ['obj_1_mesh', 'mesh_1_TMP']. I tried to replicate the '*_1_*' case. For 'mesh_*' you could make the search_query 'mesh_.+'. Here is the link to the python regex api: https://docs.python.org/2/library/re.html
The simplest way to do this is to use fnmatch, as shown in ma3oun's answer. But here's a way to do it using Regular Expressions, aka regex.
First we transform your searchFor pattern so it uses '.+?' as the "wildcard" instead of '*'. Then we compile the result into a regex pattern object so we can use it efficiently for multiple tests.
For an explanation of regex syntax, please see the docs. But briefly, the dot means any character (on this line), the + means look for one or more of them, and the ? means do non-greedy matching, i.e., match the smallest string that conforms to the pattern rather than the longest, (which is what greedy matching does).
import re
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor = '*_1_*'
pat = re.compile(searchFor.replace('*', '.+?'))
result = [s for s in strList if pat.match(s)]
print(result)
output
['obj_1_mesh', 'mesh_1_TMP']
If we use searchFor = 'mesh_*' the result is
['mesh_1_TMP', 'mesh_2_TMP']
Please note that this solution is not robust. If searchFor contains other characters that have special meaning in a regex they need to be escaped. Actually, rather than doing that searchFor.replace transformation, it would be cleaner to just write the pattern using regex syntax in the first place.
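To make the transformation tolerate special characters, one option is to split the pattern on '*', escape each literal piece with re.escape, and join the pieces with the wildcard expression. A rough sketch under the same assumption as above (each '*' stands for one or more characters); the function names are hypothetical, and fullmatch is used so the whole string has to fit the pattern.

```python
import re

str_list = ['obj_1_mesh',
            'obj_2_mesh',
            'obj_TMP',
            'mesh_1_TMP',
            'mesh_2_TMP',
            'meshTMP']

def wildcard_to_regex(pattern):
    """Compile a '*' wildcard pattern, escaping every literal part."""
    parts = [re.escape(part) for part in pattern.split('*')]
    return re.compile('.+?'.join(parts))

def wildcard_search(pattern, strings):
    regex = wildcard_to_regex(pattern)
    # fullmatch requires the entire string to conform to the pattern.
    return [s for s in strings if regex.fullmatch(s)]

print(wildcard_search('*_1_*', str_list))
print(wildcard_search('mesh_*', str_list))
# The dot is now literal, so 'axbc' is not matched by 'a.b*'.
print(wildcard_search('a.b*', ['a.bc', 'axbc']))
```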
If the pattern you are looking for always has the form *string* (one literal substring with a wildcard on each side), you can just use the find function. You'll get something like:
for s in strList:
    if s.find(searchFor) != -1:
        do_something()
If you have more than one substring to look for (like abc*123*test), you will need to look for each one in turn: find the first, then search for the second in the same string starting at the index where the first was found plus its length, and so on.
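The step-by-step search described above could be sketched like this. It assumes the literal pieces only have to appear in order somewhere in the string (so it behaves like *abc*123*test*), and the function name contains_parts_in_order is made up for the example.

```python
def contains_parts_in_order(text, parts):
    """Check that every part occurs in `text`, each after the previous one."""
    position = 0
    for part in parts:
        index = text.find(part, position)
        if index == -1:
            return False
        # Continue searching just past the end of the part we found.
        position = index + len(part)
    return True

parts = 'abc*123*test'.split('*')  # ['abc', '123', 'test']
print(contains_parts_in_order('xxabc__123--test', parts))
print(contains_parts_in_order('abctest123', parts))
```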

Using regex to grep a string in Python

I am trying to grep two kinds of patterns in a list using re in python:
'<xyz>number followed by optional *</xyz>'
'name="namepad">number</xyz>'
Using regex in python, I am not able to get the data with asterisk. Here is a sample session, what can I do so that the filter also returns the first element?
>>> k = ['<xyz>27*</xyz>', 'name="namePad">22</xyz>']
>>> f = filter(lambda x:re.search('^name="namePad"|^<xyz>[0-9]{1,3}\*" <\/xyz>',x), k)
>>> f
['name="namePad">22</xyz>']
Your regex has mismatched " quotes. Try this:
filter(lambda x:re.search(r'^name="namePad"|^<xyz>[\d]{1,3}\*?</xyz>',x), k)
It will give you the following:
['<xyz>27*</xyz>', 'name="namePad">22</xyz>']
You can use re.match, since it checks for a match only at the beginning of the string. You also don't need filter; use a list comprehension instead.
>>> [i for i in k if re.match(r'(<xyz>|name="namePad">)\d+\*?', i)]
['<xyz>27*</xyz>', 'name="namePad">22</xyz>']
The ? after \* means that the \* is optional.
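If the goal is to pull out the number itself and to know whether the asterisk was present, capture groups can do both at once. A small sketch, assuming every element follows one of the two forms from the question; the variable names are illustrative only.

```python
import re

k = ['<xyz>27*</xyz>', 'name="namePad">22</xyz>']

# Group 1 captures the number, group 2 captures the optional asterisk.
pattern = re.compile(r'(?:<xyz>|name="namePad">)(\d{1,3})(\*?)</xyz>')

parsed = []
for item in k:
    match = pattern.match(item)
    if match:
        number, star = match.groups()
        parsed.append((number, star == '*'))

print(parsed)
```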

How to add a dot in python list

How do I add a dot into a Python list?
For example
groups = [0.122, 0.1212, 0.2112]
If I want to output this data, how would I make it so it is like
122, 1212, 2112
I tried write(groups...[0]) and further research but didn't get far. Thanks.
Thank you
[str(g).split(".")[1] for g in groups]
results in
['122', '1212', '2112']
Edit:
Use it like this:
groups = [0.122, 0.1212, 0.2112]
decimals = [str(g).split(".")[1] for g in groups]
You could use a list comprehension and return a list of strings
groups = [0.122, 0.1212, 0.2112]
[str(x).split(".")[1] for x in groups]
Result
['122', '1212', '2112']
The list comprehension is doing the following:
Turn each list element into a string
Split the string about the "." character
Return the substring to the right of the split
Return a list based on the above logic
This should do it:
groups = [0.122, 0.1212, 0.2112]
import re
groups_str = ", ".join([str(x) for x in groups])
re.sub('[0-9]*[.]', "", groups_str)
[str(x) for x in groups] will make strings of the items.
", ".join will connect the items, as a string.
import re gives access to the regular expression functions.
re.sub then replaces any run of digits followed by a dot with an empty string.
EDIT (no extra modules):
Working with Lutz' answer, this will also work in the case there is an integer (no dot):
decimals = [str(g).split("0.") for g in groups]
decimals = [i for x in decimals for i in x if i != '']
It won't work though when you have numbers like 11.11, where there is a part you don't want to ignore in front of the dot.
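For numbers with something in front of the dot, or with no dot at all, str.partition is one alternative that needs no extra modules. This is a sketch, not a drop-in replacement: the helper name fractional_digits is hypothetical, and it relies on str() giving plain decimal notation, so very small or very large floats printed in scientific notation (such as 1e-07) are not handled.

```python
groups = [0.122, 0.1212, 0.2112]

def fractional_digits(value):
    """Return the digits after the decimal point, or '' if there is no dot."""
    # partition returns (before, separator, after); the separator is ''
    # when the string contains no dot.
    whole, dot, fraction = str(value).partition('.')
    return fraction if dot else ''

output = ', '.join(fractional_digits(g) for g in groups)
print(output)
print(fractional_digits(11.11))
print(repr(fractional_digits(5)))
```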
