This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I've been trying to find a way to solve this, but it seems I'm poor at regex ;)
I need to remove the (WindowsPath( wrappers and their closing ) characters from the strings in a list:
x= ["(WindowsPath('D:/test/1_birds_bp.png'),WindowsPath('D:/test/1_eagle_mp.png'))", "(WindowsPath('D:/test/2_reptiles_bp.png'),WindowsPath('D:/test/2_crocodile_mp.png'))"]
So I tried:
import re
cleaned_x = [re.sub(r"(?<=WindowsPath\(').*?(?='\))", '', a) for a in x]
which outputs:
["(WindowsPath(''),WindowsPath(''))", "(WindowsPath(''),WindowsPath(''))"]
What I need is:
cleaned_x = [('D:/test/1_birds_bp.png','D:/test/1_eagle_mp.png'), ('D:/test/2_reptiles_bp.png','D:/test/2_crocodile_mp.png')]
That is, a list of tuples.
You can accomplish this by using re.findall like this:
>>> cleaned_x = [tuple(re.findall(r"[A-Z]:/[^']+", a)) for a in x]
>>> cleaned_x
[('D:/test/1_birds_bp.png', 'D:/test/1_eagle_mp.png'), ('D:/test/2_reptiles_bp.png', 'D:/test/2_crocodile_mp.png')]
Hope it helps.
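The pattern above assumes that every path starts with a drive letter followed by :/. Here is a hedged alternative. It assumes the input always uses the WindowsPath('...') wrapper exactly as shown, and it captures whatever sits between the quotes, so relative or UNC-style paths would also be picked up. The name WINPATH is just an illustrative choice:

```python
import re

x = ["(WindowsPath('D:/test/1_birds_bp.png'),WindowsPath('D:/test/1_eagle_mp.png'))",
     "(WindowsPath('D:/test/2_reptiles_bp.png'),WindowsPath('D:/test/2_crocodile_mp.png'))"]

# Capture the text between the quotes of each WindowsPath('...') wrapper.
# [^']* stops at the closing quote, so each path is captured separately.
WINPATH = re.compile(r"WindowsPath\('([^']*)'\)")

cleaned_x = [tuple(WINPATH.findall(a)) for a in x]
print(cleaned_x)
```

Because findall returns one string per match when the pattern has a single group, wrapping it in tuple() gives the list-of-tuples shape the question asks for.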
Perhaps you could use capturing groups? For instance:
import re
re_winpath = re.compile(r'^\(WindowsPath\(\'(.*)\'\)\,WindowsPath\(\'(.*)\'\)\)$')
def extract_pair(s):
    m = re_winpath.match(s)
    if m is None:
        raise ValueError(f"cannot extract pair from string: {s}")
    return m.groups()
pairs = list(map(extract_pair, x))
Here's my take. It's not pretty, and I did it in two steps so as not to make regexp spaghetti. You could turn it into a list comprehension if you like, but it should work:
import re
for a in x:
    a = re.sub(r'(\()?WindowsPath', '', a)  # remove "(WindowsPath" and "WindowsPath"
    a = re.sub(r'\)$', '', a)               # remove the trailing ")"
    print(a)
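The loop above leaves each item as a string such as ('D:/test/1_birds_bp.png'),('D:/test/1_eagle_mp.png'), not as a tuple. One way to finish the job is to evaluate that text with ast.literal_eval, because two parenthesized strings separated by a comma form a tuple literal. This sketch assumes the paths use forward slashes and contain no quotes, since backslashes would be read as escape sequences:

```python
import ast
import re

x = ["(WindowsPath('D:/test/1_birds_bp.png'),WindowsPath('D:/test/1_eagle_mp.png'))",
     "(WindowsPath('D:/test/2_reptiles_bp.png'),WindowsPath('D:/test/2_crocodile_mp.png'))"]

cleaned_x = []
for a in x:
    a = re.sub(r'(\()?WindowsPath', '', a)  # drop "(WindowsPath" and "WindowsPath"
    a = re.sub(r'\)$', '', a)               # drop the final closing parenthesis
    # a is now "('D:/...'),('D:/...')", which is a valid tuple literal
    cleaned_x.append(ast.literal_eval(a))
print(cleaned_x)
```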
Related
I have a list l.
l = ["This is","'the first 'string","and 'it is 'good"]
I want to replace every space with "|space|", but only in the parts of each string that are enclosed in single quotes ('). The desired result is:
print(l)
# ["This is","'the|space|first|space|'string","and 'it|space|is|space|'good"]
I can't use a for loop inside a for loop to replace the characters directly, because strings are immutable:
TypeError: 'str' object does not support item assignment
I have looked at the questions below, but none of them helped:
Replacing string element in for loop Python (3 answers)
Running replace() method in a for loop? (3 answers)
Replace strings using List Comprehensions (7 answers)
I have considered using re.sub but can't think of a suitable regular expression that does the job.
This works for me:
>>> def replace_spaces(s):
...     parts = s.split("'")
...     # after splitting on quotes, the odd indexes are the quoted parts
...     for i in range(1, len(parts), 2):
...         parts[i] = parts[i].replace(' ', '|space|')
...     return "'".join(parts)
...
>>> [replace_spaces(s) for s in l]
['This is', "'the|space|first|space|'string", "and 'it|space|is|space|'good"]
I think I have solved your replacing problem with regex. You might have to polish the given code snippet a bit more to suit your need.
If I understood the question correctly, the trick is to use a regular expression to find the quoted text whose spaces should be replaced:
match = re.findall(r"'(.+?)'", k)  # here k is an element of the list
Here is skeleton code for reference:
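import re
l = ["This is", "'the first 'string", "and 'it is 'good"]
output = []
for k in l:
    # find every substring enclosed in single quotes
    match = re.findall(r"'(.+?)'", k)
    if not match:
        # no quoted part: keep the string unchanged
        output.append(k)
    else:
        p = k
        for quoted in match:
            # replace the spaces only inside the quoted part
            p = p.replace(quoted, quoted.replace(' ', '|space|'))
        output.append(p)
print(output)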
I haven't tested it yet, but it should work. Let me know if you face any issues with this.
Using regex text munging:
import re
l = ["This is", "'the first 'string", "and 'it is 'good"]
def repl(m):
    # replace the spaces inside the matched quoted text
    return m.group(0).replace(' ', '|space|')
quote_str = r"'.+'"  # from the first single quote to the last one
l_new = []
for item in l:
    l_new.append(re.sub(quote_str, repl, item))
print(l_new)
Output:
['This is', "'the|space|first|space|'string", "and 'it|space|is|space|'good"]
Full logic is basically:
Loop through elements of l.
Find the string between single quotes. Pass that to repl function.
In the repl function, I use a simple str.replace to replace spaces with |space|.
Reference for text-munging => https://docs.python.org/3/library/re.html#text-munging
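There is one caveat with quote_str = r"'.+'". The .+ is greedy, so on a string with two quoted parts it runs from the first quote to the last one and also replaces the spaces between them. The sketch below uses [^']* instead, which stops at the next quote and so keeps each quoted part separate. The QUOTED name and the sample string s are invented for illustration:

```python
import re

def repl(m):
    # m.group(0) is one quoted chunk, including its quotes
    return m.group(0).replace(' ', '|space|')

# '[^']*' matches a quote, any run of non-quote characters, then a quote
QUOTED = re.compile(r"'[^']*'")

l = ["This is", "'the first 'string", "and 'it is 'good"]
l_new = [QUOTED.sub(repl, item) for item in l]
print(l_new)

s = "a 'b c' d 'e f'"
print(QUOTED.sub(repl, s))  # the spaces around d stay untouched
```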
This question already has answers here:
regexes: How to access multiple matches of a group? [duplicate]
(2 answers)
Closed 3 years ago.
I have a string like this:
to_search = "example <a>first</a> asdqwe <a>second</a>"
and I want to extract both substrings between the <a> tags, like this:
list = ["first","second"]
I know that to find a single match I can use this code:
import re
if to_search.find("<a>") > -1:
    result = re.search('<a>(.*?)</a>', to_search)
    s = result.group(1)
    print(s)
but that only prints:
first
I tried result.group(2) and result.group(0), but neither gives me the second match.
How can I get a list of all the matches?
Just use:
import re
to_search = "example <a>first</a> asdqwe <a>second</a>"
matches = re.findall(r'<a>(.*?)</a>', to_search)
print(matches)
OUTPUT
['first', 'second']
It's best to use an HTML parser rather than regex. If you stick with regex, change re.search to re.findall, or loop over the matches with re.finditer:
import re
to_search = "example <a>first</a> asdqwe <a>second</a>"
for match in re.finditer(r"<a>(.*?)</a>", to_search):
    captured_group = match.group(1)
    # do something with the captured group
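An HTML parser is the more robust choice, so here is a minimal sketch using the standard library's html.parser.HTMLParser. The class name AnchorTextCollector is made up for this example, and the sketch assumes that <a> elements are not nested:

```python
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect the text inside every <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.in_a = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_a = True
            self.texts.append("")  # start a new entry for this <a>

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_a = False

    def handle_data(self, data):
        # text may arrive in several chunks, so concatenate it
        if self.in_a:
            self.texts[-1] += data

to_search = "example <a>first</a> asdqwe <a>second</a>"
parser = AnchorTextCollector()
parser.feed(to_search)
parser.close()
print(parser.texts)
```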
This question already has answers here:
Keeping only certain characters in a string using Python?
(3 answers)
Closed 5 years ago.
My code is:
value = "123456"
I want to remove everything except 2 and 5.
The output should be 25.
The program should still work if the value changes, for example:
value = "463312"
The output should then be 2.
I tried the remove() and replace() functions, but they didn't work.
I'm using Python 3.6.2.
Instead of trying to remove every unwanted character, you are better off building a whitelist of the characters you want to keep in the result:
>>> value = '123456'
>>> whitelist = set('25')
>>> ''.join([c for c in value if c in whitelist])
'25'
Here is another option where the loop is implicit. We build a mapping to use with str.translate where every character maps to '', unless specified otherwise:
>>> from collections import defaultdict
>>> d = defaultdict(str, str.maketrans('25', '25'))
>>> '123456'.translate(d)
'25'
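If you'd rather not use defaultdict, str.maketrans also accepts a third argument that lists characters to delete. The hedged variant below deletes every character of value that is not in the whitelist. keep_only is a hypothetical helper name:

```python
def keep_only(value, allowed):
    # characters that occur in value but are not allowed
    to_delete = ''.join(set(value) - set(allowed))
    # maketrans('', '', chars) builds a table that deletes those characters
    return value.translate(str.maketrans('', '', to_delete))

print(keep_only('123456', '25'))
print(keep_only('463312', '25'))
```

Unlike the set-intersection trick, this keeps duplicates and the original order, because translate walks through value character by character.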
If you are looking for a regex solution, you can use re.sub to replace every character other than 2 and 5 with ''.
import re
x = "463312"
new = re.sub('[^25]+', '', x)
print(new)
x = "463532312"
new = re.sub('[^25]+', '', x)
print(new)
Output:
2
522
If you are using Python 2, you can use filter like this:
In [60]: value = "123456"
In [61]: whitelist = set("25")
In [62]: filter(lambda x: x in whitelist, value)
Out[62]: '25'
If you are using Python 3, you would need to "".join() the result of the filter.
value = "23456"
j = ""
for k in value:
    if k == '2' or k == '5':
        j = j + k
print(j)
This is a working program for what you described. You can give value any input, and it prints only the 2s and 5s it contains, in their original order.
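value = "123456"
whitelist = '25'
# Caveat: sets are unordered and drop duplicates. sorted() makes the output
# deterministic, but this only matches the other answers when each allowed
# digit appears at most once in value, in ascending order.
''.join(sorted(set(whitelist) & set(value)))
'25'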
I tried a different approach, using Python's built-in filter() function with a lambda. See below:
a = "1225866125"  # Should return "22525"
whitelist = '25'
# Use filter to remove all characters that do not match the condition
a = "".join(filter(lambda c: c in whitelist, a))
Hope this helped!
(Edited to shorten the answer, credit #Akavall + again #salparadise)
This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed 4 years ago.
I have a Python string:
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
The expected output is:
"Atlantis-GPS-coordinates"
I know that the expected output is ALWAYS surrounded by "/bar/" on the left and "/" on the right:
"/bar/Atlantis-GPS-coordinates/"
My current solution looks like this:
a = string.find("/bar/")
b = string.find("/", a+5)
output = string[a+5:b]
This works, but I don't like it.
Does anyone know a more elegant function or trick?
You can use split:
>>> string.split("/bar/")[1].split("/")[0]
'Atlantis-GPS-coordinates'
Adding a maxsplit of 1 may gain a little efficiency:
>>> string.split("/bar/", 1)[1].split("/", 1)[0]
'Atlantis-GPS-coordinates'
Or use partition:
>>> string.partition("/bar/")[2].partition("/")[0]
'Atlantis-GPS-coordinates'
Or a regex:
>>> re.search(r'/bar/([^/]+)', string).group(1)
'Atlantis-GPS-coordinates'
Depends on what speaks to you and your data.
What you have isn't all that bad. I'd write it as:
start = string.find('/bar/') + 5
end = string.find('/', start)
output = string[start:end]
as long as you know that /bar/WHAT-YOU-WANT/ is always going to be present. Otherwise, I would reach for the regular expression knife:
>>> import re
>>> PATTERN = re.compile('^.*/bar/([^/]*)/.*$')
>>> s = '/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/'
>>> match = PATTERN.match(s)
>>> match.group(1)
'Atlantis-GPS-coordinates'
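If /bar/ might be missing, the find() version above silently returns the wrong slice: find() returns -1, and -1 + 5 is still a valid index. Below is a small guarded sketch. It uses between as a hypothetical helper name and len(left) in place of the hard-coded 5:

```python
def between(s, left="/bar/", right="/"):
    """Return the text between the first `left` and the next `right`, or None."""
    start = s.find(left)
    if start == -1:
        return None  # left marker not present
    start += len(left)
    end = s.find(right, start)
    if end == -1:
        return None  # no closing marker after the left one
    return s[start:end]

s = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
print(between(s))
print(between("/foo/no-marker/"))
```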
import re
pattern = '(?<=/bar/).+?/'
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
result = re.search(pattern, string)
print(string[result.start():result.end() - 1])
# "Atlantis-GPS-coordinates"
This works in both Python 2 and Python 3. Here is what the pattern does:
1. (?<=/bar/) is a lookbehind: the rest of the pattern only matches where it is immediately preceded by /bar/.
2. .+?/ matches one or more characters, as few as possible, up to and including the next '/' (which is why the slice stops at result.end() - 1).
Hope that helps.
If you run this kind of search many times, compiling the pattern with re.compile may help performance, but for a one-off search don't bother.
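Here is a sketch of the compiled form, assuming you have several strings to search. The paths list and the BAR_SEGMENT name are invented for illustration. The re module also caches recently used patterns internally, so the gain from compiling may be modest:

```python
import re

# Compile once, then reuse the pattern for every search.
BAR_SEGMENT = re.compile(r'(?<=/bar/)[^/]+(?=/)')

paths = [
    "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/",
    "/x/bar/Second-Example/y/",
    "/no/marker/here/",
]

results = []
for p in paths:
    m = BAR_SEGMENT.search(p)
    # search() returns None when there is no match
    results.append(m.group() if m else None)
print(results)
```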
Using re (likely slower than the string-method solutions):
>>> import re
>>> string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
>>> re.search(r'(?<=/bar/)[^/]+(?=/)', string).group()
'Atlantis-GPS-coordinates'
This question already has answers here:
How to convert string to Title Case in Python?
(10 answers)
Closed 9 years ago.
I'm having trouble writing a function to do this. The goal is to convert strings like:
one to One
hello_world to HelloWorld
foo_bar_baz to FooBarBaz
I know that the proper way to do this is using re.sub, but I'm having trouble creating the right regular expressions to do the job.
You can try something like this (in Python 3, filter() returns an iterator, so join its result back into a string):
>>> s = 'one'
>>> ''.join(filter(str.isalnum, s.title()))
'One'
>>>
>>> s = 'hello_world'
>>> ''.join(filter(str.isalnum, s.title()))
'HelloWorld'
>>>
>>> s = 'foo_bar_baz'
>>> ''.join(filter(str.isalnum, s.title()))
'FooBarBaz'
Relevant documentation:
str.title()
str.isalnum()
filter()
Found solution:
def uppercase(name):
    return ''.join(x for x in name.title() if not x.isspace()).replace('_', '')
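The question asked about re.sub, so here is a sketch of that route too. It uppercases a lowercase letter at the start of the string or after an underscore, and drops the underscore. Unlike str.title(), it leaves the other letters untouched, so it assumes the input is already lowercase snake_case. to_camel is a hypothetical name:

```python
import re

def to_camel(name):
    # (?:^|_) matches the start of the string or an underscore;
    # ([a-z]) captures the lowercase letter that follows it
    return re.sub(r'(?:^|_)([a-z])', lambda m: m.group(1).upper(), name)

for s in ['one', 'hello_world', 'foo_bar_baz']:
    print(to_camel(s))
```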