Feels like this should be easy, but I can't find the right keywords to search for the answer.
Given ['"https://container.blob.core.windows.net/"'] as the result of a Python statement...
...how do I extract only the URL and drop the ['" and "']?
You want the first element of the list without its first and last characters:
>>> l = ['"https://container.blob.core.windows.net/"']
>>> l[0][1:-1]
'https://container.blob.core.windows.net/'
How about using a regex?
In [34]: import re
In [35]: url_list = ['"https://container.blob.core.windows.net/"']
In [36]: url = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', url_list[0])[0]
In [37]: print(url)
https://container.blob.core.windows.net/
Try:
a = ['"https://container.blob.core.windows.net/"']
result = a[0].replace("\"","")
print(result)
Result (result is now a plain Python string, so print shows it without quotes):
https://container.blob.core.windows.net/
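How about getting the first element with lst[0] and removing the double quotes from it with replace() or strip()? The single quotes you see are only part of how Python prints the list, not part of the string itself.
lst = ['"https://container.blob.core.windows.net/"']
print(lst[0].replace('"', ''))
OR
print(lst[0].strip('"'))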
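If the element is always a complete double-quoted literal, as in the question, another option is to let a parser remove the quotes instead of cutting characters off. A minimal sketch, assuming the element is valid JSON string syntax; the helper name unquote_first is hypothetical:

```python
import json

def unquote_first(items: list) -> str:
    # Assumption: items[0] is a double-quoted, JSON-style string literal such as
    # '"https://..."'. json.loads returns the text between the quotes and would
    # also decode escapes like \" inside it.
    return json.loads(items[0])

results = ['"https://container.blob.core.windows.net/"']
print(unquote_first(results))
```

Unlike slicing, this raises json.JSONDecodeError (a subclass of ValueError) when the element is not a quoted literal, instead of silently dropping two characters.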
I have some data like this:
https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpghttps://www.travel.taipei/streams/scenery_file_audio/c03.mp3
What I want to do is put every URL that contains "jpg" or "png" into a list using Python,
like ["https.....jpg", "https......jpg", "https........png"].
But I have no idea how. Any suggestions?
Try:
s = """https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpghttps://www.travel.taipei/streams/scenery_file_audio/c03.mp3"""
for url in s.split("http"):
    if url.endswith(("jpg", "png")):
        print("http" + url)
Prints:
https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpg
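The loop above prints the URLs; since the question asks for a list, the same split-on-"http" idea can collect them with a list comprehension. A sketch, with an illustrative function name (image_urls) and made-up sample URLs:

```python
def image_urls(s: str) -> list:
    # Split the run-together text on "http"; every piece after the first is a
    # URL missing its "http" prefix. Keep the pieces ending in jpg or png and
    # put the prefix back.
    return ["http" + piece for piece in s.split("http")
            if piece.endswith(("jpg", "png"))]

sample = "https://a.example/1.jpghttps://a.example/2.pnghttps://a.example/3.mp3"
print(image_urls(sample))  # ['https://a.example/1.jpg', 'https://a.example/2.png']
```

This assumes "http" appears only at the start of each URL; a URL containing "http" elsewhere in its path would be split in the wrong place.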
Use replace and split:
strs ="https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpghttps://www.travel.taipei/streams/scenery_file_audio/c03.mp3"
strs = strs.replace("jpg", "jpg ")
strs = strs.replace("png", "png ")
print(strs.split())
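Output (note that the .mp3 URL is still included, because this approach only splits the string and does not filter it):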
['https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpg', 'https://www.travel.taipei/streams/scenery_file_audio/c03.mp3']
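The output above still contains the .mp3 URL. One way to split and filter in one step, without inserting spaces, is to split on a zero-width lookahead placed right before each scheme and then keep only image URLs. A minimal sketch, assuming every URL starts with http:// or https://; the name image_urls_regex and the sample URLs are hypothetical:

```python
import re

def image_urls_regex(blob: str) -> list:
    # Split at every position followed by "http://" or "https://". The match is
    # empty (zero-width), which re.split accepts since Python 3.7, so each piece
    # is exactly one URL. The leading empty piece is dropped by the filter.
    pieces = re.split(r"(?=https?://)", blob)
    return [p for p in pieces if p.endswith((".jpg", ".png"))]

blob = "https://a.example/1.jpghttps://a.example/2.pnghttp://b.example/c03.mp3"
print(image_urls_regex(blob))  # ['https://a.example/1.jpg', 'https://a.example/2.png']
```

Because it splits only where a scheme begins, the bare text "http" inside a path (e.g. /httpdocs/) does not cause a wrong split.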
What's a cute way to do this in python?
Say we have a list of strings:
clean_be
clean_be_al
clean_fish_po
clean_po
and we want the output to be:
be
be_al
fish_po
po
Another approach, which works for all of these cases (including strings like clean_clean):
import re
data = ['clean_be',
        'clean_be_al',
        'clean_fish_po',
        'clean_po', 'clean_a', 'clean_clean', 'clean_clean_1']
for item in data:
    # '^' anchors the pattern, so only a leading 'clean_' is removed
    item = re.sub('^clean_', '', item)
    print(item)
Output:
be
be_al
fish_po
po
a
clean
clean_1
Here is a possible solution that works with any prefix:
lst = ['clean_be', 'clean_be_al', 'clean_fish_po', 'clean_po']
prefix = 'clean_'
result = [s[len(prefix):] if s.startswith(prefix) else s for s in lst]
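On Python 3.9 and later, the built-in str.removeprefix does the same thing as the startswith/slice comprehension above: it removes the prefix once if it is present and otherwise returns the string unchanged. A short sketch (the last two list items are added for illustration):

```python
lst = ['clean_be', 'clean_be_al', 'clean_fish_po', 'clean_po', 'clean_clean', 'other']

# removeprefix strips the exact prefix once, and only if the string starts with it
result = [s.removeprefix('clean_') for s in lst]
print(result)  # ['be', 'be_al', 'fish_po', 'po', 'clean', 'other']
```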
You've provided only minimal information about what you're trying to achieve, but the desired output for the four given inputs can be produced by the following function:
def func(string):
    # Drop everything up to and including the first underscore
    return "_".join(string.split("_")[1:])
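The split/join function above keeps everything after the first underscore. str.partition expresses that directly and, like func, returns an empty string when there is no underscore. A sketch with an illustrative name (after_first_underscore):

```python
def after_first_underscore(string: str) -> str:
    # partition splits at the first "_" into (head, separator, tail);
    # tail is everything after it, or "" if the string has no underscore.
    return string.partition("_")[2]

for s in ['clean_be', 'clean_be_al', 'clean_fish_po', 'clean_po']:
    print(after_first_underscore(s))
```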
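You can do this:
strlist = ['clean_be', 'clean_be_al', 'clean_fish_po', 'clean_po']
def func(myList: list, start: str) -> list:
    ret = []
    for element in myList:
        # Remove the exact prefix. element.lstrip(start) would instead strip
        # any leading characters found in start, e.g. 'clean_a' -> ''.
        if element.startswith(start):
            element = element[len(start):]
        ret.append(element)
    return ret
print(func(strlist, 'clean_'))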
I hope it was useful. Nohab
There are many ways to do this, based on what you have provided.
Apart from the answers above, you can also do it this way:
string = 'clean_be_al'
string = string.replace('clean_','',1)
This removes the first occurrence of clean_ in the string, even if that occurrence is not at the start.
Also, if the string is guaranteed to start with 'clean_' (6 characters), you can simply slice it:
string = 'clean_be_al'
print(string[6:])
You can use lstrip to strip leading characters and rstrip to strip trailing ones:
line = "clean_be"
print(line.lstrip("clean_"))
Drawback:
lstrip([chars])
The [chars] argument is not a prefix; rather, all combinations of its values are stripped.
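A concrete case where this character-set behaviour gives a surprising result, compared with str.removeprefix (Python 3.9+); the input strings are made up for illustration:

```python
s = "clean_able"

# lstrip treats "clean_" as the set {c, l, e, a, n, _} and keeps stripping
# while the leading character is in that set, so the "a" of "able" goes too.
print(s.lstrip("clean_"))        # ble

# removeprefix removes the exact prefix string, once.
print(s.removeprefix("clean_"))  # able

# Every character of "clean_clean" is in the set, so lstrip empties it.
print(repr("clean_clean".lstrip("clean_")))  # ''
```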
So I have a lot of links in this format:
www.web.com
www.web2.com
www.web3.com
....
and I want to turn them into a Python list, so basically into:
"www.web.com", "www.web2.com", "www.web3.com", ....
Is there any way I can use search and replace, or a simple program, to make that happen? Thank you.
Just use str.split, as follows:
links_str = """www.web.com
www.web2.com
www.web3.com
...."""
links_list = links_str.split('\n') # \n means line break
print(links_list)
# output: ['www.web.com', 'www.web2.com', 'www.web3.com', '....']
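If the links come from a file, or were pasted with Windows line endings or a trailing newline, split('\n') can leave '\r' characters or empty strings in the list. A slightly more defensive sketch using str.splitlines; the name parse_links and the sample text are hypothetical:

```python
def parse_links(text: str) -> list:
    # splitlines() breaks on "\n", "\r\n" and other line boundaries;
    # strip() trims surrounding whitespace, and blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]

links_str = "www.web.com\r\nwww.web2.com\n\nwww.web3.com\n"
print(parse_links(links_str))  # ['www.web.com', 'www.web2.com', 'www.web3.com']
```

To read from a file instead, something like `with open("links.txt") as f: links = parse_links(f.read())` would work, where links.txt is a placeholder name.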
I have a list of strings like
urls_parts=['week', 'weeklytop', 'week/day']
I need to detect which of these strings occur in my URL; in this example, only the weeklytop part should trigger:
url = 'www.mysite.com/weeklytop/2'
for part in urls_parts:
    if part in url:
        print(part)
But it is of course triggered by 'week' too.
What is the right way to do it?
Oops, let me clarify my question a bit.
The code must not trigger when url='www.mysite.com/week/day/2' and part='week'.
For part='week', it should trigger only on URLs such as 'www.mysite.com/week/2' or 'www.mysite.com/week/2-second'.
This is how I would do it.
import re
urls_parts = ['week', 'weeklytop', 'week/day']
# Check longer (more specific) parts first
urls_parts = sorted(urls_parts, key=len, reverse=True)
rexes = [re.compile(r'{part}\b'.format(part=re.escape(part))) for part in urls_parts]
urls = ['www.mysite.com/weeklytop/2', 'www.mysite.com/week/day/2', 'www.mysite.com/week/4']
for url in urls:
    for i, rex in enumerate(rexes):
        if rex.search(url):
            print(url)
            print(urls_parts[i])
            print()
            break  # stop at the first (longest) matching part
OUTPUT
www.mysite.com/weeklytop/2
weeklytop
www.mysite.com/week/day/2
week/day
www.mysite.com/week/4
week
The suggestion to sort by length came from @Roman.
Sort your list by length and break out of the loop at the first match.
Try something like this:
>>> import re
>>> print(re.findall(r'\bweeklytop\b', 'www.mysite.com/weeklytop/2'))
['weeklytop']
>>> print(re.findall(r'\bweek\b', 'www.mysite.com/weeklytop/2'))
[]
Program:
>>> urls_parts = ['week', 'weeklytop', 'week/day']
>>> url = 'www.mysite.com/weeklytop/2'
>>> for part in urls_parts:
...     if re.findall(r'\b' + re.escape(part) + r'\b', url):
...         print(part)
...
Output:
weeklytop
Why not use urls_parts like this?
['/week/', '/weeklytop/', '/week/day/']
A slight change to your code solves this issue:
>>> for part in urls_parts:
...     if part in url.split('/'):  # split the URL string with '/' as the delimiter
...         print(part)
...
weeklytop
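Splitting on '/' fixes the weeklytop case, but a single segment can never equal a multi-segment part like 'week/day'. To meet the clarified requirement ('week' should fire for www.mysite.com/week/2 but not for www.mysite.com/week/day/2), one option is to compare the leading path segments and try longer parts first. A minimal sketch, assuming the URL has no scheme (like the examples in the question) and that a part must match from the start of the path; match_part is a hypothetical name:

```python
from typing import List, Optional

def match_part(url: str, parts: List[str]) -> Optional[str]:
    # Assumes a scheme-less URL like 'www.mysite.com/week/2': drop the host and
    # compare whole path segments, so 'week' never matches inside 'weeklytop'.
    segments = url.split('/')[1:]
    # Longest parts first, so 'week/day' is preferred over 'week'.
    for part in sorted(parts, key=len, reverse=True):
        part_segments = part.split('/')
        if segments[:len(part_segments)] == part_segments:
            return part
    return None

urls_parts = ['week', 'weeklytop', 'week/day']
for url in ['www.mysite.com/weeklytop/2', 'www.mysite.com/week/day/2',
            'www.mysite.com/week/2-second', 'www.mysite.com/month/2']:
    print(url, '->', match_part(url, urls_parts))
```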
I have a CSV file whose data I want to parse into lists.
I am using the Python csv module to read it,
basically as follows:
import csv
with open(path, newline='') as f:
    fin = csv.reader(f, delimiter=' ', quotechar='|')
    row = next(fin)  # the first row (a reader cannot be indexed like fin[0])
print(row)
# gives the following
['"1239","2249.00","1","3","2011-02-20"']
# let's say I do the following
ele = str(row)
ele = ele.strip().split(',')
print(ele)
# gives me the following
['[\'"1239"', '"2249.00"', '"1"', '"3"', '"2011-02-20"\']']
Now ele[0] gives me: ['"1239"
How do I get rid of that ['?
In the end, what I want is to get 1239 and convert it to an integer.
Any clue why this is happening?
Thanks
Edit: Never mind, resolved thanks to the first comment.
Change your delimiter to ',' and you will get a list of those values from the csv reader.
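With the ',' delimiter and the default '"' quote character, the reader both splits the fields and removes the quotes, so the first field converts directly to an integer. A sketch that uses io.StringIO as a stand-in for the file (with a real file, open it with newline='' as the csv documentation recommends):

```python
import csv
import io

# Stand-in for the CSV file: one line in the format shown in the question.
data = io.StringIO('"1239","2249.00","1","3","2011-02-20"\n')

reader = csv.reader(data, delimiter=',')  # ',' and '"' are the defaults
row = next(reader)
print(row)          # ['1239', '2249.00', '1', '3', '2011-02-20']
print(int(row[0]))  # 1239
```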
It's because you are converting a list to a string; there is no need to do this. Grab the first element of the list (in this case, a string) and parse that:
>>> a = ['"1239","2249.00","1","3","2011-02-20"']
>>> a
['"1239","2249.00","1","3","2011-02-20"']
>>> a[0]
'"1239","2249.00","1","3","2011-02-20"'
>>> b = a[0].replace('"', '').split(',')
>>> b[-1]
'2011-02-20'
Of course, before calling the replace and split string methods, you should check that the value is a string, or handle the exception if it isn't.
Also, Blahdiblah is correct: your delimiter is probably wrong.