convert link list into python array - python

So I have a lot of links in this format:
www.web.com
www.web2.com
www.web3.com
....
and I want to turn them into a Python array. So basically into:
"www.web.com", "www.web2.com", "www.web3.com", ....
Is there any way I can use search and replace or any simple program to make that happen? thank you.

Just use str.split as follows:
links_str = """www.web.com
www.web2.com
www.web3.com
...."""
links_list = links_str.split('\n') # \n means line break
print(links_list)
# output: ['www.web.com', 'www.web2.com', 'www.web3.com', '....']
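If the pasted text may contain blank lines or stray whitespace, a slightly more defensive variant of the same idea could look like the sketch below. The sample data and the names links_str and links_list are only illustrative.

```python
# Sample input with a blank line and trailing spaces (made up for illustration)
links_str = """www.web.com
www.web2.com  

www.web3.com
"""

# splitlines() handles \n and \r\n; the filter skips lines that are empty after stripping
links_list = [line.strip() for line in links_str.splitlines() if line.strip()]
print(links_list)
```

For links stored in a file, the same comprehension can be applied to the text returned by reading that file.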

Related

How to extract string from python list

Feels like this should be easy, but I can't find the right keywords to search for the answer.
Given ['"https://container.blob.core.windows.net/"'] as results from a python statement...
...how do I extract only the URL and drop the ['" and "']?
You want the first element of the list without the first and last char
>>> l[0][1:-1]
'https://container.blob.core.windows.net/'
How about using a regex?
In [34]: import re
In [35]: url_list = ['"https://container.blob.core.windows.net/"']
In [36]: url = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', url_list[0])[0]
In [37]: print(url)
https://container.blob.core.windows.net/
Try this:
a = ['"https://container.blob.core.windows.net/"']
result = a[0].replace("\"","")
print(result)
Result:
https://container.blob.core.windows.net/
The value of result is a plain Python string.
How about getting the first element using lst[0] and removing the double quotes from it using replace() or strip()? (Here lst is your list; avoid naming it list, which shadows the built-in.)
print(lst[0].replace('"', ''))
OR
print(lst[0].strip('"'))
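The answers above differ in one detail: replace() removes every quote character, while strip() removes them only from the two ends. A minimal sketch applying strip() to every element, assuming the list is named results (a made-up name):

```python
results = ['"https://container.blob.core.windows.net/"']

# strip('"') removes double-quote characters only from the two ends of each string
urls = [item.strip('"') for item in results]
print(urls[0])
```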

Splitting URLs and getting new ones

I have a huge list of URLs like this:
https://www.example1.com/var1/var2/var3/
https://www.example2.com/var1/var2/var3/var4
https://www.example4.com/var1/
and I want to keep only the first two elements of the path whenever the path section has more than two elements,
like this:
https://www.example1.com/var1/var2/
https://www.example2.com/var1/var2/
https://www.example4.com/var1/
I'm using python and I know that I should use Regex but the code that I have tried is not giving me what I want.
Or use a list comprehension with split, keeping the first five pieces (l is the list of URLs):
print(['/'.join(i.split('/')[:5]) for i in l])
Output:
['https://www.example1.com/var1/var2', 'https://www.example2.com/var1/var2', 'https://www.example4.com/var1/']
You can use str.split("/", 5) with str.join
Ex:
s = ['https://www.example1.com/var1/var2/var3/', 'https://www.example2.com/var1/var2/var3/var4', 'https://www.example4.com/var1/']
for i in s:
    print( "/".join(i.split("/", 5)[:-1]) )
Output:
https://www.example1.com/var1/var2
https://www.example2.com/var1/var2
https://www.example4.com/var1
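Both answers above drop the trailing slash shown in the desired output. A possible alternative, sketched here with the standard urllib.parse module, trims the path itself and puts the slash back. The function name trim_path and the keep parameter are made up for illustration, and the sketch assumes that any query string or fragment can be discarded when a URL is trimmed.

```python
from urllib.parse import urlsplit, urlunsplit

def trim_path(url, keep=2):
    """Keep at most `keep` leading path segments of url (hypothetical helper)."""
    parts = urlsplit(url)
    # Ignore empty pieces produced by leading, trailing, or doubled slashes
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) <= keep:
        return url  # short paths are returned unchanged
    new_path = "/" + "/".join(segments[:keep]) + "/"
    # Assumption: query and fragment are dropped for trimmed URLs
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))

urls = [
    "https://www.example1.com/var1/var2/var3/",
    "https://www.example2.com/var1/var2/var3/var4",
    "https://www.example4.com/var1/",
]
trimmed = [trim_path(u) for u in urls]
for u in trimmed:
    print(u)
```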

Decoding String list in python from a binary file

I need to read a list of strings from a binary file and create a python list.
I'm using the below command to extract data from binary file:
tmp = f.read(100)
abc, = struct.unpack('100c',tmp)
The data that I can see in variable 'abc' is exactly as shown below, but I need to get the below data into a python list as strings.
Data that I need as a list: 'UsrVal' 'VdetHC' 'VcupHC' ..... 'Gravity_Axis'
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
Here is how I would suggest doing it with a one-liner.
You need to decode the binary string, and then you can split on "\x00", which returns the list you are looking for.
e.g.
my_binary_out = b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
decoded_list = my_binary_out.decode("latin1", 'ignore').split('\x00')
#or
decoded_list = my_binary_out.decode("cp1252", 'ignore').split('\x00')
The output will look like this:
['UsrVal', 'VdetHC', 'VcupHC', 'VdirHC', 'HdirHC', 'UpFlwHC', 'UxHC', 'UyHC', 'UzHC', 'VresHC', 'UxRP', 'UyRP', 'UzRP', 'VresRP', 'Gravity_Axis']
Hope this helps
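An alternative sketch splits the bytes on the NUL separator first and decodes each piece afterwards. It assumes the names are plain ASCII and that a fixed-size read may leave trailing NUL padding; the shortened sample value below is made up for illustration.

```python
# Shortened sample with trailing NUL padding (an assumption about the file layout)
raw = b'UsrVal\x00VdetHC\x00VcupHC\x00Gravity_Axis\x00\x00\x00'

# Drop the trailing padding, split on the NUL separator, then decode each name
names = [chunk.decode("ascii") for chunk in raw.rstrip(b"\x00").split(b"\x00")]
print(names)
```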
If you're going for a quick and messy way here, AND assuming your string
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
is in fact interpreted as
" b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis' "
Then the following few lines of code leave the list you want in 'b'.
a = {YourStringHere}
b = a[2:-1].split("\x00")

String comparing in python

I have an array of strings like
urls_parts=['week', 'weeklytop', 'week/day']
And I need to detect whether one of these strings is included in my URL, so this example should be triggered by the weeklytop part only:
url='www.mysite.com/weeklytop/2'
for part in urls_parts:
    if part in url:
        print(part)
But it is of course triggered by 'week' too.
What is the way to do it right?
Oops, let me clarify my question a bit.
I need the code not to trigger when url='www.mysite.com/week/day/2' and part='week'.
With part='week', the only URLs it should trigger on are ones like 'www.mysite.com/week/2' or 'www.mysite.com/week/2-second'.
This is how I would do it.
import re
urls_parts=['week', 'weeklytop', 'week/day']
# Try longer parts first so that 'week/day' wins over 'week'
urls_parts = sorted(urls_parts, key=len, reverse=True)
# re.escape guards against regex metacharacters inside a part
rexes = [re.compile(r'{part}\b'.format(part=re.escape(part))) for part in urls_parts]
urls = ['www.mysite.com/weeklytop/2', 'www.mysite.com/week/day/2', 'www.mysite.com/week/4']
for url in urls:
    for i, rex in enumerate(rexes):
        if rex.search(url):
            print(url)
            print(urls_parts[i])
            print()
            break
OUTPUT (blank separator lines omitted)
www.mysite.com/weeklytop/2
weeklytop
www.mysite.com/week/day/2
week/day
www.mysite.com/week/4
week
The suggestion to sort by length came from @Roman.
Sort your list by length and break from the loop at the first match.
Try something like this:
>>> import re
>>> print(re.findall(r'\bweeklytop\b', 'www.mysite.com/weeklytop/2'))
['weeklytop']
>>> print(re.findall(r'\bweek\b', 'www.mysite.com/weeklytop/2'))
[]
Program:
>>> urls_parts=['week', 'weeklytop', 'week/day']
>>> url='www.mysite.com/weeklytop/2'
>>> for parts in urls_parts:
...     if re.findall(r'\b' + re.escape(parts) + r'\b', url):
...         print(parts)
...
Output:
weeklytop
Why not use urls_parts like this?
['/week/', '/weeklytop/', '/week/day/']
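A slight change in your code would solve this issue (note that a multi-segment part such as 'week/day' can never equal a single piece of the split URL):
>>> for part in urls_parts:
...     if part in url.split('/'): # splitting the url string with '/' as delimiter
...         print(part)
...
weeklytop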
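The ideas from the answers above (wrapping parts in slashes, sorting by length, stopping at the first match) can be combined into one small helper. This is only a sketch: the function name match_part is made up, and it assumes a part should match whole path segments only.

```python
def match_part(url, parts):
    """Return the longest part that matches whole path segments of url, or None."""
    # Surround the URL with slashes so the last segment is also delimited
    padded = "/" + url.strip("/") + "/"
    # Longest parts first, so 'week/day' is preferred over 'week'
    for part in sorted(parts, key=len, reverse=True):
        if "/" + part + "/" in padded:
            return part
    return None

urls_parts = ['week', 'weeklytop', 'week/day']
for url in ['www.mysite.com/weeklytop/2',
            'www.mysite.com/week/day/2',
            'www.mysite.com/week/2-second']:
    print(url, '->', match_part(url, urls_parts))
```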

python parse csv to lists

I have a CSV file whose data I want to parse into lists.
So I am using the Python csv module to read it,
basically as follows:
import csv
fin = csv.reader(open(path,'rb'),delimiter=' ',quotechar='|')
print fin[0]
#gives the following
['"1239","2249.00","1","3","2011-02-20"']
#lets say i do the following
ele = str(fin[0])
ele = ele.strip().split(',')
print ele
#gives me following
['[\'"1239"', '"2249.00"', '"1"', '"3"', '"2011-02-20"\']']
Now ele[0] gives me this output: ['"1239"
How do I get rid of that ['?
In the end, what I want to do is get 1239 and convert it to an integer.
Any clues why this is happening?
Thanks
Edit: Never mind, resolved thanks to the first comment.
Change your delimiter to ',' and you will get a list of those values from the csv reader.
It's because you are converting a list to a string, and there is no need to do this. Grab the first element of the list (in this case it is a string) and parse that:
>>> a = ['"1239","2249.00","1","3","2011-02-20"']
>>> a
['"1239","2249.00","1","3","2011-02-20"']
>>> a[0]
'"1239","2249.00","1","3","2011-02-20"'
>>> b = a[0].replace('"', '').split(',')
>>> b[-1]
'2011-02-20'
Of course, before you call the replace and split string methods you should check that the value is a string, or handle the exception if it isn't.
Also, Blahdiblah is correct: your delimiter is probably wrong.
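A minimal sketch of the delimiter fix in Python 3 follows. It assumes the file is comma-separated with double-quoted fields, and it uses io.StringIO as a stand-in for the real file; the variable names are made up for illustration.

```python
import csv
import io

# Stand-in for the real file, using the row shown in the question
data = io.StringIO('"1239","2249.00","1","3","2011-02-20"\n')

# With delimiter=',' and quotechar='"' the reader splits the fields and removes the quotes
reader = csv.reader(data, delimiter=',', quotechar='"')
rows = list(reader)  # a reader is an iterator, so collect the rows before indexing

first_row = rows[0]
first_value = int(first_row[0])
print(first_row)
print(first_value)
```

With a real file, opening it with open(path, newline='') and passing the file object to csv.reader is the usual pattern in Python 3.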
