Splitting URLs and getting new ones - Python

I have a huge list of URLs like this:
https://www.example1.com/var1/var2/var3/
https://www.example2.com/var1/var2/var3/var4
https://www.example4.com/var1/
and I want to keep only the first two elements of the path when the path has more than two elements,
like this:
https://www.example1.com/var1/var2/
https://www.example2.com/var1/var2/
https://www.example4.com/var1/
I'm using Python and I think I should use a regex, but the code I have tried is not giving me what I want.

Or use a list comprehension that splits on '/' and keeps the first five pieces (scheme, empty string, host, and two path elements):
# l is the list of URLs from the question
print(['/'.join(i.split('/')[:5]) for i in l])
Output:
['https://www.example1.com/var1/var2', 'https://www.example2.com/var1/var2', 'https://www.example4.com/var1/']

You can use str.split("/", 5) with str.join.
Example:
s = ['https://www.example1.com/var1/var2/var3/', 'https://www.example2.com/var1/var2/var3/var4', 'https://www.example4.com/var1/']
for i in s:
    # split at most 5 times, then drop the last piece (the remainder of the path)
    print("/".join(i.split("/", 5)[:-1]))
Output:
https://www.example1.com/var1/var2
https://www.example2.com/var1/var2
https://www.example4.com/var1
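Neither answer above keeps the trailing slash shown in the question's expected output. As a minimal sketch of an alternative, the standard library's urllib.parse can split the URL so that only the path is touched. The helper name trim_path and its keep parameter are hypothetical, and this version assumes that any query string or fragment should be dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def trim_path(url, keep=2):
    """Return url with at most `keep` path elements and a trailing slash."""
    parts = urlsplit(url)
    # Ignore empty pieces produced by leading, trailing, or doubled slashes.
    segments = [seg for seg in parts.path.split("/") if seg]
    new_path = "/" + "/".join(segments[:keep]) + "/" if segments else "/"
    # Assumption: query and fragment are discarded.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


urls = [
    "https://www.example1.com/var1/var2/var3/",
    "https://www.example2.com/var1/var2/var3/var4",
    "https://www.example4.com/var1/",
]
trimmed = [trim_path(u) for u in urls]
print(trimmed)
```

Unlike the split-based versions, this does not depend on counting the slashes in "https://", so it also handles a URL whose path has exactly two elements and no trailing slash.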

Related

List of strings to create a new list of strings without quotes?

Data
crop_list = ['Cotton','Ragi', 'Groundnut', 'Sugarcane', 'Redgram', 'Sunflower', 'Paddy', 'Maize','Jowar']
Now each element becomes a DataFrame:
for a in crop_list:
    vars()[a] = Data[Data['Crop'] == a]
For the next lines of code I need to create a list manually, i.e. dfs:
from functools import reduce
dfs =[Cotton,Ragi,Groundnut,Sugarcane,Redgram,Sunflower,Paddy,Maize,Jowar]
df_merged = reduce(lambda a,b: pd.merge(a,b, on='Year'), dfs)
So I'm asking: is there any way to build this list dynamically?
Expected output:
Another list with the same names, without quotes:
new_crop_list = [Cotton,Ragi, Groundnut, Sugarcane, Redgram, Sunflower, Paddy,Maize,Jowar]
I think this is basically what you meant:
crop_list = ["'Cotton'","'Ragi'", "'Groundnut'", "'Sugarcane'", "'Redgram'", "'Sunflower'", "'Paddy'", "'Maize'","'Jowar'"]
A string without quotes is not a string, so if this is what you have, you can remove the single quotes from the list elements using the following code:
new_list = [ x.replace("'","") for x in crop_list]
The above code removes the single quotes from around the values in the list.
The output will look like:
['Cotton', 'Ragi', 'Groundnut', 'Sugarcane', 'Redgram', 'Sunflower', 'Paddy', 'Maize', 'Jowar']
You will still see single quotes in the output, since it is a list of strings and the quotes denote that.
Hope this answers your question.
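What the question seems to need is not strings without quotes but the objects themselves, collected without typing each name. A minimal sketch of that idea keeps one dictionary entry per crop instead of creating variables through vars(). Plain lists of dicts stand in for the DataFrames here so that the snippet runs without pandas; the rows and their Yield values are made up for illustration, and by_crop is a hypothetical name.

```python
# Stand-in rows (hypothetical data); in the question this is the DataFrame `Data`.
rows = [
    {"Crop": "Cotton", "Year": 2019, "Yield": 10},
    {"Crop": "Ragi", "Year": 2019, "Yield": 7},
    {"Crop": "Cotton", "Year": 2020, "Yield": 12},
]
crop_list = ["Cotton", "Ragi"]

# One dictionary entry per crop name, instead of one variable per crop.
# With pandas this would read: {a: Data[Data['Crop'] == a] for a in crop_list}
by_crop = {a: [r for r in rows if r["Crop"] == a] for a in crop_list}

# The "dynamic list": the per-crop objects, in crop_list order.
dfs = [by_crop[a] for a in crop_list]
print(dfs)
```

With real DataFrames, dfs built this way could be passed to the reduce call from the question in place of the hand-written list.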

Split List Elements in byte format to separate bytes in python

I have a list with byte elements like this:
list = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
Now I want to separate all bytes like following:
list_new =[b'\x00', b'\xcc', b'\x14I', b'\x8dy_', b'\xeb', b'\xbc1C']
I am assuming here that you want to split the data at every '\x', since this matches your desired output. Let me know otherwise. I am also not sure why you have this type of data; it is a little awkward to work with, and more context on the question would help. Nevertheless, I got your desired output in the following way (maybe not efficient, but it gets the job done):
import re
from codecs import encode
lists = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
split = [re.split(r'(?=\\x)', str(item)) for item in lists] ## split the repr of each item before every \x, using a lookahead
output = [] ## container to save the final items
for item in split: ## split is a list of lists, hence two for loops
    for nitem in item:
        if nitem != "b'": ## skip the leading "b'" of each repr
            output.append(nitem.replace('\\n','').replace("'",'').encode()) ## drop the \n and the closing quote, then append
## Note that each item in output still contains a literal backslash escape; to convert it back to the real byte
## we use the encode function from the codecs module, like below
result = [encode(itm.decode('unicode_escape'), 'raw_unicode_escape') for itm in output] ## final output
print(result)
Output:
[b'\x00', b'\xcc', b'\x14I', b'\x8dy_', b'\xeb', b'\xbc1C']
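The same result can probably be reached without converting to str and back, by running a bytes regular expression directly on the data. This is only a sketch under one assumption about the intended rule: a new chunk starts at every byte outside printable ASCII, and newline bytes are dropped, which is what the desired output suggests. Printable bytes that come before the first such byte in an item would be lost.

```python
import re

data = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']

# One chunk = a single byte outside printable ASCII (newline excluded),
# followed by any run of printable ASCII bytes.
chunk = re.compile(rb'[^\x20-\x7e\n][\x20-\x7e]*')

# findall returns the chunks of each item; flatten them into one list.
list_new = [piece for item in data for piece in chunk.findall(item)]
print(list_new)
```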

convert link list into python array

So I have a lot of links in this format:
www.web.com
www.web2.com
www.web3.com
....
and I want to turn them into a Python array (a list). So basically into:
"www.web.com", "www.web2.com", "www.web3.com", ....
Is there any way I can use search and replace or any simple program to make that happen? thank you.
Just use str.split as follows:
links_str = """www.web.com
www.web2.com
www.web3.com
...."""
links_list = links_str.split('\n') # \n means line break
print(links_list)
# output: ['www.web.com', 'www.web2.com', 'www.web3.com', '....']
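If the pasted text may contain blank lines or stray spaces, a slightly more forgiving variant is sketched below. It uses str.splitlines, which also copes with Windows line endings, and skips lines that are empty after stripping. The blank line and the indentation in the sample text are added here only to show the effect.

```python
links_str = """www.web.com
www.web2.com

  www.web3.com
"""

# Strip surrounding whitespace and keep only non-empty lines.
links_list = [line.strip() for line in links_str.splitlines() if line.strip()]
print(links_list)
```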

How to extract string from python list

Feels like this should be easy, but I can't find the right keywords to search for the answer.
Given ['"https://container.blob.core.windows.net/"'] as results from a python statement...
...how do I extract only the URL and drop the ['" and "']?
You want the first element of the list without its first and last characters:
>>> l[0][1:-1]
'https://container.blob.core.windows.net/'
How about using a regex?
In [34]: import re
In [35]: url_list = ['"https://container.blob.core.windows.net/"']
In [36]: url = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', url_list[0])[0]
In [37]: print(url)
https://container.blob.core.windows.net/
Try:
a = ['"https://container.blob.core.windows.net/"']
result = a[0].replace("\"","")
print(result)
Result:
https://container.blob.core.windows.net/
result is now a plain Python string without the embedded double quotes.
How about getting the first element using url_list[0] and removing the double quotes from it using replace() or strip()?
print(url_list[0].replace('"', ""))
OR
print(url_list[0].strip('"'))
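Another option, offered only as a sketch: the element happens to be a valid JSON string literal (text wrapped in double quotes), so json.loads can decode it. This assumes the value really is JSON-encoded; if it is just text that happens to carry quotes, the strip approach above is the safer choice.

```python
import json

url_list = ['"https://container.blob.core.windows.net/"']

# Decode the JSON string literal; this removes the surrounding double quotes.
url = json.loads(url_list[0])
print(url)
```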

How to split parts of a string in a list based on predefined part of in the string

Please help me with the question below.
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net',
'Loki.media.googlesuite.net','Captain.googlekey.net']
I want everything preceding 'googlesuite.net', 'googlekey.net', 'googlestream.net' and 'googled.net' in one list, and the corresponding suffixes in another list:
result_list1=['Ironman.mdc', 'Hulk.nba.abc', 'Thor.web.gg.hh', 'Antman',
'Loki.media', 'Captain']
result_list2=['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
You can split each string in the list on '.' to get a new list. If you are only interested in the first split, pass the second argument of split (maxsplit, the maximum number of splits to perform):
first_list = [x.split('.', 1)[0] for x in sample_list]
For the second list:
second_list = [x.split('.', 1)[1] for x in sample_list]
A better way is to iterate through sample_list only once and get both lists (as tuples), as shown below:
first_list, second_list = zip(*[x.split('.', 1) for x in sample_list])
Using a list comprehension along with split:
sample_list = ['Ironman.googlesuite.net', 'Hulk.googlekey.net',
'Thor.googlestream.net', 'Antman.googled.net', 'Loki.googlesuite.net',
'Captain.googlekey.net']
result_list1 = [i.split('.')[0] for i in sample_list]
print(result_list1)
This prints:
['Ironman', 'Hulk', 'Thor', 'Antman', 'Loki', 'Captain']
The strategy is to retain, for each input domain, just the component up to, but not including, the first dot separator. For the second list, we can use re.sub:
import re
result_list2 = [re.sub(r'^[^.]+\.', '', i) for i in sample_list]
print(result_list2)
This prints:
['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
Thank you for the answers, they do help, but what if I have a list like this:
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net', 'Loki.media.googlesuite.net','Captain.googlekey.net']
I want everything preceding 'googlesuite.net', 'googlekey.net', 'googlestream.net' and 'googled.net' in one list, and the corresponding suffixes in another list:
result_list1=['Ironman.mdc', 'Hulk.nba.abc', 'Thor.web.gg.hh', 'Antman', 'Loki.media', 'Captain']
result_list2=['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
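For the list in this follow-up, splitting at the first dot no longer works because the prefix itself contains dots. A minimal sketch, assuming the set of suffixes is known in advance as the question states, checks each name against those suffixes with str.endswith. The helper name split_on_suffix is hypothetical.

```python
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
               'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net',
               'Loki.media.googlesuite.net', 'Captain.googlekey.net']

# Assumption: these are the only suffixes that can occur.
suffixes = ('googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net')


def split_on_suffix(name, suffixes):
    """Return (prefix, suffix); suffix is '' if no known suffix matches."""
    for suffix in suffixes:
        if name.endswith('.' + suffix):
            # Cut off the suffix together with the dot in front of it.
            return name[:-(len(suffix) + 1)], suffix
    return name, ''


pairs = [split_on_suffix(name, suffixes) for name in sample_list]
result_list1 = [prefix for prefix, _ in pairs]
result_list2 = [suffix for _, suffix in pairs]
print(result_list1)
print(result_list2)
```

If the suffix is always exactly the last two dot-separated labels, name.rsplit('.', 2) would give the same split without needing the list of suffixes.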
