Splitting URLs and getting new ones - Python


I have a huge list of URLs like this:
https://www.example1.com/var1/var2/var3/
https://www.example2.com/var1/var2/var3/var4
https://www.example4.com/var1/
I want to keep only the first two elements of the path whenever the path has more than two elements, like this:
https://www.example1.com/var1/var2/
https://www.example2.com/var1/var2/
https://www.example4.com/var1/
I'm using Python and I know that I should use a regex, but the code that I have tried is not giving me what I want.

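You can use a list comprehension with split and keep the first five pieces (here l is the list of URLs). The five pieces are 'https:', an empty string, the host, and the first two path elements, so the trailing slash of a truncated URL is dropped:
print(['/'.join(i.split('/')[:5]) for i in l])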
Output:
['https://www.example1.com/var1/var2', 'https://www.example2.com/var1/var2', 'https://www.example4.com/var1/']

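You can use str.split("/", 5) with str.join. Note that [:-1] always discards the last piece, so a URL that has no trailing slash and at most two path elements would lose its final element.
Example:
s = ['https://www.example1.com/var1/var2/var3/', 'https://www.example2.com/var1/var2/var3/var4', 'https://www.example4.com/var1/']
for i in s:
    print("/".join(i.split("/", 5)[:-1]))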
Output:
https://www.example1.com/var1/var2
https://www.example2.com/var1/var2
https://www.example4.com/var1
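Neither answer above reproduces the trailing slash shown in the desired output. A minimal sketch that does, assuming the standard library's urllib.parse is acceptable instead of a regex; trim_path and its keep parameter are hypothetical names, and this sketch discards any query string or fragment:

```python
from urllib.parse import urlsplit, urlunsplit


def trim_path(url, keep=2):
    """Return url with at most `keep` leading path segments (hypothetical helper)."""
    parts = urlsplit(url)
    # Drop empty pieces produced by leading, trailing, or doubled slashes.
    segments = [seg for seg in parts.path.split("/") if seg]
    new_path = "/" + "/".join(segments[:keep])
    if segments:
        new_path += "/"  # always end a non-empty path with a slash
    # Query string and fragment are discarded (an assumption of this sketch).
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


urls = [
    "https://www.example1.com/var1/var2/var3/",
    "https://www.example2.com/var1/var2/var3/var4",
    "https://www.example4.com/var1/",
]
trimmed = [trim_path(u) for u in urls]
print(trimmed)
```

Working on the parsed path rather than on the whole string means the scheme and host never count toward the number of pieces kept.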

Related

List of strings to create a new list of strings without quotes?

Data
crop_list = ['Cotton','Ragi', 'Groundnut', 'Sugarcane', 'Redgram', 'Sunflower', 'Paddy', 'Maize','Jowar']
Now each element becomes a DataFrame:
for a in crop_list:
    vars()[a] = Data[Data['Crop'] == a]
For the next lines of code I have to create a list, dfs, manually:
from functools import reduce
dfs = [Cotton, Ragi, Groundnut, Sugarcane, Redgram, Sunflower, Paddy, Maize, Jowar]
df_merged = reduce(lambda a, b: pd.merge(a, b, on='Year'), dfs)
So I'm asking: is there any way to build this list dynamically?
Expected output:
Another list with the same strings without quotes:
new_crop_list = [Cotton, Ragi, Groundnut, Sugarcane, Redgram, Sunflower, Paddy, Maize, Jowar]

I think this is basically what you meant:
crop_list = ["'Cotton'","'Ragi'", "'Groundnut'", "'Sugarcane'", "'Redgram'", "'Sunflower'", "'Paddy'", "'Maize'","'Jowar'"]
Without quotes, a string is no longer a string. If this is your case, you can remove the single quotes from the list elements with the following code:
new_list = [x.replace("'", "") for x in crop_list]
The above code removes the single quotes from around the values in the list.
The output will look like
['Cotton', 'Ragi', 'Groundnut', 'Sugarcane', 'Redgram', 'Sunflower', 'Paddy', 'Maize', 'Jowar']
You will still see single quotes in the output, since it is a list of strings and the quotes denote that.
Hope this answers your question.
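If the goal is the list of DataFrames itself rather than unquoted strings, one option is to keep the per-crop frames in a dict keyed by crop name and build dfs from crop_list. This is only a sketch: the sample Data frame, its Yield column, and the frames dict are assumptions, since the real data is not shown in the question. Renaming the value column per crop also avoids column-name clashes during the repeated merge.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data; the real `Data` frame is not shown in the question.
Data = pd.DataFrame({
    "Year": [2001, 2002, 2001, 2002, 2001, 2002],
    "Crop": ["Cotton", "Cotton", "Ragi", "Ragi", "Maize", "Maize"],
    "Yield": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
})
crop_list = ["Cotton", "Ragi", "Maize"]

# One DataFrame per crop, stored in a dict instead of being injected via vars().
frames = {
    name: Data.loc[Data["Crop"] == name, ["Year", "Yield"]]
              .rename(columns={"Yield": name})
    for name in crop_list
}

# The "dynamic list": look each crop name up in the dict.
dfs = [frames[name] for name in crop_list]
df_merged = reduce(lambda a, b: pd.merge(a, b, on="Year"), dfs)
print(df_merged)
```

With the dict in place, adding a crop to crop_list is enough; no variable names have to be typed by hand.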

Split List Elements in byte format to separate bytes in python

I have a list with byte elements like this:
list = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
Now I want to separate all bytes like following:
list_new =[b'\x00', b'\xcc', b'\x14I', b'\x8dy_', b'\xeb', b'\xbc1C']

I am assuming that you want to split the data at each '\x' escape, since that matches your desired output; let me know otherwise. I am also not sure why you have data in this form, as it is a little awkward to work with, and more context would help. Nevertheless, I got your desired output in the following way (maybe not efficient, but it gets the job done):
import re
from codecs import encode
lists = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
split = [re.split(r'(?=\\x)', str(item)) for item in lists] ## splitting with assumption of \x using lookarounds here
output = [] ## container to save the final item
for item in split: ## split is a list of lists, hence two for loops
    for nitem in item:
        if nitem != "b'": ## skip pieces that contain only "b'"
            output.append(nitem.replace('\\n','').replace("'",'').encode()) ## finally append every item
## Note that each item in output still contains a literal backslash (shown doubled in its repr);
## to turn it back into a real escape we use the encode function from the codecs module, like below
[encode(itm.decode('unicode_escape'), 'raw_unicode_escape') for itm in output] ## Final output
Output:
[b'\x00', b'\xcc', b'\x14I', b'\x8dy_', b'\xeb', b'\xbc1C']
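The same result can probably be reached without going through str() at all, by running a bytes regex directly on the data. The sketch below assumes that a new chunk starts at every byte Python displays as a \x escape (anything outside printable ASCII other than tab, newline, and carriage return), that those three whitespace bytes are simply dropped, and that no printable bytes come before the first escape; chunks is a hypothetical name.

```python
import re

data = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']

# One chunk = a byte outside printable ASCII (excluding \t, \n, \r),
# followed by any run of printable ASCII bytes.
chunk_pattern = re.compile(rb'[^\x20-\x7e\t\n\r][\x20-\x7e]*')

chunks = []
for item in data:
    # findall returns every non-overlapping chunk in order; the trailing
    # newline matches neither part of the pattern and is skipped.
    chunks.extend(chunk_pattern.findall(item))

print(chunks)
```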

convert link list into python array

So I have a lot of links in this format:
www.web.com
www.web2.com
www.web3.com
....
and I want to turn them into a Python array. So basically into:
"www.web.com", "www.web2.com", "www.web3.com", ....
Is there any way I can use search and replace or any simple program to make that happen? Thank you.

Just use str.split as follows:
links_str = """www.web.com
www.web2.com
www.web3.com
...."""
links_list = links_str.split('\n') # \n means line break
print(links_list)
# output: ['www.web.com', 'www.web2.com', 'www.web3.com', '....']
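If the pasted text may contain blank lines or stray spaces, a slightly more defensive variant uses str.splitlines() and skips empty entries. This is a sketch; the sample text with its blank line is made up for illustration.

```python
links_str = """www.web.com
www.web2.com

  www.web3.com
"""

# splitlines() handles \n and \r\n alike; strip() removes surrounding
# whitespace, and the condition drops lines that are empty after stripping.
links_list = [line.strip() for line in links_str.splitlines() if line.strip()]
print(links_list)
```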

How to extract string from python list

Feels like this should be easy, but I can't find the right keywords to search for the answer.
Given ['"https://container.blob.core.windows.net/"'] as results from a python statement...
...how do I extract only the URL and drop the ['" and "']?

You want the first element of the list without its first and last characters (here l is the name of the list):
>>> l[0][1:-1]
'https://container.blob.core.windows.net/'

How about using a regex?
In [34]: import re
In [35]: url_list = ['"https://container.blob.core.windows.net/"']
In [36]: url = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', url_list[0])[0]
In [37]: print(url)
https://container.blob.core.windows.net/

Try:
a = ['"https://container.blob.core.windows.net/"']
result = a[0].replace("\"","")
print(result)
Result:
https://container.blob.core.windows.net/
The result is a plain Python string without the surrounding double quotes.

How about getting the first element using list[0] and removing the double quotes from it using replace() or strip()? (Here list is the name of your list variable.)
print(list[0].replace('"', ''))
OR
print(list[0].strip('"'))
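A string wrapped in double quotes like this often comes from JSON output. If that is the case here (an assumption, since the statement that produced the list is not shown), the element can be decoded with the standard json module instead of being trimmed by hand; results is a hypothetical variable name.

```python
import json

results = ['"https://container.blob.core.windows.net/"']

# json.loads parses the JSON string literal and returns its contents,
# handling any escaped characters inside it as well.
url = json.loads(results[0])
print(url)
```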

How to split parts of a string in a list based on predefined part of in the string

Please help me with the question below.
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net',
'Loki.media.googlesuite.net','Captain.googlekey.net']
I want everything preceding 'googlesuite.net', 'googlekey.net', 'googlestream.net' and 'googled.net' in one list, and the corresponding suffixes in another list, as:
result_list1=['Ironman.mdc', 'Hulk.nba.abc', 'Thor.web.gg.hh', 'Antman',
'Loki.media', 'Captain']
result_list2=['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']

You can always split each string in the list on '.' and build a new list. If you are only interested in the first split, use the second argument of the split method (maxsplit, which limits the number of splits):
first_list =[x.split('.')[0] for x in sample_list]
For the second list:
second_list =[x.split('.',1)[1] for x in sample_list]
A better way is to iterate over sample_list only once and get both lists, as shown below (zip yields tuples, so wrap them in list() if you need lists):
first_list, second_list = zip(* [x.split('.',1) for x in sample_list])

Using a list comprehension along with split:
sample_list = ['Ironman.googlesuite.net', 'Hulk.googlekey.net',
'Thor.googlestream.net', 'Antman.googled.net', 'Loki.googlesuite.net',
'Captain.googlekey.net']
result_list1 = [i.split('.')[0] for i in sample_list]
print(result_list1)
This prints:
['Ironman', 'Hulk', 'Thor', 'Antman', 'Loki', 'Captain']
This strategy retains, for each input domain, just the component up to, but not including, the first dot separator. For the second list, we can use re.sub:
import re
result_list2 = [re.sub(r'^[^.]+\.', '', i) for i in sample_list]
print(result_list2)
This prints:
['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']

Thank you for the answers, they do help, but what if I have a list like this:
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net', 'Loki.media.googlesuite.net','Captain.googlekey.net']
I want everything preceding 'googlesuite.net', 'googlekey.net', 'googlestream.net' and 'googled.net' in one list, and the corresponding suffixes in another list, as:
result_list1=['Ironman.mdc', 'Hulk.nba.abc', 'Thor.web.gg.hh', 'Antman', 'Loki.media', 'Captain']
result_list2=['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
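For this longer form, splitting from the right may be enough. The sketch below assumes the suffix is always the last two dot-separated labels (as in all the sample values) and that every entry has at least three labels; str.rsplit with maxsplit=2 then separates the prefix from those two labels.

```python
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
               'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net',
               'Loki.media.googlesuite.net', 'Captain.googlekey.net']

result_list1 = []
result_list2 = []
for host in sample_list:
    # Split at most twice, starting from the right:
    # 'Hulk.nba.abc.googlekey.net' -> ['Hulk.nba.abc', 'googlekey', 'net']
    prefix, domain, tld = host.rsplit('.', 2)
    result_list1.append(prefix)
    result_list2.append(domain + '.' + tld)

print(result_list1)
print(result_list2)
```

If the suffixes can have a different number of labels, matching against an explicit list of known suffixes with str.endswith would be the safer choice.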
