python parsing a string - python

I have a list with strings.
list_of_strings
They look like that:
'/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
I want to part this string into:
/folder1/folder2/folder3/folder4/folder5/exp-* and put this into a new list.
I thought to do something like that, but I am lacking the right snippet to do what I want:
list_of_stringparts = []
for string in sorted(list_of_strings):
    part = string.split('/')[7]  # or whatever returns the first part of my string
    list_of_stringparts.append(part)
has anyone an idea? Do I need a regex?

You are using subscripting, which extracts a single element (the eighth, at index 7). To get the first seven elements, you need slicing, [N:M:S], like this:
>>> l = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> l.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
In our case N is omitted (it defaults to 0) and S, the step, defaults to 1, so you get elements 0-6 (seven elements) from the result of split.
To construct your string back, use join():
>>> '/'.join(l.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
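Applied to the whole list from the question, the split, slice and join steps could be wrapped up as in the sketch below. The helper name path_prefix and its n_components parameter are made up for illustration, and the sketch assumes every path is absolute (starts with '/').

```python
def path_prefix(path, n_components):
    """Return the first n_components slash-separated components of an absolute path."""
    # An absolute path starts with '/', so split() yields a leading '' element;
    # keep n_components + 1 pieces to account for it.
    return '/'.join(path.split('/')[:n_components + 1])


list_of_strings = [
    '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file',
]
list_of_stringparts = [path_prefix(s, 6) for s in sorted(list_of_strings)]
print(list_of_stringparts)
# ['/folder1/folder2/folder3/folder4/folder5/exp-*']
```

This only works when the part you want always sits at the same depth; if the depth varies, cutting at the first * (as other answers do) is probably the better fit.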

I would do it like this:
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> s.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
>>> '/'.join(s.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
Or, using re.match:
>>> import re
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> re.match(r'.*?\*', s).group()
'/folder1/folder2/folder3/folder4/folder5/exp-*'

Your example suggests that you want to partition the strings at the first * character. This can be done with str.partition():
list_of_stringparts = []
list_of_strings = ['/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder/blah/pow']
for s in sorted(list_of_strings):
    head, sep, tail = s.partition('*')
    list_of_stringparts.append(head + sep)
>>> list_of_stringparts
['/folder/blah/pow', '/folder1/exp-*', '/folder1/folder2/folder3/folder4/folder5/exp-*']
Or this equivalent list comprehension:
list_of_stringparts = [''.join(s.partition('*')[:2]) for s in sorted(list_of_strings)]
This will retain any string that does not contain a * - not sure from your question if that is desired.
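If strings without a * should be dropped instead, one possible variant is to test the separator that partition() returns. This is only a sketch; the name only_matching is made up for illustration.

```python
list_of_strings = [
    '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file',
    '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file',
    '/folder/blah/pow',
]

# partition() returns an empty separator when '*' is absent,
# so "if sep" filters those strings out.
parts = (s.partition('*') for s in sorted(list_of_strings))
only_matching = [head + sep for head, sep, _tail in parts if sep]
print(only_matching)
# ['/folder1/exp-*', '/folder1/folder2/folder3/folder4/folder5/exp-*']
```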

Related

How can I modify every string in a list based on a character in each string?

For example, emails have "#" symbols in all of them.
lst = ["johndoe#example.com","doejohn#email.com","maryjane#domain.com"]
How could I replace all the text after "#" when they vary in character count? My goal would be to get a new list such as:
newlist = ["johndoe#changed.com", "doejohn#changed.com", "maryjane#changed.com"]
lst = ["johndoe#example.com", "doejohn#email.com", "maryjane#domain.com"]
out = [i.split("#")[0] + "#changed.com" if "#" in i else i for i in lst]
print(out)
Prints:
['johndoe#changed.com', 'doejohn#changed.com', 'maryjane#changed.com']
Note: for more advanced use cases, I recommend looking at the re module.
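As a rough idea of what the re-based version might look like, here is a minimal sketch using re.sub. The extra "no-separator" entry is added only to show that strings without "#" pass through unchanged.

```python
import re

lst = ["johndoe#example.com", "doejohn#email.com", "maryjane#domain.com", "no-separator"]

# '#.*$' matches from the first '#' to the end of the string;
# strings without '#' are returned unchanged by re.sub().
out = [re.sub(r"#.*$", "#changed.com", address) for address in lst]
print(out)
# ['johndoe#changed.com', 'doejohn#changed.com', 'maryjane#changed.com', 'no-separator']
```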
You can use .split(), which returns a list of the strings on either side of the symbol.
emails = ["johndoe#example.com","doejohn#email.com","maryjane#domain.com"]
split_list = []
for email in emails:
    split_email = email.split("#")
    split_list.append(split_email[0] + "#changed.com")
print(split_list)
Option 1: use a regex (regular expression). These use the re library.
Option 2: do it a little more pythonically, by looping through the list and splitting the string into a domain and info section, and then setting the domain as "changed.com". Like so:
emails = ["aperson#email.com", "another#example.org", "onemore#something.net"]
new = [email.split("#")[0]+"#newone.com" for email in emails]
This uses a list comprehension, which is a really good Python feature. It builds a list called new that holds every item of emails with its domain replaced.
You could make a function like this:
def new_domain(emails,domain):
    return [email.split("#")[0]+"#"+domain for email in emails]
This means you can use it whenever you have a list of emails you want to change the ending of:
>>> emails = ["johndoe#example.com","doejohn#email.com","maryjane#domain.com"]
>>> new_domain(emails,"changed.com")
['johndoe#changed.com', 'doejohn#changed.com', 'maryjane#changed.com']
You can use a regex, or you can use .partition in this particular example:
lst = ["johndoe#example.com","doejohn#email.com","maryjane#domain.com"]
>>> [lh+sep+'changed.com' for lh,sep,rh in (s.partition('#') for s in lst)]
['johndoe#changed.com', 'doejohn#changed.com', 'maryjane#changed.com']
If the '#' is possibly not in the string, and you would not want those strings modified, you can do:
[lh+sep+'changed.com' if sep else lh
for lh,sep,rh in (s.partition('#') for s in lst)]
You can use python slices to slice off the part of each email between '#' and '.', and use the replace method to replace the slice with the desired domain:
lst = ["johndoe#example.com","doejohn#email.com","maryjane#domain.com"]
lst = [e.replace(e[e.find('#')+1:e.find('.')], 'changed') for e in lst]
print(lst)
Output:
['johndoe#changed.com', 'doejohn#changed.com', 'maryjane#changed.com']
This method comes in handy if not all the email addresses end with '.com', for example if some end with '.ca'.
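One caveat with the find()/replace() one-liner: find('.') returns the first dot anywhere in the string, so an address such as "john.doe#email.ca" produces an empty slice, and replace('', 'changed') then inserts the text between every character. A more defensive sketch of the same idea, using partition() instead, could look like this (the function name replace_domain_name and its sep parameter are made up for illustration):

```python
def replace_domain_name(address, new_name, sep="#"):
    """Swap the domain name but keep its suffix, e.g. '.com' or '.ca'."""
    local, found, domain = address.partition(sep)
    if not found:
        return address  # no separator: leave the string untouched
    # Split the domain at its first dot; suffix keeps everything after it.
    _old_name, dot, suffix = domain.partition(".")
    return local + found + new_name + dot + suffix


lst = ["johndoe#example.com", "john.doe#email.ca", "maryjane#domain.co.uk"]
print([replace_domain_name(e, "changed") for e in lst])
# ['johndoe#changed.com', 'john.doe#changed.ca', 'maryjane#changed.co.uk']
```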
Although split() and regular expressions make sense as the problem gets more complex, for the problem as described we can simply use index():
lst = ["johndoe#example.com", "doejohn#email.com", "maryjane#domain.com"]
newlst = [address[:address.index('#')] + "#changed.com" for address in lst]
print(newlst)
OUTPUT
> python3 test.py
['johndoe#changed.com', 'doejohn#changed.com', 'maryjane#changed.com']
>

How to append to a list without adding the quotes

I have this code:
def add_peer_function():
    all_devices=[]
    all_devices.append("cisco,linux")
    print(all_devices)
add_peer_function()
Which results in :
['cisco,linux']
My question is how I can append to the list without the quotes, so that I get a result like this:
[cisco,router]
Well, I know two possible ways, but the first one is faster:
1:
def add_peer_function():
    all_devices=[item for item in "cisco,linux".split(',')] # or `all_devices = ["cisco", "linux"]`
    print(', '.join(all_devices)) # A prettier way to print list Thanks to Philo
add_peer_function()
2:
def add_peer_function():
    all_devices=[]
    for item in "cisco,linux".split(','): # or `all_devices = ["cisco", "linux"]`
        all_devices.append(item)
    print(', '.join(all_devices)) # A prettier way to print list Thanks to Philo
add_peer_function()
Python str.split documentation.
Python str.join documentation.
Python list comprehension documentation.
By default, Python prints a list using its own convention: the strings inside it are shown between quotes.
If you want to get another format, you can write your own formatter.
For lists of strings, a common pattern in Python is:
my_list = ['one', 'two', 'three']
print(', '.join(my_list))
Replace ', ' with another separator if needed.
Finally, note that "cisco,linux" is just a string with a comma, which is different from a list of strings: ["cisco", "linux"].
Of course, if you append the string 'cisco,linux' to a list, you get ['cisco,linux'] which is the string representation of this list in Python.
What you want is to split the string.
Try:
>>> 'cisco,linux'.split(',')
['cisco', 'linux']
append accepts only one argument, so your_list.append(something) adds something to your_list as a single item. You can, however, do something like this:
your_list += [el for el in "cisco,linux".split(",")]
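The same effect can be had with list.extend(), which adds each element of an iterable as its own item. Here is a minimal sketch based on the function from the question; returning the list rather than printing it inside the function is a change made here for illustration.

```python
def add_peer_function():
    all_devices = []
    # extend() adds each element of the split result as its own list item
    all_devices.extend("cisco,linux".split(","))
    return all_devices


devices = add_peer_function()
print(devices)             # the list itself, shown with quotes
print(", ".join(devices))  # the items only, without quotes or brackets
```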

split string python with changing indicator

Let's say I have a string:
a = '!!!!!!a1######a2&&&&&&a3::::'
It naturally splits at a1, a2 and a3 into:
['!!!!!!','######','&&&&&&','::::']
I want to use the split function, something like:
>>> a.split('a*')
The * indicates that it doesn't matter what character comes after a. Is there an immediate way to do this?
s = '!!!!!!a1######a2&&&&&&a3::::'
import re
print(re.split(r'a[0-9]+', s))
By using regex, with module re, you can try like this:
import re
a = re.split(r'a\d','!!!!!!a1######a2&&&&&&a3::::')
If you want to be more specific in your split keys, try this:
a = re.split(r'a1|a2|a3','!!!!!!a1######a2&&&&&&a3::::')
and make your custom condition, as you wish.
While not as efficient as @Menglong's solution, you could technically do it with just list and string operations, without importing re:
>>> a = '!!!!!!a1######a2&&&&&&a3::::'
>>> s = a.split('a')
>>> s[:1] + [x[1:] for x in s[1:] if x]
['!!!!!!', '######', '&&&&&&', '::::']
This works because if you split on 'a', the first character of every segment after the first will be the * character you want to get rid of.
This solution is not preferable, just something I did as an exercise.
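If the character after a really can be anything, not just a digit, the regex equivalent of the a* idea from the question is probably 'a.', where the dot stands for any single character. A small sketch (the names pieces and non_empty are made up for illustration):

```python
import re

a = '!!!!!!a1######a2&&&&&&a3::::'

# 'a.' matches the letter a followed by any single character (except a newline)
pieces = re.split(r'a.', a)
print(pieces)
# ['!!!!!!', '######', '&&&&&&', '::::']

# Drop empty strings, which appear when the text starts or ends with a match
non_empty = [p for p in re.split(r'a.', 'a1######a2') if p]
print(non_empty)
# ['######']
```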

Split Sentences of a String

Hi, I am trying to split a text.
For example, given
'C:/bye1.txt'
I would want 'C:/bye1.txt' only. Given
'C:/bye1.txt C:/hello1.txt'
I would want C:/hello1.txt only. Given
'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
I would want C:/bye3, and so on.
Here is the code I tried. The problem is that it only prints out y.
x = "Hello i am boey"
Last = x[len(x)-1]
print(Last)
Look at this:
>>> x = 'C:/bye1.txt'
>>> x.split()[-1]
'C:/bye1.txt'
>>> y = 'C:/bye1.txt C:/hello1.txt'
>>> y.split()[-1]
'C:/hello1.txt'
>>> z = 'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
>>> z.split()[-1]
'C:/bye3'
>>>
Basically, you split the strings using str.split and then get the last item (that's what [-1] does).
x = "Hello i am boey".split()
Last = x[len(x)-1]
print(Last)
Though more Pythonic:
x = "Hello i am boey".split()
print(x[-1])
Try:
>>> x = 'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
>>> x.split()[-1]
'C:/bye3'
>>>
x[len(x)-1] returns the last character of the string (a string behaves like a sequence of characters). It is the same as x[-1].
To get the output you want, do this:
x.split()[-1]
If your sentences are separated by other delimiters, you can specify the delimiter to split on, like so:
delimiter = ','
item = x.split(delimiter)[-1] # Split the string on the delimiter and get the last item
In addition, the lstrip(), rstrip(), and strip() methods might be useful for removing unnecessary characters from the ends of your string. Look them up in the Python documentation.
Answer:
'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'.split()[-1]
would give you 'C:/bye3'.
Details:
The split method without any argument splits on runs of whitespace. In the example above, it returns the following list:
['C:/bye1.txt', 'C:/hello1.txt', 'C:/bye2', 'C:/bye3']
An index of -1 selects the last item of the list, counting from the back.
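One thing split()[-1] does not handle is an empty or all-whitespace string, where split() returns an empty list and [-1] raises IndexError. A small sketch that guards against this is shown below; the helper name last_token is made up for illustration, and it uses rsplit with maxsplit=1 so only one split is performed.

```python
def last_token(text):
    """Return the last whitespace-separated token, or '' if there is none."""
    # rsplit with maxsplit=1 only splits once, starting from the right
    tokens = text.rsplit(None, 1)
    return tokens[-1] if tokens else ''


print(last_token('C:/bye1.txt'))                                # C:/bye1.txt
print(last_token('C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'))  # C:/bye3
print(repr(last_token('')))                                     # ''
```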

Python - How to cut a string in Python?

Suppose that I have the following string:
http://www.domain.com/?s=some&two=20
How can I take off what is after & including the & and have this string:
http://www.domain.com/?s=some
Well, to answer the immediate question:
>>> s = "http://www.domain.com/?s=some&two=20"
The rfind method returns the index of the right-most occurrence of a substring:
>>> s.rfind("&")
29
You can take all elements up to a given index with the slicing operator:
>>> "foobar"[:4]
'foob'
Putting the two together:
>>> s[:s.rfind("&")]
'http://www.domain.com/?s=some'
If you are dealing with URLs in particular, you might want to use built-in libraries that deal with URLs. If, for example, you wanted to remove two from the above query string:
First, parse the URL as a whole:
>>> import urlparse, urllib
>>> parse_result = urlparse.urlsplit("http://www.domain.com/?s=some&two=20")
>>> parse_result
SplitResult(scheme='http', netloc='www.domain.com', path='/', query='s=some&two=20', fragment='')
Take out just the query string:
>>> query_s = parse_result.query
>>> query_s
's=some&two=20'
Turn it into a dict:
>>> query_d = urlparse.parse_qs(parse_result.query)
>>> query_d
{'s': ['some'], 'two': ['20']}
>>> query_d['s']
['some']
>>> query_d['two']
['20']
Remove the 'two' key from the dict:
>>> del query_d['two']
>>> query_d
{'s': ['some']}
Put it back into a query string:
>>> new_query_s = urllib.urlencode(query_d, True)
>>> new_query_s
's=some'
And now stitch the URL back together:
>>> result = urlparse.urlunsplit((
...     parse_result.scheme, parse_result.netloc,
...     parse_result.path, new_query_s, parse_result.fragment))
>>> result
'http://www.domain.com/?s=some'
The benefit of this is that you have more control over the URL. Like, if you always wanted to remove the two argument, even if it was put earlier in the query string ("two=20&s=some"), this would still do the right thing. It might be overkill depending on what you want to do.
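The session above uses the Python 2 module names urlparse and urllib. In Python 3 the same functions live in urllib.parse, so a rough equivalent might look like the sketch below. The wrapper name remove_query_param is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode


def remove_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)   # e.g. {'s': ['some'], 'two': ['20']}
    query.pop(name, None)           # no error if the parameter is absent
    new_query = urlencode(query, doseq=True)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       new_query, parts.fragment))


print(remove_query_param("http://www.domain.com/?s=some&two=20", "two"))
print(remove_query_param("http://www.domain.com/?two=20&s=some", "two"))
# both print: http://www.domain.com/?s=some
```

Note that parse_qs drops parameters with blank values by default, so this sketch may not round-trip every URL exactly.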
You need to split the string:
>>> s = 'http://www.domain.com/?s=some&two=20'
>>> s.split('&')
['http://www.domain.com/?s=some', 'two=20']
That returns a list, as you can see, so you can do:
>>> s2 = s.split('&')[0]
>>> print(s2)
http://www.domain.com/?s=some
string = 'http://www.domain.com/?s=some&two=20'
cut_string = string.split('&')
new_string = cut_string[0]
print(new_string)
You can use find()
>>> s = 'http://www.domain.com/?s=some&two=20'
>>> s[:s.find('&')]
'http://www.domain.com/?s=some'
Of course, if there is a chance that the searched for text will not be present then you need to write more lengthy code:
pos = s.find('&')
if pos != -1:
    s = s[:pos]
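A shorter alternative to the explicit check is str.partition(), which returns the whole string as its first element when the separator is missing. A minimal sketch (the helper name cut_at is made up for illustration):

```python
def cut_at(text, sep):
    """Return the part of text before the first sep, or all of text if sep is absent."""
    return text.partition(sep)[0]


print(cut_at("http://www.domain.com/?s=some&two=20", "&"))
# http://www.domain.com/?s=some
print(cut_at("http://www.domain.com/?s=some", "&"))
# http://www.domain.com/?s=some
```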
Whilst you can make some progress using code like this, more complex situations demand a true URL parser.
>>> s = "http://www.domain.com/?s=some&two=20"
>>> s.split("&")
['http://www.domain.com/?s=some', 'two=20']
s[0:s.index("&")]
What this does: it takes a slice of the string starting at index 0, up to but not including the index of & in the string.
