Applying regex on each item of a list in Python - python

If I apply this regex:
re.split(r"(^[^aeiou]+)(?=[aeiouy])", "janu")
on the string "janu", it gives the following result: ['', 'j', 'anu']
Now I want to apply this regex on the following list to get the similar results for each item as above. Can a for loop be used, and if yes, how?
lista = ['janu', 'manu', 'tanu', 'banu']

You can use a list comprehension:
>>> from re import split
>>> lista = ['janu', 'manu', 'tanu', 'banu']
>>> [split("(^[^aeiou]+)(?=[aeiouy])", x)[1]+"doc" for x in lista]
['jdoc', 'mdoc', 'tdoc', 'bdoc']
>>>
Edit regarding comment:
This will work:
>>> from re import split
>>> lista = ['janu', 'manu', 'tanu', 'banu']
>>> listb = []
>>> for item in lista:
...     data = split("(^[^aeiou]+)(?=[aeiouy])", item)
...     listb.append(data[2]+data[1]+"doc")
...
>>> listb
['anujdoc', 'anumdoc', 'anutdoc', 'anubdoc']
>>>

Use a list comprehension:
[re.split(r"(^[^aeiou]+)(?=[aeiouy])", i) for i in lista]
You can use a for loop, but a list comprehension is generally considered the more Pythonic way to do this.
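For comparison, here is a minimal sketch of both forms side by side, using the lista and the pattern from the question. The names pattern, results_loop and results_comp are made up for illustration, and compiling the pattern up front is optional.

```python
import re

lista = ['janu', 'manu', 'tanu', 'banu']

# Compile once; this is the pattern from the question.
pattern = re.compile(r"(^[^aeiou]+)(?=[aeiouy])")

# Explicit for loop
results_loop = []
for item in lista:
    results_loop.append(pattern.split(item))

# Equivalent list comprehension
results_comp = [pattern.split(item) for item in lista]

print(results_comp)
# [['', 'j', 'anu'], ['', 'm', 'anu'], ['', 't', 'anu'], ['', 'b', 'anu']]
```

Both produce the same list of lists; the comprehension is just the more compact spelling.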

Related

splitting a list in a better way using list comprehension

I have a simple list that I am splitting and concatenating. My code uses a for loop and an if condition, and it is ugly. Can you suggest a better way using a list comprehension?
My code
mylist = ['10.10.10.1','10.10.10.2,10.10.10.3,10.10.10.4,10.10.10.5','10.10.10.6']
mylist = [i.split(",") for i in mylist]
list =[]
for x,y in enumerate(mylist):
    if len(y) == 1:
        list.append(y[0])
    else:
        for z in y:
            list.append(z)
print(list)
I am getting the result below, which is exactly what I want:
['10.10.10.1','10.10.10.2','10.10.10.3','10.10.10.4','10.10.10.5','10.10.10.6']
You want:
[s for string in mylist for s in string.split(',')]
Note, your original approach wouldn't be so bad if you just simplified. No need for enumerate and no need to check the length, so just:
final_list = []
for sub in mylist:
    for s in sub:
        final_list.append(s)
By the way, you shouldn't shadow the built-in list. Use another name.
I agree with @juanpa.arrivillaga. However, if the goal is also to skip the empty values that splitting can produce, a filter in the comprehension handles that:
In [7]: s=['10.10.10.1','','10.10.10.2,10.10.10.3,10.10.10.4,10.10.10.5','10.10.10.6']
In [8]: [splitRec for rec in s for splitRec in rec.split(',') if splitRec]
Out[8]:
['10.10.10.1',
'10.10.10.2',
'10.10.10.3',
'10.10.10.4',
'10.10.10.5',
'10.10.10.6']
In [9]: s=['10.10.10.1',',,','10.10.10.2,10.10.10.3,10.10.10.4,10.10.10.5','10.10.10.6']
In [10]: [splitRec for rec in s for splitRec in rec.split(',') if splitRec]
Out[10]:
['10.10.10.1',
'10.10.10.2',
'10.10.10.3',
'10.10.10.4',
'10.10.10.5',
'10.10.10.6']
Not a comprehension, but good anyway, I think.
','.join(mylist).split(',')
You can first just split each string on ',':
>>> mylist = ['10.10.10.1','10.10.10.2,10.10.10.3,10.10.10.4,10.10.10.5','10.10.10.6']
>>> split_str = [x.split(',') for x in mylist]
>>> split_str
[['10.10.10.1'], ['10.10.10.2', '10.10.10.3', '10.10.10.4', '10.10.10.5'], ['10.10.10.6']]
Then if you want to flatten it, you can use itertools.chain.from_iterable:
>>> from itertools import chain
>>> list(chain.from_iterable(split_str))
['10.10.10.1', '10.10.10.2', '10.10.10.3', '10.10.10.4', '10.10.10.5', '10.10.10.6']

How can I extract words before some string?

I have several strings like this:
mylist = ['pearsapple','grapevinesapple','sinkandapple'...]
I want to parse the parts before apple and then append to a new list:
new = ['pears','grapevines','sinkand']
Is there a way other than finding starting points of 'apple' in each string and then appending before the starting point?
By using slicing in combination with the index method of strings.
>>> [x[:x.index('apple')] for x in mylist]
['pears', 'grapevines', 'sinkand']
You could also use a regular expression
>>> import re
>>> [re.match('(.*?)apple', x).group(1) for x in mylist]
['pears', 'grapevines', 'sinkand']
I don't see why though.
Assuming the word apple is always at the end of each string (so the suffix has a fixed length of 5), we can use:
second_list = [item[:-5] for item in mylist]
If some elements in the list don't contain 'apple' at the end of the string, this regex leaves the string untouched:
>>> import re
>>> mylist = ['pearsapple','grapevinesapple','sinkandapple', 'test', 'grappled']
>>> [re.sub('apple$', '', word) for word in mylist]
['pears', 'grapevines', 'sinkand', 'test', 'grappled']
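The same behaviour can be had without a regex. A minimal sketch, where strip_suffix is a hypothetical helper name (on Python 3.9 and later, the built-in str.removesuffix does the same job):

```python
def strip_suffix(word, suffix='apple'):
    # Drop the suffix only when the word really ends with it;
    # otherwise return the word unchanged.
    if suffix and word.endswith(suffix):
        return word[:-len(suffix)]
    return word

mylist = ['pearsapple', 'grapevinesapple', 'sinkandapple', 'test', 'grappled']
new = [strip_suffix(word) for word in mylist]
print(new)  # ['pears', 'grapevines', 'sinkand', 'test', 'grappled']
```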
By also using string split and list comprehension
new = [x.split('apple')[0] for x in mylist]
['pears', 'grapevines', 'sinkand']
One way to do it is to iterate over every string in the list and use the split() string method, appending each result to a new list:
new = []
for word in mylist:
    new.append(word.split("apple")[0])

Use list comprehension to print out a list with words of length 4

I am trying to write a list comprehension that uses List1 to create a list of words of length 4.
List1 = ['jacob','batman','mozarella']
wordList = [words for i in range(1)]
print(wordList)
This prints out wordList, but with words longer than 4 characters.
I am looking for this program to print out instead:
['jaco','batm','moza']
which are the same words in List1 but with length 4
I tried this and it didn't work
wordList = [[len(4)] words for i in range(1)]
Any thoughts?
You could use this list comp
>>> List1 = ['jacob','batman','mozarella']
>>> [i[:4] for i in List1]
['jaco', 'batm', 'moza']
Explanation:
i[:4] is a slice containing the first 4 characters of the string.
Other ways to do it (All have their own disadvantages)
[re.sub(r'(?<=^.{4}).*', '', i) for i in List1]
[re.match(r'.{4}', i).group() for i in List1]
[''.join(i[j] for j in range(4)) for i in List1]
[i.replace(i[4:],'') for i in List1]  # fails for strings such as 'moinmoin' or 'bongbong'
Credit - Avinash Raj
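To see why the replace variant is risky: str.replace removes every occurrence of the tail, not just the one at the end, so a word whose tail repeats its head loses everything. A small sketch (the names words, sliced and replaced are illustrative):

```python
words = ['jacob', 'moinmoin', 'bongbong']

# Slicing always keeps exactly the first four characters.
sliced = [w[:4] for w in words]
print(sliced)    # ['jaco', 'moin', 'bong']

# replace() deletes every occurrence of w[4:], including the matching prefix.
replaced = [w.replace(w[4:], '') for w in words]
print(replaced)  # ['jaco', '', '']
```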
The len() function returns the length of a string. So a list comprehension with len() gives you a list of the lengths of all the items.
e.g.
>>> List1 = ['jacob','batman','mozarella']
>>> [len(i) for i in List1]
[5, 6, 9]
>>>
Use slice notation to get a substring from the string.
e.g.
>>> a = "abcdef"
>>> a[:4]
'abcd'
>>> [i[:4] for i in List1]
['jaco', 'batm', 'moza']
For Python beginners:
Define List1.
Define an empty List2.
Use a for loop to iterate over every item in List1.
Use the list append() method with slice notation to add each truncated item to List2.
Use print to see the result.
Sample code:
>>> List1 = ['jacob','batman','mozarella']
>>> List2 = []
>>> for i in List1:
...     List2.append(i[:4])
...
>>> print(List2)
['jaco', 'batm', 'moza']
>>>
One more way, using the map function (in Python 3, map returns an iterator, so wrap it in list() to get a list):
List1 = ['jacob','batman','mozarella']
List2 = list(map(lambda x: x[:4], List1))

Replacing "<" with "*" with a Python regex

I need to go through strings in a list "listb", replacing the character "<" with "*".
I tried like this:
import re
for i in listb:
    i = re.sub('\<','\*', 0)
But I keep getting TypeError: expected string or buffer.
I'm not sure what I am doing wrong, and the examples online were not much help.
See the docs
As per Seth's comment, the best way to do this using regular expressions is going to be:
listb = [re.sub(r'<',r'*', i) for i in listb]
As @Paco said, you should be using str.replace() instead. But if you still want to use re:
You're putting 0 where the string is supposed to go! The TypeError comes from that third argument: it's an int, but it needs to be a string.
Side note: use raw strings, denoted by r'', in your regexes so that backslashes don't have to be doubled.
>>> listb = ['abc', '<asd*', '<<>>**']
>>> for i in listb:
...     i = re.sub(r'<',r'*', i)
...     print(i)
...
abc
*asd*
**>>**
>>> listb
['abc', '<asd*', '<<>>**']
If you want a new list with all of those replaced, do:
>>> listx = []
>>> for i in listb:
...     listx.append(re.sub(r'<',r'*', i))
...
>>> listx
['abc', '*asd*', '**>>**']
>>> listb
['abc', '<asd*', '<<>>**']
>>> listb = listx
If you really don't want to create a new list, you can iterate through the indices.
Note that assigning to i inside the loop does not change the list. Each i is its own variable, and rebinding it does not affect listb. I would create a new list here.
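A sketch of the in-place variant, iterating with enumerate so that each slot of listb is reassigned by index (idx and item are illustrative names; str.replace is used here, but re.sub would slot in the same way):

```python
listb = ['abc', '<asd*', '<<>>**']

for idx, item in enumerate(listb):
    # Assigning to listb[idx] updates the list itself,
    # unlike rebinding the loop variable.
    listb[idx] = item.replace('<', '*')

print(listb)  # ['abc', '*asd*', '**>>**']
```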
>>> my_string = 'fowiejf<woiefjweF<woeiufjweofj'
>>> my_string.replace('<', '*')
'fowiejf*woiefjweF*woeiufjweofj'
Why are you using the re module for such a simple thing? You can use the .replace method.

Converting a list

I have a list similar to the one shown below. Do you have any ideas on how I can convert it to the one in the EXPECTED OUTPUT section below?
list =['username1,username2', 'username3','username4,username5']
EXPECTED OUTPUT:-
list = ['username1','username2', 'username3','username4','username5']
Thanks
>>> alist = ['username1,username2', 'username3','username4,username5']
>>> ','.join(alist).split(',')
['username1', 'username2', 'username3', 'username4', 'username5']
By the way, don't use list as the variable name.
You can also use:
>>> alist = ['username1,username2', 'username3','username4,username5']
>>> [j for i in alist for j in i.split(',')]
but @zhangyangyu's method is faster:
>>> import timeit
>>> timeit.timeit("[j for i in ['username1,username2', 'username3','username4,username5'] for j in i.split(',')]", number=10000)
0.05875942333452144
>>> timeit.timeit("','.join(['username1,username2', 'username3','username4,username5']).split(',')", number=10000)
0.023530085527625033
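To reproduce the comparison on your own machine, something like the following keeps the list construction out of the timed statement by moving it into setup. Timings vary by machine and Python version, so no numbers are shown here; the variable names are illustrative.

```python
import timeit

# Build the list once in setup so only the flattening itself is timed.
setup = "alist = ['username1,username2', 'username3', 'username4,username5']"

t_comp = timeit.timeit("[j for i in alist for j in i.split(',')]",
                       setup=setup, number=10000)
t_join = timeit.timeit("','.join(alist).split(',')",
                       setup=setup, number=10000)

print(f"comprehension: {t_comp:.4f}s")
print(f"join/split:    {t_join:.4f}s")
```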
