How can I extract words before some string? - python

I have several strings like this:
mylist = ['pearsapple','grapevinesapple','sinkandapple'...]
I want to parse the parts before apple and then append to a new list:
new = ['pears','grapevines','sinkand']
Is there a way other than finding starting points of 'apple' in each string and then appending before the starting point?

By using slicing in combination with the index method of strings.
>>> [x[:x.index('apple')] for x in mylist]
['pears', 'grapevines', 'sinkand']
You could also use a regular expression
>>> import re
>>> [re.match('(.*?)apple', x).group(1) for x in mylist]
['pears', 'grapevines', 'sinkand']
I don't see why though.
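Note that x.index('apple') raises ValueError, and re.match(...) returns None, when a string does not contain 'apple'. If that can happen in your data, a minimal sketch using str.partition may be more forgiving. The helper name before_marker and the extra 'test' element are assumptions made up for this illustration:

```python
def before_marker(text, marker="apple"):
    """Return the part of text before the first marker, or text unchanged
    if the marker does not occur (hypothetical helper for illustration)."""
    head, sep, _tail = text.partition(marker)
    # sep is '' when the marker was not found
    return head if sep else text

mylist = ['pearsapple', 'grapevinesapple', 'sinkandapple', 'test']
new = [before_marker(s) for s in mylist]
print(new)
```

Whether a string without the marker should be kept, dropped, or treated as an error depends on your data; here it is simply kept as-is.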

If 'apple' always appears at the end of each string, it is a fixed-length suffix (5 characters), so we can simply slice it off:
second_list = [item[:-5] for item in mylist]

If some elements in the list don't contain 'apple' at the end of the string, this regex leaves the string untouched:
>>> import re
>>> mylist = ['pearsapple','grapevinesapple','sinkandapple', 'test', 'grappled']
>>> [re.sub('apple$', '', word) for word in mylist]
['pears', 'grapevines', 'sinkand', 'test', 'grappled']
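The same "only strip it if it is really at the end" behaviour can be sketched without a regex. The function name strip_suffix is an assumption for this example; on Python 3.9 and later, the built-in str.removesuffix should give the same result.

```python
def strip_suffix(word, suffix="apple"):
    """Remove suffix from the end of word if present (illustrative helper)."""
    if suffix and word.endswith(suffix):
        return word[:-len(suffix)]
    return word

mylist = ['pearsapple', 'grapevinesapple', 'sinkandapple', 'test', 'grappled']
stripped = [strip_suffix(word) for word in mylist]
print(stripped)
```

Unlike the blind item[:-5] slice, this leaves 'test' and 'grappled' untouched.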

You can also use str.split in a list comprehension:
new = [x.split('apple')[0] for x in mylist]
# new == ['pears', 'grapevines', 'sinkand']

One way to do it is to iterate over every string in the list, call its split() method, and collect the first piece in a new list:
new = []
for word in mylist:
    new.append(word.split("apple")[0])

Related

How to replace a character within a string in a list?

I have a list whose elements are strings. Each item contains unwanted characters that I want to remove. For example, I have list = ["string1.", "string2."] and the unwanted character is ".". I don't want that character in any element of the list, so my desired list is list = ["string1", "string2"]. Any help? I have to remove several special characters, so the code must be reusable.
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
    for j in range(0,len(hola[i]),1):
        if (hola[i][j] == "h"):
            hola[i] = hola[i].translate({ord('h'): None})
print(hola)
However, I have an error in the conditional if: "string index out of range". Any help? thanks
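Strings in Python are immutable, so translate() does not modify a string in place; it returns a new, shorter one. Your inner range was computed from the original length, so once characters have been removed, j runs past the end of the new string and you get "string index out of range". It is simpler to replace each element with a cleaned copy: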
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
    list_[i] = s.replace('.', '')
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
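The out-of-range error from the question can be reproduced in isolation. The following is only a sketch of what presumably goes wrong: the index range is fixed before the string shrinks. The variable names are made up for the illustration.

```python
word = "holamundoh"
indices = range(len(word))                 # fixed at the original length (10)
word = word.translate({ord('h'): None})    # new string "olamundo", length 8

error = None
try:
    for j in indices:
        word[j]                            # fails once j reaches 8
except IndexError as exc:
    error = str(exc)

print(len(word), error)
```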
You can define the function for removing an unwanted character.
def remove_unwanted(original, unwanted):
    return [x.replace(unwanted, "") for x in original]
Then you can call this function like the following to get the result.
print(remove_unwanted(hola, "."))
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here is an example of a more complex replacement that you may find useful: remove everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]
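Since the question mentions several special characters, one option is to remove them all in a single pass with str.maketrans and str.translate. This is a sketch; the function name remove_chars and the sample data are assumptions for the example.

```python
def remove_chars(strings, unwanted):
    """Return copies of strings with every character in unwanted removed
    (hypothetical helper for illustration)."""
    # The third argument of str.maketrans lists characters to delete
    table = str.maketrans('', '', unwanted)
    return [s.translate(table) for s in strings]

cleaned = remove_chars(["string1.", "string2.,!"], ".,!")
print(cleaned)
```

The translation table is built once and reused for every element, which avoids chaining several replace() calls.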

Replacing "<" with "*" with a Python regex

I need to go through strings in a list "listb", replacing the character "<" with "*".
I tried like this:
import re
for i in listb:
    i = re.sub('\<','\*', 0)
But I keep getting TypeError: expected string or buffer.
I'm not sure what I am doing wrong, and the examples on the net were not much help.
See the docs
As per Seth's comment, the best way to do this using regular expressions is going to be:
listb = [re.sub(r'<',r'*', i) for i in listb]
As @Paco said, you should be using str.replace() instead. But if you still want to use re:
You're putting 0 where the string is supposed to go! The TypeError comes from that third argument: it's an int, and it needs to be a string.
Side note: always use raw strings, denoted by r'', in your regexes, so you don't have to double the backslashes.
>>> listb = ['abc', '<asd*', '<<>>**']
>>> for i in listb:
...     i = re.sub(r'<', r'*', i)
...     print(i)
...
abc
*asd*
**>>**
>>> listb
['abc', '<asd*', '<<>>**']
If you want a new list with all of those replaced, do:
>>> listx = []
>>> for i in listb:
...     listx.append(re.sub(r'<', r'*', i))
...
>>> listx
['abc', '*asd*', '**>>**']
>>> listb
['abc', '<asd*', '<<>>**']
>>> listb = listx
If you really don't want to create a new list, you can iterate through the indices and assign to listb[index] instead.
Note that assigning to i inside the loop does not change the list: i is just a name bound to each element in turn, so rebinding it leaves listb untouched. I would create a new list here.
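The index-based, in-place variant mentioned above could look like this. It is a minimal sketch that uses enumerate and str.replace rather than re, with the sample list from this answer:

```python
listb = ['abc', '<asd*', '<<>>**']

# Assigning to listb[idx] changes the list itself,
# unlike rebinding the loop variable
for idx, item in enumerate(listb):
    listb[idx] = item.replace('<', '*')

print(listb)
```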
>>> my_string = 'fowiejf<woiefjweF<woeiufjweofj'
>>> my_string.replace('<', '*')
'fowiejf*woiefjweF*woeiufjweofj'
Why are you using the re module for such a simple thing? You can use the .replace method.

Remove strings from a list that contains numbers in python [duplicate]

Is there a short way to remove all strings in a list that contain numbers?
For example
my_list = [ 'hello' , 'hi', '4tim', '342' ]
would return
my_list = [ 'hello' , 'hi']
Without regex:
[x for x in my_list if not any(c.isdigit() for c in x)]
I find using isalpha() the most elegant, but it will also remove items that contain other non-alphabetic characters:
Return true if all characters in the string are alphabetic and there is at least one character, false otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”
my_list = [item for item in my_list if item.isalpha()]
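To make the difference between the two filters concrete, here is a small comparison. The two extra elements ('well-known' and 'two words') are not from the question; they are assumptions added to show where the approaches diverge.

```python
my_list = ['hello', 'hi', '4tim', '342', 'well-known', 'two words']

# Drops only the strings that contain at least one digit
no_digits = [s for s in my_list if not any(c.isdigit() for c in s)]

# Keeps only the strings made up entirely of letters
only_letters = [s for s in my_list if s.isalpha()]

print(no_digits)
print(only_letters)
```

Which one is right depends on whether strings with hyphens, spaces, or punctuation should survive.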
I'd use a regex:
import re
my_list = [s for s in my_list if not re.search(r'\d',s)]
In terms of timing, using a regex is significantly faster on your sample data than the isdigit solution. Admittedly, it's slower than isalpha, but the behavior is slightly different with punctuation, whitespace, etc. Since the problem doesn't specify what should happen with those strings, it's not clear which is the best solution.
import re
import timeit

my_list = ['hello', 'hi', '4tim', '342', 'adn322']

def isalpha(mylist):
    return [item for item in mylist if item.isalpha()]
def fisalpha(mylist):
    # list() forces evaluation, since filter() is lazy in Python 3
    return list(filter(str.isalpha, mylist))
def regex(mylist, myregex=re.compile(r'\d')):
    return [s for s in mylist if not myregex.search(s)]
def isdigit(mylist):
    return [x for x in mylist if not any(c.isdigit() for c in x)]

for func in ('isalpha', 'fisalpha', 'regex', 'isdigit'):
    print(func, timeit.timeit(func + '(my_list)', 'from __main__ import my_list,' + func))
Here are my results:
isalpha 1.80665302277
fisalpha 2.09064006805
regex 2.98224401474
isdigit 8.0824341774
Try:
import re
my_list = [x for x in my_list if re.match("^[A-Za-z_-]*$", x)]
Sure, use the digits constant from the string module and test whether any of them is present.
We'll get a little fancy and just test the list comprehension for truthiness; if it returns anything, there are digits in the string.
So:
import string
out_list = []
for item in my_list:
    if not [char for char in item if char in string.digits]:
        out_list.append(item)
And yet another slight variation:
>>> import re
>>> list(filter(re.compile(r'(?i)[a-z]+$').match, my_list))
['hello', 'hi']
Put the characters that are valid (such as spaces, punctuation, or others) inside the character class of your regex.

python- reverse 2 strings in 1 list separately?

In python, I have this list containing
['HELLO', 'WORLD']
how do I turn that list into
['OLLEH', 'DLROW']
>>> words = ['HELLO', 'WORLD']
>>> [word[::-1] for word in words]
['OLLEH', 'DLROW']
Using a list comprehension:
reversed_list = [x[::-1] for x in old_list]
Arguably, the builtin reversed is clearer than the slice notation x[::-1]. It returns an iterator rather than a string, though, so the characters have to be joined back together:
[''.join(reversed(word)) for word in words]
or
list(map(''.join, map(reversed, words)))
Map is fast without the lambda.
I just wish it was easier to get the string out of the resulting iterator. Is there anything better than ''.join() to put together the string from the iterator?
Using map and lambda (lambdas are slow):
>>> lis=['HELLO', 'WORLD']
>>> list(map(lambda x: x[::-1], lis))
['OLLEH', 'DLROW']
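The speed claims in these answers are easy to check on your own machine. Below is a rough timing sketch using timeit; the function names are made up for the comparison, and the numbers will vary with Python version and hardware, so no results are quoted here.

```python
import timeit

words = ['HELLO', 'WORLD']

def with_comprehension():
    return [w[::-1] for w in words]

def with_lambda_map():
    return list(map(lambda w: w[::-1], words))

def with_reversed_join():
    return [''.join(reversed(w)) for w in words]

timings = {}
for func in (with_comprehension, with_lambda_map, with_reversed_join):
    # timeit.timeit accepts a zero-argument callable as the statement
    timings[func.__name__] = timeit.timeit(func, number=100_000)
    print(func.__name__, timings[func.__name__])
```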

Cross-matching two lists

I have two lists, and I am trying to see whether any string in the second list occurs as a substring of an element in the first.
["Po2311tato","Pin2231eap","Orange2231edg","add22131dfes"]
["2311","233412","2231"]
If an element of the first list contains any string from the second list (for example, "Po2311tato" contains "2311"), I want to put that element in a new list. The new list would then hold every matching element of the first list: ["Po2311tato","Pin2231eap","Orange2231edg"]
You can use the syntax 'substring' in string to do this:
a = ["Po2311tato","Pin2231eap","Orange2231edg","add22131dfes"]
b = ["2311","233412","2231"]
def has_substring(word):
    for substring in b:
        if substring in word:
            return True
    return False
print(list(filter(has_substring, a)))
Hope this helps!
This can be a little more concise than jobby's answer by using a list comprehension:
>>> list1 = ["Po2311tato","Pin2231eap","Orange2231edg","add22131dfes"]
>>> list2 = ["2311","233412","2231"]
>>> list3 = [string for string in list1 if any(substring in string for substring in list2)]
>>> list3
['Po2311tato', 'Pin2231eap', 'Orange2231edg']
Whether or not this is clearer / more elegant than jobby's version is a matter of taste!
import re
list1 = ["Po2311tato","Pin2231eap","Orange2231edg","add22131dfes"]
list2 = ["2311","233412","2231"]
matchlist = []
for str1 in list1:
    for str2 in list2:
        # re.escape makes str2 match literally rather than as a regex pattern
        if re.search(re.escape(str2), str1):
            matchlist.append(str1)
            break
print(matchlist)
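If you do want a regex here, the inner loop can be folded into one compiled alternation. This is a sketch and assumes list2 is non-empty: an empty list would produce an empty pattern, which matches every string.

```python
import re

list1 = ["Po2311tato", "Pin2231eap", "Orange2231edg", "add22131dfes"]
list2 = ["2311", "233412", "2231"]

# Build one pattern such as 2311|233412|2231; re.escape keeps each entry literal
pattern = re.compile("|".join(re.escape(s) for s in list2))
matches = [s for s in list1 if pattern.search(s)]
print(matches)
```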
