Replacing "<" with "*" with a Python regex - python

I need to go through strings in a list "listb", replacing the character "<" with "*".
I tried like this:
import re
for i in listb:
    i = re.sub('\<','\*', 0)
But I keep getting TypeError: expected string or buffer.
Not sure what am I doing wrong and examples on the net were not much help.

See the docs
As per Seth's comment, the best way to do this using regular expressions is going to be:
listb = [re.sub(r'<',r'*', i) for i in listb]
As @Paco said, you should be using str.replace() instead. But if you still want to use re:
You're putting 0 where the string is supposed to go! The TypeError comes from that third parameter: it's an int, but it needs to be a string.
Side note: always use raw strings, denoted by r'', for your regexes, so you don't have to double up backslashes.
>>> listb = ['abc', '<asd*', '<<>>**']
>>> for i in listb:
...     i = re.sub(r'<',r'*', i)
...     print i
...
abc
*asd*
**>>**
>>> listb
['abc', '<asd*', '<<>>**']
If you want a new list with all of those replaced, do:
>>> listx = []
>>> for i in listb:
...     listx.append(re.sub(r'<',r'*', i))
...
>>> listx
['abc', '*asd*', '**>>**']
>>> listb
['abc', '<asd*', '<<>>**']
>>> listb = listx
If you really don't want to create a new list, you can iterate through the indices.
Note that assigning to i inside the loop does not change the list: each i is its own variable, and rebinding it leaves listb untouched. I would create a new list here.
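The index-based variant mentioned above could look roughly like this. It is only a sketch that reuses the sample data from the transcript; str.replace would work just as well in place of re.sub.

```python
import re

listb = ['abc', '<asd*', '<<>>**']

# Assigning to listb[idx] updates the list itself,
# unlike rebinding the loop variable i.
for idx in range(len(listb)):
    listb[idx] = re.sub(r'<', '*', listb[idx])

print(listb)  # ['abc', '*asd*', '**>>**']
```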

>>> my_string = 'fowiejf<woiefjweF<woeiufjweofj'
>>> my_string.replace('<', '*')
'fowiejf*woiefjweF*woeiufjweofj'
Why are you using the re module for such a simple thing? You can use the .replace method.
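Applied to the whole list from the question, the same idea might look like this (a small sketch; the sample values are borrowed from the other answer):

```python
listb = ['abc', '<asd*', '<<>>**']

# str.replace returns a new string, so build a new list from the results
listb = [s.replace('<', '*') for s in listb]

print(listb)  # ['abc', '*asd*', '**>>**']
```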

Related

How to replace a character within a string in a list?

I have a list that has some elements of type string. Each item in the list contains unwanted characters that I want to remove. For example, I have list = ["string1.", "string2."]. The unwanted character is ".", so I don't want it in any element of the list. My desired list should look like list = ["string1", "string2"]. Any help? I have to remove several special characters, so the code will be used several times.
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
    for j in range(0,len(hola[i]),1):
        if (hola[i][j] == "h"):
            hola[i] = hola[i].translate({ord('h'): None})
print(hola)
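However, I get an error in the if condition: "string index out of range". Any help? Thanks.
Strings in Python are immutable, so every "modification" actually builds a new string. In your loop, the inner range is computed from the original length of hola[i], but translate() makes the string shorter, so later values of j run past its end. That is where "string index out of range" comes from.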
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
    list_[i] = s.replace('.', '')
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
You can define the function for removing an unwanted character.
def remove_unwanted(original, unwanted):
    return [x.replace(unwanted, "") for x in original]
Then you can call this function like the following to get the result.
print(remove_unwanted(hola, "."))
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here are a few examples of more complex replacements that you may find useful, e.g., replace everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]
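When several different characters have to go, as the question mentions, one option is str.translate with a table built by str.maketrans (Python 3). This is a minimal sketch: the helper name strip_chars and the set of unwanted characters are made up for illustration.

```python
def strip_chars(strings, unwanted):
    """Return a new list with every character in `unwanted` removed."""
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans('', '', unwanted)
    return [s.translate(table) for s in strings]

lst = ["string1.", "str,ing2!", "holah"]
print(strip_chars(lst, ".,!"))  # ['string1', 'string2', 'holah']
```

Unlike chained .replace() calls, the table is built once and each string is scanned a single time.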

How can I extract words before some string?

I have several strings like this:
mylist = ['pearsapple','grapevinesapple','sinkandapple'...]
I want to parse the parts before apple and then append to a new list:
new = ['pears','grapevines','sinkand']
Is there a way other than finding starting points of 'apple' in each string and then appending before the starting point?
By using slicing in combination with the index method of strings.
>>> [x[:x.index('apple')] for x in mylist]
['pears', 'grapevines', 'sinkand']
You could also use a regular expression
>>> import re
>>> [re.match('(.*?)apple', x).group(1) for x in mylist]
['pears', 'grapevines', 'sinkand']
I don't see why though.
If every string ends with the fixed-length word 'apple', you can simply slice off the last five characters:
second_list = [item[:-5] for item in mylist]
If some elements in the list don't contain 'apple' at the end of the string, this regex leaves the string untouched:
>>> import re
>>> mylist = ['pearsapple','grapevinesapple','sinkandapple', 'test', 'grappled']
>>> [re.sub('apple$', '', word) for word in mylist]
['pears', 'grapevines', 'sinkand', 'test', 'grappled']
By also using string split and list comprehension
new = [x.split('apple')[0] for x in mylist]
['pears', 'grapevines', 'sinkand']
One way to do it would be to iterate through every string in the list, use the split() string method, and collect the results in a new list.
new = []
for word in mylist:
    new.append(word.split("apple")[0])
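One caveat with the index and re.match answers: x.index('apple') raises ValueError, and re.match(...) returns None, when a string does not contain 'apple'. If that can happen in your data, str.partition is one way to stay safe. The helper name before_marker and the extra 'test' item are assumptions for this sketch.

```python
def before_marker(text, marker='apple'):
    # partition() returns (text, '', '') when the marker is absent,
    # so the first element is then the whole, unchanged string.
    return text.partition(marker)[0]

mylist = ['pearsapple', 'grapevinesapple', 'sinkandapple', 'test']
new = [before_marker(x) for x in mylist]
print(new)  # ['pears', 'grapevines', 'sinkand', 'test']
```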

Append a portion of a string to a list

I was wondering how one can append a portion of a string to a list? Is there an option of both appending based on the position of characters in the string, and another option that is able to take a specific character of interest? For instance, If I had the string "2 aikjhakihaiyhgikjewh", could I not only append the part of the string that was in positions 3-4 but also append the "2" as well? I'm a beginner, so I'm still kinda new to this python thing. Thanks.
You can use slicing to reference a portion of a string like this:
>>> s = 'hello world'
>>> s[2:5]
'llo'
You can append to a list using the append method:
>>> l = [1,2,3,4]
>>> l.append('Potato')
>>> l
[1, 2, 3, 4, 'Potato']
The best way to learn these things in Python is to open an interactive shell and start typing commands. I suggest IPython, as it provides autocompletion, which is great for exploring an object's methods and properties.
You can append a portion of a string to a list by using the .append method.
List = []
List.append("text")
To append several parts of the string you can do the following:
List = []
String = "2 asdasdasd"
List.append(String[0:2] + String[3:5])
This appends both sections of the string, joined together as a single item.
Use slicing to accomplish what you are looking for:
mystr = "2 aikjhakihaiyhgikjewh"
lst = list(mystr[0] + mystr[3:5])
print lst
This runs as:
>>> mystr = "2 aikjhakihaiyhgikjewh"
>>> lst = list(mystr[0] + mystr[3:5])
>>> print lst
['2', 'i', 'k']
>>>
Slicing works by taking certain parts of an object:
>>> mystr
'2 aikjhakihaiyhgikjewh'
>>> mystr[0]
'2'
>>> mystr[-1]
'h'
>>> mystr[::-1]
'hwejkighyiahikahjkia 2'
>>> mystr[:-5]
'2 aikjhakihaiyhgi'
>>>
You are describing 2 separate operations: slicing a string, and extending a list. Here is how you can put the two together:
In [26]: text = "2 aikjhakihaiyhgikjewh"
In [27]: text[0], text[3:5]
Out[27]: ('2', 'ik')
In [28]: result = []
In [29]: result.extend((text[0], text[3:5]))
In [30]: result
Out[30]: ['2', 'ik']
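The difference between append and extend matters here, because a string is itself iterable. A short sketch using the string from the question (the list names are just for illustration):

```python
text = "2 aikjhakihaiyhgikjewh"

appended = []
appended.append(text[3:5])         # adds the slice as one item
print(appended)                    # ['ik']

extended = []
extended.extend(text[3:5])         # iterates over the slice, adding each character
print(extended)                    # ['i', 'k']

both = []
both.extend((text[0], text[3:5]))  # a tuple of two strings adds two items
print(both)                        # ['2', 'ik']
```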

Applying regex on each item of a list in Python

If I apply this regex:
re.split(r"(^[^aeiou]+)(?=[aeiouy])", "janu")
on the string "janu", it gives the following result: ['', 'j', 'anu']
Now I want to apply this regex on the following list to get the similar results for each item as above. Can a for loop be used, and if yes, how?
lista = ['janu', 'manu', 'tanu', 'banu']
You can use a list comprehension:
>>> from re import split
>>> lista = ['janu', 'manu', 'tanu', 'banu']
>>> [split("(^[^aeiou]+)(?=[aeiouy])", x)[1]+"doc" for x in lista]
['jdoc', 'mdoc', 'tdoc', 'bdoc']
>>>
Edit regarding comment:
This will work:
>>> from re import split
>>> lista = ['janu', 'manu', 'tanu', 'banu']
>>> listb = []
>>> for item in lista:
...     data = split("(^[^aeiou]+)(?=[aeiouy])", item)
...     listb.append(data[2]+data[1]+"doc")
...
>>> listb
['anujdoc', 'anumdoc', 'anutdoc', 'anubdoc']
>>>
Use a list comprehension:
[re.split(r"(^[^aeiou]+)(?=[aeiouy])", i) for i in lista]
You can use a for loop, but this is considered the Pythonic way to do things.
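Keep in mind that re.split returns just [item] when the pattern does not match (for example, a word that starts with a vowel), so indexing the result with [1] or [2] would raise IndexError. A defensive sketch, assuming you want the leading consonants and the remainder as a pair; the helper name split_onset and the extra item 'anu' are made up for illustration:

```python
import re

# Same pattern as in the question, compiled once and reused.
ONSET = re.compile(r"^([^aeiou]+)(?=[aeiouy])")

def split_onset(word):
    """Return (leading consonants, rest); the first part is '' if there is no match."""
    m = ONSET.match(word)
    if m is None:
        return "", word
    return m.group(1), word[m.end():]

lista = ['janu', 'manu', 'tanu', 'banu', 'anu']
pairs = [split_onset(w) for w in lista]
print(pairs)
# [('j', 'anu'), ('m', 'anu'), ('t', 'anu'), ('b', 'anu'), ('', 'anu')]
```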

Remove strings from a list that contains numbers in python [duplicate]

Is there a short way to remove all strings that contain numbers from a list?
For example
my_list = [ 'hello' , 'hi', '4tim', '342' ]
would return
my_list = [ 'hello' , 'hi']
Without regex:
[x for x in my_list if not any(c.isdigit() for c in x)]
I find using isalpha() the most elegant, but it will also remove items that contain other non-alphabetic characters:
Return true if all characters in the string are alphabetic and there is at least one character, false otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”
my_list = [item for item in my_list if item.isalpha()]
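To make the difference between the two filters concrete, here is a small comparison. The items 'hi there' and 'x-ray' are added purely for illustration and are not part of the question's data.

```python
my_list = ['hello', 'hi there', 'x-ray', '4tim', '342']

# Drops only the strings that contain at least one digit.
no_digits = [s for s in my_list if not any(c.isdigit() for c in s)]

# Keeps only the strings made up entirely of letters.
only_letters = [s for s in my_list if s.isalpha()]

print(no_digits)     # ['hello', 'hi there', 'x-ray']
print(only_letters)  # ['hello']
```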
I'd use a regex:
import re
my_list = [s for s in my_list if not re.search(r'\d',s)]
In terms of timing, using a regex is significantly faster on your sample data than the isdigit solution. Admittedly, it's slower than isalpha, but the behavior is slightly different with punctuation, whitespace, etc. Since the problem doesn't specify what should happen with those strings, it's not clear which is the best solution.
import re
my_list = [ 'hello' , 'hi', '4tim', '342' 'adn322' ]
def isalpha(mylist):
    return [item for item in mylist if item.isalpha()]
def fisalpha(mylist):
    return filter(str.isalpha,mylist)
def regex(mylist,myregex = re.compile(r'\d')):
    return [s for s in mylist if not myregex.search(s)]
def isdigit(mylist):
    return [x for x in mylist if not any(c.isdigit() for c in x)]
import timeit
for func in ('isalpha','fisalpha','regex','isdigit'):
    print func,timeit.timeit(func+'(my_list)','from __main__ import my_list,'+func)
Here are my results:
isalpha 1.80665302277
fisalpha 2.09064006805
regex 2.98224401474
isdigit 8.0824341774
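The script above is Python 2. If you want to repeat the measurement on Python 3, a rough port could look like the following. The function names, the comma between '342' and 'adn322', and the iteration count are choices made for this sketch, and the numbers you get will differ from the ones listed above.

```python
import re
import timeit

# Sample data; the comma between '342' and 'adn322' is assumed here.
my_list = ['hello', 'hi', '4tim', '342', 'adn322']

def with_isalpha(items):
    return [s for s in items if s.isalpha()]

def with_filter(items):
    # filter() is lazy in Python 3, so wrap it in list() to do the work.
    return list(filter(str.isalpha, items))

def with_regex(items, digit=re.compile(r'\d')):
    return [s for s in items if not digit.search(s)]

def with_isdigit(items):
    return [s for s in items if not any(c.isdigit() for c in s)]

timings = {}
for func in (with_isalpha, with_filter, with_regex, with_isdigit):
    # number is kept small here; raise it for steadier measurements.
    timings[func.__name__] = timeit.timeit(lambda: func(my_list), number=20000)

for name, seconds in timings.items():
    print(name, seconds)
```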
Try:
import re
my_list = [x for x in my_list if re.match("^[A-Za-z_-]*$", x)]
Sure, use the digits constant from the string module and test whether any of them occur.
We'll get a little fancy and just test the truthiness of the list comprehension; if it returns anything, there are digits in the string.
So:
import string
out_list = []
for item in my_list:
    if not [ char for char in item if char in string.digits ]:
        out_list.append(item)
And yet another slight variation:
>>> import re
>>> filter(re.compile('(?i)[a-z]').match, my_list)
['hello', 'hi']
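Note that match only tests the start of each string, so this pattern checks just the first character. To validate the whole string, anchor the pattern (for example '(?i)[a-z]+$') and put any other characters that are valid (such as spaces or punctuation) in the character class.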
