This question already has answers here:
What is the best way to remove accents (normalize) in a Python unicode string?
(13 answers)
Closed 7 months ago.
I am removing accents and special characters from a DataFrame column, but the way I am doing it does not seem optimal to me. How can I improve it?
Thanks.
Code:
import pandas as pd
m = pd.read_excel('file.xlsx')
print(m)
m['hola']=m['hola'].str.replace(r"\W","")
m['hola']=m['hola'].str.replace(r"á","a")
m['hola']=m['hola'].str.replace(r"é","e")
m['hola']=m['hola'].str.replace(r"í","i")
m['hola']=m['hola'].str.replace(r"ó","o")
m['hola']=m['hola'].str.replace(r"ú","u")
m['hola']=m['hola'].str.replace(r"Á","A")
m['hola']=m['hola'].str.replace(r"É","E")
m['hola']=m['hola'].str.replace(r"Í","I")
m['hola']=m['hola'].str.replace(r"Ó","O")
m['hola']=m['hola'].str.replace(r"Ú","U")
print(m)
You could make a dictionary with the special characters as the keys and their replacements as the values:
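d = {}
d["á"] = "a"  # ... etc. for the other accented characters
x = "árwwwe"
# Loop over the original characters and swap any that have a replacement
for character in x:
    if character in d:
        x = x.replace(character, d[character])
print(x)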
Output:
arwwwe
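As an alternative to writing out every replacement by hand, here is a minimal sketch using only the standard library. It shows two options: a translation table built with str.maketrans (the same idea as the dictionary above, applied in one pass), and Unicode decomposition with unicodedata, which drops combining marks without listing each character. The helper name strip_accents is hypothetical, and the second approach assumes the accented characters are ones Unicode can decompose.

```python
import unicodedata

# Option 1: a translation table, one pass over the string.
# Both strings must have the same length; each character maps by position.
table = str.maketrans("áéíóúÁÉÍÓÚ", "aeiouAEIOU")
print("árwwwe".translate(table))  # arwwwe


# Option 2: decompose each character and drop the combining marks.
def strip_accents(text):
    """Return text without combining accent marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("ÁÉÍÓÚ áéíóú"))  # AEIOU aeiou
```

On a DataFrame column that holds only strings, the helper could be applied with something like m['hola'].map(strip_accents). Note also that in recent pandas versions str.replace treats the pattern literally unless regex=True is passed, so a pattern such as r"\W" may need that argument.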
This question already has answers here:
How to remove substring from string in Python 3
(2 answers)
Closed 2 years ago.
Is there a way to delete words from a string in Python if it doesn't have spaces? For example, if I have the string "WUBHELLOWUB", I want to remove "WUB". I tried
s = 'WUBHELLOWUB'
while 'WUB' in s:
    ind = s.find('WUB')
    s = s[:ind] + s[ind+1:]
print(s)
but it did not work.
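You can use a regex. Match only the word itself; a pattern such as \S*WUB\S* would also swallow the surrounding non-space characters and leave an empty string:
import re
pattern = r"WUB"
re.sub(pattern, '', 'WUBWUBHELLO')  # 'HELLO'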
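For a fixed word like this, a regex is not strictly needed. The sketch below shows str.replace, plus a corrected version of the slicing loop from the question: the original skipped only one character (ind+1) instead of the whole word. The function names are hypothetical.

```python
def remove_word(text, word):
    """Remove every occurrence of word in a single pass (hypothetical helper)."""
    return text.replace(word, "")


def remove_word_by_slicing(text, word):
    """Same idea with explicit slicing, skipping the full length of word."""
    while word in text:
        ind = text.find(word)
        # Skip len(word) characters, not just one
        text = text[:ind] + text[ind + len(word):]
    return text


print(remove_word("WUBHELLOWUB", "WUB"))             # HELLO
print(remove_word_by_slicing("WUBHELLOWUB", "WUB"))  # HELLO
```

The two can differ on inputs where removing one occurrence creates a new one: the loop keeps going until no occurrence is left, while str.replace makes a single pass.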
This question already has answers here:
Split binary number into groups of zeros and ones
(3 answers)
How to split a binary string into groups that containt only ones or zeros with Java regular expressions? [duplicate]
(5 answers)
Closed 2 years ago.
I need to split ones and zeros in any binary representation like this.
code = 10001100
output_list = [1,000,11,00]
I couldn't find the pattern.
I am using Python 3.x.
You don't really need a regex for this problem. You can use groupby from itertools to do this:
import itertools
code = "10001100"
gs = [list(g) for _, g in itertools.groupby(code)]
If you want to use regex, then:
import re
code = r'10001100'
output_list = re.findall(r'(0+|1+)', code)
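Note that list(g) in the groupby version yields lists of single characters. If the groups are wanted as strings, as in the question's expected output, a small variation joins each group. This sketch compares it with the regex result:

```python
import itertools
import re

code = "10001100"

# groupby yields runs of equal characters; join each run back into a string
groups = ["".join(g) for _, g in itertools.groupby(code)]
print(groups)  # ['1', '000', '11', '00']

# The regex version produces the same list for a string of 0s and 1s
assert groups == re.findall(r"0+|1+", code)
```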
Regex is not required. Here is a way to do it with a list comprehension; note that it produces a single comma-separated string inside a list:
code = '10001100'
output_list = []
interim_list = [code[i] + ',' if i != len(code)-1 and code[i] != code[i+1] else code[i] for i in range(len(code))]
output_list.append(''.join(interim_list))
print(output_list)
Output:
['1,000,11,00']
This question already has answers here:
Removing duplicate characters from a string
(15 answers)
Closed 3 years ago.
I have a string like 'AABA'. I want to remove repeated occurrences of each character, keeping only the first one. The result should be 'AB'.
Sample Input: AABA
Sample Output: AB
If the order doesn't matter, use a set.
word = "AABA"
new_word = "".join(set(word))
If the order DOES matter, use an Ordered Dictionary (from collections library).
from collections import OrderedDict
word = "AABA"
new_word = "".join(OrderedDict.fromkeys(word))
EDIT: Consult the link posted in the comments above - it gives the same advice, but explains it better.
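As a side note on the ordered approach: since Python 3.7, plain dict preserves insertion order, so the same idea works without importing OrderedDict. A minimal sketch, with a hypothetical helper name:

```python
def unique_chars(word):
    """Keep the first occurrence of each character, in order (hypothetical helper)."""
    # dict keys are unique and, from Python 3.7 on, keep insertion order
    return "".join(dict.fromkeys(word))


print(unique_chars("AABA"))  # AB
```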
This question already has answers here:
Extracting only characters from a string in Python
(7 answers)
How do you filter a string to only contain letters?
(6 answers)
Closed 6 years ago.
I was trying to figure out how to list just the letters in a string and ignore the numbers or any other characters. I figured out how to do it using the for loop, but I couldn't find out how to do it without using the for loop.
This is how I used the for loop:
>>> a = "Today is April 1, 2016"
>>> for i in a:
...     if i.isalpha():
...         list(i)
Any help will be appreciated!
You can use filter for this:
>>> ''.join(filter(str.isalpha, a))
'TodayisApril'
This gives the distinct letters as a list; note that set drops duplicates and does not preserve order:
list(set([x for x in a if x.isalpha()]))
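If the goal is a list of every letter in its original order, without an explicit for statement, here is a small sketch of two options. The regex variant assumes only ASCII letters matter; str.isalpha also accepts non-ASCII letters.

```python
import re

a = "Today is April 1, 2016"

# filter keeps the characters for which str.isalpha returns True
letters = list(filter(str.isalpha, a))
print("".join(letters))  # TodayisApril

# Regex alternative, limited to ASCII letters (an assumption of this sketch)
ascii_letters = re.findall(r"[A-Za-z]", a)
assert ascii_letters == letters
```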
This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 6 years ago.
Simple, simple question, hope you can help me:
How do I add a string to a regex?
Say:
d = '\d\d\d'
mo = re.compile(r #d goes here)
Pasting it, separating it with a comma, or with a plus gives me errors.
Normally, as you know, it would be re.compile(r'\d\d\d')
Is this what you are looking for?
d = r"\d\d\d"
re.compile(d)
Maybe more intuitive:
d = r"\d{3}"
# match a digit exactly three times, consecutively
re.compile(d)
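The underlying point is that the r prefix only affects how a string literal is written in source code; once the pattern is stored in a variable it is an ordinary str and can be passed or concatenated like any other. A brief sketch follows, where the sample inputs are made up for illustration:

```python
import re

d = r"\d{3}"

# The raw literal and the escaped literal are the same string
assert r"\d" == "\\d"

# A pattern held in a variable can be concatenated with other pattern pieces
phone = re.compile(d + "-" + d)
print(bool(phone.fullmatch("123-456")))  # True

# Text that should match literally can be escaped before being combined
price = re.compile(re.escape("$") + d)
print(price.search("total: $250").group())  # $250
```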