Spaces in the middle of each row in CSV file - python

I have a problem with my CSV file. It is being saved with spaces in the middle of each row, and I don't know why. How do I solve this problem? I'm asking because I haven't found any answer or solution to this.
Here is the code:
import csv
import random

def dict_ID_aeropuertos():
    with open('AeropuertosArg.csv') as archivo_csv:
        leer = csv.reader(archivo_csv)
        dic_ID = {}
        for linea in leer:
            dic_ID.setdefault(linea[0],linea[1])
        archivo_csv.close()
    return dic_ID

def ruteoAleatorio():
    dic_ID = dict_ID_aeropuertos()
    lista_ID = list(dic_ID.keys())
    cont = 0
    lista_rutas = []
    while (cont < 50):
        r1 = random.choice(lista_ID)
        r2 = random.choice(lista_ID)
        if (r1 != r2):
            t = (r1,r2)
            if (t not in lista_rutas):
                lista_rutas.append(t)
                cont += 1
    with open('rutasAeropuertos.csv', 'w') as archivo_rutas:
        escribir = csv.writer(archivo_rutas)
        escribir.writerows(lista_rutas)
        archivo_rutas.close()

ruteoAleatorio()
Here is the CSV file AeropuertosArg.csv:
1,Aeroparque Jorge Newbery,Ciudad Autonoma de Buenos Aires,Ciudad Autonoma de Buenos Aires,-34.55803,-58.417009
2,Aeropuerto Internacional Ministro Pistarini,Ezeiza,Buenos Aires,-34.815004,-58.5348284
3,Aeropuerto Internacional Ingeniero Ambrosio Taravella,Cordoba,Cordoba,-31.315437,-64.21232
4,Aeropuerto Internacional Gobernador Francisco Gabrielli,Ciudad de Mendoza,Mendoza,-32.827864,-68.79849
5,Aeropuerto Internacional Teniente Luis Candelaria,San Carlos de Bariloche,Rio Negro,-41.146714,-71.16203
6,Aeropuerto Internacional de Salta Martin Miguel de Guemes,Ciudad de Salta,Salta,-24.84423,-65.478412
7,Aeropuerto Internacional de Puerto Iguazu,Puerto Iguazu,Misiones,-25.731778,-54.476181
8,Aeropuerto Internacional Presidente Peron,Ciudad de Neuquen,Neuquen,-38.952137,-68.140484
9,Aeropuerto Internacional Malvinas Argentinas,Ushuaia,Tierra del Fuego,-54.842237,-68.309701
10,Aeropuerto Internacional Rosario Islas Malvinas,Rosario,Santa Fe,-32.916887,-60.780391
11,Aeropuerto Internacional Comandante Armando Tola,El Calafate,Santa Cruz,-50.283977,-72.053641
12,Aeropuerto Internacional General Enrique Mosconi,Comodoro Rivadavia,Chubut,-45.789435,-67.467498
13,Aeropuerto Internacional Teniente General Benjamin Matienzo,San Miguel de Tucuman,Tucuman,-26.835888,-65.108361
14,Aeropuerto Comandante Espora,Bahia Blanca,Buenos Aires,-38.716152,-62.164955
15,Aeropuerto Almirante Marcos A. Zar,Trelew,Chubut,-43.209957,-65.273405
16,Aeropuerto Internacional de Resistencia,Resistencia,Chaco,-27.444926,-59.048739
17,Aeropuerto Internacional Astor Piazolla,Mar del Plata,Buenos Aires,-37.933205,-57.581518
18,Aeropuerto Internacional Gobernador Horacio Guzman,San Salvador de Jujuy,Jujuy,-24.385987,-65.093755
19,Aeropuerto Internacional Piloto Civil Norberto Fernandez,Rio Gallegos,Santa Cruz,-51.611788,-69.306315
20,Aeropuerto Domingo Faustino Sarmiento,San Juan,San Juan,-31.571814,-68.422568

Your problem is that the csv module's writerows has its own "newline" logic, which interferes with the default newline behaviour of open():
Fix like this:
with open('rutasAeropuertos.csv', 'w', newline='' ) as archivo_rutas:
# ^^^^^^^^^^
This is also documented in the documentation for csv.writer(csvfile, dialect='excel', **fmtparams):
If csvfile is a file object, it should be opened with newline='' [1]
with a link to a footnote telling you:
[1] If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
You are using Windows, which uses \r\n line endings; that extra \r is what leads to your "wrong" output.
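As a hedged illustration (assuming Windows; the exact bytes depend on the platform), a minimal repro shows the doubled carriage return that ends up on disk when newline='' is left out:

import csv

# Sketch only: the csv writer ends each row with '\r\n'; in text mode on Windows
# the '\n' gets translated to '\r\n' again, so '\r\r\n' lands in the file.
with open('demo.csv', 'w') as f:               # no newline='' -> extra \r expected
    csv.writer(f).writerow(['1', '2'])

with open('demo.csv', 'rb') as f:
    print(f.read())                            # expect something like b'1,2\r\r\n'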
Full code with some optimizations:
import csv
import random

def dict_ID_aeropuertos():
    with open('AeropuertosArg.csv') as archivo_csv:
        leer = csv.reader(archivo_csv)
        dic_ID = {}
        for linea in leer:
            dic_ID.setdefault(linea[0],linea[1])
    return dic_ID

def ruteoAleatorio():
    dic_ID = dict_ID_aeropuertos()
    lista_ID = list(dic_ID.keys())
    lista_rutas = set()                        # a set only holds unique values
    while len(lista_rutas) < 50:               # simply check the length of the set
        r1, r2 = random.sample(lista_ID, k=2)  # draw 2 different ones
        lista_rutas.add((r1, r2))              # you can not add duplicates, no need to check
    with open('rutasAeropuertos.csv', 'w', newline='') as archivo_rutas:
        escribir = csv.writer(archivo_rutas)
        escribir.writerows(lista_rutas)

ruteoAleatorio()
Output:
9,3
16,10
15,6
[snipp lots of values]
13,14
13,7
20,4
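If you want to double-check the written file, a quick read-back sketch (assuming the rutasAeropuertos.csv produced by the code above) should report 50 rows and no empty ones:

import csv

with open('rutasAeropuertos.csv', newline='') as f:
    filas = list(csv.reader(f))

print(len(filas))                          # expect 50
print(any(fila == [] for fila in filas))   # expect False: no blank rows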

Related

How to remove special characters from Pandas DF?

I have a Python BOT that queries a database, saves the output to a pandas DataFrame, and writes the data to an Excel template.
Yesterday the data was not saved to the Excel template because one of the fields in a record contained the following characters:
", *, /, (, ), :,\n
Pandas failed to save the data to the file.
This is the code that creates the dataframe:
upload_df = sql_df.copy()
This code prepares the template file with time/date stamp
src = file_name.format(val="")
date_str = " " + str(datetime.today().strftime("%d%m%Y%H%M%S"))
dst_file = file_name.format(val=date_str)
copyfile(src, os.path.join(save_path, dst_file))
work_book = load_workbook(os.path.join(save_path, dst_file))
and this code saves the dataframe to the excel file
writer = pd.ExcelWriter(os.path.join(save_path, dst_file), engine='openpyxl')
writer.book = work_book
writer.sheets = {ws.title: ws for ws in work_book.worksheets}
upload_df.to_excel(writer, sheet_name=sheet_name, startrow = 1, index=False, header = False)
writer.save()
My question is: how can I clean the special characters from a specific column [description] in my dataframe before I write it to the Excel template?
I have tried:
upload_df['Name'] = upload_df['Name'].replace(to_replace= r'\W',value=' ',regex=True)
But this removes everything, not just a certain type of special character.
I guess we could use a list of items, iterate through the list, and run the replace, but is there a more Pythonic solution?
Adding the data that corrupted the Excel file and prevented pandas from writing the information:
This is an example of the text that created the problem. I changed a few normal characters to keep privacy, but it is the same data that corrupted the file:
"""*** CRQ.: N/A *** DF2100109 SADSFO CADSFVO EN SERWO JL1047 EL
PUWERTDTO EL DIA 08-09-2021 A LAS 11:00 HRS. PERA REALIZAR TRWEROS
DE AWERWRTURA DE SITIO PARA MWERWO PWERRVO.
RWERE DE WERDDFF EN SITIO : ING. JWER ERR3WRR ERRSDFF DFFF :RERFD DDDDF : 33 315678905. 1) ADFDSF SDFDF Y DFDFF DE DFDF Y DFFF XXCVV Y
CXCVDDÓN DE DFFFD EN DFDFFDD 2) EN SDFF DE REQUERIRSE: SDFFDF Y SDFDFF
DE EEERRW HJGHJ (ACCESO, GHJHJ, GHJHJ, RRRTTEE Y ACCESO A LA YUYUGGG
RETIRAR JJGHJGHGH
CONSIDERACIONES FGFFDGFG: SE FGGG LLAVE DE FF LLEVAR FFDDF PARA ERTBGFY Y SOLDAR.""S: SE GDFGDFG LLAVE DE ERTFFFGG, FGGGFF EQUIPO
PARA DFGFGFGFG Y SOLDAR."""
As some of the special characters to remove are regex meta-characters, we have to escape them before we can replace them with empty strings using regex.
You can automate escaping these special characters with re.escape, as follows:
import re
# put the special characters in a list
special_char = ['"', '*', '/', '(', ')', ':', '\n']
special_char_escaped = list(map(re.escape, special_char))
The resultant list of escaped special characters is as follows:
print(special_char_escaped)
['"', '\\*', '/', '\\(', '\\)', ':', '\\\n']
Then, we can remove the special characters with .replace() as follows:
upload_df['Name'] = upload_df['Name'].replace(special_char_escaped, '', regex=True)
Demo
Data Setup
upload_df = pd.DataFrame({'Name': ['"abc*/(xyz):\npqr']})
Name
0 "abc*/(xyz):\npqr
Run the code:
import re
# put the special characters in a list
special_char = ['"', '*', '/', '(', ')', ':', '\n']
special_char_escaped = list(map(re.escape, special_char))
upload_df['Name'] = upload_df['Name'].replace(special_char_escaped, '', regex=True)
Output:
print(upload_df)
Name
0 abcxyzpqr
Edit
With your edited text sample, here is the result after removing the special characters:
print(upload_df)
Name
0 CRQ. NA DF2100109 SADSFO CADSFVO EN SERWO JL1047 EL PUWERTDTO EL DIA 08-09-2021 A LAS 1100 HRS. PERA REALIZAR TRWEROS DE AWERWRTURA DE SITIO PARA MWERWO PWERRVO.
1 RWERE DE WERDDFF EN SITIO ING. JWER ERR3WRR ERRSDFF DFFF RERFD DDDDF 33 315678905. 1 ADFDSF SDFDF Y DFDFF DE DFDF Y DFFF XXCVV Y CXCVDDÓN DE DFFFD EN DFDFFDD 2 EN SDFF DE REQUERIRSE SDFFDF Y SDFDFF DE EEERRW HJGHJ ACCESO, GHJHJ, GHJHJ, RRRTTEE Y ACCESO A LA YUYUGGG
2 3. RETIRAR JJGHJGHGH
3 CONSIDERACIONES FGFFDGFG SE FGGG LLAVE DE FF LLEVAR FFDDF PARA ERTBGFY Y SOLDAR.S SE GDFGDFG LLAVE DE ERTFFFGG, FGGGFF EQUIPO PARA DFGFGFGFG Y SOLDAR.
The special characters listed in your question have all been removed. Please check whether it is ok now.
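As an aside (not part of the original answer), when everything you want to drop is a literal single character, str.translate avoids regex escaping entirely; a minimal sketch, assuming the same upload_df and 'Name' column:

# Build a translation table that deletes each unwanted character.
special_char = ['"', '*', '/', '(', ')', ':', '\n']
table = str.maketrans('', '', ''.join(special_char))

upload_df['Name'] = upload_df['Name'].str.translate(table)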
You could use the following (pass characters as a list to the method parameter):
upload_df['Name'] = upload_df['Name'].replace(
to_replace=['"', '*', '/', '()', ':', '\n'],
value=' '
)
Use str.replace:
>>> df
Name
0 (**Hello\nWorld:)
>>> df['Name'] = df['Name'].str.replace(r'''["*/():\\n]''', '', regex=True)
>>> df
Name
0 HelloWorld
Maybe you want to replace line breaks with whitespace:
>>> df = df.replace({'Name': {r'["*/():]': '',
r'\\n': ' '}}, regex=True)
>>> df
Name
0 Hello World

How can I modify an "if condition" in order to apply it to different lists at the same time?

I wrote a script to extract sentences from a huge set that contain a particular pattern. The problem lies in the fact that, for some patterns, I check the value of the attribute at the beginning or end of the pattern to see whether the word is present in a particular list. I have 4 dictionaries, each with 2 lists of positive and negative words. So far I wrote the script and I am able to use my function with one dictionary. I am wondering how I can improve the function so that I can use it with the 4 dictionaries at the same time, without duplicating the block that loops over the dictionary.
I give an example with two dictionaries (since the script is quite long, I make a small example with all the necessary elements):
import os
import pandas as pd
import spacy
import spacy.attrs
from spacy.attrs import POS
from spacy import displacy
from spacy.lang.fr import French
from spacy.tokenizer import Tokenizer
from spacy.util import compile_prefix_regex, compile_infix_regex, compile_suffix_regex
from spacy.lemmatizer import Lemmatizer
from spacy.matcher import Matcher

nlp = spacy.load("fr_core_news_md")
##################### List of lexicon
# Lexique Diko
lexicon = open(os.path.join('/h/Ressources/Diko.txt'), 'r', encoding='utf-8')
data = pd.read_csv(lexicon, sep=";", header=None)
data.columns = ["id", "terme", "pol"]
pol_diko_pos = data.loc[data.pol =='positive', 'terme']
liste_pos_D = list(pol_diko_pos)
print(liste_pos_D[1])
pol_diko_neg = data.loc[data.pol =='negative', 'terme']
liste_neg_D = list(pol_diko_neg)
#print(type(liste_neg))
# Lexique Polarimots
lexicon_p = open(os.path.join('/h/Ressources/polarimots.txt'), 'r', encoding='utf-8')
data_p = pd.read_csv(lexicon_p, sep="\t", header=None)
#data.columns = ["terme", "pol", "pos", "degre"]
data_p.columns = ["ind", "terme", "cat", "pol", "fiabilité"]
pol_polarimot_pos = data_p.loc[data_p.pol =='POS', 'terme']
liste_pos_P = list(pol_polarimot_pos)
print(liste_pos_P[1])
pol_polarimot_neg = data_p.loc[data_p.pol =='NEG', 'terme']
liste_neg_P = list(pol_polarimot_neg)
#print(type(liste_neg))
# ############################# Lists
sentence_not_extract_lexique_1 =[] #List of all sentences without the specified pattern
sentence_extract_lexique_1 = [] #list of sentences which the pattern[0] is present in the first lexicon
sentence_not_extract_lexique_2 =[] #List of all sentences without the specified pattern
sentence_extract_lexique_2 = [] #list of sentences which the pattern[0] is present in the second lexicon
list_token_pos = [] #list of the token found in the lexique
list_token_neg = [] #list of the token found in the lexique
list_token_not_found = [] #list of the token not found in the lexique
#PATTERN
pattern1 = [{"POS": {"IN": ["VERB", "AUX","ADV","NOUN","ADJ"]}}, {"IS_PUNCT": True, "OP": "*"}, {"LOWER": "mais"} ]
pattern1_tup = (pattern1, 1, True)
pattern3 = [{"LOWER": {"IN": ["très","trop"]}},
{"POS": {"IN": ["ADV","ADJ"]}}]
pattern3_tup = (pattern3, 0, True)
pattern4 = [{"POS": "ADV"}, # adverbe de négation
{"POS": "PRON","OP": "*"},
{"POS": {"IN": ["VERB", "AUX"]}},
{"TEXT": {"IN": ["pas", "plus", "aucun", "aucunement", "point", "jamais", "nullement", "rien"]}},]
pattern4_tup = (pattern4, None, False)
#Tuple of pattern
pattern_list_tup =[pattern1_tup, pattern3_tup, pattern4_tup]
pattern_name = ['first', 'second', 'third', 'fourth']
length_of_list = len(pattern_list_tup)
print('length', length_of_list)
#index of the value of attribute to check in the lexicon
value_of_attribute = [0,-1,-1]
# List of lexicon to use
lexique_1 = [liste_neg_D, liste_pos_D]
lexique_2 = [liste_neg_P, liste_pos_P]
# text (example of some sentences)
file = ["Le film est superbe mais cette édition DVD est nulle !",
"J'allais dire déplorable, mais je serais peut-être un peu trop extrême.",
"Hélas, l'impression de violence, bien que très bien rendue, ne sauve pas cette histoire gothique moderne de la sécheresse scénaristique, le tout couvert d'un adultère dont le propos semble être gratuit, classique mais intéressant...",
"Tout ça ne me donne pas envie d'utiliser un pieu mais plutôt d'aller au pieu (suis-je drôle).",
"Oui biensur, il y a la superbe introduction des parapluies au debut, et puis lorsqu il sent des culs tout neufs et qu il s extase, j ai envie de faire la meme chose apres sur celui de ma voisine de palier (ma voisine de palier elle a un gros cul, mais j admets que je voudrais bien lui foute mon tarin), mais c est tout, apres c est un film tres noir, lent et qui te plonge dans le depression.",
"Et bien hélas ce DVD ne m'a pas appris grand chose par rapport à la doc des agences de voyages et la petite dame qui fait ses dessins est bien gentille mais tout tourne un peu trop autour d'elle.",
"Au final on passe de l'un a l'autre sans subtilité, et on n'arrive qu'à une caricature de plus : si Kechiche avait comme but initial déclaré de fustiger les préjugés, c'est le contraire qui ressort de ce ''film'' truffé de clichés très préjudiciables pour les quelques habitants de banlieue qui ne se reconnaîtront pas dans cette lourde farce.",
"-ci écorche les mots, les notes... mais surtout nos oreilles !"]
# Loop to check each sentence and extract the sentences with the specified pattern from above
for pat in range(0, length_of_list):
    matcher = Matcher(nlp.vocab)
    matcher.add("matching_2", None, pattern_list_tup[pat][0])
    # print(pat)
    # print(pattern_list_tup[pat][0])
    for sent in file:
        doc = nlp(sent)
        matches = matcher(doc)
        for match_id, start, end in matches:
            span = doc[start:end].lemma_.split()
            #print(f"{pattern_name[pat]} pattern found: {span}")
This is the part I want to modify in order to use it with another dictionary. The goal is to be able to retrieve the sentences extracted by 4 different dictionaries, make a comparison, and then check which sentences are present in more than two lists.
            # Condition to use the lexicon and extract the sentence
            if (pattern_list_tup[pat][2]):
                if (span[value_of_attribute[pat]] in lexique_1[pattern_list_tup[pat][1]]):
                    if sent not in sentence_extract_lexique_1:
                        sentence_extract_lexique_1.append(sent)
                    if (pattern_list_tup[pat][1] == 1):
                        list_token_pos.append(span[value_of_attribute[pat]])
                    if (pattern_list_tup[pat][1] == 0):
                        list_token_neg.append(span[value_of_attribute[pat]])
                else:
                    list_token_not_found.append(span[value_of_attribute[pat]])  # the text form is not present in the lexicon, need the lemma form
                    sentence_not_extract_lexique_1.append(sent)
            else:
                if sent not in sentence_extract_lexique_1:
                    sentence_extract_lexique_1.append(sent)

print(len(sentence_extract_lexique_1))
print(sentence_extract_lexique_1)
One solution I found is to duplicate the code above and change the name of the list where the sentences are stored, but since I have 2 dictionaries (actually 4 in the original), duplicating would make the code much longer. Is there a way to combine looping over the dictionaries and append the results to the right list? For example, when I use lexique_1, all the extracted sentences are sent to "sentence_extract_lexique_1", and so on for the others.
In my opinion, try using an if-elif-else chain, or just the if-elif part, since the elif statement catches the specific condition of interest — here, the specific case you want to compare and check against the sentences. Keep in mind that an if-elif-else chain is a good method, but it only works when you need exactly one test to pass: Python finds the first test that passes and skips the rest. It is efficient and lets you test for one specific condition.
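Beyond that, one way to avoid duplicating the block for each lexicon (a minimal sketch, not the original poster's code: the helper name and the grouping are hypothetical, and the lexicons are assumed to be the plain word lists built earlier) is to keep each lexicon pair together with its output lists and loop over them:

# Hypothetical grouping: label -> (lexicon pair [negative, positive], extracted, not extracted)
lexiques = {
    'lexique_1': ([liste_neg_D, liste_pos_D], [], []),
    'lexique_2': ([liste_neg_P, liste_pos_P], [], []),
}

def check_span(span_words, index, polarity, lexicon_pair, extracted, not_extracted, sent):
    # Append sent to extracted/not_extracted depending on whether the chosen word
    # appears in the requested polarity list of this lexicon.
    word = span_words[index]
    if word in lexicon_pair[polarity]:
        if sent not in extracted:
            extracted.append(sent)
    else:
        not_extracted.append(sent)

# Inside the matcher loop, when pattern_list_tup[pat][2] is True, call the helper once
# per lexicon instead of copying the block:
# for name, (lexicon_pair, extracted, not_extracted) in lexiques.items():
#     check_span(span, value_of_attribute[pat], pattern_list_tup[pat][1],
#                lexicon_pair, extracted, not_extracted, sent)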

How to remove newline from end and beginning of every list element [python]

I've got the following code:
from itemspecs import itemspecs
x = itemspecs.split('###')
res = []
for item in x:
    res.append(item.split('##'))
print(res)
This imports a string from another document and gives me the following:
[['\n1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?\n',
'\nA. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlanders
onderschept\n', '\nB\n', '\nAntwoord B De Nederlandse vlootvoogd werd hierdoor bekend.\n'],
['\n2\nIn welk land ligt Upernavik?\n', '\nA. Antartica\nB. Canada\nC. Rusland\nD.
Groenland\nE. Amerika\n', '\nD\n', '\nAntwoord D Het is een dorp in Groenland met 1224
inwoners.\n']]
But now I want to remove the \n from the beginning and end of every element in this list. How can I do this?
You can do that by stripping "\n" as follows:
a = "\nhello\n"
stripped_a = a.strip("\n")
So, what you need to do is iterate through the list and apply strip to each string, as shown below:
res_1 = []
for i in res:
    tmp = []
    for j in i:
        tmp.append(j.strip("\n"))
    res_1.append(tmp)
The above answer only removes \n from the start and end. If you want to remove all the newlines in a string, use .replace("\n", " ") as shown below:
res_1 = []
for i in res:
    tmp = []
    for j in i:
        tmp.append(j.replace("\n", " "))
    res_1.append(tmp)
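For reference (not part of the original answer), the same nested loops can be written as a nested list comprehension; a minimal sketch using the res list from the question:

# Strip leading/trailing newlines from every string in the nested list.
res_1 = [[s.strip("\n") for s in sub] for sub in res]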
strip() is the easiest method in this case, but if you want to do any advanced text processing in the future, it isn't a bad idea to learn regular expressions:
import re
from pprint import pprint
l = [['\n1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?\n',
'\nA. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlandersonderschept\n',
'\nB\n',
'\nAntwoord B De Nederlandse vlootvoogd werd hierdoor bekend.\n'],
['\n2\nIn welk land ligt Upernavik?\n',
'\nA. Antartica\nB. Canada\nC. Rusland\nD.Groenland\nE. Amerika\n',
'\nD\n', '\nAntwoord D Het is een dorp in Groenland met 1224inwoners.\n']]
l = [[re.sub(r'^([\s]*)|([\s]*)$', '', j)] for i in l for j in i]
pprint(l, width=120)
Output:
[['1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?'],
['A. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlandersonderschept'],
['B'],
['Antwoord B De Nederlandse vlootvoogd werd hierdoor bekend.'],
['2\nIn welk land ligt Upernavik?'],
['A. Antartica\nB. Canada\nC. Rusland\nD.Groenland\nE. Amerika'],
['D'],
['Antwoord D Het is een dorp in Groenland met 1224inwoners.']]

Dealing with special characters in a pandas DataFrame's column name

I am importing an Excel worksheet that has the following column name:
N° Pedido
1234
6424
4563
The column name has a special character (°). Because of that, I can't merge this with another DataFrame or rename the column. I don't get any error message; the name just stays the same. What should I do?
This is the code I am using and the result of the Dataframes:
import pandas as pd
import numpy as np
# Importando Planilhas
CRM = pd.ExcelFile(r'C:\Users\Michel\Desktop\Relatorio de Vendas\relatorio_vendas_CRM.xlsx', encoding='utf-8')
protheus = pd.ExcelFile(r'C:\Users\Michel\Desktop\Relatorio de Vendas\relatorio_vendas_protheus.xlsx', encoding='utf-8')
#transformando em Data Frame
df_crm = CRM.parse('190_pedido_export (33)')
df_protheus = protheus.parse('Relatorio de Pedido de Venda')
# Transformando Campos em float o protheus
def turn_to_float(x):
    return np.float(x)
df_protheus["TES"] = df_protheus["TES"].apply(turn_to_float)
df_protheus["Qtde"] = df_protheus["Qtde"].apply(turn_to_float)
df_protheus["Valor"] = df_protheus["Valor"].apply(turn_to_float)
#Tirando Tes de não venda do protheus
# tirando valores com código errado 6
df_protheus_1 = df_protheus[df_protheus.TES != 513.0]
df_protheus_2 = df_protheus_1[df_protheus_1.TES != 576.0]
df_crm.columns = df_crm.columns.str.replace('N° Pedido', 'teste')
df_crm.columns
Orçamento  Origem  N° Pedido  Nº Pedido ERP  Estabelecimento  Tipo de Pedido  Classificação(Tipo)  Aplicação  Conta  CNPJ/CPF  Contato  ...  Aprovação Parcial  Antecipa Entrega  Desconto da Tabela de Preço  Desconto do Cliente  Desconto Informado  Observações  Observações NF  Vl Total Bruto  Vl Total Completo
0  20619.0  23125  NaN  Optitex  1 - Venda  NaN  Industrialização/Revenda  XAVIER E ARAUJO LTDA ME  7970626000170  NaN  ...  N  N  0  0  0
Note that I also tried other code for that same renaming step, with the same result:
#renomeando tabela para dar Merge
#df_crm['proc'] = df_crm['N\xc2\xb0 Pedido']
#df_crm['N Pedido'] = df_crm['N° Pedido']
#df_crm.drop('N° Pedido',inplace=True,axis=1)
#df_crm
#df_crm['N Pedido'] = df_crm['N° Pedido']
#df.drop('N° Pedido',inplace=True,axis=1)
#df_crm
#df_crm_1 = df_crm.rename(columns={"N°Pedido": "teste"})
#df_crm_1
Thanks for posting the link to the Google Sheet. I downloaded it and loaded it via pandas:
df = pd.read_excel(r'~\relatorio_vendas_CRM.xlsx', encoding = 'utf-8')
df.columns = df.columns.str.replace('°', '')
df.columns = df.columns.str.replace('º', '')
Note that the two replace statements are replacing different characters, although they look very similar.
Help from: Why do I get a SyntaxError for a Unicode escape in my file path?
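If other odd characters might show up later, a broader cleanup (a hedged sketch, not from the original answer) is to strip everything except word characters and spaces from all column names in one pass:

# Remove anything that is not a letter, digit, underscore or space from the headers,
# then trim leftover whitespace.
df.columns = df.columns.str.replace(r'[^\w\s]', '', regex=True).str.strip()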
I was able to copy the values into another column. You could try that:
df['N Pedido'] = df['N° Pedido']
df.drop('N° Pedido',inplace=True,axis=1)

python - cut a string into 2 lines

I'm looking for a one-liner (using str.join I think) to cut a long string in two when the number of words is too high. I have the beginning but I don't know how to insert the \n:
example = "Au Fil Des Antilles De La Martinique a Saint Barthelemy"
nmbr_word = len(example.split(" "))
if nmbr_word >= 6:
    # cut the string to have
    result = "Au Fil Des Antilles De La\nMartinique a Saint Barthelemy"
Thanks
How about using the textwrap module?
>>> import textwrap
>>> s = "Au Fil Des Antilles De La Martinique a Saint Barthelemy"
>>> textwrap.wrap(s, 30)
['Au Fil Des Antilles De La', 'Martinique a Saint Barthelemy']
>>> "\n".join(textwrap.wrap(s, 30))
'Au Fil Des Antilles De La\nMartinique a Saint Barthelemy'
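textwrap.fill does the wrap and the join in one call, if the joined string is all you need:
>>> textwrap.fill(s, 30)
'Au Fil Des Antilles De La\nMartinique a Saint Barthelemy'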
How about:
words = example.split(" ")
'\n'.join([' '.join(words[i:i+6]) for i in range(0, len(words), 6)])
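For example, on the question's string this produces the requested two lines (based on the snippet above):
>>> words = "Au Fil Des Antilles De La Martinique a Saint Barthelemy".split(" ")
>>> '\n'.join([' '.join(words[i:i+6]) for i in range(0, len(words), 6)])
'Au Fil Des Antilles De La\nMartinique a Saint Barthelemy'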
