Python pandas CSV file unicode error and other path problems - python

I'm trying to read a CSV file in Python. The code goes like this:
import pandas as pd
df = pd.read_csv("C:\Users\User\Desktop\Inan")
print(df.head())
However, it keeps showing the unicode error. I tried prefixing the string with r and changing the slashes in multiple ways, but it didn't work; it just showed different errors like "file not found". What can I do?

Try this; it may work:
df = pd.read_csv("C:/Users/User/Desktop/Inan.csv", encoding="utf-8")
Include the file extension as well (e.g. .csv). Note that an Excel file (.xls/.xlsx) has to be read with pd.read_excel rather than pd.read_csv.
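The "unicode error" in the question is most likely Python's SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape, raised because "\U" in "C:\Users" starts an eight-hex-digit Unicode escape. Here is a minimal sketch, standard library only, of spellings that avoid it. The helper compiles() and the file name Inan.csv are assumptions for illustration.

```python
from pathlib import PureWindowsPath

def compiles(source):
    """Hypothetical helper: True if `source` is a valid Python expression."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False

# The question's literal: "\U" starts an 8-hex-digit escape, so it cannot compile.
broken_literal = r'"C:\Users\User\Desktop\Inan.csv"'

raw = r"C:\Users\User\Desktop\Inan.csv"          # raw string: backslashes kept as-is
escaped = "C:\\Users\\User\\Desktop\\Inan.csv"    # every backslash doubled
forward = "C:/Users/User/Desktop/Inan.csv"        # Windows also accepts forward slashes

# pathlib builds the same Windows path from its parts, with no escaping at all.
built = PureWindowsPath("C:/", "Users", "User", "Desktop", "Inan.csv")
```

Any of raw, escaped, forward or str(built) can then be passed to pd.read_csv(..., encoding="utf-8"). On an actual Windows machine you would normally use pathlib.Path rather than PureWindowsPath.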

Related

Pandas Read CSV for file address with \t in it

This may be a redundant question because I know that I can rename the file and solve the issue, but I'm still pretty new at this and it would be really useful information for the future. Thanks in advance to respondents!
So, I have a CSV file which is a table exported from SQL with the filename "t_SQLtable" located in a sub-folder of my working directory.
In order to open the file in Pandas I use the following command:
SQLfile= pd.read_csv('SUBFOLDER\t_SQLtable.csv', sep=',')
This is the error I receive:
FileNotFoundError: [Errno 2] File SUBFOLDER _SQLtable.csv does not exist: 'SUBFOLDER\t_SQLtable.csv'
My understanding is that Pandas is reading \t as a tab and thus can't find the file, because that's not the filename it is looking for. But I don't know how to format the string so that the \t is recognized as part of the filename. Would anyone know how to resolve this?
Thank you!
Separate folders with a forward slash; / is not an escape character, so nothing in the path gets escaped:
SQLfile= pd.read_csv('SUBFOLDER/t_SQLtable.csv', sep=',')
In future, if you want to keep \t without it being interpreted as a tab, use a raw string:
print('SUBFOLDER\t_SQLtable.csv')
print(r'SUBFOLDER\t_SQLtable.csv')
Output:
SUBFOLDER	_SQLtable.csv
SUBFOLDER\t_SQLtable.csv
Try either of these:
SQLfile= pd.read_csv('SUBFOLDER\\t_SQLtable.csv', sep=',')
SQLfile= pd.read_csv('SUBFOLDER/t_SQLtable.csv', sep=',')
If that doesn't work, try building the path with os.path.join:
import os
file_path = os.path.join(os.getcwd(), "SUBFOLDER", "t_SQLtable.csv")
SQLfile= pd.read_csv(file_path, sep=',')
Simply do what you did before, except add an r right before the string:
SQLfile = pd.read_csv(r'SUBFOLDER\t_SQLtable.csv', sep=',')
Adding r to the start of a string makes Python treat it as a raw string, meaning escape sequences such as \t are not evaluated.
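To see the difference between the answers above, here is a small stdlib sketch. It shows that "\t" in a normal literal collapses to one tab character, that the raw and doubled-backslash spellings are identical, and that os.path.join and pathlib produce the same OS-appropriate path. Relative paths like these are resolved against the current working directory anyway, so prefixing os.getcwd() is optional. One caveat with raw strings: they cannot end in a single backslash, so r"SUBFOLDER\" is itself a SyntaxError.

```python
import os
from pathlib import Path

# In a normal literal, "\t" is a single tab character (U+0009).
mangled = "SUBFOLDER\t_SQLtable.csv"

# Two spellings that keep a literal backslash followed by "t".
raw = r"SUBFOLDER\t_SQLtable.csv"
doubled = "SUBFOLDER\\t_SQLtable.csv"

# Portable spellings: Python inserts the right separator for the current OS.
joined = os.path.join("SUBFOLDER", "t_SQLtable.csv")
as_path = Path("SUBFOLDER") / "t_SQLtable.csv"
```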

Pandas library unable to read csv file

I have just one line of code which reads a CSV file into a variable df, but this gives the following error: No columns to parse from file.
import pandas as pd
df = pd.read_csv("D:\Folder1\train.csv")
The CSV file is at this location (I've checked it more than once) and the CSV file was being correctly read until I updated the pandas library.
Can someone tell me how to remove this error?
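You have to use forward slashes "/" in your path (or a raw string): in "D:\Folder1\train.csv" the \t is interpreted as a tab character, so pandas does not receive the path you intended.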
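A quick way to spot this kind of problem is to look for hidden control characters in the path before calling read_csv. Printing repr(path) does the same job by eye. The helper control_chars() below is hypothetical, a sketch built on the standard unicodedata module.

```python
import unicodedata

def control_chars(path):
    """Return the repr() of every control character (category "Cc") in `path`."""
    return [repr(ch) for ch in path if unicodedata.category(ch) == "Cc"]

# What the literal "D:\Folder1\train.csv" actually holds: "\F" is not an
# escape (the backslash survives), but "\t" becomes a tab.
as_typed = "D:\\Folder1\train.csv"
fixed = "D:/Folder1/train.csv"
```

Here control_chars(as_typed) reports the tab, while control_chars(fixed) returns an empty list.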

Unable to read modified csv file with pandas

I have exported an Excel file to CSV using the pandas .to_csv method on a 9-column DataFrame successfully, and read the created file back with .read_csv, with no errors whatsoever, using the following code:
dfBase = pd.read_csv('C:/Users/MyUser/Documents/Scripts/Base.csv',
                     sep=';', decimal=',', index_col=0, parse_dates=True,
                     encoding='utf-8', engine='python')
However, after modifying the same CSV file manually in Notepad (this also happens if I simply open the file and save it without making any changes), pandas won't read it anymore and gives the following error message:
ParserError: Expected 2 fields in line 2, saw 9
In the case of the modified CSV, if the index_col=0 parameter is removed from the code, pandas is able to read the DataFrame again; however, the first 8 columns become the index (as a tuple) and only the last column is read as a data column.
Could anyone point out why I am unable to read the DataFrame after modifying it? Also, why does removing index_col make it readable again, with nearly all the columns as the index?
Have you tried opening and saving the file with another text editor? Notepad isn't great for this: it may add special characters when it saves the file, or the file may already contain characters that Notepad doesn't show you, so pandas can't parse it correctly.
Try Notepad++ or a more advanced editor or IDE such as Atom, VS Code or PyCharm.
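To check what Notepad actually changed, you can inspect the raw bytes. Older Notepad versions, for instance, write a UTF-8 byte-order mark (BOM) at the start of the file when saving as UTF-8. The sketch below uses a hypothetical helper, inspect_csv(). It reports whether a BOM is present and how many ";"-separated fields each of the first lines has. A header whose count differs from the data rows would match "Expected 2 fields in line 2, saw 9". Reading with encoding='utf-8-sig' instead of 'utf-8' tolerates a BOM whether or not one is present.

```python
import codecs
import os
import tempfile

def inspect_csv(path, sep=";", n_lines=3):
    """Hypothetical helper: (has_utf8_bom, fields_per_line) for the first lines.

    Naive: it only counts separators, so quoted fields containing `sep`
    are miscounted.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    has_bom = data.startswith(codecs.BOM_UTF8)
    text = data.decode("utf-8-sig")  # "utf-8-sig" drops a leading BOM if present
    counts = [line.count(sep) + 1 for line in text.splitlines()[:n_lines]]
    return has_bom, counts

# Demo on a throwaway file that mimics a save with a BOM and CRLF line endings.
demo = os.path.join(tempfile.mkdtemp(), "Base.csv")
with open(demo, "wb") as fh:
    fh.write(codecs.BOM_UTF8 + b"date;x;y\r\n2020-01-01;1,5;2\r\n")
bom, counts = inspect_csv(demo)
```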

Pandas DataFrame's accented characters appearing garbled in Excel

With:
# -*- coding: utf-8 -*-
at the top of my .ipynb, Jupyter is now displaying accented characters correctly.
When I export a pandas DataFrame containing accented characters to CSV (with .to_csv()):
... the characters do not render properly when the csv is opened in Excel.
This is the case whether I set the encoding='utf-8' or not. Is pandas/python doing all that it can here, and this is an Excel issue? Or can something be done before the export to csv?
Python: 2.7.10
Pandas: 0.17.1
Excel: Excel for Mac 2011
If you want to keep the accents, try encoding='iso-8859-1':
df.to_csv(path,encoding='iso-8859-1',sep=';')
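Keep in mind that iso-8859-1 (Latin-1) only covers Western European characters; something like "€" cannot be encoded in it at all. Another commonly used option is encoding="utf-8-sig", which writes UTF-8 preceded by a byte-order mark that Excel (at least on Windows) uses to recognise the file as UTF-8. Older Excel for Mac versions may still ignore it. The sketch below writes to a throwaway temp file; the column names and values are made up for the demo.

```python
import codecs
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"ville": ["Montréal", "Québec"], "n": [1, 2]})
out_path = os.path.join(tempfile.mkdtemp(), "accents.csv")

# "utf-8-sig" writes UTF-8 preceded by a byte-order mark (BOM); Excel uses
# the BOM to recognise the file as UTF-8 instead of a legacy code page.
df.to_csv(out_path, index=False, encoding="utf-8-sig")

with open(out_path, "rb") as fh:
    starts_with_bom = fh.read().startswith(codecs.BOM_UTF8)

# Reading back with the same codec strips the BOM again.
round_trip = pd.read_csv(out_path, encoding="utf-8-sig")
```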
I had a similar problem, also on a Mac. I noticed that the unicode string showed up fine when I opened the CSV in TextEdit, but showed up garbled when I opened it in Excel.
Thus, I don't think there is any way to successfully export unicode to Excel with to_csv, but I'd expect the default to_excel writer to suffice:
df.to_excel('file.xlsx')  # .xlsx stores text as Unicode; recent pandas versions no longer accept an encoding argument here
I had the same problem. When I checked the DataFrame in the Jupyter notebook, everything was in order.
The problem happens when I open the file directly (Excel can open it directly because it has a .csv extension).
The solution for me was to open a new blank Excel workbook and import the file from the "Data" tab, like this:
1. Import External Data
2. Import Data from text
3. Choose the file
4. In the import wizard window, in the "File origin" drop-down list, choose "65001 : Unicode (utf-8)"
5. Choose the right delimiter, and that was it for me.
I think using a different Excel writer helps; I recommend xlsxwriter:
import pandas as pd
df = ...
# the context manager saves and closes the file (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter('file.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer)
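Maybe try this function on your columns if you can't get Excel to cooperate. It removes the accents using the unicodedata module (written for Python 3; on Python 2, check for unicode instead of str):
import unicodedata

def remove_accents(input_str):
    # only text is normalized; numbers, NaN, etc. are returned unchanged
    if isinstance(input_str, str):
        # NFKD splits "é" into "e" plus a combining acute accent
        nfkd_form = unicodedata.normalize('NFKD', input_str)
        # drop the combining marks, keep the base letters
        return "".join(c for c in nfkd_form if not unicodedata.combining(c))
    return input_str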
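For a whole DataFrame column, a vectorised variant built on pandas' own string methods is also possible. This is a different technique from remove_accents: after NFKD normalisation it drops every non-ASCII character. So it also deletes characters that have no ASCII base letter at all (Chinese text, for example), where remove_accents would keep them. The sample Series is made up for the demo.

```python
import pandas as pd

names = pd.Series(["Montréal", "Zürich", "São Paulo"])

# Decompose accented letters (NFKD), then drop every non-ASCII character.
ascii_names = (
    names.str.normalize("NFKD")
    .str.encode("ascii", errors="ignore")
    .str.decode("ascii")
)
```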
I had the same problem, and writing to .xlsx and renaming to .csv didn't solve the problem (for application-specific reasons I won't go into here), nor was I able to successfully use an alternate encoding as Juliana Rivera recommended. 'Manually' writing the data as text worked for me.
with open(RESULT_FP + '.csv', 'w+') as rf:
    for row in output:
        row = ','.join(list(map(str, row))) + '\n'
        rf.write(row)
Sometimes I guess you just have to go back to basics.
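The manual writer above joins fields with commas without quoting them, so a value containing a comma would shift the columns. It also relies on the platform's default file encoding. Here is a sketch of the same idea using the standard csv module, which quotes such fields, with an explicit encoding. The rows are hypothetical stand-ins for the answer's output, and "utf-8-sig" is an assumption chosen to add the BOM Excel looks for.

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for the answer's `output`.
output = [["ville", "note"], ["Montréal", "bien, très"], ["Zürich", 3]]

result_path = os.path.join(tempfile.mkdtemp(), "result.csv")
# newline="" lets the csv module control line endings itself.
with open(result_path, "w", newline="", encoding="utf-8-sig") as rf:
    csv.writer(rf).writerows(output)  # quotes fields that contain commas

with open(result_path, newline="", encoding="utf-8-sig") as rf:
    rows = list(csv.reader(rf))
```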
I encountered a similar issue when attempting to read_json followed by a to_excel:
df = pandas.read_json(myfilepath)
# causes garbled characters
df.to_excel(sheetpath, encoding='utf8')
# also causes garbled characters
df.to_excel(sheetpath, encoding='latin1')
Turns out, if I load the json manually with the json module first, and then export with to_excel, the issue doesn't occur:
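import json
import pandas

with open(myfilepath, encoding='utf8') as f:
    j = json.load(f)  # decode the file explicitly as UTF-8
df = pandas.DataFrame(j)
df.to_excel(sheetpath)  # recent pandas versions no longer accept encoding in to_excel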
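The key point in that answer is decoding the JSON file explicitly as UTF-8. pandas.read_json also accepts an encoding argument, so the sketch below compares both routes on a throwaway file. The records are made up for the demo. Writing the .xlsx is left out because it needs an extra engine such as openpyxl.

```python
import json
import os
import tempfile

import pandas as pd

records = [{"ciudad": "Bogotá", "ave": "Ñandú"}]
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(records, fh, ensure_ascii=False)  # store the accents as raw UTF-8

# Route 1 (the answer's): decode explicitly with the json module.
with open(path, encoding="utf-8") as fh:
    via_json = pd.DataFrame(json.load(fh))

# Route 2: read_json with its encoding argument set explicitly.
via_read_json = pd.read_json(path, encoding="utf-8")
```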

Pandas read excel with Chinese filename

I am trying to load as a pandas dataframe a file that has Chinese characters in its name.
I've tried:
df=pd.read_excel("url/某物2008.xls")
and
import sys
df=pd.read_excel("url/某物2008.xls", encoding=sys.getfilesystemencoding())
But the response is something like: "No such file or directory: 'url/\xa1\xa92008.xls'"
I've also tried changing the names of the files using os.rename, but the filenames aren't even read properly (asking python to just print the filenames yields only question marks or squares).
df=pd.read_excel(u"url/某物2008.xls", encoding=sys.getfilesystemencoding())
This may work, but on Python 2 you may also have to declare the source encoding at the top of the file (e.g. # -*- coding: utf-8 -*-).
Try this for the unicode conversion:
df=pd.read_excel(u"url/某物2008.xls", encoding='utf-8')
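On Python 3, str is already Unicode and pathlib passes it to the operating system unchanged, so neither a u"" prefix nor an encoding argument is needed for the filename. If typing the name is error-prone, you can also locate the file by pattern. This is a sketch that creates an empty placeholder file in a temp folder standing in for "url/". Actually reading a legacy .xls with pd.read_excel additionally needs an engine such as xlrd, so that call is left as a comment.

```python
import tempfile
from pathlib import Path

# Hypothetical folder standing in for "url/" from the question.
folder = Path(tempfile.mkdtemp())
(folder / "某物2008.xls").touch()  # empty placeholder file

# Find the file by pattern instead of typing the Chinese name.
matches = sorted(folder.glob("*2008.xls"))
target = matches[0]
# df = pd.read_excel(target)  # legacy .xls needs an engine such as xlrd
```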
