Trying to load a binary-encoded CSV file in Python

I am trying to load a CSV file that appears to be binary-encoded in Python. When using pd.read_csv(), I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 16: invalid start byte
I have tried adding encoding='utf-8' and specifying delimiters, but that did not help.

Related

Getting the error 'utf-8' codec can't decode byte 0xa0 in position 15456: invalid start byte when trying to read a CSV file in Arabic

I'm trying to read a CSV file written in Arabic. This is the code I'm using:
import numpy as np
import pandas as pd

data = pd.read_csv("/Users/User/Downloads/AJGT.csv", encoding='utf-8')
# Everything except the 'Feed' column, then everything except 'Sentiment'
sentiment = np.array(data.drop('Feed', axis=1).values)
feed = np.array(data.drop('Sentiment', axis=1).values)
print(sentiment)
print(feed)
However, I'm getting the following error:
'utf-8' codec can't decode byte 0xa0 in position 15456: invalid start byte
I would appreciate any help.
Thank you!
Try encoding='ISO-8859-1'. This worked for me when I got a similar error.
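One caveat with that suggestion: ISO-8859-1 maps every possible byte to some character, so it never raises an error, but it can silently produce the wrong letters for non-Latin text. A minimal sketch of trying stricter candidates first is below. The helper name decode_with_fallback and the candidate list are assumptions for illustration; cp1256 (Windows Arabic) is only a guess at what an Arabic file might use, not something the question confirms.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp1256", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes raw without error.

    Hypothetical helper: the order matters, because permissive single-byte
    encodings such as iso-8859-1 accept any input and should come last.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Sample bytes standing in for the contents of a legacy-encoded file.
sample = "مرحبا".encode("cp1256")
text, used = decode_with_fallback(sample)
```

Once a plausible encoding is found, it can be passed on as pd.read_csv(path, encoding=used). It is still worth checking the decoded text by eye, since a decode that succeeds is not necessarily correct.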

pd.read_excel throws UnicodeDecodeError

I am trying to read data from Excel into pandas. The file comes from an API and is not saved (access to the file needs special permissions, so I don't want to save it). When I try to read the Excel data from a file object
with open('path_to_file') as file:
    re = pd.read_excel(file)
I get the error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x98 in position 10: invalid start byte
When I pass the path in place of the file object, everything works fine:
re = pd.read_excel('path-to-exactly-the-same-file')
Is there a way to read Excel data with pandas without saving the file and passing its path?
The part that was missing was 'rb' in open:
with open('path_to_file', 'rb') as file:
    re = pd.read_excel(file)
This treats the file as binary. The idea was taken from the question "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte".
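The difference between the two modes can be sketched with the standard library alone. The payload below is made-up bytes standing in for a binary workbook, not a real Excel file. The last step wraps bytes that are already in memory in io.BytesIO, which is one possible way to hand an API response to pd.read_excel without writing it to disk (pandas accepts file-like objects there).

```python
import io
import os
import tempfile

# Made-up bytes that are not valid UTF-8, standing in for binary workbook data.
payload = b"PK\x03\x04\x98\xff\x00binary"

with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode decodes while reading, so binary content can fail to decode.
    try:
        with open(path, encoding="utf-8") as fh:
            fh.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode returns the bytes untouched.
    with open(path, "rb") as fh:
        raw = fh.read()
finally:
    os.remove(path)

# Bytes already in memory (for example an API response body) can be wrapped
# in a file-like object instead of being saved first.
buffer = io.BytesIO(raw)
# With pandas installed: df = pd.read_excel(buffer)
```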

Can't read data using read_csv due to encoding errors

I am facing a big issue. I am trying to read a CSV file that uses '|' as the delimiter. If I use utf-8 or utf-8-sig as the encoding, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte
But if I use the unicode_escape encoding, I get this error:
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 13: \ at end of string
Is it an issue with the dataset?
It worked after I used 'Save with Encoding - UTF-8' in the Sublime Text editor. I think the data had some issues.
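Re-saving with an editor amounts to decoding the file with its original encoding and writing it back out as UTF-8. A rough sketch of doing the same from Python follows. The helper reencode_to_utf8 is hypothetical, and cp1252 is only an assumed source encoding chosen because byte 0xc0 is a valid letter there; the question does not say what the file actually used.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, source_encoding):
    """Hypothetical helper: rewrite a text file as UTF-8, given its original encoding."""
    # newline="" keeps the original line endings unchanged in both directions.
    with open(src_path, encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "legacy.csv")
dst_path = os.path.join(workdir, "utf8.csv")

# A tiny pipe-delimited sample whose first byte is 0xc0, as in the error above.
with open(src_path, "wb") as fh:
    fh.write(b"\xc0pple|1\r\n")

reencode_to_utf8(src_path, dst_path, "cp1252")

with open(dst_path, encoding="utf-8", newline="") as fh:
    converted = fh.read()
```

The converted file could then be read with pd.read_csv(dst_path, sep='|', encoding='utf-8'). If the source encoding is known, passing it directly as encoding= to read_csv avoids the extra file.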

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 4: invalid start byte

I exported a csv file from Microsoft Excel. It showed properly in Jupyter notebook with pandas and numpy as below:
import pandas as pd
pd1 = pd.read_csv('test1.csv', encoding='utf-8')
There were no error messages the first time, but after I opened the CSV file and saved it under a new name,
I got this UnicodeDecodeError every time:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 4: invalid start byte
The data contains unusual characters (2 columns, 6 rows). Even with those characters, there was no problem at first.
I have to handle all languages, so I really want to know how to encode them. How can I solve this problem?
When you use Save As, you can select the encoding format.
Try saving the file that way and see if it works.👍
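For data in many languages, a UTF-8 based encoding is the usual choice. As a small illustration, assuming the file can be produced or re-saved as UTF-8 with a byte order mark (the 'utf-8-sig' codec in Python, which spreadsheet programs tend to recognize), a round trip with the standard csv module might look like this. The rows and file name are made up.

```python
import csv
import os
import tempfile

# Made-up rows mixing several scripts.
rows = [["id", "text"], ["1", "한국어"], ["2", "日本語"], ["3", "café"]]

path = os.path.join(tempfile.mkdtemp(), "multilingual.csv")

# 'utf-8-sig' writes a BOM at the start of the file, then plain UTF-8.
with open(path, "w", encoding="utf-8-sig", newline="") as fh:
    csv.writer(fh).writerows(rows)

with open(path, "rb") as fh:
    starts_with_bom = fh.read(3) == b"\xef\xbb\xbf"

# Reading with 'utf-8-sig' strips the BOM again, so the header stays clean.
with open(path, encoding="utf-8-sig", newline="") as fh:
    loaded = list(csv.reader(fh))
```

The pandas equivalent for reading would be pd.read_csv(path, encoding='utf-8-sig').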

'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte

I am trying to read a csv file using the following lines of Python code:
crimes = pd.read_csv('C:/Users/usuario1/Desktop/python/csv/001 Boston crimes/crime.csv', encoding = 'utf8')
crimes.head(5)
But I am getting a decode error as follows:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte
What is going wrong?
Maybe your file is not encoded in UTF-8, or it contains bytes that are not valid UTF-8. You can try other encodings such as ISO-8859-1, but it is best to check your file's encoding first. Something like the following prints the file object's details:
with open('Your/file/path') as f:
    print(f)
Note that the encoding shown is the one Python used to open the file (the platform default), which is not necessarily the encoding the file was written in.
Or you can just open the CSV in an editor; File -> Save As should show its encoding.
If those don't help, you can skip malformed rows by using on_bad_lines='skip' (error_bad_lines=False in pandas versions before 1.3):
crimes = pd.read_csv('Your/file/path', encoding='utf8', on_bad_lines='skip')
Be aware that this option skips rows with the wrong number of fields; it does not suppress decoding errors.
Hope this helps.
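To see what is actually at the failing position, the raw bytes can be inspected directly. The sketch below uses a made-up sample line (not the real crime.csv) with byte 0xa0 at offset 24, and a hypothetical helper describe_decode_failure. It also shows two ways of getting past the byte: replacing undecodable bytes, or decoding as ISO-8859-1, where 0xa0 is a non-breaking space.

```python
def describe_decode_failure(raw, encoding="utf-8"):
    """Hypothetical helper: return (offset, byte value) of the first undecodable byte, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start]
    return None


# Made-up header line with a stray 0xa0 byte at offset 24.
sample = b"INCIDENT_NUMBER,OFFENSE,\xa0DISTRICT\n"

failure = describe_decode_failure(sample)

# Option 1: keep going and mark undecodable bytes with U+FFFD.
lenient = sample.decode("utf-8", errors="replace")

# Option 2: decode with a single-byte encoding in which 0xa0 is defined.
as_latin1 = sample.decode("iso-8859-1")
```

In pandas 1.3 and later, read_csv has an encoding_errors parameter (for example encoding_errors='replace') that plays the role of option 1.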
