Python _csv Error: line contains NULL byte - python

This is my code:
import csv
import sys

filepath = sys.argv[1]
csvdata = list(csv.reader(open(filepath)))
I saved my Excel file as a csv and received this error:
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
How can I fix it?

An Excel file is not a csv file. First export / save the file as csv.
Python versions differ on whether the file should be opened in binary or text mode, and this affects how newlines are handled.
In Python 2.x, open in binary mode: open(filepath, 'rb')
In Python 3.x, don't: open(filepath, 'r')
The second part I learned from this link about reading in csv files.
On some operating systems (Mac OS for sure) you need to open with the mode 'rU'. See this link with the same problem specifically on Mac OS.
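In Python 3 the csv documentation additionally recommends passing newline='' so that newlines embedded in quoted fields are left to the csv module itself; a minimal sketch of that, using the same command-line argument as in the question:
import csv
import sys

filepath = sys.argv[1]
with open(filepath, 'r', newline='') as f:  # text mode; csv handles embedded newlines
    csvdata = list(csv.reader(f))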

try this (put the actual location of your csv file)...
import csv

with open(r'c:\pytest.csv', 'rb') as csvfile:
    data = csv.reader(csvfile)
    mylist = list(data)
    print mylist

from tkFileDialog import askopenfilename
import csv

filename = askopenfilename()
with open(filename, 'rb') as csvfile:
    data = csv.reader(csvfile)
    mylist = list(data)
    print mylist

Related

python: use CSV reader with single file extracted from tarfile

I am trying to use the Python CSV reader to read a CSV file that I extract from a .tar.gz file using Python's tarfile library.
I have this:
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
tarredCSV = tarFile.extractfile(file)
reader = csv.reader(tarredCSV)
next(reader) # skip header
for row in reader:
if row[3] not in CSVRows.values():
CSVRows[row[3]] = row
All the files in the tar file are all CSVs.
I am getting an exception on the first file, raised on the first next(reader) call:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
How do I open said file (without extracting the file then opening it)?
tarfile.extractfile returns an io.BufferedReader object, a bytes stream, and yet csv.reader expects a text stream. You can use io.TextIOWrapper to convert the bytes stream to a text stream instead:
import io
...
reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding='utf-8'))
You need to provide a file-like object to csv.reader.
Probably the best solution, without having to consume a complete file at once, is this approach (thanks to blhsing and damon for suggesting it):
import csv
import io
import tarfile
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    csv_file = io.TextIOWrapper(tarFile.extractfile(file), encoding="utf-8")
    reader = csv.reader(csv_file)
    next(reader)  # skip header
    for row in reader:
        print(row)
Alternatively, a possible solution (from Python3 working with csv files in tar files) would be:
import csv
import io
import tarfile
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    csv_file = io.StringIO(tarFile.extractfile(file).read().decode('utf-8'))
    reader = csv.reader(csv_file)
    next(reader)  # skip header
    for row in reader:
        print(row)
Here an io.StringIO object is used to make csv.reader happy. However, this might not scale well for larger files contained in the tar, as each file is read in a single step.

Open file from zip without extracting it in Python?

I am working on a script that fetches a zip file from a URL using the requests library. That zip file contains a csv file. I'm trying to read that csv file without saving it, but while parsing it I get this error: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
import csv
import requests
from io import BytesIO, StringIO
from zipfile import ZipFile
response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
files = zip_file.namelist()
with zip_file.open(files[0]) as csvfile:
    csvreader = csv.reader(csvfile)
    # _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
    for row in csvreader:
        print(row)
Try this:
import pandas as pd
import requests
from io import BytesIO, StringIO
from zipfile import ZipFile
response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
files = zip_file.namelist()
with zip_file.open(files[0]) as csvfile:
    print(pd.read_csv(csvfile, encoding='utf8', sep=","))
As #Aran-Fey alluded to:
import zipfile
import csv
import io
with open('/path/to/archive.zip', 'rb') as f:  # the zip archive itself must be opened in binary mode
    with zipfile.ZipFile(f) as zf:
        csv_filename = zf.namelist()[0]  # see namelist() for the list of files in the archive
        with zf.open(csv_filename) as csv_f:
            csv_f_as_text = io.TextIOWrapper(csv_f)
            reader = csv.reader(csv_f_as_text)
csv.reader (and csv.DictReader) require a file-like object opened in text mode. Normally this is not a problem when simply open(...)ing a file in 'r' mode, since, as the Python 3 docs say, text mode is the default: "The default mode is 'r' (open for reading text, synonym of 'rt')". But if you try mode 'rt' with open on a ZipFile, you'll see an error, because ZipFile.open() only accepts mode "r" or "w":
with zf.open(csv_filename, 'rt') as csv_f:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: open() requires mode "r" or "w"
That's what io.TextIOWrapper is for -- for wrapping byte streams to be readable as text, decoding them on the fly.
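Tying that back to the snippet in the question, a minimal sketch (url is the same variable as in the question, and utf-8 is just an assumed encoding for the archived csv):
import csv
import io
from io import BytesIO
from zipfile import ZipFile

import requests

response = requests.get(url)  # url as in the question
zip_file = ZipFile(BytesIO(response.content))
first_member = zip_file.namelist()[0]
with zip_file.open(first_member) as csv_bytes:
    # wrap the binary member in a text layer so csv.reader receives str rows
    for row in csv.reader(io.TextIOWrapper(csv_bytes, encoding='utf-8')):
        print(row)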

How to write the contents of one CSV file to another

I have a csv file and I want to transfer the raw data, without the headers, to a new csv file, keeping the rows and columns the same as the original.
IRIS_data = "IRIS_data.csv"
with open(IRIS_data, 'wb') as data:
wr = csv.writer(data, quoting=csv.QUOTE_ALL)
with open(IRIS) as f:
next(f)
for line in f:
wr.writerow(line)
The code above is my most recent attempt; when I try to run it I get the following error:
a bytes-like object is required, not 'str'
It's because you opened the output file with open(IRIS_data, 'wb'), which opens it in binary mode, and the input file with just open(IRIS), which opens it in text mode.
In Python 3, you should open both files in text mode and specify the newline='' option (see the examples in the csv module's documentation).
To fix it, change them as follows:
with open(IRIS_data, 'w', newline='') as data:
and
with open(IRIS, newline='') as f:
However, there are other issues with your code. Here's how to use those statements to get what I think you want:
import csv
IRIS = "IRIS.csv"
IRIS_data = "IRIS_data.csv"
with open(IRIS, 'r', newline='') as f, open(IRIS_data, 'w', newline='') as data:
    next(f)  # Skip over header in input file.
    writer = csv.writer(data, quoting=csv.QUOTE_ALL)
    writer.writerows(line.split() for line in f)
Contents of IRIS_data.csv file after running the script with your sample input data:
"6.4","2.8","5.6","2.2","2"
"5","2.3","3.3","1","1"
"4.9","2.5","4.5","1.7","2"
"4.9","3.1","1.5","0.1","0"
"5.7","3.8","1.7","0.3","0"
"4.4","3.2","1.3","0.2","0"
"5.4","3.4","1.5","0.4","0"
"6.9","3.1","5.1","2.3","2"
"6.7","3.1","4.4","1.4","1"
"5.1","3.7","1.5","0.4","0"
You have to encode the line you are writing, like this:
wr.writerow(line.encode("utf8"))
Also open your file using open(..., 'wb'). This opens the file in binary mode, so you are certain the file is actually open in binary mode. It is better to know the encoding explicitly than to assume it; enforcing an encoding for both reading and writing will save you lots of trouble.

python - writing hex digits to csv

I have the following string:
>>> line = '\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'
When I type the variable line in the python terminal it showing the following:
>>> line
'\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'
When I am printing it, its showing the following:
>>> print line
7 Cardio Metabolic Care 12,788,528.04
In the variable line each word is separated by \t, and I want to save it to a csv file. So I tried the following code:
import csv
with open('test.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(line.split('\t'))
When I look into the test.csv file, I get only the following:
,,,,,,
Is there any way to get the words into the csv file? Kindly help.
Your input text is not corrupted; it's encoded, as UTF-16 (big-endian in this case). And it is already CSV, just with tab as the delimiter.
You must decode it into a string; after that you can use it normally.
Ideally you declare the proper byte encoding when you read it from a source. For example, when you open a file you can state the encoding the file uses so that the file reader will decode the contents for you.
If you have that byte string from a source where you can't declare an encoding while reading it, you can decode manually:
line = '\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'
decoded = line.decode('utf_16_be')
print decoded
# 7 Cardio Metabolic Care 12,788,528.04
But since I suppose that you are actually reading it from a file:
import csv
import codecs
with codecs.open('input.txt', 'r', encoding='utf16') as in_file, codecs.open('output.csv', 'w', encoding='utf8') as out_file:
    reader = csv.reader(in_file, delimiter='\t')
    writer = csv.writer(out_file, delimiter=',', quotechar='"')
    writer.writerows(reader)

Which encoding to use while reading Excel using xlrd

I am trying to read an Excel file using xlrd to write into txt files. Everything is being written fine except for some rows which have Spanish characters like 'Téd'. I can encode those using latin-1 encoding. However the code then fails for other rows which have an 'â' with unicode u'\u2013'; u'\u2013' can't be encoded using latin-1. When using UTF-8, 'â' is written out fine but 'Téd' is written as 'TÃ©d', which is not acceptable. How do I correct this?
Code below:
#!/usr/bin/python
import xlrd
import csv
import sys
filePath = sys.argv[1]
with xlrd.open_workbook(filePath) as wb:
    shNames = wb.sheet_names()
    for shName in shNames:
        sh = wb.sheet_by_name(shName)
        csvFile = shName + ".csv"
        with open(csvFile, 'wb') as f:
            c = csv.writer(f)
            for row in range(sh.nrows):
                sh_row = []
                cell = ''
                for item in sh.row_values(row):
                    if isinstance(item, float):
                        cell = item
                    else:
                        cell = item.encode('utf-8')
                    sh_row.append(cell)
                    cell = ''
                c.writerow(sh_row)
        print shName + ".csv File Created"
Python's csv module doesn't support Unicode input.
You are correctly encoding your input before writing it -- so you don't need codecs. Just open(csvFile, "wb") (the b is important) and pass that object to the writer:
with open(csvFile, "wb") as f:
    writer = csv.writer(f)
    writer.writerow([entry.encode("utf-8") for entry in row])
Alternatively, unicodecsv is a drop-in replacement for csv that handles encoding.
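A minimal sketch of that drop-in use, assuming unicodecsv is installed (its writer takes an encoding argument and encodes unicode values for you; the sample rows are made up):
import unicodecsv

rows = [[u'Téd', 12.5], [u'Cardio Metabolic Care', 3.0]]

with open('out.csv', 'wb') as f:
    writer = unicodecsv.writer(f, encoding='utf-8')
    writer.writerows(rows)  # unicode strings are encoded to UTF-8 on write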
You are getting Ã© instead of é because you are mistaking UTF-8 encoded text for latin-1. This is probably because you're encoding twice, once with .encode("utf-8") and once with codecs.open.
By the way, the right way to check the type of an xlrd cell is to compare cell.ctype against the xlrd cell-type constants (xlrd.XL_CELL_NUMBER, xlrd.XL_CELL_TEXT, and so on).
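For example, a small sketch of that check inside the row loop (sh, row, and sh_row are the same names as in the question's code):
for cell in sh.row(row):                        # Cell objects instead of raw values
    if cell.ctype == xlrd.XL_CELL_NUMBER:
        sh_row.append(cell.value)               # floats can go straight through
    elif cell.ctype == xlrd.XL_CELL_TEXT:
        sh_row.append(cell.value.encode('utf-8'))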
