change csv file variable from bytes to string in python - python

I have a csv file which I get from the frontend so I have it stored in a variable. Now I want to use the csv reader from the csv python library. This is my code:
def parseCSV(file):
    csv_reader = csv.reader(file)
    for line in csv_reader:
        print(line)
But this isn't working because every line of the file is byte format and not string. How can I change the entire file variable from byte to string and make the code work?
I've already seen some solutions for this problem if I read a csv file from a folder and open it with with open ("file.csv", "r"). But I can't read this csv file from a folder so I somehow need to change byte to string.
I've already tried something like this but this is not working:
def parseCSV(file):
    csv_reader = csv.reader(file)
    for line.decode("utf-8") in csv_reader:
        print(line)

So I've found a solution for this problem.
I just had to change the following line:
csv_reader = csv.reader(file)
to this line:
csv_reader = csv.reader(codecs.iterdecode(file, 'utf-8'))
and add import codecs at the top of the file.
So if anyone else has the same problem, this should make the code work.
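A minimal sketch of that fix, assuming the uploaded file is a binary file-like object. The upload is simulated with io.BytesIO, and the sample data and the names parse_csv and parse_csv_wrapped are made up for illustration. The second function shows io.TextIOWrapper as a possible alternative to codecs.iterdecode.

```python
import codecs
import csv
import io


def parse_csv(file):
    """Return the rows of a binary file-like object holding UTF-8 CSV data."""
    # codecs.iterdecode lazily decodes each bytes line into str,
    # which is what csv.reader expects.
    return list(csv.reader(codecs.iterdecode(file, "utf-8")))


def parse_csv_wrapped(file):
    """Alternative: wrap the binary stream in a text layer."""
    # newline="" lets the csv module handle line endings itself.
    text = io.TextIOWrapper(file, encoding="utf-8", newline="")
    return list(csv.reader(text))


# Hypothetical upload, simulated with an in-memory bytes buffer.
upload = io.BytesIO(b"name,city\nAda,London\n")
rows = parse_csv(upload)
print(rows)
```

Both variants assume the data really is UTF-8; a different encoding would have to be passed in its place.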

Related

Writing the output to a different csv file column

I have some vocabulary and their counterparts to create an Anki deck. I need the program to write the output of my code in two columns of a csv file; first for the vocabulary and second for the meaning. I've tried two codes but neither of them worked. How can I solve this problem?
Notebook content(vocab):
obligatory,義務的
sole,単独,唯一
defined,一定
obey,従う
...
First try:
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.reader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w", newline="") as new_file:
        csv_writer = csv.writer(new_file, delimiter=",")
        for line in csv_reader:
            csv_writer.writerow(line)
Second try:
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.DictReader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w",) as f:
        field_names = ["Vocabulary", "Meaning"]
        csv_writer = csv.DictWriter(f, fieldnames=field_names, extrasaction="ignore")
        csv_writer.writeheader()
        for line in csv_reader:
            csv_writer.writerow(line)
Result of the first try:
https://cdn.discordapp.com/attachments/696432733882155138/746404430123106374/unknown.png
#Second try was not even close
Expected result:
https://cdn.discordapp.com/attachments/734460259560849542/746432094825087086/unknown.png
Like Kevin said, Excel uses the system's list separator as the delimiter, which is ";" in many regional settings, while your code creates a CSV file with a comma (,) delimiter. That's why it's shown with commas in your CSV viewer. You can pass ";" as the delimiter if you want Excel to read your file correctly. Or you can save a CSV file from your own spreadsheet application and open it in Notepad to see which delimiter it uses.
Your first try works, it's the app you're using for importing that is not recognizing the , as the delimiter. I'm not sure where you're importing this to, but at least in Google Sheets you can choose what the delimiter is, even after the fact.
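A small sketch of writing with ";" as the delimiter, in case the importing application expects it. The two rows are taken from the question's notebook content. The in-memory buffer stands in for the real output file and is only an assumption made to keep the example self-contained.

```python
import csv
import io

# Sample rows taken from the question's notebook content.
rows = [["obligatory", "義務的"], ["obey", "従う"]]

# Written to an in-memory buffer here; with a real file, open it with
# newline="" (and an explicit encoding) as in the question's first try.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=";")
writer.writerows(rows)

output = buffer.getvalue()
print(output)
```

Whether ";" or "," is the right choice depends on the application that imports the file, so it is worth checking that application's import settings first.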

How do I add to a variable in separate file using python?

My situation is that I have a CSV file, and here is the code for it.
user_file = Path(str(message.author.id) + '.cvs')
if user_file.exists():
    with open('test.csv', 'a') as fp:
        writer = csv.writer(fp, delimiter=',')
        writer.writerows(data)
else:
    with open(user_file, 'w') as fp:
        data = [('xp', 0)]
        writer = csv.writer(fp, delimiter=',')
        writer.writerows(data)
I want a CSV file that keeps track of how many times each user types a message, so I need a way of editing the CSV file and adding 1 to the value it already holds. But I have no idea how to do that! Please help! <3
test.csv:
4
Python:
# Replace test.csv with the file you wish to open. Keep "r+"
with open("test.csv", "r+") as dat:
    # Assumes the text in the file is an int
    n = int(dat.read())
    dat.seek(0)  # go back to the start before overwriting
    dat.write(str(n + 1))
    dat.truncate()  # drop any leftover characters from the old value
Result in test.csv:
5
This opens the file for both reading and writing without emptying it ("w+" would truncate the file on open, leaving nothing to read). It reads the number, moves back to the start, and writes the incremented value back as a string. write() overwrites from the current position, so the seek(0) is what makes it replace the old text.
P.S. the if/else check is only needed for the "r+" case: opening a file that doesn't exist in "w" or "a" mode creates it, but "r+" raises FileNotFoundError.
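One way to wrap the read-increment-write cycle in a function, as a sketch only. It assumes the file holds a single integer and nothing else; the name increment_counter and the demo file name are hypothetical.

```python
import os
import tempfile


def increment_counter(path):
    """Add 1 to the integer stored in `path`, creating the file at 1 if absent."""
    try:
        # "r+" reads and writes without truncating the file on open.
        with open(path, "r+") as f:
            count = int(f.read().strip() or 0) + 1
            f.seek(0)
            f.write(str(count))
            f.truncate()
    except FileNotFoundError:
        # First message from this user: start the counter at 1.
        count = 1
        with open(path, "w") as f:
            f.write(str(count))
    return count


# Demonstration in a temporary directory; the file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    counter_path = os.path.join(tmp, "12345.txt")
    first = increment_counter(counter_path)
    second = increment_counter(counter_path)
    print(first, second)
```

This does nothing about two processes updating the same file at once, which may or may not matter for a bot.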

Python - CSV file empty after rewriting using csv module

I'm attempting to rewrite specific cells in a csv file using Python.
However, whenever I try to modify an aspect of the csv file, the csv file ends up being emptied (the file contents becomes blank).
Minimal code example:
import csv
ReadFile = open("./Resources/File.csv", "rt", encoding = "utf-8-sig")
Reader = csv.reader(ReadFile)
WriteFile = open("./Resources/File.csv", "wt", encoding = "utf-8-sig")
Writer = csv.writer(WriteFile)
for row in Reader:
    row[3] = 4
    Writer.writerow(row)
ReadFile.close()
WriteFile.close()
'File.csv' looks like this:
1,2,3,FOUR,5
1,2,3,FOUR,5
1,2,3,FOUR,5
1,2,3,FOUR,5
1,2,3,FOUR,5
In this example, I'm attempting to change 'FOUR' to '4'.
Upon running this code, the csv file becomes empty instead.
So far, the only other question related to this that I've managed to find is this one, which does not seem to be dealing with rewriting specific cells in a csv file but instead deals with writing new rows to a csv file.
I'd be very grateful for any help anyone reading this could provide.
The following should work:
import csv
with open("./Resources/File.csv", "rt", encoding = "utf-8-sig") as ReadFile:
    lines = list(csv.reader(ReadFile))
with open("./Resources/File.csv", "wt", encoding = "utf-8-sig") as WriteFile:
    Writer = csv.writer(WriteFile)
    for line in lines:
        line[3] = 4
        Writer.writerow(line)
When you open a file with the w option, its contents are deleted immediately and writing starts from scratch. The file is therefore already empty by the time you start reading from it.
Try writing to another file (like FileTemp.csv) and at the end of the program renaming FileTemp.csv to File.csv.
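The temporary-file approach could look roughly like this. It is a sketch under assumptions: the function name rewrite_column is invented, plain utf-8 is used instead of the question's utf-8-sig, and the demo file is created in a temporary directory rather than ./Resources.

```python
import csv
import os
import tempfile


def rewrite_column(path, column, value):
    """Set `column` to `value` in every row of the CSV at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    with open(path, "r", newline="", encoding="utf-8") as src, \
            tempfile.NamedTemporaryFile("w", newline="", encoding="utf-8",
                                        dir=directory, delete=False) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            row[column] = value
            writer.writerow(row)
    # Both files are closed here; os.replace overwrites the original.
    os.replace(dst.name, path)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "File.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as f:
        f.write("1,2,3,FOUR,5\n1,2,3,FOUR,5\n")
    rewrite_column(demo_path, 3, 4)
    with open(demo_path, newline="", encoding="utf-8") as f:
        result = list(csv.reader(f))
    print(result)
```

Compared with reading everything into a list first, this keeps only one row in memory at a time, at the cost of a second file on disk.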

Trying to import a list of words using csv (Python 2.7)

import csv, Tkinter
with open('most_common_words.csv') as csv_file: # Opens the file in a context manager so that it's automatically closed when finished
    csv_reader = csv.reader(csv_file) # Create a csv reader instance
    for row in csv_reader: # Read each line in the csv file into 'row' as a list
        print row[0] # Print the first item in the list
I'm trying to import this list of most common words using csv. It continues to give me the same error
for row in csv_reader: # Read each line in the csv file into 'row' as a list
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
I've tried a couple different ways to do it as well, but they didn't work either. Any suggestions?
Also, where does this file need to be saved? Is it okay just being in the same folder as the program?
You should always open a CSV file in binary mode (Python 2) or in text mode with newline='' (Python 3). Also, make sure that the delimiter and quote characters are , and ", or you'll need to specify them explicitly:
with open('most_common_words.csv', 'rb') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';', quotechar='"') # for EU CSV
You can save the file in the same folder as your program. If you don't, you can provide the correct path to open() as well. Be sure to use raw strings if you're on Windows, otherwise the backslashes may trick you: open(r"C:\Python27\data\table.csv")
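For comparison, a Python 3 sketch of the same read, using newline='' as the csv documentation recommends. The function name read_first_column and the sample contents are assumptions; the sample uses bare "\r" line endings, which is one plausible (not confirmed) cause of the error in the question.

```python
import csv
import os
import tempfile


def read_first_column(path, delimiter=","):
    """Return the first field of every non-empty row (Python 3)."""
    # newline="" hands the raw line endings to the csv module.
    with open(path, newline="", encoding="utf-8") as csv_file:
        return [row[0] for row in csv.reader(csv_file, delimiter=delimiter) if row]


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "most_common_words.csv")
    # Bare "\r" line endings, as some older spreadsheet exports produce.
    with open(sample, "w", newline="", encoding="utf-8") as f:
        f.write("the\rof\rand\r")
    words = read_first_column(sample)
    print(words)
```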
It seems you have a file with one column as you say here:
It is a simple list of words. When I open it up, it opens into Excel
with one column and 500 rows of 500 different words.
If so, you don't need the csv module at all:
with open('most_common_words.csv') as f:
    rows = list(f)
Note in this case, each item of the list will have the newline appended to it, so if your file is:
apple
dog
cat
rows will be ['apple\n', 'dog\n', 'cat\n']
If you want to strip the end of line, then you can do this:
with open('most_common_words.csv') as f:
    rows = list(i.rstrip() for i in f)

"Line contains NULL byte" in CSV reader (Python)

I'm trying to write a program that looks at a .CSV file (input.csv) and rewrites only the rows that begin with a certain element (corrected.csv), as listed in a text file (output.txt).
This is what my program looks like right now:
import csv
lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])
with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)
Unfortunately, I keep getting this error, and I have no clue what it's about.
Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
    for row in reader:
_csv.Error: line contains NULL byte
Credit to all the people here for even getting me to this point.
I'm guessing you have a NUL byte in input.csv. You can test that with
if '\0' in open('input.csv').read():
    print("you have null bytes in your input file")
else:
    print("you don't")
if you do,
reader = csv.reader(x.replace('\0', '') for x in mycsv)
may get you around that. Or it may indicate you have utf16 or something 'interesting' in the .csv file.
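To tell stray NUL bytes apart from a UTF-16 file, the raw bytes can be inspected before parsing. The following is a rough heuristic sketch, not a full encoding detector; the function name describe_nul_bytes is made up, and a UTF-32 file would also match the UTF-16 check.

```python
import codecs


def describe_nul_bytes(data):
    """Give a rough diagnosis of NUL bytes in raw file contents (bytes)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "starts with a UTF-16 byte order mark"
    count = data.count(b"\x00")
    if count == 0:
        return "no NUL bytes"
    return "%d NUL byte(s), no UTF-16 byte order mark" % count


# With a real file: data = open("input.csv", "rb").read()
print(describe_nul_bytes("a,b\n".encode("utf-16")))
print(describe_nul_bytes(b"a,\x00b\n"))
print(describe_nul_bytes(b"a,b\n"))
```

If the byte order mark is present, opening the file with encoding='utf-16' is probably the better fix than stripping the NUL bytes.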
I've solved a similar problem with an easier solution:
import codecs
csvReader = csv.reader(codecs.open('file.csv', 'r', 'utf-16'))
The key was using the codecs module to open the file with the UTF-16 encoding. Many other encodings are available; check the documentation.
If you want to replace the nulls with something you can do this:
def fix_nulls(s):
    for line in s:
        yield line.replace('\0', ' ')
r = csv.reader(fix_nulls(open(...)))
You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.
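A self-contained sketch of the filtering generator in action. The input is an in-memory string with one stray NUL character, invented for illustration, and the generator is named strip_nuls here only to keep it distinct from the code in the answer.

```python
import csv
import io


def strip_nuls(lines):
    """Yield each line with NUL characters removed."""
    for line in lines:
        yield line.replace("\0", "")


# Made-up sample data containing one stray NUL character.
source = io.StringIO("a,b\x00,c\n1,2,3\n", newline="")
rows = list(csv.reader(strip_nuls(source)))
print(rows)
```

Because the generator works line by line, the whole file never has to be held in memory.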
See the (line.replace('\0','') for line in mycsv) generator below. On Python 2 you'll probably also want to open the file in mode rb; on Python 3 (which the traceback suggests) keep it in text mode with newline='', since csv.reader needs str lines there.
import csv
lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])
with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r', newline='') as mycsv:
        reader = csv.reader( (line.replace('\0','') for line in mycsv) )
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)
This will tell you what line is the problem.
import csv
lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])
with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        try:
            for row in reader:
                if row[0] not in lines:
                    writer.writerow(row)
        except csv.Error:
            # reader.line_num is the number of the last line read from the file
            print('csv choked on line %s' % reader.line_num)
            raise
Perhaps this from daniweb would be helpful:
I'm getting this error when reading from a csv file: "Runtime Error!
line contains NULL byte". Any idea about the root cause of this error?
...
Ok, I got it and thought I'd post the solution. Simply yet caused me
grief... Used file was saved in a .xls format instead of a .csv Didn't
catch this because the file name itself had the .csv extension while
the type was still .xls
A tricky way:
If you develop under Linux, you can use all the power of sed:
from subprocess import check_call, CalledProcessError
PATH_TO_FILE = '/home/user/some/path/to/file.csv'
try:
    check_call("sed -i -e 's|\\x0||g' {}".format(PATH_TO_FILE), shell=True)
except CalledProcessError as err:
    print(err)
The most efficient solution for huge files.
Checked for Python3, Kubuntu
def fix_nulls(s):
    for line in s:
        yield line.replace('\0', '')
with open(csv_file, 'r', encoding = "utf-8") as f:
    reader = csv.reader(fix_nulls(f))
    for line in reader:
        pass  # do something
This way works for me.
I've recently fixed this issue and in my instance it was a file that was compressed that I was trying to read. Check the file format first. Then check that the contents are what the extension refers to.
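Checking the file format can be partly automated by looking at the first few bytes. This is a sketch covering only three well-known signatures; the function name sniff_container is invented, and a file with none of these signatures is not guaranteed to be valid CSV.

```python
def sniff_container(head):
    """Guess from the first bytes whether a '.csv' is really something else."""
    signatures = [
        (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. legacy .xls)"),
        (b"PK\x03\x04", "ZIP container (e.g. .xlsx)"),
        (b"\x1f\x8b", "gzip-compressed data"),
    ]
    for magic, label in signatures:
        if head.startswith(magic):
            return label
    return "no known binary signature"


# With a real file: head = open("input.csv", "rb").read(8)
print(sniff_container(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1rest"))
print(sniff_container(b"PK\x03\x04rest"))
print(sniff_container(b"word,count\n"))
```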
Turning my Linux environment into a clean, complete UTF-8 environment did the trick for me.
Try the following in your command line:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
This is long settled, but I ran across this answer because I was experiencing an unexpected error while reading a CSV to process as training data in Keras and TensorFlow.
In my case, the issue was much simpler, and is worth being conscious of. The data being produced into the CSV wasn't consistent, resulting in some columns being completely missing, which seems to end up throwing this error as well.
The lesson: If you're seeing this error, verify that your data looks the way that you think it does!
pandas.read_csv now handles the different UTF encodings when reading, so it can deal directly with the null bytes of a UTF-16 file:
import pandas as pd
data = pd.read_csv(file, encoding='utf-16')
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
To skip the rows that contain NULL bytes:
import csv
with open('sample.csv', newline='') as csv_file:
    reader = csv.reader(csv_file)
    while True:
        try:
            row = next(reader)
            print(row)
        except csv.Error:
            continue
        except StopIteration:
            break
The above information is great. I had this same error, and my fix was easy: it was just user error on my part. Simply save the file as a CSV and not as an Excel file.
It is very simple.
Don't make the CSV file via "create new Excel" or by saving as ".csv" from Windows.
Simply import the csv module, write a dummy CSV file, and then paste your data into it.
A CSV made by Python's csv module itself will no longer give you the encoding or blank-line errors.
