Delete empty row in XML File - python

When creating an XML file, it always creates blank lines for me.
This code looks like this:
for row in tbody.find_elements_by_xpath('./tr'):
itemsEmployee = row.find_elements_by_xpath('./td')
fileWriter.writerow([itemsEmployee[1].text, itemsEmployee[5].text, itemsEmployee[2].text, itemsEmployee[3].text,
itemsEmployee[4].text, itemsEmployee[6].text, itemsEmployee[7].text, itemsEmployee[8].text])
First of all, I don't know why I get blank lines. But anyway.
I now want to delete the empty lines and save the XML. (In a new file)
My attempt was as follows:
def deleteEmptyRowsInXML():
input = open('../data/employees_csv.csv', 'rb')
output = open('../data/employees.csv', 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
if row:
writer.writerow(row)
input.close()
os.remove('../data/employees_csv.csv')
output.close()
I would also like a solution in the same file.
Get the error:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
in this line:
for row in csv.reader(input):

A csv writer expects its underlying file to be opened with newline=''. The rationale is that RFC 4180 mandates that a csv file should have '\r\n' as end of line independently on which system it is generated. So the csv module explicitely adds the \r\n, but if you forgot newline='' you get an empty line for each row.
So it should be: output = open('../data/employees.csv', 'w', newline='')

The error message says that the file was probably not opened in text mode.
And in fact you opened it in binary mode : "rb" means "read file in binary mode". And "wb" means "write file in binary mode"
So change to this:
input = open('../data/employees_csv.csv', 'r')
output = open('../data/employees.csv', 'w')
But that's possible that you'll have other errors too. For the moment, I can't say more cause we don't have a reproducible example. but it will perhaps be enough to change the lines I pointed.

Related

Python problem reading CSV files that contain the word NUL [duplicate]

I'm working with some CSV files, with the following code:
reader = csv.reader(open(filepath, "rU"))
try:
for row in reader:
print 'Row read successfully!', row
except csv.Error, e:
sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
And one file is throwing this error:
file my.csv, line 1: line contains NULL byte
What can I do? Google seems to suggest that it may be an Excel file that's been saved as a .csv improperly. Is there any way I can get round this problem in Python?
== UPDATE ==
Following #JohnMachin's comment below, I tried adding these lines to my script:
print repr(open(filepath, 'rb').read(200)) # dump 1st 200 bytes of file
data = open(filepath, 'rb').read()
print data.find('\x00')
print data.count('\x00')
And this is the output I got:
'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\ .... <snip>
8
13834
So the file does indeed contain NUL bytes.
As #S.Lott says, you should be opening your files in 'rb' mode, not 'rU' mode. However that may NOT be causing your current problem. As far as I know, using 'rU' mode would mess you up if there are embedded \r in the data, but not cause any other dramas. I also note that you have several files (all opened with 'rU' ??) but only one causing a problem.
If the csv module says that you have a "NULL" (silly message, should be "NUL") byte in your file, then you need to check out what is in your file. I would suggest that you do this even if using 'rb' makes the problem go away.
repr() is (or wants to be) your debugging friend. It will show unambiguously what you've got, in a platform independant fashion (which is helpful to helpers who are unaware what od is or does). Do this:
print repr(open('my.csv', 'rb').read(200)) # dump 1st 200 bytes of file
and carefully copy/paste (don't retype) the result into an edit of your question (not into a comment).
Also note that if the file is really dodgy e.g. no \r or \n within reasonable distance from the start of the file, the line number reported by reader.line_num will be (unhelpfully) 1. Find where the first \x00 is (if any) by doing
data = open('my.csv', 'rb').read()
print data.find('\x00')
and make sure that you dump at least that many bytes with repr or od.
What does data.count('\x00') tell you? If there are many, you may want to do something like
for i, c in enumerate(data):
if c == '\x00':
print i, repr(data[i-30:i]) + ' *NUL* ' + repr(data[i+1:i+31])
so that you can see the NUL bytes in context.
If you can see \x00 in the output (or \0 in your od -c output), then you definitely have NUL byte(s) in the file, and you will need to do something like this:
fi = open('my.csv', 'rb')
data = fi.read()
fi.close()
fo = open('mynew.csv', 'wb')
fo.write(data.replace('\x00', ''))
fo.close()
By the way, have you looked at the file (including the last few lines) with a text editor? Does it actually look like a reasonable CSV file like the other (no "NULL byte" exception) files?
data_initial = open("staff.csv", "rb")
data = csv.reader((line.replace('\0','') for line in data_initial), delimiter=",")
This works for me.
Reading it as UTF-16 was also my problem.
Here's my code that ended up working:
f=codecs.open(location,"rb","utf-16")
csvread=csv.reader(f,delimiter='\t')
csvread.next()
for row in csvread:
print row
Where location is the directory of your csv file.
You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.
with open(filepath, "rb") as f:
reader = csv.reader( (line.replace('\0','') for line in f) )
try:
for row in reader:
print 'Row read successfully!', row
except csv.Error, e:
sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
I bumped into this problem as well. Using the Python csv module, I was trying to read an XLS file created in MS Excel and running into the NULL byte error you were getting. I looked around and found the xlrd Python module for reading and formatting data from MS Excel spreadsheet files. With the xlrd module, I am not only able to read the file properly, but I can also access many different parts of the file in a way I couldn't before.
I thought it might help you.
Converting the encoding of the source file from UTF-16 to UTF-8 solve my problem.
How to convert a file to utf-8 in Python?
import codecs
BLOCKSIZE = 1048576 # or some other, desired size in bytes
with codecs.open(sourceFileName, "r", "utf-16") as sourceFile:
with codecs.open(targetFileName, "w", "utf-8") as targetFile:
while True:
contents = sourceFile.read(BLOCKSIZE)
if not contents:
break
targetFile.write(contents)
Why are you doing this?
reader = csv.reader(open(filepath, "rU"))
The docs are pretty clear that you must do this:
with open(filepath, "rb") as src:
reader= csv.reader( src )
The mode must be "rb" to read.
http://docs.python.org/library/csv.html#csv.reader
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
appparently it's a XLS file and not a CSV file as http://www.garykessler.net/library/file_sigs.html confirm
Instead of csv reader I use read file and split function for string:
lines = open(input_file,'rb')
for line_all in lines:
line=line_all.replace('\x00', '').split(";")
I got the same error. Saved the file in UTF-8 and it worked.
This happened to me when I created a CSV file with OpenOffice Calc. It didn't happen when I created the CSV file in my text editor, even if I later edited it with Calc.
I solved my problem by copy-pasting in my text editor the data from my Calc-created file to a new editor-created file.
I had the same problem opening a CSV produced from a webservice which inserted NULL bytes in empty headers. I did the following to clean the file:
with codecs.open ('my.csv', 'rb', 'utf-8') as myfile:
data = myfile.read()
# clean file first if dirty
if data.count( '\x00' ):
print 'Cleaning...'
with codecs.open('my.csv.tmp', 'w', 'utf-8') as of:
for line in data:
of.write(line.replace('\x00', ''))
shutil.move( 'my.csv.tmp', 'my.csv' )
with codecs.open ('my.csv', 'rb', 'utf-8') as myfile:
myreader = csv.reader(myfile, delimiter=',')
# Continue with your business logic here...
Disclaimer:
Be aware that this overwrites your original data. Make sure you have a backup copy of it. You have been warned!
I opened and saved the original csv file as a .csv file through Excel's "Save As" and the NULL byte disappeared.
I think the original encoding for the file I received was double byte unicode (it had a null character every other character) so saving it through excel fixed the encoding.
For all those 'rU' filemode haters: I just tried opening a CSV file from a Windows machine on a Mac with the 'rb' filemode and I got this error from the csv module:
Error: new-line character seen in unquoted field - do you need to
open the file in universal-newline mode?
Opening the file in 'rU' mode works fine. I love universal-newline mode -- it saves me so much hassle.
I encountered this when using scrapy and fetching a zipped csvfile without having a correct middleware to unzip the response body before handing it to the csvreader. Hence the file was not really a csv file and threw the line contains NULL byte error accordingly.
Have you tried using gzip.open?
with gzip.open('my.csv', 'rb') as data_file:
I was trying to open a file that had been compressed but had the extension '.csv' instead of 'csv.gz'. This error kept showing up until I used gzip.open
One case is that - If the CSV file contains empty rows this error may show up. Check for row is necessary before we proceed to write or read.
for row in csvreader:
if (row):
do something
I solved my issue by adding this check in the code.

Trying to import a list of words using csv (Python 2.7)

import csv, Tkinter
with open('most_common_words.csv') as csv_file: # Opens the file in a 'closure' so that when it's finished it's automatically closed"
csv_reader = csv.reader(csv_file) # Create a csv reader instance
for row in csv_reader: # Read each line in the csv file into 'row' as a list
print row[0] # Print the first item in the list
I'm trying to import this list of most common words using csv. It continues to give me the same error
for row in csv_reader: # Read each line in the csv file into 'row' as a list
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
I've tried a couple different ways to do it as well, but they didn't work either. Any suggestions?
Also, where does this file need to be saved? Is it okay just being in the same folder as the program?
You should always open a CSV file in binary mode (Python 2) or universal newline mode (Python 3). Also, make sure that the delimiters and quote characters are , and ", or you'll need to specify otherwise:
with open('most_common_words.csv', 'rb') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=';', quotechar='"') # for EU CSV
You can save the file in the same folder as your program. If you don't, you can provide the correct path to open() as well. Be sure to use raw strings if you're on Windows, otherwise the backslashes may trick you: open(r"C:\Python27\data\table.csv")
It seems you have a file with one column as you say here:
It is a simple list of words. When I open it up, it opens into Excel
with one column and 500 rows of 500 different words.
If so, you don't need the csv module at all:
with open('most_common_words.csv') as f:
rows = list(f)
Note in this case, each item of the list will have the newline appended to it, so if your file is:
apple
dog
cat
rows will be ['apple\n', 'dog\n', 'cat\n']
If you want to strip the end of line, then you can do this:
with open('most_common_words.csv') as f:
rows = list(i.rstrip() for i in f)

How to create a text file in python without csv.writer?

I need a comma seperated txt file with txt extension.
"a,b,c"
I used csv.writer to create a csv file changed the extension. Another prog would not use/process the data. I tried "wb", "w."
F = open(Fn, 'w')
w = csv.writer(F)
w.writerow(sym)
F.close()
opened with notepad ---These are the complete files.
Their file: created using their gui used three symbols
PDCO,ICUI,DVA
my file : created using python
PDCO,ICUI,DVA
Tested: open thier file- worked, opened my file - failed.
Simple open and close with save in notepad. open my file-- worked
Works= 'PDCO,ICUI,DVA'
Fails= 'PDCO,ICUI,DVA\r\r\n'
Edit: writing txt file without Cvs writer.....
sym = ['MHS','MRK','AIG']
with open(r'C:\filename.txt', 'w') as F: # also try 'w'
for s in sym[:-1]: # separate all but the last
F.write(s + ',') # symbols with commas
F.write(sym[-1]) # end with the last symbol
To me, it look like you don't exactly know you third party application input format. If a .CSV isn't reconized, it might be something else.
Did you try to change the delimiter fromn ';' to ','
import csv
spamWriter = csv.writer(open('eggs.csv', 'wb'), delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
Take a look in the CSV Python API
I think the problem is your file write mode, as per CSV file written with Python has blank lines between each row
If you create your csv file like
csv.writer(open('myfile.csv', 'w'))
csv.writer ends its lines in '\r\n', and Python's text file handling (on Windows machines) then converts '\n' to '\r\n', resulting in lines ending in '\r\r\n'. Many programs will choke on this; Notepad recognizes it as a problem and strips the extra '\r' out.
If you use
csv.writer(open('myfile.csv', 'wb'))
it produces the expected '\r\n' line ending, which should work as desired.
Edit: #senderle has a good point; try the following:
goodf = open('file_that_works.txt', 'rb')
print repr(goodf.read(100))
badf = open('file_that_fails.txt', 'rb')
print repr(badf.read(100))
paste the results of that here, so we can see how the two compare byte-for-byte.
Try this:
with open('file_that_works.csv', 'rb') as testfile: # file is automatically
d = csv.Sniffer().sniff(testfile.read(1024)) # closed at end of with
# block
with open(Fn, 'wb') as F: # also try 'w'
w = csv.writer(F, dialect=d)
w.writerow(sym)
To explain further: this looks at a sample of a working .csv file and deduces its format. Then it uses that format to write a new .csv file that, hopefully, will not have to be resaved in notepad.
Edit: if the program you're using doesn't accept multi-line input (?!) then don't use csv. Just do something like this:
syms = ['JAGHS','GJKDGJ','GJDFAJ']
with open('filename.txt', 'wb') as F:
for s in syms[:-1]: # separate all but the last
F.write(s + ',') # symbols with commas
F.write(syms[-1]) # end with the last symbol
Or more tersely:
with open('filename.txt', 'wb') as F:
F.write(','.join(syms))
Also, check different file extensions (i.e. .txt, .csv, etc) to make sure that's not the problem. If this program chokes on a newline, then anything is possible.
So, I save as text file.
Now, create my own txt file with python.
What are the exact differences between their file and your file? Exact.
I suspect that #Hugh's comment is correct that it's an encoding issue.
When you do a Save As in notepad, what's selected in the Encoding dropdown? If you select different encodings do some or all of those fail to be opened by the 3rd party program?

How to seek and append to a binary file in python?

I am having problems appending data to a binary file. When i seek() to a location, then write() at that location and then read the whole file, i find that the data was not written at the location that i wanted. Instead, i find it right after every other data/text.
My code
file = open('myfile.dat', 'wb')
file.write('This is a sample')
file.close()
file = open('myfile.dat', 'ab')
file.seek(5)
file.write(' text')
file.close()
file = open('myfile.dat', 'rb')
print file.read() # -> This is a sample text
You can see that the seek does not work. How do i resolve this? are there other ways of achieving this?
Thanks
On some systems, 'ab' forces all writes to happen at the end of the file. You probably want 'r+b'.
r+b should work as you wish
Leave out the seek command. You already opened the file for append with 'a'.
NOTE : Remember new bytes over write previous bytes
As per python 3 syntax
with open('myfile.dat', 'wb') as file:
b = bytearray(b'This is a sample')
file.write(b)
with open('myfile.dat', 'rb+') as file:
file.seek(5)
b1 = bytearray(b' text')
#remember new bytes over write previous bytes
file.write(b1)
with open('myfile.dat', 'rb') as file:
print(file.read())
OUTPUT
b'This textample'
remember new bytes over write previous bytes

How to append a new row to an old CSV file in Python?

I am trying to add a new row to my old CSV file. Basically, it gets updated each time I run the Python script.
Right now I am storing the old CSV rows values in a list and then deleting the CSV file and creating it again with the new list value.
I wanted to know are there any better ways of doing this.
with open('document.csv','a') as fd:
fd.write(myCsvRow)
Opening a file with the 'a' parameter allows you to append to the end of the file instead of simply overwriting the existing content. Try that.
I prefer this solution using the csv module from the standard library and the with statement to avoid leaving the file open.
The key point is using 'a' for appending when you open the file.
import csv
fields=['first','second','third']
with open(r'name', 'a') as f:
writer = csv.writer(f)
writer.writerow(fields)
If you are using Python 2.7 you may experience superfluous new lines in Windows. You can try to avoid them using 'ab' instead of 'a' this will, however, cause you TypeError: a bytes-like object is required, not 'str' in python and CSV in Python 3.6. Adding the newline='', as Natacha suggests, will cause you a backward incompatibility between Python 2 and 3.
Based in the answer of #G M and paying attention to the #John La Rooy's warning, I was able to append a new row opening the file in 'a'mode.
Even in windows, in order to avoid the newline problem, you must declare it as newline=''.
Now you can open the file in 'a'mode (without the b).
import csv
with open(r'names.csv', 'a', newline='') as csvfile:
fieldnames = ['This','aNew']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writerow({'This':'is', 'aNew':'Row'})
I didn't try with the regular writer (without the Dict), but I think that it'll be ok too.
If you use pandas, you can append your dataframes to an existing CSV file this way:
df.to_csv('log.csv', mode='a', index=False, header=False)
With mode='a' we ensure that we append, rather than overwrite, and with header=False we ensure that we append only the values of df rows, rather than header + values.
Are you opening the file with mode of 'a' instead of 'w'?
See Reading and Writing Files in the python docs
7.2. Reading and Writing Files
open() returns a file object, and is most commonly used with two arguments: open(filename, mode).
>>> f = open('workfile', 'w')
>>> print f <open file 'workfile', mode 'w' at 80a0960>
The first argument is a string containing the filename. The second argument is
another string containing a few characters describing the way in which
the file will be used. mode can be 'r' when the file will only be
read, 'w' for only writing (an existing file with the same name will
be erased), and 'a' opens the file for appending; any data written to
the file is automatically added to the end. 'r+' opens the file for
both reading and writing. The mode argument is optional; 'r' will be
assumed if it’s omitted.
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
If the file exists and contains data, then it is possible to generate the fieldname parameter for csv.DictWriter automatically:
# read header automatically
with open(myFile, "r") as f:
reader = csv.reader(f)
for header in reader:
break
# add row to CSV file
with open(myFile, "a", newline='') as f:
writer = csv.DictWriter(f, fieldnames=header)
writer.writerow(myDict)
I use the following approach to append a new line in a .csv file:
pose_x = 1
pose_y = 2
with open('path-to-your-csv-file.csv', mode='a') as file_:
file_.write("{},{}".format(pose_x, pose_y))
file_.write("\n") # Next line.
[NOTE]:
mode='a' is append mode.
# I like using the codecs opening in a with
field_names = ['latitude', 'longitude', 'date', 'user', 'text']
with codecs.open(filename,"ab", encoding='utf-8') as logfile:
logger = csv.DictWriter(logfile, fieldnames=field_names)
logger.writeheader()
# some more code stuff
for video in aList:
video_result = {}
video_result['date'] = video['snippet']['publishedAt']
video_result['user'] = video['id']
video_result['text'] = video['snippet']['description'].encode('utf8')
logger.writerow(video_result)

Categories