I am trying to write to a .tsv file using Python's csv module. This is my code so far:
import csv

file_name = "test.tsv"
TEMPLATE = "template.tsv"
fil = open(file_name, "w")
# Added suggested change
template = csv.DictReader(open(TEMPLATE, 'r'), delimiter='\t')
new_file = csv.DictWriter(fil, fieldnames=template.fieldnames, delimiter='\t')
new_file.writeheader()
Basically, TEMPLATE is a file that contains the headers for the new file, so I read the headers using DictReader and pass the fieldnames to DictWriter. As far as I know the code is fine; the file test.tsv is being created, but for some reason the headers are not being written.
Any help as to why this is happening is appreciated, thanks.
DictReader's first argument should be a file object (created with open()), cf. http://docs.python.org/py3k/library/csv.html#csv.DictReader
You forgot open() for the TEMPLATE file.
import csv
file_name = "test.tsv"
TEMPLATE = "template.tsv"
fil = open(file_name, "w")
# you forgot this line, which will open the file
template_file = open(TEMPLATE, 'r')
template = csv.DictReader(template_file, delimiter='\t')
new_file = csv.DictWriter(fil, fieldnames=template.fieldnames, delimiter='\t')
new_file.writeheader()
Try giving DictReader an opened file instead of a file name:
csv.DictReader(open(TEMPLATE, 'r'), delimiter='\t')
Same for the writer, but opened for writing.
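For completeness, here is a minimal sketch of the whole flow using with blocks (same file names as in the question). Closing the output file, which with does automatically, is also what flushes the header to disk:

import csv

file_name = "test.tsv"
TEMPLATE = "template.tsv"

# read the header row from the template file
with open(TEMPLATE, 'r', newline='') as template_file:
    fieldnames = csv.DictReader(template_file, delimiter='\t').fieldnames

# write the header to the new file; leaving the with block closes and flushes it
with open(file_name, 'w', newline='') as fil:
    csv.DictWriter(fil, fieldnames=fieldnames, delimiter='\t').writeheader()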
So I'm trying to run a scrape of a website. The scraper runs very well, but whenever I try to write the scraped rows into the csv file it deletes the previous row, and I end up with just the very last scrape result in the file. I'm sure it's just an indentation error? I'm still new to Python, so any help would be appreciated!
Code:
# create general field names
fields = ['name', 'about', 'job_title', 'location', 'company',
          'education', 'accomplishments', 'linkedin_url']

with open('ScrapeResults.csv', 'w') as f:
    # using csv.writer method from CSV package
    write = csv.writer(f)
    write.writerow(fields)
f.close()

# Loop-through urls to scrape multiple pages at once
for individual, link in contact_dict.items():
    ## assign ##
    the_name = individual
    the_link = link

    # scrape peoples url:
    person = Person(the_link, driver=driver, close_on_complete=False)

    # rows to be written
    rows = [[person.name, person.about, person.job_title, person.location, person.company,
             person.educations, person.accomplishments, person.linkedin_url]]

    # write
    with open('ScrapeResults.csv', 'w') as f:
        # using csv.writer method from CSV package
        write = csv.writer(f)
        write.writerows(rows)
    f.close()
You will need to open the file in append mode.
Change
with open('ScrapeResults.csv', 'w') as f:
to
with open('ScrapeResults.csv', 'a') as f:
I think you're overwriting the file instead of appending to it. Append mode in the for loop should fix the issue:
with open('ScrapeResults.csv', 'a') as f:
Please refer to the Python I/O documentation: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
You have to open the file in append mode.
Change
with open('ScrapeResults.csv', 'w') as f:
to
with open('ScrapeResults.csv', 'a') as f:
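As a sketch, the corrected script could look like this (Person, driver and contact_dict are the asker's objects from the question, not standard library names): write the header once in 'w' mode, then append each row inside the loop:

import csv

fields = ['name', 'about', 'job_title', 'location', 'company',
          'education', 'accomplishments', 'linkedin_url']

# write the header once, truncating any previous results
with open('ScrapeResults.csv', 'w', newline='') as f:
    csv.writer(f).writerow(fields)

for individual, link in contact_dict.items():
    person = Person(link, driver=driver, close_on_complete=False)
    row = [person.name, person.about, person.job_title, person.location, person.company,
           person.educations, person.accomplishments, person.linkedin_url]
    # 'a' appends, so earlier rows survive each iteration
    with open('ScrapeResults.csv', 'a', newline='') as f:
        csv.writer(f).writerow(row)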
I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content in the file with the new content. However, when I execute my code, the file "test.xml" is appended to, i.e. I have the old content followed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?
You need to seek to the beginning of the file before writing, and then use file.truncate() if you want to do an in-place replace:
import re

myfile = "path/test.xml"

with open(myfile, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
The other way is to read the file first, then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
    data = f.read()

with open(myfile, "w") as f:
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4).
By the way, this is not really specific to Python; the interpreter calls the corresponding low-level API. The truncate() method works the same in the C programming language: see http://man7.org/linux/man-pages/man2/truncate.2.html
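That claim is easy to verify yourself; a small sketch (assuming a Unix-like filesystem where st_ino identifies the inode):

import os

path = "/tmp/test.xml"
before = os.stat(path).st_ino

with open(path, "w") as f:  # 'w' truncates the file in place
    f.write("new content")

after = os.stat(path).st_ino
print(before == after)  # True: same inode, contents rewritten in place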
file = 'path/test.xml'

with open(file, 'w') as filetowrite:
    filetowrite.write('new content')
Open the file in 'w' mode and you will be able to replace its current text, saving the file with the new contents.
Using truncate(), the solution could be:
import re

#open the xml file for reading:
with open('path/test.xml', 'r+') as f:
    #convert to string:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
import os  # must import this library

if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')  # this deletes the file
else:
    print("The file does not exist")  # add this to prevent errors
I had a similar problem; instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I were appending to a new file on each run of my code.
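Put together, a sketch of that delete-then-recreate pattern (TwitterDB.csv and the columns below are hypothetical placeholders):

import csv
import os

# start from a clean slate on every run
if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')

with open('TwitterDB.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['user', 'tweet'])  # hypothetical header for illustration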
See How to Replace String in File; it works in a simple way and is an answer that works with replace:
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")

for line in fin:
    fout.write(line.replace('pyton', 'python'))

fin.close()
fout.close()
In my case, the following code did the trick:
import json

# w+ mode creates the file if it does not exist and overwrites the existing content
with open("output.json", "w+") as outfile:
    json.dump(result_plot, outfile)
Using the Python 3 pathlib library:
import re
from pathlib import Path
import shutil
shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak") # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
A similar method, using a different approach to backups (note that the content has to be read from the backup path, since the rename moves the original there):
from pathlib import Path
import re

filepath = Path("/tmp/test.xml")
backup = filepath.with_suffix('.bak')
filepath.rename(backup)  # different approach to backups: move the original aside
content = backup.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
So, I'm trying to make a function to delete a row from my csv depending on the name given as the parameter.
Original File:
Janet,5,cats
Wilson,67,dogs
Karen,8,mice
John,12,birds
My Code:
csv_remove("Karen")
Intended File:
Janet,5,cats
Wilson,67,dogs
John,12,birds
However, when I execute my code, I get weird newlines everywhere.
Janet,5,cats

Wilson,67,dogs

John,12,birds
Here is the full code:
def csv_remove(name):
    element_list = []
    with open(csv_path, 'r') as j:
        csv_file = csv.reader(j)
        for row in csv_file:
            element_list.append(row)
            if row[0] == name:
                element_list.remove(row)
    with open(csv_path, 'w') as j:
        csv_file = csv.writer(j)
        csv_file.writerows(element_list)

csv_remove("Karen")
Read the documentation. When opening a file for writing with the csv module, you need to supply newline='':
Source: https://docs.python.org/3/library/csv.html#csv.writer
csv.writer(csvfile, dialect='excel', **fmtparams)
Return a writer object responsible for converting the user's data into delimited strings on the given file-like object. csvfile can be any object with a write() method. If csvfile is a file object, it should be opened with newline=''.
The csv module handles newlines itself. If you do not specify newline='' for the opened file, it will muck up the line endings and you will end up with empty lines in it.
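Applied to the code above, a minimal sketch of the fix (csv_path as in the question; the append/remove dance is also simplified into a plain filter):

import csv

def csv_remove(name):
    with open(csv_path, 'r', newline='') as j:
        # keep every row whose first column does not match
        element_list = [row for row in csv.reader(j) if row and row[0] != name]
    # newline='' lets the csv module control line endings, so no blank lines appear
    with open(csv_path, 'w', newline='') as j:
        csv.writer(j).writerows(element_list)

csv_remove("Karen")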
I am generating a number of csv files dynamically, using the following code:
import csv

fieldnames = ['foo1', 'foo2', 'foo3', 'foo4']

with open(csvfilepath, 'wb') as csvfile:
    csvwrite = csv.DictWriter(csvfile, delimiter=',', fieldnames=fieldnames)
    csvwrite.writeheader()
    for row in data:
        csvwrite.writerow(row)
To save space, I want to compress them.
Using the gzip module is quite easy:
with gzip.open("foo.gz", "w") as csvfile:
    csvwrite = csv.DictWriter(csvfile, delimiter=',', fieldnames=fieldnames)
    csvwrite.writeheader()
    for row in data:
        csvwrite.writerow(row)
But I want the file in 'zip' format.
I tried the zipfile module, but I am unable to write files directly into the zip archive.
Instead, I have to write the csv file to disk, compress it into a zip file using the following code, and then delete the csv file.
with ZipFile(zipfilepath, 'w') as zipfile:
    zipfile.write(csvfilepath, csvfilename, ZIP_DEFLATED)
How can I write a csv file directly to a compressed zip similar to gzip?
Use the cStringIO.StringIO object to imitate a file:
from cStringIO import StringIO
from zipfile import ZipFile, ZIP_DEFLATED
import csv

with ZipFile(your_zip_file, 'w', ZIP_DEFLATED) as zip_file:
    string_buffer = StringIO()
    writer = csv.writer(string_buffer)

    # Write data using the writer object.

    zip_file.writestr(filename + '.csv', string_buffer.getvalue())
Thanks kroolik, it's done with a little modification:
with ZipFile(your_zip_file, 'w', ZIP_DEFLATED) as zip_file:
    string_buffer = StringIO()
    csvwriter = csv.DictWriter(string_buffer, delimiter=',', fieldnames=fieldnames)
    csvwriter.writeheader()
    for row in cdrdata:
        csvwriter.writerow(row)
    zip_file.writestr(filename + '.csv', string_buffer.getvalue())
Having StringIO hold every byte in memory can be very memory-consuming.
Based on the zipfile module documentation, after creating a ZipFile object each individual file has to be opened. Like this:
with ZipFile('spam.zip') as myzip:
    with myzip.open('eggs.txt') as myfile:
        print(myfile.read())
This example can be used for writing as well...
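On Python 3 the same idea works with io.StringIO; a sketch, with fieldnames and the rows as placeholders (the real data would come from your generator):

import csv
import io
from zipfile import ZipFile, ZIP_DEFLATED

fieldnames = ['foo1', 'foo2', 'foo3', 'foo4']
data = [{'foo1': 1, 'foo2': 2, 'foo3': 3, 'foo4': 4}]  # placeholder rows

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(data)

# write the in-memory CSV straight into the archive, no temp file on disk
with ZipFile('foo.zip', 'w', ZIP_DEFLATED) as zf:
    zf.writestr('foo.csv', buffer.getvalue())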
I am trying to read and write on the same CSV file:
file1 = open('file.csv', 'rb')
file2 = open('file.csv', 'wb')
reader = csv.reader(file1)
writer = csv.writer(file2)

for row in reader:
    if row[2] == 'Test':
        writer.writerow([row[0], row[1], 'Somevalue'])
My csv file is:
val1,2323,Notest
val2, 2323,Test
So basically, if my row[2] value is 'Test', I want to replace it with some new value.
The above code gives me an empty CSV file.
You should use a different output file name. Even if you want the final name to be the same, you should write to some temporary name and rename the file at the end.
When you open a file in 'w' (or 'wb') mode, the file is "cleared" -- the whole file content disappears. The Python documentation for open() says:
... 'w' for only writing (an existing file with the same name will be erased), ...
So your file is erased before the csv functions start parsing it.
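A minimal sketch of that temp-file-and-rename approach (assuming Python 3, where os.replace swaps the files atomically on the same filesystem):

import csv
import os

with open('file.csv', 'r', newline='') as src, \
     open('file.csv.tmp', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if row[2] == 'Test':
            row = [row[0], row[1], 'Somevalue']
        writer.writerow(row)  # unchanged rows are copied through as-is

os.replace('file.csv.tmp', 'file.csv')  # put the new file in place of the old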
You can't open a file in both read and write modes at once.
Your code could be modified as follows:
# Do the reading
file1 = open('file.csv', 'rb')
reader = csv.reader(file1)
new_rows_list = []
for row in reader:
    if row[2] == 'Test':
        new_row = [row[0], row[1], 'Somevalue']
        new_rows_list.append(new_row)
    else:
        new_rows_list.append(row)  # keep non-matching rows unchanged
file1.close()  # <---IMPORTANT

# Do the writing
file2 = open('file.csv', 'wb')
writer = csv.writer(file2)
writer.writerows(new_rows_list)
file2.close()
As Jason points out, if your CSV is too big for your memory, then you'll need to write to a different filename and then rename it. This will likely be a bit slower.
If your csv file is small enough not to explode the memory, read it all into memory and close the file before opening it in write mode.
Or you should consider writing to a new file rather than the same one.
It is not possible to open the same file in two different modes in Python. You have to release one of the file pointers with file_name.close() before opening the file in another mode!