So I'm trying to run a scrape of a website. The scraper runs very well. But whenever I try to write the scraped information/rows into the csv file it deletes the previous row. I end up just having the very last scrape result in the file at the end. I'm sure it's just an indentation error? I'm still new to Python so any help would be appreciated!
Code:
# create general field names
fields = ['name', 'about', 'job_title', 'location','company',
          'education','accomplishments','linkedin_url']
with open('ScrapeResults.csv', 'w') as f:
    # using csv.writer method from CSV package
    write = csv.writer(f)
    write.writerow(fields)
    f.close()
# Loop-through urls to scrape multiple pages at once
for individual,link in contact_dict.items():
    ## assign ##
    the_name = individual
    the_link = link
    # scrape peoples url:
    person = Person(the_link, driver=driver, close_on_complete=False)
    # rows to be written
    rows = [[person.name, person.about, person.job_title, person.location, person.company,
             person.educations, person.accomplishments, person.linkedin_url]]
    # write
    with open('ScrapeResults.csv', 'w') as f:
        # using csv.writer method from CSV package
        write = csv.writer(f)
        write.writerows(rows)
        f.close()
You will need to open the file in append mode inside the for loop.
Change
with open('ScrapeResults.csv', 'w') as f:
to
with open('ScrapeResults.csv', 'a') as f:
I think you're overwriting the file instead of appending to it. Append mode in the for loop should fix the issue:
with open('ScrapeResults.csv', "a") as f:
Please refer to the Python I/O documentation: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
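A minimal sketch of that fix, assuming the scraped data is already available as plain lists (the `scraped_rows` values and the temporary output path are hypothetical stand-ins for the `Person` objects in the question). The header is written once in 'w' mode, and every row is then appended in 'a' mode:

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for the scraped results in the question.
fields = ['name', 'job_title']
scraped_rows = [['Ann', 'Engineer'], ['Bob', 'Analyst'], ['Cy', 'Designer']]

out_path = os.path.join(tempfile.mkdtemp(), 'ScrapeResults.csv')

# 'w' truncates the file, so use it only once, for the header.
with open(out_path, 'w', newline='') as f:
    csv.writer(f).writerow(fields)

for row in scraped_rows:
    # 'a' keeps the existing content and adds the new row at the end.
    with open(out_path, 'a', newline='') as f:
        csv.writer(f).writerow(row)

# Read the file back to confirm that every row survived.
with open(out_path, newline='') as f:
    result = list(csv.reader(f))
```

Opening the file once before the loop and keeping it open would work just as well and avoids reopening it for every row.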
You have to open the file in append mode.
Replace
with open('ScrapeResults.csv', 'w') as f:
with
with open('ScrapeResults.csv', 'a') as f:
Related
I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the file "test.xml" is appended to, i.e. I have the old content followed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?
You need to seek to the beginning of the file before writing and then use file.truncate() if you want to do an in-place replace:
import re
myfile = "path/test.xml"
with open(myfile, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
The other way is to read the file then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
    data = f.read()
with open(myfile, "w") as f:
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4).
By the way, this is not really related to Python. The interpreter calls the corresponding low level API. The method truncate() works the same in the C programming language: See http://man7.org/linux/man-pages/man2/truncate.2.html
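To illustrate why truncate() matters here, a small sketch (the file name and contents are made up for the demonstration, and it uses a temporary directory): writing shorter content over longer content leaves the old tail behind unless the file is truncated. It also compares the inode number before and after, which I would expect to stay the same on a typical local filesystem.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('old content that is long')

inode_before = os.stat(path).st_ino

# Without truncate(): the shorter text overwrites only the start of the file.
with open(path, 'r+') as f:
    f.seek(0)
    f.write('new')
with open(path) as f:
    without_truncate = f.read()

# With truncate(): everything after the current position is cut off.
with open(path, 'r+') as f:
    f.seek(0)
    f.write('new')
    f.truncate()
with open(path) as f:
    with_truncate = f.read()

inode_after = os.stat(path).st_ino
```

After the first rewrite the file still ends with the leftover text from the old content; after the second it contains only the new text.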
file = 'path/test.xml'
with open(file, 'w') as filetowrite:
    filetowrite.write('new content')
Open the file in 'w' mode and you will be able to replace its current text and save the file with the new contents.
Using truncate(), the solution could be
import re
#open the xml file for reading:
with open('path/test.xml','r+') as f:
    #convert to string:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
    f.truncate()
import os  # must import this library
if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')  # this deletes the file
else:
    print("The file does not exist")  # add this to prevent errors
I had a similar problem, and instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I was appending to a new file on each run of my code.
The answer from "How to Replace String in File" works in a simple way and uses replace:
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")
for line in fin:
    fout.write(line.replace('pyton', 'python'))
fin.close()
fout.close()
In my case, the following code did the trick:
import json
with open("output.json", "w+") as outfile:  # "w+" creates the file if it does not exist and overwrites any existing content
    json.dump(result_plot, outfile)
Using the Python 3 pathlib library:
import re
from pathlib import Path
import shutil
shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak") # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
A similar method using a different approach to backups:
import re
from pathlib import Path
filepath = Path("/tmp/test.xml")
backup = filepath.with_suffix('.bak')
filepath.rename(backup)  # different approach to backups: move the original aside
content = backup.read_text()  # the original path no longer exists after rename()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
So, I'm trying to make a function to delete a row from my csv depending on the Name given by the parameter.
Original File:
Janet,5,cats
Wilson,67,dogs
Karen,8,mice
John,12,birds
My Code:
csv_remove("Karen")
Intended File:
Janet,5,cats
Wilson,67,dogs
John,12,birds
However, when I execute my code, I get weird newlines everywhere.
Janet,5,cats

Wilson,67,dogs

John,12,birds
Here is the full code:
def csv_remove(name):
    element_list = []
    with open(csv_path, 'r') as j:
        csv_file = csv.reader(j)
        for row in csv_file:
            element_list.append(row)
            if row[0] == name:
                element_list.remove(row)
    with open(csv_path, 'w') as j:
        csv_file = csv.writer(j)
        csv_file.writerows(element_list)
csv_remove("Karen")
Read the documentation. When opening a file for writing using the csv module, you need to supply newline="":
Source: https://docs.python.org/3/library/csv.html#csv.writer
csv.writer(csvfile, dialect='excel', **fmtparams)
Return a writer object responsible for converting the user’s data into delimited strings on the given file-like object. csvfile can be any object with a write() method. If csvfile is a file object, it should be opened with newline=''
The csv module handles newlines itself. If you do not specify newline='' for the opened file, the line endings get translated a second time (on Windows, the writer's '\r\n' becomes '\r\r\n') and you end up with empty lines in the file.
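Applied to the function from the question, this could look like the sketch below. Note two assumptions on my part: `csv_path` is passed in as a parameter instead of being a global, and the sample file is created in a temporary directory so the example is self-contained.

```python
import csv
import os
import tempfile

def csv_remove(csv_path, name):
    """Rewrite csv_path without the rows whose first column equals name."""
    with open(csv_path, 'r', newline='') as f:
        kept = [row for row in csv.reader(f) if row and row[0] != name]
    # newline='' lets the csv module control the line endings itself.
    with open(csv_path, 'w', newline='') as f:
        csv.writer(f).writerows(kept)

# Build the sample file from the question (hypothetical temporary path).
csv_path = os.path.join(tempfile.mkdtemp(), 'pets.csv')
with open(csv_path, 'w', newline='') as f:
    f.write('Janet,5,cats\r\nWilson,67,dogs\r\nKaren,8,mice\r\nJohn,12,birds\r\n')

csv_remove(csv_path, 'Karen')

# Read the raw text back, without newline translation, to inspect it.
with open(csv_path, newline='') as f:
    raw = f.read()
```

Filtering while reading also avoids the append-then-remove step in the original loop.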
I am looking for some assistance with writing API results to a .CSV file using Python.
I have my source as CSV file. It contains the below urls in a column as separate rows.
https://webapi.nhtsa.gov/api/SafetyRatings/modelyear/2013/make/Acura/model/rdx?format=csv
https://webapi.nhtsa.gov/api/SafetyRatings/modelyear/2017/make/Chevrolet/model/Corvette?format=csv
I can call the Web API and get the printed results. Please find attached 'Web API results' snapshot.
When I try to export these results to a CSV file, I get output like the attached 'API results csv'. Not all the records are transferred; right now, only the last record is written to the CSV.
My final output should be as per the attached 'My final output should be' for all the given inputs.
Please find below the Python code that I have used; it is also attached as an image. I appreciate your help on this.
import csv, requests
with open('C:/Desktop/iva.csv',newline ='') as f:
    reader = csv.reader(f)
    for row in reader:
        urls = row[0]
        print(urls)
        r = requests.get(urls)
        print (r.text)
        with open('C:/Desktop/ivan.csv', 'w') as csvfile:
            csvfile.write(r.text)
You'll have to create a writer object for the output csvfile and write each row with its writerow() method.
import csv, requests
with open('C:/Desktop/iva.csv', newline='') as f, \
        open('C:/Desktop/ivan.csv', 'w', newline='') as csvfile:
    reader = csv.reader(f)
    writerobj = csv.writer(csvfile)  # the writer wraps the output file, not the response text
    for row in reader:
        urls = row[0]
        print(urls)
        r = requests.get(urls)
        print(r.text)
        # parse the CSV text returned by the API and write it row by row
        for line in csv.reader(r.text.splitlines()):
            writerobj.writerow(line)
One problem in your code is that every time you open a file using open and mode w, any existing content in that file will be lost. You could prevent that by using append mode open(filename, 'a') instead.
Even better, just open the output file once, outside the for loop.
import csv, requests
with open('iva.csv') as infile, open('ivan.csv', 'w') as outfile:
    reader = csv.reader(infile)
    for row in reader:
        r = requests.get(row[0])  # the URL is in the first column of each row
        outfile.write(r.text)
I'm trying to print a list or dict of file names into a text file. It's currently only returning the first item on the list. The items are fetched from AWS S3. I'm using Python 2.6.
for obj in bucket.objects.filter(Prefix=prefix):
    s = obj.key
    with open('test.txt', 'w') as f:
        f.write(s)
The problem here is that for every item, you create a new file (in case the file already exists, you remove the content so to speak), and then write s to it.
So you should swap the order of things here:
with open('test.txt', 'w') as f:  # first open the file
    for obj in bucket.objects.filter(Prefix=prefix):  # then iterate
        f.write(obj.key)
So we keep the file handle open, and each item will be written. A potential problem is that no newline is written after the key of each object, so all keys end up on one line. We can fix this by writing a newline as well:
with open('test.txt', 'w') as f:
    for obj in bucket.objects.filter(Prefix=prefix):
        f.write(obj.key)
        f.write('\n')
Whenever you open a file for writing, the previous content is erased and new text is written. So in this case, each iteration erases whatever the previous one wrote to the file. You can do it this way, or open the file in "append" mode and continue with what you have written.
f = open("test.txt", "w")
for obj in bucket.objects.filter(Prefix=prefix):
    s = obj.key
    f.write(s)
    f.write('\n')
f.close()
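The "append" variant mentioned above could be sketched like this. The `keys` list is a hypothetical stand-in for the keys returned by bucket.objects.filter(...), and the file lives in a temporary directory so the example runs on its own:

```python
import os
import tempfile

# Hypothetical stand-in for the object keys fetched from the bucket.
keys = ['logs/a.txt', 'logs/b.txt', 'logs/c.txt']

path = os.path.join(tempfile.mkdtemp(), 'test.txt')

for key in keys:
    # 'a' creates the file if needed and always writes at the end.
    with open(path, 'a') as f:
        f.write(key + '\n')

# Read the file back to check that every key was kept.
with open(path) as f:
    lines = f.read().splitlines()
```

One thing to keep in mind with append mode: the file keeps growing across runs of the script unless it is cleared or deleted first.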
I am trying to write to a .tsv file using python's CSV module, this is my code so far
file_name = "test.tsv"
TEMPLATE = "template.tsv"
fil = open(file_name, "w")
# Added suggested change
template = csv.DictReader(open(TEMPLATE, 'r'), delimiter='\t')
new_file = csv.DictWriter(fil, fieldnames=template.fieldnames, delimiter='\t')
new_file.writeheader()
Basically, TEMPLATE is a file that will contain the headers for the file, so I read the headers using DictReader and pass the fieldnames to DictWriter. As far as I know the code is fine; the file test.tsv is being created, but for some reason the headers are not being written.
Any help as to why this is happening is appreciated, thanks.
DictReader's first argument should be a file object (created with open()), cf. http://docs.python.org/py3k/library/csv.html#csv.DictReader
You forgot open() for the TEMPLATE file.
import csv
file_name = "test.tsv"
TEMPLATE = "template.tsv"
fil = open(file_name, "w")
# you forgot this line, which will open the file
template_file = open(TEMPLATE, 'r')
template = csv.DictReader(template_file, delimiter='\t')
new_file = csv.DictWriter(fil, fieldnames=template.fieldnames, delimiter='\t')
new_file.writeheader()
Try to give DictReader opened file instead of file name:
csv.DictReader(open(TEMPLATE, 'r'), delimiter='\t')
Same for the writer, but opened for writing.
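Putting both together, here is a sketch that passes opened files to the reader and the writer and uses with blocks. The temporary paths and the template contents are made up so the example is self-contained. Closing the output file also flushes the header to disk, which may matter if the file otherwise looks empty while the script still holds it open.

```python
import csv
import os
import tempfile

workdir = tempfile.mkdtemp()
template_path = os.path.join(workdir, 'template.tsv')  # hypothetical paths
output_path = os.path.join(workdir, 'test.tsv')

# Create a small template so the sketch is self-contained.
with open(template_path, 'w', newline='') as f:
    f.write('id\tname\tscore\n')

with open(template_path, 'r', newline='') as template_file, \
        open(output_path, 'w', newline='') as out_file:
    template = csv.DictReader(template_file, delimiter='\t')
    writer = csv.DictWriter(out_file, fieldnames=template.fieldnames,
                            delimiter='\t')
    writer.writeheader()
# Leaving the with block closes both files, which flushes the header to disk.

# Read the raw header line back without newline translation.
with open(output_path, newline='') as f:
    header = f.read()
```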