The following script for writing to a CSV file is going to run on a server which will automate the run.
d = {'col1': a, 'col2': b, 'col3': c}
df = pandas.DataFrame(d, index=[0])
with open('foo.csv', 'a') as f:
    df.to_csv(f, index=False)
The problem is that every time I run it, the header gets written to the CSV file again. How can I modify this code so the header is written only the first time it's run, and never after that?
Any help will be appreciated :)
Try this:
import os
filename = '/path/to/file.csv'
# write the header only when the file does not exist yet
df.to_csv(filename, index=False, mode='a', header=(not os.path.exists(filename)))
I would like to use df.to_csv to write "filename" (with headers) if "filename" doesn't exist, and otherwise to append to "filename" if it exists. If I simply use the command:
df.to_csv('filename.csv',mode = 'a',header ='column_names')
The write or append succeeds, but it seems like the header is written every time an append takes place.
How can I only add the header if the file doesn't exist, and append without header if the file does exist?
Not sure there is a way in pandas, but checking whether the file exists would be a simple approach:
import os
# if file does not exist write header
if not os.path.isfile('filename.csv'):
    df.to_csv('filename.csv', header='column_names')
else: # else it exists so append without writing the header
    df.to_csv('filename.csv', mode='a', header=False)
with open(filename, 'a') as f:
    # f.tell() is 0 only when the file is empty
    df.to_csv(f, mode='a', header=f.tell()==0)
This adds the header only the first time something is written to the file.
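The same f.tell() check also works without pandas. Here is a minimal sketch using the standard csv module, assuming Python 3, where a file opened in append mode starts positioned at its end. The name append_record is made up for illustration and is not from any library:

```python
import csv


def append_record(path, record):
    """Append one dict as a CSV row; write the header only when the file is empty.

    Hypothetical helper for illustration only.
    """
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(record))
        # In append mode the position starts at the end of the file,
        # so 0 means nothing has been written yet.
        if f.tell() == 0:
            writer.writeheader()
        writer.writerow(record)
```

Calling it twice on a new path should leave one header line followed by two data rows.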
In the pandas DataFrame "to_csv" function, use header=False if the CSV file exists, and append to the existing file.
import os
# header is needed only when the file does not exist yet
hdr = not os.path.isfile('filename.csv')
df.to_csv('filename.csv', mode='a', header=hdr)
The above solutions are great, but I have a moral obligation to include the pathlib solution here:
from pathlib import Path
# filename is the path of the CSV file from the question
file_path = Path(filename)
if file_path.exists():
    df.to_csv(file_path, header=False, mode='a')
else:
    df.to_csv(file_path, header=True, mode='w')
Alternatively (depending on your inlining preferences):
file_exists = file_path.exists()
df.to_csv(file_path, header=not file_exists, mode='a' if file_exists else 'w')
Apart from checking whether the file exists, you can also check for a non-zero file size. It makes sense to add the header if the file exists but its size is zero, i.e. a file without content. I find this helpful in some exceptional cases.
import os.path
# write the header unless the file already exists and has content
header_flag = not (os.path.exists(fpath) and os.path.getsize(fpath) > 0)
df.to_csv(fpath, mode='a', index=False, header=header_flag)
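The existence and size checks can be wrapped in one small helper so the automated script just calls it on every run. This is only a sketch: the name append_with_header_once and its parameters are made up for illustration and are not part of pandas.

```python
import os

import pandas as pd


def append_with_header_once(df, path):
    """Append df to the CSV at path; write the header only if the file is missing or empty.

    Hypothetical helper for illustration; not part of pandas.
    """
    # A header is needed when the file does not exist yet or has zero bytes.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode='a', index=False, header=needs_header)
    return needs_header
```

The return value tells the caller whether this call wrote the header, which can be handy for logging on the server.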
If you have a dict and want to write it to a CSV file, appending on later runs:
import pandas as pd
file_name = 'data.csv'
my_dict = {"column_1":"Apple","column_2":"Mango"}
with open(file_name, 'a') as f:
    # wrap the dict in a list: a dict of scalar values alone needs an index
    df = pd.DataFrame([my_dict])
    df.to_csv(f, mode='a', header=f.tell()==0)
I am currently conducting a data scraping project with Python 3 and am attempting to write the scraped data to a CSV file. My current process to do it is this:
import csv
outputFile = csv.writer(open('myFilepath', 'w'))
outputFile.writerow(['header1', 'header2'...])
for each in data:
    scrapedData = scrap(each)
    outputFile.writerow([scrapedData.get('header1', 'header 1 NA'), ...])
Once this script is finished, however, the CSV file is blank. If I just run:
import csv
outputFile = csv.writer(open('myFilepath', 'w'))
outputFile.writerow(['header1', 'header2'...])
a CSV file is produced containing the headers:
header1,header2,..
If I scrape just one item in data, for example:
outputFile.writerow(['header1', 'header2'...])
scrapedData = scrap(data[0])
outputFile.writerow([scrapedData.get('header1', 'header 1 NA'), ...])
a CSV file will be created including both the headers and the data for data[0]:
header1,header2,..
header1 data for data[0], header1 data for data[0]
Why is this the case?
When you open a file with w, it erases the previous data.
From the docs:
w: open for writing, truncating the file first
So when you open the file with w after writing scraped data, you get a blank file; you then write the header to it, so you only see the header. Try replacing w with a. The new call to open the file would look like this:
outputFile = csv.writer(open('myFilepath', 'a'))
You can find more information about the file modes in the Python documentation for open().
Ref: How do you append to a file?
Edit after DYZ's comment:
You should also close the file after you are done appending. I would suggest using the file like this:
with open('path/to/file', 'a') as file:
    outputFile = csv.writer(file)
    # Do your work with the file
This way you don't have to worry about remembering to close it. Once the code exits the with block, the file will be closed.
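To make the with-block pattern concrete, here is a small sketch that writes a header and some rows and relies on the with block to close (and therefore flush) the file. The function name write_rows and its parameters are placeholders for illustration, not from the question:

```python
import csv


def write_rows(path, header, rows):
    """Write header and rows to path; the with block closes and flushes the file.

    Hypothetical helper for illustration only.
    """
    # newline='' is what the csv module documentation recommends
    # for files handed to csv.writer.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:
            writer.writerow(row)
```

Because the file is closed when the block ends, the rows are on disk as soon as write_rows returns, with no separate close() call to forget.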
I would use Pandas for this:
import pandas as pd
headers = ['header1', 'header2', ...]
scraped_df = pd.DataFrame(data, columns=headers)
scraped_df.to_csv('filepath.csv')
Here I'm assuming your data object is a list of lists.
I have a list that contains the names of some files.
I want to append the content of all the files to the first file, and then copy that file (the first file, with everything appended) to a new path.
This is what I have done so far:
This is the part of the code that does the appending (I have put a reproducible program at the end of my question, please have a look at it):
if (len(appended) == 1):
    shutil.copy(os.path.join(path, appended[0]), out_path_tempappendedfiles)
else:
    with open(appended[0],'a+') as myappendedfile:
        for file in appended:
            myappendedfile.write(file)
    shutil.copy(os.path.join(path, myappendedfile.name), out_path_tempappendedfiles)
This runs and copies successfully, but it does not append the files; it just keeps the content of the first file.
I have also tried the approach from another answer. It did not raise an error, but it did not append the files either. It is the same code, except that instead of write I used shutil.copyfileobj:
with open(file,'rb') as fd:
    shutil.copyfileobj(fd, myappendedfile)
The same thing happened.
Update1
This is the whole code:
Even with the update it still does not append:
import os
import pandas as pd
d = {'Clinic Number':[1,1,1,2,2,3],'date':['2015-05-05','2015-05-05','2015-05-05','2015-05-05','2016-05-05','2017-05-05'],'file':['1a.txt','1b.txt','1c.txt','2.txt','4.txt','5.txt']}
df = pd.DataFrame(data=d)
df.sort_values(['Clinic Number', 'date'], inplace=True)
df['row_number'] = (df.date.ne(df.date.shift()) | df['Clinic Number'].ne(df['Clinic Number'].shift())).cumsum()
import shutil
path= 'C:/Users/sari/Documents/fldr'
out_path_tempappendedfiles='C:/Users/sari/Documents/fldr/temp'
for rownumber in df['row_number'].unique():
    appended = df[df['row_number']==rownumber]['file'].tolist()
    if (len(appended) == 1):
        shutil.copy(os.path.join(path, appended[0]), out_path_tempappendedfiles)
    else:
        with open(appended[0],'a') as myappendedfile:
            for file in appended:
                fd=open(file,'r')
                myappendedfile.write('\n'+fd.read())
                fd.close()
        shutil.copy(os.path.join(path, myappendedfile.name), out_path_tempappendedfiles)
Would you please let me know what the problem is?
You can do it like this. If the files are too large to load into memory, you can use readlines as described in "Python append multiple files in given order to one big file".
import os,shutil
file_list=['a.txt', 'a1.txt', 'a2.txt', 'a3.txt']
new_path=   # destination path, left blank in the original answer
with open(file_list[0], "a") as content_0:
    # append every file after the first one to the first file
    for file_i in file_list[1:]:
        f_i=open(file_i,'r')
        content_0.write('\n'+f_i.read())
        f_i.close()
shutil.copy(file_list[0],new_path)
So this is how I resolved it.
It was a very silly mistake: I was not joining the base path to the file name.
I switched to shutil.copyfileobj for performance, but the problem was only resolved by this:
os.path.join(path,file)
Before adding this, I was reading from the bare file name in the list instead of joining the base path to read from the actual file.
for rownumber in df['row_number'].unique():
    appended = df[df['row_number']==rownumber]['file'].tolist()
    print(appended)
    if (len(appended) == 1):
        shutil.copy(os.path.join(path, appended[0]), new_path)
    else:
        with open(appended[0], "w+") as myappendedfile:
            for file in appended:
                with open(os.path.join(path,file),'r+') as fd:
                    shutil.copyfileobj(fd, myappendedfile, 1024*1024*10)
                    myappendedfile.write('\n')
        shutil.copy(appended[0],new_path)
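The fix above boils down to always joining the base directory before opening each source file. A stripped-down sketch of just that step, without the DataFrame grouping, might look like the following. The function name concatenate_files and its parameters are invented for illustration, and unlike the code above it writes to a separate output file instead of reusing the first file's name:

```python
import os
import shutil


def concatenate_files(base_dir, names, out_path):
    """Concatenate base_dir/name for each name into out_path, with a newline between files.

    Hypothetical helper for illustration only.
    """
    with open(out_path, 'w') as out:
        for i, name in enumerate(names):
            if i > 0:
                out.write('\n')
            # Join the base directory so the real file is opened,
            # not a same-named file in the current working directory.
            with open(os.path.join(base_dir, name), 'r') as src:
                shutil.copyfileobj(src, out)
```

Writing to a separate output path also avoids reading a file while it is being overwritten.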
I am compiling a load of CSVs into one. The first CSV contains the headers, which I am opening in write mode (maincsv). I am then making a list of all the others which live in a different folder and attempting to append them to the main one.
It works, however it just writes over the headings. I just want to start appending from line 2. I'm sure it's pretty simple but all the next(), etc. things I try just throw errors. The headings and data are aligned if that helps.
import os, csv
maincsv = open(r"C:\Data\OSdata\codepo_gb\CodepointUK.csv", 'w', newline='')
maincsvwriter = csv.writer(maincsv)
curdir = os.chdir(r"C:\Data\OSdata\codepo_gb\Data\CSV")
csvlist = os.listdir()
csvfiles = []
for file in csvlist:
    path = os.path.abspath(file)
    csvfiles.append(path)
for incsv in csvfiles:
    opencsv = open(incsv)
    csvreader = csv.reader(opencsv)
    for row in csvreader:
        maincsvwriter.writerow(row)
maincsv.close()
To simplify things, I have the code load all the files in the directory the Python code is run in. It takes the first line of the first .csv file and uses it as the header.
import os
count=0
collection=open('collection.csv', 'a')
files=[f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    # skip the output file itself so it is not read while being written
    if ('.csv' in f) and f != 'collection.csv':
        solecsv=open(f,'r')
        if count==0:
            # assuming header is 1 line
            header=solecsv.readline()
            collection.write(header)
            count+=1
        for x in solecsv:
            # drop repeated header lines from the remaining files
            if not (header in x):
                collection.write(x)
        solecsv.close()
collection.close()
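Since the question mentions trying next(), here is a sketch of the same merge using csv.reader and next() to skip each file's header row. It assumes every input file starts with the same one-line header; the function name merge_csvs and its parameters are made up for illustration:

```python
import csv


def merge_csvs(in_paths, out_path):
    """Merge CSV files that share one header row into out_path, keeping the header once.

    Hypothetical helper; assumes each non-empty input starts with a header row.
    """
    header_written = False
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for path in in_paths:
            with open(path, newline='') as src:
                reader = csv.reader(src)
                # next() with a default returns None for an empty file
                # instead of raising StopIteration.
                header = next(reader, None)
                if header is None:
                    continue
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # The reader is now positioned at line 2 of this file.
                writer.writerows(reader)
```

Passing a default to next() is what avoids the error on empty files; calling next(reader) with no default raises StopIteration there.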