How to append a column to a CSV file in Python

Background: the CSV file is going to grow to a huge size as many columns are added, so I would prefer not to use pandas DataFrame.to_csv to write the whole matrix from memory. The data also needs to be written into the same file instead of generating a new file, which is what the code from an earlier topic (tried below) does.
Perhaps pandas to_csv in append mode could add a new column, but I am not sure how to write it.
data1,data2,data3,data4
1,4,2,4
2,32,1,4
3,3,1,5
4,3,1,5
5,2,22,9
6,3,34,9
7,5,4,9
import csv
def add_col_to_csv(csvfile, fileout, new_list):
    with open(csvfile, 'r') as read_f, \
            open(fileout, 'w', newline='') as write_f:
        csv_reader = csv.reader(read_f)
        csv_writer = csv.writer(write_f)
        i = 0
        for row in csv_reader:
            row.append(new_list[i])
            csv_writer.writerow(row)
            i += 1
new_list1 = ['new_col', 4, 4, 5, 5, 9, 9, 9]
add_col_to_csv('input.csv', 'output.csv', new_list1)

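You can use something like this:
import pandas as pd
# to_csv returns None when it is given a path, so there is nothing to assign
pd.DataFrame(new_list1).to_csv('output.csv', mode='a', index=False, header=False)
del new_list1
new_list1 = []
This appends the values and frees the list from memory right afterwards. Note that mode='a' writes at the end of the file, so the values are added as new rows below the existing data, not as a new column. You can enable index and header depending on the values in your array. However, this is a very odd and inefficient way to append to CSV files; consider JSON instead.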
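If the goal is to update the same file without holding the whole table in memory, one option is to stream the rows through a temporary file in the same directory and then swap it over the original. This is only a sketch, assuming Python 3; the function name append_column_in_place is made up for illustration. A CSV file is stored row by row, so adding a column still rewrites every line; the approach only keeps memory use low and avoids leaving a second file behind.

```python
import csv
import os
import tempfile


def append_column_in_place(csv_path, new_values):
    """Append one value from new_values to every row of csv_path.

    Hypothetical helper: rows are streamed one at a time through a
    temporary file, which then replaces the original file.
    """
    directory = os.path.dirname(os.path.abspath(csv_path))
    # Create the temporary file next to the original so that os.replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=directory)
    try:
        with open(csv_path, "r", newline="") as read_f, \
                os.fdopen(fd, "w", newline="") as write_f:
            writer = csv.writer(write_f)
            values = iter(new_values)
            for row in csv.reader(read_f):
                try:
                    row.append(next(values))
                except StopIteration:
                    # Refuse to write a file with a half-filled column.
                    raise ValueError("fewer new values than rows") from None
                writer.writerow(row)
        # Swap the finished file over the original.
        os.replace(tmp_path, csv_path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.remove(tmp_path)
        raise
```

With the sample file from the question, append_column_in_place('input.csv', ['new_col', 4, 4, 5, 5, 9, 9, 9]) would add the column to input.csv itself.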

Related

Append Data to the end of a row in a csv file (Python)

I am attempting to append 4 elements to the end of a row in a csv file
Original
toetag1,tire11,tire12,tire13,tire14
Desired Outcome
toetag1,tire11,tire12,tire13,tire14,wtire1,wtire2,wtire3,wtire4
I tried to research ways to do this; however, most searches returned results such as "how to append to a csv file in python".
Can someone point me to the correct way to solve this problem?
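I advise you to use the pandas module and its read_csv function.
You can use the following code, for instance:
import pandas
# header=None keeps the first line as data instead of using it as column names
data = pandas.read_csv("your_file.csv", header=None)
# a NumPy array has no append method, so work with a plain list
row = data.iloc[0].tolist()
row.extend(['wtire1', 'wtire2', 'wtire3', 'wtire4'])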
You can read the CSV file into a list of lists and do the necessary manipulation before writing it back.
import csv
# read csv file
with open("original.csv", newline="") as f:
    reader = csv.reader(f)
    data = [row for row in reader]
# modify data
data[0] = data[0] + ["wtire1", "wtire2", "wtire3", "wtire4"]
# write to new file (newline="" avoids blank lines between rows on Windows)
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)
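When the row to extend is identified by its first cell rather than by position, the same idea can be done while streaming. A minimal sketch, assuming Python 3; the helper name append_to_row and the second sample row are made up, and io.StringIO stands in for real files.

```python
import csv
import io


def append_to_row(in_file, out_file, key, extra):
    """Copy in_file to out_file, extending the row whose first cell equals key."""
    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(in_file):
        if row and row[0] == key:
            row = row + list(extra)
        writer.writerow(row)


# In-memory stand-ins for real files; the second row is made-up sample data.
source = io.StringIO("toetag1,tire11,tire12,tire13,tire14\n"
                     "toetag2,tire21,tire22,tire23,tire24\n")
target = io.StringIO()
append_to_row(source, target, "toetag1", ["wtire1", "wtire2", "wtire3", "wtire4"])
print(target.getvalue())
```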

Add new dictionary values to an existing csv

I am trying to add 2 new columns to an existing file in the same program. The csv is generated by the previous function.
After looking at many answers here, I tried this, but it doesn't work because I couldn't find any answers using the csv dict writer in them, they were all about csv writer. This just creates a new file with these 2 columns in them. Can I get some help with this?
for me, sp in zip(meds, specs):
    print(me.text, sp.text)
    dict2 = {"Medicines": me.text, "Specialities": sp.text}
    with open(f'Infusion_t{zip_add}.csv', 'r') as read, \
            open(f'(Infusion_final{zip_add}.csv', 'a+', encoding='utf-8-sig', newline='') as f:
        reader = csv.reader(read)
        w = csv.DictWriter(f, dict2.keys())
        for row in reader:
            if not header_added:
                w.writeheader()
                header_added = True
            row.append(w.writerow(dict2))
You need to append the new columns to row, then write row to the output file. You don't need the dictionary or DictWriter.
You can also open the output file just once before the loop, and write the header there, rather than each time through the main loop.
with open(f'(Infusion_final{zip_add}.csv', 'w', encoding='utf-8-sig', newline='') as f:
    w = csv.writer(f)
    w.writerow(['col1', 'col2', 'col3', ..., 'Medicines', 'Specialities']) # replace colX with the names of the original columns
    for me, sp in zip(meds, specs):
        print(me.text, sp.text)
        with open(f'Infusion_t{zip_add}.csv', 'r') as read:
            reader = csv.reader(read)
            for row in reader:
                row.append(me.text)
                row.append(sp.text)
                w.writerow(row)
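Since the question asked specifically about the dict-based API, here is a rough sketch of the same task with csv.DictReader and csv.DictWriter, assuming Python 3. The helper name add_columns, the shape of new_columns (one list of values per new header) and the sample rows are assumptions for illustration, not part of the original code.

```python
import csv
import io


def add_columns(in_file, out_file, new_columns):
    """Copy a CSV and add the columns in new_columns.

    new_columns maps each new header name to a list holding one value
    per data row (hypothetical structure for this sketch).
    """
    reader = csv.DictReader(in_file)
    # Existing header names first, then the new ones.
    fieldnames = list(reader.fieldnames) + list(new_columns)
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for index, row in enumerate(reader):
        for name, values in new_columns.items():
            row[name] = values[index]
        writer.writerow(row)


# Made-up sample data standing in for the generated file.
source = io.StringIO("Name,Zip\nClinic A,10001\nClinic B,10002\n")
target = io.StringIO()
add_columns(source, target, {"Medicines": ["med1", "med2"],
                             "Specialities": ["spec1", "spec2"]})
print(target.getvalue())
```

The header is written once, before the loop, which is the same point the answer above makes for the plain writer.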

Writing to a temporary csv file in Python to read from it for sorting and then writing to another file produces empty results

I need to add a couple of Python lists as columns to an existing CSV file. I want to use a temporary file for the output CSV because I want to sort the resulting data by its first 2 columns and then write it to a new, final CSV file. I don't want to keep the unsorted CSV file, which is why I am trying to use tempfile.NamedTemporaryFile for that step. The final CSV file ends up empty, but no errors are raised. I changed how the with blocks are indented but was unable to fix it. I tested with a regular file on disk, which works fine. I need help understanding what I am doing wrong. Here is my code:
# Open the existing csv in read mode and new temporary csv in write mode
with open(csvfile.name, 'r') as read_f, \
        tempfile.NamedTemporaryFile(suffix='.csv', prefix=('inter'), mode='w', delete=False) as write_f:
    csv_reader = csv.reader(read_f)
    csv_writer = csv.writer(write_f)
    i = 0
    for row in csv_reader:
        # Append the new list values to that row/list
        row.append(company_list[i])
        row.append(highest_percentage[i])
        # Add the updated row / list to the output file
        csv_writer.writerow(row)
        i += 1
    with open(write_f.name) as data:
        stuff = csv.reader(data)
        sortedlist = sorted(stuff, key=operator.itemgetter(0, 1))
    #now write the sorted result into final CSV file
    with open(fileout, 'w', newline='') as f:
        fileWriter = csv.writer(f)
        for row in sortedlist:
            fileWriter.writerow(row)
You should insert a write_f.seek(0, 0) call just before the line that opens the temporary file for reading:
write_f.seek(0, 0)
with open(write_f.name) as data:
I found out what was causing the IndexError and consequently the empty final CSV. I resolved it with the help of this: CSV file written with Python has blank lines between each row. Here's my changed code that worked as desired:
with open(csvfile.name, 'r') as read_f, \
        tempfile.NamedTemporaryFile(suffix='.csv', prefix=('inter'), newline='', mode='w+', delete=False) as write_f:
    csv_reader = csv.reader(read_f)
    csv_writer = csv.writer(write_f)
    i = 0
    for row in csv_reader:
        # Append the new list values to that row/list
        row.append(company_list[i])
        row.append(highest_percentage[i])
        # Add the updated row / list to the output file
        csv_writer.writerow(row)
        i += 1
with open(write_f.name) as read_stuff, \
        open(fileout, 'w', newline='') as write_stuff:
    read_data = csv.reader(read_stuff)
    write_data = csv.writer(write_stuff)
    sortedlist = sorted(read_data, key=operator.itemgetter(0, 1))
    for row in sortedlist:
        write_data.writerow(row)
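A compact way to see why the buffering matters: rows written to the temporary file may still sit in a buffer until the file is flushed or closed, so reopening it by name too early can read nothing. The sketch below, assuming Python 3, closes the temporary file before reading it back and removes it afterwards. The function name append_and_sort is hypothetical, and it assumes extra_column holds one value per row.

```python
import csv
import operator
import os
import tempfile


def append_and_sort(csv_in, extra_column, fileout):
    """Append extra_column to the rows of csv_in and write them sorted to fileout."""
    # delete=False keeps the temporary file on disk after the with block closes it.
    with open(csv_in, newline="") as read_f, \
            tempfile.NamedTemporaryFile(mode="w", suffix=".csv", newline="",
                                        delete=False) as tmp:
        writer = csv.writer(tmp)
        for row, value in zip(csv.reader(read_f), extra_column):
            writer.writerow(row + [value])
    # The with block has closed tmp here, so every buffered row is on disk.
    try:
        with open(tmp.name, newline="") as unsorted_f, \
                open(fileout, "w", newline="") as write_f:
            # Cells are strings, so this is a lexicographic sort on columns 0 and 1.
            sorted_rows = sorted(csv.reader(unsorted_f),
                                 key=operator.itemgetter(0, 1))
            csv.writer(write_f).writerows(sorted_rows)
    finally:
        os.remove(tmp.name)  # the unsorted intermediate file is not kept
```

Passing newline="" on every open call also avoids the blank lines between rows that led to the IndexError mentioned above.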

Pandas picks wrong columns with df[[]]

I have a large CSV file with 40+ columns. I'm trying to sort it using pandas and write only selected columns into a new file. Here's my code:
Edit: I was probably wrong to assume I had done everything correctly up until the end, so here's the entire file. I read in 10 CSV files, add them to one, filter the rows so that they are unique in the way I need them to be, and then I want to filter again, this time selecting just a few columns.
I am completely new to Python, so the code probably looks disgusting, and I assume that is where the issue is.
if __name__ == "__main__":
    files = ['airOT199701.csv', 'airOT199702.csv', 'airOT199703.csv', 'airOT199704.csv', 'airOT199705.csv', 'airOT199706.csv', 'airOT199707.csv', 'airOT199708.csv', 'airOT199709.csv', 'airOT199710.csv', 'airOT199711.csv', 'airOT199712.csv']
    with open('filterflights.csv', 'w') as outcsv:
        writer = csv.DictWriter(outcsv, fieldnames = ["YEAR","MONTH","DAY_OF_MONTH","DAY_OF_WEEK","FL_DATE","UNIQUE_CARRIER","TAIL_NUM","FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN","ORIGIN_STATE_ABR","DEST_AIRPORT_ID","DEST","DEST_STATE_ABR","CRS_DEP_TIME","DEP_TIME","DEP_DELAY","DEP_DELAY_NEW","DEP_DEL15","DEP_DELAY_GROUP","TAXI_OUT","WHEELS_OFF","WHEELS_ON","TAXI_IN","CRS_ARR_TIME","ARR_TIME","ARR_DELAY","ARR_DELAY_NEW","ARR_DEL15","ARR_DELAY_GROUP","CANCELLED","CANCELLATION_CODE","DIVERTED","CRS_ELAPSED_TIME","ACTUAL_ELAPSED_TIME","AIR_TIME","FLIGHTS","DISTANCE","DISTANCE_GROUP","CARRIER_DELAY","WEATHER_DELAY","NAS_DELAY","SECURITY_DELAY","LATE_AIRCRAFT_DELAY","DIFFERENCE"])
        writer.writeheader()
        filewriter = csv.writer(outcsv, delimiter=',')
        for i in range(len(files)):
            reader = csv.reader(open(files[i], 'r'), delimiter=',')
            next(reader, None)
            result = set()
            for r in reader:
                r.append(abs(int(r[8])-int(r[11]))%25)
                key = (r[7],r[8],r[11])
                if key not in result:
                    filewriter.writerow(r)
                    result.add(key)
    df = pd.read_csv('filterflights.csv')
    df.header(3)
    df = df[["FL_DATE","FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN","ORIGIN_STATE_ABR", "DEST_AIRPORT_ID","DEST","DEST_STATE_ABR", "DEP_TIME", "ARR_TIME", "DISTANCE", "DIFFERENCE"]]
    df.header(3)
    df.to_csv('filteredflights.csv', index=False)
I get the error: AttributeError: 'DataFrame' object has no attribute 'header' in line 23. All CSV files are in the same folder as the Python file.
Possible issue: original csv files do not have DIFFERENCE column, can that cause the issue? Trying to append value with r.append, but maybe it doesn't know what to append to?
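You can use DataFrame.reindex() to subset the data frame and preserve the given column order. The AttributeError itself comes from df.header(3): the DataFrame method that previews rows is head(), so those calls should be df.head(3).
col_subset = ["FL_DATE","FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN","ORIGIN_STATE_ABR", "DEST_AIRPORT_ID","DEST","DEST_STATE_ABR", "DEP_TIME", "ARR_TIME", "DISTANCE", "DIFFERENCE"]
df = df.reindex(columns=col_subset)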
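If pandas is not needed for this last step, the column selection can also be done by name with the csv module alone. This is a swapped-in technique, not the method from the answer above: a sketch assuming Python 3, with a hypothetical helper select_columns and made-up sample values. It fails loudly when a requested column is missing from the header, which may help spot a header that does not line up with the data.

```python
import csv
import io


def select_columns(in_file, out_file, wanted):
    """Write only the columns named in wanted, in that order."""
    reader = csv.DictReader(in_file)
    missing = [name for name in wanted if name not in reader.fieldnames]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    # extrasaction="ignore" drops every column that is not in wanted.
    writer = csv.DictWriter(out_file, fieldnames=wanted,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Made-up sample data using a few of the column names from the question.
source = io.StringIO("FL_DATE,FL_NUM,ORIGIN,DEST,DISTANCE\n"
                     "1997-01-01,12,JFK,LAX,2475\n")
target = io.StringIO()
select_columns(source, target, ["FL_NUM", "DEST", "DISTANCE"])
print(target.getvalue())
```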

Writing a filtered CSV file to a new file and iterating through a folder

I have been trying initially to create a program to go through one file and select certain columns that will then be moved to a new text file. So far I have
import os, sys, csv
os.chdir("C://Users//nelsonj//Desktop//Master_Project")
with open('CHS_2009_test.txt', "rb") as sitefile:
    reader = csv.reader(sitefile, delimiter=',')
    pref_cols = [0,1,2,4,6,8,10,12,14,18,20,22,24,26,30,34,36,40]
    for row in reader:
        new_cols = list(row[i] for i in pref_cols)
        print new_cols
I have been trying to use the csv functions to write the new file, but I am continuously getting errors. I will eventually need to do this over a folder of files, but thought I would try it on one file before tackling that.
This is the code I attempted to use to write this data to a new file:
for row in reader:
    with open("CHS_2009_edit.txt", 'w') as file:
        new_cols = list(row[i] for i in pref_cols)
        newfile = csv.writer(file)
        newfile.writerows(new_cols)
This kind of works in that I get a new file, but it only prints the second row of values from my CSV (i.e., not the header values) and places commas between each individual character instead of just copying over the original columns as they were.
I am using PythonWin with Python 2.6(from ArcGIS)
Thanks for the help!
NEW UPDATED CODE
import os, sys, csv
path = ('C://Users//nelsonj//Desktop//Master_Project')
for filename in os.listdir(path):
    pref_cols = [0,1,2,4,6,8,10,12,14,18,20,22,24,26,30,34,36,40]
    with open(filename, "rb") as sitefile:
        with open(filename.rsplit('.',1)[0] + "_Master.txt", 'w') as output_file:
            reader = csv.reader(sitefile, delimiter=',')
            writer = csv.writer(output_file)
            for row in reader:
                new_row = list(row[i] for i in pref_cols)
                writer.writerow(new_row)
                print new_row
I am getting "list index out of range" for new_row, but it still seems to be processing the file. The only thing I can't get it to do now is loop through all the files in my directory. Here's a link to a screenshot of the data text file.
Try this:
new_header = list(row[i] for i in pref_cols if i < len(row))
That should avoid the error, but it may not fix the underlying problem. Would you paste your CSV file somewhere that I can access, and I'll fix this for you?
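For the folder loop, two details in the updated code are worth checking: os.listdir returns bare file names, so they need to be joined with the folder path unless the working directory is that folder, and the loop will also pick up the _Master.txt files it has just written. A sketch of one way to handle both, assuming Python 3 rather than the Python 2.6 used in the question; the function name filter_folder is made up.

```python
import csv
import os


def filter_folder(path, pref_cols, suffix="_Master.txt"):
    """Write a column-filtered copy of every .txt file in path."""
    written = []
    for filename in sorted(os.listdir(path)):
        # Skip non-text files and outputs from an earlier run.
        if not filename.endswith(".txt") or filename.endswith(suffix):
            continue
        source = os.path.join(path, filename)  # listdir returns bare names
        target = os.path.join(path, filename.rsplit(".", 1)[0] + suffix)
        with open(source, newline="") as sitefile, \
                open(target, "w", newline="") as output_file:
            writer = csv.writer(output_file)
            for row in csv.reader(sitefile):
                # Keep only the indexes that exist in this row.
                writer.writerow([row[i] for i in pref_cols if i < len(row)])
        written.append(target)
    return written
```

Rows shorter than expected are written with fewer cells instead of raising an error, so it is still worth checking why they are short.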
For your purpose of filtering, you don't have to treat the header differently from the rest of the data. You can go ahead and remove the following block:
headers = reader.next()
for row in headers:
    new_header = list(row[i] for i in pref_cols)
    print new_header
Your code did not work because you treated headers as a list of rows, but headers is just one row.
Update
This update deals with writing the CSV data to a new file. You should move the open statement above the for row... loop:
with open("CHS_2009_edit.txt", 'w') as output_file:
    writer = csv.writer(output_file)
    for row in reader:
        new_cols = list(row[i] for i in pref_cols)
        writer.writerow(new_cols)
Update 2
This update deals with the header output problem. If you followed my suggestions, you should not have this problem. I don't know what your current code looks like, but it looks like you supply a string where the code expects a list. Here is the code that I tried on my system (using my made-up data), and it seems to work:
pref_cols = [...] # <<=== Should be set before entering the loop
with open('CHS_2009_test.txt', "rb") as sitefile:
    with open('CHS_2009_edit.txt', 'w') as output_file:
        reader = csv.reader(sitefile, delimiter=',')
        writer = csv.writer(output_file)
        for row in reader:
            new_row = list(row[i] for i in pref_cols)
            writer.writerow(new_row)
One thing to notice: I use writerow() to write a single row, where you use writerows() -- that makes a difference.
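The difference is easy to reproduce in isolation. A small Python 3 sketch with made-up values, writing to io.StringIO instead of a file: writerow() takes one list of fields, while writerows() expects a list of rows, so each string in the list is treated as its own row and split into single characters.

```python
import csv
import io

new_cols = ["site", "2009", "CHS"]  # made-up values for one filtered row

# writerow() treats the list as one row of fields.
one_row = io.StringIO()
csv.writer(one_row, lineterminator="\n").writerow(new_cols)

# writerows() treats each element as a row, and a string as a sequence
# of one-character fields.
many_rows = io.StringIO()
csv.writer(many_rows, lineterminator="\n").writerows(new_cols)

print(repr(one_row.getvalue()))
print(repr(many_rows.getvalue()))
```

This matches the symptom in the question, where commas appeared between individual characters.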
