I am new to Python. I have used just letters to simplify my code below.My code writes a CSV file with columns of a,b,c,d values,each has 10 rows (length). I would like to add the average value of c and d to the same CSV file as well as an additional two columns each have one row for ave values. I have tried to append field names and write the new values but it didn't work.
with open('out.csv', 'w') as csvfile:
fieldnames=['a','b','c','d']
csv_writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
csv_writer.writeheader()
total_c=0
total_d=0
for i in range(1,length):
do something get a,b,c,d values.
total_c += c
total_d += d
csv_writer.writerow({'a': a,'b':b,'c':c,'d':d })
mean_c=total_c /length
mean_c=total_c /length
I expect to see something in the picture:
Try to use pandas library to deal with csv file. I provided sample code below, I assume that csv file has no header present on the first line.
import pandas as pd
data = pd.read_csv('out.csv',header=[['a','b','c','d'])
#making sure i am using copy of dataframe
avg_data = data.copy()
#creating new columns average in same dataframe
avg_data['mean_c'] = avg_data.iloc[:,2].mean(axis=1)
avg_data['mean_d'] = avg_data.iloc[:,3].mean(axis=1)
# writing updated data to csv file
avg_data.to_csv('out.csv', sep=',', encoding='utf-8')
Related
I am trying to add a few new columns with fixed values to the csv using python but the all the values are squeezed into one cell instead of separate cells. How to fix this?
My python code:
default_text = 'AoE'
with open('C:/Users/username/Desktop/Output/AoE_test.csv', 'r', newline='') as read_obj, \
open('C:/Users/username/Desktop/Output/output_1.csv', 'w', newline='') as write_obj:
csv_reader = reader(read_obj, delimiter=',')
csv_writer = writer(write_obj, delimiter=',')
for row in csv_reader:
row.append(default_text)
csv_writer.writerow(row)
This is the orginal CSV (AoE_test.csv) which the code reads data from:
This is the final output of the csv (output_1.csv) written with the data:
I've also tried to comment out the row.append():
for row in csv_reader:
# row.append(default_text)
csv_writer.writerow((row, default_text))
and the output:
I want the addtional column to be written in a separate column in CSV file. Thanks so much in advance!
Use pandas when dealing with tables. What you want to do is exactly this:
# pip install pandas
import pandas as pd
default_text = 'AoE'
in_fpath = 'C:/Users/username/Desktop/Output/AoE_test.csv'
out_fpath = 'C:/Users/username/Desktop/Output/output_1.csv', 'w'
df = pd.read_csv(in_fpath, sep=",") # while sep="," is default
df['my_new_col'] = default_text # this works, because
# pandas takes this one string and repeats it in column
# actually you should put [default_text] * df.shape[0]
# think columns as vertical lists!
df.to_csv(out_fpath)
I have a large csv file, 40+ columns, I'm trying to sort it using pandas and only write selected ones into a new file. Here's my code:
Edit: I was probably wrong to assume I've done everything correctly up until the end, here's the entire file: I read in 10 csv files, add them to one, filter the rows so that they are unique in a way I need them to, then I want to filter again, this time select just the few columns.
I am completely new to python, so the code probably looks disgusting and there's the issue I assume.
if __name__ == "__main__":
files = ['airOT199701.csv', 'airOT199702.csv', 'airOT199703.csv', 'airOT199704.csv', 'airOT199705.csv', 'airOT199706.csv', 'airOT199707.csv', 'airOT199708.csv', 'airOT199709.csv', 'airOT199710.csv', 'airOT199711.csv', 'airOT199712.csv']
with open('filterflights.csv', 'w') as outcsv:
writer = csv.DictWriter(outcsv, fieldnames = ["YEAR","MONTH","DAY_OF_MONTH","DAY_OF_WEEK","FL_DATE","UNIQUE_CARRIER","TAIL_NUM","FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN","ORIGIN_STATE_ABR","DEST_AIRPORT_ID","DEST","DEST_STATE_ABR","CRS_DEP_TIME","DEP_TIME","DEP_DELAY","DEP_DELAY_NEW","DEP_DEL15","DEP_DELAY_GROUP","TAXI_OUT","WHEELS_OFF","WHEELS_ON","TAXI_IN","CRS_ARR_TIME","ARR_TIME","ARR_DELAY","ARR_DELAY_NEW","ARR_DEL15","ARR_DELAY_GROUP","CANCELLED","CANCELLATION_CODE","DIVERTED","CRS_ELAPSED_TIME","ACTUAL_ELAPSED_TIME","AIR_TIME","FLIGHTS","DISTANCE","DISTANCE_GROUP","CARRIER_DELAY","WEATHER_DELAY","NAS_DELAY","SECURITY_DELAY","LATE_AIRCRAFT_DELAY","DIFFERENCE"])
writer.writeheader()
filewriter = csv.writer(outcsv, delimiter=',')
for i in range(len(files)):
reader = csv.reader(open(files[i], 'r'), delimiter=',')
next(reader, None)
result = set()
for r in reader:
r.append(abs(int(r[8])-int(r[11]))%25)
key = (r[7],r[8],r[11])
if key not in result:
filewriter.writerow(r)
result.add(key)
df = pd.read_csv('filterflights.csv')
df.header(3)
df = df[["FL_DATE","FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN","ORIGIN_STATE_ABR", "DEST_AIRPORT_ID","DEST","DEST_STATE_ABR", "DEP_TIME", "ARR_TIME", "DISTANCE", "DIFFERENCE"]]
df.header(3)
df.to_csv('filteredflights.csv', index=False)
I get the error:AttributeError: 'DataFrame' object has no attribute 'header' in line 23. All csv files are in the same folder as the python file
Possible issue: original csv files do not have DIFFERENCE column, can that cause the issue? Trying to append value with r.append, but maybe it doesn't know what to append to?
you can use pandas.reindex() to subset the data frame and preserve given order,
col_subset = ["FL_DATE","FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN","ORIGIN_STATE_ABR", "DEST_AIRPORT_ID","DEST","DEST_STATE_ABR", "DEP_TIME", "ARR_TIME", "DISTANCE", "DIFFERENCE"]
df = df.reindex(columns= col_subset)
I'm attempting to write a program that enters a directory full of CSV files (all with the same layout but different data), reads the files, and writes all the data in the specific columns to a new CSV file. I would also like it to miss out the entire row of data is there is a blank space in one of the columns (in this case, if there is a gap in the Name column).
The program works fine in writing in specific columns (in this case Name and Location) from the old CSV files to the new one, however, I am unsure as to how I would miss out a line if there was a blank space.
import nltk
import csv
from nltk.corpus import PlaintextCorpusReader
root = '/Users/bennaylor/local documents/humanotics'
incorpus = root + '/chats/input/'
outcorpus =root + '/chats/output.csv'
doccorpus = PlaintextCorpusReader(incorpus, '.*\.csv')
filelist = doccorpus.fileids()
with open(outcorpus, 'w', newline='') as fw:
fieldnames = (['Name','Location'])
writer = csv.DictWriter(fw, fieldnames=fieldnames)
writer.writeheader()
print('Writing Headers!!')
for rawdoc in filelist:
infile = incorpus + rawdoc
with open(infile, encoding='utf-8') as fr:
reader = csv.DictReader(fr)
for row in reader:
rowName = row['Name']
rowLocation = row['Location']
writer.writerow({'Name': rowName, 'Location': rowLocation})
An example CSV input file would look like this:
Name,Age,Location,Birth Month
Steve,14,London,November
,18,Sydney,April
Matt,12,New York,June
Jeff,20,,December
Jonty,19,Greenland,July
With gaps in the Name column on the third row, and Location column on the fifth. In this case, I would like the program to miss out the third row when writing the data to a new CSV as there is a gap in the Name column
Thanks in advance for any help.
This is easy to do using pandas:
import pandas as pd
import os
# Create an empty data frame
df = pd.DataFrame()
# Add the data from all the files into the data frame
for filename in filelist:
data = pd.read_csv(os.path.join(incorpus, filename))
df = df.append(data)
# Drop rows with any empty values
df = df.dropna()
# Keep only the needed columns
df = df.reindex(columns=['Name', 'Location'])
# Write the dataframe to a .csv file
df.to_csv(outcorpus)
I have 14 CSV files and each has 100 columns, what i want to do is to extract first column from each file and copy it in a single csv file. I have to do it for each 100 columns (for example next step is to put second column from each file in a csv file).
What i've tried before is the code below which is perfect for extracting one column, but i want to put it in a loop so i get the 100 files at once how can i do it?
import csv
import itertools as IT
filenames = ['Sul-v1.csv', 'Sul-v2.csv','Sul-v3.csv', 'Sul-v4.csv', 'Sul-v5.csv', 'Sul-v6.csv', 'Sul-v7.csv', 'Sul-v8.csv', 'Sul-v9.csv', 'Sul-v10.csv', 'Sul-v11.csv', 'Sul-v12.csv', 'Sul-v13.csv', 'Sul-v14.csv']
handles = [open(filename, 'rb') for filename in filenames]
readers = [csv.reader(f, delimiter=',') for f in handles]
with open('combined.csv', 'wb') as h:
writer = csv.writer(h, delimiter=',', lineterminator='\n', )
for rows in IT.izip_longest(*readers, fillvalue=['']*2):
combined_row = []
for row in rows:
row = row[:1] # select the columns you want
if len(row) == 1:
combined_row.extend(row)
else:
combined.extend(['']*2)
writer.writerow(combined_row)
for f in handles:
f.close()
Thanks in advance!
Use pandas.
Start by loading all csv files into one dateframe. (see here)
Next, save each column into a new csv by looping over the columns and using to_csv .
Make sure you pass the column to 'to_csv' using the 'columns' argument
I have two files, the first one is called book1.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
The second file is called book2.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,4
My goal is to copy the column that contains the 5's in book1.csv to the corresponding column in book2.csv.
The problem with my code seems to be that it is not appending right nor is it selecting just the index that I want to copy.It also gives an error that I have selected an incorrect index position. The output is as follows:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,41,2,3,4,5
Here is my code:
import csv
with open('C:/Users/SAM/Desktop/book2.csv','a') as csvout:
write=csv.writer(csvout, delimiter=',')
with open('C:/Users/SAM/Desktop/book1.csv','rb') as csvfile1:
read=csv.reader(csvfile1, delimiter=',')
header=next(read)
for row in read:
row[5]=write.writerow(row)
What should I do to get this to append properly?
Thanks for any help!
What about something like this. I read in both books, append the last element of book1 to the book2 row for every row in book2, which I store in a list. Then I write the contents of that list to a new .csv file.
with open('book1.csv', 'r') as book1:
with open('book2.csv', 'r') as book2:
reader1 = csv.reader(book1, delimiter=',')
reader2 = csv.reader(book2, delimiter=',')
both = []
fields = reader1.next() # read header row
reader2.next() # read and ignore header row
for row1, row2 in zip(reader1, reader2):
row2.append(row1[-1])
both.append(row2)
with open('output.csv', 'w') as output:
writer = csv.writer(output, delimiter=',')
writer.writerow(fields) # write a header row
writer.writerows(both)
Although some of the code above will work it is not really scalable and a vectorised approach is needed. Getting to work with numpy or pandas will make some of these tasks easier so it is great to learn a bit of it.
You can download pandas from the Pandas Website
# Load Pandas
from pandas import DataFrame
# Load each file into a pandas dataframe, this is based on a numpy array
data1 = DataFrame.from_csv('csv1.csv',sep=',',parse_dates=False)
data2 = DataFrame.from_csv('csv2.csv',sep=',',parse_dates=False)
#Now add 'header5' from data1 to data2
data2['header5'] = data1['header5']
#Save it back to csv
data2.to_csv('output.csv')
Regarding the "error that I have selected an incorrect index position," I suspect this is because you're using row[5] in your code. Indexing in Python starts from 0, so if you have A = [1, 2, 3, 4, 5] then to get the 5 you would do print(A[4]).
Assuming the two files have the same number of rows and the rows are in the same order, I think you want to do something like this:
import csv
# Open the two input files, which I've renamed to be more descriptive,
# and also an output file that we'll be creating
with open("four_col.csv", mode='r') as four_col, \
open("five_col.csv", mode='r') as five_col, \
open("five_output.csv", mode='w', newline='') as outfile:
four_reader = csv.reader(four_col)
five_reader = csv.reader(five_col)
five_writer = csv.writer(outfile)
_ = next(four_reader) # Ignore headers for the 4-column file
headers = next(five_reader)
five_writer.writerow(headers)
for four_row, five_row in zip(four_reader, five_reader):
last_col = five_row[-1] # # Or use five_row[4]
four_row.append(last_col)
five_writer.writerow(four_row)
Why not reading the files line by line and use the -1 index to find the last item?
endings=[]
with open('book1.csv') as book1:
for line in book1:
# if not header line:
endings.append(line.split(',')[-1])
linecounter=0
with open('book2.csv') as book2:
for line in book2:
# if not header line:
print line+','+str(endings[linecounter]) # or write to file
linecounter+=1
You should also catch errors if row numbers don't match.