Perform a calculation and insert the results into a CSV in Python - python
This is my first post, but I am hoping you can tell me how to perform a calculation and insert the value into a CSV data file.
For each row, I want to take its 'uniqueclass' and sum the scores achieved in column 12 across all rows that share that class. See the example data below:
text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,175,12,data6,data7
text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,171,18,data6,data7
text1,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,164,5,data6,data7
text1,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,121,21.5,data6,data7
text2,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,100,29,data6,data7
text2,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,85,21.5,data6,data7
text3,Data,Class,Uniqueclass3,data1,data,3,data2,data3,data4,data5,987,35,data6,data7
text3,Data,Class,Uniqueclass3,data1,data,3,data2,data3,data4,data5,286,18,data6,data7
text3,Data,Class,Uniqueclass3,data1,data,3,data2,data3,data4,data5,003,5,data6,data7
So, for instance, the first Uniqueclass spans the first two rows. I would therefore like to append a value to each of those rows, which would be '346' (the sum of 175 and 171). The result would look like this:
text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,175,12,data6,data7,346
text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,171,18,data6,data7,346
I would like to be able to do this for each of the uniqueclasses.
Thanks SMNALLY
I always like the defaultdict class for this type of thing.
Here would be my attempt:
from collections import defaultdict

class_col = 3
data_col = 11

# Read in the data
with open('path/to/your/file.csv', 'r') as f:
    # if you have a header on the file
    # header = f.readline().strip().split(',')
    data = [line.strip().split(',') for line in f]

# Sum the data for each unique class.
# assuming integers, replace int with float if needed
count = defaultdict(int)
for row in data:
    count[row[class_col]] += int(row[data_col])

# Append the relevant sum to the end of each row
for row in data:
    row.append(str(count[row[class_col]]))

# Write the results to a new csv file
with open('path/to/your/new_file.csv', 'w') as nf:
    nf.write('\n'.join(','.join(row) for row in data))
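The same idea can also be sketched with the csv module, which handles quoting for you. This is only an illustration under some assumptions: the sample rows are held in memory via io.StringIO instead of a real file, and the names append_class_totals, CLASS_COL and SCORE_COL are made up for this example.

```python
import csv
import io
from collections import defaultdict

CLASS_COL = 3   # zero-based index of the 'uniqueclass' column
SCORE_COL = 11  # zero-based index of the score column (the 12th field)


def append_class_totals(rows):
    """Return copies of the rows with the per-class score total appended."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[CLASS_COL]] += int(row[SCORE_COL])
    return [row + [str(totals[row[CLASS_COL]])] for row in rows]


# Hypothetical in-memory sample standing in for the real file.
sample = (
    "text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,175,12,data6,data7\n"
    "text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,171,18,data6,data7\n"
    "text1,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,164,5,data6,data7\n"
)

rows = list(csv.reader(io.StringIO(sample)))
result = append_class_totals(rows)

# Write to an in-memory buffer; with a real file, open it with newline=''.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(result)
print(out.getvalue())
```

Building new rows rather than appending in place leaves the original data untouched, which makes it easier to rerun the step while experimenting.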
Related
How can I create an endless array?
I'm trying to create an array in Python so that I can access the last cell in it without defining how many cells there are in it. Example:
    from csv import reader
    a = []
    i = -1
    with open("ccc.csv","r") as f:
        csv_reader = reader(f)
        for row in csv_reader:
            a[i] = row
            i = i-1
Here I'm trying to take the first row in the CSV file and put it in the last cell of the array, in order to write the rows in reverse order to another file. In this case, I don't know how many rows are in the CSV file, so I cannot size the array to the number of rows in the file. I tried to use f.append(row), but it inserts the values into the first cell of the array, and I want it to insert the values into the last cell of the array.
Read all the rows in the normal order, and then reverse the list:
    from csv import reader
    with open('ccc.csv') as f:
        a = list(reader(f))
    a.reverse()
First up, your current code is going to raise an IndexError because the list has no elements, so a[-1] points to nothing at all. The function you're looking for is list.insert, one of the standard mutable sequence operations. list.insert takes two arguments: the index to insert a value at and the value to be inserted. To rewrite your current code for this, you'd end up with something like
    from csv import reader
    a = []
    with open("ccc.csv", "r") as f:
        csv_reader = reader(f)
        for row in csv_reader:
            a.insert(0, row)
This reverses the contents of the CSV file, which you can then write to a new file or use as you need.
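For comparison, here is a small sketch of two ways to get the rows in reverse order. It assumes a tiny in-memory sample in place of ccc.csv, and the variable names are illustrative only. Note that list.insert(0, row) has to shift every existing element on each call, while deque.appendleft does not.

```python
import csv
import io
from collections import deque

sample = "a,1\nb,2\nc,3\n"  # hypothetical stand-in for ccc.csv

# Option 1: read everything, then reverse with a slice.
rows_reversed = list(csv.reader(io.StringIO(sample)))[::-1]

# Option 2: prepend while reading, using a deque for cheap left appends.
dq = deque()
for row in csv.reader(io.StringIO(sample)):
    dq.appendleft(row)

print(rows_reversed)
print(list(dq))
```

For files that fit in memory, either option works; the difference only matters once the row count gets large.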
CSV to Python Dictionary with multiple lists for one key
So I have a csv file formatted like this
    data_a,dataA,data1,data11
    data_b,dataB,data1,data12
    data_c,dataC,data1,data13
    , , ,
    data_d,dataD,data2,data21
    data_e,dataE,data2,data22
    data_f,dataF,data2,data23
    HEADER1,HEADER2,HEADER3,HEADER4
The column headers are at the bottom, and I want the third column to be the keys. You can see that the third column has the same value within each of the two blocks of data, and that the blocks are separated by a row of empty values. I want to store the 3 rows of values under that 1 key and also disregard some columns, such as column 4. This is my code right now
    #!/usr/bin/env python
    import csv
    with open("example.csv") as f:
        readCSV = csv.reader(f)
        for row in readCSV:
            # disregard separating rows
            if row[2] != '':
                myDict = {row[2]:[row[0],row[1]]}
                print(myDict)
What I basically want is that when I call print(myDict['data2']) I get
    {[data_d,dataD][data_e,dataE][data_f,dataF]}
I tried editing my if statement to
    if row[2] == 'data2':
        myDict = {'data2':[row[0],row[1]]}
and just making an if for every individual key, but I don't think this will work either way.
With your current method, you probably want a defaultdict. This is a dictionary-like object that provides a default value if the key doesn't already exist. So in your case, we set this up to be a list, and then for each row we loop through, we append the values in columns 0 and 1 to this list as a tuple, like so:
    import csv
    from collections import defaultdict

    data = defaultdict(list)
    with open("example.csv") as f:
        readCSV = csv.reader(f)
        for row in readCSV:
            # disregard separating rows
            if row[2] != '':
                data[row[2]].append((row[0], row[1]))
    print(data)
With the example provided, this prints a defaultdict with the following entries:
    {'data1': [('data_a', 'dataA'), ('data_b', 'dataB'), ('data_c', 'dataC')],
     'data2': [('data_d', 'dataD'), ('data_e', 'dataE'), ('data_f', 'dataF')]}
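One detail worth checking in the question's file is the header row at the bottom and the separator row, whose fields may contain spaces instead of being truly empty. The sketch below shows one possible way to handle both. It assumes the final row is always the header and uses a shortened in-memory sample; the names body and grouped are illustrative.

```python
import csv
import io
from collections import defaultdict

# Shortened, hypothetical copy of the file from the question.
sample = """data_a,dataA,data1,data11
data_b,dataB,data1,data12
, , ,
data_d,dataD,data2,data21
HEADER1,HEADER2,HEADER3,HEADER4
"""

rows = list(csv.reader(io.StringIO(sample)))
body = rows[:-1]  # assumption: the final row is the header, so drop it

grouped = defaultdict(list)
for row in body:
    key = row[2].strip()  # separator rows may hold spaces, not empty strings
    if key:
        grouped[key].append((row[0], row[1]))

print(dict(grouped))
```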
I'm not a super Python geek, but I would suggest using pandas (import pandas as pd). You load the data with pd.read_csv(file, header). With header you can specify the row you want to use as the header, and then it's much easier to manipulate the dataset (e.g. dropping variables with del df['column_name'], creating dictionaries, etc.). Here is the documentation for pd.read_csv: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Converting vertical headers to horizontal in csv
I am new to Python. I have a csv file that is generated in the format below:
    Timestamp for usage of CPU
    1466707823 1466707828 1466707833
    Percent use for CPU# 0
    0.590551162 0.588235305 0.59055119
    Percent use for CPU# 1
    7.874015497 7.843137402 7.67716547
But I need to generate a csv file in this format:
    Timestamp for usage of CPU    Percent use for CPU# 0    Percent use for CPU# 1
    1466707823    0.590551162    7.874015497
    1466707823    0.588235305    7.843137402
    1466707828    0.59055119    7.67717547
I have no idea how to proceed further. Could anyone please help me out with this?
It seems like the simplest way to do it would be to first read and convert the data in the input file into a list of lists with each sublist corresponding to a column of data in the output csv file. The sublists will start off with the column's header and then be followed by the values associated with it from the next line.
Once that is done, the built-in zip() function can be used to transpose the data matrix created. This operation effectively turns the columns of data it contains into the rows of data needed for writing out to the csv file:
    import csv

    def is_numeric_string(s):
        """ Determine if argument is a string representing a numeric value. """
        for kind in (int, float, complex):
            try:
                kind(s)
            except (TypeError, ValueError):
                pass
            else:
                return True
        else:
            return False

    columns = []
    with open('not_a_csv.txt') as f:
        for line in (line.strip() for line in f):
            fields = line.split()
            if fields:  # non-blank line?
                if is_numeric_string(fields[0]):
                    columns[-1] += fields  # add fields to the new column
                else:  # start a new column with this line as its header
                    columns.append([line])

    rows = zip(*columns)  # transpose
    with open('formatted.csv', 'w') as f:
        csv.writer(f, delimiter='\t').writerows(rows)
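To see the transpose step in isolation, here is a tiny sketch with shortened, made-up column data. Each inner list is one column (header first), and zip(*columns) regroups the items by position into rows.

```python
# Each sublist is one output column: header first, then its values.
columns = [
    ["Timestamp", "1466707823", "1466707828"],
    ["CPU0", "0.59", "0.58"],
    ["CPU1", "7.87", "7.84"],
]

# zip(*columns) pairs up the i-th item of every column, giving the i-th row.
rows = list(zip(*columns))

for row in rows:
    print("\t".join(row))
```

Note that zip stops at the shortest column, so a column with missing values would silently truncate the output.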
Add to Values in An Array in a CSV File
I imported my CSV file and made the data into an array. Now I was wondering, what can I do so that I'm able to print a specific value in the array? For instance, if I wanted the value in the 2nd row, 2nd column. Also, how would I go about adding two values together? Thanks.
    import csv
    import numpy as np
    f = open("Test.csv")
    csv_f = csv.reader(f)
    for row in csv_f:
        print(np.array(row))
    f.close()
There is no need to use the csv module. This code reads the csv file and prints the value of the cell in the second row and second column. I am assuming that the fields are separated by commas.
    with open("Test.csv") as fo:
        table = [row.split(",") for row in fo.read().replace("\r", "").split("\n")]
    print(table[1][1])
So, I grabbed a dataset ("Company Funding Records") from here. Then, I just rewrote a little...
    #!/usr/bin/python
    import csv
    #import numpy as np
    csvaslist = []
    f = open("TechCrunchcontinentalUSA.csv")
    csv_f = csv.reader(f)
    for row in csv_f:
        # print(np.array(row))
        csvaslist.append(row)
    f.close()
    # Now your data is in a list of lists. Everything past this point is just playing
    # Add together a couple of arbitrary values...
    print(int(csvaslist[2][7]) + int(csvaslist[11][7]))
    # Add using a conditional...
    print("\nNow let's see what Facebook has received...")
    fbsum = 0
    for sublist in csvaslist:
        if sublist[0] == "facebook":
            print(sublist)
            fbsum += int(sublist[7])
    print("Facebook has received", fbsum)
I've commented out lines at a couple of points to show what's being used and what was unneeded. Notice at the end that referring to a particular datapoint is simply a matter of referencing what is, effectively, original_csv_file[line_number][field_on_that_line], and then converting it to int, float, or whatever you need. This is because the csv file has been changed to a list of lists.
To get specific values within your array/file, and add them together:
    import csv
    with open("Test.csv") as f:
        csv_f = list(csv.reader(f))
    # prints the value in the second row, second column of your file
    print(csv_f[1][1])
    # prints the sum of two specific values (in this example, the value in the
    # second row, second column and the value in the first row, first column)
    total = int(csv_f[1][1]) + int(csv_f[0][0])
    print(total)
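The indexing described in these answers can be tried without a file on disk. The sketch below uses a made-up two-row table held in io.StringIO; the point is only that csv.reader yields strings, so values need converting before arithmetic.

```python
import csv
import io

sample = "1,2,3\n4,5,6\n"  # hypothetical contents of Test.csv

table = list(csv.reader(io.StringIO(sample)))

# Rows and columns are zero-based: table[1][1] is 2nd row, 2nd column.
value = table[1][1]

# Every field is read as a string, so convert before adding.
total = int(table[1][1]) + int(table[0][0])

print(value)
print(total)
```

If the fields may contain decimals, float is the safer conversion than int.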
Write last three entries per name in a file
I have the following data in a file:
    Sarah,10
    John,5
    Sarah,7
    Sarah,8
    John,4
    Sarah,2
I would like to keep the last three rows for each person. The output would be:
    John,5
    Sarah,7
    Sarah,8
    John,4
    Sarah,2
In the example, the first row for Sarah was removed since there were three later rows. The rows in the output also maintain the same order as the rows in the input. How can I do this?
Additional Information
You are all amazing - Thank you so much. Final code which seems to have been deleted from this post is -
    import collections
    with open("Class2.txt", mode="r",encoding="utf-8") as fp:
        count = collections.defaultdict(int)
        rev = reversed(fp.readlines())
        rev_out = []
        for line in rev:
            name, value = line.split(',')
            if count[name] >= 3:
                continue
            count[name] += 1
            rev_out.append((name, value))
        out = list(reversed(rev_out))
        print (out)
Since this looks like csv data, use the csv module to read and write it. As you read each line, store the rows grouped by the first column. Store the line number along with the row so that they can be written out maintaining the same order as the input. Use a bounded deque to keep only the last three rows for each name. Finally, sort the rows and write them out.
    import csv
    from collections import defaultdict, deque

    # each name maps to a deque that keeps only its three most recent rows
    by_name = defaultdict(lambda: deque(maxlen=3))
    with open('my_data.csv', newline='') as f_in:
        for i, row in enumerate(csv.reader(f_in)):
            by_name[row[0]].append((i, row))

    # sort the kept rows by line number, then discard the number
    numbered = sorted((item for items in by_name.values() for item in items),
                      key=lambda item: item[0])
    rows = [row for _, row in numbered]

    with open('out_data.csv', 'w', newline='') as f_out:
        csv.writer(f_out).writerows(rows)
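The bounded-deque approach can be checked against the sample data from the question without touching the filesystem. This is a sketch only: last_n_per_name is a hypothetical helper name, and the input is read from an in-memory string.

```python
import csv
import io
from collections import defaultdict, deque


def last_n_per_name(rows, n=3):
    """Keep the last n rows for each name, preserving input order."""
    by_name = defaultdict(lambda: deque(maxlen=n))
    for i, row in enumerate(rows):
        # A full deque drops its oldest entry when a new one is appended.
        by_name[row[0]].append((i, row))
    numbered = sorted(
        (item for items in by_name.values() for item in items),
        key=lambda item: item[0],
    )
    return [row for _, row in numbered]


sample = "Sarah,10\nJohn,5\nSarah,7\nSarah,8\nJohn,4\nSarah,2\n"
kept = last_n_per_name(csv.reader(io.StringIO(sample)))

for row in kept:
    print(",".join(row))
```

Sorting by the stored line number is what restores the original ordering after the rows have been grouped by name.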