How to get average from two columns in csv file - python

I have a csv file with 2 columns
rw1, 24
rw2, 34
rw3, 56
rw1, 78
rw2, 56
rw2, 45
rw2, 64
rw3, 32
rw1, 28
Now I want an average.py script that calculates the average for each of rw1, rw2 and rw3 and writes the results to an average.txt file in this form:
rw1 - average value,
rw2 - average value,
rw3 - average value

With pandas, this is short:
import pandas as pd
df = pd.read_csv(file, header=None)
In [1]: df
Out[1]:
     0   1
0  rw1  24
1  rw2  34
2  rw3  56
3  rw1  78
4  rw2  56
5  rw2  45
6  rw2  64
7  rw3  32
8  rw1  28
In [2]: df.groupby(0).mean()  # group on column 0 and compute the mean of each group
Out[2]:
             1
0
rw1  43.333333
rw2  49.750000
rw3  44.000000
Hope this helps !

Read the CSV rows and convert them to tuples, then sort them so that itertools.groupby can group them:
import itertools
import csv
fileLocation = 'newslot.csv'
with open(fileLocation, newline='') as f:
    r = csv.reader(f)
    # groupby only merges adjacent rows, so sort by key first
    lis = sorted((i[0], i[1]) for i in r)
for k, g in itertools.groupby(lis, key=lambda x: x[0]):
    g = list(g)
    print(k, sum(int(i[1]) for i in g) / len(g))

from itertools import groupby
from operator import itemgetter
import csv
def avg(lst):
    # lst must be a list (not an iterator) so that len() works
    return sum(map(float, lst)) / len(lst)
def avgcsv(filename, k=0, v=1):
    with open(filename, newline='') as f:
        # groupby needs the rows sorted by the grouping column
        data = sorted(csv.reader(f, skipinitialspace=True), key=itemgetter(k))
    return ['%s - %g' % (name, avg([row[v] for row in group]))
            for name, group in groupby(data, key=itemgetter(k))]
with open('average.txt', 'w') as f:
    f.write(',\n'.join(avgcsv('filename', 0, 1)))
Output
rw1 - 43.3333,
rw2 - 49.75,
rw3 - 44
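
If sorting the rows first is not wanted, the same averages can be accumulated in a single pass. This is a minimal sketch using only the standard library; the function name average_by_key and the in-memory sample are illustrative assumptions and not part of the answers above.

```python
import csv
import io
from collections import defaultdict


def average_by_key(lines, key_col=0, value_col=1):
    """Return {key: mean of values} for CSV rows, in first-seen key order."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        totals[row[key_col]] += float(row[value_col])
        counts[row[key_col]] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Sample data copied from the question; a real script would pass an open file.
sample = io.StringIO(
    "rw1, 24\nrw2, 34\nrw3, 56\nrw1, 78\nrw2, 56\n"
    "rw2, 45\nrw2, 64\nrw3, 32\nrw1, 28\n"
)
averages = average_by_key(sample)
report = ",\n".join("%s - %g" % (name, avg) for name, avg in averages.items())
print(report)
```

To produce average.txt, the report string could be written with open('average.txt', 'w') in the same way as in the answer above.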

Is there a Python example that builds a table listing every number in the ranges defined in a first table?

In my first table I have columns: indeks, il, start and stop. The last two define a range. I need to list (in a new table) all numbers in the range from start to stop, but also save indeks and the other values belonging to the range.
This table shows what kind of data I have (sample):
ID   Indeks  Start  Stop  il
0    A1      1      3     25
1    B1      31     55    5
2    C1      36     900   865
3    D1      900    2500  20
...  ...     ...    ...   ...
And this is the table I want to get:
Indeks  Start  Stop  il   kod
A1      1      3     25   1
A1      1      3     25   2
A1      1      3     25   3
B1      31     55    5    31
B1      31     55    5    32
B1      31     55    5    33
...     ...    ...   ...  ...
B1      31     55    5    53
B1      31     55    5    54
B1      31     55    5    55
C1      36     900   865  36
C1      36     900   865  37
C1      36     900   865  38
...     ...    ...   ...  ...
C1      36     900   865  898
C1      36     900   865  899
C1      36     900   865  900
...     ...    ...   ...  ...
EDITED
import pandas as pd
lidy = pd.read_excel('path')
lid = pd.DataFrame(lidy)
output = []
for i in range(0, len(lid)):
    for j in range(lid.iloc[i, 1], lid.iloc[i, 2] + 1):
        y = (lid.iloc[i, 0], j)
        output.append(y)
print(output)
OR
lidy = pd.read_excel('path')
lid = pd.DataFrame(lidy)
for i in range(0, len(lid)):
    for j in range(lid.iloc[i, 1], lid.iloc[i, 2] + 1):
        y = (lid.iloc[i, 0], j)
        print(y)
Two options:
(1 - preferred) Use Pandas (in combination with openpyxl as engine): The Excel-file I'm using is named data.xlsx, and sheet Sheet1 contains your data. Then this
import pandas as pd
df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
df["kod"] = df[["Start", "Stop"]].apply(
    lambda row: range(row.iat[0], row.iat[1] + 1), axis=1
)
df = df.iloc[:, 1:].explode("kod", ignore_index=True)
with pd.ExcelWriter("data.xlsx", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="Sheet2", index=False)
should produce the required output in sheet Sheet2. The work is done by putting the required range()s in the new column kod, and then .explode()-ing it.
(2) Use only openpyxl:
from openpyxl import load_workbook
wb = load_workbook(filename="data.xlsx")
ws = wb["Sheet1"]
rows = ws.iter_rows(values_only=True)
# Reading the required column names
data = [list(next(rows)[1:]) + ["kod"]]
for row in rows:
    # Read the input data (a row)
    base = list(row[1:])
    # Create the new data by iterating over the given range
    data.extend(base + [n] for n in range(base[1], base[2] + 1))
if "Sheet2" in wb.sheetnames:
    del wb["Sheet2"]
ws_new = wb.create_sheet(title="Sheet2")
for row in data:
    ws_new.append(row)
wb.save("data.xlsx")
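
The core of both options is the same expansion step, which can be checked without an Excel file. The following is a minimal sketch with plain tuples; expand_ranges is a hypothetical helper name, and the rows are typed in by hand from the question's sample with the ID column left out.

```python
def expand_ranges(rows):
    """Yield one (indeks, start, stop, il, kod) tuple per integer in start..stop."""
    for indeks, start, stop, il in rows:
        for kod in range(start, stop + 1):  # stop is inclusive
            yield (indeks, start, stop, il, kod)


# First two rows of the sample table from the question (ID column dropped).
rows = [("A1", 1, 3, 25), ("B1", 31, 55, 5)]
expanded = list(expand_ranges(rows))
for line in expanded[:4]:
    print(line)
```

Each input row is repeated once per value in its range, so A1 contributes 3 rows and B1 contributes 25.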

Strip the last character from a string if it is a letter in python dataframe

It is possibly done with regular expressions, which I am not very strong at.
My dataframe is like this:
import pandas as pd
import regex as re
data = {'postcode': ['DG14','EC3M','BN45','M2','WC2A','W1C','PE35'], 'total':[44, 54,56, 78,87,35,36]}
df = pd.DataFrame(data)
df
postcode total
0 DG14 44
1 EC3M 54
2 BN45 56
3 M2 78
4 WC2A 87
5 W1C 35
6 PE35 36
I want to get these strings in my column with the last letter stripped like so:
postcode total
0 DG14 44
1 EC3 54
2 BN45 56
3 M2 78
4 WC2 87
5 W1C 35
6 PE35 36
Probably something using re.sub('', '\D')?
Thank you.
You could use str.replace here:
df["postcode"] = df["postcode"].str.replace(r'[A-Za-z]$', '', regex=True)
One of the approaches:
import pandas as pd
import re
data = {'postcode': ['DG14','EC3M','BN45','M2','WC2A','W1C','PE35'], 'total':[44, 54,56, 78,87,35,36]}
data['postcode'] = [re.sub(r'[a-zA-Z]$', '', item) for item in data['postcode']]
df = pd.DataFrame(data)
print(df)
Output:
postcode total
0 DG14 44
1 EC3 54
2 BN45 56
3 M2 78
4 WC2 87
5 W1 35
6 PE35 36
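
Both answers rely on the same pattern: a character class for one letter followed by the `$` anchor, so only a letter in the final position is removed. A small sketch with the standard re module shows the pattern on its own; the helper name strip_trailing_letter is an assumption made for illustration.

```python
import re

# One ASCII letter, anchored to the end of the string by `$`.
TRAILING_LETTER = re.compile(r"[A-Za-z]$")


def strip_trailing_letter(text):
    """Remove the final character only if it is a letter."""
    return TRAILING_LETTER.sub("", text)


postcodes = ["DG14", "EC3M", "BN45", "M2", "WC2A", "W1C", "PE35"]
stripped = [strip_trailing_letter(p) for p in postcodes]
print(stripped)
```

Letters elsewhere in the string (such as the leading "DG" in "DG14") are untouched because the pattern can only match at the end.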

How to read one column data as one by one row in csv file using python

I have a dataset with three input columns: x1, x2 and x3. I want to read only the x2 column and step through its values row by row.
I wrote the code below, but it only prints letters.
Here is my code:
data = pd.read_csv('data6.csv')
row_num =0
x=[]
for col in data:
if (row_num==1):
x.append(col[0])
row_num =+ 1
print(x)
result : x1,x2,x3
The output I expect is the x2 column, read one row at a time:
65
32
14
25
85
47
63
21
98
65
21
47
48
49
46
43
48
25
28
29
37
Subset of my csv file :
x1 x2 x3
6 65 78
5 32 59
5 14 547
6 25 69
7 85 57
8 47 51
9 63 26
3 21 38
2 98 24
7 65 96
1 21 85
5 47 94
9 48 15
4 49 27
3 46 96
6 43 32
5 48 10
8 25 75
5 28 20
2 29 30
7 37 96
Can anyone help me to solve this error?
If you want a list of the values in x2, use:
x = data['x2'].tolist()
I am not sure I understand what your code is trying to do.
What it actually does (after fixing the indentation to make it somewhat correct):
Iterate through all columns of your dataframe.
Take the first character of the column name if row_num is equal to 1.
That reading is based on this guess at the indentation:
import pandas as pd
data = pd.read_csv("data6.csv")
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num = +1
print(x)
What you probably want to do:
import pandas as pd
data = pd.read_csv("data6.csv")
# Make a list containing the values in column 'x2'
x = list(data['x2'])
# Print all values at once:
print(x)
# Print one value per line:
for val in x:
    print(val)
With pandas you can do this directly: select the column and pass it to list() to convert it to a list. No for loop is needed.
import pandas as pd
data = pd.read_csv('data6.csv')
print(list(data['x2']))
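
If pandas is not available, the column can also be read one row at a time with the standard csv module. This is a minimal sketch, assuming data6.csv is comma-separated with a header row; the in-memory sample below is a hypothetical stand-in that holds only the first three data rows from the question.

```python
import csv
import io

# Hypothetical stand-in for data6.csv, assuming comma-separated values
# with a header row; only the first three data rows of the sample are used.
sample = io.StringIO("x1,x2,x3\n6,65,78\n5,32,59\n5,14,547\n")

x2_values = []
for row in csv.DictReader(sample):  # each row is a dict keyed by header name
    value = int(row["x2"])
    x2_values.append(value)
    print(value)  # one value per line
```

With a real file, the StringIO object would be replaced by open('data6.csv', newline='').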

find average value from CSV columns that contain a specific character

I am trying to write a simple Python function which will read in a CSV file and find the average for some columns and rows.
The function will examine the first row and for each column whose header
starts with the letter 'Q' it will calculate the average of values in
that column and then print it to the screen. Then for each row of the
data it will calculate the student's average for all items in columns
that start with 'Q'. It will calculate this average normally and also
with the lowest quiz dropped. It will print out two values per student.
The CSV file contains grades for students and looks like this:
hw1 hw2 Quiz3 hw4 Quiz2 Quiz1
john 87 98 76 67 90 56
marie 45 67 65 98 78 67
paul 54 64 93 28 83 98
fred 67 87 45 98 56 87
The code I have so far is this, but I have no idea how to continue:
import csv
def practice():
    newlist=[]
    afile= input('enter file name')
    a = open(afile, 'r')
    reader = csv.reader(a, delimiter = ",")
    for each in reader:
        newlist.append(each)
    y=sum(int(x[2] for x in reader))
    print (y)
    filtered = []
    total = 0
    for i in range (0,len(newlist)):
        if 'Q' in [i][1]:
            filtered.append(newlist[i])
    return filtered
May I suggest the use of Pandas:
>>> import pandas as pd
>>> data = pd.read_csv('file.csv', sep=r'\s+')
>>> q_columns = [name for name in data.columns if name.startswith('Q')]
>>> reduced_data = data[q_columns].copy()
>>> reduced_data.mean()
Quiz3 69.75
Quiz2 76.75
Quiz1 77.00
dtype: float64
>>> reduced_data.mean(axis=1)
john 74.000000
marie 70.000000
paul 91.333333
fred 62.666667
dtype: float64
>>> import numpy as np
>>> reduced_data = reduced_data.astype(float)  # NaN needs a float dtype
>>> for index, column in reduced_data.idxmin(axis=1).items():
...     reduced_data.loc[index, column] = np.nan
...
>>> reduced_data.mean(axis=1)
john 83.0
marie 72.5
paul 95.5
fred 71.5
dtype: float64
Your code would be nicer if you changed the .csv format to comma-separated values with a name column. Then csv.DictReader can be used easily.
grades.csv:
name,hw1,hw2,Quiz3,hw4,Quiz2,Quiz1
john,87,98,76,67,90,56
marie,45,67,65,98,78,67
paul,54,64,93,28,83,98
fred,67,87,45,98,56,87
Code:
import numpy as np
from collections import defaultdict
import csv
result = defaultdict(list)
with open('grades.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for k in row:
            if k.startswith('Q'):
                result[row['name']].append(int(row[k]))
for name, lst in result.items():
    # drop the lowest score, then average the rest
    print(name, np.mean(sorted(lst)[1:]))
Output:
john 83.0
marie 72.5
paul 95.5
fred 71.5
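
The question asks for three things: the average of each 'Q' column, each student's average over those columns, and the same average with the lowest quiz dropped. A standard-library sketch that computes all three is given here; it assumes the comma-separated grades.csv layout shown above (held in memory for the example) and at least two quiz columns, and the variable names are illustrative.

```python
import csv
import io

# Comma-separated layout from grades.csv above, held in memory for the sketch.
sample = io.StringIO(
    "name,hw1,hw2,Quiz3,hw4,Quiz2,Quiz1\n"
    "john,87,98,76,67,90,56\n"
    "marie,45,67,65,98,78,67\n"
    "paul,54,64,93,28,83,98\n"
    "fred,67,87,45,98,56,87\n"
)


def mean(values):
    return sum(values) / len(values)


reader = csv.DictReader(sample)
quiz_cols = [c for c in reader.fieldnames if c.startswith("Q")]
rows = list(reader)

# Average of each quiz column across all students.
column_means = {c: mean([int(r[c]) for r in rows]) for c in quiz_cols}

# Per-student average, with and without the lowest quiz score.
student_means = {}
for r in rows:
    scores = sorted(int(r[c]) for c in quiz_cols)
    student_means[r["name"]] = (mean(scores), mean(scores[1:]))

print(column_means)
for name, (full, dropped) in student_means.items():
    print(name, full, dropped)
```

Sorting the scores and slicing off the first element is the same "drop the lowest" step used in the DictReader answer above.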

Python: How to write values to a csv file from another csv file

The fourth column of index.csv holds ten numbers ranging from 1 to 5. Each number acts as an index that corresponds to a row of numbers in filename.csv.
The row number in filename.csv is the index, and each row has three numbers. My question is how to use a nested loop to copy the numbers from filename.csv into index.csv.
from numpy import genfromtxt
import numpy as np
import csv
import collections
data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
out = np.zeros((len(data2),len(data1)))
for row in data2:
    for ch_row in range(len(data1)):
        if (row[3] == ch_row + 1):
            out = row.tolist() + data1[ch_row].tolist()
            print(out)
writer = csv.writer(open('dn.csv','w'), delimiter=',',quoting=csv.QUOTE_ALL)
writer.writerow(out)
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv to index.csv and store these numbers in the 5th, 6th and 7th columns:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
If I call print(out) inside the loop, it prints the correct rows. However, when I enter "out" in the shell afterwards, only one row appears, such as [1.0, 1.0, 1.0, 1.0, 20.0, 30.0, 50.0].
What I need is to store all the values in the "out" variable and write them to the dn.csv file.
This ought to do the trick for you:
Code:
from csv import reader, writer
data = list(reader(open("filename.csv", "r"), delimiter=" "))
out = writer(open("output.csv", "w"), delimiter=" ")
for row in reader(open("index.csv", "r"), delimiter=" "):
    out.writerow(row + data[int(row[3])])
index.csv:
0 0 0 1
0 0 0 2
0 0 0 3
filename.csv:
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
This produces the output:
0 0 0 1 70 60 45
0 0 0 2 35 26 77
0 0 0 3 93 37 68
Note: There's no need to use numpy here. The standard library csv module will do most of the work for you.
I also had to modify your sample datasets a bit, because the indexes you showed were out of bounds for the sample data in filename.csv.
Please also note that Python (like most languages) uses zero-based indexing, so you may have to adjust the above code to fit your needs exactly.
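# data1 and data2 are the arrays loaded with genfromtxt in the question
with open('dn.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)
    for row in data2:
        idx = int(row[3])  # genfromtxt yields floats; indexing needs an int
        out = [idx] + [x for x in data1[idx-1]]
        writer.writerow(out)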
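
The underlying problem in the question is that out is overwritten on every iteration, so only the last row survives. A minimal sketch that collects every combined row before writing is shown here, using the 1-based index from the question; the index_rows values are hypothetical placeholders, and an in-memory buffer stands in for dn.csv.

```python
import csv
import io

# Rows of filename.csv from the question; row 1 is the first line.
lookup = [["20", "30", "50"], ["70", "60", "45"], ["35", "26", "77"],
          ["93", "37", "68"], ["13", "08", "55"]]

# Hypothetical index.csv rows; only the fourth column (the index) matters here.
index_rows = [["a", "b", "c", "1"], ["a", "b", "c", "5"], ["a", "b", "c", "3"]]

# Collect every combined row instead of overwriting a single variable.
combined = [row + lookup[int(row[3]) - 1] for row in index_rows]

buffer = io.StringIO()  # stands in for open('dn.csv', 'w', newline='')
csv.writer(buffer).writerows(combined)
print(buffer.getvalue())
```

Subtracting 1 converts the 1-based index in the file to Python's zero-based list position.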
