Finding SUM, AVG, and DIFF of columns using Python with a CSV

Working with a CSV on Python 3.8 that goes something like:
Column_0>>>>>>>>>Column_1>>>>>>Column_2>>>>Column_3>>>>>Column_4
Some_Numbers0>>>Some_String1>>>Some_String2>Some_Numbers3>>Some_Numbers4
Now, the numbers in Column_3 and Column_4 need to be summed and averaged, and I also need the difference between the two totals.
I'm currently stuck on getting both sums to print. This is how far I've got:
import csv
import decimal
with open("sample.csv") as myFile:
    reader = csv.DictReader(myFile)
    print(sum(float(line["Column_3"]) for line in reader))
    print(sum(float(line["Column_4"]) for line in reader))
Using this, Column_3's total prints, but for Column_4 I get "0". If I remove the print line for Column_3, I get Column_4's total just fine. I've also tried:
import csv
import decimal
with open("sample.csv") as myFile:
    total = 0
    for line in csv.DictReader(myFile):
        total += int(line["Column_3"])
    print(total)
but I get
Traceback (most recent call last):
  File "some file pathway", line 7, in <module>
    total += int(line["Column_3"])
ValueError: invalid literal for int() with base 10: '1345.67'
That value is the first number in Column_3.
I'm stumped. Any help is appreciated. I'm sure I'll be back with questions on finding the AVG and then using the totals to find their difference; everything needs to print from the same program, but here I am, already stuck.

Your reader object can only go through the CSV file once. The first sum() consumes every row while adding up Column_3, so there is nothing left to read for Column_4 and its sum comes out as 0. Your second approach would be fine; just replace int() with float(), because you're working with decimals.
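One way to get both totals, both averages, and the difference from a single pass over the file is to store the rows in a list first. This is only a sketch: the sample data, the comma delimiter, and the helper name column_stats are assumptions made for illustration, and io.StringIO stands in for the real sample.csv.

```python
import csv
import io

# Hypothetical data standing in for sample.csv (comma-separated is an assumption).
SAMPLE = """Column_0,Column_1,Column_2,Column_3,Column_4
1,a,b,1345.67,100.00
2,c,d,10.33,50.50
"""

def column_stats(file_obj, columns):
    """Read the CSV once and return {column: (total, average)}."""
    rows = list(csv.DictReader(file_obj))  # keep the rows so they can be reused
    stats = {}
    for name in columns:
        values = [float(row[name]) for row in rows]
        total = sum(values)
        average = total / len(values) if values else 0.0
        stats[name] = (total, average)
    return stats

stats = column_stats(io.StringIO(SAMPLE), ["Column_3", "Column_4"])
total_3, avg_3 = stats["Column_3"]
total_4, avg_4 = stats["Column_4"]
difference = total_3 - total_4  # difference between the two column totals

print("Column_3:", total_3, avg_3)
print("Column_4:", total_4, avg_4)
print("Difference:", difference)
```

With a real file, the same function can be called as column_stats(open("sample.csv", newline=""), [...]) inside a with block.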

Related

How to read through binary (.dat) file and return greatest int value python

I am having trouble working with the integers I loop through and print out in the binary file.
I have a main program that creates a binary file, writes x amount of random integers to the file, then closes the file.
*Throughout these code snippets, I import dump and load from pickle
from pickle import dump
from random import randint
output_file = open('file.dat', 'wb')
# 10 random integers
for i in range(10):
    dump(randint(1, 100), output_file)
output_file.close()
I have created a program that will open this file, unpickle each integer and print them out. However, now I also want to work with these numbers: max, min, sum, etc. When I try to produce code that (I thought) would do this, I am getting:
33 Traceback (most recent call last):
  File "binary_int_practice.py", line 13, in <module>
    for i in load(input_file):
TypeError: 'int' object is not iterable
My code is below:
input_file = open('file.dat', 'rb')
print("Here are the integers:")
while True:
    try:
        i = load(input_file)
        print(i, end=' ')
        big = 0
        for i in load(input_file):
            if i > big:
                big = i
        print('The max number in the file is: ', big)
    except EOFError:
        input_file.close()
        break
Can someone explain or help me understand where I am going wrong?
Thanks

load returns the next value read from the file; in your case, each value read is an int (just as you wrote them). It does not return an iterable that you can loop over.
So you'll have to get each number with its own call to load.
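As a rough sketch of that idea, the loop below calls load once per value until pickle raises EOFError, collects the results in a list, and only then computes max, min, and sum. The helper name read_all and the fixed sample values are assumptions for illustration; an in-memory io.BytesIO buffer stands in for file.dat.

```python
import io
from pickle import dump, load

def read_all(file_obj):
    """Collect every pickled object in file_obj, one load() call per object."""
    values = []
    while True:
        try:
            values.append(load(file_obj))
        except EOFError:  # raised once no pickled data is left
            break
    return values

# In-memory stand-in for file.dat, with fixed values instead of randint().
buffer = io.BytesIO()
for n in (33, 7, 91, 58):
    dump(n, buffer)
buffer.seek(0)

numbers = read_all(buffer)
print("Here are the integers:", numbers)
print("max:", max(numbers), "min:", min(numbers), "sum:", sum(numbers))
```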

Alternatively, collect the numbers in a list and write that list to the file with a single call to dump. A single load then returns the whole list, which you can iterate over.
Here is code that works:
from pickle import dump
from random import randint
output_file = open('file.dat', 'wb')
# 10 random integers
data = []
for i in range(10):
    data.append(randint(1, 100))
dump(data, output_file)
output_file.close()

ValueError in Python 3 code

I have this code, which is meant to count the missing rows of numbers in a CSV, in a script for Python 3.6. However, the program raises the following error:
Error:
Traceback (most recent call last):
  File "C:\Users\GapReport.py", line 14, in <module>
    EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
  File "C:\Users\GapReport.py", line 14, in <genexpr>
    EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
ValueError: invalid literal for int() with base 10: 'AC-SEC 000000001'
Code:
import csv
def out(*args):
    print('{},{}'.format(*(str(i).rjust(4, "0") for i in args)))
prev = 0
data = csv.reader(open('Padded Numbers_export.csv'))
print(*next(data), sep=', ') # header
for line in data:
    EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
    if start != prev+1:
        out(prev+1, start-1)
    prev = end
    out(start, end)
I'm stumped on how to fix these issues. Also, I think the CSV has many lines in it, so if there's a way to limit it to a few numbers, please feel free to let me know.
CSV Snippet (Sorry if I wasn't clear before!):

The values you have in your CSV file are not numeric.
For example, FMAC-SEC 000000001 is not a number. Dropping the first two characters with [2:] still leaves 'AC-SEC 000000001', so int(s.strip()[2:]) cannot convert it to an int.
Some more comments on the code:
What is the purpose of EndDoc_Padded, EndDoc_Padded = (...)? You are assigning to two variables with the same name, so the second value overwrites the first, and the names start and end used later in the loop are never defined. Either give the two variables different names, or keep just one.
Are you trying to get two different values, one from each column? csv.reader already splits each row into a list of fields, so line is a list with one string per column. If your file uses a separator other than a comma, pass it as the delimiter argument to csv.reader.
You are running this inside a loop, so the two variables are overwritten on every iteration and end up holding only the values from the last line. If you're trying to build two lists of all the values, this won't work.
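The points above could be combined along these lines. This is a sketch under several assumptions, since the CSV snippet is not shown: each row is assumed to hold a start and an end document ID, the header names are made up, and the helpers trailing_number and find_gaps are hypothetical. Instead of slicing with [2:], it pulls the trailing digits out of each value with a regular expression.

```python
import csv
import io
import re

# Hypothetical rows; the real layout of 'Padded Numbers_export.csv' is not shown.
SAMPLE = """BegDoc_Padded,EndDoc_Padded
FMAC-SEC 000000001,FMAC-SEC 000000003
FMAC-SEC 000000006,FMAC-SEC 000000008
"""

def trailing_number(text):
    """Return the run of digits at the end of text as an int."""
    match = re.search(r"(\d+)\s*$", text)
    if match is None:
        raise ValueError(f"no trailing number in {text!r}")
    return int(match.group(1))

def find_gaps(rows):
    """Yield (first_missing, last_missing) for each gap between consecutive ranges."""
    prev_end = None
    for start_text, end_text in rows:
        start, end = trailing_number(start_text), trailing_number(end_text)
        if prev_end is not None and start > prev_end + 1:
            yield prev_end + 1, start - 1
        prev_end = end

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
gaps = list(find_gaps(reader))
print(gaps)  # [(4, 5)]
```

Unlike the original script, which starts from prev = 0, this version does not report a gap before the first row.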

Finding the average of three numbers that are in a CSV file

I have written some code and have started trying to find the average of each person's scores, but I do not know what else to do. The code does not work:
def average():#makes function 'average'
    print ("\nThe Average Score")#outputs the title 'The Average Score'
    for pupil in classScore:
        pupil["total"] = (int(pupil["Pupil's Score 1"])+int(pupil["Pupil's Score 2"])+int(pupil["Pupil's Score 3"]))
        pupil["average"] = (pupil["total"]//3)
        print (pupil["Pupil's Name"]+pupil["average"])
average()
The CSV file is laid out like this:
Pupil's Name    Pupil's Score 1    Pupil's Score 2    Pupil's Score 3
Joao            10                 9                  8
Rebecca         7                  6                  5
Snuffles        0                  1                  2
The error message that appeared was:
Traceback (most recent call last):
  File "E:/Controlled Assesment Computing/Controlled Assesment/Task 3/Try 18.py", line 56, in <module>
    average()
  File "E:/Controlled Assesment Computing/Controlled Assesment/Task 3/Try 18.py", line 53, in average
    print (pupil["Pupil's Name"]+pupil["average"])
TypeError: Can't convert 'int' object to str implicitly
If anyone could help it would be much appreciated.

The message is clear:
TypeError: Can't convert 'int' object to str implicitly
You have to convert the number to a string explicitly before concatenating it. Try this:
print(pupil["Pupil's Name"]+str(pupil["average"]))
You've done yourself a disservice by limiting this method to three scores. You could easily make it work for any number of scores.
I would advise against printing from that average method. A method should do one thing well.
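A sketch of both suggestions follows: the function handles any number of score columns and returns the averages instead of printing them. The name averages, the rule of treating every column whose header contains "Score" as a score, and the hard-coded rows (copied from the question's table) are assumptions for illustration. It also uses true division, whereas the original used floor division with //.

```python
def averages(class_scores):
    """Return {pupil name: average} over every column whose header contains 'Score'."""
    result = {}
    for pupil in class_scores:
        scores = [int(value) for key, value in pupil.items() if "Score" in key]
        result[pupil["Pupil's Name"]] = sum(scores) / len(scores)
    return result

# Rows as csv.DictReader would produce them (values are strings).
class_scores = [
    {"Pupil's Name": "Joao", "Pupil's Score 1": "10", "Pupil's Score 2": "9", "Pupil's Score 3": "8"},
    {"Pupil's Name": "Rebecca", "Pupil's Score 1": "7", "Pupil's Score 2": "6", "Pupil's Score 3": "5"},
    {"Pupil's Name": "Snuffles", "Pupil's Score 1": "0", "Pupil's Score 2": "1", "Pupil's Score 3": "2"},
]

# Printing is left to the caller, and str() avoids the TypeError.
for name, avg in averages(class_scores).items():
    print(name + ": " + str(avg))
```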

Python: Compute Difference between Datetime Objects in a Loop

Using Python 2.7, I am trying to create an array of relative time, where relative time is the difference between the middle of a thunderstorm and some other time during that storm. Ultimately, I am hoping that my relative time array will be of the format -minutes, 0, and +minutes (where -minutes are minutes before the middle of the storm, 0 is the middle of the storm, and +minutes are minutes after the middle of the storm). I figured a loop was the most efficient way to do this. I already have a 1-D array, MDAdatetime, filled with the other storm times I mentioned, as strings. I specified the middle of the storm at the beginning of my code, and it is a string, as well.
So far, my code is as follows:
import csv
import datetime
casedate = '06052009'
MDAfile = "/atmomounts/home/grad/mserino/Desktop/"+casedate+"MDA.csv"
stormmidpoint = '200906052226' #midpoint in YYYYMMDDhhmm
relativetime = []
MDAlons = []
MDAlats = []
MDAdatetime = []
with open(MDAfile, 'rU') as f: #open to read in universal-newline mode
    reader = csv.reader(f, dialect=csv.excel, delimiter=',')
    for i,row in enumerate(reader):
        if i == 0:
            continue #skip header row
        MDAlons.append(float(row[1]))
        MDAlats.append(float(row[2]))
        MDAdatetime.append(str(row[0])) #this is the array I'm dealing with now in the section below; each string is of the format YYYYMMDDhhmmss
## This is the section I'm having trouble with ##
for j in range(len(MDAdatetime)):
    reltime = datetime.datetime.strptime(MDAdatetime[j],'%YYYY%mm%dd%HH%MM%SS') - datetime.datetime(stormmidpoint,'%YYYY%mm%dd%HH%MM')
    retime.strftime('%MM') #convert the result to minutes
    reativetime.append(reltime)
print relativetime
So far, I have been getting the error:
ValueError: time data '20090605212523' does not match format '%YYYY%mm%dd%HH%MM%SS'
I am trying to learn as much as I can about the datetime module. I have seen some other posts and resources mention dateutil, but it seems that datetime will be the most useful for me. I could be wrong, though, and I appreciate any advice and help. Please let me know if I need to clarify anything or provide more information.

%Y matches a four-digit year such as 2016; %YYYY does not. The same applies to the month, day, and so on.
So your format string should be %Y%m%d%H%M%S
Something like this:
datetime.datetime.strptime("20090605212523", "%Y%m%d%H%M%S")
Demo
>>> datetime.datetime.strptime("20090605212523", "%YYYY%mm%dd%HH%MM%SS")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '20090605212523' does not match format '%YYYY%mm%dd%HH%MM%SS'
>>> datetime.datetime.strptime("20090605212523", "%Y%m%d%H%M%S")
datetime.datetime(2009, 6, 5, 21, 25, 23)

After @karthikr pointed out that my strptime formatting was incorrect, I was able to successfully get an answer in minutes. There may be a better way, but I converted to minutes by hand in the code. I also changed my stormmidpoint variable to a string, as it should be. @karthikr was also right about that. Thanks for the help!
import math #needed for math.ceil
for j in range(len(MDAdatetime)):
    reltime = datetime.datetime.strptime(MDAdatetime[j],'%Y%m%d%H%M%S') - datetime.datetime.strptime(stormmidpoint,'%Y%m%d%H%M%S')
    relativetime.append(int(math.ceil(((reltime.seconds/60.0) + (reltime.days*1440.0))))) #convert to integer minutes
print relativetime
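The manual conversion can probably be shortened with timedelta.total_seconds(), which already accounts for the days component. The sketch below is an assumption-laden variant, not the poster's code: the helper name relative_minutes and the sample times are made up, and the midpoint is parsed with '%Y%m%d%H%M' to match the 12-digit YYYYMMDDhhmm string shown in the question. It returns fractional minutes; wrap the value in math.ceil if integer minutes are needed.

```python
from datetime import datetime

def relative_minutes(times, midpoint):
    """Minutes from midpoint to each time; negative values fall before the midpoint."""
    mid = datetime.strptime(midpoint, "%Y%m%d%H%M")  # 12 digits, no seconds
    return [
        (datetime.strptime(t, "%Y%m%d%H%M%S") - mid).total_seconds() / 60.0
        for t in times
    ]

# Hypothetical storm times in YYYYMMDDhhmmss format.
storm_times = ["20090605212523", "20090605222600", "20090605230030"]
result = relative_minutes(storm_times, "200906052226")
print(result)
```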

Why is this Python code producing a runtime error on Ideone?

import sys
def func():
    T = int(next(sys.stdin))
    for i in range(0,T):
        N = int(next(sys.stdin))
        print (N)
func()
Here I take the input T for the for loop and iterate T times. It repeatedly gives "Runtime error time: 0.1 memory: 10088 signal:-1". I have tried using sys.stdin.readline(), and it gives the same error.

I looked at your code at http://ideone.com/8U5zTQ. The code itself looks fine, but your input can't be processed.
That is because the input is:
5 24 2
which is read as the string:
"5 24 2"
This is not a valid int, even if you try to convert it. Instead, you could turn it into a list with:
inputlist = next(sys.stdin).split()
to get the numbers you entered on one line as a list of strings, and then loop over that.
After that, the code would still be stuck in the loop because it expects two more integers, but at least you get some output.
Since I am not totally sure what you are trying to achieve, you could now iterate over that list and print your inputs:
inputlist = next(sys.stdin).split()
for i in inputlist:
    print(i)
Another solution would be to enter just one number per line; that would work too.
so instead of
5 24 2
you put in
5
24
2
Further Advice
On Ideone you also get an error traceback at the bottom of the page:
Traceback (most recent call last):
  File "./prog.py", line 8, in <module>
  File "./prog.py", line 3, in func
ValueError: invalid literal for int() with base 10: '1 5 24 2\n'
which shows that int() can't handle your input.
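If the layout of the input is not under your control, one option is to read everything and split on any whitespace, so the same code accepts numbers on one line or on separate lines. This is a sketch; the function name read_values and the reading of the first number as a count are assumptions based on how T is used in the question, and io.StringIO stands in for sys.stdin in the demo.

```python
import io
import sys

def read_values(stream=sys.stdin):
    """Read a count followed by that many integers, separated by any whitespace."""
    numbers = [int(token) for token in stream.read().split()]
    if not numbers:
        return []
    count, values = numbers[0], numbers[1:]
    return values[:count]

# Both input layouts give the same result.
one_line = read_values(io.StringIO("3 5 24 2\n"))
per_line = read_values(io.StringIO("3\n5\n24\n2\n"))
print(one_line)  # [5, 24, 2]
print(per_line)  # [5, 24, 2]
```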
