Finding the max value in a column of a CSV file in Python

I'm fairly new to coding, and I'm stuck on a current project. I have a .csv file, and rather than use a spreadsheet, I'm trying to write a python program to find the maximum value of a specific column. So far I have the following:
import csv
with open('american_colleges__universities_1993.csv', 'rU') as f:
    reader = csv.reader(f)
    answer = 0
    for column in reader :
        answer = 0
        for i in column[14]:
            if i>answer:
                answer = i
print answer
I keep getting something like:
9
The problem is that this is only returning the largest integer (which happens to be 9), when it should be returning something like 15,000. I suspect the program is only looking at each digit as its own value... How can I get it to look at the entire number in each entry?
Sorry for the newb question. Thanks!

Currently you are comparing each character in column[14] and assigning the lexically largest character to answer. Assuming you want to compare the whole of column[14] numerically, you will need to remove the commas (replace ',' with '') and convert the result to int, e.g.:
import csv

with open('american_colleges__universities_1993.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # Skip header row
    answer = max(int(column[14].replace(',', '')) for column in reader)
print(answer)
If you need the whole row that has the maximum column[14], you could instead pass the key argument to max, replacing the answer = ... line inside the with block:
answer = max(reader, key=lambda column: int(column[14].replace(',', '')))
print(answer)
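To see why the original loop printed 9, it helps to compare strings and integers directly. A minimal sketch; the value '15,000' is an illustrative cell, not one taken from the file:

```python
cell = '15,000'  # illustrative cell value (hypothetical)

# Iterating over a string yields single characters, so max() picks the
# character that sorts last, not the largest number.
largest_char = max(cell)

# Strings compare lexicographically, character by character.
string_comparison = '9' > '15000'

# Removing the thousands separator and converting gives a real number.
number = int(cell.replace(',', ''))

print(largest_char, string_comparison, number)
```

Here largest_char is '5' and '9' > '15000' is True, which is exactly the kind of result the question describes.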

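import csv
with open('american_colleges__universities_1993.csv', newline='') as f:
    reader = csv.reader(f)
    maxnum = max(reader, key=lambda row: int(row[14]))
print(maxnum)
This should do the work for you: it returns the whole row whose column 14 is largest. Note that it assumes the file has no header row and that the column holds plain integers without thousands separators; otherwise int() raises ValueError.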
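If some cells are blank or non-numeric, both int() calls above raise ValueError. A minimal sketch that reads the column by header name with csv.DictReader and skips cells it cannot parse; the column name 'Enrollment', the helper column_max and the sample rows are hypothetical stand-ins for the real file:

```python
import csv
from io import StringIO

# Hypothetical sample data standing in for the real CSV file.
SAMPLE = '''Name,Enrollment
Alpha College,"15,000"
Beta University,9
Gamma Institute,
Delta College,"2,500"
'''

def column_max(rows, column):
    """Return (max_value, row) for a numeric column, skipping unparsable cells."""
    best_value, best_row = None, None
    for row in rows:
        # Missing cells come back as None from DictReader, so fall back to ''.
        text = (row.get(column) or '').replace(',', '').strip()
        try:
            value = int(text)
        except ValueError:
            continue  # blank or non-numeric cell: ignore it
        if best_value is None or value > best_value:
            best_value, best_row = value, row
    return best_value, best_row

reader = csv.DictReader(StringIO(SAMPLE))
value, row = column_max(reader, 'Enrollment')
print(value, row['Name'])
```

With the real file, the call would go inside with open('american_colleges__universities_1993.csv', newline='') as f:, passing csv.DictReader(f) and whatever header name column 14 actually has.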

Related

IndexError: tuple index out of range in showing columns of CSV

Guys, I am new to Python and I don't know how to solve this problem. Thanks for the help.
import csv
with open("ict.csv", 'r') as csvFile:
    csvRead = csv.reader(csvFile)
    print(csvRead)
    # for line in csvRead :
    #     print(line)
    header = csvFile.readline().strip().split(',')
    print(header)
    entries = []
    for line in csvFile:
        parts = line.strip().split(',')
        row = dict()
        for i, h in enumerate(header):
            row[h] = parts[i]
        # print(row)
        entries.append(row)
    entries.sort(key= lambda r: r['Gen. Ave.'])
    for e in entries [:12]:
        print('{0}Student No.,Gen. Ave. {10:,}'.format(
            e['Student No.'],e['Gen. Ave.']
        ))
Student No. | Gen. Ave. | Program
1 | 90.5 | CS
The problem, as pointed out in the comments, is that one of your format specifiers, {10:,}, is wrong. The initial 10 tells Python to use the argument at index 10 (the eleventh, since indices start at 0) passed to format, but you have only provided two, hence the IndexError.
You actually want the second argument, at index 1, so change {10:,} to {1:,}. Also, the comma (,) option in the format spec, which tells the formatter to use a comma as the thousands separator, can only be used with numeric values. The value of e['Gen. Ave.'] is a string, because it has been read from a file, so you need to convert it to a number. Since averages such as 90.5 have a fractional part, use float() rather than int() (int('90.5') raises ValueError). This code should work:
for e in entries[:12]:
    print('{0}Student No.,Gen. Ave. {1:,}'.format(
        e['Student No.'], float(e['Gen. Ave.'])
    ))
However, the position specifiers in your format strings can be removed completely, because Python will apply the arguments to format in the order in which they are written, so you can have:
for e in entries[:12]:
    print('{}Student No.,Gen. Ave. {:,}'.format(
        e['Student No.'], float(e['Gen. Ave.'])
    ))
Finally, you can avoid manually building dicts for each row in your csv by using the csv module's DictReader class, which will create a dict for each row as it's read. Sorting on the converted number also avoids string ordering, where '100' would sort before '90.5'. Your code then looks like this:
import csv
with open("ict.csv", newline='') as csvFile:
    csvRead = csv.DictReader(csvFile)
    entries = []
    for line in csvRead:
        entries.append(line)
entries.sort(key=lambda r: float(r['Gen. Ave.']))
for e in entries[:12]:
    print('{}Student No.,Gen. Ave. {:,}'.format(e['Student No.'], float(e['Gen. Ave.'])))
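The indexing rule and the , option can be checked in isolation. A small sketch with made-up values; format_error is a hypothetical helper that just reports which exception a template raises:

```python
def format_error(template, *args):
    """Return the name of the exception template.format(*args) raises, or None."""
    try:
        template.format(*args)
    except (IndexError, ValueError) as exc:
        return type(exc).__name__
    return None

# Explicit indices are zero-based: {0} is the first argument, {1} the second.
explicit = '{0} | {1:,}'.format('Student 1', 1234567.5)

# Empty braces are numbered automatically, in the order they appear.
automatic = '{} | {:,}'.format('Student 1', 1234567.5)

# {10:,} asks for the argument at index 10, but only two were passed.
index_problem = format_error('{10:,}', 'Student 1', '90.5')

# The ',' option is only valid for numbers; a string raises ValueError.
string_problem = format_error('{:,}', '90.5')

print(explicit)
print(automatic)
print(index_problem, string_problem)
```

Both format strings produce 'Student 1 | 1,234,567.5'; the two failing templates report IndexError and ValueError respectively.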
First of all, you are not using the csvRead instance. You should be reading from it instead of from csvFile.
Example:
I have the following something.csv CSV file:
76.94,76.944,76.945
76.97,76.979,76.980
77.025,77.025,77.025
77.063,77.264,77.064
77.1,77.64,77.3
Now if I do:
import csv
pf = open("something.csv", "r")
read = csv.reader(pf)
for r in read:
    print(r)
pf.close()
You will get the following output:
python your_script.py
['76.94', '76.944', '76.945']
['76.97', '76.979', '76.980']
['77.025', '77.025', '77.025']
['77.063', '77.264', '77.064']
['77.1', '77.64', '77.3']
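One concrete reason to read through csv.reader rather than splitting lines by hand: quoted fields may contain commas. A minimal sketch with a made-up line (not from ict.csv):

```python
import csv
from io import StringIO

line = '1,"Doe, Jane",90.5\n'  # hypothetical row whose name field contains a comma

naive = line.strip().split(',')              # the quoted name is split in two
parsed = next(csv.reader(StringIO(line)))    # csv.reader keeps it as one field

print(naive)
print(parsed)
```

The manual split yields four pieces with stray quotes, while csv.reader returns the three intended fields.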

Python reading in integers from a csv file into a list

I am having some trouble trying to read a particular column in a csv file into a list in Python. Below is an example of my csv file:
Col 1 Col 2
1,000,000 1
500,000 2
250,000 3
Basically, I want to read column 1 into a list as integer values, and I am having a lot of trouble doing so. I have tried:
for row in csv.reader(csvfile):
    list = [int(row.split(',')[0]) for row in csvfile]
However, I get a ValueError that says "invalid literal for int() with base 10: '"1'
I then tried:
for row in csv.reader(csvfile):
    list = [(row.split(',')[0]) for row in csvfile]
This time I don't get an error however, I get the list:
['"1', '"500', '"250']
I have also tried changing the delimiter:
for row in csv.reader(csvfile):
    list = [(row.split(' ')[0]) for row in csvfile]
This almost gives me the desired list; however, the list includes the second column as well, plus a "\n" after each value:
['"1,000,000", 1\n', etc...]
If anyone could help me fix this it would be greatly appreciated!
Cheers
You should choose your delimiter wisely: if your numbers use . as the decimal separator, you can use , as the delimiter; if they contain , (as a decimal or thousands separator, as in 1,000,000), use ; as the delimiter instead.
Moreover, as described in the documentation for csv.reader, you can use the delimiter= argument to set your delimiter, like so:
import csv

with open('myfile.csv', 'r', newline='') as csvfile:
    mylist = []
    for row in csv.reader(csvfile, delimiter=';'):
        mylist.append(row[0])  # [0] is the first column; change the index for another one
or, in short:
with open('myfile.csv', 'r', newline='') as csvfile:
    mylist = [row[0] for row in csv.reader(csvfile, delimiter=';')]
To parse your number as a float (or use int() in the same way for whole numbers), you will have to remove the commas first:
float(row[0].replace(',', ''))
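Putting the delimiter= argument and the comma removal together: a sketch assuming the file were saved with semicolons between the columns. The in-memory text is a hypothetical stand-in for myfile.csv, built from the question's sample data:

```python
import csv
from io import StringIO

# Stand-in for myfile.csv, assuming ';' separates the columns.
csvfile = StringIO('Col 1;Col 2\n1,000,000;1\n500,000;2\n250,000;3\n')

reader = csv.reader(csvfile, delimiter=';')
next(reader)  # skip the header row
mylist = [int(row[0].replace(',', '')) for row in reader]
print(mylist)
```

Because ; is the delimiter, the commas stay inside each field and only need stripping before int().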
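You can open the file and split each line on whitespace using a regular expression; the commas inside the numbers still have to be removed before int() can parse them:
import re
with open('filename.csv') as f:
    file_data = [re.split(r'\s+', line.strip()) for line in f]
final_data = [int(row[0].replace(',', '')) for row in file_data[1:]]  # [1:] skips the header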
First of all, you must parse your data correctly. It is not, in fact, CSV (Comma-Separated Values) but rather TSV (Tab-Separated Values), so you should tell the CSV reader so (I'm assuming the separator is a tab, but in principle you can use any whitespace with a few tweaks):
for row in csv.reader(csvfile, delimiter="\t"):
Second, you should strip any commas from your integer values, as they don't add any information. After that, they can be easily parsed with int():
int(row[0].replace(',', ''))
Third, you really should not iterate over the same file twice. Use either a list comprehension or a normal for loop, not both at the same time with the same variable. For example, with a list comprehension:
import csv
from io import StringIO

csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
next(reader, None)  # skip the header
lst = [int(row[0].replace(',', '')) for row in reader]
Or with normal iteration:
csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
lst = []
for i, row in enumerate(reader):
    if i == 0:
        continue  # your custom header-handling code here
    lst.append(int(row[0].replace(',', '')))
In both cases, lst is set to [1000000, 500000, 250000] as it should. Enjoy.
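By the way, using list as a variable name is a bad idea: it is not a reserved keyword, but it is the name of a built-in type, and assigning to it shadows list() for the rest of that scope.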
UPDATE. There's one more option that I find interesting. Instead of setting the delimiter explicitly you can use csv.Sniffer to detect it e.g.:
import csv
from io import StringIO

csvdata = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"
csvfile = StringIO(csvdata)
dialect = csv.Sniffer().sniff(csvdata)
reader = csv.reader(csvfile, dialect=dialect)
and then proceed just as in the snippets above. This will keep working even if you replace the tabs with semicolons or commas (commas would require quotes around your comma-containing integers) or, possibly, something else.
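A complete version of the Sniffer approach, which also uses Sniffer.has_header() to decide whether to skip the first row. Restricting sniff() to the candidate delimiters '\t;,' is an assumption added here, and sniffing is heuristic, so the detected dialect is best treated as a guess to check against real data:

```python
import csv
from io import StringIO

csvdata = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"

sniffer = csv.Sniffer()
dialect = sniffer.sniff(csvdata, delimiters='\t;,')  # only consider these separators
has_header = sniffer.has_header(csvdata)             # heuristic header detection

reader = csv.reader(StringIO(csvdata), dialect=dialect)
if has_header:
    next(reader)  # skip the header row
lst = [int(row[0].replace(',', '')) for row in reader]
print(repr(dialect.delimiter), has_header, lst)
```

On this sample the tab is the only character that occurs the same number of times on every line, which is why it is picked as the delimiter.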

Row in Excel to array?

I have lots of data in an Excel spreadsheet that I need to import using Python. I need each row to be read as an array so I can access the first data point in a specified row, the second, the third, and so on.
This is my code so far:
from array import *
import csv
with open ('vals.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    reader_x = []
    reader_y = []
    reader_z = []
    row = next(reader)
    reader_x.append(row)
    row = next(reader)
    reader_y.append(row)
    row = next(reader)
    reader_z.append(row)
print reader_x
print reader_y
print reader_z
print reader_x[0]
print reader_x[0]
I think it is definitely storing it as an array. But I think it is storing the entire Excel row as a single string instead of each cell being a separate data point, because when I tell Python to print an entire array it looks something like this (a shortened version, because there are about a thousand values in each row):
[['13,14,12']]
And when I tell it to print reader_x[0] (or any of the other two for that matter) it looks like this:
['13,14,12']
But when I tell it to print anything beyond the 0th thing in the array, it just gives me an error because it's out of range.
How can I fix this? How can I make it [13,14,12] instead of ['13,14,12'] so I can actually use these numbers in calculation? (I want to avoid downloading any more libraries if I can because this is for a school thing and I need to avoid that if possible.)
I have been stuck on this for several days and nothing I can find has worked for me and half of it I didn't even understand. Please try to explain simply if you can, as if you're talking to someone who doesn't even know how to print "Hello World".
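You can use split to do this, with , as the separator, and then convert each piece to an int.
For example:
row = '11,12,13'
row = row.split(',')         # ['11', '12', '13'], still strings
row = [int(x) for x in row]  # [11, 12, 13]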
It is a CSV (comma-separated values) file, so try setting the delimiter to ','.
You don't need from array import * ... What the rest of the world calls an array is called a list in Python. The Python array is rather specialised and you are not actually using it so just delete that line of code.
As others have pointed out, you need incoming lines to be split. The csv default delimiter is a comma. Just let csv.reader do the job, something like this:
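import csv
with open('vals.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    data = [list(map(int, row)) for row in reader]  # list() is needed in Python 3, where map() is lazy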
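A slightly fuller sketch of the same idea, using an in-memory stand-in for vals.csv (the numbers are made up), shows how to pick out one data point by row and column index:

```python
import csv
from io import StringIO

# In-memory stand-in for vals.csv (hypothetical values).
csvfile = StringIO('13,14,12\n7,8,9\n1,2,3\n')

reader = csv.reader(csvfile)  # the default delimiter is a comma
data = [[int(value) for value in row] for row in reader if row]  # skip blank lines

first_row = data[0]                  # [13, 14, 12]
third_value_of_row_two = data[1][2]  # 9: both indices start at 0
print(first_row, third_value_of_row_two, sum(first_row))
```

Because every value is now an int, ordinary arithmetic such as sum(first_row) works directly.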

Copying CSV File when integer exceeds maximum

If this input exists in a specific row, for example DD in row WORD4 (row 3), the program will then ask the user to enter an integer, and if that integer is over a certain number, it will write out the line with the integer included.
Something like so:
a0,a1,a2,a3,a4
JA,BV,PA,DD,6
The errors I received were:
TypeError: writerows() takes exactly one argument (2 given)
And
TypeError: can only concatenate list (not "str") to list
Thanks to Joel Johnson and Stevieb for the solution to this problem!
The solution is as follows (thanks, Joel Johnson):
First, you need to use with open('CSVFile2.csv', 'a') as f: to write to the file if you want to keep any content already in CSVFile2.csv, or 'w' instead of 'a' if you want to overwrite it.
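Second, since you are only trying to write one row, in the format
['JA','BV','PA','DD','6'], use writer.writerow() instead of writer.writerows(); otherwise each string is treated as a row of its own and its characters become separate fields, so you end up with lines like J,A and B,V instead of one row.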
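Third, append integer_input to row before passing it to writer.writerow(), rather than concatenating it with + (a list cannot be concatenated with a string, which is what the second error says). Converting it with str() is harmless, although csv.writer also converts non-string values with str() on its own.
If you have any other questions, I would refer you to the csv module documentation.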
example:
import csv
# user_input and integer_input are assumed to have been read earlier in the program
with open('CSVFile1.csv', newline='') as csvfile:
    a = csv.reader(csvfile, delimiter=',')
    for row in a:
        if user_input in row[3] and integer_input > 5:
            with open('CSVFile2.csv', 'a', newline='') as f:
                new_row = row
                new_row.append(str(integer_input))
                writer = csv.writer(f)
                writer.writerow(new_row)
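A self-contained sketch of the whole flow, with in-memory files instead of CSVFile1.csv and CSVFile2.csv. The function name copy_matching_rows, the threshold of 5 and the sample inputs are assumptions for illustration. It also shows what writerows() does when handed a single flat row:

```python
import csv
from io import StringIO

def copy_matching_rows(src, dst, user_input, integer_input, threshold=5):
    """Copy rows whose 4th field contains user_input, appending integer_input,
    but only when integer_input exceeds threshold (hypothetical helper)."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    copied = 0
    for row in reader:
        if len(row) > 3 and user_input in row[3] and integer_input > threshold:
            writer.writerow(row + [str(integer_input)])  # list + list, not list + str
            copied += 1
    return copied

src = StringIO('a0,a1,a2,a3\nJA,BV,PA,DD\nXX,YY,ZZ,QQ\n')
dst = StringIO()
count = copy_matching_rows(src, dst, 'DD', 6)
print(repr(dst.getvalue()))

# writerows() expects a sequence of rows; each string in a flat list is
# itself treated as a row, so its characters become separate fields.
demo = StringIO()
csv.writer(demo).writerows(['JA', 'BV'])
print(repr(demo.getvalue()))
```

The matching row comes out as 'JA,BV,PA,DD,6\r\n' (csv.writer ends lines with \r\n by default), while the writerows() misuse produces 'J,A\r\nB,V\r\n'.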
writerows() does take only one argument (a sequence of rows), and writerow() likewise takes a single row. The code below appends integer_input to the row, then sends the entire row to writerow() as its only argument. I've also moved the opening of the output file and the creation of the writer outside the loop; otherwise, if more than one match occurs, you'd be reopening the file (and, in 'w' mode, overwriting it) on each iteration.
import csv
with open('CSVFile1.csv', newline='') as csvfile:
    fh = csv.reader(csvfile)
    wfh = open('CSVFile2.csv', 'a', newline='')
    writer = csv.writer(wfh)  # create the writer once, outside the loop
    for row in fh:
        if user_input in row[3] and int(integer_input) > 5:
            row.append(integer_input)
            writer.writerow(row)
    wfh.close()

replace blank values in column in csv with python

I am trying to replace blank values in a certain column (column 6 'Author' for example) with "DMD" in CSV using Python. I am fairly new to the program, so a lot of the lingo throws me. I have read through the CSV Python documentation but there doesn't seem to be anything that is specific to my question. Here is what I have so far. It doesn't run. I get the error 'dict' object has no attribute replace. It seems like there should be something similar to replace in the dict. Also, I am not entirely sure my method to search the field is accurate. Any guidance would be appreciated.
import csv
inputFileName = "C:\Author.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_edited.csv"
field = ['Author']
with open(inputFileName) as infile, open(outputFileName, "w") as outfile:
    r = csv.DictReader(infile)
    w = csv.DictWriter(outfile, field)
    w.writeheader()
    for row in r:
        row.replace(" ","DMD")
        w.writerow(row)
I think you're pretty close. You need to pass the fieldnames to the writer and then you can edit the row directly, because it's simply a dictionary. For example:
with open(inputFileName, "rb") as infile, open(outputFileName, "wb") as outfile:
    r = csv.DictReader(infile)
    w = csv.DictWriter(outfile, r.fieldnames)
    w.writeheader()
    for row in r:
        if not row["Author"].strip():
            row["Author"] = "DMD"
        w.writerow(row)
turns
a,b,c,d,e,Author,g,h
1,2,3,4,5,Smith,6,7
8,9,10,11,12,Jones,13,14
13,14,15,16,17,,18,19
into
a,b,c,d,e,Author,g,h
1,2,3,4,5,Smith,6,7
8,9,10,11,12,Jones,13,14
13,14,15,16,17,DMD,18,19
I like using if not somestring.strip(): because that way it won't matter if there are no spaces, or one, or seventeen and a tab. I also prefer DictReader to the standard reader because this way you don't have to remember which column Author is living in.
[PS: The above assumes Python 2, not 3. In Python 3, open the files in text mode with newline='' instead of "rb" and "wb".]
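For Python 3, a sketch of the same DictReader/DictWriter approach, run against the sample above with in-memory files. The helper name fill_blanks is hypothetical, and lineterminator='\n' is chosen here only so the output matches the sample exactly (csv.writer defaults to '\r\n'):

```python
import csv
from io import StringIO

def fill_blanks(infile, outfile, field='Author', marker='DMD'):
    """Copy CSV rows, replacing blank or whitespace-only values in `field`."""
    r = csv.DictReader(infile)
    w = csv.DictWriter(outfile, r.fieldnames, lineterminator='\n')
    w.writeheader()
    for row in r:
        if not row[field].strip():
            row[field] = marker
        w.writerow(row)

source = StringIO(
    'a,b,c,d,e,Author,g,h\n'
    '1,2,3,4,5,Smith,6,7\n'
    '8,9,10,11,12,Jones,13,14\n'
    '13,14,15,16,17,,18,19\n'
)
result = StringIO()
fill_blanks(source, result)
print(result.getvalue())
```

With real files, the call would be fill_blanks(infile, outfile) inside with open(inputFileName, newline='') as infile, open(outputFileName, 'w', newline='') as outfile:.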
Dictionaries don't need the replace method because simple assignment does this for you:
for row in r:
    if row["Author"] == "":
        row["Author"] = "DMD"
    w.writerow(row)
Where "Author" is the header name of your sixth column.
Also note that your call to DictWriter appears to have the wrong fieldnames argument. That argument should be a list (or other sequence) containing all the headers of your new CSV, in order.
For your purposes, it appears to be simpler to use the vanilla reader:
import csv
import os

inputFileName = r"C:\Author.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_edited.csv"
with open(inputFileName, newline='') as infile, open(outputFileName, "w", newline='') as outfile:
    r = csv.reader(infile)
    w = csv.writer(outfile)
    w.writerow(next(r))  # Writes the header unchanged
    for row in r:
        if row[5] == "":  # row[5] is the sixth column
            row[5] = "DMD"
        w.writerow(row)
(1) To use os.path.splitext, you need to add import os.
(2) Dicts don't have a replace method; dicts aren't strings. If you're trying to alter a string that's the value of a dict entry, you need to reference that dict entry by key, e.g. row['Author']. If row['Author'] is a string (should be in your case), you can do a replace on that. Sounds like you need an intro to Python dictionaries, see for example http://www.sthurlow.com/python/lesson06/ .
(3) A way to do this, that also deals with multiple spaces, no spaces etc. in the field, would look like this:
field = 'Author'
marker = 'DMD'
....
## longhand version
candidate = str(row[field]).strip()
if candidate:
    row[field] = candidate
else:
    row[field] = marker
or
## shorthand version (equivalent: an empty stripped string is falsy)
row[field] = str(row[field]).strip() or marker
Cheers
with open('your file', 'r+') as f2:
    txt = f2.read().replace('#','').replace("'",'').replace('"','').replace('&','')
    f2.seek(0)
    f2.write(txt)
    f2.truncate()
Keep it simple and replace your choice of characters.
