IndexError: tuple index out of range in showing columns of CSV - python

I'm new to Python and don't know how to solve this problem. Thanks for the help.
import csv
with open("ict.csv", 'r') as csvFile:
    csvRead = csv.reader(csvFile)
    print(csvRead)
    # for line in csvRead :
    #     print(line)
    header = csvFile.readline().strip().split(',')
    print(header)
    entries = []
    for line in csvFile:
        parts = line.strip().split(',')
        row = dict()
        for i, h in enumerate(header):
            row[h] = parts[i]
        # print(row)
        entries.append(row)
    entries.sort(key=lambda r: r['Gen. Ave.'])
    for e in entries[:12]:
        print('{0}Student No.,Gen. Ave. {10:,}'.format(
            e['Student No.'], e['Gen. Ave.']
        ))
Student No. | Gen. Ave. | Program
1 | 90.5 | CS

The problem, as pointed out in the comments, is that one of your format specifiers, {10:,}, is wrong. The initial 10 tells Python to use the argument at index 10 (the eleventh argument) passed to format, but you have only provided two, hence the IndexError.
You actually want the second argument, at index 1, so change {10:,} to {1:,}. Also, the comma (,) option in the format spec, which tells the formatter to use a comma as the thousands separator, can only be used on numeric values. The value of e['Gen. Ave.'] is a string, because it has been read from a file, so you need to convert it to a number. Use float() rather than int() here, because an average such as 90.5 is not a whole number and int('90.5') raises a ValueError. This code should work:
for e in entries[:12]:
    print('{0}Student No.,Gen. Ave. {1:,}'.format(
        e['Student No.'], float(e['Gen. Ave.'])
    ))
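To see both failure modes in isolation, here is a small sketch using hypothetical values standing in for one row; try_format is an illustrative helper, not part of the original code. It reproduces the IndexError from {10:,}, the ValueError from applying the , option to a string, and shows that converting to a number first fixes it.

```python
# Hypothetical row values; data read from a file always arrives as str.
student_no = "1"
gen_ave = "90.5"

def try_format(template, *args):
    """Return the formatted string, or the exception's class name on failure."""
    try:
        return template.format(*args)
    except (IndexError, ValueError) as exc:
        return type(exc).__name__

# {10} refers to the argument at index 10, but only two were passed.
print(try_format("{0} {10:,}", student_no, gen_ave))       # IndexError
# The ',' grouping option is rejected for str values.
print(try_format("{0} {1:,}", student_no, gen_ave))        # ValueError
# After converting to a number, ',' is accepted and groups thousands.
print(try_format("{0} {1:,}", student_no, float(gen_ave)))  # 1 90.5
print(try_format("{0} {1:,}", student_no, 15000.0))         # 1 15,000.0
```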
However, the position specifiers in your format strings can be removed completely, because Python applies the arguments to format in the order in which they are written, so you can have:
for e in entries[:12]:
    print('{}Student No.,Gen. Ave. {:,}'.format(
        e['Student No.'], float(e['Gen. Ave.'])
    ))
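A quick way to convince yourself that automatic numbering is equivalent: the sketch below (with made-up values) formats the same arguments with explicit and with empty braces. It also shows that mixing the two styles in one template is rejected, so pick one style per format string.

```python
# Explicit indices versus automatic numbering: both consume the
# arguments in order, so the results are identical.
manual = "{0} | {1:,}".format("1", 1234567)
auto = "{} | {:,}".format("1", 1234567)
print(manual)          # 1 | 1,234,567
print(auto == manual)  # True

# Mixing '{}' and '{1}' in one template raises ValueError.
try:
    mixed = "{} | {1:,}".format("1", 1234567)
except ValueError as exc:
    mixed = None
    print("ValueError:", exc)
```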
Finally, you can avoid manually building dicts for each row in your csv by using the csv module's DictReader class, which will create a dict for each row as it's read, leaving your code looking like this:
import csv

with open("ict.csv", 'r', newline='') as csvFile:
    csvRead = csv.DictReader(csvFile)
    entries = []
    for line in csvRead:
        entries.append(line)
# Sort numerically; compared as strings, '100' would sort before '85.5'.
entries.sort(key=lambda r: float(r['Gen. Ave.']))
for e in entries[:12]:
    print('{}Student No.,Gen. Ave. {:,}'.format(e['Student No.'], float(e['Gen. Ave.'])))
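The DictReader approach can be tried without the real file. The sketch below assumes ict.csv has a header row containing 'Student No.' and 'Gen. Ave.'; SAMPLE is invented data and lowest_averages is a hypothetical helper name. It sorts on the numeric value, which matters because string comparison would put '100' before '85.25'.

```python
import csv
import io

# Hypothetical stand-in for ict.csv; column names are assumed from the question.
SAMPLE = """Student No.,Gen. Ave.,Program
1,90.5,CS
2,100,IT
3,85.25,CS
"""

def lowest_averages(csv_file, limit=12):
    """Return up to `limit` rows sorted by numeric 'Gen. Ave.' (ascending)."""
    reader = csv.DictReader(csv_file)  # one dict per data row
    return sorted(reader, key=lambda r: float(r["Gen. Ave."]))[:limit]

for e in lowest_averages(io.StringIO(SAMPLE)):
    print("{} {:,}".format(e["Student No."], float(e["Gen. Ave."])))
```

With a real file, pass the open file object (opened with newline='') instead of the StringIO.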

First of all, you are not using the csvRead reader object at all. You should read rows from it instead of reading csvFile directly.
Example:
I have the following something.csv CSV file:
76.94,76.944,76.945
76.97,76.979,76.980
77.025,77.025,77.025
77.063,77.264,77.064
77.1,77.64,77.3
Now if I do:
import csv

with open("something.csv", "r", newline="") as pf:
    read = csv.reader(pf)
    for r in read:
        print(r)
You will get the following output:
python your_script.py
['76.94', '76.944', '76.945']
['76.97', '76.979', '76.980']
['77.025', '77.025', '77.025']
['77.063', '77.264', '77.064']
['77.1', '77.64', '77.3']
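One reason to go through csv.reader rather than splitting lines yourself: the reader understands quoted fields. In this sketch the line is hypothetical; a name containing a comma is split wrongly by str.split but parsed correctly by the reader.

```python
import csv
import io

# Hypothetical line in which a quoted field contains a comma.
LINE = '7,"Dela Cruz, Juan",88.75\n'

naive = LINE.strip().split(",")              # ignores the quotes
parsed = next(csv.reader(io.StringIO(LINE)))  # honours the quotes

print(naive)   # ['7', '"Dela Cruz', ' Juan"', '88.75']
print(parsed)  # ['7', 'Dela Cruz, Juan', '88.75']
```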

Related

Python reading in integers from a csv file into a list

I am having some trouble trying to read a particular column in a csv file into a list in Python. Below is an example of my csv file:
Col 1 Col 2
1,000,000 1
500,000 2
250,000 3
Basically, I want to read column 1 into a list as integer values, and I am having a lot of trouble doing so. I have tried:
for row in csv.reader(csvfile):
    list = [int(row.split(',')[0]) for row in csvfile]
However, I get a ValueError that says "invalid literal for int() with base 10: '"1'
I then tried:
for row in csv.reader(csvfile):
    list = [(row.split(',')[0]) for row in csvfile]
This time I don't get an error however, I get the list:
['"1', '"500', '"250']
I have also tried changing the delimiter:
for row in csv.reader(csvfile):
    list = [(row.split(' ')[0]) for row in csvfile]
This almost gives me the desired list; however, the list includes the second column as well as "\n" after each value:
['"1,000,000", 1\n', etc...]
If anyone could help me fix this it would be greatly appreciated!
Cheers
You should choose your delimiter wisely:
If your numbers use . as the decimal separator, use , as the delimiter; if they use , inside the numbers (as a decimal or thousands separator), use ; as the delimiter.
Moreover, as described in the documentation for csv.reader, you can use the delimiter= argument to define your delimiter, like so:
import csv

with open('myfile.csv', 'r', newline='') as csvfile:
    mylist = []
    for row in csv.reader(csvfile, delimiter=';'):
        mylist.append(row[0])  # careful here with [0]
or short version:
with open('myfile.csv', 'r', newline='') as csvfile:
    mylist = [row[0] for row in csv.reader(csvfile, delimiter=';')]
To parse your number as a float, remove the grouping commas first:
float(row[0].replace(',', ''))
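Putting the delimiter choice and the comma stripping together, here is a sketch that assumes the file separates columns with ; and groups thousands with , (DATA is hypothetical, modelled on the question's sample).

```python
import csv
import io

# Hypothetical file contents: ';' separates columns, ',' groups thousands.
DATA = "Col 1;Col 2\n1,000,000;1\n500,000;2\n250,000;3\n"

reader = csv.reader(io.StringIO(DATA), delimiter=";")
next(reader)  # skip the header row
# Strip the grouping commas before converting each value to a number.
col1 = [float(row[0].replace(",", "")) for row in reader]
print(col1)  # [1000000.0, 500000.0, 250000.0]
```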
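You can open the file and split each line on whitespace using regular expressions:
import re

with open('filename.csv') as f:
    file_data = [re.split(r'\s+', line.strip()) for line in f]
# Skip the header row and drop the grouping commas before calling int().
final_data = [int(i[0].replace(',', '')) for i in file_data[1:]]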
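The list printed in the question ('"1,000,000", 1\n') hints that the file may actually be comma-separated, with the first field quoted and a space after each comma. If that assumption holds, csv.reader can parse it directly; skipinitialspace=True drops the space after each delimiter. DATA and its header text are hypothetical.

```python
import csv
import io

# Assumed layout, inferred from the '"1,000,000", 1\n' line in the question.
DATA = 'Col 1, Col 2\n"1,000,000", 1\n"500,000", 2\n"250,000", 3\n'

# skipinitialspace=True ignores whitespace immediately after each delimiter.
reader = csv.reader(io.StringIO(DATA), skipinitialspace=True)
header = next(reader)
col1 = [int(row[0].replace(",", "")) for row in reader]
print(header)  # ['Col 1', 'Col 2']
print(col1)    # [1000000, 500000, 250000]
```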
First of all, you must parse your data correctly. It is not, in fact, CSV (comma-separated values) but rather TSV (tab-separated values), and you should tell the CSV reader so (I'm assuming the separator is a tab, but in theory you can handle any whitespace with a few tweaks):
for row in csv.reader(csvfile, delimiter="\t"):
Second, you should strip the commas from your integer values, as they don't add any information. After that, the values can easily be parsed with int():
int(row[0].replace(',', ''))
Third, you really should not iterate over the same file twice. Use either a list comprehension or a normal for loop, not both at the same time with the same variable. For example, with a list comprehension:
import csv
from io import StringIO

csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
next(reader, None)  # skip the header
lst = [int(row[0].replace(',', '')) for row in reader]
Or with normal iteration:
csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
lst = []
for i, row in enumerate(reader):
    if i == 0:
        continue  # your custom header-handling code here
    lst.append(int(row[0].replace(',', '')))
In both cases, lst is set to [1000000, 500000, 250000], as expected. Enjoy.
By the way, using list as a variable name is an extremely bad idea: it is not a reserved keyword, but it shadows the built-in list type.
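A short sketch of why shadowing hurts: once list is rebound to a list object, calling list(...) later in the same scope fails with a TypeError. Deleting the name makes the built-in visible again.

```python
# Rebinding 'list' hides the built-in type in this module.
list = [1000000, 500000, 250000]
try:
    list("abc")
    shadow_error = None
except TypeError as exc:
    shadow_error = exc
print(shadow_error)  # 'list' object is not callable

del list            # remove the shadowing global name
print(list("abc"))  # ['a', 'b', 'c']
```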
UPDATE: There's one more option that I find interesting. Instead of setting the delimiter explicitly, you can use csv.Sniffer to detect it, e.g.:
csvdata = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"
csvfile = StringIO(csvdata)
dialect = csv.Sniffer().sniff(csvdata)
reader = csv.reader(csvfile, dialect=dialect)
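and then proceed just like the snippets above. This should keep working even if you replace the tabs with semicolons or commas (commas would require quotes around your comma-grouped integers), or possibly something else. Keep in mind that Sniffer is heuristic, so it can guess wrong on unusual input.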
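Here is a sketch of the Sniffer route applied to both a tab-separated and a semicolon-separated version of the sample. read_first_column is a hypothetical helper; passing delimiters=";\t," to sniff restricts the candidates it considers, which makes the guess more predictable.

```python
import csv
import io

def read_first_column(csvdata):
    """Sniff the delimiter, skip the header, and parse column 1 as ints."""
    # Limit the candidate delimiters; Sniffer is heuristic and can still fail.
    dialect = csv.Sniffer().sniff(csvdata, delimiters=";\t,")
    reader = csv.reader(io.StringIO(csvdata), dialect=dialect)
    next(reader, None)  # skip the header
    return dialect.delimiter, [int(row[0].replace(",", "")) for row in reader]

TAB_DATA = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"
SEMI_DATA = "Col 1;Col 2\n1,000,000;1\n500,000;2\n250,000;3\n"
print(read_first_column(TAB_DATA))   # ('\t', [1000000, 500000, 250000])
print(read_first_column(SEMI_DATA))  # (';', [1000000, 500000, 250000])
```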

Finding Max value in a column of csv file PYTHON

I'm fairly new to coding, and I'm stuck on a current project. I have a .csv file, and rather than use a spreadsheet, I'm trying to write a python program to find the maximum value of a specific column. So far I have the following:
import csv
with open('american_colleges__universities_1993.csv', 'rU') as f:
    reader = csv.reader(f)
    answer = 0
    for column in reader:
        answer = 0
        for i in column[14]:
            if i > answer:
                answer = i
    print answer
I keep getting something like:
9
The problem is that this is only returning the largest integer (which happens to be 9), when it should be returning something like 15,000. I suspect the program is only looking at each digit as its own value... How can I get it to look at the entire number in each entry?
Sorry for the newb question. Thanks!
Currently you are comparing each character in column[14] and setting answer to the lexically largest character. Assuming you want to compare the whole value of column[14] arithmetically, you need to remove the commas and convert it to int, e.g.:
with open('american_colleges__universities_1993.csv', 'rU') as f:
    reader = csv.reader(f)
    next(reader)  # Skip header row
    answer = max(int(column[14].replace(',', '')) for column in reader)
    print(answer)
If you need the whole row that has the maximum column[14], you could instead use the key argument to max:
answer = max(reader, key=lambda column: int(column[14].replace(',', '')))
print(answer)
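The difference between iterating over a string and comparing whole numbers is easy to see on a toy example. The sketch below uses a hypothetical three-column sample in which COL plays the role of index 14 in the question; it shows max over the characters of "15,000", then the numeric maximum and the row that holds it.

```python
import csv
import io

# Hypothetical sample; COL stands in for index 14 in the real file.
DATA = 'Name,State,Enrollment\nAlpha,NY,"9,500"\nBeta,CA,"15,000"\nGamma,TX,980\n'
COL = 2

# Iterating over a string yields characters, so this compares single digits.
print(max("15,000"))  # 5

reader = csv.reader(io.StringIO(DATA))
next(reader)          # skip the header row
rows = list(reader)   # materialise so the rows can be scanned twice
largest = max(int(row[COL].replace(",", "")) for row in rows)
best_row = max(rows, key=lambda row: int(row[COL].replace(",", "")))
print(largest)   # 15000
print(best_row)  # ['Beta', 'CA', '15,000']
```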
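import csv

with open('american_colleges__universities_1993.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # Remove thousands separators so values like "15,000" parse as int.
    maxnum = max(reader, key=lambda row: int(row[14].replace(',', '')))
print(maxnum)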
This should do the work for you.

Remove double quotes from iterator when using csv writer

I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    #load csv file
    file_ = open(name, 'rb')
    #init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    #LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        #Write row to new csv
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join((row[x] for x in range(i+1,i+5)))
                             ))
        #do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?
I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    with open(name, 'rb') as file_:
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from input csv
            for row in csv.reader(file_):
                # write 2 rows to new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
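The same splitting logic can be checked in Python 3 without touching the disk. In this sketch split_rows is a hypothetical name, and StringIO objects stand in for the files; with real files in Python 3, open them in text mode with newline='' instead of 'rb'/'wb'. The input row is the one from the question.

```python
import csv
import io

def split_rows(src, dst):
    """Write each input row as two rows: its first 8 fields, then its
    first 3 fields followed by everything from index 8 onward."""
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow(row[:8])
        writer.writerow(row[:3] + row[8:])

src = io.StringIO("A,R,T,11,12,13,14,15,21,22,23,24,25\n")
dst = io.StringIO()
split_rows(src, dst)
print(dst.getvalue().splitlines())
# ['A,R,T,11,12,13,14,15', 'A,R,T,21,22,23,24,25']
```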
writer.writerow expects an iterable so that it can write each item within the iterable as one field, separated by the appropriate delimiter, into the file. So:
writer.writerow([1, 2, 3])
would write "1,2,3\r\n" to the file (\r\n is the csv module's default line terminator).
Your call provides it with an iterable, one of whose items is a string that already contains the delimiter. It therefore needs either to escape the delimiter or to quote that item. For example,
writer.writerow([1, '2,3'])
doesn't give "1,2,3\r\n" but '1,"2,3"\r\n': the string counts as one field in the output.
Therefore, if you don't want quotes in the output, you need to pass quoting=csv.QUOTE_NONE together with an escape character (e.g. escapechar='/') to mark the delimiters that shouldn't be counted as such (giving "1,2/,3\r\n").
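The three quoting behaviours can be compared side by side. write_row is a hypothetical helper that returns what csv.writer produces for one row; the last call reproduces the "need to escape, but no escapechar set" error from the question.

```python
import csv
import io

def write_row(row, **fmt):
    """Return the text csv.writer emits for one row, or the csv.Error text."""
    buf = io.StringIO()
    try:
        csv.writer(buf, **fmt).writerow(row)
    except csv.Error as exc:
        return "csv.Error: {}".format(exc)
    return buf.getvalue()

row = [1, "2,3"]
print(repr(write_row(row)))  # '1,"2,3"\r\n'  (default QUOTE_MINIMAL)
print(repr(write_row(row, quoting=csv.QUOTE_NONE, escapechar="/")))  # '1,2/,3\r\n'
print(write_row(row, quoting=csv.QUOTE_NONE))  # csv.Error: need to escape, ...
```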
However, I think what you actually want is to write all of those elements as separate items. Don't ",".join(...) them yourself; try:
writer.writerow([current_a, current_r,
                 current_first_time] + row[i+1:i+5])
to provide the relevant items from row (the same indices your range(i+1, i+5) selected) as separate items in the list.

csv.writer writing each character of word in separate column/cell

Objective: to extract the text from the anchor tag inside every item in models and put it in a CSV file.
I'm trying this code:
with open('Sprint_data.csv', 'ab') as csvfile:
    spamwriter = csv.writer(csvfile)
    models = soup.find_all('li' , {"class" : "phoneListing"})
    for model in models:
        model_name = unicode(u' '.join(model.a.stripped_strings)).encode('utf8').strip()
        spamwriter.writerow(unicode(u' '.join(model.a.stripped_strings)).encode('utf8').strip())
It's working fine except each cell in the csv contains only one character.
Like this:
| S | A | M | S | U | N | G |
Instead of:
|SAMSUNG|
Of course I'm missing something. But what?
.writerow() requires a sequence ('', (), []) and places each item in its own column of the row, sequentially. If your desired string is not an item in a sequence, writerow() will iterate over each letter in your string, and each letter will be written to your CSV in a separate cell.
After you import csv, if this is your list:
myList = ['Diamond', 'Sierra', 'Crystal', 'Bridget', 'Chastity', 'Jasmyn', 'Misty', 'Angel', 'Dakota', 'Asia', 'Desiree', 'Monique', 'Tatiana']
listFile = open('Names.csv', 'wb')
writer = csv.writer(listFile)
for item in myList:
    writer.writerow(item)
The above script will produce the following CSV:
Names.csv
D,i,a,m,o,n,d
S,i,e,r,r,a
C,r,y,s,t,a,l
B,r,i,d,g,e,t
C,h,a,s,t,i,t,y
J,a,s,m,y,n
M,i,s,t,y
A,n,g,e,l
D,a,k,o,t,a
A,s,i,a
D,e,s,i,r,e,e
M,o,n,i,q,u,e
T,a,t,i,a,n,a
If you want each name in its own cell, the solution is simply to place your string (item) in a sequence. Here I use square brackets []:
listFile2 = open('Names2.csv', 'wb')
writer2 = csv.writer(listFile2)
for item in myList:
    writer2.writerow([item])
The script with .writerow([item]) produces the desired results:
Names2.csv
Diamond
Sierra
Crystal
Bridget
Chastity
Jasmyn
Misty
Angel
Dakota
Asia
Desiree
Monique
Tatiana
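The same effect can be reproduced in memory with Python 3, which makes the cause easy to see: a bare str is treated as a sequence of characters, while a one-item list is a single field.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow("SAMSUNG")    # a str is iterated character by character
writer.writerow(["SAMSUNG"])  # a one-item list is written as one field
print(buf.getvalue().splitlines())  # ['S,A,M,S,U,N,G', 'SAMSUNG']
```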
writerow accepts a sequence. You're giving it a single string, so it's treating that as a sequence, and strings act like sequences of characters.
What else do you want in this row? Nothing? If so, make it a list of one item:
spamwriter.writerow([u' '.join(model.a.stripped_strings).encode('utf8').strip()])
(By the way, the unicode() call is completely unnecessary since you're already joining with a unicode delimiter.)
This is usually the solution I use:
import csv

with open("output.csv", 'w', newline='') as output:
    wr = csv.writer(output, dialect='excel')
    for element in list_of_things:
        wr.writerow([element])
This should give you all your list elements in a single column rather than a single row (the with block closes the file, so no explicit close() is needed).
The key points here are to iterate over the list and to wrap each element in a list ([element]) so that the csv writer doesn't split it into characters.
Hope this is of use!
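As an alternative to the explicit loop, writerows accepts any iterable of rows, so wrapping each element in a one-item list writes a single column in one call. In this sketch list_of_things holds hypothetical data and a StringIO stands in for the output file.

```python
import csv
import io

list_of_things = ["Diamond", "Sierra", "Crystal"]  # hypothetical data

buf = io.StringIO()  # stands in for open("output.csv", "w", newline="")
writer = csv.writer(buf)
# Each [element] is a one-field row, so the output is a single column.
writer.writerows([element] for element in list_of_things)
print(buf.getvalue().splitlines())  # ['Diamond', 'Sierra', 'Crystal']
```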
Just wrap it in a list (i.e. []):
writer.writerow([str(one_column_value)])

Python comparing strings

I have code in which I first convert a .csv file into multiple lists, and then I have to create a subset of the original file containing only the rows with a particular word in column 5 of my file.
I am trying to use the following code to do so, but it gives me a syntax error for the if statement. Can anyone tell me how to fix this?
import csv
with open('/Users/jadhav/Documents/Hubble files/m4_hubble_1.csv') as f:
    bl = [[],[],[],[],[]]
    reader = csv.reader(f)
    for r in reader:
        for c in range(5):
            bl[c].append(r[c])
    print "The files have now been sorted into lists"
    name = 'HST_10775_64_ACS_WFC_F814W_F606W'
    for c in xrange(0,1):
        if bl[4][c]!='HST_10775_64_ACS_WFC_F814W_F606W'
        print bl[0][c]
You need a colon after your if test, and you need to indent the body of the if:
if bl[4][c]!='HST_10775_64_ACS_WFC_F814W_F606W':
    print bl[0][c]
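For the stated goal (keeping only the rows whose fifth column holds a particular value), you may not need the per-column lists at all. This Python 3 sketch assumes you want rows where column 5 equals name, whereas the question's loop tests != and only looks at the first row; DATA is made-up sample content standing in for m4_hubble_1.csv.

```python
import csv
import io

# Hypothetical stand-in for m4_hubble_1.csv (at least 5 columns per row).
DATA = (
    "s1,a,b,c,HST_10775_64_ACS_WFC_F814W_F606W\n"
    "s2,a,b,c,OTHER_DATASET\n"
    "s3,a,b,c,HST_10775_64_ACS_WFC_F814W_F606W\n"
)
name = "HST_10775_64_ACS_WFC_F814W_F606W"

reader = csv.reader(io.StringIO(DATA))
# Keep whole rows whose fifth column (index 4) equals name.
subset = [row for row in reader if row[4] == name]
for row in subset:
    print(row[0])
```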
