I'm still new to Python; this is how far I've managed to get:
import csv
import sys
import os.path

# VARIABLES
reader = None
col_header = None
total_rows = None
rows = None

# METHODS
def read_csv(csv_file):
    # Read and display CSV file w/ HEADERS
    global reader, col_header, total_rows, rows
    # Open the file and assign dictionaries to reader
    with open(csv_file, newline='') as csv_file:
        # restval fills blank columns with '-'; restkey collects extra columns under '+'
        reader = csv.DictReader(csv_file, fieldnames=None, restkey='+', restval='-',
                                delimiter=',', quotechar='"')
        try:
            col_header = reader.fieldnames
            print('The headers: ' + str(reader.fieldnames))
            for row in reader:
                print(row)
            # Calculate number of rows
            rows = list(reader)
            total_rows = len(rows)
        except csv.Error as e:
            sys.exit('file {}, line {}: {}'.format(csv_file, reader.line_num, e))

def calc_total_rows():
    print('\nTotal number of rows: ' + str(total_rows))
My issue is that when I attempt to count the number of rows, it comes up as 0 (impossible, because csv_file contains 4 rows and they print on screen).
I've placed the '#Calculate number of rows' code above my print-row loop and it works, but then the rows don't print. It's as if each task is stealing the dictionary from the other. How do I solve this?
The problem is that the reader object behaves like a file as it iterates through the CSV. First you iterate through it in the for loop and print each row. Then you try to create a list from what's left, which is empty because you've already iterated through the whole file. The length of this empty list is 0.
Try this instead:
rows = list(reader)
for row in rows:
    print(row)
total_rows = len(rows)
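To see the fix end to end, here is a minimal self-contained sketch of the same idea, using io.StringIO and made-up column names in place of the real file:

```python
import csv
import io

# Hypothetical in-memory CSV standing in for the real file.
csv_text = "name,age\nAlice,30\nBob,25\nCarol,41\nDave,19\n"

with io.StringIO(csv_text) as csv_file:
    reader = csv.DictReader(csv_file)
    rows = list(reader)        # materialize once; the reader is now exhausted
    total_rows = len(rows)

for row in rows:               # the list can be iterated as many times as needed
    print(row)
print('Total number of rows: ' + str(total_rows))
```

Because the rows live in a list, printing and counting no longer compete for the same one-shot iterator.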
I'm attempting to simplify a Python script that prints the first five rows (plus header) of a large CSV file in a more condensed output, if possible. I would normally prefer pandas, but in this case I'd like to use just import csv and import os (Mac user).
Code as follows:
import csv

filename = "/Users/xx/Desktop/xx.csv"
fields = []
rows = []

with open(filename, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    fields = next(csvreader)
    for row in csvreader:
        rows.append(row)
    print("Total no. of rows: %d" % (csvreader.line_num))

print('Field names are: ' + ', '.join(field for field in fields))
print('\nFirst 5 rows are:\n')
for row in rows[:5]:
    for col in row:
        print("%10s" % col, end=" ")
    print('\n')
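One way to condense this, since only the first five data rows are ever needed, is itertools.islice, which stops reading instead of loading the whole file into rows. A sketch, using an in-memory sample (hypothetical data) in place of the real filename:

```python
import csv
import io
from itertools import islice

# In-memory stand-in for the large CSV file (hypothetical data).
sample = "a,b,c\n" + "\n".join("%d,%d,%d" % (i, i + 1, i + 2) for i in range(100))

with io.StringIO(sample) as csvfile:
    csvreader = csv.reader(csvfile)
    fields = next(csvreader)                  # header row
    first_five = list(islice(csvreader, 5))   # read only 5 rows, not all 100

print('Field names are: ' + ', '.join(fields))
for row in first_five:
    print(' '.join('%10s' % col for col in row))
```

With a real file you would swap io.StringIO(sample) for open(filename, newline=''); islice never touches the rest of the file, which matters when it is large.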
I have the following code:
import csv

with open('data.csv') as csvfile:
    data = csv.reader(csvfile, delimiter=' ')
    print(data)
    row_count = sum(1 for lines in data)
    print(row_count)
    for row in data:
        print(row)
It prints:
<_csv.reader object at 0x00000295CB6933C8>
505
So it prints data as an object and prints the row_count as 505, but the for loop does not seem to print any rows. I'm not sure why nothing is being passed to the variable row.
This is particularly frustrating because if I get rid of row_count, it works! Why?
data = csv.reader(csvfile, delimiter=' ')
print(data)
row_count = sum(1 for lines in data)
That sum() just read the entire file; you've exhausted the input, so there is nothing left for your for loop to find. You have to reset the reader. The most obvious way is to close the file and reopen it. Less obvious, and less flexible, is to reset the file pointer to the beginning with
csvfile.seek(0)
This doesn't work for all file subtypes, but does work for CSV.
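A minimal sketch of the seek(0) approach, using an in-memory file with made-up space-separated data so it runs standalone:

```python
import csv
import io

# Hypothetical stand-in for open('data.csv').
f = io.StringIO("a b\nc d\n")
data = csv.reader(f, delimiter=' ')

row_count = sum(1 for line in data)   # first pass exhausts the reader
f.seek(0)                             # rewind the underlying file object
rows = [row for row in data]          # second pass sees the rows again

print(row_count)
print(rows)
```

The csv.reader keeps pulling lines lazily from the file object, so rewinding the file is enough to make the same reader start over.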
Even better, simply count the lines as you print them:
with open('data.csv') as csvfile:
    data = csv.reader(csvfile, delimiter=' ')
    row_count = 0
    for row in data:
        print(row)
        row_count += 1
    print(row_count)
You already consumed the rows from data with your generator expression:
with open('data.csv') as csvfile:
    data = csv.reader(csvfile, delimiter=' ')
    print(data)
    row_count = sum(1 for lines in data)  # This consumes all of the rows
    print(row_count)
    for row in data:  # no more rows at this point
        print(row)    # doesn't run because there are no rows left
You'll have to save all of the rows in memory or create a second CSV reader object if you want to print the count before printing each row.
If you have some memory to spare, the simplest solution may be to turn the data into a list and iterate over that, saving yourself all the trouble:
import csv

with open('your_csv.csv', 'r') as f_:
    reader = csv.reader(f_)
    new_list = list(reader)

for row in new_list:
    print(row)  # or whatever else you want to do afterwards
So I have a CSV file with over 1M records (https://i.imgur.com/rhIhy5u.png).
I need the data arranged differently, so that the repeating "params" become columns themselves, for example category1, category2, category3 (there are over 20 categories, with no repeats), while all the data keeps its relations.
I tried using pandas and csv in Python, but I'm completely new to this and have never worked with data like this.
import csv

with open('./data.csv', 'r') as _filehandler:
    csv_file_reader = csv.DictReader(_filehandler)
    param = []
    for row in csv_file_reader:
        if not row['Param'] in param:
            param.append(row['Param'])

col = ""
for p in param:
    col += str(p) + '; '
print(col)

import numpy as np
np.savetxt('./SortedWexdord.csv', param, delimiter=';', fmt='%s')
I've tried to think it through, but data is not my forte. Any ideas?
Here's something that should work. If you need more than one value per row normalized like this, you could edit the line beginning category, value to grab a list of values instead of just row[1].
import csv

data = {}
with open('data.csv', 'r') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header row
    for row in reader:
        category, value = row[0], row[1]  # Assumes category is in column 0 and target value is in column 1
        if category in data:
            data[category].append(value)
        else:
            data[category] = [value]  # New entry only for each unique category

with open('output.csv', 'w', newline='') as file:  # newline='' avoids double newlines on Windows
    writer = csv.writer(file)
    writer.writerow(['Category', 'Value'])
    for category in data:
        print([category] + data[category])
        writer.writerow([category] + data[category])  # A list starting with category, followed by each value
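As a side note, the if/else bookkeeping above can be shortened with collections.defaultdict, which creates the empty list the first time a category appears. A sketch with hypothetical data in place of data.csv:

```python
import csv
import io
from collections import defaultdict

# Hypothetical input standing in for data.csv.
sample = "Category,Value\ncat1,a\ncat2,b\ncat1,c\n"

data = defaultdict(list)  # a missing key starts out as an empty list
with io.StringIO(sample) as f:
    reader = csv.reader(f)
    next(reader)  # skip header row
    for row in reader:
        data[row[0]].append(row[1])  # no membership test needed

print(dict(data))
```

The grouping logic is otherwise identical; only the "new entry" branch disappears.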
Why is unique[1] never accessed in the second for loop?
unique is an array of strings.
import csv

with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    for i in range(len(unique)):
        # print unique[i]  # prints all the items in the array
        for row in reader:
            print unique[i]  # always prints the first item, unique[0]
            if row[1] == unique[i]:
                print row[1], row[0]  # prints only the unique[0] stuff
Thank you
I think it would be useful to go through the program flow.
First, it assigns i=0 and reads the entire CSV file, printing unique[0] for each line. After it finishes reading the file, it moves on to the second iteration and assigns i=1; since the file has already been read to the end, for row in reader: has nothing left to yield, so the inner loop never runs again.
Further Clarification
The csv.reader(f) won't actually read the file until you do for row in reader, and after that it has nothing more to read. If you want to read the file multiple times, then read it into a list first beforehand, like this:
import csv

with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    rows = [row for row in reader]

for i in range(len(unique)):
    for row in rows:
        print unique[i]
        if row[1] == unique[i]:
            print row[1], row[0]
I think you might have better luck if you change your nested structure to:
import csv

res = {}
for x in unique:
    res[x] = []

with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        for i in range(len(unique)):
            if row[1] == unique[i]:
                res[unique[i]].append([row[1], row[0]])

for x in unique:
    print res[x]
I have data in a csv file e.g
1,2,3,4
4,5,6,7
What I want is to create an extra column that sums each row, so that the result will look like:
1,2,3,4,10
4,5,6,7,22
And an extra row that sums the columns.
1,2,3,4,10
4,5,6,7,22
5,7,9,11,32
This is probably really basic, but I could do with some help, please.
#!/usr/bin/python
import sys
from itertools import imap, repeat
from operator import add

total = repeat(0)  # handles initialization without knowing the number of columns
for line in sys.stdin:
    l = map(int, line.split(','))
    l.append(sum(l))
    print ','.join(map(str, l))
    total = imap(add, total, l)
print ','.join(map(str, total))
I know, I'm treating Python like Haskell these days.
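For reference, a Python 3 sketch of the same idea (imap is gone; map is already lazy, so each running total is forced into a list), reading from an in-line list here instead of sys.stdin:

```python
from itertools import repeat
from operator import add

lines = ["1,2,3,4", "4,5,6,7"]  # stand-in for sys.stdin

total = repeat(0)  # works before the number of columns is known
out = []
for line in lines:
    l = list(map(int, line.split(',')))
    l.append(sum(l))                   # row sum as the extra column
    out.append(','.join(map(str, l)))
    total = list(map(add, total, l))   # force the lazy map each iteration
out.append(','.join(map(str, total)))  # column sums as the extra row
print('\n'.join(out))
```

map() stops at the shortest input, so the infinite repeat(0) simply pads the first iteration to the row's width.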
import csv

thefile = ["1,2,3,4", "4,5,6,7"]
reader = csv.reader(thefile)
temp = []
final = []

# read your csv into a temporary array
for row in reader:
    temp.append([int(item) for item in row])

# start a row of zeroes for the column sums
for item in temp[0]:
    final.append(0)

for row in temp:
    row_total = 0
    for index, item in enumerate(row):
        row_total += item                    # total the items in each row
        final[index] = final[index] + item   # add each item to the column sum
    row.append(row_total)                    # add the row sum

# append the sums row at the bottom, with its own grand total
final.append(sum(final))
temp.append(final)
print temp
import sys
import csv

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

with open(sys.argv[2], 'wb') as writefile:
    writer = csv.writer(writefile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    with open(sys.argv[1], 'rb') as readfile:
        reader = csv.reader(readfile, delimiter=',', quotechar='"')
        for row in reader:
            writer.writerow(row + [sum([float(r) for r in row if is_number(r)])])
How about some Pythonic list comprehensions:
import csv

in_file = ["1,2,3,4", "4,5,6,7"]
in_reader = list(csv.reader(in_file))

row_sum = [sum(map(int, row)) for row in in_reader]
col_sum = [sum(map(int, col)) for col in zip(*in_reader)]  # transpose the parsed rows, not the raw strings

for (index, row_run) in enumerate([map(int, row) for row in in_reader]):
    for data in row_run:
        print str(data) + ",",
    print row_sum[index]
for data in col_sum:
    print str(data) + ",",
print str(sum(col_sum))
Let me know if you need anything else.