I'm trying to load the two columns of my CSV file into an array in Python. However, I am getting:
ValueError: could not convert string to float: ''
I have attached a snippet of the code and the CSV file I'm trying to store in an array.
import csv

col1 = []
col2 = []
path = r'C:\Users\angel\OneDrive\Documents\CSV_FILES_NV_LAB\1111 x 30.csv'
with open(path, "r") as f_in:
    reader = csv.reader(f_in)
    next(reader)  # skip headers
    for line in reader:
        col1.append(float(line[0]))
        col2.append(float(line[1]))
print(col1)
print(col2)
What values are in the CSV file? If the values cannot be converted to floats, you will get the ValueError. For example, if your CSV file looks like this:
ColName,ColName2
abc,def
123,45.6
g,20
the error will be raised on the first iteration of your loop because abc cannot be converted to a float. If, however, all the values in the CSV file are numbers:
ColName,ColName2
1,2
123,45.6
100,20
the error will not be raised.
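If you want to see in advance which values will trip the conversion, a tiny helper like this (hypothetical, not part of the code above) mirrors what float() accepts:

```python
def is_float(s):
    """Return True if s can be converted to float, else False."""
    try:
        float(s)
        return True
    except ValueError:
        return False

# '' (an empty field) and 'abc' both fail; '45.6' succeeds
print(is_float("45.6"), is_float("abc"), is_float(""))  # True False False
```

Note that an empty string — exactly what appears in your traceback — is one of the values that fails.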
If you have some numeric and some non-numeric values in the CSV file, you can omit the lines containing non-numeric values by including a try...except block in your loop:
for line in reader:
    try:
        float_1, float_2 = float(line[0]), float(line[1])
        # If either of the above conversions failed, the next two lines will not be reached
        col1.append(float_1)
        col2.append(float_2)
    except ValueError:
        continue  # Move on to next line
Maybe you forgot to add .split(',')? This applies if you read the file line by line as plain strings: in that case line[0] and line[1] are simply the first and second characters of the line, not the first and second fields. (csv.reader, as used above, already splits each line into fields for you.)
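To illustrate the difference, a minimal sketch: indexing a raw string picks out a single character, while splitting on the comma gives you whole fields:

```python
line = "123,45.6"

print(line[0])             # first character: '1'
print(line.split(',')[0])  # first field: '123'
```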
I have a CSV file with 5 columns and many many rows of data.
I need to delete an entire row based on data in one column which I can do on my own.
My issue is I am unable to print the data back into the CSV format properly.
I'm importing the CSV like so.
data = open(datafile)  # this datafile variable has the CSV path
parse = csv.DictReader(data)
newfile = open("validated.csv", "w", newline="")  # I'd like to output my changes in this new file and leave the original CSV as is
output = csv.writer(newfile)
Based on what I've read, my CSV is interpreted as many different dictionaries.
I've tried many different list, dictionary, and for-loop combinations, but I just can't get it right.
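For reference, this is what "many different dictionaries" means in practice: csv.DictReader yields one dict per row, keyed by the header names. A small self-contained sketch (the column values here are made up):

```python
import csv
from io import StringIO

# In-memory stand-in for a CSV file with a header row
sample = StringIO("Company,Profit (in millions)\nAcme,12.5\nBetaCorp,n/a\n")
rows = list(csv.DictReader(sample))

print(rows[0]["Profit (in millions)"])  # '12.5' — note: still a string until converted
print(rows[1]["Profit (in millions)"])  # 'n/a'
```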
def validate_profits(datafile):
    # This will remove non-numeric profit rows so we can get a count of our useful dataset.
    data = open(datafile)  # Open then parse data.
    parse = csv.DictReader(data)
    newfile = open("validated.csv", "w", newline="")  # New file for output.
    output = csv.writer(newfile)
    outputlist = []
    for rows in parse:  # Looping through the CSV to check each profit column.
        try:
            float(rows["Profit (in millions)"])  # This is the validation for the Profit column
            outputlist.append(rows)
        except ValueError:
            pass
    counter = 0
    while True:
        try:
            counter += 1
            output.writerows([[outputlist[counter]]])  # Output the numerically valid rows to a new file.
        except IndexError:
            break
    count_rows("validated.csv")

validate_profits("data.csv")
If your sole job is "write only those rows with a valid numeric value in the Profit column", then it's just this:
def validate_profits(datafile):
    # This will remove non-numeric profit rows so we can get a count of our useful dataset.
    data = open(datafile)  # Open then parse data.
    count = 0
    parse = csv.DictReader(data)
    newfile = open("validated.csv", "w", newline="")  # New file for output.
    output = csv.DictWriter(newfile, fieldnames=parse.fieldnames)
    output.writeheader()  # keep the header row in the output file
    for row in parse:
        try:
            _ = float(row["Profit (in millions)"])
            output.writerow(row)
            count += 1
        except ValueError:
            pass
    return count
Many people would use a regular expression to test the contents of that field rather than relying on an exception from float, but this works.
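As a sketch of that regular-expression approach (the pattern below is an assumption; widen it if your data can contain signs on the decimal part, exponents, or thousands separators):

```python
import re

# Optional leading minus, digits, optional decimal part
NUMERIC = re.compile(r'^-?\d+(\.\d+)?$')

print(bool(NUMERIC.match("45.6")))  # True
print(bool(NUMERIC.match("n/a")))   # False
```

One practical difference: float() also accepts forms like "1e3" or "inf", which this pattern rejects, so pick whichever definition of "numeric" matches your data.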
I rewrote your function, fixing a few points and suggesting a somewhat more Pythonic way to do what you want:
def validate_profits(datafile):
    with open(datafile, 'r', encoding='utf-8') as f:  # Open then parse data.
        parsed = csv.DictReader(f)
        with open("validated.csv", "w", encoding='utf-8', newline='') as newfile:
            # DictReader yields dicts, so use DictWriter to write them back out
            output = csv.DictWriter(newfile, fieldnames=parsed.fieldnames)
            output.writeheader()
            outputlist = []
            for row in parsed:
                try:
                    float(row["Profit (in millions)"])
                    outputlist.append(row)
                except ValueError:
                    pass
            output.writerows(outputlist)
I am trying to read a CSV file, parse the data, and return a row only if its date (the start_date column) is before September 6, 2010, then print the corresponding values from the words column in ascending order. I can accomplish the first half using the following:
import csv

with open('sample_data.csv', 'rb') as f:
    read = csv.reader(f, delimiter=',')
    for row in read:
        if row[13] <= '1283774400':
            print(row[13] + "\t \t" + row[16])
It returns the correct start_date range and the corresponding word-column values, but they are not returned in ascending order, which would display a message if done correctly.
I have tried the sort() and sorted() functions after creating an empty list to populate and appending the rows to it, but I am just not sure where or how to incorporate that into the existing code, and I have been terribly unsuccessful. Any help would be greatly appreciated.
Just read the list, filter it according to the < date criterion, and sort it on column 13 as an integer.
Note that the common mistake would be to sort as ASCII strings (which may appear to work), but integer conversion is really required to avoid sort problems.
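The ASCII-vs-integer point is easy to demonstrate: lexicographic order puts '100' before '20':

```python
vals = ["9", "100", "20"]

print(sorted(vals))           # ['100', '20', '9'] — lexicographic, wrong for numbers
print(sorted(vals, key=int))  # ['9', '20', '100'] — numeric
```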
import csv

with open('sample_data.csv', 'r') as f:
    read = csv.reader(f, delimiter=',')
    # csv has a title row; we have to skip it (comment out if no title)
    title_row = next(read)
    # read the csv and filter it, keeping only the earlier rows
    lines = filter(lambda row: int(row[13]) < 1283774400, read)
    # sort the filtered rows numerically on column 13
    slist = sorted(lines, key=lambda row: int(row[13]))

# print the result, including the title line
for row in [title_row] + slist:
    # print(row[13] + "\t \t" + row[16])
    print(row)
I am trying to determine the type of data contained in each column of a .csv file so that I can make CREATE TABLE statements for MySQL. The program makes a list of all the column headers and then grabs the first row of data and determines each data type and appends it to the column header for proper syntax. For example:
ID Number Decimal Word
0 17 4.8 Joe
That would produce something like CREATE TABLE table_name (ID int, Number int, Decimal float, Word varchar());.
The problem is that in some of the .csv files the first row contains a NULL value that is read as an empty string and messes up this process. My goal is then to search the rows until one is found that contains no NULL values and use that one when forming the statement. This is what I have done so far, except it sometimes still returns rows that contain empty strings:
def notNull(p):  # where p is a .csv file that has been read in another function
    tempCol = next(p)
    tempRow = next(p)
    col = tempCol[:-1]
    row = tempRow[:-1]
    if any('' in row for row in p):
        tempRow = next(p)
        row = tempRow[:-1]
    else:
        rowNN = row
    return rowNN
Note: the .csv file reading is done in a different function; this function simply takes the already-read .csv file as input p. Also, each row ends with a , that is treated as an extra empty string, so I slice the last value off of each row before checking it for empty strings.
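That trailing comma can be seen directly: splitting a line that ends in a comma yields a final empty string, which the [:-1] slice drops:

```python
row = "0,17,4.8,Joe,".split(',')

print(row)       # ['0', '17', '4.8', 'Joe', '']
print(row[:-1])  # ['0', '17', '4.8', 'Joe']
```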
Question: What is wrong with the function I created that causes it to not always return a row without empty strings? I feel it is because the loop does not repeat itself as necessary, but I am not quite sure how to fix this.
I cannot really decipher your code. This is what I would do to get only the rows without empty strings.
import csv

def g(name):
    with open(name, 'r') as f:
        r = csv.reader(f)
        # Skip headers
        next(r)
        for row in r:
            if '' not in row:
                yield row

for row in g('file.csv'):
    print('row without empty values: {}'.format(row))
This is my first post, but I am hoping you can tell me how to perform a calculation and insert the value into a CSV data file.
For each row I want to take each 'uniqueclass' and sum the scores achieved in column 12. See the example data below:
text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,175,12,data6,data7
text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,171,18,data6,data7
text1,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,164,5,data6,data7
text1,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,121,21.5,data6,data7
text2,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,100,29,data6,data7
text2,Data,Class,Uniqueclass2,data1,data,4,data2,data3,data4,data5,85,21.5,data6,data7
text3,Data,Class,Uniqueclass3,data1,data,3,data2,data3,data4,data5,987,35,data6,data7
text3,Data,Class,Uniqueclass3,data1,data,3,data2,data3,data4,data5,286,18,data6,data7
text3,Data,Class,Uniqueclass3,data1,data,3,data2,data3,data4,data5,003,5,data6,data7
So, for instance, the first uniqueclass spans the first two rows. I would like to insert a subsequent value on each of those rows: '346' (the sum of both 175 and 171). The result would look like this:
text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,175,12,data6,data7,346
text1,Data,Class,Uniqueclass1,data1,data,2,data2,data3,data4,data5,171,18,data6,data7,346
I would like to be able to do this for each of the uniqueclasses.
Thanks SMNALLY
I always like the defaultdict class for this type of thing.
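As a quick illustration of defaultdict(int): missing keys start at 0, so per-class sums accumulate without any key checks (the class names and scores below are taken from the sample data above):

```python
from collections import defaultdict

totals = defaultdict(int)  # missing keys default to 0
for cls, score in [("Uniqueclass1", 175), ("Uniqueclass1", 171), ("Uniqueclass2", 164)]:
    totals[cls] += score

print(totals["Uniqueclass1"])  # 346 — matches the 175 + 171 example above
```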
Here would be my attempt:
from collections import defaultdict

class_col = 3
data_col = 11

# Read in the data
with open('path/to/your/file.csv', 'r') as f:
    # if you have a header on the file:
    # header = f.readline().strip().split(',')
    data = [line.strip().split(',') for line in f]

# Sum the data for each unique class
# (assuming integers; replace int with float if needed)
count = defaultdict(int)
for row in data:
    count[row[class_col]] += int(row[data_col])

# Append the relevant sum to the end of each row
for i in range(len(data)):
    data[i].append(str(count[data[i][class_col]]))

# Write the results to a new csv file
with open('path/to/your/new_file.csv', 'w') as nf:
    nf.write('\n'.join(','.join(row) for row in data))