Using CSV module in Python to replace zeros

This may seem like an odd thing to do, but I essentially have a csv file that has some values of '0' in quite a number of cells.
How would I, in Python, convert these numbers to read as something like 0.00 instead of just 0? I have a script in ArcMap which needs to read the values as double rather than short integer, and the '0' value really messes that up.
I am new to the CSV module, so I am not sure where to go with this. Any help with a script that converts my values so that the new CSV reads "0.00" rather than '0' would be greatly appreciated.
I would have liked to have some code to give you as an example, but I am at a loss.

Here's a short script that will read a CSV file, convert any numbers to floats and then write it back to the same file again.
import csv
import sys

# Column indices listed here won't be converted
dont_touch = [0]

def convert(index, value):
    if index not in dont_touch:
        try:
            return float(value)
        except ValueError:
            # Not parseable as a number
            pass
    return value

table = []
with open(sys.argv[1]) as f:
    for row in csv.reader(f, delimiter=","):
        for i in range(len(row)):
            row[i] = convert(i, row[i])
        table.append(row)

with open(sys.argv[1], "w", newline="") as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerows(table)
If you have any columns that should not be converted, specify their indices in the dont_touch list.
If you want them to have two trailing zeroes you can play around with format strings instead:
return "{:.02f}".format(float(value))

You can format the 0s and then write them out. You may want to look into the appropriate quoting for your csv (e.g. you may need quoting=csv.QUOTE_NONE in your writer object):
import csv

# fr and fw are the input and output file handles (the filenames here are placeholders)
with open("input.csv", newline="") as fr, open("output.csv", "w", newline="") as fw:
    reader = csv.reader(fr)
    writer = csv.writer(fw)
    for row in reader:
        writer.writerow([f if f != '0' else '{:0.2f}'.format(0) for f in row])

Related

Converting CSV into Array in Python

I have a small csv file, which I have uploaded here. I am trying to convert the csv values into an array.
My expected output is like:
My solution:
import csv

results = []
with open("Solutions10.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # convert contents to floats
    for row in reader:  # each row is a list
        results.append(row)
but I am getting a
ValueError: could not convert string to float: ' [1'
There is a problem with your CSV. It's just not csv (comma separated values). To handle it you need some cleaning:
import re

# if you expect only integers
pattern = re.compile(r'\d+')
# if you expect floats (uncomment below)
# pattern = re.compile(r'\d+\.*\d*')

result = []
with open(filepath) as csvfile:
    for row in csvfile:
        result.append([
            int(val.group(0))
            # float(val.group(0))
            for val in re.finditer(pattern, row)
        ])
print(result)
You can also solve this with substrings if it's easier for you and you know the format exactly.
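For instance, a minimal substring-based sketch, assuming each line looks like "[1, 2, 3]" (which is what the error message in the question suggests):
result = []
with open(filepath) as csvfile:
    for row in csvfile:
        # strip whitespace and the surrounding brackets, then split on commas
        cleaned = row.strip().strip('[]')
        result.append([int(v) for v in cleaned.split(',') if v.strip()])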
Note: I also see an "eval" suggestion. Please be careful with it, as you can get into a lot of trouble if you scan unknown/untrusted files...
You can do this:
with open("Solutions10.csv") as csvfile:
result = [eval(k) for k in csvfile.readlines()]
Edit: Karl is cranky and wants you to do this:
with open("Solutions10.csv") as csvfile:
    result = []
    for line in csvfile.readlines():
        line = line.replace("[", "").replace("]", "")
        result.append([int(k) for k in line.split(",")])
But you're the programmer so you can do what you want. If you trust your input file eval is fine.

Could not convert string to float error while using csv files

I'm trying to load the two columns of my csv files into an array in python. However I am getting:
ValueError: could not convert string to float: ''.
I have attached the snippets of the code implemented and the csv file I'm trying to store in an array.
import csv

col1 = []
col2 = []

path = r'C:\Users\angel\OneDrive\Documents\CSV_FILES_NV_LAB\1111 x 30.csv'
with open(path, "r") as f_in:
    reader = csv.reader(f_in)
    next(reader)  # skip headers
    for line in reader:
        col1.append(float(line[0]))
        col2.append(float(line[1]))

print(col1)
print(col2)
What values are in the CSV file? If the values cannot be converted to floats, you will get the ValueError. For example, if your CSV file looks like this:
ColName,ColName2
abc,def
123,45.6
g,20
the error will be raised on the first iteration of your loop because abc cannot be converted to a float. If, however, all the values in the CSV file are numbers:
ColName, ColName2
1,2
123,45.6
100,20
the error will not be raised.
If you have some numeric and some non-numeric values in the CSV file, you can omit the lines containing non-numeric values by including a try...except block in your loop:
for line in reader:
    try:
        float_1, float_2 = float(line[0]), float(line[1])
        # If either of the above conversions failed, the next two lines will not be reached
        col1.append(float_1)
        col2.append(float_2)
    except ValueError:
        continue  # Move on to next line
Maybe you forgot to add .split(',')? Right now, line[0] and line[1] simply take the first and second character of the line.
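That would be the case if you were iterating over the file directly instead of a csv.reader: each line would be a string and you would need the split. A minimal sketch of that manual approach, reusing path, col1 and col2 from the question:
col1, col2 = [], []
with open(path, "r") as f_in:
    next(f_in)  # skip headers
    for line in f_in:
        parts = line.strip().split(',')
        col1.append(float(parts[0]))
        col2.append(float(parts[1]))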

Read csv files in a loop, isolating the non-empty elements from a specific column each time-Python

I am new to Python and I am attempting to read .csv files sequentially in a for or while loop (I have about 250 of them). The aim is to read each .csv file and isolate the rows in which a specific column (let's call it "wanted_column") is non-empty. Then, save all those rows, with all their columns, to a new .csv file.
Hence, at the end, I want to have 250 .csv files with all columns for each row that has non-empty elements in the "wanted_column".
Hopefully this is clear. I would appreciate any ideas.
George
I wrote the code below just to give you an idea of how to do it. Beware that it does not check for any errors: its behavior is undefined if one of your CSV files is empty, if a file cannot be found, or if the column you specified does not exist in one of the files. There could be more cases, so you would want to build error checking around it. Also, how your CSV formatting is handled depends greatly on the python csv package.
Now to the code explanation. The "paths" variable accepts a string, a tuple, or a list; a string is converted to a one-element tuple. Use it to pass the file(s) you want to work with.
The "column" variable should be a string. You need to build error checking for it if needed.
The function reads each CSV file in the paths list. For each file, it first reads the header line and saves its contents in the variable rowFields.
It then builds the dict fields, mapping each column name (key) to its position (value); that dict is used to look up the wanted column's position by name. Alternatively, you could scan the header fields once, save the position of the matching column, and reuse that position instead of repeatedly searching the dict by name; that would be the fastest approach.
After that, it reads each remaining row of the CSV file until the end. For each row, it checks whether the string in the column selected by the "column" variable has a length larger than zero; if so, it appends the row to contentRows.
Once the function is done reading a CSV file, it writes the contents of "rowFields" and "contentRows" to the CSV file named by the "outfile" variable. To make it easy, outfile is simply the input file name plus ".new"; you can just change that.
import csv

def getNoneEmpty(paths, column):
    if isinstance(paths, str):
        paths = (paths,)
    if not isinstance(paths, list) and not isinstance(paths, tuple):
        raise TypeError("paths has to be a string, list, or tuple")

    quotechar = '"'
    delimiter = ","
    lineterminator = "\n"

    for f in paths:
        outfile = f + ".new"  # change this area to how you want to generate the new file
        fields = {}
        rowFields = None
        contentRows = []

        with open(f, newline='') as csvfile:
            csvReader = csv.reader(csvfile, delimiter=delimiter, quotechar=quotechar)
            rowFields = next(csvReader)
            for i in range(0, len(rowFields)):
                fields[rowFields[i]] = i
            for row in csvReader:
                if len(row[fields[column]]) != 0:
                    contentRows.append(row)

        with open(outfile, 'w', newline='') as csvfile:
            csvWriter = csv.writer(csvfile, delimiter=delimiter, quotechar=quotechar,
                                   quoting=csv.QUOTE_MINIMAL, lineterminator=lineterminator)
            csvWriter.writerow(rowFields)
            csvWriter.writerows(contentRows)

getNoneEmpty(["test.csv", "test2.csv"], "1958")
test.csv content:
"Month","1958","1959","1960"
"JAN",115,360,417
"FEB",318,342,391
"MAR",362,406,419
"APR",348,396,461
"MAY",11,420,472
"JUN",124,472,535
"JUL",158,548,622
"AUG",505,559,606
"SEP",404,463,508
"OCT",,407,461
"NOV",310,362,390
"DEC",110,405,432
test2.csv content:
"Month","1958","1959","1960"
"JAN",,360,417
"FEB",318,342,391
"MAR",362,406,419
"APR",348,396,461
"MAY",,420,472
"JUN",,472,535
"JUL",,548,622
"AUG",505,559,606
"SEP",404,463,508
"OCT",,407,461
"NOV",310,362,390
"DEC",110,405,432
Hopefully this will work:
import csv

def main():
    temp = []
    with open(r'old_csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=';')
        for row in csv_reader:
            for x in row:
                temp.append(x)
    with open(r'new_csv', mode='w') as new_file:
        writer = csv.writer(new_file, delimiter=',', lineterminator='\n')
        for col in temp:
            list_ = col.split(',')
            writer.writerow(list_)

How would I validate that a CSV file is delimited by a certain character (in this case a backtick ` )

I have these huge CSV files that I need to validate; I need to make sure they are all delimited by a backtick (`). I have a reader opening each file and printing its content. I'm just wondering how you all would go about validating that each value is delimited by the backtick character.
for csvfile in self.fullcsvpathfiles:
    #print("file..")
    with open(self.fullcsvpathfiles[0], mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file, delimiter="`")
        for row in csv_reader:
            print(row)
I'm not sure how to go about validating that each value is separated by a backtick and throwing an error otherwise. These tables are huge (not that that's a problem for electricity ;) )
Method 1
With the pandas library you could use the pandas.read_csv() function to read the csv file with sep='`' (which specifies the delimiter). If it parses the file into a well-shaped dataframe, you can be fairly sure the delimiter is right.
Also, to automate the validation process, you could check whether the number of NaN values in the dataframe is within an acceptable level. Assuming your csv files do not have many blanks (so only a few NaN values are expected), you could compare the number of NaN values with a threshold you set.
import pandas as pd

nan_threshold = 20

for csvfile in self.fullcsvpathfiles:
    # if read_csv fails at this step, then something (probably the delimiter) must be wrong
    my_df = pd.read_csv(csvfile, sep="`")
    nans = my_df.isnull().sum().sum()  # total number of NaN values in the dataframe
    if nans > nan_threshold:
        print(csvfile)  # make some warning here
Refer to this page for more information about pandas.read_csv().
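To turn a parse failure into a warning rather than a crash, you could wrap the call in a try/except (a sketch; it assumes a reasonably recent pandas, where tokenizing problems raise pandas.errors.ParserError):
import pandas as pd

nan_threshold = 20

for csvfile in self.fullcsvpathfiles:
    try:
        my_df = pd.read_csv(csvfile, sep="`")
    except pd.errors.ParserError:
        print("could not parse {} with the backtick delimiter".format(csvfile))
        continue
    if my_df.isnull().sum().sum() > nan_threshold:
        print("{} has too many blank cells".format(csvfile))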
Method 2
As mentioned in the comments, you could also check if the number of occurrence of the delimiter is equal in each line of the file.
num_of_sep = -1  # initial value

# assume you are at the step of reading a file f
for line in f:
    num = line.count("`")
    if num_of_sep == -1:
        num_of_sep = num
    elif num != num_of_sep:
        print('Some warning here')
If you don't know how many columns are in a file, you could check that all the rows have the same number of columns - if you expect the header (first row) to always be correct, use it to determine the number of columns.
for csvfile in self.fullcsvpathfiles:
    with open(csvfile, mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file, delimiter="`")
        ncols = len(next(csv_reader))
        if not all(len(row) == ncols for row in csv_reader):
            pass  # do something

for csvfile in self.fullcsvpathfiles:
    with open(csvfile, mode='r') as f:
        row = next(f)
        ncols = row.count('`')
        if not all(row.count('`') == ncols for row in f):
            pass  # do something
If you know how many columns are in a file...
for csvfile in self.fullcsvpathfiles:
    with open(csvfile, mode='r') as csv_file:
        # figure out how many columns it is supposed to have here?
        ncols = special_process()
        csv_reader = csv.DictReader(csv_file, delimiter="`")
        if not all(len(row) == ncols for row in csv_reader):
            pass  # do something

for csvfile in self.fullcsvpathfiles:
    # figure out how many columns it is supposed to have here?
    ncols = special_process()
    with open(csvfile, mode='r') as f:
        if not all(row.count('`') == ncols for row in f):
            pass  # do something
If you know the number of expected elements, you could inspect each line
with open(filename, 'r') as f:
    for line in f:
        line = line.split("`")
        if len(line) != numElements:
            raise Exception("Bad file")
If you know the delimiter that is being accidentally inserted, you could also try to recover instead of throwing exception. Perhaps something like:
line="`".join(line).replace(wrongDelimiter,"`").split("`")
Of course, once you're that far into reading the file, there's no great need for using an external library to read the data. Just go ahead and use it.
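Putting the two ideas together, a minimal sketch (filename, numElements and wrongDelimiter are assumed names, not values from the question):
def validate_and_recover(filename, numElements, wrongDelimiter=","):
    rows = []
    with open(filename, 'r') as f:
        for line in f:
            fields = line.rstrip("\n").split("`")
            if len(fields) != numElements:
                # try to recover by normalising the stray delimiter to backticks
                fields = "`".join(fields).replace(wrongDelimiter, "`").split("`")
            if len(fields) != numElements:
                raise Exception("Bad file")
            rows.append(fields)
    return rows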

Reading from and Writing to CSV files

I am struggling with Python 2.7.10. I'm trying to create a program that will eventually open a CSV file, read numbers from the file, perform calculations with the numbers and write back to the CSV file.
The code (i.e. the calculations) is not finished; I just wanted to try a few small bits so I could start to identify problems. The data in the CSV file looks like this:
['110000,75000\n', '115000,72500\n', '105000,85250\n', '100000,70000']
One thing that I am having issues with is properly converting the CSV strings to numbers and then telling Python which row and column I want to use in the calculation; something like Row(0), Column(0) - Row(1), Column(1).
I have tried a few different things but it seems to crash on the converting-to-numbers bit. The error message is TypeError: int() argument must be a string or a number, not list, or IOError: File not open for string, depending on what I have tried. Can someone point me in the right direction?
import csv

def main():
    my_file = open('InputData.csv', 'rU')
    #test = csv.writer(my_file, delimiter=',')
    file_contents = my_file.readlines()
    print file_contents
    for row in file_contents:
        print row
        #convert to numbers
        #val0 = int(file_contents.readlines(0))
        #val1 = int(file_contents.readlines(1))
        #val0 = int(my_file.readlines(0))
        #val1 = int(my_file.readlines(1))
        #perform calculation
        #valDiff = val1 - val0
        #append to third column, may need to be in write file mode, num to strings
        #file_contents.append
    my_file.close()

main()
The list file_contents now contains all of your csv data, so trying to use readlines won't work on the list type. I would try
row0 = file_contents[0].split(",")
which should give you the first row in list format. You should (and most likely will need to) put this in a loop to cover any size of file. Then
val0 = int(row0[0])
should give you the value you want. But again, I would make this iterative to save yourself some time and effort.
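For example, a minimal iterative sketch along those lines (reusing file_contents from the question):
data = []
for row in file_contents:
    fields = row.strip().split(",")
    data.append([int(value) for value in fields])
# data[0][0] - data[1][1] is then a plain integer calculation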
Assuming that your file is in plain text format and that you do not want to use a third party library like pandas then this would be the basic way to do it:
data = []
with open('InputData.csv', 'r') as my_file:
    for row in my_file:
        columns = row.split(',')  # clean and split
        data.append([int(value) for value in columns])

print(data[0][0])  # row=0 col=0
print(data[0][1])  # row=0 col=1
I think this will do what you want:
import csv

def main(filename):
    # read entire csv file into memory
    with open(filename, 'rb') as my_file:
        reader = csv.reader(my_file, delimiter=',')
        file_contents = list(reader)
    # rewrite file adding a difference column
    with open(filename, 'wb') as my_file:
        writer = csv.writer(my_file, delimiter=',')
        for row in file_contents:
            val0, val1 = map(int, row)
            difference = val1 - val0
            #print(val0, val1, difference)
            writer.writerow([val0, val1, difference])

if __name__ == '__main__':
    main('InputData.csv')
Be careful when using this because it will rewrite the file. For testing and debugging, you might want to have it write the results to a second file with a different name.
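For instance, a minimal variant of the second with block (OutputData.csv is an assumed name) that leaves the input file untouched:
# write to a separate file instead of overwriting the input
with open('OutputData.csv', 'wb') as my_file:
    writer = csv.writer(my_file, delimiter=',')
    for row in file_contents:
        val0, val1 = map(int, row)
        writer.writerow([val0, val1, val1 - val0])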
