Reading from and Writing to CSV files - python

I am struggling with Python 2.7.10. I'm trying to create a program that will eventually open a CSV file, read numbers from the file, perform calculations with the numbers, and write back to the CSV file.
The code (i.e. the calculations) is not finished, I just wanted to try a few small bits so I could start to identify problems. The data in the CSV file looks like this:
['110000,75000\n', '115000,72500\n', '105000,85250\n', '100000,70000']
One thing that I am having issues with is properly converting the CSV strings to numbers and then telling Python which row and column I want to use in the calculation; something like Row(0), Column(0) - Row(1), Column(1).
I have tried a few different things, but it seems to crash on the converting-to-numbers bit. The error message is "TypeError: int() argument must be a string or a number, not 'list'" or "IOError: File not open for string", depending on what I have tried. Can someone point me in the right direction?
import csv

def main():
    my_file = open('InputData.csv', 'rU')
    #test = csv.writer(my_file, delimiter=',')
    file_contents = my_file.readlines()
    print file_contents
    for row in file_contents:
        print row
    #convert to numbers
    #val0 = int(file_contents.readlines(0))
    #val1 = int(file_contents.readlines(1))
    #val0 = int(my_file.readlines(0))
    #val1 = int(my_file.readlines(1))
    #perform calculation
    #valDiff = val1 - val0
    #append to third column, may need to be in write file mode, num to strings
    #file_contents.append
    my_file.close()

main()

The list file_contents now contains all of your CSV data as strings, so trying to use readlines won't work on the list type. I would try
row0 = file_contents[0].split(",")
which should give you the first row as a list. You should (and most likely will need to) put this in a loop to cover a sheet of any size. Then
val0 = int(row0[0])
should give you the value you want. But again I would make this iterative to save yourself some time and effort.
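For illustration, a minimal sketch of that iterative version, building on the file_contents list from the question (the data and difference names are just placeholders for this example):
# sketch: convert every line of file_contents into a row of ints
data = []
for line in file_contents:
    row = line.split(",")          # int() tolerates the trailing '\n'
    data.append([int(value) for value in row])

# data[row][col] now gives the number you want, e.g.
difference = data[1][1] - data[0][0]   # Row(1) Column(1) - Row(0) Column(0)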

Assuming that your file is in plain text format and that you do not want to use a third party library like pandas then this would be the basic way to do it:
data = []
with open('InputData.csv', 'r') as my_file:
    for row in my_file:
        columns = row.split(',')  # split on the delimiter; int() tolerates the trailing newline
        data.append([int(value) for value in columns])

print(data[0][0])  # row=0 col=0
print(data[0][1])  # row=0 col=1

I think this will do what you want:
import csv

def main(filename):
    # read entire csv file into memory
    with open(filename, 'rb') as my_file:
        reader = csv.reader(my_file, delimiter=',')
        file_contents = list(reader)
    # rewrite file adding a difference column
    with open(filename, 'wb') as my_file:
        writer = csv.writer(my_file, delimiter=',')
        for row in file_contents:
            val0, val1 = map(int, row)
            difference = val1 - val0
            #print(val0, val1, difference)
            writer.writerow([val0, val1, difference])

if __name__ == '__main__':
    main('InputData.csv')
Be careful when using this because it will rewrite the file. For testing and debugging, you might want to have it write the results to a second file with a different name.
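A minimal sketch of that safer variant, writing to a separate output file (the name 'OutputData.csv' is just an example):
import csv

# sketch: same logic as above, but the original file is left untouched
with open('InputData.csv', 'rb') as infile, open('OutputData.csv', 'wb') as outfile:
    writer = csv.writer(outfile, delimiter=',')
    for row in csv.reader(infile, delimiter=','):
        val0, val1 = map(int, row)
        writer.writerow([val0, val1, val1 - val0])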

Related

Python: Replace string in a txt file but not on every occurrence

I am really new to Python and I need to change new artikel IDs to the old ones. The IDs are mapped inside a dict. The file I need to edit is a normal txt file where every column is separated by tabs. The problem is not replacing the values, but rather replacing only the occurrences in the desired column, which is set by pos.
I would really appreciate some help.
def replaceArtCol(filename, pos):
    with open(filename) as input_file, open('test.txt', 'w') as output_file:
        for each_line in input_file:
            val = each_line.split("\t")[pos]
            for row in artikel_ID:
                if each_line[pos] == pos:
                    line = each_line.replace(val, artikel_ID[val])
                    output_file.write(line)
This code just replaces any occurrence of the string in the text file.
Supposing your ID mapping dict looks like ID_mapping = {'old_id': 'new_id'}, I think your code is not far from working correctly. A modified version could look like
with open(filename) as input_file, open('test.txt', 'w') as output_file:
    for each_line in input_file:
        line = each_line.split("\t")
        if line[pos] in ID_mapping.keys():
            line[pos] = ID_mapping[line[pos]]
        line = '\t'.join(line)
        output_file.write(line)
If you're not working in pandas anyway, this can save a lot of overhead.
If your data is tab-separated, then you can load it into a dataframe; this way you get a proper columns-and-rows structure. What you are doing right now will not let you do what you want without some complex and buggy logic. You may try these steps:
import pandas as pd
df = pd.read_csv("dummy.txt", sep="\t", encoding="latin-1")
df['desired_column_name'] = df['desired_column_name'].replace({"value_to_be_changed": "newvalue"})
print(df.head())

Read csv files in a loop, isolating the non-empty elements from a specific column each time - Python

I am new to Python and I am attempting to read .csv files sequentially in a for or while loop (I have about 250 of them). The aim is to read each .csv file and isolate the rows in which a specific column (let's call it "wanted_column") is non-empty. Then, save all those non-empty rows, together with all their columns, to a new .csv file.
Hence, at the end, I want to have 250 .csv files, each containing all the columns for every row that has a non-empty element in the "wanted_column".
Hopefully this is clear. I would appreciate any ideas.
George
I wrote the code below just to give you an idea of how to do it. Beware that it does not check for any errors. Its behavior is undefined if one of your CSV files is empty, if a file can't be found, or if the column you specify does not exist in one of the files; there could be more cases. Thus, you would want to build error checking around it. Also, your CSV formatting depends greatly on the Python csv package.
Now to the code explanation. The "paths" variable can be given a string, a tuple, or a list. If you give it a string, it converts it to a tuple with one element. You give that variable the file(s) that you want to work with.
The "column" variable should be a string; build error checking for it if needed.
As for the code flow, the function reads all the CSV files in the paths list. Each time it reads a file, it first reads the header line and saves its contents to a variable (rowFields).
After that, it generates the header dict (fields), mapping each column name (key) to its position (value). That dict is used to look up the column position by name. Alternatively, you could go through each field and, if the field matches the column name, save that index as the column position; that position can then be reused instead of repeatedly searching the dict by name. The latter method should be the fastest, and a short sketch of it follows this explanation.
After that, it reads each row of the CSV file until the end. For each row, it checks whether the length of the string in the column defined by the "column" variable is greater than zero; if it is, it appends that row to the variable contentRows.
After the function is done reading the CSV file, it writes the contents of the variables "rowFields" and "contentRows" to the CSV file named by the "outfile" variable. To make it easy for myself, outfile is simply the input file plus ".new"; you can just change that.
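For illustration, a minimal sketch of that index-based alternative (it reuses the rowFields, column, csvReader, and contentRows names from the full code below):
# sketch: find the column position once instead of building a name->position dict
col_pos = rowFields.index(column)   # raises ValueError if the column is missing
for row in csvReader:
    if len(row[col_pos]) != 0:
        contentRows.append(row)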
import csv

def getNoneEmpty(paths, column):
    if isinstance(paths, str):
        paths = (paths, )
    if not isinstance(paths, (list, tuple)):
        raise TypeError("paths has to be a string, list, or tuple")
    quotechar = '"'
    delimiter = ","
    lineterminator = "\n"
    for f in paths:
        outfile = f + ".new"  # change this area to how you want to generate the new file
        fields = {}
        rowFields = None
        contentRows = []
        with open(f, newline='') as csvfile:
            csvReader = csv.reader(csvfile, delimiter=delimiter, quotechar=quotechar, lineterminator=lineterminator)
            rowFields = next(csvReader)
            for i in range(0, len(rowFields)):
                fields[rowFields[i]] = i
            for row in csvReader:
                if len(row[fields[column]]) != 0:
                    contentRows.append(row)
        with open(outfile, 'w', newline='') as csvfile:
            csvWriter = csv.writer(csvfile, delimiter=delimiter, quotechar=quotechar, quoting=csv.QUOTE_MINIMAL, lineterminator=lineterminator)
            csvWriter.writerow(rowFields)
            csvWriter.writerows(contentRows)

getNoneEmpty(["test.csv", "test2.csv"], "1958")
test.csv content:
"Month","1958","1959","1960"
"JAN",115,360,417
"FEB",318,342,391
"MAR",362,406,419
"APR",348,396,461
"MAY",11,420,472
"JUN",124,472,535
"JUL",158,548,622
"AUG",505,559,606
"SEP",404,463,508
"OCT",,407,461
"NOV",310,362,390
"DEC",110,405,432
test2.csv content:
"Month","1958","1959","1960"
"JAN",,360,417
"FEB",318,342,391
"MAR",362,406,419
"APR",348,396,461
"MAY",,420,472
"JUN",,472,535
"JUL",,548,622
"AUG",505,559,606
"SEP",404,463,508
"OCT",,407,461
"NOV",310,362,390
"DEC",110,405,432
Hopefully it will work:
import csv

def main():
    temp = []
    with open(r'old_csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=';')
        for row in csv_reader:
            for x in row:
                temp.append(x)
    with open(r'new_csv', mode='w') as new_file:
        writer = csv.writer(new_file, delimiter=',', lineterminator='\n')
        for col in temp:
            list_ = col.split(',')
            writer.writerow(list_)

How would I validate that a CSV file is delimited by certain character (in this case a backtick ( ` ) )

I have these huge CSV files that I need to validate; I need to make sure they are all delimited by the backtick character (`). I have a reader opening each file and printing its contents. I'm just wondering about the different ways you all would go about validating that each value is delimited by the backtick character.
for csvfile in self.fullcsvpathfiles:
    #print("file..")
    with open(csvfile, mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file, delimiter="`")
        for row in csv_reader:
            print(row)
Not sure how to go about validating that each value is separated by a backtick and throwing an error otherwise. These tables are huge (not that that's a problem for electricity ;) ).
Method 1
With the pandas library you could use the pandas.read_csv() function to read the csv file with sep='`' (which specifies the delimiter). If it parses the file into a well-shaped dataframe, then you can be almost sure the delimiter is correct.
Also, to automate the validation process, you could check whether the number of NaN values in the dataframe is within an acceptable level. Assuming your csv files do not have many blanks (so only a few NaN values are expected), you could compare the number of NaN values against a threshold you set.
import pandas as pd

nan_threshold = 20
for csvfile in self.fullcsvpathfiles:
    # if this step fails, something (probably the delimiter) must be wrong
    my_df = pd.read_csv(csvfile, sep="`")
    nans = my_df.isnull().sum().sum()   # total NaN count across the dataframe
    if nans > nan_threshold:
        print(csvfile)  # make some warning here
Refer to the pandas documentation for more information about pandas.read_csv().
Method 2
As mentioned in the comments, you could also check that the delimiter occurs the same number of times in each line of the file.
num_of_sep = -1  # initial value
# assume you are at the step of reading a file f
for line in f:
    num = line.count("`")
    if num_of_sep == -1:
        num_of_sep = num
    elif num != num_of_sep:
        print('Some warning here')
If you don't know how many columns a file should have, you could check that all the rows have the same number of columns; if you expect the header (first row) to always be correct, use it to determine the number of columns.
for csvfile in self.fullcsvpathfiles:
    with open(csvfile, mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file, delimiter="`")
        ncols = len(next(csv_reader))
        if not all(len(row) == ncols for row in csv_reader):
            #do something

for csvfile in self.fullcsvpathfiles:
    with open(csvfile, mode='r') as f:
        row = next(f)
        ncols = row.count('`')
        if not all(row.count('`') == ncols for row in f):
            #do something
If you know how many columns are in a file...
for csvfile in self.fullcsvpathfiles:
    with open(csvfile, mode='r') as csv_file:
        #figure out how many columns it is supposed to have here
        ncols = special_process()
        csv_reader = csv.DictReader(csv_file, delimiter="`")
        if not all(len(row) == ncols for row in csv_reader):
            #do something

for csvfile in self.fullcsvpathfiles:
    #figure out how many columns it is supposed to have here
    ncols = special_process()
    with open(csvfile, mode='r') as f:
        if not all(row.count('`') == ncols for row in f):
            #do something
If you know the number of expected elements, you could inspect each line
f = open(filename, 'r')
for line in f:
    line = line.split("`")
    if len(line) != numElements:
        raise Exception("Bad file")
If you know the delimiter that is being accidentally inserted, you could also try to recover instead of throwing exception. Perhaps something like:
line="`".join(line).replace(wrongDelimiter,"`").split("`")
Of course, once you're that far into reading the file, there's no great need for using an external library to read the data. Just go ahead and use it.
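To make that concrete, here is a hedged sketch of the recovery inside the read loop (filename, numElements, and wrongDelimiter are assumed to be known in advance, as in the snippets above):
# sketch: split, and if the count is off, replace the suspected wrong
# delimiter with a backtick and split again before giving up
f = open(filename, 'r')
for line in f:
    fields = line.split("`")
    if len(fields) != numElements:
        fields = "`".join(fields).replace(wrongDelimiter, "`").split("`")
    if len(fields) != numElements:
        raise Exception("Bad file")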

Read Textfile Write new CSV file

Currently I have the following code which prints my desired lines from a .KAP file.
f = open('120301.KAP')
for line in f:
    if line.startswith('PLY'):
        print line
This results in the following output
PLY/1,48.107478621032,-69.733975000000
PLY/2,48.163516399836,-70.032838888053
PLY/3,48.270000002883,-70.032838888053
PLY/4,48.270000002883,-69.712824977522
PLY/5,48.192379262383,-69.711801581207
PLY/6,48.191666671083,-69.532840015422
PLY/7,48.033358898628,-69.532840015422
PLY/8,48.033359033880,-69.733975000000
PLY/9,48.107478621032,-69.733975000000
My goal is not to have it just print these lines. I'd like to have a CSV file created, named 120301.csv, with the coordinates in their own columns (leaving the PLY/# behind). Simple enough? I've been trying different csv import functions for a while now and I can't seem to get anywhere.
Step by step, since it looks like you're struggling with some basics:
f_in = open("120301.KAP")
f_out = open("outfile.csv", "w")
for line in f_in:
    if line.startswith("PLY"):  # copy this line to the new file
        # split it into columns, ignoring the first one ("PLY/x");
        # strip the trailing newline so it doesn't end up inside col2
        _, col1, col2 = line.strip().split(",")
        # format your output
        outstring = col1 + "," + col2 + "\n"
        # and write it out
        f_out.write(outstring)
f_in.close()
f_out.close()  # really bad practice, but I'll get to that
Of course this is really not the best way to do this. There's a reason we have things like the csv module.
import csv

with open("120301.KAP") as inf, open("outfile.csv", "wb") as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf)
    for row in reader:
        # if the first cell of row starts with "PLY"...
        if row[0].startswith("PLY"):
            # write out the row, ignoring the first column
            writer.writerow(row[1:])

# opening the files using the "with" context managers means you don't have
# to remember to close them when you're done with them.

Using CSV module in Python to replace zeros

This may seem like an odd thing to do, but I essentially have a csv file that has values of '0' in quite a number of cells.
How would I, in Python, convert these numbers to read as something like 0.00 instead of just 0? I have a script in ArcMap which needs to read the values as double rather than short integer, and the '0' value really messes that up.
I am new to the CSV module, so I am not sure where to go with this. Any help with making a script that converts my values, so that when I open the new CSV it reads "0.00" rather than '0', would be greatly appreciated.
I would have liked to have some code to give you as an example, but I am at a loss.
Here's a short script that will read a CSV file, convert any numbers to floats and then write it back to the same file again.
import csv
import sys

# These indices won't be converted
dont_touch = [0]

def convert(index, value):
    if not index in dont_touch:
        try:
            return float(value)
        except ValueError:
            # Not parseable as a number
            pass
    return value

table = []
with open(sys.argv[1]) as f:
    for row in csv.reader(f, delimiter=","):
        for i in range(len(row)):
            row[i] = convert(i, row[i])
        table.append(row)

with open(sys.argv[1], "w") as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerows(table)
If you have any columns that should not be converted, specify their indices in the dont_touch array.
If you want them to have two trailing zeroes you can play around with format strings instead:
return "{:.02f}".format(float(value))
You can format the 0s and then write them out. You may want to look into the appropriate quoting for your csv (e.g. you may need quoting=csv.QUOTE_NONE in your writer object):
reader = csv.reader(fr)
writer = csv.writer(fw)
for row in reader:
    writer.writerow([f if f != '0' else '{:0.2f}'.format(0) for f in row])
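For completeness, a minimal self-contained sketch of the same idea with the file handling filled in (the file names 'input.csv' and 'output.csv' are just placeholders):
import csv

# sketch: rewrite '0' cells as '0.00', leaving everything else untouched
with open('input.csv') as fr, open('output.csv', 'w', newline='') as fw:
    reader = csv.reader(fr)
    writer = csv.writer(fw)
    for row in reader:
        writer.writerow([f if f != '0' else '{:0.2f}'.format(0) for f in row])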
