Writing to a file efficiently after prediction python - python

I'm rather new to programming and am trying to reduce the time it takes to write my data to a file; I found that the writing part is the main bottleneck.
The following is part of my code for a machine learning program:
filename="data.csv"
f=open(filename,"w")
headers="row,open\n"
f.write(headers)
for i in range (0,55970):
    score=rf.predict(edit[i].reshape(1, -1))
    score=str(score).replace('[','').replace(']','')
    f.write(str(i) +","+ score +"\n")
f.close()
I understand that I should write the data only after I have all of it, but I am not sure how to go about it, given that I only know f.write(). Do I make a function for my prediction that returns the score, then create a list to store all the scores and write it out (if that is possible)?
[Edit]
score=rf.predict(edit)
with open('data.csv', 'w',newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['row', 'open'])
    for i in range(55970):
        writer.writerow([i,str(score[i])])
^ Added based on the new suggestion. I found that I should run predict once and then write the rows, which reduced the time taken significantly!
Thank you for your help!!

The CSV module is a better tool for this. More specifically, writerows() is what you are looking for.
https://docs.python.org/3/library/csv.html#csv.csvwriter.writerows
Here is an example from the docs:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
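# The same idea applied to the code from the question
import csv
with open('data.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['row_id', 'open_flag'])
    for i in range(55970):
        score = str(rf.predict(edit[i].reshape(1, -1)))
        # str.replace returns a new string, so the result must be assigned
        score = score.replace('[', '').replace(']', '')
        writer.writerow([i, score])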
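A minimal sketch of the batch approach: predict once for all rows, then hand every row to writerows() in a single call. The scores list is a made-up stand-in for the result of rf.predict(edit), write_scores is a hypothetical helper name, and an in-memory buffer replaces the real file so the sketch runs on its own.

```python
import csv
import io


def write_scores(scores, out_file):
    """Write a header plus one (row index, score) pair per line."""
    writer = csv.writer(out_file)
    writer.writerow(["row", "open"])
    # enumerate() yields (index, score) tuples; writerows() consumes them all.
    writer.writerows(enumerate(scores))


# Stand-in for: scores = rf.predict(edit)   (assumed, not real model output)
scores = [0.25, 0.5, 0.75]

buffer = io.StringIO()
write_scores(scores, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, open('data.csv', 'w', newline='') would take the place of the buffer.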

Related

how to read from and write to a csv file using csv module

import csv
with open("C:\\Users\\ki386179\\Desktop\\output.csv","r") as f:
    reader = csv.reader(f)
    for row in reader:
        if 'india' in row:
            pass
        if 'india' not in row:
            with open("C:\\Users\\ki386179\\Desktop\\output.csv","w") as f:
                writer = csv.writer(f)
                writer.writerow('india')
I am trying to achieve something like this: check for a particular value in a particular column and, if it is not there, write it.
You can't write to a file whilst you are in the middle of reading from it. Also, surely you don't want to require every line to contain the string, like your current logic does?
I'm guessing you want to add one line at the end if none of the lines matched:
import csv
seen_string = False
with open("C:\\Users\\ki386179\\Desktop\\output.csv","r") as f:
    reader = csv.reader(f)
    for row in reader:
        if 'india' in row:
            seen_string = True
            break
if not seen_string:
    # Notice append mode; "a" not "w"
    with open("C:\\Users\\ki386179\\Desktop\\output.csv","a") as f1:
        # Notice this one is f1
        writer = csv.writer(f1)
        # Pass a list; a bare string would be split into one field per character
        writer.writerow(['india'])
Note also that in applied to a row (a list of fields) tests whether some field equals 'india' exactly, so a field such as "amerindian" is not a match. If you want substring matching instead, test each field, e.g. any('india' in field for field in row).
You can use r+ mode for both reading and writing. Although it is difficult to understand what you are trying to achieve here, example code that probably does what you want is shown below. I put the code below in a script india.py and made it executable (chmod +x india.py)
#!/usr/bin/env python
import sys
from csv import reader, writer
with open(sys.argv[1], "r+") as f:
    for row in reader(f):
        if "india" in row:
            break
    else:
        # Runs only if the loop finished without a break, i.e. no match was found.
        # Pass a list so the value is written as a single field.
        writer(f).writerow(["india"])
A test run
$ cat test_write.csv
us,canada,mexico
france,germany,norway
brazil,argentina,colombia
china,japan,korea
$ ./india.py test_write.csv
$ cat test_write.csv
us,canada,mexico
france,germany,norway
brazil,argentina,colombia
china,japan,korea
india
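The read-then-append pattern from the answers above can be sketched as a small function. The name append_if_missing, the temporary demo file, and its contents are assumptions made for illustration. The membership test on a parsed row compares whole fields, and the value is written as a one-element list so it stays a single field.

```python
import csv
import os
import tempfile


def append_if_missing(path, value):
    """Append a one-field row holding `value` unless some field already equals it."""
    with open(path, "r", newline="") as f:
        found = any(value in row for row in csv.reader(f))
    if not found:
        # The reading handle is closed before the file is reopened in append mode.
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([value])  # a list, so the value is one field
    return not found


# Demo on a throwaway file (made-up contents).
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    csv.writer(f).writerows([["us", "canada"], ["amerindian", "japan"]])

first_call = append_if_missing(demo_path, "india")   # appends the row
second_call = append_if_missing(demo_path, "india")  # finds it, does nothing
with open(demo_path, newline="") as f:
    rows = list(csv.reader(f))
os.remove(demo_path)
print(rows)
```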

How to avoid repeating header when Writing to CSV in loop?

I want to save the values of different variables in a CSV file, but another header is written every time. I don't want this. I am attaching a snapshot of my output CSV file for your understanding.
file_orimg = open('Org_image.csv', 'a', newline='')
writer_orimg = csv.writer(file_orimg, delimiter='\t',lineterminator='\n',)
writer_orimg.writerow(["Image Name", "epsilon","MSE", "SSIM", "Prediction", "Probability"])
for i in images:
    writer_orimg.writerow([i, epsilon, mse, ssim, clean_pred, clean_prob, label_idx.item()])
Try not to use writerow to write your headers. Take a look at DictWriter in Python's csv module; it handles writing the header and the rows for you.
import csv
list_of_headers = ['No.', 'Image Name', 'Epsilon']
dictionary_content = {'No.': 1, 'Image Name': 'image_123', 'Epsilon': 'what?'}
# my_csvfile is assumed to be a file object already opened for writing with newline=''
w = csv.DictWriter(my_csvfile, fieldnames=list_of_headers)
w.writeheader()
w.writerow(dictionary_content)
Hope this helps, let me know if there is any rectification to be made!
Edit: Answering 'where & when should writeheader be done'
I use the os python module to determine whether the file exists, if not I'm going to create one!
import csv
import os
if os.path.isfile(filename):
    # File already exists: append the row only
    with open(filename, 'a', newline='') as my_file:
        w = csv.DictWriter(my_file, fieldnames=list_of_headers)
        w.writerow(dictionary_content)
else:
    # New file: write the header once, then the first row
    with open(filename, 'w', newline='') as my_file:
        w = csv.DictWriter(my_file, fieldnames=list_of_headers)
        w.writeheader()
        w.writerow(dictionary_content)
Take note of the modes: 'a' appends, whereas 'w' overwrites. Appending adds new rows after the last existing row, so the data already in the file is kept.
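The two branches above can be folded into one helper that always opens in append mode and writes the header only when the file is new or empty. This is a sketch under assumptions: append_record is a hypothetical name, the field list is shortened, and the records are made-up values written to a temporary directory.

```python
import csv
import os
import tempfile

FIELDNAMES = ["Image Name", "epsilon", "MSE"]


def append_record(path, record):
    """Append one dict as a row, writing the header only for a new or empty file."""
    needs_header = not os.path.isfile(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        w = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if needs_header:
            w.writeheader()
        w.writerow(record)


# Two separate calls, as if the script had been run twice (made-up data).
demo_path = os.path.join(tempfile.mkdtemp(), "Org_image.csv")
append_record(demo_path, {"Image Name": "a.png", "epsilon": 0.1, "MSE": 2.5})
append_record(demo_path, {"Image Name": "b.png", "epsilon": 0.2, "MSE": 3.0})

with open(demo_path, newline="") as f:
    lines = f.read().splitlines()
print(lines)
```

Checking the file size as well as its existence covers the case where an empty file was created earlier.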

Number formatting a CSV

I have developed a script that produces a CSV file. On inspection of the file, some cells are not being interpreted the way I want.
E.g., in my Python list, values such as '02e4' are automatically displayed as 2.00E+04.
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
ofile = open('test.csv', 'wb')
for i in range(0, len(table)):
    for j in range(0, len(table[i])):
        ofile.write(table[i][j] + ",")
    ofile.write("\n")
This gives me:
aa02 fb4a82 0a0009
2.00E+04 452ca2 0b0004
I've tried using csv.writer instead, with writer = csv.writer(ofile, ...),
and passing options from the library (e.g. csv.QUOTE_ALL), but it's the same output as before.
Is there a way using the CSV lib to automatically format all my values as strings before it's written?
Or is this not possible?
Thanks
Try setting the quoting parameter in your csv writer to csv.QUOTE_ALL.
See the doc for more info:
import csv
# Python 3: open in text mode with newline='' (on Python 2, use 'wb' instead)
with open('myfile.csv', 'w', newline='') as csvfile:
    wtr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    wtr.writerow(...)
Although it sounds like the problem might lie with your csv viewer. Excel has a rather annoying habit of auto-formatting data like you describe.
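A quick way to check whether the file or the viewer is at fault is to look at the raw text that csv.writer produces. The sketch below writes the table from the question into an in-memory buffer (an assumption made so that it runs without creating a file) and reads it back with csv.reader.

```python
import csv
import io

table = [["aa02", "fb4a82", "0a0009"], ["02e4", "452ca2", "0b0004"]]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerows(table)
raw_text = buffer.getvalue()
print(raw_text)

# Reading the text back yields the original strings unchanged.
round_trip = list(csv.reader(io.StringIO(raw_text)))
print(round_trip)
```

The value 02e4 is stored verbatim, which suggests that any 2.00E+04 display comes from how the viewer interprets the field, not from the CSV itself.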
If you want '02e4' to show up in Excel as "02e4" then, annoyingly, you have to write a CSV with triple double quotes: """02e4""". I don't know of a way to do this with the csv writer because it limits the quote character to a single character. However, you can do something similar to your original attempt:
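table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
# Python 3: text mode (on Python 2, use 'wb' instead)
ofile = open('test.csv', 'w')
for i in range(0, len(table)):
    for j in range(len(table[i])):
        # Wrap every value in three double quotes, followed by a comma
        ofile.write('"""%s""",'%table[i][j])
    ofile.write("\n")
ofile.close()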
If opened in a text editor your csv file will read:
"""aa02""","""fb4a82""","""0a0009""",
"""02e4""","""452ca2""","""0b0004""",
This produces the following result in Excel: [screenshot not preserved]
If you wanted to use any single character quotation you could use the csv module like so:
import csv
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
# Python 3: text mode with newline='' (on Python 2, use 'wb' instead)
ofile = open('test.csv', 'w', newline='')
writer = csv.writer(ofile, delimiter=',', quotechar='|',quoting=csv.QUOTE_ALL)
for i in range(len(table)):
    writer.writerow(table[i])
ofile.close()
The output in the text editor will be:
|aa02|,|fb4a82|,|0a0009|
|02e4|,|452ca2|,|0b0004|
and Excel will show: [screenshot not preserved]

How to write data onto csv file over a loop in python

I want to write data onto several rows in a csv file. With this code I am only getting a single line written onto file. Need help.
for i in range(1,10):
    with open("output.csv",'wb') as f:
        writer = csv.writer(f, dialect='excel')
        writer.writerow([value1, value2, value3])
You only get a single line because the file is reopened in write mode, and therefore truncated, on every iteration of your loop. You should move the with open() statement outside of your loop block.
Try opening the file only once and then doing the loop:
with open("output.csv",'wb') as f:
    writer = csv.writer(f, dialect='excel')
    for item in list_A:
        writer.writerow([value1, value2, value3])
You would need to use the w option in open as follows:
import csv
list_A = [0,0,0,0,0,0,0,0,0,0,0]
# newline='' stops the csv module's line endings from being translated again
with open("output.csv",'w', newline='') as f:
    writer = csv.writer(f)
    for item in list_A:
        writer.writerow([1,0,0,0])

Accessing elements of [[[i j k l]]]

I've recently started working with Python and image processing. The HoughLinesP function from CV outputted this "[[[465 391 521 391]]]" and I need to export the values to an excel sheet afterwards. So, I need to access each of those elements individually.
How would I go about accessing those elements and storing them for later use?
Thank-you for your help!
Here is the corresponding documentation. The result of calling the function is an array, with the points you need. Here is how you could access them and store them to a CSV:
lines = cv2.HoughLinesP(...)
with open('tmp.csv', 'w') as f:
    for l in lines:
        # Each entry has shape (1, 4); l[0] holds x1, y1, x2, y2
        f.write(','.join(str(x) for x in l[0]) + "\n")
The file tmp.csv should contain data that can be opened in Excel.
The easier way, with csv:
import csv
# Assuming lines is already defined and in scope.
with open('tmp.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    # One CSV row per detected segment: x1, y1, x2, y2
    writer.writerows(l[0] for l in lines)
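To make the nesting concrete, here is a sketch that uses a plain nested list as a stand-in for the HoughLinesP result, so it runs without OpenCV. The first segment is the one from the question; the second segment and the header names are made up for illustration. The same flattening loop should also work on the real array, since it only iterates over the outer two levels.

```python
import csv
import io

# Stand-in for the HoughLinesP result: one inner list per detected segment.
lines = [[[465, 391, 521, 391]], [[10, 20, 30, 40]]]

# Drop the middle level so each segment becomes a flat (x1, y1, x2, y2) row.
segments = [tuple(segment) for line in lines for segment in line]

# Individual values can be unpacked directly.
x1, y1, x2, y2 = segments[0]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["x1", "y1", "x2", "y2"])
writer.writerows(segments)
csv_text = buffer.getvalue()
print(csv_text)
```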
