File writing and reading - python

try:
    studfile = open("students.csv","r")
except IOError:
    studfile = open("students.csv","w")
#later in the code
studfile.write(students)
The purpose of this try/except block was to root out the IOError, but I ended up getting another error message: "expected a character buffer object". Any help on how to fix it?

Assuming students is some form of data you wish to save as a CSV file, it's probably best to use Python's built-in csv module. For example:
import csv
with open("students.csv","wb") as studfile: # using with is good practice
    ...
    csv_writer = csv.writer(studfile)
    csv_writer.writerow(students) # assuming students is a list of data
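For readers on Python 3, the same idea looks slightly different: the csv module expects a text-mode file opened with newline='' rather than "wb". A minimal sketch, assuming students is a list of rows; the sample data and the helper name write_students are made up for illustration:

```python
import csv
import os
import tempfile

def write_students(path, rows):
    """Write each row (a list of values) as one CSV line."""
    # newline='' lets the csv module control line endings itself
    with open(path, "w", newline="") as studfile:
        csv.writer(studfile).writerows(rows)

# Hypothetical sample data; a real program would build this elsewhere.
students = [["name", "grade"], ["Ann", "A"], ["Bob", "B"]]
path = os.path.join(tempfile.mkdtemp(), "students.csv")
write_students(path, students)

# Read the file back to check what was written
with open(path, newline="") as studfile:
    rows_read = list(csv.reader(studfile))
```

Note that csv.reader always returns strings, so numeric values would need converting after reading.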

This is the TypeError you are getting. The students value must be a string when you write it to the file. Using str(students) should solve your problem.
EDIT:
str can convert any object to a string. Considering the comments below:
You didn't mention the type of students. If it is a list of strings (my assumption), then you can't write it like this: studfile.write(students).
You should do something similar to this:
for entry in students:
    studfile.write(entry) # decide whether to add newline character or not

Related

Python's json.load weird behavior

I'm trying to extract a specific value from log files in a directory.
Now the log files contains JSON data and i want to extract the value for the id field.
JSON Data look something like this
{
id: "123",
name: "foo"
description: "bar baz"
}
Code Looks like this
def test_load_json_directly(self):
    with open('source_data/testing123.json') as log_file:
        data = json.load(log_file)
    print data
def test_load_json_from_iteration(self, dir_path, file_ext):
    path_name = os.path.join(dir_path, '*.' + file_ext)
    files = glob.glob(path_name)
    for filename in files:
        with open(filename) as log_file:
            data = json.load(log_file)
        print data
When I call the function test_load_json_directly, the JSON string gets loaded correctly. No problem there. This is just to check the correct behavior of the json.load function.
The issue is that when I call the function test_load_json_from_iteration, the JSON string is not recognized and I get an error:
ValueError: No JSON object could be decoded
What am I doing wrong here?
Your json is invalid. The property names and the values must be wrapped with quotes (except if they are numbers). You're also missing the commas.
The most probable reason for this error is an error in a json file. Since json module doesn't show detailed errors, you can use the simplejson module to see what's actually happening.
Change your code to:
import simplejson
.
.
.
data = simplejson.load(log_file)
And look at the error message. It will show you the line and the column where it fails.
Ex:
simplejson.errors.JSONDecodeError: Expecting value: line 5 column 17 (char 84)
Hope it helps :) Feel free to ask if you have any doubts.
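On Python 3.5 and later, the built-in json module reports the position as well, through json.JSONDecodeError. A minimal sketch of using it; the helper name describe_json_error is made up for illustration, and the bad sample mirrors the unquoted keys from the question:

```python
import json

def describe_json_error(text):
    """Return None if text is valid JSON, else a short description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg are attributes of JSONDecodeError
        return "line %d column %d: %s" % (err.lineno, err.colno, err.msg)
    return None

# Unquoted property names, as in the question: not valid JSON
bad = '{\n  id: "123",\n  name: "foo"\n}'
# The same data with quoted names and commas between the members
good = '{"id": "123", "name": "foo", "description": "bar baz"}'

bad_report = describe_json_error(bad)
good_report = describe_json_error(good)
```

Only the first error is reported, so a file with several problems has to be checked again after each fix.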

How to not have set written on my file- python 2

So I basically just want to have a list of all the pixel colour values that overlap written in a text file so I can then access them later.
The only problem is that the text file is having (set([ or whatever written with it.
Here's my code
import cv2
import numpy as np
import time
om=cv2.imread('spectrum1.png')
om=om.reshape(1,-1,3)
om_list=om.tolist()
om_tuple={tuple(item) for item in om_list[0]}
om_set=set(om_tuple)
im=cv2.imread('RGB.png')
im=cv2.resize(im,(100,100))
im= im.reshape(1,-1,3)
im_list=im.tolist()
im_tuple={tuple(item) for item in im_list[0]}
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Also, if I run this program again but with a different picture for comparison, will it append the data or overwrite it? It's kinda hard to tell when just looking at numbers.
If you replace these lines:
im=cv2.imread('RGB.png')
File= open('Weedlist', 'w')
File.write(str(ColourCount))
with:
import sys
im=cv2.imread(sys.argv[1])
open(sys.argv[1]+'Weedlist', 'w').write(str(list(ColourCount)))
you will get a new file for each input file and also you don't have to overwrite the RGB.png every time you want to try something new.
Files opened with mode 'w' will be overwritten. You can use 'a' to append.
You opened the file with the 'w' mode, write mode, which will truncate (empty) the file when you open it. Use 'a' append mode if you want data to be added to the end each time.
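The difference between the two modes is easy to check with a throwaway file. A small sketch, using a temporary directory so nothing in the working directory is touched (the file contents are made up):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Weedlist")

# 'w' truncates on open: only the last write survives
for text in ("first run\n", "second run\n"):
    with open(path, "w") as f:
        f.write(text)
with open(path) as f:
    after_w = f.read()

# 'a' appends: each write is added to the end of the file
for text in ("third run\n", "fourth run\n"):
    with open(path, "a") as f:
        f.write(text)
with open(path) as f:
    after_a = f.read()
```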
You are writing the str() conversion of a set object to your file:
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Don't use str to convert the whole object; format your data to a string you find easy to read back again. You probably want to add a newline too if you want each new entry to be added on a new line. Perhaps you want to sort the data too, since a set lists items in an order determined by implementation details.
If comma-separated works for you, use str.join(); your set contains tuples of integer numbers, and it sounds as if you are fine with the repr() output per tuple, so we can re-use that:
with open('Weedlist', 'a') as outputfile:
    output = ', '.join([str(tup) for tup in sorted(ColourCount)])
    outputfile.write(output + '\n')
I used with there to ensure that the file object is automatically closed again after you are done writing; see Understanding Python's with statement for further information on what this means.
Note that if you plan to read this data again, the above is not going to be all that efficient to parse again. You should pick a machine-readable format. If you need to communicate with an existing program, you'll need to find out what formats that program accepts.
If you are programming that other program as well, pick a format that the other programming language supports. JSON is widely supported, for example: use the json module and convert your set to a list first; json.dump(sorted(ColourCount), fileobj) followed by fileobj.write('\n') would produce newline-separated JSON objects.
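A minimal sketch of that newline-separated JSON approach, including reading the data back. The helper names append_colours and load_colours and the sample colour values are made up for illustration; JSON has no tuple or set type, so the reader converts the lists back:

```python
import json
import os
import tempfile

def append_colours(path, colours):
    """Append one set of colour tuples to the file as a single JSON line."""
    with open(path, "a") as fileobj:
        json.dump(sorted(colours), fileobj)
        fileobj.write("\n")

def load_colours(path):
    """Read every line back as a set of tuples, one set per line."""
    with open(path) as fileobj:
        return [
            {tuple(item) for item in json.loads(line)}
            for line in fileobj
            if line.strip()
        ]

path = os.path.join(tempfile.mkdtemp(), "Weedlist.jsonl")
first = {(0, 0, 255), (12, 200, 7)}
second = {(255, 255, 255)}
append_colours(path, first)
append_colours(path, second)
loaded = load_colours(path)
```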
If that other program is coded in Python, consider using the pickle module, which writes Python objects to a file efficiently in a format the same module can load again:
with open('Weedlist', 'ab') as picklefile:
    pickle.dump(ColourCount, picklefile)
and reading is as easy as:
sets = []
with open('Weedlist', 'rb') as picklefile:
    while True:
        try:
            sets.append(pickle.load(picklefile))
        except EOFError:
            break
See Saving and loading multiple objects in pickle file? as to why I use a while True loop there to load multiple entries.
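Put together as a self-contained round trip, the append-and-reload pattern might look like this. It is only a sketch: the helper names and the sample sets are made up, and a temporary file stands in for Weedlist:

```python
import os
import pickle
import tempfile

def append_pickle(path, obj):
    """Append one pickled object to the end of the file."""
    with open(path, "ab") as picklefile:
        pickle.dump(obj, picklefile)

def load_all_pickles(path):
    """Load objects one after another until the file is exhausted."""
    objects = []
    with open(path, "rb") as picklefile:
        while True:
            try:
                objects.append(pickle.load(picklefile))
            except EOFError:
                # raised when there is nothing left to read
                break
    return objects

path = os.path.join(tempfile.mkdtemp(), "Weedlist")
append_pickle(path, {(1, 2, 3)})
append_pickle(path, {(4, 5, 6), (7, 8, 9)})
sets = load_all_pickles(path)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.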
How would you like the data to be written? Replace the final line by
File.write(str(list(ColourCount)))
Maybe you like that more.
If you run that program, it will overwrite the previous content of the file. If you prefer to append the data, open the file with:
File= open('Weedlist', 'a')

Validate and format JSON files

I have around 2000 JSON files which I'm trying to run through a Python program. A problem occurs when a JSON file is not in the correct format. (Error: ValueError: No JSON object could be decoded) In turn, I can't read it into my program.
I am currently doing something like the below:
for files in folder:
    with open(files) as f:
        data = json.load(f); # It causes an error at this part
I know there are offline methods for validating and formatting JSON files, but is there a programmatic way to check and format these files? If not, is there a free/cheap alternative for fixing all of these files offline, i.e. I just run the program on the folder containing all the JSON files and it formats them as required?
SOLVED using #reece's comment:
invalid_json_files = []
read_json_files = []
def parse():
    for files in os.listdir(os.getcwd()):
        with open(files) as json_file:
            try:
                simplejson.load(json_file)
                read_json_files.append(files)
            except ValueError, e:
                print ("JSON object issue: %s") % e
                invalid_json_files.append(files)
    print invalid_json_files, len(read_json_files)
Turns out that I was saving a file which is not in JSON format in my working directory which was the same place I was reading data from. Thanks for the helpful suggestions.
The built-in JSON module can be used as a validator:
import json
def parse(text):
    try:
        return json.loads(text)
    except ValueError as e:
        print('invalid json: %s' % e)
        return None # or: raise
You can make it work with files by using:
with open(filename) as f:
    return json.load(f)
instead of json.loads and you can include the filename as well in the error message.
On Python 3.3.5, for {test: "foo"}, I get:
invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
and on 2.7.6:
invalid json: Expecting property name: line 1 column 2 (char 1)
This is because the correct json is {"test": "foo"}.
When handling the invalid files, it is best to not process them any further. You can build a skipped.txt file listing the files with the error, so they can be checked and fixed by hand.
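One possible way to build that skipped.txt list is sketched below. The function name split_valid_invalid and the demo files are made up for illustration; the sketch only looks at files ending in .json and records the parser's message next to each skipped name:

```python
import json
import os
import tempfile

def split_valid_invalid(folder, skipped_name="skipped.txt"):
    """Return (valid, skipped) and write the skipped entries to skipped_name."""
    valid, skipped = [], []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not (os.path.isfile(path) and name.endswith(".json")):
            continue
        try:
            with open(path) as f:
                json.load(f)
            valid.append(name)
        except ValueError as e:
            # keep the reason so the file can be fixed by hand later
            skipped.append("%s: %s" % (name, e))
    with open(os.path.join(folder, skipped_name), "w") as out:
        out.write("\n".join(skipped) + ("\n" if skipped else ""))
    return valid, skipped

# Demo with one valid and one invalid file in a temporary folder
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "good.json"), "w") as f:
    f.write('{"test": "foo"}')
with open(os.path.join(folder, "bad.json"), "w") as f:
    f.write('{test: "foo"}')
valid, skipped = split_valid_invalid(folder)
```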
If possible, you should check the site/program that generated the invalid json files, fix that and then re-generate the json file. Otherwise, you are going to keep having new files that are invalid JSON.
Failing that, you will need to write a custom json parser that fixes common errors. With that, you should be putting the original under source control (or archived), so you can see and check the differences that the automated tool fixes (as a sanity check). Ambiguous cases should be fixed by hand.
Yes, there are ways to validate that a JSON file is valid. One way is to use a JSON parsing library that will throw exceptions if the input you provide is not well-formatted.
try:
    load_json_file(filename)
except InvalidDataException: # or something
    # oops guess it's not valid
    pass
Of course, if you want to fix it, you naturally cannot use a JSON loader since, well, it's not valid JSON in the first place. Unless the library you're using will automatically fix things for you, in which case you probably wouldn't even have this question.
One way is to load the file manually and tokenize it and attempt to detect errors and try to fix them as you go, but I'm sure there are cases where the error is just not possible to fix automatically and would be better off throwing an error and asking the user to fix their files.
I have not written a JSON fixer myself so I can't provide any details on how you might go about actually fixing errors.
However, I am not sure whether it would be a good idea to fix all errors, since then you'd have to assume your fixes are what the user actually wants. If it's a missing comma or they have an extra trailing comma, then that might be OK, but there may be cases where it is ambiguous what the user wants.
Here is a full python3 example for the next novice python programmer that stumbles upon this answer. I was exporting 16000 records as json files. I had to restart the process several times so I needed to verify that all of the json files were indeed valid before I started importing into a new system.
I am no python programmer so when I tried the answers above as written, nothing happened. Seems like a few lines of code were missing. The example below handles files in the current folder or a specific folder.
verify.py
import json
import os
import sys
from os.path import isfile,join
# check if a folder name was specified
if len(sys.argv) > 1:
    folder = sys.argv[1]
else:
    folder = os.getcwd()
# array to hold invalid and valid files
invalid_json_files = []
read_json_files = []
def parse():
    # loop through the folder
    for files in os.listdir(folder):
        # check if the combined path and filename is a file
        if isfile(join(folder,files)):
            # open the file
            with open(join(folder,files)) as json_file:
                # try reading the json file using the json interpreter
                try:
                    json.load(json_file)
                    read_json_files.append(files)
                except ValueError as e:
                    # if the file is not valid, print the error
                    # and add the file to the list of invalid files
                    print("JSON object issue: %s" % e)
                    invalid_json_files.append(files)
    print(invalid_json_files)
    print(len(read_json_files))
parse()
Example:
python3 verify.py
or
python3 verify.py somefolder
tested with python 3.7.3
It was not clear to me how to provide the path to the folder, so I'd like to add an answer that covers this option.
import glob
import pandas as pd
path = r'C:\Users\altz7\Desktop\your_folder_name' # use your path
all_files = glob.glob(path + "/*.json")
data_list = []
invalid_json_files = []
for filename in all_files:
    try:
        df = pd.read_json(filename)
        data_list.append(df)
    except ValueError:
        invalid_json_files.append(filename)
print("Files in correct format: {}".format(len(data_list)))
print("Not readable files: {}".format(len(invalid_json_files)))
# df = pd.concat(data_list, axis=0, ignore_index=True) # will create a pandas dataframe from the readable files, if you like

asking a person for a file to save in

What I'm trying to do is to ask a user for a name of a file to make and then save some stuff in this file.
My portion of the program looks like this:
if saving == 1:
    ask=raw_input("Type the name file: ")
    fileout=open(ask.csv,"w")
    fileout.write(output)
I want the format to be .csv. I tried different options but can't seem to get it to work.
The issue here is you need to pass open() a string. ask is a variable that contains a string, but we also want to append the other string ".csv" to it to make it a filename. In python + is the concatenation operator for strings, so ask+".csv" means the contents of ask, followed by .csv. What you currently have is looking for the csv attribute of the ask variable, which will throw an error.
with open(ask+".csv", "w") as file:
    file.write(output)
You might also want to do a check first if the user has already typed the extension:
ask = ask if ask.endswith(".csv") else ask+".csv"
with open(ask, "w") as file:
    file.write(output)
Note my use of the with statement when opening files. It's good practice as it's more readable and ensures the file is closed properly, even on exceptions.
I am also using the python ternary operator here to do a simple variable assignment based on a condition (setting ask to itself if it already ends in ".csv", otherwise concatenating it).
Also, this is presuming your output is already suitable for a CSV file, the extension alone won't make it CSV. When dealing with CSV data in general, you probably want to check out the csv module.
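Combining the extension check with the csv module could look roughly like this in Python 3. It is a sketch under assumptions: the helper names csv_filename and save_rows and the sample rows are made up, and the name would normally come from input() (raw_input in Python 2) rather than being hard-coded:

```python
import csv
import os
import tempfile

def csv_filename(name):
    """Add the .csv extension unless the user already typed it."""
    return name if name.endswith(".csv") else name + ".csv"

def save_rows(name, rows, folder="."):
    """Write rows to folder/name(.csv) and return the full path."""
    path = os.path.join(folder, csv_filename(name))
    with open(path, "w", newline="") as fileout:
        csv.writer(fileout).writerows(rows)
    return path

# "results" stands in for whatever the user typed at the prompt
folder = tempfile.mkdtemp()
saved = save_rows("results", [["x", "y"], [1, 2]], folder)

# Read the file back; csv.reader returns every value as a string
with open(saved, newline="") as f:
    rows_back = list(csv.reader(f))
```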
You need to use ask+'.csv' to concatenate the required extension on to the end of the user input.
However, simply naming the file with a .csv extension is not enough to make it a comma-separated file. You need to format the output. Use csv.writer to do that. The Python documentation has some simple examples of how to do this.
I advise you not to attempt to generate the formatted comma-separated output yourself. That's a surprisingly hard task and utterly pointless in the presence of the built-in functionality.
Your variable ask is gonna be of type string after the raw_input.
So, if you want to append the extension .csv to it, you should do:
fileout = open(ask + ".csv", "w")
That should work.

Upload and parse csv file with google app engine

I'm wondering if anyone with a better understanding of python and gae can help me with this. I am uploading a csv file from a form to the gae datastore.
class CSVImport(webapp.RequestHandler):
    def post(self):
        csv_file = self.request.get('csv_import')
        fileReader = csv.reader(csv_file)
        for row in fileReader:
            self.response.out.write(row)
I'm running into the same problem that someone else mentions here - http://groups.google.com/group/google-appengine/browse_thread/thread/bb2d0b1a80ca7ac2/861c8241308b9717
That is, the csv.reader is iterating over each character and not the line. A google engineer left this explanation:
The call self.request.get('csv') returns a String. When you iterate over a
string, you iterate over the characters, not the lines. You can see the
difference here:
class ProcessUpload(webapp.RequestHandler):
    def post(self):
        self.response.out.write(self.request.get('csv'))
        file = open(os.path.join(os.path.dirname(__file__), 'sample.csv'))
        self.response.out.write(file)
        # Iterating over a file
        fileReader = csv.reader(file)
        for row in fileReader:
            self.response.out.write(row)
        # Iterating over a string
        fileReader = csv.reader(self.request.get('csv'))
        for row in fileReader:
            self.response.out.write(row)
I really don't follow the explanation, and was unsuccessful implementing it. Can anyone provide a clearer explanation of this and a proposed fix?
Thanks,
August
Short answer, try this:
fileReader = csv.reader(csv_file.split("\n"))
Long answer, consider the following:
for thing in stuff:
    print thing.strip().split(",")
If stuff is a file pointer, each thing is a line. If stuff is a list, each thing is an item. If stuff is a string, each thing is a character.
Iterating over the object returned by csv.reader is going to give you behavior similar to iterating over the object passed in, only with each item CSV-parsed. If you iterate over a string, you'll get a CSV-parsed version of each character.
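The difference is easy to see outside App Engine. A small sketch in Python 3 with made-up sample text, passing the same data once as a string and once as a list of lines:

```python
import csv

text = "a,b\nc,d"

# Passing the string itself: csv.reader pulls one character at a time,
# so every character is parsed as if it were a whole line
rows_from_string = list(csv.reader(text))

# Passing a list of lines: each item is parsed as one CSV record
rows_from_lines = list(csv.reader(text.splitlines()))
```

Here rows_from_lines holds one list per line of the input, while rows_from_string starts with ['a'] and continues character by character.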
I can't think of a clearer explanation than what the Google engineer you mentioned said. So let's break it down a bit.
The Python csv module operates on file-like objects, that is, a file or something that behaves like a Python file. Hence, csv.reader() expects to get a file object as its only required parameter.
The webapp.RequestHandler request object provides access to the HTTP parameters that are posted in the form. In HTTP, parameters are posted as key-value pairs, e.g., csv=record_one,record_two. When you invoke self.request.get('csv') this returns the value associated with the key csv as a Python string. A Python string is not a file-like object. Apparently, the csv module is falling-back when it does not understand the object and simply iterating it (in Python, strings can be iterated over by character, e.g., for c in 'Test String': print c will print each character in the string on a separate line).
Fortunately, Python provides a StringIO class that allows a string to be treated as a file-like object. So (assuming GAE supports StringIO, and there's no reason it shouldn't) you should be able to do this:
class ProcessUpload(webapp.RequestHandler):
    def post(self):
        self.response.out.write(self.request.get('csv'))
        # Iterating over a string as a file
        stringReader = csv.reader(StringIO.StringIO(self.request.get('csv')))
        for row in stringReader:
            self.response.out.write(row)
Which will work as you expect it to.
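In Python 3 the equivalent class lives in the io module. A standalone sketch of the same idea, with a made-up string standing in for the posted form value; it also shows that a quoted field containing a line break is kept together when the reader gets a file-like object:

```python
import csv
import io

# Stand-in for the posted value; the second record has a quoted
# field with an embedded newline
uploaded = 'name,note\nx,"line1\nline2"\n'

# io.StringIO wraps the string in a file-like object that yields lines
rows = list(csv.reader(io.StringIO(uploaded)))
```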
Edit I'm assuming that you are using something like a <textarea/> to collect the csv file. If you're uploading an attachment, different handling may be necessary (I'm not all that familiar with Python GAE or how it handles attachments).
You need to call csv_file = self.request.POST.get("csv_import") and not csv_file = self.request.get("csv_import").
The second one just gives you a string as you mentioned in your original post. But accessing via self.request.POST.get gives you a cgi.FieldStorage object.
This means that you can call csv_file.filename to get the object’s filename and csv_file.type to get the mimetype.
Furthermore, if you access csv_file.file, it’s a StringO object (a read-only object from the StringIO module), not just a string. As ig0774 mentioned in his answer, the StringIO module allows you to treat a string as a file.
Therefore, your code can simply be:
class CSVImport(webapp.RequestHandler):
    def post(self):
        csv_file = self.request.POST.get('csv_import')
        fileReader = csv.reader(csv_file.file)
        for row in fileReader:
            # row is now a list containing all the column data in that row
            self.response.out.write(row)
