How do I add a separate function for average calculation? - python

I am stuck on this problem. Code I have so far works but my Professor wants to see some changes. I need to add error handing and I need a separate function for calculating average which I will call in main. Here is the what I have so far...
import os
def process_file(filename):
f = open(filename,'r')
lines = f.readlines()[1:]
f.close()
scores = []
for line in lines:
parsed = line.split(",")
count = int(parsed[1])
scores.append(count)
calculate_result(scores)
def calculate_result(scores):
print("High: ", max(scores))
print("Low: ", min(scores))
print("Average: ", sum(scores)/len(scores))
def main():
filename = "scores.text"
if os.path.isfile(filename):
process_file(filename)
else:
print ("File does not exist")
return 0
main()

I guess there are 2 parts:
I need to add error handling
and
I need a separate function for calculating average which I will call in main
The second part I don't think you need help with. But error handling is kind of an art, so I can see where you might be stuck on that. Here are some suggestions to help get started.
The most common type of error handling involves dealing with input. Thinking more broadly, we could expand that to anything that crosses the boundary of the programs memory space. This includes not just user input, but also output; filesystem interaction; using network interfaces (or any communication device or hardware interface); starting/stopping or otherwise interacting with other programs; calling a library that does any of these things on our behalf; and many more....
So what parts of your program are interacting with "the outside" ? I can see a few:
in main() the program is making an assumption about the existence of a file. You are already checking to make sure this file exists, and returning 0 if it doesn't (you might want to change that to a non-zero value, since 0 is usually used to signal that no error occurred)
process_file() does this: f = open(filename,'r') but are you sure that will work? Are there conditions where this could fail?
What if the user that is running the program doesn't have permissions to read that file?
What if the file was deleted or changed between the time it was checked in main and the subsequent open call in process_file? This is a TOCTOU race condition, and it is something that every software developer needs to watch out for.
Probably the most obvious source of potential errors for this program is the content of the input file:
We're assuming the input is comma-separated. What if the user uses tabs or some other character?
While processing the lines, you've got: count = int(parsed[1]), but how do you know that parsed[1] can be cast to an int?
What will happen if the file exists, but is empty (hint: len(scores)==0)? Always look at these edge cases.
Finally, it looks like you are using if-then statements for error checking. That is fine, but another powerful tool for dealing with errors are try-except statements. They are not mutually exclusive: sometimes it's easier to use an if statement, and sometimes catching an exception with try-except is better. Some of the errors you'll need to deal with are easier to handle using one approach over the other.

Related

Segmentation fault when initializing array

I am getting a segmentation fault when initializing an array.
I have a callback function from when an RFID tag gets read
IDS = []
def readTag(e):
epc = str(e.epc, 'utf-8')
if not epc in IDS:
now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
for x in vals:
IDS.append([vals[0], vals[1], vals[2]])
for x in IDS:
print(x[0])
r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
r.set_region("NA")
r.start_reading(readTag, on_time=1500)
input("press any key to stop reading: ")
r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know because when I replace it with a print call instead the program will run just fine. I've tried using different types for the array objects (integers), creating an array of the same objects outside of the append function, etc. For some reason just creating an array inside the "readTag" function causes the segmentation fault like row = [1,2,3]
Does anyone know what causes this error and how I can fix it? Also just to be a little more specific, the readTag function will work fine for the first two (only ever two) calls, but then it crashes and the Reader object that has the start_reading() function is from the mercury-api
This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory address, so when it invokes your callback function readTag(e) a segfault occurs. I don't think that the behavior that you want is supported by that library
To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general pure-Python doesn't seg-fault. Or at least, it shouldn't seg-fault unless there's a bug in the interpreter, or some extension that you're using. That's not to say pure-Python won't break, it's just that a genuine seg-fault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reader() method you're using is "asynchronous". Meaning it invokes a new thread or process and returns immediately and then the background thread continues to call your callback each time something is scanned.
I don't really know enough about the nitty gritty of CPython to say exactly what going on, but you've declared IDS = [] as a global variable and it seems like the background thread is running the callback with a different context to the main program. So when it attempts to access IDS it's reading memory it doesn't own, hence the seg-fault.
Because of how restrictive the callback is and the apparent lack of a buffer, this might be an oversight on the behalf of the developer. If you really need asynchronous reads it's worth sending them an issue report.
Otherwise, considering you're just waiting for input you probably don't need the asynchronous reads, and you could use the synchronous Reader.read() method inside your own busy loop instead with something like:
try:
while True:
readTags(r.read(timeout=10))
except KeyboardInterrupt: ## break loop on SIGINT (Ctrl-C)
pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly, and if you're writing more than just a quick script you probably want to use threads to interrupt the loop properly as SIGINT is pretty hacky.

Writing to file without losing data after crash

I have a file that I open before a loop starts, and I'm writing to that file almost at each iteration of the loop. Then I close the file once the loop has finished. So e.g. something like:
testfile = open('datagathered','w')
for i in range(n):
...
testfile.write(line)
testfile.close()
The issue I'm having is that, in case the program crashes or I want to crash it, what has already been written to testfile will be deleted, and the text file datagathered will be empty. I understand that this happens because I'm closing the file only after the loop, but if I close and open the file after each write (i.e. in the loop) doesn't that lead to an incredible slow-down?
If yes, what alternatives do I have for doing the writing, and making sure that in case of a crash the already-written-lines won't get lost, in an efficient way?
The linked posts do bring up good suggestions that arguably answer this question, but they don't cover risks and efficiency differences involved. More precisely: Are there any risks involved with playing with the buffersize? e.g. testfile = open('datagathered','w',0) Finally is using with open... still a viable alternative if there are multiple files to write to?
Small note: This is asked in the context of a very long run, where the file is being written to for 2-3 days. Thus having a speedy and safe way of doing the writing is definitely valuable here.
From the question I understood that you are talking about exceptions may occur at runtime and SIGINT.
You may use 'try-except-finally' block to achieve your goal. It enables you to catch both exceptions and SIGINT signal. Since the finally block will be executed either exception is caught or everything goes well, closing file there is the best choice. Following sample code would solve your problem I guess.
testfile = open('datagathered','w')
try:
for i in range(n):
...
testfile.write(line)
except KeyboardInterrupt:
print "Interrupt from keyboard"
except:
print "Other exception"
finally:
testfile.close()
Use a context:
with open('datagathered','w') as f:
f.write(data)

Why does my generator hang instead of throwing exception?

I have a generator that returns lines from a number of files, through a filter. It looks like this:
def line_generator(self):
# Find the relevant files
files = self.get_files()
# Read lines
input_object = fileinput.input(files)
for line in input_object:
# Apply filter and yield if it is not *None*
filtered = self.__line_filter(input_object.filename(), line)
if filtered is not None:
yield filtered
input_object.close()
The method self.get_files() returns a list of file paths or an empty list.
I have tried to do s = fileinput.input([]), and then call s.next(). This is where it hangs, and I cannot understand why. I'm trying to be pythonic, and not handling all errors myself, but I guess this is one where there is no way around. Or is there?
Unfortunately I have no means of testing this on Linux right now, but could someone please try the following on Linux, and comment what they get?
import fileinput
s = fileinput.input([])
s.next()
I'm on Windows with Python 2.7.5 (64 bit).
All in all, I'd really like to know:
Is this a bug in Python, or me that is doing something wrong?
Shouldn't .next() always return something, or raise a StopIteration?
fileinput defaults to stdin if the list is empty, so it's just waiting for you to type something.
An obvious fix would be to get rid of fileinput (is not terribly useful anyway) and to be explicit, as python zen suggests:
for path in self.get_files():
with open(path) as fp:
for line in fp:
etc
As others already have answered, I try to answer one specific sub-item:
Shouldn't .next() always return something, or raise a StopIteration?
Yes, but it is not specified when this return is supposed to happen: within some milliseconds, seconds or even longer.
If you have a blocking iterator, you can define some wrapper around it so that it runs inside a different thread, filling a list or something, and the originating thread gets an interface to determine if there are data, if there are currently no data or if the source is exhausted.
I can elaborate on this even more if needed.

File open and close in python

I have read that when file is opened using the below format
with open(filename) as f:
#My Code
f.close()
explicit closing of file is not required . Can someone explain why is it so ? Also if someone does explicitly close the file, will it have any undesirable effect ?
The mile-high overview is this: When you leave the nested block, Python automatically calls f.close() for you.
It doesn't matter whether you leave by just falling off the bottom, or calling break/continue/return to jump out of it, or raise an exception; no matter how you leave that block. It always knows you're leaving, so it always closes the file.*
One level down, you can think of it as mapping to the try:/finally: statement:
f = open(filename)
try:
# My Code
finally:
f.close()
One level down: How does it know to call close instead of something different?
Well, it doesn't really. It actually calls special methods __enter__ and __exit__:
f = open()
f.__enter__()
try:
# My Code
finally:
f.__exit__()
And the object returned by open (a file in Python 2, one of the wrappers in io in Python 3) has something like this in it:
def __exit__(self):
self.close()
It's actually a bit more complicated than that last version, which makes it easier to generate better error messages, and lets Python avoid "entering" a block that it doesn't know how to "exit".
To understand all the details, read PEP 343.
Also if someone does explicitly close the file, will it have any undesirable effect ?
In general, this is a bad thing to do.
However, file objects go out of their way to make it safe. It's an error to do anything to a closed file—except to close it again.
* Unless you leave by, say, pulling the power cord on the server in the middle of it executing your script. In that case, obviously, it never gets to run any code, much less the close. But an explicit close would hardly help you there.
Closing is not required because the with statement automatically takes care of that.
Within the with statement the __enter__ method on open(...) is called and as soon as you go out of that block the __exit__ method is called.
So closing it manually is just futile since the __exit__ method will take care of that automatically.
As for the f.close() after, it's not wrong but useless. It's already closed so it won't do anything.
Also see this blogpost for more info about the with statement: http://effbot.org/zone/python-with-statement.htm

multiprocessing when getting URLs python 3.2

I've made a script to get inventory data from the Steam API and I'm a bit unsatisfied with the speed. So I read a bit about multiprocessing in python and simply cannot wrap my head around it. The program works as such: it gets the SteamID from a list, gets the inventory and then appends the SteamID and the inventory in a dictionary with the ID as the key and inventory contents as the value.
I've also understood that there are some issues involved with using a counter when multiprocessing, which is a small problem as I'd like to be able to resume the program from the last fetched inventory rather than from the beginning again.
Anyway, what I'm asking for is really a concrete example of how to do multiprocessing when opening the URL that contains the inventory data so that the program can fetch more than one inventory at a time rather than just one.
onto the code:
with open("index_to_name.json", "r", encoding=("utf-8")) as fp:
index_to_name=json.load(fp)
with open("index_to_quality.json", "r", encoding=("utf-8")) as fp:
index_to_quality=json.load(fp)
with open("index_to_name_no_the.json", "r", encoding=("utf-8")) as fp:
index_to_name_no_the=json.load(fp)
with open("steamprofiler.json", "r", encoding=("utf-8")) as fp:
steamprofiler=json.load(fp)
with open("itemdb.json", "r", encoding=("utf-8")) as fp:
players=json.load(fp)
error=list()
playerinventories=dict()
c=127480
while c<len(steamprofiler):
inventory=dict()
items=list()
try:
url=urllib.request.urlopen("http://api.steampowered.com/IEconItems_440/GetPlayerItems/v0001/?key=DD5180808208B830FCA60D0BDFD27E27&steamid="+steamprofiler[c]+"&format=json")
inv=json.loads(url.read().decode("utf-8"))
url.close()
except (urllib.error.HTTPError, urllib.error.URLError, socket.error, UnicodeDecodeError) as e:
c+=1
print("HTTP-error, continuing")
error.append(c)
continue
try:
for r in inv["result"]["items"]:
inventory[r["id"]]=r["quality"], r["defindex"]
except KeyError:
c+=1
error.append(c)
continue
for key in inventory:
try:
if index_to_quality[str(inventory[key][0])]=="":
items.append(
index_to_quality[str(inventory[key][0])]
+""+
index_to_name[str(inventory[key][1])]
)
else:
items.append(
index_to_quality[str(inventory[key][0])]
+" "+
index_to_name_no_the[str(inventory[key][1])]
)
except KeyError:
print("keyerror, uppdate def_to_index")
c+=1
error.append(c)
continue
playerinventories[int(steamprofiler[c])]=items
c+=1
if c % 10==0:
print(c, "inventories downloaded")
I hope my problem was clear, otherwise just say so obviously. I would optimally avoid using 3rd party libraries but if it's not possible it's not possible. Thanks in advance
So you're assuming the fetching of the URL might be the thing slowing your program down? You'd do well to check that assumption first, but if it's indeed the case using the multiprocessing module is a huge overkill: for I/O bound bottlenecks threading is quite a bit simpler and might even be a bit faster (it takes a lot more time to spawn another python interpreter than to spawn a thread).
Looking at your code, you might get away with sticking most of the content of your while loop in a function with c as a parameter, and starting a thread from there using another function, something like:
def process_item(c):
# The work goes here
# Replace al those 'continue' statements with 'return'
for c in range(127480, len(steamprofiler)):
thread = threading.Thread(name="inventory {0}".format(c), target=process_item, args=[c])
thread.start()
A real problem might be that there's no limit to the amount of threads being spawned, which might break the program. Also the guys at Steam might not be amused at getting hammered by your script, and they might decide to un-friend you.
A better approach would be to fill a collections.deque object with your list of c's and then start a limited set of threads to do the work:
def process_item(c):
# The work goes here
# Replace al those 'continue' statements with 'return'
def process():
while True:
process_item(work.popleft())
work = collections.deque(range(127480, len(steamprofiler)))
threads = [threading.Thread(name="worker {0}".format(n), target=process)
for n in range(6)]
for worker in threads:
worker.start()
Note that I'm counting on work.popleft() to throw an IndexError when we're out of work, which will kill the thread. That's a bit sneaky, so consider using a try...except instead.
Two more things:
Consider using the excellent Requests library instead of urllib (which, API-wise, is by far the worst module in the entire Python standard library that I've worked with).
For Requests, there's an add-on called grequests which allows you to do fully asynchronous HTTP-requests. That would have made for even simpler code.
I hope this helps, but please keep in mind this is all untested code.
The outermost while loop seems to be distributed over a few processes(or tasks).
When you break the loop into tasks, note that you are sharing playerinventories and error object between processes. You will need to use multiprocessing.Manager for the sharing issue.
I recommend you to start modifying your code from this snippet.

Categories