I am getting a segmentation fault when initializing an array.
I have a callback function from when an RFID tag gets read
IDS = []
def readTag(e):
epc = str(e.epc, 'utf-8')
if not epc in IDS:
now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
for x in vals:
IDS.append([vals[0], vals[1], vals[2]])
for x in IDS:
print(x[0])
r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
r.set_region("NA")
r.start_reading(readTag, on_time=1500)
input("press any key to stop reading: ")
r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know because when I replace it with a print call instead the program will run just fine. I've tried using different types for the array objects (integers), creating an array of the same objects outside of the append function, etc. For some reason just creating an array inside the "readTag" function causes the segmentation fault like row = [1,2,3]
Does anyone know what causes this error and how I can fix it? Also just to be a little more specific, the readTag function will work fine for the first two (only ever two) calls, but then it crashes and the Reader object that has the start_reading() function is from the mercury-api
This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory address, so when it invokes your callback function readTag(e) a segfault occurs. I don't think that the behavior that you want is supported by that library
To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general pure-Python doesn't seg-fault. Or at least, it shouldn't seg-fault unless there's a bug in the interpreter, or some extension that you're using. That's not to say pure-Python won't break, it's just that a genuine seg-fault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reader() method you're using is "asynchronous". Meaning it invokes a new thread or process and returns immediately and then the background thread continues to call your callback each time something is scanned.
I don't really know enough about the nitty gritty of CPython to say exactly what going on, but you've declared IDS = [] as a global variable and it seems like the background thread is running the callback with a different context to the main program. So when it attempts to access IDS it's reading memory it doesn't own, hence the seg-fault.
Because of how restrictive the callback is and the apparent lack of a buffer, this might be an oversight on the behalf of the developer. If you really need asynchronous reads it's worth sending them an issue report.
Otherwise, considering you're just waiting for input you probably don't need the asynchronous reads, and you could use the synchronous Reader.read() method inside your own busy loop instead with something like:
try:
while True:
readTags(r.read(timeout=10))
except KeyboardInterrupt: ## break loop on SIGINT (Ctrl-C)
pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly, and if you're writing more than just a quick script you probably want to use threads to interrupt the loop properly as SIGINT is pretty hacky.
Related
I'm using an in-house Python library for scientific computing. I need to consecutively copy an object, modify it, and then delete it. The object is huge which causes my machine to run out of memory after a few cycles.
The first problem is that I use python's del to delete the object, which apparently only dereferences the object, rather than freeing up RAM.
The second problem is that even when I encapsulate the whole process in a function, after the function is invoked, the RAM is still not freed up. Here's a code snippet to better explain the issue.
ws = op.core.Workspace()
net = op.network.Cubic(shape=[100,100,100], spacing=1e-6)
proj = net.project
def f():
for i in range(5):
clone = ws.copy_project(proj)
result = do_something_with(clone)
del clone
f()
gc.collect()
>>> ws
{'sim_01': [<openpnm.network.Cubic object at 0x7fed1c417780>],
'sim_02': [<openpnm.network.Cubic object at 0x7fed1c417888>],
'sim_03': [<openpnm.network.Cubic object at 0x7fed1c417938>],
'sim_04': [<openpnm.network.Cubic object at 0x7fed1c417990>],
'sim_05': [<openpnm.network.Cubic object at 0x7fed1c4179e8>],
'sim_06': [<openpnm.network.Cubic object at 0x7fed1c417a40>]}
My question is how do I completely delete a Python object?
Thanks!
PS. In the code snippet, each time ws.copy_project is called, a copy of proj is stored in ws dictionary.
There are some really smart python people on here. They may be able to tell you better ways to keep your memory clear, but I have used leaky libraries before, and found one (so-far) foolproof way to guarantee that your memory gets cleared after use: execute the memory hog in another process.
To do this, you'd need to arrange for an easy way to make your long calculation be executable separately. I have done this by adding special flags to my existing python script that tells it just to run that function; you may find it easier to put that function in a separate .py file, e.g.:
do_something_with.py
import sys
def do_something_with(i)
# Your example is still too vague. Clearly, something differentiates
# each do_something_with, otherwise you're just taking the
# same inputs 5 times over.
# Whatever the difference is, pass it in as an argument to the function
ws = op.core.Workspace()
net = op.network.Cubic(shape=[100,100,100], spacing=1e-6)
proj = net.project
# You may not even need to clone anymore?
clone = ws.copy_project(proj)
result = do_something_with(clone)
# Whatever arg(s) you need to get to the function, just pass it in on the command line
if __name__ == "__main__":
sys.exit(do_something_with(sys.args[1:]))
You can do this using any of the python tools that handle subprocesses. In python 3.5+, the recommended way to do this is subprocess.run. You could change your bigger function to something like this:
import subprocess
invoke_do_something(i):
completed_args = subprocess.run(["python", "do_something_with.py", str(i)], check=False)
return completed_args.returncode
results = map(invoke_do_something, range(5))
You'll obviously need to tailor this to fit your own situation, but by running in a subprocess, you're guaranteed to not have to worry about the memory getting cleaned up. As an added bonus, you could potentially use multiprocess.Pool.map to use multiple processors at one time. (I deliberately coded this to use map to make such a transition simple. You could still use your for loop if you prefer, and then you don't need the invoke... function.) Multiprocessing could speed up your processing, but since you're already worried about memory, is almost certainly a bad idea - with multiple processes of the big memory hog, your system itself will likely quickly run out of memory and kill your process.
Your example is fairly vague, so I've written this at a high level. I can answer some questions if you need.
Your program just paused on a pdb.set_trace().
Is there a way to monkey patch the function that is currently running, and "resume" execution?
Is this possible through call frame manipulation?
Some context:
Oftentimes, I will have a complex function that processes large quantities of data, without having a priori knowledge of what kind of data I'll find:
def process_a_lot(data_stream):
#process a lot of stuff
#...
data_unit= data_stream.next()
if not can_process(data_unit)
import pdb; pdb.set_trace()
#continue processing
This convenient construction launches a interactive debugger when it encounters unknown data, so I can inspect it at will and change process_a_lot code to handle it properly.
The problem here is that, when data_stream is big, you don't really want to chew through all the data again (let's assume next is slow, so you can't save what you already have and skip on the next run)
Of course, you can replace other functions at will once in the debugger. You can also replace the function itself, but it won't change the current execution context.
Edit:
Since some people are getting side-tracked:
I know there are a lot of ways of structuring your code such that your processing function is separate from process_a_lot. I'm not really asking about ways to structure the code as much as how to recover (in runtime) from the situation when the code is not prepared to handle the replacement.
First a (prototype) solution, then some important caveats.
# process.py
import sys
import pdb
import handlers
def process_unit(data_unit):
global handlers
while True:
try:
data_type = type(data_unit)
handler = handlers.handler[data_type]
handler(data_unit)
return
except KeyError:
print "UNUSUAL DATA: {0!r}". format(data_unit)
print "\n--- INVOKING DEBUGGER ---\n"
pdb.set_trace()
print
print "--- RETURNING FROM DEBUGGER ---\n"
del sys.modules['handlers']
import handlers
print "retrying"
process_unit("this")
process_unit(100)
process_unit(1.04)
process_unit(200)
process_unit(1.05)
process_unit(300)
process_unit(4+3j)
sys.exit(0)
And:
# handlers.py
def handle_default(x):
print "handle_default: {0!r}". format(x)
handler = {
int: handle_default,
str: handle_default
}
In Python 2.7, this gives you a dictionary linking expected/known types to functions that handle each type. If no handler is available for a type, the user is dropped own into the debugger, giving them a chance to amend the handlers.py file with appropriate handlers. In the above example, there is no handler for float or complex values. When they come, the user will have to add appropriate handlers. For example, one might add:
def handle_float(x):
print "FIXED FLOAT {0!r}".format(x)
handler[float] = handle_float
And then:
def handle_complex(x):
print "FIXED COMPLEX {0!r}".format(x)
handler[complex] = handle_complex
Here's what that run would look like:
$ python process.py
handle_default: 'this'
handle_default: 100
UNUSUAL DATA: 1.04
--- INVOKING DEBUGGER ---
> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue
--- RETURNING FROM DEBUGGER ---
retrying
FIXED FLOAT 1.04
handle_default: 200
FIXED FLOAT 1.05
handle_default: 300
UNUSUAL DATA: (4+3j)
--- INVOKING DEBUGGER ---
> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue
--- RETURNING FROM DEBUGGER ---
retrying
FIXED COMPLEX (4+3j)
Okay, so that basically works. You can improve and tweak that into a more production-ready form, making it compatible across Python 2 and 3, et cetera.
Please think long and hard before you do it that way.
This "modify the code in real-time" approach is an incredibly fragile pattern and error-prone approach. It encourages you to make real-time hot fixes in the nick of time. Those fixes will probably not have good or sufficient testing. Almost by definition, you have just this moment discovered you're dealing with a new type T. You don't yet know much about T, why it occurred, what its edge cases and failure modes might be, etc. And if your "fix" code or hot patches don't work, what then? Sure, you can put in some more exception handling, catch more classes of exceptions, and possibly continue.
Web frameworks like Flask have debug modes that work basically this way. But those are debug modes, and generally not suited for production. Moreover, what if you type the wrong command in the debugger? Accidentally type "quit" rather than "continue" and the whole program ends, and with it, your desire to keep the processing alive. If this is for use in debugging (exploring new kinds of data streams, maybe), have at.
If this is for production use, consider instead a strategy that sets aside unhandled-types for asynchronous, out-of-band examination and correction, rather than one that puts the developer / operator in the middle of a real-time processing flow.
No.
You can't moneky-patch a currently running Python function and continue pressing as though nothing else had happened. At least not in any general or practical way.
In theory, it is possible--but only under limited circumstances, with much effort and wizardly skill. It cannot be done with any generality.
To make the attempt, you'd have to:
Find the relevant function source and edit it (straightforward)
Compile the changed function source to bytecode (straightforward)
Insert the new bytecode in place of the old (doable)
Alter the function housekeeping data to point at the "logically" "same point" in the program where it exited to pdb. (iffy, under some conditions)
"Continue" from the debugger, falling back into the debugged code (iffy)
There are some circumstances where you might achieve 4 and 5, if you knew a lot about the function housekeeping and analogous debugger housekeeping variables. But consider:
The bytecode offset at which your pdb breakpoint is called (f_lasti in the frame object) might change. You'd probably have to narrow your goal to "alter only code further down in the function's source code than the breakpoint occurred" to keep things reasonably simple--else, you'd have to be able to compute where the breakpoint is in the newly-compiled bytecode. That might be feasible, but again under restrictions (such as "will only call pdb_trace() once, or similar "leave breadcrumbs for post-breakpoint analysis" stipulations).
You're going to have to be sharp at patching up function, frame, and code objects. Pay special attention to func_code in the function (__code__ if you're also supporting Python 3); f_lasti, f_lineno, and f_code in the frame; and co_code, co_lnotab, and co_stacksize in the code.
For the love of God, hopefully you do not intend to change the function's parameters, name, or other macro defining characteristics. That would at least treble the amount of housekeeping required.
More troubling, adding new local variables (a pretty common thing you'd want to do to alter program behavior) is very, very iffy. It would affect f_locals, co_nlocals, and co_stacksize--and quite possibly, completely rearrange the order and way bytecode accesses values. You might be able to minimize this by adding assignment statements like x = None to all your original locals. But depending on how the bytecodes change, it's possible you'll even have to hot-patch the Python stack, which cannot be done from Python per se. So C/Cython extensions could be required there.
Here's a very simple example showing that bytecode ordering and arguments can change significantly even for small alterations of very simple functions:
def a(x): LOAD_FAST 0 (x)
y = x + 1 LOAD_CONST 1 (1)
return y BINARY_ADD
STORE_FAST 1 (y)
LOAD_FAST 1 (y)
RETURN_VALUE
------------------ ------------------
def a2(x): LOAD_CONST 1 (2)
inc = 2 STORE_FAST 1 (inc)
y = x + inc LOAD_FAST 0 (x)
return y LOAD_FAST 1 (inc)
BINARY_ADD
STORE_FAST 2 (y)
LOAD_FAST 2 (y)
RETURN_VALUE
Be equally sharp at patching some of the pdb values that track where it's debugging, because when you type "continue," those are what dictates where control flow goes next.
Limit your patchable functions to those that have rather static state. They must, for example, never have objects that might be garbage-collected before the breakpoint is resumed, but accessed after it (e.g. in your new code). E.g.:
some = SomeObject()
# blah blah including last touch of `some`
# ...
pdb.set_trace()
# Look, Ma! I'm monkey-patching!
if some.some_property:
# oops, `some` was GC'd - DIE DIE DIE
While "ensuring the execution environment for the patched function is same as it ever was" is potentially problematic for many values, it's guaranteed to crash and burn if any of them exit their normal dynamic scope and are garbage-collected before patching alters their dynamic scope/lifetime.
Assert you only ever want to run this on CPython, since PyPy, Jython, and other Python implementations don't even have standard Python bytecodes and do their function, code, and frame housekeeping differently.
I would love to say this super-dynamic patching is possible. And I'm sure you can, with a lot of housekeeping object twiddling, construct simple cases where it does work. But real code has objects that go out of scope. Real patches might want new variables allocated. Etc. Real world conditions vastly multiply the effort required to make the patching work--and in some cases, make that patching strictly impossible.
And at the end of the day, what have you achieved? A very brittle, fragile, unsafe way to extend your processing of a data stream. There is a reason most monkey-patching is done at function boundaries, and even then, reserved for a few very-high-value use cases. Production data streaming is better served adopting a strategy that sets aside unrecognized values for out-of-band examination and accommodation.
If I understand correctly:
you don't want to repeat all the work that has already been done
you need a way to replace the #continue processing as usual with the new code once you have figured out how to handle the new data
#user2357112 was on the right track: expected_types should be a dictionary of
data_type:(detect_function, handler_function)
and detect_type needs to go through that to find a match. If no match is found, pdb pops up, you can then figure out what's going on, write a new detect_function and handler_funcion, add them to expected_types, and continue from pdb.
What I wanted to know is if there's a way to monkey patch the function that is currently running (process_a_lot), and "resume" execution.
So you want to somehow, from within pdb, write a new process_a_lot function, and then transfer control to it at the location of the pdb call?
Or, do you want to rewrite the function outside pdb, and then somehow reload that function from the .py file and transfer control into the middle of the function at the location of the pdb call?
The only possibility I can think of is: from within pdb, import your newly written function, then replace the current process_a_lot byte-code with the byte-code from the new function (I think it's func.co_code or something). Make sure you change nothing in the new function (not even the pdb lines) before the pdb lines, and it might work.
But even if it does, I would imagine it is a very brittle solution.
Background:
I am currently writing a process monitoring tool (Windows and Linux) in Python and implementing unit test coverage. The process monitor hooks into the Windows API function EnumProcesses on Windows and monitors the /proc directory on Linux to find current processes. The process names and process IDs are then written to a log which is accessible to the unit tests.
Question:
When I unit test the monitoring behavior I need a process to start and terminate. I would love if there would be a (cross-platform?) way to start and terminate a fake system process that I could uniquely name (and track its creation in a unit test).
Initial ideas:
I could use subprocess.Popen() to open any system process but this runs into some issues. The unit tests could falsely pass if the process I'm using to test is run by the system as well. Also, the unit tests are run from the command line and any Linux process I can think of suspends the terminal (nano, etc.).
I could start a process and track it by its process ID but I'm not exactly sure how to do this without suspending the terminal.
These are just thoughts and observations from initial testing and I would love it if someone could prove me wrong on either of these points.
I am using Python 2.6.6.
Edit:
Get all Linux process IDs:
try:
processDirectories = os.listdir(self.PROCESS_DIRECTORY)
except IOError:
return []
return [pid for pid in processDirectories if pid.isdigit()]
Get all Windows process IDs:
import ctypes, ctypes.wintypes
Psapi = ctypes.WinDLL('Psapi.dll')
EnumProcesses = self.Psapi.EnumProcesses
EnumProcesses.restype = ctypes.wintypes.BOOL
count = 50
while True:
# Build arguments to EnumProcesses
processIds = (ctypes.wintypes.DWORD*count)()
size = ctypes.sizeof(processIds)
bytes_returned = ctypes.wintypes.DWORD()
# Call enum processes to find all processes
if self.EnumProcesses(ctypes.byref(processIds), size, ctypes.byref(bytes_returned)):
if bytes_returned.value < size:
return processIds
else:
# We weren't able to get all the processes so double our size and try again
count *= 2
else:
print "EnumProcesses failed"
sys.exit()
Windows code is from here
edit: this answer is getting long :), but some of my original answer still applies, so I leave it in :)
Your code is not so different from my original answer. Some of my ideas still apply.
When you are writing Unit Test, you want to only test your logic. When you use code that interacts with the operating system, you usually want to mock that part out. The reason being that you don't have much control over the output of those libraries, as you found out. So it's easier to mock those calls.
In this case, there are two libraries that are interacting with the sytem: os.listdir and EnumProcesses. Since you didn't write them, we can easily fake them to return what we need. Which in this case is a list.
But wait, in your comment you mentioned:
"The issue I'm having with it however is that it really doesn't test
that my code is seeing new processes on the system but rather that the
code is correctly monitoring new items in a list."
The thing is, we don't need to test the code that actually monitors the processes on the system, because it's a third party code. What we need to test is that your code logic handles the returned processes. Because that's the code you wrote. The reason why we are testing over a list, is because that's what your logic is doing. os.listir and EniumProcesses return a list of pids (numeric strings and integers, respectively) and your code acts on that list.
I'm assuming your code is inside a Class (you are using self in your code). I'm also assuming that they are isolated inside their own methods (you are using return). So this will be sort of what I suggested originally, except with actual code :) Idk if they are in the same class or different classes, but it doesn't really matter.
Linux method
Now, testing your Linux process function is not that difficult. You can patch os.listdir to return a list of pids.
def getLinuxProcess(self):
try:
processDirectories = os.listdir(self.PROCESS_DIRECTORY)
except IOError:
return []
return [pid for pid in processDirectories if pid.isdigit()]
Now for the test.
import unittest
from fudge import patched_context
import os
import LinuxProcessClass # class that contains getLinuxProcess method
def test_LinuxProcess(self):
"""Test the logic of our getLinuxProcess.
We patch os.listdir and return our own list, because os.listdir
returns a list. We do this so that we can control the output
(we test *our* logic, not a built-in library's functionality).
"""
# Test we can parse our pdis
fakeProcessIds = ['1', '2', '3']
with patched_context(os, 'listdir', lamba x: fakeProcessIds):
myClass = LinuxProcessClass()
....
result = myClass.getLinuxProcess()
expected = [1, 2, 3]
self.assertEqual(result, expected)
# Test we can handle IOERROR
with patched_context(os, 'listdir', lamba x: raise IOError):
myClass = LinuxProcessClass()
....
result = myClass.getLinuxProcess()
expected = []
self.assertEqual(result, expected)
# Test we only get pids
fakeProcessIds = ['1', '2', '3', 'do', 'not', 'parse']
.....
Windows method
Testing your Window's method is a little trickier. What I would do is the following:
def prepareWindowsObjects(self):
"""Create and set up objects needed to get the windows process"
...
Psapi = ctypes.WinDLL('Psapi.dll')
EnumProcesses = self.Psapi.EnumProcesses
EnumProcesses.restype = ctypes.wintypes.BOOL
self.EnumProcessses = EnumProcess
...
def getWindowsProcess(self):
count = 50
while True:
.... # Build arguments to EnumProcesses and call enun process
if self.EnumProcesses(ctypes.byref(processIds),...
..
else:
return []
I separated the code into two methods to make it easier to read (I believe you are already doing this). Here is the tricky part, EnumProcesses is using pointers and they are not easy to play with. Another thing is, that I don't know how to work with pointers in Python, so I couldn't tell you of an easy way to mock that out =P
What I can tell you is to simply not test it. Your logic there is very minimal. Besides increasing the size of count, everything else in that function is creating the space EnumProcesses pointers will use. Maybe you can add a limit to the count size but other than that, this method is short and sweet. It returns the windows processes and nothing more. Just what I was asking for in my original comment :)
So leave that method alone. Don't test it. Make sure though, that anything that uses getWindowsProcess and getLinuxProcess get's mocked out as per my original suggestion.
Hopefully this makes more sense :) If it doesn't let me know and maybe we can have a chat session or do a video call or something.
original answer
I'm not exactly sure how to do what you are asking, but whenever I need to test code that depends on some outside force (external libraries, popen or in this case processes) I mock out those parts.
Now, I don't know how your code is structured, but maybe you can do something like this:
def getWindowsProcesses(self, ...):
'''Call Windows API function EnumProcesses and
return the list of processes
'''
# ... call EnumProcesses ...
return listOfProcesses
def getLinuxProcesses(self, ...):
'''Look in /proc dir and return list of processes'''
# ... look in /proc ...
return listOfProcessses
These two methods only do one thing, get the list of processes. For Windows, it might just be a call to that API and for Linux just reading the /proc dir. That's all, nothing more. The logic for handling the processes will go somewhere else. This makes these methods extremely easy to mock out since their implementations are just API calls that return a list.
Your code can then easy call them:
def getProcesses(...):
'''Get the processes running.'''
isLinux = # ... logic for determining OS ...
if isLinux:
processes = getLinuxProcesses(...)
else:
processes = getWindowsProcesses(...)
# ... do something with processes, write to log file, etc ...
In your test, you can then use a mocking library such as Fudge. You mock out these two methods to return what you expect them to return.
This way you'll be testing your logic since you can control what the result will be.
from fudge import patched_context
...
def test_getProcesses(self, ...):
monitor = MonitorTool(..)
# Patch the method that gets the processes. Whenever it gets called, return
# our predetermined list.
originalProcesses = [....pids...]
with patched_context(monitor, "getLinuxProcesses", lamba x: originalProcesses):
monitor.getProcesses()
# ... assert logic is right ...
# Let's "add" some new processes and test that our logic realizes new
# processes were added.
newProcesses = [...]
updatedProcesses = originalProcessses + (newProcesses)
with patched_context(monitor, "getLinuxProcesses", lamba x: updatedProcesses):
monitor.getProcesses()
# ... assert logic caught new processes ...
# Let's "kill" our new processes and test that our logic can handle it
with patched_context(monitor, "getLinuxProcesses", lamba x: originalProcesses):
monitor.getProcesses()
# ... assert logic caught processes were 'killed' ...
Keep in mind that if you test your code this way, you won't get 100% code coverage (since your mocked methods won't be run), but this is fine. You're testing your code and not third party's, which is what matters.
Hopefully this might be able to help you. I know it doesn't answer your question, but maybe you can use this to figure out the best way to test your code.
Your original idea of using subprocess is a good one. Just create your own executable and name it something that identifies it as a testing thing. Maybe make it do something like sleep for a while.
Alternately, you could actually use the multiprocessing module. I've not used python in windows much, but you should be able to get process identifying data out of the Process object you create:
p = multiprocessing.Process(target=time.sleep, args=(30,))
p.start()
pid = p.getpid()
I've made a script to get inventory data from the Steam API and I'm a bit unsatisfied with the speed. So I read a bit about multiprocessing in python and simply cannot wrap my head around it. The program works as such: it gets the SteamID from a list, gets the inventory and then appends the SteamID and the inventory in a dictionary with the ID as the key and inventory contents as the value.
I've also understood that there are some issues involved with using a counter when multiprocessing, which is a small problem as I'd like to be able to resume the program from the last fetched inventory rather than from the beginning again.
Anyway, what I'm asking for is really a concrete example of how to do multiprocessing when opening the URL that contains the inventory data so that the program can fetch more than one inventory at a time rather than just one.
onto the code:
with open("index_to_name.json", "r", encoding=("utf-8")) as fp:
index_to_name=json.load(fp)
with open("index_to_quality.json", "r", encoding=("utf-8")) as fp:
index_to_quality=json.load(fp)
with open("index_to_name_no_the.json", "r", encoding=("utf-8")) as fp:
index_to_name_no_the=json.load(fp)
with open("steamprofiler.json", "r", encoding=("utf-8")) as fp:
steamprofiler=json.load(fp)
with open("itemdb.json", "r", encoding=("utf-8")) as fp:
players=json.load(fp)
error=list()
playerinventories=dict()
c=127480
while c<len(steamprofiler):
inventory=dict()
items=list()
try:
url=urllib.request.urlopen("http://api.steampowered.com/IEconItems_440/GetPlayerItems/v0001/?key=DD5180808208B830FCA60D0BDFD27E27&steamid="+steamprofiler[c]+"&format=json")
inv=json.loads(url.read().decode("utf-8"))
url.close()
except (urllib.error.HTTPError, urllib.error.URLError, socket.error, UnicodeDecodeError) as e:
c+=1
print("HTTP-error, continuing")
error.append(c)
continue
try:
for r in inv["result"]["items"]:
inventory[r["id"]]=r["quality"], r["defindex"]
except KeyError:
c+=1
error.append(c)
continue
for key in inventory:
try:
if index_to_quality[str(inventory[key][0])]=="":
items.append(
index_to_quality[str(inventory[key][0])]
+""+
index_to_name[str(inventory[key][1])]
)
else:
items.append(
index_to_quality[str(inventory[key][0])]
+" "+
index_to_name_no_the[str(inventory[key][1])]
)
except KeyError:
print("keyerror, uppdate def_to_index")
c+=1
error.append(c)
continue
playerinventories[int(steamprofiler[c])]=items
c+=1
if c % 10==0:
print(c, "inventories downloaded")
I hope my problem was clear, otherwise just say so obviously. I would optimally avoid using 3rd party libraries but if it's not possible it's not possible. Thanks in advance
So you're assuming the fetching of the URL might be the thing slowing your program down? You'd do well to check that assumption first, but if it's indeed the case using the multiprocessing module is a huge overkill: for I/O bound bottlenecks threading is quite a bit simpler and might even be a bit faster (it takes a lot more time to spawn another python interpreter than to spawn a thread).
Looking at your code, you might get away with sticking most of the content of your while loop in a function with c as a parameter, and starting a thread from there using another function, something like:
def process_item(c):
# The work goes here
# Replace al those 'continue' statements with 'return'
for c in range(127480, len(steamprofiler)):
thread = threading.Thread(name="inventory {0}".format(c), target=process_item, args=[c])
thread.start()
A real problem might be that there's no limit to the amount of threads being spawned, which might break the program. Also the guys at Steam might not be amused at getting hammered by your script, and they might decide to un-friend you.
A better approach would be to fill a collections.deque object with your list of c's and then start a limited set of threads to do the work:
def process_item(c):
# The work goes here
# Replace al those 'continue' statements with 'return'
def process():
while True:
process_item(work.popleft())
work = collections.deque(range(127480, len(steamprofiler)))
threads = [threading.Thread(name="worker {0}".format(n), target=process)
for n in range(6)]
for worker in threads:
worker.start()
Note that I'm counting on work.popleft() to throw an IndexError when we're out of work, which will kill the thread. That's a bit sneaky, so consider using a try...except instead.
Two more things:
Consider using the excellent Requests library instead of urllib (which, API-wise, is by far the worst module in the entire Python standard library that I've worked with).
For Requests, there's an add-on called grequests which allows you to do fully asynchronous HTTP-requests. That would have made for even simpler code.
I hope this helps, but please keep in mind this is all untested code.
The outermost while loop seems to be distributed over a few processes(or tasks).
When you break the loop into tasks, note that you are sharing playerinventories and error object between processes. You will need to use multiprocessing.Manager for the sharing issue.
I recommend you to start modifying your code from this snippet.
I defined an handler for EVT_IDLE that does a certain background task for me. (That task is to take completed work from a few processes and integrate it into some object, making a visible change in the GUI.)
The problem is that when the user is not moving the mouse or doing anything, EVT_IDLE doesn't get called more than once. I would like this handler to be working all the time. So I tried calling event.RequestMore() at the end of the handler. Works, but now it takes a whole lot of CPU. (I'm guessing it's just looping excessively on that task.)
I'm willing to limit the number of times the task will be carried out per second; How do I do that?
Or do you have another solution in mind?
Something like this (executes at most every second):
...
def On_Idle(self, event):
if not self.queued_batch:
wx.CallLater(1000, self.Do_Batch)
self.queued_batch = True
def Do_Batch(self):
# <- insert your stuff here
self.queued_batch = False
...
Oh, and don't forget to set self.queued_batch to False in the constructor and maybe call event.RequestMore() in some way in On_Idle.
This sounds like a use case for wxTimerEvent instead of wxIdleEvent. When there is processing to do call wxTimerEvent.Start(). When there isn't any to do, call wxTimerEvent.Stop() and call your methods to do processing from EVT_TIMER.
(note: i use from wxWidghets for C++ and am not familiar with wxPython but I assume they have a similar API)