So I am an inexperienced Python coder, with what I have gathered might be a rather complicated need. I am a cognitive scientist and I need precise stimulus display and button press detection. I have been told that the best way to do this is by using real-time operating, but have no idea how to go about this. Ideally, with each trial, the program would operate in real-time, and then once the trial is over, the OS can go back to not being as meticulous. There would be around 56 trials. Might there be a way to code this from my python script?
(Then again, all I need to know is when a stimulus is actually displayed. The real-time method would assure me that the stimulus is displayed when I want it to be, a top-down approach. On the other hand, I could take a more bottom-up approach if it is easier to just know to record when the computer actually got a chance to display it.)
When people talk about real-time computing, what they mean is that the latency from an interrupt (most commonly set off by a timer) to application code handling that interrupt being run, is both small and predictable. This then means that a control process can be run repeatedly at very precise time intervals or, as in your case, external events can be timed very precisely. The variation in latency is usually called "jitter" - 1ms maximum jitter means that an interrupt arriving repeatedly will have a response latency that varies by at most 1ms.
"Small" and "predictable" are both relative terms and when people talk about real-time performance they might mean 1μs maximum jitter (people building inverters for power transmission care about this sort of performance, for instance) or they might mean a couple of milliseconds maximum jitter. It all depends on the requirements of the application.
At any rate, Python is not likely to be the right tool for this job, for a few reasons:
Python runs mostly on desktop operating systems. Desktop operating systems impose a lower limit on the maximum jitter; in the case of Windows, it is several seconds. Multiple-second events don't happen very often, every day or two, and you'd be unlucky to have one coincide with the thing you're trying to measure, but sooner or later it will happen; jitter in the several-hundred-milliseconds region happens more often, perhaps every hour, and jitter in the tens-of-milliseconds region is fairly frequent. The numbers for desktop Linux are probably similar, though you can apply different compile-time options and patch sets to the Linux kernel to improve the situation - Google PREEMPT_RT_FULL.
Python's stop-the-world garbage collector makes latency non-deterministic. When Python decides it needs to run the garbage collector, your program gets stopped until it finishes. You may be able to avoid this through careful memory management and carefully setting the garbage collector parameters, but depending on what libraries you are using, you may not, too.
Other features of Python's memory management make deterministic latency difficult. Most real-time systems avoid heap allocation (ie C's malloc or C++'s new) because the amount of time they take is not predictable. Python neatly hides this from you, making it very difficult to control latency. Again, using lots of those nice off-the-shelf libraries only makes the situation worse.
In the same vein, it is essential that real-time processes have all their memory kept in physical RAM and not paged out to swap. There is no good way of controlling this in Python, especially running on Windows (on Linux you might be able to fit a call to mlockall in somewhere, but any new allocation will upset things).
I have a more basic question though. You don't say whether your button is a physical button or one on the screen. If it's one on the screen, the operating system will impose an unpredictable amount of latency between the physical mouse button press and the event arriving at your Python application. How will you account for this? Without a more accurate way of measuring it, how will you even know whether it is there?
Python is not, by purist's standards, a real-time language- it has too many libraries and functions to be bare-bones fast. If you're already going through an OS though, as opposed to an embedded system, you've already lost a lot of true real time capability. (When I hear "real time" I think the time it takes VHDL code to flow through the wires of an FPGA. Other people use it to mean "I hit a button and it does something that is, from my slow human perspective, instantaneous". I'll assume you're using the latter interpretation of real time.)
By stimulus display and button press detection I assume you mean you have something (for example) like a trial where you show a person an image and have them click a button to identify the image or confirm that they've seen it- perhaps to test reaction speed. Unless you're worried about accuracy down to the millisecond (which should be negligible compared to the time for a human reaction) you would be able to do a test like this using python. To work on the GUI, look into Tkinter: http://www.pythonware.com/library/tkinter/introduction/. To work on the timing between stimulus and a button press, look at the time docs: http://docs.python.org/library/time.html
Good luck!
Because you are trying to get a scientific measurement on a time delay in millisecond precision, I cannot recommend any process that is subject to time slicing on a general purpose computer. Whether implemented in C, or Java, or Python, if it runs in a time-shared mode, then how can the result be verifiable? You could be challenged to prove that the CPU never interrupted the process during a measurement, thereby distorting the results.
It sounds like you may need to construct a dedicated device for this purpose, with a clock circuit that ticks at a known rate and can measure the discrete number of ticks that occur between stimulus and response. That device can then be controlled by software that has no such timing constraints. Maybe you should post this question to the Electrical Engineering exchange.
Without a dedicated device, you will have to develop truly real-time software that, it terms of modern operating systems, runs within the kernel and is not subject to task switching. This is not easy to do, and it takes a lot of effort to get it right. More time, I would guess, than you would spend building a dedicated software-controllable device for your purpose.
Most common operating systems' interrupts are variable enough to ruin timing in your experiment regardless of your programming language. Python adds it's own unreliability. Windows interrupts are especially bad. In Windows, most interrupts are serviced in about 4 milliseconds, but occasionally an interrupts last longer than 35 milliseconds! (Windows 7).
I would recommend trying the PsycoPy application to see if will work for you. It approaches the problem by trying to make the graphics card do the work in openGL, however some of it's code still runs outside the graphics card and is subject to the operating system's interrupts. Your existing python code may not be compatible with PsycoPy, but at least you would stay in Python. PsycoPy is especially good at showing visual stimulations without timing issues. See this page in their documentation to see how you would handle a button press: http://www.psychopy.org/api/event.html
To solve your problem the right way, you need a real-time operating system, such as LinuxRT or QNX. You could try your python application in one of those to see if running python in a real-time environment is good enough, but even python introduces variability. If python decides to garbage collect, you will have a glitch. Python itself isn't real time.
National Instruments sells a setup that allows you to program in real-time in a very easy-to-use programming language called LabviewRT. LabviewRT pushing your code into an FPGA daughter card that operates in real time. It's expensive.
I strongly suggest you don't just minimize this problem, but solve it, otherwise, your reviewers will be uncomfortable.
If you are running the Python code on Linux machine, make the kernel low latency (preemptive).
There is a flag for it when you compile the kernel.
Make sure that other processes running on the machine are minimum so they do not interrupt the kernel.
Assign higher task priority to your Python script.
Run the python interpreter on a real time operating system or tweaked linux.
Shut down the garbage collector during the experiments and back on afterward.
Maybe actively trigger a garbage collection round after the end of an experiment.
Additionally, keep in mind that showing an image is not instantaneous. You must synchronize your experiment with your monitor's vertical retrace phase (the pause between transmitting the last line of a frame of the display's content and the first line of the next frame).
I would start the timer at the beginning of the vsync phase after transmission of the frame containing whatever candidates are supposed to react to.
And one would have to keep in mind that the image is going to be ast least partially visible a bit earlier than that for purposes of getting absolute reaction times as opppsed to just well comparable results with ~ half a frame of offset due to the non-instantaneous appearance of the monitor's contents (~10 ms # 60Hz).
Related
In Python, while I was testing a bruteforce script I saw that not printing something like Trying Password: *password* with every attempt significantly decreases the time it takes in order to find the password. I just let it run on a blank screen but if I put something as simple as a loading animation (Running . . .)in the beginning to let me know it's working fine, would that slow down my program too?
(Excuse me if any of what I said was hard to understand. I'm confused as well)
When attempting a bruteforce, it's best to have as much processing power available. A constant call from Python to update the screen (with a loading status, in this case) takes up some processing power and would indeed slow down the bruteforce.
By how much it slows down depends on how your script is written and the hardware it's running on. Better hardware - faster. Better threading for the script - faster. You might be able to avoid a noticeable impact if you offload the "animation" to a thread which isn't fully utilized (if your script leaves any such threads in the first place).
Though unless you are on a very slow PC, the main slow down probably doesn't come from the CPU, but from the data bus. Sending information between components at a very rapid pace could cause a bottleneck. So if your script waits for that bottleneck to pass before it continues cycling passwords - it gets slowed down. Try to separate the "loading" status from the rest of the logic, so that the CPU can keep cycling passwords without waiting for each screen refresh to pass.
I hope this helped.
I/O bound operations like printing are very slow compared to CPU bound ops like calculations.
So, everytime you printed, trying password, your program could have tried 1000 more combinations.
But if you want to print once in the beginning, it wont slow down, printing repetitively will.
I am working on a monitor with python to detect new products on websites. I would like to run multiple instances to increase the chance of detecting something.
My goal is to decrease the workload for the CPU to be able to run more instances. A function like time.sleep is not a real option since it decreases the likelihood to detect a product.
I have already brought the script down to a minimum. Are there any other options to minimize the workload?
It's hard to answer without more details on how the monitor works and what is consuming the CPU. I would suggest you use a profiling tool (There is one in PyCharm for example) that will monitor your function calls and the time they took. This way you can focus on what takes too long or consumes to much resources and improve those parts accordingly.
I'm wanting to take photos from 2 different cameras at exactly the same time (or as close as possible).
If I use multithreading or multiprocessing, it still runs the threads/processes consecutively.. For instance if I start the following processes:
Take_photo_1.start()
Take_photo_2.start()
While those processes would run in parallel, the commands to start the processes are still executed sequentially. Is there any way to execute both those processes at exactly the same time?
There's no way to make this exact even if you're writing directly in machine code. Even if you have all the threads wait on a kernel barrier, that wait can take different times on different cores, and there are opcodes to process between the barrier wait and the camera get that have to get fetched and run on a system where the caches may be in different states, and there's nothing stopping the OS from stealing the CPU from one of the threads to run some completely unrelated code, and the I/O to the camera (even if it isn't serialized, which it may be) probably isn't a guaranteed static time, and so on.
When you throw an interpreted language on top of it (especially one with a GIL, like Python, which means the bytecodes between the barrier wait and the camera get can't be run in parallel)… well, you're not really changing anything; "impossible * 7" is still "impossible". But you are making it even more obvious.
Fortunately, very few real-life problems have a true hard real-time requirement like that. Instead, you have a requirement like "99.9% of the time, all camera gets should happen within +/-4ms of the desired exact 30fps". Or, maybe, "90% of the time it's within +/-1ms, 99.9% of the time it's within +/-4ms, 99.999% of the time it's within +/-20ms, as long as you don't do anything stupid like change the wall-power state of the laptop while running the code".
Or… well, only you know why you wanted "exact", and can figure out what the actual requirements are that would satisfy you.
And for that case, often the simplest thing to do is write the code the obvious way, stress test the hell out of it, see if it meets your requirements, and figure out how to optimize things only if it doesn't.
So, your existing code may well be fine.
If not, adding a shared barrier = threading.Barrier() and doing a barrier.wait() right before the camera.get() may be all you need.
You may need to add logic to detect timer lag and re-synchronize (which you might do independently in each thread, or have whichever thread gets there first compute it and just make everyone else wait at the barrier).
You may need to rewrite the core loop in C. Or dump whichever OS you're using for one with better real-time guarantees like QNX. Or throw out the OS entirely so there's no scheduler to get in the way. Or throw out the complex superscalar CPUs and implement the whole thing as a hardware state machine. Or…
But, assuming you have reasonable requirements in the first place, you usually don't have to go very far.
I am currently working on some simulation code written in C, which runs on different remote machines. While the C part is finished I want to simplify my work by extending it with a python simulation api and some kind of a job-queue system, which should do the following:
1.specifiy a set of parameters on which simulations should be performed and put them into a queue on a host computer
2.perform simulation on remote machines by workers
3.return results to host computer
I had a look at different frameworks for accomplishing this task and my first choice goes down to IPython.parallel. I had a look at the documentation and from what I tested out it seems pretty easy to use. My approach would be to use a load balanced view like explained at
http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#creating-a-loadbalancedview-instance
But what I dont see is:
what happens i.e. if the ipcontroller crashes, is my job queue gone?
what happens if a remote machine crashes? is there some kind of error handling?
Since I run relatively long simulations (1-2 weeks) I don't want my simulations to fail if some part of the system crashes. So is there maybe some way to handle this in IPython.parallel?
My Second approach would be to use pyzmq and implement the jobsystem from scratch.
In this case what would be the best zmq-pattern for this situation?
And last but not least, is there maybe a better framework for this scenario?
What lies behind the curtain is a bit more complex view on how to arrange the work-package flow alongside the ( parallelised ) number-crunching pipeline(s).
Being the work-package of a many CPU-core-week(s),
or
being the lumpsum volume of the job above a few hundred-of-thousands of CPU-core-hours, the principles are similar and follow a common sense.
Key Features
scaleability of the computing performance of all resources involved ( ideally a linear one )
ease of task submission role
fault-resilience of submitted task(s) ( ideally with an automated self-healing )
feasible TCO cost of access to / use of a sufficient pool of resources ( upfront co$ts, recurring co$ts, adaptation$ co$ts, co$ts of $peed )
Approaches to Solution
home-brew architecture for a distributed massively parallel scheduler based self-healing computation engine
re-use of available grid-based computing resources
Based on own experience to solve a need for repetitive runs of numerical intensive optimisation problem over a vast parameterSetVectorSPACE ( which could not be de-composed into any trivialised GPU parallelisation scheme ), selection of the second approach has been validated to be more productive rather than an attempt to burn dozens of man*years in just-another-trial to re-invent a wheel.
Being in academia environment, one may get a way easier to an acceptable access to resources-pool(s) for processing the work-packages, while commercial entities may acquire the same, based on their acceptable budgeting tresholds.
My gut instinct is to suggest rolling your own solution for this, because like you said otherwise you're depending on IPython not crashing.
I would run a simple python service on each node which listens for run commands. When it receives one it launches your C program. However, I suggest you ensure the C program is a true Unix daemon, so when it runs it completely disconnects itself from python. That way if your node python instance crashes you can still get data if the C program executes successfully. Have the C program write the output data to a file or database, and when the task is finished write "finished" to a "status" or something similar. The python service should monitor that file and when finished is indicated it should retrieve the data and send it back to the server.
The central idea of this design is to have as few possible points of failure as possible. As long as the C program doesn't crash, you can still get the data one way or another. As far as handling system crashes, network disconnects, etc, that's up to you.
I have a port scanning application that uses work queues and threads.
It uses simple TCP connections and spends a lot of time waiting for packets to come back (up to half a second). Thus the threads don't need to fully execute (i.e. first half sends a packet, context switch, does stuff, comes back to thread which has network data waiting for it).
I suspect I can improve performance by modifying the sys.setcheckinterval from the default of 100 (which lets up to 100 bytecodes execute before switching to another thread).
But without knowing how many bytecodes are actually executing in a thread or function I'm flying blind and simply guessing values, testing and relying on the testing shows a measurable difference (which is difficult since the amount of code being executed is minimal; a simple socket connection, thus network jitter will likely affect any measurements more than changing sys.setcheckinterval).
Thus I would like to find out how many bytecodes are in certain code executions (i.e. total for a function or in execution of a thread) so I can make more intelligent guesses at what to set sys.setcheckinterval to.
For higher level (method, class) wise, dis module should help.
But if one needs finer grain, tracing will be unavoidable. Tracing does operate line by line basis but explained here is a great hack to dive deeper at the bytecode level. Hats off to Ned Batchelder.
Reasoning about a system of this complexity will rarely produce the right answer. Measure the results, and use the setting that runs the fastest. If as you say, testing can't measure the difference in various settings of setcheckinterval, then why bother changing it? Only measurable differences are interesting. If your test run is too short to provide meaningful data, then make the run longer until it does.
" I suspect I can improve performance by modifying the sys.setcheckinterval"
This rarely works. Correct behavior can't depend on timing -- you can't control timing. Slight changes on OS, hardware, patch level of Python or phase of the moon will change how your application behaves.
The select module is what you use to wait for I/O's. Your application can be structured as a main loop that does the select and queues up work for other threads. The other threads are waiting for an entries in their queue of requests to process.