Infinite While Loop - Memory Use? - python

I am trying to figure out how a while loop determines how much memory to use.
At the basic level:
while True:
pass
If I did a similar thing in PHP, it would grind my localhost to a crawl. But in this case I would expect it to use next to nothing.
So for an infinite loop in python, should I expect that it would use up tons of memory, or does it scale according to what is being done, and where?

Your infinite loop is a no-op (it doesn't do anything), so it won't increase the memory use beyond what is being used by the rest of your program. To answer your question, you need to post the code that you suspect is causing memory problems.
In PHP however, the same loop will "hang" because the web server is expecting a response to send back to the client. Since no response is being received, the web browser will simply "freeze". Depending on how the web server is configured, it may choose to end the process an issue a timeout error.
You could do the same if you used Python and a web framework and put an infinite loop in one of your methods that returns a response to the client.
If you ran the equivalent PHP code from the shell, it will have the same effect as if it was written in Python (or any other language). That is, your console will block until you kill the process.
I'm asking because I want to create a program that runs infinitely,
but I'm not sure how to determine it's footprint or how much it will
take from system resources.
A program that runs indefinitely (I think that's what you mean) - it generally has two cases:
Its waiting to do some work on a trigger (like a web server runs indefinitely, but its just sitting there until someone visits your website)
Its doing a process that is taking a long time.
For #2, you need to determine the resource use by figuring out what is the work being done.
If its building a large list of items to do some calculations/sorting, then memory use will grow as the list grows.
If its processing a bunch of files, and during this process, it generates a lot of output stored on disk - then disk usage will grow, and then shrink when the process is done.
If its a rendering engine, then memory use and CPU use will increase, along with disk use as the memory is swapped out during rendering. However, such a system will not tax the disk too much.
The bottom line is, you can't get an answer to this unless you explain the process being run.

Related

Would adding an animation slow down my program in Python?

In Python, while I was testing a bruteforce script I saw that not printing something like Trying Password: *password* with every attempt significantly decreases the time it takes in order to find the password. I just let it run on a blank screen but if I put something as simple as a loading animation (Running . . .)in the beginning to let me know it's working fine, would that slow down my program too?
(Excuse me if any of what I said was hard to understand. I'm confused as well)
When attempting a bruteforce, it's best to have as much processing power available. A constant call from Python to update the screen (with a loading status, in this case) takes up some processing power and would indeed slow down the bruteforce.
By how much it slows down depends on how your script is written and the hardware it's running on. Better hardware - faster. Better threading for the script - faster. You might be able to avoid a noticeable impact if you offload the "animation" to a thread which isn't fully utilized (if your script leaves any such threads in the first place).
Though unless you are on a very slow PC, the main slow down probably doesn't come from the CPU, but from the data bus. Sending information between components at a very rapid pace could cause a bottleneck. So if your script waits for that bottleneck to pass before it continues cycling passwords - it gets slowed down. Try to separate the "loading" status from the rest of the logic, so that the CPU can keep cycling passwords without waiting for each screen refresh to pass.
I hope this helped.
I/O bound operations like printing are very slow compared to CPU bound ops like calculations.
So, everytime you printed, trying password, your program could have tried 1000 more combinations.
But if you want to print once in the beginning, it wont slow down, printing repetitively will.

Is it possible to execute two functions at EXACTLY the same time

I'm wanting to take photos from 2 different cameras at exactly the same time (or as close as possible).
If I use multithreading or multiprocessing, it still runs the threads/processes consecutively.. For instance if I start the following processes:
Take_photo_1.start()
Take_photo_2.start()
While those processes would run in parallel, the commands to start the processes are still executed sequentially. Is there any way to execute both those processes at exactly the same time?
There's no way to make this exact even if you're writing directly in machine code. Even if you have all the threads wait on a kernel barrier, that wait can take different times on different cores, and there are opcodes to process between the barrier wait and the camera get that have to get fetched and run on a system where the caches may be in different states, and there's nothing stopping the OS from stealing the CPU from one of the threads to run some completely unrelated code, and the I/O to the camera (even if it isn't serialized, which it may be) probably isn't a guaranteed static time, and so on.
When you throw an interpreted language on top of it (especially one with a GIL, like Python, which means the bytecodes between the barrier wait and the camera get can't be run in parallel)… well, you're not really changing anything; "impossible * 7" is still "impossible". But you are making it even more obvious.
Fortunately, very few real-life problems have a true hard real-time requirement like that. Instead, you have a requirement like "99.9% of the time, all camera gets should happen within +/-4ms of the desired exact 30fps". Or, maybe, "90% of the time it's within +/-1ms, 99.9% of the time it's within +/-4ms, 99.999% of the time it's within +/-20ms, as long as you don't do anything stupid like change the wall-power state of the laptop while running the code".
Or… well, only you know why you wanted "exact", and can figure out what the actual requirements are that would satisfy you.
And for that case, often the simplest thing to do is write the code the obvious way, stress test the hell out of it, see if it meets your requirements, and figure out how to optimize things only if it doesn't.
So, your existing code may well be fine.
If not, adding a shared barrier = threading.Barrier() and doing a barrier.wait() right before the camera.get() may be all you need.
You may need to add logic to detect timer lag and re-synchronize (which you might do independently in each thread, or have whichever thread gets there first compute it and just make everyone else wait at the barrier).
You may need to rewrite the core loop in C. Or dump whichever OS you're using for one with better real-time guarantees like QNX. Or throw out the OS entirely so there's no scheduler to get in the way. Or throw out the complex superscalar CPUs and implement the whole thing as a hardware state machine. Or…
But, assuming you have reasonable requirements in the first place, you usually don't have to go very far.

How do I automatically kill a process that uses too much memory with Python?

The situation: I have a website that allows people to execute arbitrary code in a different language (specifically, an esolang I created), using a Python interpreter on a shared-hosting server. I run this code in a separate process which is given a time limit of 60 seconds.
The problem: You can do stuff like (Python equivalent) 10**(10**10), which rapidly consumes far more memory than I have allotted to me. It also, apparently, locks up Apache - or it takes too long to respond - so I have to restart it.
I have seen this question, but the given answer uses Perl, which I do not know at all, hence I'd like an answer in Python. The OS is Linux too, though.
Specifically, I want the following characteristics:
Runs automatically
Force-kills any process that exceeds some memory limit like 1MB or 100MB
Kills any process spawned by my code that is more than 24 hours old
I use this piece of code (in a Django view) to create the process and run it (proxy_prgm is a Manager so I can retrieve data from the program that's interpreting the esolang code):
prgmT[uid] = multiprocessing.Process(
target = proxy_prgm.runCatch,
args = (steps,),
name="program run")
prgmT[uid].start()
prgmT[uid].join(60) #time limit of 1 minute
if prgmT[uid].is_alive():
prgmT[uid].terminate()
proxy_prgm.stop()
If you need more details, don't hesitate to tell me what to edit in (or ask me questions).
Another approach that might work; using resource.setrlimit() (more details in this other StackOverflow answer). It seems that by doing so you can set a memory limit on a process and it's subprocesses; you'll have to figure out how to handle if the limit is hit though. I don't have personal experience using it, but hopefully doing so would stop Apache from locking up on you.

Slower execution of AWS Lambda batch-writes to DynamoDB with multiple threads

Disclaimer: I know this question will annoy some people because it's vague, theoretical, and has little code.
I have a AWS Lambda function in Python which reads a file of denormalized records off S3, formats its contents correctly, and then uploads that to DynamoDB with a batch write. It all works as advertised. I then tried to break up the uploading part of this pipeline into threads with the hope of more efficiently utilizing DynamoDBs write capacity. However, the multithread version is slower by about 50%. Since the code is very long I have included pseudocode.
NUM_THREADS = 4
for every line in the file:
Add line to list of lines
if we've read enough lines for a single thread:
Create thread that uploads list of lines
thread.start()
clear list of lines.
for every thread started:
thread.join()
Important notes and possible sources of the problem I've checked so far:
When testing this locally using DynamoDB Local, threading does make my program run faster.
If instead I use only 1 thread, or even if I use multiple threads but I join the thread right after I start it (effectively single threaded), the program completes much quicker. With 1 thread ~30s, multi thread ~45s.
I have no shared memory between threads, no locks, etc.
I have tried creating new DynamoDB connections for each thread and sharing one connection instead, with no effect.
I have confirmed that adding more threads does not overwhelm the write capacity of DynamoDB, since it makes the same number of batch write requests and I don't have more unprocessed items throughout execution than with a single thread.
Threading should improve the execution time since the program is network bound, even though Python threads do not really run on multiple cores.
I have tried reading the entire file first, and then spawning all the threads, thinking that perhaps it's better to not interrupt the disk IO, but to no effect.
I have tried both the Thread library as well as the Process library.
Again I know this question is very theoretical so it's probably hard to see the source of the issue, but is there some Lambda quirk I'm not aware of? Is there something I else I can try to help diagnose the issue? Any help is appreciated.
Nate, have you completely ruled out a problem on the Dynamodb end? The total number of write requests may be the same, but the number per second would be different with a multi-thread.
The console has some useful graphs to show if your writes (or batch writes) are being throttled at all. If you don't have the right 'back off, retry' logic in your Lambda function, Lambda will just try and try again and your problem gets worse.
One other thing, which might have been obvious to you (but not me!). I was under the impression that batch_writes saved you money on the capacity planning front. (That 200 writes in batches of 20 would only cost you 10 write units, for example. I could have sworn I heard an AWS guy mention this in a presentation, but that's beside the point.)
In fact the batch_writes save you some time, but nothing economically.
One last thought: I'd bet that Lambda processing time is cheaper than upping your Dynamodb write capacity. If you're in no particular rush for Lambda to finish, why not let it run its course on single-thread?
Good luck!
Turns out that the threading is faster, but only when the file reached a certain file size. I was originally work on a file size of about 1/2 MG. With a 10 MG file, the threaded version came out about 50% faster. Still unsure why it wouldn't work with the smaller file, maybe it just needs time to get a'cooking, you know what I mean? Computers are moody things.
As a backdrop I have good experience with python and dynamoDB along with using python's multiprocessing library. Since your file size was fairly small it may have been the setup time of the process that confused you about performance. If you haven't already, use python multiprocessing pools and use map or imap depending on your use case if you need to communicate any data back to the main thread. Using a pool is the darn simpliest way to run multiple processes in python. If you need your application to run faster as a priority you may want to look into using golang concurrency and you could always build the code into binary to use from within python. Cheers.

For my app, how many threads would be optimal?

I have a simple Python web crawler. It uses SQLite to store its output and also to keep a queue. I want to make the crawler multi-threaded so that it can crawl several pages at a time. I figured i would make a thread and just run several instances of the class at once, so they all run concurrently. But the question is, how many should i run at once? should i stick to two? can i go higher? what would be a reasonable limit for a number of threads? Keep in mind that each thread goes out to a web page, downloads the html, runs a few regex searches through it, stores the info it finds in a SQLite db, and then pops the next url off the queue.
You will probably find your application is bandwidth limited not CPU or I/O limited.
As such, add as many as you like until performance begins to degrade.
You may come up against other limits depending on your network setup. Like if you're behind an ADSL router, there will be a limit on the number of concurrent NAT sessions, which may impact making too many HTTP requests at once. Make too many and your provider may treat you as being infected by a virus or the like.
There's also the issue of how many requests the server you're crawling can handle and how much of a load you want to put on it.
I wrote a crawler once that used just one thread. It took about a day to process all the information I wanted at about one page every two seconds. I could've done it faster but I figured this was less of a burden for the server.
So really theres no hard and fast answer. Assuming a 1-5 megabit connection I'd say you could easily have up to 20-30 threads without any problems.
I would use one thread and twisted with either a deferred semaphore or a task cooperator if you already have an easy way to feed an arbitrarily long list of URLs in.
It's extremely unlikely you'll be able to make a multi-threaded crawler that's faster or smaller than a twisted-based crawler.
It's usually simpler to make multiple concurrent processes. Simply use subprocess to create as many Popens as you feel it necessary to run concurrently.
There's no "optimal" number. Generally, when you run just one crawler, your PC spends a lot of time waiting. How much? Hard to say.
When you're running some small number of concurrent crawlers, you'll see that they take about the same amount of time as one. Your CPU switches among the various processes, filling up the wait time on one with work on the others.
You you run some larger number, you see that the overall elapsed time is longer because there's now more to do than your CPU can manage. So the overall process takes longer.
You can create a graph that shows how the process scales. Based on this you can balance the number of processes and your desirable elapsed time.
Think of it this way.
1 crawler does it's job in 1 minute. 100 pages done serially could take a 100 minutes. 100 crawlers concurrently might take on hour. Let's say that 25 crawlers finishes the job in 50 minutes.
You don't know what's optimal until you run various combinations and compare the results.
cletus's answer is the one you want.
A couple of people proposed an alternate solution using asynchronous I/O, especially looking at Twisted. If you decide to go that route, a different solution is pycurl, which is a thin wrapper to libcurl, which is a widely used URL transfer library. PyCurl's home page has a 'retriever-multi.py' example of how to fetch multiple pages in parallel, in about 120 lines of code.
You can go higher that two. How much higher depends entirely on the hardware of the system you're running this on, how much processing is going on after the network operations, and what else is running on the machine at the time.
Since it's being written in Python (and being called "simple") I'm going to assume you're not exactly concerned with squeezing every ounce of performance out of the thing. In that case, I'd suggest just running some tests under common working conditions and seeing how it performs. I'd guess around 5-10 is probably reasonable, but that's a complete stab in the dark.
Since you're using a dual-core machine, I'd highly recommend checking out the Python multiprocessing module (in Python 2.6). It will let you take advantage of multiple processors on your machine, which would be a significant performance boost.
One thing you should keep in mind is that some servers may interpret too many concurrent requests from the same IP address as a DoS attack and abort connections or return error pages for requests that would otherwise succeed.
So it might be a good idea to limit the number of concurrent requests to the same server to a relatively low number (5 should be on the safe side).
Threading isn't necessary in this case. Your program is I/O bound rather than CPU bound. The networking part would probably be better done using select() on the sockets. This reduces the overhead of creating and maintaining threads. I haven't used Twisted, but I heard it has really good support for asynchronous networking. This would allow you you to specify the URLs you wish to download and register a callback for each. When each is downloaded you the callback will be called, and the page can be processed. In order to allow multiple sites to be downloaded, without waiting for each to be processed, a second "worker" thread can be created with a queue. The callback would add the site's contents to the queue. The "worker" thread would do the actual processing.
As already stated in some answers, the optimal amount of simultaneous downloads depends on your bandwidth.
I'd use one or two threads - one for the actual crawling and the other (with a queue) for processing.

Categories