Ok, I know this sounds crazy, but bear with me...
I'm running an Ubuntu 16 cloud machine. A number of background processes pull images, run a handful of complex ImageMagick operations on them, and then use FFmpeg to make videos from those images; the process repeats for each image. Overall it works very well and runs smoothly.
This server also hosts a Flask app that serves the front end for the results of this process.
However, from time to time the convert process just stalls. In top, its CPU usage drops to 25% and just sits there; no images are created, and it hangs indefinitely. If I kill the convert process and restart the image jobs, the next job stalls out the same way. No error output; it just sits there.
Overall it is pretty sporadic, BUT one way I can reliably reproduce it is to run the process with a very large image (10,000 pixels across).
The only way to get the process to 'reset' is to kill the convert process and restart Apache. Only after I restart Apache does the pipeline work again.
What the heck is going on? What does Apache have to do with ImageMagick?!
Any help or insight would be much appreciated.
You are apparently running out of resources. You can limit the amount of resources that convert will try to acquire either by:
* using the -limit resource option (see http://imagemagick.org/script/command-line-options.php#limit), or
* setting limits in a policy.xml file (see http://imagemagick.org/script/resources.php).
I'd probably start by playing with the -limit resource option, then when I was happy with the results, write a policy.xml file containing default limits as a permanent solution until I got a larger machine that could deal with larger limits.
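For example, if the pipeline shells out to convert from a script, the limits can be passed per invocation. A minimal sketch, assuming a Python driver; the file names and limit values are placeholders to tune for your machine:

import subprocess

# The -limit settings cap ImageMagick's resource usage: once "memory" is
# exhausted it spills to "map" (memory-mapped files), then to "disk";
# exceeding the disk limit makes convert fail instead of grinding away.
subprocess.run(
    [
        "convert",
        "-limit", "memory", "256MiB",
        "-limit", "map", "512MiB",
        "-limit", "disk", "4GiB",
        "input.png",             # placeholder input
        "-resize", "50%",
        "output.png",            # placeholder output
    ],
    check=True,  # raise if convert exits with an error (e.g. a limit exceeded)
)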
Related
On my VPS I have 4 Python scripts running, and it's been 60 days since I last rebooted. Now I have to reboot, but if I do, my Python variables and data will be lost, because I don't store them in files; they exist only in variables inside the running scripts.
My OS is Ubuntu Server 16.04 LTS, and I launched the scripts with the nohup command so they would keep running in the background.
Now I need a way to stop my scripts without losing their variables, and to start them again with the same variables and data after I reboot the VPS.
Is there any way I can do this?
Also, I'm sorry for any writing mistakes in my question.
Python doesn't provide any way of doing this.
But you might be able to use CRIU, or a similar tool, to freeze and snapshot the interpreter process. Then, after restart, you can resume the snapshot into a new process that just picks up exactly where you left off.
It may not work.[1] But there's a good chance it will. This is essentially the same thing as a live migration in the CRIU docs, except that you're not migrating to a new computer/container/etc., just to the future of the same computer. So start reading with that page, and follow the links from there.
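To make that concrete, the dump/restore cycle looks roughly like this. A sketch only: the PID and image directory are placeholders, criu must run as root, and --shell-job is needed because the scripts were started from a shell (nohup):

import subprocess

PID = "12345"           # hypothetical: PID of one of your nohup'd scripts
IMG_DIR = "/root/snap"  # where CRIU writes the process image files

# Freeze the process and dump its full state (memory, open fds, ...) to disk.
subprocess.run(["criu", "dump", "-t", PID, "-D", IMG_DIR, "--shell-job"], check=True)

# ... reboot the VPS here ...

# Recreate the process from the dumped images; it resumes mid-execution.
subprocess.run(["criu", "restore", "-D", IMG_DIR, "--shell-job"], check=True)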
You should probably test before you commit to it.
* Try it (obviously don't include the system restart; just kill -9 the executable) on a Python script that doesn't do anything important, e.g. one that increments a counter, prints it out, sleeps for a second, and repeats (see the sketch after this list).
* Maybe try it on a script that does similar kinds of things to what yours are doing.
* If it's safe to have two copies of one of your programs running at the same time (they're not going to stomp all over each other writing to the same file, or fight over the same socket, or whatever), start a second copy and test dump/kill/resume that.
* Try it on one of your real processes, still without restart.
* Try it on all four.
* Cross your fingers, sacrifice a chicken, and do it for real.
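For that first test, a throwaway script along these lines (hypothetical, just a stand-in for a process whose in-memory state you'd miss) makes it easy to see whether the restored process really continues where it stopped:

# throwaway_counter.py -- dump it, kill -9 it, restore it, and check
# that the printed count continues rather than starting over from 1.
import sys
import time

count = 0
while True:
    count += 1
    sys.stdout.write("%d\n" % count)
    sys.stdout.flush()  # flush each line so the output tracks the state
    time.sleep(1)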
If that doesn't pan out, the only option I can think of is to go through your scripts, manually figure out everything that needs to be saved and how it can be reached from the top-level globals, and do that in the debugger.
Ideally, you'll write a script that automates accessing and saving all that stuff, plus another one to feed it into a new instance at restart. Then you just pdb the live interpreters and start dumping everything.
This is guaranteed to be a whole lot of work, and not much fun. On the plus side, it is guaranteed to work if you do it right. On the third hand, it's pretty easy to not do it right.
[1] If you rely on open files, pipes, sockets, etc., CRIU does about as much as you could do, which is more than you might expect at first, but still not everything you could possibly want… Also, if you're using almost all of your RAM, it can be hard to wedge things back into exactly the same state. And there are probably other possible issues.
When I run any Python script, even one that doesn't contain any code or imports that could access the internet in any way, two pythonw.exe processes pop up in my Resource Monitor under network activity. One of them is always sending more than receiving, while the other has the same activity with sending vs. receiving reversed. The amount of overall activity depends on the file size, regardless of how many lines are commented out; even a blank .py document creates network activity of about 200 kB/s. The activity drops from its peak, which is as high as 15,000 kB/s for a file with 10,000 lines, to around zero after about 20 seconds, and then the processes quit on their own. The actual script finishes running long before the network processes stop.
Because the activity depends on file size, I'm suspicious that every time I run a Python script, the whole thing is being transmitted to a server somewhere else in the world.
Is this something that could be built into Python, a virus that's infecting my computer, or just innocent activity that Python is supposed to produce?
If anyone doesn't have an answer but could check to see if this activity affects their own installation of python, that would be great. Thanks!
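One quick way to check, sketched here with the third-party psutil package (pip install psutil), is to list every internet connection owned by a Python process while a script is running:

import psutil

# Print one line per network connection owned by any python/pythonw process.
for proc in psutil.process_iter():
    try:
        name = proc.name()
        if "python" in name.lower():
            for conn in proc.connections(kind="inet"):
                print(proc.pid, name, conn.laddr, conn.raddr, conn.status)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass  # the process exited or is off-limits; skip it

The remote addresses (raddr) should make it obvious whether the traffic is loopback chatter between local processes or something actually leaving the machine.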
EDIT:
Peter Wood: to start the process, just run any Python script from the editor; it runs on its own, at least for me. I'm on 2.7.8.
Robert B, I think you may be right, but why would the communication continue after the script has finished running?
I'm running a Python script on different nodes at school using SSH. Each node has 8 cores. I use GNU Screen to be able to detach from a single process.
Is it more desirable to:
1. run several different sessions of screen, or
2. run a single screen session and background each job with & in a bash terminal?
Are they equivalent?
I am not sure if my experiments are poorly coded and taking an inordinate amount of time (very possible) OR my choice of option 1 is slowing the process down considerably. Thank you!
With bash I imagine you're doing something like this (assuming /home is under a network mount):
#!/bin/bash
# Note: {1..$NUM_NODES} would not work here -- bash brace expansion
# happens before variable expansion -- so use seq instead.
for i in $(seq 1 "$NUM_NODES")
do
    ssh "node$i" 'python /home/ryan/my_script.py' &
done
wait  # block until all remote jobs have finished
Launching this script from behind a single screen will work fine. Starting several sessions of screen provides no performance gain, but adds the extra complication of managing multiple screens.
Keep in mind that there are much better ways to distribute load across a cluster (e.g. if someone else is using up all of node7 you'd want a way to detect that and send your job elsewhere). Most clusters I've worked with have Torque, Maui or the qsub command installed. I suggest giving those a look.
I would think they are about the same. I would prefer screen just because I have an easier time managing it. Depending on what the script does, that could also have some effect on processing time.
I recently acquired a GoPro Hero 3. It's working fine, but when I attempt to stream live video/audio it glitches every now and then.
Initially I just used VLC to open the m3u8 file; when that was glitchy, I downloaded the Android app and attempted to stream through that.
It was a little better in the app.
I used Wireshark, and I think the cause is simply that the stream is not transferring/buffering fast enough. I tried just grabbing everything with wget in a loop; it got through 3 loops before it either caught up (possible, but I don't think so, though I may double-check that) or fell behind and hence timed out/hung.
There is also a delay in the image, but I can live with that.
I have tried lowering the resolution/frame rate, but I'm not sure it is actually doing anything, as I can't tell any difference; I think those settings may only apply to recording on the GoPro. Either way, it didn't help.
Essentially I am looking for any possible methods for removing this 'glitchiness'.
My current plan is to attempt writing something in Python to get the files over UDP (no TCP overhead).
I'll just add a few more details/symptoms:
* The GoPro uses Apple's m3u8 (HLS) streaming format.
* At any one time there are 16 .ts files in the folder (26 KB each).
* These get overwritten in a loop (a circular buffer).
When I stream in VLC: approx. 1 s delay; it streams fine for ~0.5 s, stops for a little less than that, then repeats. What I think is happening is that the file it's trying to transfer gets overwritten, which causes it to time out.
Over the Android app: less delay and shorter 'timeouts', but they're still there.
I want to write a Python script to try to get a continuous image. The files are small enough that they should fit in a single UDP packet (I think ... 65 KB-ish, right?).
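On the packet-size question: the maximum UDP payload is about 65,507 bytes, so a 26 KB .ts segment does fit in a single datagram. Note, though, that the GoPro serves its segments over HTTP (i.e. TCP), so UDP only helps if I add my own relay on the sending side. A minimal sketch of the receiving end; the port and file naming are assumptions:

import socket

# Listen for .ts segments pushed by a (hypothetical) relay on the sender side.
# UDP does no retransmission, so a dropped datagram means a lost segment.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))  # arbitrary port

segment = 0
while True:
    data, addr = sock.recvfrom(65535)  # one datagram = one whole segment
    segment += 1
    with open("segment_%04d.ts" % segment, "wb") as f:
        f.write(data)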
Is there anything I could change in terms of Wi-Fi settings on my laptop to improve it too?
I.e., somehow dedicate the connection to the stream?
Thanks,
Stephen
I've been working on creating a GoPro API for Node.js recently and found the device very glitchy too. It's much more stable after installing the latest GoPro firmware (3.0.0).
As for streaming, I couldn't get around the Wi-Fi latency and went for a record-and-copy approach.
Our SGE cluster setup requires a delay between the controller and the engines starting. If this delay is not there, some of the servers use "old" ipcontroller-client.json files and attempt to connect to previous (and no longer running) controllers. This is an NFS "feature", so to remedy it I set c.IPClusterStart.delay = 30 in the ipcluster_config.py file, and things work well: the controller gets submitted to SGE, has enough time to start and write its JSON files, and then the engines can start correctly against the newly running controller. However, I'd also like to be able to start the cluster from the notebook. Unfortunately, it appears that this delay is not used there: the controller and engines start up at the same time (as seen with watch qstat), some of the engines connect (because they pick up the new settings from the JSON file) and some do not (because of NFS).
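For reference, the working setting described above is just this, inside ipcluster_config.py (where get_config() is provided by IPython):

# Give the controller a head start so its fresh JSON files reach the
# engines' NFS mounts before the engines are submitted.
c = get_config()
c.IPClusterStart.delay = 30  # seconds between controller and engine start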
I ran strace on the notebook and saw that it uses the sge_controller and sge_engines scripts (created by the notebook when you press Start) to start these processes.
I'm wondering if there's any way to implement a delay here as well. It is starting the controller and engines the right way (via SGE), so I know it's reading ipcluster_config.py.
I've Googled around and searched this site, with no luck. Hoping maybe someone can shed some light on the deeper workings of this behavior.
Thanks,
Chris
Well, this is probably too late for the OP, but hopefully it helps someone.
If it is a timeout issue, just set c.EngineFactory.timeout and c.IPEngineApp.wait_for_url_file to some larger values.
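For example, in the profile's ipengine_config.py (the values here are arbitrary; pick something longer than your NFS propagation delay):

# Give a slow-starting controller more slack before the engines give up.
c = get_config()
c.EngineFactory.timeout = 30          # seconds to wait when connecting to the controller
c.IPEngineApp.wait_for_url_file = 60  # seconds to wait for the connection JSON to appear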
If it fails only after the first run, it is probably due to lingering security files, which should be deleted (ipcontroller-engine.json and ipcontroller-client.json) from the relevant IPython profile; use IPython.utils.path.get_security_file to get their full paths. To automate this and make it somewhat less painful, the deletion step can be tacked onto the beginning of the same profile's ipcluster_config.py.
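A sketch of that cleanup, placed at the top of ipcluster_config.py (the profile name here is an assumption; use yours):

import os
from IPython.utils.path import get_security_file

# Delete stale connection files so engines can't latch onto a dead controller.
for name in ("ipcontroller-engine.json", "ipcontroller-client.json"):
    try:
        path = get_security_file(name, profile="sge")  # hypothetical profile name
    except Exception:
        continue  # nothing found for this profile; nothing to clean up
    if os.path.isfile(path):
        os.remove(path)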
These changes alone were enough for me to get the cluster running with the notebook easily.
If neither of these solves the problem, there are some other ideas at http://mail.scipy.org/pipermail/ipython-user/2011-November/008741.html.