Performance running multiple Python scripts simultaneously - python

Fairly new 'programmer' here, trying to understand how Python interacts with Windows when multiple unrelated scripts are run simultaneously, for example from Task Manager or just by starting them manually from IDLE. The scripts just make HTTP calls and write files to disk, and the environment is Python 3.6.
Is the interpreter able to draw resources from the OS (processor/memory/disk) independently, such that the time to complete each script is more or less the same as it would be if it were the only script running (assuming the scripts cumulatively get nowhere near using up all the CPU or memory)? If so, what are the limitations (number of scripts, etc.)?
Pardon mistakes in terminology. Note the quotes on 'programmer'.

how Python interacts with Windows
Python is an executable, a program. When a program is executed, a new process is created.
python myscript.py starts a new python.exe process with your script as its first argument.
when multiple unrelated scripts are run simultaneously
They are multiple processes.
Is the interpreter able to draw resources from the OS (processor/memory/disk) independently?
Yes. Each process may access the OS API however it wishes, to the extent that it is possible.
What are the limitations?
Most likely RAM. The same limitations as any other process might encounter.

These are difficult questions to answer, in part because they depend on:
Your operating system: Your OS gets to schedule and run tasks when it wants, which the Python programmer often does not have control over.
What your scripts are actually doing: If your scripts are all trying to write to the same drive, their execution may be halted more often than if no device were being written to. Or the scripts might even run faster together, as the CPU can let one script calculate while another script writes. (It's hard to tell without benchmark testing.)
How many CPUs you're using: More Central Processing Units can improve parallel processing of programs -- but perhaps not. If your programs are constantly reading from and writing to the same disk, more CPUs may not be a benefit.
Your Python version: (I'm just adding this for completeness.)
Ultimately, the only way you're going to get any real information on this is if you do your own benchmarking -- and even then, you should remember that those figures you find are only applicable to your current setup. That is, if you go to another computer elsewhere, you may find you get different results.
If you aren't familiar with Python's timeit module, I recommend you look into it. (It's part of the standard library, so you already have it.) It'll help you do benchmark testing and get some definitive answers for your platform.
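If you do go down that route, here's a minimal sketch of timeit in action (write_file is just a stand-in for whatever your scripts actually do):

import timeit

def write_file():
    # stand-in workload: write a small file to disk
    with open("out.txt", "w") as f:
        f.write("hello" * 100)

# run the workload 1000 times and report the total elapsed time
elapsed = timeit.timeit(write_file, number=1000)
print(f"1000 runs took {elapsed:.3f} seconds")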
By asking questions like yours, you may soon hear about Python's GIL (Global Interpreter Lock). It has to do with Python threads; some people think it's a blessing, and some think it's a curse. Either way, this page:
https://realpython.com/python-gil/
has a good high-level explanation of what it is, when it works well, and when it might not.
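To see the GIL's effect for yourself, here's a minimal sketch; on CPython, the threaded version of this CPU-bound loop is typically no faster than the sequential one:

import threading
import time

def count(n):
    # pure-Python CPU-bound loop; it holds the GIL while it runs
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count(N)
count(N)
print(f"sequential: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
# roughly the same as (or slower than) sequential on CPython, because the
# GIL lets only one thread execute Python bytecode at a time
print(f"threaded:   {time.perf_counter() - start:.2f}s")

Note that scripts like yours, which mostly wait on HTTP responses and disk writes, release the GIL during that waiting, so this limit matters far less for them.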

Related

How far can python go?

I started Python a few months ago for school projects, and I was pretty surprised at the file-editing power it had without asking for any permissions. My question is, how far can Python go? Can it delete system files? Can it delete normal files? I also saw a video (that I didn't click) saying Python malware was really easy to make... So I am just really curious how far it goes, mostly because my IDE didn't even need admin permissions to be installed...
P.S.: not sure if this is appropriate for Stack Overflow, kinda new here :)
Python can go just as far as the user running Python can. If you have the right to delete a file, then a script you run or a command you issue in Python that deletes that file will be allowed to. Python will be acting under your user account.
Having said that, it's not always obvious what user is running Python exactly. Normally, if you start Python yourself and pass it a script, or run some commands interactively, it'll be you.
But if, for example, you start Python from a scheduled task, the user running Python won't be you by default, but some sort of system account which may have more restricted rights.
On the other hand, if you're not allowed to do something (say access a folder that has restricted access for other users only), you can still write a Python script that tries to perform the actions. If you were to run that script, it would fail, but if one of those other users logs on and runs the same script, it will succeed.
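As a small illustration (the path here is hypothetical; what happens depends entirely on the account running the script, not on Python):

import os

# hypothetical protected file; substitute any path you lack rights to
target = r"C:\Windows\System32\example.dll"

try:
    os.remove(target)
    print("deleted: the account running Python had the rights")
except PermissionError:
    print("denied by the OS, exactly as it would be for any other program")
except FileNotFoundError:
    print("no such file")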
Python is restricted in that it doesn't contain libraries for every function imaginable (although you could probably write them yourself in Python, given enough time). To get around that, you typically install third-party packages, created by people who have already written that code, further extending what Python can do. But no package should be able to get around the restrictions the OS imposes on the user running Python.
To get a sense of how complete Python is, even without third party packages, have a look at the Python Standard Library. All those things can be done with standard Python, provided the user running it is allowed to.

Control the number of CPUs used in a JupyterLab server

I'm using jupyterlab and I know that I have 12 cores available.
At the moment I use only 1 and I would like to use more.
I have tried to change the number I use by writing this in the terminal:
export JULIA_NUM_THREADS=7
but then when I print:
import threading
threading.activeCount()
>>>5
How can I make more CPUs available to my JupyterLab notebook?
This is really not my field, so I'm sorry if it is something really simple. I just don't understand what I am doing wrong and where to start from.
TL;DR: No configuration needed. It is all available to you; you just need to explicitly code what you want to run in parallel.
JULIA_NUM_THREADS is a configuration option for the Julia kernel in Jupyter, not for the Python kernel (the process that runs your notebook code).
Unless you run Jupyter inside a container, you can use all the cores available in your system out of the box. If Jupyter is in a container or a virtual machine, it will use what you allocate and nothing more.
Just remember that by default you use 1 core when you run your Jupyter kernel.
When you run threading.active_count(), the number you get back is the count of threads alive in your Python process, not a measure of how many CPUs you are using. Modern processors can run several threads on each available core, but the bad news is that the thread count tells you nothing about how well you are using the CPU.
Python can act as an orchestrator for libraries that work in parallel behind the scenes (think numpy, pandas, tensorflow...).
If you want to write Python code that uses more than 1 thread and/or 1 CPU, take a look at the multiprocessing module.
The multiprocessing module is part of the standard library, and you can use it without trouble inside Jupyter. You will probably find the Process and Pool classes useful (if you want to work with deep learning, there is a torch.multiprocessing module with the same interface but with support for working with GPUs across processes).
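A minimal sketch of the Pool interface; note that on Windows, and in some notebook setups, the worker function may need to live in an importable module rather than be defined inline:

from multiprocessing import Pool

def square(x):
    # CPU-bound work that runs in a separate worker process
    return x * x

if __name__ == "__main__":           # required on Windows; good practice anyway
    with Pool(processes=4) as pool:  # 4 worker processes; match your core count
        results = pool.map(square, range(10))
    print(results)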
A few thoughts, but too long for a comment. I am not familiar with Jupyter, only "normal" Python, so maybe this all goes in the wrong direction ;):
As far as I know, active_count (in my opinion you should not use the old camelCase name activeCount) only returns the number of active threads, not the number available. So try adding more threads. I have a quad-core and Jupyter starts with 5 threads, but I can add more.
Multithreading is not the same as multiprocessing (if you want to run on different cores, you have to use multiprocessing; see "python threads vs. multiprocessing"), so maybe you are looking for the wrong thing?

Fast/interactive development environment for python

I just posted a question here asking why Python imports take as long as they do. Are there environments that don't require reinitializing modules? If so, what are they?
Details: I'm trying to learn basic Python syntax while using extended libraries (matplotlib, mayavi), and each time I test my code I wait (several!!) seconds for the modules to load. There must be a faster way to do this, but I don't know which environments are well suited. Suggestions?
Take a look at IPython and pandas; they might be closer to what you want. Python does have a reload for modules, but I'm not sure how well it works, so anything that keeps a single Python instance running and doesn't spawn Python child processes is likely to fit the bill (sorry, not sure what's available in that area). There's a sketch of reloading after the links below.
http://ipython.org/
http://pandas.pydata.org/
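For the module reload mentioned above, a minimal sketch (mymodule is a hypothetical module you keep editing):

import importlib
import mymodule  # hypothetical: the module you are working on

# ... edit mymodule.py on disk ...

importlib.reload(mymodule)  # re-executes the module in the running interpreter

IPython also ships an autoreload extension (%load_ext autoreload, then %autoreload 2) that does this for you automatically before each command.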
Any environment with a client/server architecture (short-lived CLI/GUI/web clients, long-lived computational kernels), such as https://jupyter.org/, will do.

What scripts should not be ported from bash to python?

I decided to rewrite all our Bash scripts in Python (there are not so many of them) as my first Python project. The reason is that although I am quite fluent in Bash, I feel it's a somewhat archaic language, and since our system is in the first stages of its development, I think switching to Python now is the right thing to do.
Are there scripts that should always be written in Bash? For example, we have an init.d daemon script - is it OK to use Python for it?
We run CentOS.
Thanks.
It is OK in the sense that you can do it. But the scripts in /etc/init.d usually need to load config data and some shared shell functions (for example, to print the nice green OK on the console), which will be hard to emulate in Python.
So try to convert those which make sense (i.e. those which contain complex logic). If you need job control (starting/stopping processes), then bash is better suited than Python.
Generally, scripts in /etc/init.d are written in the "native shell" of the OS (e.g. bash, sh, POSIX sh, etc.). This is especially true of scripts that will be run at the lower init levels (e.g. not every directory will be mounted in single-user mode, including wherever Python or the site libraries might be installed).
Most OSes provide some "helper functions" that make writing scripts in the native shell easier. These scripts define certain return codes and messages that are required/desired when writing service scripts. On Red Hat based systems, see:
/etc/init.d/functions
Beyond that, the service scripts in /etc/init.d can be written in any language (including compiled languages). The general calling syntax will need to be supported. Typically there are three arguments that should be supported: start, stop, and status. Some additional arguments might be appropriate, depending on the purpose of the scripts.
% /etc/init.d/foo (start|stop|status)
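For what it's worth, a Python service script honoring that calling convention might look like this minimal sketch (the service name and actions are placeholders):

#!/usr/bin/env python
import sys

SERVICE = "foo"  # hypothetical service name

def start():
    print(f"Starting {SERVICE}...")  # launch the real daemon here

def stop():
    print(f"Stopping {SERVICE}...")  # signal the real daemon here

def status():
    print(f"{SERVICE} status unknown (stub)")

if __name__ == "__main__":
    actions = {"start": start, "stop": stop, "status": status}
    if len(sys.argv) != 2 or sys.argv[1] not in actions:
        print(f"Usage: {sys.argv[0]} {{start|stop|status}}")
        sys.exit(2)
    actions[sys.argv[1]]()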
Every task has languages that are better suited for it and others less so. Replacing sh's backtick ` command substitution is pretty ponderous in Python, as would be myriad quoting details, just to name a couple. There are likely better projects to cut your teeth on.
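For a sense of the ponderousness, here is the subprocess equivalent of that one-character sh feature (a sketch, assuming a Unix-like system with ls on the PATH):

import subprocess

# sh:  files=`ls /tmp`
files = subprocess.check_output(["ls", "/tmp"], universal_newlines=True)
print(files)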
And all that they said above about Python being relatively heavyweight and not necessarily available when needed.
Certain scripts that I write simply involve looping over a glob in some directories and then executing a piped series of commands on the results. This kind of thing is much more tedious in Python.
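For comparison, a sketch of what that glob-and-pipe pattern looks like in Python (the commands here are illustrative):

import glob
import subprocess

# bash:  for f in *.log; do grep ERROR "$f" | sort | uniq -c; done
for path in glob.glob("*.log"):
    grep = subprocess.Popen(["grep", "ERROR", path], stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort"], stdin=grep.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq", "-c"], stdin=sort.stdout)
    grep.stdout.close()  # let grep receive SIGPIPE if a downstream command exits
    sort.stdout.close()
    uniq.communicate()   # wait for the pipeline to finish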

Would Python make a good substitute for the Windows command-line/batch scripts?

I've got some experience with Bash, which I don't mind, but now that I'm doing a lot of Windows development I need to do basic stuff/write basic scripts using
the Windows command-line language. For some reason said language really irritates me, so I was considering learning Python and using that instead.
Is Python suitable for such things? Moving files around, creating scripts to do things like unzipping a backup and restoring a SQL database, etc.
Python is well suited for these tasks, and I would guess much easier to develop in and debug than Windows batch files.
The question is, I think, how easy and painless it is to ensure that all the computers you have to run these scripts on have Python installed.
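As a taste of what the tasks in the question look like in Python, here is a minimal sketch (the paths and the restore command are hypothetical; substitute your database's own CLI):

import subprocess
import zipfile

# unzip the backup archive
with zipfile.ZipFile(r"C:\backups\db_backup.zip") as archive:
    archive.extractall(r"C:\backups\restore")

# hand the restore step to the database's command-line tool
subprocess.run(
    ["sqlcmd", "-S", "localhost", "-i", r"C:\backups\restore\restore.sql"],
    check=True,
)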
Summary
Windows: no need to think, use Python.
Unix: quick or run-it-once scripts are for Bash; serious and/or long-lived scripts are for Python.
The big talk
In a Windows environment, Python is definitely the best choice since cmd is crappy and PowerShell has not really settled yet. What's more, Python can run on several platforms, so it's a better investment. Finally, Python has a huge set of libraries, so you will almost never hit the "god-I-can't-do-that" wall. This is not true for cmd and PowerShell.
In a Linux environment, this is a bit different. A lot of one-liners are shorter, faster, more efficient, and often more readable in pure Bash. But if you know your quick-and-dirty script is going to stay around for a while or will need to be improved, go for Python, since it's far easier to maintain and extend, and you will be able to do most of the tasks you can do with GNU tools using the standard library. And if you can't, you can still call the command line from a Python script.
And of course you can call Python from the shell using the -c option:
python -c "for line in open('/etc/fstab'): print(line, end='')"
Some more literature about Python used for system administration tasks:
The IBM lab point of view.
A nice example comparing Bash and Python to script a report.
The basics.
The must-have book.
Sure, Python is a pretty good choice for those tasks (I'm sure many will recommend PowerShell instead).
Here is a fine introduction from that point of view:
http://www.redhatmagazine.com/2008/02/07/python-for-bash-scripters-a-well-kept-secret/
EDIT: About gnud's concern: http://www.portablepython.com/
Are you aware of PowerShell?
Anything is a good replacement for the Batch file system in windows. Perl, Python, Powershell are all good choices.
#BKB definitely has a valid concern. Here are a couple of links you'll want to check if you run into any issues that can't be solved with the standard library:
Pywin32 is a package for working with low-level win32 APIs (advanced file system modifications, COM interfaces, etc.)
Tim Golden's Python page: he maintains a WMI wrapper package that builds off of Pywin32, but be sure to also check out his "Win32 How Do I" page for details on how to accomplish typical Windows tasks in Python.
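A minimal sketch of that WMI wrapper in action (assumes pip install pywin32 wmi; listing running processes is one of the typical tasks it makes easy):

import wmi  # third-party: Tim Golden's WMI wrapper

c = wmi.WMI()
# enumerate running processes via WMI
for process in c.Win32_Process():
    print(process.ProcessId, process.Name)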
Python is certainly well suited to that. If you're going down that road, you might also want to investigate SCons which is a build system itself built with Python. The cool thing is the build scripts are actually full-blown Python scripts themselves, so you can do anything in the build script that you could otherwise do in Python. It makes make look pretty anemic in comparison.
Upon rereading your question, I should note that SCons is more suited to building software projects than to writing system maintenance scripts. But I wouldn't hesitate to recommend Python to you in any case.
As a follow-up, after some experimentation, the thing I've found Python most useful for is any situation involving text manipulation (yourStringHere.replace(), regexes for more complex stuff) or testing some basic concept really quickly, which it is excellent for.
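A quick sketch of the kind of text manipulation meant here (the log line is made up):

import re

line = "ERROR 2008-02-07: disk full on C:"

# simple substitutions with str.replace ...
print(line.replace("ERROR", "WARN"))

# ... and regexes for anything with structure
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", line)
if match:
    print(match.group(1))  # "2008"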
For stuff like SQL DB restore scripts I find I still usually just resort to batch files, as it's usually either something short enough that it would actually take more Python code to make the appropriate system calls, or I can reuse snippets of code from other people, reducing the writing time to just enough to tweak existing code to fit my needs.
As an addendum I would highly recommend IPython as a great interactive shell complete with tab completion and easy docstring access.
I've done a decent amount of scripting in both Linux/Unix and Windows environments, in Python, Perl, batch files, Bash, etc. My advice is that if it's possible, install Cygwin and use Bash (it sounds from your description like installing a scripting language or env isn't a problem?). You'll be more comfortable with that since the transition is minimal.
If that's not an option, then here's my take. Batch files are very kludgy and limited, but make a lot of sense for simple tasks like 'copy some files' or 'restart this service'. Python will be cleaner, easier to maintain, and much more powerful. However, the downside is that you either end up calling external applications from Python with subprocess, popen, or similar, or you end up writing a bunch more code to do things that are comparatively simple in batch files, like copying a folder full of files. A lot of this depends on what your scripts are doing. Text/string processing is going to be much cleaner in Python, for example.
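Both routes look roughly like this sketch (the paths are hypothetical):

import shutil
import subprocess

# calling the external tool from Python, as described above
subprocess.run(["xcopy", r"C:\src", r"C:\dst", "/E", "/I"], check=True)

# or staying in the standard library (destination must not already exist)
shutil.copytree(r"C:\src", r"C:\dst")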
Lastly, it's probably not an attractive alternative, but you might also consider VBScript as an alternative. I don't enjoy working with it as a language personally, but if portability is any kind of concern then it wins out by virtue of being available out of the box in any copy of Windows. Because of this I've found myself writing scripts that were unwieldy as batch files in VBScript instead, since I can't usually depend on Python or Perl or Bash being available on Windows.
Python, along with Pywin32, would be fine for Windows automation. However, VBScript or JScript used with the Windows Scripting Host works just as well, and requires nothing additional to install.
I've been using a lot of Windows Script Files lately. More powerful than batch scripts, and since it uses Windows scripting, there's nothing to install.
As much as I love Python, I don't think it's a good choice to replace basic Windows batch scripts.
I can't see someone having to import modules like sys, os, or getopt to do basic things you can do with the shell, like calling a program, checking an environment variable, or reading an argument.
Also, in my experience, goto is much easier to understand for most sysadmins than a function call.
