I have a scientific code written in Python.
The code works as a proper package: it is imported and its functionality is then used. However, the workflow allows the package to be used in two "modes":
One is as a pre/post-processor to set up simulations and initial conditions, create grids (it's a fluid flow solver), manipulate data, etc. For this kind of usage, one typically writes scripts and runs them in serial on local machines. This mode takes advantage of Python's amazing ecosystem, importing and using other packages like SciPy.
The other mode uses the package's functionality for HPC applications, running on tens of thousands of processors with mpi4py. In this mode I don't need many external packages; I want to keep them to a minimum and avoid unnecessary imports (this question comes from an issue I am having on a cluster where NumPy cannot import its random module above a certain number of processes).
My question is: is there a Pythonic way to differentiate between these modes of use, making my package heavy, feature-rich, and all things great about Python in one mode, and then "trim the fat" when it is used at run time?
If it spurs an idea: when I want the package to be light and not import unnecessary packages, it is always called from a single "run.py" script, which basically bootstraps a simulation and executes it. Otherwise, the package is imported from custom scripts or from the interpreter, and I want the user to have access to all the available functionality.
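Given that run.py is the single HPC entry point, one common approach (a sketch, not the only possible answer) is to make sure nothing on run.py's import path imports SciPy and friends eagerly, and to load those heavy packages lazily from the pre/post-processing modules. The helper below follows the importlib.util.LazyLoader recipe from the Python documentation; the name lazy_import is made up for illustration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring execution of its code until first use.

    Hypothetical helper based on the importlib.util.LazyLoader recipe;
    `lazy_import` is not part of any library.
    """
    if name in sys.modules:
        # Already imported (eagerly or lazily): reuse the existing module.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this does not run the module yet; the real
    # execution happens on the first attribute access.
    loader.exec_module(module)
    return module


# Hypothetical use inside a post-processing module of the package:
#     scipy = lazy_import("scipy")
# run.py never touches scipy.*, so SciPy's code never runs on the
# compute nodes, while interactive users get it transparently.
```

Note that find_spec() on a dotted name such as "scipy.interpolate" imports the parent package eagerly, so this is best applied to top-level packages. Plain function-local imports (importing SciPy inside the function that needs it) achieve the same effect with less machinery.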
How to compile a whole Python library along with its dependencies so that it can be used in C (without invoking Python's runtime)? That is, the compiled code has the Python interpreter embedded, and Python does not need to be installed on the system.
From my understanding, when Python code is compiled using Cython it:
Does not invoke the Python runtime if the --embed argument is used
Compiles files individually
Allows for different modules to be called (from the Python runtime / other compiled Cython files)
The questions that are still unclear are:
How do I use these module files from C? Can the compiled Python files call other compiled Python files when used from C?
Does only the library entry point need to be declared, or do all functions need to be declared?
How do I manage the Python dependencies, and how do I compile them too (so that the Python runtime is not needed)?
A simplified example of a Python library called module, where __init__.py is an empty file:
module/
├── run.py
├── http/
│ ├── __init__.py
│ ├── http_request.py
http_request.py contains:
import requests

def get_ip():
    r = requests.get('https://ipinfo.io/ip')
    print(r.text)
and run.py contains the following:
from http import http_request

if __name__ == '__main__':
    http_request.get_ip()
How can I call the function get_ip from C without using the Python runtime (i.e. without needing Python installed when running the application)?
The above example is very simple. The actual use case is collecting/processing robotics data in C at a high sampling rate. Whilst C is great for basic data processing there are excellent Python libraries which allow for much more comprehensive analysis. The objective would be to call the Python libraries on data which has been partially processed in C. This would allow us to get a much more detailed understanding of the data (and process it in "real time"). The data frameworks are way too large for our team to rewrite in C.
How to compile a whole Python library along with its dependencies so that it can be used in C (without invoking Python's runtime).
This is impossible in general. Python code is practically expected to run on a Python interpreter.
Sometimes, when only a small subset of Python is used (including indirectly, by everything your Python code is using), you might use Cython. Cython is a superset of Python, but only its statically typed (cdef) subset compiles to plain C; genuine Python features used from Cython still call into the Python interpreter. So not every Python code can be cythonized into standalone C, since Python and C have very different (and incompatible) semantics and memory management.
Otherwise (and most often), the C code using your Python stuff should embed the Python interpreter.
A wiser and more robust approach, if your goal is to make a self-sufficient C library usable from many C programs (on systems without Python), is to rewrite your code in C.
You could also consider starting (in your C library) some Python process (server-like, doing your Python stuff) and using inter-process communication facilities, which are operating-system specific. Of course, Python then needs to be installed on the system of the application using your library. For example, on Linux, you might fork some Python process in your library, and use pipe(7) or unix(7) sockets to communicate from the C library to that process (perhaps using something like JSON-RPC).
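The Python side of such an IPC setup might look like the following minimal sketch: a worker that reads one JSON-RPC 2.0 request per line from stdin and writes one response per line to stdout, so the C library only needs to spawn the process and talk to it through two pipes. The method table and the mean handler are hypothetical placeholders (a real worker would register functions such as get_ip), and batch requests are not handled.

```python
import json
import sys


def mean(values):
    """Hypothetical example method: average of a list of samples."""
    return sum(values) / len(values)


# Hypothetical method table: name sent by the C side -> Python callable.
METHODS = {"mean": mean}


def handle_line(line):
    """Decode one JSON-RPC 2.0 request and return the encoded response."""
    try:
        request = json.loads(line)
    except json.JSONDecodeError:
        error = {"code": -32700, "message": "Parse error"}
        return json.dumps({"jsonrpc": "2.0", "id": None, "error": error})
    req_id = request.get("id")
    method = METHODS.get(request.get("method"))
    if method is None:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})
    result = method(*request.get("params", []))
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


def serve(stream_in=None, stream_out=None):
    """Answer requests line by line until the C side closes the pipe."""
    if stream_in is None:
        stream_in = sys.stdin
    if stream_out is None:
        stream_out = sys.stdout
    for line in stream_in:
        if line.strip():
            stream_out.write(handle_line(line) + "\n")
            # Flush so the C side, blocked on read(), gets the reply now.
            stream_out.flush()
```

A worker script built on this would end with a call to serve(); the C side then writes a line such as {"jsonrpc": "2.0", "id": 1, "method": "mean", "params": [[1, 2, 3]]} and reads back {"jsonrpc": "2.0", "id": 1, "result": 2.0}.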
Your edit (still not an MCVE) shows some HTTP interaction done in Python. You could consider doing that in C, with the help of HTTP client libraries in C like libcurl, or (if so needed) of HTTP server libraries like libonion.
So consider rewriting your stuff in C but using several existing C libraries (how and what to choose is a very different question, probably off-topic on StackOverflow). Otherwise, accept the dependencies on Python.
The actual use case is collecting/processing robotics data in C at a high sampling rate. Whilst C is great for basic data processing there are excellent Python libraries which allow for much more comprehensive analysis.
You could keep high-level things in Python (see this) but recode low-level things in C to accelerate them (much software does that, e.g. TensorFlow), perhaps as C extensions for Python or in some other process. Of course, that means some development effort. If you use a lot of Python code, I don't think getting rid of Python entirely is pragmatic. BTW, you might perhaps consider embedding some other language in your C application (e.g. Lua, Guile, OCaml, all of them rumored to be faster than Python) and keep Python for the higher level, running in some other process.
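As a minimal sketch of the "keep high-level code in Python, push hot loops into C" split, the standard-library ctypes module can call a function in a compiled shared library directly. Here libm's cos stands in for your own C routine; this assumes a POSIX system, and writing a proper C extension module (or using Cython) is the more robust option for anything non-trivial.

```python
import ctypes
import ctypes.util

# Locate the C math library (e.g. libm.so.6 on glibc Linux); it stands in
# for a shared library you would compile from your own low-level C code.
_libm_path = ctypes.util.find_library("m")
# Fall back to the symbols already loaded into this process (dlopen(NULL)),
# which on POSIX systems usually include libm via the interpreter itself.
_libm = ctypes.CDLL(_libm_path) if _libm_path else ctypes.CDLL(None)

# Declare the C signature: double cos(double);
_libm.cos.argtypes = [ctypes.c_double]
_libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the C implementation of cos from Python."""
    return _libm.cos(x)
```

Passing arrays of samples works the same way, with pointer argument types such as ctypes.POINTER(ctypes.c_double) declared in argtypes.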
You need to put more effort into the architectural design of your thing. I'm not sure that avoiding Python entirely is a wise thing to do. Mixing Python and C (perhaps by having several processes cooperating) could be wiser. Of course you'll have operating-system-specific stuff (notably on the C side, for inter-process communication). If you're on Linux, read something about Linux system programming in C, e.g. ALP or something newer.
I'm designing musical training games using JUCE -- a multiplatform C++ framework that allows me to code audio/visuals close to the wire.
However, I have coded my gameplay (control flow / data-processing) in Python -- it is complex and I wish to keep changing it so I can experiment with different gameplays. Python is ideal for this kind of rapid prototyping work.
So I would like my (platform independent, so Win/OSX/Lin/iOS/And) C++ to start up a Python runtime, feed it a .py file, and then call various functions in that .py. Also I would like to be able to call back to the C++ code from the .py.
Here is the relevant official Python documentation: https://docs.python.org/2/extending/extending.html
And here is a CodeProject article: http://www.codeproject.com/Articles/11805/Embedding-Python-in-C-C-Part-I
However, neither of them seem to address the issue of multiplatform.
The technique seems to be to link with the library libpython.a and #include <Python.h>, which contains the various functions for starting up the runtime environment, loading scripts, executing Python code, etc.
But surely this libpython.a would need to be compiled separately per platform? If so, this wouldn't be a very clean solution, so could I instead add the Python source code to my project and get it to compile the .a?
How can I go about doing this?
EDIT: https://wiki.python.org/moin/boost.python/EmbeddingPython
EDIT2: I'm pretty sure trying to bring in the full CPython source code is overkill here -- someone must have made some stripped-down Python implementation in C/C++ that doesn't support any system calls/multithreading/fancy stuff and just works through Python syntax line by line. I've been looking through https://wiki.python.org/moin/PythonImplementations, but I can't see an obvious candidate.
EDIT3: https://github.com/micropython/micropython should be added to that last page, but still it doesn't look like it is what I'm after
There's an entire chapter of the Python docs that explains the different approaches you can take to embed a Python interpreter into another app.
Embedding Python is similar to extending it, but not quite. The
difference is that when you extend Python, the main program of the
application is still the Python interpreter, while if you embed
Python, the main program may have nothing to do with Python — instead,
some parts of the application occasionally call the Python interpreter
to run some Python code.
So if you are embedding Python, you are providing your own main
program. One of the things this main program has to do is initialize
the Python interpreter. At the very least, you have to call the
function Py_Initialize(). There are optional calls to pass command
line arguments to Python. Then later you can call the interpreter from
any part of the application.
There are several different ways to call the interpreter: you can pass
a string containing Python statements to PyRun_SimpleString(), or you
can pass a stdio file pointer and a file name (for identification in
error messages only) to PyRun_SimpleFile(). You can also call the
lower-level operations described in the previous chapters to construct
and use Python objects.
A simple demo of embedding Python can be found in the directory
Demo/embed/ of the source distribution.
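On the Python side, the .py file the C++ host loads can simply expose plain functions that the host looks up and calls (with the C API, for example via PyImport_ImportModule and PyObject_CallMethod). The sketch below assumes the host registers a built-in module named host (hypothetical; in CPython such a module is typically registered with PyImport_AppendInittab before Py_Initialize) through which the script calls back into C++. When the script runs outside the app, it falls back to a stub, so the gameplay logic can still be prototyped and tested on its own.

```python
"""gameplay.py: hypothetical script loaded by the C++ (JUCE) host."""

try:
    # Hypothetical module that the C++ host registers before Py_Initialize().
    import host
except ImportError:
    class _StubHost:
        """Stand-in used when the script runs without the C++ app."""

        def __init__(self):
            self.played = []

        def play_note(self, pitch, velocity):
            # Record calls instead of producing sound.
            self.played.append((pitch, velocity))

    host = _StubHost()


score = 0


def on_note_played(expected_pitch, played_pitch):
    """Called by the C++ side whenever the player hits a note."""
    global score
    if played_pitch == expected_pitch:
        score += 1
        host.play_note(expected_pitch, 100)  # call back into C++
    return score
```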
I recently decided to create a project that mixes C++ with Python, thus getting the best of both worlds. My idea was to do rapid prototyping of classes and functions in Python for obvious reasons, but still be able to call C++ code from Python (for obvious reasons as well). So instead of embedding Python in the C++ framework, I suggest you do the opposite: embed your C++ framework into a Python project. To do so, you just have to write very simple interface files and let SWIG take care of the interfacing part.
If you want to start from scratch, there's a nice tool called cookiecutter that can be used to generate project templates. You can choose either cookiecutter-pypackage or cookiecutter-pylibrary, the latter improving over the former as described here. Interestingly, you can also use cookiecutter to generate the structure of a C++ project. This empty project uses the CMake build system, which IMHO is the best framework for developing platform-independent C++ code. I then had to decide on the directory structure for this mixed project; one of my previous posts describes this in detail. Good luck!
I'm using SWIG to embed Python into my C++ application, and to extend it as well, i.e. to access my C++ API from Python outside my application. SWIG and Python are multi-platform, so that is not really an issue. One of the main advantages of SWIG is that it can generate bindings for a lot of languages. There are also many other C++ wrappers that could be used, for example Boost.Python or Cython.
Check these links on SO:
Extending python - to swig, not to swig or Cython
Exposing a C++ API to Python
Or you can go the hard way and use the plain Python/C API.
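On the multi-platform question: "multi-platform" still means one interpreter build (and one libpython) per target platform. Each installed Python can report the headers and library you would compile and link an embedder against, which a build system such as CMake could query. A small sketch using the standard sysconfig module follows; the helper name embedding_build_info is made up, and on Python 3.8+ the python3-config --embed script reports similar information on POSIX systems.

```python
import sysconfig


def embedding_build_info():
    """Collect the per-platform paths needed to compile a C/C++ embedder.

    Hypothetical helper; every value comes from the running interpreter.
    """
    return {
        # e.g. "linux-x86_64", "win-amd64"
        "platform": sysconfig.get_platform(),
        # Directory containing Python.h
        "include_dir": sysconfig.get_paths()["include"],
        # Directory and file name of libpython (may be None on Windows)
        "lib_dir": sysconfig.get_config_var("LIBDIR"),
        "library": sysconfig.get_config_var("LDLIBRARY"),
        # "major.minor", which must match the headers and library used
        "version": sysconfig.get_python_version(),
    }


if __name__ == "__main__":
    for key, value in embedding_build_info().items():
        print(f"{key}: {value}")
```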
I was wondering if anyone has been able to compile Django-based projects (into shared object libs, for instance) with Pyrex (or anything similar) while still keeping the flexibility of normal Django projects with Python.
We have to be able to use the project with apache so it cannot be compiled into a standalone binary. The way I think of it is that it will be compiled into libs and these libs will be exposed to the interpreter so it should behave like the current state of the project with python. Preferably without writing a lot of C code :)
Thanks in advance.
Pyrex and its successor, Cython, are not fully Python compatible; they are rather another language, although Python based.
Django is a very complex project and would require strict Python compliance to run. I doubt it would be possible to make Django work directly in Cython or Pyrex without some months of work, although one could use a profiler to turn specific bottlenecks into native code with much less effort (by replacing individual Django modules in the core with ones optimized with Cython).
Moreover, optimization with Pyrex/Cython is not that "free" - one can have around a 30% speedup by running simple, numeric intensive code, in Cython without any changes in the code - but for greater speed increase, the code has to be manually tweaked so that some variables are made statically typed.
On the other hand, you might try running Django with PyPy;
there are some hints here:
http://reinout.vanrees.org/weblog/2011/06/06/django-and-pypy.html
PyPy is an extremely conformant Python interpreter, and Django core is known to work with it. It uses a just-in-time (JIT) compilation approach that makes it several times faster than the reference implementation of Python (CPython) for most workloads.
If I write a Python script, anyone can simply point an editor at it and read it. But for programs written in C, one would have to use decompilers, hex tables and such. Why is that? I mean, I simply can't open up the Safari web browser and look at its code.
Note: The author disavows a deep expertise in this subject. Some assertions may be incorrect.
Python actually is compiled into bytecode, which is what gets run by the Python interpreter. Whenever you import a Python module, Python will generate a .pyc file with a name corresponding to the module (in Python 3, inside a __pycache__ directory). This is the equivalent of the .o file that's generated when you compile a C file.
So if you want something to disassemble, the .pyc file would be it :)
The process that Python goes through when compiling a module is pretty similar to what gcc or another C compiler does with C source code. The major difference is that it happens transparently as part of execution of the file. Caching the result is also optional: when running a non-module, i.e. an end-user script, Python still compiles the code to bytecode in memory but does not save a .pyc file for it.
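To see that even a top-level script goes through the bytecode compiler, one can call the built-in compile() directly: it turns source text into a code object in memory (no .pyc is written), which exec() then runs on the interpreter. A small sketch; the source string and the "<script>" file name are made up.

```python
import dis

source = "total = sum(range(10))\n"

# Compile source text to a code object, as Python does for any script.
code = compile(source, "<script>", "exec")
print(type(code).__name__)  # prints "code"

# The code object carries the raw bytecode as a bytes object.
print(type(code.co_code).__name__)  # prints "bytes"

# Human-readable listing of the bytecode instructions.
dis.dis(code)

# Run the compiled code object in a fresh namespace.
namespace = {}
exec(code, namespace)
print(namespace["total"])  # prints 45
```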
So really your question is "Why are python programs distributed as source rather than as compiled modules?" Or, put another way, "Why are C applications distributed as compiled binaries rather than as source code?"
It used to be very common for C applications to be distributed as source code. This was back before operating systems and their various subentities (e.g. Linux distributions) became more established. Some distros, for example Gentoo, still distribute apps as source code. Apps which are a bit more cutting-edge or obscure are still distributed as source code for all platforms they target.
The reason for this is compatibility, and dependencies. The reason you can run the precompiled binary Safari on a Mac, or Firefox on Ubuntu Linux, is because it's been specifically built for that operating system, architecture (e.g. x86_64), and set of libraries.
Unfortunately, compilation of a large app is pretty slow, and needs to be redone at least partially every time the app is updated. Thus the motivation for binary distributions.
So why not create a binary distribution of Python? For one thing, as Aaron mentions, modules would need to be recompiled for each new version of the Python bytecode. But this would be similar to rebuilding a C app to link with a newer version of a dynamic library — Python modules are analogous in this sense to C libraries.
The real reason is that Python compilation is much quicker than C compilation. This is partly, I think, because of the dynamic nature of the language, and also because it's not as thorough a compilation. This has its trade-offs: in particular, Python apps run much more slowly than their C counterparts, because Python has to interpret the compiled bytecode into instructions for the processor, whereas the C app already contains such instructions.
That all being said, there is a program called py2exe that will take a Python module and build a precompiled Windows executable, bundling the logic of the module and its dependencies, including Python itself. I guess the point of this is to avoid having to coerce people into installing Python on their Windows system just to run your app. Under Linux, or I think even OS X, Python is usually already installed, so precompilation is not really necessary. Linux systems also have super-dandy package managers that will transparently install dependencies such as Python if they are not already installed.
Python is a scripting language; it runs in a virtual machine through an interpreter.
C is a compiled language, the code compiled to binary code which the computer can run without all that extra stuff Python needs.
This is sorta a big topic. You should look into your local friendly Computer Science curriculum, you'll find a lot of great stuff on this subject there.
The short answer is that Python is an "interpreted" language, which means that it requires a machine-language program (the Python interpreter) to run the Python program, adding a layer of indirection. C and C++ are different: they are compiled directly to machine code, which runs directly on your processor.
There is a lot of additional voodoo to be learned here, however. Technically Python is compiled to a bytecode, and modern interpreters do more and more "Just in Time" compilation, so the boundaries between compiled and interpreted code are getting fuzzier all the time.
In several comments you asked: "Is it then possible to compile python to an executable binary file and then simply distribute that?"
From a theoretical viewpoint, there's no question the answer is yes -- a Python program could be compiled to, and distributed as, fully compiled machine code.
From a practical viewpoint, it's open to a lot more question. There are a few things like Unladen Swallow, Psyco, Shed Skin, and PyPy that you might want to know about though.
Unladen Swallow is primarily an attempt at making Python run faster, but part of the plan to do so involves using LLVM for its back-end. LLVM can (among other things) produce native machine code output. The last couple of releases of Unladen Swallow have used LLVM for native code generation, but 1) the most recent update on the web site is from late 2009, and 2) the release notes for that version say: "The Unladen Swallow team does not recommend wide adoption of the 2009Q3 release."
Psyco works as a plug-in for Python that basically does JIT compilation, so even though it can speed up execution (quite a lot in some cases), it doesn't produce a machine-code executable you can distribute. In short, while it's sort of similar to what you want, it's not intended to do exactly what you've asked for.
Shed Skin, a Python-to-C++ compiler, produces C++ as its output, and you then compile the C++ and (potentially) distribute the result of that. Shed Skin is currently at version 0.5 -- i.e., nobody's claiming that it's a finished, released product. On the other hand, development is ongoing, and each release does seem to include pretty substantial improvements.
PyPy is a Python implementation written in Python. Their intent is to allow code production to be "plugged in" without affecting the rest of the implementation -- but while they currently support 4 different code generation models, I don't believe any of them results in producing native machine code that runs directly on the hardware.
Bottom line: work has been done and is being done with the intent of doing what you asked about, but at least to my knowledge there's not really anything I could reasonably recommend as a finished product that you can really depend on to do the job right now. The primary emphasis is really on execution speed, not producing standalone executables.
Yes, you can - it's called disassembling, and allows you to look at the code of Safari perfectly well. The thing is, C, among other languages, compiles to native code, i.e. code that your CPU can "understand" and execute.
More or less obviously, the level of abstraction present in the instruction set of your CPU is much smaller than that of a high level language like Python. The CPU instructions are not concerned with "downloading that URI", but more "check if that bit is set in a hardware register".
So, in conclusion, the level of complexity present in a native application is much higher when looking at the machine code, so many people simply can't make any sense of what is going on there, it's hard to get the big picture. With experience and time at your hands, it is possible though - people do it all the time, reversing applications and all.
You can't open up and read the code that actually runs for Python either. Try:
import dis

def foo():
    for i in range(100):
        print(i)

dis.dis(foo)
That will show you the (human-readable) bytecode of the foo function. Alternatively, you can save the file and import it from the interactive Python interpreter. This will create a .pyc file with the same basename as the script (in Python 3, inside a __pycache__ directory). Open that with a hex editor and you are looking at the actual Python bytecode.
The reason for the difference is that Python changes its bytecode between releases, so you would need to distribute a different binary-only release for each version of Python. This would be a pain.
With C, the code is compiled to native machine code, which is much more stable for a given platform, making binary-only releases possible.
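The version dependence is visible in the .pyc file itself. The sketch below (Python 3.7+, where the header is 16 bytes per PEP 552) byte-compiles a throwaway module with py_compile, checks that the file starts with the running interpreter's importlib.util.MAGIC_NUMBER, and then loads the marshalled code object from the rest of the file. The module contents and the load_pyc helper are made up; the marshal format is explicitly not guaranteed to stay the same across Python versions.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile


def load_pyc(pyc_path):
    """Return the code object stored in a .pyc written by this interpreter."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    # Bytes 0-3: magic number, which changes between Python versions.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was written by a different Python version")
    # Bytes 4-15: flags plus source mtime/size or a hash (PEP 552).
    return marshal.loads(data[16:])


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example_mod.py")
    with open(src, "w") as f:
        f.write("def add(a, b):\n    return a + b\n")
    # Writes __pycache__/example_mod.<cache_tag>.pyc and returns its path.
    pyc_path = py_compile.compile(src, doraise=True)
    code = load_pyc(pyc_path)

namespace = {}
exec(code, namespace)
print(namespace["add"](2, 3))  # prints 5
```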
Because C code is compiled to object (machine) code, while Python code is compiled to an intermediate bytecode. I am not sure you are even referring to Python's bytecode; you must be referring to the source file itself, which is directly executable (hiding the bytecode from you!). C needs to be compiled and linked.
Python scripts are parsed and compiled to bytecode only when they're run, i.e. they're text files and you can read them with an editor.
C code is compiled and linked into an executable binary file before it can be run. Normally, only this executable binary file is distributed, hence you need a decompiler. You can always view the source code if you have access to it.
Not all C programs require decompilers. There's lots of C code distributed in source form. And some Python programs do require decompilers, if distributed as bytecode (.pyc files).
But, to the extent that your assumptions are valid, it's because C is a compiled language while Python is an interpreted language.
Python scripts are analogous to a man looking at a to-do list written in English (or a language he understands). The man has to do all the work, every time that list of things has to be done.
If the man, instead of doing the steps on his own each time, creates and programs a robot which can carry out those steps again and again (and probably faster than him), that robot is analogous to the C program.
The man in the Python case is called the "interpreter" and in the C case is called the "compiler", and the C robot is called the compiled program/executable.
When you look at the Python program source, you see the to-do list. In the case of the robot, you see the gears, motors, batteries, etc., which look very different from the to-do list. If you could get hold of the C "to-do" list, it would look somewhat like the Python code, just in a different language.
G-WAN executes ANSI C scripts on the fly, making them just like Python scripts.
This can be server-side scripts (using G-WAN as a Web server) or any general-purpose C program and you can link any existing library.
Oh, and G-WAN C scripts are much faster than Python, PHP or Java...