Python program into a standard assembly?


Is it possible to convert Python programs to a Microprocessor standard assembly language like IEEE-694? The assembly syntax is close to this one
or this other one: http://www.ethicalhacker.net/content/view/152/2/

Compile Python to C, then use a C compiler of your choice to get it down to assembly.
Alternatively, use PyPy, specifying LLVM as the target, and use the LLVM Static Compiler to yield assembly language for your target architecture.

Not in the same way as C, FORTRAN, COBOL, etc. Languages that support lambda calculus or automatic memory management cannot be compiled directly to assembly. An interpreter can, however, be provided in microcode or in a bootstrap program to bridge the gap and allow "compiled" Python, LISP, etc. (Some operations, such as garbage collection, are still carried out within the embedded interpreter packaged into the compiled binary.)

Since Python is a dynamically typed language, this would only be possible if the assembly program used Python's runtime environment/library to look up objects dynamically.
So it would only be possible with some overhead.
But there is RPython from the PyPy project. It is a restricted subset of the Python language (it is no longer dynamically typed and lacks most modules from Python's standard library). RPython programs can be translated to machine code (AFAIK it generates C code as an intermediate step).
Python itself generates an intermediate code for its virtual machine. If you want to have a look at this code, use the dis module from the Python standard library. This generates an assembly-like representation of your Python function. Keep in mind that a "real" microprocessor would not be able to use this and that the result might change with the Python version you are using.
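As a small illustration of the dis module mentioned above, here is a minimal sketch. The function add and the helper opnames are made-up names for this example, and the exact opcode names in the output depend on the Python version, so no specific listing is shown.

```python
import dis


def add(a, b):
    # Hypothetical example function; any Python function works here.
    return a + b


def opnames(func):
    # Collect just the opcode names of the function's bytecode.
    return [ins.opname for ins in dis.get_instructions(func)]


# Print the assembly-like listing for the Python virtual machine.
dis.dis(add)

# Print only the opcode names; they differ between Python versions.
print(opnames(add))
```

The listing describes instructions for Python's virtual machine, not for a hardware processor.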

Related

How can I easily convert FORTRAN code to Python code (real code, not wrappers)

I have a numerical library in FORTRAN (I believe FORTRAN IV) and I want to convert it to Python code. I want real source code that I can import on any Python virtual machine --- Windows, MacOS-X, Linux, Android. I started to do this by hand, but there are about 1,000 routines in the library, so that's not a reasonable solution.

Such a tool exists for Fortran to Lisp, or Fortran to C, or even Fortran to Java. But you will never have a Fortran to Python tool, for a simple reason: unlike Fortran, Lisp or C, Python does not have GOTO [1]. And there are many GOTOs in Fortran (especially Fortran IV) code. Even if there is a theorem by Jacopini stating that you can emulate GOTO with structured programming, it's far too cumbersome to implement a real (and efficient) language conversion tool.
So not only will you need to translate the code of 1000 routines, but you will also need to understand each algorithm, with all its imbricated gotos, and translate the algorithm into a structured program before writing it in Python. Good luck!
Hey, why do you think a wrapper is bad? Windows, OSX, and Linux all have Fortran and C [2] compilers and good wrappers!
For C (not your language here, but f2c may be an option), there is SWIG, and Fortran has f2py, now integrated with numpy. SWIG has some support for Android.
By the way, instead of converting to "pure" Python, you can use NumPy: NumPy capabilities are similar to Fortran 90 (see a comparison here), so you may consider first translating your programs to F90 for a smoother transition. There also seems to be a NumPy port for Android. And in case you need NumPy on 64-bit Windows, there are binaries here.
If you decide to use wrappers, gfortran runs on Linux (simply install from distribution packages), Windows (MinGW), and Android.
If you go down that route, don't forget that you are compiling Fortran IV code, so there is the usual "one-trip loop" problem (usually a compiler option takes care of it). You will probably also have to manually convert some old, non-standard statements that modern compilers do not accept.
You have also, obviously, the option to switch your project language to Lisp or Java...
[1] You may ask: but if GOTO is the problem, how come there is a Fortran to Java tool? Well, it uses tricks with the JVM, which has internally the GOTO instruction. There is also a GOTO in Python bytecode (look for JUMP here), so there may be something to investigate here. So my previous statement is wrong: there may be a Fortran to Python tool, using bytecode tricks like in Java. But it remains to develop, and the availability of good libraries (like NumPy, matplotlib, pandas...) makes it unnecessary, to say the least.
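To show why emulating GOTO in structured code is possible but cumbersome, here is a rough sketch. It assumes a small Fortran-style loop with two labels (10 and 20) that sums the integers 1..n; the function name sum_to_n and the label-dispatch scheme are illustrative only and are not taken from any existing translator.

```python
def sum_to_n(n):
    # Emulates this hypothetical Fortran IV fragment:
    #       I = 1
    #       S = 0
    #  10   IF (I .GT. N) GOTO 20
    #       S = S + I
    #       I = I + 1
    #       GOTO 10
    #  20   CONTINUE
    i, s = 1, 0
    label = 10           # the "program counter": which label runs next
    while label != 0:    # 0 is used here to mean "fall off the end"
        if label == 10:
            if i > n:
                label = 20   # GOTO 20
            else:
                s += i
                i += 1
                label = 10   # GOTO 10
        elif label == 20:
            label = 0        # CONTINUE, then leave the routine
    return s


print(sum_to_n(10))
```

Every jump becomes an assignment to label plus another pass through the dispatch loop, which is why a mechanical translation of heavily interleaved GOTOs tends to produce slow and hard-to-read Python.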

I wrote a translator that converts a subset of Fortran into Python (and several other languages). It is only compatible with a small subset of Fortran, but I hope it will still be useful.
The translator can parse this Fortran function:
LOGICAL function is_greater_than(a, b)
    real,intent(in) :: a
    real,intent(in) :: b
    is_greater_than = a<b
end function is_greater_than
...and translate it into this Python function:
def is_greater_than(a,b):
    return a<b

What does it mean when people say CPython is written in C?

From what I know, CPython programs are compiled into intermediate bytecode, which is executed by the virtual machine. Then how does one identify, without knowing beforehand, that CPython is written in C? Isn't there some common DNA for both which can be matched to identify this?

The interpreter is written in C.
It compiles Python code into bytecode, and then an evaluation loop interprets that bytecode to run your code.
You identify what Python is written in by looking at its source code. See the source for the evaluation loop for example.
Note that the Python.org implementation is but one Python implementation. We call it CPython, because it is implemented in C. There are other implementations too, written in other languages. Jython is written in Java, IronPython in C#, and then there is PyPy, which is written in a (subset of) Python, and runs many tasks faster than CPython.
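A minimal sketch of the two points above, using only the standard library: the compile step produces a code object holding bytecode, and the running implementation can report its own name. The source string and variable names are made up for the example, and the printed implementation name depends on which interpreter runs it.

```python
import platform
import sys

# Compile a tiny piece of source into a code object (bytecode),
# then let the interpreter's evaluation loop execute it.
source = "total = sum(range(5))"
code_obj = compile(source, "<example>", "exec")
namespace = {}
exec(code_obj, namespace)

# Ask the running interpreter which implementation it is,
# for example CPython, PyPy, Jython or IronPython.
print(platform.python_implementation())
print(sys.implementation.name)

# The raw bytecode is just a bytes object attached to the code object.
print(type(code_obj.co_code), namespace["total"])
```

This only tells you the name of the implementation; to see which language it is written in, you still have to look at its source code.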

Python isn't written in C. Arguably, Python is written in an esoteric English dialect using BNF.
However, all the following statements are true:
Python is a language, consisting of a language specification and a bunch of standard modules
Python source code is compiled to a bytecode representation
this bytecode could in principle be executed directly by a suitably-designed processor but I'm not aware of one actually existing
in the absence of a processor that natively understands the bytecode, some other program must be used to translate the bytecode to something a hardware processor can understand
one real implementation of this runtime facility is CPython
CPython is itself written in C, but ...
C is a language, consisting of a language specification and a bunch of standard libraries
C source code is compiled to some bytecode format (typically something platform-specific)
this platform specific format is typically the native instruction set of some processor (in which case it may be called "object code" or "machine code")
this native bytecode doesn't retain any magical C-ness: it is just instructions. It doesn't make any difference to the processor which language the bytecode was compiled from
so the CPython executable which translates your Python bytecode is a sequence of instructions executing directly on your processor
so you have: Python bytecode being interpreted by machine code being interpreted by the hardware processor
Jython is another implementation of the same Python runtime facility
Jython is written in Java, but ...
Java is a language, consisting of a spec, standard libraries etc. etc.
Java source code is compiled to a different bytecode
Java bytecode is also executable either on suitable hardware, or by some runtime facility
The Java runtime environment which provides this facility may also be written in C
so you have: Python bytecode being interpreted by Java bytecode being interpreted by machine code being interpreted by the hardware processor
You can add more layers indefinitely: consider that your "hardware processor" may really be a software emulation, or that hardware processors may have a front-end that decodes their "native" instruction set into another internal bytecode.
All of these layers are defined by what they do (executing or interpreting instructions according to some specification), not how they implement it.
Oh, and I skipped over the compilation step. The C compiler is typically written in C (and getting any language to the stage where it can compile itself is traditionally significant), but it could just as well be written in Python or Java. Again, the compiler is defined by what it does (transforms some source language to some output such as a bytecode, according to the language spec), rather than how it is implemented.
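To make the layering concrete, here is a toy sketch of "bytecode being interpreted by a program". The instruction set (PUSH, ADD, MUL) and the function run are invented for this illustration and have nothing to do with real Python bytecode. Here the toy interpreter is itself Python, so it adds one more layer on top of whichever Python implementation runs it.

```python
def run(program):
    # A tiny stack machine: each instruction is an (opcode, argument) pair.
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop()


# "Bytecode" for the expression (2 + 3) * 4.
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
]
print(run(program))
```

The same program list could be executed by an interpreter written in C, Java or anything else; the instructions do not record which language the interpreter was written in.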

I found a good understanding of my original doubt here:
http://amitsaha.github.io/site/notes/articles/c_python_compiler_interpreter.html

Question about python construction

A friend of mine who is a programmer told me that "Python is written in Python" or something like that. He meant that the Python interpreter is written in Python (I think). I've read on some websites that Python can interpret ANY programming language in real time (even C++ and ASM). Is this true?
Could someone explain how this could be?
The only explanation that I came up with after thinking a bit is this: Python is at the same "level" as ASM, so it makes sense for Python to interpret any language (that is at a higher level). Am I right? Does this make sense?
I would be grateful if someone could explain a little about it.
Thank you

It's not true. The standard implementation of Python - CPython - is written in C, although much of the standard library is written in Python. There are other implementations in Java (Jython) and .NET (IronPython).
There is a project called PyPy which, among other things, is rewriting the C parts of Python into Python. But the main development of Python is still based on C.

Your friend told you that Python is self-hosting:
The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-hosting include kernels, assemblers, shells and revision control software.
Of course, the very first revision of Python had to be bootstrapped by some other mechanism -- perhaps C or C++ as these are fairly standard targets for lexers and parser generators.

Generally, when someone says language X is written in X, they mean that first a compiler or interpreter for X was written in assembly or other such language, compiled, and then a better compiler or interpreter was written in X.
Additionally, once a very basic compiler/interpreter for X exists, it is sometimes easier to add new language features, classes, etc. to X by writing them in X than to extend the compiler/interpreter itself.

Python is written in C (CPython) as well as Python.
Read about pypy -- that's Python written in Python.
Writing Python in Python is a two-step dance.
Write Python in some other language. C, Java, assembler, COBOL, whatever.
Once you have a working implementation of Python (i.e., passes all the tests) you can then write Python in Python.
When you read about pypy, you'll see that they do something a hair more sophisticated than this. "We are using a subset of the high-level language Python, called RPython, in which we write languages as simple interpreters with few references to and dependencies on lower level details."
So they started with a working Python and then broke the run-time into this RPython kernel which is the smallest nugget of Python goodness. Then they built the rest of Python around the RPython kernel.

Is IronPython a 100% pure Python variant?

I just downloaded the original Python interpreter from Python's site. I just want to learn this language but to start with, I want to write Windows-based standalone applications that are powered by any RDBMS. I want to bundle it like any typical Windows setup.
I searched old posts on SO and found people suggesting wxPython and py2exe. Apart from that, a few suggested IronPython since it is powered by .NET.
I want to know whether IronPython is a pure variant of Python or a modified variant. Secondly, what is the actual use of Python? Is it used like PHP, or like C# (where you can program either Windows-based apps or web apps)?

IronPython isn't a variant of Python, it is Python. It's an implementation of the Python language based on the .NET framework. So, yes, it is pure Python.
IronPython is caught up to CPython (the implementation you're probably used to) 2.6, so some of the features/changes seen in Python 2.7 or 3.x will not be present in IronPython. Also, the standard library is a bit different (but what you lose is replaced by all that .NET has to offer).
The primary application of IronPython is to script .NET applications written in C# etc., but it can also be used as a standalone. IronPython can also be used to write web applications using the Silverlight framework.
If you need access to .NET features, use IronPython. If you're just trying to make a Windows executable, use py2exe.
Update
For writing basic RDBMS apps, just use CPython (original Python), it's more extensible and faster. Then, you can use a number of tools to make it stand alone on a Windows PC. For now, though, just worry about learning Python (those skills will mostly carry over to IronPython if you choose to switch) and writing your application.

IronPython is an independent Python implementation written in C# as opposed to the original implementation, often referred to as CPython due to it being written in (no surprise) C.
Python is multi-purpose - you can use it to write web apps (often using a framework such as Django or Pylons), GUI apps (as you've mentioned), command-line tools and as a scripting language embedded inside an app written in another language (for instance, the 3D modelling tool Blender can be scripted using Python).

What does "Pure Python" mean? If you're talking about implemented in Python in the same sense that a module may be pure Python, then no, and no Python implementation is. If you mean "Compatible with CPython" then yes, code written for CPython will work in IronPython, with a few caveats. The one that's likely to matter most is that the libraries are different; for instance, code depending on ctypes or Tkinter won't work. Another difference is that IronPython lags behind CPython by a bit. The very latest version as of this writing is 2.6.1, with an alpha version supporting a few of the 2.7 language features also available.
What do you really need? If you want to learn to program with Python, and also want to produce code for Windows, you can use IronPython for that, but you can also use CPython and py2exe; both will work equally well for this, with the only differences being in the libraries.

IronPython is an implementation of Python using C#. It's just like the implementation of Python using Java by Jython. You might want to note that IronPython and Jython will always lag behind a little bit in development. However, you do get the benefit of having some libraries that are not available in the standard Python libraries. In IronPython, you will be able to get access to some of the .NET stuff, like System.Drawing and such, though by using these non-standard libraries, it will be harder to port your code to other platforms. For example, you will have to install Mono to run apps written in IronPython on Linux (on Windows you will need the .NET Framework).

Why do C programs require decompilers but Python programs don't?

If I write a Python script, anyone can simply point an editor to it and read it. But for programs written in C, one would have to use decompilers and hex tables and such. Why is that? I mean I simply can't open up the Safari web browser and look at its code.

Note: The author disavows a deep expertise in this subject. Some assertions may be incorrect.
Python actually is compiled into bytecode, which is what gets run by the python interpreter. Whenever you use a Python module, Python will generate a .pyc file with a name corresponding to the module. This is the equivalent of the .o file that's generated when you compile a C file.
So if you want something to disassemble, the .pyc file would be it :)
The process that Python goes through when compiling a module is pretty similar to what gcc or another C compiler does with C source code. The major difference is that it happens transparently as part of execution of the file. Writing the .pyc file is also optional: when running a non-module, i.e. an end-user script, Python still compiles the code to bytecode in memory but does not save a .pyc file for it.
So really your question is "Why are python programs distributed as source rather than as compiled modules?" Or, put another way, "Why are C applications distributed as compiled binaries rather than as source code?"
It used to be very common for C applications to be distributed as source code. This was back before operating systems and their various subentities (i.e. Linux distributions) became more established. Some distros, for example Gentoo, still distribute apps as source code. Apps which are a bit more cutting edge or obscure are still distributed as source code for all platforms they target.
The reason for this is compatibility, and dependencies. The reason you can run the precompiled binary Safari on a Mac, or Firefox on Ubuntu Linux, is because it's been specifically built for that operating system, architecture (e.g. x86_64), and set of libraries.
Unfortunately, compilation of a large app is pretty slow, and needs to be redone at least partially every time the app is updated. Thus the motivation for binary distributions.
So why not create a binary distribution of Python? For one thing, as Aaron mentions, modules would need to be recompiled for each new version of the Python bytecode. But this would be similar to rebuilding a C app to link with a newer version of a dynamic library — Python modules are analogous in this sense to C libraries.
The real reason is that Python compilation is very much quicker than C compilation. This is in part, I think, because of the dynamic nature of the language, and also because it's not as thorough of a compilation. This has its tradeoffs: in particular, Python apps run much more slowly than do their C counterparts, because Python has to interpret the compiled bytecode into instructions for the processor, whereas the C app already contains such instructions.
That all being said, there is a program called py2exe that will take a Python module and distribution and build a precompiled Windows executable, including in it the logic of the module and its dependencies, including Python itself. I guess the point of this is to avoid having to coerce people into installing Python on their Windows system just to run your app. Under Linux, or I think even OS X, Python is usually already installed, so precompilation is not really necessary. Linux systems also have super-dandy package managers that will transparently install dependencies such as Python if they are not already installed.
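For anyone who wants to poke at the .pyc file described above, here is a minimal sketch using the standard library modules py_compile and marshal. It assumes Python 3.7 or later, where a .pyc file starts with a 16-byte header; the file names demo.py and demo.pyc and the helper compile_and_load are made up for the example.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile


def compile_and_load(source):
    # Write the source to a temporary module, compile it to a .pyc file,
    # then read the .pyc back as raw bytes.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.py")
        pyc = os.path.join(tmp, "demo.pyc")
        with open(src, "w") as f:
            f.write(source)
        py_compile.compile(src, cfile=pyc, doraise=True)
        with open(pyc, "rb") as f:
            data = f.read()
    # Assumption: 16-byte header (Python 3.7+), then a marshalled code object.
    header, body = data[:16], data[16:]
    return header, marshal.loads(body)


header, code = compile_and_load("answer = 6 * 7\n")

# The first four header bytes are the version-specific magic number.
print(header[:4] == importlib.util.MAGIC_NUMBER)

# The loaded code object can be executed directly, without the source.
namespace = {}
exec(code, namespace)
print(namespace["answer"])
```

The magic number changes between Python versions, which is why a .pyc built by one version is generally not loaded by another.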

Python is a scripting language that runs in a virtual machine through an interpreter.
C is a compiled language: the code is compiled to binary code which the computer can run without all that extra stuff Python needs.

This is sorta a big topic. You should look into your local friendly Computer Science curriculum, you'll find a lot of great stuff on this subject there.
The short answer is that Python is an "interpreted" language, which means that it requires a machine language program (the Python interpreter) to run the Python program, adding a layer of indirection. C and C++ are different. They are compiled directly to machine code, which runs directly on your processor.
There is a lot of additional voodoo to be learned here, however. Technically Python is compiled to a bytecode, and modern interpreters do more and more "Just in Time" compilation, so the boundaries between compiled and interpreted code are getting fuzzier all the time.

In several comments you asked: "Is it then possible to compile python to an executable binary file and then simply distribute that?"
From a theoretical viewpoint, there's no question the answer is yes -- a Python program could be compiled to, and distributed as, fully compiled machine code.
From a practical viewpoint, it's open to a lot more question. There are a few things like Unladen Swallow, Psyco, Shed Skin, and PyPy that you might want to know about though.
Unladen Swallow is primarily an attempt at making Python run faster, but part of the plan to do so involves using LLVM for its back-end. LLVM can (among other things) produce native machine code output. The last couple of releases of Unladen Swallow have used LLVM for native code generation, but 1) the most recent update on the web site is from late 2009, and 2) the release notes for that version say: "The Unladen Swallow team does not recommend wide adoption of the 2009Q3 release."
Psyco works as a plug-in for Python that basically does JIT compilation, so even though it can speed up execution (quite a lot in some cases), it doesn't produce a machine-code executable you can distribute. In short, while it's sort of similar to what you want, it's not intended to do exactly what you've asked for.
Shed Skin Python-to-C++ produces C++ as its output, and you then compile the C++ and (potentially) distribute the result of that. Shedskin is currently at version 0.5 -- i.e., nobody's claiming that it's a finished, released product. On the other hand, development is ongoing, and each release does seem to include pretty substantial improvements.
PyPy is a Python implementation written in Python. Their intent is to allow code production to be "plugged in" without affecting the rest of the implementation -- but while they currently support 4 different code generation models, I don't believe any of them results in producing native machine code that runs directly on the hardware.
Bottom line: work has been done and is being done with the intent of doing what you asked about, but at least to my knowledge there's not really anything I could reasonably recommend as a finished product that you can really depend on to do the job right now. The primary emphasis is really on execution speed, not producing standalone executables.

Yes, you can - it's called disassembling, and allows you to look at the code of Safari perfectly well. The thing is, C, among other languages, compiles to native code, i.e. code that your CPU can "understand" and execute.
More or less obviously, the level of abstraction present in the instruction set of your CPU is much smaller than that of a high level language like Python. The CPU instructions are not concerned with "downloading that URI", but more "check if that bit is set in a hardware register".
So, in conclusion, the level of complexity present in a native application is much higher when looking at the machine code, so many people simply can't make any sense of what is going on there, it's hard to get the big picture. With experience and time at your hands, it is possible though - people do it all the time, reversing applications and all.

You can't open up and read the code that actually runs for Python either. Try
import dis
def foo():
    for i in range(100):
        print(i)
# dis.dis prints the disassembly itself and returns None
dis.dis(foo)
That will show you the (human readable) bytecode of the foo function. Equivalently, you can save the file and import it from the interactive Python interpreter. This will create a .pyc file named after the script (in Python 3, inside a __pycache__ directory). Open that with a hex editor and you are looking at the actual Python bytecode.
The reason for the difference is that Python changes its bytecode between releases, so you would need to distribute a different version of a binary-only release for each version of Python. This would be a pain.
C is compiled to native code, and that machine code format is much more stable, making binary-only releases possible.

Because C code is compiled to object (machine) code and Python code is compiled into an intermediate bytecode. I am not sure if you are even referring to the bytecode of Python - you must be referring to the source file itself, which is directly executable (hiding the bytecode from you!). C needs to be compiled and linked.

Python scripts are parsed and converted to binary only when they're run - i.e., they're text files and you can read them with an editor.
C code is compiled and linked to an executable binary file before it can be run. Normally, only this executable binary file is distributed - hence you need a decompiler. You can always view the source code, if you have access to it.

Not all C programs require decompilers. There's lots of C code distributed in source form. And some Python programs do require decompilers, if distributed as bytecode (.pyc files).
But, to the extent that your assumptions are valid, it's because C is a compiled language while Python is an interpreted language.

Python scripts are analogous to a man looking at a to-do list written in English (or language he understands). The man has to do all the work, every time that list of things has to be done.
If the man, instead of doing the steps on his own each time, creates and programs a robot which can carry out those steps again and again (and probably faster than him), that robot is analogous to the C program.
The man in the python case is called the "interpreter" and in the C case is called the "compiler", and the C robot is called the compiled program/executable.
When you look at the python program source, you see the to-do list. In case of the robot, you see the gears, motors and batteries, etc, which look very different from the to-do list. If you could get hold of the C "to-do" list, it looks somewhat like the python code, just in a different language.

G-WAN executes ANSI C scripts on the fly, making them work just like Python scripts.
This can be server-side scripts (using G-WAN as a Web server) or any general-purpose C program and you can link any existing library.
Oh, and G-WAN C scripts are much faster than Python, PHP or Java...
