Python optimization through bytecode

Python optimization through bytecode - python

When it comes to optimization of Python code (or any code), it often ends up coming down to profiling to optimize bottlenecks or slow functions. However, when optimizing these areas of code, are there any use cases for using pure Python bytecode inline? I know you can do this through the use of the compile built-in function and compiler module.

... are there any use cases for using pure Python bytecode inline?
Yes. Sometimes you can handroll a little faster code than Python normally generates on its own.
Also, you can gain access to the loop induction variables for list comprehensions.
Here are some links to get you started: https://www.google.com/search?q=python+bytecode+hacks
If you would like to make the bytecode manipulations programmatically, here is an optimization recipe that shows how to go about it: Decorator for BindingConstants at compile time
That said, if you care about speed, the simplest speed-up is often to run pypy instead of cpython if your application allows it.

No. The source code is compiled to bytecode only once, when the module is first loaded. The bytecode is what is interpreted at runtime. So even if you could put bytecode inline into your source, it would at most only affect the startup time of the program by reducing the amount of time Python spent converting the source code into bytecode. It wouldn't actually change how fast that code runs. For instance, a "pure bytecode" version of a loop that runs 1000 times wouldn't run any faster than the same loop written in Python source code. See this similar question and also this one.
compile and compiler exist to let you dynamically create new executable source code from strings while the program is running. This isn't for a performance benefit, it's just because there's no other way to do it. When you run a Python program, Python compiles its source code to bytecode. But if you want to dynamically create new functions that aren't directly present in the source code (e.g., by mixing and matching code fragments, or allowing users to type in code while the program is running), you need a way to compile them on the fly, and that's what compile is for. (This is actually not a common need, so those functions aren't used that often.)

Related

LinkedList on python and c++ [duplicate]

Why does Python seem slower, on average, than C/C++? I learned Python as my first programming language, but I've only just started with C and already I feel I can see a clear difference.

Python is a higher level language than C, which means it abstracts the details of the computer from you - memory management, pointers, etc, and allows you to write programs in a way which is closer to how humans think.
It is true that C code usually runs 10 to 100 times faster than Python code if you measure only the execution time. However if you also include the development time Python often beats C. For many projects the development time is far more critical than the run time performance. Longer development time converts directly into extra costs, fewer features and slower time to market.
Internally the reason that Python code executes more slowly is because code is interpreted at runtime instead of being compiled to native code at compile time.
Other interpreted languages such as Java bytecode and .NET bytecode run faster than Python because the standard distributions include a JIT compiler that compiles bytecode to native code at runtime. The reason why CPython doesn't have a JIT compiler already is because the dynamic nature of Python makes it difficult to write one. There is work in progress to write a faster Python runtime so you should expect the performance gap to be reduced in the future, but it will probably be a while before the standard Python distribution includes a powerful JIT compiler.

CPython is particularly slow because it has no Just in Time optimizer (since it's the reference implementation and chooses simplicity over performance in certain cases). Unladen Swallow is a project to add an LLVM-backed JIT into CPython, and achieves massive speedups. It's possible that Jython and IronPython are much faster than CPython as well as they are backed by heavily optimized virtual machines (JVM and .NET CLR).
One thing that will arguably leave Python slower however, is that it's dynamically typed, and there is tons of lookup for each attribute access.
For instance calling f on an object A will cause possible lookups in __dict__, calls to __getattr__, etc, then finally call __call__ on the callable object f.
With respect to dynamic typing, there are many optimizations that can be done if you know what type of data you are dealing with. For example in Java or C, if you have a straight array of integers you want to sum, the final assembly code can be as simple as fetching the value at the index i, adding it to the accumulator, and then incrementing i.
In Python, this is very hard to make code this optimal. Say you have a list subclass object containing ints. Before even adding any, Python must call list.__getitem__(i), then add that to the "accumulator" by calling accumulator.__add__(n), then repeat. Tons of alternative lookups can happen here because another thread may have altered for example the __getitem__ method, the dict of the list instance, or the dict of the class, between calls to add or getitem. Even finding the accumulator and list (and any variable you're using) in the local namespace causes a dict lookup. This same overhead applies when using any user defined object, although for some built-in types, it's somewhat mitigated.
It's also worth noting, that the primitive types such as bigint (int in Python 3, long in Python 2.x), list, set, dict, etc, etc, are what people use a lot in Python. There are tons of built in operations on these objects that are already optimized enough. For example, for the example above, you'd just call sum(list) instead of using an accumulator and index. Sticking to these, and a bit of number crunching with int/float/complex, you will generally not have speed issues, and if you do, there is probably a small time critical unit (a SHA2 digest function, for example) that you can simply move out to C (or Java code, in Jython). The fact is, that when you code C or C++, you are going to waste lots of time doing things that you can do in a few seconds/lines of Python code. I'd say the tradeoff is always worth it except for cases where you are doing something like embedded or real time programming and can't afford it.

Compilation vs interpretation isn't important here: Python is compiled, and it's a tiny part of the runtime cost for any non-trivial program.
The primary costs are: the lack of an integer type which corresponds to native integers (making all integer operations vastly more expensive), the lack of static typing (which makes resolution of methods more difficult, and means that the types of values must be checked at runtime), and the lack of unboxed values (which reduce memory usage, and can avoid a level of indirection).
Not that any of these things aren't possible or can't be made more efficient in Python, but the choice has been made to favor programmer convenience and flexibility, and language cleanness over runtime speed. Some of these costs may be overcome by clever JIT compilation, but the benefits Python provides will always come at some cost.

The difference between python and C is the usual difference between an interpreted (bytecode) and compiled (to native) language. Personally, I don't really see python as slow, it manages just fine. If you try to use it outside of its realm, of course, it will be slower. But for that, you can write C extensions for python, which puts time-critical algorithms in native code, making it way faster.

Python is typically implemented as a scripting language. That means it goes through an interpreter which means it translates code on the fly to the machine language rather than having the executable all in machine language from the beginning. As a result, it has to pay the cost of translating code in addition to executing it. This is true even of CPython even though it compiles to bytecode which is closer to the machine language and therefore can be translated faster. With Python also comes some very useful runtime features like dynamic typing, but such things typically cannot be implemented even on the most efficient implementations without heavy runtime costs.
If you are doing very processor-intensive work like writing shaders, it's not uncommon for Python to be somewhere around 200 times slower than C++. If you use CPython, that time can be cut in half but it's still nowhere near as fast. With all those runtmie goodies comes a price. There are plenty of benchmarks to show this and here's a particularly good one. As admitted on the front page, the benchmarks are flawed. They are all submitted by users trying their best to write efficient code in the language of their choice, but it gives you a good general idea.
I recommend you try mixing the two together if you are concerned about efficiency: then you can get the best of both worlds. I'm primarily a C++ programmer but I think a lot of people tend to code too much of the mundane, high-level code in C++ when it's just a nuisance to do so (compile times as just one example). Mixing a scripting language with an efficient language like C/C++ which is closer to the metal is really the way to go to balance programmer efficiency (productivity) with processing efficiency.

Comparing C/C++ to Python is not a fair comparison. Like comparing a F1 race car with a utility truck.
What is surprising is how fast Python is in comparison to its peers of other dynamic languages. While the methodology is often considered flawed, look at The Computer Language Benchmark Game to see relative language speed on similar algorithms.
The comparison to Perl, Ruby, and C# are more 'fair'

Aside from the answers already posted, one thing is Python's ability to change things during runtime, which you can't do in other languages such as C. You can add member functions to classes as you go.
Also, Pythons' dynamic nature makes it impossible to say what type of parameters will be passed to a function, which in turn makes optimizing a whole lot harder.
RPython seems to be a way of getting around the optimization problem.
Still, it'll probably won't be near the performance of C for number-crunching and the like.

C and C++ compile to native code- that is, they run directly on the CPU. Python is an interpreted language, which means that the Python code you write must go through many, many stages of abstraction before it can become executable machine code.

Python is a high-level programming language. Here is how a python script runs:
The python source code is first compiled into Byte Code. Yes, you heard me right! Though Python is an interpreted language, it first gets compiled into byte code. This byte code is then interpreted and executed by the Python Virtual Machine(PVM).
This compilation and execution are what make Python slower than other low-level languages such as C/C++. In languages such as C/C++, the source code is compiled into binary code which can be directly executed by the CPU thus making their execution efficient than that of Python.

This answer applies to python3. Most people do not know that a JIT-like compile occurs whenever you use the import statement. CPython will search for the imported source file (.py), take notice of the modification date, then look for compiled-to-bytecode file (.pyc) in a subfolder named "_ _ pycache _ _" (dunder pycache dunder). If everything matches then your program will use that bytecode file until something changes (you change the source file or upgrade Python)
But this never happens with the main program which is usually started from a BASH shell, interactively or via. Here is an example:
#!/usr/bin/python3
# title : /var/www/cgi-bin/name2.py
# author: Neil Rieck
# edit : 2019-10-19
# ==================
import name3 # name3.py will be cache-checked and/or compiled
import name4 # name4.py will be cache-checked and/or compiled
import name5 # name5.py will be cache-checked and/or compiled
#
def main():
#
# code that uses the imported libraries goes here
#
if __name__ == "__main__":
main()
#
Once executed, the compiled output code will be discarded. However, your main python program will be compiled if you start up via an import statement like so:
#!/usr/bin/python3
# title : /var/www/cgi-bin/name1
# author: Neil Rieck
# edit : 2019-10-19
# ==================
import name2 # name2.py will be cache-checked and/or compiled
#name2.main() #
And now for the caveats:
if you were testing code interactively in the Apache area, your compiled file might be saved with privs that Apache can't read (or write on a recompile)
some claim that the subfolder "_ _ pycache _ _" (dunder pycache dunder) needs to be available in the Apache config
will SELinux allow CPython to write to subfolder (this was a problem in CentOS-7.5 but I believe a patch has been made available)
One last point. You can access the compiler yourself, generate the pyc files, then change the protection bits as a workaround to any of the caveats I've listed. Here are two examples:
method #1
=========
python3
import py_compile
py_compile("name1.py")
exit()
method #2
=========
python3 -m py_compile name1.py

python is interpreted language is not complied and its not get combined with CPU hardware
but I have a solutions for increase python as a faster programing language
1.Use python3 for run and code python command like Ubuntu or any Linux distro use python3 main.py and update regularly your python so you python3 framework modules and libraries i will suggest use pip 3.
2.Use [Numba][1] python framework with JIT compiler this framework use for data visualization but you can use for any program this framework use GPU acceleration of your program.
3.Use [Profiler optimizing][1] so this use for see with function or syntax for bit longer or faster also have use full to change syntax as a faster for python its very god and work full so this give a with function or syntax using much more time execution of code.
4.Use multi threading so making multiprocessing of program for python so use CPU cores and threads so this make your code much more faster.
5.Using C,C#,C++ increasing python much more faster i think its called parallel programing use like a [cpython][1] .
6.Debug your code for test your code to make not bug in your code so then you will get little bit your code faster also have one more thing Application logging is for debugging code.
and them some low things that makes your code faster:
1.Know the basic data structures for using good syntax use make best code.
2.make a best code have Reduce memory footprinting.
3.Use builtin functions and libraries.
4.Move calculations outside the loop.
5.keep your code base small.
so using this thing then get your code much more faster yes so using this python not a slow programing language

Embed python into fortran 90

I was looking at the option of embedding python into fortran90 to add python functionality to my existing fortran90 code. I know that it can be done the other way around by extending python with fortran90 using the f2py from numpy. But, i want to keep my super optimized main loop in fortran and add python to do some additional tasks / evaluate further developments before I can do it in fortran, and also to ease up code maintenance. I am looking for answers for the following questions:
1) Is there a library that already exists from which I can embed python into fortran? (I am aware of f2py and it does it the other way around)
2) How do we take care of data transfer from fortran to python and back?
3) How can we have a call back functionality implemented? (Let me describe the scenario a bit....I have my main_fortran program in Fortran, that call Func1_Python module in python. Now, from this Func1_Python, I want to call another function...say Func2_Fortran in fortran)
4) What would be the impact of embedding the interpreter of python inside fortran in terms of performance....like loading time, running time, sending data (a large array in double precision) across etc.
Thanks a lot in advance for your help!!
Edit1: I want to set the direction of the discussion right by adding some more information about the work I am doing. I am into scientific computing stuff. So, I would be working a lot on huge arrays / matrices in double precision and doing floating point operations. So, there are very few options other than fortran really to do the work for me. The reason i want to include python into my code is that I can use NumPy for doing some basic computations if necessary and extend the capabilities of the code with minimal effort. For example, I can use several libraries available to link between python and some other package (say OpenFoam using PyFoam library).

1. Don't do it
I know that you're wanting to add Python code inside a Fortan program, instead of having a Python program with Fortran extensions. My first piece of advice is to not do this. Fortran is faster than Python at array arithmetic, but Python is easier to write than Fortran, it's easier to extend Python code with OOP techniques, and Python may have access to libraries that are important to you. You mention having a super-optimized main loop in Fortran; Fortran is great for super-optimized inner loops. The logic for passing a Fortran array around in a Python program with Numpy is much more straightforward than what you would have to do to correctly handle a Python object in Fortran.
When I start a scientific computing project from scratch, I always write first in Python, identify performance bottlenecks, and translate those into Fortran. Being able to test faster Fortran code against validated Python code makes it easier to show that the code is working correctly.
Since you have existing code, extending the Python code with a module made in Fortran will require refactoring, but this process should be straightforward. Separate the initialization code from the main loop, break the loop into logical pieces, wrap each of these routines in a Python function, and then your main Python code can call the Fortran subroutines and interleave these with Python functions as appropriate. In this process, you may be able to preserve a lot of the optimizations you have in your main loop in Fortran. F2PY is a reasonably standard tool for this, so it won't be tough to find people who can help you with whatever problems will arise.
2. System calls
If you absolutely must have Fortran code calling Python code, instead of the other way around, the simplest way to do this is to just have the Fortran code write some data to disk, and run the Python code with a SYSTEM or EXECUTE_COMMAND_LINE. If you use EXECUTE_COMMAND_LINE, you can have the Python code output its result to stdout, and the Fortran code can read it as character data; if you have a lot of output (e.g., a big matrix), it would make more sense for the Python code to output a file that the Fortran code then reads. Disk read/write overhead could wind up being prohibitively significant for this. Also, you would have to write Fortran code to output your data, Python code to read it, Python code to output it again, and Fortran code to re-input the data. This code should be straightforward to write and test, but keeping these four parts in sync as you edit the code may turn into a headache.
(This approach is tried in this Stack Overflow question)
3. Embedding Python in C in Fortran
There is no way that I know of to directly pass a Python object in memory to Fortran. However, Fortran code can call C code, and C code can have Python embedded in it. (See the Python tutorial on extending and embedding.) In general, extending Python (like I recommend in point 1) is preferable to embedding it in C/C++. (See Extending Vs. Embedding: There is Only One Correct Decision.) Getting this to work will be a nightmare, because any communication problems between Python and Fortran could happen between Python and C, or between C and Fortran. I don't know if anyone is actually embedding Python in C in Fortran, and so getting help will be difficult.

I have developed the library Forpy that allows you to use Python in Fortran (embedding).
It uses Fortran C interoperability to call Python C API functions.
While I agree that extending (using Fortran in Python) is often preferable, embedding has its uses:
Large, existing Fortran codes might need a substantial amount of refactoring before
they can be used from Python - here embedding can save development time
Replacing a part of an existing code with a Python implementation
Temporarily embedding Python to experiment with a given Fortran code:
for example to test alternative algorithms or to extract intermediary results
Besides embedding, Forpy also supports extending Python.
With Forpy you can write a Python extension module entirely in Fortran.
An advantage to existing tools such as f2py is that you can use Python datatypes
(e. g. to write a function that takes a Python list as argument or a function that returns a Python dict).
Working with existing, possibly legacy, Fortran codes is often very challenging and I
think that developers should have tools at their disposal both for embedding and extending Python.

If you are going to embed Python in Fortran, you will have to do it via Fortran's C interface; that's what ISO_C_BINDING is for. I would caution against embedding Python, not because of the technical difficulty in doing so, but because Python (the language or the community) seems adamantly opposed to Python being used as a subordinate language. The common view is that whatever non-Python language your code is currently written in should be broken up into libraries and used to extend Python, never the other way around. So you will see (as here) more responses trying to convince you that you really don't want to do what you actually want to do than actual technical assistance.
This is not flaming or editorializing or making a moral judgment; this is a simple statement of fact. You will not get help from the Python community if you try to embed Python.
If what you need is functionality beyond what the Fortran language itself supports (e.g. filesystem operations) and you do not specifically need Python and you want a language more expressive than C, you may want to look at embedding Lua instead. Unlike Python, Lua is specifically meant to be embedded so you are likely to face much less social and technical resistance.
There are a number of projects which integrate Fortran and Lua, the most complete one I've seen to date is Aotus. The author is very responsive and the integration process is simple.
Admittedly, this does not answer the original question (how to embed a Python interpreter in a Fortran 90 application) but to be fair, none of the other responses do either. I use Python as my portable general-purpose language of choice these days and I'd really prefer to stick with it when extending our primary products (written in Fortran). For the reasons laid out above, I abandoned my attempts to embed Python and switched to embedding Lua; for social reasons, I feel Lua is a much better technical choice. It's not my first choice but it's workable, at least in my case.
Apologies if I've offended anyone; I'm not trying to pick a fight, just relating my experience when researching this specific topic.

There is a very easy way to do this using f2py. Write your python method and add it as an input to your Fortran subroutine. Declare it in both the cf2py hook and the type declaration as EXTERNAL and also as its return value type, e.g. REAL*8. Your Fortran code will then have a pointer to the address where the python method is stored. It will be SLOW AS MOLASSES, but for testing out algorithms it can be useful. I do this often (I port a lot of ancient spaghetti Fortran to python modules...) It's also a great way to use things like optimised Scipy calls in legacy fortran

I have tried out several approaches to solve the problem and I have found one possibly optimal way of doing it. I will briefly list down the approaches and the results.
1) Embedding via System call: Everytime we want to access python from fortran, we use the system call to execute some python script and exchange data between them. The speed in this approach is limited by the disk read, write (in this age of cache level optimization of code, going to disk is a mortal sin). Also, we need to initialize the interpreter everytime we want to execute the script, which is a considerable overhead. A simple Runge Kutta 4th order method running for 300 timesteps took a whopping 59 seconds to execute.
2) Going from Fortran to Python via C: We use the ISO_C bindings to communicate between Fortran and C; and we embed the Python interpreter inside C. I got parts of it working, but in the meanwhile I found a better way and dropped this idea. I would still like to evaluate this for the sake of completeness though.
3) Using f2py to import Fortran subroutines to Python (Extending) :
Here, we take the main loop out of Fortran and code it in Python (this approach is called Extending Python with Fortran); and we import all Fortran subroutines into Python using f2py (http://cens.ioc.ee/projects/f2py2e/usersguide/). We have the flexibility of having the most important data in any Scientific application, i.e. the outermost loop (generally the time loop) in Python so that we can couple it with other applications. But, we also have the drawback of having to exchange possibly more than needed data between Fortran and Python. The same Runge Kutta 4th order method example took 0.372 seconds to execute.
4) Mimicking Embedding via Extending:
Till now we have seen the two pure approaches of Embedding (the main loop stays in fortran and we call python as needed) and Extending (the main loop stay in python and we call fortran as needed). There is another way of doing it, which I found to be the most optimal. Transferring parts of main loop into Python would lead to an overhead, which may not be necessary all the time. To get rid of this overhead, we can keep the main loop in Fortran which is converted into a subroutine without any changes, have a pseudo main loop in Python, which just calls the main loop in Fortran and the Program executes as if it was our untouched Fortran program. Whenever necessary, we can use a callback function to come back to python with the required data, execute a script and go back to fortran again. In this approach, the Runge Kutta 4th Order method took 0.083 seconds. I profiled the code, and found that the initialization of python interpreter and loading took 0.075 seconds and the program took only 0.008 seconds (which includes 300 callback functions to python). The original fortran code took 0.007 seconds. So, we get almost Fortran like performance with python like flexibility using this approach.

I've just successfully embedded Python into our in-house ~500 KLOC Fortran program with cffi. An important aspect was not to touch the existing code. The program is written in Fortran 95. I wrote a thin 2003 wrapper using the iso_c_binding module that simply imports data from the various modules, gets C pointers to those data and/or wraps Fortran types into C structs, puts everything into a single type/struct and sends off to a C function. This C function happens to be a Python function wrapped with cffi. It unpacks the C struct into a more user-friendly Python object, wraps Fortran arrays as Numpy arrays (no copying) and then either drops into an interactive Python console or runs a Python script, based on user configuration. There is no need to write C code except for a single header file. There is obviously quite some overhead, but this functionality is intended for extendibility, not performance.
I would advise against f2py. It's not maintained well and severely limits the public interface of your Fortran code.

I wrote a library for that, forcallpy, using a C layer which embeds Python expressions interpretation functions, and working specifically on arguments passing between Fortran and Python to make scripts call as easy as possible (uses embedded numpy to directly map Fortran arrays inside ndarrays, use arguments names to know their type in the C/Python side).
You can see some examples in the documentation at readthedocs.
Laurent.

Does PyPy translate itself?

Am I getting this straight? Does the PyPy interpreter actually interpret itself and then translate itself?
So here's my current understanding:
RPython's toolchain involves partially executing the program to be translated to get a sort of preprocessed version to annotate and translate.
The PyPy interpreter, running on top of CPython, executes to partially interpret itself, at which point it hands control off to its RPython half, which performs the translation?
If this is true, then this is one of the most mind-bending things I have ever seen.

PyPy's translation process is actually much less conceptually recursive than it sounds.
Really all it is is a Python program that processes Python function/class/other objects (not Python source code) and outputs C code. But of course it doesn't process just any Python objects; it can only handle particular forms, which are what you get if you write your to-be-translated code in RPython.
Since the translation toolchain is a Python program, you can run it on top of any Python interpreter, which obviously includes PyPy's python interpreter. So that's nothing special.
Since it translates RPython objects, you can use it to translate PyPy's python interpreter, which is written in RPython.
But you can't run it on the translation framework itself, which is not RPython. Only PyPy's python interpreter itself is RPython.
Things only get interesting because RPython code is also Python code (but not the reverse), and because RPython doesn't ever "really exist" in source files, but only in memory inside a working Python process that necessarily includes other non-RPython code (there are no "pure-RPython" imports or function definitions, for example, because the translator operates on functions that have already been defined and imported).
Remember that the translation toolchain operates on in-memory Python code objects. Python's execution model means that these don't exist before some Python code has been running. You can imagine that starting the translation process looks a bit like this, if you highly simplify it:
from my_interpreter import main
from pypy import translate
translate(main)
As we all know, just importing main is going to run lots of Python code, including all the other modules my_interpreter imports. But the translation process starts analysing the function object main; it never sees, and doesn't care about, whatever code was executed to come up with main.
One way to think of this is that "programming in RPython" means "writing a Python program which generates an RPython program and then feeds it to the translation process". That's relatively easy to understand and is kind of similar to how many other compilers work (e.g. one way to think of programming in C is that you are essentially writing a C pre-processor program that generates a C program, which is then fed to the C compiler).
Things only get confusing in the PyPy case because all 3 components (the Python program which generates the RPython program, the RPython program, and the translation process) are loaded into the same Python interpreter. This means it's quite possible to have functions that are RPython when called with some arguments and not when called with other arguments, to call helper functions from the translation framework as part of generating your RPython program, and lots of other weird things. So the situation gets rather blurry around the edges, and you can't necessarily divide your source lines cleanly into "RPython to be translated", "Python generating my RPython program" and "handing the RPython program over to the translation framework".
The PyPy interpreter, running on top of CPython, executes to partially
interpret itself
What I think you're alluding to here is PyPy's use of the the flow object space during translation, to do abstract interpretation. Even this isn't as crazy and mind-bending as it seems at first. I'm much less informed about this part of PyPy, but as I understand it:
PyPy implements all of the operations of a Python interpreter by delegating them to an "object space", which contains an implementation of all the basic built in operations. But you can plug in different object spaces to get different effects, and so long as they implement the same "object space" interface the interpreter will still be able to "execute" Python code.
The RPython code objects that the PyPy translation toolchain processes is Python code that could be executed by an interpreter. So PyPy re-uses part of their Python interpreter as part of the translation tool-chain, by plugging in the flow object space. When "executing" code with this object space, the interpreter doesn't actually carry out the operations of the code, it instead produces flow graphs, which are analogous to the sorts of intermediate representation used by many other compilers; it's just a simple machine-manipulable representation of the code, to be further processed. This is how regular (R)Python code objects get turned into the input for the rest of the translation process.
Since the usual thing that is translated with the translation process is PyPy's Python interpreter, it indeed "interprets itself" with the flow object space. But all that really means is that you have a Python program that is processing Python functions, including the ones doing the processing. In itself it isn't any more mind-bending than applying a decorator to itself, or having a wrapper-class wrap an instance of itself (or wrap the class itself).
Um, that got a bit rambly. I hope it helps, anyway, and I hope I haven't said anything inaccurate; please correct me if I have.

Disclaimer: I'm not an expert on PyPy - in particular, I don't understand the details of the RPython translation, I'm only citing stuff that I've read before. For a more specific post on how RPython translation may work, check out this answer.
The answer is, yes, it can (but only after it was first compiled using CPython).
Longer description:
At first it seems highly mind bending and paradoxical, but once you understand it, it's easy. Checkout the answer on Wikipedia.
Bootstrapping in program development began during the 1950s when each program was constructed on paper in decimal code or in binary code, bit by bit (1s and 0s), because there was no high-level computer language, no compiler, no assembler, and no linker. A tiny assembler program was hand-coded for a new computer (for example the IBM 650) which converted a few instructions into binary or decimal code: A1. This simple assembler program was then rewritten in its just-defined assembly language but with extensions that would enable the use of some additional mnemonics for more complex operation codes.
The process is called software bootstrapping. Basically, you build one tool, say a C++ compiler, in a lower language which has already been made (everything at one point had to be coded from binary), say ASM. Now that you have C++ in existence, you can now code a C++ compiler in C++, then use the ASM C++ compiler to compile your new one. After you once have your new compiler compiled, you can now use it to compile itself.
So basically, make the first computer tool ever by hand coding it, use that interpreter to make another slightly better one, and use that one to make a better one, ... And eventually you get all the complex software today! :)
Another interesting case, is the CoffeeScript language, which is written in... CoffeeScript. (Although this use case still requires the use of an external interpreter, namely Node.js)
The PyPy interpreter, running on top of CPython, executes to partially interpret itself, at which point it hands control off to its RPython half, which performs the translation?
You can compile PyPy using an already compiled PyPy interpreter, or you can use CPython to compile it instead. However, since PyPy has a JIT now, it'll be faster to compile PyPy using itself, rather than CPython. (PyPy is now faster than CPython in most cases)

Implementing a function in Python vs C

Is there a difference (in terms of execution time) between implementing a function in Python and implementing it in C and then calling it from Python? If so, why?

Python (at least the "standard" CPython implementation) never actually compiles to native machine code; it compiles to bytecode which is then interpreted. So a C function which is in fact compiled to machine code will run faster; the question is whether it will make a relevant difference. So what's the actual problem you're trying to solve?

If I understand and restate your question properly, you are asking, if wrapping python over a c executable be anyway faster than a pure python module itself? The answer to it is, it depends upon the executable and the kind of task you are performing.
There are a set of modules in Python that are written using Python C-API's. Performance of those would be comparable to wrapping a C executable
On the other hand, wrapping c program would be faster than pure python both implementing the same functionality with a sane logic. Compare difflib usage vs wrapping subprocess over diff.

The C version is often faster, but not always. One of the main points of speedup is that C code does not have to look up values dynamically, like Python (Python has reference semantics). A good example for this is Numpy. Numpy arrays are typed, all values in the array have the same type, and are internally stored in continuous block of memory. This is the main reason that numpy is so much faster, because it skips all the dynamic variable lookup that Python has to do. The most efficient C implementation of an algorithm can become very slow if it operates on Python data structures, where each value has to be looked up dynamically.
A good way to implement such things yourself and save all the hassle of Python C-APIs is to use Cython.

Typically, a function written in C will be substantially faster that the Python equivalent. It is also much more difficult to integrate, since it involves:
compiling C code that #includes the Python headers and exposes appropriate wrapper code so that it is callable from Python;
linking against the correct Python libraries;
deploying the resulting shared library to the appropriate location, so that your Python code can import it.
You would want to be very certain that the benefits outweigh the costs before trying this, which means this should only be reserved for performance-critical sections of your code that you simply can't make fast enough with pure Python.
If you really need to go down this path, Boost.Python can make the task much less painful.

Will Python be faster if I put commonly called code into separate methods or files?

I thought I once read on SO that Python will compile and run slightly more quickly if commonly called code is placed into methods or separate files. Does putting Python code in methods have an advantage over separate files or vice versa? Could someone explain why this is? I'd assume it has to do with memory allocation and garbage collection or something.

It doesn't matter. Don't structure your program around code speed; structure it around coder speed. If you write something in Python and it's too slow, find the bottleneck with cProfile and speed it up. How do you speed it up? You try things and profile them. In general, function call overhead in critical loops is high. Byte compiling your code takes a very small amount of time and only needs to be done once.

No. Regardless of where you put your code, it has to be parsed once and compiled if necessary. Distinction between putting code in methods or different files might have an insignificant performance difference, but you shouldn't worry about it.
About the only language right now that you have to worry about structuring "right" is Javascript. Because it has to be downloaded from net to client's computer. That's why there are so many compressors and obfuscators for it. Stuff like this isn't done with Python because it's not needed.

Two things:
Code in separate modules is compiled into bytecode at first runtime and saved as a precompiled .pyc file, so it doesn't have to be recompiled at the next run as long as the source hasn't been modified since. This might result in a small performance advantage, but only at program startup.
Also, Python stores variables etc. a bit more efficiently if they are placed inside functions instead of at the top level of a file. But I don't think that's what you're referring to here, is it?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.