Numpy correlate-underlying code - python

What is the underlying python code of numpy.correlate?
I am trying to understand the logic of cross-correlation. The underlying Python code would be of great help.

All the code is somewhere on your system; you just need to find where.
If you're using ipython, the help command (numpy.correlate?) includes the filepath (on the second line from the end).
On my system it's "/usr/local/lib/python3.5/dist-packages/numpy/core/numeric.py".
If you're not using ipython, numpy.__file__ will give you a path to the installation directory for the module, and you'll have to look around a bit.
The module name given by help(numpy.correlate) will give some hints.
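The same lookup can also be done programmatically with the standard inspect module. The sketch below uses json.dumps as a stand-in so that it runs without NumPy installed; swapping in numpy.correlate is an assumption, and it can only work for functions written in Python, not for compiled ones.

```python
import inspect
import json

# json.dumps stands in for numpy.correlate here (an assumption made so the
# example runs without NumPy); any function written in Python works the same way.
target = json.dumps

source_path = inspect.getsourcefile(target)   # file that defines the function
source_text = inspect.getsource(target)       # the function's source code

print(source_path)
print(source_text.splitlines()[0])            # first line of the definition
```

For a compiled function these calls raise TypeError, which is itself a hint that the implementation lives in C.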
However, once you find the file you will see that numpy.correlate only does the following:
mode = _mode_from_name(mode)
return multiarray.correlate2(a, v, mode)
That is a compiled function, so it's a little harder to find.
You can view the file here; the main function is defined beginning on line 1353, and the actual algorithm begins on line 1190.
This is fairly optimized code, so it's doing quite a bit more than what is necessary for simple correlation: handling datatypes, multi-threading, and error handling.
If you just want to understand the general principles rather than specifics of what python is doing, I would recommend starting with a more basic explanation. Numeric operations such as correlation are very well defined, and numpy rarely does anything different from the standard definitions.
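As a starting point, the standard definition can be written in a few lines of plain Python. This is only a sketch of the textbook formula for real-valued sequences, not NumPy's actual implementation; the function names correlate_full and correlate_valid are made up for illustration, and complex inputs (where the second sequence is conjugated) are ignored.

```python
def correlate_full(a, v):
    """Cross-correlate two real sequences at every lag where they overlap."""
    result = []
    # Lag k shifts v against a; k runs from -(len(v) - 1) to len(a) - 1.
    for k in range(-(len(v) - 1), len(a)):
        total = 0.0
        for n in range(len(v)):
            if 0 <= n + k < len(a):      # skip terms that fall off either end
                total += a[n + k] * v[n]
        result.append(total)
    return result


def correlate_valid(a, v):
    """Keep only the lags where v lies completely inside a (needs len(a) >= len(v))."""
    return [sum(a[n + k] * v[n] for n in range(len(v)))
            for k in range(len(a) - len(v) + 1)]


print(correlate_full([1, 2, 3], [0, 1, 0.5]))   # [0.5, 2.0, 3.5, 3.0, 0.0]
print(correlate_valid([1, 2, 3], [0, 1, 0.5]))  # [3.5]
```

These values agree with the example in the numpy.correlate documentation for mode="full" and the default mode="valid".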

Related

Replacement for execfile in IPython3?

There are various ways of re-enabling an execfile-like behaviour for Python 3.x environments - in the documentation and here on Stack Overflow - but I did not find an exact replacement for my use case.
I am using IPython, and in Python 2.7.x execfile used to run a script file just as if I typed the exact same lines straight into IPython. This includes:
1. useful exception tracebacks are given
2. local variables of my environment are available in the scripted code
3. variables defined locally in the script are available in the environment (after the execfile call, of course)
4. import X as Y statements in the script also make Y available in the environment
5. the execfile call works in interactive modes and also directly in python scripts
6. execution of the entire script code is guaranteed for each call (except when hitting an exception)
7. execfile is readily available wherever Python is - no lengthy definition or import of obscure package
Common solutions that have not entirely worked so far:
from scriptfile import * does not satisfy #2 and #4. For function definitions, it also fails #6, as re-issuing the import does not update a function - this can be remedied with a reload(scriptfile) call.
The exec(scriptfilehandle.read()) construction satisfies #5-7. With some amendments also #2-4 can be dealt with - but this evolves to a lengthy definition, which I can't recall right now, and tracebacks are still a mess.
IPython's %run scriptfile is nice, but falls short at least on requirements #2, #4 and #5.
Copying the script code from the file and using IPython's %paste fails on #5 and #7 - and is quite cumbersome to do for each call.
Do you have any solutions I have not yet heard of?
I am using IPython+execfile while playing around with data, producing (lots of) matplotlib figures, trying stuff,...
And if I like some lines I wrote, I put the code snippet in a script. Some examples of what I am doing:
writing a script that prepares the environment for a specific dataset: doing imports, loading some data, defining some useful functions to work on this dataset,...
semiautomatic plotting: elaborate script for beautifully plotting ten figures of data held in a local variable, then revising the plot-script, and re-executing it, then filtering the data, re-executing plot-script,...
writing a script that utilizes several of my smaller snippets, to be run overnight on a large dataset
apart from data exploration and plotting, sometimes I need to write small scripts on various systems: a RasPi, a router with OpenWRT, a machine without internet access, a Windows machine (without admin rights) - all of which may have their restrictions on what libraries are available
On the other hand, I have to admit that I'm not a professional programmer - my insight into Python's inner workings with local/global variables, and what really happens in an import statement, is very limited.
Any help - be it a solution to my problems or a helpful explanation - would be greatly appreciated!
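For reference, the exec-based construction mentioned in the question can be kept fairly short. The following is a minimal sketch, not a drop-in replacement: the helper name run_script is made up, and the caller is assumed to pass the namespace explicitly (for example globals() in an interactive session).

```python
import os
import tempfile


def run_script(path, namespace):
    """Execute the file at `path` inside `namespace` (a dict), like typing it in."""
    with open(path, "rb") as handle:
        # Passing the real filename to compile() keeps tracebacks readable.
        code = compile(handle.read(), path, "exec")
    exec(code, namespace)


# Demo: a throwaway script that reads `scale` from the caller's namespace.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as script:
    script.write("import math as m\nresult = scale * 2\n")

namespace = {"scale": 3}
try:
    run_script(script.name, namespace)
finally:
    os.remove(script.name)

print(namespace["result"])   # 6
print("m" in namespace)      # True
```

Because the script runs directly in the given dict, names it defines or imports (here result and m) end up in that namespace, and names already there (here scale) are visible to the script.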

Where to Store Borrowed Python Code?

Recently, I have been working on a Python project with usual directory structure, and have received help from someone else who has given me a code snippet (a single function definition, about 30 lines long) which I would like to import into my code. What is the most proper directory/location in a Python project to store borrowed code of this size? Is it best to store the snippet into an entirely different module and import it from there?
I generally find it easiest to put such code in a separate file, because for clarity you don't want more than one different copyright/licensing term to apply within a single file. So in Python this does indeed mean a separate module. Then the file can contain whatever attribution and other legal boilerplate you need.
As long as your file headers don't accidentally claim copyright on something to which you do not own the copyright, I don't think it's actually a legal problem to mix externally-licensed or public domain code into files you mostly own. I may be wrong, though, which is why I normally avoid giving myself reason to think about it. A comment saying "this is external code from the following source with the following license:" may well be clearer than dividing code into different files that naturally wouldn't be. So I do occasionally do that.
I don't see any definite need for a separate directory (or package) per separate external source. If that's already part of your project structure (that is, it already uses external libraries by incorporating their source) then I suppose you might as well continue the trend.
I usually place scripts I copy off the internet in a folder/package called borrowed so I know all of the code here is stuff that I didn't write myself.
That is, if it's something more substantial than a one or two-liner demonstrating how something works.

"writing a python binding" vs "using command-line directly"

I have a question regarding python bindings.
I have a command-line tool which exposes some functionality, and the code has been refactored to provide that functionality through a shared library. I wanted to know what real advantage I get from "writing a python binding for the shared library" vs "calling the command line directly".
One obvious advantage, I think, will be performance: the shared library will be loaded into the same process, so the functionality can be called within that process. It will avoid spawning a new process through the command line.
Are there any other advantages I can get from writing a python binding in such a case?
Thanks.
I can hardly imagine a case where one would prefer wrapping a library's command line interface over wrapping the library itself. (Unless there is a library that comes with a neat command line interface while being a total mess internally; but the OP indicates that the same functionality available via the command line is easily accessible in terms of library function calls).
The biggest advantage of writing a Python binding is a clearly defined data interface between the library and Python. Ideally, the library can operate directly on memory managed by Python, without any data copying involved.
To illustrate this, let's assume a library function does something more complicated than printing the current time, i.e., it obtains a significant amount of data as an input, performs some operation, and returns a significant amount of data as an output. If the input data is expected as an input file, Python would need to generate this file first. It must make sure that the OS has finished writing the file before calling the library via the command line (I have seen several C libraries where sleep(1) calls were used as a band-aid for this issue...). And Python must get the output back in some way.
If the command line interface does not rely on files but obtains all arguments on the command line and prints the output on stdout, Python probably needs to convert between binary data and string format, not always with the expected results. It also needs to pipe stdout back and parse it. Not a problem, but getting all this right is a lot of work.
What about error handling? Well, the command line interface will probably handle errors by printing error messages on stderr. So Python needs to capture, parse and process these as well. OTOH, the corresponding library function will almost certainly make a success flag accessible to the calling program. This is much more directly usable for Python.
All of this obviously affects performance, which you already mentioned.
As another point, if you are developing the library yourself, you will probably find after some time that the Python workflow has made the whole command line interface obsolete, so you can drop supporting it altogether and save yourself a lot of time.
So I think there is a clear case to be made for the Python bindings. To me, one of the biggest strengths of Python is the ease with which such wrappers can be created and maintained. Unfortunately, there are about 7 or 8 equally easy ways to do this. To get started, I recommend ctypes, since it does not require a compiler and will work with PyPy. For best performance use the native CPython C API, which I also found very easy to learn.
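To make the contrast concrete, here is a small sketch that computes the same value both ways. It rests on a few assumptions: a Unix-like system where the C math library can be loaded, and a Python one-liner standing in for the real command-line tool, since the actual library is not known.

```python
import ctypes
import ctypes.util
import subprocess
import sys

# Route 1: the command line. A Python one-liner stands in for a real CLI tool
# (an assumption for illustration); the result comes back as text to parse.
completed = subprocess.run(
    [sys.executable, "-c", "import math; print(math.sqrt(2.0))"],
    capture_output=True, text=True, check=True,
)
via_cli = float(completed.stdout.strip())

# Route 2: a binding. Load the C math library and call sqrt() in-process.
# find_library may return None on some systems; CDLL(None) then falls back
# to the symbols already loaded into the process (POSIX only).
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double
via_library = libm.sqrt(2.0)

print(via_cli, via_library)
```

The first route needs a new process, text parsing, and a return-code check; the second passes a C double in and gets a C double back with no conversion through strings.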

Using Python's basic I/O to manipulate or create Python Files?

Would the most efficient way-and I know it's not very efficient, but I honestly can't find any better way-to manipulate a Python (.py) file, to add/subtract/append code, be to use the basic file I/O module included in Python?
For an example:
obj = open('Codemanipulationtest.py', 'w+')
obj.write("print 'This shows you can do basic I/O?'")
obj.close()
This will manipulate a file I have, named "codemanipulationtest.py", and add a print statement to it. Is this something that can be built upon, or are there easier, safer or more efficient methods for manipulating/creating new python code?
I've read over this: Parse a .py file, read the AST, modify it, then write back the modified source code
And honestly it seems like the I/O method is easier. I am kind of newbish to Python so I may just be acting stupid.....thanks in advance for any responses.
Edit
The point of it all was simply to play around with the effects of playing around with the code. I was thinking of hooking up whatever I end up using to some sort of learning algorithm and seeing how well it could generate little bits of code at a time, and seeing where it could go from there....
To generate the code, I would break it out into classes, one per kind of statement: an IF class, a FOR class, and so on. Each class has a to_str() method, and you call these in turn to produce the output.
statements = [ ... ]  # objects that each provide a to_str() method
with open("some.py", "w") as obj:
    for s in statements:
        obj.write(s.to_str())
This way you can extend your project easily and it will be more understandable and flexible. And, it keeps with the use of the simple write method that you wanted.
Depending on the learning algorithm this break out of the various classes can lead quite well into a sort of pseudo genetic algorithm for code. You can encode the genome as a sequence of statements and then you just have to find a way to go about passing parameters to each statement if they are required and such.
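A minimal sketch of what such statement classes might look like follows. The class names Assign, Print and For and the indent parameter are invented for illustration; only the to_str() method comes from the description above.

```python
class Assign:
    """Hypothetical statement class: `name = expression`."""
    def __init__(self, name, expression):
        self.name, self.expression = name, expression

    def to_str(self, indent=0):
        return "    " * indent + f"{self.name} = {self.expression}\n"


class Print:
    """Hypothetical statement class: `print(expression)`."""
    def __init__(self, expression):
        self.expression = expression

    def to_str(self, indent=0):
        return "    " * indent + f"print({self.expression})\n"


class For:
    """Hypothetical statement class: a for loop over range(count)."""
    def __init__(self, variable, count, body):
        self.variable, self.count, self.body = variable, count, body

    def to_str(self, indent=0):
        header = "    " * indent + f"for {self.variable} in range({self.count}):\n"
        # Nested statements are rendered one indentation level deeper.
        return header + "".join(s.to_str(indent + 1) for s in self.body)


statements = [
    Assign("total", "0"),
    For("i", 4, [Assign("total", "total + i")]),
    Print("total"),
]
source = "".join(s.to_str() for s in statements)
print(source)
```

A list like statements is also a natural candidate for the "genome" mentioned above: mutating or recombining the list changes the generated program.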
It depends on what you'll be doing with the code you're generating. You have a few options, each more advanced than the last.
1. Create a file and import it
2. Create a string and exec it
3. Write code to create classes (or modules) on the fly directly rather than as text, inserting whatever functions you need into them
4. Generate Python bytecode directly and execute that!
If you are writing code that will be used and modified by other programmers, then the first approach is probably best. Otherwise I recommend the third for most use cases. The last is only for masochists and former assembly language programmers.
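The second and third options can be sketched in a few lines. The names double, Generated, init and describe are placeholders chosen for this example.

```python
# Option 2: build source text as a string and exec it into a namespace.
namespace = {}
exec("def double(x):\n    return 2 * x\n", namespace)
print(namespace["double"](21))   # 42


# Option 3: create a class on the fly, without generating any source text.
def init(self, value):
    self.value = value


def describe(self):
    return f"{type(self).__name__} with value {self.value}"


# type(name, bases, attributes) builds a new class object at runtime.
Generated = type("Generated", (object,), {"__init__": init, "describe": describe})
print(Generated(7).describe())   # Generated with value 7
```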
If you want to modify existing Python source code, you can sometimes get away with doing simple modifications with basic search-and-replace, especially if you know something about the source file you're working with, but a better approach is the ast module. This gives you an abstract representation of the Python source that you can modify and then compile directly into Python objects.
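As a rough illustration of the ast route, the sketch below parses a one-line program, rewrites it, and runs the result. The transformation itself (turning every addition into a multiplication) is arbitrary and chosen only to show the mechanics.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Rewrite every `x + y` into `x * y` (an arbitrary change for illustration)."""
    def visit_BinOp(self, node):
        self.generic_visit(node)            # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.parse("result = 2 + 5")
tree = ast.fix_missing_locations(AddToMult().visit(tree))

namespace = {}
exec(compile(tree, "<generated>", "exec"), namespace)
print(namespace["result"])   # 10
```

The modified tree is compiled directly, so no intermediate source file is needed.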

OOP programming in python

I was reading Deitel's C++ programming book. In this book they mention how a programmer should release only the interface part of his code and not the implementation.
So carrying this over to python:
I have 2 files:
1) the implementation file = accountClass.py and
2) the interface file = useAccountClass.py
I have compiled the implementation file and have obtained the .pyc file. So when I provide my code to someone else, I would provide him with the .pyc file and the interface file, right?
Also, if I provide someone else with ONLY the .pyc file, can I expect him to write the interface on his own? I'm going to say no. But there's this one nagging doubt that I have:
The creators of numpy and scipy did not share the implementation with us end users. And I don't think they shared any interfaces either. But we can still search for the different classes and their methods inside both numpy and scipy. So, using this example of numpy and scipy, I guess what I'm trying to ask is:
Is it possible for someone else to create an interface to my code if I provide him/her with only the compiled implementation file (in this case accountClass.pyc)? How will that person know what classes and methods I have defined in my implementation? I mean, will they use the
if __name__ == "__main__":
    blah blah
or is there some other way??
You got that entirely wrong. Or perhaps it's a horrible book whose author got something seriously wrong. Code using other code should indeed, barring significant counterarguments, adhere to an interface and not care about the details of the implementation. However, even in the world of static compilation to machine code (e.g. C++), this does not mean you should lock away the source code of the implementation.
Whether someone has access to the implementation, and whether they make use of that knowledge while writing a specific piece of code, are completely different issues. Heck, even the author of the implementation can/should still program to an interface when working on other code (e.g. other modules). Likewise, even if you lock the implementation away from someone, they may very well rely on implementation quirks which are not part of the interface. If anyone in the world of static compilation to machine code provides only headers and object files, and not the source code, it's because the projects are closed source, not to encourage good programming practices among clients.
In Python, your question makes no sense - there are no "interface" and "implementation" files, there's just code which is run and defines functions, classes, and other values. There is no such thing as an interface file you'd provide. You provide an implementation - and (hopefully) documentation which details both interface and possibly implementation details. And once a module is imported, the class objects, function objects, and other objects contain plenty of information (including, in many cases, the text from which large parts of the documentation were generated). This is also true for extension modules like numpy. And note that their implementation is accessible, it's just not included in all distributions because it's of little use. With Python code, you practically have to distribute the source code because anything else is platform-specific.
On a side note, .pyc files are pretty high level, and easily understood when disassembled (which is as easy as importing the module and running the stdlib module dis on any function inside). I consider this a minor technicality as it's already the wrong question to ask.
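Both points can be tried on any class. In the sketch below, Account is a made-up stand-in for whatever accountClass defines; the same calls work on objects imported from a .pyc file (docstrings are kept unless the file was compiled with -OO).

```python
import dis
import inspect
import io


class Account:
    """Hypothetical stand-in for a class shipped only as a .pyc file."""

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        """Add amount to the balance and return the new balance."""
        self.balance += amount
        return self.balance


# Names, signatures, and docstrings are all available from the live objects.
methods = [name for name, _ in inspect.getmembers(Account, inspect.isfunction)]
print(methods)                               # ['__init__', 'deposit']
print(inspect.signature(Account.deposit))    # (self, amount)
print(Account.deposit.__doc__)

# The bytecode can be disassembled into a readable listing.
listing = io.StringIO()
dis.dis(Account.deposit, file=listing)
print(listing.getvalue())
```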
Deitel's advice to C++ programmers doesn't apply to Python, for a number of reasons:
Python isn't compiled to machine code, so no matter what form you provide the program in, it will be relatively easy for someone to read the code.
Python doesn't have .h and .c files, all you can provide is the .py or .pyc files.
Treating code as a secret is kind of silly anyway. What is in your code that you need to keep hidden from others?
Numpy and Scipy are largely implemented in C, which is why you don't have the source, for your own convenience. You can get the source if you like. The "interface" to that code is the module that you can import and then call.
You should not confuse "user interface" with "class interface". If you have a useAccountClass file, that file probably performs some task using the classes and methods defined in the accountClass file, if I understood right.
If you send the file to another person, they are not supposed to "guess" what your compiled class does. That's what DOCUMENTATION is for: a description of the functions contained in the module (compiled or not), which parameters they take, which values they return, and what they are expected to do, the "meaning" of the task they perform.
As an abstract example, let's suppose you have an image processing class. If that class has the function findCircles(image), the documentation should explain that it takes an image, possibly containing circles, and returns a list or array of coordinates of the centers of circles contained in the image. HOW the circles are detected is not important; you don't need to know that to use the function. Now if the function was called like findCircles(image, gaussian_threshold=10), the caller would have to know the function uses some "gaussian_threshold" parameter, that is, the caller would NEED to know about the function's entrails, and in OOP this is Not Good. If you decided to use another algorithm in the future, all code using that function would have to be rewritten, because the gaussian_threshold most probably wouldn't make sense anymore.
So, the interface, in OOP, is the abstraction used to communicate to the object only the canonical parameters or inputs it needs to know to perform a task in the language of the problem, not in the language of the implementation (that can change anytime).
The documentation, in this sense, is a contract that assures to the user (in this case, another developer) that the function will perform as expected if sane inputs are given to it.
Now the FINAL USER, a non-technical person wanting to use your program, would need the WHOLE working program (controls and views), not only the class definitions (the model).
Hope this helps, and I must recommend the books "Code Complete 2nd ed." and "Pragmatic Programmer - From Journeyman to Master" as VERY enlightening readings on the broad topic.
