This is a follow-up question to the last question I asked, which you can read here:
Using Python's basic I/O to manipulate or create Python Files?
Basically, I was asking for the best way to let Python edit the source of other Python programs, for the purpose of experimenting and seeing whether I could hook them up to either a genetic algorithm or some sort of backprop network to get results.
One of the answers suggests turning every Python operator or code-bit, such as '=', 'if', and so on, into a class, which I (or various other programs) could then use to piece together or edit other Python files, using Python's basic file I/O as the means of manipulation.
The classes would each, upon initialization, get their own unique ID, which would be stored, along with line number, type, etc, in an SQLite3 database for logging purposes, and would be operated upon by other classes/the basic file i/o system.
My question is: Is this a sane task? If not, what can I change or do differently? Am I undertaking something worthwhile, or is it completely idiotic? If you need clarification, please ask; I want to know whether what I'm doing seems reasonable to an outside observer, or whether I should reconsider or scrap the whole deal...
Thanks in advance.
Don't generate code as text; modify the AST for the code instead.
import ast
p = ast.parse("testcode")   # parse a string of Python source into an AST
Docs for AST: http://docs.python.org/library/ast.html
Example of modifying code: http://docs.python.org/library/ast.html#ast.NodeTransformer
Example: (modifying 3*3 to 3+3)
from ast import *
tree = parse("print(3*3)") #parse the code
tree.body[0].value.args[0].op = Add() #modify multiplication to plus
codeobj = compile(tree,"","exec") # compile the code
exec(codeobj) #run it - it should print 6 (not 9 which the original code would print)
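Here is a slightly fuller sketch in the style of the NodeTransformer example linked above; the transformer name and the fix_missing_locations call are just how I would write it, not taken from your code:

import ast

class MulToAdd(ast.NodeTransformer):
    """Rewrite every multiplication node into an addition."""
    def visit_BinOp(self, node):
        self.generic_visit(node)            # transform children first
        if isinstance(node.op, ast.Mult):
            node.op = ast.Add()
        return node

tree = ast.parse("print(3*3)")
tree = MulToAdd().visit(tree)
ast.fix_missing_locations(tree)             # newly created nodes need location info
exec(compile(tree, "<ast>", "exec"))        # prints 6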
BTW, I am interested in genetic algorithms.
If you start a project, I can help you.
Related
I'm writing an application for scientific data analysis and I'm wondering what's the best way to structure the code to avoid (or address) the circular import problem. Currently I'm using a mix of OO and procedural programming.
Other questions address this issue but in a more abstract way. Here I'm looking for a solution that is optimal in a more specific context.
I have a class Container defined in DataLib.py whose data consist of lists and/or arrays. With all its methods and supporting functions, DataLib.py is quite large (~1000 lines).
I have a second module SelectionLib.py (~400 lines) that contains only functions to "filter" the data in Container according to different criteria. These functions return new Container objects (with filtered data), and thus SelectionLib.py needs to import Container from DataLib.py. Note that, logically, these functions are "methods" for Container; they are just implemented as Python functions.
Now I want to add some high-level methods to Container so that a complex analysis can be performed with a single function or method call. By "complex analysis" I mean an arbitrary number of calls to Container methods, local functions (defined in DataLib.py) and filter functions (defined in SelectionLib.py).
So the problem is that DataLib.py needs to import SelectionLib.py to use the filter functions, but SelectionLib.py already imports DataLib.py.
Right now my hackish solution is to run the two files with run -i ... from IPython, so it is like having one big file and I avoid the circular import. But at the same time these scripts are difficult to integrate into, for example, a GUI.
How would you suggest solving this problem?
Use pure OO and inheritance and split the object into three classes: CoreContainer -> SelectionContainer -> HighLevelContainer
Restructure the code (everything in one file?)
Some sort of import trickery (put the imports at the end)
Any feedback is appreciated!
If functions in SelectionLib are, as you say, "methods" for Container, it seems reasonable that DataLib imports SelectionLib, not the other way around.
Then the user code would just import DataLib. This would require some refactoring. One possibility that minimizes disruption to the user code would be to rename your existing DataLib and SelectionLib to _DataLib and _SelectionLib, and have a new DataLib import the necessary bits from either (or both).
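A rough sketch of that facade layout (the re-exported function names here are only placeholders for your actual filter functions):

# _DataLib.py (was DataLib.py): defines Container and its core methods
# _SelectionLib.py (was SelectionLib.py): imports Container from _DataLib

# DataLib.py -- thin facade that user code keeps importing
from _DataLib import Container
from _SelectionLib import filter_by_threshold, filter_by_time   # placeholder names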
As an aside, it's better to follow the PEP-8 conventions and name your modules in lowercase_with_underscores.
Would the most efficient way (and I know it's not very efficient, but I honestly can't find any better way) to manipulate a Python (.py) file, to add/subtract/append code, be to use Python's basic file I/O?
For an example:
obj = open('Codemanipulationtest.py', 'w+')
obj.write("print 'This shows you can do basic I/O?'")
obj.close()
This will manipulate a file I have named "Codemanipulationtest.py" and write a print statement into it. Is this something that can be built upon, or are there easier or safer/more efficient methods for manipulating/creating new Python code?
I've read over this: Parse a .py file, read the AST, modify it, then write back the modified source code
And honestly the I/O method seems easier. I am fairly new to Python, so I may just be missing something obvious... thanks in advance for any responses.
Edit
The point of it all was simply to play around with the effects of modifying the code. I was thinking of hooking up whatever I end up using to some sort of learning algorithm, seeing how well it could generate little bits of code at a time, and seeing where it could go from there...
To go about generating the code, I would break it out into various classes (an IF class, a FOR class, and so on), where each class has a to_str() method that you can call in turn to produce the output.
statements = [ ... ]
obj = open("some.py", "w+")
for s in statements:
    obj.write(s.to_str())
obj.close()
This way you can extend your project easily and it will be more understandable and flexible. And, it keeps with the use of the simple write method that you wanted.
Depending on the learning algorithm, this breakdown into separate classes can lead quite naturally to a sort of pseudo-genetic algorithm for code. You can encode the genome as a sequence of statements, and then you just have to find a way to pass parameters to each statement where they are required.
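A minimal sketch of what those statement classes might look like (the class names and fields are only illustrative):

class Assignment:
    def __init__(self, name, expression):
        self.name, self.expression = name, expression
    def to_str(self):
        return "%s = %s\n" % (self.name, self.expression)

class If:
    def __init__(self, condition, body):          # body is a list of statements
        self.condition, self.body = condition, body
    def to_str(self):
        lines = ["if %s:\n" % self.condition]
        lines += ["    " + s.to_str() for s in self.body]
        return "".join(lines)

# a "genome" is then just a sequence of statement objects
statements = [Assignment("a", "5"), If("a > 3", [Assignment("b", "a * 2")])]
with open("some.py", "w") as obj:
    for s in statements:
        obj.write(s.to_str())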
It depends on what you'll be doing with the code you're generating. You have a few options, each more advanced than the last.
Create a file and import it
Create a string and exec it
Write code to create classes (or modules) on the fly directly rather than as text, inserting whatever functions you need into them
Generate Python bytecode directly and execute that!
If you are writing code that will be used and modified by other programmers, then the first approach is probably best. Otherwise I recommend the third for most use cases. The last is only for masochists and former assembly-language programmers.
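For the third option, here is a tiny sketch of building a class at runtime with the built-in type() (all names are made up):

def greet(self):
    return "hello from %s" % type(self).__name__

# build a class object directly instead of writing source text
Generated = type("Generated", (object,), {"greet": greet})
print(Generated().greet())       # hello from Generated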
If you want to modify existing Python source code, you can sometimes get away with doing simple modifications with basic search-and-replace, especially if you know something about the source file you're working with, but a better approach is the ast module. This gives you an abstract representation of the Python source that you can modify and then compile directly into Python objects.
I was reading Deitel's C++ programming book. In this book they mention how a programmer should release only the interface part of his code and not the implementation.
So carrying this over to python:
I have 2 files:
1) the implementation file = accountClass.py and
2) the interface file = useAccountClass.py
I have compiled the implementation file and have obtained the .pyc file. So when I provide my code to someone else, I would provide him with the .pyc file and the interface file, right?
Also, if I provide someone else with ONLY the .pyc file, can I expect him to write the interface on his own? I'm going to say no. But there's this one nagging doubt that I have:
The creators of numpy and scipy did not share the implementation with us end users. And I don't think they shared any interfaces either. But we can still search for the different classes and their methods inside both numpy and scipy. So, using this example of numpy and scipy, I guess what I'm trying to ask is:
Is it possible for someone else to create an interface to my code if I provide him/ her with only the compiled implementation file (in this case accountClass.pyc)? How will that person know what classes and methods I have defined in my implementation? I mean, will they use the
if __name__ == "__main__":
    blah blah
or is there some other way??
You got that entirely wrong. Or perhaps it's a horrible book whose author got something seriously wrong. Code using other code should indeed, barring significant counterarguments, adhere to an interface and not care about the details of the implementation. However, even in the world of static compilation to machine code (e.g. C++), this does not mean you should lock away the source code of the implementation.
Whether someone has access to the implementation, and whether they make use of that knowledge while writing a specific piece of code, are completely different issues. Heck, even the author of the implementation can/should still program to an interface when working on other code (e.g. other modules). Likewise, even if you lock the implementation away from someone, they may very well rely on implementation quirks which are not part of the interface. If anyone in the world of static compilation to machine code provides only headers and object files, and not the source code, it's because the projects are closed source, not to encourage good programming practices among clients.
In Python, your question makes no sense - there are no "interface" and "implementation" files, there's just code which is run and defines functions, classes, and other values. There is no such thing as an interface file you'd provide. You provide an implementation - and (hopefully) documentation which details both the interface and possibly implementation details. And once a module is imported, the class objects, function objects, and other objects contain plenty of information (including, in many cases, the text from which large parts of the documentation were generated). This is also true for extension modules like numpy. And note that their implementation is accessible, it's just not included in all distributions because it's of little use. With Python code, you practically have to distribute the source code, because anything else is platform-specific.
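For instance, once a module is importable you can ask it about itself; a quick sketch with the standard inspect module (json is used only because any importable module works):

import inspect
import json

# classes defined (or re-exported) by the module
print([name for name, obj in inspect.getmembers(json, inspect.isclass)])

# signature and docstring of a function -- effectively its "interface"
print(inspect.signature(json.dumps))
print(json.dumps.__doc__)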
On a side note, .pyc files are pretty high level, and easily understood when disassembled (which is as easy as importing the module and running the stdlib module dis on any function inside). I consider this a minor technicality as it's already the wrong question to ask.
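To see how little a .pyc hides, disassembly really is that short (the class and method names below are hypothetical, reusing the accountClass example from the question):

import dis
import accountClass                    # works the same if only accountClass.pyc is shipped

dis.dis(accountClass.Account.deposit)  # hypothetical class/method; prints bytecode with argument names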
Deitel's advice to C++ programmers doesn't apply to Python, for a number of reasons:
Python isn't compiled to machine code, so no matter what form you provide the program in, it will be relatively easy for someone to read the code.
Python doesn't have .h and .c files; all you can provide is the .py or .pyc files.
Treating code as a secret is kind of silly anyway. What is in your code that you need to keep hidden from others?
NumPy and SciPy are largely implemented in C, which is why you don't get the source with the install; that's purely for your convenience. You can get the source if you like. The "interface" to that code is the module that you can import and then call.
You should not confuse "user interface" with "class interface". If you have a useAccountClass file, that file probably performs some task using the classes and methods defined in the accountClass file, if I understood right.
If you send the file to another person, they are not supposed to "guess" what your compiled class does. That's what DOCUMENTATION is for: a description of the functions contained in the module (compiled or not), which parameters they take, which values they return, and what they are expected to do, i.e. the "meaning" of the task they perform.
As an abstract example, let's suppose you have an image-processing class. If that class has the function findCircles(image), the documentation should explain that it takes an image, possibly containing circles, and returns a list or array of coordinates of the centers of the circles contained in the image. HOW the circles are detected is not important; you don't need to know that to use the function. Now, if the function were called like findCircles(image, gaussian_threshold=10), the caller would have to know that the function uses some "gaussian_threshold" parameter; that is, the caller would NEED to know about the function's entrails, and in OOP this is Not Good. If you decided to use another algorithm in the future, every piece of code using that function would have to be rewritten, because the gaussian_threshold would most probably no longer make sense.
So the interface, in OOP, is the abstraction used to communicate to the object only the canonical parameters or inputs it needs to perform a task, in the language of the problem rather than the language of the implementation (which can change at any time).
The documentation, in this sense, is a contract that assures to the user (in this case, another developer) that the function will perform as expected if sane inputs are given to it.
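In code, that contract can be as simple as a well-documented signature; a sketch of the hypothetical findCircles from above:

def findCircles(image):
    """Return a list of (x, y) center coordinates for every circle
    detected in image.

    The detection algorithm is an implementation detail and may change;
    callers rely only on the input and the returned coordinates.
    """
    ...                                  # implementation detail, hidden from callers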
Now the FINAL USER, a non-technical person wanting to use your program, would need the WHOLE working program (controls and views), not only the class definitions (the model).
Hope this helps, and I must recommend the books "Code Complete 2nd ed." and "Pragmatic Programmer - From Journeyman to Master" as VERY enlightening readings on the broad topic.
So I need to write a parser (or simulator) which would take an input file with simple code written in my own pseudo-code-like language, for instance:
a = 5
b = 5 * a
[FOR 10]
b = b * 5
[ENDFOR]
[IF b>30]
a = a + 3
[ENDIF]
So the pseudo-language only supports integer variables, basic operations on them (+, -, /, *), a basic for loop and a basic if statement. I need to build a parser that will, in the end, deliver the final values of a and b (or any other variables used in the code).
I was thinking of trying to do this in XML, so that I could simulate the loop and the if with tags, but I am not really sure whether this is the right (or most efficient) approach. Any suggestions?
Quick edit ^^:
It's not about creating my own programming language... it's part of a bigger project... I need a simple way of evaluating small snippets of code written like the example and getting the states of the variables used after simulating it... that's why I wanted to use XML... this is not intended to be a programming language of any sort...
Much of this may already be implemented using the examples from the pyparsing wiki, such as this one, or this one, which uses the more current operatorPrecedence helper method.
EDIT: The links to the pyparsing Wikispaces site are dead, but you can find another wiki in the GitHub repository here: https://github.com/pyparsing/pyparsing/wiki
Take a look at PLY, Python's implementation of the glorious lex/yacc. You can definitely write a compiler or interpreter for your language with this tool.
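For a language this small you could also skip the parsing libraries entirely; here is a rough, hand-rolled line-by-line interpreter that handles the example snippet (it does not handle nested blocks, and eval is only acceptable for trusted snippets):

def run(lines, env=None):
    env = {} if env is None else env
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line.startswith("[FOR"):
            count = int(line[4:-1])                  # "[FOR 10]" -> 10
            end = lines.index("[ENDFOR]", i)         # naive: finds the first closing tag
            for _ in range(count):
                run(lines[i + 1:end], env)
            i = end
        elif line.startswith("[IF"):
            cond = line[3:-1].strip()                # "[IF b>30]" -> "b>30"
            end = lines.index("[ENDIF]", i)
            if eval(cond, {}, env):
                run(lines[i + 1:end], env)
            i = end
        elif "=" in line:
            name, expr = line.split("=", 1)
            env[name.strip()] = eval(expr, {}, env)
        i += 1
    return env

code = """a = 5
b = 5 * a
[FOR 10]
b = b * 5
[ENDFOR]
[IF b>30]
a = a + 3
[ENDIF]"""
print(run(code.splitlines()))   # {'a': 8, 'b': 244140625}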
Python is so dynamic that it's not always clear what's going on in a large program, and looking at a tiny bit of source code does not always help. To make matters worse, editors tend to have poor support for navigating to the definitions of tokens or import statements in a Python file.
One way to compensate might be to write a special profiler that, instead of timing the program, would record the runtime types and paths of objects of the program and expose this data to the editor.
This might be implemented with sys.settrace(), which sets a callback for each line of code and is how pdb is implemented, or by using the ast module and an import hook to instrument the code; or is there a better strategy? How would you write something like this without making it impossibly slow, and without running afoul of extreme dynamism, e.g. side effects on property access?
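For illustration, here is the sort of minimal settrace-based recorder I have in mind (purely a sketch; all names are made up):

import sys
from collections import defaultdict

seen_types = defaultdict(set)    # (filename, lineno, varname) -> set of type names

def tracer(frame, event, arg):
    if event == "line":
        where = (frame.f_code.co_filename, frame.f_lineno)
        for name, value in frame.f_locals.items():
            seen_types[where + (name,)].add(type(value).__name__)
    return tracer

def demo(x):
    y = x * 2
    return str(y)

sys.settrace(tracer)
demo(21)
sys.settrace(None)

for key, types in sorted(seen_types.items()):
    print(key, types)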
I don't think you can avoid making it slow, but it should be possible to detect the address of each variable when you encounter a STORE_FAST, STORE_NAME, or other STORE_* opcode.
Whether or not this has been done before, I do not know.
If you need debugging, look at pdb; it will allow you to step through your code and access any variables.
import pdb

def test():
    print(1)
    pdb.set_trace()   # you will enter an interactive prompt here
    print(2)
What if you monkey-patched object's class or another prototypical object?
This might not be the easiest if you're not using new-style classes.
You might want to check out PyChecker's code - it does (I think) what you are looking to do.
Pythoscope does something very similar to what you describe: it uses a combination of static information in the form of an AST and dynamic information gathered through sys.settrace.
BTW, if you have problems refactoring your project, give Pythoscope a try.