Hi, I am looking to implement my own custom file-like object for an internal binary format we use at work (I don't want to go into too much detail, because I'm not sure I'm allowed to). I am trying for a more Pythonic way of doing things: currently we have two functions, read and write (each ~4k lines of code), which do everything. However, we need more control and finesse, hence the rewrite.
I looked at the Python documentation, and it says which methods I need to implement, but it doesn't mention things like `__iter__()` and so on.
Basically, what I would love to do is something like this:
output_file_objs = [
    open("blah.txt", "w"),
    open("blah142.txt", "wb"),
    my_lib.open("internal_file.something", "wb", ignore_something=True),
]
data_to_write = <data>
for f in output_file_objs:
    f.write(data_to_write)
So I can mix it in with the others and have a level of transparency. I will add custom methods to it, but that's not a problem.
Is there any good reference on writing your own custom file-like objects? Any restrictions, or special methods (like `__iter__()`) I should implement?
Or is there a good example from within the Python standard library that I can look at?
What makes up a "file-like" object actually depends on what you intend to use it for; not all methods are required to be implemented (or to have a sane implementation).
Having said that, the file object and iterator docs are what you want.
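To make that concrete, here is a minimal sketch of a writable file-like object that would work in a loop like the one in the question. All the names here (`BinaryFormatFile`, the `bytearray` backend) are invented for illustration, not a real API:

```python
class BinaryFormatFile:
    """Sketch of a writable file-like object (illustrative names only)."""

    def __init__(self, path, mode="wb"):
        self.path = path
        self.mode = mode
        self.closed = False
        self._buffer = bytearray()  # stand-in for the real format's backend

    def write(self, data):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self._buffer.extend(data)
        return len(data)  # file-like writes return the number of bytes written

    def writable(self):
        return True

    def close(self):
        self.closed = True

    # Context-manager protocol, so `with` statements work
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False

    # Iteration can yield records instead of lines for a binary format
    def __iter__(self):
        yield bytes(self._buffer)
```

In practice it is often easier to subclass `io.RawIOBase` and override just `readinto()`/`write()`: the `io` base classes then fill in `readline()`, iteration, and the context-manager methods for you.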
Why not stuff your data in a StringIO? Otherwise, you can look at the documentation and implement all of the file-like methods. Truth be told, there are no real interfaces in Python, and some features (like tell()) may not make sense for your files, so you can leave them unimplemented.
I know it's not very efficient, but I honestly can't find any better way: would the most practical way to manipulate a Python (.py) file, to add/subtract/append code, be to use the basic file I/O included in Python?
For an example:
obj = open('Codemanipulationtest.py', 'w+')
obj.write("print 'This shows you can do basic I/O?'")
obj.close()
will overwrite the file I have, named "Codemanipulationtest.py", writing a print statement into it (note that mode 'w+' truncates the file rather than appending). Is this something that can be built upon, or are there easier or safer/more efficient methods for manipulating and generating Python code?
I've read over this: Parse a .py file, read the AST, modify it, then write back the modified source code
And honestly the I/O method seems easier. I am fairly new to Python, so I may just be missing something obvious; thanks in advance for any responses.
Edit
The point of it all was simply to play around with the effects of modifying code programmatically. I was thinking of hooking up whatever I end up using to some sort of learning algorithm, seeing how well it could generate little bits of code at a time, and seeing where it could go from there.
To generate the code, I would break it out into various classes: an IF class, a FOR class, and so on. Each class has a to_str() method that you can call in turn to produce the output.
statements = [ ... ]
obj = open("some.py", "w+")
for s in statements:
    obj.write(s.to_str())
obj.close()
This way you can extend your project easily and it will be more understandable and flexible. And, it keeps with the use of the simple write method that you wanted.
Depending on the learning algorithm, this breakout into various classes can lend itself quite well to a sort of pseudo-genetic algorithm for code. You can encode the genome as a sequence of statements, and then you just have to find a way to pass parameters to each statement where they are required.
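A minimal sketch of such statement classes (the class names and constructor signatures are invented for illustration) might look like this:

```python
class PrintStatement:
    """Emits a print call for a literal string."""

    def __init__(self, text):
        self.text = text

    def to_str(self):
        return "print(%r)\n" % self.text


class ForStatement:
    """Emits a for loop wrapping a list of nested statements."""

    def __init__(self, var, iterable, body):
        self.var = var
        self.iterable = iterable
        self.body = body  # list of nested statement objects

    def to_str(self):
        lines = "for %s in %s:\n" % (self.var, self.iterable)
        for stmt in self.body:
            # indent nested statements one level
            lines += "    " + stmt.to_str()
        return lines
```

With that, `statements = [ForStatement("i", "range(3)", [PrintStatement("hi")])]` plugs straight into the write loop above.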
It depends on what you'll be doing with the code you're generating. You have a few options, each more advanced than the last.
Create a file and import it
Create a string and exec it
Write code to create classes (or modules) on the fly directly rather than as text, inserting whatever functions you need into them
Generate Python bytecode directly and execute that!
If you are writing code that will be used and modified by other programmers, then the first approach is probably best. Otherwise I recommend the third for most use cases. The last is only for masochists and former assembly language programmers.
If you want to modify existing Python source code, you can sometimes get away with doing simple modifications with basic search-and-replace, especially if you know something about the source file you're working with, but a better approach is the ast module. This gives you an abstract representation of the Python source that you can modify and then compile directly into Python objects.
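For example, renaming a function in existing source with the ast module might look like this (`ast.unparse` requires Python 3.9+; the source snippet and names are made up for the example):

```python
import ast

source = "def greet():\n    print('hello')\n"
tree = ast.parse(source)

# Walk the tree and rename every function definition called 'greet'
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef) and node.name == "greet":
        node.name = "say_hello"

# Turn the modified tree back into source text
new_source = ast.unparse(tree)
```

For anything beyond a one-off rename, `ast.NodeTransformer` gives you a cleaner place to put the rewriting logic.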
I've noticed I have the same piece of code sitting at the top of several of my controllers. They tend to look like this:
def app_description(app):
""" Dictionary describing an app. """
return {'name': app.app,
'id': app.id,
'is_new': app.is_new(),
'created_on': app.created_on.strftime("%m/%d/%Y"),
'configured': app.configured }
I'll call this from a couple different actions in the controller, but generally not outside that controller. It accesses properties. It calls methods. It formats opaque objects (like dates).
My question is: is this controller code, or model code?
The case for controller:
It defines my API.
It's currently only used in that module.
There doesn't seem to be any logic here.
The case for model:
It seems like a description of the data, which the model should be responsible for.
It feels like I might want to use this in other controllers. I haven't needed to yet, but these functions are still pretty new, so I might.
Attaching a function to the object it clearly belongs to seems better than leaving it as a module-level function.
It could be defined more succinctly on the model: something like having the top-level model class define .description(), with subclasses just defining a blacklist/whitelist of properties and overriding the method only when they need to call functions. I'm pretty sure that would be fewer lines of code (it would save me repetition like 'name': app.name), which seems like a good thing.
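Concretely, the whitelist idea I have in mind would look something like this (the `_description_fields` hook and the attribute names are invented for illustration):

```python
class Model:
    _description_fields = ()  # subclasses list the attributes to expose

    def description(self):
        out = {}
        for name in self._description_fields:
            value = getattr(self, name)
            # call zero-argument methods like is_new() automatically
            out[name] = value() if callable(value) else value
        return out


class App(Model):
    _description_fields = ("name", "id", "is_new")

    def __init__(self, name, id):
        self.name = name
        self.id = id

    def is_new(self):
        return True
```

Formatting-heavy fields (like the strftime'd date) would still need a per-class override, which is where this scheme starts to lose its succinctness.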
Not sure which framework you are using, but I would suggest creating this helper functionality in its own class and put it in a shared folder like lib/
Alternatively you could have an application helper module that just has a bunch of these helpful application-wide functions.
Either way, I'd keep it away from both the model and the controller.
The answer I finally decided on:
In the short term, keeping these methods in the controllers is fine. If they define the output, then OK, they can stay there; they're only used in the controller.
There are a couple of things to watch out for, which indicate a method has grown up and needs to go elsewhere:
In one case, I needed access to a canonical serialization of the object. At that point, it moved into the model, as a model method.
In another case, I found that I was formatting all timestamps the same. I have a standard #ajaxify decorator that does things like sets Content-Type headers, does JSON encoding, etc. In this case, I moved the datetime standard formatting into there -- when the JSON encoder hits a datetime (formerly unserializable), it always treats it the same (seconds since the epoch, for me).
In yet a third case, I realized that I was re-using this function in a couple controllers. For that, I pulled it out into a common class (like another answer suggested) and used that to define my "Web API". I'd used this pattern before -- it makes sense for grouping similarly-used data (like timeseries data, or top-N lists).
I suspect there's more, but basically, I don't think they're all as similar as I initially thought. I'm currently happy thinking of them as a convention for simple objects in our (small-ish, new-ish) codebase, with the understanding that after a few iterations, a better solution may present itself. In the meantime, they stay in the controller and define my AJAXy-JSON-only interface.
(Note: I would appreciate help generalizing the title. I am sure this is a class of problem in OO land that probably has a well-known pattern; I just don't know a better way to describe it.)
I'm considering the following: our server script will be called by an outside program and have a bunch of text (usually XML) dumped at it.
There are multiple possible types of data we could be getting, and multiple versions of the data representation we could be getting, e.g. "Report Type A, version 1.2" vs. "Report Type A, version 2.0"
We will generally want to do the same action with all the data: determine what sort and version it is, parse it with a custom parser, and then call a synchronize-to-database function on it.
We will definitely be adding types and versions as time goes on.
So, what's a good design pattern here? I can come up with two, and both seem like they may have some problems.
Option 1
Write a monolithic ID script which determines the type, and then imports and calls the properly named class functions.
Benefits
Probably pretty easy to debug,
Only one file that does the parsing.
Downsides
Seems hack-ish.
It would be nice not to duplicate knowledge of the data formats in two places: once for identification, once for the actual merging.
Option 2
Write an "ID" function for each class that returns Yes / No / Maybe when given identifying text.
The ID script then imports a bunch of classes, instantiates them on the text, and asks whether the text and class type match.
Upsides:
Cleaner, in that everything for a given format lives in one module?
Downsides:
Slower? It depends on the logic for running through the classes.
Put abstractly: should Python instantiate a bunch of classes and consume an ID function, or instantiate one (or many) ID classes each paired with an item class, or something else entirely?
You could use the Strategy pattern, which would let you separate the logic for the different formats into concrete strategies. Your code would typically peek at a portion of the file and then decide on a concrete strategy.
As far as defining the grammar for your files I would find a fast way to identify the file without implementing the full definition, perhaps a header or other unique feature at the beginning of the document. Then once you know how to handle the file you can pick the best concrete strategy for that file handling the parsing and writes to the database.
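A rough sketch of that idea in Python (the report names, version strings, and sniffing logic are all invented for illustration):

```python
class ReportA_v1:
    """Concrete strategy for Report Type A, version 1.2."""

    @staticmethod
    def matches(text):
        # cheap sniff of a header fragment, not a full parse
        return "type-a" in text and "1.2" in text

    def parse(self, text):
        return {"type": "A", "version": "1.2"}


class ReportA_v2:
    """Concrete strategy for Report Type A, version 2.0."""

    @staticmethod
    def matches(text):
        return "type-a" in text and "2.0" in text

    def parse(self, text):
        return {"type": "A", "version": "2.0"}


# Adding a new type/version later is just appending to this list
STRATEGIES = [ReportA_v1, ReportA_v2]


def parse_report(text):
    for strategy in STRATEGIES:
        if strategy.matches(text):
            return strategy().parse(text)
    raise ValueError("unrecognized report format")
```

This keeps the format knowledge in one place per format: each class owns both its identification sniff and its parsing, which addresses the "knowledge in two places" downside of Option 1.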
I've got a codebase of around 5.3k LOC with around 30 different classes. The code is already very well formatted, and I want to improve it further by prefixing methods that are only called within the module they were defined in with a "_", to indicate that. Yes, it would have been a good idea to do that from the beginning, but now it's too late :D
Basically I'm searching for a tool that will tell me if a method is not called outside of the module it was defined in, I'm not looking for stuff that will automatically convert the whole thing to use underscores, just a "simple" thing that tells me where I have to look for prefixing stuff.
I took a look at the AST module, but there's no easy way to get a list of method definitions and calls, and parsing the plain text yields just too many false positives. I don't want to spend days reinventing the wheel when there might be an existing solution to my problem.
For me, this sounds like a special case of coverage.
Thus I'd take a look at coverage.py or figleaf and modify it to ignore inter-module calls.
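If you do end up rolling something yourself, a rough ast-based starting point might look like the function below. It is only a sketch: it conflates same-named methods defined in different classes, which is exactly the false-positive problem the question mentions.

```python
import ast


def defs_and_calls(source):
    """Return (method definitions, attribute-call names) found in source."""
    tree = ast.parse(source)
    defined, called = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            defined.add(node.name)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # records e.g. self.bar() or obj.bar() as a call to "bar"
            called.add(node.func.attr)
    return defined, called
```

Running this over every module and diffing the per-module call sets against the definition sets would at least give you a candidate list to review by hand.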
In my application I have to maintain some global application state and global, application-wide methods: the set of currently connected users, the total number of answers, creating an application config file, etc. There are two options:
Make a separate appstate.py file with global variables and functions over them. It looks fine initially, but it seems I'm losing some clarity in my code.
Create a class AppState with class methods in an appstate.py file; all other modules are defined by their specific jobs. This looks fine, but now I have to write longer lines like appstate.AppState.get_user_list(). Moreover, the methods are not very related to each other. I could create separate classes, but that would be too many classes.
EDIT: If I use classes I will be using classmethods. I don't think there is a need to instantiate the class to an object.
Sounds like the classic conundrum :-).
In Python, there's nothing dirty or shameful about choosing to use a module if that's the best approach. After all, modules, functions, and the like are, in fact, first-class citizens in the language, and offer introspection and properties that many other programming languages offer only by the use of objects.
The way you've described your options, it kinda sounds like you're not too crazy about a class-based approach in this case.
I don't know if you've used the Django framework, but if not, have a look at its documentation on how it handles settings. These are app-wide, they are defined in a module, and they are available globally. The way it parses the options and exposes them globally is quite elegant, and you may find such an approach inspiring for your needs.
The second approach is only significantly different from the first approach if you have application state stored in an instance of AppState, in which case your complaint doesn't apply. If you're just storing stuff in a class and using static/class methods, your class is no different than a module, and it would be pythonic to instead actually have it as a module.
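In concrete terms, the plain-module version could be as simple as this (the file name and function names are just illustrative):

```python
# appstate.py -- module-level state; the module itself acts as the singleton
_connected_users = set()


def add_user(user):
    _connected_users.add(user)


def remove_user(user):
    _connected_users.discard(user)


def get_user_list():
    return sorted(_connected_users)
```

Callers just `import appstate` and write `appstate.get_user_list()`, with no class name in the middle.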
The second approach seems better. I'd use the first one only for configuration files or something.
Anyway, to avoid the problem you could always:
from myapp.appstate import AppState
That way you don't have to write the long line anymore.
Why not go with an instance of that class? That way you might even be able, later on, to have two different "sessions" running, depending on which instance you use; it makes things more flexible. Maybe add a get_appstate() function to the module that instantiates the class once. Later, if you want several instances, you can change this function to take a parameter and store those instances in a dictionary or the like.
You could also use property decorators, by the way, to make things more readable while keeping the flexibility of storing the state how and where you want.
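A sketch of that get_appstate() idea (all names invented for illustration):

```python
class AppState:
    def __init__(self):
        self.connected_users = []

    def add_user(self, user):
        self.connected_users.append(user)


_instance = None


def get_appstate():
    """Return the shared AppState, creating it on first use."""
    global _instance
    if _instance is None:
        _instance = AppState()
    return _instance
```

Every caller gets the same instance today, but switching to multiple keyed "sessions" later only means changing get_appstate(), not its callers.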
I agree that it would be more pythonic to use the module approach instead of classmethods.
BTW, I am not a big fan of having things available globally by some "magic". I'd rather use an explicit call to obtain that information; then I know where things come from and how to debug them when they fail.
Consider this example:
configuration
|
+-> graphics
| |
| +-> 3D
| |
| +-> 2D
|
+-> sound
The real question is: What is the difference between classes and modules in this hierarchy, as it could be represented by both means?
Classes represent types. If you implement your solution with classes instead of modules, you are able to check a graphics object for its proper type, yet still write generic graphics functions.
With classes you can generate parametrized values. This means it is possible to initialize the sound class differently via its constructor, but it is hard to initialize a module with different parameters.
The point is that classes and modules really are different things from a modeling standpoint.
I would go with the classes route as it will better organize your code. Remember that for readability you can do this:
from appstate import AppState
I'd definitely go for the second option: having already used the first one, I'm now forced to refactor, as my application evolved and has to support more modular constructs, so I now need to handle multiple simultaneous 'configurations'.
The second approach is, IMO, more flexible and future proof. To avoid the longer lines of code, you could use from appstate import AppState instead of just import appstate.