Need example/help with GtkTextBuffer (of GtkTextView) serialize/deserialize - python

I am trying to save user's bold/italic/font/etc tags in a GtkTextView.
Using GtkTextBuffer.get_text() does not return the tags.
The best documentation I have found on this is:
http://www.pygtk.org/docs/pygtk/class-gtktextbuffer.html#method-gtktextbuffer--register-serialize-format
However, I do not understand the function arguments.
It would be infinitely handy to have an example of how these are used to save/load a textview with tags in it.
Edit: I would like to clarify what I am trying to accomplish. Basically I want to save/load the textview's text+tags. I have no desire to do anything more complicated than that. I am using pickle as the file format, so I don't need any help here on how to save it or in what format. I just need a way to pull/push the data so that the user loses nothing that he/she sees on screen. Thank you.

If you need to save the tags because you just want to copy the text into another text buffer, you can use gtk.TextBuffer.insert_range().
If you need to save the text with tags into another format readable by other programs, I once wrote a library with a GTK text buffer serializer to and from RTF. It doesn't have any Python bindings though. But in any case the code is a good example of how to use the serializer facility. Link: Osxcart

I haven't worked with GtkTextBuffer's serialization. Reading the documentation you linked, I would suggest trying the default serializer, by calling
textbuffer.register_serialize_tagset()
This gives you GTK+'s built-in proprietary serializer. Being proprietary here means that it doesn't serialize into some well-known format; but if all you need is the ability to save out the text buffer's contents and load them back, this should be fine.
Of course the source code is available inside GTK+ if you really want to figure out how it works; I would recommend against trying to implement e.g. a stand-alone de-serializer though, since there are probably no guarantees made by GTK+ that the format will remain as-is.

Related

restructuredtext: what is the best way of extracting bibliographic and other fields?

I am putting together a website where the content is maintained as reStructuredText that is then converted into HTML. I need more control than e.g. rst2html.py offers, so I am using my own Python script that uses things like
docutils.core.publish_parts(source, writer_name='html')
to create the html.
publish_parts() gives me useful parts like the title, body, etc. However, it seems I must look elsewhere to get the values of rst fields like
:Authors:
:version:
etc. For this, I have been using publish_doctree() as in
doctree = core.publish_doctree(source).asdom()
and then going through this recursively using getElementsByTagName() as in
doctree.getElementsByTagName('authors')
doctree.getElementsByTagName('version')
etc.
Using publish_doctree() to extract fields does the job, and that's good, but it does seem more convoluted than using e.g. publish_parts().
My question is simply whether this is the best recommended way of extracting out these rst fields, or is there a more direct and less convoluted way? If not, that is fine, but I thought I would inquire in case I am missing something.

Using Python's basic I/O to manipulate or create Python Files?

Would the most efficient way to manipulate a Python (.py) file, to add, remove, or append code, be to use Python's basic built-in file I/O? I know it's not very efficient, but I honestly can't find any better way.
For an example:
obj = open('Codemanipulationtest.py', 'w+')
obj.write("print 'This shows you can do basic I/O?'")
obj.close()
This manipulates a file I have, named "codemanipulationtest.py", and writes a print statement to it. Is this something that can be built upon, or are there easier, safer, or more efficient methods for manipulating or creating new Python code?
I've read over this: Parse a .py file, read the AST, modify it, then write back the modified source code
And honestly it seems like the I/O method is easier. I am kind of newbish to Python so I may just be acting stupid.....thanks in advance for any responses.
Edit
The point of it all was simply to play around with the effects of playing around with the code. I was thinking of hooking up whatever I end up using to some sort of learning algorithm and seeing how well it could generate little bits of code at a time, and seeing where it could go from there....
To go about with generating the code I would break it out into various classes, IF class, FOR class, and so on. Then you can use the output wherein each class has a to_str() method that you can call in turn.
statements = [ ... ]
# The with block closes the file even if a to_str() call raises
with open("some.py", "w") as obj:
    for s in statements:
        obj.write(s.to_str())
This way you can extend your project easily and it will be more understandable and flexible. And, it keeps with the use of the simple write method that you wanted.
Depending on the learning algorithm this break out of the various classes can lead quite well into a sort of pseudo genetic algorithm for code. You can encode the genome as a sequence of statements and then you just have to find a way to go about passing parameters to each statement if they are required and such.
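A minimal sketch of what such statement classes might look like, in Python 3. The class names (Statement, Print, For) and the indent parameter are hypothetical, chosen only for illustration; the answer above specifies only that each class has a to_str() method.

```python
class Statement:
    """Base class: every statement can render itself as source text."""

    def to_str(self, indent=0):
        raise NotImplementedError


class Print(Statement):
    def __init__(self, text):
        self.text = text

    def to_str(self, indent=0):
        # %r quotes and escapes the text so the output is a valid literal
        return "    " * indent + "print(%r)\n" % self.text


class For(Statement):
    def __init__(self, var, iterable, body):
        self.var = var
        self.iterable = iterable
        self.body = body  # list of Statement objects, assumed non-empty

    def to_str(self, indent=0):
        header = "    " * indent + "for %s in %s:\n" % (self.var, self.iterable)
        # Nested statements are rendered one indentation level deeper
        return header + "".join(s.to_str(indent + 1) for s in self.body)


statements = [Print("start"), For("i", "range(2)", [Print("loop")])]
source = "".join(s.to_str() for s in statements)
```

The resulting source string can be written to a file exactly as in the loop above, or passed to compile() first to check that it is syntactically valid.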
It depends on what you'll be doing with the code you're generating. You have a few options, each more advanced than the last.
Create a file and import it
Create a string and exec it
Write code to create classes (or modules) on the fly directly rather than as text, inserting whatever functions you need into them
Generate Python bytecode directly and execute that!
If you are writing code that will be used and modified by other programmers, then the first approach is probably best. Otherwise I recommend the third for most use cases. The last is only for masochists and former assembly language programmers.
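As a rough illustration of the second and third options, here is a small Python 3 sketch. The names double, greet, and Generated are made up for the example; only exec() and type() are the real built-ins being demonstrated.

```python
# Option 2: build source text and exec it into a private namespace,
# so the generated names do not leak into the caller's globals.
source = "def double(x):\n    return x * 2\n"
namespace = {}
exec(source, namespace)
double = namespace["double"]


# Option 3: build a class on the fly with type(); no source text involved.
def greet(self):
    return "hello from " + self.name


Generated = type(
    "Generated",                                  # class name
    (object,),                                    # base classes
    {"name": "generated code", "greet": greet},   # attributes and methods
)
instance = Generated()
```

Only exec strings that your own program produced; running text from an untrusted source this way is arbitrary code execution.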
If you want to modify existing Python source code, you can sometimes get away with doing simple modifications with basic search-and-replace, especially if you know something about the source file you're working with, but a better approach is the ast module. This gives you an abstract representation of the Python source that you can modify and then compile directly into Python objects.
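A minimal sketch of the ast approach in Python 3, assuming a toy transformation (turning every addition into a multiplication). The AddToMult class and the area function are hypothetical; ast.parse, ast.NodeTransformer, ast.fix_missing_locations, and compile are standard.

```python
import ast

source = "def area(w, h):\n    return w + h\n"


class AddToMult(ast.NodeTransformer):
    """Rewrite every `a + b` expression into `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.parse(source)
tree = AddToMult().visit(tree)
ast.fix_missing_locations(tree)  # fill in line/column info for new nodes

# Compile the modified tree directly into a code object and run it
namespace = {}
exec(compile(tree, filename="<generated>", mode="exec"), namespace)
area = namespace["area"]
```

Here the tree is compiled straight to a code object, so no source text is written back. Turning a modified tree back into source is a separate problem.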

Is it possible to inject shell/python commands from a configuration file?

Say you have some metadata for a custom file format that your Python app reads. Something like a CSV with variables that can change as the file is manipulated:
var1,data1
var2,data2
var3,data3
So if the user can manipulate this metadata, do you have to worry about someone crafting a malformed metadata file that allows arbitrary code execution? The only thing I can imagine is if you made the poor choice to make var1 a shell command that you execute with os.system(data1) in your own code somewhere. Also, if this were C then you would have to worry about buffer overflows, but I don't think you have to worry about that with Python. If you're reading that data in as a string, is it possible to somehow escape the string, e.g. "\n os.system('rm -r /')"? This SQL-like example totally won't work, but is something similar possible?
If you are doing what you say there (plain text, just reading and parsing a simple format), you will be safe. As you indicate, Python is generally safe from the more mundane memory corruption errors that C developers can create if they are not careful. The SQL injection scenario you note is not a concern when simply reading in files in python.
However, if you are concerned about security, which it seems you are (interjection: good for you! A good programmer should be lazy and paranoid), here are some things to consider:
Validate all input. Make sure that each piece of data you read is of the expected size, type, range, etc. Error early, and don't propagate tainted variables elsewhere in your code.
Do you know the expected names of the vars, or at least their format? Make sure to validate that each one is the kind of thing you expect before you use it. If it should be just letters, confirm that with a regex or similar.
Do you know the expected range or format of the data? If you're expecting a number, make sure it's a number before you use it. If it's supposed to be a short string, verify the length; you get the idea.
What if you get characters or bytes you don't expect? What if someone throws unicode at you?
If any of these are paths, make sure you canonicalize and know that the path points to an acceptable location before you read or write.
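The checks above could be sketched roughly as follows in Python 3, assuming the "name,value" format from the question. The function name parse_metadata, the name pattern, and the length limit are all assumptions made for illustration, not requirements of any library.

```python
import csv
import io
import re

# Assumed policy: names look like identifiers of at most 32 characters,
# and values are short strings. Adjust both to your own format.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")
MAX_VALUE_LEN = 256


def parse_metadata(text):
    """Parse 'name,value' rows, rejecting anything unexpected."""
    result = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError("expected exactly two fields: %r" % (row,))
        name, value = row
        if not NAME_RE.fullmatch(name):
            raise ValueError("bad variable name: %r" % (name,))
        if len(value) > MAX_VALUE_LEN:
            raise ValueError("value too long for %r" % (name,))
        if name in result:
            raise ValueError("duplicate variable: %r" % (name,))
        result[name] = value
    return result
```

The data is only ever treated as strings and nothing is executed, so a line such as " os.system('rm -r /')" is rejected as malformed rather than run.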
Some specific things not to do:
os.system(attackerControlledString)
eval(attackerControlledString)
__import__(attackerControlledString)
pickle/unpickle attacker controlled content (here's why)
Also, rather than rolling your own config file format, consider ConfigParser or something like JSON. A well understood format (and libraries) helps you get a leg up on proper validation.
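For instance, with JSON the parsing is handled by the standard json module and only the validation is left to you. A minimal Python 3 sketch, where the EXPECTED schema and the load_config name are hypothetical:

```python
import json

# Assumed schema: the keys we accept and the exact type each must have
EXPECTED = {"var1": str, "var2": int}


def load_config(text):
    """Load a JSON object and check it against the expected keys and types."""
    data = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    unknown = set(data) - set(EXPECTED)
    if unknown:
        raise ValueError("unexpected keys: %s" % sorted(unknown))
    for key, expected_type in EXPECTED.items():
        # type() is used instead of isinstance() so that True/False
        # are not accepted where an int is expected
        if type(data.get(key)) is not expected_type:
            raise ValueError("bad or missing value for %r" % (key,))
    return data
```

Unlike pickle, json.loads only builds plain data (dicts, lists, strings, numbers), so loading it does not run code from the file.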
OWASP would be my normal go-to for providing a "further reading" link, but their Input Validation page needs help. In lieu, this looks like a reasonably pragmatic read: "Secure Programmer: Validating Input". A slightly dated but more python specific one is "Dealing with User Input in Python"
Depends entirely on the way the file is processed, but generally this should be safe. In Python, you have to put in some effort if you want to treat text as code and execute it.

Converting Python App into Django

I've got a Python program with about a dozen classes, with several classes possessing instances of other classes, e.g. ObjectA has a list of ObjectB's, and a dictionary of (ObjectC, ObjectD) pairs.
My goal is to put the program's functionality on a website.
I've written and tested JSON encode and decode methods for each class. The problem as I see it now is that I need to choose between starting over and writing the models and logic afresh from a database perspective, or simply storing the python objects (encoded as JSON) in the database, and pulling out the saved states for changes.
Can someone confirm that these are both valid approaches, and that I'm not missing any other simple options?
What I think you can do is convert the classes you have already written into Django model classes. Of course, only the ones that need to be saved to a database. As for the other classes, like the rest of the code, I recommend encapsulating them as helper code. That way you don't have to change your code too much and it's going to work fine. ;D
Or, another choice that may be easier to implement: put everything in a helper module, the classes, the functions and everything else.
Then you'll just need to call the functions in your views and define models to save your data into the database.
Your idea of saving the objects as JSON on the database works, but it's ugly. ;)
Anyway, if you are in a hurry to deliver the website, anything is valid. Just remember that things made in this way always give us lots of problems in the future.
Hope this is useful! :D

Creating a custom file like object python suggestions?

Hi, I am looking to implement my own custom file-like object for an internal binary format we use at work (I don't really want to go into too much detail because I don't know if I can). I am trying to go for a more Pythonic way of doing things, since currently we have two functions, read and write (each ~4k lines of code), which do everything. However, we need more control/finesse, hence my rewriting this stuff.
I looked at the Python documentation and it says what methods I need to implement, but doesn't mention stuff like iter() etc.
Basically what i would love to do is stuff like this:
output_file_objs = [
    open("blah.txt", "w"),
    open("blah142.txt", "wb"),
    my_lib.open("internal_file.something", "wb", ignore_something=True),
]
data_to_write = <data>
for f in output_file_objs:
    f.write(data_to_write)
So I can mix it in with the others and basically have a level of transparency. I will add custom methods to it, but that's not a problem.
Is there any good reference on writing your own custom file-like objects? For example, are there any restrictions or special methods (such as iter) that I should implement?
Or is there a good example of one from within the python standard library that i can look at?
What makes up a "file-like" actually depends on what you intend to use it for; not all methods are required to be implemented (or to have a sane implementation).
Having said that, the file and iterator docs are what you want.
Why not stuff your data in StringIO? Otherwise, you can look at the documentation and implement all of the file like methods. Truth be told, there are no real interfaces in Python, and some features (like tell()) may not make sense for your files, so you can leave them unimplemented.
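One way to get most of the file protocol for free is to subclass one of the base classes in the io module and implement only what makes sense for your format. The sketch below is a Python 3 illustration under assumptions: the PrefixedWriter class and its header argument are hypothetical stand-ins for the internal format, not a real library API.

```python
import io


class PrefixedWriter(io.RawIOBase):
    """Hypothetical write-only file-like object wrapping another binary stream.

    io.RawIOBase supplies close(), the closed attribute, flush() and
    context-manager support; only writable() and write() are defined here.
    """

    def __init__(self, raw, header=b""):
        super().__init__()
        self._raw = raw
        if header:
            self._raw.write(header)  # stand-in for a format-specific header

    def writable(self):
        return True

    def write(self, data):
        self._raw.write(data)
        return len(data)  # file-like write() reports how much was written


# Mix the custom object with an ordinary in-memory binary file
plain = io.BytesIO()
sink = io.BytesIO()
output_file_objs = [plain, PrefixedWriter(sink, header=b"HDR1")]
for f in output_file_objs:
    f.write(b"payload")
```

Methods that make no sense for the format, such as read() or seek() here, are left unimplemented, and the base class reports the object as not readable and not seekable.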
