Python lib to Read a Flash swf Format File - python

I'm interested in using Python to hack on the data in Flash swf files. There is good documentation available on the format of swf files, and I am considering writing my own Python lib to parse that data out using the standard Python struct lib.
Does anybody know of a Python project that already does this? I would also be interested in any available solutions that use Perl, Ruby, Haskell, etc.

Well, unless you're doing it for fun (in which case, go for it!), why not use Ming? It supposedly has Python wrappers.

I found another option in SWF Tools. They provide a Python wrapper that supports generating SWF files in Python.
I'm not sure if either SWF Tools or Ming actually supports parsing in and modifying an existing swf file, however. Both seem geared more towards generating swf files from scratch.
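If you do end up writing your own parser with struct, as the question suggests, the fixed 8-byte prefix of a SWF file is a reasonable place to start. The sketch below is only an illustration, not code from Ming or SWF Tools; the function names read_swf_header and swf_body are made up for this example, and it assumes the whole file is already in memory as bytes. It stops before the bit-packed frame rectangle and the tag stream, which need more work.

```python
import struct
import zlib

# Signature bytes mapped to how the data after the 8-byte prefix is stored.
SIGNATURES = {b"FWS": "uncompressed", b"CWS": "zlib", b"ZWS": "lzma"}


def read_swf_header(data):
    """Parse the fixed 8-byte prefix of a SWF file held in `data` (bytes)."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes for a SWF header")
    # "<3sBI": little-endian, 3-byte signature, 1-byte version, 4-byte length
    signature, version, file_length = struct.unpack("<3sBI", data[:8])
    if signature not in SIGNATURES:
        raise ValueError("not a SWF file: %r" % signature)
    return {
        "compression": SIGNATURES[signature],
        "version": version,
        "file_length": file_length,  # length of the uncompressed file
    }


def swf_body(data):
    """Return everything after the 8-byte prefix, decompressed if needed."""
    header = read_swf_header(data)
    if header["compression"] == "uncompressed":
        return data[8:]
    if header["compression"] == "zlib":
        return zlib.decompress(data[8:])
    raise NotImplementedError("LZMA-compressed SWF is not handled in this sketch")
```

Typical use would be something like `swf_body(open("movie.swf", "rb").read())`, where movie.swf is a placeholder file name.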

Related

Using libexif in Python

I'm writing a Python-based [web] application that needs to be able to read and write EXIF data.
libexif seems to have all the right ingredients, but I can't work out how (or if) I could access it using Python's ctypes library. I'm new to C; I suppose I need a .so for this to work?
You need to be running on an OS for which you can obtain the required library, and you need to download its .h files (the -dev package usually provides these).
Then work your way through the ctypes tutorial, which explains all the steps you need to take.
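As a rough sketch of what the first steps might look like: the snippet below locates the shared library with ctypes.util.find_library, loads it, and declares the signatures of two libexif functions, exif_data_new_from_file and exif_data_unref. The helper names load_libexif and can_load_exif are invented for this example, and it assumes libexif is installed under the name "exif"; it only checks whether EXIF data can be loaded and does not read or write any tags.

```python
import ctypes
import ctypes.util


def load_libexif():
    """Return a ctypes handle to libexif, or None if it is not installed."""
    path = ctypes.util.find_library("exif")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    # Declare argument and return types so pointers are not truncated to int.
    lib.exif_data_new_from_file.argtypes = [ctypes.c_char_p]
    lib.exif_data_new_from_file.restype = ctypes.c_void_p
    lib.exif_data_unref.argtypes = [ctypes.c_void_p]
    lib.exif_data_unref.restype = None
    return lib


def can_load_exif(lib, filename):
    """True if libexif returns an ExifData object for `filename`."""
    data = lib.exif_data_new_from_file(filename.encode())
    if not data:
        return False
    lib.exif_data_unref(data)  # release the reference we were given
    return True
```

Reading individual tags means mirroring more of the C structs from the .h files in ctypes, which is where the tutorial becomes necessary.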

solution to convert PDFs, DOCs, DOCXs into a textual format with python

I am developing a full-text search engine for indexing popular binary formats. I know that there are hundreds of such questions (and solutions) already, but I found it tough to find one that meets all of these requirements:
cross platform
supports DOC, DOCX and PDF formats at once
easy to use with python
can be set up in a major shared host
For PDFs, I recommend PDFminer.
For DOCX, try the docx module (I have not used it myself).
I am not aware of any pure python module that can read .doc files.
There are command-line tools to extract text from .doc files: antiword and catdoc (and probably others). If the packages are installed on your shared host, you could use subprocess to shell out to these tools. Available on Windows via Cygwin.
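Shelling out to these tools could look roughly like the following. This is a sketch that assumes antiword or catdoc is on the PATH and that the tool prints the extracted text to standard output when given a file name; the function name doc_to_text is made up for this example.

```python
import shutil
import subprocess


def doc_to_text(path, tools=("antiword", "catdoc")):
    """Extract text from a .doc file with the first available tool."""
    for tool in tools:
        exe = shutil.which(tool)  # None if the tool is not installed
        if exe is None:
            continue
        # check=True raises CalledProcessError if the tool exits with an error
        result = subprocess.run([exe, path], capture_output=True, check=True)
        return result.stdout.decode("utf-8", errors="replace")
    raise FileNotFoundError("none of %s found on PATH" % ", ".join(tools))
```

The output encoding depends on the tool and its options, so the UTF-8 decoding here is an assumption you may need to adjust.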
Apache POI is a Java library that can extract text from Office documents. If your shared host has Java installed, you could write a bit of Java (or Jython) code and execute using subprocess.
If you can run OpenOffice on the server, you can use unoconv, which converts between any document formats supported by OpenOffice.
One possible solution is to use Google Docs to extract the text contents from binary .doc files. You upload the document to Google Docs and then download the text contents. It is a fairly slow process, but it is the only "pure Python" solution I know of, since it doesn't require any external tools except for network access. An external tool such as catdoc or antiword is a much better solution if you are allowed to install it on your host.
Textract wraps the usual extraction tool for each kind of file behind a single interface:
https://github.com/deanmalmgren/textract

looking for pure python package to create images of websites

I've previously used http://code.google.com/p/wkhtmltopdf/ with http://pypi.python.org/pypi/wkhtmltopdf/0.2 to create screenshots of websites from the command line. However, I was wondering whether a pure Python package exists that can do the same. Currently I always need to download the correct binary of http://code.google.com/p/wkhtmltopdf/ if I switch computers. A pure Python package would relieve me of this. Any ideas?
That would require a browser engine written in pure Python. And this means you need a CSS processor and, more importantly, a complete JavaScript engine written in Python. While this is undoubtedly possible, I'm pretty sure nobody has done it.

Resources (resx) with Python

Do you know of any Python module for resources (resx files) manipulation?
P.S.: I know I could write a custom wrapper on top of a basic XML processor; I'm just checking before I go and hack my own code.
The question "Resources (resx) maintenance in big projects" has an answer pointing to some .NET source code for a tool to manage RESX resources. Since IronPython can interface with any existing .NET objects written in C#, you should be able to adapt that RESX tool source code into an object that you can then use in IronPython.
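If you go the custom-wrapper route mentioned in the question instead, a minimal reader on top of the standard library's xml.etree.ElementTree might look like this. It is a sketch that assumes the usual .resx layout, with data elements carrying a name attribute and a value child, and it skips entries that declare a type or mimetype because those hold non-string payloads. The function name read_resx_strings is made up for this example.

```python
import xml.etree.ElementTree as ET


def read_resx_strings(source):
    """Return {name: text} for plain string entries in a .resx file.

    `source` is a file name or an open file object.
    """
    root = ET.parse(source).getroot()
    strings = {}
    for data in root.iter("data"):
        # Typed or MIME-encoded entries (images, serialized objects) are skipped.
        if data.get("type") or data.get("mimetype"):
            continue
        value = data.find("value")
        text = value.text if value is not None else None
        strings[data.get("name")] = text or ""
    return strings
```

Writing changes back would follow the same pattern: update the value elements on the parsed tree and call its write method.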

OLE Compound Documents in Python

How would you parse a Microsoft OLE compound document using Python?
Edit: Sorry, I forgot to say that I need write support too. In short, I have an OLE compound file that I have to read, modify a bit and write back to disk (it's a file made with a CAD application).
Just found OleFileIO_PL. It originally had no write support, but as of version 0.40 (2014) it does.
Edit: Looks like there's another way (though Windows-only) that supports writing too: the pywin32 extensions (the StgOpenStorage function and related).
An alternative: The xlrd package has a reader. The xlwt package (a fork of pyExcelerator) has a writer. They handle file sizes of hundreds of MB cheerfully; the packages have been widely used for about 4 years. The compound document modules are targeted at getting "Workbook" streams into and out of Excel .xls files as efficiently as possible, but are reasonably general-purpose. Unlike OleFileIO_PL, they don't provide access to the internals of Property streams.
http://pypi.python.org/pypi/xlrd
http://pypi.python.org/pypi/xlwt
If you decide to use them and need help, ask in this forum:
http://groups.google.com/group/python-excel
For completeness: on Linux there's also the GNOME Structured File Library (but the default package for Debian/Ubuntu has Python support disabled, since the Python bindings have been unsupported since 2006) and the POIFS Java library.
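Before picking one of these libraries, it can help to confirm that the CAD file really is an OLE compound document. The sketch below reads only the start of the header with struct, following the layout in Microsoft's compound file specification (8-byte signature, 16-byte CLSID, then version, byte-order and sector-shift fields). It is not taken from any of the packages above, and the function name read_ole_header is made up for this example.

```python
import struct

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def read_ole_header(data):
    """Parse the first 34 bytes of an OLE compound file held in `data`."""
    # "<8s16sHHHHH": signature, CLSID, minor version, major version,
    # byte-order mark, sector shift, mini-sector shift (all little-endian)
    fields = struct.unpack("<8s16sHHHHH", data[:34])
    signature, _clsid, minor, major, byte_order, shift, mini_shift = fields
    if signature != OLE_MAGIC:
        raise ValueError("not an OLE compound document")
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        "sector_size": 1 << shift,          # sizes are stored as powers of two
        "mini_sector_size": 1 << mini_shift,
    }
```

Walking the sector allocation tables and directory entries is considerably more involved, which is the part the libraries above take care of.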
