Open Alteryx .yxdb file in Python?

Is there a way to import .yxdb (Alteryx database files) into Pandas/Python, without using Alteryx as a go-between?

The short answer is no, not at this time.
Longer answer: the raw C++ for .yxdb support is available on GitHub; it was open sourced to comply with R licensing when Alteryx hooked into R. Ned Harding explains the background on his blog. So everything is there for someone to build Python support on top of the open source C++, but nobody has done so yet.

This is not Python-specific, but there is a YXDB-to-SQLite command-line tool based on the C++ library open sourced by Alteryx.
Limitations:
Not a Python module: use subprocess to invoke the command, then pandas/sqlite3 to read from the SQLite database file.
Reads YXDB into SQLite only: writing to YXDB is not implemented (although the Alteryx library would allow it).
Disclaimer: I'm the author of the fork.
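The subprocess-then-sqlite3 workflow described above could look roughly like the sketch below. The executable name `yxdb2sqlite` and its argument order are assumptions made for illustration, not the tool's documented interface; check the tool's own help before relying on them. The helper names are hypothetical as well.

```python
import sqlite3
import subprocess
from contextlib import closing


def convert_yxdb(yxdb_path, sqlite_path, converter="yxdb2sqlite"):
    """Run the external converter (hypothetical name and argument order)."""
    subprocess.run([converter, yxdb_path, sqlite_path], check=True)


def list_tables(sqlite_path):
    """Return the names of the tables in the SQLite file."""
    with closing(sqlite3.connect(sqlite_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_table(sqlite_path, table):
    """Return (column_names, rows) for one table."""
    quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier
    with closing(sqlite3.connect(sqlite_path)) as conn:
        cursor = conn.execute("SELECT * FROM " + quoted)
        columns = [description[0] for description in cursor.description]
        return columns, cursor.fetchall()
```

With pandas installed, `pandas.read_sql_query("SELECT * FROM some_table", conn)` on an open sqlite3 connection returns a DataFrame instead of raw rows.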

Related

Solution to convert PDFs, DOCs, DOCXs into a textual format with Python

I am developing a full text search engine for indexing popular binary formats. I know that there are hundreds of such questions (and solutions) already, but I found it tough to find one that:
is cross-platform
supports DOC, DOCX and PDF formats at once
is easy to use with Python
can be set up on a major shared host
For PDFs, I recommend PDFMiner.
For DOCX files, try the docx module (I have not used it myself).
I am not aware of any pure Python module that can read .doc files.
There are command-line tools to extract text from .doc files: antiword and catdoc (and probably others). If the packages are installed on your shared host, you could use subprocess to shell out to these tools. They are also available on Windows via Cygwin.
Apache POI is a Java library that can extract text from Office documents. If your shared host has Java installed, you could write a bit of Java (or Jython) code and execute it using subprocess.
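Shelling out to antiword or catdoc, as suggested above, might look like this minimal sketch. It assumes the tool is on PATH and prints the extracted text to standard output when given a file name; the function name is hypothetical, and the output encoding depends on how the tool is configured, so undecodable bytes are replaced rather than raising.

```python
import shutil
import subprocess


def extract_doc_text(path, tools=("antiword", "catdoc")):
    """Return the text of a .doc file using the first tool found on PATH."""
    for tool in tools:
        executable = shutil.which(tool)
        if executable is None:
            continue  # tool not installed, try the next one
        result = subprocess.run([executable, path], capture_output=True, check=True)
        return result.stdout.decode("utf-8", errors="replace")
    raise RuntimeError("none of these tools is installed: " + ", ".join(tools))
```

Calling `extract_doc_text("report.doc")` raises `RuntimeError` when neither tool is available, which makes it easy to fall back to another approach on hosts where you cannot install packages.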
If you can use OpenOffice on the server side, you can use unoconv, which converts between any document formats supported by OpenOffice.
One possible solution is to use Google Docs to extract the text contents from binary .doc files. You upload the document to Google Docs and then download the text contents. It is a fairly slow process, but it is the only "pure Python" solution I know of, since it doesn't require any external tools, only network access. An external tool such as catdoc or antiword is a much better solution if you are allowed to install it on your host.
Textract wraps the usual extraction tool for each kind of file behind a single interface.
https://github.com/deanmalmgren/textract

Can I package CPython with my program?

I used the CPython API to load .py files from C/C++.
But if I don't want to install CPython on the client machine, can I ship the CPython DLL with my program?
How would I do that?
Installer-builders like PyInstaller (cross-platform) and py2exe (Windows only) basically do that job for you in a general way, except that the executable at the heart of the produced package is their own instead of yours.
But basically, you can imitate their behavior by setting up a .zip file with all the Python library modules you need (or just zip up everything in the standard Python library if you want to allow Python code running from your app to import anything from there), and following the simple advice in the Embedding Python in Another Application section of the Python docs.
Note that embedding Python equals extending Python plus a little bit of code to initialize and finalize the interpreter itself and a little bit of packaging as I just mentioned; if you've never written Python extensions I would suggest practicing that first, since it's the most substantial part of the task (not all that hard with helpers such as Boost.Python, but more work if you choose to do it at the "bare C" level instead).
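The packaging step (a .zip holding the library modules) can be sketched in Python itself. The helper names below are hypothetical, and adding the archive to `sys.path` only simulates what the embedding application would do when it configures the interpreter's module search path. Python can import pure-Python modules from a zip archive, but compiled extension modules (.pyd/.so) cannot be loaded that way and would have to ship as separate files.

```python
import importlib
import sys
import zipfile
from pathlib import Path


def bundle_modules(source_dir, zip_path, top_level_names=None):
    """Zip the .py files under source_dir, keeping package-relative paths."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for py_file in sorted(source_dir.rglob("*.py")):
            relative = py_file.relative_to(source_dir)
            top_level = Path(relative.parts[0]).stem  # module or package name
            if top_level_names is None or top_level in top_level_names:
                archive.write(py_file, relative.as_posix())
    return str(zip_path)


def import_from_bundle(zip_path, module_name):
    """Put the archive on the module search path and import from it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Passing `top_level_names={"json", "re"}` would restrict the archive to those top-level modules or packages; leaving it as `None` bundles everything under `source_dir`.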
You don't need to install Python on the target machine to embed it in applications. The core of the Python interpreter is available as a shared library which you can dynamically load in your application and distribute with it.
Read about embedding Python in the official docs. Also, this article seems nice and comprehensive for Linux. For Windows, read the notes here.
Here's another SO question that discusses this issue.
The Python license is probably hard to understand for a non-lawyer or a non-native English speaker, so in short: yes, you can redistribute the unmodified DLL, as it contains the copyright notice within it.
It would be polite to give credit like "This program contains the Python Language Interpreter version X.XX; see http://python.org for more information" or similar somewhere in the program or documentation.

How to extract a Windows cabinet file in Python

Is it somehow possible to extract .cab files in Python?
Not strictly answering what you asked, but if you are running on a Windows platform you could spawn a process to do it for you.
Taken from Wikipedia:
Microsoft Windows provides two command-line tools for creation and extraction of CAB files. They are MAKECAB.EXE (included within Windows packages such as 'ie501sp2.exe' and 'orktools.msi'; also available from the SDK, see below) and EXTRACT.EXE (included on the installation CD), respectively. Windows XP also provides the EXPAND.EXE command.
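Spawning EXPAND.EXE from Python could be sketched as follows. This is Windows-only, the function names are hypothetical, and the `-F:<files>` form used here is an assumption based on the tool's usual usage text; run `expand /?` on the target system to confirm the exact syntax.

```python
import os
import subprocess


def build_expand_command(cab_path, dest_dir, pattern="*"):
    """Build the argument list for extracting matching files from a cabinet."""
    return ["expand", cab_path, "-F:" + pattern, dest_dir]


def extract_cab(cab_path, dest_dir, pattern="*"):
    """Extract files from a .cab into dest_dir by spawning expand (Windows only)."""
    os.makedirs(dest_dir, exist_ok=True)  # make sure the destination exists
    subprocess.run(build_expand_command(cab_path, dest_dir, pattern), check=True)
```

Keeping the command construction in its own function makes it possible to inspect or log the exact command line before running it.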
I had the same problem last week so I implemented this in Python. Comments, additions and especially pull requests welcome: https://github.com/hughsie/python-cabarchive
Oddly, the msilib module can only create or append to .CAB files, but not extract them. :(
However, the hachoir parser module can apparently read and edit cabinets. (I have not used it, though, so I couldn't tell you how well it fits.)

Python lib to Read a Flash swf Format File

I'm interested in using Python to hack on the data in Flash swf files. There is good documentation available on the format of swf files, and I am considering writing my own Python lib to parse that data out using the standard Python struct lib.
Does anybody know of a Python project that already does this? I would also be interested in any available solutions that use Perl, Ruby, Haskell, etc.
Well, unless you're doing it for fun (in which case, go for it!), why not use Ming? It supposedly has Python wrappers...
I found another option in SWF Tools. They provide a Python wrapper that supports generating SWF files in Python.
I'm not sure if either SWF Tools or Ming actually supports parsing and modifying an existing SWF file, however. Both seem geared more towards generating SWF files from scratch.
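For the struct-based approach the question considers, the fixed 8-byte start of an SWF file is a reasonable first target: a 3-byte signature (FWS uncompressed, CWS zlib-compressed, ZWS LZMA-compressed), a 1-byte version, and a 4-byte little-endian length of the uncompressed file. The sketch below covers only that much; the function names are hypothetical, and the frame rectangle and tags that follow are not parsed.

```python
import struct
import zlib


def parse_swf_header(data):
    """Parse the first 8 bytes of an SWF file into a dict."""
    signature, version, file_length = struct.unpack_from("<3sBI", data, 0)
    if signature not in (b"FWS", b"CWS", b"ZWS"):
        raise ValueError("not an SWF file: bad signature %r" % signature)
    return {
        "signature": signature.decode("ascii"),
        "version": version,
        "file_length": file_length,  # length of the uncompressed file
    }


def swf_body(data):
    """Return the bytes after the 8-byte header, decompressed if needed."""
    signature = data[:3]
    if signature == b"FWS":
        return data[8:]
    if signature == b"CWS":
        return zlib.decompress(data[8:])
    raise ValueError("unsupported signature (LZMA-compressed ZWS is not handled here)")
```

Everything after these 8 bytes is bit-packed, so later stages would need a small bit reader on top of struct.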

OLE Compound Documents in Python

How would you parse a Microsoft OLE compound document using Python?
Edit: Sorry, I forgot to say that I need write support too. In short, I have an OLE compound file that I have to read, modify a bit and write back to disk (it's a file made with a CAD application).
Just found OleFileIO_PL. It originally had no write support, but as of version 0.40 (2014) it does.
Edit: Looks like there's another way (though Windows-only) that supports writing too: the pywin32 extensions (the StgOpenStorage function and related).
An alternative: the xlrd package has a reader, and the xlwt package (a fork of pyExcelerator) has a writer. They handle file sizes of hundreds of MB cheerfully; the packages have been widely used for about 4 years. The compound document modules are targeted at getting "Workbook" streams into and out of Excel .xls files as efficiently as possible, but are reasonably general-purpose. Unlike OleFileIO_PL, they don't provide access to the internals of Property streams.
http://pypi.python.org/pypi/xlrd
http://pypi.python.org/pypi/xlwt
If you decide to use them and need help, ask in this forum:
http://groups.google.com/group/python-excel
For completeness: on Linux there's also the GNOME Structured File Library (but the default package for Debian/Ubuntu has Python support disabled, because the Python bindings have been unsupported since 2006) and the POIFS Java library.
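Before handing a file to any of the libraries above, it can help to confirm that it really is a compound document. The sketch below reads the start of the header with struct, assuming the layout from the published compound file format: an 8-byte signature, a 16-byte CLSID, then little-endian 16-bit minor version, major version, byte-order mark, sector shift and mini-sector shift. The function name is hypothetical, and this does not read or write streams.

```python
import struct

# Magic bytes at the start of every OLE compound file.
OLE_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def parse_ole_header(data):
    """Parse the first 34 bytes of an OLE compound file header."""
    if len(data) < 34:
        raise ValueError("header too short")
    (signature, _clsid, minor, major,
     byte_order, sector_shift, mini_shift) = struct.unpack_from("<8s16sHHHHH", data, 0)
    if signature != OLE_SIGNATURE:
        raise ValueError("not an OLE compound document")
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,             # 0xFFFE means little-endian
        "sector_size": 1 << sector_shift,     # sizes are stored as powers of two
        "mini_sector_size": 1 << mini_shift,
    }
```

Reading only the first 34 bytes, for example `parse_ole_header(open(path, "rb").read(34))`, is enough for this check.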
