Can I call external *python* functions from google refine? - python

I'm investigating Google refine to speed up some of my data work -- never used it before this week, but I like a lot of what I see.
My biggest question so far is whether it's possible to call external python functions from Refine. I know you can call jython internally, but that doesn't provide access to C-based python libraries (e.g. lxml), and I have scripts elsewhere that I'd like to integrate, without lots of copy-paste or rewrite hassle.
What options are there for doing this in Refine? I'm willing to get creative -- I just want a stable, re-usable solution.

As Google Refine Wiki says:
lxml will NOT work in Jython, since lxml has C bindings for CPython (regular Python), hence will not work in Refine which is Jython / Java only, and has no CPython interpreter built-in
But you can try the Google Refine Python Client Library to create projects and manipulate your data programmatically.

I'm going to mark reclosedev's answer as accepted, but there's still a little more to the story.
The other answer to this question is that you can set up your own python-based API. For this project, I was able to set up a django app running on a local server. It only took an hour or so to build the API to my existing library.
More hassle than I'd have liked, but it fit the bill for this project without soaking up too much time.
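For anyone wanting a feel for the "own Python-based API" approach, here is a minimal sketch. It is not the Django app described above: it swaps in the standard library's wsgiref so it stays self-contained, and `clean_value` is a hypothetical stand-in for whatever function your existing CPython library exposes. The query parameter name `value` and the port are assumptions too.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def clean_value(text):
    """Hypothetical stand-in for a function from an existing CPython library."""
    return " ".join(text.split()).title()


def app(environ, start_response):
    """WSGI app: GET /?value=... answers with {"result": ...} as JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    value = params.get("value", [""])[0]
    body = json.dumps({"result": clean_value(value)}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Run the API on a local server until interrupted (port is an assumption)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts the local server. Refine would then talk to it over HTTP, presumably by building a URL per cell (for example with its "Add column by fetching URLs" feature) and parsing the JSON that comes back.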

Related

Python interface with SWI-Prolog

I want to use a Python script as a frontend to a Prolog program that uses the SWI-PL engine.
So, the components of the setup are:
Python (2.7 or higher)
SWI-PL: website here
I've been looking around for an interface between SWI-PL and Python.
What I found are:
PySwip, but it seems to be lacking, judging from old questions here, and it also seems unsupported.
PyLog, which seems newer and has some activity, although I don't know how good it is.
What is the recommended way of using Python to communicate with SWI-prolog?
Are there perhaps other ways to accomplish this?
Maybe with another prolog engine?
I'm stuck with the Prolog language and Python because I know them best, so that would be necessary (I know for instance there are also tools for Java).
I've personally used PySWIP successfully. Here's a link to a project I did for my AI class in university in which I used PySWIP.
I think the difference is that PySWIP is a bridge (just send queries to a Prolog database and get responses) whereas PyLog seems to be an implementation of Prolog (or a built-in Prolog engine) in Python, with abstractions on Prolog code using objects.
I have no particular recommendation for you. Choose whichever you deem will suit your project best. Consider the licenses under which these libraries are published if you will need to worry about your code's license.
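To illustrate the "bridge" style, here is a rough sketch of sending facts and a query through PySWIP. It assumes the third-party `pyswip` package and SWI-Prolog are installed. The `father/2` facts and the helper names are made up for illustration, not taken from the linked project.

```python
# Hypothetical sample facts, used only for this illustration.
SAMPLE_FACTS = (
    "father(michael, john)",
    "father(michael, gina)",
)


def make_engine():
    """Create the bridge object (needs pyswip and SWI-Prolog installed)."""
    from pyswip import Prolog  # imported lazily: third-party dependency
    return Prolog()


def load_sample_facts(prolog):
    """Assert each sample fact into the Prolog database."""
    for fact in SAMPLE_FACTS:
        prolog.assertz(fact)


def children_of(prolog, parent):
    """Ask Prolog for every X such that father(parent, X) holds."""
    # query() yields one dict of variable bindings per solution.
    return [solution["X"] for solution in prolog.query(f"father({parent}, X)")]
```

Typical use would be `prolog = make_engine()`, then `load_sample_facts(prolog)`, then `children_of(prolog, "michael")`. In a real project you would more likely load an existing `.pl` file with `prolog.consult(...)` than assert facts from Python.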

Can I use Java scientific libraries in Google App Engine?

I am trying to make a web application to perform scientific and engineering calculations. I am new to web development and have been looking for a free framework (with free hosting), which is why I came to Google App Engine. There is no way to get scipy working there, so I decided to switch to Java instead of Python (I did find PythonAnywhere, which has numpy, scipy, etc., but it has no GUI-building support like PyQt, wx, Tkinter...).
I would like to know if there is a way to use COLT or a similar library for Java in Google App Engine, or if there is some other option. I would prefer free options since I'm at college, but cheap, customizable options are totally welcome :D (even if it means using another language).
PS: I hope this was understandable, since English is not my native language.
EDIT:
I TRIED to use apache commons math, and it seems like it's not going to work. The short answer to my question is: NO.
I believe GAE is severely limiting in what it will allow you to run.
I doubt you will find a completely free Java hosting solution.
To clarify the statements in other posts, GAE is incredibly limiting with respect to Python packages with C extensions. Anything pure Python will work fine. Scipy makes heavy use of C extensions, so it falls into this category.
Google recently introduced Python2.7 support, and with it, the ability to use NumPy on App Engine. I'm not sure if this covers your need, but it might be worth checking out.
I only develop with Python for Google App Engine, so I'm afraid I can't comment on the state of Java external dependencies.
GAE will limit a lot of things, if not all, in your case. You might want to try Heroku or Amazon Web Services within their free quota.
I see no reason not to do this. You can run front-end instances which can use 800MHz of processor and 128MB of RAM - you can run one all the time for free but you need to be able to split your tasks into 10min sections (if you use tasks, or 30 second sections otherwise). A backend is going to be chargeable and you'd probably find it cheaper to run on another system.

Python equivalent to Java's JNLP Web Start?

Is there any way to achieve the same functionality in Python, i.e., launching a script from a browser and automatically updating it from a central server location?
Run your app on Jython and use Java Web Start?
From a comment below: http://blog.pyproject.ninja/posts/2016-03-31-web-start-on-jython.html provides a complete example.
Note that Jython is not Python: some things do not work, and notably Jython is only compatible with Python 2.7.
Well this is still not a full match of the features of JNLP but maybe esky is closer to what you want. It's not browser based but once your app is installed on the client it can update itself. It might also lack something in the cross-platform department so depending on your environment YMMV.
Another alternative might be the Dabo framework at dabodev.com. It's been a few years since I looked at it, but it still looks like it's alive :-)
You may be able to achieve some functionality with Skulpt although it uses classless python, so its functionality is rather limited.
Well check out this python wiki page as it lays out various options.

portable non-relational database

I want to experiment/play around with non-relational databases, it'd be best if the solution was:
portable, meaning it doesn't require an installation. ideally just copy-pasting the directory to someplace would make it work. I don't mind if it requires editing some configuration files or running a configuration tool for first time usage.
accessible from python
works on both windows and linux
What can you recommend for me?
Essentially, I would like to be able to install this system on a shared linux server where I have little user privileges.
I recommend you consider BerkeleyDB, with awareness of the licensing issues.
I am getting very tired of people recommending BerkeleyDB without qualification: you can only distribute BDB systems under the GPL or for some unknown, not publicly visible licensing fee from Oracle.
For "local" playing around where it is not in use by external parties, it's probably a good idea. Just be aware that there is a license waiting to bite you.
This is also a reminder that it is a good idea when asking for technology recommendations to say whether or not GPL is acceptable.
From my own question about a portable C API database, whilst a range of other products were suggested, none of the embedded ones have Python bindings.
Metakit is an interesting non-relational embedded database that supports Python.
Installation requires just copying a single shared library and .py file. It works on Windows, Linux and Mac and is open-source (MIT licensed).
BerkeleyDB
If you're used to thinking a relational database has to be huge and heavy like PostgreSQL or MySQL, then you'll be pleasantly surprised by SQLite.
It is relational, very small, uses a single file, has Python bindings, requires no extra privileges, and works on Linux, Windows, and many other platforms.
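A quick sketch of what that looks like with the `sqlite3` module from the standard library. The `notes` table and its rows are invented for the demo, and `":memory:"` is used only to keep it self-contained. Pass a file path instead to get the single-file database described above.

```python
import sqlite3

# ":memory:" keeps the demo self-contained; use a file path to persist data.
conn = sqlite3.connect(":memory:")

# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [("first",), ("second",)],
)
conn.commit()

# Read the rows back in insertion order.
rows = conn.execute("SELECT id, body FROM notes ORDER BY id").fetchall()
conn.close()
```

Nothing needs to be installed or configured on the server, which fits the shared-hosting constraint in the question.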
Have you looked at CouchDB? It's non-relational, data can be migrated with relative ease and it has a Python API in the form of couchdb-python. It does have some fairly unusual dependencies in the form of Spidermonkey and Erlang though.
As for pure python solutions, I don't know how far along PyDBLite has come but it might be worth checking out nonetheless.
BerkeleyDB (it seems there is a Python API binding: http://www.jcea.es/programacion/pybsddb.htm)
Have you looked at Zope Object Database?
Also, SQLAlchemy or Django's ORM layer makes schema management over SQLite almost transparent.
Edit
Start with http://www.sqlalchemy.org/docs/05/ormtutorial.html#define-and-create-a-table
to see how to create SQL tables and how they map to Python objects.
While your question is vague, your comments seem to indicate that you might want to define the Python objects first, get those to work, then map them to relational schema objects via SQLAlchemy.
If you're only coming and going from Python, you might think about using pickle to serialize the objects. That's not going to work if you're looking to use other tools to access the same data, of course. It's built into Python, so you shouldn't have any privilege problems, but it's not a true database, so it may not suit the needs of your experiment.
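A minimal sketch of the pickle route, using only the standard library. The `records` dict, the helper names, and the file name are made up for the example. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile


def save_objects(path, objects):
    """Serialize any picklable Python object to a file."""
    with open(path, "wb") as fh:
        pickle.dump(objects, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_objects(path):
    """Read back whatever save_objects() wrote."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical data, for illustration only.
records = {"alice": {"age": 30, "tags": ["admin"]}, "bob": {"age": 25, "tags": []}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.pkl")
    save_objects(path, records)
    restored = load_objects(path)
```

The whole object is rewritten on every save, so this suits small experiments rather than anything needing queries or concurrent access.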
Adding a reference to TinyDB here since this page is showing at the top of many searches. It is a portable non-relational database in python. It stores python dicts into a local json file and makes them available for database ops similar to mongodb. It also has an extension to port to mongodb's commands, the difference being that instead of working on another system server you'll be operating on a local json file.
And unlike the presently chosen answer, it is under a permissive MIT open license.
Links:
TinyDB site
Tinydb on Github
Basic Usage
Usage
Support forum
Implementations

Other than basic python syntax, what other key areas should I learn to get a website live?

Other than basic python syntax, what other key areas should I learn to get a website live?
Is there a web.config in the python world?
Which libraries handle things like authentication? or is that all done manually via session cookies and database tables?
Are there any web specific libraries?
Edit: sorry!
I am well versed in asp.net, I want to branch out and learn Python, hence this question (sorry, terrible start to this question I know).
Basic Python syntax isn't half of what you need to know.
All of the Python built-in data structures.
Object-oriented design.
What Python modules and packages are.
The Python libraries -- almost everything you could ever want has already been written.
To name a few things.
If you've done some web development, you probably have some background in HTTP, HTML, CSS, JavaScript, and SQL.
You should use a framework to handle the endless collection of mundane details, like authentication. Look at Django.
Answer replaced to correspond with the updated question.
If you're already familiar with ASP.NET, the easiest way to jump into creating a website with Python is probably to look into one of the major web frameworks. Django is very popular; working through the installation guide and the tutorial will probably get you rolling pretty well.
Really though, I'd personally suggest at least learning the language itself to a basic competency level before trying to dive right into using it inside a web framework. I think you'll be trying to force yourself to learn too much at once. In terms of just learning Python, the free book Dive Into Python is always spoken of highly.
Oh, golly.
Look, this is gonna be real hard to answer because, read as you wrote it, you're missing a lot of steps. Like, you need a web server, a design, some HTML, and so on.
Are you building from the ground up? Asking about Python makes me suspect you may be using something like Zope.
Don't forget to give IronPython a try - your .NET experience can help making sense of newly learned Python idioms.
IronPython is an implementation of the Python programming language running under .NET and Silverlight. It supports an interactive console with fully dynamic compilation. It's well integrated with the rest of the .NET Framework and makes all .NET libraries easily available to Python programmers, while maintaining compatibility with the Python language.
Of course the builtins. And become familiar with the standard library (until you start to remember what's in it, I'd suggest looking through it any time you're about to implement something... It might be there already!)
You'll want some kind of framework; I'd recommend Django or TurboGears.
But you also need to learn the pythonic-way. For this, open up a Python interpreter and type:
import this
