I'm starting a personal assistant project. I've noticed that if you type something like "How old is Obama" on Google, the first hit is a little box that says "51 years (August 4, 1961)". This works for a lot of things; for example, if you type "Who is Romney's wife" it returns "Ann Romney (m. 1969)". This is incredibly useful. How can I fetch and parse this data?
Also, if nothing pops up, like for "How much money is Google worth", I'd like to scan each of the hits one by one and determine the answer. (I can do the determination part; I just need to know how to do the scanning.)
Can this be done using urllib2?
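Roughly what I have in mind for the scanning part, as a urllib2 + BeautifulSoup sketch (untested; the URL list and the determination step are placeholders, and note that scraping Google's own results pages is against their terms of service, so an official API would be safer for the first part):

import urllib2
from bs4 import BeautifulSoup

def fetch(url):
    # Pretend to be a browser; some sites block the default urllib2 agent.
    req = urllib2.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    return urllib2.urlopen(req).read()

def scan(urls):
    for url in urls:
        soup = BeautifulSoup(fetch(url))
        # Hand the visible text to my own "determination" step.
        yield url, soup.get_text()

for url, text in scan(["http://example.com/"]):
    print(len(text))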
Have you considered WolframAlpha?
It's more helpful for doing dynamic computations, based on a vast collection of built-in data, algorithms, and methods.
Also, here is an example:
How old is Obama
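If you want to do this programmatically, there's a community wolframalpha package on PyPI. A minimal sketch, assuming you've signed up for an AppID on the developer portal (the AppID below is a placeholder):

import wolframalpha

client = wolframalpha.Client("YOUR-APPID")  # placeholder AppID
res = client.query("How old is Obama")
print(next(res.results).text)  # e.g. an age plus date of birth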
I'm working on a project where I need to extract "inputs" and "query intent" from text.
For example "What is the status of asset X26TH?"
In this case the main issue is to extract asset id which is X26TH, but how can I make my code understand that it's an id?
The other thing is to understand the query intent, which is asset status. I found a good library for this called quepy, but it's meant for Linux and I couldn't set it up on Windows.
Please help me with the techniques and libraries.
So you have two problems, ID extraction and intent detection.
ID Extraction
If your IDs follow a regular pattern and definitely don't look like English, you can catch them with a regex. If that's possible, great, since it's very easy to do. If you have a fixed list of product IDs, just check whether any of them are in the input. If neither of those works, you'll have to get more sophisticated.
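For example, a quick regex sketch, assuming IDs look like one uppercase letter followed by digits and more letters/digits (adjust the pattern to your real ID format):

import re

# Assumed shape: a letter, then digits, then optional letters/digits, e.g. X26TH.
ID_PATTERN = re.compile(r'\b[A-Z]\d+[A-Z0-9]*\b')

def extract_ids(text):
    return ID_PATTERN.findall(text)

print(extract_ids("What is the status of asset X26TH?"))  # ['X26TH']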
Can you get your users to remember a little syntax? If you can request that they write things with a prefix like id:X26TH or similar that would make your job easier. You may find the way the plumber in Plan9 works informative.
If you need to work with whatever the users throw at you, you should look into using a sequence labeller or Named Entity Recognition (NER) system to get IDs. CRFs are probably a good fit for this task; here's a good technical introduction, and the New York Times also used one with success. Besides being trickier to set up, a downside is that a CRF will require training data, but there's really no way to avoid that.
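If you go the CRF route, here's a toy sketch using the sklearn-crfsuite package (one possible toolkit among several; the feature set and the single training sentence are just to show the shapes involved, not a real training set):

import sklearn_crfsuite

def token_features(tokens, i):
    # Simple per-token features; real systems would use many more.
    word = tokens[i]
    return {
        "lower": word.lower(),
        "has_digit": any(c.isdigit() for c in word),
        "is_upper": word.isupper(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

train_sents = [["What", "is", "the", "status", "of", "asset", "X26TH", "?"]]
train_labels = [["O", "O", "O", "O", "O", "O", "ID", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X))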
Intent Detection
This is usually modelled as a text classification problem. You can find an overview of how to do that here. Here are some training examples from the article (a classifier sketch follows them):
training_data.append({"class":"greeting", "sentence":"how are you?"})
training_data.append({"class":"greeting", "sentence":"how is your day?"})
training_data.append({"class":"greeting", "sentence":"good day"})
training_data.append({"class":"greeting", "sentence":"how is it going today?"})
training_data.append({"class":"goodbye", "sentence":"have a nice day"})
training_data.append({"class":"goodbye", "sentence":"see you later"})
training_data.append({"class":"goodbye", "sentence":"have a nice day"})
training_data.append({"class":"goodbye", "sentence":"talk to you soon"})
training_data.append({"class":"sandwich", "sentence":"make me a sandwich"})
training_data.append({"class":"sandwich", "sentence":"can you make a sandwich?"})
training_data.append({"class":"sandwich", "sentence":"having a sandwich today?"})
training_data.append({"class":"sandwich", "sentence":"what's for lunch?"})
So I'm new to Python and just finished my first application. (Giving random chords to be played on a MIDI piano and increasing the score if the right notes are hit in a graphical interface; nothing too fancy, but also non-trivial.) Now I'm looking for a new challenge: this time I'm going to try to create a program that monitors a poker table and collects data on all the players. Though this is completely allowed on almost all poker rooms (example of the largest one), there is obviously no ready-to-go API available. This probably makes the extraction of relevant data the most challenging part of the entire program. In my search for more information, I came across an undergraduate thesis that goes into writing such a program using Java (Internet Poker: Data Collection and Analysis - Haruyoshi Sakai).
In this thesis, the author speaks of 3 data collection methods:
Sniffing packets
Hand history
Scraping the screen
Like the author, I've come to the conclusion that the third option is probably the best route, but unlike him I have no knowledge of how to start this.
What I do know is the following: any table will look like the image below. Note how text, including numbers, is written in the same font on the table. Additionally, all relevant information is also supplied in the chat box situated in the lower-left corner of the window.
In some regards using the chat box sounds like the best way to go, seeing as all text is predictable and in the same font. The problem I see is computational speed: many actions often get executed in rapid succession, and any program will have to be able to keep up with this.
On the other hand, using the table as reference means that you have to deal with unpredictable bet locations.
The plan: Taking this into account, I'd start by getting an index of all players' names and stacks from the table view and "initialising" the table that way, then continue to use their stacks to extrapolate the betting they do.
The Method: Of course, the method is the entire reason why I made this post. It seems to me like one would need some sort of OCR to achieve all this, but seeing as everything is in a known font, there may be some significant optimisations that can be made. I would love some input on resources to learn about solutions to similar problems. Or if you've got a better idea on how to tackle this problem, I'd love to hear that too!
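To make the known-font idea concrete, here's the kind of per-glyph template matching I'm imagining, as an OpenCV sketch (the templates directory and the screen coordinates are made up, and I haven't tested this):

import cv2
import glob
import os

def read_text(region, threshold=0.9):
    # Match every known glyph template against a cropped screen region.
    hits = []
    for path in glob.glob("templates/*.png"):
        glyph = os.path.splitext(os.path.basename(path))[0]  # e.g. "A", "7"
        template = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        result = cv2.matchTemplate(region, template, cv2.TM_CCOEFF_NORMED)
        ys, xs = (result >= threshold).nonzero()
        hits.extend((x, glyph) for x in xs)
    # Sort matches left to right to reassemble the string.
    return "".join(glyph for _, glyph in sorted(hits))

screen = cv2.imread("table.png", cv2.IMREAD_GRAYSCALE)
stack_region = screen[420:440, 100:200]  # hypothetical coordinates
print(read_text(stack_region))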
Please do be sure to ask any questions you may have, I will be happy to answer them in as much detail as possible.
Some background
I am a literature student at New College of Florida, currently working on an overly ambitious creative project. The project is geared towards the algorithmic generation of poetry. It's written in Python. My Python knowledge and Natural Language Processing knowledge come only from teaching myself things through the internet. I've been working with this stuff for about a year, so I'm not helpless, but at various points I've had trouble moving forward in this project. Currently, I am entering the final phases of development, and have hit a little roadblock.
I need to implement some form of grammatical normalization, so that the output doesn't come out as unconjugated/uninflected caveman-speak. About a month ago some friendly folks on SO gave me some advice on how I might solve this issue, basically by using an ngram language modeller -- but I'm looking for yet other solutions, as it seems that NLTK's NgramModeler is not fit for my needs. (The possibilities of POS tagging were also mentioned, but my text may be too fragmentary and strange for an implementation of such to come easily, given my amateur-ness.)
Perhaps I need something like AtD, but hopefully less complex
I think I need something that works like After the Deadline or Queequeg, but neither of these seems exactly right. Queequeg is probably not a good fit -- it was written in 2003 for Unix and I can't get it working on Windows for the life of me (I have tried everything). But I like that all it checks for is proper verb conjugation and number agreement.
On the other hand, AtD is much more rigorous, offering more capabilities than I need. But I can't seem to get the python bindings for it working. (I get 502 errors from the AtD server, which I'm sure are easy to fix, but my application is going to be online, and I'd rather avoid depending on another server. I can't afford to run an AtD server myself, because the number of "services" my application is going to require of my web host is already threatening to cause problems in getting this application hosted cheaply.)
Things I'd like to avoid
Building Ngram language models myself doesn't seem right for the task. My application throws out a lot of unknown vocabulary, skewing all the results. (Unless I use a corpus so large that it runs way too slow for my application -- the application needs to be pretty snappy.)
Strict grammar checking isn't right for the task either. The grammar doesn't need to be perfect, and the sentences don't have to be any more sensible than the kind of English-like gibberish you can generate using ngrams. Even if it's gibberish, I just need to enforce verb conjugation and number agreement, and do things like remove extra articles.
In fact, I don't even need any kind of suggestions for corrections. I think all I need is for something to tally up how many errors seem to occur in each sentence in a group of possible sentences, so I can sort by their score and pick the one with the least grammatical issues.
A simple solution? Scoring fluency by detecting obvious errors
If a script exists that takes care of all this, I'd be overjoyed (I haven't found one yet). I can write code for what I can't find, of course; I'm looking for advice on how to optimize my approach.
Let's say we have a tiny bit of text already laid out:
existing_text = "The old river"
Now let's say my script needs to figure out which inflection of the verb "to bear" could come next. I'm open to suggestions about this routine. But I need help mostly with step #2, rating fluency by tallying grammatical errors:
1. Use the verb conjugation methods in NodeBox Linguistics to come up with all conjugations of this verb: ['bear', 'bears', 'bearing', 'bore', 'borne'].
2. Iterate over the possibilities, (shallowly) checking the grammar of the string resulting from existing_text + " " + possibility ("The old river bear", "The old river bears", etc.). Tally the error count for each construction. In this case the only construction to raise an error, seemingly, would be "The old river bear".
3. Wrapping up should be easy: of the possibilities with the lowest error count, select randomly. (A sketch of this routine follows.)
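Here's the routine I have in mind, as a sketch (count_errors is a stand-in for whichever checker I end up wiring in):

import random

def count_errors(sentence):
    # Stand-in: hook this up to AtD, Queequeg, or whatever checker wins out.
    return 0

def best_continuation(existing_text, conjugations):
    candidates = [existing_text + " " + c for c in conjugations]
    scored = [(count_errors(c), c) for c in candidates]
    lowest = min(score for score, _ in scored)
    # Of the possibilities with the lowest error count, select randomly.
    return random.choice([c for score, c in scored if score == lowest])

print(best_continuation("The old river", ["bear", "bears", "bearing", "bore", "borne"]))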
Very cool project, first of all.
I found a Java grammar checker. I've never used it, but the docs claim it can run as a server. Both Java and listening on a port should be supported basically anywhere.
I'm just getting into NLP with a CS background, so I wouldn't mind digging in further to help you integrate whatever you decide on using. Feel free to ask for more detail.
Another approach would be to use what is called an overgenerate-and-rank approach. In the first step, you have your poetry generator produce multiple candidate generations. Then you use a service like Amazon's Mechanical Turk to collect human judgments of their fluency. I would actually suggest collecting simultaneous judgments for a number of sentences generated from the same seed conditions. Lastly, you extract features from the generated sentences (presumably using some form of syntactic parser) to train a model to rate or classify sentence quality. You could even throw in the heuristics listed above.
Michael Heilman uses this approach for question generation. For more details, read these papers:
Good Question! Statistical Ranking for Question Generation and
Rating Computer-Generated Questions with Mechanical Turk.
The pylinkgrammar link provided above is a bit out of date. It points to version 0.1.9, and the code samples for that version no longer work. If you go down this path, be sure to use the latest version which can be found at:
https://pypi.python.org/pypi/pylinkgrammar
I am interested in generating a list of suggested semantic tags (via links to Freebase, Wikipedia or another system) for a user who is posting a short text snippet. I'm not looking to "understand" what the text is really saying, or even to automatically tag it; I just want to suggest to the user the most likely semantic tags for his/her post. My main goal is to force users to tag semantically, and therefore consistently, rather than write ambiguous text strings. If there were a reasonably functional and reasonably priced tool on the market, I would use it. I have not found such a tool, so I am looking into writing my own.
My question is, first of all, whether there is such a tool that I have not encountered. I've looked at Zemanta, AlchemyAPI and OpenCalais, and none of them seemed to offer the service I need.
Assuming that I'm writing my own, I'd be doing it in Python (unless there was a really compelling reason to use something else). My first guess would be to search for n-grams that match "entities" in Freebase and suggest them as tags, perhaps searching in descriptions of entities as well to get a little "smarter." If that proved insufficient, I'd read up and dip my toes into the ontological water. Since this is a very hard problem and I don't think that my application requires its solution, I would like to refrain from real semantic analysis as much as possible.
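For illustration, the n-gram lookup I have in mind would be something like this sketch (entity_names stands in for a local dump of Freebase/Wikipedia entity names, which would really come from their data dumps or APIs):

# Toy stand-in for a local dump of entity names.
entity_names = {"python", "natural language processing", "freebase"}

def suggest_tags(snippet, max_n=3):
    tokens = snippet.lower().split()
    suggestions = set()
    # Check every n-gram up to max_n words against the entity list.
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            if ngram in entity_names:
                suggestions.add(ngram)
    return suggestions

print(suggest_tags("tagging snippets with freebase and python"))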
Does anyone have experience working with a semantic database system and could give me some pointers regarding where to begin and what sort of pitfalls to expect?
Take a look at the NLTK Python library. It contains a vast number of tools, dictionaries, and algorithms.
Given the size of web2py and the lack of resources and corporate support, do you think it would be advisable to learn web2py as the only web development framework I know? I'm considering learning Ruby on Rails or web2py for a website I need to create as a school project.
web2py does have a smaller market share than competitor products, but it is also much younger. I know of at least 13 consulting companies that provide web2py support. Anyway, I do believe web2py is much easier to use than other systems, and therefore you will need less support than you may think. Most current users get their support via the web2py Google group, where you can find 29781 messages, and almost all questions have been answered within 24 hours by one of the contributors.
Learning is bad. Sherlock Holmes explains:
"You see," he explained, "I consider
that a man's brain originally is like
a little empty attic, and you have to
stock it with such furniture as you
choose. A fool takes in all the lumber
of every sort that he comes across, so
that the knowledge which might be
useful to him gets crowded out, or at
best is jumbled up with a lot of other
things, so that he has a difficulty in
laying his hands upon it. Now the
skilful workman is very careful indeed
as to what he takes into his
brain-attic. He will have nothing but
the tools which may help him in doing
his work, but of these he has a large
assortment, and all in the most
perfect order. It is a mistake to
think that that little room has
elastic walls and can distend to any
extent. Depend upon it there comes a
time when for every addition of
knowledge you forget something that
you knew before. It is of the highest
importance, therefore, not to have
useless facts elbowing out the useful
ones."
I'm sure I'm not the only one who has wasted an inordinate amount of time wading through the many bad and poorly documented Python web frameworks trying to find one I can just use. If I was programming in Ruby or PHP I probably would have spent that time actually writing a web application. This is the curse of web development in Python.
This bit of flamebait may help:
[Chart: stackoverflow.com tags about web frameworks] http://spreadsheets.google.com/pub?key=tZCdBPAkC75t27UzsPdLfMg&oid=2&output=image
Omitted from the chart are the 13,000+ questions tagged [php], but let's not go there.
To be clear, even though choosing a framework for Python web development can be confusing, once you decide on one you get to program in Python. This is the blessing of web development in Python. It can be really nice.
My advice is: don't accept anything less than a framework with excellent documentation. With the number of choices out there, there's no need to settle for poor, incomplete docs. Failing that, the simplest frameworks, those lacking room for any magic, are pleasant to work with and quickly learnable.
web2py may be young, but the mailing list has ~2000 messages / month, which is similar to Django and far more than Turbogears. I usually get answers to my questions within a few hours.
There is also an excellent online book, but I find the best source of information is the mailing list.
I have used RoR, Django, TurboGears, and web2py, and find web2py the most productive.
Learning is good.
Learning something (that eventually goes away) is no loss at all. The basic skills of web development (HTML, CSS, URL-parsing, GET vs. POST) don't ever change.
Frameworks come and go. Learn as many as you can. Learn how to manage your learning so that you (a) get to the important stuff first and (b) leave the other framework stuff behind when tackling a new framework.
Every framework has its bias (or focus). Once you figure this out, you can make use of them without all the "compare and contrast" that slows some people down. Once you've learned web2py, you have to be careful when learning Django to start fresh, with no translation from old concepts to new.
Web2py is a good one to learn. If this is going to be deployed to a server, double-check that it supports WSGI. Sometimes PHP is the way to go, because you know it's supported almost anywhere.
Ask yourself what you are looking to gain from the experience. That is, is it more important to just get the application built and running with a minimum of time and effort, or are you trying to learn about web stack architecture?
If you're just looking for results, obviously you'll have more code and documentation to borrow from if you stick with a more commonly used framework. If you grit your teeth and accept Django's view of the world, you can build very functional applications very quickly. If you can find some pre-made reusable Django apps that handle part of your problem, it'll be even faster.
But if you want to make sure you have a very solid understanding of everything in the request cycle, from HTTP request handling to database access and abstraction to form generation and processing and HTML templating, you'll be better served with a minimal framework that forces you to think more about the architecture and has a small enough codebase that you can just read it all top to bottom and not really need documentation beyond that. In that case, though, I'd advise going even deeper and building your own framework on top of a WSGI library (you don't actually want to waste time learning the intricacies of working around browser quirks if you can help it). Once you've built your own and seen where things get complicated and where the tradeoffs are, you'll be in an excellent position to judge other frameworks and decide if there's one that does things the way you want to work.
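For a sense of how little is actually required to start down that road, here's the bare-WSGI starting point using only the standard library (a sketch; a real app would add routing, error handling, and a production server):

from wsgiref.simple_server import make_server

def app(environ, start_response):
    # The whole WSGI contract: take a request environ, emit status,
    # headers, and an iterable of body bytes.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from bare WSGI"]

make_server("", 8000, app).serve_forever()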
This may seem slightly off-topic, but Paul Graham has probably the best essay on this subject that I have seen: The Python Paradox.
Let me put it this way, if you want to work for me, I notice this kind of free thinking and experimentation on a resume, whether the work was commercial, academic, or otherwise. And I'm pretty sure I'm not alone.
Glad I found this thread! Because some outdated pages and broken external links on Web2Py's website almost scared me off. But at least now I know there's a pretty good community around Web2Py.
I've just been looking through a load of Python web frameworks, and Web2Py's description sounded enticing and managed to make Django sound overly laborious. I'm pretty sure there are some tangible benefits to Django's design decisions avoiding "too much magic" when it comes to larger projects, though.
But to just throw something up on the web with, err, "sane defaults" sounds perfectly good to me. Instead of throwaway scripts, we can make throwaway websites to handle some temporary thing...
There should be room for an appliance style framework with no installation...
Interesting possibilities for some projects. I saw someone already got a Python framework + server to work on Android phones :))
For me, thanks to this thread, I will just learn both.
Another thought; if Web2Py is open source and you like what it does you might not even mind being the only user at some point in the future, since you can add features to it yourself?
Mind you, I have not used either yet, just read the docs. I think the Web2Py people should put up a blurb on their website to differentiate themselves from Django in more detail; I haven't been able to resolve all my open questions in choosing the right one.
I've already used Java EE and Django. The web2py learning curve is so short, it's incredible! Things that took me three days to develop in Java, I can do quickly using web2py. Of course, web2py doesn't have the same ready-made plugins that RoR has, but doubtless we can build those things quickly with web2py too. So it's a good opportunity to start learning = )
I agree with S.Lott's point that "Learning something (that eventually goes away) is no loss at all."
Yeah, it's true, but let me suggest that even a school project should be able to get the best support possible; otherwise it could be very frustrating, and a waste of time, to learn and teach something that is not well supported, debugged, stable, etc.
The time you spend, and maybe that of your auditors/students, should in some sense be invested with an eye to the future...
Just as an example, take a look at TurboGears.