I'm trying to learn NLP with Python. Although I work with a variety of programming languages, I'm looking for some kind of from-the-ground-up solution that I can put together to build a product with a high standard of spelling and grammar, like Grammarly.
I've tried some approaches with Python: inflect (https://pypi.org/project/inflect/) for word inflection, and spaCy for part-of-speech tagging.
Could someone point me in the direction of some kind of fully fledged API that I can pull apart and try to work out how to reach a decent standard of English, like Grammarly?
Many thanks,
Vince.
I would suggest checking out Stanford CoreNLP and NLTK as a start. If you want to train your own NER or do semantic modelling, look at Gensim.
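To get a feel for the from-the-ground-up angle before reaching for a full toolkit, even a single grammar rule can be checked in plain Python. Here is a minimal sketch of an "a"/"an" agreement check; the rule is deliberately simplified (real usage depends on pronunciation, so it mishandles cases like "an hour" or "a university"):

```python
import re

def check_article_agreement(text):
    """Flag 'a'/'an' followed by a word starting with the wrong letter class.

    Simplified rule: 'a' before a vowel letter, or 'an' before a consonant
    letter, is reported with a suggested fix.
    """
    issues = []
    for match in re.finditer(r"\b(a|an)\s+(\w+)", text, re.IGNORECASE):
        article, word = match.group(1).lower(), match.group(2)
        starts_with_vowel = word[0].lower() in "aeiou"
        if article == "a" and starts_with_vowel:
            issues.append((match.group(0), f"an {word}"))
        elif article == "an" and not starts_with_vowel:
            issues.append((match.group(0), f"a {word}"))
    return issues

print(check_article_agreement("I saw a elephant and an dog."))
# → [('a elephant', 'an elephant'), ('an dog', 'a dog')]
```

A real grammar checker is essentially thousands of such rules plus statistical models, which is why the toolkits above are a better starting point than writing everything by hand.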
I've recently used the Language API to gather sentiment predictions for a work project. I had about 1,300 unlabeled documents, and we initially used NLTK's tools, which were based on a dictionary of terms with a polarity estimate for each word. I then turned to the API, and after reviewing the predictions, the API produced much better results than NLTK.
I understand that the engineers probably won't want to release the details of the prediction engine, but I am curious how it works at a high level. If anybody could enlighten me or point me in the right direction, I'd appreciate it. For example, "it uses a Neural Network, trained on billions of observations," would be a reasonable answer.
Again, I'm using this for a work project and I'd like to be able to give a brief justification of why I switched from NLTK to the API (the improved results should speak for themselves, but I will definitely get "well, how does it work?").
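For reference, the dictionary-based approach I was using can be sketched roughly like this (the polarity lexicon here is a made-up toy for illustration, not NLTK's actual word list):

```python
# Toy polarity lexicon; real lexicons contain thousands of scored words.
POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

def lexicon_sentiment(text):
    """Average the polarity of known words; 0.0 means neutral/unknown."""
    words = text.lower().split()
    scores = [POLARITY[w] for w in words if w in POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

print(lexicon_sentiment("the food was great but the service was bad"))
# → 0.5
```

The weakness is visible even in this sketch: negation, sarcasm, and context are invisible to a bag-of-words lexicon, which is one reason a machine-learned pipeline can do better.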
The language API is a pipeline of state-of-the-art machine-learned systems that are trained on a combination of public data (like the Penn Treebank) and proprietary data annotated by Google's linguists.
Performance improvements compared to something like NLTK come from a combination of more and better data for training, as well as cutting edge machine learning algorithms, including but not limited to neural networks.
Related links that discuss some of the algorithms:
https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html (Parsing algorithms)
https://www.wired.com/2016/11/googles-search-engine-can-now-answer-questions-human-help/ (Mentions the linguist team)
https://research.googleblog.com/2016/08/acl-2016-research-at-google.html (Recent publications from the NLP research team)
I'm learning statistical learning these days using Python's pandas and scikit-learn libraries, and they're fantastic tools for me.
I've been able to learn classification, regression, and clustering with them, of course.
But I cannot figure out how to get started with them when I want to build a recommendation model. For example, suppose I have a customer purchase dataset containing date, product name, product maker, price, ordering device, etc.
What type of problem is recommendation? Classification, regression, or something else?
I did find that there are well-known algorithms for this, such as collaborative filtering.
If so, can I implement those algorithms with scikit-learn, or do I have to learn another ML library?
Regards
Scikit-learn does not offer any recommender-system tools. You can take a look at Mahout, which offers an easy way to get started, or at Spark.
However, recommendation is a problem in itself in the machine learning world. It can be framed as regression if you are trying to predict the rating a user would give to a movie, for instance, or as classification if you want to know whether a user will like the movie or not (a binary choice).
The important thing is that recommendation uses tools and algorithms dedicated to the problem, such as item-based or content-based recommendation. These concepts are actually quite simple to understand, and implementing a little recommendation engine yourself might be the best way to learn.
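To make the item-based idea concrete, here is a tiny sketch in plain Python (the rating matrix is invented): unseen items are scored for a user by cosine similarity between item rating vectors, weighted by the user's own ratings.

```python
import math

# Made-up ratings: user -> {item: rating}
ratings = {
    "alice": {"matrix": 5, "titanic": 1, "inception": 4},
    "bob":   {"matrix": 4, "titanic": 1},
    "carol": {"matrix": 1, "titanic": 5, "notebook": 4},
}

def item_vector(item):
    """Ratings of `item`, indexed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity of two sparse vectors held as dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def recommend(user):
    """Rank items the user hasn't rated by similarity to items they have."""
    seen = ratings[user]
    items = {i for r in ratings.values() for i in r} - set(seen)
    scores = {}
    for item in items:
        scores[item] = sum(
            cosine(item_vector(item), item_vector(j)) * seen[j] for j in seen
        )
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("bob"))
# → ['inception', 'notebook']
```

Bob rated "matrix" highly, and "inception" was liked by the same user who liked "matrix", so it ranks first. Libraries like Mahout industrialise exactly this kind of computation.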
I recommend the book Mahout in Action, which is a great introduction to recommendation concepts.
How about Crab (https://github.com/python-recsys/crab), which is a Python framework for building recommender engines, integrated with the scientific Python ecosystem (NumPy, SciPy, matplotlib)?
I have not used this framework; I just found it. It seems to be only at version 0.1 and hasn't been updated for years, so I doubt it is well documented. In any case, if you decide to try Crab, please report back. :)
I have a research project which needs the best possible NER results.
Can anyone recommend the best NER tools that have a Python library?
Talking about Java, Stanford NER seems to be the best, ceteris paribus.
There are also LingPipe, Illinois NER, and others; take a look at the ACL list.
Also consider this paper for an experimental comparison of several NERCs.
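To make clear what these tools automate, here is a trivial gazetteer-based NER sketch in plain Python (the entity lists are invented); systems like Stanford NER learn such decisions statistically from labelled data instead of relying on fixed word lists:

```python
# Toy gazetteers; real NER models generalise far beyond fixed lists.
GAZETTEER = {
    "PERSON": {"alice", "bob"},
    "LOCATION": {"paris", "london"},
}

def tag_entities(text):
    """Label each token with an entity type from the gazetteer, or 'O'."""
    tagged = []
    for token in text.split():
        word = token.strip(".,").lower()
        label = next(
            (t for t, words in GAZETTEER.items() if word in words), "O"
        )
        tagged.append((token, label))
    return tagged

print(tag_entities("Alice flew to Paris."))
# → [('Alice', 'PERSON'), ('flew', 'O'), ('to', 'O'), ('Paris.', 'LOCATION')]
```

The gazetteer approach fails on any name it hasn't seen, which is precisely the gap the statistical tools in the papers above are measured on.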
I am attempting to build a model that will attempt to identify the interest category / topic of supplied text. For example:
"Enjoyed playing a game of football earlier."
would resolve to a top level category like:
"Sport".
I'm not sure of the correct terminology for what I'm trying to achieve here, so Google hasn't turned up any libraries that might help. With that in mind, my approach would be something like:
1. Extract features from the text. Use tagging to classify each feature / identify names and places. I would probably use NLTK for this, or Topia.
2. Run a Naive Bayes classifier for each interest category ("Sport", "Video Games", "Politics", etc.) and get a relevancy % for each category.
3. Identify which category has the highest relevancy score and assign the text to that category.
My approach would likely involve an individual corpus for each interest category, and I suspect the accuracy would be fairly miserable; I understand it will never be perfectly accurate.
Generally I'm looking for advice on the viability of what I'm trying to accomplish, but the crux of my question: a) is my approach correct? b) are there any libraries / resources that may be of assistance?
You seem to know a lot of the right terminology. Try searching for "document classification." That is the general problem you are trying to solve. A classifier trained on a representative corpus will be more accurate than you think.
(a) There is no one correct approach. The approach you outline will work, however.
(b) Scikit-learn is a wonderful library for this sort of work.
There is plenty of other information, including tutorials, online about this topic:
This Naive Bayesian Classifier on github probably already does most of what you want to accomplish.
This NLTK tutorial explains the topic in depth.
If you really want to get into it, I am sure a Google Scholar search will turn up thousands of academic articles in computer science and linguistics about exactly this topic.
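A minimal scikit-learn sketch of the document-classification setup outlined in the question (the training texts and labels here are invented toy data; a real system needs many examples per category):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training corpus, one label per document.
texts = [
    "enjoyed a game of football at the stadium",
    "the goalkeeper saved a penalty in the match",
    "the election results were announced by parliament",
    "the senator proposed a new policy bill",
]
labels = ["Sport", "Sport", "Politics", "Politics"]

# Bag-of-words features + multinomial Naive Bayes, as in the question.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["enjoyed playing a game of football earlier"]))
# → ['Sport']

# predict_proba gives the per-category relevancy scores mentioned above.
print(model.predict_proba(["the parliament passed a bill"]))
```

Note that a single multi-class classifier replaces the "one classifier per category" idea from the question; the per-category probabilities fall out of `predict_proba` directly.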
You should check out Latent Dirichlet Allocation (LDA); it will give you categories without labels. As always, Edwin Chen's blog is a good place to start.
I have a project on chunking Arabic text.
I want to know whether it is possible to use NLTK to extract NP, VP, and PP chunks from Arabic text, and how I can use an Arabic corpus.
Please, can anyone help me?
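For what it's worth, NLTK's RegexpParser chunks any pre-tagged token sequence, so the NP/VP/PP extraction step itself is language-neutral; the hard part for Arabic is the POS tagger and corpus. A minimal sketch on hand-tagged tokens (the Penn English tag set is used here purely for illustration):

```python
import nltk

# Hand-tagged sentence; for Arabic you would substitute an Arabic POS
# tagger's output and a grammar written for its tag set.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
          ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

# Cascaded chunk grammar: later rules can reference earlier chunk labels.
grammar = r"""
  NP: {<DT>?<JJ>*<NN>}   # determiner + adjectives + noun
  PP: {<IN><NP>}         # preposition + noun phrase
  VP: {<VBD><PP|NP>*}    # verb + complements
"""
parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)
print(tree)
```

No corpus downloads are needed for this step; the corpus question only enters when training the tagger that produces the `(word, tag)` pairs.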
It's far from perfect (largely because the linguistic properties of Arabic are significantly different from those of English), but a computer science student developed an Arabic language analysis toolkit in 2011 that looks promising. He developed "an integrated solution consisting of a part-of-speech tagger and a morphological analyser. The toolkit was trained on classical Arabic and tested on a sample text of modern standard Arabic." I would think a limitation of this tool would be that the training set was classical while the test set was MSA.
The paper is a great start because it addresses existing tools and their relative successes (and shortcomings). I also highly recommend this 2010 paper which looks like an outstanding reference. It is also available as a book in print or electronic format.
Also, as a personal note, I would love to see a native speaker who is NLP-savvy use Google ta3reeb (available as a Java open source utility) to develop better tools and libraries. Just some of my thoughts; my actual experience with Arabic NLP is very limited. There are a variety of companies that have developed search solutions applying Arabic NLP principles as well, although much of their work is likely proprietary (for instance, I am aware that Basis Technology has worked on this fairly extensively; I am not affiliated with Basis in any way, nor have I ever been).