I am trying to create a question answering model and have explored a few datasets and NLP techniques such as Seq2SQL and the BERT model. What I cannot figure out is how to understand the question and convert it into a query.
My plan is to create some sort of token in the database, for example:
my_question | question_tag (can be anything extracted from the question)
When someone asks a question, I would match it against a stored question_tag, or create a question_tag from the asked question, and use that to build the query. I am open to suggestions or a better way to approach this.
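For what it's worth, here is a minimal sketch of that tag-matching idea, assuming crude keyword extraction and a plain dict standing in for the database table (both are placeholders, not a real schema):

```python
# Naive question_tag matching sketch; QUESTION_TAGS is a hypothetical
# stand-in for the "my_question | question_tag" database table.
import re

QUESTION_TAGS = {
    "How do I reset my password?": {"reset", "password"},
    "What operating systems are supported?": {"operating", "systems", "supported"},
}

STOPWORDS = {"the", "a", "an", "is", "are", "what", "how", "do", "i", "my", "which"}

def extract_tags(question):
    """Crude question_tag extraction: lowercased content words."""
    words = re.findall(r"[a-z]+", question.lower())
    return {w for w in words if w not in STOPWORDS}

def best_match(user_question):
    """Return the stored question whose tags overlap most with the input."""
    user_tags = extract_tags(user_question)
    score, question = max((len(user_tags & tags), q) for q, tags in QUESTION_TAGS.items())
    return question if score > 0 else None

print(best_match("Which systems are supported?"))
# -> "What operating systems are supported?"
```

The weak point is exactly the problem you describe: pure keyword overlap misses synonyms and paraphrases, which is where an embedding-based approach would help.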
I am looking for a Hugging Face model that can perform Question Answering without being given the context for the answer.
For instance, let's assume that I have the following question:
"Who was the fifth King of Rome?"
I would like the model to exploit its own knowledge (i.e. what it acquired during the training phase) to give the answer, without relying on a given context.
So, given the raw question as input, I would like the model to output several possible answers to that question.
I understand that, as stated on the Hugging Face website, the provided models perform the "Extractive Question Answering" task, which by definition needs a context from which to extract the answer to the question.
Is there a way to get rid of the context and use the provided pre-trained models to perform "Non-Extractive Question Answering"?
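One possible workaround, sketched under the assumption that a generative text-to-text model is acceptable (google/flan-t5-base here is an arbitrary choice, not the only option), is "closed-book" generation: the model answers from the knowledge in its weights, with no context passage:

```python
# Hedged sketch of "closed-book" QA: a text-to-text model generates
# answers from its parametric knowledge; the model choice is an assumption.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

question = "Who was the fifth King of Rome?"
# Sampling with num_return_sequences yields several candidate answers.
candidates = generator(
    question,
    do_sample=True,
    num_return_sequences=3,
    max_new_tokens=20,
)
for c in candidates:
    print(c["generated_text"])
```

Bear in mind that answers generated this way are unverified and can be confidently wrong.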
I would like to use TensorFlow to create a smart FAQ. I've seen how to build a chatbot, but what I need is to let the user search for help and return the most probable chapter or section of a manual.
For example the user can ask:
"What are the O.S. supported?"
The reply should be a list of all the sections of the manual that might contain the correct answer.
My only text corpus for training is the manual itself. I've followed the text classification example, but I don't think it is what I need, because in that case the model would only tell whether a given text belongs to one category or another.
What's the best practice to accomplish this task (I use Python)?
Thank you in advance
An idea could be to build embeddings of your text using BERT or other pretrained models (take a look at transformers), then compare those embeddings with your query (the question), for instance using cosine distance, and interpret the most similar ones as pointing to the section or chapter containing the answer.
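As a minimal sketch of that idea, assuming the sentence-transformers library and placeholder section texts (the model name is one common choice among many):

```python
# Embed manual sections and the user question, then rank sections
# by cosine similarity; model name and sections are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sections = [
    "Chapter 2: Installation. Supported operating systems are ...",
    "Chapter 5: Troubleshooting network issues ...",
    "Chapter 7: Keyboard shortcuts reference ...",
]
section_embeddings = model.encode(sections, convert_to_tensor=True)

query_embedding = model.encode("What are the O.S. supported?", convert_to_tensor=True)

# Cosine similarity between the question and every section.
scores = util.cos_sim(query_embedding, section_embeddings)[0]
for score, section in sorted(zip(scores.tolist(), sections), reverse=True):
    print(f"{score:.3f}  {section}")
```

Returning the top-k sections rather than a single winner matches the "list of possible sections" behaviour you describe.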
I'm a complete beginner when it comes to NLP. Just looking for someone to point me in the right direction.
I have documents that contain lots of multiple-choice questions, the choices, and their answers.
I would like to build a program that is able to get each question, its choices, and the answer. The problem is that not every document follows the exact same format/spacing, so I want to build an all-encompassing program that is able to account for the various formats. Is there anything within NLTK, scikit-learn, or TensorFlow that can help me do this?
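None of NLTK, scikit-learn, or TensorFlow parses arbitrary quiz layouts out of the box. As a starting point, here is a hedged regex sketch for one assumed layout (numbered questions, "A)"-style choices on a single line, an "Answer:" line); real documents would each need their own adapted pattern:

```python
# Regex sketch for ONE assumed MCQ layout; actual documents will need
# patterns adapted to their own numbering and spacing.
import re

text = """1. What is the capital of France?
A) London  B) Paris  C) Rome
Answer: B

2. Which planet is the largest?
A) Mars  B) Venus  C) Jupiter
Answer: C
"""

question_re = re.compile(
    r"(?P<num>\d+)\.\s*(?P<question>[^\n]+)\n"  # "1. question text"
    r"(?P<choices>[^\n]+)\n"                    # single line of choices
    r"Answer:\s*(?P<answer>[A-D])"              # answer key
)
choice_re = re.compile(r"([A-D])\)\s*(.+?)(?=\s+[A-D]\)|$)")

for m in question_re.finditer(text):
    choices = dict(choice_re.findall(m["choices"]))
    print(m["question"], choices, "->", m["answer"])
```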
What I'm trying to do is ask the user to input a company name, for example Microsoft, and predict that it is in the Computer Software industry. I have around 150,000 names and 60+ industries. Some of the names are not English company names.
I have tried training a Word2Vec model with Gensim on the company names only, averaging the word vectors before feeding them into scikit-learn's logistic regression, but the results were terrible. My questions are:
Has anyone tried this kind of task? Googling "short text classification" shows me results on classifying short sentences rather than pure names. If anyone has tried this before, would you mind sharing a few keywords or research papers on the task?
Would it be better if I had a brief description of each company instead of only their names? How much would it help my Word2Vec model compared to using only the company names?
This is essentially a company-industry relationship problem, so you should train your word2vec model on company description data: word2vec works by finding words that appear in similar contexts, so training on company names alone will give you bad results. Training on descriptions will give you the words related to each particular industry, and from those you can infer which industry a company belongs to.
If you want to train on company names alone, NER (Named Entity Recognition) will be useful, but this will not be accurate.
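A rough sketch of the description-based suggestion, with made-up placeholder data (gensim for word2vec, scikit-learn for the classifier; a real dataset would need far more than two descriptions):

```python
# Train word2vec on company descriptions, average each company's word
# vectors, and classify with logistic regression; data is a placeholder.
import numpy as np
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess
from sklearn.linear_model import LogisticRegression

descriptions = [
    "develops operating systems and productivity software",
    "manufactures cars and electric vehicles",
]
industries = ["Computer Software", "Automotive"]

tokenized = [simple_preprocess(d) for d in descriptions]
w2v = Word2Vec(tokenized, vector_size=100, min_count=1, epochs=50)

def avg_vector(tokens):
    """Average the vectors of all tokens the model knows."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.stack([avg_vector(t) for t in tokenized])
clf = LogisticRegression().fit(X, industries)

print(clf.predict([avg_vector(simple_preprocess("builds enterprise cloud software"))]))
```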
Not sure what you want.
If the point is to use just company names, maybe break the names into syllables/phonemes and train on that data.
If the point is to use Word2Vec, I'd recommend pulling the Wikipedia page for each company (easier to automate than an "about us" page).
I am working on a project wherein I have to extract the following information from a set of articles (the articles could be on anything):
- People: the names of any people mentioned, like "Barack Obama"
- Topic: related tags for the article, like "Parliament", "World Energy"
- Company/Organisation: the names of any companies or organisations mentioned, like "Apple" or "Google"
Is there an NLP framework/library of this sort available in Python which would help me accomplish this task?
#sel and #3kt gave really good answers. OP, you are looking for entity extraction, commonly referred to as Named Entity Recognition (NER). Many APIs exist that perform this. But the first question you need to ask yourself is:
What is the structure of my DATA? or rather,
Are my sentences good English sentences?
In the sense of figuring out whether the data you are working with is consistently grammatically correct, well capitalized, and well structured. These factors are paramount when it comes to extracting entities. The data I worked with were tweets. ABSOLUTE NIGHTMARE!! I performed a detailed analysis of the performance of various APIs on entity extraction, and I shall share with you what I found.
Here are APIs that perform fabulous entity extraction:
NLTK has a handy reference book that talks in depth about its functions, with multiple examples. NLTK does not perform well on noisy data (tweets) because it has been trained on structured data. NLTK is absolute garbage for badly capitalized words (e.g., DUCK, Verb, CHAIR). Moreover, it is slightly less precise than the other APIs. It is great for structured or curated data from news articles and scholarly reports, and it is a great learning tool for beginners.
Alchemy is simpler to implement and performs very well at categorizing named entities. It has great precision compared to the other APIs I have mentioned. However, it has a transaction cost: you can only perform 1000 queries a day! It identifies Twitter handles and can handle awkward capitalization.
IMHO the spaCy API is probably the best. It's open source. It outperforms the Alchemy API, though it is not as precise, and it categorizes entities almost as well as Alchemy.
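For a quick illustration, a minimal spaCy sketch (the small English model is an assumed choice; install it with python -m spacy download en_core_web_sm):

```python
# Minimal spaCy NER example; the input sentence is made up.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Barack Obama met Tim Cook at Apple headquarters in Cupertino.")

# Each entity carries a label such as PERSON, ORG, or GPE.
for ent in doc.ents:
    print(ent.text, ent.label_)
```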
Choosing an API should be a simple problem for you now that you know how each one is likely to behave on your data.
EXTRA -
POLYGLOT is yet another API.
Here is a blog post that performs entity extraction in NLTK.
There is a beautiful paper by Alan Ritter that might go over your head, but it is the professional standard for entity extraction (particularly on noisy data). You could refer to it every now and then to understand complex concepts like LDA or SVM for capitalisation.
What you are actually looking for is called, in the literature, "Named Entity Recognition" or NER.
You might like to take a look at this tutorial:
http://textminingonline.com/how-to-use-stanford-named-entity-recognizer-ner-in-python-nltk-and-other-programming-languages
One easy way of partially solving this problem is to use regular expressions to extract words matching the patterns you can find in this paper for extracting people's names. This might of course lead to extracting all the categories you are looking for, i.e. the topics and the company names as well.
There is also an API you can use that actually gives the results you are looking for, called Alchemy. Unfortunately, no documentation is available explaining the method they use to extract the topics or the people's names.
Hope this helps.
You should take a look at NLTK.
Finding names and companies can be achieved by tagging the recovered text and extracting proper nouns (tagged NNP). Finding the topic is a bit trickier, and may require some machine learning on a given set of articles.
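A minimal illustration of that tag-and-filter idea (the example sentence is made up; the punkt and averaged_perceptron_tagger data must be downloaded first):

```python
# NNP-extraction sketch with NLTK; run nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger") once before using this.
import nltk

text = "Barack Obama discussed energy policy with Apple and Google."
tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)

# Keep proper nouns (NNP / NNPS) as candidate people and organisation names.
proper_nouns = [word for word, tag in tagged if tag.startswith("NNP")]
print(proper_nouns)  # ['Barack', 'Obama', 'Apple', 'Google']
```

This only yields candidate proper nouns; nltk.ne_chunk can then group and label them as PERSON or ORGANIZATION.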
Also, since we're talking about articles, I recommend the newspaper module, which can recover them from their URLs and perform some basic NLP operations (summary, keywords).