I am making a chatbot in Rasa which helps high-school graduates find a university according to their desired location. I have all my data stored in a CSV file. So, is there any way to extract some specific data from that CSV?
Example: if a user asks to show the universities available in a certain location, how do I extract the specific data from the CSV — i.e. the names of the universities matching the location given by the user?
Seems like you will need to train the model with a location entity. Create a story that links an intent carrying the location entity to a custom action.
A sample story could look something like this:
## story 1
* ask_university{"location": "New York"}
  - action_get_universities
In the custom action action_get_universities, you will then need to query the CSV based on the location entity that the model detected. Pandas should work just fine.
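As a sketch of what that query could look like (the CSV column names, sample rows, and slot name below are assumptions, not from the question):

```python
import io
import pandas as pd

# In the real bot this would be pd.read_csv("universities.csv");
# the columns here are invented for illustration.
CSV_DATA = io.StringIO(
    "name,location\n"
    "NYU,New York\n"
    "Columbia,New York\n"
    "MIT,Boston\n"
)
df = pd.read_csv(CSV_DATA)

def get_universities(location):
    """Return the names of universities in the requested location."""
    return df.loc[df["location"] == location, "name"].tolist()

# Inside a rasa_sdk Action, `location` would come from
# tracker.get_slot("location"), and the result would be sent back
# with dispatcher.utter_message(...).
print(get_universities("New York"))
```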
Have fun exploring!
I'm working on my first project in Django and I'd really like some advice from people with more experience than me, since right now I'm a little stuck on which road to take to keep developing my project.
This is my plan to develop my project:
Create a model with a FileField so that my users can upload one or more standard Excel files
Read them and create variables with pandas
Create an HTML page with the graph using Chart.js and render it as a PDF with ReportLab
Store the PDF in my user's profile using a primary key so that they can see it and download it again.
My main problem right now is whether or not to store the information from the Excel file in my database. Since getting the information out of the Excel file is what matters to me, my first thought was to import it, and the only thing keeping me from doing so is the number of columns I have. This is how my Excel file looks:
        total  partial  [...]
user1      10        4
user2      18        6
I have more than 60 variables in my Excel file (so I'd need a model with more than 60 fields), and they should be doubled since I'd need the information for both user1 and user2.
So I'd like to ask whether I should give up on importing the file into my database, given how big it would be, and also whether what I'm planning makes sense or if there is a better way to do it (every example project is welcome).
Thanks for the help!
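One way to avoid a model with more than 60 fields is to store the data in a long, one-row-per-variable layout; a minimal sketch with pandas (the column names and values below are assumptions based on the table above):

```python
import pandas as pd

# Wide table as it would come out of pd.read_excel (values invented).
wide = pd.DataFrame(
    {"user": ["user1", "user2"], "total": [10, 18], "partial": [4, 6]}
)

# Melt to one row per (user, variable) pair: a model with just three
# fields (user, variable, value) can then hold any number of columns.
long = wide.melt(id_vars="user", var_name="variable", value_name="value")
print(long)
```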
I want to write a small script that takes data from an Excel document and merges it into a text template that I send to leads.
Example:
Hello (Data-Name), I can see that your business (Data-business) need
some service on (Data-website name). Best regards Jim
So if I have 100 leads, it must insert the data into the text for each lead and create a new document with the generated text.
I have been trying this but I was not able to make it work.
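A minimal sketch of that merge, assuming the Excel columns are named Name, Business, and Website (in a real script the rows would come from pandas.read_excel):

```python
# Template with placeholders matching the assumed Excel column names.
TEMPLATE = (
    "Hello {Name}, I can see that your business {Business} "
    "needs some service on {Website}. Best regards Jim"
)

# In practice: rows = pandas.read_excel("leads.xlsx").to_dict("records")
rows = [
    {"Name": "Anna", "Business": "Anna's Bakery", "Website": "annasbakery.com"},
    {"Name": "Bob", "Business": "Bob's Garage", "Website": "bobsgarage.com"},
]

letters = [TEMPLATE.format(**row) for row in rows]
for i, letter in enumerate(letters):
    # One output document per lead, e.g. lead_0.txt, lead_1.txt.
    print(f"lead_{i}.txt: {letter}")
```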
I am constructing a Twitter database and would like to record some demographic data based on profile pictures. I decided to do this using Microsoft's Face API, but I am new to Python and coding in general. Is there a way to:
Extract and save the age and gender data into the same or a separate data frame.
Construct a list of only unique Twitter handles, based on my data, so that I don't waste my daily call limit on Azure.
Append the data so that I have the age and gender of each person stored next to their unique Twitter handle.
The twitter data is stored in the format:
'timestamp', 'id', 'text', 'user', 'replies', 'retweets', 'likes'
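For the deduplication step (item 2), a sketch with pandas using the column layout above (the sample handles are invented for illustration):

```python
import pandas as pd

# Tweet records in the stated format (only the relevant columns shown,
# values invented for illustration).
tweets = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "user": ["@alice", "@bob", "@alice"],
    }
)

# One Face API call per unique handle, not per tweet, to spare the
# daily call limit.
unique_handles = tweets["user"].drop_duplicates().tolist()
print(unique_handles)
```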
I am also open to other suggestions.
P.S.
Also, is the Python demo for Face broken? It won't run with the default code.
Your question is broad; you should keep your questions focused on a particular problem. Let me answer part one:
You should use the SDK:
https://pypi.org/project/azure-cognitiveservices-vision-face/
And you have some samples here:
https://github.com/Azure-Samples/cognitive-services-python-sdk-samples/blob/master/samples/vision/face_samples.py
After that, storing the results and building a DB is a different question, which I humbly suggest you ask separately once you have specific questions after you've got the data you want.
My aim is to extract information from old scanned reports and store it in a structured database. I have already extracted the text from these reports using Solr.
All of these are scientific reports; they differ in structure and content, but they all carry similar information. I want to create a structured database from these reports, with fields such as the name of the company involved in the report, the name of the software involved, the location, the date of the experiment, etc. For each of these fields I have some keywords to be used for extraction; for example, for the location field: Location, Place of experiment, Place, Facility, etc. What is the best way to proceed?
Also, in some of these files there are no sentences to process; the information is given in a form-like structure, for example:
Location: Canada
Date of the experiment: 1985-05-01.
Which techniques will be best to extract the information? Also, which software and libraries should I use?
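For the form-like lines, one possible starting point is a keyword-to-field mapping with regular expressions (the field names and synonym lists below are assumptions based on the keywords described above):

```python
import re

# Synonyms per target field, following the kinds of keywords described.
FIELD_KEYWORDS = {
    "location": ["location", "place of experiment", "place", "facility"],
    "date": ["date of the experiment", "date"],
}

def extract_fields(text):
    """Scan 'Keyword: value' lines and map them onto structured fields."""
    result = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+?)\.?\s*$", line)
        if not match:
            continue
        key, value = match.group(1).strip().lower(), match.group(2).strip()
        for field, keywords in FIELD_KEYWORDS.items():
            if key in keywords:
                result[field] = value
    return result

report = "Location: Canada\nDate of the experiment: 1985-05-01."
print(extract_fields(report))
```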
I'm near a total outsider of programming, just interested in it.
I work in a shipbrokering company and need to match positions (which ship will be open where and when) with orders (what kind of ship will be needed where and when, and for what kind of employment).
And we send and receive such info (positions and orders) by emails to and from our principals and co-brokers.
There are thousands of such emails each day.
We do the matching by reading the emails manually.
I want to build an app to do the matching for us.
One important part of this app will do the information extraction from email text.
==> My question is: how do I use Python to extract unstructured info into structured data?
Sample email of an order [annotation in the brackets, but is not included in the email]:
Email Subject: 20k dwt requirement, 20-30/mar, Santos-Conti
Content:
Acct ABC [Account Name]
Abt 20,000 MT Deadweight [Size of Ship Needed]
Delivery to make Santos [Delivery Point/Range, Owners will deliver the ship to Charterers here]
Laycan 20-30/Mar [Laycan (the time spread in which delivery can be accepted)]
1 time charter with grains [What kind of Employment/Trade, Cargo]
Duration about 35 days [Duration]
Redelivery 1 safe port Continent [Redelivery Point/Range, Charterers will redeliver the ship back to Owners here.]
Broker name/email/phone...
End Email
The same email above can be written in many different ways - some write it in one line, some use l/c instead of laycan...
And there are emails for positions with ship's name, open port, date range, ship's deadweight and other specs.
How can I extract the info and put it into structured data, with Python?
Let's say I have put all email contents into text files.
Thanks.
Below is a possible approach:
Step 1: Classify the mails in categories using the subject and/or message in the mail.
As you stated, one category is mails advising positions and the other is mails of orders.
Machine learning can be used for the classification. You can use a set of previous mails as a training corpus. You might consider using NLTK (Natural Language Toolkit) for Python. Here is the link on text classification using NLTK.
Step 2: Once an email is identified as an order mail, process it to fetch the details (account name, size, time spread, etc.). As you mentioned, the challenge here is that there is no fixed format for these data. To solve this problem, you might consider preparing an exhaustive list of synonyms for each label (for account, the list could be like ['acct', 'a/c', 'account', 'acnt']). This should be done once, by going through a fixed volume of previous mails.
To make the solution more effective, you could consider implementing an option for active learning
(i.e., prompt the user when a label is found in a mail that is not in any list; e.g., if "accnt" is used in a mail, it won't be resolved, so the user should be prompted to say which category it falls into).
Once a label is identified, you can use basic string operations to parse the email and fetch the relevant data in a structured format.
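Step 2 could be sketched like this in plain Python (the synonym lists and the sample mail below are assumptions following the examples above):

```python
# Synonym lists per label, built once from a fixed volume of previous mails.
LABEL_SYNONYMS = {
    "account": ["acct", "a/c", "account", "acnt"],
    "laycan": ["laycan", "l/c"],
}

def parse_order_mail(text):
    """Match the first word of each line against the synonym lists."""
    parsed = {}
    unknown = []  # candidates for the active-learning prompt
    for line in text.splitlines():
        words = line.split(maxsplit=1)
        if len(words) < 2:
            continue
        head, rest = words[0].lower(), words[1]
        for label, synonyms in LABEL_SYNONYMS.items():
            if head in synonyms:
                parsed[label] = rest
                break
        else:
            # Not in any list: this is where the user would be prompted.
            unknown.append(head)
    return parsed, unknown

mail = "Acct ABC\nLaycan 20-30/Mar\nAccnt XYZ"
print(parse_order_mail(mail))
```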
You can refer to this discussion for a better understanding.