Currently I am researching viable approaches to identify a certain object with image processing techniques, but I am struggling to find them. For example, I have a CNN capable of detecting certain objects, such as a person, and I can track that person as well. However, my issue is that I want to identify the detected and tracked person, i.e. save its credentials and assign it an ID. I do not want to know who he/she is; I just want to assign an ID in that manner.
Any help/resource will be appreciated.
Create a database and store the credentials you need for later use, e.g. the object type and some usable specifications, giving each entry a unique ID. The CNN has already recognized the object, so you just need to store it in the database; later on you can perform more processing on the generated data. That is a simple solution to the problem you are describing.
Okay, I see your problem: you want to identify what kind of object is being tracked, because the CNN is only tracking, not identifying. For that purpose you have to train your CNN on some specific features and give them an identity, e.g. objectA has features [x, y, z]. Then the CNN will help you find the identity of the object.
You can use OpenCV to do this as well: store some features of specific objects, then use a distance-matching technique to match the live features against the stored ones.
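As a rough illustration of that idea (a sketch only; the hue/saturation histogram and the 0.9 correlation threshold are assumptions, not something from the original answer):

import cv2

def patch_feature(patch_bgr):
    # hue-saturation histogram of the detected person patch as a cheap appearance feature
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def same_object(feat_a, feat_b, threshold=0.9):
    # correlation close to 1.0 means the two patches look alike
    score = cv2.compareHist(feat_a, feat_b, cv2.HISTCMP_CORREL)
    return score > threshold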
Thanks.
I think you are looking for something called ReID. There are a lot of papers about it in CVPR2018.
You can imagine that you would need some sort of stored characteristic vector for each person. For each detected person, give it a new ID if it does not match any previous record, or return the existing ID if it does match a record. The key is how to compute this characteristic vector. CNN features (from an intermediate layer) can be one option; Gaussian mixtures over the colors of the detected human patch can be another.
It is still a very active research field, and I think it would be quite hard to build an accurate system if you don't have many resources or much time at hand.
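To make the match-or-assign logic concrete, here is a minimal sketch; the cosine-similarity threshold of 0.7 is an arbitrary assumption, and the embedding could be any of the characteristic vectors mentioned above:

import numpy as np

gallery = {}      # person_id -> stored characteristic vector
next_id = 0

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def assign_id(embedding, threshold=0.7):
    # return an existing ID if the embedding matches a stored record,
    # otherwise register the person under a new ID
    global next_id
    for person_id, stored in gallery.items():
        if cosine(embedding, stored) > threshold:
            return person_id
    gallery[next_id] = embedding
    next_id += 1
    return next_id - 1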
I need to validate the correctness of a heating/cooling cycle based on readings of a temperature sensor over time.
A correct time series has a certain shape (number of ups and downs), lasts more or less the same amount of time, and reaches a certain maximum temperature which needs to be met during the cycle.
Typically the process is faulty when it is compressed or stretched in time, has too-low temperatures at the peaks, or when the heating/cooling envelope is messed up in general. In the picture above I posted a simplified example of proper and wrong cycles of the process.
What classifier would you recommend for supervised learning models? Is an unsupervised model at all possible to develop for such a scenario?
I am currently using the maximum temperature value and the cross-correlation of one master (typical proper) cycle against the tested one, but I wonder if there is a better, more generic way to tackle the problem.
IMHO machine learning is over-engineering this problem; some banding and counting of peaks seems like the much easier approach to me.
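A minimal sketch of that simpler route (the peak height, expected peak count and cycle length below are made-up numbers you would tune to your process):

import numpy as np
from scipy.signal import find_peaks

def cycle_is_ok(temps, min_peak_temp=150.0, expected_peaks=3,
                expected_len=600, len_tolerance=60):
    # temps: 1-D array of temperature readings for one cycle
    peaks, _ = find_peaks(np.asarray(temps), height=min_peak_temp)
    right_shape = len(peaks) == expected_peaks
    right_length = abs(len(temps) - expected_len) <= len_tolerance
    return right_shape and right_length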
Nonetheless, if you want machine learning, I would go with autoencoders for anomaly detection; examples can be found here or here.
TL;DR:
The idea is that an autoencoder reconstructs the input through a very small bottleneck (i.e. one value, which could be the phase), so any current point will produce a good-looking curve. This reconstruction then gets compared to the actual curve: if it fits, all is good; if it doesn't, you know something is not right.
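A minimal sketch of such an autoencoder, assuming Keras is available and every cycle is resampled to a fixed length; the layer sizes and the 0.05 error threshold are arbitrary assumptions:

import numpy as np
from tensorflow import keras

CYCLE_LEN = 200   # assumption: every cycle resampled to a fixed number of samples

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(CYCLE_LEN,)),
    keras.layers.Dense(2, activation="relu"),        # tiny bottleneck
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(CYCLE_LEN, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")

# X_good: array of shape (n_cycles, CYCLE_LEN) containing only proper cycles
# model.fit(X_good, X_good, epochs=50, batch_size=16)

def is_anomalous(cycle, threshold=0.05):
    # a large reconstruction error means the cycle does not look like the training data
    reconstruction = model.predict(cycle[None, :], verbose=0)[0]
    return float(np.mean((cycle - reconstruction) ** 2)) > threshold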
I am looking for a Hugging Face model that allows me to perform Question Answering without providing the context for the answer itself.
For instance, let's assume that I have the following question:
"Who was the fifth King of Rome?"
I would like the model to exploit its own knowledge (i.e. the one that it created in the training phase) to give the answer, without relying on a given context.
So, given the raw question as input, I would like the model to output several possible answers to that question.
I understand that, as stated on the original Hugging Face website, the provided models can perform the "Extractive Question Answering" task which, by definition, needs a context from which to extract the answer to the question.
Is there a way to get rid of the context and use the pre-trained model to perform a "Non-Extractive Question Answering" with the provided models?
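For reference, this is roughly what the extractive pipeline mentioned above looks like; note that the context argument is mandatory, which is exactly the limitation described in the question (the default pipeline model and the example context are assumptions here):

from transformers import pipeline

# extractive QA: a context string must be supplied
qa = pipeline("question-answering")
result = qa(question="Who was the fifth King of Rome?",
            context="Tarquinius Priscus was the fifth King of Rome.")
print(result["answer"], result["score"])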
I am working on a requirement where I have a history of previous requests. Requests may be like "Send me a report of .." or "Get me this doc"; each gets assigned to someone, and that person responds.
I need to build an app which will analyse the previous requests and, when a new request arrives that matches any of the previous ones, recommend the previous request's solution.
I am trying to implement the above using Python, and after some research I found that doc2vec is one approach: convert the previous requests to vectors and match them against the vector of the new request. I want to know, is this the right approach, or are better approaches available?
There are several different approaches to your problem. Actually, there is no right or wrong answer, only whichever one fits your data, objectives and expected results best. To mention a few:
Vectorization (doc2vec)
This approach builds a vector representation of a document based on the vectors of its individual words, taken from a pretrained source (these so-called embeddings can be more general, with worse results in very narrow contexts, or more specific, being a better fit for a special type of text).
In order to match a new request to this vector representation of your documents, the new request has to share words with a closely related vector representation; otherwise it won't work.
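A minimal gensim sketch of that idea (the two example requests are placeholders; gensim 4 and later expose document vectors as model.dv, older versions as model.docvecs):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

requests = ["send me a report of monthly sales",
            "get me the onboarding doc"]
corpus = [TaggedDocument(words=text.split(), tags=[i])
          for i, text in enumerate(requests)]

model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

new_request = "please send the sales report"
vector = model.infer_vector(new_request.split())
print(model.dv.most_similar([vector], topn=2))   # closest previous requests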
Keyword matching (or topicalization)
A simpler approach, where a document is classified by the most representative keywords in it (found using techniques such as TF-IDF or an even simpler word-distribution count).
To be matched, a new request has to include the keywords of the document.
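A minimal keyword-matching sketch with TF-IDF and cosine similarity (same placeholder requests as above):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requests = ["send me a report of monthly sales",
            "get me the onboarding doc"]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(requests)

new_request = vectorizer.transform(["please send the sales report"])
scores = cosine_similarity(new_request, matrix)[0]
best_match = int(scores.argmax())    # index of the most similar previous request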
Graph Based Approach
I've worked with this approach for Question Answering in my Master's research. In it, each document is modeled as a graph node connected to its keywords (which are also nodes). Each word in the graph is related to other words, and together they compose a network through which the document is accessed.
To match a new request, the keywords from the request are retrieved and "spread" using one of many network traversal techniques, attempting to reach the closest document in the graph. You can see how I documented my approach here. However, this approach requires either an already existing set of inter-word relations (WordNet, for a simpler approach) or a good amount of time spent annotating word relations.
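A very rough sketch of the traversal idea, assuming networkx and using plain shortest-path distance as a stand-in for whatever spreading technique you choose (the nodes and edges below are made up; in practice they would come from WordNet or your own annotations):

import networkx as nx

g = nx.Graph()
# document nodes connected to their keyword nodes
g.add_edges_from([("doc_sales_report", "sales"), ("doc_sales_report", "report"),
                  ("doc_onboarding", "onboarding"), ("doc_onboarding", "doc")])
# inter-word relations (e.g. from WordNet or manual annotation)
g.add_edge("sales", "revenue")

def closest_document(keywords):
    docs = [n for n in g if n.startswith("doc_")]
    def distance(doc):
        lengths = [nx.shortest_path_length(g, kw, doc)
                   for kw in keywords if kw in g and nx.has_path(g, kw, doc)]
        return min(lengths) if lengths else float("inf")
    return min(docs, key=distance)

print(closest_document(["revenue", "report"]))   # -> doc_sales_report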
Final Words
However, if you're interested in matching "this document" to "Annex A from e-mail 5", that's a whooooole other problem, and one that is actually not solved. You can attempt to use coreference resolution for references inside the same paragraph or sentence, but that won't work across different documents (e-mails). If you want to win some notoriety in NLP (actually NLU, Natural Language Understanding), that's a research topic to delve into.
I just learned what controlled variables mean for a project that I am doing, and I was trying to find out if scikit-learn has a controlled-variable option. Specifically, does Python have controlled variables (not independent variables) for logistic regression?
I googled around and found nothing for Python. However, I was thinking more basically: does controlling a variable mean stratifying on the group you are interested in (say race) and then doing the analysis on each group based on your x's and y? If this is correct, then I am supposed to interpret the results from those stratified groups, right?
Sorry, I asked two questions, but I am trying to gain as much info as possible on this controlled-variable idea and its applications in Python.
As you may know, control variables are variables which the experimenter is not interested in studying, but which are believed to have a significant influence on the value your dependent variable takes. So people generally hold the value of such a variable constant when they run their experiments, i.e. when collecting data.
To give an example, assume that you are trying to model the health condition of a person, i.e. classify whether he/she is healthy or not, and you are considering age, gender and exercise pattern as inputs to your model and want to study how each input affects your target variable. But you know very well that the country in which the subject resides will also have a say in his/her health condition (it encodes the climate, health facilities etc.). So, to make sure that this variable (country) is not affecting your model, you make sure that you collect all your data from just one country.
So, answering your first question: no, Python does not have controlled variables. It just assumes that all the input variables you are feeding in are of interest to the experimenter.
Coming to your second question, one way of handling control variables is by first grouping the data with respect to the control variable, so that each group has a constant value for it; we then run logistic regression (or any model) for each group separately and 'pool' the results from the different models. But this approach falls apart if the number of levels in your control variable is very high, in which case we generally treat the control variable as an independent variable and feed it to the model.
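A minimal sketch of that grouping idea with pandas and scikit-learn (the file name and the column names 'country', 'age', 'exercise' and 'healthy' are made up to match the example above):

import pandas as pd
from sklearn.linear_model import LogisticRegression

# columns: age, exercise, healthy (target) and country (the control variable)
df = pd.read_csv("health.csv")

models = {}
for country, group in df.groupby("country"):
    # within each group the control variable is held constant
    X = group[["age", "exercise"]]
    y = group["healthy"]
    models[country] = LogisticRegression().fit(X, y)

# inspect / pool the per-group coefficients
for country, model in models.items():
    print(country, model.coef_)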
For more details please refer to 1 or 2; they really have some nice explanations.
Update: How would one approach the task of classifying any text on public forums (such as games) or blogs, so that derogatory comments/texts are filtered before being posted?
Original: "
I want to filter out adult content from tweets (or any text for that matter).
For spam detection, we have datasets that check whether a particular text is spam or ham.
For adult content, I found a dataset I want to use (extract below):
arrBad = [
    'acrotomophilia',
    'anal',
    'anilingus',
    'anus',
    # ... etc. ...
    'zoophilia']
Question
How can I use that dataset to filter text instances?
"
I would approach this as a Text Classification problem, because using blacklists of words typically does not work very well to classify full texts. The main reason why blacklists don't work is that you will have a lot of false positives (one example: your list contains the word 'sexy', which alone isn't enough to flag a document as being for adults). To do so you need a training set with documents tagged as being "adult content" and others "safe for work". So here is what I would do:
Check whether an existing labelled dataset can be used. You need several thousand documents of each class.
If you don't find any, create one. For instance, you can write a scraper and download Reddit content. Read, for instance, Text Classification of NSFW Reddit Posts.
Build a text classifier with NLTK. If you don't know how, read Learning to Classify Text (see the sketch after this list).
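A minimal sketch of such an NLTK classifier (the two training documents and the bag-of-words features are placeholders for a real labelled corpus):

import nltk

train_docs = [("hot singles in your area click now", "adult"),
              ("quarterly sales report attached for review", "safe")]

def features(text):
    # simple bag-of-words presence features
    return {word: True for word in text.lower().split()}

train_set = [(features(text), label) for text, label in train_docs]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(classifier.classify(features("please review the attached report")))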
This can be treated as a binary text classification problem. You should collect documents that contain adult content as well as documents that do not ('universal'). It may happen that a word/phrase you have included in the list arrBad is present in a 'universal' document, for example 'girl on top' in the sentence 'She wanted to be the first girl on top of Mt. Everest.' You need to get a count vector of the number of times each word/phrase occurs in the 'adult-content' documents and in the 'universal' documents.
I'd suggest you consider algorithms like Naive Bayes (which should work fairly well in your case). However, if you want to capture the context in which each phrase is used, you could consider the Support Vector Machine algorithm as well (but that would involve tweaking a lot of complex parameters).
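A minimal scikit-learn sketch of that count-vector plus Naive Bayes idea (again with placeholder documents):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["some explicit adult sentence here",
         "she wanted to be the first girl on top of Mt. Everest"]
labels = ["adult", "universal"]

vectorizer = CountVectorizer(ngram_range=(1, 2))   # count single words and 2-word phrases
X = vectorizer.fit_transform(texts)

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["first girl on top of the podium"])))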
You may be interested in something like TextRazor. By using their API you may be able to classify the input text.
And, for example, you can choose to reject all input texts that come with some of the categories or keywords you don't want.
I think you need to explore filtering algorithms more: study their usage, how multi-pattern searching works, and how you can use some of those algorithms (their implementations are freely available online, so it is not hard to find an existing implementation and customize it for your needs). Some pointers:
Check how the grep family of algorithms works, especially the bitap algorithm and the Wu-Manber implementation of fgrep. Depending on how accurate you want to be, it may require adding some fuzzy-logic handling (think about why people write fukc instead of fuck, right?).
You may find the Bloom filter interesting, since it won't give any false negatives (on your data set); the downside is that it may give false positives.
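A tiny from-scratch Bloom filter sketch (the bit-array size and number of hash functions are arbitrary; a real deployment would size them from the word list and the false-positive rate you can tolerate):

import hashlib

class BloomFilter:
    def __init__(self, m=10000, k=4):
        self.m, self.k = m, k      # m bits, k hash functions
        self.bits = 0              # a plain int used as a bit array

    def _positions(self, word):
        # derive k bit positions by salting a hash of the word
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{word}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def __contains__(self, word):
        return all(self.bits >> pos & 1 for pos in self._positions(word))

blocked = BloomFilter()
for w in ['acrotomophilia', 'anal', 'anilingus', 'anus', 'zoophilia']:
    blocked.add(w)

def has_blocked_word(text):
    # never misses a listed word, but may rarely flag an innocent one
    return any(token in blocked for token in text.lower().split())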