I'm doing my final degree project. I need to create an extended version of the word2vec algorithm, changing the default objective function of the original paper. This has already been done (check this paper). In that paper, they only state the new objective function; they do not say how they ran the model.
Now, I need to extend that model too, with another function, but I'm not sure whether I have to implement word2vec myself with the new function, or whether there is a way to replace it in the Gensim word2vec implementation.
I have checked the Word2Vec Gensim documentation but I have not seen any parameter to do this. Do you have any idea how to do it? Is it even possible?
I was unsure whether this StackExchange site was the correct one; maybe https://ai.stackexchange.com/ is more appropriate.
There's no official support in Gensim for simply dropping in your own objective function.
However, the full source code is available – https://github.com/RaRe-Technologies/gensim – so by editing it, or using it as a model for your own implementation, you could theoretically do anything.
Beware, though:
the code has gone through a lot of optimization & customization for new options that may not be relevant to your needs, so it may not be the cleanest or simplest starting point
for performance, the core routines are written in Cython (see the .pyx files), which can be especially hard to debug, and which rely on library bulk array functions that may obscure how to implement your alternate function instead
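If you go the route of writing your own implementation, the following is a minimal NumPy sketch of skip-gram training with negative sampling in which the objective is a swappable argument. It is not Gensim's code: the names train, sgns_grad and hinge_grad are hypothetical, the hinge loss is only a stand-in for whatever function your paper defines, and negatives are drawn uniformly rather than from the smoothed unigram distribution the original word2vec uses.

```python
import numpy as np

def sgns_grad(score, label):
    """d(loss)/d(score) for the usual negative-sampling logistic loss."""
    return 1.0 / (1.0 + np.exp(-score)) - label

def hinge_grad(score, label):
    """Hypothetical replacement objective: hinge loss max(0, 1 - y * score)."""
    y = 2.0 * label - 1.0          # map labels {0, 1} to {-1, +1}
    return -y if y * score < 1.0 else 0.0

def train(pairs, vocab_size, dim=16, negatives=3, lr=0.05, epochs=50,
          grad_fn=sgns_grad, seed=0):
    """Plain-SGD skip-gram over (center, context) index pairs."""
    rng = np.random.default_rng(seed)
    w_in = (rng.random((vocab_size, dim)) - 0.5) / dim   # word vectors
    w_out = np.zeros((vocab_size, dim))                  # context vectors
    for _ in range(epochs):
        for center, context in pairs:
            # One true context word plus uniformly drawn negative samples.
            targets = [(context, 1.0)]
            targets += [(int(n), 0.0)
                        for n in rng.integers(0, vocab_size, negatives)
                        if n != context]
            v = w_in[center]
            grad_v = np.zeros(dim)
            for word, label in targets:
                # The only objective-specific step: gradient w.r.t. the score.
                g = grad_fn(v @ w_out[word], label)
                grad_v += g * w_out[word]
                w_out[word] -= lr * g * v
            w_in[center] -= lr * grad_v
    return w_in, w_out

# Toy (center, context) pairs over a 4-word vocabulary.
pairs = [(0, 1), (1, 0), (2, 3), (3, 2)]
w_in, w_out = train(pairs, vocab_size=4)
w_in_hinge, _ = train(pairs, vocab_size=4, grad_fn=hinge_grad)
```

Because the objective only enters through its derivative with respect to the dot-product score, trying another function means writing one more small grad_fn. The price is speed: this loop will be far slower than the Cython routines.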
I'm looking for a Python library to replace the rake function from "survey", an R package (https://www.rdocumentation.org/packages/survey/versions/4.0/topics/rake)
I have found and tried Quantipy, but the quality of the weights is poor compared to the weights generated with R on the same dataset.
I have found PandaSurvey, but it seems not to work correctly (and the documentation is very poor).
I am surprised not to find much on Google on this subject, since raking is an essential function if you are working with polls. With Python being a data science language, that's surprising. But maybe I missed it.
Thank you very much!
I am analyzing call records and trying to use doc2vec, but I can't find an appropriate way to apply it.
I tried converting words to their roots; later I will try to remove stop words (which are also reduced to their roots).
I want to understand what each conversation is about (that could be a few words or more). Can you suggest an approach or a sample project?
Note that many word2vec/doc2vec projects don't apply word-stemming (converting words to their roots), nor remove stop words. With an adequately-large training corpus, neither step is strictly necessary.
You seem to be at a very rudimentary starting point, so you should work through online examples of Doc2Vec (and more generally "topic modeling"). Several Jupyter Notebooks demonstrating both basic and more advanced uses of Doc2Vec are included with gensim, in the installation's docs/notebooks directory. You can also view them online at:
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/
doc2vec-lee.ipynb: very simple example of usage on toy-sized data
doc2vec-IMDB.ipynb: more advanced example based on a movie-reviews experiment included in the original "Paragraph Vector" (Doc2Vec) research paper
doc2vec-wikipedia.ipynb: much larger & longer-running model using millions of Wikipedia articles
Though you can browse these online, you can and should run them locally step-by-step as a learning exercise, then tinker with them slightly as an exploration, before finally using them (and other sources) as guides for how you can approach your own problem.
I am trying to deal with data imbalance within a small dataset. I just found an article talking about SMOTE and MSMOTE here
It seems that MSMOTE can overcome the shortcomings of SMOTE, so I really want to try it. The MSMOTE paper was published in 2009; however, I could not find any library related to MSMOTE in R or Python.
Do you know whether there is any built-in MSMOTE I could try? I'm fine with whatever programming language...
You can use the "imbalanced-learn" package in Python.
This is the link
This is an old question, but for future reference.
Here is a library with multiple variants of SMOTE in Python.
In particular, it includes MSMOTE: https://smote-variants.readthedocs.io/en/latest/oversamplers.html?highlight=msmote#msmote
import smote_variants

oversampler = smote_variants.MSMOTE()
X_samp, y_samp = oversampler.sample(X, y)  # X: feature matrix, y: class labels
I would like to perform a few basic machine vision tasks using Python and I'd like to know where I could find tutorials to help me get started.
As far as I know, the only free library for Python that does machine vision is PyCV (which is a wrapper for OpenCV apparently), but I can't find any appropriate tutorials.
My main tasks are to acquire an image from FireWire, segment the image into different regions, and then compute statistics on each region to determine pixel area and center of mass.
Previously, I've used Matlab's Image Processing Toolbox without any problems. The functions for which I would like to find Python equivalents are graythresh, regionprops and gray2ind.
Thanks!
OpenCV is probably your best bet for a library; you have your choice of wrappers for it. I looked at the SWIG wrapper that comes with the standard OpenCV install, but ended up using ctypes-opencv because the memory management seemed cleaner.
They are both very thin wrappers around the C code, so any C references you can find will be applicable to the Python wrappers.
OpenCV is huge and not especially well documented, but there are some decent samples included in the samples directory that you can use to get started. A searchable OpenCV API reference is here.
You didn't mention if you were looking for online or print sources, but I have the O'Reilly book and it's quite good (examples in C, but easily translatable).
The FindContours function is a bit similar to regionprops; it will get you a list of the connected components, which you can then inspect to get their info.
For thresholding you can try Threshold. I was sure you could pass a flag to it to use Otsu's method, but it doesn't seem to be listed in the docs there.
I haven't come across specific functions corresponding to gray2ind, but they may be in there.
documentation: A few years ago I used OpenCV wrapped for Python quite a lot. OpenCV is extensively documented, ships with many examples, and there's even a book. The Python wrappers I was using were thin enough that very little wrapper-specific documentation was required (and this is typical for many other wrapped libraries). I imagine that a few minutes looking at an example, like the PyCV unit tests, would be all you need, and then you could focus on the OpenCV documentation that suited your needs.
analysis: As for whether there's a better library than OpenCV, my somewhat outdated opinion is that OpenCV is great if you want to do fairly advanced stuff (e.g. object tracking), but it is possibly overkill for your needs. It sounds like scipy ndimage combined with some basic numpy array manipulation might be enough.
acquisition: The options I know of for acquisition are OpenCV, Motmot, or using ctypes to directly interface to the drivers. Of these, I've never used Motmot because I had trouble installing it. The other methods I found fairly straightforward, though I don't remember the details (which is a good thing, since it means it was easy).
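To make the scipy ndimage suggestion concrete, here is a minimal sketch of the pixel-area and center-of-mass part of the task (roughly what regionprops reports). It assumes the image has already been thresholded into a boolean mask; region_stats is a hypothetical helper name, and ndimage.label uses its default connectivity (pixels touching only diagonally count as separate regions).

```python
import numpy as np
from scipy import ndimage

def region_stats(mask):
    """Return (areas, centers) for each connected region of a boolean mask."""
    labels, count = ndimage.label(mask)          # 0 = background, 1..count = regions
    index = np.arange(1, count + 1)
    areas = np.bincount(labels.ravel())[1:]      # pixel count per region
    centers = ndimage.center_of_mass(mask.astype(float), labels, index)
    return areas, centers

# Toy mask: a 2x2 block and a 1x3 bar.
mask = np.zeros((6, 6), dtype=bool)
mask[0:2, 0:2] = True
mask[4, 2:5] = True

areas, centers = region_stats(mask)
for area, (row, col) in zip(areas, centers):
    print(f"area={area} center=({row:.1f}, {col:.1f})")
```

Centers are returned in (row, column) order, which is the reverse of the (x, y) convention Matlab's regionprops uses for Centroid.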
I've started a website on this subject: pythonvision.org. It has some tutorials and links to software.
You probably would be well served by SciPy. Here is the introductory tutorial for SciPy. It has a lot of similarities to Matlab, especially when combined with the matplotlib package, which is explicitly made to emulate the Matlab plotting functions. I don't believe SciPy has equivalents for the functions you mentioned, but there are some things which are similar. For example, threshold is a very simple version of graythresh. It doesn't implement Otsu's method, it just does a simple threshold, but that might be close enough.
I'm sorry that I don't know of any tutorials which are closer to the task you described. But if you are accustomed to Matlab, and you want to do this in Python, SciPy is a good starting point.
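If a plain threshold turns out not to be close enough, Otsu's method (which graythresh uses) is short to write with NumPy alone. The following is a sketch rather than a drop-in replacement: otsu_threshold is a hypothetical name, and it returns the threshold in the image's own intensity units, whereas graythresh returns a level normalized to [0, 1].

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Pick the threshold that maximizes the between-class variance."""
    hist, edges = np.histogram(np.asarray(image).ravel(), bins=bins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2.0

    w0 = np.cumsum(hist)                 # pixels at or below each bin
    w1 = w0[-1] - w0                     # pixels above each bin
    s0 = np.cumsum(hist * centers)       # intensity sum at or below each bin
    m0 = s0 / np.maximum(w0, 1.0)        # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1.0)   # mean of the upper class

    between = w0 * w1 * (m0 - m1) ** 2   # between-class variance (unnormalized)
    return centers[np.argmax(between)]

# Toy image: dark background with a bright square.
image = np.full((8, 8), 10.0)
image[2:6, 2:6] = 200.0

t = otsu_threshold(image)
mask = image > t                          # True for the bright square
```

The resulting boolean mask is the kind of input that connected-component labeling expects for the per-region statistics.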
I don't know much about this package Motmot or how it compares to OpenCV, but I have imported and used a class or two from it. Much of the image processing is done via numpy arrays and might be similar enough to how you've used Matlab to meet your needs.
I've acquired images from a FireWire camera using .NET and IronPython. On CPython I would check out the ctypes library, unless you find a library that supports grabbing.
Foreword: This book is more for people who want a good hands-on introduction to computer or machine vision, even though it covers what the original question asked.
[BOOK]: Programming Computer Vision with Python
At the moment you can download the final draft from the book's website for free as a PDF:
http://programmingcomputervision.com/
From the introduction:
The idea behind this book is to give an easily accessible entry point to hands-on
computer vision with enough understanding of the underlying theory and algorithms
to be a foundation for students, researchers and enthusiasts.
What you need to know
Basic programming experience. You need to know how to use an editor and run
scripts, how to structure code as well as basic data types. Familiarity with Python or other scripting style languages like Ruby or Matlab will help.
Basic mathematics. To make full use of the examples it helps if you know about
matrices, vectors, matrix multiplication, the standard mathematical functions
and concepts like derivatives and gradients. Some of the more advanced
mathematical examples can be easily skipped.
What you will learn
Hands-on programming with images using Python.
Computer vision techniques behind a wide variety of real-world applications.
Many of the fundamental algorithms and how to implement and apply them yourself.