I instantiated the Doc2Vec model like this:
mv_tags_doc = [TaggedDocument(words=word_tokenize_clean(D), tags=[str(i)]) for i, D in enumerate(mv_tags_corpus)]
max_epochs = 50
vector_size = 20
alpha = 0.025
model = Doc2Vec(size=vector_size,
                alpha=alpha,
                min_alpha=0.00025,
                min_count=1,
                dm=0)
model.build_vocab(mv_tags_doc)
but I get this error:
TypeError: __init__() got an unexpected keyword argument 'size'
In the latest version of the Gensim library, which you appear to be using, the parameter size has been renamed: it is now, more consistently, vector_size everywhere. See the 'Migrating to Gensim 4.0' help page:
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4#1-size-ctr-parameter-is-now-consistently-vector_size-everywhere
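For example, your instantiation works under Gensim 4.x once the parameter is renamed. A minimal sketch, reusing your mv_tags_doc and vector_size from above (alpha/min_alpha are left at their defaults, for the reason below):

from gensim.models.doc2vec import Doc2Vec

# Gensim 4.x: 'size' is now 'vector_size'
model = Doc2Vec(vector_size=vector_size,
                min_count=1,
                dm=0)
model.build_vocab(mv_tags_doc)
# a single train() call; no manual epochs loop needed
model.train(mv_tags_doc, total_examples=model.corpus_count, epochs=max_epochs)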
Separately: if you're consulting an online example that uses the outdated parameter name, and that also suggests the unnecessary specification of min_alpha and alpha, there's a good chance the example you're following is a bad reference in other ways, too.
So, also take a look at this answer: My Doc2Vec code, after many loops of training, isn't giving good results. What might be wrong?
I am trying to follow this example, https://github.com/dsfsi/textaugment, to load a pre-trained Gensim model for data augmentation:
import textaugment
import gensim
from textaugment import Word2vec
model = gensim.models.KeyedVectors.load_word2vec_format(r'\GoogleNews-vectors-negative300.bin', binary=True)
t = Word2vec(model)
t.augment('The stories are good')
but I get the following error:
TypeError: __init__() takes 1 positional argument but 2 were given
at line
t = Word2vec(model)
What am I doing wrong?
If you edit your question to include the full error message shown, including the traceback identifying exact files/lines-of-code leading to the error, it will often provide extra important info to know what's going wrong. (Whenever possible, show answerers all the text/output that you see, not just excerpts.)
But also, the examples at the page you link, https://github.com/dsfsi/textaugment, all show the model passed in as a named parameter (model=SOMETHING), not merely a positional parameter. You should try to do it the same way (here I've changed the name of your local variable to make it more distinct from the parameter name, and removed the out-of-place r prefix):
my_model = gensim.models.KeyedVectors.load_word2vec_format('\GoogleNews-vectors-negative300.bin', binary=True)
t = Word2vec(model=my_model)
The error you got may be less confusing once you know, from experience or a careful reading of the traceback, that the constructor call Word2vec() actually invokes another method, __init__(), behind the scenes. That __init__() method receives both the newly-created instance and whatever else you supplied as 'positional' arguments. That is: 2 positional arguments, when it only expects 1 (the new instance), with any extra arguments passed as named arguments (model=SOMETHING style).
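A tiny standalone mimic reproduces the same message (a hypothetical class for illustration, not textaugment's actual code):

class Word2vec:
    def __init__(self, *, model=None):  # 'model' is keyword-only here
        self.model = model

Word2vec(model='something')  # fine: passed by name
Word2vec('something')        # TypeError: __init__() takes 1 positional argument but 2 were given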
I am trying to learn catboost, and I see two confusing terms with CatBoostClassifier:
custom_loss and custom_metric.
I have browsed this documentation page, which says: https://catboost.ai/docs/concepts/python-reference_parameters-list.html#python-reference_parameters-list
custom_metric:
Metric values to output during training. These functions are not optimized and are displayed for informational purposes only. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric).
but then what is custom_loss?
I see custom_loss defined in the R documentation: https://catboost.ai/docs/features/loss-functions-desc.html - but not in the python one.
Yet in the Python tutorial, they define a custom_loss like so:
model = CatBoostClassifier(
    custom_loss=['Accuracy'],
    random_seed=42,
    logging_level='Silent'
)
Am I missing something here? In fact, custom_loss does not seem to be defined as a property anywhere in the Python docs: https://catboost.ai/docs/concepts/python-reference_parameters-list.html#python-reference_parameters-list
I infer the following from this link in the documentation: I am almost certain that they refer to the same parameter; custom_loss is the R name, while custom_metric is the Python one. Apparently they can be used interchangeably, as long as they don't cause name collisions.
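To illustrate, this sketch of the tutorial's model uses the documented Python spelling; per the above, it should behave the same as the custom_loss version:

from catboost import CatBoostClassifier

model = CatBoostClassifier(
    custom_metric=['Accuracy'],  # the Python-docs name for the same option
    random_seed=42,
    logging_level='Silent'
)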
I am trying to add pretrained vectors to a training model using fasttext and am getting the error below. The code is written in Python with fasttext 0.8.3.
I thought that with fasttext you could add pre-trained vectors to a supervised training model?
TypeError: supervised() got an unexpected keyword argument 'pretrainedVectors'
pretrainedVectors = 'vectorFile.vec'
classifier = ft.supervised(model_data, model_name, pretrainedVectors=pretrainedVectors, label_prefix=label_prefix, lr=lr, epoch=epoch, minn=minn, maxn=maxn, dim=dim, bucket=bucket)
According to the documentation, the named parameter to the function is called pretrained_vectors, not pretrainedVectors.
This naming convention is in line with PEP-8 style and so is normal for a Python API.
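So your call becomes the following (same arguments, only the keyword renamed; an untested sketch against fasttext 0.8.3):

classifier = ft.supervised(model_data, model_name,
                           pretrained_vectors=pretrainedVectors,
                           label_prefix=label_prefix, lr=lr, epoch=epoch,
                           minn=minn, maxn=maxn, dim=dim, bucket=bucket)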
I am trying to reproduce the example implementing the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm from sklearn, as described here.
On running the TSNE function I get this error:
TypeError: _gradient_descent() got an unexpected keyword argument 'n_iter_check'
Currently the t-SNE function does not have any n_iter_check argument, so I am not sure where the unexpected keyword argument comes from.
The only online help I found was at this link.
Has anyone managed to work around this?
Look at your installed sklearn module and find the gradient-descent helper in the t-SNE source. You will find that it takes two extra parameters that your own replacement function also has to accept: n_iter_check and kwargs.
def _gradient_descent(objective, p0, it, n_iter, objective_error=None,
                      n_iter_check=1, n_iter_without_progress=50,
                      momentum=0.5, learning_rate=1000.0, min_gain=0.01,
                      min_grad_norm=1e-7, min_error_diff=1e-7, verbose=0,
                      args=None, kwargs=None):
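If you are following the animation example that monkey-patches this internal function, the fix is to give your replacement that same, newer signature before swapping it in. A sketch, assuming the older sklearn module layout (t_sne rather than _t_sne) and with the example's recording logic elided:

from sklearn.manifold import t_sne  # older sklearn module layout

_original = t_sne._gradient_descent

def _patched_gradient_descent(objective, p0, it, n_iter, objective_error=None,
                              n_iter_check=1, n_iter_without_progress=50,
                              momentum=0.5, learning_rate=1000.0, min_gain=0.01,
                              min_grad_norm=1e-7, min_error_diff=1e-7,
                              verbose=0, args=None, kwargs=None):
    # ... record intermediate positions here, as in the example ...
    return _original(objective, p0, it, n_iter,
                     objective_error=objective_error,
                     n_iter_check=n_iter_check,
                     n_iter_without_progress=n_iter_without_progress,
                     momentum=momentum, learning_rate=learning_rate,
                     min_gain=min_gain, min_grad_norm=min_grad_norm,
                     min_error_diff=min_error_diff, verbose=verbose,
                     args=args, kwargs=kwargs)

t_sne._gradient_descent = _patched_gradient_descent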
I have the following code (based on the samples here), but it is not working:
[...]
def my_analyzer(s):
    return s.split()

my_vectorizer = CountVectorizer(analyzer=my_analyzer)
X_train = my_vectorizer.fit_transform(traindata)
ch2 = SelectKBest(chi2, k=1)
X_train = ch2.fit_transform(X_train, Y_train)
[...]
The following error is given when calling fit_transform:
AttributeError: 'function' object has no attribute 'analyze'
According to the documentation, CountVectorizer should be created like this: vectorizer = CountVectorizer(tokenizer=my_tokenizer). However, if I do that, I get the following error: "got an unexpected keyword argument 'tokenizer'".
My actual scikit-learn version is 0.10.
You're looking at the documentation for 0.11 (to be released soon), where the vectorizer has been overhauled. Check the documentation for 0.10, where there is no tokenizer argument and the analyzer should be an object implementing an analyze method:
class MyAnalyzer(object):
    @staticmethod
    def analyze(s):
        return s.split()

v = CountVectorizer(analyzer=MyAnalyzer())
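Plugged back into your pipeline, the rest of your snippet stays the same (a sketch, reusing traindata and Y_train from your code):

X_train = v.fit_transform(traindata)
ch2 = SelectKBest(chi2, k=1)
X_train = ch2.fit_transform(X_train, Y_train)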
http://scikit-learn.org/dev is the documentation for the upcoming release (which may change at any time), while http://scikit-learn.org/stable has the documentation for the current stable version.