scikitlearn adapt bigram to svm

I have a problem with my code:
http://colorscripter.com/s/9vc2ryj
I made a mistake in the posted code: the call I actually want is evaluate_classifier(bigram_word_feats).
I'm trying to do text mining with an SVM.
The feature vectors come from a bigram model.
But I get this error:
Traceback (most recent call last):
File "C:/Users/LG/Desktop/untitled1/TEST.py", line 184, in <module>
evaluate_classifier(bigram_word_feats)
File "C:/Users/LG/Desktop/untitled1/TEST.py", line 90, in evaluate_classifier
classifier.train(trainfeats)
File "C:\Users\LG\Anaconda3\lib\site-packages\nltk\classify\scikitlearn.py", line 115, in train
X = self._vectorizer.fit_transform(X)
File "C:\Users\LG\Anaconda3\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 226, in fit_transform
return self._transform(X, fitting=True)
File "C:\Users\LG\Anaconda3\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 190, in _transform
feature_names.sort()
TypeError: unorderable types: tuple() < str()
Why does this happen, and how can I solve it?
Also, how does the NLTK classifier work? Do I just give it my word features, and it then generates the SVM model?
Oh, and I'm using Python 3. Do I need to use Python 2?

New answer:
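I think the problem is that NLTK expects a dict keyed by strings, whereas your features mix strings (single words) and tuples (bigrams), which Python 3 cannot sort against each other. Can you try replacing the return statement:
return dict([(ngram, True) for ngram in itertools.chain(words, bigrams)])
with the following, which joins each bigram tuple into a single string and leaves single words unchanged:
return dict([(ngram if isinstance(ngram, str) else '|'.join(ngram), True) for ngram in itertools.chain(words, bigrams)])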
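The string-key idea above can be sketched without NLTK or scikit-learn. This is a minimal illustration, not the asker's actual function: the signature of bigram_word_feats and the use of adjacent word pairs as bigrams are assumptions, since the original code is only available through the link.

```python
import itertools

def bigram_word_feats(words):
    """Return a feature dict whose keys are all strings (assumed signature)."""
    bigrams = zip(words, words[1:])  # adjacent word pairs, as tuples
    feats = {}
    for ngram in itertools.chain(words, bigrams):
        # Single words are already strings; bigram tuples are joined with '|'.
        key = ngram if isinstance(ngram, str) else "|".join(ngram)
        feats[key] = True
    return feats

feats = bigram_word_feats(["not", "very", "good"])
feature_names = sorted(feats)  # sorting works because every key is a string

# Mixing str and tuple keys is what fails under Python 3.
try:
    sorted(["good", ("very", "good")])
    mixed_sort_ok = True
except TypeError:
    mixed_sort_ok = False
```

With string-only keys the sort step in the traceback has nothing unorderable to compare, so staying on Python 3 should be fine.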
Old answer:
`train` methods of Scikit-learn predictors expect two inputs: features and targets. Something like the following (not tested):
negfeats = [featx(f) for f in word_split(negdata)]
posfeats = [featx(f) for f in word_split(posdata)]
...
trainlabels = [-1,] * negcutoff + [+1,] * poscutoff
classifier.train(trainfeats, trainlabels)
In defining trainlabels, I followed your style of using arithmetic operators on lists but I wouldn't do it in my code as it makes it less readable.

Related

Uber Ludwig: Issue Making Predictions

I decided to mess with Uber Ludwig again. I wanted to make a simple demo using the python API that learns to add 1 to the input number. I have successfully produced a model, but the issue arises when predicting. I am running on the newest release from github on PopOS 19.10 on CPU TensorFlow.
Thank you for any help.
Edit: I have reproduced the issue on windows as well.
The error is as follows
Traceback (most recent call last):
File "predict.py", line 3, in <module>
x = model.predict({"numberIn":[1]}, return_type='dict')
File "/home/user/.local/lib/python3.7/site-packages/ludwig/api.py", line 914, in predict
gpu_fraction=gpu_fraction,
File "/home/user/.local/lib/python3.7/site-packages/ludwig/api.py", line 772, in _predict
self.model_definition['preprocessing']
File "/home/user/.local/lib/python3.7/site-packages/ludwig/data/preprocessing.py", line 159, in build_data
preprocessing_parameters
File "/home/user/.local/lib/python3.7/site-packages/ludwig/data/preprocessing.py", line 180, in handle_missing_values
dataset_df[feature['name']] = dataset_df[feature['name']].fillna(
AttributeError: 'list' object has no attribute 'fillna'
Here is my prediction script
from ludwig.api import LudwigModel
model = LudwigModel.load("/home/user/Documents/ludwig-test/plus1/results/api_experiment_run_0/model")
x = model.predict({"numberIn":[1]}, return_type='dict')
#x = model.predict({"numberIn":[1]}, return_type=<class 'dict'>) I tried this with no success
print(x)
Here is the contents of my training script.
mydata = {"numberIn":[], "value":[]}
for x in range(10000):
    mydata["numberIn"].append(x)
    mydata["value"].append(x + 1)
from ludwig.api import LudwigModel
print("Imported Ludwig")
modelobject = LudwigModel(model_definition_file="modeldef.yaml")
stats = modelobject.train(data_dict=mydata)
modelobject.close()
modeldef.yaml
input_features:
    -
        name: numberIn
        type: numerical
output_features:
    -
        name: value
        type: numerical

Solution: data_dict is not the first positional parameter of predict, so a dictionary passed positionally is handled as a DataFrame (hence the fillna call on a list in the traceback). Pass it by keyword instead:
x = modelobject.predict(data_dict=mydictionary)
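The difference between the positional and the keyword call can be illustrated with a stand-in function. This is a sketch only: predict_sketch is hypothetical, and its parameter order (data_df, data_csv, data_dict) is an assumption chosen to mirror the behaviour seen in the traceback, not a copy of Ludwig's API.

```python
def predict_sketch(data_df=None, data_csv=None, data_dict=None):
    """Hypothetical stand-in: report which parameter received the data."""
    if data_df is not None:
        return "data_df"
    if data_csv is not None:
        return "data_csv"
    if data_dict is not None:
        return "data_dict"
    return None

sample = {"numberIn": [1]}

# Passed positionally, the dict binds to the first parameter.
positional_target = predict_sketch(sample)

# Passed by keyword, it reaches the parameter meant for dictionaries.
keyword_target = predict_sketch(data_dict=sample)
```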

Code error with variable storage from selected feature

I'm using QGIS 3.6 with the built in Python text editor. I have found a snippet of code that I'm trying to make work, and I've modified it to the best of my abilities to fit my specific needs. I have a point layer called "Regulators" and it contains a field called "Town". The idea of the code is that when I select a single feature on the "Regulators" layer, the code will look at the "Town" field, and select all other features that match that field's value. I select a feature, run this code:
layer = iface.activeLayer()
field_name = 'Town'
values = []
for feat in layer.selectedFeatures():
    tmp_value = feat[field_name]
    if tmp_value not in values:
        values.append(str(tmp_value))
strings = []
for val in values:
    if val != values[-1]:
        string = field_name + ' = ' + val + ' or '
        strings.append(string)
    else:
        last_string = field_name + ' = ' + val
        strings.append(last_string)
query = ''.join(strings)
request = QgsFeatureRequest().setFlags(QgsFeatureRequest.NoGeometry)
request.setSubsetOfAttributes([]).setFilterExpression(query)
selection = layer.getFeatures(request)
layer.setSelectedFeatures([k.id() for k in selection])
and I get this error:
Traceback (most recent call last):
File "C:\PROGRA~1\QGIS3~1.6\apps\Python37\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "<string>", line 24, in <module>
AttributeError: 'QgsVectorLayer' object has no attribute 'setSelectedFeatures'
I'm very new to Python, and see nothing wrong with line 1 or 5. I have found some other code snippets that do what I'm attempting here, but they also return errors, so I'm wondering if some method or function has changed since this code was posted. The integrated compiler in QGIS is also much different from what I am used to.
EDIT: I've updated the code and the error message based on feedback I've received on the post so far. I assume that QgsVectorLayer is the generic term for the vector layer being referenced, in this case the "Regulators" layer, but I don't understand why it's trying to use the setSelectedFeatures method as an attribute.
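For context, Python reports any missing method as a missing attribute, and in the QGIS 3 API QgsVectorLayer has no setSelectedFeatures; selecting by feature id is done with selectByIds. The query string built above also leaves text values unquoted. The sketch below shows one way to build a quoted expression in plain Python; build_filter_expression is a hypothetical helper, and the QGIS calls appear only in comments because they cannot run outside QGIS.

```python
def build_filter_expression(field_name, values):
    """Build a QGIS-style expression such as "Town" IN ('A', 'B')."""
    unique = []
    for value in values:
        text = str(value)
        if text not in unique:  # compare after converting to str
            unique.append(text)
    # Quote each value and escape embedded single quotes by doubling them.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in unique)
    return '"{}" IN ({})'.format(field_name, quoted)

query = build_filter_expression("Town", ["Springfield", "O'Fallon", "Springfield"])

# Inside the QGIS 3 Python console the selection step could then look like:
#   request = QgsFeatureRequest().setFilterExpression(query)
#   layer.selectByIds([f.id() for f in layer.getFeatures(request)])
```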

My created function won't accept an array as one of the arguments

I've written a function that takes two arguments, one for the number of dimensions and another for the number of simulations. The function does exactly what is needed (calculating the volume of a unit hypersphere), but when I try to plot the function over a range of dimensions it raises an error: 'list' object cannot be interpreted as an integer.
My function is the following,
def hvolume(ndim, nsim):
    ob = [np.random.uniform(0.0,1.0,(nsim, ndim))]
    ob = np.concatenate(ob)
    i = 0
    res = []
    while i <= nsim-1:
        arr = np.sqrt(np.sum(np.square(ob[i])))
        i += 1
        res.append(arr)
    N = nsim
    n = ndim
    M = len([i for i in res if i <= 1])
    return ((2**n)*M/N)
The error traceback is:
Traceback (most recent call last):
File "<ipython-input-192-4c4a2c778637>", line 1, in <module>
runfile('H:/Documents/Python Scripts/Q4ATTEMPT.py', wdir='H:/Documents/Python Scripts')
File "C:\Users\u1708511\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "C:\Users\u1708511\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "H:/Documents/Python Scripts/Q4ATTEMPT.py", line 20, in <module>
print(hvolume(d, 2))
File "H:/Documents/Python Scripts/Q4ATTEMPT.py", line 4, in hvolume
ob = [np.random.uniform(0.0,1.0,(nsim, ndim))]
File "mtrand.pyx", line 1307, in mtrand.RandomState.uniform
File "mtrand.pyx", line 242, in mtrand.cont2_array_sc
TypeError: 'list' object cannot be interpreted as an integer
I really have no idea where to go from here, and have searched thoroughly online for how to resolve this. Unfortunately I'm a beginner with this!
Any help is appreciated.

If you simply try the first line of your function:
ob = [np.random.uniform(0.0,1.0,(nsim, ndim))]
with a list as one of the variables, like so:
[np.random.uniform(0.0,1.0,([1,2], 2))]
you will get the error:
TypeError: 'list' object cannot be interpreted as an integer
This is because uniform expects every entry of the size tuple to be an integer, not a list. If you want to handle a list of dimensions, loop over it and call the function once per value.
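The loop suggested above might look like the following sketch. It keeps the question's Monte Carlo estimate but computes it with array operations; the optional rng parameter and the sample count of 100000 are assumptions added here for reproducibility, not part of the original function.

```python
import numpy as np

def hvolume(ndim, nsim, rng=None):
    """Monte Carlo estimate of the unit ball's volume in ndim dimensions."""
    rng = np.random.default_rng() if rng is None else rng
    # nsim random points in the unit cube [0, 1)^ndim
    points = rng.uniform(0.0, 1.0, size=(nsim, ndim))
    # Count points whose squared distance from the origin is at most 1.
    inside = np.count_nonzero(np.sum(points ** 2, axis=1) <= 1.0)
    # The cube covers 1 / 2**ndim of the enclosing hypercube [-1, 1]^ndim.
    return (2 ** ndim) * inside / nsim

dims = [1, 2, 3, 4]
rng = np.random.default_rng(0)
# Call the scalar function once per dimension instead of passing the list.
volumes = [hvolume(d, 100_000, rng) for d in dims]
```

The resulting volumes list has one entry per dimension and can be plotted against dims.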

One pattern I use for situations like this is to begin the function with a block that handles the case where an argument is an iterable (a list is an Iterable, not an Iterator). Something like this, for example:
from collections.abc import Iterable
def hvolume(ndim, nsim):
    outputs = []
    if isinstance(ndim, Iterable):
        for ndim_arg in ndim:
            outputs.append(hvolume(ndim_arg, nsim))
    elif isinstance(nsim, Iterable):
        for nsim_arg in nsim:
            outputs.append(hvolume(ndim, nsim_arg))
    else: # neither argument is an Iterable
        # ... the rest of the function but it appends to outputs
    return outputs

Check the arguments you pass to hvolume: it seems you are passing a list for either nsim or ndim, both of which should be integers. That is what makes uniform raise the TypeError.

Theano scan function and argument number lstm

I am new to Neural Networks and I am trying to modify this code RNN-Classifier and instead of using the GRU_step, I would rather use an LSTM.
I added one extra parameter c_prev
def lstm_step(x, h_prev, c_prev, W_xz, W_hz, W_xm, W_hm):
and after applying all the LSTM equations I am returning them both (h and c)
My hidden vector looks like that:
hidden_vector, _ = theano.scan(
    lstm_step,
    sequences=input_vectors,
    outputs_info=initial_hidden_vector,
    non_sequences=[W_xz, W_hz, W_xm, W_hm]
)
hidden_vector = hidden_vector[-1]
I get the exception below and don't understand why it does not see c_prev as an existing parameter (or how and where I can feed it some values so that it's not empty).
python rnnclassifier.py data/sentiment.train.txt data/sentiment.test.txt
Traceback (most recent call last):
File "rnnclassifier.py", line 167, in <module>
rnn_classifier = RnnClassifier(word2id_len, n_classes)
File "rnnclassifier.py", line 110, in __init__
non_sequences=[W_xz, W_hz, W_xm, W_hm]
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py", line 773, in scan
condition, outputs, updates = scan_utils.get_updates_and_outputs(fn(*args))
TypeError: lstm_step() takes exactly 7 arguments (6 given)
I am new to this topic and would appreciate any help or advice! Thank you.
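The argument count in the error follows from how a scan loop calls the step function: one element of each sequence, then one previous value per entry in outputs_info, then the non-sequences. With a single initial state, lstm_step receives x, h_prev and four weights (six arguments), so c_prev is never supplied; passing a list of two initial states as outputs_info provides it. The sketch below is a simplified pure-Python stand-in, not Theano: scan_sketch and lstm_like_step are hypothetical, and the arithmetic is a toy update rather than real LSTM equations.

```python
def scan_sketch(fn, sequences, outputs_info, non_sequences):
    """Simplified stand-in for a scan loop (not Theano itself)."""
    states = list(outputs_info)
    history = []
    for x in sequences:
        # fn gets: sequence element, previous states, then non-sequences.
        states = list(fn(x, *states, *non_sequences))
        history.append(states)
    return history

def lstm_like_step(x, h_prev, c_prev, w_x, w_h):
    # Toy update with two recurrent states; both must be returned.
    c = c_prev + w_x * x
    h = w_h * h_prev + c
    return h, c

# Two initial states (h0 and c0), one per value returned by the step.
history = scan_sketch(
    lstm_like_step,
    sequences=[1.0, 2.0, 3.0],
    outputs_info=[0.0, 0.0],
    non_sequences=[0.5, 0.1],
)
final_h, final_c = history[-1]

# With only one initial state the step is called with too few arguments.
try:
    scan_sketch(lstm_like_step, [1.0], [0.0], [0.5, 0.1])
    single_state_ok = True
except TypeError:
    single_state_ok = False
```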

float division by zero error related to ngram and nltk

My task is to use 10-fold cross-validation with unigram, bigram and trigram taggers on a corpus and compare their accuracy. However, I am stuck with a float division error. All of this code was provided by the question setter except for the loop, so the error is probably there. Here, we are only using the first 1000 sentences to test the program, and that line will be removed once I know the program runs.
import codecs
mypath = "/Users/myname/Desktop/"
corpusFile = codecs.open(mypath + "estonianSample.txt",mode="r",encoding="latin-1")
sentences = [[tuple(w.split("/")) for w in line[:-1].split()] for line in corpusFile.readlines()]
corpusFile.close()
from math import ceil
N=len(sentences)
chunkSize = int(ceil(N/10.0))
sentences = sentences[:1000]
chunks=[sentences[i:i+chunkSize] for i in range(0, N, chunkSize)]
for i in range(10):
    training = reduce(lambda x,y:x+y,[chunks[j] for j in range(10) if j!=i])
    testing = chunks[i]
from nltk import UnigramTagger,BigramTagger,TrigramTagger
t1 = UnigramTagger(training)
t2 = BigramTagger(training,backoff=t1)
t3 = TrigramTagger(training,backoff=t2)
t3.evaluate(testing)
This is what the error says:
runfile('/Users/myname/pythonhw3.py', wdir='/Users/myname')
Traceback (most recent call last):
File "<ipython-input-1-921164840ebd>", line 1, in <module>
runfile('/Users/myname/pythonhw3.py', wdir='/Users/myname')
File "/Users/myname/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "/Users/myname/pythonhw3.py", line 34, in <module>
t3.evaluate(testing)
File "/Users/myname/anaconda/lib/python2.7/site-packages/nltk/tag/api.py", line 67, in evaluate
return accuracy(gold_tokens, test_tokens)
File "/Users/myname/anaconda/lib/python2.7/site-packages/nltk/metrics/scores.py", line 40, in accuracy
return float(sum(x == y for x, y in izip(reference, test))) / len(test)
ZeroDivisionError: float division by zero

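The error occurs because the list passed to evaluate is empty: the traceback shows accuracy dividing by len(test), which is zero. This is most likely because N and chunkSize are computed from the full corpus before sentences is truncated to 1000, so the later chunks are empty.
The line that raises the error is
t3.evaluate(testing)
As a workaround, you can catch the exception:
try:
    t3.evaluate(testing)
except ZeroDivisionError:
    # Do whatever you want it to do
    print(0)
It works on my end. Try it out!
This answer comes four years late, but hopefully a fellow net citizen will find it helpful.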
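Catching the exception hides the symptom; one possible cause is the order of operations in the chunking code, where the corpus length is measured before the list is cut to 1000 sentences. The sketch below shows the folds built after truncation. It uses integers as stand-in sentences, and make_folds is a hypothetical helper, not part of NLTK.

```python
from math import ceil

def make_folds(items, n_folds=10):
    """Split items into consecutive chunks of (almost) equal size."""
    n = len(items)
    if n == 0:
        return []
    chunk_size = int(ceil(n / float(n_folds)))
    return [items[i:i + chunk_size] for i in range(0, n, chunk_size)]

sentences = list(range(5000))   # stand-in for the tagged sentences
sentences = sentences[:1000]    # truncate first, then measure the length
folds = make_folds(sentences)

fold_sizes = []
for i, testing in enumerate(folds):
    # Training data is every fold except the one held out for testing.
    training = [s for j, fold in enumerate(folds) if j != i for s in fold]
    fold_sizes.append((len(training), len(testing)))
```

Every fold is then non-empty, so an accuracy computed on it never divides by zero.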
