Using a custom metric with sklearn.neighbors.BallTree gives wrong input?

I'm trying to use a custom metric with sklearn.neighbors.BallTree, but when it calls my metric the inputs do not look correct. If I use scipy.spatial.distance.pdist with the same custom metric, it works as expected. However, if I try to instantiate a BallTree, an exception is raised when I try to reshape the input. If I look at the actual inputs, the shape and values do not look correct.
import numpy as np
import scipy.spatial.distance as spdist
import sklearn.neighbors.ball_tree as ball_tree
# custom metric
def minimum_average_direct_flip(x, y):
    x = np.reshape(x, (-1, 3))
    y = np.reshape(y, (-1, 3))
    direct = np.mean(np.sqrt(np.sum(np.square(x - y), axis=1)))
    flipped = np.mean(np.sqrt(np.sum(np.square(np.flipud(x) - y), axis=1)))
    return min(direct, flipped)
# create an X to test
X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9], [11, 12, 13, 14, 15, 16, 17, 18, 19], [21, 22, 23, 24, 25, 26, 27, 28, 29]])
# works as expected
distances = spdist.pdist(X, metric=minimum_average_direct_flip)
# outputs: [ 17.32050808 34.64101615 17.32050808]
print(distances)
# raises exception, inputs to minimum_average_direct_flip look wrong
# Traceback (most recent call last):
# File ".../test_script.py", line 23, in <module>
# ball_tree.BallTree(X, metric=minimum_average_direct_flip)
# File "sklearn/neighbors/binary_tree.pxi", line 1059, in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn\neighbors\ball_tree.c:8381)
# File "sklearn/neighbors/dist_metrics.pyx", line 262, in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn\neighbors\dist_metrics.c:4032)
# File "sklearn/neighbors/dist_metrics.pyx", line 1091, in sklearn.neighbors.dist_metrics.PyFuncDistance.__init__ (sklearn\neighbors\dist_metrics.c:10586)
# File "C:/Users/danrs/Documents/neuro_atlas/test_script.py", line 8, in minimum_average_direct_flip
# x = np.reshape(x, (-1, 3))
# File "C:\Anaconda2\lib\site-packages\numpy\core\fromnumeric.py", line 225, in reshape
# return reshape(newshape, order=order)
# ValueError: total size of new array must be unchanged
ball_tree.BallTree(X, metric=minimum_average_direct_flip)
In the first call to minimum_average_direct_flip from the BallTree code, the inputs are:
x = [ 0.4238394 0.55205233 0.04699435 0.19542642 0.20331665 0.44594837 0.35634537 0.8200018 0.28598294 0.34236847]
y = [ 0.4238394 0.55205233 0.04699435 0.19542642 0.20331665 0.44594837 0.35634537 0.8200018 0.28598294 0.34236847]
These look completely incorrect. Am I doing something wrong in the way I am calling this or is this a bug in sklearn?

It seems that this is a known issue:
https://github.com/scikit-learn/scikit-learn/issues/6287
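When BallTree is constructed, it validates the custom metric by calling it once on a test input. In the traceback this call comes from the PyFuncDistance.__init__ frame. The 10-element vectors shown above, with x equal to y, appear to be that probe. np.reshape(x, (-1, 3)) cannot turn 10 elements into rows of 3, so the metric fails before any real data is used. As a workaround, I can make the metric tolerate inputs whose size is not a multiple of 3. However, as the issue notes, this is undesirable because it stops me from doing real input validation inside the metric myself.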
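Here is a minimal sketch of that workaround. It assumes the validation probe is the only caller that passes vectors whose length is not a multiple of 3. For those inputs the metric falls back to a plain Euclidean distance, purely so the check gets a float back instead of an exception. Real rows of X never reach the fallback. The choice of fallback is an assumption and is not something scikit-learn prescribes.

```python
import numpy as np


def minimum_average_direct_flip(x, y):
    """Minimum of the direct and flipped mean point-to-point distance.

    x and y are flattened sequences of 3-D points (length divisible by 3).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Workaround (assumption): the metric check may call this with arbitrary
    # vectors (e.g. 10 random values) that cannot be reshaped into rows of 3.
    # Return an ordinary Euclidean distance for those so the check receives
    # a float instead of an exception.
    if x.size % 3 != 0:
        return float(np.sqrt(np.sum((x - y) ** 2)))
    x = x.reshape(-1, 3)
    y = y.reshape(-1, 3)
    # Mean distance between corresponding points, in the given order...
    direct = np.mean(np.sqrt(np.sum(np.square(x - y), axis=1)))
    # ...and with the point order of x reversed.
    flipped = np.mean(np.sqrt(np.sum(np.square(np.flipud(x) - y), axis=1)))
    return float(min(direct, flipped))
```

With this version, BallTree(X, metric=minimum_average_direct_flip) should get past the validation step. Results for real rows are unchanged. For example, the distance between the first two rows of X is still about 17.3205, matching the pdist output. In newer scikit-learn releases, import the class as `from sklearn.neighbors import BallTree`, because the `sklearn.neighbors.ball_tree` module path used in the question is deprecated there.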

Related

Can't get the correct parameter fit for a system of ODEs using Symfit and some experimental results

I want to fit the model defined in model_dict (in the code below) to some fluorescence measurements ([YFP] over time). I can measure the change of YFP over time, but not the change of x. I went through various solutions on Stack Overflow and tried several of them, and I ended up getting pretty close with Symfit.
However, when I try to fit the model to the experimental results, I get the following fit results:
Parameter  Value         Standard Deviation
TauOFF     4.425923e-02  2.173698e+00
TauON      9.687891e+00  1.945774e+02
TauONx     4.539607e-02  2.239210e+00
x_SS       7.968579e+00  2.726591e+02
Status message         Maximum number of function evaluations has been exceeded.
Number of iterations   443
Objective              <symfit.core.objectives.LeastSquares object at 0x000002640701C898>
Minimizer              <symfit.core.minimizers.NelderMead object at 0x000002640701CEF0>
Goodness of fit qualifiers:
    chi_squared        480161.4690600715
    objective_value    240080.73453003576
    r_squared          0.9677940481847731
I don't understand why the prediction for x is so low and almost constant. I say "almost" because when I zoom in, it does change a little. The fit also reports "Maximum number of function evaluations has been exceeded". What am I doing wrong? Am I using the wrong minimizer, or the wrong initial parameter estimates?
Below is my code:
# %% Importing modules
import symfit
from symfit import parameters, variables, ODEModel, Fit, Parameter, D
from symfit.core.objectives import LogLikelihood
from symfit.core.minimizers import NelderMead
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sympy.solvers import ode
# %% Experimental data. Inputs is time, outputs is fluorescence measurements ([YFP])
inputs = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66])
outputs = [73.64756293519015, 44.83500717360115, 66.59467242620596, 49.65998568360771, 46.484859217283514, 72.37530519707008, 74.47354904982025, 61.437468439656605, 80.15204098496119, 93.11740890688259, 74.73900664346728, 87.38835848475512, 94.96499329658872, 116.07910576096306, 126.95045168354777, 123.76237623762376, 147.73432650527624, 168.04489072652098, 183.3221551531411, 321.22186495176834, 356.38957816377166, 389.03440885819737, 321.22186495176834, 356.38957816377166, 389.03440885819737, 582.1501961516907, 607.139657798083, 651.6151143860851, 682.4329863103533, 716.422610612502, 749.3927432822223, 777.726234656009, 809.6079246328624, 847.2845376012857, 870.6370831711431, 895.512942218847, 914.3568311720239, 1002.7537605116663, 1019.3525890625908, 1028.7006485379452, 1073.162564875272, 1080.7277331278212, 1106.8392267287595, 1119.0425361584034, 1139.207233729366, 1145.790182270091, 1177.2867420349437, 1185.0114126299773, 1196.1818638533032, 1213.7383689107828, 1208.2922013820337, 1209.8943558642277, 1225.7463589296947, 1232.9657629893582, 1221.7722725107194, 1237.6858956142842, 1240.1111320399323, 1240.6384572177496, 1249.767333643555, 1247.0462864291337, 1259.6783113651027, 1258.188648128636, 1267.006026296567, 1272.2310666363428, 1260.6866757617101, 1266.8857660924748]
# %% Model Definitions
x, y, t = variables('x, y, t')
TauONx = Parameter('TauONx', 0.1)
TauON = Parameter('TauON', 0.180854297)
### For a moment, I thought of fixing TauOFF, obtaining this value from other experiments
TauOFF = Parameter('TauOFF', 10.53547354)
#TauOFF = 10.53547354
x_SS = Parameter('x_SS', 0.1)
#### All of this is using symfit package!
model_dict = {
    D(x, t): TauONx*(x_SS - x),
    D(y, t): TauON*x - TauOFF*y,
}
# %% Execute data
ode_model = ODEModel(model_dict, initial={t: 0.0, x: 54 * 10e-4, y: 54 * 10e-4})
fit = Fit(ode_model, t=inputs, x=None, y=outputs, minimizer=NelderMead)
#fit = Fit(ode_model, outputs, objective=LogLikelihood)
fit_result = fit.execute()
print(fit_result)
# %% Plot the data generated vs the output
tvec = np.linspace(0, 60, 1000)
X, Y = ode_model(t=tvec, **fit_result.params)
plt.plot(tvec, X, label='[x]')
plt.plot(tvec, Y, label='[y]')
plt.scatter(inputs, outputs)
plt.legend()
plt.show()

Trouble loading numpy array where they show pickle data error in python

I can successfully save and load small arrays using NumPy. Now I am saving the array below with np.save:
[0, 100, 0, 5, 10, 15, 20, 25, 30, 25, 20, 15, 10, 5, 0]
When I try to load it using np.load('array.npy'), I get this error:
raise ValueError("Cannot load file containing pickled data "
ValueError: Cannot load file containing pickled data when allow_pickle=False
If I try to fix it by adding allow_pickle=True, I get this error instead:
raise IOError(
OSError: Failed to interpret file 'array.npy' as a pickle
It's a really difficult situation. Please advise! :(
The code I am referring to is below:
def recv():
    import socket
    import time
    import numpy as np
    TCP_IP = "0.0.0.0"
    BUFFER_SIZE = 20  # Normally 1024, but we want fast response
    # receiving CAN frame payload
    TCP_PORT = 5003
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((TCP_IP, TCP_PORT))
    s.listen(1)
    conn, addr = s.accept()
    while 1:
        data1 = conn.recv(BUFFER_SIZE)
        if not data1:
            break
        datalist = list(data1)
        print("CAN payload: %s" % datalist)
        conn.send(data1)  # echo
    conn.close()
    time.sleep(2)
    # ------------------------------------------------------------
    # assembling CAN frame
    from can import Message
    can_msg = Message(is_extended_id=bool(datalist[0]), arbitration_id=datalist[1], data=datalist[2:])
    # printing all received payloads
    print("CAN frame: ", can_msg)
    print("Vehicle speed: ", datalist[2:])
    # Saving all received payloads
    np.save('array.npy', datalist[2:])  # save

def EPS_process():
    # EPS process for Right turn, high speed
    import numpy as np
    print("Starting EPS process")
    speed_array = np.load('array.npy')  # load

Okay, I want to share what resolved the situation. @hpaulj suggested that, with multiple processes, the read might start before the save has finished. So I added a 3-second delay before speed_array = np.load('array.npy').
This gives the array enough time to be saved before it is read. Otherwise, np.load may read a file that is still being written, hence the pickle error.
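A fixed delay only makes the race less likely. A common alternative is to write to a temporary file and then rename it into place with os.replace, so a reader never sees a half-written .npy file. The sketch below assumes the writer and reader share the same filesystem. save_atomically is a hypothetical helper name, not a NumPy function.

```python
import os
import tempfile

import numpy as np


def save_atomically(path, arr):
    """Save arr as .npy at path without ever exposing a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the final rename stays on
    # one filesystem and does not have to copy data.
    fd, tmp_path = tempfile.mkstemp(suffix=".npy", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            np.save(f, np.asarray(arr))  # file object: no extension is appended
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # Readers now see either the old complete file or the new complete one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In recv(), np.save('array.npy', datalist[2:]) would become save_atomically('array.npy', datalist[2:]), and EPS_process() can keep using np.load. The rename is atomic on POSIX systems. On Windows, os.replace may raise PermissionError if the reader has the file open at that moment, so a short retry loop may still be needed there.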

That array shouldn't give you any problems:
In [1]: arr = np.array([0, 100, 0, 5, 10, 15, 20, 25, 30, 25, 20, 15, 10, 5, 0])
In [2]: arr
Out[2]:
array([ 0, 100, 0, 5, 10, 15, 20, 25, 30, 25, 20, 15, 10,
5, 0])
In [3]: np.save('test.npy',arr)
In [4]: np.load('test.npy')
Out[4]:
array([ 0, 100, 0, 5, 10, 15, 20, 25, 30, 25, 20, 15, 10,
5, 0])

Relative difference in numpy.testing.assert_allclose

I can't understand how the numpy.testing.assert_allclose method calculates the relative difference between two arrays. Is it a percentage or a plain ratio? For example, suppose I have these two arrays:
import numpy as np
gfg1 = [1, 2, 3]
gfg2 = np.array([4, 8, 9])
np.testing.assert_allclose(gfg1, gfg2)
the following error occurs:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/anaconda3/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 1515, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/home/anaconda3/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 841, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Mismatch: 100%
Max absolute difference: 6
Max relative difference: 0.75
I understand the max absolute difference, but what does the max relative difference mean?

If you go to the source code of assert_allclose you will see that it calls assert_array_compare. And inside the assert_array_compare you can see that the maximum relative difference is calculated as max(error[nonzero] / abs(y[nonzero])) where the error is abs(x - y).
So, in your case, for x = np.array([1, 2, 3]) and y = np.array([4, 8, 9]), you get
max_rel_error == max(|1-4|/|4|, |2-8|/|8|, |3-9|/|9|) == max(0.75, 0.75, 0.667) == 0.75
So the reported relative difference is a plain ratio, not a percentage.
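The same computation can be reproduced directly with NumPy. The sketch below mirrors the formula described in the answer above. It also evaluates the pass/fail condition that the assert_allclose documentation gives, |actual - desired| <= atol + rtol * |desired|, using the default rtol=1e-07 and atol=0 shown in the error message.

```python
import numpy as np

actual = np.array([1, 2, 3], dtype=float)
desired = np.array([4, 8, 9], dtype=float)

error = np.abs(actual - desired)  # [3., 6., 6.]
max_abs = error.max()  # 6.0

# Relative difference: error divided by |desired|, skipping zeros in desired.
nonzero = desired != 0
max_rel = np.max(error[nonzero] / np.abs(desired[nonzero]))  # 0.75 (a ratio)

# Elementwise tolerance test used by assert_allclose.
rtol, atol = 1e-7, 0.0
passes = np.all(error <= atol + rtol * np.abs(desired))  # False
```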

TypeError: unhashable type: 'numpy.ndarray' when attempting to make plot using numpy

Using this piece of code, I get the temperatures and date times and then plot them with matplotlib (plt) using NumPy (np) arrays:
# get date times and the temperatures of a certain city, info pulled from request
raw_date_times = [item['dt_txt'] for item in s['list']]
temperature_kelvins = [item['main']['temp'] for item in s['list']]
# Apply calculation on each item to make celsius from kelvins
temperatures = [round(item - 273.15) for item in temperature_kelvins]
# Filter out today's date from list of dates into date_times
today = datetime.today().date()
date_times = []
for i in raw_date_times:
    date = datetime.strptime(i, '%Y-%m-%d %H:%M:%S').date()
    if date == today:
        date_times.append(i)
# Convert the array with integers of temperatures to strings to make both of same dimension
for i in range(0, len(temperatures)):
    temperatures[i] = str(temperatures[i])
# get len of date_times and convert it into an array (i.e 6 becomes [0,1,2,3,4,5])
date_times_len = len(date_times)
n = []
for i in range(0, date_times_len):
    n.append(i)
print(n)
# Plot out map using values
x = np.array(n)
y = np.array([temperatures])
my_xticks = [date_times]
plt.xticks(x, my_xticks)
plt.plot(x, y)
plt.show()
# date_times example = ['2020-03-17 12:00:00', '2020-03-17 15:00:00', '2020-03-17 18:00:00', '2020-03-17 21:00:00']
# temperatures example (before string)= [29, 31, 30, 25, 23, 22, 20, 23, 30, 33, 31, 27, 24, 23, 21, 23, 31]
However, I keep getting this error:
for val in OrderedDict.fromkeys(data):
TypeError: unhashable type: 'numpy.ndarray'
I researched a bit, and I think it means something is wrong with the shape.
Is it because they are strings? If so, could you suggest a way to convert my datetimes into integers?
Thank you!
Full traceback:
Traceback (most recent call last):
File "class-test.py", line 77, in <module>
weatherData('gurgaon')
File "class-test.py", line 55, in weatherData
plt.plot(x, y)
File "/Users/Ronnie/.local/share/virtualenvs/weather-d3bb5uZO/lib/python3.8/site-packages/matplotlib/pyplot.py", line 2761, in plot
return gca().plot(
File "/Users/Ronnie/.local/share/virtualenvs/weather-d3bb5uZO/lib/python3.8/site-packages/matplotlib/axes/_axes.py", line 1646, in plot
lines = [*self._get_lines(*args, data=data, **kwargs)]
File "/Users/Ronnie/.local/share/virtualenvs/weather-d3bb5uZO/lib/python3.8/site-packages/matplotlib/axes/_base.py", line 216, in __call__
yield from self._plot_args(this, kwargs)
File "/Users/Ronnie/.local/share/virtualenvs/weather-d3bb5uZO/lib/python3.8/site-packages/matplotlib/axes/_base.py", line 339, in _plot_args
self.axes.yaxis.update_units(y)
File "/Users/Ronnie/.local/share/virtualenvs/weather-d3bb5uZO/lib/python3.8/site-packages/matplotlib/axis.py", line 1516, in update_units
default = self.converter.default_units(data, self)
File "/Users/Ronnie/.local/share/virtualenvs/weather-d3bb5uZO/lib/python3.8/site-packages/matplotlib/category.py", line 107, in default_units
axis.set_units(UnitData(data))
File "/Users/Ronnie/.local/share/virtualenvs/weather-d3bb5uZO/lib/python3.8/site-packages/matplotlib/category.py", line 175, in __init__
self.update(data)
File "/Users/Ronnie/.local/share/virtualenvs/weather-d3bb5uZO/lib/python3.8/site-packages/matplotlib/category.py", line 210, in update
for val in OrderedDict.fromkeys(data):
TypeError: unhashable type: 'numpy.ndarray'

I think this replicates a portion of your plot:
In [347]: date_times = ['2020-03-17 12:00:00', '2020-03-17 15:00:00', '2020-03-17 18:00:00', '2020-03-17 21:00:00']
...: temperatures = [29, 31, 30, 25, 23, 22, 20, 23, 30, 33, 31, 27, 24, 23, 21, 23, 31]
In [348]: len(date_times)
Out[348]: 4
In [349]: len(temperatures)
Out[349]: 17
In [350]: x = np.arange(len(date_times))
In [351]: y = np.array(temperatures[:4])
In [359]: plt.xticks(x, date_times);
In [360]: plt.plot(x,y);
My arange is a shorter and faster way of constructing x than yours:
In [361]: n = []
...: for i in range(0,4):
...:     n.append(i)
...: np.array(n)
Note that I use date_times, not [date_times]; the latter adds an extra layer of list. I can't reproduce your error, but the unnecessary [] might be causing problems. The ticks and labels parameters to xticks should have the same length.
Going by the full traceback, the error is raised inside plt.plot, when matplotlib's category (string) converter processes y (yaxis.update_units(y)). That converter uses each element of the data as a dictionary key. Each element of a 2-D array is itself an array, and arrays are unhashable. Your y = np.array([temperatures]) is a 2-D array of strings, so the extra [] there is the likely culprit. Because the error occurs deep in the plt code, it is easier to examine the inputs (x, y, date_times) and make sure they look reasonable, with the expected data and matching lengths.
Same error here:
TypeError: unhashable type: 'numpy.ndarray' when trying to plot a DataFrame
though offhand I don't see what is similar or different.
===
This plots ok:
In [364]: plt.plot(date_times,y);
but this produces the error:
In [365]: plt.plot([date_times],y);
(as with your xticks, this has the unnecessary brackets).
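The mechanism can be reproduced without matplotlib. The sketch below copies the failing line from the traceback, `for val in OrderedDict.fromkeys(data)`, into a hypothetical helper called category_keys. Iterating a 2-D array yields 1-D arrays, which cannot be dictionary keys, while a 1-D array yields hashable scalars.

```python
from collections import OrderedDict

import numpy as np

temperatures = [29, 31, 30, 25]

# Like y = np.array([temperatures]) with string values: the extra [] gives shape (1, 4).
y_bad = np.array([[str(t) for t in temperatures]])
# Numeric, 1-D: what plt.plot(x, y) normally expects for y.
y_good = np.array(temperatures)


def category_keys(data):
    # Hypothetical helper mimicking the failing line in matplotlib's category.py.
    return list(OrderedDict.fromkeys(data))
```

category_keys(y_bad) raises TypeError: unhashable type: 'numpy.ndarray', because its single element is a whole row array. category_keys(y_bad[0]) works. In the question's code, keeping temperatures numeric and writing y = np.array(temperatures) avoids the category converter for y entirely. You would then call plt.xticks(x, date_times) without the extra brackets, with x having the same length as date_times.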

Maybe this can help
#first import
from datetime import datetime
a = datetime.now()
#converting into int
a = int(a.strftime('%Y%m%d%H%M%S')) #using strftime without separators so int() can parse the digits

Yes, you can use strings to make a graph, but your matplotlib version should be at least 2.1 (preferably 2.2 or later):
import matplotlib.pyplot as plt
x = ["ABCD", "EEEEE", "LLLL"]
y = [5,2,3]
plt.plot(x, y)
plt.show()

python-Cannot call a function in script but can in the interactive mode

It's a simple kNN task, and I'm new to Python.
# coding=utf-8
from numpy import *
import operator
def createDataSet():
    group = array([[112, 110], [128, 162], [83, 206], [142, 267], [188, 184], [218, 268], [234, 108], [256, 146],
                   [333, 177], [350, 86], [364, 237], [378, 117], [409, 147], [485, 130], [326, 344], [387, 326],
                   [378, 435], [434, 375]])
    labels = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3]
    return group, labels
def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    tempSet = array(tile(inX, (dataSetSize, 1)))
    diffMat = tempSet - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    sortedDistIndices = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistIndices[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    sortedClassCount = sorted(classCount.iteritems(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
# TRY1
# def with_intput():
#     sample = array(raw_input('Enter you data:'))
#     group, labels = createDataSet()
#     sampleClass = classify0(sample, group, labels, 3)
#     print sampleClass
# with_intput()
# TRY1
# TRY2
# sample = array(raw_input('Enter your sample data:'))
# group, labels = createDataSet()
# sampleClass = classify0(sample, group, labels, 3)
# print sampleClass
# TRY2
Something really strange is happening. I created a function named classify0(). If I call it from the script (uncommenting #TRY1), or use it in an assignment (uncommenting #TRY2), I get an error when I run the file.
The error looks like this:
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('S11') dtype('S11') dtype('S11')
Here is the traceback of TRY1:
Traceback (most recent call last):
File "C:\Users\zhongzheng\Desktop\ML_Code\temp2.py", line 39, in <module>
with_intput()
File "C:\Users\zhongzheng\Desktop\ML_Code\temp2.py", line 36, in with_intput
sampleClass = classify0(sample, group, labels, 3)
File "C:\Users\zhongzheng\Desktop\ML_Code\temp2.py", line 17, in classify0
diffMat = tempSet - dataSet
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('S11') dtype('S11') dtype('S11')
And the traceback of TRY2:
Traceback (most recent call last):
File "C:\Users\zhongzheng\Desktop\ML_Code\temp2.py", line 46, in <module>
sampleClass = classify0(sample, group, labels, 3)
File "C:\Users\zhongzheng\Desktop\ML_Code\temp2.py", line 17, in classify0
diffMat = tempSet - dataSet
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('S11') dtype('S11') dtype('S11')
But suppose I leave both TRY1 and TRY2 commented out, save and run the file with only the two functions in it, and then enter these commands line by line in interactive mode (cmd or IPython):
>>>group,labels = createDataSet()
>>>sampleClass = classify0(array([111,111]), group, labels, 3)
>>>print sampleClass
It works just fine.
I can't figure out why.
One more question: why does my Sublime Text 3 (with SublimeLinter and a pep8 linter installed) keep warning that from numpy import *, import numpy, or import numpy as np is wrong?
Thanks for your patience.

raw_input does not return what the classify0 function expects as input.
sample = array(raw_input('Enter you data:'))
This gives a 0-d string array holding '111 111', not two numbers.
sample = [int(x) for x in raw_input().split()]
This gives [111, 111].
You could also change the delimiter passed to split(), e.g. split(',') if the input is comma-separated.
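A small sketch that combines both suggestions. parse_sample is a hypothetical helper name. It splits the typed line on whitespace by default, or on a delimiter you pass in. It then converts each token to a number, so tile() and the subtraction in classify0 operate on a numeric array instead of a string array.

```python
import numpy as np


def parse_sample(text, sep=None):
    """Turn a line such as '111 111' or '111,111' into a numeric array.

    sep=None splits on any whitespace; pass sep=',' for comma-separated input.
    Empty tokens (e.g. from a trailing comma) are ignored.
    """
    return np.array([float(tok) for tok in text.split(sep) if tok.strip()])
```

In the script, this would be sample = parse_sample(raw_input('Enter your data: ')) on Python 2, or with input(...) on Python 3. It is then used as before in classify0(sample, group, labels, 3).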
