I have this dataframe of Euclidean distances:
import pandas as pd
df = pd.DataFrame({
'O1': [0.0, 1.7, 1.4, 0.4, 2.2, 3.7, 5.2, 0.2, 4.3, 6.8, 6.0],
'O2': [1.7, 0.0, 1.0, 2.0, 1.3, 2.6, 4.5, 1.8, 3.2, 5.9, 5.2],
'O3': [1.4, 1.0, 0.0, 1.7, 0.9, 2.4, 4.1, 1.5, 3.0, 5.5, 4.8],
'O4': [0.4, 2.0, 1.7, 0.0, 2.6, 4.0, 5.5, 0.3, 4.6, 7.1, 6.3],
'O5': [2.2, 1.3, 0.9, 2.6, 0.0, 1.7, 3.4, 2.4, 2.1, 4.8, 4.1],
'O6': [3.7, 2.6, 2.4, 4.0, 1.7, 0.0, 2.0, 3.8, 1.6, 3.3, 2.7],
'O7': [5.2, 4.5, 4.1, 5.5, 3.4, 2.0, 0.0, 5.4, 2.5, 1.6, 0.9],
'O8': [0.2, 1.8, 1.5, 0.3, 2.4, 3.8, 5.4, 0.0, 4.4, 6.9, 6.1],
'O9': [4.3, 3.2, 3.0, 4.6, 2.1, 1.6, 2.5, 4.4, 0.0, 3.4, 2.9],
'O10':[6.8, 5.9, 5.5, 7.1, 4.8, 3.3, 1.6, 6.9, 3.4, 0.0, 1.0],
'O11': [6.0, 5.2, 4.8, 6.3, 4.1, 2.7, 0.9, 6.1, 2.9, 1.0, 0.0]
})
where O1 through O8 are class 0 and O9, O10, and O11 are class 1.
I want to reshape the dataframe above into a dataframe with columns x, y, and class, so that I can split it into train and test sets and then fit a simple classifier.
I am confused about how to achieve the dataframe described above. How is this done in Python? Is it possible?
Steps afterwards, once the dataframe is achieved:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
import seaborn as sns
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)
sns.scatterplot(x = X_test['x'], y = X_test['y'], hue = y_pred)
You mainly need to include the point name as an additional column in the dataframe. Here I am using the numeric point indices as x and y:
import pandas as pd
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
1: [0.0, 1.7, 1.4, 0.4, 2.2, 3.7, 5.2, 0.2, 4.3, 6.8, 6.0],
2: [1.7, 0.0, 1.0, 2.0, 1.3, 2.6, 4.5, 1.8, 3.2, 5.9, 5.2],
3: [1.4, 1.0, 0.0, 1.7, 0.9, 2.4, 4.1, 1.5, 3.0, 5.5, 4.8],
4: [0.4, 2.0, 1.7, 0.0, 2.6, 4.0, 5.5, 0.3, 4.6, 7.1, 6.3],
5: [2.2, 1.3, 0.9, 2.6, 0.0, 1.7, 3.4, 2.4, 2.1, 4.8, 4.1],
6: [3.7, 2.6, 2.4, 4.0, 1.7, 0.0, 2.0, 3.8, 1.6, 3.3, 2.7],
7: [5.2, 4.5, 4.1, 5.5, 3.4, 2.0, 0.0, 5.4, 2.5, 1.6, 0.9],
8: [0.2, 1.8, 1.5, 0.3, 2.4, 3.8, 5.4, 0.0, 4.4, 6.9, 6.1],
9: [4.3, 3.2, 3.0, 4.6, 2.1, 1.6, 2.5, 4.4, 0.0, 3.4, 2.9],
10: [6.8, 5.9, 5.5, 7.1, 4.8, 3.3, 1.6, 6.9, 3.4, 0.0, 1.0],
11: [6.0, 5.2, 4.8, 6.3, 4.1, 2.7, 0.9, 6.1, 2.9, 1.0, 0.0]
})
That allows you to reshape the dataframe to your desired form:
model_df = df.melt(id_vars='x', var_name='y', value_name='distance')
Finally, derive the class labels, e.g. using:
def assign_class(x):
    return 0 if x <= 8 else 1

model_df["class_x"] = model_df["x"].apply(assign_class)
model_df["class_y"] = model_df["y"].apply(assign_class)
This will give you a dataframe that you can pass to the model. Note that the input matrix is symmetric, so you may want to keep only unique records (drop the [y, x] row if you already have [x, y]).
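As a minimal sketch of that deduplication (assuming the model_df built above, where x and y are the numeric point indices), keeping only rows with x <= y retains exactly one row per unordered pair:

# Keep the upper triangle only: one row per unordered (x, y) pair
model_df = model_df[model_df['x'] <= model_df['y']]

Dropping the diagonal (the zero self-distances) as well would use a strict < instead.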
For some reason, when I call the function novalinhasubtraida, it keeps its information and affects its next calls:
matriz = [[1.0, 7.0, 9.0, 5.0],
          [1.125, 1.0, 0.25, 0.875],
          [0.4, 0.6, 1.0, 0.2]]

result = list()

def subrairlinhas(matriz, linhasubtraida, linhasubtraiadora):
    result.clear()
    for item1, item2 in zip(matriz[linhasubtraida], matriz[linhasubtraiadora]):
        item = item1 - item2 * (matriz[linhasubtraida][linhasubtraiadora])
        # print(f'item:{item}')
        result.append(item)
    return result

# novalinhasubtraida calls subrairlinhas
def novalinhasubtraida(matriz, linhatransformada, linhado1):
    result = subrairlinhas(matriz, linhatransformada, linhado1)
    # print(result)
    matriz.remove(matriz[linhatransformada])
    matriz.insert(linhatransformada, result)
    return matriz
For example:
INPUT:
novalinhasubtraida(matriz,1,0)
print(matriz)
novalinhasubtraida(matriz,2,0)
print(matriz)
OUTPUT:
[[1.0, 7.0, 9.0, 5.0], [0.0, -6.875, -9.875, -4.75], [0.4, 0.6, 1.0, 0.2]]
[[1.0, 7.0, 9.0, 5.0], [0.0, -2.2, -2.6, -1.8], [0.0, -2.2, -2.6, -1.8]]
whereas when I instead input this:
INPUT:
novalinhasubtraida(matriz,2,0)
print(matriz)
novalinhasubtraida(matriz,1,0)
print(matriz)
OUTPUT:
[[1.0, 7.0, 9.0, 5.0], [1.125, 1.0, 0.25, 0.875], [0.0, -2.2, -2.6, -1.8]]
[[1.0, 7.0, 9.0, 5.0], [0.0, -6.875, -9.875, -4.75], [0.0, -6.875, -9.875, -4.75]]
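For readers wondering about the cause: result is a single module-level list that subrairlinhas clears and refills on every call, and novalinhasubtraida inserts that same list object into matriz. The next call then clears and rewrites the very row that was inserted before, which is why earlier results appear to change. (matriz.remove(...) can also delete the wrong row once two rows compare equal.) A minimal sketch of a fix, building a fresh list on each call:

def subrairlinhas(matriz, linhasubtraida, linhasubtraiadora):
    fator = matriz[linhasubtraida][linhasubtraiadora]
    # Build and return a brand-new list instead of reusing a shared global
    return [item1 - item2 * fator
            for item1, item2 in zip(matriz[linhasubtraida], matriz[linhasubtraiadora])]

def novalinhasubtraida(matriz, linhatransformada, linhado1):
    # Replace the row in place; no remove/insert needed
    matriz[linhatransformada] = subrairlinhas(matriz, linhatransformada, linhado1)
    return matriz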
So, I have the data below, and I want to loop through the dataframe, perform some functions, and at the end save the results in a list. I am having trouble creating the list: I only get a single value instead of the two means I intend to get. If anybody has a more effective way to solve this problem, please share.
import pandas as pd
import numpy as np

dict = {'PassengerId': [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
        'Survived': [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
        'Pclass': [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.5],
        'Age': [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
        'SibSp': [0.125, 0.125, 0.0, 0.125, 0.0, 0.0, 0.0, 0.375, 0.0, 0.125],
        'Parch': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.167, 0.333, 0.0],
        'Fare': [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059]}

dicts = pd.DataFrame(dict, columns=dict.keys())

def Mean():
    list_mean = []
    list_all = []
    for i, row in dicts.iterrows():
        if (row['Age'] > 0.2) & (row['Fare'] < 0.1):
            list_all.append(row['PassengerId'])
        elif (row['Age'] > 0.2) & (row['Fare'] > 0.1):
            list_all.clear()
            list_all.append(row['PassengerId'])
    return list_mean.append(np.mean(list_all))

Mean()
Help Please!!
You have to make some changes in your solution to resolve this issue. For a vectorized answer, check out the Code section below.
1.
The return statement return list_mean should be placed in the function block, not in the if-block.
Change:
. . .
if (row['Age'] > self.age) & (row['Fare'] < self.fare):
    list_mean.append(row['PassengerId'])
    return list_mean
. . .
To:
. . .
list_mean = []
for i, row in dicts.iterrows():
    if (row['Age'] > self.age) & (row['Fare'] < self.fare):
        list_mean.append(row['PassengerId'])
return list_mean
. . .
CODE (Vectorized-Version-Solution): no need to define an explicit class to perform this action.
import pandas as pd
import numpy as np

dict_ = {'PassengerId': [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
         'Survived': [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
         'Pclass': [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.5],
         'Age': [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
         'SibSp': [0.125, 0.125, 0.0, 0.125, 0.0, 0.0, 0.0, 0.375, 0.0, 0.125],
         'Parch': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.167, 0.333, 0.0],
         'Fare': [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059]}

dicts = pd.DataFrame(dict_, columns=dict_.keys())

# Boolean masks select the PassengerIds for each condition in one shot
l1 = dicts['PassengerId'][np.logical_and(dicts['Age'] > 0.2, dicts['Fare'] < 0.1)]
l2 = dicts['PassengerId'][np.logical_and(dicts['Age'] > 0.2, dicts['Fare'] > 0.1)]
print((sum(list(l1)) / len(l1), sum(list(l2)) / len(l2)))
OUTPUT:
(0.00375, 0.0036666666666666666)
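Since l1 and l2 are pandas Series, an equivalent and more direct way to get the two means (reusing the variables above) is:

print((l1.mean(), l2.mean()))  # (0.00375, 0.0036666666666666666)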
import pandas as pd
import numpy as np

dict = {'PassengerId': [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
        'Survived': [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
        'Pclass': [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.5],
        'Age': [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
        'SibSp': [0.125, 0.125, 0.0, 0.125, 0.0, 0.0, 0.0, 0.375, 0.0, 0.125],
        'Parch': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.167, 0.333, 0.0],
        'Fare': [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059]}

df = pd.DataFrame(dict, columns=dict.keys())

def calculate_mean():
    l1, l2 = [], []
    for i, row in df.iterrows():
        if row['Age'] > 0.2 and row['Fare'] < 0.1:
            l1.append(row['PassengerId'])
        elif row['Age'] > 0.2 and row['Fare'] > 0.1:
            l2.append(row['PassengerId'])
    return np.mean(l1), np.mean(l2)

print(calculate_mean())  # (0.00375, 0.0036666666666666666)
Suppose that I'm plotting the following numpy array as a simple matplotlib heatmap using imshow; in some cases the value will be 0.0. Is there any way to set a specific color for the cells where that value appears? For example, when the value is 0, the color for that cell must be black.
import numpy as np
import matplotlib.pyplot as plt

a = np.array([[0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
              [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
              [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
              [0.6, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0],
              [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
              [0.0, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
              [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]])

fig, ax = plt.subplots()
Map = ax.imshow(a, interpolation='none', cmap='coolwarm')
Maybe not the perfect solution, but definitely a simple one: if it is only for the purpose of creating an image, you can modify the original data (or a copy of it) and replace 0.0 with NaN. Then you can use set_bad to get the desired output.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

a = np.array([[0.8, 2.4, -2.5, 3.9, 0.0, 4.0, 0.0],
              [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
              [-1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
              [0.6, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0],
              [0.7, 1.7, -0.6, 2.6, 2.2, 6.2, 0.0],
              [0.0, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
              [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]])

# Copy the colormap before modifying it: registered colormaps are read-only
# in newer Matplotlib (which also prefers matplotlib.colormaps['rainbow'])
c_map = cm.get_cmap('rainbow').copy()
c_map.set_bad('k')  # draw NaN ("bad") cells in black

b = a.copy()
b[b == 0] = np.nan  # zeros become NaN, which imshow treats as "bad"
im = plt.imshow(b, interpolation='none', cmap=c_map)
plt.colorbar(im)
plt.show()
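An alternative sketch that leaves the data untouched: mask the zeros with a NumPy masked array, since imshow also renders masked cells with the colormap's "bad" color (reusing a and c_map from above):

masked = np.ma.masked_equal(a, 0.0)  # zero-valued cells become masked ("bad")
im = plt.imshow(masked, interpolation='none', cmap=c_map)
plt.colorbar(im)
plt.show()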
I have a list of 14 numbers:
list = [0.0, 2.0, 2.0, 2.0, 2.0, 1.5, 1.0, 1.0, 1.0, 1.0, 0.5, 1.5, 1.0, 2.0]
I have to sum the best 12 among the list. I'm new to coding and have no idea how to do this.
l = [0.0, 2.0, 2.0, 2.0, 2.0, 1.5, 1.0, 1.0, 1.0, 1.0, 0.5, 1.5, 1.0, 2.0]
l.sort(reverse=True)  # descending; a plain sort would keep the 12 smallest instead
l = l[0:12]           # keep the best (largest) 12
print(l)
total = 0
for element in range(0, len(l)):
    total = total + l[element]
print(total)
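A more compact equivalent, starting from the original list:

nums = [0.0, 2.0, 2.0, 2.0, 2.0, 1.5, 1.0, 1.0, 1.0, 1.0, 0.5, 1.5, 1.0, 2.0]
print(sum(sorted(nums, reverse=True)[:12]))  # 18.0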
Hope this helps! But do try to learn this language by yourself; you will surely enjoy Python :)