Handling table with two classes to fit a simple classifier - python

I have this dataframe with euclidean distances:
import pandas as pd
df = pd.DataFrame({
'O1': [0.0, 1.7, 1.4, 0.4, 2.2, 3.7, 5.2, 0.2, 4.3, 6.8, 6.0],
'O2': [1.7, 0.0, 1.0, 2.0, 1.3, 2.6, 4.5, 1.8, 3.2, 5.9, 5.2],
'O3': [1.4, 1.0, 0.0, 1.7, 0.9, 2.4, 4.1, 1.5, 3.0, 5.5, 4.8],
'O4': [0.4, 2.0, 1.7, 0.0, 2.6, 4.0, 5.5, 0.3, 4.6, 7.1, 6.3],
'O5': [2.2, 1.3, 0.9, 2.6, 0.0, 1.7, 3.4, 2.4, 2.1, 4.8, 4.1],
'O6': [3.7, 2.6, 2.4, 4.0, 1.7, 0.0, 2.0, 3.8, 1.6, 3.3, 2.7],
'O7': [5.2, 4.5, 4.1, 5.5, 3.4, 2.0, 0.0, 5.4, 2.5, 1.6, 0.9],
'O8': [0.2, 1.8, 1.5, 0.3, 2.4, 3.8, 5.4, 0.0, 4.4, 6.9, 6.1],
'O9': [4.3, 3.2, 3.0, 4.6, 2.1, 1.6, 2.5, 4.4, 0.0, 3.4, 2.9],
'O10':[6.8, 5.9, 5.5, 7.1, 4.8, 3.3, 1.6, 6.9, 3.4, 0.0, 1.0],
'O11': [6.0, 5.2, 4.8, 6.3, 4.1, 2.7, 0.9, 6.1, 2.9, 1.0, 0.0]
})
Here O1 through O8 belong to class 0, and O9, O10 and O11 belong to class 1.
I want to convert the dataframe above into one with the columns x, y and class, so that I can split it into train and test sets and then fit a simple classifier.
I am confused about how to build the dataframe described above. How is this done in Python? Is it possible?
The steps I plan to run once I have that dataframe:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
import seaborn as sns
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)
sns.scatterplot(x = X_test['x'], y = X_test['y'], hue = y_pred)

You mainly want to include the point name as an additional column in the dataframe. Here I am using point indices as x and y:
import pandas as pd
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
1: [0.0, 1.7, 1.4, 0.4, 2.2, 3.7, 5.2, 0.2, 4.3, 6.8, 6.0],
2: [1.7, 0.0, 1.0, 2.0, 1.3, 2.6, 4.5, 1.8, 3.2, 5.9, 5.2],
3: [1.4, 1.0, 0.0, 1.7, 0.9, 2.4, 4.1, 1.5, 3.0, 5.5, 4.8],
4: [0.4, 2.0, 1.7, 0.0, 2.6, 4.0, 5.5, 0.3, 4.6, 7.1, 6.3],
5: [2.2, 1.3, 0.9, 2.6, 0.0, 1.7, 3.4, 2.4, 2.1, 4.8, 4.1],
6: [3.7, 2.6, 2.4, 4.0, 1.7, 0.0, 2.0, 3.8, 1.6, 3.3, 2.7],
7: [5.2, 4.5, 4.1, 5.5, 3.4, 2.0, 0.0, 5.4, 2.5, 1.6, 0.9],
8: [0.2, 1.8, 1.5, 0.3, 2.4, 3.8, 5.4, 0.0, 4.4, 6.9, 6.1],
9: [4.3, 3.2, 3.0, 4.6, 2.1, 1.6, 2.5, 4.4, 0.0, 3.4, 2.9],
10: [6.8, 5.9, 5.5, 7.1, 4.8, 3.3, 1.6, 6.9, 3.4, 0.0, 1.0],
11: [6.0, 5.2, 4.8, 6.3, 4.1, 2.7, 0.9, 6.1, 2.9, 1.0, 0.0]
})
That allows you to reshape the dataframe to your desired form:
model_df = df.melt(id_vars='x', var_name='y', value_name='distance')
Finally, define a class, e.g. using:
def assign_class(x):
    # points 1-8 are class 0, points 9-11 are class 1
    return 0 if x <= 8 else 1
model_df["class_x"] = model_df["x"].apply(assign_class)
model_df["class_y"] = model_df["y"].apply(assign_class)
This will give you a dataframe that you can pass to the model. Note that the input matrix is symmetric, so you may want to only keep unique records (drop [y, x] if you already have [x, y]).
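As a rough sketch of the de-duplication step mentioned above, the following reshapes a small distance matrix and keeps each unordered pair once. The 4-point matrix, the `unique_pairs` name, and the rule that point 4 is class 1 are assumptions made only for this illustration.

```python
import pandas as pd

# Hypothetical 4-point symmetric distance matrix (values are illustrative only).
points = [1, 2, 3, 4]
dist = pd.DataFrame(
    [
        [0.0, 1.7, 1.4, 4.3],
        [1.7, 0.0, 1.0, 3.2],
        [1.4, 1.0, 0.0, 3.0],
        [4.3, 3.2, 3.0, 0.0],
    ],
    columns=points,
)
dist.insert(0, "x", points)  # point index as an explicit column

# Wide -> long: one row per ordered (x, y) pair.
long_df = dist.melt(id_vars="x", var_name="y", value_name="distance")
long_df["y"] = long_df["y"].astype(int)

# Keep each unordered pair once; this also drops the zero self-distances.
unique_pairs = long_df[long_df["x"] < long_df["y"]].reset_index(drop=True)

# Assumed labelling rule for this toy example: point 4 is class 1, the rest class 0.
def assign_class(point, first_class_1=4):
    return 0 if point < first_class_1 else 1

unique_pairs["class_x"] = unique_pairs["x"].apply(assign_class)
unique_pairs["class_y"] = unique_pairs["y"].apply(assign_class)

print(unique_pairs)
```

Filtering on `x < y` keeps the upper triangle only; use `x <= y` instead if the self-distances should stay in the table.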

Related

How can I add a variable number of custom hover fields, with respect to node, on a HoloViews plot? (python and bokeh backend)

I'm working on getting interactive networks so I can send datasets around to collaborators. I've found that HoloViews is the most intuitive option for interactive networks. I'm using Bokeh for the backend not for any reason other than that's what the tutorial above used and I'm pretty familiar with it.
I've gotten the hover tool to work for my network and it looks great. Below is an adaptation of the methodology using the iris dataset for the sake of this post.
What I'm having trouble with is getting custom hover fields in addition to the ones already shown. For example, I want all the nodes to have the [Node, Species] fields from the df_nodes DataFrame. However, in the second part of the code underneath the figure I generate custom fields per node that range from 0-5 categories. I would like to append this onto the existing Hover options.
For example, iris_1 would have the following where * indicates what is already there and # indicates what needs to be added:
* Node iris_1
* Species Setosa
# Category_2 0.734694
# Category_9 0.489796
# Category_8 0.469388
# Category_4 0.122449
iris_2 would only have [Node, Species] since it has 0 categories (if you index the node_to_custom dictionary you will see that). iris_3 will have the [Node, Species, Category_4, Category_5] fields.
How can I add a variable number of custom hover fields, with respect to node, on a HoloViews plot? Preferably with Bokeh but if Plot.ly is the better option for this, then let's do it.
I tried doing line breaks but they didn't render. Though, that was supposed to be a hack and not what I actually wanted.
# Iris
import pandas as pd
import networkx as nx
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
defaults = dict(width=500, height=500)
hv.opts.defaults(
    opts.EdgePaths(**defaults),
    opts.Graph(**defaults),
    opts.Nodes(**defaults),
)
X_iris = pd.DataFrame({'sepal_length': {'iris_0': 5.1, 'iris_1': 4.9, 'iris_2': 4.7, 'iris_3': 4.6, 'iris_4': 5.0, 'iris_5': 5.4, 'iris_6': 4.6, 'iris_7': 5.0, 'iris_8': 4.4, 'iris_9': 4.9, 'iris_10': 5.4, 'iris_11': 4.8, 'iris_12': 4.8, 'iris_13': 4.3, 'iris_14': 5.8, 'iris_15': 5.7, 'iris_16': 5.4, 'iris_17': 5.1, 'iris_18': 5.7, 'iris_19': 5.1, 'iris_20': 5.4, 'iris_21': 5.1, 'iris_22': 4.6, 'iris_23': 5.1, 'iris_24': 4.8, 'iris_25': 5.0, 'iris_26': 5.0, 'iris_27': 5.2, 'iris_28': 5.2, 'iris_29': 4.7, 'iris_30': 4.8, 'iris_31': 5.4, 'iris_32': 5.2, 'iris_33': 5.5, 'iris_34': 4.9, 'iris_35': 5.0, 'iris_36': 5.5, 'iris_37': 4.9, 'iris_38': 4.4, 'iris_39': 5.1, 'iris_40': 5.0, 'iris_41': 4.5, 'iris_42': 4.4, 'iris_43': 5.0, 'iris_44': 5.1, 'iris_45': 4.8, 'iris_46': 5.1, 'iris_47': 4.6, 'iris_48': 5.3, 'iris_49': 5.0, 'iris_50': 7.0, 'iris_51': 6.4, 'iris_52': 6.9, 'iris_53': 5.5, 'iris_54': 6.5, 'iris_55': 5.7, 'iris_56': 6.3, 'iris_57': 4.9, 'iris_58': 6.6, 'iris_59': 5.2, 'iris_60': 5.0, 'iris_61': 5.9, 'iris_62': 6.0, 'iris_63': 6.1, 'iris_64': 5.6, 'iris_65': 6.7, 'iris_66': 5.6, 'iris_67': 5.8, 'iris_68': 6.2, 'iris_69': 5.6, 'iris_70': 5.9, 'iris_71': 6.1, 'iris_72': 6.3, 'iris_73': 6.1, 'iris_74': 6.4, 'iris_75': 6.6, 'iris_76': 6.8, 'iris_77': 6.7, 'iris_78': 6.0, 'iris_79': 5.7, 'iris_80': 5.5, 'iris_81': 5.5, 'iris_82': 5.8, 'iris_83': 6.0, 'iris_84': 5.4, 'iris_85': 6.0, 'iris_86': 6.7, 'iris_87': 6.3, 'iris_88': 5.6, 'iris_89': 5.5, 'iris_90': 5.5, 'iris_91': 6.1, 'iris_92': 5.8, 'iris_93': 5.0, 'iris_94': 5.6, 'iris_95': 5.7, 'iris_96': 5.7, 'iris_97': 6.2, 'iris_98': 5.1, 'iris_99': 5.7, 'iris_100': 6.3, 'iris_101': 5.8, 'iris_102': 7.1, 'iris_103': 6.3, 'iris_104': 6.5, 'iris_105': 7.6, 'iris_106': 4.9, 'iris_107': 7.3, 'iris_108': 6.7, 'iris_109': 7.2, 'iris_110': 6.5, 'iris_111': 6.4, 'iris_112': 6.8, 'iris_113': 5.7, 'iris_114': 5.8, 'iris_115': 6.4, 'iris_116': 6.5, 'iris_117': 7.7, 'iris_118': 7.7, 'iris_119': 6.0, 'iris_120': 6.9, 'iris_121': 
5.6, 'iris_122': 7.7, 'iris_123': 6.3, 'iris_124': 6.7, 'iris_125': 7.2, 'iris_126': 6.2, 'iris_127': 6.1, 'iris_128': 6.4, 'iris_129': 7.2, 'iris_130': 7.4, 'iris_131': 7.9, 'iris_132': 6.4, 'iris_133': 6.3, 'iris_134': 6.1, 'iris_135': 7.7, 'iris_136': 6.3, 'iris_137': 6.4, 'iris_138': 6.0, 'iris_139': 6.9, 'iris_140': 6.7, 'iris_141': 6.9, 'iris_142': 5.8, 'iris_143': 6.8, 'iris_144': 6.7, 'iris_145': 6.7, 'iris_146': 6.3, 'iris_147': 6.5, 'iris_148': 6.2, 'iris_149': 5.9}, 'sepal_width': {'iris_0': 3.5, 'iris_1': 3.0, 'iris_2': 3.2, 'iris_3': 3.1, 'iris_4': 3.6, 'iris_5': 3.9, 'iris_6': 3.4, 'iris_7': 3.4, 'iris_8': 2.9, 'iris_9': 3.1, 'iris_10': 3.7, 'iris_11': 3.4, 'iris_12': 3.0, 'iris_13': 3.0, 'iris_14': 4.0, 'iris_15': 4.4, 'iris_16': 3.9, 'iris_17': 3.5, 'iris_18': 3.8, 'iris_19': 3.8, 'iris_20': 3.4, 'iris_21': 3.7, 'iris_22': 3.6, 'iris_23': 3.3, 'iris_24': 3.4, 'iris_25': 3.0, 'iris_26': 3.4, 'iris_27': 3.5, 'iris_28': 3.4, 'iris_29': 3.2, 'iris_30': 3.1, 'iris_31': 3.4, 'iris_32': 4.1, 'iris_33': 4.2, 'iris_34': 3.1, 'iris_35': 3.2, 'iris_36': 3.5, 'iris_37': 3.6, 'iris_38': 3.0, 'iris_39': 3.4, 'iris_40': 3.5, 'iris_41': 2.3, 'iris_42': 3.2, 'iris_43': 3.5, 'iris_44': 3.8, 'iris_45': 3.0, 'iris_46': 3.8, 'iris_47': 3.2, 'iris_48': 3.7, 'iris_49': 3.3, 'iris_50': 3.2, 'iris_51': 3.2, 'iris_52': 3.1, 'iris_53': 2.3, 'iris_54': 2.8, 'iris_55': 2.8, 'iris_56': 3.3, 'iris_57': 2.4, 'iris_58': 2.9, 'iris_59': 2.7, 'iris_60': 2.0, 'iris_61': 3.0, 'iris_62': 2.2, 'iris_63': 2.9, 'iris_64': 2.9, 'iris_65': 3.1, 'iris_66': 3.0, 'iris_67': 2.7, 'iris_68': 2.2, 'iris_69': 2.5, 'iris_70': 3.2, 'iris_71': 2.8, 'iris_72': 2.5, 'iris_73': 2.8, 'iris_74': 2.9, 'iris_75': 3.0, 'iris_76': 2.8, 'iris_77': 3.0, 'iris_78': 2.9, 'iris_79': 2.6, 'iris_80': 2.4, 'iris_81': 2.4, 'iris_82': 2.7, 'iris_83': 2.7, 'iris_84': 3.0, 'iris_85': 3.4, 'iris_86': 3.1, 'iris_87': 2.3, 'iris_88': 3.0, 'iris_89': 2.5, 'iris_90': 2.6, 'iris_91': 3.0, 'iris_92': 2.6, 'iris_93': 2.3, 
'iris_94': 2.7, 'iris_95': 3.0, 'iris_96': 2.9, 'iris_97': 2.9, 'iris_98': 2.5, 'iris_99': 2.8, 'iris_100': 3.3, 'iris_101': 2.7, 'iris_102': 3.0, 'iris_103': 2.9, 'iris_104': 3.0, 'iris_105': 3.0, 'iris_106': 2.5, 'iris_107': 2.9, 'iris_108': 2.5, 'iris_109': 3.6, 'iris_110': 3.2, 'iris_111': 2.7, 'iris_112': 3.0, 'iris_113': 2.5, 'iris_114': 2.8, 'iris_115': 3.2, 'iris_116': 3.0, 'iris_117': 3.8, 'iris_118': 2.6, 'iris_119': 2.2, 'iris_120': 3.2, 'iris_121': 2.8, 'iris_122': 2.8, 'iris_123': 2.7, 'iris_124': 3.3, 'iris_125': 3.2, 'iris_126': 2.8, 'iris_127': 3.0, 'iris_128': 2.8, 'iris_129': 3.0, 'iris_130': 2.8, 'iris_131': 3.8, 'iris_132': 2.8, 'iris_133': 2.8, 'iris_134': 2.6, 'iris_135': 3.0, 'iris_136': 3.4, 'iris_137': 3.1, 'iris_138': 3.0, 'iris_139': 3.1, 'iris_140': 3.1, 'iris_141': 3.1, 'iris_142': 2.7, 'iris_143': 3.2, 'iris_144': 3.3, 'iris_145': 3.0, 'iris_146': 2.5, 'iris_147': 3.0, 'iris_148': 3.4, 'iris_149': 3.0}, 'petal_length': {'iris_0': 1.4, 'iris_1': 1.4, 'iris_2': 1.3, 'iris_3': 1.5, 'iris_4': 1.4, 'iris_5': 1.7, 'iris_6': 1.4, 'iris_7': 1.5, 'iris_8': 1.4, 'iris_9': 1.5, 'iris_10': 1.5, 'iris_11': 1.6, 'iris_12': 1.4, 'iris_13': 1.1, 'iris_14': 1.2, 'iris_15': 1.5, 'iris_16': 1.3, 'iris_17': 1.4, 'iris_18': 1.7, 'iris_19': 1.5, 'iris_20': 1.7, 'iris_21': 1.5, 'iris_22': 1.0, 'iris_23': 1.7, 'iris_24': 1.9, 'iris_25': 1.6, 'iris_26': 1.6, 'iris_27': 1.5, 'iris_28': 1.4, 'iris_29': 1.6, 'iris_30': 1.6, 'iris_31': 1.5, 'iris_32': 1.5, 'iris_33': 1.4, 'iris_34': 1.5, 'iris_35': 1.2, 'iris_36': 1.3, 'iris_37': 1.4, 'iris_38': 1.3, 'iris_39': 1.5, 'iris_40': 1.3, 'iris_41': 1.3, 'iris_42': 1.3, 'iris_43': 1.6, 'iris_44': 1.9, 'iris_45': 1.4, 'iris_46': 1.6, 'iris_47': 1.4, 'iris_48': 1.5, 'iris_49': 1.4, 'iris_50': 4.7, 'iris_51': 4.5, 'iris_52': 4.9, 'iris_53': 4.0, 'iris_54': 4.6, 'iris_55': 4.5, 'iris_56': 4.7, 'iris_57': 3.3, 'iris_58': 4.6, 'iris_59': 3.9, 'iris_60': 3.5, 'iris_61': 4.2, 'iris_62': 4.0, 'iris_63': 4.7, 'iris_64': 3.6, 
'iris_65': 4.4, 'iris_66': 4.5, 'iris_67': 4.1, 'iris_68': 4.5, 'iris_69': 3.9, 'iris_70': 4.8, 'iris_71': 4.0, 'iris_72': 4.9, 'iris_73': 4.7, 'iris_74': 4.3, 'iris_75': 4.4, 'iris_76': 4.8, 'iris_77': 5.0, 'iris_78': 4.5, 'iris_79': 3.5, 'iris_80': 3.8, 'iris_81': 3.7, 'iris_82': 3.9, 'iris_83': 5.1, 'iris_84': 4.5, 'iris_85': 4.5, 'iris_86': 4.7, 'iris_87': 4.4, 'iris_88': 4.1, 'iris_89': 4.0, 'iris_90': 4.4, 'iris_91': 4.6, 'iris_92': 4.0, 'iris_93': 3.3, 'iris_94': 4.2, 'iris_95': 4.2, 'iris_96': 4.2, 'iris_97': 4.3, 'iris_98': 3.0, 'iris_99': 4.1, 'iris_100': 6.0, 'iris_101': 5.1, 'iris_102': 5.9, 'iris_103': 5.6, 'iris_104': 5.8, 'iris_105': 6.6, 'iris_106': 4.5, 'iris_107': 6.3, 'iris_108': 5.8, 'iris_109': 6.1, 'iris_110': 5.1, 'iris_111': 5.3, 'iris_112': 5.5, 'iris_113': 5.0, 'iris_114': 5.1, 'iris_115': 5.3, 'iris_116': 5.5, 'iris_117': 6.7, 'iris_118': 6.9, 'iris_119': 5.0, 'iris_120': 5.7, 'iris_121': 4.9, 'iris_122': 6.7, 'iris_123': 4.9, 'iris_124': 5.7, 'iris_125': 6.0, 'iris_126': 4.8, 'iris_127': 4.9, 'iris_128': 5.6, 'iris_129': 5.8, 'iris_130': 6.1, 'iris_131': 6.4, 'iris_132': 5.6, 'iris_133': 5.1, 'iris_134': 5.6, 'iris_135': 6.1, 'iris_136': 5.6, 'iris_137': 5.5, 'iris_138': 4.8, 'iris_139': 5.4, 'iris_140': 5.6, 'iris_141': 5.1, 'iris_142': 5.1, 'iris_143': 5.9, 'iris_144': 5.7, 'iris_145': 5.2, 'iris_146': 5.0, 'iris_147': 5.2, 'iris_148': 5.4, 'iris_149': 5.1}, 'petal_width': {'iris_0': 0.2, 'iris_1': 0.2, 'iris_2': 0.2, 'iris_3': 0.2, 'iris_4': 0.2, 'iris_5': 0.4, 'iris_6': 0.3, 'iris_7': 0.2, 'iris_8': 0.2, 'iris_9': 0.1, 'iris_10': 0.2, 'iris_11': 0.2, 'iris_12': 0.1, 'iris_13': 0.1, 'iris_14': 0.2, 'iris_15': 0.4, 'iris_16': 0.4, 'iris_17': 0.3, 'iris_18': 0.3, 'iris_19': 0.3, 'iris_20': 0.2, 'iris_21': 0.4, 'iris_22': 0.2, 'iris_23': 0.5, 'iris_24': 0.2, 'iris_25': 0.2, 'iris_26': 0.4, 'iris_27': 0.2, 'iris_28': 0.2, 'iris_29': 0.2, 'iris_30': 0.2, 'iris_31': 0.4, 'iris_32': 0.1, 'iris_33': 0.2, 'iris_34': 0.2, 'iris_35': 0.2, 
'iris_36': 0.2, 'iris_37': 0.1, 'iris_38': 0.2, 'iris_39': 0.2, 'iris_40': 0.3, 'iris_41': 0.3, 'iris_42': 0.2, 'iris_43': 0.6, 'iris_44': 0.4, 'iris_45': 0.3, 'iris_46': 0.2, 'iris_47': 0.2, 'iris_48': 0.2, 'iris_49': 0.2, 'iris_50': 1.4, 'iris_51': 1.5, 'iris_52': 1.5, 'iris_53': 1.3, 'iris_54': 1.5, 'iris_55': 1.3, 'iris_56': 1.6, 'iris_57': 1.0, 'iris_58': 1.3, 'iris_59': 1.4, 'iris_60': 1.0, 'iris_61': 1.5, 'iris_62': 1.0, 'iris_63': 1.4, 'iris_64': 1.3, 'iris_65': 1.4, 'iris_66': 1.5, 'iris_67': 1.0, 'iris_68': 1.5, 'iris_69': 1.1, 'iris_70': 1.8, 'iris_71': 1.3, 'iris_72': 1.5, 'iris_73': 1.2, 'iris_74': 1.3, 'iris_75': 1.4, 'iris_76': 1.4, 'iris_77': 1.7, 'iris_78': 1.5, 'iris_79': 1.0, 'iris_80': 1.1, 'iris_81': 1.0, 'iris_82': 1.2, 'iris_83': 1.6, 'iris_84': 1.5, 'iris_85': 1.6, 'iris_86': 1.5, 'iris_87': 1.3, 'iris_88': 1.3, 'iris_89': 1.3, 'iris_90': 1.2, 'iris_91': 1.4, 'iris_92': 1.2, 'iris_93': 1.0, 'iris_94': 1.3, 'iris_95': 1.2, 'iris_96': 1.3, 'iris_97': 1.3, 'iris_98': 1.1, 'iris_99': 1.3, 'iris_100': 2.5, 'iris_101': 1.9, 'iris_102': 2.1, 'iris_103': 1.8, 'iris_104': 2.2, 'iris_105': 2.1, 'iris_106': 1.7, 'iris_107': 1.8, 'iris_108': 1.8, 'iris_109': 2.5, 'iris_110': 2.0, 'iris_111': 1.9, 'iris_112': 2.1, 'iris_113': 2.0, 'iris_114': 2.4, 'iris_115': 2.3, 'iris_116': 1.8, 'iris_117': 2.2, 'iris_118': 2.3, 'iris_119': 1.5, 'iris_120': 2.3, 'iris_121': 2.0, 'iris_122': 2.0, 'iris_123': 1.8, 'iris_124': 2.1, 'iris_125': 1.8, 'iris_126': 1.8, 'iris_127': 1.8, 'iris_128': 2.1, 'iris_129': 1.6, 'iris_130': 1.9, 'iris_131': 2.0, 'iris_132': 2.2, 'iris_133': 1.5, 'iris_134': 1.4, 'iris_135': 2.3, 'iris_136': 2.4, 'iris_137': 1.8, 'iris_138': 1.8, 'iris_139': 2.1, 'iris_140': 2.4, 'iris_141': 2.3, 'iris_142': 1.9, 'iris_143': 2.3, 'iris_144': 2.5, 'iris_145': 2.3, 'iris_146': 1.9, 'iris_147': 2.0, 'iris_148': 2.3, 'iris_149': 1.8}})
y_iris = pd.Series({'iris_0': 'setosa', 'iris_1': 'setosa', 'iris_2': 'setosa', 'iris_3': 'setosa', 'iris_4': 'setosa', 'iris_5': 'setosa', 'iris_6': 'setosa', 'iris_7': 'setosa', 'iris_8': 'setosa', 'iris_9': 'setosa', 'iris_10': 'setosa', 'iris_11': 'setosa', 'iris_12': 'setosa', 'iris_13': 'setosa', 'iris_14': 'setosa', 'iris_15': 'setosa', 'iris_16': 'setosa', 'iris_17': 'setosa', 'iris_18': 'setosa', 'iris_19': 'setosa', 'iris_20': 'setosa', 'iris_21': 'setosa', 'iris_22': 'setosa', 'iris_23': 'setosa', 'iris_24': 'setosa', 'iris_25': 'setosa', 'iris_26': 'setosa', 'iris_27': 'setosa', 'iris_28': 'setosa', 'iris_29': 'setosa', 'iris_30': 'setosa', 'iris_31': 'setosa', 'iris_32': 'setosa', 'iris_33': 'setosa', 'iris_34': 'setosa', 'iris_35': 'setosa', 'iris_36': 'setosa', 'iris_37': 'setosa', 'iris_38': 'setosa', 'iris_39': 'setosa', 'iris_40': 'setosa', 'iris_41': 'setosa', 'iris_42': 'setosa', 'iris_43': 'setosa', 'iris_44': 'setosa', 'iris_45': 'setosa', 'iris_46': 'setosa', 'iris_47': 'setosa', 'iris_48': 'setosa', 'iris_49': 'setosa', 'iris_50': 'versicolor', 'iris_51': 'versicolor', 'iris_52': 'versicolor', 'iris_53': 'versicolor', 'iris_54': 'versicolor', 'iris_55': 'versicolor', 'iris_56': 'versicolor', 'iris_57': 'versicolor', 'iris_58': 'versicolor', 'iris_59': 'versicolor', 'iris_60': 'versicolor', 'iris_61': 'versicolor', 'iris_62': 'versicolor', 'iris_63': 'versicolor', 'iris_64': 'versicolor', 'iris_65': 'versicolor', 'iris_66': 'versicolor', 'iris_67': 'versicolor', 'iris_68': 'versicolor', 'iris_69': 'versicolor', 'iris_70': 'versicolor', 'iris_71': 'versicolor', 'iris_72': 'versicolor', 'iris_73': 'versicolor', 'iris_74': 'versicolor', 'iris_75': 'versicolor', 'iris_76': 'versicolor', 'iris_77': 'versicolor', 'iris_78': 'versicolor', 'iris_79': 'versicolor', 'iris_80': 'versicolor', 'iris_81': 'versicolor', 'iris_82': 'versicolor', 'iris_83': 'versicolor', 'iris_84': 'versicolor', 'iris_85': 'versicolor', 'iris_86': 'versicolor', 'iris_87': 
'versicolor', 'iris_88': 'versicolor', 'iris_89': 'versicolor', 'iris_90': 'versicolor', 'iris_91': 'versicolor', 'iris_92': 'versicolor', 'iris_93': 'versicolor', 'iris_94': 'versicolor', 'iris_95': 'versicolor', 'iris_96': 'versicolor', 'iris_97': 'versicolor', 'iris_98': 'versicolor', 'iris_99': 'versicolor', 'iris_100': 'virginica', 'iris_101': 'virginica', 'iris_102': 'virginica', 'iris_103': 'virginica', 'iris_104': 'virginica', 'iris_105': 'virginica', 'iris_106': 'virginica', 'iris_107': 'virginica', 'iris_108': 'virginica', 'iris_109': 'virginica', 'iris_110': 'virginica', 'iris_111': 'virginica', 'iris_112': 'virginica', 'iris_113': 'virginica', 'iris_114': 'virginica', 'iris_115': 'virginica', 'iris_116': 'virginica', 'iris_117': 'virginica', 'iris_118': 'virginica', 'iris_119': 'virginica', 'iris_120': 'virginica', 'iris_121': 'virginica', 'iris_122': 'virginica', 'iris_123': 'virginica', 'iris_124': 'virginica', 'iris_125': 'virginica', 'iris_126': 'virginica', 'iris_127': 'virginica', 'iris_128': 'virginica', 'iris_129': 'virginica', 'iris_130': 'virginica', 'iris_131': 'virginica', 'iris_132': 'virginica', 'iris_133': 'virginica', 'iris_134': 'virginica', 'iris_135': 'virginica', 'iris_136': 'virginica', 'iris_137': 'virginica', 'iris_138': 'virginica', 'iris_139': 'virginica', 'iris_140': 'virginica', 'iris_141': 'virginica', 'iris_142': 'virginica', 'iris_143': 'virginica', 'iris_144': 'virginica', 'iris_145': 'virginica', 'iris_146': 'virginica', 'iris_147': 'virginica', 'iris_148': 'virginica', 'iris_149': 'virginica'})
c_iris = pd.Series({'setosa': '#66c2a5', 'versicolor': '#fc8d62', 'virginica': '#8da0cb'})
# Get edge to weight mapping
weights = X_iris.T.corr().stack()
weights.index = weights.index.map(frozenset)
print(weights.size)
# 22500 = 150**2
# Get rid of diagonal b/c the weights are non-informative
weights = weights[weights.index.map(lambda nodes: len(nodes) == 2)]
print(weights.size)
# 22350 = 150**2 - 150
# Get non-redundant edges ([upper/lower]triangle)
weights = pd.Series(weights.to_dict() )
print(weights.size)
# 11175 = (150**2 - 150)/2
# Create graph
tol = 0.99
graph = nx.Graph()
for edge, w in weights.abs().items(): # For sake of demonstration, just take absolute value though I wouldn't normally do this
    if w > tol:
        graph.add_edge(*edge, weight=w)
# Get positions
pos = nx.circular_layout(graph)#, seed=0)
# Prepare nodes for HoloViews
df_nodes = pd.DataFrame(pos, index=list("xy")).T
df_nodes.index.name = "Node"
df_nodes["Species"] = y_iris
df_nodes = df_nodes.reset_index()[["x","y", "Node", "Species"]]
df_nodes.head()
# x y Node Species
# 0 0.002421 -0.765592 iris_1 setosa
# 1 0.116149 -0.721862 iris_0 setosa
# 2 0.012620 -0.730962 iris_2 setosa
# 3 0.053972 -0.611302 iris_3 setosa
# 4 0.049840 -0.687669 iris_4 setosa
# Prepare edges for HoloViews
df_edges = list()
for node_a, node_b, edge_data in graph.edges(data=True):
    df_edges.append([node_a, node_b, edge_data["weight"]])
df_edges = pd.DataFrame(df_edges, columns=["start", "end", "weight"])
df_edges.head()
# start end weight
# 0 iris_1 iris_0 0.995999
# 1 iris_1 iris_2 0.996607
# 2 iris_1 iris_3 0.997397
# 3 iris_1 iris_4 0.992233
# 4 iris_1 iris_5 0.993592
hv_nodes = hv.Nodes(df_nodes)
hv_graph = hv.Graph((df_edges, hv_nodes), label='Iris Dataset')
hv_graph.opts(cmap=c_iris.to_dict(), node_size=10, edge_line_width="weight",
node_line_color='white', node_color='Species', xaxis=None, yaxis=None)
# Custom mapping
import numpy as np  # needed below; missing from the imports at the top
categories = list(map(lambda i: "Category_{}".format(i), range(10)))
range_of_values = np.linspace(0,1)
node_to_custom = dict()
for i, node in enumerate(graph.nodes()):
    rng = np.random.RandomState(i)
    # Get a random number of categories (real data will obviously not look like this)
    number_of_categories = rng.choice([0,1,2,3,4,5], size=1)[0]
    # Grab N categories w/o replacement
    categories_wrt_node = rng.choice(categories, size=number_of_categories, replace=False)
    # Get values ranging from [0,1] for those categories
    values_wrt_categories = rng.choice(range_of_values, size=number_of_categories )
    # Get a mapping between categories and values
    categories_to_values = pd.Series(dict(zip(categories_wrt_node, values_wrt_categories)), dtype=float)
    # Get non-zero values, sort, and store
    node_to_custom[node] = categories_to_values[lambda v: v > 0].sort_values(ascending=False)
# Example of {key:value} showing {node:series}
list(node_to_custom.items())[0]
# ('iris_1',
# Category_2 0.734694
# Category_9 0.489796
# Category_8 0.469388
# Category_4 0.122449
# dtype: float64)
I don't have a definitive answer to your question, but maybe I can still help.
To the best of my knowledge, HoloViews doesn't support a variable number of tooltips. What it does support is custom tooltips.
Custom tooltips look like this:
from bokeh.models import HoverTool
# each tuple will be a row in the tooltip
tooltips = [
    ('Name', '@name'),
    ('Symbol', '@symbol'),
    ('CPK', '$color[hex, swatch]:CPK')
]
custom_hover_tool = HoverTool(tooltips=tooltips)
points.opts(tools=[custom_hover_tool])
Example from here:
http://holoviews.org/user_guide/Plotting_with_Bokeh.html
More details on the usable $variables and @variables:
https://docs.bokeh.org/en/latest/docs/user_guide/tools.html#hovertool
So if this could be good enough for you, you could aggregate your categorical data into a string for each record like "Category_2: 0.734694, Category_9: 0.489796 ..." and display that as a row in the tooltip with a label like "Categories:".
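The string aggregation suggested above could be sketched roughly as follows. The `node_to_custom` dictionary here is a hypothetical stand-in for the one in the question (plain dicts instead of pandas Series), and `categories_to_text` is a helper name made up for this example.

```python
# Hypothetical per-node mapping, shaped like node_to_custom in the question
# but using plain dicts instead of pandas Series.
node_to_custom = {
    "iris_1": {"Category_2": 0.734694, "Category_9": 0.489796},
    "iris_2": {},  # a node without any categories
}

def categories_to_text(categories):
    """Join category/value pairs into one comma-separated string."""
    return ", ".join(f"{name}: {value:.6f}" for name, value in categories.items())

# One display string per node; nodes without categories get an empty string.
categories_text = {
    node: categories_to_text(cats) for node, cats in node_to_custom.items()
}
print(categories_text["iris_1"])
```

The resulting strings could then be stored as an extra column on the nodes dataframe and shown as a single tooltip row.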
But the tooltips variable actually can be an HTML template too, something like this:
tooltips = """
<div class="row">
    <div class="col label">Node</div>
    <div class="col value">@name</div>
</div>
<div class="row">
    <div class="col label">Species</div>
    <div class="col value">@species</div>
</div>
@categories{safe}
"""
The {safe} part makes the tooltip render the content of that variable as HTML. So this time you have to aggregate your categorical data beforehand into a data column that already contains the final HTML code for every record; for your example record it should look like this:
'\
<div class="row">\
<div class="col label">Category_2:</div>\
<div class="col value">0.734694</div>\
</div>\
<div class="row">\
<div class="col label">Category_9:</div>\
<div class="col value">0.489796</div>\
</div>\
...\
'
(Most likely you would have two nested for loops, one over the nodes and another over the categories of each node, adding only one "row" at a time, but something like this would be the end result for each node.)
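The nested loop described above might look something like this untested-against-HoloViews sketch. The `node_to_custom` dictionary is a hypothetical stand-in for the one in the question (plain dicts instead of pandas Series), and `categories_to_html` is a helper name invented for this example; the class names simply mirror the mockup.

```python
from html import escape

# Hypothetical per-node mapping, shaped like node_to_custom in the question.
node_to_custom = {
    "iris_1": {"Category_2": 0.734694, "Category_9": 0.489796},
    "iris_2": {},  # a node without any categories
}

def categories_to_html(categories):
    """Build one <div class="row"> block per category for a single node."""
    rows = []
    for name, value in categories.items():
        rows.append(
            '<div class="row">'
            f'<div class="col label">{escape(str(name))}:</div>'
            f'<div class="col value">{value:.6f}</div>'
            '</div>'
        )
    return "".join(rows)

# Outer loop over nodes, inner loop (inside the helper) over categories.
categories_html = {}
for node, cats in node_to_custom.items():
    categories_html[node] = categories_to_html(cats)

print(categories_html["iris_1"])
```

A node without categories ends up with an empty string, so its tooltip would show only the fixed rows. The per-node strings would need to be added as a column of the node data so the tooltip template can reference that column with the `{safe}` format.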
If you use the exact same HTML/CSS structure in both code blocks, they should be merged seamlessly.
Treat these code blocks as mockups just to demonstrate the idea as I just improvised them here without testing, but I hope it helps.
Let me know if you try it, and show me a working example if you get stuck with it; I will try to go into the details.

How to use `hclust` function from R in Python via Rpy2 (v3)?

There are a lot of changes between rpy2 v2 and v3. I'm porting my code and patching up some compatibility issues. One thing I can't figure out is how to get hclust to work. Specifically from the fastcluster package but I can't even get base hclust to work.
A few things I do not understand:
(1) Should I use R["as.dist"](rkernel) or R("as.dist")(rkernel) ?
(2) Why does this return a numpy array when I'm calling it within R?
(3) How can I get this dissimilarity object to work with hclust and fastcluster::hclust?
I'm using rpy2 v3.3.2 btw.
I was using something similar to this but it's not working anymore:
feeding distance matrix to R clustering from Rpy2
import pandas as pd
import numpy as np
from rpy2 import robjects as ro
from rpy2 import rinterface as ri
from rpy2.robjects.packages import importr
R = ro.r
# r_stats = importr("stats")
fastcluster = importr("fastcluster")
def pandas_to_rpy2(df):
    return ro.conversion.py2rpy(df)
def rpy2_to_pandas(r_df):
    return ro.conversion.rpy2py(r_df)
# Data
X_iris = pd.read_csv("https://pastebin.com/raw/dR59vTD4", sep="\t", index_col=0)
#X_iris = pd.DataFrame({'sepal_length': {'iris_0': 5.1, 'iris_1': 4.9, 'iris_2': 4.7, 'iris_3': 4.6, 'iris_4': 5.0, 'iris_5': 5.4, 'iris_6': 4.6, 'iris_7': 5.0, 'iris_8': 4.4, 'iris_9': 4.9, 'iris_10': 5.4, 'iris_11': 4.8, 'iris_12': 4.8, 'iris_13': 4.3, 'iris_14': 5.8, 'iris_15': 5.7, 'iris_16': 5.4, 'iris_17': 5.1, 'iris_18': 5.7, 'iris_19': 5.1, 'iris_20': 5.4, 'iris_21': 5.1, 'iris_22': 4.6, 'iris_23': 5.1, 'iris_24': 4.8, 'iris_25': 5.0, 'iris_26': 5.0, 'iris_27': 5.2, 'iris_28': 5.2, 'iris_29': 4.7, 'iris_30': 4.8, 'iris_31': 5.4, 'iris_32': 5.2, 'iris_33': 5.5, 'iris_34': 4.9, 'iris_35': 5.0, 'iris_36': 5.5, 'iris_37': 4.9, 'iris_38': 4.4, 'iris_39': 5.1, 'iris_40': 5.0, 'iris_41': 4.5, 'iris_42': 4.4, 'iris_43': 5.0, 'iris_44': 5.1, 'iris_45': 4.8, 'iris_46': 5.1, 'iris_47': 4.6, 'iris_48': 5.3, 'iris_49': 5.0, 'iris_50': 7.0, 'iris_51': 6.4, 'iris_52': 6.9, 'iris_53': 5.5, 'iris_54': 6.5, 'iris_55': 5.7, 'iris_56': 6.3, 'iris_57': 4.9, 'iris_58': 6.6, 'iris_59': 5.2, 'iris_60': 5.0, 'iris_61': 5.9, 'iris_62': 6.0, 'iris_63': 6.1, 'iris_64': 5.6, 'iris_65': 6.7, 'iris_66': 5.6, 'iris_67': 5.8, 'iris_68': 6.2, 'iris_69': 5.6, 'iris_70': 5.9, 'iris_71': 6.1, 'iris_72': 6.3, 'iris_73': 6.1, 'iris_74': 6.4, 'iris_75': 6.6, 'iris_76': 6.8, 'iris_77': 6.7, 'iris_78': 6.0, 'iris_79': 5.7, 'iris_80': 5.5, 'iris_81': 5.5, 'iris_82': 5.8, 'iris_83': 6.0, 'iris_84': 5.4, 'iris_85': 6.0, 'iris_86': 6.7, 'iris_87': 6.3, 'iris_88': 5.6, 'iris_89': 5.5, 'iris_90': 5.5, 'iris_91': 6.1, 'iris_92': 5.8, 'iris_93': 5.0, 'iris_94': 5.6, 'iris_95': 5.7, 'iris_96': 5.7, 'iris_97': 6.2, 'iris_98': 5.1, 'iris_99': 5.7, 'iris_100': 6.3, 'iris_101': 5.8, 'iris_102': 7.1, 'iris_103': 6.3, 'iris_104': 6.5, 'iris_105': 7.6, 'iris_106': 4.9, 'iris_107': 7.3, 'iris_108': 6.7, 'iris_109': 7.2, 'iris_110': 6.5, 'iris_111': 6.4, 'iris_112': 6.8, 'iris_113': 5.7, 'iris_114': 5.8, 'iris_115': 6.4, 'iris_116': 6.5, 'iris_117': 7.7, 'iris_118': 7.7, 'iris_119': 6.0, 'iris_120': 6.9, 'iris_121': 
5.6, 'iris_122': 7.7, 'iris_123': 6.3, 'iris_124': 6.7, 'iris_125': 7.2, 'iris_126': 6.2, 'iris_127': 6.1, 'iris_128': 6.4, 'iris_129': 7.2, 'iris_130': 7.4, 'iris_131': 7.9, 'iris_132': 6.4, 'iris_133': 6.3, 'iris_134': 6.1, 'iris_135': 7.7, 'iris_136': 6.3, 'iris_137': 6.4, 'iris_138': 6.0, 'iris_139': 6.9, 'iris_140': 6.7, 'iris_141': 6.9, 'iris_142': 5.8, 'iris_143': 6.8, 'iris_144': 6.7, 'iris_145': 6.7, 'iris_146': 6.3, 'iris_147': 6.5, 'iris_148': 6.2, 'iris_149': 5.9}, 'sepal_width': {'iris_0': 3.5, 'iris_1': 3.0, 'iris_2': 3.2, 'iris_3': 3.1, 'iris_4': 3.6, 'iris_5': 3.9, 'iris_6': 3.4, 'iris_7': 3.4, 'iris_8': 2.9, 'iris_9': 3.1, 'iris_10': 3.7, 'iris_11': 3.4, 'iris_12': 3.0, 'iris_13': 3.0, 'iris_14': 4.0, 'iris_15': 4.4, 'iris_16': 3.9, 'iris_17': 3.5, 'iris_18': 3.8, 'iris_19': 3.8, 'iris_20': 3.4, 'iris_21': 3.7, 'iris_22': 3.6, 'iris_23': 3.3, 'iris_24': 3.4, 'iris_25': 3.0, 'iris_26': 3.4, 'iris_27': 3.5, 'iris_28': 3.4, 'iris_29': 3.2, 'iris_30': 3.1, 'iris_31': 3.4, 'iris_32': 4.1, 'iris_33': 4.2, 'iris_34': 3.1, 'iris_35': 3.2, 'iris_36': 3.5, 'iris_37': 3.6, 'iris_38': 3.0, 'iris_39': 3.4, 'iris_40': 3.5, 'iris_41': 2.3, 'iris_42': 3.2, 'iris_43': 3.5, 'iris_44': 3.8, 'iris_45': 3.0, 'iris_46': 3.8, 'iris_47': 3.2, 'iris_48': 3.7, 'iris_49': 3.3, 'iris_50': 3.2, 'iris_51': 3.2, 'iris_52': 3.1, 'iris_53': 2.3, 'iris_54': 2.8, 'iris_55': 2.8, 'iris_56': 3.3, 'iris_57': 2.4, 'iris_58': 2.9, 'iris_59': 2.7, 'iris_60': 2.0, 'iris_61': 3.0, 'iris_62': 2.2, 'iris_63': 2.9, 'iris_64': 2.9, 'iris_65': 3.1, 'iris_66': 3.0, 'iris_67': 2.7, 'iris_68': 2.2, 'iris_69': 2.5, 'iris_70': 3.2, 'iris_71': 2.8, 'iris_72': 2.5, 'iris_73': 2.8, 'iris_74': 2.9, 'iris_75': 3.0, 'iris_76': 2.8, 'iris_77': 3.0, 'iris_78': 2.9, 'iris_79': 2.6, 'iris_80': 2.4, 'iris_81': 2.4, 'iris_82': 2.7, 'iris_83': 2.7, 'iris_84': 3.0, 'iris_85': 3.4, 'iris_86': 3.1, 'iris_87': 2.3, 'iris_88': 3.0, 'iris_89': 2.5, 'iris_90': 2.6, 'iris_91': 3.0, 'iris_92': 2.6, 'iris_93': 2.3, 
'iris_94': 2.7, 'iris_95': 3.0, 'iris_96': 2.9, 'iris_97': 2.9, 'iris_98': 2.5, 'iris_99': 2.8, 'iris_100': 3.3, 'iris_101': 2.7, 'iris_102': 3.0, 'iris_103': 2.9, 'iris_104': 3.0, 'iris_105': 3.0, 'iris_106': 2.5, 'iris_107': 2.9, 'iris_108': 2.5, 'iris_109': 3.6, 'iris_110': 3.2, 'iris_111': 2.7, 'iris_112': 3.0, 'iris_113': 2.5, 'iris_114': 2.8, 'iris_115': 3.2, 'iris_116': 3.0, 'iris_117': 3.8, 'iris_118': 2.6, 'iris_119': 2.2, 'iris_120': 3.2, 'iris_121': 2.8, 'iris_122': 2.8, 'iris_123': 2.7, 'iris_124': 3.3, 'iris_125': 3.2, 'iris_126': 2.8, 'iris_127': 3.0, 'iris_128': 2.8, 'iris_129': 3.0, 'iris_130': 2.8, 'iris_131': 3.8, 'iris_132': 2.8, 'iris_133': 2.8, 'iris_134': 2.6, 'iris_135': 3.0, 'iris_136': 3.4, 'iris_137': 3.1, 'iris_138': 3.0, 'iris_139': 3.1, 'iris_140': 3.1, 'iris_141': 3.1, 'iris_142': 2.7, 'iris_143': 3.2, 'iris_144': 3.3, 'iris_145': 3.0, 'iris_146': 2.5, 'iris_147': 3.0, 'iris_148': 3.4, 'iris_149': 3.0}, 'petal_length': {'iris_0': 1.4, 'iris_1': 1.4, 'iris_2': 1.3, 'iris_3': 1.5, 'iris_4': 1.4, 'iris_5': 1.7, 'iris_6': 1.4, 'iris_7': 1.5, 'iris_8': 1.4, 'iris_9': 1.5, 'iris_10': 1.5, 'iris_11': 1.6, 'iris_12': 1.4, 'iris_13': 1.1, 'iris_14': 1.2, 'iris_15': 1.5, 'iris_16': 1.3, 'iris_17': 1.4, 'iris_18': 1.7, 'iris_19': 1.5, 'iris_20': 1.7, 'iris_21': 1.5, 'iris_22': 1.0, 'iris_23': 1.7, 'iris_24': 1.9, 'iris_25': 1.6, 'iris_26': 1.6, 'iris_27': 1.5, 'iris_28': 1.4, 'iris_29': 1.6, 'iris_30': 1.6, 'iris_31': 1.5, 'iris_32': 1.5, 'iris_33': 1.4, 'iris_34': 1.5, 'iris_35': 1.2, 'iris_36': 1.3, 'iris_37': 1.4, 'iris_38': 1.3, 'iris_39': 1.5, 'iris_40': 1.3, 'iris_41': 1.3, 'iris_42': 1.3, 'iris_43': 1.6, 'iris_44': 1.9, 'iris_45': 1.4, 'iris_46': 1.6, 'iris_47': 1.4, 'iris_48': 1.5, 'iris_49': 1.4, 'iris_50': 4.7, 'iris_51': 4.5, 'iris_52': 4.9, 'iris_53': 4.0, 'iris_54': 4.6, 'iris_55': 4.5, 'iris_56': 4.7, 'iris_57': 3.3, 'iris_58': 4.6, 'iris_59': 3.9, 'iris_60': 3.5, 'iris_61': 4.2, 'iris_62': 4.0, 'iris_63': 4.7, 'iris_64': 3.6, 
'iris_65': 4.4, 'iris_66': 4.5, 'iris_67': 4.1, 'iris_68': 4.5, 'iris_69': 3.9, 'iris_70': 4.8, 'iris_71': 4.0, 'iris_72': 4.9, 'iris_73': 4.7, 'iris_74': 4.3, 'iris_75': 4.4, 'iris_76': 4.8, 'iris_77': 5.0, 'iris_78': 4.5, 'iris_79': 3.5, 'iris_80': 3.8, 'iris_81': 3.7, 'iris_82': 3.9, 'iris_83': 5.1, 'iris_84': 4.5, 'iris_85': 4.5, 'iris_86': 4.7, 'iris_87': 4.4, 'iris_88': 4.1, 'iris_89': 4.0, 'iris_90': 4.4, 'iris_91': 4.6, 'iris_92': 4.0, 'iris_93': 3.3, 'iris_94': 4.2, 'iris_95': 4.2, 'iris_96': 4.2, 'iris_97': 4.3, 'iris_98': 3.0, 'iris_99': 4.1, 'iris_100': 6.0, 'iris_101': 5.1, 'iris_102': 5.9, 'iris_103': 5.6, 'iris_104': 5.8, 'iris_105': 6.6, 'iris_106': 4.5, 'iris_107': 6.3, 'iris_108': 5.8, 'iris_109': 6.1, 'iris_110': 5.1, 'iris_111': 5.3, 'iris_112': 5.5, 'iris_113': 5.0, 'iris_114': 5.1, 'iris_115': 5.3, 'iris_116': 5.5, 'iris_117': 6.7, 'iris_118': 6.9, 'iris_119': 5.0, 'iris_120': 5.7, 'iris_121': 4.9, 'iris_122': 6.7, 'iris_123': 4.9, 'iris_124': 5.7, 'iris_125': 6.0, 'iris_126': 4.8, 'iris_127': 4.9, 'iris_128': 5.6, 'iris_129': 5.8, 'iris_130': 6.1, 'iris_131': 6.4, 'iris_132': 5.6, 'iris_133': 5.1, 'iris_134': 5.6, 'iris_135': 6.1, 'iris_136': 5.6, 'iris_137': 5.5, 'iris_138': 4.8, 'iris_139': 5.4, 'iris_140': 5.6, 'iris_141': 5.1, 'iris_142': 5.1, 'iris_143': 5.9, 'iris_144': 5.7, 'iris_145': 5.2, 'iris_146': 5.0, 'iris_147': 5.2, 'iris_148': 5.4, 'iris_149': 5.1}, 'petal_width': {'iris_0': 0.2, 'iris_1': 0.2, 'iris_2': 0.2, 'iris_3': 0.2, 'iris_4': 0.2, 'iris_5': 0.4, 'iris_6': 0.3, 'iris_7': 0.2, 'iris_8': 0.2, 'iris_9': 0.1, 'iris_10': 0.2, 'iris_11': 0.2, 'iris_12': 0.1, 'iris_13': 0.1, 'iris_14': 0.2, 'iris_15': 0.4, 'iris_16': 0.4, 'iris_17': 0.3, 'iris_18': 0.3, 'iris_19': 0.3, 'iris_20': 0.2, 'iris_21': 0.4, 'iris_22': 0.2, 'iris_23': 0.5, 'iris_24': 0.2, 'iris_25': 0.2, 'iris_26': 0.4, 'iris_27': 0.2, 'iris_28': 0.2, 'iris_29': 0.2, 'iris_30': 0.2, 'iris_31': 0.4, 'iris_32': 0.1, 'iris_33': 0.2, 'iris_34': 0.2, 'iris_35': 0.2, 
'iris_36': 0.2, 'iris_37': 0.1, 'iris_38': 0.2, 'iris_39': 0.2, 'iris_40': 0.3, 'iris_41': 0.3, 'iris_42': 0.2, 'iris_43': 0.6, 'iris_44': 0.4, 'iris_45': 0.3, 'iris_46': 0.2, 'iris_47': 0.2, 'iris_48': 0.2, 'iris_49': 0.2, 'iris_50': 1.4, 'iris_51': 1.5, 'iris_52': 1.5, 'iris_53': 1.3, 'iris_54': 1.5, 'iris_55': 1.3, 'iris_56': 1.6, 'iris_57': 1.0, 'iris_58': 1.3, 'iris_59': 1.4, 'iris_60': 1.0, 'iris_61': 1.5, 'iris_62': 1.0, 'iris_63': 1.4, 'iris_64': 1.3, 'iris_65': 1.4, 'iris_66': 1.5, 'iris_67': 1.0, 'iris_68': 1.5, 'iris_69': 1.1, 'iris_70': 1.8, 'iris_71': 1.3, 'iris_72': 1.5, 'iris_73': 1.2, 'iris_74': 1.3, 'iris_75': 1.4, 'iris_76': 1.4, 'iris_77': 1.7, 'iris_78': 1.5, 'iris_79': 1.0, 'iris_80': 1.1, 'iris_81': 1.0, 'iris_82': 1.2, 'iris_83': 1.6, 'iris_84': 1.5, 'iris_85': 1.6, 'iris_86': 1.5, 'iris_87': 1.3, 'iris_88': 1.3, 'iris_89': 1.3, 'iris_90': 1.2, 'iris_91': 1.4, 'iris_92': 1.2, 'iris_93': 1.0, 'iris_94': 1.3, 'iris_95': 1.2, 'iris_96': 1.3, 'iris_97': 1.3, 'iris_98': 1.1, 'iris_99': 1.3, 'iris_100': 2.5, 'iris_101': 1.9, 'iris_102': 2.1, 'iris_103': 1.8, 'iris_104': 2.2, 'iris_105': 2.1, 'iris_106': 1.7, 'iris_107': 1.8, 'iris_108': 1.8, 'iris_109': 2.5, 'iris_110': 2.0, 'iris_111': 1.9, 'iris_112': 2.1, 'iris_113': 2.0, 'iris_114': 2.4, 'iris_115': 2.3, 'iris_116': 1.8, 'iris_117': 2.2, 'iris_118': 2.3, 'iris_119': 1.5, 'iris_120': 2.3, 'iris_121': 2.0, 'iris_122': 2.0, 'iris_123': 1.8, 'iris_124': 2.1, 'iris_125': 1.8, 'iris_126': 1.8, 'iris_127': 1.8, 'iris_128': 2.1, 'iris_129': 1.6, 'iris_130': 1.9, 'iris_131': 2.0, 'iris_132': 2.2, 'iris_133': 1.5, 'iris_134': 1.4, 'iris_135': 2.3, 'iris_136': 2.4, 'iris_137': 1.8, 'iris_138': 1.8, 'iris_139': 2.1, 'iris_140': 2.4, 'iris_141': 2.3, 'iris_142': 1.9, 'iris_143': 2.3, 'iris_144': 2.5, 'iris_145': 2.3, 'iris_146': 1.9, 'iris_147': 2.0, 'iris_148': 2.3, 'iris_149': 1.8}})
# Dissimilarity
df_dism = 1 - X_iris.T.corr().abs()
# Convert to R dataframe
rkernel = pandas_to_rpy2(df_dism.values)
# Should I use R() or R[] to convert to dissimilarity object
rdism = R("as.dist")(rkernel)
# rdism = R["as.dist"](rkernel)
# rdism = r_stats.as_dist(rkernel)
# Why does it return a numpy array
print(type(rdism))
# <class 'numpy.ndarray'>
# Load in fastcluster package
fastcluster.hclust(rdism)
# Regular HCLUST doesn't even work
# R.hclust(rdism)
# R[write to console]: Error in (function (d, method = "complete", members = NULL) :
# 'N' must be a single integer.
# ---------------------------------------------------------------------------
# RRuntimeError Traceback (most recent call last)
# <ipython-input-93-487d69feeea3> in <module>
# 21 # rdism = r_stats.as_dist(rkernel)
# 22
# ---> 23 R.hclust(rdism)
# 24
# 25
# ~/anaconda3/envs/soothsayer2_env/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
# 196 kwargs[r_k] = v
# 197 return (super(SignatureTranslatedFunction, self)
# --> 198 .__call__(*args, **kwargs))
# 199
# 200
# ~/anaconda3/envs/soothsayer2_env/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
# 123 else:
# 124 new_kwargs[k] = conversion.py2rpy(v)
# --> 125 res = super(Function, self).__call__(*new_args, **new_kwargs)
# 126 res = conversion.rpy2py(res)
# 127 return res
# ~/anaconda3/envs/soothsayer2_env/lib/python3.6/site-packages/rpy2/rinterface_lib/conversion.py in _(*args, **kwargs)
# 42 def _cdata_res_to_rinterface(function):
# 43 def _(*args, **kwargs):
# ---> 44 cdata = function(*args, **kwargs)
# 45 # TODO: test cdata is of the expected CType
# 46 return _cdata_to_rinterface(cdata)
# ~/anaconda3/envs/soothsayer2_env/lib/python3.6/site-packages/rpy2/rinterface.py in __call__(self, *args, **kwargs)
# 619 error_occured))
# 620 if error_occured[0]:
# --> 621 raise embedded.RRuntimeError(_rinterface._geterrmessage())
# 622 return res
# 623
# RRuntimeError: Error in (function (d, method = "complete", members = NULL) :
# 'N' must be a single integer.
Please refer to this https://github.com/rpy2/rpy2/issues/690
With rpy2 v3.x, calling pandas2ri.activate() forces the result of R["as.dist"] to be converted to an np.ndarray; in v2.x it does not.
You probably want to use a data frame rather than a matrix (as your comments indicate), so you should use df_dism rather than df_dism.values (which gives a numpy array).
Further, to use pandas conversion you need to adjust the conversion functions to work with rpy2 3.x:
import rpy2.robjects as ro
from rpy2.robjects.conversion import localconverter
from rpy2.robjects import pandas2ri
def pandas_to_rpy2(df):
    # Convert a pandas DataFrame to its R counterpart using a local converter
    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.py2rpy(df)
def rpy2_to_pandas(r_df):
    # Convert an R data frame back to a pandas DataFrame
    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.rpy2py(r_df)
rkernel = pandas_to_rpy2(df_dism)
Then type(rdism) returns <class 'rpy2.robjects.vectors.FloatVector'>, and both fastcluster.hclust(rdism) and R.hclust(rdism) work properly.
Tested with:
R 3.6.3
rpy2 3.3.1 and 3.3.2
Python 3.8.1
fastcluster 1.1.25
pandas 1.0.3
Regarding your first question, R["as.dist"] is the preferred way of getting objects (and, since functions are objects, it works fine in this case), while R("as.dist") is the preferred way of evaluating R code (as.dist evaluates to a function object, which is returned, so the end result is the same). Have a look at the introduction section of the documentation, where both are used with the pi object.
As no evaluation is needed, R["as.dist"] would be faster:
%timeit R("as.dist")
1.75 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit R["as.dist"]
473 µs ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
To highlight the differences in implementation, R["as.dist"] would execute:
def __getitem__(self, item):
    res = _globalenv.find(item)
    res = conversion.rpy2py(res)
    if hasattr(res, '__rname__'):
        res.__rname__ = item
    return res
while R("as.dist") would execute:
def __call__(self, string):
    p = rinterface.parse(string)
    res = self.eval(p)
    return conversion.rpy2py(res)
In summary, I would go with R["as.dist"] in this case as it is slightly faster and better reflects the intent.

How can I use broadcasting with NumPy to speed up this correlation calculation?

I'm trying to take advantage of NumPy broadcasting and backend array computations to significantly speed up this function. It currently scales poorly, so I'm hoping to improve its performance substantially. Right now the code does not make proper use of broadcasting.
I'm using WGCNA's bicor function as a gold standard as this is the fastest implementation I know of at the moment. The Python version outputs the same results as the R function.
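For reference, the quantity computed by the Python code in this question can be written as follows. The notation (the tilde, the superscripts, and MAD defined as the median of absolute deviations from the median) is introduced here for illustration only and simply mirrors the code:

```latex
u_i = \frac{a_i - \operatorname{med}(a)}{9\,\operatorname{MAD}(a)}, \qquad
v_i = \frac{b_i - \operatorname{med}(b)}{9\,\operatorname{MAD}(b)}, \qquad
\operatorname{MAD}(a) = \operatorname{med}\bigl(|a - \operatorname{med}(a)|\bigr)

w_i^{(a)} = \left(1 - u_i^2\right)^2 \, \mathbb{1}\bigl[\,1 - |u_i| > 0\,\bigr], \qquad
w_i^{(b)} = \left(1 - v_i^2\right)^2 \, \mathbb{1}\bigl[\,1 - |v_i| > 0\,\bigr]

\tilde{a}_i = \bigl(a_i - \operatorname{med}(a)\bigr)\, w_i^{(a)}, \qquad
\tilde{b}_i = \bigl(b_i - \operatorname{med}(b)\bigr)\, w_i^{(b)}

\operatorname{bicor}(a, b) =
\frac{\sum_i \tilde{a}_i \tilde{b}_i}
     {\sqrt{\sum_i \tilde{a}_i^{2}}\;\sqrt{\sum_i \tilde{b}_i^{2}}}
```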
# ==============================================================================
# Imports
# ==============================================================================
# Built-ins
import os, sys, time, multiprocessing
# 3rd party
import numpy as np
import pandas as pd
# ==============================================================================
# R Imports
# ==============================================================================
from rpy2 import robjects, rinterface
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
pandas2ri.activate()
R = robjects.r
NULL = robjects.rinterface.NULL
rinterface.set_writeconsole_regular(None)
WGCNA = importr("WGCNA")
# Python
def _biweight_midcorrelation(a, b):
    a_median = np.median(a)
    b_median = np.median(b)
    # Median absolute deviation
    a_mad = np.median(np.abs(a - a_median))
    b_mad = np.median(np.abs(b - b_median))
    u = (a - a_median) / (9 * a_mad)
    v = (b - b_median) / (9 * b_mad)
    w_a = np.square(1 - np.square(u)) * ((1 - np.abs(u)) > 0)
    w_b = np.square(1 - np.square(v)) * ((1 - np.abs(v)) > 0)
    a_item = (a - a_median) * w_a
    b_item = (b - b_median) * w_b
    return (a_item * b_item).sum() / (
        np.sqrt(np.square(a_item).sum()) *
        np.sqrt(np.square(b_item).sum()))
def biweight_midcorrelation(X):
    return X.corr(method=_biweight_midcorrelation)
# # OLD IMPLEMENTATION
# def biweight_midcorrelation(X):
#     median = X.median()
#     mad = (X - median).abs().median()
#     U = (X - median) / (9 * mad)
#     adjacency = np.square(1 - np.square(U)) * ((1 - U.abs()) > 0)
#     estimator = (X - median) * adjacency
#     bicor_matrix = np.empty((X.shape[1], X.shape[1]), dtype=float)
#     for i, ac in enumerate(estimator):
#         for j, bc in enumerate(estimator):
#             a = estimator[ac]
#             b = estimator[bc]
#             c = (a * b).sum() / (
#                 np.sqrt(np.square(a).sum()) * np.sqrt(np.square(b).sum()))
#             bicor_matrix[i, j] = c
#             bicor_matrix[j, i] = c
#     return pd.DataFrame(bicor_matrix, index=X.columns, columns=X.columns)
# R
def biweight_midcorrelation_r_wrapper(X, n_jobs=-1, r_package=None):
    """
    WGCNA: bicor
    function (x, y = NULL, robustX = TRUE, robustY = TRUE, use = "all.obs",
    maxPOutliers = 1, qu <...> dian absolute deviation, or zero variance."))
    """
    if r_package is None:
        r_package = importr("WGCNA")
    if n_jobs == -1:
        n_jobs = multiprocessing.cpu_count()
    labels = X.columns
    r_df_sim = r_package.bicor(pandas2ri.py2ri(X), nThreads=n_jobs)
    df_bicor = pd.DataFrame(pandas2ri.ri2py(r_df_sim), index=labels, columns=labels)
    return df_bicor
# X.shape = (150,4)
X = pd.DataFrame({'sepal_length': {'iris_0': 5.1, 'iris_1': 4.9, 'iris_2': 4.7, 'iris_3': 4.6, 'iris_4': 5.0, 'iris_5': 5.4, 'iris_6': 4.6, 'iris_7': 5.0, 'iris_8': 4.4, 'iris_9': 4.9, 'iris_10': 5.4, 'iris_11': 4.8, 'iris_12': 4.8, 'iris_13': 4.3, 'iris_14': 5.8, 'iris_15': 5.7, 'iris_16': 5.4, 'iris_17': 5.1, 'iris_18': 5.7, 'iris_19': 5.1, 'iris_20': 5.4, 'iris_21': 5.1, 'iris_22': 4.6, 'iris_23': 5.1, 'iris_24': 4.8, 'iris_25': 5.0, 'iris_26': 5.0, 'iris_27': 5.2, 'iris_28': 5.2, 'iris_29': 4.7, 'iris_30': 4.8, 'iris_31': 5.4, 'iris_32': 5.2, 'iris_33': 5.5, 'iris_34': 4.9, 'iris_35': 5.0, 'iris_36': 5.5, 'iris_37': 4.9, 'iris_38': 4.4, 'iris_39': 5.1, 'iris_40': 5.0, 'iris_41': 4.5, 'iris_42': 4.4, 'iris_43': 5.0, 'iris_44': 5.1, 'iris_45': 4.8, 'iris_46': 5.1, 'iris_47': 4.6, 'iris_48': 5.3, 'iris_49': 5.0, 'iris_50': 7.0, 'iris_51': 6.4, 'iris_52': 6.9, 'iris_53': 5.5, 'iris_54': 6.5, 'iris_55': 5.7, 'iris_56': 6.3, 'iris_57': 4.9, 'iris_58': 6.6, 'iris_59': 5.2, 'iris_60': 5.0, 'iris_61': 5.9, 'iris_62': 6.0, 'iris_63': 6.1, 'iris_64': 5.6, 'iris_65': 6.7, 'iris_66': 5.6, 'iris_67': 5.8, 'iris_68': 6.2, 'iris_69': 5.6, 'iris_70': 5.9, 'iris_71': 6.1, 'iris_72': 6.3, 'iris_73': 6.1, 'iris_74': 6.4, 'iris_75': 6.6, 'iris_76': 6.8, 'iris_77': 6.7, 'iris_78': 6.0, 'iris_79': 5.7, 'iris_80': 5.5, 'iris_81': 5.5, 'iris_82': 5.8, 'iris_83': 6.0, 'iris_84': 5.4, 'iris_85': 6.0, 'iris_86': 6.7, 'iris_87': 6.3, 'iris_88': 5.6, 'iris_89': 5.5, 'iris_90': 5.5, 'iris_91': 6.1, 'iris_92': 5.8, 'iris_93': 5.0, 'iris_94': 5.6, 'iris_95': 5.7, 'iris_96': 5.7, 'iris_97': 6.2, 'iris_98': 5.1, 'iris_99': 5.7, 'iris_100': 6.3, 'iris_101': 5.8, 'iris_102': 7.1, 'iris_103': 6.3, 'iris_104': 6.5, 'iris_105': 7.6, 'iris_106': 4.9, 'iris_107': 7.3, 'iris_108': 6.7, 'iris_109': 7.2, 'iris_110': 6.5, 'iris_111': 6.4, 'iris_112': 6.8, 'iris_113': 5.7, 'iris_114': 5.8, 'iris_115': 6.4, 'iris_116': 6.5, 'iris_117': 7.7, 'iris_118': 7.7, 'iris_119': 6.0, 'iris_120': 6.9, 'iris_121': 5.6, 
'iris_122': 7.7, 'iris_123': 6.3, 'iris_124': 6.7, 'iris_125': 7.2, 'iris_126': 6.2, 'iris_127': 6.1, 'iris_128': 6.4, 'iris_129': 7.2, 'iris_130': 7.4, 'iris_131': 7.9, 'iris_132': 6.4, 'iris_133': 6.3, 'iris_134': 6.1, 'iris_135': 7.7, 'iris_136': 6.3, 'iris_137': 6.4, 'iris_138': 6.0, 'iris_139': 6.9, 'iris_140': 6.7, 'iris_141': 6.9, 'iris_142': 5.8, 'iris_143': 6.8, 'iris_144': 6.7, 'iris_145': 6.7, 'iris_146': 6.3, 'iris_147': 6.5, 'iris_148': 6.2, 'iris_149': 5.9}, 'sepal_width': {'iris_0': 3.5, 'iris_1': 3.0, 'iris_2': 3.2, 'iris_3': 3.1, 'iris_4': 3.6, 'iris_5': 3.9, 'iris_6': 3.4, 'iris_7': 3.4, 'iris_8': 2.9, 'iris_9': 3.1, 'iris_10': 3.7, 'iris_11': 3.4, 'iris_12': 3.0, 'iris_13': 3.0, 'iris_14': 4.0, 'iris_15': 4.4, 'iris_16': 3.9, 'iris_17': 3.5, 'iris_18': 3.8, 'iris_19': 3.8, 'iris_20': 3.4, 'iris_21': 3.7, 'iris_22': 3.6, 'iris_23': 3.3, 'iris_24': 3.4, 'iris_25': 3.0, 'iris_26': 3.4, 'iris_27': 3.5, 'iris_28': 3.4, 'iris_29': 3.2, 'iris_30': 3.1, 'iris_31': 3.4, 'iris_32': 4.1, 'iris_33': 4.2, 'iris_34': 3.1, 'iris_35': 3.2, 'iris_36': 3.5, 'iris_37': 3.6, 'iris_38': 3.0, 'iris_39': 3.4, 'iris_40': 3.5, 'iris_41': 2.3, 'iris_42': 3.2, 'iris_43': 3.5, 'iris_44': 3.8, 'iris_45': 3.0, 'iris_46': 3.8, 'iris_47': 3.2, 'iris_48': 3.7, 'iris_49': 3.3, 'iris_50': 3.2, 'iris_51': 3.2, 'iris_52': 3.1, 'iris_53': 2.3, 'iris_54': 2.8, 'iris_55': 2.8, 'iris_56': 3.3, 'iris_57': 2.4, 'iris_58': 2.9, 'iris_59': 2.7, 'iris_60': 2.0, 'iris_61': 3.0, 'iris_62': 2.2, 'iris_63': 2.9, 'iris_64': 2.9, 'iris_65': 3.1, 'iris_66': 3.0, 'iris_67': 2.7, 'iris_68': 2.2, 'iris_69': 2.5, 'iris_70': 3.2, 'iris_71': 2.8, 'iris_72': 2.5, 'iris_73': 2.8, 'iris_74': 2.9, 'iris_75': 3.0, 'iris_76': 2.8, 'iris_77': 3.0, 'iris_78': 2.9, 'iris_79': 2.6, 'iris_80': 2.4, 'iris_81': 2.4, 'iris_82': 2.7, 'iris_83': 2.7, 'iris_84': 3.0, 'iris_85': 3.4, 'iris_86': 3.1, 'iris_87': 2.3, 'iris_88': 3.0, 'iris_89': 2.5, 'iris_90': 2.6, 'iris_91': 3.0, 'iris_92': 2.6, 'iris_93': 2.3, 'iris_94': 
2.7, 'iris_95': 3.0, 'iris_96': 2.9, 'iris_97': 2.9, 'iris_98': 2.5, 'iris_99': 2.8, 'iris_100': 3.3, 'iris_101': 2.7, 'iris_102': 3.0, 'iris_103': 2.9, 'iris_104': 3.0, 'iris_105': 3.0, 'iris_106': 2.5, 'iris_107': 2.9, 'iris_108': 2.5, 'iris_109': 3.6, 'iris_110': 3.2, 'iris_111': 2.7, 'iris_112': 3.0, 'iris_113': 2.5, 'iris_114': 2.8, 'iris_115': 3.2, 'iris_116': 3.0, 'iris_117': 3.8, 'iris_118': 2.6, 'iris_119': 2.2, 'iris_120': 3.2, 'iris_121': 2.8, 'iris_122': 2.8, 'iris_123': 2.7, 'iris_124': 3.3, 'iris_125': 3.2, 'iris_126': 2.8, 'iris_127': 3.0, 'iris_128': 2.8, 'iris_129': 3.0, 'iris_130': 2.8, 'iris_131': 3.8, 'iris_132': 2.8, 'iris_133': 2.8, 'iris_134': 2.6, 'iris_135': 3.0, 'iris_136': 3.4, 'iris_137': 3.1, 'iris_138': 3.0, 'iris_139': 3.1, 'iris_140': 3.1, 'iris_141': 3.1, 'iris_142': 2.7, 'iris_143': 3.2, 'iris_144': 3.3, 'iris_145': 3.0, 'iris_146': 2.5, 'iris_147': 3.0, 'iris_148': 3.4, 'iris_149': 3.0}, 'petal_length': {'iris_0': 1.4, 'iris_1': 1.4, 'iris_2': 1.3, 'iris_3': 1.5, 'iris_4': 1.4, 'iris_5': 1.7, 'iris_6': 1.4, 'iris_7': 1.5, 'iris_8': 1.4, 'iris_9': 1.5, 'iris_10': 1.5, 'iris_11': 1.6, 'iris_12': 1.4, 'iris_13': 1.1, 'iris_14': 1.2, 'iris_15': 1.5, 'iris_16': 1.3, 'iris_17': 1.4, 'iris_18': 1.7, 'iris_19': 1.5, 'iris_20': 1.7, 'iris_21': 1.5, 'iris_22': 1.0, 'iris_23': 1.7, 'iris_24': 1.9, 'iris_25': 1.6, 'iris_26': 1.6, 'iris_27': 1.5, 'iris_28': 1.4, 'iris_29': 1.6, 'iris_30': 1.6, 'iris_31': 1.5, 'iris_32': 1.5, 'iris_33': 1.4, 'iris_34': 1.5, 'iris_35': 1.2, 'iris_36': 1.3, 'iris_37': 1.4, 'iris_38': 1.3, 'iris_39': 1.5, 'iris_40': 1.3, 'iris_41': 1.3, 'iris_42': 1.3, 'iris_43': 1.6, 'iris_44': 1.9, 'iris_45': 1.4, 'iris_46': 1.6, 'iris_47': 1.4, 'iris_48': 1.5, 'iris_49': 1.4, 'iris_50': 4.7, 'iris_51': 4.5, 'iris_52': 4.9, 'iris_53': 4.0, 'iris_54': 4.6, 'iris_55': 4.5, 'iris_56': 4.7, 'iris_57': 3.3, 'iris_58': 4.6, 'iris_59': 3.9, 'iris_60': 3.5, 'iris_61': 4.2, 'iris_62': 4.0, 'iris_63': 4.7, 'iris_64': 3.6, 'iris_65': 4.4, 
'iris_66': 4.5, 'iris_67': 4.1, 'iris_68': 4.5, 'iris_69': 3.9, 'iris_70': 4.8, 'iris_71': 4.0, 'iris_72': 4.9, 'iris_73': 4.7, 'iris_74': 4.3, 'iris_75': 4.4, 'iris_76': 4.8, 'iris_77': 5.0, 'iris_78': 4.5, 'iris_79': 3.5, 'iris_80': 3.8, 'iris_81': 3.7, 'iris_82': 3.9, 'iris_83': 5.1, 'iris_84': 4.5, 'iris_85': 4.5, 'iris_86': 4.7, 'iris_87': 4.4, 'iris_88': 4.1, 'iris_89': 4.0, 'iris_90': 4.4, 'iris_91': 4.6, 'iris_92': 4.0, 'iris_93': 3.3, 'iris_94': 4.2, 'iris_95': 4.2, 'iris_96': 4.2, 'iris_97': 4.3, 'iris_98': 3.0, 'iris_99': 4.1, 'iris_100': 6.0, 'iris_101': 5.1, 'iris_102': 5.9, 'iris_103': 5.6, 'iris_104': 5.8, 'iris_105': 6.6, 'iris_106': 4.5, 'iris_107': 6.3, 'iris_108': 5.8, 'iris_109': 6.1, 'iris_110': 5.1, 'iris_111': 5.3, 'iris_112': 5.5, 'iris_113': 5.0, 'iris_114': 5.1, 'iris_115': 5.3, 'iris_116': 5.5, 'iris_117': 6.7, 'iris_118': 6.9, 'iris_119': 5.0, 'iris_120': 5.7, 'iris_121': 4.9, 'iris_122': 6.7, 'iris_123': 4.9, 'iris_124': 5.7, 'iris_125': 6.0, 'iris_126': 4.8, 'iris_127': 4.9, 'iris_128': 5.6, 'iris_129': 5.8, 'iris_130': 6.1, 'iris_131': 6.4, 'iris_132': 5.6, 'iris_133': 5.1, 'iris_134': 5.6, 'iris_135': 6.1, 'iris_136': 5.6, 'iris_137': 5.5, 'iris_138': 4.8, 'iris_139': 5.4, 'iris_140': 5.6, 'iris_141': 5.1, 'iris_142': 5.1, 'iris_143': 5.9, 'iris_144': 5.7, 'iris_145': 5.2, 'iris_146': 5.0, 'iris_147': 5.2, 'iris_148': 5.4, 'iris_149': 5.1}, 'petal_width': {'iris_0': 0.2, 'iris_1': 0.2, 'iris_2': 0.2, 'iris_3': 0.2, 'iris_4': 0.2, 'iris_5': 0.4, 'iris_6': 0.3, 'iris_7': 0.2, 'iris_8': 0.2, 'iris_9': 0.1, 'iris_10': 0.2, 'iris_11': 0.2, 'iris_12': 0.1, 'iris_13': 0.1, 'iris_14': 0.2, 'iris_15': 0.4, 'iris_16': 0.4, 'iris_17': 0.3, 'iris_18': 0.3, 'iris_19': 0.3, 'iris_20': 0.2, 'iris_21': 0.4, 'iris_22': 0.2, 'iris_23': 0.5, 'iris_24': 0.2, 'iris_25': 0.2, 'iris_26': 0.4, 'iris_27': 0.2, 'iris_28': 0.2, 'iris_29': 0.2, 'iris_30': 0.2, 'iris_31': 0.4, 'iris_32': 0.1, 'iris_33': 0.2, 'iris_34': 0.2, 'iris_35': 0.2, 'iris_36': 0.2, 
'iris_37': 0.1, 'iris_38': 0.2, 'iris_39': 0.2, 'iris_40': 0.3, 'iris_41': 0.3, 'iris_42': 0.2, 'iris_43': 0.6, 'iris_44': 0.4, 'iris_45': 0.3, 'iris_46': 0.2, 'iris_47': 0.2, 'iris_48': 0.2, 'iris_49': 0.2, 'iris_50': 1.4, 'iris_51': 1.5, 'iris_52': 1.5, 'iris_53': 1.3, 'iris_54': 1.5, 'iris_55': 1.3, 'iris_56': 1.6, 'iris_57': 1.0, 'iris_58': 1.3, 'iris_59': 1.4, 'iris_60': 1.0, 'iris_61': 1.5, 'iris_62': 1.0, 'iris_63': 1.4, 'iris_64': 1.3, 'iris_65': 1.4, 'iris_66': 1.5, 'iris_67': 1.0, 'iris_68': 1.5, 'iris_69': 1.1, 'iris_70': 1.8, 'iris_71': 1.3, 'iris_72': 1.5, 'iris_73': 1.2, 'iris_74': 1.3, 'iris_75': 1.4, 'iris_76': 1.4, 'iris_77': 1.7, 'iris_78': 1.5, 'iris_79': 1.0, 'iris_80': 1.1, 'iris_81': 1.0, 'iris_82': 1.2, 'iris_83': 1.6, 'iris_84': 1.5, 'iris_85': 1.6, 'iris_86': 1.5, 'iris_87': 1.3, 'iris_88': 1.3, 'iris_89': 1.3, 'iris_90': 1.2, 'iris_91': 1.4, 'iris_92': 1.2, 'iris_93': 1.0, 'iris_94': 1.3, 'iris_95': 1.2, 'iris_96': 1.3, 'iris_97': 1.3, 'iris_98': 1.1, 'iris_99': 1.3, 'iris_100': 2.5, 'iris_101': 1.9, 'iris_102': 2.1, 'iris_103': 1.8, 'iris_104': 2.2, 'iris_105': 2.1, 'iris_106': 1.7, 'iris_107': 1.8, 'iris_108': 1.8, 'iris_109': 2.5, 'iris_110': 2.0, 'iris_111': 1.9, 'iris_112': 2.1, 'iris_113': 2.0, 'iris_114': 2.4, 'iris_115': 2.3, 'iris_116': 1.8, 'iris_117': 2.2, 'iris_118': 2.3, 'iris_119': 1.5, 'iris_120': 2.3, 'iris_121': 2.0, 'iris_122': 2.0, 'iris_123': 1.8, 'iris_124': 2.1, 'iris_125': 1.8, 'iris_126': 1.8, 'iris_127': 1.8, 'iris_128': 2.1, 'iris_129': 1.6, 'iris_130': 1.9, 'iris_131': 2.0, 'iris_132': 2.2, 'iris_133': 1.5, 'iris_134': 1.4, 'iris_135': 2.3, 'iris_136': 2.4, 'iris_137': 1.8, 'iris_138': 1.8, 'iris_139': 2.1, 'iris_140': 2.4, 'iris_141': 2.3, 'iris_142': 1.9, 'iris_143': 2.3, 'iris_144': 2.5, 'iris_145': 2.3, 'iris_146': 1.9, 'iris_147': 2.0, 'iris_148': 2.3, 'iris_149': 1.8}})
# Python computation
start_time = time.time()
df_bicor__python = biweight_midcorrelation(X)
# R computation
df_bicor__r = biweight_midcorrelation_r_wrapper(X)
np.allclose(df_bicor__python, df_bicor__r)
Summary
One could write this computation approx. one order of magnitude faster (for the input you specified) with:
import numpy as np
def biweight_midcorrelation(arr):
    n, m = arr.shape
    arr = arr - np.median(arr, axis=0, keepdims=True)
    v = 1 - (arr / (9 * np.median(np.abs(arr), axis=0, keepdims=True))) ** 2
    arr = arr * v ** 2 * (v > 0)
    norms = np.sqrt(np.sum(arr ** 2, axis=0))
    return np.einsum('mi,mj->ij', arr, arr) / norms[:, None] / norms[None, :]
to be bridged to a Pandas dataframe by:
import pandas as pd
def corr_np2pd(df, func):
    return pd.DataFrame(func(np.array(df)), index=df.columns, columns=df.columns)
whose usage is:
corr_df = corr_np2pd(df, biweight_midcorrelation)
This could be made even faster by implementing the last computation with Numba.
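As a sanity check on the summary, here is a minimal sketch comparing the column-wise vectorized computation with a step-by-step pairwise reference. The names bicor_pair_reference and bicor_matrix are hypothetical, and the data is synthetic (random normal values rather than the iris table), so this only illustrates that the two formulations agree:

```python
import numpy as np

def bicor_pair_reference(a, b):
    """Pairwise biweight midcorrelation, written out step by step."""
    a_med, b_med = np.median(a), np.median(b)
    a_mad = np.median(np.abs(a - a_med))  # median absolute deviation
    b_mad = np.median(np.abs(b - b_med))
    u = (a - a_med) / (9 * a_mad)
    v = (b - b_med) / (9 * b_mad)
    w_a = (1 - u ** 2) ** 2 * ((1 - np.abs(u)) > 0)
    w_b = (1 - v ** 2) ** 2 * ((1 - np.abs(v)) > 0)
    a_item = (a - a_med) * w_a
    b_item = (b - b_med) * w_b
    return np.sum(a_item * b_item) / np.sqrt(
        np.sum(a_item ** 2) * np.sum(b_item ** 2))

def bicor_matrix(arr):
    """Column-wise vectorized version, following the same steps as the summary."""
    arr = arr - np.median(arr, axis=0, keepdims=True)
    v = 1 - (arr / (9 * np.median(np.abs(arr), axis=0, keepdims=True))) ** 2
    arr = arr * v ** 2 * (v > 0)
    norms = np.sqrt(np.sum(arr ** 2, axis=0))
    return np.einsum('mi,mj->ij', arr, arr) / norms[:, None] / norms[None, :]

rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))  # synthetic stand-in, not the iris table
m = data.shape[1]
reference = np.array([[bicor_pair_reference(data[:, i], data[:, j])
                       for j in range(m)] for i in range(m)])
matrix = bicor_matrix(data)
print(np.allclose(matrix, reference))
```

The condition v > 0 in the vectorized version is the same test as 1 - |u| > 0 in the reference, since 1 - u ** 2 is positive exactly when |u| < 1.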
Introduction
I am not quite sure why you expect broadcasting to be of help in the current code.
Did you perhaps mean vectorizing?
Anyway, I believe that it is possible to write faster code, and a vectorized version of your "old" approach would outperform your current approach.
This could be made even faster using Numba.
There are two practical approaches to your problem:
1. to manually compute the correlation matrix
2. to generate a correlation function to be passed to pd.DataFrame.corr()
When doing (1), explicit looping may be unavoidable if one does not want to compute redundant parts of the (symmetric) correlation matrix.
When doing (2), it will be necessary to compute the auxiliary value of the computation for each (symmetric) pair of the 1D inputs (2 * comb(n, 2) times), as opposed to computing the auxiliary values only once for each of the 1D inputs (n times). For example, for the input specified in the question, one would need to perform n == 4 pre-computations, but, if done in pairwise fashion, this number becomes 2 * comb(4, 2) == 12.
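The counting argument can be reproduced with a few lines of standard-library Python. This is only a sketch: the helper names are hypothetical, and it assumes the pairwise function is called once per unordered pair of distinct inputs, with each call pre-processing both of its arguments:

```python
from math import comb  # available in Python 3.8+

def precomputations_columnwise(n):
    # Each 1D input is pre-processed exactly once.
    return n

def precomputations_pairwise(n):
    # Each of the comb(n, 2) pairs pre-processes both of its inputs.
    return 2 * comb(n, 2)

for n in (4, 10, 100):
    print(n, precomputations_columnwise(n), precomputations_pairwise(n))
```

The pairwise count grows quadratically with n while the column-wise count grows linearly, so the gap widens quickly as more 1D inputs are added.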
Let us see how we can push the performance in both cases.
Manually Computing the Correlation Matrix
Let us first define a function to serve as a Pandas-to-NumPy bridge:
import numpy as np
import pandas as pd
def corr_np2pd(df, func):
    return pd.DataFrame(func(np.array(df)), index=df.columns, columns=df.columns)
The function with explicit looping that is now in the comments belongs to this category and it is reported below as biweight_midcorrelation_pd_OP():
def biweight_midcorrelation_pd_OP(X):
    median = X.median()
    mad = (X - median).abs().median()
    U = (X - median) / (9 * mad)
    adjacency = np.square(1 - np.square(U)) * ((1 - U.abs()) > 0)
    estimator = (X - median) * adjacency
    bicor_matrix = np.empty((X.shape[1], X.shape[1]), dtype=float)
    for i, ac in enumerate(estimator):
        for j, bc in enumerate(estimator):
            a = estimator[ac]
            b = estimator[bc]
            c = (a * b).sum() / (
                np.sqrt(np.square(a).sum()) * np.sqrt(np.square(b).sum()))
            bicor_matrix[i, j] = c
            bicor_matrix[j, i] = c
    return pd.DataFrame(bicor_matrix, index=X.columns, columns=X.columns)
A slightly modified version of that, where the computation is done entirely in NumPy and which should be used with corr_np2pd(), reads:
def biweight_midcorrelation_OP(arr):
    n, m = arr.shape
    med = np.median(arr, axis=0, keepdims=True)
    mad = np.median(np.abs(arr - med), axis=0, keepdims=True)
    u = (arr - med) / (9 * mad)
    adj = ((1 - u ** 2) ** 2) * ((1 - np.abs(u)) > 0)
    est = (arr - med) * adj
    result = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            a = est[:, i]
            b = est[:, j]
            c = (a * b).sum() / (
                np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2)))
            result[i, j] = result[j, i] = c
    return result
Now, this has some points of improvement:
the intermediate computations can be reduced
the final nested loop could be made more efficient
This last point could be improved in two ways:
by only computing the symmetric indices once, resulting in biweight_midcorrelation_np()
by writing it in vectorized form, resulting in biweight_midcorrelation_npv()
def biweight_midcorrelation_np(arr):
    n, m = arr.shape
    arr = arr - np.median(arr, axis=0, keepdims=True)
    v = 1 - (arr / (9 * np.median(np.abs(arr), axis=0, keepdims=True))) ** 2
    arr = arr * v ** 2 * (v > 0)
    norms = np.sqrt(np.sum(arr ** 2, axis=0))
    result = np.empty((m, m))
    np.fill_diagonal(result, 1.0)
    for i, j in zip(*np.triu_indices(m, 1)):
        result[i, j] = result[j, i] = \
            np.sum(arr[:, i] * arr[:, j]) / norms[i] / norms[j]
    return result
def biweight_midcorrelation_npv(arr):
    n, m = arr.shape
    arr = arr - np.median(arr, axis=0, keepdims=True)
    v = 1 - (arr / (9 * np.median(np.abs(arr), axis=0, keepdims=True))) ** 2
    arr = arr * v ** 2 * (v > 0)
    norms = np.sqrt(np.sum(arr ** 2, axis=0))
    return np.einsum('mi,mj->ij', arr, arr) / norms[:, None] / norms[None, :]
The first one will be fast as long as m is small, because of the explicit looping.
The second one will generally be fast, but it seems inefficient to compute some of the entries of the matrix twice.
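To make the vectorized step more concrete, the following sketch shows that the einsum expression used above is an ordinary matrix product of the transposed array with itself, and that the two broadcast divisions amount to dividing by the outer product of the norms. The array est is a synthetic stand-in for the weighted, median-centred columns, not data from the question:

```python
import numpy as np

rng = np.random.default_rng(1)
est = rng.normal(size=(50, 3))  # stand-in for the weighted, median-centred columns
norms = np.sqrt(np.sum(est ** 2, axis=0))

# Form used in the vectorized function: sum over the row index for every
# pair of columns (i, j), then normalize via broadcasting.
via_einsum = np.einsum('mi,mj->ij', est, est) / norms[:, None] / norms[None, :]

# Equivalent formulation with a plain matrix product and an outer product.
via_matmul = (est.T @ est) / np.outer(norms, norms)

print(np.allclose(via_einsum, via_matmul))
```

Both forms fill all m * m entries, including the diagonal and both halves of the symmetric matrix, which is the redundancy mentioned above.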
To overcome both issues, one could rewrite the final loop with Numba:
import numba as nb
@nb.jit
def _biweight_midcorrelation_triu_nb(n, m, est, norms, result):
    # Fill only the strict upper triangle and mirror it to the lower one
    for i in range(m):
        for j in range(i + 1, m):
            x = 0
            for k in range(n):
                x += est[k, i] * est[k, j]
            result[i, j] = result[j, i] = x / norms[i] / norms[j]
def biweight_midcorrelation_nb(arr):
    n, m = arr.shape
    arr = arr - np.median(arr, axis=0, keepdims=True)
    v = 1 - (arr / (9 * np.median(np.abs(arr), axis=0, keepdims=True))) ** 2
    arr = arr * v ** 2 * (v > 0)
    norms = np.sqrt(np.sum(arr ** 2, axis=0))
    result = np.empty((m, m))
    np.fill_diagonal(result, 1.0)
    _biweight_midcorrelation_triu_nb(n, m, arr, norms, result)
    return result
Pairwise Correlation Function
A slightly modified version of the approach you currently propose belongs to this category:
def pairwise_biweight_midcorrelation_OP(a, b):
    a_median = np.median(a)
    b_median = np.median(b)
    a_mad = np.median(np.abs(a - a_median))
    b_mad = np.median(np.abs(b - b_median))
    u_a = (a - a_median) / (9 * a_mad)
    u_b = (b - b_median) / (9 * b_mad)
    adj_a = (1 - u_a ** 2) ** 2 * ((1 - np.abs(u_a)) > 0)
    adj_b = (1 - u_b ** 2) ** 2 * ((1 - np.abs(u_b)) > 0)
    a = (a - a_median) * adj_a
    b = (b - b_median) * adj_b
    return np.sum(a * b) / (np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2)))
This may be written a bit more concisely, using similar simplifications as above, resulting in:
def pairwise_biweight_midcorrelation_opt(a, b):
    a = a - np.median(a)
    b = b - np.median(b)
    v_a = 1 - (a / (9 * np.median(np.abs(a)))) ** 2
    v_b = 1 - (b / (9 * np.median(np.abs(b)))) ** 2
    a = a * v_a ** 2 * (v_a > 0)
    b = b * v_b ** 2 * (v_b > 0)
    return np.sum(a * b) / (np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2)))
The last line sums over a and b three times, but this could actually be done in a single loop, which again can be made fast with Numba:
@nb.jit
def pairwise_biweight_midcorrelation_nb(a, b):
    n = a.size
    a = a - np.median(a)
    b = b - np.median(b)
    v_a = 1 - (a / (9 * np.median(np.abs(a)))) ** 2
    v_b = 1 - (b / (9 * np.median(np.abs(b)))) ** 2
    a = (v_a > 0) * a * v_a ** 2
    b = (v_b > 0) * b * v_b ** 2
    # Accumulate the three sums in a single pass over the data
    s_ab = s_aa = s_bb = 0
    for i in range(n):
        s_ab += a[i] * b[i]
        s_aa += a[i] * a[i]
        s_bb += b[i] * b[i]
    return s_ab / np.sqrt(s_aa) / np.sqrt(s_bb)
But there is no simple way to avoid performing the pre-computations 2 * comb(n, 2) times instead of n times.
The other side of the story is that this class of approaches requires less memory, as only two 1D arrays are considered at each iteration.
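For completeness, here is a minimal sketch of how a pairwise function of this kind plugs into pd.DataFrame.corr() through its method argument. The function name pairwise_bicor, the column labels, and the random data are all placeholders chosen for illustration:

```python
import numpy as np
import pandas as pd

def pairwise_bicor(a, b):
    """Pairwise biweight midcorrelation of two 1D arrays (compact form)."""
    a = a - np.median(a)
    b = b - np.median(b)
    v_a = 1 - (a / (9 * np.median(np.abs(a)))) ** 2
    v_b = 1 - (b / (9 * np.median(np.abs(b)))) ** 2
    a = a * v_a ** 2 * (v_a > 0)
    b = b * v_b ** 2 * (v_b > 0)
    return np.sum(a * b) / (np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2)))

rng = np.random.default_rng(2)
frame = pd.DataFrame(rng.normal(size=(60, 3)), columns=["p", "q", "r"])

# pandas calls the function on pairs of columns and assembles a labeled,
# symmetric matrix; only two 1D arrays are handled per call.
corr = frame.corr(method=pairwise_bicor)
print(corr)
```

The result keeps the column labels of the input, so no separate NumPy-to-Pandas bridge is needed for this class of approaches.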
Testing
For the suggested input:
import pandas as pd
df = pd.DataFrame({'sepal_length': {'iris_0': 5.1, 'iris_1': 4.9, 'iris_2': 4.7, 'iris_3': 4.6, 'iris_4': 5.0, 'iris_5': 5.4, 'iris_6': 4.6, 'iris_7': 5.0, 'iris_8': 4.4, 'iris_9': 4.9, 'iris_10': 5.4, 'iris_11': 4.8, 'iris_12': 4.8, 'iris_13': 4.3, 'iris_14': 5.8, 'iris_15': 5.7, 'iris_16': 5.4, 'iris_17': 5.1, 'iris_18': 5.7, 'iris_19': 5.1, 'iris_20': 5.4, 'iris_21': 5.1, 'iris_22': 4.6, 'iris_23': 5.1, 'iris_24': 4.8, 'iris_25': 5.0, 'iris_26': 5.0, 'iris_27': 5.2, 'iris_28': 5.2, 'iris_29': 4.7, 'iris_30': 4.8, 'iris_31': 5.4, 'iris_32': 5.2, 'iris_33': 5.5, 'iris_34': 4.9, 'iris_35': 5.0, 'iris_36': 5.5, 'iris_37': 4.9, 'iris_38': 4.4, 'iris_39': 5.1, 'iris_40': 5.0, 'iris_41': 4.5, 'iris_42': 4.4, 'iris_43': 5.0, 'iris_44': 5.1, 'iris_45': 4.8, 'iris_46': 5.1, 'iris_47': 4.6, 'iris_48': 5.3, 'iris_49': 5.0, 'iris_50': 7.0, 'iris_51': 6.4, 'iris_52': 6.9, 'iris_53': 5.5, 'iris_54': 6.5, 'iris_55': 5.7, 'iris_56': 6.3, 'iris_57': 4.9, 'iris_58': 6.6, 'iris_59': 5.2, 'iris_60': 5.0, 'iris_61': 5.9, 'iris_62': 6.0, 'iris_63': 6.1, 'iris_64': 5.6, 'iris_65': 6.7, 'iris_66': 5.6, 'iris_67': 5.8, 'iris_68': 6.2, 'iris_69': 5.6, 'iris_70': 5.9, 'iris_71': 6.1, 'iris_72': 6.3, 'iris_73': 6.1, 'iris_74': 6.4, 'iris_75': 6.6, 'iris_76': 6.8, 'iris_77': 6.7, 'iris_78': 6.0, 'iris_79': 5.7, 'iris_80': 5.5, 'iris_81': 5.5, 'iris_82': 5.8, 'iris_83': 6.0, 'iris_84': 5.4, 'iris_85': 6.0, 'iris_86': 6.7, 'iris_87': 6.3, 'iris_88': 5.6, 'iris_89': 5.5, 'iris_90': 5.5, 'iris_91': 6.1, 'iris_92': 5.8, 'iris_93': 5.0, 'iris_94': 5.6, 'iris_95': 5.7, 'iris_96': 5.7, 'iris_97': 6.2, 'iris_98': 5.1, 'iris_99': 5.7, 'iris_100': 6.3, 'iris_101': 5.8, 'iris_102': 7.1, 'iris_103': 6.3, 'iris_104': 6.5, 'iris_105': 7.6, 'iris_106': 4.9, 'iris_107': 7.3, 'iris_108': 6.7, 'iris_109': 7.2, 'iris_110': 6.5, 'iris_111': 6.4, 'iris_112': 6.8, 'iris_113': 5.7, 'iris_114': 5.8, 'iris_115': 6.4, 'iris_116': 6.5, 'iris_117': 7.7, 'iris_118': 7.7, 'iris_119': 6.0, 'iris_120': 6.9, 'iris_121': 5.6, 
'iris_122': 7.7, 'iris_123': 6.3, 'iris_124': 6.7, 'iris_125': 7.2, 'iris_126': 6.2, 'iris_127': 6.1, 'iris_128': 6.4, 'iris_129': 7.2, 'iris_130': 7.4, 'iris_131': 7.9, 'iris_132': 6.4, 'iris_133': 6.3, 'iris_134': 6.1, 'iris_135': 7.7, 'iris_136': 6.3, 'iris_137': 6.4, 'iris_138': 6.0, 'iris_139': 6.9, 'iris_140': 6.7, 'iris_141': 6.9, 'iris_142': 5.8, 'iris_143': 6.8, 'iris_144': 6.7, 'iris_145': 6.7, 'iris_146': 6.3, 'iris_147': 6.5, 'iris_148': 6.2, 'iris_149': 5.9}, 'sepal_width': {'iris_0': 3.5, 'iris_1': 3.0, 'iris_2': 3.2, 'iris_3': 3.1, 'iris_4': 3.6, 'iris_5': 3.9, 'iris_6': 3.4, 'iris_7': 3.4, 'iris_8': 2.9, 'iris_9': 3.1, 'iris_10': 3.7, 'iris_11': 3.4, 'iris_12': 3.0, 'iris_13': 3.0, 'iris_14': 4.0, 'iris_15': 4.4, 'iris_16': 3.9, 'iris_17': 3.5, 'iris_18': 3.8, 'iris_19': 3.8, 'iris_20': 3.4, 'iris_21': 3.7, 'iris_22': 3.6, 'iris_23': 3.3, 'iris_24': 3.4, 'iris_25': 3.0, 'iris_26': 3.4, 'iris_27': 3.5, 'iris_28': 3.4, 'iris_29': 3.2, 'iris_30': 3.1, 'iris_31': 3.4, 'iris_32': 4.1, 'iris_33': 4.2, 'iris_34': 3.1, 'iris_35': 3.2, 'iris_36': 3.5, 'iris_37': 3.6, 'iris_38': 3.0, 'iris_39': 3.4, 'iris_40': 3.5, 'iris_41': 2.3, 'iris_42': 3.2, 'iris_43': 3.5, 'iris_44': 3.8, 'iris_45': 3.0, 'iris_46': 3.8, 'iris_47': 3.2, 'iris_48': 3.7, 'iris_49': 3.3, 'iris_50': 3.2, 'iris_51': 3.2, 'iris_52': 3.1, 'iris_53': 2.3, 'iris_54': 2.8, 'iris_55': 2.8, 'iris_56': 3.3, 'iris_57': 2.4, 'iris_58': 2.9, 'iris_59': 2.7, 'iris_60': 2.0, 'iris_61': 3.0, 'iris_62': 2.2, 'iris_63': 2.9, 'iris_64': 2.9, 'iris_65': 3.1, 'iris_66': 3.0, 'iris_67': 2.7, 'iris_68': 2.2, 'iris_69': 2.5, 'iris_70': 3.2, 'iris_71': 2.8, 'iris_72': 2.5, 'iris_73': 2.8, 'iris_74': 2.9, 'iris_75': 3.0, 'iris_76': 2.8, 'iris_77': 3.0, 'iris_78': 2.9, 'iris_79': 2.6, 'iris_80': 2.4, 'iris_81': 2.4, 'iris_82': 2.7, 'iris_83': 2.7, 'iris_84': 3.0, 'iris_85': 3.4, 'iris_86': 3.1, 'iris_87': 2.3, 'iris_88': 3.0, 'iris_89': 2.5, 'iris_90': 2.6, 'iris_91': 3.0, 'iris_92': 2.6, 'iris_93': 2.3, 'iris_94': 
2.7, 'iris_95': 3.0, 'iris_96': 2.9, 'iris_97': 2.9, 'iris_98': 2.5, 'iris_99': 2.8, 'iris_100': 3.3, 'iris_101': 2.7, 'iris_102': 3.0, 'iris_103': 2.9, 'iris_104': 3.0, 'iris_105': 3.0, 'iris_106': 2.5, 'iris_107': 2.9, 'iris_108': 2.5, 'iris_109': 3.6, 'iris_110': 3.2, 'iris_111': 2.7, 'iris_112': 3.0, 'iris_113': 2.5, 'iris_114': 2.8, 'iris_115': 3.2, 'iris_116': 3.0, 'iris_117': 3.8, 'iris_118': 2.6, 'iris_119': 2.2, 'iris_120': 3.2, 'iris_121': 2.8, 'iris_122': 2.8, 'iris_123': 2.7, 'iris_124': 3.3, 'iris_125': 3.2, 'iris_126': 2.8, 'iris_127': 3.0, 'iris_128': 2.8, 'iris_129': 3.0, 'iris_130': 2.8, 'iris_131': 3.8, 'iris_132': 2.8, 'iris_133': 2.8, 'iris_134': 2.6, 'iris_135': 3.0, 'iris_136': 3.4, 'iris_137': 3.1, 'iris_138': 3.0, 'iris_139': 3.1, 'iris_140': 3.1, 'iris_141': 3.1, 'iris_142': 2.7, 'iris_143': 3.2, 'iris_144': 3.3, 'iris_145': 3.0, 'iris_146': 2.5, 'iris_147': 3.0, 'iris_148': 3.4, 'iris_149': 3.0}, 'petal_length': {'iris_0': 1.4, 'iris_1': 1.4, 'iris_2': 1.3, 'iris_3': 1.5, 'iris_4': 1.4, 'iris_5': 1.7, 'iris_6': 1.4, 'iris_7': 1.5, 'iris_8': 1.4, 'iris_9': 1.5, 'iris_10': 1.5, 'iris_11': 1.6, 'iris_12': 1.4, 'iris_13': 1.1, 'iris_14': 1.2, 'iris_15': 1.5, 'iris_16': 1.3, 'iris_17': 1.4, 'iris_18': 1.7, 'iris_19': 1.5, 'iris_20': 1.7, 'iris_21': 1.5, 'iris_22': 1.0, 'iris_23': 1.7, 'iris_24': 1.9, 'iris_25': 1.6, 'iris_26': 1.6, 'iris_27': 1.5, 'iris_28': 1.4, 'iris_29': 1.6, 'iris_30': 1.6, 'iris_31': 1.5, 'iris_32': 1.5, 'iris_33': 1.4, 'iris_34': 1.5, 'iris_35': 1.2, 'iris_36': 1.3, 'iris_37': 1.4, 'iris_38': 1.3, 'iris_39': 1.5, 'iris_40': 1.3, 'iris_41': 1.3, 'iris_42': 1.3, 'iris_43': 1.6, 'iris_44': 1.9, 'iris_45': 1.4, 'iris_46': 1.6, 'iris_47': 1.4, 'iris_48': 1.5, 'iris_49': 1.4, 'iris_50': 4.7, 'iris_51': 4.5, 'iris_52': 4.9, 'iris_53': 4.0, 'iris_54': 4.6, 'iris_55': 4.5, 'iris_56': 4.7, 'iris_57': 3.3, 'iris_58': 4.6, 'iris_59': 3.9, 'iris_60': 3.5, 'iris_61': 4.2, 'iris_62': 4.0, 'iris_63': 4.7, 'iris_64': 3.6, 'iris_65': 4.4, 
'iris_66': 4.5, 'iris_67': 4.1, 'iris_68': 4.5, 'iris_69': 3.9, 'iris_70': 4.8, 'iris_71': 4.0, 'iris_72': 4.9, 'iris_73': 4.7, 'iris_74': 4.3, 'iris_75': 4.4, 'iris_76': 4.8, 'iris_77': 5.0, 'iris_78': 4.5, 'iris_79': 3.5, 'iris_80': 3.8, 'iris_81': 3.7, 'iris_82': 3.9, 'iris_83': 5.1, 'iris_84': 4.5, 'iris_85': 4.5, 'iris_86': 4.7, 'iris_87': 4.4, 'iris_88': 4.1, 'iris_89': 4.0, 'iris_90': 4.4, 'iris_91': 4.6, 'iris_92': 4.0, 'iris_93': 3.3, 'iris_94': 4.2, 'iris_95': 4.2, 'iris_96': 4.2, 'iris_97': 4.3, 'iris_98': 3.0, 'iris_99': 4.1, 'iris_100': 6.0, 'iris_101': 5.1, 'iris_102': 5.9, 'iris_103': 5.6, 'iris_104': 5.8, 'iris_105': 6.6, 'iris_106': 4.5, 'iris_107': 6.3, 'iris_108': 5.8, 'iris_109': 6.1, 'iris_110': 5.1, 'iris_111': 5.3, 'iris_112': 5.5, 'iris_113': 5.0, 'iris_114': 5.1, 'iris_115': 5.3, 'iris_116': 5.5, 'iris_117': 6.7, 'iris_118': 6.9, 'iris_119': 5.0, 'iris_120': 5.7, 'iris_121': 4.9, 'iris_122': 6.7, 'iris_123': 4.9, 'iris_124': 5.7, 'iris_125': 6.0, 'iris_126': 4.8, 'iris_127': 4.9, 'iris_128': 5.6, 'iris_129': 5.8, 'iris_130': 6.1, 'iris_131': 6.4, 'iris_132': 5.6, 'iris_133': 5.1, 'iris_134': 5.6, 'iris_135': 6.1, 'iris_136': 5.6, 'iris_137': 5.5, 'iris_138': 4.8, 'iris_139': 5.4, 'iris_140': 5.6, 'iris_141': 5.1, 'iris_142': 5.1, 'iris_143': 5.9, 'iris_144': 5.7, 'iris_145': 5.2, 'iris_146': 5.0, 'iris_147': 5.2, 'iris_148': 5.4, 'iris_149': 5.1}, 'petal_width': {'iris_0': 0.2, 'iris_1': 0.2, 'iris_2': 0.2, 'iris_3': 0.2, 'iris_4': 0.2, 'iris_5': 0.4, 'iris_6': 0.3, 'iris_7': 0.2, 'iris_8': 0.2, 'iris_9': 0.1, 'iris_10': 0.2, 'iris_11': 0.2, 'iris_12': 0.1, 'iris_13': 0.1, 'iris_14': 0.2, 'iris_15': 0.4, 'iris_16': 0.4, 'iris_17': 0.3, 'iris_18': 0.3, 'iris_19': 0.3, 'iris_20': 0.2, 'iris_21': 0.4, 'iris_22': 0.2, 'iris_23': 0.5, 'iris_24': 0.2, 'iris_25': 0.2, 'iris_26': 0.4, 'iris_27': 0.2, 'iris_28': 0.2, 'iris_29': 0.2, 'iris_30': 0.2, 'iris_31': 0.4, 'iris_32': 0.1, 'iris_33': 0.2, 'iris_34': 0.2, 'iris_35': 0.2, 'iris_36': 0.2, 
'iris_37': 0.1, 'iris_38': 0.2, 'iris_39': 0.2, 'iris_40': 0.3, 'iris_41': 0.3, 'iris_42': 0.2, 'iris_43': 0.6, 'iris_44': 0.4, 'iris_45': 0.3, 'iris_46': 0.2, 'iris_47': 0.2, 'iris_48': 0.2, 'iris_49': 0.2, 'iris_50': 1.4, 'iris_51': 1.5, 'iris_52': 1.5, 'iris_53': 1.3, 'iris_54': 1.5, 'iris_55': 1.3, 'iris_56': 1.6, 'iris_57': 1.0, 'iris_58': 1.3, 'iris_59': 1.4, 'iris_60': 1.0, 'iris_61': 1.5, 'iris_62': 1.0, 'iris_63': 1.4, 'iris_64': 1.3, 'iris_65': 1.4, 'iris_66': 1.5, 'iris_67': 1.0, 'iris_68': 1.5, 'iris_69': 1.1, 'iris_70': 1.8, 'iris_71': 1.3, 'iris_72': 1.5, 'iris_73': 1.2, 'iris_74': 1.3, 'iris_75': 1.4, 'iris_76': 1.4, 'iris_77': 1.7, 'iris_78': 1.5, 'iris_79': 1.0, 'iris_80': 1.1, 'iris_81': 1.0, 'iris_82': 1.2, 'iris_83': 1.6, 'iris_84': 1.5, 'iris_85': 1.6, 'iris_86': 1.5, 'iris_87': 1.3, 'iris_88': 1.3, 'iris_89': 1.3, 'iris_90': 1.2, 'iris_91': 1.4, 'iris_92': 1.2, 'iris_93': 1.0, 'iris_94': 1.3, 'iris_95': 1.2, 'iris_96': 1.3, 'iris_97': 1.3, 'iris_98': 1.1, 'iris_99': 1.3, 'iris_100': 2.5, 'iris_101': 1.9, 'iris_102': 2.1, 'iris_103': 1.8, 'iris_104': 2.2, 'iris_105': 2.1, 'iris_106': 1.7, 'iris_107': 1.8, 'iris_108': 1.8, 'iris_109': 2.5, 'iris_110': 2.0, 'iris_111': 1.9, 'iris_112': 2.1, 'iris_113': 2.0, 'iris_114': 2.4, 'iris_115': 2.3, 'iris_116': 1.8, 'iris_117': 2.2, 'iris_118': 2.3, 'iris_119': 1.5, 'iris_120': 2.3, 'iris_121': 2.0, 'iris_122': 2.0, 'iris_123': 1.8, 'iris_124': 2.1, 'iris_125': 1.8, 'iris_126': 1.8, 'iris_127': 1.8, 'iris_128': 2.1, 'iris_129': 1.6, 'iris_130': 1.9, 'iris_131': 2.0, 'iris_132': 2.2, 'iris_133': 1.5, 'iris_134': 1.4, 'iris_135': 2.3, 'iris_136': 2.4, 'iris_137': 1.8, 'iris_138': 1.8, 'iris_139': 2.1, 'iris_140': 2.4, 'iris_141': 2.3, 'iris_142': 1.9, 'iris_143': 2.3, 'iris_144': 2.5, 'iris_145': 2.3, 'iris_146': 1.9, 'iris_147': 2.0, 'iris_148': 2.3, 'iris_149': 1.8}})
we obtain:
print(np.all(np.isclose(biweight_midcorrelation_pd_OP(df), result)))
# True
print(np.all(np.isclose(corr_np2pd(df, biweight_midcorrelation_OP), result)))
# True
print(np.all(np.isclose(corr_np2pd(df, biweight_midcorrelation_np), result)))
# True
print(np.all(np.isclose(corr_np2pd(df, biweight_midcorrelation_npv), result)))
# True
print(np.all(np.isclose(corr_np2pd(df, biweight_midcorrelation_nb), result)))
# True
print(np.all(np.isclose(df.corr(method=pairwise_biweight_midcorrelation_OP), result)))
# True
print(np.all(np.isclose(df.corr(method=pairwise_biweight_midcorrelation_opt), result)))
# True
print(np.all(np.isclose(df.corr(method=pairwise_biweight_midcorrelation_nb), result)))
# True
Benchmarks
%timeit biweight_midcorrelation_pd_OP(df)
# 10 loops, best of 3: 22.1 ms per loop
%timeit corr_np2pd(df, biweight_midcorrelation_OP)
# 1000 loops, best of 3: 682 µs per loop
%timeit corr_np2pd(df, biweight_midcorrelation_np)
# 1000 loops, best of 3: 422 µs per loop
%timeit corr_np2pd(df, biweight_midcorrelation_npv)
# 1000 loops, best of 3: 341 µs per loop
%timeit corr_np2pd(df, biweight_midcorrelation_nb)
# 1000 loops, best of 3: 325 µs per loop
%timeit df.corr(method=pairwise_biweight_midcorrelation_OP)
# 100 loops, best of 3: 1.96 ms per loop
%timeit df.corr(method=pairwise_biweight_midcorrelation_opt)
# 100 loops, best of 3: 1.83 ms per loop
%timeit df.corr(method=pairwise_biweight_midcorrelation_nb)
# 1000 loops, best of 3: 506 µs per loop
These results would indicate the Numba-based approach to be the fastest, closely followed by the NumPy-vectorized version of your original approach.
Note that going from a Pandas-based computation to a pure NumPy-based approach (even with explicit looping) gives a roughly 30x speed-up (22.1 ms vs. 682 µs).
And vectorizing the two for loops buys us another approx. 2x factor.
The pd.DataFrame.corr() based approach(es) are, when not using Numba, approx. 4x slower than your original approach rewritten in NumPy, so be careful even if you do not see explicit looping!
The Numba accelerated pairwise_biweight_midcorrelation_nb() gives a significant boost to this family of approaches, but it cannot possibly avoid the overhead of the pre-computations.
Final warning: all these benchmarks should be taken with a grain of salt!
(EDITED to include a Numba-based approach to use with pd.DataFrame.corr()).
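Outside IPython, the same kind of comparison can be reproduced with the standard timeit module. The following is only a minimal sketch: corr_loop and corr_vectorized are hypothetical stand-ins that compute a plain Pearson correlation matrix (not the biweight midcorrelation functions benchmarked above), and the random input is an assumption chosen to match the 150 x 4 shape of the iris data.

```python
import timeit

import numpy as np


def corr_loop(arr):
    """Pearson correlation matrix using explicit Python loops over column pairs."""
    n_cols = arr.shape[1]
    out = np.empty((n_cols, n_cols))
    for i in range(n_cols):
        for j in range(n_cols):
            a = arr[:, i] - arr[:, i].mean()
            b = arr[:, j] - arr[:, j].mean()
            out[i, j] = (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())
    return out


def corr_vectorized(arr):
    """Same quantity, with both loops replaced by one matrix product."""
    centered = arr - arr.mean(axis=0)
    cov = centered.T @ centered
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)


# Hypothetical input with the same shape as the iris feature table.
rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))

# Check correctness first, then time; a fast wrong answer is useless.
assert np.allclose(corr_loop(data), corr_vectorized(data))

for func in (corr_loop, corr_vectorized):
    best = min(timeit.repeat(lambda: func(data), number=200, repeat=3)) / 200
    print(f"{func.__name__}: {best * 1e6:.1f} µs per call")
```

The absolute numbers depend on the machine and library versions, so only the ratio between the two timings is meaningful.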
With a copy-n-paste of your X:
In [26]: X
Out[26]:
sepal_length sepal_width petal_length petal_width
iris_0 5.1 3.5 1.4 0.2
iris_1 4.9 3.0 1.4 0.2
iris_2 4.7 3.2 1.3 0.2
iris_3 4.6 3.1 1.5 0.2
iris_4 5.0 3.6 1.4 0.2
... ... ... ... ...
iris_145 6.7 3.0 5.2 2.3
iris_146 6.3 2.5 5.0 1.9
iris_147 6.5 3.0 5.2 2.0
iris_148 6.2 3.4 5.4 2.3
iris_149 5.9 3.0 5.1 1.8
[150 rows x 4 columns]
and using it:
In [29]: X.corr(method=_biweight_midcorrelation)
Out[29]:
sepal_length sepal_width petal_length petal_width
sepal_length 1.000000 -0.134780 0.831958 0.818575
sepal_width -0.134780 1.000000 -0.430312 -0.374034
petal_length 0.831958 -0.430312 1.000000 0.952285
petal_width 0.818575 -0.374034 0.952285 1.000000
In [30]: X.corr?
In [31]: _biweight_midcorrelation(X['sepal_length'],X['sepal_width'])
Out[31]: -0.13477989268659313
In [32]: _biweight_midcorrelation(X['sepal_length'],X['petal_length'])
Out[32]: 0.831958204443503
In _biweight_midcorrelation(a, b), a and b are Series of the same size, so all the arrays derived from them have the same shape and (a_item * b_item) just works elementwise (the broadcasting rules for two 1d arrays of equal length). I don't see any need for 'outer products'.
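To illustrate the point about shapes, here is a small sketch of a pairwise callable passed to DataFrame.corr(). The function centered_product_corr is a hypothetical stand-in that computes an ordinary Pearson correlation, not the biweight midcorrelation, and the three-column frame is made up for the example. It only shows that elementwise products of equal-length 1d inputs are enough.

```python
import numpy as np
import pandas as pd


def centered_product_corr(a, b):
    """Pairwise correlation of two equal-length 1d inputs (Pearson, as a stand-in)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = a - a.mean()
    db = b - b.mean()
    # da * db is elementwise: both operands have shape (n,), so no outer product is needed.
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))


# Made-up data, only for illustration.
df = pd.DataFrame({
    "p": [1.0, 2.0, 3.0, 4.0],
    "q": [2.0, 1.0, 4.0, 3.0],
    "r": [4.0, 3.0, 2.0, 1.0],
})

# The elementwise product keeps the 1d shape of its inputs.
print((df["p"] * df["q"]).shape)

custom = df.corr(method=centered_product_corr)
builtin = df.corr()
print(np.allclose(custom, builtin))
```

pandas calls the function once per pair of columns and assembles the symmetric matrix itself, so the callable never has to deal with 2d inputs.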

I am unable to access and make some operation in 3D array in Python

df is the 3D array shown below, made up of three 2D arrays. I need to access the last column of one of those 2D arrays across all of its rows.
array([[[4.3, 3.0, 1.1, 0.1, 'Setosa'],
[4.4, 3.2, 1.3, 0.2, 'Setosa'],
[4.4, 3.0, 1.3, 0.2, 'Setosa'],
[4.4, 2.9, 1.4, 0.2, 'Setosa'],
[4.5, 2.3, 1.3, 0.3, 'Setosa'],
[4.6, 3.6, 1.0, 0.2, 'Setosa'],
[4.6, 3.1, 1.5, 0.2, 'Setosa'],
[4.6, 3.4, 1.4, 0.3, 'Setosa'],
[4.6, 3.2, 1.4, 0.2, 'Setosa'],
[4.7, 3.2, 1.3, 0.2, 'Setosa'],
[4.7, 3.2, 1.6, 0.2, 'Setosa'],
[4.8, 3.0, 1.4, 0.1, 'Setosa'],
[4.8, 3.0, 1.4, 0.3, 'Setosa'],
[4.8, 3.4, 1.9, 0.2, 'Setosa'],
[4.8, 3.4, 1.6, 0.2, 'Setosa'],
[4.8, 3.1, 1.6, 0.2, 'Setosa'],
[4.9, 2.4, 3.3, 1.0, 'Versicolor'],
[4.9, 2.5, 4.5, 1.7, 'Virginica'],
[4.9, 3.1, 1.5, 0.2, 'Setosa'],
[4.9, 3.1, 1.5, 0.1, 'Setosa'],
[4.9, 3.6, 1.4, 0.1, 'Setosa'],
[4.9, 3.0, 1.4, 0.2, 'Setosa'],
[5.0, 3.5, 1.3, 0.3, 'Setosa'],
[5.0, 3.4, 1.6, 0.4, 'Setosa'],
[5.0, 3.3, 1.4, 0.2, 'Setosa'],
[5.0, 3.2, 1.2, 0.2, 'Setosa'],
[5.0, 3.5, 1.6, 0.6, 'Setosa'],
[5.0, 2.0, 3.5, 1.0, 'Versicolor'],
[5.0, 3.4, 1.5, 0.2, 'Setosa'],
[5.0, 2.3, 3.3, 1.0, 'Versicolor'],
[5.0, 3.6, 1.4, 0.2, 'Setosa'],
[5.0, 3.0, 1.6, 0.2, 'Setosa'],
[5.1, 3.8, 1.9, 0.4, 'Setosa'],
[5.1, 3.8, 1.6, 0.2, 'Setosa'],
[5.1, 2.5, 3.0, 1.1, 'Versicolor'],
[5.1, 3.5, 1.4, 0.2, 'Setosa'],
[5.1, 3.4, 1.5, 0.2, 'Setosa'],
[5.1, 3.5, 1.4, 0.3, 'Setosa'],
[5.1, 3.3, 1.7, 0.5, 'Setosa'],
[5.1, 3.7, 1.5, 0.4, 'Setosa'],
[5.1, 3.8, 1.5, 0.3, 'Setosa'],
[5.2, 4.1, 1.5, 0.1, 'Setosa'],
[5.2, 3.4, 1.4, 0.2, 'Setosa'],
[5.2, 3.5, 1.5, 0.2, 'Setosa'],
[5.2, 2.7, 3.9, 1.4, 'Versicolor'],
[5.3, 3.7, 1.5, 0.2, 'Setosa'],
[5.4, 3.0, 4.5, 1.5, 'Versicolor'],
[5.4, 3.9, 1.7, 0.4, 'Setosa'],
[5.4, 3.4, 1.7, 0.2, 'Setosa'],
[5.4, 3.4, 1.5, 0.4, 'Setosa']],
[[5.4, 3.7, 1.5, 0.2, 'Setosa'],
[5.4, 3.9, 1.3, 0.4, 'Setosa'],
[5.5, 3.5, 1.3, 0.2, 'Setosa'],
[5.5, 2.6, 4.4, 1.2, 'Versicolor'],
[5.5, 4.2, 1.4, 0.2, 'Setosa'],
[5.5, 2.3, 4.0, 1.3, 'Versicolor'],
[5.5, 2.4, 3.7, 1.0, 'Versicolor'],
[5.5, 2.4, 3.8, 1.1, 'Versicolor'],
[5.5, 2.5, 4.0, 1.3, 'Versicolor'],
[5.6, 3.0, 4.1, 1.3, 'Versicolor'],
[5.6, 2.8, 4.9, 2.0, 'Virginica'],
[5.6, 3.0, 4.5, 1.5, 'Versicolor'],
[5.6, 2.5, 3.9, 1.1, 'Versicolor'],
[5.6, 2.7, 4.2, 1.3, 'Versicolor'],
[5.6, 2.9, 3.6, 1.3, 'Versicolor'],
[5.7, 2.6, 3.5, 1.0, 'Versicolor'],
[5.7, 2.9, 4.2, 1.3, 'Versicolor'],
[5.7, 2.8, 4.1, 1.3, 'Versicolor'],
[5.7, 4.4, 1.5, 0.4, 'Setosa'],
[5.7, 2.8, 4.5, 1.3, 'Versicolor'],
[5.7, 2.5, 5.0, 2.0, 'Virginica'],
[5.7, 3.8, 1.7, 0.3, 'Setosa'],
[5.7, 3.0, 4.2, 1.2, 'Versicolor'],
[5.8, 2.7, 4.1, 1.0, 'Versicolor'],
[5.8, 4.0, 1.2, 0.2, 'Setosa'],
[5.8, 2.6, 4.0, 1.2, 'Versicolor'],
[5.8, 2.8, 5.1, 2.4, 'Virginica'],
[5.8, 2.7, 5.1, 1.9, 'Virginica'],
[5.8, 2.7, 3.9, 1.2, 'Versicolor'],
[5.8, 2.7, 5.1, 1.9, 'Virginica'],
[5.9, 3.0, 5.1, 1.8, 'Virginica'],
[5.9, 3.0, 4.2, 1.5, 'Versicolor'],
[5.9, 3.2, 4.8, 1.8, 'Versicolor'],
[6.0, 2.9, 4.5, 1.5, 'Versicolor'],
[6.0, 2.7, 5.1, 1.6, 'Versicolor'],
[6.0, 3.0, 4.8, 1.8, 'Virginica'],
[6.0, 3.4, 4.5, 1.6, 'Versicolor'],
[6.0, 2.2, 4.0, 1.0, 'Versicolor'],
[6.0, 2.2, 5.0, 1.5, 'Virginica'],
[6.1, 3.0, 4.9, 1.8, 'Virginica'],
[6.1, 2.6, 5.6, 1.4, 'Virginica'],
[6.1, 2.8, 4.0, 1.3, 'Versicolor'],
[6.1, 2.9, 4.7, 1.4, 'Versicolor'],
[6.1, 2.8, 4.7, 1.2, 'Versicolor'],
[6.1, 3.0, 4.6, 1.4, 'Versicolor'],
[6.2, 2.2, 4.5, 1.5, 'Versicolor'],
[6.2, 2.9, 4.3, 1.3, 'Versicolor'],
[6.2, 3.4, 5.4, 2.3, 'Virginica'],
[6.2, 2.8, 4.8, 1.8, 'Virginica'],
[6.3, 2.5, 4.9, 1.5, 'Versicolor']],
[[6.3, 2.7, 4.9, 1.8, 'Virginica'],
[6.3, 2.5, 5.0, 1.9, 'Virginica'],
[6.3, 3.3, 4.7, 1.6, 'Versicolor'],
[6.3, 2.8, 5.1, 1.5, 'Virginica'],
[6.3, 3.3, 6.0, 2.5, 'Virginica'],
[6.3, 2.3, 4.4, 1.3, 'Versicolor'],
[6.3, 3.4, 5.6, 2.4, 'Virginica'],
[6.3, 2.9, 5.6, 1.8, 'Virginica'],
[6.4, 2.8, 5.6, 2.2, 'Virginica'],
[6.4, 2.8, 5.6, 2.1, 'Virginica'],
[6.4, 3.1, 5.5, 1.8, 'Virginica'],
[6.4, 3.2, 4.5, 1.5, 'Versicolor'],
[6.4, 3.2, 5.3, 2.3, 'Virginica'],
[6.4, 2.9, 4.3, 1.3, 'Versicolor'],
[6.4, 2.7, 5.3, 1.9, 'Virginica'],
[6.5, 3.0, 5.8, 2.2, 'Virginica'],
[6.5, 3.0, 5.5, 1.8, 'Virginica'],
[6.5, 3.0, 5.2, 2.0, 'Virginica'],
[6.5, 2.8, 4.6, 1.5, 'Versicolor'],
[6.5, 3.2, 5.1, 2.0, 'Virginica'],
[6.6, 2.9, 4.6, 1.3, 'Versicolor'],
[6.6, 3.0, 4.4, 1.4, 'Versicolor'],
[6.7, 3.1, 4.7, 1.5, 'Versicolor'],
[6.7, 3.1, 5.6, 2.4, 'Virginica'],
[6.7, 2.5, 5.8, 1.8, 'Virginica'],
[6.7, 3.0, 5.0, 1.7, 'Versicolor'],
[6.7, 3.1, 4.4, 1.4, 'Versicolor'],
[6.7, 3.3, 5.7, 2.5, 'Virginica'],
[6.7, 3.0, 5.2, 2.3, 'Virginica'],
[6.7, 3.3, 5.7, 2.1, 'Virginica'],
[6.8, 3.2, 5.9, 2.3, 'Virginica'],
[6.8, 2.8, 4.8, 1.4, 'Versicolor'],
[6.8, 3.0, 5.5, 2.1, 'Virginica'],
[6.9, 3.1, 5.4, 2.1, 'Virginica'],
[6.9, 3.1, 5.1, 2.3, 'Virginica'],
[6.9, 3.1, 4.9, 1.5, 'Versicolor'],
[6.9, 3.2, 5.7, 2.3, 'Virginica'],
[7.0, 3.2, 4.7, 1.4, 'Versicolor'],
[7.1, 3.0, 5.9, 2.1, 'Virginica'],
[7.2, 3.0, 5.8, 1.6, 'Virginica'],
[7.2, 3.2, 6.0, 1.8, 'Virginica'],
[7.2, 3.6, 6.1, 2.5, 'Virginica'],
[7.3, 2.9, 6.3, 1.8, 'Virginica'],
[7.4, 2.8, 6.1, 1.9, 'Virginica'],
[7.6, 3.0, 6.6, 2.1, 'Virginica'],
[7.7, 2.8, 6.7, 2.0, 'Virginica'],
[7.7, 2.6, 6.9, 2.3, 'Virginica'],
[7.7, 3.8, 6.7, 2.2, 'Virginica'],
[7.7, 3.0, 6.1, 2.3, 'Virginica'],
[7.9, 3.8, 6.4, 2.0, 'Virginica']]], dtype=object)
I tried:
element = df[g_index[[i],[4]]]
but it returns an error
def Numberofoccurences(data,sortcolindex,g_index):
    df = DivideColumns(data,sortcolindex)
    num_Iris_setosa = 0
    num_Iris_versicolor = 0
    num_Iris_virginica = 0
    for i in range(0,50):
        element = df[g_index[[i],[4]]]
        if(element == 'Setosa'):
            num_Iris_setosa+=1
        elif(element == 'Versicolor'):
            num_Iris_versicolor+=1
        elif (element == 'Virginica'):
            num_Iris_virginica+=1
    array1D_for_occ = np.array([num_Iris_virginica,num_Iris_versicolor,num_Iris_setosa])
    return array1D_for_occ

numberofocc = Numberofoccurences(iris_array,0,0)
The error I get is
TypeError: 'int' object is not subscriptable
'Not subscriptable' means you cannot index into the object. Here g_index is an int (you pass 0), so g_index[[i],[4]] is not allowed.
Maybe you want:
element = df[g_index, i, 4]
Here is a fix for your function. Loop over each row in df[g_index]; each[-1] gives the last element of that row, since -1 indexes the last element of a sequence.
def Numberofoccurences(data,sortcolindex,g_index):
    df = DivideColumns(data,sortcolindex)
    num_Iris_setosa = 0
    num_Iris_versicolor = 0
    num_Iris_virginica = 0
    # df[g_index] is one 2D block; each row ends with the species label
    for each in df[g_index]:
        element = each[-1]
        if(element == 'Setosa'):
            num_Iris_setosa+=1
        elif(element == 'Versicolor'):
            num_Iris_versicolor+=1
        elif (element == 'Virginica'):
            num_Iris_virginica+=1
    array1D_for_occ = np.array([num_Iris_virginica,num_Iris_versicolor,num_Iris_setosa])
    return array1D_for_occ
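The same count can also be done without an explicit loop by slicing the label column directly. This is a minimal sketch, assuming the data is a 3D NumPy object array like the one in the question; count_species and groups are illustrative names, and the small array below reuses a few rows from the question rather than the full 3 x 50 x 5 data.

```python
from collections import Counter

import numpy as np

# Two groups of four rows each, copied from the question's array.
groups = np.array([
    [[4.3, 3.0, 1.1, 0.1, 'Setosa'],
     [4.4, 3.2, 1.3, 0.2, 'Setosa'],
     [4.9, 2.4, 3.3, 1.0, 'Versicolor'],
     [4.9, 2.5, 4.5, 1.7, 'Virginica']],
    [[5.4, 3.7, 1.5, 0.2, 'Setosa'],
     [5.5, 2.6, 4.4, 1.2, 'Versicolor'],
     [5.6, 3.0, 4.1, 1.3, 'Versicolor'],
     [5.6, 2.8, 4.9, 2.0, 'Virginica']],
], dtype=object)


def count_species(groups, g_index):
    """Count labels in the last column of one 2D block of a 3D array."""
    # groups[g_index, :, -1] selects block g_index, all rows, last column.
    labels = groups[g_index, :, -1]
    counts = Counter(labels)
    # Same order as the original function: Virginica, Versicolor, Setosa.
    return np.array([counts['Virginica'], counts['Versicolor'], counts['Setosa']])


print(count_species(groups, 0))
print(count_species(groups, 1))
```

Counter returns 0 for a missing key, so a block that lacks one of the species does not need special handling.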

python access second element of list

When I print my list I get something like this
[[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
I want to extract first and second elements from above list into separate lists so that I can ask the plt to plot it for me.
So my results should be
[6.0, 6.1, 6.2, ..., 6.8] and [0.5, 1.0, 1.5, 2.0, ..., 4.5]
I want to know if there is a cleaner solution than this:
for sublist in l:
    i=0
    for item in sublist:
        flat_list.append(item)
        break #get first element of each
You can index into each sublist inside a list comprehension:
data = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
d1 = [item[0] for item in data]
print(d1)
d2 = [item[1] for item in data]
print(d2)
Output:
[6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8]
[0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
zip() will provide the required output.
xy = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
x,y = zip(*xy)
print(x)
print(y)
Output:
(6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8)
(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5)
zip() aggregates the elements from all the iterables it is given. zip(x, y) would pair the values back up into the pairs you currently have (as tuples), and zip() with * can be used to unzip a list.
Also, there is no need to convert the tuples to list since pyplot.plot() takes an array-like parameter.
import matplotlib.pyplot as plt
plt.plot(x,y)
plt.show()
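The round trip described above can be checked directly. A small sketch using the first three pairs from the question (pairs and rebuilt are illustrative names):

```python
pairs = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5]]

# Unzip: the * operator passes each inner list as a separate argument to zip().
x, y = zip(*pairs)
print(x)
print(y)

# Zip again: pairing x and y back up recovers the original structure.
# zip() yields tuples, so convert each pair to a list before comparing.
rebuilt = [list(p) for p in zip(x, y)]
print(rebuilt == pairs)
```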
I would recommend using numpy arrays. For example:
import matplotlib.pyplot as plt
import numpy as np
a= np.array([[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]])
plt.plot(a[:,0], a[:,1])
plt.show()
Here is an attempt with zip. zip() returns an iterator of tuples that aggregates elements from the iterables passed to it, so map() is used to convert those tuples to lists:
l = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
a,b = map(list,zip(*l))
print(a,b)
The output will look like this:
[6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8] [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
One-liner using zip built-in and unpacking
>>> original = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
>>> left, right = zip(*original)
>>> left
(6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8)
>>> right
(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5)
If you would rather have lists than tuples, convert the results using the map built-in:
>>> left, right = map(list, zip(*original))
>>> left
[6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8]
>>> right
[0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
Lots of pure Python approaches here. But given that your goal is to plot the separated values, I think there's a case to be made here for the simplicity of Pandas - just drop the list as-is into a data frame and plot():
import pandas as pd
pd.DataFrame(data).plot(x=0, y=1)
l = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
a,b=list(zip(*l))
print('first elements:',a)
print('second elements:',b)
To plot:
import matplotlib.pyplot as plt
l = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
a,b=list(zip(*l))
print('first elements:',a)
print('second elements:',b)
plt.plot(a,b)
plt.show()
