Numpy: operands could not be broadcast together with shapes - python

I have a numpy array R with dimensions 150x3 and another numpy array D with dimensions 150x4.
I am trying to compute np.dot(R.T, D) but I get
ValueError: operands could not be broadcast together with shapes (3,150) (150,4)
But when I do np.dot(D.T, R), I don't get any error.
What is wrong with np.dot(R.T, D)?
R = [[ 9.61020742e-02 3.46156874e-01 5.57741052e-01]
[ 7.89559849e-03 1.94729924e-01 7.97374478e-01]
[ 9.86036469e-03 3.58806741e-01 6.31332895e-01]
[ 2.48034126e-03 8.04021220e-01 1.93498439e-01]
[ 8.83193916e-02 5.48842033e-01 3.62838576e-01]
[ 3.71353736e-01 1.17560018e-01 5.11086246e-01]
[ 1.06980365e-02 4.27750286e-01 5.61551678e-01]
[ 3.86811475e-02 6.15241737e-01 3.46077116e-01]
[ 8.72297668e-04 6.71562777e-01 3.27564925e-01]
[ 6.89735774e-03 8.56750517e-01 1.36352125e-01]
[ 3.56313831e-01 2.78079828e-01 3.65606341e-01]
[ 7.57943813e-03 9.18418851e-01 7.40017112e-02]
[ 5.41821292e-03 7.30246525e-01 2.64335263e-01]
[ 1.47647182e-03 6.71706805e-01 3.26816723e-01]
[ 8.04498616e-01 3.75237147e-03 1.91749012e-01]
[ 8.97990546e-01 5.24969629e-03 9.67597575e-02]
[ 2.19970730e-01 4.20443727e-03 7.75824833e-01]
[ 6.09849253e-02 7.81150046e-02 8.60900070e-01]
[ 5.82902465e-01 7.96470608e-02 3.37450474e-01]
[ 1.90567056e-01 3.44574089e-01 4.64858855e-01]
[ 1.18054009e-01 5.35847701e-01 3.46098290e-01]
[ 8.64050519e-02 6.25795101e-02 8.51015438e-01]
[ 3.78483444e-02 2.02516101e-01 7.59635554e-01]
[ 8.59779741e-03 1.45064881e-02 9.76895714e-01]
[ 9.99346444e-04 9.94183378e-01 4.81727604e-03]
[ 9.40391340e-03 4.85716808e-01 5.04879279e-01]
[ 2.13243738e-02 8.65263690e-02 8.92149257e-01]
[ 1.15701417e-01 4.32636874e-01 4.51661709e-01]
[ 8.86018157e-02 1.84982960e-01 7.26415224e-01]
[ 3.01781162e-03 9.01584406e-01 9.53977822e-02]
[ 4.56990100e-03 7.91352466e-01 2.04077633e-01]
[ 3.45927029e-02 4.87892600e-03 9.60528371e-01]
[ 1.60632883e-01 8.27044274e-01 1.23228432e-02]
[ 8.35100215e-01 9.16902815e-02 7.32095037e-02]
[ 1.07230994e-02 4.73656742e-01 5.15620159e-01]
[ 1.87299922e-02 4.60373499e-02 9.35232658e-01]
[ 2.00924000e-01 2.51558825e-02 7.73920117e-01]
[ 2.70681415e-02 9.19211663e-01 5.37201953e-02]
[ 1.59735470e-03 5.41196014e-01 4.57206631e-01]
[ 5.95274793e-02 4.87221419e-01 4.53251102e-01]
[ 4.30642592e-02 5.31354413e-02 9.03800300e-01]
[ 4.82644394e-05 6.88366845e-03 9.93068067e-01]
[ 2.68874993e-03 7.18010358e-01 2.79300892e-01]
[ 6.64131338e-03 3.02540302e-03 9.90333284e-01]
[ 7.16077254e-02 7.62597372e-01 1.65794903e-01]
[ 3.26066793e-03 5.55729613e-02 9.41166371e-01]
[ 8.29860613e-02 8.51236805e-01 6.57771335e-02]
[ 4.92325113e-03 7.02327028e-01 2.92749721e-01]
[ 2.68651482e-01 4.07439949e-01 3.23908569e-01]
[ 3.48779651e-02 3.09743232e-01 6.55378803e-01]
[ 4.10371575e-02 3.25115421e-02 9.26451300e-01]
[ 4.23526092e-03 1.57896741e-02 9.79975065e-01]
[ 1.42010306e-02 3.56402209e-02 9.50158749e-01]
[ 1.89270683e-05 1.92636219e-02 9.80717451e-01]
[ 8.41129311e-04 5.24018083e-03 9.93918690e-01]
[ 1.69895846e-04 8.43802871e-01 1.56027233e-01]
[ 3.85831004e-03 3.59151277e-02 9.60226562e-01]
[ 2.66002749e-05 2.11684374e-01 7.88289025e-01]
[ 7.94266474e-03 1.78764870e-01 8.13292465e-01]
[ 2.86578904e-05 2.34398695e-02 9.76531473e-01]
[ 6.82866492e-06 1.83193984e-01 8.16799188e-01]
[ 3.43681400e-04 5.48735560e-03 9.94168963e-01]
[ 2.78629059e-04 2.56309299e-01 7.43412072e-01]
[ 1.12432440e-03 5.03094267e-01 4.95781409e-01]
[ 1.84562605e-04 2.85767672e-03 9.96957761e-01]
[ 8.00369161e-03 6.38415085e-03 9.85612158e-01]
[ 3.04872236e-04 2.93281239e-01 7.06413889e-01]
[ 2.44072328e-04 9.43368816e-01 5.63871119e-02]
[ 1.96883860e-05 9.11261180e-04 9.99069050e-01]
[ 1.98351546e-04 3.20962243e-01 6.78839406e-01]
[ 2.68683180e-04 1.02490410e-02 9.89482276e-01]
[ 6.35566443e-04 6.62457274e-03 9.92739861e-01]
[ 2.52439050e-04 6.96820959e-02 9.30065465e-01]
[ 2.43286411e-04 9.69657837e-01 3.00988762e-02]
[ 3.35900677e-03 3.47740518e-02 9.61866941e-01]
[ 4.11624011e-03 7.18683241e-03 9.88696927e-01]
[ 5.05652417e-03 4.91264721e-02 9.45817004e-01]
[ 1.38735573e-03 3.77829350e-03 9.94834351e-01]
[ 4.77430724e-04 3.62754653e-02 9.63247104e-01]
[ 5.06295723e-04 6.35474249e-02 9.35946279e-01]
[ 9.80128411e-05 1.72085050e-01 8.27816937e-01]
[ 1.46161671e-04 3.57717082e-01 6.42136756e-01]
[ 4.33049391e-04 5.20091202e-02 9.47557830e-01]
[ 1.78321609e-04 4.12044951e-01 5.87776727e-01]
[ 1.46347921e-04 5.31652774e-01 4.68200878e-01]
[ 2.46735826e-03 3.67683627e-02 9.60764279e-01]
[ 6.61311324e-03 1.54005833e-02 9.77986303e-01]
[ 1.77071930e-04 1.46768447e-02 9.85146083e-01]
[ 6.83094720e-04 3.13743642e-01 6.85573264e-01]
[ 5.09993051e-05 4.09115493e-02 9.59037451e-01]
[ 2.85804634e-05 9.48028881e-01 5.19425385e-02]
[ 1.87256703e-03 3.68745243e-01 6.29382190e-01]
[ 3.18003668e-04 8.67846435e-02 9.12897353e-01]
[ 2.15156635e-05 9.92444712e-02 9.00734013e-01]
[ 2.01886324e-04 2.67326809e-01 7.32471305e-01]
[ 6.17534155e-04 8.27995308e-01 1.71387158e-01]
[ 6.02903061e-04 3.23796846e-01 6.75600251e-01]
[ 2.29637394e-03 8.97763923e-02 9.07927234e-01]
[ 1.56780115e-05 1.27044886e-03 9.98713873e-01]
[ 3.76740795e-04 1.12972313e-01 8.86650946e-01]
[ 2.87005949e-05 2.22682411e-04 9.99748617e-01]
[ 1.51773249e-05 5.95561038e-03 9.94029212e-01]
[ 6.30006944e-04 1.06470667e-03 9.98305286e-01]
[ 4.17533071e-04 4.37824954e-01 5.61757513e-01]
[ 7.78017325e-05 1.24821334e-03 9.98673985e-01]
[ 6.86924196e-03 5.69614667e-02 9.36169291e-01]
[ 1.79387456e-06 4.14092395e-02 9.58588967e-01]
[ 4.19616727e-03 7.79044623e-01 2.16759210e-01]
[ 2.61103942e-04 1.25771751e-01 8.73967146e-01]
[ 7.08940723e-04 1.96467049e-05 9.99271413e-01]
[ 2.44579204e-04 1.79856898e-04 9.99575564e-01]
[ 6.42095442e-05 1.90401210e-03 9.98031778e-01]
[ 1.51746207e-04 1.13628244e-04 9.99734626e-01]
[ 1.53192339e-06 2.61257119e-04 9.99737211e-01]
[ 2.97672241e-07 5.89714434e-07 9.99999113e-01]
[ 2.31629596e-05 6.05785563e-06 9.99970779e-01]
[ 1.15482703e-03 1.41118116e-01 8.57727057e-01]
[ 1.65303872e-01 1.95290038e-01 6.39406091e-01]
[ 3.87539407e-04 2.76367328e-03 9.96848787e-01]
[ 3.64304533e-05 2.12756841e-01 7.87206729e-01]
[ 1.32951469e-04 2.07806473e-05 9.99846268e-01]
[ 4.71164398e-06 5.41626002e-04 9.99453662e-01]
[ 7.37701536e-03 2.26568147e-01 7.66054837e-01]
[ 5.19394987e-05 5.08165056e-04 9.99439895e-01]
[ 9.41246091e-04 3.90044573e-03 9.95158308e-01]
[ 1.70565028e-02 5.24482561e-01 4.58460936e-01]
[ 5.79928507e-05 4.85233058e-04 9.99456774e-01]
[ 1.71363177e-04 4.43037286e-03 9.95398264e-01]
[ 3.65223499e-05 9.98234443e-04 9.98965243e-01]
[ 1.03256791e-02 7.82752824e-01 2.06921497e-01]
[ 3.39500386e-03 3.19231731e-02 9.64681823e-01]
[ 4.42402046e-01 1.33305750e-01 4.24292203e-01]
[ 1.50282347e-05 1.46067493e-04 9.99838904e-01]
[ 7.75031897e-04 6.09241229e-01 3.89983739e-01]
[ 2.82386491e-06 9.99311738e-01 6.85438538e-04]
[ 4.49243709e-04 7.25519763e-06 9.99543501e-01]
[ 4.52262007e-05 5.28929786e-05 9.99901881e-01]
[ 1.35577576e-03 2.85828013e-01 7.12816211e-01]
[ 1.15450174e-04 2.87523077e-03 9.97009319e-01]
[ 2.33931095e-04 3.96689006e-05 9.99726400e-01]
[ 1.88516051e-05 2.20935448e-06 9.99978939e-01]
[ 1.95262213e-05 5.10008372e-08 9.99980423e-01]
[ 1.51773249e-05 5.95561038e-03 9.94029212e-01]
[ 1.81083155e-04 2.23844268e-04 9.99595073e-01]
[ 2.70194754e-05 1.79050294e-06 9.99971190e-01]
[ 1.07794371e-05 2.41601142e-07 9.99988979e-01]
[ 9.80962323e-06 8.73547012e-05 9.99902836e-01]
[ 1.12317424e-04 2.11402370e-04 9.99676280e-01]
[ 5.84789388e-05 9.18351123e-05 9.99849686e-01]
[ 1.84899708e-04 7.34680565e-02 9.26347044e-01]]
and
D = [[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]
[ 5. 3.6 1.4 0.2]
[ 5.4 3.9 1.7 0.4]
[ 4.6 3.4 1.4 0.3]
[ 5. 3.4 1.5 0.2]
[ 4.4 2.9 1.4 0.2]
[ 4.9 3.1 1.5 0.1]
[ 5.4 3.7 1.5 0.2]
[ 4.8 3.4 1.6 0.2]
[ 4.8 3. 1.4 0.1]
[ 4.3 3. 1.1 0.1]
[ 5.8 4. 1.2 0.2]
[ 5.7 4.4 1.5 0.4]
[ 5.4 3.9 1.3 0.4]
[ 5.1 3.5 1.4 0.3]
[ 5.7 3.8 1.7 0.3]
[ 5.1 3.8 1.5 0.3]
[ 5.4 3.4 1.7 0.2]
[ 5.1 3.7 1.5 0.4]
[ 4.6 3.6 1. 0.2]
[ 5.1 3.3 1.7 0.5]
[ 4.8 3.4 1.9 0.2]
[ 5. 3. 1.6 0.2]
[ 5. 3.4 1.6 0.4]
[ 5.2 3.5 1.5 0.2]
[ 5.2 3.4 1.4 0.2]
[ 4.7 3.2 1.6 0.2]
[ 4.8 3.1 1.6 0.2]
[ 5.4 3.4 1.5 0.4]
[ 5.2 4.1 1.5 0.1]
[ 5.5 4.2 1.4 0.2]
[ 4.9 3.1 1.5 0.2]
[ 5. 3.2 1.2 0.2]
[ 5.5 3.5 1.3 0.2]
[ 4.9 3.6 1.4 0.1]
[ 4.4 3. 1.3 0.2]
[ 5.1 3.4 1.5 0.2]
[ 5. 3.5 1.3 0.3]
[ 4.5 2.3 1.3 0.3]
[ 4.4 3.2 1.3 0.2]
[ 5. 3.5 1.6 0.6]
[ 5.1 3.8 1.9 0.4]
[ 4.8 3. 1.4 0.3]
[ 5.1 3.8 1.6 0.2]
[ 4.6 3.2 1.4 0.2]
[ 5.3 3.7 1.5 0.2]
[ 5. 3.3 1.4 0.2]
[ 7. 3.2 4.7 1.4]
[ 6.4 3.2 4.5 1.5]
[ 6.9 3.1 4.9 1.5]
[ 5.5 2.3 4. 1.3]
[ 6.5 2.8 4.6 1.5]
[ 5.7 2.8 4.5 1.3]
[ 6.3 3.3 4.7 1.6]
[ 4.9 2.4 3.3 1. ]
[ 6.6 2.9 4.6 1.3]
[ 5.2 2.7 3.9 1.4]
[ 5. 2. 3.5 1. ]
[ 5.9 3. 4.2 1.5]
[ 6. 2.2 4. 1. ]
[ 6.1 2.9 4.7 1.4]
[ 5.6 2.9 3.6 1.3]
[ 6.7 3.1 4.4 1.4]
[ 5.6 3. 4.5 1.5]
[ 5.8 2.7 4.1 1. ]
[ 6.2 2.2 4.5 1.5]
[ 5.6 2.5 3.9 1.1]
[ 5.9 3.2 4.8 1.8]
[ 6.1 2.8 4. 1.3]
[ 6.3 2.5 4.9 1.5]
[ 6.1 2.8 4.7 1.2]
[ 6.4 2.9 4.3 1.3]
[ 6.6 3. 4.4 1.4]
[ 6.8 2.8 4.8 1.4]
[ 6.7 3. 5. 1.7]
[ 6. 2.9 4.5 1.5]
[ 5.7 2.6 3.5 1. ]
[ 5.5 2.4 3.8 1.1]
[ 5.5 2.4 3.7 1. ]
[ 5.8 2.7 3.9 1.2]
[ 6. 2.7 5.1 1.6]
[ 5.4 3. 4.5 1.5]
[ 6. 3.4 4.5 1.6]
[ 6.7 3.1 4.7 1.5]
[ 6.3 2.3 4.4 1.3]
[ 5.6 3. 4.1 1.3]
[ 5.5 2.5 4. 1.3]
[ 5.5 2.6 4.4 1.2]
[ 6.1 3. 4.6 1.4]
[ 5.8 2.6 4. 1.2]
[ 5. 2.3 3.3 1. ]
[ 5.6 2.7 4.2 1.3]
[ 5.7 3. 4.2 1.2]
[ 5.7 2.9 4.2 1.3]
[ 6.2 2.9 4.3 1.3]
[ 5.1 2.5 3. 1.1]
[ 5.7 2.8 4.1 1.3]
[ 6.3 3.3 6. 2.5]
[ 5.8 2.7 5.1 1.9]
[ 7.1 3. 5.9 2.1]
[ 6.3 2.9 5.6 1.8]
[ 6.5 3. 5.8 2.2]
[ 7.6 3. 6.6 2.1]
[ 4.9 2.5 4.5 1.7]
[ 7.3 2.9 6.3 1.8]
[ 6.7 2.5 5.8 1.8]
[ 7.2 3.6 6.1 2.5]
[ 6.5 3.2 5.1 2. ]
[ 6.4 2.7 5.3 1.9]
[ 6.8 3. 5.5 2.1]
[ 5.7 2.5 5. 2. ]
[ 5.8 2.8 5.1 2.4]
[ 6.4 3.2 5.3 2.3]
[ 6.5 3. 5.5 1.8]
[ 7.7 3.8 6.7 2.2]
[ 7.7 2.6 6.9 2.3]
[ 6. 2.2 5. 1.5]
[ 6.9 3.2 5.7 2.3]
[ 5.6 2.8 4.9 2. ]
[ 7.7 2.8 6.7 2. ]
[ 6.3 2.7 4.9 1.8]
[ 6.7 3.3 5.7 2.1]
[ 7.2 3.2 6. 1.8]
[ 6.2 2.8 4.8 1.8]
[ 6.1 3. 4.9 1.8]
[ 6.4 2.8 5.6 2.1]
[ 7.2 3. 5.8 1.6]
[ 7.4 2.8 6.1 1.9]
[ 7.9 3.8 6.4 2. ]
[ 6.4 2.8 5.6 2.2]
[ 6.3 2.8 5.1 1.5]
[ 6.1 2.6 5.6 1.4]
[ 7.7 3. 6.1 2.3]
[ 6.3 3.4 5.6 2.4]
[ 6.4 3.1 5.5 1.8]
[ 6. 3. 4.8 1.8]
[ 6.9 3.1 5.4 2.1]
[ 6.7 3.1 5.6 2.4]
[ 6.9 3.1 5.1 2.3]
[ 5.8 2.7 5.1 1.9]
[ 6.8 3.2 5.9 2.3]
[ 6.7 3.3 5.7 2.5]
[ 6.7 3. 5.2 2.3]
[ 6.3 2.5 5. 1.9]
[ 6.5 3. 5.2 2. ]
[ 6.2 3.4 5.4 2.3]
[ 5.9 3. 5.1 1.8]]

>>> r = np.random.random((150, 3))
>>> d = np.random.random((150, 4))
>>>
>>> np.dot(r.T, d)
array([[ 42.50248324, 36.47470278, 37.01957774, 36.7750468 ],
[ 38.44103843, 32.94992495, 33.91911815, 35.04781215],
[ 44.35562949, 40.94601697, 40.07220766, 40.87044229]])
>>> np.dot(r.T, d).shape
(3, 4)
>>>
>>> np.dot(r.T, d) == np.dot(d.T, r).T
array([[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True]], dtype=bool)
>>>
>>> np.dot(r, d)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: matrices are not aligned
>>> np.version.version
'1.7.1'
Works just the same on numpy 1.8.0b2. The only difference is the specific wording of the ValueError message.
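For what it's worth, np.dot raises an alignment error, while the "operands could not be broadcast together" wording in the question comes from NumPy's element-wise broadcasting. A sketch reproducing both behaviours with the same shapes (random data, as in the session above):

```python
import numpy as np

r = np.random.random((150, 3))
d = np.random.random((150, 4))

# Element-wise multiplication tries to broadcast (3, 150) against (150, 4)
# and fails -- this produces the wording from the question.
try:
    _ = r.T * d
except ValueError as e:
    print(e)  # operands could not be broadcast together ...

# The matrix product with the same operands is perfectly aligned.
print(np.dot(r.T, d).shape)  # (3, 4)
```

So the traceback in the question most likely came from `*` (or another element-wise operation), not from np.dot itself.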

Related

How do I mask only the output (labelled data)? There is no problem with the input data.

I have many NaN values in my output data, and I padded those values with zeros. Please don't suggest deleting the NaNs or imputing them with any other number; I want the model to skip those NaN positions.
example:
x = np.arange(0.5, 30)
x.shape = [10, 3]
x = [[ 0.5 1.5 2.5]
[ 3.5 4.5 5.5]
[ 6.5 7.5 8.5]
[ 9.5 10.5 11.5]
[12.5 13.5 14.5]
[15.5 16.5 17.5]
[18.5 19.5 20.5]
[21.5 22.5 23.5]
[24.5 25.5 26.5]
[27.5 28.5 29.5]]
y = np.arange(2, 10, 0.8)
y.shape = [10, 1]
y[4, 0] = 0.0
y[6, 0] = 0.0
y[7, 0] = 0.0
y = [[2. ]
[2.8]
[3.6]
[4.4]
[0. ]
[6. ]
[0. ]
[0. ]
[8.4]
[9.2]]
I expect the Keras deep learning model to predict zeros for the 5th, 7th and 8th rows, matching the padded values in 'y'.
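The usual way to get that behaviour is a masked loss, rather than hoping the model learns the pad value. Below is a sketch of the idea in plain NumPy (a Keras version would express the same arithmetic with backend ops; `masked_mse` and the 0.0 pad marker are illustrative assumptions from the question, not a standard API):

```python
import numpy as np

def masked_mse(y_true, y_pred):
    """Mean squared error computed only where y_true is not the pad value (0.0)."""
    mask = (y_true != 0.0).astype(float)
    sq_err = (y_true - y_pred) ** 2
    # Average only over the unmasked positions; guard against an all-pad batch.
    return np.sum(sq_err * mask) / np.maximum(np.sum(mask), 1.0)

y_true = np.array([2.0, 2.8, 0.0, 4.4])   # 0.0 marks a padded NaN
y_pred = np.array([2.5, 2.8, 9.9, 4.4])
print(masked_mse(y_true, y_pred))         # the padded row contributes nothing
```

With such a loss the padded positions never generate a gradient, which is what "skipping" the NaN rows really means during training.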

fillna with max value of each group in python

Dataframe
df=pd.DataFrame({"sym":["a","a","aa","aa","aa","a","ab","ab","ab"],
"id_h":[2.1, 2.2 , 2.5 , 3.1 , 2.5, 3.8 , 2.5, 5,6],
"pm_h":[np.nan, 2.3, np.nan , 2.8, 2.7, 3.7, 2.4, 4.9,np.nan]})
I want to fill the pm_h NaN values with the max id_h value of each "sym" group, i.e. (a, aa, ab).
Required output:
df1=pd.DataFrame({"sym":["a","a","aa","aa","aa","a","ab","ab","ab"],
"id_h":[2.1, 2.2, 2.5, 3.1, 2.5, 3.8, 2.5, 5, 6],
"pm_h":[3.8, 2.3, 3.1, 2.8, 2.7, 3.7, 2.4, 4.9, 6]})
Use Series.fillna with GroupBy.transform, which returns the per-group maxima as a new Series with the same index as the original:
df['pm_h'] = df['pm_h'].fillna(df.groupby('sym')['id_h'].transform('max'))
print (df)
sym id_h pm_h
0 a 2.1 3.8
1 a 2.2 2.3
2 aa 2.5 3.1
3 aa 3.1 2.8
4 aa 2.5 2.7
5 a 3.8 3.7
6 ab 2.5 2.4
7 ab 5.0 4.9
8 ab 6.0 6.0

Vectorized equivalent of dict.get

I'm looking for functionality that operates like this:
lookup_dict = {5:1.0, 12:2.0, 39:2.0...}
# this is the missing magic:
lookup = vectorized_dict(lookup_dict)
x = numpy.array([5.0, 59.39, 39.49...])
xbins = numpy.trunc(x).astype(numpy.int_)
y = lookup.get(xbins, 0.0)
# the idea is that we get this as the postcondition:
for (result, input) in zip(y, xbins):
    assert(result == lookup_dict.get(input, 0.0))
Is there some flavor of sparse array in numpy (or scipy) that gets at this kind of functionality?
The full context is that I'm binning some samples of a 1-D feature.
As far as I know, numpy does not support different data types in the same array structures but you can achieve a similar result if you are willing to separate keys from values and maintain the keys (and corresponding values) in sorted order:
import numpy as np
keys = np.array([5,12,39])
values = np.array([1.0, 2.0, 2.0])
valueOf5 = values[keys.searchsorted(5)] # 1.0
k = np.array([5,5,12,39,12])
values[keys.searchsorted(k)] # array([1., 1., 2., 2., 2.])
This may not be as efficient as a hash lookup, but it does support indexing with arrays of any number of dimensions.
Note that this assumes the queried keys are always present in the keys array. If one is not, then instead of an error you silently get the value belonging to the next key up.
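If that silent wrong answer is a concern, one possible guard (a sketch; `lookup_with_default` is an illustrative name, not a NumPy function) is to check membership with np.isin before indexing:

```python
import numpy as np

keys = np.array([5, 12, 39])       # must stay sorted for searchsorted
values = np.array([1.0, 2.0, 2.0])

def lookup_with_default(k, default=0.0):
    """Vectorised dict.get: values for known keys, `default` for unknown ones."""
    k = np.asarray(k)
    out = np.full(k.shape, default, dtype=values.dtype)
    present = np.isin(k, keys)                         # which queries are real keys
    out[present] = values[keys.searchsorted(k[present])]
    return out

print(lookup_with_default([5, 59, 39, 12]))  # [1. 0. 2. 2.]
```

This keeps the searchsorted speed for the keys that do exist while matching dict.get semantics for the ones that don't.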
Using np.select to create boolean masks over the array, ([xbins == k for k in lookup_dict]), the values from the dict (lookup_dict.values()), and a default value of 0:
y = np.select(
    [xbins == k for k in lookup_dict],
    lookup_dict.values(),
    0.0
)
# In [17]: y
# Out[17]: array([1., 0., 2.])
This relies on the dictionary's iteration order pairing each condition with the right value, which is guaranteed insertion order from Python 3.7 (and in CPython 3.6 as an implementation detail); I'm not sure what the behaviour would be on older versions.
OR overkill with pandas:
import pandas as pd
s = pd.Series(xbins)
s = s.map(lookup_dict).fillna(0)
Another approach is to use searchsorted on a numpy array holding the integer 'keys', returning the loaded value for the range n <= x < n+1. This may be useful to somebody asking a similar question in the future.
import numpy as np
class NpIntDict:
    """ Class to simulate a python dict get for a numpy array. """

    def __init__(self, dict_in, default=np.nan):
        """ dict_in: a dictionary with integer keys.
            default: the value to be returned for keys not in the dictionary;
                     defaults to np.nan.
            default must be consistent with the dtype of the values.
        """
        # Create a list of dict items sorted by key.
        list_in = sorted(dict_in.items())
        # Create three empty lists.
        key_list = []
        val_list = []
        is_def_mask = []
        for key, value in list_in:
            key = int(key)
            if key not in key_list:  # key not yet in the key list
                # Update the three lists for key as default.
                key_list.append(key)
                val_list.append(default)
                is_def_mask.append(True)
            # Update the lists for key+1. With searchsorted this gives the required results.
            key_list.append(key + 1)
            val_list.append(value)
            is_def_mask.append(False)
        # Add the entry for x > max(key) to the val and is_def_mask lists.
        val_list.append(default)
        is_def_mask.append(True)
        self.keys = np.array(key_list, dtype=np.int64)
        self.values = np.array(val_list)
        self.default_mask = np.array(is_def_mask)

    def set_default(self, default=0):
        """ Set the default to a new value, using self.default_mask.
            Changes the default returned by all future self.get(arr) calls.
        """
        self.values[self.default_mask] = default

    def get(self, arr, default=None):
        """ Return an array looking up the values of `arr` in the dict.
            default can be used to override the default value for this get only.
        """
        if default is None:
            values = self.values
        else:
            values = self.values.copy()
            values[self.default_mask] = default
        return values[np.searchsorted(self.keys, arr, side='right')]
        # side='right' ensures key[ix] <= x < key[ix+1]
        # side='left' would mean key[ix] < x <= key[ix+1]
This could be simplified if there's no requirement to change the default returned after the NpIntDict is created.
To test it:
d = { 2: 5.1, 3: 10.2, 5: 47.1, 8: -6}
# x <2 Return default
# 2 <= x <3 return 5.1
# 3 <= x < 4 return 10.2
# 4 <= x < 5 return default
# 5 <= x < 6 return 47.1
# 6 <= x < 8 return default
# 8 <= x < 9 return -6.
# 9 <= x return default
test = NpIntDict(d, default=0.0)
arr = np.arange(0., 100.).reshape(10, 10) / 10
print(arr)
"""
[[0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
[1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9]
[2. 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9]
[3. 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9]
[4. 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9]
[5. 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9]
[6. 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9]
[7. 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9]
[8. 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9]
[9. 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9]]
"""
print( test.get( arr ) )
"""
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.1]
[10.2 10.2 10.2 10.2 10.2 10.2 10.2 10.2 10.2 10.2]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[47.1 47.1 47.1 47.1 47.1 47.1 47.1 47.1 47.1 47.1]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[-6. -6. -6. -6. -6. -6. -6. -6. -6. -6. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]]
"""
This could be amended to raise an exception if any of the arr elements aren't in the key list. For me returning a default would be more useful.

Meshgrid of z values that match x and y meshgrid values

Edit: Original question was flawed but I am leaving it here for reasons of transparency.
Original:
I have some x, y, z data where x and y are coordinates of a 2D grid and z is a scalar value corresponding to (x, y).
>>> import numpy as np
>>> # Dummy example data
>>> x = np.arange(0.0, 5.0, 0.5)
>>> y = np.arange(1.0, 2.0, 0.1)
>>> z = np.sin(x)**2 + np.cos(y)**2
>>> print "x = ", x, "\n", "y = ", y, "\n", "z = ", z
x = [ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
y = [ 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9]
z = [ 0.29192658 0.43559829 0.83937656 1.06655187 0.85571064 0.36317266
0.02076747 0.13964978 0.62437081 1.06008127]
Using xx, yy = np.meshgrid(x, y) I can get two grids containing x and y values corresponding to each grid position.
>>> xx, yy = np.meshgrid(x, y)
>>> print xx
[[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]]
>>> print yy
[[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. ]
[ 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1]
[ 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2]
[ 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3]
[ 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4]
[ 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5]
[ 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1.6]
[ 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7]
[ 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8]
[ 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9]]
Now I want an array of the same shape for z, where the grid values correspond to the matching x and y values in the original data! But I cannot find an elegant, built-in solution where I do not need to re-grid the data, and I think I am missing some understanding of how I should approach it.
I have tried following this solution (with my real data, not this simple example data, but it should have the same result) but my final grid was not fully populated.
Please help!
Corrected question:
As was pointed out by commenters, my original dummy data was unsuitable for the question I am asking. Here is an improved version of the question:
I have some x, y, z data where x and y are coordinates of a 2D grid and z is a scalar value corresponding to (x, y). The data is read from a text file "data.txt":
#x y z
1.4 0.2 1.93164166734
1.4 0.3 1.88377897779
1.4 0.4 1.81946452501
1.6 0.2 1.9596778849
1.6 0.3 1.91181519535
1.6 0.4 1.84750074257
1.8 0.2 1.90890970517
1.8 0.3 1.86104701562
1.8 0.4 1.79673256284
2.0 0.2 1.78735230743
2.0 0.3 1.73948961789
2.0 0.4 1.67517516511
Loading the text:
>>> import numpy as np
>>> inFile = 'C:\data.txt'
>>> x, y, z = np.loadtxt(inFile, unpack=True, usecols=(0, 1, 2), comments='#', dtype=float)
>>> print x
[ 1.4 1.4 1.4 1.6 1.6 1.6 1.8 1.8 1.8 2. 2. 2. ]
>>> print y
[ 0.2 0.3 0.4 0.2 0.3 0.4 0.2 0.3 0.4 0.2 0.3 0.4]
>>> print z
[ 1.93164167 1.88377898 1.81946453 1.95967788 1.9118152 1.84750074
1.90890971 1.86104702 1.79673256 1.78735231 1.73948962 1.67517517]
Using xx, yy= np.meshgrid(np.unique(x), np.unique(y)) I can get two grids containing x and y values corresponding to each grid position.
>>> xx, yy= np.meshgrid(np.unique(x), np.unique(y))
>>> print xx
[[ 1.4 1.6 1.8 2. ]
[ 1.4 1.6 1.8 2. ]
[ 1.4 1.6 1.8 2. ]]
>>> print yy
[[ 0.2 0.2 0.2 0.2]
[ 0.3 0.3 0.3 0.3]
[ 0.4 0.4 0.4 0.4]]
Now each corresponding cell position in both xx and yy correspond to one of the original grid point locations.
I simply need an equivalent array where the grid values correspond to the matching z values in the original data!
e.g.
[[ 1.93164166734 1.9596778849 1.90890970517 1.78735230743]
 [ 1.88377897779 1.91181519535 1.86104701562 1.73948961789]
 [ 1.81946452501 1.84750074257 1.79673256284 1.67517516511]]
But I cannot find an elegant, built-in solution where I do not need to re-grid the data, and I think I am missing some understanding of how I should approach it. For example, using xx, yy, zz = np.meshgrid(x, y, z) returns three 3D arrays that I don't think I can use.
Please help!
Edit:
I managed to make this example work thanks to the solution from Jaime: Fill 2D numpy array from three 1D numpy arrays
>>> x_vals, x_idx = np.unique(x, return_inverse=True)
>>> y_vals, y_idx = np.unique(y, return_inverse=True)
>>> vals_array = np.empty(x_vals.shape + y_vals.shape)
>>> vals_array.fill(np.nan) # or whatever your desired missing data flag is
>>> vals_array[x_idx, y_idx] = z
>>> zz = vals_array.T
>>> print zz
But the code (with my real input data) that led me down this path was still failing. I have now found the problem: I had been using scipy.ndimage.zoom to resample my gridded data to a higher resolution before generating zz.
>>> import scipy.ndimage
>>> zoom = 2
>>> x = scipy.ndimage.zoom(x, zoom)
>>> y = scipy.ndimage.zoom(y, zoom)
>>> z = scipy.ndimage.zoom(z, zoom)
This produced an array containing many nan entries:
array([[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan]])
When I skip the zoom stage, the correct array is produced:
array([[-22365.93400183, -22092.31794674, -22074.21420168, ...,
-14513.89091599, -12311.97437017, -12088.07062786],
[-29264.34039242, -28775.79743097, -29021.31886353, ...,
-21354.6799064 , -21150.76555669, -21046.41225097],
[-39792.93758344, -39253.50249278, -38859.2562673 , ...,
-24253.36838785, -25714.71895023, -29237.74277727],
...,
[ 44829.24733543, 44779.37084337, 44770.32987311, ...,
21041.42652441, 20777.00408692, 20512.58162671],
[ 44067.26616067, 44054.5398901 , 44007.62587598, ...,
21415.90416488, 21151.48168444, 20887.05918082],
[ 43265.35371973, 43332.5983711 , 43332.21743471, ...,
21780.32283309, 21529.39770759, 21278.47255848]])
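For the corrected example itself there is also a shortcut that avoids the index machinery entirely: the file covers a complete regular grid sorted by x and then y, so z can simply be reshaped (a sketch that assumes exactly that ordering and a fully populated grid; the arrays are the ones loaded from "data.txt" above):

```python
import numpy as np

x = np.array([1.4, 1.4, 1.4, 1.6, 1.6, 1.6, 1.8, 1.8, 1.8, 2.0, 2.0, 2.0])
y = np.array([0.2, 0.3, 0.4] * 4)
z = np.array([1.93164167, 1.88377898, 1.81946453, 1.95967788, 1.91181520,
              1.84750074, 1.90890971, 1.86104702, 1.79673256, 1.78735231,
              1.73948962, 1.67517517])

nx, ny = np.unique(x).size, np.unique(y).size
# Rows are sorted by x, then y, so z fills an (nx, ny) grid row by row;
# transpose so y varies down the rows, matching the meshgrid layout.
zz = z.reshape(nx, ny).T
print(zz.shape)  # (3, 4)
```

Jaime's unique/return_inverse approach is the robust choice for unsorted or incomplete data; the reshape only works when the grid is complete and ordered.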

How to generate a clean x and y axis for a numpy matrix?

I am creating a distance matrix in numpy, with an output like this:
['H', 'B', 'D', 'A', 'I', 'C', 'F']
[[ 0. 2.4 6.1 3.2 5.2 3.9 7.1]
[ 2.4 0. 4.1 1.2 3.2 1.9 5.1]
[ 6.1 4.1 0. 3.1 6.9 2.8 5.2]
[ 3.2 1.2 3.1 0. 4. 0.9 4.1]
[ 5.2 3.2 6.9 4. 0. 4.7 7.9]
[ 3.9 1.9 2.8 0.9 4.7 0. 3.8]
[ 7.1 5.1 5.2 4.1 7.9 3.8 0. ]]
I am printing that x axis by just printing a list before I print the actual matrix, a:
print" ", names
print a
I need the axis in that order, as the list 'names' properly orders the variables with their values in the matrix. But how would I be able to get a similar y axis in numpy?
It is not the prettiest, but this pretty-table print works:
import numpy as np

names = np.array(['H', 'B', 'D', 'A', 'I', 'C', 'F'])
a = np.array([[0. , 2.4, 6.1, 3.2, 5.2, 3.9, 7.1],
              [2.4, 0. , 4.1, 1.2, 3.2, 1.9, 5.1],
              [6.1, 4.1, 0. , 3.1, 6.9, 2.8, 5.2],
              [3.2, 1.2, 3.1, 0. , 4. , 0.9, 4.1],
              [5.2, 3.2, 6.9, 4. , 0. , 4.7, 7.9],
              [3.9, 1.9, 2.8, 0.9, 4.7, 0. , 3.8],
              [7.1, 5.1, 5.2, 4.1, 7.9, 3.8, 0. ]])

def pptable(x_axis, y_axis, table):
    def format_field(field, fmt='{:,.2f}'):
        if type(field) is str: return field
        if type(field) is tuple: return field[1].format(field[0])
        return fmt.format(field)

    def get_max_col_w(table, index):
        return max([len(format_field(row[index])) for row in table])

    for i, l in enumerate(table):
        l.insert(0, y_axis[i])
    x_axis.insert(0, ' ')
    table.insert(0, x_axis)
    col_paddings = [get_max_col_w(table, i) for i in range(len(table[0]))]
    for i, row in enumerate(table):
        # left col
        row_tab = [str(row[0]).ljust(col_paddings[0])]
        # rest of the cols
        row_tab += [format_field(row[j]).rjust(col_paddings[j])
                    for j in range(1, len(row))]
        print(' '.join(row_tab))

x_axis = ['x{}'.format(c) for c in names]
y_axis = ['y{}'.format(c) for c in names]
pptable(x_axis, y_axis, a.tolist())
Prints:
xH xB xD xA xI xC xF
yH 0.00 2.40 6.10 3.20 5.20 3.90 7.10
yB 2.40 0.00 4.10 1.20 3.20 1.90 5.10
yD 6.10 4.10 0.00 3.10 6.90 2.80 5.20
yA 3.20 1.20 3.10 0.00 4.00 0.90 4.10
yI 5.20 3.20 6.90 4.00 0.00 4.70 7.90
yC 3.90 1.90 2.80 0.90 4.70 0.00 3.80
yF 7.10 5.10 5.20 4.10 7.90 3.80 0.00
If you want the X and Y axis to be the same, just call it with two lists of the same labels.
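If pandas is available, a much lighter alternative is to wrap the matrix in a DataFrame, which prints labelled rows and columns automatically (a sketch using the same data):

```python
import numpy as np
import pandas as pd

names = ['H', 'B', 'D', 'A', 'I', 'C', 'F']
a = np.array([[0. , 2.4, 6.1, 3.2, 5.2, 3.9, 7.1],
              [2.4, 0. , 4.1, 1.2, 3.2, 1.9, 5.1],
              [6.1, 4.1, 0. , 3.1, 6.9, 2.8, 5.2],
              [3.2, 1.2, 3.1, 0. , 4. , 0.9, 4.1],
              [5.2, 3.2, 6.9, 4. , 0. , 4.7, 7.9],
              [3.9, 1.9, 2.8, 0.9, 4.7, 0. , 3.8],
              [7.1, 5.1, 5.2, 4.1, 7.9, 3.8, 0. ]])

# Row and column labels come for free, and alignment is handled by pandas.
df = pd.DataFrame(a, index=names, columns=names)
print(df)
```

The labelled frame also makes lookups by name trivial, e.g. `df.loc['H', 'B']` for the H-B distance.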
