Numpy indexes order - python
I'm confused about how np.zeros() dimensions are handled.
I have a pandas dataframe with some toy data
# A toy 3col x 4row Dataframe
a = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9],[10,11,12]],columns=['colA','colB','colC'])
b = pd.DataFrame([[40,41,42],[43,44,45],[46,47,48],[49,50,51]],columns=['colA','colB','colC'])
colA colB colC
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
I want to get this two dataframes into a 3D numpy array of dimensions (4 rows, 3cols, 2channels), so I can calculate statistics between the two dataframes (eg: average, max values, etc...)
So I basically create a 3D array of zeros and populate each channel with the values from the dataframes.
But it looks like the dimensions are not correctly arranged.
c = np.zeros((4, 3, 2))
c[:,:,0] = a.values
c[:,:,1] = b.values
array([[[ 1., 40.],
[ 2., 41.],
[ 3., 42.]],
[[ 4., 43.],
[ 5., 44.],
[ 6., 45.]],
[[ 7., 46.],
[ 8., 47.],
[ 9., 48.]],
[[10., 49.],
[11., 50.],
[12., 51.]]])
If I put the number of channels as the first index then it is correctly arranged.
However this is very counterintuitive, usually in 3-dimensional data the channel is the third index, not the first.
c = np.zeros((2,4,3))
c[0,:,:] = a.values
c[1,:,:] = b.values
array([[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[10., 11., 12.]],
[[40., 41., 42.],
[43., 44., 45.],
[46., 47., 48.],
[49., 50., 51.]]])
I don't understand this logic. Why the third dimension (channel) is the first index instead of the last one?
When I calculate the average over the two channels I have to do it using axis=0 which is very confusing. Anybody looking at the code will think it's a columnwise average instead of an average between channels.
Am I doing anything wrong?
Regarding intuition, this is just a typical way of access in most (all?) programming language. Typically, when you do something like:
my_array[a][b][c][d]
which is more common than the Numpy style indexing considering all languages, what you typically mean is:
From my_array, get block a. This is an inner block.
From the previous block, get block b. this is an inner block.
From the previous block, get block c. this is an inner block.
From the previous block, get item d (which happens to not be a block because it's the last dimensions).
The order is always outer most dimensions to inner most dimension. This has nothing two do with images or channels. So in your example, if you expect c[0] to return the channel rather than what you call the row, then that is the intuition. You always put first your outer dimension - just like when you have an image as an array the first dimensions is rows (height) and then columns (width).
This entire conversation is ignoring FORTRAN based array orderings (Matlab uses that for example) where columns is "outer" to rows, by definition. If you came from those languages (to Python and C based orderings row->column), that is a common source of misunderstanding. In this case, intuition just equals what you are used to working with, which is subjective and somewhat arbitrary.
usually i think in 3-dimensional data the channel is the first index, as shown in your second codes. it how it is arranged. so just using it that way
This would be my approach
>>> x = a.values.reshape((a.shape[0], a.shape[1], 1)) # Convert 2D to 3D - One layer
>>> y = b.values.reshape((b.shape[0], b.shape[1], 1)) # Convert 2D to 3D - Second layer
>>> z = np.concatenate((x, y), axis=2) # Concatenate on 3rd(starts from zero) axis
Which would yeild something similar to your array, which is correct.
array([[[ 1, 40],
[ 2, 41],
[ 3, 42]],
[[ 4, 43],
[ 5, 44],
[ 6, 45]],
[[ 7, 46],
[ 8, 47],
[ 9, 48]],
[[10, 49],
[11, 50],
[12, 51]]])
Also, If you wanna visually see the array in dataframe (just for checking)
>>> pd.DataFrame(z.tolist())
0 1 2
0 [1, 40] [2, 41] [3, 42]
1 [4, 43] [5, 44] [6, 45]
2 [7, 46] [8, 47] [9, 48]
3 [10, 49] [11, 50] [12, 51]
Related
graph theory - connect point in 3D space with other three nearest points (distance based)
I want to connect nodes (Atoms) with the three nearest nodes(atoms). I am doing ad1 = np.zeros((31,31)) for i in range(31): dist_i = dist_mat[i] cut_off = sorted(dist_i)[3] ad1[i, np.where((dist_i<=cut_off) & (dist_i!=0.0))] = 1 np.sum(ad1, axis=1) #[3,3,3,3........3] np.sum(ad1, axis=0) #array([3., 3., 2., 2., 2., 4., 2., 5., 2., 3., 2., 3., 3., 2., 3., 6., 3., 5., 3., 4., 2., 3., 2., 4., 3., 3., 2., 4., 3., 2., 3.]) I want np.sum(ad1, axis=0) to be all 3. That would mean all node (atoms) are connected to exactly 3 nearest nodes. As we can see node/atom 5 is connected to 4 other nodes/atoms which is wrong, I want it to be connected to exactly 3 nearest nodes. How do I do it. Below is the distance matrix of 31 atoms (31 X 31). 0.0000,6.8223,7.5588,4.8966,7.2452,2.7778,3.7082,2.7345,7.1540,6.8273,3.6995,7.4136,4.6132,5.8456,2.8037,5.4881,8.1769,2.7361,8.3034,4.9450,4.8225,4.6152,4.8243,9.4876,7.2391,2.9941,7.4180,5.8523,7.6310,5.5996,8.1761 6.8223,0.0000,3.0097,2.8567,2.6647,5.0092,5.8451,6.8037,6.7031,4.8983,7.5806,5.2873,5.5000,7.0038,4.9530,3.9763,7.0263,4.8941,4.7416,6.8450,2.8166,2.9221,5.5502,7.0328,5.5148,7.6318,2.7456,5.1150,2.9654,4.2863,5.3168 7.5588,3.0097,0.0000,2.8679,2.9242,5.1443,6.8372,6.5621,5.5169,3.0135,6.8412,4.6886,4.2507,5.4276,5.8673,4.8804,4.7149,6.5707,2.5568,6.3820,5.0625,4.2533,5.0600,5.2125,2.9262,7.8126,4.6864,5.4224,2.6001,3.2330,4.7116 4.8966,2.8567,2.8679,0.0000,3.8064,3.0221,5.1335,4.1131,5.8284,2.8610,5.1354,3.8488,2.8086,4.9618,3.0028,2.7539,5.7401,4.1214,4.5830,5.4268,2.9346,2.8122,2.9319,6.8084,3.8015,5.8992,3.8516,4.9611,2.9212,3.0441,5.7392 7.2452,2.6647,2.9242,3.8064,0.0000,4.7907,5.1676,7.2899,4.5138,5.5206,7.0249,6.9978,5.3888,5.7435,6.2424,6.0645,5.2615,5.6079,2.8907,5.3589,4.7384,2.8152,6.6866,4.6140,4.7660,6.9472,5.3921,3.2734,4.7050,2.9157,2.6684 2.7778,5.0092,5.1443,3.0221,4.7907,0.0000,2.8352,3.0025,4.6425,5.0210,2.8371,6.4170,2.6958,3.6516,3.1115,4.9489,5.5823,3.0042,5.5496,3.0193,4.1730,2.6916,4.1796,6.7613,4.7913,2.8772,6.4156,3.6500,5.9378,2.8228,5.5754 3.7082,5.8451,6.8372,5.1335,5.1676,2.8352,0.0000,5.4535,5.0179,7.5889,4.7500,8.8240,5.4549,5.4488,4.9281,6.8616,7.0789,2.8247,6.7504,3.1734,4.9441,2.9387,6.8368,7.3498,7.0263,2.7571,7.6165,2.7391,7.8184,4.2915,5.4352 2.7345,6.8037,6.5621,4.1131,7.2899,3.0025,5.4535,0.0000,6.9646,4.8923,2.8238,5.6867,2.8002,4.7436,2.7895,4.7319,7.0025,4.5800,7.4954,5.2983,5.3961,5.3071,2.7803,8.9227,5.5914,4.2758,7.1749,6.6267,6.5470,5.1104,8.2905 7.1540,6.7031,5.5169,5.8284,4.5138,4.6425,5.0179,6.9646,0.0000,6.7047,5.0056,9.1250,4.9769,2.9634,7.5819,8.5546,2.8061,6.9800,3.6751,2.5834,7.6541,4.9881,7.6494,2.7948,4.5088,5.2110,9.1264,2.9761,7.8062,2.7941,2.8038 6.8273,4.8983,3.0135,2.8610,5.5206,5.0210,7.5889,4.8923,6.7047,0.0000,5.8617,2.7465,2.9301,5.1272,4.9558,3.9808,5.3162,6.8165,4.7404,6.8576,5.5561,5.5098,2.8152,7.0330,2.6644,7.6461,5.2912,7.0096,2.9674,4.2945,7.0258 3.6995,7.5806,6.8412,5.1354,7.0249,2.8371,4.7500,2.8238,5.0056,5.8617,0.0000,7.6266,2.9474,2.7359,4.9270,6.8677,5.4363,5.4517,6.7478,3.1721,6.8329,5.4529,4.9555,7.3438,5.1708,2.7597,8.8287,5.4452,7.8249,4.2909,7.0646 7.4136,5.2873,4.6886,3.8488,6.9978,6.4170,8.8240,5.6867,9.1250,2.7465,7.6266,0.0000,4.9529,7.5866,4.8602,2.7541,8.0299,7.1729,7.0207,8.9107,5.2910,6.5588,2.9146,9.5249,5.3922,9.1160,4.1719,8.7725,2.7497,6.4529,9.0793 4.6132,5.5000,4.2507,2.8086,5.3888,2.6958,5.4549,2.8002,4.9769,2.9301,2.9474,4.9529,0.0000,2.8097,3.9433,4.7505,4.4056,5.3122,4.8719,4.3293,5.3691,4.4387,2.8442,6.4077,2.8070,4.8696,6.5606,5.3482,5.0545,2.9281,6.2031 5.8456,7.0038,5.4276,4.9618,5.7435,3.6516,5.4488,4.7436,2.9634,5.1272,2.7359,7.5866,2.8097,0.0000,6.2398,7.3805,2.7234,6.6316,4.5653,2.8038,7.3250,5.3513,5.6427,4.8167,3.2793,4.4545,8.7753,4.6687,7.1729,2.8747,5.2395 2.8037,4.9530,5.8673,3.0028,6.2424,3.1115,4.9281,2.7895,7.5819,4.9558,4.9270,4.8602,3.9433,6.2398,0.0000,2.6964,7.9654,2.7924,7.3935,6.1211,2.8429,3.9452,2.8424,9.2720,6.2355,5.1988,4.8668,6.2430,5.2004,5.1737,7.9659 5.4881,3.9763,4.8804,2.7539,6.0645,4.9489,6.8616,4.7319,8.5546,3.9808,6.8677,2.7541,4.7505,7.3805,2.6964,0.0000,8.3401,4.7306,7.1148,7.8221,2.7718,4.7476,2.7753,9.5030,6.0617,7.5711,2.7577,7.3777,3.1397,5.7882,8.3392 8.1769,7.0263,4.7149,5.7401,5.2615,5.5823,7.0789,7.0025,2.8061,5.3162,5.4363,8.0299,4.4056,2.7234,7.9654,8.3401,0.0000,8.3066,2.9172,4.5762,8.3187,6.2166,6.9974,2.6988,2.6667,6.8504,9.0816,5.2518,7.0366,3.3010,4.3079 2.7361,4.8941,6.5707,4.1214,5.6079,3.0042,2.8247,4.5800,6.9800,6.8165,5.4517,7.1729,5.3122,6.6316,2.7924,4.7306,8.3066,0.0000,7.5094,5.3015,2.7788,2.8080,5.4012,8.9373,7.2956,4.2691,5.6901,4.7565,6.5515,5.1204,7.0169 8.3034,4.7416,2.5568,4.5830,2.8907,5.5496,6.7504,7.4954,3.6751,4.7404,6.7478,7.0207,4.8719,4.5653,7.3935,7.1148,2.9172,7.5094,0.0000,5.4477,6.7736,4.8812,6.7675,2.6661,2.8871,7.5857,7.0211,4.5660,5.1569,2.7695,2.9165 4.9450,6.8450,6.3820,5.4268,5.3589,3.0193,3.1734,5.2983,2.5834,6.8576,3.1721,8.9107,4.3293,2.8038,6.1211,7.8221,4.5762,5.3015,5.4477,0.0000,6.7806,4.3263,6.7865,5.3395,5.3642,2.6393,8.9072,2.7997,8.0722,3.1518,4.5623 4.8225,2.8166,5.0625,2.9346,4.7384,4.1730,4.9441,5.3961,7.6541,5.5561,6.8329,5.2910,5.3691,7.3250,2.8429,2.7718,8.3187,2.7788,6.7736,6.7806,0.0000,2.8411,4.6768,8.8466,6.6849,6.5259,2.9190,5.6409,4.2802,5.1556,7.0036 4.6152,2.9221,4.2533,2.8122,2.8152,2.6916,2.9387,5.3071,4.9881,5.5098,5.4529,6.5588,4.4387,5.3513,3.9452,4.7476,6.2166,2.8080,4.8812,4.3263,2.8411,0.0000,5.3713,6.4170,5.3921,4.8619,4.9490,2.8105,5.0539,2.9321,4.4129 4.8243,5.5502,5.0600,2.9319,6.6866,4.1796,6.8368,2.7803,7.6494,2.8152,4.9555,2.9146,2.8442,5.6427,2.8424,2.7753,6.9974,5.4012,6.7675,6.7865,4.6768,5.3713,0.0000,8.8413,4.7294,6.5361,5.2956,7.3262,4.2788,5.1556,8.3129 9.4876,7.0328,5.2125,6.8084,4.6140,6.7613,7.3498,8.9227,2.7948,7.0330,7.3438,9.5249,6.4077,4.8167,9.2720,9.5030,2.6988,8.9373,2.6661,5.3395,8.8466,6.4170,8.8413,0.0000,4.6127,7.9200,9.5247,4.8201,7.8081,4.1075,2.6956 7.2391,5.5148,2.9262,3.8015,4.7660,4.7913,7.0263,5.5914,4.5088,2.6644,5.1708,5.3922,2.8070,3.2793,6.2355,6.0617,2.6667,7.2956,2.8871,5.3642,6.6849,5.3921,4.7294,4.6127,0.0000,6.9513,6.9963,5.7431,4.7049,2.9171,5.2546 2.9941,7.6318,7.8126,5.8992,6.9472,2.8772,2.7571,4.2758,5.2110,7.6461,2.7597,9.1160,4.8696,4.4545,5.1988,7.5711,6.8504,4.2691,7.5857,2.6393,6.5259,4.8619,6.5361,7.9200,6.9513,0.0000,9.1125,4.4513,8.8145,4.8939,6.8395 7.4180,2.7456,4.6864,3.8516,5.3921,6.4156,7.6165,7.1749,9.1264,5.2912,8.8287,4.1719,6.5606,8.7753,4.8668,2.7577,9.0816,5.6901,7.0211,8.9072,2.9190,4.9490,5.2956,9.5247,6.9963,9.1125,0.0000,7.5797,2.7482,6.4514,8.0302 5.8523,5.1150,5.4224,4.9611,3.2734,3.6500,2.7391,6.6267,2.9761,7.0096,5.4452,8.7725,5.3482,4.6687,6.2430,7.3777,5.2518,4.7565,4.5660,2.7997,5.6409,2.8105,7.3262,4.8201,5.7431,4.4513,7.5797,0.0000,7.1676,2.8724,2.7190 7.6310,2.9654,2.6001,2.9212,4.7050,5.9378,7.8184,6.5470,7.8062,2.9674,7.8249,2.7497,5.0545,7.1729,5.2004,3.1397,7.0366,6.5515,5.1569,8.0722,4.2802,5.0539,4.2788,7.8081,4.7049,8.8145,2.7482,7.1676,0.0000,5.1559,7.0354 5.5996,4.2863,3.2330,3.0441,2.9157,2.8228,4.2915,5.1104,2.7941,4.2945,4.2909,6.4529,2.9281,2.8747,5.1737,5.7882,3.3010,5.1204,2.7695,3.1518,5.1556,2.9321,5.1556,4.1075,2.9171,4.8939,6.4514,2.8724,5.1559,0.0000,3.2927 8.1761,5.3168,4.7116,5.7392,2.6684,5.5754,5.4352,8.2905,2.8038,7.0258,7.0646,9.0793,6.2031,5.2395,7.9659,8.3392,4.3079,7.0169,2.9165,4.5623,7.0036,4.4129,8.3129,2.6956,5.2546,6.8395,8.0302,2.7190,7.0354,3.2927,0.0000 Edit 1 Thank-you, #yatu, and #mathfux but Kdtree does not produce what I want. so from #mathfux answer if the nearest list of 3 is 0,1,2 and 0 has the nearest list of 1,4,2 then the algorithm should give more importance to distances and break the connections of 3 and 0. 3 should find other points nearest to it, instead of joining with 0. if 3 cannot find other points nearest to it then 0 has to exclude either 1 or 2 based on the distance and include 3 because 3 could not find 3 nearest points other than 0,1,2. Also one can see 0 is connected to 25 but 25 is not connected to 0 which is wrong. if 0 is connected to 25 then 25 should also be connected to 0. [ 0, 17, 7, 25] [ 5, 21, 12, 6] [ 6, 25, 27, 17] [10, 25, 13, 7] [12, 5, 3, 24] [21, 5, 3, 4] [25, 6, 10, 19] [29, 4, 12, 24]
A KDTree would be more appropriate for what you're trying to do. Once you've built the tree, you can search for the 3 nearest points to all points in the tree: from sklearn.neighbors import KDTree tree = KDTree(X, leaf_size=2) dist, ind = tree.query(X, k=3) To check the result, we can verify that the three smallest values in the first row, match the three smallest distances returned by the query indexing on the first row ind[0]: np.sort(X[0])[:3] #array([0. , 2.7345, 2.7361]) X[0,ind[0]] # array([0. , 2.7361, 2.7345])
It appears that such connection might not always be possible because you are not able to hold your matrix symmetric. Actually, matrix of distance is not needed for explanation. import matplotlib.pyplot as plt points = np.array([[0,0],[0,1],[0,-1],[1,0],[-0.5,0]]) plt.scatter(*np.transpose(points)) for i in range(len(points)): plt.text(points[i][0], points[i][1], [0, 1, 2, 3, 4][i], fontsize=24) If you, let's say, find 3 closest points 0, 1, 2 for point 3 this doesn't mean that 3 is in a list of closest points of 0. If you still want to find 3 closests points, you don't need that looping at all, just use np.argsort(dist_mat, axis=1)[:,1:4] to find indexes of closest points or np.sort(dist_mat, axis=1)[:,1:4] to find distances. In case you need to work with points instead of dist_max (which is an overhead), KDTree is a better option.
How to replace value with its column or rows index on condition in numpy?
How to replace each value in the n x n array with column index if value >= 1 otherwise with row index. Even better if the replaced value would map into other 1d array and return the value from it. value_array = np.array([200, 200, 300, 10]) arr = np.array( [[1, 1, .66, 20], [1, 1, .66, 20], [1.5, 1.5, 1, 30], [.05, .05, .03, 1]] ) Goal is to get an array of the same size containing values from value_array . Examples: at position [0,2] the value is .66 which is less than 1 therefore row index is needed which is 0. 0 then is indexed into value_array and the answer in the result matrix position[0,2] is 200. at position [0,3] the value is 20 which is greater than 1 therefore column index is needed which is 3. 3 then is indexed into value_array and the answer in the result matrix position[0,3] is 10. Also, this to be applied for a big array(1m x 1m) executions needs to be somehow split into multiple parts.
One way using numpy.select: conds = [arr>1, arr<1] target = np.full(arr.shape, value_array) np.select(conds, [target, target.T], arr) Output: array([[ 1., 1., 200., 10.], [ 1., 1., 200., 10.], [200., 200., 1., 10.], [ 10., 10., 10., 1.]])
Another option using np.where: # I've rename value_array to v for simplicity. # We use the broadcasting potential of numpy to get our result res = np.where(arr>=1,v,v.reshape(len(v),1)) And res= array([[200, 200, 200, 10], [200, 200, 200, 10], [200, 200, 300, 10], [ 10, 10, 10, 10]]) This solution also use less memory because you don't need to store intermediate result.
How to efficiently make new matrix from sum of blocks from bigger sparse matrix
I have a large scipy sparse symmetric matrix which I need to condense by taking the sum of blocks to make a new smaller matrix. For example, for a 4x4 sparse matrix A I will like to make a 2x2 matrix B in which B[i,j] = sum(A[i:i+2,j:j+2]). Currently, I just go block by block to recreate the condensed matrix but this is slow. Any ideas on how to optimize this? Update: Here is an example code that works fine, but is slow for a sparse matrix of 50.000x50.000 that I want to condense in a 10.000x10.000: >>> A = (rand(4,4)<0.3)*rand(4,4) >>> A = scipy.sparse.lil_matrix(A + A.T) # make the matrix symmetric >>> B = scipy.sparse.lil_matrix((2,2)) >>> for i in range(B.shape[0]): ... for j in range(B.shape[0]): ... B[i,j] = A[i:i+2,j:j+2].sum()
First of all, lil matrix for the one your are summing up is probably really bad, I would try COO or maybe CSR/CSS (I don't know which will be better, but lil is probably inherently slower for many of these operations, even the slicing might be much slower, though I did not test). (Unless you know that for example dia fits perfectly) Based on COO I could imagine doing some tricking around. Since COO has row and col arrays to give the exact positions: matrix = A.tocoo() new_row = matrix.row // 5 new_col = matrix.col // 5 bin = (matrix.shape[0] // 5) * new_col + new_row # Now do a little dance because this is sparse, # and most of the possible bin should not be in new_row/new_col # also need to group the bins: unique, bin = np.unique(bin, return_inverse=True) sum = np.bincount(bin, weights=matrix.data) new_col = unique // (matrix.shape[0] // 5) new_row = unique - new_col * (matrix.shape[0] // 5) result = scipy.sparse.coo_matrix((sum, (new_row, new_col))) (I won't guarantee that I didn't confuse row and column somewhere and this only works for square matrices...)
Given a square matrix of size N and a split size of d (so matrix will be partitioned into N/d * N/d sub-matrices of size d), could you use numpy.split a couple times to build a collection of those sub-matrices, sum each of them, and put them back together? This should be treated more as pseudocode than an efficient implementation, but it expresses my idea: def chunk(matrix, size): row_wise = [] for hchunk in np.split(matrix, size): row_wise.append(np.split(hchunk, size, 1)) return row_wise def sum_chunks(chunks): sum_rows = [] for row in chunks: sum_rows.append([np.sum(col) for col in row]) return np.array(sum_rows) Or more compactly as def sum_in_place(matrix, size): return np.array([[np.sum(vchunk) for vchunk in np.split(hchunk, size, 1)] for hchunk in np.split(matrix, size)]) This gives you something like the following: In [16]: a Out[16]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]]) In [17]: chunk.sum_in_place(a, 2) Out[17]: array([[10, 18], [42, 50]])
For a 4x4 example you can do the following: In [43]: a = np.arange(16.).reshape((4, 4)) In [44]: a Out[44]: array([[ 0., 1., 2., 3.], [ 4., 5., 6., 7.], [ 8., 9., 10., 11.], [ 12., 13., 14., 15.]]) In [45]: u = np.array([a[:2, :2], a[:2, 2:], a[2:,:2], a[2:, 2:]]) In [46]: u Out[46]: array([[[ 0., 1.], [ 4., 5.]], [[ 2., 3.], [ 6., 7.]], [[ 8., 9.], [ 12., 13.]], [[ 10., 11.], [ 14., 15.]]]) In [47]: u.sum(1).sum(1).reshape(2, 2) Out[47]: array([[ 10., 18.], [ 42., 50.]]) Using something like itertools it should be possible to automate and generalise an expression for u.
What does .shape[] do in "for i in range(Y.shape[0])"?
I'm trying to break down a program line by line. Y is a matrix of data but I can't find any concrete data on what .shape[0] does exactly. for i in range(Y.shape[0]): if Y[i] == -1: This program uses numpy, scipy, matplotlib.pyplot, and cvxopt.
The shape attribute for numpy arrays returns the dimensions of the array. If Y has n rows and m columns, then Y.shape is (n,m). So Y.shape[0] is n. In [46]: Y = np.arange(12).reshape(3,4) In [47]: Y Out[47]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) In [48]: Y.shape Out[48]: (3, 4) In [49]: Y.shape[0] Out[49]: 3
shape is a tuple that gives dimensions of the array.. >>> c = arange(20).reshape(5,4) >>> c array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]) c.shape[0] 5 Gives the number of rows c.shape[1] 4 Gives number of columns
shape is a tuple that gives you an indication of the number of dimensions in the array. So in your case, since the index value of Y.shape[0] is 0, your are working along the first dimension of your array. From Link An array has a shape given by the number of elements along each axis: >>> a = floor(10*random.random((3,4))) >>> a array([[ 7., 5., 9., 3.], [ 7., 2., 7., 8.], [ 6., 8., 3., 2.]]) >>> a.shape (3, 4) and http://www.scipy.org/Numpy_Example_List#shape has some more examples.
In python, Suppose you have loaded up the data in some variable train: train = pandas.read_csv('file_name') >>> train train([[ 1., 2., 3.], [ 5., 1., 2.]],) I want to check what are the dimensions of the 'file_name'. I have stored the file in train >>>train.shape (2,3) >>>train.shape[0] # will display number of rows 2 >>>train.shape[1] # will display number of columns 3
In Python shape() is use in pandas to give number of row/column: Number of rows is given by: train = pd.read_csv('fine_name') //load the data train.shape[0] Number of columns is given by train.shape[1]
shape() consists of array having two arguments rows and columns. if you search shape[0] then it will gave you the number of rows. shape[1] will gave you number of columns.
Splice arrays in NumPy?
I'd like to perform a splice of sorts in NumPy. Let's say I have two arrays, a and b: >>> a array([[ 1, 10], [ 2, 20], [ 5, 30]]) >>> b array([[ 1, 11], [ 3, 31], [ 4, 41]]) which I want to splice together into the following array, c: >>> c array([[ 1., 10.], [ 2., 20.], [ 3., nan], [ 4., nan], [ 5., 30.]]) That is, I splice the values from the first column of b into a without bothering about the second column. I could of course implement this myself pretty easily, but it would be nicer to have NumPy do it for me instead. Is that possible?
The answer by mishaF is only missing the last step -- making the entries of the last column unique. The full code to get your c (except for the dtype, which changes from int to float in your post) is b[:,1]=numpy.nan c = numpy.r_[a, b] c.sort(0) c = c[numpy.unique(c[:,0], True)[1]]
You could stack the two together and then sort. However, this doesn't take care of the fact that you have two occurrences of the index 1. Not sure this is a great improvement... b[:,1]=np.nan c = np.vstack((a,b)) c.sort(0)
I don't think there is anything in NumPy to do that. Do you need exactly that result (in order, second column with undefined value)? Maybe there is something close that would still be useful for the end goal.