Get values of pandas series from a array of index locations - python

I have a 2-d array of an index of a pandas series. Would like to create a 2-d array of the values from the pandas series that correspond to the index.
For example:
import pandas as pd
import numpy as np
A = pd.Series(data=[1,2,3,4,5])
idx = np.array([[0,2,3],[2,3,1]])
Would like to return:
B = np.array([[1,3,4],[3,4,2]])
I know I could do this as a loop:
B = np.zeros((2,3))
for i in [0,1]:
B[i,:] = test[idx[i]]
However, in practice need to do this repeatedly so would like to broadcast the index locations directly. Pandas is not necessary, happy to do it all in numpy if easier.

Something like this might work:
A[idx.flatten()].values.reshape(idx.shape)

A[idx] gives a Cannot index with multidimensional key error.
In [190]: A = pd.Series(data=[1,2,3,4,5])
...: idx = np.array([[0,2,3],[2,3,1]])
But the 1d array derived from the Series, can be indexed this way:
In [191]: A.values
Out[191]: array([1, 2, 3, 4, 5])
In [192]: A.values[idx]
Out[192]:
array([[1, 3, 4],
[3, 4, 2]])
numpy has no problems returning an array with a dimension that matches idx.
Indexing the Series like this returns a Series - which by definition is 1d:
In [194]: A[idx.ravel()]
Out[194]:
0 1
2 3
3 4
2 3
3 4
1 2
dtype: int64

Related

numpy get row index where elements in certain columns are zero

I want to find indexes of row based on criteria over certain columns
So, something like:
import numpy as np
x = np.random.rand(4, 5)
x[2, 2] = 0
x[2, 3] = 0
x[3, 1] = 0
x[1, 3] = 0
Now, I want to get the index of the rows where either of columns 3 or 4 are zeros. How can one do that with numpy? Do I need to make multiple calls to nonzero for each column and combine these indices using a set or something like that?
Using np.where first array within the tuple is row index
np.where(x[:,[3,4]]==0)
Out[79]: (array([1, 2], dtype=int64), array([0, 0], dtype=int64))

Map index of numpy matrix

How should I map indices of a numpy matrix?
For example:
mx = np.matrix([[5,6,2],[3,3,7],[0,1,6]]
The row/column indices are 0, 1, 2.
So:
>>> mx[0,0]
5
Let s say I need to map these indices, converting 0, 1, 2 into, e.g. 10, 'A', 'B' in the way that:
mx[10,10] #returns 5
mx[10,'A'] #returns 6 and so on..
I can just set a dict and use it to access the elements, but I would like to know if it is possible to do something like what I just described.
I would suggest using pandas dataframe with the index and columns using the new mapping for row and col indexing respectively for ease in indexing. It allows us to select a single element or an entire row or column with the familiar colon operator.
Consider a generic (non-square 4x3 shaped matrix) -
mx = np.matrix([[5,6,2],[3,3,7],[0,1,6],[4,5,2]])
Consider the mappings for rows and columns -
row_idx = [10, 'A', 'B','C']
col_idx = [10, 'A', 'B']
Let's take a look on the workflow with the given sample -
# Get data into dataframe with given mappings
In [57]: import pandas as pd
In [58]: df = pd.DataFrame(mx,index=row_idx, columns=col_idx)
# Here's how dataframe data looks like
In [60]: df
Out[60]:
10 A B
10 5 6 2
A 3 3 7
B 0 1 6
C 4 5 2
# Get one scalar element
In [61]: df.loc['C',10]
Out[61]: 4
# Get one entire col
In [63]: df.loc[:,10].values
Out[63]: array([5, 3, 0, 4])
# Get one entire row
In [65]: df.loc['A'].values
Out[65]: array([3, 3, 7])
And best of all we are not making any extra copies as the dataframe and its slices are still indexing into the original matrix/array memory space -
In [98]: np.shares_memory(mx,df.loc[:,10].values)
Out[98]: True
Try this:
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
B = np.array(list(map(tuple, A)), dtype=dt)
print(B['ID'])
You can use the __getitem__ and __setitem__ special methods and create a new class as shown.
Store the index map as a dictionary in an instance variable self.index_map.
import numpy as np
class Matrix(np.matrix):
def __init__(self, lis):
self.matrix = np.matrix(lis)
self.index_map = {}
def setIndexMap(self, index_map):
self.index_map = index_map
def getIndex(self, key):
if type(key) is slice:
return key
elif key not in self.index_map.keys():
return key
else:
return self.index_map[key]
def __getitem__(self, idx):
return self.matrix[self.getIndex(idx[0]), self.getIndex(idx[1])]
def __setitem__(self, idx, value):
self.matrix[self.getIndex(idx[0]), self.getIndex(idx[1])] = value
Usage:
Creating a matrix.
>>> mx = Matrix([[5,6,2],[3,3,7],[0,1,6]])
>>> mx
Matrix([[5, 6, 2],
[3, 3, 7],
[0, 1, 6]])
Defining the Index Map.
>>> mx.setIndexMap({10:0, 'A':1, 'B':2})
Different ways to index the matrix.
>>> mx[0,0]
5
>>> mx[10,10]
5
>>> mx[10,'A']
6
It also handles slicing as shown.
>>> mx[1:3, 1:3]
matrix([[3, 7],
[1, 6]])

Convert a numpy array to an array of numpy arrays

How can I convert numpy array a to numpy array b in a (num)pythonic way. Solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])
For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
b[index]=a[index]
.....:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
[array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size 2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
np.array tries to make the highest dimension array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give an irregular list of lists or objects. For example we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist) # array of objects
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
[([6, 7],), ([8, 9],), ([10, 11],)]],
dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!
Based on hpaulj I provide a litte more generic solution. a is an array of dimension N which shall be converted to an array b of dimension N1 with dtype object holding arrays of dimension (N-N1).
In the example N equals 5 and N1 equals 3.
import numpy as np
N=5
N1=3
#create array a with dimension N
a=np.random.random(np.random.randint(2,20,size=N))
a_shape=a.shape
b_shape=a_shape[:N1] # shape of array b
b_arr_shape=a_shape[N1:] # shape of arrays in b
#Solution 1 with list() method (faster)
b=np.empty(np.prod(b_shape),dtype=object) #init b
b[:]=list(a.reshape((-1,)+b_arr_shape))
b=b.reshape(b_shape)
print "Dimension of b: {}".format(len(b.shape)) # dim of b
print "Dimension of array in b: {}".format(len(b[0,0,0].shape)) # dim of arrays in b
#Solution 2 with ndindex loop (slower)
b=np.empty(b_shape,dtype=object)
for index in np.ndindex(b_shape):
b[index]=a[index]
print "Dimension of b: {}".format(len(b.shape)) # dim of b
print "Dimension of array in b: {}".format(len(b[0,0,0].shape)) # dim of arrays in b

Convert row-column-value data to an array - numpy

I have data in long format that stores the row#, column# and value as shown below:
ROW COLUMN VALUE
1 1 1
1 3 3
2 1 1
2 2 2
3 1 1
3 2 2
3 3 3
Please note that the certain ROW, COLUMN combinations are missing (for instance there is no value for ROW = 1 and COLUMN = 2). I would like to convert this into a 3 x 3 array like so. The missing row column combination gets filled in by 0:
1 0 3
1 2 0
1 2 3
My initial approach to this problem was to declare an empty 3 x 3 array, read in the three columns as 1d arrays and loop over rows and columns and update the array based on the value array. For small dimensional cases this seems doable, but for higher dimensions this does not seem to be the "Pythonic" way to do it. Has this problem been tackled in some canned function in numpy package? I looked into reshape - but that assumes no missing values.
Once you have the row, column and values in numpy arrays, you can do something like the following. (Note that I've taken the more Pythonic approach of putting the 0-based indices in row and col).
Here's the data, in one-dimensonal arrays:
In [13]: row = np.array([0, 0, 1, 1, 2, 2, 2])
In [14]: col = np.array([0, 2, 0, 1, 0, 1, 2])
In [15]: values = np.array([11, 12, 13, 14, 15, 16, 17])
Create a two-dimensional array to hold the values. I use the maxima from row and col to figure out how big the array should be. You might use some other values if row and col don't necessarily include values in the last row or column.
In [16]: a = np.zeros((row.max()+1, col.max()+1), dtype=values.dtype)
Now fill in the values with this assignment
In [17]: a[row, col] = values
Et voilĂ :
In [18]: a
Out[18]:
array([[11, 0, 12],
[13, 14, 0],
[15, 16, 17]])
Your example is a 3x3 array, but if you will actually have much larger arrays and not a lot of entries, you might consider using a scipy sparse matrix. For example, here's how you can create a "COO" matrix from the same data as above, using the coo_matrix class:
In [25]: from scipy.sparse import coo_matrix
In [26]: c = coo_matrix((values, (row, col)), shape=(row.max()+1, col.max()+1))
In [27]: c
Out[27]:
<3x3 sparse matrix of type '<type 'numpy.int64'>'
with 7 stored elements in COOrdinate format>
In [28]: c.A
Out[28]:
array([[11, 0, 12],
[13, 14, 0],
[15, 16, 17]])

Colletting element from different DataFrames into arrays

I like using nested data structures and now I'm trying to understand how to use Pandas
Here is a toy model:
a=pd.DataFrame({'x':[1,2],'y':[10,20]})
b=pd.DataFrame({'x':[3,4],'y':[30,40]})
c=[a,b]
now I would like to get:
sol=np.array([[[1],[3]],[[2],[4]]])
I have an idea to get both sol[0] and sol[1] as:
s0=np.array([item[['x']].ix[0] for item in c])
s1=np.array([item[['x']].ix[1] for item in c])
but to get sol I would run over the index and I don't think it is really pythonic...
It looks like you want just the x columns from a and b. You can concatenate two Series (or DataFrames) into a new DataFrame using pd.concat:
In [132]: pd.concat([a['x'], b['x']], axis=1)
Out[132]:
x x
0 1 3
1 2 4
[2 rows x 2 columns]
Now, if you want a numpy array, use the values attribute:
In [133]: pd.concat([a['x'], b['x']], axis=1).values
Out[133]:
array([[1, 3],
[2, 4]], dtype=int64)
And if you want a numpy array with the same shape as sol, then use the reshape method:
In [134]: pd.concat([a['x'], b['x']], axis=1).values.reshape(2,2,1)
Out[134]:
array([[[1],
[3]],
[[2],
[4]]], dtype=int64)
In [136]: np.allclose(pd.concat([a['x'], b['x']], axis=1).values.reshape(2,2,1), sol)
Out[136]: True

Categories