I have two arrays:
arr1 = [1,2,3]
arr2 = [5,10]
Now I want to create a DataFrame from the arrays that holds the sums of all combinations:
pd.DataFrame([[6,7,8], [11,12,13]],
columns=['1', '2', '3'],
index=['5', '10'])
      1   2   3
5     6   7   8
10   11  12  13
I know this can be easily done by iterating over the arrays, but I guess there is a built-in function to accomplish the same but way faster.
I've already looked in the documentation of different functions like the merge function but without success.
We can use NumPy broadcasting to add the arrays, then build the resulting DataFrame, taking the index and column names from the lists:
import numpy as np
import pandas as pd
arr1 = [1, 2, 3]
arr2 = [5, 10]
df = pd.DataFrame(
np.array(arr1) + np.array(arr2)[:, None], index=arr2, columns=arr1
)
Or with np.add.outer (which works whether arr1 and arr2 are lists or arrays):
df = pd.DataFrame(np.add.outer(arr2, arr1), index=arr2, columns=arr1)
Note: if arr1 and arr2 are already arrays (instead of lists), it can simply be:
import numpy as np
import pandas as pd
arr1 = np.array([1, 2, 3])
arr2 = np.array([5, 10])
df = pd.DataFrame(arr1 + arr2[:, None], index=arr2, columns=arr1)
All of these approaches produce the same df:
     1   2   3
5    6   7   8
10  11  12  13
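To make the broadcasting step more concrete, here is a minimal sketch that prints the intermediate shapes and checks that the broadcasting and np.add.outer versions agree. The names col, table and same are only illustrative and are not part of the answer above.

```python
import numpy as np
import pandas as pd

arr1 = np.array([1, 2, 3])
arr2 = np.array([5, 10])

# arr2[:, None] adds a trailing axis, turning shape (2,) into (2, 1)
col = arr2[:, None]

# shape (3,) broadcast against shape (2, 1) gives shape (2, 3)
table = arr1 + col

df = pd.DataFrame(table, index=arr2, columns=arr1)

# both approaches should give the same 2x3 table
same = np.array_equal(table, np.add.outer(arr2, arr1))
print(col.shape, table.shape, same)
```

Each row of the result corresponds to one element of arr2 and each column to one element of arr1, which is why arr2 is passed as the index and arr1 as the columns.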
I have a 2-d array of index values for a pandas Series. I would like to create a 2-d array of the Series values that correspond to those indices.
For example:
import pandas as pd
import numpy as np
A = pd.Series(data=[1,2,3,4,5])
idx = np.array([[0,2,3],[2,3,1]])
I would like to get:
B = np.array([[1,3,4],[3,4,2]])
I know I could do this with a loop:
B = np.zeros((2,3))
for i in [0,1]:
    B[i,:] = A[idx[i]]
However, in practice I need to do this repeatedly, so I would like to broadcast the index locations directly. Pandas is not necessary; I'm happy to do it all in numpy if that is easier.
Something like this might work:
A[idx.flatten()].values.reshape(idx.shape)
A[idx] gives a Cannot index with multidimensional key error.
In [190]: A = pd.Series(data=[1,2,3,4,5])
...: idx = np.array([[0,2,3],[2,3,1]])
But the 1d array derived from the Series can be indexed this way:
In [191]: A.values
Out[191]: array([1, 2, 3, 4, 5])
In [192]: A.values[idx]
Out[192]:
array([[1, 3, 4],
       [3, 4, 2]])
numpy has no problems returning an array with a dimension that matches idx.
Indexing the Series like this returns a Series - which by definition is 1d:
In [194]: A[idx.ravel()]
Out[194]:
0 1
2 3
3 4
2 3
3 4
1 2
dtype: int64
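One thing worth keeping in mind is that A.values[idx] is positional. As a rough sketch, assuming a Series whose labels are not 0..n-1 (the labels 10..50 and the names pos, labels, by_position and by_label are made up for illustration), positions and labels can be handled separately like this:

```python
import numpy as np
import pandas as pd

# hypothetical Series with non-default labels
A = pd.Series([1, 2, 3, 4, 5], index=[10, 20, 30, 40, 50])

pos = np.array([[0, 2, 3], [2, 3, 1]])            # integer positions
labels = np.array([[10, 30, 40], [30, 40, 20]])   # index labels

# positional lookup: index the underlying 1d array with the 2d array
by_position = A.to_numpy()[pos]

# label lookup: .loc needs a 1d key, so flatten first and reshape afterwards
by_label = A.loc[labels.ravel()].to_numpy().reshape(labels.shape)

print(by_position)
print(by_label)
```

With the default RangeIndex used in the question, positions and labels coincide, so the two lookups give the same result.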
I am a learner and a bit stuck: I can't work out how to print the values at the even indices of an array.
My code:
import numpy as np
arr_1 = np.array([2,4,6,11])
arr_1 [ (arr_1[ i for i in range(arr_1.size) ] ) % 2 == 0 ]
Expected Output :
2,6
2 -> is at index 0
6 -> is at index 2
Both are at even indices
IIUC, you can use slicing with [start:end:step], just as with a list, to get what you want:
>>> arr_1 = np.array([2,4,6,11])
>>> arr_1[::2]
array([2, 6])
>>> list(arr_1[::2])
[2, 6]
>>> print(*arr_1[::2], sep=',')
2,6
Your problem was that you were iterating over what range() returned, which isn't your original arr_1. The code below iterates over arr_1 itself with enumerate and keeps only the elements whose index is even.
Code that will work:
import numpy as np
arr_1 = np.array([2,4,6,11])
arr_1 = [i for index, i in enumerate(arr_1) if ((index % 2) == 0)]
print(arr_1)
Output:
[2, 6]
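If you would rather keep the "index % 2 == 0" idea from the question, a boolean mask built from np.arange does the same thing. This is only a sketch; positions, mask and even_indexed are names chosen here for illustration.

```python
import numpy as np

arr_1 = np.array([2, 4, 6, 11])

# positions 0, 1, 2, 3 of the array
positions = np.arange(arr_1.size)

# True where the position (not the value) is even
mask = positions % 2 == 0

even_indexed = arr_1[mask]
print(even_indexed)
```

The mask is applied to positions rather than values, so it picks the same elements as arr_1[::2].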
How should I map indices of a numpy matrix?
For example:
mx = np.matrix([[5,6,2],[3,3,7],[0,1,6]])
The row/column indices are 0, 1, 2.
So:
>>> mx[0,0]
5
Let's say I need to map these indices, converting 0, 1, 2 into, e.g., 10, 'A', 'B', so that:
mx[10,10] #returns 5
mx[10,'A'] #returns 6 and so on..
I can just set a dict and use it to access the elements, but I would like to know if it is possible to do something like what I just described.
I would suggest using a pandas DataFrame, with the new mapping as its index and columns for row and column indexing respectively. It lets us select a single element or an entire row or column with the familiar colon operator.
Consider a generic, non-square matrix of shape 4x3 -
mx = np.matrix([[5,6,2],[3,3,7],[0,1,6],[4,5,2]])
Consider the mappings for rows and columns -
row_idx = [10, 'A', 'B','C']
col_idx = [10, 'A', 'B']
Let's take a look at the workflow with the given sample -
# Get data into dataframe with given mappings
In [57]: import pandas as pd
In [58]: df = pd.DataFrame(mx,index=row_idx, columns=col_idx)
# Here's what the dataframe looks like
In [60]: df
Out[60]:
    10  A  B
10   5  6  2
A    3  3  7
B    0  1  6
C    4  5  2
# Get one scalar element
In [61]: df.loc['C',10]
Out[61]: 4
# Get one entire col
In [63]: df.loc[:,10].values
Out[63]: array([5, 3, 0, 4])
# Get one entire row
In [65]: df.loc['A'].values
Out[65]: array([3, 3, 7])
And best of all we are not making any extra copies as the dataframe and its slices are still indexing into the original matrix/array memory space -
In [98]: np.shares_memory(mx,df.loc[:,10].values)
Out[98]: True
Try this:
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
B = np.array(list(map(tuple, A)), dtype=dt)
print(B['ID'])
You can use the __getitem__ and __setitem__ special methods and create a new class as shown.
Store the index map as a dictionary in an instance variable self.index_map.
import numpy as np
class Matrix(np.matrix):
    def __init__(self, lis):
        self.matrix = np.matrix(lis)
        self.index_map = {}
    def setIndexMap(self, index_map):
        self.index_map = index_map
    def getIndex(self, key):
        if type(key) is slice:
            return key
        elif key not in self.index_map.keys():
            return key
        else:
            return self.index_map[key]
    def __getitem__(self, idx):
        return self.matrix[self.getIndex(idx[0]), self.getIndex(idx[1])]
    def __setitem__(self, idx, value):
        self.matrix[self.getIndex(idx[0]), self.getIndex(idx[1])] = value
Usage:
Creating a matrix.
>>> mx = Matrix([[5,6,2],[3,3,7],[0,1,6]])
>>> mx
Matrix([[5, 6, 2],
        [3, 3, 7],
        [0, 1, 6]])
Defining the Index Map.
>>> mx.setIndexMap({10:0, 'A':1, 'B':2})
Different ways to index the matrix.
>>> mx[0,0]
5
>>> mx[10,10]
5
>>> mx[10,'A']
6
It also handles slicing as shown.
>>> mx[1:3, 1:3]
matrix([[3, 7],
        [1, 6]])
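For comparison, the plain dict approach mentioned in the question can be sketched without subclassing. This assumes an ordinary ndarray instead of np.matrix; the lookup helper and the index_map name are hypothetical and only for illustration.

```python
import numpy as np

mx = np.array([[5, 6, 2], [3, 3, 7], [0, 1, 6]])

# hypothetical mapping from custom labels to integer positions
index_map = {10: 0, 'A': 1, 'B': 2}

def lookup(matrix, row_label, col_label, mapping=index_map):
    """Return the element addressed by custom row/column labels."""
    return matrix[mapping[row_label], mapping[col_label]]

print(lookup(mx, 10, 10))
print(lookup(mx, 10, 'A'))

# selecting a sub-block: translate the labels first, then use np.ix_
rows = [index_map[k] for k in ('A', 'B')]
cols = [index_map[k] for k in ('A', 'B')]
block = mx[np.ix_(rows, cols)]
print(block)
```

This keeps the array untouched and makes the translation step explicit, at the cost of not supporting the mx[10, 'A'] syntax directly.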
It's the end of my first month coding with Python and I'm struggling with a piece of code that seemed simpler in my mind.
I'm trying to edit array values based on the positions given by another array generated by np.argwhere. For example:
a = np.arange(6).reshape(2,3)
b = np.argwhere(a>3)
c = ([7,8,9],[10,11,12])
Now I want to change the values in c that are in the same position as the values that are greater than 3 in the array a.
I'm trying to avoid a for loop because of the size of the real data I am working on.
Thanks in advance!
You can use numpy indexing, provided c is a numpy array (for example after c = np.array(c)):
In [6]: c[np.where(a>3)] = a[a>3]
In [7]: c
Out[7]:
array([[ 7,  8,  9],
       [10,  4,  5]])
Can't you just do this?
c[a>3] = a[a>3]
example:
import numpy as np
c = np.arange(7,13).reshape(2,3)
a = np.arange(6).reshape(2,3)
c[a>3] = a[a>3]
Output:
>>> c
[[ 7  8  9]
 [10  4  5]]
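If you already have the np.argwhere result b from the question and want to reuse it, one possible sketch is to split it into row and column indices. The names rows and cols are illustrative only, and c is assumed to be a numpy array rather than a tuple of lists.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.argwhere(a > 3)                     # one (row, col) pair per match
c = np.array([[7, 8, 9], [10, 11, 12]])    # assumed to be an ndarray

# transpose the (n, 2) result into separate row and column index arrays
rows, cols = b.T

# integer-array indexing assigns all matching positions at once, no loop
c[rows, cols] = a[rows, cols]
print(c)
```

The boolean mask a > 3 is usually simpler, but this form is handy when only the argwhere output is available.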
I've got the following problem:
If I select some index values of my pandas DataFrame:
df = pd.DataFrame(data=CoordArray[0:,1:],index=CoordArray[:,0],columns=["x","y","z"])
like this:
print(df.loc[['1234567','7654321'],:])
it works fine.
But if I have those values in a numpy array, convert this array to a list and do it like this:
mynewlist = list(SomeNumpyArray)
print(df.loc[mynewlist])
I get the following error:
"None of [[1234567, 7654321]] are in the [index]"
I really don't know what's going wrong.
I haven't been able to replicate your issue. As @Wen commented, the values in your list and in your DataFrame index may not have the same type.
Here is an example demonstrating that lists or numpy arrays are acceptable as indexers:
import pandas as pd, numpy as np
df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
index=['1000', '2000', '3000', '4000'],
columns= ['x', 'y', 'z'])
idx = np.array(['2000', '3000'])
df.loc[idx]
#        x  y  z
# 2000   4  5  6
# 3000   7  8  9
lst = list(idx)
df.loc[lst]
#        x  y  z
# 2000   4  5  6
# 3000   7  8  9
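To illustrate the type mismatch suspected above, here is a small sketch assuming the index holds strings while the array holds integers. The data is made up (the question's CoordArray is not shown), and int_labels, str_labels, matched and result are illustrative names.

```python
import numpy as np
import pandas as pd

# made-up data: the index labels are strings
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  index=['1234567', '7654321'],
                  columns=['x', 'y', 'z'])

# the lookup values are integers, so none of them match the string labels
int_labels = np.array([1234567, 7654321])
try:
    df.loc[list(int_labels)]
    matched = True
except KeyError:
    matched = False

# converting the values to strings first makes the lookup succeed
str_labels = int_labels.astype(str).tolist()
result = df.loc[str_labels]
print(matched)
print(result)
```

Checking df.index.dtype and the dtype of the numpy array is a quick way to see whether this is what is happening in your case.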