I am trying to encode a single-crossover breeding method for a genetic algorithm without an explicit loop. I need to splice one row of one array together with a row of another array, with the desired result shown below. Note that the row_idx arrays choose the particular rows to breed, while the slice index array tells us where to slice (I would like to keep the chunk of array a up to and including the endpoint).
a=np.arange(20).reshape(4,5)
print('a')
print(a)
a
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
b=np.arange(20).reshape(4,5)*100
print('b')
print(b)
b
[[ 0 100 200 300 400]
[ 500 600 700 800 900]
[1000 1100 1200 1300 1400]
[1500 1600 1700 1800 1900]]
row_idx_a = np.array([3,1,0,3,1,3])  # edit: fixed array
row_idx_b = np.array([1,1,0,0,0,3])  # edit: fixed array to correct the error identified in the answer below
slice_idx = np.array([2,1,0,4,4,3])
merged_array = np.zeros((6,5))  # placeholder for the final array (one row per breeding pair)
# now some creative slicing magic so that my final array is an irregularly indexed merge
The desired result:
[[ 15 16 17 800 900]
[ 5 6 700 800 900]
[ 0 100 200 300 400]
[ 15 16 17 18 19]
[ 5 6 7 8 9]
[ 15 16 17 18 1900]]
I am finding it difficult to vectorize this problem. Any takers? Thanks.
Assuming that the bits in the expected answer that correspond to the marked numbers in the original (pre-edit) index arrays

row_idx_a=np.array([3,1,0,3,2,3])
row_idx_b=np.array([2,1,0,0,0,3])

are wrong, one broadcast np.where does the whole merge:
np.where(np.less.outer(slice_idx,np.arange(5)),b[row_idx_b],a[row_idx_a])
# array([[ 15, 16, 17, 1300, 1400],
# [ 5, 6, 700, 800, 900],
# [ 0, 100, 200, 300, 400],
# [ 15, 16, 17, 18, 19],
# [ 10, 11, 12, 13, 14],
# [ 15, 16, 17, 18, 1900]])
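To unpack the one-liner: np.less.outer(slice_idx, np.arange(5)) broadcasts the two index vectors into a (6, 5) boolean mask, and np.where picks from b where the mask is True and from a elsewhere. A two-step sketch of the same computation (the name mask is introduced here just for illustration):

mask = np.less.outer(slice_idx, np.arange(5))   # mask[i, j] is True where slice_idx[i] < j
# e.g. slice_idx[0] = 2 gives mask[0] = [False, False, False, True, True]:
# columns 0-2 come from a[row_idx_a[0]] and columns 3-4 from b[row_idx_b[0]]
merged_array = np.where(mask, b[row_idx_b], a[row_idx_a])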
Related
I am trying to reshape a matrix, but I am struggling to make reshape work
Let's say that I have a (6x6) matrix A, and we want to divide it into 4 arrays (A1, A2, A3, A4). For example:
A=[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[13 14 15 16 17 18]
[19 20 21 22 23 24]
[25 26 27 28 29 30]
[31 32 33 34 35 36]]
I want to divide it into 4 parts, such as:
A=[[ 1 2 3| 4 5 6]
[ 7 8 9| 10 11 12]
[13 14 15| 16 17 18]
---------------------
[19 20 21| 22 23 24]
[25 26 27| 28 29 30]
[31 32 33| 34 35 36]]
so that
A1=[[ 1 2 3]
[ 7 8 9]
[13 14 15]]
A2= ..
A3= ..
A4=[[22 23 24]
    [28 29 30]
    [34 35 36]]
Any suggestions would help me a lot!
The smaller arrays could simply be created by slicing the bigger array.
A = np.array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36]])
A1 = A[0:3, 0:3]  # top-left block
A2 = A[3:6, 0:3]  # bottom-left block
A3 = A[0:3, 3:6]  # top-right block
A4 = A[3:6, 3:6]  # bottom-right block
When using reshape, the new array must be compatible with the old array (the total number of elements must stay the same).
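If you do want to cut all four blocks in one vectorized step rather than four slices, here is a minimal sketch using reshape plus swapaxes (the name blocks is introduced here for illustration; it assumes the block size divides the matrix evenly):

import numpy as np

A = np.arange(1, 37).reshape(6, 6)
# Reshape to (block_row, row_in_block, block_col, col_in_block), then
# swap the middle axes so blocks[i, j] is the 3x3 block at grid position (i, j).
blocks = A.reshape(2, 3, 2, 3).swapaxes(1, 2)
A1, A3 = blocks[0]  # top-left, top-right
A2, A4 = blocks[1]  # bottom-left, bottom-right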
I have a 3d array containing missing values:
arr = np.array([[[ 1, 13], [ 2, 14], [ 3, np.nan]],
                [[ 4, 16], [ 5, 17], [ 6, 18]],
                [[np.nan, 19], [ 8, 20], [ 9, 21]],
                [[10, 22], [11, 23], [12, np.nan]]])
I would like to perform imputation to replace those missing values, preferably using nearest neighbors. I tried looking into the sklearn.impute module, but none of its functions accept a 3d array. I know I could flatten the array, but that would result in a loss of spatial information. Are there any alternatives?
EDIT:
The array has a 3d spatial configuration and in the real world might look like this:
layer 2
13 14 nan
16 17 18
19 20 21
22 23 nan
layer 1
1 2 3
4 5 6
nan 8 9
10 11 12
for example, value 1 is a neighbor of 2 and 4 in layer 1.
By flattening arr,
[[ 1, 13],
[ 2, 14],
[ 3, np.nan],
[ 4, 16],
[ 5, 17],
[ 6, 18],
[ np.nan, 19],
[ 8, 20],
[ 9, 21],
[10, 22],
[11, 23],
[12, np.nan]]
it looks as though 4 is farther away from 1 than 2 is, but it isn't. 1 is just as close to 2 as it is to 4, just in different dimensions.
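One workaround, since the sklearn imputers only accept 2d input, is to make the spatial position part of the features instead of flattening it away: build one row per cell holding its coordinates plus its value, and let KNNImputer find neighbors in coordinate space. This is only a sketch of the idea, not a tested recipe; you may want to scale the coordinate columns so they dominate the distance:

import numpy as np
from sklearn.impute import KNNImputer

# arr as defined above, shape (4, 3, 2)
coords = np.indices(arr.shape).reshape(3, -1).T    # (n_cells, 3) integer coordinates
table = np.column_stack([coords, arr.ravel()])     # one row per cell: (i, j, k, value)

# KNNImputer fills the NaN value column from rows that are close on the
# remaining (coordinate) columns, so spatial neighbors drive the imputation.
imputed = KNNImputer(n_neighbors=4).fit_transform(table)
arr_imputed = imputed[:, -1].reshape(arr.shape)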
So I have a source array like this:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 100]
[ 0 100 33 100]
[ 3 110 22 100]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 100]]
and I want to update that array with this one, depending on the first column:
[[ 3 110 22 105]
[ 5 105 17 110]
[ 1 95 28 115]]
so that it becomes this:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 110]
[ 0 100 33 100]
[ 3 110 22 105]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 115]]
but I can't find a function in NumPy that can do this directly, so currently I have no better method than this one I wrote:
def update_ary_with_ary(source, updates):
    for row in updates:
        # find the row in source whose first column matches this update's key
        row_idx = np.argwhere(source[:, 0] == row[0])
        source[row_idx] = row
This function uses a loop, so it is not idiomatic and does not have high performance. I will use it until someone gives me a better way with NumPy; I don't want a solution from another library, just NumPy.
Assuming your source array is s and update array is u, and assuming that s and u are not huge, you can do:
update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]  # broadcast to a (3, 10) match matrix; axis-1 indices are row positions in s
s[update_row_ids] = u
Testing:
import numpy as np
s = np.array(
[[ 9, 85, 32, 100],
[ 7, 80, 30, 100],
[ 2, 90, 16, 100],
[ 6, 120, 22, 100],
[ 5, 105, 17, 100],
[ 0, 100, 33, 100],
[ 3, 110, 22, 100],
[ 4, 80, 22, 100],
[ 8, 115, 19, 100],
[ 1, 95, 28, 100]])
u = np.array(
[[ 3, 110, 22, 105],
[ 5, 105, 17, 110],
[ 1, 95, 28, 115]])
update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u
print(s)
This prints:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 110]
[ 0 100 33 100]
[ 3 110 22 105]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 115]]
Edit:
OP has provided the following additional details:
The "source array" is "huge".
Each row in the "update array" matches
exactly one row in the "source array".
Based on this additional detail, the following alternative solution might provide a better performance, especially if the source array does not have its rows sorted on the first column:
sorted_idx = np.argsort(s[:,0])                             # permutation that sorts the keys in s
pos = np.searchsorted(s[:,0], u[:,0], sorter=sorted_idx)    # position of each update key in the sorted view
update_row_ids = sorted_idx[pos]                            # map back to row numbers in the original s
s[update_row_ids] = u
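Traced on the sample s and u from the Testing section above:

# s[:,0]          -> [9, 7, 2, 6, 5, 0, 3, 4, 8, 1]
# sorted_idx      -> [5, 9, 2, 6, 7, 4, 3, 1, 8, 0]   (rows of s in key order 0..9)
# u[:,0]          -> [3, 5, 1]
# pos             -> [3, 5, 1]   (where each update key sits in the sorted view)
# sorted_idx[pos] -> [6, 4, 9]   i.e. rows 6, 4 and 9 of s are overwritten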
fountainhead, your answer works correctly and, yes, it uses only NumPy, but in my performance test it doubled the processing time for 50K rows in my simulation program, from 22 seconds to 44 seconds. I don't know why. But your answer helped me get to the right answer, using only this line:
source[updates[:,0]] = updates
# or
s[u[:,0]] = u
# caveat: this uses the first column directly as a row index, so it only
# works when row k of the source array carries key k in its first column
When I use this, it lowers the processing time for 100K rows to only 0.5 seconds, and it lets me process something like 1M rows in only 5 seconds. I am still learning Python and data mining, and I am shocked by these numbers; this never happened in the other languages where I worked with huge arrays like regular variables. You can see it on my GitHub:
https://github.com/qahmad81/war_simulation
fountainhead, you should get the accepted answer, but visitors should know the best answer to use.
If I have a tensor of shape (30,40,50) and I want to flatten it down to a second-order tensor of shape (30,2000), I don't know if TensorFlow has an API that implements it.
import tensorflow as tf
import numpy as np
data1 = tf.constant([
    [[2,5,7,8],[6,4,9,10],[14,16,86,54]],
    [[16,43,65,76],[43,65,7,24],[15,75,23,75]]])
data5 = tf.reshape(data1, [3,8])
data2, data3, data4 = tf.split(data1, 3, 1)
data6 = tf.reshape(data2, [1,8])
data7 = tf.reshape(data3, [1,8])
data8 = tf.reshape(data4, [1,8])
data9 = tf.concat([data6, data7, data8], 0)
with tf.Session() as sess:
    print(sess.run(data5))
    print(sess.run(data9))
This gives:
data5
[[ 2 5 7 8 6 4 9 10]
[14 16 86 54 16 43 65 76]
[43 65 7 24 15 75 23 75]]
data9
[[ 2 5 7 8 16 43 65 76]
[ 6 4 9 10 43 65 7 24]
[14 16 86 54 15 75 23 75]]
How do I get data9 directly?
Looks like you're trying to take the sub-tensors ranging across axis 0 (data1[0], data1[1], ...) and concatenate them along axis 2.
Transposing before reshaping should do the trick:
tf.reshape(tf.transpose(data1, [1,0,2]), [data1.shape[1], data1.shape[0] * data1.shape[2]])
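If the dimensions are only known at run time, the same reshape can be written with a -1 wildcard so that TensorFlow infers the merged dimension (a sketch, staying with the TF1-style graph code used in the question):

data9 = tf.reshape(tf.transpose(data1, [1, 0, 2]), [tf.shape(data1)[1], -1])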
You can try:
data9 = tf.layers.flatten(tf.transpose(data1, perm=[1, 0, 2]))
Output:
array([[ 2, 5, 7, 8, 16, 43, 65, 76],
[ 6, 4, 9, 10, 43, 65, 7, 24],
[14, 16, 86, 54, 15, 75, 23, 75]], dtype=int32)
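For reference, the equivalent move in plain NumPy, which may make the transpose-then-merge pattern easier to see (a sketch using the same sample data):

import numpy as np

data1 = np.array([[[2,5,7,8],[6,4,9,10],[14,16,86,54]],
                  [[16,43,65,76],[43,65,7,24],[15,75,23,75]]])
# Bring axis 1 to the front, then merge the remaining two axes.
data9 = data1.transpose(1, 0, 2).reshape(3, -1)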
Suppose I have a MultiIndex DataFrame similar to an example from the MultiIndex docs.
>>> df
0 1 2 3
first second
bar one 0 1 2 3
two 4 5 6 7
baz one 8 9 10 11
two 12 13 14 15
foo one 16 17 18 19
two 20 21 22 23
qux one 24 25 26 27
two 28 29 30 31
I want to generate a NumPy array from this DataFrame with a 3-dimensional structure like
>>> desired_arr
array([[[ 0, 4],
[ 1, 5],
[ 2, 6],
[ 3, 7]],
[[ 8, 12],
[ 9, 13],
[10, 14],
[11, 15]],
[[16, 20],
[17, 21],
[18, 22],
[19, 23]],
[[24, 28],
[25, 29],
[26, 30],
[27, 31]]])
How can I do so?
Hopefully it is clear what is happening here - I am effectively unstacking the DataFrame by the first level and then trying to turn each top level in the resulting column MultiIndex to its own 2-dimensional array.
I can get half way there with
>>> df.unstack(1)
0 1 2 3
second one two one two one two one two
first
bar 0 4 1 5 2 6 3 7
baz 8 12 9 13 10 14 11 15
foo 16 20 17 21 18 22 19 23
qux 24 28 25 29 26 30 27 31
but then I am struggling to find a nice way to turn each column into a 2-dimensional array and then join them together, beyond doing so explicitly with loops and lists.
I feel like there should be some way for me to specify the shape of my desired NumPy array beforehand, fill it with np.nan and then use a specific iteration order to fill in the values from my DataFrame, but I have not managed to solve the problem with this approach yet.
To generate the sample DataFrame:

import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8*4).reshape((8, 4)), index=ind)
Some reshape and swapaxes magic -
df.values.reshape(4,2,-1).swapaxes(1,2)
Generalizable to -
m,n = len(df.index.levels[0]), len(df.index.levels[1])
arr = df.values.reshape(m,n,-1).swapaxes(1,2)
Basically splitting the first axis into two of lengths 4 and 2 creating a 3D array and then swapping the last two axes, i.e. pushing in the axis of length 2 to the back (as the last one).
Sample output -
In [35]: df.values.reshape(4,2,-1).swapaxes(1,2)
Out[35]:
array([[[ 0, 4],
[ 1, 5],
[ 2, 6],
[ 3, 7]],
[[ 8, 12],
[ 9, 13],
[10, 14],
[11, 15]],
[[16, 20],
[17, 21],
[18, 22],
[19, 23]],
[[24, 28],
[25, 29],
[26, 30],
[27, 31]]])
To complete the answer of @Divakar, a multidimensional generalisation:

# sort values by index
A = df.sort_index()
# fill the missing index combinations with 0
for idx in A.index.names:
    A = A.unstack(idx).fillna(0).stack(1)
# create a tuple with the right dimensions
reshape_size = tuple([len(x) for x in A.index.levels])
# reshape (note: this assumes a single value column; with several
# columns, append -1 to the shape, i.e. reshape_size + (-1,))
arr = np.reshape(A.values, reshape_size).swapaxes(0,1)