If I have a tensor of shape (30, 40, 50) and I want to flatten it out to a second-order tensor of shape (30, 2000), I don't know whether TensorFlow has an API that implements this.
import tensorflow as tf

data1 = tf.constant([
    [[2, 5, 7, 8], [6, 4, 9, 10], [14, 16, 86, 54]],
    [[16, 43, 65, 76], [43, 65, 7, 24], [15, 75, 23, 75]]])
data5 = tf.reshape(data1, [3, 8])
data2, data3, data4 = tf.split(data1, 3, 1)
data6 = tf.reshape(data2, [1, 8])
data7 = tf.reshape(data3, [1, 8])
data8 = tf.reshape(data4, [1, 8])
data9 = tf.concat([data6, data7, data8], 0)
with tf.Session() as sess:
    print(sess.run(data5))
    print(sess.run(data9))
This gives:
data5
[[ 2 5 7 8 6 4 9 10]
[14 16 86 54 16 43 65 76]
[43 65 7 24 15 75 23 75]]
data9
[[ 2 5 7 8 16 43 65 76]
[ 6 4 9 10 43 65 7 24]
[14 16 86 54 15 75 23 75]]
How do I get data9 directly?
Looks like you're trying to take the sub-tensors ranging across axis 0 (data1[0], data1[1], ...) and concatenate them along axis 2.
Transposing before reshaping should do the trick:
tf.reshape(tf.transpose(data1, [1,0,2]), [data1.shape[1], data1.shape[0] * data1.shape[2]])
You can try:
data9 = tf.layers.flatten(tf.transpose(data1, perm=[1, 0, 2]))
Output:
array([[ 2, 5, 7, 8, 16, 43, 65, 76],
[ 6, 4, 9, 10, 43, 65, 7, 24],
[14, 16, 86, 54, 15, 75, 23, 75]], dtype=int32)
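For anyone checking the logic outside a TF session, the same transpose-then-reshape trick can be sketched in plain NumPy (using the data1 values from the question):

```python
import numpy as np

data1 = np.array([
    [[2, 5, 7, 8], [6, 4, 9, 10], [14, 16, 86, 54]],
    [[16, 43, 65, 76], [43, 65, 7, 24], [15, 75, 23, 75]]])

# Move axis 1 to the front, then merge the remaining two axes:
# (2, 3, 4) -> (3, 2, 4) -> (3, 8)
data9 = data1.transpose(1, 0, 2).reshape(3, -1)
print(data9)
```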
Related
I am trying to reshape a matrix, but I am struggling to make reshape work
Let's say I have a (6x6) matrix A, and I want to divide it into 4 arrays (A1, A2, A3, A4). For example:
A=[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[13 14 15 16 17 18]
[19 20 21 22 23 24]
[25 26 27 28 29 30]
[31 32 33 34 35 36]]
I want to divide it into 4 parts, like this:
A=[[ 1 2 3| 4 5 6]
[ 7 8 9| 10 11 12]
[13 14 15| 16 17 18]
---------------------
[19 20 21| 22 23 24]
[25 26 27| 28 29 30]
[31 32 33| 34 35 36]]
such as
A1=[[ 1 2 3]
[ 7 8 9]
[13 14 15]]
A2= ..
A3= ..
A4=[[22 23 24]
    [28 29 30]
    [34 35 36]]
Any suggestions would help me a lot!
The smaller arrays could simply be created by slicing the bigger array.
A = np.array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36]])
A1 = A[0:3, 0:3]
A2 = A[3:6, 0:3]
A3 = A[0:3, 3:6]
A4 = A[3:6, 3:6]
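As a side note, if you want all four blocks at once rather than four separate slices, a reshape/swapaxes sketch works whenever the block size divides the matrix evenly (here 6 = 2 × 3):

```python
import numpy as np

A = np.arange(1, 37).reshape(6, 6)

# Split a (6, 6) matrix into a 2x2 grid of (3, 3) blocks.
# After reshape the axes are (row-block, row-in-block, col-block, col-in-block);
# swapping axes 1 and 2 brings the two block axes to the front.
blocks = A.reshape(2, 3, 2, 3).swapaxes(1, 2)   # shape (2, 2, 3, 3)
A1, A3 = blocks[0]   # top-left, top-right
A2, A4 = blocks[1]   # bottom-left, bottom-right
print(A4)
```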
When using reshape, the new shape must be compatible with the old one (the total number of elements must stay the same).
So I have a source array like this:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 100]
[ 0 100 33 100]
[ 3 110 22 100]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 100]]
and I want to update the array with this one, depending on the first column:
[[ 3 110 22 105]
[ 5 105 17 110]
[ 1 95 28 115]]
to be like this
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 110]
[ 0 100 33 100]
[ 3 110 22 105]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 115]]
but I can't find a NumPy function that does this directly, so currently I have nothing better than this method I wrote:
def update_ary_with_ary(source, updates):
    for x in updates:
        index_of_col = np.argwhere(source[:, 0] == x[0])
        source[index_of_col] = x
This function uses a Python loop, so it is neither elegant nor fast. I will use it until someone shows me a better way with the NumPy library; I don't want a solution from another library, just NumPy.
Assuming your source array is s and update array is u, and assuming that s and u are not huge, you can do:
update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u
Testing:
import numpy as np
s = np.array(
[[ 9, 85, 32, 100],
[ 7, 80, 30, 100],
[ 2, 90, 16, 100],
[ 6, 120, 22, 100],
[ 5, 105, 17, 100],
[ 0, 100, 33, 100],
[ 3, 110, 22, 100],
[ 4, 80, 22, 100],
[ 8, 115, 19, 100],
[ 1, 95, 28, 100]])
u = np.array(
[[ 3, 110, 22, 105],
[ 5, 105, 17, 110],
[ 1, 95, 28, 115]])
update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u
print(s)
This prints:
[[ 9 85 32 100]
[ 7 80 30 100]
[ 2 90 16 100]
[ 6 120 22 100]
[ 5 105 17 110]
[ 0 100 33 100]
[ 3 110 22 105]
[ 4 80 22 100]
[ 8 115 19 100]
[ 1 95 28 115]]
Edit:
OP has provided the following additional details:
The "source array" is "huge".
Each row in the "update array" matches
exactly one row in the "source array".
Based on this additional detail, the following alternative solution might provide a better performance, especially if the source array does not have its rows sorted on the first column:
sorted_idx = np.argsort(s[:,0])
pos = np.searchsorted(s[:,0],u[:,0],sorter=sorted_idx)
update_row_ids = sorted_idx[pos]
s[update_row_ids] = u
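A self-contained sanity check of this searchsorted variant, using a shuffled four-row subset of the sample data:

```python
import numpy as np

s = np.array([[9,  85, 32, 100],
              [5, 105, 17, 100],
              [3, 110, 22, 100],
              [1,  95, 28, 100]])
u = np.array([[3, 110, 22, 105],
              [5, 105, 17, 110],
              [1,  95, 28, 115]])

# Sort keys once, then locate each update key via binary search.
sorted_idx = np.argsort(s[:, 0])
pos = np.searchsorted(s[:, 0], u[:, 0], sorter=sorted_idx)
update_row_ids = sorted_idx[pos]
s[update_row_ids] = u
print(s)
```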
fountainhead, your answer works correctly and uses only NumPy, but in my performance test it doubled the processing time for 50K rows in my simulation program, from 22 seconds to 44 seconds, and I don't know why. Still, your answer helped me find the right approach, using only this line:
source[updates[:,0]] = updates
# or
s[u[:,0]] = u
When I use this, processing 100K rows takes only 0.5 seconds, and 1M rows takes only 5 seconds. I am still learning Python and data mining, and I am shocked by these numbers; this never happened in other languages when I worked with huge arrays like ordinary variables. You can see it on my GitHub:
https://github.com/qahmad81/war_simulation
fountainhead, you should get the accepted answer, but visitors should know which approach performs best.
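One caveat worth adding for visitors: the fast one-liner s[u[:, 0]] = u treats the key column as row positions, so it is only equivalent to the matching-based solutions when every key k actually sits at row k. A minimal sketch of both cases:

```python
import numpy as np

# Case 1: keys equal row positions -- the shortcut is correct.
s = np.array([[0, 10], [1, 20], [2, 30]])
u = np.array([[1, 99]])
s[u[:, 0]] = u        # writes to row 1, which is the row whose key is 1
print(s[1])           # [ 1 99]

# Case 2: keys shuffled -- the shortcut writes to the wrong row.
s2 = np.array([[2, 10], [0, 20], [1, 30]])
s2[u[:, 0]] = u       # writes to row 1, whose key is 0, not 1
print(s2)             # the row that held key 0 was overwritten
```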
I have two data frames:
import pandas as pd
df = pd.DataFrame({'x': [10, 47, 58, 68, 75, 80],
'y': [10, 9, 8, 7, 6, 5]})
df2 = pd.DataFrame({'x': [45, 55, 66, 69, 79, 82], 'y': [10, 9, 8, 7, 6, 5]})
df
x y
10 10
47 9
58 8
68 7
75 6
80 5
df2
x y
45 10
55 9
66 8
69 7
79 6
82 5
I want to interpolate between them and generate a new data frame with a sampling rate of N. Assume N=3 for this example.
The desired output is
x y
10 10
27.5 10
45 10
...
75 6
77 6
79 6
80 5
81 5
82 5
How can I use my data frames to create the desired output?
If you don't mind using numpy, this solution will give you your desired output:
import pandas as pd
import numpy as np
N = 3
df = pd.DataFrame({'x': [10, 47, 58, 68, 75, 80],
'y': [10, 9, 8, 7, 6, 5]})
df2 = pd.DataFrame({'x': [45, 55, 66, 69, 79, 82], 'y': [10, 9, 8, 7, 6, 5]})
new_x = np.array([np.linspace(i, j, N) for i, j in zip(df['x'], df2['x'])]).flatten()
new_y = df['y'].loc[np.repeat(df.index.values, N)]
final_df = pd.DataFrame({'x': new_x, 'y': new_y})
print(final_df)
Output
x y
0 10.0 10
1 27.5 10
2 45.0 10
3 47.0 9
...
15 80.0 5
16 81.0 5
17 82.0 5
I have a dataset with columns 0 to 10, and I would like to extract only the information in columns 1 to 5 and 7 to 9 (that is, skipping column 6 and the last column). So far, I have tried the following:
A = B[:, [[1:5], [7:-1]]]
but I get a syntax error. How can I retrieve that data?
Advanced indexing doesn't take a list of lists of slices. Instead, you can use numpy.r_. This function doesn't take negative indices, but you can get round this by using np.ndarray.shape:
A = B[:, np.r_[1:6, 7:B.shape[1]-1]]
Remember to add 1 to the stop value of each range, since a:b does not include b, in the same way slice(a, b) does not include b. Also note that indexing begins at 0.
Here's a demo:
import numpy as np
B = np.random.randint(0, 10, (3, 11))
print(B)
[[5 8 8 8 3 0 7 2 1 6 7]
[4 3 8 7 3 7 5 6 0 5 7]
[1 0 4 0 2 2 5 1 4 2 3]]
A = B[:,np.r_[1:6, 7:B.shape[1]-1]]
print(A)
[[8 8 8 3 0 2 1 6]
[3 8 7 3 7 6 0 5]
[0 4 0 2 2 1 4 2]]
Another way would be to get your slices independently, and then concatenate:
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)
Using similar example data as #jpp:
B = np.random.randint(0, 10, (3, 10))
>>> B
array([[0, 5, 0, 6, 8, 5, 9, 3, 2, 0],
[8, 8, 1, 7, 3, 5, 7, 7, 4, 8],
[5, 5, 5, 2, 3, 1, 6, 4, 9, 6]])
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)
>>> A
array([[5, 0, 6, 8, 5, 3, 2],
[8, 1, 7, 3, 5, 7, 4],
[5, 5, 2, 3, 1, 4, 9]])
How about taking the union of the ranges?
B[:, np.union1d(range(1,6), range(7,10))]
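For completeness, a runnable sketch of the union approach, using an 11-column B like the other answers (the data values here are just placeholders):

```python
import numpy as np

B = np.arange(33).reshape(3, 11)    # stand-in data with 11 columns

# union1d merges both ranges into one sorted index array: [1 2 3 4 5 7 8 9]
A = B[:, np.union1d(range(1, 6), range(7, 10))]
print(A.shape)                      # (3, 8)
```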
Just to add some of my thoughts. There are two approaches one can take using either numpy or pandas. So I will demonstrate with some data, and assume that the data is the grades for a student in different courses he/she is enrolled in.
import pandas as pd
import numpy as np
data = {'Course A': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
'Course B': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
'Course C': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78],
'Course D': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
'Course E': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
'Course F': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]
}
df = pd.DataFrame(data=data)
df.head()
CA CB CC CD CE CF
0 84 85 97 84 85 97
1 82 82 94 82 82 94
2 81 72 93 81 72 93
3 89 77 95 89 77 95
4 73 75 88 73 75 88
NOTE: CA through CF represent Course A through Course F.
To help us remember column names and their associated indexes, we can build a list of columns and their indexes via list comprehension.
map_cols = [f"{c[0]}:{c[1]}" for c in enumerate(df.columns)]
['0:Course A',
'1:Course B',
'2:Course C',
'3:Course D',
'4:Course E',
'5:Course F']
Now, to select say Course A, and Course D through Course F using indexing in numpy, you can do the following:
df.iloc[:, np.r_[0, 3:df.shape[1]]]
CA CD CE CF
0 84 84 85 97
1 82 82 82 94
2 81 81 72 93
3 89 89 77 95
4 73 73 75 88
You can also use pandas to the same effect.
df[[df.columns[0], *df.columns[3:]]]
CA CD CE CF
0 84 84 85 97
1 82 82 82 94
2 81 81 72 93
3 89 89 77 95
4 73 73 75 88
One can solve this by concatenating two ranges:
[In]: columns = list(range(1,6)) + list(range(7,10))
[Out]:
[1, 2, 3, 4, 5, 7, 8, 9]
Then, assuming your DataFrame is called df, use iloc to select the columns:
newdf = df.iloc[:, columns]
So I found this:
When converting MATLAB code it might be necessary to first reshape a
matrix to a linear sequence, perform some indexing operations and then
reshape back. As reshape (usually) produces views onto the same
storage, it should be possible to do this fairly efficiently.
Note that the scan order used by reshape in Numpy defaults to the 'C'
order, whereas MATLAB uses the Fortran order. If you are simply
converting to a linear sequence and back this doesn't matter. But if
you are converting reshapes from MATLAB code which relies on the scan
order, then this MATLAB code:
z = reshape(x,3,4);
should become
z = x.reshape(3,4,order='F').copy()
in Numpy.
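The difference between the two scan orders is easy to see on a small example:

```python
import numpy as np

x = np.arange(1, 13)                   # 1..12

c_order = x.reshape(3, 4)              # NumPy default 'C' order: fills rows first
f_order = x.reshape(3, 4, order='F')   # MATLAB-style 'F' order: fills columns first
print(c_order)
print(f_order)
```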
I have a 16x2 array called mafs. When I do this in MATLAB:
mafs2 = reshape(mafs,[4,4,2])
I get something different from what I get in Python with:
mafs2 = reshape(mafs,(4,4,2))
or even
mafs2 = mafs.reshape((4,4,2),order='F').copy()
Any help on this? Thank you all.
Example:
MATLAB:
>> mafs = [(1:16)' (17:32)']
mafs =
1 17
2 18
3 19
4 20
5 21
6 22
7 23
8 24
9 25
10 26
11 27
12 28
13 29
14 30
15 31
16 32
>> reshape(mafs,[4 4 2])
ans(:,:,1) =
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
ans(:,:,2) =
17 21 25 29
18 22 26 30
19 23 27 31
20 24 28 32
Python:
>>> import numpy as np
>>> mafs = np.c_[np.arange(1,17), np.arange(17,33)]
>>> mafs.shape
(16, 2)
>>> mafs[:,0]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
>>> mafs[:,1]
array([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32])
>>> r = np.reshape(mafs, (4,4,2), order="F")
>>> r.shape
(4, 4, 2)
>>> r[:,:,0]
array([[ 1, 5, 9, 13],
[ 2, 6, 10, 14],
[ 3, 7, 11, 15],
[ 4, 8, 12, 16]])
>>> r[:,:,1]
array([[17, 21, 25, 29],
[18, 22, 26, 30],
[19, 23, 27, 31],
[20, 24, 28, 32]])
I was having a similar issue myself, as I am also making the transition from MATLAB to Python. I was finally able to convert a NumPy array given in (depth, row, col) format to a single sheet of column vectors (one column per image).
In MATLAB I would have done something like:
output = reshape(imStack,[row*col,depth])
In Python this seems to translate to:
import numpy as np
output=np.transpose(imStack)
output=output.reshape((row*col, depth), order='F')
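As a quick sanity check (the sizes depth=2, row=3, col=4 are just illustrative), this translation puts each depth slice into one column of the output:

```python
import numpy as np

depth, row, col = 2, 3, 4                        # illustrative sizes
imStack = np.arange(depth * row * col).reshape(depth, row, col)

output = np.transpose(imStack)                   # shape (col, row, depth)
output = output.reshape((row * col, depth), order='F')

# Each column of the result is one depth slice, flattened row by row.
print(output.shape)                              # (12, 2)
```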