reshaping ndarrays versus regular arrays in numpy? - python

I have an object of type 'numpy.ndarray', called "myarray", that when printed to the screen using python's "print", looks like this:
[[[ 84 0 213 232] [153 0 304 363]]
[[ 33 0 56 104] [ 83 0 77 238]]
[[ 0 0 9 61] [ 0 0 2 74]]]
"myarray" is made by another library. The value of myarray.shape equals (3, 2). I expected this to be a 3dimensional array, with three indices. When I try to make this structure myself, using:
second_array = array([[[84, 0, 213, 232], [153, 0, 304, 363]],
[[33, 0, 56, 104], [83, 0, 77, 238]],
[[0, 0, 9, 61], [0, 0, 2, 74]]])
I get that second_array.shape is equal to (3, 2, 4), as expected. Why is there this difference? Also, given this, how can I reshape "myarray" so that the two columns are merged, i.e. so that the result is:
[[[ 84 0 213 232 153 0 304 363]]
[[ 33 0 56 104 83 0 77 238]]
[[ 0 0 9 61 0 0 2 74]]]
Edit: to clarify, I know that in the case of second_array, I can do second_array.reshape((3,8)). But how does this work for the ndarray which has the format of myarray but does not have a 3d index?
myarray.dtype is "object"; the individual elements are themselves ndarrays.
Edit 2: Getting closer, but still cannot quite get the ravel/flatten followed by reshape. I have:
a = array([[1, 2, 3],
[4, 5, 6]])
b = array([[ 7, 8, 9],
[10, 11, 12]])
arr = array([a, b])
I try:
arr.ravel().reshape((2,6))
But this gives [[1, 2, 3, 4, 5, 6], ...] and I wanted [[1, 2, 3, 7, 8, 9], ...]. How can this be done?
thanks.

Indeed, ravel and hstack can be useful tools for reshaping arrays:
import numpy as np
myarray = np.empty((3,2),dtype = object)
myarray[:] = [[np.array([ 84, 0, 213, 232]), np.array([153, 0, 304, 363])],
[np.array([ 33, 0, 56, 104]), np.array([ 83, 0, 77, 238])],
[np.array([ 0, 0, 9, 61]), np.array([ 0, 0, 2, 74])]]
myarray = np.hstack(myarray.ravel()).reshape(3,2,4)
print(myarray)
# [[[ 84 0 213 232]
# [153 0 304 363]]
# [[ 33 0 56 104]
# [ 83 0 77 238]]
# [[ 0 0 9 61]
# [ 0 0 2 74]]]
myarray = myarray.ravel().reshape(3,8)
print(myarray)
# [[ 84 0 213 232 153 0 304 363]
# [ 33 0 56 104 83 0 77 238]
# [ 0 0 9 61 0 0 2 74]]
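For what it's worth, np.stack offers an equivalent route (a small sketch; obj_arr is a stand-in name for the original (3, 2) object array of length-4 arrays, before myarray was reassigned above):
flat = np.stack(obj_arr.ravel())  # regular ndarray of shape (6, 4)
print(flat.reshape(3, 2, 4))      # same (3, 2, 4) result as the hstack approach
print(flat.reshape(3, 8))         # same (3, 8) result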
Regarding Edit 2:
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6]])
b = np.array([[ 7, 8, 9],
[10, 11, 12]])
arr = np.array([a, b])
print(arr)
# [[[ 1 2 3]
# [ 4 5 6]]
# [[ 7 8 9]
# [10 11 12]]]
Notice that
In [45]: arr[:,0,:]
Out[45]:
array([[1, 2, 3],
[7, 8, 9]])
Since you want the first row to be [1,2,3,7,8,9], the above shows that you want the second axis to be the first axis. This can be accomplished with the swapaxes method:
print(arr.swapaxes(0,1).reshape(2,6))
# [[ 1 2 3 7 8 9]
# [ 4 5 6 10 11 12]]
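swapaxes(0,1) is the same as reordering the axes with transpose, so this is an equivalent sketch using the same arr:
print(arr.transpose(1, 0, 2).reshape(2, 6))
# [[ 1 2 3 7 8 9]
# [ 4 5 6 10 11 12]]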
Or, given a and b (equivalently, arr[0] and arr[1]), you could form the desired 2x6 array directly with np.hstack:
arr = np.hstack([a, b])
# [[ 1 2 3 7 8 9]
# [ 4 5 6 10 11 12]]

Related

array copy and view in numpy python

I am new to numpy and only recently started learning it. I am working on a practice problem and not getting the result I expect.
The task is to replace all even elements in the array with -1.
import numpy as np
np.random.seed(123)
array6 = np.random.randint(1,50,20)
slicing_array6 = array6[array6 %2==0]
print(slicing_array6)
slicing_array6[:]= -1
print(slicing_array6)
print("Answer is:")
print(array6)
I am getting output as :
[46 18 20 34 48 10 48 26 20]
[-1 -1 -1 -1 -1 -1 -1 -1 -1]
Answer is:
[46 3 29 35 39 18 20 43 23 34 33 48 10 33 47 33 48 26 20 15]
My question is: why is the original array not replaced?
Thank you in advance for the help.
In [12]: np.random.seed(123)
...: array6 = np.random.randint(1,50,20)
...: slicing_array6 = array6[array6 %2==0]
In [13]: array6.shape
Out[13]: (20,)
In [14]: slicing_array6.shape
Out[14]: (9,)
slicing_array6 is not a view; it's a copy. It does not use or reference the array6 data:
In [15]: slicing_array6.base       # None, so IPython shows no Out line
Modifying this copy does not change array6:
In [16]: slicing_array6[:] = -1
In [17]: slicing_array6
Out[17]: array([-1, -1, -1, -1, -1, -1, -1, -1, -1])
In [18]: array6
Out[18]:
array([46, 3, 29, 35, 39, 18, 20, 43, 23, 34, 33, 48, 10, 33, 47, 33, 48,
26, 20, 15])
But if the indexing and modification occurs in the same step:
In [19]: array6[array6 %2==0] = -1
In [20]: array6
Out[20]:
array([-1, 3, 29, 35, 39, -1, -1, 43, 23, -1, 33, -1, -1, 33, 47, 33, -1,
-1, -1, 15])
slicing_array6 = array6[array6 %2==0] has actually done
array6.__getitem__(array6%2==0)
array6[array6 %2==0] = -1 does
array6.__setitem__(array6 %2==0, -1)
A __setitem__ applied to a view does change the original, the base.
An example with view that works:
In [32]: arr = np.arange(10)
In [33]: arr
Out[33]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [34]: x = arr[::3] # basic indexing, with a slice
In [35]: x
Out[35]: array([0, 3, 6, 9])
In [36]: x.base # it's a `view` of arr
Out[36]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [37]: x[:] = -1
In [38]: arr
Out[38]: array([-1, 1, 2, -1, 4, 5, -1, 7, 8, -1])
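For contrast, indexing with an integer list (advanced indexing) returns a copy, so the same in-place assignment would not touch arr (a short sketch continuing the session above):
In [39]: y = arr[[1, 4, 7]]   # advanced indexing -> copy
In [40]: y.base is None
Out[40]: True
In [41]: y[:] = 0
In [42]: arr                  # unchanged
Out[42]: array([-1, 1, 2, -1, 4, 5, -1, 7, 8, -1])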
Explanation
Let's move step by step. We start with
array6 = np.array([46, 3, 29, 35, 39, 18, 20, 43, 23, 34, 33, 48, 10, 33, 47, 33, 48, 26, 20, 15])
print(array6 %2==0)
# array([ True, False, False, False, False, True, True, False, False,
# True, False, True, True, False, False, False, True, True,
# True, False])
You just made a boolean mask (array6 % 2 == 0) from array6. Then you apply the mask to it and assign the result to a new variable:
slicing_array6 = array6[array6 %2==0]
print(slicing_array6)
# [46 18 20 34 48 10 48 26 20]
Note that this returns a new array:
print(id(array6))
# 2643833531968
print(id(slicing_array6))
# 2643833588112
print(array6)
# [46 3 29 35 39 18 20 43 23 34 33 48 10 33 47 33 48 26 20 15]
# unchanged !!
Final step, you assign all elements in slicing_array6 to -1:
slicing_array6[:]= -1
print(slicing_array6)
# [-1 -1 -1 -1 -1 -1 -1 -1 -1]
Solution
Instead of assigning the masked array to a new variable, apply the new value directly to the original array:
array6[array6 %2==0] = -1
print(array6)
print(id(array6))
# [-1 3 29 35 39 -1 -1 43 23 -1 33 -1 -1 33 47 33 -1 -1 -1 15]
# 2643833531968
# same id !!
Numpy's indexed assignment modifies the array in place, but the indexing and the assignment of the new value have to happen in the same statement.
Not to completely ruin your learning, here is a similar example to what you are trying to achieve:
import numpy as np
a = np.arange(10)**3
print(f"start: {a}")
#start: [ 0 1 8 27 64 125 216 343 512 729]
# to assign every 3rd item as -99
a[2::3] = -99
print(f"answer: {a}")
#answer: [ 0 1 -99 27 64 -99 216 343 -99 729]
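If building a new array is acceptable instead of modifying in place, np.where expresses the same idea (a small sketch using the array6 from above):
array6 = np.where(array6 % 2 == 0, -1, array6)
# every even element becomes -1, odd elements are kept as they are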

Is there a method of vectorizing the printing of elements in a Numpy array?

I have a numpy array named "a":
a = numpy.array([
[[1, 2, 3], [11, 22, 33]],
[[4, 5, 6], [44, 55, 66]],
])
I want to print the following (in this exact format):
1 2 3
11 22 33
4 5 6
44 55 66
To accomplish this, I wrote the following:
for i in range(len(a)):
    block = a[i]
    for j in range(len(block)):
        a1 = block[j][0]
        a2 = block[j][1]
        a3 = block[j][2]
        print(a1, a2, a3)
The output is:
1 2 3
11 22 33
4 5 6
44 55 66
I would like to vectorize my solution (if possible) and discard the for loop. I understand that this problem might not benefit from vectorization. In reality (for work-related purposes), the array "a" has 52 elements and each element contains hundreds of arrays. I'd like to solve a basic/trivial case and move on to a more advanced, realistic case.
Also, I know that Numpy arrays were not meant to be iterated through.
I could have used Python lists to accomplish this, but I really want to vectorize it (if possible, of course).
You could use np.apply_along_axis, which maps a function over an array along an arbitrary axis. Apply it on axis=2 to get the desired result.
Using print directly as the callback:
>>> np.apply_along_axis(print, 2, a)
[1 2 3]
[11 22 33]
[4 5 6]
[44 55 66]
Or with a lambda wrapper:
>>> np.apply_along_axis(lambda r: print(' '.join([str(x) for x in r])), 2, a)
1 2 3
11 22 33
4 5 6
44 55 66
In [146]: a = numpy.array([
...: [[1, 2, 3], [11, 22, 33]],
...: [[4, 5, 6], [44, 55, 66]],
...: ])
...:
In [147]: a
Out[147]:
array([[[ 1, 2, 3],
[11, 22, 33]],
[[ 4, 5, 6],
[44, 55, 66]]])
A proper "vectorized" numpy output is:
In [148]: a.reshape(-1,3)
Out[148]:
array([[ 1, 2, 3],
[11, 22, 33],
[ 4, 5, 6],
[44, 55, 66]])
You could also convert that to a list of lists:
In [149]: a.reshape(-1,3).tolist()
Out[149]: [[1, 2, 3], [11, 22, 33], [4, 5, 6], [44, 55, 66]]
But you want a print without the standard numpy formatting (or list formatting). That iteration is easy:
In [150]: for row in a.reshape(-1,3):
...: print(*row)
...:
1 2 3
11 22 33
4 5 6
44 55 66
Since your desired output is a print, or at least "unformatted" strings, there's no "vectorized", i.e. whole-array, option. You have to iterate on each line!
np.savetxt creates a csv output by iterating on rows and writing a format tuple, e.g. f.write(fmt%tuple(row)).
In [155]: np.savetxt('test', a.reshape(-1,3), fmt='%d')
In [156]: cat test
1 2 3
11 22 33
4 5 6
44 55 66
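If you'd rather see that savetxt output on screen than in a file, recent numpy versions also accept a text file handle such as sys.stdout (a small sketch):
import sys
np.savetxt(sys.stdout, a.reshape(-1, 3), fmt='%d')
# 1 2 3
# 11 22 33
# 4 5 6
# 44 55 66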
To get that exact output without iterating, try this:
print(str(a.tolist()).replace('], [', '\n').replace('[', '').replace(']', '').replace(',', ''))
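A string join gives the same print without an explicit for statement (a sketch; it still walks the rows of the reshaped array inside the generator):
print('\n'.join(' '.join(map(str, row)) for row in a.reshape(-1, 3)))
# 1 2 3
# 11 22 33
# 4 5 6
# 44 55 66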

Need to find the greatest sum of five adjacent numbers in a 25x25 grid

import numpy as np
Bees = open("BeesData.txt", "r")
Bees = Bees.read()
Bees = Bees.split( )
for i in range(0, len(Bees)):
    Bees[i] = int(Bees[i])
Bees = np.reshape(Bees, (25, 25))
def suma5x5(x, y):
    suma = 0
This is the BeesData file:
https://drive.google.com/file/d/1aWcLZq2MuGENavoTnCfokXr1Nnyygz23/view?usp=sharing
(screenshot: https://i.stack.imgur.com/G9u5P.png)
I took this as kind of a learning exercise for myself. Not a numpy user really.
TLDR:
search_area = np.arange(25).reshape(5,5)
search_kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
results = convolve2d(search_area, search_kernel, mode='same')
max_index = np.argmax(results, axis=None)
max_location = np.unravel_index(max_index, results.shape)
print(max_location)
Assuming that 5 adjacent means: up, down, left, right, center, then you can find the value using a convolution.
Assume that we want to find the sum of every 3x3 block, only for the values marked as 1's:
[[0, 1, 0],
[1, 1, 1],
[0, 1, 0]]
This shape can be used as your kernel for a convolution. It sums the five marked values in each 3x3 neighbourhood by multiplying the corresponding entries by 1 and everything else by 0. E.g. for
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
You would get 1x0 + 2x1 + 3x0 + 4x1 + 5x1 + 6x1 + 7x0 + 8x1 + 9x0 = 25
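That arithmetic is just an element-wise multiply followed by a sum, which is easy to check with plain numpy (a small sketch):
import numpy as np
kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
block = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print((kernel * block).sum())  # 25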
scipy has a function for this called convolve2d.
import numpy as np
from scipy.signal import convolve2d
search_area = np.arange(36).reshape(6,6)
search_kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
results = convolve2d(search_area, search_kernel)
print(search_area)
print(results)
This outputs:
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
[[ 0 0 1 2 3 4 5 0]
[ 0 7 10 14 18 22 20 5]
[ 6 25 35 40 45 50 43 11]
[ 12 49 65 70 75 80 67 17]
[ 18 73 95 100 105 110 91 23]
[ 24 97 125 130 135 140 115 29]
[ 30 85 118 122 126 130 98 35]
[ 0 30 31 32 33 34 35 0]]
Because we included the edges as part of the convolution, you'll see that the result size is now 8x8 instead of the original 6x6. For values it couldn't find because they go off the edge of the array, the method assumed a value of zero.
To discard the edges you can pass mode='same', which makes the output the same size as the input by dropping those extra edge rows and columns from the result:
results = convolve2d(search_area, search_kernel, mode='same')
print(search_area)
print(results)
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
[[ 7 10 14 18 22 20]
[ 25 35 40 45 50 43]
[ 49 65 70 75 80 67]
[ 73 95 100 105 110 91]
[ 97 125 130 135 140 115]
[ 85 118 122 126 130 98]]
Now to find the location with the most bees, you can use argmax to get the index of the largest value, and unravel_index to get this as a location in the original shape.
max_index = np.argmax(results, axis=None)
max_location = np.unravel_index(max_index, results.shape)
print(max_location)
(4, 4)
Assuming a window size of 5x5, based on the last two lines of your code:
def suma5x5(x, y):
    suma = 0
I tried a simple approach using basic Python plus numpy operations:
arr = np.array([[2, 3, 7, 4, 6, 2, 9],
[6, 6, 9, 8, 7, 4, 3],
[3, 4, 8, 3, 8, 9, 7],
[7, 8, 3, 6, 6, 3, 4],
[4, 2, 1, 8, 3, 4, 6],
[3, 2, 4, 1, 9, 8, 3],
[0, 1, 3, 9, 2, 1, 4]])
w_size = 5
res = max(
    arr[row:row + w_size, col:col + w_size].sum()
    for row in range(arr.shape[0] - w_size + 1)
    for col in range(arr.shape[1] - w_size + 1)
)
print(res) # 138
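A scipy-based equivalent of that sliding window (a sketch, reusing convolve2d from the first answer with an all-ones 5x5 kernel on the same 7x7 arr; mode='valid' keeps only windows that fit entirely inside the array):
from scipy.signal import convolve2d
window_sums = convolve2d(arr, np.ones((5, 5), dtype=int), mode='valid')
print(window_sums.max())  # 138, matching the result above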

obtain elements of array in column based on the values of element in a different array in another column in pandas

I have the following data set in pandas.
import numpy as np
import pandas as pd
events = ['event1', 'event2', 'event3', 'event4', 'event5', 'event6']
wells = [np.array([1, 2]), np.array([1, 3]), np.array([1]),
np.array([4, 5, 6]), np.array([4, 5, 6]), np.array([7, 8])]
traces_per_well = [np.array([24, 24]), np.array([24, 21]), np.array([18]),
np.array([24, 24, 24]), np.array([24, 21, 24]), np.array([18, 21])]
df = pd.DataFrame({"event_no": events, "well_array": wells,
"trace_per_well": traces_per_well})
df["total_traces"] = df['trace_per_well'].apply(np.sum)
df['supposed_traces_no'] = df['well_array'].apply(lambda x: len(x)*24)
df['pass'] = df['total_traces'] == df['supposed_traces_no']
print(df)
the output is printed below:
event_no well_array trace_per_well total_traces supposed_traces_no pass
0 event1 [1, 2] [24, 24] 48 48 True
1 event2 [1, 3] [24, 21] 45 48 False
2 event3 [1] [18] 18 24 False
3 event4 [4, 5, 6] [24, 24, 24] 72 72 True
4 event5 [4, 5, 6] [24, 21, 24] 69 72 False
5 event6 [7, 8] [18, 21] 39 48 False
I want to create two new columns: one holding the elements of the numpy array in trace_per_well that are not equal to 24, and another holding the corresponding elements of well_array.
The result should look like this.
event_no well_array trace_per_well total_traces supposed_traces_no pass wrong_trace_in_well wrong_well
0 event1 [1, 2] [24, 24] 48 48 True NaN NaN
1 event2 [1, 3] [24, 21] 45 48 False 21 3
2 event3 [1] [18] 18 24 False 18 1
3 event4 [4, 5, 6] [24, 24, 24] 72 72 True NaN NaN
4 event5 [4, 5, 6] [24, 21, 24] 69 72 False 21 5
5 event6 [7, 8] [18, 21] 39 48 False (18, 21) (7, 8)
Any help is greatly appreciated!
I would do this with a list comprehension. Generate your result in a single pass of the data and then assign to appropriate columns.
v = pd.Series(
[list(zip(*((x, y) for x, y in zip(X, Y) if x != 24)))
for X, Y in zip(df['trace_per_well'], df['well_array'])])
df['wrong_trace_in_well'] = v.str[0]
df['wrong_well'] = v.str[-1]
df[['wrong_trace_in_well', 'wrong_well']]
wrong_trace_in_well wrong_well
0 NaN NaN
1 (21,) (3,)
2 (18,) (1,)
3 NaN NaN
4 (21,) (5,)
5 (18, 21) (7, 8)
Alternatively, if you want to do this in multiple passes, then
df['wrong_trace_in_well'] = [[x for x in X if x != 24] for X in df['trace_per_well']]
df['wrong_well'] = [
[y for x, y in zip(X, Y) if x != 24]
for X, Y in zip(df['trace_per_well'], df['well_array'])]
df[['wrong_trace_in_well', 'wrong_well']]
wrong_trace_in_well wrong_well
0 [] []
1 [21] [3]
2 [18] [1]
3 [] []
4 [21] [5]
5 [18, 21] [7, 8]
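A row-wise pandas alternative (a sketch; it relies on each cell already being a numpy array, so a boolean mask per row does the filtering; wrong_per_row is a hypothetical helper name):
def wrong_per_row(row):
    mask = row['trace_per_well'] != 24
    return pd.Series({'wrong_trace_in_well': list(row['trace_per_well'][mask]),
                      'wrong_well': list(row['well_array'][mask])})

wrong = df.apply(wrong_per_row, axis=1)
df['wrong_trace_in_well'] = wrong['wrong_trace_in_well']
df['wrong_well'] = wrong['wrong_well']
# rows with no mismatch come out as empty lists, as in the multi-pass version above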

Selecting a range of columns in a dataframe

I have a dataset that consists of columns 0 to 10, and I would like to extract only the information in columns 1 to 5 (skipping 6) and 7 to 9 (i.e. not the last column). So far, I have done the following:
A = B[:, [[1:5], [7:-1]]]
but I got a syntax error, how can I retrieve that data?
Advanced indexing doesn't take a list of lists of slices. Instead, you can use numpy.r_. This function doesn't take negative indices, but you can get round this by using np.ndarray.shape:
A = B[:, np.r_[1:6, 7:B.shape[1]-1]]
Remember to add 1 to the stop of each slice, since a:b does not include b, in the same way slice(a, b) does not include b. Also note that indexing begins at 0.
Here's a demo:
import numpy as np
B = np.random.randint(0, 10, (3, 11))
print(B)
[[5 8 8 8 3 0 7 2 1 6 7]
[4 3 8 7 3 7 5 6 0 5 7]
[1 0 4 0 2 2 5 1 4 2 3]]
A = B[:,np.r_[1:6, 7:B.shape[1]-1]]
print(A)
[[8 8 8 3 0 2 1 6]
[3 8 7 3 7 6 0 5]
[0 4 0 2 2 1 4 2]]
Another way would be to get your slices independently, and then concatenate:
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)
Using similar example data as #jpp:
B = np.random.randint(0, 10, (3, 10))
>>> B
array([[0, 5, 0, 6, 8, 5, 9, 3, 2, 0],
[8, 8, 1, 7, 3, 5, 7, 7, 4, 8],
[5, 5, 5, 2, 3, 1, 6, 4, 9, 6]])
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)
>>> A
array([[5, 0, 6, 8, 5, 3, 2],
[8, 1, 7, 3, 5, 7, 4],
[5, 5, 2, 3, 1, 4, 9]])
How about taking the union of the ranges?
B[:, np.union1d(range(1,6), range(7,10))]
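For instance, with the (3, 11) example B from the first answer (a short sketch):
cols = np.union1d(range(1, 6), range(7, 10))
print(cols)      # [1 2 3 4 5 7 8 9]
A = B[:, cols]   # same eight columns as the np.r_ approach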
Just to add some of my thoughts: there are two approaches one can take, using either numpy or pandas. I will demonstrate with some data, assuming the data is a student's grades in the different courses they are enrolled in.
import pandas as pd
import numpy as np
data = {'Course A': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
'Course B': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
'Course C': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78],
'Course D': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
'Course E': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
'Course F': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]
}
df = pd.DataFrame(data=data)
df.head()
CA CB CC CD CE CF
0 84 85 97 84 85 97
1 82 82 94 82 82 94
2 81 72 93 81 72 93
3 89 77 95 89 77 95
4 73 75 88 73 75 88
NOTE: CA through CF represent Course A through Course F.
To help us remember column names and their associated indexes, we can build a list of columns and their indexes via list comprehension.
map_cols = [f"{c[0]}:{c[1]}" for c in enumerate(df.columns)]
['0:Course A',
'1:Course B',
'2:Course C',
'3:Course D',
'4:Course E',
'5:Course F']
Now, to select, say, Course A and Course D through Course F using numpy-style indexing, you can do the following:
df.iloc[:, np.r_[0, 3:df.shape[1]]]
CA CD CE CF
0 84 84 85 97
1 82 82 82 94
2 81 81 72 93
3 89 89 77 95
4 73 73 75 88
You can also use pandas to the same effect.
df[[df.columns[0], *df.columns[3:]]]
CA CD CE CF
0 84 84 85 97
1 82 82 82 94
2 81 81 72 93
3 89 89 77 95
4 73 73 75 88
One can solve that by concatenating two ranges:
columns = list(range(1, 6)) + list(range(7, 10))
print(columns)
# [1, 2, 3, 4, 5, 7, 8, 9]
Then, assuming your DataFrame is called df, use iloc to select those columns:
newdf = df.iloc[:, columns]
