numpy.insert() inserts an array at the wrong index - python

My code reads values from text files and builds a multidimensional array from them. The problem is that the result is not a plain two-dimensional array, so I can't manipulate it. I need a two-dimensional array; how do I get one?
How my code works:
Goal: the code reads values from a specific folder. Each subfolder contains 7 'txt' files generated by one user, so multiple subfolders hold the data of multiple users.
Step 1: Start an outer for loop over the subfolders of the top-level folder, storing the path of the current subfolder in the variable 'path'.
Step 2: Open that path and read the data of the 7 txt files using a second for loop. After reading, the second loop ends and the rest of the code runs.
Step 3: Concatenate the data from the txt files into one 1d array.
Step 4: Create a 2d array from the data of the first 2 folders.
Step 5 (here the problem arises): Add a row to the 2d array by inserting the 1d array.
import numpy as np
import array as arr
import os
f_path='Result'
array_control_var=0
# walk the directory tree to fetch each folder path
for (path,dirs,file) in os.walk(f_path):
    if(path==f_path):
        continue
    f_path_1= path +'\page_1.txt'
    # read page_1 separately because it contains string data
    pgno_1 = np.array(np.loadtxt(f_path_1, dtype='U', delimiter=','))
    # only for page_2.txt
    f_path_2= path +'\page_2.txt'
    with open(f_path_2) as f:
        str_arr = ','.join([l.strip() for l in f])
    pgno_2 = np.asarray(str_arr.split(','), dtype=int)
    # fetch data from the remaining text files in a loop; data type = int
    for j in range(3,8):
        # store the file path in a variable
        txt_file_path=path+'\page_'+str(j)+'.txt'
        if os.path.exists(txt_file_path)==True:
            # generate a variable name that increments with the loop
            foo='pgno_'+str(j)
        else:
            break
        # pass the variable name as a string and store the value
        exec(foo + " = np.array(np.loadtxt(txt_file_path, dtype='i', delimiter=','))")
    # merge all arrays from page 2 onward into a single one-dimensional array
    f_array=np.concatenate((pgno_2,pgno_3,pgno_4,pgno_5,pgno_6,pgno_7), axis=0)
    # on the first pass of the loop, assign this value
    if array_control_var==0:
        main_f_array=f_array
    if array_control_var==1:
        # here use np.array()
        main_f_array=np.array([main_f_array,f_array])
    else:
        main_f_array=np.insert(main_f_array, array_control_var, f_array, 0)
    array_control_var+=1
print(main_f_array)
I want output like this.
Initial:
[[0,0,0],[0,0,0]]
After insert:
[[0,0,0],[0,0,0],[0,0,0]]
But the output is:
[array([0, 0, 0])
array([0, 0, 0])
0 0 0]

When I recommend replacing the insert with a list build, here's what I have in mind.
import numpy as np
alist = []
for i in range(4):
    f_array = np.array([i, i+2, i+4])
    alist.append(f_array)
print(alist)
main_f_array = np.array(alist)
print(main_f_array)
test run:
1246:~/mypy$ python3 stack54715610.py
[array([0, 2, 4]), array([1, 3, 5]), array([2, 4, 6]), array([3, 5, 7])]
[[0 2 4]
 [1 3 5]
 [2 4 6]
 [3 5 7]]
If your file loading produces arrays that differ in size, you'll get different results:
f_array = np.arange(i, i+1+i)
1246:~/mypy$ python3 stack54715610.py
[array([0]), array([1, 2]), array([2, 3, 4]), array([3, 4, 5, 6])]
[array([0]) array([1, 2]) array([2, 3, 4]) array([3, 4, 5, 6])]
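This is a 1d object dtype array, as opposed to the 2d array above. (NumPy 1.24 and later raise a ValueError for such ragged input unless dtype=object is passed to np.array.)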
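Because a single odd-sized array changes the result from a 2d array to an object dtype array (or an error in newer NumPy), it can help to check the row lengths before building the final array. A minimal sketch, assuming every row is supposed to have the same length; the helper name rows_to_2d is hypothetical and not part of NumPy:

```python
import numpy as np

def rows_to_2d(rows):
    """Stack equal-length 1d arrays into a 2d array, or fail loudly."""
    lengths = sorted({len(r) for r in rows})
    if len(lengths) != 1:
        # Report the mismatch instead of building an object dtype array
        raise ValueError(f"rows have different lengths: {lengths}")
    return np.stack(rows)

alist = [np.array([i, i + 2, i + 4]) for i in range(4)]
main_f_array = rows_to_2d(alist)
print(main_f_array.shape)
```

With the ragged input from the second test run, the same call raises a ValueError that names the offending lengths, which points straight at the file that loaded differently.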

As I commented, collecting arrays with insert (or variations on concatenate) is hard to get right, and slow even when it works, because it builds a whole new array each time. Collecting the arrays in a list and building one array at the end is easier and faster. List append is efficient and easy to use.
That said, your reported result looks suspicious. I can reproduce it with:
In [281]: arr = np.zeros(2, object)
In [282]: arr
Out[282]: array([0, 0], dtype=object)
In [283]: arr[0] = np.array([0,0,0])
In [284]: arr[1] = np.array([0,0,0])
In [285]: arr
Out[285]: array([array([0, 0, 0]), array([0, 0, 0])], dtype=object)
In [286]: np.insert(arr, 2, np.array([0,0,0]), 0)
Out[286]: array([array([0, 0, 0]), array([0, 0, 0]), 0, 0, 0], dtype=object)
At an earlier iteration, main_f_array must have been created as an object dtype array.
If it had been a 'normal' 2d array, the insert would be different:
In [287]: arr1 = np.zeros((2,3),int)
In [288]: np.insert(arr1, 2, np.array([0,0,0]), 0)
Out[288]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])
Or more iteratively as I think you wanted:
In [289]: f_array = np.array([0,0,0])
In [290]: main = f_array
In [291]: main = np.array([main, f_array])
In [292]: main
Out[292]:
array([[0, 0, 0],
       [0, 0, 0]])
In [293]: main = np.insert(main, 2, f_array, 0)
In [294]: main
Out[294]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])
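Applied to the folder layout described in the question, the list-based approach might look like the sketch below. It is an illustration under assumptions, not the asker's actual code: the helper names load_user_row and load_all_users are made up, each page file is assumed to hold comma-separated integers with the same number of values per line, and the demo writes its own small files into a temporary directory. A list of parts replaces the exec-generated variable names.

```python
import os
import tempfile
import numpy as np

def load_user_row(folder, pages=range(2, 8)):
    """Concatenate the integer pages of one user folder into a 1d array."""
    parts = []
    for j in pages:
        txt_file_path = os.path.join(folder, f"page_{j}.txt")
        if not os.path.exists(txt_file_path):
            break
        # ndmin=1 keeps one-value files 1d; ravel() flattens multi-line files
        data = np.loadtxt(txt_file_path, dtype=int, delimiter=",", ndmin=1)
        parts.append(data.ravel())
    return np.concatenate(parts)

def load_all_users(root):
    """Build one row per user folder, then a single 2d array at the end."""
    rows = []
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            rows.append(load_user_row(folder))
    return np.stack(rows)  # raises ValueError if the rows differ in length

# Demo on made-up data: 2 user folders, pages 2..7, two integers per page
with tempfile.TemporaryDirectory() as root:
    for user in range(2):
        folder = os.path.join(root, f"user_{user}")
        os.makedirs(folder)
        for j in range(2, 8):
            with open(os.path.join(folder, f"page_{j}.txt"), "w") as f:
                f.write(f"{user},{j}\n")
    main_f_array = load_all_users(root)

print(main_f_array)
```

Using os.path.join also avoids the hand-written backslash separators, so the same code runs on systems other than Windows.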

Related

Repeat Array while Maintaining Order within group

I have the below array and would like to repeat each array n times.
x_array
[array([14.91488012,  1.2986064 ,  4.98965322]),
 array([2.39389187e+02, 1.04442059e-01, 3.06391338e-01]),
 array([ 48.19437348, 201.09951372,   0.35223001]),
 array([ 19.96978171, 367.52578786,   0.68676553]),
 array([0.55120466, 0.27133609, 0.75646697]),
 array([8.21287360e+02, 1.76495077e+02, 4.87263691e-01]),
 array([184.03439377,   1.24823107,   5.33109884]),
 array([575.59800297, 186.4650814 ,   2.21028258]),
 array([0.50308552, 3.09976082, 0.10537899]),
 array([1.02259912e+00, 1.52282513e+02, 1.15085308e-01])]
I've tried np.repeat(x_array, 2), but this doesn't preserve the order of the matrix/array. I've also tried x_array*2, but this seems to just put a copy of the array at the bottom. I was hoping to repeat x_array[0] n times and do the same for each subsequent array, so that I have n copies of each, in order.
Thanks in advance.
Building off of the last example from https://numpy.org/doc/stable/reference/generated/numpy.repeat.html,
x_array = np.array(x_array) # Or a similar operation to convert x_array from a list of arrays to an ndarray.
expanded_x_array = np.repeat(x_array, n, axis=0)
print(expanded_x_array)
should produce what you are looking for.
You just need to specify the axis:
>>> np.repeat(x_array, 2, axis=0)
array([[1.49149e+01, 1.29861e+00, 4.98965e+00],
       [1.49149e+01, 1.29861e+00, 4.98965e+00],
       [2.39389e+02, 1.04442e-01, 3.06391e-01],
       [2.39389e+02, 1.04442e-01, 3.06391e-01],
       ...,
       [5.03086e-01, 3.09976e+00, 1.05379e-01],
       [5.03086e-01, 3.09976e+00, 1.05379e-01],
       [1.02260e+00, 1.52283e+02, 1.15085e-01],
       [1.02260e+00, 1.52283e+02, 1.15085e-01]])
From the docs:
numpy.repeat(a, repeats, axis=None)
...
axis : int, optional
The axis along which to repeat values. By default, use the flattened input array, and return a flat output array.
(added bold)
You could use a list comprehension:
n = 2
repeated_list = [row for row in x_array for _ in range(n)]
print(repeated_list)
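To see how the list comprehension relates to np.repeat, here is a small sketch on made-up data (the three short arrays stand in for the x_array from the question). The comprehension returns a plain list of arrays, while np.repeat with axis=0 returns a 2d array holding the same rows in the same order:

```python
import numpy as np

# Made-up stand-in for the list of arrays in the question
x_array = [np.array([1.0, 2.0, 3.0]),
           np.array([4.0, 5.0, 6.0]),
           np.array([7.0, 8.0, 9.0])]
n = 2

# Plain Python: each row appears n times in a row, result is a list
repeated_list = [row for row in x_array for _ in range(n)]

# NumPy: convert to a 2d array first, then repeat along the first axis
repeated_2d = np.repeat(np.array(x_array), n, axis=0)

print(len(repeated_list), repeated_2d.shape)
print(np.array_equal(np.array(repeated_list), repeated_2d))
```

Which one to prefer probably depends on what comes next: the list version keeps the individual arrays as separate objects, the array version is ready for vectorized operations.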
Your terminology is confusing. You say it's an "array", but the display looks more like a list, and the fact that x_array*2 puts a "new array" at the bottom confirms that: that's the list use of *.
np.repeat(x_array) first makes an array (a real one!)
np.array(x_array)
is a (n,3) float dtype array. Without axis np.repeat flattens - as documented!
Specifying the axis=0 works because it's repeating on that first n dimension. The result is a (2*n,3) float dtype array (not a list).
It is possible to make a 1d object dtype array containing those arrays. With that repeat will work without the axis parameter.
Knowing what you have, and describing it accurately, can make this kind of task much easier - and the questions clearer.
illustration
Make a list of arrays:
In [21]: alist = [np.ones(3,int),np.zeros(3,int),np.arange(3)]
In [22]: alist
Out[22]: [array([1, 1, 1]), array([0, 0, 0]), array([0, 1, 2])]
List repeat:
In [23]: alist*2
Out[23]:
[array([1, 1, 1]),
 array([0, 0, 0]),
 array([0, 1, 2]),
 array([1, 1, 1]),
 array([0, 0, 0]),
 array([0, 1, 2])]
Make a 2d array from the list:
In [24]: np.array(alist)
Out[24]:
array([[1, 1, 1],
       [0, 0, 0],
       [0, 1, 2]])
repeat without axis repeats elements in a flattened way:
In [25]: np.repeat(alist,2)
Out[25]: array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
repeat this 2d array on 0 axis:
In [26]: np.repeat(alist,2,axis=0)
Out[26]:
array([[1, 1, 1],
       [1, 1, 1],
       [0, 0, 0],
       [0, 0, 0],
       [0, 1, 2],
       [0, 1, 2]])
Object dtype array from list:
In [27]: arr = np.empty(3,object); arr[:]=alist
In [28]: arr
Out[28]: array([array([1, 1, 1]), array([0, 0, 0]), array([0, 1, 2])], dtype=object)
Since the arrays have the same size we have to use this special construct. Otherwise we get the 2d array [24].
This array has a repeat method, and with only one dimension we don't need to specify the axis. It's repeating the object elements (the arrays), not the numbers in the 2d array from [24].
In [29]: arr.repeat(2)
Out[29]:
array([array([1, 1, 1]), array([1, 1, 1]), array([0, 0, 0]),
       array([0, 0, 0]), array([0, 1, 2]), array([0, 1, 2])], dtype=object)

Create a NumPy array with an arbitrary shape (preferably without for loops)

I am attempting to map the topology of my neural network using NumPy.
I am looking for a method to create an irregularly shaped array, preferably without the use of for loops.
The code below creates a NumPy array of objects. The array has an irregular shape that changes based on the "Iarray" variable passed in.
The topology of my neural net is [2,3,2], so this function outputs an array with three columns: 2 elements in the first, 3 elements in the second, and 2 elements in the third.
def object_array(Iarray):
    # One row with len(Iarray) object slots, one slot per layer
    Array = np.empty([1, len(Iarray)], "object")
    for i in range(len(Iarray)):
        # LSTM.Cell is my own class; layer i holds Iarray[i] cells
        row = np.array([LSTM.Cell(i, ii) for ii in range(Iarray[i])])
        Array[0, i] = row
    return Array
This is clunky looking, and I would very much like to find a better way to write this code.
If anybody has ideas, I would be happy to hear them.
It's easy to create an object dtype array:
In [550]: arr = np.empty(5, object)
In [551]: arr
Out[551]: array([None, None, None, None, None], dtype=object)
You can fill it from a list of objects:
In [552]: arr[:] = [np.arange(i) for i in range(5)]
In [553]: arr
Out[553]:
array([array([], dtype=int64), array([0]), array([0, 1]),
       array([0, 1, 2]), array([0, 1, 2, 3])], dtype=object)
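In fact, you can sometimes create the array directly from the list. Sublists of differing length gave an object array in the NumPy version used here (NumPy 1.24 and later require dtype=object for this), while equal-length ones give a 2d array: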
In [554]: np.array([np.arange(i) for i in range(5)])
Out[554]:
array([array([], dtype=int64), array([0]), array([0, 1]),
       array([0, 1, 2]), array([0, 1, 2, 3])], dtype=object)
In [555]: np.array([np.arange(3) for i in range(5)])
Out[555]:
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])
Assignment to the predefined array is more reliable:
In [561]: arr[:]=[np.arange(3) for i in range(5)]
In [562]: arr
Out[562]:
array([array([0, 1, 2]), array([0, 1, 2]), array([0, 1, 2]),
       array([0, 1, 2]), array([0, 1, 2])], dtype=object)
Occasionally you can have broadcasting errors in such an assignment.
But in any case, you still have to create the objects that you are going to assign to the array, and it's hard to avoid loops when doing that - at least not in the most general cases.
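For the [2,3,2] topology from the question, the predefined-array approach could be sketched as follows. The helper name ragged_object_array and the make_cell callback are hypothetical; a (layer, position) tuple stands in for the asker's LSTM.Cell, which is not available here. The loop over layers is still present, in line with the point above that it is hard to avoid.

```python
import numpy as np

def ragged_object_array(topology, make_cell):
    """Return a 1d object array whose i-th element is a list of topology[i] cells."""
    layers = np.empty(len(topology), dtype=object)
    for i, size in enumerate(topology):
        # Element-wise assignment stores the list itself, so no broadcasting occurs
        layers[i] = [make_cell(i, ii) for ii in range(size)]
    return layers

# Stand-in for LSTM.Cell: a (layer, position) tuple
layers = ragged_object_array([2, 3, 2], lambda i, ii: (i, ii))
print([len(layer) for layer in layers])
```

Assigning one element at a time sidesteps the broadcasting errors that whole-array assignment can occasionally produce.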

Efficient way to create an array of nonzero values in a numpy tensor?

I have a 3D numpy grid A[ix,iy,iz] and I filter out elements by zeroing them out via A[ A<minval ] = 0, or A[ A>maxval ] = 0, etc. I then want to perform statistics on the remaining items. For now, I am doing:
Atemp = []
for ai in np.reshape(A, nx*ny*nz):
    if( ai > 0 ):
        Atemp.append(ai)
and then I perform statistics on Atemp. This takes quite a long time, however, and I wonder if there is a more efficient way to create Atemp. For what it's worth, I am working with several GB of data in these arrays.
NOTE: I do not want a different way to filter these items out. I want to zero them out, then create a temporary array of all nonzero elements in A.
You can use:
Atemp = A[A != 0]
E.g.:
In [3]: x = np.array([0,1,2,3,0,1,2,3,0]).reshape((3,3))
In [4]: x
Out[4]:
array([[0, 1, 2],
       [3, 0, 1],
       [2, 3, 0]])
In [5]: x[x == 0]
Out[5]: array([0, 0, 0])
In [6]: x[x != 0]
Out[6]: array([1, 2, 3, 1, 2, 3])
Another option:
Atemp = A.ravel()[np.flatnonzero(A)]
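Both options can be tried on a small 3D grid. The sketch below uses made-up random data and made-up thresholds (minval, maxval are the names from the question; their values here are assumptions), zeroes out the filtered elements as described, and then extracts the nonzero values in one vectorized step:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0.0, 10.0, size=(4, 5, 6))  # made-up 3D grid
minval, maxval = 2.0, 8.0                   # made-up thresholds

# Zero out the filtered elements in place, as in the question
A[A < minval] = 0
A[A > maxval] = 0

Atemp = A[A != 0]                    # 1d copy of the nonzero values
alt = A.ravel()[np.flatnonzero(A)]   # same values via flat indices

print(Atemp.size, Atemp.mean(), Atemp.std())
```

Boolean indexing returns a copy, so for arrays of several GB the temporary mask and the extracted values both need memory on top of A itself.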

How to convert a series of index/category, into a classification array

How can I convert a series of indices into a 2-D array that expresses the category/classifier defined by the index values in a list?
e.g.:
import numpy as np
aList = [0,1,0,2]
anArray = np.array(aList)
resultArray = convertToCategories(anArray)
and the return value of convertToCategories() would be like:
[[1,0,0], # the 0th element of aList is index category 0
[0,1,0], # the 1st element of aList is index category 1
[1,0,0], # the 2nd element of aList is index category 0
[0,0,1]] # the 3rd element of aList is index category 2
As a last resort, I could of course:
parse the list,
count the number of categories (they are contiguous, so it could be as simple as finding the maximum),
create a zeroed array of the right size,
then parse the list again to fill the array with 1 (or True) at the indices given by the list.
But I am wondering if there exists a more pythonic, or dedicated numpy, or pandas function to achieve this kind of transformation.
You can do something like this -
import numpy as np
# Size parameters
N = anArray.size
M = anArray.max()+1
# Setup output array
resultArray = np.zeros((N,M),int)
# Find out the linear indices where 1s would be put
idx = (np.arange(N)*M) + anArray
# Finally, put 1s at those places for the final output
resultArray.ravel()[idx] = 1
Sample run -
In [188]: anArray
Out[188]: array([0, 1, 0, 2, 4, 1, 3])
In [189]: resultArray
Out[189]:
array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0]])
Or, better just directly index into the output array with the row and column indices -
# Setup output array and put 1s at places indexed by row and column indices.
# Here, anArray would be the column indices and [0,1,....N-1] would be the row indices
resultArray = np.zeros((N,M),int)
resultArray[np.arange(N),anArray] = 1
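Wrapped into the convertToCategories function the question asks for, the row/column indexing version might look like this sketch. It assumes the categories are non-negative integers starting at 0; the identity-matrix variant at the end is an additional alternative, not part of the answer above.

```python
import numpy as np

def convertToCategories(anArray):
    """Return an (N, M) array with a 1 in column anArray[i] of row i."""
    anArray = np.asarray(anArray)
    N = anArray.size
    M = anArray.max() + 1
    resultArray = np.zeros((N, M), int)
    # Row indices 0..N-1 paired with the category as the column index
    resultArray[np.arange(N), anArray] = 1
    return resultArray

resultArray = convertToCategories([0, 1, 0, 2])
print(resultArray)

# Alternative: pick rows of an identity matrix by category index
viaEye = np.eye(3, dtype=int)[[0, 1, 0, 2]]
```

For the sample list from the question this gives the four rows shown there, one per list element.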

Initialize a matrix in Python with the row number

Is there a way to initialize a 3 row, 5 column matrix which contains these values without using a for loop?
[[0 0 0 0 0
1 1 1 1 1
2 2 2 2 2]]
It's possible.
i = 0
matrix = []
while i <= 2:
    matrix += [[i]*5]
    i += 1
Without any for loops or list comprehensions, you can use a combination of built-in functions:
list(map(list, zip(*[range(3)] * 5)))  # list() is needed on Python 3, where map returns an iterator
If you're dealing with large datasets and are worried about performance, you might want to consider putting your data into a two-dimensional NumPy array. Here are a couple of ways of generating the matrix you ask for in NumPy:
>>> import numpy as np
>>> np.indices((3, 5))[0]
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2]])
>>> np.repeat(np.arange(3), 5).reshape((3, 5))
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2]])
The first of these is simpler, but a little bit wasteful: the np.indices call actually generates the array you want (which could be regarded as an array of row indices) along with a companion array of column indices:
>>> np.indices((3, 5))[1]
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
with both arrays packed conveniently into a single array of shape (2, 3, 5). If you need that second array anyway for what you're doing then np.indices is the way to go (though in that case you may also want to look into NumPy's mgrid, ogrid and meshgrid functions). The second solution with np.repeat only generates the values you need, and not surprisingly, finishes about twice as fast on my machine when I bump the size of the matrix up to (3000, 5000):
In [19]: %timeit np.indices((3000, 5000))[0]
10 loops, best of 3: 156 ms per loop
In [20]: %timeit np.repeat(np.arange(3000), 5000).reshape((3000, 5000))
10 loops, best of 3: 88.4 ms per loop
Having said that, using np.repeat in this way is a little bit of an antipattern in NumPy: it's often better to avoid the repetition by creating a 2d array with 3 rows and a single column, and rely on NumPy's broadcasting to interpret this correctly when it's combined with other arrays. If you go that way, all you need is:
>>> np.arange(3)[:, np.newaxis]
array([[0],
       [1],
       [2]])
This is an array of shape (3, 1); a subsequent operation with an array of shape (5,) or (1, 5) (for example) would be subject to NumPy's broadcasting rules, producing an output of shape (3, 5). For example, here's what happens when we add a 1d array of zeros to the above:
>>> np.arange(3)[:, np.newaxis] + np.zeros(5, dtype=int)
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2]])
And for completeness, here's one more variation, using np.tile:
>>> np.tile(np.arange(3)[:, np.newaxis], (1, 5))
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2]])
All of these solutions should have reasonably similar performance for large values of 3 and 5; if this is a bottleneck, you'll want to do timings on your machine to decide which to use. On my machine, the +np.zeros broadcasting solution beats the others by some margin.
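If the full (3, 5) shape is needed but the repeated values never have to be written to, np.broadcast_to is one more option in the same spirit as the broadcasting approach. This is an addition to the variations above rather than something they mention; a minimal sketch:

```python
import numpy as np

rows = np.arange(3)[:, np.newaxis]     # shape (3, 1)
view = np.broadcast_to(rows, (3, 5))   # read-only view, no data copied
matrix = view.copy()                   # writable (3, 5) array if needed

print(matrix)
print(view.flags.writeable)
```

The view reuses the three stored values for every column, so it costs almost no memory; calling copy() materializes a regular array when the result must be modified.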
This is one way that is easy for a Python beginner to understand:
matrix = []
for data in range(3):
    matrix.append([data] * 5)
This is possible using:
[[data] * 5 for data in range(3)]
