I read on the official NumPy documentation page that there is an alternative way to create an array without initialising it with zeros:
empty, unlike zeros, does not set the array values to zero, and may therefore be marginally faster.
I created the two functions below, which demonstrate an unusual issue when using this function:
import numpy as np
def getInitialisedArray():
    return np.empty((), dtype=np.float64).tolist()
def createFloatArray(x):
    return np.array(float(x))
If I just call getInitialisedArray() on its own, here is what it outputs:
>>> getInitialisedArray()
0.007812501848093234
And if I just call the createFloatArray() function:
>>> createFloatArray(3.1415)
3.1415
This all seems fine, but if I repeat the test and call getInitialisedArray() after creating the float array, there is an issue:
print(getInitialisedArray())
print(createFloatArray(3.1415))
print(getInitialisedArray())
Output:
>>>
0.007812501848093234
3.1415
3.1415
It seems the second call to getInitialisedArray() returns the same value that was put into the normal np.array(). I don't understand why this occurs. Shouldn't they be separate arrays with no link between them?
--- Update ---
I repeated this and changed the size of the empty array:
import numpy as np
def getInitialisedArray():
    # Changed size to 2 x 2
    return np.empty((2, 2), dtype=np.float64).tolist()
def createFloatArray(x):
    return np.array(float(x))
print(getInitialisedArray())
print(createFloatArray(3.1415))
print(getInitialisedArray())
Output:
[[1.6717403e-316, 6.9051865033801e-310], [9.97338022253e-313, 2.482735075993e-312]]
3.1415
[[1.6717403e-316, 6.9051865033801e-310], [9.97338022253e-313, 2.482735075993e-312]]
This is the sort of output I was expecting, but here it only appears because I changed the size. Does the size affect whether an empty array takes on the same value as a normal np.array()?
From the documentation, np.empty() will "return a new array of given shape and type, without initializing entries." This means it just allocates a block of memory for the array; whatever data happens to already be in that block is left unchanged.
In the first example, nothing keeps a reference to the temporary array that np.empty() creates, so its memory block is freed and the allocator can hand the same address to the next allocation of the same size. createFloatArray() writes 3.1415 into a freshly allocated array of that same size (quite possibly the very same block), and since its result isn't kept around either, that block is freed again. When you call getInitialisedArray() a second time, np.empty() can be given the same block and simply reports whatever bytes are already there: 3.1415. If you change the shape of the empty array, the allocation size changes, so a different block may be used and you see arbitrary garbage instead. In theory, if createFloatArray() produced an array of the same shape and dtype as getInitialisedArray(), you could see the same behaviour again.
Warning: I would highly recommend not relying on this. Python or your system may allocate memory between those two calls, in which case a different block would be reused even when the shape and dtype are the same.
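A minimal sketch of the reuse effect, assuming CPython and NumPy's default allocator (the exact result is not guaranteed and can differ from run to run):

import numpy as np

a = np.array(float(3.1415))          # a 0-d float64 array
addr = a.ctypes.data                 # address of its data buffer
del a                                # drop the only reference; the buffer is freed

b = np.empty((), dtype=np.float64)   # same size, so the freed block may be handed back
print(b.ctypes.data == addr)         # often True here, but never guaranteed
print(b)                             # if True, b "contains" 3.1415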
Check id() for each array after initialising it. np.empty() allocates a block of memory that can later be reused when another array of the same shape is created.
For more understanding, compare:
print(np.array(float(1)))
print(np.empty((), dtype=np.float64).tolist())
The same thing, but assigning the results to variables:
x = np.array(float(1))
y = np.empty((), dtype=np.float64).tolist()
print(x)
print(y)
Related
I want to understand why the behaviour of my array is like this:
import numpy as np
import math
n=10
P=np.array([[0]*n]*n)
P[2][2]=1 #It works as I want
for i in range(n):
    for j in range(n):
        P[i][j]=math.comb(n,j+1)*((i+1)/n)**(j+1)*(1-(i+1)/n)**(n-j-1)
        print(math.comb(n,j+1)*((i+1)/n)**(j+1)*(1-(i+1)/n)**(n-j-1))
print(P)
As a result I get for P an array containing only 0s, except a 1 at the (n, n) position, but the values printed inside the loop are not 0.
I suppose it comes from the fact that I use [[0]*n]*n to build the list (a mutable/immutable issue?), because it works well when I use np.zeros(), but I don't understand why it works when I set a value manually (with P[2][2]=1 for example).
Thanks
The way you are creating the array defaults to an integer dtype, because NumPy infers the dtype from the initial values when you don't set it explicitly. You can demonstrate this by trying to assign a float instead of an int:
P[2][2]=1 #It works as I want
P[2][2]=0.3 #It doesn't work (silently truncated to 0)
To use your approach you need to create an array with a float dtype so values don't get truncated: P=np.array([[0.0]*n]*n) or P=np.array([[0]*n]*n, dtype=float).
This will produce an array of the expected values:
array([[3.87420489e-01, 1.93710245e-01, 5.73956280e-02, 1.11602610e-02,
1.48803480e-03, 1.37781000e-04, 8.74800000e-06, 3.64500000e-07,
9.00000000e-09, 1.00000000e-10],
[2.68435456e-01, 3.01989888e-01, 2.01326592e-01, 8.80803840e-02,
2.64241152e-02, 5.50502400e-03, 7.86432000e-04, 7.37280000e-05,
4.09600000e-06, 1.02400000e-07],
...
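A short sketch of the dtype effect with a small n, just to illustrate the silent truncation (variable names are mine, not from the question):

import numpy as np

P_int = np.array([[0] * 3] * 3)        # dtype inferred as int
P_int[0][0] = 0.3                      # silently truncated to 0
print(P_int[0][0])                     # prints 0

P_float = np.array([[0.0] * 3] * 3)    # dtype inferred as float64
P_float[0][0] = 0.3
print(P_float[0][0])                   # prints 0.3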
I learned that numpy.reshape doesn't actually change the array in place, and that I can use numpy.resize instead. That's fine.
My question is: why doesn't the '-1' notation work with np.resize, as it does with np.reshape?
myarray = np.arange(16)
myarray.resize((4,-1))
gives me
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-606-aa09b20c947a> in <module>
1 myarray = np.arange(16)
2
----> 3 myarray.resize((4,-1))
ValueError: negative dimensions not allowed
As I checked, resize does not allow negative arguments.
Allowing -1 is a feature of reshape (and some other methods), but not of resize.
If you want to have a new array, change your code to:
myNewArray = myarray.reshape((4,-1)).copy()
Another solution to change the shape in-place:
myarray.shape = (4, -1)
Here you may pass -1 as one element of the new shape tuple.
Since reshape (method or function) does not change the total number of elements, it can safely calculate one of dimensions, given the others.
resize can change the total number of elements, so it can't safely make any assumptions about the desired dimensions.
Looking at code, both in other NumPy functions and in SO answers, you'll see that reshape is widely used. The fact that it doesn't operate in place usually isn't a problem; it returns a view (where possible), so time isn't an issue.
Since resize can change the number of elements, it can be more dangerous (but occasionally more useful). Keep in mind that the function form (np.resize) fills with different values than the method form (ndarray.resize).
I often create sample arrays with
arr = np.arange(24).reshape(2,3,4)
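A small sketch contrasting the two in a plain script (note that the in-place resize refuses to run if other views of the array exist):

import numpy as np

myarray = np.arange(16)
reshaped = myarray.reshape((4, -1))   # -1 is inferred because the total size is fixed
print(reshaped.shape)                 # (4, 4); a view, myarray itself is unchanged

other = np.arange(16)
other.resize((4, 4))                  # in place; every dimension must be spelled out
print(other.shape)                    # (4, 4)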
In Python, I am trying to change the values of a NumPy array inside a function:
import numpy as np

def function(array):
    array = array + 1

array = np.zeros((10, 1))
function(array)
Since array is a function parameter, it is supposed to be passed as a reference, and I should be able to modify its contents inside the function.
array = array + 1 performs an element-wise operation that adds one to every element of the array, so it should change the values inside.
But the array does not actually change after the function call. I am guessing the program thinks I am trying to change the reference itself, not the contents of the array, because of the syntax of the element-wise operation. Is there any way to get the intended behaviour? I don't want to loop through individual elements or make the function return the new array.
This line:
array = array + 1
… does perform an elementwise operation, but the operation it performs is creating a new array with each element incremented. Assigning that array back to the local variable array doesn't do anything useful, because that local variable is about to go away, and you haven't done anything to change the global variable of the same name.
On the other hand, this line:
array += 1
… performs the elementwise operation of incrementing all of the elements in-place, which is probably what you want here.
In Python, mutable collections are only allowed, not required, to handle the += statement this way; they could handle it the same way as array = array + 1 (as immutable types like str do). But builtin types like list, and most popular third-party types like np.ndarray, do what you want.
Another solution if you want to change the content of your array is to use this:
array[:] = array + 1
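A quick sketch putting the two together (the function names here are just for illustration):

import numpy as np

def increment_copy(array):
    array = array + 1      # builds a new array and rebinds the local name only

def increment_inplace(array):
    array += 1             # updates the caller's array buffer in place

a = np.zeros((10, 1))
increment_copy(a)
print(a.sum())             # 0.0 -- the caller's array is unchanged
increment_inplace(a)
print(a.sum())             # 10.0 -- modified in place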
If I want to get the dot product of two arrays, I can get a performance boost by specifying an array to store the output in instead of creating a new array (if I am performing this operation many times):
import numpy as np
a = np.array([[1.0,2.0],[3.0,4.0]])
b = np.array([[2.0,2.0],[2.0,2.0]])
out = np.empty([2,2])
np.dot(a,b, out = out)
Is there any way I can take advantage of this feature if I need to modify an array in place? For instance, if I want:
out = np.array([[3.0,3.0],[3.0,3.0]])
out *= np.dot(a,b)
Yes, you can use the out argument to modify an array (e.g. array=np.ones(10)) in-place, e.g. np.multiply(array, 3, out=array).
You can even use in-place operator syntax, e.g. array *= 2.
To confirm if the array was updated in-place, you can check the memory address array.ctypes.data before and after the modification.
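A quick check along those lines, reusing the arrays from the question (np.dot still allocates a temporary for its result; only the multiplication is done in place here):

import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[2.0, 2.0], [2.0, 2.0]])
out = np.array([[3.0, 3.0], [3.0, 3.0]])

before = out.ctypes.data    # address of out's data buffer
out *= np.dot(a, b)         # the multiply writes back into out's existing buffer
print(out.ctypes.data == before)   # True: same buffer, modified in place
print(out)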
Please see the code snippet below:
import numpy as np
# Load the .txt file in
myData = np.loadtxt('data.txt')
# Extract the time and acceleration columns
time = myData[:,0]
# Extract the linear acceleration columns
xLinearAcc = myData[:,4]
yLinearAcc = myData[:,5]
zLinearAcc = myData[:,6]
# Find the linear accelerations
xLinearAccSqr = myData[:,0]
for i, v in enumerate(xLinearAcc):
    xLinearAccSqr[i] = pow(v,2)
myData is my 2D data matrix. What I am trying to do is extract the 4th column into a new array xLinearAcc. Then I square every single term in xLinearAcc and store the results in another new array xLinearAccSqr.
(The reason I have xLinearAccSqr = myData[:,0] is that if I do not have that line, the interpreter always tells me that xLinearAccSqr is undefined. So I just arbitrarily make it equal to the 1st column, because all the values get overwritten later anyway. I don't know whether this line causes trouble or not.)
Then comes the problem.
The first column of myData gets strangely modified. I do not want this.
Can anyone help? I would really appreciate it!
==========================UPDATES=======================================
Problem solved.
Post the solution here may help others.
Use
xLinearAccSqr = np.copy(myData[:,0])
Somehow, I guess, Python passes references instead of values.
Thus, just make a copy.
NumPy arrays behave differently from regular Python lists. In NumPy, basic slicing returns a view on the original array. That's why your original array gets modified when you modify the slice.
Create a new array using any of the array creation routines.
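For this question that could look like the sketch below (it assumes the same data.txt layout as in the question):

import numpy as np

myData = np.loadtxt('data.txt')

xLinearAcc = myData[:, 4]             # basic slice: a view sharing memory with myData
xLinearAccSqr = myData[:, 4].copy()   # an independent copy
xLinearAccSqr **= 2                   # squares the copy; myData is untouched

print(np.shares_memory(myData, xLinearAcc))     # True
print(np.shares_memory(myData, xLinearAccSqr))  # False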