Related
So I'm finished one part of this assignment I have to do. There's only one part of the assignment that doesn't make any sense to me.
I'm doing a LinearRegression model and according to others I need to apply ans[i,:] = y_poly at the very end, but I never got an answer as to why.
Can someone please explain to me what [i,:] means? I haven't found any explanations online.
It's specific to the numpy module, used in most data science modules.
ans[i,:] = y_poly
this is assigning a vector to a slice of numpy 2D array (slice assignment). Self-contained example:
>>> import numpy
>>> a = numpy.array([[0,0,0],[1,1,1]])
>>> a[0,:] = [3,4,5]
>>> a
array([[3, 4, 5],
[1, 1, 1]])
There is also slice assignment in base python, using only one dimension (a[:] = [1,2,3])
I guess you are also using numpy to manipulate data (as a matrix)?
If based on numpy, ans[i,:] means to pick the ith 'row' of ans with all of its 'columns'.
Note: when dealing with numpy arrays, we should (almost) always use [i, j] instead of [i][j]. This might be counter-intuitive if you've used Python or Java to manipulate matrixes before.
I think in this case [] means the indexing operator for a class object which can be used by defining the getitem method
class A:
def __getitem__(self, key):
pass
key can be literally anything. In your case "[1,:]" key is a tuple containing of "1" and a slice(None, None, None). Such a key can be useful if your class represents multi-dimensional data which you want to access via [] operator. A suggested by others answers this could be a numpy array:
Here is a quick example of how such a multi-dimensional indexing could work:
class A:
values = [[1,2,3,4], [4,5,6,7]]
def __getitem__(self, key):
i, j = key
if isinstance(i, int):
i = slice(i, i + 1)
if isinstance(j, int):
j = slice(j, j + 1)
for row in self.values[i]:
print(row[j])
>>>a = A()
>>>a[:,2:4]
[3, 4]
[6, 7]
>>>a[1,1]
[5]
>>>a[:, 2]
[3]
[6]
what is the difference below codes
built-in list code
>>> a = [1,2,3,4]
>>> b = a[1:3]
>>> b[1] = 0
>>> a
[1, 2, 3, 4]
>>> b
[2, 0]
numpy array
>>> c = numpy.array([1,2,3,4])
>>> d = c[1:3]
>>> d[1] = 0
>>> c
array([1, 2, 0, 4])
>>> d
array([2, 0])
as it is seen in numpy array c is effected directly. I think in built-in lists, new memory is allocated for the variable b. Probably in numpy the reference of c[1:3] is assigned d, I am not clear about these.
How this works for numpy and built-in?
The key point to understand is that every assignment in Python associates a name with an object in memory. Python never copies on assignment. It now becomes important to understand when new objects are created and how they behave.
In your first example, the slicing in the list creates a new list object. In this case, both of the lists reference some of the same objects (the int 2 and the int 3). The fact that these references are copied is what is called a "shallow" copy. In other words, the references are copied, but the objects they refer to are still the same. Keep in mind that this will be true regardless of the type of thing that is stored in the list.
Now, we create a new object (the int 0) and assign b[1] = 0. Because a and b are separate lists, it should not surprise us that they now show different elements.
I like the pythontutor visualisation of this situation.
In the array case, "All arrays generated by basic slicing are always views of the original array.".
This new object shares data with the original, and indexed assignment is handled in such a way that any updates to the view will update the shared data.
This has been covered alot, but finding a good duplicate is too much work. :(
Let's see if I can quickly describe things with your examples:
>>> a = [1,2,3,4] # a list contains pointers to numbers elsewhere
>>> b = a[1:3] # a new list, with copies of those pointers
>>> b[1] = 0 # change one pointer in b
>>> a
[1, 2, 3, 4] # does not change any pointers in a
>>> b
[2, 0]
An array has a different structure - it has a data buffer with 'raw' numbers (or other byte values).
numpy array
>>> c = numpy.array([1,2,3,4])
>>> d = c[1:3] # a view; a new array but uses same data buffer
>>> d[1] = 0 # change a value in d;
>>> c
array([1, 2, 0, 4]) # we see the change in the corrsponding slot of c
>>> d
array([2, 0])
The key point with lists is that they contain pointers to objects. You can copy the pointers without copying the objects; and you can change pointers without changing other copies of the pointers.
To save memory and speed numpy as implemented a concept of view. It can make a new array without copying values from the original - because it can share the data buffer. But it is also possible to make a copy, e.g.
e = c[1:3].copy()
e[0] = 10
# no change in c
view v copy is a big topic in numpy and a fundamental one, especially when dealing with different kinds of indexing (slices, basic, advanced). We can help with questions, but you also should read the numpy docs. There's no substitute for understanding the basics of how a numpy array is stored.
http://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
http://www.scipy-lectures.org/advanced/advanced_numpy/ (may be more advanced that what you need now)
Lets say I have a numpy array like
x = np.arange(10)
is it somehow possible to create a reference to a single element i.e.
y = create_a_reference_to(x[3])
y = 100
print x
[ 0 1 2 100 4 5 6 7 8 9]
You can't create a reference to a single element, but you can get a view over that single element:
>>> x = numpy.arange(10)
>>> y = x[3:4]
>>> y[0] = 100
>>> x
array([0, 1, 2, 100, 4, 5, 6, 7, 8, 9])
The reason you can't do the former is that everything in python is a reference. By doing y = 100, you're modifying what y points to - not it's value.
If you really want to, you can get that behaviour on instance attributes by using properties. Note this is only possible because the python data model specifies additional operations while accessing class attributes - it's not possible to get this behaviour for variables.
No you cannot do that, and that is by design.
Numpy arrays are of type numpy.ndarray. Individual items in it can be accessed with numpy.ndarray.item which does "copy an element of an array to a standard Python scalar and return it".
I'm guessing numpy returns a copy instead of direct reference to the element to prevent mutability of numpy items outside of numpy's own implementation.
Just as a thoughtgame, let's assume this wouldn't be the case and you would be allowed to get reference to individual items. Then what would happen if: numpy was in the midle of calculation and you altered an individual intime in another thread?
#goncalopp gives a correct answer, but there are a few variations that will achieve similar effects.
All of the notations shown below are able to reference a single element while still returning a view:
x = np.arange(10)
two_index_method = [None] * 10
scalar_element_method = [None] * 10
expansion_method = [None] * 10
for i in range(10):
two_index_method[i] = x[i:i+1]
scalar_element_method[i] = x[..., i] # x[i, ...] works, too
expansion_method[i] = x[:, np.newaxis][i] # np.newaxis == None
two_index_method[5] # Returns a length 1 numpy.ndarray, shape=(1,)
# >>> array([5])
scalar_element_method[5] # Returns a numpy scalar, shape = ()
# >>> array(5)
expansion_method[5] # Returns a length 1 numpy.ndarray, shape=(1,)
# >>> array([5])
x[5] = 42 # Change the value in the original `ndarray`
x
# >>> array([0, 1, 2, 3, 4, 42, 6, 7, 8, 9]) # The element has been updated
# All methods presented here are correspondingly updated:
two_index_method[5], scalar_element_method[5], expansion_method[5]
# >>> (array([42]), array(42), array([42]))
Since the object in scalar_element_method is a dimension zero scalar, attempting to reference the element contained within the ndarray via element[0] will return an IndexError. For a scalar ndarray, element[()] can be used to reference the element contained within the numpy scalar. This method can also be used for assignment to a length-1 ndarray, but has the unfortunate side effect that it does not dereference a length-1 ndarray to a python scalar. Fortunately, there is a single method, element.item(), that can be used (for dereferencing only) to obtain the value regardless of whether the element is a length one ndarray or a scalar ndarray:
scalar_element_method[5][0] # This fails
# >>> IndexError: too many indices for array
scalar_element_method[5][()] # This works for scalar `ndarray`s
# >>> 42
scalar_element_method[5][()] = 6
expansion_method[5][0] # This works for length-1 `ndarray`s
# >>> 6
expansion_method[5][()] # Doesn't return a python scalar (or even a numpy scalar)
# >>> array([6])
expansion_method[5][()] = 8 # But can still be used to change the value by reference
scalar_element_method[5].item() # item() works to dereference all methods
# >>> 8
expansion_method[5].item()
# >>> [i]8
TLDR; You can create a single-element view v with v = x[i:i+1], v = x[..., i], or v = x[:, None][i]. While different setters and getters work with each method, you can always assign values with v[()]=new_value, and you can always retrieve a python scalar with v.item().
Is it possible to modify the numpy.random.choice function in order to make it return the index of the chosen element?
Basically, I want to create a list and select elements randomly without replacement
import numpy as np
>>> a = [1,4,1,3,3,2,1,4]
>>> np.random.choice(a)
>>> 4
>>> a
>>> [1,4,1,3,3,2,1,4]
a.remove(np.random.choice(a)) will remove the first element of the list with that value it encounters (a[1] in the example above), which may not be the chosen element (eg, a[7]).
Regarding your first question, you can work the other way around, randomly choose from the index of the array a and then fetch the value.
>>> a = [1,4,1,3,3,2,1,4]
>>> a = np.array(a)
>>> random.choice(arange(a.size))
6
>>> a[6]
But if you just need random sample without replacement, replace=False will do. Can't remember when it was firstly added to random.choice, might be 1.7.0. So if you are running very old numpy it may not work. Keep in mind the default is replace=True
Here's one way to find out the index of a randomly selected element:
import random # plain random module, not numpy's
random.choice(list(enumerate(a)))[0]
=> 4 # just an example, index is 4
Or you could retrieve the element and the index in a single step:
random.choice(list(enumerate(a)))
=> (1, 4) # just an example, index is 1 and element is 4
numpy.random.choice(a, size=however_many, replace=False)
If you want a sample without replacement, just ask numpy to make you one. Don't loop and draw items repeatedly. That'll produce bloated code and horrible performance.
Example:
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.random.choice(a, size=5, replace=False)
array([7, 5, 8, 6, 2])
On a sufficiently recent NumPy (at least 1.17), you should use the new randomness API, which fixes a longstanding performance issue where the old API's replace=False code path unnecessarily generated a complete permutation of the input under the hood:
rng = numpy.random.default_rng()
result = rng.choice(a, size=however_many, replace=False)
This is a bit in left field compared with the other answers, but I thought it might help what it sounds like you're trying to do in a slightly larger sense. You can generate a random sample without replacement by shuffling the indices of the elements in the source array :
source = np.random.randint(0, 100, size=100) # generate a set to sample from
idx = np.arange(len(source))
np.random.shuffle(idx)
subsample = source[idx[:10]]
This will create a sample (here, of size 10) by drawing elements from the source set (here, of size 100) without replacement.
You can interact with the non-selected elements by using the remaining index values, i.e.:
notsampled = source[idx[10:]]
Maybe late but it worth to mention this solution because I think the simplest way to do so is:
a = [1, 4, 1, 3, 3, 2, 1, 4]
n = len(a)
idx = np.random.choice(list(range(n)), p=np.ones(n)/n)
It means you are choosing from the indices uniformly. In a more general case, you can do a weighted sampling (and return the index) in this way:
probs = [.3, .4, .2, 0, .1]
n = len(a)
idx = np.random.choice(list(range(n)), p=probs)
If you try to do so for so many times (e.g. 1e5), the histogram of the chosen indices would be like [0.30126 0.39817 0.19986 0. 0.10071] in this case which is correct.
Anyway, you should choose from the indices and use the values (if you need) as their probabilities.
Instead of using choice, you can also simply random.shuffle your array, i.e.
random.shuffle(a) # will shuffle a in-place
Based on your comment:
The sample is already a. I want to work directly with a so that I can control how many elements are still left and perform other operations with a. – HappyPy
it sounds to me like you're interested in working with a after n randomly selected elements are removed. Instead, why not work with N = len(a) - n randomly selected elements from a? Since you want them to still be in the original order, you can select from indices like in #CTZhu's answer, but then sort them and grab from the original list:
import numpy as np
n = 3 #number to 'remove'
a = np.array([1,4,1,3,3,2,1,4])
i = np.random.choice(np.arange(a.size), a.size-n, replace=False)
i.sort()
a[i]
#array([1, 4, 1, 3, 1])
So now you can save that as a again:
a = a[i]
and work with a with n elements removed.
Here is a simple solution, just choose from the range function.
import numpy as np
a = [100,400,100,300,300,200,100,400]
I=np.random.choice(np.arange(len(a)))
print('index is '+str(I)+' number is '+str(a[I]))
The question title versus its description are a bit different. I just wanted the answer to the title question which was getting only an (integer) index from numpy.random.choice(). Rather than any of the above, I settled on index = numpy.random.choice(len(array_or_whatever)) (tested in numpy 1.21.6).
Ex:
import numpy
a = [1, 2, 3, 4]
i = numpy.random.choice(len(a))
The problem I had in the other solutions were the unnecessary conversions to list which would recreate the entire collection in a new object (slow!).
Reference: https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html?highlight=choice#numpy.random.choice
Key point from the docs about the first parameter a:
a: 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)
Since the question is very old then it's possible I'm coming at this from the convenience of newer versions supporting exactly what myself and the OP wanted.
I came across the fact that numpy arrays are passed by reference at multiple places, but then when I execute the following code, why is there a difference between the behavior of foo and bar
import numpy as np
def foo(arr):
arr = arr - 3
def bar(arr):
arr -= 3
a = np.array([3, 4, 5])
foo(a)
print a # prints [3, 4, 5]
bar(a)
print a # prints [0, 1, 2]
I'm using python 2.7 and numpy version 1.6.1
In Python, all variable names are references to values.
When Python evaluates an assignment, the right-hand side is evaluated before the left-hand side. arr - 3 creates a new array; it does not modify arr in-place.
arr = arr - 3 makes the local variable arr reference this new array. It does not modify the value originally referenced by arr which was passed to foo. The variable name arr simply gets bound to the new array, arr - 3. Moreover, arr is local variable name in the scope of the foo function. Once the foo function completes, there is no more reference to arr and Python is free to garbage collect the value it references. As Reti43 points out, in order for arr's value to affect a, foo must return arr and a must be assigned to that value:
def foo(arr):
arr = arr - 3
return arr
# or simply combine both lines into `return arr - 3`
a = foo(a)
In contrast, arr -= 3, which Python translates into a call to the __iadd__ special method, does modify the array referenced by arr in-place.
The first function calculates (arr - 3), then assigns the local name arr to it, which doesn't affect the array data passed in. My guess is that in the second function, np.array overrides the -= operator, and operates in place on the array data.
Python passes the array by reference:
$:python
...python startup message
>>> import numpy as np
>>> x = np.zeros((2,2))
>>> x
array([[0.,0.],[0.,0.]])
>>> def setx(x):
... x[0,0] = 1
...
>>> setx(x)
>>> x
array([[1.,0.],[0.,0.]])
The top answer is referring to a phenomenon that occurs even in compiled c-code, as any BLAS events will involve a "read-onto" step where either a new array is formed which the user (code writer in this case) is aware of, or a new array is formed "under the hood" in a temporary variable which the user is unaware of (you might see this as a .eval() call).
However, I can clearly access the memory of the array as if it is in a more global scope than the function called (i.e., setx(...)); which is exactly what "passing by reference" is, in terms of writing code.
And let's do a few more tests to check the validity of the accepted answer:
(continuing the session above)
>>> def minus2(x):
... x[:,:] -= 2
...
>>> minus2(x)
>>> x
array([[-1.,-2.],[-2.,-2.]])
Seems to be passed by reference. Let us do a calculation which will definitely compute an intermediate array under the hood, and see if x is modified as if it is passed by reference:
>>> def pow2(x):
... x = x * x
...
>>> pow2(x)
>>> x
array([[-1.,-2.],[-2.,-2.]])
Huh, I thought x was passed by reference, but maybe it is not? -- No, here, we have shadowed the x with a brand new declaration (which is hidden via interpretation in python), and python will not propagate this "shadowing" back to global scope (which would violate the python-use case: namely, to be a beginner level coding language which can still be used effectively by an expert).
However, I can very easily perform this operation in a "pass-by-reference" manner by forcing the memory (which is not copied when I submit x to the function) to be modified instead:
>>> def refpow2(x):
... x *= x
...
>>> refpow2(x)
>>> x
array([[1., 4.],[4., 4.]])
And so you see that python can be finessed a bit to do what you are trying to do.