Python: Get item from list based on input - python

I appreciate this may not be directly possible so I would be interested how you would go about solving this problem for a general case.
I have a list item that looks like this, [(array,time),(array,time)...] the array is a numpy array which can have any n by m dimensions. This will look like array[[derivatives dimension1],[derivatives dimension 2] ...]
From the list I want a function to create two lists which would contain all the values at the position passed to it. These could then be used for plotting.
I can think of ways to do this with alternative data structures, but unfortunately this is not an option.
Essentially what I want is
def f(list, pos1, pos2):
    xs = []
    ys = []
    for i in list:
        ys.append(i pos1)
        xs.append(i pos2)
    return xs, ys
Where i pos1 is equivalent to i[n][m]
The real problem being when it's 1 by 1 so i can't just pass integers.
Any advice would be great, sorry the post is a bit long I wanted to be clear.
Thanks

If I'm understanding your question correctly, you essentially want to select indexes from a list of lists, and create new lists from that selection.
Selecting indexes from a list of lists is fairly simple, particularly if you have a fixed number of selections:
parts = [(item[pos1], item[pos2]) for item in list]
Creating new lists from those selections is also fairly easy, using the built-in zip() function:
separated = zip(*parts)
You can further reduce memory usage by using a generator expression instead of a list comprehension in the final function:
def f( list, pos1, pos2 ):
    partsgen = ((item[pos1], item[pos2]) for item in list)
    return zip(*partsgen)
Here's how it looks in action:
>>> f( [['ignore', 'a', 1], ['ignore', 'b', 2],['ignore', 'c', 3]], 1, 2 )
[('a', 'b', 'c'), (1, 2, 3)]
Update: After re-reading the question and comments, I'm realizing this is a bit over-simplified. However, the general idea should still work when you exchange pos1 and pos2 for appropriate indexing into the contained array.
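One way to index into the contained arrays, as a minimal sketch: pass each position as a tuple, which NumPy accepts as an index for any number of dimensions, so a 1 by 1 array is simply addressed with (0, 0). The names values_at, records and pos are illustrative and do not come from the question.

```python
import numpy as np

def values_at(records, pos):
    """Collect the value at index `pos` from each array, plus the matching times.

    `records` is assumed to be a list of (array, time) tuples and `pos` a
    tuple such as (n, m).
    """
    values = [arr[pos] for arr, _ in records]
    times = [t for _, t in records]
    return times, values

# Hypothetical data: three 2x2 arrays sampled at times 0.0, 0.5 and 1.0
base = np.arange(4).reshape(2, 2)
records = [(base * k, 0.5 * (k - 1)) for k in (1, 2, 3)]
times, values = values_at(records, (1, 0))

# A 1x1 array is handled the same way, with (0, 0) as the position
single = [(np.array([[7.0]]), 0.0), (np.array([[8.0]]), 1.0)]
single_times, single_values = values_at(single, (0, 0))
```

Calling the function once per position gives the two lists needed for plotting.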

If I understand your question, something like the following should be easy and fast, particularly if you need to do this multiple times:
z = np.dstack([ arr for arr, time in lst ])
x, y = z[pos1], z[pos2]
for example:
In [42]: a = np.arange(9).reshape(3,3)
In [43]: z = np.dstack([a, a*2, a*3])
In [44]: z[0,0]
Out[44]: array([0, 0, 0])
In [45]: z[1,1]
Out[45]: array([ 4, 8, 12])
In [46]: z[0,1]
Out[46]: array([1, 2, 3])


How to filter two numpy arrays?

Edit: I fixed y so that x,y have the same length
I don't understand much about programing but I have a giant mass of data to analyze and it has to be done in Python.
Say I have two arrays:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
and say I want to choose the values in y which are greater than 17, and keep only the values in x which have the same index as the remaining values in y. For example, I want to erase the first value of y (25) and, accordingly, the matching value in x (1).
I tried this:
filter=np.where(y>17, 0, y)
but I don't know how to filter the x values accordingly (the actual data are much longer arrays, so doing it "by hand" is basically impossible)
Solution: using @mozway's tip, now that x and y have the same length, the needed code is:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
x_filtered=x[y>17]
As your question is not fully clear and you did not provide the expected output, here are two possibilities:
filtering
NumPy arrays can be indexed by an array (iterable) of booleans.
If the two arrays were the same length you could do:
x[y>17]
Here, x is longer than y, so we first need to make it the same length:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20])
x[:len(y)][y>17]
Output: array([1, 2, 4, 5, 8])
replacement
To select between x and y based on a condition, use where:
np.where(y>17, x[:len(y)], y)
Output:
array([ 1, 2, 16, 4, 5, 5, 9, 8])
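For the equal-length arrays from the edited question, a minimal sketch of filtering both arrays with one shared mask (the names mask, x_filtered and y_filtered are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20, 80, 45])

mask = y > 17          # boolean array, True where the condition holds
x_filtered = x[mask]   # array([ 1,  2,  4,  5,  8,  9, 10])
y_filtered = y[mask]   # array([25, 18, 19, 30, 20, 80, 45])
```

Storing the mask in a variable keeps the two results aligned, because both arrays are indexed with exactly the same positions.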
As someone with little experience in Numpy specifically, I wrote this answer before seeing @mozway's excellent answer for filtering. My answer works on more generic containers than Numpy's arrays, though it uses more concepts as a result. I'll attempt to explain each concept in enough detail for the answer to make sense.
TL;DR:
Please, definitely read the rest of the answer, it'll help you understand what's going on.
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([25,18,16,19,30,5,9,20])
filtered_x_list = []
filtered_y_list = []
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
# These lines are just for us to see what happened
print(filtered_x) # prints [1 2 4 5 8]
print(filtered_y) # prints [25 18 19 30 20]
Pre-requisite Knowledge
Python containers (lists, arrays, and a bunch of other stuff I won't get into)
Let's take a look at the line:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
What's Python doing?
The first thing it's doing is creating a list:
[1, 2, 3] # and so on
Lists in Python have a few features that are useful for us in this solution:
Accessing elements:
x_list = [ 1, 2, 3 ]
print(x_list[0]) # prints 1
print(x_list[1]) # prints 2, and so on
Adding elements to the end:
x_list = [ 1, 2, 3 ]
x_list.append(4)
print(x_list) # prints [1, 2, 3, 4]
Iteration:
x_list = [ 1, 2, 3 ]
for x in x_list:
    print(x)
# prints:
# 1
# 2
# 3
Numpy arrays are slightly different: we can still access and iterate over their elements, and even change individual values, but their size is fixed once they're created. They have no .append method, and adding or deleting a value (for instance with np.append or np.delete) builds a new array instead of modifying the existing one.
So the filtered_x_list and the filtered_y_list are empty lists we're creating, but we're going to modify them by adding the values we care about to the end.
The second thing Python is doing is creating a numpy array, using the list to define its contents. The array constructor can take a list expressed as [...], or a list defined by x_list = [...], which we're going to take advantage of later.
A little more on iteration
In your question, for every x element, there is a corresponding y element. We want to test something for each y element, then act on the corresponding x element, too.
Since we can access the same element in both arrays using an index - x[0], for instance - instead of iterating over one list or the other, we can iterate over all indices needed to access the lists.
First, we need to figure out how many indices we're going to need, which is just the length of the lists. len(x) lets us do that - in this case, it returns 10.
What if x and y are different lengths? In this case, I chose the smallest of the two - first, do len(x) and len(y), then pass those to the min() function, which is what min(len(x), len(y)) in the code above means.
Finally, we want to actually iterate through the indices, starting at 0 and ending at len(x) - 1 or len(y) - 1, whichever is smallest. The range sequence lets us do exactly that:
for i in range(10):
    print(i)
# prints:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
So range(min(len(x), len(y))), finally, gets us the indices to iterate over, and finally, this line makes sense:
for i in range(min(len(x), len(y))):
Inside this for loop, i now gives us an index we can use for both x and y.
Now, we can do the comparison in our for loop:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
Then, including xs for the corresponding ys is a simple case of just appending the same x value to the x list:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
The filtered lists now contain the numbers you're after. The last two lines, outside the for loop, just create numpy arrays from the results:
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
Which you might want to do, if certain numpy functions expect arrays.
While there are, in my opinion, better ways to do this (I would probably write custom iterators that produce the intended results without creating new lists), they require a somewhat more advanced understanding of programming, so I opted for something simpler.
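As a rough sketch of one such alternative (an assumption about what a more iterator-based version could look like, not the approach described above), the built-in zip pairs up the elements directly and stops at the shorter input, so the index bookkeeping disappears:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20])

# zip stops at the shorter input, so no explicit min(len(x), len(y)) is needed
pairs = [(xi, yi) for xi, yi in zip(x, y) if yi > 17]

filtered_x = np.array([xi for xi, _ in pairs])   # [1 2 4 5 8]
filtered_y = np.array([yi for _, yi in pairs])   # [25 18 19 30 20]
```

The results match the index-based loop; the intermediate list pairs is still created, so this is shorter to write but not a lazy iterator.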

List of strings to array of integers

From a list of strings, like this one:
example_list = ['010','101']
I need to get an array of integers, where each row is each one of the strings, being each character in one column, like this one:
example_array = np.array([[0,1,0],[1,0,1]])
I have tried with this code, but it isn't working:
example_array = np.empty([2,3],dtype=int)
i = 0 ; j = 0
for string in example_list:
    for bit in string:
        example_array[i,j] = int(bit)
        j+=1
    i+=1
Can anyone help me? I am using Python 3.6.
Thank you in advance for your help!
If all strings are the same length (this is crucial to building a contiguous array), then use view to efficiently separate the characters.
r = np.array(example_list)
r = r.view('<U1').reshape(*r.shape, -1).astype(int)
print(r)
array([[0, 1, 0],
[1, 0, 1]])
You could also go the list comprehension route.
r = np.array([[*map(int, list(l))] for l in example_list])
print(r)
array([[0, 1, 0],
[1, 0, 1]])
The simplest way is to use a list comprehension because it automatically generates the output list for you, which can be easily converted to a numpy array. You could do this using multiple for loops, but then you are stuck creating your list, sub lists, and appending to them. While not difficult, the code looks more elegant with list comprehensions.
Try this:
newList = np.array([[int(b) for b in a] for a in example_list])
newList now looks like this:
>>> newList
array([[0, 1, 0],
       [1, 0, 1]])
Note: there is no need to invoke map at this point, though that certainly works.
So what is going on here? We are iterating through your original list of strings (example_list) item-by-item, then iterating through each character within the current item. Functionally, this is equivalent to...
newList = []
for a in example_list:
    tmpList = []
    for b in a:
        tmpList.append(int(b))
    newList.append(tmpList)
newList = np.array(newList)
Personally, I find the multiple for loops to be easier to understand for beginners. However, once you grasp the list comprehensions you probably won't want to go back.
You could do this with map:
# In Python 3, map returns a lazy iterator, so each map is wrapped in list()
example_array = np.array(list(map(lambda x: list(map(lambda y: int(y), list(x))), example_list)))
The outer lambda performs a list(x) operation on each item in example_list. For example, '010' => ['0','1','0']. The inner lambda converts the individual characters (resultants from list(x)) to integers. For example, ['0','1','0'] => [0,1,0].
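The loop in the question appears to fail because j is never reset to 0 after the first string, so the second row starts writing at column 3, which is out of bounds for a 2 by 3 array. A minimal sketch of the same nested-loop idea with enumerate, so the counters cannot drift (sizing the array from the data is an addition here, and assumes every string has the same length):

```python
import numpy as np

example_list = ['010', '101']

# Size the array from the data; assumes every string has the same length
example_array = np.empty((len(example_list), len(example_list[0])), dtype=int)

for i, string in enumerate(example_list):
    for j, bit in enumerate(string):   # j restarts at 0 for every string
        example_array[i, j] = int(bit)
```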

Unpacking a list in python using .T?

I'm using scipy's method integrate.odeint to solve a second order LDE. The method requires that the equation be put in the form of a system of two first-order equations in two unknowns. The method
odeint(system_matrix,initial_conditions_matrix,time_values)
outputs the solution vector at each point of time in time_values. The solution vector is actually of the form [u,u'], where u is the variable I am interested in. So I want to plot only u. I found online one way of accomplishing this is to use
u,u'=odeint(system_matrix,initial_conditions_matrix,time_values).T
but I don't understand why this works and what does the .T at the end mean?
odeint(system_matrix,initial_conditions_matrix,time_values) is a matrix of 2 columns.
To be able to get the first column, first use .T (transpose) and then you are able to unpack since the elements are oriented like you want.
BTW I doubt that u' is a valid variable name. I would do:
u,_ = odeint(system_matrix,initial_conditions_matrix,time_values).T
since second value is of no interest to you.
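To see the shapes involved without running a solver, here is a small sketch that uses a hand-built array as a stand-in for the odeint result. The stand-in values (cosine and negative sine) and the names sol, u and du are purely illustrative.

```python
import numpy as np

# Stand-in for an odeint result: one row per time value, columns [u, u']
t = np.linspace(0.0, 1.0, 5)
sol = np.column_stack([np.cos(t), -np.sin(t)])   # shape (5, 2)

# Unpacking iterates over the first axis, so sol itself would yield 5 rows.
# sol.T has shape (2, 5), which unpacks into exactly two arrays.
u, du = sol.T
```

After this, u is the same data as sol[:, 0] and can be plotted against t.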
The example I have in mind is:
>>> sol = odeint(pend, y0, t, args=(b, c))
The solution is an array with shape (101, 2). The first column is theta(t), and the second is omega(t). The following code plots both components.
>>>
>>> import matplotlib.pyplot as plt
>>> plt.plot(t, sol[:, 0], 'b', label='theta(t)')
>>> plt.plot(t, sol[:, 1], 'g', label='omega(t)')
sol[:,0] selects the first column of sol
Unpacking is usually used with a function that returns a tuple, for example:
def foo():
    ....
    return [1,2,3],{3:3}
x, y = foo()
should end up with x being a list, y a dictionary.
But it works with any iterable, provided the number of terms matches. For example, a 2-row array can be unpacked into 2 arrays.
In [1]: x, y = np.arange(6).reshape(2,3)
In [4]: x,y
Out[4]: (array([0, 1, 2]), array([3, 4, 5]))
If I'd created a (3,2) array I would have needed x,y,z= ..., or .T.
Because we can index columns and rows, unpacking isn't used a lot in numpy. Usually we have too many rows to unpack. But it works just as basic Python intended to.
As a matter of curiosity, transpose works on a tuple
In [6]: np.transpose((x,y))
Out[6]:
array([[0, 3],
[1, 4],
[2, 5]])
This is actually used in np.argwhere, which turns the tuple of indices produced by np.where into an array with the same number of columns as dimensions.

NumPy List Comprehension Syntax

I'd like to be able to use list comprehension syntax to work with NumPy arrays easily.
For instance, I would like something like the below obviously wrong code to just reproduce the same array.
>>> X = np.random.randn(8,4)
>>> [[X[i,j] for i in X] for j in X[i]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type
What is the easy way to do this, to avoid using range(len(X))?
First, you should not be using NumPy arrays as lists of lists.
Second, let's forget about NumPy; your listcomp doesn't make any sense in the first place, even for lists of lists.
In the inner comprehension, for i in X is going to iterate over the rows in X. Those rows aren't numbers, they're lists (or, in NumPy, 1D arrays), so X[i] makes no sense whatsoever. You may have wanted i[j] instead.
In the outer comprehension, for j in X[i] has the same problem, but it has an even bigger problem: there is no i value. You have a comprehension looping over each i inside this comprehension.
If you're confused by a comprehension, write it out as an explicit for statement, as explained in the tutorial section on List Comprehensions:
tmp = []
for j in X[i]:
    tmp.append([X[i,j] for i in X])
… which expands to:
tmp = []
for j in X[i]:
    tmp2 = []
    for i in X:
        tmp2.append(X[i,j])
    tmp.append(tmp2)
… which should make it obvious what's wrong here.
I think what you wanted was:
[[cell for cell in row] for row in X]
Again, turn it back into explicit for statements:
tmp = []
for row in X:
    tmp2 = []
    for cell in row:
        tmp2.append(cell)
    tmp.append(tmp2)
That's obviously right.
Or, if you really want to use indexing (but you don't):
[[X[i][j] for j in range(len(X[i]))] for i in range(len(X))]
So, back to NumPy. In NumPy terms, that last version is:
[[X[i,j] for j in range(X.shape[1])] for i in range(X.shape[0])]
… and if you want to go in column-major order instead of row-major, you can (unlike with a list of lists):
[[X[i,j] for i in range(X.shape[0])] for j in range(X.shape[1])]
… but that will of course transpose the array, which isn't what you wanted to do.
The one thing you can't do is mix up column-major and row-major order in the same expression, because you end up with nonsense.
Of course the right way to make a copy of an array is to use the copy method:
X.copy()
Just as the right way to transpose an array is:
X.T
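A quick way to convince yourself of these equivalences is to compare the results directly. This is only a sketch; the names row_major, col_major and the two flags are illustrative.

```python
import numpy as np

X = np.random.randn(8, 4)

# Index-based comprehensions in row-major and column-major order
row_major = np.array([[X[i, j] for j in range(X.shape[1])] for i in range(X.shape[0])])
col_major = np.array([[X[i, j] for i in range(X.shape[0])] for j in range(X.shape[1])])

same_as_copy = np.array_equal(row_major, X.copy())   # row-major rebuilds X
same_as_transpose = np.array_equal(col_major, X.T)   # column-major gives X.T
```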
The easy way is to not do this. Use numpy's implicit vectorization instead. For example, if you have arrays A and B as follows:
A = numpy.array([[1, 3, 5],
[2, 4, 6],
[9, 8, 7]])
B = numpy.array([[5, 3, 5],
[3, 5, 3],
[5, 3, 5]])
then the following code using list comprehensions:
C = numpy.array([[A[i, j] * B[i, j] for j in range(A.shape[1])]
                 for i in range(A.shape[0])])
can be much more easily written as
C = A * B
It'll also run much faster. Generally, you will produce faster, clearer code if you don't use list comprehensions with numpy than if you do.
If you really want to use list comprehensions, standard Python list-comprehension-writing techniques apply. Iterate over the elements, not the indices:
C = numpy.array([[a*b for a, b in zip(a_row, b_row)]
                 for a_row, b_row in zip(A, B)])
Thus, your example code would become
numpy.array([[elem for elem in x_row] for x_row in X])
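Putting the two approaches side by side for the A and B above, as a small self-contained sketch (the names C_vectorized and C_listcomp are illustrative):

```python
import numpy as np

A = np.array([[1, 3, 5], [2, 4, 6], [9, 8, 7]])
B = np.array([[5, 3, 5], [3, 5, 3], [5, 3, 5]])

# Elementwise product, vectorized
C_vectorized = A * B

# The same product, iterating over elements instead of indices
C_listcomp = np.array([[a * b for a, b in zip(a_row, b_row)]
                       for a_row, b_row in zip(A, B)])
```

Both give [[5, 9, 25], [6, 20, 18], [45, 24, 35]]; the vectorized form avoids the Python-level loops entirely.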
Another option (though not necessarily performant) is to rethink your problem as a map instead of a comprehension and write a ufunc:
http://docs.scipy.org/doc/numpy/reference/ufuncs.html
You can call functional-lite routines like:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.apply_over_axes.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
Etc.
Do you mean following?
>>> [[X[i,j] for j in range(X.shape[1])] for i in range(X.shape[0])]
[[0.62757350000000001, -0.64486080999999995, -0.18372566000000001, 0.78470704000000002],
[1.78209799, -1.3364484599999999, -1.3851422200000001, -0.49668994],
[-0.84148266000000005, 0.18864597999999999, -1.1135151299999999, -0.40225053999999999],
[0.93852824999999995, 0.24652238000000001, 1.1481637499999999, -0.70346624999999996],
[0.83842508000000004, 1.0058697599999999, -0.91267403000000002, 0.97991269000000003],
[-1.4265273000000001, -0.73465904999999998, 0.66842849999999998, -0.21551155],
[-1.1115614599999999, -1.0035033200000001, -0.11558254, -0.4339924],
[1.8771354, -1.0189299199999999, -0.84754008000000003, -0.35387946999999997]]
Using numpy.ndarray.copy:
>>> X.copy()
array([[ 0.6275735 , -0.64486081, -0.18372566, 0.78470704],
[ 1.78209799, -1.33644846, -1.38514222, -0.49668994],
[-0.84148266, 0.18864598, -1.11351513, -0.40225054],
[ 0.93852825, 0.24652238, 1.14816375, -0.70346625],
[ 0.83842508, 1.00586976, -0.91267403, 0.97991269],
[-1.4265273 , -0.73465905, 0.6684285 , -0.21551155],
[-1.11156146, -1.00350332, -0.11558254, -0.4339924 ],
[ 1.8771354 , -1.01892992, -0.84754008, -0.35387947]])

how to get the index of numpy.random.choice? - python

Is it possible to modify the numpy.random.choice function in order to make it return the index of the chosen element?
Basically, I want to create a list and select elements randomly without replacement
>>> import numpy as np
>>> a = [1,4,1,3,3,2,1,4]
>>> np.random.choice(a)
4
>>> a
[1,4,1,3,3,2,1,4]
a.remove(np.random.choice(a)) will remove the first element of the list with that value it encounters (a[1] in the example above), which may not be the chosen element (eg, a[7]).
Regarding your first question, you can work the other way around, randomly choose from the index of the array a and then fetch the value.
>>> a = [1,4,1,3,3,2,1,4]
>>> a = np.array(a)
>>> np.random.choice(np.arange(a.size))
6
>>> a[6]
1
But if you just need a random sample without replacement, replace=False will do. I can't remember when it was first added to random.choice; it might have been 1.7.0, so it may not work if you are running a very old NumPy. Keep in mind that the default is replace=True.
Here's one way to find out the index of a randomly selected element:
import random # plain random module, not numpy's
random.choice(list(enumerate(a)))[0]
=> 4 # just an example, index is 4
Or you could retrieve the element and the index in a single step:
random.choice(list(enumerate(a)))
=> (1, 4) # just an example, index is 1 and element is 4
numpy.random.choice(a, size=however_many, replace=False)
If you want a sample without replacement, just ask numpy to make you one. Don't loop and draw items repeatedly. That'll produce bloated code and horrible performance.
Example:
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.random.choice(a, size=5, replace=False)
array([7, 5, 8, 6, 2])
On a sufficiently recent NumPy (at least 1.17), you should use the new randomness API, which fixes a longstanding performance issue where the old API's replace=False code path unnecessarily generated a complete permutation of the input under the hood:
rng = numpy.random.default_rng()
result = rng.choice(a, size=however_many, replace=False)
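To also keep track of which positions were drawn, one option (a sketch, not part of the answer above) is to sample indices with the same API and look the values up afterwards. The seed and the names idx, chosen and remaining are illustrative.

```python
import numpy as np

a = np.array([1, 4, 1, 3, 3, 2, 1, 4])
rng = np.random.default_rng(0)   # the seed is only for repeatability

# Passing an int samples from np.arange(a.size), i.e. from the indices
idx = rng.choice(a.size, size=3, replace=False)   # three distinct indices
chosen = a[idx]                                   # the values at those indices
remaining = np.delete(a, idx)                     # a without the chosen positions
```

np.delete returns a new array and leaves a untouched, so the original is still available if needed.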
This is a bit in left field compared with the other answers, but I thought it might help what it sounds like you're trying to do in a slightly larger sense. You can generate a random sample without replacement by shuffling the indices of the elements in the source array :
source = np.random.randint(0, 100, size=100) # generate a set to sample from
idx = np.arange(len(source))
np.random.shuffle(idx)
subsample = source[idx[:10]]
This will create a sample (here, of size 10) by drawing elements from the source set (here, of size 100) without replacement.
You can interact with the non-selected elements by using the remaining index values, i.e.:
notsampled = source[idx[10:]]
It may be late, but this solution is worth mentioning because I think it is the simplest way to do so:
a = [1, 4, 1, 3, 3, 2, 1, 4]
n = len(a)
idx = np.random.choice(list(range(n)), p=np.ones(n)/n)
It means you are choosing from the indices uniformly. In a more general case, you can do a weighted sampling (and return the index) in this way:
probs = [.3, .4, .2, 0, .1]
n = len(probs) # one probability per candidate index
idx = np.random.choice(list(range(n)), p=probs)
If you repeat this many times (e.g. 1e5), the histogram of the chosen indices looks like [0.30126 0.39817 0.19986 0. 0.10071] in this case, which is correct.
Anyway, you should choose from the indices and use the values (if you need) as their probabilities.
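The frequency check described above can be reproduced with a short sketch like the following. The use of np.bincount and the names n_draws and freq are choices made for this example; the exact frequencies differ from run to run.

```python
import numpy as np

probs = np.array([0.3, 0.4, 0.2, 0.0, 0.1])
n_draws = 100_000

# Draw all indices in one call instead of looping
idx = np.random.choice(len(probs), size=n_draws, p=probs)

# Fraction of draws that landed on each index
freq = np.bincount(idx, minlength=len(probs)) / n_draws
```

The entries of freq should land close to probs, and the index with probability 0 is never drawn.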
Instead of using choice, you can also simply random.shuffle your array, i.e.
random.shuffle(a) # will shuffle a in-place
Based on your comment:
The sample is already a. I want to work directly with a so that I can control how many elements are still left and perform other operations with a. – HappyPy
it sounds to me like you're interested in working with a after n randomly selected elements are removed. Instead, why not work with N = len(a) - n randomly selected elements from a? Since you want them to still be in the original order, you can select from indices like in @CTZhu's answer, but then sort them and grab from the original list:
import numpy as np
n = 3 #number to 'remove'
a = np.array([1,4,1,3,3,2,1,4])
i = np.random.choice(np.arange(a.size), a.size-n, replace=False)
i.sort()
a[i]
#array([1, 4, 1, 3, 1])
So now you can save that as a again:
a = a[i]
and work with a with n elements removed.
Here is a simple solution, just choose from the range function.
import numpy as np
a = [100,400,100,300,300,200,100,400]
I=np.random.choice(np.arange(len(a)))
print('index is '+str(I)+' number is '+str(a[I]))
The question title versus its description are a bit different. I just wanted the answer to the title question which was getting only an (integer) index from numpy.random.choice(). Rather than any of the above, I settled on index = numpy.random.choice(len(array_or_whatever)) (tested in numpy 1.21.6).
Ex:
import numpy
a = [1, 2, 3, 4]
i = numpy.random.choice(len(a))
The problem I had in the other solutions were the unnecessary conversions to list which would recreate the entire collection in a new object (slow!).
Reference: https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html?highlight=choice#numpy.random.choice
Key point from the docs about the first parameter a:
a: 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)
Since the question is very old then it's possible I'm coming at this from the convenience of newer versions supporting exactly what myself and the OP wanted.
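Tying this back to the original problem of removing exactly the chosen element from a plain list, a minimal sketch (the use of list.pop and the name value are additions for illustration):

```python
import numpy as np

a = [1, 4, 1, 3, 3, 2, 1, 4]

i = int(np.random.choice(len(a)))   # index drawn uniformly from 0..len(a)-1
value = a.pop(i)                    # removes exactly that position, not the first match
```

Unlike a.remove(value), pop(i) cannot remove a different element that happens to have the same value.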
