Implicit looping through numpy array to replace values - python

I'm new to python and am trying to find a way to implicitly replace values in "array_to_replace" with one of two values in "values_to_use" based on the values in "array_of_positions":
First, the setup:
import numpy as np
values_to_use = np.array([[0.5, 0.3, 0.4], [0.6, 0.7, 0.75]])
array_of_positions = np.array([0, 1, 1, 0, 1, 0, 0, 1, 0, 1])
array_to_replace = np.array([[5, 5, 4], [6, 5, 4], [1, 2, 3], [9, 9, 9], [8, 8, 8], [7, 7, 7], [6, 5, 7], [5, 7, 9], [1, 3, 5], [3, 3, 3]])
Then, the brute force way to do what I want, which is to replace values in "array_to_replace" based on conditional values in "array_of_positions", is something like the following:
for pos in range(0, len(array_to_replace)):
    if (array_of_positions[pos] == 0):
        array_to_replace[pos] = values_to_use[0]
    else:
        array_to_replace[pos] = values_to_use[1]
Would you have any recommendations on how to make this happen implicitly?

The answer for this turned out to be pretty simple. To get what I wanted, all I needed to do was the following:
print(values_to_use[array_of_positions])
This gave me what I needed.
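As a minimal sketch of why this works, assuming the setup arrays from the question: indexing with an integer array selects one row of values_to_use per entry of array_of_positions. The names replaced and target are illustrative and not part of the original post. Note that array_to_replace as defined holds integers, so writing fractional values into it would truncate them; the sketch converts to float first.

```python
import numpy as np

values_to_use = np.array([[0.5, 0.3, 0.4], [0.6, 0.7, 0.75]])
array_of_positions = np.array([0, 1, 1, 0, 1, 0, 0, 1, 0, 1])
array_to_replace = np.array([[5, 5, 4], [6, 5, 4], [1, 2, 3], [9, 9, 9], [8, 8, 8],
                             [7, 7, 7], [6, 5, 7], [5, 7, 9], [1, 3, 5], [3, 3, 3]])

# Integer-array indexing: row k of the result is values_to_use[array_of_positions[k]].
replaced = values_to_use[array_of_positions]

# To overwrite the existing array instead, work on a float copy so that
# values such as 0.5 are not truncated to 0 by the integer dtype.
target = array_to_replace.astype(float)
target[:] = replaced

print(replaced.shape)  # (10, 3)
```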

Related

How to recursively extract values from a pandas DataFrame?

I have the following pandas DataFrame:
df = pd.DataFrame([
[3, 2, 5, 2],
[8, 5, 4, 2],
[9, 0, 8, 6],
[9, 2, 7, 1],
[1, 9, 2, 3],
[8, 1, 1, 6],
[8, 8, 0, 0],
[0, 1, 3, 0],
[2, 4, 5, 3],
[4, 0, 9, 7]
])
I am trying to write a recursive function that extracts all the possible paths up to 3 iterations and saves them into a list. I have made several attempts but have no results to post.
Desired Output:
[
[0, 3, 9, 4],
[0, 3, 9, 0],
[0, 3, 9, 9],
[0, 3, 9, 7],
[0, 3, 2, 9],
[0, 3, 2, 0],
...
]
Represented as a tree, this is what it looks like:
Since you use numeric labels for both rows and columns in your dataframe, it's faster to convert the frame to a 2-D numpy array. Try this:
arr = df.to_numpy()
staging = [[0]]
result = []
while len(staging) > 0:
    s = staging.pop(0)
    if len(s) == 4:
        result.append(s)
    else:
        i = s[-1]
        for j in range(4):
            staging.append(s + [arr[i, j]])
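Since the question asked for a recursive function, here is a hedged sketch of one. The function name extract_paths and its parameters are hypothetical, and the array is typed out directly with the same values as the DataFrame (it is what df.to_numpy() would return). It assumes, as in the question, that every path starts at row 0 and that each value is used as the next row index.

```python
import numpy as np

# Same values as the DataFrame in the question.
arr = np.array([
    [3, 2, 5, 2],
    [8, 5, 4, 2],
    [9, 0, 8, 6],
    [9, 2, 7, 1],
    [1, 9, 2, 3],
    [8, 1, 1, 6],
    [8, 8, 0, 0],
    [0, 1, 3, 0],
    [2, 4, 5, 3],
    [4, 0, 9, 7],
])


def extract_paths(arr, path, depth):
    """Return every path that extends `path` by `depth` further steps."""
    if depth == 0:
        return [path]
    paths = []
    # The last value in the path selects the row to read next.
    for value in arr[path[-1]]:
        paths.extend(extract_paths(arr, path + [int(value)], depth - 1))
    return paths


paths = extract_paths(arr, [0], 3)
print(len(paths))  # 64
print(paths[:4])   # [[0, 3, 9, 4], [0, 3, 9, 0], [0, 3, 9, 9], [0, 3, 9, 7]]
```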

Python - Reshape matrix by taking n consecutive rows every n rows

There are a bunch of questions regarding reshaping of matrices using NumPy here on Stack Overflow. I have found one that is closely related to what I am trying to achieve. However, its answer is not general enough for my application. So here we are.
I have got a matrix with millions of lines (shape m x n) that looks like this:
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[...]]
From this I would like to go to a shape of m/2 x 2n, as shown below. For that, one has to take n consecutive rows every n rows (in this example n = 2). The blocks of consecutively taken rows are then horizontally stacked onto the untouched rows. In this example that would mean:
The first two rows stay like they are.
Take row two and three and horizontally concatenate them to row zero and one.
Take row six and seven and horizontally concatenate them to row four and five. This concatenated block then becomes row two and three.
...
[[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7],
[...]]
How would I most efficiently (in terms of the least computation time possible) do that using Numpy? And would it make sense to speed the process up using Numba? Or is there not much to speed up?
Assuming your array's length is divisible by 4, here is one way you can do it using numpy.hstack after creating the correct indices for selecting the rows for the "left" and "right" parts of the resulting array:
import numpy as np
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
...,
[3997, 3997, 3997, 3997],
[3998, 3998, 3998, 3998],
[3999, 3999, 3999, 3999]])
left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[ 0, 0, 0, ..., 2, 2, 2],
[ 1, 1, 1, ..., 3, 3, 3],
[ 4, 4, 4, ..., 6, 6, 6],
...,
[3993, 3993, 3993, ..., 3995, 3995, 3995],
[3996, 3996, 3996, ..., 3998, 3998, 3998],
[3997, 3997, 3997, ..., 3999, 3999, 3999]])
Here's an application of the swapaxes answer in your link.
In [11]: x=np.array([[0, 0, 0, 0],
...: [1, 1, 1, 1],
...: [2, 2, 2, 2],
...: [3, 3, 3, 3],
...: [4, 4, 4, 4],
...: [5, 5, 5, 5],
...: [6, 6, 6, 6],
...: [7, 7, 7, 7]])
Break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged.
In [17]: x.reshape(2,2,2,4)
Out[17]:
array([[[[0, 0, 0, 0],
[1, 1, 1, 1]],
[[2, 2, 2, 2],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[5, 5, 5, 5]],
[[6, 6, 6, 6],
[7, 7, 7, 7]]]])
Swap the 2 middle dimensions, regrouping rows:
In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]:
array([[[[0, 0, 0, 0],
[2, 2, 2, 2]],
[[1, 1, 1, 1],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[6, 6, 6, 6]],
[[5, 5, 5, 5],
[7, 7, 7, 7]]]])
Then back to the target shape. This final step creates a copy of the original (the previous steps were views):
In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7]])
It's hard to generalize this, since there are different ways of rearranging blocks. For example my first try produced:
In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[4, 4, 4, 4, 6, 6, 6, 6],
[1, 1, 1, 1, 3, 3, 3, 3],
[5, 5, 5, 5, 7, 7, 7, 7]])
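The reshape/transpose steps can be wrapped for an arbitrary block size. The following is only a sketch under the assumption that the number of rows is a multiple of twice the block size; the function name stack_blocks and its block parameter are illustrative, not from the answers above.

```python
import numpy as np


def stack_blocks(x, block):
    """Pair consecutive groups of `block` rows and join each pair side by side."""
    rows, cols = x.shape
    if rows % (2 * block) != 0:
        raise ValueError("row count must be a multiple of 2 * block")
    # Axes after the reshape: (pair index, left/right half, row within block, column).
    grouped = x.reshape(-1, 2, block, cols)
    # Move the left/right axis next to the column axis, then flatten each row.
    return grouped.transpose(0, 2, 1, 3).reshape(-1, 2 * cols)


# Rebuild the 8 x 4 example: row k is filled with the value k.
x = np.repeat(np.arange(8), 4).reshape(8, 4)
result = stack_blocks(x, 2)
print(result)
```

For the 8 x 4 example this should reproduce the target layout shown in the question, and it avoids building index arrays in a Python loop.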

how to merge several arrays stored in list

I want to concatenate several arrays stored in a list. The lengths of the arrays differ. I already read this solution, but unfortunately I could not solve my problem. This is simplified input data:
arr_all= [array([[1 ,2 , 10],
[5, 8, 3]]),
array([[1, 0, 5]]),
array([[0, 1, 8]]),
array([[9, 13, 0]]),
array([[2, 10, 2],
[1.1, 3, 3]]),
array([[25, 0, 0]])]
n_data_sets=2
n_repetition=3
Now, I want to merge (concatenate) the first array of arr_all (arr_all[0]) with the fourth one (arr_all[3]), the second (arr_all[1]) with the fifth one (arr_all[4]) and the third one (arr_all[2]) with the last one (arr_all[5]). In fact, here I have two data sets (n_data_sets=2) which are repeated three times (n_repetition=3). In reality I have several data sets that are repeated tens of times. I want to put each data set in a single array of my list. Put differently, the input is ordered by repetition, but I want it organized by the data sets of each repetition. My expected result is:
arr_all= [array([[1, 2 , 10],
[5, 8, 3],
[9, 13, 0]]),
array([[1, 0, 5],
[2, 10, 2],
[1.1, 3, 3]]),
array([[0, 1, 8],
[25, 0, 0]])]
My input data was a list with six arrays (n_repetition times n_data_sets) but my result has n_repetition arrays.
In advance I appreciate any feedback.
To further Alexander's response, this is what I came up with:
import numpy as np
arr_all = [np.array([[1, 2, 10], [5, 8, 3]]),
np.array([[1, 0, 5]]),
np.array([[0, 1, 8]]),
np.array([[9, 13, 0]]),
np.array([[2, 10, 2], [1.1, 3, 3]]),
np.array([[25, 0, 0]])]
n_data_sets = 2
n_repetition = 3
new_array = []
for i in range(n_repetition):
    dataset = arr_all[i]
    for j in range(n_data_sets-1):
        dataset = np.concatenate([dataset, arr_all[i+(n_repetition*(j+1))]])
    new_array.append(dataset)
print(new_array)
I also found a cleaner method, but which is possibly worse in terms of time:
import numpy as np
arr_all = [np.array([[1, 2, 10], [5, 8, 3]]),
np.array([[1, 0, 5]]),
np.array([[0, 1, 8]]),
np.array([[9, 13, 0]]),
np.array([[2, 10, 2], [1.1, 3, 3]]),
np.array([[25, 0, 0]])]
n_data_sets = 2
n_repetition = 3
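# Store the arrays in a 1-D object array first; passing the ragged list
# straight to np.reshape raises a ValueError on recent NumPy versions.
boxed = np.empty(len(arr_all), dtype=object)
for k, a in enumerate(arr_all):
    boxed[k] = a
reshaped = boxed.reshape((n_repetition, n_data_sets), order='F')
new = []
for arr in reshaped:
    new.append(np.concatenate(list(arr)))
print(new)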
To always merge the first half with the second half (if this was your intention), you can do something like this, which will work if you have an even number of arrays:
import numpy as np
arr_all= [np.array([[1 ,2 , 10],
[5, 8, 3]]),
np.array([[1, 0, 5]]),
np.array([[0, 1, 8]]),
np.array([[9, 13, 0]]),
np.array([[2, 10, 2],
[1.1, 3, 3]]),
np.array([[25, 0, 0]])]
half = len(arr_all) // 2
new = []
for i in range(half):
    new.append(np.concatenate((arr_all[i], arr_all[i+half]), axis=0))
print(new)
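Another way to express the same grouping, sketched here under the assumption that the list is ordered repetition by repetition as in the question, is to take every n_repetition-th array with a list slice. The name merged is illustrative. This variant does not need n_data_sets and is not limited to two data sets.

```python
import numpy as np

arr_all = [np.array([[1, 2, 10], [5, 8, 3]]),
           np.array([[1, 0, 5]]),
           np.array([[0, 1, 8]]),
           np.array([[9, 13, 0]]),
           np.array([[2, 10, 2], [1.1, 3, 3]]),
           np.array([[25, 0, 0]])]
n_repetition = 3

# arr_all[i::n_repetition] picks positions i, i + n_repetition, i + 2 * n_repetition, ...
merged = [np.concatenate(arr_all[i::n_repetition]) for i in range(n_repetition)]

for block in merged:
    print(block)
```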

Remove nested lists from a list if nested list contains a certain value

I have a nested list and I would like to remove the empty list [] and any nested list that has a value of -1 in it. Here is what I have so far; it was working earlier, but I think Jupyter was being buggy.
regions = [[], [2, -1, 1], [4, -1, 1, 3], [5, 0, -1, 4], [9, 10, 7, 8],
[7, 6, 10], [8, 0, 5, 6, 7], [9, 2, 1, 3, 10],
[9, 2, -1, 0, 8], [10, 3, 4, 5, 6]]
counter = range(len(regions))
for region in counter:
    print(region)
    for i in range(len(regions[region])): # IndexError: list index out of range
        #print(regions[region])
        if regions[region][i] == -1:
            regions.remove(regions[region])
            break
print(regions)
I think the issue is when I am removing a region from the regions list, the counter for the regions nested list is modified and that makes me run out of index values before I finish iterating through each nested list.
Also does my approach even make sense or is there a more native solution that I am overlooking (such as some sort of list comprehension, lambda, filter, etc)?
You can simply use this list comprehension:
regions = [i for i in regions if i and (-1 not in i)]
Output :
[[9, 10, 7, 8], [7, 6, 10], [8, 0, 5, 6, 7], [9, 2, 1, 3, 10], [10, 3, 4, 5, 6]]
You can also use:
regions = list(filter(lambda r: r and (r.count(-1)==0),regions))
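The original loop fails because counter is computed from the initial length of the list, while regions.remove shrinks the list during iteration, so later indices no longer exist. A small sketch of filtering without index bookkeeping follows; the helper name keep is hypothetical. Slice assignment is used so that the same list object is updated, which matters only if other code holds a reference to it.

```python
regions = [[], [2, -1, 1], [4, -1, 1, 3], [5, 0, -1, 4], [9, 10, 7, 8],
           [7, 6, 10], [8, 0, 5, 6, 7], [9, 2, 1, 3, 10],
           [9, 2, -1, 0, 8], [10, 3, 4, 5, 6]]


def keep(region):
    """A region survives if it is non-empty and does not contain -1."""
    return bool(region) and -1 not in region


# Build the filtered list first, then replace the contents of the same list object.
regions[:] = [region for region in regions if keep(region)]
print(regions)
```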

Python loops or any iterations to find all the combinations such that a condition is satisfied

I want to find all possible combinations of n numbers whose sum equals 100 in Python.
An example with 2 numbers:
import itertools
x=[]
for i, j in itertools.product(range(0,101), range(0,101)):
    if i+j==100:
        x.append([i,j])
Is there an alternative, clever way to do this with a variable number of loop variables and get the outcome in this form:
n=5:
[[10,10,10,30,40], [100,0,0,0,0], [1,1,2,3,97] .......]
A Pure Python Solution (i.e. without itertools.product)
The main difficulty here is executing a variable number of for-loops inside a function. The way we can get around this easily is using recursion which involves a function calling itself.
If we use recursion, then inside any instance of the function, only one for-loop is actually iterated through. So to apply this to the problem at hand, we want our function to take two parameters: the target (the number we are trying to sum to) and n (the number of non-negative integers we have available to use).
Each function will then return (given a target and n numbers), all the combinations that will make that target - in the form of a two-dimensional list.
The only special case that we must consider is the "leaf nodes" of our recursive tree (the cases where we have a certain target, but n == 1, so we only have one number to make the target with). This is easy to handle, we just need to remember that we should always return all combinations that make the target so in this case, there is only one "combination" which is the target.
Then (if n > 1) the rest is self-explanatory: we simply loop through every number from 0 up to and including target and extend a list of combinations (cs) with the results of calling the function again.
However, before we concatenate these combos onto our list, we need to use a comprehension to add i (the next number) to the start of every combination.
And that's it! Hopefully you can see how the above translates into the following code:
def combos(target, n):
    if n == 1:
        return [[target]]
    cs = []
    for i in range(0, target+1):
        cs += [[i]+c for c in combos(target-i, n-1)]
    return cs
and a test (with target as 10 and n as 3 to make it clearer) shows it works:
>>> combos(10, 3)
[[0, 0, 10], [0, 1, 9], [0, 2, 8], [0, 3, 7], [0, 4, 6], [0, 5, 5], [0, 6, 4], [0, 7, 3], [0, 8, 2], [0, 9, 1], [0, 10, 0], [1, 0, 9], [1, 1, 8], [1, 2, 7], [1, 3, 6], [1, 4, 5], [1, 5, 4], [1, 6, 3], [1, 7, 2], [1, 8, 1], [1, 9, 0], [2, 0, 8], [2, 1, 7], [2, 2, 6], [2, 3, 5], [2, 4, 4], [2, 5, 3], [2, 6, 2], [2, 7, 1], [2, 8, 0], [3, 0, 7], [3, 1, 6], [3, 2, 5], [3, 3, 4], [3, 4, 3], [3, 5, 2], [3, 6, 1], [3, 7, 0], [4, 0, 6], [4, 1, 5], [4, 2, 4], [4, 3, 3], [4, 4, 2], [4, 5, 1], [4, 6, 0], [5, 0, 5], [5, 1, 4], [5, 2, 3], [5, 3, 2], [5, 4, 1], [5, 5, 0], [6, 0, 4], [6, 1, 3], [6, 2, 2], [6, 3, 1], [6, 4, 0], [7, 0, 3], [7, 1, 2], [7, 2, 1], [7, 3, 0], [8, 0, 2], [8, 1, 1], [8, 2, 0], [9, 0, 1], [9, 1, 0], [10, 0, 0]]
Improving performance
Consider the case where we are trying to make 10 with 4 numbers. At one point, the function will be called with a target of 6 after, say, 1 and 3 have been chosen. The algorithm will do as we have already explained and return the combinations that make 6 using 2 numbers. Now consider another case further down the line, when the function is again asked for the combinations that make 6, this time having been called after, say, 2 and 2. Even though we will get the right answer (through our recursion and the for-loop), we will compute the same combinations as before, when we were called after 1 and 3. Furthermore, this scenario will happen extremely often: the function will be called from different situations but asked to give combinations that have already been calculated.
This gives way to a great optimisation technique called memoization which essentially just means storing the results of our function as a key: value pair in a dictionary (mem) before returning.
We then just check at the start of every function call if we have ever been called before with the same parameters (by seeing if the key is in the dictionary) and if it is, then we can just return the result we got last time.
This speeds up the algorithm dramatically.
mem = {}
def combos(target, n):
    k = (target, n)
    if k in mem:
        return mem[k]
    if n == 1:
        return [[target]]
    cs = []
    for i in range(0, target+1):
        cs += [[i]+c for c in combos(target-i, n-1)]
    mem[k] = cs
    return cs
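The same memoization idea can be sketched with functools.lru_cache from the standard library instead of a hand-written dictionary. This is an alternative formulation, not the code from the answer above: it returns tuples rather than lists, on the assumption that cached results are shared between calls and should therefore not be mutable.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def combos(target, n):
    """All n-tuples of non-negative integers that sum to target."""
    if n == 1:
        return ((target,),)
    # Pick the first number i, then prepend it to every way of making the rest.
    return tuple((i,) + rest
                 for i in range(target + 1)
                 for rest in combos(target - i, n - 1))


result = combos(10, 3)
print(len(result))     # 66
print(comb(12, 2))     # 66, the stars-and-bars count C(target + n - 1, n - 1)
```

The count matters in practice: for target 100 and n = 5 the result has C(104, 4) = 4,598,126 entries, so the output itself is large regardless of how it is generated.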
itertools.product takes a repeat argument, which you can use to repeat the range(1, 101) iterator repeat number of times. This way you don't need to specify the iterator multiple times or generate the desired number of arguments. For example, for 5 times:
[i for i in itertools.product(range(1, 101), repeat=5) if sum(i) == 100]
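To see the repeat argument at work on a scale that finishes quickly, here is a small sketch with a target of 10 and 3 numbers. It assumes zeros are allowed, as in the question's examples, so it uses range(target + 1) rather than range(1, 101); the names target, n and matches are illustrative.

```python
import itertools

target = 10
n = 3

# Each position may hold any value from 0 to target, so product()
# visits (target + 1) ** n tuples before the sum filter is applied.
matches = [c for c in itertools.product(range(target + 1), repeat=n)
           if sum(c) == target]

print(len(matches))  # 66
```

Because the number of tuples visited grows as (target + 1) ** n, this brute-force filter becomes very slow for target 100 and n = 5 (about 10^10 tuples).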
This is a trivial generalisation / partial optimisation of your algorithm.
Edit: @heemayl's alternative, which uses repeat, is preferable to this solution.
import itertools
n = 3
x = []
x = [list(i) for i in itertools.product(*(range(0,101) \
     for _ in range(n))) if sum(i) == 100]
