Python-1D indice link with 2D array location - python

Introduction
Sometimes I want to get the value of a 2-D array at a random location.
For example, given an array data of shape (20,20) and a random number pair (5,5), I take data[5,5] as my target value.
I need this for a genetic algorithm, where I want to sample values from a 2-D array as individuals. So I want to generate a lookup table that links a 1-D index to a 2-D position.
My attempt
import numpy as np
import pandas as pd

## data is the 2-D array of shape 20x20
data = np.random.randint(0,1000,400)
data = data.reshape(20,20)
## direction is my lookup table
direction = {"Indice":[],"X":[],"Y":[]}
k = 0
for i in range(0,data.shape[0],1):
    for j in range(0,data.shape[1],1):
        k+=1
        direction["Indice"].append(k)
        direction["X"].append(j)
        direction["Y"].append(i)
direction = pd.DataFrame(direction)
## generate a random int and look up the corresponding 2-D value
loc = np.random.randint(0,400)
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
target_value = data[YY,XX]
My question
Is there a neater way to achieve this?
Any advice would be appreciated!

You could use np.ravel to make data 1-dimensional, then index it using the flat index loc:
target_value = data.ravel()[loc-1]
Or, if you want XX and YY, perhaps you are looking for np.unravel_index. It maps a flat index or an array of flat indices to a tuple of coordinates.
For example, instead of building the direction DataFrame, you could use
np.unravel_index(loc-1, data.shape)
instead of
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
Then you could define target_value as :
target_value = data[np.unravel_index(loc-1, data.shape)]
Alternatively, to simply get a random value from the 2D array data, you could use
target_value = np.random.choice(data.flat)
Or to get N random values, use
target_values = np.random.choice(data.flat, size=(N,))
Why the minus one in loc-1:
In your original code, the direction['Indice'] column uses k values which
start at 1, not 0. So when loc equals 1, the 0th-indexed row of direction is
selected. I used loc-1 to make
target_value = data[np.unravel_index(loc-1, data.shape)]
return the same result that
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
target_value = data[YY,XX]
returns. Note however, that if loc equals 0, then np.unravel_index(-1, data.shape) raises a ValueError, while your original code would return an empty array for target_value.
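To make the index mapping concrete, here is a minimal sketch that checks that np.unravel_index and ravel pick the same element, and that np.ravel_multi_index maps the coordinates back to the flat index. The seed and the value of loc are arbitrary choices for illustration, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)              # arbitrary seed, for repeatability only
data = rng.integers(0, 1000, size=(20, 20))

loc = 137                                   # 1-based index, like the Indice column
row, col = np.unravel_index(loc - 1, data.shape)

via_unravel = data[row, col]                # index with 2-D coordinates
via_ravel = data.ravel()[loc - 1]           # index the flattened view directly

# Inverse mapping: 2-D coordinates back to the 0-based flat index
flat_back = np.ravel_multi_index((row, col), data.shape)
```

Both lookups rely on NumPy's default row-major (C) order, which matches the order in which the nested loops in the question fill the direction table.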

Related

Delete 2D unique elements in a 2D NumPy array

I generate a set of unique coordinate combinations by using:
axis_1 = np.arange(image.shape[0])
axis_1 = np.reshape(axis_1,(axis_1.shape[0],1))
axis_2 = np.arange(image.shape[1])
axis_2 = np.reshape(axis_2,(axis_2.shape[0],1))
coordinates = np.array(np.meshgrid(axis_1, axis_2)).T.reshape(-1,2)
I then check for some condition, and if it is satisfied I want to delete those coordinates from the array.
Something like this:
if image[coordinates[i,0], coordinates[i,1]] != 0:
    # pseudocode: remove row i from coordinates
I tried the remove and delete commands, but one doesn't work for arrays and the other removes every instance where coordinates[i,0] and coordinates[i,1] appear, rather than the unique combination of both.
You can use np.where to generate the coordinate pairs that should be removed, and np.unique combined with masking to remove them (the condition image > 0.7 is only an example; substitute your own, such as image != 0):
y, x = np.where(image > 0.7)
yx = np.column_stack((y, x))
combo = np.vstack((coordinates, yx))
unique, counts = np.unique(combo, axis=0, return_counts=True)
clean_coords = unique[counts == 1]
The idea here is to stack the original coordinates and the coordinates-to-be-removed in the same array, then drop the ones that occur in both.
You can use the numpy.delete function, but it returns a new, modified array and does not modify the array in place (which would be quite problematic, especially in a for loop).
So your code would look like this:
nb_rows_deleted = 0
for i in range(0, coordinates.shape[0]):
    corrected_i = i - nb_rows_deleted
    if image[coordinates[corrected_i, 0], coordinates[corrected_i, 1]] != 0:
        coordinates = np.delete(coordinates, corrected_i, 0)
        nb_rows_deleted += 1
The corrected_i accounts for the rows that have already been deleted during the loop.
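As an alternative to both answers, a boolean mask can drop the unwanted rows in one step without a loop. This is a minimal sketch, not taken from the thread: the small image array is made up for illustration, and np.indices is used here as an assumed shortcut for building the same (row, column) pairs.

```python
import numpy as np

# Made-up 2x3 image for illustration
image = np.array([[0, 2, 0],
                  [1, 0, 0]])

# All (row, column) pairs of the image, one pair per row
rows, cols = np.indices(image.shape)
coordinates = np.column_stack((rows.ravel(), cols.ravel()))

# True where the pixel at that coordinate is zero, i.e. the pair should be kept
keep = image[coordinates[:, 0], coordinates[:, 1]] == 0
clean_coords = coordinates[keep]
```

Because the mask is computed per row of coordinates, each pair is tested as a unit, which avoids the problem of removing every occurrence of a single value.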

Iteratively inserting column and rows to 2d numpy

The following 2-D NumPy array is given.
prv = np.array([[1,2],[3,4],[5,6]])
I want to go through the rows of prv iteratively, perform some calculation, and insert the results as a new column in a new array. The values from the original array need to be kept alongside the results they produced, so the size of the new array depends on what the function returns.
For example, performing the following operation:
for i in prv:
    new_line = calc_something(i)
The results of calc_something(i) may be something like this:
[[1,2,7],
[1,2,6],
[1,2,3],
[3,4,3],
[5,6,9],
[5,6,7]]
That is, calc_something(i) can return several values for one row. Each value is stored in the last column, while the first and second columns repeat the values of the original row that produced it.
For example, a test version of calc_something(i) could be:
def calc_something(i):
    if i[0] == 1:
        b = np.array([7,6,3])
        return b
    if i[0] == 3:
        b = np.array([3])
        return b
    if i[0] == 5:
        b = np.array([9,7])
        return b
1. Run calc_something for all rows and store the results in a list.
2. Use numpy.hstack to combine the original array and the results.
# Example usage
# Note: the reshape below assumes calc_something returns exactly one value per row.
import numpy as np
prv = np.zeros((10, 2)) # original array
results = []
for x in prv:
    results.append(calc_something(x))
results = np.reshape(np.array(results), (prv.shape[0], 1))
combined = np.hstack((prv, results)) # combine the two arrays
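When calc_something returns a different number of values per row, as the test function in the question does, the results cannot be reshaped into a single column of prv.shape[0] entries. One possible approach, sketched here under that assumption, is to repeat each original row once per returned value with np.repeat and then attach the concatenated results. The final fallback return in calc_something is an addition for safety and is not in the question.

```python
import numpy as np

def calc_something(row):
    # Test function from the question
    if row[0] == 1:
        return np.array([7, 6, 3])
    if row[0] == 3:
        return np.array([3])
    if row[0] == 5:
        return np.array([9, 7])
    return np.array([], dtype=int)  # assumed fallback: no results for other rows

prv = np.array([[1, 2], [3, 4], [5, 6]])

results = [calc_something(row) for row in prv]
counts = [len(r) for r in results]            # how many output rows each input row needs

repeated = np.repeat(prv, counts, axis=0)     # duplicate each input row counts[i] times
combined = np.column_stack((repeated, np.concatenate(results)))
```

With the test data this reproduces the six-row array shown in the question.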

How to extract rows from a list of lists?

I have a list of lists, and I'm trying to extract rows from it and plot them over a common x-variable. So I'm trying to extract one row at a time using a loop:
for i in range(10):
    tlist = list(zip(*v_avg_store))
    tlist[0]
    print(tlist)
    x = np.array(steps_store)
    y = np.array(tlist)
    plt.plot(x,y)
v_avg_store = [100,23,23,45,12,122], [2,1232,123,43,545,645], [234,23,43,556,33,45]
I want to extract each set of data and plot,
ex: 100,23,23,45,12,122 (y-axis) vs index (x-axis)
for each set on the same plot.
This gives me the error:
    x = np.array(steps_store)
    y = np.array(tlist)
    plt.plot(x,y)
    plt.xlabel("steps")
/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_base.py in _xy_from_xy(self, x, y)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have same first dimension, but "
                         "have shapes {} and {}".format(x.shape, y.shape))
    if x.ndim > 2 or y.ndim > 2:
        raise ValueError("x and y can be no greater than 2-D, but have "
ValueError: x and y must have same first dimension, but have shapes (99990,) and (50000, 10)
How should I address this issue? Is there a way to extract row by row from the list of lists and plot them all on one plot?
What's the value of steps_store? Also, I'm assuming tlist is your 2-dimensional list of lists.
A 2D list example being:
x = [[1,2,3] , [4,5,6] , [7,8,9] ,[10 , 11 ,12]] # list of lists
The rows are the lists inside the outer list, so x[0] is the row [1, 2, 3]. The columns correspond to the values inside each row, so x[0][0] is row 0, column 0, which is 1.
You already extracted a row (assuming tlist is a list of lists) when you called tlist[0]. However, you did not assign the extracted list to a variable, and instead passed the entire tlist to y.
Your error means that x = np.array(steps_store) and y = np.array(tlist) don't have the same first dimension (one might be a 1D list, the other a 2D list). It is hard to say more without knowing the exact values of tlist and steps_store.
Make sure your list dimensions agree, then extract the row with:
rowList = tlist[0]
and pass it into a numpy array:
x = np.array(rowList)
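To plot each row against its own index on a single set of axes, one option is to loop over the rows and build an x array of matching length for each. This is a minimal sketch using the v_avg_store values from the question; the Agg backend and the legend labels are assumptions added so the example runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumed non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt

v_avg_store = [[100, 23, 23, 45, 12, 122],
               [2, 1232, 123, 43, 545, 645],
               [234, 23, 43, 556, 33, 45]]

fig, ax = plt.subplots()
for i, row in enumerate(v_avg_store):
    y = np.array(row)
    x = np.arange(len(y))          # x has the same length as y, so the shapes agree
    ax.plot(x, y, label=f"row {i}")
ax.set_xlabel("index")
ax.legend()
```

Building x from len(y) guarantees that the first dimensions match, which is what the ValueError in the question complains about.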

Is there a way to apply numpy.extract to a 3d-array?

I have the following:
X = np.ndarray (324,349,24)
y = np.ndarray (324,349)
I would like to create a dictionary to house conditional extraction, to wit:
myDict = {'keyA':cond,'keyB':cond,'keyC':cond,'keyD':cond}
Each key in myDict.keys() is represented numerically within y. What I would like to do is apply a mask and extract only those indices within X that correspond to the set mask.
For example,
condA = y==0
...
condD = y==3
How would I go about applying those condition on X? I was thinking something along these lines:
for k in range(1, X.shape[2]):
    myDict['keyA'] = np.extract(condA,k)
    myDict['keyB'] = np.extract(condB,k)
    myDict['keyC'] = np.extract(condC,k)
    myDict['keyD'] = np.extract(condD,k)
However, I get the error:
IndexError: index is out of bounds for size
Expected Output:
A dictionary:
myDict{'keyA':ndarray [n,n,24],'keyB':ndarray[n,n,24],'keyC':ndarray[n,n,24],'keyD':ndarray[n,n,24]}
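The NumPy function where may be useful here. Note that it is a module-level function (np.where), not a method of the array, so X.where(condA) would raise an AttributeError. Indexing X with its result selects the matching positions and returns an array of shape (n, 24), one row per position where the condition holds:
myDict['keyA'] = X[np.where(condA)]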
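Boolean-mask indexing is one way to build the whole dictionary at once. In this minimal sketch the arrays are shrunk to (2, 3, 4) and (2, 3) for readability, and the key-to-value mapping in labels is an assumption: the question only shows condA = y==0 and condD = y==3.

```python
import numpy as np

# Small stand-ins for the (324, 349, 24) and (324, 349) arrays in the question
X = np.arange(2 * 3 * 4).reshape(2, 3, 4)
y = np.array([[0, 1, 0],
              [3, 0, 1]])

# Assumed mapping from dictionary key to the value it has in y
labels = {"keyA": 0, "keyB": 1, "keyC": 2, "keyD": 3}

# A 2-D boolean mask indexes the first two axes of X and keeps the last axis whole
myDict = {key: X[y == value] for key, value in labels.items()}
```

Each entry has shape (n, X.shape[2]), where n is the number of positions in y equal to that key's value. The original two spatial axes are flattened, because the matching positions generally do not form a rectangular block.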

pandas groupby + transform gives shape mismatch

I have a Pandas Dataframe with one column for the index of the row in the group. I now want to determine whether that row is in the beginning, middle, or end of the group based on this index. I wanted to apply a UDF that returns start (0) middle (1) or end(2) as output, and I want to save that output per row in a new column. Here is my UDF:
def add_position_within_group(self, group):
    length_of_group = group.max()
    three_lists = self.split_lists_into_three_parts([x for x in range(length_of_group)])
    result_list = []
    for x in group:
        if int(x) in three_lists[0]:
            result_list.append(0)
        elif int(x) in three_lists[1]:
            result_list.append(1)
        elif int(x) in three_lists[2]:
            result_list.append(2)
    return result_list
Here is the split_lists_into_three_parts method (tried and tested):
def split_lists_into_three_parts(self, event_list):
    k, m = divmod(len(event_list), 3)
    total_list = [event_list[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(3)]
    start_of_list = total_list[0]
    middle_of_list = total_list[1]
    end_of_list = total_list[2]
    return [start_of_list,middle_of_list,end_of_list]
Here is the line of code that groups the DataFrame and runs transform(). According to what I have read, when transform() is called on a groupby it iterates over all the groups, passes each group's column to my UDF as a Series, and expects back a one-dimensional list or Series of the same size as the group:
compound_data_frame["position_in_sequence"] = compound_data_frame.groupby('patient_id')["group_index"].transform(self.add_position_within_group)
I'm getting the following error :
shape mismatch: value array of shape (79201,) could not be broadcast to indexing result of shape (79202,)
I still can't figure out what kind of output my function has to have when passed to transform, or why I'm getting this error. Any help would be much appreciated.
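Well, I'm embarrassed to say this, but here goes: to create the three lists of indices I used range(group.max()), which produces a range one shorter than the group size. The row whose index equals group.max() therefore matched none of the three lists, nothing was appended for it, and the returned list was shorter than the group. What I should have done is either use the group size or add 1 to group.max().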
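A minimal sketch of the corrected UDF, written as plain functions rather than methods so it runs on its own. The sample DataFrame, the function name split_into_three_parts, and the use of Series.map with a lookup dictionary are illustrative assumptions, not the original poster's code.

```python
import pandas as pd

def split_into_three_parts(items):
    # Same slicing logic as the question's helper: three nearly equal chunks
    k, m = divmod(len(items), 3)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(3)]

def add_position_within_group(group):
    # group.max() + 1 so that the last index in the group is covered too
    parts = split_into_three_parts(list(range(group.max() + 1)))
    lookup = {idx: pos for pos, part in enumerate(parts) for idx in part}
    # Returning a Series with the group's own index keeps the length aligned
    return group.map(lookup)

# Made-up data: two patients with 4 and 3 rows
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 2, 2, 2],
    "group_index": [0, 1, 2, 3, 0, 1, 2],
})
df["position_in_sequence"] = (
    df.groupby("patient_id")["group_index"].transform(add_position_within_group)
)
```

Returning a value for every element of the group, whether as a list or a Series, is what lets transform() write the result back without a shape mismatch.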
