I have the following 2-D NumPy array:
prv = np.array([[1,2],[3,4],[5,6]])
I want to iterate over the rows of prv, perform some calculation on each one, and insert the results into a new column of a new array. Each result must be stored together with the values of the row it came from, so the size of the new array depends on the results returned by the function.
For example, performing the following operation:
for i in prv:
    new_line = calc_something(i)
The results of calc_something(i) may be something like this:
[[1,2,7],
[1,2,6],
[1,2,3],
[3,4,3],
[5,6,9],
[5,6,7]]
That is, calc_something(i) can return a list of values for each row. Each value is stored in the last column, while the first and second columns repeat the corresponding row of the original array.
For example, test data for calc_something(i) could be:
import numpy as np

def calc_something(i):
    if i[0] == 1:
        b = np.array([7, 6, 3])
        return b
    if i[0] == 3:
        b = np.array([3])
        return b
    if i[0] == 5:
        b = np.array([9, 7])
        return b
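Run calc_something for all rows and store the results in a list.
Because each row can produce a different number of values, repeat each original row once per value with np.repeat, then use numpy.hstack to append the concatenated results as the last column.
# Example usage
import numpy as np
prv = np.array([[1, 2], [3, 4], [5, 6]])  # original array
results = [calc_something(x) for x in prv]  # one 1-D array per row
counts = [len(r) for r in results]  # how many values each row produced
repeated = np.repeat(prv, counts, axis=0)  # row i repeated counts[i] times
combined = np.hstack((repeated, np.concatenate(results)[:, None]))  # combine the two arrays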
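A plain-Python alternative is to build each output row with a nested list comprehension and convert to an array once at the end. This is a sketch that re-declares the question's test calc_something so it runs on its own, and it assumes the per-row results are small enough that building an intermediate Python list is acceptable.

```python
import numpy as np

def calc_something(i):
    # Test data from the question
    if i[0] == 1:
        return np.array([7, 6, 3])
    if i[0] == 3:
        return np.array([3])
    if i[0] == 5:
        return np.array([9, 7])

prv = np.array([[1, 2], [3, 4], [5, 6]])

# One output row per result value: the original row followed by that value
combined = np.array([[*row, value] for row in prv for value in calc_something(row)])
print(combined)
```

For large inputs the np.repeat approach avoids the Python-level row construction, but the comprehension makes the "one row per result" rule explicit.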
I want to walk over a 10x10 array in 3x3 blocks. For example, a 3x3 block contains the nine values at indexes [0][0:3], [1][0:3], [2][0:3]; I want to find the max of these nine values and store it in a new array. Here is what I tried:
array_33 = []
new_list = []
for i in range(10):
    for j in range(10):
        array_33.append([i:i+3])
        max_value = max(map(max, array_33))  # to find the max value in the 3x3 array
        new_list.append(max_value)
I managed to get the values up to index [0:3] for one row, but I couldn't find a way to do the same for the next rows. The slice [8:10] does not divide evenly into blocks of 3, so those values should be added to the new array as they are. Then I want to repeat the same [0:3] search for rows 4 to 6, and so on. I don't know how I can do this.
You can use NumPy 2-D slicing:
import numpy as np
a = np.array([[(i+1)%10 for i in range(10)]]*10)
print(a)
sz = 3
b = [[np.max(a[i:i+sz, j:j+sz])
      for j in range(0, a.shape[1], sz)]
     for i in range(0, a.shape[0], sz)]
print(b)
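Generally, a[row1:row2, col1:col2] gives you the submatrix spanning those indexes (excluding the end index of each range). When the shape is not a multiple of sz, the last slices are simply shorter, so the leftover values form smaller edge blocks.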
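If you want to avoid the Python-level loops, here is a sketch that swaps in a different technique, np.maximum.reduceat: it reduces groups of sz rows, then groups of sz columns. Like the slicing answer, the last group along an axis is simply shorter when the size is not a multiple of sz. The helper name block_max is hypothetical.

```python
import numpy as np

def block_max(a, sz):
    """Max of each sz x sz block; edge blocks are smaller when the shape is not a multiple of sz."""
    row_starts = np.arange(0, a.shape[0], sz)  # first row of each block
    col_starts = np.arange(0, a.shape[1], sz)  # first column of each block
    # reduceat takes the max over a[start_k:start_{k+1}] (and a[start_last:] for the last group)
    row_max = np.maximum.reduceat(a, row_starts, axis=0)
    return np.maximum.reduceat(row_max, col_starts, axis=1)

a = np.array([[(i + 1) % 10 for i in range(10)]] * 10)
print(block_max(a, 3))
```

Each row of a is 1..9 followed by 0, so every row of the result is [3, 6, 9, 0], the same values the list comprehension produces.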
I have an array of 10 elements:
a = [1,2,3,4,5,6,7,8,9,10]
and I want to do the following operation:
k = [a[1]-a[0], a[3]-a[2], a[5]-a[4], a[7]-a[6], a[9]-a[8]]
I want to extend this operation to arrays of any size.
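For a list with an even number of elements, you could build on the following:
a = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
m = []
for i in range(1, len(a), 2):  # step 2 so each pair is used once
    m.append(a[i] - a[i-1])  # difference within the pair (a[i-1], a[i])
print(m)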
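With NumPy, the same pairwise differences can be taken with strided slices: a[1::2] holds the second element of each pair and a[0::2] the first. This sketch assumes that for an odd-length input the trailing unpaired element should be ignored; pairwise_diff is a hypothetical helper name.

```python
import numpy as np

def pairwise_diff(a):
    a = np.asarray(a)
    n = len(a) - len(a) % 2  # drop a trailing unpaired element, if any
    return a[1:n:2] - a[0:n:2]  # second minus first element of each pair

k = pairwise_diff([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(k)  # [1 1 1 1 1]
```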
I have a pandas DataFrame with a column holding each row's index within its group. Based on this index, I now want to determine whether the row is at the beginning, middle, or end of its group. I wanted to apply a UDF that returns start (0), middle (1), or end (2), and save that output per row in a new column. Here is my UDF:
def add_position_within_group(group):
    length_of_group = group.max()
    three_lists = self.split_lists_into_three_parts([x for x in range(length_of_group)])
    result_list = []
    for x in group:
        if int(x) in three_lists[0]:
            result_list.append(0)
        elif int(x) in three_lists[1]:
            result_list.append(1)
        elif int(x) in three_lists[2]:
            result_list.append(2)
    return result_list
Here is the split_lists_into_three_parts method (tried and tested):
def split_lists_into_three_parts(self, event_list):
    k, m = divmod(len(event_list), 3)
    total_list = [event_list[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(3)]
    start_of_list = total_list[0]
    middle_of_list = total_list[1]
    end_of_list = total_list[2]
    return [start_of_list, middle_of_list, end_of_list]
Here is the line of code that groups the DataFrame and calls transform(). From what I have read, transform() on a groupby iterates over all the groups, passes the selected column of each group to my UDF as a Series, and expects back a one-dimensional list or Series of the same length as the group:
compound_data_frame["position_in_sequence"] = compound_data_frame.groupby('patient_id')["group_index"].transform(self.add_position_within_group)
I'm getting the following error:
shape mismatch: value array of shape (79201,) could not be broadcast to indexing result of shape (79202,)
I still can't figure out what kind of output my function has to have when passed to transform, or why I'm getting this error. Any help would be much appreciated.
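Well, I'm embarrassed to say this, but here goes: to create the three lists of indices I used range(group.max()), which produces a range of only group size - 1 elements. The row holding the largest index then matches none of the three lists, so the returned list is shorter than the group. What I should have done is either use the group size or add 1 to group.max().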
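A minimal standalone sketch of the corrected UDF, assuming group_index is 0-based and consecutive within each group. It is written as plain functions (no self) with hypothetical names, reuses the question's splitting arithmetic, builds the range from len(group), and raises instead of silently returning a shorter list if an index falls outside the expected range.

```python
import pandas as pd

def split_into_three_parts(items):
    # Same index arithmetic as split_lists_into_three_parts in the question
    k, m = divmod(len(items), 3)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(3)]

def position_within_group(group):
    # Use the group size (not group.max()) so indices 0..len(group)-1 are all covered
    parts = [set(p) for p in split_into_three_parts(list(range(len(group))))]
    labels = []
    for x in group:
        for label, part in enumerate(parts):
            if int(x) in part:
                labels.append(label)
                break
        else:
            # Fail loudly instead of producing an output shorter than the group
            raise ValueError(f"index {x} is outside 0..{len(group) - 1}")
    # A Series aligned with the group's index has exactly one value per row
    return pd.Series(labels, index=group.index)

df = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "group_index": [0, 1, 2, 3, 4, 5, 0, 1, 2],
})
df["position_in_sequence"] = df.groupby("patient_id")["group_index"].transform(position_within_group)
print(df["position_in_sequence"].tolist())  # [0, 0, 1, 1, 2, 2, 0, 1, 2]
```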
I have a pretty simple operation involving two not-so-large arrays:
For every element in the first (larger) array, located in position i
Find if it exists in the second (smaller) array
If it does, find its index in the second array: j
Take the float at position i of a third array (same length as the first array) and store it at position j of a fourth array (same length as the second array)
The for block below works, but becomes very slow even for moderately sized arrays (>10000 elements).
Can this implementation be made faster?
import numpy as np
import random
##############################################
# Generate some random data.
# 'Nb' is always smaller than 'Na'
Na, Nb = 50000, 40000
# List of IDs (could be any string, I use integers here for simplicity)
ids_a = random.sample(range(1, Na * 10), Na)
ids_a = [str(_) for _ in ids_a]
random.shuffle(ids_a)
# Some floats associated to these IDs
vals_in_a = np.random.uniform(0., 1., Na)
# Smaller list of IDs sampled from 'ids_a'
ids_b = random.sample(ids_a, Nb)
# Array to be filled
vals_in_b = np.zeros(Nb)
##############################################
# This block needs to be *a lot* more efficient
#
# For each string in 'ids_a'
for i, id_a in enumerate(ids_a):
    # if it exists in 'ids_b'
    if id_a in ids_b:
        # find where in 'ids_b' this element is located
        j = ids_b.index(id_a)
        # store in that position the value taken from 'vals_in_a'
        vals_in_b[j] = vals_in_a[i]
In defense of my approach, here is the authoritative implementation:
import itertools as it
from timeit import timeit

def pp():
    la, lb = len(ids_a), len(ids_b)
    ids = np.fromiter(it.chain(ids_a, ids_b), '<S6', la + lb)
    unq, inv = np.unique(ids, return_inverse=True)
    vals = np.empty(la, vals_in_a.dtype)
    vals[inv[:la]] = vals_in_a
    return vals[inv[la:]]
(juanpa()==pp()).all()
# True
timeit(juanpa,number=100)
# 3.1373191522434354
timeit(pp,number=100)
# 2.5256317732855678
That said, juanpa.arrivillaga's suggestion can also be implemented better:
import operator as op
def ja():
    return op.itemgetter(*ids_b)(dict(zip(ids_a, vals_in_a)))
(ja()==pp()).all()
# True
timeit(ja,number=100)
# 2.015202699229121
I tried the approaches by juanpa.arrivillaga and Paul Panzer. The first one is the fastest by far. It is also the simplest. The second one is faster than my original approach, but considerably slower than the first one. It also has the drawback that the line vals[inv_a] = vals_in_a stores floats into a U5 array, thus converting them into strings. They can be converted back to floats at the end, but I lose digits (unless I'm missing something obvious, of course).
Here are the implementations:
def juanpa():
    dict_ids_b = {_: i for i, _ in enumerate(ids_b)}
    for i, id_a in enumerate(ids_a):
        try:
            vals_in_b[dict_ids_b[id_a]] = vals_in_a[i]
        except KeyError:
            pass
    return vals_in_b

def Paul():
    # 1) concatenate ids_a and ids_b
    ids_ab = ids_a + ids_b
    # 2) apply np.unique with keyword return_inverse=True
    vals, idxs = np.unique(ids_ab, return_inverse=True)
    # 3) split the inverse into inv_a and inv_b
    inv_a, inv_b = idxs[:len(ids_a)], idxs[len(ids_a):]
    # 4) map the values to match the order of uniques: vals[inv_a] = vals_in_a
    vals[inv_a] = vals_in_a
    # 5) use inv_b to pick the correct values: result = vals[inv_b]
    vals_in_b = vals[inv_b].astype(float)
    return vals_in_b
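Another option, different from both approaches above, is to sort ids_a once and look up every ID of ids_b with np.searchsorted. This is a sketch assuming the IDs in ids_a are unique (as random.sample guarantees for the generated data); fill_by_searchsorted is a hypothetical name. IDs missing from ids_a keep 0.0, matching the original loop's np.zeros initialisation.

```python
import numpy as np

def fill_by_searchsorted(ids_a, vals_in_a, ids_b):
    a = np.asarray(ids_a)
    b = np.asarray(ids_b)
    order = np.argsort(a)  # permutation that sorts ids_a
    # Position where each ID of ids_b would be inserted into the sorted ids_a
    pos = np.searchsorted(a, b, sorter=order)
    pos = np.clip(pos, 0, len(a) - 1)  # keep insertion points past the end indexable
    idx = order[pos]  # candidate index into ids_a for each ID of ids_b
    found = a[idx] == b  # True only where the ID really exists in ids_a
    out = np.zeros(len(b))  # IDs not found stay 0.0
    out[found] = np.asarray(vals_in_a)[idx[found]]
    return out

out = fill_by_searchsorted(['10', '3', '7'], [0.1, 0.2, 0.3], ['7', '10', '99'])
print(out)
```

The sort costs O(Na log Na) and each lookup O(log Na), instead of the O(Nb) list scans that `id_a in ids_b` and `ids_b.index(id_a)` perform for every element.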
Introduction
Sometimes I want to get the value of a 2-D array at a random location.
For example, data is an array with shape (20,20) and (5,5) is a random number pair; then data[5,5] is my target value.
For use in a genetic algorithm, I want to draw samples from a 2-D array as individuals, so I want to build a lookup table that maps a 1-D index to a 2-D position.
My attempt
import numpy as np
import pandas as pd

## data is the 2-D array with shape 20x20
data = np.random.randint(0,1000,400)
data = data.reshape(20,20)
## direction is my lookup table
direction = {"Indice":[],"X":[],"Y":[]}
k = 0
for i in range(0,data.shape[0],1):
    for j in range(0,data.shape[1],1):
        k+=1
        direction["Indice"].append(k)
        direction["X"].append(j)
        direction["Y"].append(i)
direction = pd.DataFrame(direction)
## generate a random int and connect it with the 2-D value
loc = np.random.randint(0,400)
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
target_value = data[YY,XX]
My question
Is there a neater way to achieve this?
Any advice would be appreciated!
You could use np.ravel to make data 1-dimensional, then index it using the flat index loc:
target_value = data.ravel()[loc-1]
Or, if you want XX and YY, perhaps you are looking for np.unravel_index. It maps a flat index or an array of flat indices to a tuple of coordinates.
For example, instead of building the direction DataFrame, you could use
np.unravel_index(loc-1, data.shape)
instead of
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
Then you could define target_value as:
target_value = data[np.unravel_index(loc-1, data.shape)]
Alternatively, to simply get a random value from the 2D array data, you could use
target_value = np.random.choice(data.flat)
Or to get N random values, use
target_values = np.random.choice(data.flat, size=(N,))
Why the minus one in loc-1:
In your original code, the direction['Indice'] column uses k values which start at 1, not 0. So when loc equals 1, the 0th-indexed row of direction is selected. I used loc-1 to make
target_value = data[np.unravel_index(loc-1, data.shape)]
return the same result that
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
target_value = data[YY,XX]
returns. Note, however, that if loc equals 0, then np.unravel_index(-1, data.shape) raises a ValueError, while your original code would return an empty array for target_value.
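For the genetic-algorithm use case, np.unravel_index also accepts an array of flat indices, so the 2-D positions of a whole population can be drawn at once, and np.ravel_multi_index maps them back. A sketch; the seed and n_individuals are arbitrary illustrative values, and the flat indices here are 0-based (so no loc-1 adjustment is needed).

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
data = rng.integers(0, 1000, size=(20, 20))

n_individuals = 5  # hypothetical population size
flat_idx = rng.integers(0, data.size, size=n_individuals)  # 0-based flat indices
rows, cols = np.unravel_index(flat_idx, data.shape)  # 2-D position of each sample
values = data[rows, cols]  # the sampled values

# ravel_multi_index is the inverse mapping: 2-D positions back to flat indices
back = np.ravel_multi_index((rows, cols), data.shape)
print(list(zip(rows, cols, values)))
```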