I am trying to create a process pool using multiprocessing with 2d array arguments using starmap. However, the arguments seem to be input row by row instead of element by element.
What I would like is to pass each element individually, so that the output is a 3d array containing one array for each element of the 2d input arrays.
I've created a simplified code to illustrate what I mean:
import multiprocessing
import numpy as np
MeshNumberY = 5
MeshNumberX = 10
result_list = np.zeros( (MeshNumberX,MeshNumberY,3) )
Xindices = np.tile(np.arange(MeshNumberX),(MeshNumberY,1))
Yindices = np.tile(np.reshape(np.arange(MeshNumberY),(MeshNumberY,1)),(1,MeshNumberX))
def image_pixel_array(x,y):
    return np.array([5*x,5*y,255])
if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    result_list = np.array(pool.starmap(image_pixel_array, zip(Xindices, Yindices)))
    print(result_list)
The input arrays Xindices and Yindices were,
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
and
[[0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1]
[2 2 2 2 2 2 2 2 2 2]
[3 3 3 3 3 3 3 3 3 3]
[4 4 4 4 4 4 4 4 4 4]]
respectively, with the corresponding output being,
[[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]
[array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]
[array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]
[array([15, 15, 15, 15, 15, 15, 15, 15, 15, 15])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]
[array([20, 20, 20, 20, 20, 20, 20, 20, 20, 20])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]]
My goal is to receive an output more like,
[[[0 0 255] [5 0 255] [10 0 255] [15 0 255] [20 0 255] [25 0 255] [30 0 255] [35 0 255] [40 0 255] [45 0 255]]
[[0 5 255] [5 5 255] [10 5 255] [15 5 255] [20 5 255] [25 5 255] [30 5 255] [35 5 255] [40 5 255] [45 5 255]]
etc.
Suggestions on how to set up my arrays more efficiently would also be welcome, as I'm fairly new to this.
This was all written in Python 3.7.
Thank you in advance for the help!
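On the side question about setting up the arrays: for a per-pixel formula this simple, multiprocessing may not be needed at all, because NumPy can compute every pixel in one vectorized step. The sketch below assumes the pixel value is always [5*x, 5*y, 255], as in image_pixel_array; np.indices is swapped in for the two np.tile calls and produces the same index grids.

```python
import numpy as np

MeshNumberY = 5
MeshNumberX = 10

# np.indices returns the row index and the column index of every mesh point,
# so Yindices[i, j] == i and Xindices[i, j] == j (same as the np.tile setup).
Yindices, Xindices = np.indices((MeshNumberY, MeshNumberX))

# Compute all pixels at once and stack the three channels on a new last axis.
result = np.stack([5 * Xindices, 5 * Yindices, np.full_like(Xindices, 255)], axis=-1)

print(result.shape)  # (5, 10, 3)
print(result[0, :3])  # first three pixels of the first row
```

Row i of result holds the pixels with y == i, which is the layout shown in the desired output.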
I tried this:
import multiprocessing
import numpy as np
MeshNumberY = 5
MeshNumberX = 10
result_list = np.zeros( (MeshNumberX,MeshNumberY,3) )
Xindices = np.tile(np.arange(MeshNumberX),(MeshNumberY,1))
Yindices = np.tile(np.reshape(np.arange(MeshNumberY),(MeshNumberY,1)),(1,MeshNumberX))
Zindices = Yindices.copy()
def image_pixel_array(x,y,z):
    # x, y and z are whole rows here; transposing gives one [5x, 5y, 255] triple per column
    return np.transpose([5*x,5*y,z*0+255])
if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    result_list = np.array(pool.starmap(image_pixel_array, zip(Xindices, Yindices,Zindices)))
    # result_list already has shape (MeshNumberY, MeshNumberX, 3), so this reshape does not reorder anything
    print(np.reshape(result_list,(MeshNumberY,MeshNumberX,3),order='F'))
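The row-by-row behaviour comes from zip: iterating over a 2d NumPy array yields its rows, so each starmap task receives a whole row of x values and a whole row of y values. A minimal sketch of the element-by-element version, assuming you really want one task per pixel: flatten both index arrays with ravel() before zipping, then reshape the results. This relies on Pool.starmap returning results in input order. For 50 tiny tasks the process overhead will dominate, so this only pays off if image_pixel_array does substantial work.

```python
import multiprocessing
import numpy as np

MeshNumberY = 5
MeshNumberX = 10

def image_pixel_array(x, y):
    # Called once per mesh point, so x and y are scalars here.
    return np.array([5 * x, 5 * y, 255])

if __name__ == '__main__':
    # Yindices[i, j] == i and Xindices[i, j] == j, both of shape (5, 10)
    Yindices, Xindices = np.indices((MeshNumberY, MeshNumberX))
    # ravel() flattens each 2d array, so zip() yields one (x, y) pair per element
    args = zip(Xindices.ravel(), Yindices.ravel())
    with multiprocessing.Pool() as pool:
        flat = pool.starmap(image_pixel_array, args)  # 50 length-3 arrays, in input order
    # Restore the mesh layout: one row per y value, one pixel per x value
    result = np.array(flat).reshape(MeshNumberY, MeshNumberX, 3)
    print(result[0, :3])
```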
In Stata, mkspline automatically creates variables containing a linear spline given a series of knot point values...
mkspline knot1 30 knot2 40 knot3 50 knot4 = v1
Here is the result of running this on a series of values in Stata. It essentially distributes each value across the spline segments. Sorry, I don't know the technical term for this, just the overall concept.
v1 knot1 knot2 knot3 knot4
10 10 0 0 0
20 20 0 0 0
30 30 0 0 0
40 30 10 0 0
50 30 10 10 0
60 30 10 10 10
70 30 10 10 20
80 30 10 10 30
90 30 10 10 40
100 30 10 10 50
Is there an equivalent to this in Python with Numpy or Pandas or similar?
I don't think there is a function for that.
Try with numpy:
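import numpy as np
import pandas as pd
df = pd.DataFrame({'v1': range(10, 101, 10)})  # sample data from the question
thresh = [0, 30, 40, 50]  # lower bound of each spline segment
diffs = np.maximum(df[['v1']].to_numpy() - thresh, 0)  # excess over each bound, floored at 0
diffs[:, :-1] = np.minimum(diffs[:, :-1], [np.diff(thresh)])  # cap inner segments at their widths (30, 10, 10)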
Output:
array([[10, 0, 0, 0],
[20, 0, 0, 0],
[30, 0, 0, 0],
[30, 10, 0, 0],
[30, 10, 10, 0],
[30, 10, 10, 10],
[30, 10, 10, 20],
[30, 10, 10, 30],
[30, 10, 10, 40],
[30, 10, 10, 50]])
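A more general sketch, wrapped in a hypothetical helper named linear_spline (not a pandas or NumPy function). It follows the formulas that, as far as I know, Stata documents for the default (non-marginal) mkspline: the first variable is min(v, k1), each middle variable is max(min(v, k_i), k_{i-1}) - k_{i-1}, and the last is max(v, k_n) - k_n. Clamping v into every segment with np.clip and subtracting each segment's start reproduces all three cases at once.

```python
import numpy as np
import pandas as pd

def linear_spline(values, knots, prefix="knot"):
    """Hypothetical helper: split each value across linear-spline segments."""
    v = np.asarray(values, dtype=float)[:, None]          # column vector, one row per value
    k = np.asarray(knots, dtype=float)
    lo = np.concatenate(([-np.inf], k))                    # lower edge of each segment
    hi = np.concatenate((k, [np.inf]))                     # upper edge of each segment
    clamped = np.clip(v, lo, hi)                           # v clamped into every segment
    parts = clamped - np.where(np.isfinite(lo), lo, 0.0)   # subtract each segment's start
    cols = [f"{prefix}{i + 1}" for i in range(parts.shape[1])]
    return pd.DataFrame(parts, columns=cols)

df = pd.DataFrame({"v1": range(10, 101, 10)})
spline = linear_spline(df["v1"], [30, 40, 50])   # same knots as the Stata call
print(pd.concat([df, spline], axis=1))
```

Unlike the answer above, this does not floor the first segment at 0, which only matters for negative values.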
I have one list representing the points in time at which a value changes, and another list of values:
indexes_list = [5, 6, 8, 9, 12, 15]
# [ 5 6 8 9 12 15]
values_list = [i * 10 for i in range(6)]
# [ 0 10 20 30 40 50]
I want to create the "full" list, which in the above example is:
expanded_values = [0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
# [ 0 0 0 0 0 0 10 20 20 30 40 40 40 50 50 50]
I wrote something, but it feels clumsy and I suspect there is a better, more Pythonic way to do it:
result = []
for i in range(len(values_list)):
    if i == 0:
        tmp = [values_list[i]] * (indexes_list[i] + 1)
    else:
        tmp = [values_list[i]] * (indexes_list[i] - indexes_list[i - 1])
    result += tmp
# result = [0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
Use:
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i in range(6)]
# the first value covers positions 0..indexes_array[0] inclusive, hence the + 1
diffs = [indexes_array[0] + 1] + [j - i for i, j in zip(indexes_array, indexes_array[1:])]
res = [v for i, v in zip(diffs, values_array) for _ in range(i)]
print(res)
Output
[0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
As an alternative, you could use the pairwise recipe with a twist:
from itertools import tee
def pairwise(iterable, prepend):
    a, b = tee(iterable)
    # first pair is (prepend, first element), then the usual consecutive pairs
    yield prepend, next(b, None)
    yield from zip(a, b)
indices = [5, 6, 8, 9, 12, 15]
values = [i * 10 for i in range(6)]
# prepend=-1 so that the first value covers positions 0..indices[0] inclusive
differences = [second - first for first, second in pairwise(indices, prepend=-1)]
res = [v for i, v in zip(differences, values) for _ in range(i)]
print(res)
Output
[0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
Finally, if you are doing numerical work, I advise using numpy:
import numpy as np
indices = [5, 6, 8, 9, 12, 15]
values = [i * 10 for i in range(6)]
# prepend=-1 makes the first run length indices[0] + 1
differences = np.diff(indices, prepend=-1)
res = np.repeat(values, differences).tolist()
print(res)
I would argue that it is pythonic to use the appropriate library, which in this case is pandas:
import pandas as pd
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i in range(6)]
series = pd.Series(values_array, indexes_array).reindex(
range(indexes_array[-1] + 1), method='backfill')
series
0 0
1 0
2 0
3 0
4 0
5 0
6 10
7 20
8 20
9 30
10 40
11 40
12 40
13 50
14 50
15 50
dtype: int64
See the reindex documentation for details.
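What backfill does here is, for every position p, take the value at the first index that is greater than or equal to p. A minimal NumPy sketch of the same lookup, assuming indexes_array is sorted in ascending order (as in the question): np.searchsorted with side="left" returns exactly that first index.

```python
import numpy as np

indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i in range(6)]

positions = np.arange(indexes_array[-1] + 1)  # 0 .. 15
# side="left": for each position, the first i with indexes_array[i] >= position
which = np.searchsorted(indexes_array, positions, side="left")
expanded_values = np.asarray(values_array)[which].tolist()
print(expanded_values)
```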
Try this:
indexes_array = [5, 6, 8, 9, 12, 15]
# [ 5 6 8 9 12 15]
values_array = [i * 10 for i in range(6)]
# [ 0 10 20 30 40 50]
result = []
last_ind = -1  # so the first value fills positions 0..indexes_array[0] inclusive
zipped = zip(indexes_array, values_array)
for ind, val in zipped:
    count = ind - last_ind
    last_ind = ind
    for i in range(count):
        result.append(val)
print(result)
Output:
[0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
Try this:
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i in range(6)]
output = []
for x in range(len(indexes_array)):
    if x == 0:
        # positions 0..indexes_array[0] inclusive
        output.extend([values_array[x]] * (indexes_array[x] + 1))
    else:
        output.extend([values_array[x]] * (indexes_array[x] - indexes_array[x - 1]))
print(output)
The output is:
[0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
I have an array [ 0 10 15 20 10 0 35 25 15 35 0 30 20 25 30 0] and I need to insert each element of another array, [5,7,8,15], at every 5th position, so that the final array looks like [ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15] (length 20).
I am trying this code:
import numpy as np
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5,7,8,15]
node = 5
node_len = node * (node-1)
for w in range(node, node_len, 5):
    for v in arr_split:
        arr_fla = np.insert(arr_fla,w,v)
print(arr_fla)
The result I am getting has length 28:
[ 0 10 15 20 10 15 8 7 5 0 15 8 7 5 35 15 8 7 5 25 15 35 0 30 20 25 30 0]
Can someone please tell me where I am going wrong.
If the sizes line up as cleanly as in your example, you can use reshape ...
np.reshape(arr_fla,(len(arr_split),-1))
# array([[ 0, 10, 15, 20],
# [10, 0, 35, 25],
# [15, 35, 0, 30],
# [20, 25, 30, 0]])
... append arr_split as a new column ...
np.c_[np.reshape(arr_fla,(len(arr_split),-1)),arr_split]
# array([[ 0, 10, 15, 20, 5],
# [10, 0, 35, 25, 7],
# [15, 35, 0, 30, 8],
# [20, 25, 30, 0, 15]])
... and flatten again ...
np.c_[np.reshape(arr_fla,(len(arr_split),-1)),arr_split].ravel()
# array([ 0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25,
# 30, 0, 15])
I have corrected it:
import numpy as np
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
node = 5
for w in range(len(arr_split)):
    # after w earlier inserts, the next slot is at index (w+1)*node - 1
    arr_fla = np.insert(arr_fla, (w+1)*node-1, arr_split[w])
print(arr_fla)
'''
Output:
[ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15]
'''
In your code:
for v in arr_split:
This inserts all the elements of arr_split at every position w, but you need just one element per position, so you do not need the extra for loop.
You want to have a counter that keeps going up every time you insert the item from your second array arr_split.
Try this code. I'm assuming the last element can simply be appended, since the original array has only 16 elements.
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
j = 0  # counter for the next element to insert from arr_split
# start at index 4 (the 5th position); range() is evaluated once, on the original length 16
for i in range(4, len(arr_fla), 5):
    arr_fla.insert(i, arr_split[j])  # insert at the 5th position of the current block
    # each insert grows the list by one, so the next slot is 5 (not 4) positions further on
    j += 1  # move on to the next element of arr_split
arr_fla.append(arr_split[j])  # the last element goes at the very end
print(arr_fla)
Output:
[0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25, 30, 0, 15]
You could split the list into even chunks, append the corresponding value from arr_split to each chunk, and reassemble the whole (credit to Ned Batchelder for the chunk function):
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
node = 5
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
tmp_arr = chunks(arr_fla, node - 1)  # each chunk holds node - 1 = 4 original elements
arr_out = []
for index, chunk in enumerate(tmp_arr):
    if index < len(arr_split):  # make sure arr_split is not exhausted
        chunk.append(arr_split[index])  # use the chunk's index to pick the value to insert
    arr_out += chunk
print(arr_out)
Outputs:
[0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25, 30, 0, 15]
You can change it as below and try it:
import numpy as np
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
index = 4
for ele in arr_split:
    arr_fla = np.insert(arr_fla, index, ele)
    index += 5
print(arr_fla)
The result is:
[ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15]
As for what is wrong with your code, I see two problems:
the inner loop is not needed; it makes np.insert put all the elements of arr_split at the same position
the first position should be 4, not 5, because indices start at 0
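np.insert also accepts a sequence of indices, and all of them refer to positions in the original (un-inserted) array, so the whole operation fits in one call without tracking how the array grows. A sketch, assuming every block holds node - 1 = 4 original elements as in the question:

```python
import numpy as np

arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
node = 5

# Positions 4, 8, 12, 16 in the ORIGINAL array; 16 == len(arr_fla) means "append at the end".
positions = np.arange(1, len(arr_split) + 1) * (node - 1)
result = np.insert(arr_fla, positions, arr_split)
print(result)
```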
I have the following data set in pandas.
import numpy as np
import pandas as pd
events = ['event1', 'event2', 'event3', 'event4', 'event5', 'event6']
wells = [np.array([1, 2]), np.array([1, 3]), np.array([1]),
np.array([4, 5, 6]), np.array([4, 5, 6]), np.array([7, 8])]
traces_per_well = [np.array([24, 24]), np.array([24, 21]), np.array([18]),
np.array([24, 24, 24]), np.array([24, 21, 24]), np.array([18, 21])]
df = pd.DataFrame({"event_no": events, "well_array": wells,
"trace_per_well": traces_per_well})
df["total_traces"] = df['trace_per_well'].apply(np.sum)
df['supposed_traces_no'] = df['well_array'].apply(lambda x: len(x)*24)
df['pass'] = df['total_traces'] == df['supposed_traces_no']
print(df)
The output is:
event_no well_array trace_per_well total_traces supposed_traces_no pass
0 event1 [1, 2] [24, 24] 48 48 True
1 event2 [1, 3] [24, 21] 45 48 False
2 event3 [1] [18] 18 24 False
3 event4 [4, 5, 6] [24, 24, 24] 72 72 True
4 event5 [4, 5, 6] [24, 21, 24] 69 72 False
5 event6 [7, 8] [18, 21] 39 48 False
I want to create two new columns: one holding the elements of the trace_per_well arrays that are not equal to 24, and one holding the corresponding elements of well_array.
The result should look like this.
event_no well_array trace_per_well total_traces supposed_traces_no pass wrong_trace_in_well wrong_well
0 event1 [1, 2] [24, 24] 48 48 True NaN NaN
1 event2 [1, 3] [24, 21] 45 48 False 21 3
2 event3 [1] [18] 18 24 False 18 1
3 event4 [4, 5, 6] [24, 24, 24] 72 72 True NaN NaN
4 event5 [4, 5, 6] [24, 21, 24] 69 72 False 21 5
5 event6 [7, 8] [18, 21] 39 48 False (18, 21) (7, 8)
Any help is greatly appreciated!
I would do this with a list comprehension. Generate your result in a single pass of the data and then assign to appropriate columns.
v = pd.Series(
[list(zip(*((x, y) for x, y in zip(X, Y) if x != 24)))
for X, Y in zip(df['trace_per_well'], df['well_array'])])
df['wrong_trace_in_well'] = v.str[0]
df['wrong_well'] = v.str[-1]
df[['wrong_trace_in_well', 'wrong_well']]
wrong_trace_in_well wrong_well
0 NaN NaN
1 (21,) (3,)
2 (18,) (1,)
3 NaN NaN
4 (21,) (5,)
5 (18, 21) (7, 8)
Alternatively, if you don't mind making multiple passes over the data:
df['wrong_trace_in_well'] = [[x for x in X if x != 24] for X in df['trace_per_well']]
df['wrong_well'] = [
[y for x, y in zip(X, Y) if x != 24]
for X, Y in zip(df['trace_per_well'], df['well_array'])]
df[['wrong_trace_in_well', 'wrong_well']]
wrong_trace_in_well wrong_well
0 [] []
1 [21] [3]
2 [18] [1]
3 [] []
4 [21] [5]
5 [18, 21] [7, 8]
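A different technique, sketched under the assumption that pandas 1.3 or newer is available (multi-column explode): turn each (well, trace) pair into its own row, keep the rows whose trace is not 24, and aggregate back per original row. The helper collapse is hypothetical; it returns a scalar for a single mismatch and a tuple for several, to match the layout in the question's desired output. Only the two array columns are rebuilt here.

```python
import numpy as np
import pandas as pd

wells = [np.array([1, 2]), np.array([1, 3]), np.array([1]),
         np.array([4, 5, 6]), np.array([4, 5, 6]), np.array([7, 8])]
traces_per_well = [np.array([24, 24]), np.array([24, 21]), np.array([18]),
                   np.array([24, 24, 24]), np.array([24, 21, 24]), np.array([18, 21])]
df = pd.DataFrame({"well_array": wells, "trace_per_well": traces_per_well})

# One row per (well, trace) pair; the original row label is repeated for each pair
long = df[["well_array", "trace_per_well"]].explode(["well_array", "trace_per_well"])
bad = long[long["trace_per_well"] != 24]

def collapse(s):
    # hypothetical helper: one value stays a scalar, several become a tuple
    return s.iloc[0] if len(s) == 1 else tuple(s)

g = bad.groupby(level=0)
# Assignment aligns on the index, so rows without a mismatch get NaN
df["wrong_trace_in_well"] = g["trace_per_well"].agg(collapse)
df["wrong_well"] = g["well_array"].agg(collapse)
print(df)
```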
I have an object of type 'numpy.ndarray', called "myarray", that looks like this when printed with Python's print:
[[[ 84 0 213 232] [153 0 304 363]]
[[ 33 0 56 104] [ 83 0 77 238]]
[[ 0 0 9 61] [ 0 0 2 74]]]
"myarray" is made by another library. The value of myarray.shape equals (3, 2). I expected this to be a 3-dimensional array, with three indices. When I try to make this structure myself, using:
second_array = np.array([[[84, 0, 213, 232], [153, 0, 304, 363]],
                         [[33, 0, 56, 104], [83, 0, 77, 238]],
                         [[0, 0, 9, 61], [0, 0, 2, 74]]])
I get that second_array.shape is equal to (3, 2, 4), as expected. Why is there this difference? Also, given this, how can I reshape "myarray" so that the two columns are merged, i.e. so that the result is:
[[[ 84 0 213 232 153 0 304 363]]
[[ 33 0 56 104 83 0 77 238]]
[[ 0 0 9 61 0 0 2 74]]]
Edit: to clarify, I know that in the case of second_array, I can do second_array.reshape((3,8)). But how does this work for the ndarray which has the format of myarray but does not have a 3d index?
myarray.dtype is "object", but it can be changed to a regular ndarray too.
Edit 2: Getting closer, but still cannot quite get the ravel/flatten followed by reshape. I have:
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[ 7, 8, 9],
              [10, 11, 12]])
arr = np.array([a, b])
I try:
arr.ravel().reshape((2,6))
But this gives [[1, 2, 3, 4, 5, 6], ...] and I wanted [[1, 2, 3, 7, 8, 9], ...]. How can this be done?
Thanks.
Indeed, ravel and hstack can be useful tools for reshaping arrays:
import numpy as np
myarray = np.empty((3,2),dtype = object)
myarray[:] = [[np.array([ 84, 0, 213, 232]), np.array([153, 0, 304, 363])],
[np.array([ 33, 0, 56, 104]), np.array([ 83, 0, 77, 238])],
[np.array([ 0, 0, 9, 61]), np.array([ 0, 0, 2, 74])]]
myarray = np.hstack(myarray.ravel()).reshape(3,2,4)
print(myarray)
# [[[ 84 0 213 232]
# [153 0 304 363]]
# [[ 33 0 56 104]
# [ 83 0 77 238]]
# [[ 0 0 9 61]
# [ 0 0 2 74]]]
myarray = myarray.ravel().reshape(3,8)
print(myarray)
# [[ 84 0 213 232 153 0 304 363]
# [ 33 0 56 104 83 0 77 238]
# [ 0 0 9 61 0 0 2 74]]
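An alternative sketch for the object-array case: np.stack joins the length-4 arrays held in myarray into one ordinary integer array, after which reshape behaves exactly as it does for second_array. This assumes every element of myarray is a 1d array of the same length; the object array is rebuilt cell by cell here only to mimic what the other library returns.

```python
import numpy as np

# Rebuild an object array shaped like the question's myarray: (3, 2), each cell a length-4 array.
cells = [[[84, 0, 213, 232], [153, 0, 304, 363]],
         [[33, 0, 56, 104], [83, 0, 77, 238]],
         [[0, 0, 9, 61], [0, 0, 2, 74]]]
myarray = np.empty((3, 2), dtype=object)
for i in range(3):
    for j in range(2):
        myarray[i, j] = np.array(cells[i][j])

# Stack the 6 inner arrays into a (6, 4) int array, then restore the outer (3, 2) shape ...
dense = np.stack(list(myarray.ravel())).reshape(*myarray.shape, -1)  # shape (3, 2, 4)
# ... and merge the two "columns" of each row.
merged = dense.reshape(dense.shape[0], -1)  # shape (3, 8)
print(merged)
```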
Regarding Edit 2:
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6]])
b = np.array([[ 7, 8, 9],
[10, 11, 12]])
arr = np.array([a, b])
print(arr)
# [[[ 1 2 3]
# [ 4 5 6]]
# [[ 7 8 9]
# [10 11 12]]]
Notice that
In [45]: arr[:,0,:]
Out[45]:
array([[1, 2, 3],
[7, 8, 9]])
Since you want the first row to be [1,2,3,7,8,9], the above shows that you want the second axis to be the first axis. This can be accomplished with the swapaxes method:
print(arr.swapaxes(0,1).reshape(2,6))
# [[ 1 2 3 7 8 9]
# [ 4 5 6 10 11 12]]
Or, given a and b (equivalently, arr[0] and arr[1]), you could form the merged array directly with the np.hstack function:
arr = np.hstack([a, b])
# [[ 1 2 3 7 8 9]
# [ 4 5 6 10 11 12]]