What is the Python equivalent of Stata's mkspline?

In Stata, mkspline automatically creates variables containing a linear spline, given a series of knot values:
mkspline knot1 30 knot2 40 knot3 50 knot4 = v1
Here is the result of running this on a series of values in Stata. It distributes each value over the spline segments: each new variable holds the part of v1 that falls in one segment (below 30, 30 to 40, 40 to 50, above 50), so the four variables in a row add up to v1. Sorry, I don't know the technical mathematical or statistical term for this, just the overall concept.
v1    knot1  knot2  knot3  knot4
10    10     0      0      0
20    20     0      0      0
30    30     0      0      0
40    30     10     0      0
50    30     10     10     0
60    30     10     10     10
70    30     10     10     20
80    30     10     10     30
90    30     10     10     40
100   30     10     10     50
Is there an equivalent to this in Python with Numpy or Pandas or similar?

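I don't think there is a built-in function for that. Try it with NumPy (assuming the data are in a pandas DataFrame df with a column v1):
import numpy as np
import pandas as pd
df = pd.DataFrame({'v1': range(10, 101, 10)})  # the example data from the question
thresh = [0, 30, 40, 50]  # lower bound of each spline segment
diffs = np.maximum(df[['v1']].to_numpy() - thresh, 0)  # amount of v1 above each lower bound
diffs[:, :-1] = np.minimum(diffs[:, :-1], [np.diff(thresh)])  # cap every segment but the last at its width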
Output:
array([[10, 0, 0, 0],
[20, 0, 0, 0],
[30, 0, 0, 0],
[30, 10, 0, 0],
[30, 10, 10, 0],
[30, 10, 10, 10],
[30, 10, 10, 20],
[30, 10, 10, 30],
[30, 10, 10, 40],
[30, 10, 10, 50]])
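If you want Stata-style column names and a reusable function, the same idea can be wrapped in a small helper. This is a sketch under assumptions: the name mkspline and the prefix parameter are hypothetical, and it follows Stata's linear (non-marginal) definition as I understand it. The first variable is min(v1, k1) with no lower bound, each middle variable is the part of v1 between two consecutive knots, and the last one is max(v1 - k_last, 0).

```python
import numpy as np
import pandas as pd


def mkspline(values, knots, prefix="knot"):
    """Hypothetical helper: linear spline variables similar to Stata's mkspline.

    Returns len(knots) + 1 columns whose row-wise sum equals the input.
    """
    x = np.asarray(values, dtype=float)
    k = np.asarray(knots, dtype=float)
    # First segment: everything up to the first knot (no lower bound).
    cols = [np.minimum(x, k[0])]
    # Remaining segments run from knot i to knot i + 1; the last one is open-ended.
    upper = np.append(k[1:], np.inf)
    for lo, hi in zip(k, upper):
        # Clip into the segment, then measure the distance from its lower edge.
        cols.append(np.clip(x, lo, hi) - lo)
    names = [f"{prefix}{i + 1}" for i in range(len(cols))]
    return pd.DataFrame(np.column_stack(cols), columns=names)


v1 = pd.Series(range(10, 101, 10), name="v1")
splines = mkspline(v1, [30, 40, 50])
print(pd.concat([v1, splines], axis=1))
```

Unlike the thresh = [0, 30, 40, 50] version, a negative value goes entirely into knot1 here instead of being clipped to 0. Stata's marginal option is not covered by this sketch.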

Related

How to delete elements repeated in a 2D NumPy array?

I have the following problem, which I want to solve with the NumPy library.
Suppose that we have this array a:
import numpy as np
a = np.vstack(([10, 10, 20, 20, 30, 10, 40, 50, 20], [10, 20, 10, 20, 30, 10, 40, 50, 20]))
Printing it gives:
[[10 10 20 20 30 10 40 50 20]
[10 20 10 20 30 10 40 50 20]]
It has shape (2, 9).
I want to delete the columns that repeat an earlier column, so that the result is:
[[10 10 20 20 30 40 50]
[10 20 10 20 30 40 50]]
In this example I want to delete the elements ((0, 5), (1, 5)) and ((0, 8), (1, 8)), i.e. columns 5 and 8. Is there a NumPy function that can do this?
Thanks

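This is easily done with:
np.unique(a, axis=1)
Note that np.unique returns the unique columns sorted lexicographically, not in their original order; in this example the two orders happen to coincide.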

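Following the idea of this answer, you could also deduplicate the columns as tuples. A dict (rather than a set) keeps the first occurrence of each column in its original order, and np.array (rather than np.hstack, which would concatenate the tuples into one 1D array) rebuilds the 2D result:
np.array(list(dict.fromkeys(tuple(col) for col in a.T))).T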
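Because np.unique sorts the columns, a NumPy-only sketch that keeps each column's first occurrence in its original position can use the return_index option of np.unique and then sort those indices:

```python
import numpy as np

a = np.vstack(([10, 10, 20, 20, 30, 10, 40, 50, 20],
               [10, 20, 10, 20, 30, 10, 40, 50, 20]))

# return_index gives, for each unique column, the index of its first occurrence
_, first_idx = np.unique(a, axis=1, return_index=True)
# Sorting those indices restores the original left-to-right column order
result = a[:, np.sort(first_idx)]
print(result)
```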

The "pythonic" way for expanding a list

I have one list of the points in time at which a value changes, and another list of the values:
indexes_list = [5, 6, 8, 9, 12, 15]
# [ 5 6 8 9 12 15]
values_list = [i * 10 for i in range(6)]
# [ 0 10 20 30 40 50]
I want to create the "full" list, which in the above example is:
expanded_values = [0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
# [ 0 0 0 0 0 0 10 20 20 30 40 40 40 50 50 50]
I wrote something, but it feels wrong, and I guess there is a better, more pythonic way of doing it:
result = []
for i in range(len(values_list)):
    if i == 0:
        tmp = [values_list[i]] * (indexes_list[i] + 1)
    else:
        tmp = [values_list[i]] * (indexes_list[i] - indexes_list[i - 1])
    result += tmp
# result = [0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]

Use:
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i in range(6)]
# The first run covers indices 0..indexes_array[0] inclusive, hence the + 1
diffs = [indexes_array[0] + 1] + [j - i for i, j in zip(indexes_array, indexes_array[1:])]
res = [v for i, v in zip(diffs, values_array) for _ in range(i)]
print(res)
Output
[0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
As an alternative, you could use the pairwise recipe with a twist:
from itertools import tee
def pairwise(iterable, prepend):
    a, b = tee(iterable)
    # First pair is (prepend, first element); this also advances b by one
    yield prepend, next(b, None)
    yield from zip(a, b)
indices = [5, 6, 8, 9, 12, 15]
values = [i * 10 for i in range(6)]
# prepend=-1 so that the first run covers indices 0..5 inclusive
differences = [second - first for first, second in pairwise(indices, prepend=-1)]
res = [v for i, v in zip(differences, values) for _ in range(i)]
print(res)
Output
[0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
Finally, if you are doing numerical work, I advise using NumPy:
import numpy as np
indices = [5, 6, 8, 9, 12, 15]
values = [i * 10 for i in range(6)]
# prepend=-1 makes the first difference indices[0] + 1, so index 0 is included
differences = np.diff(indices, prepend=-1)
res = np.repeat(values, differences).tolist()
print(res)
# [0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]

I would argue that it is pythonic to use the appropriate library, which in this case is pandas:
import pandas as pd
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i in range(6)]
series = pd.Series(values_array, indexes_array).reindex(
range(indexes_array[-1] + 1), method='backfill')
series
0 0
1 0
2 0
3 0
4 0
5 0
6 10
7 20
8 20
9 30
10 40
11 40
12 40
13 50
14 50
15 50
dtype: int64
See the reindex documentation for details.

Try this:
indexes_array = [5, 6, 8, 9, 12, 15]
# [ 5 6 8 9 12 15]
values_array = [i * 10 for i in range(6)]
# [ 0 10 20 30 40 50]
result = []
last_ind = -1  # so that the first run covers indices 0..indexes_array[0] inclusive
zipped = zip(indexes_array, values_array)
for ind, val in zipped:
    count = ind - last_ind
    last_ind = ind
    for i in range(count):
        result.append(val)
print(result)
Output:
[0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]

Try this:
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i in range(6)]
output = []
for x in range(len(indexes_array)):
    if x == 0:
        # the first run includes index 0, hence the + 1
        output.extend([values_array[x]] * (indexes_array[x] + 1))
    else:
        output.extend([values_array[x]] * (indexes_array[x] - indexes_array[x - 1]))
print(output)
The output is:
[0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
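Another standard-library option, sketched with a hypothetical helper name expand_runs: itertools.repeat produces each run lazily and itertools.chain.from_iterable flattens the runs, so no temporary list is built per run. As in the question, the first value is assumed to fill indices 0 through change_points[0] inclusive.

```python
from itertools import chain, repeat


def expand_runs(change_points, values):
    """Hypothetical helper: values[i] fills every position up to and including change_points[i]."""
    points = list(change_points)
    # Start of each run minus one: -1 for the first run, then the previous change point.
    previous = [-1] + points[:-1]
    counts = [end - start for start, end in zip(previous, points)]
    # repeat(v, n) yields v n times; chain.from_iterable concatenates the runs.
    return list(chain.from_iterable(repeat(v, n) for v, n in zip(values, counts)))


print(expand_runs([5, 6, 8, 9, 12, 15], [i * 10 for i in range(6)]))
```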

Insert in array at specific location

I have an array [ 0 10 15 20 10 0 35 25 15 35 0 30 20 25 30 0] and I need to insert each element of another array, [5, 7, 8, 15], at every fifth position, so that the final array (length 20) looks like [ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15].
This is the code I am trying:
import numpy as np
arr_fla = np.array([0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0])
arr_split = [5, 7, 8, 15]
node = 5
node_len = node * (node - 1)
for w in range(node, node_len, 5):
    for v in arr_split:
        arr_fla = np.insert(arr_fla, w, v)
print(arr_fla)
The result I am getting (length 28) is:
[ 0 10 15 20 10 15 8 7 5 0 15 8 7 5 35 15 8 7 5 25 15 35 0 30 20 25 30 0]
Can someone please tell me where I am going wrong?

If the sizes line up as cleanly as in your example, you can use reshape ...
np.reshape(arr_fla,(len(arr_split),-1))
# array([[ 0, 10, 15, 20],
# [10, 0, 35, 25],
# [15, 35, 0, 30],
# [20, 25, 30, 0]])
... append arr_split as a new column ...
np.c_[np.reshape(arr_fla,(len(arr_split),-1)),arr_split]
# array([[ 0, 10, 15, 20, 5],
# [10, 0, 35, 25, 7],
# [15, 35, 0, 30, 8],
# [20, 25, 30, 0, 15]])
... and flatten again ...
np.c_[np.reshape(arr_fla,(len(arr_split),-1)),arr_split].ravel()
# array([ 0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25,
# 30, 0, 15])

I have corrected it:
import numpy as np
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
node = 5
for w in range(len(arr_split)):
    # Insert one element per iteration, at indices 4, 9, 14 and 19 of the growing array
    arr_fla = np.insert(arr_fla, (w + 1) * node - 1, arr_split[w])
print(arr_fla)
Output:
[ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15]
In your code, the inner loop
for v in arr_split:
inserts every element of arr_split at each position w, but you need to insert just one element at a time, so the extra for loop is not needed.

You want a counter that goes up every time you insert an item from your second array, arr_split.
Try this code. It assumes that the last element can simply be appended, because the original array has only 16 elements.
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
j = 0  # counter for the next element of arr_split to insert
# start at index 4, i.e. the fifth position, where the first insertion goes
for i in range(4, len(arr_fla), 5):
    arr_fla.insert(i, arr_split[j])  # insert at every fifth position
    # every insertion makes the list one element longer, hence the step of 5
    j += 1  # move on to the next element of arr_split
arr_fla.append(arr_split[j])  # append the final element to the end of the list
print(arr_fla)
Output:
[0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25, 30, 0, 15]

You could split the list into even chunks, append the corresponding split value to each chunk, and reassemble the whole (credit to Ned Batchelder for the chunks function). Each chunk must hold node - 1 = 4 original elements so that the inserted value lands in the fifth position:
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
node = 5
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
tmp_arr = chunks(arr_fla, node - 1)
arr_out = []
for index, chunk in enumerate(tmp_arr):
    if index < len(arr_split):  # make sure arr_split is not exhausted
        chunk.append(arr_split[index])  # the chunk index selects the split value to insert
    arr_out += chunk
print(arr_out)
Outputs:
[0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25, 30, 0, 15]

You can change it to the following and try it:
import numpy as np
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
index = 4  # first insertion point
for ele in arr_split:
    arr_fla = np.insert(arr_fla, index, ele)
    index += 5  # the array grew by one, so the next slot is 5 further on
print(arr_fla)
The result is
[ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15]
As for what is wrong with yours, I think there are two problems:
1. The second loop is not needed; it makes np.insert put all the elements of arr_split at the same position.
2. The positions should not start at 5; the first one should be 4.
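A loop-free alternative, sketched under the assumption that each new element goes after every block of four original elements: np.insert also accepts a sequence of indices, all interpreted relative to the original array, so the positions do not have to be shifted as the array grows. The name step is introduced here for illustration.

```python
import numpy as np

arr_fla = np.array([0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0])
arr_split = [5, 7, 8, 15]
step = 4  # original elements between insertions (node - 1 in the question)

# Positions refer to the ORIGINAL array: insert before indices 4, 8, 12 and 16
# (16 == len(arr_fla) means "append at the end").
positions = np.arange(1, len(arr_split) + 1) * step
result = np.insert(arr_fla, positions, arr_split)
print(result)
```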

Vectorized approach for breeding in a genetic algorithm

I am trying to implement single-point crossover breeding for a genetic algorithm without an explicit loop. For each child, I need to combine the beginning of one row of array a with the rest of one row of array b, with the desired result shown below. The row_idx arrays choose the particular rows to breed, while the slice_idx array tells us where to cut (I would like to keep the chunk of the row from a up to and including the endpoint).
import numpy as np
a = np.arange(20).reshape(4, 5)
print('a')
print(a)
a
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
b=np.arange(20).reshape(4,5)*100
print('b')
print(b)
b
[[ 0 100 200 300 400]
[ 500 600 700 800 900]
[1000 1100 1200 1300 1400]
[1500 1600 1700 1800 1900]]
row_idx_a=np.array([3,1,0,3,1,3]) #edit-fixed array
row_idx_b=np.array([1,1,0,0,0,3]) #edit-fixed array to fix error identified by the answer below
slice_idx=np.array([2,1,0,4,4,3])
merged_array = np.zeros((6, 5))  # placeholder for the final array (one row per child)
# ... now some creative slicing magic so that the final array is an irregularly indexed combination; the desired result is:
[[ 15 16 17 800 900]
[ 5 6 700 800 900]
[ 0 100 200 300 400]
[ 15 16 17 18 19]
[ 5 6 7 8 9]
[ 15 16 17 18 1900]]
I am finding it difficult to vectorize this problem. Any takers? Thanks.

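Assuming that the parts of the expected answer that disagree with the row indices as originally posted,
row_idx_a=np.array([3,1,0,3,2,3])
row_idx_b=np.array([2,1,0,0,0,3])
are wrong (the output below was computed with those original indices; with the corrected indices now shown in the question, the same expression returns exactly the desired array):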
np.where(np.less.outer(slice_idx,np.arange(5)),b[row_idx_b],a[row_idx_a])
# array([[ 15, 16, 17, 1300, 1400],
# [ 5, 6, 700, 800, 900],
# [ 0, 100, 200, 300, 400],
# [ 15, 16, 17, 18, 19],
# [ 10, 11, 12, 13, 14],
# [ 15, 16, 17, 18, 1900]])
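A self-contained sketch of the same np.where idea with the corrected indices from the question. The helper name single_point_crossover is hypothetical, and the mask is built by broadcasting (cols > cut[:, None]), which is equivalent to np.less.outer(slice_idx, np.arange(5)):

```python
import numpy as np

a = np.arange(20).reshape(4, 5)
b = np.arange(20).reshape(4, 5) * 100
row_idx_a = np.array([3, 1, 0, 3, 1, 3])
row_idx_b = np.array([1, 1, 0, 0, 0, 3])
slice_idx = np.array([2, 1, 0, 4, 4, 3])


def single_point_crossover(parents_a, parents_b, cut):
    """Hypothetical helper: row i takes columns 0..cut[i] (inclusive) from
    parents_a[i] and the remaining columns from parents_b[i]."""
    cols = np.arange(parents_a.shape[1])
    # (k, 1) compared with (n,) broadcasts to a (k, n) mask that is True where b is used
    take_b = cols > cut[:, None]
    return np.where(take_b, parents_b, parents_a)


children = single_point_crossover(a[row_idx_a], b[row_idx_b], slice_idx)
print(children)
```

In an actual GA loop, the cut points could be drawn at random, e.g. with np.random.default_rng().integers(0, 5, size=6), which yields integers from 0 to 4.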

Python Multiprocessing 2d Array Input

I am trying to create a process pool using multiprocessing, passing 2D array arguments with starmap. However, the arguments seem to be passed row by row instead of element by element.
What I would like is to use each element to build a 3D output array, with one array for each element of the 2D input arrays.
I've written some simplified code to illustrate what I mean:
import multiprocessing
import numpy as np
MeshNumberY = 5
MeshNumberX = 10
result_list = np.zeros((MeshNumberX, MeshNumberY, 3))
Xindices = np.tile(np.arange(MeshNumberX), (MeshNumberY, 1))
Yindices = np.tile(np.reshape(np.arange(MeshNumberY), (MeshNumberY, 1)), (1, MeshNumberX))
def image_pixel_array(x, y):
    return np.array([5 * x, 5 * y, 255])
if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    result_list = np.array(pool.starmap(image_pixel_array, zip(Xindices, Yindices)))
    print(result_list)
The input arrays Xindices and Yindices were:
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
and
[[0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1]
[2 2 2 2 2 2 2 2 2 2]
[3 3 3 3 3 3 3 3 3 3]
[4 4 4 4 4 4 4 4 4 4]]
respectively, with the corresponding output being:
[[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]
[array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]
[array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]
[array([15, 15, 15, 15, 15, 15, 15, 15, 15, 15])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]
[array([20, 20, 20, 20, 20, 20, 20, 20, 20, 20])
array([ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]) 255]]
My goal is to receive an output more like:
[[[0 0 255] [5 0 255] [10 0 255] [15 0 255] [20 0 255] [25 0 255] [30 0 255] [35 0 255] [40 0 255] [45 0 255]]
 [[0 5 255] [5 5 255] [10 5 255] [15 5 255] [20 5 255] [25 5 255] [30 5 255] [35 5 255] [40 5 255] [45 5 255]]
etc.
Suggestions for setting up my arrays more efficiently are also welcome, as I'm fairly new to this.
This was all written in Python 3.7.
Thank you in advance for the help!
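On the request for suggestions about setting up the arrays: for a per-pixel function this simple, a vectorized sketch (an alternative to a process pool, not a multiprocessing fix) can build the whole (MeshNumberY, MeshNumberX, 3) array at once. Here np.indices replaces the two np.tile calls; pixels is a name introduced for illustration.

```python
import numpy as np

MeshNumberY = 5
MeshNumberX = 10

# ys[i, j] == i and xs[i, j] == j, both with shape (MeshNumberY, MeshNumberX);
# these match Yindices and Xindices from the question.
ys, xs = np.indices((MeshNumberY, MeshNumberX))
# Stack the three channels along a new last axis -> shape (MeshNumberY, MeshNumberX, 3)
pixels = np.stack([5 * xs, 5 * ys, np.full_like(xs, 255)], axis=-1)
print(pixels.shape)
print(pixels[1, :3])
```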

I tried this:
import multiprocessing
import numpy as np
MeshNumberY = 5
MeshNumberX = 10
result_list = np.zeros((MeshNumberX, MeshNumberY, 3))
Xindices = np.tile(np.arange(MeshNumberX), (MeshNumberY, 1))
Yindices = np.tile(np.reshape(np.arange(MeshNumberY), (MeshNumberY, 1)), (1, MeshNumberX))
# Zindices only supplies an array of the right shape for the constant 255 channel
Zindices = Yindices.copy()
def image_pixel_array(x, y, z):
    # x, y, z are whole rows of shape (MeshNumberX,); transposing gives one [5x, 5y, 255] triple per element
    return np.transpose([5 * x, 5 * y, z * 0 + 255])
if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    result_list = np.array(pool.starmap(image_pixel_array, zip(Xindices, Yindices, Zindices)))
    # result_list already has shape (MeshNumberY, MeshNumberX, 3), so this reshape leaves it unchanged
    print(np.reshape(result_list, (MeshNumberY, MeshNumberX, 3), order='F'))
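If the work per element really does need a process pool, another sketch is to flatten the index grids so that starmap receives one (x, y) pair per task, then reshape the results; this relies on Pool.starmap returning results in input order. The per-element function is the one from the question; the with-block, the default Pool() size and the names pairs and flat are my choices. For a function this cheap, per-task overhead dominates, so the chunksize argument of starmap (or a vectorized approach) is worth considering.

```python
import multiprocessing

import numpy as np

MeshNumberY = 5
MeshNumberX = 10


def image_pixel_array(x, y):
    # Same per-pixel function as in the question: one triple per (x, y) pair
    return np.array([5 * x, 5 * y, 255])


if __name__ == '__main__':
    # ys[i, j] == i and xs[i, j] == j, both with shape (MeshNumberY, MeshNumberX)
    ys, xs = np.indices((MeshNumberY, MeshNumberX))
    # Flatten so that starmap gets one (x, y) pair per task instead of one row per task
    pairs = zip(xs.ravel(), ys.ravel())
    with multiprocessing.Pool() as pool:
        flat = pool.starmap(image_pixel_array, pairs)
    # Results come back in input (row-major) order, so they fold back into the grid
    result = np.array(flat).reshape(MeshNumberY, MeshNumberX, 3)
    print(result[1, :3])
```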
