Python: Creating list of subarrays

I have a massive array, but for illustration I am using an array of size 14. I have another array that contains the sizes 2, 3, 3, 6. How do I efficiently, without a for loop, create a list of new arrays such that:
import numpy as np
A = np.array([1,2,4,5,7,1,2,4,5,7,2,8,12,3]) # array with 1 axis
subArraysizes = np.array([2, 3, 3, 6]) # sums to the number of elements in A
# desired result
B = [[1,2],
     [4,5,7],
     [1,2,4],
     [5,7,2,8,12,3]]
That is, take the first 2 elements of A and store them in B, then take the next 3 elements of A, and so on, in the order they appear in A.

You can use np.split -
B = np.split(A,subArraysizes.cumsum())[:-1]
Sample run -
In [75]: A
Out[75]: array([ 1, 2, 4, 5, 7, 1, 2, 4, 5, 7, 2, 8, 12, 3])
In [76]: subArraysizes
Out[76]: array([2, 3, 3, 6])
In [77]: np.split(A,subArraysizes.cumsum())[:-1]
Out[77]:
[array([1, 2]),
array([4, 5, 7]),
array([1, 2, 4]),
array([ 5, 7, 2, 8, 12, 3])]
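A small variant of the same idea: dropping the last cumulative sum before splitting means no empty trailing array is produced, so nothing has to be trimmed afterwards. This is a minimal sketch that assumes the sizes sum to len(A); the names sub_array_sizes and split_points are illustrative.

```python
import numpy as np

A = np.array([1, 2, 4, 5, 7, 1, 2, 4, 5, 7, 2, 8, 12, 3])
sub_array_sizes = np.array([2, 3, 3, 6])  # assumed to sum to len(A)

# Split points are the running totals. The last total equals len(A),
# so it is dropped here instead of trimming an empty piece afterwards.
split_points = np.cumsum(sub_array_sizes)[:-1]
B = np.split(A, split_points)

for piece in B:
    print(piece)
```

If the sizes do not sum to len(A), the last piece simply absorbs whatever is left over, so it may be worth checking the sum first.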

Related

numpy.union that preserves order

Two arrays have been produced by dropping random values of an original array (with unique and unsorted elements):
orig = np.array([2, 1, 7, 5, 3, 8])
Let's say these arrays are:
a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3, 8])
Given just these two arrays, I need to merge them so that the dropped values are on their correct positions.
The result should be:
result = np.array([2, 1, 7, 3, 8])
Another example:
a1 = np.array([2, 1, 7, 5, 8])
b1 = np.array([2, 5, 3, 8])
# the result should be: [2, 1, 7, 5, 3, 8]
Edit:
This question is ambiguous because it is unclear what to do in this situation:
a2 = np.array([2, 1, 7, 8])
b2 = np.array([2, 5, 3, 8])
# the result should be: ???
What I have in reality + solution:
Elements of these arrays are indices of two data frames containing time series. I can use pandas.merge_ordered in order to achieve the ordered indices as I want.
My previous attempts:
numpy.union1d is not suitable, because it always sorts:
np.union1d(a, b)
# array([1, 2, 3, 7, 8]) - not what I want
Maybe pandas could help?
These methods use the first array in full, and then append the leftover values of the second one:
pd.concat([pd.Series(index=a, dtype=int), pd.Series(index=b, dtype=int)], axis=1).index.to_numpy()
pd.Index(a).union(b, sort=False).to_numpy() # jezrael's version
# array([2, 1, 7, 8, 3]) - not what I want
The idea is to interleave both arrays, flatten the result, and then remove duplicates while keeping the order of first appearance:
a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3, 8])
c = np.vstack((a, b)).ravel(order='F')
_, idx = np.unique(c, return_index=True)
c = c[np.sort(idx)]
print (c)
[2 1 7 3 8]
Pandas solution:
c = pd.DataFrame([a,b]).unstack().unique()
print (c)
[2 1 7 3 8]
If the arrays have different numbers of values:
a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3])
c = pd.DataFrame({'a':pd.Series(a), 'b':pd.Series(b)}).stack().astype(int).unique()
print (c)
[2 1 7 3 8]
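The NumPy version of the interleave-then-deduplicate idea can be wrapped in a small helper. This is only a sketch: the name ordered_union is hypothetical, it assumes both inputs are 1D and of equal length, and, as the question's edit points out, interleaving is a heuristic that cannot recover the original order in every case.

```python
import numpy as np

def ordered_union(a, b):
    """Interleave two equal-length 1D arrays, then keep first occurrences."""
    # Column-major ravel of the stacked rows gives a[0], b[0], a[1], b[1], ...
    interleaved = np.vstack((a, b)).ravel(order="F")
    # np.unique sorts the values, but return_index reports where each value
    # first appeared, so sorting those indices restores the appearance order.
    _, first_idx = np.unique(interleaved, return_index=True)
    return interleaved[np.sort(first_idx)]

a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3, 8])
result = ordered_union(a, b)
print(result)
```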

Create a 1D array of 1D arrays in Python

How can I create a 1D array of 1D arrays in Python? That is, something like:
a = [array([0]) array([1]) array([2]) array([3])]
If I create a list of arrays and cast it, I obtain a matrix:
a = [array([1]), array([2])]
b = np.asarray(a)
then b.shape is (2, 1), but if I reshape it:
c = b.reshape(-1)
then c = array([1, 2]) which is an array of ints.
Is there any way to avoid this? It is worth noting that the inner arrays have shape (1,).
Ok, found. The solution is to create an empty array with dtype object and assign there a list of arrays.
a = [array([1]), array([2])]
b = np.empty(len(a), dtype=object)
b[:] = a
And now b = array([array([1]), array([2])], dtype=object)
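The difference between the two constructions can be checked directly. The sketch below fills the object array one element at a time, a slightly more conservative variant of the slice assignment above that avoids any broadcasting of the right-hand side; the names inner, outer and stacked are illustrative.

```python
import numpy as np

inner = [np.array([1]), np.array([2])]

# Pre-allocating with dtype=object keeps NumPy from stacking the
# equal-shaped inner arrays into a single 2D array.
outer = np.empty(len(inner), dtype=object)
for i, arr in enumerate(inner):
    outer[i] = arr  # each slot holds a reference to a whole array

# For comparison: a plain conversion stacks the inner arrays.
stacked = np.asarray(inner)

print(outer.shape, outer.dtype)
print(stacked.shape, stacked.dtype)
```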
Do you mean something like this:
ans = [np.array([i]) for i in range(4)]
print (ans)
Output
[array([0]), array([1]), array([2]), array([3])]
You can either have a matrix-like list, when the shape of the arrays is all the same:
matrix_like_list = np.array([np.arange(10) for i in range(3)])
>>> array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
with shape (3, 10), or you can have a list of arrays when the size of at least one array is different:
list_of_arrays = np.array([np.arange(np.random.randint(10)) for i in range(3)], dtype=object) # dtype=object is required for ragged input in recent NumPy versions
>>> array([array([0, 1, 2, 3, 4, 5]), array([0, 1, 2, 3, 4, 5, 6]),
array([0, 1, 2, 3, 4, 5, 6])], dtype=object)
The resulting object will have shape (3,).
Apart from explicitly building an object array, as in the answer above, there are no other options.

How to roll two arrays of different dimensions into a one-dimensional array in Python

I have two arrays (a, b) with different m×n dimensions.
I need to know how I can roll these two arrays into a single one-dimensional array.
I called flatten() on both a and b and then combined them, but what I get is a list containing two one-dimensional arrays (a, b):
a = np.array([[1,2,3,4],[3,4,5,6],[4,5,6,7]]) #3x4 array
b = np.array([ [1,2],[2,3],[3,4],[4,5],[5,6]]) #5x2 array
result = [a.flatten(),b.flatten()]
print(result)
[array([1, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7]), array([1, 2, 2, 3, ... 5, 6])]
In MATLAB, I would do it like this:
res = [a(:);b(:)]
Also, how can I retrieve a and b back from the result?
Use ravel + concatenate:
>>> np.concatenate((a.ravel(), b.ravel()))
array([1, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6])
ravel returns a 1D view of the array where possible (and a copy otherwise), so it is a cheap operation. concatenate joins the flattened arrays together, returning a new array.
As an aside, if you want to be able to retrieve these arrays back, you'll need to store their shapes in some variable.
i = a.shape
j = b.shape
res = np.concatenate((a.ravel(), b.ravel()))
Later, to retrieve a and b from res,
a = res[:np.prod(i)].reshape(i)
b = res[np.prod(i):].reshape(j)
a
array([[1, 2, 3, 4],
[3, 4, 5, 6],
[4, 5, 6, 7]])
b
array([[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6]])
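The same bookkeeping extends to any number of arrays. Below is a minimal sketch; the helper names pack and unpack are hypothetical, and the original shapes must be kept alongside the flat array.

```python
import numpy as np

def pack(arrays):
    """Flatten arrays into one 1D array and remember each original shape."""
    shapes = [arr.shape for arr in arrays]
    flat = np.concatenate([arr.ravel() for arr in arrays])
    return flat, shapes

def unpack(flat, shapes):
    """Cut the flat array back into pieces and restore each shape."""
    sizes = [int(np.prod(shape)) for shape in shapes]
    # Split at the running totals, dropping the last one (the total length).
    pieces = np.split(flat, np.cumsum(sizes)[:-1])
    return [piece.reshape(shape) for piece, shape in zip(pieces, shapes)]

a = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [4, 5, 6, 7]])  # 3x4
b = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])    # 5x2

res, shapes = pack([a, b])
a_back, b_back = unpack(res, shapes)
print(res.shape, a_back.shape, b_back.shape)
```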
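How about changing the middle line to:
result = np.concatenate([a.flatten(), b.flatten()])
Or even more simply (if you know there's always exactly 2 arrays):
result = np.append(a, b) # with no axis given, both inputs are flattened first
Note that a Python list has no flatten() method, and + on NumPy arrays adds element-wise rather than concatenating.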

Pandas: Can you access rolling window items

Can you access the items of a pandas rolling window object?
rs = pd.Series(range(10))
rs.rolling(window = 3)
# prints
Rolling [window=3,center=False,axis=0]
Can I get the windows as groups?
[0,1,2]
[1,2,3]
[2,3,4]
I will start by saying that this reaches into the internal implementation, but it is how to compute the indexers the same way pandas does, if you really want to.
You will need v0.19.0rc1 (just about released); you can install it with conda install -c pandas pandas=0.19.0rc1
In [41]: rs = pd.Series(range(10))
In [42]: rs
Out[42]:
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
dtype: int64
# this reaches into an internal implementation
# the first 3 is the window size, the second is the minimum
# number of periods we need
In [43]: start, end, _, _, _, _ = pandas._window.get_window_indexer(rs.values,3,3,None,use_mock=False)
# starting index
In [44]: start
Out[44]: array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7])
# ending index
In [45]: end
Out[45]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# window size
In [3]: end-start
Out[3]: array([1, 2, 3, 3, 3, 3, 3, 3, 3, 3])
# the indexers
In [47]: [np.arange(s, e) for s, e in zip(start, end)]
Out[47]:
[array([0]),
array([0, 1]),
array([0, 1, 2]),
array([1, 2, 3]),
array([2, 3, 4]),
array([3, 4, 5]),
array([4, 5, 6]),
array([5, 6, 7]),
array([6, 7, 8]),
array([7, 8, 9])]
So this is fairly trivial in the fixed-window case, but it becomes extremely useful in a variable-window scenario; e.g. in 0.19.0 you can specify things like 2S to aggregate by time.
All of that said, getting these indexers is not particularly useful. You generally want to do something with the results. That is the point of the aggregation functions, or .apply if you want to aggregate generically.
Here's a workaround, but waiting to see if anyone has pandas solution:
def rolling_window(a, step):
    # add a trailing axis of length `step` that walks the same memory
    shape = a.shape[:-1] + (a.shape[-1] - step + 1, step)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
rolling_window(rs.values, 3) # pass the underlying NumPy array, not the Series
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
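On NumPy 1.20 or later, sliding_window_view offers the same kind of strided view without computing shapes and strides by hand. This is a sketch under that version assumption; values stands in for rs.values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.arange(10)  # stands in for rs.values

# Each row is one full window; the result is a read-only view, not a copy.
windows = sliding_window_view(values, window_shape=3)

print(windows.shape)
print(windows[0], windows[-1])
```

Only full windows are returned, so 10 values with a window of 3 give 8 rows.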
This is solved in pandas 1.1, as the rolling object is now an iterable:
[window.tolist() for window in rs.rolling(window=3) if len(window) == 3]
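The length filter matters because iteration also yields the shorter windows at the start of the series. A short sketch, assuming pandas 1.1 or later; the names all_windows and full_windows are illustrative.

```python
import pandas as pd

rs = pd.Series(range(10))

# Iterating a Rolling object yields one Series per position,
# including the partial windows at the beginning.
all_windows = [window.tolist() for window in rs.rolling(window=3)]

# Keep only the windows that reached the full size.
full_windows = [w for w in all_windows if len(w) == 3]

print(all_windows[:3])
print(full_windows[:3])
```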

Finding differences between all values in a List

I want to find the differences between all values in a numpy array and append it to a new list.
Example: a = [1,4,2,6]
result : newlist= [3,1,5,3,2,2,1,2,4,5,2,4]
That is, for each value i of a, compute the absolute difference between i and every other value in the list.
At this point I have been unable to find a solution.
You can do this:
a = [1,4,2,6]
newlist = [abs(i-j) for i in a for j in a if i != j]
Output:
print(newlist)
[3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4]
I believe what you are trying to do is to calculate absolute differences between elements of the input list, but excluding the self-differences. So, with that idea, this could be one vectorized approach also known as array programming -
# Input list
a = [1,4,2,6]
# Convert input list to a numpy array
arr = np.array(a)
# Calculate absolute differences between each element
# against all elements to give us a 2D array
sub_arr = np.abs(arr[:,None] - arr)
# Get diagonal indices for the 2D array
N = arr.size
rem_idx = np.arange(N)*(N+1)
# Remove the diagonal elements for the final output
out = np.delete(sub_arr,rem_idx)
Sample run to show the outputs at each step -
In [60]: a
Out[60]: [1, 4, 2, 6]
In [61]: arr
Out[61]: array([1, 4, 2, 6])
In [62]: sub_arr
Out[62]:
array([[0, 3, 1, 5],
[3, 0, 2, 2],
[1, 2, 0, 4],
[5, 2, 4, 0]])
In [63]: rem_idx
Out[63]: array([ 0, 5, 10, 15])
In [64]: out
Out[64]: array([3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4])
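An alternative way to drop the self-differences is a boolean mask over the off-diagonal positions. This is a sketch of the same broadcasting idea, not the answer's exact method. Because the mask excludes entries by position rather than by value, repeated values in the input keep their zero differences against each other, unlike a filter such as i != j.

```python
import numpy as np

a = [1, 4, 2, 6]
arr = np.asarray(a)

# Pairwise absolute differences via broadcasting: shape (N, N).
diffs = np.abs(arr[:, None] - arr)

# True everywhere except on the diagonal (an element against itself).
off_diagonal = ~np.eye(arr.size, dtype=bool)

# Boolean indexing walks the matrix row by row.
out = diffs[off_diagonal]
print(out)
```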
