Breaking down numpy array into smaller arrays of same value [Python]

I have the following numpy array:
array=[1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7]
I need to break this array into smaller arrays of same values such as
[1,1,1,1] and [3,3,3]
My code for this is as follows but it doesn't work:
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq)-size))
counter=0
sub_arr=[]
arr=[]
for i in range(len(array)):
    if(array[i]==array[i+1]):
        counter+=1
    else:
        break
subarr=chunker(array,counter)
arr.append(sub_arr)
array=array[counter:]
What is an efficient way to break down the array into smaller arrays of equal values?

A numpy solution for floats and integers:
import numpy as np
a = np.asarray([1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7])
#calculate differences between neighbouring elements and get index where element changes
#sample output for index would be [ 4 6 9 10 16]
index = np.where(np.diff(a) != 0)[0] + 1
#separate arrays
print(np.split(a, index))
Sample output:
[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]
Because it relies on np.diff, this method does not work for strings. In that case, go with DyZ's itertools approach.
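The diff-based split above can be adapted to non-numeric data by comparing neighbours with != instead of subtracting them. The following is only a sketch under that assumption; split_runs is a hypothetical helper name, not something from the answer.

```python
import numpy as np

def split_runs(values):
    """Split a 1-D sequence into sub-arrays of consecutive equal values."""
    a = np.asarray(values)
    if a.size == 0:
        return []
    # Compare each element with its predecessor; this works for any dtype
    # that supports elementwise !=, including strings.
    boundaries = np.flatnonzero(a[1:] != a[:-1]) + 1
    return np.split(a, boundaries)

words = split_runs(["a", "a", "b", "c", "c", "c"])
print([w.tolist() for w in words])  # [['a', 'a'], ['b'], ['c', 'c', 'c']]
```

For numeric input it should produce the same pieces as the np.diff version.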

NumPy has poor support for this kind of grouping. I suggest using itertools, which operates on lists.
import numpy as np
from itertools import groupby
[np.array(list(data)) for _, data in groupby(array)]
#[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), \
# array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]
This is not necessarily the most efficient method, because it involves conversions to and from lists.

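Here's an approach using Pandas. It collects all equal values into one group regardless of position and orders the groups by frequency, so it matches the run-based answers only when equal values are contiguous, as they are here:
import pandas as pd
(pd.Series(array)
   .value_counts()
   .reset_index()
   .apply(lambda x: [x["index"]] * x[0], axis=1))
Explanation:
First, convert array to a Series, and use value_counts() to get a count of each unique entry:
counts = pd.Series(array).value_counts().reset_index()
   index  0
0      6  6
1      1  4
2      3  3
3      2  2
4      7  1
5      5  1
(In pandas 2.0 and later the count column is named "count" instead of 0, so x[0] becomes x["count"].)
Then recreate each repeated-element list, using apply():
counts.apply(lambda x: [x["index"]] * x[0], axis=1)
0    [6, 6, 6, 6, 6, 6]
1          [1, 1, 1, 1]
2             [3, 3, 3]
3                [2, 2]
4                   [7]
5                   [5]
dtype: object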
You can use the .values property to convert from a Series of lists to a list of lists, if needed.
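If consecutive runs are what you need (so that a value reappearing later starts a new group), a common pandas idiom is to label runs with shift and cumsum. This is a sketch rather than part of the answer above; run_id and runs are illustrative names.

```python
import pandas as pd

array = [1, 1, 1, 1, 2, 2, 3, 3, 3, 5, 6, 6, 6, 6, 6, 6, 7]
s = pd.Series(array)

# A new run starts wherever a value differs from the one before it;
# the cumulative sum of those flags gives every run its own id.
run_id = s.ne(s.shift()).cumsum()

# Group by run id (groups come back in run order) and collect each run.
runs = [group.tolist() for _, group in s.groupby(run_id)]
print(runs)
```

Unlike value_counts(), this keeps the groups in their original order.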

Related

How can I merge rows in np matrix?

I've got a numpy matrix that has 2 rows and N columns, e.g. (if N=4):
[[1 3 5 7]
 [2 4 6 8]]
The goal is to create the string 1,2,3,4,5,6,7,8.
Merge the rows such that the elements from the first row have the odd positions (1, 3, ..., 2N - 1) (the index starts from 1) and the elements from the second row have the even positions (2, 4, ..., 2N).
The following code works but it isn't really nice:
xs = []
for i in range(number_of_cols):
    xs.append(nums.item(0, i))
ys = []
for i in range(number_of_cols):
    ys.append(nums.item(1, i))
nums_str = ""
for i in range(number_of_cols):
    nums_str += '{},{},'.format(xs[i], ys[i])
Join the resulting list with a comma as the delimiter (','.join(row)).
How can I merge the rows using built in functions (or just in a more elegant way overall)?

Specify F order when flattening (or ravel):
In [279]: arr = np.array([[1,3,5,7],[2,4,6,8]])
In [280]: arr
Out[280]:
array([[1, 3, 5, 7],
       [2, 4, 6, 8]])
In [281]: arr.ravel(order='F')
Out[281]: array([1, 2, 3, 4, 5, 6, 7, 8])

Joining rows can be done this way:
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> np.hstack([a[i,:] for i in range(a.shape[0])])
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Then it's simple to convert this array into a string.
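Note that stacking the rows this way places them end to end rather than interleaving them. The sketch below (variable names are illustrative) checks that the hstack result equals a plain row-major flatten, and shows the column-first read that the question's interleaving needs, followed by the string conversion.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
stacked = np.hstack([a[i, :] for i in range(a.shape[0])])

# Stacking the rows end to end is the same as a row-major flatten.
print(np.array_equal(stacked, a.ravel()))  # True

# Interleaving the rows instead means reading column by column.
interleaved_str = ",".join(map(str, a.ravel(order="F")))
print(interleaved_str)  # 0,4,8,1,5,9,2,6,10,3,7,11
```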

Here's one way of doing it:
out_str = ','.join(nums.T.ravel().astype('str'))
We first transpose the array with .T, then flatten it with .ravel(), then convert each element from int to str, and finally apply ','.join() to combine all the str elements.
Trying it out:
import numpy as np
nums = np.array([[1,3,5,7],[2,4,6,8]])
out_str = ','.join(nums.T.ravel().astype('str'))
print (out_str)
Result:
1,2,3,4,5,6,7,8

how to loop over one axis of numpy array, returning inner arrays instead of values

I have several arrays of data, collected into a single array. I want to loop over it and do operations on each of the inner arrays. What would be the correct way to do this in NumPy?
import numpy as np
a = np.arange(9)
a = a.reshape(3,3)
for val in np.nditer(a):
    print(val)
and this gives:
0
1
2
3
4
5
6
7
8
But what I want is (something like):
array([0 1 2])
array([3 4 5])
array([6 7 8])
I have been looking at this page: https://docs.scipy.org/doc/numpy-1.15.0/reference/arrays.nditer.html but so far have not found the answer. I also know I can do it with a plain for loop but I am assuming there is a more correct way. Any help would be appreciated, thank you.

You can use apply_along_axis but it depends on what your ultimate goal/output is. Here is a simple example showing this.
a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
np.apply_along_axis(lambda x: x + 1, 0, a)
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
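Because adding 1 gives the same result along either axis, the example above does not show which slices the function receives. Here is a small sketch with a per-row reduction; spread is a hypothetical helper chosen only for illustration.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

def spread(row):
    # `row` is a 1-D array holding one full row of `a`.
    return row.max() - row.min()

# axis=1 hands each row to `spread`; axis=0 would hand it each column.
per_row = np.apply_along_axis(spread, 1, a)
print(per_row)  # [2 2 2]
```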

Actually, reshape returns a 2-D array, and iterating over it yields its rows as 1-D arrays.
If you would like to get each individual row you could just use
a = np.arange(9)
a = a.reshape(3,3)
for val in a:
    print(val)

Why not simply loop over the array? The for loop yields the individual rows:
import numpy as np
a = np.arange(9)
a = a.reshape(3,3)
for val in a:
    print(val)
# [0 1 2]
# [3 4 5]
# [6 7 8]

Randomly shuffle items in each row of numpy array

I have a numpy array like the following:
Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
I want to shuffle the items of each row separately, but do not want the shuffle to be the same for each row (as in several examples just shuffle column order).
For example, I want an output like the following:
output = np.array([[3, 2, 1],
                   [4, 6, 5],
                   [7, 3, 1]])
How can I shuffle each of the rows randomly in an efficient way? My actual np array is over 100000 rows and 1000 columns.

If you only want to shuffle the columns, you can perform the shuffle on the transpose of your matrix:
In [86]: np.random.shuffle(Xtrain.T)
In [87]: Xtrain
Out[87]:
array([[2, 3, 1],
       [5, 6, 4],
       [7, 3, 1]])
Note that np.random.shuffle() on a 2D array shuffles the rows, not the items in each row; i.e. it changes the position of the rows. Therefore, if you change the position of the rows of the transposed matrix, you're actually shuffling the columns of your original array.
If you still want a completely independent shuffle you can create random indexes for each row and then create the final array with a simple indexing:
In [172]: def crazyshuffle(arr):
     ...:     x, y = arr.shape
     ...:     rows = np.indices((x,y))[0]
     ...:     cols = [np.random.permutation(y) for _ in range(x)]
     ...:     return arr[rows, cols]
     ...:
Demo:
In [173]: crazyshuffle(Xtrain)
Out[173]:
array([[1, 3, 2],
       [6, 5, 4],
       [7, 3, 1]])
In [174]: crazyshuffle(Xtrain)
Out[174]:
array([[2, 3, 1],
       [4, 6, 5],
       [1, 3, 7]])

From: https://github.com/numpy/numpy/issues/5173
def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.
    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis. `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return

This solution is not efficient by any means, but I had fun thinking about it, so I wrote it down. Basically, you ravel the array and create an array of row labels and an array of indices. You shuffle the index array, and index both the original and the row-label arrays with it. Then you apply a stable argsort to the row labels to gather the data back into rows. Apply that index, reshape, and voilà: the data is shuffled independently by row:
import numpy as np
r, c = 3, 4 # x.shape
x = np.arange(12) + 1 # Already raveled
inds = np.arange(x.size)
rows = np.repeat(np.arange(r).reshape(-1, 1), c, axis=1).ravel()
np.random.shuffle(inds)
x = x[inds]
rows = rows[inds]
inds = np.argsort(rows, kind='mergesort')
x = x[inds].reshape(r, c)

We can create a random 2-dimensional matrix, sort it by each row, and then use the index matrix given by argsort to reorder the target matrix.
target = np.random.randint(10, size=(5, 5))
# [[7 4 0 2 5]
# [5 6 4 8 7]
# [6 4 7 9 5]
# [8 6 6 2 8]
# [8 1 6 7 3]]
shuffle_helper = np.argsort(np.random.rand(5,5), axis=1)
# [[0 4 3 2 1]
# [4 2 1 3 0]
# [1 2 3 4 0]
# [1 2 4 3 0]
# [1 2 3 0 4]]
target[np.arange(shuffle_helper.shape[0])[:, None], shuffle_helper]
# array([[7, 5, 2, 0, 4],
# [7, 4, 6, 8, 5],
# [4, 7, 9, 5, 6],
# [6, 6, 8, 2, 8],
# [1, 6, 7, 8, 3]])
Explanation
We use np.random.rand and argsort to mimic the effect of shuffling.
np.random.rand supplies the randomness.
Then argsort with axis=1 ranks the values within each row, which yields a per-row index that can be used for reordering.
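The argsort trick above can be wrapped in a small helper. This is only a sketch: shuffle_rows and its seed parameter are hypothetical names, and it swaps the fancy-indexing step for np.take_along_axis (available since NumPy 1.15), which performs the same per-row gather.

```python
import numpy as np

def shuffle_rows(arr, seed=None):
    """Return a copy of `arr` with every row shuffled independently."""
    rng = np.random.default_rng(seed)
    # argsort of independent random keys gives one permutation per row.
    order = np.argsort(rng.random(arr.shape), axis=1)
    # Gather the elements of each row in its own permuted order.
    return np.take_along_axis(arr, order, axis=1)

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
shuffled = shuffle_rows(Xtrain, seed=0)
print(shuffled)
```

Each output row holds the same values as the corresponding input row; only their order differs.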

Let's say you have an array a with shape 100000 x 1000.
b = np.random.choice(100000 * 1000, (100000, 1000), replace=False)
ind = np.argsort(b, axis=1)
a_shuffled = a[np.arange(100000)[:,np.newaxis], ind]
I don't know if this is faster than a loop, because it needs sorting, but this solution may help you come up with something better, for example using np.argpartition instead of np.argsort.

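You may use Pandas:
import numpy as np
import pandas as pd
df = pd.DataFrame(Xtrain)
# apply returns a new DataFrame; df itself is left unchanged
shuffled = df.apply(lambda x: np.random.permutation(x), axis=1, raw=True)
shuffled.values
Change the keyword to axis=0 if you want to shuffle the items within each column instead.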
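As an alternative that none of the answers above mention, newer NumPy versions (1.20 and later, as far as I know) provide Generator.permuted, which shuffles each slice along an axis independently. A minimal sketch, assuming such a NumPy version is installed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

# permuted shuffles every row on its own and returns a new array;
# Xtrain is left untouched unless it is also passed as out=.
shuffled = rng.permuted(Xtrain, axis=1)
print(shuffled)
```

This avoids both the Python-level loop and the extra sort, so it may be worth timing against the other approaches on the full 100000 x 1000 array.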

Summing columns of a 2D tensor according to multiple sets of indices

In tensorflow, I would like to sum columns of a 2D tensor according to multiple sets of indices.
For example:
Summing the columns of the following tensor
[[1 2 3 4 5]
[5 4 3 2 1]]
according to the 2 sets of indices (first set to sum columns 0 1 2 and second set to sum columns 3 4)
[[0,1,2],[3,4]]
should give 2 columns
[[6 9]
[12 3]]
Remarks:
All columns' indices will appear in one and only one set of indices.
This has to be done in Tensorflow, so that gradient can flow through this operation.
Do you have any idea how to perform that operation? I suspect I need to use tf.slice and probably tf.while_loop.

You can do that with tf.segment_sum (TensorFlow 1.x API shown; in 2.x the function is tf.math.segment_sum and no session is needed):
import tensorflow as tf
nums = [[1, 2, 3, 4, 5],
        [5, 4, 3, 2, 1]]
column_idx = [[0, 1, 2], [3, 4]]
with tf.Session() as sess:
    # Data as TF tensor
    data = tf.constant(nums)
    # Make segment ids
    segments = tf.concat([tf.tile([i], [len(lst)]) for i, lst in enumerate(column_idx)], axis=0)
    # Select columns
    data_cols = tf.gather(tf.transpose(data), tf.concat(column_idx, axis=0))
    col_sum = tf.transpose(tf.segment_sum(data_cols, segments))
    print(sess.run(col_sum))
Output:
[[ 6 9]
[12 3]]

I know of a crude way of solving this in NumPy, if you don't mind a NumPy solution.
import numpy as np
mat = np.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])
grid1 = np.ix_([0], [0, 1, 2])
item1 = np.sum(mat[grid1])
grid2 = np.ix_([1], [0, 1, 2])
item2 = np.sum(mat[grid2])
grid3 = np.ix_([0], [3, 4])
item3 = np.sum(mat[grid3])
grid4 = np.ix_([1], [3, 4])
item4 = np.sum(mat[grid4])
result = np.array([[item1, item3], [item2, item4]])
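The four hand-written sums above can be generalised to any number of index sets. The following is a sketch in plain NumPy (so, like the answer above, it does not address the TensorFlow gradient requirement); column_idx is taken from the question and result is an illustrative name.

```python
import numpy as np

mat = np.array([[1, 2, 3, 4, 5],
                [5, 4, 3, 2, 1]])
column_idx = [[0, 1, 2], [3, 4]]

# One output column per index set: select its columns and sum across them.
result = np.stack([mat[:, idx].sum(axis=1) for idx in column_idx], axis=1)
print(result)
# [[ 6  9]
#  [12  3]]
```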

Translate integers in a numpy array to a contiguous range 0...n

I would like to translate arbitrary integers in a numpy array to a contiguous range 0...n, like this:
source: [2 3 4 5 4 3]
translating [2 3 4 5] -> [0 1 2 3]
target: [0 1 2 3 2 1]
There must be a better way than the following:
import numpy as np
# translate arbitrary integers in the source array to contiguous range 0...n
def translate_ids(source, source_ids, target_ids):
    target = source.copy()
    for i in range(len(source_ids)):
        x = source_ids[i]
        x_i = source == x
        target[x_i] = target_ids[i]
    return target
source = np.array([ 2, 3, 4, 5, 4, 3 ])
source_ids = np.unique(source)
target_ids = np.arange(len(source_ids))
target = translate_ids(source, source_ids, target_ids)
print("source:", source)
print("translating", source_ids, '->', target_ids)
print("target:", target)
What is it?

IIUC you can simply use np.unique's optional argument return_inverse, like so -
np.unique(source,return_inverse=True)[1]
Sample run -
In [44]: source
Out[44]: array([2, 3, 4, 5, 4, 3])
In [45]: np.unique(source,return_inverse=True)[1]
Out[45]: array([0, 1, 2, 3, 2, 1])
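To see why the inverse array is exactly the translated result, it helps to keep both return values. A short sketch (uniques and inverse are illustrative names):

```python
import numpy as np

source = np.array([2, 3, 4, 5, 4, 3])

# uniques is sorted; inverse[i] is the position of source[i] within uniques.
uniques, inverse = np.unique(source, return_inverse=True)
print(uniques)           # [2 3 4 5]
print(inverse)           # [0 1 2 3 2 1]

# Indexing uniques with inverse reconstructs the original array.
print(uniques[inverse])  # [2 3 4 5 4 3]
```

Keeping uniques around also gives you the lookup table for mapping the contiguous ids back to the original integers.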

pandas.factorize is one method:
import pandas as pd
lst = [2, 3, 4, 5, 4, 3]
res = pd.factorize(lst, sort=True)[0]
# [0 1 2 3 2 1]
Note: pd.factorize returns a (codes, uniques) tuple, and [0] selects the codes as an np.ndarray; with sort=True they match the inverse array returned by np.unique.
