I would like to translate arbitrary integers in a numpy array to a contiguous range 0...n, like this:
source: [2 3 4 5 4 3]
translating [2 3 4 5] -> [0 1 2 3]
target: [0 1 2 3 2 1]
There must be a better way than the following:
import numpy as np

def translate_ids(source, source_ids, target_ids):
    """Translate arbitrary integers in the source array to the contiguous range 0...n."""
    target = source.copy()
    for i in range(len(source_ids)):
        x = source_ids[i]
        x_i = source == x
        target[x_i] = target_ids[i]
    return target

source = np.array([2, 3, 4, 5, 4, 3])
source_ids = np.unique(source)
target_ids = np.arange(len(source_ids))
target = translate_ids(source, source_ids, target_ids)
print("source:", source)
print("translating", source_ids, "->", target_ids)
print("target:", target)
What is it?
IIUC you can simply use np.unique's optional argument return_inverse, like so -
np.unique(source,return_inverse=True)[1]
Sample run -
In [44]: source
Out[44]: array([2, 3, 4, 5, 4, 3])
In [45]: np.unique(source,return_inverse=True)[1]
Out[45]: array([0, 1, 2, 3, 2, 1])
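Since np.unique also returns the sorted unique values as the first element of the tuple, the source_ids/target_ids mapping from the question falls out of the same call:

```python
import numpy as np

source = np.array([2, 3, 4, 5, 4, 3])
# uniq holds the sorted unique values; inv maps each element of
# source to its position in uniq, i.e. the contiguous range 0...n
uniq, inv = np.unique(source, return_inverse=True)
print("translating", uniq, "->", np.arange(len(uniq)))
print("target:", inv)  # [0 1 2 3 2 1]
```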
pandas.factorize is one method:
import pandas as pd
lst = [2, 3, 4, 5, 4, 3]
res = pd.factorize(lst, sort=True)[0]
# [0 1 2 3 2 1]
Note: pd.factorize returns a tuple of (codes, uniques); taking [0] gives the codes as an np.ndarray, the same type np.unique returns.
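A quick check of the full factorize return value (codes plus uniques), for comparison with np.unique:

```python
import pandas as pd

codes, uniques = pd.factorize([2, 3, 4, 5, 4, 3], sort=True)
print(type(codes).__name__)  # ndarray -- not a Python list
print(codes)                 # [0 1 2 3 2 1]
print(uniques)               # [2 3 4 5]
```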
Related
If I have a 2D numpy array A:
[[6 9 6]
[1 1 2]
[8 7 3]]
And I have the row [1 1 2]. Clearly, [1 1 2] sits at index 1 of array A, but how do I find that index programmatically?
You can find the matching row's index using np.where:
import numpy as np
a = np.array([[6, 9, 6],
              [1, 1, 2],
              [8, 7, 3]])
row = [1, 1, 2]
i = np.where(np.all(a==row, axis=1))
print(i[0][0])
np.where returns a tuple of index arrays, which is why you need to index with [0][0] to obtain a plain int.
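Printing the intermediate result shows why the double indexing is needed:

```python
import numpy as np

a = np.array([[6, 9, 6],
              [1, 1, 2],
              [8, 7, 3]])
i = np.where(np.all(a == [1, 1, 2], axis=1))
print(i)        # (array([1]),) -- a tuple holding one index array
print(i[0][0])  # 1 -- first element of the first (and only) array
```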
One option:
a = np.array([[6, 9, 6],
              [1, 1, 2],
              [8, 7, 3]])
b = np.array([1, 1, 2])
np.nonzero((a == b).all(1))[0]
output: [1]
arr1 = [[6,9,6],[1,1,2],[8,7,3]]
ind = arr1.index([1,1,2])
Output:
ind = 1
EDIT for 2D np.array:
arr1 = np.array([[6,9,6],[1,1,2],[8,7,3]])
ind = [l for l in range(len(arr1)) if (arr1[l,:] == np.array([1,1,2])).all()]
import numpy as np
a = np.array([[6, 9, 6],
              [1, 1, 2],
              [8, 7, 3]])
b = np.array([1, 1, 2])
[x for x, y in enumerate(a) if (y == b).all()]  # enumerate keeps track of the row index
#output
[1]
How would you shift left a list by x in Python, and fill the empty values with zeros?
shift left by 1
input: [1, 2, 3, 4]
output: [2, 3, 4, 0]
shift left by 2
input [1, 2, 3, 4]
output [3, 4, 0, 0]
As far as I'm concerned, there's no built-in way, since Python lists are not fixed-size, but you can easily implement an algorithm which handles this:
def shift_left(arr, n):
    return arr[n:] + [0 for _ in range(n)]

or a bit more concise:

def shift_left(arr, n):
    return arr[n:] + [0] * n
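A quick check of the slicing version against the examples in the question:

```python
def shift_left(arr, n):
    # drop the first n elements, pad with n zeros on the right
    return arr[n:] + [0] * n

print(shift_left([1, 2, 3, 4], 1))  # [2, 3, 4, 0]
print(shift_left([1, 2, 3, 4], 2))  # [3, 4, 0, 0]
```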
You can concat two lists as:
arr[shift:]+[0]*shift
Or if you are a fan of chaining like me:
arr[shift:].__add__([0]*shift)
You can combine numpy's append and roll methods:
import numpy as np
def shift_left(x_, r):
    return np.append(np.roll(x_, -r)[:-r], [0 for _ in range(0, r)])
print(shift_left([1, 2, 3, 4], 1))
print(shift_left([1, 2, 3, 4], 2))
Result:
[2 3 4 0]
[3 4 0 0]
Explanation
When you use roll on a list:
print(np.roll([1, 2, 3, 4], -2))
Result:
[3 4 1 2]
Each element moves to the left by r positions (here r = 2, passed to roll as -2). But we don't want the last r elements, so:
print(np.roll([1, 2, 3, 4], -2)[:-2])
Result:
[3 4]
We want the last r values to be 0, so we append r zeros to the end of the array:
print(np.append(np.roll([1, 2, 3, 4], -2)[:-2], [0 for _ in range(0, 2)]))
Result:
[3 4 0 0]
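For NumPy arrays, the roll can also be skipped entirely; writing the shifted slice into a zero-filled array gives the same result (a sketch assuming 0 <= r <= len(x)):

```python
import numpy as np

def shift_left(x, r):
    # copy x[r:] into the front of a zero array of the same shape
    x = np.asarray(x)
    out = np.zeros_like(x)
    out[:x.size - r] = x[r:]
    return out

print(shift_left([1, 2, 3, 4], 2))  # [3 4 0 0]
```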
Problem
I have a 2D NumPy array, arr, and for each row, I would like to reverse a section of the array. Crucially, for each row, the start and stop indices must be unique. I can achieve this using the following.
import numpy as np
arr = np.repeat(np.arange(10)[np.newaxis, :], 3, axis=0)
reverse = np.sort(np.random.choice(arr.shape[1], [arr.shape[0], 2], False))
# arr
# array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# reverse
# array([[1, 7],
# [8, 9],
# [4, 6]])
Reverse each row between the start and stop indices in `reverse`:
for idx, (i, j) in enumerate(reverse):
    arr[idx, i:j+1] = arr[idx, i:j+1][::-1]
# arr
# array([[0, 7, 6, 5, 4, 3, 2, 1, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 9, 8],
# [0, 1, 2, 3, 6, 5, 4, 7, 8, 9]])
Question
Is this possible using basic slicing and indexing? I tried to use the output of reverse to form multiple slice objects, but was unsuccessful.
Update
A simple comparison of the original method vs answer. For my data, the solution is only required to deal with 2D matrices with shape (50, 100).
import numpy as np
def reverse_one(arr, n):
    temp = np.repeat(arr.copy(), n, axis=0)
    reverse = np.sort(np.random.choice(temp.shape[1], [n, 2], False))
    for idx, (i, j) in enumerate(reverse):
        temp[idx, i:j+1] = temp[idx, i:j+1][::-1]
    return temp

def reverse_two(arr, n):
    temp = np.repeat(arr.copy(), n, axis=0)
    reverse = np.sort(np.random.choice(temp.shape[1], [n, 2], False))
    rev = np.ravel_multi_index((np.arange(n)[:, np.newaxis], reverse), temp.shape)
    rev[:, 1] += 1
    idx = np.arange(temp.size).reshape(temp.shape)
    s = np.searchsorted(rev.ravel(), idx, 'right')
    m = (s % 2 == 1)
    g = rev[s[m] // 2]
    idx[m] = g[:, 0] - (idx[m] - g[:, 1]) - 1
    return temp.take(idx)
m = 100
arr = np.arange(m)[np.newaxis, :]
print("reverse_one:")
%timeit reverse_one(arr, m//2)
print("=" * 40)
print("reverse_two:")
%timeit reverse_two(arr, m//2)
Running the code above in a Jupyter notebook gives the following results.
reverse_one:
1000 loops, best of 5: 202 µs per loop
========================================
reverse_two:
1000 loops, best of 5: 363 µs per loop
This was kinda tricky but I figured out one way to do it. Advanced indexing is expensive though so you'd have to see whether it is really faster or not depending on the data that you have.
import numpy as np
np.random.seed(0)
arr = np.repeat(np.arange(10)[np.newaxis, :], 3, axis=0)
reverse = np.sort(np.random.choice(arr.shape[1], [arr.shape[0], 2], False))
print(arr)
# [[0 1 2 3 4 5 6 7 8 9]
# [0 1 2 3 4 5 6 7 8 9]
# [0 1 2 3 4 5 6 7 8 9]]
print(reverse)
# [[2 8]
# [4 9]
# [1 6]]
# Get "flat" indices of the bounds
rev = np.ravel_multi_index((np.arange(arr.shape[0])[:, np.newaxis], reverse), arr.shape)
# Add one to the second bound (so it is first index after the slice)
rev[:, 1] += 1
# Make array of flat indices for the data
idx = np.arange(arr.size).reshape(arr.shape)
# Find the position of flat indices with respect to bounds
s = np.searchsorted(rev.ravel(), idx, 'right')
# For each "i" within a slice, "s[i]" is odd
m = (s % 2 == 1)
# Replace indices within slices with their reversed ones
g = rev[s[m] // 2]
idx[m] = g[:, 0] - (idx[m] - g[:, 1]) - 1
# Apply indices to array
res = arr.take(idx)
print(res)
# [[0 1 8 7 6 5 4 3 2 9]
# [0 1 2 3 9 8 7 6 5 4]
# [0 6 5 4 3 2 1 7 8 9]]
I have the following numpy array:
array=[1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7]
I need to break this array into smaller arrays of same values such as
[1,1,1,1] and [3,3,3]
My code for this is as follows but it doesn't work:
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq) - size))

counter = 0
sub_arr = []
arr = []
for i in range(len(array)):
    if array[i] == array[i+1]:
        counter += 1
    else:
        break

subarr = chunker(array, counter)
arr.append(sub_arr)
array = array[counter:]
What is an efficient way to break the array into smaller arrays of equal values?
A numpy solution for floats and integers:
import numpy as np
a = np.asarray([1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7])
#calculate differences between neighbouring elements and get index where element changes
#sample output for index would be [ 4 6 9 10 16]
index = np.where(np.diff(a) != 0)[0] + 1
#separate arrays
print(np.split(a, index))
Sample output:
[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]
If you had strings, this method naturally wouldn't work. Then you should go with DyZ's itertools approach.
NumPy has poor support for such grouping. I suggest using itertools that operate on lists.
from itertools import groupby
[np.array(list(data)) for _, data in groupby(array)]
#[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), \
# array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]
This is not necessarily the most efficient method, because it involves conversions to and from lists.
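The same groupby approach also covers the string case mentioned above, where np.diff cannot be used:

```python
from itertools import groupby
import numpy as np

words = np.array(["a", "a", "b", "b", "b", "c"])
# groupby groups consecutive equal elements, regardless of dtype
groups = [np.array(list(g)) for _, g in groupby(words)]
print(groups)
```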
Here's an approach using Pandas (note that value_counts groups all equal values together, which matches the consecutive-run answers above only when equal values are adjacent, as in the sample):
import pandas as pd
(pd.Series(array)
   .value_counts()
   .reset_index()
   .apply(lambda x: [x["index"]] * x[0], axis=1))
Explanation:
First, convert array to a Series, and use value_counts() to get a count of each unique entry:
counts = pd.Series(array).value_counts().reset_index()
index 0
0 6 6
1 1 4
2 3 3
3 2 2
4 7 1
5 5 1
Then recreate each repeated-element list, using apply():
counts.apply(lambda x: [x["index"]] * x[0], axis=1)
0 [6, 6, 6, 6, 6, 6]
1 [1, 1, 1, 1]
2 [3, 3, 3]
3 [2, 2]
4 [7]
5 [5]
dtype: object
You can use the .values property to convert from a Series of lists to a list of lists, if needed.
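For example, one way to make that conversion, using a small hand-built Series of lists for illustration:

```python
import pandas as pd

s = pd.Series([[6, 6, 6], [1, 1, 1, 1], [3, 3, 3]])
print(list(s))            # [[6, 6, 6], [1, 1, 1, 1], [3, 3, 3]]
print(s.values.tolist())  # same result via the underlying object array
```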
I have the following code to calculate the first difference of a matrix, i.e. the i-th element minus the (i-1)-th element along each row.
How can I (easily) calculate the difference for each element horizontally and vertically? With a transpose?
import numpy as np

inputarr = np.arange(12)
inputarr.shape = (3, 4)
inputarr += 1

#shift one position
newarr = list()
for x in inputarr:
    newarr.append(np.hstack((np.array([0]), x[:-1])))

z = np.array(newarr)
print(inputarr)
print('first differences')
print(inputarr - z)
Output
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
first differences
[[1 1 1 1]
[5 1 1 1]
[9 1 1 1]]
Check out numpy.diff.
From the documentation:
Calculate the n-th order discrete difference along given axis.
The first order difference is given by out[n] = a[n+1] - a[n] along
the given axis, higher order differences are calculated by using diff
recursively.
An example:
>>> import numpy as np
>>> a = np.arange(12).reshape((3,4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.diff(a,axis = 1) # row-wise
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
>>> np.diff(a, axis = 0) # column-wise
array([[4, 4, 4, 4],
[4, 4, 4, 4]])
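Since NumPy 1.16, np.diff also accepts a prepend argument, which reproduces the question's zero-padded output shape directly (the first column then holds the original values, as in the OP's result):

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)
# prepending a zero column keeps the output the same shape as the input
print(np.diff(a, axis=1, prepend=0))
# [[1 1 1 1]
#  [5 1 1 1]
#  [9 1 1 1]]
```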