Create NumPy array from list of tuples - python

I have data in the following format:
[('user_1', 2, 1.0),
('user_2', 6, 2.5),
('user_3', 9, 3.0),
('user_4', 1, 3.0)]
And I want to use this information to create a NumPy array that has the value 1.0 at position 2, the value 2.5 at position 6, and so on. All positions not listed above should be zero. Like this:
array([0, 3.0, 1.0, 0, 0, 0, 2.5, 0, 0, 3.0])

First reformat the data:
import numpy as np
data = [
("user_1", 2, 1.0),
("user_2", 6, 2.5),
("user_3", 9, 3.0),
("user_4", 1, 3.0),
]
usernames, indices, values = zip(*data)
And then create the array:
length = max(indices) + 1
arr = np.zeros(shape=(length,))
arr[list(indices)] = values
print(arr)  # [0.  3.  1.  0.  0.  0.  2.5 0.  0.  3. ]
Note that you need to convert indices to a list; otherwise, when the tuple is used for indexing, NumPy treats it as one index per dimension rather than as several positions along one axis.
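To see why the list conversion matters, here is a minimal sketch (the flag name tuple_index_failed is only for illustration). Indexing a 1-D array with a tuple of several integers is read as one index per axis, so it fails, while a list or an integer array selects several positions along the single axis:

```python
import numpy as np

data = [("user_1", 2, 1.0), ("user_2", 6, 2.5), ("user_3", 9, 3.0), ("user_4", 1, 3.0)]
_, indices, values = zip(*data)  # indices and values are tuples

arr = np.zeros(max(indices) + 1)

# A tuple means "one index per dimension", which fails on a 1-D array.
try:
    arr[indices] = values
    tuple_index_failed = False
except IndexError:
    tuple_index_failed = True

# An integer array (or a list) selects several positions along axis 0.
arr[np.asarray(indices)] = values
print(tuple_index_failed)
print(arr)
```

Using np.asarray(indices) instead of list(indices) is a matter of taste; both give the same assignment.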

I've come up with this solution:
import numpy as np
a = [('user_1', 2, 1.0),
('user_2', 6, 2.5),
('user_3', 9, 3.0),
('user_4', 1, 3.0)]
res = np.zeros(max(x[1] for x in a)+1)
for i in range(len(a)):
    res[a[i][1]] = a[i][2]
res
# array([0. , 3. , 1. , 0. , 0. , 0. , 2.5, 0. , 0. , 3. ])
First I create a zero-filled array whose length is the largest position (index 1 of each tuple in a) plus 1, because array indices start at 0 and the largest position must itself be a valid index.
Then a simple loop assigns each value to its position.
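The same loop can be written with tuple unpacking, which avoids the double indexing. This is only a stylistic variant of the code above; the names pos and val are illustrative:

```python
import numpy as np

a = [("user_1", 2, 1.0), ("user_2", 6, 2.5), ("user_3", 9, 3.0), ("user_4", 1, 3.0)]

# Length is the largest position plus one, since indices start at 0.
res = np.zeros(max(pos for _, pos, _ in a) + 1)

# Unpack each tuple directly instead of indexing a[i][1] and a[i][2].
for _, pos, val in a:
    res[pos] = val

print(res)
```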

Related

Calculating a rolling weighted sum using numpy

I am curious whether there is a more efficient way to compute this "rolling weighted sum" (I am unsure of the actual terminology, but the example below should clarify). I am asking because I am fairly sure my current snippet is wasteful in memory usage, and that NumPy's more advanced functions could improve its performance.
Example:
import numpy as np
A = np.append(np.linspace(0, 1, 10), np.linspace(1.1, 2, 30))
np.random.seed(0)
B = np.random.randint(3, size=40) + 1
# list of [(weight, (lower, upper))]
d = [(1, (-0.25, -0.20)), (0.5, (-0.20, -0.10)), (2, (-0.10, 0.15))]
In Python 3.7:
## A
array([0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ,
1.1 , 1.13103448, 1.16206897, 1.19310345, 1.22413793,
1.25517241, 1.2862069 , 1.31724138, 1.34827586, 1.37931034,
1.41034483, 1.44137931, 1.47241379, 1.50344828, 1.53448276,
1.56551724, 1.59655172, 1.62758621, 1.65862069, 1.68965517,
1.72068966, 1.75172414, 1.78275862, 1.8137931 , 1.84482759,
1.87586207, 1.90689655, 1.93793103, 1.96896552, 2. ])
## B
array([1, 2, 1, 2, 2, 3, 1, 3, 1, 1, 1, 3, 2, 3, 3, 1, 2, 2, 2, 2, 1, 2,
1, 1, 2, 3, 1, 3, 1, 2, 2, 3, 1, 2, 2, 2, 1, 3, 1, 3])
Expected Solution:
array([ 6. , 6.5, 8. , 10.5, 12. , 11. , 11.5, 11.5, 6.5, 13.5, 25. ,
27.5, 30.5, 34.5, 37.5, 36. , 35. , 35. , 34. , 34.5, 34. , 36.5,
33. , 34. , 34.5, 34.5, 36. , 39. , 37. , 36. , 37. , 36.5, 37.5,
39. , 36.5, 37.5, 34. , 31. , 27.5, 23. ])
The logic I want to translate into code:
Let's look at how 10.5 (the fourth element in the expected solution) is computed. d represents a collection of nested tuples with first float element weight, and second tuple element bounds (in the form of (lower, upper)).
We look at the fourth element of A (0.33333333) and apply bounds for each tuple in d. For the first tuple in d:
0.33333333 + (-0.25) = 0.08333333
0.33333333 + (-0.20) = 0.13333333
We go back to A to see if there are any elements between the bounds (0.08333333, 0.13333333). Because the second element of A (0.11111111) falls in this range, we pull the second element of B (2), multiply it by its weight from d (1), and add it to the fourth element of the expected output.
After iterating across all tuples in d, the fourth element of the expected output is computed as:
1 * 2 + 0.5 * 1 + 2 * (2 + 2) = 10.5
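As a sanity check on the rule described above, the following sketch recomputes just the fourth output element with a plain loop over d. It uses only the first ten elements of A and B (copied from the printed values), which is enough because the windows around the fourth element do not reach further. The names i, lo, hi and total are illustrative:

```python
import numpy as np

# First ten elements of A and B from the example.
A10 = np.linspace(0, 1, 10)
B10 = np.array([1, 2, 1, 2, 2, 3, 1, 3, 1, 1])
d = [(1, (-0.25, -0.20)), (0.5, (-0.20, -0.10)), (2, (-0.10, 0.15))]

i = 3  # fourth element
total = 0.0
for weight, (lo, hi) in d:
    # Elements of A strictly inside the shifted window around A10[i].
    mask = (A10 > A10[i] + lo) & (A10 < A10[i] + hi)
    total += weight * B10[mask].sum()

print(total)  # 10.5
```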
Here is my attempted code:
D = np.zeros(len(A))
for v in d:
    weight, (_lower, _upper) = v
    lower, upper = A + _lower, A + _upper
    _A = np.tile(A, (len(A), 1))
    # row i marks the elements of A that fall inside the window around A[i]
    __A = np.bitwise_and(_A > lower.reshape(-1, 1), _A < upper.reshape(-1, 1))
    D += weight * (__A @ B)
D
Hopefully this makes sense. Please feel free to ask clarifying questions. Thanks!
Since the intervals (-0.25, -0.20), (-0.20, -0.10) and (-0.10, 0.15) form a partition of the interval (-0.25, 0.15), you can use np.searchsorted to find the indices at which the shifted bounds would have to be inserted in A to maintain order. These indices delimit the slices of B to sum. In short:
partition = np.array([-0.25, -0.20, -0.10, 0.15])
weights = np.array([1, 0.5, 2])
out = []
for n in A:
    idx = np.searchsorted(A, n + partition)
    results = np.add.reduceat(B[:idx[-1]], idx[:-1])
    out.append(np.dot(results, weights))
>>> print(out)
[7.5, 7.5, 8.0, 10.5, 12.0, 11.0, 11.5, 11.5, 6.5, 13.5, 27.5, 27.5, 31.5, 35.5, 37.5, 37.0, 36.0, 35.0, 34.0, 34.5, 34.0, 36.5, 33.0, 34.0, 34.5, 34.5, 36.0, 39.0, 37.0, 36.0, 37.0, 36.5, 37.5, 39.0, 36.5, 37.5, 34.0, 31.0, 27.5, 23.0]
Note that the results are wrong if any of the slices of B is empty.
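The empty-slice caveat comes from how np.add.reduceat is defined: when an index is not smaller than the next one, it returns the single element at that index rather than 0. A minimal sketch of the effect, together with one possible workaround based on prefix sums (slice_sums is a hypothetical helper, not part of NumPy, and this swaps reduceat for np.cumsum). Only the first ten elements of A and B are used, which is enough for the first four outputs; how values exactly on a window boundary are treated is not addressed here:

```python
import numpy as np

# reduceat does not return 0 for an empty slice: when idx[i] >= idx[i + 1]
# it returns the single element values[idx[i]] instead.
toy = np.array([1, 2, 3, 4])
print(np.add.reduceat(toy, [0, 2, 2]))  # [3 3 7], although toy[2:2] is empty

def slice_sums(values, bounds):
    """Sum values[bounds[i]:bounds[i + 1]] for each i; empty slices give 0."""
    prefix = np.concatenate(([0], np.cumsum(values)))
    return prefix[bounds[1:]] - prefix[bounds[:-1]]

print(slice_sums(toy, np.array([0, 2, 2, 4])))  # [3 0 7]

# First ten elements of A and B from the question.
A10 = np.linspace(0, 1, 10)
B10 = np.array([1, 2, 1, 2, 2, 3, 1, 3, 1, 1])
partition = np.array([-0.25, -0.20, -0.10, 0.15])
weights = np.array([1, 0.5, 2])

first_four = [
    float(slice_sums(B10, np.searchsorted(A10, n + partition)) @ weights)
    for n in A10[:4]
]
print(first_four)  # [6.0, 6.5, 8.0, 10.5]
```

With the prefix-sum version the first two values come out as 6.0 and 6.5, matching the expected solution, where the reduceat loop printed 7.5 and 7.5.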
Credits to @mathfux for providing me enough guidance. Here's the final code solution that I developed based on conversations here:
partition = np.array([-0.25, -0.20, -0.10, 0.15])
weights = np.array([1, 0.5, 2])
idx = np.searchsorted(A, partition + A[:, None])
_idx = np.lib.stride_tricks.sliding_window_view(idx, 2, axis = 1)
values = np.apply_along_axis(lambda x: B[slice(*(x))].sum(), 2, _idx)
values @ weights

Use numpy array to do conditional operations on another array

Let's say I have 2 arrays:
a = np.array([2, 2, 0, 0, 2, 1, 0, 0, 0, 0, 3, 0, 1, 0, 0, 2])
b = np.array([0, 0.5, 0.25, 0.9])
What I would like to do is take a value from array b and multiply it by the values in array a, based on its index.
So the first value in array a is 2. I want the value in array b at that index position to be multiplied by that value. In array b, the value at index position 2 is 0.25, so multiply that value (2) from array a by 0.25.
I know it can be done with iteration, but I'm trying to figure out how to do it with elementwise operations.
Here's the iteration way that I've done:
result = np.array([])
for idx in a:
    result = np.append(result, (b[idx] * idx))
To get the result:
print(result)
[0.5 0.5 0. 0. 0.5 0.5 0. 0. 0. 0. 2.7 0. 0.5 0. 0. 0.5]
What's an elementwise equivalent?
Integer arrays can be used as indices in NumPy, so you can simply write:
b[a] * a
EDIT:
Just for completeness, your iterative solution triggers a new memory allocation every time append is called (see the 'Returns' section of the np.append documentation). Since you already know the shape of your output (i.e. a.shape), it is much better to allocate the output array in advance, e.g. result = np.empty(a.shape), and then fill it in the loop.
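Both points from this answer can be put side by side in a short sketch: the loop with a preallocated output, and the integer-array indexing that replaces it. The names looped and vectorized are illustrative:

```python
import numpy as np

a = np.array([2, 2, 0, 0, 2, 1, 0, 0, 0, 0, 3, 0, 1, 0, 0, 2])
b = np.array([0, 0.5, 0.25, 0.9])

# Loop version with the output allocated once, instead of np.append per step.
looped = np.empty(a.shape)
for pos, idx in enumerate(a):
    looped[pos] = b[idx] * idx

# Vectorized version: b[a] gathers b at every index stored in a.
vectorized = b[a] * a

print(np.allclose(looped, vectorized))  # True
```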
So there are a few ways to do this, but if you want purely element-wise operations you could do the following:
Before getting the result, each element of b is transformed by its index. So create another vector n.
n = np.arange(len(b)) * b
# In the example, n now equals [0. , 0.5, 0.5, 2.7]
# then the result is just n indexed by a
result = n[a]
# result = [0.5, 0.5, 0. , 0. , 0.5, 0.5, 0. , 0. , 0. , 0. , 2.7, 0. , 0.5, 0. , 0. , 0.5]

Creating tuples of multiples from pairs of indices

Given a numpy array, I can select the indices of the elements that meet a given criterion. How do I create tuples of triplets (or quadruplets, quintuplets, ...) from the resulting pairs of indices?
In the example below, pairs_tuples is equal to [(1, 0), (3, 0), (3, 1), (3, 2)]. triplets_tuples should be [(0, 1, 3)] because all of its elements (i.e. (1, 0), (3, 0), (3, 1)) have pairwise values meeting the condition, whereas (3, 2) does not.
a = np.array([[0. , 0. , 0. , 0. , 0. ],
[0.96078379, 0. , 0. , 0. , 0. ],
[0.05498203, 0.0552454 , 0. , 0. , 0. ],
[0.46005028, 0.45468466, 0.11167813, 0. , 0. ],
[0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0. ]])
pairs = np.where((a >= .11) & (a <= .99))
pairs_tuples = list(zip(pairs[0].tolist(), pairs[1].tolist()))
# [(1, 0), (3, 0), (3, 1), (3, 2)]
How to get to the below?
triplets_tuples = [(0, 1, 3)]
quadruplets_tuples = []
quintuplets_tuples = []
This has an easy part and an NP part. Here's the solution to the easy part.
Let's assume you have the full correlation matrix:
>>> c = a + a.T
>>> c
array([[0. , 0.96078379, 0.05498203, 0.46005028, 0.1030161 ],
[0.96078379, 0. , 0.0552454 , 0.45468466, 0.10350956],
[0.05498203, 0.0552454 , 0. , 0.11167813, 0.00109096],
[0.46005028, 0.45468466, 0.11167813, 0. , 0.00928037],
[0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0. ]])
What you're doing is converting this into an adjacency matrix:
>>> adj = (c >= .11) & (c <= .99)
>>> adj.astype(int) # for readability below - False and True take a lot of space
array([[0, 1, 0, 1, 0],
[1, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[1, 1, 1, 0, 0],
[0, 0, 0, 0, 0]])
This now represents a graph where rows and columns correspond to nodes, and a 1 is an edge between them. We can use networkx to visualize this:
import networkx
g = networkx.from_numpy_array(adj)  # from_numpy_matrix was removed in networkx 3.0
networkx.draw(g)
You're looking for maximal fully-connected subgraphs, or "cliques", within this graph. This is the Clique problem, and is the NP part. Thankfully, networkx can solve that too:
>>> list(networkx.find_cliques(g))
[[3, 0, 1], [3, 2], [4]]
Here [3, 0, 1] is one of your triplets.
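If only cliques of a fixed small size are needed, or networkx is not available, a brute-force check over all k-node combinations is enough for a matrix this small. A minimal sketch, where k_cliques is a hypothetical helper (not a networkx function) and the cost grows combinatorially with the number of nodes:

```python
from itertools import combinations

import numpy as np

a = np.array([[0.        , 0.        , 0.        , 0.        , 0.],
              [0.96078379, 0.        , 0.        , 0.        , 0.],
              [0.05498203, 0.0552454 , 0.        , 0.        , 0.],
              [0.46005028, 0.45468466, 0.11167813, 0.        , 0.],
              [0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0.]])
c = a + a.T
adj = (c >= 0.11) & (c <= 0.99)

def k_cliques(adj, k):
    """Return all k-tuples of nodes in which every pair is adjacent."""
    n = adj.shape[0]
    return [
        nodes
        for nodes in combinations(range(n), k)
        if all(adj[i, j] for i, j in combinations(nodes, 2))
    ]

triplets_tuples = k_cliques(adj, 3)
quadruplets_tuples = k_cliques(adj, 4)
print(triplets_tuples)     # [(0, 1, 3)]
print(quadruplets_tuples)  # []
```

For larger graphs the maximal cliques from networkx.find_cliques are the better starting point, since every k-clique is a subset of some maximal clique.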

matlab ismember function in python

Although similar questions have been raised a couple of times, still I cannot make a function similar to the matlab ismember function in Python. In particular, I want to use this function in a loop, and compare in each iteration a whole matrix to an element of another matrix. Where the same value is occurring, I want to print 1 and in any other case 0.
Let's say that I have the following matrices
d = np.reshape(np.array([ 2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ]),(1,10))
d_unique = np.unique(d)
then I have
d_unique
array([ 0. , 1. , 1.25, 1.5 , 1.75, 2.25])
Now I want to iterate like
J = np.zeros(np.size(d_unique))
for i in range(len(d_unique)):
    J[i] = np.sum(ismember(d,d_unique[i]))
so as to take as an output:
J = [3,1,2,2,1,1]
Does anybody have any idea? Many thanks in advance.
In contrast to other answers, numpy has the built-in numpy.in1d for doing that.
Usage in your case:
bool_array = numpy.in1d(array1, array2)
Note: It also accepts lists as inputs.
EDIT (2021):
NumPy now recommends using np.isin instead of np.in1d. np.isin preserves the shape of the input array, while np.in1d returns a flattened output.
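Applied to the loop from the question, np.isin can stand in for ismember directly. A minimal sketch, which also shows np.unique with return_counts=True as a shortcut when only the counts are wanted:

```python
import numpy as np

d = np.reshape(np.array([2.25, 1.25, 1.5, 1.0, 0.0, 1.25, 1.75, 0.0, 1.5, 0.0]), (1, 10))
d_unique = np.unique(d)

# np.isin returns a boolean mask shaped like d; summing it counts the matches.
J = np.array([np.isin(d, value).sum() for value in d_unique])
print(J)  # [3 1 2 2 1 1]

# If only the counts are needed, np.unique can return them directly.
values, counts = np.unique(d, return_counts=True)
print(counts)  # [3 1 2 2 1 1]
```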
To answer your question, I guess you could define a ismember similarly to:
def ismember(d, k):
    return [1 if (i == k) else 0 for i in d]
But I am not familiar with numpy, so a little adjustment may be in order.
I guess you could also use Counter from collections:
>>> from collections import Counter
>>> a = [2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0. ]
>>> Counter(a)
Counter({0.0: 3, 1.25: 2, 1.5: 2, 2.25: 1, 1.0: 1, 1.75: 1})
>>> Counter(a).keys()
[2.25, 1.25, 0.0, 1.0, 1.5, 1.75]
>>> c =Counter(a)
>>> [c[i] for i in sorted(c.keys())]
[3, 1, 2, 2, 1, 1]
Once again, not numpy, you will probably have to do some list(d) somewhere.
Try the following function:
def ismember(A, B):
    return [ np.sum(a == B) for a in A ]
This should very much behave like the corresponding MATLAB function.
Try the ismember library from pypi.
pip install ismember
Example:
# Import libraries
import numpy as np
from ismember import ismember
# data (NumPy arrays, so that the boolean and integer indexing below works)
d = np.array([2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0.])
d_unique = np.array([0., 1., 1.25, 1.5, 1.75, 2.25])
# Lookup
Iloc,idx = ismember(d, d_unique)
# Iloc is boolean defining existence of d in d_unique
print(Iloc)
# [[True True True True True True True True True True]]
# indexes of d_unique that exists in d
print(idx)
# array([5, 2, 3, 1, 0, 2, 4, 0, 3, 0], dtype=int64)
print(d_unique[idx])
# array([2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ])
print(d[Iloc])
# array([2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ])
# These vectors will match
d[Iloc]==d_unique[idx]

Map numpy array with ufunc

I'm trying to efficiently map a N * 1 numpy array of ints to a N * 3 numpy array of floats using a ufunc.
What I have so far:
map = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
ufunc = numpy.frompyfunc(lambda x: numpy.array(map[x], numpy.float32), 1, 1)
input = numpy.array([1, 2, 3], numpy.int32)
ufunc(input) gives a 3 * 3 array with dtype object. I'd like this array but with dtype float32.
You could use np.hstack:
import numpy as np
mapping = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
ufunc = np.frompyfunc(lambda x: np.array(mapping[x], np.float32), 1, 1)  # frompyfunc takes no dtype argument
data = np.array([1, 2, 3], np.int32)
result = np.hstack(ufunc(data))
print(result)
# [ 0. 0. 0. 0.5 0.5 0.5 1. 1. 1. ]
print(result.dtype)
# float32
print(result.shape)
# (9,)
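np.hstack flattens the result to shape (9,). If the N * 3 layout from the question is wanted, stacking the per-element arrays as rows is one option. A minimal sketch, assuming that the ufunc returns a 1-D object array whose elements are float32 arrays of length 3:

```python
import numpy as np

mapping = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
ufunc = np.frompyfunc(lambda x: np.array(mapping[x], np.float32), 1, 1)
data = np.array([1, 2, 3], np.int32)

# Each element of the object array is a float32 array; stack them as rows.
result = np.vstack(list(ufunc(data)))
print(result.shape)  # (3, 3)
print(result.dtype)  # float32
```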
If your mapping is a numpy array, you can just use fancy indexing for this:
>>> valmap = numpy.array([(0, 0, 0), (0.5, 0.5, 0.5), (1, 1, 1)])
>>> input = numpy.array([1, 2, 3], numpy.int32)
>>> valmap[input-1]
array([[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]])
You can use ndarray fancy indexing to get the same result; I think it should be faster than frompyfunc:
map_array = np.array([[0,0,0],[0,0,0],[0.5,0.5,0.5],[1,1,1]], dtype=np.float32)
index = np.array([1,2,3,1])
map_array[index]
Or you can just use list comprehension:
map = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
np.array([map[i] for i in [1,2,3,1]], dtype=np.float32)
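When the mapping starts out as a dict, the lookup table for fancy indexing can be built from it instead of being typed by hand. A minimal sketch, assuming every input value is a key of the dict (np.searchsorted does not check this); the names keys, table and rows are illustrative:

```python
import numpy as np

mapping = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}

# Sorted keys and a float32 table whose row i holds the value for keys[i].
keys = np.array(sorted(mapping))
table = np.array([mapping[k] for k in keys], dtype=np.float32)

data = np.array([1, 2, 3, 1], dtype=np.int32)
rows = np.searchsorted(keys, data)  # row of each key; assumes every value is a key
result = table[rows]
print(result.shape, result.dtype)  # (4, 3) float32
```

This also works when the keys are not consecutive integers, which the map_array approach above would need padding rows for.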
Unless I misread the doc, the output of np.frompyfunc on a scalar is indeed an object: when using an ndarray as input, you'll get an ndarray with dtype=object.
A workaround is to use the np.vectorize function:
F = np.vectorize(lambda x: mapper.get(x), 'fff')
Here, we force the dtype of F's output to be 3 floats (hence the 'fff').
>>> mapper = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
>>> inp = [1, 2, 3]
>>> F(inp)
(array([ 0. , 0.5, 1. ], dtype=float32), array([ 0., 0.5, 1.], dtype=float32), array([ 0. , 0.5, 1. ], dtype=float32))
OK, not quite what we want: it's a tuple of three float arrays (as we gave 'fff'), the first array being equivalent to [mapper[i][0] for i in inp]. So, with a bit of manipulation:
>>> np.array(F(inp)).T
array([[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]], dtype=float32)
