Getting column from a multidimensional list - python

I have a fairly involved nested list: each element is a tuple with two elements: one is an object, the other is a 3x2xn array. Here is a toy model:
toy=[('mol1',array([[[1,1,1],[2,2,2]],[[1,1,1],[2,2,2]]])),('mol2',array([[[1,1,1],[2,2,2]],[[1,1,1],[2,2,2]]]))]
How can I get a single column from that?
I am looking for
('mol1', 'mol2')
and for the 2D arrays, something like:
array([[1,1,1],[1,1,1],[1,1,1],[1,1,1]])
I have a solution but I think it is pretty inefficient:
zip(*toy)[0]
it returns
('mol1', 'mol2')
then
zip(*toy)[1][0][:,0]
which returns
array([[1, 1, 1],
       [1, 1, 1]])
A for loop like this:
for i in range(len(toy)):
    zip(*toy)[1][i][:,0]
gives all the elements of the column, and I can build it with a vstack.

This should be reasonably efficient:
>>> tuple(t[0] for t in toy)
('mol1', 'mol2')
For the 2D array, with the help of numpy's vstack function:
>>> from numpy import vstack
>>> vstack([t[1][:, 0] for t in toy])
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
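For reference, here is a self-contained sketch of the same idea that unpacks the list once and slices with NumPy instead of looping. The names block, names, arrays, stacked and column are illustrative only, and the sketch assumes every tuple holds an array of the same shape (otherwise np.stack fails).

```python
import numpy as np

# Rebuild the toy data from the question: (label, 2x2x3 array) pairs.
block = np.array([[[1, 1, 1], [2, 2, 2]], [[1, 1, 1], [2, 2, 2]]])
toy = [("mol1", block), ("mol2", block.copy())]

# zip(*toy) transposes the list of pairs; unpacking works in Python 2 and 3.
names, arrays = zip(*toy)

# Stack into one (n_items, 2, 2, 3) array, take index 0 along the third axis
# (the same element as arr[:, 0] for each item), then merge the leading axes.
stacked = np.stack(arrays)
column = stacked[:, :, 0].reshape(-1, stacked.shape[-1])

print(names)         # ('mol1', 'mol2')
print(column.shape)  # (4, 3)
```

Unpacking zip(*toy) once avoids rebuilding the transposed list on every loop iteration, which is what made the original zip(*toy)[1][i] loop wasteful.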

You can store your data in a numpy array (or convert your list to one) and then use the built-in column slicing. In general, numpy slicing is very fast.
import numpy as np
np.asarray(toy)[::, 0] # first column
# output
array(['mol1', 'mol2'],
      dtype='|S4')

Related

Set rows in Python 2D array to another row without Numpy?

I want to "set" the values of a row of a Python nested list to another row without using NumPy.
I have a sample list:
lst = [[0, 0, 1],
       [0, 2, 3],
       [5, 2, 3]]
I want to set row 1 to row 2, row 2 to row 3, and row 3 to row 1. My desired output is:
lst = [[0, 2, 3],
       [5, 2, 3],
       [0, 0, 1]]
How can I do this without using Numpy?
I tried to do something like arr[[0, 1]] = arr[[1, 0]] but it gives the error 'NoneType' object is not subscriptable.
One very straightforward way:
arr = [arr[-1], *arr[:-1]]
Or another way to achieve the same:
arr = [arr[-1]] + arr[:-1]
arr[-1] is the last element of the list, and arr[:-1] is everything up to the last element.
The first solution builds a new list, adding the last element first and then all the other elements. The second one constructs a list with only the last element and then extends it with the list containing the rest. Both move the last row to the front; to move the first row to the end instead, as in the desired output in the question, use arr[1:] + arr[:1].
Note: naming your list an array doesn't make it one. Although you can access a list of lists like arr[i1][i2], it's still just a list of lists. Look at the array documentation for Python's actual array.
The solution user @MadPhysicist provided comes down to the second solution provided here, since [arr[-1]] == arr[-1:]
Since python does not actually support multidimensional lists, your task becomes simpler by virtue of the fact that you are dealing with a list containing lists of rows.
To roll the list, just reassemble the outer container:
result = lst[-1:] + lst[:-1]
Numpy has a special interpretation for lists of integers, like [0, 1], tuples, like :, -1, single integers, and slices. Python lists only understand single integers and slices as indices, and do not accept tuples as multidimensional indices, because, again, lists are fundamentally one-dimensional.
Use this generalisation:
arr = arr[1:] + arr[:1]
which for your example means
arr[0], arr[1], arr[2] = arr[1], arr[2], arr[0]
or
arr = arr[1:] + [arr[0]]
You can use this:
>>> lst = [[0, 0, 1],
...        [0, 2, 3],
...        [5, 2, 3]]
>>> lst = [*lst[1:], *lst[:1]]
>>> lst
[[0, 2, 3], [5, 2, 3], [0, 0, 1]]
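Since the answers in this thread rotate in different directions, a short side-by-side sketch may help. The variable names are illustrative, and collections.deque is shown only as one standard-library alternative, not as something the original answers used.

```python
from collections import deque

lst = [[0, 0, 1],
       [0, 2, 3],
       [5, 2, 3]]

# Rotate "up": the first row moves to the end (matches the desired output).
rotated_up = lst[1:] + lst[:1]

# Rotate "down": the last row moves to the front.
rotated_down = lst[-1:] + lst[:-1]

# deque.rotate(-1) also shifts every row one position towards the front.
rows = deque(lst)
rows.rotate(-1)
via_deque = list(rows)

print(rotated_up)    # [[0, 2, 3], [5, 2, 3], [0, 0, 1]]
print(rotated_down)  # [[5, 2, 3], [0, 0, 1], [0, 2, 3]]
```

All three forms build a new outer container but reuse the same inner row objects, so no row data is copied.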

How to optimize array storage within a numpy array?

I have a numpy array with shape (n, m):
import numpy as np
foo = np.zeros((5,5))
I make some calculations, getting results in a (n, 2) shape:
bar = np.zeros((8,2))
I want to store the calculation results within the array, since I might have to extend them after another calculation. I can do it like this:
foo = np.zeros((5,5), object)
# one calculation result for index (1, 1)
bar1 = np.zeros((8,2))
foo[1, 1] = bar1
# another calculation result for index (1, 1)
bar2 = np.zeros((5,2))
foo[1, 1] = np.concatenate((foo[1, 1], bar2))
However, this seems quite odd to me, since I have to keep checking whether the array already has a value at that position. Additionally, I don't know if using object as the datatype is a good idea, since I only want to store numpy-specific data and not arbitrary Python objects.
Is there a more numpy-specific way to do this?
defaultdict streamlines the task of adding values to dict elements incrementally:
In [644]: from collections import defaultdict
Start with a dict that has default value of list, [].
In [645]: dd = defaultdict(list)
In [646]: dd[(1,1)].append(np.zeros((1,2),int))
In [647]: dd[(1,1)].append(np.ones((3,2),int))
In [648]: dd
Out[648]:
defaultdict(list,
{(1, 1): [array([[0, 0]]), array([[1, 1],
[1, 1],
[1, 1]])]})
Once we've collected all values, we can convert the nested lists into an array:
In [649]: dd[(1,1)] = np.concatenate(dd[(1,1)])
In [650]: dd
Out[650]:
defaultdict(list,
{(1, 1): array([[0, 0],
[1, 1],
[1, 1],
[1, 1]])})
In [652]: dict(dd)
Out[652]:
{(1,
1): array([[0, 0],
[1, 1],
[1, 1],
[1, 1]])}
When doing the conversion, we have to take care with keys whose value is still an empty list [], since np.concatenate cannot concatenate an empty list.
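One way to handle the empty-list case is sketched below. The helper finalize and its width parameter are hypothetical names introduced for illustration, and the sketch assumes every stored chunk has two columns, as in the question.

```python
from collections import defaultdict

import numpy as np

results = defaultdict(list)
results[(1, 1)].append(np.zeros((8, 2)))
results[(1, 1)].append(np.zeros((5, 2)))
_ = results[(0, 0)]  # just reading a missing key inserts an empty list


def finalize(chunks, width=2):
    """Hypothetical helper: join collected chunks, or return an empty (0, width) array."""
    return np.concatenate(chunks) if chunks else np.empty((0, width))


# Convert every list of chunks to a single array in one pass.
final = {key: finalize(chunks) for key, chunks in results.items()}

print(final[(1, 1)].shape)  # (13, 2)
print(final[(0, 0)].shape)  # (0, 2)
```

Appending to a Python list and concatenating once at the end also avoids re-copying the growing array on every np.concatenate call.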

Numpy array indexing syntax

I am new to numpy and confused about the syntax used for indexing arrays. For example:
arr[2, 3]
This means the element at the intersection of the 3rd row and 4th column. What confuses me is the separation of the indices by a comma inside the square brackets (like function arguments). Doing so with Python lists is not valid:
l = [[1, 2], [3, 4]]
l[1, 1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not tuple
So, if this is not valid Python syntax, how do numpy arrays work?
Use a colon ':' instead of a comma ','.
Slicing and indexing are done using a colon ':'.
In your example above,
l = [[1, 2], [3, 4]]
l[0] is [1, 2] and l[1] is [3, 4].
Read the documentation for a better understanding.
In your given example, you're comparing a numpy array to a list of lists. The main difference between the two is that a numpy array is predictable in terms of shape, data type of its elements, and so on, while a list can contain an arbitrary combination of any other python objects (lists, tuples, strings, etc.)
Take this as an example, say you create a numpy array like so:
arr = np.array([[0, 1], [2, 3], [4, 5]])
Here, the shape of arr is known right after instantiation (arr.shape returns (3, 2)), so you can index the array with comma-separated indices inside a single pair of square brackets. On the other hand, take the list example:
l = [[0, 1], [2, 3], [4, 5]]
l[0] # This returns the list [0, 1]
l[0].append("HELLO")
l[0] # This returns the list [0, 1, "HELLO"]
A list is much less predictable, as there's no way to know in advance what each list element holds. So the way we index a specific element in a list of lists is with two pairs of square brackets, e.g. l[0][0].
What if we created a non-uniform numpy array? Well, you get behaviour similar to a list of lists:
# Ragged input: older NumPy emits a warning here; NumPy 1.24+ raises ValueError unless dtype=object is passed
arr = np.array([[0, 1], [2, 3], [4]], dtype=object)
print(repr(arr)) # array([list([0, 1]), list([2, 3]), list([4])], dtype=object)
In this case, you can't index the numpy array using [0, 0]. Instead, you have to use two pairs of square brackets, just like with a list of lists.
You can also check the documentation of ndarray for more info.
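As a supplement to the answers above: the comma form is legal Python syntax for any object, because the expression inside the brackets is handed to the object's __getitem__ method, and comma-separated values form a tuple. The list simply rejects a tuple at run time, which is why the error is a TypeError rather than a SyntaxError. The class ShowIndex below is a hypothetical helper written only to make this visible.

```python
import numpy as np


class ShowIndex:
    """Hypothetical helper: returns whatever the square brackets receive."""

    def __getitem__(self, index):
        return index


probe = ShowIndex()
print(probe[2])      # 2
print(probe[2, 3])   # (2, 3)
print(probe[1:, 0])  # (slice(1, None, None), 0)

# NumPy's ndarray accepts such a tuple and treats it as one index per axis.
arr = np.arange(20).reshape(4, 5)
same = arr[2, 3] == arr[(2, 3)] == arr[2][3]

# A list receives the same tuple but refuses it.
nested = [[1, 2], [3, 4]]
try:
    nested[1, 1]
    list_error = None
except TypeError as exc:
    list_error = str(exc)

print(same)        # True
print(list_error)  # list indices must be integers or slices, not tuple
```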

How to convert [2,3,4] to [0,0,1,1,1,2,2,2,2] to utilize tf.math.segment_sum?

Assume I have an array like [2,3,4], I am looking for a way in NumPy (or Tensorflow) to convert it to [0,0,1,1,1,2,2,2,2] to apply tf.math.segment_sum() on a tensor that has a size of 2+3+4.
No elegant idea comes to my mind, only loops and list comprehension.
Would something like this work for you?
import numpy
arr = numpy.array([2, 3, 4])
numpy.repeat(numpy.arange(arr.size), arr)
# array([0, 0, 1, 1, 1, 2, 2, 2, 2])
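To show how the generated ids are then used, here is a NumPy-only sketch of a segment sum over 1-D data. It is a stand-in for illustration, not TensorFlow's implementation of tf.math.segment_sum; the names counts, data and starts are assumptions, and the reduceat variant assumes every count is at least 1.

```python
import numpy as np

counts = np.array([2, 3, 4])

# Segment ids: index i repeated counts[i] times.
segment_ids = np.repeat(np.arange(counts.size), counts)

# Example data with one value per segment id: 0, 1, ..., 8.
data = np.arange(counts.sum())

# Sum the values that share a segment id (returns floats because of weights).
segment_sums = np.bincount(segment_ids, weights=data)

# Alternative that skips the ids: reduce between segment start offsets.
starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
via_reduceat = np.add.reduceat(data, starts)

print(segment_ids)   # [0 0 1 1 1 2 2 2 2]
print(segment_sums)  # [ 1.  9. 26.]
print(via_reduceat)  # [ 1  9 26]
```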
You don't need to use numpy. You can use nothing but list comprehensions:
>>> foo = [2,3,4]
>>> sum([[i]*foo[i] for i in range(len(foo))], [])
[0, 0, 1, 1, 1, 2, 2, 2, 2]
It works like this:
You can create an expanded list by multiplying a simple one by an integer, so [0] * 2 == [0, 0]. So for each index in the list, we expand with [i]*foo[i]. In other words:
>>> [[i]*foo[i] for i in range(len(foo))]
[[0, 0], [1, 1, 1], [2, 2, 2, 2]]
Then we use sum to reduce the lists into a single list:
>>> sum([[i]*foo[i] for i in range(len(foo))], [])
[0, 0, 1, 1, 1, 2, 2, 2, 2]
Because we are "summing" lists, not integers, we pass [] to sum to make an empty list the starting value of the sum.
(Note that this will likely be slower than numpy, though I have not personally compared it to something like @Patol75's answer.)
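One caveat with sum(..., []) is that it builds a new list at every step, so its cost grows quadratically with the number of sublists. A pure-Python variant that avoids this, sketched here with itertools.chain (not part of the original answer), is:

```python
from itertools import chain

foo = [2, 3, 4]

# Build each run lazily and flatten all runs in a single pass.
segment_ids = list(chain.from_iterable([i] * n for i, n in enumerate(foo)))

print(segment_ids)  # [0, 0, 1, 1, 1, 2, 2, 2, 2]
```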
I really like the answer from @Patol75 since it's neat. However, there is no pure TensorFlow solution yet, so I provide one, which may be somewhat complex. Just for reference and fun!
BTW, I didn't see a tf.repeat API in TF master. Please check this PR, which adds tf.repeat support equivalent to numpy.repeat.
import tensorflow as tf
repeats = tf.constant([2,3,4])
values = tf.range(tf.size(repeats)) # [0,1,2]
max_repeats = tf.reduce_max(repeats) # max repeat is 4
tiled = tf.tile(tf.reshape(values, [-1,1]), [1,max_repeats]) # [[0,0,0,0],[1,1,1,1],[2,2,2,2]]
mask = tf.sequence_mask(repeats, max_repeats) # [[1,1,0,0],[1,1,1,0],[1,1,1,1]]
res = tf.boolean_mask(tiled, mask) # [0,0,1,1,1,2,2,2,2]
Patol75's answer uses Numpy but Gort the Robot's answer is actually faster (on your example list at least).
I'll keep this answer up as another solution, but it's slower than both.
Given that a = [2,3,4] this could be done using a loop like so:
b = []
for i in range(len(a)):
    for j in range(a[i]):
        b.append(range(len(a))[i])
Which, as a list comprehension one-liner, is this diabolical thing:
b = [range(len(a))[i] for i in range(len(a)) for j in range(a[i])]
Both end up with b = [0,0,1,1,1,2,2,2,2].

Efficient way to multiply/add/divide each element of a list with each element of another list in Python

I want to multiply each element of a list with each element of another list.
lst1 = [1, 2, 1, 2]
lst2 = [2, 2, 2]
lst3 = []
for item in lst1:
    for i in lst2:
        rs = i * item
        lst3.append(rs)
This works, but it is very inefficient on large datasets and the loop can take very long to complete. Note that the lengths of both lists can vary.
I am fine with using non-built-in data structures. I checked numpy and there seems to be a mechanism called broadcasting for ndarray. I am not sure if it is the way to go. So far, multiplying an array with a scalar works as expected.
arr = np.arange(3)
arr * 2
This returns:
array([0, 2, 4])
But the way it works with another array is a bit different, and I can't seem to achieve the above.
I guess it must be something straightforward, but I can't find the exact solution at the moment. Any input will be highly appreciated. Thanks.
Btw, there is a similar question for Scheme (without considering efficiency) here.
Edit: Thanks for your answers. Multiplication works, see Dval's answer. However, I also need to do addition and possibly division in exactly the same way. For that reason, I updated the question a bit.
Edit: I can work with the numpy array itself, so I don't need to convert the list to an array and back.
Numpy is the way to go, specifically numpy.outer, which returns the product of every pair of elements as a matrix. Using .flatten() collapses it to 1-D.
import numpy
lst1 = numpy.array([1, 2, 1, 2])
lst2 = numpy.array([2, 2, 2])
numpy.outer(lst1, lst2).flatten()
To address the updated question, addition works in a similar way:
numpy.add.outer(lst1, lst2).flatten()
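The same .outer method is available on NumPy's other binary ufuncs, so division (also asked about in the question) follows the same pattern. A short sketch, with the result names chosen for illustration:

```python
import numpy as np

lst1 = np.array([1, 2, 1, 2])
lst2 = np.array([2, 2, 2])

# ufunc.outer applies the operation to every (lst1[i], lst2[j]) pair.
products = np.multiply.outer(lst1, lst2).ravel()  # same as np.outer for 1-D input
sums = np.add.outer(lst1, lst2).ravel()
quotients = np.divide.outer(lst1, lst2).ravel()   # true division, float result

print(products)   # [2 2 2 4 4 4 2 2 2 4 4 4]
print(sums)       # [3 3 3 4 4 4 3 3 3 4 4 4]
print(quotients)  # [0.5 0.5 0.5 1.  1.  1.  0.5 0.5 0.5 1.  1.  1. ]
```

The flattened order matches the nested loop in the question: the outer loop runs over lst1 and the inner loop over lst2.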
Linear operations on arrays like this are the meat-n-potatoes of numpy. Once you have defined the arrays, matrix like operations on them are easy, and relatively fast. That includes outer products and inner (matrix) products, as well as element by element operations.
For example:
In [133]: a=np.array([1,2,1,2])
In [134]: b=np.array([2,2,2])
A list comprehension version of your double loop:
In [135]: [i*j for i in a for j in b]
Out[135]: [2, 2, 2, 4, 4, 4, 2, 2, 2, 4, 4, 4]
A numpy product using broadcasting. Think of a[:,None] as turning a into a column vector.
In [136]: a[:,None]*b
Out[136]:
array([[2, 2, 2],
       [4, 4, 4],
       [2, 2, 2],
       [4, 4, 4]])
Element-by-element division also works:
In [137]: a[:,None]/b
Out[137]:
array([[ 0.5, 0.5, 0.5],
       [ 1. , 1. , 1. ],
       [ 0.5, 0.5, 0.5],
       [ 1. , 1. , 1. ]])
But this gets more useful when combining operations.
There is overhead in converting lists to arrays, so I wouldn't recommend it for small occasional calculations.
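As an illustration of combining operations, the sketch below evaluates (i + j) / j for every pair in one broadcast expression and checks it against the equivalent double loop. The expression itself is an arbitrary example, not something taken from the question.

```python
import numpy as np

a = np.array([1, 2, 1, 2])
b = np.array([2, 2, 2])

# a[:, None] has shape (4, 1); broadcasting against b (shape (3,)) gives (4, 3).
col = a[:, None]
combined = (col + b) / b

# Flatten in row-major order to match "for i in a for j in b".
flat = combined.ravel()
loop = [(i + j) / j for i in a for j in b]

print(combined.shape)         # (4, 3)
print(np.allclose(flat, loop))  # True
```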
Use numpy - it's a library designed for complex matrix-based arithmetic.
import numpy
lst1 = numpy.array([1, 2, 1, 2])
lst2 = numpy.array([2, 2, 2])
numpy.outer(lst1, lst2)
