build numpy array from list of tuples - python

I want to convert a list of tuples into a numpy array. For example:
items = [(1, 2), (3, 4)]
using np.asarray(items) I get:
array([[1, 2],
       [3, 4]])
but if I try to append the items individually:
new_array = np.empty(0)
for item in items:
    new_array = np.append(new_array, item)
the new_array loses the original shape and becomes:
array([1., 2., 3., 4.])
I can get it to the shape I wanted using new_array.reshape(2, 2):
array([[1., 2.],
       [3., 4.]])
but how would I get that shape without reshaping?

First, you need to give the array a correct shape so that numpy can understand how to interpret the values passed to np.append.
Then, to prevent automatic flattening, specify the axis you wish to append on.
This code does what you intended to do:
import numpy as np
items = [(1,2),(3,4)]
new_array = np.empty((0, 2))  # zero rows, two columns
for item in items:
    new_array = np.append(new_array, [item], axis=0)
print(new_array) # [[1. 2.]
                 #  [3. 4.]]
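The loop above returns floats because the zero-row starting array defaults to float64. Here is a minimal variation, assuming you want to keep the integer dtype that np.asarray(items) gives; the dtype=int argument is my addition and not part of the original answer:

```python
import numpy as np

items = [(1, 2), (3, 4)]

# Zero rows, two columns. dtype=int is an added assumption so that
# the integer tuples are not converted to floats.
new_array = np.empty((0, 2), dtype=int)
for item in items:
    # [item] has shape (1, 2), so it is appended as one new row.
    new_array = np.append(new_array, [item], axis=0)

print(new_array.shape)  # (2, 2)
```

Each np.append call still copies the whole array, so this is only reasonable for a small number of rows.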

If you have a list of tuples, and you've decided for some reason that you hate the standard array constructors (np.array, np.asarray, etc., which, as @JohnZwinck pointed out, are probably the best answer), the most efficient approach would be to preallocate the entire array and then assign to it:
items = [(1, 2), (3, 4)]
arr = np.empty((len(items), len(items[0])))
arr[...] = items
Even if what you want is to grow an array over time, row-by-row, it has been shown through detailed timings that you're usually better off just allocating a whole new array and then copying over the old values.
So given the above arr, by this approach the most efficient way to append a row would be:
newitem = (5, 6)
oldarr = arr
arr = np.empty((oldarr.shape[0] + 1, *oldarr.shape[1:]))
arr[:-1,:] = oldarr
arr[-1,:] = newitem
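For comparison, here is a sketch that checks the manual allocate-and-copy steps against np.vstack, which also builds a new array and copies the old rows into it. I have not timed the two, so treat this as an equivalence check rather than a performance claim:

```python
import numpy as np

arr = np.array([(1, 2), (3, 4)], dtype=float)
newitem = (5, 6)

# Manual version: allocate one extra row, copy the old rows, fill the last row.
manual = np.empty((arr.shape[0] + 1, *arr.shape[1:]))
manual[:-1, :] = arr
manual[-1, :] = newitem

# One-call version: vstack treats the tuple as a single row of shape (1, 2).
stacked = np.vstack([arr, newitem])

print(np.array_equal(manual, stacked))  # True
```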

Related

How to index multi-dimensional array with another array?

Let's consider a multi-dimensional array
arr = np.zeros((3,2,4))
and some indexing array
index_arr = np.array([2, 1])
To clarify what I want to get, it's this (but I want to provide indices dynamically):
arr[2, 1] # array([0., 0., 0., 0.])
NOT this:
arr[[2, 1]] # which returns result with shape (2, 2, 4)
I would have liked to do something like this
arr[*index_arr] # using * to unpack the items of `index_arr`
but that gives a syntax error. Is there a native way to do what I'm asking for?
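One way to get this, as a minimal sketch: convert the index array to a tuple, since NumPy treats a tuple as one index per axis, whereas a list or array triggers fancy indexing along the first axis. The arange data is only there to make the selected row visible:

```python
import numpy as np

arr = np.arange(24).reshape(3, 2, 4)
index_arr = np.array([2, 1])

# tuple(index_arr) behaves like writing arr[2, 1]
row = arr[tuple(index_arr)]

print(row)  # [20 21 22 23]
```

I believe Python 3.11 and later also accept the arr[*index_arr] spelling, but the tuple form works on any version.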

Add a level to Numpy array

I have a problem with a numpy array.
In particular, suppose I have a matrix
x = np.array([[1., 2., 3.], [4., 5., 6.]])
with shape (2,3). I want to wrap each float in its own list, so as to obtain the array [[[1.], [2.], [3.]], [[4.], [5.], [6.]]] with shape (2,3,1).
I tried to convert each float to a list (i.e., x[0][0] = [x[0][0]]), but it does not work.
Can anyone help me? Thanks
What you want is adding another dimension to your numpy array. One way of doing it is using reshape:
x = x.reshape(2,3,1)
output:
[[[1.]
  [2.]
  [3.]]
 [[4.]
  [5.]
  [6.]]]
There is a function in Numpy to perform exactly what @Valdi_Bo mentions. You can use np.expand_dims and add a new dimension along axis 2, as follows:
x = np.expand_dims(x, axis=2)
Reference: the np.expand_dims documentation.
Actually, you want to add a dimension (not level).
To do it, run:
result = x[...,np.newaxis]
Its shape is (2, 3, 1), as required.
You can also assign the result back to x.
You are trying to add a new dimension to the numpy array. There are multiple ways of doing this, as other answers mentioned: np.expand_dims, np.newaxis, np.reshape, etc. But I usually use the following, as I find it the most readable, especially when vectorizing operations over multiple tensors with complex broadcasting (check this Bounty question that I solved with this method).
x[:,:,None].shape
(2,3,1)
x[None,:,None,:,None].shape
(1,2,1,3,1)
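Well, maybe this is overkill for the array you have, but another solution that copies no data is np.lib.stride_tricks.as_strided. Note that reshape, np.expand_dims and np.newaxis indexing also return views here, so as_strided mainly gives you explicit control over the strides.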
import numpy as np
x = np.array([[1., 2., 3.], [4., 5., 6.]])
newshape = x.shape[:-1] + (x.shape[-1], 1)
newstrides = x.strides + x.strides[-1:]
a = np.lib.stride_tricks.as_strided(x, shape=newshape, strides=newstrides)
results in:
array([[[1.],
        [2.],
        [3.]],
       [[4.],
        [5.],
        [6.]]])
>>> a.shape
(2, 3, 1)
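To compare the simpler approaches from the answers above, here is a small sketch that checks the resulting shape and whether each result shares memory with x (that is, whether it is a view rather than a copy). The dictionary and its labels are just for illustration:

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])

candidates = {
    "reshape": x.reshape(2, 3, 1),
    "expand_dims": np.expand_dims(x, axis=2),
    "newaxis": x[..., np.newaxis],
    "None": x[:, :, None],
}

for name, result in candidates.items():
    # For this contiguous x, every variant reports (2, 3, 1) and True.
    print(name, result.shape, np.shares_memory(x, result))
```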

Adding a New Column to an Empty NumPy Array

I'm trying to add a new column to an empty NumPy array and am facing some troubles. I've looked at a lot of other questions, but for some reason they don't seem to be helping me solve the problem I'm facing, so I decided to ask my own question.
I have an empty NumPy array such that:
array1 = np.array([])
Let's say I have data that is of shape (100, 100), and want to append each column to array1 one by one. However, if I do for example:
array1 = np.append(array1, some_data[:, 0])
array1 = np.append(array1, some_data[:, 1])
I noticed that I won't be getting a (100, 2) matrix, but a (200,) array. So I tried to specify the axis as
array1 = np.append(array1, some_data[:, 0], axis=1)
which produces an AxisError: axis 1 is out of bounds for array of dimension 1.
Next I tried to use the np.c_[] method:
array1 = np.c_[array1, some_data[:, 0]]
which gives me a ValueError: all the input array dimensions except for the concatenation axis must match exactly.
Is there any way that I would be able to add columns to the NumPy array sequentially?
Thank you.
EDIT
I learned that my initial question didn't contain enough information for others to offer help, and made this update to make up for the initial mistake.
My big objective is to make a program that selects features in a "greedy fashion." Basically, I'm trying to take the design matrix some_data, which is a (100, 100) matrix containing floating point numbers as entries, and fit a linear regression model with an increasing number of features until I find the best set of features.
For example, since I have a total of 100 features, the first round would fit the model on each of the 100, select the best one and store it, then continue with the remaining 99.
That's what I'm trying to do in my head, but I got stuck from the beginning with the problem I mentioned.
You start with a (0,) array and (n,) shaped one:
In [482]: arr1 = np.array([])
In [483]: arr1.shape
Out[483]: (0,)
In [484]: arr2 = np.array([1,2,3])
In [485]: arr2.shape
Out[485]: (3,)
np.append uses concatenate (but with some funny business when axis is not provided):
In [486]: np.append(arr1, arr2)
Out[486]: array([1., 2., 3.])
In [487]: np.append(arr1, arr2,axis=0)
Out[487]: array([1., 2., 3.])
In [489]: np.concatenate([arr1, arr2])
Out[489]: array([1., 2., 3.])
And trying axis=1
In [488]: np.append(arr1, arr2,axis=1)
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
<ipython-input-488-457b8657453e> in <module>()
----> 1 np.append(arr1, arr2,axis=1)
/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in append(arr, values, axis)
4526 values = ravel(values)
4527 axis = arr.ndim-1
-> 4528 return concatenate((arr, values), axis=axis)
AxisError: axis 1 is out of bounds for array of dimension 1
Look at the whole message - the error occurs in the concatenate step. You can't concatenate 1d arrays along axis=1.
Using np.append or even np.concatenate iteratively is slow (it creates a new array each time), and hard to initialize correctly. It is a poor substitute for the widely used list append-to-empty-list recipe.
np.c_ is also just a cover function for concatenate.
There isn't just one empty array. np.array([[]]) and np.array([[[]]]) also have 0 elements.
If you want to add a column to an array, you need to start with a 2d array, and the column also needs to be 2d.
Here's an example of a proper concatenation of 2 2d arrays:
In [490]: np.concatenate([ np.zeros((3,0),int), np.arange(3)[:,None]], axis=1)
Out[490]:
array([[0],
       [1],
       [2]])
column_stack is another cover function for concatenate that makes sure the inputs are 2d. But even with that getting an initial 'empty' array is tricky.
In [492]: np.column_stack([np.zeros(3,int), np.arange(3)])
Out[492]:
array([[0, 0],
       [0, 1],
       [0, 2]])
In [493]: np.column_stack([np.zeros((3,0),int), np.arange(3)])
Out[493]:
array([[0],
       [1],
       [2]])
np.c_ is a lot like column_stack, though implemented in a different way:
In [496]: np.c_[np.zeros(3,int), np.arange(3)]
Out[496]:
array([[0, 0],
       [0, 1],
       [0, 2]])
The basic message is that when using np.concatenate you need to pay attention to dimensions. Its variants allow you to fudge things a bit, but you really need to understand that fudging to get things right, especially when starting from this poorly defined idea of an 'empty' array.
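Applied to the question, the two options discussed above might look like the sketch below. The small some_data array is a stand-in for the (100, 100) data, and taking the first two columns is an arbitrary choice for illustration:

```python
import numpy as np

some_data = np.arange(12.0).reshape(4, 3)  # stand-in for the real data

# Option 1: start from a 2d array with the right number of rows and 0 columns.
array1 = np.empty((some_data.shape[0], 0))
for j in range(2):
    column = some_data[:, j][:, None]          # make the column 2d: shape (4, 1)
    array1 = np.concatenate([array1, column], axis=1)

# Option 2: collect 1d columns in a list and build the array once at the end.
columns = []
for j in range(2):
    columns.append(some_data[:, j])
array2 = np.column_stack(columns)

print(array1.shape, array2.shape)  # (4, 2) (4, 2)
```

Option 2 avoids creating a new array on every iteration, which is the list recipe recommended above.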
I usually use the concatenate method and do it like this:
# Some stuff
alldata = None
....
array1 = np.random.random((100,1))
if alldata is None: alldata = array1
...
array2 = np.random.random((100,1))
alldata = np.concatenate((alldata,array2),axis=1)
In case you are working with vectors:
alldata = None
....
array1 = np.random.random((100,))
if alldata is None: alldata = array1[:,np.newaxis]
...
array2 = np.random.random((100,))
alldata = np.concatenate((alldata,array2[:,np.newaxis]),axis=1)
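For the greedy selection described in the question's edit, it may be simpler not to grow an array at all: keep a list of selected column indices and slice some_data with it. The sketch below rests on several assumptions: the synthetic data, the target y, the helper residual_sum_of_squares and the choice of two rounds are all hypothetical, and plain least squares via np.linalg.lstsq stands in for whatever regression model is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
some_data = rng.normal(size=(100, 10))   # hypothetical design matrix (10 features for brevity)
# Hypothetical target that depends on columns 3 and 7 only, plus small noise.
y = 2.0 * some_data[:, 3] - 1.0 * some_data[:, 7] + rng.normal(scale=0.01, size=100)

def residual_sum_of_squares(X, y):
    """Fit least squares without intercept and return the squared residual norm."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

selected = []                                  # indices of chosen columns
remaining = list(range(some_data.shape[1]))    # indices still available

for _ in range(2):                             # two greedy rounds, for illustration
    # Try each remaining column together with those already selected.
    best = min(
        remaining,
        key=lambda j: residual_sum_of_squares(some_data[:, selected + [j]], y),
    )
    selected.append(best)
    remaining.remove(best)

array1 = some_data[:, selected]                # built once, from the index list
print(selected, array1.shape)
```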

Is there a tensorflow equivalent to np.empty?

Numpy has this helper function, np.empty, which will:
Return a new array of given shape and type, without initializing entries.
I find it pretty useful when I want to create a tensor using tf.concat since:
The number of dimensions of the input tensors must match, and all dimensions except axis must be equal.
So it comes in handy to start with an empty tensor of an expected shape. Is there any way to achieve this in tensorflow?
[edit]
A simplified example of why I want this
netInput = np.empty([0, 4])
netTarget = np.empty([0, 4])
frames_width = 2
for step in range(data.shape.as_list()[-2]-frames_width-1):
    netInput = tf.concat([netInput, data[0, step:step + frames_width, :]], -2)
    netTarget = tf.concat([netTarget, data[0, step + frames_width + 1:step + frames_width + 2, :]], -2)
In this example, if netInput or netTarget are initialized, I'll be concatenating an extra example with that initialization. And to initialize them with the first value, I need to hack the loop. Nothing major, I just wondered if there is a 'tensorflow' way to solve this.
In TF 2,
tensor = tf.reshape(tf.convert_to_tensor(()), (0, n))
worked for me.
If you're creating an empty tensor, tf.zeros will do
>>> a = tf.zeros([0, 4])
>>> tf.concat([a, [[1, 2, 3, 4], [5, 6, 7, 8]]], axis=0)
<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[1., 2., 3., 4.],
       [5., 6., 7., 8.]], dtype=float32)>
The closest thing you can do is create a variable that you do not initialize. If you use tf.global_variables_initializer() to initialize your variables, disable putting your variable in the list of global variables during initialization by setting collections=[].
For example, using the TensorFlow 1.x API:
import numpy as np
import tensorflow as tf
x = tf.Variable(np.empty((2, 3), dtype=np.float32), collections=[])
y = tf.Variable(np.empty((2, 3), dtype=np.float32))
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# y has been initialized with the content of "np.empty"
y.eval()
# x is not initialized, you have to do it yourself later
x.eval()
Here np.empty is provided to x only to specify its shape and type, not for initialization.
Now for operations such as tf.concat, you don't have to (and indeed cannot) manage the memory yourself: you cannot preallocate the output as some numpy functions allow you to. Tensorflow already manages memory and does smart tricks such as reusing a memory block for the output if it detects it can do so.

How can I initialize an empty Numpy array with a given number of dimensions?

I basically want to initialize an empty 6-tensor, like this:
a = np.array([[[[[[]]]]]])
Is there a better way than writing the brackets explicitly?
You can use empty or zeros.
For example, to create a new array of 2x3, filled with zeros, use: numpy.zeros(shape=(2,3))
You can do something like np.empty(shape = [1] * (dimensions - 1) + [0]).
Example:
>>> a = np.array([[[[[[]]]]]])
>>> b = np.empty(shape = [1] * 5 + [0])
>>> a.shape == b.shape
True
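The same idea can be wrapped in a small function. The name empty_with_ndim is hypothetical; it just packages the expression above for any number of dimensions (at least 1):

```python
import numpy as np

def empty_with_ndim(ndim):
    """Hypothetical helper: an array with `ndim` dimensions and zero elements."""
    return np.empty(shape=[1] * (ndim - 1) + [0])

b = empty_with_ndim(6)
print(b.shape, b.ndim, b.size)  # (1, 1, 1, 1, 1, 0) 6 0
```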
Iteratively adding rows of that rank-1 using np.concatenate(a,b,axis=0)
Don't. Creating an array iteratively is slow, since it has to create a new array at each step. Plus a and b have to match in all dimensions except the concatenation one.
np.concatenate((np.array([[[]]]),np.array([1,2,3])), axis=0)
will give you a dimensions error.
The only thing you can concatenate to such an array is another array with matching size-0 dimensions:
In [348]: np.concatenate((np.array([[]]),np.array([[]])),axis=0)
Out[348]: array([], shape=(2, 0), dtype=float64)
In [349]: np.concatenate((np.array([[]]),np.array([[1,2]])),axis=0)
------
ValueError: all the input array dimensions except for the concatenation axis must match exactly
In [354]: np.array([[]])
Out[354]: array([], shape=(1, 0), dtype=float64)
In [355]: np.concatenate((np.zeros((1,0)),np.zeros((3,0))),axis=0)
Out[355]: array([], shape=(4, 0), dtype=float64)
To work iteratively, start with an empty list, and append to it; then make the array at the end.
a = np.zeros((1,1,1,1,1,0)) could be concatenated on the last axis with another np.ones((1,1,1,1,1,n)) array.
In [363]: np.concatenate((a,np.array([[[[[[1,2,3]]]]]])),axis=-1)
Out[363]: array([[[[[[ 1., 2., 3.]]]]]])
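The list-first advice above can be sketched like this. The row contents are made up; the point is that the array is created once, after the loop:

```python
import numpy as np

rows = []                       # plain Python list, cheap to append to
for i in range(3):
    rows.append([i, i + 1, i + 2])

result = np.array(rows)         # single array construction at the end
print(result.shape)             # (3, 3)
```

If a particular number of dimensions is needed afterwards, the finished array can be reshaped or indexed with None, which does not require starting from an empty array of that rank.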
You could directly use the ndarray constructor:
numpy.ndarray(shape=(1,) * 6)
Or the empty variant, since it seems to be more popular:
numpy.empty(shape=(1,) * 6)
This should do it:
x = np.array([])
