Converting nested lists of data into multidimensional Numpy arrays - python

In the code below I am building up data in a nested list. After the for loop, I would like to convert it into a multidimensional NumPy array as neatly as possible. However, when I do the array conversion, it only seems to convert the outer list into an array. Even worse, when I go further I end up with dataPoints of shape (100L,), i.e. an array of lists where each list is one data point (obviously I wanted (100,3)). I have also tried numpy.asanyarray(), but I can't seem to work it out. I would really like a 3D array from my 3D list from the outset, if that is possible. If not, how can I turn the array of lists into a 2D array without having to iterate over it and convert each list?
Edit: I am also open to a better way of structuring the data from the outset if it makes processing easier. However, the data is coming over a serial port and its size is not known beforehand.
import numpy as np
import time
data = []
for _i in range(100):  # build some list of lists
    d = [np.random.rand(), np.random.rand(), np.random.rand()]
    data.append([d, time.perf_counter()])  # time.clock() was removed in Python 3.8
dataArray = np.array(data, dtype=object)  # now I have an array of [list (of data), time] pairs; recent NumPy needs dtype=object here
dataPoints = dataArray[:, 0]  # this is the data in an array of lists

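dataPoints is not a 2D array: it is a 1D object array whose elements are Python lists. Convert it into a nested list first, and np.array will then build a 2D array from it:
d = np.array(dataPoints.tolist())
Now d has shape (100, 3), as you wanted.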
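A self-contained sketch of this fix, rebuilding data the same way as in the question. It assumes Python 3.8+ (so time.perf_counter() stands in for the removed time.clock()) and a recent NumPy, which refuses to build ragged arrays unless dtype=object is given. np.stack is shown as an equivalent alternative to the tolist() round trip.

```python
import time

import numpy as np

# Rebuild the question's data: each entry is [[x, y, z], timestamp]
data = []
for _ in range(100):
    d = [np.random.rand(), np.random.rand(), np.random.rand()]
    data.append([d, time.perf_counter()])

# The [x, y, z] list and the scalar timestamp differ in depth, so NumPy
# stops at 2 dimensions and stores the inner lists as Python objects
dataArray = np.array(data, dtype=object)   # shape (100, 2)
dataPoints = dataArray[:, 0]               # 1D object array of lists

d = np.array(dataPoints.tolist())          # list of lists -> (100, 3) floats
d2 = np.stack(dataPoints)                  # stacks each (3,) row -> same result

print(d.shape)                             # (100, 3)
```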

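If a 2D array is what you want, you can flatten the data points into one sequence and reshape it (np.array() itself has no shape argument):
from itertools import chain
flat = list(chain.from_iterable(point for point, _t in data))  # x0, y0, z0, x1, ...
dataArray = np.array(flat).reshape(-1, 3)  # -1 lets NumPy infer the number of rows
Each row of dataArray is then one [x, y, z] data point; the timestamps are left out.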
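Regarding the Edit in the question, one way to avoid the object array altogether (a sketch, not taken from either answer) is to keep data points and timestamps in two separate lists while reading, and convert each list once the stream ends. read_sample() below is a hypothetical stand-in for one read from the serial port, and the fixed 100 iterations stand in for "until the port stops sending".

```python
import time

import numpy as np


def read_sample():
    """Hypothetical stand-in for reading one [x, y, z] sample from the serial port."""
    return [np.random.rand(), np.random.rand(), np.random.rand()]


points = []  # one [x, y, z] list per sample
times = []   # one timestamp per sample

for _ in range(100):  # in practice: loop until no more data arrives
    points.append(read_sample())
    times.append(time.perf_counter())

points = np.array(points)  # homogeneous list of lists -> (n, 3) float array
times = np.array(times)    # -> (n,) float array
```

Because every inner list has the same length, NumPy builds a regular 2D array directly and no tolist() round trip is needed.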

Related

How to append multiple 1D arrays to an empty array

I want to create a 2D array from multiple 1D arrays of shape (1, 7680), stacking them under each other to get a 2D array of shape (n, 7680).
Any help would be appreciated.
Code:
y=[]
t=0
movement=int(S*256)
if(S==0):
    movement=_SIZE_WINDOW
while data.shape[1]-(t*movement+_SIZE_WINDOW) > 0:
    for i in range(0, 22):
        start = t*movement
        stop = start+_SIZE_WINDOW
        signals[i,:]=data[i,start:stop]
        y=np.append(signals[i,:],y)
    t=t+1
If the shape of the array you want to create is known in advance, the simplest and most efficient approach is to preallocate it:
array_NxM = np.empty((N,M))
This allocates an uninitialized array of the desired shape (its contents are arbitrary until you write to them); you can then fill it row by row.
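A minimal sketch of the preallocate-then-fill pattern. N and M are illustrative sizes (the question has 7680 columns), and the dummy rows stand in for real data.

```python
import numpy as np

N, M = 5, 4                      # illustrative sizes
array_NxM = np.empty((N, M))     # uninitialized: values are arbitrary until written

for i in range(N):
    row = np.arange(M) + i * M   # dummy 1D row of length M
    array_NxM[i, :] = row        # written in place; the array is never reallocated
```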
Creating an array by appending 1D arrays is definitely not optimal, but an acceptable way to do it is to create a list, append the 1D arrays to it, and then convert the list to a NumPy array:
array_NxM = []
for i in range(N):  # one row per iteration; array_1xM is a 1D array of length M
    array_NxM.append(array_1xM)
array_NxM = np.array(array_NxM)  # if each row has shape (1, M), use np.vstack(array_NxM) instead
The worst way to do this is np.append. Avoid appending to a NumPy array whenever possible: every call allocates a new array and copies the entire existing contents into it.
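Applied to the question's loop, the list-then-convert approach could look like the sketch below. The sizes are assumptions for the demo: 22 channels as in the question, but a window of 8 samples and a step of 4 stand in for _SIZE_WINDOW and int(S*256), and the recording is random. The loop condition is kept from the question. Each channel's window becomes one row, in the order produced (np.append(signals[i,:], y) instead prepends, and without an axis argument it also flattens everything into 1D).

```python
import numpy as np

n_channels = 22                        # as in the question
window = 8                             # stands in for _SIZE_WINDOW (assumed small)
step = 4                               # stands in for movement = int(S*256)
data = np.random.rand(n_channels, 40)  # dummy recording: channels x samples

rows = []                              # plain Python list; appending is cheap
t = 0
while data.shape[1] - (t * step + window) > 0:  # same condition as the question
    start = t * step
    stop = start + window
    for i in range(n_channels):
        rows.append(data[i, start:stop])        # one 1D window per channel
    t += 1

y = np.array(rows)                     # single conversion at the end
print(y.shape)                         # (176, 8): 8 windows x 22 channels
```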

Random array from list of arrays by numpy.random.choice()

I have a list of arrays similar to lstB and want to pick a random collection of 2D arrays. The problem is that NumPy somehow does not treat the objects in these two lists the same way:
import numpy
lstA = [numpy.array(0), numpy.array(1)]
lstB = [numpy.array([0,1]), numpy.array([1,0])]
print(numpy.random.choice(lstA)) # returns 0 or 1
print(numpy.random.choice(lstB)) # raises ValueError: a must be 1-dimensional
Is there an elegant fix for this?
Let's call it semi-elegant...
import numpy as np
# force a 1D object array
swap = lstB[0]
lstB[0] = None
arrB = np.array(lstB, dtype=object)  # recent NumPy requires dtype=object for mixed input
# reinsert the value
arrB[0] = swap
# and clean up
lstB[0] = swap
# draw
np.random.choice(arrB)
# e.g. array([1, 0])
Explanation: the problem you encountered appears to be that NumPy, when converting the input list to an array, makes the array as deep as it can. Since all your list elements are sequences of the same length, the result is 2D. The hack shown here forces it to make a 1D array of object dtype instead, by temporarily inserting an incompatible element.
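The depth behavior is easy to check directly: NumPy turns lstA into a 1D array of two scalars but lstB into a 2D array, and the legacy numpy.random.choice only accepts 1D input.

```python
import numpy as np

lstA = [np.array(0), np.array(1)]
lstB = [np.array([0, 1]), np.array([1, 0])]

print(np.array(lstA).shape)   # (2,): 0-d elements give a 1D array
print(np.array(lstB).shape)   # (2, 2): equal-length 1D elements give a 2D array

try:
    np.random.choice(lstB)
    raised = False
except ValueError:            # choice() requires a 1-dimensional input
    raised = True
```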
However, I personally would not use this: if you draw multiple subarrays with this method, you get a 1D object array of arrays, which is probably not what you want and is tedious to convert.
So I'd actually second what one of the comments recommends, i.e. draw ints and then use advanced indexing into np.array(lstB).
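A sketch of that recommendation: stack the list into one 2D array, draw random row indices, and index with them. It uses the newer np.random.default_rng Generator API rather than the legacy np.random functions; as an alternative, Generator.choice can sample rows of a 2D array directly through its axis argument.

```python
import numpy as np

lstB = [np.array([0, 1]), np.array([1, 0])]
stacked = np.array(lstB)                   # shape (2, 2): one row per original array

rng = np.random.default_rng(seed=0)
idx = rng.integers(len(lstB), size=5)      # 5 random row indices in [0, len(lstB))
picked = stacked[idx]                      # advanced indexing -> shape (5, 2)

# Alternative: let Generator.choice sample along axis 0 directly
picked2 = rng.choice(stacked, size=5, axis=0)
```

The result is a regular 2D array, so no conversion from an array of arrays is needed afterwards.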

How to build a numpy array row by row in a for loop?

This is basically what I am trying to do:
array = np.array() # initialize the array. This is where the error described below is thrown
for i in xrange(?): # in the full version of this code, this loop goes through the length of a file. I won't know the length until I go through it. The point of the question is to see if you can build the array without knowing its exact size beforehand
    A = random.randint(0,10)
    B = random.randint(0,10)
    C = random.randint(0,10)
    D = random.randint(0,10)
    row = [A,B,C,D]
    array[i:] = row # this is supposed to add a row to the array with A,B,C,D as column values
This code doesn't work. First of all it complains: TypeError: Required argument 'object' (pos 1) not found. But I don't know the final size of the array.
Second, I know that last line is incorrect but I am not sure how to call this in python/numpy. So how can I do this?
A numpy array must be created with a fixed size. You can create a small one (e.g., one row) and then append rows one at a time, but that will be inefficient. There is no way to efficiently grow a numpy array gradually to an undetermined size. You need to decide ahead of time what size you want it to be, or accept that your code will be inefficient. Depending on the format of your data, you can possibly use something like numpy.loadtxt or various functions in pandas to read it in.
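A minimal sketch of the numpy.loadtxt route, assuming the file holds whitespace-separated numbers with one row per line. io.StringIO stands in for the real file; np.loadtxt works out the number of rows itself.

```python
import io

import numpy as np

# Stand-in for the real file: one row of four values per line
fake_file = io.StringIO("1 2 3 4\n5 6 7 8\n9 10 11 12\n")

arr = np.loadtxt(fake_file)   # float64 by default; row count inferred from the data
print(arr.shape)              # (3, 4)
```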
Use a list of 1D numpy arrays, or a list of lists, and then convert it to a numpy 2D array (or use more nesting and get more dimensions if you need to).
import numpy as np
a = []
for i in range(5):
    a.append(np.array([1,2,3])) # or a.append([1,2,3])
a = np.asarray(a) # a list of 1D arrays (or lists) becomes a 2D array
print(a.shape)
print(a)

Concatenating Numpy array to Numpy array of arrays

I'm trying to make a for loop that, on each iteration, adds an array to the end of an array of arrays, and I can't quite put my finger on how to do it.
The general idea of the program:
for x in range(0,longnumber):
    generatenewarray
    add new array to end of array
So for example, the output of:
newArray = [1,2,3]
array = [[1,2,3,4],[1,4,3]]
would be: [[1,2,3,4],[1,4,3],[1,2,3]]
If the wording is poor let me know and I can try and edit it to be better!
Is this what you need?
list_of_arrays = []
for x in range(0,longnumber):
    a = generatenewarray()  # placeholder for however each new array is produced
    list_of_arrays.append(a)
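It's not pretty, but this works. Turn the NumPy array into a list, add the new array to it as one more element, and convert the result into a new NumPy array (dtype=object is needed when the inner arrays have different lengths, as in your example):
np.array(array.tolist() + [newArray.tolist()], dtype=object)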
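Because the inner arrays in the question have different lengths (4 and 3), they cannot form a regular 2D array. A sketch of two options: keep a plain Python list of arrays (cheapest to append to), or, if a NumPy container is really needed, fill a 1D object array one slot at a time, which stops NumPy from trying to merge the rows into a deeper array.

```python
import numpy as np

list_of_arrays = [np.array([1, 2, 3, 4]), np.array([1, 4, 3])]
newArray = np.array([1, 2, 3])

# Option 1: a plain list of arrays; appending does not copy existing arrays
list_of_arrays.append(newArray)

# Option 2: a 1D object array holding one array per slot
obj = np.empty(len(list_of_arrays), dtype=object)
for i, arr in enumerate(list_of_arrays):
    obj[i] = arr   # integer index: stores the array object itself in slot i
```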

Appending rows onto a numpy matrix

I'm trying to append a 4x1 row of data onto a matrix in python. The matrix is initialized as empty, and then grows by one row during each iteration of a loop until the process ends. I won't know how many times the matrix will be appended, so initializing the array to a predetermined final size is not an option unfortunately. The issue that I'm finding with np.r_ is that the matrix and list being appended need to be the same size, which is rarely the case. Below is some pseudocode of what I've been working with.
import numpy as np
dataMatrix = np.empty([4,1])
def collectData():
    # receive data from hardware in the form of a 4x1 list
    ...
while receivingData:
    newData = collectData()
    dataMatrix = np.r_[dataMatrix, newData]  # np.r_ is indexed with [], not called
Does anyone have an idea of how to go about finding a solution to this issue?
As @hpaulj suggested, you should use a list of lists and then convert it to a NumPy array at the end. This will be at least 2x faster than building up the matrix with np.r_ or other NumPy methods:
import numpy as np
dataMatrix = []
def collectData():
    # placeholder: return one 4-element row read from the hardware
    ...
while receivingData:
    dataMatrix.append(collectData())
dataMatrix = np.array(dataMatrix)
As a side note, np.r_ joins along the first axis, so every dimension except the first must match, whereas np.c_ joins 1D inputs as columns, so only the number of rows must match. Since your matrix starts as 4x1 and each new piece has 4 elements, perhaps you used np.r_ when you should have used np.c_.
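A small sketch contrasting the two layouts, with dummy 4-element readings: collecting rows in a list gives an (n, 4) array, while np.column_stack (or growing with np.c_, as the side note suggests) puts each reading in a column, giving (4, n).

```python
import numpy as np

readings = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # dummy 4-element samples

as_rows = np.array(readings)          # one sample per row -> (3, 4)
as_cols = np.column_stack(readings)   # one sample per column -> (4, 3)

# Growing column by column with np.c_ works, but copies the matrix every time
grown = np.array(readings[0]).reshape(4, 1)
for r in readings[1:]:
    grown = np.c_[grown, r]           # append r as a new column

print(as_rows.shape, as_cols.shape, grown.shape)  # (3, 4) (4, 3) (4, 3)
```

If the 4xN orientation is what you need, collecting rows in a list and transposing once at the end (as_rows.T) gives the same result without the repeated copies.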
