Appending rows onto a numpy matrix - python

I'm trying to append a 4x1 row of data onto a matrix in python. The matrix is initialized as empty, and then grows by one row during each iteration of a loop until the process ends. I won't know how many times the matrix will be appended, so initializing the array to a predetermined final size is not an option unfortunately. The issue that I'm finding with np.r_ is that the matrix and list being appended need to be the same size, which is rarely the case. Below is some pseudocode of what I've been working with.
import numpy as np

dataMatrix = np.empty([4, 1])

def collectData():
    # receive data from hardware in the form of a 4x1 list
    ...

while receivingData:
    newData = collectData()
    dataMatrix = np.r_[dataMatrix, newData]
Does anyone have an idea of how to go about finding a solution to this issue?

As @hpaulj suggested, you should use a list of lists and then convert it to a NumPy matrix at the end. This will be at least 2x faster than building up the matrix using np.r_ or other NumPy methods.
import numpy as np

dataMatrix = []

def collectData():
    ...  # return a list of 4 values

while receivingData:
    dataMatrix.append(collectData())

dataMatrix = np.array(dataMatrix)
As a side note, with np.r_ the only requirement is that the first dimension of the matrix be equal to the first (and only, in your case) dimension of the array. Perhaps you used np.r_ when you should have used np.c_.
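For what it's worth, here is a minimal runnable sketch of both approaches, with a hard-coded list standing in for the hardware reads (an assumption, since the real collectData() isn't shown):

```python
import numpy as np

# list-append approach: grow a plain list, convert once at the end
rows = []
for _ in range(3):
    rows.append([1.0, 2.0, 3.0, 4.0])  # stand-in for one hardware read
mat = np.array(rows)
print(mat.shape)  # (3, 4)

# array-based alternative: np.vstack treats a length-4 list as one new row
m = np.empty((0, 4))
for _ in range(3):
    m = np.vstack([m, [1.0, 2.0, 3.0, 4.0]])
print(m.shape)  # (3, 4)
```

The list-append version is the one to prefer: each np.vstack call copies the whole array, so growing row by row that way is quadratic in the number of rows.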

Related

Use scipy.sparse.dok_matrix.setdiag on only a part of a dok_matrix object

Let's suppose I have a (very large) sparse matrix being a scipy.sparse.dok_matrix object. I want to set the diagonal of only a submatrix to certain value. I first thought something like this would work:
import scipy.sparse as sp

num = 20  # num can go up to large numbers
A = sp.dok_matrix((num, num))
A[num//2:-1, num//2:-1].setdiag(2)
but this only leads to an empty matrix (because of the way the matrix is stored internally using arrays, I suppose?). I know that for this small example I could use setdiag on the whole matrix and plug in an array with zeros at the beginning, but this won't be sufficient for larger matrix dimensions, as the array would get too big.
I also tried:
A[num//2:-1,num//2:-1] = 2*sp.eye((num-1)//2)
This does what I want it to do, but much too slowly. Is there a way to get the same result faster (i.e. without setting all the entries of the submatrix explicitly)?
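A hedged sketch of one alternative: dok_matrix is designed for cheap per-element assignment, so instead of slicing (which constructs a new copy) you can write the relevant diagonal entries directly:

```python
import scipy.sparse as sp

num = 20  # stands in for a much larger dimension

A = sp.dok_matrix((num, num))
# A[num//2:-1, num//2:-1].setdiag(2) would touch rows/cols num//2 .. num-2,
# so set exactly those diagonal entries, one assignment each
for i in range(num // 2, num - 1):
    A[i, i] = 2
```

This costs one dict insertion per entry and never materializes a dense array, though I haven't benchmarked it against the sp.eye assignment for very large num.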

Python: return the row index of the minimum in a matrix

I want to print the index of the row containing the minimum element of the matrix.
my matrix is matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
and the code
import numpy as np

matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
a = np.array(matrix)
buff_min = a.argmin(axis=0)
print(buff_min)  # index of the row containing the minimum element
min = np.array(matrix[buff_min])
print(str(min.min(axis=0)))  # print the minimum of that row
print(min.argmin(axis=0))  # index of the minimum
print(matrix[buff_min])  # print the whole row containing the minimum
after running, my result is
1
3
1
[22, 3, 4, 12]
The first number should be 2, because the minimum is 2 and it sits in the third list ([34,6,4,5,8,2]), but it returns 1. It returns 3 as the minimum of the matrix.
What's the error?
I am not sure which version of Python you are using; I tested it with Python 2.7 and 3.2. As mentioned, your syntax for argmin is not correct; it should be in the format
import numpy as np
np.argmin(array_name, axis)
Next, NumPy knows about arrays of arbitrary objects, but it is optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, it is better to use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.
If you really want flexible Numpy arrays, use something like this:
np.array([[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]], dtype=object)
However, this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of NumPy (vector processing, locality, slicing, etc.).
Also worth mentioning: resizing your NumPy array might work. I haven't tested it, but conceptually it should be an easy solution. Still, I would prefer a nested list for this kind of input matrix.
Does this work?
np.where(a == a.min())[0][0]
Note that all rows of the matrix need to contain the same number of elements.
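As a quick check of that expression, here it is on a rectangular version of the data (the last row trimmed to four elements, since ragged rows won't form a proper 2-D array):

```python
import numpy as np

a = np.array([[22, 33, 44, 55],
              [22, 3, 4, 12],
              [34, 6, 4, 5]])
row = np.where(a == a.min())[0][0]
print(row)  # 1  (the minimum, 3, sits in the second row)

# equivalent without building a boolean mask:
r, c = np.unravel_index(a.argmin(), a.shape)
print(r, c)  # 1 1
```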

How to build a numpy array row by row in a for loop?

This is basically what I am trying to do:
array = np.array()  # initialize the array. This is where the error described below is thrown
for i in xrange(?):  # in the full version of this code, this loop goes through the length of a file. I won't know the length until I go through it. The point of the question is to see if you can build the array without knowing its exact size beforehand
    A = random.randint(0, 10)
    B = random.randint(0, 10)
    C = random.randint(0, 10)
    D = random.randint(0, 10)
    row = [A, B, C, D]
    array[i:] = row  # this is supposed to add a row to the array with A, B, C, D as column values
This code doesn't work. First of all, it complains: TypeError: Required argument 'object' (pos 1) not found. But I don't know the final size of the array.
Second, I know that last line is incorrect but I am not sure how to call this in python/numpy. So how can I do this?
A numpy array must be created with a fixed size. You can create a small one (e.g., one row) and then append rows one at a time, but that will be inefficient. There is no way to efficiently grow a numpy array gradually to an undetermined size. You need to decide ahead of time what size you want it to be, or accept that your code will be inefficient. Depending on the format of your data, you can possibly use something like numpy.loadtxt or various functions in pandas to read it in.
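For example, a small sketch of the numpy.loadtxt route, with an in-memory buffer standing in for the real data file (an assumption, since the file format isn't shown):

```python
import io
import numpy as np

# three whitespace-separated rows of four numbers; a real call would
# pass a filename instead of the StringIO buffer
text = io.StringIO("1 2 3 4\n5 6 7 8\n9 10 11 12\n")
arr = np.loadtxt(text)
print(arr.shape)  # (3, 4) -- the row count was never specified up front
```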
Use a list of 1D numpy arrays, or a list of lists, and then convert it to a numpy 2D array (or use more nesting and get more dimensions if you need to).
import numpy as np

a = []
for i in range(5):
    a.append(np.array([1, 2, 3]))  # or a.append([1, 2, 3])

a = np.asarray(a)  # a list of 1D arrays (or lists) becomes a 2D array
print(a.shape)
print(a)

Editing every value in a numpy matrix

I have a numpy matrix which I filled with data from a *.csv-file
csv = np.genfromtxt(file, skiprows=22)
matrix = np.matrix(csv)
This is a 64x64 matrix which looks like
print matrix
[[...,...,....]
[...,...,.....]
.....
]]
Now I need to take the logarithm math.log10() of every single value and save it into another 64x64 matrix.
How can I do this? I tried
matrix_lg = np.matrix(csv)
for i in range(0, len(matrix)):
    for j in range(0, len(matrix[0])):
        matrix_lg[i, j] = math.log10(matrix[i, j])
but this only edited the first array (meaning the first row) of my initial matrix.
It's my first time working with python and I start getting confused.
You can just do:
matrix_lg = numpy.log10(matrix)
And it will do it for you. It's also much faster to do it this vectorized way instead of looping over every entry in Python, and it will handle domain errors more gracefully.
FWIW, though, the issue with your posted code is that len() doesn't work the same for matrices as it does for nested lists. As suggested in the comments, you can just use matrix.shape to get the proper dims to iterate through:
matrix_lg = np.matrix(csv)
for i in range(0, matrix_lg.shape[0]):
    for j in range(0, matrix_lg.shape[1]):
        matrix_lg[i, j] = math.log10(matrix_lg[i, j])
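A quick sanity check that the vectorized call and the explicit loop agree on a small example:

```python
import math
import numpy as np

m = np.array([[1.0, 10.0], [100.0, 1000.0]])

vectorized = np.log10(m)  # one call, no Python-level loop

looped = np.empty_like(m)
for i in range(m.shape[0]):
    for j in range(m.shape[1]):
        looped[i, j] = math.log10(m[i, j])

print(np.allclose(vectorized, looped))  # True
```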

Converting nested lists of data into multidimensional Numpy arrays

In the code below I am building data up in a nested list. After the for loop, I would like to cast it into a multidimensional NumPy array as neatly as possible. However, when I do the array conversion, it only seems to convert the outer list into an array. Even worse, when I continue downward I wind up with dataPoints having shape (100L,): an array of lists, where each list is my data (obviously I wanted a (100,3)). I have tried fooling with numpy.asanyarray() as well, but I can't seem to work it out. I would really like a 3d array from my 3d list from the outset, if that is possible. If not, how can I get the array of lists into a 2d array without having to iterate and convert them all?
Edit: I am also open to better way of structuring the data from the outset if it makes processing easier. However, it is coming over a serial port and the size is not known beforehand.
import numpy as np
import time

data = []
for _i in range(100):  # build some list of lists
    d = [np.random.rand(), np.random.rand(), np.random.rand()]
    data.append([d, time.clock()])

dataArray = np.array(data)  # now I have an array of lists of a list (of data) and a time
dataPoints = dataArray[:,0]  # this is the data in an array of lists
dataPoints is not a 2d list. Convert it first into a 2d list and then it will work:
d=np.array(dataPoints.tolist())
Now d is (100,3) as you wanted.
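A small sketch of that fix, building a stand-in for dataPoints (a 1-D object array holding lists) and converting it:

```python
import numpy as np

# mimic dataPoints: shape (3,) object array whose elements are lists
pts = np.empty(3, dtype=object)
pts[:] = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(pts.shape)  # (3,)

d = np.array(pts.tolist())  # list of lists -> proper 2-D array
print(d.shape)  # (3, 3)
```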
If a 2d array is what you want:
from itertools import chain
dataArray = np.array(list(chain(*data))).reshape(100, -1)
I didn't work out the code (note that np.array takes no shape argument, hence the reshape), so you may have to change the column/row ordering to get the shape to match.