Editing every value in a numpy matrix - python

I have a numpy matrix which I filled with data from a *.csv-file
csv = np.genfromtxt (file,skiprows=22)
matrix = np.matrix(csv)
This is a 64x64 matrix which looks like
print matrix
[[...,...,....]
[...,...,.....]
.....
]]
Now I need to take the logarithm (math.log10()) of every single value and save it into another 64x64 matrix.
How can I do this? I tried
matrix_lg = np.matrix(csv)
for i in range (0,len(matrix)):
    for j in range (0,len(matrix[0])):
        matrix_lg[i,j]=math.log10(matrix[i,j])
but this only edited the first array (meaning the first row) of my initial matrix.
It's my first time working with Python and I am starting to get confused.

You can just do:
matrix_lg = numpy.log10(matrix)
And it will do it for you. It's also much faster to do it this vectorized way instead of looping over every entry in python. It will also handle domain errors more gracefully.
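As a minimal sketch of the vectorized call and of what "more gracefully" can mean in practice: math.log10(0) raises a ValueError, whereas np.log10 returns -inf and emits a RuntimeWarning. The small array `values` below is made up for illustration and is not the asker's data.

```python
import numpy as np

# Made-up sample data containing a zero, which is outside the domain of log10
values = np.array([[1.0, 10.0], [100.0, 0.0]])

# Vectorized log10 over every entry; the zero becomes -inf instead of raising.
# np.errstate only silences the divide-by-zero warning inside this block.
with np.errstate(divide="ignore"):
    logged = np.log10(values)

# Alternative: compute the log only where it is defined and leave NaN elsewhere
safe = np.full_like(values, np.nan)
np.log10(values, out=safe, where=values > 0)

print(logged)
print(safe)
```

Which of the two behaviours is preferable depends on how the zeros should be treated downstream.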
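FWIW though, the issue with your posted code is that len() doesn't work the same way for matrices as it does for nested lists. Indexing an np.matrix with matrix[0] returns a 1x64 matrix rather than a 1-D row, so len(matrix[0]) is 1 and the inner loop runs only for j = 0. As suggested in the comments, you can just use matrix.shape to get the proper dims to iterate through: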
matrix_lg = np.matrix(csv)
for i in range(0,matrix_lg.shape[0]):
    for j in range(0,matrix_lg.shape[1]):
        matrix_lg[i,j]=math.log10(matrix_lg[i,j])
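The difference between len() and .shape can be checked on a small example. This is only a sketch with a made-up 3x4 matrix `m`, but it shows why the inner loop in the question was shorter than expected.

```python
import numpy as np

# Made-up 3x4 matrix standing in for the 64x64 one
m = np.matrix(np.ones((3, 4)))

row = m[0]                  # np.matrix keeps two dimensions: shape (1, 4)
n_cols_wrong = len(m[0])    # length of the first axis of a 1x4 matrix, i.e. 1
n_rows, n_cols = m.shape    # (3, 4), the dimensions actually wanted

# A plain ndarray behaves like nested lists here
a = np.asarray(m)
n_cols_array = len(a[0])    # 4

print(row.shape, n_cols_wrong, (n_rows, n_cols), n_cols_array)
```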

Related

What's the best way to iterate over a multidimensional array while tracking/doing operations on the iteration index

I need to do a lot of operations on multidimensional numpy arrays, and therefore I am experimenting to find the best approach.
So let's say i have an array like this:
A = np.random.uniform(0, 1, size = 100).reshape(20, 5)
My goal is to get the maximum value (numpy.amax()) of each row and its index. So say A[0] is something like this:
A[0] = [ 0.64570441 0.31781716 0.07268926 0.84183753 0.72194227]
I want to get the maximum and the index of that maximum, e.g. [0.84183753] and [0, 3]. No specific representation of the results is needed; this is just an example. In fact, I only need the horizontal index.
I tried using numpy's nditer object:
A_it = np.nditer(A, flags=['multi_index'], op_flags=['readwrite'])
while not A_it.finished:
    print(np.amax(A_it.value))
    print(A_it.multi_index[1])
    A_it.iternext()
I can access every element of the array and its index over the iterations that way, but I don't seem to be able to bring the numpy.amax() function and the index together syntax-wise. Can I even do it using an nditer object?
Also, in "Numpy: Beginner nditer" I read that using nditer or using iterations in numpy usually means that I am doing something wrong. But I can't find another convenient way to achieve my goal here without any iterations. Obviously I am a total beginner in numpy and Python in general, so any keyword to search for or hint is very much appreciated.
A major problem with nditer is that it iterates over each element, not each row. It's best used as a stepping stone toward a Cython or C rewrite of your code.
If you just want the maximum for each row of your array, a simple iteration or list comprehension will do nicely.
for row in A: print(np.amax(row))
or to turn it back into an array:
np.array([np.amax(row) for row in A])
But you can get the same values by giving amax an axis parameter
np.amax(A,axis=1)
np.argmax identifies the location of the maximum.
np.argmax(A,axis=1)
With the argmax values you could then select the max values as well,
ind=np.argmax(A,axis=1)
A[np.arange(A.shape[0]),ind]
(speed's about the same as repeating the np.amax call).
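Put together, the axis-based calls above might look like the following sketch. The seeded generator `rng` is an assumption added here only to make the example reproducible; the question used np.random.uniform directly.

```python
import numpy as np

# Reproducible stand-in for the 20x5 array from the question
rng = np.random.default_rng(0)
A = rng.uniform(0, 1, size=(20, 5))

# Column index of the maximum in each row (the "horizontal index")
col_idx = np.argmax(A, axis=1)

# Pick the maximum of each row by pairing every row number with its column index
row_max = A[np.arange(A.shape[0]), col_idx]

# Same values as asking for the row-wise maximum directly
assert np.array_equal(row_max, np.amax(A, axis=1))

for i, (j, value) in enumerate(zip(col_idx, row_max)):
    print(i, j, value)
```

No explicit Python loop over the elements is needed for the computation itself; the final loop is only there to print the results.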

How to get rid of zeros of each array in a list of array in Python?

I am trying to do time series data analysis on all the fracking wells in Pennsylvania, and naturally a lot of these are dry wells with 0 production. I want to create a histogram of each array in the list with the zeros removed, so each array will shrink a little.
P = [data3P, data4P, data5P, data6P, data7P, data8P, data9P, data10P]
for i in P:
    N = []
    for i in data3P:
        if i >0:
            N.append(i)
    N
I think I should do it in a for loop, but just not sure how to do that for all the arrays in the list. Shall I use a double for loop?
If you are dealing with large amounts of data, numpy is your friend. You can create a masked array (where the zeros are masked), and apply the regular histogram function, see this answer for an example.
I'm not 100% sure if this is what you need, but if you want to gather all the NumPy arrays datanP but without any zeros they might contain, you can do this:
[a[a!=0] for a in P]
It would help if you showed what one of those input arrays looks like, and what you'd like to get out of the processing you're trying to do.
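Both suggestions can be sketched on toy data. The two short arrays below are hypothetical stand-ins for data3P, data4P and so on, and the choice of three bins is arbitrary.

```python
import numpy as np

# Hypothetical stand-ins for the production arrays in the question
P = [
    np.array([0.0, 1.5, 0.0, 2.5, 3.0]),
    np.array([4.0, 0.0, 0.0, 1.0]),
]

# Boolean indexing: keep only the non-zero entries of each array
nonzero = [a[a != 0] for a in P]

# Masked-array variant: mask the zeros, then drop the masked entries
masked = [np.ma.masked_equal(a, 0).compressed() for a in P]

# One histogram (counts, bin edges) per filtered array
hists = [np.histogram(a, bins=3) for a in nonzero]

for counts, edges in hists:
    print(counts, edges)
```

A single list comprehension replaces the double for loop; each array is filtered in one vectorized step.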

Appending rows onto a numpy matrix

I'm trying to append a 4x1 row of data onto a matrix in python. The matrix is initialized as empty, and then grows by one row during each iteration of a loop until the process ends. I won't know how many times the matrix will be appended, so initializing the array to a predetermined final size is not an option unfortunately. The issue that I'm finding with np.r_ is that the matrix and list being appended need to be the same size, which is rarely the case. Below is some pseudocode of what I've been working with.
import numpy as np
dataMatrix = np.empty([4,1])
def collectData():
    # receive data from hardware in the form of a 4x1 list
    ...
while receivingData:
    newData = collectData()
    dataMatrix = np.r_(dataMatrix, newData)
Does anyone have an idea of how to go about finding a solution to this issue?
As @hpaulj suggested, you should use a list of lists and then convert to a NumPy matrix at the end. This will be at least 2x faster than building up the matrix using np.r_ or other NumPy methods.
import numpy as np
dataMatrix = []
def collectData():
    # return a 4x1 list
    ...
while receivingData:
    dataMatrix.append(collectData())
# convert once, after the loop has finished
dataMatrix = np.array(dataMatrix)
As a sidenote, with np.r_ the only requirement is that the first dimension of the matrix be equal to the first (and only, in your case) dimension of the array. Perhaps you used np.r_ when you should have used np.c_
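A runnable version of the list-then-convert pattern could look like this. The function `collect_data` and the fixed five iterations are hypothetical placeholders for the hardware read and the `while receivingData` loop, which are not specified in the question.

```python
import numpy as np

def collect_data(i):
    """Hypothetical stand-in for the hardware read: returns a list of 4 values."""
    return [i, i + 1, i + 2, i + 3]

rows = []
for i in range(5):              # placeholder for `while receivingData`
    rows.append(collect_data(i))    # cheap list append, no array copy

# Build the array once, after all rows are collected: shape (5, 4)
data_matrix = np.array(rows)
print(data_matrix.shape)
```

Appending to a Python list avoids reallocating and copying the whole array on every iteration, which is what each NumPy concatenation has to do.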

Appending arrays in numpy

I have a loop that reads through a file until the end is reached. On each pass through the loop, I extract a 1D numpy array. I want to append this array to another numpy array in the 2D direction. That is, I might read in something of the form
x = [1,2,3]
and I want to append it to something of the form
z = [[0,0,0],
[1,1,1]]
I know I can simply do z = numpy.append(z, [x], axis=0) and achieve my desired result of
z = [[0,0,0],
[1,1,1],
[1,2,3]]
My issue comes from the fact that in the first run through the loop, I don't have anything to append to yet, because the first array read in is the first row of the 2D array. I don't want to have to write an if statement to handle the first case because that is ugly. If I were working with lists I could simply do z = [] before the loop and, every time I read in an array, simply do z.append(x) to achieve my desired result. However, I can find no way of doing a similar procedure in numpy. I can create an empty numpy array, but then I can't append to it in the way I want. Can anyone help? Am I making any sense?
EDIT:
After some more research, I found another workaround that does technically do what I want, although I think I will go with the solution given by @Roger Fan, given that numpy appending is very slow. I'm posting it here just so it's out there.
I can still define z = [] before the loop. Then I append my arrays with z = np.append(z, x). This will ultimately give me something like
z = [0,0,0,1,1,1,1,2,3]
Then, because all the arrays I read in are of the same size, after the loop I can simply reshape the result to n rows of m values, for example with np.resize(z, (n, m)), and get what I'm after.
Don't do it. Read the whole file into one array, using for example numpy.genfromtxt().
With this one array, you can then loop over the rows, loop over the columns, and perform other operations using slices.
Alternatively, you can create a regular list, append a lot of arrays to that list, and in the end generate your desired array from the list using either numpy.array(list_of_arrays) or, for more control, numpy.vstack(list_of_arrays).
The idea in this second approach is "delayed array creation": find and organize your data first, and then create the desired array once, already in its final form.
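The "delayed array creation" idea can be sketched as follows. The list `lines` is a made-up stand-in for the file being read, and parsing each line with split() is an assumption about the file format.

```python
import numpy as np

# Made-up stand-in for the lines of the input file
lines = ["0 0 0", "1 1 1", "1 2 3"]

list_of_arrays = []
for line in lines:
    # Each pass produces one 1-D array, as in the question
    x = np.array([int(token) for token in line.split()])
    list_of_arrays.append(x)

# Create the 2-D array once, in its final form
z = np.vstack(list_of_arrays)

# For equal-length rows, np.array gives the same result
z_alt = np.array(list_of_arrays)

print(z)
```

No special case is needed for the first row, because the list starts out empty and the array is only built after the loop.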
As @heltonbiker mentioned in his answer, something like np.genfromtxt is going to be the best way to do this if it fits your needs. Otherwise, I suggest reading the answers to this question about appending to numpy arrays. Basically, numpy array appending is extremely slow and should be avoided whenever possible. There are two much better (and faster by about 20x) solutions:
If you know the length in advance, you can preallocate your array and assign to it.
length_of_file = 5000
results = np.empty(length_of_file)
with open('myfile.txt', 'r') as f:
    for i, line in enumerate(f):
        results[i] = processing_func(line)
Otherwise, just keep a list of lists or list of arrays and convert it to a numpy array all at once.
results = []
with open('myfile.txt', 'r') as f:
    for line in f:
        results.append(processing_func(line))
results = np.array(results)

Numpy: average over one dimension in "jagged" 3D array

Suppose I have an N*M*X-dimensional array "data", where N and M are fixed, but X is variable for each entry data[n][m].
(Edit: To clarify, I just used np.array() on the 3D python list which I used for reading in the data, so the numpy array is of dimensions N*M and its entries are variable-length lists)
I'd now like to compute the average over the X-dimension, so that I'm left with an N*M-dimensional array. Using np.average/mean with the axis argument doesn't work, so the way I'm doing it right now is just iterating over N and M and appending the manually computed average to a new list, but that just doesn't feel very "Pythonic":
avgData=[]
for n in data:
    temp=[]
    for m in n:
        temp.append(np.average(m))
    avgData.append(temp)
Am I missing something obvious here? I'm trying to freshen up my python skills while I'm at it, so interesting/varied responses are more than welcome! :)
Thanks!
What about using np.vectorize:
do_avg = np.vectorize(np.average)
data_2d = do_avg(data)
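# dtype=object is needed on recent NumPy versions to build an array of variable-length lists
data = np.array([[1,2,3],[0,3,2,4],[0,2],[1]], dtype=object).reshape(2,2)
avg=np.zeros(data.shape)
avg.flat=[np.average(x) for x in data.flat]
print(avg)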
#array([[ 2. , 2.25],
# [ 1. , 1. ]])
This still iterates over the elements of data (nothing un-Pythonic about that). But since there's nothing special about the shape or axes of data, I'm just using data.flat. While appending works well for Python lists, with numpy it is better to assign values to the elements of an existing array.
There are fast numeric methods to work with numpy arrays, but most (if not all) work with simple numeric dtypes. Here the array elements are objects (either lists or arrays), so numpy has to resort to the usual Python iteration and list operations.
For this small example, this solution is a bit faster than Zwicker's vectorize. For larger data the two solutions take about the same time.
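The two approaches can be compared side by side in a short sketch. Building the object array with np.empty and an explicit loop, and passing otypes=[float] to np.vectorize, are choices made here for robustness rather than something taken from the answers.

```python
import numpy as np

# Variable-length lists, as in the 2x2 example above
ragged = [[1, 2, 3], [0, 3, 2, 4], [0, 2], [1]]

# Fill an object array element by element so each list stays intact
data = np.empty(len(ragged), dtype=object)
for k, item in enumerate(ragged):
    data[k] = item
data = data.reshape(2, 2)

# Approach 1: np.vectorize wraps np.average and applies it to each element
do_avg = np.vectorize(np.average, otypes=[float])
avg_vec = do_avg(data)

# Approach 2: iterate over data.flat and assign into a preallocated array
avg_flat = np.zeros(data.shape)
avg_flat.flat = [np.average(x) for x in data.flat]

print(avg_vec)
print(avg_flat)
```

Both variants still loop in Python under the hood; np.vectorize is a convenience wrapper, not a speed-up.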
