Creating a numpy matrix with dtypes

I want to create a numpy matrix with three columns, in which the first two columns contain integers and the third column contains floats. I want to start with an empty matrix, and add a single row every time in a for loop. However, I cannot get it to work to add a row to a numpy matrix with a specific data type. This is the code I started with:
import numpy as np
def grow_table():
    dat_dtype = {
        'names' : ['A', 'B', 'C'],
        'formats' : ['i', 'i', 'd']}
    S = np.zeros(0, dat_dtype)
    X = np.array([1, 2, 3.5], dat_dtype)
    S = np.vstack((S, X))
if __name__ == '__main__':
    grow_table()
However, this gives a TypeError: expected a readable buffer object.
I then change the line in which I define the row as follows:
X = np.array((1, 2, 3.5), dat_dtype)
This line is accepted. However, now X is a tuple. If I try to print X[0], I end up with an IndexError: 0-d arrays can't be indexed. Furthermore, I can't add X to S, it will give me a ValueError: all the input array dimensions except for the concatenation axis must match exactly.
Next, I remove the names from the data type; in this case I end up with a ValueError: entry not a 2- or 3- tuple.
Am I on the right track with this problem, or should I try a completely different approach?

I'm not a huge fan of hybrid dtypes; you could instead use separate arrays, arrays in a dictionary, or pandas DataFrames. Anyway, here is how you can do it:
X = np.array([(1, 2, 3.5)], dat_dtype)
S = np.vstack((S[:,None], X, X, X))
Restacking each iteration is generally slow, and you may be better off making a list of the 1-row arrays and vstack-ing them at the end, or creating the array with known size and assigning to the elements.
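The "list of 1-row arrays" idea mentioned above could look roughly like the following. This is only a sketch: the dtype is copied from the question, and the row values are made up for illustration.

```python
import numpy as np

# Field names and formats follow the question's dtype.
dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})

rows = []
for i in range(3):
    # Each row is a 1-element structured array; note the tuple inside the list.
    rows.append(np.array([(i, 2 * i, i + 0.5)], dtype=dt))

# A single concatenation at the end instead of restacking on every iteration.
S = np.concatenate(rows)
print(S.shape)
print(S['C'])
```

Appending to a Python list is cheap, so the array is only built once, after the loop.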

I'm not a fan of growing arrays incrementally, but here's a way to do it:
import numpy as np
def grow_table():
    dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})
    S = np.zeros(0, dtype=dt)
    for i in range(5):
        # a tuple (not a list) fills the fields of one record; X is 0-d
        X = np.array((i, 2*i, i+.5), dtype=dt)
        S = np.hstack((S, X))
    return S
if __name__ == '__main__':
    S = grow_table()
    print(S)
    print(S['A'])
producing:
[(0, 0, 0.5) (1, 2, 1.5) (2, 4, 2.5) (3, 6, 3.5) (4, 8, 4.5)]
[0 1 2 3 4]
S starts with shape (0,). X has shape (); it is 0d. In the end S has shape (5,). We have to use hstack because we are creating a 1d array; an array of tuples. That's what you get with a dtype like this. Also when assigning values to arrays like this, the values need to be in a tuple, not a list.
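The shapes described above can be checked directly. A small sketch, reusing the dtype from the question with one arbitrary record:

```python
import numpy as np

dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})

S = np.zeros(0, dtype=dt)             # empty 1-d structured array
X = np.array((1, 2, 3.5), dtype=dt)   # a tuple gives a single 0-d record
print(S.shape)   # (0,)
print(X.shape)   # ()

# hstack promotes the 0-d record to 1-d before concatenating.
S = np.hstack((S, X))
print(S.shape)   # (1,)
print(S[0]['C']) # 3.5
```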
A better incremental build is:
def make_table(N=5):
    dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})
    S = np.zeros(N, dtype=dt)
    for i in range(N):
        S[i] = (i, 2*i, i+.5)
    return S
or even using a list of tuples:
def better(N=5):
    dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})
    L = [(i, 2*i, i+.5) for i in range(N)]
    return np.array(L, dtype=dt)
For CSV output:
S = better()
np.savetxt('S.txt', S, fmt='%d, %d, %f')
produces:
0, 0, 0.500000
1, 2, 1.500000
...
Trying to savetxt a (N,1) array produces one or more errors.
savetxt attempts to write, roughly:
for row in S:
    write(fmt % row)
With the (N,) array, a row is (0, 0, 0.5), but for (N,1) it is [(0, 0, 0.5)].
np.savetxt('S.txt', S, fmt='%s')
works, producing
(0, 0, 0.5)
(1, 2, 1.5)
...
But you don't need this dtype if you just want to save 2 columns of ints and one float. Just let the fmt do all the work:
def simple(N=5):
    return np.array([(i, 2*i, i+.5) for i in range(N)])
S = simple()
np.savetxt('S.txt', S, fmt='%d, %d, %f')
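To see what savetxt writes without creating a file, the output can be sent to an in-memory buffer. This is a sketch under the assumption that an io.StringIO object is an acceptable target (savetxt accepts file handles); the three rows are illustrative.

```python
import io
import numpy as np

dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})
S = np.array([(i, 2 * i, i + 0.5) for i in range(3)], dtype=dt)

# One format string with one specifier per field of the (N,) structured array.
buf = io.StringIO()
np.savetxt(buf, S, fmt='%d, %d, %f')
text = buf.getvalue()
print(text)
```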

Related

Return 4 row of np array where the values are the biggest in column 1

I have the following array MyArray :
[['AZ' 0.144]
['RZ' 14.021]
['BH' 1003.487]
['NE' 1191.514]
['FG' 550.991]
['MA' nan]]
The array's shape is:
MyArray.shape
(6,2)
How would I return the 4 rows with the largest values?
So the output would be :
[['RZ' 14.021]
['BH' 1003.487]
['NE' 1191.514]
['FG' 550.991]]
I tried :
MyArray[np.argpartition(MyArray, -2)][:-4]
But this returns an error:
TypeError: '<' not supported between instances of 'float' and 'str'
What am I doing wrong ?

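Drop the NaN rows, sort by the second column, and take the last 4 rows:
import numpy as np
# mixing strings and floats without dtype=object gives an array of strings
a = np.array(
    [['AZ', 0.144],
     ['RZ', 14.021],
     ['BH', 1003.487],
     ['NE', 1191.514],
     ['FG', 550.991],
     ['MA', np.nan]],
)
# remove rows whose second column is NaN
a = a[~np.isnan(a[:, 1].astype(float))]
# sort rows by the numeric value of the second column (ascending)
srt = a[a[:, 1].astype(float).argsort()]
print(srt[-4:, :])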

Let's start with a remark on how to create MyArray:
You have to pass dtype=object, otherwise the array is of <U8 type.
Start by setting the number of rows to retrieve:
n = 4
Then compute the result:
result = MyArray[np.argpartition(MyArray[:, 1], n)[:n]]
The result is:
array([['AZ', 0.144],
       ['RZ', 14.021],
       ['FG', 550.991],
       ['BH', 1003.487]], dtype=object)
How this code works:
np.argpartition(MyArray[:, 1], n) retrieves array([0, 1, 4, 2, 3, 5], dtype=int64).
First 4 elements are indices of rows with 4 lowest values in column 1.
…[:n] - leaves only the indices of the lowest rows.
MyArray[…] - retrieves the indicated rows.
Another possible solution, perhaps easier to read:
result = np.take(MyArray, np.argpartition(MyArray[:, 1], n)[:n], axis=0)
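Note that the argpartition call above picks the n lowest values, while the question asks for the largest. A possible variant for the n largest, skipping the NaN row, is sketched below; the variable names values, keep and top are made up for this example, and the final np.sort is only there to preserve the original row order.

```python
import numpy as np

MyArray = np.array(
    [['AZ', 0.144],
     ['RZ', 14.021],
     ['BH', 1003.487],
     ['NE', 1191.514],
     ['FG', 550.991],
     ['MA', np.nan]], dtype=object)

n = 4
values = MyArray[:, 1].astype(float)   # numeric view of column 1
keep = ~np.isnan(values)               # drop rows with NaN
rows, vals = MyArray[keep], values[keep]

# argpartition with -n places the indices of the n largest values last.
idx = np.argpartition(vals, -n)[-n:]
top = rows[np.sort(idx)]               # restore original row order
print(top)
```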

Divide each dimension with different number from a list in numpy

I have an ndarray with shape (3,3,3) and a list of 3 numbers.
I want to divide each slice along the first dimension by the corresponding number in the list: the first slice by the first number, the second slice by the second number, and the third slice by the third number.
Example:
>>> np.random.rand(3,3,3)
array([[[0.90428811, 0.60637664, 0.45090308],
        [0.17400851, 0.49163535, 0.62370288],
        [0.58701608, 0.91207839, 0.69364496]],
       [[0.85290321, 0.85170489, 0.48792597],
        [0.02602198, 0.91088298, 0.14882673],
        [0.63354821, 0.21764451, 0.30760075]],
       [[0.64833375, 0.13583598, 0.50561519],
        [0.42832468, 0.91146014, 0.41627495],
        [0.71238947, 0.37868578, 0.05874898]]])
and the list:
lst=[0.215, 0.561,0.724]
I want the output result to be the results of this:
[[[0.90428811/0.215, 0.60637664/0.215, 0.45090308/0.215],
  [0.17400851/0.215, 0.49163535/0.215, 0.62370288/0.215],
  [0.58701608/0.215, 0.91207839/0.215, 0.69364496/0.215]],
 [[0.85290321/0.561, 0.85170489/0.561, 0.48792597/0.561],
  [0.02602198/0.561, 0.91088298/0.561, 0.14882673/0.561],
  [0.63354821/0.561, 0.21764451/0.561, 0.30760075/0.561]],
 [[0.64833375/0.724, 0.13583598/0.724, 0.50561519/0.724],
  [0.42832468/0.724, 0.91146014/0.724, 0.41627495/0.724],
  [0.71238947/0.724, 0.37868578/0.724, 0.05874898/0.724]]]
I have tried to do something like this (arr is the ndarray):
nums=np.arange(3)
for n in nums:
    arr[i]=arr[i]/lst[i]
but got error:
IndexError: only integers, slices (:), ellipsis (...),
numpy.newaxis (None) and integer or boolean arrays are valid indices

Simply do this. Indexing lst with [:, None, None] gives it shape (3, 1, 1), which broadcasts against the shape of a.
Note that None is simply an alias for np.newaxis.
import numpy as np
a = np.random.randn(3,3,3)
lst = np.array([0.215, 0.561,0.724])
a / lst[:, None, None]

Use index broadcasting:
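import numpy as np
a = np.random.rand(3,3,3)
lst = np.array([0.215, 0.561, 0.724])  # must be an array (not a list) to index with np.newaxis
res = a / lst[:, np.newaxis, np.newaxis]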
The reason this works is that lst[:, np.newaxis, np.newaxis] generates an array with shape (3, 1, 1) and numpy expands any dimension of size one to the required size (by repeating the element) for many common operations. This process is called broadcasting.
So in our example here, for the division the result of lst[:, np.newaxis, np.newaxis] would be expanded to:
[[[0.215, 0.215, 0.215],
  [0.215, 0.215, 0.215],
  [0.215, 0.215, 0.215]],
 [[0.561, 0.561, 0.561],
  [0.561, 0.561, 0.561],
  [0.561, 0.561, 0.561]],
 [[0.724, 0.724, 0.724],
  [0.724, 0.724, 0.724],
  [0.724, 0.724, 0.724]]]
Note that this expansion is happening only conceptually and numpy will not allocate more memory just to fill it with the same value over and over again.
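The broadcasting behaviour described above can be checked against an explicit per-slice division. This is a sketch with an arbitrary input array; np.broadcast_to is used only to make the conceptual expansion visible, and np.shares_memory to illustrate that no copy of the data is made.

```python
import numpy as np

a = np.arange(27, dtype=float).reshape(3, 3, 3)
lst = np.array([0.215, 0.561, 0.724])

# Broadcasting: (3, 3, 3) / (3, 1, 1)
res = a / lst[:, np.newaxis, np.newaxis]

# Same result computed slice by slice along the first axis.
expected = np.stack([a[i] / lst[i] for i in range(3)])
print(np.allclose(res, expected))

# The "expanded" divisor is a read-only view on lst, not a new full-size array.
expanded = np.broadcast_to(lst[:, np.newaxis, np.newaxis], a.shape)
print(expanded.shape, np.shares_memory(expanded, lst))
```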

This is not a very elegant solution (because of how lst is declared), but it works:
import numpy as np
arr = np.array([[[0.90428811, 0.60637664, 0.45090308],
                 [0.17400851, 0.49163535, 0.62370288],
                 [0.58701608, 0.91207839, 0.69364496]],
                [[0.85290321, 0.85170489, 0.48792597],
                 [0.02602198, 0.91088298, 0.14882673],
                 [0.63354821, 0.21764451, 0.30760075]],
                [[0.64833375, 0.13583598, 0.50561519],
                 [0.42832468, 0.91146014, 0.41627495],
                 [0.71238947, 0.37868578, 0.05874898]]])
lst = [[[0.215]], [[0.561]], [[0.724]]]  # nested so that it has shape (3, 1, 1)
div = np.divide(arr, lst)
print(div)
The output will be:
[[[4.20599121 2.82035647 2.09722363]
[0.80934191 2.28667605 2.90094363]
[2.73030735 4.24222507 3.22625563]]
[[1.52032658 1.51819053 0.86974326]
[0.04638499 1.62367733 0.26528829]
[1.12931945 0.38795813 0.54830793]]
[[0.8954886 0.18761876 0.69836352]
[0.59160867 1.25892285 0.5749654 ]
[0.98396336 0.52304666 0.081145 ]]]

How to use np.where() to create a new array of specific rows?

I have an array (msaarr) of 1700 values, ranging from approximately 0 to 150. I know that 894 of these values should be less than 2, and I wish to create a new array containing only these values.
So far, I have attempted this code:
Combined = np.zeros(shape=(894,8))
for i in range(len(Spitzer)): #len(Spitzer) = 1700
    index = np.where(msaarr <= 2)
    Combined[:,0] = msaarr[index]
The reason there are eight columns is because I have more data associated with each value in msaarr that I also want to display. msaarr was created using several lines of code, which is why I haven't mentioned them here, but it is an array with shape (1700,1) with type float64.
The problem I'm having is that if I print msaarr[index], then I get an array of shape (893,), but when I attempt to assign this as my zeroth column, I get the error
ValueError: could not broadcast input array from shape (1699) into shape (894)
I also attempted
Combined[:,0] = np.extract(msaarr <= 2, msaarr)
Which gave the same error.
I thought at first this might just be some confusion with Python's zero-indexing, so I tried changing the shape to 893, and also tried to assign to a different column Combined[:,1], but I have the same error every time.
Alternatively, when I try:
Combined[:,1][i] = msaarr[index][i]
I get the error:
IndexError: index 894 is out of bounds for axis 0 with size 894
What am I doing wrong?
EDIT:
A friend pointed out that I might not be calling index correctly because it is a tuple, and so his suggestion was this:
index = np.where(msaarr < 2)
Combined[:,0] = msaarr[index[0][:]]
But I am still getting this error:
ValueError: could not broadcast input array from shape (893,1) into shape (893)
How can my shape be (893) and not (893, 1)?
Also, I did check, and len(index[0][:]) = 893, and len(msaarr[index[0][:]]) = 893.
The full code as of last attempts is:
import numpy as np
from astropy.io import ascii
from astropy.io import fits
targets = fits.getdata('/Users/vcolt/Dropbox/ATLAS source matches/OzDES.fits')
Spitzer = ascii.read(r'/Users/vcolt/Desktop/Catalogue/cdfs_spitzer.csv', header_start=0, data_start=1)
## Find minimum separations, indexed.
RADiffArr = np.zeros(shape=(len(Spitzer),1))
DecDiffArr = np.zeros(shape=(len(Spitzer),1))
msaarr = np.zeros(shape=(len(Spitzer),1))
Combined= np.zeros(shape=(893,8))
for i in range(len(Spitzer)):
    x = Spitzer["RA_IR"][i]
    y = Spitzer["DEC_IR"][i]
    sep = abs(np.sqrt(((x - targets["RA"])*np.cos(np.array(y)))**2 + (y - targets["DEC"])**2))
    minsep = np.nanmin(sep)
    minseparc = minsep*3600
    msaarr[i] = minseparc
    min_positions = [j for j, p in enumerate(sep) if p == minsep]
    x2 = targets["RA"][min_positions][0]
    RADiff = x*3600 - x2*3600
    RADiffArr[i] = RADiff
    y2 = targets["DEC"][min_positions][0]
    DecDiff = y*3600 - y2*3600
    DecDiffArr[i] = DecDiff
index = np.where(msaarr < 2)
print msaarr[index].shape
Combined[:,0] = msaarr[index[0][:]]
I get the same error whether index = np.where(msaarr < 2) is in or out of the loop.

Take a look at using numpy.take in combination with numpy.where.
inds = np.where(msaarr <= 2)
new_msaarr = np.take(msaarr, inds)
If it is a multi-dimensional array, you can also add the axis keyword to take slices along that axis.

I think the loop is not in the right place. np.where() returns the indices of the elements that match the condition you specified.
This should suffice:
index = np.where(msaarr <= 2)
Since index holds the matching indices, you need to loop over it and fill the values into Combined[:,0].
One more thing: you said that 894 values should be less than 2, but in the code you test for less than or equal to 2.

np.where(condition) returns a tuple of arrays containing the indices of the elements that satisfy your condition.
To get an array of the elements satisfying your condition, use a boolean mask:
new_array = msaarr[msaarr <= 2]
For example:
>>> x = np.random.randint(0, 10, (4, 4))
>>> x
array([[1, 6, 8, 4],
[0, 6, 6, 5],
[9, 6, 4, 4],
[9, 6, 8, 6]])
>>> x[x>2]
array([6, 8, 4, 6, 6, 5, 9, 6, 4, 4, 9, 6, 8, 6])
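Applied to the question's situation, where msaarr has shape (N, 1), a boolean mask built on the single column avoids the (893,1) versus (893,) mismatch. The following is a sketch with a tiny made-up msaarr; the name combined and the threshold are taken from the question, the data values are not.

```python
import numpy as np

# Stand-in for the question's (1700, 1) array; values are illustrative.
msaarr = np.array([[0.5], [3.0], [1.2], [150.0], [1.9]])

# Build the mask on the 1-d column so that the selection is 1-d as well.
mask = msaarr[:, 0] < 2

# Size the output from the mask instead of hard-coding the row count.
combined = np.zeros((mask.sum(), 8))
combined[:, 0] = msaarr[mask, 0]

# np.where on the 1-d mask gives the matching row numbers, if they are needed.
rows = np.where(mask)[0]
print(combined[:, 0], rows)
```

The same mask (or rows) can then be used to fill the remaining columns from the other per-row arrays.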

numpy / python - indexing by arrays with duplicates

I'm trying to make a 3D histogram. Initially h = zeros((6,6,8)).
I'll explain my problem with an example. Suppose I have 3 lists of coordinates for h, each list for one dimension:
x = array([2,1,0,1,2,2])
y = array([1,3,0,3,2,1])
z = array([6,2,0,2,5,6])
(the coordinates (x[0],y[0],z[0]) and (x[5],y[5],z[5]) are duplicates, and so are (x[1],y[1],z[1]) and (x[3],y[3],z[3]))
and also a list of corresponding quantities to accumulate into h:
q = array([1,2,5,9,8,7])
I tried h[x,y,z] += q, but it does not work: only q[5] = 7 is added to h[2,1,6], and q[0] = 1 is not.
How can I work around this? Thank you.

IIUC, you want np.add.at. To quote the docs: "For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once."
For example:
>>> np.add.at(h, (x, y, z), q)  # pass the index arrays as a tuple
>>> for i, val in np.ndenumerate(h):
...     if val: print(i, val)
...
((0, 0, 0), 5.0)
((1, 3, 2), 11.0)
((2, 1, 6), 8.0)
((2, 2, 5), 8.0)
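For comparison, here is a self-contained sketch using the data from the question that contrasts the buffered h[x,y,z] += q with np.add.at. The second array h_buffered is introduced only for this illustration.

```python
import numpy as np

x = np.array([2, 1, 0, 1, 2, 2])
y = np.array([1, 3, 0, 3, 2, 1])
z = np.array([6, 2, 0, 2, 5, 6])
q = np.array([1, 2, 5, 9, 8, 7])

# Buffered fancy-index addition: duplicate coordinates are written only once.
h_buffered = np.zeros((6, 6, 8))
h_buffered[x, y, z] += q

# Unbuffered accumulation: every occurrence of a coordinate contributes.
h = np.zeros((6, 6, 8))
np.add.at(h, (x, y, z), q)

print(h_buffered[2, 1, 6], h[2, 1, 6])
print(h_buffered[1, 3, 2], h[1, 3, 2])
```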

merge two numpy.array without a loop

I have two numpy arrays and want to get the following results efficiently:
1. Add the elements of b to a's sub-arrays
a=numpy.array([(1,2,3),(1,2,3)])
b=numpy.array([0,0])
->
c=[(0,1,2,3),(0,1,2,3)]
code in a loop
a=numpy.array([(1,2,3),(1,2,3)])
b=numpy.array([0,0])
c=numpy.zeros((2, 4))  # the shape must be passed as a tuple
idx=0
for x in a:
    c[idx]=(b[idx], a[idx][0],a[idx][1],a[idx][2])
    idx = idx+1
and
2. Get a 2-D array with shape (a.size*b.size, 2) from two arrays
a=numpy.array([(1,2)])
b=numpy.array([(3,4)])
->
c=[(1,3),(1,4),(2,3),(2,4)]
code in a loop
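a=numpy.array([(1,2)])
b=numpy.array([(3,4)])
c=numpy.zeros((a.size*b.size, 2))  # the shape must be passed as a tuple
idx=0
for x in a.ravel():  # iterate over the elements, not the single row
    for y in b.ravel():
        c[idx]=(x,y)
        idx = idx+1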

For the first problem, you can define b differently and use numpy.hstack:
a = numpy.array([(1,2,3),(1,2,3)])
b = numpy.array([[0],[0]])
numpy.hstack((b,a))
Regarding the second problem, I would probably use sza's answer and create the numpy array from that result, if necessary. That technique was suggested in an old Stack Overflow question.

For the first one, you can do
>>> a=numpy.array([(1,2,3),(1,2,3)])
>>> b=numpy.array([0,0])
>>> [tuple(numpy.insert(x, 0, y)) for (x,y) in zip(a,b)]
[(0, 1, 2, 3), (0, 1, 2, 3)]
For the 2nd one, you can get the 2-D array like this
>>> a=numpy.array([(1,2)])
>>> b=numpy.array([(3,4)])
>>> import itertools
>>> c = list(itertools.product(a.tolist()[0], b.tolist()[0]))
>>> c
[(1, 3), (1, 4), (2, 3), (2, 4)]
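Both results can also be built without Python-level loops. The following is one possible sketch, not the approach from the answers above: it uses np.column_stack for the first problem and np.meshgrid plus a reshape for the second, and it assumes the inputs of the second problem are flattened to 1-D first.

```python
import numpy as np

# Problem 1: prepend the elements of b as a new first column of a.
a = np.array([(1, 2, 3), (1, 2, 3)])
b = np.array([0, 0])
c1 = np.column_stack((b, a))
print(c1)

# Problem 2: all pairs (x, y) with x taken from a2 and y taken from b2.
a2 = np.array([(1, 2)]).ravel()
b2 = np.array([(3, 4)]).ravel()
grid_a, grid_b = np.meshgrid(a2, b2, indexing='ij')
c2 = np.stack((grid_a, grid_b), axis=-1).reshape(-1, 2)
print(c2)
```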
