I have what I'm quite sure is a simple question, but I'm not having much luck finding an explanation online.
I have an array of flux values and a corresponding array of time values. Obviously those two arrays are one-to-one (one flux value for each time value). However, some of my flux values are NaNs.
My question is this: How do I remove the corresponding values from the time array when I remove the NaNs from the flux array?
These arrays are large enough (several thousand entries) that it would be exceedingly cumbersome to do it by hand.
You could try boolean indexing:
In [13]: time
Out[13]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
In [15]: flux
Out[15]: array([ 1., 1., 1., 1., 1., nan, nan, nan, 1., 1., 1.])
In [16]: time2 = time[~np.isnan(flux)]
In [17]: flux2 = flux[~np.isnan(flux)]
In [18]: time2
Out[18]: array([ 0., 1., 2., 3., 4., 8., 9., 10.])
In [19]: flux2
Out[19]: array([ 1., 1., 1., 1., 1., 1., 1., 1.])
Just write time = time[~np.isnan(flux)] etc. if you don't need the original arrays any more.
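One subtlety when overwriting in place: compute the mask from flux once, before flux itself is replaced (a minimal sketch using the arrays above):

import numpy as np

mask = ~np.isnan(flux)              # True where flux is a valid number
time, flux = time[mask], flux[mask]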
A more complicated way is to use masked arrays:
In [20]: m = np.ma.masked_invalid(flux)
In [21]: time2 = time[~m.mask]
In [22]: flux2 = flux[~m.mask]
In [23]: time2
Out[23]: array([ 0., 1., 2., 3., 4., 8., 9., 10.])
In [24]: flux2
Out[24]: array([ 1., 1., 1., 1., 1., 1., 1., 1.])
I have a numpy array:
import numpy as np
pval = np.array([[0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0., 0., 0., 0., 0., 0., 0.]])
And a vectorized function:
def getnpx(age):
    return pval[0] + age
vgetnpx = np.frompyfunc(getnpx, 1, 1)
vgetnpx(1)
The output:
array([1., 1., 1., 1., 1., 1., 1., 1.])
However, if I want to pass pval in as a variable:
def getnpx(mt, age):
    return mt[0] + age
vgetnpx = np.frompyfunc(getnpx, 2, 1)
vgetnpx(pval,1)
I received an error:
TypeError: 'float' object is not subscriptable
What is the correct way to pass pval as an argument?
I don't see why you are trying to use frompyfunc. That's for passing array arguments to a function that only takes scalar inputs. That is also where your TypeError comes from: frompyfunc applies the function elementwise, so each mt it receives is a single float, which cannot be subscripted with mt[0].
In [97]: pval=np.array([[0., 0.,0., 0., 0.,0., 0., 0.],
...: [0., 0., 0., 0., 0.,0., 0., 0.]])
In the first case you use the global pval and just one age value, so there is no need for frompyfunc:
In [98]: pval[0]+1
Out[98]: array([1., 1., 1., 1., 1., 1., 1., 1.])
And if you want to pass pval as an argument, just do:
In [99]: def foo(mt,age):
...: return mt[0]+age
...:
In [100]: foo(pval,1)
Out[100]: array([1., 1., 1., 1., 1., 1., 1., 1.])
You gave a link to an earlier question that I answered. The sticky point in that case was that your function returned an array that could vary in size. I showed how to use it with a list comprehension. I also showed how to tweak vectorize so it would be happy returning an object dtype result. Alternatively, frompyfunc can return that object. In all of those cases the function argument was a scalar, a single number.
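For contrast, here is a minimal sketch of the kind of scalar function frompyfunc is meant for (add_age and vadd are illustrative names, not from the question):

import numpy as np

def add_age(x, age):                 # both arguments arrive as scalars
    return x + age

vadd = np.frompyfunc(add_age, 2, 1)  # 2 scalar inputs, 1 output
vadd(np.array([0., 0., 0.]), 1)      # applied elementwise -> array([1.0, 1.0, 1.0], dtype=object)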
If your goal is to add a different age to each row of pval, just do:
In [102]: pval + np.array([[1],[2]])
Out[102]:
array([[1., 1., 1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2., 2., 2.]])
I have a text file containing an upper 'triangular' matrix, the lower values being omitted (here's an example below):
3 5 3 5 1 8 1 6 5 8
5 8 1 1 6 2 9 6 4
2 0 5 2 1 0 0 3
2 2 5 1 0 1 0
1 3 6 3 6 1
4 2 4 3 7
4 0 0 1
0 1 8
2 1
1
Since the file in question is ~10000 lines in size, I was wondering if there was a 'smart' way to generate a numpy matrix from it, e.g. using the genfromtxt function. However, using it directly throws an error along the lines of
Line #12431 (got 6 columns instead of 12437)
and filling_values won't work, since there is no placeholder designating the missing values.
Right now I have to resort to opening the file and filling the matrix manually:
import numpy as np

def load_updiag(filename, size):
    output = np.zeros((size, size))
    with open(filename) as f:
        line_count = 0
        for line in f:
            data = line.split()
            output[line_count, line_count:size] = data
            line_count += 1
    return output
This feels like it probably won't scale well to large file sizes.
Is there a way to properly use genfromtxt (or any other optimized function from numpy's library) on such matrices?
You can read the raw data from the file into a string, and then use np.fromstring to get a 1-d array of the upper triangular part of the matrix:
with open('data.txt') as data_file:
    data = data_file.read()

arr = np.fromstring(data, sep=' ')
Alternatively, you can define a generator to read one line of your file at a time, then use np.fromiter to read a 1-d array from this generator:
def iter_data(path):
    with open(path) as data_file:
        for line in data_file:
            yield from line.split()

arr = np.fromiter(iter_data('data.txt'), float)
If you know the size of the matrix (which you can determine from the first line of the file), you can specify the count keyword argument of np.fromiter so that the function will pre-allocate exactly the right amount of memory, which will be faster. That's what these functions do:
def iter_data(fileobj):
    for line in fileobj:
        yield from line.split()

def read_triangular_array(path):
    with open(path) as fileobj:
        n = len(fileobj.readline().split())
        count = int(n*(n+1)/2)
    with open(path) as fileobj:
        return np.fromiter(iter_data(fileobj), float, count=count)
This "wastes" a little work, since it opens the file twice to read the first line and get the count of entries. An "improvement" would be to save the first line and chain it with the iterator over the rest of the file, as in this code:
from itertools import chain

def iter_data(fileobj):
    for line in fileobj:
        yield from line.split()

def read_triangular_array(path):
    with open(path) as fileobj:
        first = fileobj.readline().split()
        n = len(first)
        count = int(n*(n+1)/2)
        data = chain(first, iter_data(fileobj))
        return np.fromiter(data, float, count=count)
All of these approaches yield
>>> arr
array([ 3., 5., 3., 5., 1., 8., 1., 6., 5., 8., 5., 8., 1.,
1., 6., 2., 9., 6., 4., 2., 0., 5., 2., 1., 0., 0.,
3., 2., 2., 5., 1., 0., 1., 0., 1., 3., 6., 3., 6.,
1., 4., 2., 4., 3., 7., 4., 0., 0., 1., 0., 1., 8.,
2., 1., 1.])
This compact representation might be all you need, but if you want the full square matrix you can allocate a zeros matrix of the right size and copy arr into it using np.triu_indices_from, or you can use scipy.spatial.distance.squareform:
>>> from scipy.spatial.distance import squareform
>>> squareform(arr)
array([[ 0., 3., 5., 3., 5., 1., 8., 1., 6., 5., 8.],
[ 3., 0., 5., 8., 1., 1., 6., 2., 9., 6., 4.],
[ 5., 5., 0., 2., 0., 5., 2., 1., 0., 0., 3.],
[ 3., 8., 2., 0., 2., 2., 5., 1., 0., 1., 0.],
[ 5., 1., 0., 2., 0., 1., 3., 6., 3., 6., 1.],
[ 1., 1., 5., 2., 1., 0., 4., 2., 4., 3., 7.],
[ 8., 6., 2., 5., 3., 4., 0., 4., 0., 0., 1.],
[ 1., 2., 1., 1., 6., 2., 4., 0., 0., 1., 8.],
[ 6., 9., 0., 0., 3., 4., 0., 0., 0., 2., 1.],
[ 5., 6., 0., 1., 6., 3., 0., 1., 2., 0., 1.],
[ 8., 4., 3., 0., 1., 7., 1., 8., 1., 1., 0.]])
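If you'd rather skip the scipy dependency, the np.triu_indices_from route mentioned above might look like the sketch below (assuming, as in the sample data, that the diagonal is omitted from the file, so the matrix is one row larger than the first line's length):

import numpy as np

size = 11                              # one more than the length of the first row
full = np.zeros((size, size))
iu = np.triu_indices_from(full, k=1)   # upper-triangle indices, diagonal excluded
full[iu] = arr                         # fill the upper triangle in reading order
full += full.T                         # mirror to obtain the symmetric matrix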
Consider the following numpy.arrays:
a = np.array([1., 2., 3.])
b = np.array([4., 5.])
c = np.array([6., 7.])
I need to combine these so I end up with the following:
[(1., 4., 6.), (1., 5., 7.), (2., 4., 6.), (2., 5., 7.), (3., 4., 6.), (3., 5., 7.)]
Note that in this case, the array a happens to be the largest array. This is not guaranteed however. Nor is the length guaranteed. In other words, any array could be the longest and each array is of arbitrary length.
I tried itertools.izip_longest, but fillvalue only lets me pad the shorter arrays with a single value, which will not work. I also tried itertools.product, but my result is not a true Cartesian product.
You can transpose b and c and then create a product of a with the transposed array using itertools.product:
>>> from itertools import product
>>> [np.insert(j,0,i) for i,j in product(a,np.array((b,c)).T)]
[array([ 1., 4., 6.]), array([ 1., 5., 7.]), array([ 2., 4., 6.]), array([ 2., 5., 7.]), array([ 3., 4., 6.]), array([ 3., 5., 7.])]
Let's say you have:
a = np.array([4., 5.])
b = np.array([1., 2., 3.])
c = np.array([6., 7.])
d = np.array([5., 1.])
e = np.array([3., 2.])
Now, if you know beforehand which one is the longest array, which is b in this case, you can use an approach based upon np.meshgrid -
# Concatenate elements from identical positions from the equal arrays
others = np.vstack((a,c,d,e)).T # If you have more arrays, edit this line
# Get grided version of the longest array and
# grided-indices for indexing into others array
X,Y = np.meshgrid(np.arange(others.shape[0]),b)
# Concatenate grided longest array and grided indexed others for final output
out = np.hstack((Y.ravel()[:,None],others[X.ravel()]))
Sample run -
In [47]: b
Out[47]: array([ 1., 2., 3.])
In [48]: a
Out[48]: array([ 4., 5.])
In [49]: c
Out[49]: array([ 6., 7.])
In [50]: d
Out[50]: array([ 5., 1.])
In [51]: e
Out[51]: array([ 3., 2.])
In [52]: out
Out[52]:
array([[ 1., 4., 6., 5., 3.],
[ 1., 5., 7., 1., 2.],
[ 2., 4., 6., 5., 3.],
[ 2., 5., 7., 1., 2.],
[ 3., 4., 6., 5., 3.],
[ 3., 5., 7., 1., 2.]])
If the length differences are not extreme (check inputs first), I'd be tempted to pad the shorter lists out to the length of the longest with None and generate all the combinations (3^3 = 27 of them for 3 lists of 3 elements). Then
results = []
for candidate in possibles:
    if None not in candidate:
        results.append(candidate)
Reasons not to do this: if the cube of the length of the longest list is significant in terms of memory usage (space to store N cubed possibles) or CPU usage.
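A minimal sketch of that pad-and-filter idea, with possibles built by itertools.product (array names taken from the question):

import numpy as np
from itertools import product

a = np.array([1., 2., 3.])
b = np.array([4., 5.])
c = np.array([6., 7.])

# Pad every array with None up to the length of the longest one.
longest = max(len(arr) for arr in (a, b, c))
padded = [list(arr) + [None] * (longest - len(arr)) for arr in (a, b, c)]

# Generate every combination, then drop any that contains padding.
possibles = product(*padded)
results = [candidate for candidate in possibles if None not in candidate]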
I have a set of data in columns, where the first column contains the x values. How do I read this in?
If you want to store both the x and y values you can do
ydat = np.zeros((data.shape[1]-1,data.shape[0],2))
# write the x data
ydat[:,:,0] = data[:,0]
# write the y data
ydat[:,:,1] = data[:,1:].T
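Here data is assumed to be the full two-dimensional array read from the file, e.g. (a sketch; the filename is hypothetical):

import numpy as np

data = np.loadtxt('data.txt')   # whitespace-separated columns, x values first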
Edit:
If you want to store only the y-data in the sub arrays you can simply do
ydat = data[:,1:].T
Working example:
>>> t = np.array([[ 0., 0., 1., 2.],
...               [ 1., 0., 1., 2.],
...               [ 2., 0., 1., 2.],
...               [ 3., 0., 1., 2.],
...               [ 4., 0., 1., 2.]])
>>> a = t[:,1:].T
>>> a
array([[ 0., 0., 0., 0., 0.],
       [ 1., 1., 1., 1., 1.],
       [ 2., 2., 2., 2., 2.]])
Why does the following code not produce the expected assignment?
A = np.array([[ 9., 2., 7.], [ 3., 3., 1.], [ 4., 1., 6.]])
L = np.zeros([3,3])
i = range(1,3)
L[i][:,[0]] = A[i][:,[0]] / A[0,0]
L continues to contain all zeros. How do I produce what I expect to see (i.e. [[ 0., 0., 0.], [ .333, 0., 0.], [ .444, 0., 0.]])?
You should use direct indexing, L[i,0] = A[i,0]/A[0,0]. Otherwise L[i] is fancy indexing, which returns a copy rather than a view of the original array, so the assignment writes into a temporary copy and L is left unchanged.
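A minimal sketch of the fix, using the arrays from the question:

import numpy as np

A = np.array([[ 9., 2., 7.], [ 3., 3., 1.], [ 4., 1., 6.]])
L = np.zeros([3,3])
i = range(1,3)

# Index the rows and the column in one operation so the assignment
# writes into L itself rather than into a fancy-indexed copy.
L[i, 0] = A[i, 0] / A[0, 0]
# L is now [[0., 0., 0.], [0.333..., 0., 0.], [0.444..., 0., 0.]]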