I have a 100x2 array D and a 100x1 array c (with entries +/- 1). I'm trying to make a scatter plot of the two columns of D, restricted to the rows where c = 1.
I tried something like this: plt.scatter(D[0][c==1],D[1][c==1]) but it throws IndexError: too many indices for array
I'm aware that I probably have to use a list comprehension or something of that sort. I'm fairly new to Python and hence struggling with the format.
Thanks a lot.
Concept
You can use np.where to select only the rows of D whose corresponding entry in C is 1:
import numpy as np
D = np.array([[0.25, 0.25], [0.75, 0.75]])
C = np.array([1, 0])
Indexing D with the result of np.where keeps only the rows where C is 1:
>>> D[np.where(C==1)]
array([[0.25, 0.25]])
Example on your actual data:
import numpy as np
import matplotlib.pyplot as plt
D = np.random.randn(100, 2)
C = np.random.randint(0, 2, (100, 1))
valid = D[np.where(C.ravel()==1)]
plt.scatter(valid[:, 0], valid[:, 1])
Output: a scatter plot of the selected rows (image not shown).
You can use numpy for this (assuming you have two numpy arrays; otherwise you can convert them into numpy arrays):
import numpy as np
c_ones = np.where(np.ravel(c) == 1)[0] # Row indices where c == 1 (ravel flattens the 100x1 array)
d_0 = D[c_ones, 0] # First column of D, selected rows only
d_1 = D[c_ones, 1] # Second column of D, selected rows only
Then you can plot d_0, d_1 as normal.
For converting your lists if needed,
C_np = np.asarray(c)
D_np = np.asarray(D)
And then perform np.where on C_np as shown above.
Would this solve your issue?
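For what it's worth, the original error seems to come from two things: D[0] is the first row of D (shape (2,)), not the first column, and c==1 is a 100x1 boolean array, which has more dimensions than D[0]. A minimal sketch of the same selection with a plain boolean mask, using made-up stand-in data (the rng arrays below are assumptions, not the asker's data):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((100, 2))       # stand-in for the 100x2 data
c = rng.choice([-1, 1], size=(100, 1))  # stand-in for the 100x1 labels

mask = c.ravel() == 1   # 1-D boolean mask, one entry per row of D
selected = D[mask]      # rows of D where c == 1, shape (k, 2)
x_vals, y_vals = selected[:, 0], selected[:, 1]
# plt.scatter(x_vals, y_vals) would then plot the selected points
```

This gives the same rows as the np.where version; np.where is only needed if you want the integer indices themselves.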
When I build a data frame with one variable, it works well.
import numpy as np
import pandas as pd
a = np.random.normal(45, 9, 10000)
source = {"Genotype": ["CV1"]*10000, "AGW": a}
df=pd.DataFrame(source)
df
However, when I add more variables, it does not work.
import numpy as np
import pandas as pd
a = np.random.normal(45, 9, 10000)
b = np.random.normal(35, 10, 10000)
source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
"AGW": a + b}
df=pd.DataFrame(source)
df
and it says "ValueError: All arrays must be of the same length"
I think a + b in the AGW column is computed as an element-wise sum, which gives 10,000 values instead of the two arrays stacked vertically. I want to make a data frame with two columns and 20,000 rows.
Could you let me know how to do it?
Thanks!!
Use numpy.hstack to join the two numpy arrays:
source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
"AGW": np.hstack((a, b))}
df=pd.DataFrame(source)
Or join lists:
source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
"AGW": list(a) + list(b)}
df=pd.DataFrame(source)
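To make the difference concrete, here is a small sketch (with a hypothetical size n = 5 instead of 10000) comparing the element-wise sum with concatenation. np.concatenate is used here as an equivalent of np.hstack for 1-D arrays, and np.repeat is just one possible way to build the label column:

```python
import numpy as np

n = 5  # small hypothetical size, for illustration only
rng = np.random.default_rng(0)
a = rng.normal(45, 9, n)
b = rng.normal(35, 10, n)

summed = a + b                         # element-wise sum, length n
stacked = np.concatenate((a, b))       # a followed by b, length 2n
labels = np.repeat(["CV1", "CV2"], n)  # n copies of each label, length 2n

print(summed.shape, stacked.shape, labels.shape)
```

With lists, + concatenates, but with numpy arrays it adds element by element, which is why the lengths did not match.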
I want to slice the same numpy array (data_ar) multiple times, each time to find the values in a different range.
data_ar shape: (203,)
range_ar shape: (1000,)
I implemented it with a for loop, but it takes way too long since I have a lot of data arrays:
#create results array
results_ar = np.zeros(shape=(1000),dtype=object)
i=0
for range in range_ar:
    results_ar[i] = data_ar[( (data_ar>=(range-delta)) & (data_ar<(range+delta)) )].values
    i+=1
so for example:
data_ar = [1,3,4,6,10,12]
range_ar = [7,4,2]
delta= 3
expected output:
(note: results_ar has shape (3,) and dtype=object; each element is an array)
results_ar[[6,10];
[1,3,4,6];
[1,3,4]]
Any ideas on how to tackle this?
You can use numba to speed up the computations.
import numpy as np
import numba
from numba.typed import List
import timeit
data_ar = np.array([1,3,4,6,10,12])
range_ar = np.array([7,4,2])
delta = 3
def foo(data_ar, range_ar):
    results_ar = list()
    for i in range_ar:
        results_ar.append(data_ar[( (data_ar>=(i-delta)) & (data_ar<(i+delta)) )])
    return results_ar
print(timeit.timeit(lambda :foo(data_ar, range_ar)))
@numba.njit(parallel=True, fastmath=True)
def foo(data_ar, range_ar):
    results_ar = List()
    for i in range_ar:
        results_ar.append(data_ar[( (data_ar>=(i-delta)) & (data_ar<(i+delta)) )])
    return results_ar
print(timeit.timeit(lambda :foo(data_ar, range_ar)))
15.53519330600102
1.6557575029946747
That is roughly a 9.4 times speedup.
You could use np.searchsorted like this:
data_ar = np.array([1, 3, 4, 6, 10, 12])
range_ar = np.array([7, 4, 2])
delta = 3
bounds = range_ar[:, None] + delta * np.array([-1, 1])
result = [data_ar[slice(*row)] for row in np.searchsorted(data_ar, bounds)]
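Note that np.searchsorted assumes data_ar is sorted in ascending order. A minimal sketch checking the searchsorted approach against the boolean-mask loop from the question, on the sample data (the names loop_result, starts, stops and fast_result are introduced here for illustration):

```python
import numpy as np

data_ar = np.array([1, 3, 4, 6, 10, 12])  # must be sorted for searchsorted
range_ar = np.array([7, 4, 2])
delta = 3

# Reference: boolean-mask selection, as in the question's loop
loop_result = [data_ar[(data_ar >= r - delta) & (data_ar < r + delta)]
               for r in range_ar]

# searchsorted: one start index and one stop index per range
starts = np.searchsorted(data_ar, range_ar - delta, side="left")
stops = np.searchsorted(data_ar, range_ar + delta, side="left")
fast_result = [data_ar[i:j] for i, j in zip(starts, stops)]

for expected, got in zip(loop_result, fast_result):
    assert np.array_equal(expected, got)
```

With side="left" on both bounds, the slices follow the same >= lower and < upper conditions as the loop. For the sample data both give [4, 6], [1, 3, 4, 6] and [1, 3, 4].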
I have two loops that run over different x and y coordinates. For each (x, y) pair, a linear system is solved for force 1 and force 2 using the matrix method, i.e. solving Ax = C for x (in effect, applying the inverse of A). Each iteration gives the answer as a matrix whose first element is force 1 and whose second element is force 2 at those coordinates. Here's my code:
import numpy as np
from scipy import linalg
def Force():
    Force1 = np.zeros((160,90))
    Force2 = np.zeros((160,90))
    for x in np.arange(0,16.1,0.1):
        for y in np.arange(1,9.1,0.1):
            l1 = np.hypot(x,y)
            l2 = np.hypot(15-x,y)
            A = np.array([[(x/l1),((x-15)/l2)],[(y/l1),(y/l2)]])
            c = np.array([[0],[70*9.81]])
            F = linalg.solve(A,c)
            Force1[x,y] = F[0]
            Force2[x,y] = F[1]
            print("Force 1 = {} \nForce 2 = {}\n".format(F[0], F[1]))
So at each point (x,y) a matrix [[Force 1],[Force 2]] is solved for. Now I would like to store all the Force 1 values in an array Force1[x,y], and similarly for the Force 2 values, so that I can do
plt.imshow(Force1)
plt.imshow(Force2)
to plot two heatmaps. How would I go about doing that?
This solves your issue - you were trying to assign to indices in Force1 and Force2 of type float. I've changed the for loops to use enumerate instead, and tweaked the assignment so it assigns F[0][0] and F[1][0].
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg
def Force():
    xs = np.arange(0,16,0.1)
    ys = np.arange(1,9,0.1)
    # size the result arrays from the grids, so no unused rows or columns stay at zero
    Force1 = np.zeros((len(xs),len(ys)))
    Force2 = np.zeros((len(xs),len(ys)))
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            l1 = np.hypot(x,y)
            l2 = np.hypot(15-x,y)
            A = np.array([[(x/l1),((x-15)/l2)],[(y/l1),(y/l2)]])
            c = np.array([[0],[70*9.81]])
            F = linalg.solve(A,c)
            # F has shape (2, 1), so take the scalar out of each row
            Force1[i, j] = F[0][0]
            Force2[i, j] = F[1][0]
            # print("Force 1 = {} \nForce 2 = {}\n".format(F[0], F[1]))
    plt.imshow(Force1)
    plt.show()
    plt.imshow(Force2)
    plt.show()
Force()
The generated plots are two heatmaps, one for Force1 and one for Force2 (images not shown).
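If the double loop becomes slow, the same systems can probably be solved in one batched call. The following is only a sketch under the assumption that np.linalg.solve is applied to stacked 2x2 matrices; the names xs, ys, X and Y are introduced here and are not part of the original answer:

```python
import numpy as np

xs = np.arange(0, 16, 0.1)
ys = np.arange(1, 9, 0.1)
X, Y = np.meshgrid(xs, ys, indexing="ij")  # X[i, j] = xs[i], Y[i, j] = ys[j]

l1 = np.hypot(X, Y)
l2 = np.hypot(15 - X, Y)

# One 2x2 matrix per grid point, shape (len(xs), len(ys), 2, 2)
A = np.empty(X.shape + (2, 2))
A[..., 0, 0] = X / l1
A[..., 0, 1] = (X - 15) / l2
A[..., 1, 0] = Y / l1
A[..., 1, 1] = Y / l2

# One right-hand side column per grid point, shape (len(xs), len(ys), 2, 1)
c = np.zeros(X.shape + (2, 1))
c[..., 1, 0] = 70 * 9.81

F = np.linalg.solve(A, c)  # solves all systems at once
Force1 = F[..., 0, 0]
Force2 = F[..., 1, 0]
```

Force1 and Force2 then have shape (len(xs), len(ys)) and can be passed to plt.imshow in the same way as the loop version.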
I have two numpy arrays, a of shape (20, 3, 3) and b of shape (3, 3). Let a = (a1, a2, ..., a20). I want to calculate the matrix product for each element like this:
c = (c1, c2, ..., c20), where ci = b.T ai b (matrix products) for i = 1, ..., 20.
How can I do it efficiently using numpy?
A slow version using a for loop looks like this:
a = np.random.sample((20, 3, 3))
b = np.random.sample((3, 3))
c = np.zeros_like(a)
for i0, ai in enumerate(a):
    c[i0] = np.dot(b.T, np.dot(ai, b))
You can try np.matmul(b.T, np.dot(a,b)):
import numpy as np
a = np.random.sample((4, 3, 3))
b = np.random.sample((3, 3))
c = np.zeros_like(a)
# using for loop
for i0, ai in enumerate(a):
    c[i0] = np.dot(b.T, np.dot(ai, b))
# alternative method: np.dot(a, b) computes ai.b for every i, matmul broadcasts b.T over the stack
e = np.matmul(b.T, np.dot(a,b))
# checking for equality (allclose tolerates floating-point rounding differences)
print(np.allclose(c, e))
You can just write your operation in vectorized form because your inputs are NumPy arrays. There is no need for an explicit for loop and indexing.
P.S: Thanks to @yatu, who pointed out that the result did not have the same shape. I have now added swapaxes to get a result consistent with OP's approach.
np.random.seed(1)
a = np.random.sample((4, 3, 3))
b = np.random.sample((3, 3))
c = np.dot(b.T, np.dot(a, b)).swapaxes(0,1)
print (c)
[[[0.96496962 1.30807122 0.55382266]
[1.42300972 1.98975139 0.81871374]
[0.32358338 0.45493059 0.1346777 ]]
[[1.46772447 2.15650254 0.87555186]
[2.26335921 3.33689922 1.28679305]
[0.71561413 0.96507585 0.54309736]]
[[1.50660527 2.36946435 0.59771395]
[2.49705244 3.76328176 1.06274954]
[0.96090846 1.43636151 0.31807679]]
[[1.03706878 1.94107476 0.61884642]
[1.74739926 3.07419808 1.03537019]
[0.59565039 1.09721382 0.37283626]]]
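As a cross-check, here is a small sketch comparing the loop with two other vectorized spellings, the @ operator and np.einsum. Neither appears in the answers above; they are offered as alternatives, and the variable names are made up for this example:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((20, 3, 3))
b = rng.random((3, 3))

# Reference: explicit loop, ci = b.T ai b
c_loop = np.stack([b.T @ ai @ b for ai in a])

# The @ operator broadcasts the 2-D operands over the leading axis of a
c_matmul = b.T @ a @ b

# einsum: c[n, i, l] = sum over j, k of b[j, i] * a[n, j, k] * b[k, l]
c_einsum = np.einsum("ji,njk,kl->nil", b, a, b)

print(np.allclose(c_loop, c_matmul), np.allclose(c_loop, c_einsum))
```

Both variants keep the (20, 3, 3) layout, so no swapaxes is needed.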
I've read the masked array documentation several times now, searched everywhere and feel thoroughly stupid. I can't figure out for the life in me how to apply a mask from one array to another.
Example:
import numpy as np
y = np.array([2,1,5,2]) # y axis
x = np.array([1,2,3,4]) # x axis
m = np.ma.masked_where(y>2, y) # filter out values larger than 2
print(m)
[2 1 -- 2]
print(np.ma.compressed(m))
[2 1 2]
So this works fine.... but to plot this y axis, I need a matching x axis. How do I apply the mask from the y array to the x array? Something like this would make sense, but produces rubbish:
new_x = x[m.mask].copy()
new_x
array([3])
So, how on earth is that done (note the new x array needs to be a new array).
Edit:
Well, it seems one way to do this works like this:
>>> import numpy as np
>>> x = np.array([1,2,3,4])
>>> y = np.array([2,1,5,2])
>>> m = np.ma.masked_where(y>2, y)
>>> new_x = np.ma.masked_array(x, m.mask)
>>> print(np.ma.compressed(new_x))
[1 2 4]
But that's incredibly messy! I'm trying to find a solution as elegant as IDL...
I had a similar issue, but involving loads more masking commands and more arrays to apply them to. My solution is to do all the masking on one array and then use the final masked array as the condition in the masked_where command.
For example:
y = np.array([2,1,5,2]) # y axis
x = np.array([1,2,3,4]) # x axis
m = np.ma.masked_where(y>5, y) # filter out values larger than 5
new_x = np.ma.masked_where(np.ma.getmask(m), x) # applies the mask of m on x
The nice thing is you can now apply this mask to many more arrays without going through the masking process for each of them.
Why not simply
import numpy as np
y = np.array([2,1,5,2]) # y axis
x = np.array([1,2,3,4]) # x axis
m = np.ma.masked_where(y>2, y) # filter out values larger than 2
print(list(m))
print(np.ma.compressed(m))
# mask x the same way
m_ = np.ma.masked_where(y>2, x) # mask x wherever y is larger than 2
# print here the list
print(list(m_))
print(np.ma.compressed(m_))
The code runs on both Python 2.x and 3.x.
Also, as proposed by joris, new_x = x[~m.mask].copy() does the job, giving an array
>>> new_x
array([1, 2, 4])
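If masked arrays are not needed for anything else, a plain boolean array may be enough. A minimal sketch under that assumption (keep, new_x and new_y are names chosen for this example):

```python
import numpy as np

y = np.array([2, 1, 5, 2])  # y axis
x = np.array([1, 2, 3, 4])  # x axis

keep = y <= 2      # True where the value should be kept
new_x = x[keep]    # boolean indexing returns a new array, not a view
new_y = y[keep]

print(new_x)  # [1 2 4]
print(new_y)  # [2 1 2]
```

The same keep array can be reused on any other array of the same shape.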
This may not be 100% what OP wanted to know,
but it's a cute little piece of code I use all the time -
if you want to mask several arrays the same way, you can use this generalized function to mask a dynamic number of numpy arrays at once:
def apply_mask_to_all(mask, *arrays):
    assert all([arr.shape == mask.shape for arr in arrays]), "All Arrays need to have the same shape as the mask"
    return tuple([arr[mask] for arr in arrays])
See this example usage:
# init 4 equally shaped arrays
x1 = np.random.rand(3,4)
x2 = np.random.rand(3,4)
x3 = np.random.rand(3,4)
x4 = np.random.rand(3,4)
# create a mask
mask = x1 > 0.8
# apply the mask to all arrays at once
x1, x2, x3, x4 = apply_mask_to_all(mask, x1, x2, x3, x4)