matplotlib list not showing 0 or Nan's - python

Hi I have a list with X and a list with Y. I want to plot them using matplotlib.plt.plot(x,y)
I have some values of y that are 0 or 'empty' how can I make it that matplot doesn't connect the 0 or empty dots? and shows the rest? Do I have to split it into different lists?
Thanks in advance!

If you use numpy.nan instead of 0 or empty the line gets disconnected.
See:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,10,20)
y = np.sin(x)
y[3] = np.nan
y[7] = np.nan
plt.plot(x,y)

Use np.where to set the data not to be plotted to np.nan.
from numpy import *
a=linspace(1, 50, 1000)
b=sin(a)
c=where(b>-0.7, b, nan) #In this example, we plot only the values larger than -0.7
#if you want to skip the 0, c=where(b!=0, b, nan)
plt.plot(c)

Related

Replace outlier values with NaN in numpy? (preserve length of array)

I have an array of magnetometer data with artifacts every two hours due to power cycling.
I'd like to replace those indices with NaN so that the length of the array is preserved.
Here's a code example, adapted from https://www.kdnuggets.com/2017/02/removing-outliers-standard-deviation-python.html.
import numpy as np
import plotly.express as px
# For pulling data from CDAweb:
from ai import cdas
import datetime
# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
'sp_phys',
'THG_L2_MAG_'+ 'PG2',
start,
end,
['thg_mag_'+ 'pg2']
)
x =data['UT']
y =data['VERTICAL_DOWN_-_Z']
def reject_outliers(y): # y is the data in a 1D numpy array
n = 5 # 5 std deviations
mean = np.mean(y)
sd = np.std(y)
final_list = [x for x in y if (x > mean - 2 * sd)]
final_list = [x for x in final_list if (x < mean + 2 * sd)]
return final_list
px.scatter(reject_outliers(y))
print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
px.line(y=y, x=x)
# px.scatter(y) # It looks like the outliers are successfully dropped.
# px.line(y=reject_outliers(y), x=x) # This is the line I'd like to see work.
When I run 'px.scatter(reject_outliers(y))', it looks like the outliers are successfully getting dropped:
...but that's looking at the culled y vector relative to the index, rather than the datetime vector x as in the above plot. As the debugging text indicates, the vector is shortened because the outlier values are dropped rather than replaced.
How can I edit my 'reject_outliers()` function to assign those values to NaN, or to adjacent values, in order to keep the length of the array the same so that I can plot my data?
Use else in the list comprehension along the lines of:
[x if x_condition else other_value for x in y]
Got a less compact version to work. Full code:
import numpy as np
import plotly.express as px
# For pulling data from CDAweb:
from ai import cdas
import datetime
# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
'sp_phys',
'THG_L2_MAG_'+ 'PG2',
start,
end,
['thg_mag_'+ 'pg2']
)
x =data['UT']
y =data['VERTICAL_DOWN_-_Z']
def reject_outliers(y): # y is the data in a 1D numpy array
mean = np.mean(y)
sd = np.std(y)
final_list = np.copy(y)
for n in range(len(y)):
final_list[n] = y[n] if y[n] > mean - 5 * sd else np.nan
final_list[n] = final_list[n] if final_list[n] < mean + 5 * sd else np.nan
return final_list
px.scatter(reject_outliers(y))
print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
# px.line(y=y, x=x)
px.line(y=reject_outliers(y), x=x) # This is the line I wanted to get working - check!
More compact answer, sent via email by a friend:
In numpy you can select/index based on a Boolean array, and then make assignment with it:
def reject_outliers(y): # y is the data in a 1D numpy array
n = 5 # 5 std deviations
mean = np.mean(y)
sd = np.std(y)
final_list = y.copy()
final_list[np.abs(y - mean) > n * sd] = np.nan
return final_list
I also noticed that you didn’t use the value of n in your example code.
Alternatively, you can use the where method (https://numpy.org/doc/stable/reference/generated/numpy.where.html)
np.where(np.abs(y - mean) > n * sd, np.nan, y)
You don’t need the .copy() if you don’t mind modifying the input array.
Replace np.mean and np.std with np.nanmean and np.nanstd if you want the function to work on arrays that already contain nans, i.e. if you want to use this function recursively.
The answer about using if else in a list comprehension would work, but avoiding the list comprehension makes the function much faster if the arrays are large.

boxplot structure disappears when pandas contains nan [duplicate]

I am using matplotlib to plot a box figure but there are some missing values (NaN). Then I found it doesn't display the box figure within the columns having NaN values.
Do you know how to solve this problem?
Here are the codes.
import numpy as np
import matplotlib.pyplot as plt
#==============================================================================
# open data
#==============================================================================
filename='C:\\Users\\liren\\OneDrive\\Data\\DATA in the first field-final\\ks.csv'
AllData=np.genfromtxt(filename,delimiter=";",skip_header=0,dtype='str')
TreatmentCode = AllData[1:,0]
RepCode = AllData[1:,1]
KsData= AllData[1:,2:].astype('float')
DepthHeader = AllData[0,2:].astype('float')
TreatmentUnique = np.unique(TreatmentCode)[[3,1,4,2,8,6,9,7,0,5,10],]
nT = TreatmentUnique.size#nT=number of treatments
#nD=number of deepth;nR=numbers of replications;nT=number of treatments;iT=iterms of treatments
nD = 5
nR = 6
KsData_3D = np.zeros((nT,nD,nR))
for iT in range(nT):
Treatment = TreatmentUnique[iT]
TreatmentFilter = TreatmentCode == Treatment
KsData_Filtered = KsData[TreatmentFilter,:]
KsData_3D[iT,:,:] = KsData_Filtered.transpose()iD = 4
fig=plt.figure()
ax = fig.add_subplot(111)
plt.boxplot(KsData_3D[:,iD,:].transpose())
ax.set_xticks(range(1,nT+1))
ax.set_xticklabels(TreatmentUnique)
ax.set_title(DepthHeader[iD])
Here is the final figure and some of the treatments are missing in the box.
You can remove the NaNs from the data first, then plot the filtered data.
To do that, you can first find the NaNs using np.isnan(data), then perform the bitwise inversion of that Boolean array using the ~: bitwise inversion operator. Use that to index the data array, and you filter out the NaNs.
filtered_data = data[~np.isnan(data)]
In a complete example (adapted from here)
Tested in python 3.10, matplotlib 3.5.1, seaborn 0.11.2, numpy 1.21.5, pandas 1.4.2
For 1D data:
import matplotlib.pyplot as plt
import numpy as np
# fake up some data
np.random.seed(2022) # so the same data is created each time
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low), 0)
# Add a NaN
data[40] = np.NaN
# Filter data using np.isnan
filtered_data = data[~np.isnan(data)]
# basic plot
plt.boxplot(filtered_data)
plt.show()
For 2D data:
For 2D data, you can't simply use the mask above, since then each column of the data array would have a different length. Instead, we can create a list, with each item in the list being the filtered data for each column of the data array.
A list comprehension can do this in one line: [d[m] for d, m in zip(data.T, mask.T)]
import matplotlib.pyplot as plt
import numpy as np
# fake up some data
np.random.seed(2022) # so the same data is created each time
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low), 0)
data = np.column_stack((data, data * 2., data + 20.))
# Add a NaN
data[30, 0] = np.NaN
data[20, 1] = np.NaN
# Filter data using np.isnan
mask = ~np.isnan(data)
filtered_data = [d[m] for d, m in zip(data.T, mask.T)]
# basic plot
plt.boxplot(filtered_data)
plt.show()
I'll leave it as an exercise to the reader to extend this to 3 or more dimensions, but you get the idea.
Use seaborn, which is a high-level API for matplotlib
seaborn.boxplot filters NaN under the hood
import seaborn as sns
sns.boxplot(data=data)
1D
2D
NaN is also ignored if plotting from df.plot(kind='box') for pandas, which uses matplotlib as the default plotting backend.
import pandas as pd
df = pd.DataFrame(data)
df.plot(kind='box')
1D
2D

How to convert A[x,y] = z to [ [ x0...xN ], [ y0...yN], [ z0...zN] ]

I have a 2D Numpy array that represents an image, and I want to create a surface plot of image intensity using matplotlib.surface_plot. For this function, I need to convert the 2D array A[x,y] => z into three arrays: [x0,...,xN], [y0,...,yN] and [z0,...,zN]. I can see how to do this conversion element-by-element:
X = []
Y = []
Z = []
for x in range( A.shape[ 0 ] ):
for y in range( A.shape[ 1 ] ):
X.append( x )
Y.append( y )
Z.append( A[x,y] )
but I'm wondering whether there is a more Pythonic way to do this?
a very simple way to do this could be to basically use the code shown in the matplotlib example. assuming x and y representing the sizes of the two dims in your image array A, you could do
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
# generate some input data that looks nice on a color map:
A = np.mgrid[0:10:0.1,0:10:0.1][0]
X = np.arange(0, A.shape[0], 1)
Y = np.arange(0, A.shape[1], 1)
X, Y = np.meshgrid(X, Y)
fig = plt.figure()
ax = fig.gca(projection='3d')
surf = ax.plot_surface(X, Y, A, cmap='viridis',
linewidth=0, antialiased=False)
gives
You possibly don't need to construct the actual grid, because some pyplot functions accept 1d arrays for x and y, implying that a grid is to be constructed. It seems that Axes3D.plot_surface (which I presume you meant) does need 2d arrays as input, though.
So to get your grid the easiest way is using np.indices to get the indices corresponding to your array:
>>> import numpy as np
...
... # dummy data
... A = np.random.random((3,4)) # randoms of shape (3,4)
...
... # get indices
... x,y = np.indices(A.shape) # both arrays have shape (3,4)
...
... # prove that the indices correspond to the values of A
... print(all(A[i,j] == A[x[i,j], y[i,j]] for i in x.ravel() for j in y.ravel()))
True
The resulting arrays all have the same shape as A, which should be correct for most use cases. If for any reason you really need a flattened 1d array, you should use x.ravel() etc. to get a flattened view of the same 2d array.
I should note though that the standard way to visualize images (due to the short-wavelength variation of the data) is pyplot.imshow or pyplot.pcolormesh which can give you pixel-perfect visualization, albeit in two dimensions.
We agree X, Y and Z have different sizes (N for X and Y and N^2 for Z) ? If yes:
X looks not correct (you add several times the same values)
something like:
X = list(range(A.shape[0])
Y = list(range(A.shape[1])
Z = [A[x,y] for x in X for y in Y]

Use a more accurate array of x values to generate line of best fit in matplotlib?

I am currently stuck on a problem on which I am required to generate a curve of best fit which I am required to use a more precise x array from 250 to 100 in steps of 10. Here is my code below so far..
import numpy as np
from numpy import polyfit, polyval
import matplotlib.pyplot as plt
x = [250,300,350,400,450,500,550,600,700,750,800,900,1000]
x = np.array(x)
y = [0.791, 0.846, 0.895, 0.939, 0.978, 1.014, 1.046, 1.075, 1.102, 1.148, 1.169, 1.204, 1.234]
y= np.array(y)
r = polyfit(x,y,3)
fit = polyval(r, x)
plt.plot(x, fit, 'b')
plt.plot(x,y, color = 'r', marker = 'x')
plt.show()
If I understand correctly, you are trying to create an array of numbers from a to b by steps of c.
With pure python you can use:
list(range(a, b, c)) #in your case list(range(250, 1000, 10))
Or, since you are using numpy you can directly make the numpy array:
np.arange(a, b, c)
To create an array in steps you can use numpy.arange([start,] stop[, step]):
import numpy as np
x = np.arange(250,1000,10)
To generate values from 250-1000, use range(start, stop, step):
x = range(250,1001,10)
x = np.array(x)

Scatter plot with logical indexing

I have a 100x2 array D and a 100x1 array c (with entries +/- 1) I'm trying to make a scatter plot of the columns in D corresponding to c = 1.
I tried something like this: plt.scatter(D[0][c==1],D[1][c==1]) but it throws up IndexError: too many indices for array
I'm aware that I've use list comprehension or something of that sort. I'm fairly new to Python and hence struggling with the format.
Thanks a lot.
Concept
You can use np.where to select only rows from D that are 1 in your array C:
D = np.array([[0.25, 0.25], [0.75, 0.75]])
C = np.array([1, 0])
Using np.where, we can select only rows that are 1 in C:
>>> D[np.where(C==1)]
array([[0.25, 0.25]])
Example On your actual data:
D = np.random.randn(100, 2)
C = np.random.randint(0, 2, (100, 1))
valid = D[np.where(C.ravel()==1)]
import matplotlib.pyplot as plt
plt.scatter(valid[:, 0], valid[:, 1])
Output:
You can use numpy for this (assuming you have two numpy arrays, otherwise you can convert them into numpy arrays):
import numpy as np
c_ones = np.where(c == 1) # Finds all indices where c == 1
d_0 = D[0][c_ones]
d_1 = D[1][c_ones]
Then you can plot d_0, d_1 as normal.
For converting your lists if needed,
C_np = np.asarray(c)
D_np = np.asarray(D)
And then perform np.where on C_np as shown above.
Would this solve your issue?

Categories