The issue
I have a contourf plot made from a pandas DataFrame. It shows a 2-dimensional field with time on the x-axis and vertical pressure level on the y-axis. The field, time, and pressure data all come from a netCDF file. I can plot it fine, but I'd like to scale the y-axis to better represent the real atmosphere. (The default scaling is linear, but the pressure levels in the file imply a different kind of scaling.) Basically, the y-axis should look something like the plot below. It's like a log scale, but compressing the bottom part of the axis instead of the top. (I don't know the term for this... like a log scale but inverted?) It doesn't need to be exact.
Working example (written in Jupyter notebook)
#modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker, colors
#data
time = np.arange(0,10)
lev = np.array([900,800,650,400,100])
df = pd.DataFrame(np.arange(50).reshape(5,10),index=lev,columns=time)
df.index.name = 'Level'
print(df)
        0   1   2   3   4   5   6   7   8   9
Level
900     0   1   2   3   4   5   6   7   8   9
800    10  11  12  13  14  15  16  17  18  19
650    20  21  22  23  24  25  26  27  28  29
400    30  31  32  33  34  35  36  37  38  39
100    40  41  42  43  44  45  46  47  48  49
#lists for plotting
levtick = np.arange(len(lev))
clevels = np.arange(0,55,5)
#Main plot
fig, ax = plt.subplots(figsize=(10, 5))
im = ax.contourf(df,levels=clevels,cmap='RdBu_r')
#x-axis customization
plt.xticks(time)
ax.set_xticklabels(time)
ax.set_xlabel('Time')
#y-axis customization
plt.yticks(levtick)
ax.set_yticklabels(lev)
ax.set_ylabel('Pressure')
#title and colorbar
ax.set_title('Some mean time series')
cbar = plt.colorbar(im,values=clevels,pad=0.01)
tick_locator = ticker.MaxNLocator(nbins=11)
cbar.locator = tick_locator
cbar.update_ticks()
The Question
How can I scale the y-axis such that values near the bottom (900, 800) are compressed while values near the top (200) are expanded and given more plot space, like in the sample above my code? I tried using ax.set_yscale('function', functions=(forward, inverse)) but didn't understand how it works. I also tried simply ax.set_yscale('log'), but log isn't what I need.
You can use a custom scale transformation with ax.set_yscale('function', functions=(forward, inverse)) as you suggested. From the documentation:
forward and inverse are callables that return the scale transform
and its inverse.
In this case, define in forward() the function you want, such as the inverse of the log function (an exponential), or a more custom one for your needs. Call ax.set_yscale before your y-axis customization.
def forward(x):
    return 2**x
def inverse(x):
    return np.log2(x)
ax.set_yscale('function', functions=(forward,inverse))
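To see what this scale does to the question's example, here is a minimal sketch. It assumes, as in the original code, that the y-axis holds the row positions 0 to 4 (not the pressure values themselves) and that the ticks are relabelled with the pressure levels. The Agg backend and the `pixels` and `gaps` names are only there so the spacing can be checked without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same toy data as in the question
time = np.arange(0, 10)
lev = np.array([900, 800, 650, 400, 100])
df = pd.DataFrame(np.arange(50).reshape(5, 10), index=lev, columns=time)

def forward(y):
    # Exponential stretch: higher row positions get more space
    return 2.0 ** np.asarray(y, dtype=float)

def inverse(y):
    # Must undo forward() exactly
    return np.log2(np.asarray(y, dtype=float))

fig, ax = plt.subplots(figsize=(10, 5))
im = ax.contourf(df.to_numpy(), levels=np.arange(0, 55, 5), cmap="RdBu_r")

# Set the scale first, then customise the ticks
ax.set_yscale("function", functions=(forward, inverse))
ax.set_yticks(np.arange(len(lev)))
ax.set_yticklabels(lev)
ax.set_ylim(0, len(lev) - 1)
ax.set_ylabel("Pressure")

# Where does each row end up on screen? Gaps should grow towards the top.
rows = np.arange(len(lev), dtype=float)
pixels = ax.transData.transform(np.column_stack([np.zeros_like(rows), rows]))[:, 1]
gaps = np.diff(pixels)
```

With base 2, each gap between neighbouring levels is twice the one below it, so 900 and 800 are squeezed together while 400 and 100 get most of the axis. A smaller base (for example 1.5) would probably give a gentler stretch.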
Related
I want to create time series plot using max, min and avg temperatures from each month of the year.
I would recommend looking into matplotlib to visualize different types of data which can be installed with a quick pip3 install matplotlib.
Here is some starter code you can play around with to get familiar with the library:
# Import the library
import matplotlib.pyplot as plt
# Some sample data to play around with
temps = [30,40,45,50,55,60]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
# Create a figure and plot the data
plt.figure()
plt.plot(temps)
# Add labels to the data points (optional)
for i, point in enumerate(months):
    plt.annotate(point, (i, temps[i]))
# Apply some labels
plt.ylabel("Temperature (F)")
plt.title("Temperature Plot")
# Hide the x axis labels
plt.gca().axes.get_xaxis().set_visible(False)
# Show the completed plot
plt.show()
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# sample data
df = pd.DataFrame({'Date':pd.date_range('2010-01-01', '2010-12-31'),
'Temp':np.random.randint(20, 100, 365)})
df.head()
        Date  Temp
0 2010-01-01    95
1 2010-01-02    20
2 2010-01-03    22
3 2010-01-04    26
4 2010-01-05    93
# group by month and get min, max, mean values for temperature
temp_agg = df.groupby(df.Date.dt.month)['Temp'].agg(['min', 'max', 'mean'])
temp_agg.index.name='month'
temp_agg
       min  max       mean
month
1       20   99  50.258065
2       25   98  56.642857
3       22   89  51.225806
4       22   98  60.333333
5       27   99  57.645161
6       21   99  62.000000
7       20   98  67.419355
8       36   98  63.806452
9       22   99  62.166667
10      24   99  63.322581
11      22   97  64.200000
12      20   99  60.870968
# shorthand method of plotting entire dataframe
temp_agg.plot()
I am trying to draw a frequency bar plot and a cumulative "ogive" in the same plot. If I draw them separately both are shown OK, but when shown in the same figure, the cumulative graphic is shown shifted. Below the code used.
df = pd.DataFrame({'Correctas': [4,6,5,4,7,2,8,3,5,6,9,6,6,7,5,5,8,10,4,8,3,6,9,5,11,5,12,7,7,5,4,6]});
df['Correctas'].value_counts(sort = False).plot.bar();
df['Correctas'].value_counts(sort = False).cumsum().plot();
plt.show()
The frequency data is
2 1
3 3
4 7
5 14
6 20
7 24
8 27
9 29
10 30
11 31
12 32
So the cumulative line should start at 2 on the x-axis, but it starts at 4.
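This happens because the bar chart uses a categorical x-axis: the bars are drawn at positions 0, 1, 2, ... and only labelled with the index values, while the line plot uses the numeric index values themselves as x coordinates. Converting the index to strings makes both plots use the same positions. Here is a quick fix: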
df = pd.DataFrame({'Correctas': [4,6,5,4,7,2,8,3,5,6,9,6,6,7,5,5,8,10,4,8,3,6,9,5,11,5,12,7,7,5,4,6]});
df_counts = df['Correctas'].value_counts(sort = False)
df_counts.index = df_counts.index.astype('str')
df_counts.plot.bar(alpha=.8);
df_counts.cumsum().plot(color='k', kind='line');
plt.show();
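To check the alignment without looking at the figure, the sketch below compares the x positions of the bars with the x data of the cumulative line. It draws both plots on one explicit Axes, sorts the counts by value with sort_index() so the order matches the frequency table in the question, and uses the Agg backend; those choices and the names `bar_centers` and `line_x` are assumptions for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [4, 6, 5, 4, 7, 2, 8, 3, 5, 6, 9, 6, 6, 7, 5, 5,
          8, 10, 4, 8, 3, 6, 9, 5, 11, 5, 12, 7, 7, 5, 4, 6]

# Frequency of each score, ordered by score, with a string (categorical) index
counts = pd.Series(values).value_counts().sort_index()
counts.index = counts.index.astype(str)

fig, ax = plt.subplots()
counts.plot.bar(ax=ax, alpha=0.8)
counts.cumsum().plot(ax=ax, color="k")

# Bars and line should now share the positions 0, 1, 2, ...
bar_centers = np.array([p.get_x() + p.get_width() / 2 for p in ax.patches])
line_x = np.asarray(ax.lines[-1].get_xdata(), dtype=float)
```

If the index is left numeric, the line's x data would be the scores 2 to 12 instead, which is why it appears shifted relative to the bars.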
I am using pandas groupby to plot wind speed vs. direction as a box-and-whisker plot. However, the x-axis is not readable because so many wind direction values are close to each other.
I have tried the oc_params ax.set_xticks, but I end up with either an empty x-axis or an x-axis showing different values.
The head of my dataframe
Kvit_TIU dir_cat
0 0.064740 14
1 0.057442 15
2 0.056750 15
3 0.069002 17
4 0.068464 17
5 0.067057 17
6 0.071901 12
7 0.050464 5
8 0.066165 1
9 0.073993 27
10 0.090784 34
11 0.121366 33
12 0.087172 34
13 0.066197 30
14 0.073020 17
15 0.071784 16
16 0.081699 17
17 0.088014 14
18 0.076758 14
19 0.078574 14
I used groupby = dir_cat to create a box plot
fig = plt.figure() # create the canvas for plotting
ax1 = plt.subplot(1,1,1)
ax1 = df_KvTr10hz.boxplot(column='Kvit_TIU', by='dir_cat', showfliers=False, showmeans=True)
ax1.set_xticks([30,90, 180,270, 330])
I would like to have the x-axis labelled at a reduced frequency so that the plot is readable.
ax1 = df_KvTr10hz.dropna().boxplot(column='Kvit_TIU', by='dir_cat', showfliers=False, showmeans=True)
EDIT: Using OP sample dataframe
However, if we substitute with NaNs the Kvit_TIU values for 'dir_cat'>=30
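For the readability problem in the question, one possible approach is to keep every n-th tick only. The sketch below is an assumption-laden illustration, not the original answer: the data is synthetic (36 direction categories), `step` is a hypothetical choice, and the plot is drawn on an explicitly created Axes. The key point is that the boxes sit at positions 1, 2, ..., N, not at the category values, which is likely why ax1.set_xticks([30, 90, 180, 270, 330]) gave unexpected results.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the question's data: 36 direction categories
rng = np.random.default_rng(0)
dir_cat = np.repeat(np.arange(1, 37), 20)
df = pd.DataFrame({"Kvit_TIU": rng.normal(0.07, 0.01, dir_cat.size),
                   "dir_cat": dir_cat})

fig, ax = plt.subplots(figsize=(10, 4))
df.boxplot(column="Kvit_TIU", by="dir_cat", ax=ax,
           showfliers=False, showmeans=True)

# Boxes are placed at 1..N in sorted category order
categories = np.sort(df["dir_cat"].unique())
positions = np.arange(1, len(categories) + 1)

step = 3  # hypothetical: label every third category
ax.set_xticks(positions[::step])
ax.set_xticklabels(categories[::step])
```

All boxes are still drawn; only the tick marks and labels are thinned out.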
How can I make a distplot with seaborn to only have whole numbers?
My data is an array of numbers between 0 and ~18. I would like to plot the distribution of the numbers.
Impressions
0 210
1 1084
2 2559
3 4378
4 5500
5 5436
6 4525
7 3329
8 2078
9 1166
10 586
11 244
12 105
13 51
14 18
15 5
16 3
dtype: int64
Code I'm using:
sns.distplot(Impressions,
# bins=np.arange(Impressions.min(), Impressions.max() + 1),
# kde=False,
axlabel=False,
hist_kws={'edgecolor':'black', 'rwidth': 1})
plt.xticks = range(current.Impressions.min(), current.Impressions.max() + 1, 1)
Plot looks like this:
What I'm expecting:
The xlabels should be whole numbers
Bars should touch each other
The kde line should simply connect the top of the bars. By the looks of it, the current one assumes to have 0s between (x, x + 1), hence why the downward spike (This isn't required, I can turn off kde)
Am I using the correct tool for the job, or should distplot not be used for whole numbers?
Your problem can be solved with the code below:
import seaborn as sns # for data visualization
import numpy as np # for numeric computing
import matplotlib.pyplot as plt # for data visualization
arr = np.array([1,2,3,4,5,6,7,8,9])
sns.distplot(arr, bins = arr, kde = False)
plt.xticks(arr)
plt.show()
In this way, you can plot histogram using seaborn sns.distplot() function.
Note: Whatever data you pass to bins and plt.xticks() should be in ascending order.
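As a variation on the same idea, here is a minimal sketch that centres each bar on a whole number by putting the bin edges at half-integers. It swaps in Matplotlib's own hist instead of sns.distplot (which is deprecated in recent seaborn releases), and the Poisson sample is a made-up stand-in for the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample of whole numbers
rng = np.random.default_rng(0)
data = rng.poisson(5, size=1000)

# Edges at -0.5, 0.5, 1.5, ... so each bin covers exactly one integer
edges = np.arange(data.min(), data.max() + 2) - 0.5

fig, ax = plt.subplots()
counts, _, _ = ax.hist(data, bins=edges, edgecolor="black")

# One tick per whole number, in ascending order
ax.set_xticks(np.arange(data.min(), data.max() + 1))
```

Because every bin spans one unit, the bars touch each other and each tick sits in the middle of its bar.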
I want to plot a curve on an image. I would like to see the curve only in a certain range. So:
plt.figure()
plt.imshow(img)
plt.plot(x, my_curve)
plt.axis([0, X, Y, 0])
But this way the image is also restricted to that range, which I don't want. I would like to see the whole image with only a portion of the curve. How can I apply the axis limits only to the second plot?
Note that I can't use a slice of the arrays. I am in this situation:
x = [0, 0, 0, 10, 10, 10, 30, 30, 30, 40, 40, 40]
my_curve = [0, 0, 0, 10, 10, 10, 30, 30, 30, 40, 40, 40]
I need to see the straight line on the image, but only between pixels 25 and 35. If I delete every element outside that range, I am left with only the point (30, 30) and cannot draw the straight line.
If your data is sparse, you can interpolate it:
x2=np.linspace(x[0],x[-1],1000)[0:X]
my_curve2=np.interp(x2,x,my_curve)
plt.plot(x2, my_curve2)
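Building on the interpolation idea, the sketch below densifies the curve and then keeps only the samples between pixels 25 and 35 with a boolean mask, so the image keeps its own axis limits. The blank 50x50 image, the de-duplicated sample points, and the names `x_dense`, `curve_dense` and `mask` are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical image and sparse curve (duplicates removed so x is increasing)
img = np.zeros((50, 50))
x = np.array([0.0, 10.0, 30.0, 40.0])
my_curve = np.array([0.0, 10.0, 30.0, 40.0])

# Densify the curve so that a sub-range still contains many points
x_dense = np.linspace(x[0], x[-1], 1000)
curve_dense = np.interp(x_dense, x, my_curve)

# Keep only the part of the curve between pixels 25 and 35
mask = (x_dense >= 25) & (x_dense <= 35)

fig, ax = plt.subplots()
ax.imshow(img)
ax.plot(x_dense[mask], curve_dense[mask], color="r")
```

No call to plt.axis is needed here: the whole image stays visible and only the selected segment of the line is drawn.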