Extract top 5 values from a dataframe to plot - python

I have the following data:
Date
01/23/2020 55
03/01/2020 44
02/24/2020 39
03/12/2020 39
01/24/2020 39
02/05/2020 38
03/17/2020 37
03/16/2020 37
03/19/2020 37
03/14/2020 35
from df.groupby(['Date']).count()['NN'].sort_values(ascending=False).head(10)
I would like to extract the top 5 Dates in order to automatically include in this code:
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
fig, ax = plt.subplots(figsize=(15,7))
df.groupby(['Date']).count()['NN'].plot(ax=ax)
xcoords = ["01/23/2020", "03/01/2020", "02/24/2020","03/12/2020","01/24/2020"]
for xc in xcoords:
    plt.axvline(x=xc, color='r')
At the moment, the vertical lines are drawn at the values I enter manually in xcoords. I would like to replace them with the top 5 Date values from df.groupby(['Date']).count()['NN'].sort_values(ascending=False).head(10). How can I do this?
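One way to avoid hard-coding xcoords is to take the index of the five largest counts. The sketch below is only an illustration: the real df is not shown in the question, so it builds a small made-up frame with the same column names (Date, NN). It also draws with ax.plot instead of the pandas plot wrapper, on the assumption that this keeps the line and the axvline calls in the same date units.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data: each date is repeated a different number of times.
repeats = {"01/23/2020": 6, "03/01/2020": 5, "02/24/2020": 4,
           "03/12/2020": 3, "01/24/2020": 2, "02/05/2020": 1}
rows = [date for date, n in repeats.items() for _ in range(n)]
df = pd.DataFrame({"Date": rows, "NN": range(len(rows))})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Count rows per date, then keep the dates with the five largest counts.
counts = df.groupby("Date")["NN"].count()
xcoords = counts.nlargest(5).index

fig, ax = plt.subplots(figsize=(15, 7))
ax.plot(counts.index, counts.values)
for xc in xcoords:
    ax.axvline(x=xc, color="r")
```

When several dates share the same count (as with the three dates at 39 in the question), which of them makes the top 5 depends on how ties are broken, so the result may differ from a manual pick.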

Related

How to plot a graph using this data with python?

I want to create time series plot using max, min and avg temperatures from each month of the year.
I would recommend looking into matplotlib to visualize different types of data which can be installed with a quick pip3 install matplotlib.
Here is some starter code you can play around with to get familiar with the library:
# Import the library
import matplotlib.pyplot as plt
# Some sample data to play around with
temps = [30,40,45,50,55,60]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
# Create a figure and plot the data
plt.figure()
plt.plot(temps)
# Add labels to the data points (optional)
for i, point in enumerate(months):
    plt.annotate(point, (i, temps[i]))
# Apply some labels
plt.ylabel("Temperature (F)")
plt.title("Temperature Plot")
# Hide the x axis labels
plt.gca().axes.get_xaxis().set_visible(False)
# Show the completed plot
plt.show()
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# sample data
df = pd.DataFrame({'Date':pd.date_range('2010-01-01', '2010-12-31'),
'Temp':np.random.randint(20, 100, 365)})
df.head()
Date Temp
0 2010-01-01 95
1 2010-01-02 20
2 2010-01-03 22
3 2010-01-04 26
4 2010-01-05 93
# group by month and get min, max, mean values for temperature
temp_agg = df.groupby(df.Date.dt.month)['Temp'].agg(['min', 'max', 'mean'])
temp_agg.index.name='month'
temp_agg
min max mean
month
1 20 99 50.258065
2 25 98 56.642857
3 22 89 51.225806
4 22 98 60.333333
5 27 99 57.645161
6 21 99 62.000000
7 20 98 67.419355
8 36 98 63.806452
9 22 99 62.166667
10 24 99 63.322581
11 22 97 64.200000
12 20 99 60.870968
# shorthand method of plotting entire dataframe
temp_agg.plot()

Python- compress lower end of y-axis in contourf plot

The issue
I have a contourf plot I made with a pandas dataframe that plots some 2-dimensional value with time on the x-axis and vertical pressure level on the y-axis. The field, time, and pressure data I'm pulling is all from a netCDF file. I can plot it fine, but I'd like to scale the y-axis to better represent the real atmosphere. (The default scaling is linear, but the pressure levels in the file imply a different kind of scaling.) Basically, it should look something like the plot below on the y-axis. It's like a log scale, but compressing the bottom part of the axis instead of the top. (I don't know the term for this... like a log scale but inverted?) It doesn't need to be exact.
Working example (written in Jupyter notebook)
#modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker, colors
#data
time = np.arange(0,10)
lev = np.array([900,800,650,400,100])
df = pd.DataFrame(np.arange(50).reshape(5,10),index=lev,columns=time)
df.index.name = 'Level'
print(df)
0 1 2 3 4 5 6 7 8 9
Level
900 0 1 2 3 4 5 6 7 8 9
800 10 11 12 13 14 15 16 17 18 19
650 20 21 22 23 24 25 26 27 28 29
400 30 31 32 33 34 35 36 37 38 39
100 40 41 42 43 44 45 46 47 48 49
#lists for plotting
levtick = np.arange(len(lev))
clevels = np.arange(0,55,5)
#Main plot
fig, ax = plt.subplots(figsize=(10, 5))
im = ax.contourf(df,levels=clevels,cmap='RdBu_r')
#x-axis customization
plt.xticks(time)
ax.set_xticklabels(time)
ax.set_xlabel('Time')
#y-axis customization
plt.yticks(levtick)
ax.set_yticklabels(lev)
ax.set_ylabel('Pressure')
#title and colorbar
ax.set_title('Some mean time series')
cbar = plt.colorbar(im,values=clevels,pad=0.01)
tick_locator = ticker.MaxNLocator(nbins=11)
cbar.locator = tick_locator
cbar.update_ticks()
The Question
How can I scale the y-axis such that values near the bottom (900, 800) are compressed while values near the top (200) are expanded and given more plot space, like in the sample above my code? I tried using ax.set_yscale('function', functions=(forward, inverse)) but didn't understand how it works. I also tried simply ax.set_yscale('log'), but log isn't what I need.
You can use a custom scale transformation with ax.set_yscale('function', functions=(forward, inverse)) as you suggested. From the documentation:
forward and inverse are callables that return the scale transform
and its inverse.
In this case, define in forward() the function you want, such as the inverse of the log function, or a more custom one for your need. Call this function before your y-axis customization.
def forward(x):
    return 2**x
def inverse(x):
    return np.log2(x)
ax.set_yscale('function', functions=(forward,inverse))
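Put together with the question's data, the suggestion might look like the sketch below. It assumes the contour is still drawn against row positions 0 to 4, as in the question, so 2**x gives more room to the upper rows and squeezes the lower ones. The base 2 is an arbitrary choice, and the np.errstate guard is an addition here to silence warnings in case inverse() receives non-positive values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

time = np.arange(0, 10)
lev = np.array([900, 800, 650, 400, 100])
df = pd.DataFrame(np.arange(50).reshape(5, 10), index=lev, columns=time)

def forward(y):
    # Map row position to display space; spacing grows toward the top rows.
    return 2.0 ** np.asarray(y, dtype=float)

def inverse(y):
    # Undo forward(); non-positive inputs yield nan/-inf without warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log2(np.asarray(y, dtype=float))

fig, ax = plt.subplots(figsize=(10, 5))
im = ax.contourf(df.values, levels=np.arange(0, 55, 5), cmap="RdBu_r")

# Set the scale first, then the ticks, since changing the scale resets them.
ax.set_yscale("function", functions=(forward, inverse))
ax.set_yticks(np.arange(len(lev)))
ax.set_yticklabels(lev)
ax.set_ylabel("Pressure")
fig.colorbar(im, pad=0.01)
```

Swapping in a different pair of functions changes how strongly the lower rows are compressed; the only requirement is that inverse really undoes forward over the plotted range.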

Plotting multiple x axis and multiple y axis in same graph for csv file using python

I have a CSV file with 10 columns. Each consecutive pair of columns has the same number of rows.
All the odd columns represent time, and the even columns represent energy.
I want to plot column1 against column2, column3 against column4, column5 against column6, column7 against column8, and column9 against column10, all on the same plot.
How can I do this?
For example.
sample.csv
1 99 2 98 1 98 3 99 ...
2 98 3 97 2 97 4 98 ...
3 97 4 96 3 96 5 97 ...
5 95 4 95 6 96 ...
7 95 ...
8 94 ...
Not sure if you want all plots in one image or separately for each pair of columns.
Here is a solution where you can display easily each pair of column using a function.
Modules
import io
import pandas as pd
import matplotlib.pyplot as plt
Data
df = pd.read_csv(io.StringIO("""
1 99 2 98 1 98 3 99 ...
2 98 3 97 2 97 4 98 ...
3 97 4 96 3 96 5 97 ...
5 95 4 95 6 96 ...
7 95 ...
8 94 ...
"""), sep=r"\s+", header=None, engine="python")
The function takes x, the position of a time column, and plots it against the column that follows it.
def plotfunction(x):
    plt.plot(df.iloc[:,x], df.iloc[:,x+1])
    plt.show()
plotfunction(0)
For multiple plots, step through the columns two at a time so that each call starts on a time column.
for i in range(0, 8, 2):
    plotfunction(i)
Or, more neatly, as subplots.
fig = plt.figure(figsize=(10, 6))
for i,x in zip([1,2,3,4], [0,2,4,6]):
    ax = fig.add_subplot(2,2,i)
    ax.plot(df.iloc[:,x], df.iloc[:,x+1])
With an artificial input test.csv, like
a,b,c,d
1,50,2,20
2,60,3,40
,,4,60
this code worked for me and produced one image with two graphs that represent columns 1+2 and 3+4.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("test.csv")
for i in range(0,len(df.columns),2):
    plt.plot(df.iloc[:,i].values, df.iloc[:,i+1].values)
plt.show()
Edit: Initially, worked only for 4 columns, should work for even more now
2nd edit for additions:
Colors and labels can be specified in the plot command with the label and color or c keyword arguments:
color_list =["blue", "red", "green"]
for i in range(0,len(df.columns),2):
    plt.plot(df.iloc[:,i].values, df.iloc[:,i+1].values, label = df.columns[i+1], color = color_list[i//2])
plt.legend()
plt.show()
This works if the labels are given in the top line of the CSV file and are therefore the dataframe's column names. Alternatively, you can specify a custom list, just like I did for the colors. There are more complex but also convenient ways of setting the color, e.g. color maps and sets, but I guess this is the easiest solution. More information and alternative implementations for labels and colors can be found in the matplotlib documentation, which is very extensive and easy to read.

How can I create a grouped bar chart with Matplotlib or Seaborn from a multi-indexed data frame?

I have a problem plotting multi-indexed data in a single bar chart. I started with a DataFrame with three columns (artist, genre and miscl_count) and 195 rows. I then grouped the data by two of the columns, which resulted in the table below. My question is, how can I create a bar plot from this so that each of the three "miscl_count" values is shown as a separate bar for each of the five genres (i.e. 3x5 bars in total)? I would also like the genre to determine the color of each bar.
I know that there is unstacking, but I don't understand how I can get this to work with Matplotlib or Seaborn.
The head of the DataFrame, that I perform the groupby method on looks like this:
print(miscl_df.head())
artist miscl_count genre
0 band1 5 a
1 band2 6 b
2 band3 5 b
3 band4 4 b
4 band5 5 b
5 band6 5 c
miscl_df_group = miscl_df.groupby(['genre', 'miscl_count']).count()
print(miscl_df_group)
After group by, the output looks like this:
artist
miscl_count 4 5 6
genre
a 11 9 9
b 19 13 16
c 13 14 16
d 10 9 12
e 21 14 10
Just to make sure I made myself clear, the output should be shown as a single chart (and not as subplots)!
Working solution to be used on the grouped data:
miscl_df_group.unstack(level='genre').plot(kind='bar')
Alternatively, it can also be used this way:
miscl_df_group.unstack(level='miscl_count').plot(kind='bar')
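As a variation on the lines above, the grouping and unstacking can be done in one chain. This is a sketch under assumptions: it uses only the six rows printed in the question (the full 195-row frame is not available), and it swaps count() for size() so that the unstacked columns are a single level named after the genres.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The six rows shown in the question; the full frame has 195 rows.
miscl_df = pd.DataFrame({
    "artist": ["band1", "band2", "band3", "band4", "band5", "band6"],
    "miscl_count": [5, 6, 5, 4, 5, 5],
    "genre": ["a", "b", "b", "b", "b", "c"],
})

# size() counts rows per (genre, miscl_count) pair and returns a Series,
# so unstacking gives plain single-level columns (one per genre).
counts = (miscl_df.groupby(["genre", "miscl_count"])
          .size()
          .unstack(level="genre", fill_value=0))

# One group of bars per miscl_count value, one coloured bar per genre.
ax = counts.plot(kind="bar")
ax.set_ylabel("number of artists")
```

Each column of the unstacked frame becomes one colour in the legend, so unstacking on genre is what ties the bar colour to the genre.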
With seaborn, there is no need to group the data; this is done under the hood:
import seaborn as sns
sns.barplot(x="artist", y="miscl_count", hue="genre", data=miscl_df)
(change the column names at will, depending on what you want)
# full working example
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame()
df["artist"] = list(map(lambda i: f"band{i}", np.random.randint(1,4,size=(100,))))
df["genre"] = list(map(lambda i: f"genre{i}", np.random.randint(1,6,size=(100,))))
df["count"] = np.random.randint(50,100,size=(100,))
# df
# count genre artist
# 0 97 genre9 band1
# 1 95 genre7 band1
# 2 65 genre3 band2
# 3 81 genre1 band1
# 4 58 genre10 band1
# .. ... ... ...
# 95 61 genre1 band2
# 96 53 genre9 band2
# 97 55 genre9 band1
# 98 94 genre1 band2
# 99 85 genre8 band1
# [100 rows x 3 columns]
sns.barplot(x="artist", y="count", hue="genre", data=df)

Xtick frequency in pandas boxplot

I am using pandas groupby to plot wind speed vs. direction as a box-and-whisker plot. However, the x-axis is not readable because so many wind direction values sit close to each other.
I have tried the oc_params ax.set_xticks but instead I am having empty x-axis or modified xaxis with different values
The head of my dataframe
Kvit_TIU dir_cat
0 0.064740 14
1 0.057442 15
2 0.056750 15
3 0.069002 17
4 0.068464 17
5 0.067057 17
6 0.071901 12
7 0.050464 5
8 0.066165 1
9 0.073993 27
10 0.090784 34
11 0.121366 33
12 0.087172 34
13 0.066197 30
14 0.073020 17
15 0.071784 16
16 0.081699 17
17 0.088014 14
18 0.076758 14
19 0.078574 14
I used groupby = dir_cat to create a box plot
fig = plt.figure() # create the canvas for plotting
ax1 = plt.subplot(1,1,1)
ax1 = df_KvTr10hz.boxplot(column='Kvit_TIU', by='dir_cat', showfliers=False, showmeans=True)
ax1.set_xticks([30,90, 180,270, 330])
I would like the x-axis ticks drawn at a reduced frequency so that the plot is readable.
ax1 = df_KvTr10hz.dropna().boxplot(column='Kvit_TIU', by='dir_cat', showfliers=False, showmeans=True)
EDIT: Using OP sample dataframe
However, if we substitute with NaNs the Kvit_TIU values for 'dir_cat'>=30
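One possible way to thin out the x-axis labels is sketched below. It relies on the fact that matplotlib's boxplot places the boxes at positions 1 to n by default, which would also explain why set_xticks([30, 90, 180, 270, 330]) left the axis mostly empty. The data is randomly generated for illustration, and the step of 6 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 36 direction categories with 10 samples each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Kvit_TIU": rng.random(360),
    "dir_cat": np.tile(np.arange(1, 37), 10),
})

fig, ax = plt.subplots()
df.boxplot(column="Kvit_TIU", by="dir_cat", ax=ax,
           showfliers=False, showmeans=True)

# Boxes sit at positions 1..n in sorted category order, so ticks are
# chosen by position and labelled with the matching category value.
cats = sorted(df["dir_cat"].unique())
step = 6  # label every 6th category (arbitrary choice)
positions = list(range(step, len(cats) + 1, step))
ax.set_xticks(positions)
ax.set_xticklabels([str(cats[p - 1]) for p in positions])
```

If some categories are missing from the data, the positions still run 1 to n over the categories that are present, which is why the labels are looked up from cats rather than assumed to equal the positions.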
