How to Sort Values on the Graph

I'm using Python to analyze a 911 Calls for Service dataset. I'm plotting the number of calls per month, but the months on the graph are not sorted by date.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('911_calls_for_service.csv')
r, c = df.shape
df['callDateTime'] = pd.to_datetime(df['callDateTime'])
df['MonthYear'] = df['callDateTime'].apply(lambda time: str(time.year) + '-' + str(time.month))
df['MonthYear'].value_counts().plot()
print(df['MonthYear'].value_counts())
plt.tight_layout()
plt.show()

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('911_calls_for_service.csv')
df['callDateTime'] = pd.to_datetime(df['callDateTime'])
ax = df['callDateTime'].groupby([df["callDateTime"].dt.year, df["callDateTime"].dt.month]).count().plot()
ax.set_xlabel("Date")
ax.set_ylabel("Frequency")
plt.tight_layout()
plt.show()
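The months come out unordered in the question because value_counts() sorts by count rather than by label, and a 'year-month' string such as '2015-10' sorts before '2015-2' anyway. As a minimal sketch of another way to get calendar order, the block below counts calls per monthly Period and then sorts the index. The timestamps are made up for illustration; only the callDateTime column name comes from the question.

```python
import pandas as pd

# Made-up timestamps standing in for the 'callDateTime' column of the CSV.
df = pd.DataFrame({
    "callDateTime": pd.to_datetime([
        "2015-10-03 08:15", "2015-02-11 23:40", "2015-02-19 12:05",
        "2016-01-07 17:30", "2015-10-21 09:00", "2015-10-30 14:45",
    ])
})

# A monthly Period sorts chronologically, unlike a "2015-10" style string.
month = df["callDateTime"].dt.to_period("M")

# value_counts() orders by count; sort_index() restores calendar order.
calls_per_month = month.value_counts().sort_index()
print(calls_per_month)
```

Calling calls_per_month.plot() on the result should then draw the months from left to right in date order.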

Related

How to fill color by groups in histogram using Matplotlib?

I know how to do this in R and have provided the code for it below. How can I do something similar in Python, using Matplotlib or any other library?
library(ggplot2)
ggplot(dia[1:768,], aes(x = Glucose, fill = Outcome)) +
geom_bar() +
ggtitle("Glucose") +
xlab("Glucose") +
ylab("Total Count") +
labs(fill = "Outcome")

Using pandas you can pivot the dataframe and directly plot it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# dataframe with two columns in "long form"
g = np.array([np.random.normal(5, 10, 500),
np.random.rayleigh(10, size=500)]).flatten()
df = pd.DataFrame({'Glucose': g, 'Outcome': np.repeat([0,1],500)})
# pivot and plot
df.pivot(columns="Outcome", values="Glucose").plot.hist(bins=100)
plt.show()
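If you want the bars stacked by group, closer to what ggplot's fill produces, plain Matplotlib can also do it by passing one array per group to hist with stacked=True. This is only a sketch with made-up numbers; the group sizes and distributions are assumptions, not the asker's data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up glucose values for two outcome groups (768 rows in total).
rng = np.random.default_rng(0)
glucose_0 = rng.normal(110, 25, 500)
glucose_1 = rng.normal(140, 30, 268)

fig, ax = plt.subplots()
# One array per group; stacked=True piles the groups on top of each other
# and each group gets its own color.
counts, bin_edges, _ = ax.hist([glucose_0, glucose_1], bins=30,
                               stacked=True, label=["0", "1"])
ax.set_title("Glucose")
ax.set_xlabel("Glucose")
ax.set_ylabel("Total Count")
ax.legend(title="Outcome")
```

Call plt.show() afterwards to display the figure in a script.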

Please consider the following example, which uses seaborn 0.11.1.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# generate random data
data = {'Glucose': np.random.normal(5, 10, 100),
'Outcome': np.random.randint(2, size=100)}
df = pd.DataFrame(data)
# plot
fig, ax = plt.subplots(figsize=(10, 10))
sns.histplot(data=df, x='Glucose', hue='Outcome', stat='count', edgecolor=None, ax=ax)
ax.set_title('Glucose')
plt.show()

How to annotate regression lines in seaborn lmplot?

I have plotted two variables against each other in Seaborn and used the hue keyword to separate the data into two categories.
I want to annotate each regression line with its coefficient of determination. This question only describes how to show the labels for a line using the legend.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_excel(open('intubation data.xlsx', 'rb'), sheet_name='Data (pretest)', header=1, na_values='x')
vars_of_interest = ['PGY','Time (sec)','Aspirate (cc)']
df['Resident'] = df['PGY'] < 4
lm = sns.lmplot(x=vars_of_interest[1], y=vars_of_interest[2],
data=df, hue='Resident', robust=True, truncate=True,
line_kws={'label':"bob"})

Using your code as it is:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_excel(open('intubation data.xlsx', 'rb'), sheet_name='Data (pretest)', header=1, na_values='x')
vars_of_interest = ['PGY','Time (sec)','Aspirate (cc)']
df['Resident'] = df['PGY'] < 4
p = sns.lmplot(x=vars_of_interest[1], y=vars_of_interest[2],
data=df, hue='Resident', robust=True, truncate=True,
line_kws={'label':"bob"}, legend=True)
# assuming you have 2 groups
ax = p.axes[0, 0]
ax.legend()
leg = ax.get_legend()
L_labels = leg.get_texts()
# assuming you computed r_squared which is the coefficient of determination somewhere else
label_line_1 = r'$R^2:{0:.2f}$'.format(0.3)
label_line_2 = r'$R^2:{0:.2f}$'.format(0.21)
L_labels[0].set_text(label_line_1)
L_labels[1].set_text(label_line_2)
Voila:
Graph created with my own random data since OP hasn't provided any.
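The labels above use placeholder values for R². A rough sketch of how the per-group values could be computed is shown below, using an ordinary least-squares fit from NumPy. Note that lmplot with robust=True fits a robust regression, so these OLS values only approximate the fit of the plotted lines. The data is made up, and the helper r_squared is a hypothetical name introduced here.

```python
import numpy as np
import pandas as pd

def r_squared(x, y):
    """R^2 of an ordinary least-squares line fitted to y against x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up data with the same column names as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Time (sec)": rng.uniform(10, 60, 40),
                   "Resident": np.repeat([True, False], 20)})
df["Aspirate (cc)"] = 0.5 * df["Time (sec)"] + rng.normal(0, 3, 40)

# One legend label per hue group.
labels = {}
for is_resident, group in df.groupby("Resident"):
    r2 = r_squared(group["Time (sec)"].to_numpy(),
                   group["Aspirate (cc)"].to_numpy())
    labels[bool(is_resident)] = r"$R^2:{0:.2f}$".format(r2)
print(labels)
```

Each label can then be passed to set_text as above; check which legend entry belongs to which group before assigning them.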

Display datetime as day for xtick

I have the following sample codes:
import pandas as pd
import matplotlib.pyplot as plt
dates = ['01/02/2007 00:02:00','01/02/2007 00:04:00','02/02/2007 00:02:00','02/02/2007 00:04:00']
x = pd.to_datetime(dates, format='%d/%m/%Y %H:%M:%S')
y = [0.32,0.33,0.32,0.34]
plt.plot(x,y)
I would like the xtick labels to be just 'Thu' for 01/02/2007 and 'Fri' for 02/02/2007. What is the best way to do that?

One possible solution is to change the X-axis format:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
dates = ['01/02/2007 00:02:00','01/02/2007 00:04:00','02/02/2007 00:02:00','02/02/2007 00:04:00']
x = pd.to_datetime(dates, format='%d/%m/%Y %H:%M:%S')
y = [0.32,0.33,0.32,0.34]
fig, ax = plt.subplots()
ax.plot(x,y)
# '%a' is the abbreviated weekday name
dayFmt = mdates.DateFormatter('%a')
ax.xaxis.set_major_formatter(dayFmt)
plt.show()
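With only the formatter changed, the automatic locator may place several ticks within one day, so the same weekday name can repeat along the axis. A small sketch that also sets a DayLocator, so that there is one major tick per day, could look like this (same data as in the question):

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

dates = ['01/02/2007 00:02:00', '01/02/2007 00:04:00',
         '02/02/2007 00:02:00', '02/02/2007 00:04:00']
x = pd.to_datetime(dates, format='%d/%m/%Y %H:%M:%S')
y = [0.32, 0.33, 0.32, 0.34]

fig, ax = plt.subplots()
ax.plot(x, y)
# One major tick per day (at midnight), labelled with the short weekday name.
ax.xaxis.set_major_locator(mdates.DayLocator())
day_formatter = mdates.DateFormatter('%a')
ax.xaxis.set_major_formatter(day_formatter)
```

Call plt.show() afterwards to display the figure in a script.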

The key idea is to get the day of the week from the DatetimeIndex with x.dayofweek, which returns it as a number (Monday is 0). The corresponding name can then be looked up by indexing an array of names: np.array(['Mon','Tue','Wed','Thu','Fri','Sat', 'Sun'])[x.dayofweek]
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
dates = ['01/02/2007 00:02:00','01/02/2007 00:04:00','02/02/2007 00:02:00','02/02/2007 00:04:00']
x = pd.to_datetime(dates, format='%d/%m/%Y %H:%M:%S')
x_d = np.array(['Mon','Tue','Wed','Thu','Fri','Sat', 'Sun'])[x.dayofweek]
y = [0.32,0.33,0.32,0.34]
ser = pd.Series(y, index=x_d)
ser.plot()
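As a side note, pandas can produce the names directly, so the lookup array is not strictly needed. A short sketch using day_name() on the same dates:

```python
import pandas as pd

dates = ['01/02/2007 00:02:00', '01/02/2007 00:04:00',
         '02/02/2007 00:02:00', '02/02/2007 00:04:00']
x = pd.to_datetime(dates, format='%d/%m/%Y %H:%M:%S')

# day_name() gives full English names by default; keep the first three letters.
short_names = x.day_name().str[:3]
print(list(short_names))
```

The result can be used as the index of the Series in the same way as x_d above.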

Modify major and minor xticks for dates

I am plotting two pandas Series. The index of each is a date (1-1 to 12-31).
s1.plot()
s2.plot()
pandas interprets the dates and assigns them to the axis like this:
I would like to modify the major ticks to be the 1st of every month and the minor ticks to be the days in between.
This is what I tried:
%matplotlib notebook
import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%m-%d')
s2014max = df2014.groupby(['Date'], sort=True)['Data_Value'].max()/10
s2014min = df2014.groupby(['Date'], sort=True)['Data_Value'].min()/10
#remove the leap day and convert to datetime for plotting
s2014min = s2014min[s2014min.index != '02-29']
s2014max = s2014max[s2014max.index != '02-29']
dateslist = s2014min.index.tolist()
dates = [pd.datetime.strptime(date, '%m-%d').date() for date in dateslist]
plt.figure()
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
monthFmt = mdates.DateFormatter('%b')
dayFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(monthFmt)
ax.xaxis.set_minor_formatter(dayFmt)
ax.tick_params(direction='out', pad=15)
s2014min.plot()
s2014max.plot()
This results in no ticks:

A possible way is to use matplotlib for plotting the dates instead of pandas.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
dates = pd.date_range("2016-01-01", "2016-12-31" )
y = np.cumsum(np.random.normal(size=len(dates)))
df = pd.DataFrame({"Dates" : dates, "y": y})
fig, ax = plt.subplots()
# ax.plot handles datetime values directly (plot_date is deprecated in recent Matplotlib)
ax.plot(df["Dates"], df["y"], '-')
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
monthFmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_formatter(monthFmt)
plt.show()

You were so close! All you needed to do was add the formatters, similar to how the other answer did it. Here is a working sample similar to your code (note that I ran mine in an IPython notebook, hence the %matplotlib inline).
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime, timedelta
from random import random
y = [random() for i in range(25)]
x = [(datetime.now() - timedelta(days=i)) for i in range(25)]
x.reverse()
s = pd.Series(y, index=x) # NOTE: S, not df, since you said you were using series
# format the ticks
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
monthFmt = mdates.DateFormatter('%b')
dayFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(monthFmt) # This is what you needed
ax.xaxis.set_minor_formatter(dayFmt) # This is what you needed
ax.tick_params(direction='out', pad=15)
# plot the series
s.plot(figsize=(10,3))
which will look like this:
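Another option, if you want to keep plotting through pandas, is to pass x_compat=True to plot(). This asks pandas to skip its own date tick handling, so the Matplotlib locators and formatters apply as they would on a plain Matplotlib plot. A sketch with made-up data for one year:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up daily series for illustration.
dates = pd.date_range("2014-01-01", "2014-12-31")
values = np.random.default_rng(0).normal(size=len(dates)).cumsum()
s = pd.Series(values, index=dates)

fig, ax = plt.subplots(figsize=(10, 3))
# x_compat=True suppresses pandas' automatic tick adjustment for time series.
s.plot(ax=ax, x_compat=True)

# Major ticks on the 1st of each month, minor ticks on every day.
ax.xaxis.set_major_locator(mdates.MonthLocator())
month_formatter = mdates.DateFormatter('%b')
ax.xaxis.set_major_formatter(month_formatter)
ax.xaxis.set_minor_locator(mdates.DayLocator())
```

Call plt.show() afterwards to display the figure in a script.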

How can I read an Excel file and plot a daily rainfall time series

How can I convert a time such as 08:45 to 0845 so that I can plot a rainfall time series?
import numpy as np
import csv as csv
import pandas as pd
import datetime
import time
import matplotlib.pylab as plt
from mpl_toolkits.mplot3d import Axes3D
filename ='/home/yogesh/RTDAS 20 St.Data/Ambeghar_Rainfall.xls'
viewdata = pd.read_excel(filename, delimiter = ',',skiprows = 6,usecols=([3,4,5,6]))
index_col = 'Date'
fig1 = plt.figure(figsize=(25, 15))
ax1 = fig1.add_subplot(111)
plt.plot(viewdata["Today's Rain\n(mm)"])
plt.title("Rain Rate")
plt.show()
output:

import pandas as pd
import matplotlib.pyplot as plt
filename = '/home/yogesh/RTDAS 20 St.Data/Ambeghar_Rainfall.xls'
# The conventional variable name for a DataFrame is df.
# read_excel has no use for a delimiter (recent pandas versions reject the argument), so it is dropped here.
df = pd.read_excel(filename, skiprows=6, usecols=[3, 4, 5, 6])
# I'm not sure whether index_col is used later or whether you meant to use 'Date' as the index; this assumes the latter.
index_col = 'Date'
df = df.set_index(index_col)
# If you only want to plot a single column, this is often easier:
df["Today's Rain\n(mm)"].plot(figsize=(25, 15))
plt.title("Rain Rate")
plt.show()
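For the time conversion itself, here is a small sketch. The column names Date, Time and Rain (mm) and the values are hypothetical, since the layout of the spreadsheet isn't shown. It first drops the colon to get 0845, then builds a real timestamp, which is usually more useful for a time axis, and sums per day on the assumption that each row holds the rain for its own interval.

```python
import pandas as pd

# Hypothetical columns and values; the real spreadsheet layout is not shown.
df = pd.DataFrame({
    "Date": ["01/06/2017", "01/06/2017", "02/06/2017"],
    "Time": ["08:45", "17:30", "08:45"],
    "Rain (mm)": [0.0, 2.5, 12.0],
})

# "08:45" -> "0845": simply drop the colon.
df["Time_hhmm"] = df["Time"].str.replace(":", "", regex=False)

# Combine date and time into one timestamp and use it as the index.
df["Timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                 format="%d/%m/%Y %H:%M")
rain = df.set_index("Timestamp")["Rain (mm)"]

# Daily totals, assuming each row is the rain for its own interval.
daily_rain = rain.resample("D").sum()
print(daily_rain)
```

daily_rain.plot() would then give one point per day with dates on the x-axis.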
