Use dataframe column names as labels in pylab.plot - python

I would like to plot the data in a dataframe and have the column headers be the labels. I tried this:
dfm.columns = ['a','b']
plot(dfm.cumsum(), label= dfm.columns.values)
legend(loc='upper left')
But got this:
Instead of both lines being labeled ['a','b'], I'd like the blue line to be labeled a and the green line to be labeled b, using pylab.

I think the problem is in how your data is set up, in a part of the code you haven't shown.
Here's an example; I used df.plot() in this case.
import pandas as pd
import random
import matplotlib.pyplot as plt
x = [random.randint(10,20) for r in range(100)]
y = [random.randint(0,10) for r in range(100)]
df = pd.DataFrame([x,y]).T #T for transpose
df.columns=['a','b']
df.plot(kind='line')
plt.legend(loc='upper left')
plt.show()
Edit
pylab version
import pandas as pd
import random
import matplotlib.pylab as plt
x = [random.randint(10,20) for r in range(100)]
y = [random.randint(0,10) for r in range(100)]
df = pd.DataFrame([x,y]).T
plt.plot(df)
plt.legend(['a','b'],loc='upper left')
plt.show()
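If you would rather not type the labels by hand, here is a minimal sketch of one way to take them from the column headers. It relies on plot returning one line object per column, in column order. The random data, the seed, and the Agg backend line are my own assumptions so the snippet runs without a display; they are not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs anywhere; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# made-up data standing in for dfm from the question
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.integers(10, 20, 100), "b": rng.integers(0, 10, 100)})

fig, ax = plt.subplots()
# plotting a 2-D array draws one line per column and returns them in column order
lines = ax.plot(df.index, df.cumsum().to_numpy())
# pair each line with its column header instead of passing the whole array as one label
ax.legend(lines, list(df.columns), loc="upper left")

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Passing the handles and labels to legend together keeps the pairing explicit, so it still works if the columns are renamed later.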


How to plot Multiline Graphs Via Seaborn library in Python?

I have written a code that looks like this:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
exp1= sns.lineplot(data=df1)
plt.savefig('exp1.png')
exp1_smooth= sns.lmplot(x='Size', y='Time', data=df, ci=None, order=4, truncate=False)
plt.savefig('exp1_smooth.png')
That gives me Graph_1:
The line for Size (which I want on the x-axis) looks constant, but as you can see in my code it varies (10, 100, 1000).
How does this produce a constant line? I want a multi-line graph with x-axis = Size (T) and y-axis = Encrypt_Time and Decrypt_Time (power1 and power2).
I also wanted to plot a smoothed version of the same graph, but it gives me an error. What needs to be done to get a smooth multi-line graph with x-axis = Size (T) and y-axis = Encrypt_Time and Decrypt_Time (power1 and power2)?
I don't think that is the issue: the line for Size looks constant, but it is not.
The Size values range from 10 to 1000, while the smallest division on the y-axis is 20,000 (20 times bigger), which makes Size look like a horizontal line on your graph.
You can try bigger values to see the slope clearly.
If you want 'Size' as the x-axis, you can try the example below:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
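fig, ax = plt.subplots()
# draw both series on the same axes, with Size on the x-axis
sns.lineplot(data=df1, x='Size', y='Encrypt_Time', label='Encrypt_Time', ax=ax)
sns.lineplot(data=df1, x='Size', y='Decrypt_Time', label='Decrypt_Time', ax=ax)
plt.show()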

Consistent color argument between matplotlib scatter to matplotlib plot?

I'm hoping to use matplotlib to plot inter-annual variation of monthly data (below). By passing c=ds['time.year'] in plt.scatter(), I achieve the desired outcome. However, I would like to be able to connect the points with an analogous plt.plot() call. Is this possible?
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
# create y data
y = []
for yr in range(10):
    for mo in range(12):
        y.append(yr+mo+(yr*mo)**2)
# create datetime vector
t = pd.date_range(start='1/1/2010', periods=120, freq='M')
# combine in DataArray
ds = xr.DataArray(y, coords={'time':t}, dims=['time'])
# scatter plot with color
im = plt.scatter(ds['time.month'], ds.values, c=ds['time.year'])
plt.colorbar(im)
Output:
I have tried the following, but it does not work:
plt.plot(ds['time.month'], ds.values, c=ds['time.year'])
You can create a norm mapping the range of years to the range of colors. The norm, together with the colormap used, can serve as input for a ScalarMappable to create an accompanying colorbar. With the default 'viridis' colormap the code could look like:
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
import pandas as pd
import xarray as xr
y = []
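for yr in range(10):
    for mo in range(12):
        y.append(yr + mo + (yr * mo) ** 2)
t = pd.date_range(start='1/1/2010', periods=120, freq='M')
ds = xr.DataArray(y, coords={'time': t}, dims=['time'])
# map the range of years onto the 0-1 range used by the colormap
norm = plt.Normalize(ds['time.year'].min(), ds['time.year'].max())
cmap = plt.get_cmap('viridis')  # plt.cm.get_cmap was removed in newer Matplotlib versions
# draw one line per year, colored by that year's position in the colormap
for year in range(int(ds['time.year'].min()), int(ds['time.year'].max()) + 1):
    plt.plot(ds['time.month'][ds['time.year'] == year],
             ds.values[ds['time.year'] == year],
             ls='-', marker='o', color=cmap(norm(year)))
# pass ax explicitly; newer Matplotlib versions need it for a standalone ScalarMappable
plt.colorbar(ScalarMappable(cmap=cmap, norm=norm), ax=plt.gca())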
plt.xticks(range(1, 13))
plt.show()
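To see what the norm and the colormap each contribute, here is a small standalone sketch without xarray. The list of years is hard-coded as an assumption that mirrors the 2010 to 2019 range of the example data.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

years = list(range(2010, 2020))  # assumption: same span as the example data

# the norm rescales a year linearly into the interval [0, 1]
norm = Normalize(vmin=min(years), vmax=max(years))
# the colormap turns a number in [0, 1] into an RGBA tuple
cmap = plt.get_cmap("viridis")

colors = {year: cmap(norm(year)) for year in years}
first = float(norm(2010))
last = float(norm(2019))
```

Because plt.plot takes a single color per line, the lookup has to happen once per year, which is why the answer loops over the years instead of passing an array to c.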

How to fill color by groups in histogram using Matplotlib?

I know how to do this in R and have provided the code for it below. I want to know how I can do something similar in Python, using Matplotlib or any other library.
library(ggplot2)
ggplot(dia[1:768,], aes(x = Glucose, fill = Outcome)) +
  geom_bar() +
  ggtitle("Glucose") +
  xlab("Glucose") +
  ylab("Total Count") +
  labs(fill = "Outcome")
Using pandas you can pivot the dataframe and directly plot it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# dataframe with two columns in "long form"
g = np.array([np.random.normal(5, 10, 500),
              np.random.rayleigh(10, size=500)]).flatten()
df = pd.DataFrame({'Glucose': g, 'Outcome': np.repeat([0,1],500)})
# pivot and plot
df.pivot(columns="Outcome", values="Glucose").plot.hist(bins=100)
plt.show()
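If you want to stay in plain Matplotlib, a rough equivalent of the pivot approach is to split the values by group yourself and pass the list of arrays to hist. This is a sketch with made-up numbers (the means, spreads and seed are assumptions, not real glucose data); stacked=True is my choice to mimic the stacked bars that fill produces in ggplot.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# made-up long-form data: one value column, one group column
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Glucose": np.concatenate([rng.normal(100, 15, 500), rng.normal(140, 25, 500)]),
    "Outcome": np.repeat([0, 1], 500),
})

# one array of values per group, in sorted group order
grouped = df.groupby("Outcome")["Glucose"]
labels = [str(key) for key, _ in grouped]
groups = [values.to_numpy() for _, values in grouped]

fig, ax = plt.subplots()
# a list of arrays gives one colored series per group; stacked=True piles them up
counts, bin_edges, _ = ax.hist(groups, bins=30, stacked=True, label=labels)
ax.set_title("Glucose")
ax.set_xlabel("Glucose")
ax.set_ylabel("Total Count")
ax.legend(title="Outcome")
```

Drop stacked=True if you prefer the bars side by side within each bin.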
Please consider the following example, which uses seaborn 0.11.1.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# generate random data
data = {'Glucose': np.random.normal(5, 10, 100),
        'Outcome': np.random.randint(2, size=100)}
df = pd.DataFrame(data)
# plot
fig, ax = plt.subplots(figsize=(10, 10))
sns.histplot(data=df, x='Glucose', hue='Outcome', stat='count', edgecolor=None)
ax.set_title('Glucose')

How to annotate regression lines in seaborn lmplot?

I have plotted two variables against each other in Seaborn and used the hue keyword to separate the variables into two categories.
I want to annotate each regression line with the coefficient of determination. This question only describes how to show the labels for a line using the legend.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_excel(open('intubation data.xlsx', 'rb'), sheet_name='Data (pretest)', header=1, na_values='x')
vars_of_interest = ['PGY','Time (sec)','Aspirate (cc)']
df['Resident'] = df['PGY'] < 4
lm = sns.lmplot(x=vars_of_interest[1], y=vars_of_interest[2],
                data=df, hue='Resident', robust=True, truncate=True,
                line_kws={'label':"bob"})
Using your code as it is:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_excel(open('intubation data.xlsx', 'rb'), sheet_name='Data (pretest)', header=1, na_values='x')
vars_of_interest = ['PGY','Time (sec)','Aspirate (cc)']
df['Resident'] = df['PGY'] < 4
p = sns.lmplot(x=vars_of_interest[1], y=vars_of_interest[2],
               data=df, hue='Resident', robust=True, truncate=True,
               line_kws={'label':"bob"}, legend=True)
# assuming you have 2 groups
ax = p.axes[0, 0]
ax.legend()
leg = ax.get_legend()
L_labels = leg.get_texts()
# assuming you computed r_squared which is the coefficient of determination somewhere else
label_line_1 = r'$R^2:{0:.2f}$'.format(0.3)
label_line_2 = r'$R^2:{0:.2f}$'.format(0.21)
L_labels[0].set_text(label_line_1)
L_labels[1].set_text(label_line_2)
Voila:
Graph created with my own random data since OP hasn't provided any.
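The code above assumes R² was already computed somewhere else. Here is a minimal sketch of one way to get it per group. Note the assumption: it uses the squared Pearson correlation, which equals R² for an ordinary least-squares line, whereas the lmplot call uses robust=True, so these numbers describe a plain linear fit and not the robust one. The helper name r_squared_by_group and the demo values are hypothetical.

```python
import numpy as np
import pandas as pd


def r_squared_by_group(df, x, y, group):
    """Return {group value: R^2} of a simple least-squares line fitted per group."""
    result = {}
    for key, sub in df.groupby(group):
        sub = sub[[x, y]].dropna()
        # for simple linear regression, R^2 is the squared Pearson correlation
        r = np.corrcoef(sub[x], sub[y])[0, 1]
        result[key] = r ** 2
    return result


# made-up demo data using the column names from the question
demo = pd.DataFrame({
    "Time (sec)": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "Aspirate (cc)": [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 2.0, 5.0],
    "Resident": [True, True, True, True, False, False, False, False],
})

r2 = r_squared_by_group(demo, "Time (sec)", "Aspirate (cc)", "Resident")
# same label format as in the answer, ready to pass to set_text
labels = {key: r"$R^2:{0:.2f}$".format(value) for key, value in r2.items()}
```

The resulting strings can replace the hard-coded 0.3 and 0.21 in the legend code; check which legend entry belongs to which group before assigning them.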

Problem when using datetime data to draw graphic

I want to draw a graph using datetime data as the x-axis, but the process takes extremely long: after more than 30 minutes there is still no graph. When I use the data from another column instead, the graph appears very quickly. All the data are stored as lists.
I'm confused: since they are all in the same format, why can't I draw the graph with the datetime data as the x-axis?
Here is my code. I appreciate your time and help!
from matplotlib import pyplot as plt
import csv
names = []
x = []
y = []
with open('all.csv','r') as csvfile: #this csv file contains over 16000 rows
    plots = csv.reader(csvfile,delimiter=',')
    for row in plots:
        x.append(row[1]) #row[1] is the datetime format data
        y.append(row[2])
print(x,y)
plt.plot(x,y)
plt.show()
Lines of my csv file look something like:
2016/05/02 10:47:45,14.1,20.1,N.C.,170.7,518.3,-1259,-12.61,375.8,44.92,13.76,92.74,132.6,38.86,165.3,170.9,311.5,252.3,501.2,447.2,378.4,35.48,7.868,181.2,
I want the first column as the x-axis and the following columns as the y-axis...
and the y-axis doesn't change, no matter how I change the y-axis limit.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv('all.csv')
x = df.iloc[:,1]
y = df.iloc[:,3]
x = pd.to_datetime(x)
plt.figure(num=3, figsize=(15, 5))
plt.plot(x,y)
my_y_ticks = np.arange(0, 40, 10)
plt.xticks(rotation = 90)
plt.show()
I haven't understood exactly what you mean by all the data being in list format, but I think you could use something like this:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('all.csv')
x = df.iloc[:,0]
y = df.iloc[:,1]
x = pd.to_datetime(x)
plt.plot(x,y)
plt.show()
It might help if you showed a few rows of the file.
EDIT:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
df = pd.read_csv('all.csv')
x = df.iloc[:,0]
y = df.iloc[:,1]
x = pd.to_datetime(x, format="%Y/%m/%d %H:%M:%S") #matches rows like 2016/05/02 10:47:45; if the format is different, change here
fig, ax = plt.subplots()
ax.plot(x, y)
xfmt = mdates.DateFormatter("%Y/%m/%d %H:%M:%S")
ax.xaxis.set_major_formatter(xfmt)
plt.xticks(rotation=70)
plt.show()
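A likely reason the original csv.reader version is so slow: the reader returns every field as a string, and Matplotlib treats strings as categories, so each distinct timestamp gets its own tick and the y values are not on a numeric scale either. Below is a sketch of the same loop with explicit conversions. The first sample row is a shortened copy of the one in the question; the second row is made up, and reading from an in-memory string instead of all.csv is only there to keep the snippet self-contained.

```python
import csv
import io
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

# first row shortened from the question; second row is invented for illustration
sample = io.StringIO(
    "2016/05/02 10:47:45,14.1,20.1\n"
    "2016/05/02 10:48:45,14.3,20.4\n"
)

x, y = [], []
for row in csv.reader(sample, delimiter=","):
    # parse the timestamp so Matplotlib uses a date axis instead of categories
    x.append(datetime.strptime(row[0], "%Y/%m/%d %H:%M:%S"))
    # convert the reading to a number so the y-axis is numeric
    y.append(float(row[1]))

fig, ax = plt.subplots()
ax.plot(x, y)
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

With numeric y values the axis limits respond to ylim as expected. Columns that contain entries such as N.C. (as in the sample row) cannot go through float() directly and would need to be skipped or replaced first.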
