I am trying to use ggplot in Python for the first time and the semantics are completely unobvious to me.
I have a pandas dataframe with two columns: date and entries_sum. What I would like to do is plot a bar plot with the date column as each entry on the x-axis and entries_sum as the respective heights.
I cannot figure out how to do this with the ggplot API. Am I formatting my data wrong for this?
How about:
ggplot(aes(x='date', y='entries_sum'), data=data) + geom_bar(stat='identity')
I would like to customize axis titles in a Seaborn pairplot, e.g., I would like to use the formatted versions (r'$I_{bs}$') and (r'$k_{HL}$') instead of "Ibs" and "kHL" which are the column titles in the dataframe I use to generate the plot. How can I achieve this?
Problem solved. I named the columns with formatted strings when defining the dataframe I used to create the pairplot:
df_red=pd.DataFrame(DataMatrix_red,columns=['shape',r'$d_n$',r'$\nu_s$',r'$D_s$[%]',r'$I_{{bs}}$',r'$k_{{VssL}}$',r'$k_{{HL}}$'])
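If the dataframe already exists with plain column names, a possible variation is to rename the columns just before plotting. This is only a sketch: the frame `df`, the mapping `labels` and the name `df_plot` are made up for illustration, and only the two labels from the question are shown.

```python
import pandas as pd

# Hypothetical frame using the plain column names from the question
df = pd.DataFrame({"Ibs": [0.1, 0.4, 0.3], "kHL": [1.2, 0.8, 1.0]})

# Map plain names to mathtext labels; raw strings keep the backslashes and braces intact
labels = {"Ibs": r"$I_{bs}$", "kHL": r"$k_{HL}$"}

# rename returns a copy, so the original column names stay usable elsewhere
df_plot = df.rename(columns=labels)
print(list(df_plot.columns))
```

The renamed copy `df_plot` would then be passed to the pairplot call in place of `df`.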
I created a graph using DOGE crypto data:
import pandas as pd
import matplotlib.pyplot as plt
df2 = pd.read_csv("https://raw.githubusercontent.com/peoplecure/pandoras-box/master/doge.csv")
plt.plot(df2['begins_at'], df2['open_price'])
plt.show()
The graph above looks fine. But when I try to create a graph another way with the exact same data, the graph looks totally off:
from pandas import DataFrame
df = DataFrame(DOGE_data)
plt.plot(df['begins_at'], df['open_price'])
plt.show()
Regrettably, I don't have a way to share the data used in the second method. However, the data used in the first graph was created from df. Does anyone have an idea what may be going on here?
The messed-up y-axis could be the hint. With numerical data there are usually 4-12 ticks and labels on the y-axis, whereas with non-numerical data there is one tick for each "category".
Check the data type of y-data in the second dataset: df['open_price'].dtype
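A minimal sketch of that check and a possible fix, assuming the prices arrived as strings. The small frame below is a made-up stand-in for DOGE_data, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for DOGE_data: the prices are strings, not floats
df = pd.DataFrame({
    "begins_at": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "open_price": ["0.0047", "0.0057", "0.0105"],
})

# A non-numeric dtype here means matplotlib treats each value as a category
print(df["open_price"].dtype)
was_numeric = pd.api.types.is_numeric_dtype(df["open_price"])

# Convert to numbers; values that cannot be parsed become NaN instead of raising
df["open_price"] = pd.to_numeric(df["open_price"], errors="coerce")
# Parse the timestamps too, so the x-axis is a real time axis
df["begins_at"] = pd.to_datetime(df["begins_at"])
print(df["open_price"].dtype)
```

After the conversion, the same plt.plot call should give a normal numeric y-axis.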
Here is a sample of the data I'm working with: WellAnalyticalData. I'd like to loop through each well name and create a time series chart for each parameter, with the sample date on the x-axis and the value on the y-axis. I don't think I want subplots; I'm just looking for individual plots of each analyte for each well. I've tried grouping by well name with pandas and then plotting, but that doesn't seem to be the way to go. I'm fairly new to Python and I think I'm also having trouble figuring out how to construct the loop statement. I'm running Python 3.x and am using the matplotlib library to generate the plots.
If I understand your question correctly, you want one plot for each combination of Well and Parameter. No subplots, just a new plot for each combination. Each plot should have SampleDate on the x-axis and Value on the y-axis. I've written a loop here that does just that, although you'll see that since your data has just one date per well per parameter, each plot is just a single dot.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({'WellName':['A','A','A','A','B','B','C','C','C'],
                   'SampleDate':['2018-02-15','2018-03-31','2018-06-07','2018-11-14','2018-02-15','2018-11-14','2018-02-15','2018-03-31','2018-11-14'],
                   'Parameter':['Arsenic','Lead','Iron','Magnesium','Arsenic','Iron','Arsenic','Lead','Magnesium'],
                   'Value':[0.2,1.6,0.05,3,0.3,0.79,0.3,2.7,2.8]
                   })
# Outer loop: one pass per well
for well in df.WellName.unique():
    temp1 = df[df.WellName==well]
    # Inner loop: one new figure per parameter measured in this well
    for param in temp1.Parameter.unique():
        fig = plt.figure()
        temp2 = temp1[temp1.Parameter==param]
        plt.scatter(temp2.SampleDate,temp2.Value)
        plt.title('Well {} and Parameter {}'.format(well,param))
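The same result can probably be had with a single groupby over both columns, which replaces the nested loops. Below is a rough sketch under a few assumptions: the sample data is invented so that each well and parameter pair has two dates, the `figures` dictionary is just a convenience for keeping a handle on each figure, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: two wells, two parameters, two sample dates each
df = pd.DataFrame({
    "WellName":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "SampleDate": ["2018-02-15", "2018-11-14"] * 4,
    "Parameter":  ["Arsenic", "Arsenic", "Lead", "Lead"] * 2,
    "Value":      [0.2, 0.3, 1.6, 2.7, 0.3, 0.25, 0.79, 0.9],
})
# Real datetimes give a proper time axis instead of string categories
df["SampleDate"] = pd.to_datetime(df["SampleDate"])

figures = {}
# Grouping on two columns yields one (well, parameter) key per combination
for (well, param), group in df.groupby(["WellName", "Parameter"]):
    group = group.sort_values("SampleDate")
    fig, ax = plt.subplots()  # a fresh figure for every combination
    ax.plot(group["SampleDate"], group["Value"], marker="o")
    ax.set_xlabel("SampleDate")
    ax.set_ylabel("Value")
    ax.set_title(f"Well {well} and Parameter {param}")
    fig.autofmt_xdate()
    figures[(well, param)] = fig
```

Each figure could then be saved with fig.savefig if the plots are needed as files.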
I am making a bar plot from a dataframe with a 15-minute DatetimeIndex covering a couple of years.
Using this code:
df_Vol.resample('A').sum().plot.bar(
    title='Sums per year',
    style='ggplot',
    alpha=0.8
)
Unfortunately, the ticks on the x-axis are now shown with the full timestamp, like this: 2009-12-31 00:00:00.
I would prefer to keep the plotting code short, but I couldn't find an easy way to format the timestamp as just the year (2009...2016) for the plot.
Can someone help on this?
As it does not seem to be possible to format the date within the pandas df.plot() call, I have decided to create a new dataframe and plot from it.
The solution below worked for me:
df_Vol_new = df_Vol.resample('A').sum()
df_Vol_new.index = df_Vol_new.index.strftime('%Y')
ax2 = df_Vol_new.plot.bar(title='Sums per year', stacked=True, style='ggplot', alpha=0.8)
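A variation on the same idea, sketched under some assumptions: grouping on the year of the index gives integer year labels directly, and avoids the `how=` argument of resample, which was deprecated and later removed from pandas. The frame below is invented (a constant series in a made-up `volume` column), and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import numpy as np
import pandas as pd

# Hypothetical 15-minute series spanning two full years
index = pd.date_range("2009-01-01", "2010-12-31 23:45", freq="15min")
df_Vol = pd.DataFrame({"volume": np.ones(len(index))}, index=index)

# Group on the year of each timestamp; the resulting index holds plain integers
yearly = df_Vol.groupby(df_Vol.index.year).sum()
print(yearly.index.tolist())

# The bar labels come from the integer index, so no timestamp formatting is needed
ax = yearly.plot.bar(title="Sums per year", alpha=0.8)
```

The original df_Vol keeps its datetime index, since only the aggregated frame is relabelled.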
An alternative (better, at least to me) is to add the following after the df_Vol_new.plot() command:
plt.legend(df_Vol_new.index.to_period('A'))
This way you preserve the datetime format of df_Vol_new.index while getting better plots at the same time.
I am trying to plot two time series whose rows are not in sequential date order, so when I plot them they look weird. I don't know how to fix it. Here is my code and figure. Thank you.
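import datetime as dt
import matplotlib.pyplot as plt
# read_myfile is the dataframe loaded earlier
date = read_myfile['Date']
# Parse the month/day/year strings into date objects
x = [dt.datetime.strptime(d, '%m/%d/%Y').date() for d in date]
y = read_myfile['Observed']
y1 = read_myfile['Simulated']
plt.plot(x, y, color='blue')
plt.plot(x, y1, color='red')
plt.gcf().autofmt_xdate()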
Two time series of data.
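One possible approach, assuming the odd-looking lines come only from the rows being out of date order, is to parse the dates and sort the frame before plotting. In this sketch the frame standing in for read_myfile is invented, `ordered` is a made-up name, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for read_myfile, with rows deliberately out of date order
read_myfile = pd.DataFrame({
    "Date": ["03/01/2015", "01/01/2015", "02/01/2015"],
    "Observed": [3.0, 1.0, 2.0],
    "Simulated": [3.2, 0.9, 2.1],
})

# Parse month/day/year strings, then sort so the line is drawn left to right in time
read_myfile["Date"] = pd.to_datetime(read_myfile["Date"], format="%m/%d/%Y")
ordered = read_myfile.sort_values("Date")

fig, ax = plt.subplots()
ax.plot(ordered["Date"], ordered["Observed"], color="blue", label="Observed")
ax.plot(ordered["Date"], ordered["Simulated"], color="red", label="Simulated")
ax.legend()
fig.autofmt_xdate()
```

Sorting the whole frame keeps Observed and Simulated aligned with their dates, which sorting x on its own would not.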