I am new to Python and Pandas so any help is much appreciated.
I am trying to make the graph below interactive. It would also be good to be able to choose which attributes are shown, rather than showing them all.
Here is what I have so far:
df.set_index('Current Year').plot(rot=45)
plt.xlabel("Year",size=16)
plt.ylabel("",size=16)
plt.title("Current year time series plot", size=18)
I know that I need to import plotly.graph_objects as go, but I have no idea how to use it with the time series graph above. Thanks.
EDIT
I am getting this error when trying to enter my plotted data.
All you need is:
df.plot()
As long as you import the correct libraries and set plotly as the plotting backend for pandas like this:
import pandas as pd
pd.options.plotting.backend = "plotly"
df = pd.DataFrame({'year':['2020','2021','2022'], 'value':[1,3,2]}).set_index('year')
fig = df.plot(title = "Current year time series plot")
fig.show()
Related
I am trying to build a heatmap slice by slice. How can I update the data shown in the graph without generating a whole new figure? The following code produces a new plot on every iteration. I tried to use fig.update_traces() but it didn't work.
What am I missing?
Thanks
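import time
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.DataFrame(np.random.rand(1, 100))
for i in range(0, 10):
    # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
    df = pd.concat([df, pd.DataFrame(np.random.rand(1, 100))], ignore_index=True)
    time.sleep(1)
    # a new figure is created and shown on every iteration
    fig = px.imshow(df)
    fig.show()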
So I've been experimenting with plotly and trying to plot multiple traces. I wrote the following code, which plots two traces on the same graph:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
data = pd.read_csv("data.csv")
headers = pd.read_csv("data.csv", index_col=0, nrows=0).columns.tolist()
fig = go.Figure()
fig = px.line(data, x="DateTime", y=[headers[0], headers[1]])
fig.show()
In this example, the first and second headers are plotted as traces on the graph. Is there a way other than y=[headers[n], headers[n+1], ...] to draw all the lines? I tried passing the headers list without an index, but it raises a ValueError:
Plotly Express cannot process wide-form data with columns of different type.
So, is there a plotly-specific way to make this more efficient & readable than just writing every index in the plot header definition, or can it be done with standard python?
EDIT: the actual data sample is a CSV of int values with a header row and a date column:
DateTime,X,Y,Z
01-JAN-2018,5,6,7...
02-JAN-2018,7,8,9
If your sample data is what is in your CSV, it's a simple case of defining y as the numeric columns:
import io
import pandas as pd
import plotly.express as px
headers = pd.read_csv(io.StringIO("""DateTime,X,Y,Z
01-JAN-2018,5,6,7
02-JAN-2018,7,8,9
"""))
px.line(headers, x="DateTime", y=headers.select_dtypes("number").columns)
I've searched for days and didn't find an answer. How can I plot time series data in Dash Plotly as a line graph with selectable lines?
My data (a pandas dataframe) describes the GDP of different countries. The index is the country and the columns are years.
I can't find a way to pass the data to a Dash Plotly line graph. What are my x and y values?
fig = px.line(df, x=?, y=?)
By the looks of it, the solution in your example should be:
fig = px.line(df, x=df.index, y = df.columns)
Plot 1 - plot by columns as they appear in your dataset
From here, if you'd like to display countries in the legend and have time on the x-axis, you can just add df = df.T into the mix and get:
Plot 2 - transposed dataframe to show time on the x-axis
Details
There's a multitude of possibilities when it comes to plotting time series with plotly. Your example displays a dataset in wide format. With the latest versions, plotly handles datasets in both long and wide format elegantly straight out of the box. If you need specifics of long and wide data in plotly you can also take a closer look here.
The code snippet below uses the approach described above, but in order for this to work for you exactly the same way, your countries will have to be set as the dataframe row index. But you've stated that they are, so give it a try and let me know how it works out for you. And one more thing: you can freely select which traces to display by clicking the years in the plot legend. The figure produced by the snippet below can also be directly implemented in Dash by following the steps under the section What About Dash? here.
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
# sample dataframe of a wide format
np.random.seed(5); cols = ['Canada', 'France', 'Germany']
X = np.random.randn(6,len(cols))
df=pd.DataFrame(X, columns=cols)
df.iloc[0]=0;df=df.cumsum()
df['Year'] = pd.date_range('2020', freq='Y', periods=len(df)).year.astype(str)
df = df.T
df.columns = df.iloc[-1]
df = df.head(-1)
df.index.name = 'Country'
# Want time on the x-axis? ###
# just include:
# df = df.T
##############################
# plotly
fig = px.line(df, x=df.index, y = df.columns)
fig.update_layout(template="plotly_dark")
fig.show()
I am new to analytics, Python and machine learning, and I am working on time series forecasting. Using the following code, I get values for the train and test data, but the graph is plotted blank.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.tsa.api as ExponentialSmoothing
#Importing data
df = pd.read_csv('international-airline-passengers - Copy.csv')
#Printing head
print(df.head())
#Printing tail
print(df.tail())
df = pd.read_csv('international-airline-passengers - Copy.csv', nrows = 11856)
#Creating train and test set
#Index 10392 marks the end of October 2013
train=df[0:20]
test=df[20:]
#Aggregating the dataset at daily level
df.Timestamp = pd.to_datetime(df.Month,format='%m/%d/%Y %H:%M')
df.index = df.Timestamp
df = df.resample('D').mean()
train.Timestamp = pd.to_datetime(train.Month,format='%m/%d/%Y %H:%M')
print('1')
print(train.Timestamp)
train.index = train.Timestamp
train = train.resample('D').mean()
test.Timestamp = pd.to_datetime(test.Month,format='%m/%d/%Y %H:%M')
test.index = test.Timestamp
test = test.resample('D').mean()
train.Count.plot(figsize=(15,8), title= 'Result', fontsize=14)
test.Count.plot(figsize=(15,8), title= 'Result', fontsize=14)
plt.show()
I can't understand why the graph is blank even though the train and test data contain values.
Thanks in advance.
I think I found the issue. You call train.Count.plot, but as far as I can tell nothing is ever drawn through plt itself, so plt.show() only displays an empty figure. The matplotlib documentation (link below) shows that you first create a figure and axes and then plot onto them.
Basically, you are not plotting anything and are just showing a blank plot.
E.g. use plt.subplots(values) or plt.scatter(values), or whichever function fits your requirements. Hope this helps.
https://matplotlib.org/
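One way to rule out any confusion about which figure is being shown is to create the axes explicitly and pass them to both plot calls. This is only a sketch on made-up monthly data (the Month and Count columns are stand-ins for the airline file), and I can't confirm it addresses the cause of the blank plot in the question:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the airline data: 30 monthly passenger counts.
df = pd.DataFrame(
    {
        "Month": pd.date_range("1949-01-01", periods=30, freq="MS"),
        "Count": np.arange(100, 130),
    }
).set_index("Month")

# Same split as in the question: first 20 rows for training, the rest for testing.
train = df.iloc[:20]
test = df.iloc[20:]

# Draw both series on the same, explicitly created axes.
fig, ax = plt.subplots(figsize=(15, 8))
train["Count"].plot(ax=ax, label="train", title="Result", fontsize=14)
test["Count"].plot(ax=ax, label="test")
ax.legend()
# plt.show()  # uncomment when running interactively
```

If the real data still plots blank this way, the next thing to check would be whether the series being plotted are empty or all NaN after resampling.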
import holoviews as hv
import pandas as pd
import numpy as np
data=pd.read_csv("C:/Users/Nisarg.Bhatt/Documents/data.csv", engine="python")
train=data.groupby(["versionCreated"])["Polarity Score"].mean()
table=hv.Table(train)
print(table)
bar=hv.Bars(table).opts(plot=dict(width=1500))
renderer = hv.renderer('bokeh')
app = renderer.app(bar)
print(app)
from bokeh.server.server import Server
server = Server({'/': app}, port=0)
server.start()
server.show("/")
This is done using HoloViews, a visualisation library. If you are building a professional application, you should definitely try it. Here versionCreated is the date and Polarity Score plays the same role as your count. Try this.
OR, if you want to stick to matplotlib try this:
import matplotlib.pyplot as plt
# msft, short_rolling_msft and long_rolling_msft are assumed to be pandas Series indexed by date
fig, ax = plt.subplots(figsize=(16,9))
ax.plot(msft.index, msft, label='MSFT')
ax.plot(short_rolling_msft.index, short_rolling_msft, label='20 days rolling')
ax.plot(long_rolling_msft.index, long_rolling_msft, label='100 days rolling')
ax.set_xlabel('Date')
ax.set_ylabel('Adjusted closing price ($)')
ax.legend()
plt.show()
Also this can be used, if you want to stick with matplotlib
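The matplotlib snippet above depends on series that are not defined in the answer. A self-contained sketch follows, assuming msft is a daily price series (generated here as a random walk, not real MSFT prices) and that the rolling series are 20-day and 100-day means:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily price series: a seeded random walk, not real market data.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=300, freq="D")
msft = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

# Rolling means over 20 and 100 days; the first window-1 values are NaN.
short_rolling_msft = msft.rolling(window=20).mean()
long_rolling_msft = msft.rolling(window=100).mean()

fig, ax = plt.subplots(figsize=(16, 9))
ax.plot(msft.index, msft, label="MSFT")
ax.plot(short_rolling_msft.index, short_rolling_msft, label="20 days rolling")
ax.plot(long_rolling_msft.index, long_rolling_msft, label="100 days rolling")
ax.set_xlabel("Date")
ax.set_ylabel("Adjusted closing price ($)")
ax.legend()
# plt.show()  # uncomment when running interactively
```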
I have a dataframe:
payout_df.head(10)
What would be the easiest, smartest and fastest way to replicate the following Excel plot?
I've tried different approaches, but couldn't get everything into place.
Thanks
If you just want a stacked bar chart, then one way is to use a loop to plot each column in the dataframe and just keep track of the cumulative sum, which you then pass as the bottom argument of pyplot.bar
import pandas as pd
import matplotlib.pyplot as plt
# If it's not already a datetime
payout_df['payout'] = pd.to_datetime(payout_df.payout)
cumval=0
fig = plt.figure(figsize=(12,8))
for col in payout_df.columns[~payout_df.columns.isin(['payout'])]:
    plt.bar(payout_df.payout, payout_df[col], bottom=cumval, label=col)
    cumval = cumval+payout_df[col]
_ = plt.xticks(rotation=30)
_ = plt.legend(fontsize=18)
Without sample data I can't test it, but I think the following code will produce the desired graph:
import pandas as pd
import matplotlib.pyplot as plt
df.payout = pd.to_datetime(df.payout)
grouped = df.groupby(pd.Grouper(key='payout', freq='M')).sum()
grouped.plot(x=grouped.index.year, kind='bar', stacked=True)
plt.show()
I don't know how to reproduce this fancy x-axis style. Also, your payout column must be a datetime, otherwise pd.Grouper won't work (see the available frequencies in the pandas documentation).
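A runnable variant of the grouped approach, on made-up data. Note that it swaps pd.Grouper for grouping on dt.to_period("M") and uses the month strings as bar labels; the column names type_a and type_b are hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical payouts: one date column plus two value columns.
df = pd.DataFrame(
    {
        "payout": ["2020-01-05", "2020-01-20", "2020-02-10", "2020-03-15"],
        "type_a": [10, 5, 7, 3],
        "type_b": [2, 4, 6, 8],
    }
)
df["payout"] = pd.to_datetime(df["payout"])

# Sum the value columns per calendar month.
monthly = df.drop(columns="payout").groupby(df["payout"].dt.to_period("M")).sum()

# Use plain strings such as "2020-01" as the bar labels.
monthly.index = monthly.index.astype(str)

# One stacked bar per month, one segment per value column.
ax = monthly.plot(kind="bar", stacked=True, rot=30)
ax.set_xlabel("payout month")
# plt.show()  # uncomment when running interactively
```

This does not reproduce the two-level year/month axis from the Excel chart; it only labels each bar with its month.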