Fastest way to parse multiple header names to Plotly (Python)

So I've been experimenting with Plotly and trying to plot multiple traces. I wrote the following code, which plots two traces on the same graph:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
data = pd.read_csv("data.csv")
headers = pd.read_csv("data.csv", index_col=0, nrows=0).columns.tolist()
fig = go.Figure()
fig = px.line(data, x="DateTime", y=[headers[0], headers[1]])
fig.show()
In this example the first and second headers are plotted as traces on the graph. Is there a way other than y=[headers[n], headers[n+1], ...] to get all the lines drawn? I tried passing the headers list without an index, but it raises a ValueError:
Plotly Express cannot process wide-form data with columns of different type.
So, is there a Plotly-specific way to make this more efficient and readable than writing out every index in the plot call, or can it be done with standard Python?
EDIT: the actual data sample is a CSV with a header row, a date column, and integer values:
DateTime X Y Z
01-JAN-2018,5,6,7...
02-JAN-2018,7,8,9

If your sample data matches what is in your CSV, it's a simple case of defining y as the numeric columns:
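import io
import pandas as pd
import plotly.express as px
# read the sample CSV (inline here; use pd.read_csv("data.csv") for the real file)
df = pd.read_csv(io.StringIO("""DateTime,X,Y,Z
01-JAN-2018,5,6,7
02-JAN-2018,7,8,9
"""))
# plot every numeric column as its own trace
fig = px.line(df, x="DateTime", y=df.select_dtypes("number").columns)
fig.show()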
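If the error still appears with the real file, it may help to check which columns pandas did not read as numbers. The sketch below is only an illustration: the "Note" column is a hypothetical addition to the sample data, included to show a column that select_dtypes("number") leaves out.

```python
import io
import pandas as pd

# Hypothetical sample: the "Note" column is NOT in the original data;
# it is added here only to show a non-numeric column being filtered out.
csv_text = """DateTime,X,Y,Z,Note
01-JAN-2018,5,6,7,ok
02-JAN-2018,7,8,9,check
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns pandas parsed as numbers: these are safe to pass together as y
numeric_cols = df.select_dtypes("number").columns.tolist()

# Everything else (dates kept as text, free-text columns, ...), in file order
other_cols = [c for c in df.columns if c not in numeric_cols]

print(df.dtypes)
print("numeric:", numeric_cols)
print("other:", other_cols)
```

Passing numeric_cols as y, as in px.line(df, x="DateTime", y=numeric_cols), should then avoid mixing column types in the wide-form input.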

Related

How can I make this time series graph interactive?

I am new to Python and Pandas so any help is much appreciated.
I am trying to make the graph below interactive; it would also be good to be able to choose which attributes are shown rather than showing them all.
Here is what I have so far:
df.set_index('Current Year').plot(rot=45)
plt.xlabel("Year",size=16)
plt.ylabel("",size=16)
plt.title("Current year time series plot", size=18)
I know that I need to import plotly.graph_objects as go, but I have no idea how to use it with the time series graph above. Thanks.
EDIT
I am getting this error when trying to enter my plotted data.
All you need is:
df.plot()
As long as you import the correct libraries and set plotly as the plotting backend for pandas like this:
import pandas as pd
pd.options.plotting.backend = "plotly"
df = pd.DataFrame({'year':['2020','2021','2022'], 'value':[1,3,2]}).set_index('year')
fig = df.plot(title = "Current year time series plot")
fig.show()

plotly express for large data sets

import plotly.express as px
import pandas as pd
dfa = pd.DataFrame()
dfa["travel_time(min)"] = range(100000)
fig = px.ecdf(dfa["travel_time(min)"], x="travel_time(min)")
#fig.write_html("debug.html")
fig.show()
The 100k points produce a figure that lags (with 10k points it works fine).
How can I fix this? Is it possible to somehow precalculate the graphic?
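One possible way to precalculate the curve is to compute the empirical CDF yourself with NumPy and keep only a subset of evenly spaced points before plotting. This is a sketch under assumptions, not a Plotly feature: the helper name downsampled_ecdf, the "probability" column and the limit of 1000 points are all made up for illustration.

```python
import numpy as np
import pandas as pd


def downsampled_ecdf(values, max_points=1000):
    """Return (x, y) for an empirical CDF thinned to at most max_points points."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)  # fraction of samples <= x
    if len(x) > max_points:
        # evenly spaced indices, always keeping the first and last sample
        idx = np.unique(np.linspace(0, len(x) - 1, max_points).round().astype(int))
        x, y = x[idx], y[idx]
    return x, y


dfa = pd.DataFrame({"travel_time(min)": range(100000)})
x, y = downsampled_ecdf(dfa["travel_time(min)"])

# Small frame that can be handed to a plain line plot instead of px.ecdf
ecdf_df = pd.DataFrame({"travel_time(min)": x, "probability": y})
print(len(ecdf_df))
```

The thinned frame can then be drawn with something like px.line(ecdf_df, x="travel_time(min)", y="probability"), so the browser only has to render about a thousand points. The trade-off is that fine detail between the kept points is lost.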

Plotly: How to plot time series in Dash Plotly

I've searched for days and didn't find an answer. How can I plot time series data in Dash Plotly as a line graph with selectable lines?
My data (a pandas DataFrame) describes the GDP of different countries. The index is the country; the columns are years.
I can't find a way to pass the data to a Dash Plotly line graph. What are my x and y values?
fig = px.line(df, x=?, y=?)
By the looks of it, the solution in your example should be:
fig = px.line(df, x=df.index, y = df.columns)
Plot 1 - plot by columns as they appear in your dataset
From here, if you'd like to display countries in the legend and have time on the x-axis, you can just add df = df.T into the mix and get:
Plot 2 - transposed dataframe to show time on the x-axis
Details
There's a multitude of possibilities when it comes to plotting time series with plotly. Your example displays a dataset in wide format. With the latest versions, plotly handles datasets in both long and wide format elegantly straight out of the box. If you need specifics of long and wide data in plotly you can also take a closer look here.
The code snippet below uses the approach described above, but in order for this to work for you exactly the same way, your countries will have to be set as the dataframe row index. But you've stated that they are, so give it a try and let me know how it works out for you. And one more thing: you can freely select which traces to display by clicking the years in the plot legend. The figure produced by the snippet below can also be directly implemented in Dash by following the steps under the section What About Dash? here.
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
# sample dataframe of a wide format
np.random.seed(5); cols = ['Canada', 'France', 'Germany']
X = np.random.randn(6,len(cols))
df=pd.DataFrame(X, columns=cols)
df.iloc[0]=0;df=df.cumsum()
df['Year'] = pd.date_range('2020', freq='Y', periods=len(df)).year.astype(str)
df = df.T
df.columns = df.iloc[-1]
df = df.head(-1)
df.index.name = 'Country'
# Want time on the x-axis? ###
# just include:
# df = df.T
##############################
# plotly
fig = px.line(df, x=df.index, y = df.columns)
fig.update_layout(template="plotly_dark")
fig.show()
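For comparison, the wide-versus-long distinction mentioned under Details can be sketched with pandas alone. The tiny table and the "Value" column name below are hypothetical and only meant to show the reshaping step.

```python
import pandas as pd

# Hypothetical wide-format table: one row per country, one column per year
wide = pd.DataFrame(
    {"2020": [1.0, 2.0], "2021": [1.5, 2.5]},
    index=pd.Index(["Canada", "France"], name="Country"),
)

# Long format: one row per (country, year) observation
long = wide.reset_index().melt(
    id_vars="Country", var_name="Year", value_name="Value"
)
print(long)
```

With the long frame, the equivalent plot call would be along the lines of px.line(long, x="Year", y="Value", color="Country"), which puts time on the x-axis and one line per country without transposing.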

Problem plotting single and double column data with a boxplot

I am trying to plot columns of data from a .csv file in a boxplot/violin plot using matplotlib.pyplot.
When setting the dataframe [df] to one column of data, the plotting works fine. However, once I try to plot two columns no plot is generated, and the code seems to just keep running, so I think there is something wrong with how I am passing the data along. Each column is 54,500 rows long.
import os
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from pandas import read_csv
os.chdir(r"some_directory//")
df = read_csv(r"csv_file.csv")
# the csv file is 7 columns x 54500 rows, only concerned with two columns
df = df[['surge', 'sway']]
# re-size the dataframe to only use two columns
data = df[['surge', 'sway']]
#print data to just to confirm
print(data)
plt.violinplot(data, vert=True, showmeans=True, showmedians=True)
plt.show()
If I change the data line to data = df['surge'] I get a perfect plot with the 54501 surge values.
When I introduce the second variable as data = df[['surge', 'sway']] is when the program gets hung up. I should note the same problem exists if I let data = df[['surge']] so I think it's something to do with the double braces and going from a list to an array, perhaps?
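One thing that may be worth trying, without claiming it is the confirmed cause, is to stop passing the DataFrame itself and hand violinplot an explicit list of 1-D arrays, one per column, since its dataset argument is documented as an array or a sequence of vectors. The random data below is a stand-in for the real CSV.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): two columns of 54,500 random values
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "surge": rng.normal(size=54500),
    "sway": rng.normal(size=54500),
})

columns = ["surge", "sway"]

# One 1-D NumPy array per column, with missing values dropped,
# so each column is unambiguously a single dataset (one violin)
datasets = [df[c].dropna().to_numpy() for c in columns]

print([d.shape for d in datasets])
```

The list can then be plotted with plt.violinplot(datasets, vert=True, showmeans=True, showmedians=True). Since seaborn is already imported in the question, sns.violinplot(data=df[columns]) is another option, as it accepts a wide-form DataFrame directly.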

Clustermapping in Python using Seaborn

I am trying to create a heatmap with dendrograms in Python using Seaborn, and I have a CSV file with about 900 rows. I'm importing the file as a pandas dataframe and attempting to plot that, but a large number of the rows are not being represented in the heatmap. What am I doing wrong?
This is the code I have right now. But the heatmap only represents about 49 rows.
Here is an image of the clustermap I've obtained but it is not displaying all of my data.
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
# Data set
df = pd.read_csv('diff_exp_gene.csv', index_col = 0)
# Default plot
sns.clustermap(df, cmap = 'RdBu', row_cluster=True, col_cluster=True)
plt.show()
Thank you.
An alternative approach would be to use imshow in matplotlib. I'm not exactly sure what your question is, but here is a way to graph points on a plane from a CSV file:
import csv
import numpy as np
import matplotlib.pyplot as plt
# Assumes the CSV has 'TYPE', 'X' and 'Y' columns, with X and Y in the range 0-127
types = "some_type"  # placeholder: the TYPE value to count (not given in the original answer)
temp = np.zeros((128, 128), dtype=int)
with open('diff_exp_gene.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        if row['TYPE'] == types:
            # count how many points fall on each (Y, X) cell
            temp[int(row['Y']), int(row['X'])] += 1
plt.imshow(temp, cmap='hot', origin='lower')
plt.show()
As far as I know, keywords that apply to seaborn heatmaps also apply to clustermap, because sns.clustermap passes extra keyword arguments on to sns.heatmap. In that case, all you need to do in your example is set yticklabels=True as a keyword argument in sns.clustermap(). That will make the labels for all 900 rows appear.
By default, yticklabels is set to "auto", which skips labels to avoid overlap. The same applies to xticklabels. See more here: https://seaborn.pydata.org/generated/seaborn.heatmap.html
