Python Streamlit dynamic filter

I am trying to build a dashboard with Streamlit, but I cannot get the filter to affect the graph. Here is what I did:
year = st.sidebar.multiselect(
    "select the year : ",
    options=df2["Year"].unique(),
    default=df2["Year"].unique(),
)
df_selection = df2.query("Year == @year")
fig = px.bar(df2, x="dpe", y="Percentage", color="signature", title="<b> DPE repartition in function of the year <b>")
fig.update_layout(plot_bgcolor="rgba(0,0,0,0)")
If you have any solution for this problem, please let me know. Thanks in advance.

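You need to use st.plotly_chart() to display your bar graph. Try adding st.plotly_chart(fig) after the fig.update_layout(...) line:
fig.update_layout(plot_bgcolor="rgba(0,0,0,0)")
st.plotly_chart(fig)
Also build the figure from df_selection instead of df2 (px.bar(df_selection, ...)); otherwise the chart ignores the sidebar selection. This should resolve the issue.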
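A minimal sketch of the whole page, assuming the data has the Year, dpe, Percentage and signature columns used in the question. It shows the two points that make the filter take effect: the multiselect list is referenced inside query() with @ (and compared with `in`), and the figure is built from the filtered frame rather than df2. The CSV path is a placeholder, filter_years and main are hypothetical names, and the Streamlit/Plotly imports sit inside main() so the filtering helper can be used on its own.

```python
import pandas as pd


def filter_years(df: pd.DataFrame, years) -> pd.DataFrame:
    """Return only the rows whose Year is in the selected list."""
    # "@years" refers to the Python variable; "in" handles a list of years
    return df.query("Year in @years")


def main() -> None:
    # Imported here so filter_years can be used without Streamlit installed
    import plotly.express as px
    import streamlit as st

    df2 = pd.read_csv("data.csv")  # placeholder path: load your own data here

    year = st.sidebar.multiselect(
        "Select the year:",
        options=df2["Year"].unique(),
        default=df2["Year"].unique(),
    )
    df_selection = filter_years(df2, year)

    # Plot the filtered frame, not df2, so the chart follows the sidebar
    fig = px.bar(
        df_selection,
        x="dpe",
        y="Percentage",
        color="signature",
        title="<b>DPE repartition in function of the year</b>",
    )
    fig.update_layout(plot_bgcolor="rgba(0,0,0,0)")
    st.plotly_chart(fig)
```

In the app file, call main() at module level and launch it with `streamlit run`; Streamlit reruns the whole script on every widget change, so the chart is rebuilt from the current selection each time.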

Python (Datapane) : How to pass dynamic variables into a datapane report function

I am working on a charting module where I can pass in a DataFrame and the module will create reports based on plots generated by calling a few functions, shown below.
I am using Altair for plotting and Datapane for creating the report; its documentation can be found here: https://datapane.github.io/datapane/
My DataFrame looks like this:
import pandas as pd
d = {'Date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-03'],
     'country': ['IND', 'IND', 'IND', 'IND', 'IND', 'IND'],
     'channel': ['Organic', 'CRM', 'Facebook', 'referral', 'CRM', 'CRM'],
     'sessions': [10000, 8000, 4000, 2000, 7000, 6000],
     'conversion': [0.1, 0.2, 0.1, 0.05, 0.12, 0.11],
     }
country_channel = pd.DataFrame(d)
Plotting functions:
import altair as alt
import datapane as dp
def plot_chart(source, Y_axis_1, Y_axis_2, chart_caption):
    base = alt.Chart(source).encode(
        alt.X('Date:T', axis=alt.Axis(title="Date"))
    )
    line_1 = base.mark_line(opacity=1, color='#5276A7').encode(
        alt.Y(Y_axis_1,
              axis=alt.Axis(titleColor='#5276A7'))
    )
    line_2 = base.mark_line(opacity=0.3, color='#57A44C', interpolate='monotone').encode(
        alt.Y(Y_axis_2,
              axis=alt.Axis(titleColor='#57A44C'))
    )
    chart_ae = alt.layer(line_1, line_2).resolve_scale(
        y='independent'
    ).interactive()
    charted_plot = dp.Plot(chart_ae, caption=chart_caption)
    return charted_plot
def channel_plot_split(filter_1, filter_2, country, channel):
    channel_split_data = country_channel[(country_channel[filter_1] == country.upper())]
    channel_split_data = channel_split_data[(channel_split_data[filter_2].str.upper() == channel.upper())]
    channel_split_data = channel_split_data.sort_values(by='Date', ascending=True)
    channel_split_data = channel_split_data.reset_index(drop=True)
    channel_split_data.head()
    plot_channel_split = plot_chart(source=channel_split_data, Y_axis_1='sessions:Q', Y_axis_2='conversion:Q', chart_caption="Sessions-Conversion Plot for Country " + country.upper() + " and channel :" + channel)
    channel_plot = dp.Group(dp.HTML("<div class='center'> <h3> Country : " + country.upper() + " & Channel : " + channel.upper() + "</h3></div>"), plot_channel_split, rows=2)
    return channel_plot
def grpplot(plot_1, plot_2):
    gp_plot = dp.Group(plot_1, plot_2, columns=2)
    return gp_plot
When called, the functions above filter the DataFrame, create a plot for each filter, and group two plots per row.
row_1 = grpplot(channel_plot_split('country','channel','IND','Organic'),channel_plot_split('country','channel','IND','CRM'))
row_2 = grpplot(channel_plot_split('country','channel','IND','Facebook'),channel_plot_split('country','channel','IND','referral'))
I can now generate a report by calling the datapane.Report() function as follows:
r= dp.Report(row_1,row_2)
Problem: This works fine when I know how many channels are present, but my channel list is dynamic. I am thinking of using a for loop to generate the rows, but I am not sure how to pass these rows as arguments to the dp.Report() function. For example, if I have 10 channels, I need to pass 10 rows dynamically.
I had a similar problem and solved it as follows.
Create a list to store the pages or elements of the report (already-built dp.Page, dp.Table, dp.Plot objects and so on), such as:
report_pages = []
report_pages.append(dp.Page)
report_pages.append(dp.Table)
report_pages.append(dp.Plot)
At the end, generate the report by unpacking the list with *:
dp.Report(*report_pages)
In your case, I think you can do the following. Create a list:
rows = []
Add the rows to the list:
rows.append(row_1)
rows.append(row_2)
and then create the report with:
r = dp.Report(*rows)
I found this solution on datapane's GitHub and in this notebook, in the last line of code.
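Because the channel list is dynamic, the rows can themselves be built in a loop before unpacking. The sketch below shows only the plain-Python part under that assumption: chunk() is a hypothetical helper that splits any list into rows of two, and * unpacks the resulting list into separate positional arguments. report() is a stand-in for dp.Report so the idea runs without Datapane, and "Email" is an illustrative extra channel to show an odd count; the commented lines show how it might plug into the question's channel_plot_split function.

```python
from typing import Any, List, Sequence


def chunk(items: Sequence[Any], size: int = 2) -> List[List[Any]]:
    """Split items into consecutive rows of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def report(*blocks: Any) -> tuple:
    """Stand-in for dp.Report: just collects its positional arguments."""
    return blocks


channels = ["Organic", "CRM", "Facebook", "referral", "Email"]
rows = chunk(channels)  # [['Organic', 'CRM'], ['Facebook', 'referral'], ['Email']]
result = report(*rows)  # same as report(rows[0], rows[1], rows[2])

# With Datapane installed, the same pattern would read (not executed here):
# plots = [channel_plot_split('country', 'channel', 'IND', c) for c in channels]
# rows = [dp.Group(*pair, columns=2) for pair in chunk(plots)]
# r = dp.Report(*rows)
```

With an odd number of channels the last row simply holds one plot, so no special case is needed.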
So here is how I solved this problem.
channel_graph_list = []
for i in range(0, len(unique_channels), 1):
    channel_1_name = unique_channels[i]
    filtered_data = filter_the_data(source=channel_data, filter_1='channel', fv_1=channel_1_name)
    get_chart = plot_chart(filtered_data, Y_axis_1='sessions:Q', Y_axis_2='conversion:Q', chart_title='Session & Conv. Chart for ' + channel_1_name)
    # This is where the trick starts: the code below creates a dynamic variable
    vars()["channel_row_" + str(i)] = get_chart
    channel_graph_list.append("dp.Plot(channel_row_" + str(i) + ",label='" + channel_1_name + "')")
# convert the list to a string
channel_graph_row = ','.join(channel_graph_list)
# assign the code you want to run
code = """channel_graph = dp.Select(blocks=[""" + channel_graph_row + """],type=dp.SelectType.TABS)"""
# execute the code
exec(code)
Hope the above solution helps others looking to pass dynamically generated parameters into any function.
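For comparison, a hedged sketch of the same idea without vars() and exec(): since the blocks are ordinary Python objects, the list of tabs can be built directly and passed as blocks=. Tab and build_tabs are hypothetical stand-ins for dp.Plot(chart, label=...) and the author's plot_chart/filter_the_data steps, so the snippet runs without Datapane.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tab:
    """Hypothetical stand-in for dp.Plot(chart, label=...)."""
    chart: object
    label: str


def build_tabs(channels: List[str], make_chart: Callable[[str], object]) -> List[Tab]:
    # One labelled block per channel; no string building or exec needed
    return [Tab(make_chart(name), label=name) for name in channels]


tabs = build_tabs(["Organic", "CRM"], lambda name: f"chart for {name}")

# With Datapane, the equivalent would be (not executed here; make_chart is hypothetical):
# channel_graph = dp.Select(
#     blocks=[dp.Plot(make_chart(name), label=name) for name in unique_channels],
#     type=dp.SelectType.TABS,
# )
```

Keeping the objects in a list avoids generating source code as strings, which is harder to debug and breaks if a channel name contains a quote.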

Is there a way I can cut out any values less than a score of 6 (Python)

I am making a graph of movie genres and their average scores, and it is a bit hard to understand because of how the genres are grouped. I was wondering if there is a way to make it more presentable. I was thinking it might help to cut out any genre with an average score below 6 (either removing it completely or moving it into an 'Other' subset), but all my attempts to do so have failed. Here is the code I used to get the graph:
import matplotlib.pyplot as plt
import pandas as pd
df5 = pd.DataFrame(data={"Genre": dataYearScore['Genre'], "Score": dataYearScore['Score']})
df5 = df5.assign(Genre=df5['Genre'].str.split(',')).explode('Genre').reset_index(drop=True)
genre_list5 = []
avg_scores5 = []
for genre in df5["Genre"].unique():
    genre_list5.append(genre)
    avg_scores5.append(df5.loc[df5["Genre"] == genre, "Score"].mean())
plt.bar(genre_list5, avg_scores5, width=0.8)
plt.xlabel('Genre')
plt.ylabel('Average Score')
plt.xticks(rotation=65)
plt.title('Average Score for Each Genre')
plt.show()
This is what my image looks like currently.
Any help is greatly appreciated :)
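Try this, placed before the loop that computes the averages. It keeps only the rows with a score of 6 or more; note that it filters individual movie scores, not genre averages:
df5 = df5[df5['Score'] >= 6]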
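If the goal is to drop genres whose average is below 6 (rather than individual movies), the threshold has to be applied after averaging. A minimal sketch with groupby, assuming the same Genre/Score columns as df5; genre_averages is a hypothetical helper, and it also strips spaces after commas, which the original code does not do. With other_label set, the low genres are pooled into one bar whose value is the mean of all their (movie, genre) scores.

```python
import pandas as pd


def genre_averages(df: pd.DataFrame, threshold: float = 6.0, other_label=None) -> pd.Series:
    """Average Score per genre, keeping only genres whose average is >= threshold.

    If other_label is given, the low-scoring genres are pooled into one entry
    whose value is the mean of all their (movie, genre) scores.
    """
    exploded = (
        df.assign(Genre=df["Genre"].str.split(","))
        .explode("Genre")
        .reset_index(drop=True)
    )
    # Assumption: genres may be written as "Action, Drama", so strip spaces
    exploded["Genre"] = exploded["Genre"].str.strip()

    means = exploded.groupby("Genre")["Score"].mean()
    kept = means[means >= threshold]
    if other_label is None:
        return kept

    low = means.index[means < threshold]
    if len(low) > 0:
        other = exploded.loc[exploded["Genre"].isin(low), "Score"].mean()
        kept = pd.concat([kept, pd.Series({other_label: other})])
    return kept
```

The result can replace the manual genre_list5/avg_scores5 loop, for example avg = genre_averages(dataYearScore[["Genre", "Score"]]) followed by plt.bar(avg.index, avg.values, width=0.8).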

How to display two different legends in hconcat chart using altair

I need to display two separate charts side by side, including their legends, in JupyterLab, and the only way I managed to do that was using hconcat.
I've gotten this far:
However, even with .resolve_legend(color='independent'), I get the entries from both charts displayed in both legends at the top, which is mighty confusing.
The result should look like this:
How can I remove the unwanted legend entries?
Or, if anyone knows a good alternative for having two charts side by side in a single JupyterLab cell, I would be happy to take a different route.
My code looks like this:
import altair as alt
import pandas as pd
from altair.expr import datum
df_test = pd.read_csv("test_df.csv")
chart_m1 = alt.Chart(df_test).mark_bar().encode(
    x=alt.X('counts:Q', stack="normalize", axis=None),
    y=alt.Y('category:N', sort=['A', 'B', 'C'], title=None),
    color=alt.Color('grade:N',
                    sort=alt.EncodingSortField('sort:Q', order='ascending'),
                    scale=alt.Scale(domain=['good <10', 'average 10-20', 'bad >20'], range=['#0cce6b', '#ffa400', '#ff4e42']),
                    legend=alt.Legend(title="Metric1", orient='top')),
    order='sort:Q',
    tooltip=['category:N', 'grade:N', 'counts:Q']
).transform_filter(
    datum.metric == 'metric1'
).properties(height=50, width=150)
chart_m2 = alt.Chart(df_test).mark_bar().encode(
    x=alt.X('counts:Q', stack="normalize", axis=None),
    y=alt.Y('category:N', sort=['A', 'B', 'C'], title=None),
    color=alt.Color('grade:N',
                    sort=alt.EncodingSortField('sort:Q', order='ascending'),
                    scale=alt.Scale(domain=['good <100', 'average 100-350', 'bad >350'], range=['#0cce6b', '#ffa400', '#ff4e42']),
                    legend=alt.Legend(title="Metric2", orient='top')),
    order='sort:Q',
    tooltip=['category:N', 'grade:N', 'counts:Q']
).transform_filter(
    datum.metric == 'metric2'
).properties(height=50, width=150)
alt.hconcat(chart_m1, chart_m2).resolve_legend(color='independent').configure_view(stroke=None)
The test_df.csv I used is this:
category,metric,sort,grade,counts
A,metric1,1,good <10,345
B,metric1,1,good <10,123
C,metric1,1,good <10,567
A,metric1,2,average 10-20,567
B,metric1,2,average 10-20,678
C,metric1,2,average 10-20,789
A,metric1,3,bad >20,900
B,metric1,3,bad >20,1011
C,metric1,3,bad >20,1122
A,metric2,1,good <100,1122
B,metric2,1,good <100,1011
C,metric2,1,good <100,900
A,metric2,2,average 100-350,789
B,metric2,2,average 100-350,678
C,metric2,2,average 100-350,567
A,metric2,3,bad >350,567
B,metric2,3,bad >350,345
C,metric2,3,bad >350,123
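Use resolve_scale(color='independent'). resolve_legend only splits the legend; the two charts still share one color scale, so each legend lists the combined domain of both charts. Resolving the scale itself as independent gives each chart its own color domain and therefore its own legend: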
alt.hconcat(
chart_m1, chart_m2
).resolve_scale(
color='independent'
).configure_view(
stroke=None
)
More information at https://altair-viz.github.io/user_guide/scale_resolve.html

Creating a pivot table for a complex dataset in Python

Please help me understand how I can create a pivot table or a groupby for this complex dataset.
I tried to create a pivot table:
grouped_table = pd.pivot_table(renamedDf,index=["date","date_1","date_2","date_3", values = col_list] ,aggfunc=np.sum)
I received:
  File "<ipython-input-107-c87c2a9a3325>", line 1
    grouped_table = pd.pivot_table(renamedDf,index=["date","date_1","date_2","date_3", values = col_list] ,aggfunc=np.sum)
    ^
SyntaxError: invalid syntax
The dataset has the following structure:
[screenshot of the dataset]
Expected structure:
[screenshot of the expected structure]
Thank you in advance for any suggestions!
You have a syntax error because you didn't close the bracket of the index list before values=. Move the closing bracket like this:
grouped_table = pd.pivot_table(renamedDf, index=["date", "date_1", "date_2", "date_3"], values=col_list, aggfunc=np.sum)
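To show the corrected call shape on something runnable, here is a small sketch with hypothetical toy data (the real columns are only visible in the screenshot): index= takes the list of grouping columns, values= is a separate keyword, and aggfunc="sum" is passed as a string. The groupby line is an equivalent way to get the same totals.

```python
import pandas as pd

# Hypothetical toy data standing in for renamedDf
renamedDf = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02"],
    "date_1": ["2021-01", "2021-01", "2021-01"],
    "sales": [10, 5, 7],
    "returns": [1, 0, 2],
})
col_list = ["sales", "returns"]

# The index list is closed before values= starts
grouped_table = pd.pivot_table(renamedDf, index=["date", "date_1"], values=col_list, aggfunc="sum")

# The same totals via groupby
grouped_alt = renamedDf.groupby(["date", "date_1"])[col_list].sum()
```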

How does Python handle time conversions?

Recently I have been working on a time series dataset and have written a script to automate some plotting. I assumed that the pd.to_datetime function (given a specific format) would automatically convert every time entry to the appropriate format.
The raw data follows this format:
%d/%m/%Y %H:%M (the time part is irrelevant here, since we are only interested in the daily average)
However, it seems Python intermittently changes the raw timestamps to this format:
%d-%m-%Y
Why is this the case, and how can I make sure Python doesn't do this?
I have received the error below and am struggling to work out why.
I have looked at the following SO question, but I don't have the same issue:
time data does not match format
The data itself is provided in the following CSV and is all in the %d/%m/%Y format.
Here is the code for my function, in case there are any errors in how I've converted the timestamps.
def plotFunction(dataframe):
    for i in wellNames:
        my_list = dataframe["Date"].values
        DatesRev = []
        for j in my_list:
            a = j[0:10]
            DatesRev.append(a)
        # We now need to re-add the dates to our data frame
        df2 = pd.DataFrame(data=DatesRev)
        df2.columns = ["DatesRev"]
        dataframe["DatesRev"] = df2["DatesRev"]
        # print (dataframe)
        # #df2= pd.DataFrame(DatesRev)
        # #df2.columns = ['DatesRev']
        # #dataframe['DatesRev'] = df2['DatesRev']
        wellID = dataframe[dataframe['Well'] == i]
        wellID['DatesRev'] = pd.to_datetime(wellID['DatesRev'], format='%d/%m/%Y')
        print(i)
        # ax = wellID.set_index('DatesRev').plot()
        # xfmt = mdates.DateFormatter('%d-%m-%Y')
        # ax.xaxis.set_major_formatter(xfmt)
        # plt.xticks(rotation=90)
        # ax.legend(bbox_to_anchor=(1.04,1), loc="upper left")
        # plt.title(i)
        # plt.show()
        # plt.savefig(i + ".jpg", bbox_inches='tight')
I came across this problem myself. The format string has to match the text exactly: format='%d/%m/%Y' only parses slash-separated dates, so any entry that actually contains dashes fails to parse.
If your data uses dashes, change format='%d/%m/%Y' to format='%d-%m-%Y' in the wellID['DatesRev'] line, and that should fix your problem.
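A hedged sketch of a more defensive version of the DatesRev conversion, assuming the mixed separators come from the data itself. It normalises '-' to '/', keeps only the date part by splitting on the space instead of slicing j[0:10] (which picks up part of the time when the day or month has one digit: '1/2/2021 10:00'[0:10] is '1/2/2021 1'), and uses errors='coerce' so the offending rows can be listed instead of stopping at the first error. parse_day_first is a hypothetical helper name.

```python
import pandas as pd


def parse_day_first(raw: pd.Series) -> pd.Series:
    """Parse 'dd/mm/yyyy[ HH:MM]' strings into datetimes.

    Assumption: some entries may use '-' instead of '/', so both are accepted.
    Anything that still does not match becomes NaT instead of raising.
    """
    cleaned = raw.astype(str).str.strip().str.replace("-", "/", regex=False)
    # Take the text before the first space rather than a fixed [0:10] slice
    day_part = cleaned.str.split(" ").str[0]
    return pd.to_datetime(day_part, format="%d/%m/%Y", errors="coerce")


raw = pd.Series(["05/03/2021 10:00", "06-03-2021", "not a date"])
parsed = parse_day_first(raw)
bad_rows = raw[parsed.isna()]  # inspect these to see which inputs failed
```

Note that an ISO date such as '2021-03-05' would become '2021/03/05' and show up in bad_rows, which is usually what you want when the file is supposed to be day-first throughout.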
