How to display two different legends in hconcat chart using altair - python

I need to display two separate charts side by side including their legends in Jupyterlab and the only way I managed to do that was using hconcat.
I've gotten this far:
However, even with .resolve_legend(color='independent'), I get the entries from both charts displayed in both legends at the top, which is mighty confusing.
The result should look like this:
How can I remove the unwanted legend entries?
Or if anyone knows a good alternative way to have two charts side by side in a single JupyterLab cell, I would be happy to take a different route.
My code looks like this:
import altair as alt
import pandas as pd
from altair.expr import datum
df_test=pd.read_csv("test_df.csv")
chart_m1 = alt.Chart(df_test).mark_bar().encode(
    x=alt.X('counts:Q', stack="normalize", axis=None),
    y=alt.Y('category:N', sort=['A', 'B', 'C'], title=None),
    color=alt.Color('grade:N',
        sort=alt.EncodingSortField('sort:Q', order='ascending'),
        scale=alt.Scale(domain=['good <10', 'average 10-20', 'bad >20'], range=['#0cce6b', '#ffa400', '#ff4e42']),
        legend=alt.Legend(title="Metric1", orient='top')),
    order='sort:Q',
    tooltip=['category:N', 'grade:N', 'counts:Q']
).transform_filter(datum.metric == 'metric1'
).properties(height=50, width=150)
chart_m2 = alt.Chart(df_test).mark_bar().encode(
    x=alt.X('counts:Q', stack="normalize", axis=None),
    y=alt.Y('category:N', sort=['A', 'B', 'C'], title=None),
    color=alt.Color('grade:N',
        sort=alt.EncodingSortField('sort:Q', order='ascending'),
        scale=alt.Scale(domain=['good <100', 'average 100-350', 'bad >350'], range=['#0cce6b', '#ffa400', '#ff4e42']),
        legend=alt.Legend(title="Metric2", orient='top')),
    order='sort:Q',
    tooltip=['category:N', 'grade:N', 'counts:Q']
).transform_filter(datum.metric == 'metric2'
).properties(height=50, width=150)
alt.hconcat(chart_m1,chart_m2).resolve_legend(color='independent').configure_view(stroke=None)
The test_df.csv I used is this:
category,metric,sort,grade,counts
A,metric1,1,good <10,345
B,metric1,1,good <10,123
C,metric1,1,good <10,567
A,metric1,2,average 10-20,567
B,metric1,2,average 10-20,678
C,metric1,2,average 10-20,789
A,metric1,3,bad >20,900
B,metric1,3,bad >20,1011
C,metric1,3,bad >20,1122
A,metric2,1,good <100,1122
B,metric2,1,good <100,1011
C,metric2,1,good <100,900
A,metric2,2,average 100-350,789
B,metric2,2,average 100-350,678
C,metric2,2,average 100-350,567
A,metric2,3,bad >350,567
B,metric2,3,bad >350,345
C,metric2,3,bad >350,123

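resolve_legend only controls whether the legends are drawn separately; both charts still share a single color scale, so each legend lists the merged domain. Use resolve_scale(color='independent') instead, which gives each chart its own color scale: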
alt.hconcat(
chart_m1, chart_m2
).resolve_scale(
color='independent'
).configure_view(
stroke=None
)
More information at https://altair-viz.github.io/user_guide/scale_resolve.html

Related

How to assign a code to xlabels in bar plot

I am working on the Global Terrorism Database and trying to draw a bar plot of the terrorists' various target types.
I want to assign a code to every xlabel and print a legend on the graph that maps each code to its target type.
So far I have not found a solution to this problem.
Link to the dataset in consideration.
df['targtype1_txt'].value_counts().plot(kind = 'bar')
plt.title("Favorite Target types of Terrorist")
plt.xlabel("Target Type")
plt.ylabel("Value_Counts")
Link to output of the above code.
I would like to map the values of xlabel to some numerical code and put it as list on the side of the plot
You can do something like this:
c = df.targtype1_txt.astype('category')
d = dict(enumerate(c.cat.categories))
df['targtype1_txt_code'] = c.cat.codes
df['targtype1_txt_code'].value_counts().plot(kind = 'bar')
plt.title("Favorite Target types of Terrorist")
plt.xlabel("Target Type")
plt.xticks(rotation=0)
plt.ylabel("Value_Counts")
s ='\n'.join(['%s: %s' % (key, value) for (key, value) in d.items()])
plt.text(2.6, 1, s, fontsize=14)
Create a new column with a code for each value and plot that column. Then add a text box on the side using the dictionary that has been used to code the values.
Output (with few values):
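Independently of the plot, the label-to-code mapping used in the answer can be tried on its own. The following is only a minimal sketch: the `targets` series and its labels are made up and stand in for df['targtype1_txt'], and the names `code_to_label` and `legend_text` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df['targtype1_txt']; the labels are made up.
targets = pd.Series(["Police", "Business", "Police", "Military", "Business", "Police"])

# Converting to a categorical dtype assigns each distinct label an integer code.
cat = targets.astype("category")
code_to_label = dict(enumerate(cat.cat.categories))
codes = cat.cat.codes

# One "code: label" line per category, ready to pass to plt.text.
legend_text = "\n".join(f"{code}: {label}" for code, label in code_to_label.items())
print(legend_text)
print(codes.value_counts().to_dict())
```

Plotting `codes.value_counts()` instead of the raw labels keeps the tick labels short, while `legend_text` carries the full names.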

Filter columns in a csv file and output plot

I am trying to filter a column in a CSV just like you would in Excel. Then, based on that filter, I would like to pick a different column and plot the data from that column.
I have tried printing the filter on its own and it prints correctly. I am just not sure about the syntax. When I print it, it shows that I can correctly search through a column:
data.head()
print('banana',
data[('Sub-Dept')] == 'Stow Each') #and data[('Sub-Dept')] == 'Stow Each Nike', 'Each Stow to Prime', 'Each Stow to Prime E', 'Each Stow to Prime W', 'Stow to Prime LeadPA')
But I do not know how to apply that filter first and then plot the filtered data. I am fairly new to this.
I have a column that contains many different filterable names. I want to select the names listed above.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
x = []
y = []
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6)
new_data = data.loc[(data['Sub-Dept'] == 'Stow Each')]
sns.set(style="whitegrid") #this is strictly cosmetic, you can change it any time
ax = sns.countplot(x='U.S. OSHA Recordable?', data=new_data)
plt.bar(x, y, label='Loaded from file!')
plt.ylabel('Quantity of Injuries')
plt.title('Injuries (past 4 weeks)')
plt.show()
Right now, I am expecting it to output one graph with two bars. Problem: it shows a quantity of 80 on one bar and 20 on the other. Expected: after the data is filtered by the other column, it should show 21 in one bar and 7 in the other, in the same graph.
The graphing portion works great and so does pulling the data from the Excel export. The only part I do not know how to do is filtering that column and then graphing based on that filter. I am not sure what the code should look like or where it should go. Please help.
CSV FILE HERE: https://drive.google.com/open?id=1yJ6iQL-bOvGSLAKPcPXqgk9aKLqUEmPK
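Try DataFrame.query()
The pandas query method might be useful. A column name that contains a hyphen has to be wrapped in backticks (supported in recent pandas versions), and the string values need quotes:
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6)
new_data = data.query("`Sub-Dept` == 'Stow Each' or `Sub-Dept` == 'RF_Pick'")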
I am so happy to have figured this out. I had trouble finding the answer to this on the internet, so I hope this helps someone else in the future. Thanks to Datanovice for the initial idea of using .loc. That helped me get to the next steps. The rest of my answer came from here: https://www.geeksforgeeks.org/python-pandas-extracting-rows-using-loc/
Sorry I left my comments in the code
import pandas as pd # powerful data visualization library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
import seaborn as sns #https://seaborn.pydata.org/generated/seaborn.boxplot.html
# This website saved my life https://www.pythonforengineers.com/introduction-to-pandas/
# use this to check the available styles: plt.style.available
x = []
y = []
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6, index_col="Sub-Dept") #skiprows allows you to skip the comments on top... & encoding allows pandas to work on this CSV
new_data = data.loc[["Each Stow to Prime", "Each Stow to Prime E", "Each Stow to Prime W", "Stow Each", "Stow Each Nike", "Stow to Prime LeadPA",]]
sns.set(style="whitegrid") #this is strictly cosmetic, you can change it any time
ax = sns.countplot(x='U.S. OSHA Recordable?', data=new_data) #magic, so seaborn is designed to pull the data from a URL. But when using pandas and seaborn there is a workaround
# the key is that "countplot" literally does the work for you. it's awesome
plt.bar(x, y, label='Loaded from file!')
plt.ylabel('Quantity of Injuries')
plt.title('Stow Injuries (past 4 weeks)')
plt.show() # shows the plot to the user
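Both answers above come down to keeping only the rows whose Sub-Dept is in a list of names before counting. A minimal sketch of that step using Series.isin, which avoids both the query string and the index change; the rows below are made up and only the two column names are taken from the question:

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
data = pd.DataFrame({
    "Sub-Dept": ["Stow Each", "RF_Pick", "Stow Each Nike", "Pack", "Stow Each"],
    "U.S. OSHA Recordable?": ["Yes", "No", "No", "Yes", "No"],
})

wanted = ["Stow Each", "Stow Each Nike"]

# Boolean mask: True where the row's Sub-Dept is one of the wanted names.
new_data = data[data["Sub-Dept"].isin(wanted)]

# The per-category counts that sns.countplot would draw as bars.
counts = new_data["U.S. OSHA Recordable?"].value_counts()
print(counts.to_dict())
```

Passing `new_data` to sns.countplot as in the code above should then draw bars for the filtered rows only.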

How does Python handle time conversions?

Recently I have been working on a time series data set and have written a script to automate some plotting. I assumed that the pd.to_datetime function (provided with a specific format) would automatically convert every time entry to the appropriate format.
The raw data follows this format:
%d/%m/%YYYY HH:MM (HH:MM is irrelevant in this case so don't worry about it as we are only interested in the daily average)
However, it seems Python intermittently changes the 'raw timestamps' and changes the format to:
%d-%m-%YYYY
Why is this the case and how can I make sure Python doesn't do this?
I have received the below error and struggle to work out why this is the case.
I have looked at the following SO question, but I don't have the same issue.
time data does not match format
The data itself is provided in the following CSV and is all in the %d/%m/%Y format.
My code for my function is attached in case there are any errors with how I've converted the timestamps.
def plotFunction(dataframe):
    for i in wellNames:
        my_list = dataframe["Date"].values
        DatesRev = []
        for j in my_list:
            a=j[0:10]
            DatesRev.append(a)
        #We now need to re-add the dates to our data frame
        df2 = pd.DataFrame(data= DatesRev)
        df2.columns = ["DatesRev"]
        dataframe["DatesRev"] = df2["DatesRev"]
        # print (dataframe)
        # #df2= pd.DataFrame(DatesRev)
        # #df2.columns = ['DatesRev']
        # #dataframe['DatesRev'] = df2['DatesRev']
        wellID = dataframe[dataframe['Well']==i]
        wellID['DatesRev'] = pd.to_datetime(wellID['DatesRev'], format='%d/%m/%Y')
        print (i)
        # ax = wellID.set_index('DatesRev').plot()
        # xfmt = mdates.DateFormatter('%d-%m-%Y')
        # ax.xaxis.set_major_formatter(xfmt)
        # plt.xticks(rotation=90)
        # ax.legend(bbox_to_anchor=(1.04,1), loc="upper left")
        # plt.title(i)
        # plt.show()
        # plt.savefig(i + ".jpg", bbox_inches='tight')
The problem is not the slash itself: pd.to_datetime parses / just as well as -. What matters is that the format string matches the separator actually present in each value, so if some entries arrive with dashes, format='%d/%m/%Y' fails on them with the "does not match format" error. I came across this problem myself.
In wellID['DatesRev'], change format='%d/%m/%Y' to format='%d-%m-%Y' if your values use dashes, and that should fix your problem.
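Whichever separator the data ends up using, the format string has to match it. A minimal sketch with the standard library's datetime.strptime, which uses the same format codes that pd.to_datetime accepts; the sample strings are made up and `parse_day` is a hypothetical helper, not part of the question's code:

```python
from datetime import datetime

def parse_day(raw, fmt="%d/%m/%Y"):
    """Parse the first 10 characters of raw as a date, accepting '-' or '/' separators."""
    day = raw[:10].replace("-", "/")  # normalise separators before parsing
    return datetime.strptime(day, fmt)

# A format whose separators do not match the value raises ValueError.
try:
    datetime.strptime("25-03-2019", "%d/%m/%Y")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(mismatch_raised)
print(parse_day("25/03/2019 14:30"))
print(parse_day("25-03-2019"))
```

Normalising the separator before parsing is one way to handle a column that mixes both styles; checking which rows contain a dash is another way to find where the inconsistency comes from.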

How to format a bokeh DataTable cell based on another column value?

Salutations.
I am developing an application using bokeh server (version 0.12.13) and I have a DataTable widget with several columns. One of them is the number of days an issue has been open and another is an estimate of the days needed to close that issue.
In some situations, the number of days that an issue has been open surpasses the estimate, and I would like to colour the estimated days column red when that happens.
I have tried using "widget.HTMLTemplateFormatter", but I haven't figured out how to access another column's value to make the comparison and decide whether to paint the cell red or not.
Does anyone know how to get around this?
You can define a JavaScript function within the Underscore.js template code to conditionally color each cell. Each of the fields in the data source linked to the table can be accessed by name inside the template.
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import DataTable, TableColumn, HTMLTemplateFormatter
from bokeh.io import show
dict1 = {'estd':[1]*6,
'actd':[1, 1, 1, 2, 2, 2],
'z' :[3, 3, 3, 3, 3, 3]}
source = ColumnDataSource(data=dict1)
template="""
<b><div style="background:<%=
(function colorfromint(){
if(actd > estd){
return("Red")
}
else{
return("White")
}
}()) %>;">
<%= (value).toFixed(1) %></div></b>
"""
formater = HTMLTemplateFormatter(template=template)
columns = [
TableColumn(field="estd", title="Estimated Days"),
TableColumn(field="actd", title="Actual days",formatter=formater),
TableColumn(field="z", title="z")
]
data_table = DataTable(source=source, columns=columns, width=800)
show(data_table)
If the data does not change you can define the colors using python code, see the second example here: How do I adjust number format in bokeh data table using HTMLTemplateFormatter?.

Print a map with markers and text details with Python

I am currently working on a project, the idea is to extract tweets (with geo enabled) from a Hashtag and to print a map (with Folium). Inside the map, I should have markers according to the user locations and when I click at the marker I should have the text of the tweet. But currently I only have a map and the markers.
This is my code :
import pandas as pd
import folium, ast
locations = pd.read_csv('tweets.csv', usecols=[3]).dropna()
l_locations = []
for loc in locations.values:
    l_locations.append(ast.literal_eval(loc[0])['coordinates'])
print_tweet_map = folium.Map(location=[48.864716, 2.349014], zoom_start=8, tiles='Mapbox Bright')
for geo in l_locations:
    folium.Marker(location=geo).add_to(print_tweet_map)
print_tweet_map.save('index.html')
Can you guys help me to print the markers and the text details of the tweet ?
Thanks in advance.
PS : I have currently :
Some lines of the csv file :
created_at,user,text,geo
2017-09-30 15:28:56,yanceustie,"RT #ChelseaFC: .#AlvaroMorata and #marcosalonso03 have been checking out the pitch ahead of kick-off..., null
2017-09-30 15:48:18,trendinaliaVN,#CHEMCI just started trending with 17632 tweets. More trends at ... #trndnl,"{'type': 'Point', 'coordinates': [21.0285, 105.8048]}"
Examine read_csv closely and then use it to retrieve the tweet text as well. Reading folium documentation, the popup seems the most relevant place to put each tweet's text.
Also you iterate over the same things 2 times. You can reduce those into 1 iteration loop that places a tweet on the map. (Imagine the map as your empty list you were appending to). No need to be overly sequential.
import pandas as pd
import folium, ast
# header=0 replaces the file's own header row with the names given here
frame = pd.read_csv('tweets.csv', usecols=[2, 3], names=['text', 'location'], header=0).dropna()
print_tweet_map = folium.Map(location=[48.864716, 2.349014], zoom_start=8, tiles='Mapbox Bright')
for index, item in frame.iterrows():
    loc = item.get('location')
    text = item.get('text')
    # loc is the whole dict literal stored as a string, so parse all of it
    geo = ast.literal_eval(loc)['coordinates']
    folium.Marker(location=geo, popup=text) \
        .add_to(print_tweet_map)
print_tweet_map.save('index.html')
This should work, or come very close to working, but I don't have a proper computer handy for testing.
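The parsing step can be checked without folium or the real file. A minimal sketch using only the standard library, with two made-up rows in the question's column layout (created_at,user,text,geo); the user names, tweet texts and the `markers` list are illustrative assumptions:

```python
import ast
import csv
import io

# Two made-up rows in the question's column layout (created_at,user,text,geo).
sample = io.StringIO(
    "created_at,user,text,geo\n"
    "2017-09-30 15:28:56,user_a,no location here,\n"
    "2017-09-30 15:48:18,user_b,hello from Hanoi,"
    "\"{'type': 'Point', 'coordinates': [21.0285, 105.8048]}\"\n"
)

markers = []
for row in csv.DictReader(sample):
    if not row["geo"]:
        continue  # skip tweets without coordinates
    # Parse the whole dict literal stored in the geo column.
    coords = ast.literal_eval(row["geo"])["coordinates"]
    markers.append((coords, row["text"]))

print(markers)
```

Each (coords, text) pair could then be handed to folium.Marker(location=coords, popup=text) as in the loop above.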
