Pandas plot several df with different variables on the same barplot

I have 4 DataFrames, one per location (Indonesia, Singapore, Malaysia and Total), each containing the revenue share of the top 5 revenue-generating products. I have plotted them separately.
I want to combine them on a single plot where the x-axis shows the different locations and the top revenue-generating products for each location.
I have printed the data frames; as you can see, they contain different products.
print(Ind_top_cat, Sin_top_cat, Mal_top_cat, Tot_top_cat)
Category Amt
M020P 0.144131
MH 0.099439
ML 0.055052
PB 0.050057
PPDR 0.048315
Category Amt
ML 0.480781
M015 0.073034
PPDR 0.035412
M025 0.033418
M020 0.031836
Category Amt
TN 0.343650
PPDR 0.190773
NMCN 0.118425
M015 0.047539
NN 0.038140
Category Amt
M020P 0.158575
MH 0.092012
ML 0.064179
PPDR 0.050803
PB 0.044301
Thanks to joelostblom I was able to construct a plot, but there are still some issues.
all_countries = pd.concat([Ind_top_cat, Sin_top_cat, Mal_top_cat, Tot_top_cat])
all_countries['Category'] = all_countries.index
sns.barplot(x='Country', y='Amt',hue = 'Category',data=all_countries)
Is there any way I can put the legend values (the categories) on the x-axis and colour the bars by country instead of by category? I would also like to put the data values on top of the bars. Finally, the bars are not centred, and I have no idea how to fix that.

You could create a new column in each dataframe with the country name, e.g.
Ind_top_cat['Country'] = 'Indonesia'
Sin_top_cat['Country'] = 'Singapore'
Then you can create one big dataframe by concatenating the country dataframes together:
all_countries = pd.concat([Ind_top_cat, Sin_top_cat])
And finally, you can use a high level plotting library such as seaborn to assign one column to the x-axis location and one to the color of the bars:
import seaborn as sns
sns.barplot(x='Country', y='Amt', hue='Category', data=all_countries)
You can scroll down to the second example on this page to get an idea of what such a plot would look like.
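For the follow-up request (categories on the x-axis, bars coloured by country, values on top), one possible approach is to draw one bar per row with plain matplotlib instead of seaborn. This is only a sketch: it rebuilds two of the four frames from the printed output, and the names ind, sin and all_countries are illustrative. Because each row gets its own x position, no empty slots are reserved for categories a country does not have, so the bars stay centred on their labels.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Two of the frames, rebuilt from the printed output in the question
ind = pd.DataFrame({"Category": ["M020P", "MH", "ML", "PB", "PPDR"],
                    "Amt": [0.144131, 0.099439, 0.055052, 0.050057, 0.048315]})
sin = pd.DataFrame({"Category": ["ML", "M015", "PPDR", "M025", "M020"],
                    "Amt": [0.480781, 0.073034, 0.035412, 0.033418, 0.031836]})

# The dict keys become a "Country" column, so no manual column assignment is needed
all_countries = (
    pd.concat({"Indonesia": ind, "Singapore": sin}, names=["Country"])
    .reset_index(level="Country")
    .reset_index(drop=True)
)

fig, ax = plt.subplots(figsize=(10, 4))
for country, group in all_countries.groupby("Country", sort=False):
    # One bar per row; the row number is the x position, so bars never overlap
    ax.bar(group.index, group["Amt"], label=country)
    for x, amt in zip(group.index, group["Amt"]):
        # Write the value just above each bar
        ax.annotate(f"{amt:.1%}", (x, amt), ha="center", va="bottom")

# Category names go on the x-axis, countries go in the legend
ax.set_xticks(all_countries.index.tolist())
ax.set_xticklabels(all_countries["Category"].tolist())
ax.set_ylabel("Amt")
ax.legend(title="Country")
# Call plt.show() or fig.savefig(...) to see the result
```

The remaining frames can be added to the dict passed to pd.concat in the same way.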

Related

Plotting a map using Geoview and using size/ colour option

I'm trying to visualize a dataset which I've filtered down to just longitude/latitude, country name, year and a count of deaths. I'm plotting it with geoviews because I want to add a lot more to my dataset, and an interactive map would be a great add-on.
My code is as follows: (for_plot is the dataframe)
# Plotting the graph
Best = gv.Dataset(for_plot)
points = Best.to(gv.Points, ['longitude', 'latitude'], ['deaths', 'country'])
(gts.Wikipedia * points).opts(
opts.Points(width=600, height=350, tools=['hover'],
size='deaths', cmap='viridis'))
This creates a perfect graph, but the 'size' option doesn't work. If I change size to color, the graph is not generated. I'm okay with either; I just need at least one of them to work.
Thanks for any help.
I tried switching to color instead of size: it works with year but not with deaths.

Encoding a list column to the legend of a plot

Apologies in advance, I am not sure how best to word this question.
I am working with a large dataset, and I would like to plot Latitude and Longitude where the colour of the points (actually the opacity) is encoded by a 'FeatureType' column bound to the legend. This way I can use the legend to highlight the various features I am looking for on my map.
Here is a picture of my map and legend so far
The problem is that in my dataset, the FeatureType column is a list of the features that can be found at each location (e.g. arch, bridge, etc.).
How can I make a point show up for both arch and bridge? At the moment each combination becomes its own category (arch,bridge etc.), leading to over 300 combinations of about 20 different FeatureTypes.
The dataset can be found at http://atlantides.org/downloads/pleiades/dumps/pleiades-locations-latest.csv.gz
N.B: I am using altair/pandas
import altair as alt
import pandas as pd
from vega_datasets import data
df = pd.read_csv ('C://path/pleiades-locations.csv')
alt.data_transformers.enable('json')
countries = alt.topo_feature(data.world_110m.url, 'countries')
selection = alt.selection_multi(fields=['featureType'], bind='legend')
brush = alt.selection(type='interval', encodings=['x'])
map = alt.Chart(countries).mark_geoshape(
fill='lightgray',
stroke='white'
).project('equirectangular').properties(
width=500,
height=300
)
points = alt.Chart(df).mark_circle().encode(
alt.Latitude('reprLat:Q'),
alt.Longitude('reprLong:Q'),
alt.Color('featureType:N'),
tooltip=['featureType','timePeriodsKeys:N'],
opacity=alt.condition(selection, alt.value(1), alt.value(0.0))
).add_selection(
selection)
(map + points)

It is not possible for Altair to generate the labels you want from your current column format. You will need to turn your comma-separated string labels into lists and then explode the column so that you get one row per item in the list:
import altair as alt
import pandas as pd
from vega_datasets import data
alt.data_transformers.enable('data_server')
df = pd.read_csv('http://atlantides.org/downloads/pleiades/dumps/pleiades-locations-latest.csv.gz')[['reprLong', 'reprLat', 'featureType']]
df['featureType'] = df['featureType'].str.split(',')
df = df.explode('featureType')
countries = alt.topo_feature(data.world_110m.url, 'countries')
world_map = alt.Chart(countries).mark_geoshape(
fill='lightgray',
stroke='white')
points = alt.Chart(df).mark_circle(size=10).encode(
alt.Latitude('reprLat:Q'),
alt.Longitude('reprLong:Q'),
alt.Color('featureType:N', legend=alt.Legend(columns=2)))
world_map + points
Note that having this many entries in the legend is not meaningful since the colors are repeated. The interactivity would help with that somewhat, but I would consider splitting this up into multiple charts. I am not sure if it is even possible to expand the legend to show those hidden 81 entries. Also double-check that the longitude/latitude locations line up correctly with the world map projection you are using; they seemed to move around when I changed the projection.
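To see what the split-and-explode step does in isolation, here is a minimal sketch on three made-up rows (the coordinates and feature strings are invented for illustration and are not taken from the Pleiades file):

```python
import pandas as pd

# Made-up rows for illustration; the real data comes from the Pleiades CSV
df = pd.DataFrame({
    "reprLat": [41.9, 37.9, 30.0],
    "reprLong": [12.5, 23.7, 31.2],
    "featureType": ["arch,bridge", "temple", None],
})

# Turn each comma-separated string into a list of labels
df["featureType"] = df["featureType"].str.split(",")

# One row per label; the other columns are repeated for each label
exploded = df.explode("featureType")
print(exploded)
```

The first row becomes two rows that share index 0, one for arch and one for bridge, while the row with a missing value stays a single row. If the real strings turn out to have spaces after the commas, calling .str.strip() on the exploded column would tidy the labels.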

Plotly Express Choropleth for Country Regions

I have a dataframe created from a csv file about the spread of Covid-19 across Italian regions. I was trying to create a px.choropleth plot showing the Total Positive values for every region in Italy.
This is the code I tried:
italy_regions=[i for i in region['Region'].unique()]
fig = px.choropleth(italy_last, locations="Country",
locationmode=italy_regions,
color=np.log(italy_last["TotalPositive"]),
hover_name="Region", hover_data=['TotalPositive'],
color_continuous_scale="Sunsetdark",
title='Regions with Positive Cases')
fig.update(layout_coloraxis_showscale=False)
fig.show()
Some additional info: 'Country' is a column in my dataframe that is filled with a single value: 'Italy'. If I only pass 'locations="Country"', the graph is fine and I can see Italy colored on the world map.
The problems start when I try to make Plotly color my regions. As I'm a newbie with Plotly Express, I read some examples and thought I had to create a list of Italian region names and pass it to 'choropleth' as the input for 'locationmode'.
Clearly I'm wrong.
So, what is the procedure to follow to make it run (if any)?
If needed, I can provide both the csv file and the Jupyter notebook I'm working on.

You need to pass a GeoJSON with the Italian region borders as the geojson parameter of plotly.express.choropleth, for instance this one:
https://gist.githubusercontent.com/datajournalism-it/48e29e7c87dca7eb1d29/raw/2636aeef92ba0770a073424853f37690064eb0ea/regioni.geojson
If you use this one, you need to explicitly pass featureidkey='properties.NOME_REG' as a parameter of plotly.express.choropleth.
Working example:
import pandas as pd
import requests
import plotly.express as px
regions = ['Piemonte', 'Trentino-Alto Adige', 'Lombardia', 'Puglia', 'Basilicata',
'Friuli Venezia Giulia', 'Liguria', "Valle d'Aosta", 'Emilia-Romagna',
'Molise', 'Lazio', 'Veneto', 'Sardegna', 'Sicilia', 'Abruzzo',
'Calabria', 'Toscana', 'Umbria', 'Campania', 'Marche']
# Create a dataframe with the region names
df = pd.DataFrame(regions, columns=['NOME_REG'])
# For demonstration, create a column with the length of the region's name
df['name_length'] = df['NOME_REG'].str.len()
# Read the geojson data with Italy's regional borders from github
repo_url = 'https://gist.githubusercontent.com/datajournalism-it/48e29e7c87dca7eb1d29/raw/2636aeef92ba0770a073424853f37690064eb0ea/regioni.geojson'
italy_regions_geo = requests.get(repo_url).json()
# Choropleth representing the length of region names
fig = px.choropleth(data_frame=df,
geojson=italy_regions_geo,
locations='NOME_REG', # name of dataframe column
featureidkey='properties.NOME_REG', # path to field in GeoJSON feature object with which to match the values passed in to locations
color='name_length',
color_continuous_scale="Magma",
scope="europe",
)
fig.update_geos(showcountries=False, showcoastlines=False, showland=False, fitbounds="locations")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
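Since featureidkey matches the values in the locations column against a property of each GeoJSON feature, a region whose name is spelt differently in the two places will not be matched. A quick way to check is to compare the two sets of names before plotting. The sketch below uses a tiny hand-made FeatureCollection as a stand-in for the downloaded file, and the dataframe values (including the deliberately wrong 'Lombardy') are hypothetical:

```python
import pandas as pd

# Hand-made stand-in for the downloaded GeoJSON. Only the "properties"
# part matters for matching, so the geometries are left empty here.
italy_regions_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NOME_REG": "Piemonte"}, "geometry": None},
        {"type": "Feature", "properties": {"NOME_REG": "Lombardia"}, "geometry": None},
        {"type": "Feature", "properties": {"NOME_REG": "Valle d'Aosta"}, "geometry": None},
    ],
}

# Hypothetical data with one region name that does not match the GeoJSON
df = pd.DataFrame({
    "NOME_REG": ["Piemonte", "Lombardy", "Valle d'Aosta"],
    "TotalPositive": [100, 250, 30],
})

# Names available in the GeoJSON under properties.NOME_REG
geo_names = {feature["properties"]["NOME_REG"] for feature in italy_regions_geo["features"]}

# Names in the dataframe that have no matching feature
unmatched = sorted(set(df["NOME_REG"]) - geo_names)
print(unmatched)  # ['Lombardy']
```

With the real file, the same comparison can be run on the dictionary returned by requests.get(repo_url).json().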

A line graph for non-numeric data

I have a dataset with mostly non-numeric columns. I would like to create a visualization for it, but I am getting an error message.
My data set looks like this
|plant_name|Customer_name|Job site|Delivery.Date|DeliveryQuantity|
|SN13|John|Sweden|01.01.2019|6|
|SN14|Ruth|France|01.04.2018|4|
|SN15|Jane|Serbia|01.01.2019|2|
|SN11|Rome|Denmark|01.04.2018|10|
|SN14|John|Sweden|03.04.2018|5|
|SN15|John|Sweden|04.09.2019|7|
I need to create a line plot that shows how many times John made a purchase, using Delivery Date as my timeline (x-axis).
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
pd.set_option("display.max_rows", 5)
hr_data = pd.read_excel("D:\data\Days_Calculation.xlsx", parse_dates = True)
x = hr_data['DeliveryDate']
y = hr_data ['Customer_name']
sns.lineplot(x,y)
Error: No numeric types to aggregate
My expected result should be a line graph like this:
John's marker would be present on the timeline (Delivery Date) at "01.01.2019", "03.04.2018" and "04.09.2019".
Two further questions:
How can one plot a string against a number, for example the total quantity (DeliveryQuantity) per Customer Name?
How does one format the spacing along the axes of a plot (not the labels)?

Why not make Delivery Date a timestamp object instead of a string?
hr_data["Delivery.Date"] = pd.to_datetime(hr_data["Delivery.Date"])
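Now you have more plotting options.
For example, for John:
john_data = hr_data[hr_data["Customer_name"]=="John"]
# Pass the column as x explicitly; recent seaborn versions no longer accept it positionally
sns.countplot(x=john_data["Delivery.Date"])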

Generally speaking, you have to aggregate something when working with categorical data. Whether you are counting names in a column, summing the number of orders, or ranking categories, the result is still numeric data.
plot_data = hr_data.pivot_table(index='DeliveryDate', columns='Customer_name', values='DeliveryQuantity', aggfunc='sum')
plot_data.plot(legend=False)
# Set the ticks after plotting so they apply to this figure; LISTOFVALUESFORXRANGE is a placeholder for your tick positions
plt.xticks(LISTOFVALUESFORXRANGE)
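Both answers boil down to "parse the dates, then aggregate to numbers". A minimal sketch of that, rebuilt from the sample table in the question, might look like the following. It assumes the dates are written day.month.year, which the question does not state, and uses the column name 'Delivery.Date' from the table header:

```python
import pandas as pd

# Sample rows copied from the table in the question
hr_data = pd.DataFrame({
    "plant_name": ["SN13", "SN14", "SN15", "SN11", "SN14", "SN15"],
    "Customer_name": ["John", "Ruth", "Jane", "Rome", "John", "John"],
    "Job site": ["Sweden", "France", "Serbia", "Denmark", "Sweden", "Sweden"],
    "Delivery.Date": ["01.01.2019", "01.04.2018", "01.01.2019",
                      "01.04.2018", "03.04.2018", "04.09.2019"],
    "DeliveryQuantity": [6, 4, 2, 10, 5, 7],
})

# Assumption: dates are day.month.year; adjust the format if they are not
hr_data["Delivery.Date"] = pd.to_datetime(hr_data["Delivery.Date"], format="%d.%m.%Y")

# Number of purchases John made on each date, in chronological order
john_counts = (
    hr_data[hr_data["Customer_name"] == "John"]
    .groupby("Delivery.Date")
    .size()
    .sort_index()
)

# Total quantity per customer: a string column against a numeric one
quantity_by_customer = hr_data.groupby("Customer_name")["DeliveryQuantity"].sum()

print(john_counts)
print(quantity_by_customer)
```

Both results are numeric Series, so john_counts.plot(marker='o') gives a line with a marker on each of John's delivery dates, and quantity_by_customer.plot(kind='bar') gives one bar per customer.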

Creating a UK Heatmap

I have a data frame for UK data that looks something like this:
longitude  latitude     region           priority
51.307733  -0.75708898  South East       High
51.527477  -0.20646542  London           Medium
51.725135  0.4747223    East of England  Low
This dataframe is several thousand rows long. I want a heatmap of the UK broken down by the regions and colour intensity to be dependent on the priority in each region.
I would like to know the best way to turn this into a heatmap of the UK. I have tried geoPandas and Plotly but I have no functioning knowledge of these. Are these the best way to do it or is there a tool out there that you can simply upload your data to and it will plot it for you? Thanks!

For this kind of job I usually go with folium, which is great for working with maps.
But for the HeatMap, your "priority" column has to be a float!
import folium
from folium import plugins
from folium.plugins import HeatMap
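my_map = folium.Map(location=[51.5074, -0.1278],  # London (0.1278° W, so the longitude is negative)
zoom_start = 13)  # city-level zoom; use a smaller value such as 6 to see the whole UK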
your_dataframe['latitude'] = your_dataframe['latitude'].astype(float)
your_dataframe['longitude'] = your_dataframe['longitude'].astype(float)
your_dataframe['priority'] = your_dataframe['priority'].astype(float)
heat_df = your_dataframe[['latitude', 'longitude','priority']]
heat_df = heat_df.dropna(axis=0, subset=['latitude','longitude','priority'])
# List comprehension to build a list of [latitude, longitude, weight] lists
heat_data = [[row['latitude'],row['longitude'],row['priority']] for index, row in heat_df.iterrows()]
# add_child replaces the deprecated add_children
my_map.add_child(HeatMap(heat_data))
my_map.save('map.html')
Then open map.html in your browser.
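Since the question's priority column holds the labels High, Medium and Low, casting it with astype(float) would fail; the labels need to be mapped to numbers first. The sketch below shows one way to do that with pandas only. The weights 1.0 to 3.0 are an arbitrary assumption, and it also assumes the two coordinate headers in the question are swapped, because values around 51 can only be latitudes for the UK:

```python
import pandas as pd

# Sample rows from the question, with the coordinate columns relabelled
# (assumption: the 51.x values are latitudes, the values near 0 longitudes)
your_dataframe = pd.DataFrame({
    "latitude": [51.307733, 51.527477, 51.725135],
    "longitude": [-0.75708898, -0.20646542, 0.4747223],
    "region": ["South East", "London", "East of England"],
    "priority": ["High", "Medium", "Low"],
})

# Assumed weights; pick whatever scale reflects your priorities
priority_weights = {"Low": 1.0, "Medium": 2.0, "High": 3.0}
your_dataframe["priority"] = your_dataframe["priority"].map(priority_weights)

# Rows with missing coordinates or an unknown priority label are dropped
heat_df = your_dataframe[["latitude", "longitude", "priority"]].dropna()

# Plain list of [latitude, longitude, weight] lists
heat_data = heat_df.values.tolist()
print(heat_data)
```

The resulting heat_data has the same [latitude, longitude, weight] shape as the list built in the answer above, so it can be passed to HeatMap in the same way.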
