Seaborn: Include multiple 'HUE' in one graph? - python

I have a simple Python script that reads a CSV file and plots graphs using seaborn, and it works perfectly.
First, here is a screenshot of my CSV file showing the frames and the statistics for stations:
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
read_CSV_stats_per_day_for_KS = pd.read_csv('results_per_day/ALL_stations_together.csv', sep=";", encoding="ISO-8859-1")
read_day_column = read_CSV_stats_per_day_for_KS[read_CSV_stats_per_day_for_KS['day'] == 0]
def plot_results_about_south_east_stations():
    south_east = read_day_column[read_day_column['Region'] == 'south_east']
    top4_visited_stations = south_east.nlargest(4, 'total_visited_cars')
    dataframes_for_south_east_stations = read_CSV_stats_per_day_for_KS[read_CSV_stats_per_day_for_KS['name'].isin(top4_visited_stations['name'])]
    sns.relplot(x='day', y='avg_queue_length', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='avg_total_EV_in_station', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='total_rejected_cars', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='total_exhausted_cars', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    plt.show()
When I plot, I get these graphs:
https://imgur.com/a/YlGTCgH
In the graphs below (otherwise exactly the same), I am trying to replace the station names with the total_amount_of_chargers that each station has:
https://imgur.com/a/5l5cP4x
Question 1: The numbers are wrong. I want to show the actual number of chargers, not 20, 40, 60, 80. How can I do that?
Question 2: I still want the names of the stations to stay on the Y label instead of numbers. How can I do that?
Question 3: Is it possible to combine the station name and the total number of chargers and show both on the right? Like this:
https://imgur.com/a/3RidCDv
If this were possible, it would be awesome and I would really appreciate it!
Thank you.
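One possible approach, sketched under assumptions: build a combined text label per station and use that column as the hue. Seaborn treats a numeric hue column as a continuous scale, which is likely why the legend shows 20, 40, 60, 80 rather than the real charger counts; a text column is treated as categorical instead. The column names name and total_amount_of_chargers come from the question, while the helper add_station_label, the station_label column, and the sample rows are hypothetical.

```python
import pandas as pd


def add_station_label(df, name_col="name", chargers_col="total_amount_of_chargers"):
    """Return a copy of df with a text column combining station name and charger count."""
    out = df.copy()
    out["station_label"] = (
        out[name_col].astype(str) + " (" + out[chargers_col].astype(str) + " chargers)"
    )
    return out


# Hypothetical sample rows standing in for the real CSV
sample = pd.DataFrame({
    "day": [0, 1, 0, 1],
    "name": ["Station A", "Station A", "Station B", "Station B"],
    "total_amount_of_chargers": [12, 12, 7, 7],
    "avg_queue_length": [1.5, 2.0, 0.5, 0.8],
})

labelled = add_station_label(sample)
print(labelled["station_label"].unique())
```

Passing hue='station_label' instead of hue='name' in the sns.relplot calls should then list each station together with its charger count in the legend.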

Related

How can i make this time series graph interactive?

I am new to Python and pandas, so any help is much appreciated.
I am trying to make the graph below interactive. It would also be good to be able to choose which attributes are shown, rather than showing all of them.
Here is what I have so far:
df.set_index('Current Year').plot(rot=45)
plt.xlabel("Year",size=16)
plt.ylabel("",size=16)
plt.title("Current year time series plot", size=18)
I know that I need to import plotly with import plotly.graph_objects as go, but I have no idea how to use it with the time series graph above. Thanks.
EDIT
I am getting this error when trying to enter my plotted data.
All you need is:
df.plot()
As long as you import the correct libraries and set plotly as the plotting backend for pandas like this:
import pandas as pd
pd.options.plotting.backend = "plotly"
df = pd.DataFrame({'year':['2020','2021','2022'], 'value':[1,3,2]}).set_index('year')
fig = df.plot(title = "Current year time series plot")
fig.show()

Plotting top 10 Values in Big Data

I need help plotting some categorical and numerical values in Python. The code is given below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv('train_feature_store.csv')
df.info()
df.head()
df.columns
plt.figure(figsize=(20,6))
sns.countplot(x='Store', data=df)
plt.show()
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
However, the data set is so large that I cannot produce a meaningful plot. Basically, I just want to take the top 5 or top 10 values and plot them, as shown below:
To do that, I am trying to put the result of the code below into a dataframe and plot it, but I have not been able to. Can anyone help me out with this?
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
Below is a link to a sample dataset. It is only a representative sample; the original dataset on which I am doing the EDA has around 3 thousand unique stores and 60 thousand rows. Please help! Thanks!
https://drive.google.com/drive/folders/1PdXaKXKiQXX0wrHYT3ZABjfT3QLIYzQ0?usp=sharing
You were pretty close.
import pandas as pd
import seaborn as sns
df = pd.read_csv('train_feature_store.csv')
sns.set(rc={'figure.figsize':(16,9)})
g = df.groupby('Store', as_index=False)['Size'].sum().sort_values(by='Size', ascending=False).head(10)
sns.barplot(data=g, x='Store', y='Size', hue='Store', dodge=False).set(xticklabels=[]);
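As an alternative sketch of the same top-N selection, Series.nlargest picks the largest totals directly after the groupby. The toy data below stands in for train_feature_store.csv; only the Store and Size column names are taken from the question, and the helper top_n_by_total is hypothetical.

```python
import pandas as pd

# Toy data standing in for the real CSV (assumption)
toy = pd.DataFrame({
    "Store": [1, 1, 2, 3, 3, 3, 4],
    "Size": [10, 5, 40, 1, 2, 3, 20],
})


def top_n_by_total(df, n, group_col="Store", value_col="Size"):
    """Sum value_col per group and keep the n groups with the largest totals."""
    totals = df.groupby(group_col)[value_col].sum()
    return totals.nlargest(n).reset_index()


top2 = top_n_by_total(toy, 2)
print(top2)
```

The returned frame has one row per store, so it can be passed to sns.barplot(data=..., x='Store', y='Size') in the same way as g above.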
First of all, looking at the data, it seems to cover locations from Scotland to Kolkata.
Categorize the data by geography first, and then visualize it.
Regards
Maitryee

Graphing a dataframe line plot with a legend in Matplotlib

I'm working with a dataset that has grades and states and need to create line graphs by state showing what percent of each state's students fall into which bins.
My methodology (so far) is as follows:
First I import the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
records = [{'Name':'A', 'Grade':'.15','State':'NJ'},{'Name':'B', 'Grade':'.15','State':'NJ'},{'Name':'C', 'Grade':'.43','State':'CA'},{'Name':'D', 'Grade':'.75','State':'CA'},{'Name':'E', 'Grade':'.17','State':'NJ'},{'Name':'F', 'Grade':'.85','State':'HI'},{'Name':'G', 'Grade':'.89','State':'HI'},{'Name':'H', 'Grade':'.38','State':'CA'},{'Name':'I', 'Grade':'.98','State':'NJ'},{'Name':'J', 'Grade':'.49','State':'NJ'},{'Name':'K', 'Grade':'.17','State':'CA'},{'Name':'K', 'Grade':'.94','State':'HI'},{'Name':'M', 'Grade':'.33','State':'HI'},{'Name':'N', 'Grade':'.22','State':'NJ'},{'Name':'O', 'Grade':'.7','State':'NJ'}]
df = pd.DataFrame(records)
df.Grade = df.Grade.astype(float)
Next I cut each grade into a bin
df['bin'] = pd.cut(df['Grade'],[-np.inf,.05,.1,.15,.2,.25,.3,.35,.4,.45,.5,.55,.6,.65,.7,.75,.8,.85,.9,.95,1],labels=False)/10
Then I create a pivot table giving me the count of people by bin in each state
df2 = pd.pivot_table(df,index=['bin'],columns='State',values=['Name'],aggfunc=pd.Series.nunique,margins=True)
df2 = df2.fillna(0)
Then I convert those n-counts into percentages and remove the margin rows
df3 = df2.div(df2.iloc[-1])
df3 = df3.iloc[:-1,:-1]
Now I want to create a line graph with multiple lines (one for each state) with the bin on the x axis and the percentage on the Y axis. df3.plot() will give me the chart I want but I would like to accomplish the same using matplotlib, because it offers me greater customization of the graph. However, running
plt.plot(df3)
gives me the lines I need, but I can't get the legend to work properly. Any thoughts on how to accomplish this?
It may not be the best way, but I use the pandas plot function to draw df3, then take the legend handles and labels and build new label names. Note that this string processing of the legend labels only works for this particular data.
line = df3.plot(kind='line')
handles, labels = line.get_legend_handles_labels()
label = []
for l in labels:
    # labels look like "(Name, CA)"; keep only the state code
    label.append(l[7:-1])
plt.legend(handles, label, loc='best')
You can do this:
plt.plot(df3,label="label")
plt.legend()
plt.show()
For more information visit here
And if it helps you to solve your issues then don't forget to mark this as accepted answer.
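A sketch of one way to get a per-state legend with plain matplotlib: plot each column separately and give each line its own label. It assumes df3 has the two-level columns produced by the pivot table in the question, such as ('Name', 'CA'). The small frame below is made-up stand-in data, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for df3: rows are bins, columns are (value name, state)
columns = pd.MultiIndex.from_tuples(
    [("Name", "CA"), ("Name", "HI"), ("Name", "NJ")], names=[None, "State"]
)
df3 = pd.DataFrame(
    [[0.25, 0.0, 0.5], [0.5, 0.25, 0.25], [0.25, 0.75, 0.25]],
    index=pd.Index([0.3, 0.7, 1.6], name="bin"),
    columns=columns,
)

fig, ax = plt.subplots()
for col in df3.columns:
    state = col[-1]  # last level of the column tuple is the state code
    ax.plot(df3.index, df3[col], label=state)

ax.set_xlabel("bin")
ax.set_ylabel("share of students")
ax.legend(title="State", loc="best")

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Because every line is drawn through its own ax.plot call, colors, markers, and line styles can also be set per state.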

Make Matplotlib Ignore CSV Headings

Trying to create a bar chart using a CSV file and Matplotlib. However, there are two headings (COUNTRY & COST) which means that the code isn't able to run properly and produce the bar chart. How do I edit the code so that it will ignore the headings? The first image is what the CSV file actually looks like and the second image is what the code is able to understand and run.
EDIT: The Python assistant tells me that the error seems to occur on line 14 of the code: price.append(float(row[1]))
import matplotlib.pyplot as plt
import csv
price = []
countries = []
with open("Europe.csv", "r") as csvfile:
    plot = csv.reader(csvfile)
    for idx, row in enumerate(plot):
        # skip the header row
        if idx == 0:
            continue
        price.append(float(row[1]))
        countries.append(str(row[0]))
plt.style.use('grayscale')
plt.bar( countries, price, label='Europe', color='red')
plt.ylabel('Price in US$')
plt.title('Cost of spotify premium per country')
plt.xticks(rotation=90)
plt.legend(loc='best')
plt.show()
I would use pandas for this. With that you can then more easily create the bar plot using this function.
Example using your variables countries and price:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"country": countries, "price": price})
df.plot.bar(x="country", y="price")
plt.show()
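If you would rather stay with the csv module, a sketch like the following lets csv.DictReader consume the header row and collects rows whose cost cannot be parsed instead of crashing on them. The header names COUNTRY and COST come from the question; the inline sample text and the read_costs helper are hypothetical stand-ins for Europe.csv.

```python
import csv
import io

# Hypothetical sample with a header, one blank-valued row and one bad value
SAMPLE = "COUNTRY,COST\nNorway,12.5\nSpain,9.99\n,\nPoland,n/a\n"


def read_costs(csvfile):
    """Return (countries, prices, skipped_rows) read from an open CSV file."""
    countries, prices, skipped = [], [], []
    reader = csv.DictReader(csvfile)  # the first row supplies the field names
    for row in reader:
        try:
            price = float(row["COST"])
        except (TypeError, ValueError):
            # missing or non-numeric cost: remember the row and move on
            skipped.append(row)
            continue
        countries.append(row["COUNTRY"])
        prices.append(price)
    return countries, prices, skipped


countries, prices, skipped = read_costs(io.StringIO(SAMPLE))
print(countries, prices, len(skipped))
```

With a real file, the same helper could be called inside with open("Europe.csv", newline="") as csvfile, and inspecting the skipped rows may show which line makes float() fail.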
Just use pandas.read_csv with skiprows=[0], header=None, like this:
import pandas as pd
df = pd.read_csv('data.csv',sep=';',skiprows=[0],header=None)
I am using ';' as the separator because I assume your CSV file was created in MS Excel.
But I think you can simply read the CSV file without skiprows, like this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv',sep=';')
price = df['cost']
countries = df['country']
plt.style.use('grayscale')
plt.bar( countries, price, label='Europe', color='red')
plt.ylabel('Price in US$')
plt.title('Cost of spotify premium per country')
plt.xticks(rotation=90)
plt.legend(loc='best')
plt.show()
For data like this:
the result looks like this:

Plot stacked bar chart from pandas data frame

I have a dataframe:
payout_df.head(10)
What would be the easiest, smartest and fastest way to replicate the following Excel plot?
I've tried different approaches, but couldn't get everything into place.
Thanks
If you just want a stacked bar chart, then one way is to use a loop to plot each column in the dataframe and keep track of the cumulative sum, which you then pass as the bottom argument of pyplot.bar:
import pandas as pd
import matplotlib.pyplot as plt
# If it's not already a datetime
payout_df['payout'] = pd.to_datetime(payout_df.payout)
cumval=0
fig = plt.figure(figsize=(12,8))
for col in payout_df.columns[~payout_df.columns.isin(['payout'])]:
    plt.bar(payout_df.payout, payout_df[col], bottom=cumval, label=col)
    cumval = cumval+payout_df[col]
_ = plt.xticks(rotation=30)
_ = plt.legend(fontsize=18)
Without sample data I can't test it, but I think the following code will produce the desired graph:
import pandas as pd
import matplotlib.pyplot as plt
df.payout = pd.to_datetime(df.payout)
grouped = df.groupby(pd.Grouper(key='payout', freq='M')).sum()
# x expects a column label or position, so the grouped index is used as the x-axis instead
grouped.plot(kind='bar', stacked=True)
plt.show()
I don't know how to reproduce this fancy x-axis style. Also, your payout column must be a datetime, otherwise pd.Grouper won't work (available frequencies).
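For the x-axis labels, one option is to format the grouped index as text before plotting, since a pandas bar plot draws the index values as category labels. The sketch below uses made-up data in place of payout_df: the payout column name is from the question, while the value columns a and b are hypothetical. It may not match the Excel chart exactly, and the Agg backend is selected only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for payout_df (value columns "a" and "b" are assumptions)
payout_df = pd.DataFrame({
    "payout": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-10", "2019-03-15"]),
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [0.5, 0.5, 1.0, 2.0],
})

# Sum the value columns per calendar month
month = payout_df["payout"].dt.to_period("M")
monthly = payout_df.groupby(month)[["a", "b"]].sum()

# Turn the PeriodIndex into short text labels such as "2019-01"
monthly.index = monthly.index.strftime("%Y-%m")

ax = monthly.plot(kind="bar", stacked=True, figsize=(12, 8), rot=30)
ax.set_xlabel("")
ax.legend(fontsize=12)
print(monthly)
```

Changing the strftime pattern changes how each month is written on the axis.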
