Plotting the UK election results using Python, Pandas and Geopandas - python

As an exercise, I am trying to plot the UK's 2017 general election results. I have used pandas to manipulate my dataframe and geopandas to visualise the results, with every constituency coloured by the winning party (Conservative: blue, Labour: red, etc.).
I have managed to plot the map, but no matter what I do the colours do not come out correctly. Below are my code, my output and what the output should look like. Any help would be much appreciated.
My Code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import multipolygon, polygon, Polygon, MultiPolygon
%matplotlib inline
uk_map = gpd.read_file('Westminster_Parliamentary_Constituencies__December_2017__UK_BGC_V2.shp')
#shape file of the uk
df = pd.read_csv('uk_election_data_v2.csv')
#election results
uk_map.rename(columns={'PCON17NM':'Constituency'}, inplace=True)
uk_map.sort_values('Constituency', inplace=True)
df.sort_values('Constituency', inplace=True)
party_colours = {
    'Conservative': '#0087dc',
    'Liberal Democrat': '#FDBB30',
    'Labour': '#d50000',
    'SNP': '#FFF95D',
    'Green': '#00FF00',
    'Independent': '#800080',
    'Sinn Fein': '#228B22',
    'Democratic Unionist': '#808080',
    'Plaid Cymru': '#FF5733',
}
#dictionary to assign the colours to each winning party
df['winner_fill']=df['2017_winner'].apply(lambda s: party_colours.get(s,"#aaaaaa"))
#new column that applies the colour for the winning party
election_results = uk_map.merge(df, on='Constituency')
#merge the shape and df together
election_results.plot('winner_fill', figsize=(12,12))
[Expected output]
[My Output]

I think the problem is apply in
df['winner_fill']=df['2017_winner'].apply(lambda s: party_colours.get(s,"#aaaaaa"))
Try map instead
df['winner_fill'] = df['2017_winner'].map(lambda s: party_colours.get(s,"#aaaaaa"))
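A hedged note on the above: with a lambda, Series.apply and Series.map both call the function element by element, so swapping one for the other is unlikely to change the colours. A more likely cause is that election_results.plot('winner_fill') treats the hex strings as category labels and colours them from a colormap. Passing the column through the color argument asks geopandas to use the strings as literal colours. Below is a minimal sketch of that idea; the two square "constituencies" and their winners are made up for illustration and stand in for the real shapefile and CSV.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

party_colours = {"Conservative": "#0087dc", "Labour": "#d50000"}

# Two made-up square "constituencies" standing in for the real shapefile.
uk_map = gpd.GeoDataFrame(
    {"Constituency": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)
# Made-up results standing in for uk_election_data_v2.csv.
df = pd.DataFrame(
    {"Constituency": ["A", "B"], "2017_winner": ["Labour", "Conservative"]}
)

# apply and map give the same column when used with a lambda.
winners = df["2017_winner"]
via_apply = winners.apply(lambda s: party_colours.get(s, "#aaaaaa"))
via_map = winners.map(lambda s: party_colours.get(s, "#aaaaaa"))
same_result = via_apply.equals(via_map)
df["winner_fill"] = via_map

election_results = uk_map.merge(df, on="Constituency")

# color= uses the hex strings as literal colours, one per row,
# instead of treating the column as categories for a colormap.
ax = election_results.plot(color=election_results["winner_fill"], figsize=(4, 4))
```

With the real data, the only change to the original script would be the last line: election_results.plot(color=election_results['winner_fill'], figsize=(12,12)).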

Related

Plotting top 10 Values in Big Data

I need help plotting some categorical and numerical values in Python. The code is given below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv('train_feature_store.csv')
df.info()
df.head()
df.columns
plt.figure(figsize=(20,6))
sns.countplot(x='Store', data=df)
plt.show()
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
However, the dataset is so large that I can't produce a meaningful plot. Basically, I just want to take the top 5 or top 10 values and plot them, as shown below:
To do that, I'm trying to put the result of the code below into a dataframe and plot it, but I haven't managed to. Can anyone help me with this?
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
Below is a link to a sample dataset. It is only representative: the original dataset I'm doing the EDA on has around 3 thousand unique stores and 60 thousand rows. Please help! Thanks!
https://drive.google.com/drive/folders/1PdXaKXKiQXX0wrHYT3ZABjfT3QLIYzQ0?usp=sharing
You were pretty close.
import pandas as pd
import seaborn as sns
df = pd.read_csv('train_feature_store.csv')
sns.set(rc={'figure.figsize':(16,9)})
g = df.groupby('Store', as_index=False)['Size'].sum().sort_values(by='Size', ascending=False).head(10)
sns.barplot(data=g, x='Store', y='Size', hue='Store', dodge=False).set(xticklabels=[]);
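As a variation on the groupby/sort/head chain, Series.nlargest picks the top rows in one step. A small sketch follows; the data frame is made up to stand in for train_feature_store.csv, and top_n is a name introduced here for illustration.

```python
import pandas as pd

# Synthetic stand-in for train_feature_store.csv (values are made up).
df = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3, 4],
    "Size":  [10, 20, 5, 5, 40, 1, 7],
})

top_n = 3  # use 5 or 10 on the real data

# Total Size per store, then keep only the top_n largest totals.
totals = df.groupby("Store")["Size"].sum()
top = totals.nlargest(top_n).reset_index()
print(top)
```

The resulting frame has one row per store, already ordered from largest to smallest, so it can be passed to sns.barplot in the same way as g above.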
First of all, looking at the data, it seems to cover locations from Scotland to Kolkata.
Categorize the data by geography first and then visualize it.
Regards
Maitryee

I can't get satellite imagery as a background in geopandas

I have multiple shapefiles that I am trying to map in geopandas, but I can't get aerial/satellite imagery as a background image. These are very zoomed in shapefiles, and probably cover less than half a mile square. These are in Iowa in the United States.
Here is my code.
import geopandas as gpd
import fiona, os
import matplotlib.pyplot as plt
from geopandas import GeoDataFrame
from shapely.geometry import Polygon
import pandas as pd
import contextily as ctx
boundary = gpd.read_file(boundaryFile)
sample_data = gpd.read_file(sampleFile)
yieldData = gpd.read_file(yieldFile)
filesList = [boundary, sample_data, yieldData]
for i in filesList:
    i.set_crs(epsg=3857, inplace=True)
fig = plt.figure()
ax = yieldData.plot()
ctx.add_basemap(ax, source=ctx.providers.Esri.WorldImagery)
I am getting a ValueError: The inferred zoom level of 34 is not valid for the current tile provider. This can indicate that the extent of your figure is wrong (e.g. too small extent, or in the wrong coordinate reference system)
Thanks for your help
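One possible cause, offered as a guess since the files are not available: set_crs only attaches a CRS label to the coordinates as they are, while to_crs actually reprojects them. If the shapefiles hold longitude/latitude degrees (an assumption) and are merely labelled as EPSG:3857, a half-mile field spans a tiny fraction of a "metre", which would explain an inferred zoom level of 34. The sketch below shows the reprojection on a single made-up point in Iowa; the coordinates and the EPSG:4326 source CRS are assumptions.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical field location in Iowa, stored as lon/lat degrees.
field = gpd.GeoDataFrame(geometry=[Point(-93.6, 42.0)], crs="EPSG:4326")

# to_crs recomputes the coordinates in Web Mercator metres;
# set_crs would only have changed the label and left the numbers alone.
projected = field.to_crs(epsg=3857)
print(projected.geometry.iloc[0])
```

If the shapefiles ship with a .prj file, read_file already picks up the source CRS, and calling to_crs(epsg=3857) on each frame before plotting should be enough for ctx.add_basemap.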

Box and whisker plot on multiple columns

I am trying to make a Box and Whisker plot on my dataset that looks something like this -
& the chart I'm trying to make
My current lines of code are below -
import seaborn as sns
import matplotlib.pyplot as plt
d = df3.boxplot(column = ['Northern California','New York','Kansas','Texas'], by = 'Banner')
d
Thank you
I've recreated a dummy version of your dataset:
import numpy as np
import pandas as pd
dictionary = {'Banner':['Type1']*10+['Type2']*10,
'Northen_californina':np.random.rand(20),
'Texas':np.random.rand(20)}
df = pd.DataFrame(dictionary)
What you need is to melt (unpivot) your dataframe in order to have the geographical zone stored in a column rather than as a column name. You can use the pandas.melt method and list all the columns you want in your boxplot in the value_vars argument.
With my dummy dataset you can do this:
df = pd.melt(df,id_vars=['Banner'],value_vars=['Northen_californina','Texas'],
var_name='zone', value_name='amount')
Now you can apply a boxplot using the hue argument:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(9,9)) #for a bigger image
sns.boxplot(x="Banner", y="amount", hue="zone", data=df, palette="Set1")

Simple Graph Does Not Represent Data

This is a very straightforward question. I have an x axis of years and a y axis of numbers increasing linearly by 100. When plotting this with pandas and matplotlib I get a graph that does not represent the data at all. I need some help to figure this out because it is such a small amount of code:
The CSV is as follows:
A,B
2012,100
2013,200
2014,300
2015,400
2016,500
2017,600
2018,700
2012,800
2013,900
2014,1000
2015,1100
2016,1200
2017,1300
2018,1400
The Code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv("CSV/DSNY.csv")
data.set_index("A", inplace=True)
data.plot()
plt.show()
The graph this yields is:
It is clearly very inconsistent with the data - any suggestions?
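The default behaviour of matplotlib/pandas is to draw a line between successive data points, and not to mark each data point with a symbol. Because the years 2012 to 2018 appear twice in column A, the line runs forward to 2018, jumps back to 2012 and then runs forward again.
Fix: change data.plot() to data.plot(style='o'), or data.plot(marker='o', linewidth=0).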
Result:
All you need is to sort by A before plotting.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv("CSV/DSNY.csv")
data = data.sort_values('A')
data.set_index("A", inplace=True)
data.plot()
plt.show()
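Another option, if the repeated years are really two separate series: number each occurrence of a year and pivot, so that each series gets its own column and its own line. This is only a sketch, and it assumes the n-th occurrence of a year belongs to the n-th series; the column name run is introduced here for illustration.

```python
import io
import pandas as pd

# The CSV from the question, inlined so the sketch is self-contained.
csv_text = """A,B
2012,100
2013,200
2014,300
2015,400
2016,500
2017,600
2018,700
2012,800
2013,900
2014,1000
2015,1100
2016,1200
2017,1300
2018,1400
"""
data = pd.read_csv(io.StringIO(csv_text))

# Assumption: the n-th time a year appears, it belongs to series n.
data["run"] = data.groupby("A").cumcount()

# One row per year, one column per series.
wide = data.pivot(index="A", columns="run", values="B")
print(wide)
```

Calling wide.plot(marker='o') followed by plt.show() would then draw two lines over 2012 to 2018 instead of one line that doubles back.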

Highlight specific points in matplotlib scatterplot

I have a CSV with 12 columns of data. I'm focusing on these 4 columns
Right now I've plotted "Pass def" and "Rush def". I want to be able to highlight specific points on the scatter plot. For example, I want to highlight 1995 DAL point on the plot and change that point to a color of yellow.
I've started with a for loop but I'm not sure where to go. Any help would be great.
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import csv
import random
df = pd.read_csv('teamdef.csv')
x = df["Pass Def."]
y = df["Rush Def."]
z = df["Season"]
points = []
for point in df["Season"]:
    if point == 2015.0:
        print(point)
plt.figure(figsize=(19,10))
plt.scatter(x,y,facecolors='black',alpha=.55, s=100)
plt.xlim(-.6,.55)
plt.ylim(-.4,.25)
plt.xlabel("Pass DVOA")
plt.ylabel("Rush DVOA")
plt.title("Pass v. Rush DVOA")
plt.show()
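You can layer multiple scatters, so the easiest way is probably to draw all the points first and then redraw the ones you want to highlight on top, selected with a boolean mask:
highlight = z == 2015.0
plt.scatter(x,y,facecolors='black',alpha=.55, s=100)
plt.scatter(x[highlight], y[highlight], color="yellow", s=100)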
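To single out one team in one season, such as the 1995 DAL point mentioned in the question, two conditions can be combined into one mask. The sketch below uses made-up numbers, and the "Team" column name is a guess, since the question does not show the CSV header.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for teamdef.csv; "Team" is a hypothetical column name.
df = pd.DataFrame({
    "Team": ["DAL", "DAL", "SF", "GB"],
    "Season": [1995.0, 2015.0, 1995.0, 2015.0],
    "Pass Def.": [-0.10, 0.05, -0.20, 0.15],
    "Rush Def.": [-0.05, 0.10, -0.15, 0.02],
})

# True only for the row(s) to highlight.
mask = (df["Season"] == 1995.0) & (df["Team"] == "DAL")

fig, ax = plt.subplots(figsize=(8, 5))
# Everything else in black, then the highlighted point on top in yellow.
ax.scatter(df.loc[~mask, "Pass Def."], df.loc[~mask, "Rush Def."],
           color="black", alpha=0.55, s=100)
ax.scatter(df.loc[mask, "Pass Def."], df.loc[mask, "Rush Def."],
           color="yellow", edgecolors="black", s=150, label="1995 DAL")
ax.set_xlabel("Pass DVOA")
ax.set_ylabel("Rush DVOA")
ax.legend()
```

Drawing the highlighted points last keeps them visible even where they overlap the black markers.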
