I'm attempting to map data, but the map output does not have clear boundaries and the data displayed is not continuous as it should be.
The first map below uses similar data with the exact code as the second map, so I don't know what is going wrong. I was wondering if there was a way to format the code so the plot is similar in style to the first one.
import matplotlib.pyplot as plt
f, ax = plt.subplots(1, figsize=(12,6))
ax = states_00_14.plot(column='num_fires', cmap='OrRd',
legend=True, ax=ax)
lims = plt.axis('equal')
f.suptitle('US Wildfire count per state in 2000-2014')
ax.set_axis_off()
I'm very new to python and matplotlib so I basically have no clue what I'm doing wrong. I'm working in a Jupyter Notebook if that is relevant. Thanks in advance!
You didn't define your data sources, so I used https://www.naturalearthdata.com/downloads/110m-cultural-vectors/ for state boundaries and https://data-nifc.opendata.arcgis.com/datasets/wfigs-wildland-fire-locations-full-history/explore?showTable=true as the source of wildfire locations.
This gives two GeoDataFrames, which are simple to combine with a spatial join. The join associates each fire point with the state that contains it.
# spatial join of fire locations to states
state_fire = gdf_fire.loc[fire_mask, fire_cols].sjoin(
gdf2.loc[boundary_mask, boundary_cols]
)
Once a state has been associated with each fire, the data can be aggregated to get the number of fires per year per state.
I visualised these first with Plotly, as this allows animating a frame for each year and makes debugging simpler with hover info.
I then visualised with matplotlib: recreated a GeoDataFrame in the same format you used, aggregating the years 2000-2014, then joined on the state polygons for plotting.
Added edgecolor="black" so that edges are clearly marked.
import geopandas as gpd
import pandas as pd
import plotly.express as px
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib.parse
# get wild fire data..
# https://data-nifc.opendata.arcgis.com/datasets/wfigs-wildland-fire-locations-full-history/explore?showTable=true
gdf_fire = gpd.GeoDataFrame.from_features(
requests.get(
"https://opendata.arcgis.com/datasets/d8fdb82b3931403db3ef47744c825fbf_0.geojson"
).json()
)
# fmt: off
# download boundaries
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/cultural/ne_110m_admin_1_states_provinces.zip"
f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
# fmt: on
if not f.exists():
    r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
    with open(f, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)
    zfile = ZipFile(f)
    zfile.extractall(f.stem)
# load downloaded boundaries
gdf2 = gpd.read_file(str(f.parent.joinpath(f.stem).joinpath(f"{f.stem}.shp")))
# a bit of cleanup, data types and CRS
gdf_fire["FireDiscoveryDateTime"] = pd.to_datetime(gdf_fire["FireDiscoveryDateTime"])
gdf_fire = gdf_fire.set_crs("EPSG:4326")
# filters, US states and fires with a date...
boundary_cols = ["adm1_code", "iso_3166_2", "iso_a2", "name", "geometry"]
boundary_mask = gdf2["iso_a2"].eq("US")
fire_cols = ["OBJECTID", "FireDiscoveryDateTime", "geometry"]
# fire_mask = gdf_fire["FireDiscoveryDateTime"].dt.year.between(2010,2012)
fire_mask = ~gdf_fire["FireDiscoveryDateTime"].isna()
# spatial join of fire locations to states
state_fire = gdf_fire.loc[fire_mask, fire_cols].sjoin(
gdf2.loc[boundary_mask, boundary_cols]
)
# summarize data by year and state
df_fires_by_year = (
state_fire.groupby(
boundary_cols[0:-1]
+ ["index_right", state_fire["FireDiscoveryDateTime"].dt.year],
as_index=False,
)
.size()
.sort_values(["FireDiscoveryDateTime", "index_right"])
)
# and finally visualize...
px.choropleth_mapbox(
df_fires_by_year,
geojson=gdf2.loc[boundary_mask, "geometry"].__geo_interface__,
locations="index_right",
color="size",
animation_frame="FireDiscoveryDateTime",
hover_name="name",
hover_data={"index_right": False},
color_continuous_scale="OrRd",
).update_layout(
mapbox={
"style": "carto-positron",
"zoom": 2,
"center": {"lat": 39.50, "lon": -98.35},
},
margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
matplotlib
import matplotlib.pyplot as plt
# recreate dataframe in question. Exclude Alaska and Hawaii as they mess up boundaries...
# further aggregate the defined years....
states_00_14 = gpd.GeoDataFrame(
df_fires_by_year.loc[df_fires_by_year["FireDiscoveryDateTime"].between(2000, 2014)]
.groupby("index_right", as_index=False)
.agg({"size": "sum"})
.merge(
gdf2.loc[boundary_mask & ~gdf2["iso_3166_2"].isin(["US-AK","US-HI"])],
left_on="index_right",
right_index=True,
how="inner",
)
)
f, ax = plt.subplots(1, figsize=(12, 6))
ax = states_00_14.plot(column="size", cmap="OrRd", legend=True, ax=ax, edgecolor="black")
lims = plt.axis("equal")
f.suptitle("US Wildfire count per state in 2000-2014")
ax.set_axis_off()
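The aggregation step in this answer (fires per year per state, then summed over 2000-2014) can be sketched with plain pandas on a toy table. The column names `state` and `year` and all of the rows are made up for illustration; they are not taken from the NIFC dataset.

```python
import pandas as pd

# Hypothetical joined table: one row per fire, already tagged with a state
fires = pd.DataFrame(
    {
        "state": ["CA", "CA", "CA", "TX", "TX", "OR"],
        "year": [1999, 2003, 2003, 2005, 2016, 2010],
    }
)

# Count fires per state and year (the per-frame data for an animation)
by_year = fires.groupby(["state", "year"], as_index=False).size()

# Collapse the years of interest into one count per state
in_range = by_year["year"].between(2000, 2014)
num_fires = (
    by_year.loc[in_range]
    .groupby("state", as_index=False)
    .agg(num_fires=("size", "sum"))
)
print(num_fires)
```

Merging a table like `num_fires` onto the state polygons would then give a GeoDataFrame shaped like `states_00_14`.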
Related
I am plotting a map using plotly express and a geojson file. I want to show static values on the individual districts. Currently those values are visible on hover, but I want them to be visible all the time, even without hovering.
This is my code:
import json
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.io as pio
x = json.load(open("./odisha_disticts.geojson","r"))
user_data = []
for i in range(len(x['features'])):
    d = x['features'][i]['properties']
    d['Females'] = np.random.randint(0,100,1)[0]
    user_data.append(d)
df = pd.DataFrame(user_data)
df.head()
ID_2 NAME_2 Females
0 16084 Angul 19
1 16085 Baleshwar 45
2 16086 Baragarh 52
3 16087 Bhadrak 81
4 16088 Bolangir 49
fig = px.choropleth(
df,
locations="ID_2",
featureidkey="properties.ID_2",
geojson=x,
color="Females"
)
fig.update_geos(fitbounds="locations", visible=False)
px.scatter_geo(
df,
geojson=x,
featureidkey="properties.NAME_2",
locations="District",
text = df["District"]
)
fig.show()
The link to required files is HERE
To annotate a map, use graph_objects and combine go.Choroplethmapbox with go.Scattermapbox in text mode. As preparation before creating the graph, we need a latitude and longitude for each annotation, so we use geopandas to read the geojson file and find the centroid of each geometry. A warning is displayed at this point because the loaded geometry is in a geographic coordinate system, which is not appropriate for calculating centroids. If you already have latitudes and longitudes you wish to use for your annotations, use them. There are two caveats in creating the map: first, you will need a free Mapbox API token. Second, in go.Scattermapbox(), the mode is text+markers; if you use text only, an error occurs. The reason is unknown.
import json
import geopandas as gpd
import pandas as pd
import plotly.graph_objects as go
# read your data
data = pd.read_csv('./data.csv', index_col=0)
# read geojson
x = json.load(open("./odisha_disticts.geojson","r"))
gdf = gpd.read_file('./odisha_disticts.geojson')
gdf['centroid'] = gdf['geometry'].centroid
gdf['lon'] = gdf['centroid'].map(lambda p:p.x)
gdf['lat'] = gdf['centroid'].map(lambda p:p.y)
gdf.head()
ID_2 NAME_2 geometry centroid lon lat
0 16084 Angul POLYGON ((85.38891 21.17916, 85.31440 21.15510... POINT (84.90419 20.98316) 84.904186 20.983160
1 16085 Baleshwar POLYGON ((87.43902 21.76406, 87.47124 21.70760... POINT (86.90547 21.48738) 86.905470 21.487376
2 16086 Baragarh POLYGON ((83.79293 21.56323, 83.84026 21.52344... POINT (83.34884 21.22068) 83.348838 21.220683
3 16087 Bhadrak POLYGON ((86.82882 21.20137, 86.82379 21.13752... POINT (86.61598 20.97818) 86.615981 20.978183
4 16088 Bolangir POLYGON ((83.45259 21.05145, 83.44352 21.01535... POINT (83.16839 20.58812) 83.168393 20.588121
import plotly.express as px
import plotly.graph_objects as go
mapbox_token = open("mapbox_api_key.txt").read()
fig = go.Figure()
fig.add_trace(go.Scattermapbox(lat=gdf['lat'],
lon=gdf['lon'],
mode='text+markers',
textposition='top center',
text = [str(x) for x in data["District"]],
textfont=dict(color='blue')
))
fig.add_trace(go.Choroplethmapbox(geojson=x,
locations=data['id'],
z=data['Females'],
featureidkey="properties.ID_2",
colorscale='Reds',
zmin=0,
zmax=data['Females'].max(),
marker_opacity=0.8,
marker_line_width=0
)
)
fig.update_layout(height=600,
mapbox=dict(
center={"lat": gdf['lat'].mean(), "lon": gdf['lon'].mean()},
accesstoken=mapbox_token,
zoom=5.5,
style="light"
))
fig.show()
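The centroid warning mentioned in this answer can usually be avoided by projecting to a metric CRS before taking centroids and converting back afterwards. This is only a sketch: the single square below is a hypothetical stand-in for a district, and EPSG:32645 (UTM zone 45N) is an assumption that roughly fits Odisha, not something from the original data.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for one district, in longitude/latitude (EPSG:4326)
gdf = gpd.GeoDataFrame(
    {"NAME_2": ["demo"]},
    geometry=[box(84.0, 20.0, 86.0, 22.0)],
    crs="EPSG:4326",
)

# Project to a metric CRS, take the centroid there, then convert back.
# EPSG:32645 is an assumed choice; pick one suited to your own region.
centroids = gdf.to_crs(epsg=32645).centroid.to_crs(epsg=4326)
gdf["lon"] = centroids.x
gdf["lat"] = centroids.y
print(gdf[["NAME_2", "lon", "lat"]])
```

For label placement the difference from the unprojected centroid is typically small, so this mainly matters if the warning is a nuisance or the region is large.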
I'm trying to make a kdeplot using geopandas.
This is my code:
Downloading shape file
URL = "https://data.sfgov.org/api/geospatial/wkhw-cjsf?method=export&format=Shapefile"
response = requests.get(URL)
open('pd_data.zip', 'wb').write(response.content)
with zipfile.ZipFile('./pd_data.zip', 'r') as zip_ref:
    zip_ref.extractall('./ShapeFiles')
Making the geopandas data frame
data = train.groupby(['PdDistrict']).count().iloc[:,0]
data = pd.DataFrame({ "district": data.index,
"incidences": data.values})
california_map = str(list(pathlib.Path('./ShapeFiles').glob('*.shp'))[0])
gdf = gdp.read_file(california_map)
gdf = pd.merge(gdf, data, on = 'district')
Note: I didn't include the link to the train set because it's not important for this question(use any data you want)
This is the part that I don't get,
what arguments should I pass to the kdeplot function, like where I pass the shape file and where I pass the data?
ax = gplt.kdeplot(
data, clip=gdf.geometry,
shade=True, cmap='Reds',
projection=gplt.crs.AlbersEqualArea())
gplt.polyplot(boroughs, ax=ax, zorder=1)
I had a few challenges setting up an environment where I did not get kernel crashes. I used non-wheel versions of shapely and pygeos.
A few things are covered in the kdeplot documentation: a basic kdeplot takes pointwise data as input. You did not provide a sample of data, so I'm not sure that it is pointwise data. I have simulated pointwise data, 100 points within each of the districts in the referenced geometry.
I have found I cannot use the clip and projection parameters together: one or the other, not both.
The shape file is passed to clip.
import geopandas as gpd
import pandas as pd
import numpy as np
import geoplot as gplt
import geoplot.crs as gcrs
# setup starting point to match question
url = "https://data.sfgov.org/api/geospatial/wkhw-cjsf?method=export&format=Shapefile"
gdf = gpd.read_file(url)
# generate 100 points in each of the districts
r = np.random.RandomState(42)
N = 5000
data = pd.concat(
[
gpd.GeoSeries(
gpd.points_from_xy(*[r.uniform(*g.bounds[s::2], N) for s in (0, 1)]),
crs=gdf.crs,
).loc[lambda s: s.intersects(g.buffer(-0.003))]
for _, g in gdf["geometry"].items()
]
)
data = (
gpd.GeoDataFrame(geometry=data)
.sjoin(gdf)
.groupby("district")
.sample(100, random_state=42)
.reset_index(drop=True)
)
ax = gplt.kdeplot(
data,
clip=gdf,
fill=True,
cmap="Reds",
# projection=gplt.crs.AlbersEqualArea(),
)
gplt.polyplot(gdf, ax=ax, zorder=1)
I'm having an odd issue with Plotly, the image below will give some context:
This is the map made with Bokeh
This is the map made with Plotly
The same transformation steps are applied to both versions, however for some reason Plotly will exclude some of the shapes.
These are the transformation steps I am using:
import pandas as pd
import plotly.io as pio
import plotly.graph_objs as go
import json
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely import wkt
from bokeh.plotting import save, figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.io import show, output_file
from bokeh.palettes import brewer
df_test = pd.read_csv(f'{filepath}')
df_blocks = pd.read_csv(f'{filepath}')
group_2 = df_test[['geo_name', 'edited_characteristics', 'total', 'male', 'female']]
group_2 = group_2.pivot(index='geo_name', columns='edited_characteristics', values=['total', 'male', 'female'])
cat = 'Total - Low-income status in 2015 for the population in private households to whom low-income concepts are applicable - 100% data'
group_2['LIM 0-17 percent'] = (
group_2[( 'total', f'{cat}//0 to 17 years')] /
group_2[( 'total', cat)]
)
group_2.reset_index(inplace=True)
g2 = group_2[['geo_name', 'LIM 0-17 percent']]
g2.rename(columns={'geo_name': 'DAUID'}, inplace=True)
df_g2 = pd.merge(g2, df_blocks, on='DAUID')
df_g2['geometry'] = df_g2['geometry'].apply(wkt.loads)
geo_df_g2 = gpd.GeoDataFrame(df_g2, geometry='geometry')
geo_df_g2.crs = {'init': 'epsg:3347'}
geo_df_g2 = geo_df_g2.to_crs({'init': 'epsg:4326'})
geo_df_g2 = geo_df_g2[geo_df_g2[('LIM 0-17 percent', '')] < 1]
mean = geo_df_g2[('LIM 0-17 percent', '')].mean()
std = geo_df_g2[('LIM 0-17 percent', '')].std()
geo_df_g2 = geo_df_g2[(geo_df_g2[('LIM 0-17 percent', '')] < (mean - 1
* std)) | (geo_df_g2[('LIM 0-17 percent', '')] > (mean + 1 * std))]
geo_df_g2.columns = [x[0] if type(x) is tuple else x for x in
geo_df_g2.columns]
geo_df_g2 = geo_df_g2.loc[:, ~geo_df_g2.columns.duplicated()]
geo_df_g2_j = geo_df_g2.copy()
geo_df_g2_j['DAUID'] = geo_df_g2_j['DAUID'].astype(str)
geo_df_g2_j.set_index('DAUID', inplace=True)
geo_df_g2_json = json.loads(geo_df_g2_j.to_json())
USING PLOTLY
geo_df_g2 = geo_df_g2[['DAUID', 'LIM 0-17 percent']]
geo_df_g2['DAUID'] = geo_df_g2['DAUID'].astype(str)
fig = go.Figure(go.Choroplethmapbox(geojson=geo_df_g2_json,
locations=geo_df_g2['DAUID'],
z=geo_df_g2['LIM 0-17 percent'],
colorscale='Viridis',
zauto=True,
marker_opacity=0.5,
marker_line_width=0.5)
)
fig.update_layout(mapbox_style='white-bg',
#mapbox_accesstoken=mapbox_token,
mapbox_zoom=12,
mapbox_center={'lat': 45.41117, 'lon': -75.69812})
fig.update_layout(margin={'r':0, 't':0, 'l':0, 'b':0})
pio.renderers.default = 'browser'
fig.show()
USING BOKEH
json_data = json.dumps(geo_df_g2_json)
geosource = GeoJSONDataSource(geojson=json_data)
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%',
'20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width
= 500, height = 20,
border_line_color=None,location = (0,0), orientation =
'horizontal', major_label_overrides = tick_labels)
p = figure(title='LIM', plot_height=600, plot_width=950,
toolbar_location=None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs', 'ys', source=geosource, fill_color={'field': 'LIM 0-17 percent', 'transform': color_mapper}, line_color='black', line_width=0.25, fill_alpha=1)
output_file('test_bokeh.html')
show(p)
As you could see, they both use the same projections, same dataframe transformation, and the same categories. Is there a way to fix this?
TIA
EDIT: The shapes are in the correct position, there are just a lot of them missing from the plot.
UPDATE: In hopes of seeing if other Plotly modules could solve the problem, I narrowed down the issue somewhat. Using the tutorial on Plotly for creating a Scattermapbox, the way they called the mapbox features worked better at revealing the inherent problems than the tutorial on the Choroplethmapbox did. Apparently what is happening is that Plotly (or Mapbox) is not recognizing several groups of nearby points as coordinates for a polygon, and hence excluding them until you specify that you want them present. This is done by setting the mapbox dictionary values for 'type' to either 'fill', 'line', or 'circle'. This of course leads to another issue, whereby those new shapes are not colored or labelled the same way as the original polygons since they were not there by default.
Here is the code sample that helps show the problem with the polygon points not forming a complete shape:
fig = go.Figure(go.Choroplethmapbox(geojson=geo_df_g2_json,
locations=geo_df_g2['DAUID'],
z=geo_df_g2['LIM 0-17 percent'],
below='traces',
colorscale='Viridis',
zauto=True,
marker_opacity=0.5,
marker_line_width=0.5)
)
fig.update_layout(
mapbox = {
'style': 'carto-positron',
'center': {'lat': 45.41117, 'lon': -75.69812},
'zoom': 12, 'layers': [{
'source': {
'type': "FeatureCollection",
'features': geo_df_g2_json['features']
},
'type': 'fill', 'below': 'traces', 'color': 'lightblue'}]},
margin = {'l':0, 'r':0, 'b':0, 't':0})
fig.show()
To clarify my intent, there are two questions I'm trying to answer:
Why does Plotly transform some polygon coordinates to a shape and others to just the individual points?
Is there a workaround to fill the shapes after using the above function, based on the 'z' value?
I found out what was causing the polygons to disappear. Since Plotly uses geojson files vs. interacting with geopandas dataframes (I believe that's the reason), it has more stringent requirements on data formatting. Other libraries like Bokeh, contextily, or geopandas aggregate multiple rows of polygons that share a common parent before plotting them, whereas Plotly looks at them individually. In my case, since each 'id' had multiple sub-ids, each with their own polygon coordinates, Plotly would just pick one when plotting them. It would store the rest as points, and it would only display them if I used the 'fill' option. Here is a rough example of what my dataframe looked like:
DAUID DBUID Total geometry
001 00101 5 Polygon(x1, y1)
001 00102 5 Polygon(x2, y2)
001 00103 5 Polygon(x3, y3)
So while the primary id and the total values stayed constant, the geometries did not. I found this out by accident when trying to write a color mapper and noticed I had duplicate entries for the DAUID. At the end, it was my fault for not using the correct database.
It looks like Plotly will be introducing geopandas support soon, so I would be curious to see if it resolves edge cases like this.
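A quick way to spot this situation before plotting is to check whether the column passed as `locations` contains repeated ids. The table below is a hypothetical stand-in shaped like the example above, with one extra id added for contrast.

```python
import pandas as pd

# Hypothetical table shaped like the example above: one row per sub-block
df = pd.DataFrame(
    {
        "DAUID": ["001", "001", "001", "002"],
        "DBUID": ["00101", "00102", "00103", "00201"],
        "Total": [5, 5, 5, 7],
    }
)

# Rows whose location id appears more than once
dupes = df[df["DAUID"].duplicated(keep=False)]
print(f"{dupes['DAUID'].nunique()} id(s) have more than one geometry row")
print(dupes)
```

If `dupes` is not empty, the geometries for each id probably need to be merged (or the right table used) before building the geojson.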
I had a similar issue. A slice of my geopandas dataframe looked like this:
province_id geometry
0 1 POLYGON (x1, y1)
1 1 POLYGON (x2, y2)
2 1 POLYGON (x3, y3)
I used province_id_data.dissolve(by='province_id', aggfunc='first') to combine them into a multipolygon and then plot using plotly.
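As a minimal sketch of that dissolve step, here is a toy GeoDataFrame with made-up squares; the `value` column and the coordinates are hypothetical and only there to show what `aggfunc='first'` keeps.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: province 1 is split across three separate squares
gdf = gpd.GeoDataFrame(
    {"province_id": [1, 1, 1, 2], "value": [5, 5, 5, 9]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(4, 0, 5, 1), box(0, 2, 1, 3)],
    crs="EPSG:4326",
)

# One row per province; pieces that do not touch become a MultiPolygon
dissolved = gdf.dissolve(by="province_id", aggfunc="first")
print(dissolved.geom_type)
```

After this, `province_id` is the index and each id maps to exactly one geometry, which is the shape of data a choropleth keyed on ids expects.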
I have a dataframe that I am trying to visualize as a heatmap. I used matplotlib to make the heatmap, but it is showing data that is not a part of my dataframe.
I created the heatmap from a matplotlib example I found online and changed the code to work for my data. But on the left side and the top of the graph there are random values that are not a part of my data, and I'm not sure how to remove them.
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from io import StringIO
url = 'http://mcubed.net/ncaab/seeds.shtml'
#Getting the website text
data = requests.get(url).text
#Parsing the website
soup = BeautifulSoup(data, "html5lib")
#Create an empty list
dflist = []
#If we look at the html, we don't want the tag b, but what's next to it
#StringIO(b.next.next) takes the correct text and makes it readable to pandas
for b in soup.findAll({"b"})[2:-1]:
    dflist.append(pd.read_csv(StringIO(b.next.next), sep = r'\s+', header = None))
dflist[0]
#Created a new list, because the melt we are going to do cannot replace
#the dataframes in dflist
meltedDF = []
#The second item in the loop is the team number starting from 1
for df, teamnumber in zip(dflist, (np.arange(len(dflist))+1)):
    #Creating the team name
    name = "Team " + str(teamnumber)
    #Making the team name a column, with the values in df[0] and df[1] in our dataframes
    df[name] = df[0] + df[1]
    #Melting the dataframe to make the team name its own column
    meltedDF.append(df.melt(id_vars = [0, 1, 2, 3]))
# Concat all the melted DataFrames
allTeamStats = pd.concat(meltedDF)
# Final cleaning of our new single DataFrame
allTeamStats = allTeamStats.rename(columns = {0:name, 2:'Record', 3:'Win Percent', 'variable':'Team' , 'value': 'VS'})\
    .reindex(['Team', 'VS', 'Record', 'Win Percent'], axis = 1)
allTeamStats
#Graph visualization Making a HeatMap
%matplotlib inline
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
y=["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16"]
x=["16","15","14","13","12","11","10","9","8","7","6","5","4","3","2","1"]
winp = []
for i in x:
    lst = []
    for j in y:
        percent = allTeamStats.loc[(allTeamStats["Team"]== 'Team '+i) &\
                                   (allTeamStats["VS"]== "vs.#"+j)]['Win Percent'].iloc[0]
        percent = float(percent[:-1])
        lst.append(percent)
    winp.append(lst)
winpercentage= np.array([[]])
fig,ax=plt.subplots(figsize=(18,18))
im= ax.imshow(winp, cmap='hot')
# We want to show all ticks...
ax.set_xticks(np.arange(len(y)))
ax.set_yticks(np.arange(len(x)))
# ... and label them with the respective list entries
ax.set_xticklabels(y)
ax.set_yticklabels(x)
# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
rotation_mode="anchor")
# Loop over data dimensions and create text annotations.
for i in range(len(x)):
    for j in range(len(y)):
        text = ax.text(j, i, winp[i][j],
                       ha="center", va="center", color="red")
ax.set_title("Win Percentage of Each Matchup", fontsize= 40)
heatmap = plt.pcolor(winp)
plt.colorbar(heatmap)
ax.set_ylabel('Seeds', fontsize=40)
ax.set_xlabel('Seeds', fontsize=40)
plt.show()
The results I get are what I want except for the two lines on the left side and top of the heatmap. I'm unsure where these values are coming from, and to see them more easily I used cmap='hot' to highlight the values that are not supposed to be there. Could you help me fix my code to plot it correctly, or plot an entirely new heatmap using seaborn with my data (my TA told me to try seaborn, but I've never used it)? Anything helps. Thanks!
I think the culprit is this line: im= ax.imshow(winp, cmap='hot') in your code. Delete it and try again. Basically, anything that you plotted after that line was laid over what that line created. The left and top "margins" were the only parts of the image on the bottom that you could see.
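An alternative that may be worth trying is to keep the imshow call and drop the later plt.pcolor call instead, building the colorbar from the imshow result. The ticks and text annotations in the question are positioned at integer coordinates, which is where imshow centres its cells. The sketch below uses random numbers as a stand-in for winp, so it only illustrates the layout, not the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 16x16 win percentages standing in for winp
rng = np.random.default_rng(0)
winp = rng.uniform(0, 100, size=(16, 16)).round(1)
seeds = [str(n) for n in range(1, 17)]

fig, ax = plt.subplots(figsize=(8, 8))
im = ax.imshow(winp, cmap="hot")  # the only image drawn on these axes
fig.colorbar(im, ax=ax)           # colorbar built from imshow, no pcolor call

# Ticks sit at integer positions, i.e. at the centre of each imshow cell
ax.set_xticks(np.arange(len(seeds)))
ax.set_yticks(np.arange(len(seeds)))
ax.set_xticklabels(seeds)
ax.set_yticklabels(seeds[::-1])
ax.set_title("Win Percentage of Each Matchup")
fig.savefig("heatmap.png")
```

In a notebook with `%matplotlib inline`, the `matplotlib.use("Agg")` line and the `savefig` call can be dropped.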
I am translating a set of R visualizations to Python. I have the following target R multiple plot histograms:
Using Matplotlib and Seaborn combination and with the help of a kind StackOverflow member (see the link: Python Seaborn Distplot Y value corresponding to a given X value), I was able to create the following Python plot:
I am satisfied with its appearance, except, I don't know how to put the Header information in the plots. Here is my Python code that creates the Python Charts
""" Program to draw the sampling histogram distributions """
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import seaborn as sns
def main():
    """ Main routine for the sampling histogram program """
    sns.set_style('whitegrid')
    markers_list = ["s", "o", "*", "^", "+"]
    # create the data dataframe as df_orig
    df_orig = pd.read_csv('lab_samples.csv')
    df_orig = df_orig.loc[df_orig.hra != -9999]
    hra_list_unique = df_orig.hra.unique().tolist()
    # create and subset df_hra_colors to match the actual hra colors in df_orig
    df_hra_colors = pd.read_csv('hra_lookup.csv')
    df_hra_colors['hex'] = np.vectorize(rgb_to_hex)(df_hra_colors['red'], df_hra_colors['green'], df_hra_colors['blue'])
    df_hra_colors.drop(labels=['red', 'green', 'blue'], axis=1, inplace=True)
    df_hra_colors = df_hra_colors.loc[df_hra_colors['hra'].isin(hra_list_unique)]
    # hard coding the current_component to pc1 here, we will extend it by looping
    # through the list of components
    current_component = 'pc1'
    num_tests = 5
    df_columns = df_orig.columns.tolist()
    start_index = 5
    for test in range(num_tests):
        current_tests_list = df_columns[start_index:(start_index + num_tests)]
        # now create the sns distplots for each HRA color and overlay the tests
        i = 1
        for _, row in df_hra_colors.iterrows():
            plt.subplot(3, 3, i)
            select_columns = ['hra', current_component] + current_tests_list
            df_current_color = df_orig.loc[df_orig['hra'] == row['hra'], select_columns]
            y_data = df_current_color.loc[df_current_color[current_component] != -9999, current_component]
            axs = sns.distplot(y_data, color=row['hex'],
                               hist_kws={"ec":"k"},
                               kde_kws={"color": "k", "lw": 0.5})
            data_x, data_y = axs.lines[0].get_data()
            axs.text(0.0, 1.0, row['hra'], horizontalalignment="left", fontsize='x-small',
                     verticalalignment="top", transform=axs.transAxes)
            for current_test_index, current_test in enumerate(current_tests_list):
                # this_x defines the series of current_component(pc1,pc2,rhob) for this test
                # indicated by 1, corresponding R program calls this test_vector
                x_series = df_current_color.loc[df_current_color[current_test] == 1, current_component].tolist()
                for this_x in x_series:
                    this_y = np.interp(this_x, data_x, data_y)
                    axs.plot([this_x], [this_y - current_test_index * 0.05],
                             markers_list[current_test_index], markersize = 3, color='black')
            axs.xaxis.label.set_visible(False)
            axs.xaxis.set_tick_params(labelsize=4)
            axs.yaxis.set_tick_params(labelsize=4)
            i = i + 1
        start_index = start_index + num_tests
    # plt.show()
    pp = PdfPages('plots.pdf')
    pp.savefig()
    pp.close()

def rgb_to_hex(red, green, blue):
    """Return color as #rrggbb for the given color values."""
    return '#%02x%02x%02x' % (red, green, blue)

if __name__ == "__main__":
    main()
The pandas code works fine and does what it is supposed to. It is my lack of knowledge and experience with 'PdfPages' in Matplotlib that is the bottleneck. How can I show in Python/Matplotlib/Seaborn the header information that I can show in the corresponding R visualization? By header information, I mean what the R visualization has at the top before the histograms, i.e., 'pc1', MRP, XRD, ....
I can get their values easily from my program, e.g., current_component is 'pc1', etc. But I don't know how to format the plots with the Header. Can someone provide some guidance?
You may be looking for a figure title or super title, fig.suptitle:
fig.suptitle('this is the figure title', fontsize=12)
In your case you can easily get the figure with plt.gcf(), so try
plt.gcf().suptitle("pc1")
The rest of the information in the header would be called a legend.
For the following let's suppose all subplots have the same markers. It would then suffice to create a legend for one of the subplots.
To create legend labels, you can pass the label argument to the plot call, i.e.
axs.plot( ... , label="MRP")
When later calling axs.legend() a legend will automatically be generated with the respective labels. Ways to position the legend are detailed e.g. in this answer.
Here, you may want to place the legend in terms of figure coordinates, i.e.
axs.legend(loc="lower center",bbox_to_anchor=(0.5,0.8),bbox_transform=plt.gcf().transFigure)
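Putting the two pieces together, a minimal sketch on a 3x3 grid might look like the following. The histograms are random data, the marker positions are arbitrary, and the anchor `(0.5, 0.9)` is an assumed value that will likely need tuning for a real figure; only the names "pc1", "MRP" and "XRD" come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
fig, axes = plt.subplots(3, 3, figsize=(8, 8))

# Test names mapped to markers; the marker choice here is hypothetical
tests = {"MRP": "s", "XRD": "o"}

for ax in axes.flat:
    ax.hist(rng.normal(size=200), bins=20, color="lightgray", edgecolor="k")
    for offset, (name, marker) in enumerate(tests.items()):
        # One labelled marker per test so the legend has an entry for it
        ax.plot([0.0], [5 + 5 * offset], marker, color="black", markersize=4, label=name)

# Figure-level title acting as the header
fig.suptitle("pc1", fontsize=12)

# Labels are the same in every subplot, so one legend is enough;
# it is positioned in figure coordinates rather than axes coordinates
legend = axes.flat[0].legend(
    loc="lower center",
    bbox_to_anchor=(0.5, 0.9),
    bbox_transform=fig.transFigure,
    ncol=len(tests),
)
fig.savefig("header_demo.png")
```

When saving through PdfPages as in the question, the same figure could be passed to `pp.savefig(fig)` instead of calling `fig.savefig`.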