How to plot specific rows of qualitative data using matplotlib in Python?

I have a large spreadsheet of data that I cannot share for privacy reasons. It has a column called 'origin' that holds company names, and each name appears in hundreds of rows. For example, 500 rows of information have been entered for 500 people working at "Sony". I want to make graphs of the information gathered for each institution, but I am having trouble plotting only specific rows. The goal is to build a dashboard for each institution.
A way of putting this would be (pseudocode, not valid Python):
fig = px.scatter(df, x='gender'['female'], y='race',
                 color='origin'['Sony'])
fig.update_traces(mode='markers+lines')
fig.show()
I want to focus on particular categories when plotting.
Any help is appreciated!

Related

TIBCO Spotfire: How to create a list of unique values from a data table to build a new data table

I'm fairly new to Spotfire, and I need to create some bar and line charts like the ones I made with Python and matplotlib:
bar chart from python
line chart from python
To make those graphs, I created a set of unique values for the x axis containing the different story sprints (see Jira and the agile method for more information), and I created three lists (begin, planned, and end) holding all the business values for each sprint. Then I built a pandas DataFrame from these four lists and used its columns with matplotlib to draw the graphs (the second graph shows the cumulative sum of the begin and end business values per sprint).
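For reference, the pandas preparation described above might look like the minimal sketch below. It assumes a hypothetical story-level table (the real data is confidential) with one row per story and the columns `begin_sprint`, `planned_sprint`, `end_sprint` and `business_value`; those names and the helper `per_sprint_table` are illustrative, not taken from the question. The unique sprints become the shared x axis, and reindexing each per-status sum onto that axis is what lets begin and end line up even when they cover different sprints.

```python
import pandas as pd

# Hypothetical story-level data: the sprint each story began in, was planned
# for, and ended in, plus its business value.
stories = pd.DataFrame({
    "begin_sprint":   ["S1", "S1", "S2", "S3"],
    "planned_sprint": ["S1", "S2", "S2", "S3"],
    "end_sprint":     ["S2", "S2", "S3", None],  # last story not finished yet
    "business_value": [5, 3, 8, 2],
})


def per_sprint_table(stories: pd.DataFrame) -> pd.DataFrame:
    """One row per unique sprint with summed business values per status."""
    columns = {"begin": "begin_sprint", "planned": "planned_sprint", "end": "end_sprint"}
    # Unique x-axis values: every sprint that appears in any of the three columns.
    all_sprints = pd.concat([stories[c] for c in columns.values()]).dropna()
    sprints = sorted(all_sprints.unique())
    table = pd.DataFrame(index=pd.Index(sprints, name="sprint"))
    for name, col in columns.items():
        # groupby skips missing sprints (None); reindex gives 0 to sprints with no stories.
        table[name] = (stories.groupby(col)["business_value"].sum()
                       .reindex(sprints, fill_value=0))
    # Cumulative sums for the line chart.
    table["begin_cum"] = table["begin"].cumsum()
    table["end_cum"] = table["end"].cumsum()
    return table


print(per_sprint_table(stories))
```

Note that `sorted` orders sprint names as text, so names like "S10" would sort before "S2"; a numeric sprint key avoids that.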
My question is: is it possible to create a list of unique values for the x axis in Spotfire, and how do I create a data table from another data table, as I did for the Python graphs? For now, all I can get in Spotfire is this:
bar and line charts from spotfire
I already tried merging the charts of the same category to get the same result, but the two x axes (begin and end) do not have the same number of values, and I get errors when I try. If anyone has a solution to this problem, it would be great.
PS: I can't share any data files because I work for a company and some of the data may be confidential. Sorry for any mistakes; I'm French.

GeoPandas: Plot two GeoDataFrames on top of each other on a map

I am new to GeoPandas and to plotting maps from GeoDataFrames. I have two GeoDataFrames for the same city, but they come from different sources: one contains the geometry of the houses and the other the census tracts. I want to plot the house boundaries on top of the tract boundaries.
Below is the first row from each data set. I am also not sure why the Geometry Polygon values are on such a different scale in each of these datasets.
Houses Data Set
House Data
Tract Data Set
Tract Data
I tried the following code in a Jupyter Notebook, but nothing shows up:
import matplotlib.pyplot as plt

f, ax = plt.subplots()
tract_data.plot(ax=ax)   # census tract polygons
house_data.plot(ax=ax)   # house polygons, drawn on the same axes
All I get is an empty plot.
This is my first post. Please let me know what else I can provide.
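You probably need to put both layers in the same coordinate reference system (CRS). That would likely also explain why the polygon coordinates are on such different scales: the two datasets are stored in different CRSs (for example, one in latitude/longitude degrees and the other in projected units such as metres), so the two layers land in completely different regions of the plot.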
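An easy fix might be to reproject the tracts into the houses' CRS before plotting:
import matplotlib.pyplot as plt

f, ax = plt.subplots()
# Reproject the tracts so both layers share the same coordinates
# (if tract_data.crs is None, assign the correct CRS with set_crs() first)
tract_data.to_crs(house_data.crs).plot(ax=ax)
house_data.plot(ax=ax)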
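A slightly more defensive version of the same idea, as a minimal sketch: check that both layers actually carry a CRS (reprojection is impossible otherwise), reproject the bottom layer into the top layer's CRS, and draw only the tract outlines so the houses stay visible. The helper name `overlay_layers` and the toy geometries are hypothetical; `to_crs`, `boundary` and `plot` are standard GeoPandas calls.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box


def overlay_layers(bottom, top, ax=None):
    """Draw `bottom` outlines under `top`, after reprojecting `bottom` to top's CRS."""
    if bottom.crs is None or top.crs is None:
        # to_crs() needs to know the source CRS; assign it with .set_crs() first.
        raise ValueError("both layers need a CRS before they can be aligned")
    if ax is None:
        _, ax = plt.subplots()
    bottom.to_crs(top.crs).boundary.plot(ax=ax, color="gray", linewidth=0.5)
    top.plot(ax=ax, color="red")
    return ax


# Toy data: a "tract" in lon/lat degrees and a "house" in Web Mercator metres.
tracts = gpd.GeoDataFrame(geometry=[box(-0.01, -0.01, 0.01, 0.01)], crs="EPSG:4326")
houses = gpd.GeoDataFrame(geometry=[box(-100, -100, 100, 100)], crs="EPSG:3857")
ax = overlay_layers(tracts, houses)
```

With the question's data this would be called as `overlay_layers(tract_data, house_data)`. Without the reprojection, the toy tract would sit around ±0.01 while the house spans ±100, which is the same kind of scale mismatch the question describes.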

Plotting a large DataFrame

I have a large DataFrame to plot (a couple of million rows and 8 columns, obtained by concatenating several files).
I want to plot several faceted graphs to get a complete view of the data:
import seaborn as sns

rp = sns.relplot(data=df,
                 x='zscore',
                 y='%',
                 col='Nr',
                 row="Support",
                 style="Metal",
                 kind='line')
I tried both Seaborn and Plotly Express, but building these graphs takes far too long: more than an hour on my laptop.
What can I improve or optimize to speed up graph creation?
Thank you!
PS: I am a newbie in Python and programming ;)
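One likely reason for the slowness (an assumption, since no profiling is shown): with `kind='line'`, seaborn aggregates all y values that share an x value and by default bootstraps a confidence interval around each mean, which is expensive on millions of rows. A minimal sketch of a workaround is to pre-aggregate with pandas so only one row per facet, style and (rounded) x value reaches seaborn, and to turn the error band off. The helper names `preaggregate` and `plot_faceted` and the rounding step are hypothetical choices; `errorbar=None` needs seaborn 0.12 or later (older versions use `ci=None`).

```python
import pandas as pd
import seaborn as sns


def preaggregate(df: pd.DataFrame, x="zscore", y="%",
                 facets=("Nr", "Support", "Metal"), decimals=2) -> pd.DataFrame:
    """Collapse the raw rows to one mean y per facet combination and rounded x."""
    keys = list(facets) + [x]
    # Rounding x bins nearby z-scores together; pick `decimals` to taste.
    binned = df.assign(**{x: df[x].round(decimals)})
    return binned.groupby(keys, as_index=False)[y].mean()


def plot_faceted(df: pd.DataFrame):
    small = preaggregate(df)
    # errorbar=None skips the bootstrapped confidence interval that
    # kind="line" computes by default.
    return sns.relplot(data=small, x="zscore", y="%", col="Nr", row="Support",
                       style="Metal", kind="line", errorbar=None)
```

Rounding trades some x resolution for speed; with millions of points per facet the lines are usually indistinguishable at screen resolution, but the `decimals` value is a judgment call.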

Grouped boxplots from multiple DataFrames

I have tried to create a scatter plot with grouped boxplots like the ones in the following links:
matplotlib: Group boxplots
https://cmdlinetips.com/2019/03/how-to-make-grouped-boxplots-in-python-with-seaborn/
how to make a grouped boxplot graph in matplotlib
However, my data comes in the following format:
5y_spreads
7y_spreads
10y_spreads
(each of the images above comes from a different worksheet in the same workbook)
I need to reshape the data in Python to make it ready for seaborn, and that is the part I find difficult.
It is not structured as in the examples from the links. I understand this requires mastering dataframes (something I am still learning).
I also need to show the latest value to see where the bonds are trading now, compared to the range.
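A minimal sketch of the reshaping step, assuming each worksheet has a `Date` column plus one column of spreads per bond (the real layout is only visible in the images, so these column names are assumptions). `pd.read_excel(..., sheet_name=None)` returns a dict of all worksheets; each one is melted into long format, tagged with its tenor, and stacked, which is the shape seaborn's `hue` grouping expects. The latest observation per bond and tenor is then drawn over the boxes. The helpers `to_long`, `latest_values` and `plot_spreads` and the file name are hypothetical; the `legend` argument of `stripplot` needs seaborn 0.12 or later.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def to_long(sheets: dict, date_col="Date") -> pd.DataFrame:
    """Stack {sheet_name: wide DataFrame} into one long DataFrame for seaborn."""
    frames = []
    for name, wide in sheets.items():
        long = wide.melt(id_vars=date_col, var_name="bond", value_name="spread")
        long["tenor"] = name.replace("_spreads", "")  # "5y_spreads" -> "5y"
        frames.append(long)
    return pd.concat(frames, ignore_index=True)


def latest_values(long: pd.DataFrame, date_col="Date") -> pd.DataFrame:
    """Most recent non-missing spread for every (bond, tenor) pair."""
    idx = long.dropna(subset=["spread"]).groupby(["bond", "tenor"])[date_col].idxmax()
    return long.loc[idx]


def plot_spreads(long: pd.DataFrame, date_col="Date"):
    bonds = list(long["bond"].unique())
    tenors = list(long["tenor"].unique())
    fig, ax = plt.subplots()
    sns.boxplot(data=long, x="bond", y="spread", hue="tenor",
                order=bonds, hue_order=tenors, ax=ax)
    # Same order/hue_order so the dodged diamonds should sit on their matching boxes.
    sns.stripplot(data=latest_values(long, date_col), x="bond", y="spread",
                  hue="tenor", order=bonds, hue_order=tenors, dodge=True,
                  jitter=False, marker="D", palette=["black"] * len(tenors),
                  legend=False, ax=ax)
    return ax


# Usage (file name is hypothetical):
# sheets = pd.read_excel("spreads.xlsx", sheet_name=None)
# plot_spreads(to_long(sheets))
```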

Matplotlib legend shows the same data label multiple times

I am plotting some data from a CSV file. I recently edited the date range in the CSV file, but the values are the same. Before, this data was plotted together with two other data sets, and the legend had only one entry for it. After editing the CSV file, the legend now shows the label three times, although the data itself is plotted correctly. I have tried removing the other two data sets from the plot, using numpoints=1, and making sure nothing is inside a for loop (none of this code uses one). I also checked that there aren't three versions of the data saved in the same directory. Any suggestions on why this is happening and how to fix it? Here is my plotting code in case something in it is wrong:
import matplotlib.pyplot as plt

plt.plot(date_range, ice_extent1, color='red', label='MASIE')
plt.xlabel("Date (yyyy/mm)")
plt.ylabel("Sea Ice Extent (10^6 km^2)")
plt.title("Sea Ice Extent")
plt.legend()
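One common cause, offered as a guess because the CSV-loading code is not shown: if `ice_extent1` ends up 2-D (for example, a DataFrame slice or array with three columns), `plt.plot` draws one line per column and gives every line the same label, so the legend lists "MASIE" once per column. Identical columns would also overlap exactly, which matches the data looking correct. The minimal sketch below reproduces that with hypothetical data; the helper `legend_entry_count` is illustrative. Checking `np.shape(ice_extent1)` on the real data would confirm or rule this out, and selecting a single column (e.g. a hypothetical `df['extent']`) would fix it.

```python
import numpy as np
import matplotlib.pyplot as plt


def legend_entry_count(x, y, label="MASIE"):
    """Plot y against x with one label and report how many legend entries result."""
    fig, ax = plt.subplots()
    ax.plot(x, y, color="red", label=label)
    n_entries = len(ax.legend().get_texts())
    plt.close(fig)
    return n_entries


dates = np.arange(5)
# Hypothetical 2-D extent array: 5 rows x 3 identical columns.
extent_2d = np.column_stack([np.linspace(10.0, 12.0, 5)] * 3)

print(np.shape(extent_2d))                          # (5, 3)
print(legend_entry_count(dates, extent_2d))         # 3: one line per column
print(legend_entry_count(dates, extent_2d[:, 0]))   # 1: a single column
```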
