How can I create a bar chart like the one below for the following question using Python?
Chart
Question: plot job roles by the number of individuals who have a master's degree AND earn <=50K in the United States
My table is below; I imported it from a CSV file.
Degree    Job role  Country  Earning
Bachelor  Admin     US       <=50k
Master    HR        England  >=50k
I tried many approaches but could not get it to work.
You can import matplotlib.pyplot as plt and numpy as np to make a bar graph. Put the bar heights in an array, for example y = np.array([...]), and call plt.bar() with the x positions (or labels) and y as the heights.
Finally, call plt.show() to display the graph.
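The counting step for this particular question could be sketched as follows. The sample rows are hypothetical, and the column names ("Degree", "Job role", "Country", "Earning") are assumptions based on the table in the question; adjust them to match the real CSV.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows; column names are assumed from the question's table.
df = pd.DataFrame({
    "Degree": ["Bachelor", "Master", "Master", "Master", "Master"],
    "Job role": ["Admin", "HR", "HR", "Admin", "HR"],
    "Country": ["US", "England", "US", "US", "US"],
    "Earning": ["<=50k", ">=50k", "<=50k", "<=50k", "<=50k"],
})

# Keep only master's degree holders in the US who earn <=50k.
mask = (
    (df["Degree"] == "Master")
    & (df["Country"] == "US")
    & (df["Earning"] == "<=50k")
)

# Count how many matching individuals there are per job role.
counts = df.loc[mask, "Job role"].value_counts()

# One bar per job role, with the count as the bar height.
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_xlabel("Job role")
ax.set_ylabel("Number of individuals")
```

Calling plt.show() afterwards displays the figure.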
I have plotted a figure from a shapefile using geopandas; it shows Germany with each federal state represented as a cluster. Now I would like to color the region with the highest electricity production and the one with the lowest, each in a different color.
I am new to geopandas and couldn't find an answer. I guess I have to create a subplot, but I'm not sure. How can I select and color each cluster? (I have already determined the name and electricity production of the max and min clusters.)
Thanks a lot in advance.
The type of plot you are trying to create is called a choropleth map. Create a data frame with the name of the federal state, the electricity production of the max and min clusters, and the geometry of the state extracted from the shapefile. Doing this will draw only those 2 states, in different colors; if you also want the rest of Germany on the map, include the geometry of all the states and set their electricity production to NaN.
Then you can call plot() on the data frame with column='electricity_production':
df.head()
>> name, electricity_production, geometry
'cluster1', max/min, MULTIPOLYGON(((...
'cluster2', max/min, POLYGON((..
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
df.plot(column='electricity_production', ax=ax, legend=True)
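The "set the other states to NaN" step could look roughly like this. It is a pandas-only sketch with made-up state names and production values; the "highlight" column name is an assumption, and a real GeoDataFrame would also carry the geometry column.

```python
import pandas as pd

# Hypothetical state names and production values, for illustration only.
states = pd.DataFrame({
    "name": ["cluster1", "cluster2", "cluster3", "cluster4"],
    "electricity_production": [120.0, 45.0, 300.0, 80.0],
})

# Row labels of the highest and the lowest producer.
extremes = [
    states["electricity_production"].idxmax(),
    states["electricity_production"].idxmin(),
]

# Keep the value for the two extremes and set every other state to NaN.
is_extreme = states.index.isin(extremes)
states["highlight"] = states["electricity_production"].where(is_extreme)
```

With the same column on a GeoDataFrame, plot(column='highlight') would color only the two extreme states. Recent geopandas versions also accept a missing_kwds argument to control how the NaN states are drawn.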
Python newbie here. I'm looking at some daily weather data for a couple of cities over the course of a year. Each city has its own csv file. I'm interested in comparing the count of daily average temperatures between two cities in a bar graph, so I can see (for example) how often the average temperature in Seattle was 75 degrees (or 30 or 100) compared to Phoenix.
I'd like a bar graph with side-by-side bars, with temperature on the x-axis and count on the y-axis. I've been able to get a bar graph of each city separately with this data, but I don't know how to get both cities on the same bar chart with a different color for each city. It seems like it should be pretty simple, but hours of searching haven't gotten me a good answer yet.
Suggestions please, oh wise stackoverflow mentors?
Here's what I've got so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("KSEA.csv")
df2 = pd.read_csv("KPHX.csv")
df["actual_mean_temp"].value_counts(sort=False).plot(kind="bar")
df2["actual_mean_temp"].value_counts(sort=False).plot(kind="bar")
You can concat DataFrames, assigning city as a column, and then use histplot in seaborn:
import seaborn as sns
z = pd.concat([
    df[['actual_mean_temp']].assign(city='KSEA'),
    df2[['actual_mean_temp']].assign(city='KPHX'),
])
ax = sns.histplot(data=z, x='actual_mean_temp', hue='city',
                  multiple='dodge', binwidth=1)
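If seaborn is not available, a similar side-by-side chart can probably be built with pandas alone by counting each city's temperatures and plotting the two counts as columns. This is only a sketch: the two Series below are hypothetical stand-ins for df["actual_mean_temp"] and df2["actual_mean_temp"].

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical temperatures standing in for the two CSV columns.
sea = pd.Series([50, 50, 55, 60, 60, 60])
phx = pd.Series([60, 75, 75, 100, 100, 100])

# Count how often each temperature occurs per city, aligned on temperature.
# Temperatures that never occur in one city get a count of 0.
counts = pd.concat(
    [sea.value_counts(), phx.value_counts()],
    axis=1,
    keys=["KSEA", "KPHX"],
).fillna(0).sort_index()

# DataFrame.plot draws one bar per column for each index value, side by side.
ax = counts.plot(kind="bar")
ax.set_xlabel("actual_mean_temp")
ax.set_ylabel("count")
```

Unlike histplot with binwidth=1, this only shows temperatures that occur in at least one city, so gaps in the data do not appear on the x-axis.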
I want to create a plot that shows the geographical distribution of nightly prices using longitude and latitude as coordinates, with the price encoded by both the color and the size of the circles. I currently have no idea how to encode the points by price in both color and size. I come to you in search of help. I don't understand the seaborn documentation for this scenario.
3 columns of interest:
longitude   latitude   Price
50.1235156  4.1236436  160
52.3697862  4.8935462  300
52.3640489  4.8895343  8000
52.3729765  4.8931707  1300
52.3657530  4.8796741  5000
52.2957663  4.3058365  60
52.6709324  4.6028347  100
In my scenario, each column is of equal length, but I only want to include prices that are >150.
I'm stuck with this filter in play, as the filtered column is half the size of the longitude and latitude columns.
My clueless attempt:
plt.scatter(df.longitude, df.latitude, s=(df.price)>150, c= (df.price)>150)
The way I understand it is that the latitude and longitude create the space/plane, and then apply the price data. But implementing it seems to work differently?
First of all, you need to filter the dataframe before plotting. The expression s=(df.price)>150 does not filter anything: it produces a boolean Series (True/False for each row), so the sizes and colors are no longer prices. And if you filter only the price column, the Series for size and color will be shorter than the x and y coordinate Series, which still span the whole dataframe.
Secondly, matplotlib can do this too (scatter accepts numeric s= and c= arguments), but you have to build the legend or colorbar yourself, so I'd suggest using seaborn for simplicity.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
# Keep only the rows with a price above 150
df_plot = df.loc[df.price > 150]
# scatterplot returns a matplotlib Axes, not a Figure
ax = sns.scatterplot(data=df_plot, x='longitude', y='latitude', size='price', hue='price')
plt.show()
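For comparison, matplotlib's scatter accepts numeric values for both s (marker area) and c (color), so a plain matplotlib version of the same idea might look like this. The rows are the sample from the question; the lowercase column names follow the question's code, and the size divisor and colormap are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows from the question; column names follow the question's code.
df = pd.DataFrame({
    "longitude": [50.1235156, 52.3697862, 52.3640489, 52.3729765,
                  52.3657530, 52.2957663, 52.6709324],
    "latitude": [4.1236436, 4.8935462, 4.8895343, 4.8931707,
                 4.8796741, 4.3058365, 4.6028347],
    "price": [160, 300, 8000, 1300, 5000, 60, 100],
})

# Filter rows first so x, y, size, and color all have the same length.
df_plot = df.loc[df["price"] > 150]

fig, ax = plt.subplots()
points = ax.scatter(
    df_plot["longitude"], df_plot["latitude"],
    s=df_plot["price"] / 20,   # marker area; the divisor is an arbitrary scale
    c=df_plot["price"],        # numeric values mapped through the colormap
    cmap="viridis",
)
# The colorbar acts as the legend for the price-to-color mapping.
fig.colorbar(points, ax=ax, label="price")
```

Because the whole row is filtered, the coordinates and the prices stay aligned, which avoids the length mismatch described in the question.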
I have a pandas dataframe that I want to create a bar plot from using Seaborn. The problem is that I want to use one of two categorical variables, say column A, on the x-axis, but a different categorical column, say column B, to color the bars. A value in B can correspond to more than one value in A.
MajorCategories   name                            review_count
Food,Restaurants  Mon Ami Gabi                    8348
Food,Restaurants  Bacchanal Buffet                8339
Restaurants       Wicked Spoon                    6708
Food,Restaurants  Hash House A Go Go              5763
Restaurants       Gordon Ramsay BurGR             5484
Restaurants       Secret Pizza                    4286
Restaurants       The Buffet at Bellagio          4227
Hotels & Travel   McCarran International Airport  3627
Restaurants       Yardbird Southern Table & Bar   3576
So, I would like my barplot to plot the bars with x='name' and y='review_count', and at the same time color them (hue?) by MajorCategories. Is it possible in Seaborn without many lines of code?
Below are the links to the images I get in seaborn, and the one I am trying to get.
sns.catplot(x="review_count", y="name", kind="bar", data=plot_data, aspect= 1.5)
Plot I get using seaborn using the code above
Plot I am trying to achieve, this one is using ggplot2 in R
Try passing hue and setting dodge=False:
sns.catplot(x="review_count", y="name", hue='MajorCategories',
            kind="bar", data=plot_data,
            dodge=False, aspect=1.5)
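The same coloring can also be sketched in plain matplotlib by mapping each category to a color and passing one color per bar. This is an alternative technique, not what catplot does internally; it uses three rows from the question's table, and the tab10 palette is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# A few rows from the question's table.
plot_data = pd.DataFrame({
    "MajorCategories": ["Food,Restaurants", "Restaurants", "Hotels & Travel"],
    "name": ["Mon Ami Gabi", "Wicked Spoon", "McCarran International Airport"],
    "review_count": [8348, 6708, 3627],
})

# Assign one color per category; the palette choice is arbitrary.
categories = plot_data["MajorCategories"].unique()
palette = dict(zip(categories, plt.get_cmap("tab10").colors))
bar_colors = plot_data["MajorCategories"].map(palette)

fig, ax = plt.subplots()
ax.barh(plot_data["name"], plot_data["review_count"], color=list(bar_colors))
ax.invert_yaxis()  # put the first row at the top
# Build the legend by hand, one patch per category.
ax.legend(handles=[Patch(color=c, label=k) for k, c in palette.items()])
```

Bars that share a category share a color, regardless of how many names fall under it.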
I have a dataframe which has a column named Genre. Each genre has multiple movie values. The format is given below:
Movie_val Genre
2 Fantasy
11 Adventure
12 Comedy
2 Fantasy
2 Adventure
11 Adventure
13 Thriller
12 Fantasy
10 Thriller
11 Drama
1 Fantasy
I need to group by genre based on Movie_val and plot each group in a scatter plot like a cluster (e.g., Action genre movies in one cluster or color, Adventure in another, etc.). I checked the matplotlib library, and it expects two values, X and Y, for a cluster graph. My group-by result will have a lot of movie values (e.g., the Adventure genre has many values, and I am not sure how to plot the values as a group).
Also each of these group_by values should be represented in different color.
I tried the code below for a bar plot. But I am looking for a scatter plot, and this format doesn't allow for one.
result = df.groupby(['Genre'])['Movie_val'].quantile(0.5)
result.sort_values().plot(kind='barh')
I am trying this in python using pandas library. Any help would be greatly appreciated.
The seaborn library can probably give you what you're after. Of course you still need to pick which columns of your data frame will provide the coordinates for the scatter plot.
import matplotlib.pyplot as plt
import seaborn as sns
# 'height' replaced the older 'size' argument in seaborn 0.9
g = sns.FacetGrid(df, hue="Genre", height=5)
g.map(plt.scatter, "column name for x dimension", "column name for y dimension", s=50, alpha=.7)
g.add_legend()
See also the examples with more complex faceting here:
https://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html
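A groupby-based version in plain matplotlib could be sketched as below, using the rows from the question. Since the table has only one numeric column, the row index is used as the x coordinate here; that is an assumption for illustration, and any second numeric column could take its place.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rows from the question's table.
df = pd.DataFrame({
    "Movie_val": [2, 11, 12, 2, 2, 11, 13, 12, 10, 11, 1],
    "Genre": ["Fantasy", "Adventure", "Comedy", "Fantasy", "Adventure",
              "Adventure", "Thriller", "Fantasy", "Thriller", "Drama",
              "Fantasy"],
})

fig, ax = plt.subplots()
# One scatter call per genre; matplotlib moves to the next color each time.
# Assumption: the row index serves as the x coordinate.
for genre, group in df.groupby("Genre"):
    ax.scatter(group.index, group["Movie_val"], label=genre, s=50, alpha=0.7)
ax.set_xlabel("row index")
ax.set_ylabel("Movie_val")
ax.legend(title="Genre")
```

Each genre ends up as its own collection of points with its own legend entry, which is the effect the hue argument gives in the FacetGrid call.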