How to plot histogram from two columns - python

I have a csv file containing two columns. What I'd like to do is to plot a histogram based on these two columns.
My code is as follows:
import pandas as pd
data = pd.read_csv('data.csv')
My csv data is made like this:
Age Blood Pressure
51 120
.. ...
I tried plt.hist(data['Age'], bins=10), which only gives me a histogram of the first column and its frequencies; the same goes for the second column. Is there a way to plot a histogram that shows "Age" on the x-axis and "Blood Pressure" on the y-axis?

Maybe you could use a bar chart instead.
Something like this should work, assuming matplotlib.pyplot is imported as plt:
plt.bar(data['Age'], data['Blood Pressure'], align='center')
plt.xlabel('Age')
plt.ylabel('Blood Pressure')
plt.title('Bar Chart')
plt.show()
More about Bar charts: https://pythonspot.com/matplotlib-bar-chart/
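A bar drawn for every row can be hard to read when many rows share similar ages. If the goal is to see how blood pressure varies with age, one possible alternative is to group the ages into bins and plot the mean blood pressure per bin. This is only a sketch: the sample values, the bin edges, and the "Age group" column name are assumptions for illustration and do not come from the question's file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample standing in for data.csv (assumption, not the asker's data)
data = pd.DataFrame({
    "Age": [51, 34, 47, 62, 29, 55, 41, 68],
    "Blood Pressure": [120, 110, 125, 140, 105, 130, 118, 145],
})

# Hypothetical age bins; each interval is open on the left and closed on the right
age_bins = [20, 30, 40, 50, 60, 70]
data["Age group"] = pd.cut(data["Age"], bins=age_bins)

# Mean blood pressure within each age bin
mean_bp = data.groupby("Age group", observed=False)["Blood Pressure"].mean()

fig, ax = plt.subplots()
ax.bar(mean_bp.index.astype(str), mean_bp.values)
ax.set_xlabel("Age group")
ax.set_ylabel("Mean blood pressure")
ax.set_title("Mean blood pressure by age group")
# In an interactive session, call plt.show() here to display the figure.
```

Binning first keeps the x-axis readable and gives one bar per age range rather than one bar per patient.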

Related

Single Stacked Bar Chart Matplotlib

I am struggling to get a single stacked bar chart using matplotlib.
I want to create something like this:
Horizontal Stacked Bar Chart
However, even if I use df.plot.barh(stacked=True, ax=axes_var, legend=False) I get two separate bars. My data frame currently looks like this:
Percentage
Female 42.9
Male 57.1
Any advice would be appreciated.
First transpose the one-column DataFrame, so that each row becomes a column and the values stack into a single bar:
df.T.plot.barh(stacked=True, legend=False)
If the DataFrame has two or more columns, select the one you want first:
df[['Percentage']].T.plot.barh(stacked=True, legend=False)
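The transpose approach can be checked end to end with a small self-contained sketch. The DataFrame below is rebuilt from the two values in the question; the figure size and the `segments` variable are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the one-column frame from the question
df = pd.DataFrame({"Percentage": [42.9, 57.1]}, index=["Female", "Male"])

fig, ax = plt.subplots(figsize=(6, 1.5))

# After transposing, "Female" and "Male" are columns, so stacked=True
# places them end to end in one horizontal bar.
df.T.plot.barh(stacked=True, legend=False, ax=ax)

# Each segment is a rectangle: (left edge, width)
segments = [(p.get_x(), p.get_width()) for p in ax.patches]
# In an interactive session, call plt.show() here to display the figure.
```

The second segment starts where the first one ends, which is what makes this a single stacked bar rather than two separate bars.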

How to plot a barchart showing frequency count of various buckets made out of a dataframe column?

I have the DataFrame attached below and I need to plot a bar chart showing runs scored on the x-axis and frequency/count on the y-axis.
I tried this code, but it does not display the correct results:
bins = [0,10,20,30,40]
plt.hist(df.Runs, bins, histtype='bar')
plt.xlabel('x')
plt.ylabel('y')
I am getting the graph below:
The plot I expect:
You need to clean the data first: drop the 'DND' and 'TDNB' rows from Runs so that the column contains only numbers.
Then import seaborn and plot the graph, passing the bin edges by keyword.
Answer:
import seaborn as sns
bins = [0, 10, 20, 30, 40]
sns.displot(df.Runs, bins=bins)
plt.show()
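The cleaning step is described above only in words. A minimal sketch of it, using pandas and matplotlib alone rather than seaborn, might look like the following. The sample values in `Runs` are made up, and treating every non-numeric entry (such as 'DND' or 'TDNB') as missing is an assumption about the data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample; the real column is assumed to mix numbers with 'DND'/'TDNB'
df = pd.DataFrame({"Runs": ["12", "DND", "5", "33", "TDNB", "27", "8", "19", "40", "0"]})

# Non-numeric entries become NaN and are then dropped
runs = pd.to_numeric(df["Runs"], errors="coerce").dropna()

bins = [0, 10, 20, 30, 40]
# Same binning rule as plt.hist: each bin is [low, high), the last one is [low, high]
counts, _ = np.histogram(runs, bins=bins)
labels = [f"{low}-{high}" for low, high in zip(bins[:-1], bins[1:])]

fig, ax = plt.subplots()
ax.bar(labels, counts)
ax.set_xlabel("Runs scored")
ax.set_ylabel("Frequency")
# In an interactive session, call plt.show() here to display the figure.
```

Counting with `np.histogram` and drawing with `ax.bar` gives one labelled bar per bucket, which matches the "bar chart of buckets" wording in the question.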

Plot histogram from two columns of csv using pandas

I have a csv file containing two columns. What I'd like to do is to plot a histogram based on these two columns.
My code is as follows:
import pandas as pd
data = pd.read_csv('data.csv')
My csv data is made like this:
Age Blood Pressure
51 120
.. ...
I tried plt.hist(data['Age'], bins=10), which only gives me a histogram of the first column and its frequencies; the same goes for the second column.
Is there a way to plot a histogram that shows "Age" on the x-axis and "Blood Pressure" on the y-axis?
If it actually makes sense for you, you can change the orientation of the second plot:
plt.hist(data['Age'], bins=10, alpha=.5)
plt.hist(data['Blood Pressure'], bins=10, alpha=.5, orientation='horizontal')
plt.show()
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.bar(data['Age'], data['Blood Pressure'])
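If what is wanted is the joint distribution of the two columns, with Age on the x-axis and Blood Pressure on the y-axis, a two-dimensional histogram is another option worth considering. The sketch below uses matplotlib's `hist2d`; the sample values, the choice of 5 bins, and the colorbar label are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample standing in for data.csv (assumption, not the asker's data)
data = pd.DataFrame({
    "Age": [51, 34, 47, 62, 29, 55, 41, 68, 53, 36],
    "Blood Pressure": [120, 110, 125, 140, 105, 130, 118, 145, 128, 112],
})

fig, ax = plt.subplots()
# Each cell's colour encodes how many rows fall into that (age, pressure) bin
counts, age_edges, bp_edges, image = ax.hist2d(
    data["Age"], data["Blood Pressure"], bins=5
)
fig.colorbar(image, ax=ax, label="Number of rows")
ax.set_xlabel("Age")
ax.set_ylabel("Blood Pressure")
# In an interactive session, call plt.show() here to display the figure.
```

Unlike the overlaid one-dimensional histograms, every row contributes to exactly one cell here, so the plot shows how the two variables occur together.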

I want to create a pie chart using a dataframe column in python

I want to create a pie chart using a single column of my dataframe; say the column name is 'Score'. I have stored scores in this column as below:
Score
.92
.81
.21
.46
.72
.11
.89
Now I want to create a pie chart showing the percentage of scores in each range.
Say 0-0.4 is 30%, 0.4-0.7 is 35%, and 0.7+ is 35%.
I am using the code below:
df1['bins'] = pd.cut(df1['Score'],bins=[0,0.5,1], labels=["0-50%","50-100%"])
df1 = df.groupby(['Score', 'bins']).size().unstack(fill_value=0)
df1.plot.pie(subplots=True,figsize=(8, 3))
With the above code I get a pie chart, but I don't know how to do this using percentages.
My pie chart looks like this for now:
Cutting the dataframe up into bins is the right first step. After which, you can use value_counts with normalize=True in order to get relative frequencies of values in the bins column. This will let you see percentage of data across ranges that are defined in the bins.
In terms of plotting the pie chart, I'm not sure if I understood correctly, but it seemed like you would like to display the correct legend values and the percentage values in each slice of the pie.
The pandas.DataFrame.plot documentation lists all the parameters that can be passed to the plot method. You can specify which columns to use for x and y, and by default the index is used for the labels in the pie plot.
To show the percentage values per slice, you can use the autopct parameter as well. As mentioned in this answer, you can use all the normal matplotlib plt.pie() flags in the plot method as well.
Bringing everything together, this is the resultant code and the resultant chart:
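import pandas as pd
df = pd.DataFrame({'Score': [0.92,0.81,0.21,0.46,0.72,0.11,0.89]})
df['bins'] = pd.cut(df['Score'], bins=[0,0.4,0.7,1], labels=['0-0.4','0.4-0.7','0.7-1'], right=True)
# Percentage of rows per bin. Plotting the Series directly avoids relying on the
# column name that value_counts produces, which differs between pandas versions.
bin_percent = df['bins'].value_counts(normalize=True) * 100
plot = bin_percent.plot.pie(figsize=(5, 5), autopct='%1.1f%%')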
Plot of Pie Chart

In pandas, how to properly label a Bar Chart with massive number of records?

I have a pandas Series with about 200 rows, each containing an integer count.
I am trying to plot the series on a bar graph, using the following line of code:
plt.figure(figsize=(40, 40))
country_wise_counts.plot(kind='bar', y='Number of Users', x='Country Name', subplots=False, legend = False, fontsize=12)
and I get a plot as follows:
which clearly is not helpful.
So my first question is:
Is it a sane attempt, trying to plot my data this way, when I have 200 separate values that I want to plot on a bar graph?
If yes, how do I do what I want to do?
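One possible way to keep such a chart legible, offered only as a sketch, is to plot just the largest values as horizontal bars and summarize the rest in a single number. The synthetic `country_wise_counts` Series, the cutoff of 20, and the variable names `top` and `other_total` are assumptions for illustration; the right cutoff depends on the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the 200-row Series in the question
country_wise_counts = pd.Series(
    range(1, 201),
    index=[f"Country {i}" for i in range(1, 201)],
    name="Number of Users",
)

top_n = 20  # hypothetical cutoff
top = country_wise_counts.nlargest(top_n)
other_total = country_wise_counts.sum() - top.sum()

fig, ax = plt.subplots(figsize=(8, 6))
# Reverse the order so the largest value is drawn at the top of the chart
ax.barh(top.index[::-1], top.values[::-1])
ax.set_xlabel("Number of Users")
ax.set_title(f"Top {top_n} countries (all others combined: {other_total})")
fig.tight_layout()
# In an interactive session, call plt.show() here to display the figure.
```

Horizontal bars leave room for long country names, and limiting the chart to the top entries avoids 200 overlapping tick labels.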
