Splitting value_counts() into axis [duplicate]

Consider my series as below: First column is article_id and the second column is frequency count.
article_id
1 39
2 49
3 187
4 159
5 158
...
16947 14
16948 7
16976 2
16977 1
16978 1
16980 1
Name: article_id, dtype: int64
I got this series from a dataframe with the following command:
logs.loc[logs['article_id'] <= 17029].groupby('article_id')['article_id'].count()
logs is the dataframe here and article_id is one of the columns in it.
How do I plot a bar chart (using Matplotlib) such that the article_id is on the X-axis and the frequency count is on the Y-axis?
My natural instinct was to convert it into a list using .tolist() but that doesn't preserve the article_id.

IIUC you need Series.plot.bar:
#pandas 0.17.0 and above
s.plot.bar()
#pandas below 0.17.0
s.plot('bar')
Sample:
import pandas as pd
import matplotlib.pyplot as plt
s = pd.Series({16976: 2, 1: 39, 2: 49, 3: 187, 4: 159,
5: 158, 16947: 14, 16977: 1, 16948: 7, 16978: 1, 16980: 1},
name='article_id')
print(s)
1 39
2 49
3 187
4 159
5 158
16947 14
16948 7
16976 2
16977 1
16978 1
16980 1
Name: article_id, dtype: int64
s.plot.bar()
plt.show()
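To tie this back to the question: .tolist() drops the index, but the Series itself keeps the article_id values in its index and the counts in its values, so both can be handed to Matplotlib directly. Here is a minimal sketch, assuming a tiny made-up logs frame (the data is hypothetical, only the column name and the filter come from the question):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the asker's `logs` dataframe.
logs = pd.DataFrame({'article_id': [1, 1, 2, 3, 3, 3, 20000]})

# Same filter as in the question. value_counts() followed by sort_index()
# yields the same counts as groupby('article_id')['article_id'].count().
counts = logs.loc[logs['article_id'] <= 17029, 'article_id'].value_counts().sort_index()

fig, ax = plt.subplots()
# The index holds the article ids, the values hold the frequencies.
ax.bar(counts.index.astype(str), counts.values)
ax.set_xlabel('article_id')
ax.set_ylabel('frequency')
# plt.show()  # uncomment to display the figure outside a notebook
```

Converting the ids to strings makes Matplotlib treat them as categories, so ids that are far apart (5 and 16947, say) still sit next to each other, which is also what s.plot.bar() does.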

The new pandas API suggests the following way:
import pandas as pd
s = pd.Series({16976: 2, 1: 39, 2: 49, 3: 187, 4: 159,
5: 158, 16947: 14, 16977: 1, 16948: 7, 16978: 1, 16980: 1},
name='article_id')
s.plot(kind="bar", figsize=(20,10))
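If you are working in Jupyter, you don't need to import matplotlib or call plt.show() yourself, since the figure is rendered inline. matplotlib still has to be installed, because pandas uses it as its plotting backend.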

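Just pass 'bar' as the kind parameter of plot.
Example:
# squeeze=True was removed from read_csv in pandas 2.0; .squeeze('columns') does the same job.
# parser is a user-defined date-parsing function that is not shown here.
series = pd.read_csv('BwsCount.csv', header=0, parse_dates=[0], index_col=0, date_parser=parser).squeeze('columns')
series.plot(kind='bar')
The default value of kind is 'line', i.e. series.plot() draws a line graph.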
For your reference:
kind : str
‘line’ : line plot (default)
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : boxplot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot

Related

set x axis as column names on barplot

I have a dataframe such as this:
data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
'100m': [19, 14, 12, 11],
'200m': [36, 25, 24, 24],
'400m': [67, 64, 58, 57],
'800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)
name 100m 200m 400m 800m
1 Bob 19 36 67 117
2 Chuck 14 25 64 120
3 Daren 12 24 58 123
4 Elisa 11 24 57 121
My task is simple: Plot the times (along the y-axis), with the name of the event (100m, 200m, etc. along the x-axis). The hue of each bar should be determined by the 'name' column, and look something like this.
Furthermore, I would like to overlay the results (not stack). However, there is no functionality in seaborn nor matplotlib to do this.

Instead of using seaborn, which is an API for matplotlib, plot df directly with pandas.DataFrame.plot. matplotlib is the default plotting backend for pandas.
Tested in python 3.11, pandas 1.5.1, matplotlib 3.6.2, seaborn 0.12.1
ax = df.set_index('name').T.plot.bar(alpha=.7, rot=0, stacked=True)
seaborn.barplot does not have an option for stacked bars, however, this can be implemented with seaborn.histplot, as shown in Stacked Bar Chart with Centered Labels.
df must be converted from a wide format to a long format with df.melt
# melt the dataframe
dfm = df.melt(id_vars='name')
# plot
ax = sns.histplot(data=dfm, x='variable', weights='value', hue='name', discrete=True, multiple='stack')
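Since the question asked for overlaid rather than stacked bars, one possible approach is to draw each person's bars at the same x positions with some transparency. This is only a sketch using plain matplotlib; the loop and the alpha value are my own choices, not something taken from the answer above:

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
        '100m': [19, 14, 12, 11],
        '200m': [36, 25, 24, 24],
        '400m': [67, 64, 58, 57],
        '800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)

# Events become the index (x-axis), names become the columns (one colour each).
wide = df.set_index('name').T

fig, ax = plt.subplots()
for name in wide.columns:
    # Every call draws at the same x positions, so the bars overlap.
    ax.bar(wide.index, wide[name], alpha=0.5, label=name)
ax.set_ylabel('time')
ax.legend(title='name')
# plt.show()  # uncomment to display the figure outside a notebook
```

If side-by-side bars are acceptable instead of a true overlay, df.set_index('name').T.plot.bar(rot=0) without stacked=True should give one group of bars per event.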

Changing order of seaborn lineplot

I have a small pd.DataFrame that looks like this:
Col1  NumCol
0     10000000
1     7500000
2     12500000
3     37500000
4     110000000
5     65000000
NumCol is actually dollar values.
I want to create a seaborn lineplot, but instead of using the numerical values which create a funky looking axis, I'd like to show dollar values.
sns.lineplot(data=plot_df, x='Col1', y='NumCol') properly creates:
However, I'd like the axes to show $10,000,000, $7,500,000, etc.
I know I can create a string-representation of the column using
plot_df['NumCol_Str'] = plot_df.NumCol.apply(lambda x : "${:,}".format(x))
Which creates:
Col1  NumCol     NumCol_Str
0     10000000   $10,000,000
1     7500000    $7,500,000
2     12500000   $12,500,000
3     37500000   $37,500,000
4     110000000  $110,000,000
5     65000000   $65,000,000
However, when plotting, it changes the order of the columns
sns.lineplot(data=plot_df, x='Col1', y='NumCol_Str')
How can I properly plot the linegraph while keeping the new string notation on the axis?
MRE below:
plot_df = pd.DataFrame.from_dict({'Col1': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
'NumCol': {0: 10000000,
1: 7500000,
2: 12500000,
3: 37500000,
4: 110000000,
5: 65000000}})
plot_df['NumCol_Str'] = plot_df.NumCol.apply(lambda x : "${:,}".format(x))
sns.lineplot(data=plot_df, x='Col1', y='NumCol_Str')
sns.lineplot(data=plot_df, x='Col1', y='NumCol')

Just plot using the numeric values and then change the axis formatter with matplotlib tick formatter:
import matplotlib.ticker as mtick
ax.yaxis.set_major_formatter(mtick.StrMethodFormatter('${x:,.0f}'))
EDIT:
Or even simpler, as pointed out by @BigBen (a plain format string is accepted in matplotlib 3.3 and later):
ax.yaxis.set_major_formatter('${x:,.0f}')
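For completeness, here is a self-contained sketch of where ax comes from. It uses a plain matplotlib Axes to keep dependencies small; as far as I know sns.lineplot returns the same kind of Axes object, so ax = sns.lineplot(data=plot_df, x='Col1', y='NumCol') should work the same way. The variable name formatter is mine:

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

plot_df = pd.DataFrame({'Col1': [0, 1, 2, 3, 4, 5],
                        'NumCol': [10000000, 7500000, 12500000,
                                   37500000, 110000000, 65000000]})

fig, ax = plt.subplots()
# Plot the numeric column so the order and spacing of the points stay correct...
ax.plot(plot_df['Col1'], plot_df['NumCol'])
# ...and only change how the tick labels are rendered.
formatter = mtick.StrMethodFormatter('${x:,.0f}')
ax.yaxis.set_major_formatter(formatter)
# plt.show()  # uncomment to display the figure outside a notebook
```

The point of this design is that the data stays numeric. Plotting the NumCol_Str column makes the axis categorical, which is why the order looked wrong in the question.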

Seaborn distplot only whole numbers

How can I make a distplot with seaborn to only have whole numbers?
My data is an array of numbers between 0 and ~18. I would like to plot the distribution of the numbers.
Impressions
0 210
1 1084
2 2559
3 4378
4 5500
5 5436
6 4525
7 3329
8 2078
9 1166
10 586
11 244
12 105
13 51
14 18
15 5
16 3
dtype: int64
Code I'm using:
sns.distplot(Impressions,
# bins=np.arange(Impressions.min(), Impressions.max() + 1),
# kde=False,
axlabel=False,
hist_kws={'edgecolor':'black', 'rwidth': 1})
plt.xticks = range(current.Impressions.min(), current.Impressions.max() + 1, 1)
Plot looks like this:
What I'm expecting:
The xlabels should be whole numbers
Bars should touch each other
The kde line should simply connect the top of the bars. By the looks of it, the current one assumes to have 0s between (x, x + 1), hence why the downward spike (This isn't required, I can turn off kde)
Am I using the correct tool for the job or distplot shouldn't be used for whole numbers?

Your problem can be solved with the code below:
import seaborn as sns # for data visualization
import numpy as np # for numeric computing
import matplotlib.pyplot as plt # for data visualization
arr = np.array([1,2,3,4,5,6,7,8,9])
sns.distplot(arr, bins = arr, kde = False)
plt.xticks(arr)
plt.show()
In this way, you can plot a histogram using the seaborn sns.distplot() function.
Note: whatever you pass to bins and plt.xticks() should be in ascending order.
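One thing to watch with integer data is that bin edges placed exactly on the integers put the last two values into the same bin. A common workaround is to put the edges on the half-integers so that every bar is centred on a whole number. Below is a rough sketch with numpy and matplotlib; the sample data is made up. Note also that sns.distplot has been deprecated since seaborn 0.11, and sns.histplot(data, discrete=True) is probably the closest seaborn replacement for this case.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up integer data standing in for the Impressions values.
data = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4])

# Edges at -0.5, 0.5, ..., max + 0.5: one bin per integer, centred on it.
edges = np.arange(data.min(), data.max() + 2) - 0.5

fig, ax = plt.subplots()
counts, _, _ = ax.hist(data, bins=edges, edgecolor='black')
# One tick per whole number, so no fractional labels appear.
ax.set_xticks(np.arange(data.min(), data.max() + 1))
# plt.show()  # uncomment to display the figure outside a notebook
```

Because adjacent bins share an edge, the bars touch each other, which was the second requirement in the question.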

Pandas groupby count returns only a column?

I'm quite new to pandas programming. I have an Excel file that I loaded into a dataframe, and I was trying to do a group by with a count() on one attribute, as in the code below, and afterwards show the frequency of the grouped items in a bar plot (frequency on the y axis, item on the x axis):
red_whine=pd.read_csv('winequality-red.csv',header=1,sep=';',names=['fixed_acidity','volatile_acidity',...])
frequency=red_whine.groupby('quality')['quality'].count()
pdf=pd.DataFrame(frequency)
print(pdf[pdf.columns[0]])
but if I do this, the code prints the result below as if it were a single column:
quality
3 10
4 53
5 680
6 638
7 199
8 18
How can I keep the two columns separated?

import pandas as pd
# pandas can read straight from a URL, so urllib2 (Python 2 only) is not needed
target_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
wine = pd.read_csv(target_url, sep=';')
vc = wine.quality.value_counts()
>>> vc
5 681
6 638
7 199
4 53
8 18
3 10
Name: quality, dtype: int64
>>> vc.index
Int64Index([5, 6, 7, 4, 8, 3], dtype='int64')
>>> vc.values
array([681, 638, 199, 53, 18, 10])
For plotting, please refer to this:
Plotting categorical data with pandas and matplotlib
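To answer the "two separate columns" part directly: the result of the groupby count is a Series whose index is quality, which is why it prints like a single column. Calling reset_index turns that index into a regular column. Here is a small sketch with made-up data (the frame below is hypothetical, and the column name 'count' is my choice):

```python
import pandas as pd

# Hypothetical stand-in for the wine data; only the column name is from the question.
wine = pd.DataFrame({'quality': [5, 6, 5, 7, 6, 5]})

# Series: index = quality, values = frequency.
frequency = wine.groupby('quality')['quality'].count()

# name= is needed here because the Series is itself called 'quality',
# which would clash with the index column being inserted.
table = frequency.reset_index(name='count')
print(table)
```

The resulting table has a quality column and a count column, so table.plot.bar(x='quality', y='count') can be used for the bar plot.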

Lines not showing up on Matplotlib graph

I am trying to plot three lines on the same plot in Matplotlib. They are InvoicesThisYear, DisputesThisYear, and PercentThisYear (Which is Disputes/Invoices)
The original input is two columns of dates -- one for the date of a logged dispute and one for the date of a logged invoice.
I use the dates to count up the number of disputes and invoices per month during a certain year.
Then I try to graph it, but it comes up empty. I started with just trying to print PercentThisYear and InvoicesThisYear.
PercentThisYear = (DisputesFYThisYear/InvoicesFYThisYear).fillna(0.0)
#Percent_ThisYear.plot(kind = 'line')
#InvoicesFYThisYear.plot(kind = 'line')
plt.plot(PercentThisYear)
plt.xlabel('Date')
plt.ylabel('Percent')
plt.title('Customer Disputes')
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
ax2 = ax.twinx()
ax2.plot(InvoicesFYThisYear)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
#ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
datenow = datetime.datetime.now()
dstart = datetime.datetime(2015,4,1)
print(datenow)
#plt.ylim(0, .14)
plt.xlim(dstart, datenow)
firsts=[]
for i in range(dstart.month, datenow.month+1):
    firsts.append(datetime.datetime(2015,i,1))
plt.xticks(firsts)
plt.show()
This is the output... The date is all messed up and nothing is plotted, but the scales on the axes look right. What am I doing wrong?
Here is the set up leading up to the graph if that is helpful
The Input looks like this:
InvoicesThisYear
Out[82]:
7 7529
5 5511
6 4934
8 3552
dtype: int64
DisputesThisYear
Out[83]:
2 211
1 98
7 54
4 43
3 32
6 29
5 21
8 8
dtype: int64
PercentThisYear
Out[84]:
1 0.000000
2 0.000000
3 0.000000
4 0.000000
5 0.003810
6 0.005877
7 0.007172
8 0.002252
dtype: float64

Matplotlib has no way of knowing which dates are associated with which data points. When you call plot with only one argument y, Matplotlib automatically assumes that the x-values are range(len(y)). You need to supply the dates as the first argument to plot. Assuming that InvoicesThisYear is a count of the number of invoices each month, starting at 1 and ending at 8, you could do something like
import datetime
import matplotlib.pyplot as plt
import pandas as pd
InvoicesFYThisYear = pd.DataFrame([0, 0, 0, 0, 5511, 4934, 7529, 3552])
Disputes = pd.DataFrame([98, 211, 32, 43, 21, 29, 54, 8])
PercentThisYear = (Disputes / InvoicesFYThisYear)
datenow = datetime.date.today()
ax = plt.subplot(111)
dates = [datetime.date(2015,i,1) for i in range(1, 9, 1)]
plt.plot(dates, PercentThisYear)
ax2 = ax.twinx()
ax2.plot(dates, InvoicesFYThisYear)
dstart = datetime.datetime(2015,4,1)
plt.xlim(dstart, datenow)
plt.xticks(dates, dates)
plt.show()
If your data is in a Pandas series and the index is an integer representing the month, all you have to do is change the index to datetime objects instead. The plot method for pandas.Series will handle things automatically from there. Here's how you might do that:
Invoices = pd.Series((211, 98, 54, 43, 32, 29, 21, 8), index = (2, 1, 7, 4, 3, 6, 5, 8))
dates = [datetime.date(2015, month, 1) for month in Invoices.index]
Invoices.index = dates
Invoices.plot()
