Time series plot showing unique occurrences per day - python

I have a dataframe, where I would like to make a time series plot with three different lines that each show the daily occurrences (the number of rows per day) for each of the values in another column.
To give an example, for the following dataframe, I would like to see the development for how many a's, b's and c's there have been each day.
df = pd.DataFrame({'date':pd.to_datetime(['2019-10-10','2019-10-14','2019-10-09','2019-10-10','2019-10-08','2019-10-14','2019-10-10','2019-10-08','2019-10-08','2019-10-13','2019-10-08','2019-10-12','2019-10-11','2019-10-09','2019-10-08']),
'letter':['a','b','c','a','b','b','b','b','c','b','b','a','b','a','c']})
When I try the command below (my best guess so far), however, it does not separate the counts the way I want (I would like three lines, one for each letter).
Any ideas on how to solve this?
df.groupby(['date']).count().plot()['letter']
I have also tried a solution in Matplotlib, though this one gives an error:
fig, ax = plt.subplots()
ax.plot(df['date'], df['letter'].count())

Based on your question, I believe you are looking for a line plot with dates on the X-axis and the counts of each letter on the Y-axis. To achieve this, follow these steps:
Group the dataframe by date and then by letter, and get the number of rows in each group using size().
Flatten the grouped dataframe using reset_index(), rename the new column to Counts, and sort by the letter column (so that the legend is in alphabetical order). These steps are mostly about keeping the new dataframe and the graph clean and presentable. I would suggest running each step separately and printing the result, so that you can see what each one does.
Plot each line separately by filtering the dataframe for each specific letter.
Show the legend and rotate the date labels so that they are easier to read.
The code is shown below:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'date':pd.to_datetime(['2019-10-10','2019-10-14','2019-10-09','2019-10-10','2019-10-08','2019-10-14','2019-10-10','2019-10-08','2019-10-08','2019-10-13','2019-10-08','2019-10-12','2019-10-11','2019-10-09','2019-10-08']),
'letter':['a','b','c','a','b','b','b','b','c','b','b','a','b','a','c']})
df_grouped = df.groupby(by=['date', 'letter']).size().reset_index() ## New DF for grouped data
df_grouped.rename(columns = {0 : 'Counts'}, inplace = True)
df_grouped.sort_values(['letter', 'date'], inplace=True) ## Sort by date as well so each line is drawn in chronological order
colors = ['r', 'g', 'b'] ## One color per letter, change as per your preference
for i, ltr in enumerate(df_grouped.letter.unique()):
    subset = df_grouped[df_grouped.letter == ltr] ## Rows for this letter only
    plt.plot(subset.date, subset.Counts, '-o', label=ltr, c=colors[i])
plt.gcf().autofmt_xdate() ## Rotate X-axis so you can see dates clearly without overlap
plt.legend() ## Show legend
Output graph
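As a possible alternative to looping over the letters, here is a minimal sketch that reshapes the counts with unstack() so that pandas draws one line per letter by itself. It uses the same sample dataframe as the question; the name counts is just an illustrative choice, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2019-10-10', '2019-10-14', '2019-10-09', '2019-10-10', '2019-10-08',
                            '2019-10-14', '2019-10-10', '2019-10-08', '2019-10-08', '2019-10-13',
                            '2019-10-08', '2019-10-12', '2019-10-11', '2019-10-09', '2019-10-08']),
    'letter': ['a', 'b', 'c', 'a', 'b', 'b', 'b', 'b', 'c', 'b', 'b', 'a', 'b', 'a', 'c'],
})

# Rows per (date, letter), then one column per letter.
# Date/letter combinations that never occur are filled with 0.
counts = df.groupby(['date', 'letter']).size().unstack(fill_value=0)

# DataFrame.plot draws one line per column and builds the legend from the column names.
ax = counts.plot(marker='o')
ax.set_ylabel('Counts')
ax.figure.autofmt_xdate()  # rotate the date labels
```

One difference from the loop version: because of fill_value=0, each line drops to zero on days where that letter does not occur, instead of skipping those days.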

Related

Reformatting y axis values in a multi-line plot in Python

Updated with more info
I've seen this answered on here for single line plots, but I need help with a plot showing two variables, if that matters at all... I am fairly new to python in general. My line graph shows two different departments' funding over the years. I just want to reformat the y axis to display as a number in the hundreds of millions.
Using a csv for the general public funding report of Minneapolis.
msp_df = pd.read_csv('Minneapolis_Data_Snapshot_v2.csv',error_bad_lines=False)
msp_df.info()
Saved just the two depts I was interested in, to a dataframe.
CPED_df = (msp_df['Unnamed: 0'] == 'CPED')
msp_df.iloc[CPED_df.values]
police_df = (msp_df['Unnamed: 0'] == 'Police')
msp_df.iloc[police_df.values]
("test" is the new name of my data frame containing all the info as seen below.)
test = pd.DataFrame({'Year': range(2014,2021),
'CPED': msp_df.iloc[CPED_df.values].T.reset_index(drop=True).drop(0,0)[5].tolist(),
'Police': msp_df.iloc[police_df.values].T.reset_index(drop=True).drop(0,0)[4].tolist()})
(The numbers from the original dataset were being read as strings because of the commas, so I had to fix that first.)
test['Police2'] = test['Police'].str.replace(',','').astype(int)
test['CPED2'] = test['CPED'].str.replace(',','').astype(int)
And here is my code for the plot. It executes; I just want to reformat the y-axis number scale. Right now it just shows up as a decimal. (I've already imported pandas, seaborn, and matplotlib.)
plt.plot(test.Year, test.Police2, test.Year, test.CPED2)
plt.ylabel('Budget in Hundreds of Millions')
plt.xlabel('Year')
Current plot
Any help super appreciated! Thanks :)
The easiest way to reformat the y-axis and force it to show certain values is to use
plt.yticks(ticks, labels)
For example, if you only want to display values from 0 to 1, you can do:
plt.yticks([0,0.2,0.5,0.7,1], ['a', 'b', 'c', 'd', 'e'])
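If setting every tick by hand is too tedious, another option is a tick formatter. The sketch below uses matplotlib's FuncFormatter to print raw dollar amounts as millions. The budget numbers are made up for illustration (they stand in for test.Police2 and test.CPED2), and the helper name millions is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Made-up figures in dollars, NOT the real Minneapolis data
years = list(range(2014, 2021))
police = [146_000_000, 150_000_000, 157_000_000, 163_000_000,
          179_000_000, 184_000_000, 193_000_000]
cped = [90_000_000, 95_000_000, 101_000_000, 110_000_000,
        115_000_000, 121_000_000, 128_000_000]

def millions(value, pos=None):
    """Format a raw dollar amount as millions, e.g. 150000000 -> '$150M'."""
    return f'${value / 1e6:,.0f}M'

fig, ax = plt.subplots()
ax.plot(years, police, label='Police')
ax.plot(years, cped, label='CPED')
ax.yaxis.set_major_formatter(FuncFormatter(millions))  # applied to every y tick
ax.set_xlabel('Year')
ax.set_ylabel('Budget (millions of dollars)')
ax.legend()
```

The formatter only changes how the tick labels are printed; the plotted data stays in dollars.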

pandas.groupby().plot() stacking amount of rows per group by, not actual values

I'm trying to create a stacked bar-graph which shows two transaction types for a customer. The graph is sorted into columns by week.
Sample code within my code structure is below:
%matplotlib inline
import pandas as pd
values = [('1','2019-07-28','retail',11),
('1','2019-07-28','wholesale',18),
('1','2019-08-04','retail',7),
('1','2019-08-04','wholesale',12),
('1','2019-08-11','retail',6),
('1','2019-08-11','wholesale',16)]
columns = ['customer_id','week',
'transaction_type',
'sale_count']
df = pd.DataFrame(values, columns=columns)
df.groupby(['week','transaction_type']).size()\
.unstack()\
.plot(sort_columns='week',
kind='bar', stacked=True);
The result I'm getting is a row count for each transaction_type as either 1 or 2
current:
What I need is a stacked bar graph that gives the sum of sale_count for each date listed in week like the one below
expected:
Can anyone tell me what I'm doing wrong here?
Similar to what was suggested in the comments:
(df.groupby(['week','transaction_type'])['sale_count']
.sum().unstack('transaction_type')
.plot.bar(stacked=True)
)
Output:
@Quang Hoang's answer is correct and should be accepted and upvoted. This is just a note about formatting the code. I think it is better to get rid of the extra round brackets and move the legend outside the plot, as in the following code:
df.groupby(['week','transaction_type'])['sale_count']\
.sum().unstack('transaction_type')\
.plot.bar(stacked=True, rot=0)\
.legend(bbox_to_anchor=(1.3, 1.0));
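To see why the original attempt plotted 1s instead of the sales figures, it may help to compare the two aggregations side by side. This is a small sketch on the question's sample data; the names grouped, row_counts and sale_totals are only illustrative.

```python
import pandas as pd

values = [('1', '2019-07-28', 'retail', 11),
          ('1', '2019-07-28', 'wholesale', 18),
          ('1', '2019-08-04', 'retail', 7),
          ('1', '2019-08-04', 'wholesale', 12),
          ('1', '2019-08-11', 'retail', 6),
          ('1', '2019-08-11', 'wholesale', 16)]
columns = ['customer_id', 'week', 'transaction_type', 'sale_count']
df = pd.DataFrame(values, columns=columns)

grouped = df.groupby(['week', 'transaction_type'])

# size() counts the rows in each group and ignores the values in sale_count
row_counts = grouped.size().unstack()

# Selecting the column and calling sum() adds up the actual sale_count values
sale_totals = grouped['sale_count'].sum().unstack()

print(row_counts)
print(sale_totals)
```

In the sample data every week/type pair has exactly one row, so size() yields 1 everywhere, while sum() yields the sale_count totals that the stacked bars should show.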

Python: Pie chart percentages greater than number

I need to plot a pie chart of frequencies from a column of a dataframe, but a lot of lower frequencies appear and visualization is poor.
The code I wrote is:
df[column].value_counts(normalize=True).plot(kind="pie")
I know that df[column].value_counts(normalize=True) will give me percentages of every unique value, but I want to apply the filter percentage>0.05
What I tried:
new_df = df[column].value_counts(normalize=True)
but this gives me column as index, so I reset the index
new_df = new_df.reset_index()
and then tried
new_df.plot(kind = "pie")
but nothing appears.
I want a one-line solution that does something like:
df[column].value_counts(normalize=True).plot(kind="pie" if value_counts > 0.05)
Try this:
df['column'].value_counts()[df['column'].value_counts(normalize=True)>0.05].plot(kind='pie')
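A slightly longer variant computes the counts only once and keeps the filter readable. This is a sketch on made-up data: the fruit column and its values are hypothetical, and column is assumed to hold the column name as in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Made-up data: three frequent values and two rare ones
df = pd.DataFrame({'fruit': ['apple'] * 50 + ['pear'] * 30 + ['plum'] * 17
                            + ['kiwi'] * 2 + ['fig'] * 1})
column = 'fruit'

counts = df[column].value_counts()   # absolute frequency of each value
shares = counts / counts.sum()       # same as value_counts(normalize=True)
major = counts[shares > 0.05]        # keep values above 5% of the rows

# Plot the absolute counts; the pie chart rescales them to percentages itself
ax = major.plot(kind='pie', autopct='%1.1f%%')
ax.set_ylabel('')                    # hide the Series name on the y-axis
```

Note that the percentages printed on the pie are relative to the slices that remain, not to the full dataframe.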

Plot a graph for each country in pandas

I have a dataframe which has three columns. The first one represents the country, the second one is the number of days, and the third one is a count column. A sample would look like this:
import pandas as pd
df = pd.DataFrame({'Country':['USA','USA','IND','UK','UK','UK'],
'Days':[4,5,6,8,9,4],
'Count': [10,13,7,8,2,10]})
I want to plot the Days on the X-axis and the Count on the Y-axis for each country (a line plot), but I want the graphs to be in one frame, much like a pair plot. Is there a way to achieve this? Also, I am not sure how to filter the dataframe and plot the filtered object, as I want one graph per country.
I want something along this line where for America it would look like this
Days = [4,5]
Count = [10,13]
plt.plot(Days, Count, color='green')
plt.xlabel('Days')
plt.ylabel('Count')
plt.title('Days vs count for USA')
plt.show()
But I want it for every country in a separate plot, but in one frame like a pair plot.
Any help would be useful. Thanks!
There are probably better built in methods for this, but I would use:
for country in df['Country'].unique():
    df[df['Country']==country].sort_values('Days').plot.line(x='Days',
                                                              y='Count',
                                                              title=country)
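The loop above opens a separate figure for each country. To get all countries in one frame, one option is to create the subplots first and pass each axis to the plot call. This is only a sketch on the question's sample data; the 1-by-N layout and the figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'IND', 'UK', 'UK', 'UK'],
                   'Days': [4, 5, 6, 8, 9, 4],
                   'Count': [10, 13, 7, 8, 2, 10]})

countries = df['Country'].unique()

# One row of subplots, one column per country; squeeze=False keeps axes 2-D
fig, axes = plt.subplots(1, len(countries),
                         figsize=(4 * len(countries), 3), squeeze=False)

# sort=False keeps the countries in order of first appearance
for ax, (country, group) in zip(axes.ravel(), df.groupby('Country', sort=False)):
    # marker='o' makes single-point countries (IND here) visible
    group.sort_values('Days').plot.line(x='Days', y='Count', ax=ax,
                                        marker='o', legend=False)
    ax.set_title(f'Days vs count for {country}')
    ax.set_xlabel('Days')
    ax.set_ylabel('Count')

fig.tight_layout()
```

Passing sharey=True to plt.subplots would put all countries on a common Count scale, if that makes them easier to compare.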

Legend on pandas plot of time series shows only "None"

data is a pandas dataframe with a date-time-index on entries with multiple attributes. One of these attributes is called STATUS. I tried to create a plot of the number of entries per day, broken down by the STATUS attribute.
My first attempt using pandas.plot:
for status in data["STATUS"].unique():
    entries = data[data["STATUS"] == status]
    entries.groupby(pandas.TimeGrouper("D")).size().plot(figsize=(16,4), legend=True)
The result:
How should I modify the code above so that the legend shows which status the curve belongs to?
Also, feel free to suggest a different approach to realizing such a visualization (group time series by time interval, count entries, and break down by attributes of the entries).
I believe that with the change below to your code, you will get what you want:
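# pandas.TimeGrouper has been removed from recent pandas versions;
# pandas.Grouper(freq="D") is the replacement.
fig, ax = plt.subplots()
for status in data["STATUS"].unique():
    entries = data[data["STATUS"] == status]
    dfPlot = pandas.DataFrame(entries.groupby(pandas.Grouper(freq="D")).size())
    dfPlot.columns = [status]  # the column name becomes the legend label
    dfPlot.plot(ax=ax, figsize=(16,4), legend=True)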
What happened is that size() returns a Series with no name, so the legend has nothing to show except "None". Creating a DataFrame from the Series and setting its column name to the status does the trick.
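Since the question also asked for other approaches, here is a sketch that avoids the loop by grouping on the day and on STATUS at the same time. The data frame below is a made-up stand-in for data (the timestamps and the status values are hypothetical), and per_day is just an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Made-up stand-in for `data`: a datetime index and a STATUS column
data = pd.DataFrame(
    {'STATUS': ['open', 'closed', 'open', 'open', 'closed']},
    index=pd.to_datetime(['2020-01-01 08:00', '2020-01-01 12:30',
                          '2020-01-02 09:15', '2020-01-04 17:45',
                          '2020-01-04 18:00']),
)

# Group by calendar day (taken from the index) and by STATUS, count the rows,
# then turn the STATUS level into columns so each status becomes one line.
per_day = (data.groupby([pd.Grouper(freq='D'), 'STATUS'])
               .size()
               .unstack(fill_value=0))

# The column names are used as legend labels, so no "None" entries appear
ax = per_day.plot(figsize=(16, 4))
```

Because every status ends up as a named column, the legend is filled in without renaming anything by hand.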
