data is a pandas DataFrame with a DatetimeIndex; each entry has several attributes. One of these attributes is called STATUS. I tried to plot the number of entries per day, broken down by the STATUS attribute.
My first attempt using pandas.plot:
for status in data["STATUS"].unique():
    entries = data[data["STATUS"] == status]
    entries.groupby(pandas.TimeGrouper("D")).size().plot(figsize=(16,4), legend=True)
The result:
How should I modify the code above so that the legend shows which status the curve belongs to?
Also, feel free to suggest a different approach to realizing such a visualization (group time series by time interval, count entries, and break down by attributes of the entries).
I believe the following change to your code will give you what you want:
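import pandas
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 4))
for status in data["STATUS"].unique():
    entries = data[data["STATUS"] == status]
    # pandas.TimeGrouper was removed in pandas 1.0; pandas.Grouper(freq="D") replaces it
    dfPlot = pandas.DataFrame(entries.groupby(pandas.Grouper(freq="D")).size())
    dfPlot.columns = [status]  # the column name becomes the legend label
    dfPlot.plot(ax=ax, legend=True)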
What happened is that size() returns a Series with no name, so the legend has nothing to show for each curve. Creating a DataFrame from the Series and setting its column name to the status does the trick.
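As for the request for a different approach, one option is to group by day and STATUS in a single step and then pivot the statuses into columns, so that pandas labels each line on its own. This is only a minimal sketch: the sample data and the status values "open" and "closed" are made up for illustration, and it assumes a pandas version that has pd.Grouper.

```python
import pandas as pd

# Hypothetical sample standing in for `data`: a DatetimeIndex plus a STATUS column.
index = pd.to_datetime([
    "2020-01-01 08:00", "2020-01-01 12:30", "2020-01-01 17:45",
    "2020-01-02 09:15", "2020-01-03 10:00", "2020-01-03 11:00",
])
data = pd.DataFrame(
    {"STATUS": ["open", "closed", "open", "open", "closed", "closed"]},
    index=index,
)

# Count rows per calendar day and status, then turn each status into a column.
counts = (
    data.groupby([pd.Grouper(freq="D"), "STATUS"])
    .size()
    .unstack(fill_value=0)
)

# One line per status; the column names are used as legend labels.
ax = counts.plot(figsize=(16, 4))
ax.set_ylabel("entries per day")
```

Call plt.show() from matplotlib.pyplot (or run this in a notebook) to display the figure.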
I have a dataframe, where I would like to make a time series plot with three different lines that each show the daily occurrences (the number of rows per day) for each of the values in another column.
For example, for the following dataframe, I would like to see how many a's, b's and c's there have been each day.
df = pd.DataFrame({'date':pd.to_datetime(['2019-10-10','2019-10-14','2019-10-09','2019-10-10','2019-10-08','2019-10-14','2019-10-10','2019-10-08','2019-10-08','2019-10-13','2019-10-08','2019-10-12','2019-10-11','2019-10-09','2019-10-08']),
                   'letter':['a','b','c','a','b','b','b','b','c','b','b','a','b','a','c']})
When I try the command below (my best guess so far), however, it does not split the counts by letter (I would like three lines, one for each letter).
Any ideas on how to solve this?
df.groupby(['date']).count().plot()['letter']
I have also tried a solution in Matplotlib, though this one gives an error.
fig, ax = plt.subplots()
ax.plot(df['date'], df['letter'].count())
Based on your question, I believe you are looking for a line plot with dates on the X-axis and the count of each letter on the Y-axis. To achieve this, follow these steps:
Group the dataframe by date and then letter, and get the number of rows in each group using size().
Flatten the grouped dataframe using reset_index(), rename the new column to Counts, and sort by the letter column (so that the legend is in alphabetical order). These steps mostly keep the new dataframe and graph clean and presentable. I suggest running each step separately and printing the result, so that you can see what each one does.
Plot one line per letter by filtering the dataframe for that letter.
Show the legend and rotate the date labels so that they are easier to read.
The code is shown below:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'date':pd.to_datetime(['2019-10-10','2019-10-14','2019-10-09','2019-10-10','2019-10-08','2019-10-14','2019-10-10','2019-10-08','2019-10-08','2019-10-13','2019-10-08','2019-10-12','2019-10-11','2019-10-09','2019-10-08']),
                   'letter':['a','b','c','a','b','b','b','b','c','b','b','a','b','a','c']})
df_grouped = df.groupby(by=['date', 'letter']).size().reset_index() ## New DF for grouped data
df_grouped.rename(columns = {0 : 'Counts'}, inplace = True)
## Sort by date as well, so the points of each line stay in chronological order
df_grouped.sort_values(['letter', 'date'], inplace=True)
colors = ['r', 'g', 'b'] ## One color per letter, change as per your preference
for i, ltr in enumerate(df_grouped.letter.unique()):
    sub = df_grouped[df_grouped.letter == ltr] ## Rows for this letter only
    plt.plot(sub.date, sub.Counts, '-o', label=ltr, c=colors[i])
plt.gcf().autofmt_xdate() ## Rotate X-axis labels so you can see dates clearly without overlap
plt.legend() ## Show legend
Output graph
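For comparison, the counting and reshaping steps can also be collapsed into one call. The sketch below uses pd.crosstab, which is a swapped-in technique and not what the answer above does; the name counts is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime([
        '2019-10-10', '2019-10-14', '2019-10-09', '2019-10-10', '2019-10-08',
        '2019-10-14', '2019-10-10', '2019-10-08', '2019-10-08', '2019-10-13',
        '2019-10-08', '2019-10-12', '2019-10-11', '2019-10-09', '2019-10-08']),
    'letter': ['a', 'b', 'c', 'a', 'b', 'b', 'b', 'b', 'c', 'b', 'b', 'a', 'b', 'a', 'c'],
})

# Rows are dates, columns are letters, cells are the number of matching rows
# (0 where a letter does not occur on a date).
counts = pd.crosstab(df['date'], df['letter'])
print(counts)

# One line per letter, labelled from the column names.
ax = counts.plot(style='-o')
ax.figure.autofmt_xdate()
```

Unlike the filtered-plot loop, days on which a letter does not occur show up as zeros here instead of being skipped, which may or may not be what you want.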
I have a program that forecasts individual stock data. It's very simple and straightforward. The user needs to select one stock and the range of data.
I'm ready to take it to the next level by letting the application create individual forecasts for multiple stocks in one sitting, by passing a list of stock symbols to my model. For example, instead of running the program 20 times for 20 different stocks, it would only have to run once. Until now, I could only use the application for one stock at a time.
Let's look at where I currently am. I have already made a dummy list of stocks in tickers and started a loop which turned into poorly designed data frames and dictionaries.
import yfinance as yf

# stock symbols
tickers = ["LRCX", "FB", "COF"]

# mm-dd-yyyy format
start_date = "01-01-2014"
end_date = "11-23-2019"

stocks = {}

for symbol in tickers:
    stock_info = pdr.get_data_yahoo(tickers, start=start_date,end=end_date)
    stock_info['date'] = stock_info.index
    key_name = 'df_' + symbol
    stock_info.drop(['Open', 'High', 'Low','Volume'], axis=1)
    stock_info.rename(columns={'Close': 'y', 'date': 'ds'}, inplace=True)
    stocks[key_name] = stock_info
This is the current data frame that the code above produced: https://i.stack.imgur.com/Yh1F9.png
I call it with stocks[key_name]. However, this is not the dataframe I had in mind. I want a loop that creates an individual dataframe for each stock in my list of tickers, and then processes each dataframe by dropping and renaming the necessary columns. In this case, that means ending up with dataframes that have only y and ds columns for each stock.
An example of a dataframe I wanted to create for stocks in my list, one df per stock
Once that is settled, I would like to create loops that pass these dataframes into my model and plot the data.
The method below did not work for me because I'm using a dictionary and it got overly complicated. I also found out that I need to pass dataframes to .fit() when using Prophet() (it's a forecasting model developed by Facebook). I would need to loop through each dataframe created and fit them individually, as below.
for k, v in stocks.items():
    m = Prophet()
    m.fit(stocks)
Below is what I have in mind for plotting each dataframe and its respective columns of data. It might help you understand this workflow better. I'm assuming that it's very easy to loop over a list for plotting, but I'm struggling with that as well. Would I need to automate the size of the subplots too, in case I want to try out 30 stocks? These are just some of the questions I keep running into.
for stock in list_of_df:
    # First Subplot
    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14,5))
    ax1.plot(stock_info["date"], stock_info["Close"])
    ax1.set_xlabel("Date", fontsize=12)
    ax1.set_ylabel("Stock Price")
    ax1.set_title(f"{ticker} Close Price History")

    # Second Subplot
    ax1.plot(stock_info["date"], stock_info["High"], color="green")
    ax1.set_xlabel("Date", fontsize=12)
    ax1.set_ylabel("Stock Price")
    ax1.set_title(f"{ticker} High Price History")

    # Third Subplot
    ax1.plot(stock_info["date"], stock_info["Low"], color="red")
    ax1.set_xlabel("Date", fontsize=12)
    ax1.set_ylabel("Stock Price")
    ax1.set_title(f"{ticker} Low Price History")

    # Fourth Subplot
    ax2.plot(stock_info["date"], stock_info["Volume"], color="orange")
    ax2.set_xlabel("Date", fontsize=12)
    ax2.set_ylabel("Stock Price")
    ax2.set_title(f"{ticker} Volume History")
    plt.show()
I would greatly appreciate some guidance here from any dataframe and looping expert. Streamlining this workflow has turned out to be a lot more difficult than I thought, but essentially I'm trying to make a loop or a function that creates any number of dataframes at once, given the proper data.
# Plot every column except the date column, one figure per column
cols = [c for c in stock_info.columns if "date" not in c]
for col in cols:
    f, ax1 = plt.subplots(figsize=(14, 5))
    ax1.plot(stock_info["date"], stock_info[col])  # date on x, current column on y
    ax1.set_xlabel("Date", fontsize=12)
    ax1.set_ylabel(col)
    ax1.set_title(f"{ticker} {col} History")
    plt.show()
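For the first half of the question (one dataframe per ticker with only ds and y columns), a possible sketch is shown below. It deliberately avoids any download: the closes table is synthetic, and to_prophet_frames is a hypothetical helper name, not part of yfinance or Prophet. It assumes you already have a table with a date index and one close-price column per ticker.

```python
import numpy as np
import pandas as pd


def to_prophet_frames(closes):
    """Split a wide close-price table (one column per ticker) into ds/y frames."""
    frames = {}
    for symbol in closes.columns:
        frames[symbol] = (
            closes[symbol]
            .dropna()           # drop days without a price for this ticker
            .rename("y")        # the value column becomes y
            .rename_axis("ds")  # the date index becomes ds
            .reset_index()
        )
    return frames


# Synthetic prices standing in for real downloaded data (assumption).
dates = pd.date_range("2019-01-01", periods=5, freq="D")
closes = pd.DataFrame(
    {"LRCX": np.linspace(100, 104, 5), "COF": np.linspace(80, 84, 5)},
    index=dates,
)

frames = to_prophet_frames(closes)
print(frames["LRCX"])
```

With frames in this shape, the fitting loop would iterate over frames.items() and fit a fresh model on each value (the dataframe), not on the whole dictionary.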
I'm trying to create a stacked bar-graph which shows two transaction types for a customer. The graph is sorted into columns by week.
Sample code within my code structure is below:
%matplotlib inline
import pandas as pd
values = [('1','2019-07-28','retail',11),
          ('1','2019-07-28','wholesale',18),
          ('1','2019-08-04','retail',7),
          ('1','2019-08-04','wholesale',12),
          ('1','2019-08-11','retail',6),
          ('1','2019-08-11','wholesale',16)]
columns = ['customer_id','week',
           'transaction_type',
           'sale_count']
df = pd.DataFrame(values, columns=columns)
df.groupby(['week','transaction_type']).size()\
  .unstack()\
  .plot(sort_columns='week',
        kind='bar', stacked=True);
The result I'm getting is a row count for each transaction_type as either 1 or 2
current:
What I need is a stacked bar graph that gives the sum of sale_count for each date listed in week like the one below
expected:
Can anyone tell me what I'm doing wrong here?
Similar to what was suggested in the comments:
(df.groupby(['week','transaction_type'])['sale_count']
.sum().unstack('transaction_type')
.plot.bar(stacked=True)
)
Output:
@Quang Hoang's answer is correct and should be accepted and upvoted. This is just a note about formatting the code. I think it is better to get rid of the extra round brackets and move the legend outside the plot, as in the following code:
df.groupby(['week','transaction_type'])['sale_count']\
.sum().unstack('transaction_type')\
.plot.bar(stacked=True, rot=0)\
.legend(bbox_to_anchor=(1.3, 1.0));
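To make the difference between the question's code and the answers concrete, here is a small sketch that compares size() and sum() on the sample data, without plotting. The names row_counts and totals are only for illustration.

```python
import pandas as pd

values = [('1', '2019-07-28', 'retail', 11),
          ('1', '2019-07-28', 'wholesale', 18),
          ('1', '2019-08-04', 'retail', 7),
          ('1', '2019-08-04', 'wholesale', 12),
          ('1', '2019-08-11', 'retail', 6),
          ('1', '2019-08-11', 'wholesale', 16)]
columns = ['customer_id', 'week', 'transaction_type', 'sale_count']
df = pd.DataFrame(values, columns=columns)

grouped = df.groupby(['week', 'transaction_type'])['sale_count']

# size() counts the rows in each group, regardless of what sale_count contains.
row_counts = grouped.size().unstack('transaction_type')

# sum() adds up the sale_count values in each group.
totals = grouped.sum().unstack('transaction_type')

print(row_counts)
print(totals)
```

Passing totals to .plot.bar(stacked=True) gives the stacked bars by week; row_counts is what the original code was plotting.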
I'm trying to plot with Bokeh using a dataframe, but the plot is displayed empty. I'm a beginner, so I'm probably missing something fundamental.
My plot works if I hard-code some basic X and Y variables, so I know the issue has to do with the dataframe I'm trying to use as a source.
...
df = pd.DataFrame(j)
df.columns = ['Team','Type','Date','SLA_MET']
df['SLA_MET']= df['SLA_MET'].round(2)
pd.set_option('display.max_columns', 10)
print(df)
source = ColumnDataSource(df)
p = figure(background_fill_color='gray',
           background_fill_alpha=0.5,
           border_fill_color='blue',
           border_fill_alpha=0.25,
           plot_height=600,
           plot_width=1000,
           x_axis_label='Month',
           x_axis_location='below',
           y_axis_label='% SLA Met',
           y_axis_location='left',
           title='Percentage of SLA Met',
           title_location='above',
           toolbar_location='below',
           tools='save')
p.line(source=source,x='Date',y='SLA_MET')
show(p)
I decided to pass clean lists to the plot instead:
for index, row in df.iterrows():
    if row[2] =='Service Request':
        sr_list.append(row[3])
    else:
        inc_list.append(row[3])
        date_list.append(row[1]) # Only need 1 list of dates
The problem now is that the dates are shown in scientific notation and are not in order.
Bokeh does not know what to do with the strings in your Date column. You have two options:
convert this column to real python/numpy/pandas (numeric) datetime values, and also set x_axis_type="datetime" in your figure call, or
use the string values as categorical factors
It's not clear what your intention is, so I can't recommend one vs the other.
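A rough sketch of the first option follows. The date strings and SLA values are invented, since the question does not show the real data, and prepare_dates and make_plot are hypothetical helper names. Sorting by date is an extra step added here because the question mentions the dates being out of order.

```python
import pandas as pd


def prepare_dates(df, column="Date"):
    """Return a copy with `column` parsed to datetimes and rows sorted by it."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column])
    return out.sort_values(column).reset_index(drop=True)


def make_plot(df):
    """Build a Bokeh line plot with a datetime x-axis (Bokeh imported lazily)."""
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure

    p = figure(x_axis_type="datetime",
               x_axis_label="Month",
               y_axis_label="% SLA Met",
               title="Percentage of SLA Met")
    p.line(x="Date", y="SLA_MET", source=ColumnDataSource(df))
    return p


# Made-up rows with string dates in arbitrary order (assumption).
raw = pd.DataFrame({
    "Date": ["2019-03-01", "2019-01-01", "2019-02-01"],
    "SLA_MET": [0.91, 0.85, 0.88],
})
clean = prepare_dates(raw)
print(clean.dtypes)
```

To display the result, pass the figure returned by make_plot(clean) to show() from bokeh.plotting. If the real Date strings use a different layout, pd.to_datetime may need an explicit format argument.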
I have a dataframe which has three columns. The first one represents the country the second one is number of days and the third one is a count column. A sample would look like this:
import pandas as pd
df = pd.DataFrame({'Country':['USA','USA','IND','UK','UK','UK'],
                   'Days':[4,5,6,8,9,4],
                   'Count': [10,13,7,8,2,10]})
I want to plot Days on the X-axis and Count on the Y-axis for each country (a line plot), but I want the graphs to be in one frame, much like a pair plot. Is there a way to achieve this? Also, I am not sure how to filter the dataframe and plot the filtered object, as I want one graph per country.
I want something along these lines; for the USA it would look like this:
Days = [4,5]
Count = [10,13]
plt.plot(Days, Count, color='green')
plt.xlabel('Days')
plt.ylabel('Count')
plt.title('Days vs count for USA')
plt.show()
But I want it for every country in a separate plot, all in one frame like a pair plot.
Any help would be useful. Thanks!
There are probably better built in methods for this, but I would use:
for country in df['Country'].unique():
    df[df['Country']==country].sort_values('Days').plot.line(x='Days',
                                                              y='Count',
                                                              title=country)
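The loop above opens a separate figure for each country. To get all countries into one frame, one possibility is a grid of subplots whose size is computed from the number of groups. This is a sketch, not the only way to do it; the choice of three columns (ncols) is an arbitrary assumption.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'IND', 'UK', 'UK', 'UK'],
                   'Days': [4, 5, 6, 8, 9, 4],
                   'Count': [10, 13, 7, 8, 2, 10]})

# One (country, sub-dataframe) pair per country, rows ordered by Days.
groups = list(df.sort_values('Days').groupby('Country'))

ncols = 3  # assumed grid width; adjust to taste
nrows = math.ceil(len(groups) / ncols)
fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)

for ax, (country, sub) in zip(axes.ravel(), groups):
    ax.plot(sub['Days'], sub['Count'], marker='o')
    ax.set_title(f'Days vs Count for {country}')
    ax.set_xlabel('Days')
    ax.set_ylabel('Count')

# Hide grid cells that have no country assigned.
for ax in axes.ravel()[len(groups):]:
    ax.set_visible(False)

fig.tight_layout()
```

Call plt.show() to display the figure. Because the number of rows is derived from len(groups), the same code should scale to more countries without manual resizing.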