Problem plotting dataframe with matplotlib - python

I'm trying to plot a bar chart of some de-identified transactional banking data using the pandas and matplotlib libraries.
The data looks like this:
The "day" column stores the number of the day on which each transaction was made, the "tr_type" column stores the transaction type numbers for each day, and the "average_income" column stores the average income for each transaction type.
The task is to display all three columns, for the rows with the largest average income, on one graph.
To make this concrete, I took the top 5 rows of the sorted data.
`
slised_two = sliced_df_new.sort_values('average_income', ascending=False).head(5)
slised_two = slised_two.set_index('day')
`
To make further plotting easier, I set the "day" column as the index. I get this:
Based on this data, I tried to build a single graph, but I did not get the result I wanted: I had to draw two subplots to display the data legibly.
`
axes = slised_two.plot.bar(rot=0, subplots=True)
axes[1].legend(loc=2)
`
So my question is: is it possible to build a bar chart in which the days are shown on the x-axis, the average income is shown on the y-axis, and the transaction number is written above each bar?
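One possible approach, sketched under assumptions: draw a single bar series for "average_income" and annotate each bar with its "tr_type" value using `Axes.annotate`. The sample values in `sliced_df_new` below are made up, since the original data is not shown, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's data (real values are not shown)
sliced_df_new = pd.DataFrame({
    "day": [3, 7, 12, 18, 25, 30],
    "tr_type": [7010, 2010, 1030, 7070, 2370, 1010],
    "average_income": [5200.0, 4800.5, 4300.0, 3900.25, 3100.0, 150.0],
})

# Keep the five rows with the largest average income
top5 = sliced_df_new.sort_values("average_income", ascending=False).head(5)

fig, ax = plt.subplots()
# Days are converted to strings so they are treated as categories
bars = ax.bar(top5["day"].astype(str), top5["average_income"])

# Write the transaction type above each bar
labels = []
for bar, tr_type in zip(bars, top5["tr_type"]):
    labels.append(
        ax.annotate(
            str(tr_type),
            xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
            xytext=(0, 3),               # 3 points above the bar top
            textcoords="offset points",
            ha="center",
            va="bottom",
        )
    )

ax.set_xlabel("day")
ax.set_ylabel("average_income")
```

With an interactive backend, `plt.show()` would display the figure; `fig.savefig(...)` writes it to a file instead.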

Related

How can I make my data numeral, so I can visualize them via matplotlib?

I have this df; the columns I'm interested in visualizing later with matplotlib are 'incident_date' and 'fatalities'. I want to create two diagrams. One will display the number of incidents with injuries (the 'fatalities' column says whether an accident was fatal, involved injuries only, or neither); the other will display the dates with the most deaths. To do this, I need to turn the data in the 'fatalities' column into numeric values somehow.
This is my df's head, to give you an idea:
I created dummy data based on the picture you provided:
import pandas as pd
data = {'incident_date': ['1-Mar-20', '1-Mar-20', '3-Mar-20', '3-Mar-20', '3-Mar-20', '5-Mar-20', '6-Mar-20', '7-Mar-20', '7-Mar-20'],
        'fatalities': ['Fatal', 'Fatal', 'Injuries', 'Injuries', 'Neither', 'Fatal', 'Fatal', 'Fatal', 'Fatal'],
        'conclusion_number': [1, 1, 3, 23, 23, 34, 23, 24, 123]}
df = pd.DataFrame(data)
All you need to do is group by incident_date and fatalities; the counts give you a numeric value for each date and each incident type.
df_grp = df.groupby(['incident_date','fatalities'],as_index=False)['conclusion_number'].count()
df_grp.rename({'conclusion_number':'counts'},inplace=True, axis=1)
The output of the above looks like this:
output dataframe
Once you have the counts column, you can build your matplotlib diagrams from it.
Let me know if you need help with diagrams as well
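As a rough sketch of how the counts could feed the two diagrams, the grouped counts can be pivoted so that each category becomes its own numeric column. The data is the same dummy data as above; the variable names `counts`, `injuries_per_date` and `fatal_per_date` are illustrative, and `size()` is used here in place of counting a specific column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

data = {
    "incident_date": ["1-Mar-20", "1-Mar-20", "3-Mar-20", "3-Mar-20", "3-Mar-20",
                      "5-Mar-20", "6-Mar-20", "7-Mar-20", "7-Mar-20"],
    "fatalities": ["Fatal", "Fatal", "Injuries", "Injuries", "Neither",
                   "Fatal", "Fatal", "Fatal", "Fatal"],
}
df = pd.DataFrame(data)

# One row per date, one numeric column per category ('Fatal', 'Injuries', 'Neither')
counts = df.groupby(["incident_date", "fatalities"]).size().unstack(fill_value=0)

injuries_per_date = counts["Injuries"]
fatal_per_date = counts["Fatal"].sort_values(ascending=False)

# One subplot per diagram
fig, (ax_inj, ax_fatal) = plt.subplots(1, 2, figsize=(10, 4))
injuries_per_date.plot.bar(ax=ax_inj, title="Incidents with injuries per date")
fatal_per_date.plot.bar(ax=ax_fatal, title="Fatal incidents per date")
fig.tight_layout()
```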

How do I stop Pandas from continuing to put the same data on the same plot?

My first post, so I hope I do this correctly!
This is admittedly an OOP modification of something on DataCamp. I have two objects which contain Pandas dataframes. The first (StockData) has stock data for every trading day of 2016 for both Amazon and Facebook. The second (BenchmarkData) has the S&P 500 closing values for every trading day of 2016. For both, I want to calculate the percent change (StockReturns and BenchmarkReturns, respectively) and then plot them. I want both of the StockReturns on the same plot, but the BenchmarkReturns (which is a Series and not a dataframe for reasons irrelevant to this part of the code) on a separate plot. For my function, I've added a flag as input to tell the program whether the object contains a stock dataframe or a benchmark dataframe and I call the function twice during runtime, once for stock and the other for benchmark. However, no matter what I do, Pandas plots all 3 on the same plot. How do I separate the benchmark data and get it on its own plot?
def _CalculatePercentChange(self, IsStockData):
    if IsStockData:
        self.__StockReturns = self.__StockData.GetData().pct_change()
        self.__StockReturns.plot(title = 'Daily Percent Change')
    else:
        self.__BenchmarkReturnsDataFrame = self.__BenchmarkData.GetData().pct_change()
        self.__BenchmarkReturns = self.__BenchmarkReturnsDataFrame['S&P 500'].squeeze()
        self.__BenchmarkReturns.plot(title = 'Daily Percent Change')
Thanks guys.
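A likely cause, offered as an assumption rather than a confirmed diagnosis: when no `ax` is passed, `Series.plot` draws on the current matplotlib axes if a figure already exists, so the benchmark Series lands on the axes created by the earlier DataFrame plot. A minimal sketch of one way around this is to create the axes explicitly and pass them in. The price data below is synthetic, and the names `stock_data` and `benchmark_data` merely stand in for whatever `GetData()` returns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two data sets (not real prices)
dates = pd.bdate_range("2016-01-04", periods=10)
rng = np.random.default_rng(0)
stock_data = pd.DataFrame(
    {"Amazon": 600 + rng.normal(0, 5, 10), "Facebook": 100 + rng.normal(0, 1, 10)},
    index=dates,
)
benchmark_data = pd.DataFrame({"S&P 500": 2000 + rng.normal(0, 10, 10)}, index=dates)

stock_returns = stock_data.pct_change()
benchmark_returns = benchmark_data["S&P 500"].pct_change()

# Two separate axes; each plot call is told exactly where to draw
fig, (ax_stock, ax_bench) = plt.subplots(2, 1, sharex=True)
stock_returns.plot(ax=ax_stock, title="Daily Percent Change (stocks)")
benchmark_returns.plot(ax=ax_bench, title="Daily Percent Change (S&P 500)")
fig.tight_layout()
```

Calling `plt.figure()` before the benchmark plot would be another option; passing `ax` explicitly just makes the target of each plot visible in the code.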

Plot each year of a time series on the same x-axis

I have a time series with daily data that I want to plot to see how it evolves over a year. I want to compare how it evolves over the year compared to previous years. I have written the following code in Python:
xindex = data['biljett'].index.month*30 + data['biljett'].index.day
plt.plot(xindex, data['biljett'])
plt.show()
The graph looks as follows:
A graph of how the data evolves over a year compared to previous years. The line is continuous and does not break at the end of each year, which makes the plot hard to read. What am I doing wrong?
From a technical perspective, this happens because your data points are not sorted by xindex, so the line goes back and forth as it connects the points in data frame order. Sort the data by xindex and you're good to go. To do that, first add xindex to the data dataframe as a new column, then:
data = data.sort_values(by='xindex').reset_index(drop=True)
From a visualization perspective, you probably have several values for each day count, so a line plot is not a good choice to begin with. IMHO you'd want to try plt.scatter() to visualize your data better.
I have rewritten as follows:
xindex = data['biljett'].index.month*30 + data['biljett'].index.day
data['biljett'].sort_values('xindex').reset_index(drop=True)
plt.plot(xindex, data['biljett'])
plt.show()
but I get the following error message:
ValueError: No axis named xindex for object type
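The error is consistent with `sort_values` being called on the Series `data['biljett']`, which has no column named 'xindex'; in pandas versions that accept positional arguments there, the first one is interpreted as `axis`. A sketch of one alternative, under assumptions: store xindex as a column of the DataFrame and draw one line per year, so that no line runs from December back to January. The `biljett` values below are synthetic, and `index.dayofyear` could replace the `month*30 + day` formula.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily series covering three years (stand-in for the real data)
idx = pd.date_range("2015-01-01", "2017-12-31", freq="D")
values = np.sin(np.arange(len(idx)) * 2 * np.pi / 365) + 0.1 * (idx.year.to_numpy() - 2015)
data = pd.DataFrame({"biljett": values}, index=idx)

# Same position-in-year formula as in the question, stored as a column
data["xindex"] = data.index.month * 30 + data.index.day

fig, ax = plt.subplots()
# One line per calendar year, all sharing the same x-axis
for year, group in data.groupby(data.index.year):
    ax.plot(group["xindex"], group["biljett"], label=str(year))

ax.set_xlabel("xindex (month*30 + day)")
ax.set_ylabel("biljett")
ax.legend()
```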

Cumulative Frequency with Defined Bins in Python

I have an array of data on how quickly people take action, measured in hours. I want to generate a table that tells me what % of users have taken action within the first hour, first day, first week, first month, etc.
I have used pandas.cut to categorize the values and give the groups names from group_names:
bins_hours = [0...]
group_names = [...]
hourlylook = pd.cut(av.date_diff, bins_hours, labels=group_names,right=False)
I then plotted hourlylook and got an awesome bar chart.
But I want to express this information cumulatively, too, in a table format. What's the best way to tackle this problem?
Have a look at: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumsum.html
This should allow you to create a new column with the cumulative sum.
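A minimal sketch of how `cumsum` could be combined with the binned data to get a cumulative table. The bin edges, labels and sample values below are hypothetical, since the original `bins_hours` and `group_names` are elided; `date_diff` stands in for `av.date_diff`.

```python
import pandas as pd

# Hypothetical hours-to-action values (stand-in for av.date_diff)
date_diff = pd.Series([0.5, 2, 5, 30, 100, 200, 500, 0.2, 20, 700])

# Hypothetical bin edges in hours: 1 hour, 1 day, 1 week, 30 days
bins_hours = [0, 1, 24, 168, 720]
group_names = ["first hour", "first day", "first week", "first month"]

hourlylook = pd.cut(date_diff, bins_hours, labels=group_names, right=False)

# Count per bin, kept in bin order rather than sorted by frequency
counts = hourlylook.value_counts().sort_index()

table = pd.DataFrame({
    "count": counts,
    "cumulative_count": counts.cumsum(),
    # Share of all users who acted within this bin or an earlier one
    "cumulative_pct": counts.cumsum() * 100 / len(date_diff),
})
print(table)
```

Dividing by `len(date_diff)` treats values that fall outside every bin as users who have not yet acted; dividing by `counts.sum()` instead would express the percentages relative to binned users only.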

Graph Making from Pandas dataframe

I have a data frame called abc that has rows of times (y direction) and columns of dates (x direction); the frame is made up of values.
What I need to do is create a graph by selecting a certain time: the graph should plot the dates against the values in that row of the dataframe. Attached is a photo of the data:
example of dataframe
So I need to be able to type in, say, 9am and get a time series graph with jun, july, aug, sept, etc. on the x-axis and 1, 2, 6, 8, 9, etc. on the y-axis,
and then get a different graph if I change it to, say, 11am.
This is basic pandas functionality:
You can select along either axis by label with the .loc indexer (the older .ix indexer has been removed from pandas).
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 10, (5, 4)), index=['9am', '10am', '11am', '12pm', '1pm'], columns=['jun', 'jul', 'aug', 'sept'])
# select the '9am' row across all [:] of the months
df.loc['9am', :].plot()
# select the 'jun' column for 9am through 11am
df.loc['9am':'11am', 'jun'].plot()
plot then draws the desired graph.
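To get a different graph just by changing the time, the row selection could be wrapped in a small helper. This is only a sketch: `plot_time` is a hypothetical name, and the values in `abc` are made up to resemble the description in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frame: times as rows, months as columns
abc = pd.DataFrame(
    [[1, 2, 6, 8], [3, 1, 4, 1], [5, 9, 2, 6]],
    index=["9am", "10am", "11am"],
    columns=["jun", "jul", "aug", "sept"],
)

def plot_time(frame, time_label):
    """Plot the values of one time row against the month columns."""
    fig, ax = plt.subplots()  # fresh figure so calls do not overlap
    frame.loc[time_label].plot(ax=ax, marker="o", title=f"Values at {time_label}")
    ax.set_xlabel("month")
    ax.set_ylabel("value")
    return ax

ax_9am = plot_time(abc, "9am")
ax_11am = plot_time(abc, "11am")
```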
