Plot each year of a time series on the same x-axis - python

I have a time series of daily data and I want to plot how it evolves over a year, so that I can compare each year with the previous ones. I have written the following code in Python:
xindex = data['biljett'].index.month*30 + data['biljett'].index.day
plt.plot(xindex, data['biljett'])
plt.show()
The graph looks as follows:
[Image: a graph of how the data evolves over a year compared to previous years]
The line is continuous and does not end at the end of the year, which makes the graph fuzzy. What am I doing wrong?

From a technical perspective, this happens because your data points are not sorted by the x value (xindex), so the line goes back and forth to connect the points in data frame order. Sort the data by xindex and you're good to go. To do that, first add xindex to the data dataframe as a new column, then sort and assign the result back (sort_values returns a new object):
data['xindex'] = xindex
data = data.sort_values(by='xindex').reset_index(drop=True)
From a visualization perspective, I think you might have several values for each day count, so plot is not a good option to begin with. IMHO you'd want to try plt.scatter() to visualize your data in a better way.
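If the goal is one line per year on a shared x-axis, another option is to group by year and plot each group against the day of the year. Below is a minimal sketch, assuming the index is a DatetimeIndex. The synthetic series stands in for data['biljett'] and is not the asker's data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for data['biljett']: three years of daily values.
idx = pd.date_range("2017-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
biljett = pd.Series(rng.normal(size=len(idx)).cumsum(), index=idx)

fig, ax = plt.subplots()
# One line per calendar year, all sharing a day-of-year x-axis,
# so no line is drawn from the end of one year to the start of the next.
for year, values in biljett.groupby(biljett.index.year):
    ax.plot(values.index.dayofyear, values.to_numpy(), label=str(year))

ax.set_xlabel("Day of year")
ax.set_ylabel("biljett")
ax.legend()
# plt.show()  # uncomment in an interactive session
```

Using dayofyear instead of month*30 + day also avoids the gaps and overlaps that the 30-day approximation creates at month boundaries.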

I have rewritten as follows:
xindex = data['biljett'].index.month*30 + data['biljett'].index.day
data['biljett'].sort_values('xindex').reset_index(drop=True)
plt.plot(xindex, data['biljett'])
plt.show()
but I get the following error message:
ValueError: No axis named xindex for object type
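A likely reason for this error, as far as I can tell: data['biljett'] is a Series, and Series.sort_values has no column to sort by, so in the pandas version that produced this message the first positional argument is interpreted as the axis. The sort has to be done on the DataFrame, after xindex has been added as a column. A minimal sketch with made-up values (the four rows below are hypothetical):

```python
import pandas as pd

# Hypothetical miniature of the asker's data: a DatetimeIndex and one column.
idx = pd.to_datetime(["2018-03-01", "2017-01-15", "2018-01-02", "2017-03-01"])
data = pd.DataFrame({"biljett": [4, 1, 3, 2]}, index=idx)

# Add xindex as a real column so that the DataFrame can be sorted by it.
data["xindex"] = data.index.month * 30 + data.index.day

# sort_values returns a new DataFrame; keep the result in a variable.
data_sorted = data.sort_values(by="xindex")

print(data_sorted)
```

After sorting, plot data_sorted['xindex'] against data_sorted['biljett'] so that the x and y values stay aligned.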

Related

How can I make my data numeral, so I can visualize them via matplotlib?

So I have this df. The columns that I'm interested in visualizing later with matplotlib are 'incident_date' and 'fatalities'. I want to create two diagrams: one will display the number of incidents with injuries (the column named 'fatalities' says whether it was a fatal accident, one with injuries, or neither), and the other will display the dates with the most deaths. To do that, I need to somehow turn the data in the 'fatalities' column into numeric values.
This is my df's head, so you get an idea
I created dummy data based on the picture you provided:
data = {
    'incident_date': ['1-Mar-20','1-Mar-20','3-Mar-20','3-Mar-20','3-Mar-20','5-Mar-20','6-Mar-20','7-Mar-20','7-Mar-20'],
    'fatalities': ['Fatal','Fatal','Injuries','Injuries','Neither','Fatal','Fatal','Fatal','Fatal'],
    'conclusion_number': [1,1,3,23,23,34,23,24,123],
}
df = pd.DataFrame(data)
All you need to do is group by incident_date and fatalities, and you will get the numerical values for each date and each incident type.
df_grp = df.groupby(['incident_date','fatalities'],as_index=False)['conclusion_number'].count()
df_grp.rename({'conclusion_number':'counts'},inplace=True, axis=1)
The output of the above looks like this:
[Image: output dataframe]
Once you have the counts column you can build your matplotlib diagrams.
Let me know if you need help with diagrams as well
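As a sketch of how the counts could be shaped for the two diagrams, one option is to count rows per date and category with size() and then spread the categories into columns with unstack(). This reuses the dummy data from above; the names counts, injuries_per_day and deadliest_days are only illustrative.

```python
import pandas as pd

data = {
    "incident_date": ["1-Mar-20", "1-Mar-20", "3-Mar-20", "3-Mar-20", "3-Mar-20",
                      "5-Mar-20", "6-Mar-20", "7-Mar-20", "7-Mar-20"],
    "fatalities": ["Fatal", "Fatal", "Injuries", "Injuries", "Neither",
                   "Fatal", "Fatal", "Fatal", "Fatal"],
}
df = pd.DataFrame(data)

# Parse the dates so that they sort chronologically rather than as strings.
df["incident_date"] = pd.to_datetime(df["incident_date"], format="%d-%b-%y")

# Rows: dates, columns: the 'fatalities' categories, values: incident counts.
counts = (
    df.groupby(["incident_date", "fatalities"])
      .size()
      .unstack(fill_value=0)
)

# Numeric series that can be handed to matplotlib.
injuries_per_day = counts["Injuries"]
deadliest_days = counts["Fatal"].sort_values(ascending=False)

print(counts)
```

From here, something like injuries_per_day.plot.bar() or deadliest_days.head(10).plot.bar() would give one diagram each. Note that these are counts of incidents, not of casualties, because the dummy data has no casualty column.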

Problem plotting dataframe with matplotlib

I'm trying to plot a bar chart of some de-identified transactional banking data using the pandas and matplotlib libraries.
The data looks like this:
The column named "day" stores the numbers of the days on which the transaction was made, the column named "tr_type" stores the numbers of transactions made on each day, and the column named "average_income" stores the average amount of incomes for each of the different types of transactions.
The task is to display the data of all three columns, which have the largest average amount of incomes, on one graph.
For definiteness, I took the top 5 rows of sorted data.
slised_two = sliced_df_new.sort_values('average_income', ascending=False).head(5)
slised_two = slised_two.set_index('day')
For convenience in further plotting, I set a column called "day" as an index. I get this:
Based on this data, I tried to build one graph, but, unfortunately, I did not achieve the result I wanted, because I had to build 2 graphs for normal data display.
axes = slised_two.plot.bar(rot=0, subplots=True)
axes[1].legend(loc=2)
So the question is: is it possible to build a bar chart in which the days are displayed on the x-axis, the average income is displayed on the y-axis, and the transaction number is written on top of each bar?
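One possible approach, sketched under the assumption that a single bar chart with text labels is acceptable: draw the average_income bars with matplotlib and annotate each bar with its tr_type value. The numbers below are hypothetical stand-ins for the top five rows of sliced_df_new.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical values standing in for the asker's top five rows.
slised_two = pd.DataFrame(
    {
        "day": [12, 45, 101, 230, 300],
        "tr_type": [7, 3, 9, 1, 4],
        "average_income": [950.0, 870.5, 820.0, 790.25, 760.0],
    }
).set_index("day")

fig, ax = plt.subplots()
# Days as categorical x positions, average income as bar height.
bars = ax.bar(slised_two.index.astype(str), slised_two["average_income"])

# Write the tr_type value just above the top of each bar.
for bar, tr_type in zip(bars, slised_two["tr_type"]):
    ax.annotate(
        str(tr_type),
        xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
        xytext=(0, 3),
        textcoords="offset points",
        ha="center",
        va="bottom",
    )

ax.set_xlabel("day")
ax.set_ylabel("average_income")
# plt.show()  # uncomment in an interactive session
```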

How do I stop Pandas from continuing to put the same data on the same plot?

My first post, so I hope I do this correctly!
This is admittedly an OOP modification of something on DataCamp. I have two objects which contain Pandas dataframes. The first (StockData) has stock data for every trading day of 2016 for both Amazon and Facebook. The second (BenchmarkData) has the S&P 500 closing values for every trading day of 2016. For both, I want to calculate the percent change (StockReturns and BenchmarkReturns, respectively) and then plot them. I want both of the StockReturns on the same plot, but the BenchmarkReturns (which is a Series and not a dataframe for reasons irrelevant to this part of the code) on a separate plot. For my function, I've added a flag as input to tell the program whether the object contains a stock dataframe or a benchmark dataframe and I call the function twice during runtime, once for stock and the other for benchmark. However, no matter what I do, Pandas plots all 3 on the same plot. How do I separate the benchmark data and get it on its own plot?
def _CalculatePercentChange(self, IsStockData):
    if(IsStockData):
        self.__StockReturns = self.__StockData.GetData().pct_change()
        self.__StockReturns.plot(title = 'Daily Percent Change')
    else:
        self.__BenchmarkReturnsDataFrame = self.__BenchmarkData.GetData().pct_change()
        self.__BenchmarkReturns = self.__BenchmarkReturnsDataFrame['S&P 500'].squeeze()
        self.__BenchmarkReturns.plot(title = 'Daily Percent Change')
Thanks guys.
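A probable explanation, as far as I understand pandas plotting: when no ax is passed, Series.plot draws on the current matplotlib axes if a figure is already open, whereas DataFrame.plot opens a new figure. Creating a separate figure for each plot and passing its axes explicitly avoids this. A minimal sketch with synthetic prices (all values and variable names here are hypothetical, not the asker's classes):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical closing prices for ten trading days.
dates = pd.bdate_range("2016-01-04", periods=10)
rng = np.random.default_rng(0)
stock_data = pd.DataFrame(
    {
        "Amazon": 600 + rng.normal(size=10).cumsum(),
        "Facebook": 100 + rng.normal(size=10).cumsum(),
    },
    index=dates,
)
benchmark = pd.Series(2000 + rng.normal(size=10).cumsum(), index=dates, name="S&P 500")

# Each plot gets its own figure and axes, passed explicitly via ax=.
fig_stock, ax_stock = plt.subplots()
stock_data.pct_change().plot(ax=ax_stock, title="Daily Percent Change (stocks)")

fig_bench, ax_bench = plt.subplots()
benchmark.pct_change().plot(ax=ax_bench, title="Daily Percent Change (S&P 500)")
# plt.show()  # uncomment in an interactive session
```

In the method above, the same idea would mean calling plt.subplots() inside each branch and passing ax= to the corresponding .plot() call.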

Time Series Chart: Groupby seasons (or specific months) for multiple years in xarray

Thank you for taking interest in my question.
I am hoping to plot a temperature time series chart specifically for the months January to August from 1981-1999.
Below is my code and my attempts:
temperature = xr.open_dataarray('temperature.nc')
temp = temperature.sel(latitude=slice(34.5,30), longitude=slice(73,78.5))
templatlonmean = temp.mean(dim=['latitude','longitude'])-273.15
tempgraph1 = templatlonmean.sel(time=slice('1981','1999'))
The above commands read in fine without any errors.
Below are my attempts to divide the months into seasons:
1st Attempt
tempseason1 = tempgraph1.groupby("time.season").mean("time")
#Plotting Graph Command
myfig, myax = plt.subplots(figsize=(14,8))
timeyears = np.unique(tempgraph1["time.season"])
tempseason1.plot.line('b-', color='red', linestyle='--',linewidth=4, label='1981-1999 Mean')
I got this error:
"Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead."
I tried this as my second attempt (retrieved from this post Select xarray/pandas index based on specific months)
However, I wasn't sure how can I plot a graph with this, so I tried the following:
def is_amj(month):
    return (month >= 4) & (month <= 6)
temp_seasonal = tempgraph1.sel(time=is_amj(tempgraph1['time.month']))
#Plotting Graph Command
timeyears = np.unique(tempgraph1["time.season"])
temp_seasonal.plot.line('b-', color='red', linestyle='--',linewidth=4, label='1981-1999 Mean')
And it caused no error but the graph was not ideal
So I moved on to my 3rd attempt (from here http://xarray.pydata.org/en/stable/examples/monthly-means.html):
month_length = tempmean.time.dt.days_in_month
weights = month_length.groupby('time.season') / month_length.groupby('time.season').sum()
np.testing.assert_allclose(weights.groupby('time.season').sum().values, np.ones(4))
ds_weighted = (tempmean * weights).groupby('time.season').sum(dim='time')
ds_unweighted = tempmean.groupby('time.season').mean('time')
#Plot Commands
timeyears = np.unique(tempgraph1["time.season"])
ds_unweighted.plot.line('b-', color='red', linestyle='--',linewidth=4, label='1981-1999 Mean')
Still I got the same error as the 1st attempt:
"Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead."
That example was written for plotting weather maps rather than a time series chart, but I believed the groupby process would be similar or even the same, which is why I used it.
As I am relatively new to coding, please excuse any syntax errors and the fact that I could not spot an obvious way to go about this.
I am wondering whether you could suggest other ways to plot data for specific months with xarray, or whether there are adjustments I need to make to the commands I have attempted.
I greatly appreciate your generous help.
Please let me know if you need any further information; I will respond as soon as possible.
Thank you!
Regarding your attempts 1 and 3: the object type mentioned in the error is the season coordinate produced by the grouping.
You can see that by doing:
tempseason1 = tempgraph1.groupby("time.season").mean("time")
print(tempseason1.coords)
You should see something like:
Coordinates:
  * lon      (lon) float32 ...
  * lat      (lat) float32 ...
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Notice the object type of the season dimension.
I think you should use resample instead of groupby here.
Resample is basically a groupby used to upsample or downsample a time series, and it keeps a datetime coordinate that can be plotted.
It would look like:
tempseason1 = tempgraph1.resample(time="Q").mean("time")
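The argument "Q" is a pandas offset alias for quarterly frequency; see the pandas offset aliases documentation for details. Note that "Q" follows calendar quarters (ending in March, June, September and December); an anchored alias such as "QS-DEC" starts the quarters in December, which matches the DJF/MAM/JJA/SON seasons.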
I don't know much about plotting though.
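For the January to August selection, the key point is to end up with a numeric or datetime coordinate rather than season strings. Here is a minimal sketch of that idea in plain pandas, using a synthetic monthly series, since the original temperature.nc is not available. Presumably the xarray counterpart would be tempgraph1.sel(time=tempgraph1['time.month'] <= 8).groupby('time.year').mean('time'), but treat that as an assumption to check against your data.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly mean temperatures (degrees C) for 1981-1999.
time = pd.date_range("1981-01-01", "1999-12-01", freq="MS")
months = time.month.to_numpy()
temps = pd.Series(15 + 10 * np.sin(2 * np.pi * (months - 4) / 12), index=time)

# Keep only January to August of every year.
jan_aug = temps[temps.index.month <= 8]

# One value per year: the index is now the integer year, which plots fine.
yearly_mean = jan_aug.groupby(jan_aug.index.year).mean()

print(yearly_mean.head())
```

Because yearly_mean is indexed by integer years, yearly_mean.plot() gives a regular line chart with years on the x-axis.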

Cumulative Frequency with Defined Bins in Python

I have an array of data on how quickly people take action, measured in hours. I want to generate a table that tells me what % of users have taken action by the first hour, first day, first week, first month, etc.
I have used pandas.cut to categorize them and give them group_names:
bins_hours = [0...]
group_names = [...]
hourlylook = pd.cut(av.date_diff, bins_hours, labels=group_names,right=False)
I then plotted hourlylook and got an awesome bar chart.
But I want to express this information cumulatively, too, in a table format. What's the best way to tackle this problem?
Have a look at: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumsum.html
This should allow you to create a new column with the cumulative sum.
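A minimal sketch of how cumsum could be combined with pd.cut to build the table. The bin edges, labels and sample values below are hypothetical, since the original bins_hours and group_names are elided, and date_diff stands in for av.date_diff.

```python
import numpy as np
import pandas as pd

# Hypothetical hours-to-action values standing in for av.date_diff.
date_diff = pd.Series([0.5, 0.2, 3, 10, 30, 100, 200, 500, 800, 0.9])

# Assumed bin edges in hours: 1 hour, 1 day, 1 week, 30 days, then everything later.
bins_hours = [0, 1, 24, 168, 720, np.inf]
group_names = ["first hour", "first day", "first week", "first month", "later"]

hourlylook = pd.cut(date_diff, bins_hours, labels=group_names, right=False)

# Count users per bin, in bin order rather than by frequency.
counts = hourlylook.value_counts().sort_index()

table = pd.DataFrame(
    {
        "count": counts,
        "cumulative_count": counts.cumsum(),
        "cumulative_pct": 100 * counts.cumsum() / counts.sum(),
    }
)

print(table)
```

Sorting by the bin order before calling cumsum matters here: value_counts sorts by frequency by default, which would make the running total meaningless.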
