I'm learning Python and I'm stuck trying to plot data from my table.
Here is a piece of my code:
df1 = populationReport.loc[['Time','VIC','NSW','QLD']]
df1 = df1.set_index('Time')
print(df1)
plt.plot(df1)
plt.legend(df1.columns)
plt.ylabel ('Population')
plt.xlabel ('Timeline')
plt.show()
I need the x-axis to display the values from the 'Time' column.
So far it just displays the row numbers of my table.
The attached image (my draft plot) shows the plot I want, except that the x-axis should show the data from the 'Time' column rather than the entry numbers.
Here is what the table looks like:
               VIC        NSW        QLD
Time
1/12/05  5023203.0  6718023.0  3964175.0
1/3/06   5048207.0  6735528.0  3987653.0
1/6/06   5061266.0  6742690.0  4007992.0
1/9/06   5083593.0  6766133.0  4031580.0
1/12/06  5103965.0  6786160.0  4055845.0
I think you can use to_datetime; if necessary, specify the format or the dayfirst parameter:
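import pandas as pd
# select the columns and copy them, so the assignment below does not raise SettingWithCopyWarning
df1 = populationReport[['Time','VIC','NSW','QLD']].copy()
df1['Time'] = pd.to_datetime(df1['Time'], format='%d/%m/%y')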
#alternative
#df1['Time'] = pd.to_datetime(df1['Time'], dayfirst=True)
df1 = df1.set_index('Time')
print (df1)
                  VIC        NSW        QLD
Time
2005-12-01  5023203.0  6718023.0  3964175.0
2006-03-01  5048207.0  6735528.0  3987653.0
2006-06-01  5061266.0  6742690.0  4007992.0
2006-09-01  5083593.0  6766133.0  4031580.0
2006-12-01  5103965.0  6786160.0  4055845.0
and then it is possible to use DataFrame.plot:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
ax = df1.plot()
ticklabels = df1.index.strftime('%Y-%m-%d')
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
plt.show()
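As a hedged alternative, the dates can also be plotted with matplotlib directly and formatted with matplotlib.dates.DateFormatter, which labels each tick from its actual position rather than from a fixed list of strings. This is only a minimal sketch: the small populationReport frame is rebuilt by hand from the rows shown in the question, which is an assumption about how the real table is loaded.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: a hand-built stand-in for populationReport, using the rows from the question.
populationReport = pd.DataFrame({
    'Time': ['1/12/05', '1/3/06', '1/6/06', '1/9/06', '1/12/06'],
    'VIC': [5023203.0, 5048207.0, 5061266.0, 5083593.0, 5103965.0],
    'NSW': [6718023.0, 6735528.0, 6742690.0, 6766133.0, 6786160.0],
    'QLD': [3964175.0, 3987653.0, 4007992.0, 4031580.0, 4055845.0],
})

df1 = populationReport[['Time', 'VIC', 'NSW', 'QLD']].copy()
df1['Time'] = pd.to_datetime(df1['Time'], format='%d/%m/%y')
df1 = df1.set_index('Time')

fig, ax = plt.subplots()
for column in df1.columns:
    ax.plot(df1.index, df1[column], marker='o', label=column)

# Put one tick on each observation date and print it as day/month/year.
ax.set_xticks(mdates.date2num(df1.index))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%y'))
fig.autofmt_xdate()  # rotate the date labels so they do not collide
ax.set_xlabel('Timeline')
ax.set_ylabel('Population')
ax.legend()
```

Call plt.show() afterwards to display the figure.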
I'm trying to plot a time series that has monthly dates from 1959 to 2019, but when I plot it I get a cluttered x-axis where the dates do not show properly. How can I remove the months and show only the years on the x-axis, so that it is less cluttered and the years are readable?
fig,ax = plt.subplots(2,1)
ax[0].hist(pca_function(sd_Data))
ax[0].set_ylabel ('Frequency')
ax[1].plot(pca_function(sd_Data))
ax[1].set_xlabel ('Years')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
# fig.savefig('factor1959.pdf')
pca_function(sd_Data)
comp_0
sasdate
1959-01 -0.418150
1959-02 1.341654
1959-03 1.684372
1959-04 1.981473
1959-05 1.242232
...
2019-08 -0.075270
2019-09 -0.402110
2019-10 -0.609002
2019-11 0.320586
2019-12 -0.303515
[732 rows x 1 columns]
From what I see, you do have years on your second subplot; they just overlap because too many of them are placed horizontally. Try increasing figsize and rotating the ticks:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Builds an example dataframe ('MS' gives one row per month start).
df = pd.DataFrame(columns=['Years', 'Frequency'])
df['Years'] = pd.date_range(start='1/1/1959', end='1/1/2023', freq='MS')
df['Frequency'] = np.random.normal(0, 1, size=df.shape[0])
fig, ax = plt.subplots(2, 1, figsize=(20, 5))
ax[0].hist(df.Frequency)
ax[0].set_ylabel('Frequency')
ax[1].plot(df.Years, df.Frequency)
ax[1].set_xlabel('Years')
# Rotate the tick labels of both subplots so they do not overlap.
for axis in ax:
    for tick in axis.get_xticklabels():
        tick.set_rotation(45)
        tick.set_ha('right')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
P.S. If the x-labels still overlap, try increasing the tick step size so that fewer labels are drawn.
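One way to increase the step size, sketched here under the assumption that the x-values are real datetimes, is to place a major tick only every few years with matplotlib.dates.YearLocator. The 10-year step and the random data are arbitrary choices for illustration.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dummy monthly series standing in for the real data (an assumption).
years = pd.date_range(start='1959-01-01', end='2019-12-01', freq='MS')
values = np.random.normal(0, 1, size=len(years))

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(years, values)

# One labelled tick every 10 years, showing the year only.
ax.xaxis.set_major_locator(mdates.YearLocator(10))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.set_xlabel('Years')
fig.tight_layout()
```

A smaller argument to YearLocator gives denser ticks; call plt.show() to display the result.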
First off, you need to store the result of the call to pca_function in a variable, e.g. one called result_pca_func. That way, the calculations (and possibly side effects or different randomization) are only done once.
Second, the dates should be converted to a datetime format, for example using pd.to_datetime(). That way, matplotlib can automatically place year ticks as appropriate.
Here is an example, starting from a dummy test dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': [f'{y}-{m:02d}' for y in range(1959, 2020) for m in range(1, 13)]})  # 1959-01 to 2019-12
df['Values'] = np.random.randn(len(df)).cumsum()
df = df.set_index('Date')
result_pca_func = df
result_pca_func.index = pd.to_datetime(result_pca_func.index)
fig, ax2 = plt.subplots(figsize=(10, 3))
ax2.plot(result_pca_func)
plt.tight_layout()
plt.show()
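The same two steps can be applied to the two-subplot layout from the question. This is a sketch only: the pca_function defined here is a hypothetical stand-in that returns random numbers, and it assumes the real index holds 'YYYY-MM' strings (if it were a PeriodIndex, its to_timestamp() method would be the analogous conversion).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def pca_function(data):
    """Hypothetical stand-in for the asker's function: returns one column
    named 'comp_0' indexed by 'YYYY-MM' strings."""
    index = pd.Index(
        [f'{y}-{m:02d}' for y in range(1959, 2020) for m in range(1, 13)],
        name='sasdate',
    )
    rng = np.random.default_rng(0)
    return pd.DataFrame({'comp_0': rng.standard_normal(len(index))}, index=index)


# Call the function once and reuse the result for both subplots.
result_pca_func = pca_function(None)
result_pca_func.index = pd.to_datetime(result_pca_func.index, format='%Y-%m')

fig, ax = plt.subplots(2, 1, figsize=(10, 6))
ax[0].hist(result_pca_func['comp_0'])
ax[0].set_ylabel('Frequency')
ax[1].plot(result_pca_func.index, result_pca_func['comp_0'])
ax[1].set_xlabel('Years')
fig.suptitle('Histogram and Time series of Plot Factor')
fig.tight_layout()
```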
I cannot find a way to plot the grouped data from the following data frame:
           Processed Card  Transaction ID  Transaction amount  Error_Occured
Date
2019-01-01    Carte Rouge    217142203412           147924.21              0
2019-01-01       ChinaPay    149207925233            65301.63              1
2019-01-01     Masterkard    766507067450           487356.91              5
2019-01-01           VIZA    145484139636            97774.52              1
2019-01-02    Carte Rouge    510466748547           320951.10              3
I want to create a plot where the x-axis is Date, the y-axis is Error_Occured, and the points/lines are colored by Processed Card. I tried grouping the data frame first and plotting it using pandas plot:
df = df.groupby(['Date','Processed Card']).sum('Error_Occured')
df = df.reset_index()
df.set_index("Date",inplace=True)
df.plot(legend=True)
plt.show()
But the plot I get shows Transaction ID instead of one line per card.
You can do something like this:
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame()
df['Date'] = ['2019-01-01','2019-01-01','2019-01-01','2019-01-01','2019-01-02']
df['Card'] = ['Carte Rouge', 'ChinaPay', 'Masterkard', 'VIZA', 'Carte Rouge']
df['Error_Occured'] = [0,1,5,1,3]

fig, ax = plt.subplots()
# group by the card name only; a string key (not a list) keeps each group name a plain string
for name, s in df.groupby('Card'):
    ax.plot(s['Date'], s['Error_Occured'], marker='o', label=name)
ax.legend()
plt.show()
Note that you only want to group by card, not date.
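A possible alternative, shown as a sketch rather than as the only way to do it, is to reshape the data so that each card becomes its own column and then let pandas draw one line per column. The column name 'Card' follows the shortened example data, not the original 'Processed Card' header.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01', '2019-01-02'],
    'Card': ['Carte Rouge', 'ChinaPay', 'Masterkard', 'VIZA', 'Carte Rouge'],
    'Error_Occured': [0, 1, 5, 1, 3],
})
df['Date'] = pd.to_datetime(df['Date'])

# One row per date, one column per card; errors are summed within each cell.
wide = df.pivot_table(index='Date', columns='Card',
                      values='Error_Occured', aggfunc='sum')

# pandas draws one line per column and labels it with the card name.
ax = wide.plot(marker='o')
ax.set_ylabel('Error_Occured')
```

Cards with no record on a given date end up as NaN in that row, so their line simply has no point there.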
Thank you in advance for the assistance!
I am trying to create a heat map from time-series data. The data begins mid-year, which causes the top row of my heat map to be shifted to the left so that it does not line up with the rest of the plot (shown below). How would I go about shifting just the top row over so that the visualization of the data lines up with the rest of the plot?
(Code Provided Below)
import pandas as pd
import matplotlib.pyplot as plt
# link to the data
url1 = 'https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/minotDailyAirTemp.csv'
# load the data into a DataFrame, not a Series
# parse the dates, and set them as the index
df1 = pd.read_csv(url1, parse_dates=['Date'], index_col=['Date'])
# groupby year and aggregate Temp into a list
dfg1 = df1.groupby(df1.index.year).agg({'Temp': list})
# create a wide format dataframe with all the temp data expanded
df1_wide = pd.DataFrame(dfg1.Temp.tolist(), index=dfg1.index)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
ax1.matshow(df1_wide, interpolation=None, aspect='auto');
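The problem is the dates in the dataset: if you look at the file, the first record is
`1990-4-24,15.533`
To fix this, it is necessary to add the missing days from 1990-01-01 to 1990-04-23 (as NaN) and to drop every 29 February, so that each row of the heat map starts on 1 January and the days line up.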
import numpy as np
# placeholder rows (NaN) for the days missing at the start of 1990
rng = pd.date_range(start='1990-01-01', end='1990-04-23', freq='D')
df = pd.DataFrame(index=rng)
df['Temp'] = np.nan
# prepend the placeholders to the real data, then drop every 29 February
frames = [df, df1]
result = pd.concat(frames)
result = result[~((result.index.month == 2) & (result.index.day == 29))]
With this padded data, the plot is built the same way as before:
dfg1 = result.groupby(result.index.year).agg({'Temp': list})
df1_wide = pd.DataFrame(dfg1['Temp'].tolist(), index=dfg1.index)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
ax1.matshow(df1_wide, interpolation=None, aspect='auto');
The unfilled portions are a consequence of the NaN values in your dataset. One option is to replace the NaN values with the column mean or with the row mean; other ways of replacing NaN values are also available.
For example, using the column mean:
df1_wide = df1_wide.apply(lambda x: x.fillna(x.mean()),axis=0)
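To make the difference between the two options concrete, here is a small sketch on a toy frame (the values are made up for illustration). In the wide layout, rows are years and columns are days of the year, so the column mean averages the same day across years, while the row mean averages within one year.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for df1_wide: rows are years, columns are days of the year.
df1_wide = pd.DataFrame(
    [[np.nan, 10.0, 20.0],
     [1.0, np.nan, 5.0]],
    index=[1990, 1991],
)

# Column-mean fill: each gap takes the average of the same day in other years.
by_column = df1_wide.fillna(df1_wide.mean(axis=0))

# Row-mean fill: each gap takes the average of the other days in the same year.
by_row = df1_wide.apply(lambda row: row.fillna(row.mean()), axis=1)
```

For the padded start of 1990, the column mean is probably the more natural choice, because a row mean would fill winter days with the average of the rest of that year.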
I am trying to plot a DataFrame containing multiple time series in pandas. Each series holds one year of daily points (length 365). The figure comes out fine, but I want to suppress the year tick shown on the x-axis.
Specifically, I want to suppress the 1950 label shown in the left corner of the x-axis. Can anybody suggest how to do this? My code:
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array, # values
index=homo_regions) # 1st column as index
dataframe1 = pandas.DataFrame.transpose(data_to_plot12)
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)
You should be able to specify the x-axis major formatter like so:
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
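A minimal sketch of the same idea on dummy data. It additionally assumes that passing x_compat=True to DataFrame.plot is acceptable here; that option asks pandas to leave the date ticks to matplotlib, so the matplotlib.dates locator and formatter apply directly. The column names and random values are placeholders.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range('1950-01-01', '1950-12-31', freq='D')
# Placeholder data: two hypothetical regions with random values.
dataframe1 = pd.DataFrame(np.random.rand(len(dates), 2),
                          index=dates, columns=['region_1', 'region_2'])

# x_compat=True keeps pandas from applying its own date tick formatting.
ax = dataframe1.plot(lw=1.5, x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())         # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))  # abbreviated month name, no year
ax.set(xlabel='Months', ylabel='PRECT (mm/day)')
```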
I would like to plot 12 graphs (one graph per month) including columns 'A' and 'B' on the left y axis and column 'C' on the right.
Code below plots everything on the left side.
import numpy as np
import pandas as pd
index=pd.date_range('2011-1-1 00:00:00', '2011-12-31 23:50:00', freq='1h')
df=pd.DataFrame(np.random.rand(len(index),3),columns=['A','B','C'],index=index)
df2 = df.groupby(lambda x: x.month)
for key, group in df2:
    group.plot()
How can I separate the columns and use something like this: group.plot({'A','B':style='g'},{'C':secondary_y=True})?
You can capture the axes which the Pandas plot() command returns and use it again to plot C specifically on the right axis.
index=pd.date_range('2011-1-1 00:00:00', '2011-12-31 23:50:00', freq='1h')
df=pd.DataFrame(np.random.randn(len(index),3).cumsum(axis=0),columns=['A','B','C'],index=index)
df2 = df.groupby(lambda x: x.month)
for key, group in df2:
    ax = group[['A', 'B']].plot()
    group[['C']].plot(secondary_y=True, ax=ax)
To get all lines in a single legend, see the question 'Legend only shows one label when plotting with pandas'.
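As a hedged alternative to two separate plot calls, DataFrame.plot also accepts a list of column names for secondary_y. In this sketch pandas then draws A and B on the left axis and C on the right axis in one call, and builds a single legend itself. The axes_by_month dictionary and the axis labels are additions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = pd.date_range('2011-1-1 00:00:00', '2011-12-31 23:00:00', freq='1h')
df = pd.DataFrame(np.random.randn(len(index), 3).cumsum(axis=0),
                  columns=['A', 'B', 'C'], index=index)

axes_by_month = {}  # hypothetical helper: keeps each month's axes for later tweaks
for month, group in df.groupby(df.index.month):
    # A and B go on the left axis, C on the right axis.
    ax = group.plot(secondary_y=['C'], title=f'Month {month}')
    ax.set_ylabel('A, B')
    ax.right_ax.set_ylabel('C')
    axes_by_month[month] = ax
```

The right-hand axis is reachable as ax.right_ax, which is where the C line lives; call plt.show() to display the twelve figures.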