I am new to Python and need your help. I have several dataframes, one per day, and I use a for loop to plot each of them. For each plot I want to add the date to the title. Can anyone help me? I have created a variable 'date_created' and assigned the dates I want. I want my title to look like this:
'Voltage vs time
28-01-2022'
for df in (df1,df2,df3,df4,df5,df6,df7,df8):
    y = df[' Voltage']
    x = df['time']
    date_created = ['28-01-2022', '29-01-2022', '30-01-2022', '31-08-2022', '01-02-2022', '02-02-2022', '03-02-2022', '04-02-2022']
    fig, ax = plt.subplots(figsize=(18,7))
    plt.plot(x,y, 'b')
    plt.xlabel("time")
    plt.ylabel(" Voltage [V]")
    plt.title("Voltage vs time")
To make the code more efficient, it is better to build a dictionary that pairs each date with its dataframe (if you don't already have a date column in your dataframes). A DataFrame can't be used as a dictionary key because it isn't hashable, so the dates are the keys:
frames_by_date = {'28-01-2022': df1, '29-01-2022': df2, '30-01-2022': df3}
Then loop over the items of this dictionary:
for date, df in frames_by_date.items():
    # the column name keeps the leading space used in the question
    y = df[' Voltage']
    x = df['time']
    fig, ax = plt.subplots(figsize=(18,7))
    plt.plot(x,y, 'b')
    plt.xlabel("time")
    plt.ylabel(" Voltage [V]")
    # "\n" puts the date on a second line of the title
    plt.title(f"Voltage vs time\n{date}")
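If you would rather keep the dataframes in a plain sequence, here is a minimal sketch that pairs them with the dates using zip(). The sample frame, the plot_voltage name and the Agg backend line are assumptions for illustration only; the ' Voltage' column keeps the leading space from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_voltage(frames, dates):
    """Draw one figure per (dataframe, date) pair and return the figures."""
    figures = []
    for df, date in zip(frames, dates):
        fig, ax = plt.subplots(figsize=(18, 7))
        ax.plot(df["time"], df[" Voltage"], "b")
        ax.set_xlabel("time")
        ax.set_ylabel("Voltage [V]")
        # "\n" puts the date on a second title line
        ax.set_title(f"Voltage vs time\n{date}")
        figures.append(fig)
    return figures


# Hypothetical sample data standing in for df1, df2, ...
sample = pd.DataFrame({"time": [0, 1, 2], " Voltage": [3.1, 3.3, 3.2]})
figs = plot_voltage([sample, sample], ["28-01-2022", "29-01-2022"])
print([f.axes[0].get_title() for f in figs])
```

Note that zip() stops at the shorter of the two sequences, so a missing date silently drops a plot rather than raising an error.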
Hope this will work for you!
Related
I'm new to pandas and Matplotlib and still learning each day, so I appreciate your help. For some plots, the Y values keep showing up at the wrong X position. I'm not sure whether it is related to the dataframe I'm producing; everything looks fine to me, but the points are plotted at an offset of X+1. Since I'm using dates for the X values, the values are plotted one month ahead every time.
The dataFrame dfExec1 comes from the main df:
dfRevenue = pd.read_csv('Revenue Report_DataHistory.csv')
dfExec1 = dfRevenue[dfRevenue['PLAN/EXEC'] == 'EXEC']
dfExec1.loc[:,'Year'] = pd.to_datetime(dfExec1['Year'], format='%m/%d/%Y', errors='coerce')
dfExec1 = dfExec1.groupby(pd.Grouper(key='Year', freq='M')).sum()
This is a picture of dfExec1:
[Image: dfExec1 frame. All data are floats.]
Next I tried to work only with the columns I wanted, with zeros replaced by NaN. I also created a new column for the dates to check whether the plot came out correctly.
dfServicos = dfExec1.iloc[:, [0,1,2,3,4,6,7,8,9,10,11]]
dfServicos[dfServicos==0] = np.nan
dfServicos['DATAS'] = dfServicos.index
#dfServicos
fig6, ax = plt.subplots(figsize=(25,7))
#for coluna in dfServicos.columns:
#ax.scatter(x=dfServicos['DATAS'], y=dfServicos.loc[:, coluna], s=100, label=[coluna])
ax.scatter(x=dfServicos.iloc[0,11], y=dfServicos.iloc[0, 0], s=100, label=['Fishing'])
ax.legend()
plt.show()
This is Exec1 after treatment:
[Image: DataFrame after treatment; the data had to be covered in blue, but all values are floats or NaN]
I only plotted one column as an example, but all the plots look like this:
[Image: plot with the X position offset by one month]
Thank you very much for your support !
I just got the problem solved by setting the xticks first. After that, I needed to configure the date format.
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
import matplotlib.ticker as ticker
dfServicos = dfExec1.iloc[:, [0,1,2,3,4,6,7,8,9,10,11]]
dfServicos[dfServicos==0] = np.nan
#dfServicos
fig6, ax = plt.subplots(figsize=(35,7))
ax.set_xticks(dfServicos.index)
for coluna in dfServicos.columns:
    ax.scatter(x=dfServicos.index, y=dfServicos.loc[:, coluna], s=100, label=[coluna])
#ax.bar(x=dfServicos.index, height=dfServicos.loc[:, 'Fishing'])
dateForm = DateFormatter('%m-%Y')
ax.xaxis.set_major_formatter(dateForm)
formatter = ticker.FormatStrFormatter('$%1.2f')
ax.yaxis.set_major_formatter(formatter)
ax.legend()
plt.show()
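A possible explanation for the offset, which I can't confirm without the data: grouping with freq='M' labels each group with the last day of the month, so a January total is drawn at 31 January, right next to the February tick. Below is a minimal sketch, with made-up numbers and column names, that labels each month by its first day instead.

```python
import pandas as pd

# Hypothetical stand-in for the revenue data; values and column names are assumptions
df = pd.DataFrame({
    "Year": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-10", "2021-03-15"]),
    "Fishing": [100.0, 50.0, 75.0, 20.0],
})

# Group by calendar month using periods instead of month-end timestamps
monthly = df.groupby(df["Year"].dt.to_period("M"))["Fishing"].sum()

# Convert each period to the timestamp of its first day (2021-01-01, 2021-02-01, ...)
monthly.index = monthly.index.to_timestamp()

print(monthly)
```

With month-start labels the points should line up with the default monthly ticks, so setting the ticks by hand may no longer be needed.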
I have 2 dataframes:
'stock' is a dataframe with columns Date and Price.
'events' is a dataframe with columns Date and Text.
My goal is to produce a graph of the stock prices and place dots on the line where the events occur. However, I do not know how to set the 'y' value for the events dataframe, since I want each dot to sit at the stock price on that date.
I am able to plot the first dataframe fine with:
plt.plot('Date', 'Price', data=stock)
And I try to plot the event dots with:
plt.scatter('created_at', ???, data=events)
However, it is the ??? that I don't know how to set
Assuming Date and created_at are datetime:
stock = pd.DataFrame({'Date':['2021-01-01','2021-02-01','2021-03-01','2021-04-01','2021-05-01'],'Price':[1,5,3,4,10]})
events = pd.DataFrame({'created_at':['2021-02-01','2021-03-01'],'description':['a','b']})
stock.Date = pd.to_datetime(stock.Date)
events.created_at = pd.to_datetime(events.created_at)
Filter stock by events.created_at (or merge) and plot them onto the same ax :
stock_events = stock[stock.Date.isin(events.created_at)]
# or merge on the date columns
# stock_events = stock.merge(events, left_on='Date', right_on='created_at')
ax = stock.plot(x='Date', y='Price')
stock_events.plot.scatter(ax=ax, x='Date', y='Price', label='Event', c='r', s=50)
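Note that isin only matches exact timestamps. If an event can fall between two rows of stock, one option is pd.merge_asof, which picks the closest earlier price. This is only a sketch under that assumption; the event on 2021-02-15 is made up to show the behaviour.

```python
import pandas as pd

stock = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"]),
    "Price": [1, 5, 3, 4],
})
# Hypothetical events; the first one does not coincide with any stock date
events = pd.DataFrame({
    "created_at": pd.to_datetime(["2021-02-15", "2021-03-01"]),
    "description": ["a", "b"],
})

# Both frames must be sorted by their key; "backward" takes the last price
# at or before each event
event_prices = pd.merge_asof(
    events.sort_values("created_at"),
    stock.sort_values("Date"),
    left_on="created_at",
    right_on="Date",
    direction="backward",
)

print(event_prices[["created_at", "Price", "description"]])
```

The created_at and Price columns of event_prices can then be passed to a scatter call on the same axes as the price line.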
I have a few monthly datasets of usage stats stored in different CSVs, with a couple hundred fields each. I am cutting off the top 30 of each one, but the bottom will change (and the top also changes as stuff is banned, albeit less commonly). Currently I have the lines representing months, but I want the points to be (y=usage %) and (x=month), with the legend showing the different users.
column[0] is their number in the file (1-30)
column[1] is their name
column[2] is the usage percent
AprilStats = pd.read_csv(r'filepath', nrows=30)
MayStats = pd.read_csv(r'filepath', nrows=30)
JuneStats = pd.read_csv(r'filepath', nrows=30)
## Assign labels and sources
labels = [[AprilStats.columns[1]], [MayStats.columns[1]], [JuneStats.columns[1]]]
AprilUsage=np.array(AprilStats[AprilStats.columns[2]].tolist())
MayUsage=np.array(MayStats[MayStats.columns[2]].tolist())
JuneUsage=np.array(JuneStats[JuneStats.columns[2]].tolist())
x = np.array(AprilStats[AprilStats.columns[0]].tolist())
y = np.array(AprilStats[AprilStats.columns[2]].tolist())
my_xticks = AprilStats[AprilStats.columns[1]].tolist()
plt.xticks(x, my_xticks, rotation='55')
x1 = np.array(MayStats[MayStats.columns[0]].tolist())
y1 = np.array(MayStats[MayStats.columns[2]].tolist())
my_xticks1 = MayStats[MayStats.columns[1]].tolist()
plt.xticks(x, my_xticks1, rotation='55')
x2 = np.array(JuneStats[JuneStats.columns[0]].tolist())
y2 = np.array(JuneStats[JuneStats.columns[2]].tolist())
my_xticks2 = JuneStats[JuneStats.columns[1]].tolist()
plt.xticks(x, my_xticks2, rotation='55',)
### Plot the data
plt.rc('xtick', labelsize='xx-small')
plt.title('Little Cup Usage')
plt.ylabel('Usage (Percent)')
plt.plot(x,y,label='April', color='green', alpha=.4)
plt.plot(x1,y1,label='May', color='blue', alpha=.4)
plt.plot(x2,y2,label='June', color='red', alpha=.4)
plt.subplots_adjust(bottom=.2)
plt.legend()
plt.savefig('90daytest.png', dpi=500)
plt.show()
I think I am mislabeling them, but the month of usage isn't stored in the file. I reckon I could add it, but I'd rather not have to go in and edit these files every month. Also, sorry if this is horribly inefficient coding; I started learning Python less than two weeks ago and this is a little project for me to learn with.
I'd divide this into two steps:
1. Gather all the data into a single dataframe in which the rows correspond to the different months, the columns to the different names and the values are the usage %.
2. Plot each column as a different series in a scatter plot.
Step 1:
import datetime as dt
import pandas as pd

# Create a dictionary associating a file to each month
files = {dt.date(2019, 4, 1): 'april.csv',
         dt.date(2019, 5, 1): 'may.csv'}
# Collect one single-row data frame per file
frames = []
''' For each file, generate a one entry data frame as follows, and add it to frames.
Month     name1  name2  ...
2019-1-1  0.5    0.2
'''
for month, file in files.items():
    data = pd.read_csv(file, usecols=['name', 'usage'], index_col='name')
    data = data.transpose()
    data['month'] = month
    data = data.set_index('month')
    frames.append(data)
# DataFrame.append was removed in pandas 2.0, so concatenate the rows once at the end
df = pd.concat(frames)
Step 2:
# New figure
fig = plt.figure()
# Plot one series for each column in df
for name in df.columns:
    plt.scatter(x=df.index, y=df[name], label=name)
# Additional plot formatting code here
plt.show()
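Step 1 above can also be written as a long table followed by pivot(), which may be easier to inspect. This is a minimal sketch: the in-memory CSV text and the rank/name/usage headers are assumptions standing in for the real files.

```python
import datetime as dt
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the monthly CSV files
files = {
    dt.date(2019, 4, 1): io.StringIO("rank,name,usage\n1,A,30.5\n2,B,20.1\n"),
    dt.date(2019, 5, 1): io.StringIO("rank,name,usage\n1,B,25.0\n2,C,22.2\n"),
}

frames = []
for month, file in files.items():
    data = pd.read_csv(file, usecols=["name", "usage"])
    data["month"] = month  # tag every row with the month it came from
    frames.append(data)

# One long table, then one row per month and one column per name
long_df = pd.concat(frames, ignore_index=True)
wide = long_df.pivot(index="month", columns="name", values="usage")

print(wide)
```

Names that are missing from a month's top list end up as NaN in that row, and a scatter plot simply skips those points.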
I hope that helps.
I'm new to data viz. and was wondering what the simplest way is to plot my data:
I have a pd.dataframe that looks like:
df.head()
price event
1 123
2 456 A
3 789
...
I would like to have a time series just as if I did
df.plot(x='price')
But with events visible on the plot for each entry in my DataFrame where the 'event' column is set to something.
What are my best options?
Thanks.
I took the liberty and added one more row with event z.
fig = plt.figure()
ax = fig.add_subplot(111)
x = df.reset_index()['index']
y = df['price']
ax.scatter(x, y)
ax.plot(y)
for i, txt in enumerate(df['event']):
    # positional access works whatever labels the index uses
    ax.annotate(txt, (x.iloc[i]+0.1, y.iloc[i]))
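If only the rows that carry an event should be marked, a variant along these lines might work. It is a sketch that assumes missing events are stored as NaN/None; the sample frame and the Agg backend line are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data shaped like the question's frame
df = pd.DataFrame(
    {"price": [123, 456, 789, 1000], "event": [None, "A", None, "z"]},
    index=[1, 2, 3, 4],
)

fig, ax = plt.subplots()
ax.plot(df.index, df["price"])

# Keep only the rows that actually carry an event
events = df[df["event"].notna()]
ax.scatter(events.index, events["price"], color="red")

# Label each marker slightly to the right of the point
for x, price, label in zip(events.index, events["price"], events["event"]):
    ax.annotate(label, (x + 0.1, price))
```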
I am plotting density graphs using pandas plot, but I am not able to add appropriate legends for each of the graphs. My code and result are below:
for i in tickers:
    df = pd.DataFrame(dic_2[i])
    mean=np.average(dic_2[i])
    std=np.std(dic_2[i])
    maximum=np.max(dic_2[i])
    minimum=np.min(dic_2[i])
    df1=pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(dic_2[i])))
    ax=df.plot(kind='density', title='Returns Density Plot for '+ str(i),colormap='Reds_r')
    df1.plot(ax=ax,kind='density',colormap='Blues_r')
As you can see in the top-right box of the picture, the legend entries show up as 0. How do I put something meaningful there?
print(df.head())
0
0 -0.019043
1 -0.0212065
2 0.0060413
3 0.0229895
4 -0.0189266
I think you may want to restructure the way you've created the graph. An easy way to do this is to create the ax before plotting:
# sample data
df = pd.DataFrame()
df['returns_a'] = [x for x in np.random.randn(100)]
df['returns_b'] = [x for x in np.random.randn(100)]
print(df.head())
returns_a returns_b
0 1.110042 -0.111122
1 -0.045298 -0.140299
2 -0.394844 1.011648
3 0.296254 -0.027588
4 0.603935 1.382290
fig, ax = plt.subplots()
I then created the dataframe using the parameters specified in your variables:
mean=np.average(df.returns_a)
std=np.std(df.returns_a)
maximum=np.max(df.returns_a)
minimum=np.min(df.returns_a)
normal = pd.DataFrame(np.random.normal(loc=mean, scale=std, size=len(df.returns_a)))
normal.rename(columns={0: 'std_normal'}).plot(kind='density', colormap='Blues_r', ax=ax)
# Pass the column as y; a bare positional argument would be taken as x
df.plot(y='returns_a', kind='density', ax=ax)
This second dataframe you're working with is created by default with column 0. You'll need to rename this.
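To see where the 0 comes from, here is a small sketch (the returns array is made up) comparing the default column label with an explicit one:

```python
import numpy as np
import pandas as pd

# Hypothetical returns; any 1-D array behaves the same way
returns = np.random.default_rng(0).normal(size=100)

unnamed = pd.DataFrame(returns)                            # default column label
named = pd.DataFrame(returns, columns=["Empirical PDF"])   # explicit column label

print(list(unnamed.columns))  # [0]
print(list(named.columns))    # ['Empirical PDF']
```

pandas uses the column label as the legend entry, so plotting named with kind='density' should show 'Empirical PDF' instead of 0 (the density plot itself needs SciPy installed).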
I figured out a simpler way to do this. Just add column names to the dataframes.
for i in tickers:
    df = pd.DataFrame(dic_2[i],columns=['Empirical PDF'])
    print(df.head())
    mean=np.average(dic_2[i])
    std=np.std(dic_2[i])
    maximum=np.max(dic_2[i])
    minimum=np.min(dic_2[i])
    df1=pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(dic_2[i])),columns=['Normal PDF'])
    ax=df.plot(kind='density', title='Returns Density Plot for '+ str(i),colormap='Reds_r')
    df1.plot(ax=ax,kind='density',colormap='Blues_r')