I have an Excel file that looks like this:
I would like to plot all 3 individuals' weight on Jan 1, 2020 on a bar chart to compare visually in Python using Matplotlib. How can I do this?
It's probably easiest done with pandas:
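import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('your_file_location', sheet_name='sheet_name', parse_dates=['Date'])
# keep only the rows recorded on 1 January 2020 (compare datetimes with a Timestamp)
df = df.loc[df['Date'] == pd.Timestamp(year=2020, month=1, day=1)]
# names on the x-axis, weights on the y-axis
ax = df.plot.bar(x='Name', y='Weight')
plt.show()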
Here we first load the data from a specific sheet of your Excel file (you can omit the sheet_name argument if your Excel file has only a single sheet), then filter it down to the records from a specific date, and finally plot with names on the x-axis and weight on the y-axis.
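A self-contained sketch of the same filter-then-plot steps, with a small made-up DataFrame standing in for the Excel sheet (the names "Person A/B/C", the dates and the weights are invented for illustration only):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop for a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Excel sheet: columns Date, Name, Weight
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01",
                            "2020-02-01", "2020-02-01", "2020-02-01"]),
    "Name": ["Person A", "Person B", "Person C"] * 2,
    "Weight": [150, 180, 165, 148, 182, 160],
})

# Keep only the rows recorded on 1 January 2020
day = df[df["Date"] == pd.Timestamp("2020-01-01")]

# One bar per person: names on the x-axis, weight on the y-axis
ax = day.plot.bar(x="Name", y="Weight", legend=False, rot=0)
ax.set_ylabel("Weight")
plt.show()
```

Comparing against a `pd.Timestamp` keeps both sides of the `==` as datetime values, so the filter does not depend on how pandas treats plain `datetime.date` objects.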
import pandas as pd
import matplotlib.pyplot as plt
# import the data into Python and use the Date column as the index
df = pd.read_csv('C:\\Desktop\\file.csv', index_col='Date',
                 parse_dates=True)
df['year'] = df.index.year  # extract the year from the index
data_20 = df[df['year'] == 2020]  # keep only rows from 2020
ax = data_20.plot(kind='bar', x='Name', y='Weight')  # one bar per name for 2020
To plot only for 2 people:
ax = (data_20[data_20['Name'] != 'John Smith']
      .plot(kind='bar', x='Name', y='Weight'))  # everyone except John Smith, for 2020
To make it pretty just add labels:
ax.set_ylabel('Weights in lbs') #Labeling y-axis
ax.set_xlabel('Names') #Labeling x-axis
ax.set_title('Weights for 2020'); # Adding the title
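To keep an explicit subset of people instead of excluding one, `Series.isin` accepts a list of names. A minimal sketch on made-up data (every name other than `John Smith` is hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical 2020 data indexed by Date, shaped like data_20 in the CSV answer
data_20 = pd.DataFrame(
    {"Name": ["John Smith", "Jane Doe", "Sam Lee"], "Weight": [180, 140, 165]},
    index=pd.DatetimeIndex(["2020-01-01"] * 3, name="Date"),
)

wanted = ["Jane Doe", "Sam Lee"]  # hypothetical names to keep
subset = data_20[data_20["Name"].isin(wanted)]

ax = subset.plot(kind="bar", x="Name", y="Weight", legend=False)
ax.set_ylabel("Weights in lbs")
ax.set_xlabel("Names")
ax.set_title("Weights for 2020")
plt.show()
```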
I have read in a monthly temperature anomalies CSV file using the pandas read_csv() function. Years are from 1881 to 2022 (I excluded the last 3 months of 2022 to avoid -999 values). The date format is yyyy-mm-dd. How can I plot just the year, with only one value per year instead of 12 on the x-axis (i.e., I don't need 12 1851s, 1852s, etc.)?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import YearLocator, MonthLocator, DateFormatter
import matplotlib.dates as mdates
ds = pd.read_csv('path_to_file.csv', header='infer', engine='python', skipfooter=3)
dates = ds['Date']
tAnoms = ds[' Berkeley Earth 2m Air Temperature (degree C) 0N-90N;0E-360E']
fig = plt.figure(figsize=(10,10))
ax = plt.subplot(111)
ax.plot(dates,tAnoms)
ax.plot(dates,tAnoms.rolling(60, center=True).mean())
ax.xaxis.set_major_locator(mdates.YearLocator(month=1))  # EDIT
years_fmt = mdates.DateFormatter('%Y') # EDIT 2
ax.xaxis.set_major_formatter(years_fmt) # EDIT 2
plt.show()
EDIT: adding the following gives me the 2nd plot
EDIT 2: Gives me yearly values, but only from 1970-1975. 3rd plot
You could:
Create a new column year from your Date column.
Compute the average temperature for each year (using mean or median): df.groupby('year').mean(numeric_only=True)
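A minimal sketch of those two steps, with a synthetic monthly series standing in for the Berkeley Earth file (the column name `anomaly` and the random values are assumptions, not the real data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly anomalies; replace with the DataFrame read from the CSV
dates = pd.date_range("1881-01-01", "1890-12-01", freq="MS")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Date": dates, "anomaly": rng.normal(0, 0.5, len(dates))})

# Step 1: a year column derived from the parsed dates
df["year"] = df["Date"].dt.year

# Step 2: one value per year (swap .mean() for .median() if preferred)
yearly = df.groupby("year")["anomaly"].mean()

# The x-axis now holds one integer per year instead of twelve monthly entries
fig, ax = plt.subplots(figsize=(10, 5))
yearly.plot(ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("Temperature anomaly (°C)")
plt.show()
```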
So, I found a good, but maybe not perfect, solution. The first thing I needed to do was use parse_dates and infer_datetime_format when reading in the CSV file, and then convert the dates with to_pydatetime(). mdates.AutoDateLocator() was what I needed, along with set_major_formatter. I'm not sure how I could manually change the interval, however (e.g., every 10 or 25 years instead of the default). This does work well enough, though.
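# note: infer_datetime_format is deprecated since pandas 2.0, where consistent format inference is the default
ds = pd.read_csv('path_to_file.csv', parse_dates=['Date'], infer_datetime_format=True,
                 header='infer', engine='python', skipfooter=3)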
dates = ds['Date'].dt.to_pydatetime() # Convert to pydatetime()
tAnoms = ds[' Berkeley Earth 2m Air Temperature (degree C) 0N-90N;0E-360E']
fig = plt.figure(figsize=(10,10))
ax = plt.subplot(111)
# Produce plot
ax.plot(dates,tAnoms.rolling(60, center=True).mean())
# Use AutoDateLocator() from matplotlib.dates (mdates)
# Set date format to years
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
years_fmt = mdates.DateFormatter('%Y')
ax.xaxis.set_major_formatter(years_fmt)
plt.show()
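On the open question of a fixed interval: `matplotlib.dates.YearLocator` takes a `base` argument, and `YearLocator(base=10)` (or `base=25`) places a major tick on every 10th (or 25th) year, so it can stand in for `AutoDateLocator()`. A sketch with a synthetic series (the sine values are placeholders for the anomaly data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly series spanning 1881-2021 (stand-in for the anomaly file)
dates = pd.date_range("1881-01-01", "2021-12-01", freq="MS").to_pydatetime()
values = np.sin(np.linspace(0, 20, len(dates)))

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(dates, values)

# One major tick on 1 January of every year that is a multiple of 10
ax.xaxis.set_major_locator(mdates.YearLocator(base=10))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
fig.canvas.draw()  # force the locator to compute tick positions
plt.show()
```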
I have the following DataFrame coming from an Excel file:
df = pd.read_excel('base.xlsx')
My Excel file contains the following columns:
data - datetime64[ns]
stock - float64
demand - float64
origem - object
I need to plot a bar chart where the x-axis will be the date and the bars the stock and demand. Blue would be the demand and orange the stock:
This can be done with the pandas bar plot function. Note that dates missing from your dataset (e.g. weekends or national holidays) will not automatically appear as gaps in the bar plot. This is because bar plots in pandas (and other packages) are designed primarily for categorical data, as mentioned here and here.
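If you do want missing days to show up as empty slots, one option (not part of the original answer) is to reindex the frame onto a complete daily range with `DataFrame.asfreq('D')` before plotting; days without data become all-NaN rows, which leaves an empty slot in the chart. A sketch with made-up values and a deliberately missing day:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data with no row for 2020-12-16
idx = pd.to_datetime(["2020-12-14", "2020-12-15", "2020-12-17", "2020-12-18"])
df = pd.DataFrame({"demand": [40.0, 55.0, 38.0, 61.0],
                   "stock": [70.0, 65.0, 72.0, 50.0]}, index=idx)

# Insert the missing calendar days as all-NaN rows
full = df.asfreq("D")

fig, ax = plt.subplots(figsize=(10, 5))
full.plot.bar(ax=ax, color=["tab:blue", "tab:orange"])
# Label each slot with its date; the empty slot is 2020-12-16
ax.set_xticklabels(full.index.strftime("%Y-%m-%d"), rotation=0)
plt.show()
```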
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create a random time series with the date as index.
# In your case, where you are importing your dataset from Excel, you
# would instead set your date column as the index, e.g.:
# df = pd.read_excel('base.xlsx', index_col='data')
rng = np.random.default_rng(123)
days = 7
df = pd.DataFrame(dict(demand=rng.uniform(0, 100, size=days),
                       stock=rng.uniform(0, 100, size=days),
                       origin=rng.choice(list('ABCD'), days)),
                  index=pd.date_range(start='2020-12-14', freq='D', periods=days))
# Create pandas bar plot
fig, ax = plt.subplots(figsize=(10,5))
df.plot.bar(ax=ax, color=['tab:blue', 'tab:orange'])
# Assign ticks with custom tick labels
# Date format codes for xticklabels:
# https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
plt.xticks(ax.get_xticks(), [ts.strftime('%A') for ts in df.index], rotation=0)
plt.legend(frameon=False)
plt.show()
I am trying to plot temperature with respect to time data from a csv file.
My goal is to have a graph which shows the temperature data per day.
My problem is the x-axis: I would like the time shown uniformly, in hours and minutes only, at 15-minute intervals, for example: 00:00, 00:15, 00:30.
The CSV is loaded into a pandas DataFrame, where I filter the data to be shown based on the day; in the code I want only the temperature data for the 18th day of the month.
Here is the csv data that I am loading in:
date,temp,humid
2020-10-17 23:50:02,20.57,87.5
2020-10-17 23:55:02,20.57,87.5
2020-10-18 00:00:02,20.55,87.31
2020-10-18 00:05:02,20.54,87.17
2020-10-18 00:10:02,20.54,87.16
2020-10-18 00:15:02,20.52,87.22
2020-10-18 00:20:02,20.5,87.24
2020-10-18 00:25:02,20.5,87.24
Here is the Python code to make the graph:
import pandas as pd
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv("saveData2020.csv")
#make new columns in dataframe so data can be filtered
df["New_Date"] = pd.to_datetime(df["date"]).dt.date
df["New_Time"] = pd.to_datetime(df["date"]).dt.time
df["New_hrs"] = pd.to_datetime(df["date"]).dt.hour
df["New_mins"] = pd.to_datetime(df["date"]).dt.minute
df["day"] = pd.DatetimeIndex(df['New_Date']).day
#filter the data to be only day 18
ndf = df[df["day"]==18]
#display dataframe in console
pd.set_option('display.max_rows', ndf.shape[0]+1)
print(ndf.head(10))
#plot a graph
ndf.plot(kind='line',x='New_Time',y='temp',color='red')
#edit graph to be sexy
plt.setp(plt.gca().xaxis.get_majorticklabels(),'rotation', 30)
plt.xlabel("time")
plt.ylabel("temp in C")
#show graph with the sexiness edits
plt.show()
Here is the graph I get:
Answer
First of all, you have to convert "New_Time" (your x-axis), which holds datetime.time values, to a datetime type with:
ndf["New_Time"] = pd.to_datetime(ndf["New_Time"].astype(str), format="%H:%M:%S")
Then you can simply add this line of code before showing the plot (after importing matplotlib.dates as md) to tell matplotlib you want only hours and minutes:
plt.gca().xaxis.set_major_formatter(md.DateFormatter('%H:%M'))
And this line to place a tick every 15 minutes:
plt.gca().xaxis.set_major_locator(md.MinuteLocator(byminute = [0, 15, 30, 45]))
For more info on x axis time formatting you can check this answer.
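The locator and formatter calls can be seen in isolation in this sketch, which plots the sample rows from the question directly with matplotlib (bypassing pandas plotting, an assumption made to keep the example small) and then reads back the resulting ticks:

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question
csv = """date,temp,humid
2020-10-17 23:50:02,20.57,87.5
2020-10-17 23:55:02,20.57,87.5
2020-10-18 00:00:02,20.55,87.31
2020-10-18 00:05:02,20.54,87.17
2020-10-18 00:10:02,20.54,87.16
2020-10-18 00:15:02,20.52,87.22
2020-10-18 00:20:02,20.5,87.24
2020-10-18 00:25:02,20.5,87.24
"""
df = pd.read_csv(io.StringIO(csv), parse_dates=["date"])
ndf = df[df["date"].dt.day == 18]

fig, ax = plt.subplots()
ax.plot(ndf["date"], ndf["temp"], color="red")  # x values are real datetimes
# Ticks at :00, :15, :30 and :45 past every hour, labelled as HH:MM
ax.xaxis.set_major_locator(md.MinuteLocator(byminute=[0, 15, 30, 45]))
ax.xaxis.set_major_formatter(md.DateFormatter("%H:%M"))
fig.canvas.draw()  # compute ticks and labels
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```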
Code
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as md
df = pd.read_csv("saveData2020.csv")
#make new columns in dataframe so data can be filtered
df["New_Date"] = pd.to_datetime(df["date"]).dt.date
df["New_Time"] = pd.to_datetime(df["date"]).dt.time
df["New_hrs"] = pd.to_datetime(df["date"]).dt.hour
df["New_mins"] = pd.to_datetime(df["date"]).dt.minute
df["day"] = pd.DatetimeIndex(df['New_Date']).day
#filter the data to be only day 18
ndf = df[df["day"] == 18].copy()  # copy, so the next assignment does not modify a slice of df
ndf["New_Time"] = pd.to_datetime(ndf["New_Time"].astype(str), format="%H:%M:%S")
#display dataframe in console
pd.set_option('display.max_rows', ndf.shape[0]+1)
print(ndf.head(10))
#plot a graph
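# x_compat=True stops pandas from adjusting the time axis itself, so the md locator/formatter below apply
ndf.plot(kind='line', x='New_Time', y='temp', color='red', x_compat=True)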
#edit graph to be sexy
plt.setp(plt.gca().xaxis.get_majorticklabels(),'rotation', 30)
plt.xlabel("time")
plt.ylabel("temp in C")
plt.gca().xaxis.set_major_locator(md.MinuteLocator(byminute = [0, 15, 30, 45]))
plt.gca().xaxis.set_major_formatter(md.DateFormatter('%H:%M'))
#show graph with the sexiness edits
plt.show()
Plot
Notes
If you do not need the "New_Date", "New_Time", "New_hrs", "New_mins" and "day" columns for purposes other than plotting, you can use a shorter version of the above code, dropping those columns and applying the day filter directly to the "date" column:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
df = pd.read_csv("saveData2020.csv")
# convert date from string to datetime
df["date"] = pd.to_datetime(df["date"], format = "%Y-%m-%d %H:%M:%S")
#filter the data to be only day 18
ndf = df[df["date"].dt.day == 18]
#display dataframe in console
pd.set_option('display.max_rows', ndf.shape[0]+1)
print(ndf.head(10))
#plot a graph
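# x_compat=True stops pandas from adjusting the time axis itself, so the md locator/formatter below apply
ndf.plot(kind='line', x='date', y='temp', color='red', x_compat=True)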
#edit graph to be sexy
plt.setp(plt.gca().xaxis.get_majorticklabels(),'rotation', 30)
plt.xlabel("time")
plt.ylabel("temp in C")
plt.gca().xaxis.set_major_locator(md.MinuteLocator(byminute = [0, 15, 30, 45]))
plt.gca().xaxis.set_major_formatter(md.DateFormatter('%H:%M'))
#show graph with the sexiness edits
plt.show()
This code will reproduce exactly the same plot as before.
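One caveat on the day filter: `dt.day == 18` keeps the 18th of every month in the file. If the CSV covers more than one month, selecting a full calendar date is more specific; one way, sketched here on the question's sample rows, is partial-string indexing on a `DatetimeIndex`:

```python
import io
import pandas as pd

# The sample rows from the question
csv = """date,temp,humid
2020-10-17 23:50:02,20.57,87.5
2020-10-17 23:55:02,20.57,87.5
2020-10-18 00:00:02,20.55,87.31
2020-10-18 00:05:02,20.54,87.17
2020-10-18 00:10:02,20.54,87.16
2020-10-18 00:15:02,20.52,87.22
2020-10-18 00:20:02,20.5,87.24
2020-10-18 00:25:02,20.5,87.24
"""
df = pd.read_csv(io.StringIO(csv), parse_dates=["date"], index_col="date")

# All rows whose timestamp falls on 18 October 2020, at any time of day
day = df.loc["2020-10-18"]
print(day)
```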
I have a simple dataframe with two columns, 'date' and 'amount'. I want to plot the amount using date as the x-axis. The first lines of the data are:
22/05/2018,52068.67
21/05/2018,52159.19
15/05/2018,52744.03
08/05/2018,54666.21
08/05/2018,54677.51
01/05/2018,53890.59
30/04/2018,54812.25
27/04/2018,52258.23
26/04/2018,52351.47
23/04/2018,49777.04
23/04/2018,49952.44
23/04/2018,49992.44
05/04/2018,53238.59
03/04/2018,53631.09
03/04/2018,53839.64
28/03/2018,50836.78
26/03/2018,51206.67
26/03/2018,51372.02
14/03/2018,51110.17
12/03/2018,51411.31
06/03/2018,51169.91
05/03/2018,51374.57
27/02/2018,48728.85
27/02/2018,48730.5
16/02/2018,44988.25
14/02/2018,41948.03
12/02/2018,43776.31
12/02/2018,43800.31
12/02/2018,43840.11
05/02/2018,29358.96
26/01/2018,39491.0
24/01/2018,36470.03
23/01/2018,36562.76
23/01/2018,36616.61
22/01/2018,36582.46
22/01/2018,36665.71
22/01/2018,36743.31
17/01/2018,36965.3
16/01/2018,37044.6
09/01/2018,42083.65
08/01/2018,42183.39
05/01/2018,42285.41
03/01/2018,41537.51
03/01/2018,41579.51
02/01/2018,41945.32
27/12/2017,43003.33
27/12/2017,43217.29
18/12/2017,38208.63
15/12/2017,38315.53
However, the plot gives me points that don't appear in the data. For example, in May 2018 there is no value near 30000.
My code is:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("test.csv", header=None, names =['date', 'amount'])
df['time'] = pd.to_datetime(df['date'])
df.set_index(['time'],inplace=True)
df['amount'].plot()
plt.show()
What am I doing wrong?
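You need to convert the dates to datetime using the correct (day-first) format, and then use the pandas plot. Without a format, older pandas versions (before 2.0) parse each string on its own and read it month-first whenever possible, so 05/02/2018 (5 February) becomes 2 May 2018; that is the stray value near 30000 in May.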
df['date'] = pd.to_datetime(df['date'], format = '%d/%m/%Y')
df.plot('date', 'amount')
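A self-contained check using a few of the rows from the question: with the explicit format, 05/02/2018 is read as 5 February, and sorting by date (the file is newest-first) makes the line run left to right in time:

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# A subset of the rows from the question
csv = """22/05/2018,52068.67
08/05/2018,54666.21
05/02/2018,29358.96
26/01/2018,39491.0
15/12/2017,38315.53
"""
df = pd.read_csv(io.StringIO(csv), header=None, names=["date", "amount"])

# Day first: 05/02/2018 is 5 February 2018, not 2 May
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df = df.sort_values("date")

ax = df.plot(x="date", y="amount")
plt.show()
```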
Newbie question, thank you in advance!
I'm trying to group the data by both date and industry and display a chart that shows the different industry revenue numbers across the time series in monthly increments.
I am working from a SQL export that has timestamps, and I'm having a bear of a time getting this to work.
Posted sample csv data file here:
https://drive.google.com/open?id=0B4xdnV0LFZI1WGRMN3AyU2JERVU
Here's a small data example:
Industry,Date,Revenue
Fast Food,01-05-2016 12:18:02,100
Fine Dining,01-08-2016 09:17:48,110
Carnivals,01-18-2016 10:48:52,200
My failed attempt is here:
import pandas as pd
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('2012_to_12_27_2016.csv')
df['Ship_Date'] = pd.to_datetime(df['Ship_Date'], errors = 'coerce')
df['Year'] = df.Ship_Date.dt.year
df['Ship_Date'] = pd.DatetimeIndex(df.Ship_Date).normalize()
df.index = df['Ship_Date']
df_skinny = df[['Shipment_Piece_Revenue', 'Industry']]
groups = df_skinny[['Shipment_Piece_Revenue', 'Industry']].groupby('Industry')
groups = groups.resample('M').sum()
groups.index = df['Ship_Date']
fig, ax = plt.subplots()
groups.plot(ax=ax, legend=False)
names = [item[0] for item in groups]
ax.legend(ax.lines, names, loc='best')
plt.show()
You could use Series.unique to get a list of all industries and then, using DataFrame.loc, define a new DataFrame that contains only the data for a single industry.
Then, if we set the Ship Date column as the index of the new DataFrame, we can use resample with a monthly frequency and call sum() on the Revenue column to get the total revenue for each month.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Graph_Sample_Data.csv')
df['Ship Date'] = pd.to_datetime(df['Ship Date'], errors='coerce')
df = df.dropna(subset=['Ship Date'])  # drop rows whose date could not be parsed
fig, ax = plt.subplots()
for industry in df.Industry.unique():
    # rows for this industry only, indexed by ship date
    industry_df = df.loc[df.Industry == industry].set_index('Ship Date')
    # total revenue per month for this industry
    monthly = industry_df['Revenue'].resample('M').sum()
    monthly.plot(ax=ax, label=industry)
ax.legend(loc='best')
plt.show()
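A different route to the same chart, not the answer's loop: group by month and industry in one step with `pd.Grouper`, then `unstack` so each industry becomes a column that `DataFrame.plot` draws as its own line. The sketch uses the three sample rows from the question and assumes the column names `Industry`, `Date` and `Revenue` shown there (the real export's names differ); `freq='MS'` labels each month by its first day:

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question
csv = """Industry,Date,Revenue
Fast Food,01-05-2016 12:18:02,100
Fine Dining,01-08-2016 09:17:48,110
Carnivals,01-18-2016 10:48:52,200
"""
df = pd.read_csv(io.StringIO(csv))
# The sample dates are month-day-year
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y %H:%M:%S")

# Total revenue per (calendar month, industry) pair, one column per industry
monthly = (df.groupby([pd.Grouper(key="Date", freq="MS"), "Industry"])["Revenue"]
             .sum()
             .unstack("Industry"))

# One line per industry column; the legend is built from the column names
ax = monthly.plot()
ax.set_ylabel("Revenue")
plt.show()
```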