I'm currently trying out matplotlib using the object-oriented interface. I'm still new to this tool.
This is the end result (made in Excel) that I want to recreate using matplotlib.
I have loaded the table into a DataFrame, which looks like this.
Below is the code I wrote.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
loaddf = pd.read_excel("C:\\SampleRevenue.xlsx")
#to get the row on number of tickets
count = loaddf.iloc[0]
#to get the total proceeds I get from selling the tickets
vol = loaddf.iloc[1]
#The profit from tickets after deducting costs
profit = loaddf.iloc[2]
fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(str(count), list(loaddf.columns.values))
Somehow this is the graph I received. How do I display the number of tickets in bar form for each month? The intention is to have the number of tickets on the y axis and the months on the x axis.
This is the count, vol and profit series after using iloc to extract the rows.
Do I need to remove the series before I use it for plotting?
What's happening is that read_excel misreads the sheet because the table is transposed relative to what pandas expects. It expects the first row to hold the column titles and each subsequent row to hold one entry. Optionally, the first column can hold the row labels; in that case, you have to add index_col=0 to the parameters of read_excel. If you transpose the table in Excel (copy, then paste with transpose), something like this could work:
import pandas as pd
import matplotlib.pyplot as plt
loaddf = pd.read_excel("C:\\SampleRevenue_transposed.xlsx", index_col=0)
loaddf[["Vol '000"]].plot(kind='bar', title="Bar plot of Vol '000")
plt.show()
If you don't transpose the sheet, the header row ends up as part of the data, which causes the "no numeric data to plot" message.
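If transposing in Excel is inconvenient, the same reshaping can be done in pandas. The following is a minimal sketch, assuming a layout like the one in the question (one row per measure, one column per month). The small in-memory frame, its labels and its numbers are made up for illustration and stand in for the result of read_excel(..., index_col=0) on the untransposed sheet.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for pd.read_excel(path, index_col=0) on the untransposed
# sheet: one row per measure, one column per month.
wide = pd.DataFrame(
    {"Jan": [10, 200, 50], "Feb": [12, 240, 65], "Mar": [9, 180, 40]},
    index=["Count", "Vol '000", "Profit '000"],
)

# Transpose so that each month becomes a row and each measure a numeric column.
by_month = wide.T

# Object-oriented interface: months on the x axis, ticket counts on the y axis.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(by_month.index, by_month["Count"])
ax.set_xlabel("Month")
ax.set_ylabel("Number of tickets")
```

Call plt.show() afterwards to display the figure. Using ax.bar rather than ax.barh gives vertical bars, which matches the layout asked for in the question.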
So I have a pandas DataFrame with patient id, date of visit, location, weight, and heart rate. I need to plot a line of the number of visits at one location in the dataset over a period of 12 months, with the month number on the horizontal axis.
Any other suggestions about how I may go about this?
I tried splitting the data into 3 datasets and then plotting the number of visits counted from each one, but creating new columns and assigning the values wasn't working. It only worked when I was plotting the values of all of the clinics together; after splitting the data into 3 DataFrames, it stopped working.
Here is a working example of filtering a DataFrame and using the filtered results to plot a chart.
import pandas as pd
import matplotlib.pyplot as plt
# larger dataframe example
d = {'x values':[1,2,3,4,5,6,7,8,9],'y values':[2,4,6,8,10,12,14,16,18]}
df = pd.DataFrame(d)
# apply filter
df = df[df['x values'] < 5]
# plot chart
plt.plot(df['x values'], df['y values'])
plt.show()
result:
Simply place your data into an ndarray and plot it with matplotlib.pyplot, or plot directly from a DataFrame column, for example plt.plot(df['something']).
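Applied to the visits question, the same filter-then-plot idea could look roughly like the sketch below. The column names ("date", "location"), the location label "A" and all the rows are assumptions made up for illustration, since the original DataFrame is not shown.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample data; column names and values are assumptions.
visits = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "date": pd.to_datetime([
        "2020-01-05", "2020-01-20", "2020-02-11",
        "2020-02-15", "2020-03-02", "2020-03-09",
    ]),
    "location": ["A", "B", "A", "A", "B", "A"],
})

# Keep only the rows for one location.
one_clinic = visits[visits["location"] == "A"]

# Count visits per month number; months with no visits are filled with 0.
per_month = (
    one_clinic.groupby(one_clinic["date"].dt.month)
    .size()
    .reindex(range(1, 13), fill_value=0)
)

fig, ax = plt.subplots()
ax.plot(per_month.index, per_month.values, marker="o")
ax.set_xticks(range(1, 13))
ax.set_xlabel("Month number")
ax.set_ylabel("Number of visits")
```

Counting with groupby avoids creating new columns on a filtered copy, which may be where the three-DataFrame approach ran into trouble.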
newbie programmer here:
I have this big data set (Excel file) on gas, hydro, and water bills per unit from 2017 to 2020 for each building.
Basically, the first column is the date column, and each subsequent column has the building name as the title of the column which contains the cost/unit for that particular building.
So there are 61 buildings, hence 61 columns, plus the date column, bringing the total number of columns to 62. I am trying to make 61 individual plots of "cost/unit vs time", one per building, with the cost/unit on the y axis and the date (time) on the x axis.
I think I am getting the plots right; I just can't figure out why my dates don't show up the way they should on the x axis.
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stat
import numpy as np
import math as mt
import matplotlib.dates as mdates
from datetime import datetime
df1 = pd.read_csv('Gas Costs.csv')
df1['Date'] = pd.to_datetime(df1['Date'], format='%m-%y')
df1 = df1.set_index('Date')
for column in df1:
    columnSeriesObj = df1[column]
    plt.plot(columnSeriesObj.values)
    plt.gcf().autofmt_xdate()
    plt.show()
By doing this, I get 61 plots, one of which looks like this:
Cost/unit v/s time plot for one of the buildings
I also wish to give each plot a title stating the building name, but I am unable to do so. I tried doing it in the for loop but didn't have much luck with it.
Any help on this will be appreciated!
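One likely cause, judging from the code above, is that plt.plot(columnSeriesObj.values) passes only the y values, so matplotlib numbers the x axis 0, 1, 2, ... instead of using the dates. A minimal sketch of plotting against the date index and titling each plot with the column name follows. The two building names and all the numbers are made up and stand in for the contents of 'Gas Costs.csv'.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for pd.read_csv('Gas Costs.csv'); names and values are assumptions.
df1 = pd.DataFrame({
    "Date": ["01-17", "02-17", "03-17", "04-17"],
    "Building A": [1.2, 1.3, 1.1, 1.4],
    "Building B": [0.9, 1.0, 1.2, 1.1],
})
df1["Date"] = pd.to_datetime(df1["Date"], format="%m-%y")
df1 = df1.set_index("Date")

figures = {}
for column in df1.columns:
    fig, ax = plt.subplots()
    # Pass the datetime index as x so the axis shows dates, not row numbers.
    ax.plot(df1.index, df1[column])
    ax.set_title(column)  # the column name is the building name
    ax.set_xlabel("Date")
    ax.set_ylabel("Cost per unit")
    fig.autofmt_xdate()
    figures[column] = fig
```

Call plt.show() once after the loop to display all the figures, or call it inside the loop to step through them one at a time.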
I am trying to plot columns of data from a .csv file in a boxplot/violin plot using matplotlib.pyplot.
When setting the dataframe [df] to one column of data, the plotting works fine. However, once I try to plot two columns, no plot is generated and the code seems to just keep running, so I think the problem is in how I am passing the data along. Each column is 54,500 rows long.
import os
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from pandas import read_csv
os.chdir(r"some_directory//")
df = read_csv(r"csv_file.csv")
# the csv file is 7 columns x 54500 rows, only concerned with two columns
df = df[['surge', 'sway']]
# re-size the dataframe to only use two columns
data = df[['surge', 'sway']]
#print data to just to confirm
print(data)
plt.violinplot(data, vert=True, showmeans=True, showmedians=True)
plt.show()
If I change the data line to data = df['surge'] I get a perfect plot with the 54501 surge values.
When I introduce the second variable as data = df[['surge', 'sway']], the program hangs. I should note that the same problem exists if I set data = df[['surge']], so I think it has something to do with the double brackets and going from a list to an array, perhaps?
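The cause of the hang cannot be confirmed from the question alone, but one thing worth trying is to hand violinplot a plain list of 1-D arrays, one per column, instead of a two-column DataFrame. The sketch below uses randomly generated numbers in place of the real 'surge' and 'sway' data, which are not available here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in data; the real CSV is not available.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "surge": rng.normal(0.0, 1.0, 500),
    "sway": rng.normal(2.0, 0.5, 500),
})

columns = ["surge", "sway"]
# One 1-D NumPy array per violin.
datasets = [df[name].to_numpy() for name in columns]

fig, ax = plt.subplots()
# Violins are vertical by default and are placed at x = 1, 2, ...
parts = ax.violinplot(datasets, showmeans=True, showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(columns)
```

Passing df[columns].to_numpy() should behave the same way, since violinplot treats each column of a 2-D array as one dataset.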
I have two time-series datasets that I want to make a step-chart of.
The time series data is between Monday 2015-04-20 and Friday 2015-04-24.
The first dataset contains 26337 rows with values ranging from 0-1.
The second dataset contains 80 rows with values between 0-4.
First dataset represents motion sensor values in a room, with around 2-3 minutes between each measurement. 1 indicates the room is occupied, 0 indicates that it is empty. The second contains data from a survey where users could fill in how many people were in the same room, at the time they were answering the survey.
Now I want to compare this data, to find out how well the sensor performs. Obviously there is a lot of data that is "missing" in the second set. Is there a way to fill in the "blanks" in a step chart?
Each row has the following format:
Header
Timestamp (%Y-%m-%d %H:%M:%S),value
Example:
Time,Occupancy
24-04-2015 21:40:33,1
24-04-2015 21:43:11,0
.....
So far I have managed to import the first dataset and make a plot of it. Unfortunately the x-axis is not showing dates, but a lot of numbers:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
data = open('PIRDATA.csv')
ts = pd.Series.from_csv(data, sep=',')
plot(ts);
Result:
How should I proceed from here?
Try to use Pandas to read the data, using the Date column as the index (parsing the values to dates).
data = pd.read_csv('PIRDATA.csv', index_col=0, parse_dates=True)
To achieve your step chart objective, try:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.dates import DateFormatter
from matplotlib.dates import HourLocator
small_dataset = pd.read_csv('SURVEY_RESULTS_WEEK1.csv', header=0, index_col=0, parse_dates=True)
big_dataset = pd.read_csv('PIRDATA_RAW_CONVERTED_DATETIME.csv', header=0, index_col=0, parse_dates=True)
small_dataset.rename(columns={'Occupancy': 'Survey'}, inplace=True)
big_dataset.rename(columns={'Occupancy': 'PIR'}, inplace=True)
big = big_dataset.plot()
big.xaxis.set_major_formatter(DateFormatter('%y-%m-%d H: %H'))
big.xaxis.set_major_locator(HourLocator(np.arange(0, 25, 6)))
big.set_ylabel('Occupancy')
small_dataset.plot(ax=big, drawstyle='steps')
fig = plt.gcf()
fig.suptitle('PIR and Survey Occupancy Comparison')
plt.show()
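For the question about filling in the "blanks", one option is a step plot that holds each value until the next timestamp. The sketch below is self-contained and uses a few made-up rows in the day-first format shown in the question's example; the survey rows and the helper name load are assumptions for illustration.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows in the same layout as the example in the question.
PIR_TEXT = (
    "Time,Occupancy\n"
    "24-04-2015 21:40:33,1\n"
    "24-04-2015 21:43:11,0\n"
    "24-04-2015 21:46:02,1\n"
)
SURVEY_TEXT = (
    "Time,Occupancy\n"
    "24-04-2015 21:41:00,2\n"
    "24-04-2015 21:45:00,0\n"
)

def load(csv_text, name):
    """Hypothetical helper: parse day-first timestamps and return a named Series."""
    frame = pd.read_csv(io.StringIO(csv_text))
    frame["Time"] = pd.to_datetime(frame["Time"], format="%d-%m-%Y %H:%M:%S")
    return frame.set_index("Time")["Occupancy"].rename(name)

pir = load(PIR_TEXT, "PIR")
survey = load(SURVEY_TEXT, "Survey")

fig, ax = plt.subplots()
# where="post" keeps each value constant until the next measurement arrives.
ax.step(pir.index, pir.values, where="post", label="PIR")
ax.step(survey.index, survey.values, where="post", label="Survey")
ax.set_ylabel("Occupancy")
ax.legend()
fig.autofmt_xdate()
```

Giving the format explicitly avoids any ambiguity between day-first and month-first dates. With real files, pass the file path to pd.read_csv in place of the io.StringIO object.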
I am still very new to Python, so this is likely an easy question, but I have yet to locate a satisfactory answer. I have data from five different sources which I am trying to plot in one script after loading the data from an Excel file into a single DataFrame. As it is now, I only know how to graph one source at a time or all 5 in a single figure (or somewhere between 1 and 5). Here is my code, the entire script. It may not all be necessary, but I have included it all just in case.
import numpy as np
import pandas as pd
from pandas import *
import matplotlib
import matplotlib.pyplot as plot
import datetime as datetime
from datetime import *
#Import data from Excel File
data2007 = pd.ExcelFile(r'f:\Python\Learning 19-4-2013\Data 2007.xls')
table2007 = data2007.parse('Sheet1', skiprows=[0,1,2,3,4,5], index=None)
#Plot data for first meter
ax = plot.figure(figsize=(7,4), dpi=100).add_subplot(111)
FirstMeter = table2007_3.columns[0]
Meter1 = table2007_3[FirstMeter]
Meter1.plot(ax=ax, style='-v')
#Plot data for second meter
SecondMeter = table2007_3.columns[1]
Meter2 = table2007_3[SecondMeter]
Meter2.plot(ax=ax, style='-v')
#Plot data for third meter
ThirdMeter = table2007_3.columns[2]
Meter3 = table2007_3[ThirdMeter]
Meter3.plot(ax=ax, style='v-')
#Plot data for fourth meter
FourthMeter = table2007_3.columns[3]
Meter4 = table2007_3[FourthMeter]
Meter4.plot(ax=ax, style='v-')
#Plot data for fifth meter
FifthMeter = table2007_3.columns[4]
Meter5 = table2007_3[FifthMeter]
Meter5.plot(ax=ax, style='v-')
#Command to show plots
plot.show()
I see you are making a new Series (e.g., Meter1) out of each column of your DataFrame and then plotting them individually on the same axes. Instead, you can plot the DataFrame itself. Pandas assumes you want to plot each column as a separate line on the same plot, which is exactly what you seem to be doing here.
table_2007.plot(style='v-')
or perhaps table_2007.iloc[:, 0:5].plot(style='v-') (the first five columns) if there are other columns which you need to leave out.
By default, it also generates a legend, which you can suppress with the keyword argument legend=False.
If you want separate plots, as the title of your question suggests, the subplots=True argument might get the job done. It draws each column on its own axes within a single figure.
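A small sketch of both variants follows: one figure with a subplot per meter, and one separate figure per meter. The meter names, dates and readings are made up for illustration, since the Excel file is not available.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the five-meter table loaded from Excel.
table = pd.DataFrame(
    {
        "Meter 1": [1.0, 2.0, 1.5, 2.5],
        "Meter 2": [0.5, 0.7, 0.6, 0.9],
        "Meter 3": [3.0, 2.8, 3.1, 3.3],
        "Meter 4": [1.1, 1.0, 1.2, 1.4],
        "Meter 5": [2.2, 2.1, 2.4, 2.0],
    },
    index=pd.date_range("2007-01-01", periods=4, freq="D"),
)

# Variant 1: one figure, one subplot (axes) per column.
axes = table.plot(subplots=True, style="v-", figsize=(7, 10), sharex=True)

# Variant 2: one separate figure per column.
separate = {}
for name in table.columns:
    fig, ax = plt.subplots(figsize=(7, 4))
    table[name].plot(ax=ax, style="v-", title=name)
    separate[name] = fig
```

With subplots=True, plot returns an array of axes, one per column, so each can be customized afterwards (for example axes[0].set_ylabel(...)).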