I have a data set of house prices - House Price Data. When I use a subset of the data in a NumPy array, I can plot it as this nice time series chart:
However, when I use the same data in a pandas Series, the chart goes all lumpy like this:
How can I create a smooth time series line graph (like the first image) using a pandas Series?
Here is what I am doing to get the nice-looking time series chart using a NumPy array (after importing numpy as np, pandas as pd and matplotlib.pyplot as plt):
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True) #pull in csv file, make index the date column and parse the dates
brixton = data[data['RegionName'] == 'Lambeth'] # pull out a subset for the region Lambeth
prices = brixton['AveragePrice'].values # create a numpy array of the average price values
plt.plot(prices) #plot
plt.show() #show
Here is what I am doing to get the lumpy one using a pandas Series:
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice']
plt.plot(prices_panda)
plt.show()
How do I make this second graph show as a nice smooth proper time series?
* This is my first StackOverflow question so please shout if I have left anything out or not been clear *
Any help greatly appreciated
When you passed parse_dates=True, pandas parsed the dates using its default convention, which is month-day-year. Your data follows the British convention, which is day-month-year. As a result, instead of having a data point for the first of every month, your plot shows data points for the first 12 days of January and a flat line for the rest of each year. You need to swap the day and month back, for example:
data.index = pd.to_datetime({'year':data.index.year,'month':data.index.day,'day':data.index.month})
The date format in the file you have is Day/Month/Year. In order for pandas to interpret this format correctly, you can use the option dayfirst=True inside the read_csv call.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data/UK-HPI-full-file-2017-08.csv',
                   index_col='Date', parse_dates=True, dayfirst=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice']
plt.plot(prices_panda)
plt.show()
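To see the effect of dayfirst in isolation, here is a minimal sketch that parses the same three day-first dates with and without the option. The inline CSV text and its prices are made up for illustration; only the Date and AveragePrice column names come from the question.

```python
import io

import pandas as pd

# Hypothetical three-row sample in day/month/year order (prices are made up).
csv_text = "Date,AveragePrice\n01/01/2017,100\n01/02/2017,110\n01/03/2017,120\n"

# Default parsing treats the first number as the month.
month_first = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# dayfirst=True treats the first number as the day.
day_first = pd.read_csv(io.StringIO(csv_text), index_col="Date",
                        parse_dates=True, dayfirst=True)

print(list(month_first.index.month))  # every row lands in January
print(list(day_first.index.month))    # one row per month
```

With the default, all three rows become consecutive days in January, which is the lumpy shape from the question; with dayfirst=True they fall on the first of January, February and March.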
Related
So I have a pandas DataFrame with patient id, date of visit, location, weight, and heart rate. I need to plot a line showing the number of visits at one location in the dataset over a period of 12 months, with the month number on the horizontal axis.
Any other suggestions about how I may go about this?
I tried splitting the data into 3 data sets and then graphing the number of visits counted from each one, but creating new columns and assigning the values wasn't working. It only worked when I was graphing the values for all of the clinics; after splitting the data into 3 DataFrames, it stopped working.
DataFrame
Here is a working example of filtering a DataFrame and using the filtered results to plot a chart.
import pandas as pd
import matplotlib.pyplot as plt
# larger dataframe example
d = {'x values':[1,2,3,4,5,6,7,8,9],'y values':[2,4,6,8,10,12,14,16,18]}
df = pd.DataFrame(d)
# apply filter
df = df[df['x values'] < 5]
# plot chart
plt.plot(df['x values'], df['y values'])
plt.show()
result:
Simply place your data into an ndarray and plot it with matplotlib.pyplot, or plot directly from a DataFrame column, for example plt.plot(df['something']).
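For the visits-per-month question specifically, one possible approach is to filter to a single location, group by the month number of the visit date, and count rows. This is only a sketch: the column names (patient_id, date, location), the location label "A" and all the records are assumptions, since the original DataFrame is not shown.

```python
import pandas as pd

# Made-up visit records; the column names are assumptions.
visits = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "date": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-03-02",
                            "2020-03-15", "2020-03-30", "2020-07-11"]),
    "location": ["A", "A", "A", "B", "A", "A"],
})

# Keep one location, then count rows per calendar month.
one_location = visits[visits["location"] == "A"]
per_month = (
    one_location.groupby(one_location["date"].dt.month)
    .size()
    .reindex(range(1, 13), fill_value=0)  # keep all 12 months, zero where no visits
)

print(per_month)
# per_month.plot() would then draw the line with the month number on the x-axis.
```

Reindexing to 1..12 keeps months without visits on the axis instead of silently dropping them.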
I have a DataFrame containing random data over 7 days, and each data point is indexed by a DatetimeIndex. I want to plot the data of each day on a single plot. My current attempt is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n =10000
i = pd.date_range('2018-04-09', periods=n, freq='1min')
ts = pd.DataFrame({'A': [np.random.randn() for i in range(n)]}, index=i)
dates = list(ts.index.map(lambda t: t.date()).unique())  # call date() to get the calendar date
for date in dates:
    ts['A'].loc[date.strftime('%Y-%m-%d')].plot()
The result is the following:
As you can see, because a DatetimeIndex is used, each point keeps its own date, which is why the days are drawn one after another instead of on top of each other.
Questions:
1- How can I fix the current code to have an x-axis which starts at midnight and ends at the next midnight?
2- Is there a pandas way to group the days better and plot them on a single day without using a for loop?
You can split the index into dates and times and unstack the ts into a dataframe:
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)
then plot all in one chart:
df.plot()
or in separate charts:
axs = df.plot(subplots=True, title=df.columns.tolist(), legend=False, figsize=(6,8))
axs[0].figure.execute_constrained_layout()
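To make the reshaping step easier to follow, here is a small self-contained run of the same set_index/unstack idea on three days of random data (the three-day length and the seeded generator are choices made for this sketch). It prints the shape of the result instead of plotting it.

```python
import numpy as np
import pandas as pd

# Three full days of one-minute samples, starting at midnight (random data).
n = 3 * 24 * 60
idx = pd.date_range("2018-04-09", periods=n, freq="1min")
ts = pd.DataFrame({"A": np.random.default_rng(0).standard_normal(n)}, index=idx)

# Rows become the time of day, columns become the calendar date.
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)

print(df.shape)                   # (1440, 3): 1440 minutes per day, 3 days
print(df.index[0], df.index[-1])  # 00:00:00 23:59:00
```

Because every column shares the same time-of-day index, df.plot() draws each day over the same midnight-to-midnight axis.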
I am currently trying out matplotlib using its object-oriented interface. I'm still new to this tool.
This is the end result (made in Excel) that I want to recreate using matplotlib.
I have loaded the table into a DataFrame, which looks like this.
Below is the code I wrote.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
loaddf = pd.read_excel("C:\\SampleRevenue.xlsx")
#to get the row on number of tickets
count = loaddf.iloc[0]
#to get the total proceeds I get from selling the tickets
vol = loaddf.iloc[1]
#The profit from tickets after deducting costs
profit = loaddf.iloc[2]
fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(str(count), list(loaddf.columns.values))
Somehow this is the graph I got. How do I display the number of tickets in bar form for each month? The intention is to have the number of tickets on the y-axis and the months on the x-axis.
These are the count, vol and profit Series after using iloc to extract the rows.
Do I need to remove the Series before I use them for plotting?
What's happening is that read_excel gets confused when the table is transposed. It expects the first row to hold the column titles and each subsequent row to hold one entry. Optionally, the first column contains the row labels; in that case, you have to add index_col=0 to the parameters of read_excel. If you copy and paste-transpose everything while in Excel, it could work like this:
import pandas as pd
import matplotlib.pyplot as plt
loaddf = pd.read_excel("C:\\SampleRevenue_transposed.xlsx", index_col=0)
loaddf[["Vol '000"]].plot(kind='bar', title="Bar plot of Vol '000")
plt.show()
If you don't transpose the Excel sheet, the header row becomes part of the data, which causes the "no numeric data to plot" message.
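If editing the workbook is not convenient, the transpose can also be done in pandas after loading. A minimal sketch, assuming the sheet has the measure names in the first column and the months in the header row, so that read_excel(..., index_col=0) yields a frame shaped like wide below. All numbers are made up, and the row labels other than Vol '000 are hypothetical.

```python
import pandas as pd

# Made-up numbers in the assumed layout of the sheet:
# one row per measure, one column per month.
wide = pd.DataFrame(
    {"Jan": [10, 50, 5], "Feb": [12, 60, 7], "Mar": [9, 45, 4]},
    index=["Number of tickets", "Vol '000", "Profit"],  # labels partly hypothetical
)

# .T swaps rows and columns: one row per month, one column per measure.
by_month = wide.T

print(by_month)
# by_month[["Number of tickets"]].plot(kind="bar") would then draw one bar per month.
```

After the transpose each measure is an ordinary numeric column, which is the shape the plotting call in the answer expects.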
Can someone help me with my problem? I am new to pandas and I am confused.
Initially I made some subset selections, and everything is OK with my new DataFrame (which is of type pandas.core.frame.DataFrame). It has two columns (date, count), and I want to draw a line plot with the date on the x-axis and the count on the y-axis.
Suppose the name of the DataFrame is df and the names of the columns are date and count. According to the pandas documentation, the command is:
ts = pd.Series(df['count'], index = df['date'])
ts.plot()
What is wrong here?
Any help is appreciated.
It's best to refer to the pandas website for first-hand information. However, you can try out the code below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # For show command
# Creating a dummy dataframe (You can also go ahead with Series)
df = pd.DataFrame([45, 20], columns=['count'], index=['12/11/2018', '10/1/2018'])
# Converting string to datetime format
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
df.index
# DatetimeIndex(['2018-11-12', '2018-01-10'], dtype='datetime64[ns]', freq=None)
df.plot()
plt.show()
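A likely reason the snippet in the question plots nothing useful: when pd.Series is given an existing Series together with an index argument, it aligns the data to the new labels rather than relabelling it. Since df['count'] is labelled 0, 1, ... and the requested labels are dates, nothing matches and every value becomes NaN. The sketch below uses a made-up two-row frame to show this and two ways around it.

```python
import pandas as pd

# Small stand-in for the asker's frame (values are made up).
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-10", "2018-11-12"]),
    "count": [20, 45],
})

# df['count'] is labelled 0, 1; asking for date labels finds no matches.
misaligned = pd.Series(df["count"], index=df["date"])
print(misaligned.isna().all())  # True

# Either move the date column into the index ...
ts = df.set_index("date")["count"]
# ... or pass the raw values so that no label alignment happens.
ts_alt = pd.Series(df["count"].to_numpy(), index=df["date"])

print(ts.tolist())  # [20, 45]
```

Calling ts.plot() on either working Series then puts the dates on the x-axis and the counts on the y-axis.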
I have two time-series datasets that I want to make a step-chart of.
The time series data is between Monday 2015-04-20 and Friday 2015-04-24.
The first dataset contains 26337 rows with values ranging from 0-1.
The second dataset contains 80 rows with values between 0-4.
First dataset represents motion sensor values in a room, with around 2-3 minutes between each measurement. 1 indicates the room is occupied, 0 indicates that it is empty. The second contains data from a survey where users could fill in how many people were in the same room, at the time they were answering the survey.
Now I want to compare this data, to find out how well the sensor performs. Obviously there is a lot of data that is "missing" in the second set. Is there a way to fill in the "blanks" in a step chart?
Each row has the following format:
Header
Timestamp (%Y-%m-%d %H:%M:%S),value
Example:
Time,Occupancy
24-04-2015 21:40:33,1
24-04-2015 21:43:11,0
.....
So far I have managed to import the first dataset and make a plot of it. Unfortunately the x-axis is not showing dates, but a lot of numbers:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
data = open('PIRDATA.csv')
ts = pd.Series.from_csv(data, sep=',')
plt.plot(ts)
Result:
How should I proceed from here?
Try to use Pandas to read the data, using the Date column as the index (parsing the values to dates).
data = pd.read_csv('PIRDATA.csv', index_col=0, parse_dates=True)
To achieve your step chart objective, try:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.dates import DateFormatter
from matplotlib.dates import HourLocator
small_dataset = pd.read_csv('SURVEY_RESULTS_WEEK1.csv', header=0, index_col=0, parse_dates=True)
big_dataset = pd.read_csv('PIRDATA_RAW_CONVERTED_DATETIME.csv', header=0, index_col=0, parse_dates=True)
small_dataset.rename(columns={'Occupancy': 'Survey'}, inplace=True)
big_dataset.rename(columns={'Occupancy': 'PIR'}, inplace=True)
big = big_dataset.plot()
big.xaxis.set_major_formatter(DateFormatter('%y-%m-%d H: %H'))
big.xaxis.set_major_locator(HourLocator(np.arange(0, 25, 6)))
big.set_ylabel('Occupancy')
small_dataset.plot(ax=big, drawstyle='steps')
fig = plt.gcf()
fig.suptitle('PIR and Survey Occupancy Comparison')
plt.show()
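On the part of the question about filling in the "blanks" of the sparse survey data, one option is to carry each survey answer forward onto the sensor timestamps. This is a sketch under a modelling assumption that is not in the original: a survey answer is treated as valid until the next answer arrives. The timestamps and values below are made up; only the PIR and Survey column names follow the answer above.

```python
import pandas as pd

# Made-up stand-ins for the two datasets.
pir = pd.DataFrame(
    {"PIR": [1, 0, 1, 1, 0]},
    index=pd.to_datetime(["2015-04-24 21:40", "2015-04-24 21:43",
                          "2015-04-24 21:46", "2015-04-24 21:49",
                          "2015-04-24 21:52"]),
)
survey = pd.DataFrame(
    {"Survey": [2, 0]},
    index=pd.to_datetime(["2015-04-24 21:42", "2015-04-24 21:50"]),
)

# For each sensor timestamp, take the most recent survey answer at or before it.
# Both indexes must be sorted for method="ffill" to work.
survey_on_pir = survey.reindex(pir.index, method="ffill")
combined = pir.join(survey_on_pir)

print(combined)
# combined.plot(drawstyle="steps-post") would draw both columns as step lines.
```

Sensor readings taken before the first survey answer stay NaN, so the comparison only covers the period for which a survey value is actually known.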