I'm trying to plot the x-axis values from the top row of my dataframe and the y-axis values from another row of the same dataframe.
My dataframe looks like this:
sector_data =
Time 13:00 13:15 13:30 13:45
Utilities 1235654 1456267 1354894 1423124
Transports 506245 554862 534685 524962
Telecomms 142653 153264 162357 154698
I've tried a lot of different things, and this one seems to make the most sense, but nothing works:
sector_data.plot(kind='line',x='Time',y='Utilities')
plt.show()
I keep getting:
KeyError: 'Time'
It should end up looking like this:
[Image: expected chart]
Given the little information you provided, I believe this should help:
df = sector_data.T
df.plot(kind='line',x='Time',y='Utilities')
plt.show()
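A self-contained sketch of that transpose approach, assuming 'Time', 'Utilities', and so on are row labels (the index) of sector_data, which would explain the KeyError; the reconstruction of sector_data below is an assumption based on the printout. Because each original column mixes a time string with integers, the transposed columns can come out with object dtype, and pandas may then refuse to plot them ("no numeric data to plot"). infer_objects() converts such columns back to numbers.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout: the labels are the row index, the times are one row of data.
sector_data = pd.DataFrame(
    [["13:00", "13:15", "13:30", "13:45"],
     [1235654, 1456267, 1354894, 1423124],
     [506245, 554862, 534685, 524962],
     [142653, 153264, 162357, 154698]],
    index=["Time", "Utilities", "Transports", "Telecomms"],
)

# Transpose so each former row ('Time', 'Utilities', ...) becomes a column.
df = sector_data.T
# Mixed rows leave every column as object dtype; let pandas re-infer the
# numeric columns so there is numeric data to plot.
df = df.infer_objects()

ax = df.plot(kind="line", x="Time", y="Utilities")
plt.show()
```

If the real dataframe already has 'Time' as a column, the transpose step is not the issue, and printing sector_data.columns and sector_data.index is a quick way to see which case applies.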
This is how I built a test case (the dataframe is already in transposed form):
import pandas as pd
import matplotlib.pyplot as plt
a = {'Time':['13:00','13:15','13:30','13:45'],'Utilities':[1235654,1456267,1354894,1423124],'Transports':[506245,554862,534685,524962],'Telecomms':[142653,153264,162357,154698]}
df = pd.DataFrame(a)
df.plot(kind='line',x='Time',y='Utilities')
plt.show()
Output: [image of the resulting line chart]
Let's take an example DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'ColA':['Time','Utilities','Transports','Telecomms'],'ColB':['13:00', 1235654, 506245, 142653],'ColC':['14:00', 1234654, 506145, 142650], 'ColD':['15:00', 4235654, 906245, 142053],'ColE':['16:00', 4205654, 906845, 742053]})
df = df.set_index('ColA')  # use the ColA labels (Time, Utilities, ...) as the row index
Now you can plot one row against another with matplotlib:
plt.plot(df.loc['Time'].values, df.loc['Utilities'].values)
plt.show()
newbie programmer here:
I have this big data set (Excel file) on gas, hydro, and water bills per unit from 2017 to 2020 for each building.
Basically, the first column is the date column, and each subsequent column has the building name as the title of the column which contains the cost/unit for that particular building.
So there are 61 buildings, hence 61 columns, plus the date column, bringing the total number of columns to 62. I am trying to make 61 individual plots of "cost/unit vs. time", with cost/unit on the y axis and the date on the x axis.
I think I am getting the plots right; I just can't figure out why the dates don't show up correctly on the x axis.
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stat
import numpy as np
import math as mt
import matplotlib.dates as mdates
from datetime import datetime
df1 = pd.read_csv('Gas Costs.csv')
df1['Date'] = pd.to_datetime(df1['Date'], format='%m-%y')
df1 = df1.set_index('Date')
for column in df1:
    columnSeriesObj = df1[column]
    plt.plot(columnSeriesObj.values)
    plt.gcf().autofmt_xdate()
    plt.show()
By doing this, I get 61 plots, one of which looks like this:
[Image: cost/unit vs. time plot for one of the buildings]
I would also like to give each plot a title with the building name, but I haven't managed to. I tried doing it inside the for loop but didn't have much luck.
Any help on this will be appreciated!
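A possible fix, sketched under stated assumptions. plt.plot(columnSeriesObj.values) passes only y values, so matplotlib plots them against 0, 1, 2, ... rather than the dates; passing the DatetimeIndex as x puts the dates on the axis, and the loop variable (the column header, i.e. the building name) can be used as the title. The building names and costs below are hypothetical stand-ins for 'Gas Costs.csv'; the '%m-%y' date format is taken from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for 'Gas Costs.csv': a Date column (MM-YY) plus one
# cost/unit column per building.
df1 = pd.DataFrame({
    "Date": ["01-17", "02-17", "03-17", "04-17"],
    "Building A": [1.10, 1.25, 1.18, 1.30],
    "Building B": [0.95, 0.97, 1.02, 0.99],
})
df1["Date"] = pd.to_datetime(df1["Date"], format="%m-%y")
df1 = df1.set_index("Date")

for column in df1:
    columnSeriesObj = df1[column]
    fig, ax = plt.subplots()  # a fresh figure for each building
    # Use the DatetimeIndex as x so the axis shows dates, not 0, 1, 2, ...
    ax.plot(columnSeriesObj.index, columnSeriesObj.values)
    ax.set_title(column)  # the column header is the building name
    ax.set_xlabel("Date")
    ax.set_ylabel("Cost/unit")
    fig.autofmt_xdate()
    plt.show()
```

Creating the figure explicitly with plt.subplots() keeps each building's line and title on its own axes; columnSeriesObj.plot(title=column) is a shorter pandas alternative that also uses the index as x.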
My pivoted dataframe looks like this:
df=
DATA
Type P_A P_B
Time
11:38:56 500706.0 981098.0
11:39:46 501704.0 984751.0
11:40:26 501704.0 984737.0
11:43:18 502758.0 987173.0
I want to plot this dataframe. df.plot() works, but the two columns are on very different scales, so they need to be plotted on different y axes. How can I do that?
DataFrame.plot has a secondary_y option:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Time':['11:38:56', '11:39:46', '11:40:26', '11:43:18'],
                   'P_A': [500706., 501704., 501704., 502758.],
                   'P_B': [981098., 984751., 984737., 987173.]})
# P_B gets its own y axis on the right-hand side
df.plot(x='Time', y=['P_A','P_B'], secondary_y=['P_B'])
plt.show()
Alternatively, plot the columns one at a time and pass secondary_y=True for the second one:
import matplotlib.pyplot as plt
df['P_A'].plot()
df['P_B'].plot(secondary_y=True)
plt.show()
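The question's printout, with DATA above the P_A/P_B headers, suggests the pivot produced two-level columns, in which case df['P_A'] would likely raise a KeyError. A minimal sketch, assuming the pivot was built with DataFrame.pivot and no values argument from hypothetical long-format data with Time, Type, and DATA columns (the numbers are the ones shown in the question): drop the outer column level first, then use secondary_y.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format source data (values copied from the question).
raw = pd.DataFrame({
    "Time": ["11:38:56", "11:38:56", "11:39:46", "11:39:46",
             "11:40:26", "11:40:26", "11:43:18", "11:43:18"],
    "Type": ["P_A", "P_B"] * 4,
    "DATA": [500706.0, 981098.0, 501704.0, 984751.0,
             501704.0, 984737.0, 502758.0, 987173.0],
})

# Pivoting without `values` keeps 'DATA' as an outer column level.
df = raw.pivot(index="Time", columns="Type")
# Drop that outer level so the columns are plain 'P_A' and 'P_B'.
df.columns = df.columns.droplevel(0)

# P_B is drawn against a second y axis on the right.
ax = df.plot(secondary_y=["P_B"])
plt.show()
```

The returned axes object is the left (primary) one; pandas attaches the right-hand axis as ax.right_ax, which can be used to label it separately.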
Can someone help me with my problem? I am new to pandas and I'm confused.
Initially I made some subset selections, and everything was OK with my new dataframe (which is of type pandas.core.frame.DataFrame). It has two columns (date, count), and I want to make a line plot with the date on the x axis and the count on the y axis.
Suppose the dataframe is named df and its columns are named date and count. According to the pandas documentation, the command is:
ts = pd.Series(df['count'], index = df['date'])
ts.plot()
What am I doing wrong?
Any help is appreciated.
It's best to refer to the pandas website for first-hand information. However, you can try the code below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # For show command
# Creating a dummy dataframe (You can also go ahead with Series)
df = pd.DataFrame([45, 20], columns=['count'], index=['12/11/2018', '10/1/2018'])
# Converting string to datetime format
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
df.index
# DatetimeIndex(['2018-11-12', '2018-01-10'], dtype='datetime64[ns]', freq=None)
df.plot()
plt.show()
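The answer above does not say why the question's two lines misbehave. A likely cause: when pd.Series receives an existing Series together with an index, it aligns on labels, so it looks up each date in df['count']'s integer index, finds nothing, and fills in NaN. A minimal sketch with hypothetical data, plus two ways to build the intended series:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data using the column names from the question.
df = pd.DataFrame({"date": ["2018-10-01", "2018-11-12"], "count": [20, 45]})
df["date"] = pd.to_datetime(df["date"])

# Passing a Series plus an index makes pandas reindex that Series: the dates
# are looked up in df['count']'s integer index (0, 1), none match, so every
# value becomes NaN.
broken = pd.Series(df["count"], index=df["date"])
print(broken.isna().all())  # True

# Either move the dates into the index...
ts = df.set_index("date")["count"]
# ...or pass the raw values so no label alignment happens.
ts_alt = pd.Series(df["count"].values, index=df["date"])

ts.plot()
plt.show()
```

df.set_index('date')['count'] is usually the more readable option, since it keeps the column name as the series name.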
I have a data set of house prices (House Price Data). When I use a subset of the data as a NumPy array, I can plot it as a nice, smooth time-series chart.
However, when I use the same data as a pandas Series, the chart comes out lumpy.
How can I create a smooth time-series line graph (like the first one) using a pandas Series?
Here is how I get the nice-looking time-series chart with a NumPy array (after importing numpy as np, pandas as pd, and matplotlib.pyplot as plt):
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True) #pull in csv file, make index the date column and parse the dates
brixton = data[data['RegionName'] == 'Lambeth'] # pull out a subset for the region Lambeth
prices = brixton['AveragePrice'].values # create a numpy array of the average price values
plt.plot(prices) #plot
plt.show() #show
Here is how I get the lumpy one using a pandas Series:
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice']
plt.plot(prices_panda)
plt.show()
How do I make this second graph show as a nice smooth proper time series?
* This is my first StackOverflow question so please shout if I have left anything out or not been clear *
Any help greatly appreciated
When you passed parse_dates=True, pandas parsed the dates using its default convention, which is month-day-year. Your data follows the British convention, which is day-month-year. As a result, instead of a data point for the first of every month, your plot shows data points for the first 12 days of January and a flat line for the rest of each year. You need to fix the dates, for example:
data.index = pd.to_datetime({'year':data.index.year,'month':data.index.day,'day':data.index.month})
The date format in your file is Day/Month/Year. For pandas to interpret this format correctly, you can pass the option dayfirst=True to the read_csv call.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data/UK-HPI-full-file-2017-08.csv',
index_col='Date', parse_dates=True, dayfirst=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice']
plt.plot(prices_panda)
plt.show()
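A small check of the dayfirst behaviour, using a hypothetical three-row excerpt in the same Day/Month/Year layout (the region name is from the question; the prices are made up). Without dayfirst=True, 01/02/2017 is read as 2 January 2017; with it, as 1 February 2017.

```python
import io
import pandas as pd

# Hypothetical excerpt in Day/Month/Year format; prices are invented.
csv_text = """Date,RegionName,AveragePrice
01/01/2017,Lambeth,100
01/02/2017,Lambeth,110
01/03/2017,Lambeth,120
"""

default = pd.read_csv(io.StringIO(csv_text), index_col="Date",
                      parse_dates=True)
dayfirst = pd.read_csv(io.StringIO(csv_text), index_col="Date",
                       parse_dates=True, dayfirst=True)

print(default.index[1])   # month-first reading: 2 January 2017
print(dayfirst.index[1])  # day-first reading: 1 February 2017
```

Printing a few parsed index values like this right after read_csv is a cheap way to catch a swapped day and month before plotting.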
I am new to pandas data visualizations and I'm having some trouble with a simple scatter plot. I have a dataframe loaded from a CSV, with 6 columns and 137 rows. But when I scatter the data from two columns, I see only 20 data points in the generated graph. I expected to see all 137. Any suggestions?
Here is a tidbit of code:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df = pd.read_csv(file, sep=',', header=0)
df.plot.scatter(x="Parte_aerea_peso_fresco", y="APCEi", marker=".")
And here is the output.
Possibility 1)
Many points lie on exactly the same spot. You can check this manually in your CSV file.
Possibility 2)
Some values are not valid, e.g. NaN (not a number) or strings.
Your dataframe is small, so you can check this possibility by printing it:
print (df)
print (df[40:60])
df.describe()
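A sketch for checking both possibilities programmatically rather than by eye. The data below is a hypothetical stand-in that uses the question's column names. pd.to_numeric(errors='coerce') turns unparseable entries into NaN (rows with NaN in either column are not drawn), and duplicated() counts rows that would land on an already-drawn point.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; only the column names come from the question.
df = pd.DataFrame({
    "Parte_aerea_peso_fresco": ["1.2", "1.2", "1.2", "n/a", "2.5"],
    "APCEi": [0.5, 0.5, 0.5, 0.7, None],
})
x, y = "Parte_aerea_peso_fresco", "APCEi"

# Possibility 2: coerce both columns to numbers; anything unparseable -> NaN.
numeric = df[[x, y]].apply(pd.to_numeric, errors="coerce")
print("rows with missing or non-numeric values:",
      numeric.isna().any(axis=1).sum())

# Possibility 1: among plottable rows, count exact (x, y) duplicates.
plottable = numeric.dropna()
print("duplicate points:", plottable.duplicated().sum())
print("distinct points:", len(plottable.drop_duplicates()))
```

If the number of distinct points comes out close to the 20 visible in the plot, overlapping points are the explanation; a large count of missing or non-numeric rows points to the second possibility instead.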