Plotting a pandas dataframe using column names as x axis - python

I have the following Pandas Dataframe (linked above) and I'd like to plot a graph with the values 1.0 - 39.0 on the x axis and the y axis would be the dataframe values in the column of these (-0.004640 etc). The rows are other lines I'd like to plot, so at the end there will be a lot of lines.
I've tried to transpose my plot but that doesn't seem to work.
How could I go about doing this?

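You could use matplotlib directly. The column labels (1.0 to 39.0) serve as the x values, and each row is drawn as its own line:
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline
x = df.columns.astype(float)  # column labels 1.0 ... 39.0
for _, row in df.iterrows():
    plt.plot(x, row.values)  # one line per row
plt.show()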
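The transposing idea from the question can also be made to work with pandas alone. The following is a minimal sketch, assuming the column labels are numeric and every row should become one line; the data is randomly generated as a hypothetical stand-in for the linked frame.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's frame: 5 rows, column labels 1.0 ... 39.0
cols = np.arange(1.0, 40.0)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(scale=0.01, size=(5, cols.size)), columns=cols)

# After transposing, the old column labels form the index,
# and pandas uses the index as the x axis: one line per original row.
ax = df.T.plot(legend=False)
ax.set_xlabel("column label")
ax.set_ylabel("value")
```

Transposing the data frame (df.T) rather than the plot is the key step: DataFrame.plot draws one line per column, so the rows have to become columns first.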

Related

Python - Pandas - Plot Data frame without headers

I am trying to use the .plot() function in pandas to plot data into a line graph.
The data is sorted by date, with 48 value columns after each date. Sample below:
1 2 ... 46 47 48
18 2018-02-19 1.317956 1.192840 ... 1.959250 1.782985 1.418093
19 2018-02-20 1.356267 1.192248 ... 2.123432 1.760629 1.569340
20 2018-02-21 1.417181 1.288694 ... 2.086715 1.823581 1.612062
21 2018-02-22 1.431536 1.279514 ... 2.201972 1.878109 1.694159
etc until row 346.
I tried the below but .plot does not seem to take positional arguments:
df.plot(x=df.iloc[0:346,0],y=[0:346,1:49]
How would I go about plotting my rows by date (the 1st column) on a line graph, and can I expand this to include multiple axes?
There are multiple ways to do this, some of which are directly through the pandas dataframe. However, given your sample plotting line, I think the easiest might be to just use matplotlib directly:
import matplotlib.pyplot as plt
plt.plot(df.iloc[0:346,0],df.iloc[0:346,1:49])
For multiple axes you can add a few lines to make subplots. For example:
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = plt.subplot(1,2,1)
# plt.plot() has no ax argument, so call plot() on the axes object itself
ax1.plot(df.iloc[0:346,0],df.iloc[0:346,1:10])
ax2 = plt.subplot(1,2,2)
ax2.plot(df.iloc[0:346,0],df.iloc[0:346,20:30])
You can also do this with the pandas plot() function that you were trying to use. It takes an ax argument, where you can provide the axes on which to plot. If you want to stick to pandas, I think you'd be best off setting the index to be a datetime index (see this link as an example: https://stackoverflow.com/a/27051371/12133280) and then using df.plot(y='column_name', ax=ax1). The x axis will then be the index, which you would have set to be the date.
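A minimal sketch of the pandas-only route, assuming the date column is named "date" and the value columns are labelled 1 to 48 (both names are hypothetical, and the data is randomly generated):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: a "date" column followed by 48 numeric columns labelled 1..48
dates = pd.date_range("2018-02-19", periods=10, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(1.0, 2.5, size=(10, 48)), columns=range(1, 49))
df.insert(0, "date", dates)

# Make the date column the index so pandas uses it for the x axis
df = df.set_index("date")

# Two side-by-side axes, each showing a different slice of the value columns
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
df.iloc[:, 0:9].plot(ax=ax1, legend=False)    # columns labelled 1..9
df.iloc[:, 19:29].plot(ax=ax2, legend=False)  # columns labelled 20..29
```

Once the date is the index, the columns are selected by position only, which avoids passing slices to x= and y= as in the original attempt.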

How do you change the spread of the Y axis of pandas box plot?

I am plotting 100 data points for 9 different groups. One group's data points are much larger than all the other groups so when I make a box graph using pandas only that group is shown, while all other groups are smashed to the bottom. Here is what it looks like now: smushed box plot
I would like the Y axis to be more spaced out so that I can see the other groups' box graphs. Here is similar data in a scatter plot that has the spacing I am looking for: well spaced scatter plot
Here is my code at the moment:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("residues.csv")
df.plot.box()
plt.show()
It looks like you want y to be log-scaled:
df.plot.box(logy=True)
Try this:
boxplot = df.boxplot(column=df.columns)
plt.show()
Reference
See the pandas documentation on boxplot: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.html
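To illustrate the log-scale suggestion, here is a minimal sketch with synthetic data standing in for residues.csv. The group names g1 to g9 and the value ranges are assumptions, and all values are positive, which a log axis requires.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 100 points for each of 9 groups, one group far larger than the rest
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(1.0, 20.0, size=(100, 9)),
    columns=[f"g{i}" for i in range(1, 10)],
)
df["g9"] = df["g9"] * 1000  # the outlying group

# logy=True puts the y axis on a log scale, so small and large groups are both visible
ax = df.plot.box(logy=True)
ax.set_ylabel("value (log scale)")
```

If the real data contains zeros or negative values, a plain log scale will not work as is, and a different scaling would be needed.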

How can I force the x axis to use column entries

I am trying to create a chart using a data frame which has TimePeriod as 201811, 201812, 201901, ..., 202006 which I want to use as the x axis values and plot against the y values (Total lives). See figure here:
However, when I plot the figure the x axis shows up as 201825, 201850, 201875, 201925,..., 202025. This clearly makes no sense and I cannot figure out how to force python to plot the desired x axis.
I am assuming it is something in xticks but I haven't had any luck. I have also tried manually entering all x axis values as labels = ('201811', '201812', '201901', ...) but this did not work either.
Is there any way to achieve the desired outcome?
Code:
import numpy as np
import pyodbc
import matplotlib.pyplot as plt
aggregated_lives_plt = aggregated_lives.plot(x= 'TimePeriodId', y='TotalLives', kind = 'line')
plt.title('Aggregated Optional Benefit Certs Since Nov-2018')
plt.xlabel('Time Period')
plt.ylabel('Total Certs (Lives)')
plt.show()
Thank you for any help!
Your TimePeriodId column holds integers, so matplotlib treats it as a continuous numeric axis. You can convert it to strings:
aggregated_lives['TimePeriodId'] = aggregated_lives['TimePeriodId'].astype(str)
then use your plot command.
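A minimal sketch of the string conversion, using a small made-up frame in place of aggregated_lives (the numbers are invented for illustration). Setting the tick positions explicitly is an extra step, added here on the assumption that every period should get its own label.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the aggregated_lives frame from the question
aggregated_lives = pd.DataFrame({
    "TimePeriodId": [201811, 201812, 201901, 201902, 201903],
    "TotalLives": [100, 120, 130, 125, 140],
})

# Strings are treated as categories, so no in-between values such as 201825 appear
aggregated_lives["TimePeriodId"] = aggregated_lives["TimePeriodId"].astype(str)

ax = aggregated_lives.plot(x="TimePeriodId", y="TotalLives", kind="line")

# Force one tick per row so that every period is labelled
ax.set_xticks(range(len(aggregated_lives)))
ax.set_xticklabels(aggregated_lives["TimePeriodId"], rotation=45)
ax.set_xlabel("Time Period")
ax.set_ylabel("Total Certs (Lives)")
```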

how to graph hours in axis 'y' python data frame

I have two columns in a data frame, D and E, where D holds values in 'HH:MM:SS' format and E holds the int values corresponding to D. I want to plot the hours on the y axis and the int values on the x axis. I'm doing this with matplotlib, but the hours are not sorted and each value appears on the y axis.
My code is like that:
elementosx =dftunels['E']
elementosy = dftunels['D']
plt.scatter(elementosx, elementosy)
plt.xticks(elementosx)
plt.plot(elementosx,elementosy)
plt.show()
I find it easier to use seaborn like this:
import seaborn as sns
sns.scatterplot(x='E',y= 'D',data = dftunels)
But you can of course do it with matplotlib also
plt.plot(dftunels['E'],dftunels['D'], 'o')
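If D is stored as text, the y axis is treated as categorical, which would explain the unsorted labels. A minimal sketch of one way around that, assuming D really is a string column in 'HH:MM:SS' format: convert it to a number of hours first. The sample values and the "hours" column name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for dftunels
dftunels = pd.DataFrame({
    "D": ["02:30:00", "00:45:10", "13:05:00", "07:15:30"],
    "E": [3, 1, 4, 2],
})

# Parse 'HH:MM:SS' as a duration and express it in hours (a plain float)
dftunels["hours"] = pd.to_timedelta(dftunels["D"]).dt.total_seconds() / 3600

# Sort by the x values so the connecting line runs left to right
dftunels = dftunels.sort_values("E")

fig, ax = plt.subplots()
ax.plot(dftunels["E"], dftunels["hours"], "o-")
ax.set_xticks(dftunels["E"])
ax.set_xlabel("E")
ax.set_ylabel("D (hours)")
```

With a numeric y axis, matplotlib orders the values and chooses evenly spaced ticks by itself.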

Python matplotlib time series plot -- show years on x axis

The Python code
import matplotlib.pyplot as plt
...
xx["SPY"].plot()
plt.show()
where xx is a data frame with a date index produces a time series plot where the dates 11/30/2010, 9/15/2011, 7/2/2012 etc. are shown as labels on the x axis. I would like labels corresponding to years, so that only "2010", "2011", "2012" etc. are shown on the x axis. How can this be done?
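One possible approach, sketched here with synthetic data in place of xx: plot with matplotlib directly and let matplotlib.dates place one tick per year. This assumes the index is a DatetimeIndex. Plotting through ax.plot rather than xx["SPY"].plot() is a deliberate substitution, because pandas applies its own date formatting to the axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for xx: business-day prices with a date index
idx = pd.date_range("2010-11-30", "2013-06-28", freq="B")
rng = np.random.default_rng(0)
xx = pd.DataFrame({"SPY": 100 + rng.normal(size=len(idx)).cumsum()}, index=idx)

fig, ax = plt.subplots()
ax.plot(xx.index, xx["SPY"])

# One major tick at the start of each year, labelled with the four-digit year only
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
```

The ticks fall on 1 January of each year, so a year only gets a label if its 1 January lies within the plotted range.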
