How to plot hours on the y axis from a Python data frame - python

I have two columns in a data frame, D and E, where D holds values in 'HH:MM:SS' format and E holds the int values corresponding to D. I want to plot the hours on the y axis and the int values on the x axis. I'm doing this with matplotlib, but the hours are not sorted and every single value appears as its own tick on the y axis.
My code looks like this:
import matplotlib.pyplot as plt
elementosx = dftunels['E']
elementosy = dftunels['D']
plt.scatter(elementosx, elementosy)
plt.xticks(elementosx)
plt.plot(elementosx, elementosy)
plt.show()

I find it easier to use seaborn, like this:
import seaborn as sns
sns.scatterplot(x='E', y='D', data=dftunels)
But you can of course also do it with matplotlib:
import matplotlib.pyplot as plt
plt.plot(dftunels['E'], dftunels['D'], 'o')
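One likely reason the hours come out unsorted, with one tick per value, is that column D holds strings, which matplotlib treats as categories. Here is a minimal sketch of one way around this, assuming every value in D parses cleanly as an 'HH:MM:SS' duration: convert D to a number of hours with pd.to_timedelta before plotting. The sample values and the helper name hours_from_strings are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data standing in for dftunels
dftunels = pd.DataFrame({
    "D": ["10:30:00", "08:15:00", "09:45:30", "08:00:00"],
    "E": [3, 1, 4, 2],
})


def hours_from_strings(col):
    """Convert 'HH:MM:SS' strings to float hours (hypothetical helper)."""
    return pd.to_timedelta(col).dt.total_seconds() / 3600


# Add a numeric column and sort by E so the connecting line runs left to right
plot_df = dftunels.assign(hours=hours_from_strings(dftunels["D"])).sort_values("E")

fig, ax = plt.subplots()
ax.plot(plot_df["E"], plot_df["hours"], marker="o")
ax.set_xlabel("E")
ax.set_ylabel("D (hours)")
```

Call plt.show() afterwards to display the figure. Because the y values are now numbers, matplotlib orders and spaces the ticks itself instead of drawing one tick per string.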

Related

In a scatterplot, how do I plot a line that is the average of all the vertical coordinates of data points that have the same x coordinate

I want something like the plots shown in the figure below, where the blue line is the average line, generated by plotting the mean of the y-coordinates of all data points that share the same x-coordinate.
I tried the code below:
window_size = 10
df_avg = pd.DataFrame(columns=df.columns)
for col in df.columns:
    df_avg[col] = df[col].rolling(window=window_size).mean()
plt.figure(figsize=(20,20))
for idx, col in enumerate(df.columns, 1):
    plt.subplot(df.shape[1]-4, 4, idx)
    sns.scatterplot(data=df, x=col, y='charges')
    plt.plot(df_avg[col],df['charges'])
    plt.xlabel(col)
And I got the plots shown below, which are obviously not what I wanted.
If you're looking for a purely matplotlib way to do it, here is a possible direction you can take:
import matplotlib.pyplot as plt
import numpy as np
# Create a toy dataset of 500 (x, y) points with integer coordinates in [0, 50)
N_points = 500
rand_pts = np.random.choice(50, size=(N_points, 2))
# Map each unique x value to the array of y values that share it
rand_dict = {uni: rand_pts[rand_pts[:, 0] == uni, 1] for uni in np.unique(rand_pts[:, 0])}
# Scatter plot of all points
plt.scatter(rand_pts[:, 0], rand_pts[:, 1], s=50)
# Line through the mean y value for each unique x
plt.plot(list(rand_dict.keys()), [np.mean(val) for val in rand_dict.values()], color='tab:orange', lw=4)
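If the data already lives in a pandas DataFrame, the same idea can be sketched with groupby. This assumes columns named x and y, which are placeholder names, and uses random toy data in place of the original dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data: 500 points with integer coordinates in [0, 50)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 50, size=(500, 2)), columns=["x", "y"])

# Mean of y for each distinct x; groupby returns the x values sorted
mean_y = df.groupby("x")["y"].mean()

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], s=20)
ax.plot(mean_y.index, mean_y.to_numpy(), color="tab:orange", lw=3)
```

Unlike rolling(...).mean(), which averages over neighbouring rows regardless of their x value, groupby averages only the rows that share the same x.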

Plotting a pandas dataframe using column names as x axis

I have the following pandas DataFrame (linked above). I'd like to plot a graph with the column labels 1.0 to 39.0 on the x axis and the DataFrame values in those columns (-0.004640, etc.) on the y axis. Each row is a separate line I'd like to plot, so the final graph will contain many lines.
I've tried to transpose my plot, but that doesn't seem to work.
How could I go about doing this?
You could try to use matplotlib:
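import matplotlib.pyplot as plt
# In a Jupyter notebook, the next line renders figures inline
%matplotlib inline
x = df.columns  # column labels 1.0 ... 39.0 go on the x axis
plt.plot(x, df.iloc[0])  # first row as one line
plt.plot(x, df.iloc[1])  # second row as another line
...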
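Since the DataFrame itself is only linked, here is a sketch on a made-up frame with the same layout (float column labels 1.0 to 39.0, one row per line). Transposing moves the column labels into the index, and DataFrame.plot then draws one line per original row, so no explicit loop is needed.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the linked DataFrame: 5 rows, columns 1.0 ... 39.0
rng = np.random.default_rng(0)
columns = np.arange(1.0, 40.0)
df = pd.DataFrame(rng.normal(0, 0.01, size=(5, columns.size)), columns=columns)

# After transposing, the index (x axis) holds 1.0 ... 39.0 and each
# original row becomes a column, which DataFrame.plot draws as one line
ax = df.T.plot(legend=False)
ax.set_xlabel("column label")
ax.set_ylabel("value")
```

With many rows the legend becomes unreadable, which is why this sketch turns it off.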

Grid of plots with lines overplotted in matplotlib

I have a dataframe that consists of a bunch of x,y data that I'd like to see in scatter form along with a line. The dataframe consists of data with its form repeated over multiple categories. The end result I'd like to see is some kind of grid of the plots, but I'm not totally sure how matplotlib handles multiple subplots of overplotted data.
Here's an example of the kind of data I'm working with:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
category = np.arange(1,10)
frames = []
for i in category:
    x = np.arange(0,100)
    y = 2*x + 10
    data = np.random.normal(0,1,100) * y
    frames.append(pd.DataFrame({'x':x, 'y':y, 'data':data, 'category':i}))
# DataFrame.append was removed in pandas 2.0; concatenate once instead
total_data = pd.concat(frames)
We have x data, and we have y data, which is a linear model of a generated dataset (the data variable).
I had been able to generate individual plots based on subsetting the master dataset, but I'd like to see them all side-by-side in a 3x3 grid in this case. However, calling the plots within the loop just overplots them all onto one single image.
Is there a good way to take the following code block and make a grid out of the category subsets? Am I overcomplicating it by doing the subset within the plot call?
plt.scatter(total_data['x'][total_data['category']==1], total_data['data'][total_data['category']==1])
plt.plot(total_data['x'][total_data['category']==1], total_data['y'][total_data['category']==1], linewidth=4, color='black')
If there's a simpler way to generate the by-category scatter plus line, I'm all for it. I don't know if seaborn has a similar or more intuitive method to use than pyplot.
You can use either sns.FacetGrid or plain plt.plot calls. For example:
import seaborn as sns
g = sns.FacetGrid(data=total_data, col='category', col_wrap=3)
g = g.map(plt.scatter, 'x', 'data')
g = g.map(plt.plot, 'x', 'y', color='k')
This gives a 3x3 grid with one panel per category.
Or do it manually with plt.subplots and groupby:
fig, axes = plt.subplots(3, 3)
for (cat, data), ax in zip(total_data.groupby('category'), axes.ravel()):
    ax.scatter(data['x'], data['data'])
    ax.plot(data['x'], data['y'], color='k')
This also gives a 3x3 grid, with each category drawn on its own Axes.
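For a version that runs on its own, here is a sketch that rebuilds data shaped like the question's and draws the grid with shared axes and a title per panel. The seeded random generator, the figure size and the marker sizes are arbitrary choices, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild data shaped like the question's: 9 categories, 100 points each
rng = np.random.default_rng(0)
frames = []
for cat in range(1, 10):
    x = np.arange(100)
    y = 2 * x + 10
    frames.append(pd.DataFrame(
        {"x": x, "y": y, "data": rng.normal(0, 1, 100) * y, "category": cat}
    ))
total_data = pd.concat(frames, ignore_index=True)

# One Axes per category; sharing axes keeps the panels comparable
fig, axes = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(9, 9))
for (cat, group), ax in zip(total_data.groupby("category"), axes.ravel()):
    ax.scatter(group["x"], group["data"], s=5)
    ax.plot(group["x"], group["y"], color="k", linewidth=2)
    ax.set_title(f"category {cat}")
fig.tight_layout()
```

Calling the plotting methods on each Axes object, rather than on plt, is what stops every subset from being drawn onto one image.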

Creating a rectangular Heatmap from two columns of a Pandas dataframe

I have a Pandas dataframe like this -
The value in column 's' is the accuracy of a model for the corresponding values of 'k' and 'w', so it is strictly between 0 and 1.
I want to plot a heatmap as a 7x2 grid, with k (7 values) along the x axis and w (2 values) along the y axis, where each cell is colored according to the value of s.
I tried seaborn's heatmap function, but it doesn't let me define which column to use to color the grid.
Try:
import matplotlib.pyplot as plt
import seaborn as sns
# Reshape to a grid: rows are w, columns are k, cell values are s
grid = df.pivot(index="w", columns="k", values="s")
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(grid, annot=True, linewidths=.5, ax=ax)
plt.show()
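The pivot is the step that decides which column colors the grid. To make it concrete, here is a sketch on made-up accuracies (the k, w and s values below are invented, since the original DataFrame is not shown). It draws the grid with matplotlib's imshow instead of seaborn so that it only needs pandas and matplotlib; passing the same pivoted frame to sns.heatmap should give an equivalent picture.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up accuracies: 7 values of k, 2 values of w, one row per (k, w) pair
df = pd.DataFrame({
    "k": [1, 2, 3, 4, 5, 6, 7] * 2,
    "w": [10] * 7 + [20] * 7,
    "s": [0.61, 0.64, 0.70, 0.73, 0.75, 0.74, 0.72,
          0.58, 0.66, 0.71, 0.77, 0.79, 0.78, 0.76],
})

# Rows are w, columns are k, cell values are s
grid = df.pivot(index="w", columns="k", values="s")

fig, ax = plt.subplots(figsize=(9, 3))
image = ax.imshow(grid.to_numpy(), aspect="auto")
ax.set_xticks(range(grid.shape[1]))
ax.set_xticklabels([str(k) for k in grid.columns])
ax.set_yticks(range(grid.shape[0]))
ax.set_yticklabels([str(w) for w in grid.index])
ax.set_xlabel("k")
ax.set_ylabel("w")
fig.colorbar(image, ax=ax, label="s")
```

Note that pivot raises an error if any (w, k) pair appears more than once; in that case the duplicates would need to be aggregated first, for example with pivot_table.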

Matplotlib duplicated y axis

I am trying to plot two different lines from the same vector in Python using Matplotlib. To do this, I use an additional vector whose values at certain indices select which elements of the general array go into each line. The code is:
import matplotlib.pyplot as plt
import numpy as np
def visualize_method(general, changes):
    '''Correctly plots the data to visualize reversals and trials'''
    x = np.arange(len(general))
    plt.plot(x[changes==0], general[changes==0],
             x[changes==1], general[changes==1],
             linestyle='--', marker='o')
    plt.show()
When plotting the data, the result is:
As can be observed, the y axis is "duplicated". How can I use the same x and y axes for this filtered plot?
