Python error: generating a scatter plot using matplotlib - python

I am a Python newbie struggling with how to import a CSV file and plot it with matplotlib.pyplot.
I would like to see the relationship between hour (how many hours people spent playing a video game) and level (the game level), and then draw a scatter plot with female (1) and male (0) in different colors. So my x would be 'hour' and my y would be 'level'.
My CSV data file looks like this:
hour gender level
0 8 1 20.00
1 9 1 24.95
2 12 0 10.67
3 12 0 18.00
4 12 0 17.50
5 13 0 13.07
6 10 0 14.45
...
...
499 12 1 19.47
500 16 0 13.28
Here's my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df=pd.read_csv('data.csv')
plt.plot(x,y, lavel='some relationship')
plt.title("Some relationship")
plt.xlabel('hour')
plt.ylabel('level')
plt.plot[gender(gender=1), '-b', label=female]
plt.plot[gender(gender=0), 'gD', label=male]
plt.axs()
plt.show()
I would like to draw the following graph. So, there will be two lines of male and female.
y=level| #----->male
| #
| * *----->female
|________________ x=hour
However, I am not sure how to solve this problem.
I kept getting an error NameError: name 'hour' is not defined.

You could do it this way:
import matplotlib.pyplot as plt
import pandas as pd
# Sample rows taken from the question; in practice use df = pd.read_csv('data.csv')
df = pd.DataFrame(data={"hour": [8,9,12,12,12,13,10],
                        "gender": [1,1,0,0,0,0,0],
                        "level": [20, 24.95, 10.67, 18, 17.5, 13.07, 14.45]})
# Sort by hour so each line is drawn from left to right
df.sort_values("hour", ascending=True, inplace=True)
fig = plt.figure(dpi=80)
ax = fig.add_subplot(111, aspect='equal')
# The question codes female as 1 and male as 0
ax.plot(df.hour[df.gender==1], df.level[df.gender==1], c="red", label="female")
ax.plot(df.hour[df.gender==0], df.level[df.gender==0], c="blue", label="male")
plt.xlabel('hour')
plt.ylabel('level')
ax.legend()
plt.show()
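Since the question title asks for a scatter plot, here is a minimal sketch of the same idea using `ax.scatter`, with one call per gender group. The sample rows are copied from the question; the `labels` and `colors` dictionaries are illustrative names, not part of the original answer, and the 1 = female / 0 = male coding follows the question text.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; with the real file use pd.read_csv('data.csv')
df = pd.DataFrame({"hour": [8, 9, 12, 12, 12, 13, 10],
                   "gender": [1, 1, 0, 0, 0, 0, 0],
                   "level": [20, 24.95, 10.67, 18, 17.5, 13.07, 14.45]})

# Assumed coding from the question: 1 = female, 0 = male
labels = {1: "female", 0: "male"}
colors = {1: "red", 0: "blue"}

fig, ax = plt.subplots()
# groupby yields one sub-frame per gender code, in sorted key order (0, then 1)
for code, group in df.groupby("gender"):
    ax.scatter(group["hour"], group["level"],
               color=colors[code], label=labels[code])

ax.set_xlabel("hour")
ax.set_ylabel("level")
ax.set_title("Some relationship")
ax.legend()
```

Call `plt.show()` afterwards to display the figure interactively. Referring to the columns through the DataFrame (`df["hour"]`) rather than bare names such as `hour` is also what avoids the `NameError` mentioned in the question.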

Related

python, using seaborn or plotly to show graphs for medication points over time

I'm trying to plot my heart rate and steps against the times I have coffee.
The dataset looks like this:
heartRate steps coffee
created
2020-04-14 06:03:00 71.0 NaN 0
2020-04-14 09:03:00 72.0 NaN 1
2020-04-14 09:55:00 61.0 NaN 1
2020-04-14 09:58:00 67.0 NaN 1
2020-04-14 10:01:00 82.0 NaN 2
Here 1 marks the hour I had coffee, 2 marks 4 hours after, and 0 means no coffee.
Currently I'm trying to plot these points like this:
import seaborn as sns
sns.set_theme(style="darkgrid")
# Load an example dataset with long-form data
fmri = tester
# Plot the responses for different events and regions
sns.lineplot(x="created", y="heartRate", style="medication",
data=fmri)
But this gives me an unreadable graph.
I'd love for this to be more readable. What am I doing wrong with this library?
It may be easier to use matplotlib directly. There you can adjust the figure size, marker size, and line width until the graph is readable.
import matplotlib.pyplot as plt
from matplotlib import rc
import pandas as pd
import numpy as np
from datetime import datetime, timedelta as delta
font = {'family' : 'Arial',
'weight' : 'normal',
'size' : 60}
rc('font', **font)
ndays = 1000
start = datetime(2018, 12, 1)
dates = [start - delta(days = x) for x in range(0, ndays)]
df = pd.DataFrame({'time':dates,
'heartrate':np.random.randint(60,70,len(dates)),
'coffee':np.random.randint(0,2,len(dates))})
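# Make two float copies of the series so NaN can be stored without changing dtype
df['heartrate_coff'] = df['heartrate'].astype(float)
df['heartrate_nocoff'] = df['heartrate'].astype(float)
# Blank out the rows that do not belong to each series; NaN points are not drawn
df.loc[df['coffee'].values==0,['heartrate_coff'] ]= np.nan
df.loc[df['coffee'].values==1,['heartrate_nocoff']]= np.nan
# A very wide, low-dpi figure spreads the 1000 days along the x-axis
fig = plt.figure(num=None, figsize=(200, 10),
                 dpi=9, facecolor='w', edgecolor='k')
# Full series as a black line, coffee points in red, no-coffee points as black dots
plt.plot(df['time'].values , df['heartrate'].values,'k-', linewidth=10)
plt.plot(df['time'].values , df['heartrate_coff'].values,'rd',markersize=30)
plt.plot(df['time'].values , df['heartrate_coff'].values,'rd-', linewidth=20)
plt.plot(df['time'].values , df['heartrate_nocoff'].values,'ko',markersize=10)
plt.xticks(rotation=-10)
plt.show()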
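The masking idea above (keep a value only where a condition holds, NaN elsewhere) can be written more compactly with `Series.where`. The following is a minimal sketch, assuming the five rows shown in the question and its `coffee` coding (0, 1, 2); the `styles` mapping, marker choices and figure size are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The five sample rows from the question (the steps column is omitted)
idx = pd.to_datetime(["2020-04-14 06:03:00", "2020-04-14 09:03:00",
                      "2020-04-14 09:55:00", "2020-04-14 09:58:00",
                      "2020-04-14 10:01:00"])
df = pd.DataFrame({"heartRate": [71.0, 72.0, 61.0, 67.0, 82.0],
                   "coffee": [0, 1, 1, 1, 2]}, index=idx)
df.index.name = "created"

fig, ax = plt.subplots(figsize=(10, 4))
# Full heart-rate series as a thin black line
ax.plot(df.index, df["heartRate"], "k-", linewidth=1, label="heart rate")

# Hypothetical styling: one marker style per non-zero coffee code
styles = {1: ("rd", "coffee hour"), 2: ("bo", "4 hours after coffee")}
for code, (fmt, label) in styles.items():
    # where() keeps values for matching rows and sets the rest to NaN,
    # so only the matching points are drawn as markers
    masked = df["heartRate"].where(df["coffee"] == code)
    ax.plot(df.index, masked, fmt, markersize=8, label=label)

ax.set_xlabel("created")
ax.set_ylabel("heartRate")
ax.legend()
fig.autofmt_xdate()
```

Call `plt.show()` to display the result. Drawing the coffee points as markers on top of a single line tends to stay readable even when the full dataset is much longer than this sample.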

Plot with Histogram an attribute from a dataframe

I have a dataframe with the weight and the number of measures of each user. The df looks like:
id_user  weight  number_of_measures
1        92.16   4
2        80.34   5
3        71.89   11
4        81.11   7
5        77.23   8
6        92.37   2
7        88.18   3
I would like to see a histogram with an attribute of the table on the x-axis (weight, but I want to do it for both attributes) and the frequency on the y-axis.
Does anyone know how to do it with matplotlib?
Ok, it seems to be quite easy:
import pandas as pd
import matplotlib.pyplot as plt
hist = df.hist(bins=50)
plt.show()
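For a self-contained version, here is a minimal sketch that rebuilds the sample table from the question and draws one histogram per attribute with matplotlib directly. The bin count of 5 and the figure size are arbitrary choices for this small sample, not something the question specifies.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample table from the question
df = pd.DataFrame({"id_user": [1, 2, 3, 4, 5, 6, 7],
                   "weight": [92.16, 80.34, 71.89, 81.11, 77.23, 92.37, 88.18],
                   "number_of_measures": [4, 5, 11, 7, 8, 2, 3]})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Axes.hist returns the per-bin counts, the bin edges and the drawn patches
w_counts, w_edges, _ = axes[0].hist(df["weight"], bins=5)
axes[0].set_xlabel("weight")
axes[0].set_ylabel("frequency")

m_counts, m_edges, _ = axes[1].hist(df["number_of_measures"], bins=5)
axes[1].set_xlabel("number_of_measures")
axes[1].set_ylabel("frequency")

fig.tight_layout()
```

Call `plt.show()` to display the figure. Note that `df.hist(bins=50)` draws a histogram for every numeric column, including `id_user`, so selecting the columns explicitly keeps the output focused on the two attributes of interest.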

Create a 3D Bar Chart in ggplot

I am new to Python and am trying to create a 3D Bar Chart in ggplot with the Date on the X-Axis, quarter on the y-axis, and value on the z-axis.
Index Date Value quarter
0 03/2001 946 1
1 06/2001 892 2
2 09/2001 866 3
3 12/2001 924 4
4 03/2002 917 1
I have tried the following code:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
import pandas as pd
import numpy as np
fig=plt.figure()
ax1=fig.add_subplot(111, projection='3d')
x3=df['Date']
y3=df['quarter']
dx=np.ones(10)
dy=np.ones(10)
dz=df['Value']
ax1.bar3d(x3,y3,z3,dx,dy,dz)
ax1.set_xlabel('Date')
ax2.set_xlabel('Quarter')
ax3.set_xlabel('Home Sales')
ax1.set_title('Home Sales')
plt.show()
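The code above refers to `z3`, `ax2` and `ax3`, which are never defined, and passes date strings as bar positions. A minimal working sketch, assuming the five sample rows from the question, might look like the following. Note that `ggplot` here is matplotlib's built-in style sheet, not the ggplot library; the bar footprint of 0.5 and the use of integer x positions with date tick labels are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older matplotlib)

plt.style.use("ggplot")

# Sample rows from the question
df = pd.DataFrame({"Date": ["03/2001", "06/2001", "09/2001", "12/2001", "03/2002"],
                   "Value": [946, 892, 866, 924, 917],
                   "quarter": [1, 2, 3, 4, 1]})

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

n = len(df)
x = np.arange(n)                 # bar3d needs numeric positions, so use 0..n-1 for the dates
y = df["quarter"].to_numpy()
z = np.zeros(n)                  # every bar starts at z = 0
dx = np.full(n, 0.5)             # bar footprint along x
dy = np.full(n, 0.5)             # bar footprint along y
dz = df["Value"].to_numpy()      # bar heights

ax.bar3d(x, y, z, dx, dy, dz)

# Show the date strings as tick labels at the numeric positions
ax.set_xticks(x)
ax.set_xticklabels(df["Date"])
ax.set_xlabel("Date")
ax.set_ylabel("Quarter")
ax.set_zlabel("Value")
ax.set_title("Home Sales")
```

Call `plt.show()` to display the chart. All three axis labels are set on the same `ax` object through `set_xlabel`, `set_ylabel` and `set_zlabel`; a 3D subplot is a single Axes, so separate `ax2` and `ax3` objects are not needed.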

Generate heat-map of cyclical continuous features - 24-hour time

Having a Pandas DF with hour of day, I've calculated the sin/cos time feature, based on this article:
counter hour sin_time cos_time
0 1 1 2.588190e-01 9.659258e-01
1 0 2 5.000000e-01 8.660254e-01
2 2 3 7.071068e-01 7.071068e-01
3 0 4 8.660254e-01 5.000000e-01
...
19 0 20 -8.660254e-01 5.000000e-01
20 0 21 -7.071068e-01 7.071068e-01
21 1 22 -5.000000e-01 8.660254e-01
22 0 23 -2.588190e-01 9.659258e-01
I'm trying to plot a heat-map based on the X,Y of the sin/cos time and the value of the counter, so that if the counter is 0 no point is added. I've googled around and written the following code:
import numpy as np
import numpy.random
import matplotlib.pyplot as plt
# Generate some test data
x = raw_df_tz['sin_time']
y = raw_df_tz['cos_time']
heatmap, xedges, yedges = np.histogram2d(x, y, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
plt.clf()
plt.imshow(heatmap.T, extent=extent, origin='lower')
plt.show()
Output:
How can I incorporate the counter value so that it influences the chart accordingly?
I found out that you can pass a weights argument to histogram2d:
np.histogram2d(x, y, weights=w, bins=50)
where w is my counter column.
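To make the effect of `weights` concrete, here is a minimal sketch that rebuilds a frame shaped like the one in the question. The `counter` values are randomly generated stand-ins (the question only shows a few rows), and the name `df` replaces the question's `raw_df_tz`; the sin/cos columns are computed with the usual 24-hour encoding, which reproduces the values shown for hour 1.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for raw_df_tz: hours 1..23 with made-up counter values
hours = np.arange(1, 24)
rng = np.random.default_rng(0)
df = pd.DataFrame({"counter": rng.integers(0, 3, size=hours.size),
                   "hour": hours,
                   "sin_time": np.sin(2 * np.pi * hours / 24),
                   "cos_time": np.cos(2 * np.pi * hours / 24)})

x = df["sin_time"].to_numpy()
y = df["cos_time"].to_numpy()
w = df["counter"].to_numpy()

# With weights, each point adds its counter value to its bin instead of 1,
# so points with counter == 0 contribute nothing
heatmap, xedges, yedges = np.histogram2d(x, y, weights=w, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

fig, ax = plt.subplots()
image = ax.imshow(heatmap.T, extent=extent, origin="lower")
fig.colorbar(image, ax=ax, label="counter")
ax.set_xlabel("sin_time")
ax.set_ylabel("cos_time")
```

Call `plt.show()` to display the heat-map. The total of all bins equals the sum of the counter column, which is a quick way to check that the weights were applied.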

Pandas dataframe plotting - issue when switching from two subplots to single plot w/ secondary axis

I have two sets of data I want to plot together on a single figure. I have a set of flow data at 15 minute intervals I want to plot as a line plot, and a set of precipitation data at hourly intervals, which I am resampling to a daily time step and plotting as a bar plot. Here is what the format of the data looks like:
2016-06-01 00:00:00 56.8
2016-06-01 00:15:00 52.1
2016-06-01 00:30:00 44.0
2016-06-01 00:45:00 43.6
2016-06-01 01:00:00 34.3
At first I set this up as two subplots, with precipitation and flow rate on different axes. This works totally fine. Here's my code:
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime
filename = 'manhole_B.csv'
plotname = 'SSMH-2A B'
plt.style.use('bmh')
# Read csv with precipitation data, change index to datetime object
pdf = pd.read_csv('precip.csv', delimiter=',', header=None, index_col=0)
pdf.columns = ['Precipitation[in]']
pdf.index.name = ''
pdf.index = pd.to_datetime(pdf.index)
pdf = pdf.resample('D').sum()
print(pdf.head())
# Read csv with flow data, change index to datetime object
qdf = pd.read_csv(filename, delimiter=',', header=None, index_col=0)
qdf.columns = ['Flow rate [gpm]']
qdf.index.name = ''
qdf.index = pd.to_datetime(qdf.index)
# Plot
f, ax = plt.subplots(2)
qdf.plot(ax=ax[1], rot=30)
pdf.plot(ax=ax[0], kind='bar', color='r', rot=30, width=1)
ax[0].get_xaxis().set_ticks([])
ax[1].set_ylabel('Flow Rate [gpm]')
ax[0].set_ylabel('Precipitation [in]')
ax[0].set_title(plotname)
f.set_facecolor('white')
f.tight_layout()
plt.show()
2 Axis Plot
However, I decided I want to show everything on a single plot, so I modified my code to put precipitation on a secondary axis. Now my flow data has disappeared from the plot, and even when I set the axis ticks to an empty set, I get these 00:15, 00:30 and 00:45 tick marks along the x-axis.
Secondary-y axis plots
Any ideas why this might be occurring?
Here is my code for the single axis plot:
f, ax = plt.subplots()
qdf.plot(ax=ax, rot=30)
pdf.plot(ax=ax, kind='bar', color='r', rot=30, secondary_y=True)
ax.get_xaxis().set_ticks([])
Here is an example:
Setup
In [1]: from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
df = pd.DataFrame({'x' : np.arange(10),
'y1' : np.random.rand(10,),
'y2' : np.square(np.arange(10))})
df
Out[1]: x y1 y2
0 0 0.451314 0
1 1 0.321124 1
2 2 0.050852 4
3 3 0.731084 9
4 4 0.689950 16
5 5 0.581768 25
6 6 0.962147 36
7 7 0.743512 49
8 8 0.993304 64
9 9 0.666703 81
Plot
In [2]: fig, ax1 = plt.subplots()
ax1.plot(df['x'], df['y1'], 'b-')
ax1.set_xlabel('Series')
ax1.set_ylabel('Random', color='b')
for tl in ax1.get_yticklabels():
    tl.set_color('b')
ax2 = ax1.twinx() # Note twinx, not twiny. I was wrong when I commented on your question.
ax2.plot(df['x'], df['y2'], 'ro')
ax2.set_ylabel('Square', color='r')
for tl in ax2.get_yticklabels():
    tl.set_color('r')
Out[2]:
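Applied to the flow and precipitation case from the question, the same `twinx` approach could look like the sketch below. It uses synthetic stand-in data (the CSV files are not available) and calls matplotlib's `plot` and `bar` directly, so that both series are positioned by their datetime values; pandas bar plots place bars at integer positions 0..n-1, which is a likely reason a line and a bar plot do not line up on one shared axis. The variable names, value ranges and styling are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins: 5 days of 15-minute flow data and hourly precipitation
q_index = pd.date_range("2016-06-01", periods=4 * 24 * 5, freq="15min")
qdf = pd.DataFrame({"Flow rate [gpm]": 40 + 10 * rng.random(q_index.size)},
                   index=q_index)
p_index = pd.date_range("2016-06-01", periods=24 * 5, freq="60min")
pdf = pd.DataFrame({"Precipitation[in]": 0.05 * rng.random(p_index.size)},
                   index=p_index)
daily = pdf.resample("D").sum()   # daily precipitation totals

fig, ax = plt.subplots()
# Flow as a line on the primary y-axis
ax.plot(qdf.index, qdf["Flow rate [gpm]"], color="b")
ax.set_ylabel("Flow Rate [gpm]", color="b")

# Precipitation as bars on a secondary y-axis sharing the same datetime x-axis
ax2 = ax.twinx()
ax2.bar(daily.index, daily["Precipitation[in]"],
        width=1.0, align="edge", color="r", alpha=0.4)  # width is in days
ax2.set_ylabel("Precipitation [in]", color="r")

fig.autofmt_xdate(rotation=30)
```

Call `plt.show()` to display the figure. Each bar starts at midnight and spans one day, so it covers the same stretch of the x-axis as that day's flow readings.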
