Second Line in Matplotlib plot is inaccurate/runs all over the grid - python

I'm trying to plot fantasy points from two players in every game since the start of the NBA season.
I've created a dataframe with every player's stat line for every night, and I want to plot every date on which each of the two players played.
The two dataframes look like this:
kemba[['Date','FP']]
Date FP
Rk
260 10/23/2019 2.0
532 10/25/2019 28.0
754 10/26/2019 49.0
1390 10/30/2019 35.0
1628 11/1/2019 39.5
2178 11/5/2019 32.5
2463 11/7/2019 17.5
2800 11/9/2019 40.0
3103 11/11/2019 37.5
3410 11/13/2019 37.0
3699 11/15/2019 25.0
4001 11/17/2019 22.5
4186 11/18/2019 22.0
4494 11/20/2019 9.5
4750 11/22/2019 4.0
5637 11/27/2019 50.5
5904 11/29/2019 19.0
6193 12/1/2019 22.5
6677 12/4/2019 43.5
6975 12/6/2019 26.0
7454 12/9/2019 33.5
7769 12/11/2019 57.0
7861 12/12/2019 31.5
8614 12/18/2019 35.5
9071 12/20/2019 5.0
9289 12/22/2019 26.0
100 12/25/2019 23.0
ingram[['Date','FP']]
Date FP
Rk
22 10/22/2019 31.5
441 10/25/2019 37.5
646 10/26/2019 57.0
984 10/28/2019 41.5
1439 10/31/2019 30.0
1718 11/2/2019 10.5
1994 11/4/2019 59.0
2586 11/8/2019 30.0
2757 11/9/2019 31.5
4245 11/19/2019 30.5
4532 11/21/2019 38.5
4864 11/23/2019 40.5
5022 11/24/2019 32.5
5496 11/27/2019 22.0
5784 11/29/2019 43.0
6111 12/1/2019 31.0
6404 12/3/2019 40.0
6737 12/5/2019 27.0
7038 12/7/2019 18.0
7372 12/9/2019 38.5
7668 12/11/2019 29.0
7958 12/13/2019 38.0
8283 12/15/2019 32.5
8551 12/17/2019 24.0
8612 12/18/2019 48.0
8891 12/20/2019 30.5
102 12/23/2019 31.0
55 12/25/2019 46.5
This is how I plotted the data:
import matplotlib.pyplot as plt
# creating x & y for Ingram
ingram_fp=ingram['FP']
ingram_date=ingram['Date']
# creating x and y for Kemba
kemba_fp=kemba['FP']
kemba_date=kemba['Date']
fig=plt.figure()
plt.plot(kemba_date,kemba_fp,color='#FF5733',linewidth=1,marker='.',label='Walker')
plt.plot(ingram_date,ingram_fp,color='#33A7FF',marker='.',label='Ingram')
fig.autofmt_xdate()
plt.show()
When I do this, the line for Ingram is all over the place. Any idea what went wrong?
This is the plot I get: [plot image not available]

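It looks like the `Date` column holds strings rather than dates. Matplotlib plots strings as categorical labels, not as points in time, so the two players' dates are not placed on a shared chronological axis.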
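A minimal sketch of the symptom, using the first four rows of each table from the question (the `Agg` backend is only an assumption so it runs without a display). Matplotlib numbers string categories in order of first appearance across all `plot` calls on the same axis, so Ingram's dates that never occurred in Walker's series are appended at the right end, and his line doubles back:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First four rows of each table from the question, with Date still stored as strings
kemba = pd.DataFrame({"Date": ["10/23/2019", "10/25/2019", "10/26/2019", "10/30/2019"],
                      "FP": [2.0, 28.0, 49.0, 35.0]})
ingram = pd.DataFrame({"Date": ["10/22/2019", "10/25/2019", "10/26/2019", "10/28/2019"],
                       "FP": [31.5, 37.5, 57.0, 41.5]})

fig, ax = plt.subplots()
ax.plot(kemba["Date"], kemba["FP"], marker=".", label="Walker")
ax.plot(ingram["Date"], ingram["FP"], marker=".", label="Ingram")

# get_xydata() returns the numeric x positions matplotlib assigned to each string
kemba_x = ax.lines[0].get_xydata()[:, 0]   # positions 0, 1, 2, 3
ingram_x = ax.lines[1].get_xydata()[:, 0]  # positions 4, 1, 2, 5: the line jumps back and forth
print(kemba_x, ingram_x)
```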
Convert the strings with `pd.to_datetime` before plotting; the rest of your code can stay the same:
import pandas as pd
# creating x & y for Ingram
ingram_fp=ingram['FP']
ingram_date=pd.to_datetime(ingram['Date'], format='%m/%d/%Y')
# creating x and y for Kemba
kemba_fp=kemba['FP']
kemba_date=pd.to_datetime(kemba['Date'], format='%m/%d/%Y')
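Putting the fix together, a minimal end-to-end sketch on the same four-row samples (assumed stand-ins for the full tables). Once the dates are real datetimes, both series share one chronological axis and each line's x values increase from left to right:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

kemba = pd.DataFrame({"Date": ["10/23/2019", "10/25/2019", "10/26/2019", "10/30/2019"],
                      "FP": [2.0, 28.0, 49.0, 35.0]})
ingram = pd.DataFrame({"Date": ["10/22/2019", "10/25/2019", "10/26/2019", "10/28/2019"],
                       "FP": [31.5, 37.5, 57.0, 41.5]})

# Convert the month/day/year strings to datetimes in place
for frame in (kemba, ingram):
    frame["Date"] = pd.to_datetime(frame["Date"], format="%m/%d/%Y")

fig, ax = plt.subplots()
ax.plot(kemba["Date"], kemba["FP"], color="#FF5733", linewidth=1, marker=".", label="Walker")
ax.plot(ingram["Date"], ingram["FP"], color="#33A7FF", marker=".", label="Ingram")
ax.legend()
fig.autofmt_xdate()

# x positions are now date numbers on a shared time axis
ingram_x = ax.lines[1].get_xydata()[:, 0]
```

In an interactive session, remove the `matplotlib.use("Agg")` line and finish with `plt.show()`.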

Related

Get data for a given time range in Python when the timestamps are irregular

time a b
2021-05-23 22:06:54 10.4 70.1
2021-05-23 22:21:41 10.7 68.3
2021-05-23 22:36:28 10.4 69.4
2021-05-23 22:51:15 9.9 71.7
2021-05-23 23:06:02 9.5 73.1
... ... ... ... ... ...
2021-11-19 08:18:31 19.8 43.0
2021-11-19 08:20:04 21.0 42.0
2021-11-19 08:21:25 35.5 20.0
2021-11-19 08:21:32 19.8 43.0
2021-11-19 08:23:05 21.0 42.0
Here, `time` is the index, not a column.
When I run df.between_time("2021-11-17 08:15:00","2021-11-19 08:00:00")
it throws the error ValueError: Cannot convert arg ['2021-11-17 08:15:00'] to a time
The timestamps in the DataFrame are not evenly spaced.
What I want: when I pass a time or date range, I want to get all the rows that fall within it.
Thanks
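`between_time` selects rows by time of day only (for example 08:15 to 17:00 on every date), which is why it cannot convert a full date-time string. To select a date-time range, use `truncate` on the sorted index: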
>>> df.truncate("2021-05-23 23:00:00", "2021-11-19 08:20:00")
a b
time
2021-05-23 23:06:02 9.5 73.1
2021-11-19 08:18:31 19.8 43.0
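A self-contained sketch built from the ten rows visible in the question (the real frame has many more). It compares `truncate` with label-based `.loc` slicing, which on a sorted `DatetimeIndex` also accepts date-time strings, and shows what `between_time` is actually for:

```python
import pandas as pd

# The ten rows visible in the question (the real frame has many more in between)
times = pd.to_datetime([
    "2021-05-23 22:06:54", "2021-05-23 22:21:41", "2021-05-23 22:36:28",
    "2021-05-23 22:51:15", "2021-05-23 23:06:02",
    "2021-11-19 08:18:31", "2021-11-19 08:20:04", "2021-11-19 08:21:25",
    "2021-11-19 08:21:32", "2021-11-19 08:23:05",
])
df = pd.DataFrame(
    {"a": [10.4, 10.7, 10.4, 9.9, 9.5, 19.8, 21.0, 35.5, 19.8, 21.0],
     "b": [70.1, 68.3, 69.4, 71.7, 73.1, 43.0, 42.0, 20.0, 43.0, 42.0]},
    index=times.rename("time"),
)

df = df.sort_index()  # truncate and .loc range slicing expect a sorted index

# Select by full date-time bounds (both ends inclusive)
by_truncate = df.truncate(before="2021-05-23 23:00:00", after="2021-11-19 08:20:00")
by_loc = df.loc["2021-05-23 23:00:00":"2021-11-19 08:20:00"]

# between_time answers a different question: a time-of-day window on every date
morning = df.between_time("08:00", "08:21")
print(by_truncate)
```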

How can I iterate over a pandas dataframe so I can divide specific values based on a condition?

I have a dataframe like below:
0 1 2 ... 62 63 64
795 89.0 92.0 89.0 ... 74.0 64.0 4.0
575 80.0 75.0 78.0 ... 70.0 68.0 3.0
1119 2694.0 2437.0 2227.0 ... 4004.0 4010.0 6.0
777 90.0 88.0 88.0 ... 71.0 67.0 4.0
506 82.0 73.0 77.0 ... 69.0 64.0 2.0
... ... ... ... ... ... ... ...
65 84.0 77.0 78.0 ... 78.0 80.0 0.0
1368 4021.0 3999.0 4064.0 ... 1.0 4094.0 8.0
1036 80.0 80.0 79.0 ... 73.0 66.0 5.0
1391 3894.0 3915.0 3973.0 ... 4.0 4090.0 8.0
345 81.0 74.0 75.0 ... 80.0 75.0 1.0
I want to divide all elements over 1000 in this dataframe by 100. So 4021.0 becomes 40.21, et cetera.
I've tried something like this:
for cols in df:
    for rows in df[cols]:
        print(df[cols][rows])
I get index out of bound errors. I'm just not sure how to properly iterate the way I'm looking for.
Loops are slow here, so a vectorized solution is better: select the values greater than 1000 and divide them by 100:
df[df.gt(1000)] = df.div(100)
Or using DataFrame.mask:
df = df.mask(df.gt(1000), df.div(100))
print (df)
0 1 2 62 63 64
795 89.00 92.00 89.00 74.00 64.00 4.0
575 80.00 75.00 78.00 70.00 68.00 3.0
1119 26.94 24.37 22.27 40.04 40.10 6.0
777 90.00 88.00 88.00 71.00 67.00 4.0
506 82.00 73.00 77.00 69.00 64.00 2.0
65 84.00 77.00 78.00 78.00 80.00 0.0
1368 40.21 39.99 40.64 1.00 40.94 8.0
1036 80.00 80.00 79.00 73.00 66.00 5.0
1391 38.94 39.15 39.73 4.00 40.90 8.0
345 81.00 74.00 75.00 80.00 75.00 1.0
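A runnable sketch of both vectorized variants on three rows and three columns taken from the question's frame (the elided `...` columns are simply left out):

```python
import pandas as pd

# Three rows and three columns from the question's frame
df = pd.DataFrame(
    {0: [89.0, 2694.0, 4021.0], 1: [92.0, 2437.0, 3999.0], 64: [4.0, 6.0, 8.0]},
    index=[795, 1119, 1368],
)

# mask(cond, other): wherever cond is True, take the value from `other` instead
scaled = df.mask(df.gt(1000), df.div(100))

# The boolean-indexing assignment from the answer, applied to a copy
scaled2 = df.copy()
scaled2[scaled2.gt(1000)] = scaled2.div(100)

print(scaled)
```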
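You can also use `applymap` with a custom function, although it calls Python once per element and is slower than the vectorized versions (in pandas 2.1+, `applymap` is deprecated in favor of the equivalent `DataFrame.map`):
def mapper_function(x):
    # divide values over 1000 by 100, leave the rest unchanged
    if x > 1000:
        return x / 100
    return x
df=df.applymap(mapper_function)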
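For comparison, a sketch of the explicit loop the question was attempting. Iterating over `df[cols]` yields cell values, not row labels, so `df[cols][rows]` looks up a value as if it were a label. `DataFrame.items()` yields (column label, Series) pairs and `Series.items()` yields (row label, value) pairs, and `.at` writes a single cell. The sample data is a subset of the question's frame:

```python
import pandas as pd

df = pd.DataFrame(
    {0: [89.0, 2694.0, 4021.0], 1: [92.0, 2437.0, 3999.0], 64: [4.0, 6.0, 8.0]},
    index=[795, 1119, 1368],
)

# Explicit loop over labels: correct, but much slower than the vectorized version on large frames
looped = df.copy()
for col, series in df.items():
    for row, value in series.items():
        if value > 1000:
            looped.at[row, col] = value / 100

vectorized = df.mask(df.gt(1000), df.div(100))
```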

Reshape a pandas time series into a month-by-year table

I have a pandas Series with monthly data from 2018 to 2020, and I want to restructure it like this:
Month | 2018 | 2019
Jan | 115 | 73
Feb | 112 | 63
... (up to December)
How can I do this with pandas? The data:
Date
2018-01-01 115.0
2018-02-01 112.0
2018-03-01 104.5
2018-04-01 91.1
2018-05-01 85.5
2018-06-01 76.5
2018-07-01 86.5
2018-08-01 77.9
2018-09-01 65.0
2018-10-01 71.0
2018-11-01 76.0
2018-12-01 72.5
2019-01-01 73.0
2019-02-01 63.0
2019-03-01 63.0
2019-04-01 61.0
2019-05-01 58.3
2019-06-01 59.0
2019-07-01 67.0
2019-08-01 64.0
2019-09-01 59.9
2019-10-01 70.4
2019-11-01 78.9
2019-12-01 75.0
2020-01-01 73.9
Name: Close, dtype: float64
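This is essentially a pivot, which `crosstab` can do directly: month abbreviations become the rows and years the columns (the months come out sorted alphabetically):
s = pd.crosstab(df.index.strftime('%b'), df.index.year, df.values, aggfunc='sum')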
Out[87]:
col_0 2018 2019 2020
row_0
Apr 91.1 61.0 NaN
Aug 77.9 64.0 NaN
Dec 72.5 75.0 NaN
Feb 112.0 63.0 NaN
Jan 115.0 73.0 73.9
Jul 86.5 67.0 NaN
Jun 76.5 59.0 NaN
Mar 104.5 63.0 NaN
May 85.5 58.3 NaN
Nov 76.0 78.9 NaN
Oct 71.0 70.4 NaN
Sep 65.0 59.9 NaN
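To get the calendar order the question asks for, one option is to reindex the crosstab result with the month abbreviations from the standard `calendar` module. A sketch that rebuilds the question's `Close` series (named `df`, as in the answer above):

```python
import calendar
import pandas as pd

# The Close series from the question: monthly values, Jan 2018 to Jan 2020
values = [115.0, 112.0, 104.5, 91.1, 85.5, 76.5, 86.5, 77.9, 65.0, 71.0, 76.0, 72.5,
          73.0, 63.0, 63.0, 61.0, 58.3, 59.0, 67.0, 64.0, 59.9, 70.4, 78.9, 75.0,
          73.9]
df = pd.Series(values, name="Close",
               index=pd.date_range("2018-01-01", periods=len(values), freq="MS", name="Date"))

s = pd.crosstab(df.index.strftime("%b"), df.index.year, df.values, aggfunc="sum")

# crosstab sorts the month labels alphabetically; reindex restores calendar order.
# calendar.month_abbr[0] is an empty string, hence the [1:].
s = s.reindex(list(calendar.month_abbr)[1:]).rename_axis(index="Month", columns="Year")
print(s)
```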
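Alternatively, with `s` as the original `Close` Series, use `groupby` and `unstack`; grouping by month number keeps the months in calendar order: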
(s.groupby([s.index.month, s.index.year]).first().unstack()
.rename_axis(columns='Year',index='Month')
)
Output:
Year 2018 2019 2020
Month
1 115.0 73.0 73.9
2 112.0 63.0 NaN
3 104.5 63.0 NaN
4 91.1 61.0 NaN
5 85.5 58.3 NaN
6 76.5 59.0 NaN
7 86.5 67.0 NaN
8 77.9 64.0 NaN
9 65.0 59.9 NaN
10 71.0 70.4 NaN
11 76.0 78.9 NaN
12 72.5 75.0 NaN

How to print MySQL column names with SQLAlchemy

We want to fetch data from our MySQL database using Python (SQLAlchemy) and store it in pandas DataFrames. We do receive the data, but the column names are missing: the columns are just numbered 0 to 5. How can we get the real column names into the DataFrame?
import pandas as pd
from pandas.io import sql
from sqlalchemy import create_engine
engine = create_engine("mysql://root:DTULab#123#localhost/Afgangsprojekt?host=localhost?port=3306")
conn = engine.connect()
result = conn.execute("SELECT * FROM Weather_Station").fetchall()
df = pd.DataFrame(result)
print(df)
This prints:
0 1 2 3 4 5
0 0 2019-07-26 14:50:13 27.3 29.8 45.0 44.0
1 1 2019-07-26 15:00:13 26.9 28.3 44.0 48.0
2 2 2019-07-26 15:10:13 28.0 28.3 41.0 48.0
3 3 2019-07-26 15:20:13 27.8 28.3 39.0 48.0
4 4 2019-07-26 15:30:13 27.0 28.3 40.0 48.0
5 5 2019-07-26 15:40:13 26.8 28.3 42.0 48.0
6 6 2019-07-26 15:50:13 27.0 28.3 42.0 48.0
7 7 2019-07-26 16:00:14 26.8 27.2 42.0 41.0
8 8 2019-07-26 16:10:13 27.0 27.2 42.0 41.0
9 9 2019-07-26 16:20:13 26.8 27.2 43.0 41.0
10 10 2019-07-26 16:30:13 26.4 27.2 44.0 41.0
11 11 2019-07-26 16:40:13 27.1 27.2 42.0 41.0
12 12 2019-07-26 16:50:13 26.2 27.2 43.0 41.0
13 13 2019-07-26 17:00:14 25.6 26.6 44.0 43.0
14 14 2019-07-26 17:10:14 25.5 26.6 47.0 43.0
15 15 2019-07-26 17:20:14 25.3 26.6 49.0 43.0
16 16 2019-07-26 17:30:14 25.1 26.6 51.0 43.0
17 17 2019-07-26 17:40:14 25.6 26.6 52.0 43.0
18 18 2019-07-26 17:50:14 24.8 26.6 55.0 43.0
19 19 2019-07-26 18:00:14 24.4 25.2 57.0 51.0
20 20 2019-07-26 18:10:14 24.6 25.2 57.0 51.0
21 21 2019-07-26 18:20:14 24.4 25.2 58.0 51.0
22 22 2019-07-26 18:30:14 24.4 25.2 58.0 51.0
23 23 2019-07-26 18:40:14 24.8 25.2 57.0 51.0
24 24 2019-07-26 18:50:14 25.0 25.2 57.0 51.0
25 25 2019-07-26 19:00:15 24.9 24.7 57.0 57.0
26 26 2019-07-26 19:10:14 25.1 24.7 56.0 57.0
27 27 2019-07-26 19:20:14 25.4 24.7 49.0 57.0
28 28 2019-07-26 19:30:14 25.4 24.7 48.0 57.0
29 29 2019-07-26 19:40:13 25.4 24.7 48.0 57.0
.. ... ... ... ... ... ...
822 822 2019-08-01 07:30:13 13.7 14.0 94.0 94.0
823 823 2019-08-01 07:40:13 13.6 14.0 95.0 94.0
824 824 2019-08-01 07:50:13 13.6 14.0 97.0 94.0
825 825 2019-08-01 08:00:13 13.9 13.7 97.0 94.0
826 826 2019-08-01 08:10:13 13.8 13.7 94.0 94.0
827 827 2019-08-01 08:20:13 13.6 13.7 93.0 94.0
828 828 2019-08-01 08:30:14 13.6 13.7 92.0 94.0
829 829 2019-08-01 08:40:13 13.8 13.7 92.0 94.0
830 830 2019-08-01 08:50:13 14.0 13.7 91.0 94.0
831 831 2019-08-01 09:00:13 13.9 13.8 91.0 93.0
832 832 2019-08-01 09:10:13 13.9 13.8 90.0 93.0
833 833 2019-08-01 09:20:13 13.8 13.8 91.0 93.0
834 834 2019-08-01 09:30:13 13.6 13.8 93.0 93.0
835 835 2019-08-01 09:40:13 13.6 13.8 94.0 93.0
836 836 2019-08-01 09:50:13 13.6 13.8 94.0 93.0
837 837 2019-08-01 10:00:13 13.9 13.7 94.0 92.0
838 838 2019-08-01 10:10:13 13.9 13.7 95.0 92.0
839 839 2019-08-01 10:20:13 14.0 13.7 94.0 92.0
840 840 2019-08-01 10:30:13 14.3 13.7 95.0 92.0
841 841 2019-08-01 10:40:13 14.4 13.7 95.0 92.0
842 842 2019-08-01 10:50:13 14.6 13.7 94.0 92.0
843 843 2019-08-01 11:00:13 14.9 14.3 94.0 94.0
844 844 2019-08-01 11:10:14 15.0 14.3 93.0 94.0
845 845 2019-08-01 11:20:14 15.3 14.3 93.0 94.0
846 846 2019-08-01 11:30:14 15.5 14.3 92.0 94.0
847 847 2019-08-01 11:40:13 15.5 14.3 92.0 94.0
848 848 2019-08-01 11:50:13 15.4 14.3 85.0 94.0
849 849 2019-08-01 12:00:13 15.3 15.3 86.0 91.0
850 850 2019-08-01 12:10:13 15.3 15.3 86.0 91.0
851 851 2019-08-01 12:20:13 15.3 15.3 87.0 91.0
Try pandas' `read_sql` to read (and `DataFrame.to_sql` to write). `read_sql` takes the column names from the query result:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql://root:DTULab#123#localhost/Afgangsprojekt?host=localhost?port=3306")
connection = engine.connect()
query = "<Query Here>"
df = pd.read_sql(query, connection)
print(df.head(50))  # print the first 50 rows
Alternatively, pass the query string and the connection straight to `read_sql`, which reads a SQL query or database table into a DataFrame:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql://root:DTULab#123#localhost/Afgangsprojekt?host=localhost?port=3306")
connection = engine.connect()
df = pd.read_sql("SELECT * FROM Weather_Station", connection)
print(df)
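A self-contained sketch of both options. To run without a MySQL server it swaps in an in-memory SQLite database from the standard `sqlite3` module, and the column names (`id`, `ts`, `temp_in`, `temp_out`) are hypothetical. With the question's SQLAlchemy setup, the manual variant would be `pd.DataFrame(result.fetchall(), columns=result.keys())`, where `result` is what `conn.execute(...)` returns (in SQLAlchemy 2.x the query must be wrapped in `sqlalchemy.text`):

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the MySQL server; the column names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather_Station (id INTEGER, ts TEXT, temp_in REAL, temp_out REAL)")
conn.executemany(
    "INSERT INTO Weather_Station VALUES (?, ?, ?, ?)",
    [(0, "2019-07-26 14:50:13", 27.3, 29.8), (1, "2019-07-26 15:00:13", 26.9, 28.3)],
)

# Option 1: read_sql builds the DataFrame, taking column names from the result set
df = pd.read_sql("SELECT * FROM Weather_Station", conn)

# Option 2: keep fetchall(), but pass the column names explicitly;
# cursor.description has one tuple per result column, starting with its name
cur = conn.execute("SELECT * FROM Weather_Station")
df2 = pd.DataFrame(cur.fetchall(), columns=[desc[0] for desc in cur.description])

conn.close()
print(df)
```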

Pandas/Python: interpolation of multiple columns based on values specified for one reference column

df
Out[1]:
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
0 978.0 345 17.0 16.5 97 12.22 0 0 292.0 326.8 294.1
1 977.0 354 17.8 16.7 93 12.39 1 0 292.9 328.3 295.1
2 970.0 416 23.4 15.4 61 11.47 4 2 299.1 332.9 301.2
3 963.0 479 24.0 14.0 54 10.54 8 3 300.4 331.6 302.3
4 948.7 610 23.0 13.4 55 10.28 15 6 300.7 331.2 302.5
5 925.0 830 21.4 12.4 56 9.87 20 5 301.2 330.6 303.0
6 916.0 914 20.7 11.7 56 9.51 20 4 301.3 329.7 303.0
7 884.0 1219 18.2 9.2 56 8.31 60 4 301.8 326.7 303.3
8 853.1 1524 15.7 6.7 55 7.24 35 3 302.2 324.1 303.5
9 850.0 1555 15.4 6.4 55 7.14 20 2 302.3 323.9 303.6
10 822.8 1829 13.3 5.6 60 6.98 300 4 302.9 324.0 304.1
How do I interpolate the values of all the columns at specified PRES (pressure) values, say PRES=[950, 900, 875]? Is there an elegant, pandas-style way to do this?
The only way I can think of is to first add a row of NaN values for each specified PRES value in a loop, then set PRES as the index and use pandas' native interpolate:
df.interpolate(method='index', inplace=True)
Is there a more elegant solution?
You can use your approach without the loop: reindex with the union of the original index values and the PRES list. This works only if all index values are unique. Interpolate while the index is still ascending (as `union` returns it), then sort it descending:
PRES=[950, 900, 875]
df = df.set_index('PRES')
df = (df.reindex(df.index.union(PRES))
        .interpolate(method='index')
        .sort_index(ascending=False))
print(df.round(2))
HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
978.0 345.00 17.00 16.50 97.00 12.22 0.00 0.00 292.00 326.80 294.10
977.0 354.00 17.80 16.70 93.00 12.39 1.00 0.00 292.90 328.30 295.10
970.0 416.00 23.40 15.40 61.00 11.47 4.00 2.00 299.10 332.90 301.20
963.0 479.00 24.00 14.00 54.00 10.54 8.00 3.00 300.40 331.60 302.30
950.0 598.09 23.09 13.45 54.91 10.30 14.36 5.73 300.67 331.24 302.48
948.7 610.00 23.00 13.40 55.00 10.28 15.00 6.00 300.70 331.20 302.50
925.0 830.00 21.40 12.40 56.00 9.87 20.00 5.00 301.20 330.60 303.00
916.0 914.00 20.70 11.70 56.00 9.51 20.00 4.00 301.30 329.70 303.00
900.0 1066.50 19.45 10.45 56.00 8.91 40.00 4.00 301.55 328.20 303.15
884.0 1219.00 18.20 9.20 56.00 8.31 60.00 4.00 301.80 326.70 303.30
875.0 1307.83 17.47 8.47 55.71 8.00 52.72 3.71 301.92 325.94 303.36
853.1 1524.00 15.70 6.70 55.00 7.24 35.00 3.00 302.20 324.10 303.50
850.0 1555.00 15.40 6.40 55.00 7.14 20.00 2.00 302.30 323.90 303.60
822.8 1829.00 13.30 5.60 60.00 6.98 300.00 4.00 302.90 324.00 304.10
If the PRES column may contain duplicate values, use `concat` with `sort_index` instead:
PRES=[950, 900, 875]
df = df.set_index('PRES')
df = (pd.concat([df, pd.DataFrame(index=PRES)])
        .sort_index()
        .interpolate(method='index')
        .sort_index(ascending=False))
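As a cross-check outside pandas, a minimal sketch that interpolates each column directly with NumPy's `np.interp`, using only the HGHT and TEMP columns of the question's sounding (the subset is an assumption to keep it short). `np.interp` expects increasing x values, so the descending pressure profile is sorted ascending first:

```python
import numpy as np
import pandas as pd

# A subset of the sounding from the question (PRES in descending order)
df = pd.DataFrame({
    "PRES": [978.0, 977.0, 970.0, 963.0, 948.7, 925.0, 916.0, 884.0, 853.1, 850.0, 822.8],
    "HGHT": [345, 354, 416, 479, 610, 830, 914, 1219, 1524, 1555, 1829],
    "TEMP": [17.0, 17.8, 23.4, 24.0, 23.0, 21.4, 20.7, 18.2, 15.7, 15.4, 13.3],
})
targets = [950, 900, 875]

# np.interp needs increasing x values, so sort the profile by pressure first
asc = df.sort_values("PRES")
result = pd.DataFrame(
    {col: np.interp(targets, asc["PRES"], asc[col]) for col in df.columns if col != "PRES"},
    index=pd.Index(targets, name="PRES"),
)
print(result.round(2))
```

The values should match the 950, 900 and 875 rows produced by the pandas `interpolate(method='index')` approach.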
