How do I convert column B into a transition matrix in Python?
The matrix should be 19 x 19, since column B has 19 unique values.
There are 432 rows in the dataset in total.
time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
...
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088
The matrix should contain the number of transitions between each pair of values in B, with those values as both the row labels and the column labels. For example, the row for 1088 would look like this:
B     ...  1088  1288  ...
...
1088  ...     8     2  ...
...
I use your data to create a DataFrame with only column B, but the approach should also work when all columns are present.
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row.split()[-1]) for row in text.split('\n') if 'B' not in row] # last field of every data row
df = pd.DataFrame({'B': B})
I get the unique values in the column to use later when building the matrix.
numbers = sorted(df['B'].unique())
print(numbers)
[225, 275, 750, 816, 834, 998, 1088, 1285, 1288]
I create a shifted column C so that every row holds both the current value (B) and the next one (C).
df['C'] = df['B'].shift(-1)
print(df)
B C
0 816 816.0
1 816 998.0
2 998 750.0
3 750 998.0
I group by ['B', 'C'] so I can count pairs
groups = df.groupby(['B', 'C'])
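# keys are (B, C) pairs; self-transitions such as (816, 816) are set to 0
counts = {pair: (len(group) if pair[0] != pair[1] else 0) for pair, group in groups}
# to count self-transitions as well (as in the printed output), use instead:
# counts = {pair: len(group) for pair, group in groups}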
print(counts)
{(225, 1288.0): 2, (275, 225.0): 2, (750, 998.0): 2, (816, 275.0): 2, (816, 816.0): 2, (816, 998.0): 2, (834, 1285.0): 2, (998, 750.0): 2, (998, 816.0): 2, (998, 834.0): 2, (998, 998.0): 12, (1088, 1088.0): 14, (1088, 1285.0): 2, (1285, 998.0): 2, (1285, 1088.0): 2, (1285, 1285.0): 6, (1285, 1288.0): 2, (1288, 1088.0): 2, (1288, 1285.0): 2}
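Now I can create the matrix. Using numbers and counts, I build one column (a Series with the correct index) per value and add it to the matrix. Column x holds the transitions that start at x, and the row label is the value they lead to.
matrix = pd.DataFrame()
for x in numbers:
    matrix[x] = pd.Series([counts.get((x,y), 0) for y in numbers], index=numbers)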
print(matrix)
Result
225 275 750 816 834 998 1088 1285 1288
225 0 2 0 0 0 0 0 0 0
275 0 0 0 2 0 0 0 0 0
750 0 0 0 0 0 2 0 0 0
816 0 0 0 2 0 2 0 0 0
834 0 0 0 0 0 2 0 0 0
998 0 0 2 2 0 12 0 2 0
1088 0 0 0 0 0 0 14 2 2
1285 0 0 0 0 2 0 2 6 2
1288 2 0 0 0 0 0 0 2 0
Full example
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row.split()[-1]) for row in text.split('\n') if 'B' not in row] # last field of every data row
df = pd.DataFrame({'B': B})
numbers = sorted(df['B'].unique())
print(numbers)
df['C'] = df['B'].shift(-1)
print(df)
groups = df.groupby(['B', 'C'])
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
# counts = {i[0]:len(i[1]) for i in groups} # count even (816,816)
print(counts)
matrix = pd.DataFrame()
for x in numbers:
    matrix[str(x)] = pd.Series([counts.get((x,y), 0) for y in numbers], index=numbers)
print(matrix)
EDIT:
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
The same as a normal for loop:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]: # don't count (816,816)
        counts[pair] = len(group)
    else:
        counts[pair] = 0
Negate the count when it is greater than 10:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]: # don't count (816,816)
        count = len(group)
        if count > 10 :
            counts[pair] = -count
        else:
            counts[pair] = count
    else:
        counts[pair] = 0
EDIT:
counts = {}
for pair, group in groups:
    if pair[0] != pair[1]: # don't count (816,816)
        #counts[(A,B)] = len((A,B)) + len((B,A))
        if pair not in counts:
            counts[pair] = len(group) # put first value
        else:
            counts[pair] += len(group) # add second value
        #counts[(B,A)] = len((A,B)) + len((B,A))
        if (pair[1],pair[0]) not in counts:
            counts[(pair[1],pair[0])] = len(group) # put first value
        else:
            counts[(pair[1],pair[0])] += len(group) # add second value
    else:
        counts[pair] = 0 # (816,816) gives 0
#counts[(A,B)] == counts[(B,A)]
counts_2 = {}
for pair, count in counts.items():
    if count > 10 :
        counts_2[pair] = -count
    else:
        counts_2[pair] = count
matrix = pd.DataFrame()
for x in numbers:
    matrix[str(x)] = pd.Series([counts_2.get((x,y), 0) for y in numbers], index=numbers)
print(matrix)
An alternative, pandas-based approach. Note that I've used shift(1), so C holds the previous value and each row records a transition from C to B:
text = '''time A B
2017-10-26 09:00:00 36 816
2017-10-26 10:45:00 43 816
2017-10-26 12:30:00 50 998
2017-10-26 12:45:00 51 750
2017-10-26 13:00:00 52 998
2017-10-26 13:15:00 53 998
2017-10-26 13:30:00 54 998
2017-10-26 14:00:00 56 998
2017-10-26 14:15:00 57 834
2017-10-26 14:30:00 58 1285
2017-10-26 14:45:00 59 1288
2017-10-26 23:45:00 95 1285
2017-10-27 03:00:00 12 1285
2017-10-27 03:30:00 14 1285
2017-11-02 14:00:00 56 998
2017-11-02 14:15:00 57 998
2017-11-02 14:30:00 58 998
2017-11-02 14:45:00 59 998
2017-11-02 15:00:00 60 816
2017-11-02 15:15:00 61 275
2017-11-02 15:30:00 62 225
2017-11-02 15:45:00 63 1288
2017-11-02 16:00:00 64 1088
2017-11-02 18:15:00 73 1285
2017-11-02 20:30:00 82 1285
2017-11-02 21:00:00 84 1088
2017-11-02 21:15:00 85 1088
2017-11-02 21:30:00 86 1088
2017-11-02 22:00:00 88 1088
2017-11-02 22:30:00 90 1088
2017-11-02 23:00:00 92 1088
2017-11-02 23:30:00 94 1088
2017-11-02 23:45:00 95 1088'''
import pandas as pd
B = [int(row.split()[-1]) for row in text.split('\n') if 'B' not in row] # last field of every data row
df = pd.DataFrame({'B': B})
# alternative approach
df['C'] = df['B'].shift(1) # shift(1) puts the previous value in C, so each row is a transition from C to B
df['counts'] = 1 # add an arbitrary counts column for the group by
# group the combinations together, then unstack to get the matrix
trans_matrix = df.groupby(['B', 'C']).count().unstack()
# make the columns a bit neater
trans_matrix.columns = trans_matrix.columns.droplevel()
I think the result is correct: for example, the one time 225 is observed, it then transitions to 1288 (row 1288, column 225, because C holds the previous value). To turn the counts into a transition probability matrix, divide each count by the total number of transitions that leave the same source value.
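A minimal sketch of that normalization step, using pd.crosstab rather than the groupby/unstack route from the answer. It only uses the first ten values of B from the sample, and the names b, pairs, states, counts and probs are mine, not the original poster's.

```python
import pandas as pd

# First ten values of column B from the sample rows in the question.
b = pd.Series([816, 816, 998, 750, 998, 998, 998, 998, 834, 1285], name="B")

# Pair each value with the one that follows it; the last row has no successor.
pairs = pd.DataFrame({"from": b, "to": b.shift(-1)}).dropna()
pairs["to"] = pairs["to"].astype(int)

# Count matrix: rows are the source value, columns the destination value.
# Reindexing makes it square, with zeros for pairs that never occur.
states = sorted(b.unique())
counts = pd.crosstab(pairs["from"], pairs["to"]).reindex(
    index=states, columns=states, fill_value=0
)

# Dividing every row by its own total gives transition probabilities.
# Only values that have at least one outgoing transition get a row here.
probs = pd.crosstab(pairs["from"], pairs["to"], normalize="index")

print(counts)
print(probs)
```

A value that only appears as the very last observation (1285 in this short slice) has no outgoing transitions, so its row in counts is all zeros and it has no row in probs.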
Related
First data frame:
date time open high low close volume avg
0 2021-05-23 00:00:00 37458.51 38270.64 31111.01 34655.25 217136.046593 NaN
1 2021-05-24 00:00:00 34681.44 39920.00 34031.00 38796.29 161630.893971 NaN
2 2021-05-25 00:00:00 38810.99 39791.77 36419.62 38324.72 111996.228404 NaN
3 2021-05-26 00:00:00 38324.72 40841.00 37800.44 39241.91 104780.773396 NaN
4 2021-05-27 00:00:00 39241.92 40411.14 37134.27 38529.98 86547.158794 NaN
5 2021-05-28 00:00:00 38529.99 38877.83 34684.00 35663.49 135377.629720 NaN
6 2021-05-29 00:00:00 35661.79 37338.58 33632.76 34605.15 112663.092689 NaN
7 2021-05-30 00:00:00 34605.15 36488.00 33379.00 35641.27 73535.386967 NaN
8 2021-05-31 00:00:00 35641.26 37499.00 34153.84 37253.81 94160.735289 NaN
9 2021-01-06 00:00:00 37253.82 37894.81 35666.00 36693.09 81234.663770 NaN
10 2021-02-06 00:00:00 36694.85 38225.00 35920.00 37568.68 67587.372495 NaN
11 2021-03-06 00:00:00 37568.68 39476.00 37170.00 39246.79 75889.106011 NaN
12 2021-04-06 00:00:00 39246.78 39289.07 35555.15 36829.00 91317.799245 NaN
13 2021-05-06 00:00:00 36829.15 37925.00 34800.00 35513.20 70459.621490 NaN
14 2021-06-06 00:00:00 35516.07 36480.00 35222.00 35796.31 47650.206637 NaN
15 2021-07-06 00:00:00 35796.31 36900.00 33300.00 33552.79 77574.952573 NaN
16 2021-08-06 00:00:00 33556.96 34068.01 31000.00 33380.81 123251.189037 NaN
17 2021-09-06 00:00:00 33380.80 37534.79 32396.82 37388.05 136607.597517 NaN
18 2021-10-06 00:00:00 37388.05 38491.00 35782.00 36675.72 109527.284943 NaN
19 2021-11-06 00:00:00 36677.83 37680.40 35936.77 37331.98 78466.005300 NaN
20 2021-12-06 00:00:00 37331.98 37463.63 34600.36 35546.11 87717.549990 NaN
21 2021-06-13 00:00:00 35546.12 39380.00 34757.00 39020.57 86921.025555 NaN
22 2021-06-14 00:00:00 39020.56 41064.05 38730.00 40516.29 108522.391949 NaN
23 2021-06-15 00:00:00 40516.28 41330.00 39506.40 40144.04 80679.622838 NaN
24 2021-06-16 00:00:00 40143.80 40527.14 38116.01 38349.01 87771.976937 NaN
25 2021-06-17 00:00:00 38349.00 39559.88 37365.00 38092.97 79541.307119 NaN
26 2021-06-18 00:00:00 38092.97 38202.84 35129.29 35819.84 95228.042935 NaN
27 2021-06-19 00:00:00 35820.48 36457.00 34803.52 35483.72 68712.449461 NaN
28 2021-06-20 00:00:00 35483.72 36137.72 33336.00 35600.16 89878.170850 NaN
29 2021-06-21 00:00:00 35600.17 35750.00 31251.23 31608.93 168778.873159 NaN
30 2021-06-22 00:00:00 31614.12 33298.78 28805.00 32509.56 204208.179762 NaN
31 2021-06-23 00:00:00 32509.56 34881.00 31683.00 33678.07 126966.100563 NaN
32 2021-06-24 00:00:00 33675.07 35298.00 32286.57 34663.09 86625.804260 NaN
33 2021-06-25 00:00:00 34663.08 35500.00 31275.00 31584.45 116061.130356 NaN
34 2021-06-26 00:00:00 31576.09 32730.00 30151.00 32283.65 107820.375287 NaN
35 2021-06-27 00:00:00 32283.65 34749.00 31973.45 34700.34 96613.244211 NaN
36 2021-06-28 00:00:00 34702.49 35297.71 33862.72 34494.89 82222.267819 NaN
37 2021-06-29 00:00:00 34494.89 36600.00 34225.43 35911.73 90788.796220 NaN
38 2021-06-30 00:00:00 35911.72 36100.00 34017.55 35045.00 77152.197634 NaN
39 2021-01-07 00:00:00 35045.00 35057.57 32711.00 33504.69 71708.266112 15.362372
40 2021-02-07 00:00:00 33502.33 33977.04 32699.00 33786.55 56172.181378 15.386331
41 2021-03-07 00:00:00 33786.54 34945.61 33316.73 34669.13 43044.578641 15.154877
42 2021-04-07 00:00:00 34669.12 35967.85 34357.15 35286.51 43703.475789 14.677524
43 2021-05-07 00:00:00 35288.13 35293.78 33125.55 33690.14 64123.874245 14.486827
44 2021-06-07 00:00:00 33690.15 35118.88 33532.00 34220.01 58210.596349 14.305665
45 2021-07-07 00:00:00 34220.02 35059.09 33777.77 33862.12 53807.521675 14.133561
46 2021-08-07 00:00:00 33862.11 33929.64 32077.00 32875.71 70136.480320 14.336865
47 2021-09-07 00:00:00 32875.71 34100.00 32261.07 33815.81 47153.939899 14.479159
48 2021-10-07 00:00:00 33815.81 34262.00 33004.78 33502.87 34761.175468 14.564313
49 2021-11-07 00:00:00 33502.87 34666.00 33306.47 34258.99 31572.647448 14.517866
50 2021-12-07 00:00:00 34259.00 34678.43 32658.34 33086.63 48181.403762 14.627892
51 2021-07-13 00:00:00 33086.94 33340.00 32202.25 32729.77 41126.361008 14.839689
52 2021-07-14 00:00:00 32729.12 33114.03 31550.00 32820.02 46777.823484 15.192346
53 2021-07-15 00:00:00 32820.03 33185.25 31133.00 31880.00 51639.576353 15.623083
54 2021-07-16 00:00:00 31874.49 32249.18 31020.00 31383.87 48499.864154 16.058731
55 2021-07-17 00:00:00 31383.86 31955.92 31164.31 31520.07 34012.242132 16.472596
56 2021-07-18 00:00:00 31520.07 32435.00 31108.97 31778.56 35923.716186 16.669426
57 2021-07-19 00:00:00 31778.57 31899.00 30407.44 30839.65 47340.468499 17.041150
58 2021-07-20 00:00:00 30839.65 31063.07 29278.00 29790.35 61034.049017 17.671053
59 2021-07-21 00:00:00 29790.34 32858.00 29482.61 32144.51 82796.265128 17.564616
60 2021-07-22 00:00:00 32144.51 32591.35 31708.00 32287.83 46148.092433 17.463500
61 2021-07-23 00:00:00 32287.58 33650.00 31924.32 33634.09 50112.863626 16.984139
62 2021-07-24 00:00:00 33634.10 34500.00 33401.14 34258.14 47977.550138 16.242346
63 2021-07-25 00:00:00 34261.51 35398.00 33851.12 35381.02 47852.928313 15.607586
64 2021-07-26 00:00:00 35381.02 40550.00 35205.78 37237.60 152452.512724 16.219395
65 2021-07-27 00:00:00 37241.33 39542.61 36383.00 39457.87 88397.267015 16.800613
66 2021-07-28 00:00:00 39456.61 40900.00 38772.00 40019.56 101344.528441 17.599907
67 2021-07-29 00:00:00 40019.57 40640.00 39200.00 40016.48 53998.439283 18.359237
68 2021-07-30 00:00:00 40018.49 42316.71 38313.23 42206.37 73602.784805 19.368676
69 2021-07-31 00:00:00 42206.36 42448.00 41000.15 41461.83 44849.791012 20.349200
70 2021-01-08 00:00:00 41461.84 42599.00 39422.01 39845.44 53953.186326 20.714136
71 2021-02-08 00:00:00 39850.27 40480.01 38690.00 39147.82 50837.351954 20.816480
72 2021-03-08 00:00:00 39146.86 39780.00 37642.03 38207.05 57117.435853 20.578895
73 2021-04-08 00:00:00 38207.04 39969.66 37508.56 39723.18 52329.352430 20.396351
74 2021-05-08 00:00:00 39723.17 41350.00 37332.70 40862.46 84343.755621 20.526294
75 2021-06-08 00:00:00 40862.46 43392.43 39853.86 42836.87 75753.941347 21.042989
76 2021-07-08 00:00:00 42836.87 44700.00 42446.41 44572.54 73396.740808 21.756471
77 2021-08-08 00:00:00 44572.54 45310.00 43261.00 43794.37 69329.092698 22.533424
78 2021-09-08 00:00:00 43794.36 46454.15 42779.00 46253.40 74587.884845 23.450453
79 2021-10-08 00:00:00 46248.87 46700.00 44589.46 45584.99 53814.643421 24.359303
80 2021-11-08 00:00:00 45585.00 46743.47 45341.14 45511.00 52734.901977 25.229618
81 2021-12-08 00:00:00 45510.67 46218.12 43770.00 44399.00 55266.108781 25.471002
82 2021-08-13 00:00:00 44400.06 47886.00 44217.39 47800.00 48239.370431 25.995794
83 2021-08-14 00:00:00 47799.99 48144.00 45971.03 47068.51 46114.359022 26.537795
84 2021-08-15 00:00:00 47068.50 47372.27 45500.00 46973.82 42110.711334 26.878796
85 2021-08-16 00:00:00 46973.82 48053.83 45660.00 45901.29 52480.574014 27.326937
86 2021-08-17 00:00:00 45901.30 47160.00 44376.00 44695.95 57039.341629 27.285215
87 2021-08-18 00:00:00 44695.95 46000.00 44203.28 44705.29 54099.415985 27.184539
88 2021-08-19 00:00:00 44699.37 47033.00 43927.70 46760.62 53411.753920 27.302916
89 2021-08-20 00:00:00 46760.62 49382.99 46622.99 49322.47 56850.352228 27.840242
90 2021-08-21 00:00:00 49322.47 49757.04 48222.00 48821.87 46745.136584 28.412062
91 2021-08-22 00:00:00 48821.88 49500.00 48050.00 49239.22 37007.887795 28.889153
92 2021-08-23 00:00:00 49239.22 50500.00 49029.00 49488.85 52462.541954 29.512800
93 2021-08-24 00:00:00 49488.85 49860.00 47600.00 47674.01 51014.594748 29.565824
94 2021-08-25 00:00:00 47674.01 49264.30 47126.28 48973.32 44655.830342 29.446836
95 2021-08-26 00:00:00 48973.32 49352.84 46250.00 46843.87 49371.277774 29.028026
96 2021-08-27 00:00:00 46843.86 49149.93 46348.00 49069.90 42068.104965 28.630156
97 2021-08-28 00:00:00 49069.90 49299.00 48346.88 48895.35 26681.063786 28.287626
98 2021-08-29 00:00:00 48895.35 49632.27 47762.54 48767.83 32652.283473 27.744622
99 2021-08-30 00:00:00 48767.84 48888.61 46853.00 46982.91 40288.350830 26.903998
100 2021-08-31 00:00:00 46982.91 48246.11 46700.00 47100.89 48645.527370 26.051605
101 2021-01-09 00:00:00 47100.89 49156.00 46512.00 48810.52 49904.655280 25.499838
102 2021-02-09 00:00:00 48810.51 50450.13 48584.06 49246.64 54410.770538 25.311075
103 2021-03-09 00:00:00 49246.63 51000.00 48316.84 49999.14 59025.644157 25.265214
104 2021-04-09 00:00:00 49998.00 50535.69 49370.00 49915.64 34664.659590 25.221647
105 2021-05-09 00:00:00 49917.54 51900.00 49450.00 51756.88 40544.835873 25.504286
106 2021-06-09 00:00:00 51756.88 52780.00 50969.33 52663.90 49249.667081 25.962876
107 2021-07-09 00:00:00 52666.20 52920.00 42843.05 46863.73 123048.802719 25.276717
108 2021-08-09 00:00:00 46868.57 47340.99 44412.02 46048.31 65069.315200 24.624866
109 2021-09-09 00:00:00 46048.31 47399.97 45513.08 46395.14 50651.660020 23.989928
110 2021-10-09 00:00:00 46395.14 47033.00 44132.29 44850.91 49048.266180 23.670387
111 2021-11-09 00:00:00 44842.20 45987.93 44722.22 45173.69 30440.408100 23.366822
112 2021-12-09 00:00:00 45173.68 46460.00 44742.06 46025.24 32094.280520 22.938381
113 2021-09-13 00:00:00 46025.23 46880.00 43370.00 44940.73 65429.150560 22.820722
114 2021-09-14 00:00:00 44940.72 47250.00 44594.44 47111.52 44855.850990 22.594896
115 2021-09-15 00:00:00 47103.28 48500.00 46682.32 48121.41 43204.711740 22.007531
116 2021-09-16 00:00:00 48121.40 48557.00 47021.10 47737.82 40725.088950 21.432816
117 2021-09-17 00:00:00 47737.81 48150.00 46699.56 47299.98 34461.927760 20.965565
118 2021-09-18 00:00:00 47299.98 48843.20 47035.56 48292.74 30906.470380 20.306487
119 2021-09-19 00:00:00 48292.75 48372.83 46829.18 47241.75 29847.243490 19.735184
120 2021-09-20 00:00:00 47241.75 47347.25 42500.00 43015.62 78003.524443 20.139851
121 2021-09-21 00:00:00 43016.64 43639.00 39600.00 40734.38 84534.080485 20.985744
122 2021-09-22 00:00:00 40734.09 44000.55 40565.39 43543.61 58349.055420 21.676235
123 2021-09-23 00:00:00 43546.37 44978.00 43069.09 44865.26 48699.576550 22.029837
124 2021-09-24 00:00:00 44865.26 45200.00 40675.00 42810.57 84113.426292 22.735109
125 2021-09-25 00:00:00 42810.58 42966.84 41646.28 42670.64 33594.571890 23.405118
126 2021-09-26 00:00:00 42670.63 43950.00 40750.00 43160.90 49879.997650 23.734984
127 2021-09-27 00:00:00 43160.90 44350.00 42098.00 42147.35 39776.843830 23.925323
128 2021-09-28 00:00:00 42147.35 42787.38 40888.00 41026.54 43372.262400 24.312088
129 2021-09-29 00:00:00 41025.01 42590.00 40753.88 41524.28 33511.534870 24.702028
130 2021-09-30 00:00:00 41524.29 44141.37 41410.17 43824.10 46381.227810 24.581907
131 2021-01-10 00:00:00 43820.01 48495.00 43283.03 48141.61 66244.874920 23.367632
132 2021-02-10 00:00:00 48141.60 48336.59 47430.18 47634.90 30508.981310 22.214071
133 2021-03-10 00:00:00 47634.89 49228.08 47088.00 48200.01 30825.056010 21.285226
134 2021-04-10 00:00:00 48200.01 49536.12 46891.00 49224.94 46796.493720 20.470586
135 2021-05-10 00:00:00 49224.93 51886.30 49022.40 51471.99 52125.667930 20.178783
136 2021-06-10 00:00:00 51471.99 55750.00 50382.41 55315.00 79877.545181 20.539207
137 2021-07-10 00:00:00 55315.00 55332.31 53357.00 53785.22 54917.377660 20.881611
138 2021-08-10 00:00:00 53785.22 56100.00 53617.61 53951.43 46160.257850 21.322501
139 2021-09-10 00:00:00 53955.67 55489.00 53661.67 54949.72 55177.080130 21.741347
140 2021-10-10 00:00:00 54949.72 56561.31 54080.00 54659.00 89237.836128 22.304343
141 2021-11-10 00:00:00 54659.01 57839.04 54415.06 57471.35 52933.165751 23.025557
142 2021-12-10 00:00:00 57471.35 57680.00 53879.00 55996.93 53471.285500 23.546775
143 2021-10-13 00:00:00 55996.91 57777.00 54167.19 57367.00 55808.444920 24.057061
144 2021-10-14 00:00:00 57370.83 58532.54 56818.05 57347.94 43053.336781 24.660876
145 2021-10-15 00:00:00 57347.94 62933.00 56850.00 61672.42 82512.908022 25.811065
146 2021-10-16 00:00:00 61672.42 62378.42 60150.00 60875.57 35467.880960 26.903744
147 2021-10-17 00:00:00 60875.57 61718.39 58963.00 61528.33 39099.241240 27.563757
148 2021-10-18 00:00:00 61528.32 62695.78 59844.45 62009.84 51798.448440 28.318027
149 2021-10-19 00:00:00 62005.60 64486.00 61322.22 64280.59 53628.107744 29.251726
150 2021-10-20 00:00:00 64280.59 67000.00 63481.40 66001.41 51428.934856 30.405550
151 2021-10-21 00:00:00 66001.40 66639.74 62000.00 62193.15 68538.645370 31.054053
152 2021-10-22 00:00:00 62193.15 63732.39 60000.00 60688.22 52119.358860 31.117531
153 2021-10-23 00:00:00 60688.23 61747.64 59562.15 61286.75 27626.936780 31.062358
154 2021-10-24 00:00:00 61286.75 61500.00 59510.63 60852.22 31226.576760 30.995921
155 2021-10-25 00:00:00 60852.22 63710.63 60650.00 63078.78 36853.838060 31.244720
156 2021-10-26 00:00:00 63078.78 63293.48 59817.55 60328.81 40217.500830 31.249961
157 2021-10-27 00:00:00 60328.81 61496.00 58000.00 58413.44 62124.490160 30.779004
158 2021-10-28 00:00:00 58413.44 62499.00 57820.00 60575.89 61056.353010 30.489479
159 2021-10-29 00:00:00 60575.90 62980.00 60174.81 62253.71 43973.904140 30.289382
160 2021-10-30 00:00:00 62253.70 62359.25 60673.00 61859.19 31478.125660 30.099291
161 2021-10-31 00:00:00 61859.19 62405.30 59945.36 61299.80 39267.637940 29.713720
162 2021-01-11 00:00:00 61299.81 62437.74 59405.00 60911.11 44687.666720 29.196216
163 2021-02-11 00:00:00 60911.12 64270.00 60624.68 63219.99 46368.284100 29.031364
164 2021-03-11 00:00:00 63220.57 63500.00 60382.76 62896.48 43336.090490 28.804634
165 2021-04-11 00:00:00 62896.49 63086.31 60677.01 61395.01 35930.933140 28.589242
166 2021-05-11 00:00:00 61395.01 62595.72 60721.00 60937.12 31604.487490 28.384619
167 2021-06-11 00:00:00 60940.18 61560.49 60050.00 61470.61 25590.574080 27.973716
168 2021-07-11 00:00:00 61470.62 63286.35 61322.78 63273.59 25515.688300 27.926901
169 2021-08-11 00:00:00 63273.58 67789.00 63273.58 67525.83 54442.094554 28.579845
170 2021-09-11 00:00:00 67525.82 68524.25 66222.40 66947.66 44661.378068 29.294016
171 2021-10-11 00:00:00 66947.67 69000.00 62822.90 64882.43 65171.504046 29.014734
172 2021-11-11 00:00:00 64882.42 65600.07 64100.00 64774.26 37237.980580 28.749416
173 2021-12-11 00:00:00 64774.25 65450.70 62278.00 64122.23 44490.108160 28.041179
174 2021-11-13 00:00:00 64122.22 65000.00 63360.22 64380.00 22504.973830 27.368353
175 2021-11-14 00:00:00 64380.01 65550.51 63576.27 65519.10 25705.073470 26.832078
176 2021-11-15 00:00:00 65519.11 66401.82 63400.00 63606.74 37829.371240 26.479925
177 2021-11-16 00:00:00 63606.73 63617.31 58574.07 60058.87 77455.156090 25.267463
178 2021-11-17 00:00:00 60058.87 60840.23 58373.00 60344.87 46289.384910 24.154719
179 2021-11-18 00:00:00 60344.86 60976.00 56474.26 56891.62 62146.999310 23.454728
180 2021-11-19 00:00:00 56891.62 58320.00 55600.00 58052.24 50715.887260 22.944550
181 2021-11-20 00:00:00 58057.10 59845.00 57353.00 59707.51 33811.590100 22.122892
182 2021-11-21 00:00:00 59707.52 60029.76 58486.65 58622.02 31902.227850 21.302202
183 2021-11-22 00:00:00 58617.70 59444.00 55610.00 56247.18 51724.320470 21.040602
184 2021-11-23 00:00:00 56243.83 58009.99 55317.00 57541.27 49917.850170 20.840946
185 2021-11-24 00:00:00 57541.26 57735.00 55837.00 57138.29 39612.049640 20.651273
186 2021-11-25 00:00:00 57138.29 59398.90 57000.00 58960.36 42153.515220 20.071560
187 2021-11-26 00:00:00 58960.37 59150.00 53500.00 53726.53 65927.870660 20.117912
188 2021-11-27 00:00:00 53723.72 55280.00 53610.00 54721.03 29716.999570 20.161946
189 2021-11-28 00:00:00 54716.47 57445.05 53256.64 57274.88 36163.713700 19.704241
190 2021-11-29 00:00:00 57274.89 58865.97 56666.67 57776.25 40125.280090 18.969898
191 2021-11-30 00:00:00 57776.25 59176.99 55875.55 56950.56 49161.051940 18.417868
192 2021-01-12 00:00:00 56950.56 59053.55 56458.01 57184.07 44956.636560 17.893439
193 2021-02-12 00:00:00 57184.07 57375.47 55777.77 56480.34 37574.059760 17.525876
194 2021-03-12 00:00:00 56484.26 57600.00 51680.00 53601.05 58927.690270 17.858850
195 2021-04-12 00:00:00 53601.05 53859.10 42000.30 49152.47 114203.373748 19.217441
196 2021-05-12 00:00:00 49152.46 49699.05 47727.21 49396.33 45580.820120 20.508102
197 2021-06-12 00:00:00 49396.32 50891.11 47100.00 50441.92 58571.215750 21.472003
198 2021-07-12 00:00:00 50441.91 51936.33 50039.74 50588.95 38253.468770 22.161968
199 2021-08-12 00:00:00 50588.95 51200.00 48600.00 50471.19 38425.924660 22.962218
200 2021-09-12 00:00:00 50471.19 50797.76 47320.00 47545.59 37692.686650 23.846688
201 2021-10-12 00:00:00 47535.90 50125.00 46852.00 47140.54 44233.573910 24.732127
202 2021-11-12 00:00:00 47140.54 49485.71 46751.00 49389.99 28889.193580 25.583369
203 2021-12-12 00:00:00 49389.99 50777.00 48638.00 50053.90 26017.934210 26.077754
204 2021-12-13 00:00:00 50053.90 50189.97 45672.75 46702.75 50869.520930 26.859770
205 2021-12-14 00:00:00 46702.76 48700.41 46290.00 48343.28 39955.984450 27.602685
206 2021-12-15 00:00:00 48336.95 49500.00 46547.00 48864.98 51629.181000 28.109255
207 2021-12-16 00:00:00 48864.98 49436.43 47511.00 47632.38 31949.867390 28.590496
208 2021-12-17 00:00:00 47632.38 47995.96 45456.00 46131.20 43104.488700 29.278437
209 2021-12-18 00:00:00 46133.83 47392.37 45500.00 46834.48 25020.052710 29.931981
210 2021-12-19 00:00:00 46834.47 48300.01 46406.91 46681.23 29305.706650 30.303705
211 2021-12-20 00:00:00 46681.24 47537.57 45558.85 46914.16 35848.506090 30.761072
212 2021-12-21 00:00:00 46914.17 49328.96 46630.00 48889.88 37713.929240 30.715132
213 2021-12-22 00:00:00 48887.59 49576.13 48421.87 48588.16 27004.202200 30.607162
214 2021-12-23 00:00:00 48588.17 51375.00 47920.42 50838.81 35192.540460 30.051098
215 2021-12-24 00:00:00 50838.82 51810.00 50384.43 50820.00 31661.949460 29.417439
The code below runs fine, but I need the date on the x axis.
test['avg'].plot(legend=True,figsize=(12,5))
plt.grid(True)
plt.xlabel('ADX')
plt.ylabel('date')
plt.title('ADX indicator')
plt.gcf().autofmt_xdate()
plt.show()
Correct plot:
But when I choose the date for the x axis, I get a bad plot. The code is below:
df.set_index('date',drop=True, inplace=True)
Modified data
test['avg'].plot(legend=True,figsize=(12,5))
plt.grid(True)
plt.xlabel('ADX')
plt.ylabel('date')
plt.title('ADX indicator')
plt.gcf().autofmt_xdate()
plt.show()
Bad plot:
Also, why do I get NaN values for ADX in TA-Lib?
Can you help me with this problem?
The problem does appear to be in the source file: the column names are not tab separated. Once this is fixed, the plotting works fine.
The NaN issue also comes from the source file; the average was not calculated for the first several rows, which is what you would expect from an indicator such as ADX that needs a warm-up window of earlier rows.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
test = pd.read_csv(r"modified_data.dat", sep='\t')
date = test['date']
avg = test['avg']
fig, ax = plt.subplots(figsize=(20,10))
ax.plot(date, avg)
ax.tick_params(rotation=30, width = 2)
plt.xticks(np.arange(0, len(date)+1, 5)) # show every fifth date label
plt.show()
Output looks like this:
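One more thing worth checking, separate from the tab issue. In the data frame in the question, the row after 2021-05-31 reads 2021-01-06, followed by 2021-02-06 and so on up to 2021-12-06, and only then 2021-06-13. The open and close prices run on continuously, so these look like 1 to 12 June with day and month swapped during parsing. With a date index that jumps back and forth, a line plot will zigzag. The sketch below assumes the raw file stores dates as day/month/year strings, which the post does not confirm; the column name close and the four values are taken from rows 38, 39, 40 and 51 of the table.

```python
import pandas as pd

# Assumption: the raw strings are day/month/year. The real source format is not shown in the post.
raw = pd.DataFrame(
    {
        "date": ["30/06/2021", "01/07/2021", "02/07/2021", "13/07/2021"],
        "close": [35045.00, 33504.69, 33786.55, 32729.77],
    }
)

# An explicit format stops pandas from guessing which field is the day.
raw["date"] = pd.to_datetime(raw["date"], format="%d/%m/%Y")

# A sorted DatetimeIndex is what a time-series line plot expects.
test = raw.set_index("date").sort_index()
print(test)
```

With the index parsed and sorted like this, test['close'].plot() (or the avg column in the real data) puts the dates on the x axis in chronological order.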
df1
slot Time Location User
56 2017-10-26 22:15:00 89 1
2 2017-10-27 00:30:00 54 1
20 2017-10-28 05:00:00 64 1
24 2017-10-29 06:00:00 2 1
91 2017-11-01 22:45:00 78 1
62 2017-11-02 15:30:00 99 1
91 2017-11-02 22:45:00 34 1
47 2017-10-26 20:15:00 465 2
1 2017-10-27 00:10:00 67 2
20 2017-10-28 05:00:00 5746 2
28 2017-10-29 07:00:00 36 2
91 2017-11-01 22:45:00 786 2
58 2017-11-02 14:30:00 477 2
95 2017-11-02 23:45:00 7322 2
df2
slot
2
91
62
58
I need the output df3 as
slot Time Location User
2 2017-10-27 00:30:00 54 1
91 2017-11-01 22:45:00 78 1
91 2017-11-02 22:45:00 34 1
91 2017-11-01 22:45:00 786 2
62 2017-11-02 15:30:00 99 1
58 2017-11-02 14:30:00 477 2
If those were CSV files, we could join them on the command line:
join File1 file2 > file3
But how can we do the same for the outputs in a Jupyter notebook?
Try isin:
df1[df1.slot.isin(df2.slot)]
Output:
slot Time Location User
1 2 2017-10-27 00:30:00 54 1
4 91 2017-11-01 22:45:00 78 1
5 62 2017-11-02 15:30:00 99 1
6 91 2017-11-02 22:45:00 34 1
11 91 2017-11-01 22:45:00 786 2
12 58 2017-11-02 14:30:00 477 2
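For anyone who wants to run this, here is a self-contained sketch of the isin filter next to an inner merge, which is the closer analogue of the shell join. The Time column is left out to keep it short; filtered and merged are names I made up.

```python
import pandas as pd

# df1 and df2 from the question, without the Time column.
df1 = pd.DataFrame(
    {
        "slot": [56, 2, 20, 24, 91, 62, 91, 47, 1, 20, 28, 91, 58, 95],
        "Location": [89, 54, 64, 2, 78, 99, 34, 465, 67, 5746, 36, 786, 477, 7322],
        "User": [1] * 7 + [2] * 7,
    }
)
df2 = pd.DataFrame({"slot": [2, 91, 62, 58]})

# isin keeps the matching rows of df1 in df1's own order and with df1's index.
filtered = df1[df1["slot"].isin(df2["slot"])]

# An inner merge keeps the same rows but follows the key order of df2.
merged = df2.merge(df1, on="slot", how="inner")

print(filtered)
print(merged)
```

Both return the same six rows. The merge groups them by the order of slots in df2 (2, 91, 62, 58), which is how the expected df3 in the question is laid out.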
I'm trying to fill the missing slots in the CSV file which has date and time as a string.
My input from a csv file is:
A B C
56 2017-10-26 22:15:00 89
2 2017-10-27 00:30:00 54
20 2017-10-28 05:00:00 64
24 2017-10-29 06:00:00 2
91 2017-11-01 22:45:00 78
62 2017-11-02 15:30:00 99
91 2017-11-02 22:45:00 34
Output should be
A B C
0 2017-10-26 00:00:00 89
1 2017-10-26 00:15:00 89
...
56 2017-10-26 22:15:00 89
...
96 2017-10-26 23:45:00 89
0 2017-10-27 00:00:00 54
1 2017-10-27 00:15:00 54
2 2017-10-27 00:30:00 54
...
20 2017-10-28 05:00:00 64
21 2017-10-28 05:15:00 64
...
24 2017-10-29 06:00:00 2
...
91 2017-11-01 22:45:00 78
...
62 2017-11-02 15:30:00 99
...
91 2017-11-02 22:45:00 34
The output should cover 15-minute time slots for every day between 2017-10-26 and 2017-11-02, with 96 slots per day.
And the same as above.
Using resample to get 15-minute intervals and bfill to fill the missing values:
df = df.set_index(pd.to_datetime(df.pop('B')))
df.loc[df.index.min().normalize()] = None
df = df.resample('15min').max().bfill()
df['A'] = 4*df.index.hour + df.index.minute//15
print(df)
Output:
A C
B
2017-10-26 00:00:00 0 89.0
2017-10-26 00:15:00 1 89.0
2017-10-26 00:30:00 2 89.0
... .. ...
2017-11-02 22:15:00 89 34.0
2017-11-02 22:30:00 90 34.0
2017-11-02 22:45:00 91 34.0
You need to resample your data and fill the missing values by propagating the last known value. Pandas can do that. Assuming you loaded your CSV with pandas.read_csv and obtained a dataframe (let's call it df) whose index is the date column parsed as datetimes (for example with read_csv(..., parse_dates=['B'], index_col='B')), then:
df.resample(rule='15min').ffill()
The rule parameter defines the new frequency; '15min' means 15 minutes, whereas an 'M' alias is month-based. The call to .ffill() means "forward fill", i.e., replace missing data with the previous known value.
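A small sketch of the difference between forward and backward fill after resampling, since the two answers here use different ones. The two timestamps are made up to keep the output short; the column names B and C follow the question.

```python
import pandas as pd

# Two observations 45 minutes apart (illustrative values, not the full input).
df = pd.DataFrame(
    {"B": ["2017-10-26 22:15:00", "2017-10-26 23:00:00"], "C": [89, 54]}
)
df["B"] = pd.to_datetime(df["B"])
df = df.set_index("B")

# Insert the missing 15-minute slots; they start out as NaN.
resampled = df.resample("15min").asfreq()

forward = resampled.ffill()   # each gap takes the previous known value
backward = resampled.bfill()  # each gap takes the next known value

print(forward)
print(backward)
```

The expected output in the question fills the slots before an observation with that observation's value, which corresponds to bfill rather than ffill.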
Let's say I have a multiindex dataframe like the one below.
ROW_ID HADM_ID ICUSTAY_ID
SUBJECT_ID CHARTTIME
23 2157-10-21 12:05:00 1 124321 234044.0
2157-10-21 14:00:00 30 124321 234044.0
2157-10-21 19:00:00 77 124321 234044.0
2157-10-22 00:00:00 148 124321 234044.0
2157-10-22 04:00:00 197 124321 234044.0
2157-10-22 08:00:00 226 124321 234044.0
2157-10-22 16:00:00 320 124321 234044.0
34 2191-02-23 08:00:00 367 144319 290505.0
2191-02-23 12:00:00 450 144319 290505.0
2191-02-23 15:00:00 476 144319 290505.0
2191-02-23 20:00:00 511 144319 290505.0
2191-02-24 00:00:00 538 144319 290505.0
2191-02-24 04:00:00 567 144319 290505.0
2191-02-24 07:00:00 608 144319 290505.0
2191-02-24 12:00:00 648 144319 290505.0
36 2134-05-12 07:00:00 685 165660 241249.0
2134-05-12 12:00:00 787 165660 241249.0
2134-05-12 16:00:00 855 165660 241249.0
2134-05-12 20:00:00 924 165660 241249.0
2134-05-13 00:00:00 988 165660 241249.0
SUBJECT_ID and CHARTTIME form the multiindex. Now I want to get, for every SUBJECT_ID, the row with its first CHARTTIME. So the expected output is:
ROW_ID HADM_ID ICUSTAY_ID
SUBJECT_ID CHARTTIME
23 2157-10-21 12:05:00 1 124321 234044.0
34 2191-02-23 08:00:00 367 144319 290505.0
36 2134-05-12 07:00:00 685 165660 241249.0
I have tried to use iloc and xs, but it did not work. Any help would be appreciated.
If you want to group by an index level, you must pass the level parameter instead of the by parameter:
df = df.reset_index('CHARTTIME')
df = df.groupby(level=['SUBJECT_ID']).first().set_index('CHARTTIME', append=True)
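Here is a runnable sketch of that approach on a few rows from the question, together with a shorter variant based on head(1). Both assume the rows are already sorted by CHARTTIME within each SUBJECT_ID; via_first and via_head are names I picked for the illustration.

```python
import pandas as pd

# A few rows from the question, with only the ROW_ID column.
index = pd.MultiIndex.from_tuples(
    [
        (23, pd.Timestamp("2157-10-21 12:05:00")),
        (23, pd.Timestamp("2157-10-21 14:00:00")),
        (34, pd.Timestamp("2191-02-23 08:00:00")),
        (34, pd.Timestamp("2191-02-23 12:00:00")),
        (36, pd.Timestamp("2134-05-12 07:00:00")),
    ],
    names=["SUBJECT_ID", "CHARTTIME"],
)
df = pd.DataFrame({"ROW_ID": [1, 30, 367, 450, 685]}, index=index)

# Route from the answer: move CHARTTIME into a column, take the first
# entry per subject, then put CHARTTIME back into the index.
via_first = (
    df.reset_index("CHARTTIME")
    .groupby(level="SUBJECT_ID")
    .first()
    .set_index("CHARTTIME", append=True)
)

# Variant: head(1) returns the first row of each group and keeps the
# original MultiIndex, so no reset is needed.
via_head = df.groupby(level="SUBJECT_ID").head(1)

print(via_first)
print(via_head)
```

One difference to keep in mind: first() picks the first non-null value in each column, while head(1) returns the first row as it is, so the two can differ when the first row contains NaN.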
I want to plot 2 bars side by side, but I keep getting an error:
ValueError: Cannot shift with no freq
This error occurred when I set the x in axes.bar to x - width.
Here is my code:
df.date_1 = pd.to_datetime(df.date_1)
df_percent.date_1 = pd.to_datetime(df_percent.date_1)
df = df.set_index(df['date_1']).sort_index()
df_percent = df_percent.set_index(['date_1']).sort_index()
df_percent = df_percent.reindex(df.index).fillna(0)
fig, ax = plt.subplots(figsize=(10, 8))
ax.plot( df.index, df.line1,label='line1', c='b')
ax.plot( df.index, df.line2,label='line2', c='r')
ax2=ax.twinx()
#i added the x-10 to the bar chart that i want to shift to the right
ax2.bar(df_percent.index, df_after, width=10, alpha=0.1, color='r', label='after')
ax2.bar(df_percent.index-10, df_before, width=10, alpha=0.1, color='g', label='before')
If I do a stacked bar chart, it works fine.
date_1 line1 line2
date_1
2014-06-01 2014-06-01 65 66
2014-07-01 2014-07-01 68 70
2014-08-01 2014-08-01 62 65
2014-09-01 2014-09-01 62 76
2014-10-01 2014-10-01 63 66
2014-11-01 2014-11-01 79 80
2014-12-01 2014-12-01 80 50
2015-02-01 2015-02-01 70 72
2015-03-01 2015-03-01 67 67
2015-04-01 2015-04-01 69 60
2015-05-01 2015-05-01 66 83
date_1 before after
date_1
2014-06-01 2014-06-01 19.80 15.37
2014-07-01 2014-07-01 62.82 44.87
2014-08-01 2014-08-01 36.70 27.52
2014-09-01 2014-09-01 56.18 34.27
2014-10-01 2014-10-01 16.31 10.95
2014-11-01 2014-11-01 32.35 14.71
2014-12-01 2014-12-01 53.33 26.67
2015-02-01 2015-02-01 44.44 17.78
2015-03-01 2015-03-01 23.08 23.08
2015-04-01 2015-04-01 36.84 15.79
2015-05-01 2015-05-01 46.58 13.70
I think the error message (which can be read as: a time index cannot be shifted by an integer when no time frequency is attached) complains about the last line of your code, df_percent.index-10. It asks pandas to subtract the integer 10 from a pd.DatetimeIndex, which is not defined.
By -10, do you mean shifting the datetime index by 10 days (use df_percent.index - pd.DateOffset(days=10) if that is what you want), or simply dropping the first 10 entries of the index (use df_percent.index[10:])?
To have pandas plot the bars side by side:
import pandas as pd
import matplotlib.pyplot as plt
# your data
# =========================================
print(df)
date_1 line1 line2
date_1
2014-06-01 2014-06-01 65 66
2014-07-01 2014-07-01 68 70
2014-08-01 2014-08-01 62 65
2014-09-01 2014-09-01 62 76
2014-10-01 2014-10-01 63 66
2014-11-01 2014-11-01 79 80
2014-12-01 2014-12-01 80 50
2015-02-01 2015-02-01 70 72
2015-03-01 2015-03-01 67 67
2015-04-01 2015-04-01 69 60
2015-05-01 2015-05-01 66 83
print(df_percent)
date_1 before after
date_1
2014-06-01 2014-06-01 19.80 15.37
2014-07-01 2014-07-01 62.82 44.87
2014-08-01 2014-08-01 36.70 27.52
2014-09-01 2014-09-01 56.18 34.27
2014-10-01 2014-10-01 16.31 10.95
2014-11-01 2014-11-01 32.35 14.71
2014-12-01 2014-12-01 53.33 26.67
2015-02-01 2015-02-01 44.44 17.78
2015-03-01 2015-03-01 23.08 23.08
2015-04-01 2015-04-01 36.84 15.79
2015-05-01 2015-05-01 46.58 13.70
# plot
# ========================================
fig, ax = plt.subplots(figsize=(14, 8))
df[['line1', 'line2']].plot(ax=ax, color=['b', 'r'])
ax2 = ax.twinx()
df_percent[['before', 'after']].plot(kind='bar', ax=ax2, color=['r', 'g'], alpha=0.1)
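If you would rather stay with matplotlib's ax.bar, as in the question, the shift has to be a time offset instead of a bare integer. A minimal sketch using the first three rows of df_percent; the 5-day offset and the Agg backend are my choices for the illustration, not something from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of df_percent from the question.
dates = pd.to_datetime(["2014-06-01", "2014-07-01", "2014-08-01"])
df_percent = pd.DataFrame(
    {"before": [19.80, 62.82, 36.70], "after": [15.37, 44.87, 27.52]},
    index=dates,
)

# Shift each bar group by half a bar width, expressed as a Timedelta.
# On a date axis a numeric bar width is measured in days.
offset = pd.Timedelta(days=5)

fig, ax = plt.subplots(figsize=(10, 8))
bars_before = ax.bar(df_percent.index - offset, df_percent["before"],
                     width=10, alpha=0.3, color="g", label="before")
bars_after = ax.bar(df_percent.index + offset, df_percent["after"],
                    width=10, alpha=0.3, color="r", label="after")
ax.legend()
fig.autofmt_xdate()
```

Subtracting a pd.Timedelta from a DatetimeIndex is well defined, so this avoids the "Cannot shift with no freq" error while keeping a real date axis that lines can share through twinx.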