Running Hyperopt in Freqtrade and getting crazy results - python

I ran hyperopt for 5000 iterations and got the following results:
2022-01-10 19:38:31,370 - freqtrade.optimize.hyperopt - INFO - Best result:
1101 trades. Avg profit 0.23%. Total profit 25.48064438 BTC (254.5519Σ%). Avg duration 888.1 mins.
with values:
{ 'roi_p1': 0.011364434095803464,
'roi_p2': 0.04123147845715937,
'roi_p3': 0.10554480985209454,
'roi_t1': 105,
'roi_t2': 47,
'roi_t3': 30,
'rsi-enabled': True,
'rsi-value': 9,
'sell-rsi-enabled': True,
'sell-rsi-value': 94,
'sell-trigger': 'sell-bb_middle1',
'stoploss': -0.42267640639979365,
'trigger': 'bb_lower2'}
2022-01-10 19:38:31,371 - freqtrade.optimize.hyperopt - INFO - ROI table:
{ 0: 0.15814072240505736,
30: 0.05259591255296283,
77: 0.011364434095803464,
182: 0}
Result for strategy BBRSI
================================================== BACKTESTING REPORT =================================================
| pair | buy count | avg profit % | cum profit % | total profit BTC | avg duration | profit | loss |
|:----------|------------:|---------------:|---------------:|-------------------:|:----------------|---------:|-------:|
| ETH/BTC | 11 | -1.30 | -14.26 | -1.42732928 | 3 days, 4:55:00 | 0 | 1 |
| LUNA/BTC | 17 | 0.60 | 10.22 | 1.02279906 | 15:46:00 | 9 | 0 |
| SAND/BTC | 37 | 0.30 | 11.24 | 1.12513532 | 6:16:00 | 14 | 1 |
| MATIC/BTC | 24 | 0.47 | 11.35 | 1.13644340 | 12:20:00 | 10 | 0 |
| ADA/BTC | 24 | 0.24 | 5.68 | 0.56822170 | 21:05:00 | 5 | 0 |
| BNB/BTC | 11 | -1.09 | -11.96 | -1.19716109 | 3 days, 0:44:00 | 2 | 1 |
| XRP/BTC | 20 | -0.39 | -7.71 | -0.77191523 | 1 day, 5:48:00 | 1 | 1 |
| DOT/BTC | 9 | 0.50 | 4.54 | 0.45457736 | 4 days, 1:13:00 | 4 | 0 |
| SOL/BTC | 19 | -0.38 | -7.16 | -0.71688463 | 22:47:00 | 3 | 1 |
| MANA/BTC | 29 | 0.38 | 11.16 | 1.11753320 | 10:25:00 | 9 | 1 |
| AVAX/BTC | 27 | 0.30 | 8.15 | 0.81561432 | 16:36:00 | 11 | 1 |
| GALA/BTC | 26 | -0.52 | -13.45 | -1.34594702 | 15:48:00 | 9 | 1 |
| LINK/BTC | 21 | 0.27 | 5.68 | 0.56822170 | 1 day, 0:06:00 | 5 | 0 |
| TOTAL | 275 | 0.05 | 13.48 | 1.34930881 | 23:42:00 | 82 | 8 |
================================================== SELL REASON STATS ==================================================
| Sell Reason | Count |
|:--------------|--------:|
| roi | 267 |
| force_sell | 8 |
=============================================== LEFT OPEN TRADES REPORT ===============================================
| pair | buy count | avg profit % | cum profit % | total profit BTC | avg duration | profit | loss |
|:---------|------------:|---------------:|---------------:|-------------------:|:------------------|---------:|-------:|
| ETH/BTC | 1 | -14.26 | -14.26 | -1.42732928 | 32 days, 4:00:00 | 0 | 1 |
| SAND/BTC | 1 | -4.65 | -4.65 | -0.46588544 | 17:00:00 | 0 | 1 |
| BNB/BTC | 1 | -14.23 | -14.23 | -1.42444977 | 31 days, 13:00:00 | 0 | 1 |
| XRP/BTC | 1 | -8.85 | -8.85 | -0.88555957 | 18 days, 4:00:00 | 0 | 1 |
| SOL/BTC | 1 | -10.57 | -10.57 | -1.05781765 | 5 days, 14:00:00 | 0 | 1 |
| MANA/BTC | 1 | -3.17 | -3.17 | -0.31758065 | 17:00:00 | 0 | 1 |
| AVAX/BTC | 1 | -12.58 | -12.58 | -1.25910300 | 7 days, 9:00:00 | 0 | 1 |
| GALA/BTC | 1 | -23.66 | -23.66 | -2.36874608 | 7 days, 12:00:00 | 0 | 1 |
| TOTAL | 8 | -11.50 | -91.97 | -9.20647144 | 12 days, 23:15:00 | 0 | 8 |
I have followed the tutorial accurately and don't know what I am doing wrong here.

Related

How to convert gridded csv temperature data (by lat/long) to a raster map?

I've downloaded an average temperature change dataset formatted like this, but with a lat/long range across the entire US:
original csv
I'm trying to convert it into a raster that I can visualize on a map in Python or R, and all methods I've seen require the lat, long and z fields to be tabular like this: ideal table
Is there a way to do this with the current "grid" format, or do I need to transform it into a table? If the latter, how can I do that in Excel or Python/R?
I tried transposing the data in Excel first, but I'm at a loss for other methods.
Please include sample code/sample data when you ask a question here. Your data set pictured in the PNG was small enough, so I recreated it:
+----------+--------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| Lat/Long | -179.5 | -179 | -178.5 | -178 | -177.5 | -177 | -176.5 | -176 | -175.5 | -175 |
+----------+--------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| 18.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | 0 | 1.524704 | 1.489677 | 1.488556 | 1.485161 | 0 | 0 | 0 | 0 | 0 |
| 20.5 | 0 | 1.484848 | 1.484863 | 1.484833 | 1.484802 | 1.516785 | 1.554611 | 1.5672 | 1.567184 | 0 |
| 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 21.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 22 | 0 | 0 | 0 | 0 | 0 | 0 | 1.586227 | 0 | 0 | 0 |
| 23 | 0 | 0 | 2.718926 | 2.743782 | 2.74353 | 0 | 1.64222 | 1.661705 | 1.720245 | 1.755074 |
| 23.5 | 0 | 0 | 0 | 3.006203 | 3.005981 | 0 | 0 | 0 | 0 | 1.808762 |
+----------+--------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
A problem like this would be better solved in Python with NumPy/pandas (a short pandas sketch follows at the end of this answer), but if you want to do it in Excel, here are the steps I took to arrive at the end result posted below. I am assuming you are using Excel 365 on a Windows 10 PC. I am also assuming that you need help with the data set, not with the raster map itself.
The first problem you have is that there are blanks in your data where zeros should be. I have no idea how big this table is - probably really big - so if you want to do this in Excel, select cell A1, then click the last cell in column K while holding the SHIFT key. Press CTRL + H, which brings up the "Find & Replace" dialog, and replace all of the blanks with "0".
Format your data as a table in Excel by clicking within the range, and then on the Home tab "Format as Table" in the "Styles" group. The style you pick does not matter. I named the table "Original" (select a cell in the table, then click on the "Table Design" tab which appears in the top right; change the table name in the ribbon under "Properties" on the left).
Click on the "Data" tab while the table is still selected, then select "From Table/Range" in the "Get & Transform Data". This will open Power Pivot. Since you don't want to output this table again from the query, click on the arrow (NOT the button) next to "Close & Load" on the ribbon under "Close" and pick "Close & load to". This brings up a dialog box. Select "Only Create Connection" and then click "OK". If you accidentally hit the button itself, it will create a table on a new worksheet that is identical to the one you started with. You can delete the sheet later, which will convert the output of the query to a connection.
In the data tab, click on "Queries & Connections" in the "Queries & Connections" group. This brings up a sidebar on the right. Double-click the query you just created, which gets you back to Power Query:
I duplicated the original query, because we want to manipulate it further (right-click on the query in the left pane, then select "Duplicate"). Name the query something specific. I picked "Unpivoted".
Select the first column that contains the Latitude values. Then click on the arrow next to "Unpivot Columns" on the "Transform" tab and select "Unpivot Other Columns":
As a final step, I renamed the resulting columns "Latitude", "Longitude" and "Temperature", then clicked "Close & Load" to put the table onto its own worksheet.
Here is the resulting data set:
+----------+-----------+-------------+
| Latitude | Longitude | Temperature |
+----------+-----------+-------------+
| 18.5 | -179.5 | 0 |
| 18.5 | -179 | 0 |
| 18.5 | -178.5 | 0 |
| 18.5 | -178 | 0 |
| 18.5 | -177.5 | 0 |
| 18.5 | -177 | 0 |
| 18.5 | -176.5 | 0 |
| 18.5 | -176 | 0 |
| 18.5 | -175.5 | 0 |
| 18.5 | -175 | 0 |
| 19 | -179.5 | 0 |
| 19 | -179 | 0 |
| 19 | -178.5 | 0 |
| 19 | -178 | 0 |
| 19 | -177.5 | 0 |
| 19 | -177 | 0 |
| 19 | -176.5 | 0 |
| 19 | -176 | 0 |
| 19 | -175.5 | 0 |
| 19 | -175 | 0 |
| 19.5 | -179.5 | 0 |
| 19.5 | -179 | 0 |
| 19.5 | -178.5 | 0 |
| 19.5 | -178 | 0 |
| 19.5 | -177.5 | 0 |
| 19.5 | -177 | 0 |
| 19.5 | -176.5 | 0 |
| 19.5 | -176 | 0 |
| 19.5 | -175.5 | 0 |
| 19.5 | -175 | 0 |
| 20 | -179.5 | 0 |
| 20 | -179 | 1.524704 |
| 20 | -178.5 | 1.489677 |
| 20 | -178 | 1.488556 |
| 20 | -177.5 | 1.485161 |
| 20 | -177 | 0 |
| 20 | -176.5 | 0 |
| 20 | -176 | 0 |
| 20 | -175.5 | 0 |
| 20 | -175 | 0 |
| 20.5 | -179.5 | 0 |
| 20.5 | -179 | 1.484848 |
| 20.5 | -178.5 | 1.484863 |
| 20.5 | -178 | 1.484833 |
| 20.5 | -177.5 | 1.484802 |
| 20.5 | -177 | 1.516785 |
| 20.5 | -176.5 | 1.554611 |
| 20.5 | -176 | 1.5672 |
| 20.5 | -175.5 | 1.567184 |
| 20.5 | -175 | 0 |
| 21 | -179.5 | 0 |
| 21 | -179 | 0 |
| 21 | -178.5 | 0 |
| 21 | -178 | 0 |
| 21 | -177.5 | 0 |
| 21 | -177 | 0 |
| 21 | -176.5 | 0 |
| 21 | -176 | 0 |
| 21 | -175.5 | 0 |
| 21 | -175 | 0 |
| 21.5 | -179.5 | 0 |
| 21.5 | -179 | 0 |
| 21.5 | -178.5 | 0 |
| 21.5 | -178 | 0 |
| 21.5 | -177.5 | 0 |
| 21.5 | -177 | 0 |
| 21.5 | -176.5 | 0 |
| 21.5 | -176 | 0 |
| 21.5 | -175.5 | 0 |
| 21.5 | -175 | 0 |
| 22 | -179.5 | 0 |
| 22 | -179 | 0 |
| 22 | -178.5 | 0 |
| 22 | -178 | 0 |
| 22 | -177.5 | 0 |
| 22 | -177 | 0 |
| 22 | -176.5 | 1.586227 |
| 22 | -176 | 0 |
| 22 | -175.5 | 0 |
| 22 | -175 | 0 |
| 23 | -179.5 | 0 |
| 23 | -179 | 0 |
| 23 | -178.5 | 2.718926 |
| 23 | -178 | 2.743782 |
| 23 | -177.5 | 2.74353 |
| 23 | -177 | 0 |
| 23 | -176.5 | 1.64222 |
| 23 | -176 | 1.661705 |
| 23 | -175.5 | 1.720245 |
| 23 | -175 | 1.755074 |
| 23.5 | -179.5 | 0 |
| 23.5 | -179 | 0 |
| 23.5 | -178.5 | 0 |
| 23.5 | -178 | 3.006203 |
| 23.5 | -177.5 | 3.005981 |
| 23.5 | -177 | 0 |
| 23.5 | -176.5 | 0 |
| 23.5 | -176 | 0 |
| 23.5 | -175.5 | 0 |
| 23.5 | -175 | 1.808762 |
+----------+-----------+-------------+
And this is the underlying M code:
let
    Source = Excel.CurrentWorkbook(){[Name="Original"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Lat/Long", type number}, {"-179.5", type any}, {"-179", type number}, {"-178.5", type number}, {"-178", type number}, {"-177.5", type number}, {"-177", type number}, {"-176.5", type number}, {"-176", type number}, {"-175.5", type number}, {"-175", type number}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Lat/Long"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Lat/Long", "Latitude"}, {"Attribute", "Longitude"}, {"Value", "Temperature"}})
in
    #"Renamed Columns"
I hope this is what you are looking for. Please click the check box by this answer to accept it if this solved your problem.

Filter rows based on condition in Pandas

I have a dataframe df_groups that contains sample number, group number and accuracy.
Table 1: Samples with their groups
+----+----------+------------+------------+
| | sample | group | Accuracy |
|----+----------+------------+------------|
| 0 | 0 | 6 | 91.6 |
| 1 | 1 | 4 | 92.9333 |
| 2 | 2 | 2 | 91 |
| 3 | 3 | 2 | 90.0667 |
| 4 | 4 | 4 | 91.8 |
| 5 | 5 | 5 | 92.5667 |
| 6 | 6 | 6 | 91.1 |
| 7 | 7 | 5 | 92.3333 |
| 8 | 8 | 2 | 92.7667 |
| 9 | 9 | 0 | 91.1333 |
| 10 | 10 | 4 | 92.5 |
| 11 | 11 | 5 | 92.4 |
| 12 | 12 | 7 | 93.1333 |
| 13 | 13 | 7 | 93.5333 |
| 14 | 14 | 2 | 92.1 |
| 15 | 15 | 6 | 93.2 |
| 16 | 16 | 8 | 92.7333 |
| 17 | 17 | 8 | 90.8 |
| 18 | 18 | 3 | 91.9 |
| 19 | 19 | 3 | 93.3 |
| 20 | 20 | 5 | 90.6333 |
| 21 | 21 | 9 | 92.9333 |
| 22 | 22 | 4 | 93.3333 |
| 23 | 23 | 9 | 91.5333 |
| 24 | 24 | 9 | 92.9333 |
| 25 | 25 | 1 | 92.3 |
| 26 | 26 | 9 | 92.2333 |
| 27 | 27 | 6 | 91.9333 |
| 28 | 28 | 5 | 92.1 |
| 29 | 29 | 8 | 84.8 |
+----+----------+------------+------------+
I want to return a dataframe containing only the rows whose accuracy is above a threshold (e.g. 92).
So the result will look like this:
Table 2: Samples with their groups when accuracy is above 92.
+----+----------+------------+------------+
| | sample | group | Accuracy |
|----+----------+------------+------------|
| 1 | 1 | 4 | 92.9333 |
| 2 | 5 | 5 | 92.5667 |
| 3 | 7 | 5 | 92.3333 |
| 4 | 8 | 2 | 92.7667 |
| 5 | 10 | 4 | 92.5 |
| 6 | 11 | 5 | 92.4 |
| 7 | 12 | 7 | 93.1333 |
| 8 | 13 | 7 | 93.5333 |
| 9 | 14 | 2 | 92.1 |
| 10 | 15 | 6 | 93.2 |
| 11 | 16 | 8 | 92.7333 |
| 12 | 19 | 3 | 93.3 |
| 13 | 21 | 9 | 92.9333 |
| 14 | 22 | 4 | 93.3333 |
| 15 | 24 | 9 | 92.9333 |
| 16 | 25 | 1 | 92.3 |
| 17 | 26 | 9 | 92.2333 |
| 18 | 28 | 5 | 92.1 |
+----+----------+------------+------------+
So the result should be returned based on the condition of being greater than or equal to a predefined accuracy (e.g. 92, 90, 85, etc.).
You can use df.loc[df['Accuracy'] >= predefined_accuracy].
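A minimal runnable sketch of that approach, using a few rows from the question's table (the threshold value is just an example):
import pandas as pd
# Small sample in the same shape as the question's table
df_groups = pd.DataFrame({
    "sample": [0, 1, 2, 3, 5],
    "group": [6, 4, 2, 2, 5],
    "Accuracy": [91.6, 92.9333, 91.0, 90.0667, 92.5667],
})
predefined_accuracy = 92
# Keep only the rows meeting the threshold
result = df_groups.loc[df_groups["Accuracy"] >= predefined_accuracy]
print(result)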

How to calculate percentage change on this simple data frame?

I have data that looks like this:
+------+---------+------+-------+
| Year | Cluster | AREA | COUNT |
+------+---------+------+-------+
| 2016 | 0 | 10 | 2952 |
| 2016 | 1 | 10 | 2556 |
| 2016 | 2 | 10 | 8867 |
| 2016 | 3 | 10 | 9786 |
| 2017 | 0 | 10 | 2470 |
| 2017 | 1 | 10 | 3729 |
| 2017 | 2 | 10 | 8825 |
| 2017 | 3 | 10 | 9114 |
| 2018 | 0 | 10 | 1313 |
| 2018 | 1 | 10 | 3564 |
| 2018 | 2 | 10 | 7245 |
| 2018 | 3 | 10 | 6990 |
+------+---------+------+-------+
I have to get the percentage changes for each cluster compared to the previous year, e.g.
+------+---------+-----------+-------+----------------+
| Year | Cluster | AREA | COUNT | Percent Change |
+------+---------+-----------+-------+----------------+
| 2016 | 0 | 10 | 2952 | NaN |
| 2017 | 0 | 10 | 2470 | -16.33% |
| 2018 | 0 | 10 | 1313 | -46.84% |
| 2016 | 1 | 10 | 2556 | NaN |
| 2017 | 1 | 10 | 3729 | 45.89% |
| 2018 | 1 | 10 | 3564 | -4.42% |
| 2016 | 2 | 10 | 8867 | NaN |
| 2017 | 2 | 10 | 8825 | -0.47% |
| 2018 | 2 | 10 | 7245 | -17.90% |
| 2016 | 3 | 10 | 9786 | NaN |
| 2017 | 3 | 10 | 9114 | -6.87% |
| 2018 | 3 | 10 | 6990 | -23.30% |
+------+---------+-----------+-------+----------------+
Is there any easy way to do this?
I've tried a few things; the attempt below seemed to make the most sense, but it returns NaN for every pct_change.
df['pct_change'] = df.groupby(['Cluster','Year'])['COUNT '].pct_change()
+------+---------+------+------------+------------+
| Year | Cluster | AREA | Count | pct_change |
+------+---------+------+------------+------------+
| 2016 | 0 | 10 | 295200.00% | NaN |
| 2016 | 1 | 10 | 255600.00% | NaN |
| 2016 | 2 | 10 | 886700.00% | NaN |
| 2016 | 3 | 10 | 978600.00% | NaN |
| 2017 | 0 | 10 | 247000.00% | NaN |
| 2017 | 1 | 10 | 372900.00% | NaN |
| 2017 | 2 | 10 | 882500.00% | NaN |
| 2017 | 3 | 10 | 911400.00% | NaN |
| 2018 | 0 | 10 | 131300.00% | NaN |
| 2018 | 1 | 10 | 356400.00% | NaN |
| 2018 | 2 | 10 | 724500.00% | NaN |
| 2018 | 3 | 10 | 699000.00% | NaN |
+------+---------+------+------------+------------+
Basically, I simply want the function to compute the year-on-year change for each cluster.
df['pct_change'] = df.groupby(['Cluster'])['Count'].pct_change()
df = df.sort_values('Cluster', axis=0, ascending=True)
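A self-contained sketch of this approach, using the column names from the question's first table and only a subset of the rows (the answer above assumes the column is named Count; adjust to whatever your frame actually uses):
import pandas as pd
df = pd.DataFrame({
    "Year": [2016, 2016, 2017, 2017, 2018, 2018],
    "Cluster": [0, 1, 0, 1, 0, 1],
    "AREA": [10] * 6,
    "COUNT": [2952, 2556, 2470, 3729, 1313, 3564],
})
# Rows must be in year order within each cluster before taking pct_change
df = df.sort_values(["Cluster", "Year"])
df["pct_change"] = df.groupby("Cluster")["COUNT"].pct_change()
print(df)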
Another method, going old school with transform:
df['p'] = df.groupby('cluster')['count'].transform(lambda x: (x-x.shift())/x.shift())
df = df.sort_values(by='cluster')
print(df)
year cluster area count p
0 2016 0 10 2952 NaN
4 2017 0 10 2470 -0.163279
8 2018 0 10 1313 -0.468421
1 2016 1 10 2556 NaN
5 2017 1 10 3729 0.458920
9 2018 1 10 3564 -0.044248
2 2016 2 10 8867 NaN
6 2017 2 10 8825 -0.004737
10 2018 2 10 7245 -0.179037
3 2016 3 10 9786 NaN
7 2017 3 10 9114 -0.068670
11 2018 3 10 6990 -0.233048
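The desired output in the question shows the change as percent strings (e.g. -16.33%). Continuing from the transform example above, which stores the fractional change in column p, one optional cosmetic step:
import pandas as pd
# Render the fractional change as e.g. "-16.33%", keeping NaN for each cluster's first year
df["Percent Change"] = df["p"].map(lambda v: f"{v * 100:.2f}%" if pd.notna(v) else v)
print(df)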

Logical indexing in pandas dataframes [duplicate]

This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 3 years ago.
I have some data like this:
+-----------+---------+-------+
| Duration | Outcome | Event |
+-----------+---------+-------+
| 421 | 0 | 1 |
| 421 | 0 | 1 |
| 261 | 0 | 1 |
| 24 | 0 | 1 |
| 27 | 0 | 1 |
| 613 | 0 | 1 |
| 2454 | 0 | 1 |
| 227 | 0 | 1 |
| 2560 | 0 | 1 |
| 229 | 0 | 1 |
| 2242 | 0 | 1 |
| 6680 | 0 | 1 |
| 1172 | 0 | 1 |
| 5656 | 0 | 1 |
| 5082 | 0 | 1 |
| 7239 | 0 | 1 |
| 127 | 0 | 1 |
| 128 | 0 | 1 |
| 128 | 0 | 1 |
| 7569 | 1 | 1 |
| 324 | 0 | 2 |
| 6395 | 0 | 2 |
| 6196 | 0 | 2 |
| 31 | 0 | 2 |
| 228 | 0 | 2 |
| 274 | 0 | 2 |
| 270 | 0 | 2 |
| 275 | 0 | 2 |
| 232 | 0 | 2 |
| 7310 | 0 | 2 |
| 7644 | 1 | 2 |
| 6949 | 0 | 3 |
| 6903 | 1 | 3 |
| 6942 | 0 | 4 |
| 7031 | 1 | 4 |
+-----------+---------+-------+
Now, for each Event, with the Outcome 0/1 considered as Fail/Pass, I want to sum the total Duration of Fail/Pass events separately in 2 new columns (or 1, whatever ensures readability).
I'm new to dataframes and I feel significant logical indexing is involved here. What is the best way to approach this problem?
df.groupby(['Event', 'Outcome'])['Duration'].sum()
So you group by both the event and the outcome, select the Duration column, and then take the sum of each group.
You can also try:
pd.pivot_table(index='Event',
               columns='Outcome',
               values='Duration',
               data=df,
               aggfunc='sum')
which gives you a table with two columns:
+---------+-------+------+
| Outcome | 0 | 1 |
+---------+-------+------+
| Event | | |
+---------+-------+------+
| 1 | 35691 | 7569 |
| 2 | 21535 | 7644 |
| 3 | 6949 | 6903 |
| 4 | 6942 | 7031 |
+---------+-------+------+
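If you want the Fail/Pass readability mentioned in the question, you can rename the pivoted columns afterwards; a small sketch (the fill_value and the Fail/Pass labels are my own additions):
import pandas as pd
# Same pivot as above, with the 0/1 Outcome levels renamed for readability
summary = pd.pivot_table(index='Event',
                         columns='Outcome',
                         values='Duration',
                         data=df,
                         aggfunc='sum',
                         fill_value=0).rename(columns={0: 'Fail', 1: 'Pass'})
print(summary)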

Numpy version of finding the highest and lowest value locations within an interval of another column?

Given the following numpy array, how can I find the locations of the highest and lowest values of column 0 within each interval marked in column 1, using numpy?
import numpy as np
data = np.array([
[1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
[1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
[1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
[1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
[1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
[1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
[1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
[1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
[1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
[1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
[1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
[1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
[1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
[1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
[1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
[1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
[1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])
In the Min/Max column of the table below you can see where the max and min are for each interval.
+-------+----------+-----------+---------+
| index | Value | Intervals | Min/Max |
+-------+----------+-----------+---------+
| 0 | 1879.289 | np.nan | |
| 1 | 1879.281 | np.nan | |
| 2 | 1879.292 | 1 | |
| 3 | 1879.295 | 1 | |
| 4 | 1879.481 | 1 | |
| 5 | 1879.294 | 1 | |
| 6 | 1879.268 | 1 | -1 | min
| 7 | 1879.293 | 1 | |
| 8 | 1879.277 | 1 | |
| 9 | 1879.285 | 1 | |
| 10 | 1879.464 | 1 | |
| 11 | 1879.475 | 1 | |
| 12 | 1879.971 | 1 | |
| 13 | 1879.779 | 1 | |
| 17 | 1879.986 | 1 | |
| 18 | 1880.791 | 1 | 1 | max
| 19 | 1880.29 | 1 | |
| 55 | 1879.253 | np.nan | |
| 56 | 1878.268 | np.nan | |
| 57 | 1875.73 | 1 | -1 |min
| 58 | 1876.792 | 1 | |
| 59 | 1875.977 | 1 | |
| 60 | 1876.408 | 1 | |
| 61 | 1877.159 | 1 | |
| 62 | 1877.187 | 1 | |
| 63 | 1883.164 | 1 | |
| 64 | 1883.171 | 1 | |
| 65 | 1883.495 | 1 | |
| 66 | 1883.962 | 1 | |
| 67 | 1885.158 | 1 | |
| 68 | 1885.974 | 1 | 1 | max
| 69 | 1886.479 | np.nan | |
| 70 | 1885.969 | np.nan | |
| 71 | 1884.693 | 1 | |
| 72 | 1884.977 | 1 | |
| 73 | 1884.967 | 1 | |
| 74 | 1884.691 | 1 | -1 | min
| 75 | 1886.171 | 1 | 1 | max
| 76 | 1886.166 | np.nan | |
| 77 | 1884.476 | np.nan | |
| 78 | 1884.66 | 1 | 1 | max
| 79 | 1882.962 | 1 | |
| 80 | 1881.496 | 1 | |
| 81 | 1871.163 | 1 | -1 | min
| 82 | 1874.985 | 1 | |
| 83 | 1874.979 | 1 | |
| 84 | 1871.173 | np.nan | |
| 85 | 1871.973 | np.nan | |
| 86 | 1871.682 | np.nan | |
| 87 | 1872.476 | np.nan | |
| 88 | 1882.361 | 1 | 1 | max
| 89 | 1880.869 | 1 | |
| 90 | 1882.165 | 1 | |
| 91 | 1881.857 | 1 | |
| 92 | 1880.375 | 1 | |
| 93 | 1880.66 | 1 | |
| 94 | 1880.891 | 1 | |
| 95 | 1880.377 | 1 | |
| 96 | 1881.663 | 1 | |
| 97 | 1881.66 | 1 | |
| 98 | 1877.888 | 1 | |
| 99 | 1875.69 | 1 | |
| 100 | 1875.161 | 1 | -1 | min
| 101 | 1876.697 | np.nan | |
| 102 | 1876.671 | np.nan | |
| 103 | 1879.666 | np.nan | |
| 111 | 1877.182 | np.nan | |
| 112 | 1878.898 | 1 | |
| 113 | 1878.668 | 1 | |
| 114 | 1878.871 | 1 | |
| 115 | 1878.882 | 1 | |
| 116 | 1879.173 | 1 | 1 | max
| 117 | 1878.887 | 1 | |
| 118 | 1878.68 | 1 | |
| 119 | 1878.872 | 1 | |
| 120 | 1878.677 | 1 | |
| 121 | 1877.877 | 1 | |
| 122 | 1877.669 | 1 | |
| 123 | 1877.69 | 1 | |
| 124 | 1877.684 | 1 | |
| 125 | 1877.68 | 1 | |
| 126 | 1877.885 | 1 | |
| 127 | 1877.863 | 1 | |
| 128 | 1877.674 | 1 | |
| 129 | 1877.676 | 1 | |
| 130 | 1877.687 | 1 | |
| 131 | 1878.367 | 1 | |
| 132 | 1878.179 | 1 | |
| 133 | 1877.696 | 1 | |
| 134 | 1877.665 | 1 | -1 | min
| 135 | 1877.667 | np.nan | |
| 136 | 1878.678 | np.nan | |
| 137 | 1878.661 | 1 | 1 | max
| 138 | 1878.171 | 1 | |
| 139 | 1877.371 | 1 | |
| 140 | 1877.359 | 1 | |
| 141 | 1878.381 | 1 | |
| 142 | 1875.185 | 1 | -1 | min
| 143 | 1875.367 | np.nan | |
| 144 | 1865.492 | np.nan | |
| 145 | 1865.495 | 1 | -1 | min
| 146 | 1866.995 | 1 | |
| 147 | 1866.672 | 1 | |
| 148 | 1867.465 | 1 | |
| 149 | 1867.663 | 1 | |
| 150 | 1867.186 | 1 | |
| 151 | 1867.687 | 1 | |
| 152 | 1867.459 | 1 | |
| 153 | 1867.168 | 1 | |
| 154 | 1869.689 | 1 | |
| 155 | 1869.693 | 1 | |
| 156 | 1871.676 | 1 | |
| 157 | 1873.174 | 1 | 1 | max
| 158 | 1873.691 | np.nan | |
| 159 | 1873.685 | np.nan | |
+-------+----------+-----------+---------+
I must specify upfront that this question has already been answered here with a pandas solution. That solution performs reasonably, at about 300 seconds for a table of around 1 million rows. But after some more testing, I see that if the table has over 3 million rows, the execution time increases dramatically to over 2500 seconds. This is obviously too long for such a simple task. How would the same problem be solved with numpy?
Here's one NumPy approach -
# True on rows that belong to an interval (column 1 is not NaN)
mask = ~np.isnan(data[:,1])
# Start and (exclusive) end indices of every run of True values
s0 = np.flatnonzero(mask[1:] > mask[:-1])+1
s1 = np.flatnonzero(mask[1:] < mask[:-1])+1
lens = s1 - s0
# Interval id for each masked row
tags = np.repeat(np.arange(len(lens)), lens)
# Sort the masked values within each interval
idx = np.lexsort((data[mask,0], tags))
starts = np.r_[0,lens.cumsum()]
# NaN rows skipped before each interval, used to map back to original row indices
offsets = np.r_[s0[0], s0[1:] - s1[:-1]]
offsets_cumsum = offsets.cumsum()
# First/last sorted position in each interval gives the min/max location
min_ids = idx[starts[:-1]] + offsets_cumsum
max_ids = idx[starts[1:]-1] + offsets_cumsum
out = np.full(data.shape[0], np.nan)
out[min_ids] = -1
out[max_ids] = 1
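A quick way to sanity-check the output against the table above is to put the markers next to the values (purely illustrative):
# Each row: [value, marker], where marker is -1 at an interval's minimum, 1 at its maximum, NaN elsewhere
check = np.column_stack((data[:, 0], out))
print(check[:20])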
So this is a bit of a cheat, since it uses scipy:
import numpy as np
from scipy import ndimage
# Each NaN row bumps the label; with this data's NaN gaps the interval rows end up with even labels, hence the even index range
markers = np.isnan(data[:, 1])
groups = np.cumsum(markers)
mins, maxs, min_idx, max_idx = ndimage.measurements.extrema(
    data[:, 0], labels=groups, index=range(2, groups.max(), 2))
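The min_idx/max_idx that come back are positions per label; to turn them into the same -1/1 marker array as the pure-NumPy answer above, something like the following should work (a sketch: the reshape is only there to flatten whatever position format scipy returns for 1-D input):
# Build the marker array from the per-label extrema positions
out2 = np.full(data.shape[0], np.nan)
out2[np.asarray(min_idx).reshape(-1)] = -1
out2[np.asarray(max_idx).reshape(-1)] = 1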
