In the terminal, I have pd.options.display.max_rows set to 60. But for a Series that goes over 60 rows, the display is truncated down to show only 10 rows. How do I increase the number of rows shown in the truncated display?
For example, the following (which is within the max_rows setting) shows 60 rows of data:
s = pd.date_range('2019-01-01', '2019-06-01').to_series()
s[:60]
But if I ask for 61 rows, it gets severely truncated:
In [44]: s[:61]
Out[44]:
2019-01-01 2019-01-01
2019-01-02 2019-01-02
2019-01-03 2019-01-03
2019-01-04 2019-01-04
2019-01-05 2019-01-05
...
2019-02-26 2019-02-26
2019-02-27 2019-02-27
2019-02-28 2019-02-28
2019-03-01 2019-03-01
2019-03-02 2019-03-02
Freq: D, Length: 61, dtype: datetime64[ns]
How can I set it so that I see, for example, 20 rows, every time it goes beyond the max_rows limit?
From the docs, you can use pd.options.display.min_rows.
Once display.max_rows is exceeded, the display.min_rows option determines how many rows are shown in the truncated repr.
Example:
>>> pd.set_option('display.max_rows', 59)
>>> pd.set_option('display.min_rows', 20)
>>> s = pd.date_range('2019-01-01', '2019-06-01').to_series()
>>> s[:60]
2019-01-01 2019-01-01
2019-01-02 2019-01-02
2019-01-03 2019-01-03
2019-01-04 2019-01-04
2019-01-05 2019-01-05
2019-01-06 2019-01-06
2019-01-07 2019-01-07
2019-01-08 2019-01-08
2019-01-09 2019-01-09
2019-01-10 2019-01-10
...
2019-02-20 2019-02-20
2019-02-21 2019-02-21
2019-02-22 2019-02-22
2019-02-23 2019-02-23
2019-02-24 2019-02-24
2019-02-25 2019-02-25
2019-02-26 2019-02-26
2019-02-27 2019-02-27
2019-02-28 2019-02-28
2019-03-01 2019-03-01
Freq: D, Length: 60, dtype: datetime64[ns]
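Equivalently, in the attribute style used in the question (display.min_rows is available in pandas 0.25 and later):
pd.options.display.max_rows = 60
pd.options.display.min_rows = 20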
Related
I have the following two dataframes:
print(df_diff)
print(df_census_occupation)
pacients
2019-01-01 00:10:00 1
2019-01-01 00:20:00 1
2019-01-01 00:30:00 -1
2019-01-02 10:00:00 1
2019-01-02 11:30:00 1
2019-01-03 00:00:00 -1
2019-01-03 15:00:00 -1
2019-01-03 23:30:00 -1
2019-01-04 00:00:00 1
2019-01-04 00:00:00 1
2019-01-04 10:00:00 -1
2019-01-04 10:00:00 -1
pacients_census
2019-01-01 10
2019-01-02 20
2019-01-03 30
2019-01-04 10
And I need to transform them into:
pacients
2019-01-01 00:00:00 10
2019-01-01 00:10:00 11
2019-01-01 00:20:00 12
2019-01-01 00:30:00 11
2019-01-02 00:00:00 20
2019-01-02 10:00:00 21
2019-01-02 11:30:00 22
2019-01-03 00:00:00 30
2019-01-03 00:00:00 29
2019-01-03 15:00:00 28
2019-01-03 23:30:00 27
2019-01-04 00:00:00 10
2019-01-04 00:00:00 11
2019-01-04 00:00:00 12
2019-01-04 10:00:00 11
2019-01-04 10:00:00 10
It's like a cumsum by day, where each day starts over again based on another dataframe (df_census_occupation). Care must be taken with repeated timestamps: there may be days where df_diff has exactly the same hour more than once, and such hours may also coincide with the start of the day in df_census_occupation. This is what happens at 2019-01-04 00:00:00, for example.
I tried using cumsum with masks and shifts, and also some groupby operations, but the code was becoming difficult to understand and it was not considering the repeated hours issue.
Auxiliary code to generate the two dataframes:
import datetime
import pandas as pd
df_diff_index = [
"2019-01-01 00:10:00",
"2019-01-01 00:20:00",
"2019-01-01 00:30:00",
"2019-01-02 10:00:00",
"2019-01-02 11:30:00",
"2019-01-03 00:00:00",
"2019-01-03 15:00:00",
"2019-01-03 23:30:00",
"2019-01-04 00:00:00",
"2019-01-04 00:00:00",
"2019-01-04 10:00:00",
"2019-01-04 10:00:00",
]
df_diff_index = [datetime.datetime.strptime(date, "%Y-%m-%d %H:%M:%S") for date in df_diff_index]
df_census_occupation_index = [
"2019-01-01",
"2019-01-02",
"2019-01-03",
"2019-01-04",
]
df_census_occupation_index = [datetime.datetime.strptime(date, "%Y-%m-%d") for date in df_census_occupation_index]
df_diff = pd.DataFrame({"pacients": [1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, -1]}, index=df_diff_index)
df_census_occupation = pd.DataFrame({"pacients_census": [10, 20, 30, 10]}, index=df_census_occupation_index)
Concatenate the two DataFrames, sort by index, then group by day and cumsum:
out = pd.concat([df_census_occupation.rename(columns={'pacients_census':'pacients'}), df_diff]
).sort_index().groupby(pd.Grouper(freq='D')).cumsum()
Output:
pacients
2019-01-01 00:00:00 10
2019-01-01 00:10:00 11
2019-01-01 00:20:00 12
2019-01-01 00:30:00 11
2019-01-02 00:00:00 20
2019-01-02 10:00:00 21
2019-01-02 11:30:00 22
2019-01-03 00:00:00 30
2019-01-03 00:00:00 29
2019-01-03 15:00:00 28
2019-01-03 23:30:00 27
2019-01-04 00:00:00 10
2019-01-04 00:00:00 11
2019-01-04 00:00:00 12
2019-01-04 10:00:00 11
2019-01-04 10:00:00 10
Note that you may want to pass kind='mergesort' to sort_index so the sort is stable, i.e. the census row goes before the diff data when timestamps coincide.
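A minimal sketch of that variant, reusing the same frames as above (the stable mergesort preserves the concat order on equal timestamps, so the census row stays first):
out = (pd.concat([df_census_occupation.rename(columns={'pacients_census': 'pacients'}), df_diff])
         .sort_index(kind='mergesort')  # stable sort: census rows stay ahead of diff rows on ties
         .groupby(pd.Grouper(freq='D'))
         .cumsum())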
So I have a series of dates and I want to split it into chunks based on continuity. The Series looks like the following:
2019-01-01 36.581647
2019-01-02 35.988585
2019-01-03 35.781111
2019-01-04 35.126273
2019-01-05 34.401451
2019-01-06 34.351714
2019-01-07 34.175517
2019-01-08 33.622116
2019-01-09 32.861861
2019-01-10 32.915251
2019-01-11 32.866832
2019-01-12 32.214259
2019-01-13 31.707626
2019-01-14 32.556175
2019-01-15 32.674965
2019-01-16 32.391766
2019-01-17 32.463836
2019-01-18 32.151290
2019-01-19 31.952946
2019-01-20 31.739855
2019-01-21 31.355354
2019-01-22 31.271243
2019-01-23 31.273255
2019-01-24 31.442803
2019-01-25 32.034161
2019-01-26 31.455956
2019-01-27 31.408881
2019-01-28 31.066477
2019-01-29 30.489070
2019-01-30 30.356210
2019-01-31 30.470496
2019-02-01 29.949312
2019-02-02 29.916971
2019-02-03 29.865447
2019-02-04 29.512595
2019-02-05 29.297967
2019-02-06 28.743329
2019-02-07 28.509800
2019-02-08 27.681294
2019-02-10 26.441899
2019-02-11 26.787360
2019-02-12 27.368621
2019-02-13 27.085167
2019-02-14 26.856398
2019-02-15 26.793370
2019-02-16 26.334788
2019-02-17 25.906381
2019-02-18 25.367705
2019-02-19 24.939880
2019-02-20 25.021575
2019-02-21 25.006527
2019-02-22 24.984512
2019-02-23 24.372664
2019-02-24 24.183728
2019-10-10 23.970567
2019-10-11 24.755944
2019-10-12 25.155136
2019-10-13 25.273033
2019-10-14 25.490775
2019-10-15 25.864637
2019-10-16 26.168158
2019-10-17 26.600422
2019-10-18 26.959990
2019-10-19 26.965104
2019-10-20 27.128877
2019-10-21 26.908657
2019-10-22 26.979930
2019-10-23 26.816817
2019-10-24 27.058753
2019-10-25 27.453882
2019-10-26 27.358057
2019-10-27 27.374445
2019-10-28 27.418648
2019-10-29 27.458521
2019-10-30 27.859687
2019-10-31 28.093942
2019-11-01 28.494706
2019-11-02 28.517255
2019-11-03 28.492476
2019-11-04 28.723757
2019-11-05 28.835151
2019-11-06 29.367227
2019-11-07 29.920598
2019-11-08 29.746370
2019-11-09 29.498023
2019-11-10 29.745044
2019-11-11 30.935084
2019-11-12 31.710737
2019-11-13 32.890792
2019-11-14 33.011911
2019-11-15 33.121803
2019-11-16 32.805403
2019-11-17 32.887447
2019-11-18 33.350492
2019-11-19 33.525344
2019-11-20 33.791458
2019-11-21 33.674697
2019-11-22 33.642584
2019-11-23 33.704386
2019-11-24 33.472346
2019-11-25 33.317035
2019-11-26 32.934307
2019-11-27 33.573193
2019-11-28 32.840514
2019-11-29 33.085686
2019-11-30 33.138131
2019-12-01 33.344264
2019-12-02 33.524948
2019-12-03 33.694687
2019-12-04 33.836534
2019-12-05 34.343416
2019-12-06 34.321793
2019-12-07 34.156796
2019-12-08 34.399591
2019-12-09 34.931185
2019-12-10 35.294034
2019-12-11 35.021331
2019-12-12 34.292483
2019-12-13 34.330898
2019-12-14 34.354278
2019-12-15 34.436500
2019-12-16 34.869841
2019-12-17 34.932567
2019-12-18 34.855816
2019-12-19 35.226241
2019-12-20 35.184222
2019-12-21 35.456716
2019-12-22 35.730350
2019-12-23 35.739911
2019-12-24 35.800030
2019-12-25 35.896615
2019-12-26 35.871280
2019-12-27 35.509646
2019-12-28 35.235416
2019-12-29 34.848605
2019-12-30 34.926700
2019-12-31 34.787211
And I want to split it like:
chunk,start,end,value
0,2019-01-01,2019-02-24,35.235416
1,2019-10-10,2019-12-31,34.787211
The values are random and could come from any aggregation function; I don't care about that. The important thing is the chunks I get, but I still cannot find a way to do it.
I assume that your DataFrame:
has columns named Date and Amount,
Date column is of datetime type (not string).
To generate your result, define the following function, to be applied
to each group of rows:
def grpRes(grp):
    return pd.Series([grp.Date.min(), grp.Date.max(), grp.Amount.mean()],
                     index=['start', 'end', 'value'])
Then apply it to each group and rename the index:
res = df.groupby(df.Date.diff().dt.days.fillna(1, downcast='infer')
                 .gt(1).cumsum()).apply(grpRes)
res.index.name = 'chunk'
I noticed that your data sample has no row for 2019-02-09, but you don't treat such a single missing day as a violation of the "continuity rule".
If you really want that behaviour (a one-day gap still counts as continuous), change gt(1) to e.g. gt(2).
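If your data is actually the Series shown in the question rather than a DataFrame, a minimal sketch of the conversion (the column names Date and Amount are just the assumptions stated above):
df = s.rename('Amount').rename_axis('Date').reset_index()  # columns: Date, Amount
df['Date'] = pd.to_datetime(df['Date'])  # in case the index held string dates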
One way is boolean indexing, which assumes your data is already sorted. I also assumed your columns were named ['Date', 'Val'].
# reset the index so you have a DataFrame
data = s.reset_index()
# rows where the next date is not exactly 1 day later (including the last row) -> chunk ends
end = data[((data['Date'] - data['Date'].shift(-1)).dt.days.abs() != 1)].reset_index(drop=True).rename(columns={'Date':'End', 'Val': 'End_val'})
# rows where the previous date is not exactly 1 day earlier (including the first row) -> chunk starts
start = data[(data['Date'] - data['Date'].shift()).dt.days != 1].reset_index(drop=True).rename(columns={'Date':'Start', 'Val':'Start_val'})
# concat your data
pd.concat([start,end], axis=1)
Start Start_val End End_val
0 2019-01-01 36.581647 2019-02-08 27.681294
1 2019-02-10 26.441899 2019-02-24 24.183728
2 2019-10-10 23.970567 2019-12-31 34.787211
I have a dataframe that looks roughly like:
01/01/19 02/01/19 03/01/19 04/01/19
hour
1.0 27.08 47.73 54.24 10.0
2.0 26.06 49.53 46.09 22.0
...
24.0 12.0 34.0 22.0 40.0
I'd like to reduce its dimension to a single column with a proper date index concatenating all the columns. Is there a smart pandas way to do it?
Expected result... something like:
01/01/19 00:00:00 27.08
01/01/19 01:00:00 26.08
...
01/01/19 23:00:00 12.00
02/01/19 00:00:00 47.73
02/01/19 01:00:00 49.53
...
02/01/19 23:00:00 34.00
...
You can stack and then fix the index using pd.to_datetime and pd.to_timedelta:
u = df.stack()
u.index = (pd.to_datetime(u.index.get_level_values(1), dayfirst=True)
+ pd.to_timedelta(u.index.get_level_values(0) - 1, unit='h'))
u.sort_index()
2019-01-01 00:00:00 27.08
2019-01-01 01:00:00 26.06
2019-01-01 23:00:00 12.00
2019-01-02 00:00:00 47.73
2019-01-02 01:00:00 49.53
2019-01-02 23:00:00 34.00
2019-01-03 00:00:00 54.24
2019-01-03 01:00:00 46.09
2019-01-03 23:00:00 22.00
2019-01-04 00:00:00 10.00
2019-01-04 01:00:00 22.00
2019-01-04 23:00:00 40.00
dtype: float64
Let's say I have the below DataFrame. How would I get an extra column 'flag' with 1s where a day has an Age bigger than 90, and only if it happens on 2 consecutive days (48h in this case)? The output should contain 1s on 2 or more days, depending on how many days the condition is met. The dataset is much bigger, but I put here just a small portion so you get the idea.
Age
Dates
2019-01-01 00:00:00 29
2019-01-01 01:00:00 56
2019-01-01 02:00:00 82
2019-01-01 03:00:00 13
2019-01-01 04:00:00 35
2019-01-01 05:00:00 53
2019-01-01 06:00:00 25
2019-01-01 07:00:00 23
2019-01-01 08:00:00 21
2019-01-01 09:00:00 12
2019-01-01 10:00:00 15
2019-01-01 11:00:00 9
2019-01-01 12:00:00 13
2019-01-01 13:00:00 87
2019-01-01 14:00:00 9
2019-01-01 15:00:00 63
2019-01-01 16:00:00 62
2019-01-01 17:00:00 52
2019-01-01 18:00:00 43
2019-01-01 19:00:00 77
2019-01-01 20:00:00 95
2019-01-01 21:00:00 79
2019-01-01 22:00:00 77
2019-01-01 23:00:00 5
2019-01-02 00:00:00 78
2019-01-02 01:00:00 41
2019-01-02 02:00:00 10
2019-01-02 03:00:00 10
2019-01-02 04:00:00 88
2019-01-02 05:00:00 19
This would be the desired output:
Dates Age flag
0 2019-01-01 00:00:00 29 1
1 2019-01-01 01:00:00 56 1
2 2019-01-01 02:00:00 82 1
3 2019-01-01 03:00:00 13 1
4 2019-01-01 04:00:00 35 1
5 2019-01-01 05:00:00 53 1
6 2019-01-01 06:00:00 25 1
7 2019-01-01 07:00:00 23 1
8 2019-01-01 08:00:00 21 1
9 2019-01-01 09:00:00 12 1
10 2019-01-01 10:00:00 15 1
11 2019-01-01 11:00:00 9 1
12 2019-01-01 12:00:00 13 1
13 2019-01-01 13:00:00 87 1
14 2019-01-01 14:00:00 9 1
15 2019-01-01 15:00:00 63 1
16 2019-01-01 16:00:00 62 1
17 2019-01-01 17:00:00 52 1
18 2019-01-01 18:00:00 43 1
19 2019-01-01 19:00:00 77 1
20 2019-01-01 20:00:00 95 1
21 2019-01-01 21:00:00 79 1
22 2019-01-01 22:00:00 77 1
23 2019-01-01 23:00:00 5 1
24 2019-01-02 00:00:00 78 0
25 2019-01-02 01:00:00 41 0
26 2019-01-02 02:00:00 10 0
27 2019-01-02 03:00:00 10 0
28 2019-01-02 04:00:00 88 0
29 2019-01-02 05:00:00 19 0
The dates are the index of the dataframe and are incremented by 1h.
Thanks
You can first compare the column with Series.gt, then group by DatetimeIndex.date and check whether there is at least one True per group with GroupBy.transform and GroupBy.any, and last cast the mask to integers to map True/False to 1/0; then combine it with the previous answer:
df = pd.DataFrame({'Age': 10}, index=pd.date_range('2019-01-01', freq='5H', periods=24))
#for test 1H timestamp use
#df = pd.DataFrame({'Age': 10}, index=pd.date_range('2019-01-01', freq='H', periods=24 * 5))
df.loc[pd.Timestamp('2019-01-02 01:00:00'), 'Age'] = 95
df.loc[pd.Timestamp('2019-01-03 02:00:00'), 'Age'] = 95
df.loc[pd.Timestamp('2019-01-05 19:00:00'), 'Age'] = 95
#print (df)
#for test 48 consecutive values change N = 48
N = 10
s = df['Age'].gt(90)
s1 = (s.groupby(df.index.date).transform('any'))
g1 = s1.ne(s1.shift()).cumsum()
df['flag'] = (s.groupby(g1).transform('size').ge(N) & s1).astype(int)
print (df)
Age flag
2019-01-01 00:00:00 10 0
2019-01-01 05:00:00 10 0
2019-01-01 10:00:00 10 0
2019-01-01 15:00:00 10 0
2019-01-01 20:00:00 10 0
2019-01-02 01:00:00 95 1
2019-01-02 06:00:00 10 1
2019-01-02 11:00:00 10 1
2019-01-02 16:00:00 10 1
2019-01-02 21:00:00 10 1
2019-01-03 02:00:00 95 1
2019-01-03 07:00:00 10 1
2019-01-03 12:00:00 10 1
2019-01-03 17:00:00 10 1
2019-01-03 22:00:00 10 1
2019-01-04 03:00:00 10 0
2019-01-04 08:00:00 10 0
2019-01-04 13:00:00 10 0
2019-01-04 18:00:00 10 0
2019-01-04 23:00:00 10 0
2019-01-05 04:00:00 10 0
2019-01-05 09:00:00 10 0
2019-01-05 14:00:00 10 0
2019-01-05 19:00:00 95 0
Apparently, this could be a solution to the first version of the question: how to add a column whose row values are 1 if at least one of the rows with the same date (y-m-d) has an Age value greater than 90.
import pandas as pd
df = pd.DataFrame({
    'Dates': ['2019-01-01 00:00:00',
              '2019-01-01 01:00:00',
              '2019-01-01 02:00:00',
              '2019-01-02 00:00:00',
              '2019-01-02 01:00:00',
              '2019-01-03 02:00:00',
              '2019-01-03 03:00:00'],
    'Age': [29, 56, 92, 13, 1, 2, 93],
})
df.set_index('Dates', inplace=True)
df.index = pd.to_datetime(df.index)
df['flag'] = pd.DatetimeIndex(df.index).day  # day of month as a per-day group key (works here because all dates fall in one month)
df['flag'] = df.flag.isin(df['flag'][df['Age']>90]).astype(int)  # 1 if any row on that day has Age > 90
It returns:
Age flag
Dates
2019-01-01 00:00:00 29 1
2019-01-01 01:00:00 56 1
2019-01-01 02:00:00 92 1
2019-01-02 00:00:00 13 0
2019-01-02 01:00:00 1 0
2019-01-03 02:00:00 2 1
2019-01-03 03:00:00 93 1
I am trying to set up a function with two different dictionaries.
datetime demand
0 2016-01-01 00:00:00 50.038
1 2016-01-01 00:00:10 50.021
2 2016-01-01 00:00:20 50.013
datetime dap
2016-01-01 00:00:00+01:00 23.86
2016-01-01 01:00:00+01:00 22.39
2016-01-01 02:00:00+01:00 20.59
As you can see, the dates line up; however, the time step (deltaT) is different.
The function I have set up is as follows:
for key, value in dap.items():
    a = demand * value
    print(a)
How do I make sure that in this function the dap value 23.86 is used for the datetime interval 2016-01-01 00:00:00 until 2016-01-01 01:00:00? This would mean that from the first dictionary indexed values 1-6 should be applied in the equation for 2016-01-01 00:00:00+01:00 23.86, and indexed values 7-12 are used for dap value 22.39 and so on?
datetime demand
0 2019-01-01 00:00:00 50.038
1 2019-01-01 00:00:10 50.021
2 2019-01-01 00:00:20 50.013
3 2019-01-01 00:00:30 50.004
4 2019-01-01 00:00:40 50.004
5 2019-01-01 00:00:50 50.009
6 2019-01-01 00:01:00 50.012
7 2019-01-01 00:01:10 49.998
8 2019-01-01 00:01:20 49.983
9 2019-01-01 00:01:30 49.979
10 2019-01-01 00:01:40 49.983
11 2019-01-01 00:01:50 49.983
12 2019-01-01 00:02:00 49.983
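There is no answer in the thread yet; one possible sketch, assuming both tables are actually pandas DataFrames indexed by (tz-naive) datetimes rather than plain dictionaries, is to broadcast each hourly dap price onto the 10-second demand rows with pd.merge_asof and then multiply:
import pandas as pd

# hypothetical frames mirroring the question's layout (timezone handling ignored for simplicity)
demand = pd.DataFrame({'demand': [50.038, 50.021, 50.013]},
                      index=pd.to_datetime(['2016-01-01 00:00:00',
                                            '2016-01-01 00:00:10',
                                            '2016-01-01 00:00:20']))
dap = pd.DataFrame({'dap': [23.86, 22.39, 20.59]},
                   index=pd.date_range('2016-01-01 00:00', periods=3, freq='H'))

# align each demand row with the most recent hourly price (both inputs must be sorted on the key)
merged = pd.merge_asof(demand.rename_axis('datetime').reset_index(),
                       dap.rename_axis('datetime').reset_index(),
                       on='datetime', direction='backward')
merged['a'] = merged['demand'] * merged['dap']  # 10-second demand times the hourly price in force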