Compute the running (cumulative) maximum for a series in pandas - python

Given:
import pandas

d = {'High': [954, 953, 952, 955, 956, 952, 951, 950]}
df = pandas.DataFrame(d)
I want to add another column which is the max at each index from the beginning. For example, the desired column would be:
'Max': [954, 954, 954, 955, 956, 956, 956, 956]
I tried with a pandas rolling function, but it seems the window cannot be dynamic.

Use cummax:
df.High.cummax()
0 954
1 954
2 954
3 955
4 956
5 956
6 956
7 956
Name: High, dtype: int64
df['Max'] = df.High.cummax()
df
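df now carries both columns (output follows directly from the data above):
   High  Max
0   954  954
1   953  954
2   952  954
3   955  955
4   956  956
5   952  956
6   951  956
7   950  956
As an aside on the rolling attempt: the "dynamic window" you were after is an expanding window, so df.High.expanding().max() gives the same running maximum (returned as float64).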

Related

How to remove unwanted lines in Azure (Python)

When my pipeline runs in Azure, the coverage report includes lines like these:
/usr/local/lib/python3.8/dist-packages/attr/__init__.py 27 0 100%
/usr/local/lib/python3.8/dist-packages/attr/_cmp.py 55 45 18% 51-100, 108-114, 122-137, 144-147, 154
/usr/local/lib/python3.8/dist-packages/attr/_compat.py 96 48 50% 22-24, 28-107, 123, 132, 153-156, 175, 191-212, 234, 241-242
/usr/local/lib/python3.8/dist-packages/attr/_config.py 9 4 56% 19-22, 33
/usr/local/lib/python3.8/dist-packages/attr/_funcs.py 96 84 12% 54-116, 130-189, 225-289, 301, 323-341, 360-370, 409-422
/usr/local/lib/python3.8/dist-packages/attr/_make.py 977 346 65% 84, 87, 90, 115-116, 121, 274, 280, 285, 293, 296, 299, 351-352, 413, 431, 450, 457-481, 501-507, 529-532, 556, 581, 590-591, 602, 611, 623-634, 642, 649, 734-754, 763, 792-796, 807-810, 838-839, 847, 881, 914-915, 918, 929-939, 954, 962-971, 1011, 1064, 1069-1090, 1098-1099, 1105-1106, 1112-1113, 1130, 1134, 1145, 1156, 1163, 1170-1171, 1186, 1212-1216, 1501, 1509, 1514, 1523, 1552, 1571, 1576, 1583, 1596, 1610, 1620, 1641-1646, 1690-1698, 1722-1732, 1758-1762, 1788-1799, 1829, 1840-1843, 1849-1852, 1858-1861, 1867-1870, 1928, 1954-2015, 2047-2054, 2075-2082, 2093-2099, 2103, 2131, 2138, 2144-2147, 2149, 2200, 2213, 2224, 2235-2287, 2313, 2336, 2344, 2380, 2388-2396, 2407-2418, 2428, 2447, 2454-2469, 2488, 2544-2553, 2558-2560, 2564-2569, 2694, 2702, 2732-2734, 2748-2752, 2759, 2768, 2771-2776, 2925-2929, 2941-2946, 2981, 2987-2988, 3035-3079, 3095-3096, 3109-3117, 3135-3173
/usr/local/lib/python3.8/dist-packages/attr/_next_gen.py 37 24 35% 82-147, 175, 198, 214
/usr/local/lib/python3.8/dist-packages/attr/_version_info.py 37 17 54% 60-69, 72-77, 80-87
/usr/local/lib/python3.8/dist-packages/attr/converters.py 58 47 19% 40-62, 83-114, 143-155
/usr/local/lib/python3.8/dist-packages/attr/exceptions.py 18 4 78% 89-91, 94
/usr/local/lib/python3.8/dist-packages/attr/filters.py 16 9 44% 17, 32-37, 49-54
/usr/local/lib/python3.8/dist-packages/attr/setters.py 28 16 43% 21-26, 37, 46-55, 65-69
/usr/local/lib/python3.8/dist-packages/yaml/resolver.py 135 97 28% 22-23, 30, 33, 51-89, 92-112, 115-118, 122-141, 144-165
/usr/local/lib/python3.8/dist-packages/yaml/scanner.py 753 672 11% 39-44, 60-109, 115-123, 128-133, 137-141, 146-154, 159-258, 272-277, 286-293, 301-310, 314-321, 340-347, 351-355, 364-367, 374-388, 393-400, 403, 406, 411-422, 425, 428, 433-445, 448, 451, 456-468, 473-482, 487-515, 520-543, 548-599, 604-610, 615-621, 626-632, 635, 638, 643-649, 652, 655, 660-666, 671-679, 687-688, 693-696, 701-704, 709, 714-719, 724-729, 745-746, 772-785, 789-804, 808-825, 829-842, 846-855, 859-865, 869-874, 878-883, 887-897, 908-933, 937-974, 979-1049, 1054-1090, 1094-1104, 1108-1119, 1123-1132, 1141-1155, 1187-1226, 1230-1250, 1254-1268, 1276-1309, 1315-1346, 1352-1370, 1375-1395, 1399-1414, 1425-1435
/usr/local/lib/python3.8/dist-packages/yaml/serializer.py 85 70 18% 17-25, 28-34, 37-41, 47-58, 61-72, 75-76, 79-110
/usr/local/lib/python3.8/dist-packages/yaml/tokens.py
These lines are checking other packages' code, not my own repo. How can I remove all these unwanted lines while running the pipeline in Azure? Please provide a solution.
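A sketch of one possible fix, assuming the report is produced by coverage.py (the paths in the output suggest it is): tell coverage to omit third-party paths, e.g. in a .coveragerc next to the code:
# .coveragerc -- exclude installed packages from the report
# (the path below is taken from the output above)
[report]
omit =
    /usr/local/lib/python3.8/dist-packages/*
The same pattern can also be passed on the command line via coverage report --omit.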

How can I split my data into chunks of 7 days each

I want to group every 7 days together. The problem is that the first date is on a Wednesday, and I want my weeks to start on Monday and end on Sunday without dropping any data. Even the last date in my data is a Monday. This is how my data looks now:
date bike_numbers
0 2017-06-28 632
1 2017-06-29 1019
2 2017-06-30 1038
3 2017-07-01 475
4 2017-07-02 523
... ... ...
550 2018-12-30 2653
551 2018-12-31 3044
I want it to show only the bike rides, with each row an array of 7 values. I want it to look like this:
[632, 1019, 1038, 475, 523, 600, 558][1103, 1277, 1126, 956, 433, 1347, 1506]... and so on till the last date
Use:
s = df.groupby(df.index // 7)['bike_numbers'].agg(list)
print (s)
0 [632, 1019, 1038, 475, 523]
78 [2653, 3044]
Name: bike_numbers, dtype: object
print (s.tolist())
[[632, 1019, 1038, 475, 523], [2653, 3044]]
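Note that df.index // 7 forms blind groups of 7 consecutive rows starting from the first row (a Wednesday). If the Monday-to-Sunday alignment from the question matters, a sketch using pd.Grouper, assuming the date column is (or can be parsed as) datetime:
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# 'W-SUN' buckets rows into weeks ending on Sunday, i.e. Monday-to-Sunday weeks;
# the partial first and last weeks simply come out as shorter lists
s = df.groupby(pd.Grouper(key='date', freq='W-SUN'))['bike_numbers'].agg(list)
print(s.tolist())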

How to make dataframe from list of list

I have searched but am still not able to figure out how to make a DataFrame from the below:
0 ([179, 142, 176, 177, 176, 180, 180, 180, 180,...
1 ([353, 314, 349, 349, 344, 359, 359, 359, 359,...
2 ([535, 504, 535, 535, 535, 540, 540, 540, 540,...
3 ([711, 664, 703, 703, 703, 721, 721, 721, 721,...
4 ([850, 810, 822, 822, 842, 857, 857, 857, 857,...
Below is how a single item of the data looks:
([179, 142, 176],
['Qtr- Oct-20','Qtr- Oct-20','Qtr- Oct-20',],
['High','Low','Close'],
[43.8, 26.05,33.1])
What I want is:
     0            1      2      3
0  179  Qtr- Oct-20   High   43.8
1  142  Qtr- Oct-20    Low  26.05
2  176  Qtr- Oct-20  Close   33.1
What I am getting:
0
0 ([179, 142, 176, 177, 176, 180, 180, 180, 180,...
1 ([353, 314, 349, 349, 344, 359, 359, 359, 359,...
2 ([535, 504, 535, 535, 535, 540, 540, 540, 540,...
Let's do apply + pd.Series.explode:
pd.DataFrame(df['col'].tolist()).apply(pd.Series.explode).reset_index(drop=True)
0 1 2 3
0 179 Qtr- Oct-20 High 43.8
1 142 Qtr- Oct-20 Low 26.05
2 176 Qtr- Oct-20 Close 33.1
Note: df['col'] is the column in the DataFrame which contains the tuples of lists.
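For reference, a self-contained reproduction of the pattern above, assuming each cell of df['col'] holds a tuple of equal-length lists as in the sample:
import pandas as pd

# one row whose single cell is a tuple of four parallel lists
df = pd.DataFrame({'col': [([179, 142, 176],
                            ['Qtr- Oct-20', 'Qtr- Oct-20', 'Qtr- Oct-20'],
                            ['High', 'Low', 'Close'],
                            [43.8, 26.05, 33.1])]})

# each tuple becomes a row of list-valued cells, then every column is exploded
out = pd.DataFrame(df['col'].tolist()).apply(pd.Series.explode).reset_index(drop=True)
print(out)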

Appending two dataframes - AttributeError: 'NoneType' object has no attribute 'is_extension'

I have 2 dataframes (df1 and df2) which look like:
df1
Quarter Body Total requests Requests Processed … Requests on-hold
Q3 2019 A 93 92 … 0
Q3 2019 B 228 210 … 0
Q3 2019 C 180 178 … 0
Q3 2019 D 31 31 … 0
Q3 2019 E 555 483 … 0
df2
Quarter Body Total requests Requests Processed … Requests on-hold
Q2 2019 A 50 50 … 0
Q2 2019 B 191 177 … 0
Q2 2019 C 186 185 … 0
Q2 2019 D 35 35 … 0
Q2 2019 E 344 297 … 0
I am trying to append df2 onto df1 to create df3:
df3
Quarter Body Total requests Requests Processed … Requests on-hold
Q3 2019 A 93 92 … 0
Q3 2019 B 228 210 … 0
Q3 2019 C 180 178 … 0
Q3 2019 D 31 31 … 0
Q3 2019 E 555 483 … 0
Q2 2019 A 50 50 … 0
Q2 2019 B 191 177 … 0
Q2 2019 C 186 185 … 0
Q2 2019 D 35 35 … 0
Q2 2019 E 344 297 … 0
using:
df3 = df1.append(df2)
but get the error:
AttributeError: 'NoneType' object has no attribute 'is_extension'
the full error trace is:
File "<ipython-input-405-e3e0e047dbc0>", line 1, in <module>
runfile('C:/2019_Q3/Code.py', wdir='C:/2019_Q3')
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/2019_Q3/Code.py", line 420, in <module>
main()
File "C:/2019_Q3/Code.py", line 319, in main
df3= df1.append(df2, ignore_index=True)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\frame.py", line 6692, in append
sort=sort)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\reshape\concat.py", line 229, in concat
return op.get_result()
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\reshape\concat.py", line 426, in get_result
copy=self.copy)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\internals\managers.py", line 2056, in concatenate_block_managers
elif is_uniform_join_units(join_units):
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\internals\concat.py", line 379, in is_uniform_join_units
all(not ju.is_na or ju.block.is_extension for ju in join_units) and
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\internals\concat.py", line 379, in <genexpr>
all(not ju.is_na or ju.block.is_extension for ju in join_units) and
AttributeError: 'NoneType' object has no attribute 'is_extension'
using:
df3 = pd.concat([df1, df2], ignore_index=True)
gives me an error:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
the full error trace is:
Traceback (most recent call last):
File "<ipython-input-406-e3e0e047dbc0>", line 1, in <module>
runfile('C:/2019_Q3/Code.py', wdir='C:/2019_Q3')
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/2019_Q3/Code.py", line 421, in <module>
main()
File "C:/2019_Q3/Code.py", line 321, in main
finalCSV = pd.concat([PreviousCSVdf, df], ignore_index=True)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\reshape\concat.py", line 228, in concat
copy=copy, sort=sort)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\reshape\concat.py", line 381, in __init__
self.new_axes = self._get_new_axes()
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\reshape\concat.py", line 448, in _get_new_axes
new_axes[i] = self._get_comb_axis(i)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\reshape\concat.py", line 469, in _get_comb_axis
sort=self.sort)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\indexes\api.py", line 70, in _get_objs_combined_axis
return _get_combined_index(obs_idxes, intersect=intersect, sort=sort)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\indexes\api.py", line 117, in _get_combined_index
index = _union_indexes(indexes, sort=sort)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\indexes\api.py", line 183, in _union_indexes
result = result.union(other)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\indexes\base.py", line 2332, in union
indexer = self.get_indexer(other)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\indexes\base.py", line 2740, in get_indexer
raise InvalidIndexError('Reindexing only valid with uniquely'
Both df1 and df2 have identical numbers of columns and column names. How would I append df1 and df2?
This tends to happen when you have duplicate columns in one or both of the datasets.
Also, for general use it's easier to go with pd.concat:
pd.concat([df1, df2], ignore_index=True)  # ignore_index resets the index for you
And for the InvalidIndexError you can drop rows with duplicated index labels:
df1 = df1.loc[~df1.index.duplicated(keep='first')]
df2 = df2.loc[~df2.index.duplicated(keep='first')]
I'll make this short and sweet: I had this same issue.
It is not caused by duplicate column names alone, but by duplicate column names with different data types.
Swapping to pd.concat will not fix the issue if you don't address the data types first.
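A quick way to surface the duplicated column labels described above, and to drop them before concatenating (a minimal sketch, assuming df1 and df2 as in the question):
import pandas as pd

# list any repeated column labels, and the dtypes, to spot mismatches
print(df1.columns[df1.columns.duplicated()])
print(df2.columns[df2.columns.duplicated()])
print(df1.dtypes, df2.dtypes, sep='\n')

# keep only the first occurrence of each label, then concatenate
df1 = df1.loc[:, ~df1.columns.duplicated()]
df2 = df2.loc[:, ~df2.columns.duplicated()]
df3 = pd.concat([df1, df2], ignore_index=True)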

Pandas dataframe has column title, yet doesn't find the applied filter

I have a dataframe with column titles printed below:
Index(['Unnamed: 0', 'material', 'step', 'zaid', 'mass(gm)', 'activity(Ci)',
'spec.act(Ci/gm)', 'atomden(a/b-cm)', 'atom_frac', 'mass_frac'],
dtype='object')
If I try to obtain data for only, say, step 16, and I perform the command:
print (df[(16 in df['step'] == 16)])
Things work as expected:
Unnamed: 0 material step zaid mass(gm) activity(Ci) spec.act(Ci/gm) atomden(a/b-cm) atom_frac mass_frac
447 447 1 16 90232 2.034000e-09 2.231000e-16 1.097000e-07 9.311000e-12 2.597000e-10 3.048000e-10
448 448 1 16 92233 2.451000e-08 2.362000e-10 9.636000e-03 1.117000e-10 3.116000e-09 3.672000e-09
449 449 1 16 92234 4.525000e-05 2.813000e-07 6.217000e-03 2.053000e-07 5.728000e-06 6.780000e-06
450 450 1 16 92235 1.640000e-01 3.544000e-07 2.161000e-06 7.408000e-04 2.067000e-02 2.457000e-02
451 451 1 16 92236 1.553000e-02 1.004000e-06 6.467000e-05 6.987000e-05 1.949000e-03 2.327000e-03
... ... ... ... ... ... ... ... ... ... ...
37781 37781 10 16 67165 5.941000e-05 0.000000e+00 0.000000e+00 1.195000e-08 3.311000e-07 2.785000e-07
37782 37782 10 16 68166 4.205000e-05 0.000000e+00 0.000000e+00 8.411000e-09 2.330000e-07 1.971000e-07
37783 37783 10 16 68167 1.804000e-05 0.000000e+00 0.000000e+00 3.586000e-09 9.934000e-08 8.457000e-08
37784 37784 10 16 68168 7.046000e-06 0.000000e+00 0.000000e+00 1.393000e-09 3.857000e-08 3.303000e-08
37785 37785 10 16 68170 7.317000e-07 0.000000e+00 0.000000e+00 1.429000e-10 3.958000e-09 3.430000e-09
However, if I now want to grab data for just the zaid 92235 (which clearly exists, as it is displayed in the step-16 results above), using the command:
print (df[(92235 in df['zaid'] == 92235)])
I get the following error:
Traceback (most recent call last):
File "/Users/jack/Library/Python/3.7/lib/python/site-packages/pandas/core/indexes/base.py", line 2890, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: False
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "get_pincell_isos.py", line 57, in <module>
print (df[(92235 in df['zaid'] == 92235)])
File "/Users/jack/Library/Python/3.7/lib/python/site-packages/pandas/core/frame.py", line 2975, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/jack/Library/Python/3.7/lib/python/site-packages/pandas/core/indexes/base.py", line 2892, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: False
It apparently can't find 92235, even though I know it exists (shown above), and the data is stored as int64, the same type as the values in "step". This is illustrated by printing all values from "step" and "zaid".
print (df['step'])
print (df['zaid'])
gives the following results:
0 0
1 0
2 0
3 0
4 0
..
37781 16
37782 16
37783 16
37784 16
37785 16
Name: step, Length: 37786, dtype: int64
0 90230
1 90231
2 90232
3 90233
4 90234
...
37781 67165
37782 68166
37783 68167
37784 68168
37785 68170
Name: zaid, Length: 37786, dtype: int64
I hope I'm missing something obvious. I've tried any number of ways to cross-section the 'zaid' column, and none of them has recognized any of the values in it.
Thanks!
Use df[df['zaid'] == 92235]. The expression 16 in df['step'] == 16 only appeared to work because Python chains comparison operators: it evaluates as (16 in df['step']) and (df['step'] == 16), and in on a Series tests the index, not the values. 16 happens to be a valid index label, so the left side was True and you got back the boolean mask; 92235 is not an index label, so the chain short-circuited to False and df[False] raised KeyError: False.
Try the below code in any IPython console:
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
df = pd.DataFrame(data)
df['state'] == 'Nevada'      # a boolean mask over the rows
df[df['state'] == 'Nevada']  # the rows where the mask is True
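Applied back to the question's frame, plain boolean masks do the job (a sketch reusing the column names from the post):
step16 = df[df['step'] == 16]                           # all rows at step 16
u235 = df[df['zaid'] == 92235]                          # all rows for zaid 92235
both = df[(df['step'] == 16) & (df['zaid'] == 92235)]   # both conditions at once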
