Related
I am processing data where values may be of the format '<x' I want to return 'x/2'. So <5 would be returned as '2.5'. I have columns of mixed numbers and text. The problem is that I want to style the values that have been changed. Dummy data and code:
dummy={'Location': {0: 'Perth', 1: 'Perth', 2: 'Perth', 3: 'Perth', 4: 'Perth', 5: 'Perth', 6: 'Perth', 7: 'Perth', 8: 'Perth', 9: 'Perth', 10: 'Perth', 11: 'Perth', 12: 'Perth', 13: 'Perth', 14: 'Perth', 15: 'Perth', 16: 'Perth', 17: 'Perth'}, 'Date': {0: '11/01/2012 0:00', 1: '11/01/2012 0:00', 2: '20/03/2012 0:00', 3: '6/06/2012 0:00', 4: '14/09/2012 0:00', 5: '17/12/2013 0:00', 6: '1/02/2014 0:00', 7: '1/02/2014 0:00', 8: '1/02/2014 0:00', 9: '1/02/2014 0:00', 10: '1/02/2014 0:00', 11: '1/02/2014 0:00', 12: '1/02/2014 0:00', 13: '1/02/2014 0:00', 14: '1/02/2014 0:00', 15: '1/02/2014 0:00', 16: '1/02/2014 0:00', 17: '1/02/2014 0:00'}, 'As µg/L': {0: '9630', 1: '9630', 2: '8580', 3: '4990', 4: '6100', 5: '282', 6: '21', 7: '<1', 8: '<1', 9: '<1', 10: '<1', 11: '<1', 12: '<1', 13: '<1', 14: '<1', 15: '<1', 16: '<1', 17: '<1'}, 'As': {0: '9.63', 1: '9.63', 2: '8.58', 3: '4.99', 4: '6.1', 5: '0.282', 6: '0.021', 7: '<1', 8: '<1', 9: '<1', 10: '<1', 11: '<1', 12: '<1', 13: '<1', 14: '<1', 15: '<1', 16: '<1', 17: '10'}, 'Ba': {0: 1000.0, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: np.nan, 11: np.nan, 12: np.nan, 13: np.nan, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}, 'HCO3': {0: '10.00', 1: '0.50', 2: '0.50', 3: '<22', 4: '0.50', 5: '0.50', 6: '0.50', 7: np.nan, 8: np.nan, 9: np.nan, 10: '0.50', 11: np.nan, 12: np.nan, 13: np.nan, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}, 'Cd': {0: 0.0094, 1: 0.0094, 2: 0.011, 3: 0.0035, 4: 0.004, 5: 0.002, 6: 0.0019, 7: np.nan, 8: np.nan, 9: np.nan, 10: np.nan, 11: np.nan, 12: np.nan, 13: np.nan, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}, 'Ca': {0: 248.0, 1: 248.0, 2: 232.0, 3: 108.0, 4: 150.0, 5: 396.0, 6: 472.0, 7: np.nan, 8: np.nan, 9: np.nan, 10: np.nan, 11: np.nan, 12: np.nan, 13: np.nan, 14: 472.0, 15: np.nan, 16: np.nan, 17: np.nan}, 'CO3': {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5, 5: 0.5, 6: 0.5, 7: np.nan, 8: np.nan, 9: 0.5, 10: np.nan, 11: np.nan, 12: np.nan, 13: np.nan, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}, 'Cl': {0: 2.0, 1: 2.0, 2: 2.0, 3: 2.0, 4: 0.5, 5: 2.0, 6: 5.0, 7: np.nan, 8: np.nan, 9: np.nan, 10: np.nan, 11: np.nan, 12: np.nan, 13: 5.0, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}}
df=pd.DataFrame(dummy)
import pandas a pd
import numpy as np
mask = df.applymap(lambda x: (isinstance(x, str) and x.startswith('<')))
def remove_less_thans(x):
if type(x) is int:
return x
elif type(x) is float:
return x
elif type(x) is str and x[0]=="<":
try:
return float(x[1:])/2
except:
return x
elif type(x) is str and len(x)<10:
try:
return float(x)
except:
return x
else:
return x
def colour_mask(val):
colour='color: red; font-weight: bold' if val in df.values[mask] else ''
return colour
#perform remove less-thans and divide the remainder by two
df=df.applymap(remove_less_thans)
styled_df= df.style.applymap(colour_mask)
styled_df
the mask looks correct, the remove < function works ok but I get values formatted when they shouldn't be. In the dummy data the HCO3 column has the 0.5 values reformatted even though they do no start with < and are not appearing as True in the mask. I know that they are numbers stored as text but that is how the real data might appear and given the mask is being constructed as expected (i.e. the one True is there and the rest of the values in the column are False) I don't know why they are being formatted. Same for column CO3, all the non-nan values are formatted when none should be. Why is this happening and how do I fix it? Dataframe
Output
Idea is pass mask to Styler.apply with numpy.where:
def colour_mask(x):
arr = np.where(mask, 'color: red; font-weight: bold', '')
return pd.DataFrame(arr, index=x.index, columns=x.columns)
styled_df = df.style.apply(colour_mask, axis=None)
Or:
def colour_mask(x, props=''):
return np.where(mask, props, '')
styled_df = df.style.apply(colour_mask, props='color: red; font-weight: bold', axis=None)
This question already has answers here:
Select rows based on multiple columns that are later than a certain date
(2 answers)
Closed 6 months ago.
I have the following dataframe:
Code-
df = {'sample_received': {1: 'NaN',
2: 'NaN',
17: 'NaN',
3: 'NaN',
4: 'NaN',
5: 'NaN',
6: 'NaN',
7: 'NaN',
8: 'NaN',
9: 'NaN',
10: 'NaN',
11: 'NaN',
12: 'NaN',
13: 'NaN',
14: '2022-08-01 20:15:28',
15: '2022-08-01 20:12:56',
16: '2022-08-01 20:18:19'},
'result_received': {1: '2022-07-28 12:25:37',
2: '2022-07-30 12:37:37',
17: '2022-07-28 12:45:37',
3: '2022-07-28 12:48:37',
4: '2022-07-28 12:49:37',
5: '2022-07-28 12:50:37',
6: '2022-07-28 12:21:37',
7: '2022-07-28 12:52:37',
8: '2022-07-28 12:54:37',
9: '2022-08-01 11:55:40',
10: '2022-08-01 13:56:15',
11: '2022-08-01 13:57:15',
12: '2022-08-01 13:58:28',
13: '2022-08-01 13:59:28',
14: '2022-08-02 08:33:39',
15: '2022-08-02 08:35:39',
16: '2022-08-02 08::39'},
'status': {1: 'Failed',
2: 'Failed',
17: 'Approved',
3: 'Approved',
4: 'Approved',
5: 'Approved',
6: 'Approved',
7: 'Approved',
8: 'Approved',
9: 'Approved',
10: 'Approved',
11: 'Approved',
12: 'Approved',
13: 'Approved',
14: 'Approved',
15: 'Approved',
16: 'Approved'}}
pd.DataFrame(df)
I would like to select all the rows in which the sample_received, or order_received is at least on the 1st of august. What would be the most effective way to do this? The main problem is that it could occur that the 'sample_received' column can have a date that is not mentioned. However, when the 'result_received' column contains a date that is on the 1st of august (in this case) I want the dataframe to include this. Or the other way around.
Thank you in advance.
This should do it.
cols = df.apply(lambda x: True if x.sample_received >= pd.Timestamp("2022-08-01") or x.order_received >= pd.Timestamp("2022-08-01") else False, axis=1)
df[cols]
I have a dataframe:
df_full = pd.DataFrame.from_dict({('group', ''): {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'A',
6: 'A',
7: 'B',
8: 'B',
9: 'B',
10: 'B',
11: 'B',
12: 'B',
13: 'B'},
('category', ''): {0: 'Books',
1: 'Candy',
2: 'Pencil',
3: 'Table',
4: 'PC',
5: 'Printer',
6: 'Lamp',
7: 'Books',
8: 'Candy',
9: 'Pencil',
10: 'Table',
11: 'PC',
12: 'Printer',
13: 'Lamp'},
(pd.Timestamp('2021-06-28 00:00:00'),
'Sales_1'): {0: 9.937449997200002, 1: 30.71300000639998, 2: 58.81199999639999, 3: 25.661999978399994, 4: 3.657999996, 5: 12.0879999972, 6: 61.16600000040001, 7: 6.319439989199998, 8: 12.333119997600003, 9: 24.0544100028, 10: 24.384659998799997, 11: 1.9992000012000002, 12: 0.324, 13: 40.69122000000001},
(pd.Timestamp('2021-06-28 00:00:00'),
'Sales_2'): {0: 21.890370397789923, 1: 28.300470581874837, 2: 53.52039700062155, 3: 52.425508769690694, 4: 6.384936971649232, 5: 6.807138946302334, 6: 52.172, 7: 5.916852561, 8: 5.810764652, 9: 12.1243325, 10: 17.88071596, 11: 0.913782413, 12: 0.869207661, 13: 20.9447844},
(pd.Timestamp('2021-06-28 00:00:00'), 'last_week_sales'): {0: np.nan,
1: np.nan,
2: np.nan,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: np.nan},
(pd.Timestamp('2021-06-28 00:00:00'), 'total_orders'): {0: 86.0,
1: 66.0,
2: 188.0,
3: 556.0,
4: 12.0,
5: 4.0,
6: 56.0,
7: 90.0,
8: 26.0,
9: 49.0,
10: 250.0,
11: 7.0,
12: 2.0,
13: 44.0},
(pd.Timestamp('2021-06-28 00:00:00'), 'total_sales'): {0: 4390.11,
1: 24825.059999999998,
2: 48592.39999999998,
3: 60629.77,
4: 831.22,
5: 1545.71,
6: 34584.99,
7: 5641.54,
8: 6798.75,
9: 13290.13,
10: 42692.68000000001,
11: 947.65,
12: 329.0,
13: 29889.65},
(pd.Timestamp('2021-07-05 00:00:00'),
'Sales_1'): {0: 13.690399997999998, 1: 38.723000005199985, 2: 72.4443400032, 3: 36.75802000560001, 4: 5.691999996, 5: 7.206999998399999, 6: 66.55265999039996, 7: 6.4613199911999954, 8: 12.845630001599998, 9: 26.032340003999998, 10: 30.1634600016, 11: 1.0203399996, 12: 1.4089999991999997, 13: 43.67116000320002},
(pd.Timestamp('2021-07-05 00:00:00'),
'Sales_2'): {0: 22.874363860953647, 1: 29.5726042895728, 2: 55.926190956481534, 3: 54.7820864335212, 4: 6.671946105284065, 5: 7.113126469779095, 6: 54.517, 7: 6.194107518, 8: 6.083562133, 9: 12.69221484, 10: 18.71872129, 11: 0.956574175, 12: 0.910216433, 13: 21.92632044},
(pd.Timestamp('2021-07-05 00:00:00'), 'last_week_sales'): {0: 4390.11,
1: 24825.059999999998,
2: 48592.39999999998,
3: 60629.77,
4: 831.22,
5: 1545.71,
6: 34584.99,
7: 5641.54,
8: 6798.75,
9: 13290.13,
10: 42692.68000000001,
11: 947.65,
12: 329.0,
13: 29889.65},
(pd.Timestamp('2021-07-05 00:00:00'), 'total_orders'): {0: 109.0,
1: 48.0,
2: 174.0,
3: 587.0,
4: 13.0,
5: 5.0,
6: 43.0,
7: 62.0,
8: 13.0,
9: 37.0,
10: 196.0,
11: 8.0,
12: 1.0,
13: 33.0},
(pd.Timestamp('2021-07-05 00:00:00'), 'total_sales'): {0: 3453.02,
1: 17868.730000000003,
2: 44707.82999999999,
3: 60558.97999999999,
4: 1261.0,
5: 1914.6000000000001,
6: 24146.09,
7: 6201.489999999999,
8: 5513.960000000001,
9: 9645.87,
10: 25086.785,
11: 663.0,
12: 448.61,
13: 26332.7}}).set_index(['group','category'])
I am trying to get a total for each column per category. So in this df example adding 2 lines below Lamp denoting the totals of each column. Red lines indicate the desired totals placement:
What I've tried:
df_out['total'] = df_out.sum(level=1).loc[:, (slice(None), 'total_sales')]
But get:
ValueError: Wrong number of items passed 4, placement implies 1
I also checked this question but could not apply it to my self.
Let us try groupby on level=0
s = df_full.groupby(level=0).sum()
s.index = pd.MultiIndex.from_product([s.index, ['Total']])
df_out = df_full.append(s).sort_index()
print(df_out)
2021-06-28 00:00:00 2021-07-05 00:00:00
Sales_1 Sales_2 last_week_sales total_orders total_sales Sales_1 Sales_2 last_week_sales total_orders total_sales
group category
A Books 9.93745 21.890370 NaN 86.0 4390.11 13.69040 22.874364 4390.11 109.0 3453.020
Candy 30.71300 28.300471 NaN 66.0 24825.06 38.72300 29.572604 24825.06 48.0 17868.730
Lamp 61.16600 52.172000 NaN 56.0 34584.99 66.55266 54.517000 34584.99 43.0 24146.090
PC 3.65800 6.384937 NaN 12.0 831.22 5.69200 6.671946 831.22 13.0 1261.000
Pencil 58.81200 53.520397 NaN 188.0 48592.40 72.44434 55.926191 48592.40 174.0 44707.830
Printer 12.08800 6.807139 NaN 4.0 1545.71 7.20700 7.113126 1545.71 5.0 1914.600
Table 25.66200 52.425509 NaN 556.0 60629.77 36.75802 54.782086 60629.77 587.0 60558.980
Total 202.03645 221.500823 0.0 968.0 175399.26 241.06742 231.457318 175399.26 979.0 153910.250
B Books 6.31944 5.916853 NaN 90.0 5641.54 6.46132 6.194108 5641.54 62.0 6201.490
Candy 12.33312 5.810765 NaN 26.0 6798.75 12.84563 6.083562 6798.75 13.0 5513.960
Lamp 40.69122 20.944784 NaN 44.0 29889.65 43.67116 21.926320 29889.65 33.0 26332.700
PC 1.99920 0.913782 NaN 7.0 947.65 1.02034 0.956574 947.65 8.0 663.000
Pencil 24.05441 12.124332 NaN 49.0 13290.13 26.03234 12.692215 13290.13 37.0 9645.870
Printer 0.32400 0.869208 NaN 2.0 329.00 1.40900 0.910216 329.00 1.0 448.610
Table 24.38466 17.880716 NaN 250.0 42692.68 30.16346 18.718721 42692.68 196.0 25086.785
Total 110.10605 64.460440 0.0 468.0 99589.40 121.60325 67.481717 99589.40 350.0 73892.415
I am trying to use plotly to compare the coefficents of regression models using error bars for the confidence intervals. I used the following code to plot it, using the variable as a categorical y axis in a scatter plot. The problem is that the points are overlapping, and I'd like to dodge them like happens in bar charts when you set barmode='group'. If I had a numerical axis I could manually dodge them, but I can't do that.
fig = px.scatter(
df, y='index', x='coef', text='label', color='model',
error_x_minus='lerr', error_x='uerr',
hover_data=['coef', 'pvalue', 'lower', 'upper']
)
fig.update_traces(textposition='top center')
fig.update_yaxes(autorange="reversed")
Using facets I get almost the result I want, but some of the labels goes off-plot and are not visible:
fig = px.scatter(
df, y='model', x='coef', text='label', color='model',
facet_row='index',
error_x_minus='lerr', error_x='uerr',
hover_data=['coef', 'pvalue', 'lower', 'upper']
)
fig.update_traces(textposition='top center')
fig.update_yaxes(visible=False)
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
Somebody has any idea or workaround for either dodging points in the first case or displaying labels in the second case?
Thanks in advance.
PS: Here's the random fake dataframe I made to generate the plots:
df = pd.DataFrame({'coef': {0: 1.0018729737113143,
1: 0.9408864645423858,
2: 0.29796556981484884,
3: -0.6844053575764955,
4: -0.13689631932690113,
5: 0.1473096200402363,
6: 0.9564712505670716,
7: 0.956099003887811,
8: 0.33319108930207175,
9: -0.7022778825729681,
10: -0.1773916842612131,
11: 0.09485417304851751},
'index': {0: 'const',
1: 'x1',
2: 'x2',
3: 'x3',
4: 'x4',
5: 'x5',
6: 'const',
7: 'x1',
8: 'x2',
9: 'x3',
10: 'x4',
11: 'x5'},
'label': {0: '1.002***',
1: '0.941***',
2: '0.298***',
3: '-0.684***',
4: '-0.137',
5: '0.147',
6: '0.956***',
7: '0.956***',
8: '0.333***',
9: '-0.702***',
10: '-0.177',
11: '0.095'},
'lerr': {0: 0.19788416996400904,
1: 0.19972987383410545,
2: 0.0606849959013587,
3: 0.1772734289533593,
4: 0.1988122854078155,
5: 0.21870366703236832,
6: 0.2734783191688098,
7: 0.2760291042678362,
8: 0.08386739920069491,
9: 0.2449940255063039,
10: 0.27476098595116555,
11: 0.3022511162310027},
'lower': {0: 0.8039888037473053,
1: 0.7411565907082803,
2: 0.23728057391349014,
3: -0.8616787865298547,
4: -0.33570860473471664,
5: -0.07139404699213203,
6: 0.6829929313982618,
7: 0.6800698996199748,
8: 0.24932369010137684,
9: -0.947271908079272,
10: -0.45215267021237865,
11: -0.2073969431824852},
'model': {0: 'OLS',
1: 'OLS',
2: 'OLS',
3: 'OLS',
4: 'OLS',
5: 'OLS',
6: 'QuantReg',
7: 'QuantReg',
8: 'QuantReg',
9: 'QuantReg',
10: 'QuantReg',
11: 'QuantReg'},
'pvalue': {0: 1.4211692095019375e-16,
1: 4.3583690618389965e-15,
2: 6.278403727223468e-16,
3: 1.596372747840846e-11,
4: 0.17483151363955116,
5: 0.18433051296752084,
6: 4.877385844808361e-10,
7: 6.665860891682504e-10,
8: 5.476882838731488e-12,
9: 1.4240852942202845e-07,
10: 0.20303143985022934,
11: 0.5347222575215599},
'uerr': {0: 0.19788416996400904,
1: 0.19972987383410556,
2: 0.06068499590135873,
3: 0.1772734289533593,
4: 0.19881228540781554,
5: 0.21870366703236832,
6: 0.27347831916880994,
7: 0.2760291042678362,
8: 0.08386739920069491,
9: 0.2449940255063039,
10: 0.27476098595116555,
11: 0.3022511162310027},
'upper': {0: 1.1997571436753234,
1: 1.1406163383764913,
2: 0.35865056571620757,
3: -0.5071319286231362,
4: 0.0619159660809144,
5: 0.3660132870726046,
6: 1.2299495697358815,
7: 1.2321281081556472,
8: 0.41705848850276667,
9: -0.4572838570666642,
10: 0.09736930168995245,
11: 0.3971052892795202}})
You were very close to a working solution with your second attempt. Just make more room for your labels with:
height=600, width=800
And then place the labels for the traces named 'OLS' within the boundaries of each subplot with:
fig.for_each_trace(lambda t: t.update(textposition='bottom center') if t.name == 'OLS' else ())
Plot:
Complete code:
import plotly.express as px
import pandas as pd
df = pd.DataFrame({'coef': {0: 1.0018729737113143,
1: 0.9408864645423858,
2: 0.29796556981484884,
3: -0.6844053575764955,
4: -0.13689631932690113,
5: 0.1473096200402363,
6: 0.9564712505670716,
7: 0.956099003887811,
8: 0.33319108930207175,
9: -0.7022778825729681,
10: -0.1773916842612131,
11: 0.09485417304851751},
'index': {0: 'const',
1: 'x1',
2: 'x2',
3: 'x3',
4: 'x4',
5: 'x5',
6: 'const',
7: 'x1',
8: 'x2',
9: 'x3',
10: 'x4',
11: 'x5'},
'label': {0: '1.002***',
1: '0.941***',
2: '0.298***',
3: '-0.684***',
4: '-0.137',
5: '0.147',
6: '0.956***',
7: '0.956***',
8: '0.333***',
9: '-0.702***',
10: '-0.177',
11: '0.095'},
'lerr': {0: 0.19788416996400904,
1: 0.19972987383410545,
2: 0.0606849959013587,
3: 0.1772734289533593,
4: 0.1988122854078155,
5: 0.21870366703236832,
6: 0.2734783191688098,
7: 0.2760291042678362,
8: 0.08386739920069491,
9: 0.2449940255063039,
10: 0.27476098595116555,
11: 0.3022511162310027},
'lower': {0: 0.8039888037473053,
1: 0.7411565907082803,
2: 0.23728057391349014,
3: -0.8616787865298547,
4: -0.33570860473471664,
5: -0.07139404699213203,
6: 0.6829929313982618,
7: 0.6800698996199748,
8: 0.24932369010137684,
9: -0.947271908079272,
10: -0.45215267021237865,
11: -0.2073969431824852},
'model': {0: 'OLS',
1: 'OLS',
2: 'OLS',
3: 'OLS',
4: 'OLS',
5: 'OLS',
6: 'QuantReg',
7: 'QuantReg',
8: 'QuantReg',
9: 'QuantReg',
10: 'QuantReg',
11: 'QuantReg'},
'pvalue': {0: 1.4211692095019375e-16,
1: 4.3583690618389965e-15,
2: 6.278403727223468e-16,
3: 1.596372747840846e-11,
4: 0.17483151363955116,
5: 0.18433051296752084,
6: 4.877385844808361e-10,
7: 6.665860891682504e-10,
8: 5.476882838731488e-12,
9: 1.4240852942202845e-07,
10: 0.20303143985022934,
11: 0.5347222575215599},
'uerr': {0: 0.19788416996400904,
1: 0.19972987383410556,
2: 0.06068499590135873,
3: 0.1772734289533593,
4: 0.19881228540781554,
5: 0.21870366703236832,
6: 0.27347831916880994,
7: 0.2760291042678362,
8: 0.08386739920069491,
9: 0.2449940255063039,
10: 0.27476098595116555,
11: 0.3022511162310027},
'upper': {0: 1.1997571436753234,
1: 1.1406163383764913,
2: 0.35865056571620757,
3: -0.5071319286231362,
4: 0.0619159660809144,
5: 0.3660132870726046,
6: 1.2299495697358815,
7: 1.2321281081556472,
8: 0.41705848850276667,
9: -0.4572838570666642,
10: 0.09736930168995245,
11: 0.3971052892795202}})
fig = px.scatter(
df, y='model', x='coef', text='label', color='model',
facet_row='index',
error_x_minus='lerr', error_x='uerr',
hover_data=['coef', 'pvalue', 'lower', 'upper'],
height=600, width=800,
)
fig.update_traces(textposition='top center')
fig.update_yaxes(visible=False)
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig.for_each_trace(lambda t: t.update(textposition='bottom center') if t.name == 'OLS' else ())
fig.show()
All,
I have below Pandas dataframe, and I am trying to filter my dataframe such that my output displays country name along with the year 1989 column whose number is >1000000.For this I am using below code, but it is returning me below error.
{'Country': {0: 'Austria', 1: 'Belgium', 2: 'Denmark', 3: 'Finland', 4: 'France', 5: 'Germany', 6: 'Iceland', 7: 'Ireland', 8: 'Italy', 9: 'Luxemburg', 10: 'Netherland', 11: 'Norway', 12: 'Portugal', 13: 'Spain', 14: 'Sweden', 15: 'Switzerland', 16: 'United Kingdom'}, 'y1989': {0: 7602431, 1: 9927600, 2: 5129800, 3: 4954359, 4: 56269800, 5: 61715000, 6: 253500, 7: 3526600, 8: 57504700, 9: 374900, 10: 14805240, 11: 4226901, 12: 10304700, 13: 38851900, 14: 8458890, 15: 6619973, 16: 57236200}, 'y1990': {0: 7660345.0, 1: 9947800.0, 2: 5135400.0, 3: 4974383.0, 4: 0.0, 5: 62678000.0, 6: 255708.0, 7: 3505500.0, 8: 57576400.0, 9: 379300.0, 10: 14892574.0, 11: 4241473.0, 12: 0.0, 13: 38924500.0, 14: 8527040.0, 15: 6673850.0, 16: 57410600.0}, 'y1991': {0: 7790957, 1: 9987000, 2: 5146500, 3: 4998478, 4: 56893000, 5: 79753000, 6: 259577, 7: 3519000, 8: 57746200, 9: 384400, 10: 15010445, 11: 4261930, 12: 9858500, 13: 38993800, 14: 8590630, 15: 6750693, 16: 57649200}, 'y1992': {0: 7860800, 1: 10068319, 2: 5162100, 3: 5029300, 4: 57217500, 5: 80238000, 6: 262193, 7: 3542000, 8: 57788200, 9: 389800, 10: 15129200, 11: 4273634, 12: 9846000, 13: 39055900, 14: 8644100, 15: 6831900, 16: 58888800}, 'y1993': {0: 7909575, 1: 10100631, 2: 5180614, 3: 5054982, 4: 57529577, 5: 81338000, 6: 264922, 7: 3559985, 8: 57114161, 9: 395200, 10: 15354000, 11: 4324577, 12: 9987500, 13: 39790955, 14: 8700000, 15: 6871500, 16: 58191230}, 'y1994': {0: 7943652, 1: 10130574, 2: 5191000, 3: 5098754, 4: 57847000, 5: 81353000, 6: 266783, 7: 3570700, 8: 57201800, 9: 400000, 10: 15341553, 11: 4348410, 12: 9776000, 13: 39177400, 14: 8749000, 15: 7021200, 16: 58380000}, 'y1995': {0: 8054800, 1: 10143047, 2: 5251027, 3: 5116800, 4: 58265400, 5: 81845000, 6: 267806, 7: 3591200, 8: 57268578, 9: 412800, 10: 15492800, 11: 4370000, 12: 9920800, 13: 39241900, 14: 8837000, 15: 7060400, 16: 58684000}}
My code
df[(df.Country)& (df.y1989>1000000)]
Error:
TypeError: unsupported operand type(s) for &: 'str' and 'bool'
I am not sure what could be the reason, being a newbie to python if you could provide explanation for the error that will be greatly appreciated.
Thanks in advance,
'Country' doesn't form part of your filtering criteria, so don't use it to form your Boolean indexer. Instead, use the loc accessor to give a Boolean condition and specify necessary columns separately:
res = df.loc[df['y1989'] > 1000000, ['Country','y1989']]
Under no circumstances use chained assignment, e.g. via df[df['y1989']>1000000][['Country','y1989']], as this is ambiguous and explicitly discouraged in the docs.