Transform DataFrame into multidimensional TimeSeries? - python

I have the following pandas DataFrame with "periodic" values over the column 'county' as well as repeating values in 'reporting_period' and 'date':
data = pd.DataFrame({'county': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H', 8: 'I', 9: 'A', 10: 'B', 11: 'C', 12: 'D', 13: 'E', 14: 'F', 15: 'G', 16: 'H', 17: 'I'}, 'new_covid_19_cases_per_100k': {0: 9.89857311398793, 1: 8.96808587445497, 2: 10.4018656786281, 3: 5.44259755461725, 4: 8.47402557487262, 5: 8.23708135804402, 6: 21.1781816000959, 7: 6.34201242466493, 8: 11.9630512616746, 9: 14.0, 10: 16.3, 11: 13.1, 12: 9.3, 13: 11.0, 14: 12.6, 15: 20.9, 16: 8.2, 17: 13.6}, 'new_covid_19_hospitalizations': {0: 0.735745284982339, 1: 0.681120446161137, 2: 1.07219230841243, 3: 0.118317338143853, 4: 0.526882419163064, 5: 0.599666185823225, 6: 1.07095735019448, 7: 0.141985352791006, 8: 0.854503661548189, 9: 0.9, 10: 0.8, 11: 1.5, 12: 0.2, 13: 0.5, 14: 0.8, 15: 0.9, 16: 0.1, 17: 0.7}, 'reporting_period': {0: '10/04/2020 - 10/17/2020', 1: '10/04/2020 - 10/17/2020', 2: '10/04/2020 - 10/17/2020', 3: '10/04/2020 - 10/17/2020', 4: '10/04/2020 - 10/17/2020', 5: '10/04/2020 - 10/17/2020', 6: '10/04/2020 - 10/17/2020', 7: '10/04/2020 - 10/17/2020', 8: '10/04/2020 - 10/17/2020', 9: '10/11/2020 - 10/24/2020', 10: '10/11/2020 - 10/24/2020', 11: '10/11/2020 - 10/24/2020', 12: '10/11/2020 - 10/24/2020', 13: '10/11/2020 - 10/24/2020', 14: '10/11/2020 - 10/24/2020', 15: '10/11/2020 - 10/24/2020', 16: '10/11/2020 - 10/24/2020', 17: '10/11/2020 - 10/24/2020'}, 'date': {0: '2020-10-22T00:00:00', 1: '2020-10-22T00:00:00', 2: '2020-10-22T00:00:00', 3: '2020-10-22T00:00:00', 4: '2020-10-22T00:00:00', 5: '2020-10-22T00:00:00', 6: '2020-10-22T00:00:00', 7: '2020-10-22T00:00:00', 8: '2020-10-22T00:00:00', 9: '2020-10-29T00:00:00', 10: '2020-10-29T00:00:00', 11: '2020-10-29T00:00:00', 12: '2020-10-29T00:00:00', 13: '2020-10-29T00:00:00', 14: '2020-10-29T00:00:00', 15: '2020-10-29T00:00:00', 16: '2020-10-29T00:00:00', 17: '2020-10-29T00:00:00'}})
My goal is to transform this DataFrame into some sort of multidimensional Time Series but I don't know what the best approach is or whether this is even possible.
My first idea was to use groupby and pivot_table, but I'm not sure whether this is useful.

The easiest way to view the time-series data as a MultiIndex is to use set_index.
reporting_period can also be converted to a period type, depending on the requirement.
To apply an aggregation, reduction, or any other transformation, use groupby or pivot.
data['date'] = pd.to_datetime(data.date)
data = data.set_index(['reporting_period', 'date'])
data
Sample Output
data.head(2)
                                   county  new_covid_19_cases_per_100k  new_covid_19_hospitalizations
reporting_period        date
10/04/2020 - 10/17/2020 2020-10-22      A                     9.898573                       0.735745
                        2020-10-22      B                     8.968086                       0.681120
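If the goal is one time series per county and per measure, a pivot may be closer to a "multidimensional" layout than an index alone. The following is a minimal sketch on a reduced stand-in for the question's data (two counties, two dates, rounded values); it assumes each county appears at most once per date.

```python
import pandas as pd

# Small stand-in for the question's data: two counties, two report dates.
data = pd.DataFrame({
    "county": ["A", "B", "A", "B"],
    "new_covid_19_cases_per_100k": [9.9, 9.0, 14.0, 16.3],
    "new_covid_19_hospitalizations": [0.7, 0.7, 0.9, 0.8],
    "date": ["2020-10-22", "2020-10-22", "2020-10-29", "2020-10-29"],
})
data["date"] = pd.to_datetime(data["date"])

# One row per date; the columns become a (measure, county) MultiIndex.
wide = data.pivot(
    index="date",
    columns="county",
    values=["new_covid_19_cases_per_100k", "new_covid_19_hospitalizations"],
)

# Selecting one measure gives a plain date-by-county table.
cases = wide["new_covid_19_cases_per_100k"]
print(cases)
```

pivot raises an error if a (date, county) pair occurs more than once; in that case pivot_table with an explicit aggfunc is the usual alternative.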

Related

Is there a way of creating boxplots using the exact boxplot values?

I am trying to create boxplots for 24 hours, each hour already having the maxValue, quartile75, median, quartile25 and minValue. Those values are stored in a dataframe, which I have converted to a dict here:
{'hour': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23},
'minValue': {0: -491.69,
1: -669.49,
2: -551.22,
3: -514.2,
4: -506.94,
5: -665.7,
6: -484.89,
7: -488.99,
8: -524.22,
9: -851.9,
10: -610.0,
11: -998.8,
12: -580.57,
13: -737.22,
14: -895.2,
15: -500.0,
16: -852.0,
17: -610.0,
18: -500.0,
19: -610.0,
20: -1000.0,
21: -674.0,
22: -1005.0,
23: -499.33},
'quartile25': {0: 114.94,
1: 119.29,
2: 128.8,
3: 139.8,
4: 151.48,
5: 146.75,
6: 139.1,
7: 125.02,
8: 110.0,
9: 105.0,
10: 94.9,
11: 92.81,
12: 107.62,
13: 134.5,
14: 150.8,
15: 168.51,
16: 175.71,
17: 163.0,
18: 142.57,
19: 139.3,
20: 139.45,
21: 120.68,
22: 116.89,
23: 112.84},
'median': {0: 188.53,
1: 193.2,
2: 206.6,
3: 222.2,
4: 234.58,
5: 227.68,
6: 218.32,
7: 200.93,
8: 190.92,
9: 182.6,
10: 175.01,
11: 176.87,
12: 192.33,
13: 210.38,
14: 227.0,
15: 243.87,
16: 252.1,
17: 245.45,
18: 226.86,
19: 219.6,
20: 209.09,
21: 192.32,
22: 187.4,
23: 184.94},
'quartile75': {0: 292.1,
1: 295.33,
2: 316.62,
3: 340.8,
4: 357.0,
5: 345.3,
6: 330.4,
7: 305.28,
8: 290.4,
9: 280.1,
10: 268.23,
11: 270.99,
12: 301.84,
13: 321.04,
14: 345.61,
15: 373.84,
16: 393.39,
17: 382.79,
18: 359.89,
19: 341.55,
20: 325.5,
21: 292.1,
22: 287.2,
23: 285.96},
'maxValue': {0: 2420.3,
1: 1450.0,
2: 2852.0,
3: 7300.0,
4: 3967.0,
5: 3412.1,
6: 6999.99,
7: 2999.99,
8: 6000.0,
9: 3000.0,
10: 8885.9,
11: 9999.0,
12: 6254.0,
13: 2300.0,
14: 2057.58,
15: 2860.0,
16: 5000.0,
17: 4151.01,
18: 7000.0,
19: 3000.0,
20: 6000.0,
21: 3000.5,
22: 2000.0,
23: 2500.0}}
When I used a normal time series data set I plotted like this:
import numpy as np
import plotly.graph_objects as go

N = 24
# One HSL color per hour
c = ['hsl(' + str(h) + ',50%' + ',50%)' for h in np.linspace(0, 360, N)]
# hour_dataframes is a list of 24 DataFrames, one per hour (built elsewhere)
fig = go.Figure(data=[go.Box(
    x=hour_dataframes[i]['hour'],
    y=hour_dataframes[i]['priceNum'],
    marker_color=c[i]
) for i in range(int(N))])
fig.update_layout(
    xaxis=dict(showgrid=True, zeroline=True, showticklabels=True),
    yaxis=dict(zeroline=True, gridcolor='white'),
    paper_bgcolor='rgb(233,233,233)',
    plot_bgcolor='rgb(233,233,233)',
    autosize=False,
    width=1500,
    height=1000,
)
fig.show()
It worked fine, but the data set became too big and JupyterLab started crashing, so I pulled aggregated data instead. Now I don't know how to plot multiple boxes (like the code above does) using the exact box plot values.

Python Filter Dataframe with Dynamic arguments

Hi, I want to filter a DataFrame dynamically from command-line arguments.
This is my idea so far (pseudocode):
tr = pd.read_csv("sales.csv")
def filtr(*arg2):
    fltr = tr.loc[(tr[arg2[0]] arg2[1] arg2[2]) arg2[3] ....]
    print(fltr)
filtr(*sys.argv[1:])
## python test.py "Unit Cost" "==" 4 & .......
I had the idea of treating (tr[arg2[0]] arg2[1] arg2[2]) as a body and iterating over it, but I don't know how.
edit: Data Example:
{'Region': {0: 'Sub-Saharan Africa', 1: 'Europe', 2: 'Middle East and North Africa', 3: 'Sub-Saharan Africa', 4: 'Europe', 5: 'Sub-Saharan Africa', 6: 'Asia', 7: 'Asia', 8: 'Sub-Saharan Africa', 9: 'Central America and the Caribbean', 10: 'Sub-Saharan Africa', 11: 'Europe', 12: 'Europe', 13: 'Asia', 14: 'Middle East and North Africa', 15: 'Australia and Oceania', 16: 'Central America and the Caribbean', 17: 'Europe', 18: 'Middle East and North Africa', 19: 'Europe'}, 'Country': {0: 'Chad', 1: 'Latvia', 2: 'Pakistan', 3: 'Democratic Republic of the Congo', 4: 'Czech Republic', 5: 'South Africa', 6: 'Laos', 7: 'China', 8: 'Eritrea', 9: 'Haiti', 10: 'Zambia', 11: 'Bosnia and Herzegovina', 12: 'Germany', 13: 'India', 14: 'Algeria', 15: 'Palau', 16: 'Cuba', 17: 'Vatican City', 18: 'Lebanon', 19: 'Lithuania'}, 'Item Type': {0: 'Office Supplies', 1: 'Beverages', 2: 'Vegetables', 3: 'Household', 4: 'Beverages', 5: 'Beverages', 6: 'Vegetables', 7: 'Baby Food', 8: 'Meat', 9: 'Office Supplies', 10: 'Cereal', 11: 'Baby Food', 12: 'Office Supplies', 13: 'Household', 14: 'Clothes', 15: 'Snacks', 16: 'Beverages', 17: 'Beverages', 18: 'Personal Care', 19: 'Snacks'}, 'Sales Channel': {0: 'Online', 1: 'Online', 2: 'Offline', 3: 'Online', 4: 'Online', 5: 'Offline', 6: 'Online', 7: 'Online', 8: 'Online', 9: 'Online', 10: 'Offline', 11: 'Offline', 12: 'Online', 13: 'Online', 14: 'Offline', 15: 'Offline', 16: 'Online', 17: 'Online', 18: 'Offline', 19: 'Offline'}, 'Order Priority': {0: 'L', 1: 'C', 2: 'C', 3: 'C', 4: 'C', 5: 'H', 6: 'L', 7: 'C', 8: 'L', 9: 'C', 10: 'M', 11: 'M', 12: 'C', 13: 'C', 14: 'C', 15: 'L', 16: 'H', 17: 'L', 18: 'H', 19: 'H'}, 'Order Date': {0: '1/27/2011', 1: '12/28/2015', 2: '1/13/2011', 3: '9/11/2012', 4: '10/27/2015', 5: '7/10/2012', 6: '2/20/2011', 7: '4/10/2017', 8: '11/21/2014', 9: '7/4/2015', 10: '7/26/2016', 11: '10/20/2012', 12: '2/22/2015', 13: '8/27/2016', 14: '6/21/2011', 15: '9/19/2013', 16: '11/15/2015', 17: '4/6/2015', 18: '4/12/2010', 19: 
'9/26/2011'}, 'Order ID': {0: 292494523, 1: 361825549, 2: 141515767, 3: 500364005, 4: 127481591, 5: 482292354, 6: 844532620, 7: 564251220, 8: 411809480, 9: 327881228, 10: 773452794, 11: 479823005, 12: 498603188, 13: 151717174, 14: 181401288, 15: 500204360, 16: 640987718, 17: 206925189, 18: 221503102, 19: 878520286}, 'Ship Date': {0: '2/12/2011', 1: '1/23/2016', 2: '2/1/2011', 3: '10/6/2012', 4: '12/5/2015', 5: '8/21/2012', 6: '3/20/2011', 7: '5/12/2017', 8: '1/10/2015', 9: '7/20/2015', 10: '8/24/2016', 11: '11/15/2012', 12: '2/27/2015', 13: '9/2/2016', 14: '7/21/2011', 15: '10/4/2013', 16: '11/30/2015', 17: '4/27/2015', 18: '5/19/2010', 19: '10/2/2011'}, 'Units Sold': {0: 4484, 1: 1075, 2: 6515, 3: 7683, 4: 3491, 5: 9880, 6: 4825, 7: 3330, 8: 2431, 9: 6197, 10: 724, 11: 9145, 12: 6618, 13: 5338, 14: 9527, 15: 441, 16: 1365, 17: 2617, 18: 6545, 19: 2530}, 'Unit Price': {0: 651.21, 1: 47.45, 2: 154.06, 3: 668.27, 4: 47.45, 5: 47.45, 6: 154.06, 7: 255.28, 8: 421.89, 9: 651.21, 10: 205.7, 11: 255.28, 12: 651.21, 13: 668.27, 14: 109.28, 15: 152.58, 16: 47.45, 17: 47.45, 18: 81.73, 19: 152.58}, 'Unit Cost': {0: 524.96, 1: 31.79, 2: 90.93, 3: 502.54, 4: 31.79, 5: 31.79, 6: 90.93, 7: 159.42, 8: 364.69, 9: 524.96, 10: 117.11, 11: 159.42, 12: 524.96, 13: 502.54, 14: 35.84, 15: 97.44, 16: 31.79, 17: 31.79, 18: 56.67, 19: 97.44}, 'Total Revenue': {0: 2920025.64, 1: 51008.75, 2: 1003700.9, 3: 5134318.41, 4: 165647.95, 5: 468806.0, 6: 743339.5, 7: 850082.4, 8: 1025614.59, 9: 4035548.37, 10: 148926.8, 11: 2334535.6, 12: 4309707.78, 13: 3567225.26, 14: 1041110.56, 15: 67287.78, 16: 64769.25, 17: 124176.65, 18: 534922.85, 19: 386027.4}, 'Total Cost': {0: 2353920.64, 1: 34174.25, 2: 592408.95, 3: 3861014.82, 4: 110978.89, 5: 314085.2, 6: 438737.25, 7: 530868.6, 8: 886561.39, 9: 3253177.12, 10: 84787.64, 11: 1457895.9, 12: 3474185.28, 13: 2682558.52, 14: 341447.68, 15: 42971.04, 16: 43393.35, 17: 83194.43, 18: 370905.15, 19: 246523.2}, 'Total Profit': {0: 566105.0, 1: 16834.5, 2: 
411291.95, 3: 1273303.59, 4: 54669.06, 5: 154720.8, 6: 304602.25, 7: 319213.8, 8: 139053.2, 9: 782371.25, 10: 64139.16, 11: 876639.7, 12: 835522.5, 13: 884666.74, 14: 699662.88, 15: 24316.74, 16: 21375.9, 17: 40982.22, 18: 164017.7, 19: 139504.2}}
You can use eval(); here is the code:
import pandas as pd
def filter_df(df, args_list):
    constraints = []
    for a in args_list:
        col = a[0]
        symbol = a[1]
        value = a[2]
        # Index with df['col'] so column names containing spaces also work
        constraint = "(df[{!r}] {} {})".format(col, symbol, value)
        constraints.append(constraint)
    filter_str = " & ".join(constraints)
    # eval executes arbitrary code, so only pass it trusted input
    return df[eval(filter_str)]
data = {
    "COL_A": [1, 2, 3, 2, 4, 6],
    "COL_B": [1, 10, 100, 20, 20, 40],
    "COL_C": ["aaa", "bbb", "zzz", "xxx", "xxx", "xxx"]
}
df = pd.DataFrame(data)
args_list = [["COL_A", "<=", "4"], ["COL_C", "==", "'xxx'"]]
df2 = filter_df(df, args_list)
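This is df:
   COL_A  COL_B COL_C
0      1      1   aaa
1      2     10   bbb
2      3    100   zzz
3      2     20   xxx
4      4     20   xxx
5      6     40   xxx
After filtering with COL_A <= 4 & COL_C == 'xxx', this is df2:
   COL_A  COL_B COL_C
3      2     20   xxx
4      4     20   xxx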
How about this?
def filter(df, **args):
    conditions = args["args"]
    for key, value in conditions.items():
        df = df[df[key] > value]
    return df
Invoke using
df = filter(df, args={"Unit Cost": 500, "Unit Price": 500})
Result:
print(df.shape)
(5, 14)
Note: This approach works only when every condition is a > comparison. If you need to mix several operators, you will need a different approach.
def filter_df(arg2):
    # arg2 is a (column, operator, value) triple; tr is the DataFrame read from sales.csv
    value = float(arg2[2])  # float rather than int, since columns such as Unit Cost hold decimals
    if arg2[1] == ">":
        return tr.loc[tr[arg2[0]] > value]
    elif arg2[1] == "<":
        return tr.loc[tr[arg2[0]] < value]
    elif arg2[1] == "=":
        return tr.loc[tr[arg2[0]] == value]
    else:
        raise ValueError("invalid comparison: %s" % arg2[1])
filter_df(arg2)
Now if (for example) arg2 = ('Unit Cost', '>', '500'), the function returns only the rows with Unit Cost > 500.
If you want to pass multiple conditions, it gets more complicated; my suggestion is to apply them one at a time.
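The ideas in the answers above can be combined without eval by mapping operator strings to functions from the standard operator module. This is only a sketch: the OPS table, the triple-grouping of the argument list, and the rule "cast to float when the column is numeric" are assumptions, and the small tr frame stands in for sales.csv.

```python
import operator
from functools import reduce

import pandas as pd

# Map operator strings (as they might arrive from sys.argv) to functions.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def filter_df(df, conditions):
    """Keep the rows that satisfy every (column, op, value) triple."""
    masks = []
    for col, op, value in conditions:
        if op not in OPS:
            raise ValueError("invalid comparison: %s" % op)
        # Command-line values are strings; cast them for numeric columns.
        if pd.api.types.is_numeric_dtype(df[col]):
            value = float(value)
        masks.append(OPS[op](df[col], value))
    if not masks:
        return df
    # AND all boolean masks together.
    return df[reduce(operator.and_, masks)]

# Stand-in for the data read from sales.csv.
tr = pd.DataFrame({
    "Item Type": ["Office Supplies", "Beverages", "Household", "Baby Food"],
    "Unit Price": [651.21, 47.45, 668.27, 255.28],
    "Unit Cost": [524.96, 31.79, 502.54, 159.42],
})

# A flat argument list, e.g. sys.argv[1:], grouped into triples.
args = ["Unit Cost", ">", "500", "Item Type", "==", "Household"]
triples = [args[i:i + 3] for i in range(0, len(args), 3)]
result = filter_df(tr, triples)
print(result)
```

This version always joins conditions with AND; supporting OR or parentheses from the command line would need a small parser on top.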

How to Use Melt to Tidy Dataframe in Pandas?

dt = {'Ind': {0: 'Ind1',
1: 'Ind2',
2: 'Ind3',
3: 'Ind4',
4: 'Ind5',
5: 'Ind6',
6: 'Ind7',
7: 'Ind8',
8: 'Ind9',
9: 'Ind10',
10: 'Ind1',
11: 'Ind2',
12: 'Ind3',
13: 'Ind4',
14: 'Ind5',
15: 'Ind6',
16: 'Ind7',
17: 'Ind8',
18: 'Ind9',
19: 'Ind10'},
'Treatment': {0: 'Treat',
1: 'Treat',
2: 'Treat',
3: 'Treat',
4: 'Treat',
5: 'Treat',
6: 'Treat',
7: 'Treat',
8: 'Treat',
9: 'Treat',
10: 'Cont',
11: 'Cont',
12: 'Cont',
13: 'Cont',
14: 'Cont',
15: 'Cont',
16: 'Cont',
17: 'Cont',
18: 'Cont',
19: 'Cont'},
'value': {0: 4.5,
1: 8.3,
2: 6.2,
3: 4.2,
4: 7.1,
5: 7.5,
6: 7.9,
7: 5.1,
8: 5.8,
9: 6.0,
10: 11.3,
11: 11.6,
12: 13.3,
13: 12.2,
14: 13.4,
15: 11.7,
16: 12.1,
17: 12.0,
18: 14.0,
19: 13.8}}
mydt = pd.DataFrame(dt, columns=['Ind', 'Treatment', 'value'])
How can I tidy up my dataframe to make it look like this?
Desired Output
You can use DataFrame.from_dict
pd.DataFrame.from_dict(dt, orient='index')
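The desired output is not shown above, so the following is only a guess at what was wanted: a wide table with one row per individual and one column per treatment. Under that assumption, pivot produces the wide layout and melt converts it back to the long layout. The four-row mydt is a shortened stand-in for the question's data.

```python
import pandas as pd

# Shortened stand-in for the question's frame (two individuals only).
mydt = pd.DataFrame({
    "Ind": ["Ind1", "Ind2", "Ind1", "Ind2"],
    "Treatment": ["Treat", "Treat", "Cont", "Cont"],
    "value": [4.5, 8.3, 11.3, 11.6],
})

# Long -> wide: one row per individual, one column per treatment.
wide = mydt.pivot(index="Ind", columns="Treatment", values="value").reset_index()
wide.columns.name = None  # drop the leftover "Treatment" label on the column axis

# Wide -> long again with melt, the inverse operation.
long = wide.melt(id_vars="Ind", var_name="Treatment", value_name="value")

print(wide)
print(long)
```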

How to select rows with certain value between 2 columns from another DataFrame in pandas?

For example, I have two DataFrames: the first is the one I want to select rows from, and the second contains the criteria for selection.
df1 = pd.DataFrame({'chr': {0: 7, 1: 7, 2: 7, 3: 7, 4: 7, 5: 7, 6: 7},
0: {0: 55241686,
1: 55242415,
2: 55248986,
3: 55259412,
4: 55260459,
5: 55266410,
6: 55268009},
1: {0: 55241736,
1: 55242513,
2: 55249171,
3: 55259567,
4: 55260534,
5: 55266556,
6: 55268064}})
df1
df2 = pd.DataFrame({'chr': {0: 7,
1: 7,
2: 7,
3: 7,
4: 7,
5: 7,
6: 7,
7: 7,
8: 7,
9: 7,
10: 7,
11: 7,
12: 7,
13: 7,
14: 7,
15: 7,
16: 7,
17: 7,
18: 7,
19: 7},
's': {0: 55241646,
1: 55241658,
2: 55241690,
3: 55241718,
4: 55241721,
5: 55241722,
6: 55241727,
7: 55241732,
8: 55242454,
9: 55242457,
10: 55242488,
11: 55242511,
12: 55248991,
13: 55248995,
14: 55248995,
15: 55249000,
16: 55249022,
17: 55249036,
18: 55249053,
19: 55249057},
'e': {0: 55241646,
1: 55241658,
2: 55241690,
3: 55241718,
4: 55241721,
5: 55241722,
6: 55241727,
7: 55241732,
8: 55242454,
9: 55242457,
10: 55242488,
11: 55242511,
12: 55248991,
13: 55248995,
14: 55248995,
15: 55249000,
16: 55249022,
17: 55249036,
18: 55249053,
19: 55249057},
'ref': {0: 'T',
1: 'T',
2: 'A',
3: 'G',
4: 'C',
5: 'G',
6: 'G',
7: 'A',
8: 'G',
9: 'G',
10: 'C',
11: 'G',
12: 'C',
13: 'G',
14: 'G',
15: 'G',
16: 'G',
17: 'G',
18: 'C',
19: 'C'},
'alt': {0: 'C',
1: 'G',
2: 'C',
3: 'A',
4: 'T',
5: 'A',
6: 'A',
7: 'G',
8: 'A',
9: 'A',
10: 'T',
11: 'A',
12: 'G',
13: 'A',
14: 'C',
15: 'A',
16: 'C',
17: 'A',
18: 'G',
19: 'T'}})
df2 here only shows a small part.
df2
what I want to achieve is
For each row in df1 (row_df1), check whether it matches some row in df2 (row_df2), where a match means row_df1['chr'] == row_df2['chr'] & row_df1[0] >= row_df2['s'] & row_df1[1] <= row_df2['e'].
In brief,
if the value falls into one of the intervals defined by df2['s'] and df2['e'], return it.
I believe the best option is to merge both DataFrames first on a common column, in your case "chr". As I understand it, you want all rows of df1 whose 'chr' also exists in df2, so you can do:
merged_df = df1.merge(df2, on='chr', how='left')
In merge you can also pass "indicator=True", which creates a new column called "_merge" indicating the source of each row.
Once the data is merged, you can apply a simple Boolean condition to keep the rows you need:
merged_df.loc[(merged_df[0] >= merged_df['s']) & (merged_df[1] <= merged_df['e'])]
Or you could store the result of the condition in a new column, for example using apply.
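The merge-then-filter idea can be sketched end to end on made-up numbers. The miniature df1 and df2 below are hypothetical, chosen so that some intervals match; the condition is the one stated in the question (the df1 interval lies inside the df2 interval).

```python
import pandas as pd

# Hypothetical miniature versions of the two frames.
df1 = pd.DataFrame({"chr": [7, 7, 7], 0: [100, 250, 400], 1: [150, 260, 450]})
df2 = pd.DataFrame({"chr": [7, 7], "s": [90, 240], "e": [160, 300]})

# Pair every df1 row with every df2 row on the same chromosome.
merged = df1.merge(df2, on="chr", how="inner")

# Keep the pairs where the df1 interval lies inside the df2 interval.
inside = merged[(merged[0] >= merged["s"]) & (merged[1] <= merged["e"])]

# Reduce back to the distinct df1 rows that matched at least once.
selected = inside[["chr", 0, 1]].drop_duplicates().reset_index(drop=True)
print(selected)
```

Merging on 'chr' alone creates one row per (df1 row, df2 row) pair within a chromosome, so memory use grows with the product of the two frame sizes; for large inputs an interval-based lookup may be preferable.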

TypeError: unsupported operand type(s) for &: 'str' and 'bool'

All,
I have the Pandas dataframe below, and I am trying to filter it so that the output shows the country name along with the y1989 column for rows where the value is > 1000000. I am using the code below, but it returns the error shown.
{'Country': {0: 'Austria', 1: 'Belgium', 2: 'Denmark', 3: 'Finland', 4: 'France', 5: 'Germany', 6: 'Iceland', 7: 'Ireland', 8: 'Italy', 9: 'Luxemburg', 10: 'Netherland', 11: 'Norway', 12: 'Portugal', 13: 'Spain', 14: 'Sweden', 15: 'Switzerland', 16: 'United Kingdom'}, 'y1989': {0: 7602431, 1: 9927600, 2: 5129800, 3: 4954359, 4: 56269800, 5: 61715000, 6: 253500, 7: 3526600, 8: 57504700, 9: 374900, 10: 14805240, 11: 4226901, 12: 10304700, 13: 38851900, 14: 8458890, 15: 6619973, 16: 57236200}, 'y1990': {0: 7660345.0, 1: 9947800.0, 2: 5135400.0, 3: 4974383.0, 4: 0.0, 5: 62678000.0, 6: 255708.0, 7: 3505500.0, 8: 57576400.0, 9: 379300.0, 10: 14892574.0, 11: 4241473.0, 12: 0.0, 13: 38924500.0, 14: 8527040.0, 15: 6673850.0, 16: 57410600.0}, 'y1991': {0: 7790957, 1: 9987000, 2: 5146500, 3: 4998478, 4: 56893000, 5: 79753000, 6: 259577, 7: 3519000, 8: 57746200, 9: 384400, 10: 15010445, 11: 4261930, 12: 9858500, 13: 38993800, 14: 8590630, 15: 6750693, 16: 57649200}, 'y1992': {0: 7860800, 1: 10068319, 2: 5162100, 3: 5029300, 4: 57217500, 5: 80238000, 6: 262193, 7: 3542000, 8: 57788200, 9: 389800, 10: 15129200, 11: 4273634, 12: 9846000, 13: 39055900, 14: 8644100, 15: 6831900, 16: 58888800}, 'y1993': {0: 7909575, 1: 10100631, 2: 5180614, 3: 5054982, 4: 57529577, 5: 81338000, 6: 264922, 7: 3559985, 8: 57114161, 9: 395200, 10: 15354000, 11: 4324577, 12: 9987500, 13: 39790955, 14: 8700000, 15: 6871500, 16: 58191230}, 'y1994': {0: 7943652, 1: 10130574, 2: 5191000, 3: 5098754, 4: 57847000, 5: 81353000, 6: 266783, 7: 3570700, 8: 57201800, 9: 400000, 10: 15341553, 11: 4348410, 12: 9776000, 13: 39177400, 14: 8749000, 15: 7021200, 16: 58380000}, 'y1995': {0: 8054800, 1: 10143047, 2: 5251027, 3: 5116800, 4: 58265400, 5: 81845000, 6: 267806, 7: 3591200, 8: 57268578, 9: 412800, 10: 15492800, 11: 4370000, 12: 9920800, 13: 39241900, 14: 8837000, 15: 7060400, 16: 58684000}}
My code
df[(df.Country)& (df.y1989>1000000)]
Error:
TypeError: unsupported operand type(s) for &: 'str' and 'bool'
I am not sure what the reason could be. I am new to Python, so an explanation of the error would be greatly appreciated.
Thanks in advance,
'Country' doesn't form part of your filtering criteria, so don't use it to form your Boolean indexer. Instead, use the loc accessor to give a Boolean condition and specify necessary columns separately:
res = df.loc[df['y1989'] > 1000000, ['Country','y1989']]
Under no circumstances use chained indexing, e.g. via df[df['y1989']>1000000][['Country','y1989']], as this is ambiguous and explicitly discouraged in the docs.
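As for the error itself: & combines Boolean masks element by element, but df.Country holds strings rather than True/False values, so pandas ends up applying & between a str and a bool. A minimal sketch of the loc approach on a four-row subset of the question's data:

```python
import pandas as pd

# Four rows taken from the question's data.
df = pd.DataFrame({
    "Country": ["Austria", "Iceland", "Luxemburg", "Germany"],
    "y1989": [7602431, 253500, 374900, 61715000],
    "y1990": [7660345.0, 255708.0, 379300.0, 62678000.0],
})

# Boolean mask: one True/False value per row.
mask = df["y1989"] > 1000000

# The mask selects the rows, the list of names selects the columns.
res = df.loc[mask, ["Country", "y1989"]]
print(res)
```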
