I am using the networkx package in Python and I have a dataframe.
(Sample dataframe)
from to count
v0 v1 0.1
v0 v2 0.15
v0 v3 0.15
v0 v4 0.25
v0 v5 0.15
and so on..
Sample picture (weighted directed graph)
That is my dataframe.
{'grad': {0: 'CUHK', 1: 'CUHK', 2: 'CUHK', 3: 'CUHK', 4: 'CUHK', 5: 'CityU', 6: 'CityU', 7: 'CityU', 8: 'CityU', 9: 'HKU', 10: 'HKU', 11: 'HKU', 12: 'HKUST', 13: 'HKUST', 14: 'HKUST', 15: 'HKUST', 16: 'HKUST', 17: 'HKUST', 18: 'Low Frequency', 19: 'Low Frequency', 20: 'Low Frequency', 21: 'Low Frequency', 22: 'Low Frequency', 23: 'Low Frequency', 24: 'PolyU', 25: 'PolyU', 26: 'PolyU', 27: 'PolyU'}, 'to': {0: 'CUHK', 1: 'CityU', 2: 'HKU', 3: 'LingU', 4: 'PolyU', 5: 'CityU', 6: 'HKU', 7: 'LingU', 8: 'PolyU', 9: 'CityU', 10: 'HKU', 11: 'PolyU', 12: 'CUHK', 13: 'CityU', 14: 'HKU', 15: 'HKUST', 16: 'LingU', 17: 'PolyU', 18: 'CUHK', 19: 'CityU', 20: 'HKU', 21: 'HKUST', 22: 'LingU', 23: 'PolyU', 24: 'CityU', 25: 'HKU', 26: 'LingU', 27: 'PolyU'}, 'count': {0: 9, 1: 5, 2: 3, 3: 2, 4: 3, 5: 3, 6: 2, 7: 2, 8: 3, 9: 3, 10: 9, 11: 4, 12: 2, 13: 1, 14: 2, 15: 1, 16: 4, 17: 4, 18: 49, 19: 34, 20: 29, 21: 34, 22: 3, 23: 36, 24: 1, 25: 1, 26: 1, 27: 11}}
The ranking principle is: when the weight of Vx -> Vy is bigger than that of Vy -> Vx, Vx has a higher rank than Vy.
e.g. V0 -> V5 = 0.2 and V5 -> V0 = 0.5, so V5 has a higher rank.
Now I am using a brute-force method, which loops over and checks all the relationships. When the condition is met, I change their order in a new list -> {V0,V1,V2,V3,V4,V5,V6,V7}
I want an elegant solution to rank these nodes. Maybe I can get some partial orders like V5>V0 and V0>V1 and use them to form a global order V5>V0>V1, but I don't know how to achieve it. Is there any method better than brute force? Is this related to any famous problem?
One way of doing this would be the following:
import networkx as nx
import pandas as pd
data = {'grad': {0: 'CUHK', 1: 'CUHK', 2: 'CUHK', 3: 'CUHK', 4: 'CUHK', 5: 'CityU', 6: 'CityU', 7: 'CityU', 8: 'CityU', 9: 'HKU', 10: 'HKU', 11: 'HKU', 12: 'HKUST', 13: 'HKUST', 14: 'HKUST', 15: 'HKUST', 16: 'HKUST', 17: 'HKUST', 18: 'Low Frequency', 19: 'Low Frequency', 20: 'Low Frequency', 21: 'Low Frequency', 22: 'Low Frequency', 23: 'Low Frequency', 24: 'PolyU', 25: 'PolyU', 26: 'PolyU', 27: 'PolyU'},
'to': {0: 'CUHK', 1: 'CityU', 2: 'HKU', 3: 'LingU', 4: 'PolyU', 5: 'CityU', 6: 'HKU', 7: 'LingU', 8: 'PolyU', 9: 'CityU', 10: 'HKU', 11: 'PolyU', 12: 'CUHK', 13: 'CityU', 14: 'HKU', 15: 'HKUST', 16: 'LingU', 17: 'PolyU', 18: 'CUHK', 19: 'CityU', 20: 'HKU', 21: 'HKUST', 22: 'LingU', 23: 'PolyU', 24: 'CityU', 25: 'HKU', 26: 'LingU', 27: 'PolyU'},
'count': {0: 9, 1: 5, 2: 3, 3: 2, 4: 3, 5: 3, 6: 2, 7: 2, 8: 3, 9: 3, 10: 9, 11: 4, 12: 2, 13: 1, 14: 2, 15: 1, 16: 4, 17: 4, 18: 49, 19: 34, 20: 29, 21: 34, 22: 3, 23: 36, 24: 1, 25: 1, 26: 1, 27: 11}}
df = pd.DataFrame(data)
# Build a weighted directed graph from the edge list
G = nx.from_pandas_edgelist(df, 'grad', 'to', edge_attr='count', create_using=nx.DiGraph())
# Compute weighted PageRank and sort the nodes by score, highest first
pagerank = nx.pagerank(G, weight='count')
sorted_pagerank = sorted(pagerank.items(), key=lambda x: x[1], reverse=True)
This returns a list of tuples with the node and its PageRank score, sorted in descending order of the PageRank score.
[('PolyU', 0.4113039270586079),
('HKU', 0.1945013448661985),
('CityU', 0.14888513201115303),
('LingU', 0.09978025157613143),
('CUHK', 0.07069262490080512),
('HKUST', 0.041291981078138223),
('Low Frequency', 0.03354473850896578)]
If you want to visualize this on the graph:
import matplotlib.pyplot as plt
import networkx as nx
G = nx.from_pandas_edgelist(df, 'grad', 'to', edge_attr='count', create_using=nx.DiGraph())
pagerank = nx.pagerank(G, weight='count')
sorted_pagerank = sorted(pagerank.items(), key=lambda x: x[1], reverse=True)
pos = nx.spring_layout(G)
# Scale the PageRank scores (which sum to 1) up so the node sizes are visible
nx.draw(G, pos, with_labels=True, node_color='skyblue', node_size=[v * 10000 for v in pagerank.values()])
labels = nx.get_edge_attributes(G, 'count')
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()
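On the "famous problem" part of the question: turning consistent pairwise comparisons into a global order is topological sorting; when the comparisons contain cycles, no total order exists, which is why a score-based method like the PageRank above is useful. A minimal sketch, using hypothetical weights (not the asker's real data):

```python
import networkx as nx

# Hypothetical pairwise weights for illustration: w[(x, y)] is the
# weight of the edge x -> y.
w = {('V0', 'V5'): 0.2, ('V5', 'V0'): 0.5,
     ('V0', 'V1'): 0.4, ('V1', 'V0'): 0.1}

# Dominance graph: draw x -> y whenever x outranks y, i.e. w(x,y) > w(y,x)
D = nx.DiGraph()
for (x, y), wxy in w.items():
    if wxy > w.get((y, x), 0):
        D.add_edge(x, y)

# If the dominance relation is acyclic, a topological sort gives a global
# order consistent with every pairwise comparison
order = list(nx.topological_sort(D))
print(order)  # → ['V5', 'V0', 'V1']
```

If `topological_sort` raises `NetworkXUnfeasible`, the pairwise preferences are cyclic and only an approximate ranking (e.g. PageRank) is possible.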
I am trying to create boxplots for 24 hours, each hour already having the maxValue, quartile75, median, quartile25 and minValue. Those values are stored in a dataframe; I put them into a dict:
{'hour': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23},
'minValue': {0: -491.69,
1: -669.49,
2: -551.22,
3: -514.2,
4: -506.94,
5: -665.7,
6: -484.89,
7: -488.99,
8: -524.22,
9: -851.9,
10: -610.0,
11: -998.8,
12: -580.57,
13: -737.22,
14: -895.2,
15: -500.0,
16: -852.0,
17: -610.0,
18: -500.0,
19: -610.0,
20: -1000.0,
21: -674.0,
22: -1005.0,
23: -499.33},
'quartile25': {0: 114.94,
1: 119.29,
2: 128.8,
3: 139.8,
4: 151.48,
5: 146.75,
6: 139.1,
7: 125.02,
8: 110.0,
9: 105.0,
10: 94.9,
11: 92.81,
12: 107.62,
13: 134.5,
14: 150.8,
15: 168.51,
16: 175.71,
17: 163.0,
18: 142.57,
19: 139.3,
20: 139.45,
21: 120.68,
22: 116.89,
23: 112.84},
'median': {0: 188.53,
1: 193.2,
2: 206.6,
3: 222.2,
4: 234.58,
5: 227.68,
6: 218.32,
7: 200.93,
8: 190.92,
9: 182.6,
10: 175.01,
11: 176.87,
12: 192.33,
13: 210.38,
14: 227.0,
15: 243.87,
16: 252.1,
17: 245.45,
18: 226.86,
19: 219.6,
20: 209.09,
21: 192.32,
22: 187.4,
23: 184.94},
'quartile75': {0: 292.1,
1: 295.33,
2: 316.62,
3: 340.8,
4: 357.0,
5: 345.3,
6: 330.4,
7: 305.28,
8: 290.4,
9: 280.1,
10: 268.23,
11: 270.99,
12: 301.84,
13: 321.04,
14: 345.61,
15: 373.84,
16: 393.39,
17: 382.79,
18: 359.89,
19: 341.55,
20: 325.5,
21: 292.1,
22: 287.2,
23: 285.96},
'maxValue': {0: 2420.3,
1: 1450.0,
2: 2852.0,
3: 7300.0,
4: 3967.0,
5: 3412.1,
6: 6999.99,
7: 2999.99,
8: 6000.0,
9: 3000.0,
10: 8885.9,
11: 9999.0,
12: 6254.0,
13: 2300.0,
14: 2057.58,
15: 2860.0,
16: 5000.0,
17: 4151.01,
18: 7000.0,
19: 3000.0,
20: 6000.0,
21: 3000.5,
22: 2000.0,
23: 2500.0}}
When I used a normal time series data set I plotted like this:
import numpy as np
import plotly.graph_objects as go

N = 24
c = ['hsl(' + str(h) + ',50%,50%)' for h in np.linspace(0, 360, N)]
fig = go.Figure(data=[go.Box(
    x=hour_dataframes[i]['hour'],
    y=hour_dataframes[i]['priceNum'],
    marker_color=c[i]
) for i in range(N)])
fig.update_layout(
    xaxis=dict(showgrid=True, zeroline=True, showticklabels=True),
    yaxis=dict(zeroline=True, gridcolor='white'),
    paper_bgcolor='rgb(233,233,233)',
    plot_bgcolor='rgb(233,233,233)',
    autosize=False,
    width=1500,
    height=1000,
)
fig.show()
It worked fine, but the data set became too big and JupyterLab started crashing, so I pulled aggregated data. Now I don't know how to plot multiple boxes (like the code above does) using the exact box-plot values.
I have a dataframe with the fields "nome", "acesspoint", "dia", "momento", "latitude" and "longitude". The field "momento" is a 15-minute interval.
I need to count the number of users I have in each location according to the "dia" and "momento".
Example: On 03/06/2022 at 08:00 at DCF I have 2 users (bruno and Thiago). On 03/06/2022 at 08:15 at DCF I have 2 users. On 03/06/2022 at 08:00 at DCC I have 1 user (Maria).
Output for the above example:
print(weight_list_access)
[-21.22604, -44.97349, 2], [-21.22604, -44.97349, 2], [-21.22780, -44.97850, 1]
#[latitude, longitude, counter]
Dict:
dicionario = {'nome': {0: ' bruno', 1: ' bruno', 2: ' bruno', 3: ' bruno', 4: ' bruno', 5: ' bruno', 6: ' bruno', 7: ' bruno', 8: ' Thiago', 9: ' Thiago', 10: ' Thiago', 11: ' Thiago', 12: ' Thiago', 13: ' Thiago', 14: ' Thiago', 15: ' Thiago', 16: ' Maria', 17: ' Maria', 18: ' Maria', 19: ' Maria', 20: ' Maria', 21: ' Maria', 22: ' Maria', 23: ' Maria', 24: ' Thiago', 25: ' Thiago', 26: ' Thiago', 27: ' Thiago', 28: ' Thiago', 29: ' Thiago', 30: ' Thiago', 31: ' Thiago'}, 'acesspoint': {0: 'DCF', 1: 'DCF', 2: 'DCF', 3: 'DCF', 4: 'DCF', 5: 'DCF', 6: 'DCF', 7: 'DCF', 8: 'DCF', 9: 'DCF', 10: 'DCF', 11: 'DCF', 12: 'DCF', 13: 'DCF', 14: 'DCF', 15: 'DCF', 16: 'DCC', 17: 'DCC', 18: 'DCC', 19: 'DCC', 20: 'DCC', 21: 'DCC', 22: 'DCC', 23: 'DCC', 24: 'DEX', 25: 'DEX', 26: 'DEX', 27: 'DEX', 28: 'DEX', 29: 'DEX', 30: 'DEX', 31: 'DEX'}, 'dia': {0: '03/06/2022', 1: '03/06/2022', 2: '03/06/2022', 3: '03/06/2022', 4: '03/06/2022', 5: '03/06/2022', 6: '03/06/2022', 7: '03/06/2022', 8: '03/06/2022', 9: '03/06/2022', 10: '03/06/2022', 11: '03/06/2022', 12: '03/06/2022', 13: '03/06/2022', 14: '03/06/2022', 15: '03/06/2022', 16: '03/06/2022', 17: '03/06/2022', 18: '03/06/2022', 19: '03/06/2022', 20: '03/06/2022', 21: '03/06/2022', 22: '03/06/2022', 23: '03/06/2022', 24: '04/06/2022', 25: '04/06/2022', 26: '04/06/2022', 27: '04/06/2022', 28: '04/06/2022', 29: '04/06/2022', 30: '04/06/2022', 31: '04/06/2022'}, 'momento': {0: '08:00', 1: '08:30', 2: '08:45', 3: '09:00', 4: '09:15', 5: '09:30', 6: '09:45', 7: '10:00', 8: '08:00', 9: '08:30', 10: '08:45', 11: '09:00', 12: '09:15', 13: '09:30', 14: '09:45', 15: '10:00', 16: '08:00', 17: '08:30', 18: '08:45', 19: '09:00', 20: '09:15', 21: '09:30', 22: '09:45', 23: '10:00', 24: '08:00', 25: '08:30', 26: '08:45', 27: '09:00', 28: '09:15', 29: '09:30', 30: '09:45', 31: '10:00'}, 'Latitude': {0: -21.22604, 1: -21.22604, 2: -21.22604, 3: -21.22604, 4: -21.22604, 5: -21.22604, 6: -21.22604, 7: -21.22604, 8: -21.22604, 9: -21.22604, 10: -21.22604, 
11: -21.22604, 12: -21.22604, 13: -21.22604, 14: -21.22604, 15: -21.22604, 16: -21.2278, 17: -21.2278, 18: -21.2278, 19: -21.2278, 20: -21.2278, 21: -21.2278, 22: -21.2278, 23: -21.2278, 24: -21.22707, 25: -21.22707, 26: -21.22707, 27: -21.22707, 28: -21.22707, 29: -21.22707, 30: -21.22707, 31: -21.22707}, 'Longitude': {0: -44.97349, 1: -44.97349, 2: -44.97349, 3: -44.97349, 4: -44.97349, 5: -44.97349, 6: -44.97349, 7: -44.97349, 8: -44.97349, 9: -44.97349, 10: -44.97349, 11: -44.97349, 12: -44.97349, 13: -44.97349, 14: -44.97349, 15: -44.97349, 16: -44.9785, 17: -44.9785, 18: -44.9785, 19: -44.9785, 20: -44.9785, 21: -44.9785, 22: -44.9785, 23: -44.9785, 24: -44.97849, 25: -44.97849, 26: -44.97849, 27: -44.97849, 28: -44.97849, 29: -44.97849, 30: -44.97849, 31: -44.97849}}
I used the following code:
df_acesso = pd.DataFrame(dicionario)
weight_list_access = []
df_acesso['counter'] = 1
for x in df_acesso['dia'].sort_values().unique():
    weight_list_access.append(
        df_acesso.loc[df_acesso['dia'] == x, ['Latitude', 'Longitude', 'counter']]
                 .groupby(['Latitude', 'Longitude'])
                 .sum()
                 .reset_index()
                 .values.tolist()
    )
With this code I am counting all the connections of the day ("dia") without considering the "momento" field (time interval). I tried doing it with nested for loops over the "momento" field, but it did not work.
How can I do this?
Does this do what you need?
df = pd.DataFrame(dicionario)
df["coords"] = pd.Series(zip(df["Latitude"], df["Longitude"]))
df.groupby(["dia", "momento"])["coords"].value_counts().rename("count").reset_index()
Result
dia momento coords count
0 03/06/2022 08:00 (-21.22604, -44.97349) 2
1 03/06/2022 08:00 (-21.2278, -44.9785) 1
2 03/06/2022 08:30 (-21.22604, -44.97349) 2
3 03/06/2022 08:30 (-21.2278, -44.9785) 1
4 03/06/2022 08:45 (-21.22604, -44.97349) 2
5 03/06/2022 08:45 (-21.2278, -44.9785) 1
6 03/06/2022 09:00 (-21.22604, -44.97349) 2
...
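If the [latitude, longitude, counter] list-of-lists from the question is still needed, one per ("dia", "momento") slot, a possible follow-up, sketched here on a hypothetical three-user subset of the data:

```python
import pandas as pd

# Hypothetical three-user subset of the data above, enough to show the shape
df = pd.DataFrame({
    'nome':      ['bruno', 'Thiago', 'Maria', 'bruno', 'Thiago'],
    'dia':       ['03/06/2022'] * 5,
    'momento':   ['08:00', '08:00', '08:00', '08:30', '08:30'],
    'Latitude':  [-21.22604, -21.22604, -21.2278, -21.22604, -21.22604],
    'Longitude': [-44.97349, -44.97349, -44.9785, -44.97349, -44.97349],
})

# Count users per location within each (dia, momento) slot
counts = (df.groupby(['dia', 'momento', 'Latitude', 'Longitude'])
            .size()
            .rename('counter')
            .reset_index())

# One [latitude, longitude, counter] list per (dia, momento) slot
weight_list_access = [
    grp[['Latitude', 'Longitude', 'counter']].values.tolist()
    for _, grp in counts.groupby(['dia', 'momento'])
]
print(weight_list_access)
```

Note that .values promotes the integer counter to float, as in the original code; cast back with int() if the distinction matters.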
I am currently working on a problem that involves iterating over a large dataset of measurements spread out over distance (each measurement has a given lat/lon). I'm looking for a more Pythonic/efficient way to solve this than what I have now, since Jupyter Notebook doesn't finish running after around 40000 iterations (I need around 300000).
My current solution is the following code, where the window size is 100 m:
from math import radians, sin, cos, atan2, sqrt
import math

import pandas as pd
from scipy import stats
from scipy.optimize import curve_fit

for m in range(6):
    co_means = list(dfs[m]['co'])
    dates = list(pd.to_datetime(dfs[m]['gps_time']))
    dfs[m]['co'] = dfs[m]['co'] * 1000
    R = 6373.0  # Earth radius in km
    for i in range(len(co_means) - 3):
        current_list_co2 = []
        current_list_co = []
        k = i
        lat1 = radians(dfs[m]['lat'][i])
        lon1 = radians(dfs[m]['lon'][i])
        if dates[k] != dates[-1]:
            distance = 0
            # gather all points within 100 m of point i
            while distance < 100:
                lat2 = radians(dfs[m]['lat'][k])
                lon2 = radians(dfs[m]['lon'][k])
                dlon = lon2 - lon1
                dlat = lat2 - lat1
                a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
                c = 2 * atan2(sqrt(a), sqrt(1 - a))
                distance = R * c * 1000  # haversine distance in metres
                if distance < 100:
                    current_list_co2.append(dfs[m]['co2d'][k])
                    current_list_co.append(dfs[m]['co'][k])
                    k += 1
                    if dates[k] == dates[-1]:
                        break
        # only do calculations for windows that aren't empty
        if len(current_list_co2) != 0 and len(current_list_co) != 0:
            results.write("%s" % (dates[i]))
            # for the ratio of co:co2
            a_fit, cov = curve_fit(linear_function, current_list_co2, current_list_co)
            y_int = a_fit[0]
            slope = a_fit[1]
            err_yint = math.sqrt(cov[0][0])
            err_slope = math.sqrt(cov[1][1])
            # to find r2:
            z = stats.linregress(current_list_co2, current_list_co)
            r2 = z[2]**2
            x_list = list(range(1, len(current_list_co2) + 1))
            # for linregress parameters for co2 and co individually
            a_fit_co2, cov_co2 = curve_fit(linear_function, x_list, current_list_co2)
            a_fit_co, cov_co = curve_fit(linear_function, x_list, current_list_co)
Any help would be greatly appreciated!
Edit: Sample of the dataset, as a dict:
{'co': {0: 425.07144266999995, 1: 425.06915346999995, 2: 425.06915346999995, 3: 433.21636567, 4: 433.21636567,
5: 433.21803501999995, 6: 433.21803501999995, 7: 411.10666247, 8: 411.10666247, 9: 411.38779539999996,
10: 411.38779539999996, 11: 420.62025938000005, 12: 420.62025938000005, 13: 421.1036325, 14: 421.1036325,
15: 413.96486982000005, 16: 413.96486982000005, 17: 413.44999135, 18: 413.44999135, 19: 408.73726959},
'gps_time': {0: '2019-11-18 14:37:51.000000', 1: '2019-11-18 14:37:51.000000', 2: '2019-11-18 14:37:52.000000',
3: '2019-11-18 14:37:53.000000', 4: '2019-11-18 14:37:54.000000', 5: '2019-11-18 14:37:54.000000',
6: '2019-11-18 14:37:55.000000', 7: '2019-11-18 14:37:56.000000', 8: '2019-11-18 14:37:56.000000',
9: '2019-11-18 14:37:57.000000', 10: '2019-11-18 14:37:57.000000', 11: '2019-11-18 14:37:58.000000',
12: '2019-11-18 14:37:59.000000', 13: '2019-11-18 14:38:00.000000', 14: '2019-11-18 14:38:00.000000',
15: '2019-11-18 14:38:01.000000', 16: '2019-11-18 14:38:02.000000', 17: '2019-11-18 14:38:02.000000',
18: '2019-11-18 14:38:03.000000', 19: '2019-11-18 14:38:04.000000'},
'lat': {0: 45.5052230462, 1: 45.5052230462, 2: 45.5052236012, 3: 45.5052241548, 4: 45.5052247083, 5: 45.5052247083,
6: 45.505224740900005, 7: 45.505224193000004, 8: 45.505224193000004, 9: 45.5052236451,
10: 45.5052236451, 11: 45.5052230897, 12: 45.5052225243, 13: 45.505221958999996, 14: 45.505221958999996,
15: 45.5052211427, 16: 45.505220058, 17: 45.505220058, 18: 45.505218973299996, 19: 45.5052183333},
'lon': {0: -73.5761855743, 1: -73.5761855743, 2: -73.576185, 3: -73.576185, 4: -73.576185, 5: -73.576185,
6: -73.5761855183, 7: -73.5761866141, 8: -73.5761866141, 9: -73.5761877098, 10: -73.5761877098,
11: -73.576188577, 12: -73.5761891424, 13: -73.57618970770001, 14: -73.57618970770001,
15: -73.5761894761, 16: -73.57618839140001, 17: -73.57618839140001, 18: -73.5761873066, 19: -73.5761857775},
'co2d': {0: 380.58647938, 1: 381.44674445, 2: 451.67041972, 3: 451.67041972, 4: 451.66555392, 5: 451.66555392,
6: 456.29788806, 7: 456.29788806, 8: 456.29412627, 9: 456.29412627, 10: 520.61774288, 11: 520.61774288,
12: 520.62904898, 13: 520.62904898, 14: 630.97037738, 15: 630.97037738, 16: 630.9919346, 17: 630.9919346,
18: 512.76133406, 19: 512.76133406}}
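One direction worth trying (a sketch under the same haversine formula, not the asker's exact code): compute all distances from an anchor point with NumPy in one vectorized call, so the inner while loop collapses into a boolean mask, and the curve fits are then applied to the masked values. haversine_m is a hypothetical helper name:

```python
import numpy as np

def haversine_m(lat1, lon1, lats, lons, R=6373.0):
    """Distances in metres from one point to arrays of points, vectorized."""
    lat1, lon1 = np.radians(lat1), np.radians(lon1)
    lats = np.radians(np.asarray(lats))
    lons = np.radians(np.asarray(lons))
    dlat = lats - lat1
    dlon = lons - lon1
    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lats) * np.sin(dlon / 2)**2
    return R * 1000 * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

# Three sample points: the first two from the dict above, the third far away
lats = [45.5052230462, 45.5052236012, 45.6]
lons = [-73.5761855743, -73.576185, -73.6]

# For each anchor point, the 100 m window is now a single boolean mask,
# replacing the inner while loop entirely
mask = haversine_m(lats[0], lons[0], lats, lons) < 100
print(mask)  # → [ True  True False]
```

With the lat/lon columns pulled out once as NumPy arrays, this turns the per-point inner loop into O(1) vectorized array operations, which is usually the difference between minutes and hours at 300000 rows.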
All,
I have the Pandas dataframe below, and I am trying to filter it so that the output displays the country name along with the year-1989 column for rows whose value is > 1000000. For this I am using the code below, but it is returning the error shown.
{'Country': {0: 'Austria', 1: 'Belgium', 2: 'Denmark', 3: 'Finland', 4: 'France', 5: 'Germany', 6: 'Iceland', 7: 'Ireland', 8: 'Italy', 9: 'Luxemburg', 10: 'Netherland', 11: 'Norway', 12: 'Portugal', 13: 'Spain', 14: 'Sweden', 15: 'Switzerland', 16: 'United Kingdom'}, 'y1989': {0: 7602431, 1: 9927600, 2: 5129800, 3: 4954359, 4: 56269800, 5: 61715000, 6: 253500, 7: 3526600, 8: 57504700, 9: 374900, 10: 14805240, 11: 4226901, 12: 10304700, 13: 38851900, 14: 8458890, 15: 6619973, 16: 57236200}, 'y1990': {0: 7660345.0, 1: 9947800.0, 2: 5135400.0, 3: 4974383.0, 4: 0.0, 5: 62678000.0, 6: 255708.0, 7: 3505500.0, 8: 57576400.0, 9: 379300.0, 10: 14892574.0, 11: 4241473.0, 12: 0.0, 13: 38924500.0, 14: 8527040.0, 15: 6673850.0, 16: 57410600.0}, 'y1991': {0: 7790957, 1: 9987000, 2: 5146500, 3: 4998478, 4: 56893000, 5: 79753000, 6: 259577, 7: 3519000, 8: 57746200, 9: 384400, 10: 15010445, 11: 4261930, 12: 9858500, 13: 38993800, 14: 8590630, 15: 6750693, 16: 57649200}, 'y1992': {0: 7860800, 1: 10068319, 2: 5162100, 3: 5029300, 4: 57217500, 5: 80238000, 6: 262193, 7: 3542000, 8: 57788200, 9: 389800, 10: 15129200, 11: 4273634, 12: 9846000, 13: 39055900, 14: 8644100, 15: 6831900, 16: 58888800}, 'y1993': {0: 7909575, 1: 10100631, 2: 5180614, 3: 5054982, 4: 57529577, 5: 81338000, 6: 264922, 7: 3559985, 8: 57114161, 9: 395200, 10: 15354000, 11: 4324577, 12: 9987500, 13: 39790955, 14: 8700000, 15: 6871500, 16: 58191230}, 'y1994': {0: 7943652, 1: 10130574, 2: 5191000, 3: 5098754, 4: 57847000, 5: 81353000, 6: 266783, 7: 3570700, 8: 57201800, 9: 400000, 10: 15341553, 11: 4348410, 12: 9776000, 13: 39177400, 14: 8749000, 15: 7021200, 16: 58380000}, 'y1995': {0: 8054800, 1: 10143047, 2: 5251027, 3: 5116800, 4: 58265400, 5: 81845000, 6: 267806, 7: 3591200, 8: 57268578, 9: 412800, 10: 15492800, 11: 4370000, 12: 9920800, 13: 39241900, 14: 8837000, 15: 7060400, 16: 58684000}}
My code:
df[(df.Country) & (df.y1989 > 1000000)]
Error:
TypeError: unsupported operand type(s) for &: 'str' and 'bool'
I am not sure what the reason could be. Being a newbie to Python, I would greatly appreciate an explanation of the error.
Thanks in advance,
'Country' doesn't form part of your filtering criteria, so don't use it in your Boolean indexer. Instead, use the loc accessor to give a Boolean condition and specify the necessary columns separately:
res = df.loc[df['y1989'] > 1000000, ['Country','y1989']]
Avoid chained indexing, e.g. via df[df['y1989'] > 1000000][['Country', 'y1989']], as this is ambiguous and explicitly discouraged in the docs.
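For example, on a small subset of the data above:

```python
import pandas as pd

# Small subset of the data from the question
df = pd.DataFrame({
    'Country': ['Austria', 'Iceland', 'Luxemburg'],
    'y1989':   [7602431, 253500, 374900],
})

# The Boolean condition selects rows; the column list selects columns
res = df.loc[df['y1989'] > 1000000, ['Country', 'y1989']]
print(res)  # only Austria exceeds 1000000
```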