I am using the networkx package in Python and I have a dataframe:
(Sample dataframe)
from to count
v0 v1 0.1
v0 v2 0.15
v0 v3 0.15
v0 v4 0.25
v0 v5 0.15
and so on..
Sample picture (weighted directed graph)
This is my dataframe:
{'grad': {0: 'CUHK', 1: 'CUHK', 2: 'CUHK', 3: 'CUHK', 4: 'CUHK', 5: 'CityU', 6: 'CityU', 7: 'CityU', 8: 'CityU', 9: 'HKU', 10: 'HKU', 11: 'HKU', 12: 'HKUST', 13: 'HKUST', 14: 'HKUST', 15: 'HKUST', 16: 'HKUST', 17: 'HKUST', 18: 'Low Frequency', 19: 'Low Frequency', 20: 'Low Frequency', 21: 'Low Frequency', 22: 'Low Frequency', 23: 'Low Frequency', 24: 'PolyU', 25: 'PolyU', 26: 'PolyU', 27: 'PolyU'}, 'to': {0: 'CUHK', 1: 'CityU', 2: 'HKU', 3: 'LingU', 4: 'PolyU', 5: 'CityU', 6: 'HKU', 7: 'LingU', 8: 'PolyU', 9: 'CityU', 10: 'HKU', 11: 'PolyU', 12: 'CUHK', 13: 'CityU', 14: 'HKU', 15: 'HKUST', 16: 'LingU', 17: 'PolyU', 18: 'CUHK', 19: 'CityU', 20: 'HKU', 21: 'HKUST', 22: 'LingU', 23: 'PolyU', 24: 'CityU', 25: 'HKU', 26: 'LingU', 27: 'PolyU'}, 'count': {0: 9, 1: 5, 2: 3, 3: 2, 4: 3, 5: 3, 6: 2, 7: 2, 8: 3, 9: 3, 10: 9, 11: 4, 12: 2, 13: 1, 14: 2, 15: 1, 16: 4, 17: 4, 18: 49, 19: 34, 20: 29, 21: 34, 22: 3, 23: 36, 24: 1, 25: 1, 26: 1, 27: 11}}
The ranking principle is: when the weight of Vx -> Vy is bigger than the weight of Vy -> Vx, Vx has a higher rank than Vy.
e.g. V0 -> V5 = 0.2 and V5 -> V0 = 0.5, so V5 has a higher rank.
Right now I use a brute-force method that loops over all the relationships and checks each one. When the condition is met, I change the order of the two nodes in a new list, e.g. {V0,V1,V2,V3,V4,V5,V6,V7}.
I want an elegant solution to rank these nodes. Maybe I can get some partial orders like V5>V0 and V0>V1 and use them to form a global order V5>V0>V1, but I don't know how to achieve that. Is there any method better than brute force? Is this related to any well-known problem?
One way of doing this is to rank the nodes by their PageRank score, using count as the edge weight:
import networkx as nx
import pandas as pd
data = {'grad': {0: 'CUHK', 1: 'CUHK', 2: 'CUHK', 3: 'CUHK', 4: 'CUHK', 5: 'CityU', 6: 'CityU', 7: 'CityU', 8: 'CityU', 9: 'HKU', 10: 'HKU', 11: 'HKU', 12: 'HKUST', 13: 'HKUST', 14: 'HKUST', 15: 'HKUST', 16: 'HKUST', 17: 'HKUST', 18: 'Low Frequency', 19: 'Low Frequency', 20: 'Low Frequency', 21: 'Low Frequency', 22: 'Low Frequency', 23: 'Low Frequency', 24: 'PolyU', 25: 'PolyU', 26: 'PolyU', 27: 'PolyU'},
'to': {0: 'CUHK', 1: 'CityU', 2: 'HKU', 3: 'LingU', 4: 'PolyU', 5: 'CityU', 6: 'HKU', 7: 'LingU', 8: 'PolyU', 9: 'CityU', 10: 'HKU', 11: 'PolyU', 12: 'CUHK', 13: 'CityU', 14: 'HKU', 15: 'HKUST', 16: 'LingU', 17: 'PolyU', 18: 'CUHK', 19: 'CityU', 20: 'HKU', 21: 'HKUST', 22: 'LingU', 23: 'PolyU', 24: 'CityU', 25: 'HKU', 26: 'LingU', 27: 'PolyU'},
'count': {0: 9, 1: 5, 2: 3, 3: 2, 4: 3, 5: 3, 6: 2, 7: 2, 8: 3, 9: 3, 10: 9, 11: 4, 12: 2, 13: 1, 14: 2, 15: 1, 16: 4, 17: 4, 18: 49, 19: 34, 20: 29, 21: 34, 22: 3, 23: 36, 24: 1, 25: 1, 26: 1, 27: 11}}
df = pd.DataFrame(data)
G = nx.from_pandas_edgelist(df, 'grad', 'to', edge_attr='count', create_using=nx.DiGraph())
pagerank = nx.pagerank(G, weight='count')
sorted_pagerank = sorted(pagerank.items(), key=lambda x: x[1], reverse=True)
This returns a list of tuples with the node and its PageRank score, sorted in descending order of the PageRank score.
[('PolyU', 0.4113039270586079),
('HKU', 0.1945013448661985),
('CityU', 0.14888513201115303),
('LingU', 0.09978025157613143),
('CUHK', 0.07069262490080512),
('HKUST', 0.041291981078138223),
('Low Frequency', 0.03354473850896578)]
If you want this with the graph:
import matplotlib.pyplot as plt
import networkx as nx
G = nx.from_pandas_edgelist(df, 'grad', 'to', edge_attr='count', create_using=nx.DiGraph())
pagerank = nx.pagerank(G, weight='count')
sorted_pagerank = sorted(pagerank.items(), key=lambda x: x[1], reverse=True)
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color='skyblue', node_size=[v * 100 for v in pagerank.values()])
labels = nx.get_edge_attributes(G,'count')
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()
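The pairwise rule from the question (Vx outranks Vy when the weight of Vx -> Vy exceeds that of Vy -> Vx) can also be applied directly, as an alternative to PageRank. The following is only a minimal sketch under assumptions of my own: the helper name dominance_order is hypothetical, a missing reverse edge is treated as weight 0, self-loops are ignored, and the function returns None when the pairwise wins form a cycle, because no consistent global order exists in that case. Choosing the "least bad" order for cyclic data is related to the minimum feedback arc set problem.

```python
import networkx as nx
import pandas as pd


def dominance_order(df, source, target, weight):
    """Return nodes ordered by pairwise wins, or None if the wins are cyclic."""
    G = nx.from_pandas_edgelist(df, source, target, edge_attr=weight,
                                create_using=nx.DiGraph())
    D = nx.DiGraph()  # edge u -> v means "u outranks v"
    D.add_nodes_from(G.nodes)
    for u, v, w in G.edges(data=weight):
        if u == v:
            continue  # self-loops carry no pairwise information
        # Assumption: a missing reverse edge counts as weight 0
        back = G[v][u][weight] if G.has_edge(v, u) else 0
        if w > back:
            D.add_edge(u, v)
    if not nx.is_directed_acyclic_graph(D):
        return None  # cyclic wins: no consistent global order
    return list(nx.topological_sort(D))


# Small illustrative frame (values made up for the example)
edges = pd.DataFrame({
    "from":  ["V0", "V5", "V0", "V1"],
    "to":    ["V5", "V0", "V1", "V0"],
    "count": [0.2, 0.5, 0.3, 0.1],
})
print(dominance_order(edges, "from", "to", "count"))  # ['V5', 'V0', 'V1']
```

When several nodes are not comparable, the topological sort returns one valid order out of possibly many, so ties are broken arbitrarily.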
Related
I am trying to create box plots for 24 hours, where each hour already has its maxValue, quartile75, median, quartile25 and minValue. Those values are stored in a dataframe; here they are as a dict:
{'hour': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23},
'minValue': {0: -491.69,
1: -669.49,
2: -551.22,
3: -514.2,
4: -506.94,
5: -665.7,
6: -484.89,
7: -488.99,
8: -524.22,
9: -851.9,
10: -610.0,
11: -998.8,
12: -580.57,
13: -737.22,
14: -895.2,
15: -500.0,
16: -852.0,
17: -610.0,
18: -500.0,
19: -610.0,
20: -1000.0,
21: -674.0,
22: -1005.0,
23: -499.33},
'quartile25': {0: 114.94,
1: 119.29,
2: 128.8,
3: 139.8,
4: 151.48,
5: 146.75,
6: 139.1,
7: 125.02,
8: 110.0,
9: 105.0,
10: 94.9,
11: 92.81,
12: 107.62,
13: 134.5,
14: 150.8,
15: 168.51,
16: 175.71,
17: 163.0,
18: 142.57,
19: 139.3,
20: 139.45,
21: 120.68,
22: 116.89,
23: 112.84},
'median': {0: 188.53,
1: 193.2,
2: 206.6,
3: 222.2,
4: 234.58,
5: 227.68,
6: 218.32,
7: 200.93,
8: 190.92,
9: 182.6,
10: 175.01,
11: 176.87,
12: 192.33,
13: 210.38,
14: 227.0,
15: 243.87,
16: 252.1,
17: 245.45,
18: 226.86,
19: 219.6,
20: 209.09,
21: 192.32,
22: 187.4,
23: 184.94},
'quartile75': {0: 292.1,
1: 295.33,
2: 316.62,
3: 340.8,
4: 357.0,
5: 345.3,
6: 330.4,
7: 305.28,
8: 290.4,
9: 280.1,
10: 268.23,
11: 270.99,
12: 301.84,
13: 321.04,
14: 345.61,
15: 373.84,
16: 393.39,
17: 382.79,
18: 359.89,
19: 341.55,
20: 325.5,
21: 292.1,
22: 287.2,
23: 285.96},
'maxValue': {0: 2420.3,
1: 1450.0,
2: 2852.0,
3: 7300.0,
4: 3967.0,
5: 3412.1,
6: 6999.99,
7: 2999.99,
8: 6000.0,
9: 3000.0,
10: 8885.9,
11: 9999.0,
12: 6254.0,
13: 2300.0,
14: 2057.58,
15: 2860.0,
16: 5000.0,
17: 4151.01,
18: 7000.0,
19: 3000.0,
20: 6000.0,
21: 3000.5,
22: 2000.0,
23: 2500.0}}
When I used a normal time series data set I plotted like this:
import numpy as np
import plotly.graph_objects as go

N = 24
# One HSL color per hour, evenly spaced around the color wheel
c = ['hsl(' + str(h) + ',50%' + ',50%)' for h in np.linspace(0, 360, N)]
# One box per hour, built from the raw (non-aggregated) rows
fig = go.Figure(data=[go.Box(
    x=hour_dataframes[i]['hour'],
    y=hour_dataframes[i]['priceNum'],
    marker_color=c[i]
) for i in range(int(N))])
fig.update_layout(
    xaxis=dict(showgrid=True, zeroline=True, showticklabels=True),
    yaxis=dict(zeroline=True, gridcolor='white'),
    paper_bgcolor='rgb(233,233,233)',
    plot_bgcolor='rgb(233,233,233)',
    autosize=False,
    width=1500,
    height=1000,
)
fig.show()
It worked fine, but the data set became too big and JupyterLab started crashing, so I pulled aggregated data instead. Now I don't know how to plot multiple boxes (as the code above does) from the precomputed box plot values.
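Plotly's go.Box accepts precomputed statistics through its q1, median, q3, lowerfence and upperfence arguments, so the aggregated values can be drawn without the raw rows. Below is a minimal sketch, assuming the dict layout shown above. The helper names box_kwargs_from_stats and make_figure are hypothetical, only the first two hours are repeated here, and using minValue/maxValue as the whisker ends is my assumption rather than something the question specifies.

```python
def box_kwargs_from_stats(stats):
    """Collect precomputed box statistics into keyword arguments for go.Box."""
    rows = sorted(stats["hour"])  # row labels shared by every column

    def column(name):
        return [stats[name][i] for i in rows]

    return dict(
        x=column("hour"),               # one box per hour
        lowerfence=column("minValue"),  # assumption: whiskers end at min/max
        q1=column("quartile25"),
        median=column("median"),
        q3=column("quartile75"),
        upperfence=column("maxValue"),
    )


def make_figure(stats):
    # Imported lazily so the helper above can be used without Plotly installed
    import plotly.graph_objects as go
    return go.Figure(data=[go.Box(**box_kwargs_from_stats(stats))])


# First two hours of the aggregated data from the question
stats = {
    "hour": {0: 0, 1: 1},
    "minValue": {0: -491.69, 1: -669.49},
    "quartile25": {0: 114.94, 1: 119.29},
    "median": {0: 188.53, 1: 193.2},
    "quartile75": {0: 292.1, 1: 295.33},
    "maxValue": {0: 2420.3, 1: 1450.0},
}
kwargs = box_kwargs_from_stats(stats)
```

Calling make_figure(stats).show() would then display the boxes. A single trace shares one color; for one color per hour, one option would be to build a separate go.Box per hour with single-element lists and its own marker_color.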
I have a dataframe with the fields "nome", "acesspoint", "dia", "momento", "latitude" and "longitude". The "momento" field is a 15-minute interval.
I need to count the number of users at each location for each "dia" and "momento".
Example: On 03/06/2022 at 08:00 at DCF I have 2 users (bruno and Thiago). On 03/06/2022 at 08:15 at DCF I have 2 users. On 03/06/2022 at 08:00 at DCC I have 1 user (Maria).
Output for the above example:
print(weight_list_access)
[-21.22604, -44.97349, 2], [-21.22604, -44.97349, 2], [-21.22780, -44.97850, 1]
#[latitude, longitude, counter]
Dict:
dicionario = {'nome': {0: ' bruno', 1: ' bruno', 2: ' bruno', 3: ' bruno', 4: ' bruno', 5: ' bruno', 6: ' bruno', 7: ' bruno', 8: ' Thiago', 9: ' Thiago', 10: ' Thiago', 11: ' Thiago', 12: ' Thiago', 13: ' Thiago', 14: ' Thiago', 15: ' Thiago', 16: ' Maria', 17: ' Maria', 18: ' Maria', 19: ' Maria', 20: ' Maria', 21: ' Maria', 22: ' Maria', 23: ' Maria', 24: ' Thiago', 25: ' Thiago', 26: ' Thiago', 27: ' Thiago', 28: ' Thiago', 29: ' Thiago', 30: ' Thiago', 31: ' Thiago'}, 'acesspoint': {0: 'DCF', 1: 'DCF', 2: 'DCF', 3: 'DCF', 4: 'DCF', 5: 'DCF', 6: 'DCF', 7: 'DCF', 8: 'DCF', 9: 'DCF', 10: 'DCF', 11: 'DCF', 12: 'DCF', 13: 'DCF', 14: 'DCF', 15: 'DCF', 16: 'DCC', 17: 'DCC', 18: 'DCC', 19: 'DCC', 20: 'DCC', 21: 'DCC', 22: 'DCC', 23: 'DCC', 24: 'DEX', 25: 'DEX', 26: 'DEX', 27: 'DEX', 28: 'DEX', 29: 'DEX', 30: 'DEX', 31: 'DEX'}, 'dia': {0: '03/06/2022', 1: '03/06/2022', 2: '03/06/2022', 3: '03/06/2022', 4: '03/06/2022', 5: '03/06/2022', 6: '03/06/2022', 7: '03/06/2022', 8: '03/06/2022', 9: '03/06/2022', 10: '03/06/2022', 11: '03/06/2022', 12: '03/06/2022', 13: '03/06/2022', 14: '03/06/2022', 15: '03/06/2022', 16: '03/06/2022', 17: '03/06/2022', 18: '03/06/2022', 19: '03/06/2022', 20: '03/06/2022', 21: '03/06/2022', 22: '03/06/2022', 23: '03/06/2022', 24: '04/06/2022', 25: '04/06/2022', 26: '04/06/2022', 27: '04/06/2022', 28: '04/06/2022', 29: '04/06/2022', 30: '04/06/2022', 31: '04/06/2022'}, 'momento': {0: '08:00', 1: '08:30', 2: '08:45', 3: '09:00', 4: '09:15', 5: '09:30', 6: '09:45', 7: '10:00', 8: '08:00', 9: '08:30', 10: '08:45', 11: '09:00', 12: '09:15', 13: '09:30', 14: '09:45', 15: '10:00', 16: '08:00', 17: '08:30', 18: '08:45', 19: '09:00', 20: '09:15', 21: '09:30', 22: '09:45', 23: '10:00', 24: '08:00', 25: '08:30', 26: '08:45', 27: '09:00', 28: '09:15', 29: '09:30', 30: '09:45', 31: '10:00'}, 'Latitude': {0: -21.22604, 1: -21.22604, 2: -21.22604, 3: -21.22604, 4: -21.22604, 5: -21.22604, 6: -21.22604, 7: -21.22604, 8: -21.22604, 9: -21.22604, 10: -21.22604, 
11: -21.22604, 12: -21.22604, 13: -21.22604, 14: -21.22604, 15: -21.22604, 16: -21.2278, 17: -21.2278, 18: -21.2278, 19: -21.2278, 20: -21.2278, 21: -21.2278, 22: -21.2278, 23: -21.2278, 24: -21.22707, 25: -21.22707, 26: -21.22707, 27: -21.22707, 28: -21.22707, 29: -21.22707, 30: -21.22707, 31: -21.22707}, 'Longitude': {0: -44.97349, 1: -44.97349, 2: -44.97349, 3: -44.97349, 4: -44.97349, 5: -44.97349, 6: -44.97349, 7: -44.97349, 8: -44.97349, 9: -44.97349, 10: -44.97349, 11: -44.97349, 12: -44.97349, 13: -44.97349, 14: -44.97349, 15: -44.97349, 16: -44.9785, 17: -44.9785, 18: -44.9785, 19: -44.9785, 20: -44.9785, 21: -44.9785, 22: -44.9785, 23: -44.9785, 24: -44.97849, 25: -44.97849, 26: -44.97849, 27: -44.97849, 28: -44.97849, 29: -44.97849, 30: -44.97849, 31: -44.97849}}
I used the following code:
import pandas as pd

df_acesso = pd.DataFrame(dicionario)
weight_list_access = []
df_acesso['counter'] = 1
# For each day, sum the counter per (Latitude, Longitude) pair
for x in df_acesso['dia'].sort_values().unique():
    weight_list_access.append(df_acesso.loc[df_acesso['dia'] == x, ['Latitude', 'Longitude', 'counter']].groupby(['Latitude', 'Longitude']).sum().reset_index().values.tolist())
This code counts all the connections for each day ("dia") without considering the "momento" field (time interval). I tried using nested for loops over the "momento" field, but it did not work.
How can I do this?
Does this do what you need?
df = pd.DataFrame(dicionario)
df["coords"] = pd.Series(zip(df["Latitude"], df["Longitude"]))
df.groupby(["dia", "momento"])["coords"].value_counts().rename("count").reset_index()
Result:
          dia momento                  coords  count
0  03/06/2022   08:00  (-21.22604, -44.97349)      2
1  03/06/2022   08:00    (-21.2278, -44.9785)      1
2  03/06/2022   08:30  (-21.22604, -44.97349)      2
3  03/06/2022   08:30    (-21.2278, -44.9785)      1
4  03/06/2022   08:45  (-21.22604, -44.97349)      2
5  03/06/2022   08:45    (-21.2278, -44.9785)      1
6  03/06/2022   09:00  (-21.22604, -44.97349)      2
...
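If the [latitude, longitude, counter] list format from the question is needed, the grouped counts can be flattened afterwards. This is a minimal sketch on a four-row stand-in for the question's data (the rows are abbreviated by me), and it assumes "number of users" means distinct "nome" values per day, time slot and location:

```python
import pandas as pd

# Abbreviated stand-in for the question's dataframe
df = pd.DataFrame({
    "nome": ["bruno", "Thiago", "Maria", "bruno"],
    "dia": ["03/06/2022"] * 4,
    "momento": ["08:00", "08:00", "08:00", "08:30"],
    "Latitude": [-21.22604, -21.22604, -21.2278, -21.22604],
    "Longitude": [-44.97349, -44.97349, -44.9785, -44.97349],
})

# Distinct users per (day, time slot, location)
counts = (
    df.groupby(["dia", "momento", "Latitude", "Longitude"])["nome"]
      .nunique()
      .reset_index(name="counter")
)

# Flatten to [latitude, longitude, counter], keeping the counter an int
weight_list_access = [
    [lat, lon, int(n)]
    for lat, lon, n in zip(counts["Latitude"], counts["Longitude"], counts["counter"])
]
print(weight_list_access)
```

The groupby sorts by its keys, so rows come out ordered by day, then time slot, then coordinates.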
Here is the dataframe I'm working with in Python. I printed it with this line of code:
print(mtcars.to_dict())
{'Unnamed: 0': {0: 'Mazda RX4', 1: 'Mazda RX4 Wag', 2: 'Datsun 710', 3: 'Hornet 4 Drive', 4: 'Hornet Sportabout', 5: 'Valiant', 6: 'Duster 360', 7: 'Merc 240D', 8: 'Merc 230', 9: 'Merc 280', 10: 'Merc 280C', 11: 'Merc 450SE', 12: 'Merc 450SL', 13: 'Merc 450SLC', 14: 'Cadillac Fleetwood', 15: 'Lincoln Continental', 16: 'Chrysler Imperial', 17: 'Fiat 128', 18: 'Honda Civic', 19: 'Toyota Corolla', 20: 'Toyota Corona', 21: 'Dodge Challenger', 22: 'AMC Javelin', 23: 'Camaro Z28', 24: 'Pontiac Firebird', 25: 'Fiat X1-9', 26: 'Porsche 914-2', 27: 'Lotus Europa', 28: 'Ford Pantera L', 29: 'Ferrari Dino', 30: 'Maserati Bora', 31: 'Volvo 142E'}, 'mpg': {0: 21.0, 1: 21.0, 2: 22.8, 3: 21.4, 4: 18.7, 5: 18.1, 6: 14.3, 7: 24.4, 8: 22.8, 9: 19.2, 10: 17.8, 11: 16.4, 12: 17.3, 13: 15.2, 14: 10.4, 15: 10.4, 16: 14.7, 17: 32.4, 18: 30.4, 19: 33.9, 20: 21.5, 21: 15.5, 22: 15.2, 23: 13.3, 24: 19.2, 25: 27.3, 26: 26.0, 27: 30.4, 28: 15.8, 29: 19.7, 30: 15.0, 31: 21.4}, 'cyl': {0: 6, 1: 6, 2: 4, 3: 6, 4: 8, 5: 6, 6: 8, 7: 4, 8: 4, 9: 6, 10: 6, 11: 8, 12: 8, 13: 8, 14: 8, 15: 8, 16: 8, 17: 4, 18: 4, 19: 4, 20: 4, 21: 8, 22: 8, 23: 8, 24: 8, 25: 4, 26: 4, 27: 4, 28: 8, 29: 6, 30: 8, 31: 4}, 'disp': {0: 160.0, 1: 160.0, 2: 108.0, 3: 258.0, 4: 360.0, 5: 225.0, 6: 360.0, 7: 146.7, 8: 140.8, 9: 167.6, 10: 167.6, 11: 275.8, 12: 275.8, 13: 275.8, 14: 472.0, 15: 460.0, 16: 440.0, 17: 78.7, 18: 75.7, 19: 71.1, 20: 120.1, 21: 318.0, 22: 304.0, 23: 350.0, 24: 400.0, 25: 79.0, 26: 120.3, 27: 95.1, 28: 351.0, 29: 145.0, 30: 301.0, 31: 121.0}, 'hp': {0: 110, 1: 110, 2: 93, 3: 110, 4: 175, 5: 105, 6: 245, 7: 62, 8: 95, 9: 123, 10: 123, 11: 180, 12: 180, 13: 180, 14: 205, 15: 215, 16: 230, 17: 66, 18: 52, 19: 65, 20: 97, 21: 150, 22: 150, 23: 245, 24: 175, 25: 66, 26: 91, 27: 113, 28: 264, 29: 175, 30: 335, 31: 109}, 'drat': {0: 3.9, 1: 3.9, 2: 3.85, 3: 3.08, 4: 3.15, 5: 2.76, 6: 3.21, 7: 3.69, 8: 3.92, 9: 3.92, 10: 3.92, 11: 3.07, 12: 3.07, 13: 3.07, 14: 2.93, 15: 3.0, 16: 3.23, 17: 4.08, 18: 4.93, 19: 
4.22, 20: 3.7, 21: 2.76, 22: 3.15, 23: 3.73, 24: 3.08, 25: 4.08, 26: 4.43, 27: 3.77, 28: 4.22, 29: 3.62, 30: 3.54, 31: 4.11}, 'wt': {0: 2.62, 1: 2.875, 2: 2.32, 3: 3.215, 4: 3.44, 5: 3.46, 6: 3.57, 7: 3.19, 8: 3.15, 9: 3.44, 10: 3.44, 11: 4.07, 12: 3.73, 13: 3.78, 14: 5.25, 15: 5.424, 16: 5.345, 17: 2.2, 18: 1.615, 19: 1.835, 20: 2.465, 21: 3.52, 22: 3.435, 23: 3.84, 24: 3.845, 25: 1.935, 26: 2.14, 27: 1.513, 28: 3.17, 29: 2.77, 30: 3.57, 31: 2.78}, 'qsec': {0: 16.46, 1: 17.02, 2: 18.61, 3: 19.44, 4: 17.02, 5: 20.22, 6: 15.84, 7: 20.0, 8: 22.9, 9: 18.3, 10: 18.9, 11: 17.4, 12: 17.6, 13: 18.0, 14: 17.98, 15: 17.82, 16: 17.42, 17: 19.47, 18: 18.52, 19: 19.9, 20: 20.01, 21: 16.87, 22: 17.3, 23: 15.41, 24: 17.05, 25: 18.9, 26: 16.7, 27: 16.9, 28: 14.5, 29: 15.5, 30: 14.6, 31: 18.6}, 'vs': {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1, 6: 0, 7: 1, 8: 1, 9: 1, 10: 1, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 1, 18: 1, 19: 1, 20: 1, 21: 0, 22: 0, 23: 0, 24: 0, 25: 1, 26: 0, 27: 1, 28: 0, 29: 0, 30: 0, 31: 1}, 'am': {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 1, 18: 1, 19: 1, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 1, 26: 1, 27: 1, 28: 1, 29: 1, 30: 1, 31: 1}, 'gear': {0: 4, 1: 4, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3, 7: 4, 8: 4, 9: 4, 10: 4, 11: 3, 12: 3, 13: 3, 14: 3, 15: 3, 16: 3, 17: 4, 18: 4, 19: 4, 20: 3, 21: 3, 22: 3, 23: 3, 24: 3, 25: 4, 26: 5, 27: 5, 28: 5, 29: 5, 30: 5, 31: 4}, 'carb': {0: 4, 1: 4, 2: 1, 3: 1, 4: 2, 5: 1, 6: 4, 7: 2, 8: 2, 9: 4, 10: 4, 11: 3, 12: 3, 13: 3, 14: 4, 15: 4, 16: 4, 17: 1, 18: 2, 19: 1, 20: 1, 21: 2, 22: 2, 23: 4, 24: 2, 25: 1, 26: 2, 27: 2, 28: 4, 29: 6, 30: 8, 31: 2}}
This SO post was helpful in learning how to print the python dataframe like R does with the dput() function.
Now I import seaborn and create a histogram.
import matplotlib.pyplot as plt
import seaborn
seaborn.histplot(data=mtcars, x="mpg", bins = 30)
plt.suptitle("Mtcars", loc = 'left')
plt.title("histogram", loc = 'left')
plt.show()
This doesn't work as the title disappears.
So I clear out whatever is happening with the graphs and try again.
plt.figure().clear()
plt.close()
plt.cla()
plt.clf()
seaborn.histplot(data=mtcars, x="mpg", bins = 30)
plt.suptitle("Mtcars", horizontalalignment = 'left')
plt.title("histogram", loc = 'left')
plt.show()
But this doesn't work either. This time, the title is there but the alignment is wrong.
I'd like to put both the title and the subtitle on the left side.
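A possible way to get both on the left: the axes title takes loc='left', while suptitle has no loc argument and is positioned with x/y in figure coordinates plus a horizontal alignment. The sketch below uses a plain Matplotlib histogram as a stand-in for seaborn.histplot and only a few mpg values; aligning the suptitle with the left edge of the axes via ax.get_position().x0 is my choice, not the only option.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt

mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4]  # first mtcars['mpg'] values

fig, ax = plt.subplots()
ax.hist(mpg, bins=30)  # stand-in for seaborn.histplot(data=mtcars, x="mpg", bins=30, ax=ax)

# Subtitle: the axes title supports loc='left' directly
ax.set_title("histogram", loc="left")

# Main title: place it at the left edge of the axes, in figure coordinates
left_edge = ax.get_position().x0
sup = fig.suptitle("Mtcars", x=left_edge, ha="left")
```

In an interactive session, plt.show() after these lines displays the figure with both titles starting at the same left edge.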
Here is the dataframe that I'm working with in python.
{'Unnamed: 0': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15, 15: 16, 16: 17, 17: 18, 18: 19, 19: 20, 20: 21, 21: 22, 22: 23, 23: 24, 24: 25, 25: 26, 26: 27, 27: 28, 28: 29, 29: 30, 30: 31, 31: 32}, 'car': {0: 'Mazda RX4', 1: 'Mazda RX4 Wag', 2: 'Datsun 710', 3: 'Hornet 4 Drive', 4: 'Hornet Sportabout', 5: 'Valiant', 6: 'Duster 360', 7: 'Merc 240D', 8: 'Merc 230', 9: 'Merc 280', 10: 'Merc 280C', 11: 'Merc 450SE', 12: 'Merc 450SL', 13: 'Merc 450SLC', 14: 'Cadillac Fleetwood', 15: 'Lincoln Continental', 16: 'Chrysler Imperial', 17: 'Fiat 128', 18: 'Honda Civic', 19: 'Toyota Corolla', 20: 'Toyota Corona', 21: 'Dodge Challenger', 22: 'AMC Javelin', 23: 'Camaro Z28', 24: 'Pontiac Firebird', 25: 'Fiat X1-9', 26: 'Porsche 914-2', 27: 'Lotus Europa', 28: 'Ford Pantera L', 29: 'Ferrari Dino', 30: 'Maserati Bora', 31: 'Volvo 142E'}, 'mpg': {0: 21.0, 1: 21.0, 2: 22.8, 3: 21.4, 4: 18.7, 5: 18.1, 6: 14.3, 7: 24.4, 8: 22.8, 9: 19.2, 10: 17.8, 11: 16.4, 12: 17.3, 13: 15.2, 14: 10.4, 15: 10.4, 16: 14.7, 17: 32.4, 18: 30.4, 19: 33.9, 20: 21.5, 21: 15.5, 22: 15.2, 23: 13.3, 24: 19.2, 25: 27.3, 26: 26.0, 27: 30.4, 28: 15.8, 29: 19.7, 30: 15.0, 31: 21.4}, 'cyl': {0: 6, 1: 6, 2: 4, 3: 6, 4: 8, 5: 6, 6: 8, 7: 4, 8: 4, 9: 6, 10: 6, 11: 8, 12: 8, 13: 8, 14: 8, 15: 8, 16: 8, 17: 4, 18: 4, 19: 4, 20: 4, 21: 8, 22: 8, 23: 8, 24: 8, 25: 4, 26: 4, 27: 4, 28: 8, 29: 6, 30: 8, 31: 4}, 'disp': {0: 160.0, 1: 160.0, 2: 108.0, 3: 258.0, 4: 360.0, 5: 225.0, 6: 360.0, 7: 146.7, 8: 140.8, 9: 167.6, 10: 167.6, 11: 275.8, 12: 275.8, 13: 275.8, 14: 472.0, 15: 460.0, 16: 440.0, 17: 78.7, 18: 75.7, 19: 71.1, 20: 120.1, 21: 318.0, 22: 304.0, 23: 350.0, 24: 400.0, 25: 79.0, 26: 120.3, 27: 95.1, 28: 351.0, 29: 145.0, 30: 301.0, 31: 121.0}, 'hp': {0: 110, 1: 110, 2: 93, 3: 110, 4: 175, 5: 105, 6: 245, 7: 62, 8: 95, 9: 123, 10: 123, 11: 180, 12: 180, 13: 180, 14: 205, 15: 215, 16: 230, 17: 66, 18: 52, 19: 65, 20: 97, 21: 150, 22: 150, 23: 245, 24: 175, 25: 
66, 26: 91, 27: 113, 28: 264, 29: 175, 30: 335, 31: 109}, 'drat': {0: 3.9, 1: 3.9, 2: 3.85, 3: 3.08, 4: 3.15, 5: 2.76, 6: 3.21, 7: 3.69, 8: 3.92, 9: 3.92, 10: 3.92, 11: 3.07, 12: 3.07, 13: 3.07, 14: 2.93, 15: 3.0, 16: 3.23, 17: 4.08, 18: 4.93, 19: 4.22, 20: 3.7, 21: 2.76, 22: 3.15, 23: 3.73, 24: 3.08, 25: 4.08, 26: 4.43, 27: 3.77, 28: 4.22, 29: 3.62, 30: 3.54, 31: 4.11}, 'wt': {0: 2.62, 1: 2.875, 2: 2.32, 3: 3.215, 4: 3.44, 5: 3.46, 6: 3.57, 7: 3.19, 8: 3.15, 9: 3.44, 10: 3.44, 11: 4.07, 12: 3.73, 13: 3.78, 14: 5.25, 15: 5.424, 16: 5.345, 17: 2.2, 18: 1.615, 19: 1.835, 20: 2.465, 21: 3.52, 22: 3.435, 23: 3.84, 24: 3.845, 25: 1.935, 26: 2.14, 27: 1.513, 28: 3.17, 29: 2.77, 30: 3.57, 31: 2.78}, 'qsec': {0: 16.46, 1: 17.02, 2: 18.61, 3: 19.44, 4: 17.02, 5: 20.22, 6: 15.84, 7: 20.0, 8: 22.9, 9: 18.3, 10: 18.9, 11: 17.4, 12: 17.6, 13: 18.0, 14: 17.98, 15: 17.82, 16: 17.42, 17: 19.47, 18: 18.52, 19: 19.9, 20: 20.01, 21: 16.87, 22: 17.3, 23: 15.41, 24: 17.05, 25: 18.9, 26: 16.7, 27: 16.9, 28: 14.5, 29: 15.5, 30: 14.6, 31: 18.6}, 'vs': {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1, 6: 0, 7: 1, 8: 1, 9: 1, 10: 1, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 1, 18: 1, 19: 1, 20: 1, 21: 0, 22: 0, 23: 0, 24: 0, 25: 1, 26: 0, 27: 1, 28: 0, 29: 0, 30: 0, 31: 1}, 'am': {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 1, 18: 1, 19: 1, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 1, 26: 1, 27: 1, 28: 1, 29: 1, 30: 1, 31: 1}, 'gear': {0: 4, 1: 4, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3, 7: 4, 8: 4, 9: 4, 10: 4, 11: 3, 12: 3, 13: 3, 14: 3, 15: 3, 16: 3, 17: 4, 18: 4, 19: 4, 20: 3, 21: 3, 22: 3, 23: 3, 24: 3, 25: 4, 26: 5, 27: 5, 28: 5, 29: 5, 30: 5, 31: 4}, 'carb': {0: 4, 1: 4, 2: 1, 3: 1, 4: 2, 5: 1, 6: 4, 7: 2, 8: 2, 9: 4, 10: 4, 11: 3, 12: 3, 13: 3, 14: 4, 15: 4, 16: 4, 17: 1, 18: 2, 19: 1, 20: 1, 21: 2, 22: 2, 23: 4, 24: 2, 25: 1, 26: 2, 27: 2, 28: 4, 29: 6, 30: 8, 31: 2}}
Here is the code that I'm using. I took the subplot part from a DataCamp module.
fig, ax = plt.subplot()
plt.show()
But when I go to plot the mtcars dataset, one variable against the other, I get a blank canvas. Why is that? I don't see how the code is different than what I am looking at on DataCamp.
ax.plot(mtcars['cyl'], mtcars['mpg'])
plt.show()
The answer below is helpful and gets me closer to a solution, but it gives me lines instead of a scatterplot.
fig, ax = plt.subplot()
plt.show()
import matplotlib.pyplot as plt
plt.plot(df['cyl'], df['mpg'])
plt.show()
or:
ax = plt.subplot(2, 1, 1)
ax.plot(df['cyl'], df['mpg'])
plt.show()
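On the follow-up about lines versus points: plot connects consecutive points with line segments by default, whereas scatter draws unconnected markers. Note also that plt.subplots() (with a trailing "s") returns a (Figure, Axes) pair, while plt.subplot() returns a single Axes. A minimal sketch, using only the first five mtcars rows as stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the two columns used in the question
mtcars = pd.DataFrame({
    "cyl": [6, 6, 4, 6, 8],
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7],
})

fig, ax = plt.subplots()  # returns the figure and one axes
points = ax.scatter(mtcars["cyl"], mtcars["mpg"])  # markers only, no connecting lines
ax.set_xlabel("cyl")
ax.set_ylabel("mpg")
```

Calling plt.show() once, after all plotting calls, displays the populated axes.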
I have the following code. I want to keep only the values that are below the threshold and store them in a variable, but the code below prints an empty dictionary.
current_memory_available=29.373699550744604
total_memory=614480960
lst= {0: 602112, 1: 12852224, 2: 12992768, 3: 3211264, 4: 6717952, 5: 7012864, 6: 1605632, 7: 4391936, 8: 5571584, 9: 5571584, 10: 802816, 11: 6326272, 12: 11044864, 13: 11044864, 14: 401408, 15: 9840640, 16: 9840640, 17: 9840640, 18: 100352, 19: 100352, 20: 411074560, 21: 67141632, 22: 16392000}
passed = {key: (value/total_memory) * 100 for key,value in lst.items() if value <
current_memory_available}
print(passed)
Thanks, help is highly appreciated.
current_memory_available = 29.373699550744604
total_memory=614480960
dct = {0: 602112, 1: 12852224, 2: 12992768, 3: 3211264, 4: 6717952, 5: 7012864, 6: 1605632, 7: 4391936, 8: 5571584, 9: 5571584, 10: 802816, 11: 6326272, 12: 11044864, 13: 11044864, 14: 401408, 15: 9840640, 16: 9840640, 17: 9840640, 18: 100352, 19: 100352, 20: 411074560, 21: 67141632, 22: 16392000}
new_dct = {}
for key, value in dct.items():
    # Convert the raw size to a percentage of total memory before comparing
    calc = (value / total_memory) * 100
    if calc < current_memory_available:
        new_dct[key] = value
print(new_dct)
The 'if' condition in the dict comprehension is never true: it compares the raw values with current_memory_available, and every raw value is greater than that threshold. That is why an empty dictionary is printed.
The logic below is incorrect:
passed = {key: (value/total_memory) * 100 for key,value in lst.items() if value <
current_memory_available}
Try the following instead, which converts each value to a percentage before comparing it with the threshold:
passed = {key: (value / total_memory) * 100 for key, value in lst.items() if (value / total_memory) * 100 < current_memory_available}
Full code:
current_memory_available = 29.373699550744604
total_memory = 614480960
lst = {0: 602112, 1: 12852224, 2: 12992768, 3: 3211264, 4: 6717952, 5: 7012864, 6: 1605632, 7: 4391936, 8: 5571584, 9: 5571584, 10: 802816, 11: 6326272, 12: 11044864, 13: 11044864, 14: 401408, 15: 9840640, 16: 9840640, 17: 9840640, 18: 100352, 19: 100352, 20: 411074560, 21: 67141632, 22: 16392000}
passed = {key: (value / total_memory) * 100 for key, value in lst.items() if (value / total_memory) * 100 < current_memory_available}
print(passed)
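To see the unit mismatch concretely, here is a small check on a four-entry subset of the data. It assumes, as the answers above do, that current_memory_available is a percentage of total_memory; the names sizes, percent and below are mine.

```python
current_memory_available = 29.373699550744604  # assumed to be a percentage
total_memory = 614480960                       # raw size
sizes = {0: 602112, 18: 100352, 20: 411074560, 21: 67141632}  # subset of the question's dict

# Even the smallest raw value is far above the percentage threshold,
# so "value < current_memory_available" can never be true
smallest = min(sizes.values())
print(smallest < current_memory_available)  # False

# Converting to percentages first makes the comparison meaningful
percent = {k: v / total_memory * 100 for k, v in sizes.items()}
below = {k: p for k, p in percent.items() if p < current_memory_available}
print(sorted(below))  # [0, 18, 21]
```

Entry 20 is the only one in this subset that uses more than the threshold share of total_memory, so it is the only one dropped.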