When using networkx I only know that there are several possibilities for plotting graphs with edges and nodes.
Is it possible to plot only a lot of nodes, without connections between them? The points all have x- and y-coordinates and are saved in a pandas dataframe with only 3 columns: ID, X, Y
g = nx.from_pandas_edgelist(df1, source='x', target='y')
I tried something like this, but I don't want edges, only points.
This is a part of the dataframe:
id x y
0 550 1005.600 1539.400
1 551 1006.600 1549.400
2 705 1029.997 2140.001
3 706 1030.997 2141.001
4 478 180.000 1354.370
5 479 190.000 1354.370
.. ... ... ...
500 237 1135.000 2615.000
501 238 1145.000 2615.000
You can draw nodes and edges separately. Use the following to draw only the nodes:
nodes = nx.draw_networkx_nodes(G, pos)
Note that draw_networkx_nodes also requires a pos argument: a dictionary mapping each node to an (x, y) pair, which you can build out of your x and y values. (At that point, though, I would rather not use networkx at all and just draw a scatter plot...)
See the docs...
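For concreteness, here is a minimal sketch of both routes, using a small stand-in for the dataframe above (variable and column names are illustrative):

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# Small stand-in for the id/x/y dataframe from the question
df1 = pd.DataFrame({"id": [550, 551, 705],
                    "x": [1005.600, 1006.600, 1029.997],
                    "y": [1539.400, 1549.400, 2140.001]})

# Build a graph with one node per row and no edges at all
G = nx.Graph()
G.add_nodes_from(df1["id"])

# draw_networkx_nodes needs a pos dict mapping each node to an (x, y) pair
pos = {row.id: (row.x, row.y) for row in df1.itertuples()}
nx.draw_networkx_nodes(G, pos, node_size=20)

# Without networkx, a plain scatter plot does the same job
plt.scatter(df1["x"], df1["y"], s=20)
plt.show()
```

Since there are no edges, the scatter plot and the networkx drawing produce the same picture.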
I have a data set which looks like this:
Hour_day Profits
7 645
3 354
5 346
11 153
23 478
7 464
12 356
0 346
I created a line plot to visualize the hours on the x-axis and the profit values on the y-axis. My code works fine, but the problem is that the x-axis starts at 0, and I want it to start at 5 pm, for example.
hours = df.Hour_day.value_counts().keys()
hours = hours.sort_values()

# Get plot information from actual data
y_values = list()
for hr in hours:
    temp = df[df.Hour_day == hr]
    y_values.append(temp.Profits.mean())

# Plot comparison
plt.plot(hours, y_values, color='y')
From what I know, you have two options:
Create a sub-DataFrame that excludes the rows with an Hour_day value under 5 and proceed with the rest of your code as normal:
df_new = df[df['Hour_day'] >= 5]
(Boolean indexing drops the unwanted rows; df.where(df['Hour_day'] >= 5) would instead keep them as rows of NaN.)
Or, you might be able to set the x-ticks:
default_x_ticks = range(5, 24)
plt.plot(hours, y_values, color='y')
plt.xticks(default_x_ticks, hours)
plt.show()
I haven't tested the xticks code, so you might have to play around with it just a touch, but there are lots of easy-to-find resources on xticks.
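As a sketch of the first option, here is a self-contained version using the sample values from the table above (the groupby call replaces the value_counts/loop combination and keeps the hours sorted automatically):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question
df = pd.DataFrame({"Hour_day": [7, 3, 5, 11, 23, 7, 12, 0],
                   "Profits":  [645, 354, 346, 153, 478, 464, 356, 346]})

# Keep only hours >= 5, then average the profits per hour
df_new = df[df["Hour_day"] >= 5]
means = df_new.groupby("Hour_day")["Profits"].mean()

plt.plot(means.index, means.values, color="y")
plt.xticks(means.index)  # ticks only at hours that actually occur
plt.show()
```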
I am trying to draw a choropleth map of municipalities in Denmark, with color encoding the sum of crimes in each municipality.
I have several entries for each municipality, since the data spans a time period and several types of crime, and I have a single geometry entry for each municipality.
I want to perform a transform_lookup on the geometry field in the geopandas dataframe using the label_dk key, but I can't seem to get the map to render.
I could always merge the dataframes, but I am trying to save space by not repeating the geometry for every crime entry, since I also want to plot the data in different charts and allow for slicing and dicing over time and offence.
Bear in mind that this crime data is just a small example; the real data I want to use has around 30,000 entries, so a merged geojson file takes up 647,000 KB and the map won't render.
Does anybody know why this transform_lookup doesn't work?
The data looks like this:
label_dk geometry
0 Aabenraa MULTIPOLYGON Z (((9.51215 54.85672 -999.00000,...
1 Aalborg MULTIPOLYGON Z (((9.84688 57.04365 -999.00000,...
2 Aarhus POLYGON Z ((9.99682 56.17872 -999.00000, 9.990...
3 Albertslund POLYGON Z ((12.35234 55.70461 -999.00000, 12.3...
4 Allerød POLYGON Z ((12.31845 55.88305 -999.00000, 12.3...
.. ... ...
94 Vejle POLYGON Z ((9.11714 55.76669 -999.00000, 9.100...
95 Vesthimmerlands MULTIPOLYGON Z (((9.17798 56.91745 -999.00000,...
96 Viborg POLYGON Z ((9.29501 56.59336 -999.00000, 9.297...
97 Vordingborg MULTIPOLYGON Z (((12.04479 54.95566 -999.00000...
98 Ærø MULTIPOLYGON Z (((10.43467 54.87952 -999.00000...
[99 rows x 2 columns]
tid offence label_dk Anmeldte forbrydelser
0 2021K1 Seksualforbrydelser i alt København 133
1 2021K1 Voldsforbrydelser i alt København 900
2 2021K2 Seksualforbrydelser i alt København 244
3 2021K2 Voldsforbrydelser i alt København 996
4 2021K3 Seksualforbrydelser i alt København 174
.. ... ... ... ...
787 2021K2 Voldsforbrydelser i alt Aalborg 178
788 2021K3 Seksualforbrydelser i alt Aalborg 53
789 2021K3 Voldsforbrydelser i alt Aalborg 185
790 2021K4 Seksualforbrydelser i alt Aalborg 43
791 2021K4 Voldsforbrydelser i alt Aalborg 205
[792 rows x 4 columns]
The code is below:
import altair as alt
import geopandas as gpd
import pandas as pd
import altair_viewer
alt.data_transformers.enable('data_server')
path = "data/small_few_umbrella_terms_crimes_2021.csv"
df = pd.read_csv(path,encoding="utf_8",index_col='Unnamed: 0')
geometry = gpd.read_file("data_with_geo/geometry.geojson")
map_chart = alt.Chart(df).mark_geoshape(
).transform_aggregate(
crime='sum(Anmeldte forbrydelser)',
groupby=["label_dk"]
).transform_lookup(
lookup='label_dk',
from_=alt.LookupData(geometry, 'label_dk', ['geometry'])
).encode(
color=alt.Color(
"crime:Q",
scale=alt.Scale(
scheme='viridis')
)
)
altair_viewer.show(map_chart)
The data can be found here:
https://github.com/Joac1137/Data-Visualization/blob/main/data_with_geo/geometry.geojson
and
https://github.com/Joac1137/Data-Visualization/blob/main/data/small_few_umbrella_terms_crimes_2021.csv
I think you're running into an issue similar to HConcat of mark_geoshape and mark_bar breaks depending of order (and the comments in the linked vega-lite issue). If you change the order of the data frames it will work.
There also seems to be some issue with the aggregation, which I think is related to this issue: https://github.com/altair-viz/altair/issues/1357, but I just used pandas to aggregate here:
grouped_sums = df.groupby('label_dk').sum().reset_index()
alt.Chart(geometry).mark_geoshape().transform_lookup(
lookup='label_dk',
from_=alt.LookupData(grouped_sums, 'label_dk', grouped_sums.columns.tolist())
).encode(
color=alt.Color("Anmeldte forbrydelser:Q"),
tooltip=['label_dk', 'Anmeldte forbrydelser:Q']
)
We're working on a revamp of the geo docs, which you might find useful: https://deploy-preview-1--spontaneous-sorbet-49ed10.netlify.app/user_guide/marks/geoshape.html#lookup-datasets
Thanks a lot @joelostblom!
I found the solution in the new docs you linked.
The trick was that I was missing the "type" column in my geojson, which usually just contains the string "Feature".
The geojson data now looks like this:
label_dk type geometry
0 Aabenraa Feature MULTIPOLYGON Z (((9.51215 54.85672 -999.00000,...
1 Aalborg Feature MULTIPOLYGON Z (((9.84688 57.04365 -999.00000,...
2 Aarhus Feature POLYGON Z ((9.99682 56.17872 -999.00000, 9.990...
3 Albertslund Feature POLYGON Z ((12.35234 55.70461 -999.00000, 12.3...
4 Allerød Feature POLYGON Z ((12.31845 55.88305 -999.00000, 12.3...
And the code looks like this:
import altair as alt
import geopandas as gpd
import pandas as pd
import altair_viewer
path = "data/small_few_umbrella_terms_crimes_2021.csv"
df = pd.read_csv(path,encoding="utf_8",index_col='Unnamed: 0')
geometry = gpd.read_file("data_with_geo/geometry.geojson")
map_chart = alt.Chart(df).transform_lookup(
lookup='label_dk',
from_=alt.LookupData(geometry, 'label_dk',['geometry','type'])
).transform_aggregate(
crime='sum(Anmeldte forbrydelser)',
groupby=["label_dk","type","geometry"]
).mark_geoshape(
).encode(
color=alt.Color(
"crime:Q",
scale=alt.Scale(
scheme='viridis')
)
)
altair_viewer.show(map_chart)
Changing from the merged data I previously used to this lookup method resulted in a significant speedup at initialization: it used to take around 10 minutes to start up, but now it's done in a matter of seconds.
I have a multi-index pandas dataframe that looks something like this :
Default Config Best Config
Device Experiment Total Power (W)
titan SGEMM 140 1.000000 1.158175
280 0.990189 1.273428
MiniFE 140 1.000000 1.262243
280 0.979770 1.246412
titanxp SGEMM 140 1.000000 1.181740
280 1.646472 1.674499
MiniFE 140 1.000000 1.037918
280 1.005121 1.203337
I want to draw a bar graph that looks something like this:
My main issue is the three levels of x-tick marks: as you can see in the hand-drawn image, the Experiment name is centered between the Total Power labels, and the Device name is centered between the Experiment labels. How can I achieve this? I first thought of drawing two subplots separately and merging them, but then a common y-axis would be an issue. I am also open to other types of graphs.
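One way to get the three tick levels with plain matplotlib is to let pandas draw the grouped bars and then place the Experiment and Device labels below the axis by hand. This is only a sketch against a rebuilt copy of the frame above; the label offsets (-0.12, -0.20) are illustrative and may need tuning:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the multi-index frame from the question
index = pd.MultiIndex.from_product(
    [["titan", "titanxp"], ["SGEMM", "MiniFE"], [140, 280]],
    names=["Device", "Experiment", "Total Power (W)"])
df = pd.DataFrame({
    "Default Config": [1.0, 0.990189, 1.0, 0.979770,
                       1.0, 1.646472, 1.0, 1.005121],
    "Best Config": [1.158175, 1.273428, 1.262243, 1.246412,
                    1.181740, 1.674499, 1.037918, 1.203337]},
    index=index)

fig, ax = plt.subplots(figsize=(10, 4))
df.plot.bar(ax=ax, rot=0)

# Innermost level on the ticks themselves: the power values
ax.set_xticklabels([power for _, _, power in df.index])

# Second and third levels: text below the axis, centered on each group
for i, (device, experiment, _) in enumerate(df.index):
    if i % 2 == 0:   # one Experiment label per pair of bars
        ax.text(i + 0.5, -0.12, experiment, ha="center",
                transform=ax.get_xaxis_transform())
    if i % 4 == 0:   # one Device label per group of four bars
        ax.text(i + 1.5, -0.20, device, ha="center",
                transform=ax.get_xaxis_transform())
ax.set_xlabel("")
plt.tight_layout()
plt.show()
```

Using ax.get_xaxis_transform() keeps the x-coordinates in data units (bar positions) while the y-offsets are in axes fractions, so the extra labels stay below the plot regardless of the data range.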
Suppose I have this dataframe df that contains 3794 rows × 2 columns, where column a_number represents nodes with directed edges to nodes in b_number:
a_number b_number
0 0123456789343 0123456789991
1 0123456789343 0123456789633
2 0123456789343 0123456789633
3 0123456789343 0123456789628
4 0123456789343 0123456789633
... ... ...
3789 0123456789697 0123456789916
3790 0123456789697 0123456789886
3791 0123456789697 0123456789572
3792 0123456789697 0123456789884
3793 0123456789697 0123456789125
3794 rows × 2 columns
Additional information:
len(df['a_number'].unique())
>>> 18
len(df['b_number'].unique())
>>> 1145
I am trying to generate an image representation of the graph. Here's the code applying networkx:
import networkx as nx

G = nx.DiGraph()
for i, (x, y) in df.iterrows():
    G.add_edge(x, y)  # add_edge also adds missing nodes automatically

nx.draw(G, with_labels=True, font_size=14, node_size=2000)
I get this output:
I am having some problems visualizing the graphs created with python-networkx; I want to reduce clutter and regulate the distance between the nodes. Please advise. What can I do in the code? Thank you.
First, to reduce the clutter, I would start by decreasing the node size, to maybe 200 or 400.
Try reducing the font_size parameter in the draw function. This parameter regulates the size of the node labels. Since you have long node names, it will help reduce the clutter.
If having the labels on the graph is not necessary, remove them to make it cleaner by passing with_labels=False to the draw function.
Then, to regulate the distance between nodes, you can use the spring layout for the node positions.
pos = nx.spring_layout(G, k=0.8)
nx.draw(G, pos , with_labels = True, font_size=7, node_size=400)
The k parameter in the spring layout lets you regulate the distance between nodes. You can try different values to see what suits you best.
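Putting this together with the dataframe-based construction, a self-contained sketch might look like this (the three-row frame is a stand-in for the 3794-row one; nx.from_pandas_edgelist replaces the iterrows loop and adds missing nodes automatically):

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# Small stand-in for the a_number/b_number frame
df = pd.DataFrame({"a_number": ["a1", "a1", "a2"],
                   "b_number": ["b1", "b2", "b1"]})

# Build the directed graph in one call; nodes are added implicitly
G = nx.from_pandas_edgelist(df, source="a_number", target="b_number",
                            create_using=nx.DiGraph)

# k is the target distance between nodes; seed makes the layout reproducible
pos = nx.spring_layout(G, k=0.8, seed=42)
nx.draw(G, pos, with_labels=True, font_size=7, node_size=400)
plt.show()
```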
I am using Python and I would like to represent a set of time series on a heatmap.
My static representation of the geographical space, which is a "fixed-time slice" of all the time series, looks like this:
Each time series corresponds to a cell, and every cell is associated with a specific geometric shape on the plot above. Here is an example of one of the time series:
Is there any way I can "animate" the heatmap above, or at least add a time parameter that I can adjust in order to see the time evolution of the entire map? Obviously the arrays all have the same length and are stored as NumPy arrays in a DataFrame like this:
SQUAREID
155 [0.057949285005512684, 0.04961411245865491, 0....
272 [0.4492307820512821, 0.3846153846153846, 0.415...
273 [0.09658214167585447, 0.08269018743109151, 0.0...
276 [0.03208695579710145, 0.03234782536231884, 0.0...
277 [0.82994485446527, 0.8366923737596471, 0.79620...
...
10983 [0.6770833333333334, 0.6865036231884057, 0.692...
10984 [0.21875, 0.22179347826086956, 0.2236956521739...
11097 [0.5921739130434782, 0.5934782608695652, 0.598...
11098 [0.06579710144927536, 0.06594202898550725, 0.0...
11099 [0.21273428886438808, 0.21320286659316426, 0.2...
Name: wp, Length: 2020, dtype: object
and SQUAREID is matched with the cellId column in a GeoDataFrame that looks like this:
cellId geometry
0 38 POLYGON ((10.91462 45.68201, 10.92746 45.68179...
1 39 POLYGON ((10.92746 45.68179, 10.94029 45.68157...
2 40 POLYGON ((10.94029 45.68157, 10.95312 45.68136...
3 154 POLYGON ((10.90209 45.69122, 10.91493 45.69100...
4 155 POLYGON ((10.91493 45.69100, 10.92777 45.69079...
... ... ...
6570 11336 POLYGON ((11.80475 46.52767, 11.81777 46.52735...
6571 11337 POLYGON ((11.81777 46.52735, 11.83080 46.52703...
6572 11452 POLYGON ((11.79219 46.53698, 11.80521 46.53666...
6573 11453 POLYGON ((11.80521 46.53666, 11.81824 46.53634...
6574 11454 POLYGON ((11.81824 46.53634, 11.83126 46.53601...
6575 rows × 2 columns
Thanks in advance.
You can use the animation module from the matplotlib library: https://towardsdatascience.com/learn-how-to-create-animated-graphs-in-python-fce780421afe
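A minimal FuncAnimation sketch of the idea, using a synthetic regular grid instead of the real polygon geometries (all names and shapes below are made up for illustration; with the actual data you would recolor the GeoDataFrame's patches in the same update callback):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(0)
frames = rng.random((10, 5, 5))   # 10 time steps of a 5x5 "map"

fig, ax = plt.subplots()
mesh = ax.pcolormesh(frames[0], vmin=0, vmax=1)

def update(t):
    # Recolor the cells with the values of time step t
    mesh.set_array(frames[t].ravel())
    ax.set_title(f"t = {t}")
    return (mesh,)

anim = FuncAnimation(fig, update, frames=len(frames), interval=200)
plt.show()  # or anim.save("heatmap.gif") to export the animation
```

For an adjustable time parameter rather than a running animation, matplotlib's Slider widget can drive the same update function.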