Three levels of x-tick marks in matplotlib bar graphs - python

I have a multi-index pandas dataframe that looks something like this :
                                    Default Config  Best Config
Device  Experiment Total Power (W)
titan   SGEMM      140                    1.000000     1.158175
                   280                    0.990189     1.273428
        MiniFE     140                    1.000000     1.262243
                   280                    0.979770     1.246412
titanxp SGEMM      140                    1.000000     1.181740
                   280                    1.646472     1.674499
        MiniFE     140                    1.000000     1.037918
                   280                    1.005121     1.203337
I want to draw a bar graph that looks something like this:
My main issue right now is the three levels of x-tick labels. As you can see in the hand-drawn image, each Experiment name is centered between its Total Power values, and each Device name is centered between its Experiments. How can I achieve this? I first thought of drawing two separate subplots and merging them, but then the issue would be getting a common y-axis. I am also open to other types of graphs.
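One possible way to get this layout, offered as a sketch rather than a definitive answer: plot all rows as one grouped bar chart, keep the innermost level (Total Power) as the regular tick labels, and write the two outer levels as centered text below the axis. The helper name group_spans, the vertical offsets and the bottom margin are assumptions made for this illustration.

```python
from itertools import groupby

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

index = pd.MultiIndex.from_product(
    [["titan", "titanxp"], ["SGEMM", "MiniFE"], [140, 280]],
    names=["Device", "Experiment", "Total Power (W)"],
)
df = pd.DataFrame(
    {
        "Default Config": [1.0, 0.990189, 1.0, 0.979770, 1.0, 1.646472, 1.0, 1.005121],
        "Best Config": [1.158175, 1.273428, 1.262243, 1.246412,
                        1.181740, 1.674499, 1.037918, 1.203337],
    },
    index=index,
)


def group_spans(idx, depth):
    """Hypothetical helper: (label, first_pos, last_pos) for each run of
    consecutive rows that share the first `depth` index levels."""
    keys = [tuple(row[:depth]) for row in idx]
    spans, pos = [], 0
    for key, run in groupby(keys):
        n = len(list(run))
        spans.append((key[-1], pos, pos + n - 1))
        pos += n
    return spans


ax = df.plot.bar(width=0.8, rot=0)

# Level 3 (innermost): ordinary tick labels, one per bar group.
ax.set_xticklabels([str(v) for v in df.index.get_level_values("Total Power (W)")])
ax.set_xlabel("")

# Levels 2 and 1: text centered under the bars they cover.
# x is in data coordinates, y in axes coordinates (negative = below the axis).
for depth, y in [(2, -0.10), (1, -0.18)]:
    for label, first, last in group_spans(df.index, depth):
        ax.text((first + last) / 2, y, label, ha="center", va="top",
                transform=ax.get_xaxis_transform())

fig = ax.get_figure()
fig.subplots_adjust(bottom=0.25)  # leave room for the extra label rows
# Call fig.savefig(...) or switch to an interactive backend to view the result.
```

Because everything lives on a single Axes, the y-axis is shared automatically, which avoids the subplot-merging problem mentioned in the question.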

Related

How is it possible to draw unconnected nodes with networkx?

When using networkx, I only know that there are several ways to plot graphs with edges and nodes.
Is it possible only to plot a lot of nodes, without connections between them? The points all have x- and y-coordinates. The points are saved in a pandas dataframe with only 3 columns: ID, X, Y
g = nx.from_pandas_dataframe(df1, source='x', target='y')
I tried something like this, but I don't want to have edges, only points.
This is a part of the dataframe:
id x y
0 550 1005.600 1539.400
1 551 1006.600 1549.400
2 705 1029.997 2140.001
3 706 1030.997 2141.001
4 478 180.000 1354.370
5 479 190.000 1354.370
.. ... ... ...
500 237 1135.000 2615.000
501 238 1145.000 2615.000
You can draw nodes and edges separately. Use the following to only draw the nodes:
nodes = nx.draw_networkx_nodes(G, pos)
Here pos is a dict mapping each node to its (x, y) position, which draw_networkx_nodes needs as its second argument; you can build it from the x and y values of the dataframe. (At that point I would rather not use networkx...)
See the docs...
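A minimal sketch of the approach above, assuming the dataframe has the columns id, x and y shown in the question (only the first six rows are reproduced here). The graph is built from nodes only, so no edges exist to be drawn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# First rows of the dataframe from the question.
df1 = pd.DataFrame(
    {
        "id": [550, 551, 705, 706, 478, 479],
        "x": [1005.600, 1006.600, 1029.997, 1030.997, 180.000, 190.000],
        "y": [1539.400, 1549.400, 2140.001, 2141.001, 1354.370, 1354.370],
    }
)

# Add the ids as nodes; no edges are ever created.
G = nx.Graph()
G.add_nodes_from(df1["id"])

# Map each node id to its (x, y) coordinates.
pos = dict(zip(df1["id"], zip(df1["x"], df1["y"])))

fig, ax = plt.subplots()
nx.draw_networkx_nodes(G, pos, node_size=20, ax=ax)
```

If the graph structure is never needed, ax.scatter(df1["x"], df1["y"]) draws the same points without networkx.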

Python data visualization: too small value to be visible - how to solve?

Here is the dataset I have:
Source      All Leads  Not Junks  Warms  Hots  Deals  Weighted Sum
web            281316     269490  10252  2508   1602        4376.5
telesales       30458      29732    431   138     85         316.2
networking       4249       4195    763   547    476         539.1
promos           1356       1308     30     1      0          10.8
I visualized it:
df.plot.bar()
And got this output:
Some columns have values too small to be visible. How can I tackle this problem?
Setting a bigger figure size doesn't help: it makes the chart bigger, but the ratio between the columns stays the same, so nothing changes.
Any ideas on how to make it look more sophisticated? Or maybe I should try a different type of chart? Thank you!
You could try df.plot.bar(logy=True), but it's going to make useful interpretation of the chart messy. A Sankey diagram would probably be a better fit for showing how the data breaks down in each category.
Seaborn comes out a little nicer, but takes some transformation to produce the same type of output:
import matplotlib.pyplot as plt
import seaborn as sns
# reshape from wide to long: one row per (Source, Category) pair
df2 = df.melt('Source').rename(columns={'variable': 'Category', 'value': 'Values'})
sns.barplot(x='Source', y='Values', data=df2, hue='Category')
plt.show()
Output:
Or with log=True
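For reference, the log-scale variant can be sketched with pandas alone. This assumes the dataset is a dataframe with a Source column, as in the question; the y-axis label is an addition for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame(
    {
        "Source": ["web", "telesales", "networking", "promos"],
        "All Leads": [281316, 30458, 4249, 1356],
        "Not Junks": [269490, 29732, 4195, 1308],
        "Warms": [10252, 431, 763, 30],
        "Hots": [2508, 138, 547, 1],
        "Deals": [1602, 85, 476, 0],
        "Weighted Sum": [4376.5, 316.2, 539.1, 10.8],
    }
)

# Use Source as the x-axis categories and put the y-axis on a log scale.
ax = df.set_index("Source").plot.bar(logy=True, rot=0)
ax.set_ylabel("Count (log scale)")
```

Note that a log axis cannot show a value of 0, so the promos/Deals bar stays invisible, and a value of 1 sits at the very bottom of the axis.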

Animating a geographical heatmap

I am using Python and I would like to represent a set of time series on a heatmap.
My static representation of the geographical space, which is a "fixed-time slice" of all the time series, looks like this:
Each time series corresponds to a cell, and every cell is associated with a specific geometric shape on the plot above. Here is an example of one of the time series:
Is there any way I can "animate" the heatmap above or at least set a time parameter that I can regulate in order to see the time evolution of the entire map? Obviously the arrays have all the same length and are stored as NumPy arrays in a DataFrame like this:
SQUAREID
155 [0.057949285005512684, 0.04961411245865491, 0....
272 [0.4492307820512821, 0.3846153846153846, 0.415...
273 [0.09658214167585447, 0.08269018743109151, 0.0...
276 [0.03208695579710145, 0.03234782536231884, 0.0...
277 [0.82994485446527, 0.8366923737596471, 0.79620...
...
10983 [0.6770833333333334, 0.6865036231884057, 0.692...
10984 [0.21875, 0.22179347826086956, 0.2236956521739...
11097 [0.5921739130434782, 0.5934782608695652, 0.598...
11098 [0.06579710144927536, 0.06594202898550725, 0.0...
11099 [0.21273428886438808, 0.21320286659316426, 0.2...
Name: wp, Length: 2020, dtype: object
and SQUAREID is matched with the cellId column in a GeoDataFrame that looks like this:
cellId geometry
0 38 POLYGON ((10.91462 45.68201, 10.92746 45.68179...
1 39 POLYGON ((10.92746 45.68179, 10.94029 45.68157...
2 40 POLYGON ((10.94029 45.68157, 10.95312 45.68136...
3 154 POLYGON ((10.90209 45.69122, 10.91493 45.69100...
4 155 POLYGON ((10.91493 45.69100, 10.92777 45.69079...
... ... ...
6570 11336 POLYGON ((11.80475 46.52767, 11.81777 46.52735...
6571 11337 POLYGON ((11.81777 46.52735, 11.83080 46.52703...
6572 11452 POLYGON ((11.79219 46.53698, 11.80521 46.53666...
6573 11453 POLYGON ((11.80521 46.53666, 11.81824 46.53634...
6574 11454 POLYGON ((11.81824 46.53634, 11.83126 46.53601...
6575 rows × 2 columns
Thanks in advance.
You can use the animation module from the matplotlib library: https://towardsdatascience.com/learn-how-to-create-animated-graphs-in-python-fce780421afe
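A rough sketch of that idea with matplotlib.animation.FuncAnimation: draw the cells once as a polygon collection, then update only the cell colors for each time step. The square grid and the random series below are placeholders standing in for the GeoDataFrame geometries and the wp arrays from the question; with real data, the polygon vertex lists and the per-cell arrays would have to be matched on cellId / SQUAREID first.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation
from matplotlib.collections import PolyCollection

rng = np.random.default_rng(0)
n_side, n_steps = 4, 10

# Placeholder geometry: unit squares on a grid, one vertex list per cell.
cells = [
    [(i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)]
    for i in range(n_side)
    for j in range(n_side)
]
# Placeholder data: one time series per cell, shape (n_cells, n_steps).
series = rng.random((len(cells), n_steps))

fig, ax = plt.subplots()
coll = PolyCollection(cells, cmap="viridis", edgecolors="k")
coll.set_array(series[:, 0])                       # colors for the first time step
coll.set_clim(series.min(), series.max())          # fixed color scale across frames
ax.add_collection(coll)
ax.autoscale_view()
ax.set_aspect("equal")
fig.colorbar(coll, ax=ax)
title = ax.set_title("t = 0")


def update(frame):
    """Recolor every cell with its value at time step `frame`."""
    coll.set_array(series[:, frame])
    title.set_text(f"t = {frame}")
    return coll, title


anim = FuncAnimation(fig, update, frames=n_steps, interval=200)
# anim.save("heatmap.gif", writer="pillow") would write the animation to disk.
```

Keeping the color limits fixed matters here: if each frame were rescaled independently, changes over time would be hidden by the shifting color scale.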

GroupBy is being shown as a singular line on plot

I'm trying to create a plot to show profit over time through a pandas dataframe. Here are the steps I have taken:
profit_data=agg_data.groupby(['segment','yyyy_mm_dd'])[["profit"]].sum()
profit_data
This gives me a dataframe similar to the below:
profit
segment yyyy_mm_dd
Core 2019-06-01 100
2019-06-02 100
2019-06-03 100
2019-06-04 100
2019-06-05 100
NonCore 2019-06-07 100
2019-06-08 100
2019-06-09 100
2019-06-10 100
...
...
I then try to plot this using matplotlib:
profit_data.plot()
The above does generate a plot, however my segments are one continuous line rather than two different lines (one for each segment). What change do I need to make so each segment is plotted?
Your output dataframe has only one column, so only one line gets plotted.
To solve this, reshape your dataframe so that each segment becomes its own column, then plot that:
df_unstacked = profit_data["profit"].unstack(level="segment")
df_unstacked.plot()
Alternatively, you could select each segment from the index and plot twice:
profit_data["profit"].loc["Core"].plot(label="Core")
profit_data["profit"].loc["NonCore"].plot(label="NonCore")
Hope this helps!
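A small self-contained sketch of the unstack approach, using made-up profit values (the numbers and dates below are illustrative only; the column names follow the question):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Illustrative data with the same columns as in the question.
agg_data = pd.DataFrame(
    {
        "segment": ["Core", "Core", "Core", "NonCore", "NonCore", "NonCore"],
        "yyyy_mm_dd": pd.to_datetime(
            ["2019-06-01", "2019-06-02", "2019-06-03"] * 2
        ),
        "profit": [100, 120, 90, 40, 55, 60],
    }
)

profit_data = agg_data.groupby(["segment", "yyyy_mm_dd"])[["profit"]].sum()

# Move the "segment" index level into the columns: one column per segment.
wide = profit_data["profit"].unstack(level="segment")

# DataFrame.plot draws one line per column, so each segment gets its own line.
ax = wide.plot()
ax.set_ylabel("profit")
```

Dates that exist for only one segment show up as NaN in the other column after unstacking, which leaves a gap in that segment's line.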

Plotting different lines for different states on the same chart

I am trying to create a distribution for the number of ___ across a few states.
I want to get all of the states on the same graph, represented by different lines.
Here is an example of what my data looks like: you have the state (which I want to split the lines by), the number of reviews (x axis), and the number of restaurants that have that many reviews (y axis)
State | num_of_reviews | Count_id
alaska 1 400
alaska 2 388
alaska 3 344
...
Wyoming 57 13
Whenever I try doing a simple line plot in seaborn or matplotlib, it just returns a messy graph.
Does anyone know a snippet of code that lets me easily filter by df['State']?
Assuming that you have 50+ states, I wouldn't plot the distribution for each on the same plot as it would get really messy and hard to read. Instead, I would suggest to use a FacetGrid (read more about it here).
Something like this should do.
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.FacetGrid(df, col="State", col_wrap=5, height=1.5)
g = g.map(plt.hist, "num_of_reviews")
You can find other possible solutions and ideas on how to visualize your data here.
If none of these work for you, then it might be helpful if you explain your problem a bit better and provide a desired output and a minimal, complete, and verifiable example.
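If a single chart with one line per state is still wanted (for a handful of states, say), one way to sketch it is to group the dataframe by State and draw each group on the same axes. The column names follow the question; the Wyoming values other than the (57, 13) row are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {
        "State": ["alaska", "alaska", "alaska", "Wyoming", "Wyoming", "Wyoming"],
        "num_of_reviews": [1, 2, 3, 55, 56, 57],
        "Count_id": [400, 388, 344, 20, 16, 13],  # Wyoming rows partly made up
    }
)

fig, ax = plt.subplots()
# groupby yields one sub-dataframe per state; each becomes its own line.
for state, grp in df.groupby("State"):
    grp = grp.sort_values("num_of_reviews")
    ax.plot(grp["num_of_reviews"], grp["Count_id"], label=state)

ax.set_xlabel("num_of_reviews")
ax.set_ylabel("Count_id")
ax.legend(title="State")
```

Plotting the whole dataframe in one call connects the last point of one state to the first point of the next, which is a likely cause of the messy graph described in the question.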
