GroupBy is being shown as a singular line on plot

I'm trying to create a plot to show profit over time through a pandas dataframe. Here are the steps I have taken:
profit_data=agg_data.groupby(['segment','yyyy_mm_dd'])[["profit"]].sum()
profit_data
This gives me a dataframe similar to the below:
                    profit
segment yyyy_mm_dd
Core    2019-06-01     100
        2019-06-02     100
        2019-06-03     100
        2019-06-04     100
        2019-06-05     100
NonCore 2019-06-07     100
        2019-06-08     100
        2019-06-09     100
        2019-06-10     100
...
...
I then try to plot this using matplotlib:
profit_data.plot()
The above does generate a plot, but both segments are drawn as one continuous line rather than as two separate lines (one for each segment). What change do I need to make so that each segment is plotted as its own line?

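Your output dataframe has only one column (profit), so only one line gets plotted; the segment lives in the index, not in a column of its own.
To solve this, reshape the dataframe so that each segment becomes its own column, then plot:
df_unstacked = profit_data["profit"].unstack(level="segment")
df_unstacked.plot()
Alternatively, you could select each segment from the index and plot both on the same axes:
ax = profit_data["profit"].loc["Core"].plot(label="Core")
profit_data["profit"].loc["NonCore"].plot(ax=ax, label="NonCore")
ax.legend()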
Hope this helps!
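For anyone who wants to try the reshaping idea end to end, here is a minimal sketch. The rows in agg_data are made up for illustration (the question does not show the raw data), and the variable name wide is my own; the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd

# Made-up rows standing in for agg_data (not the asker's real data)
agg_data = pd.DataFrame({
    "segment": ["Core"] * 3 + ["NonCore"] * 3,
    "yyyy_mm_dd": pd.to_datetime(["2019-06-01", "2019-06-02", "2019-06-03"] * 2),
    "profit": [100, 120, 90, 60, 80, 70],
})

# Same aggregation as in the question: one "profit" column, two index levels
profit_data = agg_data.groupby(["segment", "yyyy_mm_dd"])[["profit"]].sum()

# Move the "segment" index level into the columns: one column per segment
wide = profit_data["profit"].unstack(level="segment")

# DataFrame.plot draws one line per column, so each segment gets its own line
ax = wide.plot()
ax.set_ylabel("profit")
```

After unstacking, wide has the dates as its index and Core and NonCore as columns, which is the shape DataFrame.plot expects for one line per group.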

Related

How to create a clear line plot that shows the course of values x against y

I have a pandas DataFrame of 8664 rows. Shown here:
frame_3_LW.
frame_3_LW contains the following columns of importance to me: EASTVEL , NORTHVEL, Z_NAP, DATE+TIME. Definitions of the columns are:
EASTVEL = Flow of current, where negative (-) values are west and positive (+) values are east.
NORTHVEL = Flow of current, where negative (-) values are south and positive (+) values are north.
Z_NAP = Depth of water
DATE+TIME = Date + time in this format: 2021-11-17 10:00:00
Now the problem that I encounter is the following: I want to generate a plot with EASTVEL on the x-axis and Z_NAP on the y-axis.
I used a simple:
df.plot(x = 'EASTVEL', y = 'Z_NAP')
Because I have so many values, I get an unclear plot with a lot of lines. I would like just one line describing the course of EASTVEL (x-axis) against Z_NAP (y-axis). Can anybody help me with that? It would be such a big help!

Show how when values rise in one column, so does the values in another one

I'm working with a covid dataset for some Python exercises I am working through to try to learn. I've loaded it in the normal way:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("C:/Users/Desktop/Python Short Course/diagnosis.csv")
In this dataset there are 2 columns called BodyTemp and SpO2. What I am trying to do is show how the values in the two columns move together: when the values rise in the BodyTemp column, so do the values in the SpO2 column, that sort of idea. I had thought of maybe doing a bar chart like:
plt.xlabel("BodyTemp")
plt.ylabel("SpO2")
plt.bar(x=df["BodyTemp"], height=df["SpO2"])
plt.show()
but all the bars are very close together and it just doesn't look great, so what would be a better way to do this? Or would there be a better approach to show the visualisation of the distribution of values?
Edit: to show screenshot of graph
Edit to show data:
BodyTemp  SpO2
37.6      85
38.9      93
38.5      92
37        80
I've added a table showing the first few rows; there are a lot more, but it gives an idea of the data.

You need to change the scale of the y-axis. Try this:
plt.ylim((df['SpO2'].min()-.5, df['SpO2'].max()+.5))
If this doesn't work, it's probably because there are very small values in the SpO2 column. The gaps between the bars may come from such small values distorting the plot. Try removing them from the dataframe.
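As a rough illustration of the y-axis suggestion, here is a sketch using only the four sample rows from the question. The bar width of 0.2 is my own assumption (so that neighbouring bars do not overlap), and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# The four sample rows shown in the question
df = pd.DataFrame({
    "BodyTemp": [37.6, 38.9, 38.5, 37.0],
    "SpO2": [85, 93, 92, 80],
})

fig, ax = plt.subplots()
# width=0.2 is an assumption, chosen so bars at nearby temperatures stay apart
ax.bar(df["BodyTemp"], df["SpO2"], width=0.2)
ax.set_xlabel("BodyTemp")
ax.set_ylabel("SpO2")

# Zoom the y-axis to the observed SpO2 range, padded by 0.5 on each side
y_low = df["SpO2"].min() - 0.5
y_high = df["SpO2"].max() + 0.5
ax.set_ylim(y_low, y_high)
```

With the axis starting near the smallest SpO2 value instead of at zero, the differences between bars take up most of the plot height.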

python: Adjusting the values in the x axis of a plot

I'm trying to create plots in Python. The first rows of the dataset, named Psmc_dolphin, look like the below. The original file has 57 rows and 5 columns.
0 1 2 3 4
0 0.000000e+00 11.915525 299.807861 0.000621 0.000040
1 4.801704e+03 11.915525 326.288712 0.000675 0.000311
2 1.003041e+04 11.915525 355.090418 0.000735 0.000497
3 1.572443e+04 11.915525 386.413025 0.000800 0.000548
4 2.192481e+04 0.583837 8508.130872 0.017613 0.012147
5 2.867635e+04 0.583837 9092.811889 0.018823 0.014021
6 3.602925e+04 0.466402 12111.617016 0.025073 0.019815
7 4.403533e+04 0.466402 12826.458632 0.026553 0.021989
8 5.275397e+04 0.662226 9587.887034 0.019848 0.017158
9 6.224833e+04 0.662226 10201.024439 0.021118 0.018877
10 7.258698e+04 0.991930 7262.773560 0.015035 0.013876
I'm trying to plot column 0 on the x-axis and column 1 on the y-axis. I get a plot with x-axis values 1000000, 2000000, 3000000, 4000000, etc. The code I used is attached below.
I need to adjust the x-axis so that it shows values such as 1e+06, 2e+06, 3e+06, etc. instead of 1000000, 2000000, 3000000, 4000000, etc.
import pandas as pd
import matplotlib.pyplot as plt
# load the dataset (tab-separated, no header row)
Psmc_dolphin = pd.read_csv('Beluga_mapped_to_dolphin.0.txt', sep="\t", header=None)
# column 0 on the x-axis, column 1 on the y-axis
plt.plot(Psmc_dolphin[0], Psmc_dolphin[1], color='green')
Any help or suggestion will be appreciated.

Scaling the values might help you: convert 1000000 to 1, 2000000 to 2, and so on, by dividing the values by 1000000. Or use a different scale, such as a logarithmic scale. I'm no expert, just a newbie, but I think this might help.
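A small sketch of the scaling idea, plus one alternative that leaves the data alone and only changes how the tick labels are printed. The x and y values are made up to sit in the same range as the question's ticks, and sci_label is a hypothetical helper name; FuncFormatter comes from matplotlib.ticker. Treat this as one possible approach, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Made-up values in the same range as the tick labels in the question
x = [0, 1_000_000, 2_000_000, 3_000_000, 4_000_000]
y = [11.9, 0.58, 0.47, 0.66, 0.99]

fig, (ax_scaled, ax_sci) = plt.subplots(1, 2)

# Option 1: divide by 1000000 and state the unit in the axis label
x_millions = [value / 1_000_000 for value in x]
ax_scaled.plot(x_millions, y, color="green")
ax_scaled.set_xlabel("column 0 (millions)")

# Option 2: keep the data unchanged and format each tick label in
# scientific notation with no decimals
def sci_label(value, position):
    return f"{value:.0e}"

ax_sci.plot(x, y, color="green")
ax_sci.xaxis.set_major_formatter(FuncFormatter(sci_label))
```

The second option is closer to what the question asks for, since sci_label(1000000, None) returns the string 1e+06.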

Plots do not appear when calling seaborn's pairplot on a pandas Dataframe

I have a Dataframe that looks like so
Price Mileage Age
4250 71000 8
6500 43100 6
26950 10000 3
1295 78000 17
5999 61600 8
This is assigned to dataset. I simply call sns.pairplot(dataset) and I'm left with just a single graph - the distribution of prices across my dataset. I expected a 3x3 grid of plots.
When I import a pre-configured dataset from seaborn I get the expected multiplot pair plot.
I'm new to seaborn so apologies if this is a silly question, but what am I doing wrong? It seems like a simple task.

From your comment, it seems like you're trying to plot on non-numeric columns. Try coercing them first:
dataset = dataset.apply(lambda x: pd.to_numeric(x, errors='coerce'))
sns.pairplot(dataset)
The errors='coerce' argument replaces non-coercible values (the reason your columns have dtype object in the first place) with NaN.
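To see what the coercion does before handing the frame to sns.pairplot, here is a pandas-only sketch. It assumes the columns were read in as text; the "n/a" entry in Mileage is a made-up bad value that I substituted for illustration, and the name numeric is my own.

```python
import pandas as pd

# Rows from the question stored as strings; "n/a" is a made-up bad entry
dataset = pd.DataFrame({
    "Price": ["4250", "6500", "26950", "1295", "5999"],
    "Mileage": ["71000", "43100", "n/a", "78000", "61600"],
    "Age": ["8", "6", "3", "17", "8"],
})
print(dataset.dtypes)  # none of the columns is numeric yet

# Convert every column; values that cannot be parsed become NaN
numeric = dataset.apply(lambda column: pd.to_numeric(column, errors="coerce"))
print(numeric.dtypes)
print(numeric["Mileage"].isna().sum())  # counts the coerced "n/a" entry
```

Once every column is numeric, sns.pairplot(numeric) has three variables to pair up instead of one.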

Why are my histogram bars all displaying frequencies of 1

I have a series (114 rows) with indexed timestamps and percentages (astype float).
testseries.head()
Out[100]:
Timestamps
2018-04-19 13:23:57-04:00 0.000161238
2018-04-06 13:59:50-04:00 -0.0169348
2018-04-04 11:39:41-04:00 0.0475188
2018-04-03 14:53:37-04:00 -0.00231244
2018-03-29 14:09:57-04:00 0.0209815
Name: Change, dtype: object
I'm trying to create a histogram of the distribution of these, as I've done several times before, but am getting an unexpected result when I call
testseries.hist()
link to image of output hist
I've tried various options, like setting density=True, changing the number of bins, or plotting in matplotlib vs. pandas, but the result is always a series of thin bars with height equal to the maximum on the y-axis.
What's causing this?

The histogram is correctly showing you that each value appears once. In order to show something smoother, you might want to group the values into bins, count the entries in each bin, and plot the result:
testseries.groupby(pd.cut(testseries.astype(float), 10)).count().plot.bar()
Example
import pandas as pd
import numpy as np
testseries = pd.Series(np.random.randn(100000))
testseries.groupby(pd.cut(testseries.astype(float), 10)).count().plot.bar()
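One more thing that may be worth checking: the head() output in the question reports dtype: object rather than float. The sketch below assumes that is part of the problem and converts the series before calling hist(). The 114 values are randomly generated stand-ins, not the asker's data, and as_float and bar_heights are my own names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import numpy as np
import pandas as pd

# Made-up stand-in: 114 small changes stored with dtype object
rng = np.random.default_rng(0)
testseries = pd.Series(rng.normal(0, 0.02, 114), name="Change").astype(object)

# Convert to a real float dtype before binning
as_float = testseries.astype(float)

# Series.hist bins the values; each bar height is the count in that bin
ax = as_float.hist(bins=10)
bar_heights = [patch.get_height() for patch in ax.patches]
```

With a float series the bar heights are per-bin counts that add up to the number of rows, rather than a row of bars all of the same height.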
