I'm trying to create a plot to show profit over time through a pandas dataframe. Here are the steps I have taken:
profit_data=agg_data.groupby(['segment','yyyy_mm_dd'])[["profit"]].sum()
profit_data
This gives me a dataframe similar to the below:
profit
segment yyyy_mm_dd
Core 2019-06-01 100
2019-06-02 100
2019-06-03 100
2019-06-04 100
2019-06-05 100
NonCore 2019-06-07 100
2019-06-08 100
2019-06-09 100
2019-06-10 100
...
...
I then try to plot this using matplotlib:
profit_data.plot()
The above does generate a plot, but my segments are drawn as one continuous line rather than two separate lines (one for each segment). What do I need to change so that each segment is plotted as its own line?
Your output dataframe has only one column, so only one line gets plotted.
To solve this, reshape the dataframe so that it has one column per segment, then plot it:
profit_data["profit"].unstack(level="segment").plot()
Alternatively, select each segment from the index and plot both on the same axes:
ax = profit_data.loc["Core"].plot(y="profit", label="Core")
profit_data.loc["NonCore"].plot(y="profit", label="NonCore", ax=ax)
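For reference, here is a minimal self-contained sketch of the unstack approach. The small `agg_data` frame is made up for illustration and only assumed to have the same columns as yours; `wide` is just a name chosen here.

```python
import pandas as pd

# Hypothetical stand-in for agg_data; the real columns are assumed to match.
agg_data = pd.DataFrame({
    "segment": ["Core"] * 3 + ["NonCore"] * 3,
    "yyyy_mm_dd": pd.to_datetime(
        ["2019-06-01", "2019-06-02", "2019-06-03"] * 2
    ),
    "profit": [100, 110, 120, 90, 95, 105],
})

profit_data = agg_data.groupby(["segment", "yyyy_mm_dd"])[["profit"]].sum()

# Selecting the "profit" column first keeps the unstacked columns flat:
# one column per segment, indexed by date.
wide = profit_data["profit"].unstack(level="segment")

ax = wide.plot()  # draws one line per column, i.e. one per segment
ax.set_ylabel("profit")
# Call matplotlib.pyplot.show() to display the figure in a script.
```

If the segments cover different dates, as in your sample, the unstacked frame will contain NaN where a segment has no data, and each line is simply drawn only over its own dates.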
Hope this helps!
I have a pandas DataFrame of 8664 rows, named frame_3_LW.
frame_3_LW contains the following columns of importance to me: EASTVEL , NORTHVEL, Z_NAP, DATE+TIME. Definitions of the columns are:
EASTVEL = Flow of current, where negative (-) values are west and positive (+) values are east.
NORTHVEL = Flow of current, where negative (-) values are south and positive (+) values are north.
Z_NAP = Depth of water
DATE+TIME = Date + time in this format: 2021-11-17 10:00:00
The problem I encounter is the following: I want to generate a plot with EASTVEL on the x-axis and Z_NAP on the y-axis.
I used a simple:
frame_3_LW.plot(x='EASTVEL', y='Z_NAP')
However, because I have so many values, I get an unclear plot with a lot of lines. I would like just one line describing the course of EASTVEL (x-axis) against Z_NAP (y-axis). Can anybody help me with that? It would be such a big help!
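One possible way to get a single line, sketched under the assumption that the clutter comes from the same depths being measured at many timestamps: average EASTVEL per depth and plot the sorted result. The data below is synthetic and `frame` and `profile` are names chosen for illustration; whether a per-depth mean is the right summary depends on your measurements.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for frame_3_LW: 5 depths measured at 4 timestamps.
rng = np.random.default_rng(0)
depths = np.tile([-1.0, -2.0, -3.0, -4.0, -5.0], 4)
frame = pd.DataFrame({
    "Z_NAP": depths,
    "EASTVEL": 0.1 * depths + rng.normal(scale=0.05, size=depths.size),
})

# Average the velocity at each depth, then sort by depth so the line
# is drawn in depth order instead of jumping between rows.
profile = frame.groupby("Z_NAP", as_index=False)["EASTVEL"].mean()
profile = profile.sort_values("Z_NAP")

ax = profile.plot(x="EASTVEL", y="Z_NAP", legend=False)
ax.set_xlabel("EASTVEL")
ax.set_ylabel("Z_NAP")
```

If you would rather see one profile per moment in time, filtering the frame on a single DATE+TIME value before plotting would be the alternative.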
I'm working with a covid dataset for some Python exercises I am working through to try to learn. I've loaded it in the usual way:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("C:/Users/Desktop/Python Short Course/diagnosis.csv")
This dataset has two columns called BodyTemp and SpO2, and I want to show how they move together: when the values in the BodyTemp column rise, so do the values in the SpO2 column. I had thought of a bar chart like:
plt.xlabel("BodyTemp")
plt.ylabel("SpO2")
plt.bar(x = df["BodyTemp"], height = df["SpO2"])
plt.show()
but all the bars are very close together and it just doesn't look great, so what would be a better way to do this? Or would there be a better approach to show the visualisation of the distribution of values?
Edit to show data:
BodyTemp SpO2
37.6 85
38.9 93
38.5 92
37 80
I've added a table showing the first few rows. There are many more, but it gives an idea of the data.
You need to change the scale of the y-axis. Try this:
plt.ylim((df['SpO2'].min()-.5, df['SpO2'].max()+.5))
If this doesn't work, it's probably because the SpO2 column contains some very small values. What look like gaps between the bars may be small values that are distorting the plot. Try removing them from the dataframe.
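As a different option from the bar chart, a scatter plot is a common way to show how two numeric columns move together. This is only a sketch using the four sample rows from the question; the real file has many more rows, so the correlation shown here says nothing about the full data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The four sample rows shown in the question; the real file has many more.
df = pd.DataFrame({
    "BodyTemp": [37.6, 38.9, 38.5, 37.0],
    "SpO2": [85, 93, 92, 80],
})

fig, ax = plt.subplots()
ax.scatter(df["BodyTemp"], df["SpO2"])
ax.set_xlabel("BodyTemp")
ax.set_ylabel("SpO2")

# Pearson correlation as a single-number summary of the relationship.
r = df["BodyTemp"].corr(df["SpO2"])
ax.set_title(f"Pearson r = {r:.2f}")
# Call plt.show() to display the figure in a script.
```

Each point is one row, so overlapping bars are no longer an issue, and a rising cloud of points shows the two columns increasing together.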
I'm trying to create plots in Python. The first rows of the dataset, named Psmc_dolphin, look like the below. The original file has 57 rows and 5 columns.
0 1 2 3 4
0 0.000000e+00 11.915525 299.807861 0.000621 0.000040
1 4.801704e+03 11.915525 326.288712 0.000675 0.000311
2 1.003041e+04 11.915525 355.090418 0.000735 0.000497
3 1.572443e+04 11.915525 386.413025 0.000800 0.000548
4 2.192481e+04 0.583837 8508.130872 0.017613 0.012147
5 2.867635e+04 0.583837 9092.811889 0.018823 0.014021
6 3.602925e+04 0.466402 12111.617016 0.025073 0.019815
7 4.403533e+04 0.466402 12826.458632 0.026553 0.021989
8 5.275397e+04 0.662226 9587.887034 0.019848 0.017158
9 6.224833e+04 0.662226 10201.024439 0.021118 0.018877
10 7.258698e+04 0.991930 7262.773560 0.015035 0.013876
I'm trying to plot column 0 on the x-axis and column 1 on the y-axis. I get a plot with x-axis values 1000000, 2000000, 3000000, 4000000, etc. The code I used is below.
I need to adjust the x-axis so that it shows values such as 1e+06, 2e+06, 3e+06, etc. instead of 1000000, 2000000, 3000000, 4000000, etc.
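import pandas as pd
import matplotlib.pyplot as plt
# load the dataset (tab-separated, no header row)
Psmc_dolphin = pd.read_csv('Beluga_mapped_to_dolphin.0.txt', sep="\t", header=None)
# plot column 0 on the x-axis against column 1 on the y-axis
plt.plot(Psmc_dolphin[0], Psmc_dolphin[1], color='green')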
Any help or suggestion will be appreciated.
Scaling the values might help you. Divide the values by 1000000, so that 1000000 becomes 1, 2000000 becomes 2, and so on. Or use a different scale, such as a logarithmic scale. I am no expert, just a newbie, but I think this might help.
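If the goal is only to change how the tick labels are written, without rescaling the data, one option is a custom tick formatter. This is a sketch on synthetic data standing in for columns 0 and 1 of Psmc_dolphin; `x`, `y` and `sci` are names made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Synthetic stand-in for columns 0 and 1 of Psmc_dolphin.
x = np.linspace(0, 4e6, 57)
y = 12.0 / (1.0 + x / 1e6)

fig, ax = plt.subplots()
ax.plot(x, y, color="green")

# Format each x tick in exponent notation, e.g. 1000000 -> "1e+06".
sci = FuncFormatter(lambda value, pos: f"{value:.0e}")
ax.xaxis.set_major_formatter(sci)
# Call plt.show() to display the figure in a script.
```

For the logarithmic scale mentioned above, `ax.set_xscale("log")` would be the call to try instead. Note that a log axis cannot show the first row, where x is 0.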
I have a Dataframe that looks like so
Price Mileage Age
4250 71000 8
6500 43100 6
26950 10000 3
1295 78000 17
5999 61600 8
This is assigned to dataset. I simply call sns.pairplot(dataset) and I'm left with just a single graph - the distribution of prices across my dataset. I expected a 3x3 grid of plots.
When I import a pre-configured dataset from seaborn I get the expected multiplot pair plot.
I'm new to seaborn so apologies if this is a silly question, but what am I doing wrong? It seems like a simple task.
From your comment, it seems like you're trying to plot on non-numeric columns. Try coercing them first:
dataset = dataset.apply(lambda x: pd.to_numeric(x, errors='coerce'))
sns.pairplot(dataset)
The errors='coerce' argument replaces non-coercible values (the reason your columns are objects in the first place) with NaN.
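To check whether this is what is happening, it can help to look at the dtypes before and after coercing. The frame below is a hypothetical version of the dataset in which Mileage and Age were read as text and one entry ("n/a") is not a number; your actual data may differ.

```python
import pandas as pd

# Hypothetical version of the dataset: Price is numeric, the other two
# columns were read as text, and one Mileage entry is not a number.
dataset = pd.DataFrame({
    "Price": [4250, 6500, 26950, 1295, 5999],
    "Mileage": ["71000", "43100", "10000", "78000", "n/a"],
    "Age": ["8", "6", "3", "17", "8"],
})
print(dataset.dtypes)  # only Price is numeric here

# Coerce every column; values that cannot be parsed become NaN.
numeric = dataset.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)
print(numeric.isna().sum())  # how many values per column became NaN
```

If the NaN counts are large, it is worth inspecting the offending raw values before plotting, since rows with NaN may be dropped or shown as gaps.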
I have a series (114 rows) with indexed timestamps and percentages (astype float).
testseries.head()
Out[100]:
Timestamps
2018-04-19 13:23:57-04:00 0.000161238
2018-04-06 13:59:50-04:00 -0.0169348
2018-04-04 11:39:41-04:00 0.0475188
2018-04-03 14:53:37-04:00 -0.00231244
2018-03-29 14:09:57-04:00 0.0209815
Name: Change, dtype: object
I'm trying to create a histogram of the distribution of these, as I've done several times before, but am getting an unexpected result when I call
testseries.hist()
I've tried various options, like setting density=True, changing the number of bins, or plotting in matplotlib vs. pandas, but the result is always a series of thin bars with height equal to the maximum on the y-axis.
What's causing this?
The histogram is correctly showing you that each value appears once. To show something smoother, you might bin the values into equal-width intervals with pd.cut, aggregate each bin, and display the histogram of the result:
testseries.groupby(pd.cut(testseries.astype(float), 10)).sum().hist()
Example
import pandas as pd
import numpy as np
testseries = pd.Series(np.random.randn(100000))
testseries.groupby(pd.cut(testseries.astype(float), 10)).sum().hist();
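Since the series in the question reports dtype: object, another thing worth trying is converting it to float before plotting. The sketch below assumes the values are stored as strings, which the question does not confirm, and uses only the five values shown; `as_float` and `counts` are names chosen here.

```python
import pandas as pd

# The five values shown in the question, stored as strings (an assumption)
# so that the dtype is object, as in the question's output.
testseries = pd.Series(
    ["0.000161238", "-0.0169348", "0.0475188", "-0.00231244", "0.0209815"],
    name="Change",
    dtype=object,
)

# Convert to real floats; anything unparseable would become NaN.
as_float = pd.to_numeric(testseries, errors="coerce")

# Number of values falling into each of 10 equal-width bins.
counts = pd.cut(as_float, 10).value_counts(sort=False)
print(counts)

# Histogram of the numeric values, using the same number of bins.
ax = as_float.hist(bins=10)
```

Here the bars show how many values fall in each interval, so with the full 114 rows the heights should vary rather than all being equal.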