My simple DataFrame produces a plot with 4 separate horizontal bars, rather than one stacked horizontal bar. I've tried transposing it, etc., without success. I'm sure I'm doing something simple wrong, but I can't work it out. Help much appreciated!
import pandas as pd
import matplotlib.pyplot as plt
fake_data = [['dogs',12],['cats',8],['fish',22],['bird',8]]
myDF = pd.DataFrame(fake_data)
myDF.columns = ['animals','count']
myDF.plot.barh(stacked=True)
plt.show()
I think you need to create a one-row DataFrame with Series.to_frame and then transpose it with DataFrame.T:
myDF.set_index('animals')['count'].to_frame().T.plot.barh(stacked=True)
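Why this works: pandas draws one bar per row and one stacked segment per column, so the four animals have to become four columns of a single row. A minimal, self-contained sketch of the same idea follows; the name one_row and the Agg backend (used only so the script runs without a display) are my additions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

fake_data = [['dogs', 12], ['cats', 8], ['fish', 22], ['bird', 8]]
myDF = pd.DataFrame(fake_data, columns=['animals', 'count'])

# Reshape to a single row whose columns are the animals:
# index = ['count'], columns = ['dogs', 'cats', 'fish', 'bird']
one_row = myDF.set_index('animals')['count'].to_frame().T

# One row -> one bar; four columns -> four stacked segments
ax = one_row.plot.barh(stacked=True)
plt.savefig('stacked_bar.png')
```

Printing one_row before plotting is a quick way to check that the reshape produced one row and four columns.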
I have a question about plotly parallel_coordinates function.
To reproduce my case use this code:
import numpy as np
import pandas as pd
import plotly.express as px
np.random.seed(100)
data = np.random.random([1000,2])
data_df = pd.DataFrame(data, columns=["data1","data2"])
fig_parallel = px.parallel_coordinates(data_df,
color="data2",
dimensions=["data1","data2"])
fig_parallel.write_html('test.html')
You will get this image:
The color bar corresponds to the last axis (data2).
We can notice that the traces going to the top of the last axis (the yellow ones) are in front of the plot.
What I want to have is the same plot but with the blue traces in the front of the image to have a better visualisation (bottom data are my data of interest).
Thank you in advance to any one who may be able to give me a solution ^^
Here is my problem
This is a sample of my two DataFrames (I have 30 columns in reality)
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({"Marc":[6,0,8,-30,-15,0,-3],
"Elisa":[0,1,0,-1,0,-2,-4],
"John":[10,12,24,-20,7,-10,-30]})
df1 = pd.DataFrame({"Marc":[8,2,15,-12,-8,0,-35],
"Elisa":[4,5,7,0,0,1,-2],
"John":[20,32,44,-30,15,-10,-50]})
I would like to create a scatter plot with two different colors :
1 color if the scores of df1 are negative and one if they are positive, but I don't really know how to do it.
I already did that by using matplotlib
plt.scatter(df,df1);
I also checked a linked answer, but the problem is that I have two pandas DataFrames
and not NumPy arrays as in that answer. Hence I can't use the c=np.sign(df.y) method.
I would like to keep the pandas DataFrames, as I have many columns, but I am really stuck on this.
If anyone has a solution, you are welcome!
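You can pass the color array in, but it seems to work with 1D arrays only:
import numpy as np
# one color name per value of df1: 'C0' where negative, 'C1' otherwise
colors = np.where(df1<0, 'C0', 'C1')
# stack and ravel to turn everything into 1D, in the same row-by-row order
plt.scatter(df.stack(),df1.stack(), c=colors.ravel())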
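For reference, here is a self-contained sketch of the same approach using the sample data from the question. It flattens both frames with to_numpy().ravel() instead of stack(), which is a small variation of mine and assumes both frames have the same shape and column order; the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"Marc": [6, 0, 8, -30, -15, 0, -3],
                   "Elisa": [0, 1, 0, -1, 0, -2, -4],
                   "John": [10, 12, 24, -20, 7, -10, -30]})
df1 = pd.DataFrame({"Marc": [8, 2, 15, -12, -8, 0, -35],
                    "Elisa": [4, 5, 7, 0, 0, 1, -2],
                    "John": [20, 32, 44, -30, 15, -10, -50]})

# Flatten both frames row by row so that x[i] and y[i] come from the same cell
x = df.to_numpy().ravel()
y = df1.to_numpy().ravel()

# 'C0' where the df1 score is negative, 'C1' where it is zero or positive
colors = np.where(y < 0, "C0", "C1")

fig, ax = plt.subplots()
ax.scatter(x, y, c=colors)
fig.savefig("scatter_two_colors.png")
```

Note that zero counts as "positive" here; change the condition to y <= 0 if zeros should share the negative color.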
I am trying to plot my steps as a scatter graph and then eventually add a trend line.
I managed to get it to work with df.plot() but it is a line chart.
The following is the code I have tried:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data_file = pd.read_csv('CSV/stepsgyro.csv')
# print(data_file.head())
# put in the correct data types
data_file = data_file.astype({"steps": int})
# assign the result back, otherwise the conversion is discarded
data_file['date'] = pd.to_datetime(data_file['date'])
# makes the date definitely the index at the bottom
data_file.set_index(['date'], inplace=True)
# sorts the data frame by the index
data_file.sort_values(by=['date'], inplace=True, ascending=True)
# data_file.columns.values[1] = 'date'
# plot the raw steps data
# data_file.plot()
plt.scatter(data_file.date, data_file.steps)
plt.title('Daily Steps')
plt.grid(alpha=0.3)
plt.show()
plt.close('all')
# plot the cumulative steps data
data_file = data_file.cumsum()
data_file.plot()
plt.title('Cumulative Daily Steps')
plt.grid(alpha=0.3)
plt.show()
plt.close('all')
and here is a screenshot of what it's looking like on my IDE:
any guidance would be greatly appreciated!
You have set the index to be the "date" column. From that moment on, there is no "date" column anymore, hence data_file.date fails.
Two options:
Don't set the index. Sorting doesn't seem to be needed anyways.
Plot the index, plt.scatter(data_file.index, data_file.steps)
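The second option can be sketched as follows. The CSV is not available, so the small inline frame below is a hypothetical stand-in that only assumes the 'date' and 'steps' columns mentioned in the question; the Agg backend is there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for CSV/stepsgyro.csv
data_file = pd.DataFrame({
    "date": ["2020-01-03", "2020-01-01", "2020-01-02"],
    "steps": [4200, 3500, 5100],
})

# Convert to datetimes, move 'date' into the index, and sort by it
data_file["date"] = pd.to_datetime(data_file["date"])
data_file = data_file.set_index("date").sort_index()

# 'date' is now the index, so use data_file.index rather than data_file.date
fig, ax = plt.subplots()
ax.scatter(data_file.index, data_file["steps"])
ax.set_title("Daily Steps")
ax.grid(alpha=0.3)
fig.savefig("daily_steps.png")
```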
I can't figure out just by looking at your example why you are getting that error. However, I can offer a quick and easy solution to plotting your data:
data_file.plot(marker='.', linestyle='none')
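You can use df.plot(kind='scatter', x='date', y='steps') to avoid the line chart. Note that kind='scatter' requires both x and y to be given as column names, so 'date' must still be a column rather than the index.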
I'm looking to make a stacked area plot over time, based on summary data created by groupby and sum.
The groupby and sum part correctly groups and sums the data I want, but it seems the resultant format is nonsense in terms of plotting it.
I'm not sure where to go from here:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.DataFrame({'invoice':[1,2,3,4,5,6],'year':[2016,2016,2017,2017,2017,2017],'part':['widget','wonka','widget','wonka','wonka','wonka'],'dollars':[10,20,30,10,10,10]})
#drop the invoice number from the data since we don't need it
df=df[['dollars','part','year']]
#group by year and part, and add them up
df=df.groupby(['year','part']).sum()
#plotting this is nonsense:
df.plot.area()
plt.show()
To chart multiple series, it's easiest to have each series organized as a separate column, i.e. replace
df=df.groupby(['year','part']).sum()
with
df=df.groupby(['year', 'part']).sum().unstack(-1)
Then the rest of the code should work. But, I'm not sure if this is what you need because the desired output is not shown.
df.plot.area() then produces a stacked area chart with the years on the x-axis and one area per part.
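A minimal sketch of the reshaping step, using the data from the question. Selecting the 'dollars' column before summing is a small variation of mine that keeps the resulting column labels flat; the name wide and the Agg backend are also my additions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'invoice': [1, 2, 3, 4, 5, 6],
    'year': [2016, 2016, 2017, 2017, 2017, 2017],
    'part': ['widget', 'wonka', 'widget', 'wonka', 'wonka', 'wonka'],
    'dollars': [10, 20, 30, 10, 10, 10],
})

# Sum dollars per (year, part), then pivot 'part' into columns:
# rows = years, columns = parts, one column per plotted series
wide = df.groupby(['year', 'part'])['dollars'].sum().unstack('part')
print(wide)

ax = wide.plot.area()
plt.savefig('stacked_area.png')
```

If some part has no invoices in a given year, unstack leaves a NaN in that cell; passing fill_value=0 to unstack is one way to treat that as zero.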
I am creating a stacked area chart using pandas df.plot(kind='area'). Some of my data values are zero at some times. I would like the line not to show where the value is zero. Is it possible to hide the line while still showing the area?
Here is basic code that makes a simple graph. I don't want the red line to show between 3 and 4 because the values are 0.
import numpy as np
import pandas as pd
data = np.array([np.arange(10)]*3).T
df = pd.DataFrame(data, columns = ['A','B','C'])
df['C']=np.where(df.index==4,0,df['C'])
df['C']=np.where(df.index==3,0,df['C'])
df.plot(kind='area')
I have finally worked out the solution to this. Other places suggested edgecolor, etc., but that didn't solve the problem. linewidth, however, does.
linewidth=0
or, in your case, use the line of code:
df.plot(kind='area', linewidth=0)
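Put together with the example data from the question, a runnable sketch looks like this. The Agg backend and the output file name are my additions so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = np.array([np.arange(10)] * 3).T
df = pd.DataFrame(data, columns=['A', 'B', 'C'])

# Zero out column C at index 3 and 4, as in the question
df['C'] = np.where(df.index.isin([3, 4]), 0, df['C'])

# linewidth=0 hides the boundary lines while the filled areas are still drawn
ax = df.plot(kind='area', linewidth=0)
plt.savefig('area_no_lines.png')
```

This hides the outline for every series, not only where the value is zero, so the areas are then distinguished by fill color alone.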