I'm reading a pandas dataframe, and trying to generate a plot from it. In the plot, the data points seem to be getting connected in an order determined by ascending y value, resulting in a weird zig-zagging plot like this:
The code goes something like this:
from pandas import DataFrame as df
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
data = df.from_csv(...)
plt.plot(data['COL1'], data['COL2'])
Any suggestions on how to fix the order in which the dots are connected (i.e. connect them in the sequence in which they appear going from left to right on the plot)? Thanks.
Is the order of values in COL1 different from the csv?
You can sort by COL1 first, add this before plotting:
data.sort('COL1', inplace=True)
Related
I have an excel file with the following data:
My code so far:
import pandas as pd
import matplotlib as plt
df=pd.read_excel('file.xlsx', header=[0,1], index_col=[0])
Firstly, am I reading in my file correctly to have a multi index using Main (A,B,C) as the first level and Value (X,Y) as the second level.
Using Pandas and Matplotlib - how do I plot individual scatter plot for Main (A,B,C) with each x,y as the scatter values (imaged below) . I can do it messily calling each column in an individual plot function.
Is there a nicer way to do it with multi-indexing or group by?
This should help:
df = df.set_index(['Main1', 'Main2']).value
df.unstack.plot(kind='line', stacked=True)
Here is my problem
This is a sample of my two DataFrames (I have 30 columns in reality)
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({"Marc":[6,0,8,-30,-15,0,-3],
"Elisa":[0,1,0,-1,0,-2,-4],
"John":[10,12,24,-20,7,-10,-30]})
df1 = pd.DataFrame({"Marc":[8,2,15,-12,-8,0,-35],
"Elisa":[4,5,7,0,0,1,-2],
"John":[20,32,44,-30,15,-10,-50]})
I would like to create a scatter plot with two different colors :
1 color if the scores of df1 are negative and one if they are positive, but I don't really know how to do it.
I already did that by using matplotlib
plt.scatter(df,df1);
And I also checked this link Link but the problem is that I have two Pandas Dataframe
and not numpy array as on this link. Hence the I can't use the c= np.sign(df.y) method.
I would like to keep Pandas DataFrame as I have many columns but I really stuck on that.
If anyone has a solution, you are welcome!
You can pass the color array in, but it seems to work with 1D array only:
# colors as stated
colors = np.where(df1<0, 'C0', 'C1')
# stack and ravel to turn into 1D
plt.scatter(df.stack(),df1.stack(), c=colors.ravel())
Output:
This question already has an answer here:
seaborn two corner pairplot
(1 answer)
Closed 1 year ago.
I wanted to do a pairplot with two different dataframes data_up and data_low on the lower part and the upper part of the pairgrid. The two dataframes have both 4 columns, wich correspond to the variables.
Looking at Pairgrid, i did not found a way to give different data to each triangle.
e.g :
import numpy as np
import seaborn as sns
import pandas as pd
# Dummy data :
data_up = np.random.uniform(size=(100,4))
data_low = np.random.uniform(size=(100,4))
# The two pairplots i currently uses and want to mix :
sns.pairplot(pd.DataFrame(data_up))
sns.pairplot(pd.DataFrame(data_low))
How can i have only the upper triangle of the first one plotted witht he lower traingle of the second one ? On the diagonal i dont really care what's plotted. Maybe a qqplot between the two corresponding marginals could be nice, but i'll see later.
You could try to put all columns together in the dataframe, and then use x_vars=... to tell which columns to use for the x-direction. Similar for y.
import numpy as np
import seaborn as sns
import pandas as pd
# Dummy data :
data_up_down = np.random.uniform(size=(100,8))
df = pd.DataFrame(data_up_down)
# use columns 0..3 for the x and 4..7 for the y
sns.pairplot(df, x_vars=(0,1,2,3), y_vars=(4,5,6,7))
import matplotlib.pyplot as plt
plt.show()
I'm looking to make a stacked area plot over time, based on summary data created by groupby and sum.
The groupby and sum part correctly groups and sums the data I want, but it seems the resultant format is nonsense in terms of plotting it.
I'm not sure where to go from here:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.DataFrame({'invoice':[1,2,3,4,5,6],'year':[2016,2016,2017,2017,2017,2017],'part':['widget','wonka','widget','wonka','wonka','wonka'],'dollars':[10,20,30,10,10,10]})
#drop the invoice number from the data since we don't need it
df=df[['dollars','part','year']]
#group by year and part, and add them up
df=df.groupby(['year','part']).sum()
#plotting this is nonsense:
df.plot.area()
plt.show()
to chart multiple series, its easiest to have each series organized as a separate column, i.e. replace
df=df.groupby(['year','part']).sum()
with
df=df.groupby(['year', 'part']).sum().unstack(-1)
Then the rest of the code should work. But, I'm not sure if this is what you need because the desired output is not shown.
df.plot.area() then produces the chart like
I am creating a stacked area chart using pandas df.plot(kind = area). Some of my data values are zero at some times. I would like to not have the line show where the value is zero. Is it possible to hide the line while still showing the area?
Here is basic code that makes a simple graph. I don't want the red line to show between 3 and 4 because the values are 0.
import numpy as np
import pandas as pd
data = np.array([np.arange(10)]*3).T
df = pd.DataFrame(data, columns = ['A','B','C'])
df['C']=np.where(df.index==4,0,df['C'])
df['C']=np.where(df.index==3,0,df['C'])
df.plot(kind='area')
I have finally worked out the solution to this. Other places suggested edgecolor etc but it didn't solve the problem. linewidth, however, does.
linewidth=0
or, in your case, use the line of code:
df.plot(kind='area', linewidth=0)