How to display values as different colours in pandas' pivot_table? - python

I am trying to recreate this graph from here
My code is:
impute_grps = data.pivot_table(values=["Loan_Status"], index=[ "Credit_History","Gender"],
aggfunc='count')
print (impute_grps)
impute_grps.plot(kind='bar', stacked=True, color=['red','blue'], grid=False)
This produces one bar per (Credit_History, Gender) group. How can I split Loan_Status by its values, as in the original graph? I have tried adding it to the index, but I get ValueError: Grouper for 'Loan_Status' not 1-dimensional.
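A minimal sketch of one way to get one colour per Loan_Status value, assuming a small made-up stand-in for data. It swaps pd.crosstab in for pivot_table: crosstab counts rows so that each Loan_Status value becomes its own column, and pandas then stacks those columns in different colours. The ValueError is probably raised because Loan_Status was listed in both values and index, which leaves pivot_table with a duplicated column to group by.

```python
import pandas as pd

# Made-up stand-in for `data`, keeping only the three columns used here.
data = pd.DataFrame({
    "Credit_History": [1.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
    "Loan_Status": ["Y", "N", "N", "Y", "N", "Y"],
})

# Rows: (Credit_History, Gender) pairs; columns: one per Loan_Status value;
# cells: number of loans in each combination.
impute_grps = pd.crosstab([data["Credit_History"], data["Gender"]], data["Loan_Status"])
print(impute_grps)

# Each column (N, Y) becomes a separately coloured segment of the stacked bars.
ax = impute_grps.plot(kind="bar", stacked=True, color=["red", "blue"], grid=False)
```

With pivot_table, a similar table should come from passing Loan_Status as columns instead of values, e.g. data.pivot_table(index=["Credit_History", "Gender"], columns="Loan_Status", aggfunc="size", fill_value=0).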

Related

Matplotlib plot plotting the wrong data values

I am trying to plot random rows of a dataset in which the values were collected on different dates. I have set up the plot so that the x-axis is labelled only at those specific dates, with no interpolation between them.
The issue is that the values matplotlib plots do not match the entries in the dataset. I am unsure what is happening here; could anyone provide some insight, and perhaps explain how to fix it?
I have attached an image of the dataset and the plot; the code is below.
The code for generating the x-ticks is as follows:
In: #creating a flat dates object such that dates are integer objects
flat_Dates_dates = flat_Dates[2:7]
flat_Dates_dates
Out: [20220620, 20220624, 20220627, 20220701, 20220708]
In: #creating datetime object(pandas, not datetime module) to only plot specific dates and remove interpolation of dates
date_obj_pd = pd.to_datetime(flat_Dates_dates, format=("%Y%m%d"))
Out: DatetimeIndex(['2022-06-20', '2022-06-24', '2022-06-27', '2022-07-01',
'2022-07-08'],
dtype='datetime64[ns]', freq=None)
As you can see from the dataset, the plotted trends should not look like that: the plotted values are wildly different from the values in the table.
Edit: Apologies, I forgot to mention that x = date_obj_pd, which is why I included that code; it is essentially just an array of datetime objects.
y is the pandas DataFrame (data table) shown in the image.
You are plotting columns instead of rows: the blue line contains elements 1:7 of the first column.
If you transpose the dataframe you should get the desired result:
plt.plot(x, y[1:7].transpose(), 'o--')
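A small sketch of why the transpose matters, using a made-up two-row table whose columns are dates (standing in for the asker's y). Given a 2-D array, plt.plot draws one line per column, so transposing turns each row into its own line. y.iloc[0:2] is used here in place of y[1:7]; both select rows by position.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up wide table: one row per sample, one column per date.
dates = pd.to_datetime(["2022-06-20", "2022-06-24", "2022-06-27"])
y = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]],
    index=["row_a", "row_b"],
    columns=dates,
)

fig, ax = plt.subplots()
# y.iloc[0:2] has shape (rows, dates); plot() draws one line per column of a
# 2-D y, so transpose to (dates, rows) to get one line per original row.
lines = ax.plot(dates, y.iloc[0:2].T, "o--")
ax.set_xticks(dates)  # ticks only at the dates that have data
```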

Plot multiple line graph from Pandas into Seaborn

I'm trying to plot a multi-line graph from a pandas dataframe using seaborn. Below is a .csv of the data and the desired plot. In Excel I simply selected the whole dataset and swapped the axes. Technically there are 110 lines (rows) in this plot, but many aren't visible because they contain only 0s.
This is my code:
individual_burst_data = {'nb001':nb001, 'nb002':nb002, 'nb003':nb003, 'nb004':nb004, 'nb005':nb005, 'nb006':nb006, 'nb007':nb007, 'nb008':nb008, 'nb009':nb009, 'nb010':nb010, 'nb011':nb011, 'nb012':nb012, 'nb013':nb013, 'nb015':nb015, 'nb016':nb016 }
ibd_panda_conv = pd.DataFrame(individual_burst_data)
sns.lineplot(data = ibd_panda_conv, x = individual_burst_data, y =ibd_panda_conv)
Other sources seem to only extract one column, whereas I need all the columns.
I tried to create an index for the y-axis
index_data = list(range(0,len(individual_burst_data)))
but this didn't work either.
The seaborn lineplot() documentation says:
Passing the entire wide-form dataset to data plots a separate line for each column
Since you want a line for each row instead, you need to transpose your dataframe, so try this:
sns.lineplot(data=ibd_panda_conv.T, dashes=False)

How can you set the x-axis in matplotlib?

I have data of shipping dates (1=Jan, 2=Feb, etc.) and the corresponding revenue in a pandas dataframe.
My code for the line graph that I am trying to make is:
finalhelp.plot(x='shippeddate',y='revenue',title='Revenue Per Month')
It returns a line graph like this
I tried to fix it by using the code
fig = finalhelp.plot(x='shippeddate',y='revenue',title='Revenue Per Month',yticks=([0,20000,40000,60000,80000,100000]), legend=False,)
fig.set_xticklabels(['','Jan','Feb','March','April','May','June','July','August','Sept','Oct','Nov','Dec'])
I would like each x-axis tick to show the corresponding month, but right now the axis still shows only Jan-June.
It returns this image
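You need both set_xticks and set_xticklabels. set_xticklabels only relabels the ticks that already exist, and the default axis likely has fewer than twelve ticks (e.g. one every two months), which would explain why only Jan-June appear. Place a tick at every month first, then label them:
fig.set_xticks(finalhelp['shippeddate'])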
fig.set_xticklabels(['Jan','Feb','March','April','May','June','July','August','Sept','Oct','Nov','Dec'])
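A runnable sketch of the same fix, assuming a made-up finalhelp with shippeddate 1 to 12 and placeholder revenue values (not real data). The order matters: set_xticks fixes where the twelve ticks are, and set_xticklabels then names them one-to-one.

```python
import pandas as pd

# Made-up stand-in for finalhelp: shippeddate 1-12 (Jan-Dec), placeholder revenue.
finalhelp = pd.DataFrame({
    "shippeddate": range(1, 13),
    "revenue": [20000 + 5000 * m for m in range(12)],
})
month_labels = ["Jan", "Feb", "March", "April", "May", "June",
                "July", "August", "Sept", "Oct", "Nov", "Dec"]

ax = finalhelp.plot(x="shippeddate", y="revenue", title="Revenue Per Month", legend=False)
# First put one tick at every month, then give those 12 ticks their names.
ax.set_xticks(finalhelp["shippeddate"])
ax.set_xticklabels(month_labels)
```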

Matplotlib 3D plot colors from different classes from Dataframe

I am trying to make a 3D plot in Matplotlib from point-cloud data that comes from two different classes.
However, I cannot differentiate the classes into different colors. My code is below.
x=pd.DataFrame(np.array(x).reshape(-1,1))
y=pd.DataFrame( np.array(y).reshape( -1, 1 ) )
z=pd.DataFrame(np.array(z).reshape(-1,1))
target=pd.DataFrame(np.array(target).reshape(-1,1))
new_data=[x,y,z,target]
new_data = pd.concat(new_data, axis=1, ignore_index=True )
new_data.columns = ['x','y','z','target']
colors=[]
fig=plt.figure(figsize=(8,8))
ax=fig.add_subplot(111,projection='3d')
ax.scatter(new_data.x,new_data.y,new_data.z,color='target')
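The color argument cannot be linked to the class in the 'target' column of my dataframe. Is there something I am missing?
I found the answer myself: map the values of the target column to colours and pass the result as the colour argument, using col=new_data['target'].map({'Variable1':'r','Variable2 ':'g','Variable3':'b'})
With color='target' you are asking matplotlib to take the colour from the literal string 'target', not from the column. Change it to c=new_data.target. This works directly when target holds numeric class labels, which are then mapped through the colormap; string labels need to be mapped to colours first, as in the mapping above.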
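A sketch of the mapping approach, assuming a small made-up point cloud whose target column holds string class names (the names Variable1 to Variable3 come from the mapping above; the coordinates are invented). The mapped Series gives one colour per point, which is passed as c.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up point cloud with a string class label per point.
new_data = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0],
    "y": [0.0, 1.0, 0.0, 1.0],
    "z": [1.0, 2.0, 3.0, 4.0],
    "target": ["Variable1", "Variable2", "Variable1", "Variable3"],
})

# One matplotlib colour code per point, looked up from its class label.
col = new_data["target"].map({"Variable1": "r", "Variable2": "g", "Variable3": "b"})

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection="3d")
sc = ax.scatter(new_data.x, new_data.y, new_data.z, c=col.tolist())
```

Any label missing from the dictionary maps to NaN, so it is worth checking col.isna().any() before plotting.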

Heatmap with specific axis labels coloured

I am trying to plot a heatmap with 2 columns of data from a pandas dataframe. However, I would like to use a 3rd column to label the x-axis, ideally by colour, though another method such as an additional axis would be equally suitable. My dataframe is:
MUT   SAMPLE  VAR            GROUP
True  s1      1_1334442_T    CC002
True  s2      1_1334442_T    CC006
True  s1      1_1480354_GAC  CC002
True  s2      1_1480355_C    CC006
True  s2      1_1653038_C    CC006
True  s3      1_1730932_G    CC002
...
To give a better idea of the data: there are 9 distinct values of 'GROUP', ~60,000 of 'VAR' and 540 'SAMPLE's. I am not sure this is the best way to build a heatmap in Python, but here is what I have figured out so far:
pivot = pd.crosstab(df_all['VAR'],df_all['SAMPLE'])
sns.set(font_scale=0.4)
g = sns.clustermap(pivot, row_cluster=False, yticklabels=False, linewidths=0.1, cmap="YlGnBu", cbar=False)
plt.show()
I am not sure how to get 'GROUP' to display along the x-axis, either as an additional axis or just by colouring the axis labels. Any help would be much appreciated.
I'm not sure whether the boolean 'MUT' column is an issue here. df_all is True for every 'VAR', but when the pivot is made, samples that do not have a particular 'VAR' are filled with 0 and the others with 1. My aim is to cluster samples with similar 'VAR' profiles. I hope this helps.
Please let me know if I can clarify anything further. Many thanks.
Take a look at this example. You can pass a list or a dataframe column to the clustermap function: the col_colors argument colours the columns and the row_colors argument colours the rows, based on that list.
In the example below I use the iris dataset and make a pandas series object that specifies which colour the specific row should have. That pandas series is given as an argument for row_colors.
iris = sns.load_dataset("iris")
species = iris.pop("species")
lut = dict(zip(species.unique(), "rbg"))
row_colors = species.map(lut)
g = sns.clustermap(iris, row_colors=row_colors,row_cluster=False)
This draws a colour strip next to the heatmap rows, coloured by species.
You may need some further tweaking to also add a legend for the group colours.
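Applying the same idea to the question, assuming each SAMPLE belongs to exactly one GROUP: since the samples are the columns of the crosstab, the group colours go in col_colors rather than row_colors. The miniature df_all below reuses the rows shown in the question; the Set1 palette and placing the legend on the column-dendrogram axes are choices made for this sketch, not part of the original answer.

```python
import pandas as pd
import seaborn as sns
from matplotlib.patches import Patch

# Miniature df_all built from the rows shown in the question.
df_all = pd.DataFrame({
    "MUT": [True] * 6,
    "SAMPLE": ["s1", "s2", "s1", "s2", "s2", "s3"],
    "VAR": ["1_1334442_T", "1_1334442_T", "1_1480354_GAC",
            "1_1480355_C", "1_1653038_C", "1_1730932_G"],
    "GROUP": ["CC002", "CC006", "CC002", "CC006", "CC006", "CC002"],
})
pivot = pd.crosstab(df_all["VAR"], df_all["SAMPLE"])  # VAR rows x SAMPLE columns

# One GROUP per SAMPLE (assumes each sample belongs to exactly one group).
sample_group = df_all.drop_duplicates("SAMPLE").set_index("SAMPLE")["GROUP"]
groups = sample_group.unique()
lut = dict(zip(groups, sns.color_palette("Set1", len(groups)).as_hex()))
col_colors = sample_group.map(lut).reindex(pivot.columns)  # aligned to the samples

g = sns.clustermap(pivot, row_cluster=False, yticklabels=False,
                   cmap="YlGnBu", col_colors=col_colors)

# Legend for the group colours, drawn in the space above the heatmap.
handles = [Patch(facecolor=colour, label=group) for group, colour in lut.items()]
g.ax_col_dendrogram.legend(handles=handles, title="GROUP", loc="center", ncol=len(handles))
```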
