Plot pandas dataframe using column names as x axis


I have a pandas DataFrame and I need to make line plots using the column names (400, 400.5, 401, ...) as the x axis, the DataFrame values as the y axis, and the index column ('fluorophore') as the label for each line. I want to be able to choose which fluorophores to plot.
How can I accomplish that?

I do not know your dataset, but if the NaN values always fill entire columns, you could do
df[non_nan_cols].T[['FAM', 'TET']].plot.line()
Where non_nan_cols is a list of your columns that do not contain NaN values.
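One possible way to build non_nan_cols is sketched below. The sample frame, its values, and the fluorophore name HEX are made up for illustration; only the shape (wavelengths as columns, fluorophores as index) follows the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data (values are invented)
df = pd.DataFrame(
    [[0.1, 0.2, np.nan, 0.4],
     [0.3, 0.1, np.nan, 0.2],
     [0.5, 0.6, np.nan, 0.7]],
    index=pd.Index(["FAM", "TET", "HEX"], name="fluorophore"),
    columns=[400.0, 400.5, 401.0, 401.5],
)

# Keep only the columns that contain no NaN in any row
non_nan_cols = df.dropna(axis="columns", how="any").columns.tolist()

# Transpose so the wavelengths become the index (x axis),
# then select the fluorophores to draw, one line each
ax = df[non_nan_cols].T[["FAM", "TET"]].plot.line()
ax.set_xlabel("wavelength")
```

Transposing is what moves the column names onto the x axis, because DataFrame.plot draws one line per column against the index.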
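Alternatively, you could use
import numpy as np
import matplotlib.pyplot as plt

choice_of_fp = df.index.tolist()  # or a subset, e.g. ['FAM', 'TET']
x_val = np.asarray(df.columns.tolist())
for i in choice_of_fp:
    # keep only the points where this fluorophore has a finite value
    mask = np.isfinite(df.loc[i].values)
    plt.plot(x_val[mask], df.loc[i].values[mask], label=i)
plt.legend()
plt.show()
which also works when the rows contain NaN values. Here choice_of_fp is a list containing the fluorophores you want to plot.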

You can set 'fluorophore' as the index and transpose the frame, so that the column names become the x axis and each fluorophore becomes one line.
abs_data.set_index('fluorophore').T.plot()
If you want to plot only some fluorophores, filter first:
abs_data[abs_data.fluorophore.isin(['A', 'B'])].set_index('fluorophore').T.plot()

Related

How to add condition to value_counts method

I have a dataframe named concatenated_df
I am plotting the data with the following code
(concatenated_df[concatenated_df.DAY.eq('Tuesday')].groupby('COMPANY')['STATUS'].value_counts(normalize=True).unstack().plot.bar())
plt.xticks(rotation=0)
plt.show()
which gives me a bar plot of the normalized status counts per company.
How can I plot only those values which are greater than 0.8?
In the current example, it should print only VEDL.NS and WIPRO.NS

You can filter the DataFrame to keep only the rows with values greater than 80 (use 0.8 for normalized counts), save the result into a new DataFrame, and then plot that.
For example (you need to specify which column you want to filter on; here it is b):
new_df = df[df.b > 80]
new_df.plot()
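Applied to the question's pipeline, the same idea could look like the sketch below. The sample rows and the company TCS.NS are invented, and the question does not say whether the 0.8 threshold applies to one particular status or to any of them; this sketch assumes any.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd

# Hypothetical stand-in for concatenated_df (rows are invented)
concatenated_df = pd.DataFrame({
    "DAY": ["Tuesday"] * 20,
    "COMPANY": ["VEDL.NS"] * 5 + ["WIPRO.NS"] * 10 + ["TCS.NS"] * 5,
    "STATUS": ["UP"] * 5 + ["DOWN"] * 9 + ["UP"]
              + ["UP", "UP", "DOWN", "DOWN", "DOWN"],
})

# Same chain as in the question, stopped before plotting
shares = (
    concatenated_df[concatenated_df["DAY"].eq("Tuesday")]
    .groupby("COMPANY")["STATUS"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)

# Assumption: keep a company if at least one status share exceeds 0.8
filtered = shares[shares.gt(0.8).any(axis=1)]

ax = filtered.plot.bar(rot=0)
```

Filtering the unstacked table before calling plot.bar keeps whole companies together, so the remaining bars stay grouped by status.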

Plot multiple lines in subplots

I'd like to plot lines from a 3D data frame, the third dimension being an extra level in the column index. But I can't manage to either wrangle the data in a proper format or call the plot function appropriately. What I'm looking for is a plot where many series are plotted in subplots arranged by the outer column index. Let me illustrate with some random data.
import numpy as np
import pandas as pd
n_points_per_series = 6
n_series_per_feature = 5
n_features = 4
shape = (n_points_per_series, n_features, n_series_per_feature)
data = np.random.randn(*shape).reshape(n_points_per_series, -1)
points = range(n_points_per_series)
features = [chr(ord('a') + i) for i in range(n_features)]
series = [f'S{i}' for i in range(n_series_per_feature)]
index = pd.Index(points, name='point')
columns = pd.MultiIndex.from_product((features, series)).rename(['feature', 'series'])
data = pd.DataFrame(data, index=index, columns=columns)
So for this particular data frame, 4 subplots (n_features) should be generated, each containing 5 (n_series_per_feature) series with 6 data points. Since the method plots lines in the index direction and subplots can be generated for each column, I tried some variations:
data.plot()
data.plot(subplots=True)
data.stack().plot()
data.stack().plot(subplots=True)
None of them work. Either too many lines are generated with no subplots, a subplot is made for each line separately or after stacking values along the index are joined to one long series. And I think the x and y arguments are not usable here, since converting the index to a column and using it in x just produces a long line jumping all over the place:
data.stack().reset_index().set_index('series').plot(x='point', y=features)
In my experience this sort of thing should be pretty straightforward in pandas, but I'm at a loss. How could this subplot arrangement be achieved? If not with a single function call, are there any more convenient ways than generating subplots in matplotlib and indexing the series for plotting manually?

If you're okay with using seaborn, it can be used to produce subplots from a data frame column, onto which plots with other columns can then be mapped. With the same setup you had I'd try something along these lines:
import seaborn as sns
# Completely stack the data frame into long format
df = (
    data
    .stack()
    .stack()
    .rename("value")
    .reset_index()
)
# Create grid and map line plots
g = sns.FacetGrid(df, col="feature", col_wrap=2, hue="series")
g.map_dataframe(sns.lineplot, x="point", y="value")
g.add_legend()
Output: a 2x2 grid of line plots, one subplot per feature, with one colored line per series.
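If seaborn is not an option, the matplotlib route mentioned in the question can stay fairly short. The following is a minimal sketch, assuming the same (feature, series) column layout as in the question; the 2x2 grid and the shared legend are choices made here, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same layout as in the question: columns are a (feature, series) MultiIndex
features = ["a", "b", "c", "d"]
series = [f"S{i}" for i in range(5)]
columns = pd.MultiIndex.from_product((features, series), names=["feature", "series"])
data = pd.DataFrame(
    np.random.randn(6, len(columns)),
    index=pd.Index(range(6), name="point"),
    columns=columns,
)

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for ax, feature in zip(axes.flat, features):
    # Selecting the outer level leaves one column per series
    data[feature].plot(ax=ax, title=feature, legend=False)

# One legend for the whole figure instead of one per subplot
handles, labels = axes.flat[0].get_legend_handles_labels()
fig.legend(handles, labels, loc="center right")
```

Selecting data[feature] drops the outer column level, so each subplot receives an ordinary frame that DataFrame.plot can draw directly.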

Seaborn distplot only returns one column when plotting each pandas column in a for loop

I have a problem when I try to plot pandas columns using a for loop.
When I use displot instead of distplot it works, but it only shows the overall distribution, not the distribution per group. Say I have a list of column names called columns and a pandas DataFrame n, which has a column named class. The goal is to show a distribution plot of each column for each class:
for w in columns:
    if w!=<discarded column> or w!=<discarded column>:
        sns.displot(n[w],kde=True)
but when I use distplot, it returns only the first column:
for w in columns:
    if w!=<discarded column> or w!=<discarded column>:
        sns.distplot(n[w],kde=True)
I'm still new to Seaborn, since I have never used any visualization and have relied on numerical analysis such as p-values and correlation. Any help is appreciated.

You are probably getting only the figure corresponding to the last loop iteration.
So you have to explicitly show the figure in each iteration.
import matplotlib.pyplot as plt
for w in columns:
    if w not in discarded_columns:
        sns.distplot(n[w], kde=True)
        plt.show()
or you can make subplots:
# Keep only the target columns
target_columns = list(filter(lambda x: x not in discarded_columns, columns))
# Plot with subplots
fig, axes = plt.subplots(len(target_columns)) # see the parameters, like: nrows, ncols ... figsize=(16,12)
for i, w in enumerate(target_columns):
    sns.distplot(n[w], kde=True, ax=axes[i])

Seaborn barplot - column values without estimator parameter

I am a beginner in seaborn plotting and noticed that sns.barplot shows the value of bars using a parameter called estimator.
Is there a way for the barplot to show the value of each row instead of applying a statistical aggregation through the estimator parameter?
For instance, I have the following dataframe:
data = [["2019/oct",10],["2019/oct",20],["2019/oct",30],["2019/oct",40],["2019/nov",20],["2019/dec",30]]
df = pd.DataFrame(data, columns=['Period', 'Observations'])
I would like to plot all values from the Period "2019/oct" column (10,20,30 and 40), but the bar chart returns the average of these values (25) for the period "2019/oct":
sns.barplot(x='Period',y='Observations',data=df,ci=None)
How can I bring all column values to the chart?

barplot combines values with the same x, unless they have a different hue. If you want to keep the different values for "2019/oct", you could create a new column that assigns each of them a different hue:
data = [["2019/oct",10],["2019/oct",20],["2019/oct",30],["2019/oct",40],["2019/nov",20],["2019/dec",30]]
df = pd.DataFrame(data, columns=['Period', 'Observations'])
df['subgroup'] = df.groupby('Period').cumcount()+1
sns.barplot(x='Period',y='Observations',hue='subgroup',data=df,ci=None)
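To see why this works, it can help to look at what cumcount produces. This small sketch reuses the data from the question and only prints the helper column:

```python
import pandas as pd

data = [["2019/oct", 10], ["2019/oct", 20], ["2019/oct", 30],
        ["2019/oct", 40], ["2019/nov", 20], ["2019/dec", 30]]
df = pd.DataFrame(data, columns=["Period", "Observations"])

# cumcount numbers the rows within each Period starting at 0, so +1 gives 1, 2, 3, ...
df["subgroup"] = df.groupby("Period").cumcount() + 1

print(df["subgroup"].tolist())  # [1, 2, 3, 4, 1, 1]
```

Each (Period, subgroup) pair is now unique, so barplot has nothing left to aggregate and draws one bar per row.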

Plot dataframe with two-column index and show x-tick values

Suppose I have a DataFrame whose index is composed of two columns, and I want to plot it:
import pandas
from matplotlib import pyplot as plot
df=pandas.DataFrame(data={'floor':[1,1,1,2,2,2,3,3],'room':[1,2,3,1,1,2,1,3],'count':[1, 1, 3,2,2,4,1,5]})
df2=df.groupby(['floor','room']).sum()
df2.plot()
plot.show()
The above example results in a plot where row numbers are used for the x axis and there are no tick labels. Is there any facility to use the index instead?
Say, I'd like to have the x axis separated into even sections for the first index column, with the points for the second index column spread out inside those sections.
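One simple starting point, sketched here under the assumption that a combined "floor/room" label per point is acceptable, is to plot against integer positions and set the tick labels from the index by hand. This does not produce the even per-floor sections described above; it only makes the index visible on the x axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(data={'floor': [1, 1, 1, 2, 2, 2, 3, 3],
                        'room': [1, 2, 3, 1, 1, 2, 1, 3],
                        'count': [1, 1, 3, 2, 2, 4, 1, 5]})
df2 = df.groupby(['floor', 'room']).sum()

# One x position per (floor, room) pair, labeled "floor/room"
positions = list(range(len(df2)))
labels = [f"{floor}/{room}" for floor, room in df2.index]

fig, ax = plt.subplots()
ax.plot(positions, df2['count'].to_numpy(), marker='o')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel('floor/room')
```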
