In my program, df is a dataframe with these columns:
df.columns
'''output : Index(['lat', 'lng', 'desc', 'zip', 'title', 'timeStamp', 'twp', 'addr', 'e',
'reason'],
dtype='object')'''
When I execute this piece of code:
sns.countplot(x = df['reason'], data=df)
# output is the plot below
But if I tweak the code slightly, like this:
p = df['reason'].value_counts()
k = pd.DataFrame({'causes':p.index,'freq':p.values})
sns.countplot(x = k['causes'], data = k)
So essentially I just stored the 'reason' column values and their frequencies as a series in p and then converted them into another dataframe k, but this new countplot doesn't have the right Y-axis range for the given values.
My questions are:
Can we set the Y-axis of the second countplot to its appropriate limits?
Why does the second countplot differ from the first one when I just separated the specific column I wanted to graph and plotted it separately?
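A likely explanation, sketched here with invented data: countplot counts how many rows fall into each category, so on the aggregated frame k every cause appears exactly once and every bar gets height 1. Since the frequencies are already stored in freq, sns.barplot can draw them directly. The sample 'reason' values below are made up for illustration and are not taken from the original dataset.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Invented sample data standing in for the real 'reason' column
df = pd.DataFrame({"reason": ["EMS", "Fire", "EMS", "Traffic", "EMS", "Traffic"]})

p = df["reason"].value_counts()
k = pd.DataFrame({"causes": p.index, "freq": p.values})

# What countplot would count on k: each cause occupies exactly one row
rows_per_cause = k["causes"].value_counts()

# barplot uses the precomputed frequencies as bar heights instead
ax = sns.barplot(x="causes", y="freq", data=k)
ax.set_ylabel("count")
```

With this, the Y-axis follows the real frequencies and no manual limits are needed; the first call on the raw df works because countplot does the counting itself.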
I have two dictionaries of dataframes that are generated from separate for loops.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Control_Keys = ('C1','C2','C3','C4')
Patient_Keys = ('P1','P2','P3','P4','P5','P6')
df_columns = ['Reads','Score']
C_dict = {}
P_dict = {}
# Generating dictionaries of dataframes
for i in Control_Keys:
    C_dict[i] = pd.DataFrame(np.random.randint(0,100,size=(100,2)), columns=df_columns)
for i in Patient_Keys:
    P_dict[i] = pd.DataFrame(np.random.randint(0,100,size=(100,2)), columns=df_columns)
I want to plot their data on the same axes in a single graph, where the x-axis is 'Reads' and the y-axis is 'Score'. I also want to differentiate between the two groups (not the individual dataframes): all data belonging to Controls should be blue, and Patients red. I can plot these individually, but I have not found a reliable way to merge the two graphs into one.
I would slightly redesign your dataframes to achieve your plotting goal. First, store each dictionary key as a column (optional, in case you need to tell the tables apart), and add a second column that marks the group, which can then drive the color (hue).
Below I combine all the C's and all the P's and assign two new columns, Table and Hue:
frames = []
for key in Control_Keys:
    C_dict[key]['Table'] = key
    C_dict[key]['Hue'] = 'C'
    frames.append(C_dict[key])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
c = pd.concat(frames)
frames = []
for key in Patient_Keys:
    P_dict[key]['Table'] = key
    P_dict[key]['Hue'] = 'P'
    frames.append(P_dict[key])
p = pd.concat(frames)
Then plot them, using the hue parameter to group the colors and palette to choose them:
sns.scatterplot(x='Reads', y='Score', hue='Hue', data=pd.concat([p, c]), palette={'P': 'red', 'C': 'blue'})
plt.show()
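As a variation on the approach above, here is a minimal sketch that builds the combined frame without modifying the original dictionaries. It relies on pd.concat accepting a dict, in which case the keys become an index level. The helper name stack_group and the use of plain matplotlib scatter calls are choices made for this sketch, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_columns = ["Reads", "Score"]
C_dict = {key: pd.DataFrame(rng.integers(0, 100, size=(100, 2)), columns=df_columns)
          for key in ("C1", "C2", "C3", "C4")}
P_dict = {key: pd.DataFrame(rng.integers(0, 100, size=(100, 2)), columns=df_columns)
          for key in ("P1", "P2", "P3", "P4", "P5", "P6")}

def stack_group(frames, label):
    """Stack a dict of frames; the dict keys end up in a 'Table' column."""
    stacked = pd.concat(frames, names=["Table"]).reset_index(level="Table")
    return stacked.assign(Hue=label)

combined = pd.concat([stack_group(C_dict, "C"), stack_group(P_dict, "P")],
                     ignore_index=True)

# One scatter call per group so each group gets a single color
fig, ax = plt.subplots()
for label, color in {"C": "blue", "P": "red"}.items():
    group = combined[combined["Hue"] == label]
    ax.scatter(group["Reads"], group["Score"], color=color, label=label, s=10)
ax.set_xlabel("Reads")
ax.set_ylabel("Score")
ax.legend(title="Hue")
```

The combined frame can equally be passed to sns.scatterplot with hue='Hue', as in the answer above.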
I have a dataframe (image not shown).
I need a plot like this (image not shown) for each column (Stress, Depression and Anxiety), showing each participant's data in each category.
I wrote this Python code:
ax = data_full.plot(x="participants", y=["Stress","Depression","Anxiety"],kind="line", lw=3, ls='--', figsize = (12,6))
plt.grid(True)
plt.show()
and I get output like this (image not shown).
Split the participants column on '_' and concatenate the result with the original data frame. From the combined frame, keep only the columns you need, then pivot it into its final shape, which is the basis for the graph. Finally, adjust the x-axis tick marks, the legend position, and the y-axis limits.
dfs = pd.concat([df,df['participants'].str.split('_', expand=True)],axis=1)
dfs.columns = ['Stress', 'Depression', 'Anxiety', 'participants', 'category', 'group']
fin_df = dfs[['category','group','Stress']]
# pivot the reduced frame (the original pivoted dfs, which made the line above a no-op)
fin_df = fin_df.pivot(index='category', columns='group', values='Stress')
# update: sort descending so that 'pre' comes before 'post'
fin_df = fin_df.sort_index(ascending=False)
g = fin_df.plot(kind='line', title='Stress')
g.set_xticks([0,1])
g.set_xticklabels(['pre','post'])
g.legend(loc='center right')
g.set_ylim(5,25)
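To cover all three measures rather than Stress alone, the same split-and-pivot steps can be run in a loop, one subplot per measure. This is only a sketch: the participant labels (for example 'pre_p1') and all values are invented, since the original dataframe was posted as an image, and the assumption is that the part before '_' is the category (pre/post) and the part after it identifies the participant.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data in the assumed '<category>_<participant>' layout
df = pd.DataFrame({
    "participants": ["pre_p1", "post_p1", "pre_p2", "post_p2"],
    "Stress": [20, 12, 18, 15],
    "Depression": [14, 10, 16, 9],
    "Anxiety": [11, 8, 13, 12],
})

parts = df["participants"].str.split("_", expand=True)
long_df = df.assign(category=parts[0], group=parts[1])

measures = ["Stress", "Depression", "Anxiety"]
fig, axes = plt.subplots(1, len(measures), figsize=(12, 4), sharey=True)
pivots = {}
for ax, measure in zip(axes, measures):
    wide = long_df.pivot(index="category", columns="group", values=measure)
    wide = wide.reindex(["pre", "post"])  # explicit order instead of sorting
    wide.plot(kind="line", marker="o", title=measure, ax=ax)
    pivots[measure] = wide
```

Using reindex with an explicit list keeps the pre/post order independent of how the labels happen to sort.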
I am trying to make dynamic plots with plotly. I want to plot counts of data that have been aggregated (using groupby).
I want to split the plot by color (and maybe even facet it by column). The problem is that I want the count to be displayed on each bar. With a histogram I get smooth bars, but I can't find how to display the count:
With a bar plot I can display the count, but I don't get a smooth bar, and the count appears not for the whole bar but for each case that makes up the bar.
Here is my code for the bar plot:
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do the binning yourself before producing the plot, I would use px.bar. Normally, you add text to a bar plot with px.bar(..., text=<something>), but that gives the result you've described, with text for every subcategory of your data. Since px.bar adds traces and annotations in the order in which the source is organized, we can simply set the text of the last subcategory added using fig.data[-1].text = sums. The only remaining challenge is some data munging to retrieve the correct sums.
Plot:
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
'y1':[1, 4, 9, 16],
'y2':[1, 4, 9, 16],
'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# column sums for transposed dataframe
sums = []
for col in df.T:
    sums.append(df.T[col].sum())
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
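If the data is only available in long form, the per-bar totals can also be computed with a groupby instead of looping over the transposed frame. The sketch below reuses the sample values from the code above; the name long_df is introduced here for illustration, and it is an assumption that the order of first appearance of x matches the order in which the bars are drawn.

```python
import pandas as pd

wide = pd.DataFrame({"x": ["a", "b", "c", "d"],
                     "y1": [1, 4, 9, 16],
                     "y2": [1, 4, 9, 16],
                     "y3": [6, 8, 4.5, 8]})

# Long format, as expected by plotly express
long_df = wide.melt(id_vars="x", var_name="variable", value_name="value")

# sort=False keeps the x categories in their order of first appearance
sums = long_df.groupby("x", sort=False)["value"].sum().tolist()

# Same totals computed directly from the wide frame, as a cross-check
row_sums = wide.set_index("x").sum(axis=1).tolist()
```

The resulting list can be assigned to fig.data[-1].text exactly as in the code above.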
If your first graph is built with the graph_objects library, you can try:
# Use textposition='auto' to place the text directly on the bars.
# go.Bar has no color or barmode arguments; barmode belongs to the layout.
fig = go.Figure(data=[go.Bar(x=val.index.get_level_values(0),
                             y=val,
                             text=val, textposition='auto',
                             )])
fig.update_layout(barmode="group")
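Coming back to the question's complaint that the count shows up for each case inside a bar: one way around it is to aggregate first, so that every (program, gender) pair is a single row and therefore a single bar segment with a single label. The sketch below shows only the aggregation step; the contents of data2 are invented, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for data2; only the column names are from the question
data2 = pd.DataFrame({
    "program": ["A", "A", "A", "B", "B", "B", "B"],
    "gender": ["F", "M", "F", "F", "M", "M", "M"],
    "experience": [1, 2, 3, 1, 1, 2, 5],
})

# One row per (program, gender) pair, with the number of cases in 'count'
counts = (
    data2.groupby(["program", "gender"])
    .size()
    .reset_index(name="count")
)
```

A frame shaped like this can then be passed to px.bar(counts, x="program", y="count", color="gender", barmode="group", text="count"), which should give one label per bar.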
I have a dataframe as mentioned below:
Date,Time,Price,Volume
31/01/2019,09:15:00,10691.50,600
31/01/2019,09:15:01,10709.90,13950
31/01/2019,09:15:02,10701.95,9600
31/01/2019,09:15:03,10704.10,3450
31/01/2019,09:15:04,10700.05,2625
31/01/2019,09:15:05,10700.05,2400
31/01/2019,09:15:06,10698.10,3000
31/01/2019,09:15:07,10699.90,5925
31/01/2019,09:15:08,10699.25,5775
31/01/2019,09:15:09,10700.45,5925
31/01/2019,09:15:10,10700.00,4650
31/01/2019,09:15:11,10699.40,8025
31/01/2019,09:15:12,10698.95,5025
31/01/2019,09:15:13,10698.45,1950
31/01/2019,09:15:14,10696.15,3900
31/01/2019,09:15:15,10697.15,2475
31/01/2019,09:15:16,10697.05,4275
31/01/2019,09:15:17,10696.25,3225
31/01/2019,09:15:18,10696.25,3300
The data frame contains approximately 8000 rows. I want to plot both price and volume in the same chart. (Volume range: 0 to 800,000)
If you want to compare price and volume against time, try this:
df = pd.read_csv('your_path_here')
df.plot('Time', ['Price', 'Volume'], secondary_y='Price')
Edit: x-axis customization
Since you want to customize the x-axis, try this (it is just a basic example you can build on):
# Create a Datetime column while parsing the csv file
df = pd.read_csv('your_path_here', parse_dates= {'Datetime': ['Date', 'Time']})
Then you need to create two lists: one containing the positions on the x-axis and the other containing the labels.
Say you want a label every 5 seconds (the 30-minute spacing you asked for is possible, but not with the data you provided):
positions = [p for p in df.Datetime if p.second in range(0, 60, 5)]
labels = [l.strftime('%H:%M:%S') for l in positions]
Then plot, passing the positions and labels lists to set_xticks and set_xticklabels:
ax = df.plot('Datetime', ['Price', 'Volume'], secondary_y='Price')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
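An alternative sketch that uses matplotlib directly: put volume and price on twin y-axes and let a date locator place the ticks, rather than building the position and label lists by hand. It uses the first rows of the sample data from the question; the explicit date format string and the 5-second spacing are assumptions made for this small sample.

```python
import io
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """Date,Time,Price,Volume
31/01/2019,09:15:00,10691.50,600
31/01/2019,09:15:01,10709.90,13950
31/01/2019,09:15:02,10701.95,9600
31/01/2019,09:15:03,10704.10,3450
31/01/2019,09:15:04,10700.05,2625
31/01/2019,09:15:05,10700.05,2400
31/01/2019,09:15:06,10698.10,3000
31/01/2019,09:15:07,10699.90,5925
31/01/2019,09:15:08,10699.25,5775
31/01/2019,09:15:09,10700.45,5925
31/01/2019,09:15:10,10700.00,4650
"""
df = pd.read_csv(io.StringIO(csv_text))
# Day-first dates, so the format is spelled out explicitly
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                format="%d/%m/%Y %H:%M:%S")

fig, volume_ax = plt.subplots(figsize=(10, 4))
volume_ax.plot(df["Datetime"], df["Volume"], color="gray")
volume_ax.set_ylabel("Volume")

# Second y-axis sharing the same x-axis, so each series keeps its own scale
price_ax = volume_ax.twinx()
price_ax.plot(df["Datetime"], df["Price"], color="tab:blue")
price_ax.set_ylabel("Price")

# One tick every 5 seconds, labelled as HH:MM:SS
volume_ax.xaxis.set_major_locator(mdates.SecondLocator(bysecond=list(range(0, 60, 5))))
volume_ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
```

For the full day of data, mdates.MinuteLocator(byminute=[0, 30]) would be the analogous choice for 30-minute ticks.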
category = df.category_name_column.value_counts()
The series above contains values like these:
CategoryA,100
CategoryB,200
I am trying to plot the top 5 category names on the x-axis and their counts on the y-axis:
head = (category.head(5))
sns.barplot(x = head ,y=df.category_name_column.value_counts(), data=df)
It does not print the names of the categories on the x-axis, but the counts. How can I show the top 5 names on the x-axis and their values on the y-axis?
You can pass in the series' index & values to x & y respectively in sns.barplot. With that the plotting code becomes:
sns.barplot(head.index, head.values)
"I am trying to plot the top 5 category names in X"
Calling category.head(5) returns the first five values of the series category. Because value_counts() sorts in descending order of frequency by default, these are already the five most frequent categories. If the series has been reordered in the meantime (for example with sort_index()), sort it by value first and then call head(5), like this:
category = df.category_name_column.value_counts()
head = category.sort_values(ascending=False).head(5)
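A quick way to check the ordering on your own data is sketched below with invented category labels. value_counts() returns the counts sorted from most to least frequent, and nlargest(5) is an alternative that picks the top five regardless of the current order of the series.

```python
import pandas as pd

# Invented sample column: F appears 5 times, C 4 times, A 3 times, ...
df = pd.DataFrame({"category_name_column": list("AAABBCCCCDEEFFFFFG")})

category = df["category_name_column"].value_counts()

# Explicit sort followed by head(5), as in the answer above
head = category.sort_values(ascending=False).head(5)

# nlargest does not depend on how the series is currently ordered
top5 = category.sort_index().nlargest(5)
```

Here head and top5 hold the same five counts; only the order of tied categories may differ.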
Passing x and y positionally, as in the previous accepted solution, is deprecated in recent seaborn versions. A workaround is as follows:
Convert the series to a dataframe:
category = df.category_name_column.value_counts()
category_df = category.reset_index()
category_df.columns = ['categories', 'frequency']
Use barplot:
ax = sns.barplot(x = 'categories', y = 'frequency', data = category_df)
Although this does not plot the series directly, it is a workaround that seaborn officially supports.
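Putting the pieces together, here is a minimal end-to-end sketch with invented category labels. It names the columns through rename_axis and reset_index(name=...) rather than assigning category_df.columns, which avoids depending on the default column names that reset_index produces in a given pandas version; that substitution is a choice made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Invented sample column
df = pd.DataFrame({"category_name_column": list("AAABBCCCCDEEFFFFFG")})

category = df["category_name_column"].value_counts()

# Index becomes the 'categories' column, the counts become 'frequency'
category_df = category.rename_axis("categories").reset_index(name="frequency")

# Keep the five most frequent categories and plot names against counts
top5 = category_df.head(5)
ax = sns.barplot(x="categories", y="frequency", data=top5)
```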
For more barplot examples please refer here:
https://seaborn.pydata.org/generated/seaborn.barplot.html
https://stackabuse.com/seaborn-bar-plot-tutorial-and-examples/