Using the following data frame (mx):
code Growth Value Risk Mcap
0 APOLLOHOSP 8 6 High small
1 ANUP 8 7 High small
2 SIS 4 6 High mid
3 HAWKINCOOK 5 2 Low mid
4 NEULANDLAB 6 4 Low large
5 ORIENTELEC 7 9 Low mid
6 AXISBANK 2 3 Medium mid
7 DMART 4 1 Medium large
8 ARVIND 2 10 Medium small
9 TCI 1 7 High mid
10 MIDHANI 5 5 Low large
11 RITES 6 4 Medium mid
12 COROMANDEL 9 9 High small
13 SBIN 10 3 Medium large
I am trying to create a seaborn relplot that annotates the scatter plot points in their respective facets. However, the output I get looks something like this:
Here all the annotations are seen in the first facet and points in other facets don't have any annotations.
I have tried the following code:
p1 = sns.relplot(x="Growth", y="Value", hue="Risk",col="Mcap",data=mx,s=200,palette = ['r','g','y'])
ax = p1.axes[0,0]
for idx,row in mx.iterrows():
    x = row[1]
    y = row[2]
    text = row[0]
    ax.text(x+0.5,y,text, horizontalalignment='left')
Please advise on the modifications. Thanks in advance.
The main problem is that you set ax = p1.axes[0,0], while ax should be p1.axes[0,column_number], where column_number is the column of the subplot in which the text belongs.
Further, addressing row[0], row[1] etc. can make the code less readable and less easy to adapt when something changes. So, it's better to assign the row directly to some variables, as in text, x, y, _risk, mcap = row. Even more maintainable would be itertuples() instead of iterrows(), as illustrated in the code below.
In order to make space for the name, you could widen the x-limits a bit to the right.
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
data = [['APOLLOHOSP', 8, 6, 'High', 'small'],
['ANUP', 8, 7, 'High', 'small'],
['SIS', 4, 6, 'High', 'mid'],
['HAWKINCOOK', 5, 2, 'Low', 'mid'],
['NEULANDLAB', 6, 4, 'Low', 'large'],
['ORIENTELEC', 7, 9, 'Low', 'mid'],
['AXISBANK', 2, 3, 'Medium', 'mid'],
['DMART', 4, 1, 'Medium', 'large'],
['ARVIND', 2, 10, 'Medium', 'small'],
['TCI', 1, 7, 'High', 'mid'],
['MIDHANI', 5, 5, 'Low', 'large'],
['RITES', 6, 4, 'Medium', 'mid'],
['COROMANDEL', 9, 9, 'High', 'small'],
['SBIN', 10, 3, 'Medium', 'large']]
mx = pd.DataFrame(data=data, columns=["code", "Growth", "Value", "Risk", "Mcap"])
plotnum = {'small': 0, 'mid': 1, 'large': 2}
p1 = sns.relplot(x="Growth", y="Value", hue="Risk", col="Mcap", data=mx, s=200, palette=['r', 'g', 'y'])
for ax in p1.axes[0]:
    ax.set_xlim(0.0, max(mx["Growth"]) + 1.9)
for row in mx.itertuples():
    ax = p1.axes[0, plotnum[row.Mcap]]
    ax.text(row.Growth + 0.5, row.Value, row.code, horizontalalignment='left')
plt.show()
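As a variation on the hard-coded plotnum dictionary, the facet can be looked up by its Mcap value. This is only a sketch: it assumes a seaborn version (0.11 or later) in which the grid returned by relplot exposes an axes_dict attribute, and it uses a four-row subset of the question's data to stay short. The name labels_per_facet is introduced here purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Four rows taken from the question's data frame
mx = pd.DataFrame(
    [["APOLLOHOSP", 8, 6, "High", "small"],
     ["SIS", 4, 6, "High", "mid"],
     ["NEULANDLAB", 6, 4, "Low", "large"],
     ["ARVIND", 2, 10, "Medium", "small"]],
    columns=["code", "Growth", "Value", "Risk", "Mcap"],
)

g = sns.relplot(x="Growth", y="Value", hue="Risk", col="Mcap", data=mx, s=200)

# axes_dict (assumed available) maps each value of the `col` variable to its Axes
for row in mx.itertuples():
    ax = g.axes_dict[row.Mcap]
    ax.text(row.Growth + 0.5, row.Value, row.code, horizontalalignment="left")

# Collect the annotation labels that ended up in each facet
labels_per_facet = {
    name: sorted(t.get_text() for t in ax.texts)
    for name, ax in g.axes_dict.items()
}
print(labels_per_facet)
```

Looking the axes up by value avoids having to keep a separate mapping in sync with the order in which seaborn lays out the columns.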
I have a data frame like this:
df:
Type Col-1 Col-2
A 3 8
A 4 7
A 5 9
A 6 6
A 7 7
B 4 8
B 2 7
B 6 6
B 4 9
B 5 7
I have 2 violin plots for Col-1 & Col-2. Now, I want to create a single violin plot with 2 violin images for Type A & B. In the violin plot, I want to split every violin such that the left half of the violin denotes Col-1 & right half of the violin denotes Col-2. I created two separate violin plots for col-1 and col-2 but now I want to make it a single plot and represent 2 columns at a time by splitting. How can I do it?
This is my code for separate plots:
def violin(data):
    for col in data.columns:
        x = data[col].to_frame().reset_index()
        ax = sns.violinplot(data=x, x='type',y=col,inner='quart',split=True)
        plt.show()
violin(df)
This is what my current violin plots look like. I want to combine them into a single plot:
Can anyone help me with this?
Seaborn works most easily with data in "long form", where the value columns are combined into one.
Here is how the code could look:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
'Col-1': [4, 3, 5, 6, 7, 4, 2, 6, 4, 5],
'Col-2': [7, 8, 9, 6, 7, 8, 7, 6, 9, 7]})
df_long = df.melt(id_vars=['Type'], value_vars=['Col-1', 'Col-2'], var_name='Col', value_name='Value')
plt.figure(figsize=(12, 5))
sns.set()
sns.violinplot(data=df_long, x='Type', y='Value', hue='Col', split=True, palette='spring')
plt.tight_layout()
plt.show()
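To see what "long form" means here, it can help to inspect the melted frame on its own. The sketch below reuses the same data and the same melt call as the answer and only prints the result; nothing is plotted.

```python
import pandas as pd

df = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'Col-1': [4, 3, 5, 6, 7, 4, 2, 6, 4, 5],
                   'Col-2': [7, 8, 9, 6, 7, 8, 7, 6, 9, 7]})

# Stack Col-1 and Col-2 into a single 'Value' column, remembering
# the source column name in 'Col'
df_long = df.melt(id_vars=['Type'], value_vars=['Col-1', 'Col-2'],
                  var_name='Col', value_name='Value')

print(df_long.head())
print(df_long.shape)
```

Each original row contributes two rows to df_long, one per value column, which is what lets hue='Col' with split=True put Col-1 and Col-2 on opposite halves of each violin.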
I have a DataFrame like this:
Apples Oranges
0 1 1
1 2 1
2 3 2
3 2 3
4 1 2
5 2 3
I'm trying to count the occurrences of values for both Apples and Oranges (how often the values 1, 2 and 3 occur in the data frame for each fruit). I want to draw a bar chart using Matplotlib, but so far I have not been successful. I have tried:
plt.bar(2,['Apples', 'Oranges'], data=df)
plt.show()
But the output is very weird. Could I have some advice? Thanks in advance.
Edit: I'm expecting a result like this:
You can use the value_counts method together with pandas plotting:
import pandas as pd
# Sample data
d = {'apples': [1, 2,3,2,1,2], 'oranges': [1,1,2,3,2,3]}
df = pd.DataFrame(data=d)
# Calculate the frequencies of each value
df2 = pd.DataFrame({
    "apples": df["apples"].value_counts(),
    "oranges": df["oranges"].value_counts()
})
# Plot
df2.plot.bar()
You will get:
Here is another option:
import pandas as pd
df = pd.DataFrame({'Apples': [1, 2, 3, 2, 1, 2], 'Oranges': [1, 1, 2, 3, 2, 3]})
df.apply(pd.Series.value_counts).plot.bar()
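For reference, the intermediate table that gets plotted can be inspected before calling plot.bar(). This sketch uses the same data; the sort_index() call is an addition here, included only so the rows print in a predictable order.

```python
import pandas as pd

df = pd.DataFrame({'Apples': [1, 2, 3, 2, 1, 2], 'Oranges': [1, 1, 2, 3, 2, 3]})

# One column of counts per fruit, indexed by the distinct values
counts = df.apply(pd.Series.value_counts).sort_index()
print(counts)
```

The index holds the values 1, 2 and 3, and each column holds how often that value occurs for the fruit, so plot.bar() draws one group of bars per value.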
You can use hist from matplotlib:
import pandas as pd
from matplotlib import pyplot as plt
d = {'apples': [1, 2,3,2,1,2], 'oranges': [1,1,2,3,2,3]}
df = pd.DataFrame(data=d)
plt.hist([df.apples,df.oranges],label=['apples','oranges'])
plt.legend()
plt.show()
This will give the output:
Whenever you are plotting the count or frequency of something, you should look into a histogram:
from matplotlib import pyplot as plt
import pandas as pd
df = pd.DataFrame({'Apples': {0: 1, 1: 2, 2: 3, 3: 2, 4: 1, 5: 2}, 'Oranges': {0: 1, 1: 1, 2: 2, 3: 3, 4: 2, 5: 3}})
plt.hist([df.Apples,df.Oranges], bins=3,range=(0.5,3.5),label=['Apples', 'Oranges'])
plt.xticks([1,2,3])
plt.yticks([1,2,3])
plt.legend()
plt.show()
I'm trying to change the size of only SOME of the markers in a seaborn pairplot.
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8],
                   'class': [1, 2, 3, 4]},
                  index=['falcon', 'dog', 'spider', 'fish'])
Prettier:
num_legs num_wings num_specimen_seen class
falcon 2 2 10 1
dog 4 0 2 2
spider 8 0 1 3
fish 0 0 8 4
For example, I want to increase the size of all samples with class=4.
How could this be done with the seaborn pairplot?
What I have so far:
sns.pairplot(data=df,diag_kind='hist',hue='class')
I have tried adding plot_kws={"s": 3}, but that changes the size of all the dots. Cheers!
After checking how the pairplot is built up, one could iterate through the axes and change the size of the 4th set of scatter dots in each subplot:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
N = 100
classes = np.random.randint(1, 5, N)
df = pd.DataFrame({'num_legs': 2 * classes % 8,
'num_wings': (classes == 1) * 2,
'num_specimen_seen': np.random.randint(1,20,N),
'class': classes})
g = sns.pairplot(data=df,diag_kind='hist',hue='class')
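for ax in np.ravel(g.axes):
    # This check assumes each hue level is drawn as its own collection,
    # which may not hold in newer seaborn releases
    if len(ax.collections) == 4:
        ax.collections[3].set_sizes([100])
# Note: Matplotlib 3.7 renamed legendHandles to legend_handles
g.fig.legends[0].legendHandles[3].set_sizes([100])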
plt.show()
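If all points of a panel end up in a single collection rather than one per class, the same set_sizes call can take one size per point instead. The sketch below shows this with plain Matplotlib only; the small classes array and the sizes 20 and 100 are made up for illustration, and applying it to a pairplot would additionally assume that the points are stored in the same order as the rows of the data frame.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical class labels, one per point
classes = np.array([1, 2, 3, 4, 4, 1])
x = np.arange(len(classes))
y = x ** 2

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=classes, s=20)

# One size per point: enlarge only the points whose class is 4
sizes = np.where(classes == 4, 100, 20)
points.set_sizes(sizes)

print(points.get_sizes())
```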
New to Python (& StackOverflow), I am struggling to find a way to take my ['Product_Name', 'Date_of_Sale', 'Quantity'] data and output the relative frequencies of the daily sales quantities per product.
As an example, Product 1 sells 8 units (Day 1), 6 (Day 2), 6 (Day 3), 5 (Day 4), 8 (Day 5), 7 (Day 6), 6 (Day 7) over 7 days, giving relative frequencies for Product 1 of {5 units : 0.143, 6 : 0.429, 7 : 0.143, 8 : 0.286}.
How can I do this for all products for a period?
Normalize the value counts:
>>> df['Product1'].value_counts(normalize=True)
6 0.428571
8 0.285714
7 0.142857
5 0.142857
Name: Product1, dtype: float64
Doing this "for all products for a period" depends on the structure of your data. You would need to provide a sample and your expected result.
Use value_counts() and to_dict():
import pandas as pd
df = pd.DataFrame({'Day': [1, 2, 3, 4, 5, 6, 7],
'Product1': [8, 6, 6, 5, 8, 7, 6]})
df['Product1'].value_counts().div(df.shape[0]).to_dict()
Yields:
{6: 0.42857142857142855, 8: 0.2857142857142857, 7: 0.14285714285714285, 5: 0.14285714285714285}
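For the "all products for a period" part, one possible sketch follows. It assumes the data is in long form with exactly the three columns named in the question. The dates and the Product 2 quantities are hypothetical and only there to make the example runnable; the Product 1 quantities are the ones from the question.

```python
import pandas as pd

# Hypothetical dates; Product 1 quantities come from the question,
# Product 2 quantities are made up for illustration
dates = pd.date_range("2021-01-01", periods=7)
sales = pd.DataFrame({
    "Product_Name": ["Product 1"] * 7 + ["Product 2"] * 4,
    "Date_of_Sale": list(dates) + list(dates[:4]),
    "Quantity": [8, 6, 6, 5, 8, 7, 6] + [3, 3, 4, 3],
})

# Restrict to the period of interest (here: the whole hypothetical week)
period = sales[sales["Date_of_Sale"].between("2021-01-01", "2021-01-07")]

# Relative frequency of each daily quantity, computed separately per product
rel_freq = period.groupby("Product_Name")["Quantity"].value_counts(normalize=True)
print(rel_freq)
```

The result is a Series indexed by (Product_Name, Quantity), so rel_freq.loc["Product 1"] gives the same numbers as the single-column examples above.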
I'm fairly new to pandas. Typically, when all columns are the same size, I build np.zeros(count) arrays and use a for loop to populate them from a text file (np.genfromtxt()), then do my graphing and analysis in matplotlib.
However, I am now trying to implement similar analysis with columns of different sizes on the same plot from a CSV file.
For instance:
data.csv:
A B C D E F
1 2 3 4 5 6
2 3 4 5 6 7
3 4 5 6
4 5
df = pandas.read_csv('data.csv')
ax = df.plot(x = 'A', y = 'B')
df.plot(x = 'C', y = 'D', ax = ax)
df.plot(x = 'E', y = 'F', ax = ax)
This code plots the first two on the same graph, but the rest of the information is lost (there are a lot more columns of mismatched sizes, but the x/y columns I am plotting against each other are all the same size).
Is there an easier way to do all of this? Thanks!
Here is how you could generalize your solution.
I edited my answer to add error handling: if there is an unpaired last column, it will still work.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
data = {
    'A' : [1, 2, 3, 4],
    'B' : [2, 3, 4, 5],
    'C' : [3, 4, 5, np.nan],
    'D' : [4, 5, 6, np.nan],
    'E' : [5, 6, np.nan, np.nan],
    'F' : [6, 7, np.nan, np.nan]
}
df = pd.DataFrame(data)
def Chris(df):
    ax = df.plot(x='A', y='B')
    df.plot(x='C', y='D', ax=ax)
    df.plot(x='E', y='F', ax=ax)
    plt.show()
def IMCoins(df):
    fig, ax = plt.subplots()
    try:
        for idx in range(0, df.shape[1], 2):
            df.plot(x = df.columns[idx],
                    y = df.columns[idx + 1],
                    ax= ax)
    except IndexError:
        print('Index Error: Log the error.')
    plt.show()
Chris(df)
IMCoins(df)
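An alternative to catching IndexError is to build the (x, y) column pairs up front. This is a sketch rather than part of the answer above: the extra column G is hypothetical and stands in for an unpaired last column, and the names pairs, series and lengths are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4], "B": [2, 3, 4, 5],
    "C": [3, 4, 5, np.nan], "D": [4, 5, 6, np.nan],
    "E": [5, 6, np.nan, np.nan], "F": [6, 7, np.nan, np.nan],
    "G": [9, 9, 9, 9],  # hypothetical unpaired last column
})

# zip stops at the shorter sequence, so an unpaired last column is skipped
pairs = list(zip(df.columns[::2], df.columns[1::2]))

# Drop the NaN padding so each series keeps only its own rows
series = {(x, y): df[[x, y]].dropna() for x, y in pairs}
lengths = {pair: len(frame) for pair, frame in series.items()}

print(pairs)
print(lengths)
```

Each frame in series can then be drawn on a shared Axes with frame.plot(x=x, y=y, ax=ax), in the same way as the loop in IMCoins.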