I have a dataframe (97 columns x 30 rows) that contains only 1s and 0s.
I want to plot it as a scatter plot, with the column names on the x axis and the index names on the y axis.
[My dataframe looks like this][1].
The output I want is similar to the photo, but a red dot should appear at the intersection of a row and a column only if that cell has the value 1. If the value is 0, nothing should be plotted at that intersection. [Image][2] [The output scatter plot I want][3]
[1]: https://i.stack.imgur.com/hFnQX.png
[2]: https://i.stack.imgur.com/Rsguk.jpg
[3]: https://i.stack.imgur.com/keGC6.png
A straightforward way to do this is to use two nested loops and plot a point only for the cells that contain 1:
import pandas as pd
import matplotlib.pyplot as plt

example = pd.DataFrame({'column 1': [0, 1, 0, 1],
                        'column 2': [1, 0, 1, 0],
                        'column 3': [1, 1, 0, 0]})

# Plot a red dot at (column position, row position) wherever the cell is 1
for x, col in enumerate(example.columns):
    for y, ind in enumerate(example.index):
        if example.loc[ind, col]:
            plt.plot(x, y, 'o', color='red')

# Label the integer positions with the column and index names
plt.xticks(range(len(example.columns)), labels=example.columns)
plt.yticks(range(len(example)), labels=example.index)
plt.show()
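For a 97 x 30 frame the nested loops work, but a vectorized alternative (my own suggestion, not part of the answer above) is to let NumPy find the positions of all 1s at once with `np.nonzero` and draw them in a single `scatter` call. This is a minimal sketch reusing the same `example` frame:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

example = pd.DataFrame({'column 1': [0, 1, 0, 1],
                        'column 2': [1, 0, 1, 0],
                        'column 3': [1, 1, 0, 0]})

# Row and column positions of every cell equal to 1 (row-major order)
rows, cols = np.nonzero(example.to_numpy() == 1)

fig, ax = plt.subplots()
# Columns go on the x axis and rows on the y axis, as in the loop version
ax.scatter(cols, rows, color='red')
ax.set_xticks(range(len(example.columns)))
ax.set_xticklabels(example.columns)
ax.set_yticks(range(len(example.index)))
ax.set_yticklabels(example.index)
plt.show()
```

Only the cells holding 1 ever become points, so cells with 0 stay empty without any explicit condition.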
Related
I have the following DataFrame:
LATITUDE LONGITUDE STATE
... ... True
With the code below I can plot the coordinates:
import matplotlib.pyplot as plt
plt.scatter(x=df['LAT'], y=df['LONG'])
plt.show()
However, I want each point to have one of two colors depending on its 'STATE' value.
How can I do this?
What you're looking for is the c parameter of plt.scatter. Taking your example and adding a STATUS column:
import matplotlib.pyplot as plt
df = {'LAT': [1, 2, 3, 4, 5], 'LONG': [3, 2, 4, 5, 3], 'STATUS': [0, 1, 0, 0, 1] }
plt.scatter(x=df['LAT'], y=df['LONG'], c=df['STATUS'])
plt.show()
This produces a two-coloured chart.
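Passing a numeric column to `c` gives two colours but no legend. A minimal sketch of another option, assuming a real DataFrame with a boolean STATE column (the sample values below are hypothetical), is to draw one scatter call per group so each state gets its own colour and legend entry:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data; column names follow the plt.scatter call above
df = pd.DataFrame({'LAT': [1, 2, 3, 4, 5],
                   'LONG': [3, 2, 4, 5, 3],
                   'STATE': [True, False, True, True, False]})

fig, ax = plt.subplots()
# One scatter call per STATE value, so each group gets its own colour and label
for state, group in df.groupby('STATE'):
    ax.scatter(group['LAT'], group['LONG'],
               color='tab:green' if state else 'tab:red',
               label=f'STATE = {state}')
ax.legend()
plt.show()
```

The colour choice (`tab:green` / `tab:red`) is arbitrary; any two matplotlib colour names work.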
I am trying to make a scatter plot of the following kind of pandas dataframe:
df = pd.DataFrame([['RH1', 1, 3], ['RH2', 0, 3], ['RH3', 2, 0], ['RH4', 1, 2]], columns=['name', 'A', 'B'])
The final plot should have the "name" column on the y axis and "A" and "B" on the x axis, with the different numerical values shown in different colours, something like this:
I tried to plot it by looping over each row of the dataframe, but I got stuck and couldn't get it to work; the main problem I ran into was the size of both axes. It would be really great if anyone could help me. Thank you in advance.
You can melt your dataframe and use the values as the column for color:
from matplotlib import pyplot as plt
import pandas as pd
df = pd.DataFrame([['RH1', 1, 3], ['RH2', 0, 3], ['RH3', 2, 0], ['RH4', 1, 2]], columns=['name', 'A', 'B'])
df.melt(["name"]).plot(x="variable", y= "name", kind="scatter", c="value", cmap="plasma")
plt.show()
If you have a limited number of values, you can change the colormap to a discrete colormap and label each color with its value. Alternatively, use seaborn's stripplot:
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.DataFrame([['RH1', 1, 3], ['RH2', 0, 3], ['RH3', 2, 0], ['RH4', 1, 2]], columns=['name', 'A', 'B'])
sns.stripplot(data=df.melt(["name"]), x="variable", y= "name", hue="value", jitter=False)
plt.show()
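The discrete-colormap option mentioned earlier can be sketched with matplotlib's `ListedColormap` and `BoundaryNorm`. This is a minimal sketch under my own assumptions: the `tab10` colours are an arbitrary choice, and each distinct value is first mapped to a code 0..n-1 so that non-consecutive values would also get one colour each.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap

df = pd.DataFrame([['RH1', 1, 3], ['RH2', 0, 3], ['RH3', 2, 0], ['RH4', 1, 2]],
                  columns=['name', 'A', 'B'])
long_df = df.melt(['name'])

# Distinct values, sorted; each one gets its own colour slot 0 .. n-1
values = np.sort(long_df['value'].unique())
codes = np.searchsorted(values, long_df['value'])

# One colour per distinct value, with bin edges halfway between the codes
cmap = ListedColormap(plt.get_cmap('tab10').colors[:len(values)])
norm = BoundaryNorm(np.arange(len(values) + 1) - 0.5, cmap.N)

fig, ax = plt.subplots()
sc = ax.scatter(long_df['variable'], long_df['name'], c=codes, cmap=cmap, norm=norm)

# One colorbar tick in the middle of each colour, labelled with its value
cbar = fig.colorbar(sc, ax=ax, ticks=np.arange(len(values)))
cbar.set_ticklabels([str(v) for v in values])
plt.show()
```

Unlike a continuous colormap such as `plasma`, each value here maps to exactly one colour, and the colorbar acts as a legend.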
I have renamed the index and columns of the dataframe A and plotted it as a heatmap:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
A = pd.DataFrame(np.array([[1,2,3,4],[5,6,7,8]]))
columns = np.array([ 0. , 0.635, 1.27 , 1.905])
rows = np.array([ 0. , 0.635])
A.columns = columns
A.index = rows
plt.imshow(A)
The x axis of the plot is labelled 0, 1, 2, 3 and the y axis 0, 1, i.e. the positions of the cells rather than the dataframe's column and index labels.
How can I change the axis labels to the ones in the dataframe?
I want the x axis to show 0, 0.635, 1.27, 1.905 and the y axis 0, 0.635.
Seaborn is preferable to matplotlib.pyplot for making heatmaps in my opinion.
If you want to use seaborn:
import seaborn as sns
sns.heatmap(A, yticklabels = rows, xticklabels = columns)
plt.show()
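If you would rather stay with plain matplotlib, a minimal sketch is to relabel the integer tick positions that `imshow` uses with the dataframe's column and index values. Note this only changes the labels; the image is still drawn on integer cell positions (`imshow`'s `extent` argument is the route if you need real coordinates).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

A = pd.DataFrame(np.array([[1, 2, 3, 4], [5, 6, 7, 8]]),
                 index=[0.0, 0.635],
                 columns=[0.0, 0.635, 1.27, 1.905])

fig, ax = plt.subplots()
ax.imshow(A)
# imshow draws cell (row i, column j) centred at (j, i); label those positions
ax.set_xticks(range(A.shape[1]))
ax.set_xticklabels(A.columns)
ax.set_yticks(range(A.shape[0]))
ax.set_yticklabels(A.index)
plt.show()
```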
I have a dataframe with 3 columns. The first two columns are my data. The third column only takes on binary values, 0 or 1. I'd like to plot the first two columns such that the points are color coded (in two colors) depending upon whether the corresponding value in the third column is 0 or 1.
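import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(dict(A=[1, 2, 3, 4],
                       B=[7.5, 7, 5, 4.5],
                       C=[0, 1, 1, 0]))
# Map each binary value in C to a colour
colors = {0: 'red', 1: 'aqua'}
plt.scatter(df.A, df.B, c=df.C.map(colors))
plt.show()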
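A single scatter call with mapped colours has no per-class legend. One way to add one while keeping the single call (my own addition, a sketch) is to build proxy artists with `matplotlib.lines.Line2D`, one per entry in the `colors` dict:

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

df = pd.DataFrame(dict(A=[1, 2, 3, 4],
                       B=[7.5, 7, 5, 4.5],
                       C=[0, 1, 1, 0]))
colors = {0: 'red', 1: 'aqua'}

fig, ax = plt.subplots()
ax.scatter(df.A, df.B, c=df.C.map(colors))
# Proxy artists: invisible lines that only exist to give the legend a marker per class
handles = [Line2D([], [], marker='o', linestyle='', color=color, label=f'C = {value}')
           for value, color in colors.items()]
ax.legend(handles=handles)
plt.show()
```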
This paper has a nice way of visualizing clusters of a dataset with binary features by plotting a 2D matrix and sorting the values according to a cluster.
In this case, there are three clusters, as indicated by the black dividing lines; the rows are sorted to show which examples belong to each cluster, and the columns are the features of each example.
Given a vector of cluster assignments and a pandas DataFrame, how can I replicate this using a Python library (e.g. seaborn)? Plotting a DataFrame using seaborn isn't difficult, nor is sorting the rows of the DataFrame to align with the cluster assignments. What I am most interested in is how to display those black dividing lines which delineate each cluster.
Dummy data:
"""
col1 col2
x1_c0 0 1
x2_c0 0 1
================= I want a line drawn here
x3_c1 1 0
================= and here
x4_c2 1 0
"""
import pandas as pd
import seaborn as sns
df = pd.DataFrame(
data={'col1': [0, 0, 1, 1], 'col2': [1, 1, 0, 0]},
index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2']
)
clus = [0, 0, 1, 2] # This is the cluster assignment
sns.heatmap(df)
The link that mwaskom posted in a comment is a good starting place. The trick is figuring out the coordinates of the vertical and horizontal lines.
To illustrate what the code is actually doing, it's worthwhile to first plot all of the lines individually:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame(data={'col1': [0, 0, 1, 1], 'col2': [1, 1, 0, 0]},
                  index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2'])
f, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(df, ax=ax)
ax.axvline(1, 0, 2, linewidth=3, c='w')
ax.axhline(1, 0, 1, linewidth=3, c='w')
ax.axhline(2, 0, 1, linewidth=3, c='w')
ax.axhline(3, 0, 1, linewidth=3, c='w')
f.tight_layout()
The axvline method takes the x location of the line first, followed by the lower and upper bounds of the line (here 1, 0, 2). The bounds are fractions of the axes height (0 is the bottom, 1 the top), so 0, 2 simply spans the full height and the part beyond the axes is clipped. The axhline method likewise takes the y location and then the x start and stop as fractions of the axes width. By default the line spans the entire plot, so you can usually leave the bounds out.
The code above draws a line between every pair of adjacent rows (and between the two columns). If you want to separate groups in the heatmap instead, you will need a group index in your dataframe, or some other list of values to loop through. For instance, with a more complicated example using code from this example:
df = pd.DataFrame(data={'col1': [0, 0, 1, 1, 1.5], 'col2': [1, 1, 0, 0, 2]},
                  index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2', 'x5_c2'])
df['id_'] = df.index
df['group'] = [1, 2, 2, 3, 3]
df.set_index(['group', 'id_'], inplace=True)
df

             col1  col2
group id_
1     x1_c0   0.0     1
2     x2_c0   0.0     1
      x3_c1   1.0     0
3     x4_c2   1.0     0
      x5_c2   1.5     2
Then plot the heatmap with the groups:
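f, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(df, ax=ax)
groups = df.index.get_level_values(0)
# Recent seaborn draws row 0 at the top and row i spans y = i to i + 1,
# so the boundary above the first row of a new group sits at y = i
for i, group in enumerate(groups):
    if i and group != groups[i - 1]:
        ax.axhline(i, c="w", linewidth=3)
ax.axvline(1, c="w", linewidth=3)
f.tight_layout()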
Because your heatmap is not square, you may need a separate loop for the columns.
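Working directly from the question's `clus` vector, a minimal sketch is to sort the rows by cluster and draw a black line wherever the cluster label changes. It assumes a seaborn version that draws row 0 at the top of the heatmap (so the boundary above row i is at y = i); the names `order` and `boundaries` are just illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(data={'col1': [0, 0, 1, 1], 'col2': [1, 1, 0, 0]},
                  index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2'])
clus = np.array([0, 0, 1, 2])  # cluster assignment from the question

# Sort rows so that each cluster forms one contiguous block (stable keeps row order within a cluster)
order = np.argsort(clus, kind='stable')
sorted_df = df.iloc[order]
sorted_clus = clus[order]

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(sorted_df, ax=ax, cbar=False)
# A boundary sits before every row whose cluster differs from the previous row
boundaries = np.flatnonzero(np.diff(sorted_clus)) + 1
for y in boundaries:
    ax.axhline(y, color='k', linewidth=3)
plt.show()
```

For the dummy data this draws lines at y = 2 and y = 3, i.e. between `x2_c0`/`x3_c1` and between `x3_c1`/`x4_c2`, which are the two places marked in the question.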