I'm making a scatter plot from a Pandas DataFrame with 3 columns. The first two are the x and y axes, and the third is classification data that I want to visualize by giving the points different colors. My question is, how can I add a legend to this plot:
df = df.groupby(['Month', 'Price'])['Quantity'].sum().reset_index()
df.plot(kind='scatter', x='Month', y='Quantity', c=df.Price, s=100, legend=True);
As you can see, I'd like to color the dots automatically based on their price, so adding labels manually is a bit of an inconvenience. Is there something I could add to this code that would also show a legend for the Price values?
Also, this colors the scatter plot dots on a range from black to white. Can I add custom colors without giving up the easy usage of c=df.Price?
Thank you!
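One possible approach, sketched here with plain matplotlib rather than DataFrame.plot: the collection returned by scatter has a legend_elements() method that builds legend handles from the color values, and the cmap argument replaces the default colors while keeping c=df.Price. The data below is made up for illustration, and "viridis" is just one example colormap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for the grouped DataFrame from the question.
df = pd.DataFrame({
    "Month": [1, 1, 2, 2, 3, 3],
    "Price": [10, 20, 10, 30, 20, 30],
    "Quantity": [5, 3, 8, 2, 6, 4],
})

fig, ax = plt.subplots()
# c= maps Price to colors; cmap= picks the color scale (assumed choice).
points = ax.scatter(df["Month"], df["Quantity"], c=df["Price"], cmap="viridis", s=100)

# Build legend entries from the color-mapped values.
handles, labels = points.legend_elements()
legend = ax.legend(handles, labels, title="Price")
ax.set_xlabel("Month")
ax.set_ylabel("Quantity")
```

If Price is continuous rather than a handful of distinct values, fig.colorbar(points, label="Price") may be a better fit than a discrete legend.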
Related
I've come across the issue of overlapping xlabels on a Seaborn line plot. The data set has multiple occurrences of those labels, which is understandable. But is there a way to fix the xlabels without having to change the format of the plot or the data frame?
The xlabels were converted to the Timestamp type earlier on, and the plot is shown below:
code:
plt.figure(figsize=(15,10))
sns.lineplot(data=data_no_orkney, x="Data_Month_Date", y=percentage)
plt.xticks(list(set(data_no_orkney.Data_Month_Date)))
#plt.axvline(x=pd.Timestamp(year=2020,month=3,day=23), color='r', ls='--', label="Date of first lockdown")
plt.xlabel("Year")
plt.ylabel("Percentage meeting target")
plt.show()
Also, would it be correct of me to assume that the solid blue line in the middle is the mean of the values shown in the lighter blue area? I've never seen such a line plot before, but that's more or less my understanding, judging by the looks of it.
I tried using plt.xticks(list), with the list containing de-duplicated Timestamp (date) values. The only result was that the code took longer to run, and the labels did not change.
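A minimal sketch of one way to thin out date ticks, assuming the x values are real datetimes: instead of passing every date to plt.xticks, let a matplotlib date locator place one tick per year. It uses plain matplotlib with made-up monthly data; since Seaborn draws onto matplotlib axes, the same locator calls should apply to the Axes returned by sns.lineplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly data standing in for Data_Month_Date and the percentage column.
dates = pd.date_range("2018-01-01", periods=36, freq="MS")
values = [50 + (i % 12) for i in range(36)]

fig, ax = plt.subplots(figsize=(15, 10))
ax.plot(dates, values)

# One tick per year, labelled with the year only.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
# Rotate the labels in case they are still crowded.
ax.tick_params(axis="x", labelrotation=45)

ax.set_xlabel("Year")
ax.set_ylabel("Percentage meeting target")
```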
I was trying to plot geophysics data (a well log) in Altair using the mark_line function, but the line connects the points from left to right rather than from top to bottom. In the figure, the left panel shows that the data is distributed vertically, the middle panel is the result using mark_line, and the right panel is what I wanted, only with the X and Y axes flipped.
Is there any way to make the plot behave like the left figure, but drawn as a line?
Or perhaps some hack to flip the display of the right figure?
chart1 = alt.Chart(w).mark_point(color='green').encode(
alt.X('GR', scale=alt.Scale(domain=[0,300])),
alt.Y('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
chart2 = alt.Chart(w).mark_line(color='green').encode(
alt.X('GR', scale=alt.Scale(domain=[0,300])),
alt.Y('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
chart3 = alt.Chart(w).mark_line(color='green').encode(
alt.Y('GR', scale=alt.Scale(domain=[0,300])),
alt.X('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
chart1 | chart2 | chart3
Plot using Altair
For those who need more information, this is a typical dataset from borehole geophysics (a well log). The data (GR) is displayed as a vertical line against depth (DEPT).
Thanks for the help!
From what I have tested so far, Altair's mark_line connects the points in order of their X values by default. Therefore, if you want the line to follow the Y-axis instead, you have to specify the order of the connecting line explicitly. In the following, I add order='DEPT', the field on the Y-axis of this plot.
alt.Chart(
w
).mark_line(
color='green',
point=True,
).encode(
alt.X('GR', scale=alt.Scale(domain=[0,250])),
alt.Y('DEPT', sort = 'descending',scale=alt.Scale(domain=[7000, 7030])),
order = 'DEPT' #this has to be added to make sure the plot is following the order of Y-axis, DEPT
).configure_mark(
color = 'red'
).interactive()
Result:
I am using the "tips" dataset.
I am plotting a scatter plot with the code below:
sns.scatterplot(data=df['total_bill'])
I want to show the outliers, say the points above 40 on the y-axis in this case, in a different color or a larger size. Alternatively, is it possible to draw a horizontal line at 40?
With the code below, I achieved the desired result:
sns.scatterplot(data=df, y='total_bill', x=range(0,244), hue='is_outlier')
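With seaborn.scatterplot you can use the "hue" parameter to plot groups in different colors. Recent Seaborn versions do not accept hue together with a bare Series passed as data, so pass x and y explicitly. For your example the following should work:
is_outlier = (df['total_bill'] >= 40)
sns.scatterplot(x=df.index, y=df['total_bill'], hue=is_outlier)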
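For the horizontal line part of the question, here is a minimal sketch in plain matplotlib (Seaborn plots are drawn on matplotlib axes, so ax.axhline can be added to them the same way). The data is randomly generated as a stand-in for the tips dataset, and the threshold of 40 comes from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in for tips["total_bill"] (244 rows, like the original dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({"total_bill": rng.uniform(5, 50, size=244)})

threshold = 40
is_outlier = df["total_bill"] >= threshold
normal = df[~is_outlier]
outliers = df[is_outlier]

fig, ax = plt.subplots()
ax.scatter(normal.index, normal["total_bill"], color="tab:blue", label="normal")
# Outliers get a different color and a larger marker.
ax.scatter(outliers.index, outliers["total_bill"], color="tab:red", s=60, label="outlier")
# Dashed horizontal line marking the threshold.
line = ax.axhline(threshold, color="gray", linestyle="--")
ax.set_ylabel("total_bill")
ax.legend()
```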
I am having trouble getting the seaborn hue to work to color by value. My data is in a pandas df and I am using a barplot.
sns.barplot(x = plot_data['gene'], y = plot_data['freq'],
hue=plot_data["type"],palette={"type1":"red", "type2":"blue"}, ax=ax2)
I am confused by the grey bars that appear in places. I expect only red and blue bars and I am sure these are the only two types in the data.
I suggest drawing the seaborn barplot horizontally, because the vertical layout may not display all the bars properly; that may be why the hue parameter appears not to work.
Also set the figure size:
plt.figure(figsize=(9, 200))  # figure size in a 9:200 ratio
You can change the size according to your requirements.
I have a dataset of 5000 products with 50 features. One of the columns is 'colors', and it contains more than 100 distinct colors. I'm trying to plot a bar chart that shows only the top 10 colors and how many products there are in each color.
top_colors = df.colors.value_counts()
top_colors[:10].plot(kind='barh')
plt.xlabel('No. of Products');
Using Seaborn:
sns.factorplot("colors", data=df , palette="PuBu_d");
1) Is there a better way to do this?
2) How can I replicate this with Seaborn?
3) How do I plot it so that the highest count is at the top (i.e. black at the very top of the bar chart)?
An easy trick might be to invert the y axis of your plot, rather than futzing with the data:
import string
import numpy as np
import pandas as pd
s = pd.Series(np.random.choice(list(string.ascii_uppercase), 1000))
counts = s.value_counts()
ax = counts.iloc[:10].plot(kind="barh")
ax.invert_yaxis()
Older versions of Seaborn's barplot did not support horizontally oriented bars; current versions do (for example via orient="h"). If you want to control the order the bars appear in, you can pass a list of values to the order parameter (x_order in those older versions). But I think it's easier to use the pandas plotting methods here, anyway.
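If you want to use pandas, you can sort first. A horizontal bar chart draws the first row at the bottom, so sort in ascending order to put the highest count at the top:
top_colors[:10].sort_values(ascending=True).plot(kind='barh')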
Seaborn already styles your pandas plots, but you can also use:
sns.barplot(x=top_colors.index, y=top_colors.values)
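Putting the pandas suggestions together, a minimal sketch could look like the following. The DataFrame is randomly generated as a stand-in for the 5000-product dataset, and the color names are hypothetical placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data: 5000 products drawn from 30 hypothetical color names.
rng = np.random.default_rng(0)
palette = [f"color_{i}" for i in range(30)]
df = pd.DataFrame({"colors": rng.choice(palette, size=5000)})

# value_counts() sorts by count in descending order, so head(10) is the top 10.
top_colors = df["colors"].value_counts().head(10)

fig, ax = plt.subplots()
top_colors.plot(kind="barh", ax=ax)
# barh draws the first row at the bottom; inverting the y axis puts it on top.
ax.invert_yaxis()
ax.set_xlabel("No. of Products")
```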