I'm new to Python and pandas, but I have a CSV file with multiple columns that I have read into a DataFrame. I would like to draw a scatter plot with x = the index and y = 'data', where the index of the DataFrame is a date.
Thanks heaps
Jason
You can use plot_date:
plot_date(df.index, df.data)
Although it is technically not a scatter plot, you can use the pandas DataFrame.plot method with point markers drawn and no lines:
df.plot(marker='o', linewidth=0)
This lets you keep using the convenient pandas plotting functionality, e.g. plotting two series on different scales with a single call:
df.plot(marker='o', linewidth=0, secondary_y='y2')
The downside to this is that you lose some of the scatter functionality such as shading and sizing the markers differently.
Still, if your aim is a quick scatter plot, this might be the easiest route.
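As a rough, self-contained sketch of the marker-only approach: the DataFrame below is invented for illustration (the column names 'data' and 'y2' and all values are assumptions, not the asker's CSV), and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Hypothetical data: a date index and two columns on different scales.
idx = pd.date_range("2020-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {"data": [1.0, 3.0, 2.0, 5.0, 4.0], "y2": [100, 300, 250, 400, 350]},
    index=idx,
)

# Markers only, no connecting line; 'y2' is drawn against a secondary y-axis.
ax = df.plot(marker="o", linewidth=0, secondary_y="y2")
ax.set_ylabel("data")
ax.right_ax.set_ylabel("y2")
```

The index is used as x automatically, so no explicit x argument is needed.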
I come across this issue constantly, and my current solution is to create additional dataframes. I feel like there must be an easier solution.
Here is an example of data where I have multiple countries with multiple attributes:
If I wanted to plot Population vs. Depression (%) I would write:
ax = df.plot.scatter(x='Population', y='Depression (%)')
This isn't super helpful, as there are clearly lines linked to specific Countries (df['Country']). Is there a simple way to plot a scatter plot with different series (colors/shapes/etc) as different Countries?
Right now I use groupby to separate out individual Countries and plot them on the same axes (ax = ax).
Any thoughts or input would be greatly appreciated! Thank you!
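For reference, the groupby workaround described above might look roughly like the sketch below. The original data was not included, so the values here are invented placeholders; only the column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real data.
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Population": [10, 12, 50, 55, 30, 31],
    "Depression (%)": [3.1, 3.3, 4.0, 4.2, 2.5, 2.6],
})

fig, ax = plt.subplots()
# One scatter call per country, all drawn on the same axes.
for country, group in df.groupby("Country"):
    ax.scatter(group["Population"], group["Depression (%)"], label=country)
ax.set_xlabel("Population")
ax.set_ylabel("Depression (%)")
ax.legend(title="Country")
```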
Try c="Country", and if you want some nice colors you can add colormap='viridis'. For example (from the documentation):
ax2 = df.plot.scatter(x='length',
                      y='width',
                      c='species',
                      colormap='viridis')
Since your Country column contains strings, this approach can't be used directly; the values need to be converted to numbers first. This can be done by writing:
c=df['Country'].astype("category").cat.codes
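Put together, a minimal sketch could look like this. The data is an invented placeholder, and the helper column name 'country_code' is an assumption made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Hypothetical stand-in for the real data.
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Population": [10, 12, 50, 55, 30, 31],
    "Depression (%)": [3.1, 3.3, 4.0, 4.2, 2.5, 2.6],
})

# Map each country name to an integer code (0, 1, 2, ...).
df["country_code"] = df["Country"].astype("category").cat.codes

# Color the points by the numeric code using the viridis colormap.
ax = df.plot.scatter(
    x="Population",
    y="Depression (%)",
    c="country_code",
    colormap="viridis",
)
```

Note that the colorbar then shows the integer codes rather than the country names, so this is more of a quick look than a finished figure.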
In Python's seaborn, what is the difference between countplot and catplot?
Eg:
sns.catplot(x='class', y='survived', hue='sex', kind='bar', data=titanic);
sns.countplot(y='deck', hue='class', data=titanic);
seaborn.countplot
Shows the counts of observations in each categorical bin using bars.
seaborn.catplot
Provides access to several axes-level functions that show the relationship between a numerical and one or more categorical variables using one of several visual representations.
There is a lot of overhead in catplot, or for that matter in FacetGrid, that ensures the categories are synchronized along the grid. Consider, e.g., a variable plotted along the columns of the grid for which not every age group occurs. You would still need to show the non-occurring age group and hold on to its color. Hence, two countplots next to each other do not necessarily make up one catplot.
However, if you are only interested in a single countplot, a catplot is clearly overkill. On the other hand, even a single countplot is overkill compared to a barplot of the counts.
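To illustrate the last point, here is a small sketch comparing a countplot with a plain bar chart of precomputed counts. The tiny 'deck' column is made up for the example; it is not the titanic dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical categorical data.
df = pd.DataFrame({"deck": ["A", "B", "B", "C", "C", "C"]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Left: seaborn counts the observations per category itself.
sns.countplot(x="deck", data=df, ax=ax1)

# Right: count with pandas, then draw ordinary bars with matplotlib.
counts = df["deck"].value_counts().sort_index()
ax2.bar(list(counts.index), counts.values)
ax2.set_xlabel("deck")
ax2.set_ylabel("count")
```

Both panels should show the same bar heights; the difference is only in who does the counting.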
My problem is that I could only find answers for plots sharing the same y-axis units.
My graphs are defined as follows:
# Plot 1
sns.set_style("white")
sns.catplot(y="Reaction_cd_positive", x="Flux_cd_positive",
            kind="bar", height=4, data=CDP, aspect=1.5)
# Plot 2
sns.catplot(y="Reaction_cd_negative", x="Flux_cd_negative",
            kind="bar", height=4, data=CDN, aspect=1.5)
Thank you in advance!
Ok, let me translate this. You are using seaborn in a Jupyter notebook. You want two barplots next to each other within the same figure, instead of two individual figures. Since catplot produces a figure by itself, there are two options:
1. Create a single catplot with two subplots. To this end you would need to concatenate your two DataFrames into a single one, then use the col argument to split the data into the two subplots.
2. Create a subplot grid with matplotlib first, then plot a barplot into each of the subplots. This is shown in this question.
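A minimal sketch of the subplot-grid option follows. The contents of CDP and CDN are invented here, since the question does not show them; only the column names are taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-ins for the asker's two DataFrames.
CDP = pd.DataFrame({
    "Reaction_cd_positive": ["R1", "R2", "R3"],
    "Flux_cd_positive": [0.5, 1.2, 0.8],
})
CDN = pd.DataFrame({
    "Reaction_cd_negative": ["R4", "R5"],
    "Flux_cd_negative": [120.0, 80.0],
})

sns.set_style("white")
# One figure with two side-by-side axes; the axes are independent,
# so each panel keeps its own units and limits.
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.barplot(y="Reaction_cd_positive", x="Flux_cd_positive",
            data=CDP, ax=axes[0])
sns.barplot(y="Reaction_cd_negative", x="Flux_cd_negative",
            data=CDN, ax=axes[1])
fig.tight_layout()
```

Unlike catplot, barplot is an axes-level function, which is why it accepts the ax argument.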
I'm trying to get my seaborn plot to look something like this:
If I use lmplot with the z-axis for the hue, I get this:
The lmplot in the picture is basically what I want, but I need the colorbar on the right side instead of the actual values.
I tried to do this with a heatmap, but the data plot was worse that way due to the large gaps between samples.
Thanks for any help!
You should use plt.scatter. The hue parameter in lmplot only accepts categorical variables.
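Something along these lines should give a scatter plot with a colorbar on the right. The x, y and z arrays are random placeholders, as the original data was only shown in pictures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: z is the continuous variable to encode as color.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = rng.uniform(0, 10, 50)
z = x + y

fig, ax = plt.subplots()
# c=z maps each point's z value through the colormap.
sc = ax.scatter(x, y, c=z, cmap="viridis")
# The colorbar is built from the scatter's color mapping.
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("z")
ax.set_xlabel("x")
ax.set_ylabel("y")
```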
Here's the result of scatter plot using Matplotlib
And now here's the result of calling scatter plot using Pandas
Is there a bug in the pandas scatter function, or is it supposed to work like this?
I think the grey area you see is the boundary of each point. Use the argument edgecolors='none' or edgecolors='black' to get the same result as you get with matplotlib (see also http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter)
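For instance, a small sketch with made-up data (the column names 'x' and 'y' are placeholders); the extra keyword is passed through by pandas to matplotlib's scatter:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 1, 5, 3]})

# edgecolors='none' removes the outline drawn around each marker.
ax = df.plot.scatter(x="x", y="y", s=80, edgecolors="none")
```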