Pyplot colormap line by line - python

I'm just getting started with plotting in Python using the very nice pyplot. I want to show how two data series evolve over time. Instead of a plain plot of each series as a function of time, I'd like a scatter plot of (data1, data2) in which time is shown as a color gradient.
In my two-column file, time corresponds to the line number. It could either be written as a third column in the file, or pyplot could perhaps work out the line number on its own.
Can anyone help me do that?
Thanks a lot.
Nicolas

When plotting with matplotlib.pyplot.scatter, you can pass a third array via the keyword argument c. The values in this array determine the colors of the scatter points. You then pick an appropriate colormap from matplotlib.cm and assign it with the cmap keyword argument.
This toy example creates two datasets, data1 and data2. It also creates an array colors of continuous values, equally spaced between 0 and 1, with the same length as data1 and data2. It doesn't need to know the "line number"; it only needs the total number of data points, and then spaces the colors evenly, so the i-th point gets the i-th color in time order.
I've also added a colorbar. You can remove this by removing the plt.colorbar() line.
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
N = 500
data1 = np.random.randn(N)
data2 = np.random.randn(N)
colors = np.linspace(0,1,N)
plt.scatter(data1, data2, c=colors, cmap=cm.Blues)
plt.colorbar()
plt.show()
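A minimal sketch applying this to the two-column file from the question, assuming whitespace-separated columns that np.loadtxt can read. The file contents are faked with io.StringIO so the snippet runs on its own (pass your filename instead), and the row index from np.arange plays the role of the line number, so no third column is needed. The matplotlib.use("Agg") line is only there so it runs without a display; drop it and call plt.show() to get a window.

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove it and call plt.show() for a window
import matplotlib.pyplot as plt

# Hypothetical two-column file contents; pass your filename to np.loadtxt instead.
text = "0.1 1.0\n0.4 1.5\n0.9 1.7\n1.3 2.4\n"
data = np.loadtxt(io.StringIO(text))  # shape (n_lines, 2)

# The row index stands in for the line number, i.e. for time.
line_number = np.arange(len(data))

fig, ax = plt.subplots()
sc = ax.scatter(data[:, 0], data[:, 1], c=line_number, cmap="viridis")
fig.colorbar(sc, ax=ax, label="line number")
ax.set_xlabel("data1")
ax.set_ylabel("data2")
```

Using the raw index rather than np.linspace(0, 1, N) gives the same color ordering, but the colorbar then reads directly in line numbers.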

Related

Histogram shows unlimited bins despite bin specification in matplotlib

I have some error data, and when I made a histogram of it, the bins looked far too wide, as shown in the image below.
Here is the code:
import matplotlib.pyplot as plt
plt.figure()
plt.hist(error)
plt.title('histogram of error')
plt.xlabel('error')
plt.show()
When I specified the bins explicitly, as we usually do (code below), I got the histogram shown below:
plt.figure()
plt.hist(error, bins=[-4,-3,-2,-1, 0,1, 2,3, 4,])
#plt.hist(error, bins = 6)
plt.title('histogram of error')
plt.xlabel('error')
plt.show()
I want the histogram to look nice, like the example below (found on Google), with clearly defined bins.
I tried seaborn's displot, and it gave a nice plot, shown below:
import seaborn as sns
sns.displot(error, bins=[-4,-3,-2,-1, 0,1, 2,3, 4,])
plt.title('histogram of error')
plt.xlabel('error')
plt.show()
Why can't matplotlib make this plot? Did I miss something, or do I need to set something to get the usual histogram? Please advise.
The matplotlib documentation for plt.hist() explains that the first parameter can be either a 1D array or a sequence of 1D arrays. The latter case applies if you pass in a 2D array: each column is treated as a separate dataset and gets its own bar, with cycling colors, in every bin.
This is what we see in your example: the x-axis ticks still correspond to the bin edges you passed in, but each bin contains many bars. So I assume you passed in a multidimensional array.
To fix this, simply flatten your data before passing it to matplotlib, e.g. plt.hist(np.ravel(error), bins=bins).
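A small sketch of both cases, assuming a hypothetical error array of shape (200, 5) drawn from a normal distribution. The 2D call returns one row of counts per column (five bars per bin), while the flattened call returns a single set of counts, which equals the per-column counts summed bin by bin.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively instead
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
error = rng.normal(size=(200, 5))  # hypothetical 2D error data: 5 columns
bins = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

fig, (ax0, ax1) = plt.subplots(1, 2)
# 2D input: every column is its own dataset -> 5 bars per bin
counts_2d, _, _ = ax0.hist(error, bins=bins)
ax0.set_title("2D array")
# Flattened input: a single dataset -> one bar per bin
counts_flat, _, _ = ax1.hist(np.ravel(error), bins=bins)
ax1.set_title("np.ravel(error)")
```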

Plots not visible when using a line plot

I am new to Python and I am trying to plot x against y (both contain a lot of data), but when I use plt.plot, no plot is visible in the output.
The code I have been using is:
for i in range(len(a)):
    plt.plot(a[i],b[i])
plt.figure()
plt.show()
When I tried a scatter plot instead:
for i in range(len(a)):
    plt.scatter(a[i],b[i])
plt.figure()
plt.show()
I don't understand why the line plot is missing, and when I try seaborn it shows me the error ValueError: If using all scalar values, you must pass an index
import numpy as np
import matplotlib.pyplot as plt
a = np.linspace(0,5,100)
b = np.linspace(0,10,100)
plt.plot(a,b)
plt.show()
I think this answers your question. I have used sample values for a and b. Matplotlib line plots don't need to be drawn in a loop: pass the whole arrays to plt.plot in one call.
A line needs at least two points. If you plot single values, no line can be drawn.
Well, you might say "but I am plotting many points," which already contains part of the answer (points). matplotlib.pyplot.plot() plots line objects, so every time you call plot, it creates a new one (no matter whether you call it on the same axes or on a new one). The reason you don't get lines is that each call plots only a single point. The reason you don't even see these points is that plot() does not draw markers by default. If you add marker='o' to plot(), you will end up with the same figure as with scatter.
A scatter plot, on the other hand, is an unordered collection of points. Its characteristic is that there are no lines between the points, because they usually do not form a sequence. Since nothing connects them, you can plot them all at once. By default they all have the same color, but you can also pass a color vector to encode a third variable.
import matplotlib.pyplot as plt
import numpy as np
# create random data
a = np.random.rand(10)
b = np.random.rand(10)
# open figure + axes
fig,axs = plt.subplots(1,2)
# standard scatter-plot
axs[0].scatter(a,b)
axs[0].set_title("scatter plot")
# standard line-plot
axs[1].plot(a,b)
axs[1].set_title("line plot")
plt.show()
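To see the difference described above, here is a sketch with made-up data (a and b are hypothetical stand-ins): one plot call per point, with marker="o" so the single-point lines become visible, next to a single plot call on the whole arrays. Counting the Line2D objects with ax.get_lines() shows that the loop creates one line object per call.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively instead
import matplotlib.pyplot as plt

a = np.linspace(0, 5, 20)  # hypothetical sample data
b = a ** 2

fig, (ax0, ax1) = plt.subplots(1, 2)
# One plot() call per point: each call makes a separate one-point Line2D,
# which is only visible because a marker is drawn.
for ai, bi in zip(a, b):
    ax0.plot(ai, bi, marker="o", color="C0")
ax0.set_title("loop, marker='o'")
# One plot() call with the whole arrays: a single Line2D joining all points.
ax1.plot(a, b)
ax1.set_title("single call")
```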

Plot average of an array in python

I have a 2D array of temperature-over-time data. There are about 7500 x-values and as many corresponding y-values (one y for every x).
It looks like this:
The blue line in the middle is the result of my unsuccessful attempt to draw a line representing the average of my data. Code:
import numpy as np
import matplotlib.pyplot as plt
data=np.genfromtxt("data.csv")
temp_av=[np.mean(data[1])]*len(data[0])
plt.figure()
plt.subplot(111)
plt.scatter(data[0],data[1])
plt.plot(data[0],temp_av)
plt.show()
However, what I need is a curve that follows the rise in temperature: basically a line running through the middle of the data points.
I googled for solutions, but all I found were suggestions on how to compute an average when there are multiple y-values for one x. I understand how to do that, but it doesn't help in this case.
My next idea would be to use a loop to compute an average over every two neighboring points, but I am not sure how best to do that, or whether there are better solutions.
Also, I understand that what I need is to compute another array; plotting is only for representation.
If I understand correctly, what you are trying to plot is a trend line. You could do it with the numpy function polyfit. If that's what you are looking for, try this small modification to your code:
import numpy as np
import matplotlib.pyplot as plt
data=np.genfromtxt("data.csv")
plt.figure()
plt.subplot(111)
plt.scatter(data[0],data[1])
pfit = np.polyfit(data[0], data[1], 1)
trend_line_model = np.poly1d(pfit)
plt.plot(data[0], trend_line_model(data[0]), "m--")
plt.show()
This will plot the trend line as a dashed magenta line.
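If a straight line is too rigid, the asker's own idea of averaging neighboring points is a moving average. A minimal sketch with np.convolve, assuming the points are sorted by x (true for a time series) and using a hypothetical window of 3 samples on made-up data; for the real data you would pass data[0] and data[1] and pick a much larger window.

```python
import numpy as np

def moving_average(y, window):
    """Mean over each run of `window` neighboring samples ('valid' part only)."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="valid")

x = np.arange(10, dtype=float)
y = np.array([0, 2, 1, 3, 2, 4, 3, 5, 4, 6], dtype=float)  # hypothetical noisy data
window = 3  # use an odd window so each average has a centre sample

y_smooth = moving_average(y, window)
# Align each average with the x-value at the centre of its window.
x_smooth = x[window // 2 : len(x) - window // 2]
```

The result can be drawn on top of the scatter with plt.plot(x_smooth, y_smooth). Unlike polyfit, it follows local rises and dips instead of forcing a single straight line.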

Show a (discrete) colorbar next to a plot as a legend for the (automatically chosen) line colors

I am trying to make a plot with many lines, but it is hard to tell them apart. They have different colors, but I would like to make it easy to see which line is which. A normal legend does not work well, since I have more than 10 lines.
The lines follow a logical sequence. I would like (1) their colors to be chosen automatically from a colormap (preferably one with a smooth ordering, such as viridis or a rainbow), and (2) the tick marks next to the colorbar to correspond to the index i of each line (or, better, a text label from an array of strings textlabels[i]).
Here's a minimal piece of code (with some gaps where I am not sure what to use). I hope this illustrates what I am trying.
import numpy as np
import matplotlib.pyplot as plt
# Generate some values to plot on the x-axis
x = np.linspace(0,1,1000)
# Some code to select a (discrete version of) a rainbow/viridis color map
...
# Loop over lines that should appear in the plot
for i in range(0,9):
    # Plot something (using straight lines with different slope as example)
    plt.plot(i*x)
# Some code to plot a discrete color bar next
# to the plot with ticks showing the value of i
...
This is what I currently have. I would like the colorbar to show the values of i (0, 1, 2, ...) as tick marks next to it.
Example figure of what I have now: it is hard to tell the lines apart.
You can get a discrete colormap via plt.get_cmap("name of cmap", number_of_colors).
This colormap can be used to compute the colors for the plots. It can also be used to generate a colorbar.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
n = 10 # how many lines to draw or number of discrete color levels
x = np.linspace(0,1,17)
cmap = plt.get_cmap("viridis", n)
for i in range(0,n):
    plt.plot(i*x, color=cmap(i))
# Boundaries at -0.5, 0.5, ..., n-0.5 center each color band on an integer tick
norm= matplotlib.colors.BoundaryNorm(np.arange(0,n+1)-0.5, n)
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])
# Recent matplotlib versions need ax= because sm is not attached to any Axes
plt.colorbar(sm, ax=plt.gca(), ticks=np.arange(0,n))
plt.show()
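The question also asks for text labels instead of the numbers i. A sketch of one way to do that on top of the answer's approach, assuming hypothetical labels "run 0", "run 1", ...: keep the integer ticks from BoundaryNorm and then replace their labels on the colorbar's own Axes (cbar.ax).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively instead
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as mcolors

n = 5
textlabels = [f"run {i}" for i in range(n)]  # hypothetical labels, one per line
x = np.linspace(0, 1, 17)
cmap = plt.get_cmap("viridis", n)  # discrete colormap with n colors

fig, ax = plt.subplots()
for i in range(n):
    ax.plot(x, i * x, color=cmap(i))

# One color band per integer i, centered on the tick at i
norm = mcolors.BoundaryNorm(np.arange(n + 1) - 0.5, n)
sm = cm.ScalarMappable(cmap=cmap, norm=norm)
cbar = fig.colorbar(sm, ax=ax, ticks=np.arange(n))
cbar.ax.set_yticklabels(textlabels)  # replace numeric tick labels with text
```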

Violin Plot with python

I want to create 10 violin plots, but within one diagram. I looked at many examples, like this one: Violin plot matplotlib, which shows what I would like to end up with.
But I did not know how to adapt it to a real dataset. They all just generate some normally distributed random data.
I have data in the form D[10,730], and I tried to adapt the example from the link above:
example:
axes[0].violinplot(all_data,showmeans=False,showmedians=True)
my code:
axes[0].violinplot(D,showmeans=False,showmedians=True)
It does not work.
It should draw 10 violin plots side by side (one per entry along the first dimension of D).
So what shape does my data need to have to get the same type of violin plot?
You just need to transpose your data array D.
axes[0].violinplot(D.T,showmeans=False,showmedians=True)
This looks like an inconsistency in matplotlib: a list of 1D arrays and a 2D array are split along different axes. (For a 2D array, each column is treated as one dataset, the same convention plt.hist() uses.)
import numpy as np
import matplotlib.pyplot as plt
n_datasets = 10
n_samples = 730
data = np.random.randn(n_datasets,n_samples)
fig, axes = plt.subplots(1,3)
# http://matplotlib.org/examples/statistics/boxplot_vs_violin_demo.html
axes[0].violinplot([d for d in data])
# should be equivalent to:
axes[1].violinplot(data)
# is actually equivalent to
axes[2].violinplot(data.T)
plt.show()
If you consider this a bug, you could file a bug report.
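A minimal check of the transpose fix, with random data standing in for the real D of shape (10, 730). The dictionary returned by Axes.violinplot holds one entry per violin under "bodies", so counting them shows that both D.T and a list of the 10 rows give 10 violins.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively instead
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
D = rng.normal(size=(10, 730))  # hypothetical stand-in: 10 datasets x 730 samples

fig, (ax0, ax1) = plt.subplots(1, 2)
# Transposed 2D array: each column of D.T (= each row of D) becomes one violin
parts_t = ax0.violinplot(D.T, showmeans=False, showmedians=True)
# A list of 1D arrays: each list element becomes one violin
parts_list = ax1.violinplot(list(D), showmeans=False, showmedians=True)
```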
