I'm trying to make a program that will create a descending series of plots based on given files containing an n*2 matrix of numerical values (they more or less share an x-axis, and they're close enough on the y-axis that they need to be manipulated to avoid overlap).
Right now, it reads in the files one at a time with fileinput, adds a constant to the values in column two (the constant is arbitrary as long as it separates the plots; I start at twice the number of files and decrement by two for each plot), then appends the shifted values to two master lists (for x and y), which matplotlib plots at the end.
I have it doing very close to exactly what I want, but it has some odd lines connecting the end of one file to the beginning of the next, and I'd like to know how to remove them.
Here's the relevant part of the code:
mpl.suptitle(spectitle, fontsize=16)
mpl.xlabel('wavelength (A)', fontsize=14)
mpl.ylabel('flux (erg s^-1 cm^-2)', fontsize=14)
with open(filelist) as infile:
    allfiles = [line.rstrip('\n') for line in open(filelist)]
multiplier = len(allfiles)
multiplier *= 2
for line in fileinput.input(allfiles):
    filename = fileinput.filename()
    waveN, fluxN = np.loadtxt(filename, usecols=[0,1], unpack=True)
    fluxCalc = np.array(fluxN)
    fluxCalc += multiplier
    multiplier -= 2 #decrease multiplier, causing next output spectrum to be placed below the one just calculated
    wavelenAll.extend(waveN)
    fluxCalc.tolist()
    fluxAll.extend(fluxCalc)
    fileinput.nextfile()
mpl.plot(wavelenAll, fluxAll)
mpl.savefig('allspec.png')
mpl.show()
I can add an image of the output in a few hours. Thanks for any help in advance.
Try something like:
import matplotlib.pyplot as plt
import numpy as np
filelist = 'filelist.txt'  # placeholder: path to the text file that lists the data files
spectitle = 'spectrum'
with open(filelist) as infile:
    allfiles = [line.rstrip('\n') for line in infile]
all_flux, all_wavelen = [], []
# just get the data from the file and accumulate in a list
# which assumes you want these lists for something else
for fname in allfiles:
    waveN, fluxN = np.loadtxt(fname, usecols=[0, 1], unpack=True)
    all_flux.append(fluxN)
    all_wavelen.append(waveN)
fig, ax = plt.subplots()
fig.suptitle(spectitle, fontsize=16)
ax.set_xlabel('wavelength (A)', fontsize=14)
ax.set_ylabel('flux (erg s^-1 cm^-2)', fontsize=14)
# loop over the data and plot
for wv, flux, shift in zip(all_wavelen, all_flux,
                           range(1, len(allfiles) + 1)[::-1]):
    # do the shift as late as possible so you do not accidentally reuse
    # cosmetically shifted data for computing something
    ax.plot(wv, flux + shift, color='b')
fig.savefig('allspec.png')
plt.show()
The problem must be in the data or in how it is post-processed; it is hard to say more without seeing the data. Try plotting it without the first and last elements, e.g. mpl.plot(wavelenAll[1:-1], fluxAll[1:-1])
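One likely source of the connecting lines: a single plot call joins every consecutive pair of points, so the last point of one file gets joined to the first point of the next. Besides plotting each file separately, a NaN can be placed between the series, since matplotlib does not draw through NaN values. The following is only a minimal sketch under assumptions: the spectra are synthetic stand-ins for the files, and join_with_gaps is a hypothetical helper name, not something from the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def join_with_gaps(x_parts, y_parts):
    """Concatenate several series, putting one NaN between consecutive series."""
    xs, ys = [], []
    for x, y in zip(x_parts, y_parts):
        xs.extend([np.asarray(x, dtype=float), [np.nan]])
        ys.extend([np.asarray(y, dtype=float), [np.nan]])
    # drop the trailing NaN separator before concatenating
    return np.concatenate(xs[:-1]), np.concatenate(ys[:-1])


# synthetic stand-ins for three spectra that share an x-axis (assumption)
wave = np.linspace(4000.0, 7000.0, 50)
fluxes = [np.sin(wave / 300.0 + k) for k in range(3)]
# shift each spectrum so they stack from top to bottom, as in the question
offsets = [2 * (len(fluxes) - k) for k in range(len(fluxes))]
shifted = [f + off for f, off in zip(fluxes, offsets)]

wavelen_all, flux_all = join_with_gaps([wave] * len(shifted), shifted)

fig, ax = plt.subplots()
lines = ax.plot(wavelen_all, flux_all)  # one line object, broken at each NaN
```

This keeps the two master lists from the question; if they are not needed for anything else, one ax.plot call per file is simpler.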
I am drawing streamplots using matplotlib, and exporting them to a vector format. However, I find the streamlines are exported as a series of separate lines - not joined objects. This has the effect of reducing the quality of the image, and making for an unwieldy file for further manipulation. An example; the following images are of a pdf generated by exportfig and viewed in Acrobat Reader:
This is the entire plot
and this is a zoom of the center.
Interestingly, the length of these short line segments is affected by 'density' - increasing the density decreases the length of the lines. I get the same behavior whether exporting to svg, pdf or eps.
Is there a way to get a streamplot to export streamlines as a single object, preferably as a curved line?
MWE
import matplotlib.pyplot as plt
import numpy as np
square_size = 101
x = np.linspace(-1,1,square_size)
y = np.linspace(-1,1,square_size)
u, v = np.meshgrid(-x,y)
fig, axis = plt.subplots(1, figsize = (4,3))
axis.streamplot(x,y,u,v)
fig.savefig('YourDirHere\\test.pdf')
In the end, it seemed like the best solution was to extract the lines from the streamplot object, and plot them using axis.plot. The lines are stored as individual segments with no clue as to which line they belong, so it is necessary to stitch them together into continuous lines.
Code follows:
import matplotlib.pyplot as plt
import numpy as np
def extract_streamlines(sl):
    # empty list for extracted lines, flag
    new_lines = []
    for line in sl:
        #ignore zero length lines
        if np.array_equiv(line[0],line[1]):
            continue
        ap_flag = 1
        for new_line in new_lines:
            #append the line segment to either start or end of existing lines, if either the start or end of the segment is close.
            if np.allclose(line[0],new_line[-1]):
                new_line.append(list(line[1]))
                ap_flag = 0
                break
            elif np.allclose(line[1],new_line[-1]):
                new_line.append(list(line[0]))
                ap_flag = 0
                break
            elif np.allclose(line[0],new_line[0]):
                new_line.insert(0,list(line[1]))
                ap_flag = 0
                break
            elif np.allclose(line[1],new_line[0]):
                new_line.insert(0,list(line[0]))
                ap_flag = 0
                break
        # otherwise start a new line
        if ap_flag:
            new_lines.append(line.tolist())
    return [np.array(line) for line in new_lines]
square_size = 101
x = np.linspace(-1,1,square_size)
y = np.linspace(-1,1,square_size)
u, v = np.meshgrid(-x,y)
fig_stream, axis_stream = plt.subplots(1, figsize = (4,3))
stream = axis_stream.streamplot(x,y,u,v)
np_new_lines = extract_streamlines(stream.lines.get_segments())
fig, axis = plt.subplots(1, figsize = (4,4))
for line in np_new_lines:
    axis.plot(line[:,0], line[:,1])
fig.savefig('YourDirHere\\test.pdf')
A quick solution to this issue is to change the default cap style of the tiny segments drawn by the streamplot function. To do this, follow the steps below.
Extract all the segments from the stream plot.
Bundle these segments into a LineCollection.
Set the collection's cap style to round.
Set the collection's zorder value smaller than the stream plot's default 2. If it is higher than the default value, the arrows of the stream plot will be overdrawn by the lines of the new collection.
Add the collection to the figure.
The solution for the example code is presented below.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection # Import LineCollection function.
square_size = 101
x = np.linspace(-1,1,square_size)
y = np.linspace(-1,1,square_size)
u, v = np.meshgrid(-x,y)
fig, axis = plt.subplots(1, figsize = (4,3))
strm = axis.streamplot(x,y,u,v)
# Extract all the segments from streamplot.
strm_seg = strm.lines.get_segments()
# Bundle segments with round capstyle. The `zorder` value should be less than 2 to not
# overlap streamplot's arrows.
lc = LineCollection(strm_seg, zorder=1.9, capstyle='round')
# Add the bundled segment to the subplot.
axis.add_collection(lc)
fig.savefig('streamline.pdf')
Additionally, if you want the streamlines' line widths to vary across the graph, you have to extract the widths and pass them to LineCollection.
strm_lw = strm.lines.get_linewidths()
lc = LineCollection(strm_seg, zorder=1.9, capstyle='round', linewidths=strm_lw)
Sadly, applying a color map is not as straightforward as the solution above, so a color map will not look very good with this approach. You can still automate the coloring process, as shown below.
strm_col = strm.lines.get_color()
lc = LineCollection(strm_seg, zorder=1.9, capstyle='round', color=strm_col)
Lastly, I opened a pull request to change the default capstyle option in the matplotlib repository; it can be seen here. You can also apply this commit using the code below. If you do so, you do not need any of the tricks explained above.
diff --git a/lib/matplotlib/streamplot.py b/lib/matplotlib/streamplot.py
index 95ce56a512..0229ae107c 100644
--- a/lib/matplotlib/streamplot.py
+++ b/lib/matplotlib/streamplot.py
@@ -222,7 +222,7 @@ def streamplot(axes, x, y, u, v, density=1, linewidth=None, color=None,
         arrows.append(p)
     lc = mcollections.LineCollection(
-        streamlines, transform=transform, **line_kw)
+        streamlines, transform=transform, **line_kw, capstyle='round')
     lc.sticky_edges.x[:] = [grid.x_origin, grid.x_origin + grid.width]
     lc.sticky_edges.y[:] = [grid.y_origin, grid.y_origin + grid.height]
     if use_multicolor_lines:
So let's say I have a dictionary as follows:
dictionary = {'a': [1,2,3], 'b':[4,2,5], 'c':[5,9,1]}
So the way I would do a single plot of all 'a','b','c' lines would be (assuming figure has already been declared, etc.):
#half-setup for animation
lines = []
mass = list(dictionary.keys()) #I know this is redundant but my 'mass' variable serves another purpose in my actual program
for i in range(len(mass)): #create a list of line objects with zero entries
    a, = ax.plot([], [])
    lines.append(a)
#single plot
for i in dictionary:
    index = np.array(locations[i]) #convert to numpy
    ax.plot(index[:,0],index[:,1],index[:,2])
plt.show()
So how can I turn this into an animated 3D graph? I have already tried plt.ion() and plt.pause() but the animation is painfully slow.
Here is the general implementation that I used; it works pretty well (it involves dictionaries):
import matplotlib.animation as anim
#create regular 3D figure 'fig'
lines = []
for i in range(3): #create however many lines you want
    a, = ax.plot([],[],[]) #create lines with no data
    lines.append(a)
bodies = {i:[data] for i in lines} #where [data] is your x,y,z dataset that you have before hand and 'i' is your matplotlib 'line' object
def update(num):
    for i in bodies: #update positions of each line
        index = np.array(bodies[i])
        i.set_data(index[:,0][:num],index[:,1][:num])
        i.set_3d_properties(index[:,2][:num])
if __name__=='__main__':
    totalSteps = 1000 #can change
    ani = anim.FuncAnimation(fig, update, totalSteps, interval = 1)
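For anyone who wants to run the idea end to end, here is a self-contained sketch of the same approach. It rests on assumptions that are not in the original posts: the trajectories are synthetic helices, the dictionary keys 'a' and 'b' are placeholders, and each value is an (n_steps, 3) array of x, y, z positions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; switch to an interactive one to watch it
import matplotlib.pyplot as plt
import matplotlib.animation as anim
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers '3d' on older Matplotlib

# hypothetical data: each key maps to an (n_steps, 3) array of x, y, z positions
t = np.linspace(0, 4 * np.pi, 200)
locations = {
    'a': np.column_stack([np.cos(t), np.sin(t), t]),
    'b': np.column_stack([np.sin(t), np.cos(t), -t]),
}

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# set the limits up front, because empty lines give the axes nothing to autoscale to
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
ax.set_zlim(-t[-1], t[-1])

# one empty 3D line per data set
lines = {key: ax.plot([], [], [])[0] for key in locations}


def update(num):
    # show the first `num` points of every trajectory
    for key, line in lines.items():
        xyz = locations[key]
        line.set_data(xyz[:num, 0], xyz[:num, 1])
        line.set_3d_properties(xyz[:num, 2])
    return list(lines.values())


# keep a reference to the animation, otherwise it may be garbage collected
ani = anim.FuncAnimation(fig, update, frames=len(t), interval=20)
# plt.show()  # uncomment when using an interactive backend
```

Updating the existing line objects instead of calling ax.plot again on every frame is what keeps this faster than the plt.ion()/plt.pause() loop.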
I'm trying to plot data from 2 separate MultiIndex frames that have the same data as levels in each.
Currently, this generates two separate plots, and I'm unable to customise the legend by appending a string that distinguishes each line on the graph. Any help would be appreciated!
Here is the method so far:
def plot_lead_trail_res(df_ante, df_post, symbols=[]):
    if len(symbols) < 1:
        print "Try again with a symbol list. (Time constraints)"
    else:
        df_ante = df_ante.loc[symbols]
        df_post = df_post.loc[symbols]
        ante_leg = [str(x)+'_ex-ante' for x in df_ante.index.levels[0]]
        post_leg = [str(x)+'_ex-post' for x in df_post.index.levels[0]]
        print "ante_leg", ante_leg
        ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
        ax = df_post.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=post_leg)
        ax.set_xlabel('Time-shift of sentiment data (days) with financial data')
        ax.set_ylabel('Mutual Information')
Using this function call:
sentisignal.plot_lead_trail_res(data_nasdaq_top_100_preprocessed_mi_res, data_nasdaq_top_100_preprocessed_mi_res_validate, ['AAL', 'AAPL'])
I obtain the following figure:
Current plots
Ideally, both sets of lines would be on the same graph with the same axes!
Update 2 [Concatenation Solution]
I've solved the issues of plotting from multiple frames using concatenation, however the legend does not match the line colors on the graph.
There are no specific calls to legend, and the label parameter in plot() has not been used.
Code:
df_ante = data_nasdaq_top_100_preprocessed_mi_res
df_post = data_nasdaq_top_100_preprocessed_mi_res_validate
symbols = ['AAL', 'AAPL']
df_ante = df_ante.loc[symbols]
df_post = df_post.loc[symbols]
df_ante.index.set_levels([[str(x)+'_ex-ante' for x in df_ante.index.levels[0]],df_ante.index.levels[1]], inplace=True)
df_post.index.set_levels([[str(x)+'_ex-post' for x in df_post.index.levels[0]],df_post.index.levels[1]], inplace=True)
df_merge = pd.concat([df_ante, df_post])
df_merge['SHIFT'] = abs(df_merge['SHIFT'])
df_merge.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION')
Image:
MultiIndex Plot Image
I think, with
ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
you put the output of plot() in ax, including the lines, which then gets overwritten by the second function call. Am I right that the lines plotted first are missing?
The usual procedure would be something like
fig = plt.figure(figsize=(5, 5)) # size in inch
ax = fig.add_subplot(111) # if you want only one axes
Now you have an axes object in ax, and you can pass it as input to the next plots (for example through the ax= argument of the pandas plot() method).
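A rough sketch of that idea with pandas, under loud assumptions: the two frames below are small made-up wide tables (one column per symbol), not the MultiIndex frames from the question, and the numbers are invented. The point is only that both plot() calls receive the same ax, and that a column suffix makes the legend entries distinguishable.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# hypothetical stand-ins for the two result frames (values are made up)
shift = np.arange(0, 5)
df_ante = pd.DataFrame({'SHIFT': shift, 'AAL': shift * 0.10, 'AAPL': shift * 0.20})
df_post = pd.DataFrame({'SHIFT': shift, 'AAL': shift * 0.15, 'AAPL': shift * 0.25})

fig, ax = plt.subplots(figsize=(5, 5))
# the suffix on the column names ends up in the legend labels
df_ante.set_index('SHIFT').add_suffix('_ex-ante').plot(ax=ax)
df_post.set_index('SHIFT').add_suffix('_ex-post').plot(ax=ax)
ax.set_xlabel('Time-shift of sentiment data (days) with financial data')
ax.set_ylabel('Mutual Information')
ax.legend()  # rebuild the legend from all four labelled lines
```

Because every line is drawn on one axes with its own label, the legend colors and the line colors come from the same artists and cannot drift apart.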
I have a csv file which has two sets of data. Basically:
for row in reader:
    ###I have some other code but here's the stuff that applies to the question###
    disorder_long = sequence_analysis(looped_region.upper(), mode = 'long')
    disorder_short = sequence_analysis(looped_region.upper(), mode = 'short')
    length = len(list(disorder_short))
    #print length
    xmin = 1
    xmax_long = length
    ymin = 0
    ymax_long = max(disorder_long)
    ymax_short = max(disorder_short)
    y_limit = max([ymax_long, ymax_short])
    #print y_limit
    while True:
        try:
            newfig = str(raw_input('Name the graph to be created: '))
            break #break out of loop
        except ValueError:
            print("error")
            continue #return to start of loop
    plt.figure
    #data
    x_series = np.array(range(1,length+1))
    # print "x series: "
    # print x_series
    # print len(x_series)
    y_series1 = np.array(disorder_long)
    y_series2 = np.array(disorder_short)
    # print y_series1, y_series2
    #plot data
    plt.plot(x_series, y_series1, label=uniprot_id+' long')
    plt.plot(x_series, y_series2, label=uniprot_id+' short')
    #add limits to the x and y axis
    plt.xlim(xmin, xmax_long)
    plt.ylim(ymin, 1)
    #create legend
    plt.legend(loc="upper left")
    #save figure to png
    plt.savefig(newfig)
This gives me two graphs. One of them is perfectly fine (it's the first set of data), but the other has two extra lines plotted and I have no idea where they came from. In the second graph, the top two lines are the extraneous ones.
Just at a cursory glance, the second graph, top two lines, appear to be the same as the lines in the first. I'm not the best with matplotlib but I do see that the line
plt.figure
doesn't actually create a new figure: without parentheses it only references the function and never calls it. You should use plt.figure(1) and plt.figure(2) to specify that they are different figures.
plt.clf()
works to create new plots without appending to the existing one.
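To make both suggestions concrete, here is a small sketch that creates a fresh figure for every row and closes it afterwards. It is only an illustration under assumptions: the rows dictionary and its random values are made up and stand in for the CSV rows and the sequence_analysis output, and saving is left commented out.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# hypothetical data: one (long, short) pair of series per row of the CSV file
rng = np.random.default_rng(0)
rows = {
    'P1': (rng.random(20), rng.random(20)),
    'P2': (rng.random(30), rng.random(30)),
}

line_counts = {}
for uniprot_id, (disorder_long, disorder_short) in rows.items():
    fig, ax = plt.subplots()  # a fresh figure and axes for every row
    x_series = np.arange(1, len(disorder_long) + 1)
    ax.plot(x_series, disorder_long, label=uniprot_id + ' long')
    ax.plot(x_series, disorder_short, label=uniprot_id + ' short')
    ax.set_xlim(1, len(x_series))
    ax.set_ylim(0, 1)
    ax.legend(loc="upper left")
    line_counts[uniprot_id] = len(ax.get_lines())
    # fig.savefig(uniprot_id + '.png')  # save here if a file is wanted
    plt.close(fig)  # release the figure so nothing carries over to the next row
```

Each figure ends up with exactly two lines, because nothing is drawn on an axes left over from the previous iteration.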
This is from Chapter 2 in the book Machine Learning In Action and I am trying to make the plot pictured here:
The author has posted the plot's code here, which I believe may be a bit hacky (he also mentions this code is sloppy since it is out of the book's scope).
Here is my attempt to re-create the plot:
First, the .txt file holding the data is as follows (source: "datingTestSet2.txt" in Ch.2 here):
40920 8.326976 0.953952 largeDoses
14488 7.153469 1.673904 smallDoses
26052 1.441871 0.805124 didntLike
75136 13.147394 0.428964 didntLike
38344 1.669788 0.134296 didntLike
...
Assume datingDataMat is a numpy.ndarray of shape (1000L, 3L) where column 0 is "Frequent Flier Miles Per Year", column 1 is "% Time Playing Video Games", and column 2 is "liters of ice cream consumed per week", as shown in the sample above.
Assume datingLabels is a list of ints 1, 2, or 3 meaning "Did Not Like", "Liked in Small Doses", and "Liked in Large Doses" respectively - associated with column 3 above.
Here is the code I have to create the plot (full details for file2matrix are at the end):
datingDataMat,datingLabels = file2matrix("datingTestSet2.txt")
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot (111)
plt.xlabel("Freq flier miles")
plt.ylabel("% time video games")
# Not sure how to finish this: plt.legend([1, 2, 3], ["did not like", "small doses", "large doses"])
plt.scatter(datingDataMat[:,0], datingDataMat[:,1], 15.0*np.array(datingLabels), 15.0*np.array(datingLabels)) # Change marker color and size
plt.show()
The output is here:
My main concern is how to create this legend. Is there a way to do this without needing a direct handle to the points?
Next, I am curious whether I can find a way to switch the colors to match those of the plot. Is there a way to do this without having some kind of "handle" on the individual points?
Also, if interested, here is the file2matrix implementation:
def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())
    returnMat = np.zeros((numberOfLines,3)) #numpy.zeros(shape, dtype=float, order='C')
    classLabelVector = []
    fr = open(filename)
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3] # FFmiles/yr, % time gaming, L ice cream/wk
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector
Here's an example that mimics the code you already have that shows the approach described in Saullo Castro's example.
It also shows how to set the colors in the example.
If you want more information on the colors available, see the documentation at http://matplotlib.org/api/colors_api.html
It would also be worth looking at the scatter plot documentation at http://matplotlib.org/1.3.1/api/pyplot_api.html#matplotlib.pyplot.scatter
from numpy.random import rand, randint
from matplotlib import pyplot as plt
n = 1000
# Generate random data
data = rand(n, 2)
# Make a random array to mimic datingLabels
labels = randint(1, 4, n)
# Separate the data according to the labels
data_1 = data[labels==1]
data_2 = data[labels==2]
data_3 = data[labels==3]
# Plot each set of points separately
# 's' is the size parameter.
# 'c' is the color parameter.
# I have chosen the colors so that they match the plot shown.
# With each set of points, input the desired label for the legend.
plt.scatter(data_1[:,0], data_1[:,1], s=15, c='r', label="label 1")
plt.scatter(data_2[:,0], data_2[:,1], s=30, c='g', label="label 2")
plt.scatter(data_3[:,0], data_3[:,1], s=45, c='b', label="label 3")
# Put labels on the axes
plt.ylabel("ylabel")
plt.xlabel("xlabel")
# Place the Legend in the plot.
plt.gca().legend(loc="upper left")
# Display it.
plt.show()
The gray borders should become white if you use plt.savefig to save the figure to file instead of displaying it.
Remember to run plt.clf() or plt.cla() after saving to file to clear the axes so you don't end up replotting the same data on top of itself over and over again.
To create the legend you have to:
give labels to each curve
call the legend() method from the current AxesSubplot object, which can be obtained using plt.gca(), for example.
See the example below:
plt.scatter(datingDataMat[:,0], datingDataMat[:,1],
            15.0*np.array(datingLabels), 15.0*np.array(datingLabels),
            label='Label for this data')
plt.gca().legend(loc='upper left')
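A single label gives a single legend entry for the whole scatter. If one entry per class is wanted while keeping one scatter call, newer Matplotlib versions (3.1 and later, as far as I know) offer legend_elements() on the object returned by scatter. The sketch below is an illustration under assumptions: the data are random, the labels array mimics datingLabels, and the names list is my own wording for the three classes.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# hypothetical data mimicking datingDataMat[:, :2] and datingLabels
rng = np.random.default_rng(0)
data = rng.random((300, 2))
labels = np.tile([1, 2, 3], 100)  # every class is guaranteed to be present
names = ["did not like", "small doses", "large doses"]  # order follows labels 1, 2, 3

fig, ax = plt.subplots()
# size and color both follow the class label, as in the question
sc = ax.scatter(data[:, 0], data[:, 1], s=15.0 * labels, c=labels)
# one proxy handle per distinct value passed as `c`
handles, _ = sc.legend_elements()
ax.legend(handles, names, loc="upper left")
ax.set_xlabel("Freq flier miles")
ax.set_ylabel("% time video games")
```

The handles are returned in ascending order of the values in c, so the names list has to follow the same order.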