Matplotlib pyplot: plotting array mixes up y axis labels - python

I'm making a simple program in Python to plot two lists of integers: one holds the data, the other the time axis.
The time list goes from 0 to 3 in increments of 1, while the data list consists of: 5,10, 3,12. I used print statements to verify that the lists do have the values mentioned above.
plt.plot(time_axis,data_array, 'ro')
plt.axis([0, 20, 0, 20])
plt.show()
However, as shown in the image, the y axis of the plot is labeled in the order in which my data list is processed, not in ascending order: 5,10,3,12
Is there a way to make the y axis go up to 20 in equal increasing increments?
EDIT: I noticed that this mixup only happens when I pass the list variables as parameters, e.g.
plt.plot([0,1,2,3],[5,10,3,12],'bo') #gives the correct graph while
plt.plot(time_axis,data_array,'bo') #gives the incorrect graph,
Even though the two lists time_axis and data_array contain the same values.
Tracing back my error, I was importing my data values from a text file, and the parsing was done incorrectly, so the data values were not ints. The values were strings in the format '5\n', etc., so pyplot was getting confused. Fixing that solved the issue!
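For anyone landing here with the same symptom, the fix described above could look roughly like this. It is only a sketch: raw_lines is a made-up stand-in for whatever readlines() returned from the text file, which is not shown in the question.

```python
# Hypothetical raw lines, as readlines() would return them from a text file.
raw_lines = ["5\n", "10\n", "3\n", "12\n"]

# strip() drops the trailing newline and int() turns the text into a number;
# blank lines are skipped so int() never sees an empty string.
data_array = [int(line.strip()) for line in raw_lines if line.strip()]
time_axis = list(range(len(data_array)))

print(data_array)
```

With real integers in data_array, the plt.plot call from the question gets a numeric y axis instead of one label per string.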

I feel like you're omitting the code which is making this screwy, but here's what I did:
import matplotlib.pyplot as plt
time_axis = range(0,4)
data_array=[5,10,3,12]
plt.plot(time_axis,data_array, 'ro')
plt.axis([0, 20, 0, 20])
plt.show()
This produces the image:
which seems to be what you were aiming for.

Related

How to Order Coordinates in Matplotlib (xticks and yticks)

Alright, so I was working on a simple program to pull coordinates out of a text file and then plot them on a graph. I thought it would be pretty simple, but I am VERY new to matplotlib, so I still don't fully understand it. I got most of the code working, but when I put the values in the graph, they come out all out of order. I want to order the xticks and yticks so that it looks like a real line graph you'd see in math, where lower coordinates sit below higher coordinates. Here is my code:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
def split(word):
    return list(word)
fileIWant = open('C:/Users/JustA/Desktop/Python Shenanigans/Converting Coordinates in a .txt to a Graph/Coordinates.txt', 'r');
stopwords = ['\n']
array = fileIWant.readlines()
array = [array.replace('\n', '') for array in array if array not in stopwords]
fileIWant.close()
editFile = open('C:/Users/JustA/Desktop/Python Shenanigans/Converting Coordinates in a .txt to a Graph/Coordinates.txt', 'w')
array_length = len(array)
x = []
y = []
for i in range(array_length):
    dataSplit = array[i].split()
    getCoordinateX = dataSplit[1]
    getCoordinateY = dataSplit[3]
    x.append(getCoordinateX)
    y.append(getCoordinateY)
    plt.scatter(x, y)
    plt.plot(x, y) #Add this line in if you want to show lines.
plt.title('Your Coordinate Graph')
plt.xlabel('X Coordinates')
plt.ylabel('Y Coordinates')
#plt.xticks([-100,-80,-60,-40,-20,0,20,40,60,80,100])
#plt.yticks([-100,-80,-60,-40,-20,0,20,40,60,80,100])
plt.show()
editFile.close()
I commented out what I put for the ticks, because it was not working at all. With those commented out, it looks okay, but it is very confusing. I think it just puts them in the order they are at in the .txt, when I want them to order themselves in the code. Here is what it is outputting right now:
Sorry if this is so simple that it has never been asked before, like I said, very new to matplotlib, and numpy if I have to use that at all. I imported it because I thought I may have to, but I don't think I really used it as of yet. Also, I am going to rewrite the coordinates into the graph in order, but I think I can do that myself later.
The problem is that your coordinates are strings, which means matplotlib is just plotting strings against strings ("categorical" axis labels). To fix, you simply have to convert your strings to numbers, e.g. x.append(int(getCoordinateX)).
Note that you also don't have to put plt.scatter/plt.plot in the loop - you only have to call one of those once on the full array. That'll probably make things a little faster too.
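Putting both suggestions together might look like the sketch below. The lines list is invented sample data in the "label x label y" layout implied by dataSplit[1] and dataSplit[3]; the real file contents are not shown in the question. float is used instead of int so decimal coordinates would also parse.

```python
import matplotlib.pyplot as plt

# Hypothetical file contents; the real Coordinates.txt is not shown in the question.
lines = ["X 40 Y -20", "X -80 Y 60", "X 10 Y 5"]

x, y = [], []
for line in lines:
    parts = line.split()
    x.append(float(parts[1]))  # convert the text to numbers before plotting
    y.append(float(parts[3]))

# Plot once, after the loop, on the complete lists.
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.plot(x, y)
ax.set_xlabel('X Coordinates')
ax.set_ylabel('Y Coordinates')
# Call plt.show() here to display the figure.
```

Because x and y now hold numbers, the axes are numeric and the ticks come out in ascending order on their own.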

How to chunk up a step function array based on when x-values flatten out

I am running into a problem that I am having trouble figuring out in Python (which I will currently blame on severe jet lag).
I have an array, let's call it x. The plot of x where y-axis is generic value, x-axis is index of array, looks like:
What I want to do is isolate the flat sections after the initial bump (see next picture that I am interested in):
I want to ignore the leading flat line and bump, and make an array of the five red boxes in the second image such that I have something like
x_chunk = [[box 0], [box 1], [box 2], [box 3], [box 4]]
I want to ignore all of the sloped transition line between the red chunks. I am having trouble figuring out the proper iterating procedure and setting the condition such that I get what I need.
So, this is probably not the cleanest solution, however it works:
import numpy as np
import matplotlib.pyplot as plt
# Create data
r=np.random.random(50)
y1 = np.array([50,40,30,20,10])
y=np.repeat(y1,10)
y[9]=y[9]+10
y=y+r
# Plot data
x=np.arange(len(y))
plt.plot(x,y)
plt.show()
Will give you something like this:
# Find maximum and start from there
idxStart=np.argmax(y)
y2=y[idxStart:]
# Grab jump indices
idxs=np.where(np.diff(y2)<-1)[0]+1
# Put into boxes
boxs=[]
for i in range(len(idxs)-1):
    boxs.append(y2[idxs[i]:idxs[i+1]])
print(boxs)
Of course you will need to find the right threshold to distinguish the "jumps/drops" in the data, in my case -1 was good enough since random returns values between 0 and 1. Hope your jetlag gets better soon.
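One thing to watch: the loop above only collects the sections between two detected drops, so the final flat section (after the last drop) is left out. A possible variant, sketched here on the same kind of synthetic data, uses np.split so that the tail is kept. The seeded generator and the name boxes are my own choices, not part of the answer.

```python
import numpy as np

# Same synthetic staircase as above, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
y = np.repeat(np.array([50, 40, 30, 20, 10]), 10).astype(float)
y[9] += 10                    # the initial bump
y += rng.random(y.size)       # noise in [0, 1)

# Start at the bump, then find where the signal drops by more than 1.
y2 = y[np.argmax(y):]
idxs = np.where(np.diff(y2) < -1)[0] + 1

# np.split cuts at every drop; the first piece is the bump itself, so skip it.
boxes = np.split(y2, idxs)[1:]
print([len(b) for b in boxes])
```

The threshold of -1 carries the same caveat as before: it only works here because the noise is smaller than the step height.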
Not tested as I have no data, but something like this should work
def findSteps(arr, thr=.02, window=10, disc=np.std):
    d = disc(np.lib.stride_tricks.as_strided(arr, strides = arr.strides*2, shape = (arr.size-window+1, window)), axis = 1)
    m = np.minimum(np.abs(d[:-window]), np.abs(d[window:])) < thr
    i = np.nonzero(np.diff(m))[0]  # np.nonzero returns a tuple; np.split needs the index array
    return np.split(arr[window:-window], i)[::2]
May have to play around with the window and threshold value, and you may want to write a slope function for disc if np.std doesn't work, but the basic idea is looking forward and backward by window steps and seeing if the standard deviation (or slope) of the stride is close to 0.
You'll end up with blocks of True values, which you find the start and end of by np.nonzero(np.diff())
You then np.split the array into a list of arrays by the blocks and only take every other member of the list (since the other sub-arrays will be the transitions).
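Since the function above is untested, here is a small self-contained sketch of the same rolling-standard-deviation idea that can be run as is. It swaps as_strided for numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later) and only looks at one window per position rather than forward and backward. The function name flat_segments, the default window and thr values, and the demo array are all assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def flat_segments(arr, window=5, thr=0.5):
    """Return the runs of `arr` that are covered by low-variance windows."""
    arr = np.asarray(arr, dtype=float)
    # Standard deviation of every length-`window` slice of the array.
    sd = sliding_window_view(arr, window).std(axis=1)
    # Mark every sample that belongs to at least one flat window.
    mask = np.zeros(arr.size, dtype=bool)
    for start in np.flatnonzero(sd < thr):
        mask[start:start + window] = True
    # Cut where the mask flips and keep only the flat pieces.
    edges = np.flatnonzero(np.diff(mask)) + 1
    pieces = np.split(arr, edges)
    flags = np.split(mask, edges)
    return [p for p, f in zip(pieces, flags) if f[0]]

# Made-up staircase: two plateaus joined by a four-sample ramp.
demo = np.concatenate([np.full(10, 50.0), [48.0, 46.0, 44.0, 42.0], np.full(10, 40.0)])
segments = flat_segments(demo)
print([len(s) for s in segments])
```

On noisy data the window and threshold would still need tuning, exactly as described above.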

Plotting trajectories in python using matplotlib

I'm having some trouble using matplotlib to plot the path of something.
Here's a basic version of the type of thing I'm doing.
Essentially, I'm seeing if the value breaks a certain threshold (6 in this case) at any point during the path and then doing something with it later on.
Now, I have 3 lists set up. end_vect will be based on the other two lists. If the value breaks past 2 at any time during a single simulation, I add the last position of the object to end_vect.
trajectories_vect keeps track of the trajectories for all 5 simulations as a list of lists; I'll clarify this below. timestep_vect stores the path for a single simulation.
from random import gauss
from matplotlib import pyplot as plt
import numpy as np
starting_val = 5
T = 1 #1 year
delta_t = .1 #time-step
N = int(T/delta_t) #how many points on the path looked at
trials = 5 #number of simulations
#main iterative loop
end_vect = []
trajectories_vect = []
for k in range(trials):
    s_j = starting_val
    timestep_vect = []
    for j in range(N-1):
        xi = gauss(0,1.0)
        s_j *= xi
        timestep_vect.append(s_j)
    trajectories_vect.append(timestep_vect)
    if max(timestep_vect) > 5:
        end_vect.append(timestep_vect[-1])
    else:
        end_vect.append(0)
Okay, at this part if I print my trajectories, I get something like this (I only posted two simulations, instead of the full 5):
[[ -3.61689976e+00 2.85839230e+00 -1.59673115e+00 6.22743522e-01
1.95127718e-02 -1.72827152e-02 1.79295788e-02 4.26807446e-02
-4.06175288e-02] [ 4.29119818e-01 4.50321728e-01 -7.62901016e-01
-8.31124346e-02 -6.40330554e-03 1.28172906e-02 -1.91664737e-02
-8.29173982e-03 4.03917926e-03]]
This is good and what I want to happen.
Now, my problem is that I don't know how to plot my path (y-axis) against my time (x-axis) properly.
First, I want to put my data into numpy arrays because I'll need to use them later on to compute some statistics and other things which from experience numpy makes very easy.
#creating numpy arrays from list
#might need to use this with matplotlib somehow
np_trajectories = np.array(trajectories_vect)
time_array = np.arange(1,10)
Here's the crux of the issue though. When I put my trajectories (y-axis) into matplotlib, it doesn't treat each "list" (row in numpy) as one path. Instead of getting 5 paths for 5 simulations, I am getting 9 paths for 5 simulations. I believe I am passing the data in wrong, so it is using the 9 time intervals in the wrong way.
#matplotlib stuff
plt.plot(np_trajectories)
plt.xlabel('timestep')
plt.ylabel('trajectories')
plt.show()
Here's the image produced:
Obviously, this is wrong for the aforementioned reason. Instead, I want to have 5 paths based on the 5 lists (rows) in my trajectories. I seem to understand what the problem is but don't know how to go about fixing it.
Thanks in advance for the help.
When you call np_trajectories = np.array(trajectories_vect), your list of trajectories is transformed into a 2d numpy array. The information about its dimensions is stored in np_trajectories.shape, and, in your case, is (5, 9). Therefore, when you pass np_trajectories to plt.plot(), the plotting library assumes that the y-values are stored in the first dimension, while the second dimension describes individual lines to plot.
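The column-wise behaviour is easy to check by counting the line objects that plot returns. This is a small sketch with random stand-in data of the same (5, 9) shape; the variable names are mine.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data: 5 simulations (rows) with 9 time steps each (columns).
rng = np.random.default_rng(0)
trajectories = rng.normal(size=(5, 9))

fig, (ax_raw, ax_t) = plt.subplots(1, 2)
lines_raw = ax_raw.plot(trajectories)    # one line per column
lines_t = ax_t.plot(trajectories.T)      # one line per row of the original
print(len(lines_raw), len(lines_t))
```

This prints 9 and 5: the untransposed array gives one line per time step, and the transposed one gives one line per simulation.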
In your case, all you need to do is to transpose your np_trajectories array. In numpy, it is as simple as
plt.plot(np_trajectories.T)
plt.xlabel('timestep')
plt.ylabel('trajectories')
plt.show()
If you want to plot the x-axis as time, instead of steps of one, you have to define your time progression as a list or an array. In numpy, you can do something like
times = np.linspace(0, T, N-1)
plt.plot(times, np_trajectories.T)
plt.xlabel('timestep')
plt.ylabel('trajectories')
plt.show()
which produces the following figure:

Python Matplotlib: How to scale the color of scatterplot/errorbar markers with a 3rd variable

This question seems to have been asked a few times already, but for some reason it doesn't work for me.
I am making a plt.errorbar plot from the arrays of points results['logF'], results['R'] which are in a pandas DataFrame. I want to scale the colour of the points with a third variable results['M']. I've tried various things but I always get some kind of error, I'm clearly doing something wrong but I can't find any place that explains exactly what is required.
So firstly, results['M'] are a bunch of floats in the range 0 - 13. So as I understand it, I need to normalise them, which I did with matplotlib.colors.Normalize(vmin=0.0, vmax=13.0).
When I try plotting with the following code:
results = get_param_results(totP)
colormap = mlb.colors.Normalize(vmin=0.0, vmax=13.0)
mass_color = np.array(colormap(results['M']))
#import pdb; pdb.set_trace()
plt.errorbar(results['logF'], results['R'], marker='x',
             mew=1.2, ms=3.5, capsize=2, c=mass_color,
             yerr=[results['logF_l'], results['logF_u']],
             xerr=[results['R_l'], results['R_u']],
             elinewidth=1.2)
I get an error ValueError: Color array must be two-dimensional. Not sure why it should be two dimensional. In other stackoverflow threads, they pass one dimensional arrays and it's fine.
Using a different form (basically just copying the style from another stackoverflow thread), I write:
results = get_param_results(totP)
colormap = mlb.colors.Normalize(vmin=0.0, vmax=13.0)
#import pdb; pdb.set_trace()
plt.errorbar(results['logF'], results['R'], marker='x',
             mew=1.2, ms=3.5, capsize=2, c=results['M'], cmap=mlb.cm.jet, norm=colormap,
             yerr=[results['logF_l'], results['logF_u']],
             xerr=[results['R_l'], results['R_u']],
             elinewidth=1.2)
I get a different error, TypeError: There is no Line2D property "cmap"
I don't understand this either (it also doesn't recognise norm), scatter should definitely have the norm and cmap arguments.
Basically, I can't find any great explanations or tutorials on how to get the color scale with an errorbar plot. Can someone help?
Thanks.
EDIT:
Been asked to post the data I'm using. This is the .head() table of the results DataFrame (the full one has 257 rows).
R R_l R_u F F_l F_u \
0 1.486045 0.068775 0.068508 2.999561e+06 488301.994185 496244.025108
1 0.992957 0.062303 0.062664 4.583829e+04 6652.971755 6636.980813
2 1.422328 0.029163 0.029323 2.068257e+06 186692.732530 187685.738474
3 1.326820 0.094840 0.093995 1.049490e+06 185012.117516 184290.913875
4 0.887831 0.013825 0.013939 5.883107e+05 52537.237452 52492.326206
M M_l M_u logF logF_l logF_u
0 1.030471 0.122698 0.123368 6.471041 0.071150 0.072506
1 2.753916 0.157837 0.160584 4.656550 0.063427 0.063404
2 2.344767 0.340987 0.345171 6.313780 0.039261 0.039548
3 0.918979 0.069931 0.069984 6.014049 0.077296 0.077189
4 1.310289 0.076565 0.076805 5.767830 0.038848 0.038895
So basically:
results['M'] = array([ 1.03047146, 2.75391626, 2.34476658, 0.91897949, 1.31028926])
results['logF'] = array([ 6.47104102, 4.65655021, 6.31377955, 6.01404944, 5.76782953])
results['R'] = array([ 1.48604489, 0.99295713, 1.42232837, 1.3268205 , 0.88783067])
and etc... (for the error bars, just use an array([1,1,1,1,1]) to save time or something).
I reran the code, by replacing results with the above, and it still gives me ValueError: Color array must be two-dimensional
I'm not sure what the second dimension should be. Is there something obvious that I'm doing wrong when I'm calling the errorbar plot function?
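No answer is recorded here, so the following is only one possible workaround, not something from the thread. errorbar draws its markers as a single Line2D, which takes one colour, while scatter does accept c, cmap and norm. A sketch under that assumption: draw the bars alone with fmt='none', then overlay a scatter coloured by M. The err array is a placeholder (as suggested in the edit above), and the bar colour is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# The five rows posted in the question.
M = np.array([1.03047146, 2.75391626, 2.34476658, 0.91897949, 1.31028926])
logF = np.array([6.47104102, 4.65655021, 6.31377955, 6.01404944, 5.76782953])
R = np.array([1.48604489, 0.99295713, 1.42232837, 1.3268205, 0.88783067])
err = np.full(5, 0.05)  # placeholder error bars, not real data

fig, ax = plt.subplots()
# Error bars only: fmt='none' suppresses the markers and connecting line.
ax.errorbar(logF, R, xerr=err, yerr=err, fmt='none',
            ecolor='gray', elinewidth=1.2, capsize=2, zorder=1)
# Markers on top, coloured by the third variable.
sc = ax.scatter(logF, R, c=M, cmap='jet', norm=Normalize(vmin=0.0, vmax=13.0),
                marker='x', zorder=2)
fig.colorbar(sc, ax=ax, label='M')
# Call plt.show() here to display the figure.
```

The bars stay one colour in this version; colouring each bar individually would need a loop over the points.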

"Clean" way to use words as markers in matplotlib? And make font size and color differ?

Suppose I have the 3x3 matrix below:
[apples 19 3.5]
[oranges 07 2.2]
[grapes 23 7.8]
Only in real life the matrix has dozens of rows, not just three.
I want to create an XY plot where the second column is the X coordinate, the third column is the Y coordinate, and the words themselves (i.e., the first column) are the markers (so no dots, lines, or any other symbols).
I also want the font size of each word to be determined by the second column (in the example above, that means making "grapes" have about three times the size of "oranges", for instance).
Finally, I want to color the words on a red-to-blue scale corresponding to the third column, with 0 = darkest red and 10 = darkest blue.
What's the best way to go about it in Python 2.x? I know I can use matplotlib's "annotate" and "text" to do many (if not all) of those things, but somehow that feels like a workaround. Surely there must be a way of declaring the words to be markers (so I don't have to treat them as "annotations")? Perhaps something outside matplotlib? Has anyone out there ever done something similar?
As you did not want to use annotate or text, the next best thing is py.scatter, which accepts a marker string. From the documentation: a marker of the form '$...$' renders the string using mathtext.
For example
import pylab as py
data = [["peach", 1.0, 1.0],
["apples", 19, 3.5],
["oranges", 7, 2.2],
["grapes", 23, 7.8]]
for item in data:
    py.scatter(item[1], item[2], s=700*item[1],
               c=(item[2]/10.0, 0, 1 - item[2]/10.0),
               marker=r"$ {} $".format(item[0]), edgecolors='none' )
py.show()
This method has several issues
Using \textrm{} in the math text so that it is not italic appears to break matplotlib
The letter sizes need to be adjusted by hand (hence the factor of 700)
It would probably be better to use a colormap rather than simply defining the RGB color value.
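On that last point, a colormap lookup could be sketched as follows. The choice of 'RdBu' and the helper name value_to_rgba are assumptions; the 0 to 10 range comes from the question.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

data = [["peach", 1.0, 1.0],
        ["apples", 19, 3.5],
        ["oranges", 7, 2.2],
        ["grapes", 23, 7.8]]

# Map the third column from the 0..10 range onto a red-to-blue colormap.
norm = Normalize(vmin=0.0, vmax=10.0)
cmap = plt.get_cmap('RdBu')

def value_to_rgba(value):
    """Return the RGBA colour for a value between 0 (red end) and 10 (blue end)."""
    return cmap(float(norm(value)))

colors = {name: value_to_rgba(v) for name, _, v in data}
print(colors["grapes"])
```

Each entry of colors can then be passed as c= in the scatter call in place of the hand-built RGB tuple.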
While looking around for a solution to the same problem, I've found one that seems a bit cleaner (or at least more in spirit to what the original question asked), namely to use TextPath:
from matplotlib import pyplot as plt
from matplotlib.text import TextPath
data = [["peach", 1.0, 1.0],
["apples", 19, 3.5],
["oranges", 7, 2.2],
["grapes", 23, 7.8]]
max_d2 = max([d[2] for d in data]) + 1e-3
max_d1 = max([d[1] for d in data]) + 1e-3
cmap = plt.get_cmap('RdBu')
for d in data:
    path = TextPath((0,0), d[0])
    # These dots are to display the weakness below, remove for the actual question
    plt.plot(d[1],d[2],'.',color='k')
    plt.plot(d[1],d[2],marker=path,markersize=100, color=cmap(d[2]/max_d2))
plt.xlim([0,max_d1+5])
plt.ylim([0,max_d2+0.5])
plt.show()
This solution has some advantages and disadvantages of its own:
Main disadvantage: as the dots show, I wasn't able to center the text on the data point as I wanted. Instead, the data point ends up at the bottom left of the text.
Main advantage: this has no latex issue and uses a "real" marker path, which means that it can easily be used to e.g. mark line plots (not the original question, though)
Code:
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.text import TextPath
x = np.cumsum(np.random.randn(100,5), axis=0)
plt.figure(figsize=(15,5))
for i in range(5):
    label = TextPath((0,0), str(i))
    plt.plot(x[:,i], color='k')
    plt.plot(np.arange(0,len(x),5),x[::5,i], color='k', marker=label, markersize=15, linewidth=0)
plt.show()
Doing the above via a naive loop over "text" or "annotate" would be very slow if you had many lines / markers, while this scales better.
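On the centering problem mentioned above, one thing that may help is shifting the path so that the middle of its bounding box sits at the origin before using it as a marker. This is a sketch rather than a tested fix for every font; the helper name centered_text_marker is made up.

```python
import numpy as np
from matplotlib.text import TextPath
from matplotlib.transforms import Affine2D

def centered_text_marker(text):
    """Build a text marker whose bounding-box centre is at (0, 0)."""
    path = TextPath((0, 0), text)
    verts = path.vertices
    # Midpoint of the bounding box of the path's vertices.
    center = (verts.min(axis=0) + verts.max(axis=0)) / 2.0
    return path.transformed(Affine2D().translate(-center[0], -center[1]))

marker = centered_text_marker("grapes")
```

It would be used like the markers above, e.g. plt.plot(23, 7.8, marker=marker, markersize=100, linestyle='none').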
