Graph Reset Query - python

To whomever,
I am having a graphing problem where it seems that previous data is being stacked on top of new data. I wanted to find a way to separate these so that I can get individual graphs per data set.
Briefly, before we get into the script, let me tell you what you're looking at. I have 8 data sets, each one named somethingsomethingsomething...n with n = 0, 1, ..., 7. So there are 8 different files with different sets of values for the wavelength (here I named it WL) and Stokes parameters (here I named them SI, SQ, SU, SV). I was told to make some graphs of them, so here we are.
The following is what I have:
the base
import matplotlib.pyplot as plt
import numpy as np
import scipy.constants as c
import re
import sys  # needed for sys.stderr below
something to tell the program to not worry about random spaces in data set files
split_on_spaces = re.compile(" +").split
defining the arrays
WL = np.array([])
SI = np.array([])
SQ = np.array([])
SU = np.array([])
SV = np.array([])
code for data interpretation
with open('C:\\Users\\Schmidt\\Desktop\\Python\\Homework_4\\CoolStuffLivesHere\\stokes_profiles_1.txt') as f:
    for line in f:
        data = split_on_spaces(line.strip())
        if data == ['']:  # blank line: the regex split returns [''], not []
            continue
        if len(data) != 5:
            sys.stderr.write("BAD LINE: {}\n".format(repr(line)))
            continue
        # convert every column to float so the arrays hold numbers, not strings
        WL = np.append(WL, float(data[0]))
        SI = np.append(SI, float(data[1]))
        SQ = np.append(SQ, float(data[2]))
        SU = np.append(SU, float(data[3]))
        SV = np.append(SV, float(data[4]))
plotting sequence
plt.plot(WL,SI)
plt.show()
Then rinse and repeat for the other 3 parameters and then rinse and repeat for the other data sets as well. It works real fine for the first rendering. However for subsequent graphs it looks more like these: first example, second example.
So, in a nutshell, what line of code should I be typing in, and where, to resolve my graph stacking issue?

Without getting into subplots, you're just adding to the original plot. You need to close it if you want to re-use it.
i.e.
plt.plot(WL,SI)
plt.show()
plt.close()
plt.plot(WL,SQ)
Unless you want them on the same plot.
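One way to keep each data set on its own graph is to create a fresh figure for every plot and close it when done. The sketch below is only an illustration: the synthetic profiles stand in for the contents of the data files, and the names make_profile and n_datasets are made up for the example. If the same WL and SI arrays are reused from one file to the next, they would presumably also need to be reset to empty arrays before each file is read, since np.append keeps growing them.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt


def make_profile(n):
    """Hypothetical stand-in for reading one data file."""
    wl = np.linspace(-1.0, 1.0, 50)
    si = 1.0 - 0.5 * np.exp(-((wl - 0.1 * n) / 0.3) ** 2)
    return wl, si


n_datasets = 3  # assumption: 3 instead of 8, to keep the example short
lines_per_figure = []
for n in range(n_datasets):
    WL, SI = make_profile(n)      # fresh arrays for every data set
    fig, ax = plt.subplots()      # fresh figure for every data set
    ax.plot(WL, SI)
    ax.set_xlabel("WL")
    ax.set_ylabel("SI")
    ax.set_title("data set %d" % n)
    lines_per_figure.append(len(ax.lines))
    # fig.savefig(...) or plt.show() would go here
    plt.close(fig)                # release the figure before the next one

print(lines_per_figure)  # [1, 1, 1]
```

Each figure ends up with exactly one curve, so nothing from an earlier data set is carried over.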

Related

lmfit matplot - fitting many curves from many different files at the same time/graph

I have the following code, with which I intend to read and plot many curves from many different files. The "reading and plotting" is already working pretty good.
The problem is that now I want to make a fitting for all those curves in the same plot. This code already manages to fit the curves, but the output is all in one array and I can not plot it, since I could not separate it.
#!/usr/bin/python
import matplotlib.pyplot as plt
from numpy import exp
from lmfit import Model
def read_files(arquivo):
    x = []
    y = []
    abscurrent = []
    time = []
    data = open(arquivo, 'r')
    headers = data.readlines()[60:]
    for line in headers:
        line = line.strip()
        X, Y, ABS, T = line.split('\t')
        x.append(X)
        y.append(Y)
        abscurrent.append(ABS)
        time.append(T)
    data.close()
    def J(x, j, n):
        return j*((exp((1.6e-19*x)/(n*1.38e-23*333)))-1)
    gmod = Model(J)
    result = gmod.fit(abscurrent, x=x, j=10e-10, n=1)
    return x, y, abscurrent, time
    print(result.fit_report())
When I ask to print the "file" result.best_fit, which in the lmfit would give the best fit for that curve, I get 12 times this result (I have 12 curves) , with different values:
- Adding parameter "j"
- Adding parameter "n"
[ 4.30626925e-17 3.25367918e-14 9.60736218e-14 2.20310475e-13
4.63245638e-13 9.38169958e-13 1.86480698e-12 3.67881758e-12
7.22634738e-12 1.41635088e-11 2.77290634e-11 5.42490983e-11
1.06108942e-10 2.07520542e-10 4.05768903e-10 7.93323537e-10
1.55126521e-09 3.03311029e-09 5.93085363e-09 1.16032067e-08
2.26884736e-08 4.43641560e-08 8.67362753e-08 1.69617697e-07
3.31685858e-07 6.48478168e-07]
- Adding parameter "j"
- Adding parameter "n"
[ 1.43571772e-16 1.00037588e-13 2.92349492e-13 6.62623404e-13
This means that the code is calculating the fit correctly; I just have to separate this output somehow in order to plot each fit with its curve. Each set of values between [] is what I want to separate in a way that lets me plot it.
I do not see how the code you posted could possibly produce your output. I do not see a print() function that prints out the array of 26 values, but would imagine that could be the length of your lists x, y and abscurrent -- it is not the output of your print(result.fit_report()), and I do not see that result.
I do not see anything to suggest you have 12 independent curves.
Also, result.best_fit is not a file, it is an array.

Plotting trajectories in python using matplotlib

I'm having some trouble using matplotlib to plot the path of something.
Here's a basic version of the type of thing I'm doing.
Essentially, I'm seeing if the value breaks a certain threshold (6 in this case) at any point during the path and then doing something with it later on.
Now, I have 3 lists set-up. The end_vector will be based on the other two lists. If the value breaks past 2 any time during a single simulation, I will add the last position of the object to my end_vector
trajectories_vect is something I want to keep track of my trajectories for all 5 simulations, by keeping a list of lists. I'll clarify this below. And, timestep_vect stores the path for a single simulation.
from random import gauss
from matplotlib import pyplot as plt
import numpy as np
starting_val = 5
T = 1 #1 year
delta_t = .1 #time-step
N = int(T/delta_t) #how many points on the path looked at
trials = 5 #number of simulations
#main iterative loop
end_vect = []
trajectories_vect = []
for k in range(trials):
    s_j = starting_val
    timestep_vect = []
    for j in range(N-1):
        xi = gauss(0,1.0)
        s_j *= xi
        timestep_vect.append(s_j)
    trajectories_vect.append(timestep_vect)
    if max(timestep_vect) > 5:
        end_vect.append(timestep_vect[-1])
    else:
        end_vect.append(0)
Okay, at this part if I print my trajectories, I get something like this (I only posted two simulations, instead of the full 5):
[[ -3.61689976e+00 2.85839230e+00 -1.59673115e+00 6.22743522e-01
1.95127718e-02 -1.72827152e-02 1.79295788e-02 4.26807446e-02
-4.06175288e-02] [ 4.29119818e-01 4.50321728e-01 -7.62901016e-01
-8.31124346e-02 -6.40330554e-03 1.28172906e-02 -1.91664737e-02
-8.29173982e-03 4.03917926e-03]]
This is good and what I want to happen.
Now, my problem is that I don't know how to plot my path (y-axis) against my time (x-axis) properly.
First, I want to put my data into numpy arrays because I'll need to use them later on to compute some statistics and other things which from experience numpy makes very easy.
#creating numpy arrays from list
#might need to use this with matplotlib somehow
np_trajectories = np.array(trajectories_vect)
time_array = np.arange(1,10)
Here's the crux of the issue though. When I'm putting my trajectories (y-axis) into matplotlib, it's not treating each "list" (row in numpy) as one path. Instead of getting 5 paths for 5 simulations, I am getting 9 paths for 5 simulations. I believe I am inputting stuff wrong, hence it is using the 9 time intervals in the wrong way.
#matplotlib stuff
plt.plot(np_trajectories)
plt.xlabel('timestep')
plt.ylabel('trajectories')
plt.show()
Here's the image produced:
Obviously, this is wrong for the aforementioned reason. Instead, I want to have 5 paths based on the 5 lists (rows) in my trajectories. I seem to understand what the problem is but don't know how to go about fixing it.
Thanks in advance for the help.
When you call np_trajectories = np.array(trajectories_vect), your list of trajectories is transformed into a 2d numpy array. The information about its dimensions is stored in np_trajectories.shape, and, in your case, is (5, 9). Therefore, when you pass np_trajectories to plt.plot(), the plotting library assumes that the y-values are stored in the first dimension, while the second dimension describes individual lines to plot.
In your case, all you need to do is to transpose your np_trajectories array. In numpy, it is as simple as
plt.plot(np_trajectories.T)
plt.xlabel('timestep')
plt.ylabel('trajectories')
plt.show()
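To see why the transpose matters, here is a small sketch with random stand-in data of the same shape, (5, 9). The random values and the names ax_rows and ax_transposed are only for illustration. plt.plot draws one line per column of a 2D array, so counting the returned line objects shows the difference.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
np_trajectories = rng.normal(size=(5, 9))  # 5 simulations, 9 timesteps each

fig, (ax_rows, ax_transposed) = plt.subplots(1, 2)
lines_rows = ax_rows.plot(np_trajectories)              # one line per column
lines_transposed = ax_transposed.plot(np_trajectories.T)  # one line per simulation

print(len(lines_rows), len(lines_transposed))  # 9 5
plt.close(fig)
```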
If you want to plot the x-axis as time, instead of steps of one, you have to define your time progression as a list or an array. In numpy, you can do something like
times = np.linspace(0, T, N-1)
plt.plot(times, np_trajectories.T)
plt.xlabel('time')
plt.ylabel('trajectories')
plt.show()
which produces the following figure:

Ways to Create Tables and Presentable Objects Other than Plots in Python

I have the following code that runs through the following:
Draw a number of points from a true distribution.
Use those points with curve_fit to extract the parameters.
Check if those parameters are, on average, close to the true values.
(You can do this by creating the "pull distribution" and checking whether it looks like a standard normal variable.)
# This script calculates the mean and standard deviation for
# the pull distributions on the estimators that curve_fit returns
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import gauss
import format
numTrials = 10000
# Pull given by (a_j - a_true)/a_error
error_vec_A = []
error_vec_mean = []
error_vec_sigma = []
# Loop to determine pull distribution
for i in range(numTrials):
    # Draw from primary distribution
    mean = 0; var = 1; sigma = np.sqrt(var)
    N = 20000
    A = 1/np.sqrt(2*np.pi*var)
    points = gauss.draw_1dGauss(mean,var,N)
    # Histogram parameters
    bin_size = 0.1; min_edge = mean-6*sigma; max_edge = mean+9*sigma
    # np.linspace needs an integer number of points
    Nn = int(round((max_edge-min_edge)/bin_size)); Nplus1 = Nn + 1
    bins = np.linspace(min_edge, max_edge, Nplus1)
    # Obtain histogram from primary distributions
    hist, bin_edges = np.histogram(points,bins,density=True)
    bin_centres = (bin_edges[:-1] + bin_edges[1:])/2
    # Initial guess
    p0 = [5, 2, 4]
    coeff, var_matrix = curve_fit(gauss.gaussFun, bin_centres, hist, p0=p0)
    # Get the fitted curve
    hist_fit = gauss.gaussFun(bin_centres, *coeff)
    # Error on the estimates
    error_parameters = np.sqrt(np.array([var_matrix[0][0],var_matrix[1][1],var_matrix[2][2]]))
    # Obtain the error for each value: A,mu,sigma
    A_std = (coeff[0]-A)/error_parameters[0]
    mean_std = ((coeff[1]-mean)/error_parameters[1])
    sigma_std = (np.abs(coeff[2])-sigma)/error_parameters[2]
    # Store results in container
    error_vec_A.append(A_std)
    error_vec_mean.append(mean_std)
    error_vec_sigma.append(sigma_std)
# Plot the distribution of each estimator
# (density=True replaces the normed=True keyword of older matplotlib)
plt.figure(1); plt.hist(error_vec_A,bins,density=True); plt.title('Pull of A')
plt.figure(2); plt.hist(error_vec_mean,bins,density=True); plt.title('Pull of Mu')
plt.figure(3); plt.hist(error_vec_sigma,bins,density=True); plt.title('Pull of Sigma')
# Store key information regarding distribution
mean_A = np.mean(error_vec_A); sigma_A = np.std(error_vec_A)
mean_mu = np.mean(error_vec_mean); sigma_mu = np.std(error_vec_mean)
mean_sigma = np.mean(error_vec_sigma); sigma_sig = np.std(error_vec_sigma)
info = np.array([[mean_A,sigma_A],[mean_mu,sigma_mu],[mean_sigma,sigma_sig]])
My problem is I don't know how to use python to format the data into a table. I have to manually go into the variables and go to google docs to present the information. I'm just wondering how I can do that using pandas or some other library.
Here's an example of the manual insertion:
                          Trial 1     Trial 2     Trial 3
Seed                      [0.2,0,1]   [10,2,5]    [5,2,4]
Bins for individual runs  20          20          20
Points Thrown             1000        1000        1000
Number of Runs            5000        5000        5000
Bins for pull dist fit    20          20          20
Mean_A                    -0.11177    -0.12249    -0.10965
sigma_A                   1.17442     1.17517     1.17134
Mean_mu                   0.00933     -0.02773    -0.01153
sigma_mu                  1.38780     1.38203     1.38671
Mean_sig                  0.05292     0.06694     0.04670
sigma_sig                 1.19411     1.18438     1.19039
I would like to automate this table so that if I change the parameters in my code, I get a new table with the new data.
I would go with the CSV module to generate a presentable table.
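As a rough sketch of that suggestion, the standard-library csv module can write the summary rows so that a spreadsheet or document tool can import them. The numbers below are copied from the hand-made table in the question; the column label "Quantity" and the in-memory buffer are assumptions made for the example (a real script would more likely write to a file opened with newline='').

```python
import csv
import io

# Values copied from the manually built table in the question.
header = ["Quantity", "Trial 1", "Trial 2", "Trial 3"]
rows = [
    ("Mean_A", [-0.11177, -0.12249, -0.10965]),
    ("sigma_A", [1.17442, 1.17517, 1.17134]),
    ("Mean_mu", [0.00933, -0.02773, -0.01153]),
    ("sigma_mu", [1.38780, 1.38203, 1.38671]),
    ("Mean_sig", [0.05292, 0.06694, 0.04670]),
    ("sigma_sig", [1.19411, 1.18438, 1.19039]),
]

buffer = io.StringIO()  # stand-in for a file object
writer = csv.writer(buffer)
writer.writerow(header)
for name, values in rows:
    # fixed 5-decimal formatting keeps trailing zeros such as 0.04670
    writer.writerow([name] + ["%.5f" % v for v in values])

csv_text = buffer.getvalue()
print(csv_text)
```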
If you're not already using it, the IPython notebook is really good for rendering rich display formats. It's really good in a lot of other ways, too.
It will render pandas DataFrame objects as an HTML table when they're either the last, unreturned value in a cell or when you explicitly call the IPython.display.display function instead of print.
If you're not already using pandas, I highly recommend it. It's basically a wrapper around 2D & 3D numpy arrays; it's just as fast, but it has nice naming conventions, data grouping and filtering functions, and some other cool stuff.
At that point, it depends on how you want to present it. You can use nbconvert to render a whole notebook as static html or a pdf. You can copy-paste the html table into Excel or PowerPoint or an E-mail.
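A minimal sketch of the pandas route, assuming the info array from the question has already been computed. Here it is filled with the Trial 3 numbers from the hand-made table, and the row and column labels are made up for the example.

```python
import numpy as np
import pandas as pd

# Stand-in for the `info` array built at the end of the question's script
# (values taken from the Trial 3 column of the manual table).
info = np.array([[-0.10965, 1.17134],
                 [-0.01153, 1.38671],
                 [0.04670, 1.19039]])

summary = pd.DataFrame(info,
                       index=["A", "mu", "sigma"],          # hypothetical labels
                       columns=["pull mean", "pull std"])   # hypothetical labels

five_decimals = "{:.5f}".format
text_table = summary.to_string(float_format=five_decimals)  # plain-text table
html_table = summary.to_html(float_format=five_decimals)    # HTML table markup

print(text_table)
```

In a notebook, leaving summary as the last expression in a cell renders it as an HTML table, so rerunning the script with new parameters regenerates the table automatically.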

How to plot a graph with text file (2 columns of data) in Python

I have a text file with lots of data that is arranged in 2 columns. I need to use the data in the 2nd column in a formula (which outputs Energy). I need to plot that energy against the time which is all the data in the first column.
So far I have this, and it prints a very weird graph. I know that the energy should be oscillating and decaying exponentially.
import numpy as np
import matplotlib.pyplot as plt
m = 0.090
l = 0.089
g = 9.81
H = np.loadtxt("AngPosition_3p5cmSeparation.txt")
x, y = np.hsplit(H,2)
Ep = m*g*l*(1-np.cos(y))
plt.plot(x, Ep)
plt.show()
I'm struggling to see where I have gone wrong, but then again I am somewhat new to Python. Any help is much appreciated.
I managed to get it to work. My problem was that the angle data had to be converted into radians.
I couldn't do that automatically in Python using math.radians for some reason so I just edited the data in Excel and then back into Notepad.
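For what it's worth, math.radians only accepts a single number, which may be why it failed on the whole column; numpy's np.radians converts an entire array at once. The sketch below assumes the second column holds angles in degrees and uses three made-up rows in place of the real file.

```python
import io
import numpy as np

m = 0.090
l = 0.089
g = 9.81

# Hypothetical stand-in for the two-column text file: time, angle in degrees.
sample = io.StringIO("0.0 0.0\n0.1 90.0\n0.2 180.0\n")
t, angle_deg = np.loadtxt(sample, unpack=True)  # one 1D array per column

angle_rad = np.radians(angle_deg)        # element-wise degrees -> radians
Ep = m * g * l * (1 - np.cos(angle_rad))  # potential energy at each time

print(Ep)
```

With the real data, plt.plot(t, Ep) would then draw the energy against time.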

Matplotlib pcolor

I am using Matplotlib to create an image based on some data. All of the data falls in the range 0 to 1, and I am trying to color the data based on its value using a colormap. This works perfectly in Matlab; however, when converting the code across to Python I simply get a black square as the output. I believe this is because I'm plotting the image wrong, so it is plotting all the data as 0. I have searched on this problem for several hours and have tried plt.set_clim([0, 1]), but that didn't seem to do anything. I am new to Python and Matplotlib, although I am not new to programming (Java, JavaScript, PHP, etc.), and I cannot see where I am going wrong. If anybody can see anything glaringly incorrect in my code, I would be extremely grateful.
Thank you
from numpy import *
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as myColor
e1cx=[]
e1cy=[]
e1cz=[]
print("Reading files...")
in_file = open("eigenvector_1_component_x.txt", "rt")
for line in in_file.readlines():
    e1cx.append([])
    for i in line.split():
        e1cx[-1].append(float(i))
in_file.close()
in_file = open("eigenvector_1_component_y.txt", "rt")
for line in in_file.readlines():
    e1cy.append([])
    for i in line.split():
        e1cy[-1].append(float(i))
in_file.close()
in_file = open("eigenvector_1_component_z.txt", "rt")
for line in in_file.readlines():
    e1cz.append([])
    for i in line.split():
        e1cz[-1].append(float(i))
in_file.close()
print("...done")
nx = 120
ny = 128
nz = 190
fx = zeros((nz,nx,ny))
fy = zeros((nz,nx,ny))
fz = zeros((nz,nx,ny))
z = 0
while z<nz-1:
    x = 0
    while x<nx:
        y = 0
        while y<ny:
            fx[z][x][y]=e1cx[(z*128)+y][x]
            fy[z][x][y]=e1cy[(z*128)+y][x]
            fz[z][x][y]=e1cz[(z*128)+y][x]
            y += 1
        x += 1
    z+=1
    if((z % 10) == 0):
        plt.figure(num=None)
        plt.axis("off")
        normals = myColor.Normalize(vmin=0,vmax=1)
        plt.pcolor(fx[z][:][:],cmap='spectral', norm=normals)
        filename = 'Imagex_%d' % z
        plt.savefig(filename)
        plt.colorbar(ticks=[0,2,4,6], format='%0.2f')
Although you have resolved your original issue and have code that works, I wanted to point out that both python and numpy provide several tools that make code like this much simpler to write. Here are a few examples:
Loading data
Instead of building up lists by appending to the end of an empty one, it is often easier to generate them from other lists. For example, instead of
e1cx = []
for line in in_file.readlines():
    e1cx.append([])
    for i in line.split():
        e1cx[-1].append(float(i))
you can simply write:
e1cx = [[float(i) for i in line.split()] for line in in_file]
The syntax [x(y) for y in l] is known as a list comprehension and, in addition to being more concise, will usually execute more quickly than the equivalent for loop.
However, for loading tabular data from a text file, it is even simpler to use numpy.loadtxt:
import numpy as np
e1cx = np.loadtxt("eigenvector_1_component_x.txt")
For more information, run
print(np.loadtxt.__doc__)
See also its slightly more sophisticated cousin, numpy.genfromtxt.
Reshaping data
Now that we have our data loaded, we need to reshape it. The while loops you use work fine, but numpy provides an easier way. First, if you prefer to use your method of loading the data, then convert your eigenvector arrays into proper numpy arrays using e1cx = array(e1cx), etc.
The array class provides methods for rearranging how the data in an array is indexed without requiring it to be copied. The simplest method is array.reshape, which will do half of what your while loops do:
almost_fx = e1cx.reshape((nz,ny,nx))
Here, almost_fx is a rank-3 array indexed as almost_fx[iz,iy,ix]. One important thing to be aware of is that e1cx and almost_fx share their data. So, if you change e1cx[0,0], you will also change almost_fx[0,0,0].
In your code, you swapped the x and y locations. If this is indeed what you wanted to do, you can accomplish this with array.swapaxes:
fx = almost_fx.swapaxes(1,2)
Of course, you could always combine this into one line
fx = e1cx.reshape((nz,ny,nx)).swapaxes(1,2)
However, if you want the z-slices (fx[z,:,:]) to plot with x horizontal and y vertical, you probably do not want to swap the axes above. Just reshape and plot.
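The reshape and swapaxes steps can be checked on a tiny array. This is only a sketch: the sizes nz, ny, nx = 2, 3, 4 and the np.arange contents are stand-ins for the real 190 x 128 x 120 data.

```python
import numpy as np

nz, ny, nx = 2, 3, 4  # small stand-in sizes
# Same layout as the loaded file: nz*ny rows, nx columns.
e1cx = np.arange(nz * ny * nx).reshape(nz * ny, nx)

almost_fx = e1cx.reshape((nz, ny, nx))  # indexed as [iz, iy, ix]
fx = almost_fx.swapaxes(1, 2)           # indexed as [iz, ix, iy]

# fx[z, x, y] matches the element the original while loops copied.
z, x, y = 1, 2, 0
print(fx[z, x, y] == e1cx[z * ny + y, x])  # True

# Both results are views, so a change to e1cx is visible through them.
e1cx[0, 0] = -1
print(almost_fx[0, 0, 0], fx[0, 0, 0])  # -1 -1
```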
Slicing arrays
Finally, rather than looping over the z-index and testing for multiples of 10, you can loop directly over a slice of the array using:
for fx_slice in fx[::10]:
    # plot fx_slice and save it
This indexing syntax is array[start:end:step], where start is included in the result and end is not. Leaving start blank implies 0, while leaving end blank implies the end of the array.
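A short sketch of stepping through every tenth slice, using a made-up array with 25 z-slices in which each slice is filled with its own z index so the selection is easy to read off:

```python
import numpy as np

fx = np.zeros((25, 2, 3))  # stand-in: 25 z-slices of shape (2, 3)
for z in range(fx.shape[0]):
    fx[z] = z              # mark each slice with its z index

# enumerate counts the selected slices; multiplying by the step recovers z.
picked = [(i * 10, int(fx_slice[0, 0])) for i, fx_slice in enumerate(fx[::10])]
print(picked)  # [(0, 0), (10, 10), (20, 20)]
```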
Summary
In summary your complete code (after introducing a few more python idioms like enumerate) could look something like:
import numpy as np
from matplotlib import pyplot as pt
shape = (190,128,120)
fx = np.loadtxt("eigenvector_1_component_x.txt").reshape(shape).swapaxes(1,2)
for i,fx_slice in enumerate(fx[::10]):
    z = i*10
    pt.figure()
    pt.axis("off")
    # 'nipy_spectral' is the current name of the old 'spectral' colormap
    pt.pcolor(fx_slice, cmap='nipy_spectral', vmin=0, vmax=1)
    pt.colorbar(ticks=[0,2,4,6], format='%0.2f')
    pt.savefig('Imagex_%d' % z)
Alternatively, if you want one pixel per element, you can replace the body of the for loop with
    z = i*10
    # 'nipy_spectral' is the current name of the old 'spectral' colormap
    pt.imsave('Imagex_%d' % z, fx_slice, cmap='nipy_spectral', vmin=0, vmax=1)
