I do not know Python at all, so I have been unable to interpret similar previous answers and apply them.
I have a Python script that I wish to run on Unix. The script uses an input file, but I do not understand how to ensure that the input file is read as a NumPy float array.
My input file is called chk.bed and it has one column of numeric values
-bash-4.1$ # head chk.bed
7.25236
0.197037
0.189464
2.60056
0
32.721
11.3978
3.85692
0
0
The original script is -
from scipy.stats import gaussian_kde
import numpy as np
#assume "fpkm" is a NumPy array of log2(fpkm) values
kernel = gaussian_kde(fpkm)
xi = np.linspace(fpkm.min(), fpkm.max(), 100)
yi = kernel.evaluate(xi)
mu = xi[np.argmax(yi)]
U = fpkm[fpkm > mu].mean()
sigma = (U - mu) * np.sqrt(np.pi / 2)
zFPKM = (fpkm - mu) / sigma
What I have understood so far is that the script needs to read the file, so I added fpkm = open("chk.bed", 'r') to the code.
However on executing the code - I get the following error -
Traceback (most recent call last):
File "./calc_zfpkm.py", line 10, in <module>
kernel = gaussian_kde(fpkm)
File "/usr/lib64/python2.6/site-packages/scipy/stats/kde.py", line 88, in __init__
self._compute_covariance()
File "/usr/lib64/python2.6/site-packages/scipy/stats/kde.py", line 340, in _compute_covariance
self.factor * self.factor)
File "/usr/lib64/python2.6/site-packages/numpy/lib/function_base.py", line 1971, in cov
X = array(m, ndmin=2, dtype=float)
TypeError: float() argument must be a string or a number
This seems to suggest that I am not reading in the file correctly and so the function gaussian_kde() cannot read in the values as float.
Can you please help ?
Thanks !
You're passing a file object to gaussian_kde, but it expects a NumPy array. Use numpy.loadtxt first to load the data into an array:
>>> import numpy as np
>>> from scipy.stats import gaussian_kde
>>> arr = np.loadtxt('chk.bed')
>>> arr
array([ 7.25236 , 0.197037, 0.189464, 2.60056 , 0. ,
32.721 , 11.3978 , 3.85692 , 0. , 0. ])
>>> gaussian_kde(arr)
<scipy.stats.kde.gaussian_kde object at 0x7f7350390190>
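For completeness, here is a minimal sketch of how the loaded array could be plugged into the original script. It assumes the file has a single numeric column. The io.StringIO object is only a stand-in for chk.bed (it holds the ten values shown in the question) so that the snippet runs on its own.

```python
import io

import numpy as np
from scipy.stats import gaussian_kde

# Stand-in for chk.bed (assumption: one numeric value per line).
# With the real file you would call np.loadtxt('chk.bed') instead.
chk_bed = io.StringIO(
    "7.25236\n0.197037\n0.189464\n2.60056\n0\n"
    "32.721\n11.3978\n3.85692\n0\n0\n"
)

fpkm = np.loadtxt(chk_bed)  # 1-D array of floats
print(fpkm.dtype, fpkm.shape)

# The rest is the script from the question, unchanged.
kernel = gaussian_kde(fpkm)
xi = np.linspace(fpkm.min(), fpkm.max(), 100)
yi = kernel.evaluate(xi)
mu = xi[np.argmax(yi)]
U = fpkm[fpkm > mu].mean()
sigma = (U - mu) * np.sqrt(np.pi / 2)
zFPKM = (fpkm - mu) / sigma
print(zFPKM)
```

Keep in mind that the comment in the original script says it expects log2(fpkm) values, so the raw numbers may need to be transformed before this step.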
Here is an R script for zFPKM normalization.
It is based on the Python code given above and on this post: https://www.biostars.org/p/94680/
install.packages(c("ks","pracma"))
library(ks)
library(pracma)
# fpkm is example data
fpkm <- c(1,2,3,4,5,6,7,8,4,5,6,5,6,5,6,5,5,5,5,6,6,78,8,89,8,8,8,2,2,2,1,1,4,4,4,4,4,4,4,4,4,4,4,3,2,2,3,23,2,3,23,4,2,2,4,23,2,2,24,4,4,2,2,4,4,4,2,2,4,4,2,2,4,2,45,5,5,5,3,2,2,4,4,4,4,4,4,4,4,4,3,2,2,3,23,2,3,23,4,2,2,4,23,2,2,24,4,4,2,2,4,4,4,2,2,4,4,2,2,4,2,45,5,5,5,3,2,2)
xi=linspace(min(fpkm),max(fpkm),100)
fhat = kde(x=fpkm,gridsize=100,eval.points=xi)
# Here I use digits=0. If you do not round the numbers (yi), the results change slightly.
yi=round(fhat$estimate,digits=0)
mu=xi[which.max(yi)]
U=mean(fpkm[fpkm>mu])
sigma=(U-mu)* (sqrt(pi/2))
zFPKM = (fpkm - mu) / sigma
Btw, I have a question.
Can I apply the same approach to RPKM?
Cankut CUBUK
Computational Genomics Program - Systems Genomics Lab
Centro de Investigación Príncipe Felipe (CIPF)
C/ Eduardo Primo Yúfera nº3
46012 Valencia, Spain
http://bioinfo.cipf.es
Following my previous two posts (post1, post 2), I have now reached the point where I use scipy to find a curve fit. However, the code I have produces an error.
A sample of the .csv file I'm working with is located in post1. I tried to copy and substitute examples from the Internet, but it doesn't seem to be working.
Here's what I have (the .py file)
import pandas as pd
import numpy as np
from scipy import optimize
df = pd.read_csv("~/Truncated raw data hcl.csv", usecols=['time' , '1mnaoh trial 1']).dropna()
data1 = df
array1 = np.asarray(data1)
x , y = np.split(array1,[-1],axis=1)
def func(x, a , b , c , d , e):
    return a + (b - a)/((1 + c*np.exp(-d*x))**(1/e))
popt, pcov = optimize.curve_fit(func, x , y , p0=[23.2, 30.1 , 1 , 1 , 1])
popt
From the limited research I've done, it might be a problem with the x and y arrays. The title states the error that is written. It is a minpack.error.
Edit: the error returned
ValueError: object too deep for desired array
Traceback (most recent call last):
File "~/test2.py", line 15, in <module>
popt, pcov = optimize.curve_fit(func, x , y , p0=[23.2, 30.1 , 1 , 1 , 1])
File "~/'virtualenvname'/lib/python3.7/site-packages/scipy/optimize/minpack.py", line 744, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "~/'virtualenvname'/lib/python3.7/site-packages/scipy/optimize/minpack.py", line 394, in leastsq
gtol, maxfev, epsfcn, factor, diag)
minpack.error: Result from function call is not a proper array of floats.
Thank you.
After the split, the shape of x and y is (..., 1). This means that each of their elements is itself an array of length one. You want to flatten the arrays first, e.g. via x = x.flatten() or x = np.ravel(x), and likewise for y.
But I think you don't need the split at all. You can just do the following
array1 = np.asarray(data1).T
x , y = array1
You want x and y to be the first and second columns of array1. So an easy way to achieve this is to transpose the array first. You could also access them via [:,0] and [:,1].
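To illustrate the difference in shapes, here is a small sketch. The data are synthetic (I do not have the CSV), and the straight-line model is only a placeholder for the function in the question, so treat it as an illustration rather than a fit of the real data.

```python
import numpy as np
from scipy import optimize

# Synthetic two-column data standing in for the CSV (made-up values).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
temp = 23.0 + 0.7 * t + rng.normal(scale=0.05, size=t.size)
array1 = np.column_stack([t, temp])  # shape (50, 2)

# np.split keeps a trailing axis of length one.
x_col, y_col = np.split(array1, [-1], axis=1)
print(x_col.shape, y_col.shape)  # (50, 1) (50, 1)

# Unpacking the transposed array gives plain 1-D columns.
x, y = array1.T
print(x.shape, y.shape)  # (50,) (50,)

# Placeholder model; the real one would be the five-parameter func.
def line(x, a, b):
    return a + b * x

popt, pcov = optimize.curve_fit(line, x, y, p0=[20.0, 1.0])
print(popt)
```

With 1-D x and y the model returns a 1-D array of residuals, which is what the underlying least-squares routine needs.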
I want to run scipy.signal.spectrogram in a loop with different nperseg, noverlap, and nfft each time. However I got:
TypeError: 'numpy.float64' object cannot be interpreted as an integer
Here is what I wrote:
Fs=10e3
data = testData(Fs)
r = []
for i in numpy.linspace(-0.4, 0.4, 9):
    t_step = 0.5+i
    f_step = 0.5-i
    window_length = round(2 * t_step * Fs)
    noverlap = round(t_step * Fs)
    nfft = round(Fs / f_step)
    arr_f, arr_t, fft = scipy.signal.spectrogram(data, Fs,
                                                 nperseg=window_length,
                                                 noverlap=noverlap,
                                                 nfft=nfft,
                                                 window='hanning')
    r.append((arr_f, arr_t, fft))
where testData is copied from the spectrogram documentation.
The SciPy version is 1.1.0.
When I run the same code with constant, hardcoded t_step and f_step (without +/- i) everything is going smoothly in the whole range. So here are my questions:
Why is it not working?
Is there a way not to do it manually?
Full traceback:
File "/Users/desktop/test.py", line 34, in main
window='hanning')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/signal/spectral.py", line 691, in spectrogram
input_length=x.shape[axis])
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/signal/spectral.py", line 1775, in _triage_segments
win = get_window(window, nperseg)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/signal/windows/windows.py", line 2106, in get_window
return winfunc(*params)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/signal/windows/windows.py", line 786, in hann
return general_hamming(M, 0.5, sym)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/signal/windows/windows.py", line 1016, in general_hamming
return general_cosine(M, [alpha, 1. - alpha], sym)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/signal/windows/windows.py", line 116, in general_cosine
w = np.zeros(M)
TypeError: 'numpy.float64' object cannot be interpreted as an integer
The values you compute from t_step and f_step (window_length, noverlap and nfft) are floating-point numbers, but scipy expects integers. You could, for instance, change your code to
arr_f, arr_t, fft = signal.spectrogram(data, Fs,
nperseg=window_length.astype(int),
noverlap=noverlap.astype(int),
nfft=nfft.astype(int),
window='hanning')
and scipy should work without problems. The .astype(int) just tweaks the numpy datatype, so instead of the float number 2000.0 scipy receives the integer number 2000. You can find more information about numpy data types in the official documentation.
A better way is of course to change your calculations so that they produce integers right away.
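One way to produce integers right away is to wrap each rounded value in int(). The sketch below uses a smaller sampling rate and random noise instead of testData, purely to keep it short and self-contained; those values are my own placeholders. I also use window='hann', since as far as I know the 'hanning' alias is not accepted by newer SciPy releases.

```python
import numpy as np
from scipy import signal

Fs = 1000.0                      # sampling rate in Hz (illustrative value)
rng = np.random.default_rng(0)
data = rng.normal(size=10000)    # placeholder signal: 10 s of noise

r = []
for i in np.linspace(-0.4, 0.4, 9):
    t_step = 0.5 + i
    f_step = 0.5 - i
    # int(round(...)) gives a plain Python int whether the input is a
    # Python float or a numpy.float64.
    window_length = int(round(2 * t_step * Fs))
    noverlap = int(round(t_step * Fs))
    nfft = int(round(Fs / f_step))
    arr_f, arr_t, sxx = signal.spectrogram(data, Fs,
                                           nperseg=window_length,
                                           noverlap=noverlap,
                                           nfft=nfft,
                                           window='hann')
    r.append((arr_f, arr_t, sxx))
    print(window_length, noverlap, nfft, sxx.shape)
```

Unlike .astype(int), this also works when t_step and f_step are hardcoded Python floats.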
I'm trying to exponentiate a complex matrix in Python and am running into some trouble. I'm using the scipy.linalg.expm function and get a rather strange error message when I try the following code:
import numpy as np
from scipy import linalg
hamiltonian = np.mat('[1,0,0,0;0,-1,0,0;0,0,-1,0;0,0,0,1]')
# This works
t_list = np.linspace(0,1,10)
unitary = [linalg.expm(-(1j)*t*hamiltonian) for t in t_list]
# This doesn't
t_list = np.linspace(0,10,100)
unitary = [linalg.expm(-(1j)*t*hamiltonian) for t in t_list]
The error when the second experiment is run is:
This works!
Traceback (most recent call last):
File "matrix_exp.py", line 11, in <module>
unitary_t = [linalg.expm(-1*t*(1j)*hamiltonian) for t in t_list]
File "/usr/lib/python2.7/dist-packages/scipy/linalg/matfuncs.py", line 105, in expm
return scipy.sparse.linalg.expm(A)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/matfuncs.py", line 344, in expm
X = _fragment_2_1(X, A, s)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/matfuncs.py", line 462, in _fragment_2_1
X[k, k] = exp_diag[k]
TypeError: only length-1 arrays can be converted to Python scalars
This seems really strange since all I changed was the range of t I was using. Is it because the Hamiltonian is diagonal? In general, the Hamiltonians won't be, but I also want it to work for diagonal ones. I don't really know the mechanics of expm, so any help would be greatly appreciated.
That is interesting. One thing I can say is that the problem is specific to the np.matrix subclass. For example, the following works fine:
h = np.array(hamiltonian)
unitary = [linalg.expm(-(1j)*t*h) for t in t_list]
Digging a little deeper into the traceback, the exception is being raised in _fragment_2_1 in scipy.sparse.linalg.matfuncs.py, specifically these lines:
n = X.shape[0]
diag_T = T.diagonal().copy()
# Replace diag(X) by exp(2^-s diag(T)).
scale = 2 ** -s
exp_diag = np.exp(scale * diag_T)
for k in range(n):
    X[k, k] = exp_diag[k]
The error message
X[k, k] = exp_diag[k]
TypeError: only length-1 arrays can be converted to Python scalars
suggests to me that exp_diag[k] ought to be a scalar, but is instead returning a vector (and you can't assign a vector to X[k, k], which is a scalar).
Setting a breakpoint and examining the shapes of these variables confirms this:
ipdb> l
751 # Replace diag(X) by exp(2^-s diag(T)).
752 scale = 2 ** -s
753 exp_diag = np.exp(scale * diag_T)
754 for k in range(n):
755 import ipdb; ipdb.set_trace() # breakpoint e86ebbd4 //
--> 756 X[k, k] = exp_diag[k]
757
758 for i in range(s-1, -1, -1):
759 X = X.dot(X)
760
761 # Replace diag(X) by exp(2^-i diag(T)).
ipdb> exp_diag.shape
(1, 4)
ipdb> exp_diag[k].shape
(1, 4)
ipdb> X[k, k].shape
()
The underlying problem is that exp_diag is assumed to be either 1D or a column vector, but the diagonal of an np.matrix object is a row vector. This highlights a more general point that np.matrix is generally less well-supported than np.ndarray, so in most cases it's better to use the latter.
One possible solution would be to use np.ravel() to flatten diag_T into a 1D np.ndarray:
diag_T = np.ravel(T.diagonal().copy())
This seems to fix the problem you're encountering, although there may be other issues relating to np.matrix that I haven't spotted yet.
I've opened a pull request here.
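The shape difference described above can be checked directly. This is a small sketch using the Hamiltonian from the question; the variable names are mine, and the final loop simply uses a plain ndarray, which is the workaround suggested at the top of this answer.

```python
import numpy as np
from scipy import linalg

h_matrix = np.matrix('1 0 0 0; 0 -1 0 0; 0 0 -1 0; 0 0 0 1')
h_array = np.asarray(h_matrix)

# The diagonal of an np.matrix stays 2-D (a row vector) ...
print(h_matrix.diagonal().shape)            # (1, 4)
# ... while the diagonal of an ndarray is 1-D.
print(h_array.diagonal().shape)             # (4,)
# np.ravel turns the matrix diagonal into a 1-D ndarray.
print(np.ravel(h_matrix.diagonal()).shape)  # (4,)

# With a plain ndarray the longer time range poses no problem.
t_list = np.linspace(0, 10, 100)
unitary = [linalg.expm(-1j * t * h_array) for t in t_list]

# For a diagonal Hamiltonian the result is just the elementwise
# exponential of the diagonal, which gives a quick sanity check.
expected = np.diag(np.exp(-1j * t_list[-1] * h_array.diagonal()))
print(np.allclose(unitary[-1], expected))
```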
I seem to be getting an error when I use the root-finder in scipy. I was wondering if anyone could point out what I'm doing wrong.
The function I'm finding the root of is just an easy example, and not particularly important.
If I run this code with scipy 0.9.0:
import numpy as np
from scipy.optimize import fsolve
tmpFunc = lambda xIn: (xIn[0]-4)**2 + (xIn[1]-5)**2 + (xIn[2]-7)**3
x0 = [3,4,5]
xFinal = fsolve(tmpFunc, x0 )
print xFinal
I get the following error message:
Traceback (most recent call last):
File "tmpStack.py", line 7, in <module>
xFinal = fsolve(tmpFunc, x0 )
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 115, in fsolve
_check_func('fsolve', 'func', func, x0, args, n, (n,))
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 26, in _check_func
raise TypeError(msg)
TypeError: fsolve: there is a mismatch between the input and output shape of the 'func' argument '<lambda>'.
Well, it looks like I was using this routine incorrectly. It requires the same number of equations as variables, whereas I gave it one equation in three variables. So if the input to the function whose root is sought is an array of length 3, the output must also be an array of length 3. This code works:
import numpy as np
from scipy.optimize import fsolve
tmpFunc = lambda xIn: np.array([(xIn[0]-4)**2 + xIn[1],
                                (xIn[1]-5)**2 - xIn[2],
                                (xIn[2]-7)**3 + xIn[0]])
x0 = [3,4,5]
xFinal = fsolve(tmpFunc, x0 )
print(xFinal)
It solves three equations simultaneously.
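As a quick way to check a setup like this, you can evaluate the function at the returned point and confirm that every residual is close to zero. The system below is a made-up example (not the one above), chosen so that (1, 2, 3) satisfies all three equations.

```python
import numpy as np
from scipy.optimize import fsolve

def equations(v):
    """Three equations in three unknowns; one residual per equation."""
    x, y, z = v
    return np.array([
        x + y + z - 6.0,   # x + y + z = 6
        x * y - 2.0,       # x * y = 2
        z - y - 1.0,       # z - y = 1
    ])

x0 = [1.2, 1.8, 3.2]            # initial guess, same length as the output
root = fsolve(equations, x0)
residual = equations(root)      # should be close to zero at a root
print(root)
print(residual)
```

The function takes a length-3 input and returns a length-3 output, which is the shape match that fsolve checks for.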
I am having problems interpolating some data points using Scipy. I guess that it might depend on the fact that the function I'm trying to interpolate is discontinuous at x roughly 4.
Here is the code I'm using to interpolate:
from scipy import *
y_interpolated = interp1d(x,y,buonds_error=False,fill_value=0.,kind='cubic')
new_x_array = arange(min(x),max(x),0.05)
plot(new_x_array,x_interpolated(new_x_array),'r-')
The error I get is
File "<stdin>", line 2, in <module>
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/scipy/interpolate/interpolate.py", line 357, in __call__
out_of_bounds = self._check_bounds(x_new)
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/scipy/interpolate/interpolate.py", line 415, in _check_bounds
raise ValueError("A value in x_new is above the interpolation "
ValueError: A value in x_new is above the interpolation range.
These are my data points:
1.56916432074 -27.9998263169
1.76773750527 -27.6198430485
1.98360238449 -27.2397962268
2.25133982943 -26.8596491107
2.49319293195 -26.5518194791
2.77823462692 -26.1896935372
3.07201297519 -25.9540514619
3.46090507092 -25.7362456112
3.65968688527 -25.6453922172
3.84116464506 -25.53652509
3.97070419447 -25.3374215879
4.03087127145 -24.8493356465
4.08217147954 -24.0540196233
4.12470899596 -23.0960856364
4.17612639206 -22.4634289328
4.19318305992 -22.1380894034
4.2708234589 -21.902951035
4.3745696768 -21.9027079759
4.52158254627 -21.9565591238
4.65985875536 -21.8839570732
4.80666329863 -21.6486676004
4.91026629192 -21.4496126386
5.05709528961 -21.2685401725
5.29054655428 -21.2860476871
5.54129211534 -21.3215908912
5.73174988353 -21.6645019816
6.06035782465 -21.772138994
6.30243916407 -21.7715483093
6.59656410998 -22.0238656166
6.86481948673 -22.3665921479
7.01182409559 -22.4385289076
7.17609125906 -22.4200564296
7.37494987052 -22.4376476472
7.60844044988 -22.5093814451
7.79869207061 -22.5812017094
8.00616642549 -22.5445612485
8.17903446593 -22.4899243886
8.29141325457 -22.4715846981
What version of scipy are you using?
The script you posted has some syntax errors (I assume due to wrong copy and paste).
This script works with scipy.__version__ == 0.9.0.
import sys
from scipy import *
from scipy.interpolate import *
from pylab import plot
x = []
y = []
for line in sys.stdin:
    a, b = line.split()
    x.append(float(a))
    y.append(float(b))
y_interpolated = interp1d(x,y,bounds_error=False,fill_value=0.,kind='cubic')
new_x_array = arange(min(x),max(x),0.05)
plot(new_x_array,y_interpolated(new_x_array),'r-')
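If you want to see what bounds_error=False and fill_value do without reading from stdin, here is a reduced sketch. It uses only the first six data points from the question and explicit imports instead of the star imports; the names inside and outside are mine.

```python
import numpy as np
from scipy.interpolate import interp1d

# First six data points from the question.
x = np.array([1.56916432074, 1.76773750527, 1.98360238449,
              2.25133982943, 2.49319293195, 2.77823462692])
y = np.array([-27.9998263169, -27.6198430485, -27.2397962268,
              -26.8596491107, -26.5518194791, -26.1896935372])

y_interpolated = interp1d(x, y, bounds_error=False, fill_value=0.0,
                          kind='cubic')

# Points inside the data range are interpolated.
new_x_array = np.arange(x.min(), x.max(), 0.05)
inside = y_interpolated(new_x_array)

# Points outside the range get fill_value instead of raising ValueError.
outside = y_interpolated([x.min() - 1.0, x.max() + 1.0])
print(inside)
print(outside)
```

If the keyword is misspelled or left out, bounds_error is not set to False, and any value outside [min(x), max(x)] raises the ValueError shown in the question.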