Custom Floor/Ceiling with Significance on a Time Series data Python

I have been working on a project that uses time series data for its calculations.
I want to compute the floor and ceiling of the data (similar to Excel's FLOOR and CEILING) for an entire column.
I looked for NumPy functions that do this, but could not find any that accept a significance argument.
So I defined custom functions:
def ceil(x, s):
    return s * math.ceil(float(x)/s)
def floor(x, s):
    return s * math.floor(float(x)/s)
However, I cannot apply them to an entire column at once, so I have to iterate over each row individually:
def CalculateIV(symbols):
    for i in symbols:
        symbols[i]['PutStrike']=0
        symbols[i]['CallStrike']=0
        for counter in range(0,len(symbols[0])):
            symbols[i]['PutStrike'][counter]=floor(symbols[i]['FUT'][counter],Strike_Diff[i])
            symbols[i]['CallStrike'][counter]=ceil(symbols[i]['FUT'][counter],Strike_Diff[i])
    return symbols
This is of course not the right approach, and it is slow as well.
What I want is something like this:
def CalculateIV(symbols):
    for i in symbols:
        symbols[i]['PutStrike']=0
        symbols[i]['CallStrike']=0
        symbols[i]['PutStrike']=floor(symbols[i]['FUT'],Strike_Diff[i])
        symbols[i]['CallStrike']=ceil(symbols[i]['FUT'],Strike_Diff[i])
    return symbols
However when I run, I get:
CalculateIV(abc)
Traceback (most recent call last):
File "<ipython-input-456-599f9aa19e37>", line 1, in <module>
CalculateIV(abc)
File "<ipython-input-452-190c395d86ed>", line 9, in CalculateIV
symbols[i]['PutStrike']=floor(symbols[i]['FUT'],Strike_Diff[i])
File "<ipython-input-260-8a88fc57ddf5>", line 2, in floor
return s * math.floor(float(x)/s)
File "C:\Users\jay\Anaconda2\lib\site-packages\pandas\core\series.py", line 93, in wrapper
"{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
Can someone please suggest an alternative or quicker approach, or a library that would make this easier?
Thanks in advance.

Well, it was easier than I thought.
I had to use NumPy's vectorize function (np.vectorize):
import math
import numpy as np
def ceil(x, s):
    return s * math.ceil(float(x)/s)
def floor(x, s):
    return s * math.floor(float(x)/s)
vfloor=np.vectorize(floor)
vceil=np.vectorize(ceil)
The functions now accept whole arrays.
I can use them directly to process multiple dataframes within seconds.
def CalculateIV(symbols):
    for i in symbols:
        symbols[i]['PutStrike']=0
        symbols[i]['CallStrike']=0
        symbols[i]['PutStrike']=vfloor(symbols[i]['FUT'],Strike_Diff[i])
        symbols[i]['CallStrike']=vceil(symbols[i]['FUT'],Strike_Diff[i])
    return symbols
If pqr holds multiple dataframes, I can get the floor and ceiling values with:
output=CalculateIV(pqr)
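As a possible alternative, np.vectorize is documented as a convenience wrapper that still calls the Python function once per element, so the same rounding can also be written with NumPy's own np.floor and np.ceil, which operate on whole arrays. The following is a minimal sketch, not the approach used above; the names floor_to, ceil_to, and the sample values are assumptions made for illustration.

```python
import numpy as np

def floor_to(x, significance):
    """Round each element of x down to a multiple of significance."""
    return significance * np.floor(x / significance)

def ceil_to(x, significance):
    """Round each element of x up to a multiple of significance."""
    return significance * np.ceil(x / significance)

# Hypothetical futures prices and a strike spacing of 50.
fut = np.array([101.0, 149.0, 150.0])
put_strike = floor_to(fut, 50)
call_strike = ceil_to(fut, 50)
print(put_strike, call_strike)
```

The same two functions should also accept a pandas Series such as symbols[i]['FUT']. With a non-integer significance, floating-point division can land just below a whole number (0.3/0.1 evaluates to 2.9999999999999996), so results near a boundary are worth checking.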

Related

I am trying to get inverse of a matrix from a file in python, but the format is in string and even after changing it into float I am unable to do it

I am trying to write code that reads this file and computes the inverse square root of each term in a matrix. This is the file I am using:
1.659999999999999963e-04
3.970000000000000005e-04
-8.014499999999999402e-02
-2.274299999999999933e-02
-7.559999999999999880e-03
-3.156229999999999869e-01
5.650100000000000261e-02
2.350100000000000106e-02
-4.383999999999999876e-03
-4.878299999999999997e-02
1.207599999999999993e-02
-5.254199999999999843e-02
1.123500000000000019e-02
1.614240000000000119e-01
1.954900000000000040e-02
-2.614100000000000104e-02
1.534899999999999980e-02
5.446000000000000320e-03
-6.210299999999999848e-02
-9.615000000000000283e-03
1.687800000000000064e-02
6.460999999999999729e-03
-9.490999999999999437e-03
1.676700000000000065e-02
-2.308000000000000156e-03
-1.412399999999999940e-02
8.978899999999999382e-02
1.848960000000000048e-01
5.956000000000000356e-03
-5.592300000000000049e-02
1.114599999999999966e-02
-5.689600000000000213e-02
-6.731000000000000004e-03
2.572999999999999940e-02
1.512000000000000106e-03
-3.237999999999999993e-03
-4.068999999999999700e-03
-1.234000000000000071e-03
2.378109999999999946e-01
-1.128000000000000096e-03
-3.534999999999999948e-03
-4.550000000000000008e-04
1.479999999999999925e-04
5.220000000000000031e-04
3.718099999999999877e-02
1.104580000000000006e-01
1.965000000000000167e-03
4.266999999999999960e-03
-5.140999999999999737e-03
1.648640000000000105e-01
1.776220000000000021e-01
1.922000000000000097e-03
3.250600000000000017e-02
4.402899999999999869e-02
-8.430999999999999259e-03
4.409999999999999858e-04
1.389999999999999905e-04
1.374209999999999876e-01
-2.431860000000000133e-01
-1.727000000000000019e-03
-2.280000000000000126e-04
8.100000000000000375e-05
-7.480999999999999803e-03
8.000000000000000654e-05
-3.939999999999999817e-04
1.441000000000000007e-03
-7.290000000000000473e-04
-3.663000000000000284e-02
-1.657999999999999969e-03
-8.369999999999999619e-04
-6.904999999999999680e-03
1.593100000000000072e-02
-3.393000000000000183e-03
1.495999999999999934e-03
-7.368999999999999682e-03
1.436199999999999977e-02
-1.319700000000000040e-02
-4.557000000000000287e-03
8.123700000000000365e-02
2.447399999999999923e-02
-1.295199999999999997e-02
-8.722100000000000686e-02
-5.232999999999999804e-03
-1.255940000000000112e-01
1.291999999999999963e-03
-1.382999999999999898e-03
4.989999999999999644e-03
1.508000000000000009e-03
2.304399999999999851e-02
2.819400000000000031e-02
3.119999999999999944e-04
-8.781999999999999876e-03
6.794500000000000539e-02
6.198999999999999649e-03
-2.058879999999999877e-01
9.219999999999999680e-04
-1.618800000000000100e-02
-3.415860000000000007e-01
-1.660999999999999933e-03
-4.889999999999999599e-04
1.759999999999999954e-04
-3.763999999999999985e-03
-6.566600000000000215e-02
-7.680000000000000195e-04
-1.231799999999999978e-01
7.047999999999999578e-03
1.425000000000000051e-02
-2.900799999999999906e-02
1.187499999999999944e-01
-1.449199999999999933e-01
-1.106999999999999911e-03
-1.557999999999999923e-03
-2.236999999999999839e-03
-7.270699999999999386e-02
-5.140000000000000254e-04
2.246999999999999865e-03
-1.778949999999999976e-01
1.669599999999999904e-02
-1.277799999999999943e-02
-2.379040000000000044e-01
-2.207999999999999893e-03
1.925000000000000062e-03
7.750100000000000044e-02
-5.004100000000000215e-02
1.704999999999999918e-03
3.272400000000000309e-02
1.957499999999999865e-02
-1.514620000000000133e-01
-3.288999999999999823e-03
-3.605699999999999877e-02
3.648999999999999900e-03
3.459799999999999681e-02
-1.859999999999999945e-04
3.300000000000000253e-05
3.000000000000000076e-06
9.999999999999999547e-07
4.800000000000000122e-05
1.361999999999999929e-03
-6.057300000000000184e-02
5.689999999999999529e-04
-5.000000000000000409e-06
2.984699999999999853e-02
6.999999999999999387e-05
4.600000000000000004e-05
-1.294499999999999991e-02
-2.318000000000000182e-03
-0.000000000000000000e+00
1.858200000000000129e-01
-5.969959999999999711e-01
6.000000000000000152e-06
The code I have tried to write is this:
from ast import Num
from cmath import sqrt
import math
import numpy as np
from numpy import append
f1= open('diagonal.txt', 'r')
k=f1.readlines()
#def mkstr(s):
# str1=" "
# return (str1.join(s))
#print(float(mkstr(k)))
#for j in f1:
#inverse_root(j)
#def mkstr2():
# f1=open("diagonal.txt", "r")
# k=f1.readlines()
# fl=[float(item) for item in k]
# inverse_root(fl)
#for j in mkstr2():
#inverse_root(j)
f=[float(x) for x in k]
sa=np.array(f)
ca=sa.astype(np.float)
def inverse_root(j):
    return [1/(sqrt(j))]
j=[inverse_root(ca)]
I am getting the output like this:
DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ca=sa.astype(np.float)
Traceback (most recent call last):
File "/home/gian-2018/prateek/python/read_file/27/5/rmwpd.py", line 34, in <module>
j=[inverse_root(ca)]
File "/home/gian-2018/prateek/python/read_file/27/5/rmwpd.py", line 33, in inverse_root
return [1/(sqrt(j))]
TypeError: only length-1 arrays can be converted to Python scalars
To get the inverse square root of each term in the file, I converted the list of strings into a list of floats, but I still get an error because the formula is applied to the whole list instead of to a single real number.
What I need is an inverse square root that works on an array: a new array in which each element is the inverse square root of the corresponding term of the original.
Please tell me either how to convert this list into real numbers, or how to compute the inverse square root of each term and save it into a new array.
Any other way of accomplishing this would also be welcome.
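One way to approach this, sketched under assumptions rather than offered as a definitive answer: cmath.sqrt only accepts a single number, which is what triggers the TypeError, whereas np.sqrt works element by element on an array. Because the file contains negative values, the sketch converts to a complex dtype first so that those terms give imaginary results instead of nan. The function name inverse_sqrt and the sample values are hypothetical.

```python
import numpy as np

def inverse_sqrt(values):
    """Return 1/sqrt(v) for every element, using complex arithmetic
    so that negative inputs yield imaginary results."""
    arr = np.asarray(values, dtype=complex)
    return 1.0 / np.sqrt(arr)

# With the real file, the values could be loaded with
# np.loadtxt('diagonal.txt'), which already returns floats.
sample = [4.0, 0.25, -4.0]
result = inverse_sqrt(sample)
print(result)
```

The file also contains a zero entry, for which the inverse square root is undefined, so that term would need separate handling (for example masking it out before dividing).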

Convert a sympy poly with imaginary powers to an mpmath mpc

I have a sympy poly that looks like:
Poly(0.764635937801645*I**4 + 7.14650839258644*I**3 - 0.667712176660315*I**2 - 2.81663805543677*I - 0.623299856233272, I, domain='RR')
I'm converting to mpc using the following code:
a = val.subs('I',1.0j)
b = sy.re(a)
c = sy.im(a)
d = mpmath.mpc(b,c)
Two questions.
Assuming my mpc and sympy type have equal precision (of eg 100 dps) is there a precision loss using this conversion from a to d?
Is there a better way to convert?
Aside: sympy seems to treat I just like a symbol here. How do I get sympy to simplify this polynomial?
Edit: I've also noticed that the following works in place of a above:
a = val.args[0]

Strings and expressions
The root cause of the issue can be seen in val.subs('I', 1.0j): you appear to pass strings as arguments to SymPy functions. There are some valid uses for this (such as creation of high-precision floats), but when symbols are concerned, using strings is a recipe for confusion. The string 'I' gets implicitly converted to the SymPy expression Symbol('I'), which is different from the SymPy expression I. So the answer to
How do I get sympy to simplify this polynomial?
is to revisit the process of creation of that polynomial, and fix that. If you really need to create it from a string, then use locals parameter:
>>> S('3.3*I**2 + 2*I', locals={'I': I})
-3.3 + 2*I
Polynomials and expressions
If the Poly structure is not needed, use the method as_expr() of Poly to get an expression from it.
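To illustrate the two points above, here is a small sketch, assuming a toy polynomial with integer coefficients (the names fake_i, poly, and value are made up for the example). It shows that Symbol('I') is not the imaginary unit, and that as_expr() followed by a substitution of the real I lets SymPy simplify the powers on its own.

```python
from sympy import I, Poly, Symbol

# A plain symbol that merely prints as "I"; it is not the imaginary unit.
fake_i = Symbol('I')
print(fake_i == I)

# A toy polynomial in that symbol, standing in for the one in the question.
poly = Poly(2*fake_i**2 + 3*fake_i + 1, fake_i)

# Drop the Poly wrapper, then swap in the real imaginary unit.
expr = poly.as_expr()
value = expr.subs(fake_i, I)
print(value)
```

Once the expression contains the real I, powers such as I**2 are evaluated automatically, so no manual splitting is needed.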
Conversion to mpmath and precision loss
is there a precision loss using this conversion from a to d?
Yes, splitting into real and imaginary parts and then recombining can lead to precision loss. Pass a SymPy object directly to mpc if you know it's a complex number, or to mpmathify if you want mpmath to decide what type it should have. An example:
>>> val = S('1.111111111111111111111111111111111111111111111111')*I**3 - 2
>>> val
-2 - 1.111111111111111111111111111111111111111111111111*I
>>> import mpmath
>>> mpmath.mp.dps = 40
>>> mpmath.mpc(val)
mpc(real='-2.0', imag='-1.111111111111111111111111111111111111111111')
>>> mpmath.mpmathify(val)
mpc(real='-2.0', imag='-1.111111111111111111111111111111111111111111')
>>> mpmath.mpc(re(val), im(val))
mpc(real='-2.0', imag='-1.111111111111111111111111111111111111111114')
Observations:
When I is the actual imaginary unit, I**3 evaluates to -I; you don't have to do anything for that to happen.
A string representation of a high-precision decimal is used to create such a float in SymPy. Here S stands for sympify. One can also be more direct and use Float('1.1111111111111111111111111').
Direct conversion of a SymPy complex number to an mpmath complex number is preferable to splitting into real and imaginary parts and recombining.
Conclusion
Most of the above is just talking around an XY problem. Your expression with I was not what you think it was, so you tried to do strange things that were not needed, and my answer is mostly a waste of time.

I'm adding my own answer here, as FTP's answer, although relevant and very helpful, did not (directly) resolve my issue (which wasn't that clear from the question tbh). When I ran the code in his example I got the following:
>>> from sympy import *
>>> import mpmath
>>> val = S('1.111111111111111111111111111111111111111111111111')*I**3 - 2
>>> val
-2 - 1.111111111111111111111111111111111111111111111111*I
>>> mpmath.mp.dps = 40
>>> mpmath.mpc(val)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 373, in __new__
real = cls.context.mpf(real)
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 77, in __new__
v._mpf_ = mpf_pos(cls.mpf_convert_arg(val, prec, rounding), prec, rounding)
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 96, in mpf_convert_arg
raise TypeError("cannot create mpf from " + repr(x))
TypeError: cannot create mpf from -2 - 1.111111111111111111111111111111111111111111111111*I
>>> mpmath.mpmathify(val)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 662, in convert
return ctx._convert_fallback(x, strings)
File "C:\Python27\lib\site-packages\mpmath\ctx_mp.py", line 614, in _convert_fallback
raise TypeError("cannot create mpf from " + repr(x))
TypeError: cannot create mpf from -2 - 1.111111111111111111111111111111111111111111111111*I
>>> mpmath.mpc(re(val), im(val))
mpc(real='-2.0', imag='-1.111111111111111111111111111111111111111114')
>>> mpmath.mpmathify(val)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\mpmath\ctx_mp_python.py", line 662, in convert
return ctx._convert_fallback(x, strings)
File "C:\Python27\lib\site-packages\mpmath\ctx_mp.py", line 614, in _convert_fallback
raise TypeError("cannot create mpf from " + repr(x))
TypeError: cannot create mpf from -2 - 1.111111111111111111111111111111111111111111111111*I
Updating my sympy (1.0->1.1.1) and mpmath (0.19->1.0.0) fixed the exceptions. I did not test which of these upgrades actually resolved the issue.

ValueError in Python 3 code

I have this code that counts the number of missing rows of numbers within a CSV file, for a script in Python 3.6. However, the program raises the following error:
Error:
Traceback (most recent call last):
File "C:\Users\GapReport.py", line 14, in <module>
EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
File "C:\Users\GapReport.py", line 14, in <genexpr>
EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
ValueError: invalid literal for int() with base 10: 'AC-SEC 000000001'
Code:
import csv
def out(*args):
    print('{},{}'.format(*(str(i).rjust(4, "0") for i in args)))
prev = 0
data = csv.reader(open('Padded Numbers_export.csv'))
print(*next(data), sep=', ') # header
for line in data:
    EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
    if start != prev+1:
        out(prev+1, start-1)
    prev = end
    out(start, end)
I'm stumped on how to fix these issues. Also, I think the CSV has many lines in it, so if there's a way to limit it to a few numbers, please let me know.
CSV Snippet (Sorry if I wasn't clear before!):

The values you have in your CSV file are not numeric.
For example, FMAC-SEC 000000001 is not a number. So when you run int(s.strip()[2:]), it is not able to convert it to an int.
Some more comments on the code:
What is the purpose of EndDoc_Padded, EndDoc_Padded = (...)? You are assigning values to two variables with the same name, so the second assignment overwrites the first. Either rename one of them, or use a single variable. Note also that the rest of the loop uses start and end, which are never defined.
Are you trying to get two different values, one from each column? Since csv.reader already splits each row into a list of fields, line is a list with one string per column. If the file uses a separator other than a comma, pass it to csv.reader through the delimiter argument.
You are running this inside a loop, so the two variables are overwritten on every iteration and end up holding only the values from the last line. If you're trying to obtain two lists of all the values, this won't work.
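As a rough sketch of how the numeric part could be extracted and the gaps reported, assuming each cell ends in a run of digits (as in 'FMAC-SEC 000000001') and each row holds a start and an end value: the helper names trailing_number and find_gaps, and the two-column layout, are assumptions, since the CSV itself is not shown.

```python
import re

def trailing_number(cell):
    """Return the run of digits at the end of a cell as an int."""
    match = re.search(r"(\d+)\s*$", cell)
    if match is None:
        raise ValueError("no trailing digits in {!r}".format(cell))
    return int(match.group(1))

def find_gaps(ranges):
    """Given (start, end) pairs in ascending order, return the
    (first_missing, last_missing) pairs between them, counting from 1."""
    gaps = []
    prev_end = 0
    for start, end in ranges:
        if start > prev_end + 1:
            gaps.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)
    return gaps

# Hypothetical rows, shaped like what csv.reader would yield.
rows = [["FMAC-SEC 000000001", "FMAC-SEC 000000003"],
        ["FMAC-SEC 000000005", "FMAC-SEC 000000006"]]
ranges = [(trailing_number(a), trailing_number(b)) for a, b in rows]
print(find_gaps(ranges))
```

Using a regular expression instead of the fixed slice [2:] avoids depending on the exact length of the prefix.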

Biopython Array Addition Error (Open for all)

Let me explain the setup first. This code uses the Biopython module. I will explain the details needed to solve the problem in case you are not familiar with the module.
The code is:
#!/usr/bin/python
from Bio.PDB.PDBParser import PDBParser
import numpy as np
parser=PDBParser(PERMISSIVE=1)
structure_id="mode_7"
filename="mode_7.pdb"
structure=parser.get_structure(structure_id, filename)
model1=structure[0]
s=(124,3)
newc=np.zeros(s,dtype=np.float32)
coord=[]
#for chain1 in model1.get_list():
# for residue1 in chain1.get_list():
# ca1=residue1["CA"]
# coord1=ca1.get_coord()
# newc.append(coord1)
for i in range(0,29):
    model=structure[i]
    for chain in model.get_list():
        for residue in chain.get_list():
            ca=residue["CA"]
            coord.append(ca.get_coord())
    newc=np.add(newc,coord)
print newc
print "END"
PDB file is the protein data bank file. The file I'm working with can be downloaded from https://drive.google.com/open?id=0B8oUhqYoEX6YVFJBTGlNZGNBdlk
If you remove the hashes from the first for loop, you'll find that get_coord() returns a (124,3) array with dtype float32. Likewise, the next for loop is supposed to return the same.
It gives out a strange error:
Traceback (most recent call last):
File "./average.py", line 27, in <module>
newc=np.add(newc,coord)
ValueError: operands could not be broadcast together with shapes (124,3) (248,3)
I have no idea how it ends up with a (248,3) array. I just want to add the coord array from each model to a running total. I tried another modification of the code:
#!/usr/bin/python
from Bio.PDB.PDBParser import PDBParser
import numpy as np
parser=PDBParser(PERMISSIVE=1)
structure_id="mode_7"
filename="mode_7.pdb"
structure=parser.get_structure(structure_id, filename)
model1=structure[0]
s=(124,3)
newc=np.zeros(s,dtype=np.float32)
coord=[]
newc2=[]
#for chain1 in model1.get_list():
# for residue1 in chain1.get_list():
# ca1=residue1["CA"]
# coord1=ca1.get_coord()
# newc.append(coord1)
for i in range(0,29):
    model=structure[i]
    for chain in model.get_list():
        for residue in chain.get_list():
            ca=residue["CA"]
            coord.append(ca.get_coord())
    newc2=np.add(newc,coord)
print newc
print "END"
It gives the same error. Can you help?

I'm not sure I fully understand what you're doing, but it looks like you need to reset the coord list at the start of every iteration:
for i in range(0,29):
    coord = []
    model=structure[i]
    for chain in model.get_list():
        for residue in chain.get_list():
            ca=residue["CA"]
            coord.append(ca.get_coord())
    newc=np.add(newc,coord)
If you keep appending without clearing the list, you add another 124 items to coord at every iteration of the outer loop, so it holds 248 items by the second iteration. That is where the exception you see is raised.
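The effect of resetting the list can be checked without a PDB file. The following is a minimal sketch in which random arrays stand in for the per-model coordinates that Biopython would return; frames, n_models, and n_atoms are made-up names and sizes, not part of the original script.

```python
import numpy as np

n_models, n_atoms = 3, 124
rng = np.random.default_rng(0)

# Stand-in for structure[i]: one (124, 3) float32 array per model.
frames = [rng.random((n_atoms, 3)).astype(np.float32)
          for _ in range(n_models)]

total = np.zeros((n_atoms, 3), dtype=np.float32)
for frame in frames:
    coord = []                 # reset for every model
    for xyz in frame:          # one 3-vector per atom
        coord.append(xyz)
    total = np.add(total, coord)

# Dividing by the number of models gives the average position per atom.
mean_coords = total / n_models
print(total.shape, mean_coords.shape)
```

Moving coord = [] back above the outer loop makes the list grow to 248 rows on the second pass, which reproduces the broadcast error from the question.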

receiving "ValueError: setting an array element with a sequence."

I've been having some problems with this code, trying to end up with an inner product of two 1-D arrays. The code of interest looks like this:
def find_percents(i):
    percents=[]
    median=1.5/(6+2*int(i/12))
    b=2*median
    m=b/(7+2*int(i/12))
    for j in xrange (1,6+2*int(i/12)):
        percents.append(float((b-m*j)))
    percentlist=numpy.asarray(percents, dtype=float)
    #print percentlist
    total=sum(percentlist)
    return total, percentlist
def playerlister(i):
    players=[]
    for i in xrange(i+1,i+6+2*int(i/12)):
        position=sheet.cell(i,2)
        points=sheet.cell(i,24)
        if re.findall('RB', str(position.value)):
            vbd=points.value-rbs[24]
            players.append(vbd)
        else:
            pass
    playerlist=numpy.asarray(players, dtype=float)
    return playerlist
def others(i,percentlist,playerlist,total):
    alternatives=[]
    playerlist=playerlister(i)
    percentlist=find_percents(i)
    players=numpy.dot(playerlist,percentlist)
I am receiving the following error in response to the very last line of this attached code:
ValueError: setting an array element with a sequence.
In most other examples of this error, I have found the error to be because of incorrect data types in the arrays percentlist and playerlist, but mine should be float type. If it helps at all, I call these functions a little later in the program, like so:
for i in xrange(1,30):
    total, percentlist= find_percents(i)
    playerlist= playerlister(i)
    print type(playerlist[i])
    draft_score= others(i,percentlist,playerlist,total)
Can anyone help me figure out why I am setting an array element with a sequence? Please let me know if any more information might be helpful! Also, for clarity, playerlister uses the xlrd module to extract data from a spreadsheet, but the data are numerical, and testing has shown that the elements of both lists have type numpy.float64.
The shape and contents of each of these for one iteration of i are:
<type 'numpy.float64'>
(5,)
[ 73.7 -94.4 140.9 44.8 130.9]
(5,)
[ 0.42857143 0.35714286 0.28571429 0.21428571 0.14285714]

Your function find_percents returns a two-element tuple.
When you call it in others, you are binding that tuple to the variable named percentlist, which you then try to use in a dot-product.
My guess is that unpacking the tuple inside others fixes it:
def others(i,percentlist,playerlist,total):
    playerlist = playerlister(i)
    _, percentlist = find_percents(i)
    players = numpy.dot(playerlist,percentlist)
provided, of course, that playerlist and percentlist always have the same number of elements (which we can't check because of the missing spreadsheet).
To verify, the following gives you the exact error message and the minimum of code needed to reproduce it:
>>> import numpy as np
>>> a = np.arange(5)
>>> np.dot(a, (2, a))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
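For comparison, here is a sketch of the working case in Python 3, assuming the player values printed in the question stand in for the spreadsheet data. find_percents is condensed from the original (using range and integer division) but keeps the same formula; it is an illustrative rewrite, not the asker's exact code.

```python
import numpy as np

def find_percents(i):
    """Return (total, percentlist), following the formula in the question."""
    n = 6 + 2 * (i // 12)
    b = 2 * (1.5 / n)
    m = b / (n + 1)
    percentlist = np.array([b - m * j for j in range(1, n)], dtype=float)
    return percentlist.sum(), percentlist

# Values copied from the question's printed output for one iteration.
playerlist = np.array([73.7, -94.4, 140.9, 44.8, 130.9])

# Unpack the tuple so that only the array reaches numpy.dot.
total, percentlist = find_percents(1)
score = np.dot(playerlist, percentlist)
print(percentlist.shape, score)
```

With the tuple unpacked, both operands are one-dimensional arrays of length 5 and the dot product is a single number.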
