I am trying to use the round() function in Databricks to round some float values to 2 decimal places. However, Python on Databricks does not behave like normal Python here.
Please help me with the reasons and solutions, if any.
lis = [-12.1334, 12.23433, 1.2343, -104.444]
lis2 = [round(val,2) for val in lis]
print(lis2)
TypeError: Invalid argument, not a string or column: -12.1334 of type <type 'float'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
This is only reproducible when you import Spark's round function from pyspark.sql.functions.
The Spark round function requires a string or a column, which explains the error.
You can either alias the import, such as import pyspark.sql.functions as F instead of from pyspark.sql.functions import *, or restore the original built-in round this way:
import builtins
round = getattr(builtins, "round")
And then you can execute
lis = [-12.1334, 12.23433, 1.2343, -104.444]
lis2 = [round(val, 2) for val in lis]
print(lis2)
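As a variation on the fix above, the built-in can also be called through the builtins module without rebinding the name. The sketch below does not import pyspark; the local round function is a hypothetical stand-in that only imitates the shadowing caused by a star import.

```python
import builtins


def round(col, scale=0):
    """Hypothetical stand-in for pyspark.sql.functions.round after a star import."""
    raise TypeError("Invalid argument, not a string or column: %r" % (col,))


# The module-level name no longer refers to the built-in function.
shadowed = round is not builtins.round

values = [-12.1334, 12.23433, 1.2343, -104.444]
# Calling through the builtins module sidesteps the shadowed name entirely.
rounded = [builtins.round(value, 2) for value in values]
print(shadowed, rounded)
```

The identity check (round is not builtins.round) is a quick way to confirm whether the name has been overridden in the current session.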
Good day. The problem is most likely a namespace conflict. You probably ran something like
from pyspark.sql.functions import *
This module contains a function named round. You can easily see which round is in use by running help on it:
help(round)
An easy fix is to import the pyspark functions under their own namespace:
import pyspark.sql.functions as F
lis = [-12.1334, 12.23433, 1.2343, -104.444]
lis2 = [round(val,2) for val in lis]
print(lis2)
[-12.13, 12.23, 1.23, -104.44]
Try this:
lis = [-12.1334, 12.23433, 1.2343, -104.444]
list_em = []
for row in lis:
    list_em.append(round(row,2))
print(list_em)
[-12.13, 12.23, 1.23, -104.44]
I believe this is the source code for the function you are applying:
def round(col, scale=0):
    """
    Round the given value to `scale` decimal places using HALF_UP rounding mode if `scale` >= 0
    or at integral part when `scale` < 0.
    >>> spark.createDataFrame([(2.5,)], ['a']).select(round('a', 0).alias('r')).collect()
    [Row(r=3.0)]
    """
    sc = SparkContext._active_spark_context
    return Column(sc._jvm.functions.round(_to_java_column(col), scale))
Clearly it says to pass in a column, not a decimal number. Did you import *? That could have overridden the builtin function.
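The docstring above also mentions HALF_UP rounding, which differs from the built-in round (ties go to the nearest even digit). A minimal sketch of the difference in plain Python, using the decimal module to imitate HALF_UP; round_half_up is a hypothetical helper, not part of pyspark, and it only handles scale >= 0.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, scale=0):
    """Round ties away from zero at `scale` decimal places (scale >= 0 only)."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal('0.01')
    # Going through str() rounds the decimal text of the float, not its binary value.
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


builtin_result = round(2.5)          # built-in: ties go to the even neighbour
half_up_result = round_half_up(2.5)  # HALF_UP: ties go away from zero
print(builtin_result, half_up_result)
```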
I am trying to identify the imported module that a variable in a script ultimately depends on.
Example:
import module1
import module2
x = module1.func()
y = x.func()
Given y, my output should be y -> x -> module1, along with the respective line numbers.
Can someone help me figure this out? I have just started exploring this with ast.
Just using ast is not going to be helpful in a lot of situations. You
need to include some kind of (simulated) execution, akin to exec().
For example, what can the ast tell from this Python program, which outputs
a string of digits? The identifier "digits" seems to come out of thin air
in the syntax tree.
exec('from string import digits')
mystr = digits
print(mystr)
And, to support the argument of execution needed, the following program
would have to result in output nums -> int, but just using a syntax tree, it
would probably output nums -> string | int.
import string
nums = string.digits
if True:
    nums = 0
print(nums)
Now, what you are asking can still be done, but rather than solve it with
abstract syntax trees, you would have more success using trace as a
starting point, and enhance the data collected in the various tracing calls.
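For the simple straight-line case in the question, a purely syntactic pass can still produce the requested chain. The following is a rough sketch with the limitations described above: it only looks at top-level import and assignment statements, lets the last assignment win, and ignores control flow, exec and star imports. The names SOURCE, root_name and dependency_chain are made up for this illustration.

```python
import ast

SOURCE = """\
import module1
import module2
x = module1.func()
y = x.func()
"""


def root_name(node):
    """Return the leftmost name in a call/attribute/subscript chain, or None."""
    while isinstance(node, (ast.Call, ast.Attribute, ast.Subscript)):
        node = node.func if isinstance(node, ast.Call) else node.value
    return node.id if isinstance(node, ast.Name) else None


def dependency_chain(source, target):
    """Follow `target` back through top-level assignments to an import."""
    origin = {}  # name -> (name it was derived from or None, line number)
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Import):
            for alias in stmt.names:
                bound = alias.asname or alias.name.split(".")[0]
                origin[bound] = (None, stmt.lineno)
        elif isinstance(stmt, ast.Assign):
            derived_from = root_name(stmt.value)
            for tgt in stmt.targets:
                if isinstance(tgt, ast.Name):
                    origin[tgt.id] = (derived_from, stmt.lineno)
    chain, seen, name = [], set(), target
    while name in origin and name not in seen:
        seen.add(name)
        derived_from, lineno = origin[name]
        chain.append((name, lineno))
        name = derived_from
    return chain


chain = dependency_chain(SOURCE, "y")
print(" -> ".join(f"{name} (line {lineno})" for name, lineno in chain))
```

Anything dynamic, such as the exec example above, is invisible to this approach, which is why tracing remains the more robust starting point.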
Use the __name__ attribute:
>>> import numpy
>>> x = numpy
>>> y = x.__name__
>>> y
'numpy'
Edit:
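import numpy
arr = []
def a(strofwanted, library, varname):
    # Record which library the variable came from, then evaluate the expression.
    # eval() returns the value of the expression; exec() would always return None.
    arr.append([varname, library])
    return eval(strofwanted)
x = a("numpy.array([1, 2])", "numpy", "x")
>>> x
array([1, 2])
>>> arr
[['x', 'numpy']]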
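For objects that are not modules themselves, a related lookup is the __module__ attribute of the object or of its class. A small sketch using only the standard library; origin_module is a hypothetical helper name, and the result is the module where something was defined, which for some libraries is a private submodule rather than the name you imported.

```python
import math
import types
from fractions import Fraction


def origin_module(obj):
    """Best-effort name of the module an object comes from."""
    if isinstance(obj, types.ModuleType):
        return obj.__name__
    # Functions and classes carry __module__ themselves; instances fall back to their class.
    return getattr(obj, "__module__", None) or type(obj).__module__


print(origin_module(math))            # a module object
print(origin_module(math.sqrt))       # a function defined in a module
print(origin_module(Fraction(1, 2)))  # an instance of a class from a module
```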
I am using sympy to differentiate a function in Python. After differentiating the function, I would like to substitute in the numerical value of the variable that I differentiated with respect to. However, using .subs() does not return a different answer. Does anyone have an idea as to what my issue is?
Code:
CA1 = CA0 * sympy.exp(-(A1*sympy.exp(-E1/(R*T)))*t)
dCa_dA12 = diff(CA1, A1)
print("No substitution:", dCa_dA12)
dCa_1 = dCa_dA12.subs(A1, theta[0])
print("Substitution:", dCa_1)
Output:
I had the same problem and was tinkering around a bit:
This works:
>>> sympify("k").evalf(subs={"k":1})
1.00000000000000
This doesn't work:
>>> sympify("k+x").evalf(subs={"k":1})
k + x
This again works:
>>> sympify("k+x").evalf(subs={"k":1, "x":2})
3.00000000000000
So it seems the substitution doesn't work if the result is not a number. Strangely, this only applies to the subs part:
>>> sympify("2/3*x")
2*x/3
>>> sympify("2/3*x").evalf()
0.666666666666667*x
This looks like a bug to me. At least, it should be documented properly.
In my Python script for the line
result = sp.solve(eqn)
I get the following output in the format similar to
result = 0.0100503358535014*I
I gather this means the solution is a complex number. So in this case it could also be seen as result = 0.01j.
I want to add formatting to my code to change result = 0.0100503358535014*I to something more like result = 0.01j. However I am finding issues trying to do this as I was trying to use isinstance to check if result was complex
if isinstance(result, complex):
    print "Result is complex!"
    ... my code here...
However, this branch is never entered (i.e. 0.0100503358535014*I isn't classified as complex here).
How should I write an if statement that correctly checks whether result has the form xxxxxxx*I?
SymPy supports Python's built-in function complex():
>>> import sympy
>>> x = sympy.sqrt(-2)
>>> x
sqrt(2)*I
>>> complex(x)
1.4142135623730951j
There are some examples in http://docs.sympy.org/latest/modules/evalf.html
A Python complex number can be formatted similar to a float:
>>> format(complex(x), 'e')
'0.000000e+00+1.414214e+00j'
>>> format(complex(x), 'f')
'0.000000+1.414214j'
>>> format(complex(x), 'g')
'0+1.41421j'
If you want to format the real and imag parts separately you can do it yourself.
The conversion would raise a TypeError if it can't be done:
>>> complex(sympy.Symbol('x'))
Traceback (most recent call last):
TypeError: can't convert expression to float
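Formatting the real and imaginary parts separately, as mentioned above, could look roughly like this once the value has been converted with complex(). format_complex is a hypothetical helper, and dropping a zero real part is just one possible convention.

```python
def format_complex(z, digits=2):
    """Format a complex number with a fixed number of decimals per part."""
    z = complex(z)
    if z.real == 0:
        # Purely imaginary: show only the imaginary part.
        return f"{z.imag:.{digits}f}j"
    # The '+' flag forces an explicit sign on the imaginary part.
    return f"{z.real:.{digits}f}{z.imag:+.{digits}f}j"


print(format_complex(0.0100503358535014j))
print(format_complex(complex(3.0, -2.67893455)))
```

With a SymPy result, the same function should work after passing the value through complex() first, as in the session above.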
Here's an alternative which incidentally indicates how to check whether one or both parts of a complex number are available.
>>> from sympy import *
>>> var('x')
x
>>> expr = (x+0.012345678*I)*(x-0.2346678934)*(x-(3-2.67893455*I))
>>> solve(expr)
[0.234667893400000, -0.012345678*I, 3.0 - 2.67893455*I]
>>> roots = solve(expr)
>>> for root in roots:
...     r, i = root.as_real_imag()
...     '%10.3f %10.3f i' % (r,i)
...
'     0.235      0.000 i'
'     0.000     -0.012 i'
'     3.000     -2.679 i'
You could check the sign of the imaginary part to decide whether to put a plus sign in. I would have liked to use the newer str.format-style formatting, but fell afoul of the bug that comes with Py3.4+, mentioned in "Python TypeError: non-empty format string passed to object.__format__", for which I have no remedy.
The following checks whether the sympy object result represents a complex number and evaluates its value to 2 significant digits.
import sympy as sp
result = 0.0100503358535014*sp.I
print(result.is_complex)
print(sp.N(result,2))
True
0.01*I
I am trying to give an array as input and expect an array as output for the following code.
from sympy import symbols
from sympy.utilities.lambdify import lambdify
import os
from sympy import *
import numpy as np
text=open('expr.txt','r')
expr=text.read()
x,param1,param2=symbols('x param1 param2')
params=np.array([param1,param2])
T=lambdify((x,params),expr,modules='numpy')
data=np.genfromtxt('datafile.csv',delimiter=',')
print T(data[0],[0.29,4.5])
text.close()
But get the following error.
TypeError: <lambda>() takes exactly 3 arguments (13 given)
How do I tell sympy that it's a single array? Thanks in advance.
1. Solution:
Your problem is that the function T expects a single value, but you are passing it a list. Try this instead of print T(data[0],[0.29,4.5]) to get a list of results:
print [T(val,[0.29,4.5]) for val in data[0]]
Or use a wrapper function:
def arrayT(array, params):
    return [T(val, params) for val in array]
print arrayT(data[0], [0.29, 4.5])
2. Solution: You have to change your mathematical expression. Somehow sympy doesn't work with a list of lists here, so try this:
expr = "2*y/z*(x**(z-1)-x**(-1-z/2))"
T=lambdify((x,y,z),expr,'numpy')
print T(data[0], 0.29, 4.5)
So, my code is like this:
def func(s, x):
    # str(x) is needed because str.replace() only accepts strings
    return eval(s.replace('x', str(x)))
# Example:
>>> func('x**2 + 3*x', 1)
4
The first argument of the function func must be a string because the function eval accepts only string or code objects. However, I'd like to use this function in a kind of calculator, where the user types for example 2 + sin(2*pi-0.15) + func(1.8*x-32,273) and gets the answer of the expression, and it's annoying always to have to write the quotes before in the expression inside func().
Is there a way to make Python understand that the s argument is always a string, even when it's not between quotes?
No, it is not possible. You can't intercept the Python interpreter before it parses and evaluates 1.8*x-32.
Using eval as a glorified calculator is a highly questionable idea. The user could pass in all kinds of malicious Python code. If you're going to do it, you should provide as minimal an environment as possible for the code to run in. Pass in your own globals dict containing only the variables the user is allowed to reference.
return eval(s, {'x': x})
Besides being safer, this is also a better way to substitute x into the expression.
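A slightly fuller sketch of the restricted-globals idea, assuming the calculator should expose only a handful of math names. The ALLOWED table is an arbitrary choice for illustration. Emptying __builtins__ blocks the obvious __import__ call, but it is not a real sandbox: attribute access on ordinary objects can still be abused.

```python
import math

# Arbitrary whitelist for this illustration; extend as needed.
ALLOWED = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt, "pi": math.pi}


def func(s, x):
    """Evaluate expression string s with x bound, exposing only whitelisted names."""
    env = {"__builtins__": {}, "x": x}
    env.update(ALLOWED)
    return eval(s, env)


result = func("x**2 + 3*x", 1)

try:
    func("__import__('os').getcwd()", 0)
    blocked = False
except NameError:
    # __import__ is not reachable because the builtins were emptied.
    blocked = True

print(result, blocked)
```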
You could have it handle both cases:
def func(s, x=0):
    if isinstance(s, basestring):
        # x is in the scope, so you don't need to replace the string
        return eval(s)
    else:
        return s
And the output:
>>> from math import *
>>> func('2 + sin(2*pi-0.15) + func(1.8*x-32,273)')
-30.1494381324736
>>> func('x**2 + 3*x', 1)
4
Caution: eval can do more than just add numbers. I can type __import__('os').system('rm /your/homework.doc') and your calculator will delete your homework.
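One way to avoid eval altogether is to parse the expression with the ast module and evaluate only the node types a calculator needs. This is a sketch, not a hardened implementation; safe_eval and the operator and function tables are assumptions made for this example, and huge exponents can still consume a lot of CPU and memory.

```python
import ast
import math
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}
CONSTANTS = {"pi": math.pi, "e": math.e}


def safe_eval(expression, **variables):
    """Evaluate an arithmetic expression; anything unexpected raises ValueError."""
    names = {**CONSTANTS, **variables}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS and not node.keywords):
            return FUNCTIONS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError("unsupported expression: " + ast.dump(node))

    return walk(ast.parse(expression, mode="eval"))


value = safe_eval("x**2 + 3*x", x=1)

try:
    safe_eval("__import__('os').system('echo hi')")
    rejected = False
except ValueError:
    rejected = True

print(value, rejected)
```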
In a word: no, if I understand you.
In a few more, you can sort of get around the problem by making x be a special object. This is how the Python math library SymPy works. For example:
>>> from sympy import Symbol
>>> x = Symbol('x')
>>> x**2+3*x
x**2 + 3*x
>>> (x**2+3*x).subs(x,1)
4
There's even a handy function to turn strings into sympy objects:
>>> from sympy import sympify, pi
>>> sympify("x**2 - sin(x)")
x**2 - sin(x)
>>> _.subs(x, pi)
pi**2
All the warnings about untrusted user input hold. [I'm too lazy to check whether or not eval or exec is used on the sympify code path, and as they say, every weapon is loaded, even the unloaded ones.]
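The "special object" idea can also be imitated without SymPy by overloading the arithmetic operators so that an expression in x builds a deferred computation instead of a number. This is a toy sketch: Expr is a hypothetical class written for this illustration and covers only a few operators.

```python
import operator


class Expr:
    """Deferred one-variable expression; call it with a value to evaluate."""

    def __init__(self, evaluate):
        self._evaluate = evaluate

    def __call__(self, value):
        return self._evaluate(value)

    @staticmethod
    def _lift(obj):
        # Wrap plain numbers so both operands can be evaluated the same way.
        return obj if isinstance(obj, Expr) else Expr(lambda _value: obj)

    def _combine(self, other, op, reflected=False):
        left, right = self, Expr._lift(other)
        if reflected:
            left, right = right, left
        return Expr(lambda value: op(left(value), right(value)))

    def __add__(self, other): return self._combine(other, operator.add)
    def __radd__(self, other): return self._combine(other, operator.add, True)
    def __sub__(self, other): return self._combine(other, operator.sub)
    def __rsub__(self, other): return self._combine(other, operator.sub, True)
    def __mul__(self, other): return self._combine(other, operator.mul)
    def __rmul__(self, other): return self._combine(other, operator.mul, True)
    def __pow__(self, other): return self._combine(other, operator.pow)
    def __rpow__(self, other): return self._combine(other, operator.pow, True)


x = Expr(lambda value: value)  # plays the role of the symbol x


def func(expr, value):
    """Evaluate a deferred expression at the given value of x."""
    return expr(value)


print(func(x**2 + 3*x, 1))
print(func(1.8*x - 32, 273))
```

With this in place the first argument can be written without quotes, because 1.8*x - 32 now produces an Expr rather than being computed immediately. Functions such as sin would need wrappers of their own.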
You can write an interpreter:
import code
def readfunc(prompt):
    raw = input(prompt)
    if raw.count(',')!=1:
        print('Bad expression: {}'.format(raw))
        return ''
    s, x = raw.split(',')
    return '''x={}; {}'''.format(x, s)
code.interact('Calc 0.1', readfunc)