I would like to know how to convert a sympy value with a physical unit into the same unit with another prefix. For example:
>>> import sympy.physics.units as u
>>> x = 0.001 * u.kilogram
>>> x
0.001kg
should be converted to grams. The approach I have taken so far is very bloated and delivers a wrong result.
>>> x / u.kilogram * u.gram
1.0e-6kg
It should be 1g instead.
If you can accept printing 1.0 instead of 1g, you could just use division:
>>> x / u.g
1.0
Otherwise, you should switch to sympy.physics.unitsystems:
>>> from sympy.physics.unitsystems import Quantity
>>> from sympy.physics.unitsystems.systems import mks
>>> Quantity(0.001, mks['kg'])
0.001kg
>>> _.convert_to(mks['g'])
1g
Newer SymPy versions provide convert_to directly in sympy.physics.units:
>>> u.convert_to(x, u.gram)
1.0*gram
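The same call handles any prefix; a quick sketch (assuming SymPy >= 1.1, where convert_to and milligram live in sympy.physics.units):
>>> from sympy.physics.units import convert_to, milligram
>>> convert_to(x, milligram)
1000.0*milligram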
Related
I am having problems reading probabilities from CSV using pandas.read_csv; some of the values are read as floats strictly greater than 1.0.
Specifically, I am confused about the following behavior:
>>> pandas.read_csv(io.StringIO("column\n0.99999999999999998"))["column"][0]
1.0
>>> pandas.read_csv(io.StringIO("column\n0.99999999999999999"))["column"][0]
1.0000000000000002
>>> pandas.read_csv(io.StringIO("column\n1.00000000000000000"))["column"][0]
1.0
>>> pandas.read_csv(io.StringIO("column\n1.00000000000000001"))["column"][0]
1.0
>>> pandas.read_csv(io.StringIO("column\n1.00000000000000008"))["column"][0]
1.0
>>> pandas.read_csv(io.StringIO("column\n1.00000000000000009"))["column"][0]
1.0000000000000002
The default float-parsing behavior appears to be non-monotonic; in particular, some values starting with 0.9... are converted to floats strictly greater than 1.0, which causes problems, e.g., when feeding them into sklearn.metrics.
The documentation states that read_csv has a parameter float_precision that can be used to select “which converter the C engine should use for floating-point values”, and setting this to 'high' indeed solves my problem.
However, I would like to understand the default behavior:
Where can I find the source code of the default float converter?
Where can I find documentation on the intended behavior of the default float converter and the other possible choices?
Why does a single-figure change in the least significant position skip a value?
Why does this behave non-monotonically at all?
Edit regarding “duplicate question”: This is not a duplicate. I am aware of the limitations of floating-point math. I was specifically asking about the default parsing mechanism in Pandas, since the builtin float does not show this behavior:
>>> float("0.99999999999999999")
1.0
...and I could not find documentation.
@MaxU already showed the source code for the parser and the relevant tokenizer xstrtod, so I'll focus on the "why" part:
The code for xstrtod is roughly like this (translated to pure Python):
def xstrtod(p):
    """Simplified Python translation of pandas' default C tokenizer."""
    number = 0.
    idx = 0
    ndecimals = 0

    # accumulate the integer part
    while idx < len(p) and p[idx].isdigit():
        number = number * 10. + int(p[idx])
        idx += 1

    # skip the decimal point
    idx += 1

    # accumulate the fractional digits into the same float
    while idx < len(p) and p[idx].isdigit():
        number = number * 10. + int(p[idx])
        idx += 1
        ndecimals += 1

    # a single division at the end
    return number / 10**ndecimals
This reproduces the "problem" you saw:
print(xstrtod('0.99999999999999997')) # 1.0
print(xstrtod('0.99999999999999998')) # 1.0
print(xstrtod('0.99999999999999999')) # 1.0000000000000002
print(xstrtod('1.00000000000000000')) # 1.0
print(xstrtod('1.00000000000000001')) # 1.0
print(xstrtod('1.00000000000000002')) # 1.0
print(xstrtod('1.00000000000000003')) # 1.0
print(xstrtod('1.00000000000000004')) # 1.0
print(xstrtod('1.00000000000000005')) # 1.0
print(xstrtod('1.00000000000000006')) # 1.0
print(xstrtod('1.00000000000000007')) # 1.0
print(xstrtod('1.00000000000000008')) # 1.0
print(xstrtod('1.00000000000000009')) # 1.0000000000000002
print(xstrtod('1.00000000000000019')) # 1.0000000000000002
The culprit is the 9 in the last place. xstrtod accumulates all the digits into a single float before the final division, and that intermediate value is where precision is lost:
>>> float('100000000000000008')
1e+17
>>> float('100000000000000009')
1.0000000000000002e+17
Dividing these by 10**17 gives exactly the skewed results above.
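Note the contrast with the builtin float(), which rounds the whole literal once: xstrtod rounds after every multiply-add, so errors can accumulate before the final division. A small sketch of the intermediate value:
>>> n = 0.0
>>> for digit in "99999999999999999":
...     n = n * 10.0 + int(digit)
...
>>> n
1.0000000000000002e+17
>>> n / 10**17
1.0000000000000002
>>> float("99999999999999999")  # one correctly-rounded conversion
1e+17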
If you want high precision, you can define your own converters or use Python-provided ones, e.g. decimal.Decimal for arbitrary precision:
>>> import io
>>> import pandas as pd
>>> import decimal
>>> converter = {0: decimal.Decimal}  # parse column 0 as Decimal
>>> def parse(string):
...     return '{:.30f}'.format(pd.read_csv(io.StringIO(string), converters=converter)["column"][0])
>>> print(parse("column\n0.99999999999999998"))
>>> print(parse("column\n0.99999999999999999"))
>>> print(parse("column\n1.00000000000000000"))
>>> print(parse("column\n1.00000000000000001"))
>>> print(parse("column\n1.00000000000000008"))
>>> print(parse("column\n1.00000000000000009"))
which prints:
0.999999999999999980000000000000
0.999999999999999990000000000000
1.000000000000000000000000000000
1.000000000000000010000000000000
1.000000000000000080000000000000
1.000000000000000090000000000000
Exactly representing the input!
If you want to understand how it works, look at the source code: file "_libs/parsers.pyx", lines 492-499 for Pandas 0.20.1:
self.parser.double_converter_nogil = xstrtod  # <------- default converter
self.parser.double_converter_withgil = NULL
if float_precision == 'high':
    self.parser.double_converter_nogil = precise_xstrtod  # <------- 'high' converter
    self.parser.double_converter_withgil = NULL
elif float_precision == 'round_trip':  # avoid gh-15140
    self.parser.double_converter_nogil = NULL
    self.parser.double_converter_withgil = round_trip
Source code for xstrtod
Source code for precise_xstrtod
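For completeness, a quick sketch checking that float_precision switches converters for the problematic value (expected output, given the converter selection above):
>>> import io
>>> import pandas as pd
>>> s = "column\n0.99999999999999999"
>>> pd.read_csv(io.StringIO(s))["column"][0]
1.0000000000000002
>>> pd.read_csv(io.StringIO(s), float_precision='high')["column"][0]
1.0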
(update)
Here's the actual problem I'm seeing. Note that round() doesn't seem to be doing the trick.
Here's my code:
import time

t0 = time.time()
# stuff
t1 = time.time()
perfdat = {'et1': round(t1 - t0, 6), 'et2': '%.6f' % (t1 - t0)}
And the dict and json output, respectively:
{'et2': '0.010214', 'et1': 0.010214000000000001}
{"et2":"0.010214","et1":0.010214000000000001}
(end update)
I've got a floating point value that has a lot of extra digits of precision that I don't need. Is there a way to truncate those digits when formatting a json string?
I can get the truncation I need if I format the value as a string, but I would like to transmit the value as a (truncated) number.
>>> import json
>>> v = 2.030000002
>>> json.dumps({'x': v})  # would like to just have 2.030
'{"x": 2.030000002}'
>>> s = '%.3f' % v  # like this, but not as a string
>>> json.dumps({'x': s})
'{"x": "2.030"}'
Wrap the formatted number back into a float:
>>> s = float('%.3f' % v)
>>> json.dumps({'x': s})
'{"x": 2.03}'
The builtin round function can help:
In [16]: v=2.030000002
In [17]: json.dumps({'x': round(v, 3)})
Out[17]: '{"x": 2.03}'
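Note that neither approach can emit 2.030: JSON numbers carry no formatting, so a trailing zero only survives if you post-process the encoded string, as the iterencode approach further below does.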
This is something I found in the Python standard library documentation:
"Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem:
>>> from decimal import *
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')
"
A better import statement would be:
from decimal import getcontext, Decimal
Then you can apply those same functions to specify an arbitrary precision.
For your case (this still has the trailing-zero issue):
getcontext().prec = 3
s = '2.030'
var = float(Decimal(s))
var is then 2.03.
The following approach seems promising:
import json

v = 2.030000002
result = []
for part in json.JSONEncoder().iterencode({'x': v}):
    try:
        tmp = round(float(part), 3)
    except ValueError:
        pass  # not a numeric token; keep it unchanged
    else:
        part = '{:.3f}'.format(tmp)
    result.append(part)

result = ''.join(result)
print(result)  # -> {"x": 2.030}
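One caveat: this reformats every numeric token the encoder emits, integers included, so a value like 5 would come out as 5.000; restrict the rewrite to the keys you care about if that matters.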
I can get the output I need using R, but I cannot reproduce it with Python's rpy2 module.
In R:
> wilcox.test(c(1,2,3), c(100,200,300), alternative = "less")$p.value
gives
[1] 0.05
In python:
import rpy2.robjects as robjects
rwilcox = robjects.r['wilcox.test']
x = robjects.IntVector([1,2,3,])
y = robjects.IntVector([100,200,300])
z = rwilcox(x,y, alternative = "less")
print(z)
gives:
Wilcoxon rank sum test
data: 1:3 and c(100L, 200L, 300L)
W = 0, p-value = 0.05
alternative hypothesis: true location shift is less than 0
And:
z1 = z.rx('p.value')
print(z1)
gives:
$p.value
[1] 0.05
Still trying to get a final value of 0.05 stored as a variable, but this seems to be closer to a final answer.
I am unable to figure out what my Python code needs to be to store the p.value in a new variable.
z1 is a ListVector containing one FloatVector with one element:
>>> z1
<ListVector - Python:0x4173368 / R:0x36fa648>
[FloatVector]
p.value: <class 'rpy2.robjects.vectors.FloatVector'>
<FloatVector - Python:0x4173290 / R:0x35e6b38>
[0.050000]
You can extract the float itself with z1[0][0] or just float(z1[0]):
>>> z1[0][0]
0.05
>>> type(z1[0][0])
<type 'float'>
>>> float(z1[0])
0.05
In general, you will have an easier time figuring out what is going on in an interactive session if you just enter the name of the object you want a representation of. A print x statement transforms things through str(x), whereas the repr(x) representation used implicitly by the interactive loop is much more helpful. If you are doing things in a script, use print(repr(x)) instead.
Just use list():
pval = z.rx2('p.value')
print(list(pval))  # [0.05]
rpy2 also works well with numpy:
import numpy
pval = numpy.array(pval)
print(pval)  # array([ 0.05])
http://rpy.sourceforge.net/rpy2/doc-2.3/html/numpy.html#from-rpy2-to-numpy
Say I have a set of strings like the following:
"5 m^2"
"17 sq feet"
"3 inches"
"89 meters"
Is there a Python package which will read such strings, convert them to SI, and return the result in an easily-usable form? For instance:
>>> a=dream_parser.parse("17 sq feet")
>>> a.quantity
1.5793517
>>> a.type
'area'
>>> a.unit
'm^2'
Quantulum will do exactly what you described
Excerpt from its description:
from quantulum import parser

quants = parser.parse('I want 2 liters of wine')
print(quants)  # [Quantity(2, 'litre')]
More recently, pint is a good place to start for most of these.
There is an extension for IPython that can do at least part of what you want; it's called ipython-physics.
It stores values with units and allows (at least) some basic math. I have never used it myself, so I don't know how easy it would be to use in a Python script.
If you have 'nice' strings, then use pint
(best for unit conversions):
import pint

u = pint.UnitRegistry()
value = u.Quantity("89 meters")
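Once parsed, converting between prefixes or units is a single call; a short sketch using pint's Quantity.to:
q = u.Quantity("89 meters")
print(q.to(u.kilometer))     # 0.089 kilometer
print(q.magnitude, q.units)  # 89 meter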
If you have text/sentences, then use quantulum:
from quantulum import parser
value = parser.parse('Pass me a 300 ml beer.')
If you have 'ugly' strings, then try unit_parse.
Examples of 'ugly' strings: (see unit_parse github for more examples)
2.3 mlgcm --> 2.3 cm * g * ml
5E1 g/mol --> 50.0 g / mol
5 e1 g/mol --> 50.0 g / mol
()4.0 (°C) --> 4.0 °C
37.34 kJ/mole (at 25 °C) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
Detection in water: 0.73 ppm; Chemically pure --> 0.73 ppm
(uses pint under the hood)
from unit_parse import parser
result = parser("1.23 g/cm3 (at 25 °C)")
print(result) # [[<Quantity(1.23, 'g / cm ** 3')>, <Quantity(25, 'degC')>]]
Using numpy, how can I do the following:
ln(x)
Is it equivalent to:
np.log(x)
I apologise for such a seemingly trivial question, but my understanding of the difference between log and ln is that ln is log base e?
np.log is ln, whereas np.log10 is your standard base 10 log.
Correct, np.log(x) is the Natural Log (base e log) of x.
For other bases, remember this law of logs: log_b(x) = log_k(x) / log_k(b), where log_b is the log in some arbitrary base b and log_k is the log in a known base k, e.g.
here k = e:
l = np.log(x) / np.log(100)
and l is the log-base-100 of x.
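A tiny sketch of a general helper built on that identity (logb is a hypothetical name, not a NumPy function):
import numpy as np

def logb(x, b):
    # change of base: log_b(x) = ln(x) / ln(b)
    return np.log(x) / np.log(b)

print(logb(8.0, 2))     # ~3.0
print(logb(100.0, 10))  # ~2.0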
I usually do it like this:
from numpy import log as ln
Perhaps this can make you more comfortable.
NumPy seems to take a cue from MATLAB/Octave and uses log to mean "log base e", i.e. ln. Also like MATLAB/Octave, NumPy does not offer a logarithmic function for an arbitrary base.
If you find log confusing you can create your own object ln that refers to the numpy.log function:
>>> import numpy as np
>>> from math import e
>>> ln = np.log # assign the numpy log function to a new function called ln
>>> ln(e)
1.0
from numpy.lib.scimath import logn
from math import e

# log of x in the base given as the first argument (here base e, i.e. ln)
logn(e, x)
You can simply use math.log, which defaults to base e:
import math

math.log(10)  # 2.302585092994046, i.e. ln(10)