I'm using python with numpy, scipy and matplotlib for data evaluation. As results I obtain averages and fitting parameters with errorbars.
I would like python to automatically pretty-print this data according to a given precision. For example:
Suppose I got the result x = 0.012345 +/- 0.000123.
Is there a way to automatically format this as 1.235(12) x 10^-2 when a precision of 2 is specified? That is, the precision counts digits in the errorbar rather than in the value.
Does anyone know a package that provides such functionality, or would I have to implement this myself?
Is there a way to inject this into the python string formatting mechanism? I.e. being able to write something like "%.2N" % (0.012345, 0.000123).
I already looked through the docs of numpy and scipy and googled around, but I couldn't find anything. I think this would be a useful feature for everyone who deals with statistics.
Thanks for your help!
EDIT:
As requested by Nathan Whitehead I'll give a few examples.
123 +- 1 ----precision 1-----> 123(1)
123 +- 1.1 ----precision 2-----> 123.0(11)
0.0123 +- 0.001 ----precision 1-----> 0.012(1)
123.111 +- 0.123 ----precision 2-----> 123.11(12)
The powers of ten are omitted for clarity.
The number inside the parentheses is a shorthand notation for the standard error. The last digit of the number before the parentheses and the last digit of the number inside the parentheses have to be at the same decimal power. For some reason I cannot find a good explanation of this concept online. The only thing I found is this German Wikipedia article here. However, it is a quite common and very handy notation.
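To make the "same decimal power" rule concrete, here is a minimal sketch that reads the shorthand back into a value and an uncertainty. The helper name parse_shorthand is made up for illustration, and it assumes plain decimal input without a power-of-ten suffix.

```python
import re
from decimal import Decimal


def parse_shorthand(text):
    """Split e.g. '123.11(12)' into (value, uncertainty) as Decimals.

    Hypothetical helper for illustration; power-of-ten suffixes are not handled.
    """
    match = re.fullmatch(r"(-?\d+(?:\.(\d*))?)\((\d+)\)", text)
    if match is None:
        raise ValueError("not in value(uncertainty) form: %r" % text)
    value_str, decimals, unc_digits = match.groups()
    n_decimals = len(decimals) if decimals else 0
    # The digits in parentheses sit at the same decimal power as the last
    # digit of the value, so shift them right by the number of decimals.
    uncertainty = Decimal(unc_digits).scaleb(-n_decimals)
    return Decimal(value_str), uncertainty


for s in ["123(1)", "123.0(11)", "0.012(1)", "123.11(12)"]:
    print(s, "->", parse_shorthand(s))
```

Decimal is used here only so that the digits are shifted exactly, without binary floating-point noise.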
EDIT2:
I implemented the shorthand notation thing myself:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from math import floor, log10

# uncertainty to string
def un2str(x, xe, precision=2):
    """pretty print nominal value and uncertainty

    x - nominal value
    xe - uncertainty
    precision - number of significant digits in uncertainty

    returns shortest string representation of `x +- xe` either as
        x.xx(ee)e+xx
    or as
        xxx.xx(ee)"""
    # base 10 exponents
    x_exp = int(floor(log10(x)))
    xe_exp = int(floor(log10(xe)))

    # uncertainty
    un_exp = xe_exp-precision+1
    un_int = round(xe*10**(-un_exp))

    # nominal value
    no_exp = un_exp
    no_int = round(x*10**(-no_exp))

    # format - nom(unc)exp
    fieldw = x_exp - no_exp
    fmt = '%%.%df' % fieldw
    result1 = (fmt + '(%.0f)e%d') % (no_int*10**(-fieldw), un_int, x_exp)

    # format - nom(unc)
    fieldw = max(0, -no_exp)
    fmt = '%%.%df' % fieldw
    result2 = (fmt + '(%.0f)') % (no_int*10**no_exp, un_int*10**max(0, un_exp))

    # return shortest representation
    if len(result2) <= len(result1):
        return result2
    else:
        return result1


if __name__ == "__main__":
    xs    = [123456, 12.34567, 0.123456, 0.001234560000, 0.0000123456]
    xes   = [   123,  0.00123, 0.000123, 0.000000012345, 0.0000001234]
    precs = [     1,        2,        3,              4,            1]

    for (x, xe, prec) in zip(xs, xes, precs):
        print('%.6e +- %.6e #%d --> %s' % (x, xe, prec, un2str(x, xe, prec)))
Output:
1.234560e+05 +- 1.230000e+02 #1 --> 1.235(1)e5
1.234567e+01 +- 1.230000e-03 #2 --> 12.3457(12)
1.234560e-01 +- 1.230000e-04 #3 --> 0.123456(123)
1.234560e-03 +- 1.234500e-08 #4 --> 0.00123456000(1235)
1.234560e-05 +- 1.234000e-07 #1 --> 1.23(1)e-5
For people who are still interested in this question, see the gvar library and here for an example of (at least part of) the behavior desired by the OP.
x +- y is not a standard type (it could be seen as a complex number with x and y as the real and imaginary parts, I guess, but that does not simplify anything), but you can get full control over the presentation by creating a type and overriding the string function, i.e. something like this:
class Res(object):
    def __init__(self, res, delta):
        self.res = res
        self.delta = delta

    def __str__(self):
        return "%f +- %f" % (self.res, self.delta)

if __name__ == '__main__':
    x = Res(0.2710, 0.001)
    print(x)
    print(" a result: %s" % x)
You could naturally do something a bit fancier inside the __str__ function.
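As one possible "fancier" variant, here is a rough sketch that hooks into str.format() through __format__, so the format spec gives the number of significant digits in the uncertainty. The class name Measurement and the meaning of the spec are my own assumptions, not an existing API. It only produces the plain value(uncertainty) form (no power of ten), and edge cases such as a zero uncertainty or an uncertainty that rounds up to an extra digit are not handled.

```python
from decimal import Decimal


class Measurement:
    """Hypothetical value-with-uncertainty type.

    The format spec is the number of significant digits kept in the
    uncertainty, e.g. '{:2}'.format(m).
    """

    def __init__(self, value, error):
        self.value = value
        self.error = error

    def __format__(self, spec):
        precision = int(spec) if spec else 2
        err = Decimal(repr(self.error))
        # Decimal power of the last digit that is kept.
        last = err.adjusted() - precision + 1
        quantum = Decimal(1).scaleb(last)
        err_q = err.quantize(quantum)
        val_q = Decimal(repr(self.value)).quantize(quantum)
        if last >= 0:
            # No decimals: write value and uncertainty out in full.
            return "%d(%d)" % (int(val_q), int(err_q))
        # Shift the uncertainty so only its significant digits remain.
        return "%s(%d)" % (format(val_q, "f"), int(err_q.scaleb(-last)))

    def __str__(self):
        return format(self, "2")


print("{:2}".format(Measurement(123.111, 0.123)))
print("{:1}".format(Measurement(0.0123, 0.001)))
```

Going through Decimal(repr(...)) keeps the rounding in decimal digits; the default rounding mode there is round-half-even, which may differ from what you expect for exact ties.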
I am successfully able to read back data from an instrument:
When the read back is a voltage, I typically read back values such as 5.34e-02 Volts.
When the read back is a frequency, I typically read values like 2.95e+04 or 1.49e+05, in Hz.
I would like to convert the voltage read back of 5.34e-02 to exponent e-3 (i.e. millivolts), giving 53.4e-3. Next, I would like to extract the mantissa 53.4, because all my data needs to be in millivolts.
Similarly, I would like to convert frequencies such as 2.95e+04 (or 1.49e+05) to kHz, i.e. 29.5e+03 or 149e+03, and then extract the mantissas 29.5 and 149, since all my data needs to be in kHz.
Can someone suggest how to do this?
Well, to convert volts to millivolts, you multiply by 1000. To convert Hz to kHz, you divide by 1000.
>>> reading = 5.34e-02
>>> millivolts = reading * 1000
>>> print(millivolts)
53.400000000000006
>>> hz = 2.95e+04
>>> khz = hz /1000
>>> khz
29.5
>>>
FOLLOW-UP
OK, assuming your real goal is to keep the units the same but adjust the exponent to a multiple of 3, see if this meets your needs.
def convert(val):
    if isinstance(val,int):
        return str(val)
    cvt = f"{val:3.2e}"
    if 'e' not in cvt:
        return cvt
    # a will be #.##
    # b will be -##
    a,b = cvt.split('e')
    exp = int(b)
    if exp % 3 == 0:
        return cvt
    if exp % 3 == 1:
        # move the decimal point one place to the right
        a = a[0]+a[2]+a[1]+a[3]
        exp = abs(exp-1)
        return f"{a}e{b[0]}{exp:02d}"
    # move the decimal point two places to the right
    a = a[0]+a[2]+a[3]+a[1]
    exp = abs(exp-2)
    return f"{a}e{b[0]}{exp:02d}"

for val in (5.34e-01, 2.95e+03, 5.34e-02, 2.95e+04, 5.34e-03, 2.95e+06):
    print( f"{val:3.2e} ->", convert(val) )
Output:
5.34e-01 -> 534.e-03
2.95e+03 -> 2.95e+03
5.34e-02 -> 53.4e-03
2.95e+04 -> 29.5e+03
5.34e-03 -> 5.34e-03
2.95e+06 -> 2.95e+06
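If you want the mantissa as a number rather than a string, a numeric variant may be easier to work with. This is only a sketch: to_engineering is a made-up helper name, and it does not handle the corner case where rounding pushes the mantissa up to 1000.

```python
from math import floor, log10


def to_engineering(value, digits=3):
    """Return (mantissa, exponent) with the exponent a multiple of 3.

    Hypothetical helper; `digits` is the number of significant digits kept.
    """
    if value == 0:
        return 0.0, 0
    # Largest multiple of 3 that does not exceed the base-10 exponent.
    exponent = 3 * floor(log10(abs(value)) / 3)
    mantissa = value / 10.0 ** exponent
    # Round to the requested number of significant digits.
    decimals = digits - 1 - floor(log10(abs(mantissa)))
    return round(mantissa, decimals), exponent


for val in (5.34e-02, 2.95e+04, 1.49e+05):
    mantissa, exponent = to_engineering(val)
    print(val, "->", mantissa, "x 10^%d" % exponent)
```

The rounding step also removes floating-point noise such as 53.400000000000006.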
In this case, I think multiplying/dividing by 1000 is enough to move between SI prefixes. But when units get more complicated it might help to use a library like Pint to keep track of things and make sure you're calculating what you think you are.
In this case you might do:
import pint
ureg = pint.UnitRegistry()
Q = ureg.Quantity
reading_v = Q(5.34e-02, 'volts')
reading_mv = reading_v.to('millivolts')
print(reading_mv.magnitude)
but it seems overkill here.
Function I tried to replicate:
I'm doing a project for coursework in which I need to implement the blackbody function and manipulate it in some ways.
I'm trying out alternate equations, and for 2 of them I keep getting an overflow error.
This is the error message:
alt2_2 = (1/((const_e**(freq/temp))-1))
OverflowError: (34, 'Result too large')
temp is given in kelvin (I'm using 5800 as my test value, as it is approximately the temperature of the sun).
freq is the speed of light divided by whatever wavelength is input:
freq = (3*(10**8))/wavelength
In this case I am using 0.00000005 as the test value for wavelength,
and const_e is 2.7182.
This is my first time using Stack Overflow and also my first time doing a project on my own, so any help is appreciated.
This does the blackbody computation with your values.
import math
# Planck constant
h = 6.6e-34
# Boltzmann constant
k = 1.38e-23
# Speed of light
c = 3e+8
# Wavelength
wl = 0.00000005
# Temp
T = 5800
# Frequency
f = c/wl
# This is the exponent for e (about 49).
k1 = h*f / (k*T)
# This computes 2*h*f**3 / (exp(k1) - 1). Note that Planck's law for the
# spectral radiance also divides by c**2, which is omitted here.
Bvh = 2*f*f*f*h / (math.exp(k1)-1)
print(Bvh)
Output:
9.293819741690355e-08
Since we only used one or two digits on the way in, the resulting value is only good to one or two digits, 9.3E-08.
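For reference, the overflow in the question comes from raising e to freq/temp, which is about 1e12 for these inputs. The exponent in Planck's law is h*f/(k*T), which is dimensionless and much smaller. Below is a minimal sketch of the frequency form of Planck's law, including the 1/c**2 factor. The function name and the cutoff of 700 are my own choices for illustration.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K
C = 2.99792458e8     # speed of light, m/s


def planck_radiance(freq, temp):
    """Spectral radiance B_nu in W sr^-1 m^-2 Hz^-1 (hypothetical helper)."""
    x = H * freq / (K_B * temp)   # dimensionless exponent h*f / (k*T)
    if x > 700:
        # exp(x) would overflow a double here; the radiance is
        # indistinguishable from zero at this point anyway.
        return 0.0
    # expm1(x) computes exp(x) - 1 and stays accurate for small x.
    return 2 * H * freq**3 / C**2 / math.expm1(x)


freq = C / 0.00000005   # wavelength of 50 nm
print(planck_radiance(freq, 5800))
```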
I was trying to find a fast way to sort strings in Python and the locale is a non-concern i.e. I just want to sort the array lexically according to the underlying bytes. This is perfect for something like radix sort. Here is my MWE
import numpy as np
import timeit
# randChar is workaround for MemoryError in mtrand.RandomState.choice
# http://stackoverflow.com/questions/25627161/how-to-solve-memory-error-in-mtrand-randomstate-choice
def randChar(f, numGrp, N) :
    things = [f%x for x in range(numGrp)]
    return [things[x] for x in np.random.choice(numGrp, N)]
N=int(1e7)
K=100
id3 = randChar("id%010d", N//K, N) # small groups (char)
timeit.Timer("id3.sort()" ,"from __main__ import id3").timeit(1) # 6.8 seconds
As you can see it took 6.8 seconds which is almost 10x slower than R's radix sort below.
N = 1e7
K = 100
id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE)
system.time(sort(id3,method="radix"))
I understand that Python's .sort() doesn't use radix sort. Is there an implementation somewhere that allows me to sort strings as performantly as R?
AFAIK both R and Python "intern" strings so any optimisations in R can also be done in Python.
The top google result for "radix sort strings python" is this gist which produced an error when sorting on my test array.
It is true that R interns all strings, meaning it has a "global character cache" which serves as a central dictionary of all strings ever used by your program. This has its advantages: the data takes less memory, and certain algorithms (such as radix sort) can take advantage of this structure to achieve higher speed. This is particularly true for the scenarios such as in your example, where the number of unique strings is small relative to the size of the vector. On the other hand it has its drawbacks too: the global character cache prevents multi-threaded write access to character data.
In Python, afaik, only string literals are interned. For example:
>>> 'abc' is 'abc'
True
>>> x = 'ab'
>>> (x + 'c') is 'abc'
False
In practice it means that, unless you've embedded data directly into the text of the program, nothing will be interned.
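If you do want run-time strings to be interned, the standard library lets you request it explicitly with sys.intern. A small sketch (the variable names are just for illustration):

```python
import sys

x = "ab"
a = x + "c"                    # built at run time
b = "".join(["a", "b", "c"])   # built at run time as well

print(a == b)                  # equal contents
# After explicit interning, both names refer to the same object.
print(sys.intern(a) is sys.intern(b))
```

Whether this helps depends on the workload: interning costs a dictionary lookup per string, and Python's list.sort() does not exploit it the way R's radix sort does.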
Now, for your original question: "what is the fastest way to sort strings in python"? You can achieve very good speeds, comparable with R, with the Python datatable package. Here's a benchmark that sorts N = 10⁸ strings, randomly selected from a set of 1024:
import datatable as dt
import pandas as pd
import random
from time import time
n = 10**8
src = ["%x" % random.getrandbits(10) for _ in range(n)]
f0 = dt.Frame(src)
p0 = pd.DataFrame(src)
f0.to_csv("test1e8.csv")
t0 = time(); f1 = f0.sort(0); print("datatable: %.3fs" % (time()-t0))
t0 = time(); src.sort(); print("list.sort: %.3fs" % (time()-t0))
t0 = time(); p1 = p0.sort_values(0); print("pandas: %.3fs" % (time()-t0))
Which produces:
datatable: 1.465s / 1.462s / 1.460s (multiple runs)
list.sort: 44.352s
pandas: 395.083s
The same dataset in R (v3.4.2):
> require(data.table)
> DT = fread("test1e8.csv")
> system.time(sort(DT$C1, method="radix"))
user system elapsed
6.238 0.585 6.832
> system.time(DT[order(C1)])
user system elapsed
4.275 0.457 4.738
> system.time(setkey(DT, C1)) # sort in-place
user system elapsed
3.020 0.577 3.600
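If you cannot install an extra package, a pure-Python option that exploits the same property (few distinct strings relative to the length) is to count the values, sort only the distinct ones, and expand again. To be clear, this is not radix sort and I have not benchmarked it. It is a sketch of a counting approach, and the function name is made up.

```python
from collections import Counter


def sort_low_cardinality(strings):
    """Sort a list of strings that contains few distinct values."""
    counts = Counter(strings)       # one pass to count each distinct string
    result = []
    for key in sorted(counts):      # sort only the distinct strings
        result.extend([key] * counts[key])
    return result


data = ["id3", "id1", "id2", "id1", "id3", "id1"]
print(sort_low_cardinality(data))
```

The result is the same as sorted(data). Any speed-up depends on how small the set of distinct strings really is.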
Jeremy Mets posted in the comments of this blog post that NumPy can sort strings fairly quickly if the list is converted to an np.array. This does indeed improve performance; however, it is still slower than Julia's implementation.
import numpy as np
import timeit
# randChar is workaround for MemoryError in mtrand.RandomState.choice
# http://stackoverflow.com/questions/25627161/how-to-solve-memory-error-in-mtrand-randomstate-choice
def randChar(f, numGrp, N) :
    things = [f%x for x in range(numGrp)]
    return [things[x] for x in np.random.choice(numGrp, N)]
N=int(1e7)
K=100
id3 = np.array(randChar("id%010d", N//K, N)) # small groups (char)
timeit.Timer("id3.sort()" ,"from __main__ import id3").timeit(1) # 6.8 seconds
# your code goes here
def lagrange(x0, xlist,ylist):
    wynik =float(0)  # result
    if (len(xlist)!=len(ylist)):
        raise BufferError("The lists of x and y values must be the same size!")
    for i in range(len(xlist)):
        licznik=float(1)  # numerator
        mianownik = float(1)  # denominator
        for j in range(len(xlist)):
            if (i!=j):
                licznik=licznik*(x0-xlist[j])
                mianownik=mianownik*(xlist[i]-xlist[j])
        wynik=wynik+((licznik/mianownik)*ylist[i])
    return wynik

x=[2.0,4.0,5.0,6.0 ]
y=[0.57672, -0.06604, -0.32757, -0.27668]
print ("Lagrange polynomial for point 5.5 is %d" % lagrange(5.5, x, y))
Why do I get answer 0 after I run it? When rewritten to c# and run with the same data it outputs answer -0.3539. Seems to me like casting / rounding error but I'm struggling to find it without debugger.
I am completely new to python, I'm using basic IdleX on windows to code it.
The problem is not your function, it’s the printing.
The formatter %d is a signed integer decimal. So if you have -0.354 as a result, the fractional part is dropped and it is printed as 0.
Instead, print using %f:
>>> print ("Lagrange polynomial for point 5.5 is %f" % lagrange(5.5, x, y))
Lagrange polynomial for point 5.5 is -0.353952
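A quick way to see the difference between the conversion types side by side (a sketch, with the value taken from the output above):

```python
value = -0.353952

as_int = "%d" % value      # integer conversion drops the fractional part
as_float = "%f" % value    # fixed-point, six decimals by default
as_4dp = "%.4f" % value    # fixed-point with an explicit precision

print(as_int, as_float, as_4dp)
```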
For a project in one of my classes we have to output numbers up to five decimal places. It is possible that the output will be a complex number, and I am unable to figure out how to output a complex number with five decimal places. For floats I know it is just:
print "%0.5f"%variable_name
Is there something similar for complex numbers?
You could do it as is shown below using the str.format() method:
>>> n = 3.4+2.3j
>>> n
(3.4+2.3j)
>>> '({0.real:.2f} + {0.imag:.2f}i)'.format(n)
'(3.40 + 2.30i)'
>>> '({c.real:.2f} + {c.imag:.2f}i)'.format(c=n)
'(3.40 + 2.30i)'
To make it handle both positive and negative imaginary portions properly, you would need an (even more) complicated formatting operation:
>>> n = 3.4-2.3j
>>> n
(3.4-2.3j)
>>> '({0:.2f} {1} {2:.2f}i)'.format(n.real, '+-'[n.imag < 0], abs(n.imag))
'(3.40 - 2.30i)'
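An alternative that avoids picking the sign by hand is the + flag of the format spec, which always prints a sign for the imaginary part. Here is a sketch wrapped in a small helper (format_complex is a made-up name), defaulting to the five decimals asked for in the question:

```python
def format_complex(z, ndigits=5):
    """Format z as 'a+bi' or 'a-bi' with ndigits decimals (hypothetical helper)."""
    # The '+' flag forces an explicit sign on the imaginary part.
    return "{0:.{n}f}{1:+.{n}f}i".format(z.real, z.imag, n=ndigits)


print(format_complex(3.4 + 2.3j))
print(format_complex(3.4 - 2.3j, 2))
```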
Update - Easier Way
Although you cannot use f as a presentation type for complex numbers using the string formatting operator %:
n1 = 3.4+2.3j
n2 = 3.4-2.3j
try:
    print('test: %.2f' % n1)
except Exception as exc:
    print('{}: {}'.format(type(exc).__name__, exc))
Output:
TypeError: float argument required, not complex
You can however use it with complex numbers via the str.format() method. This isn't explicitly documented, but is implied by the Format Specification Mini-Language documentation which just says:
'f' Fixed point. Displays the number as a fixed-point number. The default precision is 6.
. . .so it's easy to overlook.
In concrete terms, the following works in both Python 2.7.14 and 3.4.6:
print('n1: {:.2f}'.format(n1))
print('n2: {:.2f}'.format(n2))
Output:
n1: 3.40+2.30j
n2: 3.40-2.30j
This doesn't give you quite the control the code in my original answer does, but it's certainly much more concise (and handles both positive and negative imaginary parts automatically).
Update 2 - f-strings
Formatted string literals (aka f-strings) were added in Python 3.6, which means it could also be done like this in that version or later:
print(f'n1: {n1:.2f}') # -> n1: 3.40+2.30j
print(f'n2: {n2:.3f}') # -> n2: 3.400-2.300j
In Python 3.8.0, support for an = specifier was added to f-strings, allowing you to write:
print(f'{n1=:.2f}') # -> n1=3.40+2.30j
print(f'{n2=:.3f}') # -> n2=3.400-2.300j
Neither the String Formatting Operations (i.e. the modulo (%) operator)
nor the newer str.format() Format String Syntax are documented as supporting complex types.
However it is possible to call the __format__ method of all built in numeric types directly.
Here is an example:
>>> i = -3 # int
>>> l = -33L # long (only Python 2.X)
>>> f = -10./3 # float
>>> c = - 1./9 - 2.j/9 # complex
>>> [ x.__format__('.3f') for x in (i, l, f, c)]
['-3.000', '-33.000', '-3.333', '-0.111-0.222j']
Note, that this works well with negative imaginary parts too.
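Instead of calling the dunder method directly, the built-in format() function can be used. It dispatches to the same __format__ method. A short sketch:

```python
c = -1.0 / 9 - 2.0j / 9

direct = c.__format__(".3f")   # explicit call to the special method
builtin = format(c, ".3f")     # the built-in calls __format__ for you

print(direct, builtin)
```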
For questions like this, the Python documentation should be your first stop. Specifically, have a look at the section on string formatting. It lists all the string format codes; there isn't one for complex numbers.
What you can do is format the real and imaginary parts of the number separately, using x.real and x.imag, and print it out in a + bi form.
>>> n = 3.4 + 2.3j
>>> print '%05f %05fi' % (n.real, n.imag)
3.400000 2.300000i
As of Python 2.6 you can define how objects of your own classes respond to format strings. So, you can define a subclass of complex that can be formatted. Here's an example:
>>> class Complex_formatted(complex):
...     def __format__(self, fmt):
...         cfmt = "({:" + fmt + "}{:+" + fmt + "}j)"
...         return cfmt.format(self.real, self.imag)
...
>>> z1 = Complex_formatted(.123456789 + 123.456789j)
>>> z2 = Complex_formatted(.123456789 - 123.456789j)
>>> "My complex numbers are {:0.5f} and {:0.5f}.".format(z1, z2)
'My complex numbers are (0.12346+123.45679j) and (0.12346-123.45679j).'
>>> "My complex numbers are {:0.6f} and {:0.6f}.".format(z1, z2)
'My complex numbers are (0.123457+123.456789j) and (0.123457-123.456789j).'
Objects of this class behave exactly like complex numbers except they take more space and operate more slowly; reader beware.
Check this out:
np.set_printoptions(precision=2) # Prints all float values with 2 digits of precision
With this I've successfully printed my complex float expressions:
# Show poles and zeros
print( "zeros = ", zeros_H , "\n")
print( "poles = ", poles_H )
out before:
zeros = [-0.8 +0.6j -0.8 -0.6j -0.66666667+0.j ]
poles = [-0.81542318+0.60991027j -0.81542318-0.60991027j -0.8358203 +0.j ]
out after:
zeros = [-0.8 +0.6j -0.8 -0.6j -0.67+0.j ]
poles = [-0.82+0.61j -0.82-0.61j -0.84+0.j ]