How do I put more precision in my imported data? - python

I'm trying to increase the displayed precision to 10 digits for the columns marked in red (Image 1). I tried pd.set_option('display.precision', 10), but it didn't work. Any ideas? The only line of code I have used imports the data from Excel: df = pd.read_excel('')

It looks like your numbers are actually very large integers, so pandas converts them to float rather than integer when they get imported (they don't fit in the standard int64 data type).
Precision applies to decimals, i.e. if you have something like 0.234566, how many decimals to show. In your case it's not decimals you are looking for, but how many significant digits to display before the output switches to scientific notation.
Since it's a float data type, you have to control the float format, i.e. pd.options.display.float_format
To keep scientific notation but limit the digits to 10, set the display option like this:
pd.options.display.float_format = '{:.10e}'.format
This displays 10 digits after the decimal point, before the e. You can change 10 to any other number.
To understand how scientific notation works in python, try this:
print("{:.2e}".format(12345678))
This formats 12345678 with 2 digits after the decimal point and displays it as 1.23e+07 (which means 1.23 * 10^7).
If you don't want any scientific notation, and just want to display the long integer, use this:
pd.options.display.float_format = '{:.0f}'.format
In this case 0 means show 0 decimals, so the full integer part of the number is shown.
For example:
print("{:.0f}".format(1234567887298739)) prints 1234567887298739, while print("{:.2f}".format(1.234567887298739487)) prints 1.23.
Caveat: If the number is too long, the displayed digits stop being reliable. A float holds about 15 to 17 significant decimal digits, so 10 digits is fine, but well beyond that the trailing digits are rounding artifacts rather than your data. Float precision has a hard limit.
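To make the caveat concrete, here is a small sketch that uses only the standard library. It assumes the usual 64-bit float with a 53-bit significand, which is what CPython uses on common platforms; the variable names are just illustrative.

```python
import sys

# Every integer up to 2**53 fits exactly in a 64-bit float; beyond that, gaps appear.
limit = 2 ** sys.float_info.mant_dig

exact = float(limit)        # representable exactly
rounded = float(limit + 1)  # not representable, rounds back to 2**53

print("{:.0f}".format(exact))
print("{:.0f}".format(rounded))

# Number of decimal digits that are guaranteed to survive a round trip through a float.
print(sys.float_info.dig)
```

Both formatted lines show the same integer, which is the rounding the caveat warns about.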
Note: In all cases your underlying data stays the same; only the formatting changes.
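As a rough illustration of the two display settings and of the note that the data itself is untouched, the sketch below uses a made-up DataFrame in place of the imported Excel sheet (the column name and values are assumptions). It uses pd.option_context so the setting applies only inside the with-block rather than globally.

```python
import pandas as pd

# Hypothetical data standing in for the imported Excel columns.
df = pd.DataFrame({"big": [1234567887298739.0, 98765432101234.0]})

# Show the full integer part, no scientific notation.
with pd.option_context("display.float_format", "{:.0f}".format):
    as_integers = df.to_string()

# Keep scientific notation, with 10 digits after the decimal point.
with pd.option_context("display.float_format", "{:.10e}".format):
    as_scientific = df.to_string()

print(as_integers)
print(as_scientific)

# The stored value is the same regardless of how it was displayed.
print(df["big"].iloc[0] == 1234567887298739.0)
```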

By "improve precision", I assume you mean increase the number of digits that are displayed. In this case, pd.set_option('display.precision', 10) should give you what you want. I have an example as well.
import pandas as pd
c = pd.read_excel("Book1.xlsx")
print(c)
pd.set_option('display.precision', 10)
print()
print(c)
pd.set_option('display.precision', 20)
print()
print(c)
This yields the following.
https://i.stack.imgur.com/XHpdO.png
Note that pandas does hold the full values in memory; it is just printing fewer digits in accordance with the default settings. You can verify this by indexing your data frame. (Older pandas versions allowed frame[0].data[0] here; Series.data has since been removed, so .iloc is used instead.)
>>> frame = pd.DataFrame([123.456789101112])
>>> frame
0
0 123.456789
>>> float(frame[0].iloc[0])
123.456789101112

Related

Why does my program only print the first few characters of e rather than the whole number?

e = str(2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274)
print(e)
Output:
2.718281828459045
Screenshots: here and here.
Why does the code only print out the first few characters of e instead of the whole string?
A string str has characters, but a number (be it an int or a float) just has a value.
If you do this:
e_first_100 = '2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274'
print(e_first_100)
You'll see all digits printed, because they are just characters in a string, it could have also been the first 100 characters from 'War and Peace' and you would not expect any of that to get lost either.
Since e is not an integer value, you can't use int here, so you'll have to use float. But Python uses a finite number of bits to represent such a number, while there is an infinite number of real numbers. In fact, there is an infinite number of values between any two distinct real numbers. So a clever scheme has to be used to represent at least the ones you use most often, with a limited amount of precision.
You often don't notice the lack of precision, but try something like .1 + .1 + .1 == .3 in Python and you'll see that it can pop up in common situations.
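A minimal sketch of that comparison, with math.isclose shown as one common way to compare floats with a tolerance (the variable name is illustrative):

```python
import math

total = 0.1 + 0.1 + 0.1

print(total == 0.3)              # False: the binary approximations do not add up exactly
print(repr(total))               # 0.30000000000000004
print(math.isclose(total, 0.3))  # True: equal within a small relative tolerance
```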
Your computer already has a built-in way to represent these floating-point numbers, using either 32 or 64 bits. Python's float uses the 64-bit form by default, although many languages (Python included) also offer software representations that are not tied to the hardware and allow more precision.
So, if you then do this:
e1 = float(e_first_100)
print(e1)
e2 = 2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274
print(e2)
Both result in a value that, when you print it, looks like:
2.718281828459045
Because that's the precision up to which the number is (more or less) accurately represented.
If you need to use e in a more precise manner, you can use Python's decimal module:
from decimal import Decimal
e3 = Decimal(e_first_100)
print(e3)
That looks promising, but Decimal arithmetic also has limited precision: results are rounded to the context precision (28 significant digits by default), although that is still better than standard floats:
print(e2 * 3)
print(e3 * Decimal(3))
The difference:
8.154845485377136
8.154845485377135706080862414
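If 28 digits are not enough, the precision of Decimal arithmetic can be raised. The following is a sketch using decimal.localcontext; the choice of 60 digits is arbitrary and the variable names are illustrative.

```python
from decimal import Decimal, getcontext, localcontext

e_digits = "2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274"

# The constructor keeps every digit of the string; only arithmetic is rounded.
e3 = Decimal(e_digits)

print(getcontext().prec)  # precision of the current context (28 unless changed)

# Temporarily raise the precision for one calculation.
with localcontext() as ctx:
    ctx.prec = 60
    product = e3 * 3

print(product)
```

Outside the with-block the previous precision applies again, so other code is not affected.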
To expand on Grismar's answer: the default string representation of a float prints only as many digits as are needed to identify the stored value, since going further would not be very useful. The float still holds its full 64-bit value, though, and you can ask for more of its digits.
To get a string with more digits, you can format with a fixed, larger precision, for example
In [2]: e = format(
...: 2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274,
...: ".50f",
...: )
In [3]: e
Out[3]: '2.71828182845904509079559829842764884233474731445312'
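which gives us 50 digits after the decimal point. This is not particularly useful with floats, though: only the first 16 or so significant digits agree with e, and everything after that is the expansion of the nearest binary double rather than more digits of e.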
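Another way to look at the value a float really stores is to pass the float itself to Decimal, which converts it exactly. This is a small sketch for inspection only; the variable names are illustrative.

```python
from decimal import Decimal

true_e_prefix = "2.71828182845904523536"  # leading digits of e, for comparison

# Decimal(float) converts the binary double exactly, with no extra rounding.
stored = Decimal(2.718281828459045)

print(stored)          # exact decimal expansion of the double
print(true_e_prefix)   # the two diverge after about 16 significant digits
```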

Dealing with decimals with many digits in Pandas

pd.set_option('display.max_colwidth', None )
pd.set_option('display.float_format', lambda x: '%.200f' % x)
exData = pd.read_csv('AP11.csv',delimiter=';',float_precision=None)
x = exData.loc[:,['A','B']]
y = exData.loc[:,['C']]
x
My original float in Excel is 0.1211101931541032183754113717355410323332315436353654273243543132542237415430173719
What is being displayed is
0.12111019315410319341363987177828676067292690277099609375000000000000000000000000000000000000000000000000000000000...
This is not a display issue; something in pandas rounds my float. I don't want any number rounded, because that will affect the result of my string: the value is originally a string that is converted to a float. I tried int64, but it can't handle numbers this big, so I decided to use floats of the form "0.mystring" to avoid getting "inf" displayed in pandas, and now the value gets rounded. Is machine learning limited by these messy variables? Or is there another way to deal with big numbers without rounding or displaying inf?
Use Decimal instead of float. Just put
from decimal import Decimal
at the top of your code, and pass your numbers to Decimal as strings:
x = Decimal('0.121110193154103218375411371735541032333231543635365427324354313254223741543017371')
Passing a float literal instead would round the value to a 64-bit float before Decimal ever sees it.
decimal is a library for decimal numbers with configurable precision, so the digits you supply are kept rather than rounded to a binary float.
Generally you should avoid floats when exact decimal digits matter, because binary floats cannot represent most decimal fractions exactly. After operations on them, a result that should have just a few decimal places can show a series of zeros followed by stray digits.
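For data that comes from a CSV file, one possible approach is to have pandas hand each cell to Decimal as a string, via the converters parameter of read_csv, so the value never passes through a float. The sketch below uses an in-memory stand-in for AP11.csv; the file contents and the choice of column A are assumptions.

```python
import io
from decimal import Decimal

import pandas as pd

original = "0.1211101931541032183754113717355410323332315436353654273243543132542237415430173719"

# Hypothetical stand-in for AP11.csv with the same ';' delimiter.
csv_text = f"A;B;C\n{original};2;3\n"

# converters receives the raw cell text, so Decimal sees the string, not a float.
exData = pd.read_csv(io.StringIO(csv_text), delimiter=";", converters={"A": Decimal})

value = exData.loc[0, "A"]
print(value)               # all digits preserved
print(exData["A"].dtype)   # object: the column holds Python Decimal objects
```

The trade-off is that an object column is slower and larger than a float64 column, and most numerical libraries will convert it back to float unless told otherwise.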

how to round up the number for array format in .txt output in python

I am trying to extract my analysis result in .txt file. The results show as below :
-3.298409999999999854e+04 -3.298409999999999854e+04
-3.297840000000000146e+04 -3.297840000000000146e+04
Code:
anodeIdx = [10,20,30]
stressAnodeXX = [x for i,x in enumerate(stress_xx[0].Y) if i in anodeIdx]
stressAnodeYY = [x for i,x in enumerate(stress_yy[0].Y) if i in anodeIdx]
np.savetxt('Stress_strain_Anode.txt',np.c_[stressAnodeXX,stressAnodeYY])
I expected the result to be -32984.1 but the actual output is -3.2984099999e+4
To save the numbers in a specific format, you can use the optional fmt parameter of np.savetxt() (see its documentation).
In your case:
np.savetxt('Stress_strain_Anode.txt',np.c_[stressAnodeXX,stressAnodeYY], fmt='%.1f')
f is the specifier that saves the number as a decimal floating-point value.
.1 sets how many digits appear after the decimal point.
I think the problem here is not that the numbers aren't rounded, but that they aren't appropriately formatted.
You could use the fmt keyword argument of numpy.savetxt to solve this (see the numpy documentation):
np.savetxt('Stress_strain_Anode.txt', np.c_[stressAnodeXX,stressAnodeYY], fmt='%.1f')
Where '%.1f' is a format string which formats numbers with one decimal digit.
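Here is a self-contained sketch of the fmt approach. It writes to an in-memory buffer instead of Stress_strain_Anode.txt, and the two short lists are made-up stand-ins for the stress results.

```python
import io

import numpy as np

# Hypothetical values standing in for stressAnodeXX and stressAnodeYY.
stress_xx = [-32984.1, -32978.4]
stress_yy = [-32984.1, -32978.4]

buffer = io.StringIO()  # stands in for the output file
np.savetxt(buffer, np.c_[stress_xx, stress_yy], fmt="%.1f")

text = buffer.getvalue()
print(text)
```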
Your result is actually -32984.1. Binary floating point cannot represent that value exactly, so printing it with many digits looks a bit confusing. If you want, you can round your result (but it is not needed):
np.round(your_result_number, decimals=1)
which will return:
-32984.1
More about your result:
-3.2984099999e+4 has two confusing parts:
099999 at the end of the number
e+4 at the end of the output
e+4 is the scientific notation of your number. It means "multiply by 10^4 = 10000". If you do that, you get 3.29841 * 10000 = 32984.1
099999... at the end of the number appears because the computer represents the decimal number in binary, which leads to small "errors". So your result is actually -32984.1.
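The same stored value can be printed both ways with plain string formatting. As far as I know, '%.18e' is the default fmt of np.savetxt, which would explain the output in the question; the variable names below are illustrative.

```python
value = -32984.1

# 18 digits after the decimal point in scientific notation exposes the binary rounding error.
long_form = "%.18e" % value

# One digit after the decimal point hides it again.
short_form = "%.1f" % value

print(long_form)
print(short_form)
```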

Conversion from string to float of numerical values (scientific notation)

I am using Python for reading a file and converting numerical value written as string to float. I observe a weird conversion:
a="-5.970471694E+02"
b = float(a)
b
>> -597.0471694
bb = np.float64(a)
bb
>> -597.04716940000003
e="-5.970471695E+02"
ee = np.float64(e)
ee
>> -597.0471695
ee-bb
>> -9.9999965641472954e-08
What is the reason for the term "0000003" at the end of bb? Why don't I observe the same thing for ee? Is this really a problem? I think this issue is due to floating-point accuracy, but the result seems to be perturbed before I even start to use the variables...
What is the reason for the term "0000003" at the end of bb? Why don't I observe the same thing for ee?
b and bb have identical values (try evaluating b == bb). The difference comes down to how they are displayed by the interpreter. Python prints a float using the shortest string that converts back to the same value, whereas the NumPy version used here evidently prints float64 scalars with 17 significant digits, which is enough to expose the tiny binary rounding error in bb. For ee that error happens to be too small to show at 17 digits, so the trailing zeros are dropped.
Is this really a problem?
Since the actual values of b and bb are identical then the answer is almost certainly no. If the display differences bother you, you can use np.set_printoptions to control how numpy floats are represented in the interpreter. If you use IPython, you can also use the %precision magic to control how regular Python floats are printed.
Both float and np.float64 store the number in the same 64-bit binary format, so both hold the same approximation that results from converting a base-10 number to base 2. The 000..03 tail is that rounding error; it is present in b as well, and it only becomes visible when enough digits are printed. In other words, it is a rounding error from converting a decimal number to a binary number.
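The effect can be reproduced without NumPy by printing the same Python float with different numbers of digits. In this sketch '%.17g' is a stand-in for how the NumPy version in the question appears to print float64 scalars; that is an assumption based on the output shown there.

```python
b = float("-5.970471694E+02")
ee = float("-5.970471695E+02")

# Shortest string that round-trips to the same float.
print(repr(b))

# 17 significant digits: the binary rounding error of b becomes visible.
print("%.17g" % b)

# For ee the error is too small to show at 17 digits.
print("%.17g" % ee)
```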

generate random numbers truncated to 2 decimal places

I would like to generate uniformly distributed random numbers between 0 and 0.5, but truncated to 2 decimal places.
without the truncation, I know this is done by
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
could anyone help me with suggestions on how to generate random numbers up to 2 d.p. only? Thanks!
A float cannot be truncated (or rounded) to 2 decimal digits, because there are many values with 2 decimal digits that just cannot be represented exactly as an IEEE double.
If you really want what you say you want, you need to use a type with exact precision, like Decimal.
Of course there are downsides to doing that—the most obvious one for numpy users being that you will have to use dtype=object, with all of the compactness and performance implications.
But it's the only way to actually do what you asked for.
Most likely, what you actually want to do is either Joran Beasley's answer (leave them untruncated, and just round at print-out time) or something similar to Lauritz V. Thaulow's answer (get the closest approximation you can, then use explicit epsilon checks everywhere).
Alternatively, you can do implicitly fixed-point arithmetic, as David Heffernan suggests in a comment: Generate random integers between 0 and 50, keep them as integers within numpy, and just format them as fixed point decimals and/or convert to Decimal when necessary (e.g., for printing results). This gives you all of the advantages of Decimal without the costs… although it does open an obvious window to create new bugs by forgetting to shift 2 places somewhere.
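A minimal sketch of the fixed-point idea, using the standard library rather than numpy to keep it short. It assumes that 0.50 itself should be a possible value and that values are converted to Decimal only when needed; the names are illustrative.

```python
import random
from decimal import Decimal

rng = random.Random(123456)

# Integers 0..50 (both ends included) stand for 0.00..0.50 in hundredths.
hundredths = [rng.randint(0, 50) for _ in range(50)]

# Convert to exact two-decimal values only when they are needed.
values = [Decimal(n) / Decimal(100) for n in hundredths]

print(hundredths[:5])
print(values[:5])
```

All arithmetic on hundredths stays exact integer arithmetic; the division by 100 is the one place where the scaling has to be remembered.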
Floats are never truncated to 2 decimal places; however, their string representation may be.
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
print(["%0.2f" % val for val in set.ravel()])
How about this?
np.random.randint(0, 50, size=(50,1)).astype("float") / 100
That is, create random integers from 0 to 49 (the upper bound of randint is exclusive; use 51 if 0.50 itself should be possible), and divide by 100.
EDIT:
As made clear in the comments, this will not give you exact two-digit decimals to work with, due to the nature of float representations in memory. It may look like you have the exact float 0.1 in your array, but it definitely isn't exactly 0.1. It is very, very close, though: astype("float") already gives 64-bit doubles, the most precise native float type.
You can postpone this problem by just keeping the numbers as integers, and remembering that they are to be divided by 100 when you use them.
hundreds = np.random.randint(0, 50, size=(50, 1))
Then at least the roundoff won't happen until the last minute (or maybe not at all, if the numerator of the equation is a multiple of the denominator).
I managed to find another alternative:
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,2))
for i in range(50):
    for j in range(2):
        set[i,j] = round(set[i,j],2)
