I have the following pandas Series:
my_series = ['150000000000000000000000', '45064744242514231410', '2618611848503168287542', '7673975728717793369']
Every number in the list has 18 decimal places (that's what dictates what number exactly it is, prior to seeing any formatting).
my_series[0], therefore, is 150,000.000000000000000000 (one hundred and fifty thousand).
my_series[1], therefore, is 45.064744242514231410 (forty-five...).
And so on.
I basically want Python to recognize the strings and turn them into the correct floats so that I can make calculations with this Series later.
I don't need to print the correctly formatted number; rather, I need Python to recognize that it's 150,000 instead of 1,500,000,000 and so on.
Example for my_series[2] of what the correct float would be:
2,618.61
My current code:
[float("{:.18f}".format(int(item) for item in my_series))]
Which yields me the following error:
TypeError: unsupported format string passed to generator.__format__
How do I format the strings in the Series according to my requirements above and get the correct float?
You can convert the string to float and then apply formatting.
my_series = ['150000000000000000000000', '45064744242514231410',
'2618611848503168287542', '7673975728717793369']
["{:,.2f}".format(float(item)/10**18) for item in my_series]
['150,000.00', '45.06', '2,618.61', '7.67']
Note that this may lose some precision when converting the string to float.
If this is a problem for you, you may want to do one of the following:
Separate the integer part and the decimal part and combine them when printing
Use the Decimal class
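The Decimal option could look something like the sketch below. It assumes every input is a plain digit string with exactly 18 implied decimal places, as the question states; the helper name to_decimal is made up for illustration. Building the Decimal from a string keeps every digit, and float() is only applied at the end if a float is really needed.

```python
from decimal import Decimal

DECIMAL_PLACES = 18  # assumption taken from the question: 18 implied decimal places


def to_decimal(raw):
    """Interpret a digit string as a fixed-point number with 18 decimal places.

    to_decimal is a hypothetical helper name, not part of any library.
    """
    # Pad short inputs so there is always at least one digit before the point.
    padded = raw.zfill(DECIMAL_PLACES + 1)
    return Decimal(f"{padded[:-DECIMAL_PLACES]}.{padded[-DECIMAL_PLACES:]}")


my_series = ['150000000000000000000000', '45064744242514231410',
             '2618611848503168287542', '7673975728717793369']
exact_values = [to_decimal(item) for item in my_series]
# Converting to float afterwards may round, but only once, at the very end.
float_values = [float(value) for value in exact_values]
```

Calculations can then be done on exact_values without losing digits, or on float_values if ordinary float precision is enough.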
After a few iterations, I think I understand what OP was going for, so I changed my example. OP does not seem to be worried about loss of precision and was getting value errors (probably due to invalid fields coming in as part of the Series). I've modified my sample to be close to how it would happen in Pandas by adding some deliberately fake inputs.
my_series = [
    "not a number",
    "",
    "150000000000000000000000",
    "45064744242514231410",
    "2618611848503168287542",
    "7673975728717793369",
]

def convert_to_float(number):
    float_string = None
    my_float = None
    try:
        # Everything before the last 18 characters is the integer part
        float_string = f"{int(number[:-18])}.{number[-18:]}"
        my_float = float(float_string)
    except ValueError as e:
        print(e)
        return None
    return my_float

numbers = list(map(convert_to_float, my_series))
for num in numbers:
    if num:
        print(f"{num:.18f}")
I am using python to read some float values from a file. The values read are input to an 'Ada'(Programming Language) program.
The values being read are in different formats (scientific, decimal) and I would like to retain the format.
Everything works well with simple float() operation except when converting '1.0e-5' to float.
>>> float('1.0e-5')
# returns 1e-05
1e-5 when used in Ada program gives
error:negative exponent not allowed for integer literal
1.0e-35 works with ada program.
I know if I use format I can get 1.0e-5
>>> "{:.1E}".format(float('1.0e-5'))
# returns '1.0E-05'
But this changes format for other read values also as my reading/manipulation function is common.
How should I approach this problem?
And if
float('1.0')
# returns 1.0
why is the same behavior not followed when converting a scientific-notation string to float?
(My reading/manipulation function is common. Using formatter string will change the formatting of other read values also)
You could use a custom float-to-string conversion function that uses a regular expression to check whether the number will be accepted by Ada. The expression tests whether there is no dot before the exponent character, and only in that case converts with format:
import re

def ada_compliant_float_as_string(f):
    # str(f) such as '1e-05' has no dot before the 'e'; reformat only those
    return "{:.1e}".format(f) if re.match(r"^-?[^.]e", str(f)) else str(f)

for f in [-1e-5, 1e-5, 1.4e-5, -12e4, 1, 1.0]:
    print(ada_compliant_float_as_string(f))
prints:
-1.0e-05
1.0e-05
1.4e-05
-120000.0
1
1.0
Only the first two values are corrected; the other values are just the unchanged string representation of the float.
I asked this because it is possible in R. Note that both 1.5 and 1 are in numeric type (double-precision), and only 1L is an integer. When coercing a string into numeric type, it doesn't show a decimal point if there's not one in the string.
class(1.5)
# "numeric"
class(1)
# "numeric"
class(1L)
# "integer"
x <- as.numeric("3")
x
# 3
class(x)
# "numeric"
Am I allowed to have similar operations in Python? Let's say I have a function called key_in_a_number:
def key_in_a_number():
    num = input("Key in a number here: ")
    try:
        return float(num)
    except ValueError:
        return "Please key in only numbers."
Now if one keys in "40", it will return 40.0, but 40.0 and 40 are different in certain digits. Thus, 40 should be returned if "40" is keyed in, while 40.0 should be returned only when "40.0" is keyed in.
My workaround is:
def key_in_a_number():
    num = input("Key in a number here: ")
    try:
        return int(num)
    except ValueError:
        try:
            return float(num)
        except ValueError:
            return "Please key in only numbers."
However, in this way, I cannot be sure that the results are always in the same type, which could be problematic in following data storage or processing. Is there any way to have a number in float type without a decimal point?
I think your core problem here is that you're misunderstanding what float is.
A float represents a C double, which almost always means an IEEE 754-1985 double (or an IEEE 754-2008 binary64, which is basically the same thing but slightly better defined). It always has 53 binary digits of precision. It doesn't matter whether you specify it as 40., 40.00000, float(40), float('40'), or float('40.00'); those are all identical in every way.
So, the main problem you're asking about doesn't make any sense:
Now if one keys in "40", it will return 40.0, but 40.0 and 40 are different in certain digits.
No, they aren't. float("40") and float("40.0") are both the exact same value, with no differences in any digits, and no difference in their precision, or anything else.
There's a different type in Python, in the decimal library, that represents an IEEE 754-2008 arbitrary-sized decimal. It has as many decimal digits of precision as you tell it to have.
So, Decimal('40') and Decimal('40.') have two digits; Decimal('40.000') has five digits—they may be equal, but they're not identical, because the last one is more precise.
Unlike float, Decimal prints out however many digits of precision it actually has:
>>> print(Decimal('40'))
40
>>> print(Decimal('40.'))
40
>>> print(Decimal('40.0'))
40.0
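As a quick check of the claims above, the following sketch compares the two float conversions and then two Decimal values that are equal but carry different precision. The variable names are only illustrative.

```python
from decimal import Decimal

# Both strings produce the very same float value.
same_float = float("40") == float("40.0")

# Decimal keeps track of how many digits were given.
a = Decimal("40")
b = Decimal("40.000")
equal_value = (a == b)                       # numerically equal
texts = (str(a), str(b))                     # but printed with their own digits
exponents = (a.as_tuple().exponent, b.as_tuple().exponent)

print(same_float, equal_value, texts, exponents)
```

So if the number of digits typed by the user matters, Decimal can carry that information while a float cannot.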
While we're at it, if you do want float and int values, here's how to translate each line of R into Python:
class(1.5) # numeric
type(1.5) # float
class(1) # numeric
type(1) # int
type(1.) # float
class(1L) # integer
type(1) # int
x <- as.numeric("3") # numeric
x = float(3) # float
x = float("3") # float
Notice that, just like as.numeric("3") gives you a numeric rather than an integer, float("3") gives you a float rather than an int. I'm not sure why that Python behavior puzzles you given that it's identical to the equivalent R behavior.
Yes,
10 would be an integer in Python, whereas 10. which represents the same number would be a float.
I have a rather unusual request, but I hope someone might be able to help.
I am appending floats as values to numpy array.
I would like to make sure that float does not get rounded when its decimal part ends in 0.
For example:
I would like float 31.30 to stay 31.30 in the array, but what I am experiencing now is that it gets set to 31.3.
The reason I want to make sure the float does not change is that later on I am splitting it into two integers (31.30 into 31 and 30), and it is of critical importance for me that both integers have the 'complete' values.
Is there any way to get around this? I tried with making these floats strings, but my problem is that I need to compare the arrays with these floats and numpy.array_equal() does not seem to work for arrays of strings...
Any help will be greatly appreciated,
Best wishes
Since 31.3 and 31.30 are exactly the same number, there is no way to distinguish them. But I guess you don't need to anyway. This seems more like a formatting issue. If you always expect two digits after the dot:
from math import floor
x = 31.3
whole_part = int(floor(x))
two_after_dot = int(floor(x*100))-whole_part*100
print(whole_part, two_after_dot)
Output:
31 30
If this actually isn't the case and the number of digits after the dot should vary while also keeping varying numbers of trailing zeros, then you cannot use numeric types. Use strings from the very beginning instead.
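A minimal sketch of the string-based route follows, assuming the values are available as text such as "31.30" before they are ever turned into floats. The function name split_decimal_string is hypothetical. It also shows why the floor approach needs care: some values are stored slightly below their decimal form, so floor(x * 100) can come out one too low.

```python
from math import floor


def split_decimal_string(text):
    """Split a decimal string such as '31.30' into the integers (31, 30).

    Hypothetical helper; it works on the text, so trailing zeros survive.
    """
    whole, _, fraction = text.partition('.')
    return int(whole), int(fraction) if fraction else 0


pairs = [split_decimal_string(s) for s in ["31.30", "31.3", "7"]]
print(pairs)

# Pitfall of the numeric approach: 0.29 is stored slightly below 0.29.
floored = int(floor(0.29 * 100))
print(floored)
```

Lists of such integer pairs can be compared directly with ==. Note that a leading zero in the fraction ("31.05") is lost by int(), so keep the fraction as a string if that distinction matters.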
When I ran into a similar problem while converting millisecond references to microseconds, I had to convert to a string and loop over the string, adding the needed 0's until the length of the string was correct. Then, when the value was converted back to an integer, the calculations worked.
The data will be passed to strptime as a string.
import datetime as DT

def to_datetime(vals):  # wrapper name is illustrative; the original snippet was an unnamed function body
    nofrag, frag = vals.split('.')  # split off the fractional part of the seconds
    length = len(frag)
    # This converts the fractional string to microseconds, given unknown precision
    if length > 6:
        frag = frag[:6]  # keep only the first six digits
    else:
        while length < 6:
            frag = frag + '0'
            length += 1
    # strptime parses the whole seconds here; the microseconds are added afterwards
    nofrag_dt = DT.datetime.strptime(nofrag, '%Y%m%d %H%M%S')
    dt = nofrag_dt.replace(microsecond=int(frag))
    return dt
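The padding loop can also be written more compactly. The sketch below is one possible variant, assuming the same 'YYYYMMDD HHMMSS.fraction' input layout as above; parse_with_fraction is a made-up name. It truncates or right-pads the fraction to six digits and lets the %f directive of strptime read the microseconds.

```python
from datetime import datetime


def parse_with_fraction(text):
    """Parse 'YYYYMMDD HHMMSS[.fraction]' into a datetime with microseconds.

    Hypothetical helper; the input layout is an assumption based on the
    format string used in the snippet above.
    """
    whole, _, fraction = text.partition('.')
    # Truncate to six digits, or pad with trailing zeros up to six.
    fraction = fraction[:6].ljust(6, '0')
    return datetime.strptime(f"{whole}.{fraction}", '%Y%m%d %H%M%S.%f')


print(parse_with_fraction('20200101 120000.123'))
```

Working on the string first keeps the trailing zeros meaningful: '.5' becomes 500000 microseconds rather than 5.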
Part of my homework assignment is to write a function that will parse a string such as '-.4e-4' and identify any problems that would prevent it from being cast to a float. For example, in '10e4.5' I would need to detect the decimal in the exponent and provide a relevant error message.
I have attempted many things. The first and, of course, most basic is the try: except:. Attempt to cast it to a float and let Python do the heavy lifting. However, as far as I can see, the errors it can return are not descriptive enough for this assignment.
The second thing I tried was to normalize the string, replacing all digits with n, signs with s, decimals with d, exponents with e (the maketrans function from C made this very fast). Then, I cut down any repeated n's to a single n. I made a list of all valid float formats and checked if the normalized string was in that list. AKA, I white-listed it. It worked perfectly and rather time-efficiently, but again, no error checking. That code is posted below.
# Python 3 version: str.maketrans replaces the Python 2 string.maketrans
check_float_trans = str.maketrans("nsd0123456789-+.", "???nnnnnnnnnnssd")
check_float_valids = 'n sn sndn ndn ndnen dn sdn sdnen sdnesn dnesn dnen nen nesn snesn sn snen sndnen sndnesn ndnesn'.split()

def check_float(test):
    """Check if string <test> could be cast as a float, returns boolean."""
    test = test.translate(check_float_trans)
    # Collapse runs of repeated characters into a single character
    test = ''.join([a for a, b in zip(test, ' ' + test) if a != b])
    return test in check_float_valids
I was hoping someone here could give me some pointers. I don't want this handed to me, but I am relatively stuck. I tried guardian-coding it, trying to identify reasons why the string might not be castable as a float, but I could never put up enough walls to ensure that no bad strings got a false positive.
Thanks.
Here's what I would do... (also this is untested)
def isValid(expression):
    if 'e' in expression:
        number, exponent = expression.split('e')
    else:
        print("not a valid format, no 'e' in expression")
        return False
    # a bunch of other if statements here to check for
    # certain conditions like '.' in the exponent component
    return float(number) * 10 ** float(exponent)

if __name__ == '__main__':
    print(isValid('-.4e-4'))
    print(isValid('10e4.5'))
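To give an idea of what the "bunch of other if statements" might look like, here is a rough sketch that checks the mantissa and the exponent separately and returns a message for the first problem it finds. It is only one possible structure, not a complete validator: it ignores special values such as 'inf' and 'nan', and both function names are invented for this example.

```python
def _check_part(part, name, allow_dot):
    """Return an error message for one part of the literal, or None if it is fine."""
    digits = part[1:] if part[:1] in ('+', '-') else part  # drop one leading sign
    if '.' in digits and not allow_dot:
        return f"{name}: decimal point not allowed"
    if digits.count('.') > 1:
        return f"{name}: more than one decimal point"
    bare = digits.replace('.', '')
    if not bare:
        return f"{name}: no digits"
    for ch in bare:
        if ch not in '0123456789':
            return f"{name}: unexpected character {ch!r}"
    return None


def explain_float_problem(text):
    """Return None if text looks like a float literal, else a description of the problem."""
    mantissa, sep, exponent = text.strip().lower().partition('e')
    problem = _check_part(mantissa, "mantissa", allow_dot=True)
    if problem is None and sep:
        problem = _check_part(exponent, "exponent", allow_dot=False)
    return problem


for candidate in ['-.4e-4', '10e4.5', '1e', '1.2.3', '--1']:
    print(candidate, '->', explain_float_problem(candidate))
```

Splitting on the exponent marker first means each part only has to answer a small question, which makes it easier to attach a specific message to each failure.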
I'm a programming newbie having difficulty with Python multiplication. I have code like this:
def getPriceDiscount():
    price = int(input())
    if price > 3000:
        priceDiscount = price * 0.6
        return priceDiscount
    else:
        return price
But when I execute it and type an input which is a decimal number like 87.94, I get the following error:
ValueError: invalid literal for int() with base 10: '87.94'
Isn't the int() method able to convert the string '87.94' into a number allowing me to multiply it by 0.6? What should I do to perform that conversion?
I'm using Python 3.2.2.
Actually, you CAN pass a floating point value to int(). In that case int() just truncates the number toward zero and converts that value to an integer type.
However, what you're doing when you call int("87.94") is passing a string resembling a decimal point value to int(). int() can't directly convert from such a string to an integer. You hence must use float(), which can convert from strings to floats.
You can't pass a string with a decimal point to int(). You can use float() instead. If you want only the integral part i.e. to truncate the .94 in 87.94, you can do int(float('87.94')).
An int (short for "integer") is a whole number. A float (short for "floating-point number") is a number with a decimal point.
int() returns an int created from its input argument. You can use it to convert a string like "15" into the int 15, or a float like 12.059 into the int 12.
float() returns a float created from its input argument. You can use it to convert a string like "10.5" into the float 10.5, or even an int like 12 into the float 12.0.
If you want to force price to be an integer, but you want to accept a floating point number as typed input, you need to make the input a float first, then convert it with int():
price = int(float(input()))
Note that if you multiply an int by a float such as your discount factor, the result will be a float.
Also note that my example above doesn't round the number -- it just truncates the stuff after the decimal point. For example, if the input is "0.6" then price will end up being 0. Not what you want? Then you'll need to use the round() function on the float first.
price = int(round(float(input())))
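The difference between truncating and rounding is easiest to see side by side. The short sketch below uses a few sample values chosen for illustration, including a negative one, where int() and math.floor() disagree.

```python
import math

samples = [87.94, 0.6, -87.94, 2.5]
for value in samples:
    truncated = int(value)       # drops the fractional part (toward zero)
    floored = math.floor(value)  # rounds toward negative infinity
    rounded = round(value)       # rounds to nearest; ties go to the even integer
    print(value, truncated, floored, rounded)
```

Note that in Python 3 round() with a single argument already returns an int, so the extra int() call is harmless but not required.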
If you intended to use floating point calculations (which makes sense if we're talking about a commodity price), then don't perform the int conversion. Just use float. You may still want to do some rounding. If you want to round to 2 decimal places, you can call round with the second argument as 2:
price = round(float(input()),2)
Finally, you might want to look into Python's decimal module, since there are limitations when using floating point numbers. See here for more information: http://docs.python.org/tutorial/floatingpoint.html
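As a rough illustration of the decimal module for prices, the sketch below applies the same 0.6 factor above 3000 as in the question, but keeps the amount as a Decimal and rounds it to two places. The function name, the parameter names, and the choice of ROUND_HALF_UP are assumptions made for this example.

```python
from decimal import Decimal, ROUND_HALF_UP


def discounted_price(raw, threshold=Decimal("3000"), factor=Decimal("0.6")):
    """Return the (possibly discounted) price as a Decimal with two places.

    Hypothetical helper: raw is the text typed by the user, e.g. "87.94".
    """
    price = Decimal(raw)
    if price > threshold:
        price = price * factor
    # Round to cents; ROUND_HALF_UP is an assumed rounding rule.
    return price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(discounted_price("87.94"))
print(discounted_price("4000"))
```

Passing the input string straight to Decimal avoids the binary rounding that float("87.94") would introduce.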
def getPriceDiscount():
    while True:
        try:
            price, percent = float(input()), 0.6
            break
        except ValueError:
            # Not a number: ask again
            continue
    return price * percent if price > 3000 else price

print(getPriceDiscount())
The problem is that you're trying to do two conversions at a time, and one of them is implicit.
The following will work because there's an obvious way to transform the string '87' into the integer 87.
>>> int('87')
87
And for the same reason, the following will work too:
>>> float('87')
87.0
>>> float('87.94')
87.94
>>> int(87.94)
87
Keep in mind what's been said and look at the difference between:
>>> int(float('87.94'))
87
and
>>> int('87.94')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '87.94'
As you can see, the problem is that you're implying a float() conversion before passing to int(), but how could the compiler guess that? There are plenty of working conversions available for the string '87.94'. Should the compiler try them all before finding out which one will work with int? And what if there are two or more that return something that will work with int?
Examples:
>>> int(hash('87.94'))
4165905645189346193
>>> int(id('87.94'))
22325264
>>> int(max('87.94'))
9
>>> int(input('87.94'))
87.94111
111
Should the compiler choose float() for you?
I don't think so: Explicit is better than implicit.
Anyway, if you are going to multiply that number by the float 0.6, you could convert it directly with float().
Change this line:
price = int(input())
with:
price = float(input())
And everything will be ok.
Also, the operation you are presenting in your example is a multiplication, not a division. In case you are interested, there is the floor division operator //, which returns an integer instead of a float when both operands are integers. Take a look at PEP 238 for more information about this.
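A small sketch of floor division, with values picked only for illustration, shows that the result type follows the operands and that the result is floored rather than truncated:

```python
int_result = 7 // 2         # both operands are ints, so the result is an int
float_result = 7.0 // 2     # a float operand makes the result a float
negative_result = -7 // 2   # floors toward negative infinity, not toward zero

print(int_result, float_result, negative_result)
```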