Suppose I have a binary string: z = abc, where a, b, c are each either 0 or 1, so we can convert z into an integer from 0 to 7. Now I want to give a, b, c another 'layer' of value, where a = 1/2^1 = 1/2, b = 1/2^2 = 1/4, c = 1/2^3 = 1/8. My goal is to create a dictionary where the keys are the integers 0-7 and the values are the associated sums based on the a, b, c values.
The only way I'm able to solve this question is to 'brute force' the results. For example, when the key is 5 (z = 101), the value would be 1/2+0+1/8 = 5/8, and perform all calculations manually, then append the item to the dictionary. Is there a tool / method in python that will allow me to create the calculation faster? I really have no idea how I can do that. Any suggestions / help is appreciated.
One naïve approach would be to iterate over the bit-string, and multiply each bit by the matching power of 0.5:
res = 0
for i, bit in enumerate(z, 1):
    res += int(bit) * 0.5**i
For z = "101" this will give res as 0.625 which is 5/8
Could be compacted using sum:
res = sum(int(bit) * 0.5**i for i, bit in enumerate(z, 1))
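If z is actually an integer, replace each z above with format(z, '03b') to get its zero-padded, 3-bit binary string representation. The padding matters: a plain format(z, 'b') turns 1 into '1' rather than '001', which would assign the wrong weight to the bit.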
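To get the whole dictionary the question asks for, the sum above can be wrapped in a function and applied to every key. This is only a minimal sketch: the names bits_to_fraction, n_bits and table are made up for illustration, and a fixed width of 3 bits is assumed.

```python
def bits_to_fraction(bits):
    """Sum bit * (1/2)**position, with positions counted from 1 at the left."""
    return sum(int(bit) * 0.5**i for i, bit in enumerate(bits, 1))

n_bits = 3  # assumed width of the bit-string

# Zero-pad each key to n_bits so that, e.g., 1 becomes '001' and not '1'.
table = {key: bits_to_fraction(format(key, f"0{n_bits}b"))
         for key in range(2**n_bits)}

print(table[5])  # 0.625
```

All of these values are exact in binary floating point, since each one is a multiple of 1/8.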
Just to elaborate on my comment a bit:
for key, value in {bin(key)[2:]: key/8 for key in range(8)}.items():
    print(f"{key:>3}: {value}")
Output:
  0: 0.0
  1: 0.125
 10: 0.25
 11: 0.375
100: 0.5
101: 0.625
110: 0.75
111: 0.875
Is this the output you're looking for?
Another way would be to take advantage of vectorization:
import numpy as np
num =[1,0,1]
d = np.array(num)
r = 1 / np.logspace(1, len(num), num=len(num), base=2)
np.matmul(r,d)
output :
> 0.625
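The same idea can be pushed one step further to compute the values for all keys in one go. The following is only a sketch under the assumption of 3-bit keys; the names keys, shifts, bits, weights and table are illustrative.

```python
import numpy as np

n_bits = 3  # assumed width of the bit-string
keys = np.arange(2**n_bits)

# bits[k, j] is bit j of key k, most significant bit first.
shifts = np.arange(n_bits - 1, -1, -1)
bits = (keys[:, None] >> shifts) & 1

# weights are 1/2, 1/4, 1/8, ...
weights = 0.5 ** np.arange(1, n_bits + 1)

values = bits @ weights
table = dict(zip(keys.tolist(), values.tolist()))

print(table[5])  # 0.625
```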
I usually use
x = round(x, 3)
to round a number to the precision of 3 digits. Now I have this array:
[-1.10882605e-04 -2.01874994e-05 3.24209095e-05 -1.56917988e-05
-4.61406358e-05 1.99080610e-05 7.04079594e-05 2.64600122e-05
-3.53022316e-05 1.50542793e-05]
And using the same code just flattens everything down to 0. What I would like to have though is a function that gives me the most significant 3 digits rounded like it usually works for numbers larger than 1. Like this:
special_round(0.00034567, 3)
=
0.000346
Any idea how this could be done? Thanks!
Here is a solution that figures out the order of magnitude and does an element-wise rounding.
Note that this will only work correctly for values < 1 and > -1, which I guess is a valid assumption regarding your example data.
import numpy as np
a = np.array([-1.10882605e-04, -2.01874994e-05, 3.24209095e-05, -1.56917988e-05,
-4.61406358e-05, 1.99080610e-05, 7.04079594e-05 , 2.64600122e-05,
-3.53022316e-05 , 1.50542793e-05])
def special_round(vec):
    exponents = np.floor(np.log10(np.abs(vec))).astype(int)
    return np.stack([np.round(v, decimals=-e+3) for v, e in zip(vec, exponents)])
b = special_round(a)
>>> array([-1.109e-04, -2.019e-05, 3.242e-05, -1.569e-05, -4.614e-05,
1.991e-05, 7.041e-05, 2.646e-05, -3.530e-05, 1.505e-05])
The problem is that most of these decimal values cannot be represented exactly in binary floating point, so some artifacts show up in the output seemingly for no reason.
def special_round(number, precision):
    if number == 0:
        return 0.0  # nothing to scale; also avoids looping forever
    negative = number < 0
    number = abs(number)
    i = 0
    # scale the number into the range [1, 10), counting the shifts in i
    while number < 1 or number >= 10:
        if number < 1:
            i += 1
            number *= 10
        else:
            i -= 1
            number /= 10
    rounded = round(number, precision)
    if negative:
        rounded = -rounded
    return rounded * (10 ** -i)
Output:
[-0.0001109, -2.019e-05, 3.2420000000000005e-05, -1.569e-05, -4.614e-05, 1.9910000000000004e-05, 7.041000000000001e-05, 2.646e-05, -3.5300000000000004e-05, 1.505e-05]
You can do so by creating a specific function using the math package:
from math import log10 , floor
import numpy as np
def round_it(x, sig):
    return round(x, sig-int(floor(log10(abs(x))))-1)
a = np.array([-1.10882605e-04, -2.01874994e-05, 3.24209095e-05, -1.56917988e-05,
-4.61406358e-05, 1.99080610e-05, 7.04079594e-05, 2.64600122e-05,
-3.53022316e-05, 1.50542793e-05])
round_it_np = np.vectorize(round_it) # vectorize the function to apply on numpy array
round_it_np(a, 3) # 3 is rounding with 3 significant digits
This will result in
array([-1.11e-04, -2.02e-05, 3.24e-05, -1.57e-05, -4.61e-05, 1.99e-05,
7.04e-05, 2.65e-05, -3.53e-05, 1.51e-05])
Here is a solution:
from math import log10, ceil
def special_round(x, n):
    lx = log10(abs(x))
    if lx >= 0:
        return round(x, n)
    return round(x, n - ceil(lx))

for x in [10.23456, 1.23456, 0.23456, 0.023456, 0.0023456]:
    print(x, special_round(x, 3))
    print(-x, special_round(-x, 3))
Output:
10.23456 10.235
-10.23456 -10.235
1.23456 1.235
-1.23456 -1.235
0.23456 0.235
-0.23456 -0.235
0.023456 0.0235
-0.023456 -0.0235
0.0023456 0.00235
-0.0023456 -0.00235
You can use the common logarithm (provided by the built-in math module) to calculate the position of the first significant digit in your number (with 2 representing the hundreds, 1 representing the tens, 0 representing the ones, -1 representing the 0.x, -2 representing the 0.0x and so on...). Knowing the position of the first significant digit, you can use it to properly round the number.
import math
def special_round(n, significant_digits=0):
    first_significant_digit = math.ceil((math.log10(abs(n))))
    round_digits = significant_digits - first_significant_digit
    return round(n, round_digits)
>>> special_round(0.00034567, 3)
0.000346
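For comparison, here is a different technique from the logarithm-based answers above: Python's 'g' format specifier already rounds to a given number of significant digits, so one can format and parse back. This is a sketch only; the name round_sig is made up, and it assumes plain Python floats rather than NumPy arrays (for an array it could be wrapped with np.vectorize as shown earlier).

```python
def round_sig(x, sig=3):
    """Round x to `sig` significant digits via the 'g' format specifier."""
    return float(f"{x:.{sig}g}")

print(round_sig(0.00034567))        # 0.000346
print(round_sig(-1.10882605e-04))   # -0.000111
print(round_sig(1234.5))            # 1230.0
```

Unlike the log10 versions, this also accepts 0 without raising an error.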
I have a column (version number) with more than 200k occurrences stored as floats, for instance 1.2, 0.2, ...
I need to sum both sides of the decimal point into a new column (total version), so that the example above gives me 3 and 2, i.e. just integer numbers.
Any advice?
Here is a solution that should be very easy to understand. I can also write it as a one-liner if you want.
mylist = [1.3, 2.6, 3.1]
number = 0
fractions = 0
for value in mylist:
    # split the string form at the decimal point
    (whole, frac) = str(value).split('.')
    number = number + int(whole)
    fractions = fractions + int(frac)
print("Number: " + str(number))
print("Fractions: " + str(fractions))
This gives:
Number: 6
Fractions: 10
Do not use str(x).split('.') !
The one comment and the two other answers are currently suggesting to get the integer and fractional parts of a number x using
i,f = (int(s) for s in str(x).split('.'))
While this does give a result, I believe it is a bad idea.
The problem is, if you expect a meaningful result, you need to specify the precision of the fractional part explicitly. "1.20" and "1.2" are two string representations of the same number, but 20 and 2 are two very different integers. In addition, floating-point numbers are subject to precision errors, and you could easily find yourself with a number like "1.19999999999999999999999", which is only a small rounding error away from "1.2", but results in a completely different result with this str(x).split('.') approach.
One way to avoid this chaotic behaviour is to set a precision, i.e. a number of decimal places, and stick to it. For instance, when dealing with monetary values, we're used to talking about cents; although 1.5€ and 1.50€ are technically both valid, you'll always hear people say "one euro fifty" and never "one euro five". If you hear someone say "one euro oh five", it actually means 1.05€. We always add exactly two decimal places.
With this approach, there is no chaotic behaviour of 1.2 becoming (1,2) or (1,20) or (1,1999999999). If you fixed the number of decimal places to 2, then 1.2 will always map to (1,20) and that's that.
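To make the pitfall concrete, here is a small demonstration. The helper name split_via_str is made up for illustration; it just wraps the one-liner criticised above.

```python
def split_via_str(x):
    """The string-splitting approach, wrapped in a function for testing."""
    whole, frac = str(x).split(".")
    return int(whole), int(frac)

print(split_via_str(1.2))        # (1, 2)
print(split_via_str(1.02))       # (1, 2)  <- '02' also parses as 2
print(split_via_str(1.1 + 2.2))  # (3, 3000000000000003)
```

So 1.2 and 1.02 become indistinguishable, and a tiny floating-point error turns into a huge "fractional part".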
A more standard way
Here are two standard ways of getting the integer and fractional parts of a number in python:
x = 1.20
# method 1
i = int(x)
f = x - i
# i = 1 and f = 0.2; i is an int and f a float
# method 2
import math
f, i = math.modf(x)
# i = 1.0 and f = 0.2; i and f are both floats
(EDIT: There is also a third method, pandas' divmod function. See user2314737's answer.)
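Once you've done that, you can turn the fractional part f into an integer by multiplying it by the chosen power of 10 and rounding the result. Rounding rather than truncating with int() matters here: in floating point, 1.2 - 1 is 0.19999999999999996, so int(f * 100) would give 19.
f = round(f * 100)
# f = 20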
Finally you can apply this method to a whole list:
data = [13.0, 14.20, 12.299, 4.414]
def intfrac_pair(x, decimal_places):
    i = int(x)
    # round() instead of int(): x - i is often slightly below the exact value
    f = round((10**decimal_places) * (x - i))
    return (i, f)
data_as_pairs = [intfrac_pair(x, 2) for x in data]
# = [(13, 0), (14, 20), (12, 30), (4, 41)]
sum_of_integer_parts = sum(i for i,f in data_as_pairs) # = 43
sum_of_fractional_parts = sum(f for i,f in data_as_pairs) # = 91
The following should work:
df['total_number']=[sum([int(i) for i in str(k).split('.')]) for k in df.version_number]
You can use divmod on the column:
import pandas as pd
df = pd.DataFrame([1.2, 2.3, 3.4, 4.5, 0.1])
df
# 0
# 0 1.2
# 1 2.3
# 2 3.4
# 3 4.5
# 4 0.1
df['i'], df['d'] = df[0].divmod(1)
df
# Out:
# 0 i d
# 0 1.2 1.0 0.2
# 1 2.3 2.0 0.3
# 2 3.4 3.0 0.4
# 3 4.5 4.0 0.5
# 4 0.1 0.0 0.1
To sum row-wise as integers (a precision is needed, here I use p=1 assuming the original floats contain only one decimal digit) :
p = 1
df['s'] = (df['i']+10**p*df['d'].round(decimals=p)).astype(int)
df
# Out:
# 0 i d s
# 0 1.2 1.0 0.2 3
# 1 2.3 2.0 0.3 5
# 2 3.4 3.0 0.4 7
# 3 4.5 4.0 0.5 9
# 4 0.1 0.0 0.1 1
Sum by columns:
df.sum()
# Out:
# 0 11.5
# i 10.0
# d 1.5
Note: this will only work for non-negative numbers, as for instance divmod(-3.4, 1) returns (-4.0, 0.6).
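If negative values can occur, one possible workaround is np.modf, which truncates towards zero instead of flooring. This is just a sketch with made-up sample data and an assumed precision p of one decimal place; how the signed parts should then be combined depends on what you need.

```python
import numpy as np

values = np.array([1.2, -3.4, 0.1])  # illustrative data, not from the question
p = 1                                # assumed number of decimal places

# np.modf returns (fractional part, integral part); both keep the input's sign.
frac, whole = np.modf(values)
digits = np.round(frac * 10**p)      # fractional digits as whole numbers

print(whole.tolist())   # [1.0, -3.0, 0.0]
print(digits.tolist())  # [2.0, -4.0, 1.0]
```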
Thank you all. I finally managed it in a rather crude but effective way. Before splitting, I converted the column to strings:
Allfiles['Version'] = Allfiles['Version'].round(3).astype(str)
Note that I rounded to 3 digits because a number like 2.111 was otherwise turned into 2.11099999999999999999.
Then I did the split, creating a new column for the minor versions (and keeping the major versions in the original column):
Allfiles[['Version', 'minor']] = Allfiles['Version'].str.split('.', expand=True)
Then I converted both columns back to integers and summed them into the first column:
Allfiles['Version'] = Allfiles['Version'].astype(int) + Allfiles['minor'].astype(int)
(My dataframe is named Allfiles and the column Version, as you can imagine.)
Is there any way to output the difference between two float numbers as an integer?
Below are three examples of the float values provided to the script. My goal is to output the difference between these values as an integer. In the first example I should get 2, since num_two - num_one equals 0.000002, but I don't want the zeros as they don't matter. I can do it with string formatting, but I have no way of telling how big the number is or how many zeros it has.
## example 1
num_one = 0.000012
num_two = 0.000014
## example 2
num_0ne = 0.0123
num_tw0 = 0.013
## example 3
num_1 = 23.32
num_2 = 23.234
print (float(num_2) - float(num_1))
## this should output 86 as an integer
Beware of floats (see https://en.wikipedia.org/wiki/IEEE_754):
>>> 23.32 - 23.234
0.08599999999999852
You need exact precision. Use the decimal module:
>>> from decimal import Decimal
>>> n1 = Decimal("23.32")
>>> n2 = Decimal("23.234")
>>> n1, n2
(Decimal('23.32'), Decimal('23.234'))
>>> d = abs(n1-n2)
>>> d
Decimal('0.086')
Now, just shift the decimal point to the right (that is, multiply by 10) until there is no fractional part left (d % 1 == 0):
>>> while d % 1:
...     d *= 10
(Don't worry, the loop will stop: the number of decimal places cannot exceed decimal.getcontext().prec to begin with, and it decreases by one on each iteration.)
You get the expected result:
>>> d
Decimal('86.000')
>>> int(d)
86
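The steps above could be packaged into a function. The sketch below uses a slightly different route to the same result (normalize() to drop trailing zeros, then scaleb() to shift the decimal point in one step instead of looping). The name diff_as_int is made up, and it assumes the inputs arrive as strings so that no float error creeps in.

```python
from decimal import Decimal

def diff_as_int(a, b):
    """Absolute difference of two decimal strings, with the decimal point removed."""
    d = abs(Decimal(a) - Decimal(b)).normalize()  # normalize drops trailing zeros
    exponent = d.as_tuple().exponent
    if exponent < 0:
        d = d.scaleb(-exponent)  # shift the decimal point to the right
    return int(d)

print(diff_as_int("0.000012", "0.000014"))  # 2
print(diff_as_int("0.0123", "0.013"))       # 7
print(diff_as_int("23.32", "23.234"))       # 86
```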
The following is my script. Each equal part has self.number samples, in0 is input sample. There is an error as follows:
pn[i] = pn[i] + d
IndexError: list index out of range
Is this the problem about the size of pn? How can I define a list with a certain size but no exact number in it?
for i in range(0,len(in0)/self.number):
pn = []
m = i*self.number
for d in in0[m: m + self.number]:
pn[i] += d
if pn[i] >= self.alpha:
out[i] = 1
elif pn[i] <= self.beta:
out[i] = 0
else:
if pn[i] >= self.noise:
out[i] = 1
else:
out[i] = 0
if pn[i] >= self.noise:
out[i] = 1
else:
out[i] = 0
There are a number of problems in the code as posted, however, the gist seems to be something that you'd want to do with numpy arrays instead of iterating over lists.
For example, the set of if/else cases that check if pn[i] >= some_value and then sets a corresponding entry into another list with the result (true/false) could be done as a one-liner with an array operation much faster than iterating over lists.
import numpy as np
# for example, assuming you have 9 numbers in your list
# and you want them divided into 3 sublists of 3 values each
# in0 is your original list, which for example might be:
in0 = [1.05, -0.45, -0.63, 0.07, -0.71, 0.72, -0.12, -1.56, -1.92]
# convert into array
in2 = np.array(in0)
# reshape to 3 rows, the -1 means that numpy will figure out
# what the second dimension must be.
in2 = in2.reshape((3,-1))
print(in2)
output:
[[ 1.05 -0.45 -0.63]
[ 0.07 -0.71 0.72]
[-0.12 -1.56 -1.92]]
With this 2-d array structure, element-wise summing is super easy. So is element-wise threshold checking. Plus 'vectorizing' these operations has big speed advantages if you are working with large data.
# add corresponding entries, we want to add the columns together,
# as each row should correspond to your sub-lists.
pn = in2.sum(axis=0) # you can sum row-wise or column-wise, or all elements
print(pn)
output: [ 1. -2.72 -1.83]
# it is also trivial to check the threshold conditions
# here I check each entry in pn against a scalar
alpha = 0.0
out1 = ( pn >= alpha )
print(out1)
output: [ True False False]
# you can easily convert booleans to 1/0
x = out1.astype('int') # or simply out1 * 1
print(x)
output: [1 0 0]
# if you have a list of element-wise thresholds
beta = np.array([0.0, 0.5, -2.0])
out2 = (pn >= beta)
print(out2)
output: [True False True]
I hope this helps. Using the correct data structures for your task can make the analysis much easier and faster. There is a wealth of documentation on numpy, which is the standard numeric library for python.
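Putting the pieces together for the question's setup, where each group of consecutive samples is summed and compared with a threshold, might look like the sketch below. Note that summing within each group means summing along each row (axis=1) once the data is reshaped to one group per row. The function name chunk_decisions and the single threshold parameter are assumptions; the original script's alpha/beta/noise logic is not reproduced here.

```python
import numpy as np

def chunk_decisions(in0, number, threshold):
    """Sum each run of `number` consecutive samples and threshold the sums."""
    samples = np.asarray(in0, dtype=float)
    n_chunks = len(samples) // number
    # drop any incomplete trailing chunk, then put one chunk per row
    chunks = samples[:n_chunks * number].reshape(n_chunks, number)
    pn = chunks.sum(axis=1)              # one sum per chunk
    out = (pn >= threshold).astype(int)  # 1 where the sum reaches the threshold
    return pn, out

pn, out = chunk_decisions([1, 2, 3, -1, -2, -3, 0, 0, 5], number=3, threshold=0.0)
print(pn.tolist())   # [6.0, -6.0, 5.0]
print(out.tolist())  # [1, 0, 1]
```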
You initialize pn to an empty list just inside the for loop, never assign anything into it, and then attempt to access an index i. There is nothing at index i because there is nothing at any index in pn yet.
for i in range(0, len(in0) / self.number):
    pn = []
    m = i*self.number
    for d in in0[m: m + self.number]:
        pn[i] += d
If you are trying to add the value d to the pn list, you should do this instead:
pn.append(d)
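If instead you want pn to hold one running sum per chunk, which is what pn[i] += d suggests, the list has to exist with the right size before the loop starts. A minimal sketch, assuming plain lists and using a made-up function name chunk_sums with number standing in for self.number:

```python
def chunk_sums(in0, number):
    """Return the sum of each run of `number` consecutive samples."""
    n_chunks = len(in0) // number   # integer division so range() accepts it
    pn = [0.0] * n_chunks           # pre-sized list, created once, outside the loop
    for i in range(n_chunks):
        m = i * number
        for d in in0[m: m + number]:
            pn[i] += d
    return pn

print(chunk_sums([1, 2, 3, 4, 5, 6], 3))  # [6.0, 15.0]
```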
I'm building a range between two numbers (floats) and I'd like this range to be of an exact fixed length (no more, no less). range and arange work with steps, instead. To put things into pseudo Python, this is what I'd like to achieve:
start_value = -7.5
end_value = 0.1
my_range = my_range_function(start_value, end_value, length=6)
print my_range
[-7.50,-5.98,-4.46,-2.94,-1.42,0.10]
This is essentially equivalent to the R function seq which can specify a sequence of a given length. Is this possible in Python?
Thanks.
Use linspace() from NumPy.
>>> from numpy import linspace
>>> linspace(-7.5, 0.1, 6)
array([-7.5 , -5.98, -4.46, -2.94, -1.42, 0.1])
>>> linspace(-7.5, 0.1, 6).tolist()
[-7.5, -5.9800000000000004, -4.46, -2.9399999999999995, -1.4199999999999999, 0.10000000000000001]
It should be the most efficient and accurate.
See Recipe 66472: frange(), a range function with float increments (Python) with various float implementations, their pros and cons.
Alternatively, if precision is important to you, work with decimal.Decimal instead of float (convert to and then back) as answered in Python decimal range() step value.
def my_function(start, end, length):
    n = length - 1  # number of intervals (avoids shadowing the built-in len)
    incr = (end - start) / n
    r = [start]
    for i in range(n):
        r.append(r[i] + incr)
    return r
How about this:
def my_range_function(start, end, length):
    if length <= 1:
        return [start]
    step = (end - start) / (length - 1)
    return [start + i * step for i in range(length)]
For your sample range, it returns:
[-7.5, -5.9800000000000004, -4.46,
-2.9399999999999995, -1.4199999999999999, 0.099999999999999645]
Of course it's full of rounding errors, but that's what you get when working with floats.
In order to handle the rounding errors, the following code utilizes Python's decimal module. You can set the rounding; for this sample I've set it to two decimal points via round_setting = '.01'. In order to handle any rounding errors, the last step is adjusted to the remainder.
Code
#!/usr/bin/env python
# encoding: utf-8
from __future__ import print_function
import math
import decimal
start_value = -7.5
end_value = 0.1
num_of_steps = 6
def my_range(start_value, end_value, num_of_steps):
    round_setting = '.01'
    start_decimal = decimal.Decimal(str(start_value)).quantize(
        decimal.Decimal(round_setting))
    end_decimal = decimal.Decimal(str(end_value)).quantize(
        decimal.Decimal(round_setting))
    num_of_steps_decimal = decimal.Decimal(str(num_of_steps)).quantize(
        decimal.Decimal(round_setting))
    step_decimal = ((end_decimal - start_decimal) /
                    num_of_steps_decimal).quantize(decimal.Decimal(round_setting))
    # Change the last step in case there are rounding errors
    last_step_decimal = (end_decimal - ((num_of_steps - 1) * step_decimal) -
                         start_decimal).quantize(decimal.Decimal(round_setting))
    print('Start value = ', start_decimal)
    print('End value = ', end_decimal)
    print('Number of steps = ', num_of_steps)
    print('Normal step for range = ', step_decimal)
    print('Last step used for range = ', last_step_decimal)

my_range(start_value, end_value, num_of_steps)
Output
$ ./fixed_range.py
Start value = -7.50
End value = 0.10
Number of steps = 6
Normal step for range = 1.27
Last step used for range = 1.25
From there you can use the normal step and the last step to create your list.
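As a variation on the same idea, here is a sketch that returns the list directly. It differs from the code above in two ways that are my own assumptions: length counts the number of points (so there are length - 1 steps, as in the question), and the step is not quantized, so no separate last step is needed when the step is exactly representable as a Decimal. The name decimal_range is made up.

```python
from decimal import Decimal

def decimal_range(start, end, length):
    """Return `length` evenly spaced Decimal values from start to end inclusive."""
    start_d = Decimal(str(start))
    end_d = Decimal(str(end))
    if length <= 1:
        return [start_d]
    step = (end_d - start_d) / (length - 1)
    # compute each point from the start to avoid accumulating errors
    return [start_d + i * step for i in range(length)]

points = decimal_range(-7.5, 0.1, 6)
print([float(p) for p in points])  # [-7.5, -5.98, -4.46, -2.94, -1.42, 0.1]
```

If the step has no finite decimal expansion (for example a span of 1 split into 3 steps), it is rounded to the current decimal context precision, so the last point may differ from end in the final digit.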