Pandas shows inconsistency in rounding of floats to one decimal place


I have data that looks like this:
data = [[(21.2071607142856,)], [(Decimal('0.11904761904761904762'),)], [(9.54183035714285,)], [(9.54433035714284,)], [(17.1964285714286,)]]
As you can see, all of the values are floats except one, which is of type Decimal.
Now I need to limit the values to one decimal place. This is the script I use to do that with pandas:
formatted_result_list = []
for sub_result in data:
    formatted_result = pd.DataFrame(sub_result).round(1).fillna("").to_records(index=False).tolist()
    formatted_result_list.append(formatted_result)
return formatted_result_list
This is what I get:
[[(21.2,)], [(Decimal('0.11904761904761904762'),)], [(9.5,)], [(9.5,)], [(17.2,)]]
It limits the floats to one decimal place, but it is unable to round the value of type Decimal. So I changed the third line to this:
    # use .astype(float)
    formatted_result = pd.DataFrame(sub_result).astype(float).round(1).fillna("").to_records(index=False).tolist()
Now I get this:
[[(21.2,)], [(0.1,)], [(9.5,)], [(9.5,)], [(17.2,)]]
But it doesn't work for data like this:
data = [[('A', 204.593564568,), ('B', 217.421341061, 23.33), ('C', 237.296250326, 20.33), ('D', 217.464281998, 34.44), ('E', 206.329901299, 55.213)], [('F', 210.297625953,), ('G', 228.117692718, 34.22), ('H', 4, 0.99), ('I', 265.319671257, 90.99), ('K',)]]
Here the output is literally the same as the input.
So what can I do to ensure that a Decimal is converted to float and rounded, and that a float is always rounded?

Data for test:
#data = [[(21.2071607142856,)], [(Decimal('0.11904761904761904762'),)], [(9.54183035714285,)], [(9.54433035714284,)], [(17.1964285714286,)]]
data = [[('A', Decimal(204.593564568),), ('B', 217.421341061, 23.33), ('C', 237.296250326, 20.33), ('D', 217.464281998, 34.44), ('E', 206.329901299, 55.213)], [('F', 210.297625953,), ('G', 228.117692718, 34.22), ('H', 4, 0.99), ('I', 265.319671257, 90.99), ('K',)]]
#data = [21.2071607142856,Decimal(204.593564568)]
I tried to create a general solution that works with tuples and scalars, and also with Decimal. It rounds to 2 places here; change the second argument of round to 1 for one decimal place:
import pandas as pd
from decimal import Decimal

def round_custom(x):
    out = []
    for y in x:
        if isinstance(y, tuple):
            # round only the float/Decimal members of the tuple
            L = [round(float(z), 2) if isinstance(z, (Decimal, float)) else z for z in y]
            out.append(tuple(L))
        elif isinstance(y, (Decimal, float)):
            out.append(round(float(y), 2))
        else:
            # any other type: return the column unchanged
            return x
    return pd.Series(out, name=x.name)

df = pd.DataFrame(data).apply(round_custom).values.tolist()
print(df)
[[('A', 204.59), ('B', 217.42, 23.33), ('C', 237.3, 20.33),
('D', 217.46, 34.44), ('E', 206.33, 55.21)], [('F', 210.3),
('G', 228.12, 34.22), ('H', 4, 0.99), ('I', 265.32, 90.99), ('K',)]]
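For comparison, the same idea can be sketched without pandas. This is only a minimal sketch and not the answer's method: round_nested is a hypothetical helper name, and it assumes the data consists solely of nested lists and tuples whose leaves are floats, Decimals, or values that should be left alone. The ndigits parameter defaults to 1 to match the one-decimal requirement in the question.

```python
from decimal import Decimal


def round_nested(obj, ndigits=1):
    """Round every float or Decimal found in nested lists/tuples (hypothetical helper)."""
    if isinstance(obj, (float, Decimal)):
        # Convert Decimal to float first so the result type is uniform.
        return round(float(obj), ndigits)
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type with rounded members.
        return type(obj)(round_nested(item, ndigits) for item in obj)
    # Strings, ints and anything else pass through unchanged.
    return obj


data = [[(21.2071607142856,)], [(Decimal('0.11904761904761904762'),)],
        [('H', 4, 0.99), ('K',)]]
print(round_nested(data))
# [[(21.2,)], [(0.1,)], [('H', 4, 1.0), ('K',)]]
```

Because the recursion rebuilds each container, ragged tuples such as ('K',) need no special handling.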

Related

Creating a new list based on lists of tuples

Let's assume there is a list of tuples:
for something in x.something():
    print(something)
and it returns
('a', 'b')
('c', 'd')
('e', 'f')
('g', 'h')
('i', 'j')
And I have created two other lists containing certain elements from x.something():
y = [('a', 'b'), ('c', 'd')]
z = [('e', 'f'), ('g', 'h')]
I want to build a new list that labels each tuple from x.something() according to whether it appears in y or z:
newlist = []
for something in x.something():
    if something in 'y':
        newlist.append('color1')
    elif something in 'z':
        newlist.append('color2')
    else:
        newlist.append('color3')
What I would like is for newlist to look like:
['color1', 'color1', 'color2', 'color2', 'color3']
But I get:
TypeError: 'in <string>' requires string as left operand, not tuple
What went wrong, and how do I fix it?

I think you want if something in y instead of if something in 'y', because y and z are two separate lists, not strings:
newlist = []
for something in x.something():
    if something in y:
        newlist.append('color1')
    elif something in z:
        newlist.append('color2')
    else:
        newlist.append('color3')

You should remove the quotes from if something in 'y', because with the quotes Python checks whether something is contained in the string 'y' rather than in the list y. The same goes for z.

Try this:
t = [('a', 'b'),
     ('c', 'd'),
     ('e', 'f'),
     ('g', 'h'),
     ('i', 'j')]
y = [('a', 'b'), ('c', 'd')]
z = [('e', 'f'), ('g', 'h')]
new_list = []
for x in t:
    if x in y:
        new_list.append('color1')
    elif x in z:
        new_list.append('color2')
    else:
        new_list.append('color3')
print(new_list)
Output:
['color1', 'color1', 'color2', 'color2', 'color3']
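If y and z grow large, each in test against a list is a linear scan. One possible alternative, sketched here under the assumption that every tuple is hashable, is to build a dictionary from tuple to label once and look each item up. The names pairs and color_of are made up for this illustration.

```python
pairs = [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h'), ('i', 'j')]
y = [('a', 'b'), ('c', 'd')]
z = [('e', 'f'), ('g', 'h')]

# Map each known tuple to its label. z is added first and y second, so a
# tuple present in both lists gets 'color1', matching the if/elif order.
color_of = {pair: 'color2' for pair in z}
color_of.update({pair: 'color1' for pair in y})

# Anything not listed in y or z falls back to 'color3'.
new_list = [color_of.get(pair, 'color3') for pair in pairs]
print(new_list)
# ['color1', 'color1', 'color2', 'color2', 'color3']
```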

Select first item in each list

Here is my list:
[(('A', 'B'), ('C', 'D')), (('E', 'F'), ('G', 'H'))]
Basically, I'd like to get:
[('A', 'C'), ('E', 'G')]
So, I'd like to select the first elements from the lowest-level tuples and build mid-level tuples from them.
====================================================
Additional explanation below:
I could just zip them by
list(zip([w[0][0] for w in list1], [w[1][0] for w in list1]))
But later I'd like to add a condition: the second elements in the lowest level lists must be 'B' and 'D' respectively, so the final outcome should be:
[('A', 'C')] # ('E', 'G') must be filtered out
I'm a beginner and can't find this case anywhere. I would be grateful for help.

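I'd do it the following way (using list1 as the name, since list would shadow the built-in):
list1 = [(('A', 'B'), ('C', 'D')), (('E', 'F'), ('G', 'H'))]
out = []
for i in list1:
    listAux = []
    for j in i:
        listAux.append(j[0])  # first element of each inner tuple
    out.append((listAux[0], listAux[1]))
print(out)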
I hope that's what you're looking for.
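The loop above can also be written as a comprehension, which makes it straightforward to add the condition from the question (second elements must be 'B' and 'D'). This is a sketch only; firsts, filtered and wanted_seconds are names chosen for the illustration.

```python
list1 = [(('A', 'B'), ('C', 'D')), (('E', 'F'), ('G', 'H'))]

# First element of every inner pair, grouped per outer tuple.
firsts = [tuple(pair[0] for pair in group) for group in list1]
print(firsts)    # [('A', 'C'), ('E', 'G')]

# Keep a group only if its second elements are exactly ('B', 'D').
wanted_seconds = ('B', 'D')
filtered = [
    tuple(pair[0] for pair in group)
    for group in list1
    if tuple(pair[1] for pair in group) == wanted_seconds
]
print(filtered)  # [('A', 'C')]
```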

How to extract colon separated values from the same line?

I am using python regular expressions. I want all colon separated values in a line.
e.g.
input = 'a:b c:d e:f'
expected_output = [('a','b'), ('c', 'd'), ('e', 'f')]
But when I do
>>> re.findall('(.*)\s?:\s?(.*)','a:b c:d')
I get
[('a:b c', 'd')]
I have also tried
>>> re.findall('(.*)\s?:\s?(.*)[\s$]','a:b c:d')
[('a', 'b')]

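The following code works for me (\S+ matches a run of non-whitespace characters, so a match cannot spill over into the next pair):
import re
inpt = 'a:b c:d e:f'
re.findall(r'(\S+):(\S+)', inpt)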
Output:
[('a', 'b'), ('c', 'd'), ('e', 'f')]
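To see why the original pattern fails, it can help to run both side by side. In the sketch below, the first call reproduces the greedy behaviour from the question, and the last pattern is an assumed variant that also tolerates spaces around the colon, which the question's \s? suggests may be wanted.

```python
import re

line = 'a:b c:d e:f'

# Greedy (.*) grabs as much as it can and only gives back enough to reach
# the LAST colon, so everything before it lands in group 1.
greedy = re.findall(r'(.*)\s?:\s?(.*)', 'a:b c:d')
print(greedy)   # [('a:b c', 'd')]

# \S+ cannot cross whitespace, so each match stays inside one pair.
pairs = re.findall(r'(\S+):(\S+)', line)
print(pairs)    # [('a', 'b'), ('c', 'd'), ('e', 'f')]

# Assumed variant: allow optional whitespace on both sides of the colon.
spaced = re.findall(r'(\S+)\s*:\s*(\S+)', 'a : b c:d')
print(spaced)   # [('a', 'b'), ('c', 'd')]
```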

Use split instead of a regex, and avoid variable names that shadow built-ins such as input:
inpt = 'a:b c:d e:f'
k = [tuple(i.split(':')) for i in inpt.split()]
print(k)
# [('a', 'b'), ('c', 'd'), ('e', 'f')]

The easiest way is to use a list comprehension with split:
[tuple(ele.split(':')) for ele in input.split(' ')]
Driver values:
IN : input = 'a:b c:d e:f'
OUT : [('a', 'b'), ('c', 'd'), ('e', 'f')]

You may use
list(map(lambda x: tuple(x.split(':')), input.split()))
where:
input.split() splits the line on whitespace:
>>> input.split()
['a:b', 'c:d', 'e:f']
lambda x: tuple(x.split(':')) is a function that converts a string to a tuple, e.g. 'a:b' => ('a', 'b')
map applies that function to every list element and returns a map object (in Python 3), which list then converts to a list
Result:
>>> list(map(lambda x: tuple(x.split(':')), input.split()))
[('a', 'b'), ('c', 'd'), ('e', 'f')]

Finding the index in a tuple Python

tuple = ('e', (('f', ('a', 'b')), ('c', 'd')))
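How can I get the position of each leaf, treating the tuple as a binary tree (0 = left, 1 = right)?
[('e', '0'), ('f', '100'), ('a', '1010'), ('b', '1011'), ('c', '110'), ('d', '111')]
Is there anything like indexOf? With the tuple bound to the name arvore, the elements are reached like this: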
arvore[0] # = e
arvore[1][0][0] # = f
arvore[1][0][1][0] # = a
arvore[1][0][1][1] # = b
arvore[1][1][0] # = c
arvore[1][1][1] # = d

You need to traverse the tuple recursively (like a tree):
def traverse(t, trail=''):
    if isinstance(t, str):
        yield t, trail
        return
    for i, subtree in enumerate(t): # left - 0, right - 1
        # yield from traverse(subtree, trail + str(i)) in Python 3.3+
        for x in traverse(subtree, trail + str(i)):
            yield x
Usage:
>>> t = ('e', (('f', ('a', 'b')), ('c', 'd')))
>>> list(traverse(t))
[('e', '0'), ('f', '100'), ('a', '1010'), ('b', '1011'), ('c', '110'), ('d', '111')]
BTW, don't use tuple as a variable name. It shadows the built-in type tuple.
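In Python 3.3+ the inner loop collapses to yield from, as the comment in the answer notes. The sketch below also turns the result into a dictionary for indexOf-style lookups and adds follow, a hypothetical helper (not part of the answer) that walks a path string back to its leaf. It assumes every leaf is a distinct string.

```python
def traverse(tree, trail=''):
    """Yield (leaf, path) pairs; each path digit is the child index taken."""
    if isinstance(tree, str):
        yield tree, trail
        return
    for i, subtree in enumerate(tree):  # 0 = left, 1 = right
        yield from traverse(subtree, trail + str(i))


def follow(tree, path):
    """Hypothetical inverse: walk the digits of path down from the root."""
    node = tree
    for digit in path:
        node = node[int(digit)]
    return node


tree = ('e', (('f', ('a', 'b')), ('c', 'd')))
path_of = dict(traverse(tree))   # leaf -> path lookup
print(path_of['a'])              # 1010
print(follow(tree, '1010'))      # a
```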

Smart way to delete tuples

I have a list of tuples as described below (the list is sorted in decreasing order of the second value):
from string import ascii_letters
myTup = zip(ascii_letters, range(10)[::-1])
threshold = 5.5
>>> myTup
[('a', 9), ('b', 8), ('c', 7), ('d', 6), ('e', 5), ('f', 4), ('g', 3), ('h', 2), \
('i', 1), ('j', 0)]
Given a threshold, what is the best way to discard all tuples whose second value is less than this threshold?
I have more than 5 million tuples, and thus don't want to compare tuple by tuple and then delete from the list or add to another list of tuples.

Since the tuples are sorted, you can simply search for the first tuple with a value lower than the threshold, and then delete the remaining values using slice notation:
index = next(i for i, (t1, t2) in enumerate(myTup) if t2 < threshold)
del myTup[index:]
As Vaughn Cato points out, a binary search would speed things up even more. bisect.bisect would be useful, except that it won't work with your current data structure unless you create a separate key sequence, as described in the bisect documentation. But that violates your prohibition on creating new lists.
Still, you could use the source code as the basis for your own binary search. Or, you could change your data structure:
>>> myTup
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'),
(6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]
>>> index = bisect.bisect(myTup, (threshold, None))
>>> del myTup[:index]
>>> myTup
[(6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]
The disadvantage here is that the deletion may occur in linear time, since Python will have to shift the entire block of memory back... unless Python is smart about deleting slices that start from 0. (Anyone know?)
Finally, if you're really willing to change your data structure, you could do this:
>>> myTup
[(-9, 'a'), (-8, 'b'), (-7, 'c'), (-6, 'd'), (-5, 'e'), (-4, 'f'),
(-3, 'g'), (-2, 'h'), (-1, 'i'), (0, 'j')]
>>> index = bisect.bisect(myTup, (-threshold, None))
>>> del myTup[index:]
>>> myTup
[(-9, 'a'), (-8, 'b'), (-7, 'c'), (-6, 'd')]
(Note that Python 3 will complain about the None comparison, so you could use something like (-threshold, chr(0)) instead.)
My suspicion is that the linear time search I suggested at the beginning is acceptable in most circumstances.
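The "write your own binary search" option mentioned above could look roughly like the following. It is a sketch, not the bisect source: first_below is a hypothetical name, and the function assumes the list is sorted by the second value in decreasing order, as in the question. It needs no extra key list and does about log2(n) comparisons.

```python
from string import ascii_letters


def first_below(items, threshold):
    """Index of the first tuple whose second value is < threshold.

    Assumes items is sorted by second value in DECREASING order.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid][1] < threshold:
            hi = mid        # answer is at mid or to its left
        else:
            lo = mid + 1    # items[mid] is kept; look to the right
    return lo


my_tup = list(zip(ascii_letters, range(9, -1, -1)))  # ('a', 9) ... ('j', 0)
threshold = 5.5
del my_tup[first_below(my_tup, threshold):]  # drops the tail in place
print(my_tup)
# [('a', 9), ('b', 8), ('c', 7), ('d', 6)]
```

On Python 3.10 and later, the bisect functions also accept a key argument, which may make a hand-written search unnecessary.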

Here's an exotic approach that wraps the list in a list-like object before performing bisect.
import bisect

def revkey(items):
    class Items:
        def __getitem__(self, index):
            assert 0 <= index < _len
            return items[_max-index][1]
        def __len__(self):
            return _len
        def bisect(self, value):
            return _len - bisect.bisect_left(self, value)
    _len = len(items)
    _max = _len-1
    return Items()

tuples = [('a', 9), ('b', 8), ('c', 7), ('d', 6), ('e', 5), ('f', 4), ('g', 3), ('h', 2), ('i', 1), ('j', 0)]
for x in range(-2, 12):
    assert len(tuples) == 10
    t = tuples[:]
    stop = revkey(t).bisect(x)
    del t[stop:]
    assert t == [item for item in tuples if item[1] >= x]

Maybe slightly faster code than that of @Curious:
newTup = []
for tup in myTup:
    if tup[1] > threshold:
        newTup.append(tup)
    else:
        break
Because the tuples are ordered, you do not have to go through all of them.
Another possibility would be to use bisection to find the index i of the first element that is not above the threshold. Then you would do:
newTup = myTup[:i]
I think the last method would be the fastest.
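The loop with break is what itertools.takewhile does: it yields items until the predicate first fails and then stops. A minimal sketch follows, using >= so that tuples equal to the threshold are kept, on the assumption that the question only wants values strictly below the threshold discarded.

```python
from itertools import takewhile
from string import ascii_letters

my_tup = list(zip(ascii_letters, range(9, -1, -1)))  # ('a', 9) ... ('j', 0)
threshold = 5.5

# Stops at the first tuple below the threshold; later tuples are never examined.
kept = list(takewhile(lambda t: t[1] >= threshold, my_tup))
print(kept)
# [('a', 9), ('b', 8), ('c', 7), ('d', 6)]
```

This is still a linear scan over the kept prefix, so it only helps when the prefix is short relative to the whole list.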

Given the number of tuples you're dealing with, you may want to consider using NumPy.
Define a structured array like
my_array = np.array(myTup, dtype=[('f0', "|S10"), ('f1', float)])
You can access the second elements of your tuples with my_array['f1'], which gives you a float array. You can now use fancy indexing techniques to filter the elements you want, like
my_array[my_array['f1'] < threshold]
keeping only the entries where f1 is less than your threshold (flip the comparison to >= to keep the entries at or above it, as the question asks).

You can also use itertools (Python 2), e.g.
from itertools import ifilter
iterable_filtered = ifilter(lambda x: x[1] > threshold, myTup)
if you want a lazily filtered iterable, or just:
filtered = filter(lambda x: x[1] > threshold, myTup)
to go straight to a list. (In Python 3, itertools.ifilter no longer exists and the built-in filter is itself lazy, so wrap it in list(...) to get a list.)
I'm not too familiar with the relative performance of these methods and would have to test them (e.g. in IPython using %timeit).
