Say I got a list like this:
list = [1, 2.1, 3.3, 4.5, 3.2, 4.7, 1, 3, 3.3, 3.9, 4.9]
Now every time an element in the list is less than the preceding element, i.e. list[i] < list[i-1], I want to add increasing multiples of, say, 10 to all the elements from that point up to the next point where the condition is met. The resulting list for the example above should look like this:
new_list = [1, 2.1, 3.3, 4.5, 13.2, 14.7, 21, 23, 23.3, 23.9, 24.9]
So first 10 is added, then 20...
I'll mention that each interval can be arbitrarily long.
How could I achieve that elegantly? I got what I wanted by treating the list as a temporary list and appending to a new list inside a for loop with an if statement, but that seems rather ugly. I thought of doing it with a list comprehension but I cannot figure out how this would work.
This can be done using the numpy package. The idea is to first flag the positions where a value is lower than the one before it. Then you take the cumulative sum of those flags and multiply it by 10. Finally you add the result to the original list.
import numpy as np
a = [1, 2.1, 3.3, 4.5, 3.2, 4.7, 1, 3, 3.3, 3.9, 4.9]
b = [0] + [int(a[i+1] < a[i]) for i in range(len(a)-1)]
aa = np.array(a)
ba = np.array(b).cumsum() * 10
print(aa + ba)
>>> [ 1. 2.1 3.3 4.5 13.2 14.7 21. 23. 23.3 23.9 24.9]
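The same idea can also be tried without NumPy. The following is a minimal sketch that uses itertools.accumulate for the running count; the function name add_offsets and its step parameter are made up for illustration and are not part of the answer above.

```python
from itertools import accumulate


def add_offsets(values, step=10):
    """Add step * (number of drops seen so far) to each element."""
    # Flag each position whose value is lower than its predecessor.
    drops = [0] + [int(cur < prev) for prev, cur in zip(values, values[1:])]
    # The running total of flags says how many multiples of step to add.
    return [v + step * n for v, n in zip(values, accumulate(drops))]


a = [1, 2.1, 3.3, 4.5, 3.2, 4.7, 1, 3, 3.3, 3.9, 4.9]
shifted = add_offsets(a)
print(shifted)
```

Integers stay integers and floats stay floats here; sums such as 3.3 + 20 are subject to the usual floating-point rounding.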
list1 = [1, 2.1, 3.3, 4.5, 3.2, 4.7, 1, 3, 3.3, 3.9, 4.9]
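for i in range(len(list1)):
    while True:
        try:
            # keep adding 10 until the next element is no longer smaller
            if list1[i] > list1[i+1]:
                list1[i+1] = list1[i+1] + 10
            else:
                break
        except IndexError:
            # reached the last element; there is no list1[i+1]
            print("Done")
            break
    print(list1)
print("Final List: ")
print(list1)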
I'm not sure if there is a name for this kind of process. I have a dataset of entries with many input variables, each entry having an output of 0 or 1. I'm trying to find a good way to find ranges for each variable so that all the entries with those ranges have an output of 1. What would be the best way of finding the ranges with the largest number of entries with an output of 1?
Right now the process I thought of is sorting by each variable then tightening the range of the variable that has the lowest percentage of 1s near the max/min of that variable.
for example (3 inputs, 4th column is output):
[1.2, 6.0, -9.2, 0]
[1.4, 3.6, 3.2, 1]
[1.7, 3.8, -4.2, 0]
[2.2, 7.0, -3.3, 1]
[4.7, 3.4, -8.9, 1]
[4.9, 3.4, -8.9, 1]
The best ranges would be A (2.2 to 4.9) B (3.4 to 7.0) C (-8.9 to -3.3) which has three entries.
Sorting by the first variable, you could remove the entries that have values less than 2.2 in order to end up with only entries that have an output of 1.
(In reality there would be many more variables and entries.)
Does this type of process have a name and is there a better way of doing it? Thank you!
The code below gets the min and max of each variable where the output is 1. It should be checked on a larger amount of data to gain more confidence that this is what is required.
import pandas as pd
lst=[[1.2, 6.0, -9.2, 0],
[1.4, 3.6, 3.2, 1],
[1.7, 3.8, -4.2, 0],
[2.2, 7.0, -3.3, 1],
[4.7, 3.4, -8.9, 1],
[4.9, 3.4, -8.9, 1]]
df = pd.DataFrame(lst,columns=list('ABC')+["Output"])
df0 = df[df["Output"] == 0]
df1 = df[df["Output"] == 1]
lst_min_value,lst_max_value=[],[]
df2 = df.copy()
for col in df.columns[:-1]:
    max_0 = df0[col].max()
    min_1 = df1[col].min()
    min_val = max(max_0,min_1)
    min_value = df[df[col] > min_val][col].min()
    max_value = df1[col].max()
    df2 = df2[(df2[col] >= min_value) & (df2[col] <= max_value)]
for col in df2.columns[:-1]:
    print("Range for",col,":",df2[col].min(),df2[col].max())
# Output:
Range for A : 2.2 2.2
Range for B : 7.0 7.0
Range for C : -3.3 -3.3
Hope this Helps...
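To compare candidate ranges against each other, it can help to count how many entries fall inside a given set of ranges and how many of those have an output of 1. Below is a minimal plain-Python sketch of such a check. The helper name box_stats and the layout of ranges (one inclusive (low, high) pair per input variable) are assumptions made for illustration, not part of the question or the answer above.

```python
def box_stats(rows, ranges):
    """Return (entries inside the ranges, how many of them have output 1).

    rows:   lists whose last item is the 0/1 output
    ranges: one inclusive (low, high) pair per input variable
    """
    inside = [
        row for row in rows
        if all(low <= value <= high
               for value, (low, high) in zip(row[:-1], ranges))
    ]
    return len(inside), sum(row[-1] for row in inside)


rows = [
    [1.2, 6.0, -9.2, 0],
    [1.4, 3.6, 3.2, 1],
    [1.7, 3.8, -4.2, 0],
    [2.2, 7.0, -3.3, 1],
    [4.7, 3.4, -8.9, 1],
    [4.9, 3.4, -8.9, 1],
]
# The ranges proposed in the question for A, B and C.
count, ones = box_stats(rows, [(2.2, 4.9), (3.4, 7.0), (-8.9, -3.3)])
print(count, ones)  # 3 3
```

A set of ranges is "pure" when both numbers are equal, so a search over ranges could use the first number as the score to maximise among pure candidates.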
I am using 'pd.cut' to separate the array elements into different bins and use 'value_counts' to count the frequency of each bin. My code and the result I get are like this.
s = pd.Series([5,9,2,4,5,6,7,9,5,3,8,7,4,6,8])
pd.cut(s,5).value_counts()
>>> pd.cut(s,5).value_counts()
(4.8, 6.2] 5
(7.6, 9.0] 4
(1.993, 3.4] 2
(3.4, 4.8] 2
(6.2, 7.6] 2
I want to get the values of the first three lines of the index part of the result, that is:
[4.8, 6.2]
[7.6, 9.0]
[1.993, 3.4]
or is better:
[4.8, 6.2, 7.6, 9.0, 1.993, 3.4]
but I searched for some information and found that pandas does not seem to have a method to directly handle this interval data, so I had to use the following stupid method, then combine them into list or array:
v1 = pd.cut(s,5).value_counts().index[0].left
v2 = pd.cut(s,5).value_counts().index[0].right
v3 = pd.cut(s,5).value_counts().index[1].left
...
v6 = pd.cut(s,5).value_counts().index[2].right
So is there an easier way to achieve what I need?
Convert the CategoricalIndex to an IntervalIndex, so that you can use IntervalIndex.left and
IntervalIndex.right:
s = pd.cut(s,5).value_counts()
i = pd.IntervalIndex(s.index)
L1 = list(zip(i.left, i.right))[:3]
print (L1)
[(4.8, 6.2), (7.6, 9.0), (1.993, 3.4)]
L2 = [y for x in L1 for y in x]
print (L2)
[4.8, 6.2, 7.6, 9.0, 1.993, 3.4]
Say I have a dataframe:
x
0 [0.5, 1.5, 2.5, 3.5, 4.5]
1 [5.5, 6.5, 7.5]
2 [8.5, 9.5, 10.5, 11.5]
3 [12.5, 13.5, 14.5, 15.5]
and I want to split the values into three separate columns (each holding up to two values), like this:
a b c
0 [0.5, 1.5] [2.5, 3.5] [4.5]
1 [5.5, 6.5] [7.5] 0
2 [8.5, 9.5] [10.5, 11.5] 0
3 [12.5, 13.5] [14.5, 15.5] 0
How do I do this?
First, I think working with lists in pandas is not a good idea.
But it is possible: use a list comprehension with a custom function and the DataFrame constructor:
#https://stackoverflow.com/a/312464/2901002
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
df1 = pd.DataFrame([list(chunks(x, 2)) for x in df['x']]).fillna(0)
print (df1)
0 1 2
0 [0.5, 1.5] [2.5, 3.5] [4.5]
1 [5.5, 6.5] [7.5] 0
2 [8.5, 9.5] [10.5, 11.5] 0
3 [12.5, 13.5] [14.5, 15.5] 0
You have not mentioned whether c should hold everything after the fourth element or only the next two, in case a list has more than six elements.
This is the code if you want everything after the fourth element in c:
df['a']=df['x'].apply(lambda x:x[:2] if len(x)>0 else 0)
df['b']=df['x'].apply(lambda x:x[2:4] if len(x)>2 else 0)
df['c']=df['x'].apply(lambda x:x[4:] if len(x)>4 else 0)
df.drop('x',axis=1,inplace=True)
Or,
This is the code if you want at most two elements in c, even if the list has more after the fourth element:
df['a']=df['x'].apply(lambda x:x[:2] if len(x)>0 else 0)
df['b']=df['x'].apply(lambda x:x[2:4] if len(x)>2 else 0)
df['c']=df['x'].apply(lambda x:x[4:6] if len(x)>4 else 0)
df.drop('x',axis=1,inplace=True)
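The chunking itself does not depend on pandas. As a rough sketch, the hypothetical helper below (split_row, with made-up parameters size, n_cols and fill) splits one list into chunks and pads the result so that every row ends up with the same number of columns, mirroring the fillna(0) step in the first answer:

```python
def split_row(values, size=2, n_cols=3, fill=0):
    """Split values into size-long chunks, padded with fill up to n_cols."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Pad short rows; rows with more than n_cols chunks are left as they are.
    return chunks + [fill] * (n_cols - len(chunks))


data = [
    [0.5, 1.5, 2.5, 3.5, 4.5],
    [5.5, 6.5, 7.5],
    [8.5, 9.5, 10.5, 11.5],
    [12.5, 13.5, 14.5, 15.5],
]
table = [split_row(x) for x in data]
for row in table:
    print(row)
```

Unlike the slicing answers, this does not fix the number of columns in advance for long lists: a list with seven or more elements simply produces a fourth chunk.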
I have a list of values: [0,2,3,5,6,7,9] and want to get a list of the numbers in the middle in between each number: [1, 2.5, 4, 5.5, 6.5, 8]. Is there a neat way in python to do that?
It's a simple list comprehension (note I'm assuming you want all your values as floats rather than a mixture of ints and floats):
>>> lst = [0,2,3,5,6,7,9]
>>> [(a + b) / 2.0 for a,b in zip(lst, lst[1:])]
[1.0, 2.5, 4.0, 5.5, 6.5, 8.0]
(Dividing by 2.0 ensures floor division is not applied in Python 2.)
Use a list comprehension over the indices:
>>> a = [0,2,3,5,6,7,9]
>>> [(a[x] + a[x + 1])/2 for x in range(len(a)-1)]
[1.0, 2.5, 4.0, 5.5, 6.5, 8.0]
However, using zip as @Chris_Rands said is better... (and more readable ¬¬)
Obligatory itertools solution (Python 2 only; itertools.izip was removed in Python 3, where the built-in zip is already lazy):
>>> import itertools
>>> values = [0,2,3,5,6,7,9]
>>> [(a+b)/2.0 for a,b in itertools.izip(values, itertools.islice(values, 1, None))]
[1.0, 2.5, 4.0, 5.5, 6.5, 8.0]
values = [0,2,3,5,6,7,9]
middle_values = [(values[i] + values[i + 1]) / 2.0 for i in range(len(values) - 1)]
Dividing by 2.0 rather than 2 is unnecessary in Python 3, or if you use from __future__ import division to change the integer division behavior.
The zip or itertools.izip answers are more idiomatic.
Simple for loop:
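nums = [0,2,3,5,6,7,9]
betweens = []
for i in range(1, len(nums)):
    if nums[i] - nums[i-1] > 1:
        betweens.extend([item for item in range(nums[i-1]+1, nums[i])])
    else:
        betweens.append((nums[i] + nums[i-1]) / 2)
Note that this only yields midpoints because no gap in the sample is larger than 2; a wider gap would add every integer in between. For the sample, the output is as desired and needs no further conversion (in Python 3.x):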
[1, 2.5, 4, 5.5, 6.5, 8]
[(l[i]+l[i+1])/2 for i in range(len(l)-1)]
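If the mixed int/float output shown in the question is really wanted, one possible sketch (Python 3, where / is true division) converts whole-number midpoints back to int. The function name midpoints is only illustrative:

```python
def midpoints(values):
    """Midpoint of each neighbouring pair; whole results are returned as int."""
    result = []
    for a, b in zip(values, values[1:]):
        mid = (a + b) / 2
        # float.is_integer() is True for values such as 4.0
        result.append(int(mid) if mid.is_integer() else mid)
    return result


mids = midpoints([0, 2, 3, 5, 6, 7, 9])
print(mids)  # [1, 2.5, 4, 5.5, 6.5, 8]
```

Unlike the range-based loop, this takes the midpoint regardless of how wide the gap between two neighbours is.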
I have one list
a = [1.0, 2.0, 2.1, 3.0, 3.1, 4.2, 5.1, 7.2, 9.2]
I want to compare this list with other lists, and also extract information about the list contents in numeric order. All the other lists have elements that also appear in a.
So I have tried this
a = [1.0, 2.0, 2.1, 3.0, 3.1, 4.2, 5.1, 7.2, 9.2]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print dict(zip(a,b))
a1=[2.1, 3.1, 4.2, 7.2]
I want to compare a1 with a and extract dict values [3, 5, 6, 8].
Just loop through a1 and see if there is a matching key in the dictionary you created:
mapping = dict(zip(a, b))
matches = [mapping[value] for value in a1 if value in mapping]
Demo:
>>> a = [1.0, 2.0, 2.1, 3.0, 3.1, 4.2, 5.1, 7.2, 9.2]
>>> b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a1 = [2.1, 3.1, 4.2, 7.2]
>>> mapping = dict(zip(a, b))
>>> [mapping[value] for value in a1 if value in mapping]
[3, 5, 6, 8]
However, take into account that you are using floating point numbers. You may not be able to match values exactly, since floating point numbers are binary approximations of decimal values; the value 2.999999999999999 (15 nines), for example, may be presented by the Python 2 str() function as 3.0, but is not equal to 3.0:
>>> 2.999999999999999
2.999999999999999
>>> str(2.999999999999999)
'3.0'
>>> 2.999999999999999 == 3.0
False
>>> 2.999999999999999 in mapping
False
If your input list a is sorted, you could use the math.isclose() function (or a backport of it), together with the bisect module, to keep matching efficient:
import bisect
try:
    from math import isclose
except ImportError:
    def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
        # simplified backport, doesn't handle NaN or infinity.
        if a == b: return True
        return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
result = []
for value in a1:
    index = bisect.bisect(a, value)
    if index and isclose(a[index - 1], value):
        result.append(b[index - 1])
    elif index < len(a) and isclose(a[index], value):
        result.append(b[index])
This tests up to two values from a per input value: one that is guaranteed to be equal or lower (at index - 1), and the next, higher value. For your sample a, the value 2.999999999999999 is bisected to index 3, between 2.1 and 3.0. Since isclose(3.0, 2.999999999999999) is true, that would still let you map that value to 4 in b.
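The walk-through above can be reproduced with a small self-contained script. This is only a sketch: it assumes Python 3.5 or later (so math.isclose is available), a sorted list of keys, and the default tolerances of isclose. The function name lookup_close is made up for the example.

```python
import bisect
from math import isclose


def lookup_close(keys, values, query):
    """Return the value whose key is close to query, or None if there is none."""
    index = bisect.bisect(keys, query)
    # Candidate at or below the query.
    if index and isclose(keys[index - 1], query):
        return values[index - 1]
    # Candidate just above the query.
    if index < len(keys) and isclose(keys[index], query):
        return values[index]
    return None


a = [1.0, 2.0, 2.1, 3.0, 3.1, 4.2, 5.1, 7.2, 9.2]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
a1 = [2.1, 3.1, 4.2, 7.2]

matches = [lookup_close(a, b, value) for value in a1]
near_three = lookup_close(a, b, 2.999999999999999)
print(matches)     # [3, 5, 6, 8]
print(near_three)  # 4
```

Returning None for a miss, rather than skipping it as the loop above does, keeps the result aligned with the input list; which behaviour is preferable depends on the use case.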