Easiest way to classify these values? - python

I have a list of layer heights that I want to sort various z values into. The list should remain in descending order and the function should return the index of the layer the z value belongs to.
For example, with layers = [10, 9, 8, 7], the value 9 should return 1 since that's the index of its layer; 8.5 should also return 1; 8 should return 2; 7.9 returns 2; and so on.
The function I wrote raises an error when it looks for an index outside the length of the list for the last layer.
def less_than(layers, z):
    index = 0
    current = layers[index]
    while current > z:
        index += 1
        current = layers[index]
    return index - 1
So, what's the best method for producing such a function with these properties?

Here's a solution using bisect:
import bisect

class ReverseAccessor:
    def __init__(self, ls):
        self.ls = ls

    def __getitem__(self, item):
        return self.ls[-item - 1]

    def __len__(self):
        return len(self.ls)

def less_than(layers, z):
    index = len(layers) - bisect.bisect(ReverseAccessor(layers), z) - 1
    # guard the lookup so a z below the last layer doesn't raise IndexError
    if index + 1 < len(layers) and layers[index + 1] == z:
        return index + 1
    return index
This is a bit more complicated than your solution, but will theoretically perform better when the list gets large. We define a custom object that holds a reference to the original list and translates item accesses into their reversed form. That way, we can retain the O(log n) complexity of bisect.
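Note that on Python 3.10+ the bisect functions accept a key argument, which removes the need for the reversed-view wrapper. A minimal sketch of that variant (it should give the same results, including the exact-match case, though I haven't exhaustively tested it):

import bisect

def less_than(layers, z):
    # layers is descending; negating the key lets bisect see an ascending sequence
    return bisect.bisect_right(layers, -z, key=lambda v: -v) - 1

less_than([10, 9, 8, 7], 9)    # 1
less_than([10, 9, 8, 7], 7.9)  # 2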

Your function is fine, you just need to check that you are not trying to access elements that don't exist (I also simplified it a little bit):
def less_than(layers, z):
    index = 0
    while index < len(layers) and layers[index] > z:
        index += 1
    return index - 1
Apart from the index error this should behave exactly like your function.

If your list of layers is not that big and you don't really care about the O(log n) of bisect, here's a very simple O(n) solution:
import math
layers.index(math.ceil(z))
ceil rounds z up to the nearest integer, and then the list's built-in index method finds the position of that value (this relies on the layer heights being consecutive integers, as in the example).
If you don't want any imports, int(-(-z // 1)) gives the same result as ceil(z); note that int(z + 0.5) rounds to the nearest integer rather than up, so it only matches ceil when the fractional part is at least 0.5.
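A quick check against the question's layers (again assuming integer layer heights):

import math

layers = [10, 9, 8, 7]
layers.index(math.ceil(8.5))   # 1
layers.index(math.ceil(8))     # 2
layers.index(math.ceil(7.9))   # 2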

Solution based on np.where:
import numpy as np

layers = [10, 9, 8, 7]

def less_than(layers, z):
    return np.max(np.where(np.array(layers) >= z))
Example:
less_than(layers, 8.5)
output:
1
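As an aside (not part of the original answer), np.searchsorted should give the same result without building a boolean mask; it needs an ascending array, so search the reversed layers and map the position back. A rough sketch:

import numpy as np

def less_than(layers, z):
    rev = np.asarray(layers)[::-1]   # ascending view of the layers
    return len(layers) - np.searchsorted(rev, z, side='left') - 1

less_than([10, 9, 8, 7], 8.5)   # 1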

Related

Replicate =LARGE and =SMALL function in Excel to Python

I would like to obtain the k-th largest/ k-th smallest value from numerical columns in a .xlsx file imported in Python. I have heard that a sorted array is required for the same.
So I tried isolating different columns into an array in Python using openpyxl like so
col_array = []
for i in range(1,1183):
    col_array = factor2_db.cell(row=i,column=2).value
print(col_array)
I then used the function below to find the k-th largest value, but that resulted in an error at line 2:
0 class Solution(object):
1     def findKthLargest(self, nums, k):
2         nums_sorted = sorted(nums)   # TypeError: 'float' object is not iterable
3         if k == 1:
4             return nums_sorted[-1]
5         temp = 1
6         return nums_sorted[len(nums_sorted) - k]
7 ob1 = Solution()
8 print(ob1.findKthLargest(col_array, 5))
Solution
Strictly speaking, your variable col_array is not an array at all: it is a single cell value that gets overwritten 1182 times (i.e. range(1,1183)).
I tried the following and had favourable results:
col_array = []
for i in range(10):
    col_array.append(i)
print(col_array)
you can customise the append using something (untested) like this: col_array.append( factor2_db.cell(row=i, column= 2).value)
def kth(k, large=False):
    global col_array
    col_array.sort(key=None, reverse=large)
    print(f'col_array = {col_array}, col_array[{k - 1}] = {col_array[k - 1]}')
    return col_array[k - 1]

print(kth(4, True))
Note that the first element of a list has index 0, although you wouldn't ordinarily talk of returning the '0th' smallest or largest element; hence the -1 adjustment in return col_array[k - 1].
Additional notes:
To preserve the original ordering, copy col_array at the outset; one way to achieve this:
col_array_copy = []; col_array_copy += (x for x in col_array)
Then proceed as above after replacing 'col_array' with 'col_array_copy'.
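Putting the pieces together, a rough end-to-end sketch (untested; the workbook filename is an assumption, and factor2_db stands in for whatever worksheet you are actually reading):

from openpyxl import load_workbook

wb = load_workbook('factors.xlsx')   # hypothetical filename
factor2_db = wb.active               # or wb['SheetName']

col_array = []
for i in range(1, 1183):
    col_array.append(factor2_db.cell(row=i, column=2).value)

col_array.sort(reverse=True)         # descending, like =LARGE
k = 5
print(col_array[k - 1])              # k-th largest value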

Python all()/any() like method for a portion/part of list?

What would be the most elegant/pythonic way of achieving: "if x% of the total values in a list are greater than y, return true"? I have currently implemented a function:
def check(listItems, val):
    '''A method to check all elements of a list against a given value.
    Returns true if all items of list are greater than value.'''
    return all(x > val for x in listItems)
But for my use case, waiting for this particular condition is quite costly and somewhat useless. I would want to proceed if ~80% of the items in list are greater than the given value.
One approach in my mind is to sort the list in descending order, create another list and copy 80% of the elements of list to the new list, and run the function for that new list. However, I am hoping that there must be a more elegant way of doing this. Any suggestions?
It sounds like you are dealing with long lists, which is why this is costly. It would be nice if you could exit early as soon as the condition is met. any() will do this, but you'll want to avoid reading the whole list before passing it to any(). One option might be to use itertools.accumulate to keep a running total of True values and then pass that to any(). Something like:
from itertools import accumulate

a = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]

# True if at least 50% are greater than 1
goal = .5 * len(a)   # at least 5 out of 10
any(x >= goal for x in accumulate(n > 1 for n in a))
accumulate won't read the whole list up front; it lazily yields the number of True values seen so far. any() short-circuits as soon as it sees a true value, which in the above case happens at index 5.
What about this:
def check(listItems, val, threshold=0.8):
    return sum(x > val for x in listItems) > len(listItems) * threshold
It reads as: check is True if more than threshold (80% by default) of the elements in listItems are greater than val.
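For instance, with the sample list from the accumulate answer above:

items = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]
check(items, 1)        # 6 > 8.0 -> False
check(items, 1, 0.5)   # 6 > 5.0 -> True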
You can use filter for this. By far this is the fastest method here; it is faster than the approaches in my other answer.
def check(listItems, val, goal=0.8):
    return len((*filter(val.__lt__, listItems),)) >= len(listItems) * goal
Timed alongside the methods from my other answer, this came out at:
1.684135717988247
Check each item in order.
If you reach a point where you are satisfied then return True early.
If you reach a point where you can never be satisfied, even if every future item passes the test, then return False early.
Otherwise keep going (in case the later elements help you satisfy the requirement).
This is the same idea as FatihAkici in the comments above, but with a further optimization.
def check(list_items, ratio, val):
    passing = 0
    satisfied = ratio * len(list_items)
    for index, item in enumerate(list_items):
        if item > val:
            passing += 1
        if passing >= satisfied:
            return True
        remaining_items = len(list_items) - index - 1
        if passing + remaining_items < satisfied:
            return False
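A quick check with the same sample data:

items = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]
check(items, 0.5, 1)   # True, returns early at index 5
check(items, 0.8, 1)   # False, returns as soon as 8 passing items become impossible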
I don't want to take credit for Mark Meyer's answer, since he came up with the idea of using accumulate and any (and his version is more pythonic/readable), but if you're looking for the "fastest" approach, replacing the comprehensions with map is faster.
any(map(goal.__le__, accumulate(map(val.__lt__, listItems))))
Just to test:
from timeit import timeit
from itertools import accumulate

def check1(listItems, val):
    goal = len(listItems) * 0.8
    return any(x >= goal for x in accumulate(n > val for n in listItems))

def check2(listItems, val):
    goal = len(listItems) * 0.8
    return any(map(goal.__le__, accumulate(map(val.__lt__, listItems))))

items = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]

for t in (check1, check2):
    print(timeit(lambda: t(items, 1)))
The results are:
3.2596251670038328
2.0594907909980975

TypeError: '<' not supported between instances Python

I am solving a problem with a genetic algorithm in Python 3. I have not completed the full code yet; I test each part of the code as I complete it.
At present, I am stuck with an error saying:
TypeError: '<' not supported between instances of 'part' and 'part'
The interesting thing is, this error does not always show. Sometimes the code runs smoothly and shows the desired output, but sometimes it shows this error.
What is the reason for this?
I am attaching the code and the error message.
I am using PyCharm.
import random

class part():
    def __init__(self, number):
        self.number = number
        self.machine_sequence = []

    def add_volume(self, volume):
        self.volume = volume

    def add_machine(self, machine_numbers):
        self.machine_sequence.append(machine_numbers)

def create_initial_population():
    part_family = []
    for i in range(8):
        part_family.append(part(i))
    part_population = []
    for i in range(6):
        part_population.append(random.sample(part_family, len(part_family)))
    for i in part_population:
        for j in i:
            j.add_volume(random.randrange(100, 200))
    return part_population

def fitness(part_family):
    sum_of_boundary = []
    for i in range(0, 8, 2):
        sum_of_boundary.append(sum(j.volume for j in part_family[i:i + 2]))
    fitness_value = 0
    for i in range(len(sum_of_boundary) - 1):
        for j in range(i + 1, len(sum_of_boundary)):
            fitness_value = fitness_value + abs(sum_of_boundary[i] - sum_of_boundary[j])
    return fitness_value

def sort_population_by_fitness(population):
    pre_sorted = [[fitness(x), x] for x in population]
    sort = [x[1] for x in sorted(pre_sorted)]
    for i in sort:
        for j in i:
            print(j.volume, end=' ')
        print()
    return sort

def evolve(population):
    population = sort_population_by_fitness(population)
    return population

population = create_initial_population()
population = evolve(population)
Given that pre_sorted is a list of lists with items [fitness, part], this croaks whenever comparing two sublists with the same fitness.
Python lists sort lexicographically and are compared element-wise left to right until a mismatching element is found. In your case, the second element (part) is only accessed if the fitness of two parts is the same.
[0, part0] < [1, part1] => does not compare part0 and part1 since the fitness is already different.
[0, part0] < [0, part1] => does compare part0 and part1 since the fitness is the same.
Suggestion 1
Sort only by fitness: sorted(pre_sorted, key=operator.itemgetter(0))
Suggestion 2
Read the documentation for functools.total_ordering and give part a total order:
from functools import total_ordering

@total_ordering
class part():
    [...]

    def __lt__(self, other):
        return self.number < other.number
And yeah, sorting lists of lists seems wrong. The inner elements might be better as tuples, so that you cannot accidentally modify the contents.
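For illustration, a minimal sketch of Suggestion 1 applied to the original sort_population_by_fitness (using tuples for the inner pairs, as suggested above); sorting on the fitness value alone means part objects are never compared:

import operator

def sort_population_by_fitness(population):
    # sort on fitness only, so the part objects never need to be compared
    pre_sorted = [(fitness(x), x) for x in population]
    return [x[1] for x in sorted(pre_sorted, key=operator.itemgetter(0))]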
So pre_sorted is a list with elements of [int, part]. When you sort this list and have two elements with the same integer value, it then compares the part values to try to determine which goes first. However, since you have no function for determining if a part is less than a part, it throws that error.
Try adding a function __lt__(self, other) to be able to order parts.
More on the comparison operators can be found in the Python data model documentation.

Returning the index of the largest element in an array in Python

I'm trying to create a function that returns the index of the largest element of an array. I feel I have the correct code but my syntax is in the wrong order; I'm trying to use a for/while loop to do so. So far I have the following:
def manindex(arg):
    ans = 0
    while True:
        for i in range(len(arg)):
            if arg[i] > arg[ans]:
                pass
                ans = i
    return ans
Not sure where I'm going wrong if anyone could provide some guidance, thanks
EDIT: It's been pointed out that I'm causing an infinite loop, so if I take out the while statement I'm left with
def manindex(arg):
    ans = 0
    for i in range(len(arg)):
        if arg[i] > arg[ans]:
            ans = i
    return ans
But I have a feeling it's still not correct
When you say array I think you mean list in Python; you don't need a for loop or while loop to achieve this at all.
You can also use index with max, like so:
xs.index(max(xs))
sample:
>>> xs = [1, 123, 12, 234, 34, 23, 42, 34]
>>> xs.index(max(xs))
3
You could use max with the key parameter set to seq.__getitem__:
def argmax(seq):
    return max(range(len(seq)), key=seq.__getitem__)

print(argmax([0, 1, 2, 3, 100, 4, 5]))
yields
4
The idea behind finding the index of the largest element is always the same: iterate over the elements of the array and compare each to the maximum value seen so far; if it's larger, the index of the current element becomes the new answer, otherwise we keep looking.
enumerate approach:
def max_element_index(items):
    max_index, max_value = None, None
    for index, item in enumerate(items):
        # the None check makes this work on Python 3, where item > None raises TypeError
        if max_value is None or item > max_value:
            max_index, max_value = index, item
    return max_index
functional approach:
# Python 2 style: relies on None comparing less than any number; on Python 3,
# reduce also has to be imported from functools.
def max_element_index(items):
    return reduce(lambda x, y: x[1] > y[1] and x or y,
                  enumerate(items), (None, None))[0]
At the risk of looking cryptic, the functional approach uses reduce, which repeatedly combines two elements into one. Those elements are (index, element) tuples, produced by the enumerate function.
The reducing function, defined in the lambda body, takes two tuples and returns the one with the larger element. Since reduce keeps combining until a single result remains, the winner is the tuple containing the index of the largest element and the element itself, so we only need to access index 0 of that tuple to get the index.
On the other hand, if the list is empty, None is returned, which comes from the initializer supplied as the third argument to reduce.
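For reference, a quick check of the enumerate version above:

print(max_element_index([0, 1, 2, 3, 100, 4, 5]))   # 4
print(max_element_index([]))                        # None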
Before I write a long winded explanation, let me give you the solution:
index, value = max(enumerate(list1), key=lambda x: x[1])
One line, efficient (single pass O(n)), and readable (I think).
Explanation
In general, it's a good idea to use as much of python's incredibly powerful built-in functions as possible.
In this instance, the two key functions are enumerate() and max().
enumerate() converts a list (or actually any iterable) into a sequence of indices and values. e.g.
>>> list1 = ['apple', 'banana', 'cherry']
>>> for tup in enumerate(list1):
... print tup
...
(0, 'apple')
(1, 'banana')
(2, 'cherry')
max() takes an iterable and returns the maximum element. Unfortunately, max(enumerate(list1)) doesn't work, because max() will compare based on the first element of the tuple created by enumerate(), which sadly is the index.
One lesser-known feature of max() is that it can take a second argument in the form max(list1, key=something). The key is a function that can be applied to each value in the list, and the output of that function is what gets used to determine the maximum. We can use this feature to tell max() that it should be ranking items by the second item of each tuple, which is the value contained in the list.
Combining enumerate() and max() with key (plus a little help from lambda to create a function that returns the second element of a tuple) gives you this solution.
index, value = max(enumerate(list1), key=lambda x: x[1])
I came up with this recently (and am sprinkling it everywhere in my code) after watching Raymond Hettinger's talk on Transforming Code into Beautiful, Idiomatic Python, where he suggests exorcising the for i in xrange(len(list1)): pattern from your code.
Alternatively, without resorting to lambda (Thanks #sweeneyrod!):
from operator import itemgetter
index, value = max(enumerate(list1), key=itemgetter(1))
I believe if you change your for loop to....
for i in range(len(arg)):
    if arg[i] > ans:
        ans = arg[i]
it should work.
You could try something like this. If the list is empty, the function will raise an error.
m is set to the first element of the list; we then iterate over the list, comparing each value as we go.
def findMax(xs):
    m = xs[0]
    for x in xs:
        if x > m:
            m = x
    return m

findMax([])      # error
findMax([1])     # 1
findMax([2, 1])  # 2
if you wanted to use a for loop and make it more generic, then:
def findGeneric(pred, xs):
    m = xs[0]
    for x in xs:
        if pred(x, m):
            m = x
    return m

findGeneric(lambda a, b: len(a) > len(b), [[1], [1, 1, 1, 1], [1, 1]])  # [1, 1, 1, 1]

Trouble with zip when running for loop monte carlo sim; python

Working in python 2.7.
I'm sure the code is a little unwieldy, but I'll try to explain it as simply as I can.
I have two lists:
T = [[1,0], [1,0], [0,5]]
S = [[1], [3], [2]]
I need to add the corresponding value from S to the end of the corresponding list in T, so using zip, I put them together.
I then subtract the third value of each list from the first and append that result using another zip.
So when I run my function, the T variable now looks like [[1,0,1,0], [1,0,3,-2], [0,5,2,-2]].
I then have a series of if statements so that, depending on whether certain values are higher or lower than others, the result is counted as a win, loss, or tie.
I would like to simulate the results of my function (starterTrans) multiple times. The problem is that when I use:
def MonteCarlo(T, S, x):
    for i in range(0, x):
        starterTrans(T, S)
For each simulation I am getting a different version of T. The first time through the simulation T has the appropriate number of elements in each list (four), but after each run through, more and more elements are added.
I need a way to lock T to it's original four variables no matter how many times I want to use it. And I'm struggling finding a way to do so. Any ideas?
I know my code is convoluted, but here it is if it helps anyone follow my attempt to describe my problem:
import random

def starterTrans(team, starter):
    wins = 0
    losses = 0
    nd = 0
    random.shuffle(team)
    for t, s in zip(team, starter):
        t.extend(s)
    score_add(team, exit_score(team, starter))
    length = len(starter)
    for i in range(0, length):
        if team[i][4] > 0 and (team[i][1] > -team[i][4]) and team[i][2] >= 5:
            wins += 1
        elif team[i][4] < 0 and (team[i][1] <= -team[i][4]):
            losses += 1
        elif (team[i][4] <= 0 and team[i][1] >= -team[i][4]):
            nd += 1
    return wins, losses, nd

def score_add(team, exit_scores):
    for t, e in zip(team, exit_scores):
        t.append(e)
    return team

def exit_score(team, starter):
    exit_scores = []
    length = len(starter)
    for i in range(0, length):
        score = team[i][0] - team[i][3]
        exit_scores.append(score)
    return exit_scores

def MonteCarlo(team, starter, x):
    for i in range(0, x):
        starterTrans(team, starter)
Thanks for any help.
I think you just need to change this:
def MonteCarlo(T, S, x):
    for i in range(0, x):
        starterTrans(T, S)
to this:
def MonteCarlo(T, S, x):
    for i in range(0, x):
        starterTrans(T[:], S)
This will pass a copy of T to starterTrans(..) instead of the original list. If you're editing the elements of T in starterTrans(..) this will not help. Here you would need a deep copy. Have a look here for the difference between shallow and deep copies: What is the difference between a deep copy and a shallow copy?.
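Since starterTrans does mutate the inner lists (via t.extend(s) and t.append(e)), a slice copy alone will not keep them intact. A minimal sketch of the deep-copy variant mentioned above:

import copy

def MonteCarlo(T, S, x):
    for i in range(0, x):
        # each run mutates an independent copy, so the original T is untouched
        # and every simulation starts from the same initial lists
        starterTrans(copy.deepcopy(T), S)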
Change the last line to starterTrans(team[:], starter). That will pass in a copy of team, leaving the original intact.
