How to rewrite the code more elegantly - python

I wrote this function. The input and expected results are indicated in the docstring.
import numpy
from collections import defaultdict

def summarize_significance(sign_list):
    """Summarizes a series of individual significance data in a list of occurrences.
    For a group of e.g. 5 measurements and two different states, the input data
    has the form:
    sign_list = [[-1, 1],
                 [0, 1],
                 [0, 0],
                 [0, -1],
                 [0, -1]]
    where -1, 0, 1 indicate decrease, no change or increase respectively.
    The result is a list of 3-item lists indicating how many measurements
    decrease, do not change or increase (as list items 0, 1, 2 respectively) for each state:
    returns: [[1, 4, 0], [2, 1, 2]]
    """
    swapped = numpy.swapaxes(sign_list, 0, 1)
    summary = []
    for row in swapped:
        mydd = defaultdict(int)
        for item in row:
            mydd[item] += 1
        summary.append([mydd.get(-1, 0), mydd.get(0, 0), mydd.get(1, 0)])
    return summary
I am wondering if there is a more elegant, efficient way of doing the same thing. Some ideas?

Here's one that uses less code and is probably more efficient, because it iterates through sign_list just once, without calling swapaxes, and doesn't build a bunch of dictionaries.
summary = [[0, 0, 0] for _ in sign_list[0]]
for row in sign_list:
    for index, sign in enumerate(row):
        summary[index][sign + 1] += 1
return summary
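Since the snippet above is written as a drop-in function body (note the bare return), here is a self-contained version with the docstring example from the question as a quick check:

```python
def summarize_significance(sign_list):
    # one [decrease, no-change, increase] counter per state
    summary = [[0, 0, 0] for _ in sign_list[0]]
    for row in sign_list:
        for index, sign in enumerate(row):
            summary[index][sign + 1] += 1  # map -1/0/1 onto slots 0/1/2
    return summary

sign_list = [[-1, 1], [0, 1], [0, 0], [0, -1], [0, -1]]
print(summarize_significance(sign_list))
# [[1, 4, 0], [2, 1, 2]]
```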

No, just more complex ways of doing so.
import itertools

def summarize_significance(sign_list):
    res = []
    for s in zip(*sign_list):
        d = dict((x[0], len(list(x[1]))) for x in itertools.groupby(sorted(s)))
        res.append([d.get(x, 0) for x in (-1, 0, 1)])
    return res

For starters, you could do:
swapped = numpy.swapaxes(sign_list, 0, 1)
summary = []
for row in swapped:
    mydd = {-1: 0, 0: 0, 1: 0}
    for item in row:
        mydd[item] += 1
    summary.append([mydd[-1], mydd[0], mydd[1]])
return summary

Related

How to separate a binary [0,1] list according to a custom setting?

I have a binary list including only two elements (such as 0, 1):
1010100010100000000000100000100000101000000000000000100000001000010
How do I do paired transcoding with a custom setting based on how many times an element occurs in a row?
This is the encoding rule:
if an element occurs 3 or fewer times in a row, the encoding is 0,
if an element occurs 4-7 times in a row, the encoding is 1,
if an element occurs more than 7 times in a row, the encoding is 2.
Custom setting:
0-3: 0 (short)
4-7: 1 (medium)
more than 7: 2 (long)
For example:
how to transform 0100111100011100000000 into [[0,0],[1,0],[0,0],[1,1],[0,0],[1,0],[0,2]] following the above rule
*[a, b]
a: 0 or 1 (there are only binary outcomes in my list)
b: 0, 1 or 2 (it's my custom frequency setting)
The solution with only basic statements is:
word = '0100111100011100000000'
# Consecutive counts
count = 1
counts = []
if len(word) > 1:
    for i in range(1, len(word)):
        if word[i-1] == word[i]:
            count += 1
        else:
            counts.append([word[i-1], count])
            count = 1
    counts.append([word[-1], count])
else:
    counts.append([word[0], count])
# check your conditions
output = []
for l in counts:
    if l[1] <= 3:
        output.append([int(l[0]), 0])
    elif 3 < l[1] < 8:
        output.append([int(l[0]), 1])
    else:
        output.append([int(l[0]), 2])
print(output)
Output:
[[0, 0], [1, 0], [0, 0], [1, 1], [0, 0], [1, 0], [0, 2]]
You can define a function to translate the length of a group to a number, then use e.g. itertools.groupby to separate the different groups of characters and apply that function in a list comprehension.
from itertools import groupby

def f(g):
    n = len(list(g))
    return 0 if n <= 3 else 1 if n <= 7 else 2

s = "0100111100011100000000"
res = [(int(k), f(g)) for k, g in groupby(s)]
# [(0, 0), (1, 0), (0, 0), (1, 1), (0, 0), (1, 0), (0, 2)]

How can I optimize my Python code given below?

I have an array of size N containing positive integers, and I want to count the number of smaller elements on the right side of each element.
For example:
Input:
N = 7
arr[] = {12, 1, 2, 3, 0, 11, 4}
Output: 6 1 1 1 0 1 0
Explanation: There are 6 smaller elements
to the right of 12. There is 1 smaller element
to the right of 1. And so on.
And my code for this problem is as follows:
# python code here
n = int(input())
arr = list(map(int, input().split()))
ans = 0
ANS = []
for i in range(n - 1):
    for j in range(i + 1, n):
        if arr[i] > arr[j]:
            ans += 1
    ANS.append(ans)
    ans = 0
ANS.append(0)
print(ANS)
But the above code has O(n^2) time complexity and I want to reduce it. If anyone has any idea how to reduce the time complexity, please help me. Thank you.
This solution is O(n log n), as it is three iterations over the values plus one sort.
arr = [12, 1, 2, 3, 0, 11, 4]
# Gather original index and values
tups = []
for origin_index, el in enumerate(arr):
    tups.append([origin_index, el])
# sort on value
tups.sort(key=lambda t: t[1])
res = []
for sorted_index, values in enumerate(tups):
    # check the difference between the sorted and original index:
    # if the difference is positive, we have that many smaller
    # values to the right of this index value.
    if sorted_index - values[0] > 0:
        res.append([values[0], sorted_index - values[0]])
    elif sorted_index - values[0] == 0:
        res.append([values[0], (sorted_index - values[0]) + 1])
    else:
        res.append([values[0], 0])
origin_sort_res = [0 for i in range(len(arr))]
for v in res:
    # map from the sorted array back to the original indexing
    origin_sort_res[v[0]] = v[1]
print(origin_sort_res)
Try this (O(n log n)):
import bisect

def solution(nums):
    sortns = []
    res = []
    for n in reversed(nums):
        idx = bisect.bisect_left(sortns, n)
        res.append(idx)
        sortns.insert(idx, n)
    return res[::-1]

print(solution([12, 1, 2, 3, 0, 11, 4]))
# [6, 1, 1, 1, 0, 1, 0]
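A different approach (my own sketch, not from either answer above): since list.insert is O(n), the bisect version above is O(n^2) in the worst case. A Fenwick (binary indexed) tree over the ranks of the values keeps every step logarithmic:

```python
# Count, for each element, how many smaller elements lie to its right,
# using a Fenwick (binary indexed) tree over the ranks of the values.
# This is a standard O(n log n) technique; the names here are my own.
def smaller_to_right(arr):
    # Compress values to ranks 1..m so the tree stays small.
    ranks = {v: i + 1 for i, v in enumerate(sorted(set(arr)))}
    tree = [0] * (len(ranks) + 1)

    def update(i):
        while i < len(tree):
            tree[i] += 1
            i += i & -i

    def query(i):  # sum of counts for ranks 1..i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    res = []
    for v in reversed(arr):
        r = ranks[v]
        res.append(query(r - 1))  # elements already seen that are smaller
        update(r)
    return res[::-1]

print(smaller_to_right([12, 1, 2, 3, 0, 11, 4]))
# [6, 1, 1, 1, 0, 1, 0]
```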

multiple dimensional permutations [duplicate]

This question already has answers here:
How do I generate all permutations of a list?
(40 answers)
Closed 3 years ago.
given a list of non-zero integers like, [2, 3, 4, 2]
generate a list of all the permutations possible where each element above reflects its maximum variance (I am sure there is a better way to express this, but I don't have the math background). Each element in the above array can be considered a dimension: the 2 above would allow for values 0 and 1; the 3 would allow for values 0, 1 and 2; etc.
the result would be a list of zero-based tuples:
[(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 0, 2, 0)...
and so on till (1, 2, 3, 1)]
the length of the array could vary, from 1 element to x
You can use itertools.product; try this:
from itertools import product
limits = [2, 3, 4, 2]
result = list(product(*[range(x) for x in limits]))
print(result)
What you're basically doing is trying to represent integers in a changing base. In your example, some of the digits are base 2, some base 3, and some base 4. So you can use an algorithm that converts base 10 to any base, and have the base you convert to depend on the current digit. Here's what I threw together; not sure if it's completely clear how it works.
n = [2, 3, 4, 2]
max_val = 1
for i in n:
    max_val *= i
ans = []  # will hold the generated lists
for i in range(max_val):
    current_value = i
    current_perm = []
    for j in n[::-1]:  # For you, the 'least significant bit' is on the right
        current_perm.append(current_value % j)
        current_value //= j  # integer division in python 3
    ans.append(current_perm[::-1])  # flip it back around!
print(ans)
So you basically just want to count, but you have a different limit for each position?
limits = [2, 3, 4, 2]
counter = [0] * len(limits)

def check_limits():
    # propagate carries right-to-left when a digit hits its limit
    for i in range(len(limits) - 1, 0, -1):
        if counter[i] >= limits[i]:
            counter[i] = 0
            counter[i - 1] += 1
    return not counter[0] >= limits[0]

while True:
    print(counter)
    counter[-1] += 1
    if not check_limits():
        break
Not a list of tuples, but you get the idea...
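For what it's worth, all of the approaches above should enumerate the same 2 × 3 × 4 × 2 = 48 tuples; a quick cross-check against the itertools.product version:

```python
from itertools import product

limits = [2, 3, 4, 2]
combos = list(product(*(range(x) for x in limits)))
print(len(combos))   # 48
print(combos[0])     # (0, 0, 0, 0)
print(combos[-1])    # (1, 2, 3, 1)
```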

Iterative Permutations of Distribution

I'm trying to generate all possible combinations of a distribution of sorts.
For example, say you have 5 points to spend on 4 categories, but you can only spend a maximum of 2 points on any given category.
In this instance, all possible solutions would be as follows:
[0, 1, 2, 2]
[0, 2, 1, 2]
[0, 2, 2, 1]
[1, 0, 2, 2]
[1, 1, 1, 2]
[1, 1, 2, 1]
[1, 2, 0, 2]
[1, 2, 1, 1]
[1, 2, 2, 0]
[2, 0, 1, 2]
[2, 0, 2, 1]
[2, 1, 0, 2]
[2, 1, 1, 1]
[2, 1, 2, 0]
[2, 2, 0, 1]
[2, 2, 1, 0]
I have successfully been able to make a recursive function that accomplishes this, but for larger numbers of categories it takes extremely long to generate. I have attempted making an iterative function instead in hopes of speeding it up, but I can't seem to get it to account for the category maximums.
Here is my recursive function (count = points, dist = zero-filled array w/ same size as max_allo)
def distribute_recursive(count, max_allo, dist, depth=0):
    for ration in range(max(count - sum(max_allo[depth + 1:]), 0), min(count, max_allo[depth]) + 1):
        dist[depth] = ration
        count -= ration
        if depth + 1 < len(dist):
            distribute_recursive(count, max_allo, dist, depth + 1)
        else:
            print(dist)
        count += ration
recursion isn't slow
Recursion isn't what's making it slow; consider a better algorithm:
def dist(count, limit, points, acc=[]):
    if count == 0:
        if sum(acc) == points:
            yield acc
    else:
        for x in range(limit + 1):
            yield from dist(count - 1, limit, points, acc + [x])
You can collect the generated results in a list:
print(list(dist(count=4, limit=2, points=5)))
pruning invalid combinations
Above, we use a fixed range of limit + 1, but watch what happens if we're generating a combination with (e.g.) limit = 2 and points = 5 ...
[ 2, ... ] # 3 points remaining
[ 2, 2, ... ] # 1 point remaining
At this point, using a fixed range of limit + 1 ([ 0, 1, 2 ]) is silly because we know we only have 1 point remaining to spend. The only remaining options here are 0 or 1...
[ 2, 2, 1 ... ] # 0 points remaining
Above we know we can use the range [ 0 ] (only zero) because there are no points left to spend. This will prevent us from attempting to validate combinations like
[ 2, 2, 2, ... ] # -1 points remaining
[ 2, 2, 2, 0, ... ] # -1 points remaining
[ 2, 2, 2, 1, ... ] # -2 points remaining
[ 2, 2, 2, 2, ... ] # -3 points remaining
If count was significantly large, this could rule out a huge amount of invalid combinations
[ 2, 2, 2, 2, 2, 2, 2, 2, 2, ... ] # -15 points remaining
To implement this optimization, we could add yet another parameter to our dist function, but at 5 parameters, it would start to look messy. Instead we introduce an auxiliary function to control the loop. Adding our optimization, we trade the fixed range for a dynamic range of min (limit, remaining) + 1. And finally, since we know how many points have been allocated, we no longer need to test the sum of each combination; yet another expensive operation removed from our algorithm
# revision: prune invalid combinations
def dist(count, limit, points):
    def loop(count, remaining, acc):
        if count == 0:
            if remaining == 0:
                yield acc
        else:
            for x in range(min(limit, remaining) + 1):
                yield from loop(count - 1, remaining - x, acc + [x])
    yield from loop(count, points, [])
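As a sanity check (my addition, restating the generator so the snippet runs on its own), the pruned version reproduces exactly the 16 combinations listed in the question:

```python
# Self-contained check of the pruned generator from the answer above
def dist(count, limit, points):
    def loop(count, remaining, acc):
        if count == 0:
            if remaining == 0:
                yield acc
        else:
            for x in range(min(limit, remaining) + 1):
                yield from loop(count - 1, remaining - x, acc + [x])
    yield from loop(count, points, [])

results = list(dist(count=4, limit=2, points=5))
print(len(results))  # 16, matching the list in the question
# every combination spends exactly 5 points, at most 2 per category
assert all(sum(r) == 5 and max(r) <= 2 for r in results)
```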
benchmarks
In the benchmarks below, the first version of our program is renamed to dist1 and the faster program using a dynamic range to dist2. We set up three tests: small, medium, and large.
from timeit import timeit

def small(prg):
    return list(prg(count=4, limit=2, points=5))

def medium(prg):
    return list(prg(count=8, limit=3, points=7))

def large(prg):
    return list(prg(count=16, limit=5, points=10))
And now we run the tests, passing each program as an argument. Note that for the large test, only 1 pass is done, as dist1 takes a while to generate the result.
print(timeit('small(dist1)', number=10000, globals=globals()))
print(timeit('small(dist2)', number=10000, globals=globals()))
print(timeit('medium(dist1)', number=100, globals=globals()))
print(timeit('medium(dist2)', number=100, globals=globals()))
print(timeit('large(dist1)', number=1, globals=globals()))
print(timeit('large(dist2)', number=1, globals=globals()))
The results for the small test show that pruning invalid combinations doesn't make much of a difference. However in the medium and large cases, the difference is dramatic. Our old program takes over 30 minutes for the large set, but just over 1 second using the new program!
dist1 small 0.8512216459494084
dist2 small 0.8610155049245805 (0.98x speed-up)
dist1 medium 6.142372329952195
dist2 medium 0.9355670949444175 (6.57x speed-up)
dist1 large 1933.0877765258774
dist2 large 1.4107366011012346 (1370.26x speed-up)
For frame of reference, the size of each result is printed below
print (len (small (dist2))) # 16 (this is the example in your question)
print (len (medium (dist2))) # 2472
print (len (large (dist2))) # 336336
checking our understanding
In the large benchmark with count = 12 and limit = 5, using our unoptimized program we were iterating through 5^12, or 244,140,625 possible combinations. Using our optimized program, we skip all invalid combinations, resulting in 336,336 valid answers. By analyzing combination count alone, we see that a staggering 99.86% of possible combinations are invalid. If analysis of each combination costs an equal amount of time, we can expect our optimized program to perform at a minimum of 725.88x better, due to invalid combination pruning.
In the large benchmark, measured at 1370.26x faster, the optimized program meets our expectations and even goes beyond. The additional speed-up is likely owed to the fact that we eliminated the call to sum.
huuuuge
To show this technique works for extremely large data sets, consider the huge benchmark. Our program finds 17,321,844 valid combinations amongst 7^16, or 33,232,930,569,601 possibilities.
In this test, our optimized program prunes 99.99479% of the invalid combinations. Correlating these numbers to the previous data set, we estimate the optimized program runs 1,918,556.16x faster than the unoptimized version.
The theoretical running time of this benchmark using the unoptimized program is 117.60 years. The optimized program finds the answer in just over 1 minute.
def huge(prg):
    return list(prg(count=16, limit=7, points=12))

print(timeit('huge(dist2)', number=1, globals=globals()))
# 68.06868170504458
print(len(huge(dist2)))
# 17321844
You can use a generator function for the recursion, while applying additional logic to cut down on the number of recursive calls needed:
def listings(_cat, points, _max, current=[]):
    if len(current) == _cat:
        yield current
    else:
        for i in range(_max + 1):
            if sum(current + [i]) <= points:
                if sum(current + [i]) == points or len(current + [i]) < _cat:
                    yield from listings(_cat, points, _max, current + [i])

print(list(listings(4, 5, 2)))
Output:
[[0, 1, 2, 2], [0, 2, 1, 2], [0, 2, 2, 1], [1, 0, 2, 2], [1, 1, 1, 2], [1, 1, 2, 1], [1, 2, 0, 2], [1, 2, 1, 1], [1, 2, 2, 0], [2, 0, 1, 2], [2, 0, 2, 1], [2, 1, 0, 2], [2, 1, 1, 1], [2, 1, 2, 0], [2, 2, 0, 1], [2, 2, 1, 0]]
While it is unclear at around what category size your solution drastically slows down, this solution runs in under one second for category sizes up to 24, searching for a total of five points with a maximum slot value of two. Note that for larger point and slot values, the category size that can be computed in under a second increases:
import time

def timeit(f):
    def wrapper(*args):
        c = time.time()
        _ = f(*args)
        return time.time() - c
    return wrapper

@timeit
def wrap_calls(category_size: int) -> float:
    _ = list(listings(category_size, 5, 2))

benchmark = 0
category_size = 1
while benchmark < 1:
    benchmark = wrap_calls(category_size)
    category_size += 1
print(category_size)
Output:
24

Find start and end of abrupt changes in an array in Python

Very new to Python, and it seems to me this task is not solvable based on my learning so far.
Please help:
I have an array created by a Python program, which provides a 1d array like this:
[0,0,.01,.1,1,1,.1,.01,0,0,0,0.01,.1,1,.1,.01,0,0,0,.01,.1,1,1,.1,.01,0,0,0,.01,.1,1,1]
You can see the array values go from zero to the max and back to zero many times.
I need to find the index where it starts to go up or down each time. So here it would be [3, 9, 12, 17, 20, 26, 29].
This is what I tried so far, but in vain
My_array = [0,0,.01,.1,1,1,.1,.01,0,0,0,0.01,.1,1,.1,.01,0,0,0,.01,.1,1,1,.1,.01,0,0,0,.01,.1,1,1]
def _edge(ii):
    for i in range(ii, len(My_array)):
        if np.abs(My_array[i] - My_array[i-1]) > .01:
            index = i  # save the index where the condition is met
            break
for ii in range(1, len(My_array)):
    if ii < len(My_array):  # make sure the loop continues till the end
        F1_Index = _edge(ii)
        F1_Index1.append(F1_Index)
If you use numpy you can do something like this:
import numpy as np

a = np.array([0, 0, .01, .1, 1, 1, .1, .01, 0, 0, 0, 0.01, .1, 1, .1, .01, 0, 0, 0, .01, .1, 1, 1, .1, .01, 0, 0, 0, .01, .1, 1, 1])
b = a[1:] - a[:-1]   # find differences between sequential elements
v = abs(b) == 0.01   # find differences whose magnitude is 0.01
                     # (this returns an array of True/False values)
edges = v.nonzero()[0]  # find indexes of True values
edges += 2  # add 1 because of the differencing scheme, and add 1 because
            # the results given in the question are 1-based while python
            # uses zero-based indexing
edges
> array([ 3,  9, 12, 17, 20, 26, 29], dtype=int64)
This is the fastest way I've found to do this sort of thing.
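For what it's worth, numpy.diff computes the same sequential differences in one call. Note that the exact == 0.01 comparison only works here because the relevant transitions are between 0 and 0.01 (where the float subtraction is exact); numpy.isclose would be safer in general:

```python
import numpy as np

a = np.array([0, 0, .01, .1, 1, 1, .1, .01, 0, 0, 0, 0.01, .1, 1, .1, .01,
              0, 0, 0, .01, .1, 1, 1, .1, .01, 0, 0, 0, .01, .1, 1, 1])
# np.diff(a) is equivalent to a[1:] - a[:-1]; +2 applies the same offset
# (one for the differencing scheme, one for 1-based results) as above
edges = np.nonzero(np.abs(np.diff(a)) == 0.01)[0] + 2
print(edges.tolist())
# [3, 9, 12, 17, 20, 26, 29]
```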
The following I think does what you need. It first builds a list holding -1, 0 or 1 giving the difference between adjacent values (unfortunately cmp has been removed from Python 3 as this was the perfect function for doing this). It then uses the groupby function and a non-zero filter to generate a list of indexes for when the direction changes:
import itertools

My_array = [0, 0, .01, .1, 1, 1, .1, .01, 0, 0, 0, 0.01, .1, 1, .1, .01, 0, 0, 0, .01, .1, 1, 1, .1, .01, 0, 0, 0, .01, .1, 1, 1]

def my_cmp(x, y):
    if x == y:  # Or for non-exact changes use: if abs(x-y) <= 0.01:
        return 0
    else:
        return 1 if y > x else -1

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

slope = [(my_cmp(pair[1], pair[0]), index) for index, pair in enumerate(pairwise(My_array))]
indexes_of_changes = [next(g)[1] + 1 for k, g in itertools.groupby(slope, lambda x: x[0]) if k != 0]
print(indexes_of_changes)
Giving you the following result for your data:
[2, 6, 11, 14, 19, 23, 28]
Note, this gives you ANY change in direction, not just > 0.01.
Tested using Python 3.
Here is the way I did it, which works for me (mostly learned from Brad's code). I still do not quite understand how b.nonzero()[0] works. Brad, please explain if possible.
import numpy as np

a = np.array([0, 0, .01, .1, 1, 1, .1, .01, 0, 0, 0, 0.01, .1, 1, .1, .01, 0, 0, 0, .01, .1, 1, 1, .1, .01, 0, 0, 0, .01, .1, 1, 1])
b0 = [x > .1 for x in a]  # making an array of True and False
b0 = np.array(b0)
b0 = b0 * 1  # converting True/False to 1 and 0
b = abs(b0[1:] - b0[:-1])  # now there is a 1 only where there is a change
edges = b.nonzero()[0]  # find indexes of 1 values
edges
array([ 3,  5, 12, 13, 20, 22, 29], dtype=int64)
