Substituting missing values in Python - python

I want to substitute missing values (None) with the last previous known value. This is my code. But it doesn't work. Any suggestions for a better algorithm?
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
    for line in table:
        for value in line:
            if value == None:
                value = line[line.index(value)-1]
    return table
print(treat_missing_values(t))

This is probably how I'd do it:
>>> def treat_missing_values(table):
...     for line in table:
...         prev = None
...         for i, value in enumerate(line):
...             if value is None:
...                 line[i] = prev
...             else:
...                 prev = value
...     return table
...
>>> treat_missing_values([[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[1, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
>>> treat_missing_values([[None, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[None, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]

When you do an assignment in Python, you merely bind a name to an object in memory. Assigning to value cannot change the list, because it only makes the name value refer to a different object.
To do what you want, you need to assign directly into the list at the right index.
As stated, your algorithm won't work if one of the inner lists has None as its first value.
So you can do it like this:
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table, default_value):
    for line in table:
        # Reset per row so a leading None gets the default,
        # not a value left over from the previous row.
        last_value = default_value
        for index in range(len(line)):
            if line[index] is None:
                line[index] = last_value
            else:
                last_value = line[index]
    return table
print(treat_missing_values(t, 0))
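The point about assignment only rebinding a name can be seen in a small sketch. The variable names here are made up for illustration; the first loop mirrors what the question's code does, the second assigns through the index.

```python
row = [1, None, 3]

# Rebinding the loop variable: only the local name `value` changes.
for value in row:
    if value is None:
        value = 0
after_rebinding = list(row)

# Assigning through the index: the list itself is modified.
for i, value in enumerate(row):
    if value is None:
        row[i] = 0
after_index_assignment = list(row)

print(after_rebinding)         # [1, None, 3]
print(after_index_assignment)  # [1, 0, 3]
```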

Looking up the index from the value won't work if the list starts with None or if there's a duplicate value. Try this:
def treat(v):
    p = None
    r = []
    for n in v:
        p = p if n is None else n
        r.append(p)
    return r
def treat_missing_values(table):
    return [ treat(v) for v in table ]
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
print(treat_missing_values(t))
This better not be your homework, dude.
EDIT A functional version for all you FP fans out there:
def treat(l):
    def e(first, remainder):
        return [ first ] + ([] if len(remainder) == 0 else e(first if remainder[0] is None else remainder[0], remainder[1:]))
    return l if len(l) == 0 else e(l[0], l[1:])

That's because the index method returns the first occurrence of the argument you pass to it. In the first row, for example, line.index(None) will always return 2, because that's the first occurrence of None in that list.
Try this instead:
def treat_missing_values(table):
    for line in table:
        for i in range(len(line)):
            if line[i] is None:
                if i != 0:
                    line[i] = line[i - 1]
                else:
                    # This line deals with your other problem: what if your FIRST value is None?
                    line[i] = 0  # Some default value here
    return table
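To make the behaviour of index concrete, here is a small sketch comparing it with enumerate on the first row of the question's table. The variable names are illustrative only.

```python
line = [1, 3, None, 5, None]

# list.index always reports the first match, so both None entries map to 2.
positions_via_index = [line.index(v) for v in line if v is None]

# enumerate carries the real position of each element along.
positions_via_enumerate = [i for i, v in enumerate(line) if v is None]

print(positions_via_index)      # [2, 2]
print(positions_via_enumerate)  # [2, 4]
```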

I'd use a global variable to keep track of the most recent valid value. And I'd use map() for the iteration.
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
prev = 0
def vIfNone(x):
    global prev
    if x:
        prev = x
    else:
        x = prev
    return x
print(list(map(lambda line: list(map(vIfNone, line)), t)))
EDIT: Malvolio here. Sorry to be writing in your answer, but there were too many mistakes to correct in a comment.
if x: will fail for all falsy values (notably 0 and the empty string).
Mutable global values are bad. They aren't thread-safe and produce other peculiar behaviors (in this case, if a list starts with None, it is set to the last value that happened to be processed by your code).
The re-writing of x is unnecessary; prev always has the right value.
In general, things like this should be wrapped in functions, for naming and for scoping.
So:
def treat(n):
    prev = [ None ]
    def vIfNone(x):
        if x is not None:
            prev[0] = x
        return prev[0]
    return list(map( vIfNone, n ))
(Note the weird use of prev as a closed variable. It will be local to each invocation of treat, and global across all invocations of vIfNone from the same treat invocation, exactly what you need. For dark and probably disturbing Python reasons I don't understand, it has to be an array.)
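The list is needed because a plain assignment to prev inside vIfNone would create a new local variable instead of rebinding the one in treat. In Python 3 the nonlocal statement removes the need for the one-element list. A minimal sketch, assuming Python 3; the inner function name fill is made up here:

```python
def treat(values):
    prev = None

    def fill(x):
        nonlocal prev          # rebind the variable owned by treat()
        if x is not None:
            prev = x
        return prev

    return [fill(x) for x in values]


print(treat([None, 3, None, 5, None]))  # [None, 3, 3, 5, 5]
print(treat([0, None, "", None]))       # [0, 0, '', '']
```

Falsy values such as 0 and the empty string are kept, because the test is `is not None` rather than truthiness.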

EDIT1
# your algorithm won't work if the line starts with None
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
    for line in table:
        for index in range(len(line)):
            if line[index] is None:
                line[index] = line[index-1]
    return table
print(treat_missing_values(t))

Related

Function to make a list as unsorted as possible

I am looking for a function to make the list as unsorted as possible. Preferably in Python.
Backstory:
I want to check URL statuses and see whether each URL returns a 404 or not. I just use the asyncio and requests modules. Nothing fancy.
Now I don't want to overload servers, so I want to minimize checking URLs which are on the same domain at the same time. I have this idea to sort the URLs in a way that items which are close to one another (having the same sort key = domain name) are placed as far apart from each other in the list as possible.
An example with numbers:
a=[1,1,2,3,3] # <== sorted list, sortness score = 2
0,1,2,3,4 # <== positions
could be unsorted as:
b=[1,3,2,1,3] # <== unsorted list, sortness score = 6
0,1,2,3,4 # <== positions
I would say that we can compute a sortness score by summing up the distances between equal items (which have the same key = domain name). Higher sortness means better unsorted. Maybe there is a better way for testing unsortness.
The sortness score for list a is 2. The sum of distances for 1 is (1-0)=1, for 2 is 0, for 3 is (4-3)=1.
The sortness score for list b is 6. The sum of distances for 1 is (3-0)=3, for 2 is 0, for 3 is (4-1)=3.
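The score described above can be written down directly. This is a sketch of the linear version (no square root), assuming that "distance" means the gap between consecutive occurrences of the same key; the function name sortness is made up for illustration.

```python
from collections import defaultdict


def sortness(items):
    # Collect the positions at which each key occurs.
    positions = defaultdict(list)
    for index, key in enumerate(items):
        positions[key].append(index)
    # Sum the gaps between consecutive occurrences of each key.
    return sum(
        later - earlier
        for plist in positions.values()
        for earlier, later in zip(plist, plist[1:])
    )


print(sortness([1, 1, 2, 3, 3]))  # 2
print(sortness([1, 3, 2, 1, 3]))  # 6
```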
URLs list would look something like a list of (domain, URL) tuples:
[
('example.com', 'http://example.com/404'),
('test.com', 'http://test.com/404'),
('test.com', 'http://test.com/405'),
('example.com', 'http://example.com/405'),
...
]
I am working on a prototype which works Ok-ish, but not optimal as I can find some variants which are better unsorted by hand.
Anyone wants to give it a go?
This is my code, but it's not great :):
from collections import Counter
from collections import defaultdict
import math

def test_unsortness(lst:list) -> float:
    pos = defaultdict(list)
    score = 0
    # Store positions for each key
    # input = [1,3,2,3,1] => {1: [0, 4], 3: [1, 3], 2: [2]}
    for c,l in enumerate(lst):
        pos[l].append(c)
    for k,poslst in pos.items():
        for i in range(len(poslst)-1):
            score += math.sqrt(poslst[i+1] - poslst[i])
    return score

def unsort(lst:list) -> list:
    free_positions = list(range(0,len(lst)))
    output_list = [None] * len(free_positions)
    for val, count in Counter(lst).most_common():
        pos = 0
        step = len(free_positions) / count
        for i in range(count):
            output_list[free_positions[int(pos)]] = val
            free_positions[int(pos)] = None  # Remove position later
            pos = pos + step
        free_positions = [p for p in free_positions if p is not None]
    return output_list

lsts = list()
lsts.append( [1,1,2,3,3] )
lsts.append( [1,3,2,3,1] ) # This has the worst score after unsort()
lsts.append( [1,2,3,0,1,2,3] ) # This has the worst score after unsort()
lsts.append( [3,2,1,0,1,2,3] ) # This has the worst score after unsort()
lsts.append( [3,2,1,3,1,2,3] ) # This has the worst score after unsort()
lsts.append( [1,2,3,4,5] )
for lst in lsts:
    ulst = unsort(lst)
    print( ( lst, '%.2f'%test_unsortness(lst), '====>', ulst, '%.2f'%test_unsortness(ulst), ) )
# Original score Unsorted score
# ------- ----- -------- -----
# ([1, 1, 2, 3, 3], '2.00', '====>', [1, 3, 1, 3, 2], '2.83')
# ([1, 3, 2, 3, 1], '3.41', '====>', [1, 3, 1, 3, 2], '2.83')
# ([1, 2, 3, 0, 1, 2, 3], '6.00', '====>', [1, 2, 3, 1, 2, 3, 0], '5.20')
# ([3, 2, 1, 0, 1, 2, 3], '5.86', '====>', [3, 2, 1, 3, 2, 1, 0], '5.20')
# ([3, 2, 1, 3, 1, 2, 3], '6.88', '====>', [3, 2, 3, 1, 3, 2, 1], '6.56')
# ([1, 2, 3, 4, 5], '0.00', '====>', [1, 2, 3, 4, 5], '0.00')
PS. I am not looking just for a randomize function and I know there are crawlers which can manage domain loads, but this is for the sake of exercise.
Instead of unsorting your list of URLs, why not group them by domain, each in its own queue, and then process the queues asynchronously with a (randomised?) delay in between?
It looks less complex to me than what you're trying to do, and it achieves the same thing. If you have a lot of domains, you can always throttle how many run concurrently at that point.
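The per-domain queue idea could look roughly like the sketch below. It is only an illustration: check_url is a hypothetical stand-in that fakes a status code instead of making a real HTTP request, and the names check_domain, check_all, max_domains and max_delay are assumptions, not part of any library.

```python
import asyncio
import random
from collections import defaultdict


async def check_url(url):
    # Hypothetical stand-in for the real request; returns a fake status code.
    await asyncio.sleep(0)
    return 404 if url.endswith("/404") else 200


async def check_domain(urls, limiter, max_delay):
    results = {}
    async with limiter:              # cap how many domains are active at once
        for url in urls:             # URLs of one domain run one after another
            results[url] = await check_url(url)
            await asyncio.sleep(random.uniform(0, max_delay))
    return results


async def check_all(pairs, max_domains=10, max_delay=0.01):
    by_domain = defaultdict(list)
    for domain, url in pairs:
        by_domain[domain].append(url)
    limiter = asyncio.Semaphore(max_domains)
    per_domain = await asyncio.gather(
        *(check_domain(urls, limiter, max_delay) for urls in by_domain.values())
    )
    merged = {}
    for results in per_domain:
        merged.update(results)
    return merged


pairs = [
    ("example.com", "http://example.com/404"),
    ("test.com", "http://test.com/404"),
    ("test.com", "http://test.com/405"),
    ("example.com", "http://example.com/405"),
]
statuses = asyncio.run(check_all(pairs))
print(statuses)
```

Different domains are checked concurrently, while two URLs of the same domain are never requested at the same time, which is the property the question is after.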
I used Google OR Tools to solve this problem. I framed it as a constraint optimization problem and modeled it that way.
from collections import defaultdict
from itertools import chain, combinations
from ortools.sat.python import cp_model
model = cp_model.CpModel()
data = [
('example.com', 'http://example.com/404'),
('test.com', 'http://test.com/404'),
('test.com', 'http://test.com/405'),
('example.com', 'http://example.com/405'),
('google.com', 'http://google.com/404'),
('example.com', 'http://example.com/406'),
('stackoverflow.com', 'http://stackoverflow.com/404'),
('test.com', 'http://test.com/406'),
('example.com', 'http://example.com/407')
]
tmp = defaultdict(list)
for (domain, url) in sorted(data):
    var = model.NewIntVar(0, len(data) - 1, url)
    tmp[domain].append(var)  # store URLs as model variables where the key is the domain
vals = list(chain.from_iterable(tmp.values()))  # create a single list of all variables
model.AddAllDifferent(vals)  # all variables must occupy a unique spot in the output
constraint = []
for urls in tmp.values():
    if len(urls) == 1:  # a single domain does not need a specific constraint
        constraint.append(urls[0])
        continue
    combos = combinations(urls, 2)
    for (x, y) in combos:  # create combinations between each URL of a specific domain
        constraint.append((x - y))
model.Maximize(sum(constraint))  # maximize the distance between similar URLs from our constraint list
solver = cp_model.CpSolver()
status = solver.Solve(model)
output = [None for _ in range(len(data))]
if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
    for val in vals:
        idx = solver.Value(val)
        output[idx] = val.Name()
print(output)
['http://example.com/407',
'http://test.com/406',
'http://example.com/406',
'http://test.com/405',
'http://example.com/405',
'http://stackoverflow.com/404',
'http://google.com/404',
'http://test.com/404',
'http://example.com/404']
There is no obvious definition of unsortedness that would work best for you, but here's something that at least works well:
Sort the list
If the length of the list is not a power of two, then spread the items out evenly in a list with the next power of two size
Find a new index for each item by reversing the bits in its old index.
Remove the gaps to bring the list back to its original size.
In sorted order, the indexes of items that are close together usually differ only in the smallest bits. By reversing the bit order, you make the new indexes for items that are close together differ in the largest bits, so they will end up far apart.
def bitreverse(x, bits):
    # reverse the lower 32 bits
    x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1)
    x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2)
    x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4)
    x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
    x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16)
    # take only the appropriate length
    return (x>>(32-bits)) & ((1<<bits)-1)

def antisort(inlist):
    if len(inlist) < 3:
        return inlist
    inlist = sorted(inlist)
    # get the next power of 2 list length
    p2len = 2
    bits = 1
    while p2len < len(inlist):
        p2len *= 2
        bits += 1
    templist = [None] * p2len
    for i in range(len(inlist)):
        newi = i * p2len // len(inlist)
        newi = bitreverse(newi, bits)
        templist[newi] = inlist[i]
    return [item for item in templist if item is not None]

print(antisort(["a","b","c","d","e","f","g",
    "h","i","j","k","l","m","n","o","p","q","r",
    "s","t","u","v","w","x","y","z"]))
Output:
['a', 'n', 'h', 'u', 'e', 'r', 'k', 'x', 'c', 'p', 'f', 's',
'm', 'z', 'b', 'o', 'i', 'v', 'l', 'y', 'd', 'q', 'j', 'w', 'g', 't']
You could implement an inverted binary search.
from typing import Union, List
sorted_int_list = [1, 1, 2, 3, 3]
unsorted_int_list = [1, 3, 2, 1, 3]
sorted_str_list = [
"example.com",
"example.com",
"test.com",
"stackoverflow.com",
"stackoverflow.com",
]
unsorted_str_list = [
"example.com",
"stackoverflow.com",
"test.com",
"example.com",
"stackoverflow.com",
]
def inverted_binary_search(
    input_list: List[Union[str, int]],
    search_elem: Union[int, str],
    list_selector_start: int,
    list_selector_end: int,
) -> int:
    if list_selector_end - list_selector_start <= 1:
        if search_elem < input_list[list_selector_start]:
            return list_selector_start - 1
        else:
            return list_selector_start
    list_selector_mid = (list_selector_start + list_selector_end) // 2
    if input_list[list_selector_mid] > search_elem:
        return inverted_binary_search(
            input_list=input_list,
            search_elem=search_elem,
            list_selector_start=list_selector_mid,
            list_selector_end=list_selector_end,
        )
    elif input_list[list_selector_mid] < search_elem:
        return inverted_binary_search(
            input_list=input_list,
            search_elem=search_elem,
            list_selector_start=list_selector_start,
            list_selector_end=list_selector_mid,
        )
    else:
        return list_selector_mid

def inverted_binary_insertion_sort(your_list: List[Union[str, int]]):
    for idx in range(1, len(your_list)):
        selected_elem = your_list[idx]
        inverted_binary_search_position = (
            inverted_binary_search(
                input_list=your_list,
                search_elem=selected_elem,
                list_selector_start=0,
                list_selector_end=idx,
            )
            + 1
        )
        for idk in range(idx, inverted_binary_search_position, -1):
            your_list[idk] = your_list[idk - 1]
        your_list[inverted_binary_search_position] = selected_elem
    return your_list
Output
inverted_sorted_int_list = inverted_binary_insertion_sort(sorted_int_list)
print(inverted_sorted_int_list)
>> [1, 3, 3, 2, 1]
inverted_sorted_str_list = inverted_binary_insertion_sort(sorted_str_list)
print(inverted_sorted_str_list)
>> ['example.com', 'stackoverflow.com', 'stackoverflow.com', 'test.com', 'example.com']
Update:
Given the comments, you could also run the function twice. This will untangle duplicates.
inverted_sorted_int_list = inverted_binary_insertion_sort(
inverted_binary_insertion_sort(sorted_int_list)
)
>> [1, 3, 2, 1, 3]
Here's a stab at it, but I am not sure it wouldn't degenerate a bit given particular input sets.
We pick the most frequent found item and append its first occurrence to a list. Then same with the 2nd most frequent and so on.
Repeat half the size of the most found item. That's the left half of the list.
Then moving from least frequent to most frequent, pick first item and add its values. When an item is found less than half the max, pick on which side you want to put it.
Essentially, we layer key by key and end up with more frequent items at left-most and right-most positions in the unsorted list, leaving less frequent ones in the middle.
import random
from collections import defaultdict

def unsort(lst:list) -> list:
    """
    build a dictionary by frequency first
    then loop thru the keys and append
    key by key with the other keys in between
    """
    result = []
    # dictionary by keys (this would be domains to urls)
    di = defaultdict(list)
    for v in lst:
        di[v].append(v)
    # sort by decreasing dupes length
    li_len = [(len(val),key) for key, val in di.items()]
    li_len.sort(reverse=True)
    # most found count
    max_c = li_len[0][0]
    # halfway point
    odd = max_c % 2
    num = max_c // 2
    if odd:
        num += 1
    # frequency, high to low
    by_freq = [tu[1] for tu in li_len]
    di_side = {}
    # for the first half, pick from frequent to infrequent
    # alternating by keys
    for c in range(num):
        # frequent to less
        for key in by_freq:
            entries = di[key]
            # first pass: less than half the number of values
            # we don't want to put all the infrequents first
            # and have a more packed second side
            if not c:
                # pick on way in or out?
                if len(entries) <= num:
                    # might be better to alternate L,R,L
                    di_side[key] = random.choice(["L","R"])
                else:
                    # pick on both
                    di_side[key] = "B"
            # put in the left half
            do_it = di_side[key] in ("L","B")
            if do_it and entries:
                result.append(entries.pop(0))
    # once you have mid point pick from infrequent to frequent
    for c in range(num):
        # less frequent to more frequent
        for key in reversed(by_freq):
            entries = di[key]
            # put in the right half
            do_it = di_side[key] in ("R","B")
            if entries:
                result.append(entries.pop(0))
    return result
Running this I got:
([1, 1, 2, 3, 3], '2.00', '====>', [3, 1, 2, 1, 3], '3.41')
([1, 3, 2, 3, 1], '3.41', '====>', [3, 1, 2, 1, 3], '3.41')
([1, 2, 3, 0, 1, 2, 3], '6.00', '====>', [3, 2, 1, 0, 1, 2, 3], '5.86')
([3, 2, 1, 0, 1, 2, 3], '5.86', '====>', [3, 2, 1, 0, 1, 2, 3], '5.86')
([3, 2, 1, 3, 1, 2, 3], '6.88', '====>', [3, 2, 3, 2, 1, 3, 1], '5.97')
([1, 2, 3, 4, 5], '0.00', '====>', [5, 1, 2, 3, 4], '0.00')
Oh, and I also added an assert to check nothing had been dropped or altered by the unsorting:
assert(sorted(lst) == sorted(ulst))
alternate approach?
I'll leave it as a footnote for now, but the general idea of not clustering (not the OP's specific application of not overloading domains) seems like it would be a candidate for a force-repulsive approach, where identical domains would try to keep as far from each other as possible. i.e. 1, 1, 2 => 1, 2, 1 because the 1s would repel each other. That's a wholly different algorithmic approach however.
When I faced a similar problem, here's how I solved it:
Define the "distance" between two strings (URLs in this case) as their Levenshtein distance (code to compute this value is readily available)
Adopt your favorite travelling-salesman algorithm to find the (approximate) shortest path through your set of strings (finding the exact shortest path isn't computationally feasible but the approximate algorithms are fairly efficient)
Now modify your "distance" metric to be inverted -- i.e. compute the distance between two strings (s1,s2) as MAX_INT - LevenshteinDistance(s1,s2)
With this modification, the "shortest path" through your set is now really the longest path, i.e. the most un-sorted one.
An easy way to scramble a list is to maximize its "sortness" score using a genetic algorithm with a permutation chromosome. I was able to hack quickly a version in R using the GA package. I'm not a Python user, but I am sure there are GA libraries for Python that include permutation chromosomes. If not, a general GA library with real-valued vector chromosomes can be adapted. You just use a vector with values in [0, 1] as a chromosome and convert each vector to its sort index.
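The last step, turning a real-valued chromosome into a permutation, is small enough to sketch in Python. This shows only the decoding, not a genetic algorithm; the function names decode_random_keys and apply_order are made up for illustration.

```python
def decode_random_keys(keys):
    # The sort index of the vector: positions ordered by their key value.
    return sorted(range(len(keys)), key=lambda i: keys[i])


def apply_order(items, keys):
    # Reorder items according to the permutation encoded by keys.
    return [items[i] for i in decode_random_keys(keys)]


order = decode_random_keys([0.7, 0.1, 0.4])
candidate = apply_order(["a", "b", "c"], [0.7, 0.1, 0.4])
print(order)      # [1, 2, 0]
print(candidate)  # ['b', 'c', 'a']
```

Any vector of distinct values in [0, 1] decodes to a valid permutation, so the usual real-valued crossover and mutation operators never produce an invalid ordering.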
I hope this algorithm works correctly:
unsorted_list = ['c', 'a', 'a', 'a', 'a', 'b', 'b']
d = {i: unsorted_list.count(i) for i in unsorted_list}
print(d) # {'c': 1, 'a': 4, 'b': 2}
d = {k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse=True)}
print(d) # {'a': 4, 'b': 2, 'c': 1}
result = [None] * len(unsorted_list)
border_index_left = 0
border_index_right = len(unsorted_list) - 1
it = iter(d)
def set_recursively(k, nk):
    set_borders(k)
    set_borders(nk)
    if d[k]:
        set_recursively(k, nk)

def set_borders(key):
    global border_index_left, border_index_right
    if key is not None and d[key]:
        result[border_index_left] = key
        d[key] = d[key] - 1
        border_index_left = border_index_left + 1
    if key is not None and d[key]:
        result[border_index_right] = key
        d[key] = d[key] - 1
        border_index_right = border_index_right - 1

next_element = next(it, None)
for k, v in d.items():
    next_element = next(it, None)
    set_recursively(k, next_element)
print(result) # ['a', 'b', 'a', 'c', 'a', 'b', 'a']
Visually, it looks like walking from the edges to the middle:
[2, 3, 3, 3, 1, 1, 0]
[None, None, None, None, None, None, None]
[3, None, None, None, None, None, None]
[3, None, None, None, None, None, 3]
[3, 1, None, None, None, None, 3]
[3, 1, None, None, None, 1, 3]
[3, 1, 3, None, None, 1, 3]
[3, 1, 3, 2, None, 1, 3]
[3, 1, 3, 2, 0, 1, 3]
Just saying, putting a short time delay would work just fine. I think someone mentioned this already. It is very simple and very reliable. You could do something like:
from random import uniform
from time import sleep
import requests

error404 = []
connectionError = []
for url in your_URL_list:
    statusCode = requests.get(str(url)).status_code
    if statusCode == 404:
        error404.append(url)
    sleep(uniform(0.1, 0.5))  # pause for a random 0.1 to 0.5 seconds between requests
Cheers

Get ranges where values are not None

The Goal
I would like to get the ranges where values are not None in a list, so for example:
test1 = [None, 0, None]
test2 = [2,1,None]
test3 = [None,None,3]
test4 = [1,0,None,0,0,None,None,1,None,0]
res1 = [[1,1]]
res2 = [[0,1]]
res3 = [[2,2]]
res4 = [[0,1],[3,4],[7,7],[9,9]]
What I have tried
This is my super lengthy implementation, which does not perfectly work...
def get_not_None_ranges(list_):
    # Example [0, 2, None, 1, 4] -> [[0, 1], [3, 4]]
    r = []
    end_i = len(list_)-1
    if list_[0] == None:
        s = None
    else:
        s = 0
    for i, elem in enumerate(list_):
        if s != None:
            if elem == None and end_i != i:
                r.append([s,i-1])
                s = i+1
            if end_i == i:
                if s > i:
                    r=r
                elif s==i and elem == None:
                    r=r
                else:
                    r.append([s,i])
        else:
            if elem != None:
                s = i
                if end_i == i:
                    if s > i:
                        r=r
                    else:
                        r.append([s,i])
    return r
As you can see the results are sometimes wrong:
print(get_not_None_ranges(test1))
print(get_not_None_ranges(test2))
print(get_not_None_ranges(test3))
print(get_not_None_ranges(test4))
[[1, 2]]
[[0, 2]]
[[2, 2]]
[[0, 1], [3, 4], [6, 5], [7, 7], [9, 9]]
So, I was wondering if you guys know a much better way to achieve this?
Use itertools.groupby:
from itertools import groupby
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
def get_not_None_ranges(lst):
    result = []
    for key, group in groupby(enumerate(lst), key=lambda x: x[1] is not None):
        if key:
            index, _ = next(group)
            result.append([index, index + sum(1 for _ in group)])
    return result
print(get_not_None_ranges(test1))
print(get_not_None_ranges(test2))
print(get_not_None_ranges(test3))
print(get_not_None_ranges(test4))
Output
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]
A non-groupby solution that doesn't need extra treatment for the last group:
def get_not_None_ranges(lst):
    result = []
    it = enumerate(lst)
    for i, x in it:
        if x is not None:
            first = last = i
            for i, x in it:
                if x is None:
                    break
                last = i
            result.append([first, last])
    return result
Whenever I find the first of a non-None streak, I use an inner loop to right away run to the last of that streak. To allow both loops to use the same iterator, I store it in a variable.
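The shared-iterator trick can be seen in isolation in the sketch below: whatever the inner loop consumes is gone for the outer loop too. The variable names are illustrative only.

```python
it = iter("abcdef")
seen_outer, seen_inner = [], []

for ch in it:
    seen_outer.append(ch)
    # The inner loop pulls from the same iterator and stops after "c".
    for ch in it:
        seen_inner.append(ch)
        if ch == "c":
            break

print(seen_outer)  # ['a', 'd']
print(seen_inner)  # ['b', 'c', 'e', 'f']
```

Looping over the list itself in both loops would restart the inner loop from the beginning each time, which is why the iterator is stored in a variable first.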
You just need to iterate over the list, and check for two conditions:
If the previous element is None and the current element is not None, start a new "range".
If the previous element is not None and the current element is None, end the currently active range at the previous index.
def gnnr(lst):
    all_ranges = []
    current_range = []
    prev_item = None
    for index, item in enumerate(lst):
        # Condition 1
        if prev_item is None and item is not None:
            current_range.append(index)
        # Condition 2
        elif prev_item is not None and item is None:
            current_range.append(index - 1)  # Close current range at the previous index
            all_ranges.append(current_range)  # Add to all_ranges
            current_range = []  # Reset current_range
        prev_item = item
    # If current_range isn't closed, close it at the last index of the list
    if current_range:
        current_range.append(index)
        all_ranges.append(current_range)
    return all_ranges
Calling this function with your test cases gives the expected output:
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]
Well, we can solve this using the classic sliding window approach.
Here is a solution that works fine:
def getRanges(nums):
    left = right = 0
    ranges, n = [], len(nums)
    while right < n:
        while left < n and nums[left] is None:
            left += 1
            right += 1
        while right < n and nums[right] is not None:
            right += 1
        if right >= n:
            break
        ranges.append([left, right - 1])
        left = right = right + 1
    return ranges + [[left, right - 1]] if right - 1 >= left else ranges
Let's test it:
test = [
[1, 0, None, 0, 0, None, None, 1, None, 0],
[None, None, 3],
[2, 1, None],
[None, 0, None],
]
for i in test:
print(getRanges(i))
Output:
[[0, 1], [3, 4], [7, 7], [9, 9]]
[[2, 2]]
[[0, 1]]
[[1, 1]]
Give it a try. The code uses type hints and a named tuple to increase readability.
from typing import NamedTuple,List,Any

class Range(NamedTuple):
    left: int
    right: int

def get_ranges(lst: List[Any]) -> List[Range]:
    ranges : List[Range] = []
    left = None
    right = None
    for i,x in enumerate(lst):
        is_none = x is None
        if is_none:
            if left is not None :
                right = right if right is not None else left
                ranges.append(Range(left,right))
                left = None
                right = None
        else:
            if left is None:
                left = i
            else:
                right = i
    if left is not None:
        right = right if right is not None else left
        ranges.append(Range(left,right))
    return ranges

data = [[1,0,None,0,0,None,None,1,None,0],[None,None,3],[2,1,None],[None, 0, None]]
for entry in data:
    print(get_ranges(entry))
Output:
[Range(left=0, right=1), Range(left=3, right=4), Range(left=7, right=7), Range(left=9, right=9)]
[Range(left=2, right=2)]
[Range(left=0, right=1)]
[Range(left=1, right=1)]
Using first and last of each group of not nones:
from itertools import groupby
def get_not_None_ranges(lst):
    result = []
    for nones, group in groupby(enumerate(lst), lambda x: x[1] is None):
        if not nones:
            first = last = next(group)
            for last in group:
                pass
            result.append([first[0], last[0]])
    return result
Here's my example. It is definitely NOT the most efficient way, but I think it is more intuitive and you can optimize it later.
def get_not_None_ranges(list_: list):
    res = []
    start_index = -1
    for i in range(len(list_)):
        e = list_[i]
        if e is not None:
            if start_index < 0:
                start_index = i
        else:
            if start_index >= 0:
                res.append([start_index, i - 1])
                start_index = -1
    if start_index >= 0:
        res.append([start_index, len(list_) - 1])
    return res
The main idea of this function:
start_index is initialized to -1.
When we meet a not-None element while start_index is -1, set start_index to its index.
When we meet None while a range is open, save [start_index, i - 1] (the previous element is the end of the range), then set start_index back to -1.
When we meet None but start_index is -1, we do nothing, since no range is open. For the same reason, we do nothing when we meet a not-None element while start_index >= 0.
When the loop ends but start_index is still >= 0, the last range has not been recorded yet, so we record it manually.
It may look a little complex; it may help to paste the code above and step through it line by line in a debugger.
How about:-
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
def goal(L):
    r = []
    _r = None
    for i, e in enumerate(L):
        if e is not None:
            if _r:
                _r[1] = i
            else:
                _r = [i, i]
        else:
            if _r:
                r.append(_r)
                _r = None
    if _r:
        r.append(_r)
    return r

for _l in [test1, test2, test3, test4]:
    print(goal(_l))
Another solution (one-liner with itertools.groupby):
from itertools import groupby
out = [[(v := list(g))[0][1], v[-1][1]] for _, g in groupby(enumerate(i for i, v in enumerate(testX) if not v is None), lambda k: k[0] - k[1],)]
Tests:
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
tests = [test1, test2, test3, test4]
for t in tests:
    out = [
        [(v := list(g))[0][1], v[-1][1]]
        for _, g in groupby(
            enumerate(i for i, v in enumerate(t) if not v is None),
            lambda k: k[0] - k[1],
        )
    ]
    print(out)
Prints:
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]

How to remove all zeros from a list

I want to remove all zeros from the list after sorting in descending order.
for x in range (1,count):
    exec("col"+str(x) + "=[]")
with open (xvg_input, 'r') as num:
    line_to_end = num.readlines()
    for line in line_to_end:
        if "#" not in line and "#" not in line:
            line=list(map(float,line.split()))
            for x in range (2,count):
                exec("col" +str (x)+ ".append(line["+ str(x-1) + "])")
                exec("col" +str(x) + ".sort(reverse = True)")
                exec("while (col"+str(x) + ".count(0.000)):")
                exec("col" +str(x) +".remove(0.000)")
I am getting a syntax error and I can't see what I am doing wrong. I just want to sort in descending order and delete all the zeroes.
Does this make sense?
def remove_values(the_list, val):
    return [value for value in the_list if value != val]
x = [1, 0, 3, 4, 0, 0, 3]
x = remove_values(x, 0)
print(x)
# [1, 3, 4, 3]
Try using the filter function:
a = [9,8,7,6,5,4,3,2,1,0,0,0,0,0,0]
list(filter(lambda x: x != 0, a))  # iterates over the items, keeping the ones that meet the condition in the lambda
# [9, 8, 7, 6, 5, 4, 3, 2, 1]
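On the syntax error itself: each exec call has to receive a complete statement, so passing the while header on its own fails before the remove line is ever reached. One way to avoid exec altogether is to keep the columns in a list of lists. The sketch below assumes whitespace-separated numeric rows and skips lines containing #, as the question's code does. The function name nonzero_columns_desc and the sample lines are made up for illustration, and unlike the question's loop it keeps every column.

```python
def nonzero_columns_desc(lines):
    # Parse every non-empty, non-comment line into a row of floats.
    rows = [
        [float(field) for field in line.split()]
        for line in lines
        if line.strip() and "#" not in line
    ]
    # zip(*rows) transposes rows into columns.
    return [
        sorted((value for value in column if value != 0), reverse=True)
        for column in zip(*rows)
    ]


sample = ["# time a b", "0 1.5 0.0", "1 0.0 2.5", "2 3.0 1.0"]
columns = nonzero_columns_desc(sample)
print(columns)  # [[2.0, 1.0], [3.0, 1.5], [2.5, 1.0]]
```

With a real file, the lines argument could simply be the open file object, since iterating over it yields one line at a time.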

Check if values in list exceed threshold a certain amount of times and return index of first exceedance

I am searching for a clean and pythonic way of checking if the contents of a list are greater than a given number (first threshold) for a certain number of times (second threshold). If both statements are true, I want to return the index of the first value which exceeds the given threshold.
Example:
# Set first and second threshold
thr1 = 4
thr2 = 5
# Example 1: Both thresholds exceeded, looking for index (3)
list1 = [1, 1, 1, 5, 1, 6, 7, 3, 6, 8]
# Example 2: Only threshold 1 is exceeded, no index return needed
list2 = [1, 1, 6, 1, 1, 1, 2, 1, 1, 1]
I don't know if it's considered pythonic to abuse the fact that booleans are ints, but I like doing it like this:
def check(l, thr1, thr2):
    c = [n > thr1 for n in l]
    if sum(c) >= thr2:
        return c.index(1)
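The reason this works is that bool is a subclass of int, with True equal to 1 and False equal to 0. A short sketch with made-up sample values:

```python
flags = [n > 4 for n in [1, 5, 6, 3]]   # [False, True, True, False]

count_over = sum(flags)          # True counts as 1, False as 0
first_over = flags.index(True)   # same result as flags.index(1)

print(isinstance(True, int))     # True
print(count_over, first_over)    # 2 1
```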
Try this:
def check_list(testlist):
    overages = [x for x in testlist if x > thr1]
    if len(overages) >= thr2:
        return testlist.index(overages[0])
    # This return is not needed. Removing it will not change
    # the outcome of the function.
    return None
This uses the fact that you can use if statements in list comprehensions to ignore non-important values.
As mentioned by Chris_Rands in the comments, the return None is unnecessary. Removing it will not change the result of the function.
If you are looking for a one-liner (or almost)
a = list(filter(lambda z: z is not None, map(lambda pair: pair[0] if pair[1] > thr1 else None, enumerate(list1))))
print(a[0] if len(a) >= thr2 else False)
A naive and straightforward approach would be to iterate over the list counting the number of items greater than the first threshold and returning the index of the first match if the count exceeds the second threshold:
def answer(l, thr1, thr2):
    count = 0
    first_index = None
    for index, item in enumerate(l):
        if item > thr1:
            count += 1
            # Compare with None explicitly: index 0 is falsy but valid.
            if first_index is None:
                first_index = index
            if count >= thr2:  # TODO: check if ">" is required instead
                return first_index
thr1 = 4
thr2 = 5
list1 = [1, 1, 1, 5, 1, 6, 7, 3, 6, 8]
list2 = [1, 1, 6, 1, 1, 1, 2, 1, 1, 1]
print(answer(list1, thr1, thr2)) # prints 3
print(answer(list2, thr1, thr2)) # prints None
This is probably not very pythonic, but the solution has a couple of advantages: we keep only the index of the first match, and we exit the loop early as soon as we hit the second threshold.
In other words, we have O(k) in the best case and O(n) in the worst case, where k is the number of items scanned before the second threshold is reached and n is the total number of items in the input list.
I don't know if I'd call it clean or pythonic, but this should work
def get_index(list1, thr1, thr2):
    cnt = 0
    first_element = None
    for i in list1:
        if i > thr1:
            cnt += 1
            if first_element is None:
                first_element = i
    # ">=" so that exactly thr2 exceedances count, as in the question's first example
    if cnt >= thr2:
        return list1.index(first_element)
    else:
        return "criteria not met"
thr1 = 4
thr2 = 5
list1 = [1, 1, 1, 5, 1, 6, 7, 3, 6, 8]
list2 = [1, 1, 6, 1, 1, 1, 2, 1, 1, 1]
def func(lst):
    res = [i for i, j in enumerate(lst) if j > thr1]
    return len(res) >= thr2 and res[0]
Output:
func(list1)
3
func(list2)
False

Iterating over parts of the Stern-Brocot tree in Python

My goal is to iterate over the pairs [a,b] a coprime to b and a+b<=n. For example, if n=8, I want to iterate over [1, 2], [2, 3], [3, 4], [3, 5], [1, 3], [2, 5], [1, 4], [1, 5], [1, 6], [1, 7].
My first thought was a recursive function using the Stern-Brocot tree:
def Stern_Brocot(n,a=0,b=1,c=1,d=1):
    if(a+b+c+d>n):
        return 0
    x=Stern_Brocot(n,a+c,b+d,c,d)
    y=Stern_Brocot(n,a,b,a+c,b+d)
    if(x==0):
        if(y==0):
            return [a+c,b+d]
        else:
            return [a+c]+[b+d]+y
    else:
        if(y==0):
            return [a+c]+[b+d]+x
        else:
            return [a+c]+[b+d]+x+y
As expected,
>>> Stern_Brocot(8)
[1, 2, 2, 3, 3, 4, 3, 5, 1, 3, 2, 5, 1, 4, 1, 5, 1, 6, 1, 7]
And for n<=995, it works well. But suddenly at n>=996, it gives this error:
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
a=Stern_Brocot(996)
File "C:\Users\Pim\Documents\C Programmeren en Numerieke Wisk\Python\PE\PE127.py", line 35, in Stern_Brocot
y=Stern_Brocot(n,a,b,a+c,b+d)
...
File "C:\Users\Pim\Documents\C Programmeren en Numerieke Wisk\Python\PE\PE127.py", line 35, in Stern_Brocot
y=Stern_Brocot(n,a,b,a+c,b+d)
RuntimeError: maximum recursion depth exceeded in comparison
And since I want n to equal 120000, this approach won't work.
So my question is: what would be a good approach to iterate over parts of the Stern_Brocot tree? (if there's another way to iterate over coprime integers, that'd be good as well).
Here's a non-recursive implementation:
def Stern_Brocot(n):
    states = [(0, 1, 1, 1)]
    result = []
    while len(states) != 0:
        a, b, c, d = states.pop()
        if a + b + c + d <= n:
            result.append((a+c, b+d))
            states.append((a, b, a+c, b+d))
            states.append((a+c, b+d, c, d))
    return result
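Since the question is about iterating, the same explicit stack can also be wrapped in a generator, so the pairs are produced one at a time instead of being collected in one large list. This is a sketch of that variation, not part of the answer above; the name coprime_pairs is made up.

```python
from math import gcd


def coprime_pairs(n):
    """Yield (a, b) with 0 < a < b, gcd(a, b) == 1 and a + b <= n."""
    stack = [(0, 1, 1, 1)]
    while stack:
        a, b, c, d = stack.pop()
        if a + b + c + d <= n:
            yield (a + c, b + d)                 # the mediant of a/b and c/d
            stack.append((a, b, a + c, b + d))   # left subtree, visited later
            stack.append((a + c, b + d, c, d))   # right subtree, visited next


pairs = list(coprime_pairs(8))
print(pairs)
# [(1, 2), (2, 3), (3, 4), (3, 5), (1, 3), (2, 5), (1, 4), (1, 5), (1, 6), (1, 7)]
```

For large n the caller can consume the generator directly in a for loop, so memory use is bounded by the stack rather than by the number of pairs.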
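Before defining Stern_Brocot, import sys and call sys.setrecursionlimit(120000). This raises the interpreter's recursion limit to 120000. Note that a limit this high can still crash the interpreter if the underlying C stack runs out.
So, instead, you can do this: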
import sys
sys.setrecursionlimit(120000)
def Stern_Brocot(n,a=0,b=1,c=1,d=1):
    if(a+b+c+d>n):
        return 0
    x=Stern_Brocot(n,a+c,b+d,c,d)
    y=Stern_Brocot(n,a,b,a+c,b+d)
    if(x==0):
        if(y==0):
            return [a+c,b+d]
        else:
            return [a+c]+[b+d]+y
    else:
        if(y==0):
            return [a+c]+[b+d]+x
        else:
            return [a+c]+[b+d]+x+y
