pythonic format for indices

I am after a string format to efficiently represent a set of indices.
For example "1-3,6,8-10,16" would produce [1,2,3,6,8,9,10,16]
Ideally I would also be able to represent infinite sequences.
Is there an existing standard way of doing this? Or a good library? Or can you propose your own format?
thanks!
Edit: Wow! - thanks for all the well considered responses. I agree I should use ':' instead. Any ideas about infinite lists? I was thinking of using "1.." to represent all positive numbers.
The use case is for a shopping cart. For some products I need to restrict product sales to multiples of X, for others any positive number. So I am after a string format to represent this in the database.

You don't need a string for that. This is as simple as it can get:
class sequence(object):
    def __getitem__(self, item):
        for a in item:
            # the built-in slice type works on Python 2 and 3 (no types.SliceType needed)
            if isinstance(a, slice):
                i = a.start
                step = a.step if a.step else 1
                while True:
                    # the stop value is inclusive; a missing stop means an endless range
                    if a.stop and i > a.stop:
                        break
                    yield i
                    i += step
            else:
                yield a
print(list(sequence()[1:3,6,8:10,16]))
Output:
[1, 2, 3, 6, 8, 9, 10, 16]
I'm using Python slice type power to express the sequence ranges. I'm also using generators to be memory efficient.
Please note that the slice stop is treated as inclusive here (the loop only ends once the value exceeds the stop); otherwise the ranges would be different, because the stop of an ordinary slice is not included.
It supports steps:
>>> list(sequence()[1:3,6,8:20:2])
[1, 2, 3, 6, 8, 10, 12, 14, 16, 18, 20]
And infinite sequences:
sequence()[1:3,6,8:]
1, 2, 3, 6, 8, 9, 10, ...
If you have to give it a string, you can combine @ilya n.'s parser with this solution. I'll extend @ilya n.'s parser to support single indexes as well as ranges:
def parser(input):
    ranges = [a.split('-') for a in input.split(',')]
    return [slice(*map(int, a)) if len(a) > 1 else int(a[0]) for a in ranges]
Now you can use it like this:
>>> print(list(sequence()[parser('1-3,6,8-10,16')]))
[1, 2, 3, 6, 8, 9, 10, 16]
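For readers on Python 3, here is a minimal sketch of the same idea as a plain generator function. The name expand is made up for this illustration, and the sketch assumes positive steps and keeps the inclusive-stop convention used above. A slice without a stop yields an endless range, so it is consumed with itertools.islice.

```python
from itertools import count, islice

def expand(items):
    """Yield integers from a mix of ints and slices; slice stops are inclusive."""
    for item in items:
        if isinstance(item, slice):
            start = item.start if item.start is not None else 0
            step = item.step if item.step is not None else 1  # assumed positive
            for i in count(start, step):
                # A missing stop means the range never ends.
                if item.stop is not None and i > item.stop:
                    break
                yield i
        else:
            yield item

print(list(expand([slice(1, 3), 6, slice(8, 10), 16])))
# [1, 2, 3, 6, 8, 9, 10, 16]

# Take only the first 7 values of an endless sequence.
print(list(islice(expand([slice(1, 3), 6, slice(8, None)]), 7)))
# [1, 2, 3, 6, 8, 9, 10]
```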

If you're into something Pythonic, I think 1:3,6,8:10,16 would be a better choice, as x:y is a standard notation for index range and the syntax allows you to use this notation on objects. Note that the call
z[1:3,6,8:10,16]
gets translated into
z.__getitem__((slice(1, 3, None), 6, slice(8, 10, None), 16))
Even though this is a TypeError if z is a built-in container, you're free to create the class that will return something reasonable, e.g. as NumPy's arrays.
You might also say that by convention 5: and :5 represent infinite index ranges (this is a bit stretched as Python has no built-in types with negative or infinitely large positive indexes).
And here's the parser (a beautiful one-liner with one glitch: a lone index such as 16 becomes slice(16), that is slice(None, 16, None), instead of a single index):
def parse(s):
    return [slice(*map(int, x.split(':'))) for x in s.split(',')]
There's one pitfall, however: 8:10 by definition includes only indices 8 and 9, because the upper bound is excluded. If that's unacceptable for your purposes, you certainly need a different format, and 1-3,6,8-10,16 looks good to me. The parser would then be
def myslice(start, stop=None, step=None):
    return slice(start, (stop if stop is not None else start) + 1, step)
def parse(s):
    return [myslice(*map(int, x.split('-'))) for x in s.split(',')]
Update: here's the full parser for a combined format:
from sys import maxsize as INF
def indices(s: 'string with indices list') -> 'indices generator':
    for x in s.split(','):
        x = x.strip()  # allow spaces after the commas
        splitter = ':' if (':' in x) or (x[0] == '-') else '-'
        ix = x.split(splitter)
        start = int(ix[0]) if ix[0] != '' else -INF
        if len(ix) == 1:
            stop = start + 1
        else:
            stop = int(ix[1]) if ix[1] != '' else INF
        step = int(ix[2]) if len(ix) > 2 else 1
        # only an explicit 'a-b' range has an inclusive stop
        inclusive = splitter == '-' and len(ix) > 1
        for y in range(start, stop + inclusive, step):
            yield y
This handles negative numbers as well, so
print(list(indices('-5, 1:3, 6, 8:15:2, 20-25, 18')))
prints
[-5, 1, 2, 6, 8, 10, 12, 14, 20, 21, 22, 23, 24, 25, 18]
Yet another alternative is to use ... (which Python recognizes as the built-in constant Ellipsis so you can call z[...] if you want) but I think 1,...,3,6, 8,...,10,16 is less readable.
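For the shopping-cart case in the question (quantities restricted to multiples of X, or any positive number), one option is to skip expanding the indices altogether and test membership in a range object. The sketch below is only an illustration: allowed_quantities is a hypothetical helper, and sys.maxsize stands in for "infinity", so the range is effectively unbounded rather than truly infinite. In Python 3, an integer membership test on a range is computed arithmetically, without iterating.

```python
import sys

def allowed_quantities(multiple_of=1):
    """Hypothetical helper: positive quantities that are multiples of `multiple_of`."""
    return range(multiple_of, sys.maxsize, multiple_of)

print(6 in allowed_quantities(3))   # True
print(7 in allowed_quantities(3))   # False
print(7 in allowed_quantities())    # True  (any positive number)
print(0 in allowed_quantities())    # False
```

With this approach the database only needs to store the step X for each product.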

This is probably about as lazily as it can be done, meaning it will be okay for even very large lists:
def makerange(s):
    for nums in s.split(","): # whole list comma-delimited
        range_ = nums.split("-") # number might have a dash - if not, no big deal
        start = int(range_[0])
        for i in xrange(start, start + 1 if len(range_) == 1 else int(range_[1]) + 1):
            yield i
s = "1-3,6,8-10,16"
print list(makerange(s))
output:
[1, 2, 3, 6, 8, 9, 10, 16]

import sys
class Sequencer(object):
    def __getitem__(self, items):
        if not isinstance(items, (tuple, list)):
            items = [items]
        for item in items:
            if isinstance(item, slice):
                for i in xrange(*item.indices(sys.maxint)):
                    yield i
            else:
                yield item
>>> s = Sequencer()
>>> print list(s[1:3,6,8:10,16])
[1, 2, 6, 8, 9, 16]
Note that I am using the xrange builtin to generate the sequence. That seems awkward at first because it doesn't include the upper number of sequences by default, however it proves to be very convenient. You can do things like:
>>> print list(s[1:10:3,5,5,16,13:5:-1])
[1, 4, 7, 5, 5, 16, 13, 12, 11, 10, 9, 8, 7, 6]
Which means you can use the step part of xrange.
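The code above targets Python 2 (xrange, sys.maxint, the print statement). A rough Python 3 equivalent, assuming range and sys.maxsize as the replacements, might look like this:

```python
import sys

class Sequencer:
    def __getitem__(self, items):
        if not isinstance(items, (tuple, list)):
            items = [items]
        for item in items:
            if isinstance(item, slice):
                # slice.indices() fills in missing start/stop/step values;
                # range is lazy in Python 3, so open-ended slices are fine.
                yield from range(*item.indices(sys.maxsize))
            else:
                yield item

s = Sequencer()
print(list(s[1:3, 6, 8:10, 16]))
# [1, 2, 6, 8, 9, 16]
print(list(s[1:10:3, 5, 5, 16, 13:5:-1]))
# [1, 4, 7, 5, 5, 16, 13, 12, 11, 10, 9, 8, 7, 6]
```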

This looked like a fun puzzle to go with my coffee this morning. If you settle on your given syntax (which looks okay to me, with some notes at the end), here is a pyparsing converter that will take your input string and return a list of integers:
from pyparsing import *
integer = Word(nums).setParseAction(lambda t : int(t[0]))
intrange = integer("start") + '-' + integer("end")
def validateRange(tokens):
    # the results names are "start" and "end", as defined on intrange above
    if tokens.start > tokens.end:
        raise Exception("invalid range, start must be <= end")
intrange.setParseAction(validateRange)
intrange.addParseAction(lambda t: list(range(t.start, t.end+1)))
indices = delimitedList(intrange | integer)
def mergeRanges(tokens):
    ret = set()
    for item in tokens:
        if isinstance(item,int):
            ret.add(item)
        else:
            ret.update(item)  # sets do not support +=
    return sorted(ret)
indices.setParseAction(mergeRanges)
test = "1-3,6,8-10,16"
print(indices.parseString(test))
This also takes care of any overlapping or duplicate entries, such as "3-8,4,6,3,4", and returns a list of just the unique integers.
The parser takes care of validating that ranges like "10-3" are not allowed. If you really wanted to allow this, and have something like "1,5-3,7" return 1,5,4,3,7, then you could tweak the intrange and mergeRanges parse actions to get this simpler result (and discard the validateRange parse action altogether).
You are very likely to get whitespace in your expressions; I assume that this is not significant. "1, 2, 3-6" would be handled the same as "1,2,3-6". Pyparsing does this by default, so you don't see any special whitespace handling in the code above (but it's there...)
This parser does not handle negative indices, but if that were needed too, just change the definition of integer to:
integer = Combine(Optional('-') + Word(nums)).setParseAction(lambda t : int(t[0]))
Your example didn't list any negatives, so I left it out for now.
Python uses ':' for a ranging delimiter, so your original string could have looked like "1:3,6,8:10,16", and Pascal used '..' for array ranges, giving "1..3,6,8..10,16" - meh, dashes are just as good as far as I'm concerned.
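If pulling in pyparsing is not an option, the three delimiters mentioned above can be accepted with a single regular expression. This is only a sketch under stated assumptions: parse_indices and ITEM are names invented here, every range is treated as inclusive whichever delimiter is used (which differs from Python's own meaning of ':'), and negative numbers are not handled.

```python
import re

# One item: an integer, optionally followed by '-', ':' or '..' and a second integer.
ITEM = re.compile(r"\s*(\d+)\s*(?:(?:-|:|\.\.)\s*(\d+))?\s*$")

def parse_indices(text):
    """Return the sorted unique integers named by a string like '1-3,6,8..10'."""
    result = set()
    for part in text.split(","):
        match = ITEM.match(part)
        if match is None:
            raise ValueError("invalid item: %r" % part)
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) is not None else start
        if start > end:
            raise ValueError("invalid range, start must be <= end: %r" % part)
        result.update(range(start, end + 1))
    return sorted(result)

print(parse_indices("1-3,6,8-10,16"))      # [1, 2, 3, 6, 8, 9, 10, 16]
print(parse_indices("1:3, 6, 8..10, 16"))  # [1, 2, 3, 6, 8, 9, 10, 16]
print(parse_indices("3-8,4,6,3,4"))        # [3, 4, 5, 6, 7, 8]
```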

Related

How to append to second value of dictionary value after underscore in +1 manner

Imagine I have a dictionary as such:
barcodedict={"12_20":[10,15,20], "12_21":[5, "5_1","5_2",6]}
Then I have a number that corresponds to a date, lets say 12_21 and we append it to the values of this date if it is not there as such:
if 8 not in barcodedict["12_21"]:
    barcodedict["12_21"].append(8)
{'12_20': [10, 15, 20], '12_21': [5, "5_1", "5_2", 6, 8]}
However, if this number is already present in the value list, I want to add it to the value list with an extra integer that states that its a new occurrence as such:
if 5 not in barcodedict["12_21"]:
    barcodedict["12_21"].append(5)
else: #which is now the case
    barcodedict["12_21"].append(5_(2+1))
Desired output:
{"12_20":[10,15,20], "12_21":[5, "5_1","5_2","5_3",6, 8]}
As can be seen from the second example, I am not allowed to put underscore in list numbers and they are removed (5_1 becomes 51). And how can I achieve adding a new listing with +1 to the last number? I tried iterating over them and then splitting them but this seems unpythonic and didn't work because the underscore is ignored.
Edit 7/19/2022 10:46AM,
I found a bit of a hackish way around but it seems to hold for now:
placeholder=[]
for i in barcodedict["12_21"]:
    if "5" in str(i):
        try:
            placeholder.append(str(i).split("_")[1])
        except IndexError:
            print("this is for the first 5 occurrence, that has no _notation")
print(placeholder)
if len(placeholder) == 0 :
    placeholder=[0]
occurence=max(list(map(int, placeholder)))+1
barcodedict["12_21"].append("5_"+str(occurence))  # str() is needed: a str and an int cannot be concatenated
prints {'12_20': [10, 15, 20], '12_21': [5, '5_1', '5_2', 6, '5_3']}

With the requested number/string mixture it can be done with:
if 5 not in barcodedict["12_21"]:
    barcodedict["12_21"].append(5)
else: #which is now the case
    i = 1
    while True:
        if f"5_{i}" not in barcodedict["12_21"]:
            barcodedict["12_21"].append(f"5_{i}")
            break
        i += 1

Underscores used like that do not show up when printed, because in a numeric literal they are only a visual separator meant to make big numbers easier to read; the interpreter discards them. You should use strings if the way the values are displayed matters, or plain numbers if you actually want to compute with them and only want a convenient way to write them.
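A quick illustration of the point (Python 3.6 or later, where underscores in numeric literals are allowed); the variable names are just for the example:

```python
# In a numeric literal the underscore is only a digit separator.
print(5_1)          # 51
print(5_1 == 51)    # True
print(1_000_000)    # 1000000

# To keep the underscore, store a string instead of a number.
label = f"{5}_{1}"
print(label)        # 5_1

# The parts can be recovered with split() when the number is needed again.
base, occurrence = label.split("_")
print(int(base), int(occurrence))   # 5 1
```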

Another solution:
def fancy_append(dct, key, val):
    last_num = max(
        (
            int(s[1])
            for v in dct[key]
            if isinstance(v, str) and (s := v.split("_"))[0] == str(val)
        ),
        default=0,
    )
    dct[key].append(f"{val}_{last_num+1}" if last_num > 0 else val)
barcodedict = {"12_20": [10, 15, 20], "12_21": [5, "5_1", "5_2", 6]}
fancy_append(barcodedict, "12_21", 5)
print(barcodedict)
Prints:
{'12_20': [10, 15, 20], '12_21': [5, '5_1', '5_2', 6, '5_3']}

How to improve time complexity of remove all multiplicands from array or list?

I am trying to find the elements of an integer array (or list) that are unique and are not divisible by any other element of the same array or list.
You can answer in any language like python, java, c, c++ etc.
I have tried this code in Python3 and it works perfectly but I am looking for better and optimum solution in terms of time complexity.
assuming array or list A is already sorted and having unique elements
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
i, j = 0, 1  # i: current divisor, j: candidate multiple
while i<len(A)-1:
    while j<len(A):
        if A[j]%A[i]==0:
            A.pop(j)
        else:
            j+=1
    i+=1
    j=i+1
For the given array A=[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] answer would be like ans=[2,3,5,7,11,13]
another example,A=[4,5,15,16,17,23,39] then ans would be like, ans=[4,5,17,23,39]
ans is having unique numbers
any element i from array only exists if (i%j)!=0, where i!=j

I think it's more natural to do it in reverse, by building a new list containing the answer instead of removing elements from the original list. If I'm thinking correctly, both approaches do the same number of mod operations, but you avoid the issue of removing an element from a list.
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
ans = []
for x in A:
    for y in ans:
        if x % y == 0:
            break
    else: ans.append(x)
Edit: Promoting the completion else.
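For anyone unfamiliar with the loop's else clause: it runs only when the loop finishes without hitting break. Here is the same approach wrapped in a function (the name remove_multiples is made up for this sketch), which, like the question, assumes the input is sorted in ascending order and has no duplicates:

```python
def remove_multiples(values):
    """Keep each value that is not divisible by an earlier kept value.

    Assumes `values` is sorted ascending and contains no duplicates.
    """
    kept = []
    for x in values:
        for y in kept:
            if x % y == 0:
                break           # x is a multiple of y: drop it
        else:
            kept.append(x)      # inner loop ended without break: keep x
    return kept

print(remove_multiples(list(range(2, 17))))            # [2, 3, 5, 7, 11, 13]
print(remove_multiples([4, 5, 15, 16, 17, 23, 39]))    # [4, 5, 17, 23, 39]
```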

This algorithm will perform much faster:
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
if (A[-1]-A[0])/A[0] > len(A)*2:
    # widely spread values: build the result by checking divisibility
    result = list()
    for v in A:
        for f in result:
            d,m = divmod(v,f)
            if m == 0: v=0;break
            if d<2: break  # f > v/2, so no larger retained value can divide v
        if v: result.append(v)
else:
    # dense values: remove the multiples of each retained value
    retain = set(A)
    minMult = 1
    maxVal = A[-1]
    for v in A:
        if v not in retain : continue
        minMult = v*2
        if minMult > maxVal: break
        if v*len(A)<maxVal:
            retain.difference_update([m for m in retain if m >= minMult and m%v==0])
        else:
            retain.difference_update(range(minMult,maxVal+1,v))  # +1 so that maxVal itself is covered
        if maxVal%v == 0:
            maxVal = max(retain)
    result = sorted(retain)  # sets are unordered, so sort for a stable result
print(result) # [2, 3, 5, 7, 11, 13]
In the spirit of the sieve of Eratosthenes, each number that is retained removes its multiples from the remaining eligible numbers. Depending on the magnitude of the highest value, it is sometimes more efficient to exclude multiples than to check for divisibility. The divisibility check takes several times longer for an equivalent number of factors to check.
At some point, when the data is widely spread out, assembling the result instead of removing multiples becomes faster (this last addition was inspired by Imperishable Night's post).
TEST RESULTS
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] (100000 repetitions)
Original: 0.55 sec
New: 0.29 sec
A = list(range(2,5000))+[9697] (100 repetitions)
Original: 3.77 sec
New: 0.12 sec
A = list(range(1001,2000))+list(range(4000,6000))+[9697**2] (10 repetitions)
Original: 3.54 sec
New: 0.02 sec
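Stripped of the tuning heuristics, the multiple-removal idea can be sketched as follows. This is a simplified illustration, not the benchmarked code above: remove_multiples_sieve is a made-up name, and the input is assumed to be sorted, unique, positive integers.

```python
def remove_multiples_sieve(values):
    """Sieve-style sketch; assumes sorted, unique, positive integers."""
    retain = set(values)
    max_val = values[-1] if values else 0
    for v in values:
        if v not in retain:
            continue    # already removed as a multiple of a smaller value
        # Discard 2v, 3v, ... up to and including the largest value.
        retain.difference_update(range(2 * v, max_val + 1, v))
    return sorted(retain)

print(remove_multiples_sieve(list(range(2, 17))))            # [2, 3, 5, 7, 11, 13]
print(remove_multiples_sieve([4, 5, 15, 16, 17, 23, 39]))    # [4, 5, 17, 23, 39]
```

Each retained value costs roughly max_val / v set operations, which is why this variant suits dense inputs and the divisibility check suits widely spread ones.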

I know that this is totally insane but i want to know what you think about this:
A = [4,5,15,16,17,23,39]
prova=[[x for x in A if x!=y and y%x==0] for y in A]
print([A[idx] for idx,x in enumerate(prova) if len(prova[idx])==0])
And i think it's still O(n^2)

If you care about speed more than algorithmic efficiency, numpy would be the package to use here in python:
import numpy as np
# Note: doesn't have to be sorted
a = [2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 29, 29]
a = np.unique(a)
result = a[np.all((a % a[:, None] + np.diag(a)), axis=0)]
# array([2, 3, 5, 7, 11, 13, 29])
This divides all elements by all other elements and stores the remainder in a matrix, checks which columns contain only non-0 values (other than the diagonal), and selects all elements corresponding to those columns.

This is O(n*M), where M is the largest integer in your list. The integers are all assumed to be non-negative. This also assumes your input list is sorted (I came to that assumption since all the lists you provided are sorted).
a = [4, 7, 7, 8]
# a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
# a = [4, 5, 15, 16, 17, 23, 39]
M = max(a)
used = set()
final_list = []
for e in a:
    if e in used:
        continue
    else:
        used.add(e)
        for i in range(e, M + 1):
            if not (i % e):
                used.add(i)
        final_list.append(e)
print(final_list)
Maybe this can be optimized even further...
If the list is not sorted, it must be sorted first for the above method to work. The time complexity is then O(n log n + Mn), which reduces to O(n log n) only when M is small compared with log n.

Can you for loop completely through a range, but starting from the nth element?

I would like to know if there exists a base solution to do something like this:
for n in range(length=8, start_position= 3, direction= forward)
The problem I'm encountering is I would like the loop to continue past the final index, and pick up again at idx =0, then idx=1, etc. and stop at idx= 3, the start_position.
To give context, I seek all possible complete solutions to the n-queen problem.

Based on your latest edit, you need a "normal" range and the modulo operator:
for i in range(START, START + LEN):
    do_something_with(i % LEN)

from itertools import chain
for n in chain(range(3,8), range(3)):
    ...
The chain() returns an iterator with 3, 4, ..., 7, 0, 1, 2

Another option for solving this is to use modular arithmetic. You could do something like this, for example:
for i in range(8):
    idx = (i + 3) % 8
    # use idx
This easily can be generalized to work with different lengths and offsets.
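One possible generalization, as a sketch: wrapped_indices and its parameters are names invented for this example, and direction is reduced to a boolean.

```python
def wrapped_indices(length, start=0, forward=True):
    """Yield every index in range(length) once, starting at `start` and wrapping."""
    step = 1 if forward else -1
    for offset in range(length):
        # Python's % always returns a non-negative result for a positive modulus,
        # so stepping backwards past 0 wraps to length - 1.
        yield (start + step * offset) % length

print(list(wrapped_indices(8, start=3)))                 # [3, 4, 5, 6, 7, 0, 1, 2]
print(list(wrapped_indices(8, start=3, forward=False)))  # [3, 2, 1, 0, 7, 6, 5, 4]
```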

def loop_around_range(length, start_position, direction='forward'):
    looped_range = [k % length for k in range(start_position, start_position+length)]
    if direction == 'forward':
        return looped_range
    else:
        return looped_range[::-1]

You could implement this for an arbitrary iterable by using itertools.cycle.
from itertools import cycle
def circular_iterator(iterable, skip=0, length=None, reverse=False):
    """Produces a full cycle of `iterable`, skipping the first `skip` elements
    then tacking them on to the end.
    If `iterable` does not implement `__len__`, you must provide `length`.
    """
    # take the length before reversing: reversed() objects do not support len()
    if length:
        total_length = length
    else:
        total_length = len(iterable)
    if reverse:
        iterable = reversed(iterable)
    cyc_iter = cycle(iterable)
    for _ in range(skip):
        next(cyc_iter, None)
    for _ in range(total_length):
        yield next(cyc_iter, None)
>>> lst = [x for x in range(1, 9)]
# [1, 2, 3, 4, 5, 6, 7, 8]
>>> list(circular_iterator(lst, skip=3))
[4, 5, 6, 7, 8, 1, 2, 3]
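For sequences that support len(), a shorter variant is possible by combining cycle with itertools.islice. The helper name rotate is an assumption made for this sketch:

```python
from itertools import cycle, islice

def rotate(seq, skip):
    """Yield the items of a sequence once, starting `skip` items in."""
    # islice drops the first `skip` items of the endless cycle,
    # then stops after one full pass over the sequence.
    return islice(cycle(seq), skip, skip + len(seq))

print(list(rotate([1, 2, 3, 4, 5, 6, 7, 8], 3)))
# [4, 5, 6, 7, 8, 1, 2, 3]
```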

Compare nums need optimisation (codingame.com)

www.codingame.com
Task
Write a program which, using a given number of strengths,
identifies the two closest strengths and shows their difference with an integer
Info
n = Number of horses
pi = strength of each horse
d = difference
1 < n < 100000
0 < pi ≤ 10000000
My code currently
def get_dif(a, b):
    return abs(a - b)
horse_str = [10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]
n = len(horse_str)
d = 10000001
for x in range(len(horse_str)):
    for y in range(x, len(horse_str) - 1):
        d = min([get_dif(horse_str[x], horse_str[y + 1]), d])
print(d)
Test cases
[3,5,8, 9] outputs: 1
[10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7] outputs: 1
Problem
They both work, but the next test gives me a very long list of horse strengths and I get: "Process has timed out. This may mean that your solution is not optimized enough to handle some cases."
How can I optimise it? Thank you!
EDIT ONE
Default code given
import sys
import math
# Auto-generated code below aims at helping you parse
# the standard input according to the problem statement.
n = int(input())
for i in range(n):
    pi = int(input())
# Write an action using print
# To debug: print("Debug messages...", file=sys.stderr)
print("answer")

Since you can use the built-in sort (which is far faster than a hand-written bubble sort or double loop, both of which have O(n**2) complexity and time out on a very big list), let me propose something:
1. sort the list
2. compute the minimum of the absolute differences between adjacent values, passing a generator expression to the min function
The minimum difference must occur between two adjacent values of the sorted list. Since the list is sorted using a fast algorithm, the heavy lifting is done for you.
Like this:
horse_str = [10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]
sh = sorted(horse_str)
print(min(abs(sh[i]-sh[i+1]) for i in range(len(sh)-1)))
I also get 1 as a result (I hope I didn't miss anything)
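The same approach as a small function, checked against both test cases from the question. The name closest_difference is made up for this sketch; it assumes at least two values, which the constraint 1 < n guarantees. Sorting costs O(n log n) and the scan over neighbours is O(n).

```python
def closest_difference(strengths):
    """Smallest difference between any two values; needs at least two values."""
    ordered = sorted(strengths)
    # After sorting, the closest pair is always a pair of neighbours,
    # and b >= a, so abs() is not needed.
    return min(b - a for a, b in zip(ordered, ordered[1:]))

print(closest_difference([3, 5, 8, 9]))                             # 1
print(closest_difference([10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]))  # 1
```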

Python split list if sequence of numbers is found

I've been trying to find a relevant question, though I can't seem to search for the right words and all I'm finding is how to check if a list contains an intersection.
Basically, I need to split a list once a certain sequence of numbers is found, similar to doing str.split(sequence)[0], but with lists instead. I have working code, though it doesn't seem very efficient (also no idea if raising an error was the right way to go about it), and I'm sure there must be a better way to do it.
For the record, long_list could potentially have a length of a few million values, which is why I think iterating through them all might not be the best idea.
long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
end_marker = [6,43,23,95]
end_marker_len = len(end_marker)
class SuccessfulTruncate(Exception):
    pass
try:
    counter = 0
    for i in range(len(long_list)):
        if long_list[i] == end_marker[counter]:
            counter += 1
        else:
            counter = 0
        if counter == end_marker_len:
            raise SuccessfulTruncate()
except SuccessfulTruncate:
    # i is the index of the marker's last element
    long_list = long_list[:i + 1 - end_marker_len]
else:
    raise IndexError('sequence not found')
>>> long_list
[2,6,4,2,7,98,32,5,15,4,2]
Ok, timing a few answers with a big list of 1 million values (the marker is very near the end):
Tim: 3.55 seconds
Mine: 2.7 seconds
Dan: 0.55 seconds
Andrey: 0.28 seconds
Kasramvd: still executing :P

I have working code, though it doesn't seem very efficient (also no idea if raising an error was the right way to go about it), and I'm sure there must be a better way to do it.
I commented on the exception raising in my comment
Instead of raising an exception and catching it in the same try/except, you can just omit the try/except and do if counter == end_marker_len: long_list = long_list[:i + 1 - end_marker_len]. "Successful" is not a fitting word for an exception name. Exceptions are used to indicate that something failed.
Anyway, here is a shorter way:
>>> long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
>>> end_marker = [6,43,23,95]
>>> index = [i for i in range(len(long_list)) if long_list[i:i+len(end_marker)] == end_marker][0]
>>> long_list[:index]
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
List comprehension inspired by this post
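The list comprehension above checks every position even after a match has been found. A variant that stops at the first match could look like the following sketch; truncate_at is a name invented here, and raising ValueError when the marker is missing is an assumption about the desired behaviour.

```python
def truncate_at(seq, marker):
    """Return the part of `seq` before the first occurrence of `marker`.

    Raises ValueError when the marker does not occur.
    """
    n = len(marker)
    for i in range(len(seq) - n + 1):
        if seq[i:i + n] == marker:
            return seq[:i]      # stop at the first match
    raise ValueError("sequence not found")

long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
end_marker = [6, 43, 23, 95]
print(truncate_at(long_list, end_marker))
# [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
```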

As a more Pythonic alternative to repeated slicing, you can use itertools.islice within a generator expression:
>>> from itertools import islice
>>> M,N=len(long_list),len(end_marker)
>>> long_list[:next((i for i in range(0,M) if list(islice(long_list,i,i+N))==end_marker),0)]
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
Note that because the default value passed to next is 0, an empty list is returned if no match is found; pass M as the default instead to get the whole of long_list back.

My solution uses the list index method:
input = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
brk = [6,43,23,95]
brk_len = len(brk)
brk_idx = 0
brk_offset = brk_idx + brk_len
try:
    while input[brk_idx:brk_offset] != brk:
        brk_idx = input.index(brk[0], brk_idx + 1)
        brk_offset = brk_idx + brk_len
except ValueError:
    print("Not found")
else:
    print(input[:brk_idx])

If the values are of limited range, say fit in bytes (this can also be adapted to larger types), why not then encode the lists so that the string method find could be used:
long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
end_marker = [6,43,23,95]
import struct
long_list_p = struct.pack('B'*len(long_list), *long_list)
end_marker_p = struct.pack('B'*len(end_marker), *end_marker)
print long_list[:long_list_p.find(end_marker_p)]
Prints:
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
I tried using bytes directly, but that didn't work (in Python 2, bytes is an alias for str, so bytes(long_list) is just the list's printed representation):
print long_list[:bytes(long_list).find(bytes(end_marker))]
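In Python 3 the situation is different: bytes accepts an iterable of integers, so the direct approach works as long as every value is in range(256). A minimal sketch under that assumption:

```python
long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
end_marker = [6, 43, 23, 95]

# bytes() raises ValueError if any value is outside range(256).
index = bytes(long_list).find(bytes(end_marker))
if index == -1:
    raise ValueError("sequence not found")
print(long_list[:index])
# [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
```

Checking for -1 matters here, because find returns -1 when there is no match and long_list[:-1] would silently drop only the last element.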
