Does there exist an implementation of ordinal numbers in python?
An example use-case is the following:
We want to maintain a sorted dictionary indexed by numbers.
Over time we may insert an arbitrary number of "standard" elements (idx, e)
The dict should also contain certain "special" elements (idx, w) which should always appear at the end, after all other elements.
A very clean solution (*) would be to simply index the elements with ordinal numbers, so the "standard" elements would use the indices 0,1,2,3,... and the "special" elements could be assigned to ω, ω+1, ω+2, ω+3, ... This approach also generalizes naturally, for example to further groups at ω·2, ω·3, and so on.
Of course there are alternative solutions, for example one could maintain multiple lists. However this has the disadvantage that if we want to iterate over all elements, we need to work through the lists one-by-one. With ordinal number indexing, one would just have to iterate over one list.
(*) At least from my perspective as a mathematician
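As a rough sketch of the idea in the question, and not a reference to any existing library, ordinals below ω² can be emulated with tuples: the ordinal ω·a + b is stored as (a, b), and Python's lexicographic tuple comparison then gives the desired order. The helper name ordinal and the sample values are hypothetical.

```python
def ordinal(finite, omegas=0):
    """Encode the ordinal ω·omegas + finite as a tuple (hypothetical helper)."""
    return (omegas, finite)

entries = {}
entries[ordinal(2)] = "e2"
entries[ordinal(0, omegas=1)] = "w0"  # index ω
entries[ordinal(0)] = "e0"
entries[ordinal(1, omegas=1)] = "w1"  # index ω + 1
entries[ordinal(1)] = "e1"

# Tuples compare lexicographically, so every (0, n) sorts before every (1, n).
ordered = [value for _, value in sorted(entries.items())]
```

A plain dict is not kept sorted, so this sketch sorts on iteration; a sorted container would avoid the repeated sort.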
I have the following problem:
There are n=20 characters in the sequence. For each position there is a predefined list of possible characters, containing between 1 and m entries (where m is usually a single digit).
How can I enumerate all possible permutations efficiently?
Or in essence is there some preexisting library (numpy?) that could do that before I try it myself?
itertools.product seems to offer what I need. I just need to pass it a list of lists:
itertools.product(*positions)
where positions is a list of lists (eg which chars at which positions).
In my case the available options for each position are few, often just 1, which keeps the number of possibilities in check. With more options per position, the number of results grows multiplicatively and could exhaust memory.
I then build the final string:
import itertools

results = []
for s in itertools.product(*positions):
    result = ''.join(s)
    results.append(result)
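Since the number of results is the product of the option counts, it can be computed before generating anything. The following is a minimal sketch, assuming Python 3.8+ for math.prod; the sample positions data and the MAX_RESULTS limit are made up for illustration.

```python
import itertools
import math

positions = [["a", "b"], ["c"], ["d", "e", "f"]]  # hypothetical sample data
MAX_RESULTS = 1_000_000  # hypothetical safety limit

# The number of strings equals the product of the per-position option counts.
total = math.prod(len(options) for options in positions)
if total > MAX_RESULTS:
    raise ValueError(f"{total} combinations would be generated")

results = ["".join(chars) for chars in itertools.product(*positions)]
```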
I will iterate over a list of integers, nums, multiple times, and each time, when an integer has been 'used' for something (doesn't matter what), I want to mark the index as used. So that in future iterations, I do not use this integer again.
Two questions:
My idea is to simply create a separate list marker = [1]*len(nums) ; and each time I use a number in nums, I will subtract 1 from the corresponding index in marker as a way to keep track of the numbers in nums I have used.
My first question is, is there a well known efficient way to do this? As I believe this would make the SPACE COMPLEXITY O(n)
My other idea is to replace each entry in nums, like this. nums = [1,2,3,4] -> nums = [(1,1),(2,1),(3,1),(4,1)]. And each time I use an integer in nums, I would subtract 1 from the second element of the corresponding pair as a way of marking that it has been used. My question is, am I right in understanding that this would optimise the SPACE COMPLEXITY relative to the first idea above? And the SPACE COMPLEXITY here would be O(1)?
For reference, I am solving the following question: https://leetcode.com/contest/weekly-contest-256/problems/minimum-number-of-work-sessions-to-finish-the-tasks/
Where each entry in tasks needs to be used once.
I don't think there is a way to do it in O(1) space. Although, I believe that using a boolean value instead of an integer value or using the concept of sets would be a better solution.
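The set-based variant mentioned above could look roughly like this. It is only a sketch: the function name take_first_unused and the sample list are hypothetical, and the set still needs O(n) space in the worst case.

```python
nums = [3, 1, 3, 5]  # hypothetical sample data
used = set()         # indices of nums that have already been consumed

def take_first_unused(nums, used):
    """Return the first value whose index is not marked used, and mark it."""
    for i, value in enumerate(nums):
        if i not in used:
            used.add(i)
            return value
    return None  # everything has been used

first = take_first_unused(nums, used)
second = take_first_unused(nums, used)
```

Tracking indices rather than values keeps duplicate numbers (the two 3s here) distinguishable.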
No, the space complexity is still O(n). Think about it like this. Let us assume n is the size of the list. In the first method that you mentioned, we are storing n 'stuff' separately. So, the space complexity is O(n). In the second method also, we are storing n 'stuff' separately. It's just that those n 'stuff' are being stored as part of the same array. So, the space complexity still remains the same which is O(n).
Firstly, in both cases the space complexity comes out to be O(n). This is because nums itself uses O(n) space whether or not you use a separate list to track which elements have been used. So the space complexity cannot come down to O(1) either way.
However, here is a suggestion.
If you don't want to use the used element again, then why not just remove it from the list?
Or, in case you don't want to disrupt the indexing, just change the number to -1.
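A minimal sketch of the in-place marking suggested above, assuming -1 can never occur as a real value in nums; the helper name use_at and the sample data are hypothetical.

```python
USED = -1  # sentinel; assumes -1 is never a legitimate entry in nums

nums = [4, 2, 6]  # hypothetical sample data

def use_at(nums, index):
    """Return nums[index] and overwrite it with the sentinel."""
    value = nums[index]
    if value == USED:
        raise ValueError(f"index {index} was already used")
    nums[index] = USED
    return value

taken = use_at(nums, 1)
remaining = [v for v in nums if v != USED]
```

This needs no second list, but it destroys the original values, so it only fits when they are not needed again.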
I have two lists, one of words, and another of character combinations. What would be the fastest way to only return the combinations that don't match anything in the list?
I've tried to make it as streamlined as possible, but it's still very slow when it uses 3 characters for the combinations (goes up to 290 seconds for 4 characters, not even going to try 5)
Here's some example code, currently I'm converting all the words to a list, and then searching the string for each list value.
#Sample of stuff
allCombinations = ["a","aa","ab","ac","ad"]
allWords = ["testing", "accurate" ]
#Do the calculations
allWordsJoined = ",".join( allWords )
invalidCombinations = set( i for i in allCombinations if i not in allWordsJoined )
print(invalidCombinations)
#Result: {'aa', 'ab', 'ad'} (element order may vary)
I'm just curious if there's a better way to do this with sets? With a combination of 3 letters, there are 18278 list items to search for, and for 4 letters, that goes up to 475254, so currently my method isn't really fast enough, especially when the word list string is about 1 million characters.
Set.intersection seems like a very useful method if you need the whole string, so surely there must be something similar to search for a substring.
The first thing that comes to mind is that you can optimize the lookup by checking the current combination against combinations that are already known to be "invalid". I.e. if ab is invalid, then any longer combination starting with ab is invalid too, and there's no point in checking it.
And one more thing: try using
invalidCombinations = set()
for i in allCombinations:
    if i not in allWordsJoined:
        invalidCombinations.add(i)
instead of
invalidCombinations = set(i for i in allCombinations if i not in allWordsJoined)
I'm not sure, but fewer memory allocations might give a small boost when running on real data.
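The pruning idea from this answer can be sketched as follows: a combination can only occur in the text if its prefix occurs, so only extensions of combinations already found need a substring search. The function name absent_combinations and its parameters are hypothetical, and the sketch assumes the combinations are all strings over a fixed alphabet up to a maximum length.

```python
from itertools import product

def absent_combinations(alphabet, text, max_len):
    """Return all strings over alphabet, up to max_len, that are not in text."""
    absent = set()
    present = [""]  # combinations of the previous length that occur in text
    for length in range(1, max_len + 1):
        # Only extensions of present prefixes can occur, so only those are searched.
        found = {
            prefix + ch
            for prefix in present
            for ch in alphabet
            if prefix + ch in text
        }
        # Everything else of this length is absent without any search.
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if candidate not in found:
                absent.add(candidate)
        present = list(found)
    return absent

absent = absent_combinations("abcd", "testing,accurate", 2)
```

All combinations are still enumerated once to build the result, but the expensive substring searches are limited to candidates that can actually occur.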
Seeing if a set contains an item is O(1). You would still have to iterate through your list of combinations to compare them with your original set of words, with some exceptions: if no word contains "a", then no combination containing "a" can match either. You can use some tree-like data structure for this.
You shouldn't convert your wordlist to a string, but rather to a set. You should get O(N), where N is the number of combinations.
Also, I like Python, but it isn't the fastest of languages. If this is the only task you need to do, and it needs to be very fast, and you can't improve the algorithm, you might want to check out other languages. You should be able to very easily prototype something to get an idea of the difference in speed for different languages.
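A set of whole words only answers exact-match queries, so one way to make the set idea work for substring search is to store every substring up to the longest combination length. This is a sketch of that variant, not necessarily what the answer had in mind; the function name substrings_up_to is hypothetical.

```python
def substrings_up_to(words, max_len):
    """Collect every substring of length 1..max_len from each word."""
    seen = set()
    for word in words:
        for start in range(len(word)):
            for length in range(1, max_len + 1):
                if start + length > len(word):
                    break
                seen.add(word[start:start + length])
    return seen

all_combinations = ["a", "aa", "ab", "ac", "ad"]
all_words = ["testing", "accurate"]

present = substrings_up_to(all_words, max_len=2)
# Each membership test is now an O(1) average-case set lookup.
invalid_combinations = {c for c in all_combinations if c not in present}
```

Building the set costs roughly (total characters) × max_len insertions, after which the number of combinations no longer multiplies the search cost.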
I have N strings that I want to divide lexicographically into M even-sized buckets (+/- 1 string). Also, N>>M.
The direct way would be to sort all the strings and split the resulting list into the M buckets.
I would like to instead approximate this by routing each string as it is created to a bucket, before the full list is available.
Is there a fast and pythonic way to assign strings to buckets? I'm essentially looking for a string-equivalent of the integer modulo operator. Perhaps a hash that preserves lexicographic order? Is that even possible?
You can sort by first two chars of a string, or something of this sort.
Let's say that M=100, so you should divide the characters into sqrt(M) regions, and each should point to another sqrt(M) regions, then for each string you get, you can compare the first char to decide which region to direct the string to and again for the second char, something like a tree with buckets as leaves and comparisons as nodes.
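A rough sketch of the prefix-based routing described above: the first characters are read as base-26 digits of a fraction in [0, 1), which is then scaled to a bucket index. It assumes lowercase ASCII strings; the function name bucket_for is hypothetical, and the buckets are only even-sized if the prefixes are roughly uniformly distributed.

```python
def bucket_for(s, num_buckets, prefix_len=2):
    """Map a string to a bucket index that respects lexicographic order."""
    frac = 0.0
    scale = 1.0
    for ch in s[:prefix_len].lower():
        scale /= 26
        # Clamp anything outside a-z so the fraction stays below 1.
        digit = min(max(ord(ch) - ord("a"), 0), 25)
        frac += digit * scale
    return int(frac * num_buckets)

words = sorted(["mango", "apple", "zebra", "cherry", "banana"])
buckets = [bucket_for(w, 100) for w in words]
```

Because the mapping is monotonic in the prefix, a string that sorts later never lands in an earlier bucket. Real text is far from uniform, so the bucket sizes would likely need tuning against a sample.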
A hash by definition doesn't preserve any order.
And I don't think there is any pythonic way to do this.
You could just create dictionaries (which are basically hash tables) and keep adding strings to them round-robin style, but that wouldn't preserve any order.
I have always seen in Python articles/books that Python is simple and has only one way of doing things. I would like someone to explain this concept to me, keeping in mind the example below. If I wanted to get the min and max values of a sequence, I would do the following:
seq=[1,2,3,4,5,6]
min(seq) #1
max(seq) #6
but I can also do this:
seq[:1] #1
seq[-1] #6
Surely these are two ways of doing one simple thing. This confuses me a bit.
It's not so much that it "has one way of doing things" as "There should be one-- and preferably only one --obvious way to do it." (from the Zen of Python).
This doesn't exclude possibility of having more than one way of doing things. We're talking about programming where creativity is one of the most important skills and finding new ways of solving problems is a must.
In your example you're doing two different things:
getting minimum and maximum of the list
getting first and the last element of the list
it happens that in this particular case result is exactly the same.
Those are two different things. max() gives you the largest element of the list (using regular number comparison by default), while [-1] gives you the last element – in your example, this happens to be the same thing. But consider this:
>>> seq = [2, 7, 5, 4]
>>> max(seq)
7
>>> seq[-1]
4
BTW, seq[:1] gives you something different again – namely [1] (or [2] in my example), a one-element list. What you probably meant was seq[0], which is the first element of the list, compared to min(seq), the smallest one.
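The difference between the slice, the index, and min() can be seen side by side in a short sketch with a made-up unsorted list:

```python
seq = [5, 2, 7, 4]  # hypothetical unsorted example

first_slice = seq[:1]  # a one-element list containing the first item
first_item = seq[0]    # the first item itself
smallest = min(seq)    # the smallest item, wherever it sits
last_item = seq[-1]    # the last item
largest = max(seq)     # the largest item, wherever it sits
```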
One obvious way.
>>> import this
In your example, you actually do two different things -- they just happen to give the same result, because your input list is sorted. However, there's always multiple ways of doing things. Python's approach isn't really to avoid or forbid multiple ways of doing the same thing, but have one - and preferably only one - obvious way of doing things.
max(), min() and indexing/slicing all do different things. Your list may not be sorted like it is in your example, and in that case indexing will not give you the max/min. If you want the max/min values, just use the max()/min() functions.
There is always more than one way to solve a problem, but the Python developers try not to add language features that offer redundant functionality, which is very unlike Perl.