I'm planning to read millions of small files from disk. To minimize i/o, I planned to use a dictionary that maps a file path to its content. I only want the dictionary to retain the last n keys inserted into it, though (so the dictionary will act as a cache).
Is there a data structure in Python that already implements this behavior? I wanted to check before reinventing the wheel.
Use collections.deque with maxlen=6 so that it keeps only the last 6 elements, and store the information as (key, value) pairs. Note that a deque has no lookup by key, so finding a given path means scanning the stored pairs.
from collections import deque
d = deque(maxlen=6)
d.extend([(1,1),(2,2),(3,3),(4,4), (5,5), (6,6)])
d
# deque([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)], maxlen=6)
d.extend([(7,7)])
d
# deque([(2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)], maxlen=6)
For my particular problem, since I needed to read files from disk, I think I'll use functools.lru_cache as @PatrickHaugh suggested. Here's one way to use the cache:
from functools import lru_cache

@lru_cache(maxsize=10)
def read_file(file_path):
    print(' * reading', file_path)
    return file_path  # update to return the read file

for i in range(100):
    if i % 2 == 0:
        i = 0  # test that requests for 0 don't require additional i/o
    print(' * value of', i, 'is', read_file(i))
The output shows that requests for 0 do not incur additional i/o, which is perfect.
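To make the placeholder above concrete, here is a minimal sketch of the same idea that actually reads from disk. The file name example.txt and the temporary directory are hypothetical and only there so the snippet is self-contained; cache_info() is the standard way to inspect hits and misses on an lru_cache-wrapped function.

```python
from functools import lru_cache
from pathlib import Path
import tempfile


@lru_cache(maxsize=10)
def read_file(file_path):
    """Return the text of file_path; repeated calls for a cached path skip the disk."""
    return Path(file_path).read_text()


# Demonstration with a throwaway file (the name "example.txt" is hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"
    path.write_text("hello")
    first = read_file(str(path))   # cache miss: reads from disk
    second = read_file(str(path))  # cache hit: served from memory
    stats = read_file.cache_info()

print(stats.hits, stats.misses)
```

One thing to keep in mind with this approach: the cache is keyed on the path only, so if a file changes on disk after it was cached, the old content is returned until the entry is evicted or read_file.cache_clear() is called.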
You can use collections.OrderedDict and its method popitem to ensure you keep only the last n keys added to the dictionary. Specifying last=False with popitem ensures the behaviour is "FIFO", i.e. First-In, First-Out. Here's a trivial example:
from collections import OrderedDict

n = 3
d = OrderedDict()
for i in range(5):
    if len(d) == n:
        removed = d.popitem(last=False)
        print(f'Item removed: {removed}')
    d[i] = i+1
print(d)
Item removed: (0, 1)
Item removed: (1, 2)
OrderedDict([(2, 3), (3, 4), (4, 5)])
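If you want this behaviour behind a dictionary-like interface, one possible sketch is to wrap the OrderedDict in a small class. The class name LastNCache and its maxlen parameter are made up for illustration; only OrderedDict and popitem(last=False) come from the answer above.

```python
from collections import OrderedDict


class LastNCache:
    """Keep only the last `maxlen` keys inserted (FIFO eviction)."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self._data = OrderedDict()

    def __setitem__(self, key, value):
        # Evict the oldest key only when a genuinely new key would overflow.
        if key not in self._data and len(self._data) >= self.maxlen:
            self._data.popitem(last=False)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)

    def keys(self):
        return list(self._data.keys())


cache = LastNCache(maxlen=3)
for i in range(5):
    cache[i] = i + 1
print(cache.keys())  # [2, 3, 4]
```

Updating an existing key keeps its original position here, so eviction order depends purely on first insertion. That is a design choice of this sketch, not something the original answer specifies.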
For example, given points = [(1,2),(3,4),(5,7),(4,7),(6,7)], I need the program to find all combinations such that there's a path between the points (let's say 7 is the destination we want to reach),
so the output would be: [(1,2),(3,4),(5,7)] [(1,2),(3,4),(4,7)] [(1,2),(3,4),(6,7)]
You get the idea?
I'm really stuck on this and I cannot find anything similar on the internet.
Truly, I don't get why you need this, but it is a simple task.
source_list = [(1,2),(3,4),(5,7),(4,7),(6,7)]
final_point = 7  # ??? did not get the logic, btw

def some_magic_shit(lst, fp):
    out = []
    while True:
        way = []
        for k, v in enumerate(lst):
            way.append(v)
            if v[1] >= fp:
                lst.pop(k)  # note: this mutates the list passed in
                out.append(way)
                if k >= len(lst):
                    return out
                else:
                    break

print(some_magic_shit(source_list, final_point))  # [[(1, 2), (3, 4), (5, 7)], [(1, 2), (3, 4), (4, 7)], [(1, 2), (3, 4), (6, 7)]]
The code above should be rewritten with proper logic.
It only compares the second coordinate (the Y axis) of each point against the final point.
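As a rough sketch of what such a rewrite might look like, the version below avoids mutating the input and cannot loop forever. It rests on the same guess as the code above, namely that a point "reaches the destination" when its second coordinate is at least final_point, and that every path consists of the non-destination points seen so far plus one destination point. The function name paths_to_destination is made up for illustration.

```python
def paths_to_destination(points, final_point):
    """Build one path per destination point, without modifying `points`."""
    prefix = []  # points seen so far that do not reach the destination
    paths = []
    for point in points:
        if point[1] >= final_point:
            # A destination point closes a path made of the shared prefix.
            paths.append(prefix + [point])
        else:
            prefix.append(point)
    return paths


source_list = [(1, 2), (3, 4), (5, 7), (4, 7), (6, 7)]
result = paths_to_destination(source_list, 7)
print(result)
# [[(1, 2), (3, 4), (5, 7)], [(1, 2), (3, 4), (4, 7)], [(1, 2), (3, 4), (6, 7)]]
```

If the real requirement is about connectivity between consecutive points, this interpretation would need to change.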
I have created a class with attributes and sorted its instances based on their level of x, from 1-6. I then want to sort the list into pairs, where the object with the highest level of "x" and the object with the lowest level of "x" are paired together, then the second highest with the second lowest, and so on. If I had my way it would look like this, even though objects are not iterable.
for objects in sortedlist:
    i = 0
    row(i) = [[sortedlist[i], list[-(i)-1]]
    i += 1
    if i => len(sortedlist)
        break
Using zip
I think the code you want is:
rows = list(zip(sortedList, reversed(sortedList)))
However, note that this would "duplicate" the elements:
>>> sortedList = [1, 2, 3, 4, 5]
>>> list(zip(sortedList, reversed(sortedList)))
[(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
If you know that the list has an even number of elements and want to avoid duplicates, you can instead write:
rows = list(zip(sortedList[:len(sortedList)//2], reversed(sortedList[len(sortedList)//2:])))
With the following result:
>>> sortedList = [1,2,3,4,5,6]
>>> list(zip(sortedList[:len(sortedList)//2], reversed(sortedList[len(sortedList)//2:])))
[(1, 6), (2, 5), (3, 4)]
Using loops
Although I recommend using zip rather than a for-loop, here is how to fix the loop you wrote:
rows = []
for i in range(len(sortedList)):
    rows.append((sortedList[i], sortedList[-i-1]))
With result:
>>> sortedList=[1,2,3,4,5]
>>> rows = []
>>> for i in range(len(sortedList)):
...     rows.append((sortedList[i], sortedList[-i-1]))
...
>>> rows
[(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
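Since the question is about objects sorted by an attribute, here is a small sketch that combines the sorting and the pairing and also copes with an odd number of items by leaving the middle one unpaired. The Item class, its fields, and the helper name pair_extremes are hypothetical stand-ins for the asker's own class.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Item:  # hypothetical stand-in for the asker's class
    name: str
    x: int


def pair_extremes(items):
    """Pair lowest-x with highest-x, second lowest with second highest, etc."""
    ordered = sorted(items, key=attrgetter("x"))
    half = len(ordered) // 2
    # With an odd length, the middle element is left out of both halves.
    return list(zip(ordered[:half], reversed(ordered[len(ordered) - half:])))


items = [Item("a", 3), Item("b", 1), Item("c", 6), Item("d", 4)]
pairs = pair_extremes(items)
print([(low.name, high.name) for low, high in pairs])  # [('b', 'c'), ('a', 'd')]
```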
I'm trying to find duplicates in a list. I want to preserve the values and insert them into a tuple with their number of occurrences.
For example:
list_of_n = [2, 3, 5, 5, 5, 6, 2]
occurance_of_n = zip(set(list_of_n), [list_of_n.count(n) for n in set(list_of_n)])
[(2, 2), (3, 1), (5, 3), (6, 1)]
This works fine with small sets. My question is: as list_of_n gets larger, will I have to worry about arg1 and arg2 in zip(arg1, arg2) not lining up correctly if they're the same set?
I.e. Is there a conceivable future where I call zip() and it accidentally aligns index [0] of list_of_n in arg1 with some other index of list_of_n in arg2?
(in case it's not clear, I'm converting the list to a set for purposes of speed in arg2, and under the pretense that zip will behave better if they're the same in arg1)
Since your sample output preserves the order of appearance, you might want to go with a collections.OrderedDict to gather the counts:
from collections import OrderedDict

list_of_n = [2, 3, 5, 5, 5, 6, 2]
d = OrderedDict()
for x in list_of_n:
    d[x] = d.get(x, 0) + 1
occurance_of_n = list(d.items())
# [(2, 2), (3, 1), (5, 3), (6, 1)]
If order does not matter, the appropriate approach is using a collections.Counter:
from collections import Counter
occurance_of_n = list(Counter(list_of_n).items())
Note that both approaches require only one iteration of the list. Your version could be amended to something like:
occurance_of_n = list(set((n, list_of_n.count(n)) for n in set(list_of_n)))
# [(6, 1), (3, 1), (5, 3), (2, 2)]
but the repeated calls to list.count make an entire iteration of the initial list for each (unique) element.
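As a quick illustration of the Counter approach, the sketch below also extracts only the values that occur more than once, since the question is about duplicates. The variable names occurrences and duplicates are my own. On Python 3.7 and later, Counter keeps keys in first-seen order because it is a dict subclass, so the ordering shown assumes such a version.

```python
from collections import Counter

list_of_n = [2, 3, 5, 5, 5, 6, 2]

counts = Counter(list_of_n)          # single pass over the list
occurrences = list(counts.items())   # (value, count) pairs
duplicates = [(n, c) for n, c in counts.items() if c > 1]

print(occurrences)  # [(2, 2), (3, 1), (5, 3), (6, 1)] on Python 3.7+
print(duplicates)   # [(2, 2), (5, 3)] on Python 3.7+
```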
I would like to change the data structure that I get from my data files so that I end up with a list of all values for every coordinate (one list per coordinate, filled with the values from every file).
e.g.
for i in range(files):
    open file
    file_output = [[0,4,6],[9,4,1],[2,5,3]]
    # second loop
    file_output = [[6,1,8],[4,7,3],[3,7,0]]
to
coordinates = [[0,6],[4,1],[6,8],[9,4],[4,7],[1,3],[2,3],[5,7],[3,0]]
It should be noted that I use over 1000 files of this format, which I should merge.
You could also explore the built-in zip() function
>>> a = [[0,4,6],[9,4,1],[2,5,3]]
>>> b = [[6,1,8],[4,7,3],[3,7,0]]
>>> l = []
>>> for k, v in zip(a, b):
...     l.extend(zip(k, v))
...
>>> print(l)
[(0, 6), (4, 1), (6, 8), (9, 4), (4, 7), (1, 3), (2, 3), (5, 7), (3, 0)]
>>> a = [[0,4,6],[9,4,1],[2,5,3]]
>>> b = [[6,1,8],[4,7,3],[3,7,0]]
>>> from itertools import chain
>>> list(zip(chain(*a), chain(*b)))
[(0, 6), (4, 1), (6, 8), (9, 4), (4, 7), (1, 3), (2, 3), (5, 7), (3, 0)]
This should be useful.
[pair for i, j in zip(a, b) for pair in zip(i, j)]
It produces a list of tuples, which should satisfy your needs.
If there will only be two lists and you want lists rather than tuples, you can use this as well.
[[x, y] for i, j in zip(a, b) for x, y in zip(i, j)]
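Because the question mentions more than 1000 files, here is a sketch of how the zip and chain ideas above might generalise to any number of file outputs. The helper name merge_coordinates is hypothetical, and it assumes every file yields a grid of the same shape.

```python
from itertools import chain


def merge_coordinates(file_outputs):
    """Return one list per coordinate, holding that coordinate's value from each file."""
    # Flatten each grid row by row, then zip the flattened grids together.
    flattened = (chain.from_iterable(grid) for grid in file_outputs)
    return [list(values) for values in zip(*flattened)]


a = [[0, 4, 6], [9, 4, 1], [2, 5, 3]]
b = [[6, 1, 8], [4, 7, 3], [3, 7, 0]]
coordinates = merge_coordinates([a, b])
print(coordinates)
# [[0, 6], [4, 1], [6, 8], [9, 4], [4, 7], [1, 3], [2, 3], [5, 7], [3, 0]]
```

Note that zip stops at the shortest input, so a file with fewer values would silently truncate the result.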
Given a list l and all combinations of the list elements is it possible to remove any combination containing x while iterating over all combinations, so that you never consider a combination containing x during the iteration after it is removed?
for a, b in itertools.combinations(l, 2):
    if some_function(a,b):
        remove_any_tup_with_a_or_b(a, b)
My list l is pretty big so I don't want to keep the combinations in memory.
A cheap trick is to filter the combinations with a disjointness test against a dynamically updated set of excluded values. It doesn't actually avoid generating the combinations you wish to exclude, so it's not a major performance benefit. Still, filtering with a C built-in like isdisjoint is typically faster than Python-level if checks with continue statements, because the filter work is pushed down to the C layer:
try:
    from future_builtins import filter  # Only on Py2, for generator based filter
except ImportError:
    pass  # Py3: the built-in filter is already lazy
import itertools

blacklist = set()
for a, b in filter(blacklist.isdisjoint, itertools.combinations(l, 2)):
    if some_function(a,b):
        blacklist.update((a, b))
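If skipping the generation of excluded pairs matters, one alternative is to replace itertools.combinations with explicit index loops that consult the exclusion set before building a pair. This is only a sketch: the generator name pairs_skipping_removed is made up, it tracks excluded positions rather than values (so duplicate values in the list are treated as distinct items), and the lambda stands in for the question's some_function.

```python
def pairs_skipping_removed(items, some_function):
    """Yield pairs in combinations order, never building a pair with a removed item."""
    removed = set()  # indices of items excluded so far
    for i in range(len(items)):
        if i in removed:
            continue
        for j in range(i + 1, len(items)):
            if j in removed:
                continue
            a, b = items[i], items[j]
            matched = some_function(a, b)
            yield a, b
            if matched:
                removed.update((i, j))
                break  # items[i] is now excluded, so skip its remaining pairs


# Hypothetical predicate: exclude both members of any pair that sums to 3.
visited = list(pairs_skipping_removed([1, 2, 3, 4], lambda a, b: a + b == 3))
print(visited)  # [(1, 2), (3, 4)]
```

Only the set of removed indices is kept in memory, not the combinations themselves.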
If you want to remove all tuples containing the number x from itertools.combinations(l, 2), note that there is a one-to-one mapping (mathematically speaking) from the set itertools.combinations([i for i in range(1, len(l))], 2) to the combinations in itertools.combinations(l, 2) that don't contain the number x.
Example:
The set of all combinations from itertools.combinations([1,2,3,4], 2) that don't contain the number 1 is given by [(2, 3), (2, 4), (3, 4)]. Notice that the number of elements in this list equals the number of combinations in itertools.combinations([1,2,3], 2) = [(1, 2), (1, 3), (2, 3)].
Since order doesn't matter in combinations, you can map 1 to 4 in [(1, 2), (1, 3), (2, 3)] to get [(4, 2), (4, 3), (2, 3)], which is the same set of combinations as [(2, 4), (3, 4), (2, 3)], i.e. [(2, 3), (2, 4), (3, 4)].
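In practice, the observation above means the combinations without x can be generated directly from the list with x dropped, instead of generating everything and discarding. A minimal sketch, using the same example values; the variable names are my own:

```python
import itertools

l = [1, 2, 3, 4]
x = 1

# Drop x first, then combine: no excluded tuple is ever generated.
remaining = [v for v in l if v != x]
without_x = list(itertools.combinations(remaining, 2))

# For comparison: generate everything, then filter out tuples containing x.
filtered = [c for c in itertools.combinations(l, 2) if x not in c]

print(without_x)  # [(2, 3), (2, 4), (3, 4)]
```

This only helps when x is known before iteration starts. It does not by itself cover the case where exclusions are discovered while iterating.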