How can I make a Python equivalent of pdtolist from Pop-11?
Assume I have a generator called g that returns (say) integers one at a time. I'd like to construct a list a that grows automatically as I ask for values beyond the current end of the list. For example:
print a # => [ 0, 1, 2, g]
print a[0] # => 0
print a[1] # => 1
print a[2] # => 2
# (obvious enough up to here)
print a[6] # => 6
print a # => [ 0, 1, 2, 3, 4, 5, 6, g]
# list has automatically expanded
a = a[4:] # discard some previous values
print a # => [ 4, 5, 6, g]
print a[0] # => 4
Terminology - to anticipate a likely misunderstanding: a list is a "dynamic array" but that's not what I mean; I'd like a "dynamic list" in a more abstract sense.
To explain the motivation better, suppose you have 999999999 items to process. Trying to fit all those into memory (in a normal list) all at once would be a challenge. A generator solves that part of the problem by presenting them one at a time; each one created on demand or read individually from disk. But suppose during processing you want to refer to some recent values, not just the current one? You could remember the last (say) ten values in a separate list. But a dynamic list is better, as it remembers them automatically.
This might get you started:
class DynamicList(list):
    def __init__(self, gen):
        self._gen = gen

    def __getitem__(self, index):
        while index >= len(self):
            self.append(next(self._gen))
        return super(DynamicList, self).__getitem__(index)
You'll need to add some special handling for slices (currently, they just return a normal list, so you lose the dynamic behavior). Also, if you want the generator itself to be a list item, that'll add a bit of complexity.
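A quick demonstration of the auto-expanding behaviour (the class is repeated so the snippet runs standalone; `itertools.count` supplies an endless integer generator):

```python
from itertools import count

class DynamicList(list):
    # repeated from the answer above so this snippet runs standalone
    def __init__(self, gen):
        self._gen = gen

    def __getitem__(self, index):
        while index >= len(self):
            self.append(next(self._gen))
        return super(DynamicList, self).__getitem__(index)

a = DynamicList(count())  # endless generator: 0, 1, 2, ...
print(a[6])    # → 6; seven values are pulled from the generator on demand
print(len(a))  # → 7
```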
I just answered another similar question and decided to update my answer for you.
How's this?
class dynamic_list(list):
    def __init__(self, num_gen):
        self._num_gen = num_gen

    def __getitem__(self, index):
        if isinstance(index, int):
            self.expandfor(index)
            return super(dynamic_list, self).__getitem__(index)
        elif isinstance(index, slice):
            if index.stop < index.start:
                return super(dynamic_list, self).__getitem__(index)
            else:
                self.expandfor(index.stop if abs(index.stop) > abs(index.start) else index.start)
                return super(dynamic_list, self).__getitem__(index)

    def __setitem__(self, index, value):
        if isinstance(index, int):
            self.expandfor(index)
            return super(dynamic_list, self).__setitem__(index, value)
        elif isinstance(index, slice):
            if index.stop < index.start:
                return super(dynamic_list, self).__setitem__(index, value)
            else:
                self.expandfor(index.stop if abs(index.stop) > abs(index.start) else index.start)
                return super(dynamic_list, self).__setitem__(index, value)

    def expandfor(self, index):
        rng = []
        if abs(index) > len(self) - 1:
            if index < 0:
                rng = xrange(abs(index) - len(self))
            else:
                rng = xrange(abs(index) - len(self) + 1)
        for i in rng:
            self.append(self._num_gen.next())
Many thanks to all who contributed ideas! Here's what I have gathered together from all the responses. This retains most functionality from the normal list class, adding additional behaviours where necessary to meet additional requirements.
class DynamicList(list):
    def __init__(self, gen):
        self.gen = gen

    def __getitem__(self, index):
        while index >= len(self):
            self.append(next(self.gen))
        return super(DynamicList, self).__getitem__(index)

    def __getslice__(self, start, stop):
        # treat a request for the "last" item as "most recently fetched"
        if stop == 2147483647:
            stop = len(self)
        while stop > len(self):
            self.append(next(self.gen))
        return super(DynamicList, self).__getslice__(start, stop)

    def __iter__(self):
        return self

    def next(self):
        n = next(self.gen)
        self.append(n)
        return n

a = DynamicList(iter(xrange(10)))
Previously generated values can be accessed individually as items or slices. The recorded history expands as necessary if the requested item(s) are beyond the current end of the list. The entire recorded history can be accessed all at once, using print a, or assigned to a normal list using b = a[:]. A slice of the recorded history can be deleted using del a[0:4]. You can iterate over the whole list using for, deleting as you go, or whenever it suits. Should you reach the end of the generated values, StopIteration is raised.
Some awkwardness remains. Assignments like a = a[0:4] successfully truncate the history, but the resulting list no longer auto-expands. Instead use del a[0:4] to retain the automatic growth properties. Also, I'm not completely happy with having to recognise a magic value, 2147483647, representing the most recent item.
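As an aside for Python 3 readers (where `__getslice__` and `xrange` are gone): if all you need is the motivating use case of remembering the last few generated values, `collections.deque` with `maxlen` covers it without any magic values. A minimal sketch:

```python
from collections import deque

def stream():
    # stand-in for any generator of values
    for i in range(10):
        yield i

window = deque(maxlen=3)  # remembers only the 3 most recent values
for value in stream():
    window.append(value)  # oldest value is discarded automatically

print(list(window))  # → [7, 8, 9]
```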
Thanks for this thread; it helped me solve my own problem. Mine was a bit simpler: I wanted a list that automatically extended if indexed past its current length --> allow reading and writing past current length. If reading past current length, return 0 values.
Maybe this helps someone:
class DynamicList(list):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __getitem__(self, idx):
        self.expand(idx)
        return super().__getitem__(idx)

    def __setitem__(self, idx, val):
        self.expand(idx)
        return super().__setitem__(idx, val)

    def expand(self, idx):
        if isinstance(idx, int):
            idx += 1
        elif isinstance(idx, slice):
            # open-ended slices have None for start/stop, so guard with `or 0`
            idx = max(idx.start or 0, idx.stop or 0)
        if idx > len(self):
            self.extend([0] * (idx - len(self)))
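The idea in action (a condensed, self-contained copy of the class, with an `or 0` guard added because open-ended slices carry None bounds):

```python
class DynamicList(list):
    # condensed copy of the class above so this snippet runs standalone
    def __getitem__(self, idx):
        self.expand(idx)
        return super().__getitem__(idx)

    def __setitem__(self, idx, val):
        self.expand(idx)
        super().__setitem__(idx, val)

    def expand(self, idx):
        if isinstance(idx, int):
            idx += 1
        elif isinstance(idx, slice):
            idx = max(idx.start or 0, idx.stop or 0)  # None bounds → 0
        if idx > len(self):
            self.extend([0] * (idx - len(self)))

d = DynamicList()
d[5] = 7     # list silently grows to hold index 5
print(d)     # → [0, 0, 0, 0, 0, 7]
print(d[2])  # → 0 (reads past the old end return the default)
```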
Related
This is kind of a question, but it's also kind of me just hoping I don't have to write a bunch of code to get behavior I want. (Plus if it already exists, it probably runs faster than what I would write anyway.) I have a number of large lists of numbers that cannot fit into memory -- at least not all at the same time. Which is fine because I only need a small portion of each list at a time, and I know how to save the lists into files and read out the part of the list I need. The problem is that my method of doing this is somewhat inefficient, as it involves iterating through the file for the part I want. So, I was wondering if there happened to be some library or something out there that I'm not finding that allows me to index a file as though it were a list using the [] notation I'm familiar with. Since I'm writing the files myself, I can make the formatting of them whatever I need to, but currently my files contain nothing but the elements of the list with \n as a delimiter between values.
Just to recap what I'm looking for/make it more specific.
I want to use the list indexing notation (including slicing into sub-list and negative indexing) to access the contents of a list written in a file
An accessed sub-list (e.g. f[1:3]) should return as a Python list object in memory
I would like to be able to assign to indices of the file (e.g. f[i] = x should write the value x to the file f in the location corresponding to index i)
To be honest, I don't expect this to exist, but you never know when you miss something in your research. So, I figured I'd ask. On a side note, if this doesn't exist, is it possible to overload the [] operator in Python?
If your data is purely numeric you could consider using numpy arrays, and storing the data in npy format. Once stored in this format, you could load the memory-mapped file as:
>>> X = np.load("some-file.npy", mmap_mode="r")
>>> X[1000:1003]
memmap([4, 5, 6])
This access will read directly from disk without loading the preceding data into memory.
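For completeness, the .npy file can be produced beforehand with np.save; a small end-to-end sketch (the path is illustrative):

```python
import os
import tempfile
import numpy as np

# write the data once, in .npy format
path = os.path.join(tempfile.mkdtemp(), "data.npy")
np.save(path, np.arange(1_000_000))

# memory-mapped load: the array data stays on disk
X = np.load(path, mmap_mode="r")
print(X[1000:1003])  # only the requested entries are read from disk
```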
You can actually do this by writing a simple class, I think:
class FileWrapper:
    def __init__(self, path, **kwargs):
        self._file = open(path, 'r+', **kwargs)

    def _do_single(self, where, s=None):
        if where >= 0:
            self._seek(where)
        else:
            self._seek(where, 2)
        if s is None:
            return self._read(1)
        else:
            return self._write(s)

    def _do_slice_contiguous(self, start, end, s=None):
        if start is None:
            start = 0
        if end is None:
            end = -1
        self._seek(start)
        if s is None:
            return self._read(end - start)
        else:
            return self._write(s)

    def _do_slice(self, where, s=None):
        if s is None:
            result = []
            for index in where:
                self._seek(index)
                result.append(self._read(1))
            return result
        else:
            for index, char in zip(where, s):
                self._seek(index)
                self._write(char)
            return len(s)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._do_single(key)
        elif isinstance(key, slice):
            if self._is_contiguous(key):
                return self._do_slice_contiguous(key.start, key.stop)
            else:
                return self._do_slice(self._process_slice(key))
        else:
            raise ValueError('File indices must be ints or slices.')

    def __setitem__(self, key, value):
        if isinstance(key, int):
            return self._do_single(key, value)
        elif isinstance(key, slice):
            if self._is_contiguous(key):
                return self._do_slice_contiguous(key.start, key.stop, value)
            else:
                where = self._process_slice(key)
                if len(where) == len(value):
                    return self._do_slice(where, value)
                else:
                    raise ValueError('Length of slice not equal to length of string to be written.')

    def __del__(self):
        self._file.close()

    def _is_contiguous(self, key):
        return key.step is None or key.step == 1

    def _process_slice(self, key):
        return range(key.start, key.stop, key.step)

    def _read(self, size):
        return self._file.read(size)

    def _seek(self, offset, whence=0):
        return self._file.seek(offset, whence)

    def _write(self, s):
        return self._file.write(s)
I'm sure many optimisations could be made, since I rushed through this, but it was fun to write.
This does not answer the question in full, because it supports random access of characters, as opposed to lines, which are at a higher level of abstraction and more complicated to handle (since they can be variable length).
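For line-oriented access, the standard library's `linecache` module already offers random access to lines by number. Note it reads and caches the whole file on first use, so it avoids re-scanning the file on each lookup but does not help with files too large for memory. A sketch (the file path and contents are illustrative):

```python
import linecache
import os
import tempfile

# write a throwaway file with one value per line (the format from the question)
path = os.path.join(tempfile.mkdtemp(), "values.txt")
with open(path, "w") as f:
    f.write("\n".join(str(n * n) for n in range(100)))

# linecache.getline is 1-indexed and returns "" for lines past the end
print(linecache.getline(path, 5).strip())  # → "16" (the 5th line is 4*4)
```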
I have a linked list in Python and I want to write a filter function that returns a new linked list containing the items for which a call to f(item) is true. This implementation has a variable, filtered, that builds the list from the bottom up. I'm having trouble understanding this recursion. What type of recursion is this?
I'm more familiar with recursion like fibonacci where the return recursion is at the very bottom.
class Link:
    empty = ()

    def __init__(self, first, rest=empty):
        assert rest is Link.empty or isinstance(rest, Link)
        self.first = first
        self.rest = rest

    def __getitem__(self, i):
        if i == 0:
            return self.first
        else:
            return self.rest[i - 1]

    def __len__(self):
        return 1 + len(self.rest)

    def __repr__(self):
        if self.rest == Link.empty:
            return "Link(" + str(self.first) + ")"
        return 'Link({0}, {1})'.format(self.first, repr(self.rest))

def filter_link(f, s):
    if s is Link.empty:
        return s
    else:
        filtered = filter_link(f, s.rest)  # How does this work?
        if f(s.first):
            return Link(s.first, filtered)
        else:
            return filtered
This is the sort of recursion you are used to.
I just looked up a recursive fibonacci solution where the early return is on the second line, just like your code. Also, like your code, the recursion in the example occurs before the more normal returns.
It looks like your code returns a new linked list of the elements that the function f approves of, from the bottom up. That is, it creates new instances of Link around elements s.first, terminated by the single instance of Link.empty.
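The shape of the recursion may be easier to see with a plain-tuple cons list standing in for Link (`filter_pairs` is a hypothetical name, just for this sketch): the code descends all the way to the end of the list first, then decides on the way back up whether to keep each element.

```python
def filter_pairs(f, s):
    if s == ():                        # base case: the empty list
        return ()
    filtered = filter_pairs(f, s[1])   # recurse to the end of the list first
    if f(s[0]):
        return (s[0], filtered)        # rebuild on the way up, keeping this element
    return filtered                    # ...or skipping it

xs = (1, (2, (3, (4, ()))))
print(filter_pairs(lambda x: x % 2 == 0, xs))  # → (2, (4, ()))
```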
I have the following source code:
class Stats(object):
    def __init__(self):
        self._pending = []
        self._done = []

    @property
    def pending(self):
        return self._pending
The way those lists are filled is not important for my question.
The situation is that I'm getting a sublist of these lists this way:
stats = Stats()
# code to fill the lists
stats.pending[2:10]
The problem here is that I expect to get as many elements as I requested.
In the example above I expect a sublist that contains 8 elements (10-2).
Of course, actually I'll get less than 8 elements if the list is shorter.
So, what I need is:
When the list has enough items, it returns the corresponding sublist.
When the list is shorter, it returns a sublist with the expected length, filled with the last elements of the original lists and a default value (for example None) for the extra items.
This way, if I did:
pending_tasks = stats.pending[44:46]
And the pending list only contains 30 elements, it should return a list of two default elements, for example [None, None], instead of an empty list ([]), which is the default behaviour of lists.
I guess I already know how to do it inside a normal method/function, but I want to do it in the cleanest way possible, trying to follow the @property approach.
Thanks a lot!
This is not easy to do because the slicing operation is what you want to modify, and that happens after the original list has been returned by the property. It's not impossible though, you'll just need to wrap the regular list with another object that will take care of padding the slices for you. How easy or difficult that will be may depend on how much of the list interface you need your wrapper to implement. If you only need indexing and slicing, it's really easy:
class PadSlice(object):
    def __init__(self, lst, default_value=None):
        self.lst = lst
        self.default_value = default_value

    def __getitem__(self, index):
        item = self.lst[index]
        if isinstance(index, slice):
            expected_length = (index.stop - index.start) // (index.step or 1)
            if len(item) != expected_length:
                item.extend([self.default_value] * (expected_length - len(item)))
        return item
This code probably won't work right for negative step slices, or for slices that don't specify one of the end points (it does have logic to detect an omitted step, since that's common). If this was important to you, you could probably fix up those corner cases.
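A condensed, self-contained copy of the wrapper (with the constructor assignment written out) shows the padding in action:

```python
class PadSlice:
    # condensed copy of the wrapper above so this snippet runs standalone
    def __init__(self, lst, default_value=None):
        self.lst = lst
        self.default_value = default_value

    def __getitem__(self, index):
        item = self.lst[index]
        if isinstance(index, slice):
            expected = (index.stop - index.start) // (index.step or 1)
            if len(item) != expected:
                # pad short results out to the requested slice length
                item.extend([self.default_value] * (expected - len(item)))
        return item

p = PadSlice(list(range(30)))
print(p[44:46])  # → [None, None] rather than []
print(p[28:32])  # → [28, 29, None, None]
```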
This is not easy. How would the object (list) you return know how it will be sliced later? You could subclass list, however, and override __getitem__ and __getslice__ (Python2 only):
class L(list):
    def __getitem__(self, key):
        if isinstance(key, slice):
            return [list(self)[i] if 0 <= i < len(self) else None
                    for i in xrange(key.start, key.stop, key.step or 1)]
        return list(self)[key]

    def __getslice__(self, i, j):
        return self.__getitem__(slice(i, j))
This will pad all slices with None, fully compatible with negative indexing and steps != 1. And in your property, return an L version of the actual list:
@property
def pending(self):
    return L(self._pending)
You can construct a new class, which is a subclass of list. Then you can overload the __getitem__ magic method to overload [] operator to the appropriate behavior. Consider this subclass of list called MyList:
class MyList(list):
    def __getitem__(self, index):
        """Modify the index [] operator."""
        result = super(MyList, self).__getitem__(index)
        if isinstance(index, slice):
            # Get the requested sublist length.
            if index.step:  # guard against a None (or zero) step
                sublist_len = (index.stop - index.start) // index.step
            else:
                sublist_len = (index.stop - index.start)
            # If the requested sublist is longer (or the list is shorter), extend
            # the result to the requested length with a default value of None
            if sublist_len > len(self) or index.start > len(self):
                result.extend([None for _ in range(sublist_len - len(result))])
        return result
Then you can just change the pending method to return a MyList type instead of list.
class Stats(object):
    @property
    def pending(self):
        return MyList(self._pending)
Hopefully this helps.
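For instance, a condensed copy of the subclass (repeated so it runs on its own) reproduces the behaviour asked for in the question:

```python
class MyList(list):
    # condensed copy of the subclass above so this snippet runs standalone
    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(index, slice):
            step = index.step or 1  # a None step means 1
            sublist_len = (index.stop - index.start) // step
            if sublist_len > len(self) or index.start > len(self):
                # pad the result out to the requested slice length
                result.extend([None] * (sublist_len - len(result)))
        return result

pending = MyList(range(30))
print(pending[2:10])   # 8 elements, as requested
print(pending[44:46])  # → [None, None]
```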
I want to create a list in Python of a fixed size, let's say 3 to begin with. I have a method that writes data to this list every time it is called; I want it to add to the list until the list is full, and once the list is full it should start overwriting the data in the list in ascending order (e.g. starting with element 0). I also want a function which increases the length of the list, i.e. if it is called, the list grows from size 3 to size 4. How do I go about doing either of these?
This simple solution should do:
def add(l, item, max_len):
    l.insert(0, item)
    return l[:max_len]

l = ["banana", "peanut", "bicycle", "window"]
l = add(l, "monkey", 3)
print(l)
prints:
> ['monkey', 'banana', 'peanut']
The list to edit (l), the item to add, and the max size (max_len) of the list are arguments.
The item will then be added at index 0, while the list is limited to max_len.
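Note that l.insert(0, item) shifts every existing element, so each add is O(n). If a fixed-size buffer of the most recent items is all that's needed, collections.deque with maxlen gives the same effect natively; a sketch:

```python
from collections import deque

l = deque(maxlen=3)  # fixed-size buffer of the 3 most recent items
for item in ["banana", "peanut", "bicycle", "window", "monkey"]:
    l.append(item)   # once full, the oldest item is dropped automatically

print(list(l))  # → ['bicycle', 'window', 'monkey']
```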
This should do the trick. There are tons of prebuilt modules that have similar functionality but I thought it would be best if you could visualize the process!
class SizedList(list):
    def __init__(self, size):
        super().__init__()
        self.__size = size
        self.__wrap_location = 0
        self.len = len(self)

    def append(self, *args, **kwargs):
        self.len = len(self)
        if self.len >= self.__size:
            if self.__wrap_location == self.__size - 1:
                self.__wrap_location = 0
            self.__wrap_location += 1
            self.pop(self.__wrap_location)
            return self.insert(self.__wrap_location - 1, *args)
        return list.append(self, *args, **kwargs)

    def increase_size(self, amount=1):
        self.__size += amount
Here is how I went about it in the end. I set a variable called length_list which was originally set at 3.
length_list = 3
my_list = []

def add_to_list(item):
    if len(my_list) < length_list:
        my_list.append(item)
    else:
        del my_list[0]  # drop the oldest value to make room
        my_list.append(item)

def increase_length():
    global length_list
    length_list += 1
I'm looking for a function that returns a linked list that doesn't contain a specific node.
Here is an example implementation:
Nil = None  # empty node

def cons(head, tail=Nil):
    """ Extends a list by inserting a new value. """
    return (head, tail)

def head(xs):
    """ Returns the first element of a list. """
    return xs[0]

def tail(xs):
    """ Returns a list containing all elements except the first. """
    return xs[1]

def is_empty(xs):
    """ Returns True if the list contains zero elements. """
    return xs is Nil

def length(xs):
    """
    Returns the number of elements in a given list. To find the length of a
    list we need to scan all of its elements, giving a time complexity of O(n).
    """
    if is_empty(xs):
        return 0
    else:
        return 1 + length(tail(xs))

def concat(xs, ys):
    """ Concatenates two lists. O(n) """
    if is_empty(xs):
        return ys
    else:
        return cons(head(xs), concat(tail(xs), ys))
How can a remove_item function be implemented?
def remove_item(xs, value):
    if is_empty(xs):
        return xs
    elif head(xs) == value:
        return tail(xs)  # or remove_item(tail(xs), value) to remove all
    else:
        return cons(head(xs), remove_item(tail(xs), value))
Note: I am not a Lisp programmer, I haven't necessarily done this the best possible way.
[Edit: I might have misinterpreted what you meant by removing a specific node. If you're starting with a suffix of xs rather than a value in xs then the principle is the same but the test involving value is different]
If you want a tail-recursive solution, you can say:
def remove_item(xs, value):
    before_rev, after = split_remove(Nil, xs, value)
    return reverse_append(before_rev, after)

def reverse_append(a, b):
    if is_empty(a):
        return b
    else:
        return reverse_append(tail(a), cons(head(a), b))

def split_remove(before_rev, xs, value):
    if is_empty(xs):
        return (before_rev, xs)
    elif head(xs) == value:
        return (before_rev, tail(xs))
    else:
        return split_remove(cons(head(xs), before_rev), tail(xs), value)
Although I don't know if Python does tail-call optimization.
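CPython does not, in fact, perform tail-call optimization, so very deep lists will still hit the interpreter's recursion limit. The same split-and-reverse idea can be written as an explicit loop; the cons helpers are repeated so the sketch runs standalone:

```python
Nil = None  # empty node

def cons(head, tail=Nil):
    return (head, tail)

def remove_item_iter(xs, value):
    before_rev = Nil
    # walk forward, stacking the prefix in reverse order
    while xs is not Nil and xs[0] != value:
        before_rev = cons(xs[0], before_rev)
        xs = xs[1]
    # drop the matched node, if one was found
    after = xs[1] if xs is not Nil else Nil
    # reverse the prefix back onto the remainder
    while before_rev is not Nil:
        after = cons(before_rev[0], after)
        before_rev = before_rev[1]
    return after

xs = cons(1, cons(2, cons(3, cons(4))))
print(remove_item_iter(xs, 3))  # → (1, (2, (4, None)))
```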