I'm trying to figure out what the brackets mean around this Python statement:
[self.bids.insert_order(bid[2], Decimal(bid[1]), Decimal(bid[0]), initial=True) for bid in json_doc['bids']]
Is it modifying a list in place? I don't get it, and I can't figure out how to ask Google the right question. This is the complete function:
def get_level3(self, json_doc=None):
    if not json_doc:
        json_doc = requests.get('http://api.exchange.coinbase.com/products/BTC-USD/book', params={'level': 3}).json()
    [self.bids.insert_order(bid[2], Decimal(bid[1]), Decimal(bid[0]), initial=True) for bid in json_doc['bids']]
    [self.asks.insert_order(ask[2], Decimal(ask[1]), Decimal(ask[0]), initial=True) for ask in json_doc['asks']]
    self.level3_sequence = json_doc['sequence']
What is a list comprehension?
In essence it means: build a new list by doing something with every item of an existing list.
Here's a simple example:
exampleList = [1, 2, 3, 4, 5]
# build a new list in which every item is one greater
plusOneList = [item + 1 for item in exampleList]
print(plusOneList)
# [2, 3, 4, 5, 6]
# note: exampleList itself is unchanged
List comprehensions are useful when you need to do something relatively simple with every item in a list.
Understanding your code:
[self.bids.insert_order(bid[2], Decimal(bid[1]), Decimal(bid[0]), initial=True) for bid in json_doc['bids']]
The list we are iterating over is json_doc['bids']. For every bid in json_doc['bids'], the comprehension calls self.bids.insert_order() with the fields of that bid, which are stored as bid[0], bid[1], and so on. In summary, this list comprehension calls self.bids.insert_order() once per bid in your JSON list (and discards the resulting list of return values).
To begin with, a list comprehension is a way of constructing a list inline. Say you want to make a list of the first five square numbers. One way would be:
square_list = []
for i in range(1, 6):
    square_list.append(i**2)
# square_list = [1, 4, 9, 16, 25]
Python has some syntactic sugar to ease this process.
square_list = [i**2 for i in range(1,6)]
I think the most important thing to note is that your example is questionable code. It's using a list comprehension to repeatedly apply a function. That line generates a list and immediately throws it away. In the context of the previous example, it might be akin to:
square_list = []
[square_list.append(i**2) for i in range(1,6)]
This, in general, is kind of a silly structure. It would be better to use either of the first two formulations (rather than mix them). In my opinion, the line you're confused about would be better off as an explicit loop.
for bid in json_doc['bids']:
    self.bids.insert_order(bid[2], Decimal(bid[1]), Decimal(bid[0]), initial=True)
This way, it is clear that the object self.bids is being altered. Also, you probably wouldn't be asking this question if it was written this way.
Related
I am really really new to coding and this is my first post but I haven't found anybody else with the same problem yet.
This is a snippet of my code:
nr_list = [3, 4, 9]
nr_list_r = []
nr_list_r = nr_list
nr_list_r.reverse()
print(nr_list)
It returns [9, 4, 3]
I honestly don't know why nr_list is reversed when I only used the reverse function on nr_list_r.
Why is nr_list reversed as well?
In Python, variables refer to values rather than contain them. So when you do
nr_list_r = nr_list
You're not making a new list. You're making a new variable refer to the same list. If you want to make a copy, you can use the slice syntax [:]
nr_list_r = nr_list[:]
But Python already has a way to reverse a list without modifying it, so you may as well just use that built-in function.
nr_list_r = list(reversed(nr_list))
We use reversed to reverse the iterable and then list to convert the result (which is an arbitrary iterable) into a concrete list.
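A quick demonstration that this leaves the original list untouched:
nr_list = [3, 4, 9]
nr_list_r = list(reversed(nr_list))
print(nr_list)    # [3, 4, 9] -- the original is unchanged
print(nr_list_r)  # [9, 4, 3]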
Basically with:
nr_list_r = nr_list
You are just giving another name to the list "nr_list".
So when you reverse "nr_list_r", "nr_list" is going to be reversed as well.
In order to reverse just one of them, you should copy the list:
nr_list_r = nr_list.copy()
The full code looks like this:
nr_list = [3, 4, 9]
nr_list_r = nr_list.copy()
nr_list_r.reverse()
print(nr_list)
print(nr_list_r)
The concept of references is one of the key optimizations in many programming languages.
An object starts its life when it is created. At that point, at least one reference to it exists.
nr_list = [3, 4, 9] # one reference.
As execution proceeds, more references may be created, depending on usage.
nr_list_r = nr_list # one more reference added.
And the number of references may also shrink.
nr_list_r = [4, 6, 8] # one reference to the original list removed.
The object is considered alive as long as at least one reference to it exists.
When there are no references left, the object is no longer accessible and its memory is reclaimed.
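If you're curious, you can watch the reference count change with sys.getrefcount. A rough sketch; the exact numbers vary by interpreter, and the count is always one higher than you might expect because the argument passed to getrefcount is itself a temporary reference:
import sys

nr_list = [3, 4, 9]
print(sys.getrefcount(nr_list))  # typically 2: the name nr_list + the function argument

nr_list_r = nr_list              # a second name for the same object
print(sys.getrefcount(nr_list))  # one higher

nr_list_r = [4, 6, 8]            # rebinding nr_list_r drops that reference
print(sys.getrefcount(nr_list))  # back down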
I want to define a function that takes a list as an argument and removes all duplicates from the list except the last one.
For example:
remove_duplicates([3,4,4,3,6,3]) should return [4,6,3]. The answers on the other post do not solve this one.
The function should remove each element if it occurs again later in the list.
This is my code:
def remove(y):
    for x in y:
        if y.count(x) > 1:
            y.remove(x)
    return y
and for this list:
[1,2,1,2,1,2,3] I am getting this output:
[2,1,2,3]. The expected output is [1,2,3].
Where am I going wrong and how do I fix it?
The other post does actually answer the question, but there's an extra step: reverse the input then reverse the output. You could use reversed to do this, with an OrderedDict:
from collections import OrderedDict
def remove_earlier_duplicates(sequence):
    d = OrderedDict.fromkeys(reversed(sequence))
    return reversed(d)
The output is a reversed iterator object for greater flexibility, but you can easily convert it to a list.
>>> list(remove_earlier_duplicates([3,4,4,3,6,3]))
[4, 6, 3]
>>> list(remove_earlier_duplicates([1,2,1,2,1,2,3]))
[1, 2, 3]
BTW, your remove function doesn't work because you're changing the size of the list as you're iterating over it, meaning certain items get skipped.
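You can watch the skipping happen by tracing your list with a print added (this is just your remove logic instrumented):
y = [1, 2, 1, 2, 1, 2, 3]
for x in y:
    print("visiting", x)  # removals shift later items left, so some are never visited
    if y.count(x) > 1:
        y.remove(x)
print(y)  # [2, 1, 2, 3] -- not [1, 2, 3]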
I found this way to do it after a bit of research. @wjandrea provided the fromkeys method idea and helped me out a lot.
def retain_order(arr):
    # reverse, drop all but the first occurrence of each item
    # (i.e. the last in the original), then reverse back
    return list(dict.fromkeys(arr[::-1]))[::-1]
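For example, this matches the expected output from the question:
>>> retain_order([3, 4, 4, 3, 6, 3])
[4, 6, 3]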
I was doing one of the course exercises on Codecademy for Python, and I had a few questions I couldn't seem to find an answer to:
For this block of code, how exactly does python check whether something is "in" or "not in" a list? Does it run through each item in the list to check or does it use a quicker process?
Also, how would this code be affected if it were running with a massive list of numbers (thousands or millions)? Would it slow down as the list size increases, and are there better alternatives?
numbers = [1, 1, 2, 3, 5, 8, 13]
def remove_duplicates(list):
    new_list = []
    for i in list:
        if i not in new_list:
            new_list.append(i)
    return new_list
remove_duplicates(numbers)
Thanks!
P.S. Why does this code not function the same?
numbers = [1, 1, 2, 3, 5, 8, 13]
def remove_duplicates(list):
    new_list = []
    new_list.append(i for i in list if i not in new_list)
    return new_list
In order to execute i not in new_list Python has to do a linear scan of the list. The scanning loop breaks as soon as the result of the test is known, but if i is actually not in the list the whole list must be scanned to determine that. It does that at C speed, so it's faster than doing a Python loop to explicitly check each item. Doing the occasional in some_list test is ok, but if you need to do a lot of such membership tests it's much better to use a set.
On average, with random data, testing membership has to scan through half the list items, and in general the time taken to perform the scan is proportional to the length of the list. In the usual notation the size of the list is denoted by n, and the time complexity of this task is written as O(n).
In contrast, determining membership of a set (or a dict) can be done (on average) in constant time, so its time complexity is O(1). Please see TimeComplexity in the Python Wiki for further details on this topic. Thanks, Serge, for that link.
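As a rough illustration (actual numbers vary by machine and Python version), here's a quick timeit sketch comparing the two membership tests:
import timeit

data_list = list(range(100000))
data_set = set(data_list)

# worst case for the list: the item is at the very end
print(timeit.timeit('99999 in data_list', globals=globals(), number=100))
print(timeit.timeit('99999 in data_set', globals=globals(), number=100))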
Of course, if you're using a set then you get de-duplication for free, since it's impossible to add duplicate items to a set.
One problem with sets is that they generally don't preserve order. But you can use a set as an auxiliary collection to speed up de-duping. Here is an illustration of one common technique to de-dupe a list (or other ordered collection) that does preserve order. I'll use a string as the data source because I'm too lazy to type out a list. ;)
new_list = []
seen = set()
for c in "this is a test":
if c not in seen:
new_list.append(c)
seen.add(c)
print(new_list)
output
['t', 'h', 'i', 's', ' ', 'a', 'e']
Please see How do you remove duplicates from a list whilst preserving order? for more examples. Thanks, Jean-François Fabre, for the link.
As for your PS, that code appends a single generator object to new_list; it doesn't append what the generator would produce.
I assume you already tried to do it with a list comprehension:
new_list = [i for i in list if i not in new_list]
That doesn't work, because the new_list doesn't exist until the list comp finishes running, so doing in new_list would raise a NameError. And even if you did new_list = [] before the list comp, it won't be modified by the list comp, and the result of the list comp would simply replace that empty list object with a new one.
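A quick demonstration of the second case: the condition only ever sees the old, empty list, so nothing is filtered out.
numbers = [1, 1, 2, 3]
new_list = []
new_list = [i for i in numbers if i not in new_list]
print(new_list)  # [1, 1, 2, 3] -- the condition checked the old, empty list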
BTW, please don't use list as a variable name (even in example code) since that shadows the built-in list type, which can lead to mysterious error messages.
You are asking multiple questions, and one of them is whether you can do this more efficiently. I'll answer that one.
OK, let's say you had thousands or millions of numbers. Where from, exactly? Say they were stored in some kind of text file; then you would probably want to use numpy (if you are sticking with Python, that is). Example:
import numpy as np
numbers = np.array([1, 1, 2, 3, 5, 8, 13], dtype=np.int32)
numbers = np.unique(numbers).tolist()
This will be more efficient (above all, more memory-efficient) than reading the data with plain Python and performing list(set(...)):
numbers = [1, 1, 2, 3, 5, 8, 13]
numbers = list(set(numbers))
You are asking for the algorithmic complexity of this function. To find that you need to see what is happening at each step.
You are scanning the list one at a time, which takes 1 unit of work. This is because retrieving something from a list is O(1). If you know the index, it can be retrieved in 1 operation.
The list you are adding items to grows by at most one item per step, so in the worst case the unique-items list reaches size n.
Now, to add the item you picked to the unique items list is going to take n work in the worst case. Because we have to scan each item to decide that.
So if you sum up the total work in each step, it would be 1 + 2 + 3 + 4 + 5 + ... n which is n (n + 1) / 2. So if you have a million items, you can just find that by applying n = million in the formula.
This is not exactly how lists work under the hood, but it is a helpful way to visualize the cost.
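To put a number on that formula, with a million items the worst case is about half a trillion comparisons:
n = 1000000
print(n * (n + 1) // 2)  # 500000500000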
To answer the question in the title: Python has more efficient data types, but the list object is just a plain dynamic array. If you want a faster way to look up values, you can use a dict, which stores each key by its hash in a hash table; that is probably the "quicker process" you were thinking of.
As for the second code snippet:
list.append() inserts whatever value you give it at the end of the list. i for i in list if i not in new_list is a generator object, and append inserts that generator itself as a single element of the list. list.extend() does what you want: it takes an iterable and appends all of its elements to the list.
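A small sketch of the difference:
numbers = [1, 1, 2, 3, 5, 8, 13]

new_list = []
new_list.append(i for i in numbers)  # appends the generator object itself
print(new_list)                      # [<generator object ...>]

new_list = []
new_list.extend(i for i in numbers)  # consumes the generator, appending each item
print(new_list)                      # [1, 1, 2, 3, 5, 8, 13]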
My problem is about managing insert/append methods within loops.
I have two lists of length N: the first one (let's call it s) indicates the subset each element belongs to, while the second one represents a quantity x that I want to evaluate. For the sake of simplicity, let's say every subset contains T elements.
cont = 0
for i in range(NSUBSETS):
    for j in range(T):
        subcont = 0
        if x[(i*T)+j] < 100:
            s.insert(((i+1)*T)+cont, s[(i*T)+j+cont])
            x.insert(((i+1)*T)+cont, x[(i*T)+j+cont])
            subcont += 1
        cont += subcont
While cycling over all the elements of the two lists, I'd like a copy of an element to be placed at the end of its subset whenever a certain condition is fulfilled (e.g. x[i] < 100), and then to go on with the loop until all the original members of the subset have been analyzed. It is important to maintain the order, i.e. to insert each copy right after the last element of the subset it comes from.
I thought a way could have been to store within 2 counter variables the number of copies made within the subset and globally, respectively (see code): this way, I could shift the index of the element I was looking at according to that. I wonder whether there exists some simpler way to do that, maybe using some Python magic.
If the idea is to interpolate your extra copies into the lists without making a complete copy of the whole list, you can try this with a generator expression. As you loop through your lists, collect the matches you want to append. Yield each item as you process it, then yield each collected item too.
This is a simplified example with only one list, but hopefully it illustrates the idea. You only get a full copy if you do as I've done here and expand the generator with a comprehension. If you just wanted to store or further analyze the processed list (e.g., to write it to disk), you would never need to hold it all in memory.
def append_matches(input_list, start, end, predicate):
    # predicate is a filter function or lambda
    for item in input_list[start:end]:
        yield item
    for item in filter(predicate, input_list[start:end]):
        yield item
example = lambda p: p < 100
data = [1,2,3,101,102,103,4,5,6,104,105,106]
print([k for k in append_matches(data, 0, 6, example)])
print([k for k in append_matches(data, 5, 11, example)])
[1, 2, 3, 101, 102, 103, 1, 2, 3]
[103, 4, 5, 6, 104, 105, 4, 5, 6]
I'm guessing that your desire not to copy the lists comes from your C background: an assumption that copying would be more expensive. In Python, lists are not linked lists; they behave more like vectors (dynamic arrays), so insert is O(n), and each insert shifts all the trailing elements.
Building a new copy with the extra elements would be more efficient than trying to update in place. If you really want to go that way, you would need to write a LinkedList class holding prev/next references, so that your Python code really was a copy of the C approach.
The most Pythonic approach would not try to do an in-place update, as it is simpler to express what you want using values rather than references:
def expand(origLs):
    subsets = [origLs[i*T:(i+1)*T] for i in range(NSUBSETS)]
    result = []
    for s in subsets:
        copies = [e for e in s if e < 100]
        result += s + copies
    return result
The main thing to keep in mind is that the underlying cost model for an interpreted garbage-collected language is very different to C. Not all copy operations actually cause data movement, and there are no guarantees that trying to reuse the same memory will be successful or more efficient. The only real answer is to try both techniques on your real problem and profile the results.
I'd be inclined to make a copy of your lists and then, while looping across the originals, insert into the copy at the right place whenever you come across an element that meets the criterion. You can then output the copied, updated lists.
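Here's a sketch of that copy-based idea, assuming the question's layout of NSUBSETS subsets of length T stored back to back (the data here is made up for illustration):
T, NSUBSETS = 3, 2
x = [10, 200, 20, 300, 30, 40]
s = ['a', 'a', 'a', 'b', 'b', 'b']

s_out, x_out = [], []
for i in range(NSUBSETS):
    sub_s = s[i*T:(i+1)*T]
    sub_x = x[i*T:(i+1)*T]
    s_out += sub_s
    x_out += sub_x
    # append copies of the matching elements at the end of this subset
    for sv, xv in zip(sub_s, sub_x):
        if xv < 100:
            s_out.append(sv)
            x_out.append(xv)

print(x_out)  # [10, 200, 20, 10, 20, 300, 30, 40, 30, 40]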
I think I have found a simple solution.
I cycle through the subsets from the last one backwards, putting the copies at the end of each subset. This way I avoid encountering the "new" elements, and I get rid of the counters and the like.
for i in range(NSUBSETS-1, -1, -1):
    for j in range(T-1, -1, -1):
        if x[(i*T)+j] < 100:
            s.insert((i+1)*T, s[(i*T)+j])
            x.insert((i+1)*T, x[(i*T)+j])
One possibility would be to use numpy's advanced indexing to give the illusion of copying elements to the ends of the subsets. Build a list of "copy" indices into the original list, add it to an index/slice list representing each subset, then combine all the index lists at the end and use the final index list to access all your items. (I believe there's support for doing this generator-style too, which you may find useful, since advanced indexing/slicing returns a copy rather than a view.) Depending on how many elements meet the copy criterion, this should be decently efficient, since each subset keeps its indices as a slice object, reducing the number of indices you have to track.
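Here's a rough sketch of that index-list idea (illustrative only, with made-up data):
import numpy as np

T, NSUBSETS = 3, 2
x = np.array([10, 200, 20, 300, 30, 40])

indices = []
for i in range(NSUBSETS):
    base = np.arange(i * T, (i + 1) * T)
    copies = base[x[base] < 100]   # indices of the elements to "copy"
    indices.extend(base)
    indices.extend(copies)

print(x[indices])  # advanced indexing returns a new array, not a view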
I have two models with a many-to-many relationship, Group and Individual.
I can access group.individuals and get a list of the related individuals. I have a last_individual_id column on the Group model to keep track of the last used individual. With this information, I was wondering how to get the next individual for a Group.
I thought of getting the ids of the Individuals and using itertools.cycle, but I can't specify the start point. Plus, that may be a slow way to do it if I can just do it properly in SQLAlchemy.
Any thoughts on how to accomplish this? I feel like I will be embarrassed at how simple the answer is... but I didn't have caffeine today!
Thanks
One possibility is using itertools.dropwhile (this may not fully match the scope of your desired solution, which isn't entirely clear; also, itertools.dropwhile doesn't cycle through your list infinitely the way itertools.cycle does).
You didn't provide a lot of detail about your data model. I'll presume that group.individuals returns a list of individual objects and that the .id property holds each individual's id number. For example, say you're looking for individual id 7 within the list of group.individuals:
from itertools import dropwhile
for item in dropwhile(lambda x: x.id != 7, group.individuals):
    print(item)
This will go through the list, dropping items while x.id != 7 (itertools.dropwhile doesn't return anything until the predicate first tests False), and then it will start returning individuals objects from that point on. So if the ids in that list are [1, 19, 5, 6, 7, 2, 4, 5], this returns the objects with ids 7, 2, 4, 5.
For your purposes, it sounds like you would ignore the first returned item and use the ones after it.
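Here's a self-contained version you can run; SimpleNamespace stands in for your Individual model, which I'm assuming exposes an .id attribute:
from itertools import dropwhile
from types import SimpleNamespace

individuals = [SimpleNamespace(id=n) for n in [1, 19, 5, 6, 7, 2, 4, 5]]
for item in dropwhile(lambda x: x.id != 7, individuals):
    print(item.id)  # prints 7, 2, 4, 5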
Another alternative is to iterate through your original list of group.individuals to find the index of the id you want, then just increment the index. This also lets you wrap the index back to 0 if the incremented index runs past the end of the list. For example:
def nextrecord(recordlist, value):
    newindex = 0  # fall back to the first record if value isn't found
    for i in range(len(recordlist)):
        if recordlist[i].id == value:
            newindex = i + 1
            break
    if newindex >= len(recordlist):
        newindex = 0
    return recordlist[newindex]

print(nextrecord(group.individuals, 7))
But, of course, this is more discrete, and isn't iterable. There are probably lots of other ways to approach this (including making my function iterable) but I also haven't had my caffeine yet!
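One more thought on the itertools.cycle idea from your question: you can get a start point by combining cycle with dropwhile. A sketch, assuming each individual has an .id attribute (note it will loop forever if last_id isn't present in the list):
from itertools import cycle, dropwhile

def next_individual(individuals, last_id):
    # cycle makes the sequence wrap around; dropwhile skips ahead to last_id
    it = dropwhile(lambda x: x.id != last_id, cycle(individuals))
    next(it)         # consume the last-used individual itself
    return next(it)  # the one after it, wrapping past the end of the list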