I'm using Python with the binance.client wrapper. I'm gathering all of the BTC trade pairs from the exchange and want to create a simple dict of tradepair: price.
I have figured out a way to do this, but it seems clunky to me and takes a minute or so to run. I'm a programming student just getting started with Python as well as some other languages.
Is there a better way to do it than this?
def BTCPair():
    BTCPair = []
    BTCPrice = []
    BTCPairAndPrice = {}
    exchange_info = client.get_exchange_info()
    for s in exchange_info['symbols']:
        if 'BTC' in (s['symbol'])[-3:]:
            BTCPair.append(s['symbol'])
            BTCPrice.append(client.get_avg_price(symbol=s['symbol'])['price'])
    for i in range(len(BTCPair)):
        BTCPairAndPrice[BTCPair[i]] = BTCPrice[i]
    return BTCPairAndPrice
I don't understand why you are using two loops; one to put the data into lists, and another to convert those lists into a dictionary - why not just construct the dictionary directly?
You could just construct your dictionary directly using a comprehension:
BTCPairAndPrice = {
    s['symbol']: client.get_avg_price(symbol=s['symbol'])['price']
    for s in exchange_info['symbols']
    if 'BTC' in (s['symbol'])[-3:]
}
The way the dictionary is constructed is unlikely to have a big impact on performance, but not iterating through all the data twice should have an impact if there is a lot of data.
Also consider that contacting a web service is likely to take some time, so the calls to the exchange might well be the slowest part.
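One concrete way to cut the request count: the original code makes one get_avg_price call per symbol. If your wrapper offers a bulk ticker endpoint - python-binance has get_all_tickers(), for example - you can fetch every price in a single request. A sketch, with the caveat that get_all_tickers() returns the last traded price rather than the average price, so check that this is acceptable for your use case:

def btc_pairs():
    # One request for all symbols, then filter client-side.
    # Assumes python-binance's get_all_tickers(); verify your wrapper has it.
    return {
        t['symbol']: t['price']
        for t in client.get_all_tickers()
        if t['symbol'].endswith('BTC')
    }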
First of all, for i in range(len(BTCPair)) is an antipattern. You could instead iterate over those zipped together.
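For example, a sketch of the zip version of your second loop:

# Pair each symbol with its price directly instead of indexing by position.
BTCPairAndPrice = dict(zip(BTCPair, BTCPrice))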
But we don't actually need to do that either! Rather than creating two lists and then iterating over them to fill in your dictionary, you could create everything in one go with a dictionary comprehension. Also, a cleaner way to check the end of a string is endswith().
def btc_pair():
    symbols = client.get_exchange_info()['symbols']
    return {
        s['symbol']: client.get_avg_price(symbol=s['symbol'])['price']
        for s in symbols
        if s['symbol'].endswith('BTC')
    }
This probably will run a little bit faster, but I doubt that the dictionary creation itself is the real performance bottleneck in your code.
I'm guessing this will be a really simple problem, but I have no solution yet.
I have a long piece of code that does modelling and updates the values of variables for optimisation. The code is initially written like this:
def init_old(x, y):
    return {k: olddict[k][x][0] * prod[y] for k in realnames}

Q_house = init_old("Q_house", "P_house")
Q_car = init_old("Q_car", "P_car")
Q_holiday = init_old("Q_holiday", "P_holiday")
I already can simplify it a bit with a comprehension:
ListOfExpenses = ["house", "car", "holiday"]
Q_house, Q_car, Q_holiday = [init_old("Q_" + i, "P_" + i) for i in ListOfExpenses]
I am trying to find an equivalent but more flexible way of writing that final line, so that I change the list of Expenses and the "Q_..." variables together easily:
ListOfExpenses = ["house", "car", "holiday"]
ListOfCost = ["Q_house", "Q_car", "Q_holiday"]
Elements_Of_ListOfCost = [init_old("Q_" + i, "P_" + i) for i in ListOfExpenses]
So when I look for Q_house, Q_car or Q_holiday later, it returns the same Q_house = init_old("Q_house", "P_house") calculated in the original code.
I don't want to use dictionaries for now as they would require major change to the rest of the code and calling dictionaries causes problems in some of the other functions. Thanks in advance for the help.
I am writing a program and need to use this function to add Dependencies to an xml file. This code works, but I would like to ask if there is a more pythonic way to do so.
The part I believe I am doing in a non-pythonic way is the nested for loop. Is there a better way to iterate over a list of dictionaries and each of their values?
def add_Dependencies(self):
    """ Adds the dependencies in a feature using dictionaries. When
    a feature is loaded, its dependencies are added to
    dictionaries. Three for each type of software that the
    dependencies are categorized as."""
    dependency_dict_list = [
        self.os_dict, self.visual_dict, self.audio_dict
    ]
    dependencies = self.dependencies
    for dictionary in dependency_dict_list:
        for feature, software in dictionary.items():
            if all(dependency.text != feature for dependency in dependencies):
                etree.SubElement(dependencies, "Dependency", Software=software).text = feature
You can use collections.ChainMap to merge your three dictionaries into a single dictionary-like mapping. Or, since you don't care about the values for the duplicate check, you could merge their keys into a set.
It's not very important to use this to avoid the nested loops. Nested loops can be perfectly Pythonic, though you might want to factor some inner bits out into functions if the nesting gets too deep.
The real reason you might want to use a ChainMap or set here is to avoid the O(N**2) runtime complexity of searching your XML tree to eliminate duplicate dependencies. That they also eliminate a level of nesting is a minor side benefit.
Try something like this:
from collections import ChainMap

all_software = ChainMap(self.os_dict, self.visual_dict, self.audio_dict)  # merged view
new_dependencies = (set(all_software) -
                    set(dependency.text for dependency in self.dependencies))
for feature in new_dependencies:
    etree.SubElement(self.dependencies, "Dependency",
                     Software=all_software[feature]).text = feature
Honestly, there's nothing wrong with that at all. I put this together in case you're interested; it should be a little faster and is a tad less verbose. I cut out the nested for loop and flipped the all to an any (it reads a little cleaner to me, but that's really just taste).
all_dependencies = dict(self.os_dict, **self.visual_dict)
all_dependencies.update(self.audio_dict)
for feature, software in all_dependencies.items():
    if not any(dependency.text == feature for dependency in self.dependencies):
        etree.SubElement(self.dependencies, "Dependency", Software=software).text = feature
I have a dict that has unix epoch timestamps for keys, like so:
lookup_dict = {
    1357899: {},  # some dict of data
    1357910: {},  # some other dict of data
}
Except, you know, millions and millions and millions of entries. I'd like to subset this dict, over and over again. Ideally, I'd love to be able to write something like I can in R, like:
lookup_value = 1357900
dict_subset = lookup_dict[key >= lookup_value]
# dict_subset now contains {1357910: {}}
But I confess I can't find any actual proof that this is something Python can do without, one way or the other, iterating over every entry. If I understand Python correctly (and I might not), key lookup of the form key in dict uses binary search, and is thus very fast; is there any way to do a binary search on dict keys?
To do this without iterating, you're going to need the keys in sorted order. Then you just need to do a binary search for the first one >= lookup_value, instead of checking each one for >= lookup_value.
If you're willing to use a third-party library, there are plenty out there. The first two that spring to mind are bintrees (which uses a red-black tree, like C++, Java, etc.) and blist (which uses a B+Tree). For example, with bintrees, it's as simple as this:
dict_subset = lookup_dict[lookup_value:]
And this will be as efficient as you'd hope—basically, it adds a single O(log N) search on top of whatever the cost of using that subset. (Of course usually what you want to do with that subset is iterate the whole thing, which ends up being O(N) anyway… but maybe you're doing something different, or maybe the subset is only 10 keys out of 1000000.)
Of course there is a tradeoff. Random access to a tree-based mapping is O(log N) instead of "usually O(1)". Also, your keys obviously need to be fully ordered, instead of hashable (and that's a lot harder to detect automatically and raise nice error messages on).
If you want to build this yourself, you can. You don't even necessarily need a tree; just a sorted list of keys alongside a dict. You can maintain the list with the bisect module in the stdlib, as JonClements suggested. You may want to wrap up bisect to make a sorted list object—or, better, get one of the recipes on ActiveState or PyPI to do it for you. You can then wrap the sorted list and the dict together into a single object, so you don't accidentally update one without updating the other. And then you can extend the interface to be as nice as bintrees, if you want.
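A minimal sketch of that approach, using only the stdlib (subset_from is an illustrative name, not a library function):

import bisect

sorted_keys = sorted(lookup_dict)  # keep this list in sync with the dict on every insert/delete

def subset_from(lookup_value):
    # bisect_left finds the index of the first key >= lookup_value in O(log N)
    i = bisect.bisect_left(sorted_keys, lookup_value)
    return {k: lookup_dict[k] for k in sorted_keys[i:]}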
The following code will work:
some_time_to_filter_for = 1357900  # e.g. the lookup value from the question
# Create a new sub-dictionary
sub_dict = {key: val for key, val in lookup_dict.items()
            if key >= some_time_to_filter_for}
Basically, we just iterate through all the keys in the dictionary and, given a time to filter for, take every key that is greater than or equal to that value and place it into the new dictionary.
I have Excel CSV files with employee records in them. Something like this:
mail,first_name,surname,employee_id,manager_id,telephone_number
blah@blah.com,john,smith,503422,503423,+65(2)3423-2433
foo@blah.com,george,brown,503097,503098,+65(2)3423-9782
....
I'm using DictReader to put this into a nested dictionary:
import csv
gd_extract = csv.DictReader(open('filename 20100331 original.csv'), dialect='excel')
employees = dict([(row['employee_id'], row) for row in gd_extract])
Is the above the proper way to do it? It does work, but is it the Right Way? Is there something more efficient? Also, the funny thing is, in IDLE, if I try to print out "employees" at the shell, it seems to cause IDLE to crash (there are approximately 1051 rows).
2. Remove employee_id from inner dict
The second issue: I'm putting it into a dictionary indexed by employee_id, with the value as a nested dictionary of all the values. However, employee_id is then also a key:value pair inside the nested dictionary, which is a bit redundant. Is there any way to exclude it from the inner dictionary?
3. Manipulate data in comprehension
Thirdly, we need to do some manipulation of the imported data - for example, all the phone numbers are in the wrong format, so we need to do some regex there. Also, we need to convert manager_id to an actual manager's name and their email address. Most managers are in the same file, while others are in an external_contractors CSV, which is similar but not quite the same format - I can import that into a separate dict, though.
Are these two items things that can be done within the single list comprehension, or should I use a for loop? Or do multiple comprehensions work? (Sample code would be really awesome here.) Or is there a smarter way to do it in Python?
Cheers,
Victor
Your first part has one simple issue (which might not even be an issue): you don't handle key collisions at all (unless you intend to simply overwrite).
>>> dict([('a', 'b'), ('a', 'c')])
{'a': 'c'}
If you're guaranteed that employee_id is unique, there isn't an issue though.
2) Sure, you can exclude it, but there's no real harm in keeping it. In Python, if employee_id is a string or int (or some other primitive), the inner dict's value and the outer key actually reference the same object; they both point to the same spot in memory. The only duplication is the reference itself (which isn't that big). If you're worried about memory consumption, you probably don't need to be.
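That said, if you do want to drop it, a dict comprehension along these lines should work (a sketch reusing the names from the question):

# Map employee_id to a copy of the row without the employee_id field.
employees = {
    row['employee_id']: {k: v for k, v in row.items() if k != 'employee_id'}
    for row in gd_extract
}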
3) Don't try to do too much in one list comprehension. Just use a for loop after the first list comprehension.
To sum it all up, it sounds like you're really worried about the performance of iterating over the data twice. Don't worry about performance initially. Performance problems come from algorithmic problems, not specific language constructs like for loops vs. list comprehensions.
If you're familiar with Big O notation, the list comprehension and for loop after (if you decide to do that) both have a Big O of O(n). Add them together and you get O(2n), but as we know from Big O notation, we can simplify that to O(n). I've over simplified a lot here, but the point is, you really don't need to worry.
If there are performance concerns, raise them after you've written the code, and prove them to yourself with a code profiler.
response to comments
As for your #2 reply, Python really doesn't have a lot of mechanisms for making one-liners cute and extra snazzy. It's meant to force you into simply writing the code out vs. sticking it all in one line. That being said, it's still possible to do quite a bit of work in one line. My suggestion is to not worry about how much code you can stick in one line. Python looks a lot more beautiful (IMO) when it's written out, not jammed into one line.
As for your #1 reply, you could try something like this:
employees = {}
for row in gd_extract:
    if row['employee_id'] in employees:
        ... handle duplicates in employees dictionary ...
    else:
        employees[row['employee_id']] = row
As for your #3 reply, I'm not sure what you're looking for or what it is about the telephone numbers you'd like to fix, but this may give you a start:
import re

retelephone = re.compile(r'[-\(\)\s]')  # remove dashes, open/close parens, and spaces
for empid, row in employees.iteritems():
    # sub() returns a new string, so assign the result back into the row
    row['telephone_number'] = retelephone.sub('', row['telephone_number'])
I'm working through some tutorials on Python and am at a position where I am trying to decide what data type/structure to use in a certain situation.
I'm not clear on the differences between arrays, lists, dictionaries and tuples.
How do you decide which one is appropriate? My current understanding doesn't let me distinguish between them at all - they seem to be the same thing.
What are the benefits/typical use cases for each one?
How do you decide which data type to use? Easy:
You look at which are available and choose the one that does what you want. And if there isn't one, you make one.
In this case a dict is a pretty obvious solution.
Tuples first. These are list-like things that cannot be modified. Because the contents of a tuple cannot change, you can use a tuple as a key in a dictionary. That's the most useful place for them in my opinion. For instance if you have a list like item = ["Ford pickup", 1993, 9995] and you want to make a little in-memory database with the prices you might try something like:
db = {}
ikey = (item[0], item[1])  # a tuple literal; tuple() takes a single iterable argument
idata = item[2]
db[ikey] = idata
Lists seem to be like arrays or vectors in other programming languages, and are usually used for the same types of things in Python. However, they are more flexible in that you can put different types of things into the same list. Generally, they are the most flexible data structure, since you can put a whole list into a single element of another list, but for real data crunching they may not be efficient enough.
a = [1,"fred",7.3]
b = []
b.append(1)
b[0] = "fred"
b.append(a) # now the second element of b is the whole list a
Dictionaries are often used a lot like lists, but now you can use any immutable thing as the index to the dictionary. However, unlike lists, dictionaries don't have a natural order and can't be sorted in place. Of course you can create your own class that incorporates a sorted list and a dictionary in order to make a dict behave like an Ordered Dictionary. There are examples on the Python Cookbook site.
c = {}
d = ("ford pickup",1993)
c[d] = 9995
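(As an aside: newer Pythons ship an ordered mapping in the standard library as collections.OrderedDict, available since 2.7/3.1, so you often don't need to roll your own. Note that it preserves insertion order rather than sort order. A quick sketch with made-up sample data:)

from collections import OrderedDict  # stdlib since Python 2.7 / 3.1

od = OrderedDict()
od[("ford pickup", 1993)] = 9995
od[("honda civic", 1998)] = 7995
# iterating od yields the keys in insertion order, unlike a plain pre-3.7 dict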
Arrays are getting closer to the bit level for when you are doing heavy duty data crunching and you don't want the frills of lists or dictionaries. They are not often used outside of scientific applications. Leave these until you know for sure that you need them.
Lists and Dicts are the real workhorses of Python data storage.
The best type for counting elements like this is usually a defaultdict:
from collections import defaultdict

s = 'asdhbaklfbdkabhvsdybvailybvdaklybdfklabhdvhba'
d = defaultdict(int)
for c in s:
    d[c] += 1

print d['a']  # prints 7
Do you really require speed/efficiency? Then go with a pure and simple dict.
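For comparison, the plain-dict version of the same count looks like this (a sketch reusing s from the example above):

d = {}
for c in s:
    d[c] = d.get(c, 0) + 1  # .get supplies 0 for characters not seen yet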
Personal:
I mostly work with lists and dictionaries.
It seems that this satisfies most cases.
Sometimes:
Tuples can be helpful if you want to pair/match elements. Besides that, I don't really use them.
However:
I write high-level scripts that don't need to drill down into core "efficiency", where every byte and every nanosecond matters. I don't believe most people need to drill this deep.