I'm guessing this will be a really simple problem but I have no solution yet.
I have a long code that does modelling and updates values of variables for optimisation. The code is initially written like this:
def init_old(x, y):
    return {k: olddict[k][x][0] * prod[y] for k in realnames}

Q_house = init_old("Q_house", "P_house")
Q_car = init_old("Q_car", "P_car")
Q_holiday = init_old("Q_holiday", "P_holiday")
I already can simplify it a bit with a comprehension:
ListOfExpenses=["house","car","holiday"]
Q_house, Q_car, Q_holiday = [init_old("Q_" + i, "P0_" + i) for i in ListOfExpenses]
I am trying to find an equivalent but more flexible way of writing that final line, so that I change the list of Expenses and the "Q_..." variables together easily:
ListOfExpenses=["house","car","holiday"]
ListOfCost=["Q_house","Q_car","Q_holiday"]
Elements_Of_ListOfCost = [init_old("Q_" + i, "P0_" + i) for i in ListOfExpenses]
So when I look for Q_house, Q_car or Q_holiday later, it returns the same Q_house = init_old("Q_house", "P_house") calculated in the original code.
I don't want to use dictionaries for now as they would require major change to the rest of the code and calling dictionaries causes problems in some of the other functions. Thanks in advance for the help.
I'm using python with the binance.client wrapper. I'm gathering all of the BTC trade pairs from the exchange and wanting to create a simple dict with tradepair: price.
I have figured a way to do this but it seems clunky to me and takes a minute or so to run. I'm currently a programming student just getting started with python as well as some other languages.
Is there a better way to do it than this?
def BTCPair():
    BTCPair = []
    BTCPrice = []
    BTCPairAndPrice = {}
    exchange_info = client.get_exchange_info()
    for s in exchange_info['symbols']:
        if 'BTC' in (s['symbol'])[-3:]:
            BTCPair.append(s['symbol'])
            BTCPrice.append(client.get_avg_price(symbol=s['symbol'])['price'])
    for i in range(len(BTCPair)):
        BTCPairAndPrice[BTCPair[i]] = BTCPrice[i]
    return BTCPairAndPrice
I don't understand why you are using two loops; one to put the data into lists, and another to convert those lists into a dictionary - why not just construct the dictionary directly?
You could just construct your dictionary directly using a comprehension:
BTCPairAndPrice = {
    s['symbol']: client.get_avg_price(symbol=s['symbol'])['price']
    for s in exchange_info['symbols']
    if 'BTC' in (s['symbol'])[-3:]
}
The way the dictionary is constructed is unlikely to have a big impact on performance, but not iterating through all the data twice should have an impact if there is a lot of data.
Also consider that contacting a web service is also likely to take some time, so contacting the exchange might be the slowest part.
First of all, for i in range(len(BTCPair)) is an antipattern. You could instead iterate over those zipped together.
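For illustration, here is roughly what the zipped version looks like. The two lists are made-up sample data standing in for the ones built in the question, so treat this as a sketch rather than the real exchange output:

```python
# Hypothetical sample data in place of the lists from the question.
pairs = ['ETHBTC', 'LTCBTC', 'BNBBTC']
prices = ['0.031', '0.0021', '0.0049']

# Index-based loop (the antipattern):
by_index = {}
for i in range(len(pairs)):
    by_index[pairs[i]] = prices[i]

# zip() pairs the items up directly, with no index bookkeeping:
by_zip = {}
for pair, price in zip(pairs, prices):
    by_zip[pair] = price

# dict() also accepts the zipped pairs as-is.
assert by_index == by_zip == dict(zip(pairs, prices))
```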
But we don't actually need to do that either! Rather than creating two lists and then iterating over them to fill in your dictionary, you could create everything in one go with a dictionary comprehension. Also, a cleaner way to check the end of a string is endswith().
def btc_pair():
    symbols = client.get_exchange_info()['symbols']
    return {
        s['symbol']: client.get_avg_price(symbol=s['symbol'])['price']
        for s in symbols
        if s['symbol'].endswith('BTC')
    }
This probably will run a little bit faster, but I doubt that the dictionary creation itself is the real performance bottleneck in your code.
I am using a package that has operations inside the class (? not sure what either is really), and normally the data is accessed this way: data[package.operation]. Since I have to do multiple operations, I thought of shortening it by doing the following:
list =["o1", "o2", "o3", "o4", "o5", "o6"]
for i in list:
    print data[package.i]
but since it's treating i as a string it doesn't do the operation, and if I take away the quotes then it is an undefined variable. Is there a way around this? Or will I just have to write it the long way?
In particular I am using pymatgen, its package Orbital and with the .operation I want to call specific suborbitals. A real example of how it would be used is data[0][Orbital.s], the first [0] denotes the element in question for which to get the orbitals s (that's why I omitted it in the code above).
You can use getattr in order to dynamically select attributes from objects (the Orbital package in your case; for example getattr(Orbital, 's')).
So your loop would be rewritten to:
for op in ['o1', 'o2', 'o3', 'o4', 'o5', 'o6']:
    print(data[getattr(package, op)])
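A small self-contained sketch of the same idea follows. The Orbital class here is a hypothetical stand-in (a plain Enum with a few invented members), not the real pymatgen class, and the numbers in data are made up:

```python
from enum import Enum

# Hypothetical stand-in for the Orbital class; only the attribute names matter.
class Orbital(Enum):
    s = 0
    py = 1
    pz = 2
    px = 3

# Made-up values keyed by orbital, mimicking data[0][Orbital.s] in the question.
data = {Orbital.s: 0.12, Orbital.py: 0.30, Orbital.pz: 0.25, Orbital.px: 0.33}

for name in ['s', 'py', 'pz']:
    orbital = getattr(Orbital, name)  # same object as Orbital.s, Orbital.py, ...
    print(name, data[orbital])
```

getattr(obj, 'name') is equivalent to writing obj.name, which is why the string from the list can be used to pick the attribute.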
So I'm a longtime perl scripter who's been getting used to python since I changed jobs a few months back. Often in perl, if I had a list of values that I needed to check a variable against (simply to see if there is a match in the list), I found it easier to generate hashes to check against, instead of putting the values into an array, like so:
$checklist{'val1'} = undef;
$checklist{'val2'} = undef;
...
if (exists $checklist{$value_to_check}) { ... }
Obviously this wastes some memory because of the need for a useless right-hand value, but IMO it is more efficient and easier to code than looping through an array.
Now in python, the code for this is exactly the same no matter if you're searching a list or a dictionary:
if value_to_check in checklist_which_can_be_list_or_dict:
    <code>
So my real question here is: in perl, the hash method was preferred for speed of processing vs. iterating through an array, but is this true in python? Given the code is the same, I'm wondering if python does list iteration better? Should I still use the dictionary method for larger lists?
Dictionaries are hashes. An in test on a list has to walk through the elements one by one until it finds a match (all of them if there is none), while an in test on a dictionary uses hashing to see if the key exists. Python just doesn't make you explicitly loop through the list.
Python also has a set datatype. It's basically a hash/dictionary without the right-hand values. If what you want is to be able to build up a collection of things, then test whether something is already in that collection, and you don't care about the order of the things or whether a thing is in the collection multiple times, then a set is exactly what you want!
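As a rough illustration, the sketch below builds the same made-up values as a list and as a set and times a membership test on each. The names and sizes are invented for the example, and the actual timings will vary by machine:

```python
import timeit

# Made-up checklist values; the last one is the worst case for the list scan.
values = ['val{}'.format(i) for i in range(10000)]
checklist_list = values
checklist_set = set(values)

# The membership syntax is identical for both containers.
assert 'val9999' in checklist_list
assert 'val9999' in checklist_set
assert 'missing' not in checklist_set

list_time = timeit.timeit(lambda: 'val9999' in checklist_list, number=200)
set_time = timeit.timeit(lambda: 'val9999' in checklist_set, number=200)
print('list: {:.5f}s  set: {:.5f}s'.format(list_time, set_time))
```

For a handful of values the difference hardly matters; it grows with the size of the collection.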
I have a library that does some "translation" and uses the awesome tokenize.generate_tokens() function to do so.
And it is pretty fast and I have things working correctly. But when translating, I've found that the function keeps growing with new tokens that I want to translate, and the if and elif conditions start to pop up all over. I also keep a few variables outside the generator that keep track of "last keyword seen" and similar.
A good example of this is the actual Python documentation one seen here (at the bottom): http://docs.python.org/library/tokenize.html#tokenize.untokenize
Every time I add a new thing I need to translate, this function grows by a couple of conditionals. I don't think that a function with so many conditionals is the proper way to pave the ground for growth.
Furthermore, I feel that the tokenizer consumes a lot of irrelevant lines that do not contain any of the keywords I am translating.
So 2 questions:
How can I avoid adding more and more conditional statements, so that this translation function stays easy/clean as it keeps growing (without a performance hit)?
How can I make it efficient for all the irrelevant lines I am not interested in?
You could use a dict dispatcher. For example, the code you linked to might look like this:
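# NAME, NUMBER, OP and STRING come from the tokenize module;
# result and g are the list and token generator from the linked example.
def process_number(result, toknum, tokval):
    if '.' in tokval:
        result.extend([
            (NAME, 'Decimal'),
            (OP, '('),
            (STRING, repr(tokval)),
            (OP, ')')
        ])
    else:
        # numbers without a decimal point pass through unchanged
        result.append((toknum, tokval))

def process_default(result, toknum, tokval):
    result.append((toknum, tokval))

dispatcher = {NUMBER: process_number}
for toknum, tokval, _, _, _ in g:
    dispatcher.get(toknum, process_default)(result, toknum, tokval)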
Instead of adding more if-blocks, you add key-value pairs to dispatcher.
This may be more efficient than evaluating a long list of if-else conditionals, since dict lookup is O(1), but it does require a function call. You'll have to benchmark to see how this compares to many if-else blocks.
I think its main advantage is that it keeps code organized in small(er), comprehensible units.
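To show what growing the dispatcher looks like in practice, here is a runnable sketch with two handlers. The process_name rule (renaming the identifier price to cost) and the translate wrapper are hypothetical, invented only to illustrate adding a second entry:

```python
import io
from decimal import Decimal
from tokenize import NAME, NUMBER, OP, STRING, generate_tokens, untokenize

def process_number(result, toknum, tokval):
    # Wrap float literals in Decimal('...'); pass other numbers through.
    if '.' in tokval:
        result.extend([(NAME, 'Decimal'), (OP, '('), (STRING, repr(tokval)), (OP, ')')])
    else:
        result.append((toknum, tokval))

def process_name(result, toknum, tokval):
    # Hypothetical second rule: rename the identifier 'price' to 'cost'.
    result.append((toknum, 'cost' if tokval == 'price' else tokval))

def process_default(result, toknum, tokval):
    result.append((toknum, tokval))

# A new translation rule is one more entry here, not one more elif.
dispatcher = {NUMBER: process_number, NAME: process_name}

def translate(source):
    result = []
    for toknum, tokval, _, _, _ in generate_tokens(io.StringIO(source).readline):
        dispatcher.get(toknum, process_default)(result, toknum, tokval)
    return untokenize(result)

translated = translate('price * 1.5 + 2\n')
print(translated)
print(eval(translated, {'Decimal': Decimal, 'cost': 4}))
```

State such as "last keyword seen" could be carried on an object whose methods serve as the handlers, but that is a design choice beyond this sketch.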
I have an Excel CSV file with employee records in it. Something like this:
mail,first_name,surname,employee_id,manager_id,telephone_number
blah#blah.com,john,smith,503422,503423,+65(2)3423-2433
foo#blah.com,george,brown,503097,503098,+65(2)3423-9782
....
I'm using DictReader to put this into a nested dictionary:
import csv
gd_extract = csv.DictReader(open('filename 20100331 original.csv'), dialect='excel')
employees = dict([(row['employee_id'], row) for row in gd_extract])
Is the above the proper way to do it - it does work, but is it the Right Way? Something more efficient? Also, the funny thing is, in IDLE, if I try to print out "employees" at the shell, it seems to cause IDLE to crash (there's approximately 1051 rows).
2. Remove employee_id from inner dict
The second issue: I'm putting it into a dictionary indexed by employee_id, with the value as a nested dictionary of all the values. However, employee_id is also a key:value pair inside the nested dictionary, which is a bit redundant. Is there any way to exclude it from the inner dictionary?
3. Manipulate data in comprehension
Thirdly, we need to do some manipulations to the imported data. For example, all the phone numbers are in the wrong format, so we need to do some regex there. Also, we need to convert manager_id to an actual manager's name and their email address. Most managers are in the same file, while others are in an external_contractors CSV, which is similar but not quite the same format. I can import that to a separate dict though.
Are these two items things that can be done within the single list comprehension, or should I use a for loop? Or do multiple comprehensions work? (Sample code would be really awesome here.) Or is there a smarter way in Python to do it?
Cheers,
Victor
Your first part has one simple issue (which might not even be an issue). You don't handle key collisions at all (unless you intend to simply overwrite).
>>> dict([('a', 'b'), ('a', 'c')])
{'a': 'c'}
If you're guaranteed that employee_id is unique, there isn't an issue though.
2) Sure you can exclude it, but no real harm done. Actually, especially in python, if employee_id is a string or int (or some other primitive), the inner dict's reference and the key actually reference the same thing. They both point to the same spot in memory. The only duplication is in the reference (which isn't that big). If you're worried about memory consumption, you probably don't have to be.
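If you do want it gone, one possible approach is to pop the key while building the outer dictionary. This is only a sketch: the CSV text below is made-up sample data in the same shape as the file in the question, read from a string instead of a file.

```python
import csv
import io

# Made-up sample in the same shape as the CSV in the question.
sample = io.StringIO(
    "mail,first_name,surname,employee_id,manager_id,telephone_number\n"
    "john@example.com,john,smith,503422,503423,+65(2)3423-2433\n"
    "george@example.com,george,brown,503097,503098,+65(2)3423-9782\n"
)

employees = {}
for row in csv.DictReader(sample, dialect='excel'):
    row = dict(row)
    # pop() removes the key from the inner dict and returns its value.
    employees[row.pop('employee_id')] = row

print(employees['503422'])
```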
3) Don't try to do too much in one list comprehension. Just use a for loop after the first list comprehension.
To sum it all up, it sounds like you're really worried about the performance of iterating over the loop twice. Don't worry about performance initially. Performance problems come from algorithm problems, not specific language constructs like for loops vs list comprehensions.
If you're familiar with Big O notation, the list comprehension and for loop after (if you decide to do that) both have a Big O of O(n). Add them together and you get O(2n), but as we know from Big O notation, we can simplify that to O(n). I've over simplified a lot here, but the point is, you really don't need to worry.
If there are performance concerns, raise them after you've written the code, and prove it to yourself with a code profiler.
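As one way to do that, here is a minimal sketch using the standard library's cProfile and pstats. The build_index function and the generated rows are hypothetical stand-ins for the real import code:

```python
import cProfile
import io
import pstats

def build_index(rows):
    # Stand-in for the dictionary-building step being measured.
    return {row['employee_id']: row for row in rows}

# Made-up rows, roughly the size mentioned in the question.
rows = [{'employee_id': str(i), 'surname': 'x'} for i in range(1000)]

profiler = cProfile.Profile()
profiler.enable()
employees = build_index(rows)
profiler.disable()

# Collect the report in a string, slowest cumulative time first.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(5)
print(report.getvalue())
```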
response to comments
As for your #2 reply, python really doesn't have a lot of mechanisms for making one liners cute and extra snazzy. It's meant to force you into simply writing the code out vs sticking it all in one line. That being said, it's still possible to do quite a bit of work in one line. My suggestion is to not worry about how much code you can stick in one line. Python looks a lot more beautiful (IMO) when it's written out, not jammed in one line.
As for your #1 reply, you could try something like this:
employees = {}
for row in gd_extract:
    if row['employee_id'] in employees:
        pass  # ... handle duplicates in employees dictionary ...
    else:
        employees[row['employee_id']] = row
As for your #3 reply, not sure what you're looking for and what about the telephone numbers you'd like to fix, but... this may give you a start:
import re
retelephone = re.compile(r'[-\(\)\s]')  # remove dashes, open/close parens, and spaces
for empid, row in employees.items():
    # sub() returns a new string, so store the result back in the row
    row['telephone_number'] = retelephone.sub('', row['telephone_number'])
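Building on that, a single for loop can handle both the phone cleanup and the manager lookup. The records below are made up, and manager_name is a hypothetical new key added for illustration; managers missing from this dict (for example, external contractors) are simply left as None here:

```python
import re

# Made-up records keyed by employee_id, as built earlier in the thread.
employees = {
    '503422': {'first_name': 'john', 'surname': 'smith', 'manager_id': '503423',
               'telephone_number': '+65(2)3423-2433'},
    '503423': {'first_name': 'mary', 'surname': 'jones', 'manager_id': '',
               'telephone_number': '+65 (2) 3423-0001'},
}

retelephone = re.compile(r'[-()\s]')  # dashes, parens, and whitespace

for empid, row in employees.items():
    row['telephone_number'] = retelephone.sub('', row['telephone_number'])
    # .get() returns None when the manager is not in this file.
    manager = employees.get(row['manager_id'])
    row['manager_name'] = (
        '{} {}'.format(manager['first_name'], manager['surname']) if manager else None
    )

print(employees['503422'])
```

A second dict built from the contractors file could be consulted as a fallback where the lookup returns None.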