I am trying to write a Python function that takes two lists as input: one containing the SMILES codes of some molecules and another containing the molecule names.
Then it calculates the Tanimoto coefficient between all pairs of molecules (I already have a function for this) and returns two new lists with the SMILES and names, respectively, of all molecules whose Tanimoto coefficient with any other molecule is not higher than a certain threshold.
This is what I have done so far, but it gives wrong results (most of the molecules I get are almost the same...):
def TanimotoFilter(molist,namelist,threshold):
    # molist is the smiles list
    # namelist is the name list (SURPRISE!) is this some global variable name?
    # threshold is the tanimoto threshold (SURPRISE AGAIN!)
    smilesout=[]
    names=[]
    tans=[]
    exclude=[]
    for i in range(1,len(molist)):
        if i not in exclude:
            smilesout.append(molist[i])
            names.append(namelist[i])
            for j in range(i,len(molist)):
                if i==j:
                    tans.append('SAME')
                else:
                    tanimoto=tanimoto_calc(molist[i],molist[j])
                    if tanimoto>threshold:
                        exclude.append(j)
                        #print 'breaking for '+str(i)+' '+str(j)
                        break
                    else:
                        tans.append(tanimoto)
    return smilesout, names, tans
I'd be very thankful if the modifications you propose are as basic as possible, as this code is for people who have scarcely seen a terminal in their lives... It doesn't matter if it is full of loops that make it slow.
Thank you all!
I have made some changes to the logic of the function. As mentioned in the question, it returns two lists with the SMILES and names. I am not sure about the purpose of tans, since a Tanimoto value belongs to a pair of molecules and not to a single molecule. I could not test the code without data, so let me know if this works.
def TanimotoFilter(molist, namelist, threshold):
    # molist is the smiles list
    # namelist is the name list (SURPRISE!) is this some global variable name?
    # threshold is the tanimoto threshold (SURPRISE AGAIN!)
    smilesout=[]
    names=[]
    tans=[]
    exclude=[]
    for i in range(0, len(molist)):
        if i not in exclude:
            temp_exclude = []
            for j in range(i + 1, len(molist)):
                tanimoto = tanimoto_calc(molist[i], molist[j])
                if tanimoto > threshold:
                    temp_exclude.append(j)
            if temp_exclude:
                temp_exclude.append(i)
                exclude.extend(temp_exclude)
            else:
                smilesout.append(molist[i])
                names.append(namelist[i])
    return smilesout, names
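One caveat with the loop above: a molecule that is already in exclude is never used as the reference molecule i, so a molecule that is similar only to an excluded one can still end up in the output. If the requirement is taken literally (keep a molecule only if its Tanimoto coefficient with every other molecule is at or below the threshold), a plain double loop over all pairs may be simpler. The following is only a sketch: the similarity argument stands for the question's tanimoto_calc, and toy_similarity is a made-up stand-in used purely so the example runs without a chemistry library.

```python
def tanimoto_filter_strict(molist, namelist, threshold, similarity):
    """Keep only molecules whose similarity to EVERY other molecule
    is not higher than threshold.

    similarity is any function that takes two SMILES strings and
    returns a number (in the question this would be tanimoto_calc).
    """
    smilesout = []
    names = []
    for i in range(len(molist)):
        too_similar = False
        for j in range(len(molist)):
            # Compare molecule i against every other molecule j.
            if i != j and similarity(molist[i], molist[j]) > threshold:
                too_similar = True
                break
        if not too_similar:
            smilesout.append(molist[i])
            names.append(namelist[i])
    return smilesout, names


def toy_similarity(a, b):
    """Hypothetical stand-in for tanimoto_calc (NOT real chemistry):
    Jaccard/Tanimoto index on the sets of characters of the two strings."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / float(len(sa | sb))


smiles = ["CCO", "CCN", "c1ccccc1"]
labels = ["ethanol", "ethylamine", "benzene"]
print(tanimoto_filter_strict(smiles, labels, 0.3, toy_similarity))
# → (['c1ccccc1'], ['benzene'])
```

This calls the similarity function for every ordered pair, so it is slow for long lists, but the question says speed does not matter and the logic stays easy to follow.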
Update: Apparently, I noticed that in my main code, when I extract the values from the list of dictionaries that I get from readExpenses.py, I store it as a set, not as a list of dictionaries.
Now, I know that I store each dictionary in the 'exp' list with these lines of code:
for e in expenses:
    exp.append(e)
However, I only want the keys Amount and Type from those dictionaries, not the other entries.
For reference, here is the list of keys in an expense dictionary:
"Date","Description","Type","Check Number","Amount","Balance"
As mentioned before, I only need Type and Amount.
I am trying to make a budget program, so I have this list of dictionaries:
[{'Bills': 30.0}, {'Bills': 101.53}, {'Bills': 60.0}, {'Bills': 52.45}, {'Gas': 51.17}, {500.0: 'Mortgage'}, {'Food': 5.1}]
And I'm trying to compare it to this list of dictionaries:
[{400.0: 'Bills'}, {'Gas': 100.0}, {500.0: 'Mortgage'}, {'Food': 45.0}]
The first list is how much money I spent on different services in a given month and what category each expense was in; the second list is the maximum amount that the budget allows me to spend on each category.
The goal is to combine, in the first list, all the values that share the same key into one key:value pair, and then compare the result to the second list.
So I should get this list of dictionaries out of the first one:
[{'Bills': 295.15}, {'Gas': 51.17}, {500.0: 'Mortgage'}, {'Food': 5.1}]
I tried looking at this example and this one, but they are just about merging the dictionaries lists together, and not summing the values of the same key. I did try the code in the latter, but it only joined the dictionaries together. I noticed that sum only seems to work with "raw" dictionaries, and not with lists of dictionaries.
I did try this as a thought experiment:
print(sum(item['amount'] for item in exp))
I know that would sum up all the numbers under amount rather than return a number for each category, but I wanted to try it out for the heck of it to see if it would lead to a solution. I got this error in return:
TypeError: 'set' object is not subscriptable
The Counter function seemed to show promise as a solution as well when I was messing around, however, it seems to only work with dictionaries that are on their own, and not with list of dictionaries.
#where exp is the first dictionary that I mentioned
a = Counter(exp)
b = Counter(exp)
c = a + b #I'm aware the math would have be faulty on this, but this was a test run
print (c)
This attempt returned this error:
TypeError: unhashable type: 'set'
Also, is there a way to do it without importing the collections module and using what comes with python as well?
My code:
from readExpense import *
from budget import *
from collections import *
#Returns the expenses by expenses type
def expensesByType(expenses, budget):
    exp = []
    expByType = []
    bud = []
    for e in expenses:
        entry = {e['exptype'], e['amount']}
        exp.append(entry)
    for b in budget:
        entry = {b['exptype'], b['maxamnt']}
        bud.append(entry)
    return expByType;
def Main():
    budget = readBudget("budget.txt")
    #printBudget(budget)
    expenses = readExpenses("expenses.txt")
    #printExpenses(expenses)
    expByType = expensesByType(expenses, budget)
if __name__ == '__main__':
    Main()
And for reference, the code from budget and readexpense respectively.
budget.py
def readBudget(budgetFile):
    # Read the file into list lines
    f = open(budgetFile)
    lines = f.readlines()
    f.close()
    budget = []
    # Parse the lines
    for i in range(len(lines)):
        list = lines[i].split(",")
        exptype = list[0].strip('" \n')
        if exptype == "Type":
            continue
        maxamount = list[1].strip('$" \n\r')
        entry = {'exptype':exptype, 'maxamnt':float(maxamount)}
        budget.append(entry)
    return budget
def printBudget(budget):
    print()
    print("================= BUDGET ==================")
    print("Type".ljust(12), "Max Amount".ljust(12))
    total = 0
    for b in budget:
        print(b['exptype'].ljust(12), str("$%0.2f" %b['maxamnt']).ljust(50))
        total = total + b['maxamnt']
    print("Total: ", "$%0.2f" % total)
def Main():
    budget = readBudget("budget.txt")
    printBudget(budget)
if __name__ == '__main__':
    Main()
readExpense.py
def readExpenses(file):
    #read file into list of lines
    #split lines into fields
    # for each list create a dictionary
    # add dictionary to expense list
    #return expenses in a list of dictionary with fields
    # date desc, exptype checknm, amnt
    f = open(file)
    lines=f.readlines()
    f.close()
    expenses = []
    for i in range(len(lines)):
        list = lines[i].split(",")
        date = list[0].strip('" \n')
        if date == "Date":
            continue
        description = list[1].strip('" \n\r')
        exptype= list[2].strip('" \n\r')
        checkNum = list[3].strip('" \n\r')
        amount = list[4].strip('($)" \n\r')
        balance = list[5].strip('" \n\r')
        entry ={'date':date, 'description': description, 'exptype':exptype, 'checkNum':checkNum, 'amount':float(amount), 'balance': balance}
        expenses.append(entry)
    return expenses
def printExpenses(expenses):
    #print expenses
    print()
    print("================= Expenses ==================")
    print("Date".ljust(12), "Description".ljust(12), "Type".ljust(12),"Check Number".ljust(12), "Amount".ljust(12), "Balance".ljust(12))
    total = 0
    for e in expenses:
        print(str(e['date']).ljust(12), str(e['description']).ljust(12), str(e['exptype']).ljust(12), str(e['checkNum']).ljust(12), str(e['amount']).ljust(12))
        total = total + e['amount']
    print()
    print("Total: ", "$%0.2f" % total)
def Main():
    expenses = readExpenses("expenses.txt")
    printExpenses(expenses)
if __name__ == '__main__':
    Main()
Is there a reason you're avoiding creating some objects to manage this? If it were me, I'd go objects and do something like the following (this is completely untested, there may be typos):
#!/usr/bin/env python3
from datetime import datetime # why python guys, do you make me write code like this??
from operator import itemgetter
class BudgetCategory(object):
    def __init__(self, name, allowance):
        super().__init__()
        self.name = name # string naming this category, e.g. 'Food'
        self.allowance = allowance # e.g. 400.00 this month for Food
        self.expenditures = [] # initially empty list of expenditures you've made
    def spend(self, amount, when=None, description=None):
        ''' Use this to add expenditures to your budget category'''
        timeOfExpenditure = datetime.utcnow() if when is None else when #optional argument for time of expenditure
        record = (amount, timeOfExpenditure, '' if description is None else description) # a named tuple would be better here...
        self.expenditures.append(record) # add to list of expenditures
        self.expenditures.sort(key=itemgetter(1)) # keep them sorted by date for the fun of it
    # Very tempting to turn both of the following into @property decorated functions, but let's swallow only so much today, huh?
    def totalSpent(self):
        return sum(t[0] for t in self.expenditures)
    def balance(self):
        return self.allowance - self.totalSpent()
Now I can write code that looks like:
budget = BudgetCategory(name='Food', allowance=200)
budget.spend(5)
budget.spend(8)
print('total spent:', budget.totalSpent())
print('left to go:', budget.balance())
This is just a starting point. Now you can add methods that group (and sum) the expenditures list by description (e.g. "I spent HOW MUCH on Twinkies last month???"). You can add a method that parses entries from a file, or emits them to a CSV list. You can do some charting based on time.
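If objects feel like too much for now, the per-category sums asked about in the question can also be built with an ordinary dictionary and no imports from collections. The sketch below assumes the expense and budget entries are dictionaries with the keys 'exptype', 'amount' and 'maxamnt', as produced by readExpenses and readBudget in the question; the function names total_by_type and compare_to_budget are made up for illustration.

```python
def total_by_type(expenses):
    """Sum the 'amount' of all expense dictionaries that share an 'exptype'."""
    totals = {}
    for e in expenses:
        exptype = e['exptype']
        # dict.get returns 0.0 the first time a category is seen
        totals[exptype] = totals.get(exptype, 0.0) + e['amount']
    return totals


def compare_to_budget(totals, budget):
    """Return {exptype: (spent, maxamnt, remaining)} for every budget entry."""
    report = {}
    for b in budget:
        spent = totals.get(b['exptype'], 0.0)
        report[b['exptype']] = (spent, b['maxamnt'], b['maxamnt'] - spent)
    return report


# Small made-up sample in the same shape as the question's data
expenses = [
    {'exptype': 'Bills', 'amount': 30.0},
    {'exptype': 'Bills', 'amount': 60.0},
    {'exptype': 'Gas', 'amount': 51.17},
]
budget = [
    {'exptype': 'Bills', 'maxamnt': 400.0},
    {'exptype': 'Gas', 'maxamnt': 100.0},
    {'exptype': 'Food', 'maxamnt': 45.0},
]
totals = total_by_type(expenses)
for exptype, (spent, maxamnt, remaining) in sorted(compare_to_budget(totals, budget).items()):
    print(exptype, spent, maxamnt, remaining)
```

Note that this keeps whole dictionaries and reads the two needed keys from them. Writing {e['exptype'], e['amount']} with curly braces and a comma builds a set, which is what caused the "'set' object is not subscriptable" error in the question.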
First: you don't have to code this for me, unless you're a super awesome nice guy. But since you're all great at programming and understand it so much better than me and all, it might just be easier (since it's probably not too many lines of code) than writing paragraph after paragraph trying to make me understand it.
So - I need to make a list of high scores that updates itself upon new entries. So here it goes:
First step - done
I have player-entered input, which has been taken as a data for a few calculations:
import time
import datetime
print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
time1 = datetime.datetime.now()
a = raw_input("Enter weight: ")
b = raw_input("Enter height: ")
c = a/b
Second step - making high score list
Here, I would need some sort of a dictionary or a thing that would read the previous entries and check if the score (c) is (at least) better than the score of the last one in "high scores", and if it is, it would prompt you to enter your name.
After you entered your name, it would post your name, your a, b, c, and time in a high score list.
This is what I came up with, and it definitely doesn't work:
list = [("CPU", 200, 100, 2, time1)]
player = "CPU"
a = 200
b = 100
c = 2
time1 = "20.12.2012, 21:38"
list.append((player, a, b, c, time1))
list.sort()
import pickle
scores = open("scores", "w")
pickle.dump(list[-5:], scores)
scores.close()
scores = open("scores", "r")
oldscores = pickle.load(scores)
scores.close()
print oldscores()
I know I did something terribly stupid, but anyways, thanks for reading this and I hope you can help me out with this one. :-)
First, don't use list as a variable name. It shadows the built-in list object. Second, avoid using just plain date strings, since it is much easier to work with datetime objects, which support proper comparisons and easy conversions.
Here is a full example of your code, with individual functions to help divide up the steps. I am trying not to use any more advanced modules or functionality, since you are obviously just learning:
import os
import datetime
import cPickle
# just a constant we can use to define our score file location
SCORES_FILE = "scores.pickle"
def get_user_data():
    time1 = datetime.datetime.now()
    print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
    a = None
    while True:
        a = raw_input("Enter weight: ")
        try:
            a = float(a)
        except ValueError:
            # not a number, ask again
            continue
        else:
            break
    b = None
    while True:
        b = raw_input("Enter height: ")
        try:
            b = float(b)
        except ValueError:
            # not a number, ask again
            continue
        else:
            break
    c = a/b
    return ['', a, b, c, time1]
def read_high_scores():
    # initialize an empty score file if it does
    # not exist already, and return an empty list
    if not os.path.isfile(SCORES_FILE):
        write_high_scores([])
        return []
    # pickle files are safest opened in binary mode
    with open(SCORES_FILE, 'rb') as f:
        scores = cPickle.load(f)
    return scores
def write_high_scores(scores):
    with open(SCORES_FILE, 'wb') as f:
        cPickle.dump(scores, f)
def update_scores(newScore, highScores):
    # reuse an anonymous function for looking
    # up the `c` (4th item) score from the object
    key = lambda item: item[3]
    # make a local copy of the scores
    highScores = highScores[:]
    lowest = None
    if highScores:
        lowest = min(highScores, key=key)
    # only add the new score if the high scores
    # are empty, or it beats the lowest one
    if lowest is None or (newScore[3] > lowest[3]):
        newScore[0] = raw_input("Enter name: ")
        highScores.append(newScore)
    # take only the highest 5 scores and return them
    highScores.sort(key=key, reverse=True)
    return highScores[:5]
def print_high_scores(scores):
    # loop over scores using enumerate to also
    # get an int counter for printing
    for i, score in enumerate(scores):
        name, a, b, c, time1 = score
        # #1 50.0 jdi (20.12.2012, 15:02)
        print "#%d\t%s\t%s\t(%s)" % \
            (i+1, c, name, time1.strftime("%d.%m.%Y, %H:%M"))
def main():
    score = get_user_data()
    highScores = read_high_scores()
    highScores = update_scores(score, highScores)
    write_high_scores(highScores)
    print_high_scores(highScores)
if __name__ == "__main__":
    main()
What it does now is add a new score only if there are no high scores yet or if it beats the lowest one. You could modify it to always add a new score when there are fewer than 5 previous scores, instead of requiring it to beat the lowest one, and then perform the lowest-score check only once the number of high scores is >= 5.
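That modification could look roughly like the sketch below. It is a variant for illustration, not a drop-in replacement: to keep it runnable on its own (in Python 2 or 3) the player name is passed in as an argument instead of being read with raw_input, and the max_scores parameter is an added assumption.

```python
def update_scores(new_score, high_scores, name, max_scores=5):
    """Return a new high-score list containing at most max_scores entries.

    new_score is a list like ['', a, b, c, time1]; the score is item 3.
    The new score is always added while the table is not full; once it
    is full, it must beat the lowest score to get in.
    """
    key = lambda item: item[3]
    # work on a copy so the caller's list is left untouched
    high_scores = high_scores[:]
    table_full = len(high_scores) >= max_scores
    if not table_full or new_score[3] > min(high_scores, key=key)[3]:
        entry = [name] + list(new_score[1:])
        high_scores.append(entry)
    # highest score first, then cut the list down to size
    high_scores.sort(key=key, reverse=True)
    return high_scores[:max_scores]


table = [['CPU', 200, 100, 2.0, None]]
table = update_scores(['', 50, 100, 0.5, None], table, 'new player')
print(len(table))
# → 2
```

In the interactive program you would ask for the name only after deciding that the score qualifies, as the original function does.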
The first thing I noticed is that you did not tell list.sort() that the sorting should be based on the score in each entry. By default, list.sort() will use Python's default sorting order, which sorts entries based on the first element of each entry (i.e. the name), then moves on to the second element, the third element and so on. So you have to tell list.sort() which item to use for sorting:
from operator import itemgetter
[...]
list.sort(key=itemgetter(3))
This will sort entries based on the item with index 3 in each tuple, i.e. the fourth item.
Also, print oldscores() will definitely not work since oldscores is not a function, hence you cannot call it with the () operator. print oldscores is probably better.
Here are the things I notice.
These lines seem to be in the wrong order:
print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
time1 = datetime.datetime.now()
When the user enters the height and weight, they are going to be read in as strings, not integers, so you will get a TypeError on this line:
c = a/b
You could solve this by casting a and b to float like so:
a = float(raw_input("Enter weight: "))
But you'll probably need to wrap this in a try/except block, in case the user puts in garbage, basically anything that can't be cast to a float. Put the whole thing in a while loop until they get it right.
So, something like this:
b = None
while b is None:
    try:
        b = float(raw_input("Enter height: "))
    except ValueError:
        print "Height should be entered using only digits, like '187'"
So, on to the second part: you shouldn't use list as a variable name, since it's a builtin. I'll use high_scores.
# Add one default entry to the list
high_scores = [("CPU", 200, 100, 2, "20.12.2012, 4:20")]
You say you want to check the player score against the high score, to see if it's best, but if that's the case, why a list? Why not just a single entry? Anyhow, that's confusing me, not sure if you really want a high score list, or just one high score.
So, let's just add the score, no matter what:
Assume you've gotten their name into the name variable.
high_scores.append((name, a, b, c, time1))
Then apply the other answer from @Tamás
You definitely don't want a dictionary here. The whole point of a dictionary is to be able to map keys to values, without any sorting. What you want is a sorted list. And you've already got that.
Well, as Tamás points out, you've actually got a list sorted by the player name, not the score. On top of that, you want to sort in downward order, not upward. You could use the decorate-sort-undecorate pattern, or a key function, or whatever, but you need to do something. Also, you've put it in a variable named list, which is a very bad idea, because that's already the name of the list type.
Anyway, you can find out whether to add something into a sorted list, and where to insert it if so, using the bisect module in the standard library. But it's probably simpler to just use something like SortedCollection or blist.
Here's an example:
highscores = SortedCollection(scores, key=lambda x: -x[3])
Now, when you finish the game:
highscores.insert_right((player, a, b, newscore, time1))
del highscores[-1]
That's it. If you were actually not in the top 10, you'll be added at #11, then removed. If you were in the top 10, you'll be added, and the old #10 will now be #11 and be removed.
If you don't want to prepopulate the list with 10 fake scores the way old arcade games used to, just change it to this:
highscores.insert_right((player, a, b, newscore, time1))
del highscores[10:]
Now, if there were already 10 scores, when you get added, #11 will get deleted, but if there were only 3, nothing gets deleted, and now there are 4.
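For comparison, here is roughly what the bisect route mentioned earlier looks like with a plain list, in case you'd rather not pull in SortedCollection or blist. It's only a sketch: insert_score and its limit parameter are names made up here, and the list of negated scores is rebuilt on every call because bisect needs ascending keys while the score list is kept highest-first.

```python
import bisect


def insert_score(entries, new_entry, limit=10):
    """Insert new_entry into entries, a list of tuples
    (player, a, b, score, time) kept sorted by score, highest first.

    Returns True if the entry made it into the top `limit`, else False.
    """
    # bisect expects ascending order, so search on the negated scores
    keys = [-entry[3] for entry in entries]
    # bisect_right puts a tied score AFTER the existing ones
    pos = bisect.bisect_right(keys, -new_entry[3])
    if pos >= limit:
        return False  # not good enough for the table
    entries.insert(pos, new_entry)
    del entries[limit:]  # drop whatever fell off the end
    return True


highscores = [("ann", 0, 0, 5, None), ("bob", 0, 0, 3, None)]
print(insert_score(highscores, ("cat", 0, 0, 4, None), limit=3))
# → True
print([entry[0] for entry in highscores])
# → ['ann', 'cat', 'bob']
```

The return value tells you whether it's worth prompting for a name at all, which the insert-then-delete version doesn't.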
Meanwhile, I'm not sure why you're writing the new scores out to a pickle file, and then reading the same thing back in. You probably want to do the reading before adding the highscore to the list, and then do the writing after adding it.
You also asked how to "beautify the list". Well, there are three sides to that.
First of all, in the code, (player, a, b, c, time1) isn't very meaningful. Giving the variables better names would help, of course, but ultimately you still come down to the fact that when accessing list, you have to do entry[3] to get the score or entry[4] to get the time.
There are at least three ways to solve this:
Store a list (or SortedCollection) of dicts instead of tuples. The code gets a bit more verbose, but a lot more readable. You write {'player': player, 'weight': a, 'height': b, 'score': c, 'time': time1}, and then when accessing the list, you do entry['score'] instead of entry[3].
Use a collection of namedtuples. Now you can actually just insert ScoreEntry(player, a, b, c, time1), or you can insert ScoreEntry(player=player, weight=a, height=b, score=c, time=time1), whichever is more readable in a given case, and they both work the same way. And you can access entry.score or entry[3], again using whichever is more readable.
Write an explicit class for score entries. This is pretty similar to the previous one, but there's more code to write, and you can't do indexed access anymore, but on the plus side you don't have to understand namedtuple.
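To make the namedtuple option concrete, here is a minimal sketch. The name ScoreEntry and its field names are just one reasonable choice; the field order follows the question, where a is the weight and b is the height.

```python
from collections import namedtuple

# One record type for a high-score entry; fields are in the same
# order as the original (player, a, b, c, time1) tuples.
ScoreEntry = namedtuple('ScoreEntry', ['player', 'weight', 'height', 'score', 'time'])

entry = ScoreEntry(player='CPU', weight=200, height=100, score=2,
                   time='20.12.2012, 21:38')

# Access by name or by position: both refer to the same value
print(entry.score)
# → 2
print(entry[3])
# → 2

# Formatting by field name instead of by position
line = "{e.player}: weight {e.weight}, height {e.height}, score {e.score} at {e.time}".format(e=entry)
print(line)
# → CPU: weight 200, height 100, score 2 at 20.12.2012, 21:38
```

Because a namedtuple is still a tuple, existing code that sorts on index 3 keeps working unchanged.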
Second, if you just print the entries, they look like a mess. The way to deal with that is string formatting. Instead of print scores, you do something like this:
print '\n'.join("{}: weight {}, height {}, score {} at {}".format(*entry)
                for entry in highscores)
If you're using a class or namedtuple instead of just a tuple, you can even format by name instead of by position, making the code much more readable.
Finally, the highscore file itself is an unreadable mess, because pickle is not meant for human consumption. If you want it to be human-readable, you have to pick a format, and write the code to serialize that format. Fortunately, the CSV format is pretty human-readable, and most of the code is already written for you in the csv module. (You may want to look at the DictReader and DictWriter classes, especially if you want to write a header line. Again, there's the tradeoff of a bit more code for a lot more readability.)
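As a rough sketch of the CSV route, the following writes and reads score entries stored as dicts using csv.DictWriter and csv.DictReader. The field names and the two helper functions are assumptions made for illustration, and the example uses an in-memory io.StringIO (Python 3) instead of a real file so that it can be run as-is.

```python
import csv
import io

FIELDS = ['player', 'weight', 'height', 'score', 'time']


def scores_to_csv(scores, f):
    """Write a list of score dicts to the open text file f, with a header line."""
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    for row in scores:
        writer.writerow(row)


def scores_from_csv(f):
    """Read score dicts back; CSV stores text, so numbers are converted again."""
    scores = []
    for row in csv.DictReader(f):
        row['weight'] = float(row['weight'])
        row['height'] = float(row['height'])
        row['score'] = float(row['score'])
        scores.append(row)
    return scores


scores = [{'player': 'CPU', 'weight': 200.0, 'height': 100.0,
           'score': 2.0, 'time': '20.12.2012, 21:38'}]
buf = io.StringIO()
scores_to_csv(scores, buf)
print(buf.getvalue())

buf.seek(0)
loaded = scores_from_csv(buf)
print(loaded[0]['player'], loaded[0]['score'])
```

With a real file in Python 3 you would open it with newline='' as the csv documentation recommends. The time is kept as a string here; turning it back into a datetime object would be a separate step.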
I'm trying to complete a project that will show total annual sales from a specific list contained in a .txt file.
The list is formatted this way:
-lastname, firstname (string)
-45.7 (float)
-456.4 (float)
-345.5 (float)
-lastname2, firstname2 (string)
-3354.7 (float)
-54.6 (float)
-56.2 (float)
-lastname3, firstname3 (string)
-76.6 (float)
-34.2 (float)
-48.2 (float)
And so on.... Actually, there are 7 different "employees", each followed by a set of 12 "numbers" (months of the year)....but that example should suffice to give an idea of what I'm trying to do.
I need to output this specific information of every "employee"
-Name of employee
-Total Sum (sum of the 12 numbers in the list)
So my logic is taking me to this conclusion, but I don't know where to start:
Create 7 different arrays to store each "employee" data.
With this logic, I need to split the main list into independent arrays so I can work with them.
How can this be achieved? And also, if I don't have a predefined number of employees (but a defined format :: "Name" followed by 12 months of numbers)...how can I achieve this?
I'm sure I can figure it out once I get an idea of how to "split" a list into different sections -- every 13 lines?
Yes, every block of thirteen lines holds the information for one employee.
However, instead of using seven different arrays, you can use a dictionary of lists, so that you don't have to worry about the number of employees.
You can also use a parameter for the number of lines belonging to each employee.
You could do the following:
infile = open("file.txt", "rt")
employee = dict()
name = infile.readline().strip()
while name:
    employee[name] = list()
    # twelve monthly values follow each name
    for i in xrange(12):
        val = float(infile.readline().strip())
        employee[name].append(val)
    name = infile.readline().strip()
infile.close()
Some ways to access dictionary entries:
for name, months in employee.items():
    print name
    print months
for name in employee.keys():
    print name
    print employee[name]
for months in employee.values():
    print months
# zip pairs each key with its value
for name, months in zip(employee.keys(), employee.values()):
    print name
    print months
The entire process goes as follows:
infile = open("file.txt", "rt")
employee = dict()
name = infile.readline().strip()
while name:
    val = 0.0
    # add up the twelve monthly values that follow each name
    for i in xrange(12):
        val += float(infile.readline().strip())
    employee[name] = val
    print ">>> Employee:", name, " -- total sales:", str(employee[name])
    name = infile.readline().strip()
infile.close()
Sorry for beating around the bush a bit (:
Here is one option.
It's not elegant, but it's a workable brute-force approach.
summed = 0
with open("file.txt", "rt") as f:
    print f.readline() # We print first line (first man)
    for line in f:
        # then we suppose every line is float.
        try:
            # convert to float
            value = float(line.strip())
            # add to sum
            summed += value
        # If it does not convert, then it is next person
        except ValueError:
            # print sum for previous person
            print summed
            # print new name
            print line
            # reset sum
            summed = 0
# at end of file there are no errors, so we print the last result
print summed
Since you need more flexibility, there is another option:
data = {} # dict: list of all values for person by person name
with open("file.txt", "rt") as f:
    data_key = f.readline().strip() # We remember first line (first man), without the newline
    data[data_key] = [] # empty list of values
    for line in f:
        # then we suppose every line is float.
        try:
            # convert to float
            value = float(line.strip())
            # add to data
            data[data_key].append(value)
        # If it does not convert, then it is next person
        except ValueError:
            # next person's name, without the newline
            data_key = line.strip()
            # new list
            data[data_key] = []
Q: let's say that I want to print a '2% bonus' to employees that made more than 7000 in total sales (12 months)
for employee, stats in data.iteritems():
    if sum(stats) > 7000:
        print employee + " made more than 7000 in total sales! Needs a 2% bonus"
I would not create 7 different arrays. I would create some sort of data structure to hold all the relevant information for one employee in one data type (this is python, but surely you can create data structures in python as well).
Then, as you process the data for each employee, all you have to do is iterate over one array of employee data elements. That way, it's much easier to keep track of the indices of the data (or maybe even eliminates the need to!).
This is especially helpful if you want to sort the data somehow. That way, you'd only have to sort one array instead of 7.
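A rough sketch of that idea in Python follows. The Employee class and the parse_employees function are names invented for this example, and it assumes each line of the file holds just a name or just a number (a name followed by a fixed number of monthly values), with blank lines ignored.

```python
class Employee(object):
    """All the data for one employee kept together in one object."""

    def __init__(self, name, monthly_sales):
        self.name = name
        self.monthly_sales = monthly_sales  # list of floats, one per month

    def total_sales(self):
        return sum(self.monthly_sales)


def parse_employees(lines, months=12):
    """Split the lines into blocks of (1 name + months values) each."""
    lines = [line.strip() for line in lines if line.strip()]
    block = months + 1
    employees = []
    for start in range(0, len(lines), block):
        name = lines[start]
        sales = [float(value) for value in lines[start + 1:start + block]]
        employees.append(Employee(name, sales))
    return employees


# Shortened sample with 3 months per employee instead of 12
sample = ["Doe, John\n", "45.7\n", "456.4\n", "345.5\n",
          "Roe, Jane\n", "3354.7\n", "54.6\n", "56.2\n"]
employees = parse_employees(sample, months=3)

# One list to sort instead of several parallel arrays
employees.sort(key=lambda e: e.total_sales(), reverse=True)
for e in employees:
    print(e.name, round(e.total_sales(), 2))
```

With a real file you could pass the open file object directly, e.g. parse_employees(open("file.txt")), since iterating over a file yields its lines.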