Finding a max value in Python

I am working on a problem where I need to use data from a CSV file to find which film has the highest gross total for each year.
I already have the dict 'year', which maps each film to the year it came out, and the dict 'gross', which maps each film to its gross.
Despite this, my code is still returning 0 as the max gross. What am I missing here?
def MaxGrossFinder(c):
    for film in year:
        MaxGross = 0
        f = int(gross[film])
        if year[film] == c:
            if f > MaxGross:
                MaxGross = f
    return MaxGross

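Use the built-in max() function with a generator expression, which keeps track of the running maximum for you:
max(int(gross[film]) for film in year if year[film] == c)
Note that max() raises ValueError if no film matches c.
Your problem is that you reset MaxGross to zero on every iteration, so every value except the last one is ignored.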
Please also look into creating a Film class and using one dict of film objects, rather than having multiple parallel dicts.
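As a rough sketch of the Film class idea, assuming Python 3.7+ for dataclasses: the class name Film comes from the suggestion above, but its fields, the helper max_gross_film, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Film:
    title: str
    year: int
    gross: int


def max_gross_film(films, year):
    """Return the highest-grossing Film released in `year`, or None if none match."""
    candidates = (film for film in films.values() if film.year == year)
    # default=None avoids the ValueError that max() raises on an empty sequence.
    return max(candidates, key=lambda film: film.gross, default=None)


# Made-up sample data, only for illustration.
films = {
    "Film A": Film("Film A", 2001, 150),
    "Film B": Film("Film B", 2001, 320),
    "Film C": Film("Film C", 2002, 90),
}

best = max_gross_film(films, 2001)
print(best.title, best.gross)
```

Keeping the year and the gross on one object means the two values cannot drift out of sync the way two parallel dicts can.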

Take MaxGross out of the for-loop:
def MaxGrossFinder(c):
    MaxGross = 0
    for film in year:
        f = int(gross[film])
        if year[film] == c:
            if f > MaxGross:
                MaxGross = f
    return MaxGross
With MaxGross = 0 inside the for-loop, all the prior iterations mean nothing. Only the last would affect MaxGross. That's probably not the intention.
Another problem might occur if c and year[film] are floats. Don't compare floats for equality (unless you know what you are doing), since floats can have inexact representations. Instead, define some concept of nearness:
def near(a, b, rtol=1e-5, atol=1e-8):
    return abs(a - b) < (atol + rtol * abs(b))
and compare with if near(year[film], c).
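On Python 3.5 and later, the standard library's math.isclose offers a similar check. The sketch below is an alternative to the hand-written near(), not an exact equivalent: math.isclose is symmetric in its arguments and combines the tolerances as max(rel_tol * max(|a|, |b|), abs_tol). The wrapper name years_match is hypothetical.

```python
import math


def years_match(a, b):
    """Tolerant equality for float-valued years (hypothetical helper)."""
    return math.isclose(a, b, rel_tol=1e-5, abs_tol=1e-8)


print(0.1 + 0.2 == 0.3)             # exact comparison fails for these floats
print(years_match(0.1 + 0.2, 0.3))  # tolerant comparison succeeds
```

If the years are read from the CSV as strings, converting them with int() sidesteps the float issue entirely.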

You can use max() directly, as long as you pass it all the matching values at once rather than one int at a time:
print(max(int(gross[film]) for film in year if year[film] == c))


Find the largest number in a pool of integers

I have been working on this code for quite a while now and, frankly, I have run out of ideas for solving it. I have looked through different threads on how to do this but, unfortunately, still have no answer.
To start off, I have this pool of data that is a string but needs to be considered as a list. For example:
# empDataLT
empRecords = '''200401003,Luisa,Jurney,Accounting,800,21,4/8/2002,;
200208006,Clorinda,Heimann,Accounting,1050,15,5/21/1994,;
200307014,Dick,Wenzinger,Admin,565,15,10/13/1973,;
200901005,Ahmed,Angalich,Purchasing,750,20,2/10/1973,;
200704013,Iluminada,Ohms,Marketing,750,16,7/13/1972,;
201701018,Joanna,Leinenbach,Finance,1050,15,11/6/1980,;
201003007,Caprice,Suell,Admin,750,18,6/28/1992'''
a = empRecords.strip().split(";")
This pool is in the format: Employee Number, First name, Last Name, Department, Rate per day, No. of Days Worked, Birthdate
What I have been trying to do is multiply each employee's rate per day by the number of days worked, then find which of them is the highest-earning employee. The following code works reasonably well, but it lacks the second part (finding the highest earner).
import empDataLT as x
def earn():
    empEarn = list()  # convert module to a list
    for er in x.a:
        empErn = er.strip().split(",")
        empEarn.append(empErn)
    b = sorted(empEarn, key=lambda x: x[4])
    for e in b:
        ern = (int(e[4]) * int(e[5]))
        print(ern)
This results in something like this:
20800
14400
21600
24000
12800
24000
Which is great because I have the result (yay). However, I am unable to find the highest earning: I get an error when I call max() on ern, since it's a single integer. I tried converting it to a str and then calling max(), but that just gives me the highest digit of each integer.
I'm not really sure what to do anymore.
Try this:
empRecords = '''200401003,Luisa,Jurney,Accounting,800,21,4/8/2002,;
200208006,Clorinda,Heimann,Accounting,1050,15,5/21/1994,;
200307014,Dick,Wenzinger,Admin,565,15,10/13/1973,;
200901005,Ahmed,Angalich,Purchasing,750,20,2/10/1973,;
200704013,Iluminada,Ohms,Marketing,750,16,7/13/1972,;
201701018,Joanna,Leinenbach,Finance,1050,15,11/6/1980,;
201003007,Caprice,Suell,Admin,750,18,6/28/1992'''
a = empRecords.strip().split(";")
earn = []
for i in a:
    t = i.split(',')
    cur = int(t[4]) * int(t[5])
    earn.append(cur)
    print(cur)
print("Maximum Earning :", max(earn))
You can use max, you just need to keep the previous largest-found:
highestEarning = 0
for e in b:
    highestEarning = max(int(e[4]) * int(e[5]), highestEarning)
Once the for loop is done, highestEarning will be the highest earning in the list.
max() takes an iterable (for example a list), or two or more separate arguments.
You probably tried to run max() on a single int, which doesn't work: a single int is not iterable, so there is nothing to compare it against.
You can collect the ern values in a list and then call max() on that list.
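Since the question asks which employee earns the most, not just the amount, here is a minimal sketch that passes a key function to max() so the whole record is returned. It reuses the sample data from the question; the helper names parse and earnings are made up for illustration.

```python
emp_records = """200401003,Luisa,Jurney,Accounting,800,21,4/8/2002,;
200208006,Clorinda,Heimann,Accounting,1050,15,5/21/1994,;
200307014,Dick,Wenzinger,Admin,565,15,10/13/1973,;
200901005,Ahmed,Angalich,Purchasing,750,20,2/10/1973,;
200704013,Iluminada,Ohms,Marketing,750,16,7/13/1972,;
201701018,Joanna,Leinenbach,Finance,1050,15,11/6/1980,;
201003007,Caprice,Suell,Admin,750,18,6/28/1992"""


def parse(records):
    """Split the ';'-separated text into lists of fields, skipping empty chunks."""
    rows = []
    for chunk in records.strip().split(";"):
        fields = chunk.strip().split(",")
        if len(fields) >= 6:
            rows.append(fields)
    return rows


def earnings(row):
    """Rate per day (index 4) times number of days worked (index 5)."""
    return int(row[4]) * int(row[5])


rows = parse(emp_records)
top = max(rows, key=earnings)  # the full record with the largest earnings
print(top[1], top[2], earnings(top))
```

If several employees tie for the highest earnings, max() returns the first one it encounters.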
You could turn earn() into a generator with the yield keyword, like this:
import empDataLT as x
def earn():
    empEarn = list()  # convert module to a list
    for er in x.a:
        empErn = er.strip().split(",")
        empEarn.append(empErn)
    b = sorted(empEarn, key=lambda x: x[4])
    for e in b:
        ern = (int(e[4]) * int(e[5]))
        print(ern)
        yield ern
highest = max(earn())
For a discussion of yield and how to use it, I suggest the Real Python tutorial.

Python: replace for loop with function

Can anyone help me understand how I would write a function with def whatever() instead of using a for loop? I'm trying to do things more Pythonically, but I don't really understand how to apply a function well instead of a loop. For instance, the loop below works well and gives the output I would like. Is there a way to do this with a function?
seasons = leaguesFinal['season'].unique()
teams = teamsDF['team_long_name'].unique()
df = []
for i in seasons:
    season = leaguesFinal['season'] == i
    season = leaguesFinal[season]
    for j in teams:
        team_season_wins = season['win'] == j
        team_season_win_record = team_season_wins[team_season_wins].count()
        team_season_loss = season['loss'] == j
        team_season_loss_record = team_season_loss[team_season_loss].count()
        df.append((j, i, team_season_win_record, team_season_loss_record))
df = pd.DataFrame(df, columns=('Team', 'Seasons', 'Wins', 'Losses'))
The output looks as follows:
   Team               Seasons    Wins  Losses
0  KRC Genk           2008/2009  15    14
1  Beerschot AC       2008/2009  11    14
2  SV Zulte-Waregem   2008/2009  16    11
3  Sporting Lokeren   2008/2009  13    9
4  KSV Cercle Brugge  2008/2009  14    15
Solution
def some_loop(something, something_else):
    rows = []  # one (team, season, wins, losses) tuple per combination
    for i in something:
        season = leaguesFinal['season'] == i
        season = leaguesFinal[season]
        for j in something_else:
            team_season_wins = season['win'] == j
            team_season_win_record = team_season_wins[team_season_wins].count()
            team_season_loss = season['loss'] == j
            team_season_loss_record = team_season_loss[team_season_loss].count()
            rows.append((j, i, team_season_win_record, team_season_loss_record))
    return pd.DataFrame(rows, columns=('Team', 'Seasons', 'Wins', 'Losses'))
df = some_loop(seasons, teams)
Comments
This is what you are describing: creating a function out of the for loop. You still have a for loop, but it lives in a function that you can call from different places in your code without repeating the entire loop.
All there is to do is define a function that accepts two parameters for this particular loop, def some_loop(something, something_else). I used basic names so you could see more clearly what is taking place.
Then you replace all the instances of seasons and teams inside the loop with those parameters.
Now, when you call your function, every occurrence of something and something_else takes on whatever inputs you pass in.
As for the statements of the form x = y == i: they are valid. The comparison y == i is evaluated first and yields a boolean Series, which is then assigned to x and used as a mask on the next line.
Actually, you're mixing things up: functions just group lines of code and make them reusable without writing everything again, whereas for-loops are for iteration.
In your example, a function would just contain the for-loop and return the resulting DataFrame, which you could then use. But it will not change anything or make your code smarter.
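To illustrate the point that the function simply wraps the loop and returns its result, here is a pandas-free sketch of the same win/loss counting. Everything in it is an assumption for illustration: the input shape (season, winner, loser), the function name season_records, and the sample matches are not from the original data.

```python
from collections import Counter


def season_records(matches):
    """Count wins and losses per (team, season).

    `matches` is assumed to be an iterable of (season, winner, loser) tuples.
    Returns a sorted list of (team, season, wins, losses) tuples.
    """
    wins, losses = Counter(), Counter()
    for season, winner, loser in matches:
        wins[(winner, season)] += 1
        losses[(loser, season)] += 1
    # Every team that won or lost at least once gets a row.
    keys = sorted(set(wins) | set(losses))
    return [(team, season, wins[(team, season)], losses[(team, season)])
            for team, season in keys]


# Made-up matches, only for illustration.
matches = [
    ("2008/2009", "KRC Genk", "Beerschot AC"),
    ("2008/2009", "KRC Genk", "Sporting Lokeren"),
    ("2008/2009", "Beerschot AC", "KRC Genk"),
]
records = season_records(matches)
for row in records:
    print(row)
```

The caller gets the result back as a return value instead of relying on a global list, which is the main practical benefit of moving the loop into a function.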

Need help iterating over python dictionary values

I'm working on a program that searches through a dictionary's values and calls a method on the values that match a user's input. I have to compare and sort these values.
This is the code I'm working with right now
Code for value search and compare (very rough)
import nation
import pickle
KK = 1000000
pickle_in = open("nationsDict.dat","rb")
d = pickle.load(pickle_in)
k = raw_input("Enter a continent: ")
for value in d.values():
    if k in d.values()[0]:
        print d.values()[0]
Code for Nation class
class Nations:
    KK = 1000000
    def __init__(self, ctry, cont, pop, area):
        self.country = ctry
        self.continent = cont
        self.population = float(pop)
        self.area = float(area)
    def popDensity(self):
        popDensity = (self.population*self.KK) / self.area
        popDensity = str(round(popDensity, 2))
        return popDensity
Code for creating pickle dictionary
import nation
import pickle
i=0
dictUN = {}
with open('UN.txt') as f:
    for line in f:
        """Data get from file"""
        elements = line.strip().split(",")
        n = nation.Nations(elements[0],elements[1],elements[2],elements[3])
        """Density"""
        n.popDensity()
        print "The density of", n.country, "is",n.popDensity(),"people per square mile."
        """Dictionary creation"""
        dictVal = (n.continent, n.population, n.area)
        dictUN.update({n.country: dictVal})
pickle_out = open("nationsDict.dat", "wb")
pickle.dump(dictUN, pickle_out)
pickle_out.close()
Here's a snippet from UN.txt
Mauritania,Africa,3.5,397954
Mauritius,Africa,1.3,787
Mexico,North America,120.3,761606
Micronesia,Australia/Oceania,.11,271
Monaco,Europe,.031,0.76
Mongolia,Asia,3.0,603909
Montenegro,Europe,.65,5019
Morocco,Africa,33.0,172414
My problems at this point are mostly confined to the value search and compare. Specifically, my program has to:
- Allow the user to search for a continent (the first element in each value tuple)
- Perform the method Nations.popDensity (contained in the nation module) on all matching countries
- Compare the nations and return the top 5 density values per continent
I would say my one big question is how to search a dictionary by an element in a value. I've also considered making a temp dictionary with the continent element as the key, but I'm not sure that would make my life any easier, as I still have to perform the popDensity method on it.
Any help is appreciated, Thanks!
1. Initialize a pandas Series object.
2. Iterate through the dictionary.
3. If the continent matches:
   a. Calculate the population density.
   b. If the value is larger than the smallest value in the pandas Series:
      i. Remove the last entry.
      ii. Append the value to the Series values and the country to the index.
      iii. Sort the Series in descending order (ascending=False).
If you're going to do this repeatedly, then creating a continent->country dictionary definitely will save time.
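The steps above can be sketched without pandas by letting heapq.nlargest do the "keep the top five" bookkeeping. This swaps in a standard-library technique for the Series-based approach, and it is written for Python 3 rather than the Python 2 of the question. The dictionary layout (country mapped to (continent, population, area)) follows the pickle-creation code, the sample rows come from the UN.txt snippet, and the function names are hypothetical.

```python
import heapq
from collections import defaultdict

KK = 1000000  # populations in the file are given in millions

# A few rows from the UN.txt snippet: country -> (continent, population, area).
nations = {
    "Mauritania": ("Africa", 3.5, 397954.0),
    "Mauritius": ("Africa", 1.3, 787.0),
    "Mexico": ("North America", 120.3, 761606.0),
    "Monaco": ("Europe", 0.031, 0.76),
    "Montenegro": ("Europe", 0.65, 5019.0),
    "Morocco": ("Africa", 33.0, 172414.0),
}


def density(population, area):
    """People per unit of area, rounded to two decimals."""
    return round(population * KK / area, 2)


def top_densities(nations, continent, n=5):
    """Return up to n (density, country) pairs for a continent, densest first."""
    matches = ((density(pop, area), country)
               for country, (cont, pop, area) in nations.items()
               if cont == continent)
    return heapq.nlargest(n, matches)


def by_continent(nations):
    """Build the continent -> [countries] index suggested for repeated lookups."""
    index = defaultdict(list)
    for country, (cont, _pop, _area) in nations.items():
        index[cont].append(country)
    return index


for value, country in top_densities(nations, "Africa"):
    print(country, value)
```

For a one-off query the generator scan is enough; the by_continent index only pays off when the same dictionary is searched many times.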
Glad it was helpful. I'll add it as an answer, so you can accept it, if you like.
Just as there is list comprehension, there is dictionary comprehension... It's pretty cool stuff! d2 = {k:d[k] for k in d.keys() if <some_elem> in d[k]} would give you a dict with a subset of the original dict that satisfies your requirements. You would have to fill in the <some_elem> in d[k] portion, because I haven't gone through all your code. You said that this is the main Q you have. Hopefully this gives you enough to solve it.
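One possible way to fill in the placeholder condition, assuming the value layout from the pickle-creation code, where the continent is the first element of each tuple. The sample rows come from the UN.txt snippet and the variable names are illustrative.

```python
# country -> (continent, population, area), as built by the pickle script.
d = {
    "Mexico": ("North America", 120.3, 761606.0),
    "Monaco": ("Europe", 0.031, 0.76),
    "Mongolia": ("Asia", 3.0, 603909.0),
    "Montenegro": ("Europe", 0.65, 5019.0),
}

k = "Europe"  # stands in for the user's input

# Keep only the entries whose continent (index 0 of the value) matches k.
d2 = {country: info for country, info in d.items() if info[0] == k}
print(d2)
```

Comparing info[0] directly is stricter than a membership test on the whole tuple, which would also match if the search text happened to equal some other element.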

Python: creating a dictionary that writes high scores to a file

First: you don't have to code this for me, unless you're a super awesome nice guy. But since you're all great at programming and understand it so much better than me and all, it might just be easier (since it's probably not too many lines of code) than writing paragraph after paragraph trying to make me understand it.
So - I need to make a list of high scores that updates itself upon new entries. So here it goes:
First step - done
I have player-entered input, which has been taken as a data for a few calculations:
import time
import datetime
print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
time1 = datetime.datetime.now()
a = raw_input("Enter weight: ")
b = raw_input("Enter height: ")
c = a/b
Second step - making high score list
Here, I would need some sort of a dictionary or a thing that would read the previous entries and check if the score (c) is (at least) better than the score of the last one in "high scores", and if it is, it would prompt you to enter your name.
After you entered your name, it would post your name, your a, b, c, and time in a high score list.
This is what I came up with, and it definitely doesn't work:
list = [("CPU", 200, 100, 2, time1)]
player = "CPU"
a = 200
b = 100
c = 2
time1 = "20.12.2012, 21:38"
list.append((player, a, b, c, time1))
list.sort()
import pickle
scores = open("scores", "w")
pickle.dump(list[-5:], scores)
scores.close()
scores = open("scores", "r")
oldscores = pickle.load(scores)
scores.close()
print oldscores()
I know I did something terribly stupid, but anyways, thanks for reading this and I hope you can help me out with this one. :-)
First, don't use list as a variable name. It shadows the built-in list object. Second, avoid using just plain date strings, since it is much easier to work with datetime objects, which support proper comparisons and easy conversions.
Here is a full example of your code, with individual functions to help divide up the steps. I am trying not to use any more advanced modules or functionality, since you are obviously just learning:
import os
import datetime
import cPickle
# just a constant we can use to define our score file location
SCORES_FILE = "scores.pickle"
def get_user_data():
    time1 = datetime.datetime.now()
    print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
    a = None
    while True:
        a = raw_input("Enter weight: ")
        try:
            a = float(a)
        except:
            continue
        else:
            break
    b = None
    while True:
        b = raw_input("Enter height: ")
        try:
            b = float(b)
        except:
            continue
        else:
            break
    c = a/b
    return ['', a, b, c, time1]
def read_high_scores():
    # initialize an empty score file if it does
    # not exist already, and return an empty list
    if not os.path.isfile(SCORES_FILE):
        write_high_scores([])
        return []
    with open(SCORES_FILE, 'r') as f:
        scores = cPickle.load(f)
    return scores
def write_high_scores(scores):
    with open(SCORES_FILE, 'w') as f:
        cPickle.dump(scores, f)
def update_scores(newScore, highScores):
    # reuse an anonymous function for looking
    # up the `c` (4th item) score from the object
    key = lambda item: item[3]
    # make a local copy of the scores
    highScores = highScores[:]
    lowest = None
    if highScores:
        lowest = min(highScores, key=key)
    # only add the new score if the high scores
    # are empty, or it beats the lowest one
    if lowest is None or (newScore[3] > lowest[3]):
        newScore[0] = raw_input("Enter name: ")
        highScores.append(newScore)
    # take only the highest 5 scores and return them
    highScores.sort(key=key, reverse=True)
    return highScores[:5]
def print_high_scores(scores):
    # loop over scores using enumerate to also
    # get an int counter for printing
    for i, score in enumerate(scores):
        name, a, b, c, time1 = score
        # #1 50.0 jdi (20.12.2012, 15:02)
        print "#%d\t%s\t%s\t(%s)" % \
            (i+1, c, name, time1.strftime("%d.%m.%Y, %H:%M"))
def main():
    score = get_user_data()
    highScores = read_high_scores()
    highScores = update_scores(score, highScores)
    write_high_scores(highScores)
    print_high_scores(highScores)
if __name__ == "__main__":
    main()
As written, it only adds a new score if there are no high scores yet or if the new score beats the lowest one. You could modify it to always add a new score while there are fewer than 5 previous scores, and only perform the lowest-score check once len(highScores) >= 5.
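A minimal sketch of that modification, written for Python 3 and leaving out the name prompt so it can run unattended. The limit parameter is an addition for illustration and is assumed to be at least 1; entries keep the (name, a, b, c, time) layout with the score at index 3.

```python
def update_scores(new_score, high_scores, limit=5):
    """Return a new top-`limit` list, best score first.

    The new entry is always added while the table has fewer than `limit`
    entries; after that it must beat the current lowest score.
    """
    scores = list(high_scores)  # work on a copy, as in the original
    if len(scores) < limit or new_score[3] > min(s[3] for s in scores):
        scores.append(new_score)
    scores.sort(key=lambda s: s[3], reverse=True)
    return scores[:limit]


table = []
for value in [3, 1, 2, 4, 5, 0, 10]:
    table = update_scores(("player", 0, 0, value, None), table)
print([entry[3] for entry in table])
```

Because the length check comes first, min() is never called on an empty list.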
The first thing I noticed is that you did not tell list.sort() that the sorting should be based on the score in each entry. By default, list.sort() compares tuples element by element, so it sorts entries by the first element (i.e. the name), then moves on to the second element, the third element and so on. So, you have to tell list.sort() which item to use for sorting:
from operator import itemgetter
[...]
list.sort(key=itemgetter(3))
This will sort entries based on the item with index 3 in each tuple, i.e. the fourth item.
Also, print oldscores() will definitely not work since oldscores is not a function, hence you cannot call it with the () operator. print oldscores is probably better.
Here are the things I notice.
These lines seem to be in the wrong order:
print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
time1 = datetime.datetime.now()
When the user enters the height and weight, they are going to be read in as strings, not integers, so you will get a TypeError on this line:
c = a/b
You could solve this by casting a and b to float like so:
a = float(raw_input("Enter weight: "))
But you'll probably need to wrap this in a try/except block, in case the user puts in garbage, basically anything that can't be cast to a float. Put the whole thing in a while loop until they get it right.
So, something like this:
b = None
while b is None:
    try:
        b = float(raw_input("Enter height: "))
    except ValueError:
        print "Height should be entered using only digits, like '187'"
So, on to the second part. You shouldn't use list as a variable name, since it's a builtin; I'll use high_scores.
# Add one default entry to the list
high_scores = [("CPU", 200, 100, 2, "20.12.2012, 4:20")]
You say you want to check the player score against the high score, to see if it's best, but if that's the case, why a list? Why not just a single entry? Anyhow, that's confusing me, not sure if you really want a high score list, or just one high score.
So, let's just add the score, no matter what:
Assume you've gotten their name into the name variable.
high_scores.append((name, a, b, c, time1))
Then apply the other answer from @Tamás.
You definitely don't want a dictionary here. The whole point of a dictionary is to be able to map keys to values, without any sorting. What you want is a sorted list. And you've already got that.
Well, as Tamás points out, you've actually got a list sorted by the player name, not the score. On top of that, you want to sort in descending order, not ascending. You could use the decorate-sort-undecorate pattern, or a key function, or whatever, but you need to do something. Also, you've put it in a variable named list, which is a very bad idea, because that's already the name of the list type.
Anyway, you can find out whether to add something into a sorted list, and where to insert it if so, using the bisect module in the standard library. But it's probably simpler to just use something like SortedCollection (a recipe, not part of the standard library) or blist (a third-party package).
Here's an example:
highscores = SortedCollection(scores, key=lambda x: -x[3])
Now, when you finish the game:
highscores.insert_right((player, a, b, newscore, time1))
del highscores[-1]
That's it. If you were actually not in the top 10, you'll be added at #11, then removed. If you were in the top 10, you'll be added, and the old #10 will now be #11 and be removed.
If you don't want to prepopulate the list with 10 fake scores the way old arcade games used to, just change it to this:
highscores.insert_right((player, a, b, newscore, time1))
del highscores[10:]
Now, if there were already 10 scores, when you get added, #11 will get deleted, but if there were only 3, nothing gets deleted, and now there are 4.
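For comparison, here is a rough standard-library version of the same insert-then-trim idea using the bisect module mentioned above instead of SortedCollection. It is written for Python 3; the function name add_score and the sample entries are made up, and entries are assumed to keep the score at index 3.

```python
import bisect


def add_score(highscores, entry, limit=10):
    """Insert `entry` into `highscores` (kept best-first by entry[3]), then trim.

    Returns True if the entry made it into the top `limit`.
    """
    # bisect needs ascending keys, so search on the negated scores.
    keys = [-e[3] for e in highscores]
    pos = bisect.bisect_right(keys, -entry[3])  # ties go after existing entries
    highscores.insert(pos, entry)
    del highscores[limit:]
    return pos < limit


scores = [("A", 0, 0, 5, ""), ("B", 0, 0, 3, "")]
print(add_score(scores, ("C", 0, 0, 4, ""), limit=3))
print([e[0] for e in scores])
```

Rebuilding the key list on every call is fine for a ten-entry table; a sorted container avoids that cost for larger collections.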
Meanwhile, I'm not sure why you're writing the new scores out to a pickle file, and then reading the same thing back in. You probably want to do the reading before adding the highscore to the list, and then do the writing after adding it.
You also asked how to "beautify the list". Well, there are three sides to that.
First of all, in the code, (player, a, b, c, time1) isn't very meaningful. Giving the variables better names would help, of course, but ultimately you still come down to the fact that when accessing list, you have to do entry[3] to get the score or entry[4] to get the time.
There are at least three ways to solve this:
- Store a list (or SortedCollection) of dicts instead of tuples. The code gets a bit more verbose, but a lot more readable. You write {'player': player, 'weight': a, 'height': b, 'score': c, 'time': time1}, and then when accessing the list, you do entry['score'] instead of entry[3].
- Use a collection of namedtuples. Now you can actually just insert ScoreEntry(player, a, b, c, time1), or you can insert ScoreEntry(player=player, weight=a, height=b, score=c, time=time1), whichever is more readable in a given case, and they both work the same way. And you can access the score as entry.score or as entry[3], again using whichever is more readable.
- Write an explicit class for score entries. This is pretty similar to the previous one, but there's more code to write, and you can't do indexed access anymore, but on the plus side you don't have to understand namedtuple.
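A small sketch of the namedtuple option, assuming the field order follows the question's tuple (player, weight, height, score, time). The type name ScoreEntry comes from the text above; the sample values are the CPU entry from the question.

```python
from collections import namedtuple

# Field names are illustrative; the order matches (player, a, b, c, time1).
ScoreEntry = namedtuple("ScoreEntry", ["player", "weight", "height", "score", "time"])

entry = ScoreEntry(player="CPU", weight=200, height=100, score=2,
                   time="20.12.2012, 21:38")

# Access by name and by position refer to the same value.
print(entry.score, entry[3])

# Named fields also make format strings self-describing.
line = "{e.player}: weight {e.weight}, height {e.height}, score {e.score} at {e.time}".format(e=entry)
print(line)
```

Because a namedtuple is still a tuple, existing code such as key=lambda x: x[3] keeps working unchanged.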
Second, if you just print the entries, they look like a mess. The way to deal with that is string formatting. Instead of print scores, you do something like this:
print '\n'.join("{}: weight {}, height {}, score {} at {}".format(*entry)
                for entry in highscores)
If you're using a class or namedtuple instead of just a tuple, you can even format by name instead of by position, making the code much more readable.
Finally, the highscore file itself is an unreadable mess, because pickle is not meant for human consumption. If you want it to be human-readable, you have to pick a format, and write the code to serialize that format. Fortunately, the CSV format is pretty human-readable, and most of the code is already written for you in the csv module. (You may want to look at the DictReader and DictWriter classes, especially if you want to write a header line. Again, there's the tradeoff of a bit more code for a lot more readability.)
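As a hedged sketch of the CSV route in Python 3, using csv.DictWriter and csv.DictReader with a header line. The column names and the helper names dump_scores and load_scores are assumptions; an in-memory io.StringIO stands in for the real score file, which would be opened with newline="" as the csv module documentation recommends.

```python
import csv
import io

FIELDS = ["player", "weight", "height", "score", "time"]  # assumed column names


def dump_scores(scores, fileobj):
    """Write a list of score dicts as CSV with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(scores)


def load_scores(fileobj):
    """Read the CSV back; csv yields strings, so convert the numeric columns."""
    scores = []
    for row in csv.DictReader(fileobj):
        scores.append({
            "player": row["player"],
            "weight": float(row["weight"]),
            "height": float(row["height"]),
            "score": float(row["score"]),
            "time": row["time"],
        })
    return scores


scores = [{"player": "CPU", "weight": 200.0, "height": 100.0,
           "score": 2.0, "time": "20.12.2012, 21:38"}]

buf = io.StringIO(newline="")  # stand-in for open("scores.csv", "w", newline="")
dump_scores(scores, buf)
print(buf.getvalue())

buf.seek(0)
restored = load_scores(buf)
```

The time is kept as a string here; it contains a comma, so the csv module quotes it automatically when writing.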

Most concise way to define multiple variables

I have data consisting of about 10,000 entries. Each row is a price for a product in a specific currency. For example:
- Purchase 1 = 10.25 USD
- Purchase 2 = 11.76 SEK
I have ten different database columns to total sales for each currency (this is a requirement). The columns are earnings_in_usd, earnings_in_sek, earnings_in_eur, etc. In my function to do an insert statement to the database, I need to define the necessary variable. By default all other entries will be 0.00. This is basically the code that would accomplish what I need to do:
if currency == 'USD':
    earnings_in_usd = value
elif currency == 'SEK':
    earnings_in_sek = value
elif ...
Is there a more straightforward way to do this (a way to do something like earnings_in_$ = value)?
Use a defaultdict indexed by the currency.
from collections import defaultdict
earnings = defaultdict(float)  # float() returns 0.0, so missing currencies default to 0.0
Instead of your long if-then-else, use this single line:
earnings[currency] = value
and retrieve the earnings in, say, US$, with
earnings["USD"]
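To connect this back to the ten database columns, here is a minimal sketch that turns one purchase into a complete set of column values. The column naming pattern earnings_in_<code> comes from the question; the CURRENCIES list is an assumed subset and the function name earnings_columns is hypothetical.

```python
from collections import defaultdict

# Assumed subset of the ten currencies; extend to match the real columns.
CURRENCIES = ["USD", "SEK", "EUR"]


def earnings_columns(currency, value):
    """Map one purchase to a full {column_name: amount} dict, 0.0 elsewhere."""
    earnings = defaultdict(float)
    earnings[currency] = value
    return {"earnings_in_" + code.lower(): earnings[code] for code in CURRENCIES}


row = earnings_columns("SEK", 11.76)
print(row)
```

The resulting dict can be passed as named parameters to whatever insert statement the database layer uses, so no per-currency variables are needed.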
Perhaps use a dictionary?
earnings = {}
earnings[currency] = value
One way to do it, which may very well have someone confounded when it breaks, is to use a list comprehension:
earnings_in_usd, earnings_in_sek, ... = [(value if currency == c else 0) for c in CURRENCIES]
The drawback is that the left hand side would have to include all your variables, and CURRENCIES would have to be a list of string constants with exactly the same order as the variables on the left hand side. Like I said, this may very well break if you tamper with other parts of the program...
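If earnings is a dict, then
earnings[currency] = value
or, if it is an object with one attribute per currency,
setattr(earnings, currency, value)
(Writing earnings.currency = value would instead set an attribute literally named currency.)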
