Finding highest value in a dictionary - python

I'm new to programming and currently taking a CSC 110 class. Our assignment is to create a bunch of functions that do all sorts of things with some data that is given. I have taken all that data and put it into a dictionary, but I'm having some trouble getting the data I want out of it.
Here is my problem:
I have a dictionary that stores a bunch of countries, each followed by a list that includes its population and GDP, formatted something like this:
{'country': [population, GDP], ...}
My task is to loop through this, find the country with the highest population or GDP, and then print:
print('The country with the highest population is ' + highCountry +
      ' with a population of ' + format(highPop, ',.0f') + '.')
In order to do this I wrote this function (this one is specifically for highest population but they all look about the same).
def highestPop(worldInfo):
    highPop = worldInfo[next(iter(worldInfo))][0]  # Grabs the first country's population
    highCountry = next(iter(worldInfo))  # Grabs the first country in worldInfo
    for k, v in worldInfo.items():
        if v[0] > highPop:
            highPop = v[0]
            highCountry = k
    return highPop, highCountry
While this is working for me I gotta think there is an easier way to do this. Also I'm not 100% sure how [next(iter(worldInfo))] works. Does this just grab the first value it sees?
Thanks for your help in advance!
Edit: Sorry, I guess I wasn't clear. I need to return the country's population but also the country's name, so I can print both of them in my main function.

I think you're looking for this:
max(worldInfo.items(), key=lambda x: x[1][0])
This will return both the country name and its info. For instance:
('france', [100, 22])
The max() function works on Python "iterables", which is a fancy word for anything that can be cycled or looped through. It loops through the thing you pass in and returns the item that's the highest.
But how does it judge which tuple is highest? Which is higher: France or Germany? You have to specify a key (a specification of how to judge each item). The key=lambda ... part specifies a function that, given an item x, judges that item by x[1][0]. In this instance, if the item is ('france', [100, 22]), then x[1][0] is 100. So the x[1][0] values of all items are compared, and the item with the highest one is returned.
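A minimal sketch of this answer dropped into the question's highestPop, assuming a small made-up worldInfo (the country names and numbers are hypothetical). It unpacks the (country, [population, GDP]) pair that max() returns, so the function can hand both the population and the name back to the caller, as the edit asked:

```python
# Hypothetical sample data in the {'country': [population, GDP]} shape from the question
worldInfo = {
    'france': [100, 22],
    'germany': [80, 35],
    'italy': [60, 18],
}

def highestPop(worldInfo):
    # max() walks the (country, [population, GDP]) pairs and judges each by x[1][0]
    highCountry, (highPop, _gdp) = max(worldInfo.items(), key=lambda x: x[1][0])
    # Same return order as the original loop-based version
    return highPop, highCountry

highPop, highCountry = highestPop(worldInfo)
print('The country with the highest population is ' + highCountry +
      ' with a population of ' + format(highPop, ',.0f') + '.')
```

Swapping x[1][0] for x[1][1] gives the GDP version without writing a second loop.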
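The next() and iter() functions are for Python iterators: iter() builds an iterator from an iterable, and each call to next() returns the iterator's next item. For example: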
mytuple = ("apple", "banana", "cherry")
myit = iter(mytuple)
print(next(myit)) #=> apple
print(next(myit)) #=> banana
print(next(myit)) #=> cherry
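Applied to a dictionary, iter() iterates over the keys, so next(iter(worldInfo)) returns the first key, i.e. the first country that was inserted (dicts keep insertion order as of Python 3.7). A small sketch with hypothetical data:

```python
worldInfo = {'france': [100, 22], 'germany': [80, 35]}  # hypothetical data

it = iter(worldInfo)        # iterating a dict yields its keys
first_key = next(it)        # 'france', the first key inserted
second_key = next(it)       # 'germany'
# An exhausted iterator raises StopIteration, unless next() is given a default
leftover = next(it, None)   # None
```

So worldInfo[next(iter(worldInfo))][0] in the question is simply the population of whichever country was added first.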

Use the max() function, like so:
max(item[0] for item in country_dict.values())  # use item[1] for GDP!
Also try storing the values not in a list ([a, b]) but in a tuple ((a, b)).
Edit: Like iamanigeeit said in the comments, this works to give you the country name as well:
max((data[0], country) for country, data in country_dict.items())
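A sketch of this tuple-based variant, using hypothetical data. Tuples compare element by element, so the population decides the max and the country name only matters on a tie (the alphabetically later name wins). The result comes back as (population, country), the same order the question's highestPop returns:

```python
country_dict = {'france': [100, 22], 'germany': [80, 35], 'italy': [60, 18]}  # hypothetical

# (population, country) tuples: compared by population first, then by name
highPop, highCountry = max((data[0], country) for country, data in country_dict.items())
# Same idea for GDP, which sits at index 1
highGDP, richestCountry = max((data[1], country) for country, data in country_dict.items())
```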

An efficient solution to get the key with the highest value: you can use the max function this way:
highCountry = max(worldInfo, key=lambda k: worldInfo[k][0])
The key argument is a function that specifies which value you want to use to determine the max; here, each country's population.
Then you can look up the population itself:
highPop = worldInfo[highCountry][0]

Related

Python Max function - Finding highest value in a dictionary

My question is about finding highest value in a dictionary using max function.
I have a created dictionary that looks like this:
cc_GDP = {'af': 1243738953, 'as': 343435646, etc}
I would like to be able to simply find and print the highest GDP value for each country.
My best attempt, having read through similar questions, is as follows (I'm currently working through the Python Crash Course book, from which the base of this code has been taken; note that the get_country_code function simply provides 2-letter abbreviations for the countries in the GDP_data json file):
import json
#Load the data into a list
filename = 'gdp_data.json'
with open(filename) as f:
    gdp_data = json.load(f)
cc_GDP = {}
for gdp_dict in gdp_data:
    if gdp_dict['Year'] == 2016:
        country_name = gdp_dict['Country Name']
        GDP_total = int(gdp_dict['Value'])
        code = get_country_code(country_name)
        if code:
            cc_GDP[code] = int(GDP_total)
print(max(cc_GDP, key=lambda key: cc_GDP[key][1]))
This provides the following error 'TypeError: 'int' object is not subscriptable'
Note: if I leave out the [1] in the print call, this does print the key that has the highest value, but not the highest value itself, which is what I want to achieve.
Any help would be appreciated.
So you currently extract the key of the country that has the highest value with this line:
country_w_highest_val = max(cc_GDP, key=lambda key: cc_GDP[key])
You can of course just look that up in the dictionary again:
highest_val = cc_GDP[country_w_highest_val]
But it is simpler to disregard the keys completely and just find the highest of all values in the dictionary:
highest_val = max(cc_GDP.values())
How about something like this:
print(max(cc_GDP.values()))
That will give you the highest value but not the key.
The error is raised because each value in cc_GDP is a plain int, so cc_GDP[key][1] tries to index into an integer. Remove the [1] and then use the following line:
print(cc_GDP[max(cc_GDP, key=lambda key: cc_GDP[key])])
Your code currently just returns the dictionary key. You need to plug this key back into the dictionary to get the GDP.
You could use the .items() method of dict to get key-value pairs (tuples) and process them the following way:
cc_GDP = {'af': 1243738953, 'as': 343435646}
m = max(list(cc_GDP.items()), key=lambda x:x[1])
print(m) #prints ('af', 1243738953)
In this case m is a 2-tuple: you can access the key 'af' via m[0] and the value 1243738953 via m[1].
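A variant of the same idea that unpacks the pair directly. It swaps the lambda for operator.itemgetter(1), which picks the value out of each (key, value) pair in the same way; the data is the sample dictionary from the question:

```python
from operator import itemgetter

cc_GDP = {'af': 1243738953, 'as': 343435646}

# itemgetter(1) behaves like lambda x: x[1] on each (key, value) pair
top_code, top_gdp = max(cc_GDP.items(), key=itemgetter(1))
print(top_code, top_gdp)  # af 1243738953
```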

python nested for loop iteration

I would like the inner for loop to give me one value and then move on to the next iteration of the outer loop. I appreciate your responses.
Currently the result is:
Personal NameJohn
Personal NamePeter
Personal Name123456
ID #John
ID #Peter
ID #123456
Emergency contactJohn
Emergency contactPeter
Emergency contact123456
Result should just be:
ID #123456
Personal NameJohn
Emergency contactPeter
employees={'ID #','Personal Name','Emergency contact'}
excel={'123456',
'John',
'Peter'}
for key in employees:
    for value in excel:
        print(key + value)
Use zip to iterate over two objects at the same time.
Note: you are using sets (created using {"set", "values"}) here. Sets have no guaranteed order, so you should use lists (created using ["list", "values"]) instead.
for key, value in zip(employees, excel):
    print(key + value)
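A short sketch of what zip() does with the two lists once they are lists (values taken from the question). It pairs items purely by position and stops at the shortest input, which is why the order of both inputs has to line up:

```python
employees = ['ID #', 'Personal Name', 'Emergency contact']
excel = ['123456', 'John', 'Peter']

# zip() pairs items by position: first with first, second with second, ...
pairs = list(zip(employees, excel))
for key, value in pairs:
    print(key + value)  # ID #123456, then Personal NameJohn, then Emergency contactPeter

# zip() stops at the shortest input, so a missing value silently drops a row
short_pairs = list(zip(employees, excel[:2]))  # only two pairs
```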
You can use zip after changing the type of your input data. Sets do not keep the order of their original content, so zipping them can produce incorrect pairings:
employees=['ID #','Personal Name','Emergency contact']
excel=['123456', 'John','Peter']
new_data = [a+b for a, b in zip(employees, excel)]
Output:
['ID #123456', 'Personal NameJohn', 'Emergency contactPeter']
First of all, use square brackets [] instead of curly brackets {}. Then you could use zip() (see other answers) or use something very basic like this:
for i in range(len(employees)):
    print(employees[i], excel[i])

Simplifying a list into categories

I am a new Python developer and was wondering if someone can help me with this. I have a dataset with one column that describes a company type. I noticed that the column lists, for example, both surgical and surgery, and also eyewear, eyeglasses and optometry. So instead of having a huge list in this column, I want to simplify the categories: if a value contains "eye", "glasses" or "opto", just change it to "Eyewear". My initial code looks like this:
def map_company(row):
    company = row['SIC_Desc']
    if company in 'Surgical':
        return 'Surgical'
    elif company in ['Eye', 'glasses', 'opthal', 'spectacles', 'optometers']:
        return 'Eyewear'
    elif company in ['Cotton', 'Bandages', 'gauze', 'tape']:
        return 'First Aid'
    elif company in ['Dental', 'Denture']:
        return 'Dental'
    elif company in ['Wheelchairs', 'Walkers', 'braces', 'crutches', 'ortho']:
        return 'Mobility equipments'
    else:
        return 'Other'
df['SIC_Desc'] = df.apply(map_company, axis=1)
This is not correct though because it is changing every item into "Other," so clearly my syntax is wrong. Can someone please help me simplify this column that I am trying to relabel?
Thank you
It is hard to answer without having the exact content of your data set, but I can see one mistake. According to your description, it seems you are looking at this the wrong way: company in [...] only checks whether the whole description is exactly equal to one of the list items, so a description like "Eyeglasses" never matches. You want one of the words to appear inside your company description, so it should look like this:
if any(test in company for test in ['Eye', 'glasses', 'opthal', 'spectacles', 'optometers']):
    return 'Eyewear'
However, you might have a case issue here, so I would recommend:
company = row['SIC_Desc'].lower()
if any(test.lower() in company for test in ['Eye', 'glasses', 'opthal', 'spectacles', 'optometers']):
    return 'Eyewear'
You will also need to make sure company is a string and 'SIC_Desc' is a correct column name.
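A small sketch contrasting the two checks on a hypothetical description string, to show why the original version falls through to "Other". It also shows that the question's company in 'Surgical' runs in the opposite direction from what was intended:

```python
keywords = ['eye', 'glasses', 'opthal', 'spectacles', 'optometers']
company = 'Eyeglasses and Spectacles'.lower()  # hypothetical description

# `in` on a list tests whether the WHOLE description equals one of the items
whole_match = company in keywords                            # False
# any() tests whether ANY keyword appears INSIDE the description
substring_match = any(word in company for word in keywords)  # True

# The question's `company in 'Surgical'` is reversed: it asks whether the
# description is a substring of the single word 'Surgical'
reversed_check = 'surgical instruments' in 'surgical'        # False
```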
In the end your function will look like this:
def is_match(company, names):
    return any(name in company for name in names)
def map_company(row):
    company = row['SIC_Desc'].lower()
    if 'surgical' in company:
        return 'Surgical'
    elif is_match(company, ['eye', 'glasses', 'opthal', 'spectacles', 'optometers']):
        return 'Eyewear'
    elif is_match(company, ['cotton', 'bandages', 'gauze', 'tape']):
        return 'First Aid'
    else:
        return 'Other'
Here is an option using a reversed dictionary.
Code
import pandas as pd
# Sample DataFrame
s = pd.Series(["gauze", "opthal", "tape", "surgical", "eye", "spectacles",
               "glasses", "optometers", "bandages", "cotton", "glue"])
df = pd.DataFrame({"SIC_Desc": s})
df
LOOKUP = {
    "Eyewear": ["eye", "glasses", "opthal", "spectacles", "optometers"],
    "First Aid": ["cotton", "bandages", "gauze", "tape"],
    "Surgical": ["surgical"],
    "Dental": ["dental", "denture"],
    "Mobility": ["wheelchairs", "walkers", "braces", "crutches", "ortho"],
}
REVERSE_LOOKUP = {v: k for k, lst in LOOKUP.items() for v in lst}
def map_company(row):
    company = row["SIC_Desc"].lower()
    return REVERSE_LOOKUP.get(company, "Other")
df["SIC_Desc"] = df.apply(map_company, axis=1)
df
Details
We define a LOOKUP dictionary with (key, value) pairs of expected output and associated words, respectively. Note that the values are lowercase to simplify searching. Then we use a reversed dictionary to automatically invert the key-value pairs and improve the search performance, e.g.:
>>> REVERSE_LOOKUP
{'bandages': 'First Aid',
'cotton': 'First Aid',
'eye': 'Eyewear',
'gauze': 'First Aid',
...}
Notice these reference dictionaries are created outside the mapping function to avoid rebuilding them on every call to map_company(). Finally, the mapping function quickly returns the desired output from the reversed dictionary by calling .get(), a method that returns the default argument "Other" if no entry is found.
See Flynsee's insightful answer for an explanation of what is happening in your code. This code is cleaner than a bevy of conditional statements.
Benefits
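Since we have used dictionaries, lookups should be relatively fast: O(1) on average for a dict, compared to O(n) for in on a list. Moreover, the main LOOKUP dictionary is easy to adapt, with no need to hand-write extensive conditional statements for new entries. Note that .get() only matches descriptions that are exactly equal to one of the keywords (after lowercasing), so a description such as "eyeglasses retail" still maps to "Other".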
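If the real descriptions contain the keywords as substrings rather than matching them exactly, one option is to keep this answer's LOOKUP table but combine it with the any() substring test from the first answer. A sketch under that assumption (categorize is a hypothetical helper name; the first category whose keyword matches wins, in the dict's insertion order, and a value like "surgery" is not caught by the "surgical" keyword unless you add it):

```python
LOOKUP = {
    "Eyewear": ["eye", "glasses", "opthal", "spectacles", "optometers"],
    "First Aid": ["cotton", "bandages", "gauze", "tape"],
    "Surgical": ["surgical"],
    "Dental": ["dental", "denture"],
    "Mobility": ["wheelchairs", "walkers", "braces", "crutches", "ortho"],
}

def categorize(description, lookup=LOOKUP, default="Other"):
    # str() guards against non-string cells such as None or NaN
    text = str(description).lower()
    # Return the first category whose keyword appears anywhere in the text
    for category, keywords in lookup.items():
        if any(word in text for word in keywords):
            return category
    return default

# With pandas, the column could then be mapped with:
# df["SIC_Desc"] = df["SIC_Desc"].map(categorize)
```

This gives up the O(1) dictionary lookup in exchange for substring matching, so it scans every keyword for each row.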

How to get 'specific' keys with maximum value in dictionary?

My dict is:
rec_Dict = {'000000000500test.0010': -103,
'000000000500test.0012': -104,
'000000000501test.0015': -105,
'000000000501test.0017': -106}
I know how to find maximum value:
>>> print 'max:' + str(max(rec_Dict.iteritems(), key=operator.itemgetter(1)))
max:(u'000000000500test.0010', -103)
But I want to find keys beginning with '000000000501test', but not including '000000000501test.0015' or any starting with '000000000500test'.
It should print like:
max:(u'000000000501test.0015', -105)
How can I filter on the keys to get this?
I can't understand the conditions you want to use to filter the keys, but you can use the script below (just fix the conditions):
generator_filter = ((a, b) for a, b in rec_Dict.iteritems() if '.0015' not in a and '000000000500test.' not in a)
# (you need to fix the filter conditions for the keys)
print 'max:' + str(max(generator_filter, key=lambda x: x[1]))
Separating responsibilities to achieve the final result: first find the max based on exactly the part of the key you want to match on, then use that max to output the matching key/value pairs. Granted, some will argue that this is not the most optimized or the most functional way to go about it. But, personally, I find it works just fine and achieves the result with good enough performance. Furthermore, it is more readable and easier to test.
Get the max based on the part of the string you want by extracting keys and finding the max:
max_key_substr = max(i.split('.')[0] for i in rec_Dict)
Iterate with that max_key_substr and output the key/value pair:
for key, value in rec_Dict.items():
    if max_key_substr in key:
        print(key, value)
The output will be:
000000000501test.0015 -105
000000000501test.0017 -106
What you say it should print doesn't make sense, because the key '000000000501test.0015' should have been excluded according to other things you said.
Ignoring that, you could use a generator expression to sift out the items you don't want processed:
from operator import itemgetter
rec_Dict = {'000000000500test.0010': -103,
            '000000000500test.0012': -104,
            '000000000501test.0015': -105,
            '000000000501test.0017': -106}
def get_max(items):
    def sift(record):
        key, value = record
        return key.startswith('000000000501') and not key.endswith('.0015')
    max_record = max((item for item in items if sift(item)), key=itemgetter(1))
    return max_record
print(get_max(rec_Dict.items()))  # -> ('000000000501test.0017', -106)
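A Python 3 sketch of the same filtering idea with the prefix and excluded suffix pulled out as parameters (their names are hypothetical). It also passes default=None to max(), available since Python 3.4, so a filter that matches nothing returns None instead of raising ValueError:

```python
from operator import itemgetter

rec_Dict = {'000000000500test.0010': -103,
            '000000000500test.0012': -104,
            '000000000501test.0015': -105,
            '000000000501test.0017': -106}

def get_max(items, prefix='000000000501', excluded_suffix='.0015'):
    # Keep only keys with the wanted prefix that don't end with the excluded suffix
    candidates = ((k, v) for k, v in items
                  if k.startswith(prefix) and not k.endswith(excluded_suffix))
    # default=None avoids ValueError when nothing survives the filter
    return max(candidates, key=itemgetter(1), default=None)

print(get_max(rec_Dict.items()))  # ('000000000501test.0017', -106)
```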

How to Re-arrange items in a Python Dictionary during For Loop?

I am building a Python dictionary from a table in Excel. It's a Category:Name relationship. So, the first column in the spreadsheet is a category and the second column is the name of a file:
Forests - Tree Type
Forests - Soil Type
Administrative - Cities
Administrative - Buildings
Mineral - Gold
Mineral - Platinum
Water - Watershed
Water - Rivers
Water - Lakes
Water - Streams
and so on...
I use this code to build the dictionary:
layerListDict = dict()
for row in arcpy.SearchCursor(xls):
    # Set condition to pull out the Name field in the xls file.
    # LayerList being the list of all the 'Name' from the 'Name' column built earlier in the script
    if str(row.getValue("Name")).rstrip() in layerList:
        # Determine if the category item is in the dictionary as a key already. If so, then append the Name to the list of values associated with the category
        if row.getValue("Category") in layerListDict:
            layerListDict[row.getValue("Category")].append(str(row.getValue("Name")))
        # if not, create a new category key and add the associated Name value to it
        else:
            layerListDict[row.getValue("Category")] = [str(row.getValue("Name"))]
So, now I have a dictionary with Category as the key and a list of Names as the values:
{u'Forests': ['Tree Type', 'Soil Type'], u'Administrative': ['Cities', 'Buildings'], u'Mineral': ['Gold', 'Platinum'], u'Water': ['Watershed', 'Rivers', 'Lakes', 'Streams']}
I can now iterate over the sorted dictionary by key:
for k, v in sorted(layerListDict.iteritems()):
    print k, v
PROBLEM: What I would like to do is iterate over the sorted dictionary with one caveat: I want the 'Mineral' key to be the very first key, followed by the rest of the keys in alphabetical order, like this:
Mineral ['Gold', 'Platinum']
Administrative ['Cities', 'Buildings']
Forests ['Tree Type', 'Soil Type']
Water ['Watershed', 'Rivers', 'Lakes', 'Streams']
Can anyone suggest how I can accomplish this?
I tried to set a variable to a sorted list, but it returns a Python list, and I cannot iterate over the list by key-value pair anymore.
List2 = sorted(layerListDict.iteritems())
[(u'Administrative', ['Cities', 'Buildings']), (u'Forests', ['Tree Type', 'Soil Type']), (u'Mineral', ['Gold', 'Platinum']), (u'Water', ['Watershed', 'Rivers', 'Lakes', 'Streams'])]
print "Mineral", layerListDict.pop("Mineral")
for k, v in sorted(layerListDict.iteritems()):
    print k, v
If you don't want to modify layerListDict:
print "Mineral", layerListDict["Mineral"]
for k, v in sorted(layerListDict.iteritems()):
    if k != "Mineral":
        print k, v
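Another option, sketched in Python 3 syntax with the sample data from the question: a single sorted() call with a tuple key. Since False sorts before True, the key (k != 'Mineral', k) puts 'Mineral' first and orders the remaining categories alphabetically, without popping or skipping anything:

```python
layerListDict = {'Forests': ['Tree Type', 'Soil Type'],
                 'Administrative': ['Cities', 'Buildings'],
                 'Mineral': ['Gold', 'Platinum'],
                 'Water': ['Watershed', 'Rivers', 'Lakes', 'Streams']}
first = 'Mineral'

# (False, 'Mineral') sorts before every (True, name), then names compare alphabetically
ordered = sorted(layerListDict.items(), key=lambda kv: (kv[0] != first, kv[0]))
for k, v in ordered:
    print(k, v)
```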
An overly general solution:
import itertools
first = 'Mineral'
for k, v in itertools.chain([(first, layerListDict[first])],
                            ((k, v) for (k, v) in sorted(layerListDict.iteritems()) if k != first)):
    print k, v
or closer to my original incorrect solution:
for k in itertools.chain((first,),
                         (k for k in sorted(layerListDict)
                          if k != first)):
    print k, layerListDict[k]
If you're just looking to print the key-value pairs, then the other solutions get the job done quite well. If you're looking for the resulting dictionary to have a certain order so that you can perform other operations on it, you should look into the OrderedDict class:
https://docs.python.org/2/library/collections.html#collections.OrderedDict
Objects are stored in the order that they are inserted. In your case, you would do something similar to the other answers first to define the order:
dict_tuples = sorted(layerListDict.items())
ordered_tuples = [("Mineral", layerListDict["Mineral"],)]
ordered_tuples += [(k, v,) for k, v in dict_tuples if k != "Mineral"]
ordered_dict = collections.OrderedDict(ordered_tuples) #assumes import happened above
Now you can do whatever you want with ordered_dict (careful with deleting then reinserting, see the link above). Don't know if that helps you more than some of the other answers (which are all pretty great!).
EDIT: Whoops, my recollection of the update behavior of OrderedDicts was a bit faulty. Fixed above. Also streamlined the code a little. You could potentially generate the tuples in your first for loop and then put them in the OrderedDict, too.
EDIT 2: Forgot that tuples are naturally sorted by the first element (thanks John Y), removed the unnecessary key param in the sorted() call.
Keep a list of keys in the order you want to iterate over the map. Then iterate through the list, using the values as keys into the map.
Actually, after seeing the other solutions, I like chepner's answer with itertools.chain() better, especially if the list of keys is large, because mine will move things around in the list too much.
# sort the keys
keyList = sorted(layerListDict.keys())
# remove 'Mineral' from its place
keyList.remove('Mineral')
# Put it at the beginning
keyList = ['Mineral'] + keyList
# Iterate
for k in keyList:
    print k, layerListDict[k]
Second shot at an answer. This is pretty different from my original, and makes some possibly wrong assertions, but I like the feel of it a lot better. Since you're able to determine all of the values in the "name" column (layerList), I'm going to assume you can do the same for the "categories" column. This code assumes you've placed your categories (including "Mineral") into an unsorted list called categories, and replaces the original code:
import collections
categories.sort()
categories = ["Mineral"] + [cat for cat in categories if cat != "Mineral"]
# Insert the categories into our dict with placeholder lists that we can append to
layerListDict = collections.OrderedDict([(cat, [],) for cat in categories])
for row in arcpy.SearchCursor(xls):
    if str(row.getValue("Name")).rstrip() in layerList:
        layerListDict[row.getValue("Category")].append(str(row.getValue("Name")))
Now you can just iterate over layerListDict.items().
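For Python 3.7+, where plain dicts keep insertion order, a sketch of the grouping and ordering without OrderedDict. It swaps collections.defaultdict in for the question's if/else grouping. The rows list is a hypothetical stand-in for the (Category, Name) values read from arcpy.SearchCursor, and the layerList filtering step is left out:

```python
from collections import defaultdict

# Hypothetical (category, name) rows standing in for the spreadsheet contents
rows = [('Forests', 'Tree Type'), ('Forests', 'Soil Type'),
        ('Administrative', 'Cities'), ('Administrative', 'Buildings'),
        ('Mineral', 'Gold'), ('Mineral', 'Platinum'),
        ('Water', 'Watershed'), ('Water', 'Rivers')]

grouped = defaultdict(list)
for category, name in rows:
    grouped[category].append(name)  # a missing key starts out as an empty list

first = 'Mineral'
ordered_keys = [first] + sorted(k for k in grouped if k != first)
# Plain dicts preserve insertion order (Python 3.7+), so this keeps the custom order
layerListDict = {k: grouped[k] for k in ordered_keys}

for k, v in layerListDict.items():
    print(k, v)
```

If 'Mineral' happens to be absent, grouped[first] quietly yields an empty list rather than raising KeyError, because grouped is a defaultdict.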
