I am creating a dictionary with "Full Name": "Birthday" for numerous people as an exercise.
The program should ask
"Whose birthday do you want to look up?"
I will input a name, say "Benjamin Franklin"
And it will return his birthday: 1706/01/17.
Alright, the problem I am encountering is name capitalization.
How can I input "benjamin franklin" and still find "Benjamin Franklin" in my dictionary? I am familiar with the .lower() and .upper() methods, but I have not been able to implement them correctly. Is that the right way to approach this problem?
Here is what I have
bday_dict = {"Person1": "YYYY/MM/DD1",
             "Person2": "YYYY/MM/DD2",
             "Benjamin Franklin": "1706/01/17"}
def get_name(dict_name):
    name = input("Whose birthday do you want to look up? > ")
    return name
def find_bday(name):
    print(bday_dict[name])
find_bday(get_name(bday_dict))
The best way to do this is to keep the keys in your dictionary lowercase. If you can't do that for whatever reason, have a dictionary from lowercase to the real key, and then keep the original dictionary.
Otherwise, Kraigolas's solution works well, but it is O(N) whereas hashmaps are supposed to be constant-time, and thus for really large dictionaries the other answer will not scale.
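So, when you are setting your keys, do bday_dict[name.lower()] = value, and then you can query with bday_dict[user_input.lower()], where user_input is the string the user typed (input itself is the built-in function and has no .lower()).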
Alternatively:
bday_dict = {"John": 1}
name_dict = {"john": "John"}
def access(x):
    return bday_dict[name_dict[x.lower()]]
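The lowercase-to-real-key dictionary does not have to be typed by hand. Here is a minimal sketch, assuming the original dictionary is called bday_dict as in the question; name_index and lookup_bday are hypothetical names introduced only for this example:

```python
bday_dict = {"Person1": "YYYY/MM/DD1",
             "Benjamin Franklin": "1706/01/17"}

# Build the lowercase -> original-key index once (a single O(N) pass).
name_index = {name.lower(): name for name in bday_dict}


def lookup_bday(name):
    """Return (original_name, birthday), or None if the name is unknown."""
    key = name_index.get(name.strip().lower())
    if key is None:
        return None
    return key, bday_dict[key]


print(lookup_bday("benjamin franklin"))  # ('Benjamin Franklin', '1706/01/17')
```

Each lookup is then two constant-time dictionary accesses, and the original capitalization is still available for display. The index has to be updated whenever a name is added to bday_dict.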
Probably the most straightforward way I can think of to solve this is the following:
def get_birthday(name):
    global bday_dict
    for person, bday in bday_dict.items():
        if name.lower() == person.lower():
            return bday
    return "This person is not in bday_dict"
Here, you just iterate through the entire dictionary using the person's name paired with their birthday, and if we don't find them, just return a message saying we don't have their birthday.
If you know that all names will capitalize the first letter of each word, you can just use:
name = ' '.join([word.capitalize() for word in name.split()])
and then search for that. This assumption does not always hold, though. For example, it fails for "Leonardo da Vinci", so the loop above is probably the most reliable way to do this.
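To see why the capitalization shortcut is fragile, the snippet below wraps the same expression in a helper (capitalize_words is a hypothetical name used only here) and applies it to both kinds of name:

```python
def capitalize_words(name):
    """Capitalize the first letter of every whitespace-separated word."""
    return ' '.join(word.capitalize() for word in name.split())


print(capitalize_words("benjamin franklin"))  # Benjamin Franklin
print(capitalize_words("leonardo da vinci"))  # Leonardo Da Vinci
```

The second result would miss a dictionary key spelled "Leonardo da Vinci", because the lowercase particle has been capitalized as well.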
One final way to do this would be to just store the names as lowercase from the beginning in your dictionary, but this might not be practical when you want to draw a name from the dictionary as well.
Depending on what your exercise allows, I would put the names in the dictionary as all lowercase or uppercase. So:
bday_dict = {"person1": "YYYY/MM/DD1",
             "person2": "YYYY/MM/DD2",
             "benjamin franklin": "1706/01/17"}
And then look up the entered name in the dictionary like this:
def find_bday(name):
    print(bday_dict[name.lower()])
You may also want to do a check that the name is in the dictionary beforehand to avoid an error:
def find_bday(name):
    bday = bday_dict.get(name.lower(), None)
    if bday:
        print(bday)
    else:
        print("No result for {}.".format(name))
I have a dataframe in which one column ('entity') contains various names of countries and non-state entities. I need to clean the column because the string values (provided by manual data entry) are all lower-case (china instead of China). I can't just perform the .title() operation on the column, since there are string values that I want left untouched (e.g., al Something should not be turned into Al Something).
I'm having trouble creating a function to help me with this problem and could use some guidance from the community. In the past I've used dictionaries to map/replace incorrect strings with correct strings, and I can still revert to that way of doing things, but I thought creating this function might be more straightforward and efficient, and I wanted to challenge myself. However, no change occurs in the entity column when I execute the function. Thanks in advance!
myString = ['al Group1', 'al Group2']
entities = df['entity']
def title_fix(entities):
    new_titles = []
    for entity in entities:
        if entity in myString:
            new_titles.append(myString)
        else:
            new_title.append(entity.title())
    return new_title
title_fix(df)
The entities in the line entities = df['entity'] is not the same variable as the entities in the line def title_fix(entities):. This second entities variable is the argument to the function title_fix, and it exists only within the function. It takes on whatever argument you pass into your call to title_fix, which is df.
Try this instead of your function:
# A list of entity names to leave alone (must exactly match character-for-character)
myString = ['al Group1', 'al Group2']
# Apply title case to every entity NOT in myString
df['entity'] = df['entity'].apply(lambda x: x if x in myString else x.title())
# Print the modified DataFrame
df
Note that this solution requires that each string in myString exactly matches the target string in df['entity'], otherwise the target string will not be replaced.
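If exact character-for-character matching is too strict, one option is to compare normalized forms instead. The sketch below is plain Python and rests on an assumption that is not in the question: that exempt names should be recognized regardless of case and surrounding whitespace. ignored_normalized and fix_title are hypothetical names.

```python
ignored = ['al Group1', 'al Group2']

# Normalized forms of the exempt names, built once for fast membership tests.
ignored_normalized = {name.strip().lower() for name in ignored}


def fix_title(entity):
    """Title-case entity unless its normalized form is on the exempt list."""
    if entity.strip().lower() in ignored_normalized:
        return entity  # exempt: returned exactly as it was entered
    return entity.title()


print([fix_title(e) for e in ['china', 'al Group1', 'AL GROUP2', 'south korea']])
# ['China', 'al Group1', 'AL GROUP2', 'South Korea']
```

With a DataFrame, the same function can be passed to apply, as in df['entity'] = df['entity'].apply(fix_title).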
Your code had several bugs, such as spelling and indentation. Fixed code:
myString = ['al Group1', 'al Group2']
entities = df['entity']
def title_fix(entities):
    new_titles = []
    for entity in entities:
        if entity in myString:
            new_titles.append(entity)
        else:
            new_titles.append(entity.title())
    return new_titles
df['entity'] = title_fix(entities)
However, what you want to achieve can be done in a one-liner. I came up with 3 solutions. I don't know pandas that well and I have no idea about the performance differences between these solutions, but here they are.
ignored makes a little bit more sense than myString so I'll use it.
ignored = ['al Group1', 'al Group2']
First solution:
df['entity'] = df['entity'].apply(lambda x: x.title() if x not in ignored else x)
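Second (note that this uses chained indexing, which pandas warns about and which may leave df unchanged when copy-on-write is enabled; the third form avoids that):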
df.entity[~df.entity.isin(ignored)] = df.entity.str.title()
Third:
df.loc[~df.entity.isin(ignored), 'entity'] = df.entity.str.title()
import collections
header_dict = {'account number':'ACCOUNT_name','accountID':'ACCOUNT_name','name':'client','first name':'client','tax id':'tin'}
#header_dict = collections.defaultdict(lambda: 'tin') # attempted use of defaultdict...destroys my dictionary
given_header = ['account number','name','tax id']#,'tax identification number']#,'social security number'
#given_header = ['account number','name','tax identification number']...non working header layout
fileLayout = [header_dict[ting] for ting in given_header if ting] #create if else..if ting exists, add to list...else if not in list, add to dictionary
def getLayout(ting):
    global given_header
    global fileLayout
    return given_header[fileLayout.index(ting)]
print getLayout('ACCOUNT_name')
print getLayout('client')
print getLayout('tin')
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
I am working with many files of random, mixed up layouts/column orders. I have a set template for my db table of 'ACCOUNT_name','client','tin' that I want the files to be ordered in. I have created a dictionary of the possible header/column names I might find in other files as keys and my set header names as values. So, for example, if I wanted to see where to put the column 'account number' from one of my given files, I would type header_dict['account number'].
This would give me the corresponding column from my template, 'ACCOUNT_name'. This works great. I also added another feature: instead of having to type 'account number', I made a list comprehension that looks up each value by key.
The 'fileLayout' list comprehension essentially transforms my given file's header into my desired names: ['ACCOUNT_name','client','tin']
That makes life a lot easier, because I know that I want to look up 'ACCOUNT_name', 'client', or 'tin'. Next I run a function, 'getLayout', that returns the header in the given file that corresponds to the desired column. So if I want to see where my desired column 'ACCOUNT_name' is in the file, I just call the function like this:
getLayout('ACCOUNT_name')
Now at this point, I can easily print the columns to my order...with:
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
The above code gives me [('account number',), ('name',), ('tax id',)], which is exactly what I want...
But what if there is a new header I am not used to? Let's use the same example code above but change the list 'given_header' to this:
given_header = ['account number','name','tax identification number']
As expected, I get a key error: KeyError: 'tax identification number'. I know I can use defaultdict, but when I try to use it with the default value 'tin', I end up overwriting my entire dictionary. What I would ultimately like to do is this:
I would like to create an else within my list comprehension that lets me enter dictionary entries through standard input if they don't exist. In other words, since 'tax identification number' does not exist as a key, add it to my dict and give it the value 'tin' via raw_input. Has anyone ever done or tried anything like this? If you have any suggestions, I am all ears. I'm struggling with this issue.
The way I would want to go about this is in the list comprehension:
fileLayout = [header_dict[ting] for ting in given_header if ting else raw_input('add missing key value pair to dictionary')] # or do something of the sort.
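The comprehension above is not valid syntax, and prompting inside a comprehension gets awkward quickly, so one possible shape is an ordinary loop in a helper. This is only a sketch under stated assumptions: it is written for Python 3 (input instead of raw_input), and build_layout and its ask parameter are hypothetical names introduced here so the prompt can be swapped out.

```python
def build_layout(given_header, header_dict, ask=input):
    """Map each file header to its template column, asking about unknown ones."""
    layout = []
    for heading in given_header:
        if heading not in header_dict:
            # Unknown header: ask which template column it belongs to and
            # store the answer so the same header is recognized next time.
            answer = ask("Template column for %r? > " % heading)
            header_dict[heading] = answer.strip()
        layout.append(header_dict[heading])
    return layout


header_dict = {'account number': 'ACCOUNT_name', 'name': 'client', 'tax id': 'tin'}
given_header = ['account number', 'name', 'tax identification number']

# A stand-in for keyboard input so the example runs unattended;
# omit the ask argument to be prompted for real.
file_layout = build_layout(given_header, header_dict, ask=lambda prompt: 'tin')
print(file_layout)  # ['ACCOUNT_name', 'client', 'tin']
```

If a fixed fallback value is enough and no prompt is needed, collections.defaultdict(lambda: 'tin', header_dict) seeds the defaultdict with the existing entries instead of replacing them.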
I need to sort the catalog results by multiple fields.
In my case, first sort by year, then by month. The year and month field are included in my custom content type (item_publication_year and item_publication_month respectively).
However, I'm not getting the results that I want. The year and month are not ordered at all. They should appear in descending order i.e. 2006, 2005, 2004 etc.
Below is my code:
def queryItemRepository(self):
    """
    Perform a search returning items matching the criteria
    """
    query = {}
    portal_catalog = getToolByName(self, 'portal_catalog')
    folder_path = '/'.join( self.context.getPhysicalPath() )
    query['portal_type'] = "MyContentType"
    query['path'] = {'query' : folder_path, 'depth' : 2 }
    results = portal_catalog.searchResults(query)
    # convert the results to a python list so we can use the sort function
    results = list(results)
    results.sort(lambda x, y : cmp((y['item_publication_year'], y['item_publication_year']),
                                   (x['item_publication_month'], x['item_publication_month'])
                                  ))
    return results
Anyone care to help?
A better bet is to use the key parameter for sorting (add reverse=True if you want descending order, as in the question):
results.sort(key=lambda b: (b.item_publication_year, b.item_publication_month))
You can also use the built-in sorted() function instead of calling list() first. It returns a new sorted list, and it is the same amount of work for Python as calling list() on the results and then sorting:
results = portal_catalog.searchResults(query)
results = sorted(results, key=lambda b: (b.item_publication_year, b.item_publication_month))
Naturally, both item_publication_year and item_publication_month need to be present in the catalog metadata.
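Since the question asks for descending order, here is a small self-contained sketch of the same key-based sort with reverse=True. The SimpleNamespace objects are stand-ins for catalog brains, and the integer year and month values are assumptions made for the example; if the metadata is stored as strings, the comparison would be lexical instead.

```python
from types import SimpleNamespace

# Stand-ins for catalog brains, exposing the two metadata columns as attributes.
results = [
    SimpleNamespace(item_publication_year=2005, item_publication_month=3),
    SimpleNamespace(item_publication_year=2006, item_publication_month=1),
    SimpleNamespace(item_publication_year=2006, item_publication_month=11),
    SimpleNamespace(item_publication_year=2004, item_publication_month=7),
]

# Tuples compare element by element: year first, then month as the tie-breaker.
ordered = sorted(
    results,
    key=lambda b: (b.item_publication_year, b.item_publication_month),
    reverse=True,
)
print([(b.item_publication_year, b.item_publication_month) for b in ordered])
# [(2006, 11), (2006, 1), (2005, 3), (2004, 7)]
```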
You can get multi-field sorting straight from the catalog search by using advanced query; see also its official docs.
Python newb here looking for some assistance...
For a variable number of dicts in a python list like:
list_dicts = [
{'id':'001', 'name':'jim', 'item':'pencil', 'price':'0.99'},
{'id':'002', 'name':'mary', 'item':'book', 'price':'15.49'},
{'id':'002', 'name':'mary', 'item':'tape', 'price':'7.99'},
{'id':'003', 'name':'john', 'item':'pen', 'price':'3.49'},
{'id':'003', 'name':'john', 'item':'stapler', 'price':'9.49'},
{'id':'003', 'name':'john', 'item':'scissors', 'price':'12.99'},
]
I'm trying to find the best way to group dicts where the value of key "id" is equal, then add/merge any unique key:value and create a new list of dicts like:
list_dicts2 = [
{'id':'001', 'name':'jim', 'item1':'pencil', 'price1':'0.99'},
{'id':'002', 'name':'mary', 'item1':'book', 'price1':'15.49', 'item2':'tape', 'price2':'7.99'},
{'id':'003', 'name':'john', 'item1':'pen', 'price1':'3.49', 'item2':'stapler', 'price2':'9.49', 'item3':'scissors', 'price3':'12.99'},
]
So far, I've figured out how to group the dicts in the list with:
myList = itertools.groupby(list_dicts, operator.itemgetter('id'))
But I'm struggling with how to build the new list of dicts to:
1) Add the extra keys and values to the first dict instance that has the same "id"
2) Set the new name for "item" and "price" keys (e.g. "item1", "item2", "item3"). This seems clunky to me, is there a better way?
3) Loop over each "id" match to build up a string for later output
I've chosen to return a new list of dicts only because of the convenience of passing a dict to a templating function where setting variables by a descriptive key is helpful (there are many vars). If there is a cleaner more concise way to accomplish this, I'd be curious to learn. Again, I'm pretty new to Python and in working with data structures like this.
Try to avoid complex nested data structures. I believe people tend to
grok them only while they are intensively using the data structure. After the
program is finished, or is set aside for a while, the data structure quickly
becomes mystifying.
Objects can be used to retain or even add richness to the data structure in a saner, more organized way. For instance, it appears the item and price always go together. So the two pieces of data might as well be paired in an object:
class Item(object):
    def __init__(self, name, price):
        self.name = name
        self.price = price
Similarly, a person seems to have an id and name and a set of possessions:
class Person(object):
    def __init__(self, id, name, *items):
        self.id = id
        self.name = name
        self.items = set(items)
If you buy into the idea of using classes like these, then your list_dicts could become
list_people = [
    Person('001','jim',Item('pencil',0.99)),
    Person('002','mary',Item('book',15.49)),
    Person('002','mary',Item('tape',7.99)),
    Person('003','john',Item('pen',3.49)),
    Person('003','john',Item('stapler',9.49)),
    Person('003','john',Item('scissors',12.99)),
]
Then, to merge the people based on id, you could use Python's reduce function (a built-in in Python 2, functools.reduce in Python 3), along with take_items, which takes (merges) the items from one person and gives them to another:
def take_items(person, other):
    '''
    person takes other's items.
    Note however, that although person may be altered, other remains the same --
    other does not lose its items.
    '''
    person.items.update(other.items)
    return person
Putting it all together:
from functools import reduce  # built in on Python 2, needs this import on Python 3
import itertools
import operator
class Item(object):
    def __init__(self, name, price):
        self.name = name
        self.price = price
    def __str__(self):
        return '{0} {1}'.format(self.name, self.price)
class Person(object):
    def __init__(self, id, name, *items):
        self.id = id
        self.name = name
        self.items = set(items)
    def __str__(self):
        # Build a real list so the output reads the same on Python 2 and 3.
        return '{0} {1}: {2}'.format(self.id, self.name, [str(item) for item in self.items])
list_people = [
    Person('001','jim',Item('pencil',0.99)),
    Person('002','mary',Item('book',15.49)),
    Person('002','mary',Item('tape',7.99)),
    Person('003','john',Item('pen',3.49)),
    Person('003','john',Item('stapler',9.49)),
    Person('003','john',Item('scissors',12.99)),
]
def take_items(person, other):
    '''
    person takes other's items.
    Note however, that although person may be altered, other remains the same --
    other does not lose its items.
    '''
    person.items.update(other.items)
    return person
# groupby only merges adjacent entries, so list_people must already be ordered by id.
list_people2 = [reduce(take_items, g)
                for k, g in itertools.groupby(list_people, operator.attrgetter('id'))]
for person in list_people2:
    print(person)
This looks very much like a homework problem.
As the above poster mentioned, there are a few more appropriate data structures for this kind of data. Some variant on the following might be reasonable:
[ ('001', 'jim', [('pencil', '0.99')]),
('002', 'mary', [('book', '15.49'), ('tape', '7.99')]),
('003', 'john', [('pen', '3.49'), ('stapler', '9.49'), ('scissors', '12.99')])]
This can be made with the relatively simple:
list2 = []
for person_id, group in itertools.groupby(list_dicts, operator.itemgetter('id')):
    idList = list(group)
    list2.append((person_id, idList[0]['name'], [(z['item'], z['price']) for z in idList]))
The interesting thing about this question is the difficulty in extracting 'name' when using groupby, without iterating past the item.
To get back to the original goal though, you could use code like this (as the OP suggested):
list3 = []
for id, name, itemList in list2:
    newitem = {'id': id, 'name': name}
    for index, items in enumerate(itemList):
        newitem['item' + str(index + 1)] = items[0]
        newitem['price' + str(index + 1)] = items[1]
    list3.append(newitem)
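The two steps can also be folded into one function. One caveat is worth making explicit: itertools.groupby only groups adjacent records, so the sketch below sorts by id first. merge_by_id is a hypothetical name, and the sample data is a shortened, deliberately unsorted version of the list from the question.

```python
import itertools
import operator

list_dicts = [
    {'id': '002', 'name': 'mary', 'item': 'book', 'price': '15.49'},
    {'id': '001', 'name': 'jim', 'item': 'pencil', 'price': '0.99'},
    {'id': '002', 'name': 'mary', 'item': 'tape', 'price': '7.99'},
]


def merge_by_id(records):
    """Merge records sharing an 'id' into one dict with numbered item/price keys."""
    get_id = operator.itemgetter('id')
    merged = []
    # sorted() is stable, so items keep their original relative order per id.
    for person_id, group in itertools.groupby(sorted(records, key=get_id), get_id):
        group = list(group)
        row = {'id': person_id, 'name': group[0]['name']}
        for n, record in enumerate(group, start=1):
            row['item%d' % n] = record['item']
            row['price%d' % n] = record['price']
        merged.append(row)
    return merged


for row in merge_by_id(list_dicts):
    print(row)
```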
I imagine it would be easier to combine the items in list_dicts into something that looks more like this:
list_dicts2 = [{'id':1, 'name':'jim', 'items':[{'itemname':'pencil','price':'0.99'}]},
               {'id':2, 'name':'mary', 'items':[{'itemname':'book','price':'15.49'}, {'itemname':'tape','price':'7.99'}]}]
You could also use a list of tuples for 'items' or perhaps a named tuple.
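As a rough illustration of that nested layout, the sketch below builds it from a shortened copy of the question's list_dicts, using a named tuple for each entry in 'items'. The Item name and its field names are assumptions chosen for this example, and the ids are kept as the original strings.

```python
import collections
import itertools
import operator

Item = collections.namedtuple('Item', ['itemname', 'price'])

list_dicts = [
    {'id': '001', 'name': 'jim', 'item': 'pencil', 'price': '0.99'},
    {'id': '002', 'name': 'mary', 'item': 'book', 'price': '15.49'},
    {'id': '002', 'name': 'mary', 'item': 'tape', 'price': '7.99'},
]

get_id = operator.itemgetter('id')
nested = []
# Sort first: groupby only groups records that sit next to each other.
for person_id, group in itertools.groupby(sorted(list_dicts, key=get_id), get_id):
    group = list(group)
    nested.append({
        'id': person_id,
        'name': group[0]['name'],
        'items': [Item(r['item'], r['price']) for r in group],
    })

for person in nested:
    print(person['name'], [item.itemname for item in person['items']])
```

A template can then loop over person['items'] instead of probing for item1, item2, and so on.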