I am working on an application that goes through a number of database objects, compares a string against each one, and returns the associated id, first name and last name. Currently I build a list of tuples and then populate a dictionary from it, with each key mapping to a list of values. What I want to do next is find the maximum percentage and return the first and last name associated with it. I know the description is a little confusing, so please look at the examples and code below:
# My Dictionary:
{'percent': [51.9, 52.3, 81.8, 21.0], 'first_name': ['Bob', 'Bill', 'Matt', 'John'], 'last_name': ['Smith', 'Allen', 'Naran', 'Jacobs']}
# I would want this to be returned:
percent = 81.8 (Max percentage match)
first_name = 'Matt' (First name associated with the max percentage match)
last_name = 'Naran' (Last name associated with the max percentage match)
# Code so Far:
compare_list = []
compare_dict = {}
# Builds my list of Tuples
compare_list.append(tuple(("percent", percentage)))
compare_list.append(tuple(("first_name", first_name)))
compare_list.append(tuple(("last_name", last_name)))
# Builds my Dictionary
for x, y in compare_list:
    compare_dict.setdefault(x, []).append(y)
Not sure where to go to return the first and last name associated with the Max percentage.
I really appreciate any and all help that you provide!
I hope this will help you:
data = {'percent': [51.9, 52.3, 81.8, 21.0], 'first_name': ['Bob', 'Bill', 'Matt', 'John'], 'last_name': ['Smith', 'Allen', 'Naran', 'Jacobs']}
percentage_list = data['percent']
percentage = max(percentage_list)
max_index = percentage_list.index(percentage)
first_name = data['first_name'][max_index]
last_name = data['last_name'][max_index]
# Code so Far:
compare_list = []
compare_dict = {}
# Builds my list of Tuples
compare_list.append(tuple(("percent", percentage)))
compare_list.append(tuple(("first_name", first_name)))
compare_list.append(tuple(("last_name", last_name)))
# Builds my Dictionary
for x, y in compare_list:
    compare_dict.setdefault(x, []).append(y)
print(compare_dict)
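As an alternative sketch (not part of the answer above), the three parallel lists can be zipped into rows, and the row with the largest percentage picked with max and a key function. The helper name best_match is hypothetical.

```python
def best_match(data):
    """Return (percent, first_name, last_name) for the highest percent."""
    # zip pairs up the i-th entries of the three parallel lists
    rows = zip(data['percent'], data['first_name'], data['last_name'])
    # compare rows by their first field only (the percentage)
    return max(rows, key=lambda row: row[0])


data = {'percent': [51.9, 52.3, 81.8, 21.0],
        'first_name': ['Bob', 'Bill', 'Matt', 'John'],
        'last_name': ['Smith', 'Allen', 'Naran', 'Jacobs']}

percent, first_name, last_name = best_match(data)
print(percent, first_name, last_name)
```

If several entries share the maximum percentage, max returns the first one it encounters.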
I have created the following dictionary:
Book_list={
'Fiction': {1001: ['Pride and Prejudice', 'available'],
1002: ['Fahrenheit 451', 'available']},
'Horror': {2001: ['House of leaves', 'available'],
2002: ['The Shinking', 'available']}}
Now I want to store each status (for example "available") in a variable together with its key, so that it can be used later for delete or update operations.
So the output I mean is like:
status={1001:available,1002:available,2001:available,2002:available}
Please tell me how I can get this output.
One approach is to use a dictionary comprehension:
rs = {ii : status for category in Book_list.values() for ii, (name, status) in category.items() if status == "available"}
print(rs)
Output
{1001: 'available', 1002: 'available', 2001: 'available', 2002: 'available'}
The above is equivalent to the following nested for loops:
rs = {}
for category in Book_list.values():
    for ii, (name, status) in category.items():
        if status == "available":
            rs[ii] = status
For understanding the unpacking expressions such as:
# _, category
# ii, (name, status)
you could read this link. For a general introduction to Python's data structures I suggest reading the documentation.
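A minimal sketch of the nested unpacking used in ii, (name, status), assuming every value is a two-element list as in Book_list. The variable pairs is only there for illustration.

```python
category = {1001: ['Pride and Prejudice', 'available'],
            1002: ['Fahrenheit 451', 'available']}

for ii, (name, status) in category.items():
    # ii is the key; the two-element list value is unpacked into name and status
    print(ii, name, status)

# the same unpacking written out for a single (key, value) pair
pairs = list(category.items())
ii, (name, status) = pairs[0]
```

If a value had more or fewer than two elements, the unpacking would raise a ValueError.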
def receive_available_books(Book_list):
    status = {}
    for cat in Book_list.values():
        for code, book in cat.items():
            status[code] = book[1]
    return status
Output:
{1001: 'available', 1002: 'available', 2001: 'available', 2002: 'available'}
Using Python 3.10's structural pattern matching, maybe not super appropriate/helpful for this, but I just wanted to try it :-)
rs = {}
for category in Book_list.values():
    for item in category.items():
        match item:
            case ii, [_, 'available' as status]:
                rs[ii] = status
(code adapted from Dani's)
With this code you get almost exactly the output you want:
d={}
for a in Book_list.values():
    for x,y in a.items():
        d[x]=y[1]
print("status=", d)
You could also, of course, assign d to your status variable.
This code first creates an empty dict d, then fills it with data, and finally prints it. While iterating over your object, a, x and y take different values:
a: For example {1001: ['Pride and Prejudice', 'available'], 1002: ['Fahrenheit 451', 'available']}
x: For example 1001
y: For example ['Pride and Prejudice', 'available']
y[1] would then be 'available'
I need to convert two unsorted lists into a dictionary. For example, I have a full name list:
full_name = ['taylor/swift', 'lady/gaga', 'leborn/james', 'james/harden']
And I have a last name list:
last_name = ['harden', 'james', 'swift', 'smith']
I need to convert these two lists into a dictionary:
{'harden':'james/harden', 'james':'leborn/james', 'swift':'taylor/swift'}
Notice that the two lists do not have the same length. What's more, some elements of the last name list cannot be found in the full_name list. I wrote a Python script to complete the task.
def index_match(full_index, last_index):
    last_valid = [s for s in last_index if any(s in xs for xs in full_index)]
    matcher = []
    for s in last_valid:
        for xs in full_index:
            if s in xs:
                matcher.append(xs)
                break
    return dict(zip(last_valid, matcher))
matcher = index_match(full_name, last_name)
for item in matcher.items():
    print(item)
And it works fine. But as the lists grow longer, the program runs slowly. I tried to use a dictionary comprehension to solve the problem, but I got syntax errors. How can I write the program in a more Pythonic way and improve its speed?
full_name = ['taylor/swift', 'lady/gaga', 'leborn/james', 'james/harden']
last_name = ['harden', 'james', 'swift', 'smith']
out = {l:f for l in last_name for f in full_name if f.split("/")[1] == l}
print(out)
Output:
{'harden': 'james/harden', 'james': 'leborn/james', 'swift': 'taylor/swift'}
For a faster method, first build a lookup dictionary with a dictionary comprehension, then filter it by the last names:
full_name = ['taylor/swift', 'lady/gaga', 'leborn/james', 'james/harden']
last_name = ['harden', 'james', 'swift', 'smith']
last_name_to_full_name = {f.split(r'/')[1]: f for f in full_name}
last_name_to_full_name = {L: last_name_to_full_name[L] for L in last_name
if L in last_name_to_full_name}
print(last_name_to_full_name)
# {'harden': 'james/harden', 'james': 'leborn/james', 'swift': 'taylor/swift'}
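The two-step approach above is faster because each list is traversed only once and every membership test is a dictionary lookup rather than a scan of full_name. Here is a sketch of the same idea wrapped in a function; the name match_last_names is hypothetical. It assumes every entry has the form first/last. If two entries share a last name, the later one wins.

```python
def match_last_names(full_names, last_names):
    # one pass over full_names: index every entry by the part after the slash
    by_last = {full.split('/')[1]: full for full in full_names}
    # one pass over last_names: each membership test is a dict lookup
    return {last: by_last[last] for last in last_names if last in by_last}


full_name = ['taylor/swift', 'lady/gaga', 'leborn/james', 'james/harden']
last_name = ['harden', 'james', 'swift', 'smith']

result = match_last_names(full_name, last_name)
print(result)
```

Note that this matches on the exact last name, whereas the original index_match used a substring test (s in xs), so 'james' there could also match 'james/harden'.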
I am trying to create a dict that holds the name, position and number of each player for each team. But when I try to create the final dictionary with players[team_name] = dict(zip(number,name,position)), it throws an error (see below). I can't seem to get it right; any thoughts on what I'm doing wrong here would be highly appreciated. Many thanks,
from bs4 import BeautifulSoup as soup
import requests
from lxml import html
clubs_url = 'https://www.premierleague.com/clubs'
parent_url = clubs_url.rsplit('/', 1)[0]
data = requests.get(clubs_url).text
html = soup(data, 'html.parser')
team_name = []
team_link = []
for ul in html.find_all('ul', {'class': 'block-list-5 block-list-3-m block-list-1-s block-list-1-xs block-list-padding dataContainer'}):
    for a in ul.find_all('a'):
        team_name.append(str(a.h4).split('>', 1)[1].split('<')[0])
        team_link.append(parent_url+a['href'])
team_link = [item.replace('overview', 'squad') for item in team_link]
team = dict(zip(team_name, team_link))
data = {}
players = {}
for team_name, team_link in team.items():
    player_page = requests.get(team_link)
    cont = soup(player_page.content, 'lxml')
    clud_ele = cont.find_all('span', attrs={'class' : 'playerCardInfo'})
    for i in clud_ele:
        v_number = [100 if v == "-" else v.get_text(strip=True) for v in i.select('span.number')]
        v_name = [v.get_text(strip=True) for v in i.select('h4.name')]
        v_position = [v.get_text(strip=True) for v in i.select('span.position')]
        key_number = [key for element in i.select('span.number') for key in element['class']]
        key_name = [key for element in i.select('h4.name') for key in element['class']]
        key_position = [key for element in i.select('span.position') for key in element['class']]
        number = dict(zip(key_number,v_number))
        name = dict(zip(key_name,v_name))
        position = dict(zip(key_position,v_name))
        players[team_name] = dict(zip(number,name,position))
---> 21 players[team_name] = dict(zip(number,name,position))
22
23
ValueError: dictionary update sequence element #0 has length 3; 2 is required
There are many problems in your code. The one causing the error is that you are trying to build a dictionary from 3-item tuples, which is not possible: dict() expects a sequence of 2-item (key, value) pairs. See the dict documentation for details.
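A small sketch that reproduces the error without any scraping. The sample values are taken from the output shown further down; the variable names are only for illustration.

```python
numbers = ['1', '26']
names = ['Bernd Leno', 'Emiliano Martínez']
positions = ['Goalkeeper', 'Goalkeeper']

# zip over two iterables yields 2-tuples, which dict() accepts as key/value pairs
by_number = dict(zip(numbers, names))

# zip over three iterables yields 3-tuples, which dict() rejects
try:
    dict(zip(numbers, names, positions))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# one possible workaround: keep one key and group the rest into a single value
grouped = dict(zip(numbers, zip(names, positions)))
print(grouped['1'])
```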
That said, I would suggest to rework the whole nested loop.
First, clud_ele holds a list of player info elements. Each one concerns a single player and provides exactly one position, one name and one number. So there is no need to store this information in lists; you can use simple variables:
for player_info in clud_ele:
    number = player_info.select('span.number')[0].get_text(strip=True)
    if number == '-':
        number = 100
    name = player_info.select('h4.name')[0].get_text(strip=True)
    position = player_info.select('span.position')[0].get_text(strip=True)
Here, the select method returns a list, but since you know the list contains only one item, it is fine to take that item and call get_text on it. If you want to be sure, you could check that the length of player_info.select('span.number') is actually 1 before continuing.
This way, you get scalar values, which are much easier to manipulate.
Also note that I renamed i to player_info, which is much more explicit.
Then you can easily add your player data to your players dict:
players[team_name].append({'name': name,
                           'position': position,
                           'number': number})
This assumes that you create players[team_name] before the nested loop with players[team_name] = [].
Edit: as stated in @kederrac's answer, using a defaultdict is a smart and convenient way to avoid manually creating each players[team_name] list.
Finally, this will give you:
a dictionary containing values for name, position and number keys for each player
a team list containing player dictionaries for each team
a players dictionary associating a team list with each team_name
This is the data structure you seem to want, but other structures are possible. Remember to think about your data structure so that it is logical AND easy to manipulate.
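To illustrate the defaultdict variant of this structure without any network access, here is a minimal sketch. The records list is a hypothetical stand-in for the scraped data, with values borrowed from the sample output in this thread.

```python
from collections import defaultdict

# hypothetical records standing in for the scraped player cards
records = [('Arsenal', 'Bernd Leno', 'Goalkeeper', '1'),
           ('Arsenal', 'Héctor Bellerín', 'Defender', '2')]

# a missing team key is created on first access with an empty list
players = defaultdict(list)
for team_name, name, position, number in records:
    players[team_name].append({'name': name,
                               'position': position,
                               'number': number})

print(players['Arsenal'][1]['position'])
```

With a plain dict, the same loop would need players.setdefault(team_name, []) or an explicit players[team_name] = [] before appending.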
You can't instantiate a dict from 3-item tuples. The problem is that you have 3 variables in the zip, zip(number, name, position), from which you want to instantiate a dict. You should give only 2 items at a time: the key and the value.
I've rewritten the last part of your code:
from collections import defaultdict
data = {}
players = defaultdict(list)
for team_name, team_link in team.items():
    player_page = requests.get(team_link)
    cont = soup(player_page.text, 'lxml')
    clud_ele = cont.find_all('span', attrs={'class' : 'playerCardInfo'})
    for i in clud_ele:
        num = i.select('span.number')[0].get_text(strip=True)
        number = 100 if num == '-' else num
        name = i.select('h4.name')[0].get_text(strip=True)
        position = i.select('span.position')[0].get_text(strip=True)
        players[team_name].append({'number': number, 'position': position, 'name': name})
output:
defaultdict(list,
{'Arsenal': [{'number': '1',
'position': 'Goalkeeper',
'name': 'Bernd Leno'},
{'number': '26',
'position': 'Goalkeeper',
'name': 'Emiliano Martínez'},
{'number': '33', 'position': 'Goalkeeper', 'name': 'Matt Macey'},
{'number': '2',
'position': 'Defender',
'name': 'Héctor Bellerín'},
.......................
As an exercise, I wanted to be less reliant on pandas and build a custom merge function on a list of dictionaries. Essentially, this is a left merge, where the original list is preserved and if the key has multiple matches then extra rows are added. However in my case, the extra rows appear to be added but with the exact same information.
Could anyone steer me in the right direction, as to where this code is going wrong?
def merge(self, l2, key):
    #self.data is a list of dictionaries
    #l2 is the second list of dictionaries to merge
    headers = l2[0]
    found = {}
    append_list = []
    for row in self.data:
        for row_b in l2:
            if row_b[key] == row[key] and row[key] not in found:
                found[row[key]] = ""
                for header in headers:
                    row[header] = row_b[header]
            elif row_b[key] == row[key]:
                new_row = row
                for header in headers:
                    new_row[header] = row_b[header]
                append_list.append(new_row)
    self.data.extend(append_list)
Edit: Here is some sample input, and expected output:
self.data = [{'Name':'James', 'Country':'Australia'}, {'Name':'Tom', 'Country':'France'}]
l2 = [{'Country':'France', 'Food':'Frog Legs'}, {'Country':'Australia', 'Food':'Meat Pie'},{'Country':'Australia', 'Food':'Pavlova'}]
I would expect self.data to equal the following after passing through the function, with a parameter of 'Country':
[{'Name':'James', 'Country':'Australia', 'Food':'Meat Pie'}, {'Name':'James', 'Country':'Australia', 'Food':'Pavlova'}, {'Name':'Tom', 'Country':'France', 'Food':'Frog Legs'}]
The function below takes two lists of dictionaries, where the dictionaries are expected to all have keyprop as one of their properties:
from collections import defaultdict
from itertools import product
def left_join(left_table, right_table, keyprop):
    # create a dictionary indexed by `keyprop` on the left
    left = defaultdict(list)
    for row in left_table:
        left[row[keyprop]].append(row)
    # create a dictionary indexed by `keyprop` on the right
    right = defaultdict(list)
    for row in right_table:
        right[row[keyprop]].append(row)
    # now simply iterate through the "left side",
    # grabbing rows from the "right side" if they are available
    result = []
    for key, left_rows in left.items():
        right_rows = right.get(key)
        if right_rows:
            for left_row, right_row in product(left_rows, right_rows):
                result.append({**left_row, **right_row})
        else:
            result.extend(left_rows)
    return result
sample1 = [{'Name':'James', 'Country':'Australia'}, {'Name':'Tom', 'Country':'France'}]
sample2 = [{'Country':'France', 'Food':'Frog Legs'}, {'Country':'Australia', 'Food':'Meat Pie'},{'Country':'Australia', 'Food':'Pavlova'}]
print(left_join(sample1, sample2, 'Country'))
# outputs:
# [{'Name': 'James', 'Country': 'Australia', 'Food': 'Meat Pie'},
# {'Name': 'James', 'Country': 'Australia', 'Food': 'Pavlova'},
# {'Name': 'Tom', 'Country': 'France', 'Food': 'Frog Legs'}]
In a data set where you can assume that rows are unique on the value of keyprop in their respective data sets, the implementation is quite a bit simpler:
def left_join(left_table, right_table, keyprop):
    # create a dictionary indexed by `keyprop` on the left
    left = {row[keyprop]: row for row in left_table}
    # create a dictionary indexed by `keyprop` on the right
    right = {row[keyprop]: row for row in right_table}
    # now simply iterate through the "left side",
    # grabbing rows from the "right side" if they are available
    return [{**leftrow, **right.get(key, {})} for key, leftrow in left.items()]
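A likely reason the merge in the question produced identical extra rows (this is my reading of the question's code, not something stated in the answer above) is that new_row = row does not copy the dictionary. Both names refer to the same object, so writing to new_row also overwrites row. A minimal sketch of the difference, with made-up variable names:

```python
row = {'Name': 'James', 'Country': 'Australia', 'Food': 'Meat Pie'}

alias = row                  # no copy: both names refer to the same dict
alias['Food'] = 'Pavlova'
food_after_alias = row['Food']   # the "original" row changed too

row = {'Name': 'James', 'Country': 'Australia', 'Food': 'Meat Pie'}

copied = dict(row)           # shallow copy: a new, independent dict
copied['Food'] = 'Pavlova'
food_after_copy = row['Food']    # the original row is untouched

print(food_after_alias, food_after_copy)
```

The left_join functions above avoid the problem because {**left_row, **right_row} builds a new dictionary for every result row.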
def salary_sort(thing):
    def importantparts(thing):
        for i in range(1, len(thing)):
            a=thing[i].split(':')
            output = (a[1],a[0],a[8])
            sortedlist = sorted(output, key = lambda item: item[2], reverse=True)
            print(sortedlist)
    return importantparts(thing)
salary_sort(employee_data)
This function is supposed to sort a list of names by salary.
I managed to isolate the first names, last names and salaries, but I can't seem to get it to sort by salary.
'Thing' aka employee_data
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary",
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000",
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500",
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500",.... etc.]
Output
['Putie', 'Arthur', '126000']
['Kertz', 'Barbara', '268500']
['Betty', 'Boop', '14500']
['Hardy', 'Ephram', '56700']
['Fardbarkle', 'Fred', '780900']
['Igor', 'Chevsky', '23400']
['James', 'Ikeda', '45000']
['Cowan', 'Jennifer', '58900']
['Jesse', 'Neal', '500']
['Jon', 'DeLoach', '85100']
['Jose', 'Santiago', '95600']
['Karen', 'Evich', '58200']
['Lesley', 'Kirstin', '52600']
['Gortz', 'Lori', '35200']
['Corder', 'Norma', '245700']
There are a number of issues with your code, but the key one is that you are sorting each row as you create it, rather than the list of lists.
Also:
importantparts() doesn't return anything (so salary_sort() returns None).
You need to cast the Salary field to an int so that it sorts properly by value (the salaries don't all have the same field width, so an alphanumeric sort will be incorrect).
Finally, you don't need to use for i in range(1, len(thing)):, you can iterate directly over thing, taking a slice to remove the first element [1].
[1] Note that this last point is not wrong per se, but iterating directly over an iterable is considered more 'Pythonic'.
def salary_sort(thing):
    def importantparts(thing):
        unsortedlist = []
        for item in thing[1:]:
            a=item.split(':')
            unsortedlist.append([a[1],a[0],int(a[8])])
        print(unsortedlist)
        sortedlist = sorted(unsortedlist, key = lambda item: item[2], reverse=True)
        return (sortedlist)
    return importantparts(thing)
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary",
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000",
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500",
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500"]
print(salary_sort(employee_data))
Output:
[['Kertz', 'Barbara', 268500], ['Putie', 'Arthur', 126000], ['Boop', 'Betty', 14500]]
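To see why the cast to int matters, here is a small sketch comparing a text sort with a numeric sort on a few of the salary strings from the question:

```python
salaries = ['126000', '268500', '14500', '500']

# strings compare character by character, so '500' sorts above '268500'
as_text = sorted(salaries, reverse=True)

# key=int compares the numeric values but keeps the original strings
as_numbers = sorted(salaries, key=int, reverse=True)

print(as_text)     # ['500', '268500', '14500', '126000']
print(as_numbers)  # ['268500', '126000', '14500', '500']
```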
Your main problem is that you reset the output sequence on each new line instead of first accumulating the data and then sorting it. Another problem is that your outer function declares an inner one and calls it, but the inner one does not return anything. Finally, if you sort strings without converting them to integers, you get an alphanumeric sort, for example ('9', '81', '711', '6') in descending order, which is probably not what you expect.
By the way, the outer-inner function pattern is of no use here; you can use a simple direct function.
def salary_sort(thing):
    output = []
    for i in range(1, len(thing)):
        a=thing[i].split(':')
        output.append([a[1],a[0],a[8]])
    sortedlist = sorted(output, key = lambda item: int(item[2]), reverse=True)
    return sortedlist
The result is as expected:
[['Kertz', 'Barbara', '268500'], ['Putie', 'Arthur', '126000'], ['Boop', 'Betty', '14500']]
If you prefer numbers for the salaries, you do the conversion one step higher:
def salary_sort(thing):
    output = []
    for i in range(1, len(thing)):
        a=thing[i].split(':')
        output.append([a[1],a[0],int(a[8])])
    sortedlist = sorted(output, key = lambda item: item[2], reverse=True)
    return sortedlist
and the result is again correct:
[['Kertz', 'Barbara', 268500], ['Putie', 'Arthur', 126000], ['Boop', 'Betty', 14500]]
The problem is that you sort individual elements (meaning ['Putie', 'Arthur', '126000']), based on the salary value, and not the whole array.
Also, since you want to sort the salaries, you have to cast them to int, otherwise alphabetical sort is going to be used.
You can take a look at the following :
def salary_sort(thing):
    def importantparts(thing):
        data = []
        for i in range(1, len(thing)):
            a=thing[i].split(':')
            output = (a[1],a[0],int(a[8]))
            data.append(output)
        data.sort(key=lambda item: item[2], reverse=True)
        return data
    return importantparts(thing)
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary", \
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000", \
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500", \
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500"]
print(salary_sort(employee_data))
Which gives, as expected :
[('Kertz', 'Barbara', 268500), ('Putie', 'Arthur', 126000), ('Boop', 'Betty', 14500)]
What I did there was push all the relevant data for each employee into a new list (named data), and then sort that list using the lambda function as the key.
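For comparison, the same steps (skip the header, split each record, cast the salary, sort) can be sketched more compactly. The function name salary_sort_compact is hypothetical, and the sketch assumes every record has at least nine colon-separated fields with a numeric salary last.

```python
def salary_sort_compact(lines):
    # skip the header line and split each record on ':'
    fields = (line.split(':') for line in lines[1:])
    # keep (last name, first name, salary as int)
    rows = [(f[1], f[0], int(f[8])) for f in fields]
    # highest salary first
    return sorted(rows, key=lambda row: row[2], reverse=True)


employee_data = ["FName LName Tel Address City State Zip Birthdate Salary",
                 "Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000",
                 "Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500",
                 "Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500"]

ranked = salary_sort_compact(employee_data)
print(ranked)
```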