I am creating a function which returns the product names in order of most revenue generated. I have managed to get the function to return the totals in the correct descending order, but I am stuck trying to map the prices to the products. Is this the right way to go about solving this?
products = ["Computer", "Cell Phones", "Vacuum Cleaner"]
amounts = [3,24,8]
prices = [199,299,399]
def top3(products, amounts, prices):
    totals = []
    items = []
    for item, num1, num2 in zip(products, amounts, prices):
        totals.append(num1 * num2)
        items.append(item)
    return sorted(totals, reverse = True)
Using sorted() with a dictionary:
def top3(products, amounts, prices):
    d = dict(zip(products, zip(amounts, prices)))
    return sorted(d.keys(), key=lambda x: d[x][1] * d[x][0], reverse=True)
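I like your approach. However, your code sorts only the totals, so the product names are lost. You can keep each name together with its total and sort on the total:
def top3(products, amounts, prices):
    totals = []
    for item, num1, num2 in zip(products, amounts, prices):
        totals.append((item, num1 * num2))
    return sorted(totals, key=lambda pair: pair[1], reverse = True)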
Alternate solution using a comprehension (as the body of the same function):
totals = [(item, num1 * num2) for item, num1, num2 in zip(products, amounts, prices)]
return sorted(totals, key=lambda pair: pair[1], reverse = True)
but I am stuck trying to map the prices to the products
First, create data that includes both the product name and revenue, and then sort that. The built-in comparison for sequences (including tuples and lists) in Python compares them an element at a time (just as strings are compared a character at a time). So:
totals = sorted(
    [
        (amount * price, name)
        for name, amount, price in zip(products, amounts, prices)
    ],
    reverse=True
)
You can see that each item in the list will be a pair (2-tuple) of the revenue and name; you can do what you need with this information.
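To make the element-by-element comparison concrete, here is a minimal sketch using the question's data. The names `pairs` and `names` are illustrative and not part of the original answer. Because the revenue comes first in each pair, sorting the pairs sorts by revenue, and the name only acts as a tie-breaker.

```python
products = ["Computer", "Cell Phones", "Vacuum Cleaner"]
amounts = [3, 24, 8]
prices = [199, 299, 399]

# Tuples compare element by element, so the first element (the revenue)
# decides the order unless two revenues are equal.
assert (7176, "Cell Phones") > (3192, "Vacuum Cleaner")

pairs = sorted(
    ((amount * price, name) for name, amount, price in zip(products, amounts, prices)),
    reverse=True,
)

# Drop the revenue and keep only the names, highest revenue first.
names = [name for _revenue, name in pairs]
print(names)  # ['Cell Phones', 'Vacuum Cleaner', 'Computer']
```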
The namedtuple from Python's built-in collections module is an overlooked tool that's handy for making code like this readable. You can create a lightweight type that gives you properties like object attributes, but with the storage requirements of a simple tuple.
For example, you can create a Sales tuple type with:
Sales = namedtuple('Sales', ('name', 'amount', 'price'))
s = Sales('A Product', 20, 30.99)
# Sales(name='A Product', amount=20, price=30.99)
s.name
# 'A Product'
You can solve your problem in a nicely readable way since you can refer to the properties by attribute:
from collections import namedtuple
Sales = namedtuple('Sales', ('name', 'amount', 'price'))
products = ["Computer", "Cell Phones", "Vacuum Cleaner"]
amounts = [3,24,8]
prices = [199,299,399]
sales = [Sales(*item) for item in zip(products, amounts, prices)]
# [Sales(name='Computer', amount=3, price=199),
# Sales(name='Cell Phones', amount=24, price=299),
# Sales(name='Vacuum Cleaner', amount=8, price=399)]
#get just the names sorted by amount * price
[s.name for s in sorted(sales, key=lambda s: s.price * s.amount, reverse=True)]
# ['Cell Phones', 'Vacuum Cleaner', 'Computer']
You can merge those three lists into one list of (product name, revenue) pairs and then sort that list with sorted(). The key lambda x : x[0] sorts by product name; change it to lambda x : x[1] to sort by the revenue generated. This assumes that all lists have the same length and that the prices and amounts are in the same order as the products.
merged = [(product, amount * price) for product, amount, price in zip(products, amounts, prices)]
total = sorted(merged, key = lambda x : x[0])
In one form or another, you will want to put the products and the totals together into some kind of collection, then use the total as the key for sorting but return the name.
In every case, the key argument to sort contains a reference to a function which takes one of the elements being sorted and returns a key to use for determining the sort position of that element.
Example with list of 2-tuples. (item, total)
def top3(products, amounts, prices):
    item_totals = []
    for item, num1, num2 in zip(products, amounts, prices):
        item_totals.append((item, num1 * num2))
    item_totals.sort(key=lambda t: t[1], reverse=True)
    return [t[0] for t in item_totals]
Here, the lambda t: t[1] is used to construct a function which takes the 2-tuple and returns the second item, i.e. the total. It is equivalent to defining a little function:
def get_sort_key(t):
    return t[1]
and then passing the function:
item_totals.sort(key=get_sort_key, reverse=True)
Example with list of dictionaries
For better readability instead of using a list of 2-tuples, we could have a list of dictionaries with keys 'item' and 'total':
def top3(products, amounts, prices):
    item_totals = []
    for item, num1, num2 in zip(products, amounts, prices):
        item_totals.append({'item': item, 'total': num1 * num2})
    item_totals.sort(key=lambda t: t['total'], reverse=True)
    return [t['item'] for t in item_totals]
In that case, a function that you could use instead of the lambda can be generated by using itemgetter from the operator module. You would use:
from operator import itemgetter
and the call to sort can then be changed to:
item_totals.sort(key=itemgetter('total'), reverse=True)
Example with single dictionary
We could also just put everything into one dictionary, with the product as the key and the total as the value. But in this case, it relies on the names not being duplicated.
def top3(products, amounts, prices):
    item_totals = {}
    for item, num1, num2 in zip(products, amounts, prices):
        item_totals[item] = num1 * num2
    return sorted(item_totals.keys(),
                  key=lambda item: item_totals[item],
                  reverse=True)
There are more compact ways to write all of these, which avoid the need for the explicit for loop when building the collection prior to sorting, as demonstrated by some of the other answers, but this is the general principle.
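As one hedged illustration of such a compact form, the sketch below sorts the zipped rows directly and uses the revenue as the key. The name `ranked` is only illustrative. Like the versions above, it returns every name rather than only three; slicing the result with `[:3]` would limit it.

```python
def top3(products, amounts, prices):
    # Each row is a (name, amount, price) triple.
    rows = zip(products, amounts, prices)
    # The key computes the revenue for a row; the row itself is what gets sorted.
    ranked = sorted(rows, key=lambda row: row[1] * row[2], reverse=True)
    return [name for name, _amount, _price in ranked]


products = ["Computer", "Cell Phones", "Vacuum Cleaner"]
amounts = [3, 24, 8]
prices = [199, 299, 399]
print(top3(products, amounts, prices))  # ['Cell Phones', 'Vacuum Cleaner', 'Computer']
```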
Here is what you can do to list each product in the order of their prices, from most expensive to cheapest:
products = ["Computer", "Cell Phones", "Vacuum Cleaner"]
amounts = [3,24,8]
prices = [199,299,399]
def top3(prd, amt, prc):
    # zip the function's own parameters rather than the global lists
    lst = sorted([(price, name) for name, amount, price in zip(prd, amt, prc)], reverse = True)
    return [t[1] for t in lst]
print(top3(products, amounts, prices))
Output:
['Vacuum Cleaner', 'Cell Phones', 'Computer']
Here is what you can do to list each product in the order of how much was spent on them, most to least:
products = ["Computer", "Cell Phones", "Vacuum Cleaner"]
amounts = [3,24,8]
prices = [199,299,399]
def top3(prd, amt, prc):
    # zip the function's own parameters rather than the global lists
    lst = sorted([(price * amount, name) for name, amount, price in zip(prd, amt, prc)], reverse = True)
    return [t[1] for t in lst]
print(top3(products, amounts, prices))
Output:
['Cell Phones', 'Vacuum Cleaner', 'Computer']
Related
So I'm trying to create a function totalcalories(recipes, tcal) where recipes would refer to the ingredients that is going to be provided while tcal is the amount of calories that each ingredient contains...
I have figured out a way to find tcal from the given input but I have no idea how to extract the information from recipes into tcal
So like if the input is
`totalcalories(recipes = ["Pork Stew: Cabbage*5,Carrot*1, Fatty Pork*10",
"Green Salad1:Cabbage*10,Carrot*2,Pineapple*5",
"T-Bone: Carrot*2,Steak Meat*1"],
tcal = ["Cabbage:30", "Carrot:95", "Fatty Pork:2205",
"Pineapple:40", "Steak Meat:215", "Rabbit Meat:225"])
And so I expect the output to return
"22295","690", "405"
So 22295 is the result of the recipe Pork Stew, which is Cabbage(30) from tcal times 5, which is 150, and 1 carrot, which is 95 and 10 Fatty Pork, which is 2205 each. Adding all three numbers give 22295.
The same principle applies for every recipe in recipes, where Green Salad would return 690 and T-bone 405...
What I'm trying to do is to write a function that will return the total calories like the examples I just provided...
Here is my attempt at it... which clearly doesn't work..
def totalcalories(recipes: list[str], tcal: list[str]):
    g = []
    for x in tcal:
        if x in recipes:
            g.append(x)
    return g
print(totalcalories(["T-Bone", "T-Bone", "Green Salad1"],["Pork Stew:Cabbage*5,Carrot*1,Fatty Pork*10",
"Green Salad1:Cabbage*10,Carrot*2,Pineapple*5",
"T-Bone:Carrot*2,Steak Meat*1"]))
What should I write to make my code work...
Please write the code in the simplest way possible. I would like to take things really slow and understand what the code means so yea, please picture yourself teaching a beginner when you resolve my issue, so that I can also learn along
Thank you!
`
The input data structure is parseable (after all, there's always a way) but would be better if it was in the form of a dictionary. So, this code converts your structure into dictionaries and then goes on to show how much easier it is to process once they're in that form.
recipes = ["Pork Stew: Cabbage*5,Carrot*1, Fatty Pork*10",
"Green Salad1:Cabbage*10,Carrot*2,Pineapple*5",
"T-Bone: Carrot*2,Steak Meat*1"]
calories = ["Cabbage:30", "Carrot:95", "Fatty Pork:2205",
"Pineapple:40", "Steak Meat:215", "Rabbit Meat:225"]
def cdict(): # convert calories list to dictionary
    d = dict()
    for e in calories:
        e_ = e.split(':')
        d[e_[0]] = int(e_[1])
    return d
def rdict(): # convert recipe list to dictionary
    d = dict()
    for r in recipes:
        i = dict()
        r_ = r.split(':')
        for c_ in r_[1].split(','):
            i_ = c_.split('*')
            i[i_[0].strip()] = int(i_[1])
        d[r_[0]] = i
    return d
recipes_d = rdict()
calories_d = cdict()
for k, v in recipes_d.items():
    tcal = 0
    for k_, v_ in v.items():
        tcal += calories_d[k_] * v_
    print(f'Recipe {k} has {tcal} calories')
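If the result has to come from a single function with the signature used in the question, the same steps could be wrapped roughly like this. This is a sketch, assuming every entry is well formed ("name:number" for calories, "title:ingredient*count,..." for recipes) and that the totals should be returned as strings, as in the expected output. The local variable names are illustrative.

```python
def totalcalories(recipes, tcal):
    # Step 1: turn "Cabbage:30" into {"Cabbage": 30}.
    calories = {}
    for entry in tcal:
        name, value = entry.split(":")
        calories[name.strip()] = int(value)

    # Step 2: for each recipe, add up count * calories for every ingredient.
    totals = []
    for recipe in recipes:
        _title, ingredients = recipe.split(":")
        total = 0
        for part in ingredients.split(","):
            ingredient, count = part.split("*")
            total += calories[ingredient.strip()] * int(count)
        totals.append(str(total))
    return totals


recipes = ["Pork Stew: Cabbage*5,Carrot*1, Fatty Pork*10",
           "Green Salad1:Cabbage*10,Carrot*2,Pineapple*5",
           "T-Bone: Carrot*2,Steak Meat*1"]
tcal = ["Cabbage:30", "Carrot:95", "Fatty Pork:2205",
        "Pineapple:40", "Steak Meat:215", "Rabbit Meat:225"]
print(totalcalories(recipes, tcal))  # ['22295', '690', '405']
```

An ingredient that is missing from tcal raises a KeyError here, which is usually easier to debug than a silently wrong total.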
You can define 2 patterns: one to capture the name of the recipe, the second to capture each single ingredient in the recipe itself.
import re
pattern_recipe = re.compile(r'([^\:]*):.*')
pattern_ingredients = re.compile(r'([^\*\:,]*)\*(\d*),?', re.DOTALL)
Now process each line of your totalcalories_recipes list to create a dictionary keyed by recipe name, whose values are lists of (ingredient, quantity) pairs:
recipes_dict = {pattern_recipe.findall(x)[0]:
                [(ingredient.strip(), int(calories)) for ingredient, calories in pattern_ingredients.findall(x)]
                for x in totalcalories_recipes}
To be clear, the nested list comprehension strips spaces from each ingredient name and casts the quantity (named calories in the code) to int.
At this point, your recipes_dict should look as follows:
{'Pork Stew': [('Cabbage', 5), ('Carrot', 1), ('Fatty Pork', 10)], 'Green Salad1': [('Cabbage', 10), ('Carrot', 2), ('Pineapple', 5)], 'T-Bone': [('Carrot', 2), ('Steak Meat', 1)]}
In a similar fashion, you can get a dictionary from your tcal list:
tcal_dict = {x[0]:int(x[1]) for x in [re.findall(r"([^\:]*)\:(\d+)", x)[0] for x in tcal]}
Now you are in business! Looping through each recipe in recipes_dict, you can calculate the calories by recipe:
{recipe[0]:sum([ingredient[1] * tcal_dict[ingredient[0]] for ingredient in recipe[1]]) for recipe in recipes_dict.items()}
OUTPUT:
{'Pork Stew': 22295, 'Green Salad1': 690, 'T-Bone': 405}
I need to find a city with the highest population using regex, data is presented in such way:
data = ["id,name,poppulation,is_capital",
"3024,eu_kyiv,24834,y",
"3025,eu_volynia,20231,n",
"3026,eu_galych,23745,n",
"4892,me_medina,18038,n",
"4401,af_cairo,18946,y",
"4700,me_tabriz,13421,n",
"4899,me_bagdad,22723,y",
"6600,af_zulu,09720,n"]
I've done this so far:
def max_population(data):
    lst = []
    for items in data:
        a = re.findall(r',\S+_\S+,[0-9]+', items)
        lst += [[b for b in i.split(',') if b] for i in a]
    return max(lst, key=lambda x:int(x[1]))
But function should return (str, int) tuple, is it possible to change my code in a way that it will return tuple without iterating list once again?
All your strings are separated by a comma. You could split each string, check that the third value consists only of digits, and compare it with the population stored in the best tuple found so far.
If it is greater, set it as the new highest value.
def max_population(data):
    result = None
    for s in data:
        parts = s.split(",")
        if not parts[2].isdigit():
            continue
        tup = (parts[1], int(parts[2]))
        if result is None or tup[1] > result[1]:
            result = tup
    return result
print(max_population(data))
Output
('eu_kyiv', 24834)
The following long line gets the wanted (str, int) tuple:
def max_population(data):
    p=max([(re.findall(r"(\w*),\d*,\w$",i)[0],int(re.findall(r"(\d*),\w$",i)[0])) for n,i in enumerate(data) if n>0],key=lambda x:int(x[1]) )
    return p
In this line, enumerate(data) and n>0 are used to skip the header "id,name,poppulation,is_capital". If data has no header, the line would be:
def max_population(data):
    p=max([(re.findall(r"(\w*),\d*,\w$",i)[0],int(re.findall(r"(\d*),\w$",i)[0])) for i in data],key=lambda x:int(x[1]) )
    return p
The result for both is ('eu_kyiv', 24834)
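A somewhat more readable regex variant, offered as a sketch rather than a drop-in replacement: one compiled pattern with two capture groups matches a whole data row, so the header is skipped simply because it does not match. The pattern name `ROW` is illustrative, and the pattern assumes the last field is always `y` or `n`.

```python
import re

# id, name, population, capital flag; the header row does not start with digits.
ROW = re.compile(r"^\d+,(\w+),(\d+),[yn]$")


def max_population(data):
    matches = (ROW.match(line) for line in data)
    # Build (str, int) tuples lazily, skipping lines that did not match.
    cities = ((m.group(1), int(m.group(2))) for m in matches if m)
    # default=None avoids a ValueError when no line matches.
    return max(cities, key=lambda city: city[1], default=None)


data = ["id,name,poppulation,is_capital",
        "3024,eu_kyiv,24834,y",
        "3025,eu_volynia,20231,n",
        "3026,eu_galych,23745,n",
        "4892,me_medina,18038,n",
        "4401,af_cairo,18946,y",
        "4700,me_tabriz,13421,n",
        "4899,me_bagdad,22723,y",
        "6600,af_zulu,09720,n"]
print(max_population(data))  # ('eu_kyiv', 24834)
```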
Create a list of tuples instead of a list of lists.
import re
data = ["id,name,poppulation,is_capital",
"3024,eu_kyiv,24834,y",
"3025,eu_volynia,20231,n",
"3026,eu_galych,23745,n",
"4892,me_medina,18038,n",
"4401,af_cairo,18946,y",
"4700,me_tabriz,13421,n",
"4899,me_bagdad,22723,y",
"6600,af_zulu,09720,n"]
def max_population(data):
    lst = []
    for items in data:
        a = re.findall(r',\S+_\S+,[0-9]+', items)
        lst += [tuple(b for b in i.split(',') if b) for i in a]
    return max(lst, key=lambda x:int(x[1]))
print(max_population(data))
You could create a mapping function to map the types to the data and use the operator.itemgetter function as your key in max:
from operator import itemgetter
def f(row):
    # Use a tuple of types to cast str to the desired type
    types = (str, int)
    # slice here to get the city and population values
    return tuple(t(val) for t, val in zip(types, row.split(',')[1:3]))
# Have max consume a map on the data excluding the
# header row (hence the slice)
max(map(f, data[1:]), key=itemgetter(1))
('eu_kyiv', 24834)
def salary_sort(thing):
    def importantparts(thing):
        for i in range(1, len(thing)):
            a=thing[i].split(':')
            output = (a[1],a[0],a[8])
            sortedlist = sorted(output, key = lambda item: item[2], reverse=True)
            print(sortedlist)
    return importantparts(thing)
salary_sort(employee_data)
This function is supposed to sort out a list of names by their salary.
I managed to isolate the first last names and salaries but I can't seem to get it to sort by their salaries
'Thing' aka employee_data
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary",
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000",
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500",
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500",.... etc.]
Output
['Putie', 'Arthur', '126000']
['Kertz', 'Barbara', '268500']
['Betty', 'Boop', '14500']
['Hardy', 'Ephram', '56700']
['Fardbarkle', 'Fred', '780900']
['Igor', 'Chevsky', '23400']
['James', 'Ikeda', '45000']
['Cowan', 'Jennifer', '58900']
['Jesse', 'Neal', '500']
['Jon', 'DeLoach', '85100']
['Jose', 'Santiago', '95600']
['Karen', 'Evich', '58200']
['Lesley', 'Kirstin', '52600']
['Gortz', 'Lori', '35200']
['Corder', 'Norma', '245700']
There are a number of issues with your code, but the key one is that you are sorting each row as you create it, rather than the list of lists.
Also:
importantparts() doesn't return anything (so salary_sort() returns None).
You need to cast the Salary field to an int so that it sorts properly by value (they don't all have the same field-width, so an alphanumeric sort will be incorrect).
Finally, you don't need to use for i in range(1, len(thing)):, you can iterate directly over thing, taking a slice to remove the first element [1].
[1] Note that this last point is not wrong per se, but iterating directly over an iterable is considered more 'Pythonic'.
def salary_sort(thing):
    def importantparts(thing):
        unsortedlist = []
        for item in thing[1:]:
            a=item.split(':')
            unsortedlist.append([a[1],a[0],int(a[8])])
        print(unsortedlist)
        sortedlist = sorted(unsortedlist, key = lambda item: item[2], reverse=True)
        return (sortedlist)
    return importantparts(thing)
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary",
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000",
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500",
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500"]
print(salary_sort(employee_data))
Output:
[['Kertz', 'Barbara', 268500], ['Putie', 'Arthur', 126000], ['Boop', 'Betty', 14500]]
Your main problem is that you reset the output sequence with each new line instead of first accumulating the data and then sorting. Another problem is that your outer function declares an inner one and calls it, but the inner one does not return anything. Finally, if you sort strings without converting them to integers, you will get an alphanumeric sort: ('9', '81', '711', '6') which is probably not what you expect.
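The difference between the two orderings is easy to check with a minimal sketch; the variable names are illustrative:

```python
salaries = ['9', '81', '711', '6']

# Strings are compared character by character, so '9' sorts above '81'.
as_text = sorted(salaries, reverse=True)
print(as_text)     # ['9', '81', '711', '6']

# key=int compares the numeric values but leaves the items as strings.
as_numbers = sorted(salaries, key=int, reverse=True)
print(as_numbers)  # ['711', '81', '9', '6']
```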
By the way, the outer-inner functions pattern is of no use here, and you can use a simple direct function.
def salary_sort(thing):
    output = []
    for i in range(1, len(thing)):
        a=thing[i].split(':')
        output.append([a[1],a[0],a[8]])
    sortedlist = sorted(output, key = lambda item: int(item[2]), reverse=True)
    return sortedlist
the result is as expected:
[['Kertz', 'Barbara', '268500'], ['Putie', 'Arthur', '126000'], ['Boop', 'Betty', '14500']]
If you prefer numbers for the salaries, you do the conversion one step higher:
def salary_sort(thing):
    output = []
    for i in range(1, len(thing)):
        a=thing[i].split(':')
        output.append([a[1],a[0],int(a[8])])
    sortedlist = sorted(output, key = lambda item: item[2], reverse=True)
    return sortedlist
and the result is again correct:
[['Kertz', 'Barbara', 268500], ['Putie', 'Arthur', 126000], ['Boop', 'Betty', 14500]]
The problem is that you sort individual elements (meaning ['Putie', 'Arthur', '126000']), based on the salary value, and not the whole array.
Also, since you want to sort the salaries, you have to cast them to int, otherwise alphabetical sort is going to be used.
You can take a look at the following :
def salary_sort(thing):
    def importantparts(thing):
        data = []
        for i in range(1, len(thing)):
            a=thing[i].split(':')
            output = (a[1],a[0],int(a[8]))
            data.append(output)
        data.sort(key=lambda item: item[2], reverse=True)
        return data
    return importantparts(thing)
employee_data = ["FName LName Tel Address City State Zip Birthdate Salary", \
"Arthur:Putie:923-835-8745:23 Wimp Lane:Kensington:DL:38758:8/31/1969:126000", \
"Barbara:Kertz:385-573-8326:832 Ponce Drive:Gary:IN:83756:12/1/1946:268500", \
"Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:91464:6/23/1923:14500"]
print(salary_sort(employee_data))
Which gives, as expected :
[('Kertz', 'Barbara', 268500), ('Putie', 'Arthur', 126000), ('Boop', 'Betty', 14500)]
What I did there is push all the relevant data for the employees into a new list (named data), and then sort this list using the lambda function as the key.
I have a list of tuples in the format:
[(security, price paid, number of shares purchased)....]
[('MSFT', '$39.458', '1,000'), ('AAPL', '$638.416', '200'), ('FOSL', '$52.033', '1,000'), ('OCZ', '$5.26', '34,480'), ('OCZ', '$5.1571', '5,300')]
I want to consolidate the data. Such that each security is only listed once.
[(Name of Security, Average Price Paid, Number of shares owned), ...]
I used a dictionary as Output.
lis=[('MSFT', '$39.458', '1,000'), ('AAPL', '$638.416', '200'), ('FOSL', '$52.033', '1,000'), ('OCZ', '$5.26', '34,480'), ('OCZ', '$5.1571', '5,300')]
dic={}
for x in lis:
    if x[0] not in dic:
        price=float(x[1].strip('$'))
        nos=int("".join(x[2].split(',')))
        #print(nos)
        dic[x[0]]=[price,nos]
    else:
        price=float(x[1].strip('$'))
        nos=int("".join(x[2].split(',')))
        dic[x[0]][1]+=nos
        dic[x[0]][0]=(dic[x[0]][0]+price)/2
print(dic)
output:
{'AAPL': [638.416, 200], 'OCZ': [5.20855, 39780], 'FOSL': [52.033, 1000], 'MSFT': [39.458, 1000]}
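Note that the code above takes a plain mean of the two OCZ prices. If "average price paid" is meant to be weighted by the number of shares in each purchase (an assumption; the question does not say), a sketch could accumulate total cost and total shares per security and divide at the end. The names `totals` and `consolidated` are illustrative.

```python
lis = [('MSFT', '$39.458', '1,000'), ('AAPL', '$638.416', '200'),
       ('FOSL', '$52.033', '1,000'), ('OCZ', '$5.26', '34,480'),
       ('OCZ', '$5.1571', '5,300')]

totals = {}  # security -> [total cost, total shares]
for name, price_text, shares_text in lis:
    price = float(price_text.strip('$'))
    shares = int(shares_text.replace(',', ''))
    entry = totals.setdefault(name, [0.0, 0])
    entry[0] += price * shares
    entry[1] += shares

# Weighted average = total cost / total shares.
consolidated = [(name, cost / shares, shares)
                for name, (cost, shares) in totals.items()]
print(consolidated)
```

With this data the OCZ average comes out at roughly 5.246 instead of 5.20855, because the larger purchase carries more weight.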
It's not very clear what you're trying to do. Some example code would help, along with some information of what you've tried. Even if your approach is dead wrong, it'll give us a vague idea of what you're aiming for.
In the meantime, perhaps numpy's numpy.mean function is appropriate for your problem? I would suggest transforming your list of tuples into a numpy array and then applying the mean function on a slice of said array.
That said, it does work on any list-like data structure and you can specify along which axis you would like to perform the average.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
EDIT:
From what I've gathered, your list of tuples organizes data in the following manner:
(name, dollar amount, weight)
I'd start by using numpy to transform your list of tuples into an array. From there, find the unique values in the first column (the names):
import numpy as np
# dtype=object keeps the names as str and the numbers as numbers
a = np.array([('tag', 23.00, 5), ('tag2', 25.00, 10)], dtype=object)
unique_tags = np.unique(a[:, 0]) # note the slicing of the array: first column
Now calculate the mean for each tag
meandic = {}
for element in unique_tags:
    rows = a[a[:, 0] == element] # select the rows that are tagged with element
    meandic[element] = np.mean([t[1] * t[2] for t in rows])
Please note that this code is untested. I may have gotten small details wrong. If you can't figure something out, just leave a comment and I'll gladly correct my mistake. You'll have to remove '$' and convert strings to floats where necessary.
>>> lis
[('MSFT', '$39.458', '1,000'), ('AAPL', '$638.416', '200'), ('FOSL', '$52.033', '1,000'), ('OCZ', '$5.26', '34,480'), ('OCZ', '$5.1571', '5,300')]
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for i in lis:
... amt = float(i[1].strip('$'))
... num = int(i[2].replace(",", ""))
... d[i[0]].append((amt,num))
...
>>> for i in d.iteritems():
... average_price = sum([s[0] for s in i[1]])/len([s[0] for s in i[1]])
... total_shares = sum([s[1] for s in i[1]])
... print (i[0],average_price,total_shares)
...
('AAPL', 638.416, 200)
('OCZ', 5.20855, 39780)
('FOSL', 52.033, 1000)
('MSFT', 39.458, 1000)
Here you go:
the_list = [('msft', '$31', 5), ('msft','$32', 10), ('aapl', '$100', 1)]
clean_list = map (lambda x: (x[0],float (x[1][1:]), int(x[2])), the_list)
out = {}
for name, price, shares in clean_list:
    if not name in out:
        # store the total paid, not the unit price, so later additions are consistent
        out[name] = [price * shares, shares]
    else:
        out[name][0] += price * shares
        out[name][1] += shares
# put the output in the requested format
# not forgetting to calculate avg price paid
# out contains total price paid and total # shares
nice_out = [ (name, "$%0.2f" % (out[name][0] / out[name][1]), out[name][1])
             for name in out.keys()]
print(nice_out)
>>> [('msft', '$31.67', 15), ('aapl', '$100.00', 1)]
I have list MC below:
MC = [('GGP', '4.653B'), ('JPM', '157.7B'), ('AIG', '24.316B'), ('RX', 'N/A'), ('PFE', '136.6B'), ('GGP', '4.653B'), ('MNKD', '672.3M'), ('ECLP', 'N/A'), ('WYE', 'N/A')]
def fn(number):
    divisors = {'B': 1, 'M': 1000}
    if number[-1] in divisors:
        return (float(number[:-1]) / divisors[number[-1]])
    return number
map(fn, MC)
How do I remove B, M with fn, and sort list mc high to low.
def fn(tup):
    number = tup[1]
    divisors = {'B': 1, 'M': 1000}
    if number[-1] in divisors:
        return (tup[0], float(number[:-1]) / divisors[number[-1]])
    else:
        return tup
The problem is that that function was meant to run on a string representation of a number but you were passing it a tuple. So just pull the 1'st element of the tuple. Then return a tuple consisting of the 0'th element and the transformed 1'st element if the 1'st element is transformable or just return the tuple.
Also, I stuck an else clause in there because I find them more readable. I don't know which is more efficient.
As far as sorting goes, use sorted with a key keyword argument
either:
MC = sorted(map(fn, MC), key=lambda x: x[0])
to sort by ticker or
MC = sorted(map(fn, MC), key=lambda x: x[1] )
to sort by price. Just pass reverse=True to sorted if you want it high to low:
MC = sorted(map(fn, MC), key=lambda x: x[1], reverse=True)
You can find other nifty sorting tips here: http://wiki.python.org/moin/HowTo/Sorting/
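One caveat when running this on Python 3: after conversion the list mixes floats with the string 'N/A', and Python 3 refuses to order a float against a str, so sorting on x[1] raises a TypeError. A possible workaround, sketched here with the illustrative names `to_billions` and `sort_key`, is to map unparseable values to None and use a key that places them last.

```python
MC = [('GGP', '4.653B'), ('JPM', '157.7B'), ('AIG', '24.316B'), ('RX', 'N/A'),
      ('PFE', '136.6B'), ('GGP', '4.653B'), ('MNKD', '672.3M'),
      ('ECLP', 'N/A'), ('WYE', 'N/A')]

UNIT_DIVISORS = {'B': 1, 'M': 1000}  # express every value in billions


def to_billions(pair):
    ticker, text = pair
    if text[-1] in UNIT_DIVISORS:
        return (ticker, float(text[:-1]) / UNIT_DIVISORS[text[-1]])
    return (ticker, None)  # 'N/A' and anything else without a known suffix


def sort_key(pair):
    value = pair[1]
    # Numbers sort first, largest first (hence the negation); missing values last.
    return (1, 0.0) if value is None else (0, -value)


ranked = sorted(map(to_billions, MC), key=sort_key)
print(ranked)
```

Because the key already encodes the direction, reverse=True is not needed here; adding it would move the missing values to the front.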