Aggregation for List in Python - python

I have a list and I am using Counter on it as shown below.
I need to aggregate the count of movie names by year.
listt = [['1', '1995'],
['2', '1993'],
['3', '1992'],
['4', '1993'],
['5', '1995'],
['6', '1995'],
['7', '1996'],
['8', '1993'],
['9', '1992'],
['10', '1992'],
['11', '1995'],
['12', '1994'],
['13', '1995']]
c=Counter(listt[0:][3])
Edited:
In this listt, I count movie names (like 'Toy Story', 'Jumanji') by year (like '1995').
Expected result: (posted as an image, not reproduced here)

You could simply use a list comprehension to reduce each entry to just its year (you don't need either the name or the genre), and then use that as input for the Counter.
Finally, you can take advantage of the most_common method:
https://stackoverflow.com/a/27303678/5745962
https://docs.python.org/3/library/collections.html#collections.Counter
from collections import Counter
# The year is the last field of each row, so index from the end.
movies_years = [x[-1] for x in listt]
c = Counter(movies_years)
print(c.most_common(5))
# Output: [('1995', 5), ('1993', 3), ('1992', 3), ('1996', 1), ('1994', 1)]
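If the counts should be listed in year order rather than by frequency, the same Counter can be sorted by key. This is a minimal sketch, assuming each row ends with the year as in the sample listt above; the names rows and counts_by_year are made up for illustration.

```python
from collections import Counter

# Sample rows in the same shape as listt: [id, year]; the year is the last field.
rows = [['1', '1995'], ['2', '1993'], ['3', '1992'], ['4', '1993'],
        ['5', '1995'], ['6', '1995'], ['7', '1996']]

# Count how many rows fall in each year.
counts = Counter(row[-1] for row in rows)

# Sort the (year, count) pairs by year instead of by frequency.
counts_by_year = sorted(counts.items())

for year, n in counts_by_year:
    print(year, n)
```

Because the years are four-digit strings, sorting them as text gives the same order as sorting them as numbers.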

Related

Search exact string in list

I am doing an exercise where I need to search the exact function name from the fun list and get the corresponding information from another list detail.
Here is the dynamic list detail:
csvCpReportContents =[
['[PLT] rand (DEBUG INFO NOT FOUND)', '11', '15'],
['rand', '10', '11', '12'],
['__random_r', '23', '45'],
['__random', '10', '11', '12'],
[],
['multiply_matrices()','23','45'] ]
Here is the fun list, which contains the function names to search for:
fun = ['multiply_matrices()','__random_r','__random']
Expected Output for function fun[2]
['__random', '10', '11', '12']
Expected Output for function fun[1]
['__random_r', '23', '45']
Here is what I have tried for fun[2]:
for i in range(0, len(csvCpReportContents)):
    row = csvCpReportContents[i]
    if len(row)!=0:
        search1 = re.search("\\b" + str(fun[2]).strip() + "\\b", str(row))
        if search1:
            print(csvCpReportContents[i])
Please suggest to me how to search for the exact word and fetch only that information.
For each function name in fun, you can just iterate through the csv list and check whether the first element starts with it:
csvCpReportContents = [
['[PLT] rand (DEBUG INFO NOT FOUND)', '11', '15'],
['rand', '10', '11', '12'],
[],
['multiply_matrices()', '23', '45']]
fun=['multiply_matrices()','[PLT] rand','rand']
for f in fun:
    for c in csvCpReportContents:
        if len(c) and c[0].startswith(f):
            print(f'fun function {f} is in csv row {c}')
OUTPUT
fun function multiply_matrices() is in csv row ['multiply_matrices()', '23', '45']
fun function [PLT] rand is in csv row ['[PLT] rand (DEBUG INFO NOT FOUND)', '11', '15']
fun function rand is in csv row ['rand', '10', '11', '12']
Updated code, since you changed the test cases and the requirement in the question. My first answer was based on your original test cases, where you wanted to match rows that start with an item from fun. You now seem to want an exact match, falling back to a starts-with match if there is no exact one. The code below handles that scenario. Next time, please be clear in your question and don't change the criteria after several people have answered.
csvCpReportContents =[
['[PLT] rand (DEBUG INFO NOT FOUND)', '11', '15'],
['rand', '10', '11', '12'],
['__random_r', '23', '45'],
['__random', '10', '11', '12'],
[],
['multiply_matrices()','23','45'] ]
fun = ['multiply_matrices()','__random_r','__random','asd']
for f in fun:
    result = []
    for c in csvCpReportContents:
        if len(c):
            if f == c[0]:
                # An exact match always wins.
                result = c
            elif not result and c[0].startswith(f):
                # Fall back to a prefix match only if nothing was found yet.
                result = c
    if result:
        print(f'fun function {f} is in csv row {result}')
    else:
        print(f'fun function {f} is not found in csv')
OUTPUT
fun function multiply_matrices() is in csv row ['multiply_matrices()', '23', '45']
fun function __random_r is in csv row ['__random_r', '23', '45']
fun function __random is in csv row ['__random', '10', '11', '12']
fun function asd is not found in csv
The input above is a nested list, so you have to use 2D indexing. For example, given
l = [[1,2,3,4],[2,5,7,9]]
the element 3 is found at l[0][2].
With custom search_by_func_name function:
csvCpReportContents = [
['[PLT] rand (DEBUG INFO NOT FOUND)', '11', '15'],
['rand', '10', '11', '12'],
[],
['multiply_matrices()', '23', '45']]
fun = ['multiply_matrices()', '[PLT] rand', 'rand']
def search_by_func_name(name, content_list):
    for lst in content_list:
        if any(i.startswith(name) for i in lst):
            return lst
print(search_by_func_name(fun[1], csvCpReportContents)) # ['[PLT] rand (DEBUG INFO NOT FOUND)', '11', '15']
print(search_by_func_name(fun[2], csvCpReportContents)) # ['rand', '10', '11', '12']
You can also use a call_fun function, as in the code below.
def call_fun(fun_name):
    for ind,i in enumerate(csvCpReportContents):
        if i:
            if i[0].startswith(fun_name):
                return csvCpReportContents[ind]
# call_fun(fun[2])
# ['rand', '10', '11', '12']
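When only exact matches are wanted, a dictionary keyed on the first column avoids scanning the list for every name. This is a sketch rather than a drop-in replacement: rows_by_name and find_exact are hypothetical names, and it assumes function names are unique in the first column (a later duplicate would overwrite an earlier one).

```python
csvCpReportContents = [
    ['[PLT] rand (DEBUG INFO NOT FOUND)', '11', '15'],
    ['rand', '10', '11', '12'],
    ['__random_r', '23', '45'],
    ['__random', '10', '11', '12'],
    [],
    ['multiply_matrices()', '23', '45'],
]
fun = ['multiply_matrices()', '__random_r', '__random']

# Map the first column of each non-empty row to the row itself.
rows_by_name = {row[0]: row for row in csvCpReportContents if row}


def find_exact(name):
    """Return the row whose first column equals name, or None if absent."""
    return rows_by_name.get(name.strip())


print(find_exact(fun[2]))  # ['__random', '10', '11', '12']
print(find_exact(fun[1]))  # ['__random_r', '23', '45']
```

Plain string equality also sidesteps the regex problem in the question, where characters such as the parentheses in 'multiply_matrices()' are treated as pattern syntax unless escaped.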

Iterating over a list based on indices in another list [duplicate]

I have a list of strings, foo, and another list of integers, bar, which keeps track of important indices in foo.
For example:
foo = ['{}'.format(i) for i in range(1, 11)] # not necessarily of this format
bar = [0, 3, 5]
I would like to create a recipe for creating a list of lists, each list obtained by splitting foo based on indices in bar.
Expected output for the above example:
[['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9', '10']]
To achieve this, I have written the following code (here a is foo and b is bar), which works fine:
result = []
for index, value in enumerate(b):
    if index == len(b) - 1:
        result.append(a[value:])
    elif index == 0 and value != 0:
        result.append(a[0: value])
    else:
        result.append(a[value: b[index + 1]])
However, I find this code highly Non-Pythonic, thanks to my C-Java background.
I would like to know a better solution to this problem (maybe we can use itertools somehow).
You could do as follows:
In [3]: [foo[a:b] for a, b in zip(bar, bar[1:]+[None])]
Out[3]: [['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9', '10']]
Here is one way using a list comprehension:
In [107]: bar = bar + [len(foo)] if bar[-1] < len(foo) else bar
In [110]: [foo[i:j] for i, j in zip(bar, bar[1:])]
Out[110]: [['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9', '10']]
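Either one-liner can be wrapped in a small helper. The sketch below is one possible packaging, not part of the answers above: split_at is a hypothetical name, and it assumes the indices are sorted in ascending order and begin with 0 when the leading items should be kept.

```python
def split_at(items, indices):
    """Split items into consecutive slices, one starting at each index.

    Assumes indices is sorted in ascending order.
    """
    # Pair each start with the next start; None makes the last slice run to the end.
    stops = list(indices[1:]) + [None]
    return [items[start:stop] for start, stop in zip(indices, stops)]


foo = [str(i) for i in range(1, 11)]
bar = [0, 3, 5]
print(split_at(foo, bar))
# [['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9', '10']]
```

Using None as the final stop avoids modifying bar, unlike the variant that appends len(foo) to it.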

How to work out an average from items within a dict.?

I am new to python so a simplified explanation would be much appreciated!
As of now I have a dictionary that looks like this:
names = {'Bob Smith': ['5', '6', '7', '5'], 'Fred Jones': ['8', '5', '7', '5', '9'], 'James Jackson': ['5','8','8','6','5']}
I need to do the following:
Take the last three items from each of the entries in the dict. e.g. 6, 7, 5 for bob smith.
Calculate an average based upon those values. e.g. Bob smith would be 6.
List the averages in order from highest to lowest (without the dict keys).
So far I have the following enclosed in an if statement:
if method == 2:
    for scores in names.items():
        score = scores[-1,-2,-3]
        average = sum(int(score)) / float(3)
        print(average)
I had a look at this thread too but I am still stuck.
Can anyone give me some pointers?
scores[-1,-2,-3] does not get the last three elements. It looks up the single key (-1,-2,-3), which only makes sense for a dictionary and raises an error on a list or tuple. scores[-3:] would get the last three elements.
When getting the scores, you need to use names.values() instead of names.items().
The int constructor converts individual strings, not lists of strings. Using map(int, score) or [int(i) for i in score] would fix that.
The variable score is also a poor choice of name for a list of elements.
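Applied together, the corrections above give something like the following. It is a sketch of the asker's loop with those fixes, assuming every entry has at least three scores; last_three and averages are illustrative names.

```python
names = {
    'Bob Smith': ['5', '6', '7', '5'],
    'Fred Jones': ['8', '5', '7', '5', '9'],
    'James Jackson': ['5', '8', '8', '6', '5'],
}

averages = []
for scores in names.values():                   # values(), not items()
    last_three = [int(s) for s in scores[-3:]]  # slice first, then convert each string
    averages.append(sum(last_three) / 3.0)      # 3.0 forces float division

averages.sort(reverse=True)                     # highest average first
print(averages)
```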
In Python 3.4+, there is a statistics module:
>>> names = {'Bob Smith': ['5', '6', '7', '5'], 'Fred Jones': ['8', '5', '7', '5', '9'], 'James Jackson': ['5','8','8','6','5']}
>>> import statistics
>>> sorted((statistics.mean(map(int, x[-3:])) for x in names.values()), reverse=True)
[7.0, 6.333333333333333, 6.0]
names = {'Bob Smith': ['5', '6', '7', '5'], 'Fred Jones': ['8', '5', '7', '5', '9'], 'James Jackson': ['5','8','8','6','5']}
def avg(l):
    l = list(map(int,l))
    # Python 2: / on two ints floors the result, so the averages are whole numbers
    return sum(l[-3:])/3
avgs = []
for each in names.values():
    avgs.append(avg(each))
avgs.sort(reverse=True)
print avgs
Output:
[7, 6, 6]

How can i sort integers in a list, if they are in a string?

I have a controlled assessment and need to be able to order scores from a test in numerical and alphabetical order. How do I do this if each score is attached to the name of the person who completed the quiz? All names are within one list, for example ["John, 9"], ["alfie, 6"], etc.
Any help much appreciated!
If you want to sort a list of strings based on a transformation on each of these strings, you can use the function sorted with the key keyword argument:
>>> l = ['10', '9', '100', '8']
>>> sorted(l)
['10', '100', '8', '9']
>>> sorted(l, key=int)
['8', '9', '10', '100']
>>> def transformation(x):
...     return -int(x)
...
>>> sorted(l, key=transformation)
['100', '10', '9', '8']
With a key function, the strings are not compared directly; instead, the values that the function returns for each string are compared.
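For entries shaped like the ones in the question, the key function has to pull the score out of the string first. This is a sketch, assuming every entry is a single string in which the score follows the last comma; the sample data and the helper names score_of and name_of are invented for illustration.

```python
results = ["John, 9", "alfie, 6", "Zoe, 10", "bob, 6"]


def score_of(entry):
    """Return the integer score that follows the last comma."""
    return int(entry.rsplit(",", 1)[1])


def name_of(entry):
    """Return the name before the last comma, lower-cased for sorting."""
    return entry.rsplit(",", 1)[0].strip().lower()


by_score = sorted(results, key=score_of, reverse=True)  # highest score first
by_name = sorted(results, key=name_of)                  # alphabetical, ignoring case

print(by_score)
print(by_name)
```

Lower-casing inside name_of keeps "alfie" and "bob" from being sorted after every capitalised name, which is what a plain string comparison would do.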

Randomly extract x items from a list using python

Starting with two lists such as:
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
I want the user to input how many items to extract, as a percentage of the overall list length, and the same indices to be randomly extracted from each list. For example, if I wanted 50%, the output would be
newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']
I have achieved this using the following code:
from random import randrange
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
LengthOfList = len(lstOne)
print LengthOfList
PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []
HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake
for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)
print RangeOfListIndices
newlstOne = []
newlstTwo = []
for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])
print newlstOne
print newlstTwo
But I was wondering if there is a more efficient way of doing this, since my actual use case is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
Thank you
Q. I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.
A. The most straight-forward approach directly matches your specification:
import random
# Python 2 (use input() and range() in Python 3)
percentage = float(raw_input('What percentage? '))
k = int(len(list1) * percentage // 100)  # sample size must be an int
indices = random.sample(xrange(len(list1)), k)
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]
Q. in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
A. In Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method that makes multiple random choices until a bias-free result is found).
In Python 2, the random.sample() function is slightly biased but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the internal _randbelow() method and is bias-free.
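In Python 3 the same approach might look like the sketch below. subsample_pair and its seed parameter are hypothetical, added only so the example is repeatable; it assumes both lists have the same length.

```python
import random


def subsample_pair(list_one, list_two, percentage, seed=None):
    """Pick the same random positions from two equal-length lists."""
    if len(list_one) != len(list_two):
        raise ValueError("lists must have the same length")
    rng = random.Random(seed)
    k = int(len(list_one) * percentage // 100)  # number of items to keep
    # sample() draws k distinct indices, so no position is picked twice.
    indices = rng.sample(range(len(list_one)), k)
    return [list_one[i] for i in indices], [list_two[i] for i in indices]


lst_one = [str(i) for i in range(1, 11)]
lst_two = list("abcdefghij")
new_one, new_two = subsample_pair(lst_one, lst_two, 50, seed=0)
print(new_one)
print(new_two)
```

Sampling indices from a range object does not build a full list of positions, so the same call should stay cheap for a list of 145,000 items.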
Just zip your two lists together, use random.sample to do your sampling, then zip again to transpose back into two lists.
import random
_zips = random.sample(list(zip(lstOne, lstTwo)), 5)  # list() is needed in Python 3, where zip returns an iterator
new_list_1, new_list_2 = zip(*_zips)
demo:
list_1 = range(1,11)
list_2 = list('abcdefghij')
_zips = random.sample(list(zip(list_1, list_2)), 5)
new_list_1, new_list_2 = zip(*_zips)
new_list_1
Out[33]: (3, 1, 9, 8, 10)
new_list_2
Out[34]: ('c', 'a', 'i', 'h', 'j')
The way you are doing it looks mostly okay to me.
If you want to avoid sampling the same object several times, you could proceed as follows:
import random
k = int(HowManyIndicesToMake)             # desired number of items
choose_from = list(range(len(lstOne)))    # one index per item in lstOne
random.shuffle(choose_from)
for i in choose_from[:k]:                 # take the first k shuffled indices
    newlstOne.append(lstOne[i])           # same random location in both lists,
    newlstTwo.append(lstTwo[i])           # appended to the two new lists in sequence
