Say I have data containing an item in the first column, followed by its frequency count in the second column:
Item Frequency.Count
A 5
B 4
C 3
D 2
E 1
But I want the output to be like:
Data
A
A
A
A
A
.
.
.
C
C
C
D
D
E
This is somewhat the reverse of the following code:
my_list = sorted(word_freq.items(), key = lambda x:x[1], reverse = True)
for word, freq in my_list:
    print("%-10s %d" % (word, freq))
You might think this is a silly approach to frequency analysis, but I wanted to learn whether there is a reverse operation for counting frequencies. Does anyone have an idea about "unsorting" the given data, given that there are two columns? Thanks so much for the advice.
Input list: l = [['A', 5], ['B', 3], ['C', 1]]
out_put=[]
for item, count in l:
    out_put.extend([item] * count)
out_put : ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C']
If you loop over the items and repeat each one as many times as its frequency says, you are done, I guess. Please try this approach.
Is this what you want to do? I think this is what @Navin Dalal meant (not sure though).
with
> l
[['A', 5],
['B', 3],
['C', 1]]
you can get what you want:
> list("".join([i*j for i, j in l]))
['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C']
The key part being:
[i*j for i, j in l]
['AAAAA', 'BBB', 'C']
This works because multiplying a string by an integer repeats it that many times.
Hope this helps.
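One caveat with the join-based one-liner above: list("".join(...)) splits the result into single characters, so it only works when every item is one character long. A minimal sketch that also handles longer items is below. The helper name expand and the sample pairs are made up for illustration; itertools.repeat, itertools.chain.from_iterable and collections.Counter are standard library.

```python
from collections import Counter
from itertools import chain, repeat


def expand(pairs):
    """Repeat each item `count` times, keeping the input order."""
    return list(chain.from_iterable(repeat(item, count) for item, count in pairs))


# Hypothetical sample data with multi-character items.
pairs = [("apple", 2), ("kiwi", 1), ("fig", 3)]
expanded = expand(pairs)
print(expanded)

# Counting the expanded list recovers the original frequencies,
# which is the sense in which this is the "reverse" of counting.
recounted = Counter(expanded)
print(recounted["fig"])
```

If the frequencies are already stored in a Counter, its elements() method produces the same kind of expansion.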
Related
I have a list of lists from which I want to remove the elements that occur at most twice across all the sublists.
For example:
A = [['a','b','c'],['b','d'],['c','d','e'],['c','e','f'],['b','c','e','g']]
I want to remove 'a', 'd', 'f', 'g' from A and store the rest in B so that the list becomes:
B = [['b','c'],['b'],['c','e'],['c','e'],['b','c','e']]
I created a dictionary which will store all the count of elements and based on that I want to remove the elements with count less than or equal to 2.
Below is the code which I have written so far.
for i in range(len(A)):
    for words in A[i]:
        word_count[words] +=1
B = [A[i] for i in range(len(A)) if word_count[words]<2]
You can use collections.Counter:
from collections import Counter
import itertools
A = [['a','b','c'],['b','d'],['c','d','e'],['c','e','f'],['b','c','e','g']]
c = Counter(itertools.chain(*A))
new_a = [[b for b in i if c[b] > 2] for i in A]
Output:
[['b', 'c'], ['b'], ['c', 'e'], ['c', 'e'], ['b', 'c', 'e']]
Before you add a new key to the dictionary, you have to check if the key exists. If not, just add the key to the dictionary. Otherwise, update the key's value.
A = [['a','b','c'],['b','d'],['c','d','e'],['c','e','f'],['b','c','e','g']]
word_count = {}
for i in range(len(A)):
    for words in A[i]:
        if words not in word_count:
            word_count[words] = 0
        word_count[words] += 1
Then filter the initial list using the created dictionary.
B = [[x for x in A[i] if word_count[x] > 2] for i in range(len(A))]
print(B)
Output
[['b', 'c'], ['b'], ['c', 'e'], ['c', 'e'], ['b', 'c', 'e']]
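As a possible variation on the dictionary approach, dict.get with a default of 0 can replace the explicit "is the key already there?" check. This is only a sketch: the function name drop_rare and the min_count parameter are introduced here for illustration and are not part of the original answers.

```python
def drop_rare(lists, min_count=3):
    """Keep only elements that occur at least `min_count` times overall."""
    counts = {}
    for sublist in lists:
        for element in sublist:
            # get() returns 0 for a key that has not been seen yet.
            counts[element] = counts.get(element, 0) + 1
    return [[e for e in sublist if counts[e] >= min_count] for sublist in lists]


A = [['a', 'b', 'c'], ['b', 'd'], ['c', 'd', 'e'], ['c', 'e', 'f'], ['b', 'c', 'e', 'g']]
B = drop_rare(A)
print(B)
```

Making the threshold a parameter keeps the "more than 2" rule from the question as the default while allowing other cut-offs.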
Suppose I have list
l = ['a', 'c', 'b']
and what I want is a list where those elements appear twice, one after the other, so
['a', 'a', 'c', 'c', 'b', 'b']
and I want to do this in the most pythonic way possible.
My half solution is doing something like
[[l[i], l[i]] for i in range(len(l))]
which yields
[['a', 'a'], ['c', 'c'], ['b', 'b']]
From here, I'd have to parse (walk) the list to remove the inner lists and obtain a single flat list.
Does anyone have a better idea for doing this in one go? Obviously things like l * 2 wouldn't help, as that gives ['a', 'c', 'b', 'a', 'c', 'b'] and I want the same elements adjacent.
n = 2
l_2 = [item for item in l for _ in range(n)]
Link to origin: Stackoverflow: Repeating elements of a list n times
Using only list comprehension, you can do:
[i for j in my_list for i in [j]*2]
Output:
>>> my_list = ['a', 'c', 'b']
>>> [i for j in my_list for i in [j]*2]
['a', 'a', 'c', 'c', 'b', 'b']
You can zip the list against itself, then flatten it in a list comprehension.
>>> [i for j in zip(l,l) for i in j]
['a', 'a', 'c', 'c', 'b', 'b']
You can use zip function
l = ['a', 'c', 'b']
a = [i for j in zip(l,l) for i in j]
print(a)
Output
['a', 'a', 'c', 'c', 'b', 'b']
More general:
def ntimes(iterable, times=2):
    for elt in iterable:
        for _ in range(times):
            yield elt
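Here is a short solution without a list comprehension, using the intuitive idea l*2 (it relies on the elements of l being unique, since l.index returns the position of the first match):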
sorted(l*2, key=l.index)
#['a', 'a', 'c', 'c', 'b', 'b']
If you like functional approaches, you can do this:
from itertools import chain, tee
l = ['a', 'c', 'b']
n = 2
list(chain.from_iterable(zip(*tee(l, n))))
While this might not perform as fast as the other answers, it can easily be used for arbitrary iterables (especially when they are infinite or when you don't know when they end) by omitting list().
(Note that some of the other answers can also be adapted for arbitrary iterables by replacing their list comprehension by a generator expression.)
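To illustrate that last note, here is a minimal sketch of a generator-expression version applied to an infinite iterable. The name repeat_each is made up for this example; itertools.count and itertools.islice are standard library.

```python
from itertools import count, islice


def repeat_each(iterable, n=2):
    """Lazily yield every item of `iterable` n times in a row."""
    return (item for item in iterable for _ in range(n))


# count(1) never ends, so only a slice of the lazy result is materialized.
first_six = list(islice(repeat_each(count(1)), 6))
print(first_six)  # [1, 1, 2, 2, 3, 3]

print(list(repeat_each(['a', 'c', 'b'])))  # ['a', 'a', 'c', 'c', 'b', 'b']
```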
This is a very simple question but I can't seem to understand why I am not getting it.
def listindex():
    li = ['a', 'e', 'a', 'd', 'b', 'a', 'e']
    for x in li:
        if x == 'a':
            print(li.index(x))
Result:
0
0
0
Expected Result:
0
2
5
Although it iterates over all the items, I only get the index of the first item. Why is that? Please advise, even though it's pretty simple.
index returns the index of the first element only. From the docs
Return the index in the list of the first item whose value is x. It is an error if there is no such item.
Use enumerate instead. When you iterate using enumerate you can access the element and its index on the loop:
>>> li = ['a', 'e', 'a', 'd', 'b', 'a', 'e']
>>> for i,element in enumerate(li):
...     if element == 'a':
...         print(i)
...
0
2
5
li = ['a', 'e', 'a', 'd', 'b', 'a', 'e']
Use List Comprehension:
[i for i, x in enumerate(li) if x=='a']
Output: [0, 2, 5]
Use the built-in function enumerate(); it pairs each element with a counter that runs from 0 to len(li) - 1.
for i, x in enumerate(li):
So i will contain the indexes for li.
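If you would rather keep using index, one possible alternative (not taken from the answers above) is its optional start argument, which makes the search resume after the previous hit. A rough sketch, with the helper name all_indexes invented for illustration:

```python
def all_indexes(items, target):
    """Collect every position of `target` by restarting list.index after each hit."""
    positions = []
    start = 0
    while True:
        try:
            found = items.index(target, start)
        except ValueError:
            # index raises ValueError once there are no further matches.
            break
        positions.append(found)
        start = found + 1
    return positions


li = ['a', 'e', 'a', 'd', 'b', 'a', 'e']
print(all_indexes(li, 'a'))  # [0, 2, 5]
```

The enumerate versions are shorter; this one mainly shows why the original loop printed 0 three times, since every call there searched from the beginning.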
I have a list, for example:
res = [['a', 'b', 'a'], ['a', 'b', 'c'], ['a']]
I want to count how many lists contain a specific letter. For instance, 'a' is contained in 3 lists, 'b' is contained in 2 lists and 'c' is contained in 1 list.
The code below is what I have so far:
count=0
docs='a'
list1=[]
for c in range(len(res)):
    for i in res[0]:
        list1.append(i)
    for i in list1:
        if i == docs:
            count=1
print count
When you find yourself saying "I want to count how many ...", there's a good chance Counter(), from the collections module, can help.
In this case, we want to count how many lists each letter occurs in. Since we don't want to count any letter more than once for each sublist, we'll convert them to sets:
>>> res = [['a', 'b', 'a'], ['a', 'b', 'c'], ['a']]
>>> [set(x) for x in res]
[{'b', 'a'}, {'c', 'b', 'a'}, {'a'}]
The order gets mixed up, but that doesn't matter, as long as we only have one letter from each list.
Now we want to join those sets of letters into one sequence, so we can count them all. We could do it like this:
>>> [s for x in res for s in set(x)]
['b', 'a', 'c', 'b', 'a', 'a']
... but that's a little hard to follow. Luckily there's a function in the itertools module called chain() that does the same thing and is a little easier to read. We want the chain.from_iterable() version:
>>> from itertools import chain
>>> c = chain.from_iterable(set(x) for x in res)
>>> list(c)
['b', 'a', 'c', 'b', 'a', 'a']
Don't worry about that list(c) too much - chain() returns an iterator, which means nothing gets calculated until we actually do something with the result (like make it into a list), so I did that to show what it produces.
Anyway, all we need to do now is pass that sequence to Counter():
>>> from collections import Counter
>>> Counter(chain.from_iterable(set(x) for x in res))
Counter({'a': 3, 'b': 2, 'c': 1})
Here's the whole thing:
from collections import Counter
from itertools import chain
res = [['a', 'b', 'a'], ['a', 'b', 'c'], ['a']]
letter_count = Counter(chain.from_iterable(set(x) for x in res))
print(letter_count['a']) # prints 3
A simple list comprehension does the trick.
>>> L=[['a', 'b', 'a'], ['a', 'b', 'c'], ['a']]
>>> ['a' in x for x in L]
[True, True, True]
>>> ['b' in x for x in L]
[True, True, False]
Using the knowledge that True is considered 1:
>>> sum('a' in x for x in L)
3
>>> sum('b' in x for x in L)
2
>>> sum('c' in x for x in L)
1
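The sum trick works because bool is a subclass of int in Python. A small sketch that makes this explicit and counts several letters in one go is below; the helper name lists_containing is made up for this example.

```python
L = [['a', 'b', 'a'], ['a', 'b', 'c'], ['a']]

# bool is a subclass of int, so True and False add up like 1 and 0.
print(isinstance(True, int))  # True
print(True + True + False)    # 2


def lists_containing(letter, lists):
    """Count the sublists that contain `letter` at least once."""
    return sum(letter in sub for sub in lists)


counts = {letter: lists_containing(letter, L) for letter in 'abc'}
print(counts)  # {'a': 3, 'b': 2, 'c': 1}
```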
In the following code, why does my code not iterate properly? I'm probably missing one line but I can't figure out why it doesn't work.
I have a function with the following test case:
>>> borda([['A', 'B', 'C', 'D'], ['B', 'A', 'C', 'D'], ['B', 'C', 'D', 'A']])
('B', [5, 8, 4, 1])
Here the lists in the parameter are rankings: each #1 rank gets 3 points, #2 gets 2 points, #3 gets 1 point, and no other ranks get anything. There may not necessarily be four choices. The first element in the tuple should be the choice with the highest number of points, and the second element is the list of points each choice got, with the choices in alphabetical order.
I'm not done with the function, but I'm trying to get a dictionary of the choices as the keys in alphabetical order and the count of rankings as the values, but the output is a dictionary of only the very last element of the last list in the parameter.
L = ['A', 'B', 'C', 'D'] #This is referenced outside the function since it might change
D = {}
i = 0
num = 0
while num < len(L):
    num += 1
    for choice in L:
        while i < len(parameter):
            for item in parameter:
                if item[0] == choice:
                    D[choice] = D.get(choice, 0) + 3
                if item[1] == choice:
                    D[choice] = D.get(choice, 0) + 2
                if item[2] == choice:
                    D[choice] = D.get(choice, 0) + 1
                i += 1
return D
The way I'd do this is something like this:
import operator
from collections import defaultdict
listoflists = [['A', 'B', 'C', 'D'], ['B', 'A', 'C', 'D'], ['B', 'C', 'D', 'A']]
def borda(listoflists):
    outdict = defaultdict(int)
    for item in listoflists:
        outdict[item[0]] += 3
        outdict[item[1]] += 2
        outdict[item[2]] += 1
    # items() works on both Python 2 and Python 3
    highestitem = max(outdict.items(), key=operator.itemgetter(1))[0]
    # scores in alphabetical order of the choices
    outlist = [outdict[key] for key in sorted(outdict)]
    return (highestitem, outlist)
Update:
I'm not sure why you wouldn't be able to import standard modules, but if for whatever reason you're forbidden from using the import statement, here's a version with only built-in functions:
listoflists = [['A', 'B', 'C', 'D'], ['B', 'A', 'C', 'D'], ['B', 'C', 'D', 'A']]
def borda(listoflists):
    outdict = {}
    for singlelist in listoflists:
        # Below, we're just turning singlelist around in order to
        # make use of index numbers from enumerate to add to the scores
        for index, item in enumerate(singlelist[2::-1]):
            if item not in outdict:
                outdict[item] = index + 1
            else:
                outdict[item] += index + 1
    highestitem = max(outdict.items(), key=lambda i: i[1])[0]
    outlist = [outdict[key] for key in sorted(outdict)]
    return (highestitem, outlist)
If you had 2.7:
import operator
from collections import Counter
listoflists = [['A', 'B', 'C', 'D'], ['B', 'A', 'C', 'D'], ['B', 'C', 'D', 'A']]
def borda(listoflists):
    outdict = sum([Counter({item[x]: 3 - x}) for item in listoflists for x in range(3)],
                  Counter())
    highestitem = max(outdict.items(), key=operator.itemgetter(1))[0]
    outlist = [outdict[item[0]] for item in sorted(outdict.items(),
                                                   key=operator.itemgetter(0))]
    return (highestitem, outlist)
Look ma.. no loops :-)
Check out http://ua.pycon.org/static/talks/kachayev/index.html to see why this is better.
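Since the question says there may not necessarily be four choices, here is a hedged sketch that avoids indexing positions 0 to 2 directly, which would raise IndexError for a ranking with fewer than three entries. It is one possible variant, not the method from the answers above. The points parameter and the tie-breaking behaviour (the first choice to reach the top score wins) are assumptions made for this example.

```python
def borda(rankings, points=(3, 2, 1)):
    """Return (winner, scores in alphabetical order of the choices)."""
    scores = {}
    for ranking in rankings:
        # Register every choice so unranked-for-points choices still show up with 0.
        for choice in ranking:
            scores.setdefault(choice, 0)
        # zip stops at the shorter sequence, so short rankings are handled
        # and ranks beyond len(points) earn nothing.
        for choice, value in zip(ranking, points):
            scores[choice] += value
    winner = max(scores, key=scores.get)
    totals = [scores[choice] for choice in sorted(scores)]
    return winner, totals


result = borda([['A', 'B', 'C', 'D'], ['B', 'A', 'C', 'D'], ['B', 'C', 'D', 'A']])
print(result)  # ('B', [5, 8, 4, 1])
```

With only two choices, for example borda([['X', 'Y']]), the same function returns ('X', [3, 2]).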