Create list / text of repeated strings based on dictionary number - python

I have the following dictionary:
mydict = {'mindestens': 2,
'Situation': 3,
'österreichische': 2,
'habe.': 1,
'Über': 1,
}
How can I get a list / text out of it, that the strings in my dictionary are repeated as the number is mapped in the dictionary to it:
mylist = ['mindestens', 'mindestens', 'Situation', 'Situation', 'Situation',.., 'Über']
mytext = 'mindestens mindestens Situation Situation Situation ... Über'

You might just use loops:
mylist = []
for word,times in mydict.items():
for i in range(times):
mylist.append(word)

itertools library has convenient features for such cases:
from itertools import chain, repeat
mydict = {'mindestens': 2, 'Situation': 3, 'österreichische': 2,
'habe.': 1, 'Über': 1,
}
res = list(chain.from_iterable(repeat(k, v) for k, v in mydict.items()))
print(res)
The output:
['mindestens', 'mindestens', 'Situation', 'Situation', 'Situation', 'österreichische', 'österreichische', 'habe.', 'Über']
For text version - joining a list items is trivial: ' '.join(<iterable>)

Related

Making Dictionary with common words from paragraphs

Link to problem statement
Please help. I am very confused on how to execute this:
This is what I currently have:
def similarityAnalysis(paragraph1, paragraph2):
dict = {}
for word in lst:
if word in dict:
dict[word] = dict[word] + 1
else:
dict[word] = 1
for key, vale in dict.items():
print(key, val)
see below.
For find common words we use set intersection
For counting we use a dict
Code
lst1 = ['jack','Jim','apple']
lst2 = ['chair','jack','ball','steve']
common = set.intersection(set(lst1),set(lst2))
print('commom words below:')
print(common)
print()
print('counter below:')
counter = dict()
for word in lst1:
if word not in counter:
counter[word] = [0,0]
counter[word][0] += 1
for word in lst2:
if word not in counter:
counter[word] = [0,0]
counter[word][1] += 1
print(counter)
output
commom words below:
{'jack'}
counter below:
{'jack': [1, 1], 'Jim': [1, 0], 'apple': [1, 0], 'chair': [0, 1], 'ball': [0, 1], 'steve': [0, 1]}
Analysing your code as follows:
You use the variable name dict which is a reserved keyword (for creating dictionaries). By using this as a variable name, you will loose the ability to use the dict function.
The function uses a variable named lst which is not one of its arguments. Where do the values for this variable come from?
In the second for loop, you use the variable name vale but then later reference a different variable called val.
Otherwise, looks good. There may be other issues, that's as far as I got.
Recommend googling the following and seeing what code you find
"Python count the number of words in a paragraph"
Update:
There are many ways to do this, but here's one answer:
def word_counts(lst):
counts = {}
for word in lst:
counts[word] = counts.get(word, 0) + 1
return counts
def similarityAnalysis(paragraph1, paragraph2):
lst1 = paragraph1.split()
lst2 = paragraph2.split()
counts1 = word_counts(lst1)
counts2 = word_counts(lst2)
common_words = set(lst1).intersection(lst2)
return {word: (counts1[word], counts2[word]) for word in common_words}
paragraph1 = 'one three two one two four'
paragraph2 = 'one two one three three one'
print(similarityAnalysis(paragraph1, paragraph2))
Output:
{'three': (1, 2), 'one': (2, 3), 'two': (2, 1)}

Python How to find arrays that has a certain element efficiently

Given lists(a list can have an element that is in another list) and a string, I want to find all names of lists that contains a given string.
Simply, I could just go through all lists using if statements, but I feel that there is more efficient way to do so.
Any suggestion and advice would be appreciated. Thank you.
Example of Simple Method I came up with
arrayA = ['1','2','3','4','5']
arrayB = ['3','4','5']
arrayC = ['1','3','5']
arrayD = ['7']
foundArrays = []
if givenString in arrayA:
foundArrays.append('arrayA')
if givenString in arrayB:
foundArrays.append('arrayB')
if givenString in arrayC:
foundArrays.append('arrayC')
if givenString in arrayD:
foundArrays.append('arrayD')
return foundArrays
Lookup in a list is not very efficient; a set is much better.
Let's define your data like
data = { # a dict of sets
"a": {1, 2, 3, 4, 5},
"b": {3, 4, 5},
"c": {1, 3, 5},
"d": {7}
}
then we can search like
search_for = 3 # for example
in_which = {label for label,values in data.items() if search_for in values}
# -> in_which = {'a', 'b', 'c'}
If you are going to repeat this often, it may be worth pre-processing your data like
from collections import defaultdict
lookup = defaultdict(set)
for label,values in data.items():
for v in values:
lookup[v].add(label)
Now you can simply
in_which = lookup[search_for] # -> {'a', 'b', 'c'}
The simple one-liner is:
result = [lst for lst in [arrayA, arrayB, arrayC, arrayD] if givenString in lst]
or if you prefer a more functional style:
result = filter(lambda lst: givenString in lst, [arrayA, arrayB, arrayC, arrayD])
Note that neither of these gives you the NAME of the list. You shouldn't ever need to know that, though.
Array names?
Try something like this with eval() nonetheless using eval() is evil
arrayA = [1,2,3,4,5,'x']
arrayB = [3,4,5]
arrayC = [1,3,5]
arrayD = [7,'x']
foundArrays = []
array_names = ['arrayA', 'arrayB', 'arrayC', 'arrayD']
givenString = 'x'
result = [arr for arr in array_names if givenString in eval(arr)]
print result
['arrayA', 'arrayD']

Count Distinct Values in a List of Lists

I just imported the values from a .csv file to a list of lists, and now I need to know how many distinct users are there. The file itself looks like to following:
[['123', 'apple'], ['123', 'banana'], ['345', 'apple'], ['567', 'berry'], ['567', 'banana']]
Basically, I need to know how many distinct users (first value in each sub-list is a user ID) are there (3 in this case, over 6,000 after doing some Excel filtering), and what are the frequencies for the food itself: {'apple': 2, 'banana': 2, 'berry': 1}.
Here is the code I have tried to use for distinct values counts (using Python 2.7):
import csv
with open('food.csv', 'rb') as food:
next(food)
for line in food:
csv_food = csv.reader(food)
result_list = list(csv_follows)
result_distinct = list(x for l in result_list for x in l)
print len(result_distinct)
You can use [x[0] for x in result_list] to get a list of all the ids. Then you create a set, that is all list of all unique items in that list. The length of the set will then give you the number of unique users.
len(set([x[0] for x in result_list]))
Well that is what a Counter is all about:
import csv
from collections import Counter
result_list = []
with open('food.csv', 'rb') as food:
next(food)
for line in food:
csv_food = csv.reader(food)
result_list += list(csv_follows)
result_counter = Counter(x[1] for x in result_list)
print len(result_counter)
A Counter is a special dictionary. Internally the dictionary will contain {'apple': 2, 'banana': 2, 'berry': 1} so you can inspect all elements with their counts. len(result_counter) will give the number of distinct elements whereas sum(result_counter.values()) will give the total number of elements).
EDIT: apparently you want to count the number of distinct users. You can do this with:
len({x[0] for x in result_list})
The {.. for x in result_list} is set comprehension.
To get the distinct users, you can use a set:
result_distinct = len({x[0] for x in result_list})
And the frequencies, you can use collections.Counter:
freqs = collections.Counter([x[1] for x in result_list])
For the first question, use set,
import operator
lists = [['123', 'apple'], ['123', 'banana'], ['345', 'apple'], ['567', 'berry'], ['567', 'banana']]
nrof_users = len(set(map(operator.itemgetter(0), lists)))
print(nrof_users)
# 3
For the second question, use collections.Counter,
import collections
import operator
result = collections.Counter(map(operator.itemgetter(1), lists))
print(result)
# Counter({'apple': 2, 'banana': 2, 'berry': 1})
A=[[0, 1],[0, 3],[1, 3],[3, 4],[3, 6],[4, 5],[4, 7],[5, 7],[6, 4]]
K = []
for _ in range(len(A)):
K.extend(A[_])
print(set(K))
OUTPUT:
{0, 1, 3, 4, 5, 6, 7}
In python extend function extends the list instead of appending it that's what we need and then use set to print distinct values.

python group by and ordered a list by another list

I wonder if there is more Pythonic way to do group by and ordered a list by the order of another list.
The lstNeedOrder has couple pairs in random order. I want the output to be ordered as order in lst. The result should have all pairs containing a's then follow by all b's and c's.
The lstNeedOrder would only have either format in a/c or c/a.
input:
lstNeedOrder = ['a/b','c/b','f/d','a/e','c/d','a/c']
lst = ['a','b','c']
output:
res = ['a/b','a/c','a/e','c/b','c/d','f/d']
update
The lst = ['a','b','c'] is not actual data. it just make logic easy to understand. the actual data are more complex string pairs
Using sorted with customer key function:
>>> lstNeedOrder = ['a/b','c/d','f/d','a/e','c/d','a/c']
>>> lst = ['a','b','c']
>>> order = {ch: i for i, ch in enumerate(lst)} # {'a': 0, 'b': 1, 'c': 2}
>>> def sort_key(x):
... # 'a/b' -> (0, 1), 'c/d' -> (2, 3), ...
... a, b = x.split('/')
... return order.get(a, len(lst)), order.get(b, len(lst))
...
>>> sorted(lstNeedOrder, key=sort_key)
['a/b', 'a/c', 'a/e', 'c/d', 'c/d', 'f/d']

Get a unique list of items that occur more than once in a list

I have a list of items:
mylist = ['A','A','B','C','D','E','D']
I want to return a unique list of items that appear more than once in mylist, so that my desired output would be:
[A,D]
Not sure how to even being this, but my though process is to first append a count of each item, then remove anything equal to 1. Then dedupe, but this seems like a really roundabout, inefficient way to do it, so I am looking for advice.
You can use collections.Counter to do what you have described easily:
from collections import Counter
mylist = ['A','A','B','C','D','E','D']
cnt = Counter(mylist)
print [k for k, v in cnt.iteritems() if v > 1]
# ['A', 'D']
>>> mylist = ['A','A','B','C','D','E','D']
>>> set([i for i in mylist if mylist.count(i)>1])
set(['A', 'D'])
import collections
cc = collections.Counter(mylist) # Counter({'A': 2, 'D': 2, 'C': 1, 'B': 1, 'E': 1})
cc.subtract(cc.keys()) # Counter({'A': 1, 'D': 1, 'C': 0, 'B': 0, 'E': 0})
cc += collections.Counter() # remove zeros (trick from the docs)
print cc.keys() # ['A', 'D']
Try some thing like this:
a = ['A','A','B','C','D','E','D']
import collections
print [x for x, y in collections.Counter(a).items() if y > 1]
['A', 'D']
Reference: How to find duplicate elements in array using for loop in Python?
OR
def list_has_duplicate_items( mylist ):
return len(mylist) > len(set(mylist))
def get_duplicate_items( mylist ):
return [item for item in set(mylist) if mylist.count(item) > 1]
mylist = [ 'oranges' , 'apples' , 'oranges' , 'grapes' ]
print 'List: ' , mylist
print 'Does list have duplicate item(s)? ' , list_has_duplicate_items( mylist )
print 'Redundant item(s) in list: ' , get_duplicate_items( mylist )
Reference https://www.daniweb.com/software-development/python/threads/286996/get-redundant-items-in-list
Using a similar approach to others here, heres my attempt:
from collections import Counter
def return_more_then_one(myList):
counts = Counter(my_list)
out_list = [i for i in counts if counts[i]>1]
return out_list
It can be as simple as ...
print(list(set([i for i in mylist if mylist.count(i) > 1])))
Use set to help you do that, like this maybe :
X = ['A','A','B','C','D','E','D']
Y = set(X)
Z = []
for val in Y :
occurrences = X.count(val)
if(occurrences > 1) :
#print(val,'occurs',occurrences,'times')
Z.append(val)
print(Z)
The list Z will save the list item which occur more than once. And the part I gave comment (#), that will show the number of occurrences of each list item which occur more than once
Might not be as fast as internal implementations, but takes (almost) linear time (since set lookup is logarithmic)
mylist = ['A','A','B','C','D','E','D']
myset = set()
dups = set()
for x in mylist:
if x in myset:
dups.add(x)
else:
myset.add(x)
dups = list(dups)
print dups
another solution what's written:
def delete_rep(list_):
new_list = []
for i in list_:
if i not in list_[i:]:
new_list.append(i)
return new_list
This is my approach without using packages
result = []
for e in listy:
if listy.count(e) > 1:
result.append(e)
else:
pass
print(list(set(result)))

Categories