I have a dict sorted by its values, as below:
check = {'id1': '01', 'id2': '03', 'id3': '03', 'id4': '10'}
I want to check the values in the above dict in Python code and randomize the order of the ids whose values are the same.
Expected output: check2 = {'id1': '01', 'id3': '03', 'id2': '03', 'id4': '10'} (randomize the ids which have the same values, so that sometimes id2 is in second position and sometimes id3 is).
As your dictionary is sorted by values, you can use itertools.groupby to group by identical values.
Then use random.sample to shuffle the keys per group.
Finally generate a new dictionary from the list of keys:
from itertools import groupby, chain
from random import sample
keys = list(chain.from_iterable(sample((l := list(g)), len(l))
                                for k, g in groupby(check, lambda x: check[x])))
check2 = {k: check[k] for k in keys}
example output:
{'id1': '01', 'id3': '03', 'id2': '03', 'id4': '10'}
intermediate result:
>>> keys
['id1', 'id3', 'id2', 'id4']
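The same idea can be wrapped in a reusable helper; here is a sketch (the function name shuffle_ties is mine, not from the question) that avoids the := assignment expression, so it also runs on Python versions before 3.8:

```python
from itertools import groupby
from random import sample

def shuffle_ties(d):
    """Return a copy of d with keys shuffled within runs of equal values.

    Assumes d is already sorted by value, since groupby only groups
    consecutive items.
    """
    keys = []
    for _, group in groupby(d, key=lambda k: d[k]):
        block = list(group)
        keys.extend(sample(block, len(block)))  # shuffled copy of the block
    return {k: d[k] for k in keys}

check = {'id1': '01', 'id2': '03', 'id3': '03', 'id4': '10'}
check2 = shuffle_ties(check)
# the values stay in sorted order; 'id2' and 'id3' come out in random order
```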
How do I loop through a DataFrame and return any value that is also found inside any other column, storing it in a list, using pandas? It doesn't matter how many times a value is found, just that it is found at least once more in a different column. If a value is only repeated within the same column, it's not included in the list. In other words, each value must be compared to every value except those in its own column.
import pandas as pd

combined_insp = []
test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'],
                        'area_2': ['John', 'Bob', 'Mary', 'Mary'],
                        'area_3': ['Jane', 'Sarah', 'David', 'Michael'],
                        'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})
Expected output would be
combined_insp = ['John', 'Mary', 'Sarah', 'Mike']
A solution with itertools and set algebra:
from itertools import combinations
combined_insp = set.union(*[set(test_df[c1]).intersection(test_df[c2])
                            for (c1, c2) in combinations(test_df.columns, 2)])
For each unique combination of columns we take the intersection of the values. Then we take the union of all the results.
You can use test_df.apply(set) to remove the duplicated elements in each column. Then you can use itertools.chain.from_iterable to flatten all the elements into one iterable. Finally, you can use collections.Counter to count the elements and return those that have a count > 1. (The result of Counter behaves like a dict, so you can iterate over it with dict.items().)
from itertools import chain
from collections import Counter
combined_insp = [k for k,v in Counter(chain.from_iterable(test_df.apply(set))).items() if v>1]
print(combined_insp)
['Sarah', 'Mike', 'Mary', 'John']
Here is one way to do it:
# pd.melt to flatten the table, then group by name and take the names that appear more than once
g = test_df.melt(value_name='area').drop_duplicates().groupby('area')
[key for key, group in g if (group.count() > 1).all()]
['John', 'Mary', 'Mike', 'Sarah']
Or, without the explicit groupby:
counts = test_df.melt().drop_duplicates()['value'].value_counts()
answer = counts[counts > 1].index.to_list()
I have two dictionaries with list values; an old and a new version.
Old dict:
old_dict = {'bob': ['green', '5'],
            'jeff': ['blue', '4'],
            'sally': ['white', '7']}
New dict:
new_dict = {'bob': ['green', '5'],
            'jeff': ['blue', '4'],
            'sally': ['black', '7']}
If the first list value has changed (in this case, the colour in the new dict), I would like to update the second value of the list (the number) to, let's say, 0.
What is the best way in Python 3 to compare the first list value between the two dictionaries, and update the second list value if the first one has changed?
You could iterate through the old_dict checking its values against the new_dict and change them if they don't agree.
for k, v in old_dict.items():
    if k in new_dict and v[0] != new_dict[k][0]:
        new_dict[k][1] = '0'
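With the dictionaries from the question, a quick run of that loop gives:

```python
old_dict = {'bob': ['green', '5'],
            'jeff': ['blue', '4'],
            'sally': ['white', '7']}
new_dict = {'bob': ['green', '5'],
            'jeff': ['blue', '4'],
            'sally': ['black', '7']}

# reset the counter to '0' wherever the colour changed
for k, v in old_dict.items():
    if k in new_dict and v[0] != new_dict[k][0]:
        new_dict[k][1] = '0'

print(new_dict)
# {'bob': ['green', '5'], 'jeff': ['blue', '4'], 'sally': ['black', '0']}
```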
d = {'Name1': ['Male', '18'],
     'Name2': ['Male', '16'],
     'Name3': ['Male', '18'],
     'Name4': ['Female', '18'],
     'Name5': ['Female', '18']}
I am trying to find a way to isolate the keys that share duplicate values into lists, if any. Like:
['Name1', 'Name3']
['Name4', 'Name5']
How can I achieve this? Thanks
An imperative solution would be to just iterate over the dictionary and add the items into another dictionary that uses the gender-age-tuple as a key, for example:
# use a defaultdict, which automatically adds an empty list for a missing key when it is first accessed
from collections import defaultdict

by_data = defaultdict(list)
for name, data in d.items():
    # turn the data into something immutable, so it can be used as a dictionary key
    data_tuple = tuple(data)
    by_data[data_tuple].append(name)
the result will be:
{('Female', '18'): ['Name4', 'Name5'],
 ('Male', '16'): ['Name2'],
 ('Male', '18'): ['Name1', 'Name3']}
You can filter out the entries with only one name if you are only interested in duplicates.
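For example, a sketch of that filtering step on the by_data mapping built above (the output order assumes Python 3.7+ dict insertion order):

```python
from collections import defaultdict

d = {'Name1': ['Male', '18'],
     'Name2': ['Male', '16'],
     'Name3': ['Male', '18'],
     'Name4': ['Female', '18'],
     'Name5': ['Female', '18']}

by_data = defaultdict(list)
for name, data in d.items():
    by_data[tuple(data)].append(name)

# keep only the groups that actually contain duplicates
duplicates = [names for names in by_data.values() if len(names) > 1]
print(duplicates)
# [['Name1', 'Name3'], ['Name4', 'Name5']]
```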
Try this:
d = {'Name1': ['Male', '18'],
     'Name2': ['Male', '16'],
     'Name3': ['Male', '18'],
     'Name4': ['Female', '18'],
     'Name5': ['Female', '18']}

groups = {}  # create a dictionary to hold names with identical data

# loop over all the items in the dictionary
for key in d:
    data = tuple(d[key])  # lists are not hashable, so turn the data into a tuple
    # if the groups dictionary does not yet have an entry for this data,
    # create a list to hold names with the same data
    if data not in groups:
        groups[data] = []
    groups[data].append(key)  # finally, append names with the same data together

# loop over all the groups and print those with more than one name
for value in groups.values():
    if len(value) > 1:
        print(value)
I'm guessing you meant duplicate values and not keys, in which case you could do this with pandas:
import pandas as pd
df = pd.DataFrame(d).T  # load the data into a dataframe, and transpose it
df.index[df.duplicated(keep=False)]
df.duplicated(keep=False) gives you a series of True/False, where the value is True whenever that item has a duplicate, and False otherwise. We use that to index the row names, which are 'Name1', 'Name2', etc.
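To get the duplicates grouped into lists, as in the expected output, one way is to keep only the duplicated rows and group them by all columns; a sketch building on the same df:

```python
import pandas as pd

d = {'Name1': ['Male', '18'],
     'Name2': ['Male', '16'],
     'Name3': ['Male', '18'],
     'Name4': ['Female', '18'],
     'Name5': ['Female', '18']}

df = pd.DataFrame(d).T
dup = df[df.duplicated(keep=False)]  # rows that occur more than once
# group the duplicated rows by all their columns and collect the index labels
groups = [list(g.index) for _, g in dup.groupby(list(dup.columns))]
print(groups)
```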
I have a list of program names which needs to be sorted into lists of smaller JSONs based on a priority list. I need to do this in Python 3.
B and C, being of the same priority 2, will be in a list together.
program_names = ['A','B','C','D']
priorities = [1,2,2,3]
Required end result:
[[{"name": "A"}], [{"name":"B"}, {"name":"C"}], [{"name":"D"}]]
Current code:
program_names_list = []
final_list = []
for x in program_names.split(','):
    program_names_list.append(x)
for x in program_names_list:
    final_list.append([{"name": x}])
That's what I currently have, and it outputs the following result:
[[{'name': 'A'}], [{'name': 'B'}], [{'name': 'C'}], [{'name': 'D'}]]
I should add that program_names is a string "A,B,C,D"
Full solution
items = {}
for k, v in zip(priorities, program_names):
    items.setdefault(k, []).append(v)
[[{'name': name} for name in items[key]] for key in sorted(items.keys())]
returns:
[[{'name': 'A'}], [{'name': 'B'}, {'name': 'C'}], [{'name': 'D'}]]
In steps
Create a dictionary that uses the priorities as keys and a list of all program names with corresponding priority as values:
items = {}
for k, v in zip(priorities, program_names):
    items.setdefault(k, []).append(v)
Go through the sorted keys and create a new list of program names by getting them from the dictionary by the key:
[[{'name': name} for name in items[key]] for key in sorted(items.keys())]
Loop through the priorities and use a dictionary with priorities as keys and lists of programs as values to group all elements with the same priority.
In [24]: from collections import defaultdict
In [25]: program_names = ['A','B','C','D']
In [26]: priorities = [1,2,2,3]
In [27]: d = defaultdict(list)
In [28]: for p, name in sorted(zip(priorities, program_names)):
   ....:     d[p].append({'name': name})
   ....:
In [29]: list(d.values())
Out[29]: [[{'name': 'A'}], [{'name': 'B'}, {'name': 'C'}], [{'name': 'D'}]]
Use groupby.
from itertools import groupby
program_names = ['a','b','c','d']
priorities = [1,2,2,3]
data = zip(priorities, program_names)
groups_dict = []
for k, g in groupby(data, lambda x: x[0]):
    # materialize the group eagerly; in Python 3, appending a lazy map object
    # here would print as <map object ...> and be consumed on first use
    groups_dict.append([dict(name=name) for _, name in g])
print(groups_dict)
Although this may be wrong from an educational point of view, I cannot resist answering such questions by one-liners:
[[{'name': p_n} for p_i, p_n in zip(priorities, program_names) if p_i == p] for p in sorted(set(priorities))]
(This assumes your "priorities" list can be sorted, and it is less efficient than the "normal" approach with a defaultdict(list).)
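For comparison, the "normal" approach with defaultdict(list) mentioned above could look like this sketch:

```python
from collections import defaultdict

program_names = ['A', 'B', 'C', 'D']
priorities = [1, 2, 2, 3]

# group the names by priority in a single pass
by_priority = defaultdict(list)
for priority, name in zip(priorities, program_names):
    by_priority[priority].append({'name': name})

# emit the groups in ascending priority order
result = [by_priority[p] for p in sorted(by_priority)]
print(result)
# [[{'name': 'A'}], [{'name': 'B'}, {'name': 'C'}], [{'name': 'D'}]]
```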
Update: Borrowing from damn_c's answer, here's an efficient one-liner (not counting the implied from itertools import groupby):
[[{'name': pn} for pi, pn in l] for v, l in groupby(zip(priorities, program_names), lambda x: x[0])]
I have the following lists:
keys = ['god', 'hel', 'helo']
values = ['good','god', 'hell', 'hello']
I want to create a dictionary like this:
{'god':set(['god', 'good']), 'hel':'hell', 'helo': 'hello'}
where the key is determined by reducing repeated letters in the value to a single letter.
How would I do this programmatically?
"all repeated letters are reduced to single letters"
Actually, according to this rule you don't need the keys list, because it can be created from the values.
Also, I would suggest using a dict of sets for all values, including the single ones such as "hell" and "hello"; it will make using the dictionary much simpler:
import itertools as it
values = ['good','god', 'hell', 'hello']
d = {}
for value in values:
    d.setdefault(''.join(k for k, v in it.groupby(value)), set()).add(value)
# d == {'god': {'god', 'good'},
#       'hel': {'hell'},
#       'helo': {'hello'}}
This should do it for you:
import re
import collections
values = ['good', 'god', 'hell', 'hello']
result = collections.defaultdict(set)
for value in values:
    key = re.sub(r'(\w)\1*', r'\1', value)
    result[key].add(value)
# result: defaultdict(<class 'set'>, {'hel': {'hell'}, 'god': {'god', 'good'}, 'helo': {'hello'}})

# if you want to ensure that all your keys exist in the dictionary
keys = ['god', 'hel', 'helo', 'bob']
for key in keys:
    result[key]
# result: defaultdict(<class 'set'>, {'hel': {'hell'}, 'god': {'god', 'good'}, 'helo': {'hello'}, 'bob': set()})
Some code golf (sort of - obviously more obfuscation is possible) upon eumiro's answer, observing that itertools.groupby can be used twice (once to get the letter-sets in order of appearance, something I didn't think of - and again to actually create the key-value pairs for the dictionary).
from itertools import groupby
data = ['good', 'god', 'hell', 'hello']
dict((''.join(k), list(v)) for k, v in groupby(data, lambda x: tuple(zip(*groupby(x)))[0]))
How it works: each word is first processed with lambda x: tuple(zip(*groupby(x)))[0]. That is, we take the list of (letter, grouper-object) pairs produced by the groupby generator, transform it into a pair (tuple-of-letters, tuple-of-grouper-objects) (in Python 3 zip is lazy, so we materialize it with tuple before indexing), and discard the tuple-of-grouper-objects, which we don't want. Then we group the entire word list according to the tuple-of-letters produced by each word, transform the tuple of letters back into a string, evaluate the grouper-object generators to get the corresponding words, and use those key-value pairs to construct the final dict.
Edit: I guess it's cleaner to do the ''.join step within the lambda:
from itertools import groupby
data = ['good', 'god', 'hell', 'hello']
dict((k, list(v)) for k, v in groupby(data, lambda x: ''.join(tuple(zip(*groupby(x)))[0])))
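A quick sanity check of the Python 3 version (note that the values come out as lists here, not sets):

```python
from itertools import groupby

data = ['good', 'god', 'hell', 'hello']
# key: collapse runs of repeated letters, e.g. 'good' -> ('g', 'o', 'd')
result = dict((''.join(k), list(v))
              for k, v in groupby(data, lambda x: tuple(zip(*groupby(x)))[0]))
print(result)
# {'god': ['good', 'god'], 'hel': ['hell'], 'helo': ['hello']}
```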