Adding "sequential" information to python list using dictionaries - python

The problem
I would like to create a dictionary of dicts out of a flat list I have in order to add a "sequentiality" piece of information, but I am having some trouble finding a solution.
The list is something like
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
and I am shooting for a dict like:
dictionary = {
'Step_1': {
'Q=123',
'W=456',
'E=789'
},
'Step_2': {
'Q=753',
'W=159',
'E=888'
}
}
I would like to end up with a function with an arbitrary number of Steps, in order to apply it to my dataset. Suppose that in the dataset there are lists like a with 1 <= n <6 Steps each.
My idea
Up to now, I came up with this:
nsteps = a.count("Q")
data = {}
for i in range(nsteps):
stepi = {}
for element in a:
new = element.split("=")
if new[0] not in stepi:
stepi[new[0]] = new[1]
else:
pass
data[f"Step_{i}"] = stepi
but it doesn't work as intended: both steps in the final dictionary contain the data of Step_1.
Any idea on how to solve this?

One way would be:
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
indices = [i for i, v in enumerate(a) if v[0:2] == 'Q=']
dictionary = {f'Step_{idx+1}': {k: v for k, v in (el.split('=') for el in a[s:e])}
for idx, (s, e) in enumerate(zip(indices, indices[1:] + [len(a)]))}
print(dictionary)
{'Step_1': {'Q': '123', 'W': '456', 'E': '789'},
'Step_2': {'Q': '753', 'W': '159', 'E': '888'}}
Details:
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
# Get indices where a step starts.
# This could handle also steps with variable amount of elements and keys starting with 'Q' that are not exactly 'Q'.
indices = [i for i, v in enumerate(a) if v[0:2] == 'Q=']
# Get the slices of the list starting at Q and ending before the next Q.
slices = list(zip(indices, indices[1:] + [len(a)]))
print(slices)
# [(0, 3), (3, 6)]
# Get step index and (start, end) pair for each slice.
idx_slices = list(enumerate(slices))
print(idx_slices)
# [(0, (0, 3)), (1, (3, 6))]
# Split the strings in the list slices and use the result as key-value pair for a given start:end.
# Here an example for step 1:
step1 = idx_slices[0][1] # This is (0, 3).
dict_step1 = {k: v for k, v in (el.split('=') for el in a[step1[0]:step1[1]])}
print(dict_step1)
# {'Q': '123', 'W': '456', 'E': '789'}
# Do the same for each slice.
step_dicts = {f'Step_{idx+1}': {k: v for k, v in (el.split('=') for el in a[s:e])}
for idx, (s, e) in idx_slices}
print(step_dicts)
# {'Step_1': {'Q': '123', 'W': '456', 'E': '789'}, 'Step_2': {'Q': '753', 'W': '159', 'E': '888'}}

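You were almost there. Counting the steps with a.count("Q") returns 0, because no element of the list is exactly "Q". The inner loop also went over the whole list for every step, so every step ended up with the data of Step_1, and data[f"Step_{i}"] = stepi has to sit inside the outer loop. Count the elements that start with "Q=" and give each step only its own slice of the list (this assumes every step has the same number of elements):
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
def main():
    nsteps = len([s for s in a if s.startswith("Q=")])
    size = len(a) // nsteps  # number of elements per step
    data = {}
    for i in range(nsteps):
        stepi = {}
        # only look at the elements that belong to step i
        for element in a[i * size:(i + 1) * size]:
            key, value = element.split("=")
            stepi[key] = value
        data[f"Step_{i + 1}"] = stepi
    return data
if __name__ == "__main__":
    data = main()
    print(data)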

First group the items by their first character like this:
from itertools import groupby
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
o = groupby(sorted(a, key=lambda x: x[0]), key=lambda x: x[0])
then create a dictionary like this:
d = {i: list(g) for i, g in o}
then iterate over them and make your result:
result = {f"step_{i+1}": [v[i] for v in d.values()] for i in range(len(max(d.values(), key=len)))}
the result will be:
{'step_1': ['E=789', 'Q=123', 'W=456'], 'step_2': ['E=888', 'Q=753', 'W=159']}

From what I understood from your question:
We can group the items in the list, in this case, a group of three elements, and loop through them three at a time.
With some help from this answer:
from itertools import zip_longest
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
def grouper(n, iterable):
    # n references to the same iterator, so zip_longest yields chunks of n items
    args = [iter(iterable)] * n
    return zip_longest(*args)
result = dict()
for i, d in enumerate(grouper(3, a), start=1):
    result.update({f"Step_{i}": set(d)})
print(result)
{
'Step_1': {'E=789', 'Q=123', 'W=456'},
'Step_2': {'E=888', 'Q=753', 'W=159'}
}
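The grouping can also be done in a single pass that opens a new step whenever a 'Q' key appears, so steps do not need to have the same length. This is only a sketch: the function name split_into_steps and the start_key parameter are made up for illustration, and it assumes every element has the form KEY=VALUE.

```python
def split_into_steps(items, start_key="Q"):
    """Group 'KEY=VALUE' strings into steps; each start_key opens a new step."""
    steps = {}
    current = None
    for item in items:
        key, value = item.split("=", 1)
        # Open a new step on every start_key (or on the very first element).
        if key == start_key or current is None:
            current = {}
            steps[f"Step_{len(steps) + 1}"] = current
        current[key] = value
    return steps


a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
print(split_into_steps(a))
```

Because a step only ends when the next 'Q' shows up, a list such as ['Q=1', 'W=2', 'Q=3'] yields two steps of different sizes.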

Related

How to get multiple most frequent k-mers of a string using Python?

If I insert the following
Insert the Text:
ACACACA
Insert a value for k:
2
For the following codes
print("Insert the Text:")
Text = input()
print("Insert a value for k:")
k = int(input())
Pattern = " "
count = [ ]
FrequentPatterns = [ ]
def FrequentWords(Text, k):
for i in range (len(Text)-k+1):
Pattern = Text[i: i+k]
c = 0
for i in range (len(Text)-len(Pattern)+1):
if Text[i: i+len(Pattern)] == Pattern:
c = c+1
else:
continue
count.extend([c])
print(count)
if count[i] == max(count):
FrequentPatterns.extend([Pattern])
return FrequentPatterns
FrequentWords(Text, k)
I get the following output
Insert the Text:
ACACACA
Insert a value for k:
2
[3, 3, 3, 3, 3, 3]
['CA']
Clearly there are two FrequentPatterns. So the last list output should be ['AC', 'CA']
I don't know why this code isn't working. Really appreciate if anyone could help.
Here's how I would solve this:
from itertools import groupby
def find_kgrams(string, k):
    kgrams = sorted(
        string[j:j+k]
        for i in range(k)
        for j in range(i, (len(string) - i) // k * k, k)
    )
    groups = [(k, len(list(g))) for k, g in groupby(kgrams)]
    return sorted(groups, key=lambda i: i[1], reverse=True)
The way this works is:
it produces string chunks of the given length k, e.g.:
starting from 0: 'ACACACA' -> 'AC', 'AC', 'AC'
starting from 1: 'ACACACA' -> 'CA', 'CA', 'CA'
...up to k - 1 (1 is the maximum for k == 2)
groupby() groups those chunks
sorted() sorts them by count
a list of tuples of kgrams and their count is returned
Test:
s = 'ACACACA'
kgrams = find_kgrams(s, 2)
print(kgrams)
prints:
[('AC', 3), ('CA', 3)]
It's already sorted, you can pick the most frequent one(s) from the front of the returned list:
max_kgrams = [k for k, s in kgrams if s == kgrams[0][1]]
print(max_kgrams)
prints:
['AC', 'CA']
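For comparison, the same counting can be sketched with collections.Counter over every overlapping window of length k, which is what the loop in the question counts. The function name most_frequent_kmers is just an illustrative choice, not part of either post.

```python
from collections import Counter


def most_frequent_kmers(text, k):
    """Return all k-mers that occur most often in text (overlaps counted)."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return []
    top = max(counts.values())
    # Keep every k-mer that reaches the maximum count, in sorted order.
    return sorted(kmer for kmer, c in counts.items() if c == top)


print(most_frequent_kmers("ACACACA", 2))  # ['AC', 'CA']
```

Collecting the counts first and comparing against the maximum afterwards avoids the problem in the question, where the maximum was checked while the counts were still being built.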

In a dictionary where keys are tuples, remove all keys that do not have a specific value in specific positions

I have a dictionary where each key is a tuple of length N. In each position there is either a non-empty string or an empty string.
d = {('Word A','Word B','','','Word C',....) : 50,
('Word F', '','','',....,'Word H') : 10,
....
}
I have a category dictionary containing indexes, if a key in d has an empty string at every position specified by the indexes, it belongs to the category. A key can belong to multiple categories.
category_dictionary= { 'Category A':[1,2,3] , 'Category B' : [0,3,4] , .... }
In this example, the second entry in d ('Word F', '','','',....,'Word H') belongs to Category A since it has an empty string in position 1,2 and 3.
I want to remove all keys in d which do not belong to any category. What would be an efficient way of doing this? Here is code which is working but is slow.
filtered_list = []
for current_tuple in list(d.keys()):
    keep_tuple = False
    for category,idxs in category_dictionary.items():
        all_idxs_empty = True
        for idx in idxs:
            if current_tuple[idx] != '':
                all_idxs_empty = False
        if all_idxs_empty:
            filtered_list.append(current_tuple)
            break
d_filter = {k:v for k,v in d.items() if k in filtered_list}
What would be a more efficient way of doing this? If I have M keys, T categories and the maximum length is U, the complexity is between O(M*T) and O(M*T*U).
Is there a way to reduce the complexity somehow?
Example data with N = 4 and 2 categories
d = {('A','','B','H') : 10,
('','','','H') : 20,
('','','F','T') : 30,
('A','C','G','') : 0
}
category_dictionary = { 'Category A':[0,1],'Category B' :[3]}
Expected output
d_filter = {('','','','H'): 20,
('','','F','T'):30,
('A','C','G','') : 0
}
Try this:
def f(tuple_, category_dictionary):
    # positions of the tuple that hold an empty string
    l=[i for i in range(len(tuple_)) if tuple_[i]=='']
    return any([set(k)-set(l)==set() for k in category_dictionary.values()])
m=list(d.keys())
for i in m:
    if not f(i, category_dictionary):
        del d[i]
Not sure if this is faster, but this is simpler.
filtered_list = []
category_set = [set(x) for x in category_dictionary.values()]
for current_tuple in list(d.keys()):
    #in_category = [i for i,x in enumerate(current_tuple) if x==''] in category_dictionary.values()
    current_set = {i for i,x in enumerate(current_tuple) if x==''}
    in_category = any([set.issubset(x, current_set) for x in category_set])
    if in_category:
        filtered_list.append(current_tuple)
d_filter = {k:v for k,v in d.items() if k in filtered_list}
d_filter
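A further variant, sketched here with hypothetical helper names (belongs_to_any_category, filter_by_category), checks the category positions directly with any() and all(). Both short-circuit, so a key is accepted as soon as one category matches and a category is rejected at its first non-empty position. The worst case is still O(M*T*U), but the final dict comprehension no longer searches a list for every key.

```python
def belongs_to_any_category(key, categories):
    """True if key is empty at every index of at least one category."""
    return any(all(key[i] == '' for i in idxs) for idxs in categories)


def filter_by_category(d, category_dictionary):
    categories = list(category_dictionary.values())
    return {k: v for k, v in d.items() if belongs_to_any_category(k, categories)}


d = {('A', '', 'B', 'H'): 10,
     ('', '', '', 'H'): 20,
     ('', '', 'F', 'T'): 30,
     ('A', 'C', 'G', ''): 0}
category_dictionary = {'Category A': [0, 1], 'Category B': [3]}
print(filter_by_category(d, category_dictionary))
```

On the example data this keeps the last three keys and drops ('A', '', 'B', 'H'), which matches the expected output in the question.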

adding empty string while joining the 2 lists - Python

I have 2 lists
mainlist=[['RD-12',12,'a'],['RD-13',45,'c'],['RD-15',50,'e']] and
sublist=[['RD-12',67],['RD-15',65]]
If I join both lists on their first element using the code below:
def combinelist(mainlist,sublist):
    dict1 = { e[0]:e[1:] for e in mainlist }
    for e in sublist:
        try:
            dict1[e[0]].extend(e[1:])
        except:
            pass
    result = [ [k] + v for k, v in dict1.items() ]
    return result
it results in the following:
[['RD-12',12,'a',67],['RD-13',45,'c'],['RD-15',50,'e',65]]
As there is no element for 'RD-13' in sublist, I want an empty string in that position.
The final output should be
[['RD-12',12,'a',67],['RD-13',45,'c'," "],['RD-15',50,'e',65]]
Please help me.
Your problem can be solved by using a while loop to pad every list with the wanted string until it is as long as the longest list in the result.
longest = max(len(row) for row in result)
for row in result:
    while len(row) < longest:
        row.append(" ")
You could also just go through the result list and check where the total number of elements is 3 instead of 4.
for row in result:
    if len(row) == 3:
        row.append(" ")
UPDATE:
If there are more items in the sublist, just subtract the lists containing the 'keys' of your lists, and then add the desired string.
def combinelist(mainlist,sublist):
    dict1 = { e[0]:e[1:] for e in mainlist }
    list2 = [e[0] for e in sublist]
    for e in sublist:
        try:
            dict1[e[0]].extend(e[1:])
        except KeyError:
            # key only exists in sublist, nothing to extend
            pass
    # keys of mainlist that never appeared in sublist get the filler
    for e in dict1.keys() - list2:
        dict1[e].append(" ")
    result = [[k] + v for k, v in dict1.items()]
    return result
You can try something like this:
mainlist=[['RD-12',12],['RD-13',45],['RD-15',50]]
sublist=[['RD-12',67],['RD-15',65]]
empty_val = ''
# Lists to dictionaries
maindict = dict(mainlist)
subdict = dict(sublist)
result = []
# go through all keys
for k in list(set(list(maindict.keys()) + list(subdict.keys()))):
    # pick the value from each key or a default alternative
    result.append([k, maindict.pop(k, empty_val), subdict.pop(k, empty_val)])
# sort by the key
result = sorted(result, key=lambda x: x[0])
You can set up your empty value to whatever you need.
UPDATE
Following the new conditions, it would look like this:
mainlist=[['RD-12',12,'a'], ['RD-13',45,'c'], ['RD-15',50,'e']]
sublist=[['RD-12',67], ['RD-15',65]]
maindict = {a:[b, c] for a, b, c in mainlist}
subdict = dict(sublist)
result = []
for k in list(set(list(maindict.keys()) + list(subdict.keys()))):
    result.append([k, ])
    result[-1].extend(maindict.pop(k, ' '))
    result[-1].append(subdict.pop(k, ' '))
result = sorted(result, key=lambda x: x[0])
Another option is to convert the sublist to a dict, so items are easily and rapidly accessible.
sublist_dict = dict(sublist)
So you can do (it modifies the mainlist):
for i, e in enumerate(mainlist):
    mainlist[i].append(sublist_dict.get(e[0], ""))
#=> [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c', ''], ['RD-15', 50, 'e', 65]]
Or a one liner list comprehension (it produces a new list):
[ e + [sublist_dict.get(e[0], "")] for e in mainlist ]
If you want to skip the missing element:
for i, e in enumerate(mainlist):
    data = sublist_dict.get(e[0])
    # compare with None so that falsy values such as 0 are still appended
    if data is not None:
        mainlist[i].append(data)
print(mainlist)
#=> [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c'], ['RD-15', 50, 'e', 65]]
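If the sublist rows may carry more than one extra value, the same dictionary lookup can be sketched as a small function that returns a new list and leaves mainlist untouched. The names combine and fill are hypothetical, and the sketch assumes all sublist rows have the same length.

```python
def combine(mainlist, sublist, fill=" "):
    """Join rows on their first element, padding missing matches with fill."""
    extras = {row[0]: row[1:] for row in sublist}
    # Number of values each sublist row contributes (0 if sublist is empty).
    width = max((len(v) for v in extras.values()), default=0)
    return [row + extras.get(row[0], [fill] * width) for row in mainlist]


mainlist = [['RD-12', 12, 'a'], ['RD-13', 45, 'c'], ['RD-15', 50, 'e']]
sublist = [['RD-12', 67], ['RD-15', 65]]
print(combine(mainlist, sublist))
```

Keys that appear only in sublist are ignored here, as in the original combinelist function.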

Add multiple values in a list for the same key using Python

Please check the code and output below. My code runs and produces the output shown, but I want the expected result instead.
list_data = ['ABCD:SATARA', 'XYZ:MUMBAI', 'PQR:43566', 'LMN:455667', 'XYZ:PUNE']
Expected Result is :-
{
"ABCD": "SATARA",
"XYZ": ["MUMBAI", "PUNE"],
"PQR": "43566",
"LMN": "455667"
}
My Code :-
import json
list_data = ['ABCD:SATARA', 'XYZ:MUMBAI', 'PQR:43566', 'LMN:455667', 'XYZ:PUNE']
splittded_data_dict = {}
for each_split_data in list_data:
    split_by_colon = each_split_data.split(":")
    if split_by_colon[0] != '':
        if split_by_colon[0] in splittded_data_dict:
            # append the new value to the existing list at this slot
            splittded_data_dict[split_by_colon[0]].append(split_by_colon[1])
        else:
            # create a new list in this slot
            splittded_data_dict[split_by_colon[0]] = [split_by_colon[1]]
print(json.dumps(splittded_data_dict, indent=2), "\n")
My OUTPUT :-
{
"ABCD": [
"SATARA"
],
"XYZ": [
"MUMBAI",
"PUNE"
],
"PQR": [
"43566"
],
"LMN": [
"455667"
]
}
How can i solve the above problem?
The best thing to do in my opinion would be to use a defaultdict from the collections module. Have a look:
from collections import defaultdict
list_data = ['ABCD:SATARA', 'XYZ:MUMBAI', 'PQR:43566', 'LMN:455667', 'XYZ:PUNE']
res = defaultdict(list)
for item in list_data:
    key, value = item.split(':')
    res[key].append(value)
which results in:
print(res)
# defaultdict(<class 'list'>, {'ABCD': ['SATARA'], 'XYZ': ['MUMBAI', 'PUNE'], 'PQR': ['43566'], 'LMN': ['455667']})
or cast it to dict for a more familiar output:
res = dict(res)
print(res)
# {'ABCD': ['SATARA'], 'XYZ': ['MUMBAI', 'PUNE'], 'PQR': ['43566'], 'LMN': ['455667']}
From what I understand by the description of your problem statement, you want splittded_data_dict to be a dictionary where each value is a list
For this purpose try using defaultdict(). Please see the example below.
from collections import defaultdict
splittded_data_dict = defaultdict(list)
splittded_data_dict['existing key'].append('New value')
print(splittded_data_dict)
You can use the isinstance function to check if a key has been transformed into a list:
d = {}
for i in list_data:
    k, v = i.split(':', 1)
    if k in d:
        if not isinstance(d[k], list):
            d[k] = [d[k]]
        d[k].append(v)
    else:
        d[k] = v
d becomes:
{'ABCD': 'SATARA', 'XYZ': ['MUMBAI', 'PUNE'], 'PQR': '43566', 'LMN': '455667'}
Collect the key (the part before ":") of every string in list_data and keep the unique keys in a list a. Then, for each key in a, loop through list_data and append the value of every string whose key matches to a temporary list. Finally, assign that temporary list (or its single element, if there is only one) as the value for that key.
Here is a one-liner using a dict comprehension and list comprehensions together:
c = {i : [j.split(":")[1] for j in list_data if j.split(":")[0] == i ][0] if len([j.split(":")[1] for j in list_data if j.split(":")[0] == i ])==1 else [j.split(":")[1] for j in list_data if j.split(":")[0] == i ] for i in list(set([i.split(":")[0] for i in list_data]))}
Output should be :
# c = {'LMN': '455667', 'ABCD': 'SATARA', 'PQR': '43566', 'XYZ': ['MUMBAI', 'PUNE']}
Here is the long and detailed version of the code :
list_data = ['ABCD:SATARA', 'XYZ:MUMBAI', 'PQR:43566', 'LMN:455667', 'XYZ:PUNE']
a = []
for i in list_data:
    a.append(i.split(":")[0])
a = list(set(a))
b = {}
for i in a:
    temp = []
    for j in list_data:
        if j.split(":")[0] == i:
            temp.append(j.split(":")[1])
    if len(temp) > 1:
        b[i] = temp
    else:
        b[i] = temp[0]
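The defaultdict idea and the "single value stays a string" requirement can also be combined: group everything into lists first, then unwrap the lists that hold exactly one value. The function name group_values and its sep parameter are assumptions made for this sketch.

```python
from collections import defaultdict


def group_values(pairs, sep=":"):
    """Group 'KEY:VALUE' strings by key; single values are left unwrapped."""
    grouped = defaultdict(list)
    for item in pairs:
        key, value = item.split(sep, 1)
        if key:  # skip entries with an empty key, as in the question
            grouped[key].append(value)
    # Unwrap one-element lists so only repeated keys map to a list.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


list_data = ['ABCD:SATARA', 'XYZ:MUMBAI', 'PQR:43566', 'LMN:455667', 'XYZ:PUNE']
print(group_values(list_data))
```

This makes one pass over the data plus one pass over the keys, whereas the nested loops above rescan list_data for every key.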

merge python dictionary of sets

I have a graph with 2 kinds of nodes- 'Letter nodes' (L) and 'Number nodes' (N). I have 2 dictionaries, one shows edges from L to N and the other shows edges from N to L.
A = {0:(b,), 1:(c,), 2:(c,), 3:(c,)}
B = {a:(3,), b:(0,), c:(1,2,3)}
A key,value pair c:(1,2,3) means there are edges from c to 1,2,3 (3 edges)
I want to merge these to one dictionary C so that the result is a new dictionary:
C = {(0,): (b,), (1, 2, 3): (a, c)}
or
C = {(b,):(0,), (a, c):(1, 2, 3)}
In the resulting dictionary I want the letter nodes and numerical nodes to be on separate sides of keys and values. I don't care which is the key or value just need them separated. How can I go about solving this efficiently?
CLARIFICATION: think of a graph with 2 types of nodes: number nodes and letter nodes. The dictionary C says that from the letter nodes (a, c) you can reach the number nodes (1, 2, 3), i.e. a->3->c->1, a->3->c->2, thus you can get to 1, 2, 3 from a, EVEN THOUGH THERE IS NO DIRECT EDGE FROM a to 2 or a to 1.
According to your statement, I guess you are looking for a graph algorithm that groups the nodes into connected components.
def update_dict(A, result):  # update values to the same set
    for k in A:
        result[k] = result.get(k, {k}).union(set(A[k]))
        # pull in everything already known to be connected to the neighbours
        tmp = result[k]
        for i in result[k]:
            tmp = tmp.union(result.get(i, {i}))
        result[k] = tmp
        # push the merged set back to every member
        for i in result[k]:
            result[i] = result.get(i, {i}).union(result[k])
A = {0:('b',), 1:('c',), 2:('c',), 3:('c',)}
B = {'a':(3,), 'b':(0,), 'c':(1,2,3)}
result = dict()
update_dict(A, result)
update_dict(B, result)
update_dict(A, result)  # second pass so that late merges reach every node
update_dict(B, result)
groups = {frozenset(v) for v in result.values()}  # remove duplicated sets
final_result = dict()
for v in groups:  # merge the result as expected
    numbers = tuple(sorted(i for i in v if isinstance(i, int)))
    letters = tuple(sorted(i for i in v if not isinstance(i, int)))
    final_result[numbers] = letters
print(final_result)
# output (the order of the keys may vary)
{(0,): ('b',), (1, 2, 3): ('a', 'c')}
So I'm not sure if this is the most efficient way of doing this at this point, but it works:
A = {0:('b',), 1:('c',), 2:('c',), 3:('c',)}
B = {'a':(3,), 'b':(0,), 'c':(1,2,3)}
# Put B in the same form as A
B_inv = {}
for k, v in B.items():
    for i in v:
        if B_inv.get(i) is not None:
            B_inv[i] = B_inv[i].union(k)
        else:
            B_inv[i] = set(k)
B_inv = {k: tuple(v) for k, v in B_inv.items()}
# In Python 3, items() returns views, so convert them to lists before concatenating.
AB = set(list(B_inv.items()) + list(A.items())) # get AB as merged
This gets you the merged dictionaries. From here:
new_dict = {}
for a in AB:
    for i in a[1]:
        if new_dict.get(i) is not None:
            new_dict[i] = new_dict[i].union([a[0]])
        else:
            new_dict[i] = set([a[0]])
# put in tuple form
new_dict = {tuple(k): tuple(v) for k,v in new_dict.items()}
This gives me:
{('a',): (3,), ('b',): (0,), ('c',): (1, 2, 3)}
Basically, I'm relying on the mutability of sets and their built-in functionality of eliminating duplicates to try to keep the number of loops through each dictionary to a minimum. Unless I missed something, this should be in linear time.
From here, I need to do comparison, and relying on sets again to prevent me from needing to do a worst-case pairwise comparison of every single element.
merge_list = []
for k, v in new_dict.items():
    matched = False
    nodeset = set([k[0]]).union(v)
    for i in range(len(merge_list)):
        if len(nodeset.intersection(merge_list[i])) != 0:
            merge_list[i] = merge_list[i].union(nodeset)
            matched = True
    # did not find shared edges
    if not matched:
        merge_list.append(nodeset)
Finally, turn it into the form with a single "layer" and tuples.
C = {}
for item in merge_list:
    temp_key = []
    temp_val = []
    for i in item:
        if str(i).isalpha():
            temp_key.append(i)
        else:
            temp_val.append(i)
    C[tuple(temp_key)] = tuple(temp_val)
C gives me {('a', 'c'): (1, 3, 2), ('b',): (0,)}.
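Since the clarification in the question amounts to "nodes that can reach each other belong together", the merge can also be sketched as a connected-components search over an undirected view of both dictionaries. This is a swapped-in technique rather than either answer's method. The function name merge_components is hypothetical, and the sketch assumes letter nodes are str and number nodes are int.

```python
from collections import defaultdict


def merge_components(A, B):
    """Map the letters of each connected component to its numbers."""
    # Build an undirected adjacency map from both edge dictionaries.
    adjacency = defaultdict(set)
    for edges in (A, B):
        for node, neighbours in edges.items():
            adjacency[node]  # register the node even if it has no edges
            for other in neighbours:
                adjacency[node].add(other)
                adjacency[other].add(node)

    seen = set()
    merged = {}
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first walk that collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        letters = tuple(sorted(n for n in component if isinstance(n, str)))
        numbers = tuple(sorted(n for n in component if isinstance(n, int)))
        merged[letters] = numbers
    return merged


A = {0: ('b',), 1: ('c',), 2: ('c',), 3: ('c',)}
B = {'a': (3,), 'b': (0,), 'c': (1, 2, 3)}
print(merge_components(A, B))
```

Each node and edge is visited once, so the running time is linear in the size of the two dictionaries.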
try this:
c = a.copy()
c.update(b)
