Creating Max and Min list for Rolling Sublists - python

I have a list that contains sublists:
[{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'}, {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
I wish to work my way from left to right, finding the max for 'h' and the min for 'l' over the decreasing number of remaining sublists, including the current sublist being referenced. The result should be as follows:
[{'h': '30', 'l': '3'}, {'h': '30', 'l': '3'}, {'h': '25', 'l': '3'}, {'h': '22', 'l': '3'}, {'h': '22', 'l': '3'}]
Finding the max and min of the whole list is easy enough, but I cannot figure out the best way to "discard" the preceding sublists and only use the remaining sublists in creating the new list.

This code solves your question:
inp = [{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'}, {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
l1 = inp[::-1]
l2 = []
max1 = int(l1[0]['h'])
min1 = int(l1[0]['l'])
for item in l1:
    max1 = int(item['h']) if int(item['h']) > max1 else max1
    min1 = int(item['l']) if int(item['l']) < min1 else min1
    l2.append({'h': str(max1), 'l': str(min1)})
l2 = l2[::-1]
print(l2)
Output
[{'h': '30', 'l': '3'}, {'h': '30', 'l': '3'}, {'h': '25', 'l': '3'}, {'h': '22', 'l': '3'}, {'h': '22', 'l': '3'}]
More info
First, I reversed the input list and named it l1. Then I iterated over l1, keeping a running max1 for 'h' and min1 for 'l', and appended max1 and min1 to l2 in each iteration. Finally, I reversed l2.
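The same reversed running max/min can be written more compactly with itertools.accumulate (a sketch of the identical idea, not the answer's original code):

```python
from itertools import accumulate

inp = [{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'},
       {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]

# running max of 'h' and running min of 'l' over the reversed list,
# then reverse back to restore the original order
highs = list(accumulate((int(d['h']) for d in reversed(inp)), max))[::-1]
lows = list(accumulate((int(d['l']) for d in reversed(inp)), min))[::-1]
result = [{'h': str(h), 'l': str(l)} for h, l in zip(highs, lows)]
# result == [{'h': '30', 'l': '3'}, {'h': '30', 'l': '3'}, {'h': '25', 'l': '3'},
#            {'h': '22', 'l': '3'}, {'h': '22', 'l': '3'}]
```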

You can use pandas for a little help. First you should cast your numbers to int, since they are currently strings.
l = [{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'}, {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
l = [{k: int(v) for k, v in x.items()} for x in l]
Then you can convert them to a dataframe and use cummax and cummin. You'll have to reverse the order to compute the cumulative values the way you describe, then reverse the output again:
import pandas as pd

df = pd.DataFrame(l).iloc[::-1]
df['h'] = df['h'].cummax()
df['l'] = df['l'].cummin()
df = df.iloc[::-1]
Use to_dict to go back to your original format:
df.to_dict('records')
[{'h': 30, 'l': 3},
{'h': 30, 'l': 3},
{'h': 25, 'l': 3},
{'h': 22, 'l': 3},
{'h': 22, 'l': 3}]

You can solve this problem in O(n).
Imagine you want to solve this problem only for the max value (solving it for the min value is the same). You just need to iterate from the end of the list to the beginning while keeping track of the max values together with their indices. For your example, this list would be:
[(22, 4), (25, 2), (30, 1)]
Then, to build the answer, loop from 0 to 4: while your counter is less than or equal to 1, your answer is 30; after that, while your counter is less than or equal to 2, your answer is 25; and after that, while your counter is less than or equal to 4, your answer for the max value is 22.
The same approach applies for finding the minimum value.
You can see my solution code below:
data = [{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'}, {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
mx_list = []
mx = 0  # assume the numbers are natural
mn_list = []
mn = 1000000  # assume this is bigger than all of our numbers
for i, d in enumerate(reversed(data)):
    index = len(data) - i - 1
    if int(d['h']) > mx:
        mx_list.append((int(d['h']), index))
        mx = int(d['h'])
for i, d in enumerate(reversed(data)):
    index = len(data) - i - 1
    if int(d['l']) < mn:
        mn_list.append((int(d['l']), index))
        mn = int(d['l'])
answer_list = []
for i in range(len(data)):
    if i > mx_list[-1][1]:
        mx_list.pop()
    cur_mx = mx_list[-1][0]
    if i > mn_list[-1][1]:
        mn_list.pop()
    cur_mn = mn_list[-1][0]
    answer_list.append({'h': cur_mx, 'l': cur_mn})
print(answer_list)

Related

Normalization of a nested dictionary in python

I am new to Python and I have a nested dictionary for which I want to normalize the values of the dictionary. For example:
nested_dictionary={'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
And I would like to get the normalization as
Normalized_result={'D': {'D': '0.47', 'B': '0.24', 'C': '0.00', 'A': '0.24', 'K': '0.00', 'J': '0.04'}, 'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}
I have seen the example in Normalizing dictionary values, which works only for one dictionary, but I want to go further with a nested one.
I have tried to flatten the nested_dictionary and apply the normalization as
import flatdict
d = flatdict.FlatDict(nested_dictionary, delimiter='_')
dd=dict(d)
newDict = dict(zip(dd.keys(), [float(value) for value in dd.values()]))
def normalize(d, target=1.0):
    global factor
    raw = sum(d.values())
    print(raw)
    if raw == 0:
        factor = 0
    else:
        factor = target/raw
    return {key: value*factor for key, value in d.items()}
normalize(newDict)
And I get the result as
{'D_D': 0.2578125,
'D_B': 0.1328125,
'D_C': 0.0,
'D_A': 0.1328125,
'D_K': 0.0,
'D_J': 0.023437499999999997,
'A_A': 0.39062499999999994,
'A_K': 0.0,
'A_J': 0.06249999999999999}
But what I want is the Normalized_result as above
Thanks in advance.
nested_dictionary = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
In this example, your dict values are str type, so we need to convert to float:
nested_dictionary = dict([b, dict([a, float(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': 0.33, 'B': 0.17, 'C': 0.0, 'A': 0.17, 'K': 0.0, 'J': 0.03},
'A': {'A': 0.5, 'K': 0.0, 'J': 0.08}}
The function below is adapted from the link you provided.
It loops through the dictionaries, calculates the factor and updates the values inplace.
for _, d in nested_dictionary.items():
    factor = 1.0/sum(d.values())
    for k in d:
        d[k] = d[k] * factor
nested_dictionary
{'D': {'D': 0.47142857142857136,
'B': 0.24285714285714285,
'C': 0.0,
'A': 0.24285714285714285,
'K': 0.0,
'J': 0.04285714285714285},
'A': {'A': 0.8620689655172414, 'K': 0.0, 'J': 0.13793103448275865}}
If you need to convert back to str, use the function below:
nested_dictionary = dict([b, dict([a, "{:.2f}".format(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': '0.47',
'B': '0.24',
'C': '0.00',
'A': '0.24',
'K': '0.00',
'J': '0.04'},
'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}
This code would do:
def normalize(d, target=1.0):
    raw = sum(float(number) for number in d.values())
    factor = (target/raw if raw else 0)
    return {key: f'{float(value)*factor:.2f}' for key, value in d.items()}
{key: normalize(dct) for key, dct in nested_dictionary.items()}
Turn the string-values in your inner dicts into floats.
Take one of the solutions from the duplicate, for example really_safe_normalise_in_place.
Use the solution on each dict.
Example:
d = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
d = {k: {kk: float(vv) for kk, vv in v.items()} for k, v in d.items()}
for v in d.values():
    really_safe_normalise_in_place(v)
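really_safe_normalise_in_place is not reproduced in this answer; a rough sketch of what such a helper does (an assumption on my part, check the linked duplicate for the exact version):

```python
def really_safe_normalise_in_place(d):
    # scale the values so they sum to 1.0; an all-zero dict is left untouched
    total = sum(d.values())
    if total:
        for k in d:
            d[k] /= total

scores = {'A': 1.0, 'B': 3.0}
really_safe_normalise_in_place(scores)
# scores == {'A': 0.25, 'B': 0.75}
```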

Separating nested list and dictionary in separate columns

I created a function to gather the following sample list below:
full_list = ['Group1', [{'a':'1', 'b':'2'},{'c':'3', 'x':'1'}],
             'Group2', [{'d':'7', 'e':'18'}],
             'Group3', [{'m':'21'}, {'n':'44','p':'13'}]]
As you can see some of the elements inside the lists are made up of key-value pair dictionaries.
And these dictionaries are of different sizes (number of kv pairs).
Can anyone suggest what to use in python to display this list in separate columns?
Group1 Group2 Group3
{'a':'1', 'b':'2'} {'d':'7', 'e':'18'} {'m':'21'}
{'c':'3', 'x':'1'} {'n':'44','p':'13'}
I am not after a solution but rather a point in the right direction for a novice like me.
I have briefly looked at itertools and pandas dataframes
Thanks in advance
Here is one way:
First extract the columns and the data:
import pandas as pd
columns = full_list[::2]
#['Group1', 'Group2', 'Group3']
data = full_list[1::2]
#[[{'a': '1', 'b': '2'}, {'c': '3', 'x': '1'}],
# [{'d': '7', 'e': '18'}],
# [{'m': '21'}, {'n': '44', 'p': '13'}]]
Here, [::2] takes every second item starting from the beginning, and [1::2] does the same but starts from index 1 (the second position).
Then create a pd.DataFrame:
df = pd.DataFrame(data)
#0 {'a': '1', 'b': '2'} {'c': '3', 'x': '1'}
#1 {'d': '7', 'e': '18'} None
#2 {'m': '21'} {'n': '44', 'p': '13'}
Oops, the columns and rows are transposed, so we need to transpose the frame:
df = df.T
Then add the columns:
df.columns = columns
And there we have it:
Group1 Group2 Group3
0 {'a': '1', 'b': '2'} {'d': '7', 'e': '18'} {'m': '21'}
1 {'c': '3', 'x': '1'} None {'n': '44', 'p': '13'}

How can I edit this dataframe to merge two columns' dictionary lists?

I have a dataframe like this.
ID Name id2 name2 name3
101 A [{'a': '1'}, {'b': '2'}] [{'e': '4'}, {'f': '5'}] [{'x': '4'}, {'y': '5'}]
103 B [{'c': '3'},{'d': '6'}] [{'g': '7'},{'h': '8'}] [{'t': '4'}, {'o': '5'}]
and I want the output df like this.
ID Name id2 name2
101 A [{'a': '1', 'e': '4', 'x': '4'}, {'b': '2', 'f': '5', 'y': '5'}] [{'e': '4'}, {'f': '5'}]
103 B [{'c': '3', 'g': '7', 't': '4'}, {'d': '6', 'h': '8', 'o': '5'}] [{'g': '7'}, {'h': '8'}]
The column name3 will stay as it is in the output; I have just removed it from the sample above. The point is that even if more columns get added, their dictionaries should be merged into the id2 column.
Thanks :)
You can try using collections.ChainMap in a list comprehension:
From the docs...
A ChainMap groups multiple dicts or other mappings together to create a single, updateable view...
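As a minimal illustration of that behaviour (the values here are made up for the demo):

```python
from collections import ChainMap

# ChainMap searches its mappings left to right, so on a key clash
# the first mapping wins; dict() materialises the merged view
merged = dict(ChainMap({'a': '1', 'x': 'first'}, {'e': '4', 'x': 'second'}))
# merged == {'a': '1', 'x': 'first', 'e': '4'}
```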
So first we zip the columns together, then a nested zip gets the dicts from each column "side-by-side" in a single list. This list is passed to ChainMap, which joins them into a single dict.
Example
from collections import ChainMap
import pandas as pd

# Setup
df = pd.DataFrame({'ID': [101, 103], 'Name': ['A', 'B'], 'id2': [[{'a': '1'}, {'b': '2'}], [{'c': '3'}, {'d': '6'}]], 'name2': [[{'e': '4'}, {'f': '5'}], [{'g': '7'}, {'h': '8'}]]})

df['id2'] = [[dict(ChainMap(*x)) for x in zip(i, n)]
             for i, n in zip(df['id2'], df['name2'])]
[out]
ID Name id2 name2
0 101 A [{'e': '4', 'a': '1'}, {'b': '2', 'f': '5'}] [{'e': '4'}, {'f': '5'}]
1 103 B [{'c': '3', 'g': '7'}, {'d': '6', 'h': '8'}] [{'g': '7'}, {'h': '8'}]
Update
A more scalable solution, if you have multiple columns to combine would be to use DataFrame.filter first to extract all the columns that need to be combined:
df = pd.DataFrame({'ID': [101, 103], 'Name': ['A', 'B'], 'id2': [[{'a': '1'}, {'b': '2'}], [{'c': '3'}, {'d': '6'}]], 'name2': [[{'e': '4'}, {'f': '5'}], [{'g': '7'}, {'h': '8'}]], 'name3': [[{'x': '4'}, {'y': '5'}], [{'t': '4'}, {'o': '5'}]]})
df['id2'] = [[dict(ChainMap(*y)) for y in zip(*x)]
             for x in zip(*df.filter(regex='id2|name').apply(tuple))]
[out]
ID Name id2 name2 name3
0 101 A [{'e': '4', 'x': '4', 'a': '1'}, {'b': '2', 'f': '5', 'y': '5'}] [{'e': '4'}, {'f': '5'}] [{'x': '4'}, {'y': '5'}]
1 103 B [{'c': '3', 't': '4', 'g': '7'}, {'o': '5', 'd': '6', 'h': '8'}] [{'g': '7'}, {'h': '8'}] [{'t': '4'}, {'o': '5'}]
This is essentially doing the same as above, only we filter to "id" or "name" columns, and combine them all.
Assuming the name of your dataframe is df, try this:
# note: this assumes each list holds exactly two dicts
for i in range(df.shape[0]):
    df.id2[i][0].update(df.name2[i][0])
    df.id2[i][1].update(df.name2[i][1])

Printing a text file as a dictionary python

Text file I am working on
3:d:10:i:30
1:d:10:i:15
4:r:30
1:r:15
2:d:12:r:8
4:l:20
5:i:15
3:l:20:r:22
4:d:30
5:l:15:r:15
I am trying to print a dictionary from a text file that should come out looking like :
{1: {'d': 10, 'i': 15, 'r': 15},
2: {'d': 12, 'r': 8},
3: {'d': 10, 'i': 30, 'l': 20, 'r': 22},
4: {'d': 30, 'l': 20, 'r': 30},
5: { 'i': 15, 'l': 15, 'r': 15}}
Instead my code is overriding each line in the file and only takes the most recent line as the value in the dictionary, so it looks like:
{'3': {'l': '20', 'r': '22'},
'1': {'r': '15'},
'4': {'d': '30'},
'2': {'d': '12', 'r': '8'},
'5': {'l': '15', 'r': '15'}})
This is what I have so far
def read_db(file):
    d = defaultdict(dict)
    for line in open('db1.txt'):
        z = line.rstrip().split(':')
        d[z[0]] = dict(zip(z[1::2], z[2::2]))
    print(d)
I tried doing += but that operator does not work on dicts. I am a bit stuck. Thanks for any help.
This is one approach using a simple iteration and dict.setdefault.
Ex:
res = {}
with open(filename) as infile:  # Open file to read
    for line in infile:  # Iterate over each line
        line = line.strip().split(":")  # Split by colon
        res.setdefault(line.pop(0), {}).update(dict(zip(line[::2], line[1::2])))
print(res)
Output:
{'1': {'d': '10', 'i': '15', 'r': '15'},
'2': {'d': '12', 'r': '8'},
'3': {'d': '10', 'i': '30', 'l': '20', 'r': '22'},
'4': {'d': '30', 'l': '20', 'r': '30'},
'5': {'i': '15', 'l': '15', 'r': '15'}}
Or using collections.defaultdict
Ex:
from collections import defaultdict
res = defaultdict(dict)
with open(filename) as infile:
    for line in infile:
        line = line.strip().split(":")
        res[line.pop(0)].update(dict(zip(line[::2], line[1::2])))
print(res)
Output:
defaultdict(<class 'dict'>,
{'1': {'d': '10', 'i': '15', 'r': '15'},
'2': {'d': '12', 'r': '8'},
'3': {'d': '10', 'i': '30', 'l': '20', 'r': '22'},
'4': {'d': '30', 'l': '20', 'r': '30'},
'5': {'i': '15', 'l': '15', 'r': '15'}})
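The question's desired output has int keys and values, while both versions above keep strings. A small variation converts while building (a sketch; the file's lines are inlined here instead of being read from disk):

```python
from collections import defaultdict

lines = ["3:d:10:i:30", "1:d:10:i:15", "4:r:30", "1:r:15", "2:d:12:r:8",
         "4:l:20", "5:i:15", "3:l:20:r:22", "4:d:30", "5:l:15:r:15"]

res = defaultdict(dict)
for line in lines:
    parts = line.strip().split(":")
    key = int(parts.pop(0))  # outer key as int
    # letters stay strings, their values become ints
    res[key].update(zip(parts[::2], map(int, parts[1::2])))
# res[1] == {'d': 10, 'i': 15, 'r': 15}
```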

Parse a text file in an array python

A C G T
A 2 -1 -1 -1
C -1 2 -1 -1
G -1 -1 2 -1
T -1 -1 -1 2
This file is tab-separated, and I want it mapped into a similar format in Python:
{'A': {'A': 91, 'C': -114, 'G': -31, 'T': -123},
'C': {'A': -114, 'C': 100, 'G': -125, 'T': -31},
'G': {'A': -31, 'C': -125, 'G': 100, 'T': -114},
'T': {'A': -123, 'C': -31, 'G': -114, 'T': 91}}
I have tried very hard but I cannot figure out how to do this, as I am new to Python.
Please help.
My code so far:
seq = flines[0]
newseq = []
j = 0
while (l < 4):
    i = 2
    while (o < 4):
        newseq[i][j] = seqLine[i]
        i = i + 1
        o = o + 1
    j = j + 1
    l = l + 1
print(seq)
print(seqLine)
I think this is what you want:
import csv
data = {}
with open('myfile.csv', 'rb') as csvfile:
    ntreader = csv.reader(csvfile, delimiter="\t", quotechar='"')
    for rowI, rowData in enumerate(ntreader):
        if rowI == 0:
            headers = rowData[1:]
        else:
            data[rowData[0]] = {k: int(v) for k, v in zip(headers, rowData[1:])}
print data
To make life easy I use the csv module and just set tab as the delimiter; then I grab the column headers from the first row and use them to label the values in all other rows.
This produces:
{'A': {'A': 2, 'C': -1, 'G': -1, 'T': -1},
 'C': {'A': -1, 'C': 2, 'G': -1, 'T': -1},
 'G': {'A': -1, 'C': -1, 'G': 2, 'T': -1},
 'T': {'A': -1, 'C': -1, 'G': -1, 'T': 2}}
Edit:
For Python < 2.7 it should work if you replace the dictionary-comprehension line (data[rowData[0]] = ...) above with a simple loop in the same place:
rowDict = dict()
for k, v in zip(headers, rowData[1:]):
    rowDict[k] = int(v)
data[rowData[0]] = rowDict
Using csv.DictReader gets you most of the way there on your own (note it takes a file object, not a filename):
from csv import DictReader
reader = DictReader(open('file.csv'), delimiter='\t')
#dictdata = {row['']: row for row in reader} # <-- python 2.7+ only
dictdata = dict((row[''], row) for row in reader) # <-- python 2.6 safe
Outputs:
{'A': {None: [''], '': 'A', 'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
'C': {'': 'C', 'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
'G': {'': 'G', 'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
'T': {'': 'T', 'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}
Cleaning up the extraneous keys got messy and I needed to rebuild the inner dict; replace the last line with this:
dictdata = {row['']: {key: value for key, value in row.iteritems() if key} for row in reader}
Outputs:
{'A': {'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
'C': {'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
'G': {'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
'T': {'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}
Edit: for Python <2.7
Dictionary comprehensions were added in 2.7. For 2.6 and lower, use the dict constructor:
dictdata = dict((row[''], dict((key, value) for key, value in row.iteritems() if key)) for row in reader)
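If pandas is available, the whole mapping can also be built in two lines (a sketch; the matrix is inlined via StringIO here, but a real filename works the same way with pd.read_csv):

```python
import io

import pandas as pd

# the tab-separated matrix from the question, inlined for the example
text = "\tA\tC\tG\tT\nA\t2\t-1\t-1\t-1\nC\t-1\t2\t-1\t-1\nG\t-1\t-1\t2\t-1\nT\t-1\t-1\t-1\t2\n"
# first column becomes the row index; orient='index' yields {row: {col: value}}
data = pd.read_csv(io.StringIO(text), sep="\t", index_col=0).to_dict("index")
```

Unlike the csv-based versions, this also converts the values to ints for free.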
