Sorting a dictionary by a value inside its arrays in Python

Okay, so I've been working on processing some annotated text output. What I have so far is a dictionary with the annotation as the key and its relations as an array of elements:
{'Adenotonsillectomy': ['0', '18', '1869', '1716'],
'OSAS': ['57', '61'],
'apnea': ['41', '46'],
'can': ['94', '97', '1796', '1746'],
'deleterious': ['103', '114'],
'effects': ['122', '129', '1806', '1752'],
'for': ['19', '22'],
'gain': ['82', '86', '1776', '1734'],
'have': ['98', '102', ['1776 1786 1796 1806 1816'], '1702'],
'health': ['115', '121'],
'lead': ['67', '71', ['1869 1879 1889'], '1695'],
'leading': ['135', '142', ['1842 1852'], '1709'],
'may': ['63', '66', '1879', '1722'],
'obesity': ['146', '153'],
'obstructive': ['23', '34'],
'sleep': ['35', '40'],
'syndrome': ['47', '55'],
'to': ['143', '145', '1852', '1770'],
'weight': ['75', '81'],
'when': ['130', '134', '1842', '1758'],
'which': ['88', '93', '1786', '1740']}
What I want to do is sort this by the first element of each array and reorder the dict as:
'Adenotonsillectomy': ['0', '18', '1869', '1716'],
'for': ['19', '22'],
'obstructive': ['23', '34'],
'sleep': ['35', '40'],
'apnea': ['41', '46'],
etc...
Right now I've tried sorting by value with sorted() and a key function:
sorted(dependency_dict.items(), key=lambda x: x[1][0])
However, the output I'm getting is still incorrect:
[('Adenotonsillectomy', ['0', '18', '1869', '1716']),
('deleterious', ['103', '114']),
('health', ['115', '121']),
('effects', ['122', '129', '1806', '1752']),
('when', ['130', '134', '1842', '1758']),
('leading', ['135', '142', ['1842 1852'], '1709']),
('to', ['143', '145', '1852', '1770']),
('obesity', ['146', '153']),
('for', ['19', '22']),
('obstructive', ['23', '34']),
('sleep', ['35', '40']),
('apnea', ['41', '46']),
('syndrome', ['47', '55']),
('OSAS', ['57', '61']),
('may', ['63', '66', '1879', '1722']),
('lead', ['67', '71', ['1869 1879 1889'], '1695']),
('weight', ['75', '81']),
('gain', ['82', '86', '1776', '1734']),
('which', ['88', '93', '1786', '1740']),
('can', ['94', '97', '1796', '1746']),
('have', ['98', '102', ['1776 1786 1796 1806 1816'], '1702'])]
I'm not sure what's going wrong. Any help is appreciated.

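The first elements are strings, so the entries are sorted in alphabetical (lexicographic) order: '103' comes before '19' because the comparison goes character by character. If you want to sort them by integer value, convert the value to int in the key: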
sorted(dependency_dict.items(), key=lambda x: int(x[1][0]))
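A minimal sketch of the difference, assuming a shortened version of the dictionary from the question (`by_string`, `by_number` and `ordered` are hypothetical names used only for illustration). It also shows one way to get a reordered dict back, which relies on dicts preserving insertion order in Python 3.7+:

```python
# Shortened sample of the dictionary from the question.
dependency_dict = {
    'deleterious': ['103', '114'],
    'for': ['19', '22'],
    'Adenotonsillectomy': ['0', '18', '1869', '1716'],
    'obstructive': ['23', '34'],
}

# String comparison is lexicographic: '103' < '19' because '0' < '9'.
by_string = sorted(dependency_dict.items(), key=lambda x: x[1][0])

# Converting to int compares the offsets numerically.
by_number = sorted(dependency_dict.items(), key=lambda x: int(x[1][0]))

# Dicts preserve insertion order (Python 3.7+), so this keeps the sorted order.
ordered = dict(by_number)

print([k for k, _ in by_string])
print(list(ordered))
```

Note that sorted() returns a list of (key, value) pairs; passing it to dict() is only needed if a dictionary is required afterwards.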

Related

Search tuples between two date strings

I want to list the tuples whose date falls between two dates (given as strings). My data looks like this:
[(1, 'ch-01-07-1', '2021-07-01', '262', 'okinama', 'OR15G9431', 'Dhenkanal', 'FULAPADA', '67', '450', '34', '395151.0', 'Not Yet'),
(3, 'ch-01-07-3', '2021-07-02', '262', 'okinama', 'OR 21 7911', 'Dhenkanal', 'FULAPADA', '67', '450', '34', '395151.0', 'Not Yet'),
(4, 'ch-01-07-4', '2021-07-01', '262', 'okinama', 'OR 21 7911', 'Dhenkanal', 'DIGHI', '67', '450', '34', '299743.0', 'Not Yet'),
(5, 'ch-01-07-5', '2021-07-03', '262', 'okinama', 'OR 21 7911', 'Dhenkanal', 'CUTTACK', '67', '450', '34', '384163.0', 'Not Yet'),
(6, 'ch-01-07-6', '2021-07-04', '262', 'okinama', 'OR 21 7911', 'Dhenkanal', 'BARSINGHA (BARAMBA)', '67', '450', '34', '356425.0', 'Not Yet'),
(7, 'ch-18-07-1', '2021-07-12', '256', 'ultra tech', 'OR 21 7911', 'Dhenkanal', 'DERA', '52', '63', '21', '340672.0', 'Not Yet'),
(8, 'ch-18-07-2', '2021-07-11', '457', 'ultra tech', 'OR15G9431', 'Dhenkanal', 'DHENKANAL TOWN AREA (M.PAT, COLLEGE BYEPASS)', '45', '5677', '66', '88082.0', 'Not Yet'),
(9, 'ch-18-07-3', '2021-07-15', '545', 'okinama', 'OR 21 7911', 'Dhenkanal', 'FULAPADA', '67', '66', '55', '395514.0', 'Not Yet'),
(10, 'ch-18-07-4', '2021-07-09', '545', 'ultra tech', 'OR 21 7911', 'Dhenkanal', 'FULAPADA', '67', '66', '55', '395514.0', 'Not Yet'),
(12, 'ch-01-07-2', '2021-07-08', '123', 'ultra tech', 'OR 21 7911', 'Dhenkanal', 'DHUBALAPALA (TELKOI)', '23', '23', '12', '287534.0', 'Not Yet'),
(17, 'ch-2021-07-1', '2021-07-12', '565', 'ultra tech', 'OR 21 7911', 'Dhenkanal', 'DHENKANAL TOWN AREA (UPTO MAHAVEER BAZAR) ', '32', '33', '22', '61289.0', 'Not Yet'),
(19, 'ch-2021-07-2022', '2021-07-18', '741', 'okinama', 'OR 21 7911', 'Dhenkanal', 'FULAPADA', '21', '22', '22', '123961.0', 'Not Yet'),
(20, 'ch-2021-07-2023', '2021-07-19', '693', 'ultra tech', 'od062598', 'Dhenkanal', 'DUDURKOTE', '78', '78', '78', '352014.0', 'Not Yet'),
(21, 'ch-2021-07-2024', '2021-07-20', '123', 'okinama', 'OR 21 7911', 'Dhenkanal', 'CUTTACK', '10', '100', '100', '57210.0', 'Not Yet')]
For example, I want to search for dates from "2021-07-03" to "2021-07-15"; as a result I expect rows 5, 6, 7, 8, 9, 10, 12 and 17 to be listed in my console. Further, if I also filter on the fifth column (index 4) being equal to "ultra tech", I expect rows 7, 8, 10, 12 and 17.
You can convert each date to an integer of the form YYYYMMDD:
def to_int(date_str):
    # "2021-07-03" -> 20210703
    year, month, day = map(int, date_str.split('-'))
    return 10000 * year + 100 * month + day

start_date = to_int("2021-07-03")
end_date = to_int("2021-07-15")
Then, for the query (data is the list of tuples):
find_value = 'ultra tech'
search_result = []  # rows inside the date range
adv_search = []     # rows inside the range that also contain find_value
for t in data:
    find_date = to_int(t[2])
    if start_date <= find_date <= end_date:
        search_result.append(t)
        if find_value in t:
            adv_search.append(t)
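Alternatively, parse each date string with datetime and compare date objects (here A is the list of tuples):
from datetime import date, datetime
d1 = date(2021, 7, 3)
d2 = date(2021, 7, 15)
for b in A:
    if b[4] == "ultra tech":
        c = datetime.strptime(b[2], "%Y-%m-%d").date()
        # <= keeps rows dated exactly on the start or end date
        if d1 <= c <= d2:
            print(b)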
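Because the dates in the question are zero-padded YYYY-MM-DD strings, they already sort in chronological order as plain strings, so a range check may not need any conversion at all. A minimal sketch under that assumption, using a few rows trimmed to their first five fields (`in_range` and `ultra_tech` are hypothetical names):

```python
# A few rows from the question, trimmed to the first five fields.
data = [
    (5, 'ch-01-07-5', '2021-07-03', '262', 'okinama'),
    (7, 'ch-18-07-1', '2021-07-12', '256', 'ultra tech'),
    (9, 'ch-18-07-3', '2021-07-15', '545', 'okinama'),
    (19, 'ch-2021-07-2022', '2021-07-18', '741', 'okinama'),
    (1, 'ch-01-07-1', '2021-07-01', '262', 'okinama'),
]

start, end = '2021-07-03', '2021-07-15'

# Zero-padded YYYY-MM-DD strings compare in the same order as the dates,
# so a chained string comparison gives an inclusive range check.
in_range = [row for row in data if start <= row[2] <= end]

# Narrow further by the fifth field (index 4).
ultra_tech = [row for row in in_range if row[4] == 'ultra tech']

print([row[0] for row in in_range])
print([row[0] for row in ultra_tech])
```

This only holds while every date string uses the same zero-padded format; if the format can vary, parsing with datetime is the safer choice.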

convert list of lists to dictionary

How can I create a list of dictionaries from these lists?
temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
The key for each list is its first element, and every remaining position becomes one dictionary.
The expected output is:
output= [{'header1': '4', 'media_range': '1,200', 'speed': '300'}, {'header1': '8', 'media_range': '2,400', 'speed': '600'}, ...]
Ideally the code should handle any number of lists (in this case 3).
If I understand correctly:
>>> temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
>>>
>>> keys = [l[0] for l in temp]
>>> values = [l[1:] for l in temp]
>>> dicts = [dict(zip(keys, sub)) for sub in zip(*values)]
>>>
>>> dicts
[{'header1': '4', 'media_range': '1,200', 'speed': '300'},
{'header1': '8', 'media_range': '2,400', 'speed': '600'},
{'header1': '16', 'media_range': '4,800', 'speed': '1,200'},
{'header1': '32', 'media_range': '4,800', 'speed': '2,000'},
{'header1': '64', 'media_range': '6,200', 'speed': '2,000'},
{'header1': '128', 'media_range': '38,400', 'speed': '2,000'},
{'header1': '256', 'media_range': '76,800', 'speed': '2,000'},
{'header1': '512', 'media_range': '153,600', 'speed': '2,000'},
{'header1': '243,6', 'media_range': '160,000', 'speed': '2,000'}]
Slightly shorter solution with zip and unpacking:
temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
header, *data = zip(*temp)
result = [dict(zip(header, i)) for i in data]
Output:
[{'header1': '4', 'media_range': '1,200', 'speed': '300'}, {'header1': '8', 'media_range': '2,400', 'speed': '600'}, {'header1': '16', 'media_range': '4,800', 'speed': '1,200'}, {'header1': '32', 'media_range': '4,800', 'speed': '2,000'}, {'header1': '64', 'media_range': '6,200', 'speed': '2,000'}, {'header1': '128', 'media_range': '38,400', 'speed': '2,000'}, {'header1': '256', 'media_range': '76,800', 'speed': '2,000'}, {'header1': '512', 'media_range': '153,600', 'speed': '2,000'}, {'header1': '243,6', 'media_range': '160,000', 'speed': '2,000'}]
You could use zip() directly. This requires you to know how many lists there are, but it produces the expected output:
output = []
for header1, media_range, speed in zip(temp[0], temp[1], temp[2]):
    # skip the first tuple, which holds the header names themselves
    if header1 != "header1":
        output.append({temp[0][0]: header1, temp[1][0]: media_range, temp[2][0]: speed})
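One caveat with all of the zip-based approaches above: zip() stops at the shortest input, so a list with a missing value would silently drop trailing records. A minimal sketch of one way to pad instead, using itertools.zip_longest on a hypothetical shortened input where the last list is one value short (the fill value None is an arbitrary choice):

```python
from itertools import zip_longest

# Hypothetical shortened input where the last list is missing a value.
temp = [
    ['header1', '4', '8', '16'],
    ['media_range', '1,200', '2,400', '4,800'],
    ['speed', '300', '600'],
]

keys = [row[0] for row in temp]
columns = [row[1:] for row in temp]

# zip() would stop after two records; zip_longest() pads the short list instead.
records = [dict(zip(keys, values)) for values in zip_longest(*columns, fillvalue=None)]

print(records)
```

Whether padding or truncating is the right behaviour depends on the data; for lists that are guaranteed to have equal length, plain zip() is enough.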

Convert a matrix from string to integer

I'm trying to change a matrix of numbers from string to integer but it just doesn't work.
for element in list:
    for i in element:
        i = int(i)
What am I doing wrong?
Edit:
This is the whole code:
import numpy as np
t_list = []
t_list = np.array(t_list)
list_rains_per_months = [['63', '65', '50', '77', '66', '69'],
['65', '65', '67', '50', '54', '58'],
['77', '73', '80', '83', '89', '100'],
['90', '85', '90', '90', '84', '90'],
['129', '113', '120', '135', '117', '130'],
['99', '116', '114', '111', '119', '100'],
['105', '98', '112', '113', '102', '100'],
['131', '120', '111', '141', '130', '126'],
['85', '101', '88', '89', '94', '91'],
['122', '103', '119', '98', '101', '107'],
['121', '101', '104', '121', '115', '104'],
['67', '44', '58', '61', '64', '58']]
for element in t_list:
    for i in element:
        i = int(i)
I apologize for any mistakes, I'm new to Python.
What you're doing wrong is that you never change the list or any of its elements: inside the loop, i starts out referring to each element in turn, and i = int(i) then makes the name i refer to a new integer, which doesn't affect your list. (Also, avoid using list as an identifier; it's an existing type, and shadowing it is asking for trouble.)
One way to do it is with list comprehensions. Assuming your matrix is a list of (inner) lists, for example:
a_list = [["3", "56", "78"], ["2", "39", "60"], ["87", "9", "71"]]
then two nested list comprehensions should do the trick:
a_list = [[int(i) for i in inner_list] for inner_list in a_list]
This builds a new list by going over your initial list, applying the change you want to each element, and saving the result under another (or the same) name.
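To make the difference between rebinding a name and writing into a list concrete, here is a minimal sketch with a small hypothetical matrix (`matrix` and `unchanged` are illustrative names, not from the question):

```python
matrix = [["3", "56"], ["2", "39"]]

# Rebinding the loop variable only changes what the name `i` refers to.
for row in matrix:
    for i in row:
        i = int(i)
unchanged = [row[:] for row in matrix]  # snapshot: the elements are still strings

# Assigning through an index writes into the inner list itself.
for row in matrix:
    for pos, value in enumerate(row):
        row[pos] = int(value)

print(unchanged)
print(matrix)
```

The first loop leaves every element as a string; only the index assignment in the second loop changes the contents of the matrix.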
With NumPy, you can do it this way:
import numpy as np
list_rains_per_months = [['63', '65', '50', '77', '66', '69'],
['65', '65', '67', '50', '54', '58'],
['77', '73', '80', '83', '89', '100'],
['90', '85', '90', '90', '84', '90'],
['129', '113', '120', '135', '117', '130'],
['99', '116', '114', '111', '119', '100'],
['105', '98', '112', '113', '102', '100'],
['131', '120', '111', '141', '130', '126'],
['85', '101', '88', '89', '94', '91'],
['122', '103', '119', '98', '101', '107'],
['121', '101', '104', '121', '115', '104'],
['67', '44', '58', '61', '64', '58']]
list_rains_per_months = np.array(list_rains_per_months)
myfunc = np.vectorize(lambda x: int(x))
list_rains_per_months = myfunc(list_rains_per_months)
print(list_rains_per_months)
Output
[[ 63 65 50 77 66 69]
[ 65 65 67 50 54 58]
[ 77 73 80 83 89 100]
[ 90 85 90 90 84 90]
[129 113 120 135 117 130]
[ 99 116 114 111 119 100]
[105 98 112 113 102 100]
[131 120 111 141 130 126]
[ 85 101 88 89 94 91]
[122 103 119 98 101 107]
[121 101 104 121 115 104]
[ 67 44 58 61 64 58]]
You could use enumerate in nested loops to assign back into the matrix by index:
matrix = [["12", "10", "0"],
          ["0", "33", "60"]]
for h, row in enumerate(matrix):
    for j, k in enumerate(row):
        # write through the indices so the matrix itself changes
        matrix[h][j] = int(k)
print(matrix)
You could also just map each row's values to int:
for row in list_rains_per_months:
    row[:] = map(int, row)
Note that I assign to row[:], i.e., into the row and thus into the matrix. If I assigned to row instead, I'd have the same problem as you with your i: I'd only assign to the variable, not into the row/matrix.

Indexing a Dataframe in steps of 0.5

I have several dataframes that I want to add together. Their indices range from 0 to 25 in steps of 0.5. When I add them, the indices are interpreted differently and the resulting dataframe has its index ordered like 0.5, 1, 1.5, 10, 10.5, ..., 19.5, 2, and so on, so 10 is listed before 2. I guess this is because 10 starts with a 1 and the dataframe sorts the index by the first character.
I tried different ways of adding the frames:
pd.concat([df1, df2, df3...], axis=0)
df1 + df2 + df3
df1.add(df2, fill_value=0).add(df3.....)
All of them work. The only problem is the new index order, which messes up my frames.
I could of course reset the indices before adding the frames and then change the index back. But is there a more direct way?
In answer to a comment, these are the indices of the three frames:
Index(['0.5', '1.0', '1.5', '2.0', '2.5', '3.0', '3.5', '4.0', '4.5', '5.0',
       '5.5', '6.0', '6.5', '7.0', '7.5', '8.0', '8.5', '9.0', '9.5', '10.0',
       '10.5', '11.0', '11.5', '12.0', '12.5', '13.0', '13.5', '14.0', '14.5',
       '15.0', '15.5', '16.0', '16.5', '17.0', '17.5', '18.0', '18.5', '19.0',
       '19.5', '20.0', '20.5', '21.0', '21.5', '22.0', '22.5', '23.0', '23.5',
       '24.0', '24.5', '25.0', '25.5', '26.0', '26.5', '27.0', '27.5', '28.0',
       '28.5'],
      dtype='object')
Index(['0.5', '1.0', '1.5', '2.0', '2.5', '3.0', '3.5', '4.0', '4.5', '5.0',
       '5.5', '6.0', '6.5', '7.0', '7.5', '8.0', '8.5', '9.0', '9.5', '10.0',
       '10.5', '11.0', '11.5', '12.0', '12.5', '13.0', '13.5'],
      dtype='object')
Index(['0.5', '1.0', '1.5', '2.0', '2.5', '3.0', '3.5', '4.0', '4.5', '5.0',
       '5.5', '6.0', '6.5', '7.0', '7.5', '8.0', '8.5', '9.0', '9.5', '10.0',
       '10.5', '11.0', '11.5', '12.0', '12.5', '13.0', '13.5', '14.0', '14.5',
       '15.0', '15.5', '16.0', '16.5', '17.0', '17.5', '18.0'],
      dtype='object')
The indices are strings (dtype='object'), so they are ordered as text. The simplest solution is to convert the index to float in every DataFrame before adding them:
df1.index = df1.index.astype(float)
df2.index = df2.index.astype(float)
df3.index = df3.index.astype(float)
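The ordering described in the question can be reproduced without pandas. A minimal plain-Python sketch, assuming a handful of labels like those in the printed indices (`labels`, `as_strings` and `as_floats` are hypothetical names):

```python
# Index labels as strings, like the dtype='object' indices shown in the question.
labels = ['0.5', '1.0', '1.5', '2.0', '10.0', '10.5', '19.5']

# Strings compare character by character, so '10.0' lands before '2.0'.
as_strings = sorted(labels)

# After conversion to float the same labels sort numerically.
as_floats = sorted(float(label) for label in labels)

print(as_strings)
print(as_floats)
```

This is the same effect the float conversion of the index is meant to remove: once the labels are numbers, 2.0 sorts before 10.0.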

Print nested dictionary in python and export all on a csv file

I have a dictionary like this:
{'https://github.com/project1': {'Batchfile': '91', 'Gradle': '110', 'INI': '25', 'Java': '1879', 'Markdown': '393', 'QMake': '52', 'Shell': '161', 'Text': '202', 'XML': '943'}}
{'https://github.com/project2': {'Batchfile': '91', 'Gradle': '123', 'INI': '25', 'Java': '1305', 'Markdown': '121', 'QMake': '52', 'Shell': '161', 'XML': '234'}}
{'https://github.com/project3': {'Batchfile': '91', 'Gradle': '360', 'INI': '27', 'Java': '805', 'Markdown': '27', 'QMake': '156', 'Shell': '161', 'XML': '380'}}
It is structured in this way:
{'url': {'lang1': 'locs', 'lang2': 'locs', ...}}
{'url2': {'lang6': 'locs', 'lang5': 'locs', ...}}
where lang stands for a language and locs stands for the lines of code in that language.
What I want to do is print this dictionary in a readable way, so I can check the results before the export.
After that I want to export the dictionary to a CSV file for further processing. The problem is that the languages are not sorted. This is what I mean:
{'https://github.com/Project4': {'HTML': '29', 'Java': '229', 'Markdown': '101', 'Maven POM': '88', 'XML': '62'}}
{'https://github.com/Project5': {'Batchfile': '85', 'Gradle': '84', 'INI': '22', 'Java': '2422', 'Markdown': '25', 'Prolog': '25', 'Shell': '173', 'XML': '3243', 'YAML': '43'}}
Any idea?
You could use pandas:
import pandas as pd
t = [{'https://github.com/project1': {'Batchfile': '91', 'Gradle': '110', 'INI': '25', 'Java': '1879', 'Markdown': '393', 'QMake': '52', 'Shell': '161', 'Text': '202', 'XML': '943'}},
{'https://github.com/project2': {'Batchfile': '91', 'Gradle': '123', 'INI': '25', 'Java': '1305', 'Markdown': '121', 'QMake': '52', 'Shell': '161', 'XML': '234'}},
{'https://github.com/project3': {'Batchfile': '91', 'Gradle': '360', 'INI': '27', 'Java': '805', 'Markdown': '27', 'QMake': '156', 'Shell': '161', 'XML': '380'}}]
# a list (not a set) of every language seen; use sorted(...) instead for alphabetical columns
columns = list({lang for x in t for l in x.values() for lang in l})
index = [p for x in t for p in x.keys()]
rows = [l for x in t for l in x.values() ]
df = pd.DataFrame(rows, columns=columns, index=index).fillna('N/A')
df.to_csv('projects.csv')
Which gives:
>>> df
Gradle INI Markdown ... Batchfile Java QMake
https://github.com/project1 110 25 393 ... 91 1879 52
https://github.com/project2 123 25 121 ... 91 1305 52
https://github.com/project3 360 27 27 ... 91 805 156
[3 rows x 9 columns]
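If pandas is not available, the same idea can be sketched with the standard library: pprint for a readable preview and csv.DictWriter for the export. This is only a sketch under assumptions: the two records are shortened, hypothetical versions of the data in the question, the column name 'url' is invented for illustration, and an in-memory buffer stands in for a real file:

```python
import csv
import io
from pprint import pprint

# Two shortened, hypothetical records in the shape described in the question.
projects = [
    {'https://github.com/project1': {'Java': '1879', 'XML': '943', 'Text': '202'}},
    {'https://github.com/project2': {'XML': '234', 'Java': '1305'}},
]

# Readable preview before exporting.
pprint(projects)

# Sorted union of all language names gives a stable column order.
languages = sorted({lang for p in projects for locs in p.values() for lang in locs})

buffer = io.StringIO()  # stands in for open('projects.csv', 'w', newline='')
# restval fills in languages that a project does not have.
writer = csv.DictWriter(buffer, fieldnames=['url'] + languages, restval='N/A')
writer.writeheader()
for project in projects:
    for url, locs in project.items():
        writer.writerow({'url': url, **locs})

csv_text = buffer.getvalue()
print(csv_text)
```

Sorting the union of language names addresses the column-order concern, and restval plays the same role as fillna('N/A') in the pandas version.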
And in the csv:
