How to sort nested dict based on second value - python

I have a dict like this (more than 1000 records):
{"Device1": [["Device1", "TenGigabitEthernet1/0/12", "SHUT", "", "", "IDF03"], ["Device1", "TenGigabitEthernet1/0/11", "SHUT", "", "", "IDF03", "#f76f6f"]], "Device2": [["Device2", "TenGigabitEthernet1/0/12", "SHUT", "", "", "IDF03"], ["Device2", "TenGigabitEthernet1/0/11", "SHUT", "", "", "IDF03", "#f76f6f"]]}
The problem is, I don't know how to sort the dict based on the port name, which is the second element of each sublist (TenGigabitEthernet1/0/* or GigabitEthernet1/0/*).
I have the following code but it's not doing it right:
with open("data-dic.txt", 'r') as dic:
data = dic.read()
dataDic = json.loads(data)
dataDic = ast.literal_eval(json.dumps(dataDic))
d2 = OrderedDict({ k : dataDic[1] for k in natsorted(dataDic) })
print(d2)
This sorts by the keys (Device1, Device2, ...). How can I instead sort each value of the dict by the second element of its sublists, i.e. by the port names?

import pandas as pd
from itertools import chain

# data is the dict from the question
header = ['Name', 'Connection', 'Type', 'Col_3', 'Col_4', 'ID', 'Color']
df = pd.DataFrame(chain(*data.values()), columns=header).fillna('')
print(df)
This looks like:
      Name                Connection  Type Col_3 Col_4     ID    Color
0  Device1  TenGigabitEthernet1/0/12  SHUT              IDF03
1  Device1  TenGigabitEthernet1/0/11  SHUT              IDF03  #f76f6f
2  Device2  TenGigabitEthernet1/0/12  SHUT              IDF03
3  Device2  TenGigabitEthernet1/0/11  SHUT              IDF03  #f76f6f
Overkill for this issue... but if you are going to be doing other manipulation of this data, you may want to consider pandas.
df['Port'] = df.Connection.str.extract(r'.*/(.*)', expand=False).astype(int)
out = (df.sort_values('Port')
         .drop('Port', axis=1)
         .groupby('Name', sort=False)
         .apply(lambda x: x.apply(list, axis=1).tolist())
         .to_dict())
print(out)
Output:
{'Device1': [['Device1', 'TenGigabitEthernet1/0/11', 'SHUT', '', '', 'IDF03', '#f76f6f'], ['Device1', 'TenGigabitEthernet1/0/12', 'SHUT', '', '', 'IDF03', '']],
'Device2': [['Device2', 'TenGigabitEthernet1/0/11', 'SHUT', '', '', 'IDF03', '#f76f6f'], ['Device2', 'TenGigabitEthernet1/0/12', 'SHUT', '', '', 'IDF03', '']]}

Use a dict comprehension and sort each value's sublists by the port number (converted to int so that, for example, port 9 sorts before port 12):
output = {k: sorted(v, key=lambda x: int(x[1].rsplit("/", 1)[-1])) for k, v in data.items()}
print(output)
{'Device1': [['Device1', 'TenGigabitEthernet1/0/11', 'SHUT', '', '', 'IDF03', '#f76f6f'], ['Device1', 'TenGigabitEthernet1/0/12', 'SHUT', '', '', 'IDF03']], 'Device2': [['Device2', 'TenGigabitEthernet1/0/11', 'SHUT', '', '', 'IDF03', '#f76f6f'], ['Device2', 'TenGigabitEthernet1/0/12', 'SHUT', '', '', 'IDF03']]}

You can do this:
import json
from collections import OrderedDict

with open("data-dic.txt", 'r') as dic:
    data = json.load(dic)

d2 = OrderedDict(sorted(data.items(), key=lambda x: int(x[1][0][1].rsplit('/', 1)[-1])))
print(d2)
This sorts the devices by value -> first row -> second element -> rsplit('/', 1)[-1], i.e. by the port number in each device's first row.
NB: the line dataDic = ast.literal_eval(json.dumps(dataDic)) does absolutely nothing: it converts dict -> str -> dict. It is also not good practice to use ast.literal_eval to parse JSON; the json module already does that.
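Since the question already imports natsorted, a natural-sort variant is another option; a minimal sketch, assuming dataDic is the dict loaded above:

from natsort import natsorted

# Sort each device's rows by interface name using natural ordering,
# so e.g. ...1/0/9 comes before ...1/0/12.
output = {k: natsorted(v, key=lambda row: row[1]) for k, v in dataDic.items()}
print(output)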

Related

Pandas dataframe, get the row number for a column meeting certain conditions

This is similar to a question I asked before, but I needed to make some changes to my condition statement.
I have the below output that I make a DataFrame from. I then check each row for whether Status is empty and comment is not empty. Next, I want to get the row numbers of the Status column that meet those conditions:
output = [['table_name', 'schema_name', 'column_name', 'data_type', 'null?', 'default', 'kind', 'expression', 'comment', 'database_name', 'autoincrement', 'Status'], ['ACCOUNT', 'SO', '_LOAD_DATETIME', '{"type":"TIMESTAMP_LTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', 'date and time when table was loaded', 'VICE_DEV'], ['ACCOUNT', 'SO', '_LOAD_FILENAME', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'DEV'], ['ACCOUNT', 'SO', '_LOAD_FILE_ROW_NUMBER', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'DEV']]
import pandas as pd

df = pd.DataFrame(output)
df.columns = df.iloc[0]
df = df[1:]
query_list = []
for index, row in df.iterrows():
    if row['Status'] is None and row['comment'] is not None and row['comment'] != '':
        empty_status = df[df['Status'].isnull()].index.tolist()
I've tried with the empty_status variable above, but I get:
empty_status_idx = [1, 2, 3]
when I would like it to be:
empty_status = [1]
because only the first row has a comment and an empty/null Status.
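A vectorized mask avoids iterrows entirely; a sketch, assuming the DataFrame is built as above (rows shorter than the header get None in the missing columns):

# Rows where Status is missing and comment is a non-empty string.
mask = df['Status'].isnull() & df['comment'].notna() & (df['comment'] != '')
empty_status = df.index[mask].tolist()
print(empty_status)  # [1] for the sample data above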

Match string value to dataframe value and add to string

I have a string of column names and their datatypes, called cols, below:
_LOAD_DATETIME datetime,
_LOAD_FILENAME string,
_LOAD_FILE_ROW_NUMBER int,
_LOAD_FILE_TIMESTAMP datetime,
ID int
Next I make a df from a gsheet I'm reading from the below:
import pandas as pd

output = [['table_name', 'schema_name', 'column_name', 'data_type', 'null?', 'default', 'kind', 'expression', 'comment', 'database_name', 'autoincrement', 'DateTime Comment Added'], ['ACCOUNT', 'SO', '_LOAD_DATETIME', '{"type":"TIMESTAMP_LTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', '', 'V'], ['ACCOUNT', 'SO', '_LOAD_FILENAME', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'VE'], ['B_ACCOUNT', 'SO', '_LOAD_FILE_ROW_NUMBER', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'V'], ['ACCOUNT', 'SO', '_LOAD_FILE_TIMESTAMP', '{"type":"TIMESTAMP_NTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', 'TEST', 'VE', '', '2022-02-16'], ['ACCOUNT', 'SO', 'ID', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":false,"fixed":false}', 'NOT_NULL', '', 'COLUMN', '', 'ID of Account', 'V', '', '2022-02-16'],]
df = pd.DataFrame(output)
df.columns = df.iloc[0]
df = df[1:]
last_2_days = '2022-02-15'
query_list = []
for index, row in df.iterrows():
    if row['comment'] is not None and row['comment'] != '' and (row['DateTime Comment Added'] >= last_2_days):
        comment_data = row['column_name'], row['comment']
        query_list.append(comment_data)
When I print out query_list it looks like this, which is the correct data, since I only want the column_name and comment when the DateTime Comment Added column is within the last 2 days of today:
[('_LOAD_FILE_TIMESTAMP', 'TEST'), ('ID', 'ID of Account')]
What I want to do next (and I'm having trouble figuring out how) is to take each comment from query_list and add it to the matching column name in my cols string, with the word COMMENT before the actual comment,
so cols should then look like this:
_LOAD_DATETIME datetime,
_LOAD_FILENAME string,
_LOAD_FILE_ROW_NUMBER int,
_LOAD_FILE_TIMESTAMP datetime COMMENT 'TEST',
ID int COMMENT 'ID of Account'
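There is no answer inline here; one way to do it is to index the comments by column name and rebuild cols line by line. A sketch, assuming cols is the multi-line string above, every line starts with the column name, and add_comments is a hypothetical helper name:

comments = dict(query_list)

def add_comments(cols, comments):
    out = []
    for line in cols.splitlines():
        body = line.rstrip().rstrip(',')          # drop any trailing comma for now
        had_comma = line.rstrip().endswith(',')
        name = body.split()[0]                    # first token is the column name
        if name in comments:
            body += " COMMENT '{}'".format(comments[name])
        out.append(body + (',' if had_comma else ''))
    return '\n'.join(out)

print(add_comments(cols, comments))

This prints the desired result, with COMMENT 'TEST' and COMMENT 'ID of Account' appended to the matching lines.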

python replace \n from text

Here is a text file sample:
'15235457345', '', '\n\nR\n\nE\nM\nO\n\nV\nE\nD\n', '1445133666', 'nick', '', '1236500', 'git', '', '', '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', '\n\nB\n\nU\n\nT\n\nH\nO\nW\n', '1445123147'
I want to remove some pieces like
\n\nR\n\nE\nM\nO\n\nV\nE\nD\n
and
\n\nB\n\nU\n\nT\n\nH\nO\nW\n
I used REMOVED and BUTHOW here to illustrate the problem; in real practice these are other words, timestamps, etc.
le = ['15235457345', '', '\n\nR\n\nE\nM\nO\n\nV\nE\nD\n', '1445133666', 'nick', '', '1236500', 'git', '', '', '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', '\n\nB\n\nU\n\nT\n\nH\nO\nW\n', '1445123147']
print([value for value in le if '\n' not in value])
Output:
['15235457345', '', '1445133666', 'nick', '', '1236500', 'git', '', '', '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', '1445123147']
s = ['15235457345', '', '\n\nR\n\nE\nM\nO\n\nV\nE\nD\n', '1445133666', 'nick', '', '1236500', 'git', '', '', '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', '\n\nB\n\nU\n\nT\n\nH\nO\nW\n', '1445123147']
for item in s:
    print(item.replace('\n', ''))
Output:
15235457345
REMOVED
1445133666
nick
1236500
git
123face
2015-10-18
2015-10-23
name
great
sha
BUTHOW
1445123147
Hope this is what you are looking for.
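If the goal is instead to keep every field but strip the embedded newlines in one pass, a list comprehension also works; a minimal sketch using the same le list as above:

cleaned = [value.replace('\n', '') for value in le]
print(cleaned)
# ['15235457345', '', 'REMOVED', '1445133666', 'nick', '', '1236500', 'git', '', '',
#  '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', 'BUTHOW', '1445123147']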

Why does csv.DictReader skip empty lines?

It seems that csv.DictReader skips empty lines, even when restval is set. Using the following, empty lines in the input file are skipped:
import csv
import sys

CSV_FIELDS = ("field1", "field2", "field3")
for row in csv.DictReader(open("f"), fieldnames=CSV_FIELDS, restval=""):
    if not row or not row[CSV_FIELDS[0]]:
        sys.exit("never reached, why?")
Where file f is:
1,2,3


a,b,c
Inside the csv.DictReader class:
# unlike the basic reader, we prefer not to return blanks,
# because we will typically wind up with a dict full of None
# values
while row == []:
    row = self.reader.next()
So empty rows are skipped.
If you don't want to skip empty lines, you could instead use csv.reader.
Another option is to subclass csv.DictReader (Python 2 style, where the iterator method is next; in Python 3 it would be __next__):
import csv

CSV_FIELDS = ("field1", "field2", "field3")

class MyDictReader(csv.DictReader):
    def next(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = self.reader.next()
        self.line_num = self.reader.line_num
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d

for row in MyDictReader(open("f", 'rb'), fieldnames=CSV_FIELDS, restval=""):
    print(row)
yields
{'field2': '2', 'field3': '3', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}
Unutbu already pointed out the reason why this happens. Anyway, a quick fix is to replace empty lines with ',' before passing them to DictReader; restval will then take care of the rest:
import csv

CSV_FIELDS = ("field1", "field2", "field3")
with open('test.csv') as f:
    lines = (',' if line.isspace() else line for line in f)
    for row in csv.DictReader(lines, fieldnames=CSV_FIELDS, restval=""):
        print(row)
Output:
{'field2': '2', 'field3': '3', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}
Update:
In the case of multi-line (quoted) empty values the above code won't work; there you can use csv.reader directly:
import csv

RESTVAL = ''
CSV_FIELDS = ("field1", "field2", "field3")
with open('test.csv') as f:
    for row in csv.reader(f, quotechar='"'):
        if not row:
            # Don't use `dict.fromkeys` if RESTVAL is a mutable object;
            # use {k: RESTVAL for k in CSV_FIELDS} instead.
            print(dict.fromkeys(CSV_FIELDS, RESTVAL))
        else:
            print({k: v if v else RESTVAL for k, v in zip(CSV_FIELDS, row)})
If the file contains:
1,2,"


4"


a,b,c
then the output will be:
{'field2': '2', 'field3': '\n\n\n4', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}
This is your file:
1,2,3
,,
,,
a,b,c
I added commas, and now it reads the two empty lines as {'field2': '', 'field3': '', 'field1': ''}.
As for the restval argument: it only says that if you have set fields but some are missing from a row, the missing ones get this value. You set three fields and each row yields three values, so restval is about missing columns, not missing lines.
Your lines were empty, so DictReader skipped them; adding commas tells it those lines hold empty values.
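To see restval in action on a short row, here is a minimal, self-contained sketch:

import csv
import io

# A row with only two values for three fieldnames: restval fills field3.
reader = csv.DictReader(io.StringIO("1,2\n"),
                        fieldnames=("field1", "field2", "field3"),
                        restval="")
print(next(reader))  # {'field1': '1', 'field2': '2', 'field3': ''}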

How to know index of a decimal value in a python list

I have a list like the following
['UIS', '', '', '', '', '', '', '', '', '02/05/2014', 'N', '', '', '', '', '9:30:00', '', '', '', '', '', '', '', '', '31.8000', '', '', '', '', '', '', '3591', 'O', '', '', '', '', '0', '', '', '', '', '', '', '', '', '', '', '', '', '', '0']
Now, how can I tell which element here is a decimal? Basically I want to track the 31.8000 value in the list. Is that possible?
You can reliably check whether a string holds a floating point number by literal-evaluating it and testing whether the result is of type float, like this:
from ast import literal_eval

result = []
for item in data:
    temp = ""
    try:
        temp = literal_eval(item)
    except (SyntaxError, ValueError):
        pass
    if isinstance(temp, float):
        result.append(item)

print(result)
# ['31.8000']
If you want to get the indexes, just enumerate the data like this
for idx, item in enumerate(data):
    ...
    ...
and, while preparing the result, append the index instead of the actual element:
result.append(idx)
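Putting the two together, a complete sketch that yields the index of the float in the sample data:

from ast import literal_eval

result = []
for idx, item in enumerate(data):
    try:
        value = literal_eval(item)
    except (SyntaxError, ValueError):
        continue  # not a Python literal at all
    if isinstance(value, float):
        result.append(idx)

print(result)
# [24]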
Iterate over the list and check if float() succeeds:
floatables = []
for i, item in enumerate(data):
    try:
        float(item)
        floatables.append(i)
    except ValueError:
        pass

print(floatables)
Alternatively, if you want to match only the decimal format (note that float() also accepts plain integers such as '3591'), you can use
import re

decimals = []
for i, item in enumerate(data):
    if re.match(r"^\d+?\.\d+?$", item) is not None:
        decimals.append(i)

print(decimals)
Using a list comprehension and a regular expression match:
>>> import re
>>> [float(i) for i in x if re.match(r'^[+-]?\d+?[.]\d+$', i)]
[31.8]
If you want to track the indexes of the floats:
>>> [x.index(i) for i in x if re.match(r'[+-]?\d+?[.]\d+',i)]
[24]
import decimal

data = ['UIS', '', '', '', '', '', '', '', '', '02/05/2014', 'N', '', '', '', '', '9:30:00', '', '', '', '', '', '', '', '', '31.8000', '', '', '', '', '', '', '3591', 'O', '', '', '', '', '0', '', '', '', '', '', '', '', '', '', '', '', '', '', '0']

target = decimal.Decimal('31.8000')

def is_target(item):
    try:
        return decimal.Decimal(item) == target
    except decimal.InvalidOperation:
        return False

output = list(filter(is_target, data))
print(output)
