Make arrays the same length, append json string - python

I have a Python script that adds to a list:
column = defaultdict(list)
[...]
for line in out.splitlines():
    column[i + 1].append({"row": str(line)})
[...]
f = open(save_dir + 'table_data.json', "w+")
f.write(json.dumps(column))
f.close()
This ultimately generates a JSON file containing a string like the one below:
{ "1":[
{
"row":"Product/Descriptian"
}
],
"2":[
{
"row":"Qty/unit"
},
{
"row":"Text"
}
],
"3":[
{
"row":""
}
]}
As you can see, array["2"] has two values. I am trying to make all arrays the same length, so array["1"] and array["3"] should also end up with two values.
So in order to do this, I figure I have to find the longest array first:
longest_array = (max(map(len, column.values())))
This should return 2. Now I want to append an empty {"row":""} to the other arrays, to make them all the same length:
final = ([v + ["{'row'}: ''"] * (longest_array - len(v)) for v in column.values()])
Which outputs the JSON string below:
[
[
{
"row":"Product/Descriptian"
},
{
"row":""
}
],
[
{
"row":"Qty/unit"
},
{
"row":"Text"
}
],
[
{
"row":""
},
{
"row":""
}
]
]
This seems to work partially. However, I spot two errors in the newly created JSON string:
It seems to add another array around the first array. The JSON string now starts with [ [ {
It removes the "parent" arrays "1", "2" and "3"

The culprit is this line:
final = ([v + ["{'row'}: ''"] * (longest_array - len(v)) for v in column.values()])
It has two problems:
It's a list comprehension (instead of a dict comprehension): by iterating over column.values(), you lose all the keys, and the lists that were the values get "packed" into an outer (master) list
I'm not sure what you're trying to achieve with the double quotes (") in ["{'row'}: ''"]: that's a list with one string element, not a list containing a dict
To solve your problem, turn the above line into:
final = {k: v + [{'row': ''}] * (longest_array - len(v)) for k, v in column.items()}
and final will become the expected dictionary:
>>> column
defaultdict(<class 'list'>, {'1': [{'row': 'Product/Descriptian'}], '2': [{'row': 'Qty/unit'}, {'row': 'Text'}], '3': [{'row': ''}]})
>>>
>>> longest_array_len = max((len(v) for v in column.values()))
>>> longest_array_len
2
>>> final = {k: v + [{'row': ''}] * (longest_array_len - len(v)) for k, v in column.items()}
>>>
>>> final
{'1': [{'row': 'Product/Descriptian'}, {'row': ''}], '2': [{'row': 'Qty/unit'}, {'row': 'Text'}], '3': [{'row': ''}, {'row': ''}]}
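One detail worth knowing about the fix: [{'row': ''}] * n repeats a reference to the same dict n times. That is harmless as long as nothing mutates the padded entries, but if later code changes one of them, every padded slot in that list changes too. Below is a minimal sketch that builds a fresh dict per slot and serializes the result. The helper name pad_columns and the filler parameter are hypothetical, and the sample keys are strings as they appear after json.dumps.

```python
import json
from collections import defaultdict


def pad_columns(column, filler=""):
    """Return a new dict where every list is padded to the longest list's length."""
    if not column:
        return {}
    longest = max(len(v) for v in column.values())
    # A list comprehension creates a separate {"row": filler} dict for each
    # padding slot, so the padded entries are independent objects.
    return {
        k: v + [{"row": filler} for _ in range(longest - len(v))]
        for k, v in column.items()
    }


column = defaultdict(list)
column["1"].append({"row": "Product/Descriptian"})
column["2"].append({"row": "Qty/unit"})
column["2"].append({"row": "Text"})
column["3"].append({"row": ""})

final = pad_columns(column)
print(json.dumps(final))
```

To write it to disk instead of printing, json.dump(final, f) inside a with open(...) block writes the same JSON directly to the file object and closes the file automatically.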

Related

Optimal code to check and append an item in a list if present in a list of dictionaries

These are my inputs, pretty-printed for better readability
input1 = [{
"ID": "1",
"SequenceNum": 1
},
{
"ID": "2",
"SequenceNum": 2
},
{
"ID": "3",
"SequenceNum": 3
},
{
"ID": "4",
"SequenceNum": 4
}]
input2 = ['4', '1']
The values contained in input2 are basically the values of the 'ID' key seen in input1
The output will be a list of dictionaries where input1[index]['ID'] == input2[index_element].
Expected output -> [{"ID": "4","SequenceNum": 4},{"ID": "1","SequenceNum": 1}]
I have solved this using the following lines of code:
match_list = []
for idx, val in enumerate(input1):
    match_list.append(val['ID'])
return_list = []
for idx, val in enumerate(input2):
    if val in match_list:
        get_idx = match_list.index(val)
        return_list.append(input1[get_idx])
While it works, it doesn't feel like the most efficient or the cleanest way to write this. I apologize for the basic question; I am not a very experienced programmer.
IIUC, you could do:
s = set(input2)
res = [d for d in input1 if d["ID"] in s]
print(res)
Output
[{'ID': '1', 'SequenceNum': 1}, {'ID': '4', 'SequenceNum': 4}]
This has an expected linear complexity.
If the order with respect to input2 needs to be kept, you could do:
lookup = {d["ID"]: d for d in input1}
res = [lookup[i] for i in input2 if i in lookup]
print(res)
This also has an expected linear complexity.
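A self-contained comparison of the two approaches follows. The ID "9" in input2 is hypothetical and was added only to show that IDs missing from input1 are skipped. One behavioural difference to keep in mind: if input1 contained duplicate IDs, the lookup dict would keep only the last dict per ID, while the set-based comprehension keeps every match.

```python
input1 = [
    {"ID": "1", "SequenceNum": 1},
    {"ID": "2", "SequenceNum": 2},
    {"ID": "3", "SequenceNum": 3},
    {"ID": "4", "SequenceNum": 4},
]
# "9" is a hypothetical ID that does not occur in input1.
input2 = ["4", "9", "1"]

# Keeps input1's order; the set is built once, so each membership test is O(1) on average.
wanted = set(input2)
by_input1_order = [d for d in input1 if d["ID"] in wanted]

# Keeps input2's order; index input1 by ID once, then look each ID up.
# The `if i in lookup` clause silently drops unknown IDs such as "9".
lookup = {d["ID"]: d for d in input1}
by_input2_order = [lookup[i] for i in input2 if i in lookup]

print(by_input1_order)
print(by_input2_order)
```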

TypeError: string indices must be integers while extracting the keys

The dictionary is as follows:
my = {
"1": {
"first": 'A,B',
"column": "value",
"test":"test",
"output": "Out1",
"second": "Cost",
"Out2": "Rev"
},
"2": {
"first": 'None',
"column": "value",
"test":"test",
"output": "Out2",
"Out2": "Rev"
}
}
This is the code I tried:
{k:{l:l[i] for i in ['first','test'] for l,m in v.items()} for k,v in my.items()}
I am trying to extract only the two keys ['first','test']; there is also a chance that 'first' or 'test' does not exist.
I am getting
TypeError: string indices must be integers. What is the problem with my code?
Let's take one of the subdictionaries to understand what is going wrong here.
"1": {
"first": 'A,B',
"column": "value",
"test":"test",
"output": "Out1",
"second": "Cost",
"Out2": "Rev"
},
{k:{l:l[i] for i in ['first','test'] for l,m in v.items()} for k,v in my.items()}
The variable k in your code will be the key "1" and the value v will be the subdictionary.
In the inner comprehension, the loop for l, m in v.items() makes l each of the subdictionary's keys, which are strings such as "first" and "test". So when you write l:l[i], you are indexing the string l with the string i instead of an integer; the very first step is literally "first"["first"].
That is why you see a TypeError with the message "string indices must be integers".
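The error can be reproduced in isolation. This small sketch contrasts integer indexing of a string with string indexing; the exact wording of the TypeError message differs slightly between Python versions, so the code only checks that the exception is raised.

```python
key = "first"

# Indexing a string with an integer returns a single character.
print(key[0])  # f

# Indexing a string with another string is what the original comprehension
# ended up doing ("first"["first"]), and it raises TypeError.
try:
    key["first"]
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```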
If you want a one-liner, this should work (it assumes both keys exist in every subdictionary):
{
key: {sub_key:sub_dict[sub_key] for sub_key in ["first", "test"]}
for key, sub_dict in my.items()
}
Personally, I would write
selected_dict = dict()
for key, value in my.items():
    selected_dict[key] = dict()  # one nested dict per top-level key
    for sub_key in ["first", "test"]:
        selected_dict[key][sub_key] = value[sub_key]
Either of these works (both assume the keys exist):
print({k: {"first": v["first"], "test": v["test"]} for k, v in my.items()})
print({k: {i: v[i] for i in ['first','test']} for k, v in my.items()})
This works as expected
print({x: {i: y[i] for i in ['first', 'test']} for x, y in my.items()})
output
{'1': {'first': 'A,B', 'test': 'test'}, '2': {'first': 'None', 'test': 'test'}}
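None of the versions above handle the case mentioned in the question where 'first' or 'test' might be missing; they raise KeyError. Here is a minimal sketch of two ways to handle that, using the question's dictionary plus a hypothetical entry "3" that has no "test" key.

```python
my = {
    "1": {"first": "A,B", "column": "value", "test": "test",
          "output": "Out1", "second": "Cost", "Out2": "Rev"},
    "2": {"first": "None", "column": "value", "test": "test",
          "output": "Out2", "Out2": "Rev"},
    # Hypothetical extra entry with no "test" key, to show the missing-key case.
    "3": {"first": "C", "output": "Out3"},
}
wanted = ["first", "test"]

# Option 1: skip keys that are absent from a subdictionary.
only_present = {k: {w: v[w] for w in wanted if w in v} for k, v in my.items()}

# Option 2: keep every wanted key, filling missing ones with None via dict.get.
with_defaults = {k: {w: v.get(w) for w in wanted} for k, v in my.items()}

print(only_present)
print(with_defaults)
```

Which one fits depends on whether downstream code expects every key to be present; dict.get also accepts a second argument if you prefer a default other than None.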

Sorting a Dictionary by Nested Key

Consider a dict of the following form:
dic = {
"First": {
3: "Three"
},
"Second": {
1: "One"
},
"Third": {
2:"Two"
}
}
I would like to sort it by the keys of the nested dicts (3, 1, 2).
I tried using the lambda function in the following manner but it returns a "KeyError: 0"
dic = sorted(dic.items(), key=lambda x: x[1][0])
The expected output would be:
{
"Second": {
1: "One"
},
"Third": {
2: "Two"
},
"First": {
3:"Three"
}
}
In essence what I want to know is how to designate a nested key independently from the main dictionary key.
In the lambda function, x is a key-value pair, x[1] is the value, which is itself a dictionary. x[1].keys() is its keys, but it needs to be turned into a list if you want to get its one and only item by its index. Thus:
sorted(dic.items(), key = lambda x: list(x[1].keys())[0])
which evaluates to:
[('Second', {1: 'One'}), ('Third', {2: 'Two'}), ('First', {3: 'Three'})]
dic = {'First': {3: 'Three'}, 'Second': {1: 'One'}, 'Third': {2: 'Two'}}
sorted_list = sorted(dic.items(), key=lambda x:list(x[1].keys())[0])
sorted_dict = dict(sorted_list)
print(sorted_dict)
You need to get the keys of the nested dictionary first, convert them into a list, and sort on its first element. That gives you a sorted list of (key, value) pairs; all you need to do then is convert this list to a dictionary using dict(). I hope that helps. This snippet works in Python 3 (dicts preserve insertion order from Python 3.7 on).
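A slightly leaner variant of the same idea: next(iter(inner)) returns the first key of a dict without building an intermediate list. This sketch assumes every nested dict is non-empty (an empty one would raise StopIteration) and relies on dicts keeping insertion order, as in Python 3.7+.

```python
dic = {"First": {3: "Three"}, "Second": {1: "One"}, "Third": {2: "Two"}}

# kv is a (key, value) pair; kv[1] is the nested dict, and iterating a dict
# yields its keys, so next(iter(kv[1])) is the nested dict's first key.
sorted_dic = dict(sorted(dic.items(), key=lambda kv: next(iter(kv[1]))))
print(sorted_dic)
```

If a nested dict could hold several keys, key=lambda kv: min(kv[1]) would sort by the smallest nested key instead; that is an extension beyond what the question asked.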

Dictionary to JSON object conversion

I have 2 long lists (extracted from a CSV), both of the same length.
Example:
l1 = ['Apple','Tomato','Cocos'] #name of product
l2 = ['1','2','3'] #some id's
I made my dictionary with this method:
from collections import defaultdict
d = defaultdict(list)
for x in l1:
    d['Product'].append(x)
for y in l2:
    d['Plu'].append(y)
print(dict(d))
This will output:
{'Product': ['Apple', 'Tomato', 'Cocos'], 'Plu': ['1', '2', '3']}
(Product and Plu are my wanted keys)
Now I've tried to convert this into a JavaScript object like this:
import json
print(json.dumps(d, sort_keys=True, indent=4))
This will output:
{
"Plu": [
"1",
"2",
"3"
],
"Product": [
"Apple",
"Tomato",
"Cocos"
]
}
But my desired output is this:
{
Product:'Apple',
Plu:'1'
},
{
Product:'Tomato',
Plu:'2'
},
{
Product:'Cocos',
Plu:'3'
}
I will later use that to insert values into MongoDB. What will I have to change in my json.dumps call (or in my dict?) to get the desired output? Also, is there a way to save the output to a text file (since I will have a big code)?
Rather than using a defaultdict (which doesn't buy you anything in this case), you're better off zipping the lists and creating a dict from each pair:
[{'Product': product, 'Plu': plu} for product, plu in zip(l1, l2)]
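The answer above builds the records but does not cover the second part of the question, saving the output. The sketch below pairs the lists with zip, prints the JSON, and writes it with json.dump. io.StringIO stands in for a real file so the example has no side effects; products.json in the comment is a placeholder name.

```python
import io
import json

l1 = ['Apple', 'Tomato', 'Cocos']  # product names
l2 = ['1', '2', '3']               # PLU ids

# zip pairs the lists element by element (it stops at the shorter one).
records = [{'Product': product, 'Plu': plu} for product, plu in zip(l1, l2)]

# json.dumps returns the text; the top level is a JSON array of objects.
print(json.dumps(records, indent=4))

# json.dump writes straight to a file object. In practice you would use:
#     with open('products.json', 'w') as f:
#         json.dump(records, f, indent=4)
buffer = io.StringIO()
json.dump(records, buffer, indent=4)
saved_text = buffer.getvalue()
```

For the MongoDB step, note that a driver such as pymongo works with the Python list of dicts itself (for example Collection.insert_many(records)), so the JSON text is only needed if you want a file on disk.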

Remove items from list of dicts with a matching attribute

I have a list of dicts, for instance:
data = [
{ 'id': 1 },
{ 'id': 2 },
{ 'id': 3 },
{ 'id': 4 },
{ 'id': 5 },
]
remove_ids = [3,4]
So I'd like to apply remove_ids to list and end up with only:
list = [
{ 'id': 1 },
{ 'id': 2 },
{ 'id': 5 },
]
I was thinking something along the lines of:
data.remove([item (if item['id'] in remove_ids) for k, item in data])
Obviously this doesn't work, but I'm interested to know whether I was even close. I was also interested to see if this is even possible in a single line.
data = [d for d in data if d['id'] not in remove_ids]
new_data=[x for x in data if x['id'] not in remove_ids]
You could use filter. In Python 3, filter returns a lazy iterator, so wrap it in list() if you need a list:
remove_ids = (3, 4)
filtered_data = list(filter(lambda item: item['id'] not in remove_ids, data))
You could also use itertools.filterfalse, which keeps the items for which the predicate is false (in Python 2 it was called ifilterfalse):
from itertools import filterfalse
remove_ids = (3, 4)
filtered_data = tuple(filterfalse(lambda item: item['id'] in remove_ids, data))
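All of the answers above build a new list. If other variables already refer to the original list and should see the change, slice assignment filters it in place. This sketch also turns remove_ids into a set, which keeps each membership test O(1) on average when the list of IDs to remove is long; the alias variable exists only to demonstrate that the same list object is updated.

```python
data = [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}, {'id': 5}]
remove_ids = {3, 4}  # a set gives fast average-case membership tests

alias = data  # a second name for the same list object

# Slice assignment replaces the contents of the existing list rather than
# rebinding `data` to a new list, so `alias` sees the filtered result too.
data[:] = [d for d in data if d['id'] not in remove_ids]

print(data)
print(alias is data)
```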
