Compare two JSON files and output the difference - Python

I am trying to compare two JSON files and output the list of r_id values that are present in file a but not in file b.
These are the JSON files I am trying to compare:
File a =
{"r_id":"123","RefNumber":"2341234131","amount":"22.99"},
{"r_id":"345","RefNumber":"2341234131","amount":"22.99"},
{"r_id":"678","RefNumber":"2341234131","amount":"22.99"}
File b =
{"name" : "James", "id" : "123", "class" : "1A"},
{"name" : "Sam", "id" : "345", "class" : "1A"},
{"name" : "Jen", "id" : "005", "class" : "1A"}
The comparison should be based on the ids in both files. I expect the following output in the difference file:
{"r_id":"678","RefNumber":"2341234131","amount":"22.99"}

This works even if the ids are not in the same order and the two files don't contain the same number of items.
import json
with open("json_a.json","r") as first, open("json_b.json","r") as second:
    # object_pairs_hook receives each object's (key, value) pairs in file order:
    # keep the r_id pair from file a and the id pair from file b
    b = json.load(first, object_pairs_hook=lambda x: x[0])
    c = json.load(second, object_pairs_hook=lambda x: x[1])
b = [_[1] for _ in b]
c = [_[1] for _ in c]
with open("json_a.json","r") as first:
    for each_line in json.load(first):
        for uniq_id in list(set(b).difference(c)):
            if each_line['r_id'] == uniq_id:
                print(each_line)
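The approach above depends on what object_pairs_hook receives: a list of (key, value) tuples for each JSON object, in the order the keys appear in the file. A minimal sketch using one record from each file shows why x[0] picks the r_id pair and x[1] picks the id pair; it also shows the catch, namely that the trick only works while r_id is the first key in file a and id is the second key in file b.

```python
import json

# object_pairs_hook is called once per JSON object with its (key, value) pairs in file order;
# returning the list unchanged lets us inspect what the hook sees
pairs_a = json.loads('{"r_id": "123", "RefNumber": "2341234131", "amount": "22.99"}',
                     object_pairs_hook=lambda pairs: pairs)
pairs_b = json.loads('{"name": "James", "id": "123", "class": "1A"}',
                     object_pairs_hook=lambda pairs: pairs)

print(pairs_a[0])  # ('r_id', '123'): the first pair of the record from file a
print(pairs_b[1])  # ('id', '123'): the second pair of the record from file b
```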
Another approach:
import json
with open("json_a.json","r") as first, open("json_b.json","r") as second:
    b = json.load(first)
    c = json.load(second)
b_ids = [x['r_id'] for x in b]
c_ids = [x['id'] for x in c]
for each_item in b:
    for uniq_id in list(set(b_ids).difference(c_ids)):
        if each_item['r_id'] == uniq_id:
            print(each_item)
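Write to file:
# Collect every record whose r_id is missing from file b
# (each_item on its own only holds the last record the loop visited)
diff_ids = set(b_ids).difference(c_ids)
diff_items = [item for item in b if item['r_id'] in diff_ids]
# Serialize the list as JSON and write it to sample.json
with open("sample.json", "w") as outfile:
    json.dump(diff_items, outfile, indent=4)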

Try this code:
import json
a = ['{"r_id":"123","RefNumber":"2341234131","amount":"22.99"}',
     '{"r_id":"345","RefNumber":"2341234131","amount":"22.99"}',
     '{"r_id":"678","RefNumber":"2341234131","amount":"22.99"}'
     ]
b = ['{"name" : "James", "id" : "123", "class" : "1A"}',
     '{"name" : "Sam", "id" : "345", "class" : "1A"}',
     '{"name" : "Jen", "id" : "005", "class" : "1A"}'
     ]
# Records are compared position by position, so both lists must
# have the same length and list their ids in the same order
for i in range(len(a)):
    y = json.loads(a[i])
    z = json.loads(b[i])
    if y["r_id"] != z["id"]:
        print(a[i])
Output:
{"r_id":"678","RefNumber":"2341234131","amount":"22.99"}
Before working with the JSON files, each file should be a JSON array in the following format:
[{"r_id":"123","RefNumber":"2341234131","amount":"22.99"},
{"r_id":"345","RefNumber":"2341234131","amount":"22.99"},
{"r_id":"678","RefNumber":"2341234131","amount":"22.99"}
]
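If the files can't be changed, a minimal sketch, assuming each file holds comma-separated objects exactly as shown in the question, is to add the brackets in code before parsing. With a real file, the same wrapping would be applied to the result of f.read().

```python
import json

# Contents of file a as shown in the question: comma-separated objects with no enclosing [ ]
raw_a = '''{"r_id":"123","RefNumber":"2341234131","amount":"22.99"},
{"r_id":"345","RefNumber":"2341234131","amount":"22.99"},
{"r_id":"678","RefNumber":"2341234131","amount":"22.99"}'''

# Wrapping the text in brackets turns it into a valid JSON array
records_a = json.loads("[" + raw_a + "]")
print(len(records_a), records_a[-1]["r_id"])
```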
Then try this code (with files):
import json
with open('file1.json','r') as a:
    data1 = a.read()
    obj1 = json.loads(data1)
with open('file2.json','r') as a:
    data2 = a.read()
    obj2 = json.loads(data2)
# Same positional comparison as above, with both arrays read from disk
count = 0
for i in obj1:
    a = obj2[count]
    if i["r_id"] != a["id"]:
        print(i)
    count = count + 1
Output is same as above.
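The positional answers only work when both files list their ids in the same order. A minimal sketch that does not depend on order or length builds a set of the ids in file b once and keeps every record of file a whose r_id is not in it, then writes the result to its own file. The function name records_missing_from, the file name difference.json and the shortened sample records are all hypothetical, chosen for illustration.

```python
import json
import os
import tempfile


def records_missing_from(path_a, path_b):
    """Return the records of file a whose r_id does not appear as an id in file b."""
    with open(path_a) as fa, open(path_b) as fb:
        records_a = json.load(fa)
        records_b = json.load(fb)
    # Build the set of ids once, so each membership test is a constant-time lookup
    ids_b = {rec["id"] for rec in records_b}
    # A list comprehension keeps the original order of file a
    return [rec for rec in records_a if rec["r_id"] not in ids_b]


# Demo with temporary files; the order and length of the two files differ on purpose
tmp = tempfile.mkdtemp()
path_a = os.path.join(tmp, "json_a.json")
path_b = os.path.join(tmp, "json_b.json")
with open(path_a, "w") as f:
    json.dump([{"r_id": "678", "amount": "22.99"},
               {"r_id": "123", "amount": "22.99"},
               {"r_id": "345", "amount": "22.99"}], f)
with open(path_b, "w") as f:
    json.dump([{"id": "345"}, {"id": "123"}], f)

diff = records_missing_from(path_a, path_b)
# Write the difference to its own file as a JSON array
with open(os.path.join(tmp, "difference.json"), "w") as out:
    json.dump(diff, out, indent=4)
print(diff)
```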

Related

Dynamic list creation and appending values - Python

I have input data parsed from JSON, and printing it by key (tablename, columnname, columnlength) gives output like this:
data = ('tablename', 'abc.xyz'),('tablename','abc.xyz'),('columnname', 'xxx'),('columnname', 'yyy'),('columnlen', 55)
data[0] =
abc.xyz
abc.xyz
abc.xyz
data[1] =
xxx
yyy
zzz
data[2] =
20
30
60
data[0] represents tablename
data[1] represents columnname
data[2] represents column length
The code below creates the empty lists manually:
TableName_list = []
ColumnName_list = []
ColumnLen_list = []
for x in data:
    if x[0] == 'tablename':
        TableName_list.append(x[1])
    elif x[0] == 'columnname':
        ColumnName_list.append(x[1])
    elif x[0] == 'columnlen':
        ColumnLen_list.append(x[1])
I need to create the empty list for each field (tablename, columnname, columnlen) dynamically, append the data to it, and end up with the lists in a dictionary like this:
result = {'TableName': TableName_list, 'ColumnName': ColumnName_list, 'ColumnLen': ColumnLen_list}
This is probably most easily done with a defaultdict:
from collections import defaultdict
dd = defaultdict(list)
data = [
('tablename', 'abc.xyz'),('tablename','abc.xyz'),
('columnname', 'xxx'),('columnname', 'yyy'),
('columnlen', 55),('columnlen', 30)
]
for d in data:
    dd[d[0]].append(d[1])
Output:
defaultdict(<class 'list'>, {
'tablename': ['abc.xyz', 'abc.xyz'],
'columnname': ['xxx', 'yyy'],
'columnlen': [55, 30]
})
If the case of the names in the result is important, you could use a dictionary to translate the incoming names:
aliases = { 'tablename' : 'TableName', 'columnname' : 'ColumnName', 'columnlen' : 'ColumnLen' }
dd = defaultdict(list)  # start again from an empty defaultdict
for d in data:
    dd[aliases[d[0]]].append(d[1])
Output:
defaultdict(<class 'list'>, {
'TableName': ['abc.xyz', 'abc.xyz'],
'ColumnName': ['xxx', 'yyy'],
'ColumnLen': [55, 30]
})
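If a plain dict is needed, for example to match the output format in the question exactly or to stop missing keys from being created silently on lookup, a minimal sketch is to copy the defaultdict with dict(). Unpacking each tuple into key and value in the loop is only a readability choice.

```python
from collections import defaultdict

data = [
    ('tablename', 'abc.xyz'), ('tablename', 'abc.xyz'),
    ('columnname', 'xxx'), ('columnname', 'yyy'),
    ('columnlen', 55), ('columnlen', 30),
]
aliases = {'tablename': 'TableName', 'columnname': 'ColumnName', 'columnlen': 'ColumnLen'}

dd = defaultdict(list)
for key, value in data:
    dd[aliases[key]].append(value)

# dict() copies the entries into a plain dict, so looking up a missing key raises KeyError again
result = dict(dd)
print(result)
```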
I suggest building the dictionary directly, something like this:
out_dict = {}
for x in data:
    key = x[0]
    if key in out_dict:
        # list.append modifies the list in place and returns None,
        # so its result must not be assigned back to out_dict[key]
        out_dict[key].append(x[1])
    else:
        out_dict[key] = [x[1]]
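The if/else can also be collapsed with the built-in dict.setdefault, which returns the list already stored under a key or stores and returns a new empty one. A minimal sketch with the sample data:

```python
data = [
    ('tablename', 'abc.xyz'), ('tablename', 'abc.xyz'),
    ('columnname', 'xxx'), ('columnname', 'yyy'),
    ('columnlen', 55), ('columnlen', 30),
]

out_dict = {}
for key, value in data:
    # setdefault inserts an empty list the first time a key is seen, then returns the stored list
    out_dict.setdefault(key, []).append(value)

print(out_dict)
```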
Using pandas:
>>> import pandas as pd
>>> pd.DataFrame(data).groupby(0)[1].apply(list).to_dict()
{'columnlen': [55, 30],
 'columnname': ['xxx', 'yyy'],
 'tablename': ['abc.xyz', 'abc.xyz']}

Duplicate values in a dictionary

I am trying to read a CSV file in the following format:
number,alphabet
1,a
2,b
3,c
2,b
1,a
My code to create a dictionary:
import csv
alpha = open('alpha.csv','r')
csv_alpha = csv.reader(alpha)
alpha_file = {row[0]:row[1] for row in csv_alpha}
OUTPUT:
alpha_file = { 1:'a', 2:'b', 3:'c' }
Looking at the file, 1 and 2 have duplicate values.
How can I change my output to:
alpha_file = { 1:'a', 1:'a', 2:'b', 2:'b', 3:'c' }
A dictionary cannot hold the same key twice, so use a list to hold each key's values:
import csv
alpha = open('alpha.csv','r')
csv_alpha = csv.reader(alpha)
alpha_file = dict()
for row in csv_alpha:
    if row[0] in alpha_file:
        alpha_file[row[0]].append(row[1])
    else:
        alpha_file[row[0]] = [row[1]]
The output will look like:
{ 1:['a','a'],2:['b','b'], 3:['c'] }
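The csv module returns every field as a string and also yields the header row, so the loop above actually produces string keys such as '1' plus a 'number' entry. A minimal sketch that skips the header, converts the keys to int, and uses collections.defaultdict in place of the if/else; an in-memory string stands in for alpha.csv so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

# Stand-in for alpha.csv; with a real file use open('alpha.csv', newline='') instead
csv_text = "number,alphabet\n1,a\n2,b\n3,c\n2,b\n1,a\n"

alpha_file = defaultdict(list)
reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row "number,alphabet"
for number, letter in reader:
    # csv yields strings, so convert the key to int to get 1, 2, 3 rather than '1', '2', '3'
    alpha_file[int(number)].append(letter)

print(dict(alpha_file))
```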
To repeat each key once per stored value, use a for loop:
d = { 1:['a','a'],2:['b','b'], 3:['c'] }
amount = []
for key, value in d.items():
    amount += [key] * len(value)
print(amount)
The output looks like:
[1, 1, 2, 2, 3]
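The same expansion can be sketched with collections.Counter, whose elements() method repeats each key as many times as its count. This is a swapped-in alternative to the loop above, not what the answer itself uses.

```python
from collections import Counter

d = {1: ['a', 'a'], 2: ['b', 'b'], 3: ['c']}

# Count how many values each key holds
counts = Counter({key: len(values) for key, values in d.items()})
# elements() repeats each key as many times as its count, in insertion order
amount = list(counts.elements())
print(amount)  # [1, 1, 2, 2, 3]
```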

Adding to JSON in Python and converting to an object

I have the JSON array shown below:
[
"3D3iAR9M4HDETajfD79gs9BM8qhMSq5izX",
"35xfg4UnpEJeHDo55HNwJbr1V3G1ddCuVA"
]
I would like to pair each entry with a value (self.tx_amount_5) so that I get a JSON object like this:
{
"3D3iAR9M4HDETajfD79gs9BM8qhMSq5izX" : 100000,
"35xfg4UnpEJeHDo55HNwJbr1V3G1ddCuVA" : 100000
}
The code that generated the first JSON array is:
import csv
import json
from collections import defaultdict
from itertools import islice

import requests

r = requests.get('http://api.blockcypher.com/v1/btc/main/addrs/A/balance')
balance = r.json()['balance']
with open("Entries#x1.csv") as f, open("winningnumbers.csv") as nums:
    nums = set(map(str.rstrip, nums))
    r = csv.reader(f)
    results = defaultdict(list)
    for row in r:
        # Count how many of the row's numbers (every column after the first) are winning numbers
        results[sum(n in nums for n in islice(row, 1, None))].append(row[0])
self.number_matched_0 = results[0]
self.number_matched_1 = results[1]
self.number_matched_2 = results[2]
self.number_matched_3 = results[3]
self.number_matched_4 = results[4]
self.number_matched_5 = results[5]
self.number_matched_5_json = json.dumps(self.number_matched_5, sort_keys = True, indent = 4)
print(self.number_matched_5_json)
if len(self.number_matched_3) == 0:
    print('Nobody matched 3 numbers')
else:
    self.tx_amount_3 = int((balance*0.001)/ len(self.number_matched_3))
if len(self.number_matched_4) == 0:
    print('Nobody matched 4 numbers')
else:
    self.tx_amount_4 = int((balance*0.1)/ len(self.number_matched_4))
if len(self.number_matched_5) == 0:
    print('Nobody matched 5 numbers')
else:
    self.tx_amount_5 = int((balance*0.4)/ len(self.number_matched_5))
If I understand correctly, you can create the dictionary like this:
import json
s="""[
"3D3iAR9M4HDETajfD79gs9BM8qhMSq5izX",
"35xfg4UnpEJeHDo55HNwJbr1V3G1ddCuVA"
]"""
d = {el: self.tx_amount_5 for el in json.loads(s)}
print(d)
which produces (with self.tx_amount_5 equal to 100000)
{'3D3iAR9M4HDETajfD79gs9BM8qhMSq5izX': 100000,
'35xfg4UnpEJeHDo55HNwJbr1V3G1ddCuVA': 100000}
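Since the goal is a JSON object rather than a Python dict, the result still needs to go through json.dumps. A minimal sketch, with a literal 100000 standing in for self.tx_amount_5 (an assumption matching the desired output); dict.fromkeys is used here as an equivalent of the dict comprehension above.

```python
import json

addresses = json.loads('''[
    "3D3iAR9M4HDETajfD79gs9BM8qhMSq5izX",
    "35xfg4UnpEJeHDo55HNwJbr1V3G1ddCuVA"
]''')
tx_amount_5 = 100000  # hypothetical stand-in for self.tx_amount_5

# Every address gets the same amount; fromkeys is safe here because the shared value is an immutable int
payouts = dict.fromkeys(addresses, tx_amount_5)
# json.dumps turns the dict into a JSON object string
payouts_json = json.dumps(payouts, sort_keys=True, indent=4)
print(payouts_json)
```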

Loop is not working when I read a JSON file and a text file with Python

I have a JSON file with objects and a text file with several groups. Each group has 5 numbers, and I store them in lists so that list 1 holds the first number of each group, list 2 holds the second number of each group, and so on. I have to match each object in the JSON file with the corresponding group. The problem is that I only get the last element of the JSON file as a result. The groups from the text file are built correctly.
This is my code:
import json
NUM_LIST = 5
index = 0
def report(a, b, c, d, e, index):
    json_file = 'json_global.json'
    json_data = open(json_file)
    data = json.load(json_data)
    i = 0
    index = 0
    item = 0
    cmd = " "
    ind = 0
    for node in data:
        for i in range(0, 5):
            item = data[i]['item']
            cmd = data[i]['command']
            index+= 1
    print item, cmd, a, b, c, d, e
f = open("Output.txt", "r")
lines = [line.rstrip() for line in f if line != "\n"]
NUM_LISTS = 5
groups = [[] for i in range(NUM_LISTS)]
listIndex = 0
for line in lines:
    if "Transactions/Sec for Group" not in line:
        groups[listIndex].append(float(line))
        listIndex += 1
        if listIndex == NUM_LISTS:
            listIndex = 0
value0 = groups[0]
value1 = groups[1]
value2 = groups[2]
value3 = groups[3]
value4 = groups[4]
for i in range(0, 5):
    a = value0[i]
    b = value1[i]
    c = value2[i]
    d = value3[i]
    e = value4[i]
    i += 1
    report(a, b, c, d, e, index)
The JSON file looks like:
[
    {
        "item": 1,
        "command": "AA"
    },
    {
        "item": 2,
        "command": "BB"
    },
    {
        "item": 3,
        "command": "CC"
    },
    {
        "item": 4,
        "command": "DD"
    },
    {
        "item": 5,
        "command": "EE"
    }
]
The text file looks like this:
Transactions/Sec for Group = AA\CODE1\KK
1011.5032
2444.8864
2646.6893
2740.8531
2683.8178
Transactions/Sec for Group = BB\CODE1\KK
993.2360
2652.8784
3020.2740
2956.5260
3015.5910
Transactions/Sec for Group = CC\CODE1\KK
1179.5766
3271.5700
4588.2059
4174.6358
4452.6785
Transactions/Sec for Group = DD\CODE1\KK
1112.2567
3147.1466
4014.8404
3913.3806
3939.0626
Transactions/Sec for Group = EE\CODE1\KK
1205.8499
3364.8987
4401.1702
4747.4354
4765.7614
The logic in the body of the program works fine and the groups look correct, but instead of items 1 to 5 from the JSON file, every line shows item 5 with command EE. What should appear is items 1, 2, 3, 4 and 5, each with its own command.
My list 1 will have the numbers: 1011.5032, 993.2360, 1179.5766, 1112.2567, 1205.8499.
My list 2 will have the numbers: 2444.8864, 2652.8784, 3271.5700, 3147.1466,
I'm using Python 2.6.
Based on your explanation it's hard to tell what you're trying to do -- do you mean the nested loop below? The inner loop executes 5 times, but in every iteration it overwrites the previous values for item and cmd.
for node in data:
    for i in range(0, 5):
        item = data[i]['item']
        cmd = data[i]['command']
        index+= 1
Try printing the values each time the inner loop executes:
for node in data:
    for i in range(0, 5):
        item = data[i]['item']
        cmd = data[i]['command']
        print item, cmd
        index+= 1
I think this code is your problem:
for node in data:
    for i in range(0, 5):
        item = data[i]['item']
        cmd = data[i]['command']
Item will always be "5" and command will always be "EE" after this executes. Perhaps your indents are off for the code beneath it, and that code is supposed to be within the loop?
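Building on these answers, one way to pair each JSON object with its group is to walk both in the same loop, so each item's values are used before the next item overwrites them. Because groups[k] holds the k-th number of every group, zip(*groups) turns the columns back into one tuple per text-file group. A minimal sketch, trimmed to two items and two numbers per group; pair_items_with_groups is a hypothetical name, and the code avoids Python-3-only syntax so it should also run on 2.6.

```python
import json

# Hypothetical sample data mirroring the question (trimmed to two items)
data = json.loads('[{"item": 1, "command": "AA"}, {"item": 2, "command": "BB"}]')
# groups[k] holds the k-th number of every text-file group (column-wise), as in the question
groups = [[1011.5032, 993.2360], [2444.8864, 2652.8784]]


def pair_items_with_groups(data, groups):
    """Pair JSON object i with the numbers of text-file group i."""
    rows = []
    # zip(*groups) yields one tuple per text-file group; zip with data pairs it with object i
    for node, values in zip(data, zip(*groups)):
        rows.append((node["item"], node["command"]) + tuple(values))
    return rows


for row in pair_items_with_groups(data, groups):
    print(" ".join(str(v) for v in row))
```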

Scrapy with a nested array

I'm new to Scrapy and would like to understand how to scrape an object for output as nested JSON. Right now, I'm producing JSON that looks like this:
[
{'a' : 1,
'b' : '2',
'c' : 3},
]
And I'd like it more like this:
[
{ 'a' : '1',
'_junk' : {
'b' : 2,
'c' : 3}},
]
where I put some fields under _junk to post-process later.
The current code in the parse method of my scrapername.py is:
item['a'] = x
item['b'] = y
item['c'] = z
I thought this
item['a'] = x
item['_junk']['b'] = y
item['_junk']['c'] = z
might fix that, but I'm getting an error about the _junk key:
File "/usr/local/lib/python2.7/dist-packages/scrapy/item.py", line 49, in __getitem__
return self._values[key]
exceptions.KeyError: '_junk'
Does this mean I need to change my items.py somehow? Currently I have:
from scrapy.item import Item, Field

class Website(Item):
    a = Field()
    _junk = Field()
    b = Field()
    c = Field()
You need to create the _junk dictionary before storing values in it:
item['a'] = x
item['_junk'] = {}
item['_junk']['b'] = y
item['_junk']['c'] = z
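The nested dictionary can also be built in a single assignment. A minimal sketch using a plain dict as a stand-in for the Website item (an assumption so the example runs without Scrapy installed) and hypothetical values for x, y and z, showing the nested JSON that results:

```python
import json

x, y, z = 1, 2, 3  # hypothetical scraped values

# A plain dict stands in for the Scrapy Item here
item = {}
item['a'] = x
# Create the nested dict in one step instead of assigning into a key that does not exist yet
item['_junk'] = {'b': y, 'c': z}

print(json.dumps([item]))
```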
