I have two different files, each containing dictionaries (one JSON object per line). I am trying to loop through only the 'name' key and its value in the first file and match them against the second file. I seem to be getting the wrong output because it loops through both the 'name' and 'size' keys. I have looked at a few ways of doing this, but I don't want to convert my dictionaries to a set. I want to print either "Match" or "No Match". I have done the following so far:
import json

def compare_files():
    with open('new.json', 'r') as current_data_file, open('old.json', 'r') as pre_data_file:
        for current_data, previous_data in zip(current_data_file, pre_data_file):
            data_current = json.loads(current_data)
            data_previous = json.loads(previous_data)
            for key, value in data_current.items():
                if value not in data_previous:
                    print("No Match")
                else:
                    print("Match")
These are the two JSON files I am loading:
old.json
{"name": "d.json", "size": 1000}
{"name": "c.json", "size": 1000}
{"name": "b.json", "size": 1000}
new.json
{"name": "a.json", "size": 1000}
{"name": "b.json", "size": 1000}
{"name": "c.json", "size": 1000}
data_current is:
{u'size': 1000, u'name': u'a.json'}
{u'size': 1000, u'name': u'b.json'}
{u'size': 1000, u'name': u'c.json'}
data_previous is:
{u'size': 1000, u'name': u'd.json'}
{u'size': 1000, u'name': u'c.json'}
{u'size': 1000, u'name': u'b.json'}
Output:
No Match
No Match
No Match
No Match
No Match
No Match
My expected output is:
No Match
Match
Match
b.json and c.json exist in both files, but a.json and d.json do not.
To save yourself some trouble, you can read the data directly with pandas (a third-party library) and do the comparison in a couple of lines:
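import pandas as pd

# Each file holds one JSON object per line, so read it with lines=True
df = pd.read_json('new.json', lines=True)
df2 = pd.read_json('old.json', lines=True)
# isin() gives True/False per row; map() turns that into the labels
print(df['name'].isin(df2['name']).map({True: 'Match', False: 'No Match'}).tolist())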
Output
['No Match', 'Match', 'Match']
There are a couple of problems in your code.
When you write if value not in data_previous:, you are checking whether value is one of the keys of data_previous, not one of its values.
When you use zip(current_data_file, pre_data_file), you only compare the dictionaries that sit at the same position in the two files. Each of your 3 dictionaries has 2 keys, which is why you get 6 output lines instead of 3. In other words, you are comparing the data pair by pair, rather than comparing each dictionary in one file with all of the dictionaries in the other.
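Both points can be seen in isolation with a couple of hard-coded records shaped like the lines of old.json and new.json (the variable names here are just for illustration):

```python
# Two records shaped like one line of old.json / new.json
current = {"name": "b.json", "size": 1000}
previous = {"name": "b.json", "size": 1000}

# `in` on a dict tests the KEYS, so a value is never found this way
print("b.json" in previous)  # False
print("name" in previous)    # True

# To test a value, compare the field you care about explicitly
print(current["name"] == previous["name"])  # True

# zip() pairs items by position only: item 1 with item 1, item 2 with item 2
new_names = ["a.json", "b.json", "c.json"]
old_names = ["d.json", "c.json", "b.json"]
pairs = list(zip(new_names, old_names))
print(pairs)  # [('a.json', 'd.json'), ('b.json', 'c.json'), ('c.json', 'b.json')]
```

None of the zipped pairs has two equal names, which is why the original loop never prints "Match".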
Here's some sample code:
import json

def compare_files():
    with open('new.json', 'r') as current_data_file, open('old.json', 'r') as pre_data_file:
        # load both data sets, one JSON object per line
        data_currents = [json.loads(line) for line in current_data_file]
        data_previous = [json.loads(line) for line in pre_data_file]
    # store the previous names for convenient lookup
    pre_names = {data["name"] for data in data_previous}
    # loop through all current data for matching names
    for data in data_currents:
        print("Match" if data["name"] in pre_names else "No Match")
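Because pre_names is a set of names (the dictionaries themselves are not converted), the same idea also answers which names are in both files and which exist in only one of them. A minimal sketch, assuming the file contents are given as in-memory lists of JSON lines so it runs without the files:

```python
import json

# Hypothetical in-memory stand-ins for the lines of new.json and old.json
new_lines = ['{"name": "a.json", "size": 1000}',
             '{"name": "b.json", "size": 1000}',
             '{"name": "c.json", "size": 1000}']
old_lines = ['{"name": "d.json", "size": 1000}',
             '{"name": "c.json", "size": 1000}',
             '{"name": "b.json", "size": 1000}']

# Only the "name" values go into the sets
current_names = {json.loads(line)["name"] for line in new_lines}
previous_names = {json.loads(line)["name"] for line in old_lines}

in_both = sorted(current_names & previous_names)
only_new = sorted(current_names - previous_names)
only_old = sorted(previous_names - current_names)
print(in_both)   # ['b.json', 'c.json']
print(only_new)  # ['a.json']
print(only_old)  # ['d.json']
```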
For each "current" item you have to compare it with all "previous" items, not just with the one in the same position (which is what zip gives you).
# new.json (current) and old.json (previous), as in the question
data_current = [{"name": "a.json", "size": 1000},
                {"name": "b.json", "size": 1000},
                {"name": "c.json", "size": 1000}]
data_previous = [{"name": "d.json", "size": 1000},
                 {"name": "c.json", "size": 1000},
                 {"name": "b.json", "size": 1000}]
for current in data_current:
    result = "No Match"
    for previous in data_previous:
        if current["name"] == previous["name"]:
            result = "Match"
            break  # no need to keep searching once a match is found
    print(result)
EDIT: If you want to check the items of current against previous and also previous against current, you could do the following (I've added some text to the prints to clarify what is happening):
checks_to_run = [
    {
        "from": data_current,
        "from_name": "current",  # Added for transparency
        "against": data_previous,
        "against_name": "previous",  # Added for transparency
    },
    {
        "from": data_previous,
        "from_name": "previous",  # Added for transparency
        "against": data_current,
        "against_name": "current",  # Added for transparency
    },
]
for check_to_run in checks_to_run:
    for check_from in check_to_run["from"]:
        result = "No Match"
        for check_against in check_to_run["against"]:
            if check_from["name"] == check_against["name"]:
                result = "Match"
        print("result for item {} from {} compared to items in {}: {}".format(
            check_from["name"],
            check_to_run["from_name"],
            check_to_run["against_name"],
            result))
Related
I'm pretty new to programming. How would I select a certain part of a JSON file and display its value? For example:
{
"name": "John",
"age": 18,
"state": "New York"
}
What would be the code in Python needed to get the value of any of the items by giving it the keyword? (i.e. I give it "name" and the output displays "John")
Put the key in square brackets.
import json

myJSON = json.loads('{"name": "John", "age": 18, "state": "New York"}')  # that JSON, parsed into a dict
print(myJSON["name"])   # prints John
print(myJSON["age"])    # prints 18
print(myJSON["state"])  # prints New York
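If the keyword comes from the user, it may not exist in the data, and square brackets then raise a KeyError. A small sketch using the standard dict.get method, which returns None (or a default you pass) instead; lookup is a hypothetical helper name:

```python
import json

# The example object from the question, as a JSON string
text = '{"name": "John", "age": 18, "state": "New York"}'
person = json.loads(text)  # now a regular Python dict

def lookup(data, key):
    """Return the value stored under `key`, or None if the key is absent."""
    return data.get(key)

print(lookup(person, "name"))    # John
print(lookup(person, "age"))     # 18
print(lookup(person, "height"))  # None instead of a KeyError
```

To read the same object from a file instead of a string, open the file and pass the file object to json.load rather than json.loads.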
Hi, I'm trying to solve a problem. I have a JSON list of products, for example:
[{
"id": 5677240,
"name": "Cønjuntø de Pænelæs æntiæderentes ¢øm 05 Peçæs Pæris",
"quantity": 21,
"price": "192.84",
"category": "Panelas"
},
{
"id": 9628920,
"name": "Lava & Seca 10,2 Kg Sæmsung E¢ø ßußßle ßræn¢æ ¢øm 09 Prøgræmæs de Lævægem",
"quantity": 57,
"price": 3719.70,
"category": "Eletrodomésticos"
}]
But I basically need "price" to be a float, like in the second product. I have a large list of these products.
(Ignore the weird characters; I managed to fix them with help from a teacher.) I converted them to a Python object using this:
import json

with open('br2.json', 'r', encoding='utf8') as json_data:
    data = json.load(json_data)
I've tried something like this but it doesn't work
for product in data:
    product["price"] = product["price"].replace(",", "")
I want to convert the price values that are strings (in quotes) to floats.
Thanks in advance, and sorry, I'm new to Python so I don't understand much.
You can convert a string to a float with float(), which also accepts values that are already numbers. So instead of your replace line, try:
product['price'] = float(product['price'])
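If some prices also contain a comma, float() alone raises a ValueError. A sketch of the whole loop, assuming (as the replace(",", "") attempt suggests) that a comma is a thousands separator; the third product below is a hypothetical extra entry added to show that case:

```python
import json

# Sample shaped like the question's data; the "1,234.50" entry is hypothetical
data = json.loads('''[
  {"id": 5677240, "price": "192.84"},
  {"id": 9628920, "price": 3719.70},
  {"id": 1, "price": "1,234.50"}
]''')

for product in data:
    price = product["price"]
    if isinstance(price, str):
        # Assumption: a comma is a thousands separator, as in "1,234.50"
        price = price.replace(",", "")
    product["price"] = float(price)  # float() also accepts an existing number

prices = [p["price"] for p in data]
print(prices)  # [192.84, 3719.7, 1234.5]
```

If a comma is a decimal separator instead (as in "192,84"), it would have to be replaced with "." rather than removed.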
I have downloaded 5MB of a very large json file. From this, I need to be able to load that 5MB to generate a preview of the json file. However, the file will probably be incomplete. Here's an example of what it may look like:
[{
"first": "bob",
"address": {
"street": 13301,
"zip": 1920
}
}, {
"first": "sarah",
"address": {
"street": 13301,
"zip": 1920
}
}, {"first" : "tom"
From here, I'd like to "rebuild it" so that it can parse the first two objects (and ignore the third).
Is there a json parser that can infer or cut off the end of the string to make it parsable? Or perhaps to 'stream' the parsing of the json array, so that when it fails on the last object, I can exit the loop? If not, how could the above be accomplished?
If your data will always look somewhat similar, you could do something like this:
import json
json_string = """[{
"first": "bob",
"address": {
"street": 13301,
"zip": 1920
}
}, {
"first": "sarah",
"address": {
"street": 13301,
"zip": 1920
}
}, {"first" : "tom"
"""
while True:
    if not json_string:
        raise ValueError("Couldn't fix JSON")
    try:
        data = json.loads(json_string + "]")
    except json.decoder.JSONDecodeError:
        json_string = json_string[:-1]
        continue
    break
print(data)
This assumes that the data is a list of dicts. On each iteration a closing ] is appended and the result is parsed. If the new string can be interpreted as JSON, the infinite loop breaks; otherwise the last character is removed and the loop tries again. If there are no characters left, ValueError("Couldn't fix JSON") is raised.
For the above example, it prints:
[{'first': 'bob', 'address': {'street': 13301, 'zip': 1920}}, {'first': 'sarah', 'address': {'street': 13301, 'zip': 1920}}]
For the specific structure in the example, we can walk through the string and track the opening curly brackets and their closing counterparts. If one or more curly brackets remain unmatched at the end, that indicates an incomplete object. We can then cut the string at the first unmatched bracket, strip any trailing characters such as commas or whitespace, and close the result with a square bracket. (This simple scan assumes that no string value contains a curly bracket.)
This method parses the string only twice, once manually and once with the JSON parser, which might be advantageous for large text files (especially when the incomplete object consists of many characters).
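def close_truncated_array(string):
    # Positions of '{' that have not been closed yet
    brackets = []
    for i, c in enumerate(string):
        if c == '{':
            brackets.append(i)
        elif c == '}':
            brackets.pop()
    if brackets:
        # Cut at the outermost unclosed object and drop trailing separators
        string = string[:brackets[0]].rstrip(', \n')
    if not string.endswith(']'):
        string += ']'
    return string

# usage: data = json.loads(close_truncated_array(json_string))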
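The question also asks about "streaming" the array and stopping at the first broken element. A minimal standard-library sketch of that idea: json.JSONDecoder.raw_decode parses one JSON value starting at a given index and returns it together with the index where it ended, so the elements can be decoded one after another until one fails. parse_complete_items is a hypothetical helper name, and the sketch assumes the top-level array holds objects or arrays (a bare number cut from 1234 to 12 would still parse).

```python
import json

def parse_complete_items(text):
    """Return every complete top-level element of a truncated JSON array."""
    decoder = json.JSONDecoder()
    items = []
    pos = text.index("[") + 1  # start right after the opening bracket
    while True:
        # Skip whitespace and the comma separating elements
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text) or text[pos] == "]":
            break  # end of the data or a properly closed array
        try:
            # raw_decode returns the parsed value and the index where it ended
            item, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # this element was cut off, keep what we have
        items.append(item)
    return items


truncated = '''[{"first": "bob", "address": {"street": 13301, "zip": 1920}},
 {"first": "sarah", "address": {"street": 13301, "zip": 1920}},
 {"first" : "tom"'''
result = parse_complete_items(truncated)
print(result)
```

Unlike the character-by-character approach, each complete element is parsed exactly once, and nothing after the first broken element is examined.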
I am using VBA to export user input data from Excel to Python. The data entries that cause me problems look like this (they are in file.json):
[{"ExclusionList_var": [
[
"Exclusion List",
"Name",
"Rank",
"Rec",
"Min",
"Max",
"Relative",
"Active",
"Date",
"Dupe?",
"Comment"
],
[
"6076146",
,
,
,
-0.002,
0.002,
"Y",
,
null,
"",
"Mega Structure"
]]}]
It looks like the missing values are causing the problem. In Python I just do this:
import json

with open("file.json") as json_file:
    data = json.load(json_file)
Is there a default way to detect those values and set them to " " or null? I cannot do this on the JSON creation side, as this is direct user input.
EDIT: There is not really an easy way to "fix" the JSON before loading it. Is there a way to create it properly on the creation side?
There is a problem with the JSON block that causes errors when converting the string to JSON.
The lines that are just commas (,) are not valid JSON. If you need empty values, I would suggest using an empty string ("") or null.
The brackets and braces at the bottom do close in the same order as they are opened at the top, so the empty entries are the only thing to fix. Here is a working JSON object:
[
{
"ExclusionList_var": [
[
"Exclusion List",
"Name",
"Rank",
"Rec",
"Min",
"Max",
"Relative",
"Active",
"Date",
"Dupe?",
"Comment"
],
[
"6076146",
"",
"",
"",
-0.002,
0.002,
"Y",
"",
null,
"",
"Mega Structure"
]
]
}
]
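If the JSON cannot be produced correctly on the VBA side, one option is to patch the empty array slots before parsing. A minimal sketch, not a general JSON repair tool: fill_missing_array_values is a hypothetical helper that walks the text, skips over string literals (so commas inside strings are untouched), and inserts null wherever an array element is missing. It only handles arrays, and a trailing comma such as [1,] becomes [1,null].

```python
import json

def fill_missing_array_values(text):
    """Insert null where an array element is missing, e.g. [1,,2] -> [1,null,2]."""
    out = []
    in_string = False
    escaped = False
    prev = ""  # last significant character seen outside a string
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
                prev = '"'
            continue
        if ch.isspace():
            out.append(ch)
            continue
        # "[," ",," or ",]" means an element is missing; "[]" is a valid empty array
        if ch in ",]" and prev in ("[", ",") and not (ch == "]" and prev == "["):
            out.append("null")
        out.append(ch)
        if ch == '"':
            in_string = True
        prev = ch
    return "".join(out)


raw = '[["6076146", , , , -0.002, 0.002, "Y", , null, "", "Mega, Structure"]]'
data = json.loads(fill_missing_array_values(raw))
print(data)
```

In the question's code this would mean reading the file as text first, for example json.loads(fill_missing_array_values(json_file.read())), instead of calling json.load directly.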
I'm trying to iterate over a JSON list to print out all of the results of the following:
"examples": [
{
"text": "carry all of the blame"
},
{
"text": "she left all her money to him"
},
{
"text": "we all have different needs"
},
{
"text": "he slept all day"
},
{
"text": "all the people I met"
},
{
"text": "10% of all cars sold"
}
],
I've tried to iterate over it by doing:
iterator = 0
json_example = str(json_data['results'][0]['lexicalEntries'][0]['entries'][0]['senses'][0]['examples'][iterator]['text']).capitalize()
for i in json_example:
    print(i)
    iterator += 1
But this only prints each letter of the first example, as opposed to the entire example followed by the other entire examples.
Can I iterate over these as I would like to, or do I need to create separate variables with each example?
Following your code and example, it looks like what you need is :
for example in json_data['results'][0]['lexicalEntries'][0]['entries'][0]['senses'][0]['examples']:
    print(example["text"])
In your code, by writing json_data['results'][0]['lexicalEntries'][0]['entries'][0]['senses'][0]['examples'][iterator]['text'] you only ever accessed the item at index iterator, which is always the first one (iterator = 0), and then iterated over the characters of its "text" value.
Index the JSON data only as far as 'examples':
json_example = json_data['results'][0]['lexicalEntries'][0]['entries'][0]['senses'][0]['examples']
then treat each element of 'examples' like a dictionary:
for dictionary in json_example:
    for key in dictionary:
        print(dictionary[key])
This will print out each value correlated with the key 'text', like you want.
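Putting both answers together with the capitalize() call from the question, here is a self-contained sketch. The nested json_data below is a hypothetical minimal structure that only mirrors the path used in the question, with three of the six examples:

```python
# Hypothetical minimal structure mirroring the path used in the question
json_data = {
    "results": [{"lexicalEntries": [{"entries": [{"senses": [{"examples": [
        {"text": "carry all of the blame"},
        {"text": "she left all her money to him"},
        {"text": "we all have different needs"},
    ]}]}]}]}]
}

examples = json_data["results"][0]["lexicalEntries"][0]["entries"][0]["senses"][0]["examples"]

# Iterate over the list of dicts, not over the characters of one string
sentences = [example["text"].capitalize() for example in examples]
for sentence in sentences:
    print(sentence)
```

No counter variable is needed: the for loop over the list already visits every example in order.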