Delete entries from json that are missing properties - python

I have a json file that contains about 100,000 lines in the following format:
{
"00-0000045": {
"birthdate": "5/18/1975",
"college": "Michigan State",
"first_name": "Flozell",
"full_name": "Flozell Adams",
"gsis_id": "00-0000045",
"gsis_name": "F.Adams",
"height": 79,
"last_name": "Adams",
"profile_id": 2499355,
"profile_url": "http://www.nfl.com/player/flozelladams/2499355/profile",
"weight": 338,
"years_pro": 13
},
"00-0000108": {
"birthdate": "12/9/1974",
"college": "Louisville",
"first_name": "David",
"full_name": "David Akers",
"gsis_id": "00-0000108",
"gsis_name": "D.Akers",
"height": 70,
"last_name": "Akers",
"number": 2,
"profile_id": 2499370,
"profile_url": "http://www.nfl.com/player/davidakers/2499370/profile",
"weight": 200,
"years_pro": 16
}
}
I am trying to delete all the items that do not have a gsis_name property. So far I have this Python code, but it does not delete any values (note: I do not want to overwrite the original file):
import json
with open("players.json") as json_file:
    json_data = json.load(json_file)
for x in json_data:
    if 'gsis_name' not in x:
        del x
print json_data

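You're deleting x, but x is only a name bound to the current key of json_data (iterating over a dict yields its keys); del x unbinds that name and removes nothing from json_data. For the same reason, 'gsis_name' not in x tests the key string, not the player's record.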
In Python, if you want to filter some items out of a collection your best bet is to copy the items you do want into a new collection.
clean_data = {k: v for k, v in json_data.items() if 'gsis_name' in v}
and then write clean_data to a file with json.dump.
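Putting the two steps together, a minimal Python 3 sketch might look like the following. The inline sample stands in for the contents of players.json, the third entry is hypothetical, and the output name players_clean.json (written to the temp directory) is an assumption made for illustration.

```python
import json
import os
import tempfile

# Small inline sample standing in for the data loaded from players.json.
json_data = {
    "00-0000045": {"full_name": "Flozell Adams", "gsis_name": "F.Adams"},
    "00-0000108": {"full_name": "David Akers", "gsis_name": "D.Akers"},
    "00-0000999": {"full_name": "No Gsis Name"},  # hypothetical entry without gsis_name
}

# Keep only the entries whose value (the inner dict) has a gsis_name key.
clean_data = {k: v for k, v in json_data.items() if "gsis_name" in v}

# Write to a new file so the original file is never overwritten.
# The file name is an assumption, not something the question specifies.
out_path = os.path.join(tempfile.gettempdir(), "players_clean.json")
with open(out_path, "w") as out_file:
    json.dump(clean_data, out_file, indent=2)

print(sorted(clean_data))
```

Writing to a separate path keeps the source data intact, which matches the requirement in the question.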

When you say del x, you are unassigning the name x from your current scope (in this case, global scope, since the delete is not in a class or function).
You need to delete it from the object json_data. json.load returns a dict because your main object is an associative array / map / JavaScript object. When you iterate a dict, you are iterating over the keys, so x is a key (e.g. "00-0000108"). This is a bug: you want to check whether the value has the key gsis_name.
The documentation for dict shows you how to delete from a dict using the key: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
del d[key]
Remove d[key] from d. Raises a KeyError if key is not in the map.
But as the other answers say, it's better to create a new dict with the objects you want, rather than removing the objects you don't want.
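If in-place deletion is really what is wanted, a sketch of it in Python 3 could look like this. The sample data is a hypothetical stand-in for the loaded file; the point is that the loop runs over a snapshot of the keys, because deleting from a dict while iterating over it directly raises a RuntimeError.

```python
# Hypothetical sample standing in for the dict returned by json.load.
json_data = {
    "00-0000045": {"full_name": "Flozell Adams", "gsis_name": "F.Adams"},
    "00-0000999": {"full_name": "No Gsis Name"},  # lacks gsis_name
}

# list(json_data) copies the keys first, so the dict can shrink safely
# while the loop is running.
for key in list(json_data):
    if "gsis_name" not in json_data[key]:
        del json_data[key]

print(json_data)
```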

Just create new dict without unwanted elements:
res = dict((k, v) for k, v in json_data.items() if 'gsis_name' in v)
Since Python 2.7 you could use a dict comprehension.

Related

How do I change all the keys of a Python dictionary, if the values of said keys are nested dictionaries?

I am working on a headache of a college project in Python, and one of the functions basically has to write a dictionary (which contains nested dictionaries) into a csv file, and later read from the said file, place it into a new dictionary, and return that new dictionary to the user.
Now, the keys of the said dictionary are of type int, but obviously after being read from the file and written into the new dictionary, they turn into type str.
The unittests we were given keep failing my function because of this, as they expect the same value that was given to us originally, so I have been struggling with modifying this. Below is a (hand-crafted) example of that dictionary that I was practicing on, but still couldn't seem to get the change to take place:
dictionary = {
"0": {
"name": "gary",
"last_name": "john",
"age": 13
},
"1": {
"name": "larry",
"last_name": "boyle",
"age": 10
},
"2": {
"name": "banji",
"last_name": "buas",
"age": 20
}
}
for i in dictionary.items():  # this gives us a tuple
    change = list(i)  # convert tuple to list
    change[0] = int(change[0])  # change each str to int
    i = tuple(change)  # convert back to tuple
This approach only seems to work within the for-loop, but as soon as the loop ends, and I try printing out the 'new' dict, all I get is the same old one. Can anyone help with this problem?
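One way to make the change stick, sketched here under the assumption that every key is a string holding an integer, is to build a new dictionary rather than reassign the loop variable. Rebinding i inside the loop only changes what the name i points to; the dictionary never sees it. The name converted is introduced for illustration.

```python
dictionary = {
    "0": {"name": "gary", "last_name": "john", "age": 13},
    "1": {"name": "larry", "last_name": "boyle", "age": 10},
    "2": {"name": "banji", "last_name": "buas", "age": 20},
}

# Build a new dict whose keys are ints; the nested dicts are reused as-is.
converted = {int(key): value for key, value in dictionary.items()}

print(list(converted))
print(converted[1]["name"])
```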

Remove entire JSON object if it contains a specified phrase (from a list in python)

Have a JSON file output similar to:
{
"object1": {
"json_data": "{json data information}",
"tables_data": "TABLES_DATA"
},
"object2": {
"json_data": "{json data information}",
"tables_data": ""
}
}
Essentially, if there is an empty string for tables_data as shown in object2 (eg. "tables_data": ""), I want the entire object to be removed so that the output would look like:
{
"object1": {
"json_data": "{json data information}",
"tables_data": "TABLES_DATA"
}
}
What is the best way to go about doing this? Each of these objects corresponds to a separate index in a list that I've appended to, called summary[].
To achieve this, you could iterate through the JSON dictionary and test the tables_data values, adding the objectX elements to a new dictionary if their tables_data value passes the test:
new_dict = {k: v for k, v in json_dict.items()
            if v.get("tables_data", "") != ""}
If your JSON objectX is stored in a list as you say, these could be processed as follows using a list comprehension:
filtered_summary = [object_dict for object_dict in summary
                    if object_dict.get("tables_data", "") != ""]
Unless you have compelling reasons to do otherwise, the pythonic way to filter a list or dict (or any iterable) is not to change it in place but to create a new, filtered one. For your case this would look like
raw_data = YOUR_DICT_HERE_WHEREVER_IT_COMES_FROM
# NB: an empty string is falsy in a boolean test, so this keeps only non-empty tables_data
cleaned_data = {k: v for k, v in raw_data.items() if v["tables_data"]}
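For reference, a self-contained sketch of both variants (dict and list) on the sample data from the question. The placeholder strings are taken from the question; building summary from the dict's values is an assumption about how that list was filled.

```python
json_dict = {
    "object1": {"json_data": "{json data information}", "tables_data": "TABLES_DATA"},
    "object2": {"json_data": "{json data information}", "tables_data": ""},
}

# Dict variant: keep entries whose tables_data is present and non-empty.
new_dict = {k: v for k, v in json_dict.items() if v.get("tables_data")}

# List variant: assumes summary holds the same inner dicts, one per index.
summary = list(json_dict.values())
filtered_summary = [obj for obj in summary if obj.get("tables_data")]

print(list(new_dict))
print(len(filtered_summary))
```

Using .get() instead of subscripting means an object with no tables_data key at all is dropped rather than raising a KeyError.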

How to iterate through dict of dicts

I have the following json data
data_fixt_json =
{"api": {"results": 402,
"fixtures": [{
"fixture_id": 127807,
"league_id": 297,
"homeTeam": {
"team_id": 2279,
"team_name": "Tigres UANL",
"logo":"url"},
"awayTeam": {
"team_id": 2282,
"team_name": "Monterrey",
"logo": "url"},
"goalsHomeTeam": 1,
"goalsAwayTeam": 0,
"score": {
"halftime": "1-0",
"fulltime": "1-0",
"extratime": null,
"penalty": null}}
I need to store each key:value pair in a variable and then use these variables to create objects in my database. I tried the following code:
data_json = data_fixt_json["api"]["fixtures"]
for item in data_json:
    fixture_id = item["fixture_id"]
    league_id = item["league_id"]
But when the for loop reaches the "homeTeam" dict, my script raises an error. How can I write code that iterates through my JSON data and lets me store the values in variables?
If you'd like to iterate over the entries in a dict, you can do:
for fixture in data_fixt_json['api']['fixtures']:
    for key, value in fixture.items():
        print('key: {}, value: {}'.format(key, value))
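Because homeTeam, awayTeam and score are themselves dicts, their fields need a second subscript. A minimal sketch, assuming the structure shown in the question; the rows list and its field names are hypothetical, and None stands in for JSON null as it would after json.load.

```python
# Trimmed copy of the question's data with a single fixture.
data_fixt_json = {
    "api": {
        "results": 402,
        "fixtures": [
            {
                "fixture_id": 127807,
                "league_id": 297,
                "homeTeam": {"team_id": 2279, "team_name": "Tigres UANL", "logo": "url"},
                "awayTeam": {"team_id": 2282, "team_name": "Monterrey", "logo": "url"},
                "goalsHomeTeam": 1,
                "goalsAwayTeam": 0,
                "score": {"halftime": "1-0", "fulltime": "1-0",
                          "extratime": None, "penalty": None},
            }
        ],
    }
}

rows = []  # hypothetical flat records, ready to hand to a database layer
for item in data_fixt_json["api"]["fixtures"]:
    rows.append({
        "fixture_id": item["fixture_id"],
        "league_id": item["league_id"],
        # Nested dicts are reached with a second subscript.
        "home_team_name": item["homeTeam"]["team_name"],
        "away_team_name": item["awayTeam"]["team_name"],
        "fulltime": item["score"]["fulltime"],
    })

print(rows[0]["home_team_name"])
```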
There are a few things to think about here.
Do you know the number of items in the array?
If you do, then consider simply using indexing to access the values - in this use case they are similar to variables.
data_array = data_fixt_json["api"]["fixtures"]
fixture_id1 = data_array[0]["fixture_id"]
Do you require variables?
If you absolutely have to use variables, you can use the following concept, however I strongly recommend against doing this:
example = ['k', 'l', 'm', 'n']
for n, val in enumerate(example):
    globals()[f"var{n}"] = val
print(var2)
# Output: m
Let me know if this helps - happy coding!

Store two values from JSON_file in new dictionary with python

I want to store two values from JSON_file in new dict like this : {[1,2,5]:[0], [1,2,4]:[2]}
my JSON-file looks like this :
{
"index": [
{
"timestamp": "2018-04-17 17:56:25",
"src": "src",
"dst": [1,2,5],
"value": [0],
"datatype": "datatype"
},
{
"timestamp": "2018-04-17 18:00:43",
"src": "src",
"dst": [1,2,4],
"value": [2],
"datatype": "datatype"
}
]
}
I wrote this code:
with open(filename) as feedjson:
    json_data = json.load(feedjson)
    feedjson.close()
list_dev = {}
for i in json_data["index"]:
    key = i['value']
    value = i['dst']
    list_dev[key] = value
print(list_dev)
I get this error:
list_dev.update({key: value})
TypeError: unhashable type: 'list'
Can someone help me fix this problem, please?
This is just for understanding purposes:
Dictionary keys must be hashable, which in practice means immutable. In the question, [1,2,5] is a list, which is a mutable data type (its contents can be modified with methods like append and pop). So the only way to use the entire contents of a list as a dictionary key (highly unusual) is to convert it to an immutable data type such as a tuple or string:
new_dict = {}  # initialize empty dictionary
dst = json_data['index'][0]['dst']  # [1, 2, 5]
value = json_data['index'][0]['value']  # [0]
new_dict[tuple(dst)] = value  # key "dst" as a tuple
print(new_dict)
# {(1, 2, 5): [0]}
new_dict[str(dst)] = value  # key "dst" as a string
print(new_dict)
# {(1, 2, 5): [0], '[1, 2, 5]': [0]}
value is a list ([0], [2], or whatever), and a list is a mutable object, so it can't be hashed.
You can use the element inside the list, like
key=i['value'][0]
or convert the list to a tuple, like
key=tuple(i['value'])
Both objects are immutable and can therefore be hashed and used as a key.
By the way, with is a context manager, so you don't need to close the file using feedjson.close(); with will do it for you.
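Applied to the whole file, and with dst as the key as in the desired output, the fix can be sketched as follows in Python 3. The inline json_data stands in for the result of json.load and is trimmed to the two fields that matter.

```python
# Trimmed stand-in for the loaded JSON file.
json_data = {
    "index": [
        {"dst": [1, 2, 5], "value": [0]},
        {"dst": [1, 2, 4], "value": [2]},
    ]
}

# Lists are unhashable, so each dst list is converted to a tuple
# before being used as a dictionary key.
list_dev = {tuple(entry["dst"]): entry["value"] for entry in json_data["index"]}

print(list_dev)  # {(1, 2, 5): [0], (1, 2, 4): [2]}
```

A tuple key can be turned back into a list with list(key) if the original form is needed later.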

For each loop with JSON object python

Alright, so I'm struggling a little bit with trying to parse my JSON object.
My aim is to grab a certain JSON key and return its value.
JSON File
{
"files": {
"resources": [
{
"name": "filename",
"hash": "0x001"
},
{
"name": "filename2",
"hash": "0x002"
}
]
}
}
I've developed a function which allows me to parse the JSON code above
Function
def parsePatcher():
    url = '{0}/{1}'.format(downloadServer, patcherName)
    patch = urllib2.urlopen(url)
    data = json.loads(patch.read())
    patch.close()
    return data
Okay so now I would like to do a foreach statement which prints out each name and hash inside the "resources": [] object.
Foreach statement
for name, hash in patcher["files"]["resources"]:
    print name
    print hash
But it only prints out "name" and "hash" not "filename" and "0x001"
Am I doing something incorrect here?
By using name, hash as the for loop target, you are unpacking the dictionary:
>>> d = {"name": "filename", "hash": "0x001"}
>>> name, hash = d
>>> name
'name'
>>> hash
'hash'
This happens because iteration over a dictionary only produces the keys:
>>> list(d)
['name', 'hash']
and unpacking uses iteration to produce the values to be assigned to the target names.
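That name and hash came out in that order at all is down to chance: on Python 3.3 through 3.5, where hash randomisation is enabled by default, the order of those two keys could equally be reversed (dicts only preserve insertion order from Python 3.7, and from 3.6 as a CPython implementation detail).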
Just use one name to assign the dictionary to, and use subscription on that dictionary:
for resource in patcher["files"]["resources"]:
    print resource['name']
    print resource['hash']
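The same loop in Python 3 syntax, as a self-contained sketch. The patcher dict is written inline here instead of being fetched by parsePatcher(), and the pairs list is added only so the result can be inspected.

```python
# Inline stand-in for the data returned by parsePatcher().
patcher = {
    "files": {
        "resources": [
            {"name": "filename", "hash": "0x001"},
            {"name": "filename2", "hash": "0x002"},
        ]
    }
}

pairs = []  # hypothetical collector for the (name, hash) values
for resource in patcher["files"]["resources"]:
    # Each resource is a dict, so its values are read by key.
    print(resource["name"], resource["hash"])
    pairs.append((resource["name"], resource["hash"]))
```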
So what you intend to do is:
for dic in x["files"]["resources"]:
    print dic['name'], dic['hash']
You need to iterate on those dictionaries in that array resources.
The problem seems to be that you have a list of dictionaries: first get each element of the list, and then ask the element (which is a dictionary) for the values of the keys name and hash.
EDIT: this is tested and works
mydict = {"files": {"resources": [{"name": "filename", "hash": "0x001"}, {"name": "filename2", "hash": "0x002"}]}}
for element in mydict["files"]["resources"]:
    for d in element:
        print d, element[d]
If you have multiple files and multiple resources inside them, this generalized solution works:
for keys in patcher:
    for indices in patcher[keys].keys():
        print(patcher[keys][indices])
Checked output on my side:
>>> for keys in patcher:
...     for indices in patcher[keys].keys():
...         print(patcher[keys][indices])
...
[{'hash': '0x001', 'name': 'filename'}, {'hash': '0x002', 'name': 'filename2'}]
