Conditional random selection - python

So I have a JSON file that is populated with recipes. Here is a sample from the file:
{
"id": 6,
"name": "Lobster roll",
"type": "fish",
"ingredients":[
{"item": "Lobster","amount": 0.5},
{"item": "Baguette","amount": 8},
{"item": "Garlic","amount": 2}
]
},
{
"id": 7,
"name": "Potato and leaks soup",
"type": "vegetarian",
"ingredients":[
{"item": "Water","amount": 0.5},
{"item": "Potato","amount": 8},
{"item": "Onion","amount": 2}
]
}
What I want to achieve is to select random recipes from the JSON file, for example 7 recipes, with the condition that at least 25% of them are of the type fish. Every recipe is tagged with its type in the JSON file.
I am using the random.sample() function, so how do I bring the condition into play here?
import json
import random
with open('recipes.json') as json_file:
    data = json.load(json_file)
# Use a separate name so the random module is not shadowed
selection = random.sample(data, 7)
for recipe in selection:
    print(recipe["name"])
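One possible approach, as a rough sketch rather than a definitive answer: draw the required number of fish recipes first, fill the remaining slots from everything that is left, and shuffle the result. The helper name sample_with_minimum and its parameters are made up for illustration, the inline list stands in for the data loaded from recipes.json, and the sketch assumes every recipe has a unique "id".

```python
import math
import random


def sample_with_minimum(recipes, k, required_type="fish", min_fraction=0.25):
    """Pick k distinct recipes, at least ceil(k * min_fraction) of the required type.

    Hypothetical helper for illustration; assumes each recipe has a unique "id".
    """
    min_count = math.ceil(k * min_fraction)
    matching = [r for r in recipes if r["type"] == required_type]
    if len(recipes) < k or len(matching) < min_count:
        raise ValueError("not enough recipes to satisfy the request")
    # Step 1: guarantee the minimum number of recipes of the required type.
    guaranteed = random.sample(matching, min_count)
    guaranteed_ids = {r["id"] for r in guaranteed}
    # Step 2: fill the remaining slots from all recipes not yet chosen.
    remaining = [r for r in recipes if r["id"] not in guaranteed_ids]
    chosen = guaranteed + random.sample(remaining, k - min_count)
    # Step 3: shuffle so the guaranteed picks are not always first.
    random.shuffle(chosen)
    return chosen


# Stand-in data (20 recipes, every fourth one tagged as fish).
recipes = [
    {"id": i, "name": f"Recipe {i}", "type": "fish" if i % 4 == 0 else "vegetarian"}
    for i in range(1, 21)
]
selection = sample_with_minimum(recipes, 7)
for recipe in selection:
    print(recipe["name"])
```

With 7 recipes and a 25% minimum this guarantees at least 2 fish recipes. Note that this scheme does not make every valid selection equally likely; if that matters, repeating random.sample() until the condition holds is an alternative.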

Related

Remove duplicate JSON keys in python

I have a JSON file with 2M entries that looks like this. Some keys are integers that occasionally repeat, and I'm trying to figure out how to remove all duplicates from the dictionary:
{
"1": {
"id": 1,
"some_data": "some_data_1",
},
"2": {
"id": 2,
"some_data": "some_data_2",
},
"2": {
"id": 2,
"some_data": "some_data_2",
},
"3": {
"id": 3,
"some_data": "some_data_3",
},
}
So, basically, I'm looking for a generic function that will iterate over dict_keys, look for duplicates and return clean JSON. I tried fiddling with { each[''] : each for each in te }.values(), but to no avail.
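A minimal sketch of one way to handle this with the standard json module: when an object repeats a key, json.loads keeps the last occurrence by default, and the object_pairs_hook parameter lets you see the raw key/value pairs to report which keys were repeated. The helper name load_without_duplicates is hypothetical. The sample above also has trailing commas, which the json module rejects, so they are left out here.

```python
import json


def load_without_duplicates(text):
    """Parse JSON text and return (data, duplicate_keys). Hypothetical helper."""
    duplicates = []

    def merge_pairs(pairs):
        # Called once per JSON object with its (key, value) pairs in file order.
        merged = {}
        for key, value in pairs:
            if key in merged:
                duplicates.append(key)
            merged[key] = value  # the last occurrence wins
        return merged

    data = json.loads(text, object_pairs_hook=merge_pairs)
    return data, duplicates


raw = """
{
  "1": {"id": 1, "some_data": "some_data_1"},
  "2": {"id": 2, "some_data": "some_data_2"},
  "2": {"id": 2, "some_data": "some_data_2"},
  "3": {"id": 3, "some_data": "some_data_3"}
}
"""
clean, duplicates = load_without_duplicates(raw)
print(json.dumps(clean, indent=2))
print("duplicate keys:", duplicates)
```

Once the text has been parsed into a Python dict there is nothing left to deduplicate, because a dict cannot hold the same key twice.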

JSON filter "smaller than" condition

I have a JSON which looks like this:
{
"data": [
{
"Name": "Hello",
"Number": "20"
},
{
"Name": "Beautiful",
"Number": "22"
},
{
"Name": "World",
"Number": "25"
},
{
"Name": "!",
"Number": "28"
}
]
}
and I want to get everything that is smaller than 28. It should look like this:
{
"data": [
{
"Name": "Hello",
"Number": "20"
},
{
"Name": "Beautiful",
"Number": "22"
},
{
"Name": "World",
"Number": "25"
}
]
}
I looked for a solution, but all I found was how to remove an exact value.
I'm doing this with a much larger file; this is just an example.
You can do it with a simple for loop:
import json
with open('your_path_here.json', 'r') as f:
    data = json.load(f)
# Iterate over a copy: removing from a list while looping over it skips elements
for elem in data['data'][:]:
    if int(elem['Number']) >= 28:
        data['data'].remove(elem)
print(data)
Output (shown formatted):
{
"data": [
{
"Name": "Hello",
"Number": "20"
},
{
"Name": "Beautiful",
"Number": "22"
},
{
"Name": "World",
"Number": "25"
}
]
}
An example could use list comprehension:
data = {
"data": [
{
"Name": "Hello",
"Number": "20"
},
{
"Name": "Beautiful",
"Number": "22"
},
{
"Name": "World",
"Number": "25"
},
{
"Name": "!",
"Number": "28"
}
]
}
filter_ = 28
filtered = {
"data": [
item for item in data["data"]
if int(item["Number"]) < filter_
]
}
print(filtered)
Basically, this iterates through data["data"], checks whether the current item's number is less than the filter (28 in this case), and adds the matching items to the list. You're left with:
{'data': [{'Name': 'Hello', 'Number': '20'}, {'Name': 'Beautiful', 'Number': '22'}, {'Name': 'World', 'Number': '25'}]}
...which should be what you need, but unformatted.
However, for larger JSON files, you might want to look into ijson, which lets you load JSON files in a memory-efficient way. Here's an example:
import ijson
import json
filter_ = 28
with open('data.json', 'r') as file:
    items = ijson.items(file, 'data.item')
    # Consume the lazy iterator while the file is still open
    filtered = [item for item in items if int(item["Number"]) < filter_]
with open('filtered.json', 'w') as output:
    json.dump(filtered, output, indent=2)

Don't really know how to work with json files

I have a small problem. I don't know much about JSON and I need help. I have a main.py file and a .json file. I want, for example, to print a certain line from the .json file with print() in main.py. For example, the JSON file has the line "name":"Alex" and a second line "name":"John". I need it to find the line "name":"Alex" in the JSON file and output the name Alex in main.py. I hope I have made my question clear.
This is a piece of the JSON file. It's a university schedule:
"group": "КМБО-02-19",
"days": [
{
"day": "ПН",
"pars": [
{
"name": "Введение в ПД",
"type": "зачет",
"number": 3,
"place": "Б-209",
"whiteWeek": 17
}
]
},
{
"day": "ВТ",
"pars": [
{
"name": "Программирование в ЗР",
"number": 2,
"place": "Б-209",
"type": "зачет",
"whiteWeek": 17
},
{
"name": "Физкультура и спорт",
"type": "зачет",
"number": 5,
"whiteWeek": 17
}
]
}
I think this was already answered here: multiple Json objects in one file extract by python.
There you can see how to store multiple objects in a file:
[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]},
...
]
And later on load them:
import json
with open('file.json') as json_file:
    data = json.load(json_file)
Here data is a Python list of dicts (as bruno-desthuilliers pointed out):
data[index]["name"]
An example tailored exactly to the OP's edited question:
import json
# read the json
# with open('data.txt') as f:
#     json_data = json.load(f)
json_data = {
"group": "КМБО-02-19",
"days": [
{
"day": "ПН",
"pars": [
{
"name": "Введение в ПД",
"type": "зачет",
"number": 3,
"place": "Б-209",
"whiteWeek": 17
}
]
},
{
"day": "ВТ",
"pars": [
{
"name": "Программирование в ЗР",
"number": 2,
"place": "Б-209",
"type": "зачет",
"whiteWeek": 17
},
{
"name": "Физкультура и спорт",
"type": "зачет",
"number": 5,
"whiteWeek": 17
}
]
}]}
# loop over each day and the pars inside it
for day in json_data['days']:
    for par in day['pars']:
        # check if the name is Alex and print it
        if par['name'] == 'Alex':
            print(par['name'])
While I am not a Python guy, if I had to do that in C++ I would either
create my own parser for the JSON file, build a token tree with all its properties, and then use that as a library,
or
read the docs,
or
invest a bit of time to get a JSON parser made by other people,
or
see the same question on Stack Overflow.

Finding specific values in nested json using Python

I have a file containing a large number of nested JSON objects. I pasted a snippet of it below. I am trying to use Python to query all of the objects in the file and pull out those that have at least one custom feeds url value that begins with "http://commshare". Some objects will not have any custom feeds, and the others will have one or more custom feeds, each of which might or might not begin with the string I am searching for. Any help would be appreciated! I am very new to Python.
Example JSON:
[{
"empid": "12345",
"values": {
"custom_feeds": {
"custom_feeds": [
{
"name": "Bulletins",
"url": "http://infoXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
]
},
"gadgetTitle": "InfoSec Updates",
"newWindow": false,
"article_limit_value": 10,
"show_source": true
}
},
{
"empid": "23456",
"values": {
"custom_feeds": {
"custom_feeds": [
{
"name": "1 News",
"url": "http://blogs.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
{
"name": "2 News",
"url": "http://info.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
{
"name": "3 News",
"url": "http://blogs.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
{
"name": "4 News",
"url": "http://commshare.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
]
},
"gadgetTitle": "Org News",
"newWindow": false,
"article_limit_value": 10,
"show_source": true
}
}, {
"empid": "34567",
"values": {
"custom_feeds": {
"custom_feeds": []
},
"gadgetTitle": "Org News",
"newWindow": false,
"article_limit_value": 10,
"show_source": true
}
}]
Assuming your file is named input.json and you want the object for each feed, you could parse the JSON and create a new list of the feeds that meet your criteria using a list comprehension:
import json
with open('input.json') as input_file:
    items = json.load(input_file)
feeds = [{'name': feed['name'], 'url': feed['url'], 'empid': item['empid']}
for item in items
for feed in item['values']['custom_feeds']['custom_feeds']
if feed['url'].startswith('http://commshare')]
assert feeds == [{'name': '4 News', 'url': 'http://commshare.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'empid': '23456'}]
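If the goal is instead to keep the whole top-level objects that have at least one matching feed, a small variation with any() might look like the sketch below. The helper name has_feed_with_prefix is made up for illustration, the .get() defaults are an assumption to tolerate objects that lack the custom_feeds keys, and the shortened items list with example URLs stands in for the real file.

```python
def has_feed_with_prefix(item, prefix="http://commshare"):
    """Return True if any custom feed URL of the item starts with prefix.

    Hypothetical helper; missing keys are treated as "no feeds".
    """
    feeds = item.get("values", {}).get("custom_feeds", {}).get("custom_feeds", [])
    return any(feed.get("url", "").startswith(prefix) for feed in feeds)


# Stand-in data shaped like the example JSON (URLs are placeholders).
items = [
    {"empid": "12345", "values": {"custom_feeds": {"custom_feeds": [
        {"name": "Bulletins", "url": "http://info.example/bulletins"}]}}},
    {"empid": "23456", "values": {"custom_feeds": {"custom_feeds": [
        {"name": "1 News", "url": "http://blogs.example/one"},
        {"name": "4 News", "url": "http://commshare.example/four"}]}}},
    {"empid": "34567", "values": {"custom_feeds": {"custom_feeds": []}}},
]

matching = [item for item in items if has_feed_with_prefix(item)]
print([item["empid"] for item in matching])
```

Each matching object is kept whole, so its other fields such as gadgetTitle remain available in the result.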

Output an existing defaultdict into appropriate JSON format for flare dendrogram?

I have a defaultdict(list) and I used simplejson.dumps(my_defaultdict) to output the defaultdict in JSON format. I am using the HTML code for the dendrogram from http://bl.ocks.org/mbostock/4063570, and I am trying to get my defaultdict data into the format of the JSON file the author is using. This JSON file is named /mbostock/raw/4063550/flare.JSON and it can be found at this link: http://bl.ocks.org/mbostock/raw/4063550/flare.json.
So here is my defaultdict data:
my_defaultdict = {5: ['child10'], 45: ['child92', 'child45'], 33:['child38']}
json_data = simplejson.dumps(my_defaultdict)
so my current json_data looks like this:
{
"5": [
"child10"
],
"45": [
"child92",
"child45"
],
"33": [
"child38"
]
}
So in my understanding the numbers would become the corresponding "name" values, for example "name": "5", and my JSON file would also hold the children under "children". As it is right now, my JSON output doesn't work with the HTML code of the dendrogram.
The expected outcome would be like this:
{
"name": "flare",
"children": [
{
"name": "5",
"children": [
{"name": "child10", "size": 5000}
]
},
{
"name": "45",
"children": [
{"name": "child92", "size": 3501},
{"name": "child45", "size": 3567}
]
},
{
"name": "33",
"children": [
{"name": "child38", "size": 8044}
]
}
]
}
Edit:
The answer from @martineau works, but it's not exactly what I want. I start with a defaultdict(list), and in the desired output above the "children" should be a list of dicts, whereas with martineau's kind answer the "children" are just a list. If anybody can add something to make that work, it would be great. Don't worry about the "size" variable; it can be ignored for now.
You need to make a new dictionary from your defaultdict. The children in your example code are just a list of strings, and I don't know where the "size" of each one comes from, so I just changed them into a list of dicts (which don't have an entry for a "size" key).
from collections import defaultdict
#import simplejson as json
import json # using stdlib module instead
my_defaultdict = defaultdict(list, { 5: ['child10'],
45: ['child92', 'child45'],
33: ['child38']})
my_dict = {'name': 'flare',
'children': [{'name': k,
'children': [{'name': child} for child in v]}
for k, v in my_defaultdict.items()]}
json_data = json.dumps(my_dict, indent=2)
print(json_data)
Output:
{
"name": "flare",
"children": [
{
"name": 33,
"children": [
{
"name": "child38"
}
]
},
{
"name": 5,
"children": [
{
"name": "child10"
}
]
},
{
"name": 45,
"children": [
{
"name": "child92"
},
{
"name": "child45"
}
]
}
]
}
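The output above keeps the integer keys as numbers, while the expected outcome in the question uses strings such as "5". A small variation, offered as a sketch only, converts the keys with str() and attaches a "size" when one is known. The function name to_flare and the sizes parameter are hypothetical; the question does not say where the sizes come from, so they are passed in explicitly here.

```python
import json
from collections import defaultdict


def to_flare(groups, root_name="flare", sizes=None):
    """Build a flare-style tree from a mapping of key -> list of child names.

    Hypothetical helper; sizes is an optional mapping of child name -> size.
    """
    sizes = sizes or {}
    children = []
    for key, names in groups.items():
        leaves = []
        for name in names:
            leaf = {"name": name}
            if name in sizes:
                leaf["size"] = sizes[name]
            leaves.append(leaf)
        # str(key) turns the integer key 5 into the name "5"
        children.append({"name": str(key), "children": leaves})
    return {"name": root_name, "children": children}


my_defaultdict = defaultdict(list, {5: ["child10"],
                                    45: ["child92", "child45"],
                                    33: ["child38"]})
flare = to_flare(my_defaultdict, sizes={"child10": 5000})
print(json.dumps(flare, indent=2))
```

Children without a known size simply omit the "size" key, which matches the answer above for those entries.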
I solved it by using this: How to convert defaultdict to dict?
For future readers who may search for it: I did it by converting the defaultdict into a common dictionary, just calling:
b = defaultdict(dict)
a = dict(b)
Then the JSON module could recognize this structure.
You need to build the dictionary so that it contains the desired 'children' fields. json.dumps does not output data in any predefined schema. Rather, the object passed to json.dumps must already adhere to any structure desired.
Try something like this:
import json
my_defaultdict = {"name": "5",
                  "children": [{"name": "child10", "children": []}]}
print(json.dumps(my_defaultdict))
