Iterate through JSON data to get URLs - python

I'm trying to parse the following JSON data and extract URL values using a Python function. From the JSON example below, I would like to get the URLs under the jobs key and store them in two arrays: one for entries that have a color key and one for entries that do not. Once both arrays are ready, I would like to return them. I'm very new to Python and need some help with this.
{
"_class":"com.cloudbees.hudson.plugins.folder.Folder",
"actions":[ ],
"description":"This is a TSG level folder.",
"displayName":"CONSOLIDATED",
"displayNameOrNull":null,
"fullDisplayName":"CONSOLIDATED",
"fullName":"CONSOLIDATED",
"name":"CONSOLIDATED",
"url":"https://cyggm.com/job/CONSOLIDATED/",
"healthReport":[
{
"description":"Projects enabled for building: 187 of 549",
"iconClassName":"icon-health-20to39",
"iconUrl":"health-20to39.png",
"score":34
}
],
"jobs":[
{
"_class":"com.cloudbees.hudson.plugins.folder.Folder",
"name":"yyfyiff",
"url":"https://tdyt.com/job/
CONSOLIDATED/job/yfiyf/"
},
{
"_class":"com.cloudbees.hudson.plugins.folder.Folder",
"name":"Ops-Prod-Jobs",
"url":"https://ygduey.com/job/
CONSOLIDATED/job/Ops-Prod-Jobs/"
},
{
"_class":"com.cloudbees.hudson.plugins.folder.Folder",
"name":"TEST-DATA-MGMT",
"url":"https://futfu.com/job/
CONSOLIDATED/job/TEST-DATA-MGMT/"
},
{
"_class":"com.cloudbees.hudson.plugins.folder.Folder",
"name":"TESTING-OPS",
"url":"https://gfutfu.com/job/
CONSOLIDATED/job/TESTING-OPS/"
},
{
"_class":"com.cloudbees.hudson.plugins.folder.Folder",
"name":"Performance_Engineering Team",
"url":"https://ytdyt.com/job/
CONSOLIDATED/job/Performance_Engineering%20Team/"
},
{
"_class":"hudson.model.FreeStyleProject",
"name":"test",
"url":"https://tduta.com/job/
CONSOLIDATED/job/test/",
"color":"notbuilt"
}
],
"primaryView":{
"_class":"hudson.model.AllView",
"name":"all",
"url":"https://fuyfi.com/job/
CONSOLIDATED/"
},
"views":[
{
"_class":"hudson.model.AllView",
"name":"all",
"url":"https://utfufu.com/job/
CONSOLIDATED/"
}
]
}
The following is the Python code I used to get the jobs data, but I'm not able to iterate through it to get all the URLs. I only get one at a time, depending on the index I use:
import json
import requests

req = requests.get(url, verify=False, auth=(username, password))
j = json.loads(req.text)
jobs = j['jobs']
print(jobs[1]['url'])
This prints the second URL, but I have no way to check whether that entry has a color key.

First of all, your JSON is improperly formatted. You will have to use a JSON formatter to check its validity and fix any issues.
That said, you'll have to read in the file as a string with
In [87]: with open('data.json', 'r') as f:
    ...:     data = f.read()
    ...:
Then using the json library, load the data into a dict
In [88]: d = json.loads(data)
You can then use 2 list comprehensions to get the data you want
In [90]: no_color = [record['url'] for record in d['jobs'] if 'color' not in record]
In [91]: color = [record['url'] for record in d['jobs'] if 'color' in record]
In [93]: no_color
Out[93]:
['https://tdyt.com/job/CONSOLIDATED/job/yfiyf/',
'https://ygduey.com/job/CONSOLIDATED/job/Ops-Prod-Jobs/',
'https://futfu.com/job/CONSOLIDATED/job/TEST-DATA-MGMT/',
'https://gfutfu.com/job/CONSOLIDATED/job/TESTING-OPS/',
'https://ytdyt.com/job/CONSOLIDATED/job/Performance_Engineering%20Team/']
In [94]: color
Out[94]: ['https://tduta.com/job/CONSOLIDATED/job/test/']
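Since the question asks for a function that returns both lists, the two comprehensions can also be wrapped up as in the sketch below. The function name `split_job_urls` and the shortened inline sample are assumptions made for illustration; the real data would come from the parsed response.

```python
import json


def split_job_urls(data):
    """Return (urls_with_color, urls_without_color) for the entries under "jobs"."""
    with_color, without_color = [], []
    for job in data.get("jobs", []):
        url = job.get("url")
        if url is None:
            continue  # skip entries that carry no URL at all
        if "color" in job:
            with_color.append(url)
        else:
            without_color.append(url)
    return with_color, without_color


# Small inline sample shaped like the question's data (only two jobs kept).
sample = json.loads("""
{
  "jobs": [
    {"name": "Ops-Prod-Jobs",
     "url": "https://ygduey.com/job/CONSOLIDATED/job/Ops-Prod-Jobs/"},
    {"name": "test",
     "url": "https://tduta.com/job/CONSOLIDATED/job/test/",
     "color": "notbuilt"}
  ]
}
""")

color, no_color = split_job_urls(sample)
print(color)
print(no_color)
```

A single pass over the list keeps the two results in sync and avoids iterating over `jobs` twice.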

Related

Iterating over each json in the array

I have a SQL query, and for each row it returns I need to prepare a JSON object to use as the payload of an HTTP request. I need to restructure this JSON a little, because some keys need an additional level of nesting. That part is not a problem.
The problem is that I don't know how to iterate over each such object/row in the JSON. I need to loop over them and send each one as the payload of a separate HTTP request. At this point I have:
result = []
for row in rows:
    d = dict()
    d['id'] = row[40]
    d['email'] = row[41]
    d['additional_level'] = dict()
    d['additional_level']['key'] = row[42]
    result.append(d)
payload = json.dumps(result, indent=3)
At this point, print(payload) looks like this:
[
   {
      "id": 01,
      "email": "someemail01#gmail.com",
      "additional_level": {
         "key": 10
      }
   },
   {
      "id": 02,
      "email": "someemail02#gmail.com",
      "additional_level": {
         "key": 10
      }
   }
]
Now I want to make a separate JSON payload from each id and use it in an HTTP request. How can I refer to each object individually in a for loop?
You can dump each object separately:
result = []
for row in rows:
    d = dict()
    d['id'] = row[40]
    d['email'] = row[41]
    d['additional_level'] = dict()
    d['additional_level']['key'] = row[42]
    # Serialize each row on its own instead of dumping the whole list at once
    result.append(json.dumps(d, indent=3))
And then iterate over them and use them individually:
for payload in result:
    # Use payload for a request (printed here as a placeholder)
    print(payload)
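Putting the two steps together, a minimal sketch might look like the following. The helper names `build_payloads` and `send_all`, the sample rows, and the stand-in sender are all assumptions for illustration; the column positions 40 to 42 are taken from the question.

```python
import json


def build_payloads(rows):
    """Yield one JSON string per row, using the column positions from the question."""
    for row in rows:
        record = {
            "id": row[40],
            "email": row[41],
            "additional_level": {"key": row[42]},
        }
        yield json.dumps(record)


def send_all(rows, send):
    """Call send(payload) once per row and collect whatever it returns."""
    return [send(payload) for payload in build_payloads(rows)]


# Hypothetical rows: 43 columns each, only the last three matter here.
rows = [
    [None] * 40 + [1, "someemail01@example.com", 10],
    [None] * 40 + [2, "someemail02@example.com", 10],
]

# Stand-in sender that just returns the payload. A real one might call
# requests.post(url, data=payload, headers={"Content-Type": "application/json"}).
sent = send_all(rows, send=lambda payload: payload)
for payload in sent:
    print(payload)
```

Passing the sender in as a function keeps the payload construction testable without making any network calls.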

How to save json data as it is without data type conversion in dynamo db using python

I want to store key-value JSON data in AWS DynamoDB, where the key is a date string in YYYY-mm-dd format and the value is entries, which is a Python dictionary. When I used the boto3 client to save the data, it was saved as typed data objects, which I don't want. My purpose is simple: store JSON data against a key which is a date, so that later I can query the data by giving that date. I am struggling with this issue because I did not find any relevant link that explains how to store JSON data and retrieve it without any conversion.
I need help to solve it in Python.
What I am doing now:
import boto3

item = {
    "entries": [
        {
            "path": [
                {
                    "name": "test1",
                    "count": 1
                },
                {
                    "name": "test2",
                    "count": 2
                }
            ],
            "repo": "test3"
        }
    ],
    "date": "2022-10-11"
}
dynamodb_client = boto3.resource('dynamodb')
table = dynamodb_client.Table(table_name)
response = table.put_item(Item=item)
What actually saved:
[{"M":{"path":{"L":[{"M":{"name":{"S":"test1"},"count":{"N":"1"}}},{"M":{"name":{"S":"test2"},"count":{"N":"2"}}}]},"repo":{"S":"test3"}}}]
But I want to save exactly the same JSON data as it is, without any conversion at all.
When I retrieve it programmatically, you can see the differences: single quotes instead of double quotes, and the count values are changed.
response = table.get_item(
Key={
"date": "2022-10-12"
}
)
Output
{'Item': {'entries': [{'path': [{'name': 'test1', 'count': Decimal('1')}, {'name': 'test2', 'count': Decimal('2')}], 'repo': 'test3'}], 'date': '2022-10-12'}}
Why not store it as a single attribute of type string? Then you’ll get out exactly what you put in, byte for byte.
When you store this in DynamoDB you get exactly what you want/have provided. Key is your date and you have a list of entries.
If you need it to store in a different format you need to provide the JSON which correlates with what you need. It's important to note that DynamoDB is a key-value store not a document store. You should also look up the differences in these.
I figured out how to solve this issue. I have two columns named date and entries in my DynamoDB table (also visible in the screenshot in the question).
I convert the entries value from a list to a string and then save it in the database. At retrieval time I do the reverse: parse the string, build a proper JSON response, and return it.
I am also sharing sample code below so that anybody else dealing with the same situation has at least one option.
import json
import boto3

# While storing:
entries_string = json.dumps([
    {
        "path": [
            {
                "name": "test1",
                "count": 1
            },
            {
                "name": "test2",
                "count": 2
            }
        ],
        "repo": "test3"
    }
])
item = {
    "entries": entries_string,
    "date": "2022-10-12"
}
dynamodb_client = boto3.resource('dynamodb')
# Replace <TABLE-NAME> with the name of your table
table = dynamodb_client.Table(<TABLE-NAME>)
# Write the item; entries is stored as a single string attribute
table.put_item(Item=item)
-------------------------
# While fetching:
response = table.get_item(
    Key={
        "date": "2022-10-12"
    }
)['Item']
entries_string = response['entries']
# Parse the stored string back into Python lists and dicts
entries_dic = json.loads(entries_string)
response['entries'] = entries_dic
print(json.dumps(response))
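If the goal is only to get plain JSON back out, another option is to leave the item stored as a native map and convert the Decimal values when serializing the result. The sketch below assumes that approach; `decimal_default` is a hypothetical helper name, and the `item` dict is a hand-written stand-in for what get_item returned in the question, so nothing here talks to DynamoDB.

```python
import json
from decimal import Decimal


def decimal_default(value):
    """json.dumps fallback: turn Decimal into int when whole, else float."""
    if isinstance(value, Decimal):
        if value == value.to_integral_value():
            return int(value)
        return float(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Hand-written stand-in for the 'Item' dict shown in the question's output.
item = {
    "entries": [
        {
            "path": [
                {"name": "test1", "count": Decimal("1")},
                {"name": "test2", "count": Decimal("2")},
            ],
            "repo": "test3",
        }
    ],
    "date": "2022-10-12",
}

# json.dumps calls decimal_default only for objects it cannot serialize itself.
as_json = json.dumps(item, default=decimal_default)
print(as_json)
```

With this variant the attributes stay individually addressable in the table, at the cost of the conversion step on the way out.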

Flatten Nested JSON in Python

I'm new to Python and I'm quite stuck (I've gone through multiple other stackoverflows and other sites and still can't get this to work).
I've the below json coming out of an API connection
{
"results":[
{
"group":{
"mediaType":"chat",
"queueId":"67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
},
"data":[
{
"interval":"2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
"metrics":[
{
"metric":"nOffered",
"qualifier":null,
"stats":{
"max":null,
"min":null,
"count":14,
"count_negative":null,
"count_positive":null,
"sum":null,
"current":null,
"ratio":null,
"numerator":null,
"denominator":null,
"target":null
}
}
],
"views":null
}
]
}
]
}
and what I'm mainly looking to get out of it is (or at least something as close as)
MediaType | QueueId                              | NOffered
Chat      | 67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d | 14
Is something like that possible? I've tried multiple things and I either get the whole of this out in one line or just get different errors.
The error you got indicates you missed that some of your values are actually a dictionary within an array.
Assuming you want to flatten your json file to retrieve the following keys: mediaType, queueId, count.
These can be retrieved by the following sample code:
import json

with open(path_to_json_file, 'r') as f:
    json_dict = json.load(f)

for result in json_dict.get("results"):
    media_type = result.get("group").get("mediaType")
    queue_id = result.get("group").get("queueId")
    # "count" sits inside the "stats" dict of each metric
    n_offered = result.get("data")[0].get("metrics")[0].get("stats").get("count")
If your data and metrics keys will have multiple indices you will have to use a for loop to retrieve every count value accordingly.
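A sketch of that loop, assuming the response keeps the shape shown in the question, could look like this. The function name `flatten_results` is made up for illustration, and the second metric in the sample is hypothetical, added only so the inner loop has more than one entry to visit.

```python
def flatten_results(payload):
    """Return one flat dict per metric found under results -> data -> metrics."""
    rows = []
    for result in payload.get("results", []):
        group = result.get("group") or {}
        for entry in result.get("data") or []:
            for metric in entry.get("metrics") or []:
                stats = metric.get("stats") or {}
                rows.append({
                    "mediaType": group.get("mediaType"),
                    "queueId": group.get("queueId"),
                    "interval": entry.get("interval"),
                    "metric": metric.get("metric"),
                    "count": stats.get("count"),
                })
    return rows


sample = {
    "results": [
        {
            "group": {
                "mediaType": "chat",
                "queueId": "67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d",
            },
            "data": [
                {
                    "interval": "2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
                    "metrics": [
                        {"metric": "nOffered", "stats": {"count": 14}},
                        # Hypothetical second metric, added only to show the loop.
                        {"metric": "nExample", "stats": {"count": 3}},
                    ],
                    "views": None,
                }
            ],
        }
    ]
}

rows = flatten_results(sample)
for row in rows:
    print(row["mediaType"], row["queueId"], row["metric"], row["count"])
```

The `or []` and `or {}` fallbacks are there because the API returns null for some keys, which `json` turns into None.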
Assuming that the format of the API response is always the same, have you considered hardcoding the extraction of the data you want?
This should work: With response defined as the API output:
response = {
"results":[
{
"group":{
"mediaType":"chat",
"queueId":"67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d"
},
"data":[
{
"interval":"2021-01-14T13:12:19.000Z/2022-01-14T13:12:19.000Z",
"metrics":[
{
"metric":"nOffered",
"qualifier":'null',
"stats":{
"max":'null',
"min":'null',
"count":14,
"count_negative":'null',
"count_positive":'null',
"sum":'null',
"current":'null',
"ratio":'null',
"numerator":'null',
"denominator":'null',
"target":'null'
}
}
],
"views":'null'
}
]
}
]
}
You can extract the results as follows:
results = response["results"][0]
{
"mediaType": results["group"]["mediaType"],
"queueId": results["group"]["queueId"],
"nOffered": results["data"][0]["metrics"][0]["stats"]["count"]
}
which gives
{
'mediaType': 'chat',
'queueId': '67d9fb5e-26b2-4db5-b062-bbcfa8d2ca0d',
'nOffered': 14
}

Python: issue with getting the correct info from JSON

I have an issue when I try to get the correct information from a JSON response.
For example, I get a very big JSON output from a request I make with POST (I can't use GET).
"offers": [
{
"rank": 1,
"provider": {
"id": 6653,
"isLocalProvider": false,
"logoUrl": "https://img.vxcdn.com/i/partner-energy/c_6653.png?v=878adaf9ed",
"userRatings": {
"additonalCustomerRatings": {
"price": {
"percent": 73.80
},
"service": {
"percent": 67.50
},
"switching": {
"percent": 76.37
},
"caption": {
"text": "Zusätzliche Kundenbewertungen"
}
},
I can't show it all because it's very big.
As you can see, this is "rank": 1; the response contains 20 ranks, each with information such as content and totalCost, and I need to pick them all: for example, content and totalCost for rank 6, content and totalCost for rank 8.
First of all, this is the Python code I use to get the JSON data.
import requests
import json
url = "https://www.verivox.de/api/energy/offers/electricity/postCode/10555/custom?"
payload="{\"profile\":\"H0\",\"prepayment\":true,\"signupOnly\":true,\"includePackageTariffs\":true,\"includeTariffsWithDeposit\":true,\"includeNonCompliantTariffs\":true,\"bonusIncluded\":\"non-compliant\",\"maxResultsPerPage\":20,\"onlyProductsWithGoodCustomerRating\":false,\"benchmarkTariffId\":741122,\"benchmarkPermanentTariffId\":38,\"paolaLocationId\":\"71085\",\"includeEcoTariffs\":{\"includesNonEcoTariffs\":true},\"maxContractDuration\":240,\"maxContractProlongation\":240,\"usage\":{\"annualTotal\":3500,\"offPeakUsage\":0},\"priceGuarantee\":{\"minDurationInMonths\":0},\"maxTariffsPerProvider\":999,\"cancellationPeriod\":null,\"previewDisplayTime\":null,\"onlyRegionalTariffs\":false,\"sorting\":{\"criterion\":\"TotalCosts\",\"direction\":\"Ascending\"},\"includeSpecialBonusesInCalculation\":\"None\",\"totalCostViewMode\":1,\"ecoProductType\":0}"
headers = {
'Content-Type': 'application/json',
'Cookie': '__cfduid=d97a159bb287de284487ebdfa0fd097b41606303469; ASP.NET_SessionId=jfg3y20s31hclqywloocjamz; 0e3a873fd211409ead79e21fffd2d021=product=Power&ReturnToCalcLink=/power/&CustomErrorsEnabled=False&IsSignupWhiteLabelled=False; __RequestVerificationToken=vrxksNqu8CiEk9yV-_QHiinfCqmzyATcGg18dAqYXqR0L8HZNlvoHZSZienIAVQ60cB40aqfQOXFL9bsvJu7cFOcS2s1'
}
response = requests.request("POST", url, headers=headers, data=payload)
jsondata = response.json()
# print(response.text)
That works fine, but when I try to pick out the data I need, as described above, I get:
for Rankdata in str(jsondata['rank']):
KeyError: 'rank'
This is the code that produces the error:
dataRank = []
for Rankdata in str(jsondata['rank']):
    dataRank.append({
        'tariff':Rankdata['content'],
        'cost': Rankdata['totalCost'],
        'sumOfOneTimeBonuses': Rankdata['content'],
        'savings': Rankdata['content']
    })
Then I tried another way, just getting one or a few values, but that doesn't work either.
data = response.json()
#print(data)
test = float((data['rank']['totalCost']['content']))
I know my code is not perfect, but this is the first time I am dealing with JSON this big and complex. I would be very grateful if someone could show me, using my example, how to pick the data for rank 1 through rank 20 and print it.
Thank you for your help.
If you look closely at the highest level in the json, you can see that the value for key offers is a list of dicts. You can therefore loop through it like this:
for offer in jsondata['offers']:
    print(offer.get('rank'))
    print(offer.get('provider').get('id'))
And the same goes for other keys in the offers.
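To collect the values into a list instead of printing them, a sketch along these lines may help. It only touches keys visible in the excerpt from the question (rank, provider, id, userRatings, additonalCustomerRatings, price, percent); where content and totalCost sit in the full response is not shown, so they are left out. The function name `summarize_offers` and the second sample offer are assumptions for illustration.

```python
def summarize_offers(jsondata):
    """Build one small dict per offer, tolerating missing nested keys."""
    summary = []
    for offer in jsondata.get("offers", []):
        provider = offer.get("provider") or {}
        user_ratings = provider.get("userRatings") or {}
        # Key spelling "additonalCustomerRatings" is copied from the response.
        ratings = user_ratings.get("additonalCustomerRatings") or {}
        price = ratings.get("price") or {}
        summary.append({
            "rank": offer.get("rank"),
            "provider_id": provider.get("id"),
            "price_percent": price.get("percent"),
        })
    return summary


sample = {
    "offers": [
        {
            "rank": 1,
            "provider": {
                "id": 6653,
                "userRatings": {
                    "additonalCustomerRatings": {"price": {"percent": 73.80}}
                },
            },
        },
        # Hypothetical second offer without ratings, to show the fallbacks.
        {"rank": 2, "provider": {"id": 1}},
    ]
}

offers = summarize_offers(sample)
for entry in offers:
    print(entry)
```

Using `.get` with a fallback at each level means a missing key yields None instead of a KeyError like the one in the question.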

How to extract specific data from JSON object using python?

I'm trying to scrape a website and get a list of items from it using Python. I parsed the HTML using BeautifulSoup and made a JSON object using json.loads(data). The JSON object looks like this:
{ ".1768j8gv7e8__0":{
"context":{
//some info
},
"pathname":"abc",
"showPhoneLoginDialog":false,
"showLoginDialog":false,
"showForgotPasswordDialog":false,
"isMobileMenuExpanded":false,
"showFbLoginEmailDialog":false,
"showRequestProductDialog":false,
"isContinueWithSite":true,
"hideCoreHeader":false,
"hideVerticalMenu":false,
"sequenceSeed":"web-157215950176521",
"theme":"default",
"offerCount":null
},
".1768j8gv7e8.6.2.0.0__6":{
"categories":[
],
"products":{
"count":12,
"items":[
{
//item info
},
{
//item info
},
{
//item info
}
],
"pageSize":50,
"nextSkip":100,
"hasMore":false
},
"featuredProductsForCategory":{
},
"currentCategory":null,
"currentManufacturer":null,
"type":"Search",
"showProductDetail":false,
"updating":false,
"notFound":false
}
}
I need the items list from product section. How can I extract that?
Just do:
products = jsonObject[list(jsonObject.keys())[1]]["products"]["items"]
Import the json package and map every entry to its list of items, if it has any.
This solution is more universal: it checks every entry in your JSON and finds all the items without hardcoding the index of an element.
import json
data = '{"p1": { "pathname":"abc" }, "p2": { "pathname":"abcd", "products": { "items" : [1,2,3]} }}'
# use json package to convert json string to dictionary
jsonData = json.loads(data)
type(jsonData) # dictionary
# use "list comprehension" to iterate over all the items in json file
# itemData['products']["items"] - select items from data
# if "products" in itemData.keys() - check if given item has products
[itemData['products']["items"] for itemId, itemData in jsonData.items() if "products" in itemData.keys()]
Edit: added comments to code
I'll just call the URL of the JSON file you got from BeautifulSoup "response" and then put in a sample key in the items array, like itemId:
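import json
# json.load expects a file-like object; use json.loads for a string
json_obj = json.load(response)
array = []
for i in json_obj['items']:
    # Append to the list; indexing an empty list with a dict raises TypeError
    array.append(i['itemId'])
print(array)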
