Web scraping a site which contains JSON data - python

I am trying to get job data from a site. The response does not contain the full information when I parse it with BeautifulSoup, so I tried to get the data with requests and pandas instead. Still no luck. Can someone help me here?
import pandas as pd
import requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
}
url = 'https://hirist.com'
# r = requests.get(url, headers, verify=False)
payload = {
    "pageNo": "1",
    "query": "software engineer",
    "loc": '17',
    "minexp": '0',
    "maxexp": '0',
    "range": '0',
    "boost": '0',
    "searchRange": '4',
    "searchOp": 'AND',
    "jobType": "1"
}
jsonData = requests.post(url, headers=headers,
                         json=payload, verify=False).json()
df = pd.DataFrame(jsonData)
print(df)

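The job listings are not in the page HTML; going by the request below, they are served as JSON from a separate API host. Try requesting that endpoint directly with a GET and passing the search parameters as a query string: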
import pandas as pd
import requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
    'Referer': 'https://www.hirist.com/',
    'Authorization': 'Bearer undefined',
    'Origin': 'https://www.hirist.com',
}
payload = {
    "pageNo": "1",
    "query": "software engineer",
    "loc": '17',
    "minexp": '0',
    "maxexp": '0',
    "range": '0',
    "boost": '0',
    "searchRange": '4',
    "searchOp": 'AND',
    "jobType": "1"
}
jsonData = requests.get(
    "https://jobseeker-api.hirist.com/jobfeed/-1/search",
    headers=headers, params=payload, verify=False
).json()
print(jsonData)
This gives output starting with:
{'count': 58, 'jobs': [{'id': 982486, 'title': 'Software Engineer - ASP/C# (1-4 yrs)', 'introText': '<p><p><b>Position : Software Engineer</b><br/><br/><b>Experience : 1- 4 Years</b><br/><br/><b>Job type : Permanent</b><br/><br/><b>Skills Required :</b><br/><br/>- Extensive knowledge in <b>Asp.net, C# and SQL.</b><br/><br/>- Ability to troubleshoot and solve complex technical problems.<br/><br/>- Great interpersonal and communication skills<br/><br/>- Must have good analytical and problem-solving skills.<br/><br/>- Good Time Management and Planning skills.<br/><br/><b>Roles & Responsibility :</b><br/><br/>- Producing clean, efficient code based on specifications.<br/><br/>- Fixing and improving existing software<br/><br/>- Integrate software components and third-party programs<br/><br/>- Verify and deploy programs and systems<br/><br/>- Troubleshoot, debug and upgrade existing software<br/><br/>- Gather and evaluate user feedback<br/><br/>- Recommend and execute improvements<br/><br/>- Create technical documentation for reference and reporting<br/><br/>- Prefer Immediate Joiners</p></p>', 'jobdesignation': 'Software Developer', 'min': 1, 'max': 4, 'createdBy': 93163, 'creatorDomainName': 'sapwood.net', 'categoryId': 1, 'jobDetailUrl': 'https://www.hirist.com/j/software-engineer-aspc-1-4-yrs-982486.html?ref=ambitionbox', 'femaleCandidate': 0, 'differentlyAbled': 0, 'exDefence': 0, 'workFromHome': 0, 'femaleBackWorkForce': 0, 'confidential': 0, 'premium': 0, 'star': 0, 'applyStatus': 1, 'applyCount': 42, 'createdTimeMs': 1643024958613, 'createdTime': 1642982400000, 'createdTimeNoMillis': None, 'tagIdString': '206 387 91 7', 'tags': [{'id': 206, 'name': 'C#'}, {'id': 387, 'name': 'SQL Server'}, {'id': 91, 'name': 'ASP'}, {'id': 7, 'name': '.Net'}], 'locations': [{'id': 70, 'name': 'Cochin/Kochi'}, {'id': 17, 'name': 'Kerala'}], 'showcase': None, 'diversity': None, 'companyStatus': 1, 'createdByAlias': 'Cochin/Kochi/Kerala', 'applyUrl': '', 'videoUrl': '', 
'assessmentFlags': 0, 'mediaResume': 0, 'industry': '', 'functionalArea': 18, 'minSal': 1, 'maxSal': 6, 'hits': 373, 'otherLocation': '', 'minBatch': None, 'maxBatch': None, 'brandJobFlag': 0, 'companyDomain': None, 'lableId': None, 'companyData': {'companyId': 0, 'companyName': 'Sapwood Ventures', 'companyNameNotAnalyzed': 'Sapwood Ventures', 'companyStatus': 1, 'logoPath': None}, 'recruiter': {'recruiterId': 93163, 'recruiterName': 'Hemaa R', 'designation': 'Senior Manager - Team & Key Accounts', 'profilePicUrl': '', 'logoPath': '', 'recruiterActions': 34}, 'jobStatusInfo': None, 'location': [{'id': 70, 'name': 'Cochin/Kochi'}, {'id': 17, 'name': 'Kerala'}], 'saved': 0, 'applied': 0}, {'id': 997211, 'title': 'Tetherfi Technologies - Software Engineer - Java/J2EE (3-10 yrs)', 'introText': "<p>The Right Individual :<br/><br/>The ideal candidate will have a passion for technology and software building. Attention to detail and an analytical mind are essential qualities in this role. You will have to work on both technical and design aspects of software projects. A proactive approach to problem-solving as well as a detailed understanding of coding is essential. If finding issues and fixing them with beautiful, meticulous code are among the talents that make you tick, we'd like to hear from you.<br/><br/>Required Functional Skill :<br/><br/>1. 4+ years of experience in java and familiarity in Spring boot, JPA.<br/><br/>2. Extensive Hands-on experience in JAVA Java SE.<br/><br/>3. Well versed with Object Oriented Programming Concepts.<br/><br/>4. Prior experience on JAVA Spring / Spring boot framework.<br/><br/>5. Familiarity with java application servers JBoss, WebLogic.<br/><br/>6. Have in-depth knowledge and self-driven interest to work with JAVA Servlets.<br/><br/>7. Experience in deploying solutions for cross integrations among OEMs in CC or UC environment is preferred.<br/><br/>Role and Responsibilities :<br/><br/>1. 
Candidate will be part of our Global Delivery center team liaising with Product Strategist and Product Owner to enhance Tetherfi's Products based on Web chat, CC & UC Product Streams.<br/><br/>2. Will develop, enhance and support Tetherfi's existing projects and future projects.<br/><br/>Required Professional & Interpersonal Qualities :<br/><br/>- Bachelor's Degree in appropriate field of study or equivalent work experience.<br/><br/>- Experienced with all ancillary technologies necessary for Internet applications: HTTP, TCP/IP, POP/SMTP, etc.</p>", 'jobdesignation': 'Software Engineer', 'min': 3, 'max': 10, 'createdBy': 72249, 'creatorDomainName': 'tetherfi.com', 'categoryId': 1, 'jobDetailUrl': 'https://www.hirist.com/j/tetherfi-technologies-software-engineer-javaj2ee-997211.html?ref=ambitionbox', 'femaleCandidate': 0, 'differentlyAbled': 0, 'exDefence': 0, 'workFromHome': 0, 'femaleBackWorkForce': 0, 'confidential': 0, 'premium': 0, 'star': 0, 'applyStatus': 1, 'applyCount': 4, 'createdTimeMs': 1645156975704, 'createdTime': 1645142400000, 'createdTimeNoMillis': None, 'tagIdString': '5 2850 25 279 87 237 11100 19', 'tags': [{'id': 5, 'name': 'Java'}, {'id': 2850, 'name': 'Spring Boot'}, {'id': 25, 'name': 'J2EE'}, {'id': 279, 'name': 'Servlets'}, {'id': 87, 'name': 'JBOSS'}, {'id': 237, 'name': 'WebLogic'}, {'id': 11100, 'name': 'Application Server'}, {'id': 19, 'name': 'OOPS'}], 'locations': [{'id': 88, 'name': 'Anywhere in India/Multiple Locations'}, {'id': 3, 'name': 'Bangalore'}, {'id': 6, 'name': 'Chennai'}, {'id': 7, 'name': 'Pune'}, {'id': 17, 'name': 'Kerala'}, {'id': 31, 'name': 'Karnataka'}], 'showcase': None, 'diversity': None, 'companyStatus': 1, 'createdByAlias': 'Anywhere in India/Multiple Locations/Bangalore/Chennai/Pune/Kerala/Karnataka', 'applyUrl': '', 'videoUrl': '', 'assessmentFlags': 0, 'mediaResume': 0, 'industry': '', 'functionalArea': 16, 'minSal': 5, 'maxSal': 14, 'hits': 15, 'otherLocation': '', 'minBatch': None, 'maxBatch': None, 
'brandJobFlag': 0, 'companyDomain': None, 'lableId': None, 'companyData': {'companyId': 0, 'companyName': 'Tetherfi Technologies Pvt Ltd', 'companyNameNotAnalyzed': 'Tetherfi Technologies Pvt Ltd', 'companyStatus': 1, 'logoPath': None}, 'recruiter': {'recruiterId': 72249, 'recruiterName': 'Laxman Shenoy', 'designation': 'Deputy Manager HR', 'profilePicUrl': '', 'logoPath': '', 'recruiterActions': 2}, 'jobStatusInfo': None, 'location': [{'id': 88, 'name': 'Anywhere in India/Multiple Locations'}, {'id': 3, 'name': 'Bangalore'}, {'id': 6, 'name': 'Chennai'}, {'id': 7, 'name': 'Pune'}, {'id': 17, 'name': 'Kerala'}, {'id': 31, 'name': 'Karnataka'}], 'saved': 0, 'applied': 0}, {'id': 1003219, 'title': 'Senior Software Engineer - Python/Django (3-8 yrs)', 'introText': "<p><p><p><b>Position / Designation :</b> Software Engineer /Senior Software Engineer<br/><br/><b>Location</b> <b>: </b>Chennai<br/><br/><b>Experience</b> <b>: </b>0-3 years for SE, 3+ years for SSE, <br/><br/><b>CTC : <br/></b><br/>SE - 4 to 6 L.P.A<br/><br/>SSE- 7-11 L.P.A<br/><br/>The ideal candidate is a self-motivated, multi-tasker, and demonstrated team player. You will be a lead developer responsible for the development of new software products and enhancements to existing products. You should excel in working with large-scale applications and frameworks and have outstanding communication and leadership skills. 
<br/><br/><b>Responsibilities : <br/></b><br/>- Writing clean, high-quality, high-performance, maintainable code<br/><br/>- Develop and support software including applications, database integration, interfaces, and new functionality enhancements.<br/><br/>- Coordinate cross-functionally to ensure the project meets business objectives and compliance standards.<br/><br/>- Support test and deployment of new products and features.<br/><br/>- Participate in code reviews.<br/><br/><b>Qualifications : <br/></b><br/>- Bachelor's degree in Computer Science (or related field)<br/><br/>- 3+ years of work experience in Python, Django.<br/><br/>- Expertise in Object-Oriented Design, Database Design, and XML Schema<br/><br/>- Experience with Agile or Scrum software development methodologies<br/><br/>- Ability to multi-task, organize and prioritize work.</p></p></p>", 'jobdesignation': None, 'min': 3, 'max': 8, 'createdBy': 98899, 'creatorDomainName': 'gmail.com', 'categoryId': 1, 'jobDetailUrl': 'https://www.hirist.com/j/senior-software-engineer-pythondjango-3-8-yrs-1003219.html?ref=ambitionbox', 'femaleCandidate': 1, 'differentlyAbled': 0, 'exDefence': 0, 'workFromHome': 1, 'femaleBackWorkForce': 0, 'confidential': 0, 'premium': 0, 'star': 0, 'applyStatus': 1, 'applyCount': 98, 'createdTimeMs': 1646059583336, 'createdTime': 1646006400000, 'createdTimeNoMillis': None, 'tagIdString': '9 592 50 280 97 30357 3429 4422 11 2339 2807', 'tags': [{'id': 9, 'name': 'Python'}, {'id': 592, 'name': 'Agile'}, {'id': 50, 'name': 'Django'}, {'id': 280, 'name': 'Scrum'}, {'id': 97, 'name': 'XML'}, {'id': 30357, 'name': 'Object Modeling'}, {'id': 3429, 'name': 'Database Schema'}, {'id': 4422, 'name': 'Database Architecture'}, {'id': 11, 'name': 'MySQL'}, {'id': 2339, 'name': 'Python Architect'}, {'id': 2807, 'name': 'PySpark'}], 'locations': [{'id': 3, 'name': 'Bangalore'}, {'id': 6, 'name': 'Chennai'}, {'id': 84, 'name': 'Coimbatore'}, {'id': 17, 'name': 'Kerala'}], 'showcase': None, 
'diversity': None, 'companyStatus': 2, 'createdByAlias': 'Bangalore/Chennai/Coimbatore/Kerala', 'applyUrl': '', 'videoUrl': '', 'assessmentFlags': 0, 'mediaResume': 0, 'industry': '0', 'functionalArea': 16, 'minSal': 16, 'maxSal': 31, 'hits': 538, 'otherLocation': '', 'minBatch': None, 'maxBatch': None, 'brandJobFlag': 0, 'companyDomain': None, 'lableId': None, 'companyData': {'companyId': 0, 'companyName': 'AR Consultant', 'companyNameNotAnalyzed': 'AR Consultant', 'companyStatus': 2, 'logoPath': None}, 'recruiter': {'recruiterId': 98899, 'recruiterName': 'Afzal', 'designation': 'Recruiter', 'profilePicUrl': 'https://edgar.hirist.com/media/recruiterpics/2022/01/25/2022-01-25-19-12-23-98899.jpg', 'logoPath': '', 'recruiterActions': 11}, 'jobStatusInfo': None, 'location': [{'id': 3, 'name': 'Bangalore'}, {'id': 6, 'name': 'Chennai'}, {'id': 84, 'name': 'Coimbatore'}, {'id': 17, 'name': 'Kerala'}], 'saved': 0, 'applied': 0}, {'id': 967513, 'title': 'Software Test Engineer - Java/Selenium (0-2 yrs)', 'introText': "<p>Immediate joiners required for a reputed client <br/><br/>Only Male Kerala candidates <br/><br/>Position : Software Test Engineer<br/><br/>Experience : 0-2 years<br/><br/>Job
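Since the question was after a table, here is a minimal sketch of how the nested 'jobs' list could be flattened into one row per job. The helper name flatten_jobs and the choice of columns are illustrative assumptions, and the sample dict is a trimmed stand-in copied from the output above, not a live response.

```python
def flatten_jobs(json_data):
    """Turn the nested 'jobs' list into one flat dict per job."""
    rows = []
    for job in json_data.get("jobs", []):
        rows.append({
            "id": job.get("id"),
            "title": job.get("title"),
            # companyData is a nested dict; guard against it being None
            "company": (job.get("companyData") or {}).get("companyName"),
            "min_exp": job.get("min"),
            "max_exp": job.get("max"),
            # locations and tags are lists of {'id': ..., 'name': ...} dicts
            "locations": ", ".join(loc["name"] for loc in job.get("locations") or []),
            "tags": ", ".join(tag["name"] for tag in job.get("tags") or []),
            "url": job.get("jobDetailUrl"),
        })
    return rows


# Trimmed stand-in for the API response, copied from the output above.
sample = {
    "count": 58,
    "jobs": [{
        "id": 982486,
        "title": "Software Engineer - ASP/C# (1-4 yrs)",
        "min": 1, "max": 4,
        "jobDetailUrl": "https://www.hirist.com/j/software-engineer-aspc-1-4-yrs-982486.html?ref=ambitionbox",
        "tags": [{"id": 206, "name": "C#"}, {"id": 387, "name": "SQL Server"}],
        "locations": [{"id": 70, "name": "Cochin/Kochi"}, {"id": 17, "name": "Kerala"}],
        "companyData": {"companyId": 0, "companyName": "Sapwood Ventures"},
    }],
}

rows = flatten_jobs(sample)
print(rows[0]["title"], "|", rows[0]["company"], "|", rows[0]["locations"])
```

With the real response, pd.DataFrame(flatten_jobs(jsonData)) should give one row per job. pd.json_normalize(jsonData['jobs']) is an alternative that expands the nested dicts into dotted column names automatically.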

Related

Parse urllib request

I have the following huge output from this code: urllib.request.urlopen("https://api...").read()
This looks like a JSON object, but it is a bytes object. I want to get at the data inside it, and I am not sure how to parse all these nested dictionaries. Any help would be appreciated. I want to extract the value 112242287903649, located near the end.
b'{"address":"0x4264422fa4c1e60c2ee10d19549c0775fe544d7c","ETH":{"balance":39234.92760140797,"price":{"rate":406.0918669863694,"diff":3.33,"diff7d":7.19,"ts":1603860182,"marketCapUsd":45964513524.05101,"availableSupply":113187476.1865,"volume24h":14765115042.093159,"diff30d":14.028844201369225}},"countTxs":7,"tokens":[{"tokenInfo":{"address":"0x0d4b4da5fb1a7d55e85f8e22f728701ceb6e44c9","name":"DigiMax","decimals":"18","symbol":"DGMT","totalSupply":"1000000000000000000000000000","owner":"0x","lastUpdated":1603831313,"issuancesCount":0,"holdersCount":1042,"description":"DigiMax (DGMT) is a de-centralized Currency on ETHEREUM NETWORK. It is trustless, non-custodial, Layer-2 scaling solution for transferring value on Ethereum. It is Open Source. Community oriented and powered to maximize the power of the blockchain technology","website":"https://digimaxtoken.io/","twitter":"DigiMax_DGMT","image":"/images/DGMT0d4b4da5.png","telegram":"https://t.me/DigiMaxToken","reddit":"DigiMax_DGMT","coingecko":"digimax","price":{"rate":1.218303675e-5,"diff":3.55,"diff7d":-87.33,"ts":1603860187,"marketCapUsd":0,"availableSupply":0,"volume24h":0.36549128,"diff30d":-99.95948266499424,"currency":"USD"}},"balance":3.9e+19,"totalIn":0,"totalOut":0},{"tokenInfo":{"address":"0x28cb7e841ee97947a86b06fa4090c8451f64c0be","name":"YF Link","decimals":"18","symbol":"YFL","totalSupply":"52000000000000000000000","owner":"0x","lastUpdated":1603851830,"issuancesCount":0,"holdersCount":5164,"image":"/images/YFL28cb7e84.png","website":"https://yflink.io/","telegram":"https://t.me/YFLinkGroup","twitter":"YFLinkio","coingecko":"yflink","price":{"rate":411.62315709142763,"diff":2.44,"diff7d":22.67,"ts":1603860243,"marketCapUsd":20628385.985420085,"availableSupply":50114.73633112,"volume24h":673808.77973096,"diff30d":-9.745291974110742,"currency":"USD"},"publicTags":["Yield 
Farming","Yearn","Governance"]},"balance":69000000000000,"totalIn":0,"totalOut":0},{"tokenInfo":{"address":"0x618e75ac90b12c6049ba3b27f5d5f8651b0037f6","name":"QASH","decimals":"6","symbol":"QASH","totalSupply":"1000000000000000","owner":"0x9fa8a9cd0bd7cbfc503513bc94cd3b3a9ca90e35","lastUpdated":1603818056,"issuancesCount":0,"holdersCount":13087,"website":"https://liquid.plus/","facebook":"LiquidGlobal","telegram":"https://t.me/QUOINENews","twitter":"Liquid_Global","image":"/images/QASH618e75ac.jpeg","reddit":"liquid","coingecko":"qash","ethTransfersCount":2,"price":{"rate":0.03783789848158,"diff":2.83,"diff7d":0.05,"ts":1603860243,"marketCapUsd":13243264.468553,"availableSupply":350000000,"volume24h":170565.95092274,"diff30d":-5.421371004476654,"currency":"USD"},"publicTags":["Exchange"]},"balance":112242287903649,"totalIn":0,"totalOut":0},{"tokenInfo":{"address":"0x9f7229af0c4b9740e207ea283b9094983f78ba04","decimals":"18","name":"Tadpole","owner":"0x","symbol":"TAD","totalSupply":"1000000000000000000000000","lastUpdated":1603859098,"issuancesCount":0,"holdersCount":597,"price":false},"balance":100000000000000,"totalIn":0,"totalOut":0}]}'

The built-in json module is perfectly capable of parsing byte strings (json.loads accepts bytes as of Python 3.6):
import json
import urllib.request
response = urllib.request.urlopen("https://api/endpoint").read()
jsondat = json.loads(response)
Now you can use jsondat like any other dictionary and extract whichever nested property you need.
Note that you can also use the requests module to do this a bit more simply, though you don't have to in this case:
import requests
jsondat = requests.get("https://api/endpoint").json()
Calling json.loads on your byte string yields:
{'address': '0x4264422fa4c1e60c2ee10d19549c0775fe544d7c',
'ETH': {'balance': 39234.92760140797,
'price': {'rate': 406.0918669863694,
'diff': 3.33,
'diff7d': 7.19,
'ts': 1603860182,
'marketCapUsd': 45964513524.05101,
'availableSupply': 113187476.1865,
'volume24h': 14765115042.093159,
'diff30d': 14.028844201369225}},
'countTxs': 7,
'tokens': [{'tokenInfo': {'address': '0x0d4b4da5fb1a7d55e85f8e22f728701ceb6e44c9',
'name': 'DigiMax',
'decimals': '18',
'symbol': 'DGMT',
'totalSupply': '1000000000000000000000000000',
'owner': '0x',
'lastUpdated': 1603831313,
'issuancesCount': 0,
'holdersCount': 1042,
'description': 'DigiMax (DGMT) is a de-centralized Currency on ETHEREUM NETWORK. It is trustless, non-custodial, Layer-2 scaling solution for transferring value on Ethereum. It is Open Source. Community oriented and powered to maximize the power of the blockchain technology',
'website': 'https://digimaxtoken.io/',
'twitter': 'DigiMax_DGMT',
'image': '/images/DGMT0d4b4da5.png',
'telegram': 'https://t.me/DigiMaxToken',
'reddit': 'DigiMax_DGMT',
'coingecko': 'digimax',
'price': {'rate': 1.218303675e-05,
'diff': 3.55,
'diff7d': -87.33,
'ts': 1603860187,
'marketCapUsd': 0,
'availableSupply': 0,
'volume24h': 0.36549128,
'diff30d': -99.95948266499424,
'currency': 'USD'}},
'balance': 3.9e+19,
'totalIn': 0,
'totalOut': 0},
{'tokenInfo': {'address': '0x28cb7e841ee97947a86b06fa4090c8451f64c0be',
'name': 'YF Link',
'decimals': '18',
'symbol': 'YFL',
'totalSupply': '52000000000000000000000',
'owner': '0x',
'lastUpdated': 1603851830,
'issuancesCount': 0,
'holdersCount': 5164,
'image': '/images/YFL28cb7e84.png',
'website': 'https://yflink.io/',
'telegram': 'https://t.me/YFLinkGroup',
'twitter': 'YFLinkio',
'coingecko': 'yflink',
'price': {'rate': 411.62315709142763,
'diff': 2.44,
'diff7d': 22.67,
'ts': 1603860243,
'marketCapUsd': 20628385.985420085,
'availableSupply': 50114.73633112,
'volume24h': 673808.77973096,
'diff30d': -9.745291974110742,
'currency': 'USD'},
'publicTags': ['Yield Farming', 'Yearn', 'Governance']},
'balance': 69000000000000,
'totalIn': 0,
'totalOut': 0},
{'tokenInfo': {'address': '0x618e75ac90b12c6049ba3b27f5d5f8651b0037f6',
'name': 'QASH',
'decimals': '6',
'symbol': 'QASH',
'totalSupply': '1000000000000000',
'owner': '0x9fa8a9cd0bd7cbfc503513bc94cd3b3a9ca90e35',
'lastUpdated': 1603818056,
'issuancesCount': 0,
'holdersCount': 13087,
'website': 'https://liquid.plus/',
'facebook': 'LiquidGlobal',
'telegram': 'https://t.me/QUOINENews',
'twitter': 'Liquid_Global',
'image': '/images/QASH618e75ac.jpeg',
'reddit': 'liquid',
'coingecko': 'qash',
'ethTransfersCount': 2,
'price': {'rate': 0.03783789848158,
'diff': 2.83,
'diff7d': 0.05,
'ts': 1603860243,
'marketCapUsd': 13243264.468553,
'availableSupply': 350000000,
'volume24h': 170565.95092274,
'diff30d': -5.421371004476654,
'currency': 'USD'},
'publicTags': ['Exchange']},
'balance': 112242287903649,
'totalIn': 0,
'totalOut': 0},
{'tokenInfo': {'address': '0x9f7229af0c4b9740e207ea283b9094983f78ba04',
'decimals': '18',
'name': 'Tadpole',
'owner': '0x',
'symbol': 'TAD',
'totalSupply': '1000000000000000000000000',
'lastUpdated': 1603859098,
'issuancesCount': 0,
'holdersCount': 597,
'price': False},
'balance': 100000000000000,
'totalIn': 0,
'totalOut': 0}]}
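To pull out the specific value asked about, a short sketch along these lines should work. The byte string here is a trimmed copy of the one in the question (two of the four tokens), so the list index differs from the full response, where QASH is the third entry (index 2).

```python
import json

# Trimmed copy of the byte string from the question (two of the four tokens).
raw = (b'{"address":"0x4264422fa4c1e60c2ee10d19549c0775fe544d7c","tokens":['
       b'{"tokenInfo":{"name":"YF Link","symbol":"YFL"},"balance":69000000000000},'
       b'{"tokenInfo":{"name":"QASH","symbol":"QASH"},"balance":112242287903649}]}')

data = json.loads(raw)

# Positional access: fine if the token order is known and stable.
print(data["tokens"][1]["balance"])

# Lookup by symbol: does not depend on the order of the list.
balances = {t["tokenInfo"]["symbol"]: t["balance"] for t in data["tokens"]}
print(balances["QASH"])
```

Looking the token up by symbol is usually the safer choice, since an API may return the list in a different order between calls.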

How to save a JSON file in Python from an API response when the class is a list and the object is not serializable

I have tried to find the answer but could not.
I am looking for a way to save a JSON file to my computer from Python.
I call the API like this:
configuration = api.Configuration()
configuration.api_key['X-XXXX-Application-ID'] = 'xxxxxxx'
configuration.api_key['X-XXX-Application-Key'] = 'xxxxxxxx1'
## List our parameters as search operators
opts = {
    'title': 'Deutsche Bank',
    'body': 'fraud',
    'language': ['en'],
    'published_at_start': 'NOW-7DAYS',
    'published_at_end': 'NOW',
    'per_page': 1,
    'sort_by': 'relevance'
}
try:
    ## Make a call to the Stories endpoint for stories that meet the criteria of the search operators
    api_response = api_instance.list_stories(**opts)
    ## Print the returned story
    pp(api_response.stories)
except ApiException as e:
    print('Exception when calling DefaultApi->list_stories: %s\n' % e)
I got a response like this:
[{'author': {'avatar_url': None, 'id': 1688440, 'name': 'Pranav Nair'},
'body': 'The law firm will investigate whether the bank or its officials have '
'engaged in securities fraud or unlawful business practices. '
'Industries: Bank Referenced Companies: Deutsche Bank',
'categories': [{'confident': False,
'id': 'IAB11-5',
'level': 2,
'links': {'_self': 'https://,
'parent': 'https://'},
'score': 0.39,
'taxonomy': 'iab-qag'},
{'confident': False,
'id': 'IAB3-12',
'level': 2,
'links': {'_self': 'https://api/v1/classify/taxonomy/iab-qag/IAB3-12',
'score': 0.16,
'taxonomy': 'iab-qag'},
'clusters': [],
'entities': {'body': [{'indices': [[168, 180]],
'links': {'dbpedia': 'http://dbpedia.org/resource/Deutsche_Bank'},
'score': 1.0,
'text': 'Deutsche Bank',
'types': ['Bank',
'Organisation',
'Company',
'Banking',
'Agent']},
{'indices': [[80, 95]],
'links': {'dbpedia': 'http://dbpedia.org/resource/Securities_fraud'},
'score': 1.0,
'text': 'securities fraud',
'types': ['Practice', 'Company']},
'hashtags': ['#DeutscheBank', '#Bank', '#SecuritiesFraud'],
'id': 3004661328,
'keywords': ['Deutsche',
'behalf',
'Bank',
'firm',
'investors',
'Deutsche Bank',
'bank',
'fraud',
'unlawful'],
'language': 'en',
'links': {'canonical': None,
'coverages': '/coverages?story_id=3004661328',
'permalink': 'https://www.snl.com/interactivex/article.aspx?KPLT=7&id=58657069',
'related_stories': '/related_stories?story_id=3004661328'},
'media': [],
'paragraphs_count': 1,
'published_at': datetime.datetime(2020, 5, 19, 16, 8, 5, tzinfo=tzutc()),
'sentences_count': 2,
'sentiment': {'body': {'polarity': 'positive', 'score': 0.599704},
'title': {'polarity': 'neutral', 'score': 0.841333}},
'social_shares_count': {'facebook': [],
'google_plus': [],
'source': {'description': None,
'domain': 'snl.com',
'home_page_url': 'http://www.snl.com/',
'id': 8256,
'links_in_count': None,
'locations': [{'city': 'Charlottesville',
'country': 'US',
'state': 'Virginia'}],
'logo_url': None,
'name': 'SNL Financial',
'scopes': [{'city': None,
'country': 'US',
'level': 'national',
'state': None},
{'city': None,
'country': None,
'level': 'international',
'state': None}],
'title': None},
'summary': {'sentences': ['The law firm will investigate whether the bank or '
'its officials have engaged in securities fraud or '
'unlawful business practices.',
'Industries: Bank Referenced Companies: Deutsche '
'Bank']},
'title': "Law firm to investigate Deutsche Bank's US ops on behalf of "
'investors',
'translations': {'en': None},
'words_count': 26}]
The documentation says: "Stories you retrieve from the API are returned as JSON objects by default. These JSON story objects contain 22 top-level fields, whereas a full story object will contain 95 unique data points"
The response is a list. When I try to save it as a JSON file I get the error "TypeError: Object of type Story is not JSON serializable".
How can I save a JSON file to my computer?

What you printed is not JSON. JSON uses double quotes and null, but here the strings are in single quotes and the missing values are None. Paste your response into the following link to see the issues:
http://json.parser.online.fr/
If you change it like
[{"author": {"avatar_url": null, "id": 1688440, "name": "Pranav Nair"},
"body": "......
it will work. You can then use the Python json module to load it. Note that json.loads expects the JSON text as a string, not a dict:
import json
json.loads(the_json_text_from_response)
Ideally the API provider would hand back valid JSON. Until then, you have to convert the result yourself before loading or saving it.
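For the TypeError itself, one common approach is to pass a default function to json.dump so that values the encoder does not understand get converted first. The sketch below is written under assumptions: the Story class is a hypothetical stand-in for the client library's model, the to_dict check covers clients that happen to offer such a method, and the file name stories.json is arbitrary.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


class Story:
    """Hypothetical stand-in for the client library's Story model."""

    def __init__(self, story_id, title, published_at):
        self.id = story_id
        self.title = title
        self.published_at = published_at


def to_jsonable(obj):
    """Fallback for values the json encoder cannot handle on its own."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if hasattr(obj, "to_dict"):    # assumption: some generated clients offer this
        return obj.to_dict()
    if hasattr(obj, "__dict__"):   # generic fallback: the instance attributes
        return vars(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


stories = [Story(3004661328,
                 "Law firm to investigate Deutsche Bank's US ops on behalf of investors",
                 datetime(2020, 5, 19, 16, 8, 5, tzinfo=timezone.utc))]

# Write the list to disk; default= is called for every non-serializable value,
# including the datetime nested inside each story.
path = os.path.join(tempfile.gettempdir(), "stories.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(stories, f, default=to_jsonable, indent=2)

# Read it back to confirm the file holds valid JSON.
with open(path, encoding="utf-8") as f:
    saved = json.load(f)
print(saved[0]["published_at"])
```

In the question's code the list to pass would be api_response.stories. If the real model keeps its data in private attributes, the vars() fallback will expose those names as-is, so a dedicated conversion method is preferable where the library provides one.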

How do I turn my JSON into something easier to read?

My Python script connects to an API and gets some JSON.
I've been trying out prettyprint, parse, loads, dumps but I haven't figured them out yet...
Right now, when I do print(request.json()) I get this:
{'info': {'status': 'OK', 'time': {'seconds': 0.050006151199341, 'human': '50 milliseconds'}},
'datalist': {'total': 1, 'count': 1, 'offset': 0, 'limit': 3, 'next': 1, 'hidden': 0, 'loaded': True, 'list': [
{'id': 27862209, 'name': 'Fate/Grand Order', 'package': 'com.xiaomeng.fategrandorder',
'uname': 'komoe-game-fate-go', 'size': 49527668,
'icon': 'http://pool.img.xxxxx.com/msi8/9b58a48638b480c17135a10810374bd6_icon.png',
'graphic': 'http://pool.img.xxxxx.com/msi8/3a240b50ac37a9824b9ac99f1daab8c8_fgraphic_705x345.jpg',
'added': '2017-05-20 10:54:53', 'modified': '2017-05-20 10:54:53', 'updated': '2018-02-12 12:35:51',
'uptype': 'regular', 'store': {'id': 750918, 'name': 'msi8',
'avatar': 'http://pool.img.xxxxx.com/msi8/c61a8cfe9f68bfcfb71ef59b46a8ae5d_ravatar.png',
'appearance': {'theme': 'grey',
'description': '❤️ Welcome To Msi8 Store & My Store Will Mostly Be Specialized in Games With OBB File Extension. I Hope You Find What You Are Looking For Here ❤️'},
'stats': {'apps': 20776, 'subscribers': 96868, 'downloads': 25958359}},
'file': {'vername': '1.14.5', 'vercode': 52, 'md5sum': 'xxxxx', 'filesize': 49527668,
'path': 'http://pool.apk.xxxxx.com/msi8/com-xiaomeng-fategrandorder-52-27862209-32a264b031d6933514970c43dea4191f.apk',
'path_alt': 'http://pool.apk.xxxxx.com/msi8/alt/Y29tLXhpYW9tZW5nLWZhdGVncmFuZG9yZGVyLTUyLTI3ODYyMjA5LTMyYTI2NGIwMzFkNjkzMzUxNDk3MGM0M2RlYTQxOTFm.apk',
'malware': {'rank': 'UNKNOWN'}},
'stats': {'downloads': 432, 'pdownloads': 452, 'rating': {'avg': 0, 'total': 0},
'prating': {'avg': 0, 'total': 0}}, 'has_versions': False, 'obb': None,
'xxxxx': {'advertising': False, 'billing': False}}]}}
But I want it to look like this:

>>> import json
>>> a={"some":"json", "a":{"b":[1,2,3,4]}}
>>> print(json.dumps(a, indent=4, sort_keys=True))
{
    "a": {
        "b": [
            1,
            2,
            3,
            4
        ]
    },
    "some": "json"
}
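Applied to the response from the question, the same call might look like the sketch below. The data dict is a trimmed stand-in for what request.json() returned, not the full payload.

```python
import json

# Trimmed stand-in for what request.json() returned in the question.
data = {
    "info": {"status": "OK", "time": {"seconds": 0.050006151199341, "human": "50 milliseconds"}},
    "datalist": {
        "total": 1,
        "list": [{"id": 27862209, "name": "Fate/Grand Order", "size": 49527668}],
    },
}

# indent=4 adds the line breaks and nesting; the key order is left as received.
pretty = json.dumps(data, indent=4)
print(pretty)
```

If the data contains non-ASCII text, such as the emoji in the store description, passing ensure_ascii=False keeps it readable instead of turning it into \uXXXX escapes. pprint.pprint(data) from the standard library is another option when Python-style output is good enough.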

Grab all nested keys in json structure

Let's say I read in the following json file.
import json
import urllib.request
text = "NASCAR"
with urllib.request.urlopen(f'https://en.wikipedia.org/w/api.php?action=query&generator=search&gsrsearch=morelike:{text}&format=json') as url:
    more_like_data = json.loads(url.read().decode())
I am trying to extract each of the "titles" contained in query >> pages >> [random page number] and store that in a list. My attempt to do so looked like this
more_like_titles = list([page_number.get('title') for page_number in more_like_data.get('query').get('pages')])
print(more_like_titles)
I get the error
"AttributeError: 'str' object has no attribute 'get'"
I'm not sure why it's reading the value in as a string, as in the JSON file that is loaded it clearly appears as a dictionary. See here:
{'batchcomplete': '',
'continue': {'continue': 'gsroffset||', 'gsroffset': 10},
'query': {'pages': {'147515': {'index': 6,
'ns': 0,
'pageid': 147515,
'title': 'NASCAR Xfinity Series'},
'14855318': {'index': 4,
'ns': 0,
'pageid': 14855318,
'title': 'Criticism of NASCAR'},
'17138753': {'index': 9,
'ns': 0,
'pageid': 17138753,
'title': 'List of NASCAR drivers who have '
'won in each of top three series'},
'2201365': {'index': 5,
'ns': 0,
'pageid': 2201365,
'title': 'Buschwhacker'},
'35514289': {'index': 1,
'ns': 0,
'pageid': 35514289,
'title': 'List of female NASCAR drivers'},
'40853273': {'index': 7,
'ns': 0,
'pageid': 40853273,
'title': 'Daniel Hemric'},
'43410277': {'index': 10,
'ns': 0,
'pageid': 43410277,
'title': '2015 NASCAR Camping World Truck '
'Series'},
'47112554': {'index': 8,
'ns': 0,
'pageid': 47112554,
'title': 'Ryan Preece'},
'47828021': {'index': 3,
'ns': 0,
'pageid': 47828021,
'title': '2016 NASCAR Xfinity Series'},
'5082163': {'index': 2,
'ns': 0,
'pageid': 5082163,
'title': 'NASCAR Whelen Modified Tour'}}}}
Any thoughts?

When you're having trouble with a list comprehension, breaking it down into a plain loop is a good first step. Your issue is that you iterate over the pages dictionary directly, and iterating over a dictionary yields only its keys. Here those are the page-number strings, hence the "'str' object has no attribute 'get'" error. I've fixed your list comprehension below using Python's built-in .items(), which yields key/value pairs:
more_like_titles = [vals.get('title') for page_number, vals in more_like_data.get('query').get('pages').items()]
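The difference is easy to see on a small example. The dict below is a trimmed stand-in for more_like_data copied from the question's output. Sorting by the 'index' field assumes that field reflects the search rank, which the question does not confirm.

```python
# Trimmed stand-in for more_like_data, copied from the question's output.
more_like_data = {
    "query": {"pages": {
        "147515": {"index": 6, "pageid": 147515, "title": "NASCAR Xfinity Series"},
        "35514289": {"index": 1, "pageid": 35514289, "title": "List of female NASCAR drivers"},
        "5082163": {"index": 2, "pageid": 5082163, "title": "NASCAR Whelen Modified Tour"},
    }}
}

pages = more_like_data["query"]["pages"]

# Iterating over the dict itself yields the keys, which are strings.
key_types = [type(key).__name__ for key in pages]
print(key_types)

# .values() skips the keys entirely when only the page dicts are needed.
titles = [page["title"] for page in pages.values()]
print(titles)

# Assumption: 'index' is the search rank, so sorting by it restores that order.
ranked = [page["title"] for page in sorted(pages.values(), key=lambda p: p["index"])]
print(ranked)
```

Since the page numbers are not used, .values() reads a little more directly than .items() here; both give the same titles.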

Parsing data in a dict

I have a dict that I am trying to obtain certain data from. An example of this dict is as follows:
{
'totalGames': 1,
'dates': [{
'totalGames': 1,
'totalMatches': 0,
'matches': [],
'totalEvents': 0,
'totalItems': 1,
'games': [{
'status': {
'codedGameState': '7',
'abstractGameState': 'Final',
'startTimeTBD': False,
'detailedState': 'Final',
'statusCode': '7',
},
'season': '20172018',
'gameDate': '2018-05-20T19:00:00Z',
'venue': {'link': '/api/v1/venues/null',
'name': 'Bell MTS Place'},
'gameType': 'P',
'teams': {'home': {'leagueRecord': {'wins': 9,
'losses': 8, 'type': 'league'}, 'score': 1,
'team': {'link': '/api/v1/teams/52',
'id': 52, 'name': 'Winnipeg Jets'}},
'away': {'leagueRecord': {'wins': 12,
'losses': 3, 'type': 'league'}, 'score': 2,
'team': {'link': '/api/v1/teams/54',
'id': 54, 'name': 'Vegas Golden Knights'}}},
'content': {'link': '/api/v1/game/2017030325/content'},
'link': '/api/v1/game/2017030325/feed/live',
'gamePk': 2017030325,
}],
'date': '2018-05-20',
'events': [],
}],
'totalMatches': 0,
'copyright': 'NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. \xa9 NHL 2018. All Rights Reserved.',
'totalEvents': 0,
'totalItems': 1,
'wait': 10,
}
I am interested in obtaining the score for a certain team if they played that night. For example, if my team of interest is the Vegas Golden Knights, I would like to create a variable that contains their score (2 in this case). I am completely stuck on this, so any help would be greatly appreciated!

The parsing gets ugly, but it is easily doable by following the JSON structure; I would recommend flattening the structure for your purposes. With that said, if you'd like to find the score of a particular team on a particular date, you could do this:
def find_score_by_team(gamedict, team_of_interest, date_of_interest):
    for date in gamedict['dates']:
        for game in date['games']:
            # gameDate is an ISO timestamp, so a 'YYYY-MM-DD' prefix selects the day
            if game['gameDate'].startswith(date_of_interest):
                # advantage is the key under 'teams': 'home' or 'away'
                for advantage in game['teams']:
                    if game['teams'][advantage]['team']['name'] == team_of_interest:
                        return game['teams'][advantage]['score']
    return -1
Example query:
>>> d = {'totalGames':1,'dates':[{'totalGames':1,'totalMatches':0,'matches':[],'totalEvents':0,'totalItems':1,'games':[{'status':{'codedGameState':'7','abstractGameState':'Final','startTimeTBD':False,'detailedState':'Final','statusCode':'7',},'season':'20172018','gameDate':'2018-05-20T19:00:00Z','venue':{'link':'/api/v1/venues/null','name':'BellMTSPlace'},'gameType':'P','teams':{'home':{'leagueRecord':{'wins':9,'losses':8,'type':'league'},'score':1,'team':{'link':'/api/v1/teams/52','id':52,'name':'WinnipegJets'}},'away':{'leagueRecord':{'wins':12,'losses':3,'type':'league'},'score':2,'team':{'link':'/api/v1/teams/54','id':54,'name':'VegasGoldenKnights'}}},'content':{'link':'/api/v1/game/2017030325/content'},'link':'/api/v1/game/2017030325/feed/live','gamePk':2017030325,}],'date':u'2018-05-20','events':[],}],'totalMatches':0,'copyright':'NHLandtheNHLShieldareregisteredtrademarksoftheNationalHockeyLeague.NHLandNHLteammarksarethepropertyoftheNHLanditsteams.\xa9NHL2018.AllRightsReserved.','totalEvents':0,'totalItems':1,'wait':10,}
>>> find_score_by_team(d, 'VegasGoldenKnights', '2018-05-20')
2
This returns -1 if the team didn't play that night, otherwise it returns the team's score.
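The flattening suggested above could be sketched as follows. The helper name flatten_scores and the record fields are illustrative choices, and the sample dict is a trimmed stand-in for the one in the question.

```python
def flatten_scores(gamedict):
    """Return one record per team per game: date, side, team name, score."""
    records = []
    for date in gamedict.get("dates", []):
        for game in date.get("games", []):
            for side, info in game["teams"].items():
                records.append({
                    "date": date["date"],
                    "side": side,  # 'home' or 'away'
                    "team": info["team"]["name"],
                    "score": info["score"],
                })
    return records


# Trimmed stand-in for the dict in the question.
sample = {"dates": [{"date": "2018-05-20", "games": [{
    "gameDate": "2018-05-20T19:00:00Z",
    "teams": {
        "home": {"score": 1, "team": {"id": 52, "name": "Winnipeg Jets"}},
        "away": {"score": 2, "team": {"id": 54, "name": "Vegas Golden Knights"}},
    },
}]}]}

records = flatten_scores(sample)

# None (rather than -1) marks "did not play", so it cannot be mistaken for a score.
score = next((r["score"] for r in records
              if r["team"] == "Vegas Golden Knights" and r["date"] == "2018-05-20"), None)
print(score)
```

Once the data is a flat list of records, filtering by team, date or side is a one-line comprehension, and the list can be handed to other tools without further unpacking.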
