How can I get multiple layers in JSON format - Python

I'm using Python 3.4.1 with Google Custom Search.
I want to get the links, but my code raises TypeError: string indices must be integers.
Below are my code and the JSON format.
from urllib.request import urlopen
import json

u = urlopen('https://www.googleapis.com/customsearch/v1?key=AIzaSyC3jpmwO3Ieifw1VnrVoL3mS3KSE_GMRvo&cx=010407088344546736418:onjj7gscy2g&q=lol&num=10')
resp = json.loads(u.read().decode('utf-8'))

for link in resp:
    for k in link['item']:
        print(k['link'])
And the JSON format is like below.
"items": [
{
"kind": "customsearch#result",
"title": "League of Legends",
"htmlTitle": "<b>League of Legends</b>",
"link": "http://leagueoflegends.com/",
"displayLink": "leagueoflegends.com",
"snippet": "Official site. Features, media, screenshots, FAQs, and forums.",
"htmlSnippet": "Official site. Features, media, screenshots, FAQs, and forums.",
"cacheId": "GCRD1wy5e3QJ",
"formattedUrl": "leagueoflegends.com/",
"htmlFormattedUrl": "<b>leagueoflegends</b>.com/",
"pagemap": {
"cse_image": [
{
"src": "http://na.leagueoflegends.com/sites/default/files/styles/wide_small/public/upload/pool_party_201_splash_1920.jpg?itok=QGxFrikL"
}
],
"cse_thumbnail": [
{
"width": "256",
"height": "144",
"src": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSvyCGlnn9a7N13rjwbPvSNemH-mbqzC6otkcJgeOK-6c1dkcMP6XIumTXG"
}
],

The response resp is a single JSON object (a dict), not a list of results. Iterating over a dict yields its string keys, and indexing a string with ['item'] is what raises the TypeError. The results live in the list under the 'items' key, so change the last three lines to:

for item in resp['items']:
    print(item['link'])
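As a minimal end-to-end sketch (same request as above; the .get('items', []) default is an addition of mine to guard against responses that contain no results):

from urllib.request import urlopen
import json

u = urlopen('https://www.googleapis.com/customsearch/v1?key=AIzaSyC3jpmwO3Ieifw1VnrVoL3mS3KSE_GMRvo&cx=010407088344546736418:onjj7gscy2g&q=lol&num=10')
resp = json.loads(u.read().decode('utf-8'))

# 'items' is absent when the search returns no results, so default to an empty list
for item in resp.get('items', []):
    print(item['link'])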

Related

Scraping Google Maps with Python and bs4 Without API

I'm trying to get data from Google Maps with Python and BeautifulSoup, for example pharmacies in a city. I want the location data (lat/lon), the name of the pharmacy (e.g., MDC Pharmacy), its score (3.2), the number of reviews (10), the address with zip code, and the phone number of the pharmacy.
I have tried Python and BeautifulSoup but I'm stuck because I don't know how to extract the data. The class-based lookup isn't working: when I prettify and print the results I can see all of the data, but the find() call returns NoneType. So how can I clean the data for a pandas DataFrame? I need more code, both to clean the data and to add it to a list or DataFrame. Here is my code:
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.google.com.tr/maps/search/eczane/@37.4809437,36.7749346,57378m/data=!3m1!1e3")
soup = BeautifulSoup(r.content, "lxml")
a = soup.prettify()
l = soup.find("div", {"class": "mapsConsumerUiSubviewSectionGm2Placeresultcontainer__result-container mapsConsumerUiSubviewSectionGm2Placeresultcontainer__one-action mapsConsumerUiSubviewSectionGm2Placeresultcontainer__wide-margin"})
print(a)
[Screenshot: printed result]
I have this result and I need to extract the data from it (above).
I want a result like this table (below). Thanks...
[Screenshot: desired table, just a sample]
You don't need selenium for this. You don't even need BeautifulSoup (in fact, it doesn't help at all). Here is code that fetches the page, isolates the initialization data JSON, decodes it, and prints the resulting Python structure.
You will need to print out the structure and do some counting to find the data you want, but it's all there (a small search helper follows the code).
import requests
import json
from pprint import pprint

r = requests.get("https://www.google.com.tr/maps/search/eczane/@37.4809437,36.7749346,57378m/data=!3m1!1e3")
txt = r.text

# The page embeds its data as a JSON literal assigned to
# window.APP_INITIALIZATION_STATE; slice out the text between the
# assignment and the statement that follows it.
find1 = "window.APP_INITIALIZATION_STATE="
find2 = ";window.APP"
i1 = txt.find(find1)
i2 = txt.find(find2, i1 + 1)
js = txt[i1 + len(find1):i2]

data = json.loads(js)
pprint(data)
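The indices inside APP_INITIALIZATION_STATE are deeply nested and undocumented, so one way to do the "counting" is a small recursive search. This is a sketch (the helper name and search term are mine, not part of any API) that prints the index path to every string containing a given substring:

def find_paths(obj, needle, path=()):
    """Recursively walk nested lists/dicts and print the index path
    to every string value containing `needle`."""
    if isinstance(obj, str):
        if needle in obj:
            print(path, "->", obj[:80])
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            find_paths(v, needle, path + (i,))
    elif isinstance(obj, dict):
        for k, v in obj.items():
            find_paths(v, needle, path + (k,))

# e.g. locate where the pharmacy names live in the structure
find_paths(data, "Eczane")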
It might be also worth looking into a third party solution like SerpApi. It's a paid API with a free trial.
Example python code (available in other libraries also):
from serpapi import GoogleSearch

params = {
    "api_key": "secret_api_key",
    "engine": "google_maps",
    "q": "eczane",
    "google_domain": "google.com",
    "hl": "en",
    "ll": "@37.5393407,36.707705,11z",
    "type": "search"
}

search = GoogleSearch(params)
results = search.get_dict()
Example JSON output:
"local_results": [
{
"position": 1,
"title": "Ocak Eczanesi",
"place_id": "ChIJcRipbonnLRUR4DG-UuCnB2I",
"data_id": "0x152de7896ea91871:0x6207a7e052be31e0",
"data_cid": "7063799122456621536",
"reviews_link": "https://serpapi.com/search.json?data_id=0x152de7896ea91871%3A0x6207a7e052be31e0&engine=google_maps_reviews&hl=en",
"photos_link": "https://serpapi.com/search.json?data_id=0x152de7896ea91871%3A0x6207a7e052be31e0&engine=google_maps_photos&hl=en",
"gps_coordinates": {
"latitude": 37.5775156,
"longitude": 36.957789399999996
},
"place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x152de7896ea91871%3A0x6207a7e052be31e0%218m2%213d37.5775156%214d36.957789399999996&engine=google_maps&google_domain=google.com&hl=en&type=place",
"rating": 3.5,
"reviews": 8,
"type": "Drug store",
"address": "Kanuni Mh. Milcan Cd. Pk:46100 Merkez, 46100 Dulkadiroğlu/Kahramanmaraş, Turkey",
"open_state": "Closes soon ⋅ 6PM ⋅ Opens 8:30AM Fri",
"hours": "Closing soon: 6:00 PM",
"phone": "+90 344 231 68 00",
"website": "https://kahramanmaras.bel.tr/nobetci-eczaneler",
"thumbnail": "https://lh5.googleusercontent.com/p/AF1QipN5CQRdoKc_BdCgSDiEdi0nEkk1X_VUy1PP4wN3=w93-h92-k-no"
},
{
"position": 2,
"title": "Nobetci eczane",
"place_id": "ChIJP4eh2WndLRURD6IcnOov0dA",
"data_id": "0x152ddd69d9a1873f:0xd0d12fea9c1ca20f",
"data_cid": "15046860514709512719",
"reviews_link": "https://serpapi.com/search.json?data_id=0x152ddd69d9a1873f%3A0xd0d12fea9c1ca20f&engine=google_maps_reviews&hl=en",
"photos_link": "https://serpapi.com/search.json?data_id=0x152ddd69d9a1873f%3A0xd0d12fea9c1ca20f&engine=google_maps_photos&hl=en",
"gps_coordinates": {
"latitude": 37.591462,
"longitude": 36.8847051
},
"place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x152ddd69d9a1873f%3A0xd0d12fea9c1ca20f%218m2%213d37.591462%214d36.8847051&engine=google_maps&google_domain=google.com&hl=en&type=place",
"rating": 3.3,
"reviews": 12,
"type": "Pharmacy",
"address": "Mimar Sinan, 48007. Sk. No:19, 46050 Kahramanmaraş Merkez/Kahramanmaraş, Turkey",
"open_state": "Open now",
"thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNznf-hC_y9KdijwUMqdO9YIcn7rbN8ZQpdIHK5=w163-h92-k-no"
},
...
]
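Since the original goal was a pandas DataFrame, the list under local_results can be loaded into one directly. A minimal sketch (column names taken from the JSON above; fields missing from a record, such as phone on the second result, come through as NaN):

import pandas as pd

rows = results["local_results"]
df = pd.DataFrame(rows)[["title", "rating", "reviews", "address", "phone"]]

# gps_coordinates is nested, so flatten it into plain columns
df["latitude"] = [r["gps_coordinates"]["latitude"] for r in rows]
df["longitude"] = [r["gps_coordinates"]["longitude"] for r in rows]
print(df)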
Check out the documentation for more details.
Disclaimer: I work at SerpApi.

Python - How to retrieve element from json

Aloha,
My Python routine retrieves JSON from a site, checks the file, downloads another JSON based on the first answer, and eventually downloads a zip.
The first JSON file gives information about the documents.
Here's an example:
[
  {
    "id": "d9789918772f935b2d686f523d066a7b",
    "originalName": "130010259_AC2_R44_20200101",
    "type": "SUP",
    "status": "document.deleted",
    "legalStatus": "APPROVED",
    "name": "130010259_SUP_R44_AC2",
    "grid": {
      "name": "R44",
      "title": "GRAND EST"
    },
    "bbox": [
      3.4212881,
      47.6171589,
      8.1598899,
      50.1338684
    ],
    "documentSource": "UPLOAD",
    "uploadDate": "2020-06-25T14:56:27+02:00",
    "updateDate": "2021-01-19T14:33:35+01:00",
    "fileIdentifier": "SUP-AC2-R44-130010259-20200101",
    "legalControlStatus": 101
  },
  {
    "id": "6a9013bdde6acfa632861aeb1a02942b",
    "originalName": "130010259_AC2_R44_20210101",
    "type": "SUP",
    "status": "document.production",
    "legalStatus": "APPROVED",
    "name": "130010259_SUP_R44_AC2",
    "grid": {
      "name": "R44",
      "title": "GRAND EST"
    },
    "bbox": [
      3.4212881,
      47.6171589,
      8.1598899,
      50.1338684
    ],
    "documentSource": "UPLOAD",
    "uploadDate": "2021-01-18T16:37:01+01:00",
    "updateDate": "2021-01-19T14:33:29+01:00",
    "fileIdentifier": "SUP-AC2-R44-130010259-20210101",
    "legalControlStatus": 101
  },
  {
    "id": "efd51feaf35b12248966cb82f603e403",
    "originalName": "130010259_PM2_R44_20210101",
    "type": "SUP",
    "status": "document.production",
    "legalStatus": "APPROVED",
    "name": "130010259_SUP_R44_PM2",
    "grid": {
      "name": "R44",
      "title": "GRAND EST"
    },
    "bbox": [
      3.6535762,
      47.665021,
      7.9509455,
      49.907347
    ],
    "documentSource": "UPLOAD",
    "uploadDate": "2021-01-28T09:52:31+01:00",
    "updateDate": "2021-01-28T18:53:34+01:00",
    "fileIdentifier": "SUP-PM2-R44-130010259-20210101",
    "legalControlStatus": 101
  },
  {
    "id": "2e1b6104fdc09c84077d54fd9e74a7a7",
    "originalName": "444619258_I4_R44_20210211",
    "type": "SUP",
    "status": "document.pre_production",
    "legalStatus": "APPROVED",
    "name": "444619258_SUP_R44_I4",
    "grid": {
      "name": "R44",
      "title": "GRAND EST"
    },
    "bbox": [
      2.8698336,
      47.3373246,
      8.0881368,
      50.3796449
    ],
    "documentSource": "UPLOAD",
    "uploadDate": "2021-04-19T10:20:20+02:00",
    "updateDate": "2021-04-19T14:46:21+02:00",
    "fileIdentifier": "SUP-I4-R44-444619258-20210211",
    "legalControlStatus": 100
  }
]
What I'm trying to do is retrieve the "id" values from this JSON file (e.g. "id": "2e1b6104fdc09c84077d54fd9e74a7a7").
I've tried:
import json
from jsonpath_rw import jsonpath, parse
import jsonpath_rw_ext as jp

with open('C:/temp/gpu/SUP/20210419/SUPGE.json') as f:
    d = json.load(f)
    data = json.dumps(d)
    print("oriName: {}".format(jp.match1("$.id[*]", data)))
It doesn't work. In fact, I'm not sure how jsonpath-rw is intended to work. Thankfully there was this blog post, but I'm still stuck.
Does anyone have a clue?
With the id, I'll be able to download another JSON, and in that JSON there'll be an archiveUrl to get the zip file.
Thanks in advance.
import json

with open('SUPGE.json') as f:
    d = json.load(f)

for i in d:
    print(i.get('id'))
This will give you only the ids:
d9789918772f935b2d686f523d066a7b
6a9013bdde6acfa632861aeb1a02942b
efd51feaf35b12248966cb82f603e403
2e1b6104fdc09c84077d54fd9e74a7a7
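If you'd still like a JSONPath approach: the original attempt failed for two reasons — jp.match1 was given a JSON string (the result of json.dumps) instead of the parsed Python object, and the path should be $[*].id because the document root is a list. A sketch using the jsonpath-ng package (the maintained successor to jsonpath-rw; assuming it's installed):

import json
from jsonpath_ng import parse

with open('SUPGE.json') as f:
    d = json.load(f)

# '$[*].id' selects the id field of every element of the top-level list
ids = [match.value for match in parse('$[*].id').find(d)]
print(ids)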
OK, here's what I've done.
import json
import urllib.request

# Not sure it's the best way to load JSON from a URL, but it works fine
# and I could test most of the code if needed.
def getResponse(url):
    operUrl = urllib.request.urlopen(url)
    if operUrl.getcode() == 200:
        data = operUrl.read()
        jsonData = json.loads(data)
    else:
        print("Error received", operUrl.getcode())
        jsonData = None
    return jsonData

# Here I get the JSON from the URL. In the final script this part will be
# a parameter, because I have a lot of territories to check.
d = getResponse('https://www.geoportail-urbanisme.gouv.fr/api/document?documentFamily=SUP&grid=R44&legalStatus=APPROVED')

for i in d:
    if i['status'] == 'document.production':
        print('id of the doc in production:', i.get('id'))
        # Here we use the id to fetch the whole document.
        # Same server, same API, but a different url.
        _URL = 'https://www.geoportail-urbanisme.gouv.fr/api/document/' + i.get('id') + '/details'
        d2 = getResponse(_URL)
        print('archive', d2['archiveUrl'])
        urllib.request.urlretrieve(d2['archiveUrl'], 'c:/temp/gpu/SUP/' + d2['metadata'] + '.zip')

# I used wget in the past and loved the progress bar.
# Maybe I'd switch back to wget because of it.
# Works fine.
Thanks for your answer. I'm delighted to see that even with only the json library you could do amazing things. Just normal stuff, but amazing.
Feel free to comment if you think I've missed something.

Where attachments of Steps are stored in VSTS REST API?

I am using the Python REST API of VSTS for TFS / Azure DevOps (https://github.com/Microsoft/azure-devops-python-api).
I would like to add attachments to some of the steps of my Test Cases, like I can do in the web interface.
This is how I want my step to look: [screenshot]
... and when you run it, that would look like this: [screenshot]
However, I have not been able to find where this information is stored.
This is the JSON data for the WorkItem of my Test Case:
{
  id: 224,
  rev: 2,
  fields: {
    System.AreaPath: "GM_sandbox\GM-Toto",
    System.TeamProject: "GM_sandbox",
    System.IterationPath: "GM_sandbox",
    System.WorkItemType: "Test Case",
    System.State: "Design",
    System.Reason: "New",
    System.AssignedTo: "Jeff",
    System.CreatedDate: "2019-01-03T01:43:09.743Z",
    System.CreatedBy: "Jeff",
    System.ChangedDate: "2019-01-03T02:12:07.15Z",
    System.ChangedBy: "Jeff",
    System.Title: "Titi",
    Microsoft.VSTS.Common.StateChangeDate: "2019-01-03T01:43:09.743Z",
    Microsoft.VSTS.Common.ActivatedDate: "2019-01-03T01:43:09.743Z",
    Microsoft.VSTS.Common.ActivatedBy: "Jeff",
    Microsoft.VSTS.Common.Priority: 2,
    Microsoft.VSTS.TCM.AutomationStatus: "Not Automated",
    Microsoft.VSTS.TCM.Steps: "<steps id="0" last="2"><step id="2" type="ValidateStep"><parameterizedString isformatted="true"><DIV><P>Click on the rainbow button</P></DIV></parameterizedString><parameterizedString isformatted="true"><P>Screen becomes Blue (see picture)</P></parameterizedString><description/></step></steps>"
  },
  _links: {
    self: {
      href: "https://my_server.com:8443/tfs/PRODUCT/23d89bd4-8547-4be3-aa73-13a30866f176/_apis/wit/workItems/224"
    },
    workItemUpdates: {
      href: "https://my_server.com:8443/tfs/PRODUCT/_apis/wit/workItems/224/updates"
    },
    workItemRevisions: {
      href: "https://my_server.com:8443/tfs/PRODUCT/_apis/wit/workItems/224/revisions"
    },
    workItemHistory: {
      href: "https://my_server.com:8443/tfs/PRODUCT/_apis/wit/workItems/224/history"
    },
    html: {
      href: "https://my_server.com:8443/tfs/PRODUCTi.aspx?pcguid=4107d6a2-eaaa-40b9-9a8d-f8fdbb31d4b7&id=224"
    },
    workItemType: {
      href: "https://my_server.com:8443/tfs/PRODUCT/23d89bd4-8547-4be3-aa73-13a30866f176/_apis/wit/workItemTypes/Test%20Case"
    },
    fields: {
      href: "https://my_server.com:8443/tfs/PRODUCT/23d89bd4-8547-4be3-aa73-13a30866f176/_apis/wit/fields"
    }
  },
  url: "https://my_server.com:8443/tfs/PRODUCT/23d89bd4-8547-4be3-aa73-13a30866f176/_apis/wit/workItems/224"
}
Any idea on where this information is stored?
And, if you are familiar with the Python REST API, how to add an attachment from a file and link it to the test step?
Thanks a lot
Here is the flow using just the Azure DevOps REST API.
Create the attachment:
Request:
POST https://dev.azure.com/{organization}/_apis/wit/attachments?fileName=info.txt&api-version=4.1
Body:
{"User text content to upload"}
Response:
{
  "id": "f5016cf4-4c36-4bd6-9762-b6ad60838cf7",
  "url": "https://dev.azure.com/{organization}/_apis/wit/attachments/f5016cf4-4c36-4bd6-9762-b6ad60838cf7?fileName=info.txt"
}
Create the Work Item:
Request:
PATCH https://dev.azure.com/{organization}/{project}/_apis/wit/workitems/$Test Case?api-version=4.1
Body:
[
  {
    "op": "add",
    "path": "/fields/System.Title",
    "from": null,
    "value": "Sample test case"
  },
  {
    "op": "add",
    "path": "/fields/Microsoft.VSTS.TCM.Steps",
    "value": "<steps id=\"0\" last=\"4\"><step id=\"2\" type=\"ActionStep\"><parameterizedString isformatted=\"true\"><DIV><P>test&nbsp;</P></DIV></parameterizedString><parameterizedString isformatted=\"true\"><DIV><P>&nbsp;</P></DIV></parameterizedString><description/></step><step id=\"3\" type=\"ActionStep\"><parameterizedString isformatted=\"true\"><DIV><DIV><P>test&nbsp;</P></DIV></DIV></parameterizedString><parameterizedString isformatted=\"true\"><DIV><P>&nbsp;</P></DIV></parameterizedString><description/></step><step id=\"4\" type=\"ActionStep\"><parameterizedString isformatted=\"true\"><DIV><P>test&nbsp;</P></DIV></parameterizedString><parameterizedString isformatted=\"true\"><DIV><P>&nbsp;</P></DIV></parameterizedString><description/></step></steps>"
  },
  {
    "op": "add",
    "path": "/relations/-",
    "value": {
      "rel": "AttachedFile",
      "url": "https://dev.azure.com/{organization}/_apis/wit/attachments/f5016cf4-4c36-4bd6-9762-b6ad60838cf7?fileName=info.txt",
      "attributes": {
        "comment": "[TestStep=3]:",
        "name": "info.txt"
      }
    }
  }
]
The test case that is created will look like the one below. There is something off with the step numbering used in the comment: it looks like it needs to be +1 relative to the actual step you want to reference.
The key is to have, in the attributes of the attached file, a comment containing "[TestStep=3]:" as well as a name for the attachment.
In Python, that would give something like this: create the attachment with the create_attachment function, then update the Test Case with the url, comment, and filename. So, something like this...
import re

from vsts.work_item_tracking.v4_1.models.json_patch_operation import JsonPatchOperation

def add_attachment(wit_id: int, project: str, url: str, comment: str, step=0, name=""):
    """Add an already-uploaded attachment to a WorkItem."""
    # For linking the attachment to a step, we need to modify the comment and add a name
    if step:
        attributes = {
            "comment": f"[TestStep={step}]:{comment}",
            "name": name if name else re.sub(r".*fileName=", "", url),
        }
    else:
        attributes = {"comment": comment}
    patch_document = [
        JsonPatchOperation(
            op="add",
            path="/relations/-",
            value={
                "rel": "AttachedFile",
                "url": url,
                "attributes": attributes,
            },
        )
    ]
    # client_wit is the work item tracking client (see the sketch below)
    return client_wit.update_work_item(patch_document, wit_id, project)

attachment = client_wit.create_attachment(stream, project, 'smiley.png')
add_attachment(tcid, project, attachment.url, 'Attaching file to work item', step=3)
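For completeness, a sketch of how stream and client_wit above might be obtained. The connection setup follows the pattern in the azure-devops-python-api README; the organization URL, token handling, and client path string are assumptions, so check them against your version of the library:

from vsts.vss_connection import VssConnection
from msrest.authentication import BasicAuthentication

credentials = BasicAuthentication('', 'personal_access_token')
connection = VssConnection(base_url='https://dev.azure.com/your_organization', creds=credentials)
client_wit = connection.get_client('vsts.work_item_tracking.v4_1.work_item_tracking_client.WorkItemTrackingClient')

# create_attachment accepts a file-like object opened in binary mode
with open('smiley.png', 'rb') as stream:
    attachment = client_wit.create_attachment(stream, project, 'smiley.png')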

Simple Python social media scrape of Public information

I just want to grab public information from my accounts on two social media sites (Instagram and Twitter). My code returns info for Twitter, and I know the XPath is correct for Instagram, but for some reason I'm not getting data for it. I know the XPaths could be more specific, but I can fix that later. Both my accounts are public.
1) I thought maybe it didn't like the Python header, so I tried changing it, and I still get nothing. That line is commented out, but it's still there.
2) I heard something about an API on GitHub. That lengthy code is very intimidating and way above my level of understanding; I don't know more than half of what I'm reading there.
from lxml import html
import requests
import webbrowser

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
#page = requests.get('https://www.instagram.com/<my account>/', headers=headers)
page = requests.get('https://www.instagram.com/<my account>/')
tree = html.fromstring(page.text)
pageTwo = requests.get('http://www.twitter.com/<my account>')
treeTwo = html.fromstring(pageTwo.text)

instaFollowers = tree.xpath("//span[@data-reactid='.0.1.0.0:0.1.3.1.0']/span[2]/text()")
instaFollowing = tree.xpath("//span[@data-reactid='.0.1.0.0:0.1.3.2.0']/span[2]/text()")
twitFollowers = treeTwo.xpath("//a[@data-nav='followers']/span[@class='ProfileNav-value']/text()")
twitFollowing = treeTwo.xpath("//a[@data-nav='following']/span[@class='ProfileNav-value']/text()")

print ''
print '--------------------'
print 'Social Media Checker'
print '--------------------'
print ''
print 'Instagram: ' + str(instaFollowers) + ' / ' + str(instaFollowing)
print ''
print 'Twitter: ' + str(twitFollowers) + ' / ' + str(twitFollowing)
As mentioned, Instagram's page source does not reflect its rendered source, as a JavaScript function passes content from JSON data to the browser. Hence, what Python scrapes in the page source is not exactly what the browser renders to the screen. Welcome to the new world of dynamic web programming! Consider using Instagram's API or another web parser that can retrieve the rendered HTML content (not just the page source).
That said, if you simply need the IG account data, you can still use Python's lxml to XPath the JSON content in the <script> tag (specifically the sixth occurrence, but adjust to your needed page). The example below parses Google's Instagram JSON data:
import lxml.etree as et
import urllib.request as rq

rqpage = rq.urlopen('https://instagram.com/google')
txtpage = rqpage.read()

tree = et.HTML(txtpage)
jsondata = tree.xpath("//script[@type='text/javascript' and position()=6]/text()")
for i in jsondata:
    print(i)
OUTPUT
window._sharedData = {"qs":"{\"shift\":10,\"header
\":\"n3bTdmHGHDgxvZYPN0KDFHqbkxd6zpTl\",\"edges\":100,\"blob
\":\"AQCq42rOTCnKOZcOxFn06L1J6_W8wY6ntAS1bX88VBClAjQD9PyJdefCzOwfSAbUdsBwHKb1QSndurPtjyN-
rHMOrZ_6ubE_Xpu908cyron9Zczkj4QMkAYUHIgnmmftuXG8rrFzq_Oq3BoXpQgovI9hefha-
6SAs1RLJMwMArrbMlFMLAwyd1TZhArcxQkk9bgRGT4MZK4Tk2VNt1YOKDN1pO3NJneFlUxdUJTdDX
zj3eY-stT7DnxF_GM_j6xwk1o\",\"iterations\":7,\"size\":42}","static_root":"
\/\/instagramstatic-a.akamaihd.net\/bluebar\/5829dff","entry_data":
{"ProfilePage":[{"__query_string":"?","__path":"\/google\/","__get_params":
{},"user":{"username":"google","has_blocked_viewer":false,"follows":
{"count":10},"requested_by_viewer":false,"followed_by":
{"count":977186},"country_block":null,"has_requested_viewer":false,"followed_
by_viewer":false,"follows_viewer":false,"profile_pic_url":"https:
\/\/instagram.ford1-1.fna.fbcdn.net\/hphotos-xfp1\/t51.2885-19\/s150x150
\/11910217_933356470069152_115044571_a.jpg","is_private":false,"full_name":
"Google","media":{"count":180,"page_info":
{"has_previous_page":false,"start_cursor":"1126896719808871555","end_cursor":
"1092117490206686720","has_next_page":true},"nodes":[{"code":"-
jipiawryD","dimensions":{"width":640,"height":640},"owner":
{"id":"1067259270"},"comments":{"count":105},"caption":"Today's the day!
Your searches are served. Happy Thanksgiving \ud83c\udf57\ud83c\udf70
#GoogleTrends","likes":
{"count":11410},"date":1448556579.0,"thumbnail_src":"https:\/
\/instagram.ford1-1.fna.fbcdn.net\/hphotos-xat1\/t51.2885-15\/e15\
/11848856_482502108621097_589421586_n.jpg","is_video":true,"id":"112689671980
8871555","display_src":"https:\/\/instagram.ford1-1.fna.fbcdn.net\/hphotos-
xat1\/t51.2885-15
...
JSON pretty print (extracting the window._sharedData variable from above).
See below where the user data (followers, following, etc.) shows at the beginning:
{
  "qs": "{\"shift\":10,\"header\":\"n3bTdmHGHDgxvZYPN0KDFHqbkxd6zpTl\",\"edges\":100,\"blob\":\"AQCq42rOTCnKOZcOxFn06L1J6_W8wY6ntAS1bX88VBClAjQD9PyJdefCzOwfSAbUdsBwHKb1QSndurPtjyN-rHMOrZ_6ubE_Xpu908cyron9Zczkj4QMkAYUHIgnmmftuXG8rrFzq_Oq3BoXpQgovI9hefha-6SAs1RLJMwMArrbMlFMLAwyd1TZhArcxQkk9bgRGT4MZK4Tk2VNt1YOKDN1pO3NJneFlUxdUJTdDXzj3eY-stT7DnxF_GM_j6xwk1o\",\"iterations\":7,\"size\":42}",
  "static_root": "\/\/instagramstatic-a.akamaihd.net\/bluebar\/5829dff",
  "entry_data": {
    "ProfilePage": [
      {
        "__query_string": "?",
        "__path": "\/google\/",
        "__get_params": {},
        "user": {
          "username": "google",
          "has_blocked_viewer": false,
          "follows": {
            "count": 10
          },
          "requested_by_viewer": false,
          "followed_by": {
            "count": 977186
          },
          "country_block": null,
          "has_requested_viewer": false,
          "followed_by_viewer": false,
          "follows_viewer": false,
          "profile_pic_url": "https:\/\/instagram.ford1-1.fna.fbcdn.net\/hphotos-xfp1\/t51.2885-19\/s150x150\/11910217_933356470069152_115044571_a.jpg",
          "is_private": false,
          "full_name": "Google",
          "media": {
            "count": 180,
            "page_info": {
              "has_previous_page": false,
              "start_cursor": "1126896719808871555",
              "end_cursor": "1092117490206686720",
              "has_next_page": true
            },
            "nodes": [
              {
                "code": "-jipiawryD",
                "dimensions": {
                  "width": 640,
                  "height": 640
                },
                "owner": {
                  "id": "1067259270"
                },
                "comments": {
                  "count": 105
                },
                "caption": "Today's the day! Your searches are served. Happy Thanksgiving \ud83c\udf57\ud83c\udf70 #GoogleTrends",
                "likes": {
                  "count": 11410
                },
                "date": 1448556579,
                "thumbnail_src": "https:\/\/instagram.ford1-1.fna.fbcdn.net\/hphotos-xat1\/t51.2885-15\/e15\/11848856_482502108621097_589421586_n.jpg",
                "is_video": true,
                "id": "1126896719808871555",
                "display_src": "https:\/\/instagram.ford1-1.fna.fbcdn.net\/hphotos-xat1\/t51.2885-15\/e15\/11848856_482502108621097_589421586_n.jpg"
              },
              {
                "code": "-hwbf2wr0O",
                "dimensions": {
                  "width": 640,
                  "height": 640
                },
                "owner": {
                  "id": "1067259270"
                },
                "comments": {
                  "count": 95
                },
                "caption": "Thanksgiving dinner is waiting. But first, the airport. \u2708\ufe0f #GoogleApp",
                "likes": {
                  "count": 12621
                },
                ...
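To turn that scraped <script> text into a usable structure, here is a sketch building on the jsondata extraction above. The brace-based stripping of the "window._sharedData = ...;" wrapper and the key path into the user data are assumptions based on the page layout shown in the pretty print:

import json

raw = jsondata[0]
# Strip the leading 'window._sharedData = ' assignment and the trailing ';'
payload = raw[raw.find('{'):raw.rfind('}') + 1]
shared = json.loads(payload)

# Navigate to the account counters shown in the pretty print above
user = shared['entry_data']['ProfilePage'][0]['user']
print(user['followed_by']['count'], '/', user['follows']['count'])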
If anyone is still interested in this sort of thing, using Selenium solved my problems.
http://pastebin.com/5eHeDt3r
Is there a faster way?
In case you want to find information about yourself and others without hassling with code, try this piece of software. Apart from automatic scraping, it analyzes and visualizes the received information in a PDF report from these social networks: Facebook, Twitter, Instagram, and the Google Search engine.
P.S. I am the main developer and maintainer of this project.

Get all values from a JSON object and store in a flat array with Python

I am returning a JSON object from a requests call. I would like to get all the values from it and store them in a flat array.
My JSON object:
[
  {
    "link": "https://f.com/1"
  },
  {
    "link": "https://f.com/2"
  },
  {
    "link": "https://f.com/3"
  }
]
I would like to store this as:
[https://f.com/things/1, https://f.com/things/2, https://f.com/things/3]
My code is as follows; it currently just prints each link out:
import requests
import json

def start_urls_data():
    url = 'http://106309.n.com:3000/api/v1/product_urls?q%5Bcompany_in%5D%5B%5D=F'
    headers = {'X-Api-Key': '1', 'Content-Type': 'application/json'}
    r = requests.get(url, headers=headers)
    start_urls_data = json.loads(r.content)
    for i in start_urls_data:
        print i['link']
You can use a simple list comprehension:
data = [
    {
        "link": "https://f.com/1"
    },
    {
        "link": "https://f.com/2"
    },
    {
        "link": "https://f.com/3"
    }
]

print([x["link"] for x in data])
This code just loops through the list data and puts the value of the link key from each dict element into a new list.
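Applied to the original function, that would give something like this (a sketch reusing the question's url and headers, returning the flat list instead of printing):

import requests
import json

def start_urls_data():
    url = 'http://106309.n.com:3000/api/v1/product_urls?q%5Bcompany_in%5D%5B%5D=F'
    headers = {'X-Api-Key': '1', 'Content-Type': 'application/json'}
    r = requests.get(url, headers=headers)
    # Build the flat list of links instead of printing them one by one
    return [i['link'] for i in json.loads(r.content)]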
