Extract JSON value from API call to use as variable [duplicate] - python

This question already has answers here:
What's the best way to parse a JSON response from the requests library?
(3 answers)
Closed 6 years ago.
I am using Indeed's API to scrape job listings. Their API only allows 25 results per call, which is why I have to iterate through the range.
I need to know the number of results returned (for the range) to use as my numresults variable. Right now I am just doing the same search in my browser and manually entering the result.
I want to iterate through multiple countries or search terms, so I need to pass the value of "totalResults" (which is found in the JSON) into numresults.
The problem is I don't understand how to extract this value.
Can I do this right after the call (where would the JSON be stored), or do I need to create the JSON file first?
Here is my working scraper:
import requests

api_url = 'http://api.indeed.com/ads/apisearch?publisher=XXXXXXXXXXX&v=2&limit=100000&format=json'
Country = 'au'
SearchTerm = 'Insight'
number = -25
numresults = 3925
# numresults must match the actual number of job results, rounded down to the nearest 25,
# or the last page will repeat over and over (so if there are 392 results, put 375)
for number in range(-25, numresults, 25):
    url = api_url + '&co=' + Country + '&q=' + SearchTerm + '&start=' + str(number + 25)
    response = requests.get(url)
    f = open(SearchTerm + '_' + Country + '.json', 'a')
    f.write(response.content)
    f.close()
    print 'Complete', url
Here is a sample of the returned JSON:
{
    "version" : 2,
    "query" : "Pricing",
    "location" : "",
    "dupefilter" : true,
    "highlight" : true,
    "start" : 1,
    "end" : 25,
    "totalResults" : 1712,
    "pageNumber" : 0,
    "results" : [
        {
            "jobtitle" : "New Energy Technical Specialist",
            "company" : "Rheem",
            etc.

Why not use the Python json module?
import json

# inside the loop, after the request
json_content = json.loads(response.content)
print(json_content["version"])  # should display 2
Be careful: check first that the content returned by the request really is in JSON format. The docs are here: https://docs.python.org/2/library/json.html
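To tie this back to the original goal: the same idea can be used to read "totalResults" once and let it drive the paging loop. Here is a minimal sketch (assuming the publisher key and search parameters from the question, and that the response always contains "totalResults" as in the sample above):

import requests

api_url = 'http://api.indeed.com/ads/apisearch?publisher=XXXXXXXXXXX&v=2&limit=100000&format=json'
Country = 'au'
SearchTerm = 'Insight'

# first call: only used to read the total number of results
first_page = requests.get(api_url + '&co=' + Country + '&q=' + SearchTerm + '&start=0')
numresults = first_page.json()['totalResults']

# then page through the results 25 at a time
for start in range(0, numresults, 25):
    url = api_url + '&co=' + Country + '&q=' + SearchTerm + '&start=' + str(start)
    response = requests.get(url)
    with open(SearchTerm + '_' + Country + '.json', 'a') as f:
        f.write(response.content)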

Related

Reading JSON data in Python using Pagination, max records 100

I am trying to extract data from a REST API using Python and put it into one neat JSON file, and I'm having difficulty. The data is rather lengthy, with a total of nearly 4,000 records, but the maximum number of records allowed per API call is 100.
I've tried using some other examples to get through the code, and so far this is what I'm using (censoring the API URL and auth key for the sake of confidentiality):
import requests
import json
from requests.structures import CaseInsensitiveDict

url = "https://api.airtable.com/v0/CENSORED/Vendors?maxRecords=100"
headers = CaseInsensitiveDict()
headers["Authorization"] = "Bearer CENSORED"
resp = requests.get(url, headers=headers)
resp.content.decode("utf-8")

vendors = []
new_results = True
page = 1
while new_results:
    centiblock = requests.get(url + f"&page={page}", headers=headers).json()
    new_results = centiblock.get("results", [])
    vendors.extend(centiblock)
    page += 1

full_directory = json.dumps(vendors, indent=4)
print(full_directory)
For the life of me, I cannot figure out why it isn't working. The output keeps coming out as just:
[
"records"
]
If I play around with the print statement at the end, I can get it to print centiblock (so named for being a block of 100 records at a time) just fine - it gives me 100 records in unformatted text. However, if I try printing vendors at the end, the output is:
['records']
...which leads me to guess that somehow the vendors array is not getting filled with the data. I suspect that I need to modify the GET request where I define new_results, but I'm not sure how.
For reference, this is a censored look at how the json data begins, when I format and print out one centiblock:
{
    "records": [
        {
            "id": "XXX",
            "createdTime": "2018-10-15T19:23:59.000Z",
            "fields": {
                "Vendor Name": "XXX",
                "Main Phone": "XXX",
                "Street": "XXX",
Can anyone see where I'm going wrong?
Thanks in advance!
When you extend vendors with centiblock, you are giving a dict to the extend function. extend expects an Iterable, so that works, but when you iterate over a Python dict you only iterate over its keys. In this case, that is ['records'].
Note as well that your loop condition becomes False after the first iteration, because centiblock.get("results", []) returns [], since "results" is not a key in the API's output, and [] has a truthiness value of False.
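A quick illustration of that key-iteration behaviour (a throwaway snippet, not part of the fix):

block = {"records": [{"id": "XXX"}, {"id": "YYY"}]}
vendors = []
vendors.extend(block)                 # iterating a dict yields its keys
print(vendors)                        # ['records']
vendors = []
vendors.extend(block["records"])      # iterating the list yields the records
print(vendors)                        # [{'id': 'XXX'}, {'id': 'YYY'}]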
Hence, to correct those errors, you need to read the correct field from the API into new_results, and extend vendors with new_results, which is itself a list. Note that on the last iteration new_results will be the empty list, which means vendors won't be extended with any null value and will contain exactly what you need.
This should look like:
import requests
import json
from requests.structures import CaseInsensitiveDict

url = "https://api.airtable.com/v0/CENSORED/Vendors?maxRecords=100"
headers = CaseInsensitiveDict()
headers["Authorization"] = "Bearer CENSORED"
resp = requests.get(url, headers=headers)
resp.content.decode("utf-8")

vendors = []
new_results = [True]  # start as a non-empty list so the loop runs at least once
page = 1
while len(new_results) > 0:
    centiblock = requests.get(url + f"&page={page}", headers=headers).json()
    new_results = centiblock.get("records", [])
    vendors.extend(new_results)
    page += 1

full_directory = json.dumps(vendors, indent=4)
print(full_directory)
Note that I replaced the while new_results with while len(new_results) > 0 (and started new_results off as a non-empty list so the first iteration runs), which is equivalent in this case but more readable and better practice in general.

Using data from one request for second url request in same script python

What I have so far is a first request gathering the IDs. I would then like to use the returned draftgroupid in the second URL request. Is it possible to send two requests in the same script, and if so, how would I do a for i in range(draftgroupid) in the second URL request?
import requests
import json

req = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req.raise_for_status()
data = req.json()
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
Output of draftgroupid:
16901
16905
16902
16903
req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/THEVALUEIWANTTOLOOPTHROUGH/draftables?format=json")
EDIT
import csv
import requests
import json

req = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req.raise_for_status()
data = req.json()
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/" + str(draftgroupid) + "/draftables?format=json")
    data2 = req2.json
    for i, player_info in enumerate(data2['draftables'][0]):
        date = player_info['competition']['startTime']
        print(date)
Running into a TypeError: 'method' object is not subscriptable
As I understand it, your problem is related to string manipulation rather than to the requests library.
So basically,
import requests
import json

req = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req.raise_for_status()
data = req.json()
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/" + str(draftgroupid) + "/draftables?format=json")
should do the job.
More elegant ways to concatenate strings can be found at http://www.pythonforbeginners.com/concatenation/string-concatenation-and-formatting-in-python
Edit
For example,
"some string " + str(123)
"some string %d" % 123
"some string %s" % 123
Will all give the same output. There are more ways to concatenate strings. You just need to choose the best fit based on the context.
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/%d/draftables?format=json" % draftgroupid)
I assume you didn't actually mean for i in range(draftgroupid) as you stated in the question, because that would mean making 16901 requests, followed by 16905 requests (all of which except the last four would be duplicates of the first batch), followed by 16902 requests (of which all would be duplicates), etc.
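As for the TypeError in the edited code: json is a method on the requests Response object, so it needs to be called with parentheses. A minimal sketch of the second request inside the loop (assuming data2['draftables'] is a list of player dicts, as the question's code suggests; the API's actual response shape isn't verified here):

for contest in data['DraftGroups']:
    draftgroupid = contest['DraftGroupId']
    req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/%d/draftables?format=json" % draftgroupid)
    data2 = req2.json()  # note the parentheses: .json is a method, calling it returns the parsed dict
    for player_info in data2['draftables']:
        date = player_info['competition']['startTime']
        print(date)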

Paginate the CSV (Python)

How can I paginate through the CSV version of an API call using Python?
I understand the metadata in the JSON call includes the total number of records, but without similar info in the CSV call I won't know where to stop my loop if I try to increment the page parameter.
Below is my code:
url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
payload = {
    'api_key': '4KC***UNKk',
    'fields': 'school.name,2012.repayment.2_yr_default_rate',
    '_page': '0'
}
r = requests.get(url, params=payload)
df = pd.read_csv(r.url)
This loads a dataframe with the first 20 results, but I'd like to load a dataframe with all the results.
Utilize the &_per_page option parameter to adjust the number of results per call; setting it to &_per_page=200 returns a CSV with 100 lines, so let's assume 100 is the maximum.
Now that we know the maximum per call, and we know the total count, it's possible to run a for loop to get what we need, like so:
url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
apikey = '&api_key=xxx'
fields = '&_fields=school.name,2012.repayment.2_yr_default_rate'
pageA = '?_page='  # the first query parameter must follow a '?'; the rest follow '&'
pageTotal = '&_per_page='
pageNumbersMaximum = 10
rowSum = 200
for page in range(pageNumbersMaximum):
    fullURL = url + pageA + str(page) + pageTotal + str(rowSum) + fields + apikey
    print(fullURL)
    print("Page Number: " + str(page) + ", Total Rows: " + str(rowSum))
    rowSum += 200
That will loop through the results until it gets to 7000 total.
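The loop above only prints the URLs; to actually build one DataFrame, here is a minimal sketch (assuming the same api_key and fields, a fixed _per_page of 100, and that the API signals the end either with an empty body or a header-only CSV):

import io
import pandas as pd
import requests

url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
params = {
    'api_key': '4KC***UNKk',
    'fields': 'school.name,2012.repayment.2_yr_default_rate',
    '_per_page': 100,
}

frames = []
page = 0
while True:
    params['_page'] = page
    r = requests.get(url, params=params)
    try:
        chunk = pd.read_csv(io.StringIO(r.text))
    except pd.errors.EmptyDataError:
        break                  # nothing parseable: we've paged past the last result
    if chunk.empty:
        break                  # header only, no rows left
    frames.append(chunk)
    page += 1

df = pd.concat(frames, ignore_index=True)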

Manipulating CSV data stored as a string Python

I have an API which responds with an XML page and stores my data as CSV inside the "data" tag (I can request it in JSON format, but I haven't been able to handle the data correctly in my Python script in that format).
<reports.getAccountsStatsResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:com:gigya:api" xsi:schemaLocation="urn:com:gigya:api http://socialize-api.gigya.com/schema">
    <statusCode>200</statusCode>
    <errorCode>0</errorCode>
    <statusReason>OK</statusReason>
    <callId>ae1b3f13ba1c4e62ad3120afb1269c76</callId>
    <time>2015-09-01T09:01:46.511Z</time>
    <headers>
        <header>date</header>
        <header>initRegistrations</header>
        <header>registrations</header>
        <header>siteLogins</header>
        <header>newUsers</header>
    </headers>
    <data xmlns:q1="http://www.w3.org/2001/XMLSchema" xsi:type="q1:string">
"date","initRegistrations","registrations","siteLogins","newUsers" "01/01/2015","0","0","0","0" "01/02/2015","0","0","0","0" "01/03/2015","0","0","0","0" "01/04/2015","0","0","0","0" "01/05/2015","0","0","0","0" "01/06/2015","0","0","0","0" "01/07/2015","0","0","0","0" "01/08/2015","0","0","0","0" "01/09/2015","0","0","0","0" "01/10/2015","0","0","0","0" "01/11/2015","0","0","0","0" "01/12/2015","0","0","0","0" "01/13/2015","0","0","0","0" "01/14/2015","0","0","0","0" "01/15/2015","0","0","0","0" "01/16/2015","0","0","0","0" "01/17/2015","0","0","0","0" "01/18/2015","0","0","0","0" "01/19/2015","0","0","0","0" "01/20/2015","34","34","72","34" "01/21/2015","33","23","58","23" "01/22/2015","19","19","49","19" "01/23/2015","21","21","50","21" "01/24/2015","1","1","2","1" "01/25/2015","0","0","0","0" "01/26/2015","8","4","49","4" "01/27/2015","8","8","35","8" "01/28/2015","4","2","16","2" "01/29/2015","7","7","27","7" "01/30/2015","69","58","516","58" "01/31/2015","9","6","76","6" "02/01/2015","0","0","2","0" "02/02/2015","304","203","2317","203" "02/03/2015","122","93","786","93" "02/04/2015","69","47","435","47" "02/05/2015","93","64","677","64" "02/06/2015","294","255","1327","255" "02/07/2015","0","0","0","0" "02/08/2015","0","0","0","0" "02/09/2015","0","0","3","0" "02/10/2015","1","0","1","0" "02/11/2015","3","3","7","3" "02/12/2015","0","0","0","0" "02/13/2015","2","2","4","2" "02/14/2015","0","0","1","0" "02/15/2015","0","0","0","0" "02/16/2015","0","0","0","0" "02/17/2015","3","3","7","3" "02/18/2015","0","0","0","0" "02/19/2015","1","1","3","1" "02/20/2015","3","3","10","3" "02/21/2015","0","0","0","0" "02/22/2015","0","0","1","0" "02/23/2015","1","1","4","1" "02/24/2015","0","0","1","0" "02/25/2015","0","0","0","0" "02/26/2015","0","0","0","0" "02/27/2015","0","0","1","0" "02/28/2015","1","1","2","1" "03/01/2015","1","1","3","1" "03/02/2015","19","9","348","9" "03/03/2015","14","9","132","9" "03/04/2015","4","4","41","4" "03/05/2015","8","5","101","5" "03/06/2015","6","5","71","5" "03/07/2015","8","4","42","4" "03/08/2015","7","4","45","4" "03/09/2015","5","4","30","4" "03/10/2015","7","7","39","7" "03/11/2015","9","9","41","9" "03/12/2015","1","1","20","1" "03/13/2015","3","3","26","3" "03/14/2015","2","0","21","0" "03/15/2015","3","3","28","3" "03/16/2015","3","3","38","3" "03/17/2015","4","4","43","4" "03/18/2015","5","3","45","3" "03/19/2015","19","16","108","16" "03/20/2015","11","8","96","8" "03/21/2015","276","261","807","261" "03/22/2015","197","192","604","192" "03/23/2015","0","0","3","0" "03/24/2015","1","1","4","1" "03/25/2015","181","166","401","166" "03/26/2015","124","109","265","109" "03/27/2015","53","47","124","47" "03/28/2015","41","39","99","39" "03/29/2015","75","65","173","65" "03/30/2015","249","239","536","239" "03/31/2015","222","212","487","212" "04/01/2015","40","29","394","29" "04/02/2015","16","10","132","10" "04/03/2015","13","10","125","10" "04/04/2015","6","4","49","4" "04/05/2015","2","1","46","1" "04/06/2015","4","3","38","3" "04/07/2015","1","0","32","0" "04/08/2015","4","2","16","2" "04/09/2015","9","8","30","8" "04/10/2015","31","29","96","29" "04/11/2015","17","14","90","14" "04/12/2015","10","7","46","7" "04/13/2015","19","13","69","13" "04/14/2015","63","58","199","58" "04/15/2015","17","16","58","16" "04/16/2015","13","12","41","12" "04/17/2015","7","5","51","5" "04/18/2015","51","46","165","46" "04/19/2015","51","45","179","45" "04/20/2015","28","21","110","21" "04/21/2015","32","24","290","24" "04/22/2015","47","31","329","31" 
"04/23/2015","30","27","183","27" "04/24/2015","71","65","284","65" "04/25/2015","25","17","268","17" "04/26/2015","26","24","268","24" "04/27/2015","72","67","172","67" "04/28/2015","28","25","96","25" "04/29/2015","72","48","159","48" "04/30/2015","50","22","136","22" "05/01/2015","33","23","126","23" "05/02/2015","22","17","112","17" "05/03/2015","31","21","169","21" "05/04/2015","29","21","182","21" "05/05/2015","12","10","24","10" "05/06/2015","369","354","790","354" "05/07/2015","409","401","839","401" "05/08/2015","258","253","539","253" "05/09/2015","227","221","469","221" "05/10/2015","138","134","297","134" "05/11/2015","14","13","32","13" "05/12/2015","57","24","452","24" "05/13/2015","23","12","300","12" "05/14/2015","7","5","70","5" "05/15/2015","7","6","15","6" "05/16/2015","3","3","7","3" "05/17/2015","3","3","8","3" "05/18/2015","2","4","4","2" "05/19/2015","10","16","24","8" "05/20/2015","4","8","10","4" "05/21/2015","7","12","14","6" "05/22/2015","9","14","33","7" "05/23/2015","9","14","19","7" "05/24/2015","16","32","39","16" "05/25/2015","11","9","21","7" "05/26/2015","23","16","87","16" "05/27/2015","30","24","87","24" "05/28/2015","12","12","39","12" "05/29/2015","14","12","37","12" "05/30/2015","8","7","19","7" "05/31/2015","5","4","17","4" "06/01/2015","10","10","31","10" "06/02/2015","23","20","95","20" "06/03/2015","11","9","31","9" "06/04/2015","14","13","36","13" "06/05/2015","12","11","27","11" "06/06/2015","8","6","20","6" "06/07/2015","9","9","21","9" "06/08/2015","16","16","37","16" "06/09/2015","24","17","40","17" "06/10/2015","8","8","34","8" "06/11/2015","46","27","464","27" "06/12/2015","45","23","383","23" "06/13/2015","12","9","143","9" "06/14/2015","22","15","112","15" "06/15/2015","14","13","74","13" "06/16/2015","63","56","197","56" "06/17/2015","28","25","114","25" "06/18/2015","17","15","85","15" "06/19/2015","143","135","460","135" "06/20/2015","54","46","217","46" "06/21/2015","60","55","211","55" "06/22/2015","91","78","249","78" "06/23/2015","99","87","295","87" "06/24/2015","115","103","315","103" "06/25/2015","455","380","964","380" "06/26/2015","585","489","1144","489" "06/27/2015","345","300","695","300" "06/28/2015","349","320","783","320" "06/29/2015","113","98","362","98" "06/30/2015","128","113","424","113" "07/01/2015","115","99","277","99" "07/02/2015","73","65","323","65" "07/03/2015","22","16","184","16" "07/04/2015","13","12","69","12" "07/05/2015","15","12","71","12" "07/06/2015","31","25","107","25" "07/07/2015","15","10","63","10" "07/08/2015","16","12","60","12" "07/09/2015","35","32","103","32" "07/10/2015","22","19","72","19" "07/11/2015","7","7","25","7" "07/12/2015","4","4","27","4" "07/13/2015","81","73","195","73" "07/14/2015","60","53","157","53" "07/15/2015","44","40","115","40" "07/16/2015","40","40","112","40" "07/17/2015","27","23","64","23" "07/18/2015","15","11","56","11" "07/19/2015","19","14","63","14" "07/20/2015","21","17","48","17" "07/21/2015","11","10","30","10" "07/22/2015","13","12","40","12" "07/23/2015","9","6","43","6" "07/24/2015","9","8","32","8" "07/25/2015","8","5","20","5" "07/26/2015","20","18","64","18" "07/27/2015","15","14","80","14" "07/28/2015","9","8","48","8" "07/29/2015","21","13","88","13" "07/30/2015","9","5","92","5" "07/31/2015","4","3","81","3" "08/01/2015","4","3","23","3" "08/02/2015","11","5","29","5" "08/03/2015","19","17","50","17" "08/04/2015","15","10","32","10" "08/05/2015","14","9","31","9" "08/06/2015","26","5","338","5" "08/07/2015","22","13","182","13" 
"08/08/2015","9","7","72","7" "08/09/2015","7","4","58","4" "08/10/2015","17","14","88","14" "08/11/2015","23","17","100","17" "08/12/2015","20","20","62","20" "08/13/2015","23","21","81","21" "08/14/2015","30","26","136","26" "08/15/2015","12","7","59","7" "08/16/2015","12","8","61","8" "08/17/2015","68","46","331","46" "08/18/2015","72","48","327","48" "08/19/2015","149","75","542","75" "08/20/2015","95","59","358","59" "08/21/2015","93","54","342","54" "08/22/2015","69","40","300","40" "08/23/2015","150","103","505","103" "08/24/2015","39","30","105","30"
</data>
</reports.getAccountsStatsResponse>
And in JSON format:
{
    "statusCode": 200,
    "errorCode": 0,
    "statusReason": "OK",
    "callId": "99949da72d034b04ba910c91704ba4c0",
    "time": "2015-09-01T09:19:30.569Z",
    "headers": [
        "date",
        "initRegistrations",
        "registrations",
        "siteLogins",
        "newUsers"
    ],
"data": "\"date\",\"initRegistrations\",\"registrations\",\"siteLogins\",\"newUsers\"\r\n\"01/01/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/02/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/03/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/04/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/05/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/06/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/07/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/08/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/09/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/10/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/11/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/12/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/13/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/14/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/15/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/16/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/17/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/18/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/19/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/20/2015\",\"34\",\"34\",\"72\",\"34\"\r\n\"01/21/2015\",\"33\",\"23\",\"58\",\"23\"\r\n\"01/22/2015\",\"19\",\"19\",\"49\",\"19\"\r\n\"01/23/2015\",\"21\",\"21\",\"50\",\"21\"\r\n\"01/24/2015\",\"1\",\"1\",\"2\",\"1\"\r\n\"01/25/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"01/26/2015\",\"8\",\"4\",\"49\",\"4\"\r\n\"01/27/2015\",\"8\",\"8\",\"35\",\"8\"\r\n\"01/28/2015\",\"4\",\"2\",\"16\",\"2\"\r\n\"01/29/2015\",\"7\",\"7\",\"27\",\"7\"\r\n\"01/30/2015\",\"69\",\"58\",\"516\",\"58\"\r\n\"01/31/2015\",\"9\",\"6\",\"76\",\"6\"\r\n\"02/01/2015\",\"0\",\"0\",\"2\",\"0\"\r\n\"02/02/2015\",\"304\",\"203\",\"2317\",\"203\"\r\n\"02/03/2015\",\"122\",\"93\",\"786\",\"93\"\r\n\"02/04/2015\",\"69\",\"47\",\"435\",\"47\"\r\n\"02/05/2015\",\"93\",\"64\",\"677\",\"64\"\r\n\"02/06/2015\",\"294\",\"255\",\"1327\",\"255\"\r\n\"02/07/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/08/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/09/2015\",\"0\",\"0\",\"3\",\"0\"\r\n\"02/10/2015\",\"1\",\"0\",\"1\",\"0\"\r\n\"02/11/2015\",\"3\",\"3\",\"7\",\"3\"\r\n\"02/12/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/13/2015\",\"2\",\"2\",\"4\",\"2\"\r\n\"02/14/2015\",\"0\",\"0\",\"1\",\"0\"\r\n\"02/15/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/16/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/17/2015\",\"3\",\"3\",\"7\",\"3\"\r\n\"02/18/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/19/2015\",\"1\",\"1\",\"3\",\"1\"\r\n\"02/20/2015\",\"3\",\"3\",\"10\",\"3\"\r\n\"02/21/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/22/2015\",\"0\",\"0\",\"1\",\"0\"\r\n\"02/23/2015\",\"1\",\"1\",\"4\",\"1\"\r\n\"02/24/2015\",\"0\",\"0\",\"1\",\"0\"\r\n\"02/25/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/26/2015\",\"0\",\"0\",\"0\",\"0\"\r\n\"02/27/2015\",\"0\",\"0\",\"1\",\"0\"\r\n\"02/28/2015\",\"1\",\"1\",\"2\",\"1\"\r\n\"03/01/2015\",\"1\",\"1\",\"3\",\"1\"\r\n\"03/02/2015\",\"19\",\"9\",\"348\",\"9\"\r\n\"03/03/2015\",\"14\",\"9\",\"132\",\"9\"\r\n\"03/04/2015\",\"4\",\"4\",\"41\",\"4\"\r\n\"03/05/2015\",\"8\",\"5\",\"101\",\"5\"\r\n\"03/06/2015\",\"6\",\"5\",\"71\",\"5\"\r\n\"03/07/2015\",\"8\",\"4\",\"42\",\"4\"\r\n\"03/08/2015\",\"7\",\"4\",\"45\",\"4\"\r\n\"03/09/2015\",\"5\",\"4\",\"30\",\"4\"\r\n\"03/10/2015\",\"7\",\"7\",\"39\",\"7\"\r\n\"03/11/2015\",\"9\",\"9\",\"41\",\"9\"\r\n\"03/12/2015\",\"1\",\"1\",\"20\",\"1\"\r\n\"03/13/2015\",\"3\",\"3\",\"26\",\"3\"\r\n\"03/14/2015\",\"2\",\"0\",\"21\",\"0\"\r\n\"03/15/2015\",\"3\",\"3\",\"28\",\"3\"\r\n\"03/16/2015\",\"3\",\"3\",\"38\",\"3\"\r\n\"03/17/2015\",\"4\",\"4\",\"43\",\"4\"\r\n\"03/18/2015\",\"5\",\"3\",\"45\",\"3\"\r\n\"03/19/2015\",\"19\",\"16\",\"108\",\"16\"\r\n\"03/20/2015\",\"11\",\"8\",\"96\",\"8\"\r\n\"03/21/2015\",\"276\",\"261\",\"807\",\"261\"\r\n\"03/22/
2015\",\"197\",\"192\",\"604\",\"192\"\r\n\"03/23/2015\",\"0\",\"0\",\"3\",\"0\"\r\n\"03/24/2015\",\"1\",\"1\",\"4\",\"1\"\r\n\"03/25/2015\",\"181\",\"166\",\"401\",\"166\"\r\n\"03/26/2015\",\"124\",\"109\",\"265\",\"109\"\r\n\"03/27/2015\",\"53\",\"47\",\"124\",\"47\"\r\n\"03/28/2015\",\"41\",\"39\",\"99\",\"39\"\r\n\"03/29/2015\",\"75\",\"65\",\"173\",\"65\"\r\n\"03/30/2015\",\"249\",\"239\",\"536\",\"239\"\r\n\"03/31/2015\",\"222\",\"212\",\"487\",\"212\"\r\n\"04/01/2015\",\"40\",\"29\",\"394\",\"29\"\r\n\"04/02/2015\",\"16\",\"10\",\"132\",\"10\"\r\n\"04/03/2015\",\"13\",\"10\",\"125\",\"10\"\r\n\"04/04/2015\",\"6\",\"4\",\"49\",\"4\"\r\n\"04/05/2015\",\"2\",\"1\",\"46\",\"1\"\r\n\"04/06/2015\",\"4\",\"3\",\"38\",\"3\"\r\n\"04/07/2015\",\"1\",\"0\",\"32\",\"0\"\r\n\"04/08/2015\",\"4\",\"2\",\"16\",\"2\"\r\n\"04/09/2015\",\"9\",\"8\",\"30\",\"8\"\r\n\"04/10/2015\",\"31\",\"29\",\"96\",\"29\"\r\n\"04/11/2015\",\"17\",\"14\",\"90\",\"14\"\r\n\"04/12/2015\",\"10\",\"7\",\"46\",\"7\"\r\n\"04/13/2015\",\"19\",\"13\",\"69\",\"13\"\r\n\"04/14/2015\",\"63\",\"58\",\"199\",\"58\"\r\n\"04/15/2015\",\"17\",\"16\",\"58\",\"16\"\r\n\"04/16/2015\",\"13\",\"12\",\"41\",\"12\"\r\n\"04/17/2015\",\"7\",\"5\",\"51\",\"5\"\r\n\"04/18/2015\",\"51\",\"46\",\"165\",\"46\"\r\n\"04/19/2015\",\"51\",\"45\",\"179\",\"45\"\r\n\"04/20/2015\",\"28\",\"21\",\"110\",\"21\"\r\n\"04/21/2015\",\"32\",\"24\",\"290\",\"24\"\r\n\"04/22/2015\",\"47\",\"31\",\"329\",\"31\"\r\n\"04/23/2015\",\"30\",\"27\",\"183\",\"27\"\r\n\"04/24/2015\",\"71\",\"65\",\"284\",\"65\"\r\n\"04/25/2015\",\"25\",\"17\",\"268\",\"17\"\r\n\"04/26/2015\",\"26\",\"24\",\"268\",\"24\"\r\n\"04/27/2015\",\"72\",\"67\",\"172\",\"67\"\r\n\"04/28/2015\",\"28\",\"25\",\"96\",\"25\"\r\n\"04/29/2015\",\"72\",\"48\",\"159\",\"48\"\r\n\"04/30/2015\",\"50\",\"22\",\"136\",\"22\"\r\n\"05/01/2015\",\"33\",\"23\",\"126\",\"23\"\r\n\"05/02/2015\",\"22\",\"17\",\"112\",\"17\"\r\n\"05/03/2015\",\"31\",\"21\",\"169\",\"21\"\r\n\"05/04/2015\",\"29\",\"21\",\"182\",\"21\"\r\n\"05/05/2015\",\"12\",\"10\",\"24\",\"10\"\r\n\"05/06/2015\",\"369\",\"354\",\"790\",\"354\"\r\n\"05/07/2015\",\"409\",\"401\",\"839\",\"401\"\r\n\"05/08/2015\",\"258\",\"253\",\"539\",\"253\"\r\n\"05/09/2015\",\"227\",\"221\",\"469\",\"221\"\r\n\"05/10/2015\",\"138\",\"134\",\"297\",\"134\"\r\n\"05/11/2015\",\"14\",\"13\",\"32\",\"13\"\r\n\"05/12/2015\",\"57\",\"24\",\"452\",\"24\"\r\n\"05/13/2015\",\"23\",\"12\",\"300\",\"12\"\r\n\"05/14/2015\",\"7\",\"5\",\"70\",\"5\"\r\n\"05/15/2015\",\"7\",\"6\",\"15\",\"6\"\r\n\"05/16/2015\",\"3\",\"3\",\"7\",\"3\"\r\n\"05/17/2015\",\"3\",\"3\",\"8\",\"3\"\r\n\"05/18/2015\",\"2\",\"4\",\"4\",\"2\"\r\n\"05/19/2015\",\"10\",\"16\",\"24\",\"8\"\r\n\"05/20/2015\",\"4\",\"8\",\"10\",\"4\"\r\n\"05/21/2015\",\"7\",\"12\",\"14\",\"6\"\r\n\"05/22/2015\",\"9\",\"14\",\"33\",\"7\"\r\n\"05/23/2015\",\"9\",\"14\",\"19\",\"7\"\r\n\"05/24/2015\",\"16\",\"32\",\"39\",\"16\"\r\n\"05/25/2015\",\"11\",\"9\",\"21\",\"7\"\r\n\"05/26/2015\",\"23\",\"16\",\"87\",\"16\"\r\n\"05/27/2015\",\"30\",\"24\",\"87\",\"24\"\r\n\"05/28/2015\",\"12\",\"12\",\"39\",\"12\"\r\n\"05/29/2015\",\"14\",\"12\",\"37\",\"12\"\r\n\"05/30/2015\",\"8\",\"7\",\"19\",\"7\"\r\n\"05/31/2015\",\"5\",\"4\",\"17\",\"4\"\r\n\"06/01/2015\",\"10\",\"10\",\"31\",\"10\"\r\n\"06/02/2015\",\"23\",\"20\",\"95\",\"20\"\r\n\"06/03/2015\",\"11\",\"9\",\"31\",\"9\"\r\n\"06/04/2015\",\"14\",\"13\",\"36\",\"13\"\r\n\"06/05/2015\",\"12\",\"11\",\"27\",\"11\"\r\n\"06/06/2015\",\"8\",\"6\",\"20\",\"6\"\r\n\"06/07/2015\",\"9\",\"9\",\"
21\",\"9\"\r\n\"06/08/2015\",\"16\",\"16\",\"37\",\"16\"\r\n\"06/09/2015\",\"24\",\"17\",\"40\",\"17\"\r\n\"06/10/2015\",\"8\",\"8\",\"34\",\"8\"\r\n\"06/11/2015\",\"46\",\"27\",\"464\",\"27\"\r\n\"06/12/2015\",\"45\",\"23\",\"383\",\"23\"\r\n\"06/13/2015\",\"12\",\"9\",\"143\",\"9\"\r\n\"06/14/2015\",\"22\",\"15\",\"112\",\"15\"\r\n\"06/15/2015\",\"14\",\"13\",\"74\",\"13\"\r\n\"06/16/2015\",\"63\",\"56\",\"197\",\"56\"\r\n\"06/17/2015\",\"28\",\"25\",\"114\",\"25\"\r\n\"06/18/2015\",\"17\",\"15\",\"85\",\"15\"\r\n\"06/19/2015\",\"143\",\"135\",\"460\",\"135\"\r\n\"06/20/2015\",\"54\",\"46\",\"217\",\"46\"\r\n\"06/21/2015\",\"60\",\"55\",\"211\",\"55\"\r\n\"06/22/2015\",\"91\",\"78\",\"249\",\"78\"\r\n\"06/23/2015\",\"99\",\"87\",\"295\",\"87\"\r\n\"06/24/2015\",\"115\",\"103\",\"315\",\"103\"\r\n\"06/25/2015\",\"455\",\"380\",\"964\",\"380\"\r\n\"06/26/2015\",\"585\",\"489\",\"1144\",\"489\"\r\n\"06/27/2015\",\"345\",\"300\",\"695\",\"300\"\r\n\"06/28/2015\",\"349\",\"320\",\"783\",\"320\"\r\n\"06/29/2015\",\"113\",\"98\",\"362\",\"98\"\r\n\"06/30/2015\",\"128\",\"113\",\"424\",\"113\"\r\n\"07/01/2015\",\"115\",\"99\",\"277\",\"99\"\r\n\"07/02/2015\",\"73\",\"65\",\"323\",\"65\"\r\n\"07/03/2015\",\"22\",\"16\",\"184\",\"16\"\r\n\"07/04/2015\",\"13\",\"12\",\"69\",\"12\"\r\n\"07/05/2015\",\"15\",\"12\",\"71\",\"12\"\r\n\"07/06/2015\",\"31\",\"25\",\"107\",\"25\"\r\n\"07/07/2015\",\"15\",\"10\",\"63\",\"10\"\r\n\"07/08/2015\",\"16\",\"12\",\"60\",\"12\"\r\n\"07/09/2015\",\"35\",\"32\",\"103\",\"32\"\r\n\"07/10/2015\",\"22\",\"19\",\"72\",\"19\"\r\n\"07/11/2015\",\"7\",\"7\",\"25\",\"7\"\r\n\"07/12/2015\",\"4\",\"4\",\"27\",\"4\"\r\n\"07/13/2015\",\"81\",\"73\",\"195\",\"73\"\r\n\"07/14/2015\",\"60\",\"53\",\"157\",\"53\"\r\n\"07/15/2015\",\"44\",\"40\",\"115\",\"40\"\r\n\"07/16/2015\",\"40\",\"40\",\"112\",\"40\"\r\n\"07/17/2015\",\"27\",\"23\",\"64\",\"23\"\r\n\"07/18/2015\",\"15\",\"11\",\"56\",\"11\"\r\n\"07/19/2015\",\"19\",\"14\",\"63\",\"14\"\r\n\"07/20/2015\",\"21\",\"17\",\"48\",\"17\"\r\n\"07/21/2015\",\"11\",\"10\",\"30\",\"10\"\r\n\"07/22/2015\",\"13\",\"12\",\"40\",\"12\"\r\n\"07/23/2015\",\"9\",\"6\",\"43\",\"6\"\r\n\"07/24/2015\",\"9\",\"8\",\"32\",\"8\"\r\n\"07/25/2015\",\"8\",\"5\",\"20\",\"5\"\r\n\"07/26/2015\",\"20\",\"18\",\"64\",\"18\"\r\n\"07/27/2015\",\"15\",\"14\",\"80\",\"14\"\r\n\"07/28/2015\",\"9\",\"8\",\"48\",\"8\"\r\n\"07/29/2015\",\"21\",\"13\",\"88\",\"13\"\r\n\"07/30/2015\",\"9\",\"5\",\"92\",\"5\"\r\n\"07/31/2015\",\"4\",\"3\",\"81\",\"3\"\r\n\"08/01/2015\",\"4\",\"3\",\"23\",\"3\"\r\n\"08/02/2015\",\"11\",\"5\",\"29\",\"5\"\r\n\"08/03/2015\",\"19\",\"17\",\"50\",\"17\"\r\n\"08/04/2015\",\"15\",\"10\",\"32\",\"10\"\r\n\"08/05/2015\",\"14\",\"9\",\"31\",\"9\"\r\n\"08/06/2015\",\"26\",\"5\",\"338\",\"5\"\r\n\"08/07/2015\",\"22\",\"13\",\"182\",\"13\"\r\n\"08/08/2015\",\"9\",\"7\",\"72\",\"7\"\r\n\"08/09/2015\",\"7\",\"4\",\"58\",\"4\"\r\n\"08/10/2015\",\"17\",\"14\",\"88\",\"14\"\r\n\"08/11/2015\",\"23\",\"17\",\"100\",\"17\"\r\n\"08/12/2015\",\"20\",\"20\",\"62\",\"20\"\r\n\"08/13/2015\",\"23\",\"21\",\"81\",\"21\"\r\n\"08/14/2015\",\"30\",\"26\",\"136\",\"26\"\r\n\"08/15/2015\",\"12\",\"7\",\"59\",\"7\"\r\n\"08/16/2015\",\"12\",\"8\",\"61\",\"8\"\r\n\"08/17/2015\",\"68\",\"46\",\"331\",\"46\"\r\n\"08/18/2015\",\"72\",\"48\",\"327\",\"48\"\r\n\"08/19/2015\",\"149\",\"75\",\"542\",\"75\"\r\n\"08/20/2015\",\"95\",\"59\",\"358\",\"59\"\r\n\"08/21/2015\",\"93\",\"54\",\"342\",\"54\"\r\n\"08/22/2015\",\"69\",\"40\",\"300\",\"40\"\r\n\"08/23/2015\",\"150\",\"103\",
\"505\",\"103\"\r\n\"08/24/2015\",\"39\",\"30\",\"105\",\"30\"\r\n"
}
Firstly, I would like to store the text from the "data" tag by referencing it by name, but I've currently only had success using the following:
response = requests.get(url)
root = ElementTree.fromstring(response.content)
dataString = root[6].text
Is there a separate command to be able to specify the name of the tag?
Next, my goal is to loop through different URLs (which correspond to different accounts) and append the name of each account to the end of its data. Is this possible, given that the data is stored as a string and I would need to add it to the end of each row? As a follow-up, what's the best convention for saving multiple values in a variable to be able to loop through them, i.e. the list of accounts?
Apologies if this is unclear, I'm happy to provide any more information if it means anybody can help.
As far as I understood, you have a specific URL for each user and you want to collect data for all the users given.
However, since you are not able to get the username out of the response, you have to combine the response with the username corresponding to the URL the request was sent to. If so, you could use a dictionary to store the data of your response, since the JSON format maps directly onto a Python dictionary.
The code below simply iterates through a set of tuples containing the different usernames and the corresponding URLs. For each URL a request is sent, the data is extracted from the JSON-formatted response and stored in a dictionary with the username as a key. This dictionary is then merged (.update()) into a kind of main dictionary containing all your collected datasets.
import requests

# replace the names and 'url_xyz' with the corresponding usernames and URLs
users = {('Albert', 'url_albert'), ('Steven', 'url_steven'), ('Mike', 'url_mike')}
all_data = dict()
for name, url in users:
    response = requests.get(url)
    data = response.json()['data'].replace('"', '')  # .json() parses the response body into a dict
    all_data.update({name: data})
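For the ElementTree part of the question: yes, an element can be looked up by name rather than by index, but the tag has to be qualified with the document's namespace (taken here from the xmlns attribute in the sample XML above). A minimal sketch, reusing the url from the question's snippet:

from xml.etree import ElementTree
import requests

response = requests.get(url)
root = ElementTree.fromstring(response.content)
# find() locates the child element by its namespace-qualified tag name
dataString = root.find('{urn:com:gigya:api}data').text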
Thank you, Albert.
Your JSON suggestion let me control the data in a much better way. The code below is what I ended up with to get my desired output. Now I just need to work out how to convert the dates from MM/DD/YYYY into DD/MM/YYYY.
startDate = '2015-01-01'  # Must be in format YYYY-MM-DD
endDate = '2015-12-31'    # Must be in format YYYY-MM-DD
dimensions = 'date'       # Available dimensions are 'date' and 'cid'
format = 'json'
dataFormat = 'json'
measures = 'initRegistrations,registrations,siteLogins,newUsers'
allData = []
# Construct API URL
for i in range(0, len(apiKey)):
    url = ('https://reports.eu1.gigya.com/reports.getAccountsStats?secret=' + secret + '&apiKey=' + apiKey[i] +
           '&uid=' + uid + '&startDate=' + startDate + '&endDate=' + endDate + '&dimensions=' + dimensions +
           '&measures=' + measures + '&format=' + format + '&dataFormat=' + dataFormat)
    response = requests.get(url)
    json = response.json()
    data = json['data']
    if i == 0:
        headers = json['headers']
        headers.append('brand')
        for x in range(0, len(data)):
            data[x].append(brand[i])
        brandData = [headers] + data
    else:
        for x in range(0, len(data)):
            data[x].append(brand[i])
        brandData = data
    allData += brandData

with open("testDataJSON.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(allData)
I don't know how well this follows best practice for Python but as I said, I am very new to it.
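For the remaining MM/DD/YYYY to DD/MM/YYYY conversion mentioned above, a minimal sketch using the standard datetime module (applied to each date string before it is written out):

from datetime import datetime

def reformat_date(us_date):
    # parse MM/DD/YYYY and re-emit as DD/MM/YYYY
    return datetime.strptime(us_date, '%m/%d/%Y').strftime('%d/%m/%Y')

print(reformat_date('08/24/2015'))  # 24/08/2015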

Performing a Google search for .pdf and .ppt using keywords from a file in python

I am working on a program which performs a Google search for .pdf and .ppt files. Currently I'm manually giving the keywords as inputs to my program. What I want to do is perform an automated search for both .pdf and .ppt files.
Suppose I have file.txt containing the keywords:
python
android
parser
I want my program to automatically take these keywords one by one and search for both .pdf and .ppt files.
import urllib2
import urllib
import json as m_json

def getgoogleurl(search, siteurl=False):
    if siteurl == False:
        return 'http://www.google.com/search?q=' + urllib2.quote(search) + '&oq=' + urllib2.quote(search)
    else:
        return 'http://www.google.com/search?q=site:' + urllib2.quote(siteurl) + '%20' + urllib2.quote(search) + '&oq=site:' + urllib2.quote(siteurl) + '%20' + urllib2.quote(search)

def getgooglelinks(search, siteurl=False):
    # google returns 403 without user agent
    headers = {'User-agent': 'Mozilla/11.0'}
    req = urllib2.Request(getgoogleurl(search, siteurl), None, headers)
    site = urllib2.urlopen(req)
    data = site.read()
    site.close()
    start = data.find('<div id="res">')
    end = data.find('<div id="foot">')
    if data[start:end] == '':
        # error, no links to find
        return False
    else:
        links = []
        data = data[start:end]
        start = 0
        end = 0
        while start > -1 and end > -1:
            # get only results of the provided site
            if siteurl == False:
                start = data.find('<a href="/url?q=')
            else:
                start = data.find('<a href="/url?q=' + str(siteurl))
            data = data[start + len('<a href="/url?q='):]
            end = data.find('&sa=U&ei=')
            if start > -1 and end > -1:
                link = urllib2.unquote(data[0:end])
                data = data[end:len(data)]
                if link.find('http') == 0:
                    links.append(link)
        return links

keyword1 = raw_input('Enter the keyword as keyword+filetype: \n eg: python filetype:pdf')
links = getgooglelinks(keyword1, 'http://www.google.com/')
for link in links:
    print link

query = raw_input('Query: ')
query = urllib.urlencode({'q': query})
response = urllib.urlopen('http://ajax.googleapis.com/ajax/services/search/web?v=1.0&' + query).read()
json = m_json.loads(response)
results = json['responseData']['results']
for result in results:
    title = result['title']
    url = result['url']
    print(title + '; ' + url)
This is the code I'm working on. I tried using the Beautiful Soup library, but it didn't work.
So you are manually searching, as in typing the query into the program?
If that's what you are doing, then you are almost there. You just have to do some basic file operations: instead of passing the user-entered query, you pass each line of the file as the query.
Make sure it is in string format, e.g. str(retrieved_data_from_file), and for the dictionary entries maybe something like mydict = {'q': "'" + str(retrieved_data_from_file) + "'"},
and close the file afterwards.
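A minimal sketch of that idea, reusing the getgooglelinks function from the question (file.txt and the filetype suffixes are assumptions based on the question, not a verified implementation):

suffixes = [' filetype:pdf', ' filetype:ppt']

with open('file.txt') as f:
    keywords = [line.strip() for line in f if line.strip()]

for keyword in keywords:
    for suffix in suffixes:
        query = keyword + suffix            # e.g. 'python filetype:pdf'
        links = getgooglelinks(query, 'http://www.google.com/')
        if links:
            for link in links:
                print link                  # Python 2 print, matching the question's code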
