Nested while loop for API JSON collection - Python

I'm requesting 590 pages from the Meetup API. I've iterated with a while loop to get the page URLs. Now that I have them, I need to request each page and format the JSON responses as Python objects so I can place them into a pandas DataFrame.
This is how it looks when you do it for one URL:
url = ('https://api.meetup.com/2/groups?offset=1&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3')
r = requests.get(url).json()
data = pd.io.json.json_normalize(r['results'])
But because I have so many pages I want to do this automatically and iterate through them all.
That's how nested while loops came to mind and this is what I tried:
urls = 0
offset = 0
url = 'https://api.meetup.com/2/groups?offset=%d&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3'
r = requests.get(urls%d = 'https://api.meetup.com/2/groups?offset=%d&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3').json()
while urlx < 591:
    new_url = r % urls % offset
    print(new_url)
    offset += 1
However, it isn't working and I'm receiving many errors including this one:
SyntaxError: keyword can't be an expression

I'm not sure what you're trying to do, and the code has a lot of issues.
But if you just want to loop through offsets 0 to 591 and fetch those URLs, then here's the code:
import requests
import pandas as pd
dfs = []
base_url = 'https://api.meetup.com/2/groups?offset=%d&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3'
for i in range(0, 592):
    url = base_url % i
    r = requests.get(url).json()
    print("Fetching URL: %s\n" % url)
    # do something with r here
    # here I'll append it to a list of dfs
    dfs.append(pd.io.json.json_normalize(r['results']))
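If you want a single DataFrame at the end, you can concatenate the list afterwards (a small addition, assuming every page yields the same columns):
all_groups = pd.concat(dfs, ignore_index=True)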

Related

Python Get Request All Pages Movie list

When using the snippet below, it is not returning the values of page, total_pages and data.
It is also not returning the value of the function getMovieTitles.
import request
import json
def getMovieTitles(substr):
    titles = []
    url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}'.format(substr)"
    data = requests.get(url)
    print(data)
    response = json.loads(data.content.decode('utf-8'))
    print(data.content)
    for page in range(0, response['total_pages']):
        page_response = requests.get("https://jsonmock.hackerrank.com/api/movies/search/?Title={}}&page={}".format(substr, page + 1))
        page_content = json.loads(page_response.content.decode('utf-8'))
        print('page_content', page_content, 'type(page_content)', type(page_content))
        for item in range(0, len(page_content['data'])):
            titles.append(str(page_content['data'][item]['Title']))
    titles.sort()
    return titles
print(getMovieTitles('Superman'))
You're not formatting the url string correctly.
url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}'.format(substr)"
format() is a method of str, and you've put the call inside the URL string itself. Instead do:
url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}".format(substr)
First, fix the import:
import requests
The problem is in your string formatting: a ' where a " should be.
url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}".format(substr)
and there is one } too many:
page_response = requests.get("https://jsonmock.hackerrank.com/api/movies/search/?Title={}&page={}".format(substr, page + 1))
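Putting both fixes together, the function might look like this (a sketch against the same mock API, using response.json() instead of decoding the bytes by hand):
import requests

def getMovieTitles(substr):
    titles = []
    first_page = requests.get(
        "https://jsonmock.hackerrank.com/api/movies/search/?Title={}".format(substr)
    ).json()
    for page in range(first_page['total_pages']):
        page_data = requests.get(
            "https://jsonmock.hackerrank.com/api/movies/search/?Title={}&page={}".format(substr, page + 1)
        ).json()
        for item in page_data['data']:
            titles.append(str(item['Title']))
    titles.sort()
    return titles

print(getMovieTitles('Superman'))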

Retrieving items in a loop

I'm trying to retrieve JSON data from https://www.fruityvice.com/api/fruit/.
So I'm creating a function to do that, but as a return I get only one fruit.
import requests
import json
def scrape_all_fruits():
    for ID in range(1, 10):
        url = f'https://www.fruityvice.com/api/fruit/{ID}'
        response = requests.get(url)
        data = response.json()
        return data
print(scrape_all_fruits())
Can anyone explain to me why?
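The return data sits inside the for loop, so the function sends the first request and exits immediately. One possible fix (a sketch that collects every response before returning):
import requests

def scrape_all_fruits():
    fruits = []
    for ID in range(1, 10):
        url = f'https://www.fruityvice.com/api/fruit/{ID}'
        fruits.append(requests.get(url).json())  # accumulate each fruit
    return fruits  # return only after the loop finishes

print(scrape_all_fruits())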

How to export web-scraped data to CSV with Python

I have scraped the data with BeautifulSoup and am printing it. Now I want that output exported to Excel/CSV; my program is below. I am new to Python and need help: there are multiple pages that I have scraped, and now I need to export them to CSV/Excel.
import requests
from urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
from bs4 import BeautifulSoup as bs
def scrape_bid_data():
    page_no = 1  # initial page number
    while True:
        print('Hold on, creating URL to fetch data...')
        URL = 'https://bidplus.gem.gov.in/bidlists?bidlists&page_no=' + str(page_no)  # create dynamic URL
        print('URL created: ' + URL)
        scraped_data = requests.get(URL, verify=False)  # request the data
        soup_data = bs(scraped_data.text, 'lxml')  # parse the scraped data using lxml
        extracted_data = soup_data.find('div', {'id': 'pagi_content'})  # find the div which contains the required data
        if len(extracted_data) == 0:  # if extracted_data is empty, stop further execution of the script
            break
        else:
            for idx in range(len(extracted_data)):  # loop through all the divs, extract and print data
                if idx % 2 == 1:  # the required data sits at odd indexes only
                    bid_data = extracted_data.contents[idx].text.strip().split('\n')
                    print('-' * 100)
                    print(bid_data[0])   # BID number
                    print(bid_data[5])   # Items
                    print(bid_data[6])   # Quantity required
                    print(bid_data[10] + bid_data[12].strip())  # Department name and address
                    print(bid_data[16])  # Start date
                    print(bid_data[17])  # End date
                    print('-' * 100)
            page_no += 1  # increment the page number by 1
scrape_bid_data()
The data is coming out in this form:
You can use pandas
pip install pandas
You can format bid_data into a dict and keep only the required fields, collecting one dict per bid in a list:
records = []
# inside the scraping loop, after bid_data has been split:
obj = {
    "bid_data_0": bid_data[0],
    "bid_data_5": bid_data[5],
    "bid_data_6": bid_data[6],
    "bid_data_10": bid_data[10],
    "bid_data_12": bid_data[12].strip(),
    "bid_data_17": bid_data[17],
}
records.append(obj)
import pandas as pd
df = pd.DataFrame(records)
df.to_csv("file_name.csv", index=True, encoding='utf-8')
It is the simplest method I have ever used for exporting data to CSV.
Let me know if you encounter any problems.
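For completeness, here is how the pieces might slot into the original loop (a sketch only; the bid_data indexes are taken from the question and may need adjusting for the live page):
import requests
import pandas as pd
from urllib3.exceptions import InsecureRequestWarning
from bs4 import BeautifulSoup as bs

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def scrape_bid_data():
    records = []
    page_no = 1
    while True:
        URL = 'https://bidplus.gem.gov.in/bidlists?bidlists&page_no=' + str(page_no)
        soup_data = bs(requests.get(URL, verify=False).text, 'lxml')
        extracted_data = soup_data.find('div', {'id': 'pagi_content'})
        if extracted_data is None or len(extracted_data) == 0:
            break
        for idx in range(len(extracted_data)):
            if idx % 2 == 1:
                bid_data = extracted_data.contents[idx].text.strip().split('\n')
                records.append({
                    "bid_data_0": bid_data[0],
                    "bid_data_5": bid_data[5],
                    "bid_data_6": bid_data[6],
                    "bid_data_10": bid_data[10],
                    "bid_data_12": bid_data[12].strip(),
                    "bid_data_17": bid_data[17],
                })
        page_no += 1
    pd.DataFrame(records).to_csv("file_name.csv", index=True, encoding='utf-8')

scrape_bid_data()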

While loop page iteration for Meetup API not working

I'm trying to iterate through the pages of this Meetup API but I am receiving an error:
url = 'https://api.meetup.com/2/groups?offset=1&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3'
while url:
    data = requests.get(url).json()
    url2 = data['meta'].get('next')
    data2 = pd.io.json.json_normalize(data['results'])
    print(data2)
However, when I write it as:
while url:
    data = requests.get(url).json()
    print(data)
    url2 = data['meta'].get('next')
    data2 = pd.io.json.json_normalize(data['results'])
It comes out as a list that keeps iterating itself, but I don't know if it's looping through the same page or not.
I also need to use this ["offset"] += 1 somehow but don't know where to place it.
There is also a parameter page that you can use in your API call:
page = 1
url = '<base_url>&page=%d'
while page < 590:
    new_url = url % page
    # fetch new_url and do your magic
    ...
    page += 1
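Alternatively, the Meetup response itself carries the next-page link in meta, so you can follow that until it runs out (a sketch building on the question's own code; meta.next is assumed to be empty once the last page is reached):
import requests
import pandas as pd

frames = []
url = 'https://api.meetup.com/2/groups?offset=0&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3'
while url:
    data = requests.get(url).json()
    frames.append(pd.io.json.json_normalize(data['results']))
    url = data['meta'].get('next')  # falsy when there is no next page, which ends the loop

all_pages = pd.concat(frames, ignore_index=True)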

Python Avoid Nested For Loop

I am new to Python programming and I am getting my hands dirty by working on a pet project.
I have tried a lot to avoid these nested for loops, but with no success. I've looked at:
Avoiding nested for loops
Returns values from a for loop in python
import requests
import json
r = requests.get('https://api.coinmarketcap.com/v1/ticker/')
j = r.json()
for item in j:
    n = item['id']
    url = 'https://api.coinmarketcap.com/v1/ticker/%s' % n
    req = requests.get(url)
    js = req.json()
    for cool in js:
        print n
        print cool['rank']
Please let me know if more information is needed.
Question
I have too many loops in loops and want a Python way of cleaning it up.
Answer
Yes, there is a Python way of cleaning up loops-in-loops to make them look better, but there will still be loops-in-loops under the covers.
import requests
import json
r = requests.get('https://api.coinmarketcap.com/v1/ticker/')
j = r.json()
id_list = [item['id'] for item in j]
for n in id_list:
    url = 'https://api.coinmarketcap.com/v1/ticker/%s' % n
    req = requests.get(url)
    js = req.json()
    print "\n".join([n + "\n" + item['rank'] for item in js])
Insight from running this
After running this specific code, I realize that you are actually first retrieving the list of tickers in order of rank using
r = requests.get('https://api.coinmarketcap.com/v1/ticker/')
and then using
url = 'https://api.coinmarketcap.com/v1/ticker/%s' %n
to get the rank.
So long as https://api.coinmarketcap.com/v1/ticker/ continues to return the items in order of rank, you could simplify your code like so:
import requests
import json
r = requests.get('https://api.coinmarketcap.com/v1/ticker/')
j = r.json()
id_list = [item['id'] for item in j]
result = zip(id_list, range(1, len(id_list) + 1))
for item in result:
    print item[0]
    print item[1]
Answer to additional question
Additional question: What if I want one more parameter, say price_usd? ..... for cool in js: print n print cool['rank'] print cool['price_usd']
Answer:
change the line
print "\n".join([ n+"\n"+item['rank'] for item in js ])
to
print "\n".join([ n+"\n"+item['rank']+"\n"+cool['price_usd'] for item in js ])
Your first request already gets you everything you need.
import requests
import json
response = requests.get('https://api.coinmarketcap.com/v1/ticker/')
coin_data = response.json()
for coin in coin_data:
    print coin['id']         # "bitcoin", "ethereum", ...
    print coin['rank']       # "1", "2", ...
    print coin['price_usd']  # "2834.75", "276.495", ...
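These snippets use Python 2 print statements; in Python 3 the same loop would be (a direct translation):
import requests

response = requests.get('https://api.coinmarketcap.com/v1/ticker/')
coin_data = response.json()
for coin in coin_data:
    print(coin['id'], coin['rank'], coin['price_usd'])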
