I'm requesting 590 pages from the Meetup API. I've iterated with a while loop to build the page URLs. Now I need to request these pages and format the responses correctly in Python so I can place them into a Pandas DataFrame.
This is how it looks when you do it for one URL:
url = ('https://api.meetup.com/2/groups?offset=1&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3')
r = requests.get(url).json()
data = pd.io.json.json_normalize(r['results'])
But because I have so many pages I want to do this automatically and iterate through them all.
That's how nested while loops came to mind and this is what I tried:
urls = 0
offset = 0
url = 'https://api.meetup.com/2/groups?offset=%d&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3'
r = requests.get(urls%d = 'https://api.meetup.com/2/groups?offset=%d&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3').json()
while urlx < 591:
    new_url = r % urls % offset
    print(new_url)
    offset += 1
However, it isn't working and I'm receiving many errors including this one:
SyntaxError: keyword can't be an expression
I'm not sure exactly what you're trying to do, and the code has several issues; the SyntaxError, for instance, comes from passing urls%d = '...' as a keyword argument to requests.get(), and a keyword name can't be an expression.
But if you just want to loop through offsets 0 to 591 and fetch the URLs, here's the code:
import requests
import pandas as pd
dfs = []
base_url = 'https://api.meetup.com/2/groups?offset=%d&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3'
for i in range(0, 592):
    url = base_url % i
    r = requests.get(url).json()
    print("Fetching URL: %s\n" % url)
    # do something with r here
    # here I'll append it to a list of dfs
    dfs.append(pd.io.json.json_normalize(r['results']))
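Once the loop finishes you can stitch the per-page frames into a single DataFrame with pd.concat:
df = pd.concat(dfs, ignore_index=True)
print(df.shape)  # total rows across all pages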
While using the snippet below, it is not returning the values of Page, Total Pages, and data.
It is also not returning the value of the function getMovieTitles.
import request
import json
def getMovieTitles(substr):
    titles = []
    url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}'.format(substr)"
    data = requests.get(url)
    print(data)
    response = json.loads(data.content.decode('utf-8'))
    print(data.content)
    for page in range(0, response['total_pages']):
        page_response = requests.get("https://jsonmock.hackerrank.com/api/movies/search/?Title={}}&page={}".format(substr, page + 1))
        page_content = json.loads(page_response.content.decode('utf-8'))
        print('page_content', page_content, 'type(page_content)', type(page_content))
        for item in range(0, len(page_content['data'])):
            titles.append(str(page_content['data'][item]['Title']))
    titles.sort()
    return titles
print(getMovieTitles('Superman'))
You're not formatting the url string correctly.
url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}'.format(substr)"
format() is a string method, but you've put the call inside the URL string itself. Instead do:
url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}".format(substr)
First, fix the import: the module is requests, not request.
import requests
The other problem is in your string formatting: you wrote ' where " belonged, so the string never ends before the .format(substr) call. It should be:
url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}".format(substr)
And there is one } too many in the paginated URL; it should be:
page_response = requests.get("https://jsonmock.hackerrank.com/api/movies/search/?Title={}&page={}".format(substr, page + 1))
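Putting those fixes together (including the corrected import), a working version of the function might look like this:
import requests
import json
def getMovieTitles(substr):
    titles = []
    url = "https://jsonmock.hackerrank.com/api/movies/search/?Title={}".format(substr)
    response = json.loads(requests.get(url).content.decode('utf-8'))
    for page in range(0, response['total_pages']):
        page_response = requests.get("https://jsonmock.hackerrank.com/api/movies/search/?Title={}&page={}".format(substr, page + 1))
        page_content = json.loads(page_response.content.decode('utf-8'))
        for item in range(0, len(page_content['data'])):
            titles.append(str(page_content['data'][item]['Title']))
    titles.sort()
    return titles
print(getMovieTitles('Superman'))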
I'm trying to retrieve JSON data from https://www.fruityvice.com/api/fruit/.
So I'm creating a function to do that, but as a return I get only one fruit.
import requests
import json
def scrape_all_fruits():
    for ID in range(1, 10):
        url = f'https://www.fruityvice.com/api/fruit/{ID}'
        response = requests.get(url)
        data = response.json()
        return data
print(scrape_all_fruits())
Can anyone explain to me why?
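The likely cause: return exits the function on the first pass through the loop, so only the first fruit ever comes back. Collecting the results in a list and returning it after the loop would fix that; a minimal sketch:
import requests
def scrape_all_fruits():
    fruits = []
    for ID in range(1, 10):
        response = requests.get(f'https://www.fruityvice.com/api/fruit/{ID}')
        fruits.append(response.json())  # accumulate instead of returning early
    return fruits  # return once, after the loop
print(scrape_all_fruits())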
I have web-scraped data with BeautifulSoup and am printing it. Now I want to export it to Excel/CSV. I am new to Python and need help: there are multiple pages that I have scraped and I need to export them all to CSV/Excel. My program is below.
import requests
from urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
from bs4 import BeautifulSoup as bs
def scrape_bid_data():
    page_no = 1  # initial page number
    while True:
        print('Hold on, creating URL to fetch data...')
        URL = 'https://bidplus.gem.gov.in/bidlists?bidlists&page_no=' + str(page_no)  # create dynamic URL
        print('URL created: ' + URL)
        scraped_data = requests.get(URL, verify=False)  # request the page
        soup_data = bs(scraped_data.text, 'lxml')  # parse the scraped data using lxml
        extracted_data = soup_data.find('div', {'id': 'pagi_content'})  # find the div which contains the required data
        if len(extracted_data) == 0:  # if it is empty, stop further execution of the script
            break
        else:
            for idx in range(len(extracted_data)):  # loop through the children and extract and print data
                if idx % 2 == 1:  # required data sits on odd indexes only
                    bid_data = extracted_data.contents[idx].text.strip().split('\n')
                    print('-' * 100)
                    print(bid_data[0])  # BID number
                    print(bid_data[5])  # Items
                    print(bid_data[6])  # Quantity required
                    print(bid_data[10] + bid_data[12].strip())  # Department name and address
                    print(bid_data[16])  # Start date
                    print(bid_data[17])  # End date
                    print('-' * 100)
        page_no += 1  # increment the page number by 1
scrape_bid_data()
The data comes out in the form shown below:
You can use pandas. First install it:
pip install pandas
Then build one dict per bid and collect them in a list:
rows = []
for bid_data in all_bid_data:  # all_bid_data: the bid_data lists collected from the scraper above (name is illustrative)
    obj = {
        "bid_data_0": bid_data[0],
        "bid_data_5": bid_data[5],
        "bid_data_6": bid_data[6],
        "bid_data_10": bid_data[10],
        "bid_data_12": bid_data[12].strip(),
        "bid_data_17": bid_data[17],
    }
    rows.append(obj)
That turns each bid_data list into a dict, keeping only the required fields. Then build the DataFrame and write it out:
import pandas as pd
df = pd.DataFrame(rows)
df.to_csv("file_name.csv", index=True, encoding='utf-8')
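To wire this into the scraper, one option (a sketch with illustrative column names, not tested against the live site) is to append a dict per bid inside the loop instead of printing, then dump everything once at the end:
rows = []  # collected across all pages
# inside the while loop, in place of the print block:
rows.append({
    "bid_number": bid_data[0],
    "items": bid_data[5],
    "quantity": bid_data[6],
    "department_address": bid_data[10] + bid_data[12].strip(),
    "start_date": bid_data[16],
    "end_date": bid_data[17],
})
# after the loop finishes:
pd.DataFrame(rows).to_csv("bids.csv", index=False, encoding='utf-8')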
It is the simplest method I have ever used for exporting data to CSV.
Let me know if you encounter any problems.
I'm trying to iterate through the pages of this Meetup API but I am receiving an error:
url = 'https://api.meetup.com/2/groups?offset=1&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3'
while url:
    data = requests.get(url).json()
    url2 = data['meta'].get('next')
    data2 = pd.io.json.json_normalize(data['results'])
    print(data2)
However, when I write it as:
while url:
    data = requests.get(url).json()
    print(data)
    url2 = data['meta'].get('next')
    data2 = pd.io.json.json_normalize(data['results'])
It comes out as a list that keeps iterating itself, but I don't know whether it's looping through the same page or not.
I also need to use ["offset"] += 1 somehow, but I don't know where to place it.
There is also a page parameter that you can use in your API call.
page = 1
url = '<base_url>&page=%d'
while page < 590:
    new_url = url % page
    # fetch new_url and do your magic
    ...
    page += 1
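Alternatively, since the response already carries a meta.next link (as in your own snippet), you can follow that instead of counting pages yourself. A minimal sketch, assuming meta.next is empty once the last page is reached:
import requests
import pandas as pd
frames = []
url = 'https://api.meetup.com/2/groups?offset=1&format=json&category_id=34&photo-host=public&page=100&radius=200.0&fields=&order=id&desc=false&sig_id=243750775&sig=768bcf78d9c73937fcf2f5d41fe6070424f8d0e3'
while url:
    data = requests.get(url).json()
    frames.append(pd.io.json.json_normalize(data['results']))
    url = data['meta'].get('next')  # falsy when there are no more pages, ending the loop
df = pd.concat(frames, ignore_index=True)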
I am new to Python programming and I am getting my hands dirty by working on a pet project. I have tried a lot to avoid these nested for loops, but with no success. I have already looked at these questions:
Avoiding nested for loops
Returns values from a for loop in python
import requests
import json
r = requests.get('https://api.coinmarketcap.com/v1/ticker/')
j = r.json()
for item in j:
    n = item['id']
    url = 'https://api.coinmarketcap.com/v1/ticker/%s' % n
    req = requests.get(url)
    js = req.json()
    for cool in js:
        print n
        print cool['rank']
Please let me know if more information is needed.
Question
I have too many loops within loops and want a Pythonic way of cleaning them up
Answer
Yes, there is a Pythonic way of cleaning up loops-in-loops to make them look better, but there will still be loops-in-loops under the covers.
import requests
import json
r = requests.get('https://api.coinmarketcap.com/v1/ticker/')
j = r.json()
id_list = [item['id'] for item in j]
for n in id_list:
    url = 'https://api.coinmarketcap.com/v1/ticker/%s' % n
    req = requests.get(url)
    js = req.json()
    print "\n".join([n + "\n" + item['rank'] for item in js])
Insight from running this
After running this specific code, I realize that you are actually first retrieving the list of tickers in order of rank using
r = requests.get('https://api.coinmarketcap.com/v1/ticker/')
and then using
url = 'https://api.coinmarketcap.com/v1/ticker/%s' %n
to get the rank.
So long as https://api.coinmarketcap.com/v1/ticker/ continues to return the items in order of rank, you can simplify your code like so:
import requests
import json
r = requests.get('https://api.coinmarketcap.com/v1/ticker/')
j = r.json()
id_list = [item['id'] for item in j]
result = zip(id_list, range(1, len(id_list) + 1))
for item in result:
    print item[0]
    print item[1]
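As a side note, enumerate expresses the same rank pairing a little more directly than zip plus range:
for rank, coin_id in enumerate(id_list, 1):
    print coin_id  # e.g. "bitcoin"
    print rank     # 1, 2, ...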
Answer to the follow-up question
Follow-up question: What if I want one more parameter, say price_usd? ... for cool in js: print n; print cool['rank']; print cool['price_usd']
Answer:
change the line
print "\n".join([ n+"\n"+item['rank'] for item in js ])
to
print "\n".join([ n+"\n"+item['rank']+"\n"+cool['price_usd'] for item in js ])
Your first request already gets you everything you need.
import requests
import json
response = requests.get('https://api.coinmarketcap.com/v1/ticker/')
coin_data = response.json()
for coin in coin_data:
    print coin['id']         # "bitcoin", "ethereum", ...
    print coin['rank']       # "1", "2", ...
    print coin['price_usd']  # "2834.75", "276.495", ...
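As with the Meetup example earlier, this list of dicts also drops straight into pandas if you'd rather have tabular output; a minimal sketch:
import requests
import pandas as pd
coin_data = requests.get('https://api.coinmarketcap.com/v1/ticker/').json()
df = pd.DataFrame(coin_data)
print(df[['id', 'rank', 'price_usd']].head())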