How to get all of a Facebook page's feed with the Graph API? - python

Using Python:
from facepy import GraphAPI

key = 'My Access_Token'
graph = GraphAPI(key)
response = graph.get('Page_ID/posts')
print(response)
If I run this code, it prints the data as JSON, but only a limited amount. I want to get years of feed from my page.
The response includes a "next" URL; if I paste that URL into a browser I can access more data, and each request fetches at most 200 posts.
How can I get the data in bulk? I want to collect five years of posts from a public page!

Facebook's Graph API pages its responses:
If you query v2.5/{page-id}/posts, the result contains a data array and beneath it a paging section. That section gives you "previous" and "next" query URLs, which you just have to call again and again. The API won't give you more than 100 results per "page".
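A minimal sketch of that loop: keep following paging.next until it disappears. fetch_all_posts and its get parameter are illustrative names; in real use you would pass requests.get (facepy may also paginate for you), plus your access token in the first URL:

```python
def fetch_all_posts(first_url, get):
    """Follow the Graph API's paging.next links until no next page remains.

    `get` is any callable like requests.get that returns an object with a
    .json() method; it is injected here so the loop can be tested offline.
    """
    posts = []
    url = first_url
    while url:
        payload = get(url).json()
        posts.extend(payload.get('data', []))
        # 'paging'/'next' are absent on the last page, which ends the loop
        url = payload.get('paging', {}).get('next')
    return posts
```

In practice you would call fetch_all_posts('https://graph.facebook.com/v2.5/{page-id}/posts?access_token=...', requests.get); mind Facebook's rate limits when pulling five years of posts.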

Related

Getting few results when trying to authenticate and search repositories with the GitHub API using Python

I'm trying to search for JavaScript repositories using Python and the GitHub API, and put the links to the repositories in a file.
import requests
from pprint import pprint

username = #my username here!
token = #my token here!

user_data = requests.get("https://api.github.com/search/repositories?q=language:js&sort=stars&order=desc", auth=(username, token)).json()

headers = {'Authorization': 'token ' + token}
login = requests.get('https://api.github.com/user', headers=headers)
print(login.json())

f = open("snapshotJS.txt", "w")
for userKeys in user_data.keys():
    if userKeys == "items":
        for item in user_data[userKeys]:
            for lines in item:
                if lines == "html_url":
                    print(item.get(lines))
                    f.write(item.get(lines) + "\n")
f.close()
When I run the code, I only get 30 links in my text file each time (granted, they're different links every run). How can I get more than 30 at a time? Since I have a personal token, shouldn't I be able to make up to 5000 requests?
Sorry if it's something small I'm missing; I'm new to APIs!
The GitHub API returns 30 entries per page when the page size is not specified.
Requests that return multiple items will be paginated to 30 items by default. You can specify further pages with the page parameter. For some resources, you can also set a custom page size up to 100 with the per_page parameter.
To get more records per page, use the per_page query parameter.
To get all the records, keep fetching pages in a loop until no results are left.
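A sketch of that loop, combining per_page and page with the search call from the question. search_repo_urls and max_pages are made-up names, and the get parameter defaults to requests.get only when you actually hit the API (note the search API serves at most 1000 results in total regardless of paging):

```python
def search_repo_urls(token, query="language:js", per_page=100, max_pages=3, get=None):
    """Page through GitHub's repository search, collecting html_url values."""
    if get is None:
        import requests  # only needed for real API calls
        get = requests.get
    headers = {'Authorization': 'token ' + token}
    urls = []
    for page in range(1, max_pages + 1):
        data = get('https://api.github.com/search/repositories',
                   params={'q': query, 'sort': 'stars', 'order': 'desc',
                           'per_page': per_page, 'page': page},
                   headers=headers).json()
        items = data.get('items', [])
        urls.extend(item['html_url'] for item in items)
        if len(items) < per_page:  # a short page means we've hit the end
            break
    return urls
```

You could then write the returned list to snapshotJS.txt exactly as in the original code.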

Send a POST request to a website with multiple form tags using requests in Python

Good evening,
I'm trying to write a program that extracts the sell price of certain stocks and shares on a website called hl.co.uk.
As you can imagine, you have to search for the stock whose sale price you want to see.
my code so far is as follows:
import requests
from bs4 import BeautifulSoup as soup
url = "https://www.hl.co.uk/shares"
page = requests.get(url)
parsed_html = soup(page.content, 'html.parser')
form = parsed_html.find('form', id="stock_search")
input_tag = form.find('input').get('name')
submit = form.find('input', id="stock_search_submit").get('alt')
post_data = {input_tag: "fgt", "alt": submit}
I have been able to extract the correct form tag and the input names I require, but the website has multiple forms on this page.
How can I submit a POST request to this website using the data I have in "post_data", targeting that specific form, so that it searches for the stock/share I want and gives me the next page?
Thanks in advance.
Actually, when you submit the form from the homepage, it redirects you to the target page with a URL looking like this: "https://www.hl.co.uk/shares/search-for-investments?stock_search_input=abc&x=56&y=35&category_list=CEHGINOPW". So in my opinion, instead of submitting the homepage form, you should call the target page directly with your own GET parameters; the URL you're supposed to call will look like https://www.hl.co.uk/shares/search-for-investments?stock_search_input=[your_keywords].
Hope this helps.
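Building that URL with urlencode keeps the keyword properly escaped; this is just a sketch of the suggestion above, using the search term from the question:

```python
from urllib.parse import urlencode

base = "https://www.hl.co.uk/shares/search-for-investments"
params = {"stock_search_input": "fgt"}  # the keyword you'd otherwise type into the form
url = f"{base}?{urlencode(params)}"
print(url)
# then: requests.get(url) and parse the result page with BeautifulSoup
```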
This is a pretty general problem which you can use google chrome's devtools to solve. Basically,
1- Navigate to the page where you have the form and its fields.
2- Then choose the XHR tab under the Network tab, which filters out all Fetch and XHR requests. These requests are generally sent after a form submission, and most of the time they return a JSON response with the resulting data.
3- Make sure you enable the Preserve log checkbox at the top left so the list doesn't refresh when the form is submitted.
4- Submit the form; you'll see a bunch of requests being made. Inspect them to hopefully find what you're looking for.
In this case I found this URL endpoint which gives out the results as response.
https://www.hl.co.uk/ajax/funds/fund-search/search?investment=&companyid=1324&sectorid=132&wealth=&unitTypePref=&tracker=&payment_frequency=&payment_type=&yield=&standard_ocf=&perf12m=&perf36m=&perf60m=&fund_size=&num_holdings=&start=0&rpp=20&lo=0&sort=fd.full_description&sort_dir=asc&
You can see all the query parameters here, such as companyid and sectorid; what you need to do is change those and make a request to the URL. Then you'll get the relevant information.
To retrieve those companyid and sectorid values, you can send a GET request to the page https://www.hl.co.uk/shares/search-for-investments?stock_search_input=ftg&x=17&y=23&category_list=CEHGINOPW, which has those dropdowns, and filter the HTML to find the values.
You can see this documentation for BS4 to find tags inside HTML source, https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find
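One way to try that endpoint without hand-building the long query string is to pass the parameters as a dict. The values below are only examples taken from the captured URL above; preparing the request first lets you inspect the final URL before sending:

```python
import requests

params = {
    'companyid': '1324',   # example values pulled from the captured URL
    'sectorid': '132',
    'start': 0,            # offset into the result set
    'rpp': 20,             # results per page
    'sort': 'fd.full_description',
    'sort_dir': 'asc',
}
req = requests.Request('GET', 'https://www.hl.co.uk/ajax/funds/fund-search/search',
                       params=params).prepare()
print(req.url)
# resp = requests.Session().send(req)  # uncomment to actually fetch the JSON
```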

Python POST requests - how to extract html of request destination

I'm scraping mortgage data from the official mortgage registry. The problem is that I can't extract the HTML of a particular document. Everything happens via POST: I have all the data required for a precise POST request, but when I print request.url it still shows me the welcome screen page, when it should retrieve the HTML of the particular document. All the data, like the mortgage number and current page, are listed in dev tools > Network > Form Data, so I bet it must be possible. I'm quite new to web programming in Python, so I will appreciate any help.
My code:
import requests
data = {
'kodWydzialu':'PT1R',
'nrKw':'00037314',
'cyfraK':'9',
}
r = requests.post('https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW', data=data)
print(r.url), print(r.content)
You are getting the welcome screen because you aren't sending all the requests required to view the next page.
Go to Chrome > Network tab, and you will see that when you click the submit/search button, a bunch of other GET requests are sent to different URLs after that first POST request.
You need to replicate that in your script. Depending on the website it can be tough to get the right response, so you should consider using Selenium.
That said, it's not impossible to do this with requests:
session = requests.Session()
You need to send the POST request, and all the GET requests that follow, in the same session.
data = {
    'kodWydzialu': 'PT1R',
    'nrKw': '00037314',
    'cyfraK': '9',
}
session.post(URL, headers=headers, params=data)

# Start sending the GET requests
session.get(URL_1, headers=headers)
session.get(URL_2, headers=headers)

BeautifulSoup table data extraction - data not showing up

I'm trying to read the Holders figure from an ethplorer.io token page with BeautifulSoup, but the value doesn't appear in the HTML I download.
As you yourself found out, the element is not present in the page source, and is loaded dynamically through an AJAX request. The urllib module (or requests) returns the page source, which is why you won't be able to get that value directly.
Go to Developer Tools > Network > XHR and refresh the page. You'll see an AJAX request made to this url:
https://ethplorer.io/service/service.php?data=0x8b353021189375591723e7384262f45709a3c3dc
This url returns the data in the form of JSON. If you have a look at it, you can get the Holders number from it using requests module and the built-in .json() method.
import requests
r = requests.get('https://ethplorer.io/service/service.php?data=0x8b353021189375591723e7384262f45709a3c3dc')
data = r.json()
holders = data['pager']['holders']['total']
print(holders)
# 2346

Post content as a page

I am trying to post a message to the wall of a business page. I followed these steps and everything works fine, except that the message is not published on the business wall as administrator.
graph = facebook.GraphAPI(access_token='xxx')
If I use graph.put_wall_post(message='test') I publish the text on my personal wall.
With the profile id of business page, graph.put_wall_post(message='test', profile_id='5537xx') I post something like Me > business page
If I try to create the app using the business page, I get the following error:
Users not logged into their personal account cannot access developers.facebook.com
How can I post the message as a text post directly to my business page without error?
You should get an access token for the Page; you are probably using an access token for your personal account.
As stated in the Graph API Docs, here and here
With the Pages API, people using your app can post to Facebook as a
Page (...)
Before your app can make calls to read, update, or post to Pages you need to get a page access token. With this token you can view Page settings, make updates to page information and manage a Page.
Therefore, you basically should get the token corresponding to your page
To get the Page access token for a single page, call the API endpoint /{page-id} using a user access token and asking for the field access_token. You need the permission pages_show_list or manage_pages to successfully execute this call.
And then make requests to post content, for instance, a message
To post text to a Page's feed, provide a message parameter with the text along with the Page ID:
POST https://graph.facebook.com/546349135390552/feed?message=Hello
On success, the Graph API responds with JSON containing the Page ID and the ID for the post:
{ "id": "546349135390552_1116689038356556" }
Read the links above and you'll have more information about it.
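Put together, the two calls quoted above can be sketched like this. post_as_page is a made-up helper name, and the get/post parameters exist only so the flow can be exercised without hitting the live API (in real use they default to requests.get and requests.post):

```python
def post_as_page(page_id, user_token, message, get=None, post=None):
    """Exchange a user access token for a Page token, then post to the feed.

    Requires the pages_show_list / manage_pages permissions mentioned above.
    """
    if get is None or post is None:
        import requests  # only needed for real Graph API calls
        get, post = requests.get, requests.post
    base = 'https://graph.facebook.com'
    # Step 1: ask /{page-id} for its access_token using the user token
    page = get(f'{base}/{page_id}',
               params={'fields': 'access_token',
                       'access_token': user_token}).json()
    # Step 2: post to the Page feed with the Page token, not the user token
    result = post(f'{base}/{page_id}/feed',
                  params={'message': message,
                          'access_token': page['access_token']}).json()
    return result['id']  # "{page-id}_{post-id}" on success
```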
