I'm relatively new to the Facebook Graph API.
I'm the owner of a business page, and I want to get some information about the fans (their interests, i.e. the pages they like).
How can I achieve that?
This is what I have written so far:
api_endpoint = "https://graph.facebook.com/v2.10/"
page_id = "1627395454223846"
node='/'+ page_id + '/insights/page_impressions'
url = api_endpoint+node
Now I create the graph object:
graph = facebook.GraphAPI(access_token=access["token"], version="2.10")
I had in mind to use requests, but how? Do I have to use it together with the graph object?
Thanks
To be honest, I think using requests directly is preferable here.
import requests
payload = {'access_token': access['token']}
api_endpoint = 'https://graph.facebook.com/v2.10'
page_id = '1627395454223846'
url = '{}/{}/insights/page_impressions'.format(api_endpoint, page_id)
resp = requests.get(url, params=payload)
# ...process the response data as you wish...
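For example, here is a minimal sketch of unpacking the JSON response (the Graph API returns insights metrics in a top-level 'data' list; error handling is left to you):
resp.raise_for_status()  # fail early on HTTP errors
for metric in resp.json().get('data', []):
    # each entry carries one metric's name, period and values
    print(metric['name'], metric['values'])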
I have a post on my FB page which I need to update several times a day with data produced by a Python script. I tried using Selenium, but it often gets stuck when saving the post, and then the script gets stuck too, so I'm trying to find a way to do the job within Python itself, without using a web browser.
I wonder, is there a way to edit an FB post using a Python library such as Facepy or similar?
I'm reading the Graph API reference, but there are no examples to learn from; I guess the first thing is to set up the login. On the Facepy GitHub page it is written that:
note that Facepy does not do authentication with Facebook; it only consumes its API. To get an access token to consume the API on behalf of a user, use a suitable OAuth library for your platform
I tried logging in with requests and BeautifulSoup:
from bs4 import BeautifulSoup
import requests
import re

def facebook_login(mail, pwd):
    session = requests.Session()
    r = session.get('https://www.facebook.com/', allow_redirects=False)
    soup = BeautifulSoup(r.text, 'html.parser')
    action_url = soup.find('form', id='login_form')['action']
    inputs = soup.find('form', id='login_form').findAll('input', {'type': ['hidden', 'submit']})
    post_data = {input.get('name'): input.get('value') for input in inputs}
    post_data['email'] = mail
    post_data['pass'] = pwd.upper()
    scripts = soup.findAll('script')
    scripts_string = '/n/'.join([script.text for script in scripts])
    datr_search = re.search(r'\["_js_datr","([^"]*)"', scripts_string, re.DOTALL)
    if datr_search:
        datr = datr_search.group(1)
        cookies = {'_js_datr': datr}
    else:
        return False
    return session.post(action_url, data=post_data, cookies=cookies, allow_redirects=False)

facebook_login('email', 'psw')
facebook_login('email', 'psw')
but it gives this error
action_url = soup.find('form', id='login_form')['action']
TypeError: 'NoneType' object is not subscriptable
I also tried with Mechanize
import mechanize

username = 'email'
password = 'psw'
url = 'http://facebook.com/login'

print("opening browser")
br = mechanize.Browser()
print("opening url...please wait")
br.open(url)
print(br.title())
print("selecting form")
br.select_form(name='Login')
br['UserID'] = username
br['PassPhrase'] = password
print("submitting form")
response = br.submit()
pageSource = response.read()
but it gives an error too
mechanize._response.httperror_seek_wrapper: HTTP Error 403: b'request disallowed by robots.txt'
Install the facebook-sdk package:
pip install facebook-sdk
then, to update/edit a post on your page, just run:
import facebook

page_token = '...'  # a Page access token for your page
page_id = '...'
post_id = '...'

fb = facebook.GraphAPI(access_token=page_token, version="2.12")
# writing to the {page_id}_{post_id} node with a new 'message' edits the post
fb.put_object(parent_object=page_id + '_' + post_id, connection_name='', message='new text')
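To verify that the edit took effect, you can read the post back with the same package (a small sketch; get_object and the fields parameter are standard facebook-sdk / Graph API usage):
post = fb.get_object(id=page_id + '_' + post_id, fields='message')
print(post['message'])  # should now print 'new text'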
I use this API:
https://www.blockchain.com/api/q
I'm trying to make a GET request:
url = 'https://www.blockchain.info/api/q/getreceivedbyaddress/' + strpod + '?confirmations=6'
zapros = requests.get(url)
But it returns the entire page, and I only need the balance value. Please help me.
import requests
address = "17LREmmnmTxCoFZ59wfhg4S639GsPqjTRT"
URL = "https://blockchain.info/q/getreceivedbyaddress/"+address+"?confirmations=6"
r = requests.get(url = URL)
# extracting the balance (in satoshi)
bt_balance = r.json()
The API link above is correct; just try it with your own blockchain address.
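Note that the /q/getreceivedbyaddress endpoint reports the amount in satoshi, so convert if you need BTC:
btc_balance = bt_balance / 1e8  # 1 BTC = 100,000,000 satoshi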
I want to download all the PDF docs corresponding to a list of "API#" values from http://imaging.occeweb.com/imaging/UIC1012_1075.aspx
So far I have managed to POST the "API#" request, but I'm not sure what to do next.
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'http://imaging.occeweb.com/imaging/UIC1012_1075.aspx'
API = '15335187'
payload = {'txtIndex7':'1','txtIndex2': API}
session = requests.Session()
res = session.post(url,headers=headers,data=payload)
It is a bit more complicated than that: there are additional event-validation hidden input fields that you need to take into account. You first need to GET the page, collect all the hidden values, set the value for the API, and only then make the POST request, followed by parsing of the HTML response; a rough sketch of that manual route is below.
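For reference, here is that manual approach with requests and BeautifulSoup (the hidden-field names like __VIEWSTATE and the button value are assumptions about this ASP.NET page; untested):
import requests
from bs4 import BeautifulSoup

url = 'http://imaging.occeweb.com/imaging/UIC1012_1075.aspx'
headers = {'User-Agent': 'Mozilla/5.0'}

session = requests.Session()
soup = BeautifulSoup(session.get(url, headers=headers).text, 'html.parser')

# collect every named hidden input (__VIEWSTATE, __EVENTVALIDATION, ...) from the page
payload = {i['name']: i.get('value', '') for i in soup.select('input[type=hidden]') if i.get('name')}
payload.update({'txtIndex7': '1', 'txtIndex2': '15335187', 'Button1': 'Search'})  # button value is a guess

res = session.post(url, headers=headers, data=payload)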
Fortunately, there is a tool called MechanicalSoup that can auto-fill these hidden fields in your form-submission request. Here is a complete solution, including sample code for parsing the resulting table:
import mechanicalsoup

url = 'http://imaging.occeweb.com/imaging/UIC1012_1075.aspx'
API = '15335187'

browser = mechanicalsoup.StatefulBrowser(
    user_agent='Mozilla/5.0'
)
browser.open(url)

# Fill in the search form
browser.select_form('form#Form1')
browser["txtIndex2"] = API
browser.submit_selected("Button1")

# Display the results
for tr in browser.get_current_page().select('table#DataGrid1 tr'):
    print([td.get_text() for td in tr.find_all("td")])
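The same approach can be extended to also download each PDF linked in the results table: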
import mechanicalsoup
import urllib.request  # on Python 2.7 this was plain urllib

url = 'http://imaging.occeweb.com/imaging/UIC1012_1075.aspx'
Form = '1012'
API = '15335187'

browser = mechanicalsoup.StatefulBrowser(
    user_agent='Mozilla/5.0'
)
browser.open(url)

# Fill in the search form
browser.select_form('form#Form1')
browser["txtIndex7"] = Form
browser["txtIndex2"] = API
browser.submit_selected("Button1")

# Download every PDF linked in the results table, skipping the two header rows
for tr in browser.get_current_page().select('table#DataGrid1 tr')[2:]:
    try:
        pdf_url = tr.select('td')[0].find('a').get('href')
    except AttributeError:  # this row has no link
        print('Pdf not found')
    else:
        pdf_id = tr.select('td')[0].text
        response = urllib.request.urlopen(pdf_url)  # urllib.urlopen() on Python 2.7
        pdf_path = "C:\\Data\\" + pdf_id + ".pdf"
        with open(pdf_path, 'wb') as f:
            f.write(response.read())
        print('Pdf ' + pdf_id + ' saved')
My goal is to create an authenticated session on GitHub so I can use the advanced search (which limits functionality for non-authenticated users). Currently, my POST request gets a web page response of "What? Your browser did something unexpected. Please contact us if the problem persists."
Here is the code I am using to try to accomplish my task:
import requests
from lxml import html
s = requests.Session()
payload = (username, password)
_ = s.get('https://www.github.com/login')
p = s.post('https://www.github.com/login', auth=payload)
url = "https://github.com/search?l=&p=0&q=language%3APython+extension%3A.py+sklearn&ref=advsearch&type=Code"
r = s.get(url, auth=payload)
text = r.text
tree = html.fromstring(text)
Is what I'm trying possible? I would prefer not to use the GitHub v3 API, since it is rate-limited and I wanted to do more of my own scraping of the advanced search. Thanks.
As mentioned in the comments, GitHub uses POST data for authentication, so you should put your credentials in the data parameter.
The elements you have to submit are 'login', 'password', and 'authenticity_token'. The value of 'authenticity_token' is dynamic, but you can scrape it from '/login'.
Finally, submit the data to /session and you should have an authenticated session.
import requests
from lxml import html

s = requests.Session()
r = s.get('https://www.github.com/login')
tree = html.fromstring(r.content)
# collect all form inputs, including the dynamic authenticity_token
data = {i.get('name'): i.get('value') for i in tree.cssselect('input')}
data['login'] = username
data['password'] = password
r = s.post('https://github.com/session', data=data)
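A quick sanity check, then reuse of the authenticated session for the advanced search from your question (the redirect check is an assumption about GitHub's login flow):
# a successful login redirects away from /session
print(r.url)

# reuse the session for the advanced code search
url = "https://github.com/search?l=&p=0&q=language%3APython+extension%3A.py+sklearn&ref=advsearch&type=Code"
r = s.get(url)
tree = html.fromstring(r.text)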
All I want to do is send a request to the 511 API and return the train times from the train station. I can do that using the full URL request, but I would like to be able to set values without pasting together a string and then sending it, so that I can have the API return the train times for different stations. I see other requests that use headers, but I don't know how to use headers with a request and am confused by the documentation.
This works:
urllib2.Request("http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?token=xxxx-xxx&stopcode=70142")
response = urllib2.urlopen(request)
the_page = response.read()
I want to be able to set values like this...
token = 'xxx-xxxx'
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"
... and then put them together like this...
urllib2.Request(url, token, stopcode)
and get the same result.
The string formatting documentation would be a good place to start to learn more about different ways to plug in values.
val1 = 'test'
val2 = 'test2'
url = "https://www.example.com/{0}/blah/{1}".format(val1, val2)
urllib2.Request(url)
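Applied to your 511 URL, that would look something like:
token = 'xxx-xxxx'
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?token={0}&stopcode={1}".format(token, stopcode)
request = urllib2.Request(url)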
The missing piece is that urllib needs to be used along with urllib2. Specifically, the function urllib.urlencode() returns the encoded versions of the values.
From the urllib documentation:
import urllib
query_args = { 'q':'query string', 'foo':'bar' }
encoded_args = urllib.urlencode(query_args)
print 'Encoded:', encoded_args
url = 'http://localhost:8080/?' + encoded_args
print urllib.urlopen(url).read()
So the corrected code is as follows:
import urllib
import urllib2

token = 'xxx-xxxx'
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"

query_args = {"token": token, "stopcode": stopcode}
encoded_args = urllib.urlencode(query_args)

# url already ends with '?', so the encoded args can be appended directly
request = urllib2.Request(url + encoded_args)
response = urllib2.urlopen(request)
print(response.read())
Actually, it is a million times easier to use the requests package instead of urllib and urllib2. All the code above can be replaced with this:
import requests

token = 'xxx-xxxx'
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx"

# requests builds the query string itself, so no trailing '?' is needed
query_args = {"token": token, "stopcode": stopcode}
r = requests.get(url, params=query_args)
print(r.text)
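If the endpoint returns XML, as .aspx service endpoints of that era commonly did (an assumption here), a minimal parsing sketch using the standard library:
import xml.etree.ElementTree as ET

root = ET.fromstring(r.text)
# walk the tree and print every element's tag and attributes
for elem in root.iter():
    print(elem.tag, elem.attrib)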