I'm new to Python and figured the best way to learn is by practice, so this is my first project.
There is a fantasy football website. My goal is to create a script which logs in to the site, automatically creates a preselected team, and submits it.
I have managed to get to the team-submission part.
When I add a team member, this data gets sent to the server:
https://i.gyazo.com/e7e6f82ca91e19a08d1522b93a55719b.png
When I press "save this list", this data gets sent:
https://i.gyazo.com/546d49d1f132eabc5e6f659acf7c929e.png
Code:
import requests

with requests.Session() as c:
    gameurl = 'here is link where data is sent'
    BPL = ['5388', '5596', '5481', '5587',
           '5585', '5514', '5099', '5249', '5566', '5501', '5357']
    GID = '168'
    UDID = '0'
    ACT = 'draft'
    ACT2 = 'save_draft'
    SIGN = '18852c5f48a94bf3ee58057ff5c016af'
    # eleven of those with different BPL since 11 players needed:
    c.get(gameurl)
    game_data = dict(player_id=BPL[0], action=ACT, id=GID)
    c.post(gameurl, data=game_data)
    # now I need to submit my list of selected players:
    game_data_save = dict(action=ACT2, id=GID, user_draft_id=UDID, sign=SIGN)
    c.post(gameurl, data=game_data_save)
This code works fine, but the problem is that 'SIGN' is unique for each individual game, and I have no idea how to get this value without using Chrome's inspect option.
How can I get this value simply by running Python code?
Because you said you can find it using devtools, I'm assuming SIGN is written somewhere in the DOM.
In that case you can use requests.get().text to get the HTML of the page and parse it with a tool like lxml or HTMLParser.
Solved: by posting all the data without 'SIGN', the response HTML contained the 'SIGN' value.
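For anyone landing here later, a minimal sketch of that approach. It assumes the sign value comes back in the response HTML as a hidden input named "sign" (hypothetical; adjust the parsing to whatever the actual markup looks like):

import re
import requests

with requests.Session() as c:
    gameurl = 'here is link where data is sent'
    c.get(gameurl)

    # Post the draft data without 'sign'; the server's response HTML contains it.
    draft_data = dict(player_id='5388', action='draft', id='168')
    resp = c.post(gameurl, data=draft_data)

    # Hypothetical: assume the value appears as <input name="sign" value="...">.
    match = re.search(r'name="sign"\s+value="([0-9a-f]+)"', resp.text)
    if match:
        sign = match.group(1)
        save_data = dict(action='save_draft', id='168', user_draft_id='0', sign=sign)
        c.post(gameurl, data=save_data)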
I am looking to find various statistics about players in games such as CS:GO from the Steam Web API, but cannot work out how to search through the JSON returned from the query (e.g. here) in Python.
I just need to be able to get a specific part of the list that is provided, e.g. finding total_kills from the link above. If I had a way to sort through all of the information provided and filter it down to just that specific thing (in this case total_kills), that would help a lot!
The code I have at the moment to turn it into something Python can read is:
url = "http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697&include_appinfo=1"
data = requests.get(url).text
data = json.loads(data)
If you are looking for a way to search through the stats list then try this:
import requests
import json

def findstat(data, stat_name):
    for stat in data['playerstats']['stats']:
        if stat['name'] == stat_name:
            return stat['value']

url = "http://api.steampowered.com/ISteamUserStats/GetUserStatsForGame/v0002/?appid=730&key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697"
data = requests.get(url).text
data = json.loads(data)

total_kills = findstat(data, 'total_kills')  # change 'total_kills' to your desired stat name
print(total_kills)
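Note that findstat returns None when the stat name isn't present in the response, so it can help to guard for that before using the value:

total_kills = findstat(data, 'total_kills')
if total_kills is None:
    print("Stat 'total_kills' not found for this player")
else:
    print(total_kills)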
I apologize for not being able to give out the specific URL I'm dealing with. I'm trying to extract some data from a certain site, but it's not organized well enough. However, they do have an "Export to CSV" button, and the HTML for that element is:
<input type="submit" name="ctl00$ContentPlaceHolder1$ExportValueCSVButton" value="Export to Value CSV" id="ContentPlaceHolder1_ExportValueCSVButton" class="smallbutton">
In this type of situation, what's the best way to go about grabbing that data when there is no specific URL to the CSV? I'm using Mechanize and BS4.
If you're able to click a button that downloads the data as a CSV, it sounds like you might be able to wget that link, save the file on your machine, and work with it there. I'm not sure if that's what you're getting at here though; any more details you can offer?
You should try Selenium. Selenium is a suite of tools to automate web browsers across many platforms; it can do a lot of things, including clicking buttons.
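For example, a minimal Selenium sketch (the page URL here is a placeholder since the real one wasn't shared; the button id is taken from the HTML snippet in the question):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # or webdriver.Chrome()
driver.get("http://example.com/the-report-page")  # placeholder URL

# Click the export button using the id from the question's HTML.
driver.find_element(By.ID, "ContentPlaceHolder1_ExportValueCSVButton").click()

# The CSV lands in the browser's default download folder; parse it from there
# (e.g. with the csv module) once the download finishes.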
Well, you need SOME starting URL to feed br.open() to even start the process.
It appears that you have an aspnetForm-type control there, and the code below MAY serve as a bit of a starting point, even though it does not work as-is (it's a work in progress... :-).
You'll need to look at the headers and parameters via the network tab of your browser dev tools to see them.
br.open("http://media.ethics.ga.gov/search/Lobbyist/Lobbyist_results.aspx?&Year=2016&LastName="+letter+"&FirstName=&City=&FilerID=")
soup = BS(br.response().read())
table = soup.find("table", { "id" : "ctl00_ContentPlaceHolder1_Results" }) # Need to add error check here...
if table is None: # No lobbyist with last name starting with 'X' :-)
continue
records = table.find_all('tr') # List of all results for this letter
for form in br.forms():
print "Form name:", form.name
print form
for row in records:
rec_print = ""
span = row.find_all('span', 'lblentry', 'value')
for sname in span:
if ',' in sname.get_text(): # They actually have a field named 'comma'!!
continue
rec_print = rec_print + sname.get_text() + "," # Create comma-delimited output
print(rec_print[:-1]) # Strip final comma
lnk = row.find('a', 'lblentrylink')
if lnk is None: # For some reason, first record is blank.
continue
print("Lnk: ", lnk)
newlnk = lnk['id']
print("NEWLNK: ", newlnk)
newstr = lnk['href']
newctl = newstr[+25:-5] # Matching placeholder (strip javascript....)
br.select_form('aspnetForm') # Tried (nr=0) also...
print("NEWCTL: ", newctl)
br[__EVENTTARGET] = newctl
response = br.submit(name=newlnk).read()
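For context: ASP.NET WebForms pages post back by writing the name of the control that triggered the action into the hidden __EVENTTARGET field, which is why the snippet copies the link's control name into that field before submitting the aspnetForm.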
I am a beginner in web scraping and I have become very interested in the process.
I set myself a project that could keep me motivated until I completed it.
My Project
My aim is to write a Python program that goes to my university results page (which happens to be an "xx.asp" page) and enters my
EXAM NO,
COURSE, and
SEMESTER, then submits them to the website.
Clicking the submit button leads to another "yy.asp" page on which my results are displayed. But I am having a lot of trouble doing the same thing from Python.
Some Sample Data to try it out
The Results Website: http://result.pondiuni.edu.in/candidate.asp
Register Number: 15te1218
Degree: BTHEE
Exam: Second
Could anyone give me directions on how to accomplish this task?
I have written a sample program that I am not really proud of, nor does it work as I wanted. The following is the code that I wrote. I am a beginner, so sorry if I did something terribly wrong. Please correct me; it would be awesome if you could guide me toward solving the problem.
The website is a .asp website, not .aspx.
I have provided sample data so that you can see what happens when we submit a request to the website.
The Code
import requests

with requests.Session() as c:
    url = 'http://result.pondiuni.edu.in/candidate.asp'
    url2 = 'http://result.pondiuni.edu.in/ResultDisp.asp'
    TXTREGNO = '15te1218'
    CMBDEGREE = 'BTHEE~\BTHEE\result.mdb'
    CMBEXAMNO = 'B'
    DPATH = '\BTHEE\result.mdb'
    DNAME = 'BTHEE'
    TXTEXAMNO = 'B'
    c.get(url)
    payload = {
        'txtregno': TXTREGNO,
        'cmbdegree': CMBDEGREE,
        'cmbexamno': CMBEXAMNO,
        'dpath': DPATH,
        'dname': DNAME,
        'txtexamno': TXTEXAMNO
    }
    post_request = requests.post(url, data=payload)
    page = c.get(url2)
I have no idea what to do next so that I can retrieve my results page (url2 in the code). All the data is entered at url (the starting page where all the info is entered), and submitting there takes you to url2, the results page.
Please help me make this program.
I took all the post form parameters from Chrome's Network Tab.
You are way overcomplicating it, and you have carriage returns in your post data, so that could never work:
In [1]: s = "BTHEE~\BTHEE\result.mdb"
In [2]: print(s) # where did "\result.mdb" go?
esult.mdbHEE
In [3]: s = r"BTHEE~\BTHEE\result.mdb" # raw string
In [4]: print(s)
BTHEE~\BTHEE\result.mdb
So fix your form data and just post to get to your results:
import requests

data = {"txtregno": "15te1218",
        "cmbdegree": r"BTHEE~\BTHEE\result.mdb",  # use raw strings
        "cmbexamno": "B",
        "dpath": r"\BTHEE\result.mdb",
        "dname": "BTHEE",
        "txtexamno": "B"}

results_page = requests.post("http://result.pondiuni.edu.in/ResultDisp.asp", data=data).content
To add to the answer already given, you can use bs4.BeautifulSoup to find the data you need in the result page afterwards.
#!/usr/bin/env python
import requests
from bs4 import BeautifulSoup

payload = {'txtregno': '15te1218',
           'cmbdegree': r'BTHEE~\BTHEE\result.mdb',
           'cmbexamno': 'B',
           'dpath': r'\BTHEE\result.mdb',
           'dname': 'BTHEE',
           'txtexamno': 'B'}

results_page = requests.post('http://result.pondiuni.edu.in/ResultDisp.asp', data=payload)

soup = BeautifulSoup(results_page.text, 'html.parser')

SubjectElem = soup.select("td[width='66%'] font")
MarkElem = soup.select("font[color='DarkGreen'] b")

Subject = []
Mark = []
for i in range(len(SubjectElem)):
    Subject.append(SubjectElem[i].text)
    Mark.append(MarkElem[i].text)

Transcript = dict(zip(Subject, Mark))
This will give a dictionary with the subject as a key and mark as a value.
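For example, to print the transcript afterwards:

for subject, mark in Transcript.items():
    print(subject + ': ' + mark)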
My question involves learning how to retrieve my entire list of friends using Facebook's Python API. The current result returns an object with a limited number of friends and a link to the 'next' page. How do I use this to fetch the next set of friends? (Please post a link to possible duplicates.) Any help would be much appreciated. In general, I need to learn about the pagination involved in using the API.
import facebook
import json
ACCESS_TOKEN = "my_token"
g = facebook.GraphAPI(ACCESS_TOKEN)
print json.dumps(g.get_connections("me","friends"),indent=1)
Sadly, the documentation of pagination has been an open issue for almost 2 years. You should be able to paginate like this (based on this example) using requests:
import facebook
import requests

ACCESS_TOKEN = "my_token"
graph = facebook.GraphAPI(ACCESS_TOKEN)
friends = graph.get_connections("me", "friends")

allfriends = []

# Wrap this block in a while loop so we can keep paginating requests until
# finished.
while True:
    try:
        for friend in friends['data']:
            allfriends.append(friend['name'].encode('utf-8'))
        # Attempt to make a request to the next page of data, if it exists.
        friends = requests.get(friends['paging']['next']).json()
    except KeyError:
        # When there are no more pages (['paging']['next']), break from the
        # loop and end the script.
        break

print allfriends
Update: There's a new generator method available which implements the above behavior and can be used to iterate over all friends like this:
for friend in graph.get_all_connections("me", "friends"):
    # Do something with this friend, e.g.:
    print friend['name']
Meanwhile, while I was searching for an answer, I found a much better approach:
import facebook

access_token = ""
graph = facebook.GraphAPI(access_token=access_token)
totalFriends = []
friends = graph.get_connections("me", "/friends&summary=1")
while 'paging' in friends:
    for i in friends['data']:
        totalFriends.append(i['id'])
    friends = graph.get_connections("me", "/friends&summary=1&after=" + friends['paging']['cursors']['after'])
At the end you will get one response where 'data' is empty and there is no 'paging' key, so at that point the loop breaks and all the data has been stored.
I couldn't find this anywhere; the other answers seem super complicated, and there's no way I would even use an SDK if I had to do stuff like that, when paging a simple POST request is so easy to begin with. However:
# Assumes the facebook_business SDK and credentials defined elsewhere:
from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount

FacebookAdsApi.init(my_app_id, my_app_secret, my_access_token)
my_account = AdAccount('act_23423423423423423')

# In the below, I added the limit to the max rows, 250.
# Also, more importantly, paging: the SDK has a really sneaky way of doing this.
# Enclose the request in a list(); the results end up the same, but this will
# make the script request new objects until there are no more.
# I tested this example and compared to the Graph API and, as of right now,
# 1/22 9:47AM, I get 81 from Graph and 81 here.
fields = ['name']
params = {'limit': 250}
ads = list(my_account.get_ads(
    fields=fields,
    params=params,
))
Trick from the docs: "NOTE: We wrap the return value of get_ad_accounts with list() because get_ad_accounts returns an EdgeIterator object (located in facebook_business.adobjects) and we want to get the full list right away instead of having the iterator lazily loading accounts."
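If you'd rather not pull everything into memory at once, a sketch of the lazy alternative (same fields/params as above), relying on the iterator behavior described in that docs note:

for ad in my_account.get_ads(fields=fields, params=params):
    # Each new page is fetched behind the scenes as iteration proceeds.
    print(ad['name'])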
https://github.com/facebook/facebook-python-business-sdk
In this example you offset / paginate by one at a time. I think my while loop is simple, since it only looks for the pagination key "next" to be absent; if it doesn't exist, we're finished looping, and you will have your results in a list.
In this example I am just looking for all the people called Jacob.
import requests
import facebook

token = access_token = "your token goes here"
fb = facebook.GraphAPI(access_token=token)

limit = 1
offset = 0
data = {"q": "jacob",
        "type": "user",
        "fields": "id",
        "limit": limit,
        "offset": offset}

req = fb.request('/search', args=data, method='GET')

users = []
for item in req['data']:
    users.append(item["id"])

pag = req.get('paging', {})
while pag.get("next") is not None:
    offset += limit
    data["offset"] = offset
    req = fb.request('/search', args=data, method='GET')
    for item in req['data']:
        users.append(item["id"])
    pag = req.get('paging', {})  # default to an empty dict so the loop condition stays safe

print users
I'm looking at scraping some data from Facebook using Python 2.7. My code basically increments the Facebook profile ID by 1 and then captures the details returned by the page.
An example of the page I'm looking to capture the data from is graph.facebook.com/4.
Here's my code below:
import scraperwiki
import urlparse
import simplejson

source_url = "http://graph.facebook.com/"
profile_id = 1

while True:
    try:
        profile_id += 1
        profile_url = urlparse.urljoin(source_url, str(profile_id))
        results_json = simplejson.loads(scraperwiki.scrape(profile_url))
        for result in results_json['results']:
            print result
            data = {}
            data['id'] = result['id']
            data['name'] = result['name']
            data['first_name'] = result['first_name']
            data['last_name'] = result['last_name']
            data['link'] = result['link']
            data['username'] = result['username']
            data['gender'] = result['gender']
            data['locale'] = result['locale']
            print data['id'], data['name']
            scraperwiki.sqlite.save(unique_keys=['id'], data=data)
            #time.sleep(3)
    except:
        continue
        profile_id += 1
I am using the scraperwiki site to carry out this check, but no data is printed back to the console despite the line print data['id'], data['name'] being there just to check the code is working.
Any suggestions on what is wrong with this code? As said, for each returned profile, the unique data should be captured and printed to screen as well as populated into the sqlite database.
Thanks
Any suggestions on what is wrong with this code?
Yes. You are swallowing all of your errors. There could be a huge number of things going wrong in the block under try. If anything goes wrong in that block, you move on without printing anything.
You should only ever use a try / except block when you are looking to handle a specific error.
Modify your code so that it looks like this:
while True:
    profile_id += 1
    profile_url = urlparse.urljoin(source_url, str(profile_id))
    results_json = simplejson.loads(scraperwiki.scrape(profile_url))
    for result in results_json['results']:
        print result
        data = {}
        # ... more ...
and then you will get detailed error messages when specific things go wrong.
As for your concern in the comments:
The reason I have the error handling is because, if you look for
example at graph.facebook.com/3, this page contains no user data, and
so I don't want to collate this info and instead skip to the next user,
i.e. number 4, etc.
If you want to handle the case where there is no data, then find a way to handle that case specifically. It is bad practice to swallow all errors.
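A minimal sketch of that, assuming the "no user data" responses (like graph.facebook.com/3) simply lack the 'results' key; if they instead contain an 'error' field, adjust the check accordingly:

while True:
    profile_id += 1
    profile_url = urlparse.urljoin(source_url, str(profile_id))
    results_json = simplejson.loads(scraperwiki.scrape(profile_url))

    # Handle the empty-profile case explicitly instead of using a bare except:
    if 'results' not in results_json:
        continue

    for result in results_json['results']:
        data = {'id': result['id'], 'name': result['name']}
        print data['id'], data['name']
        scraperwiki.sqlite.save(unique_keys=['id'], data=data)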