I'm super new to programming, so I figured I'd ask for a bit of help since I've been struggling for a few days.
Scenario: trying to pull data from Twitch's API and output a clean list.
import requests
import json
from requests.structures import CaseInsensitiveDict
url = "https://api.twitch.tv/helix/users/follows?to_id=495565101"
headers = CaseInsensitiveDict()
headers["Authorization"] = "Bearer fdsafasdfasdfasdfasdfasfd"
headers["Client-Id"] = "asdfasdfasdfasdfasdfasdfasfd"
resp = requests.get(url, headers=headers)
json_response = resp.json()['data'][0]['from_id']
print(json_response)
433442715
This will only output one from_id and then stop. However, if I write the response to a file and parse the data from the file, it works using separate code. But because of the pagination, each request only returns so many entries, so somehow I have to run it again and append the new data to the file or something.
I also noticed that if I take away ['from_id'], it changes the json_response type from a string to a dictionary, so I'm not sure if that's part of my issue.
import json

with open('Follower.json') as json_file:
    data = json.load(json_file)

print(data['data'][0])

for i in data['data']:
    print(i['from_id'])
print()
433442715
169916770
733044434
478480475
186230385
472433229
253461348
etc...
It then outputs the pagination cursor used to retrieve a new page of data, which I can probably store in a variable to run with the loop, but I have no idea where to even start searching.
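For reference, a minimal sketch of that cursor loop against the Helix follows endpoint used above looks like this; the token and client ID are placeholders, and "first" and "after" are Helix's documented page-size and cursor parameters:
import requests

# Placeholder credentials: substitute your real app access token and client ID.
HEADERS = {
    "Authorization": "Bearer YOUR_APP_ACCESS_TOKEN",
    "Client-Id": "YOUR_CLIENT_ID",
}

url = "https://api.twitch.tv/helix/users/follows"
params = {"to_id": "495565101", "first": 100}  # "first" caps the page size at 100

from_ids = []
while True:
    body = requests.get(url, headers=HEADERS, params=params).json()
    from_ids.extend(row["from_id"] for row in body["data"])
    cursor = body.get("pagination", {}).get("cursor")
    if not cursor:  # no cursor in the response means this was the last page
        break
    params["after"] = cursor  # feed the cursor back to request the next page

print(len(from_ids), "follower ids collected")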
I'm mostly looking for some good reference material pertaining to this problem, but suggestions are super welcome too. This is the first project I've tried to build.
Thanks in advance
-MM
I am looking to find various statistics about players in games such as CS:GO from the Steam Web API, but cannot work out how to search through the JSON returned from the query (e.g. here) in Python.
I just need to be able to get a specific part of the list that is provided, e.g. finding total_kills from the link above. If I had a way to sort through all of the information provided and filter it down to just that specific thing (in this case total_kills), that would help a load!
The code I have at the moment to turn it into something Python can read is:
url = "http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697&include_appinfo=1"
data = requests.get(url).text
data = json.loads(data)
If you are looking for a way to search through the stats list then try this:
import requests
import json

def findstat(data, stat_name):
    for stat in data['playerstats']['stats']:
        if stat['name'] == stat_name:
            return stat['value']

url = "http://api.steampowered.com/ISteamUserStats/GetUserStatsForGame/v0002/?appid=730&key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697"
data = requests.get(url).text
data = json.loads(data)

total_kills = findstat(data, 'total_kills')  # change 'total_kills' to your desired stat name
print(total_kills)
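As a side note, requests can decode JSON itself, so the text-then-json.loads two-step in both snippets can be collapsed into one call:
import requests

url = "http://api.steampowered.com/ISteamUserStats/GetUserStatsForGame/v0002/?appid=730&key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697"
data = requests.get(url).json()  # .json() parses the response body directly
print(len(data['playerstats']['stats']))  # the same list that findstat() walks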
I am a beginner in web scraping and I have become very interested in the process.
I set myself a project that can keep me motivated until I complete it.
My Project
My aim is to write a Python program that goes to my university results page (which happens to be an "xx.asp" page), enters my exam number, my course, and my semester, and submits them to the website.
Clicking the submit button leads to another "yy.asp" page in which my results are displayed. But I am having a lot of trouble doing this.
Some Sample Data to try it out
The Results Website: http://result.pondiuni.edu.in/candidate.asp
Register Number: 15te1218
Degree: BTHEE
Exam: Second
Could anyone give me directions on how to accomplish this task?
I have written a sample program that I am not really proud of, and it does not work as I wanted. The following is the code I wrote. I am a beginner, so sorry if I did something terribly wrong. Please correct me, and it would be awesome if you could guide me to solving the problem.
The website is an .asp website, not .aspx.
I have provided sample data so that you can see what's happening when we submit a request to the website.
The Code
import requests

with requests.Session() as c:
    url = 'http://result.pondiuni.edu.in/candidate.asp'
    url2 = 'http://result.pondiuni.edu.in/ResultDisp.asp'
    TXTREGNO = '15te1218'
    CMBDEGREE = 'BTHEE~\BTHEE\result.mdb'
    CMBEXAMNO = 'B'
    DPATH = '\BTHEE\result.mdb'
    DNAME = 'BTHEE'
    TXTEXAMNO = 'B'
    c.get(url)
    payload = {
        'txtregno': TXTREGNO,
        'cmbdegree': CMBDEGREE,
        'cmbexamno': CMBEXAMNO,
        'dpath': DPATH,
        'dname': DNAME,
        'txtexamno': TXTEXAMNO
    }
    post_request = requests.post(url, data=payload)
    page = c.get(url2)
I have no idea what to do next to retrieve my results page (url2 in the code). All the data is entered at url in the program (the starting link where all the info is entered), and submitting it takes us to url2, the results page.
Please help me make this program.
I took all the post form parameters from Chrome's Network Tab.
You are way overcomplicating it, and you have carriage returns in your post data, so that could never work:
In [1]: s = "BTHEE~\BTHEE\result.mdb"
In [2]: print(s) # where did "\result.mdb" go?
esult.mdbHEE
In [3]: s = r"BTHEE~\BTHEE\result.mdb" # raw string
In [4]: print(s)
BTHEE~\BTHEE\result.mdb
So fix your form data and just post to get your results:
import requests

data = {"txtregno": "15te1218",
        "cmbdegree": r"BTHEE~\BTHEE\result.mdb",  # use raw strings
        "cmbexamno": "B",
        "dpath": r"\BTHEE\result.mdb",
        "dname": "BTHEE",
        "txtexamno": "B"}

results_page = requests.post("http://result.pondiuni.edu.in/ResultDisp.asp", data=data).content
To add to the answer already given, you can use bs4.BeautifulSoup to find the data you need in the result page afterwards.
#!/usr/bin/env python
import requests
from bs4 import BeautifulSoup

payload = {'txtregno': '15te1218',
           'cmbdegree': r'BTHEE~\BTHEE\result.mdb',
           'cmbexamno': 'B',
           'dpath': r'\BTHEE\result.mdb',
           'dname': 'BTHEE',
           'txtexamno': 'B'}

# POST the form data to the results page, then parse the returned HTML
results_page = requests.post('http://result.pondiuni.edu.in/ResultDisp.asp', data=payload)
soup = BeautifulSoup(results_page.text, 'html.parser')

SubjectElem = soup.select("td[width='66%'] font")
MarkElem = soup.select("font[color='DarkGreen'] b")

Subject = []
Mark = []
for i in range(len(SubjectElem)):
    Subject.append(SubjectElem[i].text)
    Mark.append(MarkElem[i].text)

Transcript = dict(zip(Subject, Mark))
This will give a dictionary with the subject as a key and mark as a value.
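For example, you could then print the whole transcript, or look up a single subject's mark (the subject name below is a stand-in; keys must match the scraped text exactly):
for subject, mark in Transcript.items():
    print('{}: {}'.format(subject, mark))

# print(Transcript['SOME SUBJECT NAME'])  # hypothetical key, for a single lookup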
I'm trying to manipulate a dynamic JSON from this site:
http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do
It has three elements: imagem, a base64 image; labelValorCaptcha, just a message; and uuidCaptcha, a value to pass as a parameter to play a sound at the link below:
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_e7b072e1fce5493cbdc46c9e4738ab8a
When I open the first site in a browser and put the uuidCaptcha into the second link after the equals sign ("...uuidCaptcha="), the sound plays normally. I wrote some simple code to catch these elements.
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
But I don't know what's happening; the captured value of uuidCaptcha doesn't work. It opens an error web page.
Does anyone know why?
Thanks!
It works for me.
$ cat a.py
#!/usr/bin/env python
# encoding: utf-8
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
$ python a.py
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_efc8d4bc3bdb428eab8370c4e04ab42c
As @Charlie Harding said, the best way is to download the page and get the JSON values, because this JSON is dynamic and only exists for an open page.
More info here.
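A minimal sketch of that idea, fetching the JSON and requesting the sound in the same run (Python 2, to match the thread; generating the millisecond timestamp locally is an assumption based on the URL format above):
import urllib, json, time

base = "http://esaj.tjsc.jus.br/cposgtj/"

# Fetch a fresh captcha; the uuidCaptcha in this response is only valid now.
data = json.loads(urllib.urlopen(base + "imagemCaptcha.do").read())

# Build the sound URL immediately with a current millisecond timestamp (assumption).
sound_url = base + "somCaptcha.do?timestamp=%d&uuidCaptcha=%s" % (
    int(time.time() * 1000), data['uuidCaptcha'])
sound = urllib.urlopen(sound_url).read()  # raw audio bytes
print sound_url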
I am a beginner in Python, trying to pull some data from reddit.com.
More precisely, I am trying to send a request to http://www.reddit.com/r/nba/.json to get the JSON content of the page and then parse it for entries about a specific team or player.
To automate the data gathering, I am requesting the page like this:
import urllib2
FH = urllib2.urlopen("http://www.reddit.com/r/nba/.json")
rnba = FH.readlines()
rnba = str(rnba[0])
FH.close()
I am also pulling the content like this on a copy of the script, just to be sure:
import requests

FH = requests.get("http://www.reddit.com/r/nba/.json", timeout=10)
rnba_json = FH.json()
FH.close()
However, I am not getting the full data that is presented when I manually go to
http://www.reddit.com/r/nba/.json with either method, in particular when I call
print len(rnba_json['data']['children']) # prints 20-something child stories
but when I load the copy-pasted JSON string like this:
import json
import urllib2
fh = r"""{"kind": "Listing", "data": {"modhash": ..."""# long JSON string
r_nba = json.loads(fh) #loads the json string from the site into json object
print len(r_nba['data']['children']) #prints upwards of 100 stories
I get more story links. I know about the timeout parameter but providing it did not resolve anything.
What am I doing wrong or what can I do to get all the content presented when I pull the page in the browser?
To get the max allowed, you'd use the API like: http://www.reddit.com/r/nba/.json?limit=100
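A short sketch of that, including reddit's after field for paging; the custom User-Agent is a general requirement of reddit's API, not something from the question:
import requests

url = "http://www.reddit.com/r/nba/.json"
headers = {"User-Agent": "my-nba-script/0.1"}  # reddit rejects default user agents

resp = requests.get(url, params={"limit": 100}, headers=headers, timeout=10)
listing = resp.json()

print(len(listing['data']['children']))  # up to 100 stories per request
after = listing['data']['after']  # pass back as ?after=... to fetch the next page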
The JSON syntax definition says that HTML/XML tags (like the <script>...</script> part) are not part of valid JSON; see the description at http://json.org.
A number of browsers and tools ignore these things silently, but Python does not.
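A two-line demonstration of the difference:
import json

json.loads('[{"key": "value"}]')  # parses fine
json.loads('<script>...</script>[{"key": "value"}]')  # raises ValueError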
I'd like to insert the javascript code (google analytics) to get info about the users using this service (place, browsers, OS ...).
What do you suggest I do?
Should I solve the problem in the [browser output][^1] or in the [python script][^2]?
thanks,
Antonio
[^1]: Browser output
<script>...</script>
[{"key": "value"}]
[^2]: python script
#!/usr/bin/env python
import urllib2, urllib, json

url = "http://.........."
params = {}
url = url + '?' + urllib.urlencode(params, doseq=True)
req = urllib2.Request(url)
headers = {'Accept': 'application/json;text/json'}
for key, val in headers.items():
    req.add_header(key, val)
data = urllib2.urlopen(req)
print json.load(data)
These sound like two different kinds of services--one is a user-oriented web view of some data, with visualizations, formatting, etc., and one is a machine-oriented data service. I would keep these separate, and maybe build the user view as an extension to the data service.
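A minimal sketch of that separation, assuming Flask purely for illustration (the question does not say which framework the service uses):
from flask import Flask, jsonify

app = Flask(__name__)

DATA = [{"key": "value"}]

@app.route("/api/data")
def api_data():
    # Machine-oriented endpoint: pure JSON, no markup mixed in.
    return jsonify(data=DATA)

@app.route("/view")
def view():
    # User-oriented page: the analytics <script> lives here, and the page
    # fetches the clean JSON endpoint for its own data.
    return """<script>/* google analytics snippet */</script>
<pre id="out"></pre>
<script>
fetch('/api/data').then(function (r) { return r.json(); })
  .then(function (d) { document.getElementById('out').textContent = JSON.stringify(d.data); });
</script>"""

if __name__ == "__main__":
    app.run()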