I have worked with a few API's, but not sure how to get started with sending requests for Star Citizen. Does anyone know how you might go about using python to send a get request for say getting some data on game items. Here is their official API documentation but not sure where to start!
https://starcitizen-api.com/gamedata.php#get-items
Could anyone post an example get request that return data?
from the docs, the urls seems to be /xxxxxxxx/v1/gamedata/get/3.6.1/ship?name=Avenger or some such where i guess the xxx is your personal key or account or whatever
try this:
import requests
url = '/xxxxxxxx/v1/gamedata/get/3.6.1/ship?name=Avenger'
response = requests.get(url, verify = False)
contents = response.json()
just make sure the url is complete, should work the same for any web API really
EDIT:
from the docs it looks like the url should look like this (since the host is listed as Host: api.starcitizen-api.com
https://api.starcitizen-api.com/xxxxxxx/v1/gamedata/get/3.6.1/ship?name=Avenger
Related
I've been working on a project to reverse-enginner twitter's app to scrape public posts from Twitter using an unofficial API, with Python. (I want to create an "alternative" app, which is simply a localhost that can search for a user, and get its posts)
I've been searching and reading everything related to REST, AJAX, and the python modules requests, requests-html, BeautifulSoup, and more.
I can see when looking at twitter on the devtools (for example on Marvel's profile page) that the only relevant requests being sent (by POST and GET) are the following: client_event.json and UserTweets?variables=... .
I understood that these are the relevant messages being received by cleaning the network tab and recording only when I scroll down and load new tweets - these are the only messages that came up which aren't random videos (I cleaned the search using -video -init -csp_report -config -ondemand -like -pageview -recommendations -prefetch -jot -key_live_kn -svg -jpg -jpeg -png -ico -analytics -loader -sharedCore -Hebrew).
I am new to this field, so I am probably doing something wrong. I can see on UserTweets the response I'm looking for - a beautiful JSON with all the data I need - but I am unable, no matter how much I've been trying to, to access it.
I tried different modules and different headers, and I get nothing. I DON'T want to use Selenium since it's tiresome, and I know where the data I need is stored.
I've been trying to send a GET reuest to:
https://twitter.com/i/api/graphql/vamMfA41UoKXUmppa9PhSw/UserTweets?variables=%7B%22userId%22%3A%2215687962%22%2C%22count%22%3A20%2C%22cursor%22%3A%22HBaIgLLN%2BKGEryYAAA%3D%3D%22%2C%22withHighlightedLabel%22%3Atrue%2C%22withTweetQuoteCount%22%3Atrue%2C%22includePromotedContent%22%3Atrue%2C%22withTweetResult%22%3Afalse%2C%22withUserResults%22%3Afalse%2C%22withVoice%22%3Afalse%2C%22withNonLegacyCard%22%3Atrue%7D
by doing:
from requests_html import HTMLSession
from bs4 import BeautifulSoup
response = session.get('https://twitter.com/i/api/graphql/vamMfA41UoKXUmppa9PhSw/UserTweets?variables=%7B%22userId%22%3A%2215687962%22%2C%22count%22%3A20%2C%22cursor%22%3A%22HBaIgLLN%2BKGEryYAAA%3D%3D%22%2C%22withHighlightedLabel%22%3Atrue%2C%22withTweetQuoteCount%22%3Atrue%2C%22includePromotedContent%22%3Atrue%2C%22withTweetResult%22%3Afalse%2C%22withUserResults%22%3Afalse%2C%22withVoice%22%3Afalse%2C%22withNonLegacyCard%22%3Atrue%7D')
response.html.render()
s = BeautifulSoup(response.html.html, 'lxml')
but I get back an HTML script that either says Chromium is unsupported, or just a static page without the javascript updating the DOM.
All help appreciated.
Thank you
P.S
I've posted the same question on reverseengineering.stackexchange, just to be safe (overflow has more appropriate tags :-))
Before you deep dive into the actual code, I would first start building the correct request to twitter. I would use a 3rd party tool focused on REST and APIs such as Postman to build and test the required request - and only then would write the actual code.
From your questions it seems that you'll be using an open API of twitter, so it means you'll only need to send x-guest-token and basic Bearer authorization in your request headers.
The Bearer is static - you can just browse to twitter and copy/paste
it from the dev tools network monitor.
To get the x-guest-token you'll need something dynamic because it has expiration, what I would suggest is send a curl request to twitter, parse the token from there and put it in your header before sending the request. You can see something very similar in: Python Downloading twitter video using python (without using twitter api)
.
After you have both of the above, build the required GET request in Postman and test if you get back the correct response. Only after you have everything working in Postman - write the same in Python, or any other language**
**You can use Postman snippets which automatically generates the code needed in many programming languages.
#TripleS, example of how one may extract json data from __INITIAL_STATE__ and write it to text file.
import requests
import re
import json
from contextlib import suppress
# get page
result = requests.get('https://twitter.com/ThePSF')
# Extract json from "window.__INITIAL_STATE__={....};
json_string = re.search(r"window.__INITIAL_STATE__\s?=\s?(\{.*?\});", result.text).group(1)
# convert text string to structured json data
twitter_json = json.loads(json_string)
# Save structured json data to a text file that may help
# you to orient yourself and possible pick some parts you
# are interested in (if there are any)
with open('twitter_json_data.txt', 'w') as outfile:
outfile.write(json.dumps(twitter_json, indent=4, sort_keys=True))
I've just tried the same, but with requests, not requests_html module. I could get all site contents, but I would not call it "beautiful".
Also, now I am blocked to access the site without logging in.
Here is my small example.
Use official Twitter API instead.
I also think that I will probably be blocked after some tries of using this script. I've tried it only 2 times.
import requests
import bs4
def example():
result = requests.get("https://twitter.com/childrightscnct")
soup = bs4.BeautifulSoup(result.text, "lxml")
print(soup)
if __name__ == '__main__':
example()
To select any element with bs4, use
some_text = soup.select('locator').getText()
I found one tool for scraping Twitter, that has quite a lot of stars on Github https://github.com/twintproject/twint I did not try it myself and hope it is legal.
What you're missing is the bearer and guest token needed to make your request. If I just hit your endpoint with curl and no headers I get no response. However, if I add headers for the bearer token and guest token then I get that json you're looking for:
curl https://twitter.com/i/api/graphql/vamMfA41UoKXUmppa9PhSw/UserTweets?variables=%7B%22userId%22%3A%2215687962%22%2C%22count%22%3A20%2C%22cursor%22%3A%22HBaIgLLN%2BKGEryYAAA%3D%3D%22%2C%22withHighlightedLabel%22%3Atrue%2C%22withTweetQuoteCount%22%3Atrue%2C%22includePromotedContent%22%3Atrue%2C%22withTweetResult%22%3Afalse%2C%22withUserResults%22%3Afalse%2C%22withVoice%22%3Afalse%2C%22withNonLegacyCard%22%3Atrue%7D -H 'authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'' -H 'x-guest-token: 1452696114205847552'
You can get the bearer token (which may not expire that often) and the guest token (which does expire, I think) like this:
The html of the twitter link you go to links a file called main.some random numbers.js. Within that javascript file is the bearer token. You can recognize it is because a long string starting with lots of A's.
Take the bearer token and call https://api.twitter.com/1.1/guest/activate.json using the bearer token as an authorization header
curl 'https://api.twitter.com/1.1/guest/activate.json' -X POST -H 'authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'
In python this looks like:
import requests
import json
url = "https://twitter.com/i/api/graphql/vamMfA41UoKXUmppa9PhSw/UserTweets?variables=%7B%22userId%22%3A%2215687962%22%2C%22count%22%3A20%2C%22cursor%22%3A%22HBaIgLLN%2BKGEryYAAA%3D%3D%22%2C%22withHighlightedLabel%22%3Atrue%2C%22withTweetQuoteCount%22%3Atrue%2C%22includePromotedContent%22%3Atrue%2C%22withTweetResult%22%3Afalse%2C%22withUserResults%22%3Afalse%2C%22withVoice%22%3Afalse%2C%22withNonLegacyCard%22%3Atrue%7D"
headers = {"authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA", "x-guest-token": "1452696114205847552"}
resp = requests.get(url, headers=headers)
j = json.loads(resp.text)
And now, that variable, j, holds your beautiful json. One warning, sometimes the response back can be so big that it doesn't seem to fit into a single response. If this happens, you'll notice the resp.text isn't valid json, but just some portion of a big blog of json. To fix this, you'll just need to adapt the requests to use "stream=True" and stream out the whole response before you try to parse it as json.
The task I want to complete is very simple. To do a http get request using python.
Below is the code I used:
url = 'http://www.costcobusinessdelivery.com/AjaxWarehouseBrowseLookupView?storeId=11301&catalogId=11701&langId=-1&parentGeoNode=10112'
requests.get(url)
Then I got:
<Response [401]>
I am new to python, can someone help? Thanks!
Update:
Based on the comments. It seems the code is okay, but I do get the 401 response. I doubt my company's network has some restrictions? But I can access and get a valid response through a browser. Is there a way to bypass my company's firewall/proxy or whatever? Just to pretend that I am using a browser in python? Thanks again!
If your browser is accessing the web via a proxy server, look that up on your browser settings and use that in python.
r = requests.get(url,
proxies={"http": "http://61.233.25.166:80"})
your proxy server will have a different address.
I'm a Transifex user, I need to retrieve my dashboard page with the list of all the projects of my organization.
that is, the page I see when I login: https://www.transifex.com/organization/(my_organization_name)/dashboard
I can access Transifex API with this code:
import urllib.request as url
usr = 'myusername'
pwd = 'mypassword'
def GetUrl(Tx_url):
auth_handler = url.HTTPBasicAuthHandler()
auth_handler.add_password(realm='Transifex API',
uri=Tx_url,
user=usr,
passwd=pwd)
opener = url.build_opener(auth_handler)
url.install_opener(opener)
f = url.urlopen(Tx_url)
return f.read().decode("utf-8")
everything is ok, but there's no API call to get all the projects of my organization.
the only way is to get that page html, and parse it, but if I use this code, I get the login page.
This works ok with google.com, but I get an error with www.transifex.com or www.transifex.com/organization/(my_organization_name)/dashboard
Python, HTTPS GET with basic authentication
I'm new at Python, I need some code with Python 3 and only standard library.
Thanks for any help.
The call to
/projects/
returns your projects along with all the public projects that you can have access (like what you said). You can search for the ones that you need by modifying the call to something like:
https://www.transifex.com/api/2/projects/?start=1&end=6
Doing so the number of projects returned will be restricted.
For now maybe it would be more convenient to you, if you don't have many projects, to use this call:
/project/project_slug
and fetch each one separately.
Transifex comes with an API, and you can use it to fetch all the projects you have.
I think that what you need this GET request on projects. It returns a list of (slug, name, description, source_language_code) for all projects that you have access to in JSON format.
Since you are familiar with python, you could use the requests library to perform the same actions in a much easier and more readable way.
You will just need to do something like that:
import requests
import json
AUTH = ('yourusername', 'yourpassword')
url = 'www.transifex.com/api/2/projects'
headers = {'Content-type': 'application/json'}
response = requests.get(url, headers=headers, auth=AUTH)
I hope I've helped.
This script succeeds at getting a 200 response object, getting a cookie, and returning reddit's stock homepage source. However, it is supposed to get the source of the "recent activity" subpage which can only be accessed after logging in. This makes me think it's failing to log in appropriately but the username and password are accurate, I've double checked that.
#!/usr/bin/python
import requests
import urllib2
auth = ('username', 'password')
with requests.session(auth=auth) as s:
c = s.get('http://www.reddit.com')
cookies = c.cookies
for k, v in cookies.items():
opener = urllib2.build_opener()
opener.addheaders.append(('cookie', '{}={}'.format(k, v)))
f = opener.open('http://www.reddit.com/account-activity')
print f.read()
It looks like you're using the standard "HTTP Basic" authentication, which is not what Reddit uses to log in to its web site. (Almost no web sites use HTTP Basic (which pops up a modal dialog box requesting authentication), but implement their own username/password form).
What you'll need to do is get the home page, read the login form fields, fill in the user name and password, POST the response back to the web site, get the resulting cookie, then use the cookie in future requests. There may be quite a number of other details for you to work out too, but you'll have to experiment.
I just think maybe we're having the same problem. I get status code 200 ok. But the script never logged me in. I'm getting some suggestions and help. Hopefully you'll let me know what works for you too. Seems reddit is using the same system too.
Check out this page where my problem is being discussed.
Authentication issue using requests on aspx site
I'm working on a simple HTML scraper for Hulu in python 2.6 and am having problems with logging on to my account. Here's my code so far:
import urllib
import urllib2
from cookielib import CookieJar
#make a cookie and redirect handlers
cookies = CookieJar()
cookie_handler= urllib2.HTTPCookieProcessor(cookies)
redirect_handler= urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler,cookie_handler)#make opener w/ handlers
#build the url
login_info = {'username':USER,'password':PASS}#USER and PASS are defined
data = urllib.urlencode(login_info)
req = urllib2.Request("http://www.hulu.com/account/authenticate",data)#make the request
test = opener.open(req) #open the page
print test.read() #print html results
The code compiles and runs, but all that prints is:
Login.onError("Please \074a href=\"/support/login_faq#cant_login\"\076enable cookies\074/a\076 and try again.");
I assume there is some error in how I'm handling cookies, but just can't seem to spot it. I've heard Mechanize is a very useful module for this type of program, but as this seems to be the only speed bump left, I was hoping to find my bug.
What you're seeing is a ajax return. It is probably using javascript to set the cookie, and screwing up your attempts to authenticate.
The error message you are getting back could be misleading. For example the server might be looking at user-agent and seeing that say it's not one of the supported browsers, or looking at HTTP_REFERER expecting it to be coming from hulu domain. My point is there are two many variables coming in the request to keep guessing them one by one
I recommend using an http analyzer tool, e.g. Charles or the one in Firebug to figure out what (header fields, cookies, parameters) the client sends to server when you doing hulu login via a browser. This will give you the exact request that you need to construct in your python code.