How to URL-encode the binary contents of a video in Python?

I want to post a video to Tumblr through its API, using the Tumblpy library.
My code is this:
import requests
r = requests.get(video-url)
f = {'data':r.content}
dat = urllib.urlencode(f)
t.post('post', blog_url='http://tumblrname.tumblr.com/',params={'type':'video',
'title':post.title, 'slug': post.slug,'date':post.date,'data':dat,'tags':post.tagscsv,
'caption': post.body_html}) #t is TumblPy instance
This does not work. I suspect I am not encoding the binary contents correctly for the post to succeed, but I am not sure.

Presumably it's going to be similar to how you post a photo, in which case the library wants a file(like) object. A requests response can act as a file-like object just fine:
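import requests
r = requests.get(video_url, stream=True)  # stream=True leaves the body unread so r.raw can supply it
t.post('post', blog_url='http://tumblrname.tumblr.com/',
       params={'type': 'video', 'title': post.title, 'slug': post.slug,
               'date': post.date, 'data': r.raw, 'tags': post.tagscsv,
               'caption': post.body_html})
where r.raw gives you a file-like object that, when read, yields the video data read from video_url. Pass stream=True to requests.get(); without it the body has already been consumed into r.content and r.raw has nothing left to read.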
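If the video bytes are already in memory (for example in r.content), one way to get a file-like object is to wrap them in io.BytesIO. The sketch below uses placeholder bytes rather than a real download, and whether Tumblpy accepts such an object for 'data' is an assumption based on the answer above, not something verified here.

```python
import io

# Placeholder standing in for r.content; a real video would be far larger.
video_bytes = b"\x00\x00\x00\x18ftypmp42"

# BytesIO exposes read()/seek() like an open binary file, with no URL-encoding step.
video_file = io.BytesIO(video_bytes)

first_chunk = video_file.read(4)   # read part of the data
video_file.seek(0)                 # rewind before handing the object to an uploader
print(first_chunk, len(video_file.getvalue()))
```

The point is that the binary data is passed through unchanged; urllib.urlencode is meant for form fields, not for file contents.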

Related

Trying to Parse a JSON using Python but having issues

I usually use PowerShell and have successfully parsed JSON from HTTP requests before. I am now using Python with the 'Requests' library, and I have successfully retrieved the JSON from the API. Here is the format it came back in (I removed some information and other fields):
{'content': [
    {
        'ContactCompany': 'Star',
        'ContactEmail': 'test@company.star',
        'ContactPhoneNumber': '123-456-7894',
        'assignedGroup': 'TR_Hospital',
        'assignedGroupId': 'SGP000000132297',
        'serviceClass': None, 'serviceReconId': None
    }
]
}
I'm having trouble getting the values inside 'content'. Drawing on my PowerShell experience, I've tried:
tickets_json = requests.get(request_url, headers=api_header).json()
Tickets_Info = tickets_json.content
for tickets in tickets_info:
    tickets.assignedGroup
How do I parse the JSON to get the information inside 'content' in Python?
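The .json() method returns ordinary Python dicts and lists, so use key lookups (square brackets) rather than attribute access:
tickets_json = requests.get(request_url, headers=api_header).json()
tickets_info = tickets_json['content']
for tickets in tickets_info:
    print(tickets['assignedGroup'])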
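As a self-contained illustration that needs no network access, the sketch below runs the same lookups against a hard-coded dict shaped like the response in the question. The values are placeholders, and the use of .get() for possibly absent keys is a suggestion rather than part of the original answer.

```python
# Sample data shaped like the response in the question (values are placeholders).
tickets_json = {
    "content": [
        {"ContactCompany": "Star", "assignedGroup": "TR_Hospital",
         "assignedGroupId": "SGP000000132297", "serviceClass": None},
    ]
}

groups = []
for ticket in tickets_json["content"]:
    groups.append(ticket["assignedGroup"])
    # .get() returns a default instead of raising KeyError for absent keys.
    print(ticket["assignedGroup"], ticket.get("serviceReconId", "n/a"))
```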

Extracting articles from New York Post by using Python and New York Post API

I am trying to create a corpus of text documents (articles concerning terrorist attacks) via the New York Times API in Python.
I am aware that the NYT API does not provide the full body text, but it does provide the URL from which I can scrape the article. So the idea is to extract the "web_url" values from the API response and then scrape the full article body.
I am trying to use the NYT API library in Python with these lines:
from nytimesarticle import articleAPI
api = articleAPI("*Your Key*")
articles = api.search( q = 'terrorist attack')
print(articles['response'],['docs'],['web_url'])
But I cannot extract the "web_url" or the articles. All I get is this output:
{'meta': {'time': 19, 'offset': 10, 'hits': 0}, 'docs': []} ['docs'] ['web_url']
There seems to be an issue with the nytimesarticle module itself. For example, see the following:
>>> articles = api.search(q="trump+women+accuse", begin_date=20161001)
>>> print(articles)
{'response': {'docs': [], 'meta': {'offset': 0, 'hits': 0, 'time': 21}}, 'status': 'OK', 'copyright': 'Copyright (c) 2013 The New York Times Company. All Rights Reserved.'}
But if I use requests (as is used in the module) to access the API directly, I get the results I'm looking for:
>>> import requests
>>> r = requests.get("http://api.nytimes.com/svc/search/v2/articlesearch.json?q=trump+women+accuse&begin_date=20161001&api-key=XXXXX")
>>> data = r.json()
>>> len(data["response"]["docs"])
10
meaning that 10 articles were returned (the full value of data is 16 KB, so I won't include it all here). Contrast that with the response from api.search(), where articles["response"]["docs"] is an empty list.
nytimesarticle.py is only 115 lines long, so it's pretty straightforward to debug. Printing the value of the URL sent to the API reveals this:
>>> articles = api.search(q="trump+women+accuse", begin_date=20161001)
https://api.nytimes.com/svc/search/v2/articlesearch.json?q=b'trump+women+accuse'&begin_date=20161001&api-key=XXXXX
# Note the b'...' wrapper around the value of q in the URL above.
The offending code encodes every string parameter to UTF-8, which makes it a bytes object. This is not necessary, and wrecks the constructed URL as shown above. Fortunately, there is a pull request that fixes this:
>>> articles = api.search(q="trump+women+accuse", begin_date=20161001)
http://api.nytimes.com/svc/search/v2/articlesearch.json?begin_date=20161001&q=trump+women+accuse&api-key=XXXXX
>>> len(articles["response"]["docs"])
10
This also allows for other string parameters such as sort="newest" to be used, as the bytes formatting was causing an error previously.
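The effect is easy to reproduce in isolation. The sketch below is a hypothetical reconstruction, not the module's actual code: it shows how formatting a bytes object into a string produces the b'...' wrapper, and how urllib.parse.urlencode builds the query from plain str values instead.

```python
from urllib.parse import urlencode

query = "trump+women+accuse"

# Formatting a bytes object into a str embeds its repr, including the b'' wrapper.
broken = "q={}".format(query.encode("utf-8"))

# Leaving values as str and letting urlencode() do the escaping avoids this.
# Spaces are used here because urlencode() turns them into '+' itself.
fixed = urlencode({"q": "trump women accuse", "begin_date": 20161001})

print(broken)
print(fixed)
```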
The commas in the print call separate three different values, so it prints articles['response'] followed by the lists ['docs'] and ['web_url'] instead of indexing into the result.
You might expect to write something like this:
articles['response']['docs']['web_url']
But 'docs' is a list (and here an empty one), so that line won't work either. Loop over it instead:
articles = articles['response']['docs']
for article in articles:
    print(article['web_url'])
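A minimal sketch of the same loop against hard-coded data, assuming the response has the 'response' / 'docs' / 'web_url' nesting shown above. The example.com URLs are placeholders, not real API output.

```python
# Placeholder response with the same nesting as the API result.
articles = {
    "response": {
        "docs": [
            {"web_url": "https://example.com/article-1"},
            {"web_url": "https://example.com/article-2"},
        ],
        "meta": {"hits": 2},
    }
}

# 'docs' is a list of dicts, so collect one URL per element.
urls = [doc["web_url"] for doc in articles["response"]["docs"]]
print(urls)

# An empty 'docs' list simply yields an empty result instead of an error.
no_urls = [doc["web_url"] for doc in []]
```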

Requests dict from cookiejar issue with escaped chars

I'm running into some issues getting a cookie into a dictionary with Python. The value seems to be escaped somehow, even after running the helper provided by requests.
resp = requests.get(geturl, cookies=cookies)
cookies = requests.utils.dict_from_cookiejar(resp.cookies)
and this is what cookies looks like
{'P-fa9d887b1fe1a997d543493080644610': '"\\050dp1\\012S\'variant\'\\012p2\\012S\'corrected\'\\012p3\\012sS\'pid\'\\012p4\\012VNTA2NjU0OTU4MDc5MTgwOA\\075\\075\\012p5\\012sS\'format\'\\012p6\\012S\'m3u8\'\\012p7\\012sS\'mode\'\\012p8\\012Vlive\\012p9\\012sS\'type\'\\012p10\\012S\'video/mp2t\'\\012p11\\012s."'}
Is there any way to unescape the characters in the value of P-fa9d887b1fe1a997d543493080644610 and make its contents part of the dict itself?
Edit:
I would like the dictionary to look something like:
{'format': 'm3u8', 'variant': 'corrected', 'mode': u'live', 'pid': u'NTA2NjU0OTU4MDc5MTgwOA==', 'type': 'video/mp2t'}
You are dealing with Python's pickle format for data serialisation. First evaluate the value as a string literal so that the escaped characters are unescaped, then strip the surrounding double quotes and load the result with the pickle.loads function (shown here in Python 2):
>>> import pickle
>>> import ast
>>> pickle.loads(ast.literal_eval("'''" + cookies.values()[0] + "'''")[1:-1])
{'pid': u'NTA2NjU0OTU4MDc5MTgwOA==', 'type': 'video/mp2t', 'variant': 'corrected', 'mode': u'live', 'format': 'm3u8'}
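A rough Python 3 equivalent is sketched below, using the cookie value from the question as a raw string. The helper name decode_pickled_cookie is made up for this example. Note that unpickling can execute arbitrary code, so this should only be applied to cookies from a server you trust.

```python
import pickle

# The cookie value from the question, written as a raw string so the
# backslash escapes stay literal, exactly as the server sent them.
raw_cookie = (
    r'''"\050dp1\012S'variant'\012p2\012S'corrected'\012p3\012sS'pid'\012p4'''
    r'''\012VNTA2NjU0OTU4MDc5MTgwOA\075\075\012p5\012sS'format'\012p6\012S'm3u8'\012p7'''
    r'''\012sS'mode'\012p8\012Vlive\012p9\012sS'type'\012p10\012S'video/mp2t'\012p11\012s."'''
)


def decode_pickled_cookie(value):
    """Hypothetical helper: unquote, unescape and unpickle a cookie value."""
    inner = value[1:-1]  # drop the surrounding double quotes
    # Turn octal escapes such as \012 into the characters they stand for.
    unescaped = inner.encode("ascii").decode("unicode_escape")
    # pickle.loads() needs bytes in Python 3.
    return pickle.loads(unescaped.encode("latin-1"))


cookie_data = decode_pickled_cookie(raw_cookie)
print(cookie_data)
```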

how to get request body text using bottle?

I'm using bottle to receive a POST webhook from Bitbucket. The body of the POST contains info about what changed in the repo, etc. I am able to do this fine with @post('/myroute'), however I'm having trouble getting to the actual POST body data text.
here is an image that shows what I'm doing end to end
http://i.imgur.com/rWc7Hry.png
When printed to the console, request.body yields:
StringIO.StringIO instance at 0x7fa13341c4d0
and request.body.dir() yields:
AttributeError: StringIO instance has no attribute 'dir'
How do I get to the actual text of the request body (or inspect the object somehow to find it)?
The POST request body will look something like this:
http://pastebin.com/SWjLrHig
I've also tried request.json (no luck).
Any advice?
EDIT:
I ended up using this:
from bottle import get, post, request, run
import urllib
import json
@post('/bitbucket')
def postToJSON():
    body = request.body.read()
    body = body.replace("+","").replace("payload=","")
    parsedBody = urllib.unquote(body).decode('utf8')
    print parsedBody
    jsonObj = json.loads(parsedBody)
    print jsonObj
interesting now, parsedBody looks good:
{"repository":{"website":null,"fork":false,"name":"test","scm":"git","owner":"
testName","absolute_url":"/testNameTest/test/","slug":"test","is_private":true},"trunc
ated":false,"commits":[{"node":"04554d6980dd","files":[{"type":"modified","file"
:"stacker.py"}],"raw_author":"TestName<testName@testName.info>","utctimestamp":"
2015-05-2815:30:03+00:00","author":"testName","timestamp":"2015-05-2817:30:03","
raw_node":"04554d6980dd3c5fe4c3712d95b49fcf9b8da4f4","parents":["7f98b4e7532e"],
"branch":"master","message":"foo\n","revision":null,"size":-1}],"canon_url":"htt
ps://bitbucket.org","user":"testName"}
but jsonObj is not so good:
{u'commits': [{u'node': u'7f98b4e7532e', u'files': [{u'type': u'modified', u'fil
e': u'stacker.py'}], u'branch': u'master', u'utctimestamp': u'2015-05-2815:24:50
+00:00', u'author': u'TestName', u'timestamp': u'2015-05-2817:24:50', u'raw_node
': u'7f98b4e7532e02d53d83a29ec2073c5a5eac58c8', u'parents': [u'019e77d2e0d3'], u
'raw_author': u'TestNamer<TestName@TestName.info>', u'message': u'foo\n', u'size'
: -1, u'revision': None}], u'user': u'TestName', u'canon_url': u'https://bitbuck
et.org', u'repository': {u'website': None, u'fork': False, u'name': u'test', u's
cm': u'git', u'absolute_url': u'/ericTest/test/', u'owner': u'TestName', u'slug'
: u'test', u'is_private': True}, u'truncated': False}
however, when I do something like
print jsonObj['repository']['name']
it works as expected (just prints the name 'test')
As the bottle documentation states, the request data is "a file like object". http://bottlepy.org/docs/dev/tutorial.html#the-raw-request-body
So you access the raw body using read().
Also, dir is not a method of objects, it's a freestanding function which you call passing an object.
dir(request.body)
And googling for StringIO should have brought you here: https://docs.python.org/2/library/stringio.html
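Since the body in the question is form-encoded (a payload= field holding URL-encoded JSON), a standard form parser can do the decoding instead of manual string replacement. The sketch below assumes Python 3 and that field layout. The function name parse_webhook_body and the sample data are made up for the example, and it works on raw bytes so it runs without bottle.

```python
import json
from urllib.parse import parse_qs, urlencode


def parse_webhook_body(raw_body):
    """Hypothetical helper: decode a form-encoded body carrying JSON in 'payload'."""
    # parse_qs turns '+' into spaces and resolves %XX escapes for us.
    fields = parse_qs(raw_body.decode("utf-8"))
    return json.loads(fields["payload"][0])


# Build a sample body the way a form-encoded webhook would send it.
sample = {"repository": {"name": "test"}, "timestamp": "2015-05-28 17:30:03"}
raw_body = urlencode({"payload": json.dumps(sample)}).encode("utf-8")

parsed = parse_webhook_body(raw_body)
print(parsed["repository"]["name"], parsed["timestamp"])
```

Unlike replace("+", ""), this keeps the spaces inside values such as timestamps. Inside a bottle handler, the raw bytes would come from request.body.read().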

How to retrieve and display a Vimeo video's JSON data in python 3.x?

I want to retrieve and work with basic Vimeo data in Python 3.2, given a video's URL. I'm a newcomer to JSON (and Python), but it looked like the right fit for doing this. The steps are:
1. Request Vimeo video data (via an API-formatted .json URL)
2. Convert the returned JSON data into a Python dict
3. Display dict keys & data ("id", "title", "description", etc.)
Another SO page, "Get json data via url and use in python", did something similar in Python 2.x, but syntax changes (like the integration of urllib2) led me to try this.
>>> import urllib
>>> import json
>>> req = urllib.request.urlopen("http://vimeo.com/api/v2/video/31161781.json")
>>> opener = urllib.request.build_opener()
>>> f = opener.open(req)
Traceback (most recent call last):
File "<pyshell#28>", line 1, in <module>
f = opener.open(req)
File "C:\Python32\lib\urllib\request.py", line 358, in open
protocol = req.type
AttributeError: 'HTTPResponse' object has no attribute 'type'
This code will integrate into an existing project, so I'm tied to using python. I know enough about HTTP queries to guess the data's within that response object, but not enough about python to understand why the open failed and how to reference it correctly. What should I try instead of opener.open(req)?
This works for me:
import urllib.request, json
response = urllib.request.urlopen('http://vimeo.com/api/v2/video/31161781.json')
content = response.read()
data = json.loads(content.decode('utf8'))
Or with Requests:
import requests
data = requests.get('http://vimeo.com/api/v2/video/31161781.json').json()
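To cover the "display dict keys & data" step without a network call, the sketch below decodes a shortened copy of the kind of body this endpoint returns. The sample bytes are abbreviated by hand, and it assumes, as the Python 2 transcript further down suggests, that the endpoint returns a JSON array containing a single object.

```python
import json

# Abbreviated stand-in for response.read(): a JSON array with one object.
sample_body = (
    b'[{"id": 31161781, '
    b'"title": "Kevin Fanning talks about hiring for Boston startups", '
    b'"duration": 531, "width": 640, "height": 360}]'
)

videos = json.loads(sample_body.decode("utf-8"))  # a list, not a dict
video = videos[0]                                 # the dict for this video

for key, value in video.items():
    print(key, "=", value)
```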
You can simply request the URL directly:
import json
import urllib.request
response = urllib.request.urlopen('http://vimeo.com/api/v2/video/31161781.json')
response_dict = json.loads(response.read().decode('utf8'))
Python has several libraries with overlapping functionality; you shouldn't need to build an opener to get this data.
Check out http://www.voidspace.org.uk/python/articles/urllib2.shtml (note that urllib2 exists only in Python 2):
>>> import urllib2
>>> import json
>>> req = urllib2.Request("http://vimeo.com/api/v2/video/31161781.json")
>>> response = urllib2.urlopen(req)
>>> content_string = response.read()
>>> content_string
'[{"id":31161781,"title":"Kevin Fanning talks about hiring for Boston startups","description":"CogoLabs.com talent developer and author Kevin Fanning talks about hiring for small teams in Boston, how job seekers can make themselves more attractive, and why recruiters should go the extra mile to attract talent.","url":"http:\\/\\/vimeo.com\\/31161781","upload_date":"2011-10-26 15:37:35","thumbnail_small":"http:\\/\\/b.vimeocdn.com\\/ts\\/209\\/777\\/209777866_100.jpg","thumbnail_medium":"http:\\/\\/b.vimeocdn.com\\/ts\\/209\\/777\\/209777866_200.jpg","thumbnail_large":"http:\\/\\/b.vimeocdn.com\\/ts\\/209\\/777\\/209777866_640.jpg","user_name":"Venture Cafe","user_url":"http:\\/\\/vimeo.com\\/venturecafe","user_portrait_small":"http:\\/\\/b.vimeocdn.com\\/ps\\/605\\/605070_30.jpg","user_portrait_medium":"http:\\/\\/b.vimeocdn.com\\/ps\\/605\\/605070_75.jpg","user_portrait_large":"http:\\/\\/b.vimeocdn.com\\/ps\\/605\\/605070_100.jpg","user_portrait_huge":"http:\\/\\/b.vimeocdn.com\\/ps\\/605\\/605070_300.jpg","stats_number_of_likes":0,"stats_number_of_plays":43,"stats_number_of_comments":0,"duration":531,"width":640,"height":360,"tags":"startup stories, entrepreneurship, interview, Venture Cafe, jobs","embed_privacy":"anywhere"}]'
>>> loaded_content = json.loads(content_string)
>>> type(content_string)
<type 'str'>
>>> type(loaded_content)
<type 'list'>
You can also try it like this:
import requests
url1 = 'http://vimeo.com/api/v2/video/31161781.json'
html = requests.get(url1)
html.encoding = html.apparent_encoding
print(html.text)
