Weird json value urllib python

Weird json value urllib python - python

I'm trying to manipulate a dynamic JSON from this site:
http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do
It has 3 elements, imagem, a base64, labelValorCaptcha, just a message, and uuidCaptcha, a value to pass by parameter to play a sound in this link bellow:
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_e7b072e1fce5493cbdc46c9e4738ab8a
When I enter in the first site through a browser and put in the second link the uuidCaptha after the equal ("..uuidCaptcha="), the sound plays normally. I wrote a simple code to catch this elements.
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
But I dont know what's happening, the caught value of the uuidCaptcha doesn't work. Open a error web page.
Someone knows?
Thanks!

It works for me.
$ cat a.py
#!/usr/bin/env python
# encoding: utf-8
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
$ python a.py
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_efc8d4bc3bdb428eab8370c4e04ab42c

As I said #Charlie Harding, the best way is download the page and get the JSON values, because this JSON is dynamic and need an opened web link to exist.
More info here.

Related

Trying to parse xml from url store to string so I can use in another spot to output to irc

The following is the xml from remote URL
<SHOUTCASTSERVER>
<CURRENTLISTENERS>0</CURRENTLISTENERS>
<PEAKLISTENERS>0</PEAKLISTENERS>
<MAXLISTENERS>100</MAXLISTENERS>
<UNIQUELISTENERS>0</UNIQUELISTENERS>
<AVERAGETIME>0</AVERAGETIME>
<SERVERGENRE>variety</SERVERGENRE>
<SERVERGENRE2/>
<SERVERGENRE3/>
<SERVERGENRE4/>
<SERVERGENRE5/>
<SERVERURL>http://localhost/</SERVERURL>
<SERVERTITLE>Wicked Radio WIKD/WPOS</SERVERTITLE>
<SONGTITLE>Unknown - Haxor Radio Show 08</SONGTITLE>
<STREAMHITS>0</STREAMHITS>
<STREAMSTATUS>1</STREAMSTATUS>
<BACKUPSTATUS>0</BACKUPSTATUS>
<STREAMLISTED>0</STREAMLISTED>
<STREAMLISTEDERROR>200</STREAMLISTEDERROR>
<STREAMPATH>/stream</STREAMPATH>
<STREAMUPTIME>448632</STREAMUPTIME>
<BITRATE>128</BITRATE>
<CONTENT>audio/mpeg</CONTENT>
<VERSION>2.4.7.256 (posix(linux x64))</VERSION>
</SHOUTCASTSERVER>
All I am trying to do is store the contents of the element <SONGTITLE> store it so I can post to IRC using a bot that I have.
import urllib2
from lxml import etree
url = "http://142.4.217.133:9203/stats?sid=1&mode=viewxml&page=0"
fp = urllib2.urlopen(url)
doc = etree.parse(fp)
fp.close()
for record in doc.xpath('//SONGTITLE'):
for x in record.xpath("./subfield/text()"):
print "\t", x
That is what I have so far; not sure what I am doing wrong here. I am quite new to python but the IRC bot works and does some other utility type things I just want to add this as a feature to it.

You don't need to include ./subfield/:
for x in record.xpath("text()"):
Output:
Unknown - Haxor Radio Show 08

urllib2.urlopen not getting all content

I am a beginner in python to pull some data from reddit.com
More precisely, I am trying to send a request to http:www.reddit.com/r/nba/.json to get the JSON content of the page and then parse it for entries about a specific team or player.
To automate the data gathering, I am requesting the page like this:
import urllib2
FH = urllib2.urlopen("http://www.reddit.com/r/nba/.json")
rnba = FH.readlines()
rnba = str(rnba[0])
FH.close()
I am also pulling the content like this on a copy of the script, just to be sure:
FH = requests.get("http://www.reddit.com/r/nba/.json",timeout=10)
rnba_json = FH.json()
FH.close()
However, I am not getting the full data that is presented when I manually go to
http://www.reddit.com/r/nba/.json with either method, in particular when I call
print len(rnba_json['data']['children']) # prints 20-something child stories
but when I do the same loading the copy-pasted JSON string like this:
import json
import urllib2
fh = r"""{"kind": "Listing", "data": {"modhash": ..."""# long JSON string
r_nba = json.loads(fh) #loads the json string from the site into json object
print len(r_nba['data']['children']) #prints upwards of 100 stories
I get more story links. I know about the timeout parameter but providing it did not resolve anything.
What am I doing wrong or what can I do to get all the content presented when I pull the page in the browser?

To get the max allowed, you'd use the API like: http://www.reddit.com/r/nba/.json?limit=100

Python script for "Google search by image"

I have checked Google Search API's and it seems that they have not released any API for searching "Images". So, I was wondering if there exists a python script/library through which I can automate the "search by image feature".

This was annoying enough to figure out that I thought I'd throw a comment on the first python-related stackoverflow result for "script google image search". The most annoying part of all this is setting up your proper application and custom search engine (CSE) in Google's web UI, but once you have your api key and CSE, define them in your environment and do something like:
#!/usr/bin/env python
# save top 10 google image search results to current directory
# https://developers.google.com/custom-search/json-api/v1/using_rest
import requests
import os
import sys
import re
import shutil
url = 'https://www.googleapis.com/customsearch/v1?key={}&cx={}&searchType=image&q={}'
apiKey = os.environ['GOOGLE_IMAGE_APIKEY']
cx = os.environ['GOOGLE_CSE_ID']
q = sys.argv[1]
i = 1
for result in requests.get(url.format(apiKey, cx, q)).json()['items']:
link = result['link']
image = requests.get(link, stream=True)
if image.status_code == 200:
m = re.search(r'[^\.]+$', link)
filename = './{}-{}.{}'.format(q, i, m.group())
with open(filename, 'wb') as f:
image.raw.decode_content = True
shutil.copyfileobj(image.raw, f)
i += 1

There is no API available but you are can parse the page and imitate the browser, but I don't know how much data you need to parse because google may limit or block access.
You can imitate the browser by simply using urllib and setting correct headers, but if you think parsing complex web-pages may be difficult from python, you can directly use a headless browser like phontomjs, inside a browser it is trivial to get correct elements using javascript/DOM
Note before trying all this check google's TOS

You can try this:
https://developers.google.com/image-search/v1/jsondevguide#json_snippets_python
It's deprecated, but seems to work.

The JSON syntax vs html/xml tags

The JSON syntax definition say that
html/xml tags (like the <script>...</script> part) are not part of
valid json, see the description at http://json.org.
A number of browsers and tools ignore these things silently, but python does
not.
I'd like to insert the javascript code (google analytics) to get info about the users using this service (place, browsers, OS ...).
What do you suggest to do?
I should solve the problem on [browser output][^1] or [python script][^2]?
thanks,
Antonio
[^1]: Browser output
<script>...</script>
[{"key": "value"}]
[^2]: python script
#!/usr/bin/env python
import urllib2, urllib, json
url="http://.........."
params = {}
url = url + '?' + urllib.urlencode(params, doseq=True)
req = urllib2.Request(url)
headers = {'Accept':'application/json;text/json'}
for key, val in headers.items():
req.add_header(key, val)
data = urllib2.urlopen(req)
print json.load(data)

These sound like two different kinds of services--one is a user-oriented web view of some data, with visualizations, formatting, etc., and one is a machine-oriented data service. I would keep these separate, and maybe build the user view as an extension to the data service.

Python error when using urllib.open

When I run this:
import urllib
feed = urllib.urlopen("http://www.yahoo.com")
print feed
I get this output in the interactive window (PythonWin):
<addinfourl at 48213968 whose fp = <socket._fileobject object at 0x02E14070>>
I'm expecting to get the source of the above URL. I know this has worked on other computers (like the ones at school) but this is on my laptop and I'm not sure what the problem is here. Also, I don't understand this error at all. What does it mean? Addinfourl? fp? Please help.

Try this:
print feed.read()
See Python docs here.

urllib.urlopen actually returns a file-like object so to retrieve the contents you will need to use:
import urllib
feed = urllib.urlopen("http://www.yahoo.com")
print feed.read()

In python 3.0:
import urllib
import urllib.request
fh = urllib.request.urlopen(url)
html = fh.read().decode("iso-8859-1")
fh.close()
print (html)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Weird json value urllib python - python

As I said #Charlie Harding, the best way is download the page and get the JSON values, because this JSON is dynamic and need an opened web link to exist. More info here.

Related

Trying to parse xml from url store to string so I can use in another spot to output to irc

urllib2.urlopen not getting all content

Python script for "Google search by image"

The JSON syntax vs html/xml tags

Python error when using urllib.open

Categories

Resources