escaping double quotes in cgi.FieldStorage() with json.loads python - python

I receive a JSON string whose values may contain double quotes, and I want to escape them.
This does not work: to loop over the values of the fields I first need to call json.loads(string), but that already fails because there is a stray double quote in one of the values.
If I loop over the raw string instead, it also escapes the double quotes that are set correctly, and it fails again.
How can I loop over just the values?
print("Test started...")
try:
    import json
    import cgi
    form3 = cgi.FieldStorage()
    print("cgi Fieldstorage loaded in form3...")
    form2 = form3["json"].value
    print("form 2 is now form3.value...")
    print("loop now starting...")
    for x in form2:
        print("in loop...")
        x = x.replace('"','\"')
        print("dumped an item in form1...")
    form1 = json.loads(form2)
    print("form1 is being prepared with json.loads... ")
    print("dumped form1 string looks like : " + json.dumps(form1))
    # handling JSON exceptions thrown by corrupted parameters
except (ValueError, KeyError):
    import sys
    print("Encoding Error")
    sys.exit()
Example input from the address bar:
http://localhost/script.py?json={"field":"value","field2":"value","field3":"val"ue"}
Note that the value of field3 should be escaped as follows:
http://localhost/script.py?json={"field":"value","field2":"value","field3":"val\"ue"}
This is what happens if the string is escaped without first loading it as JSON via json.loads(string):
http://localhost/script.py?json={\"field\":\"value\",\"field2\":\"value\",\"field3\":\"val\"ue\"}
It is obvious why this happens, but this string can no longer be loaded via json.loads.
Nor can I call json.loads first and escape the values afterwards, because json.loads won't accept the string as valid JSON (the value of field3 is corrupted).
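One way to sidestep the problem is to avoid hand-written JSON in the URL altogether. The sketch below assumes you control the code that builds the link (the question does not say so) and uses a hypothetical payload dict matching the example fields. json.dumps escapes the inner quote, urlencode percent-encodes the result, and parse_qs stands in for what cgi.FieldStorage would hand to the script.

```python
import json
from urllib.parse import urlencode, parse_qs

# Hypothetical payload; field3 contains a literal double quote
payload = {"field": "value", "field2": "value", "field3": 'val"ue'}

# Sender side: json.dumps writes the inner quote as \" and
# urlencode makes the whole string safe to put in a query string
query = urlencode({"json": json.dumps(payload)})
url = "http://localhost/script.py?" + query

# Receiver side: decode the query string, then parse the JSON
received = parse_qs(query)["json"][0]
decoded = json.loads(received)

for key, value in decoded.items():
    print(key, "=", value)
```

Note also that in Python source '\"' and '"' are the same one-character string, so x.replace('"','\"') changes nothing, and rebinding x inside the loop never modifies form2.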

Related

How To Get The value of the input by passing a dynamic input name using Xpath?

The working code is like this:
csrf = list(set(htmls.xpath("//input[@name='whatever']/@value")))[0]
However, I'm trying to get that input name as a parameter passed into the function, in that way I would do something like this:
tokenname = sys.argv[2]
which gives the value 'whatever', and I want to pass it something like this:
csrf = list(set(htmls.xpath("//input[@name="+tokenname+"]/@value")))[0]
But it doesn't work that way. Is there any way to pass a variable into that @name value?
The full code is here:
import requests
from lxml import html
import json
import sys
session_requests = requests.session()
login_url = sys.argv[1]
tokenname = sys.argv[2]
result = session_requests.get(login_url)
htmls = html.fromstring(result.text)
csrf = list(set(htmls.xpath("//input[@name={}]/@value".format(tokenname))))[0]
print csrf
EDIT
Based on the discussion, it looks like you had issues with " and escape characters.
Use the following:
csrf = list(set(htmls.xpath("//input[@name=\"{}\"]/@value".format(tokenname))))[0]
Old
You can use format as below
"//input[@name={}]/@value".format('whatever')
From python doc site
str.format(*args, **kwargs)
Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.
>>> "The sum of 1 + 2 is {0}".format(1+2)
'The sum of 1 + 2 is 3'
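To see why the quotes around {} matter, here is a minimal sketch. It uses the standard library's ElementTree instead of lxml so that it runs without extra packages, and the PAGE markup and the find_input_value helper are made up for illustration. The quotes inside the predicate turn the formatted name into an XPath string literal; without them the name would not be read as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the login page
PAGE = """<html><body><form>
<input name="whatever" value="abc123"/>
<input name="other" value="zzz"/>
</form></body></html>"""

def find_input_value(root, token_name):
    """Return the value attribute of the first input with the given name."""
    # The single quotes around {} make token_name a string literal
    path = ".//input[@name='{}']".format(token_name)
    matches = root.findall(path)
    return matches[0].get("value") if matches else None

root = ET.fromstring(PAGE)
csrf = find_input_value(root, "whatever")
print(csrf)
```

This simple formatting breaks if the name itself contains a quote character. With lxml, xpath() also accepts XPath variables as keyword arguments, for example htmls.xpath("//input[@name=$n]/@value", n=tokenname), which avoids the quoting question.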

Parse multipart/form-data file in UTF-8

I am parsing a multipart/form input with Python's cgi module:
body_file = StringIO.StringIO(self.request.body)
pdict = {'boundary': 'xYzZY'}
httpbody = cgi.parse_multipart(body_file, pdict)
text = self.trim(httpbody['text'])
and I want to print some elements of httpbody that are UTF-8 encoded.
I tried text.decode('utf-8') and unicode(text, encoding='utf-8'), but nothing seems to work. Am I missing something here?
Try the following:
text = self.trim(httpbody['text'])
text.encode('utf-8')
I'm assuming the text variable is a string; if you're not sure, convert it with str() first. Otherwise, you'll get another error thrown at you.
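For reference, the direction of the two calls can be checked in isolation. This is a small Python 3 sketch, with a hypothetical byte string standing in for the form field: decode turns UTF-8 bytes into text, and encode goes the other way.

```python
# Hypothetical raw bytes standing in for a UTF-8 encoded form field
raw = b'caf\xc3\xa9'

# bytes -> str: this is the step needed before printing text
text = raw.decode('utf-8')

# str -> bytes: the reverse direction, needed when writing raw output
round_trip = text.encode('utf-8')

print(text)
```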

ValueError: need more than 1 value to unpack, PoolManager request

The following code in utils.py
manager = PoolManager()
data = json.dumps(dict) #takes in a python dictionary of json
manager.request("POST", "https://myurlthattakesjson", data)
Gives me ValueError: need more than 1 value to unpack when the server is run. Does this most likely mean that the JSON is incorrect or something else?
Your JSON data needs to be URL-encoded for it to be POST (or GET) safe.
# import the URL-encoding helpers
import urllib.parse
manager = PoolManager()
# stringify your data
data = json.dumps(dict) #takes in a python dictionary of json
# URL-encode your data string (quote() accepts a string; urlencode() needs a mapping)
encdata = urllib.parse.quote(data)
# pass the payload as the request body rather than as form fields
manager.request("POST", "https://myurlthattakesjson", body=encdata)
I believe Python 3 changed this so that the data needs to be bytes. See "unable to Post data to a login form using urllib python v3.2.1".
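The encoding step can be tried without a server. The sketch below uses only the standard library; the payload and the "json" field name are hypothetical. It shows that urlencode rejects a bare string, and two ways of producing a bytes body. The original ValueError may come from the JSON string being passed positionally where older urllib3 versions expect a fields mapping, so passing the payload by keyword (body=...) is likely the safer call.

```python
import json
import urllib.parse

payload = {"name": "example", "count": 2}  # hypothetical data
data = json.dumps(payload)

# urlencode() expects a mapping or a sequence of pairs;
# a bare string is rejected with TypeError
try:
    urllib.parse.urlencode(data)
    string_rejected = False
except TypeError:
    string_rejected = True

# Option 1: form-encode the JSON string under a (hypothetical) field name
form_body = urllib.parse.urlencode({"json": data}).encode("ascii")

# Option 2: send the JSON text itself as UTF-8 bytes
json_body = data.encode("utf-8")

print(string_rejected, form_body, json_body)
```

With option 2 the request would normally also carry a Content-Type: application/json header; which option applies depends on what the server expects.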

Submitting empty form and weird output

Here's my form :
<form action = "/search/" method = "get">
<input type = "text" name = "q">
<input type = "submit" value = "Search">
</form>
And here's my view:
def search(request):
    if 'q' in request.GET:
        message = 'You searched for: %r' % request.GET['q']
    else:
        message = 'You submitted an empty form :('
    return HttpResponse(message)
When I enter something, everything works fine except for the weird u'' wrapper. For example, when I enter asdasda I get the output You searched for: u'asdsa'. Another problem is that when I submit an empty form, the output is simply u'' when it should be "You submitted an empty form :(". I'm reading "The Django Book" (the 1.x.x version), and this was one of its examples.
The "weird u thing" is a unicode string. You can read about it here: http://docs.python.org/tutorial/introduction.html#unicode-strings
And I'm guessing since the user pressed submit, you get a request that has an empty q value (u'') since the user didn't enter anything. That makes sense, right? You should change your if statement to check for this empty unicode string.
For the first problem, try using %s instead of %r. %r formats the value with repr(), which for a unicode string includes the u prefix and the quotes. Normal string formatting with %s just inserts the value without the 'u' or quotes.
For the second problem, a text input will always have its key in the dictionary once the form is submitted. Instead of your if statement, try:
if request.GET['q'] != "":
to test if the string is empty.
'q' is present in the request.GET dictionary after the form is submitted, it just happens to be empty in that case. Try this, to show the error message when submitting an empty query:
if 'q' in request.GET and request.GET['q'] != '':
The strange u is due to the %r which calls repr-- use %s instead.
>>>'%r' % u'foo'
[out] "u'foo'"
>>>'%s' % u'foo'
[out] u'foo'
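Both suggestions can be combined and tested outside Django. In this sketch a plain dict stands in for request.GET, and build_message is a hypothetical helper, not part of the book's example. Stripping whitespace is an extra assumption, so that a query of only spaces also counts as empty.

```python
def build_message(params):
    """Build the response text from a dict-like object such as request.GET."""
    # .get() covers the first page load, when 'q' is not present at all
    query = params.get('q', '').strip()
    if query:
        # %s inserts the text itself; %r would insert its repr()
        return 'You searched for: %s' % query
    return 'You submitted an empty form :('

print(build_message({'q': 'asdasda'}))
print(build_message({'q': ''}))
```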

Extracting data from a URL result with special formatting

I have a URL:
http://somewhere.com/relatedqueries?limit=2&query=seedterm
where modifying the inputs, limit and query, generates the wanted data. limit is the maximum number of terms to return and query is the seed term.
The URL provides text result formatted in this way:
oo.visualization.Query.setResponse({version:'0.5',reqId:'0',status:'ok',sig:'1303596067112929220',table:{cols:[{id:'score',label:'Score',type:'number',pattern:'#,##0.###'},{id:'query',label:'Query',type:'string',pattern:''}],rows:[{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm1'}]},{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm2'}]}],p:{'totalResultsCount':'7727'}}});
I'd like to write a Python script that takes two arguments (the limit and the query seed), fetches the data online, parses the result and returns a list with the new terms, ['newterm1','newterm2'] in this case.
I'd love some help, especially with the URL fetching since I have never done this before.
It sounds like you can break this problem up into several subproblems.
Subproblems
There are a handful of problems that need to be solved before composing the completed script:
Forming the request URL: Creating a configured request URL from a template
Retrieving data: Actually making the request
Unwrapping JSONP: The returned data appears to be JSON wrapped in a JavaScript function call
Traversing the object graph: Navigating through the result to find the desired bits of information
Forming the request URL
This is just simple string formatting.
url_template = 'http://somewhere.com/relatedqueries?limit={limit}&query={seedterm}'
url = url_template.format(limit=2, seedterm='seedterm')
Python 2 Note
str.format is also available from Python 2.6 onward; on older versions, use the string formatting operator (%) instead.
url_template = 'http://somewhere.com/relatedqueries?limit=%(limit)d&query=%(seedterm)s'
url = url_template % dict(limit=2, seedterm='seedterm')
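Plain string formatting is enough for simple seed terms, but it does not escape spaces or characters such as &. As a hedged alternative for Python 3, the sketch below builds the query string with urllib.parse.urlencode; BASE_URL and build_url are names chosen for illustration.

```python
from urllib.parse import urlencode

# Endpoint from the question, without the query string
BASE_URL = 'http://somewhere.com/relatedqueries'

def build_url(limit, seedterm):
    """Return the request URL with both parameters safely encoded."""
    return BASE_URL + '?' + urlencode({'limit': limit, 'query': seedterm})

url = build_url(2, 'seed term')
print(url)
```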
Retrieving data
You can use the built-in urllib.request module for this.
import urllib.request
data = urllib.request.urlopen(url) # url from previous section
This returns a file-like object called data. You can also use a with-statement here:
with urllib.request.urlopen(url) as data:
    # do processing here
Python 2 Note
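Import urllib2 instead of urllib.request. Note that urllib2 responses do not support the with-statement, so wrap the call in contextlib.closing() or close the response yourself.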
Unwrapping JSONP
The result you pasted looks like JSONP. Given that the wrapping function that is called (oo.visualization.Query.setResponse) doesn't change, we can simply strip this method call out.
# read() returns bytes in Python 3, so decode first (UTF-8 is assumed here)
result = data.read().decode('utf-8')
prefix = 'oo.visualization.Query.setResponse('
suffix = ');'
if result.startswith(prefix) and result.endswith(suffix):
    result = result[len(prefix):-len(suffix)]
Parsing JSON
The resulting result string is just JSON data. Parse it with the built-in json module.
import json
result_object = json.loads(result)
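One caveat: the sample response in the question uses unquoted keys and single-quoted strings, which strict JSON does not allow, so json.loads may reject it as it stands. The sketch below is a best-effort workaround under the assumption that no string value contains a quote or text that looks like ",key:"; js_object_to_json is a hypothetical helper, and the sample is a shortened copy of the response with the wrapper already removed.

```python
import json
import re

def js_object_to_json(text):
    """Best-effort conversion of a simple JavaScript object literal to JSON."""
    # Put double quotes around bare keys that directly follow "{" or ","
    text = re.sub(r'([{,])\s*([A-Za-z_]\w*)\s*:', r'\1"\2":', text)
    # Swap single-quoted strings for double-quoted ones
    return text.replace("'", '"')

# Shortened version of the response body from the question
sample = ("{version:'0.5',status:'ok',table:{rows:["
          "{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm1'}]},"
          "{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm2'}]}],"
          "p:{'totalResultsCount':'7727'}}}")

result_object = json.loads(js_object_to_json(sample))
# Each row has two cells; the term is in the second one (index 1)
terms = [row['c'][1]['v'] for row in result_object['table']['rows']]
print(terms)
```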
Traversing the object graph
Now you have a result_object that represents the JSON response. The object itself will be a dict with keys like version, reqId, and so on. Based on your question, here is what you would need to do to create your list.
# Get the rows in the table, then get the second column's value
# (index 1, since indexing starts at 0) for each row
terms = [row['c'][1]['v'] for row in result_object['table']['rows']]
Putting it all together
#!/usr/bin/env python3
"""A script for retrieving and parsing results from requests to
somewhere.com.
This script works as either a standalone script or as a library. To use
it as a standalone script, run it as `python3 scriptname.py`. To use it
as a library, use the `retrieve_terms` function."""
import urllib.request
import json
import sys
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
def parse_result(result):
    """Parse a JSONP result string and return a list of terms"""
    prefix = 'oo.visualization.Query.setResponse('
    suffix = ');'
    # Strip JSONP function wrapper
    if result.startswith(prefix) and result.endswith(suffix):
        result = result[len(prefix):-len(suffix)]
    # Deserialize JSON to Python objects
    result_object = json.loads(result)
    # Get the rows in the table, then get the second column's value
    # (index 1) for each row
    return [row['c'][1]['v'] for row in result_object['table']['rows']]
def retrieve_terms(limit, seedterm):
    """Retrieves and parses data and returns a list of terms"""
    url_template = 'http://somewhere.com/relatedqueries?limit={limit}&query={seedterm}'
    url = url_template.format(limit=limit, seedterm=seedterm)
    try:
        with urllib.request.urlopen(url) as data:
            # read() returns bytes in Python 3; UTF-8 is assumed here
            result = data.read().decode('utf-8')
    except OSError:
        print('Could not request data from server', file=sys.stderr)
        sys.exit(E_OPERATION_ERROR)
    return parse_result(result)
def main(limit, seedterm):
    """Retrieves and parses data and prints each term to standard output"""
    terms = retrieve_terms(limit, seedterm)
    for term in terms:
        print(term)
if __name__ == '__main__':
    try:
        limit = int(sys.argv[1])
        seedterm = sys.argv[2]
    except (IndexError, ValueError):
        error_message = '''{} limit seedterm
limit must be an integer'''.format(sys.argv[0])
        print(error_message, file=sys.stderr)
        sys.exit(E_INVALID_PARAMS)
    main(limit, seedterm)
Python 2.7 version
#!/usr/bin/env python2.7
"""A script for retrieving and parsing results from requests to
somewhere.com.
This script works as either a standalone script or as a library. To use
it as a standalone script, run it as `python2.7 scriptname.py`. To use it
as a library, use the `retrieve_terms` function."""
import contextlib
import urllib2
import json
import sys
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
def parse_result(result):
    """Parse a JSONP result string and return a list of terms"""
    prefix = 'oo.visualization.Query.setResponse('
    suffix = ');'
    # Strip JSONP function wrapper
    if result.startswith(prefix) and result.endswith(suffix):
        result = result[len(prefix):-len(suffix)]
    # Deserialize JSON to Python objects
    result_object = json.loads(result)
    # Get the rows in the table, then get the second column's value
    # (index 1) for each row
    return [row['c'][1]['v'] for row in result_object['table']['rows']]
def retrieve_terms(limit, seedterm):
    """Retrieves and parses data and returns a list of terms"""
    url_template = 'http://somewhere.com/relatedqueries?limit=%(limit)d&query=%(seedterm)s'
    url = url_template % dict(limit=limit, seedterm=seedterm)
    try:
        # urllib2 responses are not context managers, so wrap with closing()
        with contextlib.closing(urllib2.urlopen(url)) as data:
            result = data.read()
    except IOError:
        sys.stderr.write('%s\n' % 'Could not request data from server')
        sys.exit(E_OPERATION_ERROR)
    return parse_result(result)
def main(limit, seedterm):
    """Retrieves and parses data and prints each term to standard output"""
    terms = retrieve_terms(limit, seedterm)
    for term in terms:
        print term
if __name__ == '__main__':
    try:
        limit = int(sys.argv[1])
        seedterm = sys.argv[2]
    except (IndexError, ValueError):
        error_message = '''{} limit seedterm
limit must be an integer'''.format(sys.argv[0])
        sys.stderr.write('%s\n' % error_message)
        sys.exit(E_INVALID_PARAMS)
    main(limit, seedterm)
I didn't fully understand your problem, because from your code it seems to me that you are using the Visualization API (this is the first time I have heard of it, by the way).
If you are just looking for a way to fetch data from a web page, you can use urllib2; it only retrieves the data, so if you want to parse what you retrieved you will have to use a more appropriate library such as BeautifulSoup.
If you are dealing with another kind of web service (RSS, Atom, RPC) rather than web pages, there are plenty of Python libraries that handle each of those services well.
import urllib2
from BeautifulSoup import BeautifulSoup
result = urllib2.urlopen('http://somewhere.com/relatedqueries?limit=%s&query=%s' % (2, 'seedterm'))
htmltext = result.read()
result.close()
soup = BeautifulSoup(htmltext, convertEntities="html")
# You can parse your data now; check the BeautifulSoup API.
