How to properly parse request strings with Twisted (Python)

I am trying to implement a simple Twisted HTTP server which would respond to requests for loading tiles from a database and return them. However, I find the way it interprets request strings quite odd.
This is what I POST to the server:
curl -d "request=loadTiles&grid[0][x]=17&grid[0][y]=185&grid[1][x]=18&grid[1][y]=184" http://localhost:8080/fetch/
What I expect the request.args to be:
{'request': 'loadTiles', 'grid': [{'x': 17, 'y': 185}, {'x': 18, 'y': 184}]}
How Twisted interprets request.args:
{'grid[1][y]': ['184'], 'grid[0][y]': ['185'], 'grid[1][x]': ['18'], 'request': ['loadTiles'], 'grid[0][x]': ['17']}
Is it possible to have it automatically parse the request string and create a list for the grid parameter or do I have to do it manually?
I could JSON-encode the grid parameter and then decode it server-side, but that seems like an unnecessary hack.
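For comparison, the JSON route mentioned above could look roughly like this in Python 3. It is only a sketch: the standard library's parse_qs stands in for whatever the server hands back as request arguments, and the variable names are made up for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode

grid = [{"x": 17, "y": 185}, {"x": 18, "y": 184}]

# Client side: serialise the nested structure into a single form field.
body = urlencode({"request": "loadTiles", "grid": json.dumps(grid)})

# Server side: ordinary form decoding, followed by one json.loads call.
args = parse_qs(body)
decoded_grid = json.loads(args["grid"][0])
print(decoded_grid)
```

One side effect of this route is that the numbers come back as integers rather than strings, because JSON carries the type.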

I don't know why you would expect your urlencoded data to be decoded according to some ad-hoc non-standard rules, or why you would consider the standard treatment "odd"; [ isn't special in query strings. What software decodes them this way?
In any event, this isn't really Twisted, but Python (and more generally speaking, the web-standard way of parsing this data). You can see the sort of data you'll get back via the cgi.parse_qs function interactively. For example:
>>> import cgi
>>> cgi.parse_qs("")
{}
>>> cgi.parse_qs("x=1")
{'x': ['1']}
>>> cgi.parse_qs("x[something]=1")
{'x[something]': ['1']}
>>> cgi.parse_qs("x=1&y=2")
{'y': ['2'], 'x': ['1']}
>>> cgi.parse_qs("x=1&y=2&x=3")
{'y': ['2'], 'x': ['1', '3']}
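On Python 3 the same function lives in urllib.parse (cgi.parse_qs was removed in Python 3.8). A small sketch of the equivalent check, using a shortened version of the POST body from the question:

```python
from urllib.parse import parse_qs

body = "request=loadTiles&grid[0][x]=17&grid[0][y]=185"
args = parse_qs(body)

# Square brackets are kept as ordinary characters in the key names,
# and every value is wrapped in a list.
print(args)

# Repeating a key is the standard way to send several values for it.
repeated = parse_qs("x=1&y=2&x=3")
print(repeated)
```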
I hope that clears things up for you.

Maybe instead of a parser, how about something to post-process the request.args you are getting?
from pyparsing import Suppress, alphas, alphanums, nums, Word
from itertools import groupby
# you could do this with regular expressions too, if you prefer
LBRACK,RBRACK = map(Suppress, '[]')
ident = Word('_' + alphas, '_' + alphanums)
integer = Word(nums).setParseAction(lambda t : int(t[0]))
subscriptedRef = ident + 2*(LBRACK + (ident | integer) + RBRACK)
def simplify_value(v):
    if isinstance(v,list) and len(v)==1:
        return simplify_value(v[0])
    if v == integer:
        return int(v)
    return v
def regroup_args(dd):
    ret = {}
    subscripts = []
    for k,v in dd.items():
        # this is a pyparsing short-cut to see if a string matches a pattern
        # I also used it above in simplify_value to test for integerness of a string
        if k == subscriptedRef:
            subscripts.append(tuple(subscriptedRef.parseString(k))+
                              (simplify_value(v),))
        else:
            ret[k] = simplify_value(v)
    # sort all the matched subscripted args, and then use groupby to
    # group by name and list index
    # this assumes all indexes 0-n are present in the parsed arguments
    subscripts.sort()
    for name,nameitems in groupby(subscripts, key=lambda x:x[0]):
        ret[name] = []
        for idx,idxitems in groupby(nameitems, key=lambda x:x[1]):
            idd = {}
            for item in idxitems:
                name, i, attr, val = item
                idd[attr] = val
            ret[name].append(idd)
    return ret
request_args = {'grid[1][y]': ['184'], 'grid[0][y]': ['185'], 'grid[1][x]': ['18'], 'request': ['loadTiles'], 'grid[0][x]': ['17']}
print(regroup_args(request_args))
prints
{'grid': [{'y': 185, 'x': 17}, {'y': 184, 'x': 18}], 'request': 'loadTiles'}
Note that this also simplifies the single-element lists to just the 0'th element value, and converts the numeric strings to actual integers.

Related

Utility Function to convert dictionary's "key"=value into key:value

I have seen quite a few links but mostly it gives me errors:
ValueError: Parse error: unable to parse:
'hover_data=["Confirmed","Deaths","Recovered"],
animation_frame="Date",color_continuous_scale="Portland",radius=7,
zoom=0,height=700"'
For example I want to convert the following string into a dict:
abc= 'fn=True, lat="Lat", lon="Long", hover_name="Country/Province/State",hover_data=["Confirmed","Deaths","Recovered"], animation_frame="Date",color_continuous_scale="Portland",radius=7, zoom=0,height=700"'
Expected output:
{'fn': True, "lat":"Lat",
"lon":"Long",
"hover_name":"Country/Province/State",
"hover_data":["Confirmed","Deaths","Recovered"],
"animation_frame":"Date",
"color_continuous_scale":"Portland",
"radius":7,
"zoom":0,
"height":700}
I tried to use this reference's code:
import re
keyval_re = re.compile(r'''
    \s*                                 # Leading whitespace is ok.
    (?P<key>\w+)\s*=\s*(                # Search for a key followed by..
        (?P<str>"[^"]*"|\'[^\']*\')|    # a quoted string; or
        (?P<float>\d+\.\d+)|            # a float; or
        (?P<int>\d+)                    # an int.
    )\s*,?\s*                           # Handle comma & trailing whitespace.
    |(?P<garbage>.+)                    # Complain if we get anything else!
    ''', re.VERBOSE)
def handle_keyval(match):
    if match.group('garbage'):
        raise ValueError("Parse error: unable to parse: %r" %
                         match.group('garbage'))
    key = match.group('key')
    if match.group('str') is not None:
        return (key, match.group('str')[1:-1]) # strip quotes
    elif match.group('float') is not None:
        return (key, float(match.group('float')))
    elif match.group('int') is not None:
        return (key, int(match.group('int')))
    elif match.group('list') is not None:
        return (key, int(match.group('list')))
    elif match.group('bool') is not None:
        return (key, int(match.group('bool')))
print(dict(handle_keyval(m) for m in keyval_re.finditer(abc)))
There seems to be an unwanted double-quote character as the last character of your string abc.
If that is removed, the following solution will work nicely:
eval("dict(" + abc + ")")
Output:
{'fn': True,
'lat': 'Lat',
'lon': 'Long',
'hover_name': 'Country/Province/State',
'hover_data': ['Confirmed', 'Deaths', 'Recovered'],
'animation_frame': 'Date',
'color_continuous_scale': 'Portland',
'radius': 7,
'zoom': 0,
'height': 700}
⚠️ DON'T USE EVAL.
import re, ast
test_string = 'fn=True, lat="Lat", lon="Long", hover_name="Country/Province/State",hover_data=["Confirmed","Deaths","Recovered"], animation_frame="Date",color_continuous_scale="Portland",radius=7, zoom=0,height=700'
items = re.split(r', |,(?=\w)', test_string)
d = {
    key: ast.literal_eval(val)
    for item in items
    for key, val in [re.split(r'=|\s*=\s*', item)]
}
print(d)
I used a very simple method: I just split the string on commas and then used a plain dict comprehension. I've also used ast.literal_eval() to convert the value strings into their respective Python literals and data types.
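Another way to avoid both eval and hand-written splitting is to let Python's own parser find the key=value boundaries and then evaluate only the values as literals. This is a sketch under the assumption that the string is valid keyword-argument syntax (so the stray trailing double quote has to go first); parse_kwargs is a made-up helper name.

```python
import ast

def parse_kwargs(text):
    """Parse 'a=1, b="x"' style text into a dict without executing any code."""
    # Wrap the text in a call so the parser splits it into keyword arguments.
    call = ast.parse("dict(" + text + ")", mode="eval").body
    # literal_eval accepts AST nodes and rejects anything that is not a literal.
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}

sample = 'fn=True, lat="Lat", hover_data=["Confirmed","Deaths"], radius=7, zoom=0'
print(parse_kwargs(sample))
```

Commas inside lists or quoted strings are handled by the parser, and a value such as a function call makes ast.literal_eval raise ValueError instead of running it.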

How to replace text between parentheses in Python?

I have a dictionary containing the following key-value pairs: d={'Alice':'x','Bob':'y','Chloe':'z'}
I want to replace the lowercase variables (values) with the constants (keys) in any given string.
For example, if my string is:
A(x)B(y)C(x,z)
how do I replace the characters to get the resulting string:
A(Alice)B(Bob)C(Alice,Chloe)
Should I use regular expressions?
re.sub() solution with replacement function:
import re
d = {'Alice':'x','Bob':'y','Chloe':'z'}
flipped = dict(zip(d.values(), d.keys()))
s = 'A(x)B(y)C(x,z)'
result = re.sub(r'\([^()]+\)', lambda m: '({})'.format(','.join(flipped.get(k,'')
    for k in m.group().strip('()').split(','))), s)
print(result)
The output:
A(Alice)B(Bob)C(Alice,Chloe)
Extended version:
import re
def repl(m):
    val = m.group().strip('()')
    d = {'Alice':'x','Bob':'y','Chloe':'z'}
    flipped = dict(zip(d.values(), d.keys()))
    if ',' in val:
        return '({})'.format(','.join(flipped.get(k,'') for k in val.split(',')))
    else:
        return '({})'.format(flipped.get(val,''))
s = 'A(x)B(y)C(x,z)'
result = re.sub(r'\([^()]+\)', repl, s)
print(result)
Bonus approach for particular input case A(x)B(y)C(Alice,z):
...
s = 'A(x)B(y)C(Alice,z)'
result = re.sub(r'\([^()]+\)', lambda m: '({})'.format(','.join(flipped.get(k,'') or k
    for k in m.group().strip('()').split(','))), s)
print(result)
I assume you want to replace the values in a string with the respective keys of the dictionary. If my assumption is correct, you can try this without using regex.
First, swap the keys and values using a dictionary comprehension.
my_dict = {'Alice':'x','Bob':'y','Chloe':'z'}
my_dict = {y: x for x, y in my_dict.items()}
Then, using a list comprehension, replace each character that appears as a key (this works here because every placeholder is a single character):
str_ = 'A(x)B(y)C(x,z)'
output = ''.join([i if i not in my_dict else my_dict[i] for i in str_])
Hope this is what you need ;)
Code
import re
d = {'Alice':'x','Bob':'y','Chloe':'z'}
s = "A(x)B(y)C(x,z)"
for key, value in d.items():
    # replace every occurrence of the value with its key
    s = re.sub(re.escape(value), key, s)
print(s)
Output
A(Alice)B(Bob)C(Alice,Chloe)
You could also use Python's str.replace method, like this:
d = {'x':'Alice','y':'Bob','z':'Chloe'}
s = "A(x)B(y)C(x,z)"
for key in d:
    s = s.replace(key, d[key])
print(s)
But yes, you should swap your dictionary's keys and values first, as Kishore suggested.
This is the way that I would do it:
import re
def sub_args(text, tosub):
    ops = '|'.join(tosub.keys())
    for argstr, _ in re.findall(r'(\(([%s]+?,?)+\))' % ops, text):
        args = argstr[1:-1].split(',')
        args = [tosub[a] for a in args]
        subbed = '(%s)' % ','.join(map(str, args))
        text = re.sub(re.escape(argstr), subbed, text)
    return text
text = 'A(x)B(y)C(x,z)'
tosub = {
    'x': 'Alice',
    'y': 'Bob',
    'z': 'Chloe'
}
print(sub_args(text, tosub))
Basically you just use the regex pattern to find all of the argument groups and substitute in the proper values--the nice thing about this approach is that you don't have to worry about subbing where you don't want to (for example, if you had a string like 'Fn(F,n)'). You can also have multi-character keys, like 'F(arg1,arg2)'.

Python 2.6 how to convert this string to dict?

str = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
how do I convert this string to dict?
"source_ip":"127.0.0.1","db_ip":"43.53.696.23","db_port":"3306"
I have tried
str = dict(str)
but it didn't work
Those fragments look like python sets. If you run them through ast.literal_eval you get something close, but since sets are not ordered, you can't guarantee which of the two items is the key and which is the value. This is a total hack, but I replaced the curly braces with parens so they look more tuple-like and made the dictionary from there.
>>> mystr = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
>>> mystr = mystr.replace('{', '(').replace('}', ')')
>>> import ast
>>> mydict = dict(ast.literal_eval(mystr))
>>> mydict
{u'user_name': u'uz,ifls', u'db_port': u'3306', u'source_ip': u'127.0.0.1', u'db_ip': u'43.53.696.23'}
>>>
A few points:
The top-level data structure is actually a tuple (because in Python, 1, 2, 3 is the same as (1, 2, 3)).
As others have pointed out, the inner data structures are set literals, which are not ordered.
Set literals are implemented in Python 2.7 (2.6 rejects them as a syntax error) but not in its ast.literal_eval function, which is arguably a bug.
As it turns out, you can make your own custom literal_eval function and make it do what you want.
from _ast import *
from ast import *
# This is mostly copied from `ast.py` in your Python source.
def literal_eval(node_or_string):
    """
    Safely evaluate an expression node or a string containing a Python
    expression. The string or node provided may only consist of the following
    Python literal structures: strings, bytes, numbers, tuples, lists, dicts,
    sets, booleans, and None.
    """
    if isinstance(node_or_string, str):
        node_or_string = parse(node_or_string, mode='eval')
    if isinstance(node_or_string, Expression):
        node_or_string = node_or_string.body
    def _convert(node):
        if isinstance(node, (Str)):
            return node.s
        elif isinstance(node, Tuple):
            return tuple(map(_convert, node.elts))
        elif isinstance(node, Set):
            # ** This is the interesting change.. when
            # we see a set literal, we return a tuple.
            return tuple(map(_convert, node.elts))
        elif isinstance(node, Dict):
            return dict((_convert(k), _convert(v)) for k, v
                        in zip(node.keys, node.values))
        raise ValueError('malformed node or string: ' + repr(node))
    return _convert(node_or_string)
Then we can do:
>>> s = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
>>> dict(literal_eval(s))
{u'user_name': u'uz,ifls', u'db_port': u'3306', u'source_ip': u'127.0.0.1', u'db_ip': u'43.53.696.23'}
I don't know whether you want to convert your entire input string to a dict or not, because the output you gave confuses me.
Otherwise, my answer gives you output like the second highlighted text you want, in dict format:
a = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
c = a.replace("{", '').replace("}","").replace(" u'", '').replace("'", '').replace(" ", "").split(",")
d, j = {}, 0
for i in range(len(c)):
    if j +2 > len(c):
        break
    if c[j] == "user_name":
        #d[c[j]] = "uz,ifls" #uncomment this line to have a complete dict
        continue
    d[c[j]] = c[j+1]
    j += 2
Output:
print d
{'db_port': '3306', 'source_ip': '127.0.0.1', 'db_ip': '43.53.696.23'}
print type(d)
<type 'dict'>
If you want to have a complete dict of your string uncomment the line which is commented above, and the output will be:
print d
{'user_name': 'uz,ifls', 'db_port': '3306', 'source_ip': '127.0.0.1', 'db_ip': '43.53.696.23'}
print type(d)
<type 'dict'>
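If evaluating the string is not wanted at all, the key/value pairs can also be pulled out with one regular expression. This is a sketch that assumes every fragment has exactly the shape { u'key', u'value'} and that the quoted strings contain no escaped quotes; pair_re is a made-up name.

```python
import re

text = ("{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, "
        "{ u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} ")

# Two quoted strings inside braces; the optional u prefix is skipped.
pair_re = re.compile(r"\{\s*u?'([^']*)'\s*,\s*u?'([^']*)'\s*\}")

# findall returns (key, value) tuples, which dict() accepts directly.
result = dict(pair_re.findall(text))
print(result)
```

Because the comma is matched between the two quoted strings rather than split on, the value 'uz,ifls' stays in one piece.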

Unpack a string into an expanded string

I am given a string in the following format: "a{1;4:6}" and "a{1;2}b{2:4}" where the ; represents two different numbers, and a : represents a sequence of numbers. There can be any number of combinations of semicolons and colons within the brace.
I want to expand it such that these are the results of expanding the two examples above:
"a{1;4:6}" = "a1a4a5a6"
"a{1;2}b{2:4}" = "a1b2b3b4a2b2b3b4"
I've never had to deal with something like this before, since I am usually given strings in some sort of ready-made format which is easily parsable. In this case I have to parse the string manually.
My attempt is to split the string manually, over and over again, until you hit a case where there is either a colon or a semicolon, then start building the string from there. This is horribly inefficient, and I would appreciate any thoughts on this approach. Here is essentially what the code looks like (I omitted a lot of it, just to get the point across more quickly):
>>> s = "a{1;4:6}"
>>> splitted = s.split("}")
>>> splitted
['a{1;4:6', '']
>>> splitted2 = [s.split("{") for s in splitted]
>>> splitted2
[['a', '1;4:6'], ['']]
>>> splitted3 = [s.split(";") for s in splitted2[0]]
>>> splitted3
[['a'], ['1', '4:6']]
# ... etc, then build up the strings manually once the ranges are figured out.
The thinking behind splitting at the close brace at first is that it is guaranteed that a new identifier, with an associated range comes up after it. Where am I going wrong? My approach works for simple strings such as the first example, but it doesn't for the second example. Furthermore it is inefficient. I would be thankful for any input on this problem.
I tried pyparsing for that and IMHO it produced pretty readable code (I took pack_tokens from the previous answer).
from pyparsing import nums, Literal, Word, oneOf, Optional, OneOrMore, Group, delimitedList
from string import ascii_lowercase as letters
# transform a '123' to 123
number = Word(nums).setParseAction(lambda s, l, t: int(t[0]))
# parses 234:543 ranges
range_ = number + Literal(':').suppress() + number
# transforms the range x:y to a list [x, x+1, ..., y]
range_.setParseAction(lambda s, l, t: list(range(t[0], t[1]+1)))
# parse the comma delimited list of ranges or individual numbers
range_list = delimitedList(range_|number,",")
# and pack them in a tuple
range_list.setParseAction(lambda s, l, t: tuple(t))
# parses 'a{2,3,4:5}' group
group = Word(letters, max=1) + Literal('{').suppress() + range_list + Literal('}').suppress()
# transform the group parsed as ['a', [2, 4, 5]] to ['a2', 'a4' ...]
group.setParseAction(lambda s, l, t: tuple("%s%d" % (t[0],num) for num in t[1]))
# the full expression is just those group one after another
expression = OneOrMore(group)
def pack_tokens(s, l, tokens):
    current, *rest = tokens
    if not rest:
        return ''.join(current) # base case
    return ''.join(token + pack_tokens(s, l, rest) for token in current)
expression.setParseAction(pack_tokens)
parsed = expression.parseString('a{1,2,3}')[0]
print(parsed)
parsed = expression.parseString('a{1,3:7}b{1:5}')[0]
print(parsed)
import re
def expand(compressed):
    # 'b{2:4}' -> 'b{2;3;4}' i.e. reduce the problem to just one syntax
    normalized = re.sub(r'(\d+):(\d+)', lambda m: ';'.join(map(str, range(int(m.group(1)), int(m.group(2)) + 1))), compressed)
    # 'a{1;2}b{2;3;4}' -> ['a{1;2}', 'b{2;3;4}']
    elements = re.findall(r'[a-z]\{[\d;]+\}', normalized)
    tokens = []
    # ['a{1;2}', 'b{2;3;4}'] -> [['a1', 'a2'], ['b2', 'b3', 'b4']]
    for element in elements:
        match = re.match(r'([a-z])\{([\d;]+)\}', element)
        alphanumerics = [] # match result already guaranteed by re.findall()
        for number in match.group(2).split(';'):
            alphanumerics.append(match.group(1) + number)
        tokens.append(alphanumerics)
    # [['a1', 'a2'], ['b2', 'b3', 'b4']] -> 'a1b2b3b4a2b2b3b4'
    def pack_tokens(tokens):
        current, *rest = tokens
        if not rest:
            return ''.join(current) # base case
        return ''.join(token + pack_tokens(rest) for token in current)
    return pack_tokens(tokens)
strings = ['a{1;4:6}', 'a{1;2}b{2:4}', 'a{1;2}b{2:4}c{3;6}']
for string in strings:
    print(string, '->', expand(string))
OUTPUT
a{1;4:6} -> a1a4a5a6
a{1;2}b{2:4} -> a1b2b3b4a2b2b3b4
a{1;2}b{2:4}c{3;6} -> a1b2c3c6b3c3c6b4c3c6a2b2c3c6b3c3c6b4c3c6
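The recursive packing step can also be written as a loop that works from the last group backwards: each group's tokens are each followed by the already expanded text of everything after them. A sketch of that variant, assuming single-letter prefixes and well-formed input as in the examples (expand_fold is a made-up name):

```python
import re

def expand_fold(compressed):
    groups = []
    for prefix, body in re.findall(r"([a-z])\{([\d;:]+)\}", compressed):
        numbers = []
        for part in body.split(";"):
            if ":" in part:
                # 'first:last' is an inclusive range
                first, last = map(int, part.split(":"))
                numbers.extend(range(first, last + 1))
            else:
                numbers.append(int(part))
        groups.append([prefix + str(n) for n in numbers])
    # Fold from the right: suffix is the expansion of all later groups.
    suffix = ""
    for group in reversed(groups):
        suffix = "".join(token + suffix for token in group)
    return suffix

for s in ["a{1;4:6}", "a{1;2}b{2:4}"]:
    print(s, "->", expand_fold(s))
```

This avoids the recursion depth growing with the number of groups, at the cost of being a little less obviously tied to the nesting in the expected output.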
Just to demonstrate a technique for doing this using eval (as @ialcuaz asked in the comments). Again, I wouldn't recommend doing it this way; the other answers are more appropriate. This technique can be useful when the structure is more complex (i.e. recursive with brackets and so on) and you don't want a full-blown parser.
import re
import functools
class Group(object):
    def __init__(self, prefix, items):
        self.groups = [[prefix + str(x) for x in items]]
    def __add__(self, other):
        self.groups.extend(other.groups)
        return self
    def __repr__(self):
        return self.pack_tokens(self.groups)
    # adapted for Python 2.7 from @cdlane's code
    def pack_tokens(self, tokens):
        current = tokens[:1][0]
        rest = tokens[1:]
        if not rest:
            return ''.join(current)
        return ''.join(token + self.pack_tokens(rest) for token in current)
def createGroup(str, *items):
    return Group(str, items)
def expand(compressed):
    # Replace a{...}b{...} with a{...} + b{...} as we will overload the '+' operator to help during the evaluation
    expr = re.sub(r'(\}\w+\{)', lambda m: '} + ' + m.group(1)[1:-1] + '{', compressed)
    # Expand : range to explicit list of items (from @cdlane's answer)
    expr = re.sub(r'(\d+):(\d+)', lambda m: ';'.join(map(str, range(int(m.group(1)), int(m.group(2)) + 1))), expr)
    # Convert a{x;y;..} to a(x,y, ...) so that it evaluates as a function
    expr = expr.replace('{', '(').replace('}', ')').replace(";", ",")
    # Extract the group prefixes ('a', 'b', ...)
    groupPrefixes = re.findall(ur'(\w+)\([\d,]+\)', expr)
    # Build a namespace mapping functions 'a', 'b', ... to createGroup() capturing the groupName prefix in the closure
    ns = {prefix: functools.partial(createGroup, prefix) for prefix in groupPrefixes}
    # Evaluate the expression using the namespace
    return eval(expr, ns)
tests = ['a{1;4:6}', 'a{1;2}b{2:4}', 'a{1;2}b{2:4}c{3;6}']
for test in tests:
    print(test, '->', expand(test))
Produces:
('a{1;4:6}', '->', a1a4a5a6)
('a{1;2}b{2:4}', '->', a1b2b3b4a2b2b3b4)
('a{1;2}b{2:4}c{3;6}', '->', a1b2c3c6b3c3c6b4c3c6a2b2c3c6b3c3c6b4c3c6)

Python convert string to array assignment

In my application I am receiving a string 'abc[0]=123'.
I want to convert this string to an array of items. I have tried eval(), but it didn't work for me. I know the array name abc, but the number of items will be different each time.
I can split the string, get the array index, and do the assignment myself. But I would like to know if there is any direct way to convert this string into an array insert.
I would greatly appreciate any suggestions.
Are you looking for something like this?
In [36]: s = "abc[0]=123"
In [37]: vars()[s[:3]] = []
In [38]: vars()[s[:3]].append(eval(s[s.find('=') + 1:]))
In [39]: abc
Out[39]: [123]
But this is not a good way to create a variable.
Here's a function for parsing urls according to php rules (i.e. using square brackets to create arrays or nested structures):
import urlparse, re
def parse_qs_as_php(qs):
    def sint(x):
        try:
            return int(x)
        except ValueError:
            return x
    def nested(rest, base, val):
        curr, rest = base, re.findall(r'\[(.*?)\]', rest)
        while rest:
            curr = curr.setdefault(
                sint(rest.pop(0) or len(curr)),
                {} if rest else val)
        return base
    def dtol(d):
        if not hasattr(d, 'items'):
            return d
        if sorted(d) == range(len(d)):
            return [d[x] for x in range(len(d))]
        return {k:dtol(v) for k, v in d.items()}
    r = {}
    for key, val in urlparse.parse_qsl(qs):
        id, rest = re.match(r'^(\w+)(.*)$', key).groups()
        r[id] = nested(rest, r.get(id, {}), val) if rest else val
    return dtol(r)
Example:
qs = 'one=1&abc[0]=123&abc[1]=345&foo[bar][baz]=555'
print parse_qs_as_php(qs)
# {'abc': ['123', '345'], 'foo': {'bar': {'baz': '555'}}, 'one': '1'}
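The function above is written for Python 2 (urlparse module, print statement, comparing a sorted list against range). A rough Python 3 sketch of the same bracket convention follows. It is a rewrite rather than a line-by-line port: parse_qs_nested and its helpers are made-up names, and it does not handle a key that is used both with and without brackets.

```python
import re
from urllib.parse import parse_qsl

def parse_qs_nested(qs):
    def to_key(part, container):
        # '[]' appends; '[3]' is a list index; anything else is a dict key.
        if part == "":
            return len(container)
        return int(part) if part.isdecimal() else part

    def listify(node):
        # Turn dicts whose keys are exactly 0..n-1 into lists, recursively.
        if not isinstance(node, dict):
            return node
        converted = {k: listify(v) for k, v in node.items()}
        int_keys = all(isinstance(k, int) for k in converted)
        if converted and int_keys and sorted(converted) == list(range(len(converted))):
            return [converted[i] for i in range(len(converted))]
        return converted

    root = {}
    for key, value in parse_qsl(qs):
        match = re.match(r"^([^\[\]]+)((?:\[[^\]]*\])*)$", key)
        if match is None:
            root[key] = value          # not in name[sub]... form, keep as-is
            continue
        name, rest = match.groups()
        parts = re.findall(r"\[([^\]]*)\]", rest)
        if not parts:
            root[name] = value
            continue
        node = root.setdefault(name, {})
        for part in parts[:-1]:
            node = node.setdefault(to_key(part, node), {})
        node[to_key(parts[-1], node)] = value
    return listify(root)

print(parse_qs_nested("one=1&abc[0]=123&abc[1]=345&foo[bar][baz]=555"))
```

Values stay strings, as with parse_qsl itself; converting '123' to an integer is left to the caller.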
Your other application is doing it wrong. It should not be specifying index values in the parameter keys. The correct way to specify multiple values for a single key in a GET is to simply repeat the key:
http://my_url?abc=123&abc=456
The Python server side should correctly resolve this into a dictionary-like object: you don't say what framework you're running, but for instance Django uses a QueryDict which you can then access using request.GET.getlist('abc') which will return ['123', '456']. Other frameworks will be similar.
