I have looked at quite a few links, but most of the solutions give me errors like this one:
ValueError: Parse error: unable to parse:
'hover_data=["Confirmed","Deaths","Recovered"],
animation_frame="Date",color_continuous_scale="Portland",radius=7,
zoom=0,height=700"'
For example I want to convert the following string into a dict:
abc= 'fn=True, lat="Lat", lon="Long", hover_name="Country/Province/State",hover_data=["Confirmed","Deaths","Recovered"], animation_frame="Date",color_continuous_scale="Portland",radius=7, zoom=0,height=700"'
Expected output:
{'fn': True, "lat":"Lat",
"lon":"Long",
"hover_name":"Country/Province/State",
"hover_data":["Confirmed","Deaths","Recovered"],
"animation_frame":"Date",
"color_continuous_scale":"Portland",
"radius":7,
"zoom":0,
"height":700}
I tried to use this reference's code:
import re
keyval_re = re.compile(r'''
    \s*                                  # Leading whitespace is ok.
    (?P<key>\w+)\s*=\s*(                 # Search for a key followed by..
        (?P<str>"[^"]*"|\'[^\']*\')|     # a quoted string; or
        (?P<float>\d+\.\d+)|             # a float; or
        (?P<int>\d+)                     # an int.
    )\s*,?\s*                            # Handle comma & trailing whitespace.
    |(?P<garbage>.+)                     # Complain if we get anything else!
    ''', re.VERBOSE)
def handle_keyval(match):
    if match.group('garbage'):
        raise ValueError("Parse error: unable to parse: %r" %
                         match.group('garbage'))
    key = match.group('key')
    if match.group('str') is not None:
        return (key, match.group('str')[1:-1]) # strip quotes
    elif match.group('float') is not None:
        return (key, float(match.group('float')))
    elif match.group('int') is not None:
        return (key, int(match.group('int')))
    elif match.group('list') is not None:
        return (key, int(match.group('list')))
    elif match.group('bool') is not None:
        return (key, int(match.group('bool')))
print(dict(handle_keyval(m) for m in keyval_re.finditer(abc)))
There seems to be an unwanted double-quote character as the last character of your string abc.
If that is removed, the following solution will work nicely:
eval("dict(" + abc + ")")
Output:
{'fn': True,
'lat': 'Lat',
'lon': 'Long',
'hover_name': 'Country/Province/State',
'hover_data': ['Confirmed', 'Deaths', 'Recovered'],
'animation_frame': 'Date',
'color_continuous_scale': 'Portland',
'radius': 7,
'zoom': 0,
'height': 700}
⚠️ DON'T USE EVAL.
import re, ast
test_string = 'fn=True, lat="Lat", lon="Long", hover_name="Country/Province/State",hover_data=["Confirmed","Deaths","Recovered"], animation_frame="Date",color_continuous_scale="Portland",radius=7, zoom=0,height=700'
items = re.split(r', |,(?=\w)', test_string)
d = {
    key: ast.literal_eval(val)
    for item in items
    for key, val in [re.split(r'=|\s*=\s*', item)]
}
print(d)
I used a very simple method: split the string on the commas that separate the items, then build the dict with a plain dict comprehension. I also used ast.literal_eval() to convert each value string into the corresponding Python value (bool, str, list, int).
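One further option, sketched here under the assumption that the input is always valid Python keyword-argument syntax (name=literal pairs separated by commas): let the ast module do the splitting as well. The string is wrapped in a dict(...) call and parsed without being executed, and each keyword value is handed to ast.literal_eval, which rejects anything that is not a literal. The helper name kwargs_to_dict is made up for this example.

```python
import ast

def kwargs_to_dict(text):
    """Parse 'a=1, b="x", c=[1, 2]' into a dict without executing it."""
    # Parse only; nothing is evaluated at this point.
    call = ast.parse("dict(" + text + ")", mode="eval").body
    if not isinstance(call, ast.Call) or call.args:
        raise ValueError("expected only name=value pairs")
    result = {}
    for kw in call.keywords:
        if kw.arg is None:  # a **mapping argument has no name
            raise ValueError("** unpacking is not supported")
        # literal_eval accepts an AST node and raises on non-literals.
        result[kw.arg] = ast.literal_eval(kw.value)
    return result

abc = ('fn=True, lat="Lat", lon="Long", hover_name="Country/Province/State",'
       'hover_data=["Confirmed","Deaths","Recovered"], animation_frame="Date",'
       'color_continuous_scale="Portland",radius=7, zoom=0,height=700')
print(kwargs_to_dict(abc))
```

Unlike the split-based version, this does not care about commas or equals signs inside quoted values, because the Python parser handles the quoting.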
Related
How can I convert a str representation of the list, such as the below string into a dictionary?
a = '[100:0.345,123:0.34,145:0.86]'
Expected output :
{100:0.345,123:0.34,145:0.86}
I first tried to convert the string using ast.literal_eval, but it raises an error: invalid syntax
It's showing as invalid syntax because it has the wrong brackets, so you could do
ast.literal_eval(a.replace("[","{").replace("]", "}"))
Or alternatively parse the string yourself in a dictionary comprehension
{x.split(":")[0]: x.split(":")[1] for x in a[1:-1].split(",")}
and if as mentioned there are [ or ] elsewhere in your string the following may be more robust
ast.literal_eval("{" + a[1:-1] +"}")
I would simply try
eval(a.replace('[', '{').replace(']', '}'))
To convert to a dict:
Code:
data = '[100:0.345,123:0.34,145:0.86]'
new_data = dict(y.split(':') for y in (x.strip().strip('[').strip(']')
                                       for x in data.split(',')))
print(new_data)
Or if you need numbers not strings:
new_data = dict((map(float, y.split(':'))) for y in (
    x.strip().strip('[').strip(']') for x in data.split(',')))
Results:
{'100': '0.345', '123': '0.34', '145': '0.86'}
{145.0: 0.86, 123.0: 0.34, 100.0: 0.345}
Translate the brackets to braces, then use literal_eval.
from ast import literal_eval
rebracket = {91: 123, 93: 125}  # ord('[') -> ord('{'), ord(']') -> ord('}')
a = '[100:0.345,123:0.34,145:0.86]'
literal_eval(a.translate(rebracket))
Given a string representation of a dict:
a = '[100:0.345,123:0.34,145:0.86]'
Ignore the enclosing brackets [..], and break up the elements on commas:
a = a[1:-1].split(",")
For each element, separate the key and value:
d1 = [x.split(":") for x in a]
Reconstitute the parsed data as a dict:
d2 = { int(k[0]) : float(k[1]) for k in d1}
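Those steps can be put together in one function. This is a sketch that assumes every key is an integer, every value is a float, and neither contains ',' or ':'. The name parse_pairs is only illustrative.

```python
def parse_pairs(text):
    """Turn '[100:0.345,123:0.34]' into {100: 0.345, 123: 0.34}."""
    inner = text.strip()[1:-1]          # drop the surrounding [ and ]
    result = {}
    for item in inner.split(","):
        if not item.strip():
            continue                    # tolerate '[]' or a trailing comma
        key, value = item.split(":", 1)
        result[int(key)] = float(value)
    return result

print(parse_pairs('[100:0.345,123:0.34,145:0.86]'))
```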
I have a dictionary containing the following key-value pairs: d={'Alice':'x','Bob':'y','Chloe':'z'}
I want to replace the lower case variables(values) by the constants(keys) in any given string.
For example, if my string is:
A(x)B(y)C(x,z)
how do I replace the characters in order to get a resultant string of :
A(Alice)B(Bob)C(Alice,Chloe)
Should I use regular expressions?
re.sub() solution with replacement function:
import re
d = {'Alice':'x','Bob':'y','Chloe':'z'}
flipped = dict(zip(d.values(), d.keys()))
s = 'A(x)B(y)C(x,z)'
result = re.sub(r'\([^()]+\)', lambda m: '({})'.format(','.join(flipped.get(k,'')
for k in m.group().strip('()').split(','))), s)
print(result)
The output:
A(Alice)B(Bob)C(Alice,Chloe)
Extended version:
import re
def repl(m):
    val = m.group().strip('()')
    d = {'Alice':'x','Bob':'y','Chloe':'z'}
    flipped = dict(zip(d.values(), d.keys()))
    if ',' in val:
        return '({})'.format(','.join(flipped.get(k,'') for k in val.split(',')))
    else:
        return '({})'.format(flipped.get(val,''))
s = 'A(x)B(y)C(x,z)'
result = re.sub(r'\([^()]+\)', repl, s)
print(result)
Bonus approach for particular input case A(x)B(y)C(Alice,z):
...
s = 'A(x)B(y)C(Alice,z)'
result = re.sub(r'\([^()]+\)', lambda m: '({})'.format(','.join(flipped.get(k,'') or k
for k in m.group().strip('()').split(','))), s)
print(result)
I assume you want to replace the values in a string with the respective keys of the dictionary. If my assumption is correct you can try this without using regex.
First, swap the keys and values using a dictionary comprehension.
my_dict = {'Alice':'x','Bob':'y','Chloe':'z'}
my_dict = {y: x for x, y in my_dict.items()}
Then, using a list comprehension, replace the values. This works character by character, so it assumes every value is a single character.
str_ = 'A(x)B(y)C(x,z)'
output = ''.join([i if i not in my_dict.keys() else my_dict[i] for i in str_])
Hope this is what you need ;)
Code
import re
d={'Alice':'x','Bob':'y','Chloe':'z'}
keys = list(d.keys())
values = list(d.values())
s = "A(x)B(y)C(x,z)"
for i in range(len(keys)):
    rx = re.escape(values[i])
    s = re.sub(rx, keys[i], s)
print(s)
Output
A(Alice)B(Bob)C(Alice,Chloe)
You could also use the str.replace method, like this:
d={'x':'Alice','y':'Bob','z':'Chloe'}
s = "A(x)B(y)C(x,z)"
for key in d:
    s = s.replace(key, d[key])
print(s)
But yes, you need to swap your dictionary's keys and values first, as Kishore suggested.
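Replacing one key at a time can go wrong when a replacement itself contains a later key, or when a key also occurs inside a longer name. A single-pass re.sub avoids both, roughly as follows. This is a sketch that assumes the placeholders are whole words (delimited by \b) and that the mapping already goes from placeholder to name; substitute_names is a hypothetical helper name.

```python
import re

def substitute_names(text, mapping):
    """Replace each whole-word key of `mapping` in one pass over `text`."""
    if not mapping:
        return text
    # Longest keys first so that 'xy' is preferred over 'x' if both exist.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group()], text)

d = {'Alice': 'x', 'Bob': 'y', 'Chloe': 'z'}
flipped = {v: k for k, v in d.items()}
print(substitute_names('A(x)B(y)C(x,z)', flipped))
# A(Alice)B(Bob)C(Alice,Chloe)
```

Because the text is scanned only once, an inserted name such as Alice is never examined again for further placeholders.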
This is the way that I would do it:
import re
def sub_args(text, tosub):
    ops = '|'.join(tosub.keys())
    for argstr, _ in re.findall(r'(\(([%s]+?,?)+\))' % ops, text):
        args = argstr[1:-1].split(',')
        args = [tosub[a] for a in args]
        subbed = '(%s)' % ','.join(map(str, args))
        text = re.sub(re.escape(argstr), subbed, text)
    return text
text = 'A(x)B(y)C(x,z)'
tosub = {
    'x': 'Alice',
    'y': 'Bob',
    'z': 'Chloe'
}
print(sub_args(text, tosub))
Basically you just use the regex pattern to find all of the argument groups and substitute in the proper values--the nice thing about this approach is that you don't have to worry about subbing where you don't want to (for example, if you had a string like 'Fn(F,n)'). You can also have multi-character keys, like 'F(arg1,arg2)'.
I have two types of strings that look something like the ones below:
string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'
I am trying to write a function that takes either of the above strings as input and returns the corresponding dictionary.
For string1, the dictionary should look like
attributes = {
    'ID': 'mRNA42',
    'Parent': 'gene19',
    'integrity': '0.95',
    'foo': 'bar',
}
and for string2
attributes = {
    'transcript_id': 'g3.t1',
    'gene_id': 'g3',
}
My try:
def parse_single_feature_line(attributestring):
    attributes = dict()
    for keyvaluepair in attributestring.split(';'):
        for key, value in keyvaluepair.split('='):
            attributes[key] = value
    return attributes
I need help to build the function.
You can have a global solution with regular expressions:
import re
string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'
# Define the regular expression
reg_exp = r"([\.\-\w_]+)=([\.\-\w_]+);?|([\.\-\w_]+) \"([\.\-\w_]+)\""
# Get results and filter empty elements in tuples
match = [tuple(filter(None, x)) for x in re.findall(reg_exp, string1+"\n"+string2)]
# Convert to dict
result = {key:value for key, value in match}
This regular expression contains two main groups:
Group A ([\.\-\w_]+)=([\.\-\w_]+);? and group B ([\.\-\w_]+) \"([\.\-\w_]+)\"
Each group contains two further groups, which match the name and the value of a pair. Note that you may need to adjust the character classes to the names and values you expect, or use (.*?) instead.
You can use dict comprehension!
>>> string1
'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
>>> string2
'transcript_id "g3.t1"; gene_id "g3";'
>>> {each.split('=')[0]:each.split('=')[1] for each in string1.split(';') if each}
{'foo': 'bar', 'integrity': '0.95', 'ID': 'mRNA42', 'Parent': 'gene19'}
>>> {each.split(' ')[0]:each.split(' ')[1] for each in string2.split(';') if each}
{'': 'gene_id', 'transcript_id': '"g3.t1"'}
And to solve the problem you are facing,
def parse_single_feature_line(attributestring):
    attributes = dict()
    for keyvaluepair in attributestring.split(';'):
        # split() returns a flat list such as ['ID', 'mRNA42'], not a list of
        # lists, so unpack it directly. ("for key, value in list_of_lists:"
        # only works on something like [["this", "these"], ["that", "those"]].)
        key, value = keyvaluepair.split('=')
        attributes[key] = value
    return attributes
print(parse_single_feature_line(string1))
Try this
string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'
def str2dict(s):
    result={}
    for i in s.split(";"):
        ele=i.strip()
        if not ele:continue
        if "=" in i:
            key,val=ele.split("=")
        else:
            key,val=ele.split()
        result[key]=val.strip('"')
    return result
str2dict(string1)
str2dict(string2)
The two formats are different, so they need to be handled differently.
def return_dict(string):
    if "=" in string:
        return dict(i.strip().split("=") for i in string.split(";"))
    else:
        return dict([i.strip().split(" ") for i in string.split(";") if len(i.strip().split(" ")) > 1])
return_dict(string1)
return_dict(string2)
gives:
{'ID': 'mRNA42', 'Parent': 'gene19', 'foo': 'bar', 'integrity': '0.95'}
{'gene_id': '"g3"', 'transcript_id': '"g3.t1"'}
First solution: split on the space and strip the quotes on the second half of the result:
>>> key, val = 'transcript_id "g3.t1"'.split(" ", maxsplit=1)
>>> val = val.strip('"')
>>> key
'transcript_id'
>>> val
'g3.t1'
Second solution (more generic): use a regexp to capture the parts:
>>> import re
>>> match = re.search(r'([a-z_]+) "(.+?)"', 'transcript_id "g3.t1"')
>>> key, val = match.groups()
>>> key
'transcript_id'
>>> val
'g3.t1'
If you know beforehand which of your two formats you have in a given string or file, you can pass a callback to do the substring parsing, i.e.:
def parse_line(attributestring, itemparse):
    attributes = dict()
    for keyvaluepair in attributestring.split(';'):
        keyvaluepair = keyvaluepair.strip()
        if not keyvaluepair:
            # empty string due to a trailing ";"
            continue
        # itemparse returns one (key, value) pair per item
        key, value = itemparse(keyvaluepair)
        attributes[key] = value
    return attributes
def parse_eq(kvstring):
    return kvstring.split("=", 1)
def parse_space(kvstring):
    key, val = kvstring.split(" ", 1)
    return key, val.strip('"')
d1 = parse_line(string1, parse_eq)
d2 = parse_line(string2, parse_space)
A simplified version. You can add more delimiters to the character class in the regex if you need to split on other separators. Note that the quotes around the values of string2 are kept.
string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'
import re
def parse_single_feature_line(string):
    attributes = dict(re.split('[ =]', i.strip()) for i in string.split(';') if i)
    return attributes
print(parse_single_feature_line(string1))
print(parse_single_feature_line(string2))
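If both formats can show up in the same input, the separator can be chosen per item rather than per string. The sketch below assumes that items are separated by ';', that a key never contains '=' or whitespace, that there are no spaces around '=', and that surrounding double quotes are not part of the value. The name parse_attributes is illustrative.

```python
def parse_attributes(text):
    """Parse 'k=v;k2=v2' or 'k "v"; k2 "v2";' into a dict of strings."""
    attributes = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece left by a trailing ';'
        # Look at the first whitespace-delimited token only, so that an '='
        # inside a quoted value does not select the wrong separator.
        if "=" in item.split(None, 1)[0]:
            key, value = item.split("=", 1)
        else:
            key, value = item.split(None, 1)
        attributes[key.strip()] = value.strip().strip('"')
    return attributes

string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'
print(parse_attributes(string1))
print(parse_attributes(string2))
```

An item that has neither an '=' nor a space raises ValueError at the unpacking step, which is probably what you want for malformed input.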
str = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
how do I convert this string to dict?
"source_ip":"127.0.0.1","db_ip":"43.53.696.23","db_port":"3306"
I have tried
str = dict(str)
but it didn't work
Those fragments look like python sets. If you run them through ast.literal_eval you get something close, but since sets are not ordered, you can't guarantee which of the two items is the key and which is the value. This is a total hack, but I replaced the curly braces with parens so they look more tuple-like and made the dictionary from there.
>>> mystr = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
>>> mystr = mystr.replace('{', '(').replace('}', ')')
>>> import ast
>>> mydict = dict(ast.literal_eval(mystr))
>>> mydict
{u'user_name': u'uz,ifls', u'db_port': u'3306', u'source_ip': u'127.0.0.1', u'db_ip': u'43.53.696.23'}
>>>
A few points:
The top-level data structure is actually a tuple (because in Python, 1, 2, 3 is the same as (1, 2, 3)).
As others have pointed out, the inner data structures are set literals, which are not ordered.
Set literals are implemented in Python 2.6 but not in its ast.literal_eval function, which is arguably a bug.
As it turns out, you can make your own custom literal_eval function and make it do what you want.
from _ast import *
from ast import *
# This is mostly copied from `ast.py` in your Python source.
def literal_eval(node_or_string):
    """
    Safely evaluate an expression node or a string containing a Python
    expression. The string or node provided may only consist of the following
    Python literal structures: strings, bytes, numbers, tuples, lists, dicts,
    sets, booleans, and None.
    """
    if isinstance(node_or_string, str):
        node_or_string = parse(node_or_string, mode='eval')
    if isinstance(node_or_string, Expression):
        node_or_string = node_or_string.body
    def _convert(node):
        if isinstance(node, (Str)):
            return node.s
        elif isinstance(node, Tuple):
            return tuple(map(_convert, node.elts))
        elif isinstance(node, Set):
            # ** This is the interesting change.. when
            # we see a set literal, we return a tuple.
            return tuple(map(_convert, node.elts))
        elif isinstance(node, Dict):
            return dict((_convert(k), _convert(v)) for k, v
                        in zip(node.keys, node.values))
        raise ValueError('malformed node or string: ' + repr(node))
    return _convert(node_or_string)
Then we can do:
>>> s = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
>>> dict(literal_eval(s))
{u'user_name': u'uz,ifls', u'db_port': u'3306', u'source_ip': u'127.0.0.1', u'db_ip': u'43.53.696.23'}
I don't know whether you want to convert your entire input string to a dict, because the expected output you gave leaves out user_name.
Either way, my answer first produces the output shown in your question, in dict format:
a = "{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, { u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} "
c = a.replace("{", '').replace("}","").replace(" u'", '').replace("'", '').replace(" ", "").split(",")
d, j = {}, 0
for i in range(len(c)):
    if j +2 > len(c):
        break
    if c[j] == "user_name":
        #d[c[j]] = "uz,ifls" #uncomment this line to have a complete dict
        continue
    d[c[j]] = c[j+1]
    j += 2
Output:
print d
{'db_port': '3306', 'source_ip': '127.0.0.1', 'db_ip': '43.53.696.23'}
print type(d)
<type 'dict'>
If you want to have a complete dict of your string uncomment the line which is commented above, and the output will be:
print d
{'user_name': 'uz,ifls', 'db_port': '3306', 'source_ip': '127.0.0.1', 'db_ip': '43.53.696.23'}
print type(d)
<type 'dict'>
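Splitting on every comma is what forces the special case for user_name, since its value 'uz,ifls' contains one. A regular expression that matches each quoted pair as a unit sidesteps that. This is only a sketch: it assumes every fragment has exactly the shape { u'key', u'value'}, with single-quoted strings and no escaped quotes inside them.

```python
import re

s = ("{ u'source_ip', u'127.0.0.1'}, { u'db_ip', u'43.53.696.23'}, "
     "{ u'db_port', u'3306'}, { u'user_name', u'uz,ifls'} ")

# One match per {...} fragment; the two groups are the key and the value.
pair_re = re.compile(r"\{\s*u?'([^']*)'\s*,\s*u?'([^']*)'\s*\}")
d = dict(pair_re.findall(s))
print(d)
```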
I am trying to implement a simple Twisted HTTP server that responds to requests by loading tiles from a database and returning them. However, I find the way it interprets request strings quite odd.
This is what I POST to the server:
curl -d "request=loadTiles&grid[0][x]=17&grid[0][y]=185&grid[1][x]=18&grid[1][y]=184" http://localhost:8080/fetch/
What I expect the request.args to be:
{'request': 'loadTiles', 'grid': [{'x': 17, 'y': 185}, {'x': 18, 'y': 184}]}
How Twisted interprets request.args:
{'grid[1][y]': ['184'], 'grid[0][y]': ['185'], 'grid[1][x]': ['18'], 'request': ['loadTiles'], 'grid[0][x]': ['17']}
Is it possible to have it automatically parse the request string and create a list for the grid parameter or do I have to do it manually?
I could JSON-encode the grid parameter and then decode it server-side, but that seems like an unnecessary hack.
I don't know why you would expect your urlencoded data to be decoded according to some ad-hoc non-standard rules, or why you would consider the standard treatment "odd"; [ isn't special in query strings. What software decodes them this way?
In any event, this isn't really Twisted, but Python (and more generally speaking, the web-standard way of parsing this data). You can see the sort of data you'll get back via the cgi.parse_qs function interactively. For example:
>>> import cgi
>>> cgi.parse_qs("")
{}
>>> cgi.parse_qs("x=1")
{'x': ['1']}
>>> cgi.parse_qs("x[something]=1")
{'x[something]': ['1']}
>>> cgi.parse_qs("x=1&y=2")
{'y': ['2'], 'x': ['1']}
>>> cgi.parse_qs("x=1&y=2&x=3")
{'y': ['2'], 'x': ['1', '3']}
I hope that clears things up for you.
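On Python 3 the same parser lives in urllib.parse (cgi.parse_qs is no longer available in recent versions), and it treats the brackets the same way: as ordinary characters in the parameter name. A quick check with the request body from the question:

```python
from urllib.parse import parse_qs

body = ("request=loadTiles&grid[0][x]=17&grid[0][y]=185"
        "&grid[1][x]=18&grid[1][y]=184")
args = parse_qs(body)

print(args["request"])       # ['loadTiles']
print(args["grid[0][x]"])    # ['17']
print(sorted(args))          # the bracketed names are plain keys
```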
Maybe instead of a parser, how about something to post-process the request.args you are getting?
from pyparsing import Suppress, alphas, alphanums, nums, Word
from itertools import groupby
# you could do this with regular expressions too, if you prefer
LBRACK,RBRACK = map(Suppress, '[]')
ident = Word('_' + alphas, '_' + alphanums)
integer = Word(nums).setParseAction(lambda t : int(t[0]))
subscriptedRef = ident + 2*(LBRACK + (ident | integer) + RBRACK)
def simplify_value(v):
    if isinstance(v,list) and len(v)==1:
        return simplify_value(v[0])
    if v == integer:
        return int(v)
    return v
def regroup_args(dd):
    ret = {}
    subscripts = []
    for k,v in dd.items():
        # this is a pyparsing short-cut to see if a string matches a pattern
        # I also used it above in simplify_value to test for integerness of a string
        if k == subscriptedRef:
            subscripts.append(tuple(subscriptedRef.parseString(k))+
                              (simplify_value(v),))
        else:
            ret[k] = simplify_value(v)
    # sort all the matched subscripted args, and then use groupby to
    # group by name and list index
    # this assumes all indexes 0-n are present in the parsed arguments
    subscripts.sort()
    for name,nameitems in groupby(subscripts, key=lambda x:x[0]):
        ret[name] = []
        for idx,idxitems in groupby(nameitems, key=lambda x:x[1]):
            idd = {}
            for item in idxitems:
                name, i, attr, val = item
                idd[attr] = val
            ret[name].append(idd)
    return ret
request_args = {'grid[1][y]': ['184'], 'grid[0][y]': ['185'], 'grid[1][x]': ['18'], 'request': ['loadTiles'], 'grid[0][x]': ['17']}
print regroup_args(request_args)
prints
{'grid': [{'y': 185, 'x': 17}, {'y': 184, 'x': 18}], 'request': 'loadTiles'}
Note that this also simplifies the single-element lists to just the 0'th element value, and converts the numeric strings to actual integers.
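For comparison, the same post-processing can be done with the standard re module instead of pyparsing. The sketch below is not the code above rewritten line by line. It assumes the keys are str (as in the question's output), that subscripted names always have exactly the shape name[index][attr], and that a value made only of digits should become an int. The names SUBSCRIPTED and to_scalar are made up for the example.

```python
import re
from collections import defaultdict

# name[index][attr], e.g. grid[0][x]
SUBSCRIPTED = re.compile(r'^(\w+)\[(\d+)\]\[(\w+)\]$')

def to_scalar(values):
    """Unwrap a single-element list and convert digit strings to int."""
    value = values[0] if isinstance(values, list) and len(values) == 1 else values
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return value

def regroup_args(args):
    result = {}
    grouped = defaultdict(dict)  # name -> {index: {attr: value}}
    for key, values in args.items():
        m = SUBSCRIPTED.match(key)
        if m:
            name, index, attr = m.groups()
            grouped[name].setdefault(int(index), {})[attr] = to_scalar(values)
        else:
            result[key] = to_scalar(values)
    # Emit each group as a list ordered by its numeric index.
    for name, by_index in grouped.items():
        result[name] = [by_index[i] for i in sorted(by_index)]
    return result

request_args = {'grid[1][y]': ['184'], 'grid[0][y]': ['185'],
                'grid[1][x]': ['18'], 'request': ['loadTiles'],
                'grid[0][x]': ['17']}
print(regroup_args(request_args))
```

Missing indexes are simply skipped here, so the position in the resulting list matches the original index only when all indexes 0..n are present.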