How to convert a string dictionary to dictionary in python? - python

I have a dictionary in the form of a string:
dict_in_str = '{"name":"wrong_answers","y":8,"color": colorForSections["wrong_answers"]}'
I have tried both json.loads(dict_in_str) and ast.literal_eval(dict_in_str).
They fail with
json.decoder.JSONDecodeError: Expecting value: line 1 column 40 (char 39)
and
ValueError: malformed node or string: <_ast.Subscript object at 0x02082B20>
respectively. According to the forums and tutorials I have read, these methods should work.

As mentioned by deceze and matszwejca, the issue is the value in your last part, "color": colorForSections["wrong_answers"]. Because it sits inside a string, the expression colorForSections["wrong_answers"] is never evaluated, so loads() sees text that is not a valid JSON value and can't convert it to any known type (I'm assuming that's what's going on, egg on my face if not). Try passing in the key string ("wrong_answers") and then retrieving the value from colorForSections after you run loads() on the string. Something like:
dict_in_str = '{"name":"wrong_answers","y":8,"color": "wrong_answers"}'
x = json.loads(dict_in_str)
color = colorForSections[x['color']]
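For completeness, here is a minimal runnable sketch of the same idea. The contents of colorForSections are made up for illustration, since the question never shows the real mapping:

```python
import json

# Hypothetical lookup table; the real colorForSections is not shown in the question.
colorForSections = {"wrong_answers": "#ff0000", "right_answers": "#00ff00"}

# Store only the key in the JSON text, not a Python expression.
dict_in_str = '{"name": "wrong_answers", "y": 8, "color": "wrong_answers"}'

record = json.loads(dict_in_str)                      # plain JSON parses cleanly
record["color"] = colorForSections[record["color"]]   # resolve the key afterwards

print(record)
```

The lookup happens in ordinary Python code after parsing, so neither json.loads nor ast.literal_eval ever has to evaluate an expression.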

You can strip the quotes and braces, split into a list, and then convert into a dict.
dict_in_str = '{"name":"wrong_answers","y":8,"color": colorForSections["wrong_answers"]}'
data_lst = dict_in_str.replace("'", '').replace('"', '').replace('{', '').replace('}', '').replace(':', ',').split(',')
lst = [i.strip() for i in data_lst if i.strip()]
result = {lst[i]: lst[i + 1] for i in range(0, len(lst), 2)}
print(result)
print(type(result))
>>> {'name': 'wrong_answers', 'y': '8', 'color': 'colorForSections[wrong_answers]'}
<class 'dict'>

Related

TypeError: list indices must be integers or slices, not re.Pattern

I am trying to extract data from text using regular expressions. I want to loop through the regular expression 'options' and then write the outcome to a specific list.
I think that I may not be writing my loop or referencing the lists correctly. I get an error on line 27 stating: TypeError: list indices must be integers or slices, not re.Pattern. I have tried to put the regexlist into range(), but I then get this error, this time on line 18: TypeError: 'list' object cannot be interpreted as an integer. I'm not sure how to get around this.
Please see my code below:
import re
regexcode0 = re.compile(r'Test 0')
regexcode1 = re.compile(r'Test 1')
regexcode2 = re.compile(r'Test 2')
results_Test0 = []
results_Test1 = []
results_Test2 = []
allResults = [results_Test0, results_Test1, results_Test2]
regexlist = [regexcode0, regexcode1, regexcode2]
textBody = 'Hi there, Test 2 was a failure'
def text_extract(text):
    for i in regexlist:
        match = re.search(i, text)
        if match:
            matchObj = match.group()
            allResults[i].append(matchObj)
        if not match:
            allResults[i].append('No Solution')
    return allResults
print(text_extract(textBody))
I want the results to look like this:
results_Test0 = ['No Solution']
results_Test1 = ['No Solution']
results_Test2 = ['Test 2']
allResults is a list, and so is each results_Testn inside it. The syntax list[index] requires an integer to indicate a specific position in the list. You are attempting to use i as an index, but i is a compiled re pattern, not an integer, which results in the error. If you want an integer for each iteration of the for loop, you can use the enumerate function:
for index, i in enumerate(regexlist):
    #do something
In this example, i represents the re pattern and index is the number representing its position in the list, so you can use allResults[index] to save the result.
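Put together, a corrected version might look like the sketch below. The names patterns and all_results are my own; the result lists are built inside the function rather than kept as module-level variables:

```python
import re

patterns = [re.compile(r'Test 0'), re.compile(r'Test 1'), re.compile(r'Test 2')]

def text_extract(text):
    # One result list per pattern, in the same order as `patterns`.
    all_results = [[] for _ in patterns]
    for index, pattern in enumerate(patterns):
        match = pattern.search(text)
        all_results[index].append(match.group() if match else 'No Solution')
    return all_results

print(text_extract('Hi there, Test 2 was a failure'))
# [['No Solution'], ['No Solution'], ['Test 2']]
```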

x.split has no effect

For some reason x.split(':', 1)[-1] doesn't do anything. Could someone explain and maybe help me?
I'm trying to remove the data before : (including ":") but it keeps that data anyway
Code
data = { 'state': 1, 'endTime': 1518852709307, 'fileSize': 000000 }
data = data.strip('{}')
data = data.split(',')
for x in data:
    x.split(':', 1)[-1]
    print(x)
Output
"state":1
"endTime":1518852709307
"fileSize":16777216
It's a dictionary, not a list of strings.
I think this is what you're looking for:
data = str({"state":1,"endTime":1518852709307,"fileSize":000000}) #add a str() here
data = data.strip('{}')
data = data.split(',')
for x in data:
    x=x.split(':')[-1] # set x to x.split(...)
    print(x)
The script above prints out:
1
1518852709307
0
Here is a one-liner version:
print (list(map(lambda x:x[1],data.items())))
Prints out:
[1, 1518852709307, 0]
Which is a list of integers.
Seems like you just want the values in the dictionary
data = {"state":1,"endTime":1518852709307,"fileSize":000000}
for x in data:
    print(data[x])
I'm not sure, but I think it's because the computer treats "state" and 1 as separate objects. Therefore, it is merely stripping the string "state" of its colons, of which there are none.
You could make the entire dictionary into a string by putting:
data = str({ Your Dictionary Here })
then split that string on the commas and print what is left of each piece, like so:
for x in data.strip('{}').split(','):
    b = x.split(':', 1)[-1] # creating a new string
    print(b)
data in your code is a dictionary, so you can access its values directly, e.g. data['state'], which evaluates to 1.
If you get this data as a string like:
data = '{"state":1, "endTime":1518852709307, "fileSize":0}'
you could use json.loads to convert it into a dictionary and access the data as explained above. Note that JSON requires double quotes around keys and does not allow numbers with leading zeros such as 000000.
import json
data = '{"state":1, "endTime":1518852709307, "fileSize":0}'
data = json.loads(data)
for _,v in data.items():
    print(v)
If you want to parse the string yourself this should work:
data = '{"state":1,"endTime":1518852709307,"fileSize":000000}'
data = data.strip('{}')
data = data.split(',')
for x in data:
    x=x.split(':')[-1]
    print(x)
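The root of the original confusion is that strings are immutable: split returns a new list and leaves the string it was called on untouched. A small sketch, using one made-up line of input:

```python
line = '"endTime":1518852709307'

line.split(':', 1)               # the result is computed and then thrown away
unchanged = line                 # line itself has not changed

value = line.split(':', 1)[-1]   # keep the part after the first colon
print(unchanged)
print(value)
```

So the result has to be assigned to something (or printed directly) before it can be used.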

Could you explain this?

Quick question: in Python 3, if I have the following code
def file2dict(filename):
    dictionary = {}
    data = open(filename, 'r')
    for line in data:
        [ key, value ] = line.split(',')
        dictionary[key] = value
    data.close()
    return dictionary
It means that file MUST contain exactly 2 strings(or numbers, or whatever) on every line in the file because of this line:
[ key, value ] = line.split(',')
So, if in my file I have something like this
John,45,65
Jack,56,442
The function throws an exception.
The question: why are key, value in square brackets? Why, for example, does
adr, port = s.accept()
does not use square brackets?
And how to modify this code if I want to attach 2 values to every key in a dictionary? Thank you.
The [ and ] around key, value aren't getting you anything.
The error that you're getting, ValueError: too many values to unpack is because you are splitting text like John,45,65 by the commas. Do "John,45,65".split(',') in a shell. You get
>>> "John,45,65".split(',')
['John', '45', '65']
Your code is trying to assign 3 values, "John", 45, and 65, to two variables, key and value, thus the error.
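To see both points in one place, here is a short sketch (variable names are mine). It shows that the bracketed and unbracketed forms behave identically, reproduces the error, and uses a starred target, which is one way to absorb extra fields in Python 3:

```python
fields = "John,45,65".split(',')

# Brackets or no brackets: both forms unpack the same way.
[name1, rest1] = "John,45".split(',')
name2, rest2 = "John,45".split(',')

# Unpacking three items into two names raises ValueError.
try:
    key, value = fields
except ValueError as exc:
    error_message = str(exc)

# A starred target collects any remaining fields into a list.
key, *values = fields
print(key, values)
```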
There are a few options:
1) str.split has an optional maxsplit parameter:
>>> "John,45,65".split(',', 1)
['John', '45,65']
if "45,65" is the value you want to set for that key in the dictionary.
2) Cut the extra value.
If the 65 isn't what you want, then you can do something either like
>>> name, age, unwanted = "John,45,65".split(',')
>>> name, age, unwanted
('John', '45', '65')
>>> dictionary[name] = age
>>> dictionary
{'John': '45'}
and just not use the unwanted variable, or split into a list and don't use the last element:
>>> data = "John,45,65".split(',')
>>> dictionary[data[0]] = data[1]
>>> dictionary
{'John': '45'}
You can use three variables instead of two, with the first one as the key and the other two stored as a list:
def file2dict(filename):
    dictionary = {}
    data = open(filename, 'r')
    for line in data:
        key, value1,value2 = line.split(',')
        dictionary[key] = [int(value1), int(value2)]
    data.close()
    return dictionary
When doing a line split to a dictionary, consider limiting the number of splits by specifying maxsplit, and checking to make sure that the line contains a comma:
def file2dict(filename):
    data = open(filename, 'r')
    dictionary = dict(item.split(",",1) for item in data if "," in item)
    data.close()
    return dictionary
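Combining the suggestions above, a version that attaches every remaining field to the key could be sketched as follows. The helper lines_to_dict is my own name; it is split out so the logic can be tried on a plain list without a file:

```python
def lines_to_dict(lines):
    """Map the first field of each line to a list of the remaining fields."""
    result = {}
    for line in lines:
        line = line.strip()
        if ',' not in line:
            continue  # skip blank or malformed lines
        key, *values = line.split(',')
        result[key] = values
    return result

def file2dict(filename):
    # The with-block closes the file even if parsing raises.
    with open(filename, 'r') as handle:
        return lines_to_dict(handle)

sample = ["John,45,65\n", "Jack,56,442\n", "\n"]
print(lines_to_dict(sample))
# {'John': ['45', '65'], 'Jack': ['56', '442']}
```

The values stay strings here; wrap them in int() if the file is known to contain only numbers after the key.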

Pyparsing: Parsing semi-JSON nested plaintext data to a list

I have a bunch of nested data in a format that loosely resembles JSON:
company="My Company"
phone="555-5555"
people=
{
    person=
    {
        name="Bob"
        location="Seattle"
        settings=
        {
            size=1
            color="red"
        }
    }
    person=
    {
        name="Joe"
        location="Seattle"
        settings=
        {
            size=2
            color="blue"
        }
    }
}
places=
{
    ...
}
There are many different parameters with varying levels of depth--this is just a very small subset.
It also might be worth noting that when a new sub-array is created that there is always an equals sign followed by a line break followed by the open bracket (as seen above).
Is there any simple looping or recursion technique for converting this data to a system-friendly data format such as arrays or JSON? I want to avoid hard-coding the names of properties. I am looking for something that will work in Python, Java, or PHP. Pseudo-code is fine, too.
I appreciate any help.
EDIT: I discovered the Pyparsing library for Python and it looks like it could be a big help. I can't find any examples for how to use Pyparsing to parse nested structures of unknown depth. Can anyone shed light on Pyparsing in terms of the data I described above?
EDIT 2: Okay, here is a working solution in Pyparsing:
def parse_file(fileName):
    #get the input text file
    file = open(fileName, "r")
    inputText = file.read()
    #define the elements of our data pattern
    name = Word(alphas, alphanums+"_")
    EQ,LBRACE,RBRACE = map(Suppress, "={}")
    value = Forward() #this tells pyparsing that values can be recursive
    entry = Group(name + EQ + value) #this is the basic name-value pair
    #define data types that might be in the values
    real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
    integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
    quotedString.setParseAction(removeQuotes)
    #declare the overall structure of a nested data element
    struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE) #we will turn the output into a Dictionary
    #declare the types that might be contained in our data value - string, real, int, or the struct we declared
    value << (quotedString | struct | real | integer)
    #parse our input text and return it as a Dictionary
    result = Dict(OneOrMore(entry)).parseString(inputText)
    return result.dump()
This works, but when I try to write the results to a file with json.dump(result), the contents of the file are wrapped in double quotes. Also, there are \n characters between many of the data pairs. I tried suppressing them in the code above with LineEnd().suppress(), but I must not be using it correctly.
Parsing an arbitrarily nested structure can be done with pyparsing by defining a placeholder to hold the nested part, using the Forward class. In this case, you are just parsing simple name-value pairs, where the value could itself be a nested structure containing name-value pairs.
name :: word of alphanumeric characters
entry :: name '=' value
struct :: '{' entry* '}'
value :: real | integer | quotedstring | struct
This translates to pyparsing almost verbatim. To define value, which can recursively contain values, we first create a Forward() placeholder, which can be used as part of the definition of entry. Then once we have defined all the possible types of values, we use the '<<' operator to insert this definition into the value expression:
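from pyparsing import (Word, alphas, alphanums, Suppress, Forward, Group, Regex, quotedString, removeQuotes, ZeroOrMore, OneOrMore, Dict)
EQ,LBRACE,RBRACE = map(Suppress,"={}")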
name = Word(alphas, alphanums+"_")
value = Forward()
entry = Group(name + EQ + value)
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
quotedString.setParseAction(removeQuotes)
struct = Group(LBRACE + ZeroOrMore(entry) + RBRACE)
value << (quotedString | struct | real | integer)
The parse actions on real and integer will convert these elements from strings to float or ints at parse time, so that the values can be used as their actual types immediately after parsing (no need to post-process to do string-to-other-type conversion).
Your sample is a collection of one or more entries, so we use that to parse the total input:
result = OneOrMore(entry).parseString(sample)
We can access the parsed data as a nested list, but it is not so pretty to display. This code uses pprint to pretty-print a formatted nested list:
from pprint import pprint
pprint(result.asList())
Giving:
[['company', 'My Company'],
['phone', '555-5555'],
['people',
[['person',
[['name', 'Bob'],
['location', 'Seattle'],
['settings', [['size', 1], ['color', 'red']]]]],
['person',
[['name', 'Joe'],
['location', 'Seattle'],
['settings', [['size', 2], ['color', 'blue']]]]]]]]
Notice that all the strings are just strings with no enclosing quotation marks, and the ints are actual ints.
We can do just a little better than this, by recognizing that the entry format actually defines a name-value pair suitable for accessing like a Python dict. Our parser can do this with just a few minor changes:
Change the struct definition to:
struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE)
and the overall parser to:
result = Dict(OneOrMore(entry)).parseString(sample)
The Dict class treats the parsed contents as a name followed by a value, which can be done recursively. With these changes, we can now access the data in result like elements in a dict:
print(result['phone'])
or like attributes in an object:
print(result.company)
Use the dump() method to view the contents of a structure or substructure:
for person in result.people:
    print(person.dump())
    print()
prints:
['person', ['name', 'Bob'], ['location', 'Seattle'], ['settings', ['size', 1], ['color', 'red']]]
- location: Seattle
- name: Bob
- settings: [['size', 1], ['color', 'red']]
- color: red
- size: 1
['person', ['name', 'Joe'], ['location', 'Seattle'], ['settings', ['size', 2], ['color', 'blue']]]
- location: Seattle
- name: Joe
- settings: [['size', 2], ['color', 'blue']]
- color: blue
- size: 2
There is no "simple" way, but there are harder and not-so-hard ways. If you don't want to hardcode things, then at some point you're going to have to parse it as a structured format. That would involve parsing each line one-by-one, tokenizing it appropriately (for example, separating the key from the value correctly), and then determining how you want to deal with the line.
You may need to store your data in an intermediary format such as a (parse) tree in order to account for the arbitrary nesting relationships (represented by indents and braces), and then after you have finished parsing the data, take your resulting tree and then go through it again to get your arrays or JSON.
There are libraries available, such as ANTLR, that handle some of the manual work of figuring out how to write the parser.
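As a rough illustration of the hand-written route, here is a small recursive parser using only the standard library. It is a sketch under assumptions, not a complete solution: it assumes quoted strings contain no escaped quotes, that every value is a quoted string, a bare word or number, or a braced block, and it returns a nested list of (name, value) pairs so that repeated names such as person are preserved. All function names are my own.

```python
import re

# A token is a quoted string, one of { } =, or a run of other non-space characters.
TOKEN = re.compile(r'"[^"]*"|[{}=]|[^\s{}="]+')

def convert(token):
    """Turn a scalar token into str, int, or float."""
    if token.startswith('"'):
        return token[1:-1]
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token

def parse_entries(tokens, pos, inside_braces):
    """Parse name=value pairs from tokens[pos:]; return (pairs, next position)."""
    entries = []
    while pos < len(tokens):
        if tokens[pos] == '}':
            if not inside_braces:
                raise ValueError("unexpected '}'")
            return entries, pos + 1
        name = tokens[pos]
        if pos + 2 >= len(tokens) or tokens[pos + 1] != '=':
            raise ValueError(f"expected '=' and a value after {name!r}")
        if tokens[pos + 2] == '{':
            # Nested block: recurse, then continue after its closing brace.
            value, pos = parse_entries(tokens, pos + 3, True)
        else:
            value, pos = convert(tokens[pos + 2]), pos + 3
        entries.append((name, value))
    if inside_braces:
        raise ValueError("missing '}'")
    return entries, pos

def parse(text):
    entries, _ = parse_entries(TOKEN.findall(text), 0, False)
    return entries

sample = '''
company="My Company"
people=
{
    person=
    {
        name="Bob"
        settings=
        {
            size=1
        }
    }
    person=
    {
        name="Joe"
    }
}
'''
tree = parse(sample)
print(tree)
```

The resulting tree can then be walked a second time to build whatever arrays or JSON layout is needed, which is the two-pass approach described above.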
Take a look at this code:
import re
still_not_valid_json = re.sub (r'(\w+)=', r'"\1":', pseudo_json ) #1
this_one_is_tricky = re.compile (r'("|\d)\n(?!\s+})', re.M)
that_one_is_tricky_too = re.compile (r'(})\n(?=\s+")', re.M)
nearly_valid_json = this_one_is_tricky.sub (r'\1,\n', still_not_valid_json) #2
nearly_valid_json = that_one_is_tricky_too.sub (r'\1,\n', nearly_valid_json) #3
valid_json = '{' + nearly_valid_json + '}' #4
You can convert your pseudo_json into parseable JSON via some substitutions:
1) Replace '=' with ':' (and quote the name in front of it)
2) Add missing commas between a simple value (like "2" or "Joe") and the next field
3) Add missing commas between the closing brace of a complex value and the next field
4) Embrace it with braces
Still there are issues. In your example the 'people' dictionary contains two identical keys 'person'. After parsing, only one key remains in the dictionary. This is what I've got after parsing:
{u'phone': u'555-5555', u'company': u'My Company', u'people': {u'person': {u'settings': {u'color': u'blue', u'size': 2}, u'name': u'Joe', u'location': u'Seattle'}}}
If only you could replace the second occurrence of 'person=' with 'person1=' and so on...
Replace the '=' with ':', add in the missing commas, and then just read it as JSON.
Okay, I came up with a final solution that actually transforms this data into a JSON-friendly Dict as I originally wanted. It first uses Pyparsing to convert the data into a series of nested lists and then loops through the lists and transforms them into JSON. This lets me overcome the issue where Pyparsing's toDict() method was not able to handle an object that has two properties of the same name. To determine whether a list is a plain list or a property/value pair, the prependPropertyToken method adds the string __property__ in front of property names when Pyparsing detects them.
def parse_file(self,fileName):
    #get the input text file
    file = open(fileName, "r")
    inputText = file.read()
    #define data types that might be in the values
    real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
    integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
    yes = CaselessKeyword("yes").setParseAction(replaceWith(True))
    no = CaselessKeyword("no").setParseAction(replaceWith(False))
    quotedString.setParseAction(removeQuotes)
    unquotedString = Word(alphanums+"_-?\"")
    comment = Suppress("#") + Suppress(restOfLine)
    EQ,LBRACE,RBRACE = map(Suppress, "={}")
    data = (real | integer | yes | no | quotedString | unquotedString)
    #define structures
    value = Forward()
    object = Forward()
    dataList = Group(OneOrMore(data))
    simpleArray = (LBRACE + dataList + RBRACE)
    propertyName = Word(alphanums+"_-.").setParseAction(self.prependPropertyToken)
    property = dictOf(propertyName + EQ, value)
    properties = Dict(property)
    object << (LBRACE + properties + RBRACE)
    value << (data | object | simpleArray)
    dataset = properties.ignore(comment)
    #parse it
    result = dataset.parseString(inputText)
    #turn it into a JSON-like object
    dict = self.convert_to_dict(result.asList())
    return json.dumps(dict)
def convert_to_dict(self, inputList):
    dict = {}
    for item in inputList:
        #determine the key and value to be inserted into the dict
        dictval = None
        key = None
        if isinstance(item, list):
            try:
                key = item[0].replace("__property__","")
                if isinstance(item[1], list):
                    try:
                        if item[1][0].startswith("__property__"):
                            dictval = self.convert_to_dict(item)
                        else:
                            dictval = item[1]
                    except AttributeError:
                        dictval = item[1]
                else:
                    dictval = item[1]
            except IndexError:
                dictval = None
        #determine whether to insert the value into the key or to merge the value with existing values at this key
        if key:
            if key in dict:
                if isinstance(dict[key], list):
                    dict[key].append(dictval)
                else:
                    old = dict[key]
                    new = [old]
                    new.append(dictval)
                    dict[key] = new
            else:
                dict[key] = dictval
    return dict
def prependPropertyToken(self,t):
    return "__property__" + t[0]

How to get a dictionary key value from a string that contains dictionary?

I have a string that contains a dictionary:
data = 'IN.Tags.Share.handleCount({"count":17737,"fCnt":"17K","fCntPlusOne":"17K","url":"www.test.com\\/"});'
How can I get the value of the dictionary element count? (In my case 17737.)
P.S. Maybe I need to delete IN.Tags.Share.handleCount from the string before getting a dictionary, e.g. with
k = data.replace("IN.Tags.Share.handleCount", ""), but the problem is that the '()' remains after the delete.
Thanks
import re, ast
data = r'IN.Tags.Share.handleCount({"count":17737,"fCnt":"17K","fCntPlusOne":"17K","url":"www.test.com\/"});'
m = re.match(r'.*(\{.*\})', data)
d = ast.literal_eval(m.group(1))
print(d['count'])
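Since the payload inside the parentheses is JSON, json.loads is arguably the more natural parser for it; among other things it understands the \/ escape in the url field. A minimal sketch, assuming the wrapper always has the shape someFunction({...}); and using my own variable names:

```python
import json
import re

data = r'IN.Tags.Share.handleCount({"count":17737,"fCnt":"17K","fCntPlusOne":"17K","url":"www.test.com\/"});'

# Capture everything between the first '(' and the last ')'.
match = re.search(r'\((.*)\)', data, re.DOTALL)
payload = json.loads(match.group(1))

print(payload['count'])   # 17737
print(payload['url'])     # www.test.com/
```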
