grabbing an ID from string - python

I have a string where I'd like to grab the "id" number 12079500908. I am trying to use ast.literal_eval but received a ValueError: malformed string. Is there any other way to get the id number from the string below?
doc_request = urllib2.Request("https://api.box.com/2.0/search?query=SEARCHTERMS", headers=doc_headers)
doc_response = urllib2.urlopen(doc_request)
view_doc_response = doc_response.read()
doc_dict=ast.literal_eval(view_doc_response)
Edit
Output:
view_doc_response = '{"total_count":1,"entries":[{"type":"file","id":"12079500908","sequence_id":"1","etag":"1","sha1":"6887169228cab0cfb341059194bc980e1be8ad90","name":"file.pdf","description":"","size":897838,"path_collection":{"total_count":2,"entries":[{"type":"folder","id":"0","sequence_id":null,"etag":null,"name":"All Files"},{"type":"folder","id":"1352745576","sequence_id":"0","etag":"0","name":"Patient Files"}]},"created_at":"2013-12-03T10:23:30-08:00","modified_at":"2013-12-03T11:17:52-08:00","trashed_at":null,"purged_at":null,"content_created_at":"2013-12-03T10:23:30-08:00","content_modified_at":"2013-12-03T11:17:52-08:00","created_by":{"type":"user","id":"20672372","name":"name","login":"email"},"modified_by":{"type":"user","id":"206732372","name":"name","login":"email"},"owned_by":{"type":"user","id":"206737772","name":"name","login":"email"},"shared_link":{"url":"https:\\/\\/www.box.net\\/s\\/ymfslf1phfqiw65bunjg","download_url":"https:\\/\\/www.box.net\\/shared\\/static\\/ymfslf1phfqiw65bunjg.pdf","vanity_url":null,"is_password_enabled":false,"unshared_at":null,"download_count":0,"preview_count":0,"access":"open","permissions":{"can_download":true,"can_preview":true}},"parent":{"type":"folder","id":"1352745576","sequence_id":"0","etag":"0","name":"Patient Files"},"item_status":"active"}],"limit":30,"offset":0}'
calling doc_dict gives:
ValueError: malformed string

ast.literal_eval is for parsing Python literal syntax; what you have is JSON. Valid JSON looks a lot like Python syntax, except that JSON can contain null, true, and false, which a JSON decoder maps to None, True, and False in Python. ast.literal_eval does not recognize those names, hence the error. You can use json.loads for this. The code might look something like this:
import json
doc_dict = json.loads(view_doc_response)
first_id = doc_dict['entries'][0]['id']  # with your data, this is the string "12079500908"
Note that this assumes that you manually added the ... at the end of the string, presumably after shortening it. If that ... is actually in your code as well then you have invalid JSON and you will need to do some processing before it will work.
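To illustrate the difference described above, here is a minimal sketch. The sample string is a shortened, made-up stand-in in the same shape as the response (an assumption, not the real Box payload). It shows ast.literal_eval rejecting the JSON literal null while json.loads accepts it.

```python
import ast
import json

# Hypothetical, shortened sample in the same shape as the API response.
sample = '{"total_count":1,"entries":[{"type":"file","id":"12079500908","trashed_at":null}]}'

# ast.literal_eval only understands Python literals, so the bare name
# `null` makes it fail.
try:
    ast.literal_eval(sample)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False

# json.loads understands null/true/false and maps them to None/True/False.
doc_dict = json.loads(sample)
first_id = doc_dict["entries"][0]["id"]
```

Note that the id comes back as a string, because it is quoted in the JSON; wrap it in int() if a number is needed.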

Related

python, cannot convert string to json

I am using python 3.10.4. I have a string which is as follows:
str_obj = "[(190229461780583, 'main'), (302030571463836, 'feature/BRANCH_livestream'), (1071064128966159, 'feature/BRANCH_sensitive'), (1137211553786277, 'main'), (1283366887974580, 'feature/add_sql_vm'), (1492751472739439, 'feature/BRANCH-2977'), (2662272892040840, 'main'), (4078787696930326, 'main')]"
I need to convert it to a json object. I need it in json format for further data extraction. Following this link I used json.loads() to convert it from string to json:
json_obj = json.loads(str_obj)
However I am getting the error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I checked this link but it is not really related to my problem.
I cannot simply drop "" from str_obj as this is the value I am receiving from an environment variable sent to me from a DevOps pipeline.
I would like to know what the root cause problem is and how I can resolve it.
Something like this might help if you want to build JSON from this string, although the best option would surely be to fix the input data.
str_obj = "[(190229461780583, 'main'), (302030571463836, 'feature/BRANCH_livestream'), (1071064128966159, 'feature/BRANCH_sensitive'), (1137211553786277, 'main'), (1283366887974580, 'feature/add_sql_vm'), (1492751472739439, 'feature/BRANCH-2977'), (2662272892040840, 'main'), (4078787696930326, 'main')]"
import json

# strip brackets, parentheses, quotes and spaces, then split on commas
str_obj = str_obj.replace('[','').replace(']','').replace('(','').replace(')','').replace('\'','').replace(' ','').split(',')
# create the output dict
output = {}
# group the flat list into [id, branch] pairs
str_to_list = [str_obj[i:i+2] for i in range(0, len(str_obj), 2)]
# add each pair to the output dict (the ids end up as string keys)
for e in str_to_list:
    output[e[0]] = e[1]
# a dict can easily be serialized to JSON
json_object = json.dumps(output)
That isn't JSON: JSON has no tuple syntax (parentheses) and requires double-quoted strings. Did you want ast.literal_eval?
import ast
ast.literal_eval(str_obj)
# [(190229461780583, 'main'), (302030571463836, 'feature/BRANCH_livestream'), (1071064128966159, 'feature/BRANCH_sensitive'), (1137211553786277, 'main'), (1283366887974580, 'feature/add_sql_vm'), (1492751472739439, 'feature/BRANCH-2977'), (2662272892040840, 'main'), (4078787696930326, 'main')]
The value of your str_obj is simply not valid JSON at all. You can use an online JSON parser to check the syntax and see what needs to change to make it valid.
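Putting the answers above together, one possible approach is to parse the string as a Python literal first and only then serialize it to JSON. This is a minimal sketch using a shortened copy of the string from the question; the names pairs, json_text and by_id are made up for illustration.

```python
import ast
import json

# Shortened version of the string from the question.
str_obj = "[(190229461780583, 'main'), (302030571463836, 'feature/BRANCH_livestream')]"

# Parse the Python literal (a list of tuples) safely, without eval.
pairs = ast.literal_eval(str_obj)

# json.dumps serializes tuples as JSON arrays.
json_text = json.dumps(pairs)

# Optionally map ids to branch names; JSON object keys are always strings.
by_id = {str(branch_id): branch for branch_id, branch in pairs}
json_object = json.dumps(by_id)
```

Unlike the chain of replace calls, this keeps working if a branch name happens to contain a comma, a space or a bracket.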

String to list python conversion

My input string looks like below
rule = "['xor',[{'asset':'pc','operator':'=','basis':true}]]"
Expected output
Output = ['xor',[{'asset':'pc','operator':'=','basis':true}]]
Also this is legacy code where I cannot do ast.literal_eval(rule) since basis has non-string value true which will throw error 'malformed string'
Any suggestions to do the same?
I tried with rule.strip('][').split(', '), but the output is not the expected format:
["'and',[{'fact':'waived','operator':'=','basis':true}"]
If you're OK with using eval, then you can define true in the environment to eval:
>>> rule = "['xor',[{'asset':'pc','operator':'=','basis':true}]]"
>>> print(eval(rule, {'true': True}))
['xor', [{'basis': True, 'asset': 'pc', 'operator': '='}]]
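I think if those strings contain no tuples and use double quotes, you could parse them as JSON. Note that json.loads rejects single-quoted strings like the ones in your rule as written.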
import json
my_data = json.loads(my_string)
This will depend on the details of what you are parsing, though, so buyer beware.
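As a rough sketch of the JSON route for the rule in the question: the single quotes have to become double quotes before json.loads will accept the string. The blanket replace below is an assumption that only holds when no string value itself contains a quote character or apostrophe.

```python
import json

rule = "['xor',[{'asset':'pc','operator':'=','basis':true}]]"

# Assumption: no value inside the rule contains ' or ", so a blanket
# replacement of the quote character is safe here.
json_text = rule.replace("'", '"')

# true is a valid JSON literal, so it becomes Python's True.
parsed = json.loads(json_text)
```

This avoids eval entirely, at the cost of the quoting assumption above.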

Remove 'u' in JSON dictionary

I want to use a dictionary loaded from a JSON file. However, each item is prefixed with the character 'u', and I need to remove the 'u's.
I tried json.dumps, but it does not work.
import ast
import json
data= {u'dot',
u'dog',
u'fog',
u'eeee'}
res = eval(json.dumps(data))
print res
I hope to get: {
'dot',
'dog',
'fog',
'eeee'
}
But the error is:
TypeError: set([u'eeee', u'fog', u'dog', u'dot']) is not JSON serializable
The strings that start with u are unicode strings.
In your case, this has nothing to do with the problem:
data= {u'dot',
u'dog',
u'fog',
u'eeee'}
This creates a set and stores the results in the data variable. The json serializer can't handle sets since the json spec makes no mention of them. If you change this to be a list, the serializer can handle the data:
res = set(eval(json.dumps(list(data))))
Here I'm converting the data variable to a list to serialize it, then converting it back to a set to store the result in the res variable.
Alternatively, you can directly ask Python to convert the unicode strings to strings, using something like this:
res = {x.encode("utf-8") for x in data}
print(res)
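For comparison, here is a minimal Python 3 sketch of the same round trip (the question's code is Python 2, so treat this as an assumption about your environment). In Python 3 every str is already Unicode and the u prefix is only optional syntax, but a set still has to become a list before json.dumps will accept it. json.loads stands in for the eval call.

```python
import json

data = {u'dot', u'dog', u'fog', u'eeee'}

# Sets are not part of JSON, so serializing one directly raises TypeError.
try:
    json.dumps(data)
    set_is_serializable = True
except TypeError:
    set_is_serializable = False

# Convert to a (sorted) list to serialize, then back to a set after loading.
encoded = json.dumps(sorted(data))
res = set(json.loads(encoded))
```

Sorting is only there to make the serialized text deterministic; list(data) works as well.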

Yet another Python looping over JSON array

I spent several hours on this, tried everything I found online, pulled some of the hair left on my head...
I have this JSON sent to a Flask webservice I'm writing :
{'jsonArray': '[
{
"nom":"0012345679",
"Start":"2018-08-01",
"Finish":"2018-08-17",
"Statut":"Validee"
},
{
"nom":"0012345679",
"Start":"2018-09-01",
"Finish":"2018-09-10",
"Statut":"Demande envoyée au manager"
},
{
"nom":"0012345681",
"Start":"2018-04-01",
"Finish":"2018-04-08",
"Statut":"Validee"
},
{
"nom":"0012345681",
"Start":"2018-07-01",
"Finish":"2018-07-15",
"Statut":"Validee"
}
]'}
I want to simply loop through the records :
import json
from flask import Flask, request

app = Flask(__name__)

@app.route('/graph', methods=['POST'])
def webhook():
    if request.method == 'POST':
        req_data = request.get_json()
        print(req_data) #-> shows JSON that seems to be right
        ##print(type(req_data['jsonArray']))
        #j1 = json.dumps(req_data['jsonArray'])
        #j2 = json.loads(req_data['jsonArray'])
        #data = json.loads(j1)
        #for rec in data:
        #    print(rec) #-> This seems to consider rec as one of the characters of the whole JSON string, and prints every character one by one
        #for key in data:
        #    value = data[key]
        #    print("The key and value are ({}) = ({})".format(key, value)) #-> TypeError: string indices must be integers
        for record in req_data['jsonArray']:
            for attribute, value in record.items(): #-> Gives error 'str' object has no attribute 'items'
                print(attribute, value)
I believe I am lost between JSON object, python dict object, strings, but I don't know what I am missing. I really tried to put the JSON received through json.dumps and json.loads methods, but still nothing. What am I missing ??
I simply want to loop through each record to create another python object that I will feed to a charting library like this :
df = [dict(Task="0012345678", Start='2017-01-01', Finish='2017-02-02', Statut='Complete'),
dict(Task="0012345678", Start='2017-02-15', Finish='2017-03-15', Statut='Incomplete'),
dict(Task="0012345679", Start='2017-01-17', Finish='2017-02-17', Statut='Not Started'),
dict(Task="0012345679", Start='2017-01-17', Finish='2017-02-17', Statut='Complete'),
dict(Task="0012345680", Start='2017-03-10', Finish='2017-03-20', Statut='Not Started'),
dict(Task="0012345680", Start='2017-04-01', Finish='2017-04-20', Statut='Not Started'),
dict(Task="0012345680", Start='2017-05-18', Finish='2017-06-18', Statut='Not Started'),
dict(Task="0012345681", Start='2017-01-14', Finish='2017-03-14', Statut='Complete')]
The whole thing is wrapped in single quotes, meaning it's a string and you need to parse it.
for record in json.loads(req_data['jsonArray']):
Looking at your commented code, you did this:
j1 = json.dumps(req_data['jsonArray'])
data = json.loads(j1)
Using json.dumps on a string is the wrong idea, and moreover json.loads(json.dumps(x)) is just the same as x, so that just got you back where you started, i.e. data was the same thing as req_data['jsonArray'] (a string).
This was the right idea:
j2 = json.loads(req_data['jsonArray'])
but you never used j2.
As you've seen, iterating over a string gives you each character of the string.
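Here is a minimal sketch of the whole loop, using a plain dict as a hypothetical stand-in for what request.get_json() returns (shortened to two records). Mapping "nom" to Task is my assumption about how the records should feed the chart data.

```python
import json

# Hypothetical stand-in for request.get_json(): the value of "jsonArray"
# is a JSON string, not a list.
req_data = {
    "jsonArray": '[{"nom":"0012345679","Start":"2018-08-01","Finish":"2018-08-17","Statut":"Validee"},'
                 '{"nom":"0012345681","Start":"2018-04-01","Finish":"2018-04-08","Statut":"Validee"}]'
}

# Parse the inner string once; the result is a list of dicts.
records = json.loads(req_data["jsonArray"])

# Build the list of dicts expected by the charting code.
df = [
    dict(Task=rec["nom"], Start=rec["Start"], Finish=rec["Finish"], Statut=rec["Statut"])
    for rec in records
]
```

If you control the client, sending jsonArray as a real JSON array instead of a string would remove the need for the second json.loads.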

.get method for nested json doesn't work

I have a large file that contains valid nested JSON on each line. Each JSON object looks like this (the real data is much bigger, so this piece of JSON is shown just for illustration):
{"location":{"town":"Rome","groupe":"Advanced",
"school":{"SchoolGroupe":"TrowMet", "SchoolName":"VeronM"}},
"id":"145",
"Mother":{"MotherName":"Helen","MotherAge":"46"},"NGlobalNote":2,
"Father":{"FatherName":"Peter","FatherAge":"51"},
"Study":[{
"Teacher":["MrCrock","MrDaniel"],
"Field":{"Master1":["Marketing", "Politics", "Philosophy"],
"Master2":["Economics", "Management"], "ExamCode": "1256"}
}],
"season":["summer","spring"]}
I need to parse this file, in order to extract only some key-values from every json, to obtain the dataframe that should look like:
Groupe    Id  MotherName  FatherName  Master2
Advanced  56  Laure       James       Economics, Management
Middle    11  Ann         Nicolas     Web-development
Advanced  6   Helen       Franc       Literature, English Language
I use the .get method that was proposed to me in another question, but it doesn't work with nested JSON. For instance, if I try:
import json
import numpy as np

def extract_data(data):
    """Convert one JSON dict to records for import."""
    dummy = {}
    jfile = json.loads(data.strip())
    return (
        jfile.get('Study', dummy).get('Field', np.nan).get('Master1', np.nan),
        jfile.get('location', dummy).get('groupe', np.nan))
For the line jfile.get('Study', dummy).get('Field', np.nan).get('Master1', np.nan) it throws an error:
AttributeError: 'list' object has no attribute 'get'
Obviously this happens because the value of "Study" is neither a dictionary nor a list, but valid JSON! How can I deal with this problem? Is there a method that works like .get, but for JSON? I guess there is another option: decode this JSON and then parse it with .get, but the problem is that it sits inside another JSON, so I have no clue how to decode it!
Data is a valid JSON formatted string. JSON contains four basic elements:
Object: defined with curly braces {}
Array: defined with square brackets []
Value: can be a string, a number, an object, an array, or the literals true, false or null
String: defined by double quotes and contain Unicode characters or common backslash escapes
Using json.loads will convert the string into a python object recursively. It means that every inner JSON element will be represented as a python object.
Therefore:
jfile.get('Study') ---> python list
To retrieve Field you should iterate over the study list:
jfile = json.loads(data.strip())
study_list = jfile.get('Study', [])  # don't set a default value of a different type
for item in study_list:
    print(item.get('Field'))
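Building on the loop above, here is a sketch of an extract_data that returns one row per line. The column names follow the table in the question; joining Master2 values with ", " and using None for missing keys are my assumptions.

```python
import json

def extract_data(line):
    """Pull a few fields out of one JSON line; missing keys become None."""
    jfile = json.loads(line.strip())
    # "Study" is a list of dicts, so collect Master2 from each element.
    master2 = []
    for item in jfile.get("Study", []):
        master2.extend(item.get("Field", {}).get("Master2", []))
    return {
        "Groupe": jfile.get("location", {}).get("groupe"),
        "Id": jfile.get("id"),
        "MotherName": jfile.get("Mother", {}).get("MotherName"),
        "FatherName": jfile.get("Father", {}).get("FatherName"),
        "Master2": ", ".join(master2),
    }

# Shortened version of the sample line from the question.
sample_line = (
    '{"location":{"town":"Rome","groupe":"Advanced"},"id":"145",'
    '"Mother":{"MotherName":"Helen","MotherAge":"46"},'
    '"Father":{"FatherName":"Peter","FatherAge":"51"},'
    '"Study":[{"Teacher":["MrCrock","MrDaniel"],'
    '"Field":{"Master1":["Marketing","Politics","Philosophy"],'
    '"Master2":["Economics","Management"],"ExamCode":"1256"}}]}'
)
record = extract_data(sample_line)
```

Each default has the same type as the value it replaces ({} for objects, [] for arrays), so the chained calls never hit a float or a list unexpectedly. A list of such dicts, one per line of the file, can then be passed to pandas.DataFrame.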
