Cannot remove single quotes from string - python

I got a super weird problem: I got a list of strings which looks like this:
batch = ['{"SageMakerOutput":[{"label":"LABEL_8","score":0.9152628183364868},"inputs":"test"}',
'{"SageMakerOutput":[{"label":"LABEL_8","score":0.9769203066825867},"inputs":"Alles OK"}',
'{"SageMakerOutput":[{"label":"LABEL_8","score":0.9345938563346863},"inputs":"F"}']
In each entry of the list I want to remove the single quotes "'" but somehow I cannot remove it with .replace():
for line in batch:
line = line.replace("'","")
I dont get it

Alright, so judging by your comments and your other post, it seems like what you have are strings, and what you want are dictionaries. I've copied your data from the other post because your data isn't correct in this post (you're missing ] in this post).
Solution
import json
batch = ['{"SageMakerOutput":[{"label":"LABEL_8","score":0.9152628183364868}],"inputs":"test"}',
'{"SageMakerOutput":[{"label":"LABEL_8","score":0.9769203066825867}],"inputs":"Alles OK"}']
batch = [json.loads(b) for b in batch]
Output:
[{'SageMakerOutput': [{'label': 'LABEL_8', 'score': 0.9152628183364868}],
'inputs': 'test'},
{'SageMakerOutput': [{'label': 'LABEL_8', 'score': 0.9769203066825867}],
'inputs': 'Alles OK'}]
Explanation
Those objects in batch are what are known as JSON objects. They're just strings with a very specific structure. They are analogous to Python's dict type with some very minor differences and can very easily be converted to Python dict objects using Python's built-in json module, which automatically translates those minor differences between JSON and Python (e.g., booleans in JSON strings are true and false, but in Python they need to be True and False).
Notes
Advice to OP and future readers: this post is a classic case of the XY problem. In the future, try to be more clear about your end goal. Your question in this post isn't actually answerable because it's impossible to do what you were asking.

Related

Python: Json.load gives list and can't parse data from it

I have a data.json file, which looks like this:
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
I am trying to get "Event" from this file using python and miserably failing at this.
with open('data.json', 'r') as json_file:
data = json.load(json_file)
print (data['Event'])
I get the following error:
TypeError: list indices must be integers or slices, not str
And even when I try
print (data[0]['Event'])
then I get this error:
TypeError: string indices must be integers
One more thing:
print(type(data))
gives me "list"
I have searched all over and have not found a solution to this. I would really appreciate your suggestions.
You could use the ast module for this:
import ast
mydata = ["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
data = ast.literal_eval(mydata[0])
data
{'Day': 'Today', 'Event': '1', 'Date': '2019-03-20'}
data['Event']
'1'
Edit
Your original code does load the data into a list structure, but only contains a single string entry inside that list, despite proper json syntax. ast, like json, will parse that string entry into a python data structure, dict.
As it sits, when you try to index that list, it's not the same as calling a key in a dict, hence the slices cannot be str:
alist = [{'a':1, 'b':2, 'c':3}]
alist['a']
TypeError
# need to grab the dict entry from the list
adict = alist[0]
adict['a']
1
You need to convert the elements in data to dict using json module.
Ex:
import json
with open(filename) as infile:
data = json.load(infile)
for d in data:
print(json.loads(d)['Event'])
Or:
data = list(map(json.loads, data))
print(data[0]["Event"])
Output:
1
Your problem is that you are parsing it as a list that consists of a single element that is a string.
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
See how the entire content of the list is surrounded by " on either side and every other " is preceded by a \? The slash generally means to ignore the special meaning the following character might have, but interpret it as purely a string.
If you have control over the file's contents, the easiest solution would be to adjust it. You will want it to be in a format like this:
[{"Day":"Today", "Event": "1", "Date": "2019-03-20"}]
Edit: As others have suggested, you can also parse it in its current state. Granted, cleaning the data is tedious, but oftentimes worth the effort. Though this may not be one of those cases. I'm leaving this answer up anyway because it may help with explaining why OPs initial attempt did not work, and why he received the error messages he got.

Python- Insert new values into 'nested' list?

What I'm trying to do isn't a huge problem in php, but I can't find much assistance for Python.
In simple terms, from a list which produces output as follows:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765}...
All I want to do is add in an extra field, containing the 'runner name' in the data set below, into each of the 'runners' sub lists from the initial data set, based on selection_id=selectionId.
So initially I iterate through the full dataset, and then create a separate list to get the runner name from the runner id (I should point out that runnerId===selectionId===selection_id, no idea why there are multiple names are used), this works fine and the code is shown below:
for market_book in market_books:
market_catalogues = trading.betting.list_market_catalogue(
market_projection=["RUNNER_DESCRIPTION", "RUNNER_METADATA", "COMPETITION", "EVENT", "EVENT_TYPE", "MARKET_DESCRIPTION", "MARKET_START_TIME"],
filter=betfairlightweight.filters.market_filter(
market_ids=[market_book.market_id],
),
max_results=100)
data = []
for market_catalogue in market_catalogues:
for runner in market_catalogue.runners:
data.append(
(runner.selection_id, runner.runner_name)
)
So as you can see I have the data in data[], but what I need to do is add it to the initial data set, based on the selection_id.
I'm more comfortable with Php or Javascript, so apologies if this seems a bit simplistic, but the code snippets I've found on-line only seem to assist with very simple Python lists and nothing 'nested' (to me the structure seems similar to a nested array).
As per the request below, here is the full list:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":20,"size":3},{"price":19.5,"size":26.36},{"price":19,"size":2}],"availableToLay":[{"price":21,"size":13},{"price":22,"size":2},{"price":23,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832767},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":11,"size":9.75},{"price":10.5,"size":3},{"price":10,"size":28.18}],"availableToLay":[{"price":11.5,"size":12},{"price":13.5,"size":2},{"price":14,"size":7.75}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832766},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":48,"size":2},{"price":46,"size":5},{"price":42,"size":5}],"availableToLay":[{"price":60,"size":7},{"price":70,"size":5},{"price":75,"size":10}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832769},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":18.5,"size":28.94},{"price":18,"size":5},{"price":17.5,"size":3}],"availableToLay":[{"price":21,"size":20},{"price":23,"size":2},{"price":24,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832768},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":4.3,"size":9},{"price":4.2,"size":257.98},{"price":4.1,"size":51.1}],"availableToLay":[{"price":4.4,"size":20.97},{"price":4.5,"size":30},{"price":4.6,"size":16}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832771},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":24,"size":6.75},{"price":23,"size":2},{"price":22,"size":2}],"availableToLay":[{"price":26,"size":2},{"price":27,"size":2},{"price":28,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832770},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":5.7,"size":149.33},{"price":5.5,"size":29.41},{"price":5.4,"size":5}],"availableToLay":[{"price":6,"size":85},{"price":6.6,"size":5},{"price":6.8,"size":5}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":10064909}],"publishTime":1551612312125,"priceLadderDefinition":{"type":"CLASSIC"},"keyLineDescription":null,"marketDefinition":{"bspMarket":false,"turnInPlayEnabled":false,"persistenceEnabled":false,"marketBaseRate":5,"eventId":"28180290","eventTypeId":"2378961","numberOfWinners":1,"bettingType":"ODDS","marketType":"NONSPORT","marketTime":"2019-03-29T00:00:00.000Z","suspendTime":"2019-03-29T00:00:00.000Z","bspReconciled":false,"complete":true,"inPlay":false,"crossMatching":false,"runnersVoidable":false,"numberOfActiveRunners":8,"betDelay":0,"status":"OPEN","runners":[{"status":"ACTIVE","sortPriority":1,"id":10064909},{"status":"ACTIVE","sortPriority":2,"id":12832765},{"status":"ACTIVE","sortPriority":3,"id":12832766},{"status":"ACTIVE","sortPriority":4,"id":12832767},{"status":"ACTIVE","sortPriority":5,"id":12832768},{"status":"ACTIVE","sortPriority":6,"id":12832770},{"status":"ACTIVE","sortPriority":7,"id":12832769},{"status":"ACTIVE","sortPriority":8,"id":12832771},{"status":"LOSER","sortPriority":9,"id":10317013},{"status":"LOSER","sortPriority":10,"id":10317010}],"regulators":["MR_INT"],"countryCode":"GB","discountAllowed":true,"timezone":"Europe\/London","openDate":"2019-03-29T00:00:00.000Z","version":2576584033,"priceLadderDefinition":{"type":"CLASSIC"}}}
i think i understand what you are trying to do now
first hold your data as a python object (you gave us a json object)
import json
my_data = json.loads(my_json_string)
for item in my_data['runners']:
item['selectionId'] = [item['selectionId'], my_name_here]
the thing is that my_data['runners'][i]['selectionId'] is a string, unless you want to concat the name and the id together, you should turn it into a list or even a dictionary
each item is a dicitonary so you can always also a new keys to it
item['new_key'] = my_value
So, essentially this works...with one exception...I can see from the print(...) in the loop that the attribute is updated, however what I can't seem to do is then see this update outside the loop.
mkt_runners = []
for market_catalogue in market_catalogues:
for r in market_catalogue.runners:
mkt_runners.append((r.selection_id, r.runner_name))
for market_book in market_books:
for runner in market_book.runners:
for x in mkt_runners:
if runner.selection_id in x:
setattr(runner, 'x', x[1])
print(market_book.market_id, runner.x, runner.selection_id)
print(market_book.json())
So the print(market_book.market_id.... displays as expected, but when I print the whole list it shows the un-updated version. I can't seem to find an obvious solution, which is odd, as it seems like a really simple thing (I tried messing around with indents, in case that was the problem, but it doesn't seem to be, its like its not refreshing the market_book list post update of the runners sub list)!

Type Error while using for loop in list of dictionaries in Python

I'm getting an error while using that structure on Python 3.6 while parsing data from a json file:
for topic in data:
cqas = [{'context': paragraph['context'],
'id': qa['id'],
'question': qa['question'],
'answer': qa['answers'][0]['text'],
'answer_start': qa['answers'][0]['answer_start'],
'answer_end': qa['answers'][0]['answer_start'] + \
len(qa['answers'][0]['text']) - 1,
'topic': topic['title'] }
for paragraph in topic['paragraphs']
for qa in paragraph['qas']]
I couldn't find a documentation about using for loops in list of dicts as mentioned above. I want to learn it because I'm also getting an error message when using that structure:
Dataset: https://raw.githubusercontent.com/YerevaNN/R-NET-in-Keras/master/data/dev-v1.1.json
The part len(qa['answers'][0]['text']) in your code throws the error because len() expects a string. You can verify yourself by typing len(3) in a python interpreter. Check your data whether the 'text' key in the array in qa['answers'] contains integer values somewhere. Or modify the piece of code if you actually want integers there.
I ran the portion from the original code and data you provided and it parsed w/o throwing errors. Since you mention you use a 'similar' dataset I assume you or someone else modified the contents...
(It would be helpful if you provided the original your dataset and your code)

convert string representation of a dict inside json dict value to dict

Hello all and sorry if the title was worded poorly. I'm having a bit of trouble wrapping my head around how to solve this issue I have encountered. I would have liked to simply pass a dict as the value for this key in my json obj but sadly I have to pass it as a string. So, I have a json dict object that looks like this
data = {"test": "Fuzz", "options": "'{'size':'Regular','connection':'unconnected'}'"}. Obviously, I would prefer that the second dict value weren't a string representation of a dictionary but rather a dictionary. Is the best route here to just strip the second and second to last single quotes for the data[options] or is there a better alternative?
Sorry for any confusion. This is how the json object looks after I perform
json.dump(data, <filename>)
The value for options can be thought of as another variable say x and it's equivalent to '{'size':'Regular','connection':'unconnected'}'
I could do x[1:-1] but I'm not sure if that is the most pythonic way to do things here.
import ast
bad_string_dict = "'{'size':'Regular','connection':'unconnected'}'"
good_string_dict = bad_string_dict.strip("'")
good_dict = ast.literal_eval(good_string_dict)
print(good_dict)
You will have to strip quotation mark, no other way around
Given OP's comments I suggest the following:
Set the environment variable to a known data format (example: json/yaml/...), not a specific language (python)
Use the json module (or the format you've chosen) to load the data
The data should look like this:
raw_data = {"test": "Fuzz", "options": "{\"size\": \"Regular\", \"connection\": \"unconnected\"}"}
And the code should look like this:
raw_options = raw_data['options']
options = json.loads(raw_options)
data = {**raw_data, 'options': options}

Converting JSON string into Python dictionary

I don't have much experience in Python and I've ran into problem converting sql query data which is technically a list containing a JSON string into a Python dictionary. I'm querying the sqlite3 database which is returning a piece of data like this:
def get_answer(self, id):
self.__cur.execute('select answer from some_table where id= %s;' % id)
res_data = self.__cur.fetchall()
return res_data
The data is a single JSON format element which its simplified version looks like this:
[
{"ind": [ {"v": 101}, {"v": 102}, {"v": 65 }]},
{"ind": [ {"v": 33}, {"v": 102}, {"v": 65}]}
]
But when I try to convert the res_data to JSON, with code like this:
temp_json = simplejson.dumps(get_answer(trace_id))
it returns a string and when I get the len(temp_json) it returns the number of characters in res_data instead of the number of objects in res_data. However, if I use Visual Studio's JSON visualizer on what get_answer(trace_id) returns, it properly shows each of the objects res_data.
I also tried to convert the res_data to a dictionary with code like this:
dict_data = ast.literal_eval(Json_data)
or
dict_data = ast.literal_eval(Json_data[0])
and in both cases it throws a "malformed string" exception. I tried to write it to a file and read it back as a JSON but it didn't worked.
Before doing that I had the copy pasted the res_data manually and used:
with open(file_name) as json_file:
Json_data = simplejson.load(json_file)
and it worked like a charm. I've been experimenting different ways stated in SO and elsewhere but although the problem seems very straight forward, I still haven't found a solution so your help is highly appreciated.
OK, I finally found the solution:
states = json.loads(temp_json[0][0])
one confusing issue was that states = json.loads(temp_json[0]) was throwing the "Expected string or buffer" exception and temp_json was a list containing only one element, so I didn't think I will get anything from temp_json[0][0].
I hope it helps others too!
I think you are confusing the data formats. What you supposedly get back from your database wrapper seems to be a list of dictionaries (it is not SQL either - your question title is misleading). But I actually think that sqlite3 would give you a list of tuples.
JSON is a text format or more precisely the serialization of an object to a string. That's why json.dumps (= dump to string) results in a string and json.loads(= load string) would result in a python object, a dictionary or a list of dictionaries).
json.dumps does NOT mean "dump JSON to string". It is rather the .dumps method of the json module which takes a Python object (dict, list) and serializes it to JSON.
I will extend the answer if I understand what exactly you want to achieve but you get JSON with json.dumps(), JSON is a string format. In contrast simplejson.load() gives you a Python list or dict.
Did you try json.loads() just in case the return from your database is actually a string (which you could easily test).

Categories