Convert json into spacy format - python

I'm new to Python. How can I pass JSON data as a parameter instead of loading it from a file? Please see the code below for reference.
import json
filename = input("Enter your train data filename : ")
print(filename)
with open(filename) as train_data:
    train = json.load(train_data)
TRAIN_DATA = []
for data in train:
    ents = [tuple(entity) for entity in data['entities']]
    TRAIN_DATA.append((data['content'],{'entities':ents}))
with open('{}'.format(filename.replace('json','txt')),'w') as write:
    write.write(str(TRAIN_DATA))
In the code above the JSON data is loaded from a file. Instead of a file, I want to pass the JSON data directly and load it.
Ex:
train_data=[{"content":"what is the price of polo?","entities":[[21,25,"PrdName"]]}]
with open(filename) as train_data:
    train = json.load(train_data)
Thanks,

"json value" doesn't mean anything. Json is a text format, not a data type, and what json.loads() do is to transform the json text to python objects - dicts, lists etc - according to the json syntax and what exact type makes sense in Python (json object -> dict, json array -> list etc). You can check this by yourself in your Python shell:
>>> import json
>>> jsonstr = '{"foo":"bar", "baaz":[1, 2, 3]}'
>>> json_data = json.loads(jsonstr)
>>> json_data
{'foo': 'bar', 'baaz': [1, 2, 3]}
>>> type(json_data)
<class 'dict'>
In other words, if you already have the correct Python object (a dict, or in your case a list of dicts), there is nothing left to parse: use it directly.
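To make that concrete for the question's data, here is a minimal sketch. The function name to_spacy_format is made up for illustration; the conversion logic is the same loop as in the question. It covers both cases: data that is already a Python list, and data that arrives as JSON text.

```python
import json

def to_spacy_format(records):
    """Convert a list of {"content", "entities"} dicts into spaCy-style tuples."""
    train_data = []
    for record in records:
        ents = [tuple(entity) for entity in record["entities"]]
        train_data.append((record["content"], {"entities": ents}))
    return train_data

# Case 1: the data is already a Python list of dicts, so no json call is needed.
records = [{"content": "what is the price of polo?", "entities": [[21, 25, "PrdName"]]}]
print(to_spacy_format(records))

# Case 2: the data arrives as JSON text (a str), so parse it with json.loads first.
json_text = '[{"content": "what is the price of polo?", "entities": [[21, 25, "PrdName"]]}]'
print(to_spacy_format(json.loads(json_text)))
```

Both calls give the same result; the only difference is whether json.loads is needed before the conversion.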

Related

JSON - string value conversion to List

Code used for extraction from JSON
import json
import base64
string = json.loads(data)
bytes_data = base64.b64decode(string['Body'])
str(bytes_data, encoding='utf-8')
I have the following string, extracted from the JSON:
"[{"id":"XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Air","values":[{"v":"46","q":192,"t":"2021-10-28T13:47:59.7880096Z"}]},
{"id":"XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Atomise","values":[{"v":"3.1","q":192,"t":"2021-10-28T13:47:59.7880096Z"}]}]"
Any idea how to convert it to an actual list?
[{"id":"XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Air","values":[{"v":"46","q":192,"t":"2021-10-28T13:47:59.7880096Z"}]},
{"id":"XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Atomise","values":[{"v":"3.1","q":192,"t":"2021-10-28T13:47:59.7880096Z"}]}]
Things I have tried:
list(bytearray(bytes_data))
A for loop over the output string, but that is a convoluted way to do it.
Some other conversions. I am looking for something compact.
Reverse engineering your question....
Given a JSON file with base64 data
$ cat /tmp/data.json
{
"Body": "W3siaWQiOiJYWFhYX1UyXzE3MDIxNjpYWFhYX1UyXzE3MDIxNjpGQkVfMjMwMTUuQWlyIiwidmFsdWVzIjpbeyJ2IjoiNDYiLCJxIjoxOTIsInQiOiIyMDIxLTEwLTI4VDEzOjQ3OjU5Ljc4ODAwOTZaIn1dfSwKeyJpZCI6IlhYWFhfVTJfMTcwMjE2OlhYWFhfVTJfMTcwMjE2OkZCRV8yMzAxNS5BdG9taXNlIiwidmFsdWVzIjpbeyJ2IjoiMy4xIiwicSI6MTkyLCJ0IjoiMjAyMS0xMC0yOFQxMzo0Nzo1OS43ODgwMDk2WiJ9XX1dCg=="
}
When read and extracted
import json
import base64
with open('/tmp/data.json') as f:
    string = json.load(f)
body = string['Body']
Then decoded... a list is returned
import pprint
l = json.loads(base64.b64decode(body))
pprint.pprint(l)
[{'id': 'XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Air',
'values': [{'v': '46', 'q': 192, 't': '2021-10-28T13:47:59.7880096Z'}]},
{'id': 'XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Atomise',
'values': [{'v': '3.1', 'q': 192, 't': '2021-10-28T13:47:59.7880096Z'}]}]
Use the built-in json module:
import json
data = json.loads(bytes_data)
It seems like you have JSON inside JSON, so load it twice:
import json
import base64
string = json.loads(data)
bytes_data = base64.b64decode(string['Body'])
output = json.loads(bytes_data)
Use json.loads like this. Suppose you have a JSON array and want to convert it to a list:
import json
array = '{"Items": ["IPhone", "Earphone", "Powerbackup"]}'
data = json.loads(array)
print(data['Items'])
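Putting the answers above together, here is a self-contained sketch of the whole round trip. The helper name decode_body and the sample payload are invented for illustration; only json and base64 from the standard library are used.

```python
import base64
import json

def decode_body(outer_json_text, key="Body"):
    """Parse outer JSON, base64-decode one field, then parse the inner JSON."""
    outer = json.loads(outer_json_text)           # first parse: outer document
    inner_text = base64.b64decode(outer[key]).decode("utf-8")
    return json.loads(inner_text)                 # second parse: embedded document

# Build a small sample shaped like the one in the question (hypothetical values).
inner = [{"id": "demo.Air", "values": [{"v": "46", "q": 192}]}]
encoded = base64.b64encode(json.dumps(inner).encode("utf-8")).decode("ascii")
payload = json.dumps({"Body": encoded})

result = decode_body(payload)
print(type(result).__name__, result[0]["id"])
```

The key point is that str(bytes_data, encoding='utf-8') only gives you text; it is the second json.loads that turns that text into a list.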

Can't use python json.loads to turn json string into dictionary. TypeError: string indices must be integers

This is my json string
"{\"version\":\"1.4.12\",\"name\":\"earmark_parser\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/earmark_parser\",\"license\":\"Apache 2.0\"}"
"{\"version\":\"1.4.0\",\"name\":\"statix\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/statix\",\"license\":\"ISC\"}"
"{\"version\":\"1.1.0\",\"name\":\"nimble_parsec\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/nimble_parsec\",\"license\":\"Apache 2.0\"}"
"{\"version\":\"1.0.5\",\"name\":\"makeup\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/makeup\",\"license\":\"Unsure (found: BSD, Unrecognized license file content)\"}"
"{\"version\":\"1.5.2\",\"name\":\"poolboy\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/poolboy\",\"license\":\"Unsure (found: Unlicense, Apache 2.0, ISC)\"}"
"{\"version\":\"3.1.0\",\"name\":\"poison\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/poison\",\"license\":\"CC0-1.0\"}"
"{\"version\":\"1.2.2\",\"name\":\"jason\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/jason\",\"license\":\"Apache 2.0\"}"
"{\"version\":\"2.5.1\",\"name\":\"recon\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/recon\",\"license\":\"Unsure (found: BSD, Unrecognized license file content)\"}"
"{\"version\":\"0.6.2\",\"name\":\"licensir\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/licensir\",\"license\":\"MIT\"}"
"{\"version\":\"0.1.9\",\"name\":\"castore\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/castore\",\"license\":\"Apache 2.0\"}"
"{\"version\":\"1.2.1\",\"name\":\"mint\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/mint\",\"license\":\"Apache 2.0\"}"
"{\"version\":\"0.6.4\",\"name\":\"mojito\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/mojito\",\"license\":\"MIT\"}"
"{\"version\":\"0.15.1\",\"name\":\"makeup_elixir\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/makeup_elixir\",\"license\":\"Unsure (found: BSD, Unrecognized license file content)\"}"
"{\"version\":\"0.23.0\",\"name\":\"ex_doc\",\"licenseFile\":\"/home/alan/code/elixir-test/cards/deps/ex_doc\",\"license\":\"Apache 2.0\"}"
import json
with open("check-deps.txt",'r') as f:
    data = f.readlines()
rst = []
for json_string in data:
    my_json_dict = json.loads(json_string)
    print(my_json_dict["version"])
I want to turn each line into a dictionary, but I get an error: TypeError: string indices must be integers. Why can't I use json.loads to turn the JSON string into a dictionary?
Nesting another json.loads() call will fix your error. Each line is a JSON string whose content is itself JSON text, so it needs to be parsed twice to get a dictionary.
for json_string in data:
    my_json_dict = json.loads(json.loads(json_string))
    print(my_json_dict["version"])
If you check the type of my_json_dict after the single json.loads, you can see that it is a string and not a dictionary as you expected. That is why you are getting the error.
for json_string in data:
    my_json_dict = json.loads(json_string)
    print(type(my_json_dict))
So to solve that you need to nest another json.loads(), like this:
with open("prueba1.txt",'r') as f:
    data = f.readlines()
rst = []
for json_string in data:
    my_json_dict = json.loads(json.loads(json_string))
    print(my_json_dict['version'])
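If you are not sure how many times the text was encoded, a small helper can keep decoding while the result is still a string. This is only a sketch; the name loads_nested and the max_depth limit are my own choices, not part of the json module.

```python
import json

def loads_nested(text, max_depth=5):
    """Call json.loads repeatedly while the result is still a JSON-looking str."""
    value = text
    for _ in range(max_depth):
        if not isinstance(value, str):
            break                      # reached a dict, list, number, ...
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break                      # plain text that is not JSON: keep it
    return value

# A double-encoded line like the ones in check-deps.txt (shortened sample).
sample_line = json.dumps(json.dumps({"version": "1.4.12", "name": "earmark_parser"})) + "\n"
record = loads_nested(sample_line)
print(record["version"])
```

One caveat: a string value that happens to look like JSON (for example "123") would also be decoded, so for data with a known, fixed nesting the explicit json.loads(json.loads(...)) is the safer choice.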

How to save and restore data that has some features are lists?

I have a tabular dataset like:
Feature 1 (String), Feature 2 (Int), Feature 3 (List of String)
Record 1:
Record 2:
...
I store it as .csv. However, I found it is not convenient to restore the data of Feature 3. An example of Feature 3 as restored for one record: ['(abcd, 0)', '(dwg, 1)', '(sdgwa, 7)']. It is a list of items. Each item has a string and an integer, written in parentheses.
I drop the [ and ] from this string and try to split the remainder into a list. However, the comma also appears within each item. What is the recommended practice for storing this kind of data with lists?
I always solve this problem with either JSON or pickle and dictionaries.
If you go with JSON:
import json
my_json = {"f1": "some string", "f2": 100, "f3": [100, 50, 10], "f4": ['hello', 'blah']}
with open("outfile.json", "w") as log:
    json.dump(my_json, log)
# to load it back:
with open("outfile.json", "r") as log:
    my_jsons = json.load(log)
You can also use pickle and dictionaries:
import pickle
my_dict = {"f1": "some string", "f2": 100, "f3": [100, 50, 10], "f4": ['hello', 'blah']}
with open("outfile.p", "wb") as f:
    pickle.dump(my_dict, f)
# to load:
with open("outfile.p", "rb") as f:
    my_dict = pickle.load(f)
#jakub #circuito is right: it is hard to store a list in CSV.
You can use JSON or pickle.
If you want to use CSV anyway, the list can be converted to a delimited string, such as:
abcd:0|dwg:1|sdgwa:7
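A middle ground, if the file has to stay CSV, is to JSON-encode only the list column and let the csv module handle the quoting. A minimal sketch follows; the column names and sample rows are made up, and io.StringIO stands in for a real file.

```python
import csv
import io
import json

# Hypothetical records: (Feature 1, Feature 2, Feature 3 as a list of pairs).
rows = [
    ("rec1", 10, [["abcd", 0], ["dwg", 1], ["sdgwa", 7]]),
    ("rec2", 20, []),
]

buffer = io.StringIO()  # stand-in for open("data.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(["feature1", "feature2", "feature3"])
for f1, f2, f3 in rows:
    # json.dumps turns the list into one string; csv quotes its commas for us.
    writer.writerow([f1, f2, json.dumps(f3)])

buffer.seek(0)
reader = csv.DictReader(buffer)
restored = [
    (r["feature1"], int(r["feature2"]), json.loads(r["feature3"]))
    for r in reader
]
print(restored)
```

Note that JSON has no tuple type, so each (string, int) pair comes back as a two-element list.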

how to replace the values of a dict in a txt file in python

I have a text file something.txt that holds data like:
sql_memory: 300
sql_hostname: server_name
sql_datadir: DEFAULT
I have a dict parameter={"sql_memory":"900", "sql_hostname":"1234" }
I need to write the values of the parameter dict into the txt file. If a key in the txt file has no matching key in the dict, its value in the txt file should be left as it is.
For example, sql_datadir is not in the parameter dict, so its value in the txt file does not change.
Here is what I have tried:
import json
def create_json_file():
    with open(something.txt_path, 'r') as meta_data:
        lines = meta_data.read().splitlines()
    lines_key_value = [line.split(':') for line in lines]
    final_dict = {}
    for lines in lines_key_value:
        final_dict[lines[0]] = lines[1]
    with open(json_file_path, 'w') as foo:
        json.dump(final_dict, foo, indent=4)
def generate_server_file(parameters):
    create_json_file()
    with open(json_file_path, 'r') as foo:
        server_json_data = json.load(foo)
    for keys in parameters:
        if keys not in server_json_data:
            raise KeyError("Cannot find keys")
    # Need to update the parameter in json file
    # and convert json file into txt again
x={"sql_memory":"900", "sql_hostname":"1234" }
generate_server_file(x)
Is there a way I can do this without converting the txt file into JSON?
Expected output file (something.txt):
sql_memory: 900
sql_hostname: 1234
sql_datadir: DEFAULT
Using Python 3.6
If you want to import data from a text file use numpy.genfromtxt.
My Code:
import numpy
data = numpy.genfromtxt("something.txt", dtype='str', delimiter=';')
print(data)
something.txt:
Name;Jeff
Age;12
My Output:
[['Name' 'Jeff']
 ['Age' '12']]
It's very useful and I use it all of the time.
If your full example is using Python dict literals, a way to do this would be to implement a serializer and a deserializer. Since yours closely follows object literal syntax, you could try using ast.literal_eval, which safely parses a literal from a string. Notice, it will not handle variable names.
import ast
def split_assignment(string):
    '''Split on a variable assignment, only splitting on the first =.'''
    return string.split('=', 1)
def deserialize_collection(string):
    '''Deserialize the collection to a key as a string, and a value as a dict.'''
    key, value = split_assignment(string)
    return key, ast.literal_eval(value)
def dict_doublequote(dictionary):
    '''Print dictionary using double quotes.'''
    pairs = [f'"{k}": "{v}"' for k, v in dictionary.items()]
    return f'{{{", ".join(pairs)}}}'
def serialize_collection(key, value):
    '''Serialize the collection to a string'''
    return f'{key}={dict_doublequote(value)}'
An example using the data above produces:
>>> data = 'parameter={"sql_memory":"900", "sql_hostname":"1234" }'
>>> key, value = deserialize_collection(data)
>>> key, value
('parameter', {'sql_memory': '900', 'sql_hostname': '1234'})
>>> serialize_collection(key, value)
'parameter={"sql_memory": "900", "sql_hostname": "1234"}'
Please note you'll probably want to use json.dumps rather than the hack I implemented to serialize the value, since it may incorrectly quote some complicated values. If single quotes are fine, a much more preferable solution would be:
def serialize_collection(key, value):
    '''Serialize the collection to a string'''
    return f'{key}={str(value)}'
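For the original question (updating key: value lines without going through JSON), a plain line-by-line rewrite may be enough. This is a sketch that assumes every line has the form "key: value" as in the sample; the function name update_config_text is invented for illustration.

```python
def update_config_text(text, parameters):
    """Return text with the value replaced on every 'key: value' line whose
    key is in parameters; all other lines are kept unchanged."""
    updated = []
    for line in text.splitlines():
        key, sep, _old_value = line.partition(":")
        if sep and key.strip() in parameters:
            line = f"{key}: {parameters[key.strip()]}"
        updated.append(line)
    return "\n".join(updated) + "\n"

original = "sql_memory: 300\nsql_hostname: server_name\nsql_datadir: DEFAULT\n"
parameters = {"sql_memory": "900", "sql_hostname": "1234"}

result = update_config_text(original, parameters)
print(result, end="")
```

To apply it to something.txt, read the file into a string, call the function, and write the result back. Using partition(":") rather than split(":") means a value that itself contains a colon is left intact on unmatched lines.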

Parse a json file and add the strings to a URL

How do I parse a JSON output, get only the list under data, and then append each string in that list to a URL, for example google.com/Confidential and so on for the other strings?
This is my JSON output, which I will call "text":
text = {"success":true,"code":200,"data":["Confidential","L1","Secret","Secret123","foobar","maret1","maret2","posted","rontest"],"errs":[],"debugs":[]}.
What I am looking to do is get only the list under data. So far my script gives me the entire JSON output.
json.loads(text)
print text
output = urllib.urlopen("http://google.com" % text)
print output.geturl()
print output.read()
jsonobj = json.loads(text)
print jsonobj['data']
Will print the list in the data section of your JSON.
If you want to open each as a link after google.com, you could try this:
def processlinks(text):
    # %s is the placeholder that receives each entry from the data list
    output = urllib.urlopen('http://google.com/%s' % text)
    print output.geturl()
    print output.read()
map(processlinks, jsonobj['data'])
info = json.loads(text)
json_text = json.dumps(info["data"])
Using json.dumps converts the python data structure gotten from json.loads back to regular json text.
So, you could then use json_text wherever you were using text before and it should only have the selected key, in your case: "data".
Perhaps something like this where result is your JSON data:
from itertools import product
base_domains = ['http://www.google.com', 'http://www.example.com']
result = {"success":True,"code":200,"data":["Confidential","L1","Secret","Secret123","foobar","maret1","maret2","posted","rontest"],"errs":[],"debugs":[]}
for path in product(base_domains, result['data']):
    print '/'.join(path) # do whatever
http://www.google.com/Confidential
http://www.google.com/L1
http://www.google.com/Secret
http://www.google.com/Secret123
http://www.google.com/foobar
http://www.google.com/maret1
http://www.google.com/maret2
http://www.google.com/posted
http://www.google.com/rontest
http://www.example.com/Confidential
http://www.example.com/L1
http://www.example.com/Secret
http://www.example.com/Secret123
http://www.example.com/foobar
http://www.example.com/maret1
http://www.example.com/maret2
http://www.example.com/posted
http://www.example.com/rontest
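The snippets in this thread use Python 2 syntax (print statement, urllib.urlopen). For reference, here is a Python 3 sketch of the same idea. It only builds the URLs and does not open them; the base URL and the shortened sample text are placeholders.

```python
import json
from urllib.parse import quote

# Shortened version of the JSON text from the question.
text = '{"success":true,"code":200,"data":["Confidential","L1","Secret"],"errs":[],"debugs":[]}'
base_url = "http://google.com/"  # placeholder base URL

names = json.loads(text)["data"]           # keep only the list under "data"
urls = [base_url + quote(name) for name in names]

for url in urls:
    print(url)
```

quote() percent-encodes characters such as spaces, so entries that are not plain ASCII words still produce valid URL paths. Each URL could then be passed to urllib.request.urlopen if you actually need to fetch it.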
