Extract info from JSON file, Python

I am trying to extract information from a JSON file that I dumped. I used the Mido module to get the information I need, and the only way I've found to get these features is by dumping them as JSON. But even after some searching, I haven't managed to extract the info and store it in a Python array (a NumPy array).
Below is the sample code. I am pretty sure that the double [[ at the beginning of the file is causing all this trouble.
import json
import mido
from mido import MidiFile as mdfile  # assumed: mdfile is mido.MidiFile imported under another name
def midifile_to_dict(mid):
    tracks = []
    for track in mid.tracks:
        # Copy each message's attributes into a plain dict
        tracks.append([vars(msg).copy() for msg in track])
    return {
        'ticks_per_beat': mid.ticks_per_beat,
        'tracks': tracks,
    }
if __name__ == '__main__':
    filename = 'G:\\ΣΧΟΛΗ p12127\\Πτυχιακη\\MAPS Dataset\\MAPS_AkPnBcht_1\\AkPnBcht\\ISOL\\CH\\MAPS_ISOL_CH0.1_F_AkPnBcht.mid'
    midi_file = mdfile(filename)
    for i, track in enumerate(midi_file.tracks):
        print('Track: ', i, '\nNumber of messages: ', len(track))
        for message in track:
            print(message)
    mid = mido.MidiFile('G:\\ΣΧΟΛΗ p12127\\Πτυχιακη\\MAPSDataset\\MAPS_AkPnBcht_1\\AkPnBcht\\ISOL\\CH\\MAPS_ISOL_CH0.1_F_AkPnBcht.mid')
    dataj = json.dumps(midifile_to_dict(mid), indent=2)
    data = json.loads(dataj)
Output:
{
"ticks_per_beat": 32767,
"tracks": [[
{
"type": "set_tempo",
"tempo": 439440,
"time": 0
},
{
"type": "end_of_track",
"time": 0
}
]]
}
So, to sum up: how can I extract this info and get inside the [[ ]]? And is JSON even a good approach here?
[Windows 7, Python 3.6+]
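The `[[` is not a problem in itself. `"tracks"` holds a list of tracks, and each track is a list of message dicts, so the JSON simply opens a list inside a list. Here is a minimal sketch, assuming the loaded data has exactly the shape printed above (the JSON text is copied from that output). It shows how to step into the nested lists with ordinary indexing and loops:

```python
import json

# JSON text copied from the output shown above.
dataj = """
{
  "ticks_per_beat": 32767,
  "tracks": [[
    {"type": "set_tempo", "tempo": 439440, "time": 0},
    {"type": "end_of_track", "time": 0}
  ]]
}
"""

data = json.loads(dataj)

# data["tracks"] is a list of tracks; each track is a list of message dicts.
first_track = data["tracks"][0]
first_message = first_track[0]
print(first_message["type"], first_message["tempo"])  # set_tempo 439440

# Walk every message of every track. Only some message types carry a
# "tempo" key, so .get() returns None for the others instead of raising.
rows = []
for track_index, track in enumerate(data["tracks"]):
    for msg in track:
        rows.append((track_index, msg["type"], msg["time"], msg.get("tempo")))

for row in rows:
    print(row)
```

If the numbers are needed as a NumPy array, a plain list such as `[row[2] for row in rows]` can be passed to `numpy.array`. Since `midifile_to_dict` already returns a plain dict, the `json.dumps`/`json.loads` round trip is not required for any of this. The same indexing works on its return value directly.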

Related

Inconsistent error: json.decoder.JSONDecodeError: Extra data: line 30 column 2 (char 590)

I have .json documents generated by the same code, in which multiple nested dicts are dumped to JSON. When loading them with json.load(opened_json), I get an error like json.decoder.JSONDecodeError: Extra data: line 30 column 2 (char 590) for some of the files but not for others, and I don't understand why. What is the proper way to dump multiple (possibly nested) dicts into JSON documents, and in my current case, how can I read them all? (Note: the dicts can span multiple lines, so splitting on lines probably does not work.)
For example, say I call json.dump(data, file) with data = {'meta_data':{some_data}, 'real_data':{more_data}}.
Let us take these two fake files:
{
"meta_data": {
"id": 0,
"start": 1238397024.0,
"end": 1238397056.0,
"best": []
},
"real_data": {
"YAS": {
"t1": [
1238397047.2182617
],
"v1": [
5.0438767766574255
],
"v2": [
4.371670270544587
]
}
}
}
and
{
"meta_data": {
"id": 0,
"start": 1238397056.0,
"end": 1238397088.0,
"best": []
},
"real_data": {
"XAS": {
"t1": [
1238397047.2182617
],
"v1": [
5.0438767766574255
],
"v2": [
4.371670270544587
]
}
}
}
Then try to load them with json.load(open(file_path)) to reproduce the problem.
You chose not to offer a reprex. Here is the code I'm running, which is intended to represent what you're running. If there is some discrepancy, update the original question to clarify the details.
import json
from io import StringIO
some_data = dict(a=1)
more_data = dict(b=2)
data = {"meta_data": some_data, "real_data": more_data}
file = StringIO()
json.dump(data, file)
file.seek(0)
d = json.load(file)
print(json.dumps(d, indent=4))
Output:
{
"meta_data": {
"a": 1
},
"real_data": {
"b": 2
}
}
As you can see, under the circumstances you have described, the JSON library does exactly what we would expect of it.
EDIT
Your screenshot makes it pretty clear that a bunch of ASCII NUL characters are appended to the first file. We can easily reproduce that JSONDecodeError: Extra data symptom by adding a single line:
json.dump(data, file)
file.write(chr(0))
(Or perhaps chr(0) * 80 more closely matches the truncated screenshot.)
If your file ends with extraneous characters, such as NUL, then it is no longer valid JSON, and compliant parsers will report a diagnostic message when they attempt to read it. There's nothing special about NUL: a simple file.write("X") suffices to produce the same diagnostic. You will need to trim those NULs from the end of the file before attempting to parse it.
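The trimming step can be sketched with an in-memory file standing in for the real one. The sketch first reproduces the error with trailing NULs, then strips them with `str.rstrip` before parsing. For a file on disk, the text would instead come from something like `open("data.json", encoding="utf-8").read()`, where the file name is a placeholder.

```python
import json
from io import StringIO

data = {"meta_data": {"a": 1}, "real_data": {"b": 2}}

# Reproduce the problem: valid JSON followed by trailing NUL characters.
file = StringIO()
json.dump(data, file)
file.write(chr(0) * 80)
text = file.getvalue()

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("Fails as expected:", exc.msg)  # Extra data

# Trim the NULs from the end before parsing.
cleaned = text.rstrip("\x00")
d = json.loads(cleaned)
print(d)
```

Only NUL characters are stripped here. If the real files also have other junk after the closing brace, that junk would need to be removed as well.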
For best results, use UTF-8 encoding with no BOM. Your editor should have a setting for switching to UTF-8. Use $ file foo.json to verify encoding details, and $ iconv --to-code=UTF-8 < foo.json to convert a file with an unfortunate encoding.
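On the Python side, `json.loads` rejects a string that still starts with a UTF-8 BOM. Decoding with the `utf-8-sig` codec drops a leading BOM if there is one and otherwise behaves like plain UTF-8. Here is a small sketch that uses in-memory bytes instead of a real file (`foo.json` in the comment is a placeholder):

```python
import json

# Bytes as they might sit on disk: a UTF-8 BOM followed by a JSON object.
raw = "\ufeff{\"id\": 0}".encode("utf-8")

# Plain utf-8 decoding keeps the BOM as "\ufeff", which json.loads rejects.
try:
    json.loads(raw.decode("utf-8"))
    bom_rejected = False
except json.JSONDecodeError:
    bom_rejected = True

# utf-8-sig drops a leading BOM (and is harmless when there is none).
data = json.loads(raw.decode("utf-8-sig"))
print(bom_rejected, data)  # True {'id': 0}

# For a real file the equivalent is:
# with open("foo.json", encoding="utf-8-sig") as f:
#     data = json.load(f)
```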
You need to read the file first. Either of these works (the with form also closes the file afterwards):
data = json.loads(open("data.json").read())
or
with open("data.json", "r") as file:
    data = json.load(file)
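The question also asks how to store several (possibly nested) dicts in one file and read them all back. Two common options are sketched below, with in-memory data standing in for real files:

- Write one compact `json.dumps` per line (the JSON Lines convention). Without `indent`, `json.dumps` never emits raw newlines, so even nested dicts fit on a single line.
- For a file that already contains pretty-printed documents back to back, walk through it with `json.JSONDecoder.raw_decode`, which parses one value and returns the index where it stopped.

The helper name `load_concatenated` is made up for this sketch.

```python
import json
from io import StringIO

records = [
    {"meta_data": {"id": 0, "best": []}, "real_data": {"YAS": {"v1": [5.04]}}},
    {"meta_data": {"id": 0, "best": []}, "real_data": {"XAS": {"v1": [5.04]}}},
]

# Option 1: JSON Lines, one compact document per line.
jsonl = StringIO()
for rec in records:
    jsonl.write(json.dumps(rec) + "\n")
jsonl.seek(0)
loaded_lines = [json.loads(line) for line in jsonl if line.strip()]


def load_concatenated(text):
    """Parse JSON documents written back to back, e.g. by repeated json.dump calls."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        docs.append(obj)
    return docs


# Option 2: documents dumped one after another, each spanning several lines.
concatenated = "\n".join(json.dumps(rec, indent=4) for rec in records)
loaded_concat = load_concatenated(concatenated)

print(loaded_lines == records, loaded_concat == records)  # True True
```

JSON Lines is usually the easier format to produce going forward. The `raw_decode` loop is mainly useful for files that have already been written as concatenated documents.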

How to parse multiple JSON arrays with the same name in Python

I want to parse a JSON file in which all the objects in the array have the same key name, like the following:
[
{
"envelope": {
"source":"user1",
"data":{
"message": "Hello World 0"
}
}
},
{
"envelope": {
"source":"user1",
"data":{
"message": "Hello World 1"
}
}
},
...
]
And this is my code so far:
def check_messages():
    data = requests.get(urlWhereIGetJson)
    for d in data:
        message = d['envelope']['data']['message']
        print(message)
My code only gives me back the first entry ("Hello World 0"), but I need every message.
I hope somebody can show me how to parse the JSON correctly.
I am not able to edit the JSON file.
Sorry for my English.
Here is what you need:
import json
import requests
response = requests.get(urlWhereIGetJson)  # urlWhereIGetJson: the URL from the question
if response.status_code == 200:
    data = json.loads(response.text)
    for record in data:
        print(record['envelope']['data']['message'])
For more information, see https://www.w3schools.com/python/ref_requests_response.asp
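The original loop misbehaves because iterating over a `requests` `Response` object yields raw byte chunks of the body, not parsed JSON records. The parsing itself can be checked offline. The sketch below uses the first two entries of the sample from the question as a string in place of the HTTP response. With `requests`, `response.json()` returns the same parsed list. The function name `extract_messages` is hypothetical.

```python
import json

# Sample payload from the question (first two entries), standing in for the HTTP body.
body = """
[
  {"envelope": {"source": "user1", "data": {"message": "Hello World 0"}}},
  {"envelope": {"source": "user1", "data": {"message": "Hello World 1"}}}
]
"""


def extract_messages(records):
    """Return the message text of every record in the parsed JSON array."""
    return [record["envelope"]["data"]["message"] for record in records]


records = json.loads(body)  # with requests: records = response.json()
for message in extract_messages(records):
    print(message)
```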
Saeed already covered how to do this, but if you would like the result in a nice pandas DataFrame, you could do something like this:
import json
import pandas as pd
with open('file.json') as f:
    data = json.load(f)  # json.load takes a file object, not a path
df = pd.json_normalize(data)  # data is already a list of dicts
That DataFrame automatically unnests inner JSON objects and represents them with column names like envelope.source and envelope.data.message.
Edit: If you want to read the JSON from requests instead, just use data = response.json(), which already returns the parsed list.

When I remove an object from an array in json with python, it ruins my json file. How do I fix this?

When I remove an object from an array in a JSON file with Python, it ruins the file. For example, here is my Python code:
srs = server["purchaseableRoles"]
if srs == []:
    await ctx.send("Your server has no purchaseable roles at the moment, you cant remove one.")
else:
    for i in srs:
        splitsrs = i.split("|")
        if int(splitsrs[0]) == role.id:
            print(str(i))
            srs.remove(str(i))
            print(srs)
            f.seek(0)
            json.dump(serversjson,f,sort_keys=True,indent=4)
            await ctx.send("removed the role from the role shop!")
(Please ignore the discord.py stuff; it's in a Discord bot. Also ignore the weird indentation; I don't know why it pasted like that.)
That code removes an object from an array in a JSON file and writes the file back. But when I do that, here is my JSON:
{
"servers": [
{
"id": 1234,
"purchaseableRoles": [
"1234|5678"
]
},
{
"id": 813070255127396352,
"purchaseableRoles": []
}
]
}071595911774238|5"
]
}
]
}
It just gets all messed up. Here was my JSON before the mess-up:
{
"servers": [
{
"id": 1234,
"purchaseableRoles": [
"1234|5678"
]
},
{
"id": 813070255127396352,
"purchaseableRoles": [
"813071595911774238|5"
]
}
]
}
Is there a way to prevent this? Thanks in advance if you help.
You are moving the file cursor to the beginning of the file and overwriting the bytes in the file with the bytes of the new JSON string.
But you removed some content, so the new string is shorter than the one already in the file.
So at the end of the file, some of the old string remains.
File objects have a truncate() method, which resizes the file to the current position when called without arguments (documented under io.IOBase.truncate). Something like this, before calling json.dump:
f.seek(0)
f.truncate()
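Here is a minimal read-modify-write sketch that puts this together. It assumes the file layout shown in the question and leaves out the Discord parts. The function name `remove_role` and the temporary file are hypothetical stand-ins for the bot's real file. The sketch builds a new list instead of calling `remove()` inside the loop, because removing items from a list while iterating over it can skip elements. It truncates after the dump, at the new end of the data, which has the same effect as truncating right after `seek(0)`.

```python
import json
import os
import tempfile


def remove_role(path, server_id, role_id):
    """Remove entries whose id part matches role_id from one server, then rewrite the file."""
    with open(path, "r+", encoding="utf-8") as f:
        serversjson = json.load(f)
        for server in serversjson["servers"]:
            if server["id"] == server_id:
                # Build a new list rather than calling remove() while iterating.
                server["purchaseableRoles"] = [
                    entry for entry in server["purchaseableRoles"]
                    if int(entry.split("|")[0]) != role_id
                ]
        f.seek(0)  # back to the start of the file
        json.dump(serversjson, f, sort_keys=True, indent=4)
        f.truncate()  # drop leftover bytes of the old, longer text


# Demo on a temporary file holding the JSON from the question (before the mess-up).
before = {"servers": [
    {"id": 1234, "purchaseableRoles": ["1234|5678"]},
    {"id": 813070255127396352, "purchaseableRoles": ["813071595911774238|5"]},
]}
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump(before, f, sort_keys=True, indent=4)

remove_role(path, 813070255127396352, 813071595911774238)

with open(path, encoding="utf-8") as f:
    after = json.load(f)  # parses cleanly: no stale tail left behind
os.remove(path)
print(after)
```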

Get array from file which also includes strings

I'm writing an automation script in Python that makes use of another library. The output I'm given contains the array I need; however, it also includes irrelevant log messages in string format.
For my script to work, I need to retrieve only the array which is in the file.
Here's an example of the output I'm getting.
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
{
"action": {
"type": "block"
},
"trigger": {
"url-filter": "/adservice\\.",
"unless-domain": [
"adservice.io"
]
}
}
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
How would I get only the array from this file?
FWIW, I'd rather not have the logic based on the strings outside of the array, as they could be subject to change.
UPDATE: Script I'm getting the data from is here: https://github.com/brave/ab2cb/tree/master/ab2cb
My full code is here:
def pipe_in(process, filter_lists):
    try:
        for body, _, _ in filter_lists:
            process.stdin.write(body)
    finally:
        process.stdin.close()
def write_block_lists(filter_lists, path, expires):
    block_list = generate_metadata(filter_lists, expires)
    process = subprocess.Popen(('ab2cb'),
                               cwd=ab2cb_dirpath,
                               stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    threading.Thread(target=pipe_in, args=(process, filter_lists)).start()
    result = process.stdout.read()
    with open('output.json', 'w') as destination_file:
        destination_file.write(result)
        destination_file.close()
    if process.wait():
        raise Exception('ab2cb returned %s' % process.returncode)
Ideally, I would modify the output in memory, straight from stdout, and write it to the file later, because I still need to modify the data inside that array.
You can also use a regex:
import re
input = """
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
{
"action": {
"type": "block"
},
"trigger": {
"url-filter": "/adservice\\.",
"unless-domain": [
"adservice.io"
]
}
}
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
asd
asd
"""
regex = re.compile(r"\[(.|\n)*(?:^\]$)", re.M)
x = re.search(regex, input)
print(x.group(0))
EDIT
re.M turns on multi-line mode, so ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
https://repl.it/repls/InfantileDopeyLink
I have written a library for this purpose. It's not often that I get to plug it!
from jsonfinder import jsonfinder
logs = r"""
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
{
"action": {
"type": "block"
},
"trigger": {
"url-filter": "/adservice\\.",
"unless-domain": [
"adservice.io"
]
}
}
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
Something else that looks like JSON: [1, 2]
"""
for start, end, obj in jsonfinder(logs):
    if (
        obj
        and isinstance(obj, list)
        and isinstance(obj[0], dict)
        and {"action", "trigger"} <= obj[0].keys()
    ):
        print(obj)
Demo: https://repl.it/repls/ImperfectJuniorBootstrapping
Library: https://github.com/alexmojaki/jsonfinder
Install with pip install jsonfinder.
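A standard-library-only variant of the same idea needs neither a regex nor an extra package. It tries `json.JSONDecoder.raw_decode` at each `[` in the text and keeps the first JSON array whose items look like rules. Two things are assumptions: the check for `"action"` and `"trigger"` keys is based on the sample output, not on anything ab2cb documents, and the function name `find_rules_array` is hypothetical. Because it works on a string, it could run on the captured stdout before anything is written to disk.

```python
import json


def find_rules_array(text):
    """Return the first JSON array in text whose items are dicts with action/trigger keys."""
    decoder = json.JSONDecoder()
    pos = text.find("[")
    while pos != -1:
        try:
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None  # this "[" does not start valid JSON; keep scanning
        if (isinstance(obj, list) and obj
                and all(isinstance(item, dict) and "action" in item and "trigger" in item
                        for item in obj)):
            return obj
        pos = text.find("[", pos + 1)
    return None


logs = r"""
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
  {
    "action": {"type": "block"},
    "trigger": {"url-filter": "/adservice\\.", "unless-domain": ["adservice.io"]}
  }
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
"""

rules = find_rules_array(logs)
print(rules)
```

In `write_block_lists`, the same function could be applied to `result.decode("utf-8")`, assuming ab2cb writes UTF-8. The returned list could then be modified and saved with `json.dump`.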

List indices in JSON in Python

I've got a JSON file that I pulled from a web service, and I'm trying to parse it. I see that this question has been asked many times, and I've read whatever I could find, but the JSON data in each example is very simple. Likewise, the JSON example data in the Python docs is very simple and does not reflect what I'm working with. Here is what the JSON looks like:
{"RecordResponse": {
"Id": blah
"Status": {
"state": "complete",
"datetime": "2016-01-01 01:00"
},
"Results": {
"resultNumber": "500",
"Summary": [
{
"Type": "blah",
"Size": "10000000000",
"OtherStuff": {
"valueOne": "first",
"valueTwo": "second"
},
"fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'
with open(jsonFile, 'w') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["Summary"]:
        print(i["fieldIWant"])
Not only am I not getting to the field I want, but I'm also getting a KeyError when I try to access "Summary".
I don't understand how indexing works within the array. Once I get into the "Summary" field, do I have to index manually to get the value I need?
The example you posted is not valid JSON (there are no commas after the object fields), so it's hard to dig in much. If it's straight from the web service, something is wrong. If you fix it with proper commas, note that the "Summary" key is inside the "Results" object, which is itself inside "RecordResponse", so you'd need to change your loop. Also open the file in read mode: 'w' truncates the file and does not allow reading, so json.load cannot work on it.
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
for i in json_obj["RecordResponse"]["Results"]["Summary"]:
    print(i["fieldIWant"])
If you don't know the structure at all, you could search the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                # Pass the names down so custom keyname/fieldname survive recursion
                sub = findfieldsiwant(val, keyname, fieldname)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
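Here is a minimal sketch of the direct path. It uses a hypothetical, comma-corrected version of the question's sample; the `"Id"` value and the second `Summary` entry are made up for illustration. `"Summary"` is a JSON array, so it becomes a Python list. Loop over it to visit every entry, or index it with `[0]` when only one entry is wanted.

```python
import json

# Hypothetical, corrected version of the sample (commas added, invented values).
raw = """
{"RecordResponse": {
    "Id": "blah",
    "Status": {"state": "complete", "datetime": "2016-01-01 01:00"},
    "Results": {
        "resultNumber": "500",
        "Summary": [
            {"Type": "blah", "Size": "10000000000",
             "OtherStuff": {"valueOne": "first", "valueTwo": "second"},
             "fieldIWant": "value i want is here"},
            {"Type": "blah2", "fieldIWant": "second value"}
        ]
    }
}}
"""

# With a real file, open it for reading: open(jsonFile, "r"), not "w".
json_obj = json.loads(raw)
summary = json_obj["RecordResponse"]["Results"]["Summary"]  # a Python list

for entry in summary:  # every entry
    print(entry["fieldIWant"])

print(summary[0]["fieldIWant"])  # only the first entry
```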
