Python, How to denormalise CSV file to Nested Json

Python, How to denormalise CSV file to Nested Json - python

I currently have a CSV file with the header:
productCode | code | dataFields.0.category | dataFields.0.name | dataFields.0.code | ... with dataFields[n] up to 9.
When I convert the code to Json i get:
{
"Example": [
{
"exCode": "example_code",
"name": "ex",
"code": "ex_2",
"dataFields.0.category": "EXAMPLE",
"dataFields.0.name": "exampl",
"dataFields.0.code": "exampl",
"dataFields.0.unit": "v",
"dataFields.1.category": "EXAMPLE",
"dataFields.1.name": "exampl2",
"dataFields.1.code": "exampl2",
"dataFields.1.unit": "e",
"dataFields.2.category": "EXAMPLE2",
"dataFields.2.name": null,
"dataFields.2.code": null,
"dataFields.2.unit": "e",
}]
}
However, I'm trying to convert the CSV to look like:
{
"Example": [
{
"exCode": "example_code",
"name": "exampl",
"code": "exampl",
"dataFields": [{
"category": "EXAMPLE",
"name": "exampl",
"code": "exampl",
"unit": "v"
},
{
"category": "EXAMPLE",
"name": "exampl2",
"code": "exampl2",
"unit": "e"
}
}]
}
I have been writing this project on Python without using recursion to look at least 2 levels of nesting deep into a nested json array to
remove the normalised fields and add in denormalised (nested) fields. However, my main problem is that "dataFields" will get overwritten instead of adding multiple "dataFields" elements.
This is what I have so far:
def denormalize_json(json_list):
for children in json_list:
for inner_children in json_list[children]:
# print("Arr: ", inner_children)
for innest_child in inner_children:
split_norm = innest_child.split('.')
if len(split_norm) == 2: # If it is only a singular nested field
# Add correct field
changes_arr.append([children, inner_children, split_norm[0], split_norm[1], innest_child])
elif len(split_norm) == 3: # If it is normalized with more than one field
changes_arr.append([children, inner_children, split_norm[0], split_norm[2], innest_child])
print(changes_arr)
# Make changes to json_list
for i in range(len(changes_arr)):
changes = changes_arr[i]
# change inner children to correct fields
# add correct fields
try:
inner_children = changes[1]
inner_children[changes[2]].append(changes[3].append(inner_children[changes[4]]))
del(inner_children[changes[4]])
json_list[changes[0]] = inner_children
except Exception as e:
print(e)

Related

Convert multiple string stored in a variable into a single list in python

I hope everyone is doing well.
I need a little help where I need to get all the strings from a variable and need to store into a single list in python.
For example -
I have json file from where I am getting ids and all the ids are getting stored into a variable called id as below when I run print(id)
17298626-991c-e490-bae6-47079c6e2202
17298496-19bd-2f89-7b5f-881921abc632
17298698-3e17-7a9b-b337-aacfd9483b1b
172986ac-d91d-c4ea-2e50-d53700480dd0
172986d0-18aa-6f51-9c62-6cb087ad31e5
172986f4-80f0-5c21-3aee-12f22a5f4322
17298712-a4ac-7b36-08e9-8512fa8322dd
17298747-8cc6-d9d0-8d05-50adf228c029
1729875c-050f-9a99-4850-bb0e6ad35fb0
1729875f-0d50-dc94-5515-b4891c40d81c
17298761-c26b-3ce5-e77e-db412c38a5b4
172987c8-2b5d-0d94-c365-e8407b0a8860
1729881a-e583-2b54-3a52-d092020d9c1d
1729881c-64a2-67cf-d561-6e5e38ed14cb
172987ec-7a20-7eb6-3ebe-a9fb621bb566
17298813-7ac4-258b-d6f9-aaf43f9147b1
17298813-f1ef-d28a-0817-5f3b86c3cf23
17298828-b62b-9ee6-248b-521b0663226e
17298825-7449-2fcb-378e-13671cb4688a
I want these all values to be stored into a single list.
Can some please help me out with this.
Below is the code I am using:
import json
with open('requests.json') as f:
data = json.load(f)
print(type(data))
for i in data:
if 'traceId' in i:
id = i['traceId']
newid = id.split()
#print(type(newid))
print(newid)
And below is my json file looks like:
[
{
"id": "376287298-hjd8-jfjb-khkf-6479280283e9",
"submittedTime": 1591692502558,
"traceId": "17298626-991c-e490-bae6-47079c6e2202",
"userName": "ABC",
"onlyChanged": true,
"description": "Not Required",
"startTime": 1591694487929,
"result": "NONE",
"state": "EXECUTING",
"paused": false,
"application": {
"id": "16b22a09-a840-f4d9-f42a-64fd73fece57",
"name": "XYZ"
},
"applicationProcess": {
"id": "dihihdosfj9279278yrie8ue",
"name": "Deploy",
"version": 12
},
"environment": {
"id": "fkjdshkjdshglkjdshgldshldsh03r937837",
"name": "DEV"
},
"snapshot": {
"id": "djnglkfdglki98478yhgjh48yr844h",
"name": "DEV_snapshot"
},
},
{
"id": "17298495-f060-3e9d-7097-1f86d5160789",
"submittedTime": 1591692844597,
"traceId": "17298496-19bd-2f89-7b5f-881921abc632",
"userName": "UYT,
"onlyChanged": true,
"startTime": 1591692845543,
"result": "NONE",
"state": "EXECUTING",
"paused": false,
"application": {
"id": "osfodsho883793hgjbv98r3098w",
"name": "QA"
},
"applicationProcess": {
"id": "owjfoew028r2uoieroiehojehfoef",
"name": "EDC",
"version": 5
},
"environment": {
"id": "16cf69c5-4194-e557-707d-0663afdbceba",
"name": "DTESTU"
},
}
]
From where I am trying to get the traceId.

you could use simple split method like the follwing:
ids = '''17298626-991c-e490-bae6-47079c6e2202 17298496-19bd-2f89-7b5f-881921abc632 17298698-3e17-7a9b-b337-aacfd9483b1b 172986ac-d91d-c4ea-2e50-d53700480dd0 172986d0-18aa-6f51-9c62-6cb087ad31e5 172986f4-80f0-5c21-3aee-12f22a5f4322 17298712-a4ac-7b36-08e9-8512fa8322dd 17298747-8cc6-d9d0-8d05-50adf228c029 1729875c-050f-9a99-4850-bb0e6ad35fb0 1729875f-0d50-dc94-5515-b4891c40d81c 17298761-c26b-3ce5-e77e-db412c38a5b4 172987c8-2b5d-0d94-c365-e8407b0a8860 1729881a-e583-2b54-3a52-d092020d9c1d 1729881c-64a2-67cf-d561-6e5e38ed14cb 172987ec-7a20-7eb6-3ebe-a9fb621bb566 17298813-7ac4-258b-d6f9-aaf43f9147b1 17298813-f1ef-d28a-0817-5f3b86c3cf23 17298828-b62b-9ee6-248b-521b0663226e 17298825-7449-2fcb-378e-13671cb4688a'''
l = ids.split(" ")
print(l)
This will give the following result, I assumed that the separator needed is simple space you can adjust properly:
['17298626-991c-e490-bae6-47079c6e2202', '17298496-19bd-2f89-7b5f-881921abc632', '17298698-3e17-7a9b-b337-aacfd9483b1b', '172986ac-d91d-c4ea-2e50-d53700480dd0', '172986d0-18aa-6f51-9c62-6cb087ad31e5', '172986f4-80f0-5c21-3aee-12f22a5f4322', '17298712-a4ac-7b36-08e9-8512fa8322dd', '17298747-8cc6-d9d0-8d05-50adf228c029', '1729875c-050f-9a99-4850-bb0e6ad35fb0', '1729875f-0d50-dc94-5515-b4891c40d81c', '17298761-c26b-3ce5-e77e-db412c38a5b4', '172987c8-2b5d-0d94-c365-e8407b0a8860', '1729881a-e583-2b54-3a52-d092020d9c1d', '1729881c-64a2-67cf-d561-6e5e38ed14cb', '172987ec-7a20-7eb6-3ebe-a9fb621bb566', '17298813-7ac4-258b-d6f9-aaf43f9147b1', '17298813-f1ef-d28a-0817-5f3b86c3cf23', '17298828-b62b-9ee6-248b-521b0663226e', '17298825-7449-2fcb-378e-13671cb4688a']
Edit
You get list of lists because each iteration you read only 1 id, so what you need to do is to initiate an empty list and append each id to it in the following way:
l = []
for i in data
if 'traceId' in i:
id = i['traceId']
l.append(id)

you can append the ids variable to the list such as,
#list declaration
l1=[]
#this must be in your loop
l1.append(ids)

I'm assuming you get the id as a str type value. Using id.split() will return a list of all ids in one single Python list, as each id is separated by space here in your example.
id = """17298626-991c-e490-bae6-47079c6e2202 17298496-19bd-2f89-7b5f-881921abc632
17298698-3e17-7a9b-b337-aacfd9483b1b 172986ac-d91d-c4ea-2e50-d53700480dd0
172986d0-18aa-6f51-9c62-6cb087ad31e5 172986f4-80f0-5c21-3aee-12f22a5f4322
17298712-a4ac-7b36-08e9-8512fa8322dd 17298747-8cc6-d9d0-8d05-50adf228c029
1729875c-050f-9a99-4850-bb0e6ad35fb0 1729875f-0d50-dc94-5515-b4891c40d81c
17298761-c26b-3ce5-e77e-db412c38a5b4 172987c8-2b5d-0d94-c365-e8407b0a8860
1729881a-e583-2b54-3a52-d092020d9c1d 1729881c-64a2-67cf-d561-6e5e38ed14cb
172987ec-7a20-7eb6-3ebe-a9fb621bb566 17298813-7ac4-258b-d6f9-aaf43f9147b1
17298813-f1ef-d28a-0817-5f3b86c3cf23 17298828-b62b-9ee6-248b-521b0663226e
17298825-7449-2fcb-378e-13671cb4688a"""
id_list = id.split()
print(id_list)
Output:
['17298626-991c-e490-bae6-47079c6e2202', '17298496-19bd-2f89-7b5f-881921abc632',
'17298698-3e17-7a9b-b337-aacfd9483b1b', '172986ac-d91d-c4ea-2e50-d53700480dd0',
'172986d0-18aa-6f51-9c62-6cb087ad31e5', '172986f4-80f0-5c21-3aee-12f22a5f4322',
'17298712-a4ac-7b36-08e9-8512fa8322dd', '17298747-8cc6-d9d0-8d05-50adf228c029',
'1729875c-050f-9a99-4850-bb0e6ad35fb0', '1729875f-0d50-dc94-5515-b4891c40d81c',
'17298761-c26b-3ce5-e77e-db412c38a5b4', '172987c8-2b5d-0d94-c365-e8407b0a8860',
'1729881a-e583-2b54-3a52-d092020d9c1d', '1729881c-64a2-67cf-d561-6e5e38ed14cb',
'172987ec-7a20-7eb6-3ebe-a9fb621bb566', '17298813-7ac4-258b-d6f9-aaf43f9147b1',
'17298813-f1ef-d28a-0817-5f3b86c3cf23', '17298828-b62b-9ee6-248b-521b0663226e',
'17298825-7449-2fcb-378e-13671cb4688a']
split() splits by default with space as a separator. You can use the sep argument to use any other separator if needed.

Python: Iterate JSON and remove items with specific criteria

I am trying to filter out data from API JSON response with Python and I get weird results. I would be glad if somebody can guide me how to deal with the situation.
The main idea is to remove irrelevant data in the JSON and keep only the data that is associated with particular people which I hold in a list.
Here is a snip of the JSON file:
{
"result": [
{
"number": "Number1",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "John Doe",
"link": "https://some_link.com"
}
},
{
"number": "Number2",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-10 11:07:13",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Tyrell Greenley",
"link": "https://some_link.com"
}
},
{
"number": "Number3",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-20 10:23:35",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Delmar Vachon",
"link": "https://some_link.com"
}
},
{
"number": "Number4",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Samual Isham",
"link": "https://some_link.com"
}
}
]
}
Here is the Python code:
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
# Load JSON file
with open('extract.json', 'r') as input_file:
input_data = json.load(input_file)
# Create a function to clear the data
def clear_data(data, users):
"""Filter out the data and leave only records for the names in the users_test list"""
for elem in data:
print(elem['assigned_to']['display_value'] not in users)
if elem['assigned_to']['display_value'] not in users:
print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
data.remove(elem)
else:
print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
return data
cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
It seems that the problem is more or less related to the .remove() method however I don't find any other suitable solution to delete these particular items that I do not need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.

If you don't need to log info about people you are removing you could simply try
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]

users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
print(user)
for value in data['result']:
if user == value['assigned_to']['display_value']:
solution.append(value)
print(solution)
for more efficient code, as asked by #NomadMonad
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))

You are modifying a dictionary while at the same time iterating through it. Check out this blog post which describes this behavior.
A safer way to do this is to make a copy of your dictionary to iterate over, and to delete from your original dictionary:
import copy
def clear_data(data, users):
"""Filter out the data and leave only records for the names in the users_test list"""
for elem in copy.deepcopy(data): # deepcopy handles nested dicts
# Still call data.remove() in here

Error in producing the output for the json for a specific id value

I have the following json tree.
json_tree ={
"Garden": {
"Seaside": {
"#loc": "127.0.0.1",
"#myID": "1.3.1",
"Shoreside": {
"#myID": "3",
"InfoList": {
"Notes": {
"#code": "0",
"#myID": "1"
},
"Count": {
"#myID": "2",
"#val": "0"
}
},
"state": "0",
"Tid": "3",
"Lakesshore": {
"#myID": "4",
"InfoList": {
"Notes": {
"#code": "0",
"#oid": "1"
},
"Count": {
"#myID": "2",
"#val": "0"
}
},
"state": "0",
"Tid": "4"
}
},
"state": "0",
"Tid": "2"
},
"Tid": "1",
"state": "0"
}
}
I have a method which takes in the "Tid" value and returns the output in the following format.
This is where the issue lies. I do not understand why for the value of Tid = 2, I get "ERROR" stating that the InfoList not exists. For other Tid values, it works well. Can someone help me to resolve this issue?
There is NO InfoList at "Tid:"2 but I am not sure on how to update my logic to handle this.
def get_output (d, id):
if isinstance(d, dict) and d.get('id') == id:
yield {"Tid": d['Tid'], "Notes": d['InfoList']['Notes']['#code'], "status": d['state']}
for i in getattr(d, "values", lambda: [])():
yield from get_based_on_id(i, id)
# The id is from 2 and higher
key_list = list(get_output (json_tree, id))
# To create the json result
jsonify(key_list if not key_list else key_list[0])
For "Tid" values of 2 and higher the get_output method creates this output:
{
"Tid": "3",
"Notes": "2000",
"state": "2"
}
This part shown below works well. The issue is ONLY with the code shown above.
def get_output_id_1 (d, id):
if isinstance(d, dict) and d.get('id') == id:
yield {"id": d['Tid'], "state": d['state']}
for i in getattr(d, "values", lambda: [])():
yield from get_root_id(i, id)
For "Tid" value of 1 and higher the get_output_id_1 method creates this output:
{
"Tid": "1",
"state": "1",
}
Any help is appreciated.

The problem is you are using direct access to leverage a key that may or may not be in the dictionary. To get around this, use the dict.get method, which will return None or some default value that you specify in case the key isn't present:
small_example = {
'Tid': '2',
'status': 'some status'
}
# there is no InfoList key here, so to get around that, I can use something like:
info_list = small_example.get('InfoList')
repr(info_list)
None
Now, you can specify a default return value for get if you need to chain things together, like with a nested dictionary call:
{
'Tid': small_example['Tid'],
'Notes': small_example.get('InfoList', {}).get('Notes', {}).get('#code'),
'status': small_example.get('state')
}
See how on the first two calls, I return an empty dictionary in case InfoList and/or Notes are missing, which supports the subsequent call to get. Without that, I would get an AttributeError:
small_example.get('InfoList').get('Notes')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'get'
So your yield statement should look like:
yield {
"Tid": d['Tid'],
"Notes": d.get('InfoList', {}).get('Notes', {}).get('#code'),
"status": d.get('state')
}
Edit: What if you want a different default for Notes?
This gets a little tricky, especially if you want a data structure that doesn't support .get, such as str.
Your yield statement might have to be produced from a different function to make things a little more tidy:
# d is expected to be a dictionary
def create_yield(d):
# I'm using direct access on `Tid` because I'm assuming it should
# always be there, and if not it will raise a KeyError, you can
# modify this to fit your use case
container = {'Tid': d['Tid'],
'status': d.get('state')}
notes = small_example.get('InfoList', {}).get('Notes')
# if notes is a dict (not None), then we can get `#code` from it
if notes is not None:
container['Notes'] = notes.get('#code')
# otherwise, don't set the `Notes` attribute
return container
# later in your code at your yield statement
# you can call this function instead of manually building the dictionary
yield create_yield(small_example)

Grab element from json dump

I'm using the following python code to connect to a jsonrpc server and nick some song information. However, I can't work out how to get the current title in to a variable to print elsewhere. Here is the code:
TracksInfo = []
for song in playingSongs:
data = { "id":1,
"method":"slim.request",
"params":[ "",
["songinfo",0,100, "track_id:%s" % song, "tags:GPASIediqtymkovrfijnCYXRTIuwxN"]
]
}
params = json.dumps(data, sort_keys=True, indent=4)
conn.request("POST", "/jsonrpc.js", params)
httpResponse = conn.getresponse()
data = httpResponse.read()
responce = json.loads(data)
print json.dumps(responce, sort_keys=True, indent=4)
TrackInfo = responce['result']["songinfo_loop"][0]
TracksInfo.append(TrackInfo)
This brings me back the data in json format and the print json.dump brings back:
pi#raspberrypi ~/pithon $ sudo python tom3.py
{
"id": 1,
"method": "slim.request",
"params": [
"",
[
"songinfo",
"0",
100,
"track_id:-140501481178464",
"tags:GPASIediqtymkovrfijnCYXRTIuwxN"
]
],
"result": {
"songinfo_loop": [
{
"id": "-140501481178464"
},
{
"title": "Witchcraft"
},
{
"artist": "Pendulum"
},
{
"duration": "253"
},
{
"tracknum": "1"
},
{
"type": "Ogg Vorbis (Spotify)"
},
{
"bitrate": "320k VBR"
},
{
"coverart": "0"
},
{
"url": "spotify:track:2A7ZZ1tjaluKYMlT3ItSfN"
},
{
"remote": 1
}
]
}
}
What i'm trying to get is result.songinfoloop.title (but I tried that!)

The songinfo_loop structure is.. peculiar. It is a list of dictionaries each with just one key.
Loop through it until you have one with a title:
TrackInfo = next(d['title'] for d in responce['result']["songinfo_loop"] if 'title' in d)
TracksInfo.append(TrackInfo)
A better option would be to 'collapse' all those dictionaries into one:
songinfo = reduce(lambda d, p: d.update(p) or d,
responce['result']["songinfo_loop"], {})
TracksInfo.append(songinfo['title'])

songinfo_loop is a list not a dict. That means you need to call it by position, or loop through it and find the dict with a key value of "title"
positional:
responce["result"]["songinfo_loop"][1]["title"]
loop:
for info in responce["result"]["songinfo_loop"]:
if "title" in info.keys():
print info["title"]
break
else:
print "no song title found"
Really, it seems like you would want to have the songinfo_loop be a dict, not a list. But if you need to leave it as a list, this is how you would pull the title.

The result is really a standard python dict, so you can use
responce["result"]["songinfoloop"]["title"]
which should work

How to format unicode strings to utf-8 in Python?

I'm reading in a JSON string which is littered with u'string' style strings. Example:
[
{
"!\/award\/award_honor\/honored_for": {
"award": {
"id": "\/en\/spiel_des_jahres"
},
"year": {
"value": "1996"
}
},
"guid": "#9202a8c04000641f80000000003a0ee6",
"type": "\/games\/game",
"id": "\/en\/el_grande",
"name": "El Grande"
},
{
"!\/award\/award_honor\/honored_for": {
"award": {
"id": "\/en\/spiel_des_jahres"
},
"year": {
"value": "1995"
}
},
"guid": "#9202a8c04000641f80000000000495ec",
"type": "\/games\/game",
"id": "\/en\/settlers_of_catan",
"name": "Settlers of Catan"
}
]
If I assign name = result.name. Then when I log of pass that value to a Django template, it displays as u'Dominion'
How do I format it to display as Dominion?
++ UPDATE ++
I think the problem has to do with printing values from a list or dictionary. For example:
result = freebase.mqlread(query)
games = {}
count = 0
r = result[0]
name = r.name
games["name"] = name,
self.response.out.write(games["name"])
self.response.out.write(name)
This displays as:
(u'Dominion',) // saved response to dictionary, and then printed
Dominion // when calling the value directly from the response
I need to iterate through an array of JSON items and the values are being shown with the unicode. Why?

The comma at the end of games["name"] = name, makes it a 1-tuple. Remove it.

>>> # example
>>> s = u"Jägermütze"
>>> s.encode("utf-8")
'J\xc3\xa4germ\xc3\xbctze'
>>> print s.encode("utf-8") # on a utf-8 terminal
Jägermütze
Don't know much about Django, but not accepting snicode strings seems unpythonic to me.

You can use str(your string) to do this.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python, How to denormalise CSV file to Nested Json - python

Related

Convert multiple string stored in a variable into a single list in python

Python: Iterate JSON and remove items with specific criteria

Error in producing the output for the json for a specific id value

Grab element from json dump

How to format unicode strings to utf-8 in Python?

Categories

Resources