Remove null values from JSON data (Python)

When I use the REST API to download data from Firebase, it looks like this:
{
    "Dataset1": [
        null,
        {
            "Key1": 1,
            "Key2": 2
        },
        {
            "Key1": 3,
            "Key2": 4
        }
    ],
    "Dataset2": [
        null,
        {
            "Key1": 1,
            "Key2": 2
        },
        {
            "Key1": 3,
            "Key2": 4
        }
    ]
}
Is it possible to remove the null values before saving the data to a file? I know the nulls exist because of how I designed my database, but it is too late for me to redesign the data now. I tried is_not but no luck yet.

It looks like you've stored nodes with sequentially incrementing keys in your database (e.g. "1", "2", "3"). When you do this, Firebase interprets it as an array structure and coerces it to a (zero-based) array when you retrieve it. Since you have no node for index 0, it adds a null there.
To prevent this array coercion, store nodes with non-numeric keys, for example by prefixing each number with a short non-numeric value, like "key1", "key2", "key3".
Also see:
Best Practices: Arrays in Firebase.

It seems the null is just the first element in each list. If so, you can use a simple dict comprehension:
{k: v[1:] for k, v in data.items()}
If not, you can filter out every null with this comprehension:
{k: [e for e in v if e is not None] for k, v in data.items()}
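For example, a minimal sketch of cleaning and saving the data, assuming the downloaded dict is already in a variable called data (the file name is a placeholder):
import json

# Drop every None entry from each list, then write the cleaned result to disk
cleaned = {k: [e for e in v if e is not None] for k, v in data.items()}

with open("firebase_data.json", "w") as f:
    json.dump(cleaned, f, indent=2)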

Try this code:
Dataset2 = list()
for data in Dataset:
    if data is not None:
        Dataset2.append(data)
Dataset = Dataset2
del Dataset2

Related

Create JSON from specific lines of data

I have this test data that I am trying to use to create JSON for just select items. I have the items listed and would just like to output a JSON object containing only the selected items.
What I have:
import json
# Data to be written
dictionary = {
    "id": "04",
    "name": "sunil",
    "department": "HR"
}
# Serializing json
x = json.dumps(dictionary, indent=0)
# JSON String
y = json.loads(x)
# Goal is to print:
{
    "id": "04",
    "name": "sunil"
}
If you don't need to keep the department key, you can use this:
del y['department']
Then your y variable will print what you wanted:
{"id": "04", "name": "sunil"}
Other ways to solve the same issue:
for key in list(y.keys()):
    # add all potential keys you want to remain in the final dictionary
    if key == "id" or key == "name":
        continue
    else:
        del y[key]
However, iterating over the dictionary and deleting keys one by one is comparatively slow. You could assign the values to temporary variables and then remake the dictionary like this:
temp_id = y['id']
temp_name = y['name']
y.clear()
y['id'] = temp_id
y['name'] = temp_name
This should be faster than iterating over the dictionary.
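As another option (not from the original answers), a dict comprehension that keeps only the wanted keys avoids any deletion:
# Build a new dict containing only the keys you want to keep
keep = ("id", "name")
y = {k: y[k] for k in keep if k in y}
# y is now {'id': '04', 'name': 'sunil'}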

Converting csv to nested Json using python

I want to convert a CSV file to a JSON file.
I have a large amount of data in the CSV file.
CSV Column Structure
This is my column structure in the CSV file. It has 200+ records.
id.oid libId personalinfo.Name personalinfo.Roll_NO personalinfo.addr personalinfo.marks.maths personalinfo.marks.physic clginfo.clgName clginfo.clgAddr clginfo.haveCert clginfo.certNo clginfo.certificates.cert_name_1 clginfo.certificates.cert_no_1 clginfo.certificates.cert_exp_1 clginfo.certificates.cert_name_2 clginfo.certificates.cert_no_2 clginfo.certificates.cert_exp_2 clginfo.isDept clginfo.NoofDept clginfo.DeptDetails.DeptName_1 clginfo.DeptDetails.location_1 clginfo.DeptDetails.establish_date_1 _v updatedAt.date
Expected Json
[{
    "id":
    {
        "$oid": "00001"
    },
    "libId": 11111,
    "personalinfo":
    {
        "Name": "xyz",
        "Roll_NO": 101,
        "addr": "aa bb cc ddd",
        "marks":
        [
            "maths": 80,
            "physic": 90
            .....
        ]
    },
    "clginfo":
    {
        "clgName": "pqr",
        "clgAddr": "qwerty",
        "haveCert": true, //this is boolean true or false
        "certNo": 1, //this could be 1-10
        "certificates":
        [
            {
                "cert_name_1": "xxx",
                "cert_no_1": 12345,
                "cert_exp_1": "20/2/20202"
            },
            {
                "cert_name_2": "xxx",
                "cert_no_2": 12345,
                "cert_exp_2": "20/2/20202"
            },
            ......//could be up to 10
        ],
        "isDept": true, //this is boolean true or false
        "NoofDept": 1, //this could be 1-10
        "DeptDetails":
        [
            {
                "DeptName_1": "yyy",
                "location_1": "zzz",
                "establish_date_1": "1/1/1919"
            },
            ......//up to 10 records
        ]
    },
    "__v": 1,
    "updatedAt":
    {
        "$date": "2022-02-02T13:35:59.843Z"
    }
}]
I have tried using pandas, but I'm getting the output shown below.
My output
[{
    "id.$oid": "00001",
    "libId": 11111,
    "personalinfo.Name": "xyz",
    "personalinfo.Roll_NO": 101,
    "personalinfo.addr": "aa bb cc ddd",
    "personalinfo.marks.maths": 80,
    "personalinfo.marks.physic": 90,
    "clginfo.clgName": "pqr",
    "clginfo.clgAddr": "qwerty",
    "clginfo.haveCert": true,
    "clginfo.certNo": 1,
    "clginfo.certificates.cert_name_1": "xxx",
    "clginfo.certificates.cert_no_1": 12345,
    "clginfo.certificates.cert_exp_1": "20/2/20202",
    "clginfo.certificates.cert_name_2": "xxx",
    "clginfo.certificates.cert_no_2": 12345,
    "clginfo.certificates.cert_exp_2": "20/2/20202",
    "clginfo.isDept": true,
    "clginfo.NoofDept": 1,
    "clginfo.DeptDetails.DeptName_1": "yyy",
    "clginfo.DeptDetails.location_1": "zzz",
    "clginfo.DeptDetails.establish_date_1": "1/1/1919",
    "__v": 1,
    "updatedAt.$date": "2022-02-02T13:35:59.843Z"
}]
I am new to Python and only know the basics. Please help me get to this output.
200+ records is really tiny, so even a naive solution is fine.
It can't be totally generic, because the headers alone don't reveal that certificates is a list, unless we rely on every name under certificates ending in _N.
Proposed solution using only basic Python (a short sketch follows these steps):
read the header row - split every column name on the period. Iterate over the resulting list and create nested dicts with the appropriate keys and dummy values (if you want to handle lists: create an array if the current key ends with _N and use N as the index)
for all rows:
clone the dictionary with dummy values
for each column, use the split keys from above to put the value into the corresponding dict (same approach as above for lists)
append the dictionary to the list of rows
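A minimal sketch of those steps, assuming the dotted headers shown above and ignoring the _N list handling (every value stays a string; the file names are placeholders):
import csv
import json

def set_nested(record, dotted_key, value):
    # Split "clginfo.clgName" into ["clginfo", "clgName"] and walk/create nested dicts
    keys = dotted_key.split(".")
    current = record
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value

rows = []
with open("input.csv", newline="") as f:   # hypothetical input file
    for flat_row in csv.DictReader(f):
        nested = {}
        for column, value in flat_row.items():
            set_nested(nested, column, value)
        rows.append(nested)

with open("output.json", "w") as f:        # hypothetical output file
    json.dump(rows, f, indent=2)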

Create a new dictionary from existing with new keys

I have a dictionary d and I want to modify the keys to create a new dictionary. What is the best way to do this?
Here's my existing code:
import json
d = json.loads("""{
    "reference": "DEMODEVB02C120001",
    "business_date": "2019-06-18",
    "final_price": 40,
    "products": [
        {
            "quantity": 4,
            "original_price": 10,
            "final_price": 40,
            "id": "123"
        }
    ]
}""")
d2 = {
    'VAR_Reference': d['reference'],
    'VAR_date': d['business_date'],
    'VAR_TotalPrice': d['final_price']
}
Is there a better way to map the values using another mapping dictionary, or a file where the mapping can be kept?
For example, something like this:
d3 = {
    'reference': 'VAR_Reference',
    'business_date': 'VAR_date',
    'final_price': 'VAR_TotalPrice'
}
Appreciate any tips or hints.
You can use a dictionary comprehension to iterate over your original dictionary and fetch the new keys from the mapping dictionary:
{d3.get(key):value for key, value in d.items()}
You can also iterate over d3 and get the final dictionary (thanks #IcedLance for the suggestion)
{value:d.get(key) for key, value in d3.items()}
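For reference, applying the second comprehension to the sample above should give the following (note that with the first form, any key of d that is missing from d3 would end up under a None key):
mapped = {value: d.get(key) for key, value in d3.items()}
print(mapped)
# {'VAR_Reference': 'DEMODEVB02C120001', 'VAR_date': '2019-06-18', 'VAR_TotalPrice': 40}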

Convert python nested JSON-like data to dataframe

My records looks like this and I need to write it to a csv file:
my_data={"data":[{"id":"xyz","type":"book","attributes":{"doc_type":"article","action":"cut"}}]}
which looks like JSON, but the next record starts with "data" and not "data1", which forces me to read each record separately. I then convert it to a dict using eval() so I can iterate through keys and values along a certain path to get to the values I need. From those I generate a list of keys and values based on the keys I need, and pd.DataFrame() converts that list into a dataframe, which I know how to convert to CSV. My code that works is below, but I am sure there are better ways to do this; mine scales poorly. Thanks.
counter = 1
k = []
v = []
res = []
m = 0
for line in f2:
    jline = eval(line)
    counter += 1
    for items in jline:
        k.append(jline[u'data'][0].keys())
        v.append(jline[u'data'][0].values())
print 'keys are:', k
i = 0
j = 0
while i < 3:
    while j < 3:
        if k[i][j] == u'id':
            res.append(v[i][j])
        j += 1
    i += 1
# res is my result set
del k[:]
del v[:]
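As an aside (not part of the answer below), each line can usually be parsed with json.loads instead of eval, and pandas can flatten the nested attributes. A sketch, assuming f2 contains one JSON object per line and a pandas version with pd.json_normalize (>= 1.0); the file names are placeholders:
import json
import pandas as pd

records = []
with open("records.txt") as f2:         # hypothetical file, one JSON object per line
    for line in f2:
        parsed = json.loads(line)       # avoids eval() on untrusted text
        records.extend(parsed["data"])  # each "data" entry holds a list of objects

# json_normalize flattens nested dicts such as "attributes" into dotted columns
df = pd.json_normalize(records)
df.to_csv("records.csv", index=False)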
Changing my_data to be:
my_data = [{"id":"xyz","type":"book","attributes":{"doc_type":"article","action":"cut"}},   # Data One
           {"id":"xyz2","type":"book","attributes":{"doc_type":"article","action":"cut"}},  # Data Two
           {"id":"xyz3","type":"book","attributes":{"doc_type":"article","action":"cut"}}]  # Data Three
You can dump this directly into a dataframe like so:
mydf = pd.DataFrame(my_data)
It's not clear what your data path would be, but if you are looking for specific combinations of id, type, etc., you could search explicitly:
def find_my_way(data, pattern):
    # pattern = {'id': 'someid', 'type': 'sometype', ...}
    res = []
    for row in data:
        if row.get('id') == pattern.get('id'):
            res.append(row)
    return res
mydf = pd.DataFrame(find_my_way(mydata, pattern))
EDIT:
Without going into how the api works, in pseudo-code, you'll want to do something like the following:
my_objects = []
calls = 0
while calls < maximum:
    my_data = call_the_api(params)
    data = my_data.get('data')
    if not data:
        calls += 1
        continue
    # API calls for single objects usually return a dictionary; calls for groups of objects return lists. This handles both cases
    if isinstance(data, list):
        my_objects = [*data, *my_objects]
    elif isinstance(data, dict):
        my_objects = [{**data}, *my_objects]

# This unpacks the data responses into a list that you can then load into a DataFrame, with the attributes from the API as the columns
df = pd.DataFrame(my_objects)
Assuming your data from the api looks like:
"""
{
"links": {},
"meta": {},
"data": {
"type": "FactivaOrganizationsProfile",
"id": "Goog",
"attributes": {
"key_executives": {
"source_provider": [
{
"code": "FACSET",
"descriptor": "FactSet Research Systems Inc.",
"primary": true
}
]
}
},
"relationships": {
"people": {
"data": {
"type": "people",
"id": "39961704"
}
}
}
},
"included": {}
}
"""
per the documentation, which is why I'm using my_data.get('data').
That should get you all of the data (unfiltered) into a DataFrame.
Leaving the DataFrame creation and saving until the very end like this is a bit more memory friendly.
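Once the DataFrame exists, writing the CSV is a single call (the file name here is just a placeholder):
df.to_csv("output.csv", index=False)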

Merging list of python dictionaries by column value

I have data that is a list of python dictionaries, each representing a row in the data, and want to combine several of these into one dictionary.
I need to combine them by a common value in a single column. Note that the dictionaries to merge may or may not contain the same columns, and values should be concatenated, not clobbered.
Here is an example (combining dicts by value in column 'a'):
data = [{ 'a':0, 'b':10, 'c':20 },
        { 'a':2, 'd':30, 'e':40 },
        { 'a':0, 'b':50, 'c':60 },
        { 'a':1, 'd':70, 'c':80 },
        { 'a':1, 'b':90, 'e':100 }]
Desired output is:
new_data = [{ 'a':0, 'b':[10,50], 'c':[20,60] },
            { 'a':1, 'd':[70], 'c':[80], 'b':[90], 'e':[100] },
            { 'a':2, 'd':[30], 'e':[40] }]
I have a simple function that can accomplish this, but I need a faster method (the data has approximately 1,000,000 rows and 20 columns). My method of finding the dictionaries I want to merge is very expensive.
Here is where I have an issue with computation time:
unique_idx, locations = [], {}
for i, row in enumerate(data):
    _id = row['a']
    if _id not in unique_idx:
        unique_idx.append(_id)
        locations[_id] = [i]
    else:
        locations[_id].append(i)
grouped_data = [[data[i] for i in locs] for locs in locations.values()]
I need a faster method to collect dictionaries that contain the same value in one column. Ideally I want a quick method with plain python, but if this can be done simply with a pandas DataFrame that is good as well.
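No answer is shown here, but a minimal sketch of one common approach is to group the rows in a single pass with collections.defaultdict, keyed on 'a' (variable names are illustrative):
from collections import defaultdict

# Group rows by the value of column 'a' in one pass; dict lookups are O(1),
# unlike the list membership test in the loop above.
grouped = defaultdict(dict)
for row in data:
    target = grouped[row['a']]
    target['a'] = row['a']
    for key, value in row.items():
        if key != 'a':
            target.setdefault(key, []).append(value)

new_data = list(grouped.values())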
