consul json parsing with python - python

I am trying to pull multiple values from Consul.
I pull the data using the following code:
import consul
c = consul.Consul("consulServer")
# recurse=True returns every key under the given prefix as a list
index, data = c.kv.get("key", recurse=True)
print(data)
I am getting the following json in my data list:
[ {
'LockIndex': 0,
'ModifyIndex': 54,
'Value': '1',
'Flags': 0,
'Key': 'test/one',
'CreateIndex': 54
}, {
'LockIndex': 0,
'ModifyIndex': 69,
'Value': '2',
'Flags': 0,
'Key': 'test/two',
'CreateIndex': 69
}]
I want to transform this output into a key:value JSON file. For this example it should look like:
{
"one": "1",
"two": "2"
}
I have two questions:
1. Is there a better way to get multiple values from consul kv?
2. Assuming there is no better way, what is the best way to convert the json from the first example to the second one?
Thanks,
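One possible way to do the conversion in question 2 is a small helper that keeps only the last path segment of each Key. This is a minimal sketch, not the library's own method: the function name kv_entries_to_dict is hypothetical, and the sample list simply repeats the entries shown above. It also decodes bytes values, on the assumption that the client may return them as bytes under Python 3.

```python
import json


def kv_entries_to_dict(entries):
    """Map the last path segment of each 'Key' to its 'Value'."""
    result = {}
    for entry in entries or []:  # tolerate None when nothing was found
        name = entry["Key"].rsplit("/", 1)[-1]
        value = entry["Value"]
        if isinstance(value, bytes):  # assumption: values may arrive as bytes
            value = value.decode("utf-8")
        result[name] = value
    return result


# Sample data copied from the question
sample = [
    {"LockIndex": 0, "ModifyIndex": 54, "Value": "1",
     "Flags": 0, "Key": "test/one", "CreateIndex": 54},
    {"LockIndex": 0, "ModifyIndex": 69, "Value": "2",
     "Flags": 0, "Key": "test/two", "CreateIndex": 69},
]

flat = kv_entries_to_dict(sample)
print(json.dumps(flat, indent=2))
```

Keys that share the same last segment under different prefixes would overwrite each other, so this only suits a flat prefix like the one in the example.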

Related

Best way to convert string values to nested JSON

I have a string that looks like this:
Standard,NonStandard=[Hybrid,Non-standard,Preferred],AnotherOne=[a, b, c]
and am looking into ways to convert it to this dictionary / json structure via Python.
[{
'value': 'Standard',
},
{
'value': 'NonStandard',
'sub': [ 'Hybrid', 'Non-standard', 'Preferred' ]
},
{
'value': 'AnotherOne',
'sub': [ 'a', 'b', 'c']
}]
I think I can do this via looping over strings and keeping track of the =[ and closing ] but was wondering if there is a more "pythonic" solution.
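As an alternative to tracking the brackets by hand, a regular expression can pick out each name together with its optional =[...] group. The sketch below assumes that names never contain commas, equals signs, or brackets and that the groups are not nested; the function name parse_spec is made up for illustration.

```python
import re

# A name, optionally followed by =[comma, separated, items]
ITEM = re.compile(r"([^,=\[\]]+)(?:=\[([^\]]*)\])?")


def parse_spec(text):
    """Parse 'A,B=[x,y]' into [{'value': 'A'}, {'value': 'B', 'sub': ['x', 'y']}]."""
    parsed = []
    for match in ITEM.finditer(text):
        name = match.group(1).strip()
        if not name:  # skip stray whitespace between commas
            continue
        item = {"value": name}
        if match.group(2) is not None:
            item["sub"] = [s.strip() for s in match.group(2).split(",") if s.strip()]
        parsed.append(item)
    return parsed


spec = "Standard,NonStandard=[Hybrid,Non-standard,Preferred],AnotherOne=[a, b, c]"
print(parse_spec(spec))
```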

Converting csv to nested Json using python

I want to convert a CSV file to a JSON file.
I have a large amount of data in the CSV file.
CSV Column Structure
This is the column structure of my CSV file. It has 200+ records.
id.oid libId personalinfo.Name personalinfo.Roll_NO personalinfo.addr personalinfo.marks.maths personalinfo.marks.physic clginfo.clgName clginfo.clgAddr clginfo.haveCert clginfo.certNo clginfo.certificates.cert_name_1 clginfo.certificates.cert_no_1 clginfo.certificates.cert_exp_1 clginfo.certificates.cert_name_2 clginfo.certificates.cert_no_2 clginfo.certificates.cert_exp_2 clginfo.isDept clginfo.NoofDept clginfo.DeptDetails.DeptName_1 clginfo.DeptDetails.location_1 clginfo.DeptDetails.establish_date_1 _v updatedAt.date
Expected Json
[{
"id":
{
"$oid": "00001"
},
"libId":11111,
"personalinfo":
{
"Name":"xyz",
"Roll_NO":101,
"addr":"aa bb cc ddd",
"marks":
[
"maths":80,
"physic":90
.....
]
},
"clginfo":
{
"clgName":"pqr",
"clgAddr":"qwerty",
"haveCert":true, //this is boolean true or false
"certNo":1, //this could be 1-10
"certificates":
[
{
"cert_name_1":"xxx",
"cert_no_1":12345,
"cert_exp_1":"20/2/20202"
},
{
"cert_name_2":"xxx",
"cert_no_2":12345,
"cert_exp_2":"20/2/20202"
},
......//could be up to 10
],
"isDept":true, //this is boolean true or false
"NoofDept":1 , //this could be 1-10
"DeptDetails":
[
{
"DeptName_1":"yyy",
"location_1":"zzz",
"establish_date_1":"1/1/1919"
},
......//up to 10 records
]
},
"__v": 1,
"updatedAt":
{
"$date": "2022-02-02T13:35:59.843Z"
}
}]
I have tried using pandas but I'm getting output as
My output
[{
"id.$oid": "00001",
"libId":11111,
"personalinfo.Name":"xyz",
"personalinfo.Roll_NO":101,
"personalinfo.addr":"aa bb cc ddd",
"personalinfo.marks.maths":80,
"personalinfo.marks.physic":90,
"clginfo.clgName":"pqr",
"clginfo.clgAddr":"qwerty",
"clginfo.haveCert":true,
"clginfo.certNo":1,
"clginfo.certificates.cert_name_1":"xxx",
"clginfo.certificates.cert_no_1":12345,
"clginfo.certificates.cert_exp_1":"20/2/20202",
"clginfo.certificates.cert_name_2":"xxx",
"clginfo.certificates.cert_no_2":12345,
"clginfo.certificates.cert_exp_2":"20/2/20202",
"clginfo.isDept":true,
"clginfo.NoofDept":1,
"clginfo.DeptDetails.DeptName_1":"yyy",
"clginfo.DeptDetails.location_1":"zzz",
"clginfo.DeptDetails.establish_date_1":"1/1/1919",
"__v": 1,
"updatedAt.$date": "2022-02-02T13:35:59.843Z"
}]
I am new to Python and only know the basics. Please help me get the expected output.
With only 200+ records the data set is tiny, so even a naive solution is good enough.
It can't be totally generic, because the headers alone don't show that certificates is a list, unless we rely on every name under certificates ending in _N.
Proposed solution using only basic Python:
1. Read the header row and split each column name on the period. Iterate over the resulting lists and create nested dicts with the appropriate keys and dummy values. (To handle lists: create a list when the current key ends with _N, and use N as the index.)
2. For each row:
2.1 Clone the dictionary with dummy values (a deep copy, since it is nested).
2.2 For each column, use the split keys from step 1 to put the value into the corresponding dict. Lists are handled the same way as in step 1.
2.3 Append the dictionary to the list of rows.
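The steps above can be sketched roughly as follows. This is a simplified variant rather than a literal transcription: instead of cloning a template of dummy values, it builds each row's nested dict directly from the split column names. It assumes a comma-separated file, relies on the _N suffix convention to detect lists, and leaves every value as the string the csv module returns (no conversion to numbers or booleans). The names nest_row, listify, and SAMPLE_CSV are made up for illustration.

```python
import csv
import io
import json
import re

INDEXED = re.compile(r"_(\d+)$")  # leaf names such as cert_name_1


def nest_row(row):
    """Turn one flat CSV row (dotted column names) into nested dicts."""
    root = {}
    for column, value in row.items():
        *parents, leaf = column.split(".")
        node = root
        for key in parents:
            node = node.setdefault(key, {})
        match = INDEXED.search(leaf)
        if match and parents:
            # Group nested leaves ending in _N under the integer index N
            node = node.setdefault(int(match.group(1)), {})
        node[leaf] = value
    return listify(root)


def listify(node):
    """Replace dicts keyed only by integers with lists ordered by index."""
    if not isinstance(node, dict):
        return node
    converted = {key: listify(child) for key, child in node.items()}
    if converted and all(isinstance(key, int) for key in converted):
        return [converted[key] for key in sorted(converted)]
    return converted


# Tiny stand-in for the real file; a real run would pass open(path, newline="")
SAMPLE_CSV = (
    "libId,personalinfo.Name,clginfo.certificates.cert_name_1,"
    "clginfo.certificates.cert_no_1,clginfo.certificates.cert_name_2\n"
    "11111,xyz,xxx,12345,yyy\n"
)

rows = [nest_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(json.dumps(rows, indent=2))
```

A parent that mixes indexed and non-indexed leaves is left as a dict here, so such columns would need extra handling.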

Format an f-string for each dataframe object

Requirement
My requirement is to have Python code extract some records from a database, format them, and upload the formatted JSON to a sink.
Planned approach
1. Create JSON-like templates for each record. E.g.
json_template_str = '''{{
"type": "section",
"fields": [
{{
"type": "mrkdwn",
"text": "Today *{total_val}* customers saved {percent_derived}%."
}}
]
}}'''
2. Extract records from DB to a dataframe.
3. Loop over the dataframe and replace the {var} variables in bulk using something like .format(**locals())
Question
I haven't worked with dataframes before.
What would be the best way to accomplish Step 3? Currently I am:
3.1 Looping over the dataframe rows one by one:
for i, df_row in df.iterrows():
3.2 Assigning
total_val = df_row['total_val']
percent_derived = df_row['percent_derived']
3.3 In the loop, formatting the string and adding it to a list:
block.append(json.loads(json_template_str.format(**locals())))
I was trying to use the dataframe's assign() method, but could not figure out how to use something like a lambda function to create a new column holding my expected value.
As a novice in pandas, I feel there might be a more efficient way to do this (which may even involve changing the JSON template string, which I can totally do). It would be great to hear thoughts and ideas.
Thanks for your time.
I would not write a JSON string by hand, but rather create a corresponding python object and then use the json library to convert it into a string. With this in mind, you could try the following:
import copy
import pandas as pd
# some sample data
df = pd.DataFrame({
'total_val': [100, 200, 300],
'percent_derived': [12.4, 5.2, 6.5]
})
# template dictionary for a single block
json_template = {
"type": "section",
"fields": [
{"type": "mrkdwn",
"text": "Today *{total_val:.0f}* customers saved {percent_derived:.1f}%."
}
]
}
# a function that will insert data from each row
# of the dataframe into a block
def format_data(row):
    json_t = copy.deepcopy(json_template)
    text_t = json_t["fields"][0]["text"]
    json_t["fields"][0]["text"] = text_t.format(
        total_val=row['total_val'], percent_derived=row['percent_derived'])
    return json_t
# create a list of blocks
result = df.agg(format_data, axis=1).tolist()
The resulting list looks as follows, and can be converted into a JSON string if needed:
[{
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *100* customers saved 12.4%.'
}]
}, {
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *200* customers saved 5.2%.'
}]
}, {
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *300* customers saved 6.5%.'
}]
}]
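A variation on the same idea, sketched here without pandas so it runs on its own: keep only the text as a format string and build each block as a fresh dict, which avoids the deep copy. The records list is made-up sample data; with a DataFrame, df.to_dict("records") should yield a list of the same shape. The names TEXT_TEMPLATE and build_block are hypothetical.

```python
import json

TEXT_TEMPLATE = "Today *{total_val:.0f}* customers saved {percent_derived:.1f}%."


def build_block(record):
    """Build one block dict from a mapping of column name to value."""
    return {
        "type": "section",
        "fields": [
            {"type": "mrkdwn", "text": TEXT_TEMPLATE.format(**record)},
        ],
    }


# Made-up sample rows standing in for the dataframe contents
records = [
    {"total_val": 100, "percent_derived": 12.4},
    {"total_val": 200, "percent_derived": 5.2},
]

blocks = [build_block(record) for record in records]
payload = json.dumps(blocks)  # JSON string ready to send to the sink
print(payload)
```

Extra columns in a record are ignored by str.format, so the template only needs to name the fields it uses.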

is it possible to use wildcards for field names in mongodb?

I have a set of field names as follows:
"field0.registers.hilo"
"field0.registers.lllo"
...
"field1.registers.hilo"
"field1.registers.lllo"
...
"field2.registers.hilo"
"field2.registers.lllo"
...
"fieldn.registers.hilo"
"fieldn.registers.lllo"
...
Is there a way to indicate the fields in mongodb with the index to range from 0 to n succinctly without having to expand it all out beforehand?
Something like this example for $project:
{ $project: { "fieldn.registers.hilo": 1, "fieldn.registers.lllo": 1 } }
For now, I am fully expanding all the project fields from 0 to n in python before interfacing with the collection using pymongo.
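For reference, the expansion described above can be written as a single dict comprehension. This is only a sketch of that workaround: build_projection is a hypothetical helper, and the leaf names hilo and lllo are taken from the field list in the question.

```python
def build_projection(n, leaves=("hilo", "lllo")):
    """Expand field0..fieldn into an explicit projection document."""
    return {
        f"field{i}.registers.{leaf}": 1
        for i in range(n + 1)
        for leaf in leaves
    }


projection = build_projection(2)
print(projection)
# The result can then be used as the body of a $project stage,
# e.g. {"$project": projection}
```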
is it possible to use wildcards for field names in mongodb?
No.
If your data is in this structure, refactor it to use lists. That's exactly what lists are designed for.
Taking the refactored example below, use $elemMatch to project only the array element needed. Note that an $elemMatch projection returns only the first matching element:
from pymongo import MongoClient
db = MongoClient()['mydatabase']
db.register.insert_many([{
'registers': [
{
'field': 0,
'hilo': 1,
'lllo': 2
},
{
'field': 1,
'hilo': 2,
'lllo': 3
},
{
'field': 2,
'hilo': 3,
'lllo': 4
}
]}])
print(db.register.find_one({}, {'registers': {'$elemMatch': {'field': 1}}}))
prints:
{'_id': ObjectId('60b64e57c3214d73c390557b'), 'registers': [{'field': 1, 'hilo': 2, 'lllo': 3}]}

How to extract part of a JSON as another JSON and separate them several JSON files on Python

I started learning data analysis using Python. And I have been trying to use Adzuna dataset for my course project. The response from my API call looks like this:
{
"results": [
{
"salary_min": 50000,
"longitude": -0.776902,
"location": {
"__CLASS__": "Adzuna::API::Response::Location",
"area": [
"UK",
"South East England",
"Marlow"
],
"display_name": "Marlow, Buckinghamshire"
},
"salary_is_predicted": 0,
"description": "JavaScript Developer Corporate ...",
"__CLASS__": "Adzuna::API::Response::Job",
"created": "2013-11-08T18:07:39Z",
"latitude": 51.571999,
"redirect_url": "http://adzuna.co.uk/jobs/land/ad/129698749...",
"title": "Javascript Developer",
"category": {
"__CLASS__": "Adzuna::API::Response::Category",
"label": "IT Jobs",
"tag": "it-jobs"
},
"id": "129698749",
"salary_max": 55000,
"company": {
"__CLASS__": "Adzuna::API::Response::Company",
"display_name": "Corporate Project Solutions"
},
"contract_type": "permanent"
},
... another 19 samples here ...
],
"mean": 43900.46,
"__CLASS__": "Adzuna::API::Response::JobSearchResults",
"count": 74433
}
My goal is to extract 20 samples under "results" individually so that I can create a numpy dataset later for data analysis. So, I wrote Python like this:
item_dict = json.loads(response.text)
# Since "results" start/end with [ and ], Python treats it as a list. So, I need to remove them.
string_data = str(item_dict['results']).lstrip("[")
string_data = string_data.rstrip("]")
# Convert "results" string back to JSON, then extract each sample from 20 samples
json_results_data = json.loads(string_data)
for sample in json_results_data:
    print(sample)
However, json_results_data = json.loads(string_data) doesn't convert the "results" string to JSON well. I am new to Python, so I may be asking a stupid question, but please let me know if you can figure out an easy way to fix this. Thanks.
Stop stripping the square brackets; it's meant to be a list.
Try this:
item_dict = json.loads(response.text)
for sample in item_dict["results"]:
    print(sample)
Your issue was that you thought you had a dict (JSON), but you have a list of dicts. Calling str() on that list produces a Python repr with single quotes, which json.loads cannot parse.
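Since the title asks about separate JSON files, here is a minimal sketch of writing each sample to its own file. The response text is a shortened, made-up stand-in for the real API response, and the helper name split_results and the sample_NN.json naming scheme are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Shortened stand-in for response.text
response_text = json.dumps({
    "results": [
        {"id": "129698749", "title": "Javascript Developer"},
        {"id": "2", "title": "Another sample"},
    ],
    "count": 2,
})


def split_results(text, out_dir):
    """Write every entry of 'results' to its own JSON file."""
    results = json.loads(text)["results"]  # already a list of dicts
    paths = []
    for i, sample in enumerate(results):
        path = Path(out_dir) / f"sample_{i:02d}.json"
        path.write_text(json.dumps(sample, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()  # temporary directory just for the demo
paths = split_results(response_text, out_dir)
print([p.name for p in paths])
```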
Solution
What you are trying to achieve is to organize the data from a JSON object. The first line of your code, item_dict = json.loads(response.text), returns a dict, so you can simply use that.
I will show two methods:
1. Using a pandas.DataFrame to organize your data.
2. Using a for loop to just print your data.
Note that a pandas.DataFrame also lets you quickly convert your data into a numpy array (use df.to_numpy()).
import pandas as pd
results = response.json()['results']  # same as item_dict['results']
df = pd.DataFrame(results)
print(df)
# df.to_numpy()
output:
a b c d e
0 1.0 2.0 NaN dog True
1 20.0 2.0 0.0 cat True
2 1.0 NaN NaN bird True
3 NaN 2.0 88.0 pig False
Instead, if you just want to print out each dictionary inside results, you could just do this:
for result in results:
    print(result)
Dummy Data
item_dict = {
'results': [
{'a': 1, 'b': 2, 'c': None, 'd': 'dog', 'e': True},
{'a': 20, 'b': 2, 'c': 0, 'd': 'cat', 'e': True},
{'a': 1, 'b': None, 'c': None, 'd': 'bird', 'e': True},
{'a': None, 'b': 2, 'c': 88, 'd': 'pig', 'e': False}
],
"mean": 43900.46,
"__CLASS__": "Adzuna::API::Response::JobSearchResults",
"count": 74433
}
