Respected People,
I am having a problem handling JSON data sent to the server using requests, as I am unable to frame the MySQL query.
{
    "Firsthouse": {
        "Doors": "10",
        "windows": "9"
    },
    "Secondhouse": {
        "doors": "1",
        "windows": "10",
        "pools": "2"
    }
}
This is how I am processing the data on the server:
load_data = request.get_json()  # get_json() already returns parsed data; no json.loads() needed
If the JSON were consistent with respect to fields (unlike now, where "pools" is missing from Firsthouse), I'd have the following query after some further processing:
data_list = []
for house in load_data.values():
    rows = (house['doors'], house['windows'], house['pools'])
    data_list.append(rows)

query = "insert into table (doors,windows,pools) values (%s,%s,%s)"
q_tup = data_list
cursor.executemany(query, q_tup)
But the fields are not fixed in my JSON; there could be at most five fields: doors, windows, pools, floors, chimneys.
Should I write 5 queries based on the presence of fields in the JSON data, using an if-else block?
Many thanks for any hints/ideas.
Related
I want to store key-value JSON data in AWS DynamoDB, where the key is a date string in YYYY-mm-dd format and the value is entries, a Python dictionary. When I used the boto3 client to save the data there, it saved it with DynamoDB data type descriptors, which I don't want. My purpose is simple: store JSON data against a key which is a date, so that later I can query the data by giving that date. I am struggling with this issue because I did not find any relevant link that explains how to store JSON data and retrieve it without any conversion.
I need help to solve it in Python.
What I am doing now:
item = {
    "entries": [
        {
            "path": [
                {
                    "name": "test1",
                    "count": 1
                },
                {
                    "name": "test2",
                    "count": 2
                }
            ],
            "repo": "test3"
        }
    ],
    "date": "2022-10-11"
}
import boto3

dynamodb_client = boto3.resource('dynamodb')
table = dynamodb_client.Table(table_name)
response = table.put_item(Item=item)
What actually got saved:
[{"M":{"path":{"L":[{"M":{"name":{"S":"test1"},"count":{"N":"1"}}},{"M":{"name":{"S":"test2"},"count":{"N":"2"}}}]},"repo":{"S":"test3"}}}]
But I want to save exactly the same JSON data as it is, without any conversion at all.
When I retrieve it programmatically, you can see the differences: single quotes, and the count values changed to Decimal.
response = table.get_item(
    Key={
        "date": "2022-10-12"
    }
)
Output
{'Item': {'entries': [{'path': [{'name': 'test1', 'count': Decimal('1')}, {'name': 'test2', 'count': Decimal('2')}], 'repo': 'test3'}], 'date': '2022-10-12'}}
Sample picture: [screenshot of the DynamoDB table with the date and entries columns, not included here]
Why not store it as a single attribute of type string? Then you’ll get out exactly what you put in, byte for byte.
When you store this in DynamoDB, you get exactly what you have provided: the key is your date and you have a list of entries.
If you need it stored in a different format, you need to provide JSON that matches what you need. It's important to note that DynamoDB is a key-value store, not a document store; it's worth looking up the differences between the two.
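If you do keep the native map/list storage, the only conversion left on retrieval is DynamoDB's use of Decimal for numbers. A minimal sketch (illustrative, not part of either answer) of turning such an item back into plain JSON is a json.dumps default handler:

import json
from decimal import Decimal

def decimal_default(obj):
    # DynamoDB returns numbers as Decimal; convert them back to int/float for JSON output.
    if isinstance(obj, Decimal):
        return int(obj) if obj % 1 == 0 else float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

item = response['Item']  # the item retrieved with table.get_item(...) above
print(json.dumps(item, default=decimal_default))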
I figured out how to solve this issue. I have two columns, date and entries, in my DynamoDB table (also visible in the screenshot in the question).
I convert the entries value from a list to a string and then save it in the DB. At retrieval time, I do the reverse: parse it back into a proper JSON response and return it.
I am also sharing sample code below so that anybody else dealing with the same situation has at least one option.
# While storing:
import json
import boto3

entries_string = json.dumps([
    {
        "path": [
            {
                "name": "test1",
                "count": 1
            },
            {
                "name": "test2",
                "count": 2
            }
        ],
        "repo": "test3"
    }
])

item = {
    "entries": entries_string,
    "date": "2022-10-12"
}

dynamodb_client = boto3.resource('dynamodb')
table = dynamodb_client.Table(<TABLE-NAME>)
table.put_item(Item=item)
-------------------------
# While fetching:
response = table.get_item(
    Key={
        "date": "2022-10-12"
    }
)['Item']
entries_string = response['entries']
entries_dic = json.loads(entries_string)
response['entries'] = entries_dic
print(json.dumps(response))
How do I create a document and collection in MongoDB to drive Python code configuration, i.e. get the attribute name, data type, and the function to be called from MongoDB?
MongoDB collection sample:
db.attributes.insertMany([
    { attributes_names: "email", attributes_datype: "string", attributes_isNull: "false", attributes_std_function: "email_valid" },
    { attributes_names: "address", attributes_datype: "string", attributes_isNull: "false", attributes_std_function: "address_valid" }
]);
Python script and function
from pyspark.sql.functions import regexp_replace, lower, expr

def email_valid(df):
    df1 = df.withColumn(df.columns[0], regexp_replace(lower(df.columns[0]), "^a-zA-Z0-9@\._\-| ", ""))
    extract_expr = expr(
        "regexp_extract_all(emails, '(\\\w+([\\\.-]?\\\w+)*@\\[A-Za-z\-\.]+([\\\.-]?\\\w+)*(\\\.\\\w{2,3})+)', 0)")
    df2 = df1.withColumn(df.columns[0], extract_expr) \
        .select(df.columns[0])
    return df2
How do I get all the MongoDB values in the Python script and call the function according to the attributes?
To create a MongoDB collection from a Python script:
import pymongo
from bson import ObjectId
from random_object_id import generate

# connect to your mongodb client
client = pymongo.MongoClient(connection_url)
# connect to the database
db = client[database_name]
# get the collection
mycol = db[collection_name]

# create a sample dictionary for the collection data
mydict = { "_id": ObjectId(generate()),
           "attributes_names": "email",
           "attributes_datype": "string",
           "attributes_isNull": "false",
           "attributes_std_function": "email_valid" }

# insert the dictionary into the collection
mycol.insert_one(mydict)
To insert multiple values into MongoDB, use insert_many() instead of insert_one() and pass it a list of dictionaries. Your list of dictionaries will look like this:
mydict = [{ "_id": ObjectId(generate()),
            "attributes_names": "email",
            "attributes_datype": "string",
            "attributes_isNull": "false",
            "attributes_std_function": "email_valid" },
          { "_id": ObjectId(generate()),
            "attributes_names": "email",
            "attributes_datype": "string",
            "attributes_isNull": "false",
            "attributes_std_function": "email_valid" }]
To get all the data from a MongoDB collection into the Python script:

data = list()
for x in mycol.find():
    data.append(x)

If you want the result as a flat table, you can normalize it into a pandas DataFrame:

import pandas as pd
df = pd.json_normalize(data)

And then access the data as you access an element of a list of dictionaries:

value = data[0]["attributes_names"]
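To then call the right function for each attribute (the part the question asks about), one possible approach is a dispatch dictionary keyed by the stored function name. This is only a sketch; it assumes email_valid and address_valid are defined or imported in the same script, and that df is the DataFrame you want to validate:

# Map the function names stored in MongoDB to the actual callables.
function_registry = {
    "email_valid": email_valid,
    "address_valid": address_valid,
}

for attribute in mycol.find():
    func_name = attribute["attributes_std_function"]
    func = function_registry.get(func_name)
    if func is None:
        print(f"No function registered for {func_name}")
        continue
    # Call the validation function; assumed here to take the DataFrame `df`.
    result = func(df)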
I have a table that already exists with the following schema:
{
    "schema": {
        "fields": [
            {
                "mode": "required",
                "name": "full_name",
                "type": "string"
            },
            {
                "mode": "required",
                "name": "age",
                "type": "integer"
            }
        ]
    }
}
It already contains entries like:
{'full_name': 'John Doe',
'age': int(33)}
I want to insert a new record with a new field and have the load job automatically add the new column as it loads. The new format looks like this:
record = {'full_name': 'Karen Walker',
'age': int(48),
'zipcode': '63021'}
My code is as follows:
from google.cloud import bigquery
client = bigquery.Client(project=projectname)
table = client.get_table(table_id)
config = bigquery.LoadJobConfig()
config.autoedetect = True
config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
config.schema_update_options = [
bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
]
job = client.load_table_from_json([record], table, job_config=config)
job.result()
This results in the following error:
400 Provided Schema does not match Table my_project:my_dataset:mytable. Field age has changed mode from REQUIRED to NULLABLE
I can fix this by changing config.schema_update_options as follows:
config.schema_update_options = [
    bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
    bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION
]
This allows me to insert the new record, with zipcode added to the schema, but it causes both full_name and age to become NULLABLE, which is not the behavior I want. Is there a way to prevent schema auto-detect from changing the existing columns?
If you need to add fields to your schema, you can do the following:
from google.cloud import bigquery
client = bigquery.Client()
table = client.get_table("your-project.your-dataset.your-table")
original_schema = table.schema # Get your current table's schema
new_schema = original_schema[:] # Creates a copy of the schema.
# Add new field to schema
new_schema.append(bigquery.SchemaField("new_field", "STRING"))
# Set new schema in your table object
table.schema = new_schema
# Call API to update your table with the new schema
table = client.update_table(table, ["schema"])
After updating your table's schema, you can load your new records containing the additional field without needing any schema update options in the load job.
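For example, after the manual schema update above, the append itself could look like this sketch (adapted from the question's own load job; the exact options may need adjusting for your setup):

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("your-project.your-dataset.your-table")  # schema already updated

config = bigquery.LoadJobConfig()
config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
# Reuse the table's (already updated) schema instead of autodetect,
# so the REQUIRED modes of full_name and age are not relaxed.
config.schema = table.schema

record = {'full_name': 'Karen Walker', 'age': 48, 'zipcode': '63021'}
job = client.load_table_from_json([record], table, job_config=config)
job.result()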
I have JSON data with inconsistent fields.
{
    "Firsthouse": {
        "Doors": "10",
        "windows": "9"
    },
    "Secondhouse": {
        "doors": "1",
        "windows": "10",
        "pools": "2"
    }
}
In "Secondhouse" field "pools" is present while it is absent in "Firsthouse".
If I want to write an insert query, do I need to have 6 different queries for presence/absence of such fields, like below:
#This is a query when 3 fields are present
query = "insert into table (doors,windows,pools) values (%s,%s,%s)"
q_tup = data_list_3Fields
cursor.executemany(query, q_tup)
#This is a query when 4 fields are present
query = "insert into table (doors,windows,pools,floors) values (%s,%s,%s,%s)"
q_tup = data_list_4Fields
cursor.executemany(query, q_tup)
Is there a proper approach to do this?
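For illustration, the kind of dynamic construction that would avoid a separate query per field combination might look like this (an untested sketch, assuming the parsed JSON dict is in load_data and a MySQL cursor is available; column names are whitelisted so arbitrary keys never reach the SQL):

ALLOWED_FIELDS = ["doors", "windows", "pools", "floors", "chimneys"]

for house in load_data.values():
    record = {k.lower(): v for k, v in house.items()}  # normalise "Doors" vs "doors"
    cols = [f for f in ALLOWED_FIELDS if f in record]
    placeholders = ", ".join(["%s"] * len(cols))
    query = "insert into table ({}) values ({})".format(", ".join(cols), placeholders)
    cursor.execute(query, tuple(record[c] for c in cols))

Alternatively, a single five-column insert that passes None for the missing fields would also avoid the if-else branches, provided the table allows NULLs in those columns.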
I wrote a Flask REST implementation to receive the following data.
After checking the API key from the client, the server should store the data that arrives in the following API format. The issue I am facing is that there are many strings nested under the same field 'services', and I would appreciate any help.
{
    "id": "string",
    "termsAndConditions": "string",
    "offererBranchId": "string",
    "requesterBranchId": "string",
    "accepted": "2017-05-24T10:06:31.012Z",
    "services": [
        {
            "id": "string",
            "name": "string",
            "aggregationLevel": [
                "string"
            ],
            "aggregationMethod": [
                "string"
            ],
            "timestep": [
                "string"
            ]
        }
    ]
}
My code below works if the field 'services' holds a single string, like the other fields (i.e. "id", "termsAndConditions", etc.).
from flask import Flask, request, jsonify
from flask_pymongo import PyMongo
import json

app = Flask(__name__)
app.config['MONGO_DBNAME'] = 'demo'
app.config['MONGO_URI'] = 'mongodb://xxxx@xxxx.mlab.com:xxxx/demo'
mongo = PyMongo(app)
users = mongo.db.users

@app.route('/service-offer/confirmed/REQUESTER', methods=['POST'])
def serviceofferconfirmed():
    key = request.headers.get('X-API-Key')
    users = mongo.db.users
    api_record = users.find_one({'name': "apikey"})
    actual_API_key = api_record['X-API-Key']
    if key == actual_API_key:
        offer = {"id": request.json["id"],
                 "termsAndConditions": request.json["termsAndConditions"],
                 "offererBranchId": request.json["offererBranchId"],
                 "requesterBranchId": request.json["requesterBranchId"],
                 "accepted": request.json["accepted"],
                 "services": request.json["services"]  # Here I need help to match the schema.
                 }
        users.insert_one(offer)
        return "Service Data Successfully Stored"
    return jsonify("Please check your API Key or URL")
I wish to receive the whole 'services' value, which contains many nested strings, and store it under the field name 'services'.
You can use isinstance(request.json["services"], str).
If you don't want the value of services to be a string:

if not isinstance(request.json["services"], str):
    # your code ...
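For example, a small helper built on that check (a sketch; normalize_services is a name introduced here for illustration, not part of the original answer) could keep the nested 'services' list intact while still handling the single-string case:

def normalize_services(value):
    # Accept either the expected list of service objects or a single string;
    # reject anything else so malformed payloads never reach MongoDB.
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        return [value]
    raise ValueError("Field 'services' must be a list or a string")

Inside the route, offer["services"] = normalize_services(request.json["services"]) would then store the whole nested structure under 'services' unchanged.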