DataFrame to MongoDB - python

I have a deeply nested dictionary containing pandas DataFrames, like:
{HEAD:
{NameOne:
{TAG : VALUE}
}
{NameTwo : DataFrame}
{NameThree : DataFrame}
}
and I want to send it to MongoDB via PyMongo
client = MongoClient('mylink')
db = client['DB_NAME']
collection = db['COLLECTION_NAME']
file = {...}
collection.insert_one(file)
But I have this error:
bson.errors.InvalidDocument: cannot encode object: (it shows my DataFrame here) of type:

PyMongo needs to be able to convert each element of the dictionary into something it can store as a BSON document. If you try to insert something it can't convert (such as a pandas DataFrame), you will see the InvalidDocument exception.
You will have to convert each of the embedded DataFrames into something PyMongo can encode before you can store the document in MongoDB.
You could start with df.to_dict().
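Here is a minimal sketch of that idea, assuming the nested structure from the question; convert_dataframes and the sample data are illustrative names, not part of PyMongo:
import pandas as pd
from pymongo import MongoClient

def convert_dataframes(obj):
    # Recursively replace any DataFrame in a nested dict with plain Python structures.
    if isinstance(obj, pd.DataFrame):
        return obj.to_dict(orient="records")  # list of row dicts; pick the orient you need
    if isinstance(obj, dict):
        return {key: convert_dataframes(value) for key, value in obj.items()}
    return obj

client = MongoClient('mylink')
collection = client['DB_NAME']['COLLECTION_NAME']

nested = {"HEAD": {"NameOne": {"TAG": "VALUE"}, "NameTwo": pd.DataFrame({"a": [1, 2]})}}
collection.insert_one(convert_dataframes(nested))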

Related

insert list,values into mysql table

I need to insert a list and some values into a table.
I have tried executemany but it didn't work.
list1=['a','b','c','d','e',['f','g','h','i']]
query="insert into "+metadata_table_name+"(created_by,Created_on,File_Path,Category,File_name,Fields) values(%s,%s,%s,%s,%s,%s)" # inserting the new record
cursor.executemany(query,list1)
The list should be entered into the last (Fields) column.
Please help me.
Thanks in Advance.
You have to think about data types. Does MySQL have a suitable data type for Python's nested lists? I don't know of such a type.
A possible solution is to use JSON encoding and store the list as a string in the MySQL table. Encode the last element of your list to a JSON string:
import json
list1=['a','b','c','d','e',['f','g','h','i']]
query_params = list1[0:-1] + [json.dumps(list1[-1])]  # encode only the nested list
query="insert into "+metadata_table_name+"(created_by,Created_on,File_Path,Category,File_name,Fields) values(%s,%s,%s,%s,%s,%s)" # inserting the new record
cursor.execute(query, query_params)  # a single row, so execute rather than executemany
To use the stored data later, convert the JSON string back to a list:
fields_list = json.loads(fields_str)
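A short read-back sketch under the same assumptions (the table name and Fields column come from the question; the WHERE clause is illustrative):
import json

cursor.execute("select Fields from "+metadata_table_name+" where File_name=%s", ('e',))
(fields_str,) = cursor.fetchone()
fields_list = json.loads(fields_str)  # ['f', 'g', 'h', 'i']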

How to parse a response from API query (in JSON {"key":value} format) into a list of variables so that it can be input into a SQL table?

I am using Python's requests module to query an API to retrieve information, which comes in the form of a JSON object like: {"key":value, "key": value, "key": value, etc.}. There are multiple key:value pairs, and the values are either strings, integers, or floats. I would like to put this information into a SQL table, where the "key" corresponds to a column heading and the "value" corresponds to an entry in that column. How can I parse the JSON object so that instead of being in {"key":value} format, each "key" becomes a variable holding its "value"? In other words:
From: {"key":value}
To: key = value
The code for querying the API looks like:
req = requests.get('URL')
If I convert it into text (data = req.text), you can see what the JSON object from the API query looks like:
print(data)
print(type(data))
What gets returned is in the form of this:
[{"key1":19.0,"key2":"D4AE057C1E4A","key3":-66,"key4":1530240344}]
<class 'str'>
What do I need to do to get it to look like:
key1 = 19.0,
key2 = "D4AE057C1E4A",
key3 = -66,
key4 = 1530240344
? I need the data to be in the form of variables, so that I can then put it into my code for inserting the object as a row into the table in the SQL database. That code looks like:
session = examples.get_session('name_of_database', 'name_of_user')
x=examples.name_of_table(key1=19,key2="D4AE057C1E4A",key3=-66,key4=1530240344)
session.add(x)
session.commit()
import json
data = json.loads('[{"key1":19.0,"key2":"D4AE057C1E4A","key3":-66,"key4":1530240344}]')
for key, value in data[0].items():
    exec("%s = %r" % (key, value))  # repr keeps numbers as numbers and strings quoted
This iterates through the dict and creates variables from the key names.
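A hedged alternative sketch, assuming name_of_table accepts these keys as keyword arguments (as in the question's example call): unpack the parsed dict directly instead of creating top-level variables with exec.
import json

data = json.loads(req.text)  # req from requests.get('URL') above
row = examples.name_of_table(**data[0])  # key1=19.0, key2="D4AE057C1E4A", ...
session.add(row)
session.commit()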

Inserting nested json into a column in Postgres

I'm writing an insert function and I have a nested dictionary that I want to insert into a column in Postgres. Is there a way to insert the whole JSON into the column? Let's say I have to insert the value of the key "val" into a column; how can I achieve that? I'm using the psycopg2 library in my Python code.
"val": {
"name": {
"mike": "2.3",
"roy": "4.2"
}
}
Yes, you can extract nested JSON using at least Postgres 9.4 and up by casting your string to JSON and using the "Get JSON object field by key" operator:
YOUR_STRING  ::CAST  JSON_OPERATOR
'{"val":1}'  ::JSON  -> 'val'
This works in at least Postgres 9.4 and up:
INSERT INTO my_table (my_json)
VALUES ('{"val":{"name":{"mike":"2.3"}}}'::JSON->'val');
Depending on your column type you may choose to cast to JSONB instead of JSON (the above will only work for TEXT and JSON).
INSERT INTO my_table (my_json)
VALUES ('{"val":{"name":{"mike":"2.3"}}}'::JSONB->'val');
See: https://www.postgresql.org/docs/9.5/static/functions-json.html
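Since the question mentions psycopg2, here is a minimal sketch on the Python side, assuming my_table/my_json are placeholder names and the column is of type json or jsonb; psycopg2's Json adapter serializes the dict for you:
import psycopg2
from psycopg2.extras import Json

payload = {"val": {"name": {"mike": "2.3", "roy": "4.2"}}}

conn = psycopg2.connect("dbname=mydb user=myuser")
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO my_table (my_json) VALUES (%s)",
        [Json(payload["val"])],  # stores only the value of "val", as in the question
    )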

Why does db.insert(dict) add _id key to the dict object while using pymongo

I am using pymongo in the following way:
from pymongo import *
a = {'key1':'value1'}
db1.collection1.insert(a)
print a
This prints
{'_id': ObjectId('53ad61aa06998f07cee687c3'), 'key1': 'value1'}
on the console.
I understand that _id is added to the Mongo document. But why is this added to my Python dictionary too? I did not intend to do this. I am wondering what the purpose of this is. I could be using this dictionary for other purposes too, and the dictionary gets updated as a side effect of inserting it into the database. If I have to, say, serialise this dictionary into a JSON object, I will get a
ObjectId('53ad610106998f0772adc6cb') is not JSON serializable
error. Shouldn't the insert function keep the dictionary unchanged while inserting the document into the db?
Like many other database systems out there, PyMongo will add the unique identifier necessary to retrieve the data from the database as soon as it's inserted (what would happen if you inserted two dictionaries with the same content {'key1':'value1'} into the database? How would you distinguish that you want this one and not that one?)
This is explained in the PyMongo docs:
When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection.
If you want to change this behavior, you could give the object an _id attribute before inserting. In my opinion, this is a bad idea. It would easily lead to collisions and you would lose juicy information that is stored in a "real" ObjectId, such as creation time, which is great for sorting and things like that.
>>> a = {'_id': 'hello', 'key1':'value1'}
>>> collection.insert(a)
'hello'
>>> collection.find_one({'_id': 'hello'})
{u'key1': u'value1', u'_id': u'hello'}
Or if your problem comes when serializing to Json, you can use the utilities in the BSON module:
>>> a = {'key1':'value1'}
>>> collection.insert(a)
ObjectId('53ad6d59867b2d0d15746b34')
>>> from bson import json_util
>>> json_util.dumps(collection.find_one({'_id': ObjectId('53ad6d59867b2d0d15746b34')}))
'{"key1": "value1", "_id": {"$oid": "53ad6d59867b2d0d15746b34"}}'
(you can verify that this is valid json in pages like jsonlint.com)
_id acts as a primary key for documents; unlike in SQL databases, it's required in MongoDB.
To make _id serializable, you have two options:
set _id to a JSON-serializable datatype in your documents before inserting them (e.g. int, str), but keep in mind that it must be unique per document.
use custom BSON serialization encoder/decoder classes:
import json
from bson.json_util import default as bson_default
from bson.json_util import object_hook as bson_object_hook

class BSONJSONEncoder(json.JSONEncoder):
    def default(self, o):
        return bson_default(o)

class BSONJSONDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        json.JSONDecoder.__init__(self, object_hook=bson_object_hook, **kwargs)
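A short usage sketch for these classes (reusing the imports above; the document shape is illustrative):
from bson import ObjectId

doc = {"_id": ObjectId(), "key1": "value1"}
as_json = json.dumps(doc, cls=BSONJSONEncoder)          # ObjectId becomes {"$oid": "..."}
round_trip = json.loads(as_json, cls=BSONJSONDecoder)   # "$oid" is turned back into an ObjectId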
As @BorrajaX already answered, I want to add some more.
_id is a unique identifier; when a document is inserted into the collection, one is generated for you if you don't supply it. You can either set your own _id or use the one MongoDB creates for you, as the documentation mentions.
For your case, you can simply remove this key with the del keyword: del a["_id"].
or
if you need _id for further operations, you can use dumps from the bson module:
import json
from bson.json_util import dumps as bson_dumps
a["_id"] = json.loads(bson_dumps(a["_id"]))
or
Before inserting the document you can add your own custom _id; then you won't need to serialize your dictionary:
a["_id"] = "some_id"
db1.collection1.insert(a)
This behavior can be circumvented by using the copy module. This will pass a copy of the dictionary to PyMongo, leaving the original intact. Based on the code snippet in your example, one should modify it like so:
import copy
from pymongo import *
a = {'key1':'value1'}
db1.collection1.insert(copy.copy(a))
print a
The docs clearly answer your question:
MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents, though it contains more data types than JSON.
The value of a field can be any of the BSON data types, including other documents, arrays, and arrays of documents. The following document contains values of varying types:
var mydoc = {
_id: ObjectId("5099803df3f4948bd2f98391"),
name: { first: "Alan", last: "Turing" },
birth: new Date('Jun 23, 1912'),
death: new Date('Jun 07, 1954'),
contribs: [ "Turing machine", "Turing test", "Turingery" ],
views : NumberLong(1250000)
}
See the MongoDB documentation to learn more about BSON.

mongodb update the value for all keys based on a python function

I am using pymongo
I have a MongoDB database in which all documents have a
"timestamp" : "25-OCT-2011"
So a string is stored in the key timestamp in all documents.
I want to apply the Python function below to these string dates and convert them into datetime objects. What's the best way to do this in MongoDB?
import datetime
def make_date(str_date):
    return datetime.datetime.strptime(str_date, "%d-%b-%Y")
To fit your needs:
import bson

for document in list(database.collection.find({})):
    converted_date = make_date(document['timestamp'])
    database.collection.update(
        {"_id": bson.objectid.ObjectId(document['_id'])},
        {"$set": {"converted": converted_date}},  # $set keeps the rest of the document intact
    )
I use the ObjectId as a query to be sure that I update the document that I just got. I do that because I'm unsure whether timestamp collisions would lead to unwanted consequences.
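In current PyMongo versions collection.update is deprecated, so here is a hedged sketch of the same loop with update_one (whether you overwrite timestamp or write a separate converted field is up to you):
for document in database.collection.find({}):
    database.collection.update_one(
        {"_id": document["_id"]},
        {"$set": {"timestamp": make_date(document["timestamp"])}},
    )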
