Best approach to JSON info to insert into a database - python

I have managed to extract some information from an API, but the format it's in is hard for a novice programmer like me. I can save it to a file or move it to a new list etc., but what stumps me is: should I not mess with the data and insert it as is, or do I deconstruct it into a more human-readable format and use that afterwards?
The JSON was already difficult as it was a nested dictionary, and the value was a list. So after trying things out I want it to actually sit in a database. I am using PostgreSQL as the database for now and am learning Python.
response = requests.post(url3, headers=headers)
jsonResponse = response.json()
my_data = jsonResponse['message_response']['scanresults'][:]

store_list = []
for item in my_data:
    dev_details = {"mac": None, "user": None, "resource_name": None}
    dev_details['mac'] = item['mac_address']
    dev_details['user'] = item['agent_logged_on_users']
    dev_details['devName'] = item['resource_name']
    store_list.append(dev_details)

try:
    connection = psycopg2.connect(
        user="",
        other_info="")
    # create cursor to perform db actions
    curs = connection.cursor()
    sql = "INSERT INTO public.tbl_devices (mac, user, devName) VALUES (%(mac)s, %(user)s, %(devName)s);"
    curs.execute(sql, store_list)
    connection.commit()
finally:
    if (connection):
        curs.close()
        connection.close()
        print("Connection terminated")
I have ended up with dictionaries as records inside a list:
[{rec1},{rec2}..etc]
And naturally, when putting the info in the database, it complains about "list indices must be integers or slices", so I am after some advice on A) the way to add this into a database table or B) whether to use a different approach.
Many thanks in advance

Good that you ask! The answer is almost certainly that you should not just dump the JSON into the database as it is. That makes things easy in the beginning, but you'll pay the price when you try to query or modify the data later.
For example, if you have data like
[
{ "name": "a", "keys": [1, 2, 3] },
{ "name": "b", "keys": [4, 5, 6] }
]
create tables
CREATE TABLE key_list (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);
CREATE TABLE key (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    value integer NOT NULL,
    key_list_id bigint REFERENCES key_list NOT NULL
);
and store the values in that fashion.
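For illustration, here is a minimal psycopg2 sketch of loading the example JSON into those two tables; the connection settings are placeholders, and RETURNING is used to link child rows to their parent:

import json
import psycopg2

# placeholder connection settings -- adjust to your environment
conn = psycopg2.connect(dbname="mydb", user="me", password="secret", host="localhost")
cur = conn.cursor()

records = json.loads('[{"name": "a", "keys": [1, 2, 3]}, {"name": "b", "keys": [4, 5, 6]}]')

for rec in records:
    # insert the parent row and fetch its generated id
    cur.execute("INSERT INTO key_list (name) VALUES (%s) RETURNING id;", (rec["name"],))
    key_list_id = cur.fetchone()[0]
    # insert one child row per element of the list
    cur.executemany(
        "INSERT INTO key (value, key_list_id) VALUES (%s, %s);",
        [(v, key_list_id) for v in rec["keys"]],
    )

conn.commit()
cur.close()
conn.close()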

Related

SQL connection problems. ProgrammingError: Incorrect number of bindings supplied. The current statement uses 10, and there are 120 supplied

I have seen there are a lot of posts like this. I have also considered the feedback on the posts but there is a new error regarding incorrect number of bindings.
I created a table in SQLite:
conn = sqlite3.connect('AQM_2022.db')
c = conn.cursor()
c.execute('''CREATE TABLE Reg2
(CPI,
UNR INT NOT NULL,
M1 INT NOT NULL,
M2 INT NOT NULL,
IMP INT NOT NULL,
EXP INT NOT NULL,
RetailSales INT NOT NULL,
GBBalance INT NOT NULL,
PPI INT NOT NULL,
const INT)''')
print("Table created successfully")*
And i want to export following numbers to my SQL database:
index1=dfGB.index.strftime('%Y-%m-%d %H-%M-%S')
dfGB['Date1']=index1
dfGB.head(5)
I converted it into lists
records_to_insert = dfGB.values.tolist()
records_to_insert
But when I want to export it to SQL:
c = conn.cursor()
c.executemany("INSERT INTO Reg2(CPI,UNR,M1,M2,IMP,EXP,RetailSales,GBBalance,PPI,const) VALUES (?,?,?,?,?,?,?,?,?,?)", [records_to_insert])
conn.commit()
conn.close()
The following error pops up:
ProgrammingError: Incorrect number of bindings supplied. The current statement uses 10, and there are 120 supplied.
Does somebody know what the problem could be?
Best regards
You need to provide a list of rows to sqlite3.Cursor.executemany:
You are providing a flat list of 120 values.
Something along the lines of
recs = dfGB.values.tolist()
recs = [recs[v:v + 10] for v in range(0, len(recs), 10)]
should provide you with a correctly chunked list of lists of 10 items each.
If your list runs into millions of elements, you may want to chunk iteratively instead of creating a new list: How do you split a list into evenly sized chunks?
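If you prefer the iterative route, here is a small sketch with a generator (a hypothetical helper, assuming recs really is a flat list whose length is a multiple of 10, and reusing conn and c from the question):

def chunks(flat, size):
    # yield successive slices of `size` items without building a second list
    for start in range(0, len(flat), size):
        yield flat[start:start + size]

recs = dfGB.values.tolist()
c.executemany(
    "INSERT INTO Reg2(CPI,UNR,M1,M2,IMP,EXP,RetailSales,GBBalance,PPI,const) VALUES (?,?,?,?,?,?,?,?,?,?)",
    chunks(recs, 10),
)
conn.commit()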

Targeting specific values from JSON API and inserting into Postgresql, using Python

Right now I am able to connect to the URL API and my database. I am trying to insert data from the URL into the PostgreSQL database using psycopg2. I don't fully understand how to do this, and this is all I could come up with.
import urllib3
import json
import certifi
import psycopg2
from psycopg2.extras import Json
http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where())

url = '<API-URL>'
headers = urllib3.util.make_headers(basic_auth='<user>:<passowrd>')
r = http.request('GET', url, headers=headers)
data = json.loads(r.data.decode('utf-8'))

def insert_into_table(data):
    for item in data['issues']:
        item['id'] = Json(item['id'])

    with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
        with conn.cursor() as cursor:
            query = """
                INSERT into
                    Countries
                    (revenue)
                VALUES
                    (%(id)s);
                """
            cursor.executemany(query, data)
            conn.commit()

insert_into_table(data)
So this code gives me a TypeError: string indices must be integers on cursor.executemany(query, data).
So I know that json.loads brings back an object and that json.dumps brings back a string. I wasn't sure which one I should be using, and I know I am completely missing something on how I'm targeting the 'id' value and inserting it into the query.
Also, a little about the API: it is very large and complex, and eventually I'll have to go down multiple trees to grab certain values. Here is an example of what I'm pulling from.
I am trying to grab "id" under "issues" and not "issue type"
{
    "expand": "<>",
    "startAt": 0,
    "maxResults": 50,
    "total": 13372,
    "issues": [
        {
            "expand": "<>",
            "id": "41508",
            "self": "<>",
            "key": "<>",
            "fields": {
                "issuetype": {
                    "self": "<>",
                    "id": "1",
                    "description": "<>",
                    "iconUrl": "<>",
                    "name": "<>",
                    "subtask": <>,
                    "avatarId": <>
                },
First, extract ids into a list of tuples:
ids = list((item['id'],) for item in data['issues'])
# example ids: [('41508',), ('41509',)]
Next use the function extras.execute_values():
from psycopg2 import extras
query = """
INSERT into Countries (revenue)
VALUES %s;
"""
extras.execute_values(cursor, query, ids)
Why was I getting type errors?
The second argument of the function executemany(query, vars_list) should be a sequence, while data is an object whose elements cannot be accessed by integer indexes.
Why use execute_values() instead of executemany()?
Because of performance: the first function executes a single query with multiple arguments, while the second one executes as many queries as there are arguments.
Note that by default the third argument of execute_values() is a list of tuples, which is why we extracted the ids in just this way.
If you have to insert values into more than one column, each tuple in the list should contain all the values for a single inserted row, example:
values = list((item['id'], item['key']) for item in data['issues'])
query = """
INSERT into Countries (id, revenue)
VALUES %s;
"""
extras.execute_values(cur, query, values)
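Putting it together, here is a minimal sketch reusing the connection settings from the question (and assuming data has already been fetched as above):

import psycopg2
from psycopg2 import extras

with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
    with conn.cursor() as cursor:
        ids = [(item['id'],) for item in data['issues']]
        extras.execute_values(
            cursor,
            "INSERT INTO Countries (revenue) VALUES %s;",
            ids,
        )
    conn.commit()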
If you're trying to get just the id and insert it into your table, you should try
ids = []
for i in data['issues']:
ids.append(i['id'])
Then you can pass your ids list to your cursor.executemany function.
The issue you have is not in the way you are parsing your JSON, it occurs when you try to insert it into your table using cursor.executemany().
data is a single object. Are you attempting to insert all of the data your fetch returns into your table at once, or are you trying to insert a specific part of the data (a list of issue IDs)?
You are passing data into your cursor.executemany call. data is an object. I believe you wish to pass data['issues'], which is the list of issues that you modified.
If you only wish to insert the ids into the table try this:
def insert_into_table(data):
    with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
        with conn.cursor() as cursor:
            query = """
                INSERT into
                    Countries
                    (revenue)
                VALUES
                    (%(id)s);
                """
            for item in data['issues']:
                item['id'] = Json(item['id'])
                cursor.execute(query, item)
            conn.commit()

insert_into_table(data)
If you wish to keep the efficiency of using cursor.executemany(), you need to create an array of the IDs, as the current object structure doesn't arrange them the way cursor.executemany() requires.
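For example, a hedged sketch of that executemany() variant (insert_ids is just an illustrative name), keeping the named %(id)s placeholder and binding one small dict per row:

import psycopg2

def insert_ids(data):
    # one mapping per row, so executemany() can bind %(id)s for each issue
    rows = [{'id': item['id']} for item in data['issues']]
    with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
        with conn.cursor() as cursor:
            cursor.executemany("INSERT INTO Countries (revenue) VALUES (%(id)s);", rows)
        conn.commit()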

How to save dictionaries of different lengths to the same table in database?

What would be the most elegant way to save multiple dictionaries - most of them following the same structure, but some having more/less keys - to the same SQL database table?
The steps I can think of are the following:
Determine which dictionary has the most keys and then create a table which follows that dictionary's key order.
Sort every dictionary to match this column order.
Insert each dictionary's values into the table. Do not insert anything (possible?) if for a particular table column no key exists in the dictionary.
Some draft code I have:
man1dict = {
    'name': 'bartek',
    'surname': 'wroblewski',
    'age': 32,
}
man2dict = {
    'name': 'bartek',
    'surname': 'wroblewski',
    'city': 'wroclaw',
    'age': 32,
}

with sqlite3.connect('man.db') as conn:
    cursor = conn.cursor()

    # create table - how do I create it automatically from man2dict (the longer one) dictionary, also assigning the data type?
    cursor.execute('CREATE TABLE IF NOT EXISTS People(name TEXT, surname TEXT, city TEXT, age INT)')

    # show table
    cursor.execute('SELECT * FROM People')
    print(cursor.fetchall())

    # insert into table - this will give 'no such table' error if dict does not follow table column order
    cursor.execute('INSERT INTO People VALUES('+str(man1dict.values())+')', conn)
Use a NoSQL database such as MongoDB for this purpose; it will handle varying keys by itself. Using a relational database for data that is not relational is an anti-pattern: it will break your code, degrade your application's scalability, and make it more cumbersome to change the table structure later.
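If you do go down that road, a minimal pymongo sketch (the database and collection names here are made up) shows that documents with different key sets can sit in the same collection:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
people = client["mandb"]["people"]  # hypothetical database and collection names

# documents with different sets of keys can be stored side by side
people.insert_one({'name': 'bartek', 'surname': 'wroblewski', 'age': 32})
people.insert_one({'name': 'bartek', 'surname': 'wroblewski', 'city': 'wroclaw', 'age': 32})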
It might be easiest to save the dict as a pickle and then unpickle it later, i.e.
import pickle, sqlite3

# SAVING
my_pickle = pickle.dumps({"name": "Bob", "age": 24})
conn = sqlite3.connect("test.db")
c = conn.cursor()
c.execute("CREATE TABLE test (dict BLOB)")
conn.commit()
c.execute("insert into test values (?)", (my_pickle,))
conn.commit()

# RETRIEVING
b = [n[0] for n in c.execute("select dict from test")]
dicts = []
for d in b:
    dicts.append(pickle.loads(d))
print(dicts)
This outputs:
[{'name': 'Bob', 'age': 24}]

How to create a BAR Chart using information from mysql table?

How can I create a plot using information from a database table (MySQL)? For the x axis I would like to use the id column and for the y axis I would like to use the items in cart(number) column. You can use any library you want if it gives the result that I would like to have. Now in my plot (I attached the photo) the x axis shows an interval of 500 (0, 500, 1000, etc.), but I would like to have the ids (1, 2, 3, 4, ..., 3024), and for the y axis I would like to see the items in cart. I attached the code. I will appreciate any help.
import pymysql
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
conn = pymysql.connect(host='localhost', user='root', passwd='', db='amazon_cart')
cur = conn.cursor()
x = cur.execute("SELECT `id`,`items in cart(number)`,`product title` FROM `csv_9_05`")
plt.xlabel('Product Id')
plt.ylabel('Items in cart(number)')
rows = cur.fetchall()
df = pd.DataFrame([[xy for xy in x] for x in rows])
x=df[0]
y=df[1]
plt.bar(x,y)
plt.show()
cur.close()
conn.close()
SQL OF THE TABLE
DROP TABLE IF EXISTS `csv_9_05`;
CREATE TABLE IF NOT EXISTS `csv_9_05` (
`id` int(50) NOT NULL AUTO_INCREMENT,
`product title` varchar(2040) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`product price` varchar(55) NOT NULL,
`items in cart` varchar(2020) DEFAULT NULL,
`items in cart(number)` varchar(50) DEFAULT NULL,
`link` varchar(2024) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3025 DEFAULT CHARSET=latin1;
Hm... I think restructuring your database is going to make a lot of things much easier for you. Given the schema you've provided here, I would recommend increasing the number of tables you have and doing some joins. Also, your data type for integer values (the number of items in a cart) should be int, not varchar. Your table fields shouldn't have spaces in their names, and I'm not sure why a product's id and the number of products in a cart are given a 1-to-1 relationship.
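For illustration only, here is a rough sketch of what a more normalized layout could look like, reusing the pymysql connection from the question; the table and column names are made up, not taken from your data:

cur = conn.cursor()

# a products table holding one row per product
cur.execute("""
    CREATE TABLE products (
        id INT NOT NULL AUTO_INCREMENT,
        title VARCHAR(255) NOT NULL,
        price DECIMAL(10, 2),
        link VARCHAR(2024),
        PRIMARY KEY (id)
    )
""")

# a separate table recording how many of each product are in a cart
cur.execute("""
    CREATE TABLE cart_items (
        product_id INT NOT NULL,
        quantity INT NOT NULL,
        FOREIGN KEY (product_id) REFERENCES products (id)
    )
""")
conn.commit()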
But that's a separate issue. Just rebuilding this database is probably going to be more work than the specific task you're asking about. You really should reformat your DB, and if you have questions about how, please tell me. But for now I'll try to answer your question based on your current configuration.
I'm not terribly well versed in Pandas, so I'll answer this without the use of that module.
If you declare your cursor like so:
cursor = conn.cursor(pymysql.cursors.DictCursor)
cursor.execute("SELECT `id`,`items in cart(number)`,`product title` FROM `csv_9_05`")
Then your rows will be returned as a list of 3024 dictionaries, i.e.:
rows = cursor.fetchall()
# this will produce the following list:
# rows = [
# {'id': 1, 'items in cart(number)': 12, 'product_title': 'hammer'},
# {'id': 2, 'items in cart(number)': 5, 'product_title': 'nails'},
# {...},
# {'id': 3024, 'items in cart(number)': 31, 'product_title': 'watermelons'}
# ]
Then, plotting becomes really easy.
plt.figure(1)
plt.bar([x['id'] for x in rows], [y['items in cart(number)'] for y in rows])
plt.xlabel('Product Id')
plt.ylabel('Items in cart(number)')
plt.show()
plt.close()
I think that should do it.

Insert Values from dictionary into sqlite database

I cannot get my head around it.
I want to insert the values of a dictionary into an SQLite database.
url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5f...1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url)) #soup it up
for data in soup.find_all('photo'): #parsing the data
dict = { #filter the data, find_all creats dictionary KEY:VALUE
"id_p": data.get('id'),
"title_p": data.get('title'),
"tags_p": data.get('tags'),
"latitude_p": data.get('latitude'),
"longitude_p": data.get('longitude'),
}
#print (dict)
connector.execute("insert into DATAGERMANY values (?,?,?,?,?)", );
connector.commit()
connector.close
My keys are id_p, title_p etc. and the values I retrieve through data.get.
However, I cannot insert them.
When I try to write id, title, tags, latitude, longitude behind ...DATAGERMANY values (?,?,?,?,?)", ); I get
NameError: name 'title' is not defined.
I tried it with dict.values and dict, but then it's saying table DATAGERMANY has 6 columns but 5 values were supplied.
Adding another ? gives me the error (with dict.values): ValueError: parameters are of unsupported type.
This is how I created the db and table.
#creating SQLite Database and Table
connector = sqlite3.connect("GERMANY.db") #create Database and Table, check if NOT NULL is a good idea
connector.execute('''CREATE TABLE DATAGERMANY
(id_db INTEGER PRIMARY KEY AUTOINCREMENT,
id_photo INTEGER NOT NULL,
title TEXT,
tags TEXT,
latitude NUMERIC NOT NULL,
longitude NUMERIC NOT NULL);''')
The method should work even if there is no value to fill into the database... That can happen as well.
You can use named parameters and insert all rows at once using executemany().
As a bonus, you would get a good separation of html-parsing and data-pipelining logic:
data = [{"id_p": photo.get('id'),
"title_p": photo.get('title'),
"tags_p": photo.get('tags'),
"latitude_p": photo.get('latitude'),
"longitude_p": photo.get('longitude')} for photo in soup.find_all('photo')]
connector.executemany("""
INSERT INTO
DATAGERMANY
(id_photo, title, tags, latitude, longitude)
VALUES
(:id_p, :title_p, :tags_p, :latitude_p, :longitude_p)""", data)
Also, don't forget to actually call the close() method:
connector.close()
FYI, the complete code:
import sqlite3
from urllib2 import urlopen
from bs4 import BeautifulSoup
url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5f...1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url))
connector = sqlite3.connect(":memory:")
cursor = connector.cursor()
cursor.execute('''CREATE TABLE DATAGERMANY
(id_db INTEGER PRIMARY KEY AUTOINCREMENT,
id_photo INTEGER NOT NULL,
title TEXT,
tags TEXT,
latitude NUMERIC NOT NULL,
longitude NUMERIC NOT NULL);''')
data = [{"id_p": photo.get('id'),
"title_p": photo.get('title'),
"tags_p": photo.get('tags'),
"latitude_p": photo.get('latitude'),
"longitude_p": photo.get('longitude')} for photo in soup.find_all('photo')]
cursor.executemany("""
INSERT INTO
DATAGERMANY
(id_photo, title, tags, latitude, longitude)
VALUES
(:id_p, :title_p, :tags_p, :latitude_p, :longitude_p)""", data)
connector.commit()
cursor.close()
connector.close()
As written, your connector.execute() statement is missing the parameters argument.
It should be used like this:
connector.execute("insert into some_time values (?, ?)", ["question_mark_1", "question_mark_2"])
Unless you need the dictionary for later, I would actually use a list or tuple instead:
row = [
data.get('id'),
data.get('title'),
data.get('tags'),
data.get('latitude'),
data.get('longitude'),
]
Then your insert statement becomes:
connector.execute("insert into DATAGERMANY values (NULL,?,?,?,?,?)", *row)
Why these changes?
The NULL in the values (NULL, ...) is so the auto-incrementing primary key will work
The list instead of the dictionary because order is important, and dictionaries don't preserve order
The *row so the five-element row variable will be expanded (see here for details).
Lastly, you shouldn't use dict as a variable name, since that shadows the built-in dict type in Python.
If you're using Python 3.6 or above, you can do this for dicts:
dict_data = {
'filename' : 'test.txt',
'size' : '200'
}
table_name = 'test_table'
attrib_names = ", ".join(dict_data.keys())
attrib_values = ", ".join("?" * len(dict_data.keys()))
sql = f"INSERT INTO {table_name} ({attrib_names}) VALUES ({attrib_values})"
cursor.execute(sql, list(dict_data.values()))
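Here is a short usage sketch around that snippet, assuming a SQLite table whose columns match the dict keys (test_table is hypothetical). Note that the column names are interpolated into the SQL string, so they should come from trusted code rather than user input:

import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE test_table (filename TEXT, size TEXT)")

dict_data = {'filename': 'test.txt', 'size': '200'}
attrib_names = ", ".join(dict_data.keys())
attrib_values = ", ".join("?" * len(dict_data))
cursor.execute(
    f"INSERT INTO test_table ({attrib_names}) VALUES ({attrib_values})",
    list(dict_data.values()),
)
conn.commit()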
