Right now I am able to connect to the URL API and to my database. I am trying to insert data from the URL into the PostgreSQL database using psycopg2. I don't fully understand how to do this, and this is all I could come up with:
import urllib3
import json
import certifi
import psycopg2
from psycopg2.extras import Json
http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where())
url = '<API-URL>'
headers = urllib3.util.make_headers(basic_auth='<user>:<password>')
r = http.request('GET', url, headers=headers)
data = json.loads(r.data.decode('utf-8'))
def insert_into_table(data):

    for item in data['issues']:
        item['id'] = Json(item['id'])

    with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
        with conn.cursor() as cursor:
            query = """
                INSERT INTO
                    Countries
                    (revenue)
                VALUES
                    (%(id)s);
            """
            cursor.executemany(query, data)
            conn.commit()
insert_into_table(data)
This code gives me a TypeError: string indices must be integers on the cursor.executemany(query, data) line.
I know that json.loads returns a Python object and json.dumps returns a string, but I wasn't sure which one I should be using. I also know I'm completely missing something in how I'm targeting the 'id' value and inserting it into the query.
A little about the API: it is very large and complex, and eventually I'll have to go down multiple trees to grab certain values. Here is an example of what I'm pulling from.
I am trying to grab "id" under "issues", not the one under "issuetype".
{
    "expand": "<>",
    "startAt": 0,
    "maxResults": 50,
    "total": 13372,
    "issues": [
        {
            "expand": "<>",
            "id": "41508",
            "self": "<>",
            "key": "<>",
            "fields": {
                "issuetype": {
                    "self": "<>",
                    "id": "1",
                    "description": "<>",
                    "iconUrl": "<>",
                    "name": "<>",
                    "subtask": <>,
                    "avatarId": <>
                },
First, extract ids into a list of tuples:
ids = list((item['id'],) for item in data['issues'])
# example ids: [('41508',), ('41509',)]
Next use the function extras.execute_values():
from psycopg2 import extras
query = """
INSERT into Countries (revenue)
VALUES %s;
"""
extras.execute_values(cursor, query, ids)
Why was I getting a TypeError?
The second argument of executemany(query, vars_list) should be a sequence, while data is an object whose elements cannot be accessed by integer indexes.
Why use execute_values() instead of executemany()?
Because of performance: the first executes a single query with multiple arguments, while the second executes as many queries as there are arguments.
Note that by default the third argument of execute_values() is a list of tuples, which is exactly why we extracted the ids that way.
If you have to insert values into more than one column, each tuple in the list should contain all the values for a single inserted row. For example:
values = list((item['id'], item['key']) for item in data['issues'])
query = """
    INSERT INTO Countries (id, revenue)
    VALUES %s;
"""
extras.execute_values(cursor, query, values)
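Putting the pieces together, a minimal sketch of the whole function (reusing the table, column, and connection settings from the question) could look like this:
import psycopg2
from psycopg2 import extras

def insert_into_table(data):
    # one-element tuples, one per row to insert
    ids = [(item['id'],) for item in data['issues']]

    with psycopg2.connect(database='test3', user='<username>',
                          password='<password>', host='localhost') as conn:
        with conn.cursor() as cursor:
            # execute_values expands the single %s into all the VALUES rows
            extras.execute_values(
                cursor,
                "INSERT INTO Countries (revenue) VALUES %s;",
                ids)
        conn.commit()

insert_into_table(data)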
If you're trying to get just the ids and insert them into your table, build a list of one-element tuples:
ids = []
for i in data['issues']:
    ids.append((i['id'],))
Then you can pass the ids list to your cursor.executemany() call, together with a positional %s placeholder.
The issue you have is not in the way you are parsing your JSON; it occurs when you try to insert it into your table using cursor.executemany().
data is a single object. Are you attempting to insert all of the data your fetch returns into your table at once, or are you trying to insert a specific part of the data (a list of issue IDs)?
You are passing data into your cursor.executemany() call, but data is an object. I believe you want to pass data['issues'], which is the list of issues that you modified.
If you only wish to insert the ids into the table, try this:
def insert_into_table(data):

    with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
        with conn.cursor() as cursor:
            query = """
                INSERT INTO
                    Countries
                    (revenue)
                VALUES
                    (%(id)s);
            """
            for item in data['issues']:
                # each item is a dict, so the named %(id)s placeholder picks out its 'id' key
                cursor.execute(query, item)
            conn.commit()
insert_into_table(data)
If you wish to keep the efficiency of cursor.executemany(), you need to build a list of the IDs, because the current object structure doesn't arrange them the way cursor.executemany() requires.
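For example, a rough sketch of that approach (reusing the cursor and connection from the question, and assuming a positional %s placeholder instead of the named one) might be:
ids = [(item['id'],) for item in data['issues']]  # list of one-element tuples
query = "INSERT INTO Countries (revenue) VALUES (%s);"
cursor.executemany(query, ids)
conn.commit()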
Related
I'm a beginner in Python and I want to finish my school project.
My program displays the results of a query from my PostgreSQL database and then formats the results of that query into a JSON file.
My code:
import psycopg2
import json

# Allows the connection to the database and displays the results in the terminal
def run(stmt):
    cur = psycopg2.connect(database='x', user='s', password='!', host='1').cursor()
    cur.execute(stmt)
    result = cur.fetchall()
    print(list(result))

# Query that displays data from the database
stmt = 'select cast(row_to_json(row) as text) from (SELECT id, nom, lon, lat FROM u.b_l_b_s JOIN u.b_s ON b_s.id = b_s_id JOIN u.b_l ON b_l.id = b_l_id JOIN u.b_s_t ON b_s_t.id = b_s_t_id ) row;'
run(stmt)

# Output:
[('{"id":370,"nom":"MORO","lon":47.466001,"lat":-18.852607}',), ('{"id":46,"nom":"NOROE","lon":47.473006,"lat":-18.852907}',), ('{"id":45,"nom":"ANORO PLAQUE","lon":47.473404,"lat":-18.850003}',)
I want all the data of the query to be recorded in a loop (max 10) like this:
{
    'stations': {
        result['label']: {
            'id': result['id'],
            'nom': result['nom'],
            'position': {
                'lat': result['lat'],
                'lon': result['lon']
            }
        },
        result['label']: {
            'id': result['id'],
            'nom': result['nom'],
            'position': {
                'lat': result['lat'],
                'lon': result['lon']
            }
        }
    }
}
Thank you in advance to those who will take the time to answer my question.
You're getting close, but you may want to convert your values at the end.
There are functions in PostgreSQL to return a JSON array (https://www.postgresql.org/docs/9.3/functions-json.html), but I think they're more complex than what you need.
Instead you can use other features of Psycopg2. It returns tuples by default, but you can ask it to return dictionaries. This saves you from the manual conversion you're doing.
from psycopg2.extras import RealDictCursor
def run(stmt):
    conn = psycopg2.connect(database='x', user='s', password='!', host='1')
    cur = conn.cursor(cursor_factory=RealDictCursor)
    ...
Next, you are asked to return only 10 records. You could do this in your query by adding LIMIT 10 at the end
SELECT id, nom, lon, lat FROM u.b_l_b_s JOIN u.b_s ON b_s.id = b_s_id JOIN u.b_l ON b_l.id = b_l_id JOIN u.b_s_t ON b_s_t.id = b_s_t_id LIMIT 10
or you can do that when you fetch using fetchmany instead of fetchall. This is especially good if the table is large.
result = cur.fetchmany(10)
When you use RealDictCursor your results look a bit weird: it's a list of RealDictRow objects, but they still behave as dictionaries.
>>> result
[RealDictRow([('id', 370), ...etc
>>> result[0]['id']
370
OK, so you have your ten records, and they are in a list of dictionaries. Now you can loop over the list. Assuming you're returning a variable called final_result, it could look like this:
final_result = {'stations': []}  # we collect each result as an element of this list

for record in result:
    final_result['stations'].append(record)

# finally, we convert it to JSON
json.dumps(final_result)
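If you need the nested shape shown in the question rather than a flat list, a sketch like the following could work; it assumes the plain SELECT id, nom, lon, lat ... LIMIT 10 query from above and uses nom as the key, since no label column is selected:
final_result = {'stations': {}}
for record in result:  # result comes from cur.fetchmany(10) with RealDictCursor
    final_result['stations'][record['nom']] = {
        'id': record['id'],
        'nom': record['nom'],
        'position': {
            'lat': record['lat'],
            'lon': record['lon'],
        },
    }
final_json = json.dumps(final_result)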
I have managed to extract some information from an API, but the format it's in is hard for a novice programmer like me. I can save it to a file or move it to a new list, etc., but what stumps me is: should I leave the data alone and insert it as is, or should I deconstruct it into a human-readable format before using it?
The JSON was already difficult as it was a nested dictionary, and the value was a list. After trying things out, I want it to actually sit in a database. I am using PostgreSQL as the database for now and am learning Python.
response = requests.post(url3, headers=headers)
jsonResponse = response.json()

my_data = jsonResponse['message_response']['scanresults'][:]

store_list = []
for item in my_data:
    dev_details = {"mac": None, "user": None, "resource_name": None}
    dev_details['mac'] = item['mac_address']
    dev_details['user'] = item['agent_logged_on_users']
    dev_details['devName'] = item['resource_name']
    store_list.append(dev_details)

try:
    connection = psycopg2.connect(
        user="",
        other_info="")
    # create cursor to perform db actions
    curs = connection.cursor()
    sql = "INSERT INTO public.tbl_devices (mac, user, devName) VALUES (%(mac)s, %(user)s, %(devName)s);"
    curs.execute(sql, store_list)
    connection.commit()
finally:
    if (connection):
        curs.close()
        connection.close()
        print("Connection terminated")
I have ended up with dictionaries as records inside a list:
[{rec1}, {rec2}, ...etc.]
Naturally, when putting the info in the database it complains about "list indices must be integers or slices", so I'm looking for advice on A) the way to add this into a database table or B) a different approach.
Many thanks in advance.
Good that you ask! The answer is almost certainly that you should not just dump the JSON into the database as it is. That makes things easy in the beginning, but you'll pay the price when you try to query or modify the data later.
For example, if you have data like
[
    { "name": "a", "keys": [1, 2, 3] },
    { "name": "b", "keys": [4, 5, 6] }
]
create two tables

CREATE TABLE key_list (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE key (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    value integer NOT NULL,
    key_list_id bigint REFERENCES key_list NOT NULL
);

and store the values in that fashion.
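A minimal sketch of loading the example data into those two tables with psycopg2 (connection settings and names here are illustrative, not taken from the question):
import psycopg2

rows = [
    {"name": "a", "keys": [1, 2, 3]},
    {"name": "b", "keys": [4, 5, 6]},
]

with psycopg2.connect(dbname="test", user="postgres", host="localhost") as conn:
    with conn.cursor() as cur:
        for row in rows:
            # parent row first, so we can reference its generated id
            cur.execute("INSERT INTO key_list (name) VALUES (%s) RETURNING id;",
                        (row["name"],))
            key_list_id = cur.fetchone()[0]
            for value in row["keys"]:
                cur.execute("INSERT INTO key (value, key_list_id) VALUES (%s, %s);",
                            (value, key_list_id))
    conn.commit()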
What would be the most elegant way to save multiple dictionaries - most of them following the same structure, but some having more/less keys - to the same SQL database table?
The steps I can think of are the following:
Determine which dictionary has the most keys and then create a table whose columns follow that dictionary's key order.
Sort every dictionary to match this column order.
Insert each dictionary's values into the table; insert nothing (is that possible?) for a column whose key does not exist in the dictionary.
Some draft code I have:
import sqlite3

man1dict = {
    'name': 'bartek',
    'surname': 'wroblewski',
    'age': 32,
}

man2dict = {
    'name': 'bartek',
    'surname': 'wroblewski',
    'city': 'wroclaw',
    'age': 32,
}

with sqlite3.connect('man.db') as conn:
    cursor = conn.cursor()

    # create table - how do I create it automatically from man2dict (the longer one) dictionary, also assigning the data type?
    cursor.execute('CREATE TABLE IF NOT EXISTS People(name TEXT, surname TEXT, city TEXT, age INT)')

    # show table
    cursor.execute('SELECT * FROM People')
    print(cursor.fetchall())

    # insert into table - this will give a 'no such table' error if the dict does not follow the table column order
    cursor.execute('INSERT INTO People VALUES(' + str(man1dict.values()) + ')', conn)
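A minimal sketch of the three steps listed above (assuming sqlite3, TEXT columns throughout, and that missing keys should simply become NULL; it recreates the People table from the collected keys):
import sqlite3

dicts = [man1dict, man2dict]

# 1. Collect the union of all keys; the widest dictionary wins automatically.
columns = []
for d in dicts:
    for key in d:
        if key not in columns:
            columns.append(key)

with sqlite3.connect('man.db') as conn:
    cursor = conn.cursor()

    # 2. Create the table from the collected column names.
    cursor.execute(f"CREATE TABLE IF NOT EXISTS People ({', '.join(c + ' TEXT' for c in columns)})")

    # 3. Insert each dict; dict.get() turns missing keys into NULL.
    placeholders = ', '.join('?' for _ in columns)
    for d in dicts:
        cursor.execute(
            f"INSERT INTO People ({', '.join(columns)}) VALUES ({placeholders})",
            [d.get(col) for col in columns])

    conn.commit()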
Use a NoSQL database such as MongoDB for this purpose; it handles schemaless documents itself. Using a relational table for something that is not relational is an anti-pattern: it will break your code, degrade your application's scalability, and make it more cumbersome to change the table structure later.
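For illustration, a minimal sketch of that route, assuming pymongo and a local MongoDB server (the database and collection names are made up here):
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
people = client["test_db"]["people"]

# documents with different key sets can live in the same collection unchanged
people.insert_many([man1dict, man2dict])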
It might be easiest to save the dict as a pickle and then unpickle it later, e.g.:
import pickle, sqlite3
# SAVING
my_pickle = pickle.dumps({"name": "Bob", "age": 24})
conn = sqlite3.connect("test.db")
c = conn.cursor()
c.execute("CREATE TABLE test (dict BLOB)")
conn.commit()
c.execute("insert into test values (?)", (my_pickle,))
conn.commit()
# RETRIEVING
b = [n[0] for n in c.execute("select dict from test")]
dicts = []
for d in b:
    dicts.append(pickle.loads(d))
print(dicts)
This outputs:
[{'name': 'Bob', 'age': 24}]
I am using Python 2.7, PyMongo and MongoDB. I'm trying to get rid of the default _id values in MongoDB. Instead, I want certain fields of the documents to serve as the _id.
For example:
{
    "_id" : ObjectId("568f7df5ccf629de229cf27b"),
    "LIFNR" : "10099",
    "MANDT" : "100",
    "BUKRS" : "2646",
    "NODEL" : "",
    "LOEVM" : ""
}
I would like to concatenate LIFNR+MANDT+BUKRS as 100991002646 and hash it to achieve uniqueness, then store it as the new _id.
But how far does hashing help with unique ids? And how do I achieve it?
I understand that the default hash function in Python gives different results on different machines (32-bit / 64-bit). If that is true, how would I go about generating _ids?
In any case, I need LIFNR+MANDT+BUKRS to be used. Thanks in advance.
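For what it's worth, the built-in hash() is indeed not stable across machines or Python builds; a minimal sketch of a deterministic alternative, assuming hashlib and the example document above, could be:
import hashlib

doc = {"LIFNR": "10099", "MANDT": "100", "BUKRS": "2646"}
concatenated = doc["LIFNR"] + doc["MANDT"] + doc["BUKRS"]  # "100991002646"
# sha1 gives the same hex digest on any machine, unlike the built-in hash()
stable_id = hashlib.sha1(concatenated.encode("utf-8")).hexdigest()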
First you can't update the _id field. Instead you should create a new field and set it value to the concatenated string. To return the concatenated value you need to use the .aggregate() method which provides access to the aggregation pipeline. The only stage in the pipeline is the $project stage where you use the $concat operator which concatenates strings and returns the concatenated string.
From there you then iterate the cursor and update each document using "bulk" operations.
bulk = collection.initialize_ordered_bulk_op()
count = 0

cursor = collection.aggregate([
    {"$project": {"value": {"$concat": ["$LIFNR", "$MANDT", "$BUKRS"]}}}
])

for item in cursor:
    bulk.find({'_id': item['_id']}).update_one({'$set': {'id': item['value']}})
    count = count + 1
    if count % 200 == 0:
        bulk.execute()
        bulk = collection.initialize_ordered_bulk_op()  # a bulk can only be executed once

if count % 200 != 0:
    bulk.execute()
MongoDB 3.2 deprecates Bulk() and its associated methods so you will need to use the bulk_write() method.
from pymongo import UpdateOne

requests = []
for item in cursor:
    requests.append(UpdateOne({'_id': item['_id']}, {'$set': {'id': item['value']}}))
collection.bulk_write(requests)
Your documents will then look like this:
{'BUKRS': '2646',
'LIFNR': '10099',
'LOEVM': '',
'MANDT': '100',
'NODEL': '',
'_id': ObjectId('568f7df5ccf629de229cf27b'),
'id': '100991002646'}
I cannot get my head around it.
I want to insert the values of a dictionary into an SQLite database.
url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5f...1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url)) #soup it up
for data in soup.find_all('photo'): #parsing the data
dict = { #filter the data, find_all creats dictionary KEY:VALUE
"id_p": data.get('id'),
"title_p": data.get('title'),
"tags_p": data.get('tags'),
"latitude_p": data.get('latitude'),
"longitude_p": data.get('longitude'),
}
#print (dict)
connector.execute("insert into DATAGERMANY values (?,?,?,?,?)", );
connector.commit()
connector.close
My keys are id_p, title_p, etc., and the values I retrieve through data.get().
However, I cannot insert them.
When I try to write id, title, tags, latitude, longitude after ...DATAGERMANY values (?,?,?,?,?)", ); I get
NameError: name 'title' is not defined.
I tried it with dict.values and dict, but then it says table DATAGERMANY has 6 columns but 5 values were supplied.
Adding another ? gives me (with dict.values): ValueError: parameters are of unsupported type.
This is how I created the db and table.
#creating SQLite Database and Table
connector = sqlite3.connect("GERMANY.db")  # create Database and Table, check if NOT NULL is a good idea
connector.execute('''CREATE TABLE DATAGERMANY
    (id_db INTEGER PRIMARY KEY AUTOINCREMENT,
    id_photo INTEGER NOT NULL,
    title TEXT,
    tags TEXT,
    latitude NUMERIC NOT NULL,
    longitude NUMERIC NOT NULL);''')
The method should work even if there is no value to fill into the database; that can happen as well.
You can use named parameters and insert all rows at once using executemany().
As a bonus, you would get a good separation of html-parsing and data-pipelining logic:
data = [{"id_p": photo.get('id'),
"title_p": photo.get('title'),
"tags_p": photo.get('tags'),
"latitude_p": photo.get('latitude'),
"longitude_p": photo.get('longitude')} for photo in soup.find_all('photo')]
connector.executemany("""
INSERT INTO
DATAGERMANY
(id_photo, title, tags, latitude, longitude)
VALUES
(:id_p, :title_p, :tags_p, :latitude_p, :longitude_p)""", data)
Also, don't forget to actually call the close() method:
connector.close()
FYI, the complete code:
import sqlite3
from urllib2 import urlopen
from bs4 import BeautifulSoup

url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5f...1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url))

connector = sqlite3.connect(":memory:")
cursor = connector.cursor()

cursor.execute('''CREATE TABLE DATAGERMANY
    (id_db INTEGER PRIMARY KEY AUTOINCREMENT,
    id_photo INTEGER NOT NULL,
    title TEXT,
    tags TEXT,
    latitude NUMERIC NOT NULL,
    longitude NUMERIC NOT NULL);''')

data = [{"id_p": photo.get('id'),
         "title_p": photo.get('title'),
         "tags_p": photo.get('tags'),
         "latitude_p": photo.get('latitude'),
         "longitude_p": photo.get('longitude')} for photo in soup.find_all('photo')]

cursor.executemany("""
    INSERT INTO
        DATAGERMANY
        (id_photo, title, tags, latitude, longitude)
    VALUES
        (:id_p, :title_p, :tags_p, :latitude_p, :longitude_p)""", data)

connector.commit()
cursor.close()
connector.close()
As written, your connector.execute() statement is missing the parameters argument.
It should be used like this:
connector.execute("insert into some_time values (?, ?)", ["question_mark_1", "question_mark_2"])
Unless you need the dictionary for later, I would actually use a list or tuple instead:
row = [
    data.get('id'),
    data.get('title'),
    data.get('tags'),
    data.get('latitude'),
    data.get('longitude'),
]
Then your insert statement becomes:
connector.execute("insert into DATAGERMANY values (NULL,?,?,?,?,?)", row)
Why these changes?
The NULL in values (NULL, ...) is so the auto-incrementing primary key will work.
The list instead of the dictionary because order is important, and dictionaries don't preserve order.
The row list is passed as a single parameters argument; its five elements fill the five ? placeholders in order.
Lastly, you shouldn't use dict as a variable name, since that shadows the built-in dict type in Python.
If you're using Python 3.6 or above, you can do this for dicts:
dict_data = {
    'filename': 'test.txt',
    'size': '200',
}
table_name = 'test_table'

attrib_names = ", ".join(dict_data.keys())
attrib_values = ", ".join("?" * len(dict_data.keys()))
sql = f"INSERT INTO {table_name} ({attrib_names}) VALUES ({attrib_values})"
cursor.execute(sql, list(dict_data.values()))
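A small usage sketch for the snippet above, assuming sqlite3 and an in-memory database whose table mirrors dict_data:
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative; any SQLite database works
cursor = conn.cursor()
cursor.execute("CREATE TABLE test_table (filename TEXT, size TEXT)")

# ...run the snippet above...

cursor.execute("SELECT * FROM test_table")
print(cursor.fetchall())  # [('test.txt', '200')]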