I'm working on a project that requires a column in PostgreSQL to be updated with lon,lat coordinates produced by the Mapbox geocoding API from each row's address. I created a FOR loop to read in the address from each row. I'd like to then save the unique lon,lat coordinates produced for each address into the "coordinates" column.
However, the code I've written updates the entire "coordinates" column with the first row's lon,lat coordinates, rather than iterating and updating each row's "coordinates" value individually.
Where did I go wrong? Any help would be greatly appreciated.
Main Code
import psycopg2
import json
from psycopg2.extras import RealDictCursor
import sys
from mapbox import Geocoder
from mapboxgeocode import getCoord
import numpy as np

con = None
try:
    con = psycopg2.connect(database='database', user='username')
    cur = con.cursor()
    cur.execute("DROP TABLE IF EXISTS permits")
    cur.execute("""CREATE TABLE permits(issued_date DATE, address VARCHAR(200),
        workdesc VARCHAR(600), permit_type VARCHAR(100), permit_sub_type VARCHAR(100),
        anc VARCHAR(4), applicant VARCHAR(100), owner_name VARCHAR(200))""")
    cur.execute("""COPY permits FROM '/path/to/csv/file'
        WITH DELIMITER ',' CSV HEADER""")
    cur.execute("""ALTER TABLE permits ADD COLUMN id SERIAL PRIMARY KEY;
        UPDATE permits SET id = DEFAULT;""")
    cur.execute("""ALTER TABLE permits ADD COLUMN coordinates VARCHAR(80);
        UPDATE permits SET coordinates = 4;""")
    cur.execute("""ALTER TABLE permits ADD COLUMN city VARCHAR(80);
        UPDATE permits SET city = 'Washington,DC';
        ALTER TABLE permits ALTER COLUMN city SET NOT NULL;""")
    cur.execute("UPDATE permits SET address = address || ' ' || city;")
    cur.execute("SELECT * FROM permits;")
    for row in cur.fetchall():
        test = row[1]
        help = getCoord(test)
        cur.execute("UPDATE permits SET coordinates = %s;", (help,))
        print(test)
    con.commit()
except psycopg2.DatabaseError, e:
    print 'Error %s' % e
    sys.exit(1)
finally:
    if con:
        cur.close()
        con.commit()
        con.close()
Geocode Function
from mapbox import Geocoder
import numpy as np

def getCoord(address):
    geocoder = Geocoder(access_token='xxxxxxxxxxxxxxxx')
    response = geocoder.forward(address)
    first = response.geojson()['features'][0]
    row = first['geometry']['coordinates']
    return row
You need to add a WHERE condition to your UPDATE statement. Without a WHERE clause, SQL updates the coordinates column in every row of the table. A proper WHERE condition tells it exactly which row's value it needs to modify.
You'll probably want to use your primary key, as it's a unique identifier. Perhaps a statement along the lines of:
cur.execute("UPDATE permits SET coordinates = %s WHERE id = %s;", (help, row[index of the id column]) )
I think the row index you need would be row[8], but you'll have to confirm that in your code. I hope that gets it working.
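Putting it together, a minimal sketch of the corrected loop, assuming the id column ends up at index 8 after the ALTER TABLE statements (confirm the index against your own table):

cur.execute("SELECT * FROM permits;")
for row in cur.fetchall():
    address = row[1]             # address column
    coords = getCoord(address)   # [lon, lat] list from Mapbox
    # str() stores the [lon, lat] pair as text, since coordinates is VARCHAR;
    # the WHERE clause restricts the update to this row only
    cur.execute("UPDATE permits SET coordinates = %s WHERE id = %s;",
                (str(coords), row[8]))
con.commit()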
I created a variable that stores patient IDs and a count of the number of missed appointments per patient. I created a table with SQLite, and I am trying to store my variable in that table, but I am getting the error "ValueError: parameters are of unsupported type". Here is my code so far:
import pandas as pd
import sqlite3
conn = sqlite3.connect('STORE')
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS PatientNoShow")
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" text)""")
df = pd.read_csv(r"C:\missedappointments.csv")
df2 = df[df['No-show']=="Yes"]
pt_counts = df2["PatientId"].value_counts()
c.executemany("INSERT OR IGNORE INTO PatientNoShow VALUES (?, ?)", pt_counts)
Thank you in advance for any help! Still learning, so any kind of "explain to me like I'm 5" answers will be appreciated! Also, once I create my tables and store info in them, how would I print or get a visual of the output?
You declared both columns as type text in
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" text)""")
but pt_counts contains integers, because value_counts() counts the occurrences of each value in the PatientId column. Besides, .executemany() needs a sequence of parameter tuples to work properly.
This piece of code should work if PatientId is of string type:
import pandas as pd
import sqlite3

conn = sqlite3.connect('STORE')
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS PatientNoShow")
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" integer)""")  # type changed
df = pd.read_csv(r"C:\missedappointments.csv")
df2 = df[df['No-show'] == "Yes"]
pt_counts = df2["PatientId"].value_counts()
c.executemany("INSERT OR IGNORE INTO PatientNoShow VALUES (?, ?)", pt_counts.items())  # a sequence of (PatientId, count) tuples
conn.commit()  # persist the inserts
I am performing an ETL task where I am querying tables in a Data Warehouse to see if they contain IDs in a DataFrame (df) which was created by joining tables from the operational database.
The DataFrame only has ID columns from each joined table in the operational database. I have created a variable for each of these columns, e.g. 'billing_profiles_dim_id' as below:
billing_profiles_dim_id = df['billing_profiles_dim_id']
I am attempting to iterate row by row to see if the ID here is in the 'billing_profiles_dim' table of the Data Warehouse. Where the ID is not present, I want to populate the DWH tables row by row using the matching ID rows in the ODB:
for key in billing_profiles_dim_id:
    sql = "SELECT * FROM billing_profiles_dim WHERE id = '"+str(key)+"'"
    dwh_cursor.execute(sql)
    result = dwh_cursor.fetchone()
    if result == None:
        sqlQuery = "SELECT * from billing_profile where id = '"+str(key)+"'"
        sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name"')
        op_cursor = op_connector.execute(sqlInsert)
        billing_profile = op_cursor.fetchone()
So far at least, I am receiving the following error:
SyntaxError: EOL while scanning string literal
This error message points at the closing bracket of
sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name"')
which I am currently unable to solve. I'm also aware that this code may run into another problem or two. Could someone please help me solve the current issue and make sure I'm heading down the correct path?
You are missing a closing double quote and a +:
sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name+"')"
But you should really switch to prepared statements, like:
sql = "SELECT * FROM billing_profiles_dim WHERE id = %s"
dwh_cursor.execute(sql, (str(key),))
...
sqlInsert = ('INSERT INTO billing_profile_dim VALUES '
             '(%s, %s)')
dwh_cursor.execute(sqlInsert, (str(key), billing_profile.name))
Note that the %s placeholders are not quoted: the driver handles quoting when it binds the parameters.
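One more hedged note, since the question anticipated "another problem or two": the original loop builds the INSERT before fetching billing_profile from the operational database, so billing_profile.name is not yet defined when the INSERT runs. A sketch of the corrected order, assuming op_cursor queries the ODB and its rows expose a name attribute as in the question:

for key in billing_profiles_dim_id:
    dwh_cursor.execute("SELECT * FROM billing_profiles_dim WHERE id = %s", (str(key),))
    if dwh_cursor.fetchone() is None:
        # fetch the matching row from the operational database first...
        op_cursor.execute("SELECT * FROM billing_profile WHERE id = %s", (str(key),))
        billing_profile = op_cursor.fetchone()
        # ...then insert it into the warehouse dimension table
        dwh_cursor.execute("INSERT INTO billing_profile_dim VALUES (%s, %s)",
                           (str(key), billing_profile.name))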
I am trying to update a MySQL database table.
I started by creating an ORM object to help me reduce the volume of the update query, using UPDATE ... WHERE conditions.
First of all, I created an ORM variable: this ORM object is filtered data from a DataFrame, built by applying a condition to another pd.DataFrame read from a CSV.
This is my simple rule, so that it's easy to create conditions like this:
myOutlook_inBox = pd.read_csv(r'' + mydir + 'test.CSV',
                              usecols=['Subject', 'Body', 'From: (Name)', 'To: (Name)'],
                              encoding='latin-1')
This is the simple ORM: it extracts data from the Subject column of the CSV read by pd.read_csv, and the result is written back into myOutlook_inBox["Subject"]:
replaced_sbj_value = myOutlook_inBox['Subject'].str.extract(pat='(L(?:DEL|CAI|SIN).\d{5})').dropna()
myOutlook_inBox["Subject"] = replaced_sbj_value
and this is a condition that I am using to filter specific data:
frm_mwfy_to_te = myOutlook_inBox.loc[myOutlook_inBox['From:(Name)'].str.contains("mowafy", na=False)
                                     & myOutlook_inBox['To:(Name)'].str.contains("te", na=False)].drop_duplicates(keep=False)
frm_mwfy_to_te.Subject
and this variable holds the filtered rows of the MySQL table whose site_code matches a value in the Subject column:
filtered_data = all_data.loc[all_data.site_code.str.contains('|'.join(frm_mwfy_to_te.Subject))]
and this is my SQL query. All I need now is a query that updates the column called "pending": it should filter on the column called "site_code", and for every row whose value matches filtered_data, update or replace the value in pending with the value TE:
update_db_query = engine.execute("UPDATE govtracker SET pending = 'TE' WHERE site_code = " + filtered_data)
I am thinking that I am on the wrong track here; any ideas to solve this?
Note: I don't need to mention the old value in my query; I just want to update the value in the same row, matched according to the filtered DataFrame, with the new value mentioned in the query.
For example, according to frm_mwfy_to_te.Subject (Subject is a column name in the CSV file), let's say the output of this ORM is:
Subject
LCAIN20804
LDELE30434
LSINI20260
and this is my whole code
from sqlalchemy import create_engine
import pandas as pd
import os
import csv
import MySQLdb
from sqlalchemy import types, create_engine

# MySQL Connection
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'Mharooney'
MYSQL_HOST_IP = '127.0.0.1'
MYSQL_PORT = 3306
MYSQL_DATABASE = 'mydb'

engine = create_engine('mysql+mysqlconnector://' + MYSQL_USER + ':' + MYSQL_PASSWORD
                       + '@' + MYSQL_HOST_IP + ':' + str(MYSQL_PORT) + '/' + MYSQL_DATABASE,
                       echo=False)
# engine = create_engine('mysql+mysqldb://root:@localhost:123456/myDB?charset=utf8mb4&binary_prefix=true', echo=False)

mydir = (os.getcwd()).replace('\\', '/') + '/'

all_data = pd.read_sql('SELECT * FROM govtracker', engine)
# .drop(['#'], axis=1)

myOutlook_inBox = pd.read_csv(r'' + mydir + 'test.CSV',
                              usecols=['Subject', 'Body', 'From: (Name)', 'To: (Name)'],
                              encoding='latin-1')

myOutlook_inBox.columns = myOutlook_inBox.columns.str.replace(' ', '')

# this object extracts 5 chars and 5 numbers from a specific column in the csv
replaced_sbj_value = myOutlook_inBox['Subject'].str.extract(pat='(L(?:DEL|CAI|SIN).\d{5})').dropna()

# this is the column I want to filter on in the database
myOutlook_inBox["Subject"] = replaced_sbj_value

# these conditions filter out duplicated repeated data from the exported outlook file
# Condition 1: any mail from mowafy to te
frm_mwfy_to_te = myOutlook_inBox.loc[myOutlook_inBox['From:(Name)'].str.contains("mowafy", na=False)
                                     & myOutlook_inBox['To:(Name)'].str.contains("te", na=False)].drop_duplicates(keep=False)
frm_mwfy_to_te.Subject

filtered_data = all_data.loc[all_data.site_code.str.contains('|'.join(frm_mwfy_to_te.Subject))]

print(myOutlook_inBox)
all_data.replace('\n', '', regex=True)
df = all_data.where((pd.notnull(all_data)), None)
print(df)
print("Success")
print(frm_mwfy_to_te.Subject)
print(filtered_data)

# rows = engine.execute("SELECT * FROM govtracker")#.fetchall()
# print(rows)

update_db_query = engine.execute("UPDATE govtracker SET pending = 'TE' WHERE site_code = " + filtered_data)

"""engine = create_engine('postgresql+psycopg2://user:pswd@mydb')
df.to_sql('temp_table', engine, if_exists='replace')"""

# select_db_query = pd.read_sql("SELECT * FROM govtracker", con = engine)
# print(update_db_query)
Now let's say this is the output of my ORM. I will use it to filter and get the rows containing these three values from the MySQL database, so as to update every row that contains them. The columns I want to update in MySQL are called pending and pending_status.
and this is the SQL that creates my database table:
CREATE TABLE `mydb`.`govtracker` (
`id` INT,
`site_name` VARCHAR(255),
`region` VARCHAR(255),
`site_type` VARCHAR(255),
`site_code` VARCHAR(255),
`tac_name` VARCHAR(255),
`dt_readiness` DATE,
`rfs` VARCHAR(255),
`rfs_date` DATE,
`huawei_1st_submission_date` DATE,
`te_1st_submission_date` DATE,
`huawei_2nd_submission_date` DATE,
`te_2nd_submission_date` DATE,
`huawei_3rd_submission_date` DATE,
`te_3rd_submission_date` DATE,
`acceptance_date_opt` DATE,
`acceptance_date_plan` DATE,
`signed_sites` VARCHAR(255),
`as_built_date` DATE,
`as_built_status` VARCHAR(255),
`date_dt` DATE,
`dt_status` VARCHAR(255),
`shr_status` VARCHAR(255),
`dt_planned` INT(255),
`integeration_status` VARCHAR(255),
`comments_snags` LONGTEXT,
`cluster_name` LONGTEXT,
`type_standalone_colocated` VARCHAR(255),
`installed_type_standalone_colocated` VARCHAR(255),
`status` VARCHAR(255),
`pending` VARCHAR(255),
`pending_status` LONGTEXT,
`problematic_details` LONGTEXT,
`ets_tac` INT(255),
`region_r` VARCHAR(255),
`sf6_signed_date` DATE,
`sf6_signed_comment` LONGTEXT,
`comment_history` LONGTEXT,
`on_air_owner` VARCHAR(255),
`pp_owner` VARCHAR(255),
`report_comment` LONGTEXT,
`hu_opt_area_owner` VARCHAR(255),
`planning_owner` VARCHAR(255),
`po_number` VARCHAR(255),
`trigger_date` DATE,
`as_built_status_tr` VARCHAR(255)
) ENGINE = InnoDB;
Another important note:
In Excel, when I use a filter on some column, it shows all the values in that column. Let's say Pending is the column I've selected; it has the values Accepted & PAC in progress, Planning, TE, PP, DT, FM, Rollout, Integration, Opt Team.
All the rest of the columns have values like this too.
So should I create a table, something like columns_values, and fill it with all these values? Since these values are static, would that make my case easier to solve?
Last note: this database is based on an existing xlsm file, but I pushed the data from the xlsm into MySQL, and now MySQL is my main database, not the Excel files. However, I am updating the MySQL database through a CSV file, not from inside my database; the ORM object frm_mwfy_to_te.Subject is data extracted from the DataFrame built from the CSV file.
Any ideas here?
I hope everything is clear enough.
Could this material help me or not?
https://auth0.com/blog/sqlalchemy-orm-tutorial-for-python-developers/#SQLAlchemy-ORM
Important note: the value of filtered_data is actually a pandas DataFrame, but for one column only from the CSV file, because I want to filter the database with this DataFrame's column values, as I posted before, in order to update some columns in my database. I just started with updating the one column called pending, so I can see the result; after that I'll update the other columns the same way. The script I want to create should search my database table for the values in filtered_data. For example, I have a value called LCAIN20804: I want to take this value, filter on it in the database table, and then go to the column called huawei_1st_submission_date. If it isn't filled, fill it with the current date; if it is filled, go to the pending column and replace the old value with TE, then go to pending_status and replace the old value with waiting TE acceptance, and so on. That's a small part of the script I want to create.
I hope this is clear enough.
If you want to turn a pandas DataFrame into a SQL update statement, it may be nice to first transform it into a list of tuples, where the tuples are the new column values, and then pass that list to the connection's execute(), which runs the statement once per tuple (https://stackoverflow.com/a/27743541/5015356):
values = [tuple(x) for x in filtered_data.values]

query = """
UPDATE govtracker
SET pending = 'TE'
WHERE site_code = %s
"""

connection = engine.connect()
update_db_query = connection.execute(query, values)
For each tuple (<sitecode>), this will execute the update statement. If you want to update more columns or expand the where clause, just add the additional columns to filtered_data, and add a new %s where you want the other value to appear.
Just make sure you keep the columns in the correct order!
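For instance, a hedged sketch of a two-column variant (the column choices here are hypothetical, for illustration only), assuming filtered_data is reduced to its pending and site_code columns in that order:

# each tuple supplies the values in the order the %s placeholders appear
values = [tuple(x) for x in filtered_data[['pending', 'site_code']].values]

query = """
UPDATE govtracker
SET pending = %s
WHERE site_code = %s
"""

connection = engine.connect()
connection.execute(query, values)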
How can I create a plot using information from a MySQL database table? For the x axis I would like to use the id column, and for the y axis the items in cart(number) column. You can use any library you want, as long as it gives the result I would like to have. Right now in my plot (I attached the photo), the x axis is labeled at intervals of 500 (0, 500, 1000, etc.), but I would like to have the ids (1, 2, 3, 4, ..., 3024), and for the y label I would like to see the items in cart. I attached the code. I will appreciate any help.
import pymysql
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
conn = pymysql.connect(host='localhost', user='root', passwd='', db='amazon_cart')
cur = conn.cursor()
x = cur.execute("SELECT `id`,`items in cart(number)`,`product title` FROM `csv_9_05`")
plt.xlabel('Product Id')
plt.ylabel('Items in cart(number)')
rows = cur.fetchall()
df = pd.DataFrame([[xy for xy in x] for x in rows])
x=df[0]
y=df[1]
plt.bar(x,y)
plt.show()
cur.close()
conn.close()
SQL OF THE TABLE
DROP TABLE IF EXISTS `csv_9_05`;
CREATE TABLE IF NOT EXISTS `csv_9_05` (
`id` int(50) NOT NULL AUTO_INCREMENT,
`product title` varchar(2040) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`product price` varchar(55) NOT NULL,
`items in cart` varchar(2020) DEFAULT NULL,
`items in cart(number)` varchar(50) DEFAULT NULL,
`link` varchar(2024) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3025 DEFAULT CHARSET=latin1;
Hm... I think restructuring your database is going to make a lot of things much easier for you. Given the schema you've provided here, I would recommend increasing the number of tables you have and doing some joins. Also, your data type for integer values (the number of items in a cart) should be int, not varchar. Your table fields shouldn't have spaces in their names, and I'm not sure why a product's id and the number of products in a cart are given a 1-to-1 relationship.
But that's a separate issue. Just rebuilding this database is probably going to be more work than the specific task you're asking about. You really should reformat your DB, and if you have questions about how, please tell me. But for now I'll try to answer your question based on your current configuration.
I'm not terribly well versed in Pandas, so I'll answer this without the use of that module.
If you declare your cursor like so:
cursor = conn.cursor(pymysql.cursors.DictCursor)
cursor.execute("SELECT `id`,`items in cart(number)`,`product title` FROM `csv_9_05`")
Then your rows will be returned as a list of 3024 dictionaries, i.e.:
rows = cursor.fetchall()
# this will produce the following list:
# rows = [
# {'id': 1, 'items in cart(number)': 12, 'product_title': 'hammer'},
# {'id': 2, 'items in cart(number)': 5, 'product_title': 'nails'},
# {...},
# {'id': 3024, 'items in cart(number)': 31, 'product_title': 'watermelons'}
# ]
Then, plotting becomes really easy.
plt.figure(1)
plt.bar([x['id'] for x in rows], [y['items in cart(number)'] for y in rows])
plt.xlabel('Product Id')
plt.ylabel('Items in cart(number)')
plt.show()
plt.close()
I think that should do it.
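One caveat, an assumption on my part rather than part of the original answer: plt.bar alone won't change how many ticks matplotlib draws, so the x axis will still be labeled at its automatic interval. To force id labels, set the ticks explicitly, ideally with a stride so 3024 labels stay readable:

ids = [x['id'] for x in rows]
# label every 100th id; all 3024 at once would be unreadable
plt.xticks(ids[::100])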
I have one database with two tables; both have a column called barcode. The aim is to retrieve barcodes from one table and search for the entries in the other table, where extra information about each barcode is stored. I would like to have both sets of retrieved data saved in a DataFrame. The problem is that when I insert the retrieved data from the second query into the DataFrame, it stores only the last entry:
import mysql.connector
import pandas as pd

# keyword form of connect(); mysql.connector itself is a module, not a callable
cnx = mysql.connector.connect(user=user, password=password, host=host, database=database)
query_barcode = ("SELECT barcode FROM barcode_store")
cursor = cnx.cursor()
cursor.execute(query_barcode)
data_barcode = cursor.fetchall()
Up to this point everything works smoothly, and here is the part with problem:
query_info = ("SELECT product_code FROM product_info WHERE barcode=%s")
for each_barcode in data_barcode:
cursor.execute(query_info % each_barcode)
pro_info = pd.DataFrame(cursor.fetchall())
pro_info contains only the last matching barcode's information, while I want to retrieve the information for every data_barcode match.
That's because you are overwriting the existing pro_info with new data in each loop iteration. You should rather do something like:
query_info = ("SELECT product_code FROM product_info")
cursor.execute(query_info)
pro_info = pd.DataFrame(cursor.fetchall())
Making so many SELECTs is redundant since you can get all records in one SELECT and instantly insert them to your DataFrame.
#edit: However if you need to use the WHERE statement to fetch only specific products, you need to store records in a list until you insert them to DataFrame. So your code will eventually look like:
pro_list = []
query_info = ("SELECT product_code FROM product_info WHERE barcode=%s")
for each_barcode in data_barcode:
    # bind the parameter instead of interpolating it into the string
    cursor.execute(query_info, each_barcode)
    pro_list.append(cursor.fetchone())
pro_info = pd.DataFrame(pro_list)
Cheers!
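A hedged alternative, my addition rather than part of the original answer: you can also push the filtering into a single query with an IN clause and let pandas build the DataFrame directly:

import pandas as pd

# flatten the fetched 1-tuples into a plain list of barcodes
barcodes = [b[0] for b in data_barcode]
placeholders = ', '.join(['%s'] * len(barcodes))
query = "SELECT product_code FROM product_info WHERE barcode IN (%s)" % placeholders
pro_info = pd.read_sql(query, cnx, params=barcodes)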