How to create a bar chart using information from a MySQL table? - python

How can I create a plot using information from a MySQL database table? For the x axis I would like to use the id column, and for the y axis the items in cart(number) column. You can use any library you want, as long as it gives the result I'm after. Right now my plot (photo attached) labels the x axis in intervals of 500 (0, 500, 1000, etc.), but I would like to see the ids (1, 2, 3, 4, ..., 3024) on the x axis and the items in cart on the y axis. I attached the code. I will appreciate any help.
import pymysql
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
conn = pymysql.connect(host='localhost', user='root', passwd='', db='amazon_cart')
cur = conn.cursor()
x = cur.execute("SELECT `id`,`items in cart(number)`,`product title` FROM `csv_9_05`")
plt.xlabel('Product Id')
plt.ylabel('Items in cart(number)')
rows = cur.fetchall()
df = pd.DataFrame([[xy for xy in x] for x in rows])
x=df[0]
y=df[1]
plt.bar(x,y)
plt.show()
cur.close()
conn.close()
SQL OF THE TABLE
DROP TABLE IF EXISTS `csv_9_05`;
CREATE TABLE IF NOT EXISTS `csv_9_05` (
    `id` int(50) NOT NULL AUTO_INCREMENT,
    `product title` varchar(2040) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
    `product price` varchar(55) NOT NULL,
    `items in cart` varchar(2020) DEFAULT NULL,
    `items in cart(number)` varchar(50) DEFAULT NULL,
    `link` varchar(2024) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3025 DEFAULT CHARSET=latin1;

Hm... I think restructuring your database is going to make a lot of things much easier for you. Given the schema you've provided here, I would recommend increasing the number of tables you have and doing some joins. Also, your data type for integer values (the number of items in a cart) should be int, not varchar. Your table fields shouldn't have spaces in their names, and I'm not sure why a product's id and the number of products in a cart are given a 1-to-1 relationship.
But that's a separate issue. Just rebuilding this database is probably going to be more work than the specific task you're asking about. You really should reformat your DB, and if you have questions about how, please tell me. But for now I'll try to answer your question based on your current configuration.
I'm not terribly well versed in Pandas, so I'll answer this without the use of that module.
If you declare your cursor like so:
cursor = conn.cursor(pymysql.cursors.DictCursor)
cursor.execute("SELECT `id`,`items in cart(number)`,`product title` FROM `csv_9_05`")
Then your rows will be returned as a list of 3024 dictionaries, i.e.:
rows = cursor.fetchall()
# this will produce the following list:
# rows = [
#     {'id': 1, 'items in cart(number)': 12, 'product title': 'hammer'},
#     {'id': 2, 'items in cart(number)': 5, 'product title': 'nails'},
#     {...},
#     {'id': 3024, 'items in cart(number)': 31, 'product title': 'watermelons'}
# ]
Then, plotting becomes really easy.
plt.figure(1)
plt.bar([x['id'] for x in rows], [y['items in cart(number)'] for y in rows])
plt.xlabel('Product Id')
plt.ylabel('Items in cart(number)')
plt.show()
plt.close()
I think that should do it.
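One caveat: your schema declares items in cart(number) as varchar(50), so the values may come back as strings rather than numbers. Casting them before plotting avoids surprises. A minimal sketch, assuming every value is a numeric string:
ids = [row['id'] for row in rows]
counts = [int(row['items in cart(number)']) for row in rows]  # cast the varchar counts to integers
plt.bar(ids, counts)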

Related

Python store variable with two columns into table created with SQLite

I created a variable that stores patient ID and a count of the number of missed appointments per patient. I created a table with SQLite and I am trying to store my variable into my created table but I am getting an error of "ValueError: parameters are of unsupported type". Here is my code so far:
import pandas as pd
import sqlite3
conn = sqlite3.connect('STORE')
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS PatientNoShow")
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" text)""")
df = pd.read_csv(r"C:\missedappointments.csv")
df2 = df[df['No-show']=="Yes"]
pt_counts = df2["PatientId"].value_counts()
c.executemany("INSERT OR IGNORE INTO PatientNoShow VALUES (?, ?)", pt_counts)
Thank you in advance for any help! Still learning, so any kind of "explain to me like I'm 5" answers will be appreciated! Also, once I create my tables and store info in them, how would I print or get a visual of the output?
You declared both columns as type text in
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" text)""")
but the counts in pt_counts are integers, because value_counts() counts the occurrences in the PatientId column. More importantly, .executemany() needs a sequence of parameter tuples to work properly, and a bare pandas Series is not one, which is where "ValueError: parameters are of unsupported type" comes from.
This piece of code should work if PatientId is of string type:
import pandas as pd
import sqlite3

conn = sqlite3.connect('STORE')
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS PatientNoShow")
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" integer)""")  # type changed

df = pd.read_csv(r"C:\missedappointments.csv")
df2 = df[df['No-show'] == "Yes"]
pt_counts = df2["PatientId"].value_counts()

c.executemany("INSERT OR IGNORE INTO PatientNoShow VALUES (?, ?)", pt_counts.items())  # .items() yields a sequence of (PatientId, count) tuples
conn.commit()

Best approach to JSON info to insert into a database

I have managed to extract some information from an API, but the format it comes in is hard for a novice programmer like me. I can save it to a file, move it to a new list, etc., but what stumps me is: should I leave the data alone and insert it as is, or should I deconstruct it into a more human-readable format to use afterwards?
The JSON was already difficult, as it was a nested dictionary and the value was a list. After trying things out I want it to actually sit in a database. I am using postgresql as the database for now and am learning python.
response = requests.post(url3, headers=headers)
jsonResponse = response.json()
my_data = jsonResponse['message_response']['scanresults'][:]
store_list = []
for item in my_data:
    dev_details = {"mac": None, "user": None, "resource_name": None}
    dev_details['mac'] = item['mac_address']
    dev_details['user'] = item['agent_logged_on_users']
    dev_details['devName'] = item['resource_name']
    store_list.append(dev_details)
try:
    connection = psycopg2.connect(
        user="",
        other_info="")
    # create cursor to perform db actions
    curs = connection.cursor()
    sql = "INSERT INTO public.tbl_devices (mac, user, devName) VALUES (%(mac)s, %(user)s, %(devName)s);"
    curs.execute(sql, store_list)
    connection.commit()
finally:
    if (connection):
        curs.close()
        connection.close()
        print("Connection terminated")
I have ended up with a dictionary as records inside a list:
[{rec1},{rec2}..etc]
And naturally, when putting the info into the database it complains about "list indices must be integers or slices", so I'm looking for advice on A) the right way to add this into a database table, or B) whether to use a different approach.
Many thanks in advance
Good that you ask! The answer is almost certainly that you should not just dump the JSON into the database as it is. That makes things easy in the beginning, but you'll pay the price when you try to query or modify the data later.
For example, if you have data like
[
    { "name": "a", "keys": [1, 2, 3] },
    { "name": "b", "keys": [4, 5, 6] }
]
create tables
CREATE TABLE key_list (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE key (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    value integer NOT NULL,
    key_list_id bigint REFERENCES key_list NOT NULL
);
and store the values in that fashion.
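A minimal loading sketch for the sample data, assuming psycopg2 and an open connection conn (the table and column names follow the DDL above):
data = [
    {"name": "a", "keys": [1, 2, 3]},
    {"name": "b", "keys": [4, 5, 6]},
]

with conn.cursor() as curs:
    for entry in data:
        # one row per list, then one row per value referencing it
        curs.execute("INSERT INTO key_list (name) VALUES (%s) RETURNING id", (entry["name"],))
        key_list_id = curs.fetchone()[0]
        curs.executemany(
            "INSERT INTO key (value, key_list_id) VALUES (%s, %s)",
            [(v, key_list_id) for v in entry["keys"]],
        )
conn.commit()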

Set Sqlite query results as variables [duplicate]

This question already has answers here:
How can I get dict from sqlite query?
(16 answers)
Closed 4 years ago.
Issue:
Hi, right now I am making queries to sqlite and assigning the result to variables like this:
Table structure: rowid, name, something
cursor.execute("SELECT * FROM my_table WHERE my_condition = 'ExampleForSO'")
found_record = cursor.fetchone()
record_id = found_record[0]
record_name = found_record[1]
record_something = found_record[2]
print(record_name)
However, it's very possible that someday I will have to add a new column to the table. Let's take the example of adding that column:
Table structure: rowid, age, name, something
In that scenario, if we run the same code, name and something will be assigned wrongly, and the print will give me the age instead of the name, so I would have to edit the code manually to fit the current indices. However, I am now working with tables of more than 100 fields for a complex UI, and doing this is tiresome.
Desired output:
I am wondering if there is a better way to catch results by using dicts or something like this:
Note for lurkers: the next snippet is made-up code that does not work; do not use it.
cursor.execute_to(my_dict,
'''SELECT rowid as my_dict["id"],
name as my_dict["name"],
something as my_dict["something"]
FROM my_table WHERE my_condition = "ExampleForSO"''')
print(my_dict['name'])
I am probably wrong with this approach, but that's close to what I want. That way I don't access the results by index, and if I add a new column, no matter where it is, the output stays the same.
What is the correct way to achieve this? Are there any other alternatives?
You can use namedtuple and then specify connection.row_factory in sqlite. Example:
import sqlite3
from collections import namedtuple
# specify my row structure using namedtuple
MyRecord = namedtuple('MyRecord', 'record_id record_name record_something')
con = sqlite3.connect(":memory:")
con.isolation_level = None
con.row_factory = lambda cursor, row: MyRecord(*row)
cur = con.cursor()
cur.execute("CREATE TABLE my_table (record_id integer PRIMARY KEY, record_name text NOT NULL, record_something text NOT NULL)")
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Andrej', 'This is something'))
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Andrej', 'This is something too'))
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Adrika', 'This is new!'))
for row in cur.execute("SELECT * FROM my_table WHERE record_name LIKE 'A%'"):
    print(f'ID={row.record_id} NAME={row.record_name} SOMETHING={row.record_something}')
con.close()
Prints:
ID=1 NAME=Andrej SOMETHING=This is something
ID=2 NAME=Andrej SOMETHING=This is something too
ID=3 NAME=Adrika SOMETHING=This is new!
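For reference, the standard library also offers sqlite3.Row as a row factory, which gives access by column name without defining a namedtuple; a minimal sketch along the same lines:
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # rows now behave like read-only mappings
cur = con.cursor()
cur.execute("CREATE TABLE my_table (record_id INTEGER PRIMARY KEY, record_name TEXT, record_something TEXT)")
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Andrej', 'This is something'))
for row in cur.execute("SELECT * FROM my_table"):
    print(row['record_name'], row['record_something'])  # access by column name, order-independent
con.close()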

Using python function output to update individual postgresql rows

I'm working on a project that requires a column in postgresql to be updated by the Mapbox geocoding api to convert an address into lon,lat coordinates. I created a FOR loop to read in the address from each row. I'd like to then save the unique lon,lat coordinates created into the "coordinates" column.
However, the code I've written updates the entire "coordinates" column with the first row's lon,lat coordinates, rather than iterating and updating each row's "coordinates" column individually.
Where did I go wrong? Any help would be greatly appreciated.
Main Code
import psycopg2
import json
from psycopg2.extras import RealDictCursor
import sys
from mapbox import Geocoder
from mapboxgeocode import getCoord
import numpy as np

con = None
try:
    con = psycopg2.connect(database='database', user='username')
    cur = con.cursor()
    cur.execute("DROP TABLE IF EXISTS permits")
    cur.execute("""CREATE TABLE permits(issued_date DATE, address
        VARCHAR(200), workdesc VARCHAR(600), permit_type VARCHAR(100), permit_sub_type
        VARCHAR(100), anc VARCHAR(4), applicant VARCHAR(100), owner_name
        VARCHAR(200))""")
    cur.execute(""" COPY permits FROM '/path/to/csv/file'
        WITH DELIMITER ',' CSV HEADER """)
    cur.execute("""ALTER TABLE permits ADD COLUMN id SERIAL PRIMARY KEY;
        UPDATE permits SET id = DEFAULT;""")
    cur.execute("""ALTER TABLE permits ADD COLUMN coordinates VARCHAR(80);
        UPDATE permits SET coordinates = 4;""")
    cur.execute("""ALTER TABLE permits ADD COLUMN city VARCHAR(80);
        UPDATE permits SET city = 'Washington,DC'; ALTER TABLE permits ALTER
        COLUMN city SET NOT NULL;""")
    cur.execute("UPDATE permits SET address = address || ' ' || city;")
    cur.execute("SELECT * FROM permits;")
    for row in cur.fetchall():
        test = row[1]
        help = getCoord(test)
        cur.execute("UPDATE permits SET coordinates = %s;", (help,))
        print(test)
    con.commit()
except psycopg2.DatabaseError, e:
    print 'Error %s' % e
    sys.exit(1)
finally:
    if con:
        cur.close()
        con.commit()
        con.close()
Geocode Function
from mapbox import Geocoder
import numpy as np

def getCoord(address):
    geocoder = Geocoder(access_token='xxxxxxxxxxxxxxxx')
    response = geocoder.forward(address)
    first = response.geojson()['features'][0]
    row = first['geometry']['coordinates']
    return row
You need to add a WHERE condition to your UPDATE statement. Without a WHERE clause, SQL assumes you want to update the coordinates column of every row. A proper WHERE condition lets it know which specific row it needs to modify.
You'll probably want to use your primary key, as it's a unique identifier. Perhaps a statement along the lines of:
cur.execute("UPDATE permits SET coordinates = %s WHERE id = %s;", (help, row[index of the id column]) )
I think the row index you need would be row[8], but you'll have to confirm that in your code. I hope that gets it working.
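A rough sketch of what the corrected loop might look like, selecting only the columns you need so you don't have to guess positional indices (this assumes getCoord returns the [lon, lat] pair as above, stored as text since coordinates is VARCHAR(80)):
cur.execute("SELECT id, address FROM permits;")
for permit_id, address in cur.fetchall():
    coords = getCoord(address)
    cur.execute("UPDATE permits SET coordinates = %s WHERE id = %s;",
                (str(coords), permit_id))
con.commit()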

Pandas to_sql fails on duplicate primary key

I'd like to append to an existing table, using pandas df.to_sql() function.
I set if_exists='append', but my table has primary keys.
I'd like to do the equivalent of insert ignore when trying to append to the existing table, so I would avoid a duplicate entry error.
Is this possible with pandas, or do I need to write an explicit query?
There is unfortunately no option to specify "INSERT IGNORE". This is how I got around that limitation to insert rows into that database that were not duplicates (dataframe name is df)
from sqlalchemy.exc import IntegrityError  # raised on duplicate primary keys

for i in range(len(df)):
    try:
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)
    except IntegrityError:
        pass  # or any other action
You can do this with the method parameter of to_sql:
from sqlalchemy.dialects.mysql import insert

def insert_on_duplicate(table, conn, keys, data_iter):
    insert_stmt = insert(table.table).values(list(data_iter))
    on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(insert_stmt.inserted)
    conn.execute(on_duplicate_key_stmt)

df.to_sql('trades', dbConnection, if_exists='append', chunksize=4096, method=insert_on_duplicate)
For older versions of sqlalchemy, you need to pass a dict to on_duplicate_key_update, i.e. on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(dict(insert_stmt.inserted)).
Please note that if_exists='append' relates only to whether the table already exists and what to do if it does not.
The if_exists option has nothing to do with the content of the table.
see the doc here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.
Pandas has no option for it currently, but here is the GitHub issue. If you need this feature too, just upvote it.
The for-loop method above slows things down significantly. There's a method parameter you can pass to pandas.DataFrame.to_sql to customize the insert statement:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql
The below code should work for postgres and do nothing if there's a conflict with primary key "unique_code". Change your insert dialects for your db.
def insert_do_nothing_on_conflicts(sqltable, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    sqltable : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    from sqlalchemy.dialects.postgresql import insert
    from sqlalchemy import table, column

    columns = []
    for c in keys:
        columns.append(column(c))

    if sqltable.schema:
        table_name = '{}.{}'.format(sqltable.schema, sqltable.name)
    else:
        table_name = sqltable.name

    mytable = table(table_name, *columns)
    insert_stmt = insert(mytable).values(list(data_iter))
    do_nothing_stmt = insert_stmt.on_conflict_do_nothing(index_elements=['unique_code'])
    conn.execute(do_nothing_stmt)

df.to_sql('mytable', con=sql_engine, if_exists='append', method=insert_do_nothing_on_conflicts)
Pandas doesn't support editing the actual SQL syntax of the .to_sql method, so you might be out of luck. There's some experimental programmatic workarounds (say, read the Dataframe to a SQLAlchemy object with CALCHIPAN and use SQLAlchemy for the transaction), but you may be better served by writing your DataFrame to a CSV and loading it with an explicit MySQL function.
CALCHIPAN repo: https://bitbucket.org/zzzeek/calchipan/
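For example, a rough sketch of the CSV route against MySQL might look like this (assuming the server and client both allow LOAD DATA LOCAL INFILE, and that a table named trades with matching columns already exists; the database name and file path are placeholders):
import pymysql

df.to_csv('/tmp/trades.csv', index=False)  # column order must match the table definition

conn = pymysql.connect(host='localhost', user='root', passwd='', db='mydb', local_infile=True)
with conn.cursor() as cur:
    # the first IGNORE skips rows that would violate the primary key (like INSERT IGNORE);
    # IGNORE 1 LINES skips the CSV header row
    cur.execute("""
        LOAD DATA LOCAL INFILE '/tmp/trades.csv'
        IGNORE INTO TABLE trades
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
    """)
conn.commit()
conn.close()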
I had trouble where I was still getting the IntegrityError.
...strange, but I just took the approach above and worked it backwards: check whether the key already exists, and only insert the row if it doesn't.
for i, row in df.iterrows():
    sql = "SELECT * FROM `Table_Name` WHERE `key` = '{}'".format(row.Key)
    found = pd.read_sql(sql, con=Engine)
    if len(found) == 0:
        df.iloc[i:i+1].to_sql(name="Table_Name", if_exists='append', con=Engine)
In my case, I was trying to insert new data into an empty table, but some of the rows were duplicated, which is almost the same issue as here. I thought about fetching the existing data, merging it with the new data and continuing from there, but that is not optimal and may only work for small data sets, not huge tables.
As pandas does not provide any kind of handling for this situation right now, I was looking for a suitable workaround, so I made my own. Not sure whether it will work for you, but I decided to control my data first instead of relying on luck: I remove duplicates before calling .to_sql, so if any error happens, I know more about my data and what is going on:
import pandas as pd

def write_to_table(table_name, data):
    # Sort by price first, so that dropping duplicates keeps the lowest price only
    data.sort(key=lambda row: row['price'])
    df = pd.DataFrame(data)
    df.drop_duplicates(subset=['id_key'], keep='first', inplace=True)

    df.to_sql(table_name, engine, index=False, if_exists='append', schema='public')
So in my case, I wanted to keep the lowest price among the duplicated rows (by the way, I was passing a list of dicts as data), and for that I sorted first. The sorting isn't strictly necessary, but it's an example of what I mean by controlling the data you want to keep.
I hope this helps someone in almost the same situation as mine.
When you use SQL Server you'll get a SQL error when you enter a duplicate value into a table that has a primary key constraint. You can fix it by altering your table:
CREATE TABLE [dbo].[DeleteMe](
    [id] [uniqueidentifier] NOT NULL,
    [Value] [varchar](max) NULL,
    CONSTRAINT [PK_DeleteMe]
        PRIMARY KEY ([id] ASC)
        WITH (IGNORE_DUP_KEY = ON)); <-- add
Taken from https://dba.stackexchange.com/a/111771.
Now your df.to_sql() should work again.
The solutions by Jayen and Huy Tran helped me a lot, but they didn't work straight out of the box. The problem I faced with Jayen's code is that it requires the DataFrame columns to match those of the database table exactly. This was not true in my case, as there were some DataFrame columns that I won't write to the database.
I modified the solution so that it considers the column names.
from sqlalchemy.dialects.mysql import insert
import itertools

def insertWithConflicts(sqltable, conn, keys, data_iter):
    """
    Execute SQL statement inserting data, whilst taking care of conflicts.
    Used to handle duplicate key errors during database population.
    This is my modification of the code snippet
    from https://stackoverflow.com/questions/30337394/pandas-to-sql-fails-on-duplicate-primary-key
    The help page from https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.values
    proved useful.

    Parameters
    ----------
    sqltable : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted. It is a zip object.
        Its length is equal to the chunksize passed in df.to_sql()
    """
    # pair every row of values with the column names, giving one dict per row
    vals = [dict(zip(z[0], z[1])) for z in zip(itertools.cycle([keys]), data_iter)]
    insertStmt = insert(sqltable.table).values(vals)
    doNothingStmt = insertStmt.on_duplicate_key_update(dict(insertStmt.inserted))
    conn.execute(doNothingStmt)
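A usage sketch in the same style as the earlier answers (engine here is assumed to be a MySQL SQLAlchemy engine):
df.to_sql('mytable', con=engine, if_exists='append', chunksize=4096, method=insertWithConflicts)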
I faced the same issue and adopted the solution provided by @Huy Tran for a while, until my tables started to have schemas.
I had to improve his answer a bit and this is the final result:
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import table, column

def do_nothing_on_conflicts(sql_table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    sql_table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    columns = []
    for c in keys:
        columns.append(column(c))

    if sql_table.schema:
        my_table = table(sql_table.name, *columns, schema=sql_table.schema)
        # table_name = '{}.{}'.format(sql_table.schema, sql_table.name)
    else:
        my_table = table(sql_table.name, *columns)
        # table_name = sql_table.name
        # my_table = table(table_name, *columns)

    insert_stmt = insert(my_table).values(list(data_iter))
    do_nothing_stmt = insert_stmt.on_conflict_do_nothing()
    conn.execute(do_nothing_stmt)
How to use it:
history.to_sql('history', schema=schema, con=engine, method=do_nothing_on_conflicts)
The idea is the same as @Nfern's, but it uses a recursive function to divide the df in half in each iteration and skip the row(s) causing the integrity violation.
def insert(df):
    try:
        # inserting into backup table
        df.to_sql("table", con=engine, if_exists='append', index=False, schema='schema')
    except:
        rows = df.shape[0]
        if rows > 1:
            df1 = df.iloc[:int(rows/2), :]
            df2 = df.iloc[int(rows/2):, :]
            insert(df1)
            insert(df2)
        else:
            print(f"{df} not inserted. Integrity violation, duplicate primary key/s")
