I have created a SQLite database. Even though I have included the relationship between the primary and foreign keys, when I generate the ER diagram I am not able to see the connections between them. I am using DataGrip to create the diagram. I tested other databases in DataGrip and DbVisualizer and I do not have any problems with them, only with this one.
ER diagram -
This is the script I used for creating the two tables in the database -
import sqlite3

import numpy as np
import pandas as pd

def create_titles_table():
    # connect to the database
    conn = sqlite3.connect("imdb.db")
    # create a cursor
    c = conn.cursor()
    print()
    print("Creating titles table...")
    c.execute(
        """CREATE TABLE IF NOT EXISTS titles
           (titleId TEXT NOT NULL, titleType TEXT,
            primaryTitle TEXT, originalTitle TEXT,
            isAdult INTEGER, startYear REAL,
            endYear REAL, runtimeMinutes REAL,
            PRIMARY KEY (titleId)
           )
        """
    )
    # commit changes
    conn.commit()
    # read the title data
    df = load_data("title.basics.tsv")
    # replace \N with nan
    df.replace("\\N", np.nan, inplace=True)
    # rename columns
    df.rename(columns={"tconst": "titleId"}, inplace=True)
    # drop the genres column
    title_df = df.drop("genres", axis=1)
    # convert the data types from str to numeric
    title_df["startYear"] = pd.to_numeric(title_df["startYear"], errors="coerce")
    title_df["endYear"] = pd.to_numeric(title_df["endYear"], errors="coerce")
    title_df["runtimeMinutes"] = pd.to_numeric(
        title_df["runtimeMinutes"], errors="coerce"
    )
    # insert the data into titles table
    title_df.to_sql("titles", conn, if_exists="replace", index=False)
    # commit changes
    conn.commit()
    # close the connection
    conn.close()
    print("Completed!")
    print()
def create_ratings_table():
    # connect to the database
    conn = sqlite3.connect("imdb.db")
    # create a cursor
    c = conn.cursor()
    print()
    print("Creating ratings table...")
    c.execute(
        """CREATE TABLE IF NOT EXISTS ratings
           (titleId TEXT NOT NULL, averageRating REAL, numVotes INTEGER,
            FOREIGN KEY (titleId) REFERENCES titles(titleId)
           )
        """
    )
    # commit changes
    conn.commit()
    # read the data
    df = load_data("title.ratings.tsv")
    df.rename(columns={"tconst": "titleId"}, inplace=True)
    # insert the data into the ratings table
    df.to_sql("ratings", conn, if_exists="replace", index=False)
    # commit changes
    conn.commit()
    # close the connection
    conn.close()
    print("Completed!")
    print()
Can anyone tell me where I am making a mistake?
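For what it's worth, the usual culprit for this symptom (an answer further down flags it for the PostgreSQL case too) is to_sql(if_exists="replace"), which drops the table you created with the FOREIGN KEY and recreates it without constraints. A minimal sketch of the append-based alternative, with a stand-in DataFrame in place of the real ratings data:

import sqlite3
import pandas as pd

conn = sqlite3.connect("imdb.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS ratings
       (titleId TEXT NOT NULL, averageRating REAL, numVotes INTEGER,
        FOREIGN KEY (titleId) REFERENCES titles(titleId))"""
)
conn.commit()

# stand-in for the DataFrame loaded from title.ratings.tsv
df = pd.DataFrame({"titleId": ["tt0000001"], "averageRating": [5.7], "numVotes": [1234]})

# "append" inserts into the existing table, so the FOREIGN KEY declared
# above survives; "replace" would drop and recreate the table without it
df.to_sql("ratings", conn, if_exists="append", index=False)
conn.commit()
conn.close()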
Related
After some data manipulation I store two columns in a txt file in CSV format, as follows:
result.txt ->
id,avg
0,38.0
1,56.5
3,66.5
4,48.666666666666664
Then I store the data in a table, which is where I run into trouble: I tried running a .sql query that stores the data successfully, but executing the same query from Python doesn't seem to work for some reason.
Python code ->
.
.
.
open('C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/result.txt', 'w').write(res)
print(res)
try:
    with mysql.connector.connect(
        host="localhost",
        user='root',
        password='tt',
        database="dp",
    ) as connection:
        clear_table_query = "drop table if exists test_db.marks;"
        create_table_query = '''
            create table test_db.marks (
                id varchar(255) not null,
                avg varchar(255) not null,
                primary key (id)
            );
        '''
        # dropping the table and recreating it works fine
        add_csv_query = "LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/result.txt' INTO TABLE marks FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\\n' IGNORE 1 LINES;"
        print(add_csv_query)  # query is printed correctly
        with connection.cursor() as cursor:
            cursor.execute(clear_table_query)
            cursor.execute(create_table_query)
            cursor.execute(add_csv_query)
            cursor.execute("SELECT * FROM test_db.marks;")  # this produces -> Unread result found
except mysql.connector.Error as e:
    print(e)
connection.close()
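Two things commonly bite here, sketched below under the assumption that the query variables are the ones defined above: mysql-connector-python does not autocommit by default, so the rows loaded by LOAD DATA are rolled back when the connection closes unless you call connection.commit(), and "Unread result found" means the SELECT's rows were never fetched before the next statement ran.

import mysql.connector

# clear_table_query, create_table_query and add_csv_query as defined above
try:
    with mysql.connector.connect(
        host="localhost", user="root", password="tt", database="dp"
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(clear_table_query)
            cursor.execute(create_table_query)
            cursor.execute(add_csv_query)
            connection.commit()  # persist the loaded rows (no autocommit)
            cursor.execute("SELECT * FROM test_db.marks;")
            for row in cursor.fetchall():  # consume rows to avoid "Unread result found"
                print(row)
except mysql.connector.Error as e:
    print(e)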
I have a SQLite table that I want to update. This table ('abc') already has a row, inserted through some other process, with id and useremail. Now I want my query to look up this record with a WHERE condition on useremail and update the value of the column logintime. I am pretty new to SQLite, so I need some help figuring it out. Code below -
creating a new table (works OK)
conn = sql.connect('/content/sample_data/userlogs.db')
c = conn.cursor()
c.execute("""CREATE TABLE IF NOT EXISTS abc (
id INTEGER PRIMARY KEY,
useremail TEXT,
logintime TEXT,
logouttime TEXT
);
""")
conn.commit()
conn.close()
code for inserting a record (works OK)
email = ['jojo@jojo.com']
conn = sql.connect('/content/sample_data/userlogs.db')
c = conn.cursor()
c.execute('insert into abc (useremail) values(?)', email)
code for updating column logintime where value in column useremail = email:
conn = sql.connect('/content/sample_data/userlogs.db')
c = conn.cursor()
now = datetime.now()
c.execute('UPDATE abc SET logintime = ? WHERE useremail = ?', (now, email))
I am having trouble with this c.execute statement.
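A sketch of the probable fix, assuming email is the one-element list from the insert snippet above: sqlite3 cannot bind a list as a single parameter (it raises an InterfaceError), the datetime is safer stored as text since the logintime column is TEXT, and the update needs a commit to stick.

import sqlite3 as sql
from datetime import datetime

conn = sql.connect('/content/sample_data/userlogs.db')
c = conn.cursor()
now = datetime.now().isoformat()  # store as text to match the TEXT column
c.execute('UPDATE abc SET logintime = ? WHERE useremail = ?',
          (now, email[0]))        # email is a list; bind its single string
conn.commit()                     # without this the update is lost
conn.close()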
I query 4 hours of data from a source PLC MS SQL database, process it with Python, and write the data to the main PostgreSQL table.
While writing to the main Postgres table hourly, there are duplicate values (the previous 3 hours), which cause a primary key error that aborts the transaction and raises a Python error.
So, every hour:

1. I create a temp PostgreSQL table without any key
2. Then copy the pandas dataframe to the temp table
3. Then insert rows from the temp table into the main PostgreSQL table
4. Drop the temp PostgreSQL table

This Python script runs hourly in Windows Task Scheduler.
Below is my query.
engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
conn = engine.raw_connection()
cur = conn.cursor()
cur.execute("""CREATE TABLE public.table_temp
(
datetime timestamp without time zone NOT NULL,
tagid text COLLATE pg_catalog."default" NOT NULL,
mc text COLLATE pg_catalog."default" NOT NULL,
value text COLLATE pg_catalog."default",
quality text COLLATE pg_catalog."default"
)
TABLESPACE pg_default;
ALTER TABLE public.table_temp
OWNER to postgres;""");
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, 'table_temp', null="")
cur.execute("""Insert into public.table_main select * From table_temp ON CONFLICT DO NOTHING;""");
cur.execute("""DROP TABLE table_temp CASCADE;""");
conn.commit()
I would like to know if there is a more efficient/faster way to do this.
If I'm correct in assuming that the data is in the data frame, you should just be able to do:
engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
df.drop_duplicates(subset=None) # Replace None with list of column names that define the primary key ex. ['column_name1', 'column_name2']
df.to_sql('table_main', engine, if_exists='append')
Edit due to comment:
If that's the case you have the right idea. You can make it more efficient by using to_sql to insert the data into the temp table first like so.
engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
df.to_sql('table_temp', engine, if_exists='replace')
cur.execute("""Insert into public.table_main select * From table_temp ON CONFLICT DO NOTHING;""");
# cur.execute("""DROP TABLE table_temp CASCADE;"""); # You can drop if you want to but the replace option in to_sql will drop and recreate the table
conn.commit()
I am trying to create a few tables in Postgres from a pandas dataframe, but I keep getting this error:
psycopg2.errors.InvalidForeignKey: there is no unique constraint matching given keys for referenced table "titles"
After looking into this problem for hours, I finally found that when I insert the data into the parent table from the pandas dataframe, the primary key constraint gets removed for some reason, and because of that I get this error when trying to reference it from another table.
But I do not have this problem when I use pgAdmin 4 to create the table and insert a few rows of data manually.
You can see that when I created the tables using pgAdmin, the primary key and foreign keys were created as expected, and I had no problem with them.
But when I try to insert the data from a pandas dataframe using the psycopg2 library, the primary key is not created.
I can't understand why this is happening.
The code I am using to create the tables -
# function for faster data insertion
def psql_insert_copy(table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ", ".join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = "{}.{}".format(table.schema, table.name)
        else:
            table_name = table.name

        sql = "COPY {} ({}) FROM STDIN WITH CSV".format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)
def create_titles_table():
    # connect to the database
    conn = psycopg2.connect(
        dbname="imdb",
        user="postgres",
        password=os.environ.get("DB_PASSWORD"),
        host="localhost",
    )
    # create a cursor
    c = conn.cursor()
    print()
    print("Creating titles table...")
    c.execute(
        """CREATE TABLE IF NOT EXISTS titles(
            title_id TEXT PRIMARY KEY,
            title_type TEXT,
            primary_title TEXT,
            original_title TEXT,
            is_adult INT,
            start_year REAL,
            end_year REAL,
            runtime_minutes REAL
        )
        """
    )
    # commit changes
    conn.commit()
    # read the title data
    df = load_data("title.basics.tsv")
    # replace \N with nan
    df.replace("\\N", np.nan, inplace=True)
    # rename columns
    df.rename(
        columns={
            "tconst": "title_id",
            "titleType": "title_type",
            "primaryTitle": "primary_title",
            "originalTitle": "original_title",
            "isAdult": "is_adult",
            "startYear": "start_year",
            "endYear": "end_year",
            "runtimeMinutes": "runtime_minutes",
        },
        inplace=True,
    )
    # drop the genres column
    title_df = df.drop("genres", axis=1)
    # convert the data types from str to numeric
    title_df["start_year"] = pd.to_numeric(title_df["start_year"], errors="coerce")
    title_df["end_year"] = pd.to_numeric(title_df["end_year"], errors="coerce")
    title_df["runtime_minutes"] = pd.to_numeric(
        title_df["runtime_minutes"], errors="coerce"
    )
    # create SQLAlchemy engine
    engine = create_engine(
        "postgresql://postgres:" + os.environ["DB_PASSWORD"] + "@localhost:5432/imdb"
    )
    # insert the data into titles table
    title_df.to_sql(
        "titles", engine, if_exists="replace", index=False, method=psql_insert_copy
    )
    # commit changes
    conn.commit()
    # close cursor
    c.close()
    # close the connection
    conn.close()
    print("Completed!")
    print()
def create_genres_table():
    # connect to the database
    conn = psycopg2.connect(
        dbname="imdb",
        user="postgres",
        password=os.environ.get("DB_PASSWORD"),
        host="localhost",
    )
    # create a cursor
    c = conn.cursor()
    print()
    print("Creating genres table...")
    c.execute(
        """CREATE TABLE IF NOT EXISTS genres(
            title_id TEXT NOT NULL,
            genre TEXT,
            FOREIGN KEY (title_id) REFERENCES titles(title_id)
        )
        """
    )
    # commit changes
    conn.commit()
    # read the data
    df = load_data("title.basics.tsv")
    # replace \N with nan
    df.replace("\\N", np.nan, inplace=True)
    # rename columns
    df.rename(columns={"tconst": "title_id", "genres": "genre"}, inplace=True)
    # select only relevant columns
    genres_df = df[["title_id", "genre"]].copy()
    genres_df = genres_df.assign(genre=genres_df["genre"].str.split(",")).explode(
        "genre"
    )
    # create engine
    engine = create_engine(
        "postgresql://postgres:" + os.environ["DB_PASSWORD"] + "@localhost:5432/imdb"
    )
    # insert the data into genres table
    genres_df.to_sql(
        "genres", engine, if_exists="replace", index=False, method=psql_insert_copy
    )
    # commit changes
    conn.commit()
    # close cursor
    c.close()
    # close the connection
    conn.close()
    print("Completed!")
    print()
if __name__ == "__main__":
print()
print("Creating IMDB Database...")
# connect to the database
conn = psycopg2.connect(
dbname="imdb",
user="postgres",
password=os.environ.get("DB_PASSWORD"),
host="localhost",
)
# create the titles table
create_titles_table()
# create genres table
create_genres_table()
# close the connection
conn.close()
print("Done with Everything!")
print()
I think the problem is to_sql(if_exists="replace"). Try using to_sql(if_exists="append") - my understanding is that "replace" drops the whole table and creates a new one with no constraints.
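A sketch of that change, reusing the engine, the psql_insert_copy helper, and the title_df/genres_df dataframes from the question; the CREATE TABLE statements already declare the keys, so appending preserves them:

title_df.to_sql(
    "titles", engine, if_exists="append", index=False, method=psql_insert_copy
)
genres_df.to_sql(
    "genres", engine, if_exists="append", index=False, method=psql_insert_copy
)
# with "append", pandas inserts into the existing tables, so the PRIMARY KEY
# and FOREIGN KEY created by the earlier c.execute(...) calls survive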
I have an SQLite database with a table that includes a geo column. When I add this table into QGIS as a layer, it shows a map of Chicago with polygons, as shown below. I think the polygon points are stored in the column named geo.
I am trying to plot the same in Python to be able to add more things on top of this layout using Matplotlib. To begin with, I could load the table named "Zone" in Python using the following (that I wrote):
import sqlite3  # Package for SQLite

### BEGIN DEFINING A READER FUNCTION ###
def Conditional_Sqdb_reader(Sqdb, Tablename, Columns, Condition):
    conn = sqlite3.connect(Sqdb)  # Connects the file to Python
    print("\nConnected to %s.\n" % (Sqdb))
    conn.execute('pragma foreign_keys = off')  # Allows making changes into the SQLite file
    print("SQLite Foreign_keys are unlocked...\n")
    c = conn.cursor()  # Assigns c as the cursor
    print("Importing columns: %s \nin table %s from %s.\n" % (Columns, Tablename, Sqdb))
    c.execute('''SELECT {columns}
                 FROM {table}
                 {condition}'''.format(table=Tablename,
                                       columns=Columns,
                                       condition=Condition))  # Selects the table to read/fetch
    Sql_headers = [description[0] for description in c.description]
    Sql_columns = c.fetchall()  # Reads the table and saves into the memory as Sql_rows
    print("Importing completed...\n")
    conn.commit()  # Commits all the changes made
    conn.execute('pragma foreign_keys = on')  # Locks the SQLite file
    print("SQLite Foreign_keys are locked...\n")
    conn.close()  # Closes the SQLite file
    print("Disconnected from %s.\n" % (Sqdb))
    return Sql_headers, Sql_columns
### END DEFINING A READER FUNCTION ###

Sqdb = '/mypath/myfile.sqlite'
Tablename = "Zone"  # Change this with your desired table to play with
Columns = """*"""  # Change this with your desired columns to import
Condition = ''  # Add your condition and leave blank if no condition
headings, data = Conditional_Sqdb_reader(Sqdb, Tablename, Columns, Condition)
The data in the table is stored in "data" as a list, so data[0][-1] yields the geo of the polygon of the first row, which looks something like:

b'\x00\x01$i\x00\x00#\xd9\x94\x8b\xd6<\x1bAb\xda7\xb6]\xb1QA\xf0\xf7\x8b\x19UC\x1bA\x9c\xde\xc5\r\xc3\xb1QA|\x03\x00\x00\x00\x01\x00\x00\x00\x06\x00\x00\x00Hlw\xef-C\x1bA\x9c\xde\xc5\r\xc3\xb1QA\xf0\xf7\x8b\x19UC\x1bAv\xc0u)^\xb1QA\xbcw\xd4\x88\xf1<\x1bAb\xda7\xb6]\xb1QA\xa5\xdc}n\xd7<\x1bA\x84.\xe1r\xbe\xb1QA#\xd9\x94\x8b\xd6<\x1bA\xce\x8eT\xef\xc1\xb1QAHlw\xef-C\x1bA\x9c\xde\xc5\r\xc3\xb1QA\xfe'

I do not know how to decode this and convert it into a meaningful series of points, but that is what it is, and QGIS apparently can do it with no hassle. How can I plot all these polygons in Python while being able to add other things within the Matplotlib world later on?
After spending quite a few hours and learning a lot of things, I found the solution. Basically, using mod_spatialite in sqlite3 was the key. When I loaded this extension, it let me use SpatiaLite functions such as ST_AsText, which converts the SQL binary string into a string starting with POLYGON((..., which is sort of a geopandas-style entry. There are plenty of sources explaining how to plot such data. In essence, here is my code (compare it to the one in my question):
import sqlite3  # Package for SQLite

### BEGIN DEFINING A READER FUNCTION ###
def Conditional_Sqdb_reader(Sqdb, Tablename, Columns, Condition):
    conn = sqlite3.connect(Sqdb)  # Connects the file to Python
    conn.enable_load_extension(True)
    # mod_spatialite (recommended)
    conn.execute('SELECT load_extension("mod_spatialite.so")')
    conn.execute('SELECT InitSpatialMetaData(1);')
    print("\nConnected to %s.\n" % (Sqdb))
    conn.execute('pragma foreign_keys = off')  # Allows making changes into the SQLite file
    print("SQLite Foreign_keys are unlocked...\n")
    c = conn.cursor()  # Assigns c as the cursor
    print("Importing columns: %s \nin table %s from %s.\n" % (Columns, Tablename, Sqdb))
    c.execute('''SELECT {columns}
                 FROM {table}
                 {condition}'''.format(table=Tablename,
                                       columns=Columns,
                                       condition=Condition))  # Selects the table to read/fetch
    Sql_headers = [description[0] for description in c.description]
    Sql_columns = c.fetchall()  # Reads the table and saves into the memory as Sql_rows
    print("Importing completed...\n")
    conn.commit()  # Commits all the changes made
    conn.execute('pragma foreign_keys = on')  # Locks the SQLite file
    print("SQLite Foreign_keys are locked...\n")
    conn.close()  # Closes the SQLite file
    print("Disconnected from %s.\n" % (Sqdb))
    return Sql_headers, Sql_columns
### END DEFINING A READER FUNCTION ###

Sqdb = '/Users/tanercokyasar/Desktop/Qgis/chicago2018-Supply.sqlite'
Tablename = "Zone"  # Change this with your desired table to play with
Columns = """*,
             ST_AsText(GEO) as GEO"""  # Change this with your desired columns to import
Condition = ''  # Add your condition and leave blank if no condition
headings, data = Conditional_Sqdb_reader(Sqdb, Tablename, Columns, Condition)
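From there, a hedged plotting sketch, assuming the ST_AsText(GEO) alias comes back as the last field of each row and that every geometry is a plain POLYGON (a MULTIPOLYGON would need one fill per member polygon):

import matplotlib.pyplot as plt
from shapely import wkt

fig, ax = plt.subplots()
for row in data:
    polygon = wkt.loads(row[-1])  # 'POLYGON((x y, ...))' -> shapely geometry
    x, y = polygon.exterior.xy    # coordinates of the exterior ring
    ax.fill(x, y, facecolor="lightgray", edgecolor="black", linewidth=0.5)
ax.set_aspect("equal")            # keep the map's proportions
plt.show()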