I've run into a weird problem and I'm stuck. I'm rewriting a Python script that generates some CSV files, and I also need to write the same data to a MySQL server.
I've managed to get it working... somehow.
Here is the part that creates CSV:
final_table.get_tunid_town_pivot().to_csv('result_pivot_tunid_town_' + ConsoleLog.get_curr_date_underline() + '.csv', sep=';')
And here is the part that loads the data into the MySQL table:
conn = pymysql.connect(host='localhost', port=3306, user='test', passwd='test', db='test')
final_table.get_tunid_town_pivot().to_sql(con=conn, name='TunID', if_exists='replace', flavor='mysql', index=False, chunksize=10000)
conn.close()
The problem is that the DataFrame has 4 columns, but in MySQL I only get the last column. I have no idea why this is happening, and I haven't found any similar problems. Any help, please?
Your DataFrame has (probably due to the pivoting) a MultiIndex of 3 levels and only 1 column. By default, to_sql will also write the index to the SQL table, but you did specify index=False, so only the one column will be written to SQL.
So either include the index when writing (use index=True), or reset the index first and then write the frame (df.reset_index().to_sql(..., index=False)).
Also note that passing a raw pymysql connection to to_sql is deprecated (it should give you a warning); you have to use it through an SQLAlchemy engine.
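A minimal sketch of both points combined, assuming a local MySQL server with the same test credentials as in the question (adjust the connection string to your setup); the flavor argument should no longer be needed once you go through an engine:
from sqlalchemy import create_engine

# SQLAlchemy engine wrapping pymysql; host/credentials are placeholders
engine = create_engine('mysql+pymysql://test:test@localhost:3306/test')

pivot = final_table.get_tunid_town_pivot()

# reset_index() turns the 3 index levels back into regular columns,
# so all 4 columns end up in the MySQL table
pivot.reset_index().to_sql(name='TunID', con=engine,
                           if_exists='replace', index=False, chunksize=10000)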
I have a database connection and I insert data into a table using to_sql.
xls.to_sql(table, con=engine, if_exists='append', index=False, chunksize=10000)
I've been trying to obtain the number of rows inserted, or the number of rows in the table (considering I truncate the table before inserting new data). I've been unsuccessful though.
Can you help?
I've tried:
countRow = engine.execute("select count(*) from " + table)
print(countRow)
I find it odd that this doesn't work because I use the same thing to truncate the table. Am I missing something or doing something wrong here?
As @Deepak Tripathi suggested, the issue is that engine.execute returns a result object rather than the count itself, so you have to fetch the rows from it. I used this and it worked:
engine.execute("select count(*) from " + table + ";").fetchall()
I am writing data from a Dash app to a SQL database set up by Django and then reading the table back in a callback. I have a column whose value should be either 1 or 2, but in the SQL database the value looks like this:
[screenshot: SQL database view of the column that should contain 1 or 2]
When this is read back into a pandas DataFrame it appears as b'\00x\01x... or something along those lines, which then gets interpreted wrongly when it needs to be used.
The django code for the column is:
selected = models.IntegerField(default=1, null=True)
I am writing and reading the data using SQLAlchemy. The number appeared correctly in the pandas DataFrame before SQL was involved. Read and write code:
select = pd.read_sql_table('temp_sel', con=engine)
select.to_sql('temp_sel', con=engine, if_exists='append', index=False)
Any help would be appreciated.
Solved by casting the variable to an integer before sending it to SQL, as follows:
j = int(j)
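If the value comes from a whole DataFrame column rather than a single variable, the same cast can be applied to the column before writing; a sketch reusing the selected column and the temp_sel table from above:
# cast the whole column to plain integers before writing it back
select['selected'] = select['selected'].astype(int)
select.to_sql('temp_sel', con=engine, if_exists='append', index=False)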
I am trying to bulk insert pandas DataFrame data into PostgreSQL. The DataFrame has 35 columns and the PostgreSQL table has 45 columns. I am choosing 12 matching columns from the DataFrame and inserting them into the PostgreSQL table. For this I am using the following code snippet:
df = pd.read_excel(raw_file_path, sheet_name='Sheet1', usecols=col_names)  # col_names = list of desired columns (12 columns)
cols = ','.join(list(df.columns))
tuples = [tuple(x) for x in df.to_numpy()]
query = "INSERT INTO {0}.{1} ({2}) VALUES (%%s,%%s,%%s,%%s,%%s,%%s,%%s,%%s,%%s,%%s,%%s,%%s);".format(schema_name,table_name,cols)
curr = conn.cursor()
try:
    curr.executemany(query, tuples)
    conn.commit()
    curr.close()
except (Exception, psycopg2.DatabaseError) as error:
    print("Error: %s" % error)
    conn.rollback()
    curr.close()
    return 1
finally:
    if conn is not None:
        conn.close()
        print('Database connection closed.')
When I run it, I get this error:
SyntaxError: syntax error at or near "%"
LINE 1: ...it,purchase_group,indenter_name,wbs_code) VALUES (%s,%s,%s,%...
Even if I use ? in place of %%s, I still get this error.
Can anybody shed some light on this?
P.S. I am using Postgresql version 10.
What you're doing now is actually inserting a pandas DataFrame one row at a time. Even if it worked, it would be an extremely slow operation. On top of that, formatting the schema, table and column names straight into the query string like this leaves you open to SQL injection if any of them can contain user-supplied strings.
I wouldn't reinvent the wheel. Pandas has a to_sql function that takes a DataFrame and generates the INSERT statements for you. You can also specify what to do if the target table already exists (the if_exists argument).
It works with SQLAlchemy, which has excellent support for PostgreSQL. And even though it might be a new package to explore and install, you're not required to use it anywhere else to make this work.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://localhost:5432/mydatabase')

pd.read_excel(
    raw_file_path,
    sheet_name='Sheet1',
    usecols=col_names  # col_names = list of desired columns (12 columns)
).to_sql(
    schema=schema_name,
    name=table_name,
    con=engine,
    method='multi'  # batches many rows into a single INSERT statement
)
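One caveat: since your target table already exists (with 45 columns), to_sql's default if_exists='fail' would raise an error, so you would probably want something along these lines (a sketch reusing the names from your snippet):
df = pd.read_excel(raw_file_path, sheet_name='Sheet1', usecols=col_names)
df.to_sql(
    schema=schema_name,
    name=table_name,
    con=engine,
    if_exists='append',  # keep the existing 45-column table and just add rows
    index=False,         # don't write the DataFrame index as an extra column
    method='multi'
)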
Has anyone experienced this before?
I have a table with "int" and "varchar" columns - a report schedule table.
I am trying to import an Excel file with a ".xls" extension into this table using a Python program. I am using pandas to_sql to load 1 row of data.
The imported data is 1 row by 11 columns.
Import works successfully but after the import I noticed that the datatypes in the original table have now been altered from:
int --> bigint
char(1) --> varchar(max)
varchar(30) --> varchar(max)
Any idea how I can prevent this? The switch in datatypes is causing issues in downstream routines.
import urllib.parse
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_excel(schedule_file, sheet_name='Schedule')
params = urllib.parse.quote_plus(r'DRIVER={SQL Server};SERVER=<<IP>>;DATABASE=<<DB>>;UID=<<UDI>>;PWD=<<PWD>>')
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = create_engine(conn_str)
table_name = 'REPORT_SCHEDULE'
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)
TIA
Consider using the dtype argument of pandas.DataFrame.to_sql where you pass a dictionary of SQLAlchemy types to named columns:
import sqlalchemy
...
data.to_sql(name=table_name, con=engine, if_exists='replace', index=False,
            dtype={'name_of_datefld': sqlalchemy.types.DateTime(),
                   'name_of_intfld': sqlalchemy.types.INTEGER(),
                   'name_of_strfld': sqlalchemy.types.VARCHAR(length=30),
                   'name_of_floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                   'name_of_booleanfld': sqlalchemy.types.Boolean})
I think this has more to do with how pandas handles the table if it already exists. The "replace" value of the if_exists argument tells pandas to drop your table and recreate it. But when recreating the table, pandas does so on its own terms, based on the data stored in that particular DataFrame.
While providing column datatypes will work, doing it for every such case might be cumbersome. So I would rather truncate the table in a separate statement and then just append data to it, like so:
Instead of:
df.to_sql(name=table_name, con=engine, if_exists='replace',index=False)
I'd do:
with engine.connect() as con:
    con.execute("TRUNCATE TABLE %s" % table_name)

df.to_sql(name=table_name, con=engine, if_exists='append', index=False)
The TRUNCATE statement also empties the table, but it is handled internally by the database and the table keeps its original definition (columns, datatypes and indexes).
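As a side note, on SQLAlchemy 1.4 this pattern emits deprecation warnings and on 2.0 con.execute no longer accepts a raw SQL string, so there the truncate part would look roughly like this:
from sqlalchemy import text

with engine.begin() as con:  # begin() ensures the TRUNCATE gets committed
    con.execute(text("TRUNCATE TABLE %s" % table_name))

df.to_sql(name=table_name, con=engine, if_exists='append', index=False)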
I currently have a Python dataframe that is 23 columns and 20,000 rows.
Using Python code, I want to write my data frame into a MSSQL server that I have the credentials for.
As a test I am able to successfully write some values into the table using the code below:
import pypyodbc

connection = pypyodbc.connect('Driver={SQL Server};'
                              'Server=XXX;'
                              'Database=XXX;'
                              'uid=XXX;'
                              'pwd=XXX')
cursor = connection.cursor()

for index, row in df_EVENT5_15.iterrows():
    # cursor.execute("INSERT INTO MODREPORT(rowid, OPCODE, LOCATION, TRACKNAME)
    cursor.execute("INSERT INTO MODREPORT(rowid, location) VALUES (?,?)", (5, 'test'))
connection.commit()
But how do I write all the rows in my data frame to the MSSQL server? To do so, I need to code up the following steps in my Python environment:
Delete all the rows in the MSSQL server table
Write my dataframe to the server
When you say Python data frame, I'm assuming you're using a pandas DataFrame. If that's the case, then you could use the to_sql function.
df.to_sql("MODREPORT", connection, if_exists="replace")
Setting the if_exists argument to "replace" drops the existing table and recreates it before writing the records, so all of the old rows are gone.
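Note that to_sql officially expects an SQLAlchemy connectable (or a SQLite connection) rather than a raw pypyodbc connection, so a fuller sketch might look like this; the connection string values are placeholders, mirroring the credentials in the question:
import urllib.parse
from sqlalchemy import create_engine

params = urllib.parse.quote_plus('Driver={SQL Server};Server=XXX;Database=XXX;uid=XXX;pwd=XXX')
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(params))

# drops and recreates MODREPORT, then writes every row of the DataFrame
df_EVENT5_15.to_sql("MODREPORT", engine, if_exists="replace", index=False)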
I realise it's been a while since you asked but the easiest way to delete ALL the rows in the SQL server table (point 1 of the question) would be to send the command
TRUNCATE TABLE Tablename
This removes all the data from the table but leaves the table and its indexes in place (just empty), so you or the DBA would not need to recreate it. It also uses less of the transaction log when it runs.
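Putting the two steps together from Python, a rough sketch could reuse the pypyodbc connection from the question for the truncate and an SQLAlchemy engine (as in the sketch above, an assumption rather than code from the question) for the write:
# step 1: empty the table but keep its definition, indexes and datatypes
cursor = connection.cursor()
cursor.execute("TRUNCATE TABLE MODREPORT")
connection.commit()

# step 2: append the whole DataFrame into the now-empty table
df_EVENT5_15.to_sql("MODREPORT", engine, if_exists="append", index=False)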