I am trying to get the column names from my PostgreSQL table using psycopg2, but the column list it returns is not in the same order as the columns appear in the table.
This is how the database table looks when saved as a pandas DataFrame:
cur.execute("Select * from actor")
tupples = cur.fetchall()
cur.execute("select column_name from information_schema.columns where table_name = 'actor'")
column_name = cur.fetchall()
df = pd.DataFrame(tupples,columns = column_name)
(actor_id,) (last_update,) (first_name,) (last_name,)
1 PENELOPE GUINESS 2006-02-15 04:34:33
2 NICK WAHLBERG 2006-02-15 04:34:33
3 ED CHASE 2006-02-15 04:34:33
4 JENNIFER DAVIS 2006-02-15 04:34:33
5 JOHNNY LOLLOBRIGIDA 2006-02-15 04:34:33
This is how the database table looks when I view it in pgadmin2:
I just want column_name to return the column names of the SQL table in the order shown in the image.
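A minimal sketch of two ways to get the columns in table order, assuming the same cur cursor as above: take the names from cur.description after the SELECT *, or order the information_schema query by ordinal_position (and unpack the 1-tuples that fetchall() returns):
import pandas as pd

cur.execute("SELECT * FROM actor")
rows = cur.fetchall()

# cursor.description lists the result columns in the order the query returns them
colnames = [desc[0] for desc in cur.description]
df = pd.DataFrame(rows, columns=colnames)

# Alternative: ask information_schema for the defined order explicitly
cur.execute("""
    SELECT column_name
    FROM information_schema.columns
    WHERE table_name = 'actor'
    ORDER BY ordinal_position
""")
colnames = [row[0] for row in cur.fetchall()]  # unpack the 1-tuples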
CREATE TABLE temp (
id UINTEGER,
name VARCHAR,
age UINTEGER
);
CREATE SEQUENCE serial START 1;
Insertion with the sequence works just fine:
INSERT INTO temp VALUES(nextval('serial'), 'John', 13)
How can I use the sequence with a pandas DataFrame?
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
con.execute("INSERT INTO temp SELECT * FROM df")
RuntimeError: Binder Error: table temp has 3 columns but 2 values were supplied
I don't want to iterate item by item. The goal is to insert thousands of rows from Python into the DB efficiently. I'm OK with switching from pandas to something else.
Can't you have nextval('serial') as part of your select query when reading the df?
e.g.,
con.execute("INSERT INTO temp SELECT nextval('serial'), Name, Age FROM df")
I am using a Jupyter notebook to access a Teradata database.
Assume I have a dataframe
Name Age
Sam 5
Tom 6
Roy 7
I want to use the contents of the whole "Name" column as the WHERE condition of a SQL query.
query = '''select Age
from xxx
where Name in (Sam, Tom, Roy)'''
age = pd.read_sql(query,conn)
How can I format the column so that its contents are inserted into the SQL statement automatically, instead of pasting them in manually?
Join the Name column (quoting each value, since they are strings) and insert it into the query using an f-string:
query = f'''select Age
from xxx
where Name in ({", ".join("'" + name + "'" for name in df.Name)})'''
print(query)
select Age
from xxx
where Name in ('Sam', 'Tom', 'Roy')
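If your driver supports parameter markers, a safer variant is to build placeholders and pass the values separately so they are escaped for you. A minimal sketch, assuming a DB-API connection that uses qmark-style ? placeholders (e.g. teradatasql or pyodbc; adjust the marker style for your driver):
placeholders = ", ".join(["?"] * len(df["Name"]))
query = f"select Age from xxx where Name in ({placeholders})"
age = pd.read_sql(query, conn, params=list(df["Name"]))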
My table in the database is as follows:
Username city Type
Anna Paris abc
Marc london abc
erica rome AF
Sara Newyork cbd
silvia paris AD
I have a list containing string values:
typelist = ['abc', 'cbd']
and I want to query my database using SQLAlchemy to get the rows where the Type column matches any of the values in the list:
Username city Type
Anna Paris abc
Marc london abc
Sara Newyork cbd
I'm trying this code:
sql = "SELECT * FROM table WHERE data IN :values"
query = sqlalchemy.text(sql).bindparams(values=tuple(typelist))
conn.engine.execute(query)
but it returns just one value from the typelist, not all of the list values.
Username city Type
Sara Newyork cbd
sql = "SELECT * FROM table WHERE data IN :values"
query = sqlalchemy.text(sql).bindparams(sqlalchemy.bindparam("values", expanding=True))
conn.engine.execute(query, {"values": typelist})
Reference: https://docs.sqlalchemy.org/en/13/core/sqlelement.html#sqlalchemy.sql.expression.bindparam.params.expanding
My solution will work, but you will need to format your string like this:
sql = "SELECT * FROM table WHERE data IN ('data1', 'data2', 'data3')"
No need to use bind params here. Use this if you don't get any other proper solution.
You could use a dynamic SQL approach where you create a string from your list values and add the string to your SELECT statement.
queryList = ['abc', 'def']
def list_to_string(inList):
    strTemp = """'"""
    for x in inList:
        strTemp += str(x) + """','"""
    return strTemp[:-2]
sql = """SELECT * FROM table WHERE data in (""" + list_to_string(queryList) + """)"""
print(sql)
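With queryList = ['abc', 'def'] the printed statement is:
SELECT * FROM table WHERE data in ('abc','def')
Bear in mind that concatenating values into the SQL string like this is open to SQL injection if the values come from user input; the expanding bind parameter shown above avoids that.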
I'm new to SQL and am trying to translate what I know from Python into SQL. I have a script where I connect to SQL Server through ODBC (the same database I use in SSMS) to work with the data in Python:
import pyodbc
import pandas as pd
#odbc
conn = pyodbc.connect('Driver={SQL Server};'
                      r'Server=PMZZ315\RION;'
'Database=Warehouse;'
'Trusted_Connection=yes;')
cursor = conn.cursor()
df = pd.read_sql_query("SELECT [LetId],[StreetAddressLine1],[CompanyName] FROM Dim.Let", conn)
df
df.head()
#print(df.columns)
# Select duplicate rows except first occurrence based on all columns
duplicateRowsDF = df[df.duplicated(['CompanyName','StreetAddressLine1'])]
#print("Duplicate Rows except first occurrence based on all columns are :")
print(duplicateRowsDF)
duplicateRowsDF.to_csv("duplicateRowsDFodbc.csv")
What function in SQL can substitute for the df.duplicated function? All I am trying to do is detect duplicate records, ignoring the first occurrence, where the company name and street address are repeated.
Reprex of output dataset:
LetId StreetAddressLine1 CompanyName
32 1451 West Brimson View Court Palmer
405 1808 North Lonion Ave Ozark
465 4223 Monty Hwy Alabama
SQL tables represent unordered sets. Ordering is only provided by columns in the data. There is no "first" without an ordering. Let me assume that letid defines the ordering.
The canonical way in SQL uses row_number():
select t.*
from (select t.*,
             row_number() over (partition by CompanyName, StreetAddressLine1
                                order by LetId) as seqnum
      from Dim.Let t
     ) t
where seqnum > 1;  -- rows after the first occurrence, matching df.duplicated()
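A minimal sketch of plugging that query into the existing pyodbc workflow, assuming LetId defines which row counts as the first occurrence:
import pandas as pd
import pyodbc

conn = pyodbc.connect('Driver={SQL Server};'
                      r'Server=PMZZ315\RION;'
                      'Database=Warehouse;'
                      'Trusted_Connection=yes;')

dup_query = """
select LetId, StreetAddressLine1, CompanyName
from (select LetId, StreetAddressLine1, CompanyName,
             row_number() over (partition by CompanyName, StreetAddressLine1
                                order by LetId) as seqnum
      from Dim.Let
     ) t
where seqnum > 1  -- duplicates only, first occurrence excluded
"""

duplicateRowsDF = pd.read_sql_query(dup_query, conn)
duplicateRowsDF.to_csv("duplicateRowsDFodbc.csv", index=False)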
I have the following data frame
ipdb> csv_data
country sale date trans_factor
0 India 403171 12/01/2012 1
1 Bhutan 394096 12/01/2012 2
2 Nepal super 12/01/2012 3
3 madhya 355883 12/01/2012 4
4 sudan man 12/01/2012 5
Right now I am using the code below to insert the data into a table; if the table already exists, it is dropped and a new table is created.
csv_file_path = data_mapping_record.csv_file_path
original_csv_header = pandas.read_csv(csv_file_path).columns.tolist()
csv_data = pandas.read_csv(csv_file_path, skiprows=[0], names=original_csv_header, infer_datetime_format=True)
table_name = data_mapping_record.csv_file_path.split('/')[-1].split('.')[0]
engine = create_engine(
    'postgresql://username:password@localhost:5432/pandas_data')
# Delete table if it already exists
engine.execute("""DROP TABLE IF EXISTS "%s" """ % (table_name))
# Write the pandas dataframe to the database using sqlalchemy and pandas.to_sql
csv_data.to_sql(table_name, engine, chunksize=1000)
But what I need is: if the table already exists, append the data to it without dropping it first. Is there any way to do this with the pandas to_sql method?
IIUC you can simply use the if_exists='append' parameter:
csv_data.to_sql(table_name, engine, if_exists='append', chunksize=1000)
from docs:
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.