select data from table and compare with dataframe - python

I have a dataframe like this
Name age city
John 31 London
Pierre 35 Paris
...
Kasparov 40 NYC
I would like to select data from the Redshift city table using SQL, where the city is included in the dataframe's city column
query = select * from city where ....
Can you help me accomplish this query?
Thank you

Jeril's answer is going in the right direction but is not complete. df['city'].unique() does not return a string; it returns an array. You need a string in your WHERE clause.
# create a string for cities to use in sql, the way sql expects the string
unique_cities = ','.join("'{0}'".format(c) for c in list(df['city'].unique()))
# output
'London','Paris'
#sql query would be
query = f"select * from city where name in ({unique_cities})"
The code above assumes you are using Python 3.x.
Please let me know if this solves your issue.
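As a side note, interpolating values directly into SQL is vulnerable to injection if the dataframe contains untrusted data. With a DB-API driver such as psycopg2 (commonly used for Redshift) you can instead generate one placeholder per city and pass the values separately; a minimal sketch, assuming a driver with "%s" paramstyle:

```python
# Build a parameterized IN clause instead of interpolating the values.
# Assumes a DB-API driver with "%s" placeholders (e.g. psycopg2).
cities = ['London', 'Paris']  # e.g. list(df['city'].unique())

placeholders = ', '.join(['%s'] * len(cities))
query = f"select * from city where name in ({placeholders})"

# The driver quotes and escapes each value safely:
# cursor.execute(query, cities)
```

This keeps the query text fixed while the driver handles quoting of each value.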

You can try the following:
unique_cities = df['city'].unique()
# sql query
select * from city where name in unique_cities

Related

why does a select statement in sqlite not fetch any rows when the data exists

I'm trying to select from a table using a WHERE condition with parameters, as follows:
cur.execute('SELECT GSTIN, "Taxable Value", CGST, SGST FROM books')
books = cur.fetchall()
for book in books:
    cur.execute('SELECT GSTIN, "Taxable Value", CGST, SGST FROM twob WHERE GSTIN = ? AND "Taxable Value" = ? AND SGST = ? AND CGST = ?;', (book[0], book[1], book[3], book[2]))
    print(cur.fetchall())
    print(book[0], book[1], book[3], book[2])
Here books is extracted from another table, and using it I want to extract the row where those values match. Still, print(cur.fetchall()) is empty, and I've checked it manually in SQLite by entering the book values with the exact same statement. Please guide me on what I am doing wrong here.
the result is as follows:
[]
ABC 123 133424 23
[]
tushar 120 4353 424
[]
okay 240 1 45
I figured it out. The columns "SGST" and "CGST" were swapped in the Excel file I was retrieving the data from; that's why it was printing an empty list.
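The symptom is easy to reproduce: positional placeholders must line up with the parameter tuple, so swapping two values silently matches nothing. A minimal sketch with hypothetical data, using an in-memory SQLite database:

```python
import sqlite3

# Hypothetical table and data, just to show the effect of parameter order.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE twob (GSTIN TEXT, CGST INTEGER, SGST INTEGER)")
cur.execute("INSERT INTO twob VALUES ('ABC', 23, 133424)")

# Placeholders are positional: values must arrive as (GSTIN, SGST, CGST).
cur.execute(
    "SELECT * FROM twob WHERE GSTIN = ? AND SGST = ? AND CGST = ?",
    ("ABC", 133424, 23),
)
rows = cur.fetchall()

# Swapping the last two values matches nothing -> empty list,
# which is exactly the symptom described above.
cur.execute(
    "SELECT * FROM twob WHERE GSTIN = ? AND SGST = ? AND CGST = ?",
    ("ABC", 23, 133424),
)
swapped = cur.fetchall()
```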

How to create a function with SQL in Python and create columns?

I'm accessing a Microsoft SQL Server database with pyodbc in Python, and I have many tables for different states and years. I'm trying to create a pandas.DataFrame with all of them, but I don't know how to write a function that still creates columns specifying YEAR and STATE for each of those states and years (I'm using NY2000 as an example). How should I build that function or "if loop"? Sorry for the lack of clarity, it's my first post here :/
tables = tuple([NY2000DX, NY2001DX, NY2002DX, AL2000DX, AL2001DX, AL2002DX, MA2000DX, MA2001DX, MA2002DX])
jobs = tuple([55, 120])
query = """ SELECT
    ID,
    Job_ID
FROM {}
WHERE Job_ID IN {}
""".format(tables, jobs)
NY2000 = pd.read_sql(query, server)
NY2000["State"] = "NY"
NY2000["Year"] = 2000
My desired result would be a DataFrame with the information from all tables, with columns specifying State and Year. Like:
Year State ID Job_ID
2000 NY 13 55
2001 NY 20 55
2002 NY 25 55
2000 AL 15 120
2001 AL 60 120
2002 AL 45 120
... ... ... ...
Thanks for the support :)
I agree with the comments about a normalised database, and you haven't posted the table structures either. I'm assuming the only way to know the year and state is from the table name; if so, you can do something along these lines:
df = pd.DataFrame({"Year": [], "State": [], "ID": [], "JOB_ID": []})
tables = ["NY2000DX", "NY2001DX", "NY2002DX", "AL2000DX", "AL2001DX", "AL2002DX", "MA2000DX", "MA2001DX", "MA2002DX"]
jobs = tuple([55, 120])
def readtables(tablename, jobsincluded):
    query = """ SELECT
        {} AS YEAR,
        '{}' AS STATE,
        ID,
        Job_ID
    FROM {}
    WHERE Job_ID IN {}
    """.format(tablename[2:6], tablename[:2], tablename, jobsincluded)
    return query
for table in tables:
    print(readtables(table, jobs))
    # dftable = pd.read_sql(readtables(table, jobs), conn)
    # df = pd.concat([df, dftable])
Please note that I commented out the actual table reading and concatenation into the final dataframe, as I don't have a connection to test against; I just printed the resulting queries as a proof of concept.
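For what it's worth, the query builder can be exercised without any database connection; a standalone sketch of the same idea, slicing year and state out of the hypothetical <STATE><YEAR>DX table-name convention:

```python
# Standalone sketch of the query builder: year and state are sliced
# out of table names following the <STATE><YEAR>DX convention.
def readtables(tablename, jobsincluded):
    return (
        "SELECT {} AS YEAR, '{}' AS STATE, ID, Job_ID "
        "FROM {} WHERE Job_ID IN {}"
    ).format(tablename[2:6], tablename[:2], tablename, jobsincluded)

sql = readtables("NY2000DX", (55, 120))
```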

make a dataframe column become a sql query statement

I am using jupyter notebook to access Teradata database.
Assume I have a dataframe
Name Age
Sam 5
Tom 6
Roy 7
I want to use the whole content of the "Name" column as the WHERE condition of a SQL query.
query = '''select Age
from xxx
where Name in (Sam, Tom, Roy)'''
age = pd.read_sql(query,conn)
How can I format the column so that it is inserted into the SQL statement automatically, instead of manually pasting the column content?
Join the Name column and insert into the query using f-string:
query = f'''select Age
from xxx
where Name in ({", ".join(df.Name)})'''
print(query)
select Age
from xxx
where Name in (Sam, Tom, Roy)
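Note that most SQL dialects require string literals in an IN (...) list to be quoted, so the query above would fail against a real character column. A quoting variant of the same join, sketched in plain Python (a list stands in for df.Name so no connection is needed):

```python
names = ["Sam", "Tom", "Roy"]  # e.g. df.Name

# Wrap each value in single quotes, doubling any embedded quote,
# then join the pieces for the IN (...) clause.
quoted = ", ".join("'{}'".format(n.replace("'", "''")) for n in names)
query = f"select Age\nfrom xxx\nwhere Name in ({quoted})"
```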

SqlAlchemy with list in where clause [duplicate]

This question already has answers here:
How can I bind a list to a parameter in a custom query in SQLAlchemy?
(9 answers)
Closed 2 years ago.
My table in the database is as follows:
Username city Type
Anna Paris abc
Marc london abc
erica rome AF
Sara Newyork cbd
silvia paris AD
I have a list containing string values
typelist = {'abc', 'cbd'}
and I want to query my database using SQLAlchemy, to get the rows from a table where the Type column equals the values in the list:
Username city Type
Anna Paris abc
Marc london abc
Sara Newyork cbd
I'm trying this code:
sql = "SELECT * FROM table WHERE data IN :values"
query = sqlalchemy.text(sql).bindparams(values=tuple(typelist))
conn.engine.execute(query)
but it returns just one value from the typelist, not all the list values:
Username city Type
Sara Newyork cbd
sql = "SELECT * FROM table WHERE data IN :values"
query = sqlalchemy.text(sql).bindparams(sqlalchemy.bindparam("values", expanding=True))
conn.engine.execute(query, {"values": typelist})
Reference: https://docs.sqlalchemy.org/en/13/core/sqlelement.html#sqlalchemy.sql.expression.bindparam.params.expanding
My solution will work, but you will need to format your string like this:
sql = "SELECT * FROM table WHERE data IN ('data1', 'data2', 'data3')"
There is no need to use a bind param here. Use this if you don't find any proper solution.
You could use a dynamic SQL approach where you create a string from your list values and add the string to your SELECT statement.
queryList = ['abc', 'def']

def list_to_string(inList):
    strTemp = """'"""
    for x in inList:
        strTemp += str(x) + """','"""
    return strTemp[:-2]

sql = """SELECT * FROM table WHERE data in (""" + list_to_string(queryList) + """)"""
print(sql)
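The helper above can be collapsed into a single join; a minimal sketch of the same idea (note this still interpolates values into the SQL string, so only use it with trusted input):

```python
queryList = ['abc', 'def']

# Quote each value and join with commas -> "'abc','def'"
values = ",".join("'{}'".format(v) for v in queryList)
sql = "SELECT * FROM table WHERE data in ({})".format(values)
```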

SQL SELECT where cell is a certain length and includes specific characters

I'm trying to create a SELECT statement that selects rows where NAME is at most 5 characters and contains a period (.).
I only want the first match, so I'm including LIMIT 1 in the statement.
I have been working with the following:
searchstring = "."
sql = "SELECT * FROM Table WHERE NAME LIKE %s LIMIT 1"
val = (("%"+searchstring+"%"),)
cursor.execute(sql, val)
But I'm not sure how to incorporate the length of NAME in my statement.
My "Table" is as follows:
ID NAME
1 Jim
2 J.
3 Jonathan
4 Jack M.
5 M.S.
So based on the table above, I would expect row 2 and 5 to be selected.
I could select all rows and loop through them, but as I only want the first, I'm thinking I would prefer a SQL statement?
Thanks in advance.
You can use the CHAR_LENGTH function along with LIKE:
SELECT * FROM Table WHERE name LIKE '%.%' AND CHAR_LENGTH(name) <= 5 LIMIT 1
Try LEN():
SELECT LEN(name);
This returns the length of the string, but it counts spaces too. Try removing leading spaces with LTRIM().
Oracle SQL:
SELECT * FROM Table WHERE name LIKE '%.%' AND LENGTH(name) < 6 AND rownum < 2
Depending on the SQL dialect (Oracle, MySQL, SQL Server, etc.), use:
LENGTH() or CHAR_LENGTH()
rownum or LIMIT
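As a runnable illustration of the pattern, here is the question's sample data loaded into an in-memory SQLite database (SQLite uses LENGTH() and LIMIT; the table is named names here since "table" is a keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE names (ID INTEGER, NAME TEXT)")
cur.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [(1, "Jim"), (2, "J."), (3, "Jonathan"), (4, "Jack M."), (5, "M.S.")],
)

# Rows whose NAME contains '.' and is at most 5 characters long.
cur.execute(
    "SELECT ID, NAME FROM names WHERE NAME LIKE '%.%' AND LENGTH(NAME) <= 5"
)
matches = cur.fetchall()

# Same filter with LIMIT 1 to take only the first match.
cur.execute(
    "SELECT ID, NAME FROM names "
    "WHERE NAME LIKE '%.%' AND LENGTH(NAME) <= 5 LIMIT 1"
)
first = cur.fetchone()
```

As expected, rows 2 and 5 satisfy the filter ("Jack M." contains a period but is 7 characters long).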
