I am trying to print the results of a joined table from PostgreSQL in Python. However, when I print the results, the table shows up but contains only NaN values. Can someone help?
conn = psy.connect( dbname = "funda_project", host = "localhost", user =
"postgres", password = "ledidhima2021.")
cursor = conn.cursor()
conn.commit()
createjointable2 = '''SELECT(
distance_data."Municipality",
distance_data."Childcare/Nursery",
distance_data."Leisure/Culture/Library",
sales_details."Purchase_price",
sales_details."Publication_date",
sales_details."Date_of_signature",
house_details."Type_of_house",
house_details."Object_categorie",
house_details."Construction_year",
house_details."Energy_label_class",
demo_data."Age_Group_Relation_(15-20)",
demo_data."Age_Group_Relation_(20-25)",
demo_data."Age_Group_Relation_(25-45)")
FROM "distance_data"
INNER JOIN "zip_data"
ON "distance_data"."Municipality" = "zip_data"."Municipality"
INNER JOIN "demo_data"
ON "zip_data"."Municipality" = "demo_data"."Municipality"
INNER JOIN "sales_details"
ON "zip_data"."globalId" = "sales_details"."GlobalID"
INNER JOIN "house_details"
ON "zip_data"."globalId" = "house_details"."GlobalID"
;'''
cursor.execute(createjointable2);
from pandas import DataFrame
eri= pd.DataFrame(cursor.fetchall())
datalist = list(eri)
results = pd.DataFrame (eri, columns = ["Municipality", "Childcare/Nursery",
"Leisure/Culture/Library", "Purchase_price", "Publication_date", "Date_of_signature",
"Type_of_house", "Object_categorie", "Construction_year", "Energy_label_class",
"Age_Group_Relation_(15-20)", "Age_Group_Relation_(20-25)", "Age_Group_Relation_(25-45)"])
results
pandas has a built-in function for reading SQL queries, pd.read_sql_query(query, connection), which returns the result table as a DataFrame.
dataframe = pd.read_sql_query("SELECT * FROM table;", conn)
Here conn is the connection object you already created in your code.
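As a minimal sketch of that call on a join, the example below uses an in-memory SQLite database as a stand-in for the PostgreSQL connection, with two cut-down tables and made-up values (all of that is an assumption for illustration). Note that the select list is written without surrounding parentheses: in PostgreSQL, SELECT (a, b, ...) is a row constructor and comes back as a single composite column, which is probably not what the original query intended.

```python
import sqlite3

import pandas as pd

# Stand-in for the PostgreSQL connection; table contents are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE distance_data ("Municipality" TEXT, "Childcare/Nursery" REAL);
    CREATE TABLE demo_data ("Municipality" TEXT, "Age_Group_Relation_(15-20)" REAL);
    INSERT INTO distance_data VALUES ('Utrecht', 0.6), ('Delft', 0.4);
    INSERT INTO demo_data VALUES ('Utrecht', 0.07), ('Delft', 0.09);
    """
)

# Plain column list: no parentheses around the selected columns.
query = '''
SELECT
    distance_data."Municipality",
    distance_data."Childcare/Nursery",
    demo_data."Age_Group_Relation_(15-20)"
FROM distance_data
INNER JOIN demo_data
    ON distance_data."Municipality" = demo_data."Municipality"
ORDER BY distance_data."Municipality";
'''

# read_sql_query runs the query and takes the column names from the result set.
results = pd.read_sql_query(query, conn)
print(results)
```

With a real psycopg2 connection the call has the same shape; recent pandas versions may warn that they prefer an SQLAlchemy connectable.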
Another way is close to what you already tried:
from pandas import DataFrame
df = DataFrame(cursor.fetchall())
df.columns = [desc[0] for desc in cursor.description]  # DB-API cursors expose column names via .description
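A likely reason for the NaN values in the question, sketched here under assumptions: passing an existing DataFrame to pd.DataFrame(..., columns=[...]) does not rename its columns, it selects columns with those labels, and labels that do not exist come back filled with NaN. The example uses an in-memory SQLite table with made-up rows in place of the PostgreSQL join.

```python
import sqlite3

import pandas as pd

# Stand-in database with made-up rows (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (municipality TEXT, purchase_price INTEGER)")
conn.executemany(
    "INSERT INTO houses VALUES (?, ?)",
    [("Utrecht", 350000), ("Delft", 295000)],
)

cursor = conn.cursor()
cursor.execute("SELECT municipality, purchase_price FROM houses ORDER BY municipality")
rows = cursor.fetchall()
column_names = [desc[0] for desc in cursor.description]

# Pattern from the question: wrap the rows in a DataFrame first, then wrap
# that DataFrame again with the desired labels. The inner frame has columns
# 0 and 1, so none of the requested labels exist and every value is NaN.
all_nan = pd.DataFrame(pd.DataFrame(rows), columns=column_names)

# Passing the raw rows together with the labels names the columns instead.
df = pd.DataFrame(rows, columns=column_names)

print(all_nan)
print(df)
```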
Related
I'm trying to retrieve data through an API call and use it for collaborative filtering.
Here's my Python code:
def getRecommendations(databaseLocation, uid):
    con = sqlite3.connect(databaseLocation)
    query = '''SELECT * FROM viewTABLE;'''
    df = pd.read_sql_query("SELECT * FROM viewTABLE;", con)
    dataPrep = df[['uid', 'recipeId', 'isView']]
    print(df)
    con.close()
    return get_recommendations(dataPrep).loc[[uid]].values.tolist()[0]
It returns:
Empty DataFrame
Columns: [id, uid, recipeId, isView]
Index: []
It's definitely not an SQLite connection problem, since it returns the correct columns, just without any data in them.
I have a database that contains multiple tables, and I am trying to import each table as a pandas dataframe. I can do this for a single table as follows:
import pandas as pd
import pandas.io.sql as psql
import pypyodbc
conn = pypyodbc.connect("DRIVER={SQL Server};\
SERVER=serveraddress;\
UID=uid;\
PWD=pwd;\
DATABASE=db")
df1 = psql.read_frame('SELECT * FROM dbo.table1', conn)
The number of tables in the database will change, and at any time I would like to be able to import each table into its own dataframe. How can I get all of these tables into pandas?
Depending on your SQL server, you can inspect the tables in a database.
For example:
tables_df = pd.read_sql('SELECT table_name FROM database_name', conn)
Now that your table names are available in a pandas data frame, you just need to pull them out:
table_name_list = tables_df.table_name
select_template = 'SELECT * FROM {table_name}'
frames_dict = {}
for tname in table_name_list:
    query = select_template.format(table_name = tname)
    frames_dict[tname] = pd.read_sql(query, conn)
Your dictionary frames_dict now contains all the dataframes, keyed by table name.
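The same idea can be tried end to end on a throwaway database. The sketch below is only an illustration: it assumes an in-memory SQLite database with two made-up tables, and it reads the table names from SQLite's sqlite_master catalog. On SQL Server the list of tables would come from a catalog view such as information_schema.tables instead.

```python
import sqlite3

import pandas as pd

# Throwaway database with two made-up tables (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, score REAL)")
conn.execute("INSERT INTO table1 VALUES (1, 'a')")
conn.execute("INSERT INTO table2 VALUES (1, 0.5)")
conn.commit()

# Ask the catalog which tables exist; this query is SQLite-specific.
tables_df = pd.read_sql(
    "SELECT name AS table_name FROM sqlite_master WHERE type = 'table' ORDER BY name",
    conn,
)

# One DataFrame per table, keyed by table name.
frames_dict = {}
for tname in tables_df["table_name"]:
    # The names come from the catalog itself, so quoting them is enough here.
    frames_dict[tname] = pd.read_sql(f'SELECT * FROM "{tname}"', conn)

print(sorted(frames_dict))
```

Because the catalog is queried each time, the loop picks up tables that were added or dropped since the last run.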
I was trying to read some data from a text file and write it to a SQL Server table using pandas and a for loop. Below is my code.
import pandas as pd
import pyodbc
driver = '{SQL Server Native Client 11.0}'
conn = pyodbc.connect(
Trusted_Connection = 'Yes',
Driver = driver,
Server = '***********',
Database = 'Sullins_Data'
)
def createdata():
    cursor = conn.cursor()
    cursor.execute(
        'insert into Sullins_Datasheet(Part_Number,Web_Link) values(?,?);',
        (a,j))
    conn.commit()
a = pd.read_csv('check9.txt',header=None, names=['Part_Number','Web_Links'] ) # 2 Columns, 8 rows
b = pd.DataFrame(a)
p_no = (b['Part_Number'])
w_link = (b['Web_Links'])
# print(p_no)
for i in p_no:
    a = i
for l in w_link:
    j = l
createdata()
As you can see from the code, I have created two variables, a and j, to hold the values of the two columns of the text file one at a time and write them to the SQL table.
But after running the code, only the last of the 8 rows ends up in the table.
When I call the createdata function inside the w_link for loop, it writes duplicate values to the table.
Please suggest where I am going wrong.
Here is a sample of how your code is working:
a = 0
b = 0
ptr=['s','d','f','e']
pt=['a','b','c','d']
for i in ptr:
    a=i
    print(a,end='')
for j in pt:
    b=j
    print(b,end='')
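One way to write every row, offered as a sketch rather than a definitive fix, is to walk both columns in step with zip so that each part number stays paired with its own link, and to hand all the pairs to executemany. The example assumes an in-memory SQLite database in place of the SQL Server connection and three made-up rows in place of check9.txt.

```python
import io
import sqlite3

import pandas as pd

# Made-up stand-in for the contents of the text file (assumption).
sample_text = (
    "PN-001,http://example.com/1\n"
    "PN-002,http://example.com/2\n"
    "PN-003,http://example.com/3\n"
)
df = pd.read_csv(
    io.StringIO(sample_text), header=None, names=["Part_Number", "Web_Links"]
)

# In-memory SQLite table standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sullins_Datasheet (Part_Number TEXT, Web_Link TEXT)")

# zip pairs the i-th part number with the i-th link, one tuple per file row.
rows = list(zip(df["Part_Number"], df["Web_Links"]))
conn.executemany(
    "INSERT INTO Sullins_Datasheet (Part_Number, Web_Link) VALUES (?, ?)",
    rows,
)
conn.commit()

inserted = conn.execute(
    "SELECT Part_Number, Web_Link FROM Sullins_Datasheet ORDER BY Part_Number"
).fetchall()
print(inserted)
```

pyodbc cursors also provide executemany with ? placeholders, so the same pairing should carry over to the original connection.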
I wrote a script which first runs a SQL query to get the data from Redshift (via Databricks). Then, I want to display it in a pandas data frame. The problem is that somehow the names of the columns were removed or are not displayed. Why?
#SQL Query
query = """
SELECT * FROM table1 limit 1;
"""
# Execute the query
try:
    cursor.execute(query)
except OperationalError as msg:
    print("Command skipped: ")
#Fetch all rows from the result
rows = cursor.fetchall()
# Convert into a Pandas Dataframe
df = pd.DataFrame( [[ij for ij in i] for i in rows] )
df.head()
Output:
As you can see, the column names turned into numbers (in yellow). The intent was to display column name 1: Customer_id, column name 2: Purchases, column name 3: Product_id etc.
I appreciate any help. Thanks!
As suggested by @Chris, you can use pd.read_sql in the following way:
query = """SELECT * FROM table1 limit 1;"""
connection = psycopg2.connect(user = 'your_username',
password = 'password',
host = 'host_ip',
port = 5432,
database = 'db_name')
data = pd.read_sql(sql=query, con=connection)
Now when you print your data, it will show the column names as well!
I have tried many different things to pull the data from Access and put it into a neat data frame. Right now my code looks like this:
from pandas import DataFrame
import numpy as np
import pyodbc
from sqlalchemy import create_engine
db_file = r'C:\Users\username\file.accdb'
user = 'user'
password = 'pw'
odbc_conn_str = 'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=%s;UID=%s;PWD=%s' % (db_file, user, password)
conn = pyodbc.connect(odbc_conn_str)
cur = conn.cursor()
qry = cur.execute("SELECT * FROM table WHERE INST = '796116'")
dataf = DataFrame(qry.fetchall())
print(dataf)
This puts the data into a data frame, but the second column is a list. I need the snippet below to be in 4 separate columns, not 2 with a list.
0 (u'RM257095', u'c1', u'796116')
1 (u'RM257097', u'c2', u'796116')
2 (u'RM257043', u'c3', u'796116')
3 (u'RM257044', u'c4', u'796116')
I have used modules like kdb_utils, which has a read_query function that pulled the data from kdb and separated it into a neat dataframe. Is there anything like this for Access, or another way to pull the data and put it neatly into a data frame?
Consider using pandas' direct read_sql method:
import pyodbc
import pandas as pd
...
# Adjacent string literals are joined before .format runs, so {{ }} become literal braces
cnxn = pyodbc.connect(
    'DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};'
    'DBQ={};Uid={};Pwd={};'.format(db_file, user, password)
)
query = "SELECT * FROM mytable WHERE INST = '796116'"
dataf = pd.read_sql(query, cnxn)
cnxn.close()
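If you would rather keep working with the cursor, a rough alternative is to convert each fetched row to a plain tuple and take the column names from cursor.description. pyodbc returns its own Row objects rather than tuples, which appears to be why each row landed in a single column. The helper name cursor_to_frame, the column names ID and CODE, and the SQLite stand-in database below are all assumptions for illustration.

```python
import sqlite3

import pandas as pd


def cursor_to_frame(cursor):
    """Build a DataFrame from an already executed DB-API cursor (hypothetical helper)."""
    column_names = [desc[0] for desc in cursor.description]
    # tuple(row) turns driver-specific row objects into plain tuples,
    # so pandas spreads the values across separate columns.
    rows = [tuple(row) for row in cursor.fetchall()]
    return pd.DataFrame.from_records(rows, columns=column_names)


# SQLite stand-in for the Access database; column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID TEXT, CODE TEXT, INST TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("RM257095", "c1", "796116"),
        ("RM257097", "c2", "796116"),
        ("RM257043", "c3", "111111"),
    ],
)

cur = conn.cursor()
cur.execute("SELECT * FROM mytable WHERE INST = '796116' ORDER BY ID")
dataf = cursor_to_frame(cur)
print(dataf)
```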