pypyodbc and pandas printing Chinese characters on a Mac - python

I am trying to query a MS SQL Server database from a Mac using pandas and pypyodbc. My column names are returned as Chinese characters. This does not happen when running the code from a Windows-based machine. I tried setting the display encoding properties, but that does not work. It doesn't appear to be just a display issue, because I cannot reference the columns either, e.g. data['col'] raises a KeyError: 'col'
import pandas as pd
import pypyodbc
import sys

pd.options.display.encoding = sys.stdout.encoding
connection = pypyodbc.connect('Driver={ODBC Driver 13 for SQL Server};'
                              'Server=server;'
                              'Database=database;'
                              'uid=username;'
                              'pwd=pw')
data = pd.read_sql("""SELECT * FROM dbo.Table""", con=connection)
print(data)
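One possible explanation (not confirmed for this setup): ASCII column names rendering as CJK characters is a classic symptom of byte pairs being misinterpreted as UTF-16. A minimal sketch of a repair under that assumption; `fix_column_name` is a hypothetical helper, and the actual encoding mix-up on a given driver may differ:

```python
def fix_column_name(garbled):
    """Undo a suspected UTF-16-LE mis-decode of an ASCII column name.

    Assumption: the driver decoded pairs of ASCII bytes as single
    UTF-16-LE code units, turning e.g. b'co' into one CJK character.
    """
    if garbled.isascii():
        return garbled                       # looks fine, leave it alone
    try:
        return garbled.encode('utf-16-le').decode('ascii')
    except UnicodeDecodeError:
        return garbled                       # doesn't fit the pattern

# Demo: the ASCII bytes b'co' read as one UTF-16-LE code unit give
# U+6F63, a CJK character -- which is how 'co' can display as Chinese.
garbled = b'co'.decode('utf-16-le')          # '\u6f63'
print(garbled, '->', fix_column_name(garbled))

# Applied to the DataFrame from the question (hypothetical):
# data = data.rename(columns=fix_column_name)
```

If the round-trip produces readable names, the fix belongs at the connection/driver level rather than in pandas.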

Related

Pandas read_sql reading Access Database: First row missing

I have an external database (pre-2007 MS Access) that I am connecting to with pypyodbc and converting to a dataframe with pandas. That by itself seems to work OK; however, the first line of data seems to be missing.
import pypyodbc
import pandas as pd

conn_str = (
    r'DRIVER={Microsoft Access Driver (*.mdb)};'
    r'DBQ=c:\path_to_my.mdb;'
)
cnxn = pypyodbc.connect(conn_str)
data = pd.read_sql("SELECT * FROM Table1 ORDER BY date DESC", cnxn)
cnxn.close()

# print a few lines and only specific columns
print(data.loc[:, ('date', 'time')].head())
As a result, the first entry in row 0 is from 04:00 o'clock. In the database itself, however, I have a more recent entry from 06:00.
Why is the first line omitted in my read?
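To see whether the driver or pandas is dropping the row, one diagnostic is to bypass read_sql and build the DataFrame from the raw cursor. A sketch using an in-memory SQLite table as a stand-in for the Access file (with the question's setup you would pass the pypyodbc `cnxn` instead); `read_via_cursor` is a hypothetical helper:

```python
import sqlite3
import pandas as pd

def read_via_cursor(conn, query):
    """Fetch all rows through the raw cursor and build the DataFrame
    manually, so we see exactly what the driver returns."""
    cur = conn.cursor()
    cur.execute(query)
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    cur.close()
    return pd.DataFrame.from_records(rows, columns=cols)

# Demo data invented for the sketch: two rows, including the 06:00 entry.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE Table1 (date TEXT, time TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [('2020-01-01', '06:00'), ('2020-01-01', '04:00')])
df = read_via_cursor(conn, "SELECT * FROM Table1 ORDER BY date DESC")
print(df)
```

If the cursor itself already comes back one row short against the real Access file, the problem is in the ODBC driver, not in pandas.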

ImportError: Using URI string without sqlalchemy installed, while executing REGEXP function on pandas SQL API with SQLite

I am trying to execute the REGEXP function of SQLite using the pandas SQL API, but am getting the error
"ImportError: Using URI string without sqlalchemy installed."
The Python code is as follows:
import pandas as pd
import csv, sqlite3
import json, re

conn = sqlite3.connect(":memory:")
print(sqlite3.version)
print(sqlite3.sqlite_version)

def regexp(y, x, search=re.search):
    return 1 if search(y, x) else 0

conn.create_function("regexp", 2, regexp)

df = pd.read_json("idxData1.json", lines=True)
df.to_sql("temp_log", conn, if_exists="append", index=False)

rsDf = pd.read_sql_query(
    conn, """SELECT * from temp_log WHERE user REGEXP 'ph'""", chunksize=20,
)
for gendf in rsDf:
    for item in gendf.to_dict(orient="records"):
        print(item)
The error it throws is
raise ImportError("Using URI string without sqlalchemy installed.")
ImportError: Using URI string without sqlalchemy installed.
Can anyone suggest what I am missing? Please note that I have a specific requirement of using the pandas SQL API.
You get this error because you specified the parameters to read_sql_query in the wrong order: the first parameter should be the query, and the connection comes second, like this:
rsDf = pd.read_sql_query(
    """SELECT * from temp_log WHERE user REGEXP 'ph'""", conn, chunksize=20,
)
You can simply run the following command to install SQLAlchemy:
pip3 install SQLAlchemy
As @Xbel said:
Note that the error may be raised because the order of the parameters is wrong, e.g., adding a connection first and then the SQL statement. As it seems, it was the case. Note that installing SQLAlchemy does not help.
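Putting the pieces together, here is a minimal self-contained version of the corrected call; the table contents are invented for the demo:

```python
import re
import sqlite3
import pandas as pd

def regexp(pattern, value):
    # SQLite evaluates `value REGEXP pattern` by calling regexp(pattern, value)
    return 1 if re.search(pattern, value) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)

# Stand-in data instead of idxData1.json
df = pd.DataFrame({"user": ["phil", "sophie", "anna"]})
df.to_sql("temp_log", conn, if_exists="append", index=False)

# Query first, connection second -- no SQLAlchemy needed for a DBAPI
# connection, and chunksize still returns an iterator of DataFrames.
rsDf = pd.read_sql_query(
    """SELECT * from temp_log WHERE user REGEXP 'ph'""", conn, chunksize=20,
)
rows = []
for gendf in rsDf:
    for item in gendf.to_dict(orient="records"):
        rows.append(item)
        print(item)
```

Both "phil" and "sophie" match the pattern 'ph', "anna" does not.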

TypeError when trying to load information from a website

I have a problem with getting information from a website into my Python program. That's what I tried:
import pandas as pd
import pyodbc as odbc
import geopandas as gpd
import shapely
import shapefile
import sys
import datetime

server = '....'
database = '<database>'
username = '<username>'
password = '<password>'
driver = {'ODBC Driver 13 for SQL Server'}
sql_conn = odbc.connect('DRIVER=' + driver + ';SERVER=' + server +
                        ';PORT=....;DATABASE=' + database +
                        ';UID=' + username + ';PWD=' + password)
query = "select * from view;"
df = pd.read_sql(query, sql_conn)
df.head()
Error:
TypeError: can only concatenate str (not "set") to str
Does anybody know what I did wrong? I just want to collect the information and save it for further processing. I googled but could not find my mistake...
driver = {'ODBC Driver 13 for SQL Server'} makes driver a Python set.
Change it to driver = 'ODBC Driver 13 for SQL Server'
See https://snakify.org/en/lessons/sets/#:~:text=Set%20in%20Python%20is%20a,union%2C%20intersection%2C%20difference).
You are trying to add a str type to a set type in
'DRIVER=' + driver
because you defined driver as a string in curly braces, which means: a set with the given string as its sole element. You probably meant to define it as a string, so delete the braces; otherwise, you can get the (single) element from the set using
next(iter(driver))
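A quick sketch contrasting the two definitions (placeholder credentials, no actual connection attempted):

```python
# The braces ODBC wants belong *inside* the string literal:
# 'DRIVER={...}' -- whereas Python's bare {'...'} builds a one-element set.
driver = '{ODBC Driver 13 for SQL Server}'    # a str, braces included
server, database = 'myserver', 'mydb'          # placeholder values
username, password = 'user', 'pw'

conn_str = ('DRIVER=' + driver + ';SERVER=' + server +
            ';DATABASE=' + database + ';UID=' + username +
            ';PWD=' + password)
print(conn_str)                # valid string concatenation

# The broken version from the question, for contrast:
bad = {'ODBC Driver 13 for SQL Server'}
print(type(bad))               # set -- hence the TypeError on '+'
```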

MySQL seems to change utf-8 encodings from '\xce\x94' to '\\xce\\x94'. How can I avoid this?

I am currently working on writing CSV tables into an SQL database using python3 and MySQLdb.
This is part of the code I use:
import MySQLdb as mbd
import pandas as pd
from sqlalchemy import create_engine
import os

# establish a connection to mysql
conn = mbd.connect(host='localhost',
                   user='me',
                   passwd='****',
                   use_unicode=True,
                   charset="utf8")

file_ = "./stackoverflow_example.tsv"
df = pd.read_csv(file_, sep='\t', engine='python',
                 quotechar='"', decimal='.', encoding='utf-8')
df_name = os.path.basename(file_)[:-4]
df.name = df_name

engine = create_engine('mysql+mysqldb://me:****@localhost/me?charset=utf8')
df.to_sql(con=engine, name=df.name, if_exists='replace',
          index=False)
As stated in the title, the symbol "Δ" can be read by pandas and is displayed as b'\xce\x94' (encoded).
When I run the code above, I get an error message stating that "\xce\x94" is an invalid symbol:
OperationalError: (_mysql_exceptions.OperationalError) (1366, "Incorrect string value: '\\xCE\\x9463'
How can I avoid the extra '\' being added while writing the table to the database?
I tested it with the following content for the tsv file:
(Sorry, I don't know how to display tsv content better.)
Source_[First_Author] Year Mutations Mutations
Raamsdonk 2000 trp-1-Δ63 leu2-Δ1
MySQL version: mysql Ver 14.14 Distrib 5.7.24
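Worth noting: the doubled backslashes are almost certainly not MySQL re-encoding anything; b'\xce\x94' is the correct UTF-8 encoding of Δ, and the extra backslashes are just how Python displays the error string. A small demonstration:

```python
# Δ really is the two bytes CE 94 in UTF-8; nothing is being doubled.
delta = '\u0394'                 # 'Δ'
raw = delta.encode('utf-8')
print(raw)                       # b'\xce\x94'

# MySQL's error text contains literal backslashes; Python's repr of that
# string escapes each one, which is where '\\xCE\\x94' comes from.
server_msg = r"Incorrect string value: '\xCE\x94'"
print(server_msg)                # one backslash each when printed
print(repr(server_msg))          # backslashes appear doubled here
```

The underlying 1366 error usually means the target table or column charset cannot store the character (e.g. latin1); switching both the connection URL and the table to utf8mb4 is the commonly suggested fix, though that is untested here.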

PyMySQL with Python 3.5 - selecting into pandas dataframe with the LIKE clause fails due to escape characters?

I am using PyMySQL to fetch some data from the MySQL DB into a pandas dataframe. I need to run a select with a LIKE clause, but it seems PyMySQL does something weird with the select statement and doesn't like it when the query contains %:
# connection to MySQL
engine = create_engine('mysql+pymysql://user:password@localhost:1234/mydb', echo=False)
# get descriptions we want
decriptions = pd.read_sql(sql=r"select content from listings where content not like '%The Estimate%'", con=engine)
I get error:
ValueError: unsupported format character 'T' (0x54) at index 54
Any advice on how to get around this?
Try using %% to escape the percent signs:
decriptions = pd.read_sql(sql=r"select content from listings where content not like '%%The Estimate%%'", con=engine)
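Another way around the escaping issue is to pass the pattern as a query parameter, so no % ever appears in the SQL text. A sketch using SQLite as a stand-in (with PyMySQL the placeholder would be %s instead of ?); the table contents are invented for the demo:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE listings (content TEXT)")
conn.executemany("INSERT INTO listings VALUES (?)",
                 [('The Estimate is high',), ('Final price',)])
conn.commit()

df = pd.read_sql(
    "select content from listings where content not like ?",
    conn,
    params=('%The Estimate%',),   # the % lives in the parameter, not the SQL
)
print(df)
```

Because the wildcard travels as bound data rather than SQL text, the driver's format-string handling never sees it.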
