My code:
import yql
y = yql.Public()
query = 'SELECT * FROM yahoo.finance.option_contracts WHERE symbol="SPY"'
y.execute(query)
Result:
yql.YQLError: No definition found for Table yahoo.finance.option_contracts
I know that the table exists because I can test the query at http://developer.yahoo.com/yql/console/ and it works. What am I missing?
Update: I posted the url to the console but not the query I tried in the console. The query is now attached.
http://goo.gl/mNXwC
Since the yahoo.finance.option_contracts table is a Community Open Data Table you will want to include it as part of the environment for the query. The easiest way to do that is to load up the environment file for all community tables; just like clicking "Show Community Tables" in the YQL console.
One would normally do that by specifying an env=... parameter in the YQL query URL, or (as you have done) with a use clause in the query itself.
The Python library that you are using lets you pass in the environment file as an argument to execute().
import yql
y = yql.Public()
query = 'SELECT * FROM yahoo.finance.option_contracts WHERE symbol="SPY"'
y.execute(query, env="store://datatables.org/alltableswithkeys")
Here's an example of extending yql.Public to be able to define the default environment on instantiation.
class MyYql(yql.Public):
def __init__(self, api_key=None, shared_secret=None, httplib2_inst=None, env=None):
super(MyYql, self).__init__(api_key, shared_secret, httplib2_inst)
self.env = env if env else None
def execute(self, query, params=None, **kwargs):
kwargs["env"] = kwargs.get("env", self.env)
return super(MyYql, self).execute(query, params, **kwargs);
It can be used like:
y = MyYql(env="store://datatables.org/alltableswithkeys")
query = 'SELECT * FROM yahoo.finance.option_contracts WHERE symbol="SPY"'
r = y.execute(query)
You can still override the env in an individual call to y.execute() if you need to.
Amending query to the following is what works.
query = 'use "http://www.datatables.org/yahoo/finance/yahoo.finance.option_contracts.xml" as foo; SELECT * FROM foo WHERE symbol="SPY"'
More elegant solutions might exist. Please share if such do. Thanks.
Related
Here is some custom code I wrote that I think might be problematic for this particular use case.
class SQLServerConnection:
def __init__(self, database):
...
self.connection_string = \
"DRIVER=" + str(self.driver) + ";" + \
"SERVER=" + str(self.server) + ";" + \
"DATABASE=" + str(self.database) + ";" + \
"Trusted_Connection=yes;"
self.engine = sqlalchemy.create_engine(
sqlalchemy.engine.URL.create(
"mssql+pyodbc", \
query={'odbc_connect': self.connection_string}
)
)
# Runs a command and returns in plain text (python list for multiple rows)
# Can be a select, alter table, anything like that
def execute(self, command, params=False):
# Make a connection object with the server
with self.engine.connect() as conn:
# Can send some parameters along with a plain text query...
# could be single dict or list of dict
# Doc: https://docs.sqlalchemy.org/en/14/tutorial/dbapi_transactions.html#sending-multiple-parameters
if params:
output = conn.execute(sqlalchemy.text(command,params))
else:
output = conn.execute(sqlalchemy.text(command))
# Tell SQL server to save your changes (assuming that is applicable, is not with select)
# Doc: https://docs.sqlalchemy.org/en/14/tutorial/dbapi_transactions.html#committing-changes
try:
conn.commit()
except Exception as e:
#pass
warn("Could not commit changes...\n" + str(e))
# Try to consolidate select statement result into single object to return
try:
output = output.all()
except:
pass
return output
If I try:
cnxn = SQLServerConnection(database='MyDatabase')
cnxn.execute("SELECT * INTO [dbo].[MyTable_newdata] FROM [dbo].[MyTable] ")
or
cnxn.execute("SELECT TOP 0 * INTO [dbo].[MyTable_newdata] FROM [dbo].[MyTable] ")
Python returns this object without error, <sqlalchemy.engine.cursor.LegacyCursorResult at 0x2b793d71880>, but upon looking in MS SQL Server, the new table was not generated. I am not warned about the commit step failing with the SELECT TOP 0 way; I am warned ('Connection' object has no attribute 'commit') in the above way.
CREATE TABLE, ALTER TABLE, or SELECT (etc) appears to work fine, but SELECT * INTO seems to not be working, and I'm not sure how to troubleshoot further. Copy-pasting the query into SQL Server and running appears to work fine.
As noted in the introduction to the 1.4 tutorial here:
A Note on the Future
This tutorial describes a new API that’s released in SQLAlchemy 1.4 known as 2.0 style. The purpose of the 2.0-style API is to provide forwards compatibility with SQLAlchemy 2.0, which is planned as the next generation of SQLAlchemy.
In order to provide the full 2.0 API, a new flag called future will be used, which will be seen as the tutorial describes the Engine and Session objects. These flags fully enable 2.0-compatibility mode and allow the code in the tutorial to proceed fully. When using the future flag with the create_engine() function, the object returned is a subclass of sqlalchemy.engine.Engine described as sqlalchemy.future.Engine. This tutorial will be referring to sqlalchemy.future.Engine.
That is, it is assumed that the engine is created with
engine = create_engine(connection_url, future=True)
You are getting the "'Connection' object has no attribute 'commit'" error because you are creating an old-style Engine object.
You can avoid the error by adding future=True to your create_engine() call:
self.engine = sqlalchemy.create_engine(
sqlalchemy.engine.URL.create(
"mssql+pyodbc",
query={'odbc_connect': self.connection_string}
),
future=True
)
Use this recipe instead:
#!python
from sqlalchemy.sql import Select
from sqlalchemy.ext.compiler import compiles
class SelectInto(Select):
def __init__(self, columns, into, *arg, **kw):
super(SelectInto, self).__init__(columns, *arg, **kw)
self.into = into
#compiles(SelectInto)
def s_into(element, compiler, **kw):
text = compiler.visit_select(element)
text = text.replace('FROM',
'INTO TEMPORARY TABLE %s FROM' %
element.into)
return text
if __name__ == '__main__':
from sqlalchemy.sql import table, column
marker = table('marker',
column('x1'),
column('x2'),
column('x3')
)
print SelectInto([marker.c.x1, marker.c.x2], "tmp_markers").\
where(marker.c.x3==5).\
where(marker.c.x1.in_([1, 5]))
This needs some tweaking, hence it will replace all subquery selects as select INTOs, but test it for now, if it worked it would be better than raw text statments.
Have you tried this from this answer by #Michael Berkowski:
INSERT INTO assets_copy
SELECT * FROM assets;
The answer states that MySQL documentation states that SELECT * INTO isn't supported.
Imagine this is my query :
query = '''
SELECT *
FROM table
WHERE id = {{myid}}'''
params = {'myid':3}
j= JinJaSql(param_style='pyformat'}
myquery, bind_params = j.prepare_query(query,params)
when I print bind_params I would get
{'myid_1':3}
why my parameter name was changed to myid_1 while I named it myid. Is there anything wrong with my code? How can I fix it?
According to the readme on the JinjaSql github page, if using the "pyformat" or "named" param style, the bound parameters returned by prepare_query are guaranteed to be unique. I suspect that this is why myid gets changed to myid_1.
When calling read_sql you should make sure to use the myquery and bind_params returned by prepare_query and not the query and params used in the prepare_query call, i.e.
# ...
# your code above
# ...
myquery, bind_params = j.prepare_query(query, params)
result = pd.read_sql(myquery, conn, params=bind_params)
Note that the format of your params for pd.read_sql depends on the SQL dialect you're using (docs). If you're using postgres for example, dictionary params are the way to go.
Hope this helps!
Currently I can successfully connect and run queries from Python on HIVE using hive_utils doing the following:
import hive_utils
query = """ select * from table
where partition = x
"""
conn = hive_utils.HiveClient(server=x, port=10000,db='default')
a = conn.execute(query)
a = list(a)
Queries that include conditional statements however (and that work over HUE) such as:
query = """ select * from table
where partition = x
and app_id = y
"""
have returned this error:
HiveServerException: errorCode=1, message='Query returned non-zero
code:1' cause: FAILED: Execution Error return code 1 from
org.apache.hadoop.hive.ql.exec.MapRedTask SQLState=’08S01’
Since i am not sending any kind of user information when i establish a connection, I suspect the error is due to the type of permissions available to whichever user is being set.
How do I identify myself as a particular user?
Please refer to the following:
Re: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MapRedTask
Most relevant section is:
Go to this link :
http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_Hive.html
and add
hadoop-0.20-core.jar
hive/lib/hive-exec-0.7.1.jar
hive/lib/hive-jdbc-0.7.1.jar
hive/lib/hive-metastore-0.7.1.jar
hive/lib/hive-service-0.7.1.jar
hive/lib/libfb303.jar
lib/commons-logging-1.0.4.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
to the class path of your project , add this jars from the lib of hadoop
and hive, and try the code. and also add the path of hadoop, hive, and
hbase(if your are using) lib folder path to the project class path, like
you have added the jars.
I have a weird issue, which is probably easy to resolve.
I have a class Database with an __init__ and an executeDictMore method (among others).
class Database():
def __init__(self, database, server,login, password ):
self.database = database
my_conv = { FIELD_TYPE.LONG: int }
self.conn = MySQLdb.Connection(user=login, passwd=password, db=self.database, host=server, conv=my_conv)
self.cursor = self.conn.cursor()
def executeDictMore(self, query):
self.cursor.execute(query)
data = self.cursor.fetchall()
if data == None :
return None
result = []
for d in data:
desc = self.cursor.description
dict = {}
for (name, value) in zip(desc, d) :
dict[name[0]] = value
result.append(dict)
return result
Then I instantiate this class in a file db_functions.py :
from Database import Database
db = Database()
And I call the executeDictMore method from a function of db_functions :
def test(id):
query = "SELECT * FROM table WHERE table_id=%s;" %(id)
return db.executeDictMore(query)
Now comes the weird part.
If I import db_functions and call db_functions.test(id) from a python console:
import db_functions
t = db_functions.test(12)
it works just fine.
But if I do the same thing from another python file I get the following error :
AttributeError: Database instance has no attribute 'executeDictMore'
I really don't understand what is going on here. I don't think I have another Database class interfering. And I append the folder where the modules are in sys.path, so it should call the right module anyway.
If someone has an idea, it's very welcome.
You have another Database module or package in your path somewhere, and it is getting imported instead.
To diagnose where that other module is living, add:
import Database
print Database.__file__
before the from Database import Database line; it'll print the filename of the module. You'll have to rename one or the other module to not conflict.
You could at least try to avoid SQL injection. Python provides such neat ways to do so:
def executeDictMore(self, query, data=None):
self.cursor.execute(query, data)
and
def test(id):
query = "SELECT * FROM table WHERE table_id=%s"
return db.executeDictMore(query, id)
are the ways to do so.
Sorry, this should rather be a comment, but an answer allows for better formatting. Iam aware that it doesn't answer your question...
You should insert (not append) into your sys.path if you want it first in the search path:
sys.path.insert(0, '/path/to/your/Database/class')
Im not too sure what is wrong but you could try passing the database object to the function as an argument like
db_functions.test(db, 12) with db being your Database class
Using Flask, I'm curious to know if SQLAlchemy is still the best way to go for querying my database with raw SQL (direct SELECT x FROM table WHERE ...) instead of using the ORM or if there is an simpler yet powerful alternative ?
Thank for your reply.
I use SQLAlchemy for direct queries all the time.
Primary advantage: it gives you the best protection against SQL injection attacks. SQLAlchemy does the Right Thing whatever parameters you throw at it.
I find it works wonders for adjusting the generated SQL based on conditions as well. Displaying a result set with multiple filter controls above it? Just build your query in a set of if/elif/else constructs and you know your SQL will be golden still.
Here is an excerpt from some live code (older SA version, so syntax could differ a little):
# Pull start and end dates from form
# ...
# Build a constraint if `start` and / or `end` have been set.
created = None
if start and end:
created = sa.sql.between(msg.c.create_time_stamp,
start.replace(hour=0, minute=0, second=0),
end.replace(hour=23, minute=59, second=59))
elif start:
created = (msg.c.create_time_stamp >=
start.replace(hour=0, minute=0, second=0))
elif end:
created = (msg.c.create_time_stamp <=
end.replace(hour=23, minute=59, second=59))
# More complex `from_` object built here, elided for example
# [...]
# Final query build
query = sa.select([unit.c.eli_uid], from_obj=[from_])
query = query.column(count(msg.c.id).label('sent'))
query = query.where(current_store)
if created:
query = query.where(created)
The code where this comes from is a lot more complex, but I wanted to highlight the date range code here. If I had to build the SQL using string formatting, I'd probably have introduced a SQL injection hole somewhere as it is much easier to forget to quote values.
After I worked on a small project of mine, I decided to try to just use MySQLDB, without SQL Alchemy.
It works fine and it's quite easy to use, here's an example (I created a small class that handles all the work to the database)
import MySQLdb
from MySQLdb.cursors import DictCursor
class DatabaseBridge():
def __init__(self, *args, **kwargs):
kwargs['cursorclass'] = DictCursor
self.cnx = MySQLdb.connect (**kwargs)
self.cnx.autocommit(True)
self.cursor = self.cnx.cursor()
def query_all(self, query, *args):
self.cursor.execute(query, *args)
return self.cursor.fetchall()
def find_unique(self, query, *args):
rows = self.query_all(query, *args);
if len(rows) == 1:
return rows[0]
return None
def execute(self, query, params):
self.cursor.execute(query, params)
return self.cursor.rowcount
def get_last_id(self):
return self.cnx.insert_id()
def close(self):
self.cursor.close()
self.cnx.close()
database = DatabaseBridge(**{
'user': 'user',
'passwd': 'password',
'db': 'my_db'
})
rows = database.query_all("SELECT id, name, email FROM users WHERE is_active = %s AND project = %s", (1, "My First Project"))
(It's a dumb example).
It works like a charm BUT you have to take these into consideration :
Multithreading is not supported ! It's ok if you don't work with multiprocessing from Python.
You won't have all the advantages of SQLAlchemy (Database to Class (model) wrapper, Query generation (select, where, order_by, etc)). This is the key point on how you want to work with your database.
But on the other hand, and like SQLAlchemy, there is protections agains't SQL injection attacks :
A basic query would be like this :
cursor.execute("SELECT * FROM users WHERE data = %s" % "Some value") # THIS IS DANGEROUS
But you should do :
cursor.execute("SELECT * FROM users WHERE data = %s", ("Some value",)) # This is secure!
Saw the difference ? Read again ;)
The difference is that I replaced %, by , : We pass the arguments as ... arguments to the execute, and these are escaped. When using %, arguments aren't escaped, enabling SQL Injection attacks!
The final word here is that it depends on your usage and what you plan to do with your project. For me, SQLAlchemy was on overkill (it's a basic shell script !), so MysqlDB was perfect.