Replace mysql.connector-function with reading csv/tsv data - python

I would like to run a large python script programmed by someone else (https://github.com/PatentsView/PatentsView-Disambiguation to be precise). At numerous times during the execution of this code, python connects to a mysql database in order to read some data. The problem is that I
a) cannot get a mysql installation on the server I use. Being only a guest at the institution whose computers I use, I cannot really influence IT to change this
b) would like to alter the code as little as possible, since I have very little python experience
The original code uses a function granted_table() that returns a mysql.connector.connect(...), where host, user, password etc. are given in the dots. My idea is to switch out this function with one that reads in TSV files instead, which I have all stored on my machine. This way, I do not have to go over the long script and fiddle around with it (I have almost no experience with python and might mess something up without realizing it).
I have tried a few things already and it almost works, but not quite. The reproducible example follows:
First, there is the original function that accesses mysql, which has to go
def granted_table(config):
    if config['DISAMBIGUATION']['granted_patent_database'].lower() == 'none':
        logging.info('[granted_table] no db given')
        return None
    else:
        logging.info('[granted_table] trying to connect to %s', config['DISAMBIGUATION']['granted_patent_database'])
        return mysql.connector.connect(host=config['DATABASE']['host'],
                                       user=config['DATABASE']['username'],
                                       password=config['DATABASE']['password'],
                                       database=config['DISAMBIGUATION']['granted_patent_database'])
Second, the function that tries to call the above function to read in data (I shortened and simplified it but kept the structure as I understand it. This version of the function actually only prints whatever data it receives): I do not want to change this if at all possible
def build_granted(config):
    cnx = granted_table(config)
    cursor = cnx.cursor()
    query = "SELECT uuid, patent_id, assignee_id, rawlocation_id, type, name_first, name_last, organization, sequence FROM rawassignee"
    cursor.execute(query)
    for rec in cursor:
        print(rec)
Third, there is my attempt at a new function granted_table(config) which will behave like the old one but not access mysql
def granted_table(config):
    return placeholder()
class placeholder:
    def cursor(self):
        return conn_to_data()
class conn_to_data:
    def execute(self, query):  # instance method
        # define folder etc.
        print(query)
        filename = "foo.tsv"
        path = "/share/bar/"
        file = open(path + filename)
        print(file.read(100))
        return file
Now, if I execute all of the above and then run
config="test"
build_granted(config)
the code breaks with
"TypeError: 'conn_to_data' object is not iterable",
so it does not access the actual data as intended. However, if I change the line cursor.execute(query) to cursor=cursor.execute(query) it works as intended. I could do that if there is really no better solution, but it seems like there must be. I am a python newb, so maybe there even is a simple function/class that is intended for this and one of you can point it out. You might realize that the code currently always opens the same file, without actually interpreting the query. This is something I will still have to fix later, once it even works at all. It seems to me to be a somewhat messy but conceptually simple string-parsing problem, but if some of you have a smart idea, it would also be a huge help.
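One possible way to leave build_granted untouched, sketched here under assumptions: the loop `for rec in cursor` iterates over the cursor object itself and ignores whatever execute() returns, so the stand-in cursor has to remember the opened file and implement `__iter__`. The names TsvConnection and TsvCursor, the data_dir argument, and the rule "table X lives in X.tsv" are all hypothetical. The sketch also assumes every TSV file starts with a header row, and it does not apply the column list of the query.

```python
import csv
import os
import re


class TsvCursor:
    """Minimal stand-in for a DB cursor that serves rows from a TSV file."""

    def __init__(self, data_dir):
        self.data_dir = data_dir
        self._file = None
        self._rows = iter(())  # nothing to iterate before execute()

    def execute(self, query):
        # Hypothetical convention: table "rawassignee" lives in "rawassignee.tsv".
        match = re.search(r"\bFROM\s+(\w+)", query, flags=re.IGNORECASE)
        if match is None:
            raise ValueError("cannot find a table name in: " + query)
        path = os.path.join(self.data_dir, match.group(1) + ".tsv")
        self._file = open(path, newline="")
        reader = csv.reader(self._file, delimiter="\t")
        next(reader, None)  # assumption: the first line is a header row
        self._rows = (tuple(row) for row in reader)

    def __iter__(self):
        # `for rec in cursor` calls this, so it must return an iterator.
        return self._rows

    def close(self):
        if self._file is not None:
            self._file.close()


class TsvConnection:
    """Minimal stand-in for the object returned by mysql.connector.connect."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def cursor(self):
        return TsvCursor(self.data_dir)

    def close(self):
        pass  # nothing to release at the connection level


def granted_table(config):
    # "/share/bar/" is the folder used in the question.
    return TsvConnection("/share/bar/")
```

With this in place, cursor.execute(query) followed by `for rec in cursor` yields one tuple per data line. All values arrive as strings, so code that expects numbers from MySQL may still need a conversion step.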
Thanks in advance!

Related

Flyte 0.16.2: Error loading Blob - How to get Types.Blob.fetch() to work in task decorated function?

I have a Flyte task function like this:
@task
def do_stuff(framework_obj):
    framework_obj.get_outputs()  # This calls Types.Blob.fetch(some_uri)
Trying to load a blob URI using flytekit.sdk.types.Types.Blob.fetch, but getting this error:
ERROR:flytekit: Exception when executing No temporary file system is present. Either call this method from within the context of a task or surround with a 'with LocalTestFileSystem():' block. Or specify a path when calling this function. Note: Cleanup is not automatic when a path is specified.
I can confirm that I can load blobs using with LocalTestFileSystem() in tests, but when actually trying to run a workflow, I'm not sure why I'm getting this error, as the function that calls the blob processing is decorated with @task, so it's definitely a Flyte task. I also confirmed that the task node exists on the Flyte web console.
What path is the error referencing and how do I call this function appropriately?
Using Flyte Version 0.16.2
Could you please give a bit more information about the code? Is this flytekit version 0.15.x? I'm a bit confused, since that version shouldn't have the @task decorator. It should only have @python_task, which is an older API. If you want to use the new Python native typing API, you should install flytekit==0.17.0 instead.
Also, could you point to the documentation you're looking at? We've updated the docs a fair amount recently, so maybe there's some confusion around that. These are the examples worth looking at. There are also two new Python classes, FlyteFile and FlyteDirectory, that have replaced the Blob class in flytekit (though that remains what the IDL type is called).
(would've left this as a comment but I don't have the reputation to yet.)
Some code to help with fetching outputs and reading from a file output
@task
def task_file_reader():
    client = SynchronousFlyteClient("flyteadmin.flyte.svc.cluster.local:81", insecure=True)
    exec_id = WorkflowExecutionIdentifier(
        domain="development",
        project="flytesnacks",
        name="iaok0qy6k1",
    )
    data = client.get_execution_data(exec_id)
    lit = data.full_outputs.literals["o0"]
    ctx = FlyteContext.current_context()
    ff = TypeEngine.to_python_value(ctx, lv=lit,
                                    expected_python_type=FlyteFile)
    with open(ff, 'rb') as fh:
        print(fh.readlines())

How can i accept and run user's code securely on my web app?

I am working on a Django-based web app that takes a Python file as input. The file contains a function; in the backend I have some lists that are passed as parameters to the user's function, which generates a single output value. That result is then used for further computation.
Here is what the function inside the user's file looks like:
def somefunctionname(list):
    ''' some computation performed on list'''
    return float value
At present, I take the user's file as a normal file input. Then, in my views.py, I import the file as a module and call the function through eval. A snippet is given below.
Here modulename is the name of the Python file taken from the user, imported as a module:
exec("import "+modulename)
result = eval(f"{modulename}.{somefunctionname}(arguments)")
This works absolutely fine, but I know it is not a secure approach.
My question: is there any other way to run the user's file securely? I know no solution can be foolproof, but what other options are there (for example, if Docker can solve it, what would the approach be, or are there external tools I can use through an API)?
Or, if possible, can somebody tell me how to simply sandbox this, or point me to a tutorial that can help?
Any reference or resource will be helpful.
It is an important question. In Python, sandboxing is not trivial.
It is one of the few cases where it matters which Python interpreter you are using. For example, Jython generates Java bytecode, and the JVM has its own mechanism to run code securely.
For CPython, the default interpreter, there were originally some attempts to make a restricted execution mode, but they were abandoned a long time ago.
Currently, there is an unofficial project, RestrictedPython, that might give you what you need. It is not a full sandbox, i.e. it will not give you restricted filesystem access or the like, but for your needs it may be just enough.
Basically, its authors rewrote the Python compilation step in a more restricted way.
It lets you compile a piece of code and then execute it, all in a restricted mode. For example:
from RestrictedPython import safe_builtins, compile_restricted
source_code = """
print('Hello world, but secure')
"""
byte_code = compile_restricted(
    source_code,
    filename='<string>',
    mode='exec'
)
exec(byte_code, {'__builtins__': safe_builtins})
>>> Hello world, but secure
Running with builtins = safe_builtins disables the dangerous functions like open file, import or whatever. There are also other variations of builtins and other options, take some time to read the docs, they are pretty good.
EDIT:
Here is an example for your use case
from RestrictedPython import safe_builtins, compile_restricted
from RestrictedPython.Eval import default_guarded_getitem
def execute_user_code(user_code, user_func, *args, **kwargs):
    """ Execute user code in a restricted env
    Args:
        user_code(str) - String containing the unsafe code
        user_func(str) - Function inside user_code to execute and return value
        *args, **kwargs - arguments passed to the user function
    Return:
        Return value of the user_func
    """
    def _apply(f, *a, **kw):
        return f(*a, **kw)
    try:
        # These are the variables we allow user code to see. result will contain the return value.
        restricted_locals = {
            "result": None,
            "args": args,
            "kwargs": kwargs,
        }
        # If you want the user to be able to use some of your functions inside his code,
        # you should add this function to this dictionary.
        # By default many standard actions are disabled. Here I add _apply_ to be able to access
        # args and kwargs and _getitem_ to be able to use arrays. Just think before you add
        # something else. I am not saying you shouldn't do it. You should understand what you
        # are doing, that's all.
        restricted_globals = {
            "__builtins__": safe_builtins,
            "_getitem_": default_guarded_getitem,
            "_apply_": _apply,
        }
        # Add another line to user code that executes user_func
        user_code += "\nresult = {0}(*args, **kwargs)".format(user_func)
        # Compile the user code
        byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")
        # Run it
        exec(byte_code, restricted_globals, restricted_locals)
        # User code has modified result inside restricted_locals. Return it.
        return restricted_locals["result"]
    except SyntaxError as e:
        # Do whatever you want if the user has code that does not compile
        raise
    except Exception as e:
        # The code did something that is not allowed. Add some nasty punishment to the user here.
        raise
Now you have a function execute_user_code, that receives some unsafe code as a string, a name of a function from this code, arguments, and returns the return value of the function with the given arguments.
Here is a very stupid example of some user code:
example = """
def test(x, name="Johny"):
    return name + " likes " + str(x*x)
"""
# Let's see how this works
print(execute_user_code(example, "test", 5))
# Result: Johny likes 25
But here is what happens when the user code tries to do something unsafe:
malicious_example = """
import sys
print("Now I have the access to your system, muhahahaha")
"""
# Let's see how this works
print(execute_user_code(malicious_example, "test", 5))
# Result - evil plan failed:
# Traceback (most recent call last):
#   File "restr.py", line 69, in <module>
#     print(execute_user_code(malicious_example, "test", 5))
#   File "restr.py", line 45, in execute_user_code
#     exec(byte_code, restricted_globals, restricted_locals)
#   File "<user_code>", line 2, in <module>
# ImportError: __import__ not found
Possible extension:
Pay attention that the user code is compiled on each call to the function. However, it is possible that you would like to compile the user code once, then execute it with different parameters. So all you have to do is to save the byte_code somewhere, then to call exec with a different set of restricted_locals each time.
EDIT2:
If you want to use import, you can write your own import function that allows to use only modules that you consider safe. Example:
def _import(name, globals=None, locals=None, fromlist=(), level=0):
    safe_modules = ["math"]
    if name in safe_modules:
        globals[name] = __import__(name, globals, locals, fromlist, level)
    else:
        raise Exception("Don't you even think about it {0}".format(name))
safe_builtins['__import__'] = _import  # Must be a part of builtins
restricted_globals = {
    "__builtins__": safe_builtins,
    "_getitem_": default_guarded_getitem,
    "_apply_": _apply,
}
....
i_example = """
import math
def myceil(x):
    return math.ceil(x)
"""
print(execute_user_code(i_example, "myceil", 1.5))
Note that this sample import function is VERY primitive, it will not work with stuff like from x import y. You can look here for a more complex implementation.
EDIT3
Note that a lot of Python's built-in functionality is not available out of the box in RestrictedPython. That does not mean it is unavailable altogether; you may need to implement a function yourself for it to become available.
Even seemingly obvious things like sum or the += operator are not available by default in the restricted environment.
For example, the for loop uses a _getiter_ function that you must implement and provide yourself (in globals). Since you want to avoid infinite loops, you may want to limit the number of iterations allowed. Here is a sample implementation that limits the number of iterations to 100:
MAX_ITER_LEN = 100
class MaxCountIter:
    def __init__(self, dataset, max_count):
        self.i = iter(dataset)
        self.left = max_count
    def __iter__(self):
        return self
    def __next__(self):
        if self.left > 0:
            self.left -= 1
            return next(self.i)
        else:
            raise StopIteration()
def _getiter(ob):
    return MaxCountIter(ob, MAX_ITER_LEN)
....
restricted_globals = {
    "_getiter_": _getiter,
....
for_ex = """
def sum(x):
    y = 0
    for i in range(x):
        y = y + i
    return y
"""
print(execute_user_code(for_ex, "sum", 6))
If you don't want to limit loop count, just use identity function as _getiter_:
restricted_globals = {
    "_getiter_": lambda x: x,
Note that simply limiting the loop count does not guarantee security. First, loops can be nested. Second, you cannot limit the execution count of a while loop. To make it secure, you have to execute unsafe code under some timeout.
Please take a moment to read the docs.
Note that not everything is documented (although many things are). You have to learn to read the project's source code for more advanced things. Best way to learn is to try and run some code, and to see what kind function is missing, then to see the source code of the project to understand how to implement it.
EDIT4
There is still another problem - restricted code may have infinite loops. To avoid it, some kind of timeout is required on the code.
Unfortunately, since you are using Django, which is multithreaded unless you explicitly specify otherwise, the simple trick for timeouts using signals will not work here; you have to use multiprocessing.
The easiest way, in my opinion, is to use the timeout_decorator library. Simply add a decorator to execute_user_code so it looks like this:
@timeout_decorator.timeout(5, use_signals=False)
def execute_user_code(user_code, user_func, *args, **kwargs):
And you are done. The code will never run more than 5 seconds.
Pay attention to use_signals=False, without this it may have some unexpected behavior in django.
Also note that this is relatively heavy on resources (and I don't really see a way to overcome this). I mean not really crazy heavy, but it is an extra process spawn. You should hold that in mind in your web server configuration - the api which allows to execute arbitrary user code is more vulnerable to ddos.
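A minimal standard-library sketch of the same "separate process plus timeout" idea, as an alternative to the decorator. The helper name run_with_timeout is hypothetical, and the sketch assumes the user code is a self-contained script that prints its result to standard output. It only bounds the run time; it does not restrict what the code may do, so it complements the RestrictedPython setup rather than replacing it.

```python
import subprocess
import sys


def run_with_timeout(source, timeout_seconds=5):
    """Run `source` in a fresh Python process.

    Returns the captured stdout, or None if the time limit was exceeded.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return completed.stdout
```

For example, run_with_timeout("while True: pass", timeout_seconds=1) gives up after about one second instead of blocking the web worker forever. As with the decorator, every call pays for one extra process.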
For sure, with Docker you can sandbox the execution if you are careful. You can restrict CPU cycles and maximum memory, close all network ports, run as a user with read-only access to the file system, and so on.
Still, I think this would be extremely complex to get right. In my opinion, you should not allow a client to execute arbitrary code like that.
I would check whether an existing product or solution already does this and use that. Some sites, for instance, let you submit code (Python, Java, whatever) that is executed on the server.

Can you use mock_open to simulate serial connections?

Morning folks,
I'm trying to get a few unit tests going in Python to confirm my code is working, but I'm having a real hard time getting a Mock anything to fit into my test cases. I'm new to Python unit testing, so this has been a trying week thus far.
The summary of the program is I'm attempting to do serial control of a commercial monitor I got my hands on and I thought I'd use it as a chance to finally use Python for something rather than just falling back on one of the other languages I know. I've got pyserial going, but before I start shoving a ton of commands out to the TV I'd like to learn the unittest part so I can write for my expected outputs and inputs.
I've tried using a library called dummyserial, but it didn't seem to be recognising the output I was sending. I thought I'd give mock_open a try as I've seen it works like a standard IO as well, but it just isn't picking up on the calls either. Samples of the code involved:
def testSendCmd(self):
    powerCheck = '{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK']).encode('utf-8')
    read_text = 'Stuff\r'
    mo = mock_open(read_data=read_text)
    mo.in_waiting = len(read_text)
    with patch('__main__.open', mo):
        with open('./serial', 'a+b') as com:
            tv = SharpTV(com=com, TVID=999, tvInput = 'DVI')
            tv.sendCmd(SharpCodes['POWER'], SharpCodes['CHECK'])
            com.write(b'some junk')
        print(mo.mock_calls)
        mo().write.assert_called_with('{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK']).encode('utf-8'))
And in the SharpTV class, the function in question:
def sendCmd(self, type, msg):
    sent = self.com.write('{0}{1:>4}\r'.format(type,msg).encode('utf-8'))
    print('{0}{1:>4}\r'.format(type,msg).encode('utf-8'))
Obviously, I'm attempting to control a Sharp TV. I know the commands are correct, that isn't the issue. The issue is just the testing. According to documentation on the mock_open page, calling mo.mock_calls should return some data that a call was made, but I'm getting just an empty set of []'s even in spite of the blatantly wrong com.write(b'some junk'), and mo().write.assert_called_with(...) is returning with an assert error because it isn't detecting the write from within sendCmd. What's really bothering me is I can do the examples from the mock_open section in interactive mode and it works as expected.
I'm missing something, I just don't know what. I'd like help getting either dummyserial working, or mock_open.
To answer one part of my question, I figured out the functionality of dummyserial. The following works now:
def testSendCmd(self):
    powerCheck = '{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK'])
    com = dummyserial.Serial(
        port='COM1',
        baudrate=9600,
        ds_responses={powerCheck : powerCheck}
    )
    tv = SharpTV(com=com, TVID=999, tvInput = 'DVI')
    tv.sendCmd(SharpCodes['POWER'], SharpCodes['CHECK'])
    self.assertEqual(tv.recv(), powerCheck)
Previously I was encoding the dictionary values as utf-8. The dummyserial library decodes whatever you write(...) to it so it's a straight string vs. string comparison. It also encodes whatever you're read()ing as latin1 on the way back out.
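Another option, sketched here as an assumption rather than a tested fix for the original code: since SharpTV receives its com object through the constructor, a test may not need to patch open at all. A plain unittest.mock.Mock can stand in for the serial port and record the write. The SharpTV class below is a hypothetical minimal stand-in for the real one, and the command strings 'POWR' and '?' are made-up values, because SharpCodes is not shown in the question.

```python
from unittest.mock import Mock


class SharpTV:
    """Hypothetical minimal stand-in for the class in the question."""

    def __init__(self, com):
        self.com = com

    def sendCmd(self, type, msg):
        # Same formatting as in the question: message right-aligned to 4 chars.
        self.com.write('{0}{1:>4}\r'.format(type, msg).encode('utf-8'))


def check_send_cmd():
    com = Mock()  # records every call made on it, including com.write(...)
    tv = SharpTV(com=com)
    tv.sendCmd('POWR', '?')  # made-up command values
    # Raises AssertionError if write was not called exactly once with these bytes.
    com.write.assert_called_once_with(b'POWR   ?\r')
    return com.mock_calls
```

To simulate a reply, a return value can be configured on the mock, for example com.read.return_value = b'OK\r', before calling the code under test.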

Baffling python mysql connection issue

Hello guys I am currently working in a python project at my school. First I want to make clear that I'm not a python programmer (I was just called to put out the flames in this project because no one else would and I was brave enough to say yes).
I have the following problem here. I have to write a method that connects to an existing localhost MySQL database (I'm using connector version 1.0.12 and python 2.6) and then does pretty basic stuff. The parameters are sent by a GTK-written GUI (I didn't write that interface). So I wrote my method like this:
def compMySQL(self, user, database, password, db_level, table_level, column_level):
    sql_page_textview = self.mainTree.get_widget('sql_text_view')
    sql_page_textview.modify_font(pango.FontDescription("courier 10"))
    sql_page_buffer = sql_page_textview.get_buffer()
    #Gonna try connecting to DB
    try:
        print("Calling conn with U:{0} P:{1} DB:{2}".format(user,password,database))
        cnxOMC = mysql.connector.connect(user, password,'localhost',database)
    except:
        print "Error: Database connection failed. User name or Database name may be wrong"
        return
    #More code ...
But when I run my code I get this:
Calling conn with U:root P:PK17LP12r DB:TESTERS
Error: Database connection failed. User name or Database name may be wrong
And I don't know why, since the arguments sent are the same arguments that get printed (telling me that the GUI the other guy coded works fine) and they are valid login parameters. If I hardcode the login parameters directly instead of using the GUI, everything goes OK and the function executes properly; the following code runs nice and smooth:
def compMySQL(self, user, database, password, db_level, table_level, column_level):
    sql_page_textview = self.mainTree.get_widget('sql_text_view')
    sql_page_textview.modify_font(pango.FontDescription("courier 10"))
    sql_page_buffer = sql_page_textview.get_buffer()
    #Gonna try hardcoding
    try:
        #print("Calling conn with U:{0} P:{1} DB:{2}".format(user,password,database))
        cnxOMC = mysql.connector.connect(user="root", password='PK17LP12r', host='localhost', database='TESTERS')
        print 'No prob with conn'
    except:
        print "Error: Database connection failed. User name or Database name may be wrong"
        return
    #more code ...
Console output:
No prob with conn
Any ideas guys? This one is killing me. I'm just learning Python but I imagine the problem to be something very easy for a seasoned python developer so any help would be strongly appreciated.
Thanks in advance.
The difference between the two versions is not that you are hardcoding the parameters in the second one; it is that you are calling with keyword arguments rather than positional ones. The docs for MySQL connector don't seem to give the actual positional order, and there's no reason to think the parameters are in the order you've given, so it looks like you should always call by keyword:
cnxOMC = mysql.connector.connect(user=user, password=password, host='localhost', database=database)

logging values from arduino to postgres db

I have a temperature sensor (LM35) interfaced with an Arduino board, and my sketch is able to log values to the serial port, say /dev/ttyACM0 in Ubuntu. I was able to install pySerial and log the temperature values into a file. I used the command
python -m serial.tools.miniterm /dev/ttyACM0 >> templogger.csv
So it will log values like
27
28
27
into the templogger.csv file.
Instead of logging into the CSV file, can I log these values directly to a PostgreSQL database, say Db1, with username 'abc' and password 'xyz'? Is it possible through Python? Can you please help by providing the required script?
Start with psycopg2, which implements the Python database API.
You'll probably want to start with simple individual INSERTs, but later on you'll be able to batch work into transactions or use COPY to improve write performance.
In general, it's better to post questions for more specific problems than this. See Stack Overflow Help. If you're stuck on something it's fine to ask for help, but it's preferable that you try to figure it out first. You'll get better results if your questions are more like: "I've tried to use psycopg2 to connect my Python program to PostgreSQL so I can write sensor logging to the database, but when I INSERT a row I get [exact error text here]. Searching for the error message hasn't helped me, so I'm a bit stuck."
In other words, try it, and if you get stuck, ask. I'd normally just close-vote a question like this as "too localized" or "not a real question", which would give you an automatic link to the question writing guide, but you're new here and you've made an effort to write a decent question so I'm trying to explain in a bit more detail.
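A minimal sketch of the "simple individual INSERTs" starting point. psycopg2 needs a running PostgreSQL server, so this sketch uses the standard library's sqlite3 module, which follows the same Python database API, purely as a stand-in. The table name temperatures, the column name value, and the function name log_readings are assumptions. With psycopg2 the connection would come from psycopg2.connect(...) and the placeholder in the SQL would be %s instead of ?.

```python
import sqlite3


def log_readings(conn, lines):
    """Insert each numeric line as one row; skip blank or garbled lines."""
    cur = conn.cursor()
    inserted = 0
    for line in lines:
        text = line.strip()
        try:
            value = int(text)
        except ValueError:
            continue  # serial output can contain partial or empty lines
        # Parameters are passed separately so the driver handles quoting.
        cur.execute("INSERT INTO temperatures (value) VALUES (?)", (value,))
        inserted += 1
    conn.commit()
    return inserted


# Stand-in database; a real setup would connect to PostgreSQL instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperatures (value INTEGER)")
count = log_readings(conn, ["27\n", "28\n", "", "2x\n", "27\n"])
```

The lines could come from the file that miniterm writes or from reading the serial port directly; the insert loop stays the same either way.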
Instead of logging into the CSV file, can I log these values directly to a PostgreSQL database, say Db1, with username 'abc' and password 'xyz'? Is it possible through Python? Can you please help by providing the required script?
Have a good look at a SQLAlchemy tutorial, and try to write code that works for you. Basically, you can extrapolate from this example:
from serial import Serial
from sqlalchemy import *
class Temperature(object):
    pass
class TemperatureStore:
    def __init__(self, dburl):
        self.db = create_engine('postgresql://%s' % dburl)
        self.db.echo = True
        metadata = BoundMetaData(self.db)
        temperatures = Table('temperatures', metadata, autoload=True)
        temperaturemapper = mapper(Temperature, temperatures)
        self.session = create_session()
    def push_temperature(self, value):
        # Use a separate name for the row object so it does not shadow the reading
        temp = Temperature()
        temp.value = value
        self.session.save(temp)
def main():
    ser = Serial(port='/dev/ttyXXX', baudrate=115200)
    tempstore = TemperatureStore('abc:xyz@localhost:5432/Db1')
    while True:
        tempstore.push_temperature(ser.read())
This is just an example that may or may not work (I did not test it, just wrote it in 5 minutes), so you'd better get some inspiration from that, and try to work out your own!
