passing information from one script to another - python

I have two Python scripts, scriptA and scriptB, which run on Unix systems. scriptA takes 20s to run and generates a number X. scriptB needs X when it is run and takes around 500ms. I need to run scriptB every day but scriptA only once a month, so I don't want to run scriptA from scriptB. I also don't want to manually edit scriptB each time I run scriptA. I thought of having scriptA update a file, but I'm not sure where such a file would ideally be placed so that scriptB can read it later, independent of the location of the two scripts. What is the best way of storing this value X on a Unix system so that it can be used later by scriptB?

Many programs on Linux/Unix keep their configuration in /etc/ and use subfolders of /var/ for other files.
But you would probably need root privileges for that.
If you run the scripts from your home folder, then you could create a file ~/.scriptB.rc, or a folder ~/.scriptB/ or ~/.config/scriptB/.
See also the Filesystem Hierarchy Standard on Wikipedia.
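A minimal sketch of that idea, assuming a file under ~/.config/scriptB/ (the filename and the value 42 are just placeholders):
import os

# hypothetical shared location under the user's home directory
value_file = os.path.expanduser('~/.config/scriptB/value')

# scriptA: write X once a month
os.makedirs(os.path.dirname(value_file), exist_ok=True)
x = 42  # stands in for the number scriptA actually computes
with open(value_file, 'w') as f:
    f.write(str(x))

# scriptB: read X every day
with open(value_file) as f:
    x = int(f.read())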

It sounds like you want to serialize ScriptA's results, save them in a file or database somewhere, and then have ScriptB read those results (possibly also modifying the file or updating the database entry to indicate that those results have now been processed).
To make that work you need for ScriptA and ScriptB to agree on the location and format of the data ... and you might want to implement some sort of locking to ensure that ScriptB doesn't end up with corrupted inputs if it happens to be run at the same time that ScriptA is writing or updating the data (and, conversely, that ScriptA doesn't corrupt the data store by writing thereto while ScriptB is accessing it).
Of course ScriptA and ScriptB could each have a filename or other data location hard-coded into their sources. However, that would violate the DRY principle, so you might want them to share a configuration file. (Of course the configuration filename is also repeated in both sources ... or at least the import of the common bit of configuration code ... but the latter still ensures that an installation/configuration detail (the location and, possibly, the format of the data store) is decoupled from the source code. It can thus be changed in the shared config without affecting the rest of the code for either script.)
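As a rough sketch of that shared-configuration idea (the module name and path below are made up):
# config.py -- imported by both ScriptA and ScriptB
DATA_STORE = '/var/tmp/scriptA_scriptB/foo.db'  # the one place to change the location

# in ScriptA and ScriptB:
#   import config
#   db = sqlite3.connect(config.DATA_STORE)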
As for precisely which type of file and serialization to use ... that's a different question.
These days, as strange as it may sound, I'd suggest using SQLite3. It may seem like overkill to use an SQL "database" for simply storing a single value. However, SQLite3 is included in the Python standard library, and it only needs a filename for configuration.
You could also use pickle or JSON or even YAML (which would require a third-party module) ... or even just plain text or some binary representation using something like struct. However, any of those will require that you parse your results and deal with any parsing or formatting errors; JSON would be the simplest of these alternatives. Additionally, you'd have to do your own file locking and error handling if you wanted ScriptA and ScriptB (and, potentially, any other scripts you ever write for manipulating this particular data) to be robust against concurrent operations.
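For instance, a minimal JSON version might look like this (the path is just an example, and there is no locking here):
import json

result_file = '/var/tmp/scriptA_result.json'  # example path shared by both scripts

# ScriptA: write the result
with open(result_file, 'w') as f:
    json.dump({'value': 42}, f)  # 42 stands in for the computed X

# ScriptB: read it back
with open(result_file) as f:
    x = json.load(f)['value']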
The advantage of SQLite3 is that it handles the parsing and decoding and the locking and concurrency for you. You create the table once (perhaps embedded in ScriptA as a rarely used "--initdb" option for occasions when you need to recreate the data store). Your code to read it might look as simple as:
#!/usr/bin/python
import sqlite3
db = sqlite3.connect('./foo.db')
cur = db.cursor()
results = cur.execute('SELECT value, MAX(date) FROM results').fetchone()[0]
... and writing a new value would look a bit like:
#!/usr/bin/python
# (Same import, db= and cur= from above)
with db:
    cur.execute('INSERT INTO results (value) VALUES (?)', (myvalue,))
All of this assuming you had, at some time, initialized the data store (foo.db in this example) with something like:
#!/usr/bin/python
# (Same import, db= and cur= from above)
with db:
    cur.execute('CREATE TABLE IF NOT EXISTS results (value INTEGER NOT NULL, date TIMESTAMP DEFAULT current_timestamp)')
(Actually you could just execute that command every time if you wanted your scripts to recover silently after the old data has been cleaned out.)
This might seem like more code than a JSON file-based approach. However, SQLite3 provides ACID (transactional) semantics as well as abstracting away the serialization and deserialization.
Also note that I'm glossing over a few details. My examples above actually create a whole table of results, with timestamps for when they were written to your data store. These would accumulate over time and, if you were using this approach, you'd periodically want to clean up your "results" table with a command like:
#!/usr/bin/python
# (Same import, db= and cur= from above)
with db:
    cur.execute('DELETE FROM results WHERE date < ?',
                cur.execute('SELECT MAX(date) FROM results').fetchone())
Alternatively, if you really never want access to your prior results, change the INSERT into an UPDATE like so:
#!/usr/bin/python
# (Same import, db= and cur= from above)
with db:
    cur.execute('UPDATE results SET value=?', (mynewvalue,))
(Also note that (mynewvalue,) is a single-element tuple. The DB-API requires that parameters be wrapped in tuples, which is easy to forget when you first start using it with single parameters such as this.)
Obviously if you took this UPDATE-only approach you could drop the 'date' column from the 'results' table and all the references to MAX(date) from the queries.
I chose to use the slightly more complex schema in my earlier examples because it allows your scripts to be a bit more robust with very little additional complexity. You could then do other error checking (detecting, for example, missing values where ScriptB finds that ScriptA hasn't been run as intended).
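For example, ScriptB could use the stored timestamp to complain when ScriptA doesn't seem to have run recently; a sketch reusing the db and cur objects from above (the 35-day threshold is arbitrary):
import datetime

row = cur.execute('SELECT value, MAX(date) FROM results').fetchone()
if row[1] is None:
    raise SystemExit('No results yet: has ScriptA been run?')
value, written = row
age = datetime.datetime.utcnow() - datetime.datetime.strptime(written, '%Y-%m-%d %H:%M:%S')
if age.days > 35:  # arbitrary threshold for a monthly job
    print('Warning: stored value is {} days old'.format(age.days))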

Edit/run crontab -e:
# this will run every month on the 25th at 2am
0 2 25 * * python /path/to/scriptA.py > /dev/null
# this will run every day at 2:10 am
10 2 * * * python /path/to/scriptB.py > /dev/null
Create an external file for both scripts:
In scriptA:
>>> with open('/path/to/test_doc','w+') as f:
...     f.write('1')
...
In scriptB:
>>> with open('/path/to/test_doc','r') as f:
...     v = f.read()
...
>>> v
'1'

You can take a look at PyPubSub.
It's a Python package which provides a publish-subscribe API that facilitates event-based programming.
It'll give you an OS-independent solution to your problem and only requires a few additional lines of code in both A and B.
Also, you don't need to handle messy files!
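A minimal sketch of the PyPubSub API; note that it works within a single running process, so A and B would be parts of one program rather than two separately scheduled scripts:
from pubsub import pub

def on_result(value):
    # what the "B" side does with X
    print('got', value)

pub.subscribe(on_result, 'scriptA.result')
pub.sendMessage('scriptA.result', value=42)  # what the "A" side publishes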

Assuming you are not running the two scripts at the same time, you can (pickle and) save the go-between object anywhere, so long as when you load and save the file you point to the same filesystem path. For example:
import pickle  # or import cPickle as pickle on Python 2

# Create a python object like a dictionary, list, etc.
favorite_color = {"lion": "yellow", "kitty": "red"}

# Write to file in ScriptA
with open('C:\\My Documents\\My Favorite Folder\\myfile.pickle', 'wb') as f_myfile:
    pickle.dump(favorite_color, f_myfile)

# Read from file in ScriptB
with open('C:\\My Documents\\My Favorite Folder\\myfile.pickle', 'rb') as f_myfile:
    favorite_color = pickle.load(f_myfile)  # objects come out in the order you dumped them

Related

How to use the kubernetes-client for executing "kubectl apply"

I have a python script which basically runs the following three commands:
kubectl apply -f class.yaml
kubectl apply -f rbac.yaml
kubectl apply -f deployment-arm.yaml
I want to use the kubernetes-client written in Python to replace it. My current code loads the three yaml files (using pyyaml), edits them a bit, dumps them into new files, and uses the command-line kubectl to execute those three commands. Some of the code:
# load files, edit them and dump into new files, part ...
result = run(['kubectl', 'apply', '-f', class_file_path])
# status check part ...
result = run(['kubectl', 'apply', '-f', rbac_file_path])
# status check part ...
result = run(['kubectl', 'apply', '-f', deployment_file_path])
# status check part ...
What I want to do: replace those three commands with the Python kubernetes-client. Reading the docs and this topic, I came across the create_namespaced_deployment method, which I think I need to use for the deployment_file_path file. But I can't seem to figure out what I need to do with the two other files.
Assuming that I have already loaded the three yaml files (using pyyaml) and edited them (without dumping them into new files), so that I now have three yaml dicts deployment_dict, class_dict, and rbac_dict, how can I use the client to execute the three commands above?
EDIT: BTW, if it's not possible to pass the three dicts, I could just dump them into files again, but I want to use the Python client instead of kubectl. How do I do it?
There is a separate function for every object and action:
from kubernetes import client, config
import yaml
with open("my_deployment.yml") as f:
    body = yaml.safe_load(f)
config.load_kube_config()
apps_api = client.AppsV1Api()
apps_api.create_namespaced_deployment(body=body, namespace="default")
apps_api.replace_namespaced_deployment(body=body, namespace="default")
apps_api.patch_namespaced_deployment(body=body, namespace="default")
apps_api.delete_namespaced_deployment(body=body, namespace="default")
with open("my_cluster_role.yml") as f:
    body = yaml.safe_load(f)
rbac_api = client.RbacAuthorizationV1Api()
rbac_api.create_cluster_role(body=body)
rbac_api.patch_cluster_role(body=body)
rbac_api.replace_cluster_role(body=body)
rbac_api.delete_cluster_role(body=body)
# And so on
When you use kubectl apply you don't care if the object already exists, what API to use, which method to apply, etc. With the client library, as you can see from the example above, you need to:
Load kube-config.
Select the right API to use.
Select the method you want to use. Note that create_something will not work if that something already exists.
I recommend going through the examples that the library provides; they are really helpful for learning.
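If you already have the parsed dicts from pyyaml (deployment_dict, class_dict and rbac_dict in the question), note that body= also accepts a plain dict, and kubernetes.utils.create_from_dict can dispatch on each dict's apiVersion/kind for you (available in recent versions of the client, if I remember correctly). A sketch, reusing the dict names from the question:
from kubernetes import client, config, utils

config.load_kube_config()
k8s_client = client.ApiClient()

# class_dict, rbac_dict and deployment_dict are the dicts loaded with pyyaml in the question
for obj in (class_dict, rbac_dict, deployment_dict):
    utils.create_from_dict(k8s_client, obj)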

Modify flow file attributes in NiFi with Python sys.stdout?

In my pipeline I have a flow file that contains some data I'd like to add as attributes to the flow file. I know in Groovy I can add attributes to flow files, but I am less familiar with Groovy and much more comfortable with using Python to parse strings (which is what I'll need to do to extract the values of these attributes). The question is, can I achieve this in Python when I use ExecuteStreamCommand to read in a file with sys.stdin.read() and write out my file with sys.stdout.write()?
So, for example, I use the code below to extract the timestamp from my flowfile. How do I then add ts as an attribute when I'm writing out ff?
import sys
ff = sys.stdin.read()
t_split = ff.split('\t')
ts = t_split[0]
sys.stdout.write(ff)
Instead of writing back the entire file again, you can simply write the attribute value from the input FlowFile
sys.stdout.write(ts)  # the timestamp in your case
and then, set the Output Destination Attribute property of the ExecuteStreamCommand processor with the desired attribute name.
Hence, the output of the stream command will be put into an attribute of the original FlowFile and the same can be found in the original relationship queue.
For more details, you can refer to ExecuteStreamCommand-Properties
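So the whole command run by ExecuteStreamCommand can shrink to something like this sketch, which reads the tab-separated FlowFile content from stdin and emits only the timestamp:
import sys

ff = sys.stdin.read()
ts = ff.split('\t')[0]
sys.stdout.write(ts)  # captured into the attribute named by Output Destination Attribute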
If you're not importing any native (CPython) modules, you can try ExecuteScript with Jython rather than ExecuteStreamCommand. I have an example in Jython in an ExecuteScript cookbook. Note that you don't use stdin/stdout with ExecuteScript, instead you have to get the flow file from the session and either transfer it as-is (after you're done reading) or overwrite it (there are examples in the second part of the cookbook).

Securely sending information between two python scripts

Brief summary:
I have two files: foo1.pyw and foo2.py
I need to send large amounts of sensitive information to foo2.py from foo1.pyw, and then back again.
Currently, I am doing this by writing to a .txt file and then opening it from foo2.py using: os.system('foo2.py [text file here] [other arguments passing information]'). The problem here is that the .txt file leaves a trace even after it is removed. I need to send information to foo2.py and back without having to write to a temp file.
The information will be formatted text, containing only ASCII characters, including letters, digits, symbols, returns, tabs, and spaces.
I can give more detail if needed.
You could use encryption like AES with Python: http://eli.thegreenplace.net/2010/06/25/aes-encryption-of-files-in-python-with-pycrypto, or use a transport layer: https://docs.python.org/2/library/ssl.html.
If what you're worrying about is the traces left on the HD, and real time interception is not the issue, why not just shred the temp file afterwards?
Alternatively, for a lot more work, you can set up a ramdisk and hold the file in memory.
The right way to do this is probably with a subprocess and a pipe, accessible via subprocess.Popen. You can then pipe information directly between the scripts.
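A minimal sketch of that approach (the single-exchange protocol below is just an example):
# foo1.pyw
import subprocess

proc = subprocess.Popen(['python', 'foo2.py'],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        universal_newlines=True)
reply, _ = proc.communicate('the sensitive text')  # nothing is written to disk

# foo2.py would then read sys.stdin.read(), do its work,
# and write the result back with sys.stdout.write().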
I think the simplest solution would be to just call the function within foo2.py from foo1.py:
# foo1.py
import foo2
result = foo2.do_something_with_secret("hi")
# foo2.py
def do_something_with_secret(s):
    print(s)
    return 'yeah'
Obviously, this wouldn't work if you wanted to replace foo2.py with an arbitrary executable.
This may be a little tricky if the two are in different directories, run under different versions of Python, etc.

Programmatically create ODBC Data Source using the IBM SPSS 20 standalone Data File Driver

The problem:
I have a .sav file containing a lot of data from a questionnaire. This data is transferred to an Access database through an ODBC connection using VBA. The VBA macro is used by a lot of not-so-tech-savvy coworkers, and each time they need to use a new .sav file, they need to create or alter a connection. It can be a hassle to explain how every time, so I would like to be able to create this connection programmatically (ideally contained in the VBA macro, but if that's not possible, something like a Python script will do).
The solution I thought of:
I'm not very familiar with VBA, so to begin with I tried doing it with python. I found this example, which seemed (to me) like a solution:
http://code.activestate.com/recipes/414879-create-an-odbc-data-source/
However, I'm not sure what arguments to feed SQLConfigDataSource. I tried:
create_sys_dsn(
'IBM SPSS Statistics 20 Data File Driver - Standalone',
SDSN='SAVDB',
HST='C:\ProgramFiles\IBM\SPSS\StatisticsDataFileDriver\20\Standalone\cfg\oadm.ini',
PRT='StatisticsSAVDriverStandalone',
CP_CONNECT_STRING='data/Dustin_w44-47_2011.sav',
CP_UserMissingIsNull=1)
But to no avail.
Being green in this field, I realise that my proposed solution might not even be the correct one. Does stackoverflow have any ideas?
Thanks in advance
Once you have the filename from a prompt such as:
Dim fd As Object
Dim CSVFilename As String
Set fd = Application.FileDialog(3)
'Use a With...End With block to reference the FileDialog object.
With fd
    .AllowMultiSelect = False
    .Filters.Add "Data Files", "*.csv", 1
    .ButtonName = "Forecast"
    If .Show = -1 Then
        CSVFilename = .SelectedItems(1)
        ' returns an array, even with multiselect off
    Else
        Exit Sub
    End If
End With
'Set the object variable to Nothing.
Set fd = Nothing
then you should be able to construct the connection string, which will allow you to open the file
Dim MyConnect As New Connection
With MyConnect
    .Provider = "IBM SPSS Statistics 20 Data File Driver - Standalone"
    .ConnectionString = "data/" & CSVFilename
    .Open
End With
I would also point out that if you can import the csv file using the External Data -> Text File (or equivalent), then a command more like
DoCmd.TransferText(TransferType, SpecificationName, TableName, FileName, _
HasFieldNames, HTMLTableName, CodePage)
would probably work out easier for you
For anyone that happens to need a solution for this, what I ended up doing was creating the system DSN by manually editing the registry.
To see which keys and values need to be modified, create a system DSN manually and inspect the values in 'HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\[DSN]' and 'HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\ODBC Data Sources'. It should be fairly clear how to create a DSN through VBA then.
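If you'd rather script that than do it by hand, here is a rough Python sketch using winreg (the DSN name and value names are illustrative; copy the real ones from a DSN you created manually, as described above):
import winreg

DSN = 'MySPSSSource'  # hypothetical DSN name
DRIVER = 'IBM SPSS Statistics 20 Data File Driver - Standalone'

with winreg.CreateKey(winreg.HKEY_LOCAL_MACHINE,
                      r'SOFTWARE\ODBC\ODBC.INI\%s' % DSN) as key:
    winreg.SetValueEx(key, 'Driver', 0, winreg.REG_SZ, DRIVER)
    # ...plus whatever other values the manually created DSN shows

with winreg.CreateKey(winreg.HKEY_LOCAL_MACHINE,
                      r'SOFTWARE\ODBC\ODBC.INI\ODBC Data Sources') as key:
    winreg.SetValueEx(key, DSN, 0, winreg.REG_SZ, DRIVER)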

Is there any way to modify stat information like mtime or ctime manually in Python or any language at all?

I am trying the following code:
os.stat(path_name)[stat.ST_CTIME] = ctime
However, this gives the following error:
exceptions.TypeError: 'posix.stat_result' object does not support item assignment
Is there any way to modify ctime?
Thanks!
os.utime(filename, timetuple) can be used to set the atime and mtime of a file. As far as I know there is no way to modify the ctime from userland without resorting to hacks such as playing with the clock or direct editing of the filesystem (which I really do not recommend), and this is true for any programming language (Python, Perl, C, C++...): it's internal OS stuff, and you don't want to touch it.
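For the atime/mtime part, a short sketch:
import os
import time

path = 'example.txt'            # placeholder path
an_hour_ago = time.time() - 3600
os.utime(path, (an_hour_ago, an_hour_ago))  # (atime, mtime); the ctime still jumps to "now"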
See, for example, the documentation of the touch command (http://www.delorie.com/gnu/docs/fileutils/fileutils_54.html):
Although touch provides options for changing two of the times -- the times of last access and modification -- of a file, there is actually a third one as well: the inode change time. This is often referred to as a file's ctime. The inode change time represents the time when the file's meta-information last changed. One common example of this is when the permissions of a file change. Changing the permissions doesn't access the file, so the atime doesn't change, nor does it modify the file, so the mtime doesn't change. Yet, something about the file itself has changed, and this must be noted somewhere. This is the job of the ctime field. This is necessary, so that, for example, a backup program can make a fresh copy of the file, including the new permissions value. Another operation that modifies a file's ctime without affecting the others is renaming. In any case, it is not possible, in normal operations, for a user to change the ctime field to a user-specified value.
GNU stroke implements the change-system-time trick to change ctime of a file. If that's what you want, GNU stroke does it for you: http://stroke.sourceforge.net/.
There is no direct way to set the change time; it gets updated whenever inode information changes, like ownership, link count, mode, etc.
Try setting the mode to the already set mode:
os.chmod(path_name, os.stat(path_name)[stat.ST_MODE])
Ran into this lately, this is how I ended up doing it:
import subprocess

def _update_ctime(filename, cdatetime):
    args = [
        'sudo',
        'touch',
        '-d', str(cdatetime)[:19],
        filename,
    ]
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        print('=> Error changing time on file {}'.format(filename))
Not error-proof (it depends on locales, etc.), but it can be improved...
