Programmatically determine if running in DSX - python

How can I programmatically determine if the python code in my notebook is running under DSX?
I'd like to be able to do different things under a local Jupyter notebook vs. DSX.

While the method presented in another answer (look for specific environment variables) works today, it may stop working in the future. This is not an official API that DSX exposes. It will obviously also not work if somebody decides to set these environment variables on their non-DSX system.
My take on this is that "No, there is no way to reliably determine whether the notebook is running on DSX".
In general (in my opinion), notebooks are not really designed as artifacts that you can deploy anywhere arbitrarily; someone will always need to wear the "application developer" hat and transform them. Instructions for how to do that could go into a markdown cell inside the notebook.
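If you do need a branch despite that caveat, one option (a sketch, not an official mechanism) is to make the decision explicit rather than inferred: key off a variable that you set yourself in each environment. MY_APP_ENV below is a hypothetical name, not anything DSX provides.
import os

# A sketch: branch on a variable you control yourself, so the behavior
# does not depend on undocumented DSX internals. MY_APP_ENV is a
# hypothetical name; set it to "dsx" in your DSX project and leave it
# unset everywhere else.
def running_on_dsx():
    return os.environ.get('MY_APP_ENV') == 'dsx'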

You can print your environment or look for some specific environment variable. I am sure you will find some differences.
For example:
import os

if os.environ.get('SERVICE_CALLER'):
    print('In DSX')
else:
    print('Not in DSX')
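To find a variable worth keying on, a small sketch: dump the whole environment in both setups and diff the output.
import os

# Print all environment variables sorted by name, so the output from a
# local notebook and a DSX notebook can be compared side by side.
for name in sorted(os.environ):
    print(name, '=', os.environ[name])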

Related

Determine if Code is Running on Databricks or IDE (PyCharm)

I am in the process of building a Python package that can be used by Data Scientists to manage their MLOps lifecycle. Now, this package can be used either locally (usually on PyCharm) or on Databricks.
I want a certain functionality of the package to be dependent on where it is running, i.e. I want it to do something different if it is called by a Databricks notebook and something else entirely if it is running locally.
Is there any way I can determine where it is being called from?
I am a little doubtful whether we can use something like the following, which checks if your code is running in a notebook or not, since this will be a package stored in your Databricks environment:
How can I check if code is executed in the IPython notebook?
The workaround I've found to work is to check for databricks specific environment variables.
import os

def is_running_in_databricks() -> bool:
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

Azure databricks toggle environment vars contain quotes in python

I know there are a lot of questions here about how to handle quotes in environment variables. This question has a different focus so please read on:
Before last week we had set our environment variables on our databricks cluster (7.3 LTS, includes Apache Spark 3.0.1, Scala 2.12) like this:
EXAMPLE_FOO="gaga"
For whatever reason (don't remember) we needed the quotes to get this result in python:
print(os.environ["EXAMPLE_FOO"]) => gaga
Since last week the behavior changed, now we get:
print(os.environ["EXAMPLE_FOO"]) => "gaga"
with the quotes. We have no clue why this suddenly changed. There was no software update or the like from our side on this production system. We would like to understand the root cause. Has some library on Databricks changed, or is there a setup flag in the Databricks configuration where you can toggle this behavior?
Note: We know how to handle both cases in Python, so there is no need to tell us how to handle the variables. We need to know what may have suddenly caused the issue.
It looks like your workspace was already upgraded to incorporate this breaking change, which was highlighted in the release notes. You should also have received communication from Databricks support about this change. Basically, you don't need to use escaping anymore, so you can remove the quotes.
But it's really better to raise a support ticket with Microsoft to understand the impact of this issue and define the next steps.
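For completeness (the askers already know this part), a defensive sketch that tolerates both the old and the new behavior by stripping one pair of surrounding quotes if present:
import os

# Works before and after the platform change: strip exactly one pair of
# surrounding double quotes, if present.
raw = os.environ.get("EXAMPLE_FOO", "")
if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
    raw = raw[1:-1]
print(raw)  # => gaga in both cases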

VS Code - Intellisense varies

I have to admit, I am very stumped and must be missing something obvious.
On one user profile on my MacBook, VS Code works like a dream. For code like
import numpy as np
np.random.
I get code help/completion. FWIW, I am using a conda environment.
When I switch to the user profile I use for teaching, with a matching conda environment and what I believe are the same extensions installed, IntelliSense does not work.
I know that this has been asked, but I have yet to see a clear resolution, and the fact that I am using the same machine, with what I believe is the same setup, is really puzzling.
Thanks in advance.
Based on your description, I recommend checking the following points:
Check whether the module numpy has been successfully installed in the environment VS Code is currently using. VS Code can only recognize the methods in a module and provide prompts once the module is installed successfully.
Since code prompting and completion in VS Code are provided by the Python extension, check that it is installed and enabled.
In addition, you can try the Pylance extension, which provides outstanding language-server features, including IntelliCode.
Update:
Different language servers provide different prompts and completions. Since you want to see randint, rand, and random_integers as options, you can set "python.languageServer": "Jedi" in settings.json.
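That is, in your user or workspace settings.json:
{
    "python.languageServer": "Jedi"
}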

Case sensitivity with names of modules and files in python 2.7.15

I have encountered a rather funny situation: I work in a big scientific collaboration whose major software package is based on C++ and Python (still 2.7.15). This collaboration also has multiple servers (SL6) to run the framework on. Since I joined the collaboration recently, I received instructions on how to set up the software and run it. All works perfectly on the server. Now, there are reasons not to connect to the server to do simple tasks or code development; instead it is preferable to do these kinds of things on your local laptop. Thus, I set up a virtual machine (docker) according to a recipe I received, installed a couple of things (fuse, cvmfs, docker images, etc.) and in this way managed to connect my MacBook (OSX 10.14.2) to the server, where some of the libraries need to be sourced in order for the software to be compiled and run. And after 2 hours it does compile! So far, so good...
Now comes the fun part: you run the software by executing a specific python script which is fed as argument another python script. Not funny yet. But somewhere in this big list of python scripts sourcing one another, there is a very simple task:
import logging
variable = logging.DEBUG
This is written inside a script that is called Logging.py. So the script and the library differ only in the first letter: l vs. L. On the server, this runs perfectly smoothly. On my local VM setup, I get the error
AttributeError: 'module' object has no attribute 'DEBUG'
I checked the Python versions (which python) and the location of the logging library (print logging.__file__), and in both setups I get the same result for both commands. So the same Python version is run, and the same logging library is sourced, but in one case there is a mix-up with the name of the file that sources the library.
So I am wondering if there is some "convention file" (like a .vimrc for vi) sourced somewhere, where this issue could be resolved by setting some tolerance parameter to another value?
Thanks a lot for the help!
conni
As others have said, OSX treats names as case-insensitive by default, so the Python bundled logging module will be confused with your Logging.py file. I'd suggest the better fix would be to rename the Logging.py file, as this would improve the portability of the code base. Otherwise, you could create a "Case-sensitive" APFS file system using Disk Utility.
If you go with creating a file system, I'd suggest not changing the root/system partition to case-sensitive, as this will break various programs in subtle ways. You could either repartition your disk and create a case-sensitive filesystem, or create a disk image (this might be slower; I'm not sure by how much) and work in there. Just make sure you pick the "APFS (Case-sensitive)" format when creating the filesystem!
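A quick diagnostic sketch, assuming the Python 2.7 environment described in the question: check where import logging actually resolves from, and which directories are searched first.
# Python 2.7 diagnostic sketch: on a case-insensitive filesystem a local
# Logging.py can shadow the standard library module, depending on which
# directory sys.path searches first.
import sys
import logging
print logging.__file__   # expect .../lib/python2.7/logging/__init__.pyc
print sys.path[:3]       # the script's own directory comes first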

Why autocompletion options in Spyder 3.1 are not fully working in the Editor?

Running on Mac Sierra, the autocompletion in Spyder (from the Anaconda distribution) seems quite erratic. When used from the IPython console, it works as expected. However, when used from the editor (which is my main way of writing), it is erratic. The autocompletion works (i.e. when pressing TAB a little box appears showing options) for some modules, such as pandas or matplotlib. So writing 'pd.' and hitting TAB gets the box with options as expected. However, this does not happen with many other objects: for example, after defining a dataframe named 'df', typing 'df.' and TAB shows nothing. In the IPython console, 'df.' and TAB would show the available methods for that dataframe, such as groupby, and also its columns, etc.
So the question is threefold. First, is there any particular configuration that should be enabled to get this to work? I don't think so, given some time spent googling, but I just want to make sure. Second, could someone state the official word on what works and what doesn't in terms of autocompletion (e.g. which particular modules work from the editor, and which ones don't)? Finally, what are the technical aspects of the differences between the editor and the IPython console in the performance of the autocompletion in Spyder? I read something about Jedi vs. PsychoPy modules, so I got curious (however, please keep in mind that although I have scientific experience, I am relatively new to computation, so please keep it reasonably simple for an educated but not expert person).
UPDATE: As a side question, it would be great to know why is the autocompletion better in Rodeo (another IDE). It is more new, has way fewer overall options than Spyder, but the autocompletion works perfectly in the editor.
(Spyder developer here)
My answers:
is there any particular configuration that should be enabled to get this to work?
In Spyder 3.1 we added the numpydoc library to improve completions of some objects (like Matplotlib figures and NumPy arrays). If DataFrame completions are not working for you (they are for me), please open an issue in our issue tracker on GitHub so we can track and solve this problem.
could someone state what is the official word on what works and what doesn't in terms of autocompletion (e.g. what particular modules do work from the editor, and which ones doesn't?)
The most difficult part is getting completions of definitions when an object is generated by functions or methods developed in C/C++/Fortran and not in Python. I mean, things like
import numpy as np
a = np.array([])
a.<TAB>
As I said, this should be working now for arrays, figures and dataframes, but it doesn't work for all libraries (and most scientific Python libraries are created in C/C++/Fortran and wrapped in Python for speed).
The problem is that the completion libraries we use (Rope and Jedi) can't deal with this case very well because array (for example) can't be introspected in a static way (i.e. without running code involving it). So we have to resort to tricks like analyzing array's docstring to see its return type and introspect that instead.
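As a rough illustration of that docstring trick (a sketch, not Spyder's actual implementation): pull the return type out of a NumPy-style docstring and introspect that type instead of the un-runnable call.
# Sketch of the docstring trick described above (not Spyder's actual
# code): parse the "Returns" section of a NumPy-style docstring to
# guess the return type, then list that type's attributes.
import re
import numpy as np

match = re.search(r'Returns\s*-+\s*\w+\s*:\s*(\w+)', np.array.__doc__)
if match:
    rtype = getattr(np, match.group(1), None)   # e.g. np.ndarray
    if rtype is not None:
        print(sorted(n for n in dir(rtype) if not n.startswith('_'))[:5])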
what are the technical aspects of the differences between the editor and the Ipython console in the performance of the autocompletion with Spyder?
The most important difference is that in the IPython console you have to run your code before getting completions about it. For example, please run this in a fresh IPython console
In [1]: import pandas as pd
...: df = pd.Da<Tab>
and you will see that it won't return any completions for Da (when it obviously should return DataFrame).
But, after evaluation, it is quite simple to get completions. You can simply run
dir(pd)
to get them (that's what IPython essentially does internally).
On the other hand, Spyder's Editor doesn't have a console to run code into, so it has to get completions by running static analysis tools in your code (like Jedi and Rope). As I said, they introspect your code without running it. While they work very well for pure Python code, they have the problems I described above for compiled libraries.
And trying to evaluate the code you have in the Editor to get completions is usually not a good idea because:
It is not necessarily valid Python code all the time. For example, suppose you left an unclosed parenthesis somewhere, but you want to get completions at some other point. That should work without problems, right?
It could involve a very costly computation (e.g. loading a huge CSV in a Dataframe), so evaluating it every time to get completions (and that's a must because your code is different every time you ask for completions) could consume all your RAM in a blink.
it would be great to know why is the autocompletion better in Rodeo (another IDE)
Last time I checked (a couple of years ago), Rodeo evaluated your code to get completions. However, we'll take a look at what they are doing now to see if we can improve our completion machinery.
Autocompletion works correctly only if there is no whitespace in the project working directory path.
Autocomplete was not working for me at all.
So, I tried Tools -> Reset Spyder to factory defaults and it worked.
