PyCharm has no auto-completion for AWS Glue Python - python

I'm new to Python and AWS Glue, and I'm having trouble enabling auto-completion in certain cases.
First case:
dynamic_frame has no API list
If I force the creation of a Spark DataFrame, no API list is shown; it seems DataSource0 is unknown.
But glueContext has the API list shown:
Second case:
A Spark DataFrame can be created and the API list is shown. Since I'm new to AWS Glue, I'm not sure whether it is best practice to use the DynamicFrame object for the DataFrame conversion instead of calling .toDF() directly, or what the impact would be.
Third case:
Define the type to have the API list shown. Again, I'm new to Python and come from a C/Java/Scala background, so I have no idea whether this is weird or "non-Pythonic" code style.
Environment:
Python 3.6 installed by Anaconda
pyspark and AWS Glue installed via pip

To solve the first case, try adding a type hint to the variable, e.g. DataSource0: DynamicFrame.
The IDE's linter analyzes your code in a separate background process to resolve the types and populate the auto-completion list. If you add a type hint, you help the linter determine the type of the variable, so the IDE doesn't have to work the type out dynamically.
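A minimal sketch of what such a type hint could look like, assuming the awsglue and pyspark packages from the question are installed where the IDE can inspect them. The helper name load_table and its parameters are hypothetical. The imports sit under TYPE_CHECKING and the annotations are written as strings, so they guide the IDE without being needed at run time.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for the IDE / type checker; skipped at run time.
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.sql import DataFrame


def load_table(glue_context: "GlueContext", database: str, table_name: str) -> "DataFrame":
    """Read a catalog table and convert it to a Spark DataFrame (hypothetical helper)."""
    # The hint states what from_catalog() returns, so completion on
    # data_source does not depend on the IDE inferring the type.
    data_source: "DynamicFrame" = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
    df: "DataFrame" = data_source.toDF()
    return df
```

The same annotation works on a plain module-level variable; wrapping it in a function is only a choice made for this sketch.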
Auto-completion works with the library functions in the other examples because in those cases no dynamic code needs to be resolved by the linter. The library functions themselves are clearly defined.
PyCharm sometimes also takes a few seconds to resolve auto-completions, so don't force the menu; give the background linter process a few seconds to finish between writing the variable and typing the dot (press Ctrl+Space afterwards to show the completion list).
In your case those library functions are also nested several levels deep, which makes resolving the suggestions heavier for the IDE. Flat really is better than nested here. This is an inconvenience rather than a problem: provided the code is correct, it will execute as expected at run time, although writing it is harder without the occasional help of auto-suggestions.

Related

Can I tell pylance in VS Code to assume import of certain symbols globally for all files?

I am working with code that uses Amazon's boto3 library. That library does not define most of its symbols statically and generates a lot of its classes at runtime. As a result, it is not possible to use them for type hints.
There is a library called boto3-stubs, which I found in this answer: https://stackoverflow.com/a/54676985/607407 However, this is a large codebase, and there is some inertia against adding new libraries, especially ones that do not actually do anything.
What I can do is use forward type annotations, like this:
from typing import Iterator

def fn(versions: Iterator['ObjectVersion']):
    pass
This will not affect the runtime and the rest of our team is ok with these names. But it serves little purpose unless pylance can recognize that I mean mypy_boto3_s3.service_resource.ObjectVersion.
Is there a setting that can tell Pylance: "For every file, automatically import these modules."?
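For illustration only, one common per-file way to make such forward annotations resolvable is an import guarded by typing.TYPE_CHECKING. This is a sketch, not the global setting the question asks about, and it assumes boto3-stubs is installed only where the type checker runs. The function version_ids is hypothetical.

```python
from typing import TYPE_CHECKING, Iterator, List

if TYPE_CHECKING:
    # Seen by Pylance / mypy only; never imported at run time,
    # so the stubs package is not a runtime dependency.
    from mypy_boto3_s3.service_resource import ObjectVersion


def version_ids(versions: Iterator["ObjectVersion"]) -> List[str]:
    """Collect the id of every object version (hypothetical example)."""
    return [version.id for version in versions]
```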

How can I call a python function from an advanced scripting voice command in Dragon NaturallySpeaking?

I don't want to use a third-party application such as dragonfly or NatLink (paper).
One way is to compile your Python code into an executable. You can put a bunch of functions that do different things into the same program, pass an argument that selects the function you want, and pass its parameters along after it. Returning the result can be tricky; I usually use the clipboard (copy the Python output to the clipboard and read it from the clipboard in Dragon). Multi-word parameters need their spaces escaped (%20), and you have to decode them inside your Python code.
Something like this:
ShellExecute "path\program.exe myFunc myPar1, my%20Par%202", 6 ' 6 runs minimized
Wait 1
myVar = Clipboard
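The Python side of this idea might look roughly like the sketch below. The functions greet and add, the FUNCTIONS table, and the use of the Windows clip command to fill the clipboard are all assumptions made for illustration; the script would still need to be turned into program.exe with a tool of your choice.

```python
import subprocess
import sys
from urllib.parse import unquote


def greet(name):
    """Hypothetical example function."""
    return "Hello, " + name


def add(a, b):
    """Hypothetical example function."""
    return str(int(a) + int(b))


# Maps the first command-line argument to the function to call.
FUNCTIONS = {"greet": greet, "add": add}


def dispatch(argv):
    """Select a function by name (argv[0]) and call it with the decoded parameters."""
    name, raw_params = argv[0], argv[1:]
    # Drop a trailing comma separator and turn %20 back into spaces.
    params = [unquote(p.rstrip(",")) for p in raw_params]
    return FUNCTIONS[name](*params)


def copy_to_clipboard(text):
    """Send text to the Windows clipboard through the built-in clip command."""
    subprocess.run("clip", input=text, universal_newlines=True, shell=True, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    copy_to_clipboard(dispatch(sys.argv[1:]))
```

Keeping dispatch separate from the clipboard call makes it possible to try the function selection from a normal Python prompt before involving Dragon.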
Hth,
Warning: This is not an answer. I am not a programmer. I don't know any Python and have no way of testing it.
This is just a suggestion on how to solve this problem. I don't know where else to put this. I'd put it in a comment, but it allows no screenshots. Please edit and suggest as you wish.
There is an answer on SO that deals with calling Python from Excel, which is a similar concept: https://stackoverflow.com/a/3569988/2101890. I am trying to use that here but don't know how.
When using commands in another programming language, you can sometimes add them by adding a reference in the MyCommands Editor. You can reference DLLs and other "stuff". Some references to libraries appear automatically. I've installed Python and hoped to find Python in the References, but no such luck:
There is no Python entry here that I can find. You may have better luck. If you do find something, check the box and see if you can add python commands without causing an error when saving the command.
Maybe you can browse to %localappdata%\Programs\Python\Python36\ and add some of the DLLs from there and call Python commands from there. Or try getting it to work in the way described under 1.

Batch call Dependency id requirements?

I have a script which does the following:
Create campaign
Create AdSet (requires campaign_id)
Create AdCreative (requires adset_id)
Create Ad (requires creative_id and adset_id)
I am trying to lump all of them into a batch request. However, I realized that none of these gets created except for my campaign (step 1) if I use remote_create(batch=my_batch). This is probably because each of the subsequent steps depends on an id created by an earlier step.
I read the documentation, and it describes "Specifying dependencies between operations in the request" (https://developers.facebook.com/docs/graph-api/making-multiple-requests) via {result=(parent operation name):(JSONPath expression)}.
Is this possible with the python API?
Can this be achieved with the way I am using remote_creates?
Unfortunately python sdk doesn't currently support this. There is a github issue for it: https://github.com/facebook/facebook-python-ads-sdk/issues/256.
I have also encountered this issue and have described my workaround in the comments on the issue:
"I found a decent workaround for getting this behaviour without too much trouble. Basically I set the id fields that have dependencies with values like "{result=:$,id}" and prior to doing execute() on the batch object I iterate over ._batch and add as the 'name' entry. When I run execute sure enough it works perfectly. Obviously this solution does have it's limitations such where you are doing multiple calls to the same endpoint that need to be fed into other endpoints and you would have duplicated resource names and would need to customize the name further to string them together.
Anyways, hope this helps someone!"
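The dependency syntax from the Graph API documentation quoted in the question can be sketched as a raw batch payload, independent of the Python SDK. This only builds the JSON and does not send it. The operation names, the account id, and the helper functions are hypothetical, and the field names simply follow the dependency list in the question rather than a verified Marketing API schema; real requests need more fields.

```python
import json


def form_body(fields):
    """Join fields as key=value pairs; values are left unencoded for readability."""
    return "&".join("{}={}".format(key, value) for key, value in fields.items())


def build_batch(account_id):
    """Build a batch whose later operations reference the ids of earlier ones."""
    base = "act_{}".format(account_id)
    return [
        {"method": "POST", "name": "create-campaign",
         "relative_url": base + "/campaigns",
         "body": form_body({"name": "My campaign"})},
        {"method": "POST", "name": "create-adset",
         "relative_url": base + "/adsets",
         "body": form_body({"name": "My ad set",
                            "campaign_id": "{result=create-campaign:$.id}"})},
        {"method": "POST", "name": "create-creative",
         "relative_url": base + "/adcreatives",
         "body": form_body({"name": "My creative",
                            "adset_id": "{result=create-adset:$.id}"})},
        {"method": "POST", "name": "create-ad",
         "relative_url": base + "/ads",
         "body": form_body({"name": "My ad",
                            "adset_id": "{result=create-adset:$.id}",
                            "creative_id": "{result=create-creative:$.id}"})},
    ]


# The JSON string that would go into the request's batch parameter.
batch_json = json.dumps(build_batch("0000000000"))
```

Each operation carries a unique name, which is what the {result=...} placeholders in later operations refer to.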

How to tell whether a Python function with dependencies has changed?

tl;dr:
How can I cache the results of a Python function to disk and in a later session use the cached value if and only if the function code and all of its dependencies are unchanged since I last ran it?
In other words, I want to make a Python caching system that automatically watches out for changed code.
Background
I am trying to build a tool for automatic memoization of computational results from Python. I want the memoization to persist between Python sessions (i.e. be reusable at a later time in another Python instance, preferably even on another machine with the same Python version).
Assume I have a Python module mymodule with some function mymodule.func(). Let's say I already solved the problem of serializing/identifying the function arguments, so we can assume that mymodule.func() takes no arguments if it simplifies anything.
Also assume that I guarantee that the function mymodule.func() and all its dependencies are deterministic, so mymodule.func() == mymodule.func().
The task
I want to run the function mymodule.func() today and save its results (and any other information necessary to solve this task). When I want the same result at a later time, I would like to load the cached result instead of running mymodule.func() again, but only if the code in mymodule.func() and its dependencies are unchanged.
To simplify things, we can assume that the function is always run in a freshly started Python interpreter with a minimal script like this:
import some_save_function
import mymodule
result = mymodule.func()
some_save_function(result, 'filename')
Also, note that I don't want to be overly conservative. It is probably not too hard to use the modulefinder module to find all modules involved when running the first time, and then not use the cache if any module has changed at all. But this defeats my purpose, because in my use case it is very likely that some unrelated function in an imported module has changed.
Previous work and tools I have looked at
joblib memoizes results tied to the function name, and also saves the source code so we can check if it is unchanged. However, as far as I understand it does not check upstream functions (called by mymodule.func()).
The ast module gives me the Abstract Syntax Tree of any Python code, so I guess I can (in principle) figure it all out that way. How hard would this be? I am not very familiar with the AST.
Can I use any of all the black magic that's going on inside dill?
More trivia than a solution: IncPy, a finished/deceased research project, implemented a Python interpreter doing this by default, always. Nice idea, but never made it outside the lab.
Grateful for any input!
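One possible starting point, sketched under strong assumptions and not taken from this thread: fingerprint a function by hashing its bytecode together with the bytecode of every plain Python function it references through a global name. The names code_fingerprint and _walk_code are hypothetical. The sketch does not follow methods, attribute access such as othermodule.func, C extensions, or data files, and bytecode is specific to the Python version.

```python
import hashlib
import types


def _walk_code(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from _walk_code(const)


def code_fingerprint(func):
    """Hash the bytecode of func and of the functions it references by global name."""
    digest = hashlib.sha256()
    seen = set()
    pending = [func]
    while pending:
        current = pending.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        for code in _walk_code(current.__code__):
            digest.update(code.co_code)
            digest.update(repr(code.co_names).encode())
            # Nested code objects are hashed via _walk_code; their repr is
            # skipped because it contains a memory address.
            plain = [c for c in code.co_consts if not isinstance(c, types.CodeType)]
            digest.update(repr(plain).encode())
            # Follow global names that currently resolve to Python functions.
            for name in code.co_names:
                target = current.__globals__.get(name)
                if isinstance(target, types.FunctionType):
                    pending.append(target)
    return digest.hexdigest()
```

A cache could store this fingerprint next to the saved result and recompute only when it differs; changes to unrelated functions in the same module leave it untouched.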

MongoDB: how to get db.stats() from API

I'm trying to get results of db.stats() mongo shell command in my python code (for monitoring purposes).
But unlike, for example, serverStatus, I can't do db.command('stats'). I was not able to find an API equivalent in the MongoDB docs. I've also tried variations with db.$cmd, but none of them worked.
So,
Small question: how can I get results of db.stats() (number of connections/objects, size of data & indexes, etc) in my python code?
Bigger question: can anyone explain why some shell commands are easily accessible from the API while others are not? It's very annoying: some admin-related tools are accessible via db.$cmd.sys, some via db.command, some via ...? Is there a standard or an explanation for this situation?
PS: mongodb 2.0.2, pymongo 2.1.0, python 2.7
The JavaScript shell's stats helper actually invokes a command named dbstats, which you can run from PyMongo using the Database.command method. The easiest way to find out which command a shell helper runs is to invoke the helper without parentheses; this prints the JavaScript code it runs:
> db.stats
function (scale) {
    return this.runCommand({dbstats:1, scale:scale});
}
As for why some commands have helpers and others do not, it's largely a question of preference, time, and perceived frequency of use by the driver authors. You can run any command by name with Database.command, which is just a convenience wrapper around db.$cmd.find_one. You can find a full list of commands at List of Database Commands. You can also submit a patch against PyMongo to add a helper method for commands you find that you need to invoke frequently but aren't supported by PyMongo yet.
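From Python, the call described above might look like the following sketch. The wrapper name get_db_stats and the database name in the usage comment are hypothetical; the scale keyword mirrors the scale argument in the shell helper shown earlier.

```python
def get_db_stats(db, scale=1):
    """Run the dbstats command on a PyMongo Database object and return the reply."""
    # Extra keyword arguments to Database.command are added to the command document.
    return db.command("dbstats", scale=scale)


# Usage, assuming pymongo is installed and a server is reachable:
#
#     from pymongo import MongoClient   # PyMongo 2.1 used Connection instead
#     db = MongoClient()["mydatabase"]
#     stats = get_db_stats(db, scale=1024)
#     print(stats["objects"], stats["dataSize"], stats["indexSize"])
```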
