In VS Code, it seems that IntelliSense is not able to infer the return type of calls to pandas.DataFrame.pipe. This is a source of some inconvenience, as I cannot rely on autocompletion after using pipe. But I haven't seen this issue mentioned anywhere, which makes me wonder whether it's just me or I am missing something.
This is what I do:
import pandas as pd
df = pd.DataFrame({'A': [1,2,3]})
df2 = df.pipe(lambda x: x + 1)
VS Code recognizes df as a DataFrame, but has no clue what df2 might be.
A first thought would be that this is due to the lack of type hinting in the lambda function. But if I try this instead:
def add_one(df: pd.DataFrame) -> pd.DataFrame:
    return df + 1
df3 = df.pipe(add_one)
IntelliSense still can't guess the type of df3.
Of course as a last recourse I can add a hint to df3 itself:
df3: pd.DataFrame = df.pipe(add_one)
But it seems like that shouldn't be necessary. IntelliSense is perfectly capable of inferring return types in other complex scenarios, such as those involving map.
UPDATE:
I experimented a bit more and found some interesting patterns which narrow down the range of possible causes.
I am not sufficiently familiar with Pylance to really understand why this is happening, but here is what I find:
Finding 1
It is happening to pandas.core.common.pipe if I import it directly. (I know pd.DataFrame.pipe calls pandas.core.generic.pipe, but that internally calls pandas.core.common.pipe, and I can reproduce the issue with pandas.core.common.pipe.)
Finding 2
If I copy the definition of that same function from pandas.core.common, together with the relevant imports of Callable and TypeVar, and declare T as TypeVar('T'), IntelliSense actually does its magic.
(Actually in pandas.core.common, T is not defined as TypeVar('T') but imported from pandas._typing, where it is defined as TypeVar('T'). If I import it instead of defining it myself, it still works fine.)
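To make Finding 2 concrete, here is a minimal sketch of the generic signature involved. This is a simplified stand-in for pandas.core.common.pipe, not the real implementation (the real one also accepts a (callable, data_keyword) tuple), but it shows the mechanism: the TypeVar ties pipe's return type to func's return type, which is what lets a checker infer the result.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")

# Simplified sketch of pandas.core.common.pipe: because the return
# annotation is T, a type checker infers pipe(...)'s type from the
# return type of the callable it receives.
def pipe(obj: Any, func: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    return func(obj, *args, **kwargs)

result = pipe(1, lambda x: x + 1)  # inferred (and evaluates) as int
```

With this definition in a local file, Pylance infers `result` as the lambda's return type, matching Finding 2.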
From this I am tempted to conclude that pandas does everything right, but that Pylance is failing to keep track of type information for some unknown reason...
Finding 3
If I just copy pandas.core.common into a local file pandascommon.py and import pipe from that, it works fine too!
I also reproduced this in VS Code and confirmed the problem exists. I think it may be related to the return-type definition of the pipe() method. I have submitted the issue on GitHub and hope to learn more.
I got it!
It was due to the stubs shipped with Pylance. Specifically in ~/.vscode/extensions/ms-python.vscode-pylance-2022.3.2/dist/bundled/stubs/pandas/.
For example in core/common.pyi I found this stub:
def pipe(obj, func, *args, **kwargs): ...
Pylance uses this instead of the annotations in pandas.core.common.pipe, causing the issue.
One heavy-handed solution is to just erase (or rename) the pandas stubs in that folder. Then pipe works again. On the other hand, it breaks some other things; for example, read_csv is no longer correctly inferred to return a DataFrame. I think the better long-run solution would be for the Pylance maintainers to improve those stubs...
A minimally invasive solution to the original pipe issue is to edit ~/.vscode/extensions/ms-python.vscode-pylance-2022.3.2/dist/bundled/stubs/pandas/core/frame.pyi in the following manner:
add from pandas._typing import T
replace the line starting with def pipe by:
def pipe(self, func: Callable[..., T], *args, **kwargs) -> T: ...
Related
Due to some architectural reasons out of my control, an object I use frequently and would like full code completion for is a dynamic composite of several features on top of the static features already present in the source code.
import lgb.reqs.plan
# Various imports which dynamically extend the smallform
import lgb_extensions.water_extras
import lgb_extensions.toolkit_extras
d = c.req[0] # type: lgb.reqs.plan.smallform
d = d # type: lgb_extensions.water_extras.common
d = d # type: lgb_extensions.toolkit_extras.common
# Now I get the autocomplete on d as I type "d."
d.
I've found the re-assign d method to work great, but it feels wrong. Is there no way to type hint with a tuple or something? I tried and couldn't figure it out.
I've found Jupyter notebooks to be great for autocompletion, and I'll jump into either IPython or a notebook session if I really need to explore an unknown codebase. But in this case, I'm pretty familiar with the codebase and just would like the autocompletions to be better, as I can never remember quite what things are called. I'm mostly in PyCharm or Atom, if that matters. The solution above already solves my problem if there are only a few extensions, but it doesn't work when I have 10 things extending the object. In my usual case, I have about 20 things extending the object I'd like autocomplete on.
You might be able to use Union here. It's more for when a name can hold different types in different circumstances.
Eg.
from typing import Union
a = f() # type: Union[str, int]
a. # now get autocompletion for str and int from IDEs
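For completeness, a runnable sketch of that idea (f here is a hypothetical function; the # type: comment is the pre-PEP-526 form that older IDEs read):

```python
from typing import Union

def f() -> Union[str, int]:
    # Hypothetical function that may return either type
    return "hello"

a = f()  # type: Union[str, int]
# Typing "a." should now offer completions drawn from both str and int.
```

Note that with Union, most IDEs only offer the methods common to all member types (or a merged list), so it approximates but does not fully solve the "20 extensions" case.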
What I have is a class that inherits from DataFrame but overrides some behavior for business-logic reasons. All is well and good, but I need the ability to import and export these objects. msgpack appears to be a good choice, but doesn't actually work. (Using the standard msgpack library doesn't even work on regular DataFrames, and the advice there is to use the DataFrame msgpack functions.)
import pandas

class DataFrameWrap(pandas.DataFrame):
    pass
df = DataFrameWrap()
packed_df = df.to_msgpack()
pandas.read_msgpack(packed_df)
This results in the error
File "C:\Users\REDACTED\PROJECT_NAME\lib\site-packages\pandas\io\packers.py", line 627, in decode
return globals()[obj[u'klass']](BlockManager(blocks, axes))
KeyError: u'DataFrameWrap'
when it reaches the read_msgpack() line. This works if I replace the DataFrameWrap() with a regular DataFrame().
Is there a way to tell pandas where to find the DataFrameWrap class? From reading the code, it looks like if I could inject {"DataFrameWrap": DataFrameWrap} into the globals as seen from this file, it would work, but I'm not sure how to actually do that. There also might be a proper way to do this, but it's not obvious.
Figured it out. As usual, it was much less complicated than I assumed:
import pandas
from pandas.io import packers

class DataFrameWrap(pandas.DataFrame):
    pass

# Inject the class into the packers module namespace so decode() can find it
packers.DataFrameWrap = DataFrameWrap
df = DataFrameWrap()
packed_df = df.to_msgpack()
pandas.read_msgpack(packed_df)
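The same namespace-injection trick can be shown with a self-contained toy that mimics the globals()[obj[u'klass']] lookup in pandas.io.packers.decode. The module and decoder below are stand-ins for illustration, not the real pandas API (to_msgpack/read_msgpack were removed from later pandas versions anyway):

```python
import types

# Stand-in for pandas.io.packers: a decoder that resolves a class by
# name from its own module namespace, like globals()[obj['klass']].
fake_packers = types.ModuleType("fake_packers")

def decode(obj, module):
    klass = vars(module)[obj["klass"]]  # raises KeyError if not injected
    return klass(obj["data"])

class Wrapped(list):
    pass

# Without this line, decode() fails with KeyError: 'Wrapped'.
setattr(fake_packers, "Wrapped", Wrapped)

result = decode({"klass": "Wrapped", "data": [1, 2]}, fake_packers)
```

The point is that the decoder only sees names living in its own module's namespace, so injecting the subclass there is enough to make the round trip work.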
class Class:
    def __init__(self, path):
        self._path = path
        string = open(self._path, 'r'). #HERE
When I try to type read(), IntelliSense says there are no completions.
However, I know the open() function returns a file object, which has a read() method. I want to see all supported methods after typing a dot.
PyCharm shows me a recommended function list, but PTVS does not.
I want to know whether this is normal in PTVS or only happening to me.
My current Python environment is Anaconda 4.3.0 (Python 3.5.3).
How can I fix it?
We've already fixed the specific case of open for our upcoming update (not the one that released today - the next one), but in short the problem is that you don't really know what open is going to return. In our fix, we guess one of two likely types, which should cover most use cases.
To work around it right now, your best option is to assign the result of open to a variable and force it to a certain type using an assert statement. For example:
import io

# Text mode:
f = open(self._path, 'r')
assert isinstance(f, io.TextIOWrapper)

# Binary mode:
f = open(self._path, 'rb')
assert isinstance(f, io.BufferedIOBase)
Note that your code will now fail if the variable is not the expected type, and that the code for Python 2 would be different from this, but until you can get the update where we embed this knowledge into our code it is the best you can do.
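A self-contained version of that workaround, using a temporary file so it runs as-is:

```python
import io
import tempfile

# Create a throwaway file to open (illustration only)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
    path = tmp.name

f = open(path, "r")
# The assert both guards at runtime and tells the analyzer the type,
# so completions on `f.` now include read(), readline(), etc.
assert isinstance(f, io.TextIOWrapper)
content = f.read()
f.close()
```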
I have a helper function that converts a %Y-%m-%d %H:%M:%S-formatted string to a datetime.datetime:
import datetime

def ymdt_to_datetime(ymdt: str) -> datetime.datetime:
    return datetime.datetime.strptime(ymdt, '%Y-%m-%d %H:%M:%S')
I can validate the ymdt format in the function itself, but it'd be more useful to have a custom object to use as a type hint for the argument, something like
from typing import NewType, Pattern
ymdt_pattern = '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]'
YmdString = NewType('YmdString', Pattern[ymdt_pattern])
def ymdt_to_datetime(ymdt: YmdString)...
Am I going down the wrong rabbit hole? Should this be an issue in mypy or someplace? Or can this be accomplished with the current type hint implementation (3.6.1)?
There currently is no way for types to statically verify that your string matches a precise format, unfortunately. This is partially because checking at compile time the exact values a given variable can hold is exceedingly difficult to implement (and in fact, is NP-hard in some cases), and partially because the problem becomes impossible in the face of things like user input. As a result, it's unlikely that this feature will be added to either mypy or the Python typing ecosystem in the near future, if at all.
One potential workaround would be to leverage NewType, and carefully control when exactly you construct a string of that format. That is, you could do:
from typing import NewType
YmdString = NewType('YmdString', str)
def datetime_to_ymd(d: datetime.datetime) -> YmdString:
    # Do conversion here, producing the string s
    return YmdString(s)

def verify_is_ymd(s: str) -> YmdString:
    # Runtime validation checks here
    return YmdString(s)
If you use only functions like these to introduce values of type YmdString and do testing to confirm that your 'constructor functions' are working perfectly, you can more or less safely distinguish between strings and YmdString at compile time. You'd then want to design your program to minimize how frequently you call these functions to avoid incurring unnecessary overhead, but hopefully, that won't be too onerous to do.
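Filling in those placeholders, one runnable version of the pattern might look like this (the strptime call doubles as the runtime validation; this is one possible implementation, not the only one):

```python
import datetime
from typing import NewType

YmdString = NewType("YmdString", str)

def verify_is_ymd(s: str) -> YmdString:
    # Raises ValueError if s is not in %Y-%m-%d %H:%M:%S format
    datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    return YmdString(s)

def ymdt_to_datetime(ymdt: YmdString) -> datetime.datetime:
    return datetime.datetime.strptime(ymdt, "%Y-%m-%d %H:%M:%S")

dt = ymdt_to_datetime(verify_is_ymd("2022-12-23 09:09:23"))
```

At type-check time, passing a plain str to ymdt_to_datetime is flagged; at runtime, verify_is_ymd is the single gate through which YmdString values are created.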
Using type hints does nothing at runtime in Python; they act only as an indication of the type for static checkers. They are not meant to perform any actions, merely to annotate a type.
You can't do any validation. All you can do, with type hints and a checker, is make sure the argument passed in is actually of type str.
Okay, here we are five years later and the answer is now yes, at least if you're willing to take a third-party library on board and decorate the functions you want to be checked at runtime:
$ pip install beartype
import re
from typing import Annotated # python 3.9+
from beartype import beartype
from beartype.vale import Is
YtdString = Annotated[str, Is[lambda string: re.match('[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]', string) is not None]]
@beartype
def just_print_it(ytd_string: YtdString) -> None:
    print(ytd_string)
> just_print_it("hey")
BeartypeCallHintParamViolation: @beartyped just_print_it() parameter ytd_string='hey' violates type hint typing.Annotated[str, Is[<lambda>]], as 'hey' violates validator Is[<lambda>]:
False == Is[<lambda>].
> just_print_it("2022-12-23 09:09:23")
2022-12-23 09:09:23
> just_print_it("2022-12-23 09:09:2")
BeartypeCallHintParamViolation: @beartyped just_print_it() parameter ytd_string='2022-12-23 09:09:2' violates type hint typing.Annotated[str, Is[<lambda>]], as '2022-12-23 09:09:2' violates validator Is[<lambda>]:
False == Is[<lambda>].
Please note that I'm using the very imperfect regex pattern I originally included in the question, not production-ready.
Then, a hopeful note: The maintainer of beartype is hard at work on an automagical import hook which will eliminate the need for decorating functions in order to achieve the above.
When I write a function in Python (v2.7), I very often have a type in mind for one of the arguments. I'm working with the unbelievably brilliant pandas library at the moment, so my arguments are often 'intended' to be pandas.DataFrames.
In my favorite IDE (Spyder), when you type a period . a list of methods appear. Also, when you type the opening parenthesis of a method, the docstring appears in a little window.
But for these things to work, the IDE has to know what type a variable is. And of course, it never does. Am I missing something obvious about how to write Pythonic code? (I've read Python Is Not Java, but it doesn't mention this IDE autocomplete issue.)
Any thoughts?
I don't know if it works in Spyder, but many completion engines (e.g. Jedi) also support assertions to tell them what type a variable is. For example:
def foo(param):
    assert isinstance(param, str)
    # now param will be considered a str
    param.| capitalize
            center
            count
            decode
            ...
Actually, I use IntelliJ IDEA (aka PyCharm), and it offers multiple ways to specify variable types:
1. Specify Simple Variable
Very simple: just add a comment with the type information after the assignment. From then on, PyCharm supports autocompletion! E.g.:
def route():
    json = request.get_json()  # type: dict
Source: https://www.jetbrains.com/help/pycharm/type-hinting-in-pycharm.html
2. Specify Parameter:
Add three quote signs after the beginning of a method, and the IDE will autocomplete a docstring skeleton in which you can specify parameter types, as in the following example:
Source: https://www.jetbrains.com/help/pycharm/using-docstrings-to-specify-types.html
(Currently on my mobile, going to make it pretty later)
If you're using Python 3, you can use function annotations. As an example:
@typechecked  # e.g. from the third-party typeguard library; plain annotations also work
def greet(name: str, age: int) -> str:
    return "Hello {0}, you are {1} years old".format(name, age)
I don't use Spyder, but I would assume there's a way for it to read the annotations and act appropriately.
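Annotations are plain metadata attached to the function object, which is exactly what editors read. A quick sketch of what that metadata looks like:

```python
def greet(name: str, age: int) -> str:
    return "Hello {0}, you are {1} years old".format(name, age)

# IDEs and type checkers consume the same data you can inspect here:
hints = greet.__annotations__
# hints maps each parameter name (and 'return') to its annotated type
```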
I don't know whether Spyder reads docstrings, but PyDev does:
http://pydev.org/manual_adv_type_hints.html
So you can document the expected type in the docstring, e.g. as in:
def test(arg):
    ':type arg: str'
    arg.<hit tab>
And you'll get the corresponding string tab completion.
Similarly you can document the return-type of your functions, so that you can get tab-completion on foo for foo = someFunction().
At the same time, docstrings make auto-generated documention much more helpful.
The problem is with the dynamic features of Python. I use Spyder and have used many more Python IDEs (PyCharm, IDLE, WingIDE, PyDev, ...), and ALL of them have the problem you state here. So when I want code completion, I just instantiate the variable to the type I want and then type ".". For example, suppose you know your variable df will be a DataFrame in some piece of code: you can write df = DataFrame(), and from then on code completion should work for you. Just do not forget to delete (or comment out) the line df = DataFrame() when you finish editing the code.