Pandas Series[Unknown] error when using Pylance - python

Using VS Code with Pylance, creating a basic pandas Series as shown below produces an error. I looked around online and couldn't find this question asked anywhere, so I'm assuming I have some basic setup done incorrectly.
Using conda, Python 3.8.3, pandas 1.4.4
import pandas as pd
test_series = pd.Series([1, 3, 5, 6, 7, 8])
(variable) test_series: Series[Unknown]
Type of "test_series" is partially unknown
Type of "test_series" is "Series[Unknown]" Pylance(reportUnknownVariableType)

Related

Avoid displaying table sorting arrows under Jupyter Notebook

I need to display a single-row dataframe: 
import pandas as pd
from IPython.display import display, HTML
stack = [42, 27, 13]
df = pd.DataFrame([stack], columns=[1, 2, 3], index=["stack"])
display(HTML(df.to_html())) # or simply: display(df)
Is it possible to get rid of the sorting arrows, which are useless here?
Based on our discussion in the comments: you had a Jupyter extension called table_beautifier installed on your system.
Disabling it should give the expected result (no sorting arrows).

How to ignore SettingWithCopyWarning using warnings.simplefilter()?

The question:
Can I ignore the SettingWithCopyWarning, or prevent it from being printed to the console, using warnings.simplefilter()?
The details:
I'm running a few data cleaning routines using pandas, executed in the simplest of ways from a batch file. One of the lines in my Python script triggers the SettingWithCopyWarning, which is printed to the console and also echoed in the command prompt.
Aside from sorting out the source of the warning, is there any way to prevent the message from being printed to the prompt, as I can for FutureWarning with warnings.simplefilter(action = "ignore", category = FutureWarning)?
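Though I would strongly advise fixing the issue, it is possible to suppress the warning by importing its class from pandas.core.common, which is where I found it in the source on GitHub. (In pandas 1.5 and later the class is exposed as pandas.errors.SettingWithCopyWarning instead.)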
Example:
import warnings
import pandas as pd
from pandas.core.common import SettingWithCopyWarning
warnings.simplefilter(action="ignore", category=SettingWithCopyWarning)
df = pd.DataFrame(dict(A=[1, 2, 3], B=[2, 3, 4]))
df[df['A'] > 2]['B'] = 5 # No warnings for the chained assignment!
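For reference, the warning in the example above comes from the chained indexing itself, so it can be avoided rather than silenced. A minimal sketch, assuming the intent is to set B to 5 wherever A is greater than 2, using the same toy frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# A single .loc call selects the rows and the column in one step,
# so the assignment targets df itself rather than an intermediate object.
df.loc[df["A"] > 2, "B"] = 5

print(df)
```

Unlike the chained form, this version actually changes df, which is usually what the warning is hinting at.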
You can use:
pd.set_option('mode.chained_assignment', None)
# This code will not complain!
pd.reset_option("mode.chained_assignment")
Or if you prefer to use it inside a context:
with pd.option_context('mode.chained_assignment', None):
    # This code will not complain!
    ...
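The standard library offers similar scoping for warnings.simplefilter() through warnings.catch_warnings(). A sketch using only the standard library; CopyLikeWarning and noisy_step are hypothetical stand-ins for the pandas warning class and the cleaning code, not pandas names, and record=True is used only so the result can be inspected:

```python
import warnings


class CopyLikeWarning(Warning):
    """Hypothetical stand-in for SettingWithCopyWarning."""


def noisy_step():
    # Hypothetical stand-in for a cleaning routine that emits the warning.
    warnings.warn("value set on a copy", CopyLikeWarning)
    return "done"


# Inside this block the category is ignored; the previous filters are
# restored automatically when the block exits.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter(action="ignore", category=CopyLikeWarning)
    result = noisy_step()

# In a fresh block without the "ignore" filter the warning is reported again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_step()

print(result, len(silenced), len(caught))
```

In a real script the category would be the pandas warning class, and the body of the first block would be the line that triggers it.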

vs code Python extension dataframe not shown in output

I just started using Jupyter cells in Visual Studio Code through the Python extension. It outputs plots fine, but my dataframe is not showing up like in the blog example from Microsoft. Below is the code I am running in VS Code:
#%%
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
x = np.linspace(0, 20, 100)
plt.plot(x, np.sin(x))
plt.show()
#%%
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df
My output looks like this:
[Screenshot: VS Code cell outputs]
I am really excited to use jupyter in VS Code but I need to view the dataframes like in other variable explorers.
I am on windows using Anaconda as my environment.
jupyter=1.0.0=py36_7
jupyter_client=5.2.3=py36_0
jupyter_console=6.0.0=py36_0
jupyter_core=4.4.0=py36_0
numpy=1.15.4=py36h19fb1c0_0
pandas=0.23.4=py36h830ac7b_0
I uninstalled my Anaconda 3.6 and installed the newer Anaconda 3.7 and now it works in VS Code.
That error means we don't have the capability to render the df output for some reason.
The only thing I can think of is you might have a jupyter extension that's modifying the result of a df. (normally it returns an html table to us)
Do you know what jupyter extensions you have installed?
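One way to check what pandas itself returns, independent of the front end, is to look at the DataFrame's HTML representation directly. A small diagnostic sketch, assuming default pandas display options; _repr_html_ is the method IPython's rich display machinery calls to obtain the HTML table:

```python
import pandas as pd

df = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})

# Ask pandas for the rich HTML representation of the frame.
html = df._repr_html_()

# With default options this should be a string containing a <table> element;
# if something else comes back, an extension or option is likely altering it.
print(type(html).__name__)
print("<table" in html)
```

If this prints a string containing a table, the frame itself is fine and the problem is more likely on the rendering side.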

R and Python in one Jupyter notebook

Is it possible to run R and Python code in the same Jupyter notebook? What alternatives are available?
Install r-essentials and create R notebooks in Jupyter.
Install rpy2 and use rmagic functions.
Use a beaker notebook.
Which of the three options above is reliable for running Python and R code snippets (sharing variables and visualizations), or is there a better option already?
Yes, it is possible! Use rpy2.
You can install rpy2 with: pip install rpy2
Then run %load_ext rpy2.ipython in one of your cells. (You only have to run this once.)
Now you can do the following:
Python cell:
# enables the %%R magic, not necessary if you've already done this
%load_ext rpy2.ipython
import pandas as pd
df = pd.DataFrame({
    'cups_of_coffee': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'productivity': [2, 5, 6, 8, 9, 8, 0, 1, 0, -1]
})
R cell:
%%R -i df -w 5 -h 5 --units in -r 200
# import df from global environment
# make default figure size 5 by 5 inches with 200 dpi resolution
install.packages("ggplot2", repos='http://cran.us.r-project.org', quiet=TRUE)
library(ggplot2)
ggplot(df, aes(x=cups_of_coffee, y=productivity)) + geom_line()
And you'll get your pretty figure plotting data from a Python pandas DataFrame.
You also have access to R objects (e.g. data frames) from Python cells:
import rpy2.robjects as robjects
robjects.globalenv['some-variable-name']
To view the names of all available variables use:
list(robjects.globalenv.keys())
Details are explained here: Pandas - how to convert r dataframe back to pandas?
Using @uut's answer for running R in a Jupyter notebook with a Python kernel (on macOS), the following worked for me.
%%R should always be at the start of the cell, otherwise you will get an error.
Also, %load_ext rpy2.ipython must run before %%R, so put it in a separate cell above it.
UPDATE April 2018,
RStudio has also put out a package:
https://blog.rstudio.com/2018/03/26/reticulate-r-interface-to-python/
which makes it possible to run code chunks in different languages in an R Markdown notebook, which is similar to a Jupyter notebook.
In my previous post, I said that the underlying representation of objects is different. Actually here is a more nuanced discussion of the underlying matrix representation of R and python from the same package:
https://rstudio.github.io/reticulate/articles/arrays.html
Old post:
It will be hard for you to use both R and Python syntax in the same notebook, mostly because the underlying representation of objects in the two languages is different. That said, there is a project that tries to allow conversion of objects between different languages in the same notebook:
http://beakernotebook.com/features
I haven't used it myself, but it looks promising.
SoS kernel is another option.
Don't know how well it performs yet, just started using it.
The SoS kernel allows you to run different languages within the same notebook, including Python and R.
SoS Polyglot Notebook - Instructions for Installing Desired Languages
Here is an example of a notebook with Python and R cells.
Update:
In terms of sharing variables, one can use the magics %use and %with.
"SoS automatically shares variables with names starting with sos among all subkernels."
Example:
Starting cell in R:
%use R
sos_var=read.csv('G:\\Somefile.csv')
dim(sos_var)
Output:
51 13
Switching to python:
%with Python3
sos_var.shape
Output:
(51, 13)

Unable to write my dataframe using feather (strided data not supported)

When using the feather package (http://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/) to try and write a simple 20x20 dataframe, I keep getting an error stating that strided data isn't yet supported. I don't believe my data is strided (or out of the ordinary), and I can replicate the sample code given on the website, but can't seem to get it to work with my own. Here is some sample code:
import feather
import numpy as np
import pandas as pd
tempArr = np.reshape(np.arange(400), (20, 20))
df = pd.DataFrame(tempArr)
feather.write_dataframe(df, 'test.feather')
The last line returns the following error:
FeatherError: Invalid: no support for strided data yet
I am running this on Ubuntu 14.04. Am I perhaps misunderstanding something about how pandas dataframes are stored?
Please come to GitHub: https://github.com/wesm/feather/issues/97
Bug reports do not belong on Stack Overflow.
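On the question of whether the data is strided: a sketch with NumPy alone, under the assumption that the error refers to columns that are non-contiguous views into a 2-D array. It does not call feather, and whether a contiguous copy avoids the error in a given feather version is not verified here.

```python
import numpy as np

arr = np.reshape(np.arange(400), (20, 20))  # row-major (C-ordered) by default

col = arr[:, 0]  # a view on the first column, no data is copied

# Consecutive elements of the column sit one full row apart in memory,
# so the view is strided rather than contiguous.
print(col.strides, arr.itemsize)
print(col.flags["C_CONTIGUOUS"])

# An explicit copy packs the same values into contiguous memory.
packed = np.ascontiguousarray(col)
print(packed.strides, packed.flags["C_CONTIGUOUS"])
```

So a 20x20 array that looks perfectly ordinary can still yield strided columns, simply because of how a row-major 2-D array is laid out.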
