Python and importing sub-modules - Pandas example - python

I was trying to use the pandas.tseries.holiday module within pandas, but for some reason it was not showing up. I tried the following:
import pandas as pd
pd.tseries.<TAB>
This does give me a list of options, but holiday was not among them. According to the documentation of holiday, it should be as simple as what I tried above.
This was on my system's Python. I tried it in Jupyter using Anaconda, then in Terminal and even in Emacs, but it was never found. So it must be a general design choice that I am unaware of. I have looked for clues, but all information I find tells me that importing a whole module or parts of it is a subjective choice - example: readability versus name-space pollution etc.
Eventually I just tried importing it manually (the next step would have been downloading the actual holiday file from the pandas git repository).
So I did:
from pandas.tseries import holiday # no error
holiday.<TAB>
... and I am shown all the stuff I need - great!
But what is going on here??
Looking at the actual code of holiday.py does not give me any hint as to why the file/module is not imported when I simply import pandas using the statements above.
Edit
Here is some additional information, showing how holiday is not found within pandas.tseries itself, but can be imported and used explicitly:
>>> import pandas as pd
>>> pd.tseries.holiday.USFederalHolidayCalendar()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'pandas.tseries' has no attribute 'holiday'
>>> from pandas.tseries import holiday
>>> holiday.USFederalHolidayCalendar()
<pandas.tseries.holiday.USFederalHolidayCalendar object at 0x7f3b18dc7fd0>

Using simply import pandas as pd does not automatically import all sub-modules of the pandas library (as pointed out by TomAugspurger in the comments above).
This is because the __init__.py of the pandas library does not import everything; in particular, it does not import the holiday sub-module. A sub-module only becomes an attribute of its parent package once something has imported it.
Either adapt the __init__.py file to do so, or be aware that one must explicitly import certain sub-modules of the pandas library!
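The same behaviour can be reproduced without pandas. The following is a minimal sketch that builds a throwaway package in a temporary directory; the names demo_pkg and VALUE are made up for illustration and have nothing to do with pandas itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package "demo_pkg" (hypothetical name) whose
# __init__.py is empty, i.e. it imports none of its sub-modules.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "holiday.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))
import demo_pkg

# The sub-module is not an attribute of the package yet.
before = hasattr(demo_pkg, "holiday")

# Importing the sub-module explicitly binds it on the parent package.
importlib.import_module("demo_pkg.holiday")
after = hasattr(demo_pkg, "holiday")

print(before, after)
```

After the explicit import, demo_pkg.holiday is reachable through the package, which mirrors what from pandas.tseries import holiday does in the question.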

Related

scikit-surprise: python cannot find module even though pip lists it as installed

I am trying to use the scikit-surprise module to build a recommender system, but I get an error when I run my code.
The error is ImportError: Cannot import name "Reader".
My code is as follows:
import pandas as pd
from surprise import Reader, Dataset
userReviewsFilePath ="UserReviewsFirst5000WithHeadings.csv"
ratings = pd.read_csv(userReviewsFilePath) # reading data in pandas df
ratings_dict = {'recipeID': list(ratings.recipeID),
'rating': list(ratings.rating),
'userID': list(ratings.userID)}
df = pd.DataFrame(ratings_dict)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['recipeID', 'rating', 'userID']], reader)
pip show says that version 1.0.6 is installed
I think your problem comes from the installation. I installed "surprise", pasted your code, and it worked:
import pandas as pd
from surprise import Reader, Dataset
print(Reader) # or just print(surprise) if you import surprise
out:
<class 'surprise.reader.Reader'>
Start by reinstalling surprise and tell us what happens.
If you have more than one version of Python, run:
which pip
to see whether you installed surprise for the version of Python you are actually using.
I think it's in surprise.reader: https://surprise.readthedocs.io/en/stable/reader.html
Your code should read:
from surprise.reader import Reader
from surprise.dataset import Dataset
Edit: I checked the instructions again which seem to contradict this, and give your original code as the correct example. https://surprise.readthedocs.io/en/stable/getting_started.html#getting-started
So maybe they add their own shortcuts? Either way, it seems like this isn't the correct solution, sorry. (Unless it works, in which case their instructions might be out of date.)
Edit 2: They do alias it, so "from surprise import Reader" should indeed have worked: https://github.com/NicolasHug/Surprise/blob/master/surprise/init.py#L19
I think you need to do
from surprise.reader import Reader
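To check which interpreter is running and which file Python would load for a given module name, a small diagnostic like the one below may help. It is only a sketch: locate is a hypothetical helper name, and the idea that a local file called surprise.py might be shadowing the installed package is one possible cause, not a confirmed diagnosis.

```python
import importlib.util
import sys

def locate(module_name):
    """Return the file Python would load for module_name, or None if it cannot be found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

# The interpreter that is actually running this code; compare it with
# the interpreter that pip installed the package into.
print(sys.executable)

# A standard-library module, for comparison.
print(locate("json"))

# If this prints a path inside your own project (for example a local
# surprise.py), that file is being picked up instead of the installed package.
print(locate("surprise"))
```

If the last line prints None, the package is not visible to this interpreter at all, which points to an installation into a different Python.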

String or Unicode type required for dfgui running wx with kivy python

I intended to write code that displays a table / DataFrame in a GUI (Kivy), and I found a solution here. Apparently it uses an unofficial package from a GitHub repo, dfgui.
The problem occurred when I ran the code as described in the StackOverflow link. It returned this error:
wx._core.PyAssertionError: C++ assertion "!items.IsEmpty()" failed at
/usr/include/wx-3.0/wx/ctrlsub.h(154) in InsertItems(): need something
to insert
I broke the problem down by running the code selectively, in the following way:
import dfgui
import pandas as pd
xls = pd.read_excel('Res.xls')
df = pd.DataFrame(xls)
dfgui.show(df)
# dfgui.show(xls)  # apparently the same as df
which then returned
TypeError: String or Unicode type required
and led me to this link, which I could not make much sense of.
Please point me in the right direction; a different solution would be great too.

'Tables' not recognizing 'isHDF5File'

I am writing code that creates an HDF5 file that can later be used for data analysis. I load the following packages:
import numpy as np
import tables
Then I use the tables module to determine if my file is an HDF5 file with:
tables.isHDF5File(FILENAME)
This would normally return either True or False depending on whether the file is actually an HDF5 file. However, I get the error:
AttributeError: module 'tables' has no attribute 'isHDF5File'
So I tried:
from tables import isHDF5File
and got the error:
ImportError: cannot import name 'isHDF5File'
I've tried this code on another computer, and it ran fine. I've tried updating both numpy and tables with pip but it states that the file is already up to date. Is there a reason 'tables' isn't recognizing 'isHDF5File' for me? I am running this code on a Mac (not working) but it worked on a PC (if this matters).
Do you have the function name right? In my installation of PyTables it is spelled is_hdf5_file:
In [21]: import tables
In [22]: tables.is_hdf5_file?
Docstring:
is_hdf5_file(filename)
Determine whether a file is in the HDF5 format.
When successful, it returns a true value if the file is an HDF5
file, false otherwise. If there were problems identifying the file,
an HDF5ExtError is raised.
Type: builtin_function_or_method
In [23]:
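If the same script has to run against installations that spell the function differently, one option is to look the name up at runtime. This is only a sketch: first_attr is a hypothetical helper (it is not part of PyTables), and the demonstration uses the standard-library math module so that it runs without PyTables installed.

```python
import math

def first_attr(module, *names):
    """Return the first attribute of `module` that exists among `names`."""
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise AttributeError(f"{module.__name__} has none of: {', '.join(names)}")

# Demonstration with a stdlib module: "squareRoot" does not exist, "sqrt" does.
sqrt = first_attr(math, "squareRoot", "sqrt")
print(sqrt(9.0))

# With PyTables the call would look like this (assumes `import tables` succeeded):
# is_hdf5 = first_attr(tables, "is_hdf5_file", "isHDF5File")
# is_hdf5(FILENAME)
```

Running dir(tables) and searching the output for "hdf5" is a quicker way to see which spelling a given installation provides.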

Python Jupyter Notebook - Unable to open CSV file through a path

I am a little new to Python, and I have been using the Jupyter Notebook through Anaconda. I am trying to import a csv file to make a DataFrame, but I am unable to import the file.
Here is an attempt using the local method:
df = pd.read_csv('Workbook1')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-11-a2deb4e316ab> in <module>()
----> 1 df = pd.read_csv('Workbook1')
After that I tried using the path (I put user for my username)
df = pd.read_csv('Users/user/Desktop/Workbook1.csv')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-13-3f2bedd6c4de> in <module>()
----> 1 df = pd.read_csv('Users/user/Desktop/Workbook1.csv')
I am using a Mac, which I am also new to, and I am not 100% sure if I am correctly importing the right path. Can anyone offer some insight or solutions that would allow me to open this csv file.
Instead of passing the full path, you can change the working directory first using the code below:
import os
import pandas as pd
os.chdir("D:/dataset")
data = pd.read_csv("workbook1.csv")
This works as long as the file really is in that directory.
Are you sure that the file exists in the location you are passing to the pandas read_csv method? You can check using Python's built-in os module:
import os
os.path.isfile('/Users/user/Desktop/Workbook1.csv')
Another way of checking if the file of interest is in the current working directory within a Jupyter notebook is by running ls -l within a cell:
ls -l
I think the problem is probably in the location of the file:
df1 = pd.read_csv('C:/Users/owner/Desktop/contacts.csv')
Having done that, you can inspect the data, even if the file is big, with:
df1.head()
The os module in Python provides functions for interacting with the operating system. It is part of Python's standard library.
import os
import pandas as pd
os.chdir(r"c:\Pandas")  # raw string, so the backslash is not treated as an escape
df=pd.read_csv("names.csv")
df
This might help. :)
The file name is case sensitive, so check your case.
I had the same problem on a Mac too, and for some reason it only happened to me there. I tried many tricks but nothing worked. I recommend you go directly to the file, right-click it, and then press the "alt" key; the option to copy the path will then appear. Just paste the path into your Jupyter notebook. For some reason that worked for me.
I believe the issue is that you're not using fully qualified paths. Try this:
Move the data into a suitable project directory. You can do this using the %%bash Magic commands.
%%bash
mkdir -p /project/data/
cp data.csv /project/data/data.csv
You can read the file
f = open("/project/data/data.csv","r")
print(f.read())
f.close()
But it might be most useful to load it into a library.
import pandas as pd
data = pd.read_csv("/project/data/data.csv")
I’ve created a runnable Jupyter notebook with more details here: Jupyter Basics: Reading Files.
Try double quotes instead of single quotes. It worked for me.
You can open csv files in a Jupyter notebook by following these easy steps:
Step 1 - create a directory or folder (you can also use old created folder)
Step 2 - Change your Jupyter working directory to that created directory -
import os
os.chdir('D:/datascience/csvfiles')
Step 3 - Now your directory is changed in Jupyter Notebook. Store your file(s) in that directory.
Step 4 - Open your file -
import pandas as pd
df = pd.read_csv("workbook1.csv")
Now your file is read and stored in the DataFrame variable df. You can display its contents as follows:
df.head() - display first five rows of this file
df - display all rows of this file
Happy Data Science!
I had a similar problem while reading a CSV file from my computer in a Jupyter notebook.
I solved it by substituting the "\" symbol with "/" in the path, like this.
This is what I had:
"C:\Users\RAJ\Desktop\HRPrediction\HRprediction.csv"
This is what I changed it to:
"C:/Users/RAJ/Desktop/HRPrediction/HRprediction.csv".
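The reason the first form fails is that, in a normal Python string literal, "\U" starts a Unicode escape, so the literal is rejected before pandas ever sees it. The sketch below shows two spellings that avoid the problem and checks that they name the same Windows path. It uses PureWindowsPath from the standard library only to compare the strings, so it runs on any operating system.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, no escape processing.
raw = r"C:\Users\RAJ\Desktop\HRPrediction\HRprediction.csv"

# Forward slashes need no escaping and are accepted by Windows as well.
fwd = "C:/Users/RAJ/Desktop/HRPrediction/HRprediction.csv"

# Both spellings describe the same Windows path.
same = PureWindowsPath(raw) == PureWindowsPath(fwd)
print(same)

# as_posix() converts the backslash form into the forward-slash form.
print(PureWindowsPath(raw).as_posix())
```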
This is what worked for me. I am using Mac OS.
Save your CSV in a separate folder on your desktop.
When opening Jupyter Notebook, click on the folder that your dataset is saved in, then press "New" in the upper right-hand corner to create a notebook there.
After opening the new notebook, code as usual and read your data using import pandas as pd and pd.read_csv with the name of your dataset.
No need to use anything extra; just put r in front of the path string.
df = pd.read_csv(r'C:/Users/owner/Desktop/contacts.csv')
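Several of the answers above come down to the same two checks: what is the current working directory, and does the file exist at the path being passed. A minimal sketch that combines both is shown below. The helper name resolve_csv is hypothetical, and the commented-out pandas call assumes a file at ~/Desktop/Workbook1.csv as in the question.

```python
from pathlib import Path

def resolve_csv(path):
    """Return an absolute Path for `path`, or raise a descriptive FileNotFoundError."""
    p = Path(path).expanduser()          # turns "~/Desktop/x.csv" into a full path
    if not p.is_absolute():
        p = Path.cwd() / p               # relative paths are relative to the working directory
    if not p.is_file():
        raise FileNotFoundError(f"{p} not found; working directory is {Path.cwd()}")
    return p

# Relative paths in read_csv are resolved against this directory.
print(Path.cwd())

# Usage with pandas (assumes pandas is installed and the file exists):
# import pandas as pd
# df = pd.read_csv(resolve_csv("~/Desktop/Workbook1.csv"))
```

Note that on a Mac an absolute path starts with a slash, so 'Users/user/Desktop/Workbook1.csv' is treated as relative to the working directory, while '/Users/user/Desktop/Workbook1.csv' is not.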

python - using functions from a different py script gives NameError

I have a py script, let's call it MergeData.py, where I merge two data files. Since I have a lot of pairs of data files that have to be merged, I thought it would be good for readability to put my code in MergeData.py into a function, say merge_data(), and call this function in a loop over all my pairs of data files in a different py script.
2 Questions:
Is it wise, in terms of speed, to call the function from a different file instead of running the code directly in the loop? (I have thousands of pairs that have to be merged.)
I thought that, to use the function in MergeData.py, I have to include from MergeData import merge_data at the head of my script. Within the function merge_data I make use of pandas, which I import in the main file by 'import pandas as pd'. When calling the function I get the error 'NameError: global name 'pd' is not defined'. I have tried all possible places to import the pandas module, even within the function, but the error keeps popping up. What am I doing wrong?
In MergeData.py I have
def merge_data(myFile1, myFile2):
    df1 = pd.read_csv(myFile1)
    df2 = pd.read_csv(myFile2)
    # ... my code
and in the other file I have
import pandas as pd
from MergeData import merge_data
# then some code to get my file names followed by
FileList = zip(FileList1, FileList2)
for myFile1, myFile2 in FileList:
    # Run Merging Algorithm
    dataEq = merge_data(myFile1, myFile2)
I am aware of What is the best way to call a Python script from another Python script?, but cannot really see if that relates to me.
You need to move the line
import pandas as pd
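into the module in which the symbol pd is actually needed, i.e. move it out of your "other file" and into your MergeData.py file. Each module has its own global namespace, so an import in the calling script is not visible inside MergeData.py.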
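The effect can be reproduced without pandas. The sketch below writes two throwaway helper modules to a temporary directory; the names broken_helper, fixed_helper and dump are made up for illustration, and json stands in for pandas so that the example runs with the standard library alone.

```python
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
sys.path.insert(0, str(root))

# Hypothetical helper module that uses `json` without importing it,
# just like MergeData.py uses `pd` without importing pandas.
(root / "broken_helper.py").write_text(
    "def dump(obj):\n"
    "    return json.dumps(obj)\n"
)

# The same helper with the import placed in the module that needs it.
(root / "fixed_helper.py").write_text(
    "import json\n"
    "\n"
    "def dump(obj):\n"
    "    return json.dumps(obj)\n"
)

import json  # importing here does NOT make `json` visible inside broken_helper

import broken_helper
import fixed_helper

try:
    broken_helper.dump({"a": 1})
    broken_raised = False
except NameError:
    broken_raised = True

print(broken_raised)
print(fixed_helper.dump({"a": 1}))
```

A function looks up global names in the module where it was defined, not in the module that calls it, which is why the import has to live in MergeData.py. If the calling script also uses pd directly, it needs its own import pandas as pd as well; importing the same module in two files costs almost nothing because Python caches it.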
