I have uploaded a simple Python package to https://test.pypi.org. When I download it with pip and try to run it, I get FileNotFoundError: [Errno 2] File b'data/spam_collection.csv' does not exist: b'data/spam_collection.csv'. Earlier I had issues uploading the csv file when packaging; see my question "Could not upload csv file to test.pypi.org". Now, after installing the package with pip, I run pip show -f bigramspamclassifier and the csv file is listed, so I believe it has been uploaded. I think the issue is with how the file is read in my package's Python code. What should the path to the csv file be in SpamClassifier.py?
pip show -f bigramspamclassifier
Version: 0.0.3
Summary: A bigram approach for classifying Spam and Ham messages
Home-page: ######
Author: #####
Author-email: #######
Location: /home/kabilesh/PycharmProjects/TestPypl3/venv/lib/python3.6/site-packages
Requires: nltk, pandas
Required-by:
Files:
bigramspamclassifier-0.0.3.dist-info/INSTALLER
bigramspamclassifier-0.0.3.dist-info/LICENSE
bigramspamclassifier-0.0.3.dist-info/METADATA
bigramspamclassifier-0.0.3.dist-info/RECORD
bigramspamclassifier-0.0.3.dist-info/WHEEL
bigramspamclassifier-0.0.3.dist-info/top_level.txt
bigramspamclassifier/SpamClassifier.py
bigramspamclassifier/__init__.py
bigramspamclassifier/__pycache__/SpamClassifier.cpython-36.pyc
bigramspamclassifier/__pycache__/__init__.cpython-36.pyc
bigramspamclassifier/data/spam_collection.csv
My project file structure
Path to the csv in the SpamClassifier.py file (this is what I want to know):
def classify(self):
    fullCorpus = pd.read_csv("data/spam_collection.csv", sep="\t", header=None)
    fullCorpus.columns = ["lable", "body_text"]
Your script attempts to load spam_collection.csv from a relative path. Relative paths are resolved against the current working directory of the process (usually the directory you launched Python from), not against the directory containing the source file.
This means it works when you run your module from the bigramspamclassifier directory. However, once your module is pip-installed, the file will no longer be relative to where you run your code from (it will be buried somewhere in your installed libraries, such as site-packages).
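A quick way to confirm this: os.path.abspath() joins a relative path onto os.getcwd(), so the same string points at a different file as soon as the working directory changes. This is a minimal sketch; the two temporary directories are stand-ins for "wherever you happen to launch Python from", and resolve is a hypothetical helper name.

```python
import os
import tempfile


def resolve(relative_path):
    """Return the absolute path that open()/read_csv() would actually use."""
    return os.path.abspath(relative_path)  # joined onto os.getcwd()


if __name__ == "__main__":
    rel = os.path.join("data", "spam_collection.csv")
    original_cwd = os.getcwd()
    try:
        # Two throwaway directories standing in for different launch locations.
        for launch_dir in (tempfile.mkdtemp(), tempfile.mkdtemp()):
            os.chdir(launch_dir)
            print(resolve(rel))  # a different target each time
    finally:
        os.chdir(original_cwd)  # put the process back where it started
```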
You can instead load relative to the source file by doing something like:
import os
import pandas as pd

# Directory containing this source file (inside site-packages once installed)
this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "spam_collection.csv")
fullCorpus = pd.read_csv(DATA_PATH, sep="\t", header=None)
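On Python 3.9+, importlib.resources.files() is another option: it asks the import system where the package lives instead of computing the path by hand, and it also copes with packages loaded from zip files. This is a sketch under those assumptions; read_package_text, make_demo_package and demo_spam_pkg are hypothetical names, and the temporary package only stands in for the installed bigramspamclassifier.

```python
import importlib
import sys
import tempfile
from importlib.resources import files  # Python 3.9+
from pathlib import Path


def read_package_text(package, *parts):
    """Read a text file shipped inside `package`, independent of the cwd."""
    resource = files(package)
    for part in parts:
        resource = resource / part  # descend one path component at a time
    return resource.read_text(encoding="utf-8")


def make_demo_package(name):
    """Create a throwaway package `name` containing data/spam_collection.csv."""
    root = Path(tempfile.mkdtemp())
    data_dir = root / name / "data"
    data_dir.mkdir(parents=True)
    (root / name / "__init__.py").write_text("")
    (data_dir / "spam_collection.csv").write_text("ham\thello\nspam\twin now\n")
    sys.path.insert(0, str(root))  # make the package importable
    importlib.invalidate_caches()  # make sure the new directory is noticed
    return root


if __name__ == "__main__":
    make_demo_package("demo_spam_pkg")  # hypothetical stand-in package
    print(read_package_text("demo_spam_pkg", "data", "spam_collection.csv"))
```

In the real package this would presumably be files("bigramspamclassifier") / "data" / "spam_collection.csv". For pd.read_csv, wrapping that object in importlib.resources.as_file(...) yields a real filesystem path even when the package is zipped.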
Related
I have a python script that accomplishes a few small tasks:
Create new directory structure
Download a .zip file from a URL and unzip contents
Clean up the data
Export data as a .csv
The full .py file runs successfully and gives the desired output in Spyder, but when I run the .py from Command Prompt, it raises "ImportError: no module named geopandas".
I am using Windows 10 Enterprise version 1909, conda v4.9.2, Anaconda command line client v 1.7.2, Spyder 4.2.3.
I am in a virtual environment with all the needed packages that my script imports.
The first part of my script only needs os and requests packages, and it runs fine as its own .py file from Command Prompt:
import os
import requests

#setup folders, download .zip file and unzip it
#working directory is directory the .py file is in
wd = os.path.dirname(__file__)
if not os.path.exists(wd):
    os.mkdir(wd)

#data source directory
src_path = os.path.join(wd, "src")
if not os.path.exists(src_path):
    os.mkdir(src_path)

#data output directory
output_path = os.path.join(wd,"output")
if not os.path.exists(output_path):
    os.mkdir(output_path)

#create new output directories and define as variables
out_parent = os.path.join(wd, "output")
if not os.path.exists(out_parent):
    os.mkdir(out_parent)
folders = ["imgs", "eruptions_processed"]
for folder in folders:
    new_dir = os.path.join(out_parent, folder)
    if not os.path.exists(new_dir):
        os.mkdir(new_dir)
output_imgs = os.path.join(out_parent, "imgs")
if not os.path.exists(output_imgs):
    os.mkdir(output_imgs)
output_eruptions = os.path.join(out_parent, "eruptions_processed")
if not os.path.exists(output_eruptions):
    os.mkdir(output_eruptions)

if not os.path.exists(os.path.join(src_path,"Historical_Significant_Volcanic_Eruption_Locations.zip")):
    url = 'https://opendata.arcgis.com/datasets/3ed5925b69db4374aec43a054b444214_6.zip?outSR=%7B%22latestWkid%22%3A3857%2C%22wkid%22%3A102100%7D'
    doc = requests.get(url)
    os.chdir(src_path) #change working directory to src folder
    with open('Historical_Significant_Volcanic_Eruption_Locations.zip', 'wb') as f:
        f.write(doc.content)
file = os.path.join(src_path,"Historical_Significant_Volcanic_Eruption_Locations.zip") #full file path of downloaded
But once I re-introduce my full list of packages in the .py file:
import os
import pandas as pd
import geopandas as gpd
import requests
import datetime
import shutil
and run again from Command Prompt, I get:
Traceback (most recent call last):
  File "C:\Users\KWOODW01\py_command_line_tools\download_eruptions.py", line 17, in <module>
    import geopandas as gpd
ImportError: No module named geopandas
I am thinking the problem is something to do with not finding my installed packages in my anaconda virtual environment, but I don't have a firm grasp on how to troubleshoot that. I thought I had added the necessary Anaconda file paths to my Windows PATH variable before.
The packages for my virtual environment are in "C:\Users\KWOODW01\Anaconda3\envs\pygis\Lib\site-packages".
echo %PATH% returns:
C:\Users\KWOODW01\Anaconda3\envs\pygis;C:\Users\KWOODW01\Anaconda3\envs\pygis\Library\mingw-w64\bin;C:\Users\KWOODW01\Anaconda3\envs\pygis\Library\usr\bin;C:\Users\KWOODW01\Anaconda3\envs\pygis\Library\bin;C:\Users\KWOODW01\Anaconda3\envs\pygis\Scripts;C:\Users\KWOODW01\Anaconda3\envs\pygis\bin;C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\WINDOWS\System32\OpenSSH;C:\Program Files\McAfee\Solidcore\Tools\GatherInfo;C:\Program Files\McAfee\Solidcore\Tools\Scanalyzer;C:\Program Files\McAfee\Solidcore;C:\Program Files\McAfee\Solidcore\Tools\ScGetCerts;C:\Users\KWOODW01\AppData\Local\Microsoft\WindowsApps;C:\Users\KWOODW01\Anaconda3\Library\bin;C:\Users\KWOODW01\Anaconda3\Scripts;C:\Users\KWOODW01\Anaconda3\condabin;C:\Users\KWOODW01\Anaconda3;.
So it appears that the directories of my pygis environment are already in my PATH variable, yet from Command Prompt the script still raises "ImportError: no module named geopandas". I'm pretty stuck on this one and hoping someone can provide more troubleshooting tips. Thanks.
I figured out that I wasn't calling python in Command Prompt before the name of the script.
The proper command to execute a .py file from the command prompt is python modulename.py, not just modulename.py. Yikes. Let this be a lesson for other Python novices.
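Typing modulename.py on its own makes Windows launch the script through the .py file association, which can point at a different Python than the one on the activated environment's PATH; python modulename.py normally picks the first python.exe on PATH, here the pygis environment's. The error text hints at this too: Python 3.6+ reports ModuleNotFoundError: No module named 'geopandas' (with quotes), so the unquoted ImportError form suggests an older, likely Python 2, interpreter ran the script. The sketch below is a hedged diagnostic, not part of the original answer: save it as a file and run it both ways to see which interpreter each command uses. It sticks to syntax that runs on Python 2.7 and 3, so it reports something whichever one is launched; module_available is a hypothetical helper name.

```python
import importlib
import sys


def module_available(name):
    """Return True if the interpreter running this file can import `name`."""
    try:
        importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError (3.6+) is a subclass
        return False
    return True


if __name__ == "__main__":
    # Compare with the env you expect, e.g. ...\Anaconda3\envs\pygis\python.exe
    print("Interpreter: " + sys.executable)
    print("Version: " + sys.version.split()[0])
    print("geopandas importable: " + str(module_available("geopandas")))
```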
I have a project packaged as:
wcnlp-tools (project root)
    nlu-spacy
        setup.py
        ...
        spacy (package name)
            ...
    nlu-wcnlp
        setup.py
        ...
        wcnlp (package name)
            ...
wcnlp depends on spacy.
When I use:
pip install -e .
to install both libraries, everything works fine. However, if I install both without -e, as below, it reports an error:
pip install .
The line which caused this error is:
abspath = os.path.abspath(os.path.dirname(__file__))
read_yaml_file(os.path.join(abspath, "../../../../nlu-wcnlp/wcnlp/configs/spacy_config.yml"))
The error message is:
No such file or directory: '/Users/minmin/nlp/test/wcnlp-tools/ven/lib/python3.7/site-packages/spacy/lang/en/../../../../nlu-wcnlp/wcnlp/configs/spacy_config.yml'
The actual path, if correctly resolved, should be:
/Users/minmin/nlp/test/wcnlp-tools/ven/lib/python3.7/site-packages/spacy/lang/en/../../../../site-packages/wcnlp/configs/spacy_config.yml
So the difference between the two is site-packages vs. nlu-wcnlp.
What's the likely reason? Should I change my file path code or my setup scripts? Note that 'pip install -e .' works fine.
ADDITIONS:
The error is raised while importing nlu-wcnlp, but the trace leads into the nlu-spacy project, which contains the faulty path.
Error trace:
File "/Users/minmin/nlp/test/wcnlp-tools/ven/lib/python3.7/site-packages/wcnlp/nlp_utils.py", line 4, in <module>
from spacy.lang.en.stop_words import STOP_WORDS
File "/Users/minmin/nlp/test/wcnlp-tools/ven/lib/python3.7/site-packages/spacy/lang/en/__init__.py", line 32, in <module>
CONFIG = read_yaml_file(SPACY_CONFIG_FILE)
File "/Users/minmin/nlp/test/wcnlp-tools/ven/lib/python3.7/site-packages/wcnlp/utils/fileio.py", line 10, in read_yaml_file
with open(filename) as stream:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/minmin/nlp/test/wcnlp-tools/ven/lib/python3.7/site-packages/spacy/lang/en/../../../../wcnlp-tools/wcnlp/configs/spacy_config.yml
When using __file__ to locate data files that belong to a package, always build the path relative to the __file__ attribute of a module in the package that provides the data file. Do not use a relative path that spans the gap between two packages, since that assumes you can correctly predict where those packages will be installed relative to each other.
So, this is unreliable (makes assumptions about installation locations that are not guaranteed to hold):
SPACY_CONFIG_FILE = os.path.join(os.path.dirname(__file__),
                                 '../../../../site-packages/wcnlp/configs/spacy_config.yml')
...but this is okay:
import os
import wcnlp

SPACY_CONFIG_FILE = os.path.join(os.path.dirname(wcnlp.__file__),
                                 'configs/spacy_config.yml')
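To see why pip install -e . happens to work, normalize the question's relative suffix against both places the module can live. This is a worked sketch: the editable location assumes spacy/ sits directly under wcnlp-tools/nlu-spacy/ as in the tree above, the installed location comes from the error message, and posixpath is used so the result is the same on any OS.

```python
import posixpath

# Relative suffix from the question's spacy/lang/en code
SUFFIX = "../../../../nlu-wcnlp/wcnlp/configs/spacy_config.yml"

# Module directory with `pip install -e .` (assumed source-tree layout)
EDITABLE_DIR = "/Users/minmin/nlp/test/wcnlp-tools/nlu-spacy/spacy/lang/en"
# Module directory with `pip install .` (copied into site-packages)
INSTALLED_DIR = "/Users/minmin/nlp/test/wcnlp-tools/ven/lib/python3.7/site-packages/spacy/lang/en"


def resolve(module_dir, suffix=SUFFIX):
    """Collapse the '..' components lexically (symlinks are ignored)."""
    return posixpath.normpath(posixpath.join(module_dir, suffix))


if __name__ == "__main__":
    print("editable: ", resolve(EDITABLE_DIR))
    # -> /Users/minmin/nlp/test/wcnlp-tools/nlu-wcnlp/wcnlp/configs/spacy_config.yml
    print("installed:", resolve(INSTALLED_DIR))
    # -> /Users/minmin/nlp/test/wcnlp-tools/ven/lib/python3.7/nlu-wcnlp/wcnlp/configs/spacy_config.yml
```

In the source tree the four .. steps climb out of nlu-spacy into the project root, where nlu-wcnlp/ really exists; inside site-packages they climb out of site-packages into lib/python3.7/, where nothing named nlu-wcnlp was ever installed. That is exactly the installation-layout assumption the answer warns against.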
Suppose you have built a .whl of a proprietary project and want to reuse it in another Python project. How can you reference it via a relative path in a pip requirements file without publishing the .whl online?
The documentation (https://pip.readthedocs.io/en/1.1/requirements.html) does not describe this case directly, but based on it I have tried the following:
Indicate using -e file:
./folder/my-project-0.1.0-cp36-cp36m-linux_x86_64.whl#egg=my-project==0.1.0
got> NotADirectoryError: [Errno 20] Not a directory
Indicate using file:
./folder/my-project-0.1.0-cp36-cp36m-linux_x86_64.whl#egg=my-project==0.1.0
got> FileNotFoundError: [Errno 2] No such file or directory: '/folder/my-project-0.1.0-cp36-cp36m-linux_x86_64.whl'
It seems to look for the file in the root folder instead of relative to the current one.
Indicate without file:
folder/my-project-0.1.0-cp36-cp36m-linux_x86_64.whl
got> Invalid requirement: 'folder/my-project-0.1.0-cp36-cp36m-linux_x86_64.whl'
It looks like a path. Does it exist ?
It appears that it works if I point to a folder instead, but that is not my case. Does anyone know how to fix this?
numpy
pandas
file:./folder/my-project-0.1.0-cp36-cp36m-linux_x86_64.whl#egg=my-project==0.1.0
With this requirements file, pip installs the .whl file from the relative path successfully!
I have uploaded a package to PyPI and GitHub. I then installed the package and tried to use it. It contains a Python script that needs to read from a file; I have placed both in the same directory.
pip install pycricket
from pycricket import cricket
c = cricket.Cricket()
c.query()
The query() function reads from a file. When I look at the installed pycricket package in my library folder, the script and the file are in the same folder.
def query(self):
    with open('matches.csv', 'r') as f:
        ...  # code that reads the file
I don't know why I get the error.
You can inspect the current working directory with:
>>> import os
>>> os.getcwd()
If your data is in a different directory (unclear from the question, but likely given the error message), change to the directory where the data is stored:
>>> os.chdir(path_to_data_directory)
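os.chdir() changes the working directory of the whole process, which can surprise other code that relies on relative paths. If you do take this route, a small context manager that restores the previous directory limits the side effect. This is a sketch; working_directory is a hypothetical helper name (Python 3.11+ ships contextlib.chdir for the same purpose), and loading the file relative to the module's __file__, as in the first answer on this page, avoids changing directories at all.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the process cwd to `path`, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)  # restore even if the body raises


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()  # stands in for path_to_data_directory
    with working_directory(data_dir):
        print("inside:", os.getcwd())
    print("after: ", os.getcwd())
```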
I tried to convert the langdetect package into an .egg file. However, when importing and using the .egg file within python code I get the following error message:
IOError: [Errno 20] Not a directory: '/workspace/langdetect-1.0.1/dist/langdetect-1.0.1-py2.7.egg/langdetect/utils/messages.properties'
After digging into the library code I found that it tries to load the messages.properties file as:
MESSAGES_FILENAME = path.join(path.dirname(__file__), 'messages.properties')
with open(self.MESSAGES_FILENAME, 'r') as f:
This does not work when the library is zipped into an .egg file, because messages.properties now lives inside the archive, at:
/workspace/langdetect-1.0.1/dist/langdetect-1.0.1-py2.7.egg/langdetect/utils/messages.properties
How can I change the above code so that the messages.properties is loaded from within the .egg file?
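One possible approach, not taken from the original thread: pkgutil.get_data() (standard library, available since Python 2.6) loads a resource through the package's loader, which for a zipped .egg is zipimport, so it can read files that exist only inside the archive. The sketch builds a small zip to stand in for the egg; demo_langdetect, read_bundled and build_zipped_package are hypothetical names.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def read_bundled(package, resource):
    """Return the bytes of `resource`, located relative to `package`.

    pkgutil.get_data asks the package's loader for the data, so it works when
    the package is imported from a zip archive as well as from a directory.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:  # loader cannot supply data, or package has no spec
        raise FileNotFoundError(resource)
    return data


def build_zipped_package(name):
    """Write a zip holding name/utils/messages.properties; return its path."""
    archive = os.path.join(tempfile.mkdtemp(), name + ".zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(name + "/__init__.py", "")
        zf.writestr(name + "/utils/__init__.py", "")
        zf.writestr(name + "/utils/messages.properties", "greeting=hello\n")
    return archive


if __name__ == "__main__":
    sys.path.insert(0, build_zipped_package("demo_langdetect"))
    import demo_langdetect.utils as utils

    naive = os.path.join(os.path.dirname(utils.__file__), "messages.properties")
    try:
        with open(naive):  # the path points *inside* the .zip file
            pass
    except OSError as exc:
        print("plain open() fails:", exc)

    print(read_bundled("demo_langdetect.utils", "messages.properties").decode("utf-8"))
```

Inside langdetect this would presumably become something like pkgutil.get_data('langdetect.utils', 'messages.properties').decode('utf-8') in place of the open() call; the utf-8 encoding is an assumption about the properties file. setuptools' pkg_resources.resource_string offers similar behaviour for eggs.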