File Not Found Error in Scrapy Where File Exists [duplicate] - python

My book states:
Every program that runs on your computer has a current working directory, or cwd. Any filenames or paths that do not begin with the root folder are assumed to be under the current working directory
As I am on OSX, my root folder is /. When I type in os.getcwd() in my Python shell, I get /Users/apple/Documents. Why am I getting the Documents folder in my cwd? Is it saying that Python is using Documents folder? Isn't there any path heading to Python that begins with / (the root folder)? Also, does every program have a different cwd?

Every process has a current directory. When a process starts, it simply inherits the current directory from its parent process; and it's not, for example, set to the directory which contains the program you are running.
For a more detailed explanation, read on.
When disks became large enough that you did not want all your files in the same place, operating system vendors came up with a way to structure files in directories. So instead of saving everything in the same directory (or "folder" as beginners are now taught to call it) you could create new collections and other new collections inside of those (except in some early implementations directories could not contain other directories!)
Fundamentally, a directory is just a peculiar type of file, whose contents is a collection of other files, which can also include other directories.
On a primitive operating system, that was where the story ended. If you wanted to print a file called term_paper.txt which was in the directory spring_semester which in turn was in the directory 2021 which was in the directory studies in the directory mine, you would have to say
print mine/studies/2021/spring_semester/term_paper.txt
(except the command was probably something more arcane than print, and the directory separator might have been something crazy like square brackets and colons, or something;
lpr [mine:studies:2021:spring_semester]term_paper.txt
but this is unimportant for this exposition) and if you wanted to copy the file, you would have to spell out the whole enchilada twice:
copy mine/studies/2021/spring_semester/term_paper.txt mine/studies/2021/spring_semester/term_paper.backup
Then came the concept of a current working directory. What if you could say "from now on, until I say otherwise, all the files I am talking about will be in this particular directory". Thus was the cd command born (except on old systems like VMS it was called something clunkier, like SET DEFAULT).
cd mine/studies/2021/spring_semester
print term_paper.txt
copy term_paper.txt term_paper.backup
That's really all there is to it. When you cd (or, in Python, os.chdir()), you change your current working directory. It stays until you log out (or otherwise exit this process), or until you cd to a different working directory, or switch to a different process or window where you are running a separate command which has its own current working directory. Just like you can have your file browser (Explorer or Finder or Nautilus or whatever it's called) open with multiple windows in different directories, you can have multiple terminals open, and each one runs a shell which has its own independent current working directory.
So when you type pwd into a terminal (or cwd or whatever the command is called in your command language) the result will pretty much depend on what you happened to do in that window or process before, and probably depends on how you created that window or process. On many Unix-like systems, when you create a new terminal window with an associated shell process, it is originally opened in your home directory (/home/you on many Unix systems, /Users/you on a Mac, something more or less like C:\Users\you on recent Windows) though probably your terminal can be configured to open somewhere else (commonly Desktop or Documents inside your home directory on some ostensibly "modern" and "friendly" systems).
Many beginners have a vague and incomplete mental model of what happens when you run a program. Many will incessantly cd into whichever directory contains their script or program, and be genuinely scared and confused when you tell them that you don't have to. If frobozz is in /home/you/bin then you don't have to
cd /home/you/bin
./frobozz
because you can simply run it directly with
/home/you/bin/frobozz
and similarly if ls is in /bin you most definitely don't
cd /bin
./ls
just to get a directory listing.
Furthermore, like the ls (or on Windows, dir) example should readily convince you, any program you run will look in your current directory for files. Not the directory the program or script was saved in. Because if that were the case, ls could only produce a listing of the directory it's in (/bin) -- there is nothing special about the directory listing program, or the copy program, or the word processor program; they all, by design, look in the current working directory (though again, some GUI programs will start with e.g. your Documents directory as their current working directory, by design, at least if you don't tell them otherwise).
Many beginners write scripts which demand that the input and output files are in a particular directory inside a particular user's home directory, but this is just poor design; a well-written program will simply look in the current working directory for its input files unless instructed otherwise, and write output to the current directory (or perhaps create a new directory in the current directory for its output if it consists of multiple files).
Python, then, is no different from any other programs. If your current working directory is /Users/you/Documents when you run python then that directory is what os.getcwd() inside your Python script or interpreter will produce (unless you separately os.chdir() to a different directory during runtime; but again, this is probably unnecessary, and often a sign that a script was written by a beginner). And if your Python script accepts a file name parameter, it probably should simply get the operating system to open whatever the user passed in, which means relative file names are relative to the invoking user's current working directory.
python /home/you/bin/script.py file.txt
should simply open(sys.argv[1]) and fail with an error if file.txt does not exist in the current directory. Let's say that again; it doesn't look in /home/you/bin for file.txt -- unless of course that is also the current working directory of you, the invoking user, in which case of course you could simply write
python script.py file.txt
On a related note, many beginners needlessly try something like
with open(os.path.join(os.getcwd(), "input.txt")) as data:
...
which needlessly calls os.getcwd(). Why is it needless? If you have been following along, you know the answer already: the operating system will look for relative file names (like here, input.txt) in the current working directory anyway. So all you need is
with open("input.txt") as data:
...
One final remark. On Unix-like systems, all files are ultimately inside the root directory / which contains a number of other directories (and usually regular users are not allowed to write anything there, and system administrators with the privilege to do it typically don't want to). Every relative file name can be turned into an absolute file name by tracing the path from the root directory to the current directory. So if the file we want to access is in /home/you/Documents/file.txt it means that home is in the root directory, and contains you, which contains Documents, which contains file.txt. If your current working directory were /home you could refer to the same file by the relative path you/Documents/file.txt; and if your current directory was /home/you, the relative path to it would be Documents/file.txt (and if your current directory was /home/you/Music you could say ../Documents/file.txt but let's not take this example any further now).
Windows has a slightly different arrangement, with a number of drives with single-letter identifiers, each with its own root directory; so the root of the C: drive is C:\ and the root of the D: drive is D:\ etc. (and the directory separator is a backslash instead of a slash, although you can use a slash instead pretty much everywhere, which is often a good idea for preserving your sanity).

Your python interpreter location is based off of how you launched it, as well as subsequent actions taken after launching it like use of the os module to navigate your file system. Merely starting the interpreter will place you in the directory of your python installation (not the same on different operating systems). On the other hand, if you start by editing or running a file within a specific directory, your location will be the folder of the file you were editing. If you need to run the interpreter in a certain directory and you are using idle for example, it is easiest to start by creating a python file there one way or another and when you edit it you can start a shell with Run > Python Shell which will already be in that directory. If you are using the command line interpreter, navigate to the folder where you want to run your interpreter before running the python/python3/py command. If you need to navigate manually, you can of course use the following which has already been mentioned:
import os
os.chdir('full_path_to_your_directory')

This has nothing to do with osx in particular, it's more of a concept shared by all unix-based systems, and I believe Windows as well. os.getcwd() is the equivalent of the bash pwd command - it simply returns the full path of the current location in which you are in. In other words:
alex#suse:~> cd /
alex#suse:/> python
Python 2.7.12 (default, Jul 01 2016, 15:34:22) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getcwd()
'/'
It depends from where you started the python shell/script.

Python is usually (except if you are working with virtual environments) accessible from any of your directory. You can check the variables in your path and Python should be available. So the directory you get when you ask Python is the one in which you started Python. Change directory in your shell before starting Python and you will see you will it.

os.getcwd() has nothing to do with OSX in particular. It simply returns the directory/location of the source-file. If my source-file is on my desktop it would return C:\Users\Dave\Desktop\ or let say the source-file is saved on an external storage device it could return something like G:\Programs\. It is the same for both unix-based and Windows systems.

Related

How do programs know where their files are? And how to implement the same thing in python?

I am working on a python project that depends on some other files. It all works fine while testing. However, I want the program to run on start up. The working directory for programs that run on start up seems to be C:Windows\system32. When installing a program, it usually asks where to install it and no matter where you put it, if it runs on start up, it knows where its files are located. How do they achieve that? Also, how to achieve the same thing in python?
First of all, what do you mean by "their files"? Windows applications can store "their files" in multiple places (including but not limited to %CommonProgramFiles%, %ProgramData% and %AppData%).
That being said, the common location for simple applications and scripts is to use the same directory as the .exe (or script).
In Python there seems to be multiple ways to find this path, this seems to work nicely:
import os
print(os.path.abspath(os.path.dirname(__file__)))
See also:
How do I get the path of the Python script I am running in?
How do I get the path and name of the file that is currently executing?
If you plan to consume local files that contain raw data or processed data, defining a default directory or a set of directories can simplify your implementation, for example:
Place your data files under a specific set of folders in C:\ or place your files under the F:\ folder, that can be a part of your on premisses file system
Based on where your Python application is located, you'll need to use relative paths or a library to help you to locate these files.
Here are some examples:
os.path
pathlib

How to prevent Python from search the current working directory for modules?

I have a Python script which imports the datetime module. It works well until someday I can run it in a directory which has a Python script named datetime.py. Of course, there are a few ways to resolve the issue. First, I run the script in a directory that does not contain the script datetime.py. Second, I can rename the Python script datetime.py. However, neither of the 2 approaches are perfect ways. Suppose one ship a Python script, he never knows where users will run the Python script. Another possible fix is to prevent Python from search the current working directory for modules. I tried to remove the empty path ('') from sys.path but it works in an interactive Python shell but not in a Python script. The invoked Python script still searches the current path for modules. I wonder whether there is any way to disable Python from searching the current path for modules?
if __name__ == '__main__':
if '' in sys.path:
sys.path.remove('')
...
Notice that it deosn't work even if I put the following code to the beginning of the script.
import sys
if '' in sys.path:
sys.path.remove('')
Below are some related questions on StackOverflow.
Removing path from Python search module path
pandas ImportError C extension when io.py in same directory
initialization of multiarray raised unreported exception python
Are you sure that Python is searching for that module in the current directory, and not on the script directory? I don't think Python adds the current directory to the sys.path, except in one case. Doing so could even be a security risk (akin to having . on the UNIX PATH).
According to the documentation:
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first
So, '' as a representation of the current directory happens only if run from interpreter (that's why your interactive shell test worked) or if the script is read from the standard input (something like cat modquest/qwerty.py | python). Neither is a rather 'normal' way of running Python scripts, generally.
I'm guessing that your datetime.py stands side by side with your actual script (on the script directory), and that it just happens that you're running the script from that directory (that is script directory == current directory).
If that's the actual scenario, and your script is standalone (meaning just one file, no local imports), you could do this:
sys.path.remove(os.path.abspath(os.path.dirname(sys.argv[0])))
But keep in mind that this will bite you in the future, once the script gets bigger and you split it into multiple files, only to spend several hours trying to figure out why it is not importing the local files...
Another option is to use -I, but that may be overkill:
-I
Run Python in isolated mode. This also implies -E and -s. In isolated mode sys.path contains neither the script’s directory nor the user’s site-packages directory. All PYTHON* environment variables are ignored, too. Further restrictions may be imposed to prevent the user from injecting malicious code.

how do I install my modual onto my local copy of python on windows?

I'm reading headfirst python and have just completed the section where I created a module for printing nested list items, I've created the code and the setup file and placed them in a file labeled "Nester" that is sitting on my desktop. The book is now asking for me to install this module onto my local copy of Python. The thing is, in the example he is using the mac terminal, and I'm on windows. I tried to google it but I'm still a novice and a lot of the explanations just go over my head. Can someone give me clear thorough guide?.
On Windows systems, third-party modules (single files containing one or more functions or classes) and third-party packages (a folder [a.k.a. directory] that contains more than one module (and sometimes other folders/directories) are usually kept in one of two places: c:\\Program Files\\Python\\Lib\\site-packages\\ and c:\\Users\\[you]\\AppData\\Roaming\\Python\\.
The location in Program Files is usually not accessible to normal users, so when PIP installs new modules/packages on Windows it places them in the user-accessible folder in the Users location indicated above. You have direct access to that, though by default the AppData folder is "hidden"--not displayed in the File Explorer list unless you set FE to show hidden items (which is a good thing to do anyway, IMHO). You can put the module you're working on in the AppData\\Roaming\\Python\\ folder.
You still need to make sure the folder you put it in is in the PATH environment variable. PATH is a string that tells Windows (and Python) where to look for needed files, in this case the module you're working on. Google "set windows path" to find how to check and set your path variable, then just go ahead and put your module in a folder that's listed in your path.
Of course, since you can add any folder/directory you want to PATH, you could put your module anywhere you wanted--including leaving it on the Desktop--as long as the location is included in PATH. You could, for instance, have a folder such as Documents\\Programming\\Python\\Lib to put your personal modules in, and use Documents\\Programming\\Python\\Source for your Python programs. You'd just need to include those in the PATH variable.
FYI: Personally, I don't like the way python is (by default) installed on Windows (because I don't have easy access to c:\\Program Files), so I installed Python in a folder off the drive root: c:\Python36. In this way, I have direct access to the \\Lib\\site-packages\\ folder.

Confused about using relative and absolute path in Python

I have a script called "test.py" and refers to a config file called "cfg.yaml". These two reside in the same directory called "test/scripts".
test/scripts/test.py
test/script/cfg.yaml
Now I am writing a bash script inside "test/data1/data2" called task.sh
From inside of task.sh, I want to make a call to the python script
test.sh contents are as below:
#!/bin/sh
python ../../scripts/test.py
test.py opens and reads the cfg.yaml like open("cfg.yaml") but when the test.sh is called, it fails because "cfg.yaml" is NOT referred with relative path. How do I resolve this?
try
#!/bin/sh
cd ../../scripts
python test.py
I assume in test.py you are referencing the yaml with a local path that expects the script to be running from the scripts directory (not from data1/data2)
as an aside you can always print os.getcwd() to see what your working directory is while running the python script (or just pwd in bash)
If you want to refer to a file in terms of its relative location to the script, you can't just use a relative path for that, because the script may not be in the current working directory. In fact, in your case, it's guaranteed not to be in the current working directory.
The usual quick&dirty solution is to get the script path explicitly:
import os
import sys
scriptdir = os.path.abspath(os.path.dirname(sys.argv[0]))
That sys.argv[0] is the full name of your script—which on most platforms means either an absolute path, or a relative path that's at least still valid at this point. If you're worried about those other platforms, you may want to look at __file__. Of course if you're running the script out of a zipfile or similar, neither one will be useful.
open(os.path.join(scriptdir, "cfg.yaml"))
(Or you can even forcibly os.chdir(scriptdir) if you want, but that can be a bad idea if you're, e.g., accepting pathnames from the user.)
The right solution is to write your script and data files to be installed (even if you don't actually install anything and run it out of the source tree), and then use pkg_resources to find the data files. That will also solve a whole lot of other problems for you—but it does require a bit more learning first. See the Python Packaging User Guide if you want to get started on that road, and Package Data and Data Files as particular starting points.

Python program needs full path in Notepad++

Not a major issue but just an annoyance I've come upon while doing class work. I have my Notepad++ set up to run Python code straight from Notepad++ but I've noticed when trying to access files I have to use the full path to the file even given the source text file is in the same folder as the Python program being run.
However, when running my Python program through cmd I can just type in the specific file name sans the entire path.
Does anyone have a short answer as to why this might be or maybe how to reconfigure Notepad++?
Thanks in advance.
The problem is that your code is assuming that the current working directory is the same as the script directory. This is not true in general. Of course it is true if you're in a cmd window, and you cd to the script directory before running it.
If you don't want to rely on that (e.g., because you want to be able to run scripts from Notepad++, or directly from Explorer), what you want to do is use the script directory explicitly. For example:
import os
import sys
scriptdir = os.path.abspath(os.path.dirname(sys.argv[0]))
with open(os.path.join(scriptdir, 'myfile.txt')) as f:
# etc.
If you have a ton of files that your scripts reference in a ton of places, it might be better to explicitly set the working directory. Just add one line:
os.chdir(scriptdir)
For anything beyond quick&dirty scripts, it's usually better to build an installable package and use pkg_resources to access the data files. Read the Tutorial on Packaging and Distributing Projects for more details. But as long as you're only hacking up scripts to help you maintain your specific system, the scriptdir solution is workable.
In the properties of the shortcut that you use to start Notepad++, you can change its working directory, to whichever directory you're more accustomed to starting from in Python. You can also begin your python program with the appropriate os.chdir() command.

Categories