I'm working from the book called "Building Machine Learning Systems with Python". I've downloaded some data from MLComp to use as a training set. The file I downloaded (20news-18828) is currently in /Users/jones/testingdocuments/379
The book instructs code as follows:
import sklearn.datasets
MLCOMP_DIR = r"D:\data"
data = sklearn.datasets.load_mlcomp("20news-18828", mlcomp_root=MLCOMP_DIR)
print(data.filenames)
I've tried changing MLCOMP_DIR = /Users/jones/testingdocuments/379 and various combinations thereof, but cannot seem to get to the file.
What am I doing wrong?
MLCOMP_DIR = r"D:\data"
implies that you are on Windows, while
MLCOMP_DIR = /Users/jones/testingdocuments/379
will not work, but with quotes added, implies that you are not on Windows, unless you left of 'C:'. If you run the program from a console (or start Idle from a console with python -m idlelib, you should see some error message.
Related
I am trying to replicate another researcher's findings by using the Python file that he added as a supplement to his paper. It is the first time I am diving into Python, so the error might be extremely simple to fix, yet after two days I haven't still. For context, in the Readme file there's the following instruction:
"To run the script, make sure Python2 is installed. Put all files into one folder designated as “cf_dir”.
In the script I get an error at the following lines:
if __name__ == '__main__':
cf_dir, cf_file, cf_phys_file = sys.argv[1:4]
os.chdir(cf_dir)
cf = pd.read_csv(cf_file)
cf_phys = pd.read_csv(cf_phys_file)
ValueError: need more than 0 values to unpack
The "cf_file" and "cf_phys_file" are two major components of all files that are in the one folder named "cf_dir". The "cf_phys_file" relates only to two survey question's (Q22 and Q23), and the "cf_file" includes all other questions 1-21. Now it seems that the code is meant to retrieve those two files from the directory? Only for the "cf_phys_file" the columns 1:4 are needed. The current working directory is already set at the right location.
The path where I located "cf_dir" is as follows:
C:\Users\Marc-Marijn Ossel\Documents\RSM\Thesis\Data\Suitable for ML\Data en Artikelen\Per task Suitability for Machine Learning score readme\cf_dir
Alternative option in readme file,
In the readme file there's this option, but also here I cannot understand how to direct the path to the right location:
"Run the following command in an open terminal (substituting for file names
below): python cfProcessor_AEAPnP.py cf_dir cf_file cf_phys_file task_file jobTaskRatingFile
jobDataFile OESfile
This should generate the data and plots as necessary."
When I run that in "Command Prompt", I get the following error, and I am not sure how to set the working directory correctly.
- python: can't open file 'cfProcessor_AEAPnP.py': [Errno 2] No such file or directory
Thanks for the reading, and I hope there's someone who could help me!
Best regards & stay safe out there during Corona!!
Marc
cf_dir, cf_file, cf_phys_file = sys.argv[1:4]
means, the python file expects few arguments when called.
In order to run
python cfProcessor_AEAPnP.py cf_dir cf_file cf_phys_file task_file jobTaskRatingFile jobDataFile OESfile
the command prompt should be in that folder.
So, open command prompt and type
cd path_to_the_folder_where_ur_python_file_is_located
Now, you would have reached the path of the python file.
Also, make sure you give full path in double quotes for the arguments.
I need to extract text from a PDF. I tried the PyPDF2, but the textExtract method returned an encrypted text, even though the pdf is not encrypted acoording to the isEncrypted method.
So I moved on to trying accessing a program that does the job from the command prompt, so I could call it from python with the subprocess module. I found this program called textExtract, which did the job I wanted with the following command line on cmd:
"textextract.exe" "download.pdf" /to "download.txt"
However, when I tried running it with subprocess I couldn't get a 0 return code.
Here is the code I tried:
textextract = shlex.split(r'"textextract.exe" "download.pdf" /to "download.txt"')
subprocess.run(textextract)
I already tried it with shell=True, but it didn't work.
Can anyone help me?
I was able to get the following script to work from the command line after installing the PDF2Text Pilot application you're trying to use:
import shlex
import subprocess
args = shlex.split(r'"textextract.exe" "download.pdf" /to "download.txt"')
print('args:', args)
subprocess.run(args)
Sample screen output of running it from a command line session:
> C:\Python3\python run-textextract.py
args: ['textextract.exe', 'download.pdf', '/to', 'download.txt']
Progress:
Text from "download.pdf" has been successfully extracted...
Text extraction has been completed!
The above output was generated using Python 3.7.0.
I don't know if your use of spyder on anaconda affects things or not since I'm not familiar with it/them. If you continue to have problems with this, then, if it's possible, I suggest you see if you can get things working directly—i.e. running the the Python interpreter on the script manually from the command line similar to what's shown above. If that works, but using spyder doesn't, then you'll at least know the cause of the problem.
There's no need to build a string of quoted strings and then parse that back out to a list of strings. Just create a list and pass that:
command=["textextract.exe", "download.pdf", "/to", "download.txt"]
subprocess.run(command)
All that shlex.split is doing is creating a list by removing all of the quotes you had to add when creating the string in the first place. That's an extra step that provides no value over just creating the list yourself.
I am using Python3.6 and I need to run my code in command line. The code works when I run it in PyCharm but when I use command line I get this error:
File "path", line 43, in <module>
rb = ds.GetRasterBand(1)
AttributeError: 'NoneType' object has no attribute 'GetRasterBand'
It seems that I have a problem with these lines:
ds = gdal.Open('tif_file.tif', gdal.GA_ReadOnly)
rb = ds.GetRasterBand(1)
img_array = rb.ReadAsArray()
Does anyone know what I might have done wrong?
EDIT
Some magic just happened. I tried to run my code this morning and everything seems fine. I guess what my computer needed was a restart or something. Thanks to you all for help.
from the gdal documentation:
from osgeo import gdal
dataset = gdal.Open(filename, gdal.GA_ReadOnly)
if not dataset:
...
Note that if GDALOpen() returns NULL it means the open failed, and
that an error messages will already have been emitted via CPLError().
If you want to control how errors are reported to the user review the
CPLError() documentation. Generally speaking all of GDAL uses
CPLError() for error reporting. Also, note that pszFilename need not
actually be the name of a physical file (though it usually is). It's
interpretation is driver dependent, and it might be an URL, a filename
with additional parameters added at the end controlling the open or
almost anything. Please try not to limit GDAL file selection dialogs
to only selecting physical files.
looks like the file you are trying to open is not a valid gdal file or some other magic is going on in the file selection. you could try to direct the program to a known good file online to test it.
I'm working on a python script that creates numerous images files based on a variety of inputs in OS X Yosemite. I am trying to write the inputs used to create each file as 'Finder comments' as each file is created so that IF the the output is visually interesting I can look at the specific input values that generated the file. I've verified that this can be done easily with apple script.
tell application "Finder" to set comment of (POSIX file "/Users/mgarito/Desktop/Random_Pixel_Color/2015-01-03_14.04.21.png" as alias) to {Val1, Val2, Val3} as Unicode text
Afterward, upon selecting the file and showing its info (cmd+i) the Finder comments clearly display the expected text 'Val1, Val2, Val2'.
This is further confirmed by running mdls [File/Path/Name] before and after the applescript is used which clearly shows the expected text has been properly added.
The problem is I can't figure out how to incorporate this into my python script to save myself.
Im under the impression the solution should* be something to the effect of:
VarList = [Var1, Var2, Var3]
Fiele = [File/Path/Name]
file.os.system.add(kMDItemFinderComment, VarList)
As a side note I've also look at xattr -w [Attribute_Name] [Attribute_Value] [File/Path/Name] but found that though this will store the attribute, it is not stored in the desired location. Instead it ends up in an affiliated pList which is not what I'm after.
Here is my way to do that.
First you need to install applescript package using pip install applescript command.
Here is a function to add comments to a file:
def set_comment(file_path, comment_text):
import applescript
applescript.tell.app("Finder", f'set comment of (POSIX file "{file_path}" as alias) to "{comment_text}" as Unicode text')
and then I'm just using it like this:
set_comment('/Users/UserAccountName/Pictures/IMG_6860.MOV', 'my comment')
After more digging, I was able to locate a python applescript bundle: https://pypi.python.org/pypi/py-applescript
This got me to a workable answer, though I'd still prefer to do this natively in python if anyone has a better option?
import applescript
NewFile = '[File/Path/Name]' <br>
Comment = "Almost there.."
AddComment = applescript.AppleScript('''
on run {arg1, arg2}
tell application "Finder" to set comment of (POSIX file arg1 as alias) to arg2 as Unicode text
return
end run
''')
print(AddComment.run(NewFile, Comment))
print("Done")
This is the function to get comment of a file.
def get_comment(file_path):
import applescript
return applescript.tell.app("Finder", f'get comment of (POSIX file "{file_path}" as alias)').out
print(get_comment('Your Path'))
Another approach is to use appscript, a high-level Apple event bridge that is sadly no longer officially supported but still works (and saw an updated release in Jan. 2021). Here is an example of reading and setting the comment on a file:
import appscript
import mactypes
# Get a handle on the Finder.
finder = appscript.app('Finder')
# Tell Finder to select the file.
file = finder.items[mactypes.Alias("/path/to/a/file")]
# Print the current comment
comment = file.comment()
print("Current comment: " + comment)
# Set a new comment.
file.comment.set("New comment")
# Print the current comment again to verify.
comment = file.comment()
print("Current comment: " + comment)
Despite that the author of appscript recommends against using it in new projects, I used it recently to create a command-line utility called Urial for the specialized purpose of writing and updating URIs in Finder comments. Perhaps its code can serve as an an additional example of using appscript to manipulate Finder comments.
I can't believe it. After I solved my question in Problems with Encoding in Eclipse Console and Python I thought it wouldn't happen again that I got problems here. But now this:
I have a program test.py in the project TestMe that looks like this:
print "ö"
-> Run as... Python Run results in
ö
So far so good. When I now copy the program in EasyEclipse by right click/copy and paste I receive the program copy of test.py in the same project that looks exactly the same:
print "ö"
-> Bun Run as... Python Run results in
ö
I noticed, that the file properties changed from Encoding UTF-8 to Default, but also changing to UTF-8 doesn't help here.
Another difference between the two files is the line ending which is "Windows" in the original file and "Unix" in the copy (great definition of copy, btw). Changing this in Notepad++ also doesn't change anything.
I am perplexed...
Set up:
Python 2.5
Windows 7
Easy Eclipse 1.2.2.2
Settings that I've set to UTF-8 / Windows:
Project/Rightclick/Properties
File/Rightclick/Properties
Window/Preferences/Workspace
Several places to change the encoding, most immersive first:
Workspace Window > Preferences > General > Workspace
Project Properties
File Properties
Run Configuration.
Using the first method is the most useful one as the others including the console inherit from it by default which is probably what you want.