read in file path that contains spaces and commas - python

I am having a difficult time setting a file path that contains spaces and other characters. Unfortunately, due to the platform that I am using, I am unable to change the names of the directories.
This is the command that I am attempting to run:
import glob, os
file_url = "file:///mnt/projects/samples/vcf_format/Whole genome sequences/Population level WGS variants, pVCF format - interim 200k release"
os.chdir(file_url)
for file in glob.glob("*.vcf.gz"):
    print(file)
The error I receive highlights this section of the path, "file:///mnt/projects/samples/vcf_format/Whole", and says that everything from the first space onward is unreadable.
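Spaces and commas are ordinary characters in a Python string, so they should not need escaping. A likely culprit is the file:/// prefix: os.chdir expects a filesystem path, not a URL. Below is a minimal sketch. It assumes the directory exists at the plain path left after dropping the file:// scheme. The helper names file_url_to_path and list_vcf_files are hypothetical:

```python
import glob
import os
from urllib.parse import unquote, urlsplit


def file_url_to_path(url: str) -> str:
    """Convert a 'file:///...' URL to a plain path; return anything else unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        return url
    # unquote() decodes %20-style escapes; literal spaces and commas pass through as-is.
    return unquote(parts.path)


def list_vcf_files(directory: str) -> list:
    """Return the *.vcf.gz files in `directory` without calling os.chdir()."""
    # glob.escape() stops characters like [ ] * ? in the folder name acting as wildcards.
    pattern = os.path.join(glob.escape(directory), "*.vcf.gz")
    return sorted(glob.glob(pattern))


# Usage with the path from the question (POSIX system assumed):
# folder = file_url_to_path("file:///mnt/projects/samples/vcf_format/Whole genome sequences/"
#                           "Population level WGS variants, pVCF format - interim 200k release")
# for file in list_vcf_files(folder):
#     print(file)
```

Globbing with a full path pattern also avoids changing the process-wide working directory.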


How to Import multiple csv files into QGIS 3.22.2?

I want to import multiple CSV files at once into QGIS. The files have Lat/Long data, and I want the files to project the points. Basically, I want the same result I would get from importing each CSV file through Data Source Manager > Delimited Text with Point Coordinates selected and the X field and Y field set to Long and Lat respectively.
I keep coming across the same Python code on numerous forums. While I can get the files to import as tables, I cannot get them to load with geometry. (A next-stage problem will be getting the timestamp to load as a date instead of a string; I may have to refactor all the files.)
Here's the code available on forums which results in loading broken links (my files have column headers "Lat" and "Long"):
import glob, os
# Define path to directory of your csv files
path_to_csv = "C:/File Path/"
# Set current directory to path of csv files
os.chdir(path_to_csv)
# Find each .csv file and load them as vector layers
for fname in glob.glob("*.csv"):
    uri ="file:///"+path_to_csv + fname+"encoding=%s&delimiter=%s&xField=%s&yField=%s&crs=%s" % ("UTF-8",",", "Long", "Lat","epsg:4326")
    name=fname.replace('.csv', '')
    lyr = QgsVectorLayer(uri, name, 'delimitedtext')
    QgsProject.instance().addMapLayer(lyr)
This code will load layers, but with a warning triangle for "Unavailable Layer". Clicking on the triangle opens the "Repair Data Source" window. I can manually select the file and repair the link. But then it is nothing more than a table with all fields as strings.
If I run the code like this I get the files to import, but only as tables and without geometry:
import glob, os
# Define path to directory of your csv files
path_to_csv = "C:/Users/DanielStevens/Documents/Afghanistan Monitoring/Phase 2/Border Crossing/Crossing Polygons/Pakistan/"
# Set current directory to path of csv files
os.chdir(path_to_csv)
# Find each .csv file and load them as vector layers
for fname in glob.glob("*.csv"):
    uri ="file:///"+path_to_csv + fname
    "encoding=%s&delimiter=%s&xField=%s&yField=%s&crs=%s" % ("UTF-8",",", "Long",
        "Lat","epsg:4326")
    name=fname.replace('.csv', '')
    lyr = QgsVectorLayer(uri, name, 'delimitedtext')
    QgsProject.instance().addMapLayer(lyr)
How do I get the CSV files to batch import with geometry (Lat Long projecting points)?
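I modified what you had into the line below, and it worked perfectly. I removed the encoding because my data wasn't UTF-8, but I'm not sure that's what fixed it. The other change is the ? that separates the file path from the provider options, which the original code was missing.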
uri = "file:///" + path_to_csv + fname + "?delimiter=%s&crs=epsg:3857&xField=%s&yField=%s" % (",", "lon", "lat")
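Here is a sketch for building the URI in a loop. It assumes the format of the working line above (file:/// + path + ? + options) and uses EPSG:4326 for lat/long values in degrees, as in the question. delimited_text_uri is a hypothetical helper that only builds the string. It assumes a local drive path, not a UNC share. The QGIS calls are left as comments because they only run inside QGIS, and the crossings.csv filename is made up:

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def delimited_text_uri(csv_path: PurePath, x_field: str, y_field: str,
                       crs: str = "epsg:4326", delimiter: str = ",") -> str:
    """Build a URI string for QGIS's 'delimitedtext' provider (format as in the line above)."""
    # as_posix() gives forward slashes on every OS, e.g. 'C:/Users/...' on Windows;
    # lstrip('/') avoids a fourth slash for POSIX paths such as '/home/...'.
    # Assumes a local (non-UNC) path.
    location = csv_path.as_posix().lstrip("/")
    # The '?' separates the file location from the provider options.
    options = f"delimiter={delimiter}&xField={x_field}&yField={y_field}&crs={crs}"
    return f"file:///{location}?{options}"


example_windows = delimited_text_uri(
    PureWindowsPath("C:/Users/DanielStevens/Documents/Afghanistan Monitoring/Phase 2/"
                    "Border Crossing/Crossing Polygons/Pakistan/crossings.csv"),
    "Long", "Lat")
example_posix = delimited_text_uri(PurePosixPath("/data/border crossings/crossings.csv"), "Long", "Lat")
print(example_windows)
print(example_posix)

# Inside the QGIS Python console only (QgsVectorLayer/QgsProject come from qgis.core):
# from pathlib import Path
# for csv_file in Path(path_to_csv).glob("*.csv"):
#     uri = delimited_text_uri(csv_file, "Long", "Lat")
#     lyr = QgsVectorLayer(uri, csv_file.stem, "delimitedtext")
#     QgsProject.instance().addMapLayer(lyr)
```

Keeping the options in one place makes it harder to drop the ? again.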
In case it helps with part of the issue: using a .csvt file when importing a CSV helps force the data types. This is a pain if you have a number of files, especially if the file names change (e.g., when a new batch needs to be processed). I was thinking about writing some Python that would read the CSV, create a .csvt with the same filename and populate it with the right number of column definitions. In the end, as I only have 30 files, it was quicker to use Notepad to make the .csvt and then rename it accordingly. I have also found that date-time fields converted to Oracle date-time format are handled more consistently in QGIS. Hope that helps.
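The .csvt step described above can be scripted. The sketch below assumes the first row of each CSV holds the column names. write_csvt is a hypothetical helper, and the Timestamp column in the usage comment is made up. Type names such as "Real" and "DateTime" follow the GDAL CSV driver's .csvt convention, so check that documentation for the full list:

```python
import csv
from pathlib import Path


def write_csvt(csv_path, overrides=None, default_type="String", encoding=None):
    """Write a .csvt file next to `csv_path` with one quoted type per column.

    `overrides` maps a column name to a type name, e.g. {"Lat": "Real"};
    every other column gets `default_type`.
    """
    csv_path = Path(csv_path)
    overrides = overrides or {}
    # encoding=None uses the platform default; pass e.g. "cp1252" for non-UTF-8 data.
    with csv_path.open(newline="", encoding=encoding) as f:
        header = next(csv.reader(f))  # first row = column names
    types = [overrides.get(column, default_type) for column in header]
    csvt_path = csv_path.with_suffix(".csvt")  # same name, .csvt extension
    # A .csvt file is a single line of quoted, comma-separated type names.
    csvt_path.write_text(",".join(f'"{t}"' for t in types) + "\n")
    return csvt_path


# Usage for a whole folder (the "*.csv" pattern does not match .csvt files):
# for csv_file in Path(path_to_csv).glob("*.csv"):
#     write_csvt(csv_file, {"Lat": "Real", "Long": "Real", "Timestamp": "DateTime"})
```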

Opening file path not working in python [duplicate]

This question already has answers here:
open() gives FileNotFoundError / IOError: '[Errno 2] No such file or directory'
I am writing a database program, and personica is my test subject. (I would usually have a variable in place of the file path, but for test and demo purposes I just have a string.) There is a text file at this exact location on my computer (I have changed my username here because I am paranoid), but Python says:
Traceback (most recent call last):
  File "C:\Users\Admin\Documents\Project Documentation\InteractiveExecutable.py", line 46, in <module>
    ReadPerson = open("C:/Users/Admin/Documents/Project Documentation/personica.txt", 'r')
IOError: [Errno 2] No such file or directory: 'C:/Users/Admin/Documents/Project Documentation/personica.txt'
This is the line of code:
ReadPerson = open("C:/Users/Admin/Documents/Project Documentation/personica.txt", 'r')
I am certain that it is there and when I copy that address into Windows Explorer, it takes me right to the text file.
Anyone know why this is not working?
The new-ish pathlib module (available in Python >= 3.4) is great for working with path-like objects on both Windows and other OSes.
It's Paths - Paths all the way down
To simplify: you can build up any path as an object (directory and file paths are treated exactly the same), and that object can be an absolute path or a relative path. You can use raw strings to make complex paths (e.g., r'string'), and pathlib will be very forgiving. However, note that there are better ways to build up paths than raw strings (see further down).
Here are examples:
from pathlib import Path
Path(r'c:\temp\foo.bar') # absolute path
Path(r'c:/temp/foo.bar') # same absolute path
Path('foo.bar') # different path, RELATIVE to current directory
Path('foo.bar').resolve() # resolve converts to absolute path
Path('foo.bar').exists() # check to see if path exists
Note that on Windows, pathlib forgives you for using the "wrong" slash in the second example. See the discussion at the end about why you should probably always use the forward slash.
Displaying some useful paths, such as the current working directory and the user's home directory, works like this:
# Current directory (relative):
cwd = Path() # or Path('.')
print(cwd)
# Current directory (absolute):
cwd = Path.cwd()
print(cwd)
# User home directory:
home = Path.home()
print(home)
# Something inside the current directory
file_path = Path('some_file.txt') # relative path; or
file_path = Path()/'some_file.txt' # also relative path
file_path = Path().resolve()/Path('some_file.txt') # absolute path
print(file_path)
To navigate down the file tree, you can do things like this. Note that the first object, home, is a Path and the rest are just strings:
some_person = home/'Documents'/'Project Documentation'/'personica.txt' # or
some_person = home.joinpath('Documents','Project Documentation','personica.txt')
To read a file located at a path, you can use its open method rather than the open function:
with some_person.open() as f:
    dostuff(f)  # placeholder: replace with your own processing of the open file
But you can also just grab the text directly!
contents = some_person.read_text()
content_lines = contents.split('\n')
...and WRITE text directly!
data = '\n'.join(content_lines)
some_person.write_text(data) # overwrites existing file
Check to see if it is a file or a directory (and exists) this way:
some_person.is_dir()
some_person.is_file()
Make a new, empty file without opening it like this (if the file already exists, its contents are kept and only its modification time is updated):
some_person.touch()
To make the file only if it doesn't exist, use exist_ok=False:
try:
    some_person.touch(exist_ok=False)
except FileExistsError:
    print('file already exists')  # file exists; leave it alone
Make a new directory (under the current directory, Path()) like this:
Path('new/dir').mkdir() # FileNotFoundError if Path()/`new` doesn't exist
Path('new/dir').mkdir(parents=True) # will make Path()/`new` if it doesn't exist
Path('new/dir').mkdir(exist_ok=True) # no FileExistsError if `dir` already exists
Get the file extension, or the filename without its extension, this way:
some_person.suffix # empty string if no extension
some_person.stem # note: works on directories too
Use name for the entire last part of the path (stem and extension if they are there):
some_person.name # note: works on directories too
Get a path with a different filename using the with_name method. This returns a new Path object and does not touch the disk, so call rename to actually rename the file:
new_person = some_person.with_name('personica_new.txt')
some_person.rename(new_person)
You can iterate through all the "stuff" in a directory using iterdir:
all_the_things = list(Path().iterdir()) # returns a list of Path objects
Sidebar: backslashes (\)
Be careful when using backslashes in a path string, especially when ending a path with a backslash. Even in a raw string, a backslash followed by a quotation mark does not end the string, so a raw string cannot end with a single backslash. Observe:
>>> r'\'
File "<stdin>", line 1
r'\'
^
SyntaxError: EOL while scanning string literal
So this will give a pretty cryptic error message if you are not aware of this issue:
>>> Path(r'C:\')
File "<stdin>", line 1
Path(r'C:\')
^
SyntaxError: EOL while scanning string literal
The reason for this error is that \' is treated as an escaped quotation mark inside the string rather than as the end of the string. This works fine: '\'' (the final quotation mark ends the string).
If you insist on using backslashes, be sure to use raw strings or you will run into problems. For example, '\t' represents a tab character. So when you do this (without a raw string):
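Because a raw string cannot end in a single backslash, the drive root has to be spelled another way. This small sketch shows two options. It uses PureWindowsPath so it runs on any OS:

```python
from pathlib import PureWindowsPath

# r'C:\' is a SyntaxError, so write the drive root differently:
root_escaped = 'C:\\'   # ordinary string with an escaped backslash
root_forward = 'C:/'    # forward slash, which Windows also accepts

# In a raw string, \' keeps both characters: a backslash followed by a quote.
raw_quote = r'\''

print(len(raw_quote))                                                  # 2
print(PureWindowsPath(root_escaped) == PureWindowsPath(root_forward))  # True
```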
>>> Path('C:\temp')
You are putting a tab character into your path. This is perfectly legal and Python won't complain until you do something that causes Windows to try turning it into a real Windows path:
>>> Path('C:\temp').resolve()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\temp'
This is also a very cryptic error if you do not know what is going on! Best to avoid the backslash characters altogether when messing about with paths.
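The tab problem can be checked without touching the disk. This small sketch compares the three spellings. It uses PureWindowsPath so it runs on any OS:

```python
from pathlib import PureWindowsPath

plain = 'C:\temp'    # '\t' here becomes a TAB character
raw = r'C:\temp'     # raw string keeps the backslash
forward = 'C:/temp'  # forward slashes need no escaping

print(len(plain), len(raw))  # 6 7
print('\t' in plain)         # True
# PureWindowsPath treats '/' and '\' as the same separator.
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True
```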
Preventing Your Problem
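Your problem most likely occurred when you created your file and accidentally gave it a double extension (for example, personica.txt.txt, which Windows Explorer shows as personica.txt when file extensions are hidden). To prevent this issue using pathlib, use the touch method to make the file: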
some_person = Path.home()/'Documents'/'Project Documentation'/'personica.txt'
some_person.touch()
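If you suspect a double extension, a quick check is to list the files in the folder whose names start with the expected name. Here is a sketch. find_similar_files is a hypothetical helper, and the comparison ignores case because Windows file names do:

```python
from pathlib import Path


def find_similar_files(expected: Path) -> list:
    """List files next to `expected` whose names start with its name, ignoring case.

    This catches e.g. 'personica.txt.txt' when 'personica.txt' was expected.
    """
    folder = expected.parent
    if not folder.is_dir():
        return []
    prefix = expected.name.lower()
    return sorted(p for p in folder.iterdir() if p.name.lower().startswith(prefix))


# Usage:
# target = Path.home() / 'Documents' / 'Project Documentation' / 'personica.txt'
# print(target.exists(), find_similar_files(target))
```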
On Windows, I like to use Python's raw string format for file paths:
path = r'C:/Users/Admin/Documents/Project Documentation/personica.txt'
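Note the r at the beginning of the string: it stops backslashes from being read as escape sequences. The forward slashes matter too, because they avoid the escaping problem entirely and Windows accepts them as separators.
Then I can just use the regular Python open() function:
with open(path) as fobj:
    for line in fobj:
        print(line, end='')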
See the String Literals section in Python's lexical analysis document:
https://docs.python.org/2/reference/lexical_analysis.html#string-literals

handling Unicode filenames in Python 3.4 on Windows

I'm trying to find a reliable way to scan files on Windows in Python, while allowing for the possibility that there may be various Unicode code points in the filenames. I've seen several proposed solutions to this problem, but none of them work for all of the actual issues that I've encountered in scanning filenames created by real-world software and users.
The code sample below is an attempt to extricate and demonstrate the core issue. It creates three files in a subfolder with the sorts of variations I've encountered, and then attempts to scan through that folder and display each filename followed by the file's contents. It will crash on the attempt to read the third test file, with OSError [Errno 22] Invalid argument.
import os
# create files in .\temp that demonstrate various issues encountered in the wild
tempfolder = os.getcwd() + '\\temp'
if not os.path.exists(tempfolder):
    os.makedirs(tempfolder)
print('file contents', file=open('temp/simple.txt','w'))
print('file contents', file=open('temp/with a ® symbol.txt','w'))
print('file contents', file=open('temp/with these chars ΣΑΠΦΩ.txt','w'))
# goal is to scan the files in a manner that allows for printing
# the filename as well as opening/reading the file ...
for root,dirs,files in os.walk(tempfolder.encode('UTF-8')):
    for filename in files:
        fullname = os.path.join(tempfolder.encode('UTF-8'), filename)
        print(fullname)
        print(open(fullname,'r').read())
As it says in the code, I just want to be able to display the filenames and open/read the files. Regarding display of the filename, I don't care whether the Unicode characters are rendered correctly for the special cases. I just want to print the filename in a manner that uniquely identifies which file is being processed, and doesn't throw an error for these unusual sorts of filenames.
If you comment out the final line of code, the approach shown here will display all three filenames with no errors. But it won't open the file with miscellaneous Unicode in the name.
Is there a single approach that will reliably display/open all three of these filename variations in Python? I'm hoping there is, and my limited grasp of Unicode subtleties is preventing me from seeing it.
The following works fine if you save the file in the declared encoding and use an IDE or terminal whose encoding supports the characters being displayed. Note that this does not have to be UTF-8: the declaration at the top of the file is the encoding of the source file only. The key difference from your code is that every path stays a str instead of being encoded to bytes, so Windows uses its Unicode file APIs.
#coding:utf8
import os
# create files in .\temp that demonstrate various issues encountered in the wild
tempfolder = os.path.join(os.getcwd(), 'temp')
os.makedirs(tempfolder, exist_ok=True)
for name in ('simple.txt', 'with a ® symbol.txt', 'with these chars ΣΑΠΦΩ.txt'):
    with open(os.path.join(tempfolder, name), 'w') as f:
        print('file contents', file=f)
# goal is to scan the files in a manner that allows for printing
# the filename as well as opening/reading the file ...
# every path stays a str (never bytes)
for root, dirs, files in os.walk(tempfolder):
    for filename in files:
        fullname = os.path.join(root, filename)  # root, not tempfolder, so subfolders work too
        print(fullname)
        with open(fullname, 'r') as f:
            print(f.read())
Output:
c:\temp\simple.txt
file contents
c:\temp\with a ® symbol.txt
file contents
c:\temp\with these chars ΣΑΠΦΩ.txt
file contents
If you use a terminal that cannot encode the characters used in the filename, you will get a UnicodeEncodeError. Change:
print(fullname)
to:
print(ascii(fullname))
and you will see that the filename was read correctly, but just couldn't print one or more symbols in the terminal encoding:
'C:\\temp\\simple.txt'
file contents
'C:\\temp\\with a \xae symbol.txt'
file contents
'C:\\temp\\with these chars \u03a3\u0391\u03a0\u03a6\u03a9.txt'
file contents
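ascii() escapes every non-ASCII character, even ones your terminal could show. As an alternative, this sketch escapes only the characters that the target encoding cannot represent, using the 'backslashreplace' error handler. printable_name is a hypothetical helper, and it defaults to sys.stdout.encoding:

```python
import sys
from typing import Optional


def printable_name(name: str, encoding: Optional[str] = None) -> str:
    """Escape only the characters of `name` that `encoding` cannot represent."""
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # 'backslashreplace' turns each unencodable character into a \xNN or \uNNNN escape.
    return name.encode(encoding, errors="backslashreplace").decode(encoding)


# In the loop above, use print(printable_name(fullname)) instead of print(ascii(fullname)).
```

With a cp1252 console, for example, the ® stays readable while the Greek letters are escaped.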

python: Mac can read all files in a directory but Windows can't?

This is probably a naive question since I am absolutely a newbie to python...
I was trying to read a bunch of .txt files from a directory using Mac, and it worked perfectly, obtaining all the files without any exceptions.
But then I realized I needed to switch to a Windows computer to do the computing... and it just wouldn't read all the files.
Here is an illustration:
from __future__ import print_function  # must be the first statement in the file
import numpy as np
import glob
import os
# read all .txt files in directory
names = []
for file in os.listdir("Data/text/film/topy/"):
    if file.endswith(".txt"):
        print(file)
        names.append(file)
scripts = [[] for _ in range(len(names))]
for i in range(len(names)):
    scripts[i] = np.genfromtxt("Data/text/film/topy/" + names[i], delimiter="\t", dtype=str, skip_header=1)
names is a list of the .txt file names, and scripts is a list holding each file's contents.
There should be 365 files in there, and with Mac I could read all of them, but with Windows, only 357 files could be read...
The file names look like these:
l_10-Things-I-Hate-About-You.txt
l_12.txt
l_17-Again.txt
l_30-Minutes-or-Less.txt
l_48-Hrs..txt
l_50-50.txt
l_500-Days-of-Summer.txt
l_A-Serious-Man.txt
l_Adaptation.txt
l_Addams-Family,-The.txt
l_Adventures-of-Buckaroo-Banzai-Across-the-Eighth-Dimension,-The.txt
l_After-School-Special.txt
......
Are there certain file names that Windows can't read? Does anyone know what the difference is and why? Super appreciated!!
Windows has restrictions on the characters that can be contained in a file name. From Naming Files, Paths, and Namespaces:
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
Integer value zero, sometimes referred to as the ASCII NUL character.
Characters whose integer representations are in the range from 1 through 31, except for alternate data streams where these characters are allowed. For more information about file streams, see File Streams.
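If any of your file names contain one of these characters (a colon or question mark in a film title, for example), Windows cannot create or open a file with that name, so those files will be missing or unreadable on a Windows system.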
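To find the affected files, you could run a check like this on the Mac before copying. The sketch covers only the reserved and control characters quoted above. It does not check Windows' other naming rules, such as reserved device names. windows_unsafe_names is a hypothetical helper:

```python
import re

# Reserved characters < > : " / \ | ? * plus NUL and codes 1-31.
_WINDOWS_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def windows_unsafe_names(names):
    """Return the names containing a character Windows does not allow in file names."""
    return [name for name in names if _WINDOWS_INVALID.search(name)]


# Usage on the Mac, before copying the folder:
# import os
# print(windows_unsafe_names(os.listdir("Data/text/film/topy/")))
```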

How to access through a path rather than just in current directory?

I have written a function which reads in an Excel file and manipulates it. Obviously the .py file has to be in the same directory as the Excel file. Is there a way to enter the path of the file so I can leave the script in the same place?
import os
os.chdir('/my/new/path')
You can change the current working directory by using os.chdir.
Another way would be to reference the Excel file (however you are opening it) by an absolute path.
"Obviously the .py file has to be in the same directory as the Excel file."
I don't understand "obviously".
"Is there a way to enter the path of the file so I can leave the script in the same place?"
Yes, just type it in.
From the xlrd documentation:
open_workbook(filename=None, etc etc etc)
Open a spreadsheet file for data extraction.
filename
The path to the spreadsheet file to be opened.
Snippet of script (presumes you are not hard-coding paths in your scripts):
import sys
import xlrd
book = xlrd.open_workbook(sys.argv[1])
Running this in a Windows "Command Prompt" window:
python c:\myscripts\demo_script.py d:\datafiles\foo.xls
Same principles apply to Linux, OS X, etc.
Also, this advice is quite independent of what software you are feeding the filename or filepath.
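The same idea works with argparse, swapped in here for sys.argv, which adds a usage message and an early check that the path exists. Here is a sketch. parse_args is a hypothetical helper, and the xlrd call is left as a comment because it needs that package:

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> Path:
    """Return the spreadsheet path given on the command line."""
    parser = argparse.ArgumentParser(description="Process one spreadsheet file.")
    parser.add_argument("spreadsheet", type=Path, help="path to the .xls file")
    args = parser.parse_args(argv)  # argv=None means sys.argv[1:]
    # Fail early with a clear message instead of a traceback from the reader.
    if not args.spreadsheet.is_file():
        parser.error(f"no such file: {args.spreadsheet}")  # exits with status 2
    return args.spreadsheet


# Usage inside the script, e.g. python demo_script.py "d:\data files\foo.xls":
# path = parse_args()
# book = xlrd.open_workbook(str(path))  # requires the xlrd package
```

Quoting the argument on the command line is what lets paths with spaces through as a single argument.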
About hard-coding file paths in Python scripts on Windows:
In ascending order of preferability:
Use backslashes: "c:\testdata\new.xls" ... the \t will be interpreted as a TAB character. The \n will be interpreted as a newline. Fail.
Escape your backslashes: "c:\\testdata\\new.xls" ... Yuk.
Use a raw string: r"c:\testdata\new.xls"
Use forward slashes: "c:/testdata/new.xls" ... yes, it works, when fed to open().
Don't do it ... see script example above.
