I want to import multiple CSV files at once into QGIS. The files have Lat/Long data, and I want the points to be projected from those coordinates. Basically, I want the same result from importing the CSV files as I would get by using Data Source Manager > Delimited Text with Point Coordinates selected and the x-field and y-field set to Long and Lat respectively.
I keep coming across the same Python code on numerous forums. While I can get the files to import as tables, I cannot get them to load with geometry (a next-stage problem will be getting the timestamp to load as a date instead of a string; I may have to refactor all the files).
Here's the code available on forums which results in loading broken links (my files have column headers "Lat" and "Long"):
import glob, os
# Define path to directory of your csv files
path_to_csv = "C:/File Path/"
# Set current directory to path of csv files
os.chdir(path_to_csv)
# Find each .csv file and load them as vector layers
for fname in glob.glob("*.csv"):
    uri = "file:///" + path_to_csv + fname + "encoding=%s&delimiter=%s&xField=%s&yField=%s&crs=%s" % ("UTF-8", ",", "Long", "Lat", "epsg:4326")
    name = fname.replace('.csv', '')
    lyr = QgsVectorLayer(uri, name, 'delimitedtext')
    QgsProject.instance().addMapLayer(lyr)
This code will load layers, but with a warning triangle for "Unavailable Layer". Clicking on the triangle opens the "Repair Data Source" window. I can manually select the file and repair the link. But then it is nothing more than a table with all fields as strings.
If I run the code like this I get the files to import, but only as tables and without geometry:
import glob, os
# Define path to directory of your csv files
path_to_csv = "C:/Users/DanielStevens/Documents/Afghanistan Monitoring/Phase 2/Border Crossing/Crossing Polygons/Pakistan/"
# Set current directory to path of csv files
os.chdir(path_to_csv)
# Find each .csv file and load them as vector layers
for fname in glob.glob("*.csv"):
    uri = "file:///" + path_to_csv + fname
    "encoding=%s&delimiter=%s&xField=%s&yField=%s&crs=%s" % ("UTF-8", ",", "Long", "Lat", "epsg:4326")
    name = fname.replace('.csv', '')
    lyr = QgsVectorLayer(uri, name, 'delimitedtext')
    QgsProject.instance().addMapLayer(lyr)
How do I batch import the CSV files with geometry (Lat/Long projected as points)?
I modified what you had to the line below and it worked perfectly. I removed the encoding because my data wasn't UTF-8. Not sure if that's what did it.
uri = "file:///" + path_to_csv + fname + "?delimiter=%s&crs=epsg:3857&xField=%s&yField=%s" % (",", "lon", "lat")
In case it helps with part of the issue, using a .csvt file when importing a CSV forces the data types (a pain if you have a number of files, especially if the file names change, e.g. when a new batch needs to be processed). I was thinking about writing some Python that would read each CSV, create a .csvt with the same filename, and populate it with the right number of column definitions (see the sketch below). In the end, as I only have 30 files, it was quicker to use Notepad to make the .csvt and then rename it accordingly. I have also found that converting date/time fields to Oracle date/time format is handled more consistently in QGIS. Hope that helps.
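If you ever do want to automate the .csvt step, a rough sketch along those lines could look like the following. It reads only the header of each CSV and writes a matching one-line .csvt next to it; the guess of "Real" for the Lat/Long columns and "String" for everything else is an assumption, as is the folder path, so adjust both to your data.
import csv, glob, os
path_to_csv = "C:/File Path/"
for fname in glob.glob(os.path.join(path_to_csv, "*.csv")):
    # read only the header row to get the column names
    with open(fname, newline='') as f:
        header = next(csv.reader(f))
    # one type definition per column, in CSVT format
    types = ['"Real"' if col in ("Lat", "Long") else '"String"' for col in header]
    with open(os.path.splitext(fname)[0] + ".csvt", "w") as f:
        f.write(",".join(types))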
I have a folder containing thousands of images, and each image needs a unique list of keywords added to it. I also have a table with fields showing the file path and the associated list of desired keywords for each image. For example, one record might need the tags "ORASH (a survey site code), Crew 1, Transect A Upstream, Site Layout", while the next record might need the tags "ORWLW, Crew 2, Amphibian, Pacific Giant Salamander".
How do I iterate over each image to add the IPTC keywords to them? I'm using python 3 and the iptcinfo3 module but am willing to try other modules that may work.
Here's where I'm at now:
import os
import pandas as pd
from iptcinfo3 import IPTCInfo
srcdir = r'E:\photos'
files = os.listdir(srcdir)
# Create a dataframe from the table containing filepaths and associated keywords.
df = pd.read_excel(r'E:\photo_info.xlsx')
# Create a dictionary with the filename as the key and the tags as the value.
references = dict(df.set_index('basename')['tags'])
for file in files:
    # Get the full filepath for each image.
    filepath = os.path.join(srcdir, file)
    # Create an object for a file that may not have IPTC data (ignore the 'Marker scan...' notification).
    info = IPTCInfo(filepath, force=True)
At this point, I imagined I'd use info['keywords'] = ... in conjunction with the 'references' dictionary to plug the keywords into the correct files. Then info.save_as(filepath). I'm just not experienced enough to know how to make this work or even if it's a reasonable way of doing it. Any help would be appreciated!
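For what it's worth, a minimal sketch of that idea might look like the following. It assumes the 'basename' column in the spreadsheet matches the file names returned by os.listdir and that 'tags' holds a comma-separated string; both are assumptions about your table, so adjust the lookup to match it.
for file in files:
    filepath = os.path.join(srcdir, file)
    info = IPTCInfo(filepath, force=True)
    if file in references:
        # iptcinfo3 stores keywords as a list, so split the comma-separated string
        info['keywords'] = [kw.strip() for kw in str(references[file]).split(',')]
        info.save_as(filepath)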
I saved the table with the filenames and keywords as a .csv file where the fields and records looked like this (though the text in the 'Subject' field didn't include the quotes):
SourceFile, Artist, Subject
E:\photos\0048.JPG, MARY GRAY, "YEAR2022, REQUIRED, GPS UNIT WITH TIME"
Because I use Jupyter Lab for other portions of this workflow, I ran this code there:
import os
os.system('cmd d: & exiftool -overwrite_original -sep ", " -csv="E:\photos\metadata.csv" E:\photos')
This opens the Windows command prompt, changes the path to the D: drive (where the exiftool.exe file was saved), invokes exiftool, sets it to overwrite the original image file rather than create a copy, defines the keyword separator in the .csv file, reads the .csv file that has the list of filenames and associated keywords, then runs it on the E:\photos directory.
Worked like a charm on about 4,000 photos!
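As a side note, if the os.system call ever gives you quoting trouble, the same exiftool invocation can be run through subprocess instead. This is just a sketch; the location D:\exiftool.exe is an assumption based on the description above, so point it at wherever your exiftool.exe actually lives.
import subprocess
subprocess.run([
    r'D:\exiftool.exe',        # assumed location of exiftool.exe
    '-overwrite_original',
    '-sep', ', ',
    r'-csv=E:\photos\metadata.csv',
    r'E:\photos',
])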
I am trying to load a dataset for my machine learning project, and it requires me to load files that have no extensions.
I tried:
import os
import glob
files = filter(os.path.isfile, glob.glob("./[0-9]*"))
for name in files:
    with open(name) as fh:
        contents = fh.read()
But it doesn't return anything; the glob call itself comes back empty.
Also tried:
import os
import glob
path = './dataset1/training_validation/2012-07-10/'
for infile in glob.glob(os.path.join(path, '*')):
    print("test")
    file = open(infile, 'r')
    print(file)
but this returns [] because the glob call finds nothing.
I'm stuck here and couldn't find anything on the internet.
My actual problem is loading these extension-less files into training and testing sets from two folders, validation and test. I can iterate through the folders but don't know how to handle those files.
When I open those files in a text editor, it shows me something like this.
So I know it's a binary representation of an image, but I have no idea how to store and train on them.
Any help would be appreciated. Thanks.
Two things:
File extensions (.txt, .dat, .bat, .f90, etc.) are not meaningful to Python, at least when using glob or numpy or anything of the sort, because the extension is just part of a string. Some of us were raised (within Windows) to believe that file extensions mean something (I too fell for it).
The file you are looking at is a text file containing the ASCII representation of a binary image as 0's and 1's. So it's not a binary file, and it's not an image file per se, but it is a text file, which means we can read it as such from Python.
To read this in, you could do either:
1. Use numpy: data = numpy.loadtxt(<filename>); however, you might have trouble delimiting the digits.
2. Use Python's standard open function on the file and loop through each line using for line in <file_handle>:. This way each row of data is a string, which can be parsed easily (see the documentation on string indexing). A short sketch of this follows.
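For instance, a minimal sketch of option 2, assuming each line of the file is an unbroken run of 0 and 1 characters (an assumption about your data):
# read one "image" file into a list of rows of integers
rows = []
with open("0001") as fh:        # "0001" is a hypothetical extension-less file name
    for line in fh:
        digits = line.strip()
        if digits:              # skip blank lines
            rows.append([int(ch) for ch in digits])
print(len(rows), "rows,", len(rows[0]), "columns")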
Good luck!
IMO this simply means that your path does not exist.
As a first test, try an absolute path to your folder; you may have confused the folder's position relative to your current working directory.
I got it to work with the following code.
import random
from os import listdir
from os.path import isfile, join

fileNames = [f for f in listdir(dirName) if isfile(join(dirName, f))]
random.shuffle(fileNames)
for files in fileNames:
    data = open(join(dirName, files), 'r')
Thanks for your responses.
So I have a question. I'm reading a FITS file and then using information from its header to define other files related to the original FITS file (blaze_file, bis_file, ccf_table). For some of the FITS files, those other files are not available, and because of that my code gives the obvious error "No such file or directory".
import pandas as pd
import sys, os
import numpy as np
from glob import glob
from astropy.io import fits
PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        e2ds_hdu = fits.open(filename)
        e2ds_header = e2ds_hdu[0].header
        date = e2ds_header['DATE-OBS']
        date2 = date = date[0:19]
        blaze_file = e2ds_header['HIERARCH ESO DRS BLAZE FILE']
        bis_file = glob('HARPS.' + date2 + '*_bis_G2_A.fits')
        ccf_table = glob('HARPS.' + date2 + '*_ccf_G2_A.tbl')
        if not all(file in os.listdir(PATH) for file in [blaze_file, bis_file, ccf_table]):
            continue
So what I want to do is make my code run only if all the files are available, and otherwise skip that FITS file. The problem is that I'm defining the other file names as variables inside the for loop, since I'm using the header information. So how can I define them before the for loop, and then use something like
So can anyone help me out of this?
The filenames returned by os.listdir() are always relative to the path given there.
In order to be used, they have to be joined with this path.
Example:
PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        filepath = os.path.join(PATH, filename)
        e2ds_hdu = fits.open(filepath)
        …
Let the filenames be ['a', 'b', 'a_ed2ds_A.fits', 'b_ed2ds_A.fits']. The code now excludes the first two names and then prepends the file path to the remaining two.
a_ed2ds_A.fits becomes /home/Desktop/2d_spectra/a_ed2ds_A.fits and
b_ed2ds_A.fits becomes /home/Desktop/2d_spectra/b_ed2ds_A.fits.
Now they can be accessed from everywhere, not just from the given file path.
I should become accustomed to reading a question in full before trying to answer it.
The problem I mentioned only matters if you start the script from a directory other than the one containing the files. Nevertheless, applying the fix will make your code much more consistent.
Your real problem, however, lies somewhere else: you examine a file and then, after checking its contents, want to read files whose names depend on information from that first file.
There are several ways to accomplish your goal:
Just extend your loop with the proper tests.
Pseudo code:
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if all files exist:
            proceed
or
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if not all files exist:
            continue  # actual keyword, no pseudo code!
        proceed
Put some functionality into functions (variation of 1.)
Create a loop in a generator function which yields the "interesting information" of one fits file (or alternatively nothing) and have another loop run over them to actually work with the data (a rough sketch of this follows).
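A rough sketch of that generator idea, reusing the names from the question (this is an illustration under the assumption that an os.path.exists check on the blaze file and non-empty glob results are enough to decide a file is usable):
import os
from glob import glob
from astropy.io import fits

def usable_fits(path):
    """Yield header info only for fits files whose companion files all exist."""
    for filename in os.listdir(path):
        if not filename.endswith("_e2ds_A.fits"):
            continue
        header = fits.open(os.path.join(path, filename))[0].header
        date2 = header['DATE-OBS'][0:19]
        blaze_file = header['HIERARCH ESO DRS BLAZE FILE']
        bis_files = glob(os.path.join(path, 'HARPS.' + date2 + '*_bis_G2_A.fits'))
        ccf_tables = glob(os.path.join(path, 'HARPS.' + date2 + '*_ccf_G2_A.tbl'))
        if os.path.exists(os.path.join(path, blaze_file)) and bis_files and ccf_tables:
            yield filename, header, blaze_file, bis_files[0], ccf_tables[0]

# the second loop only ever sees files whose companions all exist
for filename, header, blaze, bis, ccf in usable_fits(PATH):
    print(filename, blaze, bis, ccf)   # actual processing would go here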
If I am still missing some points or am not detailed enough, please let me know.
Since you have to read the fits file to know the names of the other dependent files, there's no way you can avoid reading the fits file first. The only thing you can do is test for the dependent files' existence before trying to read them, and skip the rest of the loop (using continue) if they are missing.
Edit this line
e2ds_hdu = fits.open(filename)
And replace with
e2ds_hdu = fits.open(os.path.join(PATH, filename))
I have a folder with .exp files. They're basically .csv files but with a .exp extension (just the format of files exported from the instrument). I know because changing .exp to .csv still allows to open them in Excel as csv files. Example here: https://uowmailedu-my.sharepoint.com/personal/tonyd_uow_edu_au/Documents/LAB/MC-ICPMS%20solution/Dump%20data%20here?csf=1
In Python, I want to read the data from each file into a data frame (one for each file). I've tried the following code, but it only builds the list dfs with all the files, and:
(i) I don't know how to access the content of list dfs and turn it into several data frames
(ii) it looks like the columns in the original .exp files were lost.
import os
# change directory
os.chdir('..\LAB\MC-ICPMS solution\Dump data here')
path = os.getcwd()
import glob
import pandas as pd
# get data file names
filenames = glob.glob(path + "/*.csv")
dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))
Do you guys have any ideas how I could read these files into data frames so I can easily access the content?
I found this post: Storing csv file's contents into data Frames [Python Pandas], but it is not too helpful in my case.
Thanks.
I would recommend you switch to using an absolute path to your folder. Also it is safer to use os.path.join() when combining file parts (better than string concatenation).
To make things easier to understand, I suggest rather than just creating a list of dataframes, that you create a list of tuples containing the filename and the dataframe, that way you will know which is which.
In your code, you are currently searching for csv files not exp files.
The following creates the list of dataframes; each entry also stores the corresponding filename. At the end it cycles through all of the entries and displays the data.
Lastly, it shows you how you would for example display just the first entry.
import pandas as pd
import glob
import os
# change directory
os.chdir('..\LAB\MC-ICPMS solution\Dump data here')
path = os.getcwd()
# get data file names
dfs = []
for filename in glob.glob(os.path.join(path, "*.exp")):
    dfs.append((filename, pd.read_csv(filename)))
print "Found {} exp files".format(len(dfs))
# display each of your dataframes
for filename, df in dfs:
print filename
print df
# To display just the first entry:
print "Filename:", df[0][0]
print df[0][1]
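To address point (i) from the question, it can also be convenient to keep the frames in a dictionary keyed by file name, so each one can be pulled out directly; a small variation on the code above (the file name in the last line is hypothetical):
# build a dict instead of a list of tuples
dfs_by_name = {os.path.basename(filename): pd.read_csv(filename)
               for filename in glob.glob(os.path.join(path, "*.exp"))}
# access the dataframe for one particular file
print(dfs_by_name["some_run.exp"])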
Up until now, I have a structure like this on the top of all of my files (I process raw data and do analysis with pandas so I am working with a lot of raw data):
raw_location = 'C:/Users/OneDrive/raw/'
output_location = 'C:/Users/OneDrive/output/'
mtd_location = 'C:/Users/OneDrive/modified/'
py_location = 'C:/Users/OneDrive/py_files/'
There are a bunch of different paths and some .py files use the same path name to refer to a different path (for example, raw_location is the source of the data which is different for different files). It has become a mess.
Under the locations, I have a list of file names (import_filename, modified_filename, dashboard_filename). All told, I am wasting like 10+ lines of code on each file just to specify variable names. I know there must be a better way to do this.
So far I moved my .py and .ipynb files into folders within the main directory which means I can use relative paths like '../raw' which has helped. Can I create a file which has all of the paths and file name variables within it and then read that instead of listing the paths at the top of my code? What is the best practice here?
Edit: After reviewing the comments below and digging into this issue more deeply, I've added two additional options:
1) Use Python's configparser - https://docs.python.org/2/library/configparser.html (a short sketch follows the example link below)
Examples:
https://stackoverflow.com/a/29479549/5088142
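A minimal sketch of the configparser route; the paths.ini file name and the section name are made up for illustration:
# paths.ini
# [locations]
# raw_location = C:/Users/OneDrive/raw/
# output_location = C:/Users/OneDrive/output/

import configparser

config = configparser.ConfigParser()
config.read('paths.ini')
raw_location = config['locations']['raw_location']
output_location = config['locations']['output_location']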
2) As BlackJack mentioned, one can remove the "class" from the imported file:
You can write a config file, e.g. named LDconfig.py:
raw_location = 'C:/Users/OneDrive/raw/'
output_location = 'C:/Users/OneDrive/output/'
mtd_location = 'C:/Users/OneDrive/modified/'
py_location = 'C:/Users/OneDrive/py_files/'
In your files, you will import this module from the LDconfig.py file using:
import LDconfig
In your files you can access the data using modulename.variable, e.g.
LDconfig.raw_location
3) You can write a config file, e.g. named LDconfig.py, containing a class:
class LDconfig:
    raw_location = 'C:/Users/OneDrive/raw/'
    output_location = 'C:/Users/OneDrive/output/'
    mtd_location = 'C:/Users/OneDrive/modified/'
    py_location = 'C:/Users/OneDrive/py_files/'
In your files, you will import this class from the LDconfig.py file using:
from LDconfig import LDconfig
In your files you can access the data using: classname.variable, e.g.
LDconfig.raw_location
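Either way, the constants can then be combined with file names where they are used, for example (the file name below is hypothetical):
import os
from LDconfig import LDconfig
import_path = os.path.join(LDconfig.raw_location, 'import_file.csv')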