Creating a file to lookup paths and file names? - python

Up until now, I have a structure like this on the top of all of my files (I process raw data and do analysis with pandas so I am working with a lot of raw data):
raw_location = 'C:/Users/OneDrive/raw/'
output_location = 'C:/Users/OneDrive/output/'
mtd_location = 'C:/Users/OneDrive/modified/'
py_location = 'C:/Users/OneDrive/py_files/'
There are a bunch of different paths and some .py files use the same path name to refer to a different path (for example, raw_location is the source of the data which is different for different files). It has become a mess.
Under the locations, I have a list of file names (import_filename, modified_filename, dashboard_filename). All told, I am wasting like 10+ lines of code on each file just to specify variable names. I know there must be a better way to do this.
So far I moved my .py and .ipynb files into folders within the main directory which means I can use relative paths like '../raw' which has helped. Can I create a file which has all of the paths and file name variables within it and then read that instead of listing the paths at the top of my code? What is the best practice here?

Edit: After reviewing the comments below and learning this issue deeper - I've added two additional options:
1) Use python "configparser" - https://docs.python.org/2/library/configparser.html
Examples:
https://stackoverflow.com/a/29479549/5088142
2) As BlackJack mentioned - one can remove the "class" from the imported file
You can write config file, e.g. named: LDconfig.py
raw_location = 'C:/Users/OneDrive/raw/'
output_location = 'C:/Users/OneDrive/output/'
mtd_location = 'C:/Users/OneDrive/modified/'
py_location = 'C:/Users/OneDrive/py_files/'
in your files, you will import this class from this LDconfig.py file using:
import LDconfig
In your files you can access the data using: importedmodule.variable, e.g.
LDconfig.raw_location
3) You can write config file, e.g. named: LDconfig.py with class
class LDconfig:
raw_location = 'C:/Users/OneDrive/raw/'
output_location = 'C:/Users/OneDrive/output/'
mtd_location = 'C:/Users/OneDrive/modified/'
py_location = 'C:/Users/OneDrive/py_files/'
in your files, you will import this class from this LDconfig.py file using:
from LDconfig import LDconfig
In your files you can access the data using: classname.variable, e.g.
LDconfig.raw_location

Related

How to Import multiple csv files into QGIS 3.22.2?

I want to import multiple csv files at once into QGIS. The files have Lat/Long data. I want the files to project the points. Basically I want the same results from importing the csv files as I would if I used Data Source Manager-Delimited Text with Point Coordinates selected and the x-field and y-field set to Long/Lat respectively.
I keep coming across the same python code on numerous forums. While I can get the files to import as tables, I can not get them to load with geometry (a next stage problem will also be getting the timestamp to load as date instead of a string, I may have to refactor all the files).
Here's the code available on forums which results in loading broken links (my files have column headers "Lat" and "Long"):
import glob, os
# Define path to directory of your csv files
path_to_csv = "C:/File Path/"
# Set current directory to path of csv files
os.chdir(path_to_csv)
# Find each .csv file and load them as vector layers
for fname in glob.glob("*.csv"):
uri ="file:///"+path_to_csv + fname+"encoding=%s&delimiter=%s&xField=%s&yField=%s&crs=%s" % ("UTF-8",",", "Long", "Lat","epsg:4326")
name=fname.replace('.csv', '')
lyr = QgsVectorLayer(uri, name, 'delimitedtext')
QgsProject.instance().addMapLayer(lyr)
This code will load layers, but with a warning triangle for "Unavailable Layer". Clicking on the triangle opens the "Repair Data Source" window. I can manually select the file and repair the link. But then it is nothing more than a table with all fields as strings.
If I run the code like this I get the files to import, but only as tables and without geometry:
import glob, os
# Define path to directory of your csv files
path_to_csv = "C:/Users/DanielStevens/Documents/Afghanistan Monitoring/Phase 2/Border Crossing/Crossing Polygons/Pakistan/"
# Set current directory to path of csv files
os.chdir(path_to_csv)
# Find each .csv file and load them as vector layers
for fname in glob.glob("*.csv"):
uri ="file:///"+path_to_csv + fname
"encoding=%s&delimiter=%s&xField=%s&yField=%s&crs=%s" % ("UTF-8",",", "Long",
"Lat","epsg:4326")
name=fname.replace('.csv', '')
lyr = QgsVectorLayer(uri, name, 'delimitedtext')
QgsProject.instance().addMapLayer(lyr)
How do I get the CSV files to batch import with geometry (Lat Long projecting points)?
I modified what you had to the line below and it worked perfectly. I removed the encoding because my data wasn't UTF-8. Not sure if that's what did it.
uri = "file:///" + path_to_csv + fname + "?delimiter=%s&crs=epsg:3857&xField=%s&yField=%s" % (",", "lon", "lat")
In case it help with part of the issue, using a csvt file when importing csv helps force the data types (a pain if you have a number of files especially if the file names change e.g. When a new batch needs to be processed). I was thinking about writing some python that would read the csv, create a csvt with the same filename and populate the file with the right number of column definitions. In the end, as I only have 30 files, it was quicker to use notepad to make the csvt and then rename it accordingly. I have also found that converting date time fields to Oracle date time is handled more consistently in Qgis. Hope that helps.

Is it possible to read a csv file that is located in the python project?

I have a python project with a handful of python scripts, a domain and repository folder with additional python scripts in each etc.... I'd like to create a config folder within this python project that will contain a csv file for configurations that my python project will use.
I then would like to read this csv file into a dataframe within my python code. Is this possible?
Any search I've done only details reading csv files from an actual file share location like from a C drive or desktop location.
df_config = pd.read_csv('Python Config Folder/File_Config.csv', delimiter = ",")
You can use the current directory for this.
Example:
import pathlib
actual_dir = pathlib.Path().absolute()
df_config = pd.read_csv(f'{actual_dir}/File_Config.csv', delimiter = ",")
If you want get the previus directory you can use ".."
Exemple:
f'{actual_dir}/../File_Config.csv'

Extracting file directory from a .txt file?

I have a text file known as testConfigFile which is as follow :
inputCsvFile = BIN+"/testing.csv"
description = "testing"
In which BIN is my parent directory of the folder (already declared using os.getcwd in my python script).
The problem I'm facing now is, how to read and extract the BIN+"testing.csv" from the testConfigFile.txt.
Since the name testing.csv might be changed to other names, so it will be a variable. I'm planning to do something like, first the script reads the keyword "inputCsvFile = " then it will automatically extract the words behind it, which is "BIN+"testing.csv".
f = open("testConfigFile","r")
line = f.readlines(f)
if line.startswith("inputCsvFile = ")
inputfile = ...
This is my failed partial code, where I've no idea on how to fix it. Is there anyone willing to help me?
Reading a config off a unstructured txt file is not the best idea. Python actually is able to parse config files that are structured in a certain way. I have restructured your txt file so that it is easier to work with. The config file extension does not really matter, I have changed it to .ini in this case.
app.ini:
[csvfilepath]
inputCsvFile = BIN+"/testing.csv"
description = "testing"
Code:
from configparser import ConfigParser # Available by default, no install needed.
config = ConfigParser() # Create a ConfigParser instance.
config.read('app.ini') # You can input the full path to the config file.
file_path = config.get('csvfilepath', 'inputCsvFile')
file_description = config.get('csvfilepath', 'description')
print(f"CSV File Path: {file_path}\nCSV File Description: {file_description}")
Output:
CSV File Path: BIN+"/testing.csv"
CSV File Description: "testing"
To read more about configparser, you may refer here.
For a simple tutorial on configparser, you may refer here.

HTCondor output files: obtain created directory

I am using HTcondor to generate some data (txt, png). By running my program, it creates a directory next to the .sub file, named datasets, where the datasets are stored into. Unfortunately, condor does not give me back this created data when finished. In other words, my goal is to get the created data in a "Datasets" subfolder next to the .sub file.
I tried:
1) to not put the data under the datasets subfolder, and I obtained them as thought. Howerver, this is not a smooth solution, since I generate like 100 files which are now mixed up with the .sub file and all the other.
2) Also I tried to set this up in the sub file, leading to this:
notification = Always
should_transfer_files = YES
RunAsOwner = True
When_To_Transfer_Output = ON_EXIT_OR_EVICT
getenv = True
transfer_input_files = main.py
transfer_output_files = Datasets
universe = vanilla
log = log/test-$(Cluster).log
error = log/test-$(Cluster)-$(Process).err
output = log/test-$(Cluster)-$(Process).log
executable = Simulation.bat
queue
This time I get the error, that Datasets was not found. Spelling was checked already.
3) Another option would be, to pack everything in a zip, but since I have to run hundreds of jobs, I do not want to unpack all this files afterwards.
I hope somebody comes up with a good idea on how to solve this.
Just for the record here: HTCondor does not transfer created directories at the end of the run or its contents. The best way to get the content back is to write a wrapper script that will run your executable and then compress the created directory at the root of the working directory. This file will be transferred with all other files. For example, create run.exe:
./Simulation.bat
tar zcf Datasets.tar.gz Datasets
and in your condor submission script put:
executable = run.exe
However, if you do not want to do this and if HTCondor is using a common shared space like an AFS you can simply copy the whole directory out:
./Simulation.bat
cp -r Datasets <AFS location>
The other alternative is to define an initialdir as described at the end of: https://research.cs.wisc.edu/htcondor/manual/quickstart.html
But one must create the directory structure by hand.
also, look around pg. 65 of: https://indico.cern.ch/event/611296/contributions/2604376/attachments/1471164/2276521/TannenbaumT_UserTutorial.pdf
This document is, in general, a very useful one for beginners.

Renaming and Saving Excel Files With Python

I've got a pretty simple task but I haven't done too many functions with excel within python and I'm not sure how to go about doing this.
What I need to do:
Look at many excel files within subfolders, rename them according to information within the file and store them in all in one folder somewhere else.
The data is structured like this:
Main Folder
Subfolder1
File1
File2
File3
...
For about a hundred subfolders and several files within each subfolder.
From here, I want to pull the company name, part number, and date from within the file and use those to rename the excel file. Not sure how to rename the file.
Then save it somewhere else. I'm having trouble finding all these functions, any advice?
Check the os and os.path module for listing folder contents (walk, listdir) and working with path names (abspath, basename etc.)
Also, shutil has some interesting functions for copying stuff. Check out copyfile and specify the dst parameter based on the data you read from the excel file.
This page can help you getting at the Excel data: http://www.python-excel.org/
You probably want to have some highlevel code like this:
for subfolder_name in os.listdir(MAIN_FOLDER):
# exercise left to reader: filter out non-folders
subfolder_path = os.path.join(MAIN_FOLDER, subfolder_name)
for excel_file_name in os.listdir(os.path.join(MAIN_FOLDER, subfolder_name)):
# exercise left to reader: filter out non-excel-files
excel_file_path = os.path.join(subfolder_path, excel_file_name)
new_excel_file_name = extract_filename_from_excel_file(excel_file_path)
new_excel_file_path = os.path.join(NEW_MAIN_FOLDER, subfolder_name,
new_excel_file_name)
shutil.copyfile(excel_file_path, new_excel_file_path)
You'll have to provide extract_filename_from_excel_file yourself using the xlrd module from the site I mentioned.

Categories