Save the results of a modeling program in different .txt files - Python

Hello, I am new to Python. I have been working with a library designed by a professor at my university.
I need help saving the results in different .txt files and in different folders.
The model consists of a mesh of 2500 cells with x, y, z coordinates. At every step of the iteration I want to get a .txt file that contains the cell index, the coordinates of each cell, and the different attributes of that cell. I wrote the code below, but I obtained only a single .txt file containing the last cell index and the position 0.0.0..., and I don't know how to save it in a folder...
if os.path.exists('Model'):
    import shutil
    shutil.rmtree('Model')
os.mkdir('Model')

def save_m1(path, fluid_index):
    assert isinstance(path, str)
    with open(path, 'w') as file:
        for cell in model.cells:
            x, y, z = cell.pos
            file.write(f'{cell.pos} {model.cell_number} {cell.get_fluid(fluid_index).mass}\n')

if step % 100 == 0:
    model.print_cells("Model" % step)

dt = 10.0
for step in range(10):
    model.iterate(dt=dt)
    thermal.iterate(dt=dt)
    thermal.exchange_heat(model, dt=dt, fid=0, na_g=na_g, fa_T=ID_T, fa_c=ID_K)
    thermal.exchange_heat(model, dt=dt, fid=1, na_g=na_g, fa_T=ID_T, fa_c=ID_K)
    thermal.exchange_heat(model, dt=dt, fid=2, na_g=na_g, fa_T=ID_T, fa_c=ID_K)
    print(f'step = {step}, m1_produced = {get_mass1_produced(1) - InitialMass1}')  # kg
    save_m1('xxx.txt', 1)
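A minimal sketch of one way to get one file per step, each in its own subfolder, assuming the model object, iteration calls, and cell attributes from the question behave as shown above (the folder and file names here are illustrative). The two key points: enumerate the cells so an index is written for every cell, and build a fresh path for every step instead of reusing 'xxx.txt':

import os

def save_m1(path, fluid_index):
    with open(path, 'w') as file:
        # enumerate gives each cell its index; one line is written per cell
        for index, cell in enumerate(model.cells):
            x, y, z = cell.pos
            file.write(f'{index} {x} {y} {z} {cell.get_fluid(fluid_index).mass}\n')

dt = 10.0
for step in range(10):
    model.iterate(dt=dt)
    # ... thermal.iterate and thermal.exchange_heat calls as above ...
    # a distinct folder and file for every step
    step_dir = os.path.join('Model', f'step_{step}')
    os.makedirs(step_dir, exist_ok=True)
    save_m1(os.path.join(step_dir, 'cells.txt'), 1)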

Related

How to speed up several nested loops

I have nested for loops which are causing the execution of my operation to be incredibly slow. I wanted to know if there is another way to do this.
The operation basically goes through files in seven different directories and checks whether a file with the same name exists in each directory before opening each file and displaying them.
My code is:
original_images = os.listdir(original_folder)
ground_truth_images = os.listdir(ground_truth_folder)
randomforest_images = os.listdir(randomforest)
ilastik_images = os.listdir(ilastik)
kmeans_images = os.listdir(kmeans)
logreg_multi_images = os.listdir(logreg_multi)
random_forest_multi_images = os.listdir(randomforest_multi)

for x in original_images:
    for y in ground_truth_images:
        for z in randomforest_images:
            for i in ilastik_images:
                for j in kmeans_images:
                    for t in logreg_multi_images:
                        for w in random_forest_multi_images:
                            if x == y == z == i == j == w == t:
                                *** rest of code operation ***
If the condition is that the same file must be present in all seven directories before the rest of the code runs, then it's not necessary to search for the same file in all directories. As soon as the file is missing from one of the directories, you can forget about it and move on to the next file. So you can build a for loop over the files in the first directory and then build a chain of nested if statements: if the file exists in the next directory, you move forward to the directory after that and search there; if it doesn't, you move back to the first directory and pick its next file. A sketch of this early-exit idea follows.
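A minimal sketch under the listdir assignments from the question, written with continue instead of literal nesting (the two are equivalent, and converting each listing to a set first would make the membership tests faster):

for x in original_images:
    # skip x as soon as any directory is missing it
    if x not in ground_truth_images:
        continue
    if x not in randomforest_images:
        continue
    if x not in ilastik_images:
        continue
    if x not in kmeans_images:
        continue
    if x not in logreg_multi_images:
        continue
    if x not in random_forest_multi_images:
        continue
    # x is present in all seven directories
    print(x)  # stand-in for the rest of the code operation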
Convert all of them to sets and iterate through the last one, checking membership for all of the others:
original_images = os.listdir(original_folder)
ground_truth_images = os.listdir(ground_truth_folder)
randomforest_images = os.listdir(randomforest)
ilastik_images = os.listdir(ilastik)
kmeans_images = os.listdir(kmeans)
logreg_multi_images = os.listdir(logreg_multi)

# intersect the folder listings so only names present in every folder remain
files = set(original_images)
for folder in [ground_truth_images, randomforest_images, ilastik_images, kmeans_images, logreg_multi_images]:
    files.intersection_update(folder)

random_forest_multi_images = set(os.listdir(randomforest_multi))
# find all common items between the sets
for file in random_forest_multi_images.intersection(files):
    # rest of code
The reason this works is that you are only interested in the intersection of all the sets, so you only need to iterate over one set and check for membership in the rest.
You should check x == y before going into the next loop, then y == z, and so on. Right now you are going through each loop far too often.
There is also another approach:
You can create a set of all your images and take the intersection over each set, so the only elements that remain are the ones present everywhere. If you are sure that the files are the same, you can skip that step.
If x is in all the other lists, you can create your paths on the fly:
import os
import pathlib

original_images = os.listdir(original_folder)
ground_truth_images = pathlib.Path(ground_truth_folder)  # this is a folder
randomforest_images = pathlib.Path(randomforest)

for x in original_images:
    y = ground_truth_images / x
    i = randomforest_images / x
    # And so on for all your files
    # Check that all the files exist; if one is missing, go to the next x
    if not all(path.exists() for path in [y, i]):
        continue
    # REST OF YOUR CODE USING x, y, i, ...
    # y, i, ... are now pathlib objects; you can get a string of a path using str(y), str(i), etc.

Apply the same code to multiple files in the same directory

I have code that already works, but I need to use it to analyse many files in the same folder. How can I rewrite it to do this? All the files have similar names (e.g. "pos001", "pos002", "pos003").
This is the code at the moment:
# assuming scikit-image and Pillow, which provide label/regionprops and Image/ImageDraw/ImageOps used below
import matplotlib.image as mpimg
import numpy as np
from skimage.measure import label, regionprops
from PIL import Image, ImageDraw, ImageOps

pos001 = mpimg.imread('pos001.tif')
coord_pos001 = np.genfromtxt('treat_pos001_fluo__spots.csv', delimiter=",")
Here I label the tif file "pos001" to differentiate separate objects in the same image:
label_im = label(pos001)
regions = regionprops(label_im)
Here I select only the object of interest by setting its pixel values == 1 and all the others == 0 (I'm interested in many objects, I show only one here):
cell1 = np.where(label_im != 1, 0, label_im)
Here I convert the x,y coordinates of the spots in the csv file to a 512x512 image where each spot has value 1:
x = coord_pos001[:,2]
y = coord_pos001[:,1]
coords = np.column_stack((x, y))
img = Image.new("RGB", (512, 512), "white")
draw = ImageDraw.Draw(img)
dotSize = 1
for (x, y) in coords:
    draw.rectangle([x, y, x + dotSize - 1, y + dotSize - 1], fill="black")
im_invert = ImageOps.invert(img)
bin_img = im_invert.convert('1')
Here I set the values of the spots of the csv file equal to 1:
bin_img = np.where(bin_img == 255, 1, bin_img)
I convert the arrays from 2d to 1d:
bin_img = bin_img.astype(np.int64)
cell1 = cell1.flatten()
bin_img = bin_img.flatten()
I multiply the arrays to get an array where only the spots overlapping the labelled object have value = 1:
spots_cell1 = []
for num1, num2 in zip(cell1, bin_img):
    spots_cell1.append(num1 * num2)
I count the spots belonging to that object:
spots_cell1 = sum(float(num) == 1 for num in spots_cell1)
print(spots_cell1)
I hope it's clear. Thank you in advance!
You can define a function that takes the .tif file path and the .csv file path and processes the two:
def process(tif_file, csv_file):
    pos = mpimg.imread(tif_file)
    coord = np.genfromtxt(csv_file, delimiter=",")
    # Do the rest of the processing with pos and coord
To process a single pair of files, you'd do:
process('pos001.tif', 'treat_pos001_fluo__spots.csv')
To list all the files in your tif file directory, you can use os.listdir:
import os

tif_file_directory = "/home/username/path/to/tif/files"
csv_file_directory = "/home/username/path/to/csv/files"

all_tif_files = os.listdir(tif_file_directory)
for file in all_tif_files:
    if file.endswith(".tif"):  # Make sure this is a tif file
        # Note: file.rstrip(".tif") would strip trailing '.', 't', 'i', 'f' characters,
        # not the suffix, so slice the extension off instead
        fname = file[:-len(".tif")]  # Get just the file name without the .tif extension
        tif_file = f"{tif_file_directory}/{fname}.tif"  # Full path to tif file
        csv_file = f"{csv_file_directory}/treat_{fname}_fluo__spots.csv"  # Full path to csv file
        # Just to keep track of what is processed, print them
        print(f"Processing {tif_file} and {csv_file}")
        process(tif_file, csv_file)
The f"...{variable}..." construct is called an f-string. More information here: https://realpython.com/python-f-strings/

How to use tensorflow to ingest sharded CSVs

This is a problem I am working on in Google Cloud Platform with TensorFlow v1.15.
I am working on this notebook.
In this section, I am supposed to return a function that feeds model.train():
CSV_COLUMNS = ['fare_amount', 'pickuplon', 'pickuplat', 'dropofflon', 'dropofflat', 'passengers', 'key']
DEFAULTS = [[0.0], [-74.0], [40.0], [-74.0], [40.7], [1.0], ['nokey']]

# TODO: Create an appropriate input function read_dataset
def read_dataset(filename, mode):
    # TODO: Add CSV decoder function and dataset creation and methods
    return dataset

def get_train_input_fn():
    return read_dataset('./taxi-train.csv', mode = tf.estimator.ModeKeys.TRAIN)

def get_valid_input_fn():
    return read_dataset('./taxi-valid.csv', mode = tf.estimator.ModeKeys.EVAL)
I think it should be like this:
def read_dataset(filename, mode, batch_size = 512):
    def fn():
        def decode_csv(value_column):
            columns = tf.decode_csv(value_column, record_defaults = DEFAULTS)
            features = dict(zip(CSV_COLUMNS, columns))
            label = features.pop(LABEL_COLUMN)
            return features, label

        # Create list of file names that match "glob" pattern (i.e. data_file_*.csv)
        filenames_dataset = tf.data.Dataset.list_files(filename)
        # Read lines from text files
        textlines_dataset = filenames_dataset.flat_map(tf.data.TextLineDataset)
        # Parse text lines as comma-separated values (CSV)
        dataset = textlines_dataset.map(decode_csv)

        if mode == tf.estimator.ModeKeys.TRAIN:
            num_epochs = None  # indefinitely
            dataset = dataset.shuffle(buffer_size = 10 * batch_size)
        else:
            num_epochs = 1  # end-of-input after this

        dataset = dataset.repeat(num_epochs).batch(batch_size)
        return dataset
    return fn
That is actually reflective of code in the video recap that accompanies this notebook, and very similar to my own attempts before I saw that recap. It is also similar to the next notebook, but that code also unfortunately fails.
With the above code, I am getting this error:
UnimplementedError: Cast string to float is not supported
[[node linear/head/ToFloat (defined at /usr/local/lib/python3.5/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
So I'm not sure how to transform the data to match the expected datatype. I cannot cast the data in decode_csv like this:
features = {CSV_COLUMNS[i]: float(cols[i]) for i in range(1, len(CSV_COLUMNS) - 1)}
because the error is raised before that line is reached.
Investigating the data I note:
import csv

with open('./taxi-train.csv') as f:
    reader = csv.reader(f)
    print(next(reader))
['12.0', '-73.987625', '40.750617', '-73.971163', '40.78518', '1', '0']
That looks like the raw data might actually be strings. Am I correct? How can I solve this?
Edit: I have located the csv file; it is not raw string data. Why is the TensorFlow import bringing it in as text?
The training-data-analyst repository you mentioned also has the solutions to all the notebooks.
From analysing the provided solution, it looks like the def fn() part is redundant; the read_dataset function should simply return a tf.data.Dataset:
def read_dataset(filename, mode, batch_size = 512):
    def decode_csv(row):
        columns = tf.decode_csv(row, record_defaults = DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))
        features.pop('key')  # discard, not a real feature
        label = features.pop('fare_amount')  # remove label from features and store
        return features, label

    # Create list of file names that match "glob" pattern (i.e. data_file_*.csv)
    filenames_dataset = tf.data.Dataset.list_files(filename, shuffle=False)
    # Read lines from text files
    textlines_dataset = filenames_dataset.flat_map(tf.data.TextLineDataset)
    # Parse text lines as comma-separated values (CSV)
    dataset = textlines_dataset.map(decode_csv)

    # Note:
    # use tf.data.Dataset.flat_map to apply one-to-many transformations (here: filename -> text lines)
    # use tf.data.Dataset.map to apply one-to-one transformations (here: text line -> feature list)

    if mode == tf.estimator.ModeKeys.TRAIN:
        num_epochs = None  # loop indefinitely
        dataset = dataset.shuffle(buffer_size = 10 * batch_size, seed=2)
    else:
        num_epochs = 1  # end-of-input after this

    dataset = dataset.repeat(num_epochs).batch(batch_size)
    return dataset
The solutions are located in the same directory as the labs. So, for example, the solution for
training-data-analyst/courses/machine_learning/deepdive/03_tensorflow/labs/c_dataset.ipynb
is located at
training-data-analyst/courses/machine_learning/deepdive/03_tensorflow/c_dataset.ipynb
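For completeness, a minimal sketch of how this dataset feeds model.train() with the TF 1.15 Estimator API. It assumes the feature columns and model setup from the notebook, which are not shown in the question; the names below are illustrative:

import tensorflow as tf  # TF 1.15, as in the notebook

INPUT_COLUMNS = ['pickuplon', 'pickuplat', 'dropofflon', 'dropofflat', 'passengers']
feature_cols = [tf.feature_column.numeric_column(name) for name in INPUT_COLUMNS]
model = tf.estimator.LinearRegressor(feature_columns = feature_cols)

# An input_fn may return a tf.data.Dataset of (features, label) batches
model.train(
    input_fn = lambda: read_dataset('./taxi-train.csv', tf.estimator.ModeKeys.TRAIN),
    steps = 100)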

How to read in multiple documents with same code?

So I have a couple of documents, each of which has an x and y coordinate (among other stuff). I wrote some code which is able to filter out said x and y coordinates and store them in float variables.
Now, ideally, I'd want to find a way to run the same code on all the documents I have (the number isn't fixed, but let's say 3 for now), extract the x and y coordinates of each document, and calculate an average of these 3 x-values and 3 y-values.
How would I approach this? I've never done this before.
I successfully created the code to extract the relevant data from 1 file.
Also note: in reality each file has more than just 1 set of x and y coordinates, but this does not matter for the problem at hand. I'm just saying that so that the code does not confuse you.
with open('TestData.txt', 'r') as f:
    full_array = f.readlines()

del full_array[1:31]
del full_array[len(full_array)-4:len(full_array)]

single_line = full_array[1].split(", ")

x_coord = float(single_line[0].replace("1 Location: ", ""))
y_coord = float(single_line[1])
size = float(single_line[3].replace("Size: ", ""))

# Remove unnecessary stuff
category = single_line[6].replace(" Type: Class: 1D Descr: None", "")
In the end I'd like not to have to write the same code again for each file, especially since the number of files may vary. Right now I have 3 files, which equals 3 sets of coordinates, but on another day I might have 5, for example.
Use os.walk to find the files that you want, then do your calculation for each file. A sketch follows the link.
https://docs.python.org/2/library/os.html#os.walk
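A minimal sketch of that approach, assuming the parsing logic from the question is wrapped in a hypothetical parse_coords helper that returns (x, y) for one file, and that the files live under a placeholder directory named 'data':

import os

x_values, y_values = [], []
for root, dirs, files in os.walk('data'):
    for name in files:
        if name.endswith('.txt'):
            x, y = parse_coords(os.path.join(root, name))  # hypothetical helper
            x_values.append(x)
            y_values.append(y)

# average across however many files were found
print(sum(x_values) / len(x_values), sum(y_values) / len(y_values))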
First of all, create a function that reads a file given its file name and does the parsing your way. Then iterate through the directory; I assume the files are all in the same directory.
Here is the basic code:
import os

def readFile(filename):
    try:
        with open(filename, 'r') as file:
            return file.read()
    except OSError:
        return ""

directory = 'C:\\Users\\UserName\\Documents'
for filename in os.listdir(directory):
    # print(filename)
    # os.listdir returns bare names, so join each with the directory
    data = readFile(os.path.join(directory, filename))
    print(data)
    # parse here
    # do the calculation here

How to write .csv file in Python?

I am running the following: output.to_csv("hi.csv"), where output is a pandas dataframe.
My variables all have values, but when I run this in IPython, no file is created. What should I do?
Better to give the complete path for your output csv file. It may be that you are checking in the wrong folder.
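A minimal sketch of that suggestion; the DataFrame here is a stand-in for the output object from the question:

import os
import pandas as pd

output = pd.DataFrame({'a': [1, 2]})  # stand-in for the DataFrame from the question
print(os.getcwd())  # a relative path like "hi.csv" lands in this working directory
output.to_csv(os.path.join(os.getcwd(), "hi.csv"))  # a full path removes the ambiguity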
You have to make sure that the to_csv call on your output object is actually being executed and writing the file.
There is also a lib for csv manipulation in Python's standard library, so you don't need to handle all the work yourself:
https://docs.python.org/2/library/csv.html
I'm not sure if this will be useful to you, but I write to CSV files frequently in Python. Here is an example that generates random (X, Y, Z) vectors and writes them to a CSV file using the csv module. (The paths are OS X paths, but you should get the idea even on a different OS.)
Working Writing Python to CSV example
import os, csv, random

# Generates random vectors and writes them to a CSV file
WriteFile = True  # Write CSV file if true - useful for testing
CSVFileName = "DataOutput.csv"
CSVFilePath = os.path.join('/Users/Si/Desktop/', CSVFileName)

def genlist():
    # Generates a list of random vectors
    ListLength = 25  # Amount of vectors to be produced
    Max = 100  # Maximum range value
    x = []  # Empty x vector list
    y = []  # Empty y vector list
    z = []  # Empty z vector list
    v = []  # Empty xyz vector list
    for i in range(ListLength):
        x.append(random.randrange(0, Max))  # Add a random number to the x list
        y.append(random.randrange(0, Max))  # Add a random number to the y list
        z.append(random.randrange(0, Max))  # Add a random number to the z list
    for i in range(ListLength):
        merge = x[i], y[i], z[i]  # Merge x[i], y[i], z[i]
        v.append(merge)  # Add the merged tuple to the v list
    return v

def writeCSV(v):
    # Write vectors to the CSV file; the with block closes the file when done
    with open(CSVFilePath, 'w', newline='') as CSVfile:
        wr = csv.writer(CSVfile, quoting=csv.QUOTE_MINIMAL, dialect='excel')
        wr.writerow(('Point Number', 'X Vector', 'Y Vector', 'Z Vector'))
        for i in range(len(v)):
            wr.writerow((i + 1, v[i][0], v[i][1], v[i][2]))
    print("Data written to", CSVFilePath)

v = genlist()
if WriteFile is True:
    writeCSV(v)
Hopefully there is something useful in here for you!
