gdal WriteArray() crashes python without a stack trace - python

I'm trying to write an array to a geotiff using gdal. Each row of the array is identical, and I used np.broadcast_to to create the array.
When I try to write it, I get a Windows popup saying "Python has stopped working: A problem caused the program to stop working correctly. Please close the program"
This approximates the steps I'm taking:
import gdal
import numpy as np
driver = gdal.GetDriverByName('GTiff')
outRaster = driver.Create("C:/raster.tif", 1000, 1000, 1, 6)
band = outRaster.GetRasterBand(1)
# Create array
a = np.arange(0,1000, dtype='float32')
a1 = np.broadcast_to(a, (1000,1000))
# try writing
band.WriteArray(a1) # crash

The problem is that the input array created by broadcast_to isn't contiguous in memory. As described in the numpy documentation, more than one element of a broadcasted array may refer to a single memory location. This causes problems in gdal.
Instead of using broadcast_to, use something that stores each element in its own place in memory.
As an illustrative example, see the following code:
import gdal
import numpy as np
import sys
driver = gdal.GetDriverByName('GTiff')
outRaster = driver.Create("C:/raster.tif", 1000, 1000, 1, 6)
band = outRaster.GetRasterBand(1)
# Create 1000 x 1000 array two different ways
a = np.arange(0,1000, dtype='float32')
a1 = a[np.newaxis, :]
a1 = a1.repeat(1000, axis=0)
a2 = np.broadcast_to(a, (1000,1000))
# examine size of objects
sys.getsizeof(a1) # 4000112
sys.getsizeof(a2) # 112
# try writing
band.WriteArray(a1) # writes fine
band.WriteArray(a2) # crash
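As a rough check before calling WriteArray, the array's strides and flags show whether it is a broadcast view. The sketch below uses only NumPy (no gdal) and assumes that copying with np.ascontiguousarray is an acceptable alternative to the repeat() approach above. Whether a given gdal version actually crashes on a broadcast view is not something this snippet tests.

```python
import numpy as np

a = np.arange(0, 1000, dtype='float32')
view = np.broadcast_to(a, (1000, 1000))

# A broadcast view reuses one row: the stride along axis 0 is 0 bytes.
print(view.strides)                  # (0, 4)
print(view.flags['C_CONTIGUOUS'])    # False
print(view.flags['WRITEABLE'])       # False

# np.ascontiguousarray copies the data into a normal, densely packed buffer.
dense = np.ascontiguousarray(view)
print(dense.strides)                 # (4000, 4)
print(dense.flags['C_CONTIGUOUS'])   # True
```

The copy costs 1000 x 1000 x 4 bytes here, which is the price of giving every element its own memory location.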

Related

Coordinate offsets in xarray and dask

I'm making use of xarray because its coordinates and automatic alignment are really useful, and I've been using Dask because the datasets I'm generally dealing with are on the order of terabytes.
I have a 3D source array generated (or loaded) and dependent on wavelength (wl) and x position and y position at origin zero.
I also have a 2D output array dependent only on x and y which accumulates all of the wavelengths from the source array. Ideally the output would be:
output = source.sum('wl')
However, the wavelength dependence means that each wavelength offsets the source origin by a certain amount. The best (and ugliest) solution I could come up with is to loop through each wavelength, reassign coordinates, interp up to the output coordinates, stack them into a new array and then sum.
I have some example code that shows what I'm trying to do:
from dask.distributed import Client
import xarray as xr
import dask.array as da
import numpy as np
client = Client(n_workers=2, threads_per_worker=2, memory_limit='2GB')
client
# Generate some offset data here
wavelengths = np.linspace(0.1,10,1000)
x_offsets = np.linspace(100,400,1000)
y_offsets = np.linspace(100,400,1000)
# Coordinate offsets for each wavelength
offset = xr.Dataset(
{
'x': (['wl'],x_offsets),
'y': (['wl'],y_offsets)
},
coords={
'wl': wavelengths
})
# Our example source function
source_shape = (1000, 10000, 10000,)
wl_source = np.linspace(0.4,5,source_shape[0])
x_source = np.linspace(-6,6, source_shape[1])
y_source = np.linspace(-6,6, source_shape[2])
source = xr.DataArray(da.random.random(size=source_shape, chunks=(10,400,400)),
coords=[wl_source,x_source,y_source],
dims=['wl','x','y'])
out_shape = (10000, 10000,)
# Our final output array
x_out = np.linspace(-1000,1000,out_shape[0])
y_out = np.linspace(-1000,1000,out_shape[1])
out = xr.DataArray(da.random.random(size=out_shape, chunks=(4000,4000)),coords=[x_out, y_out], dims=['x','y'])
accum = []
for wl in source.wl:
    # Build our map from source -> output space
    x_map = offset.interp(wl=wl).x + source.x
    y_map = offset.interp(wl=wl).y + source.y
    # Remap coordinates
    source_mapped = source.sel(wl=wl).assign_coords({'x':x_map,
                                                     'y':y_map})
    # Interp_like unchunks it so need to rechunk it here
    # Interp up to the output coordinates
    accum.append(
        source_mapped.interp_like(out, method='nearest',kwargs={'fill_value':0}).chunk({'x':4000,'y':4000})
    )
# Accumulate and add to the output
out += xr.concat(accum,dim='wl').sum('wl')
out
This solution ends up with over 1 million tasks. Because of that, building the task graph takes a long time, and during computation garbage collection takes a long time, memory is exhausted, or I spill so much to disk that I run out of storage. Manually slicing has the same issue.
Additionally, this can't scale if I have more than one source. I've been racking my brain trying to figure out a better solution.
I'm wondering if there's a more efficient way of doing this, either through dask, xarray or some other library. I'm fairly new to dask and xarray, so I'm still trying to get to grips with how they work and how to better chunk and distribute tasks.
Sorry for the long winded question!
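For reference, the accumulation described above can be written on a tiny in-memory example as a running sum, so that the per-wavelength slices never need to be stacked. This is only a NumPy sketch under simplifying assumptions that are not in the question: the offsets are whole pixels, source and output share the same pixel size, and the function name accumulate_shifted is hypothetical. It is not a dask or xarray solution.

```python
import numpy as np

def accumulate_shifted(source, x_shift, y_shift, out_shape):
    """Sum source[k] into an output grid, shifting slice k by (x_shift[k], y_shift[k]) pixels."""
    out = np.zeros(out_shape, dtype=source.dtype)
    n_wl, nx, ny = source.shape
    for k in range(n_wl):
        x0, y0 = int(x_shift[k]), int(y_shift[k])
        # Clip the shifted slice to the output grid; anything outside is dropped.
        xs, xe = max(x0, 0), min(x0 + nx, out_shape[0])
        ys, ye = max(y0, 0), min(y0 + ny, out_shape[1])
        if xs >= xe or ys >= ye:
            continue
        out[xs:xe, ys:ye] += source[k, xs - x0:xe - x0, ys - y0:ye - y0]
    return out

source = np.ones((3, 4, 4))
result = accumulate_shifted(source, x_shift=[0, 1, 2], y_shift=[0, 1, 2], out_shape=(8, 8))
print(result.sum())   # 48.0
print(result[2, 2])   # 3.0
```

Only one slice and the output are held at a time, which is the property the concat-then-sum version lacks.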

Python3.8: The last output file is not stored properly on disk

I have a global dataset at about 300 m resolution in a tif file. I want to upscale it to 9 km resolution (my code is below). Because of the high resolution and the long computing time, I decided to do the upscaling piecewise: I divide the global data into 10 pieces, upscale each piece, and store each piece separately in its own tif file. Here is my problem: the last piece is NOT saved completely on disk. Each piece should be 2M, but piece#10 is 1.7M. The strange thing is that after running my script twice, that piece#10 is completed and changes from 1.7M to 2M, but the piece#10 from the current run is again incomplete.
import numpy as np
from osgeo import gdal
from osgeo import osr
from osgeo.gdalconst import *
import pandas as pd
#
#%%
#-----converting--------#
df_new = pd.read_excel("input_attribute_table.xlsx",sheet_name='Global_data')
listvar = ['var1']
number = df_new['data_number'][:]
##The size of global array is 129599 x 51704. The pieces should be square
xoff = np.array([0, 25852.00, 51704.00, 77556.00, 103408.00])
yoff = np.array([0, 25852.00])
xcount = 25852
ycount = 25852
o = 1
for q in range(len(yoff)):
    for p in range(len(xoff)):
        src = gdal.Open('Global_database.tif')
        ds_xform = src.GetGeoTransform()
        ds_driver = gdal.GetDriverByName('Gtiff')
        srs = osr.SpatialReference()
        srs.ImportFromEPSG(4326)
        data =src.GetRasterBand(1).ReadAsArray(xoff[p],yoff[q],xcount,ycount).astype(np.float32)
        Var = np.zeros(data.shape, dtype=np.float32)
        Variable_load = df_new[listvar[0]][:]
        for m in range(len(number)):
            Var[data==number[m]] = Variable_load[m]
        #-------rescaling-----------#
        Var[np.where(np.isnan(Var))]=0
        ds_driver = gdal.GetDriverByName('Gtiff')
        srs = osr.SpatialReference()
        srs.ImportFromEPSG(4326)
        sz = Var.itemsize
        h,w = Var.shape
        bh, bw = 36, 36
        shape = (h/bh, w/bw, bh, bw)
        shape2 = (int(shape[0]),int(shape[1]),shape[2],shape[3])
        strides = sz*np.array([w*bh,bw,w,1])
        blocks = np.lib.stride_tricks.as_strided(Var,shape=shape2,strides=strides)
        resized_array=ds_driver.Create(str(listvar[0])+'_resized_to_9km_glob_piece'+str(o)+'.tif',shape2[1],shape2[0],1,gdal.GDT_Float32)
        resized_array.SetGeoTransform((ds_xform[0],ds_xform[1]*bw,ds_xform[2],ds_xform[3],ds_xform[4],ds_xform[5]*bh))
        resized_array.SetProjection(srs.ExportToWkt())
        band = resized_array.GetRasterBand(1)
        zero_array = np.zeros([shape2[0],shape2[1]], dtype=np.float32)
        for z in range(len(blocks)):
            for k in range(len(blocks)):
                zero_array[z][k] = np.mean(blocks[z][k])
        band.WriteArray(zero_array)
        band.FlushCache()
        band = None
        del zero_array
        del Var
        o=o+1
Normally, you should either be sure to call close on a file, or use the with statement. However, it looks like neither of those is supported by the gdal bindings used here.
Instead, you're expected to remove all references to the file. You're already setting band = None, but you also need to release the dataset objects themselves: src = None and, for the file being written, resized_array = None.
This is a bad, non-Pythonic interface, but that's apparently what the Python gdal library does. In addition to being a weird gotcha in its own right, it also interacts poorly with exceptions; any unhandled exceptions may also result in the file not being saved (or being partly saved, or being corrupted).
For the immediate problem, though, adding src = None (or del src) together with resized_array = None at the end of each iteration should do the trick.
PS (from comments): Another option would be to move the body of the for loop into a function; that will automatically delete all the variables without you having to list them all and potentially miss one. It'll still have problems if there's an exception, but at least the normal case should start working...
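The effect of moving the loop body into a function can be illustrated without gdal. In the sketch below, FakeDataset is a hypothetical stand-in class (not part of gdal) whose __del__ records when the object is destroyed. It relies on CPython's reference counting, which destroys local objects as soon as the function returns; other Python implementations may delay this.

```python
class FakeDataset:
    """Hypothetical stand-in for a gdal dataset: 'flushes' when the last reference goes away."""
    closed = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # A real dataset would write pending data to disk at this point.
        FakeDataset.closed.append(self.name)

def process_piece(index):
    # Every reference created here is local, so it is dropped on return.
    ds = FakeDataset("piece%d" % index)
    return index

for i in range(1, 4):
    process_piece(i)

print(FakeDataset.closed)   # ['piece1', 'piece2', 'piece3']
```

With the original flat loop, the object created in the last iteration stays referenced by the module-level variable until the interpreter exits, which matches the symptom that only the last piece is incomplete.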

Efficiently using 1-D pyfftw on small slices of a 3-D numpy array

I have a 3D data cube of values of size on the order of 10,000x512x512. I want to parse a window of vectors (say 6) along dim[0] repeatedly and generate the fourier transforms efficiently. I think I'm doing an array copy into the pyfftw package and it's giving me massive overhead. I'm going over the documentation now since I think there is an option I need to set, but I could use some extra help on the syntax.
This code was originally written by another person with numpy.fft.rfft and accelerated with numba. But the implementation wasn't working on my workstation so I re-wrote everything and opted to go for pyfftw instead.
import numpy as np
import pyfftw as ftw
from tkinter import simpledialog
from math import ceil
import multiprocessing
ftw.config.NUM_THREADS = multiprocessing.cpu_count()
ftw.interfaces.cache.enable()
def runme():
    # normally I would load a file, but for Stack Overflow, I'm just going to generate a 3D data cube so I'll delete references to the binary saving/loading functions:
    # load the file
    dataChunk = np.random.random((1000,512,512))
    numFrames = dataChunk.shape[0]
    # select the window size
    windowSize = int(simpledialog.askstring('Window Size',
        'How many frames to demodulate a single time point?'))
    numChannels = windowSize//2+1
    # create fftw arrays
    ftwIn = ftw.empty_aligned(windowSize, dtype='complex128')
    ftwOut = ftw.empty_aligned(windowSize, dtype='complex128')
    fftObject = ftw.FFTW(ftwIn,ftwOut)
    # perform DFT on the data chunk
    demodFrames = dataChunk.shape[0]//windowSize
    channelChunks = np.zeros([numChannels,demodFrames,
        dataChunk.shape[1],dataChunk.shape[2]])
    channelChunks = getDFT(dataChunk,channelChunks,
        ftwIn,ftwOut,fftObject,windowSize,numChannels)
    return channelChunks
def getDFT(data,channelOut,ftwIn,ftwOut,fftObject,
           windowSize,numChannels):
    frameLen = data.shape[0]
    demodFrames = frameLen//windowSize
    for yy in range(data.shape[1]):
        for xx in range(data.shape[2]):
            index = 0
            for i in range(0,frameLen-windowSize+1,windowSize):
                ftwIn[:] = data[i:i+windowSize,yy,xx]
                fftObject()
                channelOut[:,index,yy,xx] = 2*np.abs(ftwOut[:numChannels])/windowSize
                index+=1
    return channelOut
if __name__ == '__main__':
    runme()
What happens is I get a 4D array; the variable channelChunks. I am saving out each channel to a binary (not included in the code above, but the saving part works fine).
This process is for a demodulation project we have, the 4D data cube channelChunks is then parsed into eval(numChannel) 3D data cubes (movies) and from that we are able to separate a movie by color given our experimental set up. I was hoping I could circumvent writing a C++ function that calls the fft on the matrix via pyfftw.
Effectively, I am taking windowSize=6 elements along the 0 axis of dataChunk at a given index of 1 and 2 axis and performing a 1D FFT. I need to do this throughout the entire 3D volume of dataChunk to generate the demodulated movies. Thanks.
The FFTW advanced plans can be automatically built by pyfftw.
The code could be modified in the following way:
Real to complex transforms can be used instead of complex to complex transform.
With pyfftw, this is typically written as:
ftwIn = ftw.empty_aligned(windowSize, dtype='float64')
ftwOut = ftw.empty_aligned(windowSize//2+1, dtype='complex128')
fftObject = ftw.FFTW(ftwIn,ftwOut)
Add a few flags to the FFTW planner. For instance, FFTW_MEASURE will time different algorithms and pick the best. FFTW_DESTROY_INPUT signals that the input array can be modified, which allows some implementation tricks to be used.
fftObject = ftw.FFTW(ftwIn,ftwOut, flags=('FFTW_MEASURE','FFTW_DESTROY_INPUT',))
Limit the number of divisions. A division costs more than a multiplication.
scale=1.0/windowSize
for ...
    for ...
        2*np.abs(ftwOut[:,:,:])*scale #instead of /windowSize
Avoid multiple for loops by making use of FFTW advanced plan through pyfftw.
nbwindow=numFrames//windowSize
# create fftw arrays
ftwIn = ftw.empty_aligned((nbwindow,windowSize,dataChunk.shape[2]), dtype='float64')
ftwOut = ftw.empty_aligned((nbwindow,windowSize//2+1,dataChunk.shape[2]), dtype='complex128')
fftObject = ftw.FFTW(ftwIn,ftwOut, axes=(1,), flags=('FFTW_MEASURE','FFTW_DESTROY_INPUT',))
...
for yy in range(data.shape[1]):
    ftwIn[:] = np.reshape(data[0:nbwindow*windowSize,yy,:],(nbwindow,windowSize,data.shape[2]),order='C')
    fftObject()
    channelOut[:,:,yy,:]=np.transpose(2*np.abs(ftwOut[:,:,:])*scale, (1,0,2))
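The reshape-and-transpose bookkeeping in the last step can be checked independently of pyfftw. The sketch below uses numpy.fft.rfft as a stand-in for the FFTW plan (an assumption made so that it runs without pyfftw) and compares the per-pixel, per-window loop against a single batched transform along axis 1 on a small random cube.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.random((12, 3, 5))        # (frames, y, x), small stand-in for dataChunk
window = 4
nb = data.shape[0] // window
scale = 1.0 / window

# Reference: one 1-D real FFT per window and per pixel.
ref = np.zeros((window // 2 + 1, nb, data.shape[1], data.shape[2]))
for yy in range(data.shape[1]):
    for xx in range(data.shape[2]):
        for i in range(nb):
            seg = data[i * window:(i + 1) * window, yy, xx]
            ref[:, i, yy, xx] = 2 * np.abs(np.fft.rfft(seg)) * scale

# Batched: split the frame axis into (nb, window) and transform along axis 1.
blocks = data[:nb * window].reshape(nb, window, data.shape[1], data.shape[2])
batched = np.transpose(2 * np.abs(np.fft.rfft(blocks, axis=1)) * scale, (1, 0, 2, 3))

print(np.allclose(ref, batched))     # True
```

The real-to-complex transform returns window//2+1 coefficients per window, which is why the output buffer in the plan above has that size along the transformed axis.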
Here is the modified code. I also decreased the number of frames to 100, set the seed of the random generator to check that the outcome is not modified, and commented out tkinter. The size of the window can be set to a power of two, or to a number made by multiplying 2, 3, 5 or 7, so that the Cooley-Tukey algorithm can be applied efficiently. Avoid large prime numbers.
import numpy as np
import pyfftw as ftw
#from tkinter import simpledialog
from math import ceil
import multiprocessing
import time
ftw.config.NUM_THREADS = multiprocessing.cpu_count()
ftw.interfaces.cache.enable()
ftw.config.PLANNER_EFFORT = 'FFTW_MEASURE'
def runme():
    # normally I would load a file, but for Stack Overflow, I'm just going to generate a 3D data cube so I'll delete references to the binary saving/loading functions:
    # load the file
    np.random.seed(seed=42)
    dataChunk = np.random.random((100,512,512))
    numFrames = dataChunk.shape[0]
    # select the window size
    #windowSize = int(simpledialog.askstring('Window Size',
    # 'How many frames to demodulate a single time point?'))
    windowSize=32
    numChannels = windowSize//2+1
    nbwindow=numFrames//windowSize
    # create fftw arrays
    ftwIn = ftw.empty_aligned((nbwindow,windowSize,dataChunk.shape[2]), dtype='float64')
    ftwOut = ftw.empty_aligned((nbwindow,windowSize//2+1,dataChunk.shape[2]), dtype='complex128')
    #ftwIn = ftw.empty_aligned(windowSize, dtype='complex128')
    #ftwOut = ftw.empty_aligned(windowSize, dtype='complex128')
    fftObject = ftw.FFTW(ftwIn,ftwOut, axes=(1,), flags=('FFTW_MEASURE','FFTW_DESTROY_INPUT',))
    # perform DFT on the data chunk
    demodFrames = dataChunk.shape[0]//windowSize
    channelChunks = np.zeros([numChannels,demodFrames,
        dataChunk.shape[1],dataChunk.shape[2]])
    channelChunks = getDFT(dataChunk,channelChunks,
        ftwIn,ftwOut,fftObject,windowSize,numChannels)
    return channelChunks
def getDFT(data,channelOut,ftwIn,ftwOut,fftObject,
           windowSize,numChannels):
    frameLen = data.shape[0]
    demodFrames = frameLen//windowSize
    printed=0
    nbwindow=data.shape[0]//windowSize
    scale=1.0/windowSize
    for yy in range(data.shape[1]):
        #for xx in range(data.shape[2]):
        index = 0
        ftwIn[:] = np.reshape(data[0:nbwindow*windowSize,yy,:],(nbwindow,windowSize,data.shape[2]),order='C')
        fftObject()
        channelOut[:,:,yy,:]=np.transpose(2*np.abs(ftwOut[:,:,:])*scale, (1,0,2))
        #for i in range(nbwindow):
        #channelOut[:,i,yy,xx] = 2*np.abs(ftwOut[i,:])*scale
        # print the first pixel once, to compare against the original implementation
        if printed==0:
            for j in range(channelOut.shape[0]):
                print(j, channelOut[j,0,yy,0])
            printed=1
    return channelOut
if __name__ == '__main__':
    seconds=time.time()
    runme()
    print("time: ", time.time()-seconds)
Let us know how much it speeds up your computations! I went from 24s to less than 2s on my computer...
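The advice about window sizes can be turned into a small check. The helper below is a sketch with a hypothetical name, is_fft_friendly. It only tests whether a candidate size factors entirely into 2, 3, 5 and 7, as recommended above, and says nothing about how fast FFTW will actually be for that size.

```python
def is_fft_friendly(n, primes=(2, 3, 5, 7)):
    """Return True if n >= 1 has no prime factor other than those in `primes`."""
    if n < 1:
        return False
    for p in primes:
        # Divide out every occurrence of this small prime.
        while n % p == 0:
            n //= p
    return n == 1

print([n for n in range(28, 37) if is_fft_friendly(n)])   # [28, 30, 32, 35, 36]
```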

Python/Shogun Toolbox: Convert RealFeatures to StreamingRealFeatures

I am using the Python version of the Shogun Toolbox.
I want to use the LinearTimeMMD, which accepts data under the streaming interface CStreamingFeatures. I have the data in the form of two RealFeatures objects: feat_p and feat_q. These work just fine with the QuadraticTimeMMD.
In order to use it with the LinearTimeMMD, I need to create StreamingFeatures objects from these - In this case, these would be StreamingRealFeatures, as far as I know.
My first approach was using this:
gen_p, gen_q = StreamingRealFeatures(feat_p), StreamingRealFeatures(feat_q)
This however does not seem to work: The LinearTimeMMD delivers warnings and an unrealistic result (growing constantly with the number of samples) and calling gen_p.get_dim_feature_space() returns -1. Also, if I try calling gen_p.get_streamed_features(100) this results in a Memory Access Error.
I tried another approach using StreamingFileFromFeatures:
streamFile_p = sg.StreamingFileFromRealFeatures()
streamFile_p.set_features(feat_p)
streamFile_q = sg.StreamingFileFromRealFeatures()
streamFile_q.set_features(feat_q)
gen_p = StreamingRealFeatures(streamFile_p, False, 100)
gen_q = StreamingRealFeatures(streamFile_q, False, 100)
But this results in the same situation with the same described problems.
It seems that in both cases, the contents of the RealFeatures object handed to the StreamingRealFeatures object cannot be accessed.
What am I doing wrong?
EDIT: I was asked for a small working example to show the error:
import os
SHOGUN_DATA_DIR=os.getenv('SHOGUN_DATA_DIR', '../../../data')
import shogun as sg
from shogun import StreamingRealFeatures
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import laplace, norm
def sample_gaussian_vs_laplace(n=220, mu=0.0, sigma2=1, b=np.sqrt(0.5)):
    # sample from both distributions
    X=norm.rvs(size=n)*np.sqrt(sigma2)+mu
    Y=laplace.rvs(size=n, loc=mu, scale=b)
    return X,Y
# Main Script
mu=0.0
sigma2=1
b=np.sqrt(0.5)
n=220
X,Y=sample_gaussian_vs_laplace(n, mu, sigma2, b)
# turn data into Shogun representation (columns vectors)
feat_p=sg.RealFeatures(X.reshape(1,len(X)))
feat_q=sg.RealFeatures(Y.reshape(1,len(Y)))
gen_p, gen_q = StreamingRealFeatures(feat_p), StreamingRealFeatures(feat_q)
print("Dimensions: ", gen_p.get_dim_feature_space())
print("Number of features: ", gen_p.get_num_features())
print("Number of vectors: ", gen_p.get_num_vectors())
test_features = gen_p.get_streamed_features(1)
print("success")
EDIT 2: The Output of the working example:
Dimensions: -1
Number of features: -1
Number of vectors: 1
Segmentation fault (core dumped)
EDIT 3: Additional Code with LinearTimeMMD using the RealFeatures directly.
mmd = sg.LinearTimeMMD()
kernel = sg.GaussianKernel(10, 1)
mmd.set_kernel(kernel)
mmd.set_p(feat_p)
mmd.set_q(feat_q)
mmd.set_num_samples_p(1000)
mmd.set_num_samples_q(1000)
alpha = 0.05
# Code taken from notebook example on
# http://www.shogun-toolbox.org/notebook/latest/mmd_two_sample_testing.html
# Location on page: In[16]
block_size=100
mmd.set_num_blocks_per_burst(block_size)
# compute an unbiased estimate in linear time
statistic=mmd.compute_statistic()
print("MMD_l[X,Y]^2=%.2f" % statistic)
EDIT 4: Additional code sample showing the growing mmd problem:
import os
SHOGUN_DATA_DIR=os.getenv('SHOGUN_DATA_DIR', '../../../data')
import shogun as sg
from shogun import StreamingRealFeatures
import numpy as np
from matplotlib import pyplot as plt
def mmd(n):
    X = [(1.0,i) for i in range(n)]
    Y = [(2.0,i) for i in range(n)]
    X = np.array(X)
    Y = np.array(Y)
    # turn data into Shogun representation (columns vectors)
    feat_p=sg.RealFeatures(X.reshape(2, len(X)))
    feat_q=sg.RealFeatures(Y.reshape(2, len(Y)))
    mmd = sg.LinearTimeMMD()
    kernel = sg.GaussianKernel(10, 1)
    mmd.set_kernel(kernel)
    mmd.set_p(feat_p)
    mmd.set_q(feat_q)
    mmd.set_num_samples_p(100)
    mmd.set_num_samples_q(100)
    alpha = 0.05
    block_size=100
    mmd.set_num_blocks_per_burst(block_size)
    # compute an unbiased estimate in linear time
    statistic=mmd.compute_statistic()
    print("N =", n)
    print("MMD_l[X,Y]^2=%.2f" % statistic)
    print()
for n in [1000, 10000, 15000, 20000, 25000, 30000]:
    mmd(n)
Output:
N = 1000
MMD_l[X,Y]^2=-12.69
N = 10000
MMD_l[X,Y]^2=-40.14
N = 15000
MMD_l[X,Y]^2=-49.16
N = 20000
MMD_l[X,Y]^2=-56.77
N = 25000
MMD_l[X,Y]^2=-63.47
N = 30000
MMD_l[X,Y]^2=-69.52
For some reason, the Python environment on my machine is broken, so I can't give a snippet in Python. But let me point to a working example in C++ which attempts to address the issues (https://gist.github.com/lambday/983830beb0afeb38b9447fd91a143e67).
I think the easiest way is to create a StreamingRealFeatures instance directly from a RealFeatures instance (like you tried the first time). Check the test1() and test2() methods in the gist, which show the equivalence of using RealFeatures and StreamingRealFeatures in the use-case in question. The reason you were getting weird results when streaming directly is that, in order to start the streaming process, we need to call the start_parser method in the StreamingRealFeatures class. We handle these technicalities internally inside the MMD classes. But when trying to use it directly, we need to invoke that separately (see the test3() method in my attached example).
Please note that the compute_statistic() method doesn't return MMD directly, but rather returns \frac{n_x\times n_y}{n_x+n_y}\times MMD^2 (as mentioned in the doc http://shogun.ml/api/latest/classshogun_1_1CMMD.html). With that in mind, maybe the results you are getting for varying number of samples make sense.
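To compare results across sample sizes, the scaling factor can be divided out again. The helper below is a plain-Python sketch with a hypothetical name. It assumes the statistic is scaled exactly as stated above and that n_x and n_y are the numbers of samples actually used by the estimator, and it does not call Shogun.

```python
def mmd_squared_from_statistic(statistic, n_x, n_y):
    """Undo the n_x*n_y/(n_x+n_y) scaling to recover the MMD^2 estimate."""
    return statistic * (n_x + n_y) / (n_x * n_y)

# Illustrative numbers only: a statistic of 5.0 with 100 samples on each side.
print(mmd_squared_from_statistic(5.0, 100, 100))   # 0.1
```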
Hope it helps.

Python garbage collector for large data formatting program

I have written a program to read through a folder of excel files and load each file into the program. It then takes the data and creates an array of zeros of size (3001,2001), which is iterated through, and the entries at the coordinate values from excel are changed to ones. The array is then reshaped to a size of (1,6005001). I am using tensorflow to reshape the array since the program considers it a tuple, but the final values are stored in a numpy array. Finally, I store the formatted array in a csv file named "filename_Array.csv" and the program moves on to the next excel file to be formatted. I am running Python on Eclipse with tensorflow installed.
The issue I am running into is that some values are being cached in memory, but I can not figure out what it is. I have tried explicitly deleting large variables that will be reinitialized and having gc.collect() to clean the inactive memory that is stored. I am still seeing a steady increase in memory usage until around 25 files formatted, then the computer begins freezing up as all of the RAM on my pc (12GB) is being used. I know that python automatically clears memory for values that are completely unreachable by the program, so I am not sure if this is an issue with fragmenting of RAM or something else.
Sorry for the walls of text, I am just trying to give as much info to the problem as possible.
Here is a link to a screenshot of my performance tab while running the program through about 24 files before I had to terminate the program due to the computer freezing.
Here is my code:
from __future__ import print_function
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
import numpy as np
import csv
import gc
path = r'C:\Users\jeremy.desforges\Desktop\Eclipse\NN_MNIST\VAM SLIJ-II 4.500'
def create_array(g,h,trainingdata,filename):
    # Multiplying by factors of 10 to keep precision of data
    g = g*1000
    h = h*1
    max_g = 3000
    max_h = 2000
    # Initializes an array with zeros to represent a blank graph
    image = np.zeros((max_g+1,max_h+1),dtype=np.int)
    shape = ((max_g+1)*(max_h+1))
    # Fills the blank graph with the input data points
    for i in range(len(h)):
        image[g[i].astype('int'),h[i].astype('int')] = 1
    trainingdata.close()
    image = tf.reshape(image,[-1,shape])
    # Converts tensor objects to numpy arrays to feed into network
    sess = tf.InteractiveSession()
    image = sess.run(image)
    np.savetxt((filename + "_Array.csv"), np.flip(image,1).astype(int), fmt = '%i' ,delimiter=",")
    print(filename, "appended")
    print("size",image.shape)
    print(image,"= output array")
    del image,shape,g,h,filename,sess
    return
# Initializing variables
image = []
shape = 1
g = 1.0
h = 1.0
f = 1
specials = '.csv'
folder = os.listdir(path)
for filename in folder:
    trainingdata = open(filename, "r+")
    filename = str(filename.replace(specials, ''))
    data_read = csv.reader(trainingdata)
    for row in data_read:
        in1 = float(row[0])
        in2 = float(row[1])
        if (f==0):
            z_ = np.array([in1])
            g = np.hstack((g,z_))
            q = np.array([in2])
            h = np.hstack((h,q))
        if (f == 1):
            g = np.array([in1])
            h = np.array([in2])
            f = 0
    create_array(g,h,trainingdata,filename)
    gc.collect()
    image = []
    shape = 1
    g = 1.0
    h = 1.0
    f = 1
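The reshape step described in the question can be done in NumPy alone, which may be worth trying as a way to rule TensorFlow out as the source of the growth: as far as I can tell, each call to create_array adds a new reshape op to the default graph and opens an InteractiveSession that is never closed. The sketch below uses a hypothetical helper name, points_to_row, and assumes the coordinates are already scaled to integer indices. It uses uint8 to keep the array small and omits the np.flip and savetxt steps.

```python
import numpy as np

def points_to_row(g, h, max_g=3000, max_h=2000):
    """Mark the (g, h) points on a blank grid and flatten it to a single row."""
    image = np.zeros((max_g + 1, max_h + 1), dtype=np.uint8)
    # Fancy indexing sets all points at once instead of looping in Python.
    image[np.asarray(g, dtype=int), np.asarray(h, dtype=int)] = 1
    return image.reshape(1, -1)

row = points_to_row([0, 3000], [0, 2000])
print(row.shape)   # (1, 6005001)
print(row.sum())   # 2
```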
