For-loop over multiple files in same directory in Python

I have already checked other questions here about (almost) the same topic, but I did not find anything that solves my problem.
Basically, I have a piece of Python code that opens a file as a data frame and runs some eye-tracking functions on it (PyGaze). I have 1000 files to analyse, and I want a for-loop that executes the code on all of them automatically.
The code is the following:
import os
import glob
import pandas as pd
import matplotlib.pyplot as plt
from detectors import fixation_detection

os.chdir("/Users/Documents//Analyse/Eye movements/Python - Eye Analyse")
directory = '/Users/Documents/Analyse/Eye movements/R - Filtering Data/Filtered_data/Filtered_data_test'

for files in glob.glob(os.path.join(directory, "*.csv")):
    # Load csv
    df = pd.read_csv(files, parse_dates=True)

    # Plot raw data
    plt.plot(df['eye_x'], df['eye_y'], 'ro', c="red")
    plt.ylim([0, 1080])
    plt.xlim([0, 1920])

    # Fixation analysis
    fixations_data = fixation_detection(df['eye_x'], df['eye_y'], df['time'], maxdist=25, mindur=100)
    Efix_data = fixations_data[1]
    numb_fixations = len(Efix_data)  # number of fixations
    fixation_start = [i[0] for i in Efix_data]
    fixation_stop = [i[1] for i in Efix_data]
    fixation = {'start': fixation_start, 'stop': fixation_stop}
    fixation_frame = pd.DataFrame(data=fixation)
    fixation_frame['difference'] = fixation_frame['stop'] - fixation_frame['start']
    mean_fixation_time = fixation_frame['difference'].mean()  # mean fixation time
    final = {'number_fixations': [numb_fixations], 'mean_fixation_time': [mean_fixation_time]}
    final_frame = pd.DataFrame(data=final)

    # Write everything in one document
    final_frame.to_csv("/Users/Documents/Analyse/Eye movements/final_data.csv")
The code runs (no errors), but it only produces results for the first file; it is not run for the other files in the folder.
I do not see where my mistake is.

Your output file name is constant, so it gets overwritten on each iteration of the for-loop. Try the following in place of your final line; it opens the file in "append" mode:
#write everything in one document
with open("/Users/Documents/Analyse/Eye movements/final_data.csv", "a") as f:
    final_frame.to_csv(f, header=False)
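Alternatively, you can collect the per-file results in a list and write a single CSV at the end. A minimal sketch of that approach, assuming the same directory variable and the Efix layout from the question (the 'file' column is my own addition):

import glob
import os
import pandas as pd
from detectors import fixation_detection

results = []
for path in glob.glob(os.path.join(directory, "*.csv")):
    df = pd.read_csv(path, parse_dates=True)
    Efix = fixation_detection(df['eye_x'], df['eye_y'], df['time'], maxdist=25, mindur=100)[1]
    durations = [f[1] - f[0] for f in Efix]  # stop - start, as in the question
    results.append({'file': os.path.basename(path),
                    'number_fixations': len(Efix),
                    'mean_fixation_time': sum(durations) / len(durations) if durations else None})

# a single write at the end, so nothing is overwritten
pd.DataFrame(results).to_csv("/Users/Documents/Analyse/Eye movements/final_data.csv", index=False)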

Related

Create multiple files with the same program but different results

So I'm having a problem with my code:
import random

for i in range(10):
    a = [random.randint(1, 1000) for j in range(10)]
    print(*a, sep=".jpg, ", end=".jpg, \n")
f = open("test.cvs", "x")
f.close()
My current output looks like this:
689.jpg, 715.jpg, 772.jpg, 639.jpg, 903.jpg, 264.jpg, 226.jpg, 629.jpg, 306.jpg,
758.jpg,
458.jpg, 355.jpg, 262.jpg, 889.jpg, 244.jpg, 849.jpg, 613.jpg, 439.jpg, 646.jpg,
766.jpg,
481.jpg, 954.jpg, 192.jpg, 742.jpg, 598.jpg, 373.jpg, 522.jpg, 685.jpg, 404.jpg,
164.jpg,
12.jpg, 202.jpg, 600.jpg, 365.jpg, 635.jpg, 938.jpg, 189.jpg, 492.jpg, 871.jpg,
611.jpg,
67.jpg, 256.jpg, 102.jpg, 587.jpg, 637.jpg, 759.jpg, 252.jpg, 175.jpg, 561.jpg,
965.jpg,
470.jpg, 744.jpg, 897.jpg, 367.jpg, 765.jpg, 455.jpg, 848.jpg, 258.jpg, 615.jpg,
910.jpg,
111.jpg, 344.jpg, 605.jpg, 292.jpg, 511.jpg, 548.jpg, 452.jpg, 836.jpg, 285.jpg,
152.jpg,
582.jpg, 716.jpg, 33.jpg, 387.jpg, 335.jpg, 855.jpg, 487.jpg, 57.jpg, 668.jpg,
41.jpg,
765.jpg, 424.jpg, 196.jpg, 124.jpg, 898.jpg, 549.jpg, 590.jpg, 42.jpg, 944.jpg,
462.jpg,
682.jpg, 728.jpg, 145.jpg, 206.jpg, 246.jpg, 734.jpg, 519.jpg, 618.jpg, 903.jpg,
662.jpg
which is perfect; I want it exactly like that, but I want to create a separate file for every output. It can't be the same result in multiple files. Is it possible to generate the numbers, put the output into a file, then generate the numbers again and put them into a new file?
The goal is to create .cvs files.
Thank you in advance.
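One way to get a file for every output is to build each line as a string and write it to a filename that includes the loop index, so every run of numbers lands in its own file. A minimal sketch (the test_{i}.cvs naming is my own choice, not from the question):

import random

for i in range(10):
    a = [random.randint(1, 1000) for j in range(10)]
    line = ".jpg, ".join(str(n) for n in a) + ".jpg, "
    print(line)
    # a distinct file per iteration keeps the results separate
    with open(f"test_{i}.cvs", "w") as f:
        f.write(line + "\n")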

How do you compile variables and still use them individually?

TL;DR: I'm trying to clean this up, but I'm unsure of the best practice for compiling a list of variables while still writing them on individual lines in the .txt file they're copied to.
This is my first post here.
I've recently created a script to automate an extremely tedious process at work: it modifies an Excel document, copies outputs from specific cells (depending on the type of configuration we are generating), and pastes them into 3 separate .txt files to send out via email.
I've got the script functioning, but I hate how my code looks, and to be honest it is quite a pain to make additions to.
I'm using openpyxl & pycel for this: the cells I copy are outputs from a formula, and I couldn't get anything except #N/A when using openpyxl alone, so I integrated pycel for that piece.
I've referenced my code below, & I appreciate any input.
F62 = format(excel.evaluate('Config!F62'))
F63 = format(excel.evaluate('Config!F63'))
F64 = format(excel.evaluate('Config!F64'))
F65 = format(excel.evaluate('Config!F65'))
F66 = format(excel.evaluate('Config!F66'))
F67 = format(excel.evaluate('Config!F67'))
F68 = format(excel.evaluate('Config!F68'))
F69 = format(excel.evaluate('Config!F69'))
F70 = format(excel.evaluate('Config!F70'))
F71 = format(excel.evaluate('Config!F71'))
F72 = format(excel.evaluate('Config!F72'))
F73 = format(excel.evaluate('Config!F73'))
F74 = format(excel.evaluate('Config!F74'))
F75 = format(excel.evaluate('Config!F75'))
F76 = format(excel.evaluate('Config!F76'))
F77 = format(excel.evaluate('Config!F77'))
#so on and so forth to put into:
with open(f'./GRAK-R-{KIT}/3_GRAK-R-{KIT}_FULL.txt', 'r') as basedone:
    linetest = f"{F62}\n{F63}\n{F64}\n{F65}\n{F66}\n{F67}\n{F68}\n{F69}\n{F70}\n{F71}\n{F72}\n{F73}\n{F74}\n{F75}\n{F76}\n{F77}\n{F78}\n{F79}\n{F80}\n{F81}\n{F82}\n{F83}\n{F84}\n{F85}\n{F86}\n{F87}\n{F88}\n{F89}\n{F90}\n{F91}\n{F92}\n{F93}\n{F94}\n{F95}\n{F96}\n{F97}\n{F98}\n{F99}\n{F100}\n{F101}\n{F102}\n{F103}\n{F104}\n{F105}\n{F106}\n{F107}\n{F108}\n{F109}\n{F110}\n{F111}\n{F112}\n{F113}\n{F114}\n{F115}\n{F116}\n{F117}\n{F118}\n{F119}\n{F120}\n{F121}\n{F122}\n{F123}\n{F124}\n{F125}\n{F126}\n{F127}\n{F128}\n{F129}\n{F130}\n{F131}\n{F132}\n{F133}\n{F134}\n{F135}\n{F136}\n{F137}\n{F138}\n{F139}\n{F140}\n{F141}\n{F142}\n{F143}\n{F144}\n{F145}\n{F146}\n{F147}\n{F148}\n{F149}\n{F150}\n{F151}\n{F152}\n{F153}\n{F154}\n{F155}\n{F156}\n{F157}\n{F158}\n{F159}\n{F160}\n{F161}\n{F162}\n{F163}\n{F164}\n{F165}\n{F166}\n{F167}\n{F168}\n{F169}\n{F170}\n{F171}\n{F172}\n{F173}\n{F174}\n{F175}\n{F176}\n{F177}\n{F178}\n{F179}\n{F180}\n{F181}\n{F182}\n{F183}\n{F184}\n{F185}\n{F186}\n{F187}\n{F188}\n{F189}\n{F190}\n{F191}\n{F192}\n{F193}\n{F194}\n{F195}\n{F196}\n{F197}\n{F198}\n{F199}\n{F200}\n{F201}\n{F202}\n{F203}\n{F204}\n{F205}\n{F206}\n{F207}\n{F208}\n{F209}\n{F210}\n{F211}\n{F212}\n{F213}\n{F214}\n{F215}\n{F216}\n{F217}\n{F218}\n{F219}\n{F220}\n{F221}\n{F222}\n{F223}\n{F224}\n{F225}\n{F226}\n{F227}\n{F228}\n{F229}\n{F230}\n{F231}\n{F232}\n{F233}\n{F234}\n{F235}\n{F236}\n{F237}\n{F238}\n{F239}\n{F240}\n{F241}\n{F242}\n{F243}\n{F244}\n{F245}\n{F246}\n{F247}\n{F248}\n{F249}\n{F250}\n{F251}\n{F252}\n{F253}\n{F254}\n{F255}\n{F256}\n{F257}\n{F258}\n{F259}\n{F260}\n{F261}\n{F262}\n{F263}\n{F264}\n{F265}\n{F266}\n{F267}\n{F268}\n{F269}\n{F270}\n{F271}\n{F272}\n{F273}\n{F274}\n"
    oline = basedone.readlines()
    oline.insert(9, linetest)
    basedone.close()
with open(f'./GRAK-R-{KIT}/3_GRAK-R-{KIT}_FULL.txt', 'w') as basedone:
    basedone.writelines(oline)
basedone.close()
I don't think you need to name every single variable. You can use f-strings and list comprehensions to keep your code flexible.
min_cell = 62
max_cell = 77
column_name = 'F'
sheet_name = 'Config'
cell_names = [f'{sheet_name}!{column_name}{i}' for i in range(min_cell, max_cell + 1)]
vals = [format(excel.evaluate(cn)) for cn in cell_names]
linetest = '\n'.join(vals)
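To cover the full range in the original snippet, max_cell would be 274 rather than 77. The joined block can then be inserted into the file the same way the original does; a sketch, assuming the same KIT variable and insertion index as in the question:

path = f'./GRAK-R-{KIT}/3_GRAK-R-{KIT}_FULL.txt'
with open(path, 'r') as basedone:
    oline = basedone.readlines()
# '\n'.join() leaves no trailing newline, so add one before inserting
oline.insert(9, linetest + '\n')
with open(path, 'w') as basedone:
    basedone.writelines(oline)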

Prevent reading data multiple times using Dask

What can I do to prevent the same file from being read more than once?
For background: I'm trying to read a list of files in a folder, transform the data, output it to a file, and check the gap between before and after the transformation.
First, the reading part:
import glob2
import pandas as pd
import dask.dataframe as dd
from dask import delayed

def load_file(file):
    df = pd.read_excel(file)
    return df

file_list = glob2.glob("folder path here")
future_list = [delayed(load_file)(file) for file in file_list]
read_result_dd = dd.from_delayed(future_list)
After that, I apply some transformations to the data:

def transform(df):
    # do something to df
    return df

transformation_result = read_result_dd.map_partitions(lambda df: transform(df))
I would like to achieve two things.
First, to get the transformation output:
Outputfile = transformation_result.compute()
Outputfile.to_csv("path and param here")
Second, to get the comparison:
read_result_comp = read_result_dd.groupby("groupby param here")["result param here"].sum().reset_index()
transformation_result_comp = transformation_result.groupby("groupby param here")["result param here"].sum().reset_index()
Checker = read_result_dd.merge(transformation_result, on=['header_list'], how='outer').compute()
Checker.to_csv("path and param here")
The problem is that if I compute Outputfile and Checker in sequence, i.e.:
Outputfile = transformation_result.compute()
Checker = read_result_dd.merge(transformation_result, on=['header_list'], how='outer').compute()
Outputfile.to_csv("path and param here")
Checker.to_csv("path and param here")
Dask reads every file twice (once per compute call).
Is there any way to have the reading done only once?
Also, is there any way to run both compute() calls in a single pass? (If I run them as two separate lines, the Dask dashboard shows the first graph run, the dashboard clear, and then the second graph run, instead of both running in a single sequence.)
I cannot call .compute() on the raw read result because the resulting dataframe is too big for my RAM; both the checker and the output file are significantly smaller than the original data.
Thanks
You can call the dask.compute function on multiple Dask collections:
a, b = dask.compute(a, b)
https://docs.dask.org/en/latest/api.html#dask.compute
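Applied to the collections in the question (a sketch using the same variable names), this computes both results from a single pass over the input files, since both graphs share read_result_dd:

import dask

Outputfile, Checker = dask.compute(
    transformation_result,
    read_result_dd.merge(transformation_result, on=['header_list'], how='outer'),
)
Outputfile.to_csv("path and param here")
Checker.to_csv("path and param here")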
In the future, I recommend producing an MCVE (minimal, complete, verifiable example).

PyZDDE: Cannot pass filter string to detector viewer for NSC mode

I'm running NSC raytraces in Zemax, using the Python Zemax DDE server PyZDDE to run through multiple configurations. Ideally I'd like it to run through all model configurations and perform a small amount of analysis, so that I can leave the models processing overnight.
Part of this analysis involves using a filter string to get detector output for a couple of different wavelengths. However, when I try to pass my filter string (in this case 'W2'), I get the error "ValueError: could not convert string to float: W2".
The full error is:
File "C:\ProgramData\Anaconda2\lib\site-packages\pyzdde\zdde.py", line 9397, in zGetDetectorViewer
ret = _zfu.readDetectorViewerTextFile(pyz, textFileName, displayData)
File "C:\ProgramData\Anaconda2\lib\site-packages\pyzdde\zfileutils.py", line 679, in readDetectorViewerTextFile
posX = float(line_list[smoothLineNum + 2].split(':')[1].strip()) # 'Detector X'
ValueError: could not convert string to float: W2
So it looks to me like it's mistaking the filter string for the detector information, but I'm not sure how to fix it!
Solutions I've tried:
- Checking the encoding: I'm using ASCII, but running it in utf-8 hasn't changed the error.
- Running a detector .CFG file generated by Zemax that gives the desired output when not run through pyZDDE.
Minimal working example:
import pyzdde.zdde as pyz
#get rid of any non closed connections possibly hanging around
pyz.closeLink()
#Connect to server
ln = pyz.createLink()
status = ln.zDDEInit()
ln.zSetTimeout(1e5)
filename = 'C:\\...\\Zemax\\Samples\\MultiConfigLens.zmx'
# Load a lens file into the ZEMAX DDE server
ln.zLoadFile(filename)
#Generate config files
configFile0 = 'C:\\...\Zemax\\Samples\\MultiConfigLens_Config1.CFG'
configFile1 = 'C:\\...\Zemax\\Samples\\MultiConfigLens_Config1.CFG'
configFile2 = 'C:\\...\Zemax\\Samples\\MultiConfigLens_Config2.CFG'
ln.zSetDetectorViewerSettings(settingsFile=configFile0, surfNum=1, detectNum=10, showAs=0, scale = 1,dType=4)
ln.zSetDetectorViewerSettings(settingsFile=configFile1, surfNum=1, detectNum=10, zrd='RAYS1.ZRD', dfilter='W1',showAs=0, scale=1, dType=4)
ln.zSetDetectorViewerSettings(settingsFile=configFile2, surfNum=1, detectNum=10, zrd='RAYS1.ZRD', dfilter='W2',showAs=0, scale=1, dType=4)
#perform the ray trace
ln.zNSCTrace(1,0,split = 1,scatter = 1,usePolar = 1,ignoreErrors = 1,randomSeed = 0,save = 1,saveFilename = 'RAYS1.ZRD',timeout = 1e5)
#grab that detector data
data0 = ln.zGetDetectorViewer(configFile0,displayData = True)
data1 = ln.zGetDetectorViewer(configFile1,displayData = True)
data2 = ln.zGetDetectorViewer(configFile2,displayData = True)
As soon as the code gets to data1, it fails and returns the above error message. Any help would be super appreciated!
EDIT: I found the source of the problem, and I'll post the explanation I submitted as a bug report on the PyZDDE GitHub page:
Looking through readDetectorViewerTextFile in zfileutils (around line 678), it appears the function doesn't account for the fact that the text file output by the detector changes when a ray database and a filter string are included in the detector config. This means it attempts to read the filter string as the detector X position, causing the ValueError.
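Until that is fixed upstream, one defensive option when parsing the detector text file yourself is to tolerate non-numeric tokens; a hypothetical sketch of the idea (not PyZDDE's actual code):

def parse_float_field(line, default=None):
    # the failing line in zfileutils calls float() on the text after the colon;
    # a filter string such as 'W2' in that position raises ValueError
    token = line.split(':')[1].strip()
    try:
        return float(token)
    except ValueError:
        return default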

Having an issue with using median function in numpy

I am having an issue with the median function in numpy. The code used to work on a previous computer, but when I tried to run it on my new machine I got the error "cannot perform reduce with flexible type". To try to fix this, I used the map() function to make sure my list held floating-point values, and got this error message: could not convert string to float: .
After some more attempts at debugging, it seems the issue is with the splitting of the lines in my input file. The lines are of the form 2456893.248202,4.490 and I want to split on the ",". However, when I print out the list for the second column of that line, I get
4
.
4
9
0
so it seems to somehow be splitting on each character, though I'm not sure how. The relevant section of code is below; I appreciate any thoughts or ideas, and thanks in advance.
import glob
import numpy as np

def curve_split(fn):
    with open(fn) as f:
        for line in f:
            line = line.strip()
            time, lc = line.split(",")
            # debugging stuff
            g = open('test.txt', 'w')
            l1 = map(lambda x: x + '\n', lc)
            g.writelines(l1)
            g.close()
            # end debugging stuff
            return time, lc

if __name__ == '__main__':
    # place where I keep the lightcurve files from the image subtraction
    dirname = '/home/kuehn/m4/kepler/subtraction/detrending'
    files = glob.glob(dirname + '/*lc')
    print(len(files))
    # in order to create our lightcurve array, we need to know
    # the length of one of our lightcurve files
    lc0 = curve_split(files[0])
    lcarr = np.zeros([len(files), len(lc0)])
    # loop through every file
    for i, fn in enumerate(files):
        time, lc = curve_split(fn)
        lc = map(float, lc)
        # debugging
        print(fn[5:58])
        print(lc)
        print(time)
        # end debugging
        lcm = lc / np.median(float(lc))
        #lcm = ((lc[qual0]-np.median(lc[qual0]))/
        #       np.median(lc[qual0]))
        lcarr[i] = lcm
        print(fn, i, len(files))
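For what it's worth, the character-by-character output above is consistent with lc being a single string such as '4.490': the return sits inside the for-loop, so curve_split exits after the first data line, and iterating over (or mapping across) a string yields its individual characters. A sketch of a curve_split that collects whole columns instead (my own rewrite, not from the original post):

import numpy as np

def curve_split(fn):
    # accumulate every row of the file rather than returning inside the loop
    times, lcs = [], []
    with open(fn) as f:
        for line in f:
            t, v = line.strip().split(",")
            times.append(float(t))
            lcs.append(float(v))
    return np.array(times), np.array(lcs)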
