I have a Jupyter notebook where I run the same simulation with many different combinations of parameters (essentially, to simulate different versions of the environment and their effect on the results). Let's say that the result of each run is an image and a 2D array of all relevant metrics for my system. I want to keep the images in the notebook, but save all the arrays in one place so that I can work with them later if needed.
Ideally I would save them into an external file with the following format:
'Experiment environment version i' (or some other description)
2D array
and every time I run a new simulation (a new cell), the results would be added to this file until I close it.
Any ideas on how to create such an external summary file?
If you have Excel available, you could use pandas to write the results to a spreadsheet (or use pandas to write to a CSV file). See the pandas ExcelWriter documentation, but essentially you would do the following to append each result as a new sheet:
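import pandas as pd

# results: a list of 2D arrays, one per experiment
# mode='a' appends to an existing results.xlsx and requires the openpyxl engine
for i, array in enumerate(results):
    df = pd.DataFrame(array)
    with pd.ExcelWriter('results.xlsx', mode='a', engine='openpyxl') as writer:
        df.to_excel(writer, sheet_name='Result' + str(i))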
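You will need to have each array in a DataFrame (df); pd.DataFrame(array) does the conversion, and there are lots of tutorials on getting an array into pandas. Note that mode='a' only works if results.xlsx already exists.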
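For the CSV route, one way to keep every run in a single file is a long table with a label column, appended with to_csv(mode="a"). This is a sketch assuming all arrays have the same number of columns; append_result_csv, load_result_csv and the metric_<j> column names are hypothetical.

```python
import os

import numpy as np
import pandas as pd


def append_result_csv(path, label, array):
    """Append one run's 2D array to a shared CSV file, tagged with its label."""
    df = pd.DataFrame(np.atleast_2d(array))
    # Hypothetical column names; replace them with the real metric names
    df.columns = [f"metric_{j}" for j in range(df.shape[1])]
    df.insert(0, "experiment", label)
    # Write the header row only when the file is created
    df.to_csv(path, mode="a", header=not os.path.exists(path), index=False)


def load_result_csv(path, label):
    """Return the rows stored under `label` as a NumPy array."""
    df = pd.read_csv(path)
    return df[df["experiment"] == label].drop(columns="experiment").to_numpy()
```

Calling append_result_csv at the end of each notebook cell adds that run to the same file; load_result_csv pulls one run back out later.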
After a bit of trial and error, here is a general answer on how to write to a .txt file (without pandas; otherwise see @jaybeesea's answer):
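import sys
import numpy as np
with open("filename.txt", "a+") as f:
    f.write("Comment 1\n")
    # threshold=sys.maxsize stops NumPy from abbreviating large arrays with "..."
    f.write("%s\n" % np.array2string(array, separator=' , ', threshold=sys.maxsize))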
Every time you run it, the comment and the array are appended to the end of filename.txt.
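A variant of the text-file approach, assuming each label fits on one line: np.savetxt accepts an already-open file and writes its header argument as a line prefixed with "# ", so each run becomes a labelled block that can be parsed back later. Unlike np.array2string with its default settings, it does not abbreviate large arrays with "...". append_result and load_results are hypothetical helper names.

```python
import numpy as np


def append_result(path, label, array):
    """Append a labelled 2D array block to a plain-text summary file."""
    with open(path, "a") as f:
        # Header line "# <label>", then one comma-separated row per line
        np.savetxt(f, np.atleast_2d(array), delimiter=",", header=label, comments="# ")


def load_results(path):
    """Parse the file back into {label: 2D array}, in the order written."""
    results, label, rows = {}, None, []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("# "):
                # A new header closes the previous block
                if label is not None:
                    results[label] = np.array(rows, dtype=float)
                label, rows = line[2:], []
            elif line:
                rows.append([float(x) for x in line.split(",")])
    if label is not None:
        results[label] = np.array(rows, dtype=float)
    return results
```

Repeated labels overwrite earlier entries in the returned dict, so a unique description per run is assumed.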
I have an Excel workbook that uses functions like OFFSET, UNIQUE, and FILTER, which spill into other cells. I'm using Python to analyze and write some data to the workbook, but after doing so these formulas revert into normal arrays. This means they now take up a fixed number of cells (however many they took up before the file was opened in Python) instead of adjusting to fit all of the data. I can revert the change by selecting the formula and pressing Enter, but there are so many of these formulas that it's more work to fix them than to print the data to a text file and paste it into Excel manually. Is there any way to prevent this behavior?
I've been using openpyxl to open and save the workbook, but after encountering this issue I also tried xlsxwriter and the DataFrame to_excel function from pandas. Both of them had the same issue as openpyxl. For context, I am on Python 3.11 and using the most recent versions of these modules. I believe this issue is on the Python side and not the Excel side, so I don't think changing Excel settings will help, but maybe there is something I missed.
Example:
I've created an empty workbook with two sheets, one called 'main' and one called 'input'. The 'main' sheet will analyze data from the 'input' sheet which will be entered with openpyxl. The data will just be values in the first column.
In cell A1 of the 'main' sheet, enter =OFFSET(input!A1,0,0,COUNTA(input!A:A),1).
This formula will just show a copy of the data. Since there currently isn't any data it gives a #REF! error, so it only takes up one cell.
Now I'll run the following python code to add the numbers 0-9 into the first column of the input sheet:
from openpyxl import load_workbook

wb = load_workbook('workbook.xlsx')
ws = wb['input']
for i in range(10):
    ws.append([i])
wb.save('workbook_2.xlsx')
When I open the new file, cell A1 on the 'main' sheet only shows the first value, 0, instead of the range 0-9. When selecting the cell, you can see the formula is now {=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)}. The curly braces mark it as a legacy array formula, so it won't spill. Pressing Enter in the formula bar removes the braces, and the formula then spills over the full range.
If I can get this simple example to work, then expanding it to the data I'm using shouldn't be a problem.
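One possible workaround, assuming the workbook can be regenerated from Python instead of edited in place (XlsxWriter cannot modify an existing file), is XlsxWriter's write_dynamic_array_formula, which stores the formula as a spilling dynamic array rather than a legacy {=...} array formula. build_workbook is a hypothetical helper that recreates the example above.

```python
import xlsxwriter


def build_workbook(path, values):
    """Recreate the 'main'/'input' example from scratch with a spilling formula."""
    wb = xlsxwriter.Workbook(path)
    main = wb.add_worksheet("main")
    inp = wb.add_worksheet("input")
    # Data goes in the first column of the 'input' sheet
    for row, value in enumerate(values):
        inp.write_number(row, 0, value)
    # Written as a dynamic array formula, so Excel should spill it
    # instead of showing a fixed-size {=...} array
    main.write_dynamic_array_formula("A1:A1", "=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)")
    wb.close()
```

For example, build_workbook('workbook_2.xlsx', range(10)) writes the numbers 0-9 and the formula in one pass.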
Edit: I found a solution to my question: read the openpyxl user manual instead of online tutorials. The tutorials I tried (more than one) raised errors, and their approach was significantly different from the one in the user manual. I also ended up using pandas much less than I thought I would need to.
I am trying to update certain values in an Excel file with multiple sheets based on user input and then write the result back to the Excel file (without deleting the other sheets). So far I have tried this, which seems to combine the data, but I didn't quite see how it applies to what I am doing, since I want to update part of one sheet instead of rewriting the whole Excel file. I have also tried a few other things with ExcelWriter, but I don't quite understand it, since it usually wipes all the data in the file (I may be using it wrong).
episode_dataframe = pd.read_excel (r'All_excerpts (Siena Copy)_test.xlsx', sheet_name=episode)
#episode is a specified string inputted by user, this line makes a data frame for the specified sheet
episode_dataframe.loc[(int(pass_num) - 1), 'Resources'] = resources
#resources is also a user inputted string, it's what I am trying to append the spreadsheet cell value to, this appends to corresponding data frame
path_R = open("All_excerpts (Siena Copy)_test.xlsx", "rb")
with pd.ExcelWriter(path_R) as writer:
    writer.book = openpyxl.load_workbook(path_R)
    #I copied this from here, i think it should make the writer for the to_excel? I don't fully know
    episode_dataframe.to_excel(writer, sheet_name=episode, engine=openpyxl, if_sheet_exsits ='replace')
    #this should write the sheet data frame onto the file, but I don't want it to delete the other sheets
Additionally, I have been running into a bunch of other smaller errors. A big one was 'Workbook' object has no attribute 'add_worksheet', even though I'm not trying to add a worksheet. I also could not get their solution to work.
I am a bit of a novice at python, so my code might be a bit of a mess.
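Following the openpyxl route from the edit above, a minimal sketch: load the workbook, change one cell in one sheet, and save, so the other sheets are written back as they were. It assumes the column headers are in row 1 (as pd.read_excel expects by default), so DataFrame row int(pass_num) - 1 corresponds to worksheet row int(pass_num) + 1. set_cell_by_header is a hypothetical helper, and openpyxl does not preserve every workbook feature (its documentation warns that shapes are lost when a file is loaded and saved).

```python
from openpyxl import load_workbook


def set_cell_by_header(path, sheet_name, row_number, column_header, value):
    """Write `value` into one cell of one sheet; other sheets are saved unchanged.

    row_number counts data rows from 1 (row 1 = first row under the header).
    """
    wb = load_workbook(path)
    ws = wb[sheet_name]
    # Find the column whose header in the first row matches column_header
    headers = [cell.value for cell in ws[1]]
    col = headers.index(column_header) + 1
    ws.cell(row=row_number + 1, column=col, value=value)
    wb.save(path)
```

With the variables from the question, the call would be set_cell_by_header('All_excerpts (Siena Copy)_test.xlsx', episode, int(pass_num), 'Resources', resources).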
I'm a student who is quite new to coding in Python.
I have been using Dymola for several years, and now I'm using the Dymola/Python interface, with which you can operate Dymola from inside Python (useful for building stock simulations, global sensitivity analysis, etc.).
Dymola always generates .mat files with an efficient but unreadable data structure. How can I export only the variables I'm interested in from that .mat file to .csv using a Python script? (I don't want the whole file converted to .csv because it is simply way too large.)
I'm aware of the DyMat package for Python, which should do the job, but either I don't understand the code or the code is not doing what it should. Does anybody have experience with this?
Am I missing some code that defines which .mat file should be read, which variables I want, and in which directory the result .csv file should be stored?
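import csv
import numpy
import DyMat

def export(dm, varList, fileName=None, formatOptions=None):
    """Export the selected DyMat variables to a CSV file"""
    if not fileName:
        fileName = dm.fileName + '.csv'
    # newline='' prevents blank lines between rows on Windows
    with open(fileName, 'w', newline='') as oFile:
        csvWriter = csv.writer(oFile)
        # Group the requested variables by the data block they are stored in
        vDict = dm.sortByBlocks(varList)
        for vList in vDict.values():
            vData = dm.getVarArray(vList)
            # Header row: abscissa (time) name followed by the variable names
            vList.insert(0, dm._absc[0])
            csvWriter.writerow(vList)
            csvWriter.writerows(numpy.transpose(vData))

# Usage: the .mat file, variable names and output directory are placeholders
dm = DyMat.DyMatFile('dsres.mat')
export(dm, ['x', 'y', 'z'], fileName='results/result.csv')  # 'results' must already exist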
Thanks!
The Dymola distribution includes a utility called alist.exe that lets you export a selection of variables in CSV format.
Another possibility is to convert the MAT file to SDF format, which is a very simple HDF5 interpretation. The HDF5 file is not as compact as the MAT-file, but you can compress the file using ZIP/GZIP/7ZIP to reduce archival storage. There are both MATLAB and Python scripts for reading the SDF format in the Dymola distribution.
Since this was tagged openmodelica, I am proposing a solution using it:
filterSimulationResults("file.mat", "file.csv", {"x","y","z"}) creates a csv-file with only variables x, y, z (If you think it's still too large, it is possible to resample the file).
For small files (<2 GB), BuildingsPy (or other Python packages) covers pretty much all needs: https://simulationresearch.lbl.gov/modelica/buildingspy/
However, files above 2 GB (e.g. full-year simulations) cause problems with these tools (filterSimulationResults from OpenModelica also fails then), so alist.exe from Dymola may be used instead.
alist.exe seems to accept up to approximately 100 variables per call, and running it once per variable slows things down drastically (extracting 1 or 100 rows takes almost the same time). You can call alist.exe from Python in batches, as follows, to automate the process and speed it up.
import os
import pandas as pd

alist = r'C:/Program Files/Dymola 2021/bin64/alist.exe'  # adjust to your Dymola installation
inFile = 'dsres.mat'  # placeholder: the .mat result file to read
var_list = ['Component.Name1', 'Component.Name3', 'Component.Name2']  # variables to extract; add the rest here
N_batch = 100  # number of variables extracted from the .mat file at once (max. approx. 110)

# Build one " -e var1 -e var2 ..." argument string per batch of N_batch variables
cmds = []
for start in range(0, len(var_list), N_batch):
    cmds.append(''.join(f' -e {var}' for var in var_list[start:start + N_batch]))

lst_df = []  # list of pandas DataFrames, one per batch
for cmd in cmds:
    os.system(f'"{alist}" {cmd} {inFile} tmp.csv')
    lst_df.append(pd.read_csv('tmp.csv', index_col=[0]).squeeze("columns"))
df_overall = pd.concat(lst_df, axis=1)
df_overall.to_csv('CompleteCSVFile.csv')  # or use .pkl for more efficient writing and reading
It is still not a fast solution, but it makes it possible to process the data in the first place. Dymola's variable selection should always be used first, before trying to wrangle such amounts of data on a local machine.
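If the Dymola path or the .mat file name contains spaces, passing an argument list to subprocess.run avoids the shell quoting that os.system needs. A sketch of the batching step under that assumption; the helper names are hypothetical, and the argument order (-e <variable> repeated, then the input file, then the output file) is taken from the command in the answer above.

```python
import subprocess


def alist_batches(var_list, n_batch=100):
    """Split var_list into consecutive chunks of at most n_batch names."""
    return [var_list[i:i + n_batch] for i in range(0, len(var_list), n_batch)]


def alist_args(alist_path, variables, in_file, out_csv):
    """Arguments for one alist.exe call: '-e <var>' per variable, then input and output."""
    args = [alist_path]
    for var in variables:
        args += ["-e", var]
    return args + [in_file, out_csv]


def run_alist_batches(alist_path, var_list, in_file, n_batch=100):
    """Run alist.exe once per batch and return the CSV file names it wrote."""
    out_files = []
    for k, batch in enumerate(alist_batches(var_list, n_batch)):
        out_csv = f"tmp_{k}.csv"
        # A list of arguments needs no shell quoting, even for paths with spaces
        subprocess.run(alist_args(alist_path, batch, in_file, out_csv), check=True)
        out_files.append(out_csv)
    return out_files
```

The returned files can then be combined as before, e.g. pd.concat([pd.read_csv(f, index_col=0) for f in out_files], axis=1).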
Hope this helps someone someday!
I am creating a dataframe with a bunch of calculations and adding new columns using these formulas (calculations). Then I am saving the dataframe to an Excel file.
After I save the file and open it again, the formulas are gone; only the computed values remain.
For example, I am using something like:
total = 16
for s in range(total):
    df_summary['Slopes(avg)' + str(s)] = df_summary[['Slope_S' + str(s)]].mean(axis=1)*df_summary['Correction1']/df_summary['Correction2'].mean(axis=1)
How can I make sure this formula appears in my excel file I write to, similar to how we have a formula in an excel worksheet?
You can write formulas to an Excel file using the XlsxWriter module's .write_formula() method: https://xlsxwriter.readthedocs.org/worksheet.html#worksheet-write-formula. If you're not attached to using an Excel file to store your DataFrame, you might want to look into the pickle module.
import pickle

# to save
with open('saved_df.p', 'wb') as f:
    pickle.dump(df, f)
# to load
with open('saved_df.p', 'rb') as f:
    df = pickle.load(f)
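pandas also wraps this in DataFrame.to_pickle and pd.read_pickle, which take a file path directly. A short sketch with a placeholder DataFrame and a file in the temporary directory; as with any pickle, only load files you trust.

```python
import os
import tempfile

import pandas as pd

# Placeholder data standing in for the real DataFrame
df = pd.DataFrame({"Correction1": [1.0, 2.0], "Correction2": [0.5, 4.0]})
path = os.path.join(tempfile.gettempdir(), "saved_df.p")

# to save
df.to_pickle(path)
# to load
df_loaded = pd.read_pickle(path)
```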
I think my answer here may be responsive. The short of it is you need to use openpyxl (or possibly xlrd if they've added support for it) to extract the formula, and then xlsxwriter to write the formula back in. It can definitely be done.
This assumes, of course, as @jay s pointed out, that you first write Excel formulas into the DataFrame. (This solution is an alternative to pickling.)
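A sketch of writing the formula into the DataFrame itself, so Excel recalculates it: each cell holds a string such as =A2*B2/C2, built from the column positions. It assumes the sheet is written with index=False starting at A1 (header in row 1, data from row 2); excel_col and row_formulas are hypothetical helpers, and the formula is a simplified per-row version of the question's calculation. pandas' Excel writers should store strings that begin with "=" as formulas.

```python
import pandas as pd


def excel_col(idx):
    """0-based column position -> Excel column letters (0 -> 'A', 26 -> 'AA')."""
    letters = ""
    idx += 1
    while idx:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def row_formulas(df, numerator, factor, denominator):
    """One '=num*factor/denom' formula per data row, using cell addresses."""
    cols = {name: excel_col(j) for j, name in enumerate(df.columns)}
    # Header is in row 1, so DataFrame row i is worksheet row i + 2
    return [
        f"={cols[numerator]}{r}*{cols[factor]}{r}/{cols[denominator]}{r}"
        for r in range(2, len(df) + 2)
    ]


df_summary = pd.DataFrame(
    {"Slope_S0": [1.0, 2.0], "Correction1": [0.5, 0.25], "Correction2": [2.0, 4.0]}
)
df_summary["Slopes(avg)0"] = row_formulas(df_summary, "Slope_S0", "Correction1", "Correction2")
# df_summary.to_excel("summary.xlsx", index=False)  # strings starting with "=" become formulas
```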
I have multiple files which I process using Numpy and SciPy, but I am required to deliver an Excel file. How can I efficiently copy/paste a huge numpy array to Excel?
I have tried converting to a pandas DataFrame, which has the very useful function to_clipboard(excel=True), but I spend most of my time converting the array into a DataFrame.
I cannot simply write the array to a CSV file then open it in excel, because I have to add the array to an existing file; something very hard to achieve with xlrd/xlwt and other Excel tools.
My best solution here is to turn the array into a string, then use win32clipboard to send it to the clipboard. This is not a cross-platform solution, but then again, Excel is not available on every platform anyway.
Excel uses tabs (\t) to mark column change, and \r\n to indicate a line change.
The relevant code would be:
import win32clipboard as clipboard

def toClipboardForExcel(array):
    """
    Copies an array into a string format acceptable by Excel.
    Columns separated by \t, rows separated by \r\n
    """
    # Create string from array
    line_strings = []
    for line in array:
        line_strings.append("\t".join(line.astype(str)).replace("\n", ""))
    array_string = "\r\n".join(line_strings)

    # Put string into clipboard (open, clear, set, close)
    clipboard.OpenClipboard()
    clipboard.EmptyClipboard()
    clipboard.SetClipboardText(array_string)
    clipboard.CloseClipboard()
I have tested this code with random arrays of shape (1000,10000) and the biggest bottleneck seems to be passing the data to the function. (When I add a print statement at the beginning of the function, I still have to wait a bit before it prints anything.)
EDIT: The previous paragraph describes my experience in Python Tools for Visual Studio, where the print statement seems to be delayed. From a plain command-line interface, the bottleneck is in the loop, as expected.
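The string-building step can also be delegated to np.savetxt, which writes to any file-like object: with delimiter="\t" and newline="\r\n" it produces the format Excel pastes, and it writes a 1D array as a single column. This is a sketch, not a benchmarked replacement for the loop; array_to_excel_text is a hypothetical helper, and the clipboard calls stay the same.

```python
import io

import numpy as np


def array_to_excel_text(array, decimal="."):
    """Tab-separated, CRLF-terminated text that Excel pastes into separate cells."""
    buf = io.StringIO()
    # fmt="%s" uses each value's str(); a 1D array is written as one column
    np.savetxt(buf, np.asarray(array), fmt="%s", delimiter="\t", newline="\r\n")
    text = buf.getvalue()
    if decimal != ".":
        # e.g. decimal="," for locales that use a decimal comma
        text = text.replace(".", decimal)
    return text
```

The result can be passed straight to clipboard.SetClipboardText; note that every row, including the last, ends with "\r\n".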
import pandas as pd
pd.DataFrame(arr).to_clipboard()
I think this is one of the easiest ways, using the pandas package.
If I needed to process multiple files loaded into Python and then write them to Excel, I would probably build some tools using xlwt.
That said, may I offer my recipe "Pasting python data into a spread sheet", open for any edits, complaints or feedback? It uses no third-party libraries and should be cross-platform.
As of today, you can also use xlwings. It's open source, and fully compatible with Numpy arrays and Pandas DataFrames.
I extended PhilMacKay's answer to:
- include 1-dimensional arrays, and
- allow commas as the decimal separator (decimal=","):
import numpy as np
import win32clipboard as clipboard

def to_clipboard(array, decimal=","):
    """
    Copies an array into a string format acceptable by Excel.
    Columns separated by \t, rows separated by \r\n.
    1D arrays are pasted as a single column.
    """
    array = np.asarray(array)
    # Create string from array
    if array.ndim == 2:
        line_strings = ["\t".join(line.astype(str)).replace("\n", "") for line in array]
    else:
        line_strings = list(array.astype(str))
    array_string = "\r\n".join(line_strings)
    if decimal == ",":
        array_string = array_string.replace(".", ",")

    # Put string into clipboard (open, clear, set, close)
    clipboard.OpenClipboard()
    clipboard.EmptyClipboard()
    clipboard.SetClipboardText(array_string)
    clipboard.CloseClipboard()
You could also look into the PyXLL project.