Edit: I found a solution to my question. In short: follow the openpyxl user manual rather than online tutorials. The tutorials I tried (more than one) raised errors, and their approach differed significantly from the one in the manual. I also ended up not using pandas as much as I thought I would.
I am trying to append certain values in an Excel file with multiple sheets based on user inputs, and then write it back to the Excel file without deleting the rest of the sheets. So far I have tried this, which seems to combine the data, but I didn't quite see how it applies to what I am doing, since I want to update part of one sheet instead of rewriting the whole Excel file. I have also tried a few other things with ExcelWriter, but I don't quite understand it, since it usually wipes all the data in the file (I may be using it wrong).
import openpyxl
import pandas as pd

# episode is a string entered by the user; this reads the matching sheet into a data frame
episode_dataframe = pd.read_excel(r'All_excerpts (Siena Copy)_test.xlsx', sheet_name=episode)

# resources is also a user-entered string; this writes it into the 'Resources'
# column of the corresponding row of the data frame
episode_dataframe.loc[int(pass_num) - 1, 'Resources'] = resources

# open the workbook in append mode ('a') with the openpyxl engine so the other
# sheets are kept; if_sheet_exists='replace' rewrites only this one sheet
# (requires pandas >= 1.3)
with pd.ExcelWriter('All_excerpts (Siena Copy)_test.xlsx', engine='openpyxl',
                    mode='a', if_sheet_exists='replace') as writer:
    episode_dataframe.to_excel(writer, sheet_name=episode, index=False)
Additionally, I have been running into a bunch of other smaller errors. A big one was 'Workbook' object has no attribute 'add_worksheet' even though I'm not trying to add a worksheet (as far as I can tell, that error means pandas defaulted to the xlsxwriter engine instead of openpyxl), and I could not get their solution to work.
I am a bit of a novice at python, so my code might be a bit of a mess.
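For reference, the openpyxl-only route from the user manual (the solution mentioned in the edit above) ends up looking roughly like this. This is a minimal sketch that assumes the sheet has a header row containing a 'Resources' column:

import openpyxl

wb = openpyxl.load_workbook('All_excerpts (Siena Copy)_test.xlsx')
ws = wb[episode]  # episode is the user-entered sheet name

# find the 'Resources' column from the header row (openpyxl is 1-based)
header = [cell.value for cell in ws[1]]
col = header.index('Resources') + 1

# data frame row (pass_num - 1) lives at sheet row (pass_num + 1),
# because row 1 holds the headers
ws.cell(row=int(pass_num) + 1, column=col, value=resources)

# saving rewrites the file but leaves every other sheet intact
wb.save('All_excerpts (Siena Copy)_test.xlsx')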
Related
I have an Excel workbook that uses functions like OFFSET, UNIQUE, and FILTER, which spill into other cells. I'm using Python to analyze and write some data to the workbook, but after doing so these formulas revert to normal arrays. This means they now take up a fixed number of cells (however many they took up before opening the file in Python) instead of adjusting to fit all of the data. I can revert the change by selecting the formula and hitting Enter, but there are so many of these formulas that it's more work to fix them than to just print the data to a text file and paste it into Excel manually. Is there any way to prevent this behavior?
I've been using openpyxl to open and save the workbook, but after encountering this issue I also tried xlsxwriter and the DataFrame.to_excel function from pandas. Both had the same issue as openpyxl. For context, I am on Python 3.11 and using the most recent version of these modules. I believe this issue is on the Python side and not the Excel side, so I don't think changing Excel settings will help, but maybe there is something there I missed.
Example:
I've created an empty workbook with two sheets, one called 'main' and one called 'input'. The 'main' sheet will analyze data from the 'input' sheet which will be entered with openpyxl. The data will just be values in the first column.
In cell A1 of the 'main' sheet, enter =OFFSET(input!A1,0,0,COUNTA(input!A:A),1).
This formula will just show a copy of the data. Since there currently isn't any data it gives a #REF! error, so it only takes up one cell.
Now I'll run the following python code to add the numbers 0-9 into the first column of the input sheet:
from openpyxl import load_workbook

wb = load_workbook('workbook.xlsx')
ws = wb['input']

# append the numbers 0-9 as single-cell rows in column A
for i in range(10):
    ws.append([i])

wb.save('workbook_2.xlsx')
When opening the new file, cell A1 on the 'main' sheet only has the first value, 0, instead of the range 0-9. When selecting the cell, you can see the formula is now {=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)}. The curly brackets make it a legacy array formula, so it won't spill. Hitting Enter in the formula bar removes the array and the sheet properly shows the full range.
If I can get this simple example to work, then expanding it to the data I'm using shouldn't be a problem.
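One idea I've seen suggested (untested, so treat it as an assumption): after loading with openpyxl, the spilled formula comes back as a legacy array formula, so rewriting the cell with the plain formula string strips the array metadata, and Excel 365 may then re-evaluate it as a dynamic (spilling) formula on open:

from openpyxl import load_workbook

wb = load_workbook('workbook.xlsx')
ws = wb['input']
for i in range(10):
    ws.append([i])

# rewrite the affected formula as a plain string; this drops the legacy-array
# metadata that openpyxl preserved, so Excel may treat it as dynamic again
main = wb['main']
main['A1'] = '=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)'

wb.save('workbook_2.xlsx')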
First post here.
I am very new to programming, sorry if this is confusing.
I made a database by collecting multiple different data online. All these data are in one xlsx file (one column per variable), which I converted to csv afterwards, because my teacher only showed us how to use csv files in Python.
I installed pandas and made it read my csv file, but it seems it doesn't understand that I have multiple columns; it reads everything as one column. Thus, I can't get the info on each variable (and so I can't transform the data).
I tried df.info() and df.info(verbose=True, show_counts=True) but they show the same thing.
len(df.columns) = 1, which proves it doesn't see that each variable has its own column.
len(df) = 1923, which is right.
I was expecting this: https://imgur.com/a/UROKtxN (different project, not the same database)
Database used: https://imgur.com/a/Wl1tsYb
And I got this instead: https://imgur.com/a/iV38YNe
Database used: https://imgur.com/a/VefFrL4
I don't know, it looks pretty similar, so why doesn't it work? :((
Thanks.
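A likely culprit (a guess, since the raw file isn't shown): Excel in some locales exports CSV with semicolons as separators, while pd.read_csv defaults to commas, so the whole header lands in a single column. A minimal sketch of the check and the fix:

import pandas as pd

# peek at the raw first line to see what the delimiter actually is
with open('database.csv', encoding='utf-8') as f:   # 'database.csv' is a placeholder name
    print(f.readline())

# if the fields are separated by ';', tell pandas explicitly
df = pd.read_csv('database.csv', sep=';')

# or let pandas sniff the delimiter itself (needs the python engine)
df = pd.read_csv('database.csv', sep=None, engine='python')

print(len(df.columns))   # should now match the real number of columns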
Background:
I am creating a program that will need to keep track of what it has run and when, and what it has sent and to whom. The logging module in Python doesn't appear to accomplish what I need, but I'm still pretty new to this, so I may be wrong. Alternative solutions that accomplish the same end are also welcome.
The program will need to take in a data file (preferably .xlsx or .csv) which will be formatted as something like this (the Nones will need to be filled in by the program):
| Run_ID | Date_Requested | Time_Requested | Requestor | Date_Completed | Time_Completed |
|--------|----------------|----------------|-----------|----------------|----------------|
| R_423h | 9/8/2022 | 1806 | email#email.com | None | None |
The program will then need to compare the Run_IDs from the log to the new run IDs provided (in a .csv) in a format similar to the table above, i.e.:
| ResponseId |
|------------|
| R_jals893 |
| R_hejl8234 |
I can compare the IDs myself, but the issue then becomes that the program will need to update the log with the new IDs it has run, along with the times they were run, the emails, and so on, and then resave the log file. I'm sure this is easy, but it's throwing me for a loop.
My code:
# read the existing log and the new requests
log = pd.read_excel('run_log.xlsx', usecols=None, parse_dates=True)
new_run_requests = pd.read_csv('Run+Request+Sheet_September+6,+2022_14.28.csv', parse_dates=True)

old_runs = log.Run_ID[:]
new_runs = new_run_requests.ResponseId[:]

# this assignment aligns on the existing index, so rows beyond the current
# length of log are silently dropped
log['Run_ID'] = pd.concat([old_runs, new_runs], ignore_index=True)
After this, the dataframe does not change.
This is one of the two or three things I have tried. Suggestions are appreciated!
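A working version appends whole rows instead of assigning into a single column. A minimal sketch, assuming the column names from the tables above (the date/time formats are guesses):

import pandas as pd
from datetime import datetime

log = pd.read_excel('run_log.xlsx')
new_run_requests = pd.read_csv('Run+Request+Sheet_September+6,+2022_14.28.csv')

# keep only the response IDs that are not in the log yet
mask = ~new_run_requests['ResponseId'].isin(log['Run_ID'])
now = datetime.now()

# build whole rows for the new runs; columns not set here (Requestor, etc.)
# simply come out empty and can be filled in the same way
new_rows = pd.DataFrame({
    'Run_ID': new_run_requests.loc[mask, 'ResponseId'].values,
    'Date_Completed': now.strftime('%m/%d/%Y'),
    'Time_Completed': now.strftime('%H%M'),
})

# concatenate whole DataFrames so the log actually grows, then resave it
log = pd.concat([log, new_rows], ignore_index=True)
log.to_excel('run_log.xlsx', index=False)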
I am racking my brain here and have read a lot of tutorials, sites, sample code, etc. Something is not clicking for me.
Here is my desired end state.
Select data from MSSQL - Sorted, not a problem
Open an Excel template (xlsx file) - Sorted, not a problem
Export data to this Excel template and save it with a different name - PROBLEM.
What I have achieved so far: (this works)
I can extract data from DB.
I can write that data to Excel using pandas; my line of code for doing that is: pd.read_sql(script, cnxn).to_excel(filename, sheet_name="Sheet1", startrow=19, encoding="utf-8")
The filename variable is a new file name that I create every time the for loop runs.
What my challenge is:
The data needs to be exported to a predefined template (the template has formatting that must be present in every file).
I can open the file and I can write to the file, but I do not know how to save that file with a different name on every iteration of the for loop.
In my for loop I use this code:
# this does not work
df = pd.read_sql(script, cnxn)       # the query result needs to be assigned
writer = pd.ExcelWriter(SourcePath)  # opens the source document
df.to_excel(writer)
writer.save()                        # how do I save under a different file name?
Your help would be highly appreciated.
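One way to handle the save-as part (a sketch, where template.xlsx, output_{i}.xlsx, and scripts are assumed placeholder names, not from the original post) is to copy the template to a fresh file each iteration and append the data into the copy:

import shutil
import pandas as pd

# scripts is a placeholder for whatever drives your for loop
for i, script in enumerate(scripts):
    df = pd.read_sql(script, cnxn)

    # copy the formatted template to a new name for this iteration
    out_name = f'output_{i}.xlsx'               # placeholder naming scheme
    shutil.copyfile('template.xlsx', out_name)  # 'template.xlsx' is a placeholder

    # append into the copy so the template formatting survives;
    # if_sheet_exists='overlay' writes into the existing sheet (pandas >= 1.4)
    with pd.ExcelWriter(out_name, engine='openpyxl', mode='a',
                        if_sheet_exists='overlay') as writer:
        df.to_excel(writer, sheet_name='Sheet1', startrow=19, index=False)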
Your method can work. The problem is that you don't need to write the data into the Excel file right after you read it from the database. My suggestion is to first read the data into separate data frames:

df1 = pd.read_sql(script1, cnxn)  # script1..script3: your separate queries (names assumed)
df2 = pd.read_sql(script2, cnxn)
df3 = pd.read_sql(script3, cnxn)

You can then write all the data frames together to an Excel file. You can refer to this link.
I hope this solution can help you. Have a nice weekend.
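The combined write step could then look something like this (a sketch; the file and sheet names are made up):

# write each frame to its own sheet of a single file
with pd.ExcelWriter('combined.xlsx') as writer:   # 'combined.xlsx' is a placeholder
    df1.to_excel(writer, sheet_name='first', index=False)
    df2.to_excel(writer, sheet_name='second', index=False)
    df3.to_excel(writer, sheet_name='third', index=False)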
If I call a sheet by name, the get_all_values function always gives me an empty list for a sheet that is definitely not empty.
import gspread

gc = gspread.service_account()     # assumes service-account credentials
workbook = gc.open('my workbook')  # placeholder workbook name
sheet = workbook.worksheet(sheet_name)
all_rows_list = sheet.get_all_values()
The only time get_all_values seems to return like it should is if I do the following:
all_rows_list = workbook.sheet1.get_all_values()
But the above works only for the first sheet and no other, which is kind of useless for a workbook with more sheets.
What always works is reading row by row like
one_row_list = sheet.row_values(1) # first row
But the problem is that I'm trying to read a relatively big workbook with lots of sheets to figure out where I'm supposed to start writing, and it looks like reading row by row triggers a "RESOURCES EXHAUSTED" error very fast.
So, am I doing something wrong or is get_all_values broken in gspread?
EDIT:
Added a screenshot.
gspread doesn't work well with sheets whose names could be confused with a cell reference in A1 notation (like X101 and AT8 in your case).
https://github.com/burnash/gspread/issues/554 is an older issue that describes the underlying problem (the symptoms in that issue are different, but I'm pretty sure the root problem is the same).
I'll copy the workaround you discovered yourself, which is to request an explicit range:

ws.range("A1:C" + str(end_row))

That end_row is usually the row_count of the sheet.