Add data in the next empty row in python pandas - python

I'm making a small, simple program that puts one name under another in an Excel file, and I don't know how to get the next empty row.
I have this excel table:
Name
Carl
And I'm making a program to add new names. Here is the function:
def modifyexcel():
    book = openpyxl.load_workbook(r'C:\Users\usuario\Desktop\prueba.xlsx')
    sheet = book["a"]
    sheet["a3"] = str(entrada1.get())
    book.save(r'C:\Users\usuario\Desktop\prueba.xlsx')
But instead of modifying the fixed cell "a3", I need to modify the next empty row, so that every time I add a new name it is placed in the next empty row.
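For the openpyxl approach already used in the question, `Worksheet.append` writes to the first row after the last one containing data, and `sheet.max_row + 1` gives that row's index explicitly. A minimal sketch, using an in-memory workbook that mirrors the table from the question so it runs standalone (with the real file you would keep `load_workbook` and `book.save` as in the original function):

```python
import openpyxl

# Stand-in for prueba.xlsx: a sheet named "a" with a header and one name.
book = openpyxl.Workbook()
sheet = book.active
sheet.title = "a"
sheet["A1"] = "Name"
sheet["A2"] = "Carl"

# append() always targets the next empty row, so repeated calls keep
# stacking names under each other.
sheet.append(["new name"])

print(sheet["A3"].value)
```

Equivalently, `sheet.cell(row=sheet.max_row + 1, column=1, value=...)` writes to the next empty row by explicit index, which is handy if you need that row number for anything else.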

You can just use Google Colab to modify your Excel file!
Mount your CSV or Excel file on Google Drive, or upload it via the sidebar,
then copy its path into pandas `read_csv` (`read_excel` works the same way):
https://colab.research.google.com/
from google.colab import files
import pandas as pd

# First, read from your original file
df = pd.read_csv('path')
list1 = df.name.tolist()

# Then create a variable to store the new name
name = "new name"  # @param {type:"string"}

# Append the new name to the list and rebuild the column; the list is
# now one element longer than the original index, so assigning it back
# to the existing column directly would fail.
list1.append(name)
df = pd.DataFrame({'name': list1})

# Output to CSV and download
df.to_csv('newsheet.csv', index=False)
files.download('newsheet.csv')

Related

Removing the Indexed Column when Merging 2 Excel Spreadsheets into a new Sheet in an existing Excel Spreadsheet using Pandas

I wanted to automate comparing two Excel spreadsheets, updating old data (call this spreadsheet Old_Data.xlsx) with new data (from a different Excel document, called New_Data.xlsx), and placing the updated data into a different sheet of Old_Data.xlsx.
I am able to successfully create the new sheet in Old_Data.xlsx and see the changes between the two data sets. However, in the new sheet an index appears, labeling the rows of data from 0 to n. I've tried hiding this index so the information on each sheet in Old_Data.xlsx appears the same, but I cannot seem to get rid of it. See the code below:
from openpyxl import load_workbook
# import xlwings as xl
import pandas as pd
import jinja2

# Load the workbook that is going to be updated with new information.
wb = load_workbook('OldData.xlsx')

# Define the file paths for the old and new data.
old_path = 'OldData.xlsx'
new_path = 'NewData.xlsx'

# Load the data frames for each spreadsheet.
df_old = pd.read_excel(old_path)
print(df_old)
df_new = pd.read_excel(new_path)
print(df_new)

# Keep all original information while showing the differences, and write
# them to a new sheet in the workbook.
difference = pd.merge(df_old, df_new, how='right')
difference = difference.style.format.hide()
print(difference)

# Append the difference to an existing Excel file.
with pd.ExcelWriter('OldData.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    difference.to_excel(writer, sheet_name="1-25-2023")
This is an image of the table in the second sheet that I am creating: (https://i.stack.imgur.com/7Amdf.jpg)
I've tried adding the line `difference = difference.style.format.hide` to get rid of the row labels, but I have not succeeded.
Pass `index=False` as an argument in the last line of your code. It should look like this:
with pd.ExcelWriter('OldData.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    difference.to_excel(writer, sheet_name="1-25-2023", index=False)
I think this should solve your problem.
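To see the effect of `index=False`, here is a small self-contained sketch; the frame and its contents are hypothetical, and it writes to an in-memory buffer rather than OldData.xlsx so it runs standalone (it assumes openpyxl is installed, as in the question):

```python
import io
import pandas as pd

# Hypothetical stand-in for the merged "difference" frame.
difference = pd.DataFrame({"part": ["A-10", "B-20"], "qty": [4, 7]})

buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    difference.to_excel(writer, sheet_name="1-25-2023", index=False)

buf.seek(0)
round_trip = pd.read_excel(buf, sheet_name="1-25-2023")

# Without index=False, an extra "Unnamed: 0" column of row labels
# would appear on reading the sheet back.
print(round_trip.columns.tolist())
```

Note that `index=False` only suppresses the index in the written file; the DataFrame itself keeps its index in memory.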

Inserting Data into an Excel file using Pandas - Python

I have an Excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data and finally, creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
path_data = **the path to the excel file**
recap = pd.read_excel(os.path.join(path_data, 'My_Excel.xlsx'))  # where I access the Excel file
recap['New information Column'] = Some Value
Is this a correct way of doing this? If so, can someone suggest a better way (one that actually works, ehehe)?
Thank you a lot!
You can import the Excel file into Python using pandas:
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx')
print(df)
If you have many sheets, you can pick one by name:
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx', sheet_name='sheetname')
print(df)
To add a new column you could do the following:
df['name of the new column'] = 'things to add'
Then, when you're ready, you can export it back to xlsx (pandas uses openpyxl as the engine for this):
df.to_excel(r'Path\filename.xlsx')
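The column assignment above accepts both a single value and per-row values. A small self-contained sketch (the `recap` frame and its contents are hypothetical stand-ins for the file of 60 dataset names):

```python
import pandas as pd

# Hypothetical frame standing in for the Excel file of dataset names.
recap = pd.DataFrame({"dataset": ["ds01", "ds02", "ds03"]})

# A scalar broadcasts to every row; a list or Series assigns row by row
# (its length must match the number of rows).
recap["status"] = "analysed"
recap["n_rows"] = [120, 340, 87]

print(recap)
```

After the assignment, `recap.to_excel(path, index=False)` writes the result back without adding an extra index column.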

Import every worksheet in an excel workbook and save to a dataframe named by the worksheet name

I have an excel workbook with 3 worksheets, they are called "Z_scores", "Alpha" and "Rho" respectively.
In the future, this workbook will increase as the number of models and their corresponding parameters are stored here.
In my function I am looking to import each worksheet individually and save it to a dataframe, the name of the dataframe should be decided by the name of the worksheet.
So far I have this function, but I am not able to dynamically name the dataframe, and I am unsure what should be written in the return statement.
FYI: the import identifier is simply a way of filtering worksheet names: worksheets whose name starts with the identifier should not be imported, e.g. putting a single blank space at the beginning of a worksheet name will prevent it from being imported.
#import libraries
import pandas as pd

#define function
def import_excel(filename, import_identifier):
    #Create dataframe of the excel
    df = pd.read_excel('Excel.xlsx')  # this will read the first sheet into df
    xls = pd.ExcelFile('Excel.xlsx')
    #Skip all worksheets that begin with the import_identifier
    worksheets = []
    for x in xls.sheet_names:
        if x[0] != import_identifier:
            worksheets.append(x)
    #Loop through the sheets which are flagged for importing and import
    #each sheet individually into a dataframe
    for sheetname in worksheets:
        #Encase the sheetname in quotation marks to satisfy the
        #sheet_name argument of read_excel
        sheetname_macro_str = '"{}"'.format(sheetname)
        #Import the workbook and save to dynamically named dataframe
        sheetname_macro = pd.read_excel(xls, sheet_name=sheetname_macro_str)
        #What would I return here, how do I ensure the data frames are stored?
        #return
As you can read in this thread, a DataFrame object can't reliably be "named". Usually, the Python variable to which the object is assigned will be what describes or differentiates it.
If you're looking to store references to multiple DataFrames in your code, you'll probably want to create a list, tuple, or dictionary for that (outside the scope of your import function). If you use a dictionary, then you can use your worksheet names as keys:
dataframes = {}
dataframes[friendly_sheetname] = dataframe_from_sheet
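As a complement (not in the original answer): pandas can build exactly such a dictionary in one call, because `read_excel` with `sheet_name=None` returns a dict of DataFrames keyed by sheet name. A sketch, using `"_"` as the import identifier for readability (the question's single leading space works the same way) and an in-memory workbook so it runs standalone (assumes openpyxl is installed):

```python
import io
import pandas as pd

def import_excel(source, import_identifier="_"):
    # sheet_name=None loads every worksheet into a dict keyed by sheet name.
    all_sheets = pd.read_excel(source, sheet_name=None)
    # Drop sheets flagged by the import identifier.
    return {name: df for name, df in all_sheets.items()
            if not name.startswith(import_identifier)}

# Build a small stand-in workbook with three sheets, one of them flagged.
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"z": [1.0, 2.0]}).to_excel(writer, sheet_name="Z_scores", index=False)
    pd.DataFrame({"a": [0.1]}).to_excel(writer, sheet_name="Alpha", index=False)
    pd.DataFrame({"r": [9]}).to_excel(writer, sheet_name="_scratch", index=False)
buf.seek(0)

dataframes = import_excel(buf)
print(sorted(dataframes))
```

Individual frames are then reached as `dataframes["Z_scores"]`, `dataframes["Alpha"]`, and so on, which plays the role of the dynamically named variables the question asks about.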

Check if an Excel sheet is empty

I have an Excel workbook with multiple sheets. I need to delete the sheets which are completely empty, because my code fails when it encounters a blank sheet during processing.
os.chdir(path)
list_file = []
for file in glob.glob("*.xlsx"):
    print(file)
    list_file.append(file)
Here I have listed all the available files.
AB_list=[s for s in list_file if "India" in s]
CD_list=[s for s in list_file if "Japan" in s]
Then I store the file names in lists as required. Now I need to delete empty sheets from those Excel files before moving them into dataframes, and then loop through to read the files into individual dataframes.
ws.max_row and ws.max_column give you the position of the last used cell; based on that you can determine whether a sheet is empty. Also check whether ws.calculate_dimension() works for you, which returns the used range as a string.
All these functions are from openpyxl, which you are already familiar with.
You've tagged openpyxl so I assume you're using that.
# workbook is an Excel workbook opened with openpyxl.load_workbook
empty_sheets = []
for name in workbook.sheetnames:
    sheet = workbook[name]
    # A sheet with no data still reports a single (empty) cell at A1.
    if sheet.max_row == 1 and sheet.max_column == 1 and sheet["A1"].value is None:
        empty_sheets.append(sheet)
for sheet in empty_sheets:
    workbook.remove(sheet)
You can also do this easily with pandas, which I'm using too. The code looks like:
import pandas as pd

df = pd.read_csv(filename)    # for a CSV file
# or
df = pd.read_excel(filename)  # for an Excel file

df.empty  # True if the sheet has no data
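A minimal illustration of what `df.empty` reports:

```python
import pandas as pd

# A frame with no rows or columns is empty.
blank = pd.DataFrame()
print(blank.empty)   # True

# A frame with any data is not.
names = pd.DataFrame({"Name": ["Carl"]})
print(names.empty)   # False
```

Note that `read_excel` on a sheet containing only headers yields a frame with columns but zero rows, which `df.empty` also reports as True, since it checks for the absence of rows.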

Appending an Excel spreadsheet as a new sheet to multiple spreadsheets using Python

I have over 300 unique ".xlsx" spreadsheets. I have another spreadsheet (a data dictionary explaining field names) that I would like to append as a new sheet (tab) to each of the 300 unique spreadsheets.
Is there a relatively simple way to do this task in python?
Here's how you could do it with the Python-Excel packages (xlrd/xlwt/xlutils):
import xlrd
import xlwt
from xlutils.copy import copy
import os

if not os.path.exists("new"):
    os.makedirs("new")

toBeAppended = xlrd.open_workbook("ToBeAppended.xlsx")
sheetToAppend = toBeAppended.sheets()[0]  # If you don't want the first sheet, change the 0 accordingly

dataTuples = []
for row in range(sheetToAppend.nrows):
    for col in range(sheetToAppend.ncols):
        dataTuples.append((row, col, sheetToAppend.cell(row, col).value))

# You need to change this line!
wbNames = ["{}.xlsx".format(num) for num in range(1, 7)]

for name in wbNames:
    wb = copy(xlrd.open_workbook(name))
    newSheet = wb.add_sheet("Appended Sheet")
    for row, col, data in dataTuples:
        newSheet.write(row, col, data)
    wb.save("new/" + name.split('.')[0] + ".xls")
So this creates a new folder for your new files (just in case it doesn't work). Then it copies the first sheet of "ToBeAppended.xlsx" and gathers all the data in it. Then it gathers the names of the files it needs to change (which for me were "1.xlsx" and so on). Finally, it creates a copy of each workbook it needs to edit, adds the sheet, writes all the data to it, and saves the file.
You'll note that it saves a ".xls" file. This is a limitation of the package, and I don't know any way around it. Sorry.
Hope this helps.
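An alternative worth noting (not in the original answer): openpyxl reads and writes .xlsx directly, so the ".xls" limitation above does not apply, and recent xlrd releases (2.0+) no longer open .xlsx files at all. A sketch using two in-memory workbooks as stand-ins for the data dictionary and one of the 300 target files:

```python
import openpyxl

# Stand-in for the data dictionary workbook; in real code this would be
# openpyxl.load_workbook("DataDictionary.xlsx").
src_wb = openpyxl.Workbook()
src = src_wb.active
src.append(["field", "meaning"])
src.append(["qty", "units ordered"])

# Stand-in for one of the 300 target workbooks.
target_wb = openpyxl.Workbook()
target_wb.active.append(["qty", 4])

# Copy the dictionary sheet row by row into a new tab of the target.
new_sheet = target_wb.create_sheet("Data Dictionary")
for row in src.iter_rows(values_only=True):
    new_sheet.append(row)

print(target_wb.sheetnames)
```

With real files, you would loop over the 300 paths, call `openpyxl.load_workbook(path)` for each, copy the rows as above, and `wb.save(path)`, keeping the .xlsx format throughout. Cell values copy cleanly this way; formatting would need extra handling.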
