Append data in a new column in Excel using Python

I have to run this code every day and store the DataFrame in a new column.
How do I store the DataFrame in the next column automatically, without specifying the column number in the Excel file? Here dx stores the day of the month, i.e. for "09-11-2022" dx=9, so the data gets stored in column 9. But there will be gaps if I run it again after some time: if dx=22, the columns between 9 and 22 will be empty. So how do I store the data in the next new column without specifying startcol?
df1 = pd.DataFrame({today: mob})
df2 = pd.DataFrame({today: dtop})
writer = pd.ExcelWriter('Pagespeed.xlsx', mode='a', engine='openpyxl',
                        if_sheet_exists='overlay')
df1.to_excel(writer, index=False, sheet_name='Mobile', startcol=dx)
df2.to_excel(writer, index=False, sheet_name='Desktop', startcol=dx)
writer.close()  # writer.save() was removed in pandas 2.0

Find the maximum populated column in the Excel file (for example, with openpyxl's worksheet.max_column).
Then pass that as startcol to write to the next empty column: max_column is 1-based while startcol is 0-based, so max_column itself is the index of the first empty column.
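A runnable sketch of that approach, using openpyxl's max_column. The file name and sheet names come from the question; the column data here is placeholder values standing in for the question's mob/dtop lists:

```python
import pandas as pd
from openpyxl import load_workbook

path = "Pagespeed.xlsx"

# First run: create the file with one dated column per sheet.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"09-11-2022": [1, 2]}).to_excel(
        writer, index=False, sheet_name="Mobile")
    pd.DataFrame({"09-11-2022": [3, 4]}).to_excel(
        writer, index=False, sheet_name="Desktop")

# Later run: instead of startcol=dx, ask openpyxl for the last populated
# column. max_column is 1-based and startcol is 0-based, so max_column
# is exactly the index of the first empty column -- no gaps.
book = load_workbook(path)
next_col = book["Mobile"].max_column
with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    pd.DataFrame({"22-11-2022": [5, 6]}).to_excel(
        writer, index=False, sheet_name="Mobile", startcol=next_col)

headers = [c.value for c in load_workbook(path)["Mobile"][1]]
```

After the second run the new column lands directly next to the first one, regardless of the date. Note that if_sheet_exists='overlay' requires pandas 1.4 or newer.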

Related

DataFrame: how to save data extracted with web scraping?

I have a spreadsheet where columns "A", "B" and "C" have data; I need to save the web-scraping data in column "D".
Currently it deletes the contents of all columns and saves the information in column "B". This is my problem, as it should save in column "D" and keep the data from the other columns.
I tried the command below with no success, because it just created a column named "c".
pd.DataFrame(data, columns=["c"])
Just below is the command I use to save all my web-scraping data:
data = {'Video': links}
df = pd.DataFrame(data)
df.to_excel(r"C:\__Imagens e Planilhas Python\Facebook\Youtube.xlsx", engine='xlsxwriter')
print(df)
You should have included what the data in "Youtube.xlsx" and data look like. The answer below assumes they're the same length and that "Youtube.xlsx" has no index column and exactly three columns, so any added column will be the fourth by default.
I don't know what's in "Youtube.xlsx" or in data, but the way it's coded, df will have only one column (Video), and .to_excel uses write mode by default, so:
Currently, it deletes the contents of all columns and saves the information in column "B"
[I expect Excel shows the DataFrame's index as the first column, so Video ends up as the second column.] If you don't want to write over the previous contents, the usual approach is to append with mode='a'; but that appends rows, and as far as I know there is no way to append columns directly to a file.
You could read the file into a DataFrame, then add the column and save.
filepath = r"C:\__Imagens e Planilhas Python\Facebook\Youtube.xlsx"
df = pd.read_excel(filepath) #, header=None) # if there is no header row
df['Video'] = links
df.to_excel(filepath, engine='xlsxwriter') #, index=False)
[Use a different column name if the file already had a Video column that you want to preserve.]

Use Python to export Excel column data when column isn't in first row

I need to pull data from a column based on the column header. My only problem is the input files aren't consistent: they have the column in different locations, and the data doesn't start on row one.
Above is an example Excel file. I want to pull the data for Market. I've got this to work using pandas if the data starts at A1, but I can't get it to pull the data if it doesn't start in the first position.
How about using this just after your pd.read_excel() statement?
df = df.dropna(how='all', axis='columns').dropna(how='all', axis='rows')
You can then set the first row as header:
df.columns = df.iloc[0]
df = df[1:]
df
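Putting those two steps together on a made-up frame (the Market/Size table below is invented for illustration, as if it had come from pd.read_excel(path, header=None) with the table starting at C3 instead of A1):

```python
import pandas as pd

# Hypothetical raw sheet: two empty rows and two empty columns of
# padding before the real table begins.
raw = pd.DataFrame([
    [None, None, None,     None],
    [None, None, None,     None],
    [None, None, "Market", "Size"],
    [None, None, "NYC",    10],
    [None, None, "LA",     20],
])

# Drop fully empty rows/columns, then promote the first remaining
# row to be the header.
df = raw.dropna(how="all", axis="columns").dropna(how="all", axis="rows")
df.columns = df.iloc[0]
df = df[1:]

market = df["Market"].tolist()
```

After this, df["Market"] works no matter where the table sat in the original sheet.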

reading multiple excel sheets and dropping the last row of each sheet

My program reads in an Excel file that contains multiple sheets and concatenates them together. The issue is that the last row of each sheet is a Totals row, and I don't want that row. Is there an argument that will drop the last row when I read the sheets in? And will I need to read the sheets in and remove the last row before I run the concat function, to avoid deleting the wrong rows? I've tried using skipfooter=0 and skipfooter=1, but this threw an error message.
I assume you're using pandas to read an xlsx file that has multiple sheets with different lengths of data, and you want to drop the last row from each sheet. You can use [:-1] like this:
df = pd.ExcelFile('report.xlsx', engine='openpyxl')
data = [df.parse(name)[:-1] for name in df.sheet_names]
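To finish the task in the question, concatenate the trimmed sheets afterwards. A runnable sketch with made-up sheet data (the file and sheet names are placeholders):

```python
import pandas as pd

# Stand-in for report.xlsx: two sheets, each ending in a Totals row.
with pd.ExcelWriter("report.xlsx", engine="openpyxl") as writer:
    pd.DataFrame({"item": ["a", "b", "Totals"], "qty": [1, 2, 3]}).to_excel(
        writer, index=False, sheet_name="Jan")
    pd.DataFrame({"item": ["c", "Totals"], "qty": [4, 4]}).to_excel(
        writer, index=False, sheet_name="Feb")

xl = pd.ExcelFile("report.xlsx", engine="openpyxl")
trimmed = [xl.parse(name)[:-1] for name in xl.sheet_names]  # drop Totals rows
combined = pd.concat(trimmed, ignore_index=True)
```

Because [:-1] is applied per sheet before concatenation, only the Totals rows are removed, not the last row of the combined frame.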

Problem with combining multiple excel files in python pandas

I am quite new to Python programming. I need to combine 1000+ files into one file. Each file has 3 sheets in it, and I need to get data only from sheet 2 and make a final Excel file. I am facing a problem picking a value from a specific cell on sheet 2 of each Excel file to create a column: Python picks the value from the first file and creates the column from that.
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsm'):
        df = pd.read_excel(file, sheet_name=1, header=None)
        df['REPORT_NO'] = df.iloc[1][4]    # Report Number
        df['SUPPLIER'] = df.iloc[2][4]     # Supplier
        df['REPORT_DATE'] = df.iloc[0][4]  # Report Date
        df2 = df2.dropna(thresh=15)
        df2 = df.append(df, ignore_index=True)
        df = df.reset_index()
        del df['index']
df2.to_excel('FINAL_FILES.xlsx')
How can I solve this issue so Python takes the values from each Excel file and puts the information on the right rows?
I. df.iloc[2][4] refers to the 3rd row and 5th column (iloc is 0-based) of the sheet you imported with sheet_name=1; you never activated a different sheet, though you mentioned each .xlsm has 3 sheets.
II. Your scoping could be wrong. Why define df outside of the loop? It will change per file, so there's no need for an external one. All info from the loop should be put into your df2 before the next iteration of the loop.
III. Have you checked whether append is adding a row or a column?
Even though
df['REPORT_NO'] = df.iloc[1][4]    # Report Number
df['SUPPLIER'] = df.iloc[2][4]     # Supplier
df['REPORT_DATE'] = df.iloc[0][4]  # Report Date
are written as columns, they have the Report Number/Supplier/Report Date repeated for every row in that column.
When you use df2 = df.append(df, ignore_index=True) check the output. It might not be appending in the way you intend.
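One way to restructure the loop along those lines: read each file into its own frame, add the three header values, and concatenate once after the loop, so every file's values land on its own rows. The input files below are invented so the sketch runs as-is (the question's files are .xlsm with three sheets; plain .xlsx with two sheets is used here); swap in your own paths and drop-na logic:

```python
import pandas as pd

# Build two stand-in input workbooks. On Sheet2, the date, report
# number and supplier sit in the 5th column (index 4), matching the
# iloc positions used in the question.
files = ["rep1.xlsx", "rep2.xlsx"]
for i, f in enumerate(files, start=1):
    with pd.ExcelWriter(f, engine="openpyxl") as writer:
        pd.DataFrame({"a": [0]}).to_excel(writer, sheet_name="Sheet1")
        pd.DataFrame([[0] * 4 + [v] for v in
                      (f"2022-11-0{i}", f"R-{i}", f"S-{i}")]).to_excel(
            writer, sheet_name="Sheet2", index=False, header=False)

# Collect one frame per file, then concatenate once at the end.
frames = []
for file in files:
    sheet = pd.read_excel(file, sheet_name=1, header=None)
    sheet["REPORT_DATE"] = sheet.iloc[0, 4]
    sheet["REPORT_NO"] = sheet.iloc[1, 4]
    sheet["SUPPLIER"] = sheet.iloc[2, 4]
    frames.append(sheet)

df2 = pd.concat(frames, ignore_index=True)
df2.to_excel("FINAL_FILES.xlsx")
```

Appending each per-file frame to a list and calling pd.concat once also avoids df.append, which has since been removed from pandas.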

Create excel worksheet for every unique value in column of dataframe python

I have a VERY large CSV file with 250,000+ records that takes a while to do any analyses on in Excel, so I wanted to splice it into multiple worksheets based on a specific calculated column that I created in pandas.
The specific column is called "Period" and is a string variable in my dataframe in the form of MMM_YYYY (e.g., Jan_2016, Feb_2016, etc.)
I am trying to make something that would have a workbook (let's call it data_by_month.xlsx) have a worksheet for every unique period in the dataframe column "Period," with all matching rows written into the respective worksheet.
This is the logic that I tried:
for row in df:
    for period in unique_periods:
        if row[38] == period:
            with pd.ExcelWriter("data_by_month.xslx") as writer:
                df.to_excel(writer, sheet_name=period)
The idea behind this is for every row in the dataframe, go through every period in a list of unique periods, and if the row[38] -- which is the index of Period -- is equal to a period, write it into the data_by_month.xlsx workbook into a specific worksheet.
I know that my code is completely incorrect right now, but it's the general logic that I've been trying to implement. I'm pretty sure I'm referring to the location of the "Period" column in the dataframe incorrectly, since it keeps saying it's out of range. Any advice would be welcome!
Thank you so much!
You should be able to achieve this using a groupby in pandas. For example:
with pd.ExcelWriter("data_by_month.xlsx") as writer:
    for period, data in df.groupby('Period'):
        data.to_excel(writer, sheet_name=period)
