Pandas: Write dataframe containing strings to xlsx with multiline format - python

import pandas as pd
df = pd.DataFrame(['abc\n123\n232'])
df.to_csv('text.csv')
I would like to have the following in a single cell of the xlsx file (not csv):
abc
123
232
The desired output is that only cell A1 is filled.
The dataframe has only 1 cell.
But with the code above, that one cell ends up spread over multiple cells.
Is there a way to format and write the xlsx (not csv) so that each cell contains multiple lines?
Edit:
I shall clarify my problem. There is nothing wrong with my dataframe definition. I would like the "\n" within the strings in each cell of the dataframe to become a line break within the xlsx (not csv) cell. This is another example:
df=pd.DataFrame(['abc\n123\n232','1\n2\n3\n4\n5\n6'])
df.to_csv('text.csv')
The desired output is that only cells A1 and A2 are filled.
Edit 2:
Not in csv but xlsx.

You can use .to_excel with index=False and header=False.
df.to_excel('test.xlsx', index=False, header=False)
But you may need to turn on 'Wrap Text' yourself for the line breaks to be displayed.
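A minimal sketch of that answer, assuming openpyxl is installed: write the frame with to_excel, then reopen the file with openpyxl and set Alignment(wrap_text=True) on every written cell. The file name test.xlsx comes from the answer; looping over all cells is just one way to apply the style.

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Alignment

df = pd.DataFrame(['abc\n123\n232', '1\n2\n3\n4\n5\n6'])

# One string per cell, no index column and no header row -> only A1 and A2 are filled
df.to_excel('test.xlsx', index=False, header=False)

# Re-open the file and switch on 'Wrap Text' so Excel renders each \n as a line break
wb = load_workbook('test.xlsx')
ws = wb.active
for row in ws.iter_rows():
    for cell in row:
        cell.alignment = Alignment(wrap_text=True)
wb.save('test.xlsx')
```

The strings keep their "\n" characters as written; the alignment only changes how Excel displays them.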

Why not use openpyxl for that? It will work:
from openpyxl import Workbook
from openpyxl.styles import Alignment
workbook = Workbook()
worksheet = workbook.active
worksheet.title = "Sheet1"
worksheet['A1'] = "abc\n123\n232"
worksheet['A2'] = "1\n2\n3\n4\n5\n6"
# Turn on 'Wrap Text' so Excel shows each \n as a line break
for ref in ('A1', 'A2'):
    worksheet[ref].alignment = Alignment(wrap_text=True)
workbook.save('test.xlsx')

Related

Format and manipulate data across multiple Excel sheets in Python using openpyxl before converting to Dataframe

I need some help with editing the sheets within my Excel workbook in python, before I stack the data using pd.concat(). Each sheet (~100) within my Excel workbook is structured identically, with the unique identifier for each sheet being a 6-digit code that is found in line 1 of the worksheet.
I've already done the following steps to import the file, unmerge rows 1-4, and insert a new column 'C':
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('data_sheets.xlsx')
for sheet in wb.worksheets:
    # unmerge every merged range (rows 1-4)
    for merge in list(sheet.merged_cells):
        sheet.unmerge_cells(range_string=str(merge))
    sheet.insert_cols(3, 1)  # insert an empty column C
    print(sheet)
wb.save('workbook_test.xlsx')
# concat once worksheets have been edited
df = pd.concat(pd.read_excel('workbook_test.xlsx', sheet_name=None), ignore_index=True)
Before stacking the data, however, I would like to make the following additional (sequential) changes to every sheet:
1. Extract the right 8 characters from row 1 (the Excel equivalent is =RIGHT(A1, 8)). This pulls the unique code off each sheet; it looks like '(000000)'.
2. Populate column C, rows 6-282, with the unique code.
3. Delete rows 1-5.
The end result would make each sheet within the workbook look like this:
Is this possible to do with openpyxl, and if so, how? Any direction or assistance with this would be much appreciated - thank you!
Here is a pure openpyxl approach to achieve what you're looking for:
from openpyxl import load_workbook
wb = load_workbook("workbook_test.xlsx")
for ws in wb:
    if "A1:O1" in ws.merged_cells:  # unmerge first row till O (only if still merged)
        ws.unmerge_cells("A1:O1")
    ws_uid = ws.cell(row=1, column=1).value[-8:]  # get the sheet's UID
    for num_row in range(6, 283):  # rows 6-282 inclusive
        ws.cell(row=num_row, column=3).value = '="{}"'.format(ws_uid)  # write UID in column C
    ws.delete_rows(1, 5)  # delete first 5 rows
wb.save("workbook_test.xlsx")
NB: This assumes there is already an empty column C (the insert_cols(3, 1) step in the question creates it).
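A self-contained sketch of the same three steps, run on a small in-memory sheet so it can be tried without the real workbook. The helper name tag_and_trim, the sample title "Site report (012345)" and the use of ws.max_row instead of the fixed row 282 are assumptions for illustration; the UID is written as plain text rather than as a ="..." formula.

```python
from openpyxl import Workbook


def tag_and_trim(ws, header_rows=5, uid_col=3):
    """Copy the UID from A1 into column C of every data row, then drop the header rows."""
    # Equivalent of Excel's =RIGHT(A1, 8), e.g. "(012345)"
    uid = str(ws.cell(row=1, column=1).value)[-8:]
    # Data starts right after the header rows; max_row replaces a hard-coded 282
    for num_row in range(header_rows + 1, ws.max_row + 1):
        ws.cell(row=num_row, column=uid_col, value=uid)
    # Rows below the header move up to row 1
    ws.delete_rows(1, header_rows)
    return uid


# Demo on a small in-memory sheet: title in A1, data in rows 6-10
wb = Workbook()
ws = wb.active
ws["A1"] = "Site report (012345)"
for r in range(6, 11):
    ws.cell(row=r, column=1, value=r)
    ws.cell(row=r, column=2, value=r * 10)

uid = tag_and_trim(ws)
print(uid)  # (012345)
```

For the real workbook you would call tag_and_trim(ws) inside `for ws in wb:` and save once at the end.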

How to fill a block of cells in xlsxwriter?

My xlsxwriter code works fine, but I need to figure out how to select a block of cells, for example from A1 to J10, as shown in the screenshot.
Does xlsxwriter have such a function? I've tried several forms, such as:
worksheet.write('A1:J1', '...')
But it writes only to A1. So, for example, how can I fill the whole highlighted area with one word without writing code for each of the 10 rows?
Here is how I write my output from a pandas dataframe to an excel template.
Please note that if data is already present in the cells where you are trying to write the dataframe, it will not be overwritten; instead, the dataframe will be written to a new sheet. That is why I have included a step to clear existing data from the template. I have not tried writing output to merged cells, so that might throw an error.
Setup
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
file_path = 'Template.xlsx'
book = load_workbook(file_path)
sheet_name = "Template 1"
sheet = book[sheet_name]
Set the first row and first column in the Excel template where the output is to be pasted.
If my output is to be pasted starting in cell N2, row_start will be 2 and col_start will be 14.
row_start = 2
col_start = 14
Clear existing data in the Excel template
for c_idx, col in enumerate(df.columns, col_start):
    for r_idx in range(row_start, 10001):
        sheet.cell(row=r_idx, column=c_idx).value = None
Write the dataframe to the Excel template
rows = dataframe_to_rows(df, index=False)
for r_idx, row in enumerate(rows, row_start):
    for c_idx, col in enumerate(row, col_start):
        sheet.cell(row=r_idx, column=c_idx, value=col)
# everything was written through openpyxl, so saving the workbook is enough
book.save(file_path)
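For the literal question (one word in every cell of A1:J10), a minimal sketch with openpyxl rather than xlsxwriter: indexing a worksheet with a range string such as "A1:J10" yields one tuple of cells per row, so two loops cover the block. fill_block is a hypothetical helper name and block.xlsx a placeholder file name. In xlsxwriter itself the analogous approach would be a nested loop calling worksheet.write(row, col, value) with zero-based row and column numbers.

```python
from openpyxl import Workbook


def fill_block(ws, cell_range, value):
    """Write the same value into every cell of a rectangular range such as 'A1:J10'."""
    # ws['A1:J10'] yields one tuple of cells per row
    for row in ws[cell_range]:
        for cell in row:
            cell.value = value


wb = Workbook()
ws = wb.active
fill_block(ws, "A1:J10", "...")
wb.save("block.xlsx")
```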

Joining excel rows to a single string to be used in pandas DataFrame

I am new to Pandas.
I have an Excel file with 10 sheets in it, and I am trying to achieve this.
As no answers were provided on that question, I am going to use this method to check whether a string in a DataFrame row contains a word from the Excel sheet:
file = pd.read_excel(open('config_values.xlsx', 'rb'),
sheet_name='ContainsFree')
Join all rows in the Excel sheet using first_sheet = '|'.join(file)
Then use it with:
df['Contains Language'] = df.Search_Query.str.contains(first_sheet, regex=True)
However, when I use '|'.join(file) I get the first row of the excel sheet rather than the joined string:
excel_sheet_1
gratuit
free
gratis
...
After '|'.join(file) I get:
gratuit
Expected:
gratuit|free|gratis
What am I doing wrong when trying to join all rows of an Excel sheet?
Thank you for your suggestions.
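Iterating over a DataFrame yields its column labels, not its rows, so '|'.join(file) joins only the header, which read_excel took from the first row ('gratuit'). Read the sheet with header=None and join the values of the first column instead: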
file = pd.read_excel('config_values.xlsx', sheet_name='ContainsFree', header=None)
'|'.join(file[0].astype(str))
'gratuit|free|gratis'
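A small sketch, using in-memory DataFrames instead of config_values.xlsx, showing why the original join returned only 'gratuit' and how the joined pattern can then feed str.contains. The sample Search_Query values are made up, and the re.escape and case=False choices are additions, not part of the answer.

```python
import re

import pandas as pd

# Iterating over a DataFrame yields its column labels, which is why '|'.join(file)
# returned only the header that read_excel took from the first row
as_read_with_header = pd.DataFrame({"gratuit": ["free", "gratis"]})
print(list(as_read_with_header))  # ['gratuit']

# With header=None every row is data, in column 0 (stand-in for the read_excel call)
words = pd.DataFrame({0: ["gratuit", "free", "gratis"]})
# re.escape guards against words containing regex metacharacters such as '.' or '+'
pattern = "|".join(re.escape(w) for w in words[0].astype(str))
print(pattern)  # gratuit|free|gratis

df = pd.DataFrame({"Search_Query": ["free software", "paid plan", "Logiciel GRATUIT"]})
df["Contains Language"] = df.Search_Query.str.contains(pattern, case=False, regex=True)
print(df["Contains Language"].tolist())  # [True, False, True]
```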

Can I modify specific sheet from Excel file and write back to the same without modifying other sheets using Pandas | openpyxl

I'll try to explain my problem with an example:
Let's say I have an Excel file test.xlsx which has five tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4 and sheet5. I am interested in reading and modifying data in sheet2.
My sheet2 has some columns whose cells are dropdowns, and those dropdown values are defined in sheet4 and sheet5. I don't want to touch sheet4 and sheet5 (I mean sheet4 and sheet5 have some references to cells on sheet2).
I know that I can read all the sheets in the Excel file using pd.read_excel('test.xlsx', sheet_name=None), which gives all sheets as a dictionary (OrderedDict) of DataFrames.
Now I want to modify sheet2 and save it without disturbing the others. Is it possible to do this using the Python pandas library?
[UPDATE - 4/1/2019]
I am using pandas read_excel to read whichever sheet I need from my Excel file, validating the data against the data in the database, and updating the status column in the Excel file.
To write the status column back to Excel, I am using openpyxl, as shown in the pseudocode below.
import pandas as pd
import openpyxl
df = pd.read_excel(input_file, sheet_name=my_sheet_name)
df = df.where(pd.notnull(df), None)
write_data = {}
# Doing some validations with the data and building my write_data with key
# as (row_number, column_number) and value as actual value to put in that
# cell.
At the end, my write_data looks something like this:
{(2,1): 'Hi', (2,2): 'Hello'}
Now I have defined a separate class named WriteData for writing data using openpyxl:
# WriteData(input_file, sheet_name, write_data)
book = openpyxl.load_workbook(input_file, data_only=True, keep_vba=True)
sheet = book[sheet_name]
for k, v in write_data.items():
    row_num, col_num = k
    sheet.cell(row=row_num, column=col_num).value = v
book.save(input_file)
Now when I run this, it removes all the formulas and diagrams. I am using openpyxl 2.6.2.
Please correct me if I am doing anything wrong! Is there a better way to do this?
Any help on this will be greatly appreciated :)
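To modify a single sheet at a time, you can use the pandas ExcelWriter. Open it in append mode with the openpyxl engine; a plain pd.ExcelWriter("test.xlsx") creates a new file, and every other sheet would be lost:
sheet2 = pd.read_excel("test.xlsx", sheet_name="sheet2")
## modify sheet2 as needed, then write it back into the existing file:
with pd.ExcelWriter("test.xlsx", engine="openpyxl", mode="a", if_sheet_exists="overlay") as writer:
    sheet2.to_excel(writer, sheet_name="sheet2", index=False)
(if_sheet_exists="overlay" needs pandas 1.4 or newer.)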
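The formulas in the update are most likely lost because of data_only=True: with that flag openpyxl loads the values Excel last cached in place of the formulas, so book.save() writes plain values back. A minimal sketch of the WriteData step without that flag, demonstrated on a throwaway workbook; write_cells is a hypothetical name, and for an .xlsm input you would still pass keep_vba=True to load_workbook.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_cells(path, sheet_name, write_data):
    """Write {(row, col): value} into one sheet and leave everything else as loaded."""
    # No data_only=True: formulas stay formulas instead of becoming cached values
    book = load_workbook(path)
    sheet = book[sheet_name]
    for (row_num, col_num), value in write_data.items():
        sheet.cell(row=row_num, column=col_num, value=value)
    book.save(path)


# Demo on a throwaway workbook with a formula and a second sheet
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet2"
ws["A1"] = 1
ws["A2"] = 2
ws["A3"] = "=SUM(A1:A2)"
wb.create_sheet("Sheet4")["A1"] = "untouched"
wb.save(path)

write_cells(path, "Sheet2", {(1, 2): "Hi", (2, 2): "Hello"})
check = load_workbook(path)
print(check["Sheet2"]["A3"].value)  # =SUM(A1:A2)
```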

Is it possible to copy the Excel formula of a cell instead of the value using python?

Right now I'm working on combining Excel sheets into 1 new sheet, using pandas which is working.
The only problem is that the values in the new Excel sheet are plain numbers instead of the formulas, and I would like the formulas.
Loading file
import os
import pandas as pd
directory = os.path.dirname(__file__)
fname = os.path.join(directory, "Reisanalyze.xlsm")
print("Loading %s..." % fname)
sheet1 = pd.read_excel(fname, sheet_name="Input")
sheet2 = pd.read_excel(fname, sheet_name="Alternatieven")
Write to new sheet
writer = pd.ExcelWriter('first_sheet.xlsx', engine='xlsxwriter')
sheet1.to_excel(writer, sheet_name='Input', merge_cells=False, startrow=0, startcol=0)
sheet2.to_excel(writer, sheet_name='Input', merge_cells=False, startrow=0, startcol=21)
writer.close()
I originally tried to use the pycel project which worked until I needed to load multiple sheets, which didn't work. That's why I'm using pandas to write multiple sheets into 1 sheet.
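You can use OpenPyXL: when a workbook is loaded without data_only=True, formula cells return the formula text instead of the calculated value.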
Following is the test Excel file testexl.xlsx:
A           | B
----------- | ---
=SUM(B1:B2) | 1
            | 2
Here is the test code:
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename='testexl.xlsx')
sheet_names = wb.sheetnames
name = sheet_names[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values)
print(df)
Output
0 1
0 =SUM(B1:B2) 1
1 None 2
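Building on this, a sketch that copies two sheets into one combined sheet entirely with openpyxl, so formulas stay formulas. It uses openpyxl's Translator class to shift relative references when a formula moves (here by 21 columns, mirroring startcol=21 in the question). The sheet contents and the copy_sheet_into helper are hypothetical, and this swaps out the pandas approach from the question rather than being a pandas feature; for the real Reisanalyze.xlsm you would load it with load_workbook(fname, keep_vba=True), and references to other sheets would need extra care.

```python
from openpyxl import Workbook
from openpyxl.formula.translate import Translator


def copy_sheet_into(src, dst, row_offset=0, col_offset=0):
    """Copy values and formulas from src into dst, shifted by the given offsets."""
    for row in src.iter_rows():
        for cell in row:
            if cell.value is None:
                continue
            target = dst.cell(row=cell.row + row_offset, column=cell.column + col_offset)
            value = cell.value
            if isinstance(value, str) and value.startswith("="):
                # Shift relative references so the formula points at the copied cells
                value = Translator(value, origin=cell.coordinate).translate_formula(target.coordinate)
            target.value = value


# Demo: two small sheets standing in for "Input" and "Alternatieven"
wb = Workbook()
inp = wb.active
inp.title = "Input"
inp["A1"] = "=SUM(B1:B2)"
inp["B1"] = 1
inp["B2"] = 2
alt = wb.create_sheet("Alternatieven")
alt["A1"] = "=B1*2"
alt["B1"] = 5

combined = wb.create_sheet("Combined")
copy_sheet_into(inp, combined)                 # like startcol=0
copy_sheet_into(alt, combined, col_offset=21)  # like startcol=21
print(combined["A1"].value)  # =SUM(B1:B2)
print(combined["V1"].value)  # =W1*2
```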
If you want to keep excel formulas, then you will need to stop them from being formulas and then convert them back afterwards.
To do this, before conversion, press Ctrl+F (Cmd+F on a Mac); a dialog opens in the middle of the screen. Click the Replace tab.
In the "Find what:" box type "=", and in the "Replace with:" box type ".=". Then click Replace All.
This will turn the formulas to text for you to copy.
Save it as a csv file
Note: this also replaces any = signs inside formulas. That does not matter, because the reverse replacement restores them.
After you merge them, open the file in Excel again and repeat the replacement in reverse (replace ".=" with "=") to turn the text back into formulas.
This might be easier than importing extra modules.
