Skip Columns with pandas - python

Problem
I first concatenate all data from the available Excel files into a single DataFrame and then write that DataFrame to a new Excel file. However, I would like to do two simple things:
a) leave two blank columns between each DataFrame that is appended;
b) keep the headers and their bold formatting, which disappear after the DataFrames are appended. See how one Excel file initially looked (image: Original formatting).
Attempt
This is my attempt (image: Two separate DataFrames):
import pandas as pd

data = []
for excel_file in excel_files:
    print(excel_file)  # the name for the dataframe
    data.append(pd.read_excel(excel_file, engine="openpyxl"))
df1 = pd.DataFrame(columns=['DVT', 'Col2', 'Col3'])  # blank df maybe? (this line is not important)
# df1.style.set_properties(subset=['DVT'], {'font-weight:bold'})  (this line is not important)
# concatenate dataframes horizontally
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)

I don't have Excel available right now, so I can't test, but something like this might be a good approach.
import pandas as pd

# Open the Excel document with a context manager in 'append' mode
# (the file must already exist; if_sheet_exists="overlay" needs pandas >= 1.4).
with pd.ExcelWriter(excelAutoNamed, mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
    startcol = 0
    for excel_file in excel_files:
        print(excel_file)
        df = pd.read_excel(excel_file, engine="openpyxl")
        # Write this DataFrame to the right of the previous one;
        # without startcol, every file would be written starting at column A.
        df.to_excel(writer, index=False, startcol=startcol)
        # Move past this DataFrame's columns plus two blank columns.
        startcol += df.shape[1] + 2
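If you prefer to keep the question's approach (concatenate in memory, then write once), a minimal sketch is to insert all-NaN spacer columns between the frames before `pd.concat(..., axis=1)`. The helper name `concat_with_gaps` and the temporary `__gap_` column names are hypothetical; the spacer headers are blanked at the end so those header cells stay empty.

```python
import pandas as pd


def concat_with_gaps(frames, gap=2):
    """Place DataFrames side by side with `gap` empty columns between them."""
    pieces = []
    for i, frame in enumerate(frames):
        if i > 0:
            # All-NaN spacer columns; temporary unique names avoid clashes.
            spacer = pd.DataFrame(
                index=frame.index,
                columns=[f"__gap_{i}_{j}" for j in range(gap)],
                dtype=float,
            )
            pieces.append(spacer)
        pieces.append(frame)
    combined = pd.concat(pieces, axis=1)
    # Blank the spacer headers so those header cells stay empty in Excel.
    combined.columns = [
        "" if str(col).startswith("__gap_") else col for col in combined.columns
    ]
    return combined
```

With the question's variables this would be used as `concat_with_gaps(data).to_excel(excelAutoNamed, index=False)`. `read_excel` only reads cell values, so the original cell formatting is not carried over; pandas applies its own default header style (bold) when it writes the header row.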

Related

Multiple sheets of an Excel workbook into different dataframes using Pandas

I have an Excel workbook with 5 sheets of data.
I want each sheet to be a different DataFrame.
I tried the code below for one sheet of the workbook:
df = pd.read_excel("path",sheet_name = ['Product Capacity'])
df
But this returns a dictionary of sheets, not a DataFrame.
I need a DataFrame.
Please suggest code that will return a DataFrame.
Passing sheet_name as a list makes read_excel return a dict keyed by sheet name. If you want separate DataFrames without a dict, read the individual sheets, passing each sheet_name as a string:
with pd.ExcelFile('data.xlsx') as xlsx:
    prod_cap = pd.read_excel(xlsx, sheet_name='Product Capacity')
    load_cap = pd.read_excel(xlsx, sheet_name='Load Capacity')
    # and so on
You can also load all sheets at once into a dict by passing sheet_name=None:
dfs = pd.read_excel('data.xlsx', sheet_name=None)
# dfs['Product Capacity']
# dfs['Load Capacity']
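To see how the type of `sheet_name` changes the return value, here is a small self-contained sketch. It builds a two-sheet workbook in memory (this assumes openpyxl is installed; the sheet contents are made up for illustration) and reads it back with a list, a string, and `None`.

```python
import io

import pandas as pd

# Build a small two-sheet workbook in memory (requires openpyxl).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"capacity": [10, 20]}).to_excel(writer, sheet_name="Product Capacity", index=False)
    pd.DataFrame({"load": [1, 2]}).to_excel(writer, sheet_name="Load Capacity", index=False)
data = buffer.getvalue()

# A list of names returns a dict, even when the list has one entry.
as_list = pd.read_excel(io.BytesIO(data), sheet_name=["Product Capacity"])
# A single name (string) returns a DataFrame.
as_str = pd.read_excel(io.BytesIO(data), sheet_name="Product Capacity")
# None returns a dict of every sheet, keyed by sheet name.
all_sheets = pd.read_excel(io.BytesIO(data), sheet_name=None)

print(type(as_list).__name__, type(as_str).__name__, list(all_sheets))
```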

How to remove numbering from output after extracting an xls file with pandas [Python]

I have a Python script that extracts a specific column from an Excel .xls file, but the output has a number next to each extracted value. How can I format the output so that the numbers don't appear?
My actual code is this:
import sys
import pandas as pd

file_name = sys.argv[1]
workbook = pd.read_excel(file_name)
df = pd.DataFrame(workbook, columns=['NOM_LOGR_COMPLETO'])
df = df.drop_duplicates()
# Passing both how= and thresh= (even thresh=None) raises a TypeError in pandas 2.x.
df = df.dropna(axis=0, how='any')
print(df)
My current output:
1 Street Alpha
2 Street Bravo
But the result I need is:
Street Alpha
Street Bravo
without the numbering, just the name of the streets.
Thanks!
I believe you want to display the DataFrame without its index. A DataFrame always has an index (it is central to how a DataFrame works), but you can leave it out when printing:
print(df.to_string(index=False, header=False))
print(df.values) also hides the index, but it prints a NumPy array with brackets around each row. To save the output without the index, use:
# The context manager saves and closes the file (ExcelWriter.save() was removed in pandas 2.0).
with pd.ExcelWriter("dataframe.xlsx", engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
Replace "dataframe.xlsx" with the output path you want.
Further references can be found at:
How to print pandas DataFrame without index
Printing a pandas dataframe without row number/index
disable index pandas data frame
Python to_excel without row names (index)?
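A minimal sketch of both printing options, using a hypothetical in-memory DataFrame in place of `pd.read_excel(file_name)`; only the column name `NOM_LOGR_COMPLETO` comes from the question. Working on the single column as a Series keeps the code short.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(file_name).
df = pd.DataFrame({"NOM_LOGR_COMPLETO": ["Street Alpha", "Street Bravo", "Street Alpha", None]})
streets = df["NOM_LOGR_COMPLETO"].drop_duplicates().dropna()

# Option 1: iterate over the values, so no index is printed at all.
for name in streets:
    print(name)

# Option 2: render the column as text without index and header.
text = streets.to_frame().to_string(index=False, header=False)
print(text)
```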

Append only new values to CSV from DataFrame in Python

Let's suppose I have a CSV file which looks like this:
Date,High,Low,Open,Close,Volume,Adj Close
1980-12-12,0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907
1980-12-15,0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382
1980-12-16,0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533
I also have a pandas DataFrame that contains exactly the same rows plus some new entries. My goal is to append only the new rows to the CSV file.
I tried the following, but unfortunately it appends the old entries as well as the new ones:
df.to_csv('{}/{}'.format(FOLDER, 'AAPL.CSV'), mode='a', header=False)
You can re-read your CSV file and drop the rows it already contains before appending the newly fetched data.
The following code worked for me:
import pandas as pd
# Creating original csv
columns = ['Date','High','Low','Open','Close','Volume','Adj Close']
original_rows = [["1980-12-12",0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907],
                 ["1980-12-15",0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382]]
df_original = pd.DataFrame(columns=columns, data=original_rows)
df_original.to_csv('AAPL.CSV', mode='w', index=False)
# Fetching the new data
rows_updated = [["1980-12-12",0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907],
                ["1980-12-15",0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382],
                ["1980-12-16",0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533]]
df_updated = pd.DataFrame(columns=columns, data=rows_updated)
# Read in current csv values
current_csv_data = pd.read_csv('AAPL.CSV')
# Drop duplicates and append only new data
new_entries = pd.concat([current_csv_data, df_updated]).drop_duplicates(subset='Date', keep=False)
new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
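A variant worth considering, sketched under the assumption that `Date` uniquely identifies a row and that the DataFrame's columns are in the same order as the file: filter the fetched rows with `isin` against the dates already in the file. Unlike `drop_duplicates(keep=False)`, this never re-appends a row that exists only in the file. The helper name `append_new_rows` is hypothetical.

```python
import os

import pandas as pd


def append_new_rows(df_new, path, key="Date"):
    """Append rows of df_new whose `key` value is not yet in the CSV at `path`.

    Assumes df_new has the same columns, in the same order, as the file.
    Returns the number of rows written.
    """
    if not os.path.exists(path):
        # First run: write everything, including the header.
        df_new.to_csv(path, index=False)
        return len(df_new)
    # Compare keys as strings so "1980-12-12" in the file matches the DataFrame.
    existing_keys = pd.read_csv(path, usecols=[key])[key].astype(str)
    new_rows = df_new[~df_new[key].astype(str).isin(existing_keys)]
    new_rows.to_csv(path, mode="a", header=False, index=False)
    return len(new_rows)
```

Calling it repeatedly with the same data is safe: the second call finds every date already present and writes nothing.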

How to open, delete columns and save an xls file in Python

I need to open an existing xls file, delete some columns, and then save the file. This is what I have, but I get an error when I try to delete the columns. How do I use a DataFrame to delete the columns and then save the file?
# Read in excel file
Workbook = xlrd.open_workbook("C:/Python/Python37/Files/firstCopy.xls", on_demand=True)
worksheet = Workbook.sheet_by_name("Sheet1")
# Delete a column
df.DataFrame.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1, inplace=True)
Workbook.save('output.xls')
Without seeing your dataset and error it is hard to tell what is going on. See How to Ask and how to create a Minimal, Complete, and Verifiable example.
Here's what I would suggest:
import pandas as pd

# Reading .xls files requires the xlrd package to be installed.
df = pd.read_excel('firstCopy.xls')
# drop() returns a new DataFrame, so assign the result back to df.
df = df.drop(columns=['StartDate', 'EndDate', 'EmployeeID'])
# Writing .xlsx uses openpyxl; index=False avoids adding an extra index column.
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)
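The usual pitfall here is that `drop()` without reassignment (or `inplace=True`) leaves the original DataFrame untouched. A small in-memory sketch makes this visible; the sample data is hypothetical and only the three dropped column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the three dropped column names come from the question.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob"],
        "StartDate": ["2019-01-01", "2019-02-01"],
        "EndDate": ["2019-12-31", "2019-12-31"],
        "EmployeeID": [7, 8],
    }
)
to_remove = ["StartDate", "EndDate", "EmployeeID"]

# Without reassignment, drop() only returns a new object; df is unchanged.
df.drop(columns=to_remove)
columns_after_plain_drop = list(df.columns)

# Reassigning keeps the result; errors="ignore" skips labels that are missing.
df = df.drop(columns=to_remove, errors="ignore")
print(columns_after_plain_drop)
print(list(df.columns))
```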

How to prepend new rows at the beginning of an existing csv file?

Assume this is my CSV file (df):
id,name,version,ct_id
1,testing,version1,245
2,testing1,version2,246
3,testing2,version3,247
4,testing3,version4,248
5,testing1,version5,249
I then perform some operations on the file and write the result to another CSV file:
df = pd.read_csv('op.csv')
df1 = df.groupby('name').agg({'version': ', '.join, 'ct_id': 'first'}).reset_index()
df1.to_csv('test.csv', index=False)
I also have another CSV file (df_1):
id,name,version,ct_id
36,testing17,version16,338
37,testing18,version17,339
I want to write these rows to the existing test.csv file I created earlier, but insert them at the beginning of the file rather than at the end.
I tried something like this:
df_1.iloc[:, 1:].to_csv('test.csv', mode='a', index=False)
# This does append but at the end.
I would appreciate any help.
Prepending A to B is the same as appending B to A.
The code below should work for this case:
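import pandas as pd

test_df = pd.read_csv('test.csv')
# Drop 'id' so the columns match test.csv (as iloc[:, 1:] does in the question).
df_1 = pd.read_csv('df_1.csv').drop(columns='id')
# DataFrame.append was removed in pandas 2.0; pd.concat puts df_1's rows first.
df_1 = pd.concat([df_1, test_df], ignore_index=True, sort=False)
df_1.to_csv('test.csv', index=False)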
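A CSV file cannot be prepended to in place, so the whole file has to be rewritten with the new rows first. The sketch below shows the ordering with in-memory stand-ins (`io.StringIO`) for test.csv and df_1.csv; the test.csv contents are the first rows the question's groupby would produce from its sample data, used here for illustration only.

```python
import io

import pandas as pd

# In-memory stand-ins for test.csv (first rows of the groupby result) and df_1.csv.
test_csv = io.StringIO(
    "name,version,ct_id\n"
    "testing,version1,245\n"
    'testing1,"version2, version5",246\n'
)
df_1_csv = io.StringIO(
    "id,name,version,ct_id\n"
    "36,testing17,version16,338\n"
    "37,testing18,version17,339\n"
)

existing = pd.read_csv(test_csv)
new_rows = pd.read_csv(df_1_csv).drop(columns="id")

# Listing new_rows first is what "prepends" them; ignore_index renumbers rows.
combined = pd.concat([new_rows, existing], ignore_index=True)
print(combined.to_csv(index=False))
```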
