Automation of splitting Excel sheets based on column values with Python

Suppose I have a huge Excel sheet with multiple columns and entries. One particular column (COLUMN A) contains boolean values, 0s and 1s. I wish to split the parent Excel sheet into 2 sheets based on the values of COLUMN A. I already know that this can be done using VBA code, but I want to try it in Python.
My idea is to iterate through the values of that column and, if a condition is satisfied, pick up the whole row and write it to a new sheet.
I am learning the language and can use numpy and pandas a bit to create linear regression models and the like. I'd like to work on this as a personal project. I would be glad if anyone could help me with this or provide a few hints to start with. Thank you.

How I would go about it:
Read the full Excel sheet into a pandas DataFrame:
import pandas as pd
df = pd.read_excel("file_name.xlsx")
Filter the DataFrame by the values in that column:
df1 = df[df["COLUMN A"] == 1]
df0 = df[df["COLUMN A"] == 0]
Write those new DataFrames to a new Excel workbook, or to a new sheet in an existing workbook, using the pandas ExcelWriter: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.ExcelWriter.html
Don't forget to handle missing data in column A, if there is any.
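The steps above can be sketched roughly as follows. This is only a minimal sketch: the helper names split_by_flag and write_sheets are made up for illustration, the small in-memory DataFrame stands in for the result of pd.read_excel, and rows with a missing value in COLUMN A are collected separately instead of being dropped silently.

```python
import pandas as pd


def split_by_flag(df, column):
    """Return (rows where column == 1, rows where column == 0, rows where it is missing)."""
    ones = df[df[column] == 1]
    zeros = df[df[column] == 0]
    missing = df[df[column].isna()]
    return ones, zeros, missing


def write_sheets(frames, path):
    """Write each DataFrame in `frames` (sheet name -> DataFrame) to one workbook."""
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Hypothetical stand-in for df = pd.read_excel("file_name.xlsx")
df = pd.DataFrame({"COLUMN A": [1, 0, None, 1], "value": ["a", "b", "c", "d"]})
ones, zeros, missing = split_by_flag(df, "COLUMN A")
```

Calling write_sheets({"ones": ones, "zeros": zeros}, "split.xlsx") would then put each part on its own sheet; the output file name is an assumption, and writing .xlsx files needs an Excel engine such as openpyxl to be installed.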
I am just a student, so perhaps there are more efficient ways to do this, but I use pandas quite a bit in my undergraduate research and this is what I would do. Best of luck to you :)

Related

Merging multiple Excel files into a master file using Python without any repeated values

I have multiple Excel files with different columns, and some of them share the same columns while also carrying extra data in additional columns. I created a master file that contains all the column headers from each Excel file, and now I want to export the data from the individual Excel files into the master file. Ideally, each row would represent all the information about one single item.
I tried merging and concatenating the files, but that adds all the data as new rows, so now I have some columns with repeated data alongside additional data in different columns.
What I want is to recognize the columns that are already present and fill in the new data instead of repeating all the columns, using Python. I cannot share the data or the code, so I am looking for some help or an idea to get this done. Any help would be appreciated. Thanks in advance!
You are probably merging the wrong way.
I'm not sure about your master file; it does not sound very intuitive.
Make sure each row has a unique ID that identifies it.
Then always perform the merge on that ID with the 'inner' merge type.
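As a rough illustration of merging on an ID, here is a minimal sketch with two made-up DataFrames standing in for two of the Excel files. The column names item_id, name and price are assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical contents of two files that describe the same items
file_a = pd.DataFrame({"item_id": [1, 2, 3], "name": ["bolt", "nut", "washer"]})
file_b = pd.DataFrame({"item_id": [2, 3, 4], "price": [0.10, 0.05, 0.20]})

# Merge on the shared ID so the new columns land on the same row
# instead of being appended as extra rows
merged = file_a.merge(file_b, on="item_id", how="inner")
```

With how="inner", only IDs present in both frames are kept (here 2 and 3). If items that appear in only one file should survive too, how="outer" keeps every ID and leaves the unmatched cells empty.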

send dataframe to multiple sheets - with non standard format excel

I am looking to produce the output shown in the image below in an export to Excel.
The input is a dataframe and the output is multiple sheets, determined by a column in the dataframe. In addition, each sheet should have the same format, with some merged cells and cells whose values are selected from the dataframe.
I have tried to solve this in two parts: splitting into separate sheets, for which I convert the dataframe into a dictionary, and then xlsxwriter for editing the sheets. However, I cannot combine the two into one piece of code, which is what I want. Any help on how to combine the two approaches, or on a new approach altogether, would be much appreciated.

Pandas and complicated filtering and merge/join multiple sub-data frames

I have a seemingly complicated problem and I have a general idea of how I should solve it but I am not sure if it is the best way to go about it. I'll give the scenario and would appreciate any help on how to break this down. I'm fairly new with Pandas so please excuse my ignorance.
The Scenario
I have a CSV file that I import as a dataframe. The example I am working through contains 2742 rows × 136 columns. The number of rows varies but the columns are fixed. I have a set of 23 lookup tables (also CSV files) named per year and per quarter (the range is 2020 3rd quarter to 2015 1st quarter). The lookup files are named like PPRRVU203.csv, which contains values from the 3rd quarter of 2020. The lookup tables are matched on two columns ('Code' and 'Mod'), and I use three values that are associated in the lookup.
I am trying to filter sections of my data frame, pull the correct values from the matching lookup file, merge back into the original subset, and then replace into the original dataframe.
Thoughts
I can probably abstract this and wrap it in a function, but I am not sure how to place the results back in. My question, for those who understand Pandas better than I do: what is the best method to filter, replace the values, and write the file back out?
The straightforward solution would be to filter the original dataframe into 23 separate dataframes, do the merge on each one with its lookup file, then concat the results into a new dataframe and output it to CSV.
This seems highly inefficient?
I can post code but I am looking for more of any high-level thoughts?
I'm not sure exactly what your DataFrame looks like, but the DataFrame.query() method may prove useful for selecting data.
name = df.query('columnname == "something"')
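The filter, merge and concat approach described in the question could look roughly like the sketch below. It is only an illustration under assumptions: the Period column, the rvu value column and all the sample values are hypothetical, and the lookups are built in memory here instead of being read from the PPRRVU CSV files.

```python
import pandas as pd

# Hypothetical main data; "Period" says which quarterly lookup applies to a row
claims = pd.DataFrame({
    "Period": ["203", "203", "151"],
    "Code": ["99213", "99214", "99213"],
    "Mod": ["A", "A", "B"],
})

# Hypothetical lookup tables keyed by period (each would normally come from
# one of the quarterly CSV files)
lookups = {
    "203": pd.DataFrame({"Code": ["99213", "99214"], "Mod": ["A", "A"], "rvu": [1.3, 1.9]}),
    "151": pd.DataFrame({"Code": ["99213"], "Mod": ["B"], "rvu": [1.1]}),
}

# Split by period, merge each subset with its own lookup, then stitch back together
parts = []
for period, subset in claims.groupby("Period", sort=False):
    parts.append(subset.merge(lookups[period], on=["Code", "Mod"], how="left"))
result = pd.concat(parts, ignore_index=True)

# query() can then select rows from the combined result
selected = result.query('Period == "203"')
```

Using groupby avoids writing 23 separate filters by hand, and how="left" keeps rows that have no match in the lookup, with empty values in the looked-up columns. The combined frame could then be written out with result.to_csv.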

Normalize heavily merged excel table

I was asked to do some data manipulation on an excel table with a head that's heavily merged as in the following picture...
And here is some of the data inside the table...
If I drop the first 17 rows of the header to get rid of the nonsense and reach the column names, pandas still does not read the column names correctly because of the merged cells, and I have not yet figured out a way to do it using pandas.
Any ideas?

Using pandas.DataFrame to get format of excel cell?

TLDR: I am loading an existing Excel file into a pandas DataFrame using df = pd.read_excel("file.xlsx"). I am currently unable to find any way to get the cell format (in Excel terms, i.e. General, Number, Currency, etc.) from the DataFrame df. Does anyone have any suggestions?
Associated Topics: I know this is possible in PHP and C#, but I would prefer to stay in python for the simplicity.
You can set a style really easily in pandas, but I can't find any documentation which shows how to get a style for a particular item in the DataFrame.
