How to make the pandas row a column name? - python

When I create the pandas DataFrame, it treats the empty line at the top of the Excel file as the column names and shows them as Unnamed. But my column names should be the concentration names on the line below it. How can I do this in pandas? (Editing in Excel is a solution, but I want to process multiple Excel files automatically with Python.)

I think that column does not represent a real column; it simply indicates that there are many more columns than are displayed. If it is a real column and you don't want it, you can simply drop it:
df = df.drop(columns="...")
If it is still not resolved, do comment.
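For the question as asked (using a data row as the column names), a minimal sketch follows. The frame `raw` and its concentration labels are made up for illustration; they stand in for what pandas returns when the real header sits one row below an empty line.

```python
import pandas as pd

# Hypothetical data: the first data row holds the names that should be the header.
raw = pd.DataFrame(
    [
        ["0.1 mM", "0.5 mM", "1 mM"],  # the row that should become the header
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
    ],
    columns=["Unnamed: 0", "Unnamed: 1", "Unnamed: 2"],
)

# Keep everything below the header row and renumber the index from 0.
df = raw.iloc[1:].reset_index(drop=True)
# Promote the first row of the original frame to column names.
df.columns = raw.iloc[0].tolist()
```

When reading straight from a file, passing `header=1` to `pd.read_excel` tells pandas to take the second row as the header, which avoids the extra step.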

Related

Reading Excel files and detect column name in python

I have some Excel files that include some rows (one or more) of description at the top, followed by the tables with the column names and values. Some column names are spread over two rows that I need to merge, and in some cases over three rows.
I would like to go through each file and skip the first lines to detect the rows that contain the column names. What would you suggest?
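One possible approach, sketched under assumptions: read the sheet without a header, treat the first row with no empty cells as the start of the header, and join the header rows cell by cell. The sample frame `raw`, the helper names, and the "first fully filled row" heuristic are all illustrative choices, and the number of header rows is assumed to be known.

```python
import pandas as pd

# Hypothetical sheet as it might look after reading with header=None.
raw = pd.DataFrame(
    [
        ["Report for 2020", None, None],  # description row
        [None, None, None],               # blank row
        ["id", "sales", "sales"],         # header row 1
        [None, "north", "south"],         # header row 2
        [1, 10, 20],
        [2, 30, 40],
    ]
)

def find_header_start(frame):
    """Return the position of the first row with no missing cells (heuristic).

    Note: returns 0 if no such row exists.
    """
    full_rows = frame.notna().all(axis=1)
    return int(full_rows.to_numpy().argmax())

def merge_header_rows(frame, start, n_rows):
    """Join n_rows header rows cell by cell, skipping missing parts."""
    header = frame.iloc[start:start + n_rows]
    return [
        "_".join(str(v) for v in header[col] if pd.notna(v))
        for col in header.columns
    ]

start = find_header_start(raw)
names = merge_header_rows(raw, start, n_rows=2)  # assumption: two header rows
table = raw.iloc[start + 2:].reset_index(drop=True)
table.columns = names
```

The heuristic will misfire on sheets whose description rows happen to fill every column, so it is worth checking the detected position on a few real files.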

Way to refer a column within a same name under difference merged cell?

I'm fairly new to pandas and stuck on how to refer to a column that shares its name with columns under different merged header cells. Here is an example of the problem. I want to get the Worker data for Company C, but if I load this Excel file as df and run
dfcompanyAworker=df[Worker]
it won't work.
Is there a specific way to select one of several identically named columns like this?
Here's the table:
https://i.stack.imgur.com/8Y6gp.png
Thanks!
First read the dataset you will use and specify its layout while reading. For example, for an Excel file:
dfcompanyAworker = pd.read_excel('Worker', skiprows=1, header=[1,2], index_col=0, skipfooter=7)
dfcompanyAworker
where:
skiprows=1 ignores the title row in the data
header=[1, 2] is a list because we have multilevel columns, namely the category (company) and the other data
index_col=0 makes the Date column the index for easier processing and analysis
skipfooter=7 ignores the footer at the end of the data
You can follow or adapt these steps.
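Once the columns are multilevel, a single column can be selected with a tuple of (company, field). The sketch below builds a small frame by hand instead of reading the Excel file; the company names, the "Salary" field, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical two-level header: company on top, field underneath.
columns = pd.MultiIndex.from_product(
    [["Company A", "Company B", "Company C"], ["Worker", "Salary"]]
)
df = pd.DataFrame(
    [
        [10, 100, 20, 200, 30, 300],
        [11, 110, 21, 210, 31, 310],
    ],
    index=pd.Index(["2021-01", "2021-02"], name="Date"),
    columns=columns,
)

# One column: pass the full (top level, second level) tuple.
company_c_worker = df[("Company C", "Worker")]

# The "Worker" column of every company: take a cross-section on level 1.
all_workers = df.xs("Worker", axis=1, level=1)
```

Selecting with `df["Worker"]` fails here because "Worker" lives on the second level, while plain string indexing looks at the first level.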

How to get column names after importing pickle file into Pandas

I am new to hands-on Python and programming in general. I have imported a 6 GB pickle file into pandas and have been able to display its contents. It doesn't look well ordered, however. My DataFrame has a varying number of rows and 842 columns.
My next tasks are to:
1. get the names of all 842 columns so I can find columns that have similar features
2. create a new column (Series) with data from (1) above
3. "append" the new column to the original DataFrame
So far I have tried the "functions" column, col, and dataframe.columns to get the column names, but none of them works.
Please see what my program looks like; code and output
You can get a list of your DataFrame's column names like this:
list(your_dataframe.columns)
For adding new columns, check this:
new-columns-in pandas
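The three steps from the question can be sketched on a toy frame. The column names and the "temp_" prefix used to group similar columns are assumptions made for the example, not taken from the asker's data.

```python
import pandas as pd

# Hypothetical small frame standing in for the 842-column one.
df = pd.DataFrame(
    {"temp_in": [20.0, 21.0], "temp_out": [5.0, 7.0], "label": ["a", "b"]}
)

# 1. Get all column names as a plain Python list.
column_names = list(df.columns)

# Find columns with a similar feature, here a shared name prefix.
temp_columns = [name for name in column_names if name.startswith("temp_")]

# 2. and 3. Build a new Series from those columns and attach it as a new column.
df["temp_mean"] = df[temp_columns].mean(axis=1)
```

Assigning to `df["temp_mean"]` adds the column in place, so no separate append step is needed.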

Remove Unnamed Columns in pandas

I am working on an Excel file and pandas shows the Excel file like this.
How do I get rid of all the Unnamed columns?
This will do the trick:
remove_cols = [col for col in gd.columns if 'Unnamed' in col]
gd.drop(remove_cols, axis='columns', inplace=True)
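A variant of the same idea, as a sketch: if some column labels are not strings (years read as integers, for instance), the `'Unnamed' in col` test raises a TypeError, so the labels can be converted to strings first. The frame below is invented for illustration and reuses the name `gd` from the answer.

```python
import pandas as pd

# Hypothetical frame with one Unnamed column and one integer column label.
gd = pd.DataFrame(
    [[None, "US", 1.0], [None, "DE", 2.0]],
    columns=["Unnamed: 0", "country", 2020],
)

# Convert labels to strings so the prefix test also works for non-string labels.
is_unnamed = gd.columns.astype(str).str.startswith("Unnamed")

# Keep only the columns that are not flagged; gd itself is left unchanged.
cleaned = gd.loc[:, ~is_unnamed]
```

This returns a new frame instead of modifying `gd` in place, which makes it easier to compare before and after.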
Looking at the result you are getting, the Excel data doesn't start on the first row. It also starts in column B instead of column A.
If you are able to edit the Excel file, I would recommend starting your data at A1 (by removing the empty column A and the empty rows at the top using Excel), as that will make later processing much easier for everyone reading the file.
If this file is not editable (perhaps it is generated by another party), you will need to skip the first couple of rows to read the correct headings:
gd = pd.read_excel(r"D:\gdp.xlsx", skiprows=3, usecols="B:L")

Iterate Through Folder and Add One Column of Each CSV to Dataframe

I have a folder that contains ~90 CSV files. Each relevant file is named xxxxx-2012 and has the same column names.
I would like to create a single DataFrame with a specific column power(MW) from each file, i.e. 90 columns in total, naming the column in the resulting DataFrame by the file name.
My objective with problems like this is to get to a simple datastructure as quickly as possible. In this case, that could be a dictionary of filenames to DataFrames.
frames = {filename: pd.read_csv(filename) for filename in os.listdir()}
You may have to filter out bad filenames, e.g. by extension, or you may be better off using glob. Either way, this breaks up the problem, and the filtering shouldn't be too bad.
Then the question becomes much easier*:
How do I get one column from a DataFrame? df[colname].
How do I concat a list of columns into a DataFrame?
*Assuming you know your way around Python data structures, e.g. list comprehensions.
Another option is to just concat the entire dict:
pd.concat(frames)
(which gives you a MultiIndex with all the information.)
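To get exactly one power(MW) column per file, the two sub-questions can be combined as in this sketch. The in-memory frames and file names are placeholders for what the `frames` dictionary would hold after reading the real CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from disk, keyed by file name.
frames = {
    "aaaaa-2012": pd.DataFrame({"time": [0, 1], "power(MW)": [1.5, 2.5]}),
    "bbbbb-2012": pd.DataFrame({"time": [0, 1], "power(MW)": [3.0, 4.0]}),
}

# Pick the one column from each frame; the dict keys become the column names.
power = pd.concat(
    {name: frame["power(MW)"] for name, frame in frames.items()},
    axis=1,
)
```

With `axis=1` the Series are placed side by side and aligned on their index, so files with different row counts produce NaN where a row is missing.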
