Unable to rename and remove pandas Index

I have a dataframe like the one shown below.
I know the column names are 'FR', 'ig' and 'te' from the command below:
dataFramesDict['Tri'].columns
What does name = 'level_1' mean here? I also don't see subject_ID in the columns or the index list. What is subject_ID here?
How do I get output like the one shown below?
I tried the code below to rename 'level_1' to 'subject_ID', but it doesn't work:
dataFramesDict['Tri'].index = dataFramesDict['Tri'].index.rename('subject_ID')
Please note that this is just sample data. I am only interested in renaming the first column and dropping that 'level_1'; the data itself doesn't matter.
I am unable to reproduce a dataframe of this type with sample code, because it is the result of other, more complex code. So I have provided a screenshot of the dataframe.

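'level_1' is the name of the columns Index (that is the name='level_1' you see in the output of .columns), and subject_ID is most likely the name of the row index, so neither shows up as a column. Renaming the row index does not touch the columns name. Try this:
df.columns.name = None  # remove the 'level_1' label from the columns Index
df.reset_index(inplace=True)  # move subject_ID from the index into a regular column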
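A minimal sketch that reproduces the layout on made-up data, so the two labels can be inspected directly. Only the names 'FR', 'ig', 'te', 'level_1' and 'subject_ID' come from the question; the long-format frame and its values are assumptions. A 'level_1' column often appears after reset_index() on an unnamed second index level, and pivoting on it turns 'level_1' into the name of the columns Index. Clearing that name (None here; an empty string also hides it) and resetting the index gives a flat frame.

```python
import pandas as pd

# Hypothetical long-format data; 'level_1' is the kind of column that
# reset_index() produces for an unnamed second index level.
long_df = pd.DataFrame({
    "subject_ID": [101, 101, 101, 102, 102, 102],
    "level_1": ["FR", "ig", "te", "FR", "ig", "te"],
    "value": [0.1, 0.3, 0.5, 0.2, 0.4, 0.6],
})

# Pivoting makes 'level_1' the *name* of the columns Index and
# 'subject_ID' the name of the row index; neither is a real column.
df = long_df.pivot(index="subject_ID", columns="level_1", values="value")
print(df.columns.name)  # level_1
print(df.index.name)    # subject_ID

# Clear the columns-Index name, then move the row index into a column.
df.columns.name = None
df = df.reset_index()
print(df.columns.tolist())  # ['subject_ID', 'FR', 'ig', 'te']
```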

Related

Why Doesn't Python Recognize the Column Name (KeyError)

I imported stock/options data into a DataFrame and want to use pandas to filter it manually for specific criteria. I renamed a few columns and later tried to do some cleaning so I could work with the data.
I tried to strip the percentage signs and then convert the data type to float like this:
df = df['IV'].str.rstrip("%").astype(float)
df = df['IV_Rank'].str.rstrip("%").astype(float)/100
df = df['IV PCT'].str.rstrip("%").astype(float)/100
When I run that code I get the error message KeyError: 'IV'. I got the same error for the other columns when I ran each line on its own. I tried copying and pasting the column names, and also tried the old names, but nothing worked. I'm not sure what to do, so any help would be appreciated.
That's because each line overwrites the entire dataframe: after the first assignment, df is no longer a DataFrame but a Series of converted values, so any later lookup by column name (including re-running the same line) raises a KeyError. This is what I think you are trying to do:
df['IV'] = df['IV'].str.rstrip("%").astype(float)
df['IV_Rank'] = df['IV_Rank'].str.rstrip("%").astype(float)/100
df['IV PCT'] = df['IV PCT'].str.rstrip("%").astype(float)/100
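A minimal sketch of the difference on made-up values (only the column names come from the question). It assumes the three columns hold strings such as "35.5%", since the .str accessor only works on string data.

```python
import pandas as pd

# Hypothetical sample rows; the real data came from an options export.
df = pd.DataFrame({
    "IV": ["35.5%", "42.0%"],
    "IV_Rank": ["12%", "80%"],
    "IV PCT": ["50%", "95%"],
})

# Wrong: rebinding the name to one converted column discards the frame.
broken = df["IV"].str.rstrip("%").astype(float)
print(type(broken).__name__)  # Series, so broken["IV_Rank"] raises KeyError

# Right: write each converted Series back into its own column.
df["IV"] = df["IV"].str.rstrip("%").astype(float)
for col in ["IV_Rank", "IV PCT"]:
    df[col] = df[col].str.rstrip("%").astype(float) / 100

print(df)
```

The loop is only a convenience for the two columns that share the same /100 scaling; the three explicit lines above do the same thing.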

Way to refer to a column with the same name under different merged cells?

I'm fairly new to pandas and stuck on how to refer to a column that has the same name under different merged column headers. Here is an example of the problem. I want to get the Worker data for company C, but if I define this Excel sheet as df and do
dfcompanyAworker=df[Worker]
it doesn't work.
Is there a specific way to select data from identically named columns like this?
Here's the table:
https://i.stack.imgur.com/8Y6gp.png
Thanks!
First read the dataset, telling pandas how its header is laid out. For example, with an Excel file:
import pandas as pd
dfcompanyAworker = pd.read_excel('Worker', skiprows=1, header=[1,2], index_col=0, skipfooter=7)
dfcompanyAworker
where:
skiprows=1 to ignore the title row in the data
header=[1, 2] is a list because the columns have two levels: the category (the company, i.e. the merged cells) and the field names underneath
index_col=0 to make the Date column the index for easier processing and analysis
skipfooter=7 to ignore the footer rows at the end of the data
You can follow these steps or try them on your own file.
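Once the file is read with a two-level header, a column is addressed by a tuple of both levels. A minimal sketch on a hand-built frame; the company names, the Salary field, the dates and the level names Company and Field are all assumptions standing in for the real sheet.

```python
import pandas as pd

# Hypothetical two-level columns mimicking the merged header: the top level
# is the company (the merged cell), the second repeats the same field names.
columns = pd.MultiIndex.from_product(
    [["Company A", "Company B", "Company C"], ["Worker", "Salary"]],
    names=["Company", "Field"],
)
df = pd.DataFrame(
    [[10, 500, 12, 520, 8, 480], [11, 510, 13, 530, 9, 490]],
    index=pd.Index(["2021-01", "2021-02"], name="Date"),
    columns=columns,
)

# One specific column: pass a (top level, second level) tuple.
company_c_worker = df[("Company C", "Worker")]
print(company_c_worker.tolist())  # [8, 9]

# The 'Worker' column of every company: a cross-section on the second level.
all_workers = df.xs("Worker", axis=1, level="Field")
print(all_workers.columns.tolist())  # ['Company A', 'Company B', 'Company C']
```

df.xs drops the matched level by default, so the company names become the remaining column labels, which is convenient when comparing the same field across companies.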

Unable to update new column values in rows derived from an existing column with multiple comma-separated values?

Original dataframe:
Converted Dataframe using stack and split:
Adding new column to a converted dataframe:
What I am trying to do is add a new column using np.select(condition, values), but it doesn't update the two additional rows derived from H1; for those it returns 0 or NaN. Can someone please help me here?
Please note that I have already reset the index, but it still doesn't help.
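I think using numpy in this situation is unnecessary. Use a boolean mask with .loc, which writes into df itself; chained indexing such as df[mask]['H3'] = ... assigns to a temporary copy and leaves df unchanged:
df.loc[df.State == 'CT', 'H3'] = 4400000  # set H3 on every row whose State is CT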
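A sketch of the whole flow on hypothetical data. The State values, the H1 numbers and the use of str.split plus explode (instead of the stack the question mentions) are assumptions. The index is reset so each derived row gets its own label, then H3 is set with .loc.

```python
import pandas as pd

# Hypothetical data: one State cell holds several comma-separated values.
df = pd.DataFrame({"State": ["CT,NY,NJ", "MA"], "H1": [100, 200]})

# Split the cell into a list and give each element its own row.
df = df.assign(State=df["State"].str.split(",")).explode("State")
df = df.reset_index(drop=True)  # unique labels for the derived rows

# Default value, then overwrite matching rows in place via .loc.
df["H3"] = 0
df.loc[df["State"] == "CT", "H3"] = 4400000
print(df)
```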

Dropping rows in a Data Frame

I am trying to drop the rows of a DataFrame df whose Time column holds anything other than 06:00:00. I tried the following code, but it doesn't seem to work. I even tried adding another column, Index, to my file to help, but it still doesn't work. Can you please help me? I am attaching the screenshots.
The variable val contains just the specific time 06:00:00. Also, please ignore the variable req. Thanks a lot.
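In pandas, drop is not an in-place operation by default; it returns a new DataFrame and leaves df unchanged. Either assign the result back (df = df.drop(j)) or specify df.drop(j, inplace=True).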
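Have you tried this?
df = df.drop(df[df['Time'] != '06:00:00'].index)  # drop every row whose Time is not 06:00:00 (assumes Time holds strings)
Or even better:
df = df[df.a.str.contains("06:00:00")]  # keep only the rows whose column a contains 06:00:00
where a is the name of the column that holds the time (Time in your case).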
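A small sketch of both approaches on made-up data. It assumes Time holds strings such as "06:00:00"; if the column was parsed into time objects instead, the last lines show a comparison that still works. The Value column and all values are hypothetical.

```python
import datetime as dt

import pandas as pd

# Hypothetical readings; 'Time' is assumed to hold "HH:MM:SS" strings.
df = pd.DataFrame({
    "Time": ["05:00:00", "06:00:00", "07:00:00", "06:00:00"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

# drop() returns a new frame, so the result must be assigned.
dropped = df.drop(df[df["Time"] != "06:00:00"].index)

# Equivalent and simpler: keep only the matching rows with a boolean mask.
kept = df[df["Time"] == "06:00:00"]

# If Time holds datetime.time objects, compare against a time value instead.
times = pd.to_datetime(df["Time"], format="%H:%M:%S").dt.time
kept2 = df[times == dt.time(6, 0)]

print(kept)
```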

PySpark fails to select column after pivot

I have a dataframe with a Timestamp column, a Tag column and a Value column.
I did a pivot like this:
df = df.groupBy("Timestamp").pivot("Tag").mean()
This works well and gives me what I want. When I list the columns, I get:
df.columns
----------------------------------------
['Timestamp', 'TAG:Tag1.val', 'TAG:Tag2.val', 'TAG:Tag3.val']
But when I then try to select a column, I get this error:
df.select('TAG:Tag1.val')
----------------------------------------
org.apache.spark.sql.AnalysisException: cannot resolve '`TAG:Tag1.val`' given input columns: [Timestamp, TAG:Tag1.val, TAG:Tag2.val, TAG:Tag3.val];;
I tried giving the name directly, using df.columns[0], using df.schema.fieldNames(), and running df = df.toDF(*df.schema.fieldNames()) before the select.
I always get the same error message. Do you know why this happens?
I also tried hardcoding the list of columns with .pivot("Tag", list_tags) and got the same result.
Note that selecting Timestamp works perfectly well.
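Wrap the column names in backticks. Without them, Spark reads the dot in 'TAG:Tag1.val' as a struct-field accessor and looks for a field named val inside a column named TAG:Tag1, which does not exist: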
df.select('`TAG:Tag1.val`').show()
To select every column this way, you can do:
df.select([f'`{x}`' for x in df.columns]).show()
I found a solution by replacing every ':' in the tags with '_'.
For a reason I don't know, PySpark did not read ':' correctly.
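If you want either fix applied to every column at once, two small helpers are one way to do it. Both are hypothetical (not part of PySpark) and plain Python, so they run without a Spark session. quote_spark_column backtick-quotes a name and doubles any backtick inside it, which is how Spark SQL escapes backticks in quoted identifiers; safe_column_names generalises the ':' to '_' rename to every non-word character, dots included.

```python
import re


def quote_spark_column(name: str) -> str:
    """Backtick-quote a column name so Spark treats '.' and ':' literally.

    A backtick inside the name is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"


def safe_column_names(names):
    """Replace each character that is not a letter, digit or '_' with '_'.

    Hypothetical helper; with a real DataFrame you would apply it as
    df = df.toDF(*safe_column_names(df.columns)).
    """
    return [re.sub(r"\W", "_", n) for n in names]


cols = ["Timestamp", "TAG:Tag1.val", "TAG:Tag2.val"]
print([quote_spark_column(c) for c in cols])
print(safe_column_names(cols))  # ['Timestamp', 'TAG_Tag1_val', 'TAG_Tag2_val']
```

With a real DataFrame, df.select([quote_spark_column(c) for c in df.columns]) keeps the original names, while the renaming route changes them. Renaming can also map two different names to the same result (for example 'a.b' and 'a:b'), so check for duplicates before relying on it.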
