I have a dataframe which is like as shown below
Though I know the column names are 'FR', 'ig' and 'te' with the help of below command.
dataFramesDict['Tri'].columns
What does name = 'level_1' mean here? Moreover, I also don't see subject_ID in the columns or index list. What is subject_ID here?
How do I get the output to be like as shown below
I tried the below code to rename 'level_1' to 'subject_ID' but it doesn't work
dataFramesDict['Tri'].index = dataFramesDict['Tri'].index.rename('subject_ID')
Please note that the data is just a sample data. I am only interested in changing the first column name and dropping that 'level_1'. Nothing to do with data
I am unable to create dataframe of this type through sample code. The above shown dataframe is a result of another complex code. So, I have provided a screenshot of dataframe
Try this
df.columns.name= ''
df.reset_index(inplace=True)
Related
I imported stock/options data into a data frame and want to use pandas to manually filter for specific criteria. I renamed a few columns and then later on I tried to do a bit of cleaning so I can work with the data.
I tried to replace percentage signs then convert the data type to a float by doing this:
df = df['IV'].str.rstrip("%").astype(float)
df = df['IV_Rank'].str.rstrip("%").astype(float)/100
df = df['IV PCT'].str.rstrip("%").astype(float)/100
When I run that code I get the error message: KeyError: 'IV'. I got this error for the other columns as well when I tried to run them each independently but I tried copy then pasting the column name as well as trying the old names. I am not too sure what to do but some help would be appreciated
That's because you are overwriting the entire dataframe. This is what I think you are trying to do
df['IV'] = df['IV'].str.rstrip("%").astype(float)
df['IV_Rank'] = df['IV_Rank'].str.rstrip("%").astype(float)/100
df['IV PCT'] = df['IV PCT'].str.rstrip("%").astype(float)/100
im kinda new to pandas and stuck at how to refer a column within same name under different merged column. here some example which problem im stuck about. i wanna refer a database from worker at company C. but if im define this excel as df and
dfcompanyAworker=df[Worker]
it wont work
is there any specific way to define a database within identifical column like this ?
heres the table
https://i.stack.imgur.com/8Y6gp.png
thanks !
first read the dataset that will be used, then set the shape for example I use excel format
dfcompanyAworker = pd.read_excel('Worker', skiprows=1, header=[1,2], index_col=0, skipfooter=7)
dfcompanyAworker
where:
skiprows=1 to ignore the title row in the data
header=[1, 2] is a list because we have multilevel columns, namely Category (Company) and other data
index_col=0 to make the Date column an index for easier processing and analysis
skipfooter=7 to ignore the footer at the end of the data line
You can follow or try the steps as I made the following
Original dataframe
Converted Dataframe using stack and split:
Adding new column to a converted dataframe:
What i am trying to is add a new column using np.select(condition, values) but it not updating the two addition rows derived from H1 its returning with 0 or NAN. Can someone please help me here ?
Please note i have already done the reset index but still its not helping.
I think using numpy in this situation is kind of unnecessary.
you can use something like the following code:
df[df.State == 'CT']['H3'] = 4400000
I am trying to drop some specific rows in a DataFrame df where, the column Time is anything except 06:00:00. I tried the following code but it dosen't seem to work. I even tried adding another column Index to my file to aid the process but still it is not working. Can you please help me. I am attaching the screenshots.
The val just contains the specific time 06:00:00. Also, please ignore the variable req. Thanks a lot.
In pandas, by default drop isn't inplace operation. Try specifying df.drop(j, inplace=True).
Have you tried?
df = df.drop(df[//expresion here//].index)
Or even better:
df = df[~df.a.str.contains("06:00:00")]
Where a is the name of the column you want to search the time in
I have a dataframe with a Timestamp column, a Tag column and a Value column.
I did a pivot like this:
df = df.groupBy("Timestamp").pivot("Tag").mean()
Which works well, gives me what I want. When I show columns, I get
df.columns
----------------------------------------
['Timestamp', 'TAG:Tag1.val', 'TAG:Tag2.val', 'TAG:Tag3.val']
But then when I try to select a column, I have this error:
df.select('TAG:Tag1.val')
----------------------------------------
org.apache.spark.sql.AnalysisException: cannot resolve '`TAG:Tag1.val`' given input columns: [Timestamp, TAG:Tag1.val, TAG:Tag2.val, TAG:Tag3.val];;
I tried by giving the name directly, by using df.columns[0], df.schema.fieldNames(), by doing df=df.toDF(*df.schema.fieldNames()) before select.
Always the same error message. Do you know why is it doing so?
I also tried to hardcode the column's list in .pivot("Tag", list_tags), got the same result.
I also need to tell you that selecting Timestamp works perfectly well.
Here's a way, you need to wrap the column names with backticks:
df.select('`TAG:Tag1.val`').show()
To check all columns, you can do:
df.select([f'`{x}`' for x in df.columns]).show()
I found the solution by replacing all ':' in the tags by '_'.
For a reason I don't know, pyspark did not read correctly ':'