I am using pandas to make a dataframe. I want to delete the first 12 rows with the drop function. Every resource website says that you should use drop to delete rows, but unfortunately it doesn't work and I don't know why. The error says 'list' object has no attribute 'drop'. Could you do me a favor and help me figure out what I should do?
import pandas as pd
url = 'Exp01.html'
url=str(url)
df = pd.read_html(url)
df = df.drop(index=['1','12'],axis=0,inplace=True)
print(df)
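You can slice the rows out instead (once df is a single DataFrame rather than the list that read_html returns):
df = df.iloc[12:]
df
iloc in general is used this way:
df.iloc[x:y]
where x is the starting position (included) and y is the ending position (excluded).
[12:] starts at position 12 and has no ending position, so the first 12 rows (positions 0 to 11) are removed. df.loc[x:y] looks similar, but it slices by index label and includes the ending label y.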
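The difference between label-based and position-based slicing only shows up when the index is not a plain 0-based range. A minimal sketch, using a hypothetical frame whose index labels start at 10:

```python
import pandas as pd

# Hypothetical frame whose index labels do not match row positions
df = pd.DataFrame({'a': range(5)}, index=[10, 11, 12, 13, 14])

by_position = df.iloc[1:3]  # positions 1 and 2; the end position is excluded
by_label = df.loc[11:13]    # labels 11, 12 and 13; the end label is included

print(by_position.index.tolist())  # [11, 12]
print(by_label.index.tolist())     # [11, 12, 13]
```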
Pandas read_html returns a list of dataframes.
So df is a list in your example. First, take a look at what the list holds.
If it holds just one table (DataFrame), you can change it to:
df = pd.read_html(url)[0]
Full code:
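import pandas as pd
url = 'Exp01.html'
df = pd.read_html(url)[0]
# inplace=True changes df directly and returns None, so don't assign the result back to df
df.drop(index=df.index[:12], inplace=True)
print(df)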
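A self-contained sketch of the fix, assuming a hypothetical list holding one 20-row DataFrame as a stand-in for what pd.read_html returns (the real call needs the HTML file and a parser such as lxml). It also shows that drop(..., inplace=True) returns None, which is why df = df.drop(..., inplace=True) in the question would leave df as None even on a real DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(url): a list holding one DataFrame
tables = [pd.DataFrame({'a': range(20)})]
# tables.drop(...) would fail here, because a list has no drop method

df = tables[0]  # take the DataFrame out of the list

# With inplace=True, drop modifies the frame and returns None
result = df.copy().drop(index=df.index[:12], inplace=True)
print(result)  # None

# Without inplace, drop returns a new DataFrame, so assigning it is fine
trimmed = df.drop(index=df.index[:12])
print(len(trimmed))                  # 8
print(trimmed.equals(df.iloc[12:]))  # True
```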
Related
I have a dataframe with column names ['2533,3093', '1645,2421', '1776,1645', '3133,2533', '2295,2870'] and I'm trying to add a new column named '2009,3093'.
I'm using df.loc[:, col] = some_series, but it raises a KeyError saying that the column does not exist. By default, pandas should create that column, and if I do df.loc[:, 'test'] = value it works fine.
Stranger still, when I evaluate df.loc[:, col] on its own, it returns the entire dataframe, when it should raise a KeyError because the column does not exist in the dataframe.
Any thoughts?
Thanks
Please use this syntax, passing the column name inside a list:
df.loc[:, [column_name]] = series
df.loc[:, ['2009,3093']] = series
I used the code below for testing. I'm not sure what series you were trying to assign, so it uses a plain list:
import pandas as pd
col = ['2533,3093', '1645,2421', '1776,1645', '3133,2533']
# give the frame 4 rows so the 4 assigned values have rows to go into
df = pd.DataFrame([[1, 2, 3, 4]] * 4, columns=col)
df.loc[:, ['2009,3093']] = ['a', 'b', 'c', 'd']
print(df)
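For comparison, a comma inside a column label is just part of the string, so ordinary [] assignment also creates the column. A sketch with hypothetical data; note that a Series is aligned on its index, so its index has to match the DataFrame's, otherwise the unmatched rows get NaN:

```python
import pandas as pd

cols = ['2533,3093', '1645,2421', '1776,1645', '3133,2533', '2295,2870']
# Hypothetical two-row frame using the question's column names
df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], columns=cols)

# Hypothetical new values; default index 0..1 matches df's index
new_values = pd.Series([100, 200])

df['2009,3093'] = new_values  # creates the column; the comma needs no special handling
print(df.columns.tolist())
```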
I want to split the rows while maintaining the values.
How can I split the rows like that?
The data frame below is an example.
The output that I want to see:
You can use pd.melt(). Read the documentation for more information: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
I tried working on your problem.
import pandas as pd
melted_df = data.melt(id_vars=['value'], var_name="ToBeDropped", value_name="IDs")
Using value_name="ID1" here would show a warning (and raise a ValueError in pandas 2.0 and later) because ID1 is already a column name, so the melted values get a new name instead. melt also creates a new column holding the original column names; var_name names it 'ToBeDropped'. The code below removes it:
df = melted_df.drop(columns = ['ToBeDropped'])
'df' will be your desired output.
via wide_to_long:
df = pd.wide_to_long(df, stubnames='ID', i='value',
j='ID_number').reset_index(0)
via set_index and stack:
df = df.set_index('value').stack().reset_index(name='IDs').drop(columns='level_1')
via melt:
df = df.melt(id_vars='value', value_name="IDs").drop(columns='variable')
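Because the example data from the question is not reproduced here, this sketch uses a hypothetical frame with a 'value' column and two ID columns. It shows that melt and stack produce the same rows in a different order: melt goes column by column, stack goes row by row. It uses the keyword form drop(columns=...), since pandas 2.0 no longer accepts the axis as a positional argument to drop:

```python
import pandas as pd

# Hypothetical wide frame: one 'value' column plus ID columns named ID1, ID2
df = pd.DataFrame({
    'value': [10, 20],
    'ID1': ['a', 'c'],
    'ID2': ['b', 'd'],
})

# melt: one row per (row, ID column) pair, ordered column by column
melted = df.melt(id_vars='value', value_name='ID').drop(columns='variable')

# stack: the same pairs, ordered row by row; the unnamed column level becomes 'level_1'
stacked = (
    df.set_index('value')
      .stack()
      .reset_index(name='ID')
      .drop(columns='level_1')
)

print(melted)
print(stacked)
```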
I have a weird problem with my code. I am trying to generate an auto-increment ID for my dataframe with this code:
df['id'] = pd.Series(range(1,(len(df)+1))).astype(str).apply('{:0>8}'.format)
Now, len(df) is 799734,
but df['id'] is NaN after row 77998.
I tried to print the values using:
[print(i) for i in range(1,(len(df)+1))]
On the first attempt it printed None after 77998 values. On the second attempt it printed all values to the end normally, but the dataframe still has NaN in the last rows.
Maybe it has something to do with memory? I can't find any hint. Please help me solve this issue.
The missing values mean the Series and the DataFrame have different index values. Assigning a Series to a column aligns it on the index, so the two indexes need to be the same.
So pass df.index to the Series constructor:
df['id'] = pd.Series(range(1,(len(df)+1)), index=df.index).astype(str).apply('{:0>8}'.format)
Or a two-line solution that assigns the range directly (a range has no index, so its values are assigned by position):
df['id'] = range(1,(len(df)+1))
df['id'] = df['id'].astype(str).apply('{:0>8}'.format)
Or reset the DataFrame to a default index (0 to n-1), the same as the Series:
df = df.reset_index(drop=True)
df['id'] = pd.Series(range(1,(len(df)+1))).astype(str).apply('{:0>8}'.format)
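A small sketch reproducing the problem with a hypothetical 4-row frame whose index is not 0 to n-1 (as happens after filtering or concatenating rows). The Series built without an index gets labels 0 to 3, so only the rows with matching labels receive values:

```python
import pandas as pd

# Hypothetical frame whose index has gaps, e.g. after filtering rows
df = pd.DataFrame({'x': [5, 6, 7, 8]}, index=[0, 1, 5, 6])

# Series index is 0..3; labels 5 and 6 have no match, so those rows get NaN
df['bad_id'] = pd.Series(range(1, len(df) + 1)).astype(str).apply('{:0>8}'.format)

# Passing index=df.index makes the labels match, so every row gets a value
df['id'] = pd.Series(range(1, len(df) + 1), index=df.index).astype(str).apply('{:0>8}'.format)

print(df)
```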
I am not able to figure out how to set my dataframe's column names properly.
I tried some methods but could not find the right one.
import pandas as pd
df = pd.read_html('sbi.html')
data = df[1]
I want the second row, the one that contains "Narration", to be used as the column names.
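Set the header parameter to 1, the row number to use as the column names (the [1] selects the same table as df[1] in the question):
data = pd.read_html('sbi.html', header=1)[1]
Or use the skiprows parameter to skip the first row:
data = pd.read_html('sbi.html', skiprows=1)[1]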
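If the table has already been read with the wrong header, a pandas-only alternative (a different technique from the header parameter) is to promote a row to the column labels afterwards. A sketch with a hypothetical stand-in for the table, since the real sbi.html isn't shown; the column names other than "Narration" and all values are made up:

```python
import pandas as pd

# Hypothetical stand-in for the table read_html returned: the real labels
# ended up in the first data row instead of the header
data = pd.DataFrame([
    ['Date', 'Narration', 'Amount'],
    ['01-01-2020', 'Opening balance', '100'],
    ['02-01-2020', 'ATM withdrawal', '-20'],
])

# Use that row as the column labels, then drop it and renumber the rows
data.columns = data.iloc[0]
data = data.iloc[1:].reset_index(drop=True)
data.columns.name = None  # iloc[0] carried its row label (0) over as the columns' name

print(data)
```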
I'm trying to manipulate a dataframe using a cumsum function.
My data looks like this:
To perform my cumsum, I first drop the Material column:
df = pd.read_excel(excel_sheet, sheet_name='Sheet1').drop(columns=['Material']) # Dropping material column
I run the rest of my code, and get my expected outcome of a dataframe cumsum without the material listed:
df2 = df.to_numpy()  # convert to a NumPy array (as_matrix() was removed in pandas 1.0)
new = df2.cumsum(axis=1)
print(new)
However, at the end I need to put the Material column back, and I'm unsure how to use the add function to insert it at the beginning of the dataframe.
IIUC, you can set the Material column as the index (so read the sheet without dropping it), do your cumsum, and put it back at the end with reset_index:
df2 = df.set_index('Material').cumsum(axis=1).reset_index()
An alternative would be to do your cumsum on all but the first column (assuming Material is the first column):
df.iloc[:, 1:] = df.iloc[:, 1:].cumsum(axis=1)
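A runnable sketch of both approaches, using a hypothetical frame in place of the Excel sheet (the month column names and values are made up). Here Material is kept rather than dropped, and it is assumed to be the first column:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(...): Material first, then numeric columns
df = pd.DataFrame({
    'Material': ['steel', 'copper'],
    'Jan': [1, 4],
    'Feb': [2, 5],
    'Mar': [3, 6],
})

# Option 1: park Material in the index, cumsum across columns, restore it as the first column
df2 = df.set_index('Material').cumsum(axis=1).reset_index()

# Option 2: cumsum every column except the first, writing into a copy of the frame
df3 = df.copy()
df3.iloc[:, 1:] = df3.iloc[:, 1:].cumsum(axis=1)

print(df2)
print(df3)
```

Option 1 does not depend on column order; option 2 relies on Material being first.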