Dropping a row in pandas with a date index in Python

I'm trying to drop the last row in a dataframe created by pandas in python and seem to be having trouble.
import pandas as pd
from numpy.random import randn
index = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(randn(8, 3), index=index, columns=['A', 'B', 'C'])
I tried the drop method like this:
df.drop([df.shape[0] - 1], axis=0)
but it keeps raising an error saying the label is not contained in the axis.
I also tried to drop by index name and it still doesn't seem to be working.
Any advice would be appreciated. Thanks!!!

df.iloc[:-1]
returns the DataFrame with the last row removed. (The older df.ix[:-1] accessor did the same but has since been removed from pandas.)

Referencing the DataFrame directly to retrieve all but the last row worked for me.
df[:-1]
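If you want to stick with drop: the call above passes an integer position, but the index holds dates, so that label is not found. A minimal sketch of dropping the last row by its actual label instead (same DataFrame construction as in the question):
import pandas as pd
from numpy.random import randn
index = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(randn(8, 3), index=index, columns=['A', 'B', 'C'])
# df.index[-1] is the date label of the last row, so drop can locate it
trimmed = df.drop(df.index[-1])
print(len(df), len(trimmed))  # 8 7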

Related

How to delete multiple rows in a DataFrame with pandas in Python?

I am using pandas to make a DataFrame. I want to delete the first 12 rows with the drop function. Every resource says to use drop to delete rows, but unfortunately it doesn't work and I don't know why. The error says 'list' object has no attribute 'drop'. Could you do me a favor and tell me what I should do?
url = "Exp01.html"
url=str(url)
df = pd.read_html(url)
df = df.drop(index=['1','12'],axis=0,inplace=True)
print(df)
You can slice the rows out:
df = df.loc[11:]
df
loc in general is used this way:
df.loc[x:y]
where x is the starting label and y is the ending label; unlike ordinary Python slicing, both endpoints are included.
df.loc[11:] starts at label 11 with no ending label, so everything from label 11 onward is kept. With a default zero-based RangeIndex, dropping exactly the first 12 rows would mean slicing from label 12 instead.
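A minimal sketch of the difference between label slicing (loc, stop included) and positional slicing (iloc, stop excluded), using a small made-up DataFrame rather than the original HTML table:
import pandas as pd
df = pd.DataFrame({'A': range(20)})   # default RangeIndex 0..19
print(df.loc[3:5])                    # labels 3, 4 and 5 (inclusive)
print(df.iloc[3:5])                   # positions 3 and 4 (stop excluded)
print(len(df.iloc[12:]))              # 8 rows left after dropping the first 12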
Pandas read_html returns a list of dataframes.
So df is a list in your example. First, take a look at what the list holds.
If it's just one table (dataframe), you can change it to:
df = pd.read_html(url)[0]
Full code:
url = "Exp01.html"
url=str(url)
df = pd.read_html(url)[0]
df.drop(index=df.index[:12], axis=0, inplace=True)
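Since Exp01.html is not available here, a quick check of that drop call on a made-up 20-row DataFrame (any frame with a default index behaves the same):
import pandas as pd
df = pd.DataFrame({'A': range(20)})
df.drop(index=df.index[:12], inplace=True)   # drop the first 12 row labels
print(df.index.tolist())                     # [12, 13, 14, 15, 16, 17, 18, 19]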

Unable to update a pandas DataFrame element with a dictionary

I have a pandas DataFrame with just 2 columns: the first is a name, and the second is a dictionary of information relevant to the name. Adding new rows works fine, but if I try to update the dictionary column by assigning a new dictionary in place, I get
ValueError: Incompatible indexer with Series
So, to be exact, this is what I was doing to produce the error:
import pandas as pd
df = pd.DataFrame(data=[['a', {'b':1}]], columns=['name', 'attributes'])
pos = df[df.loc[:,'name']=='a'].index[0]
df.loc[pos, 'attributes'] = {'c':2}
I was able to find another solution that seems to work:
import pandas as pd
df = pd.DataFrame(data=[['a', {'b':1}]], columns=['name', 'attributes'])
pos = df[df.loc[:,'name']=='a'].index[0]
df.loc[:,'attributes'].at[pos] = {'c':2}
but I was hoping to get an answer as to why the first method doesn't work, or if there was something wrong with how I had it initially.
Since pos here is a position in the DataFrame, you can access the row with iloc. Changing your last line as follows may work:
df.iloc[pos]['attributes'] = {'c':2}
Note that this is chained assignment, so pandas may emit a SettingWithCopyWarning and the update is not guaranteed to reach the original DataFrame; the DataFrame.at approach below is more reliable.
DataFrame.at works for me:
df.at[pos, 'attributes'] = {'c':2}
print (df)
name attributes
0 a {'c': 2}
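The likely reason the first method fails is that a .loc assignment with a single row label tries to interpret the dict as several values to align, whereas .at always addresses exactly one cell. A minimal sketch using the same toy data as above:
import pandas as pd
df = pd.DataFrame(data=[['a', {'b': 1}]], columns=['name', 'attributes'])
pos = df[df['name'] == 'a'].index[0]
# .at targets a single cell, so the dict is stored as-is in the object column
df.at[pos, 'attributes'] = {'c': 2}
print(df.loc[pos, 'attributes'])   # {'c': 2}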

How do I split rows in DataFrame?

I want to split the rows while maintaining the values.
How can I split the rows like that?
(The example DataFrame and the desired output were shown as images in the original question.)
You can use pd.melt(). Read the documentation for more information: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
I tried working on your problem.
import pandas as pd
melted_df = data.melt(id_vars=['value'], var_name="ToBeDropped", value_name="ID1")
This may show a warning because of ambiguity in the string passed for the value_name argument. It also creates a new column, which I have already named: the new column is called 'ToBeDropped'. The code below removes that column for you.
df = melted_df.drop(columns = ['ToBeDropped'])
'df' will be your desired output.
via wide_to_long:
df = pd.wide_to_long(df, stubnames='ID', i='value',
j='ID_number').reset_index(0)
via set_index and stack:
df = df.set_index('value').stack().reset_index(name='IDs').drop(columns='level_1')
via melt:
df = df.melt(id_vars='value', value_name="ID1").drop(columns='variable')
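Since the original table was only shown as an image, here is a minimal sketch of the melt approach on a made-up wide table (the column names 'value', 'ID1', 'ID2' are assumptions about the original layout):
import pandas as pd
wide = pd.DataFrame({'value': ['x', 'y'], 'ID1': [1, 3], 'ID2': [2, 4]})
long = wide.melt(id_vars='value', value_name='ID').drop(columns='variable')
print(long)
#   value  ID
# 0     x   1
# 1     y   3
# 2     x   2
# 3     y   4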

range(1:len(df)) assigns NaN to last rows in dataframe

I have this weird problem with my code. I am trying to generate an auto ID for my dataframe with this code:
df['id'] = pd.Series(range(1, len(df)+1)).astype(str).apply('{:0>8}'.format)
Now, len(df) equals 799734,
but df['id'] is NaN after row 77998.
I tried to print the values using:
[print(i) for i in range(1, len(df)+1)]
On the first attempt it printed None after 77998 values. On the second attempt it printed all values to the end normally, but the dataframe still has NaN in the last rows.
Maybe it has something to do with memory? I am not getting any hints. Please help me solve this issue.
Missing values mean the Series and the DataFrame have different index values; for the assignment to work correctly they need to match.
So you need to pass df.index to the Series constructor:
df['id'] = pd.Series(range(1, len(df)+1), index=df.index).astype(str).apply('{:0>8}'.format)
Or a two-line solution, assigning the range first:
df['id'] = range(1, len(df)+1)
df['id'] = df['id'].astype(str).apply('{:0>8}'.format)
Or create default index values in the DataFrame so they match the Series:
df = df.reset_index(drop=True)
df['id'] = pd.Series(range(1, len(df)+1)).astype(str).apply('{:0>8}'.format)
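A minimal sketch of the alignment effect on a small frame whose index does not start at 0 (standing in for the large dataframe in the question):
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]}, index=[5, 6, 7])
df['bad'] = pd.Series(range(1, len(df) + 1))                   # Series index 0,1,2 never matches 5,6,7 -> all NaN
df['good'] = pd.Series(range(1, len(df) + 1), index=df.index)  # aligned on 5,6,7 -> 1,2,3
print(df)
#     A  bad  good
# 5  10  NaN     1
# 6  20  NaN     2
# 7  30  NaN     3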

Removing index column in pandas when reading a csv

I have the following code which imports a CSV file. There are 3 columns and I want to set the first two of them to variables. When I set the second column to the variable "efficiency" the index column is also tacked on. How can I get rid of the index column?
df = pd.DataFrame.from_csv('Efficiency_Data.csv', header=0, parse_dates=False)
energy = df.index
efficiency = df.Efficiency
print efficiency
I tried using
del df['index']
after I set
energy = df.index
which I found in another post but that results in "KeyError: 'index' "
When writing to and reading from a CSV file, include the arguments index=False and index_col=False, respectively. Here is an example:
To write:
df.to_csv(filename, index=False)
and to read from the CSV:
df = pd.read_csv(filename, index_col=False)
This should prevent the issue so you don't need to fix it later.
df.reset_index(drop=True, inplace=True)
DataFrames and Series always have an index. Although it displays alongside the column(s), it is not a column, which is why del df['index'] did not work.
If you want to replace the index with simple sequential numbers, use df.reset_index().
To get a sense for why the index is there and how it is used, see e.g. 10 minutes to Pandas.
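A minimal sketch of that distinction (not from the original post; the Energy/Efficiency names just echo the question): the index never appears in df.columns, and reset_index either turns it back into an ordinary column or replaces it with sequential numbers:
import pandas as pd
df = pd.DataFrame({'Efficiency': [0.9, 0.8]}, index=pd.Index([100, 200], name='Energy'))
print(list(df.columns))                       # ['Efficiency'] -- 'Energy' is the index, not a column
print(list(df.reset_index().columns))         # ['Energy', 'Efficiency']
print(list(df.reset_index(drop=True).index))  # [0, 1]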
You can set one of the columns as the index, for example if it is an "id" column.
In this case the existing index will be replaced by the column you have chosen:
df.set_index('id', inplace=True)
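A quick sketch of set_index with made-up data modeled on the question's columns ('Energy', 'Efficiency' plus a hypothetical third column 'Other'):
import pandas as pd
df = pd.DataFrame({'Energy': [100, 200], 'Efficiency': [0.9, 0.8], 'Other': [1, 2]})
df.set_index('Energy', inplace=True)
print(df.index.name, list(df.columns))   # Energy ['Efficiency', 'Other']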
If your problem is the same as mine, where you just want to reset the column headers to 0 through the number of columns, do
df = pd.DataFrame(df.values)
EDIT:
This is not a good idea if you have heterogeneous data types. Better to just use
df.columns = range(len(df.columns))
You can specify which column in your CSV file is the index by using the index_col parameter of the from_csv function.
If this doesn't solve your problem, please provide an example of your data.
One thing that I do is df = df.reset_index()
then df = df.drop(['index'], axis=1)
To remove, or to avoid creating, the default index column, you can set index_col to False and keep the header as zero. Here is an example of how you can do it:
recording = pd.read_excel("file.xls",
sheet_name= "sheet1",
header= 0,
index_col= False)
header=0 turns the first row of the file into your column headers, which you can use later to refer to the columns.
It works for me this way:
df = data.set_index("name of the column header to use as the index column")
