Unable to update Pandas dataframe element with dictionary - python

I have a Pandas dataframe with just 2 columns: the first is a name, and the second is a dictionary of information relevant to the name. Adding new rows works fine, but if I try to update the dictionary column by assigning a new dictionary in place, I get
ValueError: Incompatible indexer with Series
So, to be exact, this is what I was doing to produce the error:
import pandas as pd
df = pd.DataFrame(data=[['a', {'b':1}]], columns=['name', 'attributes'])
pos = df[df.loc[:,'name']=='a'].index[0]
df.loc[pos, 'attributes'] = {'c':2}
I was able to find another solution that seems to work:
import pandas as pd
df = pd.DataFrame(data=[['a', {'b':1}]], columns=['name', 'attributes'])
pos = df[df.loc[:,'name']=='a'].index[0]
df.loc[:,'attributes'].at[pos] = {'c':2}
but I was hoping to get an answer as to why the first method doesn't work, or if there was something wrong with how I had it initially.

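Since pos is a row label taken from the index (here it coincides with the integer position), you could also select the row with iloc:
df.iloc[pos]['attributes'] = {'c':2}
Note, however, that this is chained assignment: df.iloc[pos] may return a copy, so the update is not guaranteed to reach df, and pandas may warn about it.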

For me, DataFrame.at works:
df.at[pos, 'attributes'] = {'c':2}
print (df)
  name attributes
0    a   {'c': 2}
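
As a minimal sketch of the DataFrame.at approach above: .at addresses exactly one cell, so the dictionary is stored as a single value, whereas loc may try to align a dict against the index like a Series, which is a likely source of the error. The lookup of pos through df.index is just one way to find the row label; the variable names mirror the question.

```python
import pandas as pd

df = pd.DataFrame(data=[['a', {'b': 1}]], columns=['name', 'attributes'])

# Label of the first row whose name is 'a'
pos = df.index[df['name'] == 'a'][0]

# .at addresses exactly one cell, so the dict is stored as-is
df.at[pos, 'attributes'] = {'c': 2}

print(df.at[pos, 'attributes'])
```

The same accessor reads the value back, so one syntax covers both getting and setting.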

Related

How to delete multiple rows in data frame at panda in python?

I am using pandas to make a dataframe, and I want to delete the first 12 rows with the drop function. Every resource I found says to use drop to delete rows, but it doesn't work for me and I don't know why. The error says that 'list' object has no attribute 'drop'. What should I do?
url=Exp01.html
url=str(url)
df = pd.read_html(url)
df = df.drop(index=['1','12'],axis=0,inplace=True)
print(df)
You can slice the rows out:
df = df.loc[12:]
df
In general, loc slices by label:
df.loc[x:y]
where x is the starting label and y is the ending label, both inclusive.
[12:] starts at label 12 with no ending label, so with the default integer index it skips the first 12 rows (labels 0 to 11).
Pandas read_html returns a list of dataframes.
So df is a list in your example. First, take a look at what the list holds.
If it's just one table (dataframe), you can change it to:
df = pd.read_html(url)[0]
Full code:
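import pandas as pd
url = 'Exp01.html'  # path to the HTML file, as a string
df = pd.read_html(url)[0]  # read_html returns a list; take the first table
# drop the first 12 rows by their index labels
df.drop(index=df.index[:12], inplace=True)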
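
A small sketch of the two ways to remove the first 12 rows, using a made-up 20-row DataFrame as a stand-in for the table read from the HTML file (the column name value is an assumption for illustration). It also shows why assigning the result of drop with inplace=True back to df, as in the question, loses the data:

```python
import pandas as pd

# Stand-in for the table read from the HTML file: 20 rows, default index
df = pd.DataFrame({'value': range(20)})

# Option 1: keep everything from position 12 onwards
sliced = df.iloc[12:]

# Option 2: drop the first 12 index labels; returns a new DataFrame
dropped = df.drop(index=df.index[:12])

# With inplace=True, drop modifies df and returns None,
# so the result should not be assigned back to df
result = df.drop(index=df.index[:12], inplace=True)

print(len(sliced), len(dropped), len(df), result)
```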

Pandas returns the whole dataframe when using .loc on a column that does not exist

I have a dataframe with column names ['2533,3093', '1645,2421', '1776,1645', '3133,2533', '2295,2870'] and I'm trying to add a new column which is '2009,3093'.
I'm using df.loc[:, col] = some series, but it raises a KeyError, meaning that the column does not exist. By default, though, pandas should create that column. If I do df.loc[:, 'test'] = value it works fine.
But somehow, when I do df.loc[:, col], it returns the entire dataframe, when it should actually raise a KeyError because the column does not exist in the dataframe.
Any thoughts?
Thanks
Please use this syntax:
df.loc[:,[column name]] = series
df.loc[:, ['2009,3093']] = series
I used this code for testing; I'm not sure what series you were trying to assign:
import pandas as pd
col = ['2533,3093', '1645,2421', '1776,1645', '3133,2533']
df = pd.DataFrame(columns=col)
df.loc[:, ['2009,3093']] = ['a','b','c','d']
print(df)
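
As an alternative sketch, plain column assignment with df[col] also creates a missing column. The 4-row frame filled with zeros and the values Series are assumptions for illustration, since the original data is not shown:

```python
import pandas as pd

cols = ['2533,3093', '1645,2421', '1776,1645', '3133,2533']
# Hypothetical 4-row frame; the question's real data is unknown
df = pd.DataFrame(0, index=range(4), columns=cols)

new_col = '2009,3093'
values = pd.Series(['a', 'b', 'c', 'd'], index=df.index)

# Direct column assignment creates the column when it is missing
df[new_col] = values

print(df.columns.tolist())
```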

Updating element of dataframe while referencing column name and row number

I am coming from an R background and used to being able to retrieve the value from a dataframe by using syntax like:
r_dataframe$some_column_name[row_number]
And I can assign a value to the dataframe by the following syntax:
r_dataframe$some_column_name[row_number] <- some_value
or without the arrow:
r_dataframe$some_column_name[row_number] = some_value
For example:
#create R dataframe data
employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
#print out the name of this employee
employ.data$employee[2]
#assign the name
employ.data$employee[2] <- 'Some other name'
I'm now learning some Python and from what I can see the most straightforward way to retrieve a value from a pandas dataframe is:
pandas_dataframe['SomeColumnName'][row_number]
I can see the similarities to R.
However, what confuses me is that when it comes to modifying/assigning the value in the pandas dataframe I need to completely change the syntax to something like:
pandas_dataframe.at[row_number, 'SomeColumnName'] = some_value
To read this code is going to require a lot more concentration because the column name and row number have changed order.
Is this the only way to perform this pair of operations? Is there a more logical way to do this that respects the consistent use of column name and row number order?
If I understand you correctly, as @sammywemmy mentioned, you can use .loc and .iloc to get or change the value in any row and column.
If the order of your dataframe rows may change, define an index so that you can still get each row (data point) by its label, even after reordering.
For example:
df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])
Now you can get the first row by its index:
df.loc['a'] # equivalent to df.iloc[0]
It turns out that pandas_dataframe.at[row_number, 'SomeColumnName'] can be used to modify AND retrieve information.
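
A short sketch of that symmetry, reusing the employee data from the question (note that pandas labels start at 0 by default, so R's row 2 is label 1 here). Both .at and .loc take the row first and the column second, for reads and writes alike:

```python
import pandas as pd

df = pd.DataFrame(
    {'employee': ['John Doe', 'Peter Gynn', 'Jolie Hope'],
     'salary': [21000, 23400, 26800]}
)

# Read a single cell: row label first, then column name
before = df.at[1, 'employee']

# Write a single cell with the same syntax
df.at[1, 'employee'] = 'Some other name'
after = df.at[1, 'employee']

# loc uses the same [row, column] order for reads and writes
df.loc[1, 'salary'] = 25000

print(before, after, df.loc[1, 'salary'])
```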

Is it possible to exclude index column from view in Dataframe?

I want to exclude the index column from the view of a DataFrame:
I sort the whole dataframe by value (in descending order) and assign ranks.
It works perfectly, but the index column is a bit misleading (especially next to the ranking).
I already tried to replace the index column by using the Rank column as the index:
df.set_index('Rank', inplace=True)
However, the sorting is then lost, and I may get a KeyError if 2 persons (like here) have the same Rank.
My code is:
from scipy.stats import rankdata
import pandas as pd
from tabulate import tabulate
names = ['Tim', 'Tom', 'Sam', 'Kyle']
values = [2, 4, 5, 4]
df = pd.DataFrame({'Name': names,'Values': values})
columns = ["Name", "Values"]
df['Rank'] = df['Values'].rank(method='dense', ascending=False).astype(int)
df.sort_values(by="Rank", ascending=True)
Most (possibly all?) of the pandas to_... methods take the index argument. If you set it to False the index won't be shown. If you really want the pretty HTML output in Jupyter then do
from IPython.display import HTML
HTML(df.sort_values(by="Rank", ascending=True).to_html(index=False))
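
Outside Jupyter, the same index argument works for plain-text output. A minimal sketch with to_string, using the data from the question (the variable names ranked and text are only illustrative):

```python
import pandas as pd

names = ['Tim', 'Tom', 'Sam', 'Kyle']
values = [2, 4, 5, 4]
df = pd.DataFrame({'Name': names, 'Values': values})
df['Rank'] = df['Values'].rank(method='dense', ascending=False).astype(int)

# sort_values returns a new DataFrame, so keep the result
ranked = df.sort_values(by='Rank', ascending=True)

# index=False leaves the index out of the rendered text
text = ranked.to_string(index=False)
print(text)
```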

Change one column of a DataFrame only

I'm using Pandas with Python 3. I have a dataframe with a bunch of columns, but I only want to change the data type of all the values in one of the columns and leave the others alone. The only way I could find to accomplish this is to edit the column, remove the original column and then merge the edited one back. I would like to edit the column without having to remove and merge, leaving the rest of the dataframe unaffected. Is this possible?
Here is my solution now:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
def make_float(var):
    var = float(var)
    return var
#create a new dataframe with the value types I want
df2 = df1['column'].apply(make_float)
#remove the original column
df3 = df1.drop('column', axis=1)
#merge the dataframes
df1 = pd.concat([df3,df2],axis=1)
It also doesn't work to apply the function to the dataframe directly. For example:
df1['column'].apply(make_float)
print(type(df1.iloc[1]['column']))
yields:
<class 'str'>
Use astype and assign the result back to the column:
df1['column'] = df1['column'].astype(float)
It will raise an error if the conversion fails for some row.
Apply does not work in place, but rather returns a series that you discard in this line:
df1['column'].apply(make_float)
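Apart from Yakym's solution, you can also do this, but only if the column is already numeric (for example int); on a column of strings it raises a TypeError:
df1['column'] += 0.0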
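
Putting the astype approach together as a small sketch, with a made-up df1 (the sample values and the other column are assumptions). It also shows pd.to_numeric with errors='coerce' as one option when some strings cannot be parsed:

```python
import pandas as pd

# Hypothetical data: 'column' holds numbers stored as strings
df1 = pd.DataFrame({'column': ['1.5', '2', '3.25'], 'other': ['x', 'y', 'z']})

# astype returns a new Series; assign it back to replace the column
df1['column'] = df1['column'].astype(float)

# For data that may contain unparseable strings, to_numeric with
# errors='coerce' yields NaN instead of raising
messy = pd.Series(['1.5', 'oops', '3'])
cleaned = pd.to_numeric(messy, errors='coerce')

print(df1.dtypes)
print(cleaned)
```

Only the assigned column changes type; the rest of the dataframe is left as it was.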
