Filling an empty dataframe by assignment via loc selection with tuple indices

Why does this work?
import pandas as pd
a = pd.DataFrame()
a.loc[1, 2] = 0
a
     2
1  0.0
But this does not?
a = pd.DataFrame()
a.loc[(1, 2), 2] = 0
KeyError: '[1 2] not in index'
The latter is what I would like to do: starting from a dataframe with no values (0 rows, 0 columns), I will fill in values by assignment via loc selection, using a tuple as the row index.

Using a tuple as the row key will work if your dataframe already has a multi-index. Without one, loc reads the tuple (1, 2) as a list of two separate row labels, which is why the error mentions '[1 2]'.
import pandas as pd
# Define an empty two-level multi-index
index = pd.MultiIndex.from_product([[], []], names=['first', 'second'])
# or
# index = pd.MultiIndex.from_tuples([], names=['first', 'second'])
a = pd.DataFrame(index=index)
a.loc[(1, 2), 2] = 0
#                 2
# first second
# 1.0   2.0     0.0
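If enlargement on an empty frame turns out to be fragile across pandas versions, one possible alternative is to collect the values first and build the frame once at the end. The sketch below is not part of the answer above; the dict name `records`, the level name `column`, and the sample keys are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: keys are (first, second, column), values are cell values.
records = {}
records[(1, 2, 2)] = 0.0
records[(1, 3, 2)] = 1.0
records[(4, 5, 7)] = 2.0

# A Series built from tuple keys gets a three-level MultiIndex.
series = pd.Series(records)
series.index.names = ['first', 'second', 'column']

# unstack moves the 'column' level into the columns, leaving a
# two-level row index; cells that were never set become NaN.
a = series.unstack('column')

print(a)
print(a.loc[(1, 2), 2])
```

The trade-off is that the frame only exists after all values have been collected, so this suits batch filling rather than incremental updates to a live dataframe.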

I like Julien's answer as it feels less like magic. All of the following are ways to set up a two-level MultiIndex on an empty dataframe.
set_index with empty arrays
import numpy as np
import pandas as pd
i = np.array([])
a = pd.DataFrame().set_index([i, i])
a.loc[(1, 2), 2] = 0
a
           2
1.0 2.0  0.0
Slightly more concise
a = pd.DataFrame().set_index([np.array([])] * 2)
a.loc[(1, 2), 2] = 0
pd.concat
a = pd.concat([pd.DataFrame()] * 2, keys=[1, 2])
a.loc[(1, 2), 2] = 0
a
       2
1 2  0.0

Related

Replace values by result of a function

I have the following dataframe:
df = pd.DataFrame({'A': [0, 1, 0],
                   'B': [1, 1, 1]},
                  index=['2020-01-01', '2020-02-01', '2020-03-01'])
I want every value of 1 to be replaced by an increasing number. I'm looking for something like:
df.replace(1, value=3)
which works great, but instead of the constant 3 I need an increasing number (as I want to use it as an ID):
number += 1
If I join those together, it doesn't work (or at least I'm not able to find the correct syntax). I'd like to obtain the following result:
df = pd.DataFrame({'A': [0, 2, 0],
                   'B': [1, 3, 4]},
                  index=['2020-01-01', '2020-02-01', '2020-03-01'])
Note: I cannot use any command that relies on specifying column or row names, because the table has 2600 columns and 5000 rows.

Element-wise assignment on a copy of df.values can work.
More specifically, a range from 1 to the number of 1's (inclusive) is assigned to the positions of the 1 elements in the value array. The modified array is then written back into the original dataframe.
Code
(Data as given)
1. Row-first ordering (what the OP wants)
arr = df.values.copy()  # copy, so df is untouched until the write-back
mask = (arr == 1)  # positions holding a 1
arr[mask] = range(1, mask.sum() + 1)  # mask assignment fills row by row
for i, col in enumerate(df.columns):
    df[col] = arr[:, i]
# Result
print(df)
            A  B
2020-01-01  0  1
2020-02-01  2  3
2020-03-01  0  4
2. Column-first ordering (another possibility, again starting from the data as given)
arr_tr = df.values.transpose().copy()  # one row per original column
mask_tr = (arr_tr == 1)
arr_tr[mask_tr] = range(1, mask_tr.sum() + 1)
for i, col in enumerate(df.columns):
    df[col] = arr_tr[i, :]
# Result
print(df)
            A  B
2020-01-01  0  2
2020-02-01  1  3
2020-03-01  0  4
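The row-first variant can also be written without the column loop by rebuilding the dataframe from the modified array. This is a minimal sketch under the assumption that all columns share one numeric dtype; the names `values` and `result` are illustrative and not taken from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 0],
                   'B': [1, 1, 1]},
                  index=['2020-01-01', '2020-02-01', '2020-03-01'])

values = df.to_numpy(copy=True)   # independent copy of the data
mask = values == 1                # True where a 1 should become an ID

# Boolean-mask assignment walks the array row by row (C order),
# so the IDs increase left to right, then top to bottom.
values[mask] = np.arange(1, mask.sum() + 1)

# Rebuild a dataframe with the original labels; df itself is unchanged.
result = pd.DataFrame(values, index=df.index, columns=df.columns)
print(result)
```

Because no column or row names are spelled out, the same code should apply unchanged to a table with thousands of columns.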

Map multiple items to a value in a pandas dataframe

I have this dataframe:
df = pd.DataFrame({"a": [1, 2, 3, 100], "b": [4, 5, 6, 50]})
I want to replace the values 4, 5 and 6 in column b with 10. I can use the following code:
vals_to_replace1 = {4: 10, 5: 10, 6: 10}
df['b'] = df['b'].map(vals_to_replace1)
But I have a long list of items that I need to replace with a single value. I tried this:
vals_to_replace = {[4, 5, 6]: 10}
But it does not work. Is there a simple way to do this mapping?

Use Series.replace.
my_list = [4, 5, 6]
val = 10
df['b'] = df['b'].replace(my_list, val)
Or create a dict:
df['b'] = df['b'].replace(dict(zip(my_list, [val] * len(my_list))))
# Or Series.map + fillna
# df['b'] = (df['b'].map(dict(zip(my_list, [val] * len(my_list))))
#                   .fillna(df['b']))
We could also use Series.isin.
m = df['b'].isin(my_list)
Then you can use DataFrame.loc
df.loc[m, 'b'] = val
or Series.mask
df['b'] = df['b'].mask(m, val)
# df['b'] = df['b'].where(~m, val)
Output df (the float values come from the map + fillna variant; replace keeps the integer dtype)
     a     b
0    1  10.0
1    2  10.0
2    3  10.0
3  100  50.0
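For reference, here is a self-contained run of two of the variants above on the question's data. The names `replaced` and `masked` are only illustrative, and the loc variant works on a copy so that the original df stays available for comparison.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 100], "b": [4, 5, 6, 50]})
my_list = [4, 5, 6]
val = 10

# Variant 1: Series.replace with a list of values and one replacement.
replaced = df['b'].replace(my_list, val)

# Variant 2: boolean mask from isin, then assignment through loc on a copy.
masked = df.copy()
masked.loc[masked['b'].isin(my_list), 'b'] = val

print(replaced.tolist())
print(masked['b'].tolist())
```

Unlike Series.map with a partial dict, neither variant turns unmatched values (here 50) into NaN, so no fillna step is needed.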

Loop through dataframe (cols and rows) and replace data

I have:
df = pd.DataFrame([[1, 2, 3], [2, 4, 6], [3, 6, 9]], columns=['A', 'B', 'C'])
and I need to calculate the difference between the values at i+1 and i for each row and column, and store it again in the same column. The output needed would be:
Out[2]:
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
I have tried the following, but I end up with a single list with all values appended, and I need to have them stored separately (in lists, or in the same dataframe).
Is there a way to do it?
difs = []
for column in df:
    for i in range(len(df) - 1):
        a = df[column]
        b = a[i+1] - a[i]
        difs.append(b)
for x in difs:
    for column in df:
        df[column] = x

You can use the pandas function shift to achieve this. This is what it does, according to the docs:
Shift index by desired number of periods with an optional time freq.
for col in df:
    # subtract the previous row; the first row has no predecessor, so subtract 0
    df[col] = df[col] - df[col].shift(1).fillna(0)
df
Out[1]:
     A    B    C
0  1.0  2.0  3.0
1  1.0  2.0  3.0
2  1.0  2.0  3.0
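The per-column loop is not strictly needed, since shift also works on a whole dataframe. The sketch below shows that form next to DataFrame.diff, which is a different method from the one used in the answer and is included here only as a possible alternative; the names `via_shift` and `via_diff` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6], [3, 6, 9]], columns=['A', 'B', 'C'])

# Whole-frame shift: subtract the previous row, treating the
# missing row before the first one as 0.
via_shift = df - df.shift(1).fillna(0)

# DataFrame.diff leaves NaN in the first row; fill those cells
# back in from the original values.
via_diff = df.diff().fillna(df)

print(via_shift)
print(via_diff)
```

Both results are float because the intermediate frames contain NaN; appending `.astype(int)` would restore integers if that matters.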
Added
In case you want to use a loop, a good approach is probably iterrows, as it yields (index, Series) pairs. Starting again from the original df:
difs = []
for i, row in df.iterrows():
    if i == 0:
        x = row.values.tolist()  ## so we preserve the first row
    else:
        x = (row.values - df.loc[i-1, df.columns]).values.tolist()
    difs.append(x)
difs
Out[1]:
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
## Create new / replace old dataframe
cols = [col for col in df.columns]
new_df = pd.DataFrame(difs, columns=cols)
new_df
Out[2]:
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3

Nested loops in pandas python

I have two DataFrames, one with many rows and another with only a few, and I need to merge them according to some conditions (on strings). I used nested loops in pandas like this:
density = []
for row in df.itertuples():
    for row1 in df2.itertuples():
        if (row['a'].find(row1['b'])) > 0:
            density.append(row1['c'])
But I receive the error message:
TypeError: tuple indices must be integers, not str
What's wrong?

Consider df and df2
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(
    a=['abcd', 'stk', 'shij', 'dfffedeffj', 'abcdefghijk'],
))
df2 = pd.DataFrame(dict(
    b=['abc', 'hij', 'def'],
    c=[1, 2, 3]
))
You can get decent-ish speed with scalar access via .at (the older get_value and set_value methods were removed in pandas 1.0). And I'd store the values in a dataframe
density = pd.DataFrame(index=df.index, columns=df2.index)
for i in df.index:
    for j in df2.index:
        a = df.at[i, 'a']
        b = df2.at[j, 'b']
        if a.find(b) >= 0:  # find returns -1 when b is not a substring of a
            density.at[i, j] = df2.at[j, 'c']
print(density)
     0    1    2
0    1  NaN  NaN
1  NaN  NaN  NaN
2  NaN    2  NaN
3  NaN  NaN    3
4    1    2    3
You can also combine pandas string methods with numpy
# t[j, i] is True when df2.b[j] occurs in df.a[i]
t = df2.b.apply(lambda x: df.a.str.contains(x)).values
c = df2.c.values[:, None]
density = pd.DataFrame(
    np.where(t, np.hstack([c] * t.shape[1]), np.nan).T,
    df.index, df2.index)
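The same match matrix can be sketched with plain substring tests and numpy broadcasting, which avoids both the scalar accessors and the hstack step. This is an illustrative variant under the assumption that plain (non-regex) substring matching is what is wanted; the name `contains` is not from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    a=['abcd', 'stk', 'shij', 'dfffedeffj', 'abcdefghijk'],
))
df2 = pd.DataFrame(dict(
    b=['abc', 'hij', 'def'],
    c=[1, 2, 3]
))

# contains[i, j] is True when df2.b[j] is a substring of df.a[i].
contains = np.array([[b in a for b in df2['b']] for a in df['a']])

# The 1-D array of c values broadcasts across the rows of the mask,
# so each matching cell receives the c value of its column.
density = pd.DataFrame(
    np.where(contains, df2['c'].to_numpy(), np.nan),
    index=df.index, columns=df2.index)

print(density)
```

The Python-level list comprehension still visits every pair, so this is mainly a readability gain over the nested loop rather than a speed-up.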

The method DataFrame.itertuples returns namedtuples, and to access the values in a namedtuple you have to use dot notation.
density = []
for row in df.itertuples():
    for row1 in df2.itertuples():
        # find returns -1 when absent and 0 for a match at the start
        if row.a.find(row1.b) >= 0:
            density.append(row1.c)
Nevertheless, this does not produce a merge of the two DataFrames.
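If an actual merged table is the goal, one possible approach is a cross join followed by a filter on the substring condition. This sketch assumes pandas 1.2 or later, where `merge` accepts `how="cross"`; the names `pairs` and `matches` are illustrative, and the intermediate frame has len(df) * len(df2) rows, so it only suits cases where that product fits in memory.

```python
import pandas as pd

df = pd.DataFrame(dict(
    a=['abcd', 'stk', 'shij', 'dfffedeffj', 'abcdefghijk'],
))
df2 = pd.DataFrame(dict(
    b=['abc', 'hij', 'def'],
    c=[1, 2, 3]
))

# Every row of df paired with every row of df2.
pairs = df.merge(df2, how="cross")

# Keep only the pairs where b occurs somewhere inside a.
keep = [b in a for a, b in zip(pairs['a'], pairs['b'])]
matches = pairs[keep].reset_index(drop=True)

print(matches)
```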

getting pandas Setting With Enlargement right

Since version 0.13, it is possible to append to a dataframe by referring to indices in .loc or .ix which are not yet in the dataframe. See the documentation.
Then I am confused why this line fails:
all_treatments.loc[originalN:newN,:] = all_treatments.loc[0:newrowcount,:]
This generates a ValueError:
ValueError: could not broadcast input array from shape (12) into shape (0)
Here all_treatments.shape = (53, 12), originalN = 53, newN = 64, all_treatments.loc[originalN:newN,:].shape = (0,12), all_treatments.loc[0:newrowcount,:].shape = (12,12).
What is the proper way to set with enlargement here?

You can only set with enlargement for a single row or column. You are setting with a range.
The .loc and [] operations can perform enlargement when setting a non-existent key for that axis (.ix could as well, but it was removed in pandas 1.0).
For your use, something like this should work to expand a dataframe with new blank rows:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df
   a  b
0  1  4
1  2  5
2  3  6
new_row_count = 2
# Add one new label at a time; each assignment enlarges the frame by one row
for new_row in range(len(df), len(df) + new_row_count):
    df.loc[new_row] = None
>>> df
     a    b
0    1    4
1    2    5
2    3    6
3  NaN  NaN
4  NaN  NaN
If you wanted to copy data from the original dataframe, I would normally just concatenate.
df = pd.concat([df, df.iloc[:2, :]], ignore_index=True)
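Both outcomes can also be reached without a loop. The sketch below uses reindex to add blank rows, which is a different technique from the enlargement loop above, and repeats the concat approach on fresh data; the names `blank` and `copied` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
new_row_count = 2

# Blank rows: reindexing onto a longer RangeIndex adds the new
# labels in one step and fills their cells with NaN.
blank = df.reindex(pd.RangeIndex(len(df) + new_row_count))

# Copied rows: concatenate the rows to duplicate and renumber the index.
copied = pd.concat([df, df.iloc[:new_row_count]], ignore_index=True)

print(blank)
print(copied)
```

Note that the blank-row result is float because NaN cannot be stored in an integer column, while the copied result keeps the integer dtype.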
