How to join a dataframe and dictionary on two rows - python

I have a dictionary and a dataframe. The dictionary contains a mapping of one letter to one number and the dataframe has a row containing these specific letters and another row containing these specific numbers, adjacent to each other (not that it necessarily matters).
I want to update the row containing the numbers by matching each letter in the row of the dataframe with the letter in the dictionary and then replacing the corresponding number (number in the same column as the letter) with the value of that letter from the dictionary.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[4, 5, 6], ['a', 'b', 'c'], [7, 8, 9]]))
dict = {'a':2, 'b':3, 'c':5}
Let's say dict is the dictionary and df is the dataframe. I want the result to be df2:
df2 = pd.DataFrame(np.array([[2, 3, 5], ['a', 'b', 'c'], [7, 8, 9]]))
df
   0  1  2
0  4  5  6
1  a  b  c
2  7  8  9
dict
{'a': 2, 'b': 3, 'c': 5}
df2
   0  1  2
0  2  3  5
1  a  b  c
2  7  8  9
I do not know how to use merge or join to do this. My initial thought is to turn the dictionary into a dataframe object, but I am not sure where to go from there.

It's a little weird, but:
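import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[4, 5, 6], ['a', 'b', 'c'], [7, 8, 9]]))
d = {'a': 2, 'b': 3, 'c': 5}
# Look up each letter from row 1 in d; a letter that is not a key is passed through unchanged.
df.iloc[0] = df.iloc[1].map(lambda x: d[x] if x in d else x)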
df
# 0 1 2
# 0 2 3 5
# 1 a b c
# 2 7 8 9
I couldn't bring myself to redefine dict to be a particular dictionary. :D
After receiving a much-deserved smackdown regarding the speed of apply, I present to you the theoretically faster approach below:
df.iloc[0] = df.iloc[1].map(d).where(df.iloc[1].isin(d.keys()), df.iloc[0])
This gives you the dictionary value from d (df.iloc[1].map(d)) wherever the value in row 1 is one of the keys of d (.where(df.iloc[1].isin(d.keys()), ...)), and otherwise keeps the value already in row 0 (df.iloc[0]).
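The fallback behaviour described above can also be sketched with plain dict.get, which may be easier to read for a single row. This is only an illustration, not the answer's method: it assumes a letter that is missing from d should leave the number in row 0 untouched, and it builds the frame with dtype=object (an assumption) so that numbers and letters can share a column.

```python
import pandas as pd

# Assumption: object dtype so each column can hold both numbers and letters.
df = pd.DataFrame([[4, 5, 6], ['a', 'b', 'z'], [7, 8, 9]], dtype=object)
d = {'a': 2, 'b': 3, 'c': 5}

# Pair each letter in row 1 with the number above it in row 0.
# d.get(letter, old) returns the mapped value, or the old number
# when the letter is not a key of d ('z' in this example).
df.iloc[0] = [d.get(letter, old) for letter, old in zip(df.iloc[1], df.iloc[0])]

print(df)
```

Here the first two numbers are replaced from d, while the third stays 6 because 'z' has no entry in the dictionary.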
Hope this helps!

Related

Search Value from anywhere in Dataframe and get location of that value and update it

I am trying to search for the value 'Apple' anywhere in a DataFrame and update it to 'Green Apple'.
My approach is to find the location of that value and then update it.
My code is below:
x = df[df.isin(['Apple'])].stack()
It returns the row index and column name as I expect, but I don't know how to extract these values:
6 Fruit Name Apple
dtype: object
I want to get the value 6 (row) and Fruit Name (column).
I tried x[0] and x.value, but neither works.
Besides, if the value has spaces, like ' Apple', it does not match either.
Is there any syntax like "islike" instead of "isin"?
To find the location of the element you can use the same method, df[df.isin(['Apple'])].stack(). To replace the element in the whole dataframe you can use df.replace(), as given below:
import pandas as pd
df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
'B': [5, 6, 7, 8, 9],
'C': ['Apple', 'b', 'c', 'd', 'Apple']})
new_df = df.replace('Apple','Green Apple')
print(new_df)
A B C
0 0 5 Green Apple
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 Green Apple
Reference
pandas documentation
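The two open points in the question (pulling out the row and column labels, and matching values such as ' Apple') could be handled along these lines. This is a hedged sketch: the sample frame and its ID column are made up for illustration, and str.contains is used here as a stand-in for the "islike" the question asks about.

```python
import pandas as pd

# Hypothetical sample data; only the 'Fruit Name' column name comes from the question.
df = pd.DataFrame({'ID': [1, 2, 3],
                   'Fruit Name': ['Apple', ' Apple', 'Pear']})

# Exact matches anywhere in the frame: stack the boolean mask and keep
# the True entries; the index then holds (row label, column name) pairs.
hits = df.isin(['Apple']).stack()
locations = hits[hits].index.tolist()
print(locations)

# Substring ("like") matching on one text column, which also catches ' Apple'.
like = df['Fruit Name'].str.contains('Apple', regex=False)
df.loc[like, 'Fruit Name'] = 'Green Apple'
print(df)
```

Note that substring matching is broader than isin: a value such as 'Apple Pie' would be replaced as well. If only surrounding whitespace is the problem, comparing df['Fruit Name'].str.strip() against 'Apple' is the stricter choice.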

Pandas, Return a list of row contents for only specified columns

a b c
0 2 3 4
1 3 4 5
2 4 5 6
d = {'a': [2, 3, 4], 'b': [3, 4, 5], 'c': [4, 5, 6]}
df = pd.DataFrame(data=d)
If I have a dataframe similar to the one above, how can I get a list of the row contents for only certain columns?
For example, I want to get a list of row contents for columns a and c only, such that it looks like this:
contents = [[2,4], [3,5], [4,6]]
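One possible way to get that list, sketched under the assumption that the frame has the three rows shown in the table: select the wanted columns with a list of names, then convert the underlying array to nested Python lists.

```python
import pandas as pd

# Three-row frame matching the table in the question.
d = {'a': [2, 3, 4], 'b': [3, 4, 5], 'c': [4, 5, 6]}
df = pd.DataFrame(data=d)

# Select columns a and c, then turn each row into a plain Python list.
contents = df[['a', 'c']].values.tolist()
print(contents)  # [[2, 4], [3, 5], [4, 6]]
```

The order of the names in the selection list determines the order of the values within each inner list.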

Sort_values based on column index

I have seen lots of advice about sorting based on a pandas column name but I am trying to sort based on the column index.
I have included some code to demonstrate what I am trying to do.
import pandas as pd
df = pd.DataFrame({
'col1' : ['A', 'A', 'B', 'D', 'C', 'D'],
'col2' : [2, 1, 9, 8, 7, 4],
'col3': [0, 1, 9, 4, 2, 3],
})
df2 = df.sort_values(by=['col2'])
I want to sort a number of dataframes that all have different names for the second column. It is not practical to sort with by=['col2'], but I always want to sort on the second column (i.e. column index 1). Is this possible?
Select the column name by position and pass it to the by parameter:
print (df.columns[1])
col2
df2 = df.sort_values(by=df.columns[1])
print (df2)
col1 col2 col3
1 A 1 1
0 A 2 0
5 D 4 3
4 C 7 2
3 D 8 4
2 B 9 9
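Since the question mentions several dataframes with differently named second columns, the same idea can be wrapped in a small function. This is a sketch only: sort_by_position and the two sample frames are hypothetical names made up for illustration, and it assumes the column names within each frame are unique.

```python
import pandas as pd

def sort_by_position(frame, position=1):
    """Return frame sorted by the column at a zero-based position (hypothetical helper)."""
    # frame.columns[position] looks up the column name, whatever it is called.
    return frame.sort_values(by=frame.columns[position])

# Two frames whose second columns have different names.
sales = pd.DataFrame({'item': ['x', 'y', 'z'], 'units': [3, 1, 2]})
stock = pd.DataFrame({'item': ['x', 'y', 'z'], 'on_hand': [9, 7, 8]})

sorted_frames = [sort_by_position(f) for f in (sales, stock)]
for f in sorted_frames:
    print(f)
```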

MultiIndex Pandas does not group first index level

I am trying to create a Pandas Dataframe with two levels of index in the rows.
info = pd.DataFrame([['A', 1, 3],
                     ['A', 2, 4],
                     ['A', 3, 6],
                     ['B', 1, 9],
                     ['B', 2, 10],
                     ['B', 4, 6]], columns=pd.Index(['C', 'D', 'V']))
info_new = info.set_index(['C', 'D'], drop=False)
EDIT: I want the following output:
     V
C D
A 1  3
  2  4
  3  6
B 1  9
  2 10
  4  6
According to every instruction I found, this should work.
I am still getting
     V
C D
A 1  3
A 2  4
A 3  6
B 1  9
B 2 10
B 4  6
So apparently, the MultiIndex does not work here.
I checked each column that has non-unique values with .is_unique, and the answer is False.
I checked the columns with unique values, and the answer is True.
I also tried to assign dtype=str, but this didn't change anything.
Thank you for the info_new.index.is_lexsorted() comment.
I solved it by specifying dtype=str in the .csv import and then:
info_new.sortlevel(inplace=True)
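In recent pandas releases DataFrame.sortlevel is no longer available, and sort_index is the usual replacement for sorting by index levels. A minimal sketch, assuming the same C, D and V columns as above but with made-up, deliberately unsorted rows; it also uses the default drop=True so that only V remains as a data column:

```python
import pandas as pd

# Rows are intentionally out of order to show the effect of sorting.
info = pd.DataFrame([['B', 2, 10],
                     ['A', 3, 6],
                     ['A', 1, 3],
                     ['B', 1, 9]], columns=['C', 'D', 'V'])

# Build the two-level row index, then sort by both levels.
info_new = info.set_index(['C', 'D']).sort_index()
print(info_new)
```

With a sorted MultiIndex, the printed frame shows each first-level label once per group rather than on every row.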

DataFrame from dictionary

Sorry if this is a duplicate, but I couldn't find a solution on the internet.
I have a dictionary:
{'a':1, 'b':2, 'c':3}
Now I want to construct a pandas DataFrame with the column names corresponding to the keys and the values corresponding to the values. It should be a DataFrame with only one row:
a b c
1 2 3
In another topic I found only solutions where both the keys and the values are columns in the new DataFrame.
There are some caveats here. If you just pass the dict to the DataFrame constructor, it will raise an error:
ValueError: If using all scalar values, you must pass an index
To get around that, you can pass an index, which will work:
In [139]:
temp = {'a':1,'b':2,'c':3}
pd.DataFrame(temp, index=[0])
Out[139]:
a b c
0 1 2 3
Ideally your values should be iterable, such as a list or an array:
In [141]:
temp = {'a':[1],'b':[2],'c':[3]}
pd.DataFrame(temp)
Out[141]:
a b c
0 1 2 3
Thanks to @joris for pointing out that if you wrap the dict in a list then you don't have to pass an index to the constructor:
In [142]:
temp = {'a':1,'b':2,'c':3}
pd.DataFrame([temp])
Out[142]:
a b c
0 1 2 3
For flexibility, you can also use pd.DataFrame.from_dict with orient='index'. This works whether your dictionary values are scalars or lists.
Note the final transpose step, which can be performed via df.T or df.transpose().
temp1 = {'a': 1, 'b': 2, 'c': 3}
temp2 = {'a': [1, 2], 'b':[2, 3], 'c':[3, 4]}
print(pd.DataFrame.from_dict(temp1, orient='index').T)
a b c
0 1 2 3
print(pd.DataFrame.from_dict(temp2, orient='index').T)
a b c
0 1 2 3
1 2 3 4
