Replace values in pandas DataFrame if in list - python

How can I replace values in the DataFrame data with the corresponding entries in fillist whenever a value appears in varlist?
import pandas as pd

data = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 10]})
varlist = (5, 7, 9, 10)
fillist = ('a', 'b', 'c', 'd')
data[data.isin(varlist)] = 'is in varlist!'
Returns data as:
                A               B
0  is in varlist!               1
1               6               2
2               3               3
3               4  is in varlist!
But I want:
   A  B
0  a  1
1  6  2
2  3  3
3  4  d

Use the replace method of the dataframe.
replace_map = dict(zip(varlist, fillist))
data.replace(replace_map)
This gives:
   A  B
0  a  1
1  6  2
2  3  3
3  4  d
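Note that replace returns a new DataFrame by default, so assign the result back if you want to keep it:
data = data.replace(replace_map)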
See the replace method documentation if you want to use it in a different way.

Join tables and create combinations in python

Apologies in advance: the title is a bit fuzzy.
I have two tables. One contains unique names, for example 'A', 'B', 'C', and the other contains a time series of months, for example 10/2021, 11/2021, 12/2021. I want to join the tables so that I get every timestamp for each name. The final data should look like this:
Month    Name
10/2021  A
11/2021  A
12/2021  A
10/2021  B
11/2021  B
12/2021  B
10/2021  C
11/2021  C
12/2021  C
From cartesian product in pandas:
df1 = pd.DataFrame([1, 2, 3], columns=['A'])
df2 = pd.DataFrame(["a", "b", "c"], columns=['B'])
df = (df1.assign(key=1)
      .merge(df2.assign(key=1), on="key")
      .drop("key", axis=1)
)
A B
0 1 a
1 1 b
2 1 c
3 2 a
4 2 b
5 2 c
6 3 a
7 3 b
8 3 c
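If your pandas is 1.2 or newer, merge supports a cross join directly and the dummy key can be dropped; a minimal sketch:
df = df1.merge(df2, how="cross")
This produces the same nine rows as the assign/merge/drop chain above.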
If you are only trying to get the cartesian product of the values, you can do it with itertools.product:
import pandas as pd
from itertools import product
df1 = pd.DataFrame(list('abcd'), columns=['letters'])
df2 = pd.DataFrame(list('1234'), columns=['numbers'])
df_combined = pd.DataFrame(product(df1['letters'], df2['numbers']), columns=['letters', 'numbers'])
Output:
letters numbers
0 a 1
1 a 2
2 a 3
3 a 4
4 b 1
5 b 2
6 b 3
7 b 4
8 c 1
9 c 2
10 c 3
11 c 4
12 d 1
13 d 2
14 d 3
15 d 4
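As another sketch, assuming you do not mind going through a MultiIndex, pd.MultiIndex.from_product builds the same combinations and to_frame turns them back into columns:
df_combined = (pd.MultiIndex
               .from_product([df1['letters'], df2['numbers']])
               .to_frame(index=False, name=['letters', 'numbers']))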

drop rows using pandas groupby and filter

I'm trying to drop rows from a df where certain conditions are met. In the code below, I'm grouping values using column C. For each unique group, I want to drop ALL rows where A is less than 1 AND B is greater than 100, with both conditions met on the same row. If I use .any() or .all(), it doesn't return what I want.
df = pd.DataFrame({
    'A': [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
    'B': [101, 2, 3, 1, 5, 101, 2, 3, 4, 5],
    'C': ['d', 'd', 'd', 'd', 'e', 'e', 'e', 'f', 'f', 'f'],
})
df.groupby(['C']).filter(lambda g: g['A'].lt(1) & g['B'].gt(100))
initial df:
A B C
0 1 101 d # A is not lt 1 so keep all d's
1 0 2 d
2 1 3 d
3 0 1 d
4 1 5 e
5 0 101 e # A is lt 1 and B is gt 100 so drop all e's
6 0 2 e
7 1 3 f
8 0 4 f
9 1 5 f
intended out:
A B C
0 1 101 d
1 0 2 d
2 1 3 d
3 0 1 d
7 1 3 f
8 0 4 f
9 1 5 f
For better performance, get all C values that match the condition, then filter the original column C with Series.isin in boolean indexing with an inverted mask:
df1 = df[~df['C'].isin(df.loc[df['A'].lt(1) & df['B'].gt(100), 'C'])]
Another idea is to use GroupBy.transform with GroupBy.any to test whether at least one value in the group matches:
df1 = df[~(df['A'].lt(1) & df['B'].gt(100)).groupby(df['C']).transform('any')]
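To see why this works, transform('any') computes the group-level answer and broadcasts it back to every row of the group, so with the sample data the mask is True exactly for the 'e' rows:
mask = (df['A'].lt(1) & df['B'].gt(100)).groupby(df['C']).transform('any')
print(mask)
# 0    False
# 1    False
# 2    False
# 3    False
# 4     True
# 5     True
# 6     True
# 7    False
# 8    False
# 9    False
# dtype: bool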
Your solution is possible if you reduce the per-group mask with any and negate it, since filter expects a single scalar per group; on a large DataFrame it will be slow:
df1 = df.groupby(['C']).filter(lambda g: not (g['A'].lt(1) & g['B'].gt(100)).any())
df1 = df.groupby(['C']).filter(lambda g: (g['A'].ge(1) | g['B'].le(100)).all())
print (df1)
A B C
0 1 101 d
1 0 2 d
2 1 3 d
3 0 1 d
7 1 3 f
8 0 4 f
9 1 5 f

Pandas row value string parsing (mixed string and float) [duplicate]

I have data like this
ID INFO
1 A=2;B=2;C=5
2 A=3;B=4;C=1
3 A=1;B=3;C=2
I want to split the INFO column into:
ID A B C
1 2 2 5
2 3 4 1
3 1 3 2
I can split columns with one delimiter by using
df['A'], df['B'], df['C'] = df['INFO'].str.split(';').str
and then split again on '=', but this seems inefficient when there are many rows, especially when the fields cannot be hard-coded beforehand.
Any suggestion would be greatly welcome.
You could use named groups together with Series.str.extract, then concat the 'ID' back at the end. This assumes you always have A=, B=, and C= in a line.
pd.concat([df['ID'],
           df['INFO'].str.extract(r'A=(?P<A>\d+);B=(?P<B>\d+);C=(?P<C>\d+)')], axis=1)
# ID A B C
#0 1 2 2 5
#1 2 3 4 1
#2 3 1 3 2
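Note that str.extract returns object (string) columns; if you need integers, finish with an astype conversion on the result (out here is just a name for the concatenated frame above):
out = pd.concat([df['ID'],
                 df['INFO'].str.extract(r'A=(?P<A>\d+);B=(?P<B>\d+);C=(?P<C>\d+)')], axis=1)
out[['A', 'B', 'C']] = out[['A', 'B', 'C']].astype(int)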
If you want a more flexible solution that can deal with cases where a single line might be 'A=1;C=2', we can split on ';' and partition on '='. partition yields three columns per element (the key, the '=' separator, and the value), so the final pivot takes its column names from the keys (column 0) and its cell values from column 2.
### Starting Data
#ID INFO
#1 A=2;B=2;C=5
#2 A=3;B=4;C=1
#3 A=1;B=3;C=2
#4 A=1;C=2
(df.set_index('ID')['INFO']
   .str.split(';', expand=True)
   .stack()
   .str.partition('=')
   .reset_index(-1, drop=True)
   .pivot(columns=0, values=2)
)
# A B C
#ID
#1 2 2 5
#2 3 4 1
#3 1 3 2
#4 1 NaN 2
Browsing a Series is much faster than iterating across the rows of a dataframe.
So I would do:
pd.DataFrame([dict([x.split('=') for x in t.split(';')]) for t in df['INFO']], index=df['ID']).reset_index()
It gives as expected:
ID A B C
0 1 2 2 5
1 2 3 4 1
2 3 1 3 2
It should be faster than splitting the dataframe columns twice.
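A nice property of the list-of-dicts approach: a row missing a field simply gets NaN for that column, because the DataFrame constructor takes the union of the dict keys. For example:
pd.DataFrame([{'A': '1', 'C': '2'}, {'A': '3', 'B': '4'}])
#    A    C    B
# 0  1    2  NaN
# 1  3  NaN    4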
values = [dict(item.split("=") for item in value.split(";")) for value in df.INFO]
df[['a', 'b', 'c']] = pd.DataFrame(values)
This will give you the desired output:
ID INFO a b c
1 a=1;b=2;c=3 1 2 3
2 a=4;b=5;c=6 4 5 6
3 a=7;b=8;c=9 7 8 9
Explanation:
The first line converts every value to a dictionary.
e.g.
x = 'a=1;b=2;c=3'
dict(item.split("=") for item in x.split(";"))
results in :
{'a': '1', 'b': '2', 'c': '3'}
DataFrame can take a list of dicts as an input and turn it into a dataframe.
Then you only need to assign the dataframe to the columns you want:
df[['a', 'b', 'c']] = pd.DataFrame(values)
Another solution is Series.str.findall to extract the values, then apply(pd.Series):
df[["A", "B", "C"]] = df.INFO.str.findall(r'=(\d+)').apply(pd.Series)
df = df.drop(columns="INFO")
Details:
df = pd.DataFrame([[1, "A=2;B=2;C=5"],
[2, "A=3;B=4;C=1"],
[3, "A=1;B=3;C=2"]],
columns=["ID", "INFO"])
print(df.INFO.str.findall(r'=(\d+)'))
# 0 [2, 2, 5]
# 1 [3, 4, 1]
# 2 [1, 3, 2]
df[["A", "B", "C"]] = df.INFO.str.findall(r'=(\d+)').apply(pd.Series)
print(df)
# ID INFO A B C
# 0 1 A=2;B=2;C=5 2 2 5
# 1 2 A=3;B=4;C=1 3 4 1
# 2 3 A=1;B=3;C=2 1 3 2
# Remove INFO column
df = df.drop(columns="INFO")
print(df)
# ID A B C
# 0 1 2 2 5
# 1 2 3 4 1
# 2 3 1 3 2
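If the set of fields can vary from row to row, a sketch using str.extractall may help; it captures the field name alongside the value, so the columns are keyed by the actual field (this assumes digit values, as in the sample):
tmp = df.INFO.str.extractall(r'(?P<key>\w+)=(?P<val>\d+)')
wide = tmp.reset_index(level='match', drop=True).pivot(columns='key', values='val')
df = df.join(wide).drop(columns='INFO')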
Another solution: split on ';', explode, then split on '=' and pivot:
df_INFO = (df.INFO
           .str.split(';')
           .explode()
           .str.split('=', expand=True)
           .pivot(columns=0, values=1)
)
pd.concat([df.ID,df_INFO],axis=1)
ID A B C
0 1 2 2 5
1 2 3 4 1
2 3 1 3 2

how to set the index as characters in pandas

I am trying to create a pandas df like this post.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3,3), columns=list('123'))
df
This shows a frame with the default integer index (0, 1, 2), and describe() shows the same index. Is there a way to set the name of each row (i.e. the index) in df to 'A', 'B', 'C' instead of 0, 1, 2?
Use df.index:
df.index=['A', 'B', 'C']
print(df)
1 2 3
A 0 1 2
B 3 4 5
C 6 7 8
A more scalable and general solution would be to use a list comprehension:
df.index = [chr(ord('a') + x).upper() for x in df.index]
print(df)
1 2 3
A 0 1 2
B 3 4 5
C 6 7 8
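For longer frames, string.ascii_uppercase gives the same labels without the chr/ord arithmetic; a small sketch:
import string
df.index = list(string.ascii_uppercase[:len(df)])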
Add index parameter in DataFrame constructor:
df = pd.DataFrame(np.arange(9).reshape(3,3),
                  index=list('ABC'),
                  columns=list('123'))
print (df)
1 2 3
A 0 1 2
B 3 4 5
C 6 7 8
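If you prefer not to overwrite the whole index, or only want to relabel some rows, df.rename with a mapping also works:
df = df.rename(index={0: 'A', 1: 'B', 2: 'C'})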

How to add multiple columns to a dataframe

I have a function which returns a list of lists, and I'd like to add multiple columns to my dataframe based on the return value. Here is what the return value of my function looks like:
[[1,2,3],[3,4,3],[1,6,7],[4,7,6]]
I would like to add three columns to my dataframe. I have the following code
col_names = ['A','B','C']
df[col_names] = func()
but it gives me an error. How can I add 3 new columns?
You can pass the list directly:
pd.DataFrame([[1,2,3],[3,4,3],[1,6,7],[4,7,6]],columns=['A','B','C'])
A B C
0 1 2 3
1 3 4 3
2 1 6 7
3 4 7 6
Or, if you have defined the list as l = [[1,2,3],[3,4,3],[1,6,7],[4,7,6]], pass it to the DataFrame constructor:
df = pd.DataFrame(l, columns=['A','B','C'])
Here is one way to do it with concat; note the function returns four rows while df has only three, so the last row gets NaN in foo:
df = pd.DataFrame({'foo': ['bar', 'buzz', 'fizz']})

def buzz():
    return [[1,2,3],[3,4,3],[1,6,7],[4,7,6]]

col_names = ['A','B','C']
pd.concat([df, pd.DataFrame.from_records(buzz(), columns=col_names)], axis=1)
foo A B C
0 bar 1 2 3
1 buzz 3 4 3
2 fizz 1 6 7
3 NaN 4 7 6
Here is an example with a dummy function that returns a list of lists. Since the function returns a plain list, you need to pass it to the pd.DataFrame constructor before assigning it to the existing dataframe.
def fn(l1, l2, l3):
    return [l1, l2, l3]

df = pd.DataFrame({'col': ['a', 'b', 'c']})
col_names = ['A','B','C']
df[col_names] = pd.DataFrame(fn([1,2,3], [3,4,3], [4,7,6]))
You get
col A B C
0 a 1 2 3
1 b 3 4 3
2 c 4 7 6
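One caveat worth knowing: assigning a DataFrame to df[col_names] aligns on the index, so if df does not have a default 0..n-1 range index the values can land as NaN. Passing index=df.index (a small defensive tweak, not required for the example above) avoids the alignment surprise:
df[col_names] = pd.DataFrame(fn([1,2,3], [3,4,3], [4,7,6]), index=df.index)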
