Let's say I have the following dataframe:
Value
[None, A, B, C]
[None]
I would like to replace the None values in the column with the string none, but I couldn't figure out how.
I tried this, but it doesn't work:
df['Value'] = df['Value'].str.replace('None','none')
None is a built-in constant in Python, not a string, so str.replace won't match it; to get a lowercase none you have to replace it with the string 'none'.
There is no built-in way in Pandas to replace values in lists, but you can use explode to expand all the lists so that each individual item of each list gets its own row in the column, then replace, then group back together into the original list format:
df['Value'] = df['Value'].explode().replace({None: 'none'}).groupby(level=0).apply(list)
Output:
>>> df
Value
0 [none, A, B, C]
1 [none]
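A self-contained sketch of the explode/replace/regroup round-trip, using the column name from the question:

```python
import pandas as pd

# Rebuild the question's frame: each cell of "Value" holds a list
df = pd.DataFrame({"Value": [[None, "A", "B", "C"], [None]]})

# explode -> one row per list item, replace None, regroup by the original index
df["Value"] = df["Value"].explode().replace({None: "none"}).groupby(level=0).apply(list)
print(df["Value"].tolist())  # [['none', 'A', 'B', 'C'], ['none']]
```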
Here is a way using map():
df['Value'] = df['Value'].map(lambda x: ['none' if i == None else i for i in x])
Output:
Value
0 [none, A, B, C]
1 [none]
Related
I have a dataframe that has a column where each row has a list.
I want to get the next element after the value I am looking for (in another column).
For example:
Let's say I am looking for 'b':
|lists |next_element|
|---------|------------|
|[a,b,c,d]| c | #(c is the next value after b)
|[c,b,a,e]| a | #(a is the next value after b)
|[a,e,f,b]| [] | #(empty, because there is no next value after b)
*All lists contain the element; there are no lists without the value I am looking for.
Thank you
Try writing a function and using apply:
value = 'b'
def get_next(x):
    last_idx = len(x) - 1
    for curr_idx, i in enumerate(x):  # enumerate avoids index() finding an earlier duplicate
        if value.lower() == i.lower():
            if curr_idx == last_idx:
                return []
            return x[curr_idx + 1]
df["next_element"] = df["lists"].apply(get_next)
df
Out[649]:
lists next_element
0 [a, b, c, d] c
1 [c, b, a, e] a
2 [a, e, f, b] []
First observation: since you want the next element from a list of string elements, the expected data type for that column should be a string, not a list.
So, instead of the next_element column being [c, a, []], it is better to use [c, a, None].
Secondly, you should avoid apply directly over a Series and instead use the str methods that pandas provides for Series, which are a vectorized way of solving such problems quickly.
With the above in mind, let's try this completely vectorized one-liner -
element = 'b'
df['next_element'] = df.lists.str.join('').str.split(element).str[-1].str[0]
lists next_element
0 [a, b, c, d] c
1 [c, b, a, e] a
2 [a, e, f, b] NaN
First I combine each row into a single string: [a,b,c,d] -> 'abcd'
Next I split this by 'b' to get substrings
I pick the last element from this list and finally the first element from that, for each row, using str functions which are vectorized over each row.
Read more about pandas.Series.str methods in the official documentation/tutorial here
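For reference, here is the str-chain run end to end; note it assumes single-character list items, since the lists are joined into one string before splitting:

```python
import pandas as pd

df = pd.DataFrame({"lists": [list("abcd"), list("cbae"), list("aefb")]})
element = "b"

# join -> 'abcd'; split on 'b' -> ['a', 'cd']; last piece -> 'cd'; first char -> 'c'
df["next_element"] = df.lists.str.join("").str.split(element).str[-1].str[0]
print(df["next_element"].tolist())  # ['c', 'a', nan]
```

A row where 'b' is the last element yields an empty last piece, so `.str[0]` produces NaN there.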
df = df.assign(next_element="")
for ind in df.index:
    c = df["lists"][ind]
    for i, v in enumerate(c):
        if v == "b":
            # guard against 'b' being the last element
            df.loc[ind, "next_element"] = c[i + 1] if i + 1 < len(c) else ""
print(df)
Try this one and you will get the output you expected; the bounds check keeps a 'b' at the end of a list from raising an IndexError (that row is left as an empty string).
How can I join the multiple lists in a Pandas column 'B' and get the unique values only:
A B
0 10 [x50, y-1, sss00]
1 20 [x20, MN100, x50, sss00]
2 ...
Expected output:
[x50, y-1, sss00, x20, MN100]
You can do this simply with sum() (which concatenates the lists) and set():
result = list(set(df['B'].sum()))
Now if you print result you will get your desired output:
['y-1', 'x20', 'sss00', 'x50', 'MN100']
If the input data are not lists but strings, first create the lists:
df.B = df.B.str.strip('[]').str.split(',')
Or:
import ast
df.B = df.B.apply(ast.literal_eval)
Use Series.explode to get one Series from the lists, together with Series.unique to remove duplicates, if order is important:
L = df.B.explode().unique().tolist()
#alternative
#L = df.B.explode().drop_duplicates().tolist()
print (L)
['x50', 'y-1', 'sss00', 'x20', 'MN100']
Another idea, if order is not important: use a set comprehension to flatten the lists:
L = list({y for x in df.B for y in x})
print (L)
['x50', 'MN100', 'x20', 'sss00', 'y-1']
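A runnable sketch comparing the order-preserving and the set-based variants on the question's data:

```python
import pandas as pd

df = pd.DataFrame({"B": [["x50", "y-1", "sss00"], ["x20", "MN100", "x50", "sss00"]]})

# order-preserving: explode flattens the lists, unique keeps first-seen order
L = df.B.explode().unique().tolist()
print(L)  # ['x50', 'y-1', 'sss00', 'x20', 'MN100']

# order-free: flatten with a set comprehension
S = {y for x in df.B for y in x}
print(sorted(S))
```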
I am trying to create a column with the result of comparing a DataFrame cell list against a list.
I have this dataframe with list values:
df = pd.DataFrame({'A': [['KB4525236', 'KB4485447', 'KB4520724', 'KB3192137', 'KB4509091']], 'B': [['a', 'b']]})
and a list with this value:
findKBs = ['KB4525236','KB4525202']
The expected result :
A B C
0 [KB4525236, KB4485447, KB4520724, KB3192137, K... [a, b] [KB4525202]
I don't know how to compare my list against the cell list and find the non-matches. Can you help me?
You should simply compare the two lists: loop through the values of findKBs and collect them in a new list if they are not in df['A'][0]:
df['C'] = [[x for x in findKBs if x not in df['A'][0]]]
Result:
A B C
0 [KB4525236, KB4485447, KB4520724, KB3192137, K... [a, b] [KB4525202]
There's probably a more pandas-centric way to do it, but this works:
df['C'] = [list(filter(lambda el: el not in df['A'][0], findKBs))]
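Put together, a minimal reproduction of the comprehension approach on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [["KB4525236", "KB4485447", "KB4520724", "KB3192137", "KB4509091"]],
    "B": [["a", "b"]],
})
findKBs = ["KB4525236", "KB4525202"]

# keep every KB from findKBs that is missing from the list in row 0 of A
df["C"] = [[x for x in findKBs if x not in df["A"][0]]]
print(df["C"][0])  # ['KB4525202']
```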
In the following example, how do I keep only rows that have "a" in the array present in column tags?
df = pd.DataFrame(columns=["val", "tags"], data=[[5,["a","b","c"]]])
df[3<df.val] # this works
df["a" in df.tags] # is there an equivalent for filtering on tags?
I think using sets is intuitive. Then you can use >= as a set-containment test:
df[df.tags.apply(set) >= {'a'}]
val tags
0 5 [a, b, c]
A Numpy alternative would be
tags = df['tags']
n = len(tags)
out = np.zeros(n, dtype=bool)
i = np.arange(n).repeat(tags.str.len())
np.logical_or.at(out, i, np.concatenate(tags) == 'a')
df[out]
Per @JonClements, you can use set.issubset in a map (very clever):
df[df.tags.map({'a'}.issubset)]
val tags
0 5 [a, b, c]
Use a list comprehension:
df1 = df[["a" in x for x in df.tags]]
You could use apply with a lambda function that tests whether 'a' is in its argument:
df.tags.apply(lambda x: 'a' in x)
Result:
0 True
Name: tags, dtype: bool
This can also be used to index your dataframe:
df[df.tags.apply(lambda x: 'a' in x)]
Result:
val tags
0 5 [a, b, c]
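A minimal sketch with a second row added so the mask actually filters something:

```python
import pandas as pd

df = pd.DataFrame({"val": [5, 7], "tags": [["a", "b", "c"], ["x", "y"]]})

# boolean mask: True where the list in `tags` contains 'a'
mask = df.tags.apply(lambda x: "a" in x)
print(df[mask])               # keeps only the first row
print(df[mask].val.tolist())  # [5]
```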
I have the following list of combinations:
a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
I want to create a new combination list from these two lists, following this criterion:
The second element of a tuple in list a equals the first element of a tuple in list b.
Each such pair from list a and list b should form a new combination:
d = [(1,10,21),(2,8,28),(2,8,15),(300,28,34)]
All other tuples in both lists, which do not satisfy the criterion, are ignored.
QUESTIONS
Can I do this criteria based combination using itertools?
What is the most elegant way to solve this problem with/without using modules?
How can I write the output to an Excel sheet so that each element of a tuple in list d lands in a separate column, such that:
d = [(1,10,21),(2,8,28),(2,8,15),(300,28,34)] is displayed in excel as:
Col A = [1, 2, 2, 300]
Col B = [10,8,8,28]
Col C = [21,28,15,34]
pandas works like a charm for Excel.
Here is the code:
a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
c = [(x, y, t) for x, y in a for z, t in b if y == z]
import pandas as pd
df = pd.DataFrame(c)
df.to_excel('MyFile.xlsx', header=False, index=False)
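The question also asks whether itertools can do the criterion-based pairing; the same comprehension can be phrased with itertools.product (a sketch, equivalent to the nested loops above):

```python
from itertools import product

a = [(1, 10), (2, 8), (300, 28), (413, 212)]
b = [(8, 28), (8, 15), (10, 21), (28, 34), (413, 12)]

# product pairs every tuple in a with every tuple in b; keep pairs where
# a's second element equals b's first, and merge them into a triple
d = [(x, y, t) for (x, y), (z, t) in product(a, b) if y == z]
print(d)  # [(1, 10, 21), (2, 8, 28), (2, 8, 15), (300, 28, 34)]
```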