Python: Combination with criteria

I have the following list of combinations:
a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
I want to create a new combination list from these two lists according to the following criterion:
A. Lists a and b have common elements: the second element of a tuple in list a equals the first element of a tuple in list b.
Matching tuples from lists a and b should be combined into a new list:
d = [(1,10,21),(2,8,28),(2,8,15),(300,28,34)]
All other tuples in both lists, which do not satisfy the criterion, are ignored.
QUESTIONS
Can I do this criteria-based combination using itertools?
What is the most elegant way to solve this problem, with or without modules?
How can I write the output to an Excel sheet so that each element of a tuple in list d goes in a separate column? For example, d = [(1,10,21),(2,8,28),(2,8,15),(300,28,34)] would be displayed in Excel as:
Col A = [1, 2, 2, 300]
Col B = [10,8,8,28]
Col C = [21,28,15,34]

pandas works like a charm for Excel output. Here is the code:
import pandas as pd

a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
# keep (x, y, t) whenever a tuple (x, y) from a chains to a tuple (z, t) from b
c = [(x, y, t) for x, y in a for z, t in b if y == z]
df = pd.DataFrame(c)
# each tuple element lands in its own column; no header row or index column
df.to_excel('MyFile.xlsx', header=False, index=False)
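To answer the itertools question: here is a minimal sketch of the same join using itertools.product, which is equivalent to the nested comprehension above:
import itertools

a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
# product pairs every tuple from a with every tuple from b;
# the condition keeps only the pairs that chain together
d = [(x, y, t) for (x, y), (z, t) in itertools.product(a, b) if y == z]
print(d)  # [(1, 10, 21), (2, 8, 28), (2, 8, 15), (300, 28, 34)]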

Related

Rename None in a list under pandas column

Let's say I have the following dataframe:
Value
0  [None, A, B, C]
1  [None]
I would like to replace the None values in the column with the string 'none', but I can't figure out how.
I tried this, but it doesn't work:
df['Value'] = df['Value'].str.replace('None','none')
None is a built-in constant in Python, so if you want a lowercase version you have to convert it to the string 'none'.
There is no built-in way in Pandas to replace values in lists, but you can use explode to expand all the lists so that each individual item of each list gets its own row in the column, then replace, then group back together into the original list format:
df['Value'] = df['Value'].explode().replace({None: 'none'}).groupby(level=0).apply(list)
Output:
>>> df
Value
0 [none, A, B, C]
1 [none]
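For reference, a self-contained version of the same approach (a sketch, assuming the column holds plain Python lists as in the question):
import pandas as pd

df = pd.DataFrame({'Value': [[None, 'A', 'B', 'C'], [None]]})
# explode -> one row per list item, replace, then regroup by the original index
df['Value'] = df['Value'].explode().replace({None: 'none'}).groupby(level=0).apply(list)
print(df)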
Here is a way using map():
df['Value'] = df['Value'].map(lambda x: ['none' if i is None else i for i in x])
Output:
Value
0 [none, A, B, C]
1 [none]

Detect specific characters in pandas dataframe

How can I detect the rows and columns of a dataframe whose elements contain any character other than the desired ones?
The desired characters are A, B, C, a, b, c, 1, 2, 3, &, %, =, /.
dataframe:
Col1  Col2  Col3
Abc   Øa    12
bbb   +     }
The output should list the elements Øa, +, and }, together with their locations in the dataframe.
I find it really difficult to locate an element for a condition directly in pandas, so I converted the dataframe to a nested list first, then proceeded to work with the list. Try this:
import pandas as pd
import numpy as np

# create the sample dataframe
array = np.array([['Abc','Øa','12'],['bbb','+','}']])
columns = ['Col1','Col2','Col3']
df = pd.DataFrame(data=array, columns=columns)

# convert the dataframe to a nested list
pd_list = df.values.tolist()

# build the set of characters other than the ones in 'var'
all_chars = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;=>?#[\\]^_`{|}~Ø'
var = 'ABCabc123&%=//'
for a in var:
    all_chars = all_chars.replace(a, "")

# stores previously detected elements to prevent duplicates
temp_storage = []

# loop through the nested list to get the elements' indexes
for x in all_chars:
    for i in pd_list:
        for n in i:
            if x in n:
                # skip elements that were already reported
                if n not in temp_storage:
                    temp_storage.append(n)
                    print(f'found {n}: row={pd_list.index(i)}; col={i.index(n)}')
Output:
> found +: row=1; col=1
> found }: row=1; col=2
> found Øa: row=0; col=1
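For what it's worth, here is a more pandas-centric sketch (my own variation, not the answer above): a negated regex character class with Series.str.contains flags offending cells, and numpy.argwhere recovers their positions:
import numpy as np
import pandas as pd

df = pd.DataFrame([['Abc', 'Øa', '12'], ['bbb', '+', '}']],
                  columns=['Col1', 'Col2', 'Col3'])
# True wherever an element contains any character outside the desired set
mask = df.apply(lambda col: col.str.contains(r'[^ABCabc123&%=/]'))
for row, col in np.argwhere(mask.values):
    print(f'found {df.iat[row, col]}: row={row}; col={col}')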

Search values from a list in a dataframe cell list and add another column with the results

I am trying to create a column with the result of a comparison between a dataframe cell list and another list.
I have this dataframe with list values:
df = pd.DataFrame({'A': [['KB4525236', 'KB4485447', 'KB4520724', 'KB3192137', 'KB4509091']], 'B': [['a', 'b']]})
and a list with this value:
findKBs = ['KB4525236','KB4525202']
The expected result :
A B C
0 [KB4525236, KB4485447, KB4520724, KB3192137, K... [a, b] [KB4525202]
I don't know how to compare my list with the cell list and find the non-matches. Can you help me?
Simply compare the two lists: loop through the values of findKBs and collect them in a new list if they are not in df['A'][0]:
df['C'] = [[x for x in findKBs if x not in df['A'][0]]]
Result:
A B C
0 [KB4525236, KB4485447, KB4520724, KB3192137, K... [a, b] [KB4525202]
There's probably a more pandas-centric way to do it, but this also works:
df['C'] = [list(filter(lambda el: el not in df['A'][0], findKBs))]
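Equivalently, a set difference keeps it terse (a sketch; note that sets do not preserve the original order):
df['C'] = [list(set(findKBs) - set(df['A'][0]))]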

Check to see if column values exist in dictionary [pandas]

Can a data frame column (Series) of lists be used as a conditional check within a dictionary?
I have a column of lists of words (split up tweets) that I'd like to feed to a vocab dictionary to see if they all exist - if one does not exist, I'd like to skip it, continue on and then run a function over the existing words.
This code produces the intended result for one row of the column; however, I get an "unhashable type: 'list'" error if I try to apply it to more than one row.
w2v_sum = w2v[[x for x in train['words'].values[1] if x in w2v.vocab]].sum()
Edit with reproducible example:
df = pd.DataFrame(data={'words':[['cow','bird','cat'],['red','blue','green'],['low','high','med']]})
d = {'cow':1,'bird':4,'red':1,'blue':1,'green':1,'high':6,'med':3}
The desired output is a total column (the sum of the dictionary values for the words found in the dictionary):
total words
0 5 [cow, bird, cat]
1 3 [red, blue, green]
2 9 [low, high, med]
This should do what you want:
import pandas as pd
df = pd.DataFrame(data={'words':[['cow','bird','cat'],['red','blue','green'],['low','high','med']]})
d = {'cow':1,'bird':4,'red':1,'blue':1,'green':1,'high':6,'med':3}
EDIT:
To handle the lists inside the column, use this nested list comprehension:
list_totals = [[d[x] for x in y if x in d] for y in df['words'].values]
list_totals = [sum(x) for x in list_totals]
list_totals
[5, 3, 9]
You can then add list_totals as a column to your DataFrame.
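For example:
df['total'] = list_totals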
One solution is to use collections.Counter and a list comprehension:
from collections import Counter
d = Counter({'cow':1,'bird':4,'red':1,'blue':1,'green':1,'high':6,'med':3})
df['total'] = [sum(map(d.__getitem__, L)) for L in df['words']]
print(df)
words total
0 [cow, bird, cat] 5
1 [red, blue, green] 3
2 [low, high, med] 9
Alternatively, if you always have a fixed number of words, you can split into multiple series and use pd.DataFrame.applymap:
df['total'] = pd.DataFrame(df['words'].tolist()).applymap(d.get).sum(1).astype(int)
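Note that DataFrame.applymap is deprecated as of pandas 2.1 in favour of DataFrame.map, so on recent pandas versions the same line would read:
df['total'] = pd.DataFrame(df['words'].tolist()).map(d.get).sum(1).astype(int)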

Remove elements from lists when the element at that index in one of the lists is NaN

Suppose I have three lists, one of which contains NaNs (I think they're NaNs; they get printed as '--' by a previous masked-array operation):
a = [1,2,3,4,5]
b = [6,7,--,9,--]
c = [6,7,8,9,10]
I'd like to perform an operation that iterates through b and, wherever b[i] is NaN, deletes the element at that index from all three lists. I'm thinking of something like this:
for i in range(len(b)):
    if b[i] is NaN:
        del a[i]  # etc.
b is generated by masking c under some condition earlier in my code, something like this:
b = np.ma.MaskedArray(c, condition)
Thanks!
This is easy to do using numpy:
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,np.nan,9,np.nan])  # np.nan (the np.NaN alias was removed in NumPy 2.0)
c = np.array([6,7,8,9,10])
where_are_nans = np.isnan(b)
filtered_array = a[~where_are_nans] #note the ~ negation
print(filtered_array)
As you can see, it returns:
[1 2 4]
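Since b in the question actually comes from a masked array rather than containing real NaNs, here is a sketch of the same idea driven by the mask itself (assuming a boolean mask stands in for the question's condition):
import numpy as np

a = np.array([1, 2, 3, 4, 5])
c = np.array([6, 7, 8, 9, 10])
b = np.ma.MaskedArray(c, mask=[False, False, True, False, True])

keep = ~np.ma.getmaskarray(b)  # True where b is not masked
print(a[keep])         # [1 2 4]
print(c[keep])         # [6 7 9]
print(b.compressed())  # [6 7 9]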
