Remove NaN from lists in Python

I have converted data frame rows to lists and in those list there are NaN values which I would like to remove.
This is my attempt but the NaN values are not removed
import pandas as pd
df = pd.read_excel('MainFile.xlsx', dtype = str)
df_list = df.values.tolist()
print(df_list)
print('=' * 50)
for l in df_list:
    newlist = [x for x in l if x != 'nan']
    print(newlist)
Here's a snapshot of the original data
I could find a solution using these lines (but I welcome any ideas)
for l in df_list:
    newlist = [x for x in l if x == x]
    print(newlist)
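For reference, the `x == x` filter works because NaN is the only common value that is not equal to itself. A minimal standalone sketch (made-up values):

```python
import math

row = [10.0, float('nan'), 30.0, float('nan')]

# NaN is the only float that compares unequal to itself,
# so `x == x` is False exactly for the NaN entries
cleaned = [x for x in row if x == x]
print(cleaned)  # [10.0, 30.0]

# math.isnan confirms what we filtered out
print(math.isnan(float('nan')))  # True
```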

It is not working because you are comparing the values to the string 'nan'.
If an Excel cell is empty, pandas returns it as a NaN value, not as the string 'nan'.
Note that NaN is not equal to anything, including itself, so a comparison like x != np.nan is always True and removes nothing; test for it with pd.isna instead:
for l in df_list:
    newlist = [x for x in l if not pd.isna(x)]
    print(newlist)
EDIT:
If you want to get all values from the dataframe, which are not NaN, you can just do:
df.stack().tolist()
If you want to print values with the loop (as in your example), you can do:
for l in df.columns:
    print(list(df[l][df[l].notna()]))
To create nested list with a loop:
main = []
for l in df.T.columns:
    new_list = list(df.T[l][df.T[l].notna()])
    main.append(new_list)
print(main)
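As a quick sanity check of the stack/notna approaches above, on a small made-up frame (column names and values here are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# a small example frame with NaN holes
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, 5.0, 6.0]})

# stack() drops NaN by default and flattens the frame row by row
flat = df.stack().tolist()
print(flat)  # [1.0, 5.0, 3.0, 6.0]

# per-row nested lists without NaN, like the loop above
main = [[x for x in row if pd.notna(x)] for row in df.values.tolist()]
print(main)  # [[1.0], [5.0], [3.0, 6.0]]
```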

You can always try the approach that is proposed here:
import numpy as np
for l in df_list:
    newlist = [x for x in l if not (isinstance(x, float) and np.isnan(x))]
    print(newlist)
(The isinstance check is needed because np.isnan raises a TypeError on strings, and the filter has to run per row, not on the list of lists.)
I hope that this will help.

Related

Convert comma-separated values into integer list in pandas dataframe

How to convert a comma-separated value into a list of integers in a pandas dataframe?
There are two steps, split and convert to integers, because after splitting the values are lists of strings. This solution also works when the lists have different lengths (no Nones are added):
df['qty'] = df['qty'].apply(lambda x: [int(y) for y in x.split(',')])
Or:
df['qty'] = df['qty'].apply(lambda x: list(map(int, x.split(','))))
Alternative solutions:
df['qty'] = [[int(y) for y in x.split(',')] for x in df['qty']]
df['qty'] = [list(map(int, x.split(','))) for x in df['qty']]
Or try expand=True:
df['qty'] = df['qty'].str.split(',', expand=True).astype(int).agg(list, axis=1)
Vectorised solution:
import ast
df["qty"] = ("[" + df["qty"].astype(str) + "]").apply(ast.literal_eval)
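The two conversions side by side on a tiny made-up frame (the 'qty' values are assumptions for illustration):

```python
import ast

import pandas as pd

df = pd.DataFrame({'qty': ['1,2,3', '4,5', '6']})

# split each string on commas, then convert every piece to int
split_ints = df['qty'].apply(lambda x: [int(y) for y in x.split(',')])
print(split_ints.tolist())  # [[1, 2, 3], [4, 5], [6]]

# literal_eval variant: wrap in brackets so each cell parses as a list
wrapped = ("[" + df['qty'].astype(str) + "]").apply(ast.literal_eval)
print(wrapped.tolist())  # [[1, 2, 3], [4, 5], [6]]
```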

Get unique values from multiple lists in Pandas column

How can I join the multiple lists in a Pandas column 'B' and get the unique values only:
A B
0 10 [x50, y-1, sss00]
1 20 [x20, MN100, x50, sss00]
2 ...
Expected output:
[x50, y-1, sss00, x20, MN100]
You can do this simply with a list comprehension and the sum() method:
result = [x for x in set(df['B'].sum())]
Now if you print result, you will get your desired output:
['y-1', 'x20', 'sss00', 'x50', 'MN100']
If the input data are strings rather than lists, first create the lists:
df.B = df.B.str.strip('[]').str.split(',')
Or:
import ast
df.B = df.B.apply(ast.literal_eval)
If order is important, use Series.explode to flatten the lists into one Series, then Series.unique to remove duplicates:
L = df.B.explode().unique().tolist()
#alternative
#L = df.B.explode().drop_duplicates().tolist()
print(L)
['x50', 'y-1', 'sss00', 'x20', 'MN100']
Another idea, if order is not important, is a set over the flattened lists:
L = list(set([y for x in df.B for y in x]))
print(L)
['x50', 'MN100', 'x20', 'sss00', 'y-1']
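Putting both variants together on the sample column (reconstructed from the question):

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20],
                   'B': [['x50', 'y-1', 'sss00'],
                         ['x20', 'MN100', 'x50', 'sss00']]})

# order-preserving: explode to one element per row, then dedupe
ordered = df.B.explode().unique().tolist()
print(ordered)  # ['x50', 'y-1', 'sss00', 'x20', 'MN100']

# order-free: flatten with a comprehension and dedupe with set
unordered = list(set(y for x in df.B for y in x))
print(sorted(unordered))  # ['MN100', 'sss00', 'x20', 'x50', 'y-1']
```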

replacing special characters in a numpy array with blanks

I have a list of lists (see below) which has ? where a value is missing:
([[1,2,3,4],
[5,6,7,8],
[9,?,11,12]])
I want to convert this to a numpy array using np.array(test); however, the ? value is causing an issue. What I want to do is replace the ? with a blank string '' and then convert to a numpy array, so that I end up with the following:
([[1,2,3,4],
[5,6,7,8],
[9,,11,12]])
Use list comprehension:
matrix = ...
new_matrix = [["" if not isinstance(x, int) else x for x in sublist] for sublist in matrix]
Python does not have a type for a bare ?
Check this:
a = ?
print(type(a))
The code above raises a SyntaxError, so the value must actually be the string "?".
If that is the case, then you can use:
list1 = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, '?', 11, 12]]
for i1, ele in enumerate(list1):
    for i2, x in enumerate(ele):
        if x == "?":
            list1[i1][i2] = ""
print(list1)
This is an approach using loops to find elements that can't be turned into integers and replaces them with blank spaces.
import numpy as np
preArray = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, '?', 11, 12]]
newPreArray = []
for row in preArray:
    newRow = []
    for val in row:
        try:
            int(val)
            newRow.append(val)
        except ValueError:
            newRow.append('')
    newPreArray.append(newRow)
array = np.array(newPreArray)
For a single list you can do something like:
>>> myList = [4, 5, '?', 6]
>>> myNewList = [i if str(i).isdigit() else '' for i in myList]
>>> myNewList
[4, 5, '', 6]
so take that information and make it work with a list of lists.
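Applied to the list of lists from the question, the same isdigit test becomes one nested comprehension (a sketch, not the only way):

```python
matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, '?', 11, 12]]

# str(i).isdigit() is True for non-negative ints and digit-only strings,
# so '?' fails the test and is replaced with ''
cleaned = [[i if str(i).isdigit() else '' for i in row] for row in matrix]
print(cleaned)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, '', 11, 12]]
```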

Generating concatenated dataframe output from for loop over a function

I have a function:
def func(x):
    y = pd.read_csv(x)
    return y
If I loop this function over several inputs, I expect the combined dataframe from all those inputs as output:
list = ["a.csv", "b.csv", "d.csv", "e.csv"]
for i in list:
    m = func(i)
How can I get the value of m as the combined dataframe from all the input files?
To combine DataFrames, use concat with a list comprehension.
Notice: don't use `list` as a variable name, because it shadows the Python built-in.
L = ["a.csv", "b.csv", "d.csv", "e.csv"]
df1 = pd.concat([func(i) for i in L])
Or in a loop:
out = []
for i in L:
    m = func(i)
    out.append(m)
df1 = pd.concat(out)
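A self-contained sketch of the list-comprehension version; since the file names above are placeholders, it reads in-memory CSV text via io.StringIO instead of real files:

```python
import io

import pandas as pd

def func(x):
    # same shape as the question's function: read one CSV source
    return pd.read_csv(x)

# stand-ins for "a.csv", "b.csv": file-like objects with the same column
sources = [io.StringIO("v\n1\n2"), io.StringIO("v\n3")]

# concat stitches the per-file frames into one combined dataframe
df1 = pd.concat([func(s) for s in sources], ignore_index=True)
print(df1['v'].tolist())  # [1, 2, 3]
```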

Copying dataframe data effectively

I have a dataframe:
dataframe = pd.DataFrame()
dataframe['column'] = [10,20,30,40]
I want to effectively duplicate each element 3 times so it becomes the equivalent of:
dataframe['column'] = [10,10,10,20,20,20,30,30,30,40,40,40]
I need a solution that will work for a df of any size. I also need the index to stay 1, 2, 3, 4, etc.
magic_list = [10,20,30,40]
dataframe['column'] = [x for y in [[a for i in range(3)] for a in magic_list] for x in y]
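An alternative worth knowing (my suggestion, not part of the answer above) is repeating the index and resetting it, which scales to any frame and restores a clean 0..n-1 index:

```python
import pandas as pd

dataframe = pd.DataFrame({'column': [10, 20, 30, 40]})

# repeat each row 3 times, then rebuild a fresh RangeIndex
out = dataframe.loc[dataframe.index.repeat(3)].reset_index(drop=True)
print(out['column'].tolist())
# [10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40]
```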
