I have converted DataFrame rows to lists, and those lists contain NaN values which I would like to remove.
This is my attempt, but the NaN values are not removed:
import pandas as pd

df = pd.read_excel('MainFile.xlsx', dtype=str)
df_list = df.values.tolist()
print(df_list)
print('=' * 50)

for l in df_list:
    newlist = [x for x in l if x != 'nan']
    print(newlist)
Here's a snapshot of the original data
I could find a solution using these lines (but I welcome any ideas):

for l in df_list:
    newlist = [x for x in l if x == x]
    print(newlist)
It is not working because you are trying to compare the values to the string 'nan'.
If an Excel cell is empty, it is returned as a NaN value in pandas.
Note that comparing with np.nan does not work either: NaN compares unequal to everything, including itself, so x != np.nan is always True. Use pd.notna instead:

for l in df_list:
    newlist = [x for x in l if pd.notna(x)]
    print(newlist)
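The self-inequality of NaN is also why the x == x trick works; a quick demonstration:

```python
import numpy as np
import pandas as pd

# NaN is the only common value that is not equal to itself
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True

# so `x != np.nan` would keep NaN, while `x == x` or pd.notna(x) filters it out
row = ['a', np.nan, 'b']
print([x for x in row if x == x])       # ['a', 'b']
print([x for x in row if pd.notna(x)])  # ['a', 'b']
```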
EDIT:
If you want to get all values from the dataframe, which are not NaN, you can just do:
df.stack().tolist()
If you want to print values with the loop (as in your example), you can do:
for l in df.columns:
print(list(df[l][df[l].notna()]))
To create nested list with a loop:
main = []
for l in df.T.columns:
new_list = list(df.T[l][df.T[l].notna()])
main.append(new_list)
print(main)
You can always try the approach that is proposed here. Note that np.isnan must be applied element-wise, and only to floats, since df_list is a list of lists that may also contain strings:

import numpy as np

newlist = [[x for x in l if not (isinstance(x, float) and np.isnan(x))] for l in df_list]
print(newlist)

I hope that this will help.
I get strings with | between letters instead of between the words,
for example w|o|r|d,|o|k
instead of word|ok.
It is also not possible to do join on a list of lists, only on a list of strings.
data = pd.read_csv('data.csv')
intersect1 = ""
for j in range(len(data)):
    x = str(data.iloc[j, 1])
    x = x.split("|")
    x = x[:-1]
    y = str(data.iloc[j, 2])
    y = y.split("|")
    y = y[:-1]
    intersect = list(set(x) & set(y))
    # intersect1.append(intersect)
    intersect1 += str(intersect)
print(intersect1)
print("|".join(intersect1))
# intersect1.join("|")
df3 = pd.DataFrame(list(intersect1))
df3.to_csv('intersect1.csv')
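A minimal sketch of a fix, assuming the goal is one pipe-separated string of common words per row: keep each row's intersection as a list of words and join the words, not the characters of a stringified list. The sample rows below are made up for illustration, standing in for data.iloc[j, 1] and data.iloc[j, 2]:

```python
# hypothetical sample rows standing in for the two CSV columns
rows = [
    ("word|ok|", "ok|word|extra|"),
    ("a|b|c|", "b|c|d|"),
]

joined_rows = []
for col1, col2 in rows:
    x = col1.split("|")[:-1]  # drop the trailing empty piece after the last |
    y = col2.split("|")[:-1]
    intersect = sorted(set(x) & set(y))
    # join the words of one row; joining a string would join its characters
    joined_rows.append("|".join(intersect))

print(joined_rows)  # ['ok|word', 'b|c']
```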
Thank you in advance
How can I join the multiple lists in a pandas column 'B' and get only the unique values:
A B
0 10 [x50, y-1, sss00]
1 20 [x20, MN100, x50, sss00]
2 ...
Expected output:
[x50, y-1, sss00, x20, MN100]
You can do this simply with the sum() method, which concatenates the lists, and set() to drop duplicates:

result = list(set(df['B'].sum()))

Now if you print result you will get your desired output:

['y-1', 'x20', 'sss00', 'x50', 'MN100']
If the input data are strings rather than lists, first create the lists:
df.B = df.B.str.strip('[]').str.split(',')
Or:
import ast
df.B = df.B.apply(ast.literal_eval)
Use Series.explode to get one Series from the lists, then Series.unique to remove duplicates if order is important:

L = df.B.explode().unique().tolist()
# alternative
# L = df.B.explode().drop_duplicates().tolist()
print(L)
['x50', 'y-1', 'sss00', 'x20', 'MN100']
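A self-contained version of the explode approach, constructing the sample frame from the question inline:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20],
    'B': [['x50', 'y-1', 'sss00'], ['x20', 'MN100', 'x50', 'sss00']],
})

# explode turns each list element into its own row; unique keeps first-seen order
L = df.B.explode().unique().tolist()
print(L)  # ['x50', 'y-1', 'sss00', 'x20', 'MN100']
```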
Another idea, if order is not important, is a set comprehension that flattens the lists:

L = list({y for x in df.B for y in x})
print(L)
['x50', 'MN100', 'x20', 'sss00', 'y-1']
I have a function:
def func(x):
    y = pd.read_csv(x)
    return y
In case I have to loop this function for several inputs, I expect to get a combined dataframe from all those inputs as output:

list = ["a.csv", "b.csv", "d.csv", "e.csv"]
for i in list:
    m = func(x)

How can I get the value of m as the combined dataframes from all the input files?
To combine DataFrames, use concat with a list comprehension.

Notice: don't use list as a variable name, because it shadows a Python built-in. Also, the loop variable is i, so pass func(i), not func(x):

L = ["a.csv", "b.csv", "d.csv", "e.csv"]
df1 = pd.concat([func(i) for i in L])
Or in a loop:

out = []
for i in L:
    m = func(i)
    out.append(m)

df1 = pd.concat(out)
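A runnable sketch of the same pattern, substituting in-memory io.StringIO buffers for the real CSV files so the example is self-contained:

```python
import io
import pandas as pd

def func(x):
    y = pd.read_csv(x)
    return y

# stand-ins for "a.csv", "b.csv", ... as in-memory CSV buffers
L = [io.StringIO("a,b\n1,2\n"), io.StringIO("a,b\n3,4\n")]

# one DataFrame per input, concatenated into a single combined frame
df1 = pd.concat([func(f) for f in L], ignore_index=True)
print(df1)
```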
I have this List of lists containing string values:
List = [['138.314038', '-35.451642'],
['138.313946', '-35.45212'],
['138.313395', '-35.45291'],
['138.312425', '-35.453978'],
['138.311697', '-35.454879'],
['138.311042', '-35.45569'],
['138.310407', '-35.45647'],
['138.315603', '-35.44981'],
['138.315178', '-35.450241'],
['138.314603', '-35.450948'],
['138.314038', '-35.45164']]
I am trying to transform each string value in the list of lists into a float value.
I was trying:
results = [float(i) for i in List]
But this only iterates over the inner lists, not the values inside them. How can I do it using a similar approach, keeping the same structure as the variable List?
You have a list of lists, so use a nested comprehension:
results = [[float(i) for i in e] for e in List]
You can use numpy to convert it:

import numpy as np

np.array(List).astype(float).tolist()

Output:
[[138.314038, -35.451642],
[138.313946, -35.45212],
[138.313395, -35.45291],
[138.312425, -35.453978],
[138.311697, -35.454879],
[138.311042, -35.45569],
[138.310407, -35.45647],
[138.315603, -35.44981],
[138.315178, -35.450241],
[138.314603, -35.450948],
[138.314038, -35.45164]]
Maybe ugly, using two nested maps:
print(list(map(lambda x: list(map(float,x)), List)))
Output:
[[138.314038, -35.451642], [138.313946, -35.45212], [138.313395, -35.45291], [138.312425, -35.453978], [138.311697, -35.454879], [138.311042, -35.45569], [138.310407, -35.45647], [138.315603, -35.44981], [138.315178, -35.450241], [138.314603, -35.450948], [138.314038, -35.45164]]
Print it better with pprint:

import pprint
pprint.pprint(list(map(lambda x: list(map(float, x)), List)))
Output:
[[138.314038, -35.451642],
[138.313946, -35.45212],
[138.313395, -35.45291],
[138.312425, -35.453978],
[138.311697, -35.454879],
[138.311042, -35.45569],
[138.310407, -35.45647],
[138.315603, -35.44981],
[138.315178, -35.450241],
[138.314603, -35.450948],
[138.314038, -35.45164]]
# you can use the map function as well
results = [list(map(float, x)) for x in List]
You can expand the inner lists, like this:
results = [list(map(float, l)) for l in List]
You could use map to achieve it.
floats = [ list(map(float, i)) for i in List ]