Generating concatenated dataframe output from for loop over a function - python

I have a function:
def func(x):
    y = pd.read_csv(x)
    return y
If I have to loop this function over several inputs, I expect to get a combined dataframe from all those inputs as output:
list = ["a.csv", "b.csv", "d.csv", "e.csv"]
for i in list:
    m = func(i)
How can I get m as a combined dataframe from all the input files?

To combine DataFrames, use concat with a list comprehension:
Note: don't use list as a variable name, because it shadows the Python built-in.
L = ["a.csv", "b.csv", "d.csv", "e.csv"]
df1 = pd.concat([func(i) for i in L])
Or with a loop:
out = []
for i in L:
    m = func(i)
    out.append(m)
df1 = pd.concat(out)
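For reference, here is a minimal end-to-end sketch of the same pattern, assuming the four CSV files from the question exist in the working directory; ignore_index=True is optional and simply renumbers the rows of the combined frame:
import pandas as pd

def func(path):
    # read a single CSV file into a DataFrame
    return pd.read_csv(path)

L = ["a.csv", "b.csv", "d.csv", "e.csv"]  # assumed to exist
df1 = pd.concat([func(f) for f in L], ignore_index=True)
print(df1.shape)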

Related

Remove NaN from lists in python

I have converted dataframe rows to lists, and in those lists there are NaN values which I would like to remove.
This is my attempt, but the NaN values are not removed:
import pandas as pd
df = pd.read_excel('MainFile.xlsx', dtype = str)
df_list = df.values.tolist()
print(df_list)
print('=' * 50)
for l in df_list:
    newlist = [x for x in l if x != 'nan']
    print(newlist)
Here's a snapshot of the original data
I did find a solution using these lines (but I welcome any ideas):
for l in df_list:
    newlist = [x for x in l if x == x]
    print(newlist)
It is not working because you are comparing against the string 'nan'.
If an Excel cell is empty, it is returned as a float NaN value in pandas.
Note that NaN never compares equal to anything, not even to itself, so a check like x != np.nan is always True and filters nothing out. Use pd.isna, which handles both the strings and the NaN floats:
import pandas as pd
for l in df_list:
    newlist = [x for x in l if not pd.isna(x)]
    print(newlist)
EDIT:
If you want to get all values from the dataframe which are not NaN, you can just do:
df.stack().tolist()
If you want to print the values with a loop (as in your example), you can do:
for l in df.columns:
    print(list(df[l][df[l].notna()]))
To create a nested list with a loop:
main = []
for l in df.T.columns:
    new_list = list(df.T[l][df.T[l].notna()])
    main.append(new_list)
print(main)
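A tiny illustration of the difference between the two, using a made-up two-column frame (the data is invented for the example):
import pandas as pd
import numpy as np

df = pd.DataFrame({"A": ["x", np.nan, "z"], "B": [np.nan, "q", "r"]})
print(df.stack().tolist())   # all non-NaN values in one flat list
main = [list(df.T[c][df.T[c].notna()]) for c in df.T.columns]
print(main)                  # one list of non-NaN values per row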
You can always try the approach that is proposed here, applied per row (np.isnan only accepts numbers, so guard it for the string values):
import numpy as np
for l in df_list:
    newlist = [x for x in l if not (isinstance(x, float) and np.isnan(x))]
    print(newlist)
I hope that this will help.

I get words with commas between letters instead of between words

I get strings with | between the letters instead of between the words,
for example w|o|r|d,|o|k
instead of word|ok
It is also not possible to join a list of lists, only a list of strings.
import pandas as pd
data = pd.read_csv('data.csv')
intersect1 = ""
for j in range(len(data)):
    x = str(data.iloc[j, 1])
    #print(x)
    x = x.split("|")
    x = x[:-1]
    y = str(data.iloc[j, 2])
    y = y.split("|")
    y = y[:-1]
    intersect = list(set(x) & set(y))
    #intersect1.append(intersect)
    intersect1 += str(intersect)
    print(intersect1)
print("|".join(intersect1))
#print(intersect1)
#intersect1.join("|")
#print(intersect1)
df3 = pd.DataFrame(list(intersect1))
df3.to_csv('intersect1.csv')
Thank you in advance
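A likely cause, based on the code shown: intersect1 is a string, and calling "|".join(...) on a string iterates over its characters, which is exactly what produces the w|o|r|d-style output. A minimal sketch of one way to keep the per-row intersections as lists and join each one (the data.csv layout and column positions are assumed from the question):
import pandas as pd

data = pd.read_csv('data.csv')  # layout assumed from the question
rows = []
for j in range(len(data)):
    x = str(data.iloc[j, 1]).split("|")[:-1]
    y = str(data.iloc[j, 2]).split("|")[:-1]
    # keep the intersection as a list of words, not a string
    rows.append(list(set(x) & set(y)))
# join the words of each row with "|", producing one string per row
joined = ["|".join(r) for r in rows]
df3 = pd.DataFrame(joined)
df3.to_csv('intersect1.csv', index=False, header=False)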

Get unique values from multiple lists in Pandas column

How can I join the multiple lists in a Pandas column 'B' and get the unique values only:
   A    B
0  10   [x50, y-1, sss00]
1  20   [x20, MN100, x50, sss00]
2  ...
Expected output:
[x50, y-1, sss00, x20, MN100]
You can do this simply with a list comprehension and the sum() method:
result = [x for x in set(df['B'].sum())]
Now if you print result, you will get your desired output:
['y-1', 'x20', 'sss00', 'x50', 'MN100']
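For context on why this works: summing a Series of lists concatenates them into one long list, which set() then deduplicates. A quick check with hand-built data mirroring the question:
import pandas as pd

s = pd.Series([["x50", "y-1", "sss00"], ["x20", "MN100", "x50", "sss00"]])
print(s.sum())       # one concatenated list of all elements
print(set(s.sum()))  # duplicates removed; order is not guaranteed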
If the input data are not lists but strings, first create lists:
df.B = df.B.str.strip('[]').str.split(',')
Or:
import ast
df.B = df.B.apply(ast.literal_eval)
Use Series.explode to get one Series from the lists, together with Series.unique to remove duplicates, if order is important:
L = df.B.explode().unique().tolist()
#alternative
#L = df.B.explode().drop_duplicates().tolist()
print (L)
['x50', 'y-1', 'sss00', 'x20', 'MN100']
Another idea, if order is not important, is to build a set from the flattened lists:
L = list(set([y for x in df.B for y in x]))
print (L)
['x50', 'MN100', 'x20', 'sss00', 'y-1']
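Both ideas can be tried on a frame reconstructed by hand from the question:
import pandas as pd

df = pd.DataFrame({
    "A": [10, 20],
    "B": [["x50", "y-1", "sss00"], ["x20", "MN100", "x50", "sss00"]],
})
print(df.B.explode().unique().tolist())        # keeps first-seen order
print(list(set(y for x in df.B for y in x)))   # order not guaranteed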

Correct syntax of list comprehension for this nested loop?

I have a dataframe column, df['Traversal'], where each row may contain a string something like 'Paris->France->London'.
The correct output works for the following code:
emptylist = []
for x in df['Traversal']:
    for y in x.split('->'):
        emptylist.append(y)
I've tried variations of:
emptylist = [y.split('->') for y in df['Traversal']]
emptylist = [y for y in x.split('->') for x in df['Traversal']]
The closest I got was a list of lists (split). The end result I would like is a list of all the strings only, not grouped by the 'split' lists.
[e for x in df["Traversal"] for e in x.split('->')]
Also see: Double Iteration in List Comprehension
Why not:
emptylist = [y.split('->') for y in df['Traversal']]
cities = []
_ = [cities.extend(t) for t in emptylist]
If you must use list-comprehensions ;)
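For reference, the same flattening can also be written with itertools.chain.from_iterable, which avoids the side effect of calling extend inside a comprehension; the sample data below is invented for illustration:
from itertools import chain
import pandas as pd

df = pd.DataFrame({"Traversal": ["Paris->France->London", "Rome->Italy"]})
emptylist = list(chain.from_iterable(x.split('->') for x in df['Traversal']))
print(emptylist)  # ['Paris', 'France', 'London', 'Rome', 'Italy']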

Python: Combination with criteria

I have the following list of combinations:
a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
I want to create a new combination list from these two lists which follows these criteria:
A. List a and list b have common elements: the second element of a tuple in list a equals the first element of a tuple in list b.
B. The matching tuples of list a and list b form a new combination:
d = [(1,10,21),(2,8,28),(2,8,15),(300,28,34)]
All other tuples in both lists which do not satisfy the criteria get ignored.
QUESTIONS
Can I do this criteria-based combination using itertools?
What is the most elegant way to solve this problem, with or without using modules?
How can one write the output to an Excel sheet so that each element of a tuple in list d goes to a separate column, such that:
d = [(1,10,21),(2,8,28),(2,8,15),(300,28,34)] is displayed in excel as:
Col A = [1, 2, 2, 300]
Col B = [10,8,8,28]
Col C = [21,28,15,34]
pandas works like a charm for Excel.
Here is the code:
a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
c = [(x, y, t) for x, y in a for z, t in b if y == z]
import pandas as pd
df = pd.DataFrame(c)
df.to_excel('MyFile.xlsx', header=False, index=False)
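On the itertools part of the question: the same filter can be expressed with itertools.product, which generates the cross pairs that the nested comprehension above builds implicitly; this is just an equivalent spelling, not a different algorithm:
from itertools import product

a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
# pair every tuple of a with every tuple of b and keep the matches
c = [(x, y, t) for (x, y), (z, t) in product(a, b) if y == z]
print(c)  # [(1, 10, 21), (2, 8, 28), (2, 8, 15), (300, 28, 34)]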
