I have converted DataFrame rows to lists, and those lists contain NaN values which I would like to remove.
This is my attempt, but the NaN values are not removed:
import pandas as pd

df = pd.read_excel('MainFile.xlsx', dtype=str)
df_list = df.values.tolist()
print(df_list)
print('=' * 50)

for l in df_list:
    newlist = [x for x in l if x != 'nan']
    print(newlist)
Here's a snapshot of the original data
I could find a solution using these lines (but I welcome any ideas):

for l in df_list:
    newlist = [x for x in l if x == x]
    print(newlist)
It is not working because you are trying to compare the values to the string 'nan'.
If an Excel cell is empty, it is returned as a NaN value in pandas.
Note that comparing with np.nan does not work either: NaN compares unequal to everything, including itself, so x != np.nan is always True. Use pd.notna instead:

for l in df_list:
    newlist = [x for x in l if pd.notna(x)]
    print(newlist)
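The self-inequality of NaN is also why the x == x trick works; a quick demonstration:

```python
import numpy as np
import pandas as pd

# NaN is the only common value that is not equal to itself
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True

# so `x != np.nan` would keep NaN, while `x == x` or pd.notna(x) filters it out
row = ['a', np.nan, 'b']
print([x for x in row if x == x])       # ['a', 'b']
print([x for x in row if pd.notna(x)])  # ['a', 'b']
```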
EDIT:
If you want to get all values from the dataframe, which are not NaN, you can just do:
df.stack().tolist()
If you want to print values with the loop (as in your example), you can do:
for l in df.columns:
print(list(df[l][df[l].notna()]))
To create nested list with a loop:
main = []
for l in df.T.columns:
new_list = list(df.T[l][df.T[l].notna()])
main.append(new_list)
print(main)
You can always try the approach that is proposed here. Note that np.isnan must be applied element-wise, and only to floats, since df_list is a list of lists that may also contain strings:

import numpy as np

newlist = [[x for x in l if not (isinstance(x, float) and np.isnan(x))] for l in df_list]
print(newlist)

I hope that this will help.
I get strings with | between letters instead of between the words,
for example w|o|r|d,|o|k
instead of word|ok.
It is also not possible to do join on a list of lists, only on a list of strings.
data = pd.read_csv('data.csv')
intersect1 = ""
for j in range(len(data)):
    x = str(data.iloc[j, 1])
    x = x.split("|")
    x = x[:-1]
    y = str(data.iloc[j, 2])
    y = y.split("|")
    y = y[:-1]
    intersect = list(set(x) & set(y))
    # intersect1.append(intersect)
    intersect1 += str(intersect)
print(intersect1)
print("|".join(intersect1))
# intersect1.join("|")
df3 = pd.DataFrame(list(intersect1))
df3.to_csv('intersect1.csv')
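A minimal sketch of a fix, assuming the goal is one pipe-separated string of common words per row: keep each row's intersection as a list of words and join the words, not the characters of a stringified list. The sample rows below are made up for illustration, standing in for data.iloc[j, 1] and data.iloc[j, 2]:

```python
# hypothetical sample rows standing in for the two CSV columns
rows = [
    ("word|ok|", "ok|word|extra|"),
    ("a|b|c|", "b|c|d|"),
]

joined_rows = []
for col1, col2 in rows:
    x = col1.split("|")[:-1]  # drop the trailing empty piece after the last |
    y = col2.split("|")[:-1]
    intersect = sorted(set(x) & set(y))
    # join the words of one row; joining a string would join its characters
    joined_rows.append("|".join(intersect))

print(joined_rows)  # ['ok|word', 'b|c']
```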
Thank you in advance
How can I join the multiple lists in a pandas column 'B' and get only the unique values:
A B
0 10 [x50, y-1, sss00]
1 20 [x20, MN100, x50, sss00]
2 ...
Expected output:
[x50, y-1, sss00, x20, MN100]
You can do this simply with the sum() method, which concatenates the lists, and set() to drop duplicates:

result = list(set(df['B'].sum()))

Now if you print result you will get your desired output:

['y-1', 'x20', 'sss00', 'x50', 'MN100']
If the input data are strings rather than lists, first create the lists:
df.B = df.B.str.strip('[]').str.split(',')
Or:
import ast
df.B = df.B.apply(ast.literal_eval)
Use Series.explode to get one Series from the lists, then Series.unique to remove duplicates if order is important:

L = df.B.explode().unique().tolist()
# alternative
# L = df.B.explode().drop_duplicates().tolist()
print(L)
['x50', 'y-1', 'sss00', 'x20', 'MN100']
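A self-contained version of the explode approach, constructing the sample frame from the question inline:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20],
    'B': [['x50', 'y-1', 'sss00'], ['x20', 'MN100', 'x50', 'sss00']],
})

# explode turns each list element into its own row; unique keeps first-seen order
L = df.B.explode().unique().tolist()
print(L)  # ['x50', 'y-1', 'sss00', 'x20', 'MN100']
```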
Another idea, if order is not important, is a set comprehension that flattens the lists:

L = list({y for x in df.B for y in x})
print(L)
['x50', 'MN100', 'x20', 'sss00', 'y-1']
I have a function:
def func(x):
    y = pd.read_csv(x)
    return y
In case I have to loop this function for several inputs, I expect to get a combined dataframe from all those inputs as output:

list = ["a.csv", "b.csv", "d.csv", "e.csv"]
for i in list:
    m = func(x)

How can I get the value of m as the combined dataframes from all the input files?
To combine DataFrames, use concat with a list comprehension.

Notice: don't use list as a variable name, because it shadows a Python built-in. Also, the loop variable is i, so pass func(i), not func(x):

L = ["a.csv", "b.csv", "d.csv", "e.csv"]
df1 = pd.concat([func(i) for i in L])
Or in a loop:

out = []
for i in L:
    m = func(i)
    out.append(m)

df1 = pd.concat(out)
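A runnable sketch of the same pattern, substituting in-memory io.StringIO buffers for the real CSV files so the example is self-contained:

```python
import io
import pandas as pd

def func(x):
    y = pd.read_csv(x)
    return y

# stand-ins for "a.csv", "b.csv", ... as in-memory CSV buffers
L = [io.StringIO("a,b\n1,2\n"), io.StringIO("a,b\n3,4\n")]

# one DataFrame per input, concatenated into a single combined frame
df1 = pd.concat([func(f) for f in L], ignore_index=True)
print(df1)
```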
I have this List of lists containing string values:
List = [['138.314038', '-35.451642'],
['138.313946', '-35.45212'],
['138.313395', '-35.45291'],
['138.312425', '-35.453978'],
['138.311697', '-35.454879'],
['138.311042', '-35.45569'],
['138.310407', '-35.45647'],
['138.315603', '-35.44981'],
['138.315178', '-35.450241'],
['138.314603', '-35.450948'],
['138.314038', '-35.45164']]
I am trying to transform each string value in the list of lists into a float value.
I was trying:
results = [float(i) for i in List]
But this only iterates over the inner lists, not the values inside them. How can I do it using a similar approach, keeping the same structure as the variable List?
You have a list of lists, so use a nested comprehension:
results = [[float(i) for i in e] for e in List]
You can use numpy to convert it:

import numpy as np

np.array(List).astype(float).tolist()

Output:
[[138.314038, -35.451642],
[138.313946, -35.45212],
[138.313395, -35.45291],
[138.312425, -35.453978],
[138.311697, -35.454879],
[138.311042, -35.45569],
[138.310407, -35.45647],
[138.315603, -35.44981],
[138.315178, -35.450241],
[138.314603, -35.450948],
[138.314038, -35.45164]]
Maybe ugly, using two nested maps:
print(list(map(lambda x: list(map(float,x)), List)))
Output:
[[138.314038, -35.451642], [138.313946, -35.45212], [138.313395, -35.45291], [138.312425, -35.453978], [138.311697, -35.454879], [138.311042, -35.45569], [138.310407, -35.45647], [138.315603, -35.44981], [138.315178, -35.450241], [138.314603, -35.450948], [138.314038, -35.45164]]
Print it better with pprint:

import pprint
pprint.pprint(list(map(lambda x: list(map(float, x)), List)))
Output:
[[138.314038, -35.451642],
[138.313946, -35.45212],
[138.313395, -35.45291],
[138.312425, -35.453978],
[138.311697, -35.454879],
[138.311042, -35.45569],
[138.310407, -35.45647],
[138.315603, -35.44981],
[138.315178, -35.450241],
[138.314603, -35.450948],
[138.314038, -35.45164]]
# you can use the map function as well
results = [list(map(float, x)) for x in List]
You can expand the inner lists, like this:
results = [list(map(float, l)) for l in List]
You could use map to achieve it.
floats = [ list(map(float, i)) for i in List ]