Dictionary where list is value as dataframe - python

This may be an incorrect way to use DataFrames, but I have a dictionary whose values are lists of items, such as:
my_dict = {'a':[1,2,3], 'b':[3,4,5]}
I want to create a data frame where the indices are the keys and there is one column, where the value is the list. This is the output I'd like to see:
In [69]: my_df
Out[69]:
0
a [1, 2, 3]
b [3, 4, 5]
This is the closest I've gotten, by changing the dictionary value to a list of lists and using a transpose. What is the better way?
In [64]: my_dict = {'a':[[1,2,3]], 'b':[[3,4,5,6]]}
In [65]: my_df = pd.DataFrame(my_dict)
In [66]: print(my_df)
a b
0 [1, 2, 3] [3, 4, 5, 6]
In [67]: my_df.T
Out[67]:
0
a [1, 2, 3]
b [3, 4, 5, 6]
Thanks for the help!

import pandas as pd
my_dict = {'a':[1,2,3], 'b':[3,4,5]}
pd.DataFrame([[i] for i in my_dict.values()],index=my_dict)
Out[3]:
0
a [1, 2, 3]
b [3, 4, 5]
But what you have is really more of a Series than a DataFrame:
pd.Series(my_dict)
Out[4]:
a [1, 2, 3]
b [3, 4, 5]
and if you need to, you can convert it to a DataFrame:
pd.DataFrame(pd.Series(my_dict))
Out[5]:
0
a [1, 2, 3]
b [3, 4, 5]
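If the single column should carry a name instead of the default 0, Series.to_frame accepts one. A minimal sketch, where the column name 'values' is only an illustrative choice and not something from the question:

```python
import pandas as pd

my_dict = {'a': [1, 2, 3], 'b': [3, 4, 5]}

# Each dict value becomes one cell; 'values' is a made-up column name.
my_df = pd.Series(my_dict).to_frame('values')
print(my_df)
```

Each cell still holds the original Python list, so my_df.loc['a', 'values'] gives back the list stored under 'a'.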

Related

Python Dataframe subtract a value from each list of a row

I have a data frame consisting of lists as elements. I want to subtract a value from each list and create a new column.
My code:
df = pd.DataFrame({'A':[[1,2],[4,5,6]]})
df
A
0 [1, 2]
1 [4, 5, 6]
# let's subtract 1 from each list
val = 1
df['A_new'] = df['A'].apply(lambda x:[a-b for a,b in zip(x[0],[val]*len(x[0]))],axis=1)
Present output:
IndexError: index 3 is out of bounds for axis 0 with size 2
Expected output:
df
A A_new
0 [1, 2] [0, 1]
1 [4, 5, 6] [3, 4, 5]
Convert each list to a NumPy array (this needs import numpy as np), then subtract:
df['A_new'] = df.A.map(np.array)-1
Out[455]:
0 [0, 1]
1 [3, 4, 5]
Name: A, dtype: object
df['A_new'] = df['A'].apply(lambda x:[a-b for a,b in zip(x,[val]*len(x))])
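You have to pass the list itself to zip and len. Here x is already the list, so x[0] is just its first number, which is wrong in this context. Also drop axis=1: Series.apply forwards extra keyword arguments to the function, and the lambda does not accept one. This gives the output: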
A A_new
0 [1, 2] [0, 1]
1 [4, 5, 6] [3, 4, 5]
How about a simple list comprehension:
df['new'] = [[i - 1 for i in l] for l in df['A']]
A new
0 [1, 2] [0, 1]
1 [4, 5, 6] [3, 4, 5]
You can convert the list to np.array and then subtract the val:
import numpy as np
df['A_new'] = df['A'].apply(lambda x: np.array(x) - val)
Output:
A A_new
0 [1, 2] [0, 1]
1 [4, 5, 6] [3, 4, 5]
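Note that the NumPy-based versions store numpy arrays in the new column, not lists, even though they print the same way. If plain lists are needed, a sketch along these lines (same df and val as in the question) converts each result back:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})
val = 1

# Subtract with NumPy, then turn each array back into a plain Python list.
df['A_new'] = df['A'].apply(lambda x: (np.array(x) - val).tolist())
print(df)
print(type(df.loc[0, 'A_new']))
```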

Sort list of string numbers that are in a column of a data frame

I have a data frame in which one column contains lists of numeric strings:
Col1
['1']
['1']
['1','3','4']
['2','3','1','4','5']
How can I sort these numbers? I have tried to adapt the answer given here.
I would like to have a sorted list of integers instead of strings.
Try this
df = pd.DataFrame({'Col1':[['1'],['1'],['1','3','4'],['2','3','1','4','5']]})
# use a list comprehension that maps each list's elements to int and sorts the result
df['sorted'] = [sorted(map(int, row)) for row in df['Col1']]
print(df)
Col1 sorted
0 [1] [1]
1 [1] [1]
2 [1, 3, 4] [1, 3, 4]
3 [2, 3, 1, 4, 5] [1, 2, 3, 4, 5]
Use:
In [599]: df['Col1'] = df.Col1.apply(lambda x: sorted(map(int, x)))
In [600]: df
Out[600]:
Col1
0 [1]
1 [1]
2 [1, 3, 4]
3 [1, 2, 3, 4, 5]
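The conversion to int matters for more than the output type: sorting the strings directly compares them character by character. A small sketch with made-up multi-digit data (not from the question) shows the difference:

```python
import pandas as pd

# Hypothetical data with multi-digit numbers to expose the difference.
df = pd.DataFrame({'Col1': [['10', '9', '2'], ['3', '1']]})

# Sorting the strings as-is is lexicographic: '10' sorts before '2'.
as_text = [sorted(row) for row in df['Col1']]

# Converting to int first gives numeric order.
as_ints = [sorted(map(int, row)) for row in df['Col1']]

print(as_text)
print(as_ints)
```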

Find new values in a column with a datatype of list per row

Dataframe:
type previous current
a [1,2,3] [1,2,3,5]
b [1,2,3] [1,2,3,9]
c [1,2,3] [1,2,3]
Hello there, I'm having a hard time figuring out how to get the value that appears in the current column but not in the previous column. Is what I'm trying to do possible?
Output:
type result
a 5
b 9
c 0
This is one approach using set
Ex:
df = pd.DataFrame({"previous": [[1,2,3], [1,2,3], [1,2,3]], "current": [[1,2,3,5], [1,2,3,9], [1,2,3]]})
df['result'] = df['current'].apply(set)- df['previous'].apply(set)
print(df)
Output:
previous current result
0 [1, 2, 3] [1, 2, 3, 5] {5}
1 [1, 2, 3] [1, 2, 3, 9] {9}
2 [1, 2, 3] [1, 2, 3] {}
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4]
x=set(x)
y=set(y)
z = x.difference(y)
print(z)
# output: {5, 6}
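Both answers return sets, while the question asks for a single value per row with 0 when nothing is new. One possible sketch, assuming at most one new value matters per row; the helper name first_new_value and its default argument are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["a", "b", "c"],
    "previous": [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
    "current": [[1, 2, 3, 5], [1, 2, 3, 9], [1, 2, 3]],
})

def first_new_value(previous, current, default=0):
    """Return the first item of current that is not in previous, else default."""
    seen = set(previous)
    return next((item for item in current if item not in seen), default)

df["result"] = [
    first_new_value(prev, cur)
    for prev, cur in zip(df["previous"], df["current"])
]
print(df[["type", "result"]])
```

If a row can gain several new values, keeping the set (or a sorted list of it) is probably the safer representation.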

How to get maximum and minimum of a list in column?

Given that, I have a dataframe as below:
import pandas as pd
import numpy as np
# named "data" rather than "dict" to avoid shadowing the built-in
data = {
"A": [[1,2,3,4],[3],[2,8,4],[5,8]]
}
dt = pd.DataFrame(data)
I want a column B that holds the minimum and maximum of each list in column A. My desired output is:
A B
0 [1, 2, 3, 4] [1,4]
1 [3] [3,3]
2 [2, 8, 4] [2,8]
3 [5, 8] [5,8]
What I already tried is the below code which does not work:
dt["B"] =[np.min(dt.A), np.max(dt.A)]
Like this:
In [1592]: dt['B'] = dt.A.apply(lambda x: [min(x), max(x)])
In [1593]: dt
Out[1593]:
A B
0 [1, 2, 3, 4] [1, 4]
1 [3] [3, 3]
2 [2, 8, 4] [2, 8]
3 [5, 8] [5, 8]
As suggested by @Ch3steR, you can use map instead, since it's faster:
dt['B'] = dt.A.map(lambda x: [min(x), max(x)])
If the requirement is to avoid explicit loops (apply is a loop under the hood), you can expand the lists into a DataFrame, compute the minimum and maximum of each row with DataFrame.agg, convert the result to lists and assign it back:
df = pd.DataFrame(dt.A.tolist())
dt['B'] = df.agg(['min','max'], axis=1).astype(int).values.tolist()
print (dt)
A B
0 [1, 2, 3, 4] [1, 4]
1 [3] [3, 3]
2 [2, 8, 4] [2, 8]
3 [5, 8] [5, 8]
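The astype(int) step in that approach is there because the lists have different lengths: the intermediate DataFrame pads the shorter rows with NaN, and the aggregated values come back as floats. A sketch that makes the intermediate steps visible (the names wide and stats are mine, not from the answer):

```python
import pandas as pd

dt = pd.DataFrame({"A": [[1, 2, 3, 4], [3], [2, 8, 4], [5, 8]]})

# One column per list position; shorter lists are padded with NaN.
wide = pd.DataFrame(dt["A"].tolist())
print(wide)

# Row-wise min and max skip the NaN padding by default.
stats = wide.agg(["min", "max"], axis=1)

# Cast back to int before turning each row into a [min, max] list.
dt["B"] = stats.astype(int).values.tolist()
print(dt)
```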
If loops are not a problem, another solution is a list comprehension. It should be faster than apply, though that depends on the real data:
dt['B'] = [[min(x), max(x)] for x in dt.A]
Just an alternative with explode:
dt['B'] = (dt['A'].explode().astype(int).groupby(level=0).agg(['min','max'])
.to_numpy().tolist())
print(dt)
A B
0 [1, 2, 3, 4] [1, 4]
1 [3] [3, 3]
2 [2, 8, 4] [2, 8]
3 [5, 8] [5, 8]
Use a list comprehension on the sorted values in dt.A:
dt['B'] = [[row[0], row[-1]] for row in dt.A.map(sorted)]

Python Pandas from dictionary

I have a dictionary
x={'XYZ': [4, 5, 6], 'ABC': [1, 2, 3]}
I want a pd.DataFrame like this:
'SomeColumnName'
'XYZ' [4,5,6]
'ABC' [1,2,3]
Whatever I do, it splits the lists in x.values() into 3 separate columns. I could do a '~'.join before creating the DataFrame. I'm just wondering whether there is an easier way.
Why don't you just input the data as:
x={'XYZ': [[4, 5, 6]], 'ABC': [[1, 2, 3]]}
Then you get:
In [7]: pd.DataFrame(x).transpose()
Out[7]:
0
ABC [1, 2, 3]
XYZ [4, 5, 6]
You can recode your dictionary using:
for key in x.keys():
    x[key] = [x[key]]
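If you would rather not modify x in place, the same wrapping can be done with a dict comprehension. A sketch under that assumption, which also renames the default column 0 to the name from the question:

```python
import pandas as pd

x = {'XYZ': [4, 5, 6], 'ABC': [1, 2, 3]}

# Wrap each list in an outer list so it lands in a single cell; x is left untouched.
wrapped = {key: [value] for key, value in x.items()}

df = pd.DataFrame(wrapped).T.rename(columns={0: 'SomeColumnName'})
print(df)
```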
OK, this is how I did it:
z = pd.DataFrame.from_records(list(x.items()),columns=['A','SomeColumnName'],index='A')
The problem was that I wasn't wrapping the data in list().
