How to extend a list inside a pandas dataframe - python

I have a pandas data frame, and each element of one of its columns is a list.
I also have a list with as many elements as the data frame has rows, and I want to extend each list inside the data frame with the matching element of this new list.
So, for example, if this is the data frame.
my_column
[1, 2]
[3, 4]
df = pd.DataFrame({'my_column':[[1, 2], [3, 4]]})
and this is the external list
external_list = [[5, 6], [7, 8, 9]]
I want to extend each of the lists of the data frame, so the final result is:
my_column
[1, 2, 5, 6]
[3, 4, 7, 8, 9]
For now, what I have is:
for index in df.index:
    # list.extend mutates the stored list in place and returns None,
    # so its result must not be assigned back to the column
    df.at[index, "my_column"].extend(external_list[index])
Is there a more pythonic way?

import pandas as pd
df = pd.DataFrame({'my_column':[[1, 2], [3, 4]]})
lst = [[5, 6], [7, 8, 9]]
One way:
df['my_column'] += pd.Series(lst)
Another way: You can zip the column values with list values and use list comprehension:
df['my_column'] = [l1 + l2 for l1, l2 in zip(df['my_column'].tolist(), lst)]
Output:
my_column
0 [1, 2, 5, 6]
1 [3, 4, 7, 8, 9]

I am not sure whether or not it's pythonic enough but you can do it this way:
import pandas as pd
data = {"my_column":[[1, 2], [3, 4]]}
df = pd.DataFrame(data)
list2 = [[5, 6], [7, 8, 9]]
# Series.items() replaces iteritems(), which was removed in pandas 2.0
df["my_column"] = [list1 + list2[i] for i, list1 in df["my_column"].items()]
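One detail worth keeping in mind with any of these approaches: list.extend mutates the list in place and returns None, whereas + builds a new list. A minimal sketch, with plain lists standing in for the column values (the names rows and extra are made up for illustration):

```python
# Plain lists standing in for the values of df['my_column'] (illustrative names).
rows = [[1, 2], [3, 4]]
extra = [[5, 6], [7, 8, 9]]

# list.extend changes the list in place and returns None,
# so its result should not be assigned back.
result_of_extend = rows[0].extend(extra[0])
print(result_of_extend)  # None
print(rows[0])           # [1, 2, 5, 6]

# The + operator leaves both operands untouched and returns a new list.
combined = rows[1] + extra[1]
print(combined)          # [3, 4, 7, 8, 9]
print(rows[1])           # [3, 4]
```

This is why the list-comprehension answers assign the result of + back to the column instead of calling extend.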

Related

pandas, access a series of lists as a set and take the set difference of 2 set series

Given 2 pandas series, both consisting of lists (i.e. each row in the series is a list), I want to take the set difference of 2 columns
For example, in the dataframe...
pd.DataFrame({
'A': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
'B': [[1, 2], [5, 6], [7, 8, 9]]
})
I want to create a new column C, that is set(A) - set(B)...
pd.DataFrame({
'C': [[3], [4], []]
})
Thanks to: https://www.geeksforgeeks.org/python-difference-two-lists/
def Diff(li1, li2):
    # Elements of li1 that are not in li2 (one-sided difference, set(A) - set(B))
    return list(set(li1) - set(li2))
df['C'] = df.apply(lambda x: Diff(x['A'], x['B']), axis=1)
Output
           A          B    C
0  [1, 2, 3]     [1, 2]  [3]
1  [4, 5, 6]     [5, 6]  [4]
2  [7, 8, 9]  [7, 8, 9]   []
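Because a set has no defined order, list(set(...)) may not return the elements in their original order. If order matters, something like the following sketch may help. Here ordered_difference is a hypothetical helper name, and plain lists stand in for the two columns:

```python
def ordered_difference(left, right):
    """Return the items of `left` that are not in `right`, keeping left's order."""
    exclude = set(right)  # set membership tests are O(1) on average
    return [item for item in left if item not in exclude]

# Plain lists standing in for columns A and B from the question.
col_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
col_b = [[1, 2], [5, 6], [7, 8, 9]]

col_c = [ordered_difference(a, b) for a, b in zip(col_a, col_b)]
print(col_c)  # [[3], [4], []]
```

With a dataframe, the same helper could be passed to df.apply(..., axis=1) in the way the answer above does.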

Split the first and second part of list of list into two sublists

I want to separate the 0th elements and the 1st elements of the sublists into two separate lists. For example, I have a list
a = [[1, 2], [3,4], [5,6], [7,8]]
I want two lists like:
a0 = [1,3,5,7]
a2 = [2,4,6,8]
Can anyone help me with this please?
You can use zip
a = [[1, 2], [3, 4], [5, 6], [7, 8]]
a = [[*x] for x in zip(*a)]
print(a[0], a[1]) # [1, 3, 5, 7] [2, 4, 6, 8]
You can do this -
a = [[1, 2],[3,4],[5,6],[7,8]]
a0 = []
a2 = []
for i in a:
    a0.append(i[0])
    a2.append(i[1])
print(a0)
print(a2)
Using a list comprehension:
a = [[1, 2],[3,4],[5,6],[7,8]]
a0 = [x[0] for x in a]
a2 = [x[1] for x in a]
print(a0) # [1, 3, 5, 7]
print(a2) # [2, 4, 6, 8]
a = [[1, 2],[3,4],[5,6],[7,8]]
a0 = []
a1 = []
for x in a:
    a0.append(x[0])
    a1.append(x[1])
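Use chain to flatten the list, compute the middle index, and split the flattened list at that index. Note that this returns the first and second halves of the flattened list rather than the per-position lists a0 and a2: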
In [1]: from itertools import chain
In [2]: a = [[1, 2],[3,4],[5,6],[7,8]]
In [3]: flat = list(chain.from_iterable(a))
In [4]: flat
Out[4]: [1, 2, 3, 4, 5, 6, 7, 8]
In [5]: middle = len(flat) // 2
In [6]: first_half = flat[:middle]
In [7]: second_half = flat[middle:]
In [10]: first_half
Out[10]: [1, 2, 3, 4]
In [11]: second_half
Out[11]: [5, 6, 7, 8]
You can try the code below:
import numpy as np
a = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
np.array(a).T.tolist()
which gives
[[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
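The zip-based transpose can also be unpacked straight into two names. One caveat is that zip stops at the shortest sublist. The sketch below shows both points; the ragged input is an invented example, not something from the question:

```python
from itertools import zip_longest

a = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Transpose and unpack straight into two names.
a0, a1 = map(list, zip(*a))
print(a0, a1)  # [1, 3, 5, 7] [2, 4, 6, 8]

# zip stops at the shortest sublist, so a ragged input silently loses items.
ragged = [[1, 2], [3], [5, 6]]
print([list(col) for col in zip(*ragged)])  # [[1, 3, 5]]

# zip_longest pads the missing positions instead (None is the filler chosen here).
padded = [list(col) for col in zip_longest(*ragged, fillvalue=None)]
print(padded)  # [[1, 3, 5], [2, None, 6]]
```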

Join two lists into one row-wise

I have two lists and need to join them by rows. The output looks like list3 (below). Actually there aren't any commas between the bracketed pairs. I've tried a few different things and can't figure this out.
list1 = [1, 2, 3]
list2 = [4, 5, 6]
my desired output is: list3 = [[1 4], [2 5], [3 6]]
Use zip
[[i, j] for i, j in zip(list1, list2)]
[[1, 4], [2, 5], [3, 6]]
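I am not sure whether you want the final result to be a list of lists, but I will assume you do:
import numpy as np
list1 = [1, 2, 3]
list2 = [4, 5, 6]
# Stack the lists as rows, then read off each column as a plain Python list
res = [np.asarray([list1, list2])[:, i].tolist() for i in range(len(list1))]
print(res)  # [[1, 4], [2, 5], [3, 6]]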
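Since zip silently drops the extra items when the two lists differ in length, it may be worth checking the lengths first. A minimal sketch, where join_rowwise is a made-up helper name:

```python
def join_rowwise(first, second):
    """Pair up items by position; raise if the lists differ in length."""
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    return [[x, y] for x, y in zip(first, second)]

list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = join_rowwise(list1, list2)
print(list3)  # [[1, 4], [2, 5], [3, 6]]
```

If the comma-free output in the question comes from printing a NumPy array, np.column_stack((list1, list2)) would probably be the closer match, since it returns a 2-D array with the same rows.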

Subtract previous list from current list in a list of lists loop

I have a list of dataframes in which each dataframe repeats the rows of the previous one, and I need to subtract each one from the next:
the_list[0] = [1, 2, 3]
the_list[1] = [1, 2, 3, 4, 5, 6, 7]
There are also df headers. Dataframes are only different in number of rows.
Wanted solution:
the_list[0] = [1, 2, 3]
the_list[1] = [4, 5, 6, 7]
Due to the fact that my list of lists, the_list, contains several dataframes, I have to work backward and go from the last df to first with first remaining intact.
My current code (estwin is the_list):
estwin = [df1, df2, df3, df4]
output=([])
estwin.reverse()
for i in range(len(estwin) -1):
    difference = Diff(estwin[i], estwin[i+1])
    output.append(difference)
return(output)
def Diff(li_bigger, li_smaller):
    c = [x for x in li_bigger if x not in li_smaller]
    return (c)
Currently, the result is an empty list. I need an updated the_list that contains only the differences (no duplicate values between lists).
You should not need to go backward for this problem, it is easier to keep track of what you have already seen going forward.
Keep a set that gets updated with new items as you traverse through each list, and use it to filter out the items that should be present in the output.
list1 = [1,2,3]
list2 = [1,2,3,4,5,6,7]
estwin = [list1, list2]
lookup = set() #to check which items/numbers have already been seen.
output = []
for lst in estwin:
    updated_lst = [i for i in lst if i not in lookup] #only new items present
    lookup.update(updated_lst)
    output.append(updated_lst)
print(output) #[[1, 2, 3], [4, 5, 6, 7]]
Your code is not runnable, but if I guess what you meant to write, it works, except that you have one bug in your algorithm:
the_list = [
[1, 2, 3],
[1, 2, 3, 4, 5, 6, 7],
[1, 2, 3, 4, 5, 6, 7, 8, 9]
]
def process(lists):
    output = []
    lists.reverse()
    for i in range(len(lists)-1):
        difference = diff(lists[i], lists[i+1])
        output.append(difference)
    # BUGFIX: Always add first list (now last because of reverse)
    output.append(lists[-1])
    output.reverse()
    return output
def diff(li_bigger, li_smaller):
    return [x for x in li_bigger if x not in li_smaller]
print(the_list)
print(process(the_list))
Output:
[[1, 2, 3], [1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, 8, 9]]
[[1, 2, 3], [4, 5, 6, 7], [8, 9]]
One-liner:
from itertools import chain
l = [[1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]
new_l = [sorted(list(set(v).difference(chain.from_iterable(l[:num]))))
         for num, v in enumerate(l)]
print(new_l)
# [[1, 2], [3], [4], [5]]
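The compare-with-the-previous-list idea can also be written without reversing or mutating the input, by pairing each list with its predecessor. A minimal sketch, assuming the items are hashable; subtract_previous is a made-up name:

```python
def subtract_previous(lists):
    """Keep the first list; for every later list drop items found in its predecessor."""
    if not lists:
        return []
    result = [list(lists[0])]  # copy so the caller's first list is not shared
    for previous, current in zip(lists, lists[1:]):
        seen = set(previous)   # assumes hashable items
        result.append([x for x in current if x not in seen])
    return result

the_list = [[1, 2, 3], [1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, 8, 9]]
print(subtract_previous(the_list))  # [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(the_list[0])                  # [1, 2, 3] (input left unchanged)
```

Unlike the running lookup set, this only removes what the immediately preceding list contained, which gives the same result when every list repeats all earlier data.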

Shuffle a dictionary of lists aggregating by rows

I have a defaultdict(list) that might look like this
d = {0: [2, 4, 5], 1: [5, 6, 1]}
I need to shuffle all the first elements from all of the lists together, and then move on to the second and third rows. So in this example I need to take [2, 5], [4, 6], [5, 1], shuffle each of them, and then put them back. At the end my dictionary might look like this
d = {0: [5, 4, 1], 1: [2, 6, 5]}
Is there a Pythonic way of doing this that avoids loops?
What I have until now is a way to extract and aggregate all the first, second, etc., elements of the lists and shuffle them using this
[random.sample([tmp_list[tmp_index] for tmp_list in d.values()], 2) for tmp_index in range(3)]
that will create the following
[[2, 5], [4, 6], [5, 1]]
and then in order to create my final shuffled-by-rows dictionary I use simple for loops.
Get a transposed version of the dict values:
>>> data = [list(v) for v in zip(*d.values())]
>>> data
[[2, 5], [4, 6], [5, 1]]
Shuffle them in-place
>>> for x in data:
...     random.shuffle(x)
...
>>> data
[[5, 2], [4, 6], [5, 1]]
Transpose the data again
>>> data = zip(*data)
Assign the new values to the dict
>>> for x, k in zip(data, d):
...     d[k][:] = x # Could also be written as d[k] = list(x)
...
>>> d
{0: [5, 4, 5], 1: [2, 6, 1]}
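The steps above could be wrapped in a single function along these lines. This is only a sketch: shuffle_by_position and its rng parameter are invented for illustration, and the seed is there only to make runs repeatable.

```python
import random

def shuffle_by_position(d, rng=random):
    """Shuffle, for each position, the values found at that position across all lists."""
    columns = [list(col) for col in zip(*d.values())]  # transpose
    for col in columns:
        rng.shuffle(col)                               # shuffle each group in place
    for key, new_values in zip(d, zip(*columns)):      # transpose back
        d[key] = list(new_values)
    return d

d = {0: [2, 4, 5], 1: [5, 6, 1]}
shuffle_by_position(d, rng=random.Random(0))  # seeded only to make runs repeatable
print(d)
```

Whatever the shuffle produces, the values at each position stay within that position: the first elements are still 2 and 5 in some order, the second 4 and 6, and the third 5 and 1.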
