I have two arrays x and y:
x = [2, 3, 1, 1, 2, 5, 7, 3, 6]
y = [0, 0, 4, 2, 4, 5, 8, 4, 5, 6, 7, 0, 5, 3, 2, 8, 1, 3, 1, 0, 4, 2, 4, 5, 4, 4, 5, 6, 7, 0]
I want to create a list "z" that stores consecutive groups (chunks) of numbers from y, where the size of each group is given by the corresponding value in x.
So z should store the numbers as
z = [[0,0],[4,2,4],[5],[8],[4,5],[6,7,0,5,3],[2,8,1,3,1,0,4],[2,4,5],[4,4,5,6,7,0]]
I tried this loop:
h=[]
for j in x:
    h=[[a] for i in range(j) for a in y[i:i+1]]
But h only ends up holding the result for the last value of x.
Also I am not sure whether the title of this question is appropriate for this problem. Anyone can edit if it is confusing. Thank you so much.
You're reassigning h each time through the loop, so it ends up with just the last iteration's assignment.
You should append to it, not assign it.
h = []
start = 0
for j in x:
    # take the next j items of y as one chunk
    h.append(y[start:start+j])
    start += j
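The append-based fix can also be packaged as a small reusable function. This is only a sketch: the name split_by_sizes is made up for illustration, and it assumes the sizes add up to no more than len(values).

```python
from itertools import accumulate

def split_by_sizes(values, sizes):
    """Split `values` into consecutive chunks whose lengths are given by `sizes`."""
    ends = list(accumulate(sizes))   # running end offsets: 2, 5, 6, ...
    starts = [0] + ends[:-1]         # each chunk starts where the previous one ended
    return [values[s:e] for s, e in zip(starts, ends)]

x = [2, 3, 1, 1, 2, 5, 7, 3, 6]
y = [0, 0, 4, 2, 4, 5, 8, 4, 5, 6, 7, 0, 5, 3, 2, 8, 1, 3, 1, 0,
     4, 2, 4, 5, 4, 4, 5, 6, 7, 0]

z = split_by_sizes(y, x)
print(z)
```

Precomputing the offsets keeps the slicing in a single comprehension, so there is no list to reassign by accident.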
Another way is to create an iterator over y and consume it as you go:
x = [2, 3, 1, 1, 2, 5, 7, 3, 6]
y = [0, 0, 4, 2, 4, 5, 8, 4, 5, 6, 7, 0, 5, 3, 2, 8, 1, 3, 1, 0, 4, 2, 4, 5, 4, 4, 5, 6, 7, 0]
yi = iter(y)
res = [[next(yi) for _ in range(i)] for i in x]
print(res) # -> [[0, 0], [4, 2, 4], [5], [8], [4, 5], [6, 7, 0, 5, 3], [2, 8, 1, 3, 1, 0, 4], [2, 4, 5], [4, 4, 5, 6, 7, 0]]
Aside from the problem you are facing, and as a general rule to live by, try to give your variables more meaningful names.
I have a simple dataframe df with a column of lists named lists. I would like to generate an additional column based on lists.
The df looks like:
import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
df
lists
1 [1]
2 [1, 2, 3]
3 [2, 9, 7, 9]
4 [2, 7, 3, 5]
I would like df to look like this:
df
Out[9]:
lists rolllists
1 [1] [1]
2 [1, 2, 3] [1, 1, 2, 3]
3 [2, 9, 7, 9] [1, 2, 3, 2, 9, 7, 9]
4 [2, 7, 3, 5] [2, 9, 7, 9, 2, 7, 3, 5]
Basically, I want to concatenate the lists over a rolling window of 2 rows. Note that in row 1, because there is only one list, rolllists is just that list. In row 2 there are two lists to concatenate (rows 1 and 2). For row 3, concatenate the lists from rows 2 and 3, and so on. I have worked on similar things before; for reference, see: Pandas Dataframe, Column of lists, Create column of sets of cumulative lists, and record by record differences.
In addition, once this part works, I want to do it within a groupby. The example above would be one group, so the df might look like this:
Group lists rolllists
1 A [1] [1]
2 A [1, 2, 3] [1, 1, 2, 3]
3 A [2, 9, 7, 9] [1, 2, 3, 2, 9, 7, 9]
4 A [2, 7, 3, 5] [2, 9, 7, 9, 2, 7, 3, 5]
5 B [1] [1]
6 B [1, 2, 3] [1, 1, 2, 3]
7 B [2, 9, 7, 9] [1, 2, 3, 2, 9, 7, 9]
8 B [2, 7, 3, 5] [2, 9, 7, 9, 2, 7, 3, 5]
I have tried various things like df.lists.rolling(2).sum() and I get this error:
TypeError: cannot handle this type -> object
in Pandas 0.24.1. Unfortunately, in Pandas 0.22.0 the command doesn't raise an error but instead returns exactly the same values as in lists. So it looks like newer versions of Pandas can't sum lists? That's a secondary issue.
Love any help! Have Fun!
You can start with
import pandas as pd
mylists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
mydf=pd.DataFrame.from_dict(mylists,orient='index')
mydf=mydf.rename(columns={0:'lists'})
mydf = pd.concat([mydf, mydf], axis=0, ignore_index=True)
mydf['group'] = ['A']*4 + ['B']*4
# initialize your new series
mydf['newseries'] = mydf['lists']
# define the function that appends lists over rows
def append_row_lists(data):
    for i in data.index:
        try:
            data.loc[i+1, 'newseries'] = data.loc[i, 'lists'] + data.loc[i+1, 'lists']
        except:
            # the last row of each group has no following row
            pass
    return data
# loop over your groups
for gp in mydf.group.unique():
    condition = mydf.group == gp
    mydf[condition] = append_row_lists(mydf[condition])
Output
lists group newseries
0 [1] A [1]
1 [1, 2, 3] A [1, 1, 2, 3]
2 [2, 9, 7, 9] A [1, 2, 3, 2, 9, 7, 9]
3 [2, 7, 3, 5] A [2, 9, 7, 9, 2, 7, 3, 5]
4 [1] B [1]
5 [1, 2, 3] B [1, 1, 2, 3]
6 [2, 9, 7, 9] B [1, 2, 3, 2, 9, 7, 9]
7 [2, 7, 3, 5] B [2, 9, 7, 9, 2, 7, 3, 5]
How about this?
rolllists = [df.lists[1].copy()]
for row in df.iterrows():
    index, values = row
    if index > 1: # or > 0 if zero-indexed
        rolllists.append(df.loc[index - 1, 'lists'] + values['lists'])
df['rolllists'] = rolllists
Or as a slightly more extensible function:
import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
def rolling_lists(df, roll_period=2):
    new_roll, rolllists = [], [df.lists[1].copy()] * (roll_period - 1)
    for row in df.iterrows():
        index, values = row
        if index > roll_period - 1: # or -2 if zero-indexed
            res = []
            for i in range(index - roll_period, index):
                res.append(df.loc[i + 1, 'lists']) # or i if 0-indexed
            rolllists.append(res)
    for li in rolllists:
        while isinstance(li[0], list):
            li = [item for sublist in li for item in sublist] # flatten nested list
        new_roll.append(li)
    df['rolllists'] = new_roll
    return df
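This should extend to groupby as well: apply the function to each group, for example with df.groupby(...).apply(rolling_lists). Note that, as written, it assumes the index labels start at 1 within each group. You can pass any number of rolling rows as roll_period. Hope this helps!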
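For the grouped case, here is a minimal sketch that does not depend on the index labels at all. The helper name rolling_concat and the Group column name are assumptions for illustration; the flattening is done in plain Python and only the per-group split relies on pandas.

```python
import pandas as pd

def rolling_concat(lists, window=2):
    """Concatenate each list with up to `window - 1` lists before it."""
    lists = list(lists)
    out = []
    for i in range(len(lists)):
        # rows inside the window; shorter at the start of a group
        chunk = lists[max(0, i - window + 1): i + 1]
        out.append([item for sub in chunk for item in sub])
    return out

df = pd.DataFrame({
    "Group": ["A"] * 4 + ["B"] * 4,
    "lists": [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]] * 2,
})

# Build the new column one group at a time, keeping the original index
# so the pieces line up again when they are assigned back.
parts = []
for _, grp in df.groupby("Group", sort=False):
    parts.append(pd.Series(rolling_concat(grp["lists"]), index=grp.index))
df["rolllists"] = pd.concat(parts)

print(df)
```

Because each group is processed separately, the window never reaches back into the previous group, and window can be raised to roll over more rows.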
I have a dataframe that contains a string of varying length in each cell i.e.
Num
(1,2,3,4,5)
(6,7,8)
(9)
(10,11,12)
I want to avoid attempting to perform str.split(',') on the cells that only have one number in them. However, I want all of the single numbers to be converted to a list of one element.
Here is what I have tried; it gives an error that says "'int' object is not callable":
if(df['Num'].size() > 1):
    df['Num'] = df['Num'].str.split(',')
update for clarification:
Index Num
0 2,6,7
1 1,3,6,7,8
2 2,4,7,8,9
3 3,5,8,9,10
4 4,9,10
5 1,2,7
6 1,2,3,6,8
7 2,3,4,7,9
8 3,4,5,8,10
9 4,5,9
10 2,3
11 1,3
12 1,2
13 2,3,4
14 1,3,4
15 1,2,4
16 1,2,3
17 2
18 1
I am trying to take this dataframe and convert each Num row from a string of numbers to a list. I want all of the indices that contain only one number (17 and 18) to be converted to a list containing a single element (itself).
This code below only works if every string is more than one number separated by a ','.
df['Num'] = df['Num'].str.split(',')
This is the output dataframe I get when I run the code above. Notice that the elements that had only one number are now NaN.
Index Num
0 [2, 6, 7]
1 [1, 3, 6, 7, 8]
2 [2, 4, 7, 8, 9]
3 [3, 5, 8, 9, 10]
4 [4, 9, 10]
5 [1, 2, 7]
6 [1, 2, 3, 6, 8]
7 [2, 3, 4, 7, 9]
8 [3, 4, 5, 8, 10]
9 [4, 5, 9]
10 [2, 3]
11 [1, 3]
12 [1, 2]
13 [2, 3, 4]
14 [1, 3, 4]
15 [1, 2, 4]
16 [1, 2, 3]
17 NaN
18 NaN
Assuming the values in your column are all strings and you just want the individual numbers as a list of str, this should do the trick:
df['Num'].str.strip('()').str.split(',')
# 0 [1, 2, 3, 4, 5]
# 1 [6, 7, 8]
# 2 [9]
# 3 [10, 11, 12]
# Name: Num, dtype: object
Since not all your data are str type, you'll need to coerce them into str first to ensure the string methods are called properly:
df['Num'].astype(str).str.split(',')
# 0 [2, 6, 7]
# 1 [1, 3, 6, 7, 8]
# 2 [2, 4, 7, 8, 9]
# ...
# 16 [1, 2, 3]
# 17 [2]
# 18 [1]
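As a self-contained check of the astype(str) approach, here is a small sketch with a made-up mixed column (some cells are strings, some are plain ints, which is what produces the NaN values with .str.split alone). The extra conversion to int at the end is optional and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample: multi-number cells are strings, single numbers are ints
df = pd.DataFrame({"Num": ["2,6,7", "1,3", 2, 1]})

# Coerce everything to str first so .str.split applies to every row
as_str = df["Num"].astype(str).str.split(",")
print(as_str.tolist())
# [['2', '6', '7'], ['1', '3'], ['2'], ['1']]

# Optional: turn the string pieces into integers
as_int = as_str.apply(lambda parts: [int(p) for p in parts])
print(as_int.tolist())
# [[2, 6, 7], [1, 3], [2], [1]]
```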
I am trying to perform a simple groupby operation on a Pandas dataframe with list columns (with the goal of concatenating the lists corresponding to each group). It works fine when grouping on a single column, but for reasons I can't explain fails when grouping on two columns. A simplified example:
x = pd.DataFrame({'a':[1,1,2,2],'b':['a','a','a','b'],'c':[[1,2],[3,4],[5,6],[7,8]]})
a b c
0 1 a [1, 2]
1 1 a [3, 4]
2 2 a [5, 6]
3 2 b [7, 8]
Now, grouping on either a or b works as expected:
x.groupby('b')['c'].sum()
b
a [1, 2, 3, 4, 5, 6]
b [7, 8]
dtype: object
x.groupby('a')['c'].sum()
a
1 [1, 2, 3, 4]
2 [5, 6, 7, 8]
dtype: object
But if I try to group on a AND b (i.e. x.groupby(['a','b'])['c'].sum()), it invariably fails with ValueError: Function does not reduce.
On the surface I can't see why this should happen, as either way we're just concatenating lists, but I imagine it has something to do with Pandas internals...
Any workarounds or explanations?
I think it may be a bug where sum fails when some groups have nothing to sum: with the double grouping, the last two rows, for example, each remain in their own group. The workaround is apply:
import pandas as pd
x = pd.DataFrame({'a':[1,1,2,2],'b':['a','a','a','b'],'c':[[1,2],[3,4],[5,6],[7,8]]})
print(x)
a b c
0 1 a [1, 2]
1 1 a [3, 4]
2 2 a [5, 6]
3 2 b [7, 8]
print(x.groupby('a')['c'].apply(sum))
a
1 [1, 2, 3, 4]
2 [5, 6, 7, 8]
Name: c, dtype: object
print(x.groupby('a')['c'].sum())
a
1 [1, 2, 3, 4]
2 [5, 6, 7, 8]
dtype: object
print(x.groupby(['a','b'])['c'].apply(sum))
a b
1 a [1, 2, 3, 4]
2 a [5, 6]
b [7, 8]
Name: c, dtype: object
I think you should submit this to the pandas team as well.
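If you would rather not depend on how sum treats lists, one alternative sketch is to flatten each group explicitly. This swaps in itertools.chain for the summing step; it is not the method from the answer above, just another way to get the same concatenation.

```python
from itertools import chain

import pandas as pd

x = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": ["a", "a", "a", "b"],
    "c": [[1, 2], [3, 4], [5, 6], [7, 8]],
})

# For each (a, b) group, chain the lists together into one flat list
merged = x.groupby(["a", "b"])["c"].apply(
    lambda lists: list(chain.from_iterable(lists))
)
print(merged)
```

Flattening with chain.from_iterable also avoids building intermediate lists for every addition, which can matter when groups contain many rows.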
Let's say I have a numpy array with the following shape :
nonSortedNonFiltered=np.array([[9,8,5,4,6,7,1,2,3],[1,3,2,6,4,5,7,9,8]])
I want to :
- Sort the array according to nonSortedNonFiltered[1]
- Filter the array according to nonSortedNonFiltered[0] and an array of values
I currently do the sorting with :
sortedNonFiltered=nonSortedNonFiltered[:,nonSortedNonFiltered[1].argsort()]
Which gives : np.array([[9,5,8,6,7,4,1,3,2],[1,2,3,4,5,6,7,8,9]])
Now I want to filter sortedNonFiltered from an array of values, for example :
sortedNonFiltered=np.array([[9,5,8,6,7,4,1,3,2],[1,2,3,4,5,6,7,8,9]])
listOfValues=np.array([8,6,5,2,1])
...Something here...
> np.array([[5,8,6,1,2],[2,3,4,7,9]]) #What I want to get in the end
Note : Each value in a column of my 2D array is exclusive.
You can use np.in1d to get a boolean mask and use it to filter columns in the sorted array, something like this -
output = sortedNonFiltered[:,np.in1d(sortedNonFiltered[0],listOfValues)]
Sample run -
In [76]: nonSortedNonFiltered
Out[76]:
array([[9, 8, 5, 4, 6, 7, 1, 2, 3],
[1, 3, 2, 6, 4, 5, 7, 9, 8]])
In [77]: sortedNonFiltered
Out[77]:
array([[9, 5, 8, 6, 7, 4, 1, 3, 2],
[1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [78]: listOfValues
Out[78]: array([8, 6, 5, 2, 1])
In [79]: sortedNonFiltered[:,np.in1d(sortedNonFiltered[0],listOfValues)]
Out[79]:
array([[5, 8, 6, 1, 2],
[2, 3, 4, 7, 9]])
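Putting the sort and the filter together, here is a runnable sketch using the question's data. It uses np.isin, which NumPy's documentation recommends over np.in1d for new code; for 1-D input like this the two produce the same mask.

```python
import numpy as np

nonSortedNonFiltered = np.array([[9, 8, 5, 4, 6, 7, 1, 2, 3],
                                 [1, 3, 2, 6, 4, 5, 7, 9, 8]])
listOfValues = np.array([8, 6, 5, 2, 1])

# Step 1: reorder the columns so that row 1 is ascending
sortedNonFiltered = nonSortedNonFiltered[:, nonSortedNonFiltered[1].argsort()]

# Step 2: keep only the columns whose row-0 value appears in listOfValues
mask = np.isin(sortedNonFiltered[0], listOfValues)
output = sortedNonFiltered[:, mask]

print(output)
# [[5 8 6 1 2]
#  [2 3 4 7 9]]
```

The boolean mask preserves the sorted column order, so the order of listOfValues itself does not affect the result.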