index a list of lists in one loop [closed] - python

What logic lets me use a single index to iterate through a list of lists and a flat list in one loop, e.g. using indexes [0,0], [0,1], [1,0], [1,1] for a 2x2 list while iterating through 1 2 3 4? Here is my best attempt so far:
numbers_list = [[1,2],[3,4]]
letters_list = ['a', 'b', 'c', 'd']
for i in [1,2,3,4]:
    indx1, = i%2,
    indx2 = i % 2 + i-2
    print indx1, indx2
    print numbers_list[indx1][indx2], letters_list[i]
desired output is
0 a
1 b
2 c
3 d

Since I don't know the general structure of your lists, I'll use the lists you provided. In a single loop:
for i in range(4):
    # divmod(i, 2) returns (i // 2, i % 2), i.e. (row, column) for rows of length 2
    div, rem = divmod(i, 2)
    print(numbers_list[div][rem], letters_list[i])
So, we get :
IN : letters_list = ['a', 'b', 'c', 'd']
IN : numbers_list = [[1,2],[3,4]]
OUT : 1 a
2 b
3 c
4 d
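To make the divmod idea a bit more general, here is a minimal sketch that works for any rectangular list of lists. The helper name flat_to_2d and the 2x3 sample data are my own, not from the question, and it assumes every row has the same length.

```python
def flat_to_2d(i, n_cols):
    """Map a flat index i to (row, col) for a grid with n_cols columns."""
    return divmod(i, n_cols)

# Hypothetical sample data: a 2x3 grid and a flat list of the same total size.
numbers_list = [[1, 2, 3], [4, 5, 6]]
letters_list = ['a', 'b', 'c', 'd', 'e', 'f']
n_cols = len(numbers_list[0])  # assumes every row has this length

pairs = []
for i in range(len(letters_list)):
    row, col = flat_to_2d(i, n_cols)
    pairs.append((numbers_list[row][col], letters_list[i]))
print(pairs)
```

If the rows can have different lengths this arithmetic no longer applies, and a nested loop or a flattening approach is the safer choice.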

Just nest the loops.
lists = [[1,2],[3,4]]
for i in range(len(lists)):
    sublist = lists[i]
    for j in range(len(sublist)):
        element = sublist[j]
        print(element)
This will give:
1
2
3
4
Your method will work, but you have to know the length of the lists. Also all your lists have to be the same length, whereas with this method they can vary.
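If you still want a single loop while allowing sub-lists of different lengths, one option is to flatten the nested list lazily and pair it with the flat list. This is only a sketch: the ragged sample data is made up, and it assumes the flat list has as many items as the nested one (zip stops at the shorter of the two).

```python
from itertools import chain

# Hypothetical sample data with sub-lists of different lengths.
numbers_list = [[1, 2], [3, 4, 5]]
letters_list = ['a', 'b', 'c', 'd', 'e']

# chain.from_iterable walks the sub-lists one after another without copying them.
flat = chain.from_iterable(numbers_list)

# zip pairs the flattened numbers with the letters in a single pass.
paired = list(zip(flat, letters_list))
for number, letter in paired:
    print(number, letter)
```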

If you're not opposed to using a built-in function like enumerate() this may work:
numbers_list = [[1,2],[3,4]]
letters_list = ['a', 'b', 'c', 'd']
for i, n in enumerate([1,2,3,4]):
    print(i, letters_list[i])
output:
0 a
1 b
2 c
3 d

Use itertools.product.
For a 2x3 list:
from itertools import product
numbers_list = [[1, 2, 3], [3, 4, 5]]
# product was imported by name above, so call it directly
for i, j in product(range(2), range(3)):
    print("indices:")
    print(i)
    print(j)
    print("Item:")
    print(numbers_list[i][j])
If you just want to flatten the list into a list of tuples containing (index_1, index_2, item), you could do a nested list comprehension:
letter_list = [['A', 'B', 'C'], ['D', 'E', 'F']]
[(i, j, x) for i, y in enumerate(letter_list) for j, x in enumerate(y)]
#Returns [(0, 0, 'A'), (0, 1, 'B'), (0, 2, 'C'), (1, 0, 'D'), (1, 1, 'E'), (1, 2, 'F')]

If you want to be able to handle any shape of two-deep nested lists in a single top-level loop, you could first define a helper iterator function (you only need to do this once):
def nest_iter(x):
    for i, a in enumerate(x):
        for j, b in enumerate(a):
            yield i, j, b
Once you have that, you can use it in the rest of your code as follows:
numbers_list = [[1, 2], [3, 4, 5]]
for i, j, b in nest_iter(numbers_list):
    print(i, j, b)
The output is:
0 0 1
0 1 2
1 0 3
1 1 4
1 2 5
In this example, there are two sub-lists of different lengths.

Related

Finding multiple supersets and subsets for values in a column with python

I am trying to find supersets and subsets among the values in one column (here, the letter column) of an Excel file. The data looks like this:
id  letter
1   A, B, D, E, F
2   B, C
3   B
4   D, B
5   B, D, A
6   X, Y, Z
7   X, Y
8   E, D
7   G
8   G
For example:
'B', 'D,B', 'E,D', 'B,D,A' are subsets of 'A,B,D,E,F',
'B' is a subset of 'B,C',
'X,Y' is a subset of 'X,Y,Z',
'G' is a subset of 'G'.
and
'A,B,D,E,F', 'B,C', 'X,Y,Z' and 'G' are supersets.
I would like to show that relation and store it in two separate Excel files: the first contains the subsets together with their supersets, the second contains only the supersets. First file:
id  letter
1   A, B, D, E, F
5   B,D,A
8   E,D
4   D,B
3   B
2   B,C
3   B
6   X, Y, Z
7   X, Y
7   G
8   G
Second file:
id  letter
1   A, B, D, E, F
2   B,C
6   X, Y, Z
7   G
One possible solution is to use itertools.combinations and check, for every combination, whether all elements of one item are in the other.
To find the supersets we take the letter column and convert it to a list of tuples. Then we create all possible two-element combinations of that column.
The line a, b = ... finds the shorter element in that specific combination. a is always the shorter element (or the second one, if both have the same length).
If every letter of a is in b and a is still in the list out, we remove it from the list because it is a subset of another element. At the end, out only contains the supersets of your data.
Then we only have to turn the elements of the list back into joined strings and filter the df with that list to get your 2nd file (here called df2).
You need to be careful about how you split your strings in the beginning and how you join them in the end. If there are leading or trailing whitespaces in your data, you need to strip them, otherwise the filter won't match the rows in the end.
EDIT
If you want to get rid of the duplicates at the end, you just need to add .drop_duplicates(subset='letter') at the end after filtering your df2. subset needs to be defined here, since both rows with G have a different value for id, so it wouldn't be considered as duplicate.
import itertools
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'letter': ['A, B, D, E, F','B, C','B','D, B','B, D, A','X, Y, Z','X, Y','E, D','G','G']})
lst = df['letter'].values.tolist()
# split each string on commas and strip whitespace around every letter
lst = list(tuple(item.strip() for item in x.split(',')) for x in lst)
print(lst)
# [('A', 'B', 'D', 'E', 'F'), ('B', 'C'), ('B',), ('D', 'B'), ('B', 'D', 'A'), ('X', 'Y', 'Z'), ('X', 'Y'), ('E', 'D'), ('G',), ('G',)]
out = lst[:] #copy of lst
for tup1,tup2 in itertools.combinations(lst, 2):
    a, b = (tup1, tup2) if len(tup1) < len(tup2) else (tup2, tup1)
    # e.g for a,b : (('D','B'), ('B', 'D', 'A'))
    if all(elem in b for elem in a) and a in out:
        out.remove(a)
print(out)
# [('A', 'B', 'D', 'E', 'F'), ('B', 'C'), ('X', 'Y', 'Z'), ('G',)]
filt = list(map(', '.join, out))
df2 = df.loc[df['letter'].isin(filt), :].drop_duplicates(subset='letter')
print(df2)
Output:
id letter
0 1 A, B, D, E, F
1 2 B, C
5 6 X, Y, Z
8 9 G
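As a variation on the same idea, the subset check can also be written with Python sets. This is only a sketch, not the method used above: find_supersets is a name I made up, it works on plain strings instead of a DataFrame, and it assumes the values are comma-separated.

```python
def find_supersets(rows):
    """Return the sets that are not a proper subset of any other set."""
    # One frozenset per row; strip whitespace around each letter.
    sets = [frozenset(part.strip() for part in row.split(',')) for row in rows]
    # dict.fromkeys drops exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(sets))
    # "s < other" is True only when s is a proper subset of other.
    return [s for s in unique if not any(s < other for other in unique)]

letters = ['A, B, D, E, F', 'B, C', 'B', 'D, B', 'B, D, A',
           'X, Y, Z', 'X, Y', 'E, D', 'G', 'G']
supersets = find_supersets(letters)
print([sorted(s) for s in supersets])
```

Because sets ignore order, 'D, B' and 'B, D' count as the same value here; if the original spelling matters for filtering the DataFrame, keep a mapping back to the original strings.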
Additional Question
get id's of sublists from superset:
You can create a mapping from each row of df, with id as key and the list of letters as value. Then loop through df2 and check, for each superset, which of those lists have all their elements in it.
mapping = df.set_index('id')['letter'].str.split(', ').to_dict()
print(mapping)
{1: ['A', 'B', 'D', 'E', 'F'],
2: ['B', 'C'],
3: ['B'],
4: ['D', 'B'],
5: ['B', 'D', 'A'],
6: ['X', 'Y', 'Z'],
7: ['X', 'Y'],
8: ['E', 'D'],
9: ['G'],
10: ['G']}
Create new column:
#create helper function
def func(row):
    sublists = []
    for key,value in mapping.items():
        check = [val in row for val in value]
        if all(check):
            sublists.append(key)
    return sublists
# apply on each row of df2
df2['sublists'] = [func(row) for row in df2['letter']]
print(df2)
id letter sublists
0 1 A, B, D, E, F [1, 3, 4, 5, 8]
1 2 B, C [2, 3]
5 6 X, Y, Z [6, 7]
8 9 G [9, 10]
or as a one-liner, if you prefer:
df2['sublists'] = [[key for key,value in mapping.items() if all(val in row for val in value)] for row in df2['letter']]
df2

Insert in a for loop gives me an infinite loop, and can't insert what I want where I want

I'm trying to insert different elements in specific indexes of a list.
Given this example:
l = [1,2,3,4,5]
Let's say I want to systematically insert the string 'k' after each value of the list.
For this, I know I can use enumerate:
r = l.copy()
for idx, val in enumerate(l):
    r.insert(idx, 'k')
or do it manually with a counter:
index = 0
for i in l:
    index += 1
    l.insert(index, 'k')
    print(index)
    if index >=5:
        index = 0
        break
but when I try both, it just inserts the value as many times as values in the list in the same index:
[1, 'k', 'k', 'k', 'k', 'k', 2, 3, 4, 5]
What am I missing?
Thanks in advance.
Solution:
What I would do is:
l = [1, 2, 3, 4, 5]
l2 = []
for i in l:
    # copy the original value, then add 'k' right after it
    l2.append(i)
    l2.append("k")
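What is happening in your code: you insert into the same list you are iterating over.
You start by inserting k at index 1:
[1, <index 1>, 2, 3, 4, 5] => [1, k, 2, 3, 4, 5]
The loop then moves on to the next element, which is now the k you just inserted, and you insert at index 2:
[1, k, <index 2>, 2, 3, 4, 5] => [1, k, k, 2, 3, 4, 5]
and so on.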
Also as a side note, on a large dataset insert would have to move all the items following in the list, so it would be very slow. My solution eliminates that, but creates a copy of the list.
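For completeness, here are two other ways to get the interleaved result, as a rough sketch. The variable names result and r are my own. The first builds a new list in one expression; the second keeps insert() but accounts for the fact that every earlier insertion shifts the later positions by one.

```python
l = [1, 2, 3, 4, 5]

# Option 1: build a new list, emitting each value followed by 'k'.
result = [item for value in l for item in (value, 'k')]
print(result)

# Option 2: insert into a copy. After idx earlier insertions, the slot
# right after the original element idx is at position 2 * idx + 1.
r = l.copy()
for idx in range(len(l)):
    r.insert(2 * idx + 1, 'k')
print(r)
```

Both print [1, 'k', 2, 'k', 3, 'k', 4, 'k', 5, 'k']. Option 1 avoids the repeated shifting that insert() causes on long lists.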

Adding columns of 2d list if certain value in a row

Let's say I have a 2D list:
mylist = [[3,4,5,'x'],
[6,1,4,'x'],
[4,7,9,'y'],
[0,4,3,'y'],
[5,1,7,'z']]
How would I sum up the second column where the fourth elements are the same (the letters)? Currently I have isolated the fourth elements into a list, avoiding duplicates, with:
newlist = list(set([r[3] for r in mylist]))
Which returns a list ['z', 'y', 'x']
I want it in a format like: [['x', a], ['y', b]..] or in a dictionary like {'x':a,...}
Where a is the sum of the second column over the rows whose fourth element is 'x', which would be 4+1, and b is the same for 'y', which would be 7+4. So this example would output [['x', 5], ['y', 11], ['z', 1]]
What would be the best way to do this? Or would numpy/pandas handle it better?
This should do it, I am using zip
mylist = [[3,4,5,'x'],
[6,1,4,'x'],
[4,7,9,'y'],
[0,4,3,'y'],
[5,1,7,'z']]
#Zip all elements in the list
res = list(zip(*mylist))
#Zip the second column and character array
arr = list(zip(res[1], res[3]))
#[(4, 'x'), (1, 'x'), (7, 'y'), (4, 'y'), (1, 'z')]
dct = {}
#Calculate the sum
for num, key in arr:
    dct.setdefault(key,0)
    dct[key]+=num
print(dct)
#{'x': 5, 'y': 11, 'z': 1}
#Convert dict to list
li = []
for k, v in dct.items():
    li.append([k,v])
print(li)
The output will be
[['x', 5], ['y', 11], ['z', 1]]
You could use a Counter (from collections):
from collections import Counter
result = Counter()
for r in mylist:
    result[r[3]] += r[1]
You also could do it in a single line:
result = Counter( r[3] for r in mylist for _ in range(r[1]) )
or without using Counter:
result = dict()
for _,value,_,key in map(tuple,mylist): # for r in mylist
    result[key] = result.get(key,0) + value # result[r[3]]=result.get(r[3],0)+r[1]
or
result = { r[3]:sum(v[1] for v in mylist if v[3]==r[3]) for r in mylist }
Note that the explicit for loops will generally run faster than the one-liners.
I do prefer pandas for this purpose like this:
import pandas as pd
mylist = [[3,4,5,'x'],
[6,1,4,'x'],
[4,7,9,'y'],
[0,4,3,'y'],
[5,1,7,'z']]
df = pd.DataFrame(mylist)
this gives:
print(df)
   0  1  2  3
0  3  4  5  x
1  6  1  4  x
2  4  7  9  y
3  0  4  3  y
4  5  1  7  z
Working with pandas groupby:
print(df.groupby(3).sum())
   0   1   2
3
x  9   5   9
y  4  11  12
z  5   1   7
print(df.groupby(3).sum()[1].to_dict())
{'x': 5, 'y': 11, 'z': 1}
That's it
This can be done by looping over each element in your list, checking the 4th spot for either x or y, and adding to some running total:
mylist = [[3,4,5,'x'],
[6,1,4,'x'],
[4,7,9,'y'],
[0,4,3,'y'],
[5,1,7,'z']]
x_total = 0
y_total = 0
for i in mylist:
    if i[3] == "y":
        y_total += i[1]
    if i[3] == 'x':
        x_total += i[1]
print("x: ",x_total)
print("y: ",y_total)
Yet another way could be using defaultdict.
from collections import defaultdict
mylist = [
    [3,4,5,'x'],
    [6,1,4,'x'],
    [4,7,9,'y'],
    [0,4,3,'y'],
    [5,1,7,'z']
]
d = defaultdict(int)
for l in mylist:
    d[l[3]] += l[1]
# d: defaultdict(<class 'int'>, {'x': 5, 'y': 11, 'z': 1})
# dict(d) to convert to regular dict

Loop print of 2 columns in to 1

Suppose I have an array of 2 columns. It looks like this
column1 = [1,2,3,...,830]
column2 = [a,b,c,...]
I want to print the output as a single column that contains the values of both columns, alternating one by one. Output form: column = [1,a,2,b ....]
I tried this code:
dat0 = np.genfromtxt("\", delimiter = ',')
mu = dat0[:,0]
A = dat0[:,1]
print(mu,A)
R = np.arange(0,829,1)
l = len(mu)
K = np.zeros((l, 1))
txtfile = open("output_all.txt",'w')
for x in mu:
    i = 0
    K[i,0] = x
    dat0[i,1] = M
    txtfile.write(str(x))
    txtfile.write('\n')
    txtfile.write(str(M))
    txtfile.write('\n')
print K
I do not understand your code completely, is the reference to numpy really relevant for your question? What is M?
If you have two lists of the same length, you can get pairs of elements using the zip builtin.
A = [1, 2, 3]
B = ['a', 'b', 'c']
for a, b in zip(A, B):
    print(a)
    print(b)
This will print
1
a
2
b
3
c
I'm sure there is a better way to do this, but one method is
>>> a = numpy.array([[1,2,3], ['a','b','c'],['d','e','f']])
>>> new_a = []
>>> for column in range(0,a.shape[1]): # a.shape[1] is the number of columns in a
...     for row in range(0,a.shape[0]): # a.shape[0] is the number of rows in a
...         new_a.append(a[row][column])
...
>>> numpy.array(new_a)
array(['1', 'a', 'd', '2', 'b', 'e', '3', 'c', 'f'],
dtype='|S1')
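If the goal is just the alternating single column from the question, written one value per line, a plain-Python sketch could look like the following. The sample columns are made up, and io.StringIO is only a stand-in here for the real output file from the question.

```python
import io
from itertools import chain

# Hypothetical sample columns of equal length.
column1 = [1, 2, 3]
column2 = ['a', 'b', 'c']

# zip pairs the columns row by row; chain.from_iterable flattens the pairs.
interleaved = list(chain.from_iterable(zip(column1, column2)))
print(interleaved)

# Write one value per line; swap in open("output_all.txt", "w") for a real file.
out = io.StringIO()
for value in interleaved:
    out.write(str(value) + '\n')
text = out.getvalue()
```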

How to iterate over two lists?

I am trying to do something in PyGTK where I build a list of HBoxes:
self.keyvalueboxes = []
for keyval in range(1,self.keyvaluelen):
    self.keyvalueboxes.append(gtk.HBox(False, 5))
But I then want to iterate over the list and put a text entry and a label into each box, both of which are stored in lists.
If your lists are of equal length, use zip:
>>> x = ['a', 'b', 'c', 'd']
>>> y = [1, 2, 3, 4]
>>> z = zip(x,y)
>>> z
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> for l in z: print l[0], l[1]
...
a 1
b 2
c 3
d 4
>>>
Check out http://docs.python.org/library/functions.html#zip. It lets you iterate over two lists at the same time.
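zip also accepts more than two lists, which fits the boxes/labels/entries case. The sketch below uses plain strings as placeholders for the GTK widgets (the names boxes, labels and entries are assumptions, not from the question) and shows itertools.zip_longest for the case where the lists are not the same length.

```python
from itertools import zip_longest

# Placeholder strings standing in for HBox, Label and Entry widgets.
boxes = ['box0', 'box1', 'box2']
labels = ['Name', 'Email', 'Phone']
entries = ['entry0', 'entry1']  # deliberately one item short

# zip stops at the shortest list, so the third box is skipped here.
packed = []
for box, label, entry in zip(boxes, labels, entries):
    packed.append((box, label, entry))
print(packed)

# zip_longest keeps going and fills the gaps with fillvalue instead.
padded = list(zip_longest(boxes, labels, entries, fillvalue=None))
print(padded)
```

In real code the loop body would pack the label and entry into the box instead of appending a tuple.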
