Convert list into table - python

I have two arrays: column_names holds the column titles and values holds all the values.
I understand that if I do this:
column_names = ["a", "b", "c"]
values = [1, 2, 3]
for n, v in zip(column_names, values):
    print("{} = {}".format(n, v))
I get
a = 1
b = 2
c = 3
How do I code it so if I pass:
column_names = ["a", "b", "c"]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
I would get
a = 1, 4, 7
b = 2, 5, 8
c = 3, 6, 9
Thank you!

With pandas and numpy this is easy, and the result is a much more useful table. Pandas excels at arranging tabular data, so let's take advantage of it.
Install pandas with:
pip install pandas --user
# pandas pulls in numpy as a dependency
import numpy as np
import pandas as pd
# a normal Python list of the integers 1-9
data = list(range(1, 10))
# convert it to a numpy array
num = np.array(data)
# the array is one-dimensional; reshape it into a 3x3 matrix so each row holds one a, b, c triple
reshaped = num.reshape(3, 3)
# now construct the table
pd.DataFrame(reshaped, columns=['a', 'b', 'c'])
# output is
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

You can do it as follows
>>> for n, v in zip(column_names, zip(*[values[i:i+3] for i in range(0,len(values),3)])):
...     print("{} = {}".format(n, ', '.join(map(str, v))))
...
a = 1, 4, 7
b = 2, 5, 8
c = 3, 6, 9
Alternatively, you can use the grouper recipe from the itertools documentation:
>>> from itertools import zip_longest
>>> def grouper(iterable, n, fillvalue=None):
...     "Collect data into fixed-length chunks or blocks"
...     # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
...     args = [iter(iterable)] * n
...     return zip_longest(*args, fillvalue=fillvalue)
...
>>> for n, v in zip(column_names, zip(*grouper(values, 3))):
...     print("{} = {}".format(n, ', '.join(map(str, v))))
...
a = 1, 4, 7
b = 2, 5, 8
c = 3, 6, 9

itertools.cycle seems appropriate in this case. Here's another version for future readers:
import itertools
column_names = ["a", "b", "c"]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L = zip(itertools.cycle(column_names), values)
for g, v in itertools.groupby(sorted(L), lambda x: x[0]):
    print("{} = {}".format(g, [i[1] for i in v]))
gives:
a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]

This breaks down into two sub-steps.
First, you divide your list into chunks, and then you assign those chunks to a dictionary.
To split the list into chunks, we can create a function:
def chunk(values, chunk_size):
    # chunk_size must divide the list evenly
    assert len(values) % chunk_size == 0
    chunky_list = []
    # one sub-list per column: start at offset i and step by chunk_size
    for i in range(chunk_size):
        position = i
        sub_list = []
        while position < len(values):
            sub_list.append(values[position])
            position += chunk_size
        chunky_list.append(sub_list)
    return chunky_list
At this point, chunk(values, 3) gives us:
[[1,4,7],[2,5,8],[3,6,9]]
From here, creating the dict is really easy. First, we zip the two lists together:
zip(column_names, chunk(values, 3))
And take advantage of the fact that Python knows how to convert an iterable of pairs into a dictionary:
dict(zip(column_names, chunk(values, 3)))
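
The chunk-then-zip idea above can also be sketched with extended slicing. This is only a minimal sketch, assuming the number of values is a multiple of the number of columns; the name table is introduced here for illustration and is not part of the original answer.

```python
column_names = ["a", "b", "c"]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# values[i::n] takes every n-th item starting at offset i,
# which is the i-th column when the data has n columns.
n = len(column_names)
table = {name: values[i::n] for i, name in enumerate(column_names)}

for name, column in table.items():
    print("{} = {}".format(name, ", ".join(map(str, column))))
```

Each slice copies only its own column, so no intermediate list of rows is built.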

You can also use slicing and a collections.defaultdict to collect your values:
from collections import defaultdict
column_names = ["a", "b", "c"]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
column_len = len(column_names)
d = defaultdict(list)
for i in range(0, len(values), column_len):
    seq = values[i:i+column_len]
    for idx, number in enumerate(seq):
        d[column_names[idx]].append(number)
for k, v in d.items():
    print('%s = %s' % (k, ', '.join(map(str, v))))
Which outputs:
a = 1, 4, 7
b = 2, 5, 8
c = 3, 6, 9
This can be improved by zipping the values with itertools.cycle, which avoids the slicing altogether:
from collections import defaultdict
from itertools import cycle
column_names = ["a", "b", "c"]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
column_names = cycle(column_names)
d = defaultdict(list)
for column, val in zip(column_names, values):
    d[column].append(val)
for k, v in d.items():
    print('%s = %s' % (k, ', '.join(map(str, v))))

Related

Python adding a list to a slice of another list

Here's the basic problem:
>>> listb = [ 1, 2, 3, 4, 5, 6, 7 ]
>>> slicea = slice(2,5)
>>> listb[slicea]
[3, 4, 5]
>>> lista = listb[slicea]
>>> lista
[3, 4, 5]
>>> listb[slicea] += lista
>>> listb
[1, 2, 3, 4, 5, 3, 4, 5, 6, 7]
I expected listb to be
[1, 2, 6, 8, 10, 6, 7]
But 3, 4, 5 was inserted after 3, 4, 5 instead of being added to it element by element.
tl;dr
I have this code that's not working:
self.lib_tree.item(song)['values'][select_values] = adj_list
self.lib_tree.item(album)['values'][select_values] += adj_list
self.lib_tree.item(artist)['values'][select_values] += adj_list
The full code is this:
def toggle_select(self, song, album, artist):
    # 'values' 0=Access, 1=Size, 2=Selected Size, 3=StatTime, 4=StatSize,
    # 5=Count, 6=Seconds, 7=SelSize, 8=SelCount, 9=SelSeconds
    # Set slice to StatSize, Count, Seconds
    total_values = slice(4, 7) # start at index, stop before index
    select_values = slice(7, 10) # start at index, stop before index
    tags = self.lib_tree.item(song)['tags']
    if "songsel" in tags:
        # We will toggle off and subtract from selected parent totals
        tags.remove("songsel")
        self.lib_tree.item(song, tags=(tags))
        # Get StatSize, Count and Seconds
        adj_list = [element * -1 for element in \
                    self.lib_tree.item(song)['values'][total_values]]
    else:
        tags.append("songsel")
        self.lib_tree.item(song, tags=(tags))
        # Get StatSize, Count and Seconds
        adj_list = self.lib_tree.item(song)['values'][total_values] # 1 past
    self.lib_tree.item(song)['values'][select_values] = adj_list
    self.lib_tree.item(album)['values'][select_values] += adj_list
    self.lib_tree.item(artist)['values'][select_values] += adj_list
    if self.debug_toggle < 10:
        self.debug_toggle += 1
        print('artist,album,song:',self.lib_tree.item(artist, 'text'), \
              self.lib_tree.item(album, 'text'), \
              self.lib_tree.item(song, 'text'))
        print('adj_list:',adj_list)
The adj_list has the correct values showing up in debug.
How do I add a list of values to the slice of a list?

The behavior you want is not a feature of any Python built-in type; + with built-in sequences means concatenation, not element-wise addition. But numpy arrays will do what you want, so I'd suggest looking into numpy. Simple example:
>>> import numpy as np
>>> a = np.array([2,3,4], dtype=np.int64)
>>> b = np.array([5,6,7], dtype=np.int64)
>>> a += b
>>> a
array([ 7, 9, 11])
>>> print(a)
[ 7 9 11]
>>> print(a.tolist())
[7, 9, 11]
Note that the output (both repr and str forms) looks a little different from Python lists, but you can convert back to a plain Python list if needed.
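
If pulling in numpy is not an option, the same element-wise update can be sketched with plain lists by assigning a list of pairwise sums back into the slice. This is a minimal sketch reusing the listb, slicea and lista names from the question; it assumes the slice and the list being added have the same length.

```python
listb = [1, 2, 3, 4, 5, 6, 7]
slicea = slice(2, 5)
lista = listb[slicea]  # [3, 4, 5]

# Build the pairwise sums first, then assign them back into the slice.
# zip pairs each current slice element with the value to add to it.
listb[slicea] = [x + y for x, y in zip(listb[slicea], lista)]

print(listb)  # [1, 2, 6, 8, 10, 6, 7]
```

Because the replacement list has the same length as the slice, the assignment overwrites in place instead of growing the list.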

How to efficiently count each element in a list in Python? [duplicate]

Given an unordered list of values like
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
How can I get the frequency of each value that appears in the list, like so?
# `a` has 4 instances of `1`, 4 of `2`, 2 of `3`, 1 of `4`, 2 of `5`
b = [4, 4, 2, 1, 2] # expected output

In Python 2.7 (or newer), you can use collections.Counter:
>>> import collections
>>> a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
>>> counter = collections.Counter(a)
>>> counter
Counter({1: 4, 2: 4, 5: 2, 3: 2, 4: 1})
>>> counter.values()
dict_values([2, 4, 4, 1, 2])
>>> counter.keys()
dict_keys([5, 1, 2, 4, 3])
>>> counter.most_common(3)
[(1, 4), (2, 4), (5, 2)]
>>> dict(counter)
{5: 2, 1: 4, 2: 4, 4: 1, 3: 2}
>>> # Get the counts in order matching the original specification,
>>> # by iterating over keys in sorted order
>>> [counter[x] for x in sorted(counter.keys())]
[4, 4, 2, 1, 2]
If you are using Python 2.6 or older, you can download an implementation here.

If the list is sorted, you can use groupby from the itertools standard library (if it isn't, you can just sort it first, although this takes O(n lg n) time):
from itertools import groupby
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
[len(list(group)) for key, group in groupby(sorted(a))]
Output:
[4, 4, 2, 1, 2]

Python 2.7+ introduces dictionary comprehensions. Building the dictionary from the list will get you the count as well as get rid of duplicates. Note that a.count(x) rescans the whole list for every element, so this takes quadratic time on long lists.
>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> d = {x:a.count(x) for x in a}
>>> d
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
>>> a, b = list(d.keys()), list(d.values())
>>> a
[1, 2, 3, 4, 5]
>>> b
[4, 4, 2, 1, 2]

Count the number of appearances manually by iterating through the list and counting them up, using a collections.defaultdict to track what has been seen so far:
from collections import defaultdict
appearances = defaultdict(int)
for curr in a:
    appearances[curr] += 1

In Python 2.7+, you could use collections.Counter to count items
>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>>
>>> from collections import Counter
>>> c=Counter(a)
>>>
>>> c.values()
[4, 4, 2, 1, 2]
>>>
>>> c.keys()
[1, 2, 3, 4, 5]

Counting the frequency of elements is probably best done with a dictionary:
b = {}
for item in a:
    b[item] = b.get(item, 0) + 1
To remove the duplicates, use a set:
a = list(set(a))

You can do this:
import numpy as np
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
np.unique(a, return_counts=True)
Output:
(array([1, 2, 3, 4, 5]), array([4, 4, 2, 1, 2], dtype=int64))
The first array is values, and the second array is the number of elements with these values.
So if you want just the array of counts, use this:
np.unique(a, return_counts=True)[1]

Here's another succinct alternative using itertools.groupby which also works for unordered input:
from itertools import groupby
items = [5, 1, 1, 2, 2, 1, 1, 2, 2, 3, 4, 3, 5]
results = {value: len(list(freq)) for value, freq in groupby(sorted(items))}
results
format: {value: num_of_occurrences}
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}

I would simply use scipy.stats.itemfreq in the following manner:
from scipy.stats import itemfreq
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
freq = itemfreq(a)
a = freq[:,0]
b = freq[:,1]
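You may check the documentation here: http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.itemfreq.html
Note that itemfreq was deprecated in SciPy 1.0.0 and has since been removed; np.unique(a, return_counts=True) is the suggested replacement.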

import numpy as np
import pandas as pd
from collections import Counter
a=["E","D","C","G","B","A","B","F","D","D","C","A","G","A","C","B","F","C","B"]
counter=Counter(a)
kk=[list(counter.keys()),list(counter.values())]
pd.DataFrame(np.array(kk).T, columns=['Letter','Count'])

seta = set(a)
b = [a.count(el) for el in seta]
a = list(seta) #Only if you really want it.

Suppose we have a list:
fruits = ['banana', 'banana', 'apple', 'banana']
We can find out how many of each fruit we have in the list like so:
import numpy as np
(unique, counts) = np.unique(fruits, return_counts=True)
{x:y for x,y in zip(unique, counts)}
Result:
{'banana': 3, 'apple': 1}

This answer is more explicit
a = [1,1,1,1,2,2,2,2,3,3,3,4,4]
d = {}
for item in a:
    if item in d:
        d[item] = d.get(item)+1
    else:
        d[item] = 1
for k,v in d.items():
    print(str(k)+':'+str(v))
# output
#1:4
#2:4
#3:3
#4:2
#remove dups
d = set(a)
print(d)
#{1, 2, 3, 4}

For your first question, iterate over the list and use a dictionary to keep track of how many times each element has been seen.
For your second question, just use the set constructor.
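
The two suggestions above can be sketched together as follows. This is a minimal sketch using the list a from the question; the names counts and unique_values are introduced here for illustration.

```python
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]

# First part: walk the list once, tallying each element in a dict.
counts = {}
for item in a:
    counts[item] = counts.get(item, 0) + 1

# Second part: a set keeps one copy of each value; sort it for a stable order.
unique_values = sorted(set(a))

# Counts listed in the same order as unique_values.
b = [counts[value] for value in unique_values]
print(unique_values)  # [1, 2, 3, 4, 5]
print(b)              # [4, 4, 2, 1, 2]
```

The single pass over the list keeps the counting step linear, unlike approaches that call a.count() once per distinct value.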

def frequencyDistribution(data):
    return {i: data.count(i) for i in data}
print(frequencyDistribution([1,2,3,4]))
...
{1: 1, 2: 1, 3: 1, 4: 1} # originalNumber: count

I am quite late, but this will also work, and will help others:
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
freq_list = []
a_l = list(set(a))
for x in a_l:
    freq_list.append(a.count(x))
print('Freq', freq_list)
print('number', a_l)
will produce this:
Freq [4, 4, 2, 1, 2]
number [1, 2, 3, 4, 5]

a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
counts = dict.fromkeys(a, 0)
for el in a: counts[el] += 1
print(counts)
# {1: 4, 2: 4, 3: 2, 4: 1, 5: 2}

a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
# 1. Get counts and store in another list
output = []
for i in set(a):
    output.append(a.count(i))
print(output)
# 2. Remove duplicates using set constructor
a = list(set(a))
print(a)
A set does not allow duplicates, so passing a list to the set() constructor gives an iterable of unique objects. The list's count() method returns how many times a given object appears in the list. Each unique object is counted this way, and each count is appended to the initially empty list output.
The list() constructor then converts set(a) back into a list, which is assigned to the same variable a.
Output
D:\MLrec\venv\Scripts\python.exe D:/MLrec/listgroup.py
[4, 4, 2, 1, 2]
[1, 2, 3, 4, 5]

Simple solution using a dictionary.
def frequency(l):
    d = {}
    for i in l:
        if i in d.keys():
            d[i] += 1
        else:
            d[i] = 1
    for k, v in d.items():
        if v == max(d.values()):
            return k, d.keys()
print(frequency([10,10,10,10,20,20,20,20,40,40,50,50,30]))

#!/usr/bin/python
def frq(words):
    freq = {}
    for w in words:
        if w in freq:
            freq[w] = freq.get(w)+1
        else:
            freq[w] = 1
    return freq
# read the whole file and split it into words
with open("poem", "r") as fp:
    text = fp.read()
words = text.split()
print(words)
d = frq(words)
print("frequency of input\n: ")
print(d)
# write one "word:count" line per distinct word
with open("output.txt", "w+") as fp1:
    for k, v in d.items():
        fp1.write(str(k)+':'+str(v)+"\n")

from collections import OrderedDict
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
def get_count(lists):
    dictionary = OrderedDict()
    for val in lists:
        dictionary.setdefault(val,[]).append(1)
    return [sum(val) for val in dictionary.values()]
print(get_count(a))
>>>[4, 4, 2, 1, 2]
To remove duplicates and Maintain order:
list(dict.fromkeys(get_count(a)))
>>>[4, 2, 1]

I'm using Counter to generate a frequency dict from the words of a text file in one line of code:
import re
from collections import Counter
def _fileIndex(fh):
    ''' create a dict using Counter of a
    flat list of words (re.findall(re.compile(r"[a-zA-Z]+"), lines)) in (lines in file->for lines in fh)
    '''
    return Counter(
        [wrd.lower() for wrdList in
         [words for words in
          [re.findall(re.compile(r'[a-zA-Z]+'), lines) for lines in fh]]
         for wrd in wrdList])

For the record, a functional answer:
>>> L = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> import functools
>>> functools.reduce(lambda acc, e: [v+(i==e) for i, v in enumerate(acc,1)] if e<=len(acc) else acc+[0 for _ in range(e-len(acc)-1)]+[1], L, [])
[4, 4, 2, 1, 2]
It's cleaner if you count zeroes too:
>>> functools.reduce(lambda acc, e: [v+(i==e) for i, v in enumerate(acc)] if e<len(acc) else acc+[0 for _ in range(e-len(acc))]+[1], L, [])
[0, 4, 4, 2, 1, 2]
An explanation:
we start with an empty acc list;
if the next element e of L is lower than the size of acc, we just update this element: v+(i==e) means v+1 if the index i of acc is the current element e, otherwise the previous value v;
if the next element e of L is greater than or equal to the size of acc, we have to expand acc to host the new 1.
The elements do not have to be sorted (unlike with itertools.groupby). You'll get weird results if you have negative numbers.

Another approach of doing this, albeit by using a heavier but powerful library - NLTK.
import nltk
fdist = nltk.FreqDist(a)
fdist.values()
fdist.most_common()

Found another way of doing this, using sets.
#ar is the list of elements
#convert ar to set to get unique elements
sock_set = set(ar)
#create dictionary of frequency of socks
sock_dict = {}
for sock in sock_set:
    sock_dict[sock] = ar.count(sock)

For an unordered list you should use:
[a.count(el) for el in set(a)]
The output is
[4, 4, 2, 1, 2]

Yet another solution with another algorithm without using collections:
def countFreq(A):
    # values are used as list indices, so they must be non-negative integers
    if not A:
        return []
    count = [0] * (max(A) + 1)  # one slot per possible value, initialized with 0
    for x in A:
        count[x] += 1  # increase occurrence for value x
    return [x for x in count if x]  # return non-zero counts

num=[3,2,3,5,5,3,7,6,4,6,7,2]
print ('\nelements are:\t',num)
count_dict={}
for elements in num:
    count_dict[elements]=num.count(elements)
print ('\nfrequency:\t',count_dict)

You can use the built-in list method count:
l.count(l[i])
d=[]
for i in range(len(l)):
    if l[i] not in d:
        d.append(l[i])
        print(l.count(l[i]))
The above code builds d, a copy of the list without duplicates, and prints the frequency of each distinct element of the original list.
Two birds with one stone! XD

This approach can be tried if you don't want to use any library and keep it simple and short!
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
marked = []
b = [(a.count(i), marked.append(i))[0] for i in a if i not in marked]
print(b)
Output:
[4, 4, 2, 1, 2]

Various list concatenation method and their performance

I am working on an algorithm in which we try to write every line so that it contributes to good overall performance.
In one situation we have to concatenate lists (more than two, specifically). I know some ways to join more than two lists, and I have looked on StackOverflow, but none of the answers give an account of the performance of each method.
Can anyone show the ways we can join more than two lists, along with their respective performance?
Edit: The size of the list varies from 2 to 13 (to be specific).
Edit (duplicate): I am specifically asking for the ways we can concatenate lists and their respective performance; the duplicate question is limited to only 4 methods.

There are multiple ways to join more than two lists.
Assuming that we have three lists,
a = ['1']
b = ['2']
c = ['3']
Then, for joining two or more lists in Python,
1)
You can simply concatenate them,
output = a + b + c
2)
You can do it using list comprehension as well,
res_list = [y for x in [a,b,c] for y in x]
3)
You can do it using extend() as well,
a.extend(b)
a.extend(c)
print(a)
4)
You can do it by using * operator as well,
res = [*a,*b,*c]
To measure performance, I used the timeit module from the Python standard library.
Ordered by time taken (fastest first), the methods rank as follows:
4th method < 1st method < 3rd method < 2nd method
That means that if you use the * operator to concatenate more than two lists, you will get the best performance.
Hope you got what you were looking for.

I did some simple measurements, here are my results:
import timeit
from itertools import chain
a = [*range(1, 10)]
b = [*range(1, 10)]
c = [*range(1, 10)]
tests = ("""output = list(chain(a, b, c))""",
"""output = a + b + c""",
"""output = [*chain(a, b, c)]""",
"""output = a.copy();output.extend(b);output.extend(c);""",
"""output = [*a, *b, *c]""",
"""output = a.copy();output+=b;output+=c;""",
"""output = a.copy();output+=[*b, *c]""",
"""output = a.copy();output += b + c""")
results = sorted((timeit.timeit(stmt=test, number=1, globals=globals()), test) for test in tests)
for i, (t, stmt) in enumerate(results, 1):
    print(f'{i}.\t{t}\t{stmt}')
Prints on my machine (AMD 2400G, Python 3.6.7):
1. 6.010000106471125e-07 output = [*a, *b, *c]
2. 7.109999842214165e-07 output = a.copy();output += b + c
3. 7.720000212430023e-07 output = a.copy();output+=b;output+=c;
4. 7.820001428626711e-07 output = a + b + c
5. 1.0520000159885967e-06 output = a.copy();output+=[*b, *c]
6. 1.4030001693754457e-06 output = a.copy();output.extend(b);output.extend(c);
7. 1.4820000160398195e-06 output = [*chain(a, b, c)]
8. 2.525000127207022e-06 output = list(chain(a, b, c))

If you are going to concatenate a variable number of lists together, your input is going to be a list of lists (or some equivalent collection). The performance tests need to take this into account because you are not going to be able to do things like list1+list2+list3.
Here are some test results (1000 repetitions):
option1 += loop            0.00097 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option2 itertools.chain    0.00138 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option3 functools.reduce   0.00174 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option4 comprehension      0.00188 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option5 extend loop        0.00127 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option6 deque              0.00180 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
This would indicate that a += loop through the list of lists is the fastest approach.
And the source to produce them:
allLists = [ list(range(10)) for _ in range(5) ]
def option1():
    result = allLists[0].copy()
    for lst in allLists[1:]:
        result += lst
    return result
from itertools import chain
def option2(): return list(chain(*allLists))
from functools import reduce
def option3():
    return list(reduce(lambda a,b:a+b,allLists))
def option4(): return [ e for l in allLists for e in l ]
def option5():
    result = allLists[0].copy()
    for lst in allLists[1:]:
        result.extend(lst)
    return result
from collections import deque
def option6():
    result = deque()
    for lst in allLists:
        result.extend(lst)
    return list(result)
from timeit import timeit
count = 1000
t = timeit(lambda:option1(), number = count)
print(f"option1 += loop {t:.5f}",option1()[:15])
t = timeit(lambda:option2(), number = count)
print(f"option2 itertools.chain {t:.5f}",option2()[:15])
t = timeit(lambda:option3(), number = count)
print(f"option3 functools.reduce {t:.5f}",option3()[:15])
t = timeit(lambda:option4(), number = count)
print(f"option4 comprehension {t:.5f}",option4()[:15])
t = timeit(lambda:option5(), number = count)
print(f"option5 extend loop {t:.5f}",option5()[:15])
t = timeit(lambda:option6(), number = count)
print(f"option6 deque {t:.5f}",option6()[:15])
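
When the input is already a list of lists, as this answer assumes, itertools.chain.from_iterable is one more variant worth adding to such a comparison. The sketch below only shows the call shape and makes no claim about its speed relative to the options above; the names all_lists and concat_all are hypothetical.

```python
from itertools import chain

# Hypothetical input: a list holding a variable number of lists.
all_lists = [list(range(10)) for _ in range(5)]

def concat_all(lists):
    # chain.from_iterable walks each inner list in turn, lazily,
    # without unpacking the outer list into call arguments.
    return list(chain.from_iterable(lists))

result = concat_all(all_lists)
print(len(result))   # 50
print(result[:12])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
```

It can be timed in the same way as the other options, for example with timeit(lambda: concat_all(all_lists), number=1000).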

Store the sum result of a list into another one?

I'm trying to sum the elements of a list and then store the result of each sum in another list.
So far I have this:
array1 = [1, 2, 3, 4]
print(sum(int(i) for i in array1))
The output is 10.
But what I'm trying to do is something like this:
input = [1, 2, 3, 4]
output = [1, 3, 6, 10]
Do I need to store the value into the second list in each step?

If you're running Python 3.2 or higher, there is already a function for this, itertools.accumulate:
import itertools
input = [1,2,3,4]
output = list(itertools.accumulate(input))
If you're on a pre-3.2 version of Python, you can always borrow the equivalent code given in the accumulate documentation:
import operator
def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total
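
As the comments in that recipe hint, accumulate is not limited to sums. A minimal sketch, assuming Python 3.3 or later (where the standard-library version accepts the optional func argument); the variable names are illustrative only:

```python
import itertools
import operator

data = [1, 2, 3, 4]

# Default behaviour: running sum.
running_sum = list(itertools.accumulate(data))
# Any two-argument function can replace addition, e.g. multiplication.
running_product = list(itertools.accumulate(data, operator.mul))
# The built-in max gives the largest value seen so far.
running_max = list(itertools.accumulate([3, 1, 4, 1, 5], max))

print(running_sum)      # [1, 3, 6, 10]
print(running_product)  # [1, 2, 6, 24]
print(running_max)      # [3, 3, 4, 4, 5]
```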

Using a list comprehension is probably not the easiest way. You could do this:
input = [1, 2, 3, 4]
output = []
total = 0
for i in input:
    total += i
    output.append(total)

This does what you want, but is maybe not that easy to understand:
output = [sum(array1[:i+1]) for i in range(len(array1))]
You can also do it like this (which should be easier to understand):
i = 0
output = []
for item in array1:
    i += item
    output.append(i)

You might find numpy useful if you are doing a lot of numeric operations; the cumsum method would do what you want here:
In [11]: import numpy as np
In [12]: array1 = np.array([1, 2, 3, 4])
In [13]: array1.cumsum()
Out[13]: array([ 1, 3, 6, 10])
It is easy to apply the same logic in plain Python:
def cumsum(l):
    it = iter(l)
    sm = next(it, 0)
    yield sm
    for i in it:
        sm += i
        yield sm
Demo:
In [16]: array1 = [1, 2, 3, 4]
In [17]: list(cumsum(array1))
Out[17]: [1, 3, 6, 10]

Python equivalent of R "split"-function

In R, you could split a vector according to the factors of another vector:
> a <- 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> b <- rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> split(a,b)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10
Thus, grouping a list (in terms of python) according to the values of another list (according to the order of the factors).
Is there anything handy in python like that, except from the itertools.groupby approach?

From your example, it looks like each element in b contains the 1-indexed number of the list in which the corresponding element of a will be stored. Python lacks the automatically named list elements that R seems to have, so we'll return a tuple of lists. If you can use zero-indexed groups, and you only need two lists (i.e., where your R use case has the values 1 and 2, in Python they'll be 0 and 1), then you can use itertools.compress:
import itertools
def split(x, f):
    # group 0 first (where f is falsy), then group 1 (where f is truthy)
    return list(itertools.compress(x, (not i for i in f))), list(itertools.compress(x, f))
>>> a = range(1, 11)
>>> b = [0,1] * 5
>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
If you need more general input (multiple numbers), something like the following will return an n-tuple:
def split(x, f):
    count = max(f) + 1
    return tuple( list(itertools.compress(x, (el == i for el in f))) for i in range(count) )
>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])

Edit: warning, this is a groupby solution, which is not what OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.
Here's one way with itertools.
import itertools
# make your sample data
a = range(1,11)
b = list(itertools.islice(itertools.cycle((1,2)), len(a)))
{k: tuple(v for _, v in g) for k, g in itertools.groupby(sorted(zip(b,a)), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}
This gives you a dictionary, which is analogous to the named list that you get from R's split.

As a long time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res
>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
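
The same function works for non-numeric labels too, which is closer to how split is typically used with factors in R. A small sketch with made-up data (heights and groups are hypothetical); sorting the keys is only an approximation of R listing groups in factor-level order.

```python
from collections import defaultdict

def split(x, f):
    # Same grouping idea as above: append each value to its label's list.
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res

heights = [170, 182, 165, 190, 175]   # hypothetical values
groups = ["f", "m", "f", "m", "f"]    # hypothetical labels

by_group = split(heights, groups)
# Rebuild as a plain dict with sorted keys for a predictable order.
ordered = {label: by_group[label] for label in sorted(by_group)}
print(ordered)  # {'f': [170, 165, 175], 'm': [182, 190]}
```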

You could try:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]
results in:
In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]
In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]
To generalise this, you can simply iterate over the unique elements in b:
splits = {}
for index in set(b):
    splits[index] = [a[k] for k in (i for i,j in enumerate(b) if j == index)]
