Transform a list of ranges into a single list - python

I have a data frame that have some points to mark another dataset.
I'm creating a range from the starting mark and the stopping mark that I want to transform into a single list or numpy array.
I have the following:
list(map(lambda limits : np.arange(limits[1] - limits[0]-1, -1, -1),
zip(df_cycles['Start_point'], df_cycles['Stop_point']))
)
This is returning a list of arrays:
[array([1155, 1154, 1153, ..., 2, 1, 0]),
array([71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55,
54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38,
37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21,
20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,
3, 2, 1, 0]),
...]
How can I modify or transform the output to have a single list or NumPy array like this:
array([1155, 1154, 1153, ..., 2, 1, 0, 71, 70, 69, 68, 67, 66, 65,
64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48,
47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31,
30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14,
13, 12, 11, 10, 9, 8, 7, 6, 5, 4,3, 2, 1, 0,...])

Just do:
flatarray = np.concatenate(list_of_arrays)
concatenate puts together two or more arrays into a single new array; you don't to do it a single array at a time (it creates a Schlemiel the Painter's algorithm), but once you've got them all, it's an efficient way to combine them.

Related

I have problem python use library Counter? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 days ago.
Improve this question
i have problem to use library Counter in python one number
Please développer help me
from collections import Counter
serie = [5, 6, 7, 8, 10, 12, 13, 25, 27, 29, 33, 37, 39, 41, 47, 56, 59, 66, 76, 78, 1, 7, 15, 16, 21, 25, 26, 28, 30, 38, 41, 48, 51, 59, 60, 65, 68, 70, 75, 79, 3, 6, 14, 15, 17, 23,
25, 27, 33, 34, 35, 38, 46, 51, 53, 58, 63, 68, 74, 77, 7, 9, 11, 21, 26, 27, 32, 35, 38, 43, 44, 52, 53, 56, 59, 65, 66, 74, 76, 80, 3, 9, 19, 27, 28, 34, 35, 39, 47, 49, 50, 51, 53, 57, 61, 66, 67, 72, 74, 80, 2, 3, 24, 25, 28, 30, 35, 36, 51, 54, 55, 57, 61, 67, 68, 69, 70, 71, 74, 79, 3, 11, 14, 16, 19, 25, 27, 33, 35, 38, 44, 46, 48, 58, 63, 64, 65, 68, 69, 73, 7, 12, 18, 23, 24, 25, 27, 28, 47, 52, 53, 59, 65, 66, 67, 68, 69, 70, 72, 75, 1, 2, 5, 8, 9, 10, 13, 20, 25, 28, 29, 33, 39, 41, 43, 48, 49, 53, 66, 74, 1, 6, 7, 9, 15, 18, 19, 23, 25, 26, 33, 34, 42, 45, 46, 62, 65, 71, 79, 80, 2, 4, 6, 7, 11, 12, 15,
21, 23, 24, 26, 33, 34, 38, 51, 53, 67, 68, 73, 79, 1, 8, 9, 19, 20, 24, 30, 32, 35, 40,
42, 44, 47, 54, 55, 56, 60, 61, 78, 80]
# Compter le nombre d'occurrences de chaque élément dans la série
occurrences = Counter(serie)
# Trier les éléments par ordre décroissant du nombre d'occurrences
sorted_occurrences = occurrences.most_common()
# Récupérer les éléments les plus fréquents
most_common_count = sorted_occurrences[0][1]
most_common = [x[0] for x in sorted_occurrences if x[1] == most_common_count][:5]
print(most_common)
I want this code to return the five most frequent numbers while it returns
You are already doing the correct thing:
from collections import Counter
serie = [5, 6, 7, 8, 10, 12, 13, 25, 27, 29, 33, 37, 39, 41, 47, 56, 59, 66, 76, 78, 1, 7, 15, 16, 21, 25, 26, 28, 30, 38, 41, 48, 51, 59, 60, 65, 68, 70, 75, 79, 3, 6, 14, 15, 17, 23,
25, 27, 33, 34, 35, 38, 46, 51, 53, 58, 63, 68, 74, 77, 7, 9, 11, 21, 26, 27, 32, 35, 38, 43, 44, 52, 53, 56, 59, 65, 66, 74, 76, 80, 3, 9, 19, 27, 28, 34, 35, 39, 47, 49, 50, 51, 53, 57, 61, 66, 67, 72, 74, 80, 2, 3, 24, 25, 28, 30, 35, 36, 51, 54, 55, 57, 61, 67, 68, 69, 70, 71, 74, 79, 3, 11, 14, 16, 19, 25, 27, 33, 35, 38, 44, 46, 48, 58, 63, 64, 65, 68, 69, 73, 7, 12, 18, 23, 24, 25, 27, 28, 47, 52, 53, 59, 65, 66, 67, 68, 69, 70, 72, 75, 1, 2, 5, 8, 9, 10, 13, 20, 25, 28, 29, 33, 39, 41, 43, 48, 49, 53, 66, 74, 1, 6, 7, 9, 15, 18, 19, 23, 25, 26, 33, 34, 42, 45, 46, 62, 65, 71, 79, 80, 2, 4, 6, 7, 11, 12, 15,
21, 23, 24, 26, 33, 34, 38, 51, 53, 67, 68, 73, 79, 1, 8, 9, 19, 20, 24, 30, 32, 35, 40,
42, 44, 47, 54, 55, 56, 60, 61, 78, 80]
# Compter le nombre d'occurrences de chaque élément dans la série
occurrences = Counter(serie)
# Trier les éléments par ordre décroissant du nombre d'occurrences
sorted_occurrences = occurrences.most_common()
print([x[0] for x in sorted_occurrences][:5])
#output
[25, 7, 27, 33, 68]

How do you split an array into specific intervals in Num.py for Python?

The question follows a such:
x = np.arange(100)
Write Python code to split the following array at these intervals: 10, 25, 45, 75, 95
I have used the split function and unable to get at these specific intervals, can anyone enlighten me on another method or am i doing it wrongly?
Here's both the manual way and the numpy way with split.
# Manual method
x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]
split_arrays = []
for i, j in zip([0]+split_indices[:-1], split_indices):
split_arrays.append(x[i:j])
print(split_arrays)
# Numpy method
split_arrays_np = np.split(x, split_indices)
print(split_arrays_np)
And the result is (for both)
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]),
array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]),
array([45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]),
array([75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94])
]

Select values from two different dataset in python

i have a trouble when i'm dealing with my 2 dataset, i explain my problem:
I have 2 different dataset:
training_df = pd.read_csv('.../train.csv')
test_df = pd.read_csv('.../test.csv')
I have to take values from some columns from train.csv and take other columns in test.csv, i tried like this:
num_attrib = pd.DataFrame(training_df, columns=[0, 2, 3, 15, 16, 17, 18, 24, 32, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 59, 60, 64, 65, 66, 67, 68, 69, 70, 71, 72])
cat_attrib = pd.DataFrame(training_df, columns=[1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 33, 37, 38, 39, 40, 51, 53, 55, 56, 58, 61, 62, 63, 73, 74])
num_attrib_test = pd.DataFrame(test_df, columns=[0, 2, 3, 15, 16, 17, 18, 24, 32, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 59, 60, 64, 65, 66, 67, 68, 69, 70, 71, 72])
cat_attrib_test = pd.DataFrame(test_df, columns=[1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 33, 37, 38, 39, 40, 51, 53, 55, 56, 58, 61, 62, 63, 73, 74])
Both datasets have numerical and categorial datas. I have to select and separate categorical from numerical datas for each datasets, but my way is wrong.
I have this trouble because i have to make the Columntransformer() on training_df and test_df.
Any suggestion?
Thank you so much
You are looking for iloc. See documentation here.
num_attrib = training_df.iloc[:,[0,2,3,...,15]]
You can also slice:
#even columns
num_attrib = training_df.iloc[:, ::2]
#odd columns
num_attrib = training_df.iloc[:, 1::2]

Reading formatted array from file in Python

I have a file which contains some strings and then two formatted arrays. It looks something like this
megabuck
Hello world
[58, 50, 42, 34, 26, 18, 10, 2,
61, 53, 45, 37, 29, 21, 13, 5,
63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9,
1, 58, 50, 42, 34, 26, 18,
14, 6, 61, 53, 45, 37, 29,
21, 13, 5, 28, 20, 12, 4]
I don't know the size of the arrays beforehand. Only thing I know is the delimiter for the array which is []. What can be an elegant way to read the arrays.
I am a newbie in python.
Using Regex. re.findall
Ex:
import re
import ast
with open(filename) as infile:
data = infile.read()
for i in re.findall(r"(\[.*?\])", data, flags=re.S):
print(ast.literal_eval(i))
Output:
[58, 50, 42, 34, 26, 18, 10, 2, 61, 53, 45, 37, 29, 21, 13, 5, 63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9, 1, 58, 50, 42, 34, 26, 18, 14, 6, 61, 53, 45, 37, 29, 21, 13, 5, 28, 20, 12, 4]
I wouldn't call it elegant but it works
ars = """
megabuck
Hello world
[58, 50, 42, 34, 26, 18, 10, 2,
61, 53, 45, 37, 29, 21, 13, 5,
63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9,
1, 58, 50, 42, 34, 26, 18,
14, 6, 61, 53, 45, 37, 29,
21, 13, 5, 28, 20, 12, 4]
"""
arrays = []
for a in ars.split("["):
if ']' in a:
arrays.append([i.strip() for i in a.replace("]",'').split(',')])

Indexing a numpy array using a numpy array of slices

(Edit: I wrote a solution basing on hpaulj's answer, see code at the bottom of this post)
I wrote a function that subdivides an n-dimensional array into smaller ones such that each of the subdivisions has max_chunk_size elements in total.
Since I need to subdivide many arrays of same shapes and then perform operations on the corresponding chunks, it doesn't actually operate on the data rather than creates an array of "indexers", i. e. an array of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve subdivisions by calling the_array[indexer[i]] (see examples below).
Also, the array of these indexers has same number of dimensions as input and divisions are aligned along corresponding axes, i. e. blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjusent along the 0-axis, etc.
I was expecting that I should also be able to concatenate these blocks by calling the_array[indexer[i:i+2,j,k]] and that the_array[indexer] would return just the_array, however such calls result in an error:
IndexError: arrays used as indices must be of integer (or boolean)
type
Is there a simple way around this error?
Here's the code:
import numpy as np
import itertools
def subdivide(shape, max_chunk_size=500000):
shape = np.array(shape).astype(float)
total_size = shape.prod()
# calculate maximum slice shape:
slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
# create a list of slices for each dimension:
slices = [[slice(left, min(right, n)) \
for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
for n, step_size in zip(shape.astype(int), slice_shape)]
result = np.empty(reduce(lambda a,b:a*len(b), slices, 1), dtype=np.object)
for i, el in enumerate(itertools.product(*slices)): result[i] = el
result.shape = np.ceil(shape / slice_shape).astype(int)
return result
Here's an example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
(slice(0, 2, None), slice(6, 12, None)),
(slice(0, 2, None), slice(12, 15, None))],
[(slice(2, 4, None), slice(0, 6, None)),
(slice(2, 4, None), slice(6, 12, None)),
(slice(2, 4, None), slice(12, 15, None))],
[(slice(4, 6, None), slice(0, 6, None)),
(slice(4, 6, None), slice(6, 12, None)),
(slice(4, 6, None), slice(12, 15, None))]], dtype=object)
>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
[45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
[27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
[81, 82, 83, 84, 85, 86]])
>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type
Here's a solution based on hpaulj's answer:
import numpy as np
import itertools
class Subdivision():
def __init__(self, shape, max_chunk_size=500000):
shape = np.array(shape).astype(float)
total_size = shape.prod()
# calculate maximum slice shape:
slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
# create a list of slices for each dimension:
slices = [[slice(left, min(right, n)) \
for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
for n, step_size in zip(shape.astype(int), slice_shape)]
self.slices = \
np.array(list(itertools.product(*slices)), \
dtype=np.object).reshape(tuple(np.ceil(shape / slice_shape).astype(int)) + (len(shape),))
def __getitem__(self, args):
if type(args) != tuple: args = (args,)
# turn integer index into equivalent slice
args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)
# select the slices
# always select all elements from the last axis (which contains slices for each data dimension)
slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]
return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] + \
[0] * (len(slices.shape) - 2 - i) + [i])])] \
for i in range(len(slices.shape) - 1)))
Example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[0]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> ar[subdiv[:2,1]]
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26],
[36, 37, 38, 39, 40, 41],
[51, 52, 53, 54, 55, 56]])
>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[...,:2]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])
Your slices produce 2x6 and 2x3 arrays.
In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)
In [38]: ar[tuple(subslice[0,0])]
Out[38]:
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26]])
My numpy version expects me to turn the subslice into a tuple. This is the same as
ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]
That's just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2 element tuple - of slices, lists, arrays, or integers. It won't work with an array of slice objects.
How ever it is possible to concatenate the results into a larger array. That can be done after indexing or the slices can be converted into indexing lists.
np.bmat for example concatenates together a 2d arangement of arrays:
In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]],
[ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]:
matrix([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[51, 52, 53, 54, 55, 56, 57, 58, 59]])
You could generalize this. It just uses hstack and vstack on the nested lists. The result is np.matrix but can be converted back to array.
The other approach is to use tools like np.arange, np.r_, np.xi_ to create index arrays. It'll take some playing around to generate an example.
To combine the [0,0] and [0,1] subslices:
In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]
In [66]: i,j
Out[66]: (array([0, 1]), array([ 6, 7, 8, 9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]:
(array([[0],
[1]]), array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14]]))
In [70]: ar[ix]
Out[70]:
array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
Or with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces the 4x9 array.

Categories