Finding the dimension of a nested list in python - python

We can create multi-dimensional arrays in Python by using nested lists, such as:
A = [[1,2,3],
[2,1,3]]
etc.
In this case it is simple: nRows = len(A) and nCols = len(A[0]). However, with more dimensions, or with irregular nesting, it becomes complicated:
A = [[[1,1,[1,2,3,4]],2,[3,[2,[3,4]]]],
[2,1,3]]
etc.
These lists are legal in Python, and the number of dimensions is not known a priori.
In this case, how can I determine the number of dimensions and the number of elements in each dimension?
I'm looking for an algorithm and, if possible, an implementation. I believe it is something similar to DFS. Any suggestions?
P.S.: I'm not looking for any existing packages, though I would like to know about them.

I believe I have solved the problem myself.
It is just a simple DFS.
For the example given above: A = [[[1,1,[1,2,3,4]],2,[3,[2,[3,4]]]],
[2,1,3]]
the answer is as follows:
[[3, 2, 2, 2, 3, 4], [3]]
The total number of dimensions is 7.
I guess I was overthinking... thanks anyway...!
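The poster did not share the code, so the following is only a minimal sketch of one way a depth-first traversal could answer the question. It is not the poster's method and is not meant to reproduce the numbers quoted above. The function names nesting_depth and lengths_by_depth are hypothetical, and "dimension" is interpreted here as nesting depth, which is an assumption.

```python
from collections import defaultdict

def nesting_depth(obj):
    """Return 0 for a non-list, else 1 + the deepest nesting among its items."""
    if not isinstance(obj, list):
        return 0
    # default=0 makes an empty list count as depth 1
    return 1 + max((nesting_depth(item) for item in obj), default=0)

def lengths_by_depth(obj, depth=0, table=None):
    """Depth-first walk that records len() of every list, grouped by depth."""
    if table is None:
        table = defaultdict(list)
    if isinstance(obj, list):
        table[depth].append(len(obj))
        for item in obj:
            lengths_by_depth(item, depth + 1, table)
    return table

A = [[[1, 1, [1, 2, 3, 4]], 2, [3, [2, [3, 4]]]],
     [2, 1, 3]]

print(nesting_depth(A))           # 5
print(dict(lengths_by_depth(A)))  # {0: [2], 1: [3, 3], 2: [3, 2], 3: [4, 2], 4: [2]}
```

For a regular list such as [[1,2,3],[2,1,3]], every entry at a given depth holds identical lengths, so the table reduces to the familiar shape (2, 3).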

Related

2d Array column slicing in Pure Python without for loops

Is it possible to slice a column off a 2d array in pure Python without a for loop or list comprehension? Say for instance you have a 4x4 array of ints:
grid = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]
and let's say you'd like to return the grid without the first row and the last column [[5,6,7],[9,10,11],[13,14,15]]
Is there a slicing syntax that allows you to do this? Excluding the first row is easily achieved with grid = grid[1:4]
However doing something like grid = grid[1:4][0:2] seems like it should work but results in [[5, 6, 7, 8], [9, 10, 11, 12]]. If at all possible, I'd like to avoid having to iterate through it in a for loop/list comprehension. I know that would work, but I'm wondering if there's a more elegant syntax.
To ddejohn's point, this can't be done with slicing notation alone. A good approach that avoids both list comprehensions and for loops is to transpose with list(zip(*matrix)), where matrix is the input list; slicing the transposed rows then slices the original columns.
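As a rough illustration of the zip idea applied to the grid from the question: transpose, slice off the last column (which is now a row), and transpose back. The intermediate names rows, columns and kept are chosen here for readability only.

```python
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

rows = grid[1:]                        # drop the first row with a plain slice
columns = list(zip(*rows))             # transpose: columns become tuples
kept = columns[:-1]                    # drop the last column
result = list(map(list, zip(*kept)))   # transpose back and convert tuples to lists

print(result)  # [[5, 6, 7], [9, 10, 11], [13, 14, 15]]
```

zip and map still iterate internally, so this avoids the loop syntax rather than the work.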

save different array with different length in 1D array

I have arrays of different lengths and I want to store them inside a single array using Python.
A new array is generated after some tests, which is why the arrays have different sizes.
Here is a sample of what I have:
array1=[1,3,5]
array2=[10,12,13,14]
array3=[12,14,14,15,15] #etc
The desired result:
myArray=[[1,3,5],[10,12,13,14],[12,14,14,15,15]]
I tried to use this code
myArray=[]
myArray.append(array1)
myArray.append(array2) #etc
when I print myArray I get:
[[array([1,3,5])], [array([10,12,13,14])], [array([12,14,14,15,15])]]
so when I try to get the second array, for example, I have to use this code
temp = myArray[1]
result = temp[0]
This was working for me, but it seems to have a limitation: it stopped working after a while when I retrieve results inside some loops.
The currently accepted answer makes little sense, so here's what's actually going on: array1, array2, etc. are not plain Python lists; they're almost certainly NumPy arrays. myArray, however, is just a Python list.
Here is a simple program which should allow you to reproduce and understand the difference, at least in how it relates to your program:
import numpy as np
plain_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])
result_list = [plain_list, numpy_array]
print(plain_list) # [1, 2, 3]
print(numpy_array) # [1 2 3]
print(result_list) # [[1, 2, 3], array([1, 2, 3])]
Now, it isn't exactly clear what's happening in your program, since you just write "this was working for me but it looks like it has a limitation and it stopped working after a while when I'm retrieving results using some loops".
Depending on what the rest of the program is doing, numpy arrays may or may not be the appropriate data structure. In any case, please share the entirety of your code as well as an explanation of the program.
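A small sketch of one way to hold NumPy arrays of different lengths in a plain Python list, assuming (as guessed above) that the inputs really are NumPy arrays. The extra level of nesting in the question's printout would be consistent with each array having been wrapped in its own list before appending, but that is a guess, since the full code was not shown.

```python
import numpy as np

array1 = np.array([1, 3, 5])
array2 = np.array([10, 12, 13, 14])
array3 = np.array([12, 14, 14, 15, 15])

my_arrays = []
my_arrays.append(array1)   # append the array itself, not [array1]
my_arrays.append(array2)
my_arrays.append(array3)

second = my_arrays[1]      # one index is enough, no temp[0] step
print(second)              # [10 12 13 14]

# If plain nested lists are wanted, convert each array explicitly.
as_lists = [arr.tolist() for arr in my_arrays]
print(as_lists)            # [[1, 3, 5], [10, 12, 13, 14], [12, 14, 14, 15, 15]]
```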
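First things first: core Python has no dedicated array type among its everyday built-in data structures (the standard library's array module and NumPy's ndarray are separate additions).
Instead, lists and tuples are used.
In your case, the variables array1, array2 and array3 are lists.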
array1=[1,3,5]
array2=[10,12,13,14]
array3=[12,14,14,15,15]
# to get the desired result as myArray=[[1,3,5],[10,12,13,14],[12,14,14,15,15]]
myArray = [array1, array2, array3]
Check the Python documentation to learn more about lists.

How to combine two arrays

I want to combine 2 arrays to get an array composed of the common values in the arrays. For example:
x = np.array ([1,2,3,4,6,11])
y = np.array ([3,6,5,2,9,8])
The result should be z = [2, 3, 6] which are the values that are common to both.
You're looking for the function np.intersect1d(x,y).
Edit: also, keep this readily accessible: https://docs.scipy.org/doc/numpy-1.17.0/numpy-ref-1.17.0.pdf I can't tell you how much I just pop over to it regularly for those kinds of weird one-off functions.
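For reference, a short sketch of np.intersect1d on the arrays from the question, next to a set-based alternative that needs no NumPy call for the intersection itself. The variable name z_plain is introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 6, 11])
y = np.array([3, 6, 5, 2, 9, 8])

# intersect1d returns the sorted, unique values present in both arrays.
z = np.intersect1d(x, y)
print(z)        # [2 3 6]

# Equivalent result with Python sets, as a plain sorted list.
z_plain = sorted(set(x.tolist()) & set(y.tolist()))
print(z_plain)  # [2, 3, 6]
```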

maintaining hierarchically sorted lists in python

I'm not sure if 'hierarchical' is the correct way to label this problem, but I have a series of lists of integers that I intend to keep in a 2D NumPy array, and I need to keep them sorted in the following way:
array[0,:] = [1, 1, 1, 1, 2, 2, 2, 2, ...]
array[1,:] = [1, 1, 2, 2, 1, 1, 2, 2, ...]
array[2,:] = [1, 2, 1, 2, 1, 2, 1, 2, ...]
...
...
array[n,:] = [...]
So the first list is sorted, then the second list is broken into subsections of elements which all have the same value in the first list and those subsections are sorted, and so on down all the lists.
Initially each list will contain only one integer, and I'll then receive new columns that I need to insert into the array in such a way that it remains sorted as discussed above.
The purpose of keeping the lists in this order is that if I'm given a new column of integers I need to check whether an exact copy of that column exists in the array or not as efficiently as possible, and I assume this ordering will help me do it. It may be that there is a better way to make that check than keeping the lists like this - if you have thoughts about that please mention them!
I assume the correct position for a new column can be found by a series of binary searches but my attempts have been messy - any thoughts on doing this in a tidy and efficient way?
thanks!
If I understand your problem correctly, you have a bunch of sequences of numbers that you need to process, but you need to be able to tell if the latest one is a duplicate of one of the sequences you've processed before. Currently you're trying to insert the new sequences as columns in a numpy array, but that's awkward since numpy is really best with fixed-sized arrays (concatenating or inserting things is always going to be slow).
A much better data structure for your needs is a set. Membership tests and the addition of new items on a set are both very fast (amortized O(1) time complexity). The only limitation is that a set's items must be hashable (which is true for tuples, but not for lists or numpy arrays).
Here's the outline of some code you might be able to use:
seen = set()
for seq in sequences:
    tup = tuple(seq) # you only need to make a tuple if seq is not already hashable
    if tup not in seen:
        seen.add(tup)
        # do whatever you want with seq here, it has not been seen before
    else:
        pass # if you want to do something with duplicated sequences, do it here
You can also look at the unique_everseen recipe in the itertools documentation, which does basically the same as the above, but as a well-optimized generator function.
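To make the set-based check concrete, here is a minimal generator sketch in the spirit of that recipe. It is not the itertools recipe itself; the function name unique_sequences and the sample columns are made up for illustration.

```python
def unique_sequences(sequences):
    """Yield each sequence the first time it appears, skipping exact repeats."""
    seen = set()
    for seq in sequences:
        key = tuple(seq)      # lists are unhashable, tuples are hashable
        if key not in seen:
            seen.add(key)
            yield seq

# Hypothetical incoming columns, two of which are duplicates.
columns = [[1, 1, 1], [1, 2, 1], [1, 1, 1], [2, 1, 2], [1, 2, 1]]
fresh = list(unique_sequences(columns))
print(fresh)  # [[1, 1, 1], [1, 2, 1], [2, 1, 2]]
```

No sorted order has to be maintained, since the set lookup replaces the binary searches.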

Time-varying data: list of tuples vs 2D array?

My example code is in python but I'm asking about the general principle.
If I have a set of data in time-value pairs, should I store these as a 2D array or as a list of tuples? for instance, if I have this data:
v=[1,4,4,4,23,4]
t=[1,2,3,4,5,6]
Is it generally better to store it like this:
data=[v,t]
or as a list of tuples:
data=[(1,1),(4,2),(4,3),...]
Is there a "standard" way of doing this?
If speed is your biggest concern, in Python, look at Numpy.
In general, you should choose a data structure that makes dealing with the data natural and easy. Worry about speed later, after you know it works!
As for an easy data structure, how about a list of tuples:
v=[1,4,4,4,23,4]
t=[1,2,3,4,5,6]
data=[(1,1),(4,2),(4,3),...]
Then you can unpack like so:
v,t=data[1]
#v,t are 4,2
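Moving between the two layouts from the question is cheap with zip, so the choice is not a one-way door. A minimal sketch (the names pairs, v_back and t_back are illustrative):

```python
v = [1, 4, 4, 4, 23, 4]
t = [1, 2, 3, 4, 5, 6]

# Two parallel lists -> list of (value, time) tuples.
pairs = list(zip(v, t))
print(pairs[1])        # (4, 2)

# List of tuples -> two parallel lists again.
v_back, t_back = map(list, zip(*pairs))
print(v_back)          # [1, 4, 4, 4, 23, 4]
print(t_back)          # [1, 2, 3, 4, 5, 6]
```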
The aggregate array container is probably the best choice. Assuming that your time points are not regularly spaced (and therefore you need to keep track of it rather than just use the indexing), this allows you to take slices of your entire data set like:
import numpy as np
v=[1,4,4,4,23,4]
t=[1,2,3,4,5,6]
data = np.array([v,t])
Then you could slice it to get a subset of the data easily:
data[:,2:4] #array([[4, 4],[3, 4]])
ii = [1,2,5] # Fancy indexing
data[:,ii] # array([[4, 4, 4],
# [2, 3, 6]])
You could try a dictionary? In other languages this may be known as a hash-map, hash-table, associative array, or some other term which means the same thing. Of course it depends on how you intend to access your data.
Instead of:
v=[1,4,4,4,23,4]
t=[1,2,3,4,5,6]
you'd have:
v_with_t_as_key = {1:1, # excuse the name...
2:4,
3:4,
4:4,
5:23,
6:4}
This is a fairly standard construct in Python. Since Python 3.7, regular dictionaries preserve insertion order; on older versions, look at OrderedDict in collections if order is important.
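The same dictionary can be built directly from the two lists rather than typed by hand. This sketch assumes the time points are unique, since a repeated key would silently overwrite the earlier value; the name value_at is illustrative.

```python
v = [1, 4, 4, 4, 23, 4]
t = [1, 2, 3, 4, 5, 6]

# Map each time point to its value.
value_at = dict(zip(t, v))

print(value_at[5])     # 23
print(list(value_at))  # [1, 2, 3, 4, 5, 6]
```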
I've found that for exploring and prototyping, it's more convenient to store as a list/jagged array of columns, where the first column is the observational index and each column after that is a variable.
data=[(1,2,3,4,5,6),(1,4,4,4,23,4)]
Most of the time I'm loading many observations with many variables, and then sorting, formatting, or displaying one or more of those variables, or even joining two sets of data with columns as parameters. It's a lot rarer that I need to pull a subset of observations out. Even if I did, it's more convenient to use a function that returns a subset of the data given a column of observation indexes.
Having said that, I still use functions to convert jagged arrays to 2d arrays and to transpose 2d arrays.
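The helper functions are not shown, so the following is only a guess at what such conversions might look like. The names pad_jagged and transpose are hypothetical, and padding short columns with None is an assumption.

```python
def pad_jagged(columns, fill=None):
    """Pad every column to the length of the longest one."""
    width = max((len(col) for col in columns), default=0)
    return [list(col) + [fill] * (width - len(col)) for col in columns]

def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]

data = [(1, 2, 3, 4, 5, 6), (1, 4, 4, 4, 23, 4)]
print(transpose(data)[4])   # [5, 23]

jagged = [(1, 2, 3), (1, 4)]
print(transpose(pad_jagged(jagged)))  # [[1, 1], [2, 4], [3, None]]
```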
