Problems converting a List to an Array in Python

What I want to do:
I want to create an array and add each item from a list to the array. This is what I have so far:
count = 0
arr = []
with open(path, encoding='utf-8-sig') as f:
    data = f.readlines()  # data is the list
for s in data:
    arr[count] = s
    count += 1
What am I doing wrong? The error I get is IndexError: list assignment index out of range

When you try to assign to arr at index 0, there is nothing there yet, because arr is empty. What you are actually trying to do is add to it, so you should use arr.append(s).

Your arr is an empty list, so arr[count] = s raises that error.
Either initialize the list with placeholder elements, or use the list's append method. Since you do not know in advance how many elements you will be adding, append is the better choice here:
for s in data:
    arr.append(s)  # no index counter is needed any more
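A minimal runnable sketch of the append fix, assuming an in-memory io.StringIO stands in for the real file at path (the sample text is made up for illustration):

```python
import io

# Stand-in for open(path, encoding='utf-8-sig'); the contents are hypothetical.
f = io.StringIO("first line\nsecond line\nthird line\n")
data = f.readlines()  # a list of lines, each still ending in '\n'

arr = []
for s in data:
    arr.append(s)  # grows the list by one slot and stores s in it

print(len(arr))  # 3
```

Because append grows the list as needed, there is no index that can run past the end.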

It's worth taking a step back and asking what you're trying to do here.
f is already an iterable of lines: something you can loop over with for line in f:. But it's "lazy"—once you loop over it once, it's gone. And it's not a sequence—you can loop over it, but you can't randomly access it with indexes or slices like f[20] or f[-10:].
f.readlines() copies that into a list of lines: something you can loop over, and index. While files have the readlines method for this, it isn't really necessary: you can convert any iterable into a list the same way by calling list(f).
Your loop appears to be an attempt to create another list of the same lines, which you could do with just list(data), although it's not clear why you need another list in the first place.
Also, the term "array" betrays some possible confusion.
A Python list is a dynamic array, which can be indexed and modified, but can also be resized by appending, inserting, and deleting elements. So, technically, arr is an array.
But usually when people talk about "arrays" in Python, they mean fixed-size arrays, usually of fixed-size objects, like those provided by the stdlib array module, the third-party numpy library, or special-purpose types like the builtin bytearray.
In general, converting a list or other iterable into any of these works the same way as converting into a list: just call the constructor. For example, if you have a list of numbers in the range 0-255, you can do bytearray(lst) to get a bytearray of the same numbers. Or, if you have a list of equal-length lists of float values, np.array(lst) will give you a 2D numpy array of floats. And so on.
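A short sketch of the "just call the constructor" idea for the three kinds of arrays mentioned above; the sample values are made up for illustration:

```python
import array
import numpy as np

byte_values = [0, 127, 255]
ba = bytearray(byte_values)              # builtin: every element must be in 0-255
print(list(ba))                          # [0, 127, 255]

doubles = array.array('d', [1.5, 2.5])   # stdlib typed array of C doubles
print(doubles.tolist())                  # [1.5, 2.5]

rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # equal-length inner lists
matrix = np.array(rows)                    # 2D numpy array built in one call
print(matrix.shape, matrix.dtype)          # (2, 3) float64
```

Each constructor enforces its own element type, which is what makes these "arrays" in the fixed-type sense rather than general lists.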
So, why doesn't your code work?
When you write arr = [], you're creating a list of 0 elements.
When you write arr[count] = s, you're trying to set the countth element in the list to s. But there is no countth element. You're writing past the end of the list.
One option is to call arr.append(s) instead. This makes the list 1 element longer than it used to be, and puts s in the new slot.
Another option is to create a list of the right size in the first place, like arr = [None for _ in data]. Then, arr[count] = s can replace the None in the countth slot with s.
But if you really just want a copy of data in another list, you're better off just using arr = list(data), or arr = data[:].
And if you don't have any need for another copy, just do arr = data, or just use data as-is—or even, if it works for your needs, just use f in the first place.
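The options above, side by side in a small sketch (the data list stands in for f.readlines(); enumerate replaces the manual count variable):

```python
data = ["a\n", "b\n", "c\n"]  # hypothetical stand-in for f.readlines()

# Option 1: grow an empty list one element at a time.
appended = []
for s in data:
    appended.append(s)

# Option 2: preallocate placeholders, then fill each slot by index.
prealloc = [None for _ in data]
for count, s in enumerate(data):
    prealloc[count] = s

# Option 3: copy in one step.
copied = list(data)
sliced = data[:]

# Option 4: no copy at all, just a second name for the same list.
alias = data

print(appended == prealloc == copied == sliced == data)  # True
print(copied is data, alias is data)                     # False True
```

The last line shows the practical difference: a copy is a separate list, while arr = data only binds another name to the same object, so changes through one name are visible through the other.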

It seems like you are coming from a MATLAB or R background. When you do arr = [], it creates an empty list, not an array.
import numpy
count = 0
with open(path, encoding='utf-8-sig') as f:
    data = f.readlines()  # data is the list
size = len(data)
arr = numpy.zeros((size, 1), dtype=object)  # object dtype so each cell can hold a string
for s in data:
    arr[count, 0] = s
    count += 1
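With its default dtype, numpy.zeros creates a float64 array, which cannot hold arbitrary text lines. A minimal sketch, assuming the lines are non-numeric strings (the sample data is hypothetical), showing the failure and the dtype=object alternative:

```python
import numpy as np

data = ["alpha", "beta"]  # hypothetical lines read from the file

floats = np.zeros((len(data), 1))  # default dtype is float64
try:
    floats[0, 0] = data[0]          # 'alpha' cannot be converted to a float
except ValueError as exc:
    print("float array rejected the string:", exc)

arr = np.zeros((len(data), 1), dtype=object)  # object dtype stores any Python object
for count, s in enumerate(data):
    arr[count, 0] = s
print(arr[:, 0].tolist())  # ['alpha', 'beta']
```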

Related

How to remove numbers from an array if they exist in another array

Here is my code so far. (Using NumPy for arrays)
avail_nums = np.array([1,2,3,4,5,6,7,8,9]) # initial available numbers
# print(avail_nums.shape[0])
# print(sudoku[spaces[x,1],spaces[x,2]]) # index of missing numbers in sudoku
print('\n')
# print(sudoku[spaces[x,1],:]) # rows of missing numbers
for i in range(sudoku[spaces[x,1],:].shape[0]): # Number of elements in the missing number row
    for j in range(avail_nums.shape[0]): # Number of available numbers
        if(sudoku[spaces[x,1],i] == avail_nums[j]):
            avail_nums = np.delete(avail_nums,[j])
print(avail_nums)
A for loop cycles through all the elements in the sudoku row, and a nested loop cycles through avail_nums. Every time there is a match (detected by the if statement), that value is deleted from avail_nums, so that in the end none of the numbers in the sudoku row remain in avail_nums.
I'm greeted with this error:
IndexError: index 8 is out of bounds for axis 0 with size 8
pointing to the line with the if statement.
Because avail_nums shrinks, this happens after the first deletion. How can I resolve this issue?
When you delete items from the array, it gets smaller, but your inner for loop does not know that: range(avail_nums.shape[0]) was evaluated once, with the original size, so j eventually runs past the end and you get an out-of-bounds error. I would avoid deleting from an array while iterating over it.
My solution is to build a temporary list that contains only the allowed elements and then assign it to the original name:
temporary_array = list()
for element in array:
    if element in another_array:  # membership test with the in operator
        continue  # ignore it
    temporary_array.append(element)
array = temporary_array
The resulting list will contain only the elements that do not exist in another_array.
You could also use a list comprehension:
temporary_array = [ element for element in array if element not in another_array ]
array = temporary_array
This is the same idea expressed with more compact Python syntax.
Another option is the builtin filter(), which takes a predicate function and an iterable. In Python 3 it returns a lazy iterator rather than a list, so wrap it in list() if you need a list. Here I use a lambda function, another handy piece of Python syntax:
array = list(filter(lambda x: x not in another_array, array))
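A runnable sketch of the three pure-Python variants above, with made-up sample values. One design choice is swapped in here: another_array is first converted to a set, because membership tests on a set take constant time on average, while on a list they scan every element:

```python
array = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical available numbers
another_array = [5, 3, 7]            # hypothetical numbers to remove

exclude = set(another_array)  # fast membership tests

via_loop = []
for element in array:
    if element in exclude:
        continue  # skip numbers that must be removed
    via_loop.append(element)

via_comprehension = [element for element in array if element not in exclude]
via_filter = list(filter(lambda x: x not in exclude, array))  # filter() is lazy

print(via_loop)  # [1, 2, 4, 6, 8, 9]
```

None of these versions deletes from the list being iterated, so the index problem from the question cannot occur.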
Since you are using NumPy, you should also look at numpy.extract() (https://numpy.org/doc/stable/reference/generated/numpy.extract.html). For example, using numpy.where(), numpy.in1d() and numpy.extract(), we could do:
condition = numpy.where(numpy.in1d(np_array, np_another_array),False,True)
np_array = numpy.extract(condition, np_array)
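The same filtering can be sketched with a boolean mask from numpy.isin, which newer NumPy releases recommend over numpy.in1d. The sudoku row below is a hypothetical example with 0 marking empty cells:

```python
import numpy as np

avail_nums = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
row = np.array([5, 3, 0, 0, 7, 0, 0, 0, 0])  # hypothetical sudoku row

mask = ~np.isin(avail_nums, row)  # True where the number is NOT in the row
remaining = avail_nums[mask]      # boolean indexing keeps only those numbers
print(remaining)                  # [1 2 4 6 8 9]

# np.setdiff1d gives the same result here, but always returns sorted unique values.
print(np.setdiff1d(avail_nums, row))  # [1 2 4 6 8 9]
```

Boolean indexing builds a new array in one vectorized step, so nothing is deleted while a loop is still indexing into the array.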

How do I append ndarrays to a list and access each stored ndarray from the list?

I'm trying to create a list that stores all the ndarrays generated by my for loop:
for index in range(len(fishim)):
    cropped_fishim = []
    cropped_image = crop_img(fishim[index], labeled)#call function here.
    cropped_fishim.append(cropped_image)
Then I want to use cropped_fishim[index] to access each stored ndarray for further processing. I have also tried the extend method instead of append. The append method seems to pack all the ndarrays together and does not let me access each individual ndarray stored in cropped_fishim. The extend method does store the ndarrays separately, but cropped_fishim[index] would only access the index-th column array. Any help would be appreciated.
append is correct; your problem is in the line above it:
for index in range(len(fishim)):
    cropped_fishim = []
    cropped_image = crop_img(fishim[index], labeled)#call function here.
    cropped_fishim.append(cropped_image)
Each time through the loop, you reset the variable to [], then append the new image array to that empty list.
So, at the end of the loop, you have a list containing just one thing, the last image array.
To fix that, just move the assignment before the loop, so you only do it once instead of over and over:
cropped_fishim = []
for index in range(len(fishim)):
    cropped_image = crop_img(fishim[index], labeled)#call function here.
    cropped_fishim.append(cropped_image)
However, once you’ve got this working, you can simplify it.
You almost never need—or want—to loop over range(len(something)) in Python; you can just loop over something:
cropped_fishim = []
for fishy in fishim:
    cropped_image = crop_img(fishy, labeled)#call function here.
    cropped_fishim.append(cropped_image)
And then, once you’ve done that, this is exactly the pattern of a list comprehension, so you can optionally collapse it into one line:
cropped_fishim = [crop_img(fishy, labeled) for fishy in fishim]
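A self-contained sketch of the fixed version. crop_img here is a hypothetical stand-in (the real function is not shown in the question) that keeps the top-left 2x2 block, and fishim is a made-up list of 4x4 images:

```python
import numpy as np

def crop_img(image, labeled):
    # Hypothetical stand-in for the asker's crop function; ignores labeled.
    return image[:2, :2]

fishim = [np.full((4, 4), k) for k in range(3)]  # three made-up 4x4 "images"
labeled = None  # placeholder; the real labeled argument is not shown

cropped_fishim = [crop_img(fishy, labeled) for fishy in fishim]

print(len(cropped_fishim))   # 3
print(cropped_fishim[2])     # the third cropped image: a 2x2 array filled with 2
```

Each list element stays a separate ndarray, so cropped_fishim[index] returns a whole cropped image rather than a column.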
Problem solved. Thanks!
Easy trick learned:
cropped_fishim = [None]*len(fishim)
for index in range(len(fishim)):
    cropped_image = crop_img(fishim[index], labeled)#call function here.
    cropped_fishim[index] = cropped_image

Efficient way to make numpy object arrays intern strings

Consider numpy arrays of the object dtype. I can shove anything I want in there.
A common use case for me is to put strings in them. However, for very large arrays, this may use up a lot of memory, depending on how the array is constructed. For example, if you assign a long string (e.g. "1234567890123456789012345678901234567890") to a variable, and then assign that variable to each element in the array, everything is fine:
arr = np.zeros((100000,), dtype=object)
arr[:] = "1234567890123456789012345678901234567890"
The interpreter now has one large string in memory, and an array full of pointers to this one object.
However, we can also do it wrong:
arr2 = np.zeros((100000,), dtype=object)
for idx in range(100000):
    arr2[idx] = str(1234567890123456789012345678901234567890)
Now, the interpreter has a hundred thousand copies of my long string in memory. Not so great.
(Naturally, the above example is contrived, since it deliberately generates a new string each time; in real life, imagine reading a string from each line of a file.)
What I want to do is, instead of assigning each element to the string, first check if it's already in the array, and if it is, use the same object as the previous entry, rather than the new object.
Something like:
arr = np.zeros((100000,), dtype=object)
seen = []
for idx, string in enumerate(file): # Length of file is exactly 100000
    if string in seen:
        arr[idx] = seen[seen.index(string)]
    else:
        arr[idx] = string
        seen.append(string)
(Apologies for not posting fully running code. Hopefully you get the idea.)
Unfortunately this requires a large number of superfluous operations on the seen list. I can't figure out how to make it work with sets either.
Suggestions?
Here's one way to do it, using a dictionary whose values are equal to its keys:
seen = {}
for idx, string in enumerate(file):
    arr[idx] = seen.setdefault(string, string)
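A runnable check of the setdefault trick, with hypothetical strings built at runtime so that equal values start out as separate objects, as they would when read from a file. For comparison it also shows sys.intern, a standard-library alternative (not part of the original answer) that maps equal strings to one canonical object through a global table instead of a local dict:

```python
import sys
import numpy as np

# Equal strings created at runtime (like lines read from a file).
lines = ["".join(["1234567890"] * 4) for _ in range(5)]

arr = np.empty((len(lines),), dtype=object)
seen = {}
for idx, string in enumerate(lines):
    # First occurrence is stored and returned; later ones return the stored object.
    arr[idx] = seen.setdefault(string, string)
print(all(item is arr[0] for item in arr))  # True

arr2 = np.empty((len(lines),), dtype=object)
for idx, string in enumerate(lines):
    arr2[idx] = sys.intern(string)  # canonical object for equal strings
print(all(item is arr2[0] for item in arr2))  # True
```

With the dict approach the cache lives only as long as seen does; sys.intern keeps its table process-wide, which may or may not suit a given workload.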

List of Lists to 2D Array in Python

I have a list of lists in Python that holds a mix of values: some are strings and some are integers.
data = [[0,1,2],["a", "b", "c"]]
I am wondering if there is a way to easily convert any length list like that to a 2D Array without using NumPy. I am working with System.Array because that's the format required.
I understand that I can create a new instance of an Array and then use for loops to write all data from list to it. I was just curious if there is a nice Pythonic way of doing that.
x = len(data)
y = len(data[0])
arr = Array.CreateInstance(object, x, y)
Then I can loop through my data and set the arr values right?
arr = Array.CreateInstance(object, x, y)
for i in range(len(data)):
    for j in range(len(data[0])):
        arr.SetValue(data[i][j], i, j)
I want to avoid looping like that if possible. Thank you,
P.S. This is for Excel Interop, where I can set a whole Range in Excel by setting it equal to an Array. That's why I want to convert a list to an Array. Thank you,
What I am wondering about is that Array is a typed object: is it possible for its elements to be either strings or integers? I think I might be constrained to only one type. Right? If so, is there any other type of data that I can use?
Does creating it as an Array of object ensure that I can mix str and int values in it?
Also, I thought I could use this:
arr= Array[Array[object]](map(object, data))
but it throws an error. Any ideas?
You can use Array.CreateInstance to create a single-dimensional or multidimensional array. Since the Array.CreateInstance method takes a Type, you can specify any type you want. For example:
from System import Array, String, Int32  # .NET types, available in IronPython

# gives you an array of strings
myArrayOfString = Array.CreateInstance(String, 3)
# gives you an array of integers
myArrayOfInteger = Array.CreateInstance(Int32, 3)
# a Python list holding both arrays (not a true multidimensional System.Array)
myArrayOfStringAndInteger = [myArrayOfString, myArrayOfInteger]
Hope this helps. Also see the MSDN documentation for examples of how to use Array.CreateInstance.

Appending arrays in numpy

I have a loop that reads through a file until the end is reached. On each pass through the loop, I extract a 1D numpy array. I want to append this array to another numpy array in the 2D direction. That is, I might read in something of the form
x = [1,2,3]
and I want to append it to something of the form
z = [[0,0,0],
[1,1,1]]
I know I can simply do z = numpy.append(z, [x], axis=0) and achieve my desired result of
z = [[0,0,0],
[1,1,1],
[1,2,3]]
My issue comes from the fact that on the first run through the loop, I don't have anything to append to yet, because the first array read in is the first row of the 2D array. I don't want to write an if statement to handle the first case because that is ugly. If I were working with lists, I could simply do z = [] before the loop and, every time I read in an array, do z.append(x) to achieve my desired result. However, I can find no way of doing something similar in numpy. I can create an empty numpy array, but then I can't append to it in the way I want. Can anyone help? Am I making any sense?
EDIT:
After some more research, I found another workaround that technically does what I want, although I think I will go with the solution given by @Roger Fan, given that numpy appending is very slow. I'm posting it here just so it's out there.
I can still define z = [] before the loop. Then I append my arrays with z = np.append(z, x). This will ultimately give me something like
z = [0,0,0,1,1,1,1,2,3]
Then, because all the arrays I read in are the same size, after the loop I can simply resize with np.resize(z, (n, m)) and get what I'm after.
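A runnable sketch of this flat-append workaround, with the three example rows hard-coded in place of the file. It uses reshape rather than np.resize: reshape raises an error if the element count does not match, whereas np.resize silently repeats or truncates data:

```python
import numpy as np

rows = [np.array([0, 0, 0]), np.array([1, 1, 1]), np.array([1, 2, 3])]  # "read" in the loop

z = np.array([])  # empty 1D float array, so the first pass needs no special case
for x in rows:
    z = np.append(z, x)  # returns a NEW flattened array each time (copies everything)

z = z.reshape(len(rows), 3)  # all rows have length 3, so restore the 2D shape
print(z.shape)  # (3, 3)
```

Because every np.append copies the whole array, this gets slower as the file grows, which is why the answers below avoid it.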
Don't do it. Read the whole file into one array, using for example numpy.genfromtxt().
With this one array, you can then loop over the rows, loop over the columns, and perform other operations using slices.
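A minimal sketch with numpy.genfromtxt, assuming the file holds whitespace-separated numbers, one row per line; an io.StringIO with made-up contents stands in for the real file:

```python
import io
import numpy as np

# Hypothetical file contents: one row of numbers per line.
text = io.StringIO("0 0 0\n1 1 1\n1 2 3\n")

z = np.genfromtxt(text)  # parses every line into one 2D float array
print(z.shape)           # (3, 3)

for row in z:            # loop over rows
    print(row.sum())
print(z[:, 1])           # second column via slicing
```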
Alternatively, you can create a regular list, append a lot of arrays to that list, and in the end generate your desired array from the list using either numpy.array(list_of_arrays) or, for more control, numpy.vstack(list_of_arrays).
The idea in this second approach is "delayed array creation": find and organize your data first, and then create the desired array once, already in its final form.
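Delayed array creation can be sketched like this, with the example rows hard-coded in place of the file reading:

```python
import numpy as np

list_of_arrays = []
for x in ([0, 0, 0], [1, 1, 1], [1, 2, 3]):  # pretend each row comes from the file
    list_of_arrays.append(np.array(x))       # cheap list append, no array copy

z = np.vstack(list_of_arrays)  # build the 2D array once, at the end
print(z.shape)                 # (3, 3)

same = np.array(list_of_arrays)  # equivalent here, since all rows have equal length
print(np.array_equal(z, same))   # True
```

An empty list is a natural starting value, so the first row needs no special handling.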
As @heltonbiker mentioned in his answer, something like np.genfromtxt is going to be the best way to do this if it fits your needs. Otherwise, I suggest reading the answers to this question about appending to numpy arrays. Basically, numpy array appending is extremely slow, because each np.append call copies the whole array, and it should be avoided whenever possible. There are two much better (and faster by about 20x) solutions:
If you know the length in advance, you can preallocate your array and assign to it.
import numpy as np

length_of_file = 5000
results = np.empty(length_of_file)
with open('myfile.txt', 'r') as f:
    for i, line in enumerate(f):
        results[i] = processing_func(line)  # processing_func: your own per-line parser
Otherwise, just keep a list of lists or list of arrays and convert it to a numpy array all at once.
results = []
with open('myfile.txt', 'r') as f:
    for line in f:
        results.append(processing_func(line))
results = np.array(results)
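Both approaches in one self-contained sketch. processing_func is hypothetical here (it just sums the integers on a line), and io.StringIO with made-up contents replaces myfile.txt:

```python
import io
import numpy as np

def processing_func(line):
    # Hypothetical per-line processing: sum the integers on the line.
    return sum(int(tok) for tok in line.split())

TEXT = "1 2\n3 4\n5 6\n"  # made-up file contents

# Preallocate when the number of lines is known up front.
results_pre = np.empty(3)
for i, line in enumerate(io.StringIO(TEXT)):
    results_pre[i] = processing_func(line)

# Otherwise collect results in a list and convert once at the end.
results_list = []
for line in io.StringIO(TEXT):
    results_list.append(processing_func(line))
results_list = np.array(results_list)

print(np.array_equal(results_pre, results_list))  # True
```

Note that np.empty defaults to float64, while np.array infers an integer dtype from the Python ints, so the values match but the dtypes differ.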
