MemoryError with a very large list - Python

I'm writing a script in Python, and I have to create a fairly big list containing exactly 248956422 integers. Some of the "0" entries in this list will be changed to 1, 2 or 3, because I have 8 lists: 4 with the start positions of genes and 4 with their end positions.
I have to iterate over "anno" several times, because the numbers replacing 0 can change again in a later iteration.
"anno" then has to be written to a file to create an annotation file.
My question: how can I split the work, or do it on the fly, so that I don't get a MemoryError while still replacing "0" with other values, and 1, 2, 3 with others?
Maybe by rewriting the file? I'm waiting for your advice; please ask if anything I wrote is unclear.
whole_st_gen = []   # example: if whole_st_gen contains 177
whole_end_gen = []  # and whole_end_gen contains 200,
whole_st_ex = []    # then positions 177 to 200
whole_end_ex = []   # must be set to "1"
whole_st_mr = []    # these lists can have 1,000,000+ elements
whole_end_mr = []   # each st/end pair of the same kind has equal length
whole_st_nc = []
whole_end_nc = []   # in the real script these lists contain values
length = 248956422
anno = ['0' for i in range(0, length)]  # here I get the MemoryError
# then I wanted to do something like this:
for j in range(0, len(whole_st_gen)):
    for y in range(whole_st_gen[j], whole_end_gen[j]):
        anno[y] = '1'

You might be better off determining the value of each element in anno on the fly:
def anno():
    # xrange is Python 2; use range in Python 3
    for idx in xrange(248956422):
        elm = "0"
        for j in range(0, len(whole_st_gen)):
            if whole_st_gen[j] <= idx < whole_end_gen[j]:
                elm = "1"
        for j in range(0, len(whole_st_ex)):
            if whole_st_ex[j] <= idx < whole_end_ex[j]:
                elm = "2"
        for j in range(0, len(whole_st_mr)):
            if whole_st_mr[j] <= idx < whole_end_mr[j]:
                elm = "3"
        for j in range(0, len(whole_st_nc)):
            if whole_st_nc[j] <= idx < whole_end_nc[j]:
                elm = "4"
        yield elm
Then you just iterate with for elm in anno().
I got an edit proposal from the OP suggesting one function for each of whole_*_gen, whole_*_ex and so on, something like this:
def anno_ex():
    for idx in xrange(248956422):
        elm = "0"
        for j in range(0, len(whole_st_ex)):
            if whole_st_ex[j] <= idx <= whole_end_ex[j]:
                elm = "2"
        yield elm
That's of course doable, but each such generator only applies the changes from one pair of lists (here whole_*_ex), and you would need to combine them afterwards when writing to the file, which may be a bit awkward:
for a, b, c, d in zip(anno_st(), anno_ex(), anno_mr(), anno_nc()):
    if d != "0":
        write_to_file(d)
    elif c != "0":
        write_to_file(c)
    elif b != "0":
        write_to_file(b)
    else:
        write_to_file(a)
However, if you only want to apply some of the change sets, you could write a function that takes them as parameters:
def anno(*args):
    for idx in xrange(248956422):
        elm = "0"
        # each argument is a (starts, ends, tag) triple
        for st, end, tag in args:
            for j in range(0, len(st)):
                if st[j] <= idx < end[j]:
                    elm = tag
        yield elm
And then call it by supplying the lists (for example, with only the first two change sets):
for tag in anno((whole_st_gen, whole_end_gen, "1"),
                (whole_st_ex, whole_end_ex, "2")):
    write_to_file(tag)

You could use a bytearray object to have a much more compact memory representation than a list of integers:
anno = bytearray(b'\0' * 248956422)
print(anno[0]) # → 0
anno[0] = 2
print(anno[0]) # → 2
print(anno.__sizeof__()) # → 248956447 (on my computer)
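Building on the bytearray idea, the intervals from the question could be marked with slice assignment instead of a per-position loop. This is only a sketch under assumptions: the function name annotate and the (starts, ends, tag) triples are hypothetical, tags are small integers rather than strings, and every interval is half-open [start, end) and lies inside the array, as with range() in the question.

```python
def annotate(length, interval_sets):
    """Return a bytearray of `length` zeros with each interval set to its tag.

    interval_sets is a list of (starts, ends, tag) triples (hypothetical layout).
    Later triples overwrite earlier ones where intervals overlap.
    """
    anno = bytearray(length)  # zero-filled, one byte per position
    for starts, ends, tag in interval_sets:
        for start, end in zip(starts, ends):
            # Assumes 0 <= start <= end <= length, so the array size never changes
            anno[start:end] = bytes([tag]) * (end - start)
    return anno

# Tiny stand-in for the real 248956422 positions
anno = annotate(20, [([2, 10], [5, 12], 1), ([3], [4], 2)])
print(list(anno))
```

Because the whole array stays in memory at roughly one byte per position, it can be modified over several passes and written out once at the end.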

Instead of creating a list with a list comprehension, I suggest creating an iterator with a generator expression, which produces the values on demand instead of keeping all of them in memory. Also, since you never use the loop variable i, you can name it _ to mark it as a throwaway variable.
anno = ('0' for _ in range(0, length))  # In Python 2.x use xrange() instead of range()
But note that an iterator is a one-shot iterable: you cannot use it again after iterating over it once. If you want to use it multiple times, you can create N independent iterators from it using itertools.tee().
Also note that you cannot change it in place. If you want to change some elements based on a condition, you can create a new iterator by iterating over the old one and applying the condition in a generator expression.
For example:
new_anno = (transform(i) for i in anno if condition(i))  # transform() and condition() are placeholders for your own logic
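As a small illustration of the itertools.tee() remark, here is a sketch with a tiny stand-in length. Note that tee() buffers every item that one iterator has consumed and another has not, so fully exhausting one copy before starting the other keeps all items in memory; for very large inputs, recreating the generator may be the cheaper option.

```python
from itertools import tee

length = 5  # tiny stand-in for 248956422
anno = ('0' for _ in range(length))

# tee() returns independent iterators over the same underlying stream
first_pass, second_pass = tee(anno, 2)

first_result = list(first_pass)
second_result = list(second_pass)
print(first_result)
print(second_result)
```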

Related

How to handle operating on items in a list without perfectly even len()?

I'm trying to operate on every 5 items in a list, but can't figure out how to handle the remaining items if they don't divide evenly into 5. Right now I'm using modulo, but I can't shake the feeling it's not quite the right answer. Here's an example...
list = ["ValA","ValB","ValC","ValD","ValE","ValF","ValG","ValH","ValI","ValJ","ValK","ValL","ValM","ValN",]
newlist = []
i = 0
for o in list:
    i += 1
    newlist.append(o)
    if i % 5 == 0:
        for obj in newlist:
            function_for(obj)
        newlist.clear()
This code will execute function_for() twice, but not a third time to handle the remaining 4 values. If I add an 'else' statement it runs on every execution.
What's the correct way to handle a situation like this?
This way is pretty easy, if you don't mind modifying the list:
mylist = ["ValA","ValB","ValC","ValD","ValE","ValF","ValG","ValH","ValI","ValJ","ValK","ValL","ValM","ValN",]
while mylist:
    function_for( mylist[:5] )
    mylist = mylist[5:]
You can also check if the index is equal to the length of the list. (Additionally, it is more idiomatic to use enumerate instead of a counter variable here.)
lst = ["ValA","ValB","ValC","ValD","ValE","ValF","ValG","ValH","ValI","ValJ","ValK","ValL","ValM","ValN",]
newlist = []
for i, o in enumerate(lst, 1):
    newlist.append(o)
    if i % 5 == 0 or i == len(lst):
        print(newlist)
        newlist.clear()
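Another option, sketched here with a hypothetical helper named chunks, is to step through the list in strides of 5 and slice. The last slice is simply shorter when the length is not a multiple of 5, and the original list is left unmodified.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

values = ["Val" + letter for letter in "ABCDEFGHIJKLMN"]  # 14 items
for chunk in chunks(values, 5):
    print(chunk)  # in the question this would be: for obj in chunk: function_for(obj)
```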

If the input number is in the list add its index to a new one

I want to check if the input number is in the list, and if so - add its index in the original list to the new one. If it's not in the list - I want to add a -1.
I tried using a for loop and appending like this, but it makes the program slow.
n = int(input())
k = [int(x) for x in input().split()]
z = []
m = int(input())
for i in range(m):
    a = int(input())
    if a in k:
        z.append(k.index(a))
    else:
        z.append(-1)
The input should look like this :
3
2 1 3
1
8
3
And the output should be :
1
-1
2
How can I do what I'm trying to do more efficiently/quickly?
There are many approaches to this problem. This is typical when you're first starting out in programming: the simpler the problem, the more options you have. Which option to choose depends on what you have and what you want.
In this case we're expecting user input of this form:
3
2 1 3
1
8
3
One approach is to build a dict for lookups instead of using list operations. A dict lookup takes constant time on average, while a in k and k.index(a) each scan the list. You can use enumerate to get both the index i and the value x for each item of the user's input, then use int(x) as the key and associate it with the index.
The key should always be the data you have, and the value should always be the data you want. (We have a value, we want the index)
n = int(input())
k = {}
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = []
for i in range(n):
    a = int(input())
    if a in k:
        z.append(k[a])
    else:
        z.append(-1)
print(z)
k looks like:
{2: 0, 1: 1, 3: 2}
This way you can call k[3] and it will give you 2 in O(1) or constant time.
(See: Python: List vs Dict for look up table.)
There is a structure known as defaultdict which allows you to specify behaviour when a key is not present in the dictionary. This is particularly helpful in this case, as we can just request from the defaultdict and it will return the desired value either way.
from collections import defaultdict
n = int(input())
k = defaultdict(lambda: -1)
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = []
for i in range(n):
    a = int(input())
    z.append(k[a])
print(z)
While this does not speed up your program, it does make your second for loop easier to read. It also makes it easier to move into the comprehension in the next section.
(See: How does collections.defaultdict work?)
With these things in place, we can use, yes, a list comprehension to very slightly speed up the construction of z. (See: Are list-comprehensions and functional functions faster than “for loops”?)
from collections import defaultdict
n = int(input())
k = defaultdict(lambda: -1)
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = [k[int(input())] for _ in range(n)]
print(z)
All code snippets print z as a list:
[1, -1, 2]
See Printing list elements on separated lines in Python if you'd like different print outs.
Note: The index function will find the index of the first occurrence of the value in a list. Because of the way the dict is built, the index of the last occurrence will be stored in k. If you need to mimic index exactly you should ensure that a later index does not overwrite a previous one.
for i, x in enumerate(input().split()):
    x = int(x)
    if x not in k:
        k[x] = i
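The first-occurrence rule and the -1 default can also be combined with a plain dict, using dict.setdefault() and dict.get(). This is a sketch, not the answer's own code: the helper name first_indices is hypothetical, and it takes lists as arguments instead of reading from input() so that it is easy to try out. Unlike a defaultdict lookup with k[a], dict.get() does not insert missing keys into the dictionary.

```python
def first_indices(values, queries):
    """Return, for each query, the index of its first occurrence in values, or -1."""
    index_of = {}
    for i, x in enumerate(values):
        index_of.setdefault(x, i)  # keeps the index of the first occurrence only
    return [index_of.get(q, -1) for q in queries]

print(first_indices([2, 1, 3, 1], [1, 8, 3]))  # → [1, -1, 2]
```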
Adapt this solution for your problem.
def test(list1, value):
    try:
        return list1.index(value)
    except ValueError as e:
        print(e)
        return -1
list1 = [2, 1, 3]
in1 = [1, 8, 3]
res = [test(list1, i) for i in in1]
print(res)
Output:
8 is not in list
[1, -1, 2]

return a new list that interleaves the two lists but with a twist

def back_interleave(first, second):
    if first == [] and second == []:
        return []
    elif first == []:
        return second[::-1]
    elif second == []:
        return first[::-1]
    else:
        newlist = []
        for i in range(len(first)-1, 0,-1):
            newlist.append(first[i])
            newlist.append(second[i])
        for j in range(len(second)-len(first)-1,0,-1):
            newlist.append(second[i])
        return newlist
Can anybody tell me what's wrong with my code for this question?
I'm not exactly sure what's wrong with your code, but the second and third if-statements appear to use built-in list reversing functionality which the original problem forbids.
What I would do is determine the length of the longer list, then iterate through both lists backwards.
def back_interleave(first, second):
    newlist = []
    # You want to iterate through the length of the longer list
    length = max(len(first), len(second))
    for x in range(length):
        # start appending elements from the back of the list
        index = -1*(x+1)
        if x < len(first):
            newlist.append(first[index])
        if x < len(second):
            newlist.append(second[index])
    return newlist
The problem in your code is that the stop value of range() is exclusive, so index 0 is never visited in your loops. Also, in the j loop the value at index i is appended instead of the value at index j.
@CyanideTesla has given code that works well for your problem.
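The exclusive stop value is easy to check in isolation. In this small sketch (the variable names are just for illustration), counting down with a stop of 0 never reaches index 0, while a stop of -1 does:

```python
first = ["a", "b", "c"]

# stop=0 is exclusive, so the loop ends at index 1 and skips first[0]
skips_zero = list(range(len(first) - 1, 0, -1))
# stop=-1 makes index 0 the last value produced
includes_zero = list(range(len(first) - 1, -1, -1))

print(skips_zero)     # → [2, 1]
print(includes_zero)  # → [2, 1, 0]
```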

Variable not auto incrementing Python

I have to classify the items of this list (the Lista variable) according to their data type. The code works as long as I pick a specific index to classify, but the i inside the for loop and if statements is not incrementing after each iteration.
__author__ = 'rodrigocano'
Lista = [55.5,'hola','abc',10,'5','x5',0.25,['A',2,1.5],5,2,5.3,'AEIOU',('perro','gato','pollo'),[1,2,3],1001,['a',1],'mundo','01/10/2015',20080633,'2.5',0.123,(1,2,'A','B')]
lista_clasificable = len(Lista)
def clasificar(lista_clasificable):
    for Lista in range(0,len(lista_clasificable)):
        i = 0
        lista_string = []
        lista_int = []
        lista_float =[]
        lista_tuple = []
        lista_list = [] #duh
        if type(lista_clasificable[i]) is str:
            lista_string.append(lista_clasificable[i])
            i += 1
        elif type(lista_clasificable[i]) is int:
            lista_int.append(lista_clasificable[i])
            i += 1
        elif type(lista_clasificable[i]) is float:
            lista_float.append(lista_clasificable[i])
            i += 1
        elif type(lista_clasificable[i]) is list :
            lista_list.append(lista_clasificable[i])
            i += 1
        elif type(lista_clasificable[i]) is tuple:
            lista_tuple.append(lista_clasificable[i])
            i += 1
    return 'Su Lista contiente los siguientes ints',lista_int,'Sus Strings son:',lista_string,'Sus floats son:',lista_float,'Sus listas son:',lista_list,'Y Sus tuplas son:',lista_tuple
print(clasificar(Lista))
You're resetting i over and over. It is inside your for loop, so for every item, i is first set to 0. You need to put it before your for loop:
i=0
for Lista in range(0,len(lista_clasificable)):
    ...
See MSW's comment for your 2nd problem. You are also overwriting the master lists each time, I missed that.
This is pretty unpythonic though. Instead of getting the length of the list and using this setup where i tracks the index, just do something like:
for item in Lista:
    ...  # do something with item
Python can loop through the list directly; you don't need to use indexing.
You are resetting i to zero at every iteration of the for loop. Put the i=0 above your for loop.
Here's a cleaner version. I am not explaining it in the hopes that you will learn. Ask questions if you can't puzzle it out with the manual at hand. Sorry if I botched the Spanish.
# a little more pythonically and far less repetitious
def classificar(p, types):
    # create a dict of lists such that lists[typename] = []
    lists = dict()
    for t in types:
        lists[t.__name__] = []
    # for all of the elements in p, assign them to a type list
    # if applicable
    for x in p:
        for t in types:
            if type(x) == t:
                lists[t.__name__].append(x)
    return lists

# named data and t below to avoid shadowing the built-ins input and type
data = [55.5,'hola','abc',10,'5','x5',0.25,['A',2,1.5],5,2,5.3,'AEIOU',
        ('perro','gato','pollo'), [1,2,3], 1001, ['a', 1],
        'mundo','01/10/2015',20080633,'2.5',0.123,(1,2,'A','B')]
types = [str, int, float, tuple, list]
lists = classificar(data, types)
print('Su lista contiente los siguientes:')
for t in types:
    print(' ', t.__name__, lists[t.__name__])
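If the set of types does not need to be fixed in advance, a collections.defaultdict(list) keyed by the type name can do the grouping in a single pass. This is a sketch of that alternative, with a hypothetical function name group_by_type and a shortened sample list; it is not a drop-in replacement, since it also collects types that were not asked for.

```python
from collections import defaultdict

def group_by_type(items):
    """Group items into lists keyed by the name of their exact type."""
    groups = defaultdict(list)
    for item in items:
        groups[type(item).__name__].append(item)
    return dict(groups)  # plain dict, so missing keys raise KeyError again

sample = [55.5, 'hola', 10, ['A', 2], (1, 2), '5', 0.25]
groups = group_by_type(sample)
for name, members in groups.items():
    print(name, members)
```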

Python number to word converter needs a space detector

I have been working on a sort of encryption tool in python. This bit of code is for the decryption feature.
The point is to take the given numbers and insert them into a list from where they will be divided by the given keys.
My idea for the code is below, but I keep getting a "list index out of range" error whenever I try it out. Any suggestions? Keep in mind I'm a beginner:
need = []
detr = raw_input('What would you like decrypted?')
count = 0
for d in detr:
    if (d == '.' or d == '!') or (d.isalpha() or d == " "):
        count += 1
    else:
        need[count].append(d)
The problem is that you are trying to access list elements that don't exist.
list.append(item) adds item to the end of the list. list[index] = item replaces the item at position index, which must already exist.
items = [0, 0, 0]
items.append(0)  # items is now [0, 0, 0, 0]
items[0] = 1     # items is now [1, 0, 0, 0]
items[99] = 1    # IndexError: list assignment index out of range
You should get rid of the count variable entirely. You could append None in the case of d==' ' etc. or just ignore them.
As I understand your description, you want to extract the numbers from a string and append them to a list, using a for loop to iterate over each character.
I think it would be easier to do it with regular expressions (something like r'([\d]+)').
But following joconner's advice to "get rid of the count variable":
need = []
detr = input('What would you like decrypted?\n')
i = iter(detr)  # get an iterator
# iterate over the input-string
for d in i:
    numberstr = ""
    try:
        # as long as there are digits
        while d.isdigit():
            # append them to a cache-string
            numberstr += d
            d = next(i)
    except StopIteration:
        # occurs when there are no more characters in detr
        pass
    if numberstr != "":
        # convert the cache-string to an int
        # and append the int to the need-array
        need.append(int(numberstr))
# print the need-array to see what is inside
print(need)
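For comparison, the regular-expression route mentioned above might look like the following sketch. The function name extract_numbers and the sample string are only for illustration; re.findall() returns every non-overlapping run of digits as a string, which is then converted to int.

```python
import re

def extract_numbers(text):
    """Return all runs of consecutive digits in text as a list of ints."""
    return [int(match) for match in re.findall(r"\d+", text)]

print(extract_numbers("12 abc 7! 305."))  # → [12, 7, 305]
```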
