list comprehension and transforming data - python

I've been trying to convert some of my functions that use a for loop into list comprehensions. Here is my first version of the function:
def adstocked_advertising(data, adstock_rate):
    '''
    Transforming data by applying Adstock transformations
    data -> The dataframe that is being used to create Adstock variables
    adstock_rate -> The rate of the adstock
    ex. data['Channel_Adstock'] = adstocked_advertising(data['Channel'], 0.5)
    '''
    adstocked_advertising = []
    for i in range(len(data)):
        if i == 0:
            adstocked_advertising.append(data[i])
        else:
            adstocked_advertising.append(data[i] + adstock_rate * adstocked_advertising[i-1])
    return adstocked_advertising
I want to convert it to this,
def adstocked_advertising_list(data, adstock_rate):
    adstocked_advertising = [data[i] if i == 0 else data[i] + adstock_rate * data[i-1] for i in range(len(data))]
    return adstocked_advertising
However, when viewing the df after running both functions I get two different values.
data['TV_adstock'] = adstocked_advertising_list(data['TV'], 0.5)
data['TV_adstock_2'] = adstocked_advertising(data['TV'], 0.5)
here is output,
data.head()
data.tail()
I am not too sure why the first two rows are the same and then from there the numbers are all different. I am new to list comprehension so I may be missing something here.

You need to refer to the previously generated element in the list, and list comprehensions are not well suited to this type of problem. They work well for operations that only need to look at a single element at once.
This question goes into more detail.
In your initial example, you use adstock_rate * adstocked_advertising[i-1]. The list comprehension version uses adstock_rate * data[i-1], which is why you are getting different results.
A standard for loop works just fine for your use case. You could switch to using enumerate, as for i in range(len(data)) is discouraged.
res = []
if len(data) > 0:
    res = [data[0]]
    for index, item in enumerate(data[1:]):
        # res[index] is the result computed for the previous element
        res.append(item + adstock_rate * res[index])

You've changed your logic in the list comp version. Originally, your else formula looked like:
data[i] + adstock_rate * adstocked_advertising[i-1]
But the list comprehension version looks like:
data[i] + adstock_rate * data[i-1]
The first version accesses the i-1th element of the result list, while the second version accesses the i-1th element of the input list.
index == 0 is only true once at the beginning of the list. Why not eliminate the conditional:
def adstocked_advertising(data, adstock_rate):
    if len(data) == 0:
        return []
    res = [data[0]]
    for i in range(1, len(data)):
        res.append(data[i] + adstock_rate * res[i-1])
    return res
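If the goal is mainly to avoid the explicit loop, itertools.accumulate from the standard library can carry the previous result forward, which a plain list comprehension cannot do. A minimal sketch, assuming data is any iterable of numbers; the name adstock_accumulate is made up for this example:

```python
from itertools import accumulate

def adstock_accumulate(data, adstock_rate):
    # accumulate hands the running result (prev) and the next input (x)
    # to the function, so each output depends on the previous output
    return list(accumulate(data, lambda prev, x: x + adstock_rate * prev))

print(adstock_accumulate([1, 2, 3], 0.5))  # [1, 2.5, 4.25]
```

This should give the same values as the loop versions above, since each step computes data[i] + adstock_rate * previous_result.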

Related

Python: How to implement a new line in every row of a nested list

So, recently I have been trying to create a random nested list generator to further my python knowledge. I have encountered a few problems that tested me. I have overcome all of them except for my current one. I want to make a new line for every row, so it is easier to visualize. I have looked at other posts on this website and tried to implement one of the solutions to one of them; however it only returned the error -
TypeError: sequence item 0: expected str instance, list found
My code is the following -
import random
def random_2d_array(rows, columns):
    output = []
    iteration = 0
    while iteration != rows:
        i2 = 0
        iteration += 1
        output.append([])
        output = '\n'.join(output)
        while i2 != columns:
            i2 += 1
            num = random.randint(1, 10)
            col = list(output[iteration - 1])
            col.insert(i2 - 1, num)
    return output
print(random_2d_array(3, 3))
Use the following hack inside the function
return '\n'.join([str(i) for i in output])
Edit:
This converts each inner list of the nested list to a string and joins those strings with newline characters, so every row ends up on its own line in the returned result.
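For context, here is one way the whole thing could fit together. This is only a sketch: it builds the rows with a nested comprehension rather than the while loops from the question, and format_rows is a helper name invented for this example. Keeping the data as a list of lists and formatting it separately avoids joining inside the loop, which is what triggered the TypeError.

```python
import random

def random_2d_array(rows, columns):
    # Build each row with a comprehension; the result stays a list of lists
    return [[random.randint(1, 10) for _ in range(columns)] for _ in range(rows)]

def format_rows(grid):
    # str.join needs strings, so convert each row before joining
    return '\n'.join(str(row) for row in grid)

grid = random_2d_array(3, 3)
print(format_rows(grid))
```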

Understanding Specific Python List Comprehension

I am trying to use this code that produces all possible outcomes of a dice roll, given the number of dice and number of sides. This code works, but I do not quite understand how the list comprehension is working.
def dice_rolls(dice, sides):
    """
    Equivalent to list(itertools.product(range(1,7), repeat=n)) except
    for returning a list of lists instead of a list of tuples.
    """
    result = [[]]
    print([range(1, sides + 1)] * dice)
    for pool in [range(1, sides + 1)] * dice:
        result = [x + [y] for x in result for y in pool]
    return result
Therefore, I am trying to re-write the list comprehension
result = [x + [y] for x in result for y in pool]
into FOR loops to try to make sense of how it is working, but am currently unable to properly do it. Current failed code:
for x in result:
    for y in pool:
        result = [x + [y]]
2nd Question: If I wanted to make this into a generator (because this function is a memory hog if you have enough dice and sides), would I simply just yield each item in the list as it is being produced instead of throwing it into the result list?
EDIT: I came up with a way to break the list comprehension into loops after getting great responses and wanted to capture it:
def dice_rolls(dice, sides):
    result = [[]]
    for pool in [range(1, sides + 1)] * dice:
        temp_result = []
        for existing_values in result: # existing_values same as x in list comp.
            for new_values in pool: # new_values same as y in list comp.
                temp_result.append(existing_values + [new_values])
        result = temp_result
    return result
My first instinct for this problem (and list comprehensions in general) would be to use recursion. Though YOU have asked for LOOPS, which is surprisingly challenging.
This is what I came up with;
def dice_rollsj(dice, sides):
    result = [[]]
    for num_dice in range(dice):
        temp_result = []
        for possible_new_values in range(1, sides+1):
            for existing_values in result:
                new_tuple = existing_values + [possible_new_values]
                temp_result.append(new_tuple)
        result = temp_result
    return result
I think that you'll get the same correct answers, but the numbers will be differently ordered. This may be due to the way the values are appended to the list. I don't know.... Let me know if this helps.
I tried to add as many lines as I could, because the goal was to expand and understand the comprehension.
You are redefining result with every iteration of the for y in pool loop. I believe what you want is to append the results to a new list and rebind result once both loops have finished:
new_result = []
for x in result:
    for y in pool:
        new_result.append(x + [y])
result = new_result
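On the second question (turning this into a generator), simply yielding from inside the loops above would not help much, because each pass still needs the complete result list from the previous pass. One possible sketch uses recursion so that only one roll is built at a time; iter_dice_rolls is a name chosen for this example, not something from the question:

```python
def iter_dice_rolls(dice, sides):
    # Base case: with no dice left there is exactly one (empty) roll
    if dice == 0:
        yield []
        return
    # Extend every roll of the first dice-1 dice with each possible face
    for prefix in iter_dice_rolls(dice - 1, sides):
        for face in range(1, sides + 1):
            yield prefix + [face]

print(list(iter_dice_rolls(2, 2)))  # [[1, 1], [1, 2], [2, 1], [2, 2]]
```

The rolls come out in the same order as in the list comprehension version, and itertools.product(range(1, sides + 1), repeat=dice) is also lazy if tuples are acceptable.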

Memoryerror with too big list

I'm writing a script in Python, and I have to create a pretty big list containing exactly 248956422 integers. Some of the "0" entries in this list will be changed to 1, 2 or 3, because I have 8 lists: 4 with the start positions of genes and 4 with their end positions.
I have to iterate over "anno" several times, because the numbers replacing 0 can change in a later iteration.
"Anno" has to be written to a file to create an annotation file.
Here's my question: how can I split the work up, or do it on the fly, so that I don't get a MemoryError, while still replacing "0" with other values and 1, 2, 3 with others?
Maybe by rewriting the file? I'm waiting for your advice; please ask if what I wrote is not clear :P
whole_st_gen = [] #to make these lists more clear for example
whole_end_gen = [] # whole_st_gen has element "177"
whole_st_ex = [] # and whole_end_gen has "200" so from position 177to200
whole_end_ex = [] # i need to put "1"
whole_st_mr = [] # of course these list can have even 1kk+ elements
whole_end_mr = [] # note that every st/end of same kind have equal length
whole_st_nc = []
whole_end_nc = [] #these lists are including some values of course
length = 248956422
anno = ['0' for i in range(0,length)] # here i get the memoryerror
#then i wanted to do something like..
for j in range(0, len(whole_st_gen)):
    for y in range(whole_st_gen[j],whole_end_gen[j]):
        anno[y]='1'
You might be better off determining the value of each element in anno on the fly:
def anno():
    for idx in xrange(248956422):  # xrange is Python 2; use range on Python 3
        elm = "0"
        for j in range(0, len(whole_st_gen)):
            if whole_st_gen[j] <= idx < whole_end_gen[j]:
                elm = "1"
        for j in range(0, len(whole_st_ex)):
            if whole_st_ex[j] <= idx < whole_end_ex[j]:
                elm = "2"
        for j in range(0, len(whole_st_mr)):
            if whole_st_mr[j] <= idx < whole_end_mr[j]:
                elm = "3"
        for j in range(0, len(whole_st_nc)):
            if whole_st_nc[j] <= idx < whole_end_nc[j]:
                elm = "4"
        yield elm
Then you just iterate using for elm in anno().
I got an edit proposal from the OP suggesting one function for each of whole_*_gen, whole_*_ex and so on, something like this:
def anno_ex():
    for idx in xrange(248956422):
        elm = "0"
        for j in range(0, len(whole_st_ex)):
            if whole_st_ex[j] <= idx <= whole_end_ex[j]:
                elm = "2"
        yield elm
That's of course doable, but it will only result in the changes from whole_*_ex applied and one would need to combine them afterwards when writing to file which may be a bit awkward:
for a, b, c, d in zip(anno_gen(), anno_ex(), anno_mr(), anno_nc()):
    if d != "0":
        write_to_file(d)
    elif c != "0":
        write_to_file(c)
    elif b != "0":
        write_to_file(b)
    else:
        write_to_file(a)
However if you only want to apply some of the change sets you could write a function that takes them as parameters:
def anno(*args):
    for idx in xrange(248956422):
        elm = "0"
        for st, end, tag in args:
            for j in range(0, len(st)):
                if st[j] <= idx < end[j]:
                    elm = tag
        yield elm
And then call by supplying the lists (for example with only the two first changes):
for tag in anno((whole_st_gen, whole_end_gen, "1"),
                (whole_st_ex, whole_end_ex, "2")):
    write_to_file(tag)
You could use a bytearray object to have a much more compact memory representation than a list of integers:
anno = bytearray(b'\0' * 248956422)
print(anno[0]) # → 0
anno[0] = 2
print(anno[0]) # → 2
print(anno.__sizeof__()) # → 248956447 (on my computer)
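Building on the bytearray idea, the intervals could be written with slice assignment instead of an inner Python loop. The following is only a sketch under some assumptions: the tags are stored as single ASCII bytes so the buffer can be written to a file as is, every end position is at most length, and later layers overwrite earlier ones. The function name build_annotation and the tiny sample data are invented for illustration.

```python
def build_annotation(length, layers):
    # One byte per position, initialised to the ASCII character "0"
    anno = bytearray(b"0") * length
    for starts, ends, tag in layers:
        for start, end in zip(starts, ends):
            # Slice assignment fills [start, end); assumes end <= length,
            # otherwise the bytearray would grow
            anno[start:end] = tag * (end - start)
    return anno

# Tiny stand-in data; the real lists would be whole_st_gen, whole_end_gen, ...
layers = [([2], [5], b"1"), ([4], [7], b"2")]
anno = build_annotation(10, layers)
print(anno.decode("ascii"))  # 0011222000
```

The resulting bytearray can be passed directly to the write method of a file opened in binary mode, and it can be modified in place as often as needed.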
Instead of creating a list with a list comprehension, I suggest creating an iterator with a generator expression, which produces the values on demand instead of keeping all of them in memory. You also don't need the i in your loop, since it's a throwaway variable that you never use.
anno = ('0' for _ in range(0,length)) # In python 2.X use xrange() instead of range()
But note that an iterator is a one-shot iterable: you cannot use it again after iterating over it once. If you want to use it multiple times, you can create N independent iterators from it using itertools.tee().
Also note that you cannot change it in place. If you want to change some elements based on a condition, you can create a new iterator by iterating over your iterator and applying the condition in a generator expression.
For example:
new_anno = (transform(i) for i in anno if condition(i)) # transform and condition stand in for your own logic

Replacing loop with List Comprehension instead of loop getting a function to return a new array within the list comprehension

Basically, I am trying to avoid looping through big arrays. Before, I had code that looked like this:
for rows in book:
    bs = []
    as = []
    trdsa = []
    trdsb = []
    for ish in book:
        var = (float(str(ish[0]).replace(':',"")) - float(str(book[0]).replace(':',"")))
        if var < .1 and var > 0 :
            bs.append(int(ish[4]))
            as.append(int(ish[5]))
            trdsa.append(int(ish[-2]))
            trdsb.append(int(ish[-1]))
            time = ish[0]
    bflow = sum(numpy.diff(bs))
    aflow = sum(numpy.diff(as))
    OFI = bflow - aflow - sum(trdsb) + sum(trdsa)
    OFIlist.append([time,bidflow,askflow,OFI])
I don't want to loop through the list twice as it consumes way too much time. I was thinking I could do a list comprehension but I'm not sure if I'm on the right track
def OFIcreate(x,y):
    bs = []
    as = []
    trdsa = []
    trdsb = []
    var = (float(str(y[0]).replace(':',"")) - float(str(x[0]).replace(':',"")))
    if var < .1 and var >= 0 :
        bs.append(int(ish[4]))
        as.append(int(ish[5]))
        trdsa.append(int(ish[-2]))
        trdsb.append(int(ish[-1]))
        time = ish[0]
    bflow = sum(numpy.diff(bs))
    aflow = sum(numpy.diff(as))
    OFI = bflow - aflow - sum(trdsb) + sum(trdsa)
    OFIlist.append([time,bidflow,askflow,OFI])
    return OFIlist
OFIc = [ OFIcreate(x,y) for x in book for y in book]
The problem is that I want to loop through the list, group all instances where var >= 0 and var < .1, and then append the values to a new list. The way I have it now, I don't think it does that, as it will just keep creating lists with a length of one. Any ideas on how I can accomplish this? Or rather, how can I make the first block of code more efficient?
While list comprehensions are indeed interpreted faster than regular loops, they can't work for everything. I don't think you could replace your main for loop by a list comprehension. However, there might be some room for improvement:
You could build a list of your time by list comprehension.
time = [ish[0] for ish in book]
You could compute a list of var by list comprehension and transform it into a np.array.
var = np.array([str(t).replace(':','') for t in time], dtype=float)
var -= float(str(book[0][0]).replace(':', ''))
You could build 4 numpy int arrays for bs, as (that you need to rename, as is a Python keyword)...
You could then filter your bs... arrays with fancy indexing:
bs_reduced = bs[(var < 0.1) & (var >=0)]
I don't want to loop through the list twice as it consumes way too much time. I was thinking I could do a list comprehension but I'm not sure if I'm on the right track
Probably not. A list comprehension does nothing but loop through the given list(s), so it should make no noticeable difference.
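If the rows are already sorted by time, the quadratic double loop itself can be avoided, which is likely to matter more than the choice between a loop and a comprehension. The sketch below is not from the original answers. It assumes the timestamps have already been converted to floats in ascending order, and window_ranges is a hypothetical helper that finds, for each row, the slice of rows falling within the time window:

```python
def window_ranges(times, width):
    """Yield (start, stop) index pairs such that times[start:stop] holds
    every t with 0 <= t - times[start] < width. Assumes ascending order."""
    stop = 0
    for start, t0 in enumerate(times):
        stop = max(stop, start)
        # stop only ever moves forward, so the whole scan is linear
        while stop < len(times) and times[stop] - t0 < width:
            stop += 1
        yield start, stop

times = [0.0, 0.05, 0.2, 0.25, 0.5]
print(list(window_ranges(times, 0.1)))
# [(0, 2), (1, 2), (2, 4), (3, 4), (4, 5)]
```

Each (start, stop) pair could then be used to slice the bid, ask and trade columns for that window. Note that this window includes the starting row itself (var >= 0), as in the second version of the question's code.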

Modifying specific array elements in Python 3.1

I have two arrays: array and least_common (filter array)
The following code iterates through array, checks for elements that match least_common, and if it finds one, modifies it and appends it to a new array.
for i in range (len(array)):
    for j in range(len(least_common)):
        if array[i] is least_common[j][0]:
            new_array.append ((array[i]) + (array[i] * (mod[1]/100)))
However, if the element in array does not match any of the elements in least_common, I want to append it unchanged to new_array, then move on to the next element in array and begin the checking process again.
This code is a bit wonky to me -- I think you want to start with something more like:
lookup = set([x[0] for x in least_common])
new_array = []
for elem in array:
    if elem in lookup:
        new_array.append(elem + (elem * (mod[1]/100)))
    else:
        new_array.append(elem)
In Python, what you are trying to do is done using lists. There is another separate data type called arrays, but that is for totally different purpose. Please don't confuse and use the proper terminology, list.
Lists can be iterated over directly. You do not need to generate indexes and then access the elements through them. That is the C or C++ way of doing things, not the Python way.
You use a list or a dictionary called mod in your original code. It is a bad idea to override builtin names. I tried to understand what you are trying, came up with the following code. Take it further, but before that, I think some beginner tutorials might help you as well.
new_array = []
somevalue = 0.001
for elem in array:
    for anotherelem in least_common:
        if elem == anotherelem[0]:
            new_array.append(elem + (elem * somevalue))
Keep track of whether you found a match using a boolean, which you set to False before each inner loop and set to True within your if. After each iteration, if it's still False it means you found no matches and should then do your appending.
You should also follow what @Andrew says and iterate over lists using for a in array:. If you need the index, use for i, a in enumerate(array):. And be aware that is is not the same as ==.
new_array = []
for array_item in array:
    found = False
    for least_common_item in least_common:
        if array_item is least_common_item:
            found = True
    if not found:
        new_array.append (array_item * (1 + mod[1]/100))
You can also greatly shorten this code using in if you meant to use == instead of is:
for array_item in array:
    if array_item not in least_common:
        new_array.append (array_item * (1 + mod[1]/100))
Why not this:
least_common_set = frozenset(x[0] for x in least_common)
for e in array:
    if e not in least_common_set:
        new_array.append(e + (e * (mod[1]/100)))
If I understand your problem correctly, here is a possible solution:
for e in array:
    for lc in least_common:
        if e is lc[0]:
            new_array.append(e + e * (md[1] / 100))
            break
    else:
        new_array.append(e)
The else clause in the for loop is executed when the loop terminates through exhaustion of the list, but not when the loop is terminated by a break statement.
Note that there is no need to use range or len, in Python you can just iterate on the elements of a sequence without involving indexes - you may use enumerate for that, but in this case you don't need to. Also, please don't use built-in names such as mod for your variables: here, I have renamed your dictionary md.
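Putting the set lookup and the "append unchanged otherwise" requirement together, the whole operation also fits in a single list comprehension with a conditional expression. This is a sketch with made-up names: adjust_matches and percent are hypothetical, percent plays the role of mod[1] from the question, and least_common is assumed to hold pairs whose first item is the value to match.

```python
def adjust_matches(values, least_common, percent):
    # Only the first item of each pair is needed for matching
    lookup = {pair[0] for pair in least_common}
    # The conditional expression decides, per element, whether to modify it
    return [v + v * (percent / 100) if v in lookup else v for v in values]

print(adjust_matches([1, 2, 3, 2], [(2, 1)], 50))  # [1, 3.0, 3, 3.0]
```

Membership tests on a set compare with == rather than is, which avoids the identity pitfall mentioned above.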
