Python: Specifying Positions in Lists? - python

I have three lists, of the form:
Day: [1, 1, 1, 2, 2, 2, 3, 3, 3, ..... n, n, n]
Wavelength: [10, 20, 30, 10, 20, 30, 10, 20, 30, ..... 10, 20, 30]
Flux: [1, 2, 3, 1, 2, 3, 1, 2, 3, ..... 1, 2, 3]
I want to split the lists so that the sections with a "Day" value of 1 are separated and run through a function, and then repeat the process for each day until it has been done for all n days.
I've tried splitting them into lists and currently have:
x=[]
y=[]
z=[]
for i in day:
    if Day[i] == Day[i+1]:
        x.append(Day(i))
        y.append(Wavelength[i])
        z.append(Flux[i])
        i+=1
    else "integrate over the Wavelength/Flux values where the value of Day is 1"
        i+=1
This doesn't work, and I'm not convinced I'm going about it the best way. I'm relatively new to programming so it still takes me ages to find and fix errors!

If you use zip() to combine the three lists into one list of tuples, you can then filter it for each day you care about. (This isn't particularly efficient if you have lots of data, and will require more memory than your approach, but has the advantage of being concise, fairly pythonic, and I believe readable.)
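# list() keeps the tuples around so they can be scanned once per day
# (in Python 3, a bare zip() is a one-shot iterator)
data = list(zip(day, wavelength, flux))
for d in range(min(day), max(day)+1):
    print(d, [datum for datum in data if datum[0] == d])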
Instead of print you could just pass that list (the output of the […] list comprehension) to whatever function you need to run over the data (possibly with d, the day you're dealing with at that time).
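If the days already sit in contiguous runs, as they do in the question, itertools.groupby can split the zipped data in a single pass instead of rescanning it for every day. The following is a minimal sketch under that assumption: integrate is a hypothetical stand-in (a hand-written trapezoidal rule) for whatever function needs to run on each day, and the sample lists are shortened to three days.

```python
from itertools import groupby
from operator import itemgetter

day = [1, 1, 1, 2, 2, 2, 3, 3, 3]
wavelength = [10, 20, 30, 10, 20, 30, 10, 20, 30]
flux = [1, 2, 3, 1, 2, 3, 1, 2, 3]


def integrate(wl, fl):
    """Hypothetical per-day function: trapezoidal rule over (wl, fl)."""
    return sum((wl[i + 1] - wl[i]) * (fl[i] + fl[i + 1]) / 2
               for i in range(len(wl) - 1))


results = {}
# groupby yields one group per run of equal day values,
# so the input must already be ordered by day.
for d, rows in groupby(zip(day, wavelength, flux), key=itemgetter(0)):
    _, wl, fl = zip(*rows)  # unzip the rows back into columns
    results[d] = integrate(wl, fl)

print(results)
```

If the days were not grouped together, sorting the zipped tuples by day first would restore the assumption.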

Related

Random series where none are 75% alike

So, I've been trying to make a random series generator with the given numbers using an array:
so the possibilities are: [0-9, 0-9, 0-9, 0 - 59, 0-9, 0-9, 0-9].
The only problem is that I want that all the series aren't even 75% the same (no more than 2 numbers the same).
So here are some examples:
Good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 2]
Not good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 1]
So, if more than 2 numbers are the same, it deletes the second one.
And the second problem is that I want 10,000 of these series.
Sorry if I didn't explain it well, the code would probably explain what I tried to explain.
TRIGGER WARNING!! CODE ISN'T EFFICIENT AT ALL!!
import random

TOTAL_SERIES = 10000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []

def create_series():
    # Draw one random candidate series.
    series = []
    for i in range(len(placement_amount)):
        series.append(random.randint(0, placement_amount[i]))
    # Reject it if it matches any accepted series in more than 2 positions.
    for i in all_series:
        count = 0
        for j in range(len(i)):
            if series[j] == i[j]:
                count += 1
        if count > 2:
            return
    all_series.append(series)

while len(all_series) < TOTAL_SERIES:
    create_series()
The code technically works, but it takes around 1 hour to generate 400 of these, since the longer it runs, the harder it is to find a series that follows the rules.
So, my question is: how do I make it more efficient, so that it generates 10,000 series as fast as possible?
What I've tried so far:
Tried adding cuda so I'll be able to run the code on a gpu making it faster (have python 32-bit so can't)
Tried creating a few threads where each generates 10,000/threads amount and then run a code that deletes all the ones who don't follow the rules (the code just got stuck).
I'm open to hear how I can try these again but with a correct code or anything that will make it efficient.
The answer for me isn't code efficiency; it's simply impossible to make 10,000 series, since no two series can share the same first 3 numbers (and there are only 1,000 combinations of those), so I changed the line:
    if count > 2:
to
    if count > 3:
Thanks everyone for the help, but if you got a way to make it more efficient it would be nice :D
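The limit described in the update can be checked with a short pigeonhole calculation. If no two series may agree in more than k positions, then any k + 1 positions must hold a different value combination in every series, so the number of series cannot exceed the smallest product of k + 1 position ranges. The sketch below computes that bound; max_series_bound is a hypothetical helper name, and the result is only an upper limit, not a promise that random generation will reach it.

```python
from itertools import combinations
from math import prod  # Python 3.8+

placement_amount = [9, 9, 9, 59, 9, 9, 9]
# Each position takes values 0..max, so it has max + 1 possible values.
sizes = [m + 1 for m in placement_amount]


def max_series_bound(sizes, max_shared):
    """Upper bound on the number of series when any two series
    may agree in at most max_shared positions."""
    return min(prod(group) for group in combinations(sizes, max_shared + 1))


print(max_series_bound(sizes, 2))  # limit for "count > 2" rejects
print(max_series_bound(sizes, 3))  # limit for "count > 3" rejects
```

With the original rule the bound is 1,000, which is why 10,000 series could never be produced; relaxing the rule to allow 3 matches raises the bound to 10,000.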
Your original solution is in O(P(N)*N); you can reduce it to O(P(N)) by using dicts and precomputing the different index combinations:
- P(N) is the expected number of iterations to get N such series
- the constants are larger!
import itertools
import random

# All 35 ways to pick 3 of the 7 positions.
indexes = list(itertools.combinations(range(7), 3))
# For each position triple, the value triples already used by an accepted series.
big_dict = {k: {} for k in indexes}
TOTAL_SERIES = 1000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []
loops = 0
while len(all_series) < TOTAL_SERIES:
    loops += 1
    candidate = tuple(random.randint(0, amount) for amount in placement_amount)
    # Reject the candidate if it shares 3 positions with any accepted series.
    if any((candidate[index[0]], candidate[index[1]], candidate[index[2]]) in
           big_dict[index] for index in indexes):
        continue
    else:
        for index in indexes:
            big_dict[index][(candidate[index[0]], candidate[index[1]], candidate[index[2]])] = True
        all_series.append(candidate)
This must be the solution:
import random
def gen_series(pattern):
    return [random.randint(0, max_val) for max_val in pattern]
pattern = [9, 9, 9, 59, 9, 9, 9]
for i in range(100):
    print(gen_series(pattern))

Raise Elements of Array to Series of Exponents

Suppose I have a numpy array such as:
a = np.arange(9)
>> array([0, 1, 2, 3, 4, 5, 6, 7, 8])
If I want to raise each element to succeeding powers of two, I can do it this way:
power_2 = np.power(a,2)
power_4 = np.power(a,4)
Then I can combine the arrays by:
np.c_[power_2,power_4]
>> array([[ 0, 0],
[ 1, 1],
[ 4, 16],
[ 9, 81],
[ 16, 256],
[ 25, 625],
[ 36, 1296],
[ 49, 2401],
[ 64, 4096]])
What's an efficient way to do this if I don't know the degree of the even monomial (highest multiple of 2) in advance?
One thing to observe is that x^(2^n) = (...(((x^2)^2)^2)...^2)
meaning that you can compute each column from the previous by taking the square.
If you know the number of columns in advance you can do something like:
import functools as ft
import numpy as np
a = np.arange(5)
n = 4
out = np.empty((*a.shape,n),a.dtype)
out[:,0] = a
# Note: this works by side-effect!
# The optional second argument of np.square is "out", i.e. an
# array to write the result to (nonetheless the result is also
# returned directly)
ft.reduce(np.square,out.T)
out
# array([[ 0, 0, 0, 0],
# [ 1, 1, 1, 1],
# [ 2, 4, 16, 256],
# [ 3, 9, 81, 6561],
# [ 4, 16, 256, 65536]])
If the number of columns is not known in advance then the most efficient method is to make a list of columns, append as needed and only in the end use np.column_stack or np.c_ (if using np.c_ do not forget to cast the list to tuple first).
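The list-of-columns approach might look like the sketch below. The stopping rule (stop once the next squaring would exceed limit) is purely an assumption for illustration, since the question does not say how the highest power is determined.

```python
import numpy as np

a = np.arange(5)
limit = 10_000  # hypothetical stopping rule: no entry may exceed this

# Start with a**2 and keep squaring the last column while it stays in range.
columns = [a ** 2]
while columns[-1].max() ** 2 <= limit:
    columns.append(np.square(columns[-1]))

# Stack only once, at the end, when the number of columns is known.
out = np.column_stack(columns)
print(out)
```

Appending to a Python list is cheap, whereas growing a NumPy array column by column would copy the whole array each time.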
The straightforward approach is:
exponents = [2**n for n in a]
[a**e for e in exponents]
This works fine for relatively small numbers, but I see what looks like numerical overflow on the larger numbers. (Although I can compute those high powers just fine using scalars.)
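If the exponents wanted are the even powers 2, 4, 6, ... (one reading of the question), broadcasting produces all columns in one expression. This is a sketch under that assumption; max_degree is a hypothetical parameter. On the overflow: NumPy integer arrays have a fixed width and wrap around silently, while Python scalars are arbitrary-precision, which would explain why the scalar computation works.

```python
import numpy as np

a = np.arange(9)
max_degree = 6  # hypothetical: highest even power wanted

exponents = np.arange(2, max_degree + 1, 2)  # array([2, 4, 6])
# a[:, None] has shape (9, 1); broadcasting against shape (3,) gives (9, 3).
powers = a[:, None] ** exponents
print(powers)
```

For very large powers, converting a to float, or to dtype=object so that Python integers are used, avoids the wrap-around at the cost of precision or speed respectively.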
The most elegant way I could think of is to not calculate the exponents beforehand. Since your exponents follow a very easy pattern, you can express everything using one list comprehension.
result = [item**2*index for index,item in enumerate(a)]
If you are working with quite large datasets, this will cause some serious overhead: the statement does all the calculations immediately and stores every calculated element in one large list. To mitigate this problem, you could use a generator expression, which generates the data on the fly.
result = (item**2*index for index,item in enumerate(a))

Most efficient way to iterate through list of lists

I'm currently collecting data from Quandl, which is saved as a list of lists. Each list looks something like this (price data):
['2', 1L, datetime.date(1998, 1, 2), datetime.datetime(2016, 9, 26, 1, 35, 3, 830563), datetime.datetime(2016, 9, 26, 1, 35, 3, 830563), '82.1900', '83.6200', '81.7500', '83.5000', '28.5183', 1286500.0]
This is typically 1 of about 5000 lists, and every once in a while Quandl will spit back some NaN values that don't like being saved into the database.
['2', 1L, datetime.date(1998, 1, 2), datetime.datetime(2016, 9, 26, 1, 35, 3, 830563), datetime.datetime(2016, 9, 26, 1, 35, 3, 830563), 'nan', 'nan', 'nan', 'nan', 'nan', 0]
What would be the most efficient way of iterating through the list of lists to change 'nan' values into zeros?
I know I could do something like this, but it seems rather inefficient. This operation will need to be performed on 11 different values * 5000 different dates * 500 companies:
def screen_data(data):
    new_data = []
    for d in data:
        new_list = []
        for x in d:
            new_value = x
            if math.isnan(x):
                new_value = 0
            new_list.append(new_value)
        new_data.append(new_list)
    return new_data
I would be interested in any solution that could reduce the time. I know DataFrames might work, but not sure how it would solve the NaN issue.
Or if there is a way to include NaN values in an SQLServer5.6 database along with floats, changing the database is also a viable option.
Don't create a new list - rather, edit the old list in-place:
import math
def screenData(L):
    for subl in L:
        for i,n in enumerate(subl):
            # math.isnan only accepts numbers; it raises TypeError on str or datetime values
            if math.isnan(n): subl[i] = 0
The only way I can think of to make this faster would be with multiprocessing.
I haven't timed it, but have you tried using a nested list comprehension with conditional expressions?
For example:
import datetime
data = [
    ['2', 1, datetime.date(1998, 1, 2),
     datetime.datetime(2016, 9, 26, 1, 35, 3, 830563),
     datetime.datetime(2016, 9, 26, 1, 35, 3, 830563),
     '82.1900', '83.6200', '81.7500', '83.5000',
     '28.5183', 1286500.0],
    ['2', 1, datetime.date(1998, 1, 2),
     datetime.datetime(2016, 9, 26, 1, 35, 3, 830563),
     datetime.datetime(2016, 9, 26, 1, 35, 3, 830563),
     'nan', 'nan', 'nan', 'nan', 'nan', 0],
]
new_data = [[y if str(y).lower() != 'nan' else 0 for y in x] for x in data]
print(new_data)
I did not use math.isnan(y) because you have to be sure that y is a float or you'll get an error, and checking that for every value is harder than comparing string representations, since almost everything has one. I still did the comparison to 'nan' in lower case (with .lower()), since 'NaN' and 'Nan' are also legal ways to express "Not a Number".
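A type-aware check is another option when a row mixes strings, dates and floats, as the sample data does. The sketch below is only one possible arrangement: is_nan and screen_rows are hypothetical helper names, and the sample row is a shortened stand-in for the Quandl data.

```python
import math


def is_nan(value):
    """True for a float NaN or for a string spelling of 'nan'."""
    if isinstance(value, float):
        return math.isnan(value)
    if isinstance(value, str):
        return value.strip().lower() == 'nan'
    return False  # ints, dates, etc. are never treated as NaN


def screen_rows(rows, replacement=0):
    """Return new rows with every NaN-like value replaced."""
    return [[replacement if is_nan(v) else v for v in row] for row in rows]


rows = [['2', 1, '82.1900', 'nan', float('nan'), 1286500.0]]
cleaned = screen_rows(rows)
print(cleaned)
```

Checking the type first avoids both the TypeError from math.isnan and the cost of calling str() on every value.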
how about this
import math
def clean_nan(data_list,value=0):
    for i,x in enumerate(data_list):
        if math.isnan(x):
            data_list[i] = value
    return data_list
(The return is optional, as the modification is made in place, but it is needed if the function is used with map or similar, assuming of course that data_list really is a list or a similar mutable container.)
How you use it will depend on how you get your data and how you work with it. For instance, if you do something like this
for data in (my database/Quandl/whatever):
    #do stuff with data
you can change it to
for data in (my database/Quandl/whatever):
    clean_nan(data)
    #do stuff with data
or use map (or, if you are on Python 2, itertools.imap)
for data in map(clean_nan,(my database/Quandl/whatever)):
    #do stuff with data
That way you get to work with your data as soon as it arrives from the database/Quandl/whatever, provided the place you get the data from also works as a generator, that is, it doesn't process the whole thing all at once. If it does, try to change it to a generator if possible.

Python3 Make a list that increments for a certain amount, decrements for a certain amount

if I have the following:
[0, 1, 2, 3, 4, 5, 6...] how can I reorder the list (actually make a new copy of the list) and then fill it as such:
[0, 1, 2, 3, 4, 5, 10, 9, 8, 7, 6, 11, 12, 13...]
I.e. Every five iterations, the list starts to decrement, or increment. The reason I want to do this is that I have a list of objects, and I want to fill a new list with the objects in a different order.
One technique I tried is:
copied_icons = [{key:'Object1'}, {key:'Object2'}, {key:'Object3'}...]
reversed_copied_icons = copied_icons[::-1]
left_to_right = []
for h in range(17):
    left_to_right.append(copied_icons[h])
for j in range(18, 35):
    left_to_right.append(reversed_copied_icons[j])
for k in range(36, 53):
    left_to_right.append(copied_icons[k])
for l in range(54, 71):
    left_to_right.append(reversed_copied_icons[l])
But for some reason this returns the list out of order and duplicates some of the objects. I am wondering if there is a simpler way to alternating incrementing and decrementing while filling my list.
There are two problems with your approach:
You are reversing the entire list, not just that slice. Let's say the list is [1,2,3,4], and we want to reverse the second half, i.e. get [1,2,4,3]; with your approach, you would take the third and fourth element from the reversed list, [4,3,2,1], and end up with [1,2,2,1]
The to-index in a range is exclusive, thus by using range(17) and then range(18,35) and so forth, you are missing out on the elements at index 17, 35, and 53
You can use a loop for the different parts to be reversed, and then replace that slice of the list with the same slice in reverse order.
lst = list(range(20))
for start in range(5, len(lst), 10):
    lst[start:start+5] = lst[start+4:start-1:-1]
Or this way, as pointed out in comments, which also gets rid of those nasty off-by-one indices:
for start in range(5, len(lst), 10):
    lst[start:start+5] = reversed(lst[start:start+5])
Afterwards, lst is [0, 1, 2, 3, 4, 9, 8, 7, 6, 5, 10, 11, 12, 13, 14, 19, 18, 17, 16, 15].
Or, in case the intervals to be reversed are irregular (as it seems to be in your question):
reverse = [(3, 7), (12,17)]
for start, end in reverse:
    lst[start:end] = reversed(lst[start:end])
This appears to accomplish your objective:
def foo(lst, n=5):
    """ lst=list to be re-ordered, n=item count before reversal """
    new_list = list()
    direction = 1
    start = 0
    end = start
    while start < len(lst):
        # process through the list in steps of 'n',
        # except use min for possible stub at end.
        end = start + min(n, len(lst) - start) # i.e. start+5
        # If direction is 1, append list to new list. Otherwise, append reversed list.
        new_list[start:end] = lst[start:end][::direction]
        direction *= -1 # Switch directions
        start = end # Jump to new starting position.
    return new_list
lst = list(range(20))
foo(lst,5)
[0, 1, 2, 3, 4, 9, 8, 7, 6, 5, 10, 11, 12, 13, 14, 19, 18, 17, 16, 15]
If the direction *= -1 line were removed, the code would simply copy the existing list (lst) in chunks of size 'n', the number of items you'd like before reversing direction.
Just above the line where the direction is changed, [::direction] will be either [::1], in which case the chunk is kept in its original order, or [::-1], in which case the chunk of size n being processed is reversed. The third argument when slicing a list is the 'step size', so a step of -1 returns a copy of the list in reverse order.
There may be a stub at the end: if your list has 22 elements and 'n' is 5, the last chunk has only 2 elements, so the final step must be shortened so that you don't go past the end of the list. The min(n, len(lst) - start) ensures this. Alternatively, and probably more clearly, you could use end = min(start + n, len(lst)).
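The same chunk-and-alternate idea can be written more compactly by relying on the fact that a slice running past the end of a list is simply truncated, so the stub needs no special handling. This is an illustrative sketch; zigzag is a hypothetical name, not something from the answers above.

```python
def zigzag(items, n=5):
    """Return a new list with every second chunk of n items reversed."""
    result = []
    for k, start in enumerate(range(0, len(items), n)):
        chunk = items[start:start + n]  # shorter at the end if there is a stub
        result.extend(chunk if k % 2 == 0 else chunk[::-1])
    return result


reordered = zigzag(list(range(22)))
print(reordered)
```

Because it builds a new list, the original list of objects is left untouched, which matches the "new copy" requirement in the question.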

Counting like elements in a list and appending list

I am trying to create a list in Python with values pulled from an active excel sheet. I want it to pull the step # value from the excel file and append it to the list while also including which number of that element it is. For example, 1_1 the first time it pulls 1, 1_2 the second time, 1_3 the third, etc. My code is as follows...
import win32com.client
xl = win32com.client.Dispatch("Excel.Application")
CellNum = xl.ActiveSheet.UsedRange.Rows.Count
Steps = []
for i in range(2,CellNum + 1): #Create load and step arrays in abaqus after importing from excel
    if str(int(xl.Cells(i,1).value))+('_1' or '_2' or '_3' or '_4' or '_5' or '_6') in Steps:
        StepCount = 1
        for x in Steps:
            if x == str(int(xl.Cells(i,1).value))+('_1' or '_2' or '_3' or '_4' or '_5' or '_6'):
                StepCount+=1
        Steps.append(str(int(xl.Cells(i,1).value))+'_'+str(StepCount))
    else:
        Steps.append(str(int(xl.Cells(i,1).value))+'_1')
I understand that without the excel file, the program will not run for any of you, but I was just wondering if it is some simple error that I am missing. When I run this, the StepCount does not go higher than 2 so I receive a bunch of 1_2, 2_2, 3_2, etc elements. I've posted my resulting list below.
>>> Steps
['1_1', '2_1', '3_1', '4_1', '5_1', '6_1', '7_1', '8_1', '9_1', '10_1', '11_1', '12_1',
'13_1', '14_1', '1_2', '14_2', '13_2', '12_2', '11_2', '10_2', '2_2', '3_2', '9_2',
'8_2', '7_2', '6_2', '5_2', '4_2', '3_2', '2_2', '1_2', '2_2', '3_2', '4_2', '5_2',
'6_2', '7_2', '8_2', '9_2', '10_2', '11_2', '12_2', '13_2', '14_2', '1_2', '2_2']
EDIT #1: So, if the ('_1' or '_2' or '_3' or '_4' or '_5' or '_6') will ALWAYS only use _1, is it this line of code that is messing with my counter?
if x == str(int(xl.Cells(i,1).value))+('_1' or '_2' or '_3' or '_4' or '_5' or '_6'):
Since it is only using _1, it will only count 1_1 and not check 1_2, 1_3, 1_4, etc
EDIT #2: Now I am using the following code. My input list is also below.
from collections import defaultdict
StepsList = []
Steps = []
tracker = defaultdict(int)
for i in range(2,CellNum + 1):
    StepsList.append(int(xl.Cells(i,1).value))
>>> StepsList
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 14, 13, 12, 11, 10, 2, 3, 9, 8,
7, 6, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2]
for cell in StepsList:
    Steps.append('{}_{}'.format(cell, tracker[cell]+1)) # This is +1 because the tracker starts at 0
    tracker[cell]+=1
I get the following error: ValueError: zero length field name in format from the for cell in StepsList: iteration block
EDIT #3: Got it working. For some reason it didn't like
Steps.append('{}_{}'.format(cell, tracker[cell]+1))
So I just changed it to
for cell in StepsList:
    tracker[cell]+=1
    Steps.append(str(cell)+'_'+str(tracker[cell]))
Thanks for all of your help!
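The "zero length field name in format" error is what str.format raises on Python 2.6, which does not support auto-numbered '{}' fields; they were added in Python 2.7 and 3.1. The post does not state the interpreter version, so treat that as a likely explanation rather than a confirmed one. Under that assumption, explicit field indices keep the format call working on old and new versions alike (steps_list below is a shortened, made-up sample):

```python
from collections import defaultdict

steps_list = [1, 2, 3, 1, 2, 1]  # shortened sample input
tracker = defaultdict(int)       # missing keys start at 0
steps = []
for cell in steps_list:
    tracker[cell] += 1
    # '{0}_{1}' names the fields explicitly instead of relying on '{}'.
    steps.append('{0}_{1}'.format(cell, tracker[cell]))

print(steps)
```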
This line:
if str(int(xl.Cells(i,1).value))+('_1' or '_2' or '_3' or '_4' or '_5' or '_6') in Steps:
does not do what you think it does. ('_1' or '_2' or '_3' or '_4' or '_5' or '_6') will always return '_1'. It does not iterate over that series of or values looking for a match.
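A quick way to see this, and one way to express the intended check, is sketched below. The steps list and prefix are made-up sample values; only the behavior of or and of str.startswith is standard Python.

```python
# 'or' returns its first truthy operand, so the chain collapses to '_1'.
suffix = ('_1' or '_2' or '_3' or '_4' or '_5' or '_6')
print(suffix)

steps = ['1_1', '2_1', '1_2']  # sample list built so far
prefix = '1' + '_'

# To test several alternatives, check each one explicitly...
already_seen = any(s.startswith(prefix) for s in steps)
# ...or count the matches directly to get the next suffix number.
next_label = prefix + str(sum(s.startswith(prefix) for s in steps) + 1)
print(already_seen, next_label)
```

Including the underscore in the prefix matters: without it, '1' would also match entries such as '10_1'.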
Without seeing expected input vs. expected output, it's hard to point you in the correct direction to actually get what you want out of your code, but likely you'll want to leverage itertools.product or one of the other combinatoric methods from itertools.
Update
Based on your comments, I think that this is a way of solving your problem. Assuming an input list of the following:
in_list = [1, 1, 1, 2, 3, 3, 4]
You can do the following:
from collections import defaultdict
tracker = defaultdict(int) # defaultdict is just a regular dict with a default value at new keys (in this case 0)
steps = []
for cell in in_list:
    steps.append('{}_{}'.format(cell, tracker[cell]+1)) # This is +1 because the tracker starts at 0
    tracker[cell]+=1
Result:
>>> steps
['1_1', '1_2', '1_3', '2_1', '3_1', '3_2', '4_1']
There are likely more efficient ways to do this using combinations of itertools, but this way is certainly the most straightforward.
