Counting like elements in a list and appending list - python

I am trying to create a list in Python with values pulled from an active excel sheet. I want it to pull the step # value from the excel file and append it to the list while also including which number of that element it is. For example, 1_1 the first time it pulls 1, 1_2 the second time, 1_3 the third, etc. My code is as follows...
import win32com.client
xl = win32com.client.Dispatch("Excel.Application")
CellNum = xl.ActiveSheet.UsedRange.Rows.Count
Steps = []
for i in range(2,CellNum + 1): #Create load and step arrays in abaqus after importing from excel
if str(int(xl.Cells(i,1).value))+('_1' or '_2' or '_3' or '_4' or '_5' or '_6') in Steps:
StepCount = 1
for x in Steps:
if x == str(int(xl.Cells(i,1).value))+('_1' or '_2' or '_3' or '_4' or '_5' or '_6'):
StepCount+=1
Steps.append(str(int(xl.Cells(i,1).value))+'_'+str(StepCount))
else:
Steps.append(str(int(xl.Cells(i,1).value))+'_1')
I understand that without the excel file, the program will not run for any of you, but I was just wondering if it is some simple error that I am missing. When I run this, the StepCount does not go higher than 2 so I receive a bunch of 1_2, 2_2, 3_2, etc elements. I've posted my resulting list below.
>>> Steps
['1_1', '2_1', '3_1', '4_1', '5_1', '6_1', '7_1', '8_1', '9_1', '10_1', '11_1', '12_1',
'13_1', '14_1', '1_2', '14_2', '13_2', '12_2', '11_2', '10_2', '2_2', '3_2', '9_2',
'8_2', '7_2', '6_2', '5_2', '4_2', '3_2', '2_2', '1_2', '2_2', '3_2', '4_2', '5_2',
'6_2', '7_2', '8_2', '9_2', '10_2', '11_2', '12_2', '13_2', '14_2', '1_2', '2_2']
EDIT #1: So, if the ('_1' or '_2' or '_3' or '_4' or '_5' or '_6') will ALWAYS only use _1, is it this line of code that is messing with my counter?
if x == str(int(xl.Cells(i,1).value))+('_1' or '_2' or '_3' or '_4' or '_5' or '_6'):
Since it is only using _1, it will only count 1_1 and not check 1_2, 1_3, 1_4, etc
EDIT #2: Now I am using the following code. My input list is also below.
from collections import defaultdict
StepsList = []
Steps = []
tracker = defaultdict(int)
for i in range(2,CellNum + 1):
StepsList.append(int(xl.Cells(i,1).value))
>>> StepsList
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 14, 13, 12, 11, 10, 2, 3, 9, 8,
7, 6, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2]
for cell in StepsList:
Steps.append('{}_{}'.format(cell, tracker[cell]+1)) # This is +1 because the tracker starts at 0
tracker[cell]+=1
I get the following error: ValueError: zero length field name in format from the for cell in StepsList: iteration block
EDIT #3: Got it working. For some reason it didn't like
Steps.append('{}_{}'.format(cell, tracker[cell]+1))
So I just changed it to
for cell in StepsList:
tracker[cell]+=1
Steps.append(str(cell)+'_'+str(tracker[cell]))
Thanks for all of your help!

This line:
if str(int(xl.Cells(i,1).value))+('_1' or '_2' or '_3' or '_4' or '_5' or '_6') in Steps:
does not do what you think it does. ('_1' or '_2' or '_3' or '_4' or '_5' or '_6') will always return '_1'. It does not iterate over that series of or values looking for a match.
Without seeing expected input vs. expected output, it's hard to point you in the correct direction to actually get what you want out of your code, but likely you'll want to leverage itertools.product or one of the other combinatoric methods from itertools.
Update
Based on your comments, I think that this is a way of solving your problem. Assuming an input list of the following:
in_list = [1, 1, 1, 2, 3, 3, 4]
You can do the following:
from collections import defaultdict
tracker = defaultdict(int) # defaultdict is just a regular dict with a default value at new keys (in this case 0)
steps = []
for cell in in_list:
steps.append('{}_{}'.format(cell, tracker[cell]+1)) # This is +1 because the tracker starts at 0
tracker[cell]+=1
Result:
>>> steps
['1_1', '1_2', '1_3', '2_1', '3_1', '3_2', '4_1']
There are likely more efficient ways to do this using combinations of itertools, but this way is certainly the most straight-forward

Related

Applying an iterable mask, checking it against a value - if value doesn't satisfy the mask condition, move to the next value which does

I currently have some code where I've created a mask which checks to see if a variable matches the first position in a sequence, called index_pos_overload. If it matches, the variable is chosen, and the check ends. However, I want to be able to use this mask to not only check if the number satisfies the condition of the mask, but if it doesn't move along to the next value in the sequence which does. It's essentially to pick out a row in my pandas data column, hyst. My code currently looks like this:
import pandas as pd
from itertools import chain
hyst = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2 ,1, 5, 2]})
possible_overload_cycle = 1
index_pos_overload = chain.from_iterable((hyst.index[i])
for i in range(0, len(hyst)-1, 5))
if (possible_overload_cycle == index_pos_overload):
hyst_overload_cycle = possible_overload_cycle
else:
hyst_overload_cycle = 5 #next value in iterable where index_pos_overload is true
The expected output of hyst_overload_cycle should be this:
print(hyst_overload_cycle)
5
I've included my logic as to how I think this should work - possible_overload_cycle = 1 does not point to the first position in the dataframe, so hyst_overload_cycle should return as 5, the first position in the mask. I hope I've made sense, as I can't quite seem to work out how I would go about this programatically.
If I understood you correctly, it may be simpler than you think:
index_pos_overload can be an array / list, there is no need to use complex constructs to store a sequence of values
to find the first non-zero value from index_pos_overload, one can simply use np.nonzero()[0][0] (the first [0] is to select the dimension, the second is to select the index within that axis) and use array indexing of that on the original index_pos_overload array
The code would look like:
import numpy as np
import pandas as pd
hyst = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2 ,1, 5, 2]})
possible_overload_cycle = 1
index_pos_overload = np.array([hyst.index[i] for i in range(0, len(hyst)-1, 5)])
if possible_overload_cycle in index_pos_overload:
hyst_overload_cycle = possible_overload_cycle
else:
hyst_overload_cycle = index_pos_overload[np.nonzero(index_pos_overload)[0][0]]
print(hyst_overload_cycle)
# 5

Random series whre non are 75% alike

So, I've been trying to make a random series generator with the given numbers using an array:
so the possibilities are: [0-9, 0-9, 0-9, 0 - 59, 0-9, 0-9, 0-9].
The only problem is that I want that all the series aren't even 75% the same (no more than 2 numbers the same).
So here are some examples:
Good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 2]
Not good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 1]
So, if there are fewer than 2 numbers the same it deletes the second one.
And the second problem is that I want 10,000 of these series.
Sorry if I didn't explain it well, the code would probably explain what I tried to explain.
TRIGGER WARNING!! CODE ISN'T EFFICIENT AT ALL!!
TOTAL_SERIES = 10000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []
def create_series():
global fail, success
series = []
for i in range(len(placement_amount)):
series.append(random.randint(0, placement_amount[i]))
for i in all_series:
count = 0
for j in range(len(i)):
if series[j] == i[j]:
count += 1
if count > 2:
return;
all_series.append(series)
while len(all_series) < TOTAL_SERIES:
create_series()
The code technically works but it takes around 1 hour to generate 400 of these since the longer it runs the harder it takes to find a series that follows the rules.
So, my question is how do I make it more efficient and so it will make 10,000 series the fastest a code can.
What I've tried so far:
Tried adding cuda so I'll be able to run the code on a gpu making it faster (have python 32-bit so can't)
Tried creating a few threads where each generates 10,000/threads amount and then run a code that deletes all the ones who don't follow the rules (the code just got stuck).
I'm open to hear how I can try these again but with a correct code or anything that will make it efficient.
The answer for me isn't code efficiency but just that it's impossible make 10,000 series since the first 3 numbers can't be identical, so I changed the lines:
if counter > 2:
to
if counter > 3
Thanks everyone for the help, but if you got a way to make it more efficient it would be nice :D
Your original solution is in O( P(N)*N), you can reduce it to O(P(N)) with dicts and computing the differrent index combinations:
-P(N) is the expected number of iterations to get N such series
- the constants are larger!
import itertools
import random
indexes=list(itertools.combinations(range(7),3))
big_dict={ k : {} for k in indexes }
TOTAL_SERIES = 1000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []
loops=0
while len(all_series) < TOTAL_SERIES:
loops+=1
candidate = tuple(random.randint(0, amount) for amount in placement_amount)
if any( (candidate[index[0]],candidate[index[1]],candidate[index[2]]) in \
big_dict[index] for index in indexes ):
continue
else:
for index in indexes:
big_dict[index[(candidate[index[0]],candidate[index[1]],candidate[index[2]])]=True
all_series.append(candidate)
This must be the solution:
import random
def gen_series(pattern):
return [random.randint(0, max_val) for max_val in pattern]
pattern = [9, 9, 9, 59, 9, 9, 9]
for i in range(100):
print(gen_series(pattern))

csv.writer - How to dynamically write columns in writerow(n1,n2,n3, ...nth)?

I am trying to write to a CSV file. I want to write three variables on a row and then write a variable number of columns.
So for example my script will do a bunch of calculations and come up with the idea that I need 12 columns.
So the 'variable' needs to contain column 0 thru 11.
How to do this dynamically?
numberofcolumns = 12
with open(f+".csv",'wb') as output_csvfile:
filewriter = csv.writer(output_csvfile)
filewriter.writerow([constant1,constant2,constant3,variable[0],...,variable[n]])
What I want is to do
filewriter.writerow([constant1, constant2, constant3, variable[0], variable[1],....,variable[11]])
However variable[11] may not be 11 it may be 8 or 10 or whatever. the length is dynamic. How can I make it so that this code will be able to output to Nth column if the function writerow() isn't defined to use *args?
What martineau pointed out in a comment is correct. writerow accepts a list, or sequence, of any length.
So you could do something like the following:
variable = range(12)
# Change your writerow line to be something like this:
filewriter.writerow([constant1,constant2,constant3] + variable)
range in this case is an example of creating a list of however-many items. range is documented here.
Notice that the above example uses + to put two sequences/lists together.
Here's an example of that from the command line/repl:
>>> variable = range(12)
>>> variable
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> ["x", "y", "z"] + variable
['x', 'y', 'z', 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

How is sorted(key=lambda x:) implemented behind the scene?

An example:
names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]
sorted(names, key=lambda name: name.split()[-1].lower())
I know key is used to compare different names, but it can have two different implementations:
First compute all keys for each name, and bind the key and name together in some way, and sort them. The p
Compute the key each time when a comparison happens
The problem with the first approach is that it has to define another data structure to bind the key and data. The problem with the second approach is that the key might be computed for multiple times, that is, name.split()[-1].lower() will be executed many times, which is very time-consuming.
I am just wondering in which way Python implemented sorted().
The key function is executed just once per value, to produce a (keyvalue, value) pair; this is then used to sort and later on just the values are returned in the sorted order. This is sometimes called a Schwartzian transform.
You can test this yourself; you could count how often the function is called, for example:
>>> def keyfunc(value):
... keyfunc.count += 1
... return value
...
>>> keyfunc.count = 0
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.count
10
or you could collect all the values that are being passed in; you'll see that they follow the original input order:
>>> def keyfunc(value):
... keyfunc.arguments.append(value)
... return value
...
>>> keyfunc.arguments = []
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.arguments
[0, 8, 1, 6, 4, 5, 3, 7, 9, 2]
If you want to read the CPython source code, the relevant function is called listsort(), and the keyfunc is used in the following loop (saved_ob_item is the input array), which is executed before sorting takes place:
for (i = 0; i < saved_ob_size ; i++) {
keys[i] = PyObject_CallFunctionObjArgs(keyfunc, saved_ob_item[i],
NULL);
if (keys[i] == NULL) {
for (i=i-1 ; i>=0 ; i--)
Py_DECREF(keys[i]);
if (saved_ob_size >= MERGESTATE_TEMP_SIZE/2)
PyMem_FREE(keys);
goto keyfunc_fail;
}
}
lo.keys = keys;
lo.values = saved_ob_item;
so in the end, you have two arrays, one with keys and one with the original values. All sort operations act on the two arrays in parallel, sorting the values in lo.keys and moving the elements in lo.values in tandem.

In Python how can I change the values in a list to meet certain criteria

In Python, I have several lists that look like variations of:
[X,1,2,3,4,5,6,7,8,9,X,11,12,13,14,15,16,17,18,19,20]
[X,1,2,3,4,5,6,7,8,9,10,X,12,13,14,15,16,17,18,19,20]
[0,X,2,3,4,5,6,7,8,9,10,11,X,13,14,15,16,17,18,19,20]
The X can fall anywhere. There are criteria where I put an X, but it's not important for this example. The numbers are always contiguous around/through the X.
I need to renumber these lists to meet a certain criteria - once there is an X, the numbers need to reset to zero. Each X == a reset. Each X needs to become a zero, and counting resumes from there to the next X. Results I'd want:
[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,10]
[0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9]
Seems like a list comprehension of some type or a generator could help me here, but I can't get it right.
I'm new and learning - your patience and kindness are appreciated. :-)
EDIT: I'm getting pummeled with downvotes, like I've reposted on reddit or something. I want to be a good citizen - what is getting me down arrows? I didn't show code? Unclear question? Help me be better. Thanks!
Assuming the existing values don't matter this would work
def fixList(inputList, splitChar='X'):
outputList = inputList[:]
x = None
for i in xrange(len(outputList)):
if outputList[i] == splitChar:
outputList[i] = x = 0
elif x is None:
continue
else:
outputList[i] = x
x += 1
return outputList
eg
>>> a = ['X',1,2,3,4,5,6,7,8,9,'X',11,12,13,14,15,16,17,18,19,20]
>>> fixList(a)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = ['y',1,2,3,4,5,6,7,8,9,10,'y',12,13,14,15,16,17,18,19,20]
>>> fixList(b, splitChar='y')
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
EDIT: fixed to account for the instances where list does not start with either X or 0,1,2,...
Using the string 'X' as X and the_list as list:
[0 if i == 'X' else i for i in the_list]
This will return the filtered list.

Categories