Right structure for nested while loop - python

I'm trying to fill a 2D array with data rows selected on the basis of a criterion spelled out in np.append. Python doesn't complain about what I've done, but the loop gets stuck. My current thought is that something is wrong with the way I structured the nested while loop. I would appreciate it if someone could point out the mistake.
nrows = 132
scan_length = 22
fulldata = fulldatat[0:0] # The actual data array of shape (528,32768)
ch = 0
while ch <= 3:
n = 1
while n <= nscans:
fulldata = np.append(fulldata, fulldatat[ch*nrows:ch*nrows+scan_length*n],axis=0)
n += 1
ch += 1

"for" is more appropriate than "while" for this type of loop :
nrows = 132
scan_length = 22
fulldata = fulldatat[0:0] # The actual data array of shape (528,32768)
for ch in range(4):
    for n in range(1, nscans+1):
        fulldata = np.append(fulldata, fulldatat[ch*nrows:ch*nrows+scan_length*n],axis=0)
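Since np.append copies the whole array on every call, one possible variation is to collect the slices in a list and concatenate them once at the end. This is only a sketch under assumptions: the stand-in fulldatat below is much narrower than the (528, 32768) array mentioned in the question, and the value of nscans is made up because the question never shows it.

```python
import numpy as np

# Stand-in data: the question's array is (528, 32768); two columns are enough here.
fulldatat = np.arange(528 * 2).reshape(528, 2)
nrows = 132
scan_length = 22
nscans = 3  # assumption: the question does not show this value

pieces = []  # slices are collected here instead of calling np.append repeatedly
for ch in range(4):
    start = ch * nrows
    for n in range(1, nscans + 1):
        pieces.append(fulldatat[start:start + scan_length * n])

# One concatenation at the end instead of one copy per iteration
fulldata = np.concatenate(pieces, axis=0)
```

The selected rows are the same as in the loop above; only the way they are accumulated differs.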

You should try this:
nrows = 132
scan_length = 22
fulldata = fulldatat[0:0] # The actual data array of shape (528,32768)
ch = 0
while ch <= 3:
    n = 1
    while n <= nscans:
        fulldata = np.append(fulldata, fulldatat[ch*nrows:ch*nrows+scan_length*n],axis=0)
        n += 1
    ch += 1
Pay attention to the code indentation: n += 1 belongs to the inner loop and ch += 1 to the outer loop.

Related

How to append python list to a numpy matrix in fastest way?

I am writing code to read research data that can have up to a billion lines. I have to read the data line by line because it contains multiple blocks. Each block has headers that differ from the headers and datasets of the other blocks.
I hope to read those datasets into a NumPy matrix so I can perform matrix operations. Here is the essential code.
with open(datafile, "r") as dump:
    i = 0 # block line number
    line_no = 0 # total line number
    block_size = 0
    block_count = 0
    for line in dump:
        values = line.rstrip().rsplit()
        i += 1
        line_no += 1
        if i <= self.head_line_no:
            print(line) # for test
        if self.tag_block in line or i == 1: # 1st line of a block
            # save block size after reading 1st block
            if block_size == 0 and block_count == 0:
                block_size = line_no - 1
            i = 1 # reset block line number
            self.box = [] # reset box constant
            print(self.matrix)
            self.matrix = np.zeros((0, 0), dtype="float") # reset matrix
            block_count += 1
        elif i == 2:
            self.timestamp.append(values[0])
        elif i == 3 or i == 5:
            continue
        elif i == 4:
            if self.atom_no != 0 and self.atom_no != values[0]:
                self.warning_message = "atom number in timestep " + self.timestamp[-1] + "is inconsistent with" + self.timestamp[-2]
                config.ConfigureUserEnv.log(self.warning_message)
            else:
                pass
            self.atom_no = values[0]
        elif i == 6 or i == 7 or i == 8:
            self.box.append(values[0])
            self.box.append(values[1])
        elif i == self.head_line_no:
            values = line.rstrip().rsplit(":")
            for j in range(1,len(values)):
                self.column_name.append(values[j])
        else:
            if self.matrix.size != 0:
                np_array = np.array(values)
                self.matrix = np.append(self.matrix, np.array(np.asarray(values)), 0)
            else:
                np_array = np.array(values)
                self.matrix = np.zeros((1,len(values)), dtype="float")
                self.matrix = np.asarray(values)
dump.close()
print(self.matrix) # for test
print(self.matrix.size) # for test
The original data looks like this:
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
17587
ITEM: BOX BOUNDS pp pp pp
0.0000000000000000e+00 4.3491000000000000e+01
0.0000000000000000e+00 4.3491000000000000e+01
0.0000000000000000e+00 1.2994000000000000e+02
ITEM: ATOMS id type q xs ys zs
59 1 1.80278 0.110598 0.129682 0.0359397
297 1 1.14132 0.139569 0.0496654 0.00692627
315 1 1.17041 0.0832356 0.00620818 0.00507927
509 1 1.67165 0.0420777 0.113817 0.0313991
590 1 1.65209 0.114966 0.0630015 0.0447129
731 1 1.65143 0.0501253 0.13658 0.0108512
1333 2 1.049 0.00850751 0.0526546 0.0406341
......
I hope to build a matrix like this:
matrix = [[59 1 1.80278 0.110598 0.129682 0.0359397],
[297 1 1.14132 0.139569 0.0496654 0.00692627],
[315 1 1.17041 0.0832356 0.00620818 0.00507927],
...]
As mentioned above, the datasets are very large. I hope to use the fastest way to append each array to the matrix. Any further help and advice would be highly appreciated.
Here are some important points to speed up the computation:
Do not use self.matrix = np.append(self.matrix, ...) in a loop. It is inefficient because each iteration creates a new, larger array and copies the old one into it, which results in quadratic run time. Use a pure-Python list with append instead and convert the list to a NumPy array at the end. This is the most critical point performance-wise.
Using self.box.extend((values[0], values[1])) should be faster than performing two separate append calls.
Using dtype="float" is neither very clear nor very efficient; please consider using dtype=np.float64 instead (which does not need to be parsed by NumPy).
Using enumerate may be a bit faster than a manual increment in the loop.
Cython may help you speed up this program if it is still not fast enough for your input file. Keep in mind that the standard Python interpreter (CPython) is not very fast at parsing huge, complex files compared to compiled native programs or modules written in languages like C or C++.
Note that values[i] are strings, and so are the elements of self.timestamp and self.box. Aren't they supposed to be integers/floats?
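The first point can be sketched as follows. This is a minimal illustration, not the asker's full parser: the in-memory sample stands in for the data rows of one block, and the names rows and matrix are chosen here for the example.

```python
import io
import numpy as np

# Hypothetical stand-in for the data rows of one block (taken from the sample above).
sample = io.StringIO(
    "59 1 1.80278 0.110598 0.129682 0.0359397\n"
    "297 1 1.14132 0.139569 0.0496654 0.00692627\n"
    "315 1 1.17041 0.0832356 0.00620818 0.00507927\n"
)

rows = []  # plain Python list: appending does not copy the existing rows
for line_no, line in enumerate(sample, start=1):
    values = line.split()
    rows.append([float(v) for v in values])  # convert the strings to floats here

# Single conversion to a NumPy array once the block has been read
matrix = np.array(rows, dtype=np.float64)
```

For the real file, rows would be reset at the start of each block and converted once the block ends.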

List of fixed length that sums to a number but minimizes standard deviation

I'm not sure if I am even asking this question the right way, but here goes:
Say I want to create a Python list with 20 non-zero integer elements, and those elements must sum to 87.
How can I go about this so that the integers chosen minimize the standard deviation of the list as a whole (I'm not sure this is the right metric)?
The following code example works, but I'm thinking there must be a better way to do this:
import pandas as pd
import numpy as np
target = 87
target_length = 20
starter_series = pd.Series([1 for val in range(target_length)])
while True:
    current_sum = starter_series.sum()
    if current_sum==target:
        break
    if target - current_sum > 20:
        starter_series += 1
        continue
    else:
        to_be_added = target - current_sum
        index_points = np.random.choice(starter_series.index.to_list(), to_be_added, replace=False)
        starter_series.loc[index_points] += 1
This simple code should work:
n = 20
s = 87
q,r = divmod(s,n)
l = [q+1]*r + [q]*(n-r)
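Why this works: if any two elements differed by 2 or more, moving one unit from the larger to the smaller would keep the sum and reduce the spread, so the minimum is reached when all elements differ by at most 1. The divmod result gives exactly that split. A small sketch wrapping it in a function (the name balanced_parts is made up for this example):

```python
def balanced_parts(total, count):
    """Split total into count integers that differ from each other by at most 1."""
    q, r = divmod(total, count)
    # r elements get the extra unit, the remaining count - r elements stay at q
    return [q + 1] * r + [q] * (count - r)

parts = balanced_parts(87, 20)
```

For 87 and 20, divmod gives q = 4 and r = 7, so the list holds seven 5s followed by thirteen 4s.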

How to refer to the loop value in the loop itself in Python pandas?

I am trying to write a loop that repeats the following instructions. The loop should cover nivel1, nivel2, nivel3 and nivel4. Is there a smart way to do this? So far I have tried
for x in range(2, 5):
    n_index = len(VaR_nivel1.index)
    n_columns = len(VaR_nivel1.columns)
    VaR_profit_nivel1=pd.DataFrame(np.random.rand(n_index ,n_columns ))
    VaR_profit_nivel1.columns = VaR_nivel1.columns
    zero_one_nivel1= pd.DataFrame(np.zeros ((n_index, n_columns)))
    columna=0
    indices=0
    while indices<n_index:
        while columna< n_columns:
            VaR_profit_nivel1.iloc[indices,columna]=VaR_nivel1.iloc[indices,columna] + profit_nivel1.iloc[indices,columna]
            if VaR_profit_nivel1.iloc[indices,columna] <0:
                zero_one_nivel1.iloc[indices,columna]=1
            columna += 1
        indices += 1
I would then have to replace nivel1 with something like nivelx...
Thank you.
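One common way to handle numbered variables like these is to keep the frames in dictionaries keyed by the level number, so the loop variable selects the level directly. The sketch below is only an illustration under assumptions: the random frames stand in for VaR_nivel1..4 and profit_nivel1..4, both frames of a level are assumed to share the same index and columns, and the element-wise loops are replaced by whole-frame operations.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for VaR_nivel1..4 and profit_nivel1..4
rng = np.random.default_rng(0)
VaR = {k: pd.DataFrame(rng.normal(size=(3, 2)), columns=["a", "b"]) for k in range(1, 5)}
profit = {k: pd.DataFrame(rng.normal(size=(3, 2)), columns=["a", "b"]) for k in range(1, 5)}

VaR_profit = {}
zero_one = {}
for k in range(1, 5):
    # Element-wise sum of the two frames for this level
    VaR_profit[k] = VaR[k] + profit[k]
    # 1 where the sum is negative, 0 elsewhere
    zero_one[k] = (VaR_profit[k] < 0).astype(int)
```

VaR_profit[2], for example, then plays the role of VaR_profit_nivel2.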

Python code not working as intended

I started learning Python < 2 weeks ago.
I'm trying to make a function to compute a 7 day moving average for data. Something wasn't going right so I tried it without the function.
moving_average = np.array([])
i = 0
for i in range(len(temp)-6):
    sum_7 = np.array([])
    avg_7 = 0
    missing = 0
    total = 7
    j = 0
    for j in range(i,i+7):
        if pd.isnull(temp[j]):
            total -= 1
            missing += 1
        if missing == 7:
            moving_average = np.append(moving_average, np.nan)
            break
    if not pd.isnull(temp[j]):
        sum_7 = np.append(sum_7, temp[j])
    if j == (i+6):
        avg_7 = sum(sum_7)/total
        moving_average = np.append(moving_average, avg_7)
If I run this and look at the value of sum_7, it's just a single value in the numpy array which made all the moving_average values wrong. But if I remove the first for loop with the variable i and manually set i = 0 or any number in the range of the data set and run the exact same code from the inner for loop, sum_7 comes out as a length 7 numpy array. Originally, I just did sum += temp[j] but the same problem occurred, the total sum ended up as just the single value.
I've been staring at this trying to fix it for 3 hours and I'm clueless what's wrong. Originally I wrote the function in R so all I had to do was convert to python language and I don't know why sum_7 is coming up as a single value when there are two for loops. I tried to manually add an index variable to act as i to use it in the range(i, i+7) but got some weird error instead. I also don't know why that is.
https://gyazo.com/d900d1d7917074f336567b971c8a5cee
https://gyazo.com/132733df8bbdaf2847944d1be02e57d2
Hey, you can use the rolling() and mean() functions from pandas.
Link to the documentation:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.rolling.html
df['moving_avg'] = df['your_column'].rolling(7).mean()
This will also give you some NaN values, but that is inherent to a rolling mean: the first 6 rows do not have a full 7 data points behind them.
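By default, rolling(7).mean() also returns NaN for any window that contains a missing value, whereas the question averages the values that are present. One way to get closer to that behaviour is the min_periods parameter. A minimal sketch, with a made-up temp series as the only assumption:

```python
import numpy as np
import pandas as pd

# Made-up sample data with two missing values
temp = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, np.nan])

# min_periods=1: average whatever non-missing values each window holds
rolled = temp.rolling(7, min_periods=1).mean()

# Drop the first 6 partial windows so only full 7-row windows remain
moving_average = rolled.iloc[6:].to_numpy()
```

A window in which all 7 values are missing still yields NaN, which matches the intent of the loop in the question.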
Seems like you misindented the important line:
moving_average = np.array([])
i = 0
for i in range(len(temp)-6):
    sum_7 = np.array([])
    avg_7 = 0
    missing = 0
    total = 7
    j = 0
    for j in range(i,i+7):
        if pd.isnull(temp[j]):
            total -= 1
            missing += 1
        if missing == 7:
            moving_average = np.append(moving_average, np.nan)
            break
    # The following condition should be indented one more level
    if not pd.isnull(temp[j]):
        sum_7 = np.append(sum_7, temp[j])
    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    if j == (i+6):
        # this ^ condition does not do what you meant
        # you should use a flag instead
        avg_7 = sum(sum_7)/total
        moving_average = np.append(moving_average, avg_7)
Instead of a flag you can use a for-else construct, but this is not readable. Here's the relevant documentation.
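For reference, the else clause of a for loop runs only when the loop finishes without hitting break. A small sketch of how that could look for one window; it uses math.isnan instead of pd.isnull to stay dependency-free, and the function name window_average is made up for this example:

```python
import math

def window_average(window):
    """Average the non-missing values of a window; NaN if every value is missing."""
    for x in window:
        if not math.isnan(x):
            break  # at least one valid value exists
    else:
        # Reached only if the loop never hit break: every value is missing
        return math.nan
    valid = [x for x in window if not math.isnan(x)]
    return sum(valid) / len(valid)

result = window_average([1.0, math.nan, 3.0])
```

Here the else branch replaces the "missing == 7" counter: it is reached only when no valid value was found.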
Shorter way to do this:
moving_average = np.array([])
for i in range(len(temp)-6):
    ngram_7 = [t for t in temp[i:i+7] if not pd.isnull(t)]
    average = (sum(ngram_7) / len(ngram_7)) if ngram_7 else np.nan
    moving_average = np.append(moving_average, average)
This could be refactored further:
def average(ngram):
    # use the window passed in, not the global temp
    valid = [t for t in ngram if not pd.isnull(t)]
    if not valid:
        return np.nan
    return sum(valid) / len(valid)

def ngrams(seq, n):
    # + 1 so the last full window is included, matching range(len(temp)-6) above
    for i in range(len(seq) - n + 1):
        yield seq[i:i+n]

moving_average = [average(k) for k in ngrams(temp, 7)]

How to cycle through the index of an array?

Line 14 is where my main problem is. I need to cycle through each item in the array and use its index to determine whether or not it is a multiple of four, so I can create proper spacing for binary numbers.
def decimalToBinary(hu):
    bits = []
    h = []
    while hu > 0:
        kla = hu%2
        bits.append(kla)
        hu = int(hu/2)
    for i in reversed(bits):
        h.append(i)
    if len(h) <= 4:
        print (''.join(map(str,h)))
    else:
        for j in range(len(h)):
            h.index(1) = h.index(1)+1
            if h.index % 4 != 0:
                print (''.join(map(str,h)))
            elif h.index % 4 == 0:
                print (' '.join(map(str,h)))
decimalToBinary( 23 )
If what you're looking for is the index within range(len(h)) in the for loop, you can change that line to for idx,j in enumerate(range(len(h))): where idx is the index into the range.
The line h.index(1) = h.index(1)+1 is incorrect. I modified your function so that it at least executes and produces output, but I don't know whether the output is what you want. Anyway, I hope it helps:
def decimalToBinary(hu):
    bits = []
    h = []
    while hu > 0:
        kla = hu%2
        bits.append(kla)
        hu = int(hu/2)
    for i in reversed(bits):
        h.append(i)
    if len(h) <= 4:
        print (''.join(map(str,h)))
    else:
        for j in range(len(h)):
            h_index = h.index(1)+1 # use h_index variable instead of h.index(1)
            if h_index % 4 != 0:
                print (''.join(map(str,h)))
            elif h_index % 4 == 0:
                print (' '.join(map(str,h)))
decimalToBinary( 23 )
# get binary version to check your result against.
print(bin(23))
This results in:
#output from decimalToBinary
10111
10111
10111
10111
10111
#output from bin(23)
0b10111
You're trying to join the bits into a string and separate them every 4 bits. You could modify your code with Marcin's correction (replacing the line with the syntax error and making some other improvements), but I suggest doing it more "Pythonically".
Here's my version:
def decimalToBinary(hu):
    bits = []
    while hu > 0:
        kla = hu%2
        bits.append(kla)
        hu = int(hu/2)
    # bits holds the least significant bit first, so the groups are built from the right
    h = [''.join(map(str, bits[i:i+4])) for i in range(0,len(bits),4)]
    bu = ' '.join(h)
    print(bu[::-1])
Explanation for the h assignment line:
range(0,len(bits),4): the indices from 0 up to the length of bits in steps of 4, e.g. [0, 4, 8, ...]
[bits[i:i+4] for i in [0, 4, 8]]: a list of lists, each holding four consecutive elements of bits, e.g. [[1,0,1,0], [0,1,0,1], ...]
[''.join(map(str, bits[i:i+4])) for i in range(0,len(bits),4)]: converts each inner list to a string
bu[::-1]: reverses the string
If you are learning Python, it's good to do it your own way. As @roippi pointed out,
for index, value in enumerate(h):
will give you access to both index and value of member of h in each loop.
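As a sketch of how the index from enumerate could drive the spacing, the function below inserts a space whenever the number of digits still to be written is a multiple of four. The name spaced_binary is made up for this example, and bin() is used for brevity instead of the manual conversion.

```python
def spaced_binary(num):
    """Binary digits of num, grouped in fours counted from the right."""
    digits = bin(num)[2:]  # strip the '0b' prefix
    out = []
    for index, value in enumerate(digits):
        remaining = len(digits) - index  # digits left, including this one
        if index > 0 and remaining % 4 == 0:
            out.append(' ')  # start of a new group of four
        out.append(value)
    return ''.join(out)

spaced = spaced_binary(23)
```

Counting the remaining digits rather than the index itself keeps the groups aligned to the right-hand side, which is how binary numbers are usually spaced.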
To group 4 digits, I would do it like this:
def decimalToBinary(num):
    binary = str(bin(num))[2:][::-1]
    index = 0
    spaced = ''
    while index + 4 < len(binary):
        spaced += binary[index:index+4]+' '
        index += 4
    else:
        spaced += binary[index:]
    return spaced[::-1]
print(decimalToBinary(23))
The result is:
1 0111
