I have the following functions in python:
def extractParam(word, varName, stringToReplace):
if word.startswith(stringToReplace):
varName=int (word.replace(stringToReplace, ''))
return varName
def getMParams(line):
l = r = b = t = 0
sline = line.strip().split()
for i in range(len(sline)):
l = extractParam(sline[i], l, "l=")
r = extractParam(sline[i], r, "r=")
b = extractParam(sline[i], b, "b=")
t = extractParam(sline[i], t, "t=")
return l, r, b, t
def getIterParams (line):
width = height = stride = x = y = 0
sline = line.strip().split()
for i in range(len(sline)):
width = extractParam(sline[i], width, "width=")
height = extractParam(sline[i], height,"height=")
stride = extractParam(sline[i], stride,"stride=")
x = extractParam(sline[i], x, "x=")
y = extractParam(sline[i], y, "y=")
return width, height , stride, x, y
the functions getMparams and getIterParams are quite the same, my question is if there's a way to create a function that will replace both of them, I was thinking about something like that:
def func (line, params)
//params is an array of parameters (i.e [l,r,b,t] or [width,height,stride,x,y])
//init all params
sline = line.strip().split()
for i in range(len(sline)):
//for everyParam:
param = extractParam(sline[i],param,"param=")
is it possible? or there's another way to do it?
First off, some style points:
The way you handle varName in extractParam is ugly and confusing. It took me a while to figure out what you are trying to do (i.e., allow for the fact that extractParam might not find any data). For now, this is better handled directly instead of trying to call out to a function.
That range(len( thing you're doing has to stop.
You do not need to strip the line before splitting it - any leading and trailing whitespace will disappear during the splitting operation. You will not end up with any extra empty strings in the result.
The name sline is just plain ugly. You've split the line up into words; why not refer to the words as, well, words? (And in any case, don't use abbreviations and jumble things up. Otherwise you get things like sline that are not actually words.)
We don't use namesLikeThis for functions (or anything else; although we do use NamesLikeThis, with a starting capital letter, for classes) in Python. We use names_like_this.
Also, it looks as though you are repeatedly trying to replace
That said, your proposed approach is fine. Note that since we don't know ahead of time how many items will be extracted, we can't just toss each one into a separate variable. But we can solve this easily by returning a dict.
My approach is as follows: I iterate over the names, and for each attempt to find the corresponding word in the line. Upon finding one, I replace the default value of 0, and after this check is done, I insert the corresponding key-value pair into the returned result. I also take a simpler approach to cutting up the word around the equals sign.
def extract_params(line, names):
words = line.split()
result = {}
for name in names:
value = 0
for word in words:
maybe_name, equals, maybe_value = word.partition('=')
if maybe_name == name and equals == '=':
value = maybe_value
result[name] = value
return result
This could potentially be improved quite a bit, but much depends on your exact specifications. I tried to create something that follows your basic logic as closely as possible.
Related
I'm trying to use Python to call an API and clean a bunch of strings that represent a movie budget.
So far, I have the following 6 variants of data that come up.
"$1.2 million"
"$1,433,333"
"US$ 2 million"
"US$1,644,736 (est.)
"$6-7 million"
"£3 million"
So far, I've only gotten 1 and 2 parsed without a problem with the following code below. What is the best way to handle all of the other cases or a general case that may not be listed below?
def clean_budget_string(input_string):
number_to_integer = {'million' : 1000000, 'thousand' : 1000}
budget_parts = input_string.split(' ')
#Currently, only indices 0 and 1 are necessary for computation
text_part = budget_parts[1]
if text_part in number_to_integer:
number = budget_parts[0].lstrip('$')
int_representation = number_to_integer[text_part]
return int(float(number) * int_representation)
else:
number = budget_parts[0]
idx_dollar = 0
for idx in xrange(len(number)):
if number[idx] == '$':
idx_dollar = idx
return int(number[idx_dollar+1:].replace(',', ''))
The way I would approach a parsing task like this -- and I'm happy to hear other opinions -- would be to break up your function into several parts, each of which identify a single piece of information in the input string.
For instance, I'd start by identifying what float number can be parsed from the string, ignoring currency and order of magnitude (a million, a thousand) for now :
f = float(''.join([c for c in input_str if c in '0123456789.']))
(you might want to add error handling for when you end up with a trailing dot, because of additions like 'est.')
Then, in a second step, you determine whether the float needs to be multiplied to adjust for the correct order of magnitude. One way of doing this would be with multiple if-statements :
if 'million' in input_str :
oom = 6
elif 'thousand' in input_str :
oom = 3
else :
oom = 1
# adjust number for order of magnitude
f = f*math.pow(10, oom)
Those checks could of course be improved to account for small differences in formatting by using regular expressions.
Finally, you separately determine the currency mentioned in your input string, again using one or more if-statements :
if '£' in input_str :
currency = 'GBP'
else :
currency = 'USD'
Now the one case that this doesn't yet handle is the dash one where lower and upper estimates are given. One way of making the function work with these inputs is to split the initial input string on the dash and use the first (or second) of the substrings as input for the initial float parsing. So we would replace our first line of code with something like this:
if '-' in input_str :
lower = input_str.split('-')[0]
f = float(''.join([c for c in lower if c in '0123456789.']))
else :
f = float(''.join([c for c in input_str if c in '0123456789.']))
using regex and string replace method, i added the return of the curency as well if needed.
Modify accordingly to handle more input or multiplier like billion etc.
import re
# take in string and return integer amount and currency
def clean_budget_string(s):
mult_dict = {'million':1000000,'thousand':1000}
tmp = re.search('(^\D*?)\s*((?:\d+\.?,?)+)(?:-\d+)?\s*((?:million|thousand)?)', s).groups()
currency = tmp[0]
mult = tmp[-1]
tmp_int = ''.join(tmp[1:-1]).replace(',', '') # join digits and multiplier, remove comma
tmp_int = int(float(tmp_int) * mult_dict.get(mult, 1))
return tmp_int, currency
>>? clean_budget_string("$1.2 million")
(1200000, '$')
>>? clean_budget_string("$1,433,333")
(1433333, '$')
>>? clean_budget_string("US$ 2 million")
(2000000, 'US$')
>>? clean_budget_string("US$1,644,736 (est.)")
(1644736, 'US$')
>>? clean_budget_string("$6-7 million")
(6000000, '$')
>>? clean_budget_string("£3 million")
(3000000, '£') # my script don't recognize the £ char, might need to set the encoding properly
I'm currently working with DNA sequence data and I have run into a bit of a performance roadblock.
I have two lookup dictionaries/hashes (as RDDs) with DNA "words" (short sequences) as keys and a list of index positions as the value. One is for a shorter query sequence and the other for a database sequence. Creating the tables is pretty fast even with very, very large sequences.
For the next step, I need to pair these up and find "hits" (pairs of index positions for each common word).
I first join the lookup dictionaries, which is reasonably fast. However, I now need the pairs, so I have to flatmap twice, once to expand the list of indices from the query and the second time to expand the list of indices from the database. This isn't ideal, but I don't see another way to do it. At least it performs ok.
The output at this point is: (query_index, (word_length, diagonal_offset)), where the diagonal offset is the database_sequence_index minus the query sequence index.
However, I now need to find pairs of indices on with the same diagonal offset (db_index - query_index) and reasonably close together and join them (so I increase the length of the word), but only as pairs (i.e. once I join one index with another, I don't want anything else to merge with it).
I do this with an aggregateByKey operation using a special object called Seed().
PARALELLISM = 16 # I have 4 cores with hyperthreading
def generateHsps(query_lookup_table_rdd, database_lookup_table_rdd):
global broadcastSequences
def mergeValueOp(seedlist, (query_index, seed_length)):
return seedlist.addSeed((query_index, seed_length))
def mergeSeedListsOp(seedlist1, seedlist2):
return seedlist1.mergeSeedListIntoSelf(seedlist2)
hits_rdd = (query_lookup_table_rdd.join(database_lookup_table_rdd)
.flatMap(lambda (word, (query_indices, db_indices)): [(query_index, db_indices) for query_index in query_indices], preservesPartitioning=True)
.flatMap(lambda (query_index, db_indices): [(db_index - query_index, (query_index, WORD_SIZE)) for db_index in db_indices], preservesPartitioning=True)
.aggregateByKey(SeedList(), mergeValueOp, mergeSeedListsOp, PARALLELISM)
.map(lambda (diagonal, seedlist): (diagonal, seedlist.mergedSeedList))
.flatMap(lambda (diagonal, seedlist): [(query_index, seed_length, diagonal) for query_index, seed_length in seedlist])
)
return hits_rdd
Seed():
class SeedList():
def __init__(self):
self.unmergedSeedList = []
self.mergedSeedList = []
#Try to find a more efficient way to do this
def addSeed(self, (query_index1, seed_length1)):
for i in range(0, len(self.unmergedSeedList)):
(query_index2, seed_length2) = self.unmergedSeedList[i]
#print "Checking ({0}, {1})".format(query_index2, seed_length2)
if min(abs(query_index2 + seed_length2 - query_index1), abs(query_index1 + seed_length1 - query_index2)) <= WINDOW_SIZE:
self.mergedSeedList.append((min(query_index1, query_index2), max(query_index1+seed_length1, query_index2+seed_length2)-min(query_index1, query_index2)))
self.unmergedSeedList.pop(i)
return self
self.unmergedSeedList.append((query_index1, seed_length1))
return self
def mergeSeedListIntoSelf(self, seedlist2):
print "merging seed"
for (query_index2, seed_length2) in seedlist2.unmergedSeedList:
wasmerged = False
for i in range(0, len(self.unmergedSeedList)):
(query_index1, seed_length1) = self.unmergedSeedList[i]
if min(abs(query_index2 + seed_length2 - query_index1), abs(query_index1 + seed_length1 - query_index2)) <= WINDOW_SIZE:
self.mergedSeedList.append((min(query_index1, query_index2), max(query_index1+seed_length1, query_index2+seed_length2)-min(query_index1, query_index2)))
self.unmergedSeedList.pop(i)
wasmerged = True
break
if not wasmerged:
self.unmergedSeedList.append((query_index2, seed_length2))
return self
This is where the performance really breaks down for even sequences of moderate length.
Is there any better way to do this aggregation? My gut feeling says yes, but I can't come up with it.
I know this is a very long winded and technical question, and I would really appreciate any insight even if there is no easy solution.
Edit: Here is how I am making the lookup tables:
def createLookupTable(sequence_rdd, sequence_name, word_length):
global broadcastSequences
blank_list = []
def addItemToList(lst, val):
lst.append(val)
return lst
def mergeLists(lst1, lst2):
#print "Merging"
return lst1+lst2
return (sequence_rdd
.flatMap(lambda seq_len: range(0, seq_len - word_length + 1))
.repartition(PARALLELISM)
#.partitionBy(PARALLELISM)
.map(lambda index: (str(broadcastSequences.value[sequence_name][index:index + word_length]), index), preservesPartitioning=True)
.aggregateByKey(blank_list, addItemToList, mergeLists, PARALLELISM))
#.map(lambda (word, indices): (word, sorted(indices))))
And here is the function that runs the whole operation:
def run(query_seq, database_sequence, translate_query=False):
global broadcastSequences
scoring_matrix = 'nucleotide' if isinstance(query_seq.alphabet, DNAAlphabet) else 'blosum62'
sequences = {'query': query_seq,
'database': database_sequence}
broadcastSequences = sc.broadcast(sequences)
query_rdd = sc.parallelize([len(query_seq)])
query_rdd.cache()
database_rdd = sc.parallelize([len(database_sequence)])
database_rdd.cache()
query_lookup_table_rdd = createLookupTable(query_rdd, 'query', WORD_SIZE)
query_lookup_table_rdd.cache()
database_lookup_table_rdd = createLookupTable(database_rdd, 'database', WORD_SIZE)
seeds_rdd = generateHsps(query_lookup_table_rdd, database_lookup_table_rdd)
return seeds_rdd
Edit 2: I have tweaked things a bit and slightly improved performance by replacing:
.flatMap(lambda (word, (query_indices, db_indices)): [(query_index, db_indices) for query_index in query_indices], preservesPartitioning=True)
.flatMap(lambda (query_index, db_indices): [(db_index - query_index, (query_index, WORD_SIZE)) for db_index in db_indices], preservesPartitioning=True)
in hits_rdd with:
.flatMap(lambda (word, (query_indices, db_indices)): itertools.product(query_indices, db_indices))
.map(lambda (query_index, db_index): (db_index - query_index, (query_index, WORD_SIZE) ))
At least now I'm not burning up storage with intermediate data structures as much.
Let's forget about the technical details of what your doing and think "functionally" about the steps involved, forgetting about the details of the implementation. Functional thinking like this is an important part of parallel data analysis; ideally if we can break the problem up like this, we can reason more clearly about the steps involved, and end up with clearer and often more concise. Thinking in terms of a tabular data model, I would consider your problem to consist of the following steps:
Join your two datasets on the sequence column.
Create a new column delta containing the difference between the indices.
Sort by (either) index to make sure that the subsequences are in the correct order.
Group by delta and concatenate the strings in the sequence column, to obtain the full matches between your datasets.
For the first 3 steps, I think it makes sense to use DataFrames, since this data model makes sense in my head of the kind processing that we're doing. (Actually I might use DataFrames for step 4 as well, except pyspark doesn't currently support custom aggregate functions for DataFrames, although Scala does).
For the fourth step (which is if I understand correctly what you're really asking about in your question), it's a little tricky to think about how to do this functionally, however I think an elegant and efficient solution is to use a reduce (also known a right fold); this pattern can be applied to any problem that you can phrase in terms of iteratively applying an associative binary function, that is a function where the "grouping" of any 3 arguments doesn't matter (although the order certainly may matter), Symbolically, this is a function x,y -> f(x,y) where f(x, f(y, z)) = f(f(x, y), z). String (or more generally list) concatenation is just such a function.
Here's an example of how this might look in pyspark; hopefully you can adapt this to the details of your data:
#setup some sample data
query = [['abcd', 30] ,['adab', 34] ,['dbab',38]]
reference = [['dbab', 20], ['ccdd', 24], ['abcd', 50], ['adab',54], ['dbab',58], ['dbab', 62]]
#create data frames
query_df = sqlContext.createDataFrame(query, schema = ['sequence1', 'index1'])
reference_df = sqlContext.createDataFrame(reference, schema = ['sequence2', 'index2'])
#step 1: join
matches = query_df.join(reference_df, query_df.sequence1 == reference_df.sequence2)
#step 2: calculate delta column
matches_delta = matches.withColumn('delta', matches.index2 - matches.index1)
#step 3: sort by index
matches_sorted = matches_delta.sort('delta').sort('index2')
#step 4: convert back to rdd and reduce
#note that + is just string concatenation for strings
r = matches_sorted['delta', 'sequence1'].rdd
r.reduceByKey(lambda x, y : x + y).collect()
#expected output:
#[(24, u'dbab'), (-18, u'dbab'), (20, u'abcdadabdbab')]
I am new to Python and I have a hard time solving this.
I am trying to sort a list to be able to human sort it 1) by the first number and 2) the second number. I would like to have something like this:
'1-1bird'
'1-1mouse'
'1-1nmouses'
'1-2mouse'
'1-2nmouses'
'1-3bird'
'10-1birds'
(...)
Those numbers can be from 1 to 99 ex: 99-99bird is possible.
This is the code I have after a couple of headaches. Being able to then sort by the following first letter would be a bonus.
Here is what I've tried:
#!/usr/bin/python
myList = list()
myList = ['1-10bird', '1-10mouse', '1-10nmouses', '1-10person', '1-10cat', '1-11bird', '1-11mouse', '1-11nmouses', '1-11person', '1-11cat', '1-12bird', '1-12mouse', '1-12nmouses', '1-12person', '1-13mouse', '1-13nmouses', '1-13person', '1-14bird', '1-14mouse', '1-14nmouses', '1-14person', '1-14cat', '1-15cat', '1-1bird', '1-1mouse', '1-1nmouses', '1-1person', '1-1cat', '1-2bird', '1-2mouse', '1-2nmouses', '1-2person', '1-2cat', '1-3bird', '1-3mouse', '1-3nmouses', '1-3person', '1-3cat', '2-14cat', '2-15cat', '2-16cat', '2-1bird', '2-1mouse', '2-1nmouses', '2-1person', '2-1cat', '2-2bird', '2-2mouse', '2-2nmouses', '2-2person']
def mysort(x,y):
x1=""
y1=""
for myletter in x :
if myletter.isdigit() or "-" in myletter:
x1=x1+myletter
x1 = x1.split("-")
for myletter in y :
if myletter.isdigit() or "-" in myletter:
y1=y1+myletter
y1 = y1.split("-")
if x1[0]>y1[0]:
return 1
elif x1[0]==y1[0]:
if x1[1]>y1[1]:
return 1
elif x1==y1:
return 0
else :
return -1
else :
return -1
myList.sort(mysort)
print myList
Thanks !
Martin
You have some good ideas with splitting on '-' and using isalpha() and isdigit(), but then we'll use those to create a function that takes in an item and returns a "clean" version of the item, which can be easily sorted. It will create a three-digit, zero-padded representation of the first number, then a similar thing with the second number, then the "word" portion (instead of just the first character). The result looks something like "001001bird" (that won't display - it'll just be used internally). The built-in function sorted() will use this callback function as a key, taking each element, passing it to the callback, and basing the sort order on the returned value. In the test, I use the * operator and the sep argument to print it without needing to construct a loop, but looping is perfectly fine as well.
def callback(item):
phrase = item.split('-')
first = phrase[0].rjust(3, '0')
second = ''.join(filter(str.isdigit, phrase[1])).rjust(3, '0')
word = ''.join(filter(str.isalpha, phrase[1]))
return first + second + word
Test:
>>> myList = ['1-10bird', '1-10mouse', '1-10nmouses', '1-10person', '1-10cat', '1-11bird', '1-11mouse', '1-11nmouses', '1-11person', '1-11cat', '1-12bird', '1-12mouse', '1-12nmouses', '1-12person', '1-13mouse', '1-13nmouses', '1-13person', '1-14bird', '1-14mouse', '1-14nmouses', '1-14person', '1-14cat', '1-15cat', '1-1bird', '1-1mouse', '1-1nmouses', '1-1person', '1-1cat', '1-2bird', '1-2mouse', '1-2nmouses', '1-2person', '1-2cat', '1-3bird', '1-3mouse', '1-3nmouses', '1-3person', '1-3cat', '2-14cat', '2-15cat', '2-16cat', '2-1bird', '2-1mouse', '2-1nmouses', '2-1person', '2-1cat', '2-2bird', '2-2mouse', '2-2nmouses', '2-2person']
>>> print(*sorted(myList, key=callback), sep='\n')
1-1bird
1-1cat
1-1mouse
1-1nmouses
1-1person
1-2bird
1-2cat
1-2mouse
1-2nmouses
1-2person
1-3bird
1-3cat
1-3mouse
1-3nmouses
1-3person
1-10bird
1-10cat
1-10mouse
1-10nmouses
1-10person
1-11bird
1-11cat
1-11mouse
1-11nmouses
1-11person
1-12bird
1-12mouse
1-12nmouses
1-12person
1-13mouse
1-13nmouses
1-13person
1-14bird
1-14cat
1-14mouse
1-14nmouses
1-14person
1-15cat
2-1bird
2-1cat
2-1mouse
2-1nmouses
2-1person
2-2bird
2-2mouse
2-2nmouses
2-2person
2-14cat
2-15cat
2-16cat
You need leading zeros. Strings are sorted alphabetically with the order different from the one for digits. It should be
'01-1bird'
'01-1mouse'
'01-1nmouses'
'01-2mouse'
'01-2nmouses'
'01-3bird'
'10-1birds'
As you you see 1 goes after 0.
The other answers here are very respectable, I'm sure, but for full credit you should ensure that your answer fits on a single line and uses as many list comprehensions as possible:
import itertools
[''.join(r) for r in sorted([[''.join(x) for _, x in
itertools.groupby(v, key=str.isdigit)]
for v in myList], key=lambda v: (int(v[0]), int(v[2]), v[3]))]
That should do nicely:
['1-1bird',
'1-1cat',
'1-1mouse',
'1-1nmouses',
'1-1person',
'1-2bird',
'1-2cat',
'1-2mouse',
...
'2-2person',
'2-14cat',
'2-15cat',
'2-16cat']
I wrote a script in Python removing tabs/blank spaces between two columns of strings (x,y coordinates) plus separating the columns by a comma and listing the maximum and minimum values of each column (2 values for each the x and y coordinates). E.g.:
100000.00 60000.00
200000.00 63000.00
300000.00 62000.00
400000.00 61000.00
500000.00 64000.00
became:
100000.00,60000.00
200000.00,63000.00
300000.00,62000.00
400000.00,61000.00
500000.00,64000.00
10000000 50000000 60000000 640000000
This is the code I used:
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_new.txt', 'wb')
s = input.readline()
while s <> '':
s = input.readline()
liste = s.split()
x = liste[0]
y = liste[1]
output.write(str(x) + ',' + str(y))
output.write('\n')
s = input.readline()
input.close()
output.close()
I need to change the above code to also transform the coordinates from two decimal to one decimal values and each of the two new columns to be sorted in ascending order based on the values of the x coordinate (left column).
I started by writing the following but not only is it not sorting the values, it is placing the y coordinates on the left and the x on the right. In addition I don't know how to transform the decimals since the values are strings and the only function I know is using %f and that needs floats. Any suggestions to improve the code below?
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_sorted.txt', 'wb')
s = input.readline()
while s <> '':
s = input.readline()
liste = string.split(s)
x = liste[0]
y = liste[1]
output.write(str(x) + ',' + str(y))
output.write('\n')
sorted(s, key=lambda x: x[o])
s = input.readline()
input.close()
output.close()
thanks!
First, try to format your code according to PEP8—it'll be easier to read. (I've done the cleanup in your post already).
Second, Tim is right in that you should try to learn how to write your code as (idiomatic) Python not just as if translated directly from its C equivalent.
As a starting point, I'll post your 2nd snippet here, refactored as idiomatic Python:
# there is no need to import the `string` module; `.strip()` is a built-in
# method of strings (i.e. objects of type `str`).
# read in the data as a list of pairs of raw (i.e. unparsed) coordinates in
# string form:
with open(r'C:\coordinates.txt') as in_file:
coords_raw = [line.strip().split() for line in in_file.readlines()]
# convert the raw list into a list of pairs (2-tuples) containing the parsed
# (i.e. float not string) data:
coord_pairs = [(float(x_raw), float(y_raw)) for x_raw, y_raw in coords_raw]
coord_pairs.sort() # you want to sort the entire data set, not just values on
# individual lines as in your original snippet
# build a list of all x and y values we have (this could be done in one line
# using some `zip()` hackery, but I'd like to keep it readable (for you at
# least)):
all_xs = [x for x, y in coord_pairs]
all_ys = [y for x, y in coord_pairs]
# compute min and max:
x_min, x_max = min(all_xs), max(all_xs)
y_min, y_max = min(all_ys), max(all_ys)
# NOTE: the above section performs well for small data sets; for large ones, you
# should combine the 4 lines in a single for loop so as to NOT have to read
# everything to memory and iterate over the data 6 times.
# write everything out
with open(r'C:\coordinates_sorted.txt', 'wb') as out_file:
# here, we're doing 3 things in one line:
# * iterate over all coordinate pairs and convert the pairs to the string
# form
# * join the string forms with a newline character
# * write the result of the join+iterate expression to the file
out_file.write('\n'.join('%f,%f' % (x, y) for x, y in coord_pairs))
out_file.write('\n\n')
out_file.write('%f %f %f %f' % (x_min, x_max, y_min, y_max))
with open(...) as <var_name> gives you guaranteed closing of the file handle as with try-finally; also, it's shorter than open(...) and .close() on separate lines. Also, with can be used for other purposes, but is commonly used for dealing with files. I suggest you look up how to use try-finally as well as with/context managers in Python, in addition to everything else you might have learned here.
Your code looks more like C than like Python; it is quite unidiomatic. I suggest you read the Python tutorial to find some inspiration. For example, iterating using a while loop is usually the wrong approach. The string module is deprecated for the most part, <> should be !=, you don't need to call str() on an object that's already a string...
Then, there are some errors. For example, sorted() returns a sorted version of the iterable you're passing - you need to assign that to something, or the result will be discarded. But you're calling it on a string, anyway, which won't give you the desired result. You also wrote x[o] where you clearly meant x[0].
You should be using something like this (assuming Python 2):
with open(r'C:\coordinates.txt') as infile:
values = []
for line in infile:
values.append(map(float, line.split()))
values.sort()
with open(r'C:\coordinates_sorted.txt', 'w') as outfile:
for value in values:
outfile.write("{:.1f},{:.1f}\n".format(*value))
I have some project which I decide to do in Python. In brief: I have list of lists. Each of them also have lists, sometimes one-element, sometimes more. It looks like this:
rules=[
[[1],[2],[3,4,5],[4],[5],[7]]
[[1],[8],[3,7,8],[3],[45],[12]]
[[31],[12],[43,24,57],[47],[2],[43]]
]
The point is to compare values from numpy array to values from this rules (elements of rules table). We are comparing some [x][y] point to first element (e.g. 1 in first element), then, if it is true, value [x-1][j] from array with second from list and so on. Five first comparisons must be true to change value of [x][y] point. I've wrote sth like this (main function is SimulateLoop, order are switched because simulate2 function was written after second one):
def simulate2(self, i, j, w, rule):
data = Data(rule)
if w.world[i][j] in data.c:
if w.world[i-1][j] in data.n:
if w.world[i][j+1] in data.e:
if w.world[i+1][j] in data.s:
if w.world[i][j-1] in data.w:
w.world[i][j] = data.cc[0]
else: return
else: return
else: return
else: return
else: return
def SimulateLoop(self,w):
for z in range(w.steps):
for i in range(2,w.x-1):
for j in range(2,w.y-1):
for rule in w.rules:
self.simulate2(i,j,w,rule)
Data class:
class Data:
def __init__(self, rule):
self.c = rule[0]
self.n = rule[1]
self.e = rule[2]
self.s = rule[3]
self.w = rule[4]
self.cc = rule[5]
NumPy array is a object from World class. Rules is list as described above, parsed by function obtained from another program (GPL License).
To be honest it seems to work fine, but it does not. I was trying other possibilities, without luck. It is working, interpreter doesn't return any errors, but somehow values in array changing wrong. Rules are good because it was provided by program from which I've obtained parser for it (GPL license).
Maybe it will be helpful - it is Perrier's Loop, modified Langton's loop (artificial life).
Will be very thankful for any help!
)
I am not familiar with Perrier's Loop, but if you code something like famous "game life" you would have done simple mistake: store the next generation in the same array thus corrupting it.
Normally you store the next generation in temporary array and do copy/swap after the sweep, like in this sketch:
def do_step_in_game_life(world):
next_gen = zeros(world.shape) # <<< Tmp array here
Nx, Ny = world.shape
for i in range(1, Nx-1):
for j in range(1, Ny-1):
neighbours = sum(world[i-1:i+2, j-1:j+2]) - world[i,j]
if neighbours < 3:
next_gen[i,j] = 0
elif ...
world[:,:] = next_gen[:,:] # <<< Saving computed next generation