How to speed up string splitting in Python

I am receiving data from an Arduino with a Tkinter GUI and need to receive 8 different values at 20 samples per second and graph them. I am plotting 4 on one graph and 4 on another graph. The code on the Arduino side works fine and is sending at the correct rate using the following format.
Serial.println(String(val1) + "," + String(val2) + ...
On the Python side I am receiving and graphing like this:
def update_graph(self, i):
    self.xdata.append(i)
    while self.arduinoData.inWaiting() == 0:
        pass
    x = self.arduinoData.readline()
    split_data = x.split(",")
    print split_data[1]
    self.ydata1.append(int(split_data[0]))
    self.ydata2.append(int(split_data[1]))
    self.ydata3.append(int(split_data[2]))
    self.ydata4.append(int(split_data[3]))
    self.ydata5.append(int(split_data[4]))
    self.ydata6.append(int(split_data[5]))
    self.ydata7.append(int(split_data[6]))
    self.ydata8.append(int(split_data[7]))
    self.line1.set_data(self.xdata, self.ydata1)
    self.line2.set_data(self.xdata, self.ydata2)
    self.line3.set_data(self.xdata, self.ydata3)
    self.line4.set_data(self.xdata, self.ydata4)
    self.ax1.set_ylim(min(self.ydata1), max(self.ydata4))
    self.ax1.set_xlim(min(self.xdata), max(self.xdata))
    self.line5.set_data(self.xdata, self.ydata5)
    self.line6.set_data(self.xdata, self.ydata6)
    self.line7.set_data(self.xdata, self.ydata7)
    self.line8.set_data(self.xdata, self.ydata8)
    self.ax2.set_ylim(min(self.ydata5), max(self.ydata8))
    self.ax2.set_xlim(min(self.xdata), max(self.xdata))
    if i >= self.points - 1:
        self.running = False
        self.ani = None
    # return all line artists in one tuple; stacked return statements
    # after the first one would never execute
    return (self.line1, self.line2, self.line3, self.line4,
            self.line5, self.line6, self.line7, self.line8)
This has proved to be far too slow to keep up with the incoming data. Is there a faster way to receive and parse it?

I agree with @gre_gor that the parsing is not the slowest part. A while back I was doing a similar project and found that setting the Arduino to a higher serial speed did the trick.
void setup() {
  Serial.begin(115200);
}
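For reference, a minimal sketch of the matching Python side, assuming pyserial; the port name here is a placeholder for your setup:

import serial  # pyserial

arduinoData = serial.Serial("/dev/ttyACM0", 115200, timeout=1)  # baud rate must match Serial.begin()
raw = arduinoData.readline()  # blocks until a full line arrives (or timeout)
values = [int(v) for v in raw.decode("ascii", errors="ignore").strip().split(",")]

The arithmetic behind the fix: at the common default of 9600 baud (roughly 960 bytes/s once start/stop bits are counted), a ~40-character line of eight values takes about 40 ms to transmit, which nearly exhausts the 50 ms budget at 20 samples per second; at 115200 baud the same line takes around 3 ms.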

Related

PyQt freezing while plotting many graphs

def initPlots(self):
    print("init_Plots")
    for S in range(self.sensor_num):
        globals()["self.plot{0}".format(S)] = pg.PlotWidget()
        globals()["self.plot{0}".format(S)].setYRange(-30, 30)
        self.layout.addWidget(globals()["self.plot{0}".format(S)], S // 5, S % 5)

def showPlot(self, number, x_axis):
    print("init_showPlot")
    '''x_axis : time(s) / y_axis : data(rgb)'''
    globals()["self.plot{0}".format(number)].clear()
    globals()["self.plot{0}".format(number)].plot(x=x_axis, y=self.y_domain_r[number],
        pen=pg.mkPen(width=2, color='r'), name="sensor_" + str(number))  # R
    globals()["self.plot{0}".format(number)].plot(x=x_axis, y=self.y_domain_g[number],
        pen=pg.mkPen(width=2, color='g'), name="sensor_" + str(number))  # G
    globals()["self.plot{0}".format(number)].plot(x=x_axis, y=self.y_domain_b[number],
        pen=pg.mkPen(width=2, color='b'), name="sensor_" + str(number))  # B

#QtCore.pyqtSlot(np.ndarray)
def run(self, arr):
    print("init_run(plot)")
    self.time += 1
    self.data = arr
    self.t_domain.append(self.time)
    for S in range(self.sensor_num):
        self.y_domain_r[S].append(self.data[S][0])
        self.y_domain_g[S].append(self.data[S][1])
        self.y_domain_b[S].append(self.data[S][2])
        self.showPlot(S, self.t_domain)
This is a program that receives the RGB change values of an image every frame and displays a graph of those values.
The np.ndarray of delta RGBs is passed from a Signal to this Slot.
The program often shuts down when I press run. How can I reduce memory usage?
Is it okay that I'm using so many globals()? Or is this an inevitable flaw in Python?
Any help is appreciated.
For anyone who sees this: I deleted the globals() calls and made a list of plots instead.
It worked for me.
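A minimal sketch of that change, assuming pyqtgraph and the same grid layout as in the question (self.plots is a name introduced here for illustration):

import pyqtgraph as pg

def initPlots(self):
    # One PlotWidget per sensor, kept in a plain list instead of globals()
    self.plots = []
    for S in range(self.sensor_num):
        plot = pg.PlotWidget()
        plot.setYRange(-30, 30)
        self.layout.addWidget(plot, S // 5, S % 5)
        self.plots.append(plot)

def showPlot(self, number, x_axis):
    # Index the list instead of building variable names with format()
    self.plots[number].clear()
    self.plots[number].plot(x=x_axis, y=self.y_domain_r[number],
                            pen=pg.mkPen(width=2, color='r'))

Beyond readability, calling setData() on curve items created once, instead of clear() plus plot() on every frame, usually does the most to keep pyqtgraph responsive.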

Parallelising a big process on graphs with Python

I'm working on graphs and big datasets of complex networks. I run the SIR algorithm on them with the ndlib library, but each iteration takes about one second, so the run takes 10-12 hours to complete.
I was wondering, is there any way to parallelise it?
The code is below; this line is the core:
sir = model.infected_SIR_MODEL(it, infectionList, False)
Is there any simple way to make it run multithreaded or parallelised?
count = 500
for i in numpy.arange(1, count, 1):
    for it in model.get_nodes():
        sir = model.infected_SIR_MODEL(it, infectionList, False)
Each iteration does:
for u in self.graph.nodes():
    u_status = self.status[u]
    eventp = np.random.random_sample()
    neighbors = self.graph.neighbors(u)
    if isinstance(self.graph, nx.DiGraph):
        neighbors = self.graph.predecessors(u)
    if u_status == 0:
        infected_neighbors = len([v for v in neighbors if self.status[v] == 1])
        if eventp < self.BetaList[u] * infected_neighbors:
            actual_status[u] = 1
    elif u_status == 1:
        if eventp < self.params['model']['gamma']:
            actual_status[u] = 2
So, if the iterations are independent, I don't see the point of iterating over count=500. Either way, the multiprocessing library might be of interest to you.
I've prepared two stub solutions (i.e. alter them to your exact needs).
The first assumes that every input is static (as far as I understand the OP's question, the differences between solutions arise from the random state generated inside each iteration). With the second, you can update the input data between iterations of i. I've not tried the code, as I don't have the model, so it might not work directly.
import multiprocessing as mp

# if everything is independent (e.g. "infectionList" is static
# and does not change during the iterations)
def worker(model, infectionList):
    sirs = []
    for it in model.get_nodes():
        sir = model.infected_SIR_MODEL(it, infectionList, False)
        sirs.append(sir)
    return sirs

count = 500
infectionList = []
model = "YOUR MODEL INSTANCE"
data = [(model, infectionList) for _ in range(1, count+1)]

with mp.Pool() as pool:
    results = pool.starmap(worker, data)
The second proposed solution, if "infectionList" or something else gets updated in each iteration of "i":

def worker2(model, it, infectionList):
    sir = model.infected_SIR_MODEL(it, infectionList, False)
    return sir

with mp.Pool() as pool:
    for i in range(1, count+1):
        data = [(model, it, infectionList) for it in model.get_nodes()]
        results = pool.starmap(worker2, data)
        # process results, update something, go to the next iteration...
Edit: Updated the answer to separate proposals more clearly.
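A caveat when using multiprocessing this way: on platforms that spawn rather than fork worker processes (Windows, and macOS on recent Python versions), the pool must be created under an import guard, and everything passed to the workers, including the model instance, must be picklable. A minimal sketch of the guard:

import multiprocessing as mp

if __name__ == "__main__":
    # Without this guard, each spawned worker re-imports the module
    # and re-executes the top-level code, including Pool creation.
    with mp.Pool() as pool:
        results = pool.starmap(worker, data)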

Why does my association model find subgroups in a dataset when there shouldn't be any?

I give a lot of information on the methods that I used to write my code. If you just want to read my question, skip to the quotes at the end.
I'm working on a project that has a goal of detecting sub populations in a group of patients. I thought this sounded like the perfect opportunity to use association rule mining as I'm currently taking a class on the subject.
There are 42 variables in total. Of those, 20 are continuous and had to be discretized. For each variable, I used the Freedman-Diaconis rule to determine how many categories to divide it into.
def Freedman_Diaconis(column_values):
    # sort the list first
    column_values[1].sort()
    first_quartile = int(len(column_values[1]) * .25)
    third_quartile = int(len(column_values[1]) * .75)
    fq_value = column_values[1][first_quartile]
    tq_value = column_values[1][third_quartile]
    iqr = tq_value - fq_value
    n_to_pow = len(column_values[1]) ** (-1.0/3)  # float exponent: -1/3 floor-divides to -1 in Python 2
    h = 2 * iqr * n_to_pow
    retval = (column_values[1][-1] - column_values[1][0]) / h  # max - min after sorting
    test = int(retval + 1)
    return test
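As a quick illustration of the intended use (the column tuple below is made up; index 1 holds the values, matching the code above):

column = ("x1", [1.2, 3.4, 2.2, 5.1, 4.8, 2.9, 3.3, 4.1])
num_bins = Freedman_Diaconis(column)  # bin count = range / (2 * IQR * n**(-1/3)), rounded up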
From there I used min-max normalization

from sklearn import preprocessing

def min_max_transform(column_of_data, num_bins):
    min_max_normalizer = preprocessing.MinMaxScaler(feature_range=(1, num_bins))
    # note: recent scikit-learn versions require a 2-D array here
    data_min_max = min_max_normalizer.fit_transform(column_of_data[1])
    data_min_max_ints = take_int(data_min_max)
    return data_min_max_ints

to transform my data, and then I simply took the integer portion to get the final categorization.
def take_int(list_of_float):
    ints = []
    for flt in list_of_float:
        asint = int(flt)
        ints.append(asint)
    return ints
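As an aside, since fit_transform already returns a NumPy array, the truncation can be done in one vectorized step; a sketch equivalent to take_int under that assumption:

data_min_max_ints = data_min_max.astype(int).ravel().tolist()  # truncate toward zero and flatten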
I then also wrote a function that I used to combine this value with the variable name.
def string_transform(prefix, column, index):
    transformed_list = []
    transformed = ""
    if index < 4:
        for entry in column[1]:
            transformed = prefix + str(entry)
            transformed_list.append(transformed)
    else:
        prefix_num = prefix.split('x')
        for entry in column[1]:
            transformed = str(prefix_num[1]) + 'x' + str(entry)
            transformed_list.append(transformed)
    return transformed_list
This was done to differentiate variables that have the same value but appear in different columns. For example, a value of 1 for variable x14 means something different from a value of 1 for variable x20. The string_transform function would create 14x1 and 20x1 for those examples.
After this, I wrote everything to a file in basket format
def create_basket(list_of_lists, headers):
    if not os.path.exists('baskets'):
        os.makedirs('baskets')
    down_length = len(list_of_lists[0])
    with open('baskets/dataset.basket', 'w') as basketfile:
        basket_writer = csv.DictWriter(basketfile, fieldnames=headers)
        for i in range(0, down_length):
            basket_writer.writerow({
                "trt": list_of_lists[0][i], "y": list_of_lists[1][i],
                "x1": list_of_lists[2][i], "x2": list_of_lists[3][i],
                "x3": list_of_lists[4][i], "x4": list_of_lists[5][i],
                "x5": list_of_lists[6][i], "x6": list_of_lists[7][i],
                "x7": list_of_lists[8][i], "x8": list_of_lists[9][i],
                "x9": list_of_lists[10][i], "x10": list_of_lists[11][i],
                "x11": list_of_lists[12][i], "x12": list_of_lists[13][i],
                "x13": list_of_lists[14][i], "x14": list_of_lists[15][i],
                "x15": list_of_lists[16][i], "x16": list_of_lists[17][i],
                "x17": list_of_lists[18][i], "x18": list_of_lists[19][i],
                "x19": list_of_lists[20][i], "x20": list_of_lists[21][i],
                "x21": list_of_lists[22][i], "x22": list_of_lists[23][i],
                "x23": list_of_lists[24][i], "x24": list_of_lists[25][i],
                "x25": list_of_lists[26][i], "x26": list_of_lists[27][i],
                "x27": list_of_lists[28][i], "x28": list_of_lists[29][i],
                "x29": list_of_lists[30][i], "x30": list_of_lists[31][i],
                "x31": list_of_lists[32][i], "x32": list_of_lists[33][i],
                "x33": list_of_lists[34][i], "x34": list_of_lists[35][i],
                "x35": list_of_lists[36][i], "x36": list_of_lists[37][i],
                "x37": list_of_lists[38][i], "x38": list_of_lists[39][i],
                "x39": list_of_lists[40][i], "x40": list_of_lists[41][i]})
and I used the apriori package in Orange to see if there were any association rules.
rules = Orange.associate.AssociationRulesSparseInducer(patient_basket, support=0.3, confidence=0.3)
print "%4s %4s %s" % ("Supp", "Conf", "Rule")
for r in rules:
    my_rule = str(r)
    split_rule = my_rule.split("->")
    if 'trt' in split_rule[1]:
        print 'treatment rule'
        print "%4.1f %4.1f %s" % (r.support, r.confidence, r)
Using this technique, I found quite a few association rules with my testing data.
THIS IS WHERE I HAVE A PROBLEM
When I read the notes for the training data, there is this note
...That is, the only
reason for the differences among observed responses to the same treatment across patients is
random noise. Hence, there is NO meaningful subgroup for this dataset...
My question is,
Why do I get multiple association rules that imply there are subgroups, when according to the notes I shouldn't see anything?
I'm getting lift numbers above 2, as opposed to the 1 you would expect if everything were random, as the notes state.
Supp Conf Rule
0.3 0.7 6x0 -> trt1
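For reference, lift is the standard measure here: lift(A -> B) = confidence(A -> B) / support(B), i.e. how much more often A and B co-occur than independence would predict. Purely random, independent items should give lift close to 1, which is why values above 2 look like structure.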
Even though my code runs, I'm not getting results anywhere close to what should be expected. This leads me to believe that I messed something up, but I'm not sure what it is.
After some research, I realized that my sample size was too small for the number of variables I have. I would need a much larger sample size to really use this method. In fact, the method I tried to use was developed on the assumption that it would be run on databases with hundreds of thousands or millions of rows.

Building an AIS message decoder

I used to decode AIS messages with this package (Python) https://github.com/schwehr/noaadata/tree/master/ais until I started getting a new format of the messages.
As you may know, AIS messages mostly come in two forms: one part (single message) or two parts (multi-part message). Message #5 always comes in two parts. Example:
!AIVDM,2,1,1,A,55?MbV02;H;s<HtKR20EHE:address#hidden#Dn2222222216L961O5Gf0NSQEp6ClRp8,0*1C
!AIVDM,2,2,1,A,88888888880,2*25
I used to decode this just fine using the following piece of code:
nmeamsg = fields.split(',')
if nmeamsg[0] != '!AIVDM':
    return
total = eval(nmeamsg[1])
part = eval(nmeamsg[2])
aismsg = nmeamsg[5]
nmeastring = string.join(nmeamsg[0:-1], ',')
bv = binary.ais6tobitvec(aismsg)
msgnum = int(bv[0:6])
--
elif (total > 1):
    # Multi Slot Messages: 5,6,8,12,14,17,19,20?,21,24,26
    global multimsg
    if total == 2:
        if msgnum == 5:
            # make sure there are two parts concatenated together
            if nmeastring.count('!AIVDM') == 2 and len(nmeamsg) == 13:
                aismsg = nmeamsg[5] + nmeamsg[11]
                bv = binary.ais6tobitvec(aismsg)
                msg5 = ais_msg_5.decode(bv)
                print "message5 :", msg5
                return msg5
Now I'm getting a new format of the messages:
!SAVDM,2,1,7,A,55#0hd01sq`pQ3W?O81L5#E:1=0U8U#000000016000006H0004m8523k#Dp,0*2A,1410825672
!SAVDM,2,2,7,A,4hC`2U#C`40,2*76,1410825672,1410825673
Note: the number at the last index is a timestamp in epoch format.
I tried to adjust my code to decode this new format. I succeeded in decoding single-part messages; my problem is the multi-part type.
nmeamsg = fields.split(',')
if nmeamsg[0] != '!AIVDM' and nmeamsg[0] != '!SAVDM':
    return
total = eval(nmeamsg[1])
part = eval(nmeamsg[2])
aismsg = nmeamsg[5]
nmeastring = string.join(nmeamsg[0:-1], ',')
dbtimestring = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(nmeamsg[7])))
bv = binary.ais6tobitvec(aismsg)
msgnum = int(bv[0:6])
The decoder can't treat the two lines as one, so decoding fails, because message #5 should contain two strings, not one. The error I get is in these lines:

if nmeastring.count('!SAVDM') == 2 and len(nmeamsg) == 13:
    aismsg = nmeamsg[5] + nmeamsg[11]

where len(nmeamsg) is always 8 (second line) and nmeastring.count('!SAVDM') is always 1.
I hope I explained this clearly so someone can let me know what I'm missing here.
UPDATE
Okay, I think I found the reason. I pass messages from the file to the script line by line:

for line in file:
    i = i + 1
    try:
        doais(line)

whereas message #5 should be passed as two lines. Any idea how I can accomplish that?
UPDATE
I did it by modifying the code a little bit:
for line in file:
    i = i + 1
    try:
        nmeamsg = line.split(',')
        aismsg = nmeamsg[5]
        bv = binary.ais6tobitvec(aismsg)
        msgnum = int(bv[0:6])
        print msgnum
        if nmeamsg[0] != '!AIVDM' and nmeamsg[0] != '!SAVDM':
            print "wrong format"
        total = eval(nmeamsg[1])
        if total == 1:
            dbtimestring = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(nmeamsg[8])))
            doais(line, msgnum, dbtimestring, aismsg)
        if total == 2:  # Multi-line messages
            lines = line + file.next()
            nmeamsg = lines.split(',')
            dbtimestring = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(nmeamsg[15])))
            aismsg = nmeamsg[5] + nmeamsg[12]
            doais(lines, msgnum, dbtimestring, aismsg)
Be aware that noaadata is my old research code; libais is my production library that is in use for NOAA's ERMA and WhaleAlert.
I usually make decoding a two-pass process: first join multi-line messages, which I refer to as normalization (ais_normalize.py). You have several issues in this step. First, the two component lines have different timestamps to the right of the checksum; by the old USCG metadata standard, the last one matters, so my code will assume these two lines are not related. Second, you don't have the required station id field. A sketch of the join step follows below.
Where are you getting the SA from in SAVDM? What device ("talker" in the NMEA vocabulary) is receiving these messages?
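A minimal sketch of that join step in Python, assuming the standard field layout of talker, total, part, sequence id, channel, payload, fill*checksum, and ignoring the trailing timestamp metadata (assemble_multipart is a name introduced here for illustration):

def assemble_multipart(lines):
    pending = {}  # (sequence id, channel) -> payload fragments collected so far
    for line in lines:
        fields = line.split(',')
        if fields[0] not in ('!AIVDM', '!SAVDM'):
            continue
        total, part = int(fields[1]), int(fields[2])
        key = (fields[3], fields[4])
        pending.setdefault(key, [None] * total)[part - 1] = fields[5]
        if all(f is not None for f in pending[key]):
            yield ''.join(pending.pop(key))  # full 6-bit payload, ready to decode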
If you're in Ruby, I can recommend the NMEA and AIS decoder ruby gem that I wrote, available on GitHub. It's based on the unofficial AIS spec at catb.org, which is maintained by one of Kurt's colleagues.
It handles combining of multipart messages, reads from streams, and supports a large set of NMEA and AIS messages. Decoding the 50 binary subtypes of AIS messages 6 and 8 is presently in development.
To handle the nonstandard lines you posted:
!SAVDM,2,1,7,A,55#0hd01sq`pQ3W?O81L5#E:1=0U8U#000000016000006H0004m8523k#Dp,0*2A,1410825672
!SAVDM,2,2,7,A,4hC`2U#C`40,2*76,1410825672,1410825673
It would be necessary to add a new parse rule that accepts fields after the checksum, but aside from that it should go smoothly. In other words, you'd copy the parser line here:
| BANG DATA CSUM { result = NMEAPlus::AISMessageFactory.create(val[0], val[1], val[2]) }
and have something like
| BANG DATA CSUM COMMA DATA { result = NMEAPlus::AISMessageFactory.create(val[0], val[1], val[2], val[4]) }
What do you do with those extra timestamp(s)? It almost looks like they've been appended by whatever software is doing the logging, rather than being part of the actual message.

Depth-first algorithm in Python does not work

I have a project which I decided to do in Python. In brief: I have a list of lists. Each of them also holds lists, sometimes one-element, sometimes more. It looks like this:

rules = [
    [[1], [2], [3, 4, 5], [4], [5], [7]],
    [[1], [8], [3, 7, 8], [3], [45], [12]],
    [[31], [12], [43, 24, 57], [47], [2], [43]],
]

The point is to compare values from a NumPy array to values from these rules. We compare some point [x][y] to the first element (e.g. 1 in the first rule); then, if that holds, the value [x-1][y] from the array to the second element, and so on. The first five comparisons must all be true to change the value of the [x][y] point. I've written something like this (the main function is SimulateLoop; the order is switched because simulate2 was written after it):
def simulate2(self, i, j, w, rule):
    data = Data(rule)
    if w.world[i][j] in data.c:
        if w.world[i-1][j] in data.n:
            if w.world[i][j+1] in data.e:
                if w.world[i+1][j] in data.s:
                    if w.world[i][j-1] in data.w:
                        w.world[i][j] = data.cc[0]
                    else: return
                else: return
            else: return
        else: return
    else: return

def SimulateLoop(self, w):
    for z in range(w.steps):
        for i in range(2, w.x-1):
            for j in range(2, w.y-1):
                for rule in w.rules:
                    self.simulate2(i, j, w, rule)
Data class:
class Data:
    def __init__(self, rule):
        self.c = rule[0]
        self.n = rule[1]
        self.e = rule[2]
        self.s = rule[3]
        self.w = rule[4]
        self.cc = rule[5]
The NumPy array is an object from the World class. rules is the list described above, parsed by a function obtained from another program (GPL license), so the rules themselves should be good.
To be honest, it seems like it should work, but it doesn't. I've tried other possibilities without luck. It runs and the interpreter doesn't return any errors, but somehow the values in the array change incorrectly.
Maybe it will be helpful: it is Perrier's loop, a modified Langton's loop (artificial life).
I'll be very thankful for any help!
I am not familiar with Perrier's loop, but if you are coding something like the famous Game of Life, you may have made a simple mistake: storing the next generation in the same array, thus corrupting it.
Normally you store the next generation in a temporary array and copy/swap after the sweep, as in this sketch:
def do_step_in_game_life(world):
    next_gen = np.zeros(world.shape)  # <<< tmp array here
    Nx, Ny = world.shape
    for i in range(1, Nx-1):
        for j in range(1, Ny-1):
            neighbours = world[i-1:i+2, j-1:j+2].sum() - world[i, j]
            if neighbours < 3:
                next_gen[i, j] = 0
            elif ...
    world[:, :] = next_gen[:, :]  # <<< saving computed next generation
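Applied to the question's loop, the same double-buffering idea might look like this (simulate_step is a name introduced here; w.world is assumed to be a NumPy array, and the rule layout follows the question):

def simulate_step(w, rules):
    next_world = w.world.copy()  # write new states into a buffer, read only from w.world
    for i in range(2, w.x - 1):
        for j in range(2, w.y - 1):
            for rule in rules:
                c, n, e, s, wst, cc = rule  # centre, N, E, S, W, new centre value
                if (w.world[i][j] in c and w.world[i-1][j] in n
                        and w.world[i][j+1] in e and w.world[i+1][j] in s
                        and w.world[i][j-1] in wst):
                    next_world[i][j] = cc[0]
                    break  # first matching rule wins; adjust if all rules should apply
    w.world[:, :] = next_world  # swap in the next generation after the sweep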
