Overlapping writing tasks in multiprocessing (pool.map) - python

I'm getting overlapping writes to my output files when I run the following piece of code with multiprocessing.
def spectrum(i):
    for j in range(num_x):
        coordinate = data[:, j, i]
        filtered = filter(lambda a: a != 0, coordinate)
        occupancy = float(len(filtered)) / framespfile
        if filtered == [] or filtered[0] > 500:
            output = str([j, i]) + "\n" + str(filtered) + "\n"
            badpixelfile.write(output)
        else:
            output = str([j, i]) + "\n" + str(filtered) + "\n"
            coordinatefile.write(output)

pool2 = multiprocessing.Pool(multiprocessing.cpu_count())
pool2.map(spectrum, range(num_y))
pool2.close()
pool2.join()
It should write results like:
[14,0]
[50, 51, 84]
[0, 314]
[60, 74, 12, 202, 129]
But sometimes the output of two processes overlaps and the file looks like this (it happens only occasionally, but it breaks the analysis):
[149, 27]
[27, 34, 26, 25, 19, 45, 32, 36, 46, 29, 25, 25, 40, 62, 24, 31, 23, 46, 33, 35, 60, 33, 8, 24, 49, 29, 29, 42, 8, 22, 31, 28, 25, 25, 56, 32, 31, 27, 11, 20, 29, 23, 51, 28, 31, 29, 28, 30, 23, 16, 34, 36, 25, 17, 25, 19, 19, 51, 27, 37, 9, 32, 26, 28, 27, 3, 44, 4, 38, 20, 34, 28, 22, 26, 26, 19, 21, 25, 25, 48, 24, 29, 22, 20, 23, 29, 15, 32, 42, 3, 23, 26, 34, 28, 26, 39, 17, [0, 123]
[20, 43, 33, 34, 18, 44, 15, 22, 33, 20, 45, 30, 21, 33, 32, 43, 30, 8, 37, 54, 9, 46, 33, 16, 27, 29, 31, 47, 26, 38, 40, 29, 34, 38, 17, 33, 47, 28, 24, 33, 40, 47, 16, 32, 33, 21, 49, 34, 26, 21, 47, 46, 49, 13, 62, 62, 31, 41, 14, 65, 36, 49, 27, 38, 44, 54, 55, 64, 32, 50, 28, 34, 41, 49, 33, 40, 28, 32, 31, 56, 16, 35, 37, 50, 33, 41, 38, 26, 41, 26, 28, 25, 37, 27, 20, 47, 31, 35, 28, 43, 48, 37, 31, 24, 34, 36, 41, 19, 41, 41, 3, 36]
[1, 123]
So the record for [149, 27] is cut off, and the record for [0, 123] begins before the first one has been written completely.
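One common way to avoid interleaved records is to keep file writes out of the worker processes: each worker returns its text, and only the parent process writes it. The following is a minimal sketch of that idea, not a drop-in replacement. The names classify_column and write_spectra are hypothetical, and data is assumed to be a NumPy-style 3-D array indexed as in the question.

```python
import multiprocessing


def classify_column(task):
    """Build the text records for one y index without touching any file."""
    i, columns = task  # columns: one list of values per x index
    bad, good = [], []
    for j, coordinate in enumerate(columns):
        # List comprehension instead of filter(), so this also works on Python 3
        filtered = [a for a in coordinate if a != 0]
        record = str([j, i]) + "\n" + str(filtered) + "\n"
        if not filtered or filtered[0] > 500:
            bad.append(record)
        else:
            good.append(record)
    return "".join(bad), "".join(good)


def write_spectra(data, bad_path, good_path):
    """Hypothetical driver: compute in parallel, write from the parent only."""
    _, num_x, num_y = data.shape  # assumed layout: (frames, x, y)
    tasks = [(i, [data[:, j, i].tolist() for j in range(num_x)])
             for i in range(num_y)]
    pool = multiprocessing.Pool()
    try:
        results = pool.map(classify_column, tasks)  # results keep task order
    finally:
        pool.close()
        pool.join()
    with open(bad_path, "w") as bad_file, open(good_path, "w") as good_file:
        for bad, good in results:
            bad_file.write(bad)
            good_file.write(good)
```

Because pool.map returns results in task order, the files also come out in a deterministic order. On platforms that start workers by spawning a new interpreter, write_spectra should be called under an if __name__ == "__main__": guard.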

Related

I have a problem using the Counter library in Python [closed]

I have a problem using the Counter library in Python.
Developers, please help me.
from collections import Counter
serie = [5, 6, 7, 8, 10, 12, 13, 25, 27, 29, 33, 37, 39, 41, 47, 56, 59, 66, 76, 78, 1, 7, 15, 16, 21, 25, 26, 28, 30, 38, 41, 48, 51, 59, 60, 65, 68, 70, 75, 79, 3, 6, 14, 15, 17, 23,
25, 27, 33, 34, 35, 38, 46, 51, 53, 58, 63, 68, 74, 77, 7, 9, 11, 21, 26, 27, 32, 35, 38, 43, 44, 52, 53, 56, 59, 65, 66, 74, 76, 80, 3, 9, 19, 27, 28, 34, 35, 39, 47, 49, 50, 51, 53, 57, 61, 66, 67, 72, 74, 80, 2, 3, 24, 25, 28, 30, 35, 36, 51, 54, 55, 57, 61, 67, 68, 69, 70, 71, 74, 79, 3, 11, 14, 16, 19, 25, 27, 33, 35, 38, 44, 46, 48, 58, 63, 64, 65, 68, 69, 73, 7, 12, 18, 23, 24, 25, 27, 28, 47, 52, 53, 59, 65, 66, 67, 68, 69, 70, 72, 75, 1, 2, 5, 8, 9, 10, 13, 20, 25, 28, 29, 33, 39, 41, 43, 48, 49, 53, 66, 74, 1, 6, 7, 9, 15, 18, 19, 23, 25, 26, 33, 34, 42, 45, 46, 62, 65, 71, 79, 80, 2, 4, 6, 7, 11, 12, 15,
21, 23, 24, 26, 33, 34, 38, 51, 53, 67, 68, 73, 79, 1, 8, 9, 19, 20, 24, 30, 32, 35, 40,
42, 44, 47, 54, 55, 56, 60, 61, 78, 80]
# Count the number of occurrences of each element in the series
occurrences = Counter(serie)
# Sort the elements in descending order of their number of occurrences
sorted_occurrences = occurrences.most_common()
# Get the most frequent elements
most_common_count = sorted_occurrences[0][1]
most_common = [x[0] for x in sorted_occurrences if x[1] == most_common_count][:5]
print(most_common)
I want this code to return the five most frequent numbers while it returns
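You are almost there: most_common() already sorts the elements by frequency, so just take the first five instead of filtering on the highest count: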
from collections import Counter
serie = [5, 6, 7, 8, 10, 12, 13, 25, 27, 29, 33, 37, 39, 41, 47, 56, 59, 66, 76, 78, 1, 7, 15, 16, 21, 25, 26, 28, 30, 38, 41, 48, 51, 59, 60, 65, 68, 70, 75, 79, 3, 6, 14, 15, 17, 23,
25, 27, 33, 34, 35, 38, 46, 51, 53, 58, 63, 68, 74, 77, 7, 9, 11, 21, 26, 27, 32, 35, 38, 43, 44, 52, 53, 56, 59, 65, 66, 74, 76, 80, 3, 9, 19, 27, 28, 34, 35, 39, 47, 49, 50, 51, 53, 57, 61, 66, 67, 72, 74, 80, 2, 3, 24, 25, 28, 30, 35, 36, 51, 54, 55, 57, 61, 67, 68, 69, 70, 71, 74, 79, 3, 11, 14, 16, 19, 25, 27, 33, 35, 38, 44, 46, 48, 58, 63, 64, 65, 68, 69, 73, 7, 12, 18, 23, 24, 25, 27, 28, 47, 52, 53, 59, 65, 66, 67, 68, 69, 70, 72, 75, 1, 2, 5, 8, 9, 10, 13, 20, 25, 28, 29, 33, 39, 41, 43, 48, 49, 53, 66, 74, 1, 6, 7, 9, 15, 18, 19, 23, 25, 26, 33, 34, 42, 45, 46, 62, 65, 71, 79, 80, 2, 4, 6, 7, 11, 12, 15,
21, 23, 24, 26, 33, 34, 38, 51, 53, 67, 68, 73, 79, 1, 8, 9, 19, 20, 24, 30, 32, 35, 40,
42, 44, 47, 54, 55, 56, 60, 61, 78, 80]
# Count the number of occurrences of each element in the series
occurrences = Counter(serie)
# Sort the elements in descending order of their number of occurrences
sorted_occurrences = occurrences.most_common()
print([x[0] for x in sorted_occurrences][:5])
#output
[25, 7, 27, 33, 68]
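Counter.most_common also accepts the number of entries to return, which makes the slice unnecessary. A small sketch, where top_n is a hypothetical helper name and sample is made-up data rather than the series from the question:

```python
from collections import Counter


def top_n(values, n=5):
    """Return the n most frequent values, most frequent first."""
    return [value for value, _count in Counter(values).most_common(n)]


sample = [3, 1, 3, 2, 3, 2, 9]
print(top_n(sample, 2))  # 3 occurs three times, 2 occurs twice
```

Elements with equal counts come back in the order they were first encountered, so ties at the cut-off are resolved by position in the input.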

Transform a list of ranges into a single list

I have a data frame that holds start and stop points used to mark another dataset.
For each pair I create a range from the start mark to the stop mark, and I want to combine all of these ranges into a single list or NumPy array.
I have the following:
list(map(lambda limits: np.arange(limits[1] - limits[0] - 1, -1, -1),
         zip(df_cycles['Start_point'], df_cycles['Stop_point'])))
This is returning a list of arrays:
[array([1155, 1154, 1153, ..., 2, 1, 0]),
array([71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55,
54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38,
37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21,
20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,
3, 2, 1, 0]),
...]
How can I modify or transform the output to have a single list or NumPy array like this:
array([1155, 1154, 1153, ..., 2, 1, 0, 71, 70, 69, 68, 67, 66, 65,
64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48,
47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31,
30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14,
13, 12, 11, 10, 9, 8, 7, 6, 5, 4,3, 2, 1, 0,...])
Just do:
flatarray = np.concatenate(list_of_arrays)
concatenate joins two or more arrays into a single new array. You don't want to call it for one array at a time in a loop (that creates a Schlemiel the Painter's algorithm, because every call copies everything accumulated so far), but once you have all the arrays it is an efficient way to combine them.
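If a plain Python list is enough, the same flattening can be sketched with the standard library alone. This swaps np.arange and np.concatenate for range and itertools.chain; countdown_list is a hypothetical name, and the two arguments stand in for the Start_point and Stop_point columns.

```python
from itertools import chain


def countdown_list(starts, stops):
    """Flatten one descending range per (start, stop) pair into a single list."""
    return list(chain.from_iterable(
        range(stop - start - 1, -1, -1) for start, stop in zip(starts, stops)
    ))


print(countdown_list([0, 10], [3, 12]))  # [2, 1, 0, 1, 0]
```

Like a single concatenate call, chain.from_iterable visits every element once, so it avoids the repeated copying of a grow-as-you-go loop.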

Select values from two different dataset in python

I'm having trouble working with my two datasets. Here is the problem.
I have two different datasets:
training_df = pd.read_csv('.../train.csv')
test_df = pd.read_csv('.../test.csv')
I have to select one set of columns as numerical attributes and another set as categorical attributes, in both train.csv and test.csv. I tried it like this:
num_attrib = pd.DataFrame(training_df, columns=[0, 2, 3, 15, 16, 17, 18, 24, 32, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 59, 60, 64, 65, 66, 67, 68, 69, 70, 71, 72])
cat_attrib = pd.DataFrame(training_df, columns=[1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 33, 37, 38, 39, 40, 51, 53, 55, 56, 58, 61, 62, 63, 73, 74])
num_attrib_test = pd.DataFrame(test_df, columns=[0, 2, 3, 15, 16, 17, 18, 24, 32, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 59, 60, 64, 65, 66, 67, 68, 69, 70, 71, 72])
cat_attrib_test = pd.DataFrame(test_df, columns=[1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 33, 37, 38, 39, 40, 51, 53, 55, 56, 58, 61, 62, 63, 73, 74])
Both datasets contain numerical and categorical data. I have to separate the categorical columns from the numerical ones in each dataset, but my approach doesn't work.
I need this because I have to apply ColumnTransformer() to training_df and test_df.
Any suggestions?
Thank you so much.
You are looking for iloc, which selects rows and columns by integer position. See the pandas documentation for DataFrame.iloc.
num_attrib = training_df.iloc[:,[0,2,3,...,15]]
You can also slice:
#even columns
num_attrib = training_df.iloc[:, ::2]
#odd columns
num_attrib = training_df.iloc[:, 1::2]

How to iterate over values of one list, change that with a function and add that to a second list?

I have this list of temperatures:
temp_data = [19, 21, 21, 21, 23, 23, 23, 21, 19, 21, 19, 21, 23, 27, 27, 28, 30, 30, 32, 32, 32, 32, 34, 34,
34, 36, 36, 36, 36, 36, 36, 34, 34, 34, 34, 34, 34, 32, 30, 30, 30, 28, 28, 27, 27, 27, 23, 23,
21, 21, 21, 19, 19, 19, 18, 18, 21, 27, 28, 30, 32, 34, 36, 37, 37, 37, 39, 39, 39, 39, 39, 39,
41, 41, 41, 41, 41, 39, 39, 37, 37, 36, 36, 34, 34, 32, 30, 30, 28, 27, 27, 25, 23, 23, 21, 21,
19, 19, 19, 18, 18, 18, 21, 25, 27, 28, 34, 34, 41, 37, 37, 39, 39, 39, 39, 41, 41, 39, 39, 39,
39, 39, 41, 39, 39, 39, 37, 36, 34, 32, 28, 28, 27, 25, 25, 25, 23, 23, 23, 23, 21, 21, 21, 21,
19, 21, 19, 21, 21, 19, 21, 27, 28, 32, 36, 36, 37, 39, 39, 39, 39, 39, 41, 41, 41, 41, 41, 41,
41, 41, 41, 39, 37, 36, 36, 34, 32, 30, 28, 28, 27, 27, 25, 25, 23, 23, 23, 21, 21, 21, 19, 19,
19, 19, 19, 19, 21, 23, 23, 23, 25, 27, 30, 36, 37, 37, 39, 39, 41, 41, 41, 39, 39, 41, 43, 43,
43, 43, 43, 43, 43, 43, 43, 39, 37, 37, 37, 36, 36, 36, 36, 34, 32, 32, 32, 32, 30, 30, 28, 28,
28, 27, 27, 27, 27, 25, 27, 27, 27, 28, 28, 28, 30, 32, 32, 32, 34, 34, 36, 36, 36, 37, 37, 37,
37, 37, 37, 37, 37, 37, 36, 34, 30, 30, 27, 27, 25, 25, 23, 21, 21, 21, 21, 19, 19, 19, 19, 19,
18, 18, 18, 18, 18, 19, 23, 27, 30, 32, 32, 32, 32, 32, 32, 34, 34, 34, 34, 34, 36, 36, 36, 36,
36, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 28, 28]
I have imported a module that I created with two functions: one converts a temperature from Fahrenheit to Celsius, and the other classifies a Celsius temperature into one of 4 classes.
from temp_functions import fahr_to_celsius, temp_classifier
I also created an empty list where the classes for the Celsius temperatures will go:
temp_classes =[]
and a for loop after that. The loop should iterate over all values in temp_data, convert each one with the fahr_to_celsius function, and then append it to the list temp_celsius:
for t in temp_data:
    temp_celsius = []
    temp_celsius.append(fahr_to_celsius(t))
Issue: I just get the first value. I tried range, len, =+1 and several other things, but no luck.
EDIT:
Adding info from OP comment:
This is an assignment that I am working on:
Iterate over the Fahrenheit temperature values in the temp_data list (one by one) and inside the loop:
Create a new variable called temp_celsius in which you should assign the temperature in Celsius using the fahr_to_celsius function to convert the Fahrenheit temperature into Celsius.
Create a new variable called temp_class in which you should assign the temperature class number (0, 1, 2, or 3) using the temp_classifier function.
Add the temp_class value to the temp_classes list.
Another strategy would be to use a Python list comprehension:
temp_celsius = [fahr_to_celsius(t) for t in temp_data]
You are creating a new list at each iteration, so everything appended earlier is thrown away. Move the creation of the list outside of the for loop:
temp_celsius = []
for t in temp_data:
    temp_celsius.append(fahr_to_celsius(t))
You might consider using pandas to get a table view:
import pandas as pd
df = pd.DataFrame({"fahr": temp_data})
df["celsius"] = df["fahr"].apply(fahr_to_celsius)
# or, if fahr_to_celsius uses only arithmetic, pass the whole column at once
df["celsius"] = fahr_to_celsius(df["fahr"])
# or (even faster) pass the underlying NumPy array
df["celsius"] = fahr_to_celsius(df["fahr"].values)
You could also do this using map, although some people don't consider it "pythonic".
map() applies the function to each item in an iterable and takes the form
map(function, iterable)  # where an iterable is, for example, a list, set, or tuple
It is essentially the same as the list comprehension.
Python 3:
temp_celsius = list(map(fahr_to_celsius, temp_data))  # map returns an iterator, so wrap it in list()
Python 2:
temp_celsius = map(fahr_to_celsius, temp_data)  # map already returns a list
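For the assignment quoted in the question, the loop has to fill temp_classes rather than a list of Celsius values. The sketch below follows those steps. Since the temp_functions module is not shown, both functions here are stand-ins: fahr_to_celsius uses the standard conversion formula, and the thresholds in temp_classifier are purely hypothetical. classify_all is also a hypothetical wrapper name.

```python
def fahr_to_celsius(temp_fahrenheit):
    """Standard Fahrenheit-to-Celsius conversion."""
    return (temp_fahrenheit - 32) / 1.8


def temp_classifier(temp_celsius):
    """Hypothetical thresholds; the real module's limits are not shown."""
    if temp_celsius < -2:
        return 0
    if temp_celsius < 2:
        return 1
    if temp_celsius < 15:
        return 2
    return 3


def classify_all(temp_data):
    temp_classes = []  # created once, before the loop
    for t in temp_data:
        temp_celsius = fahr_to_celsius(t)           # a single value, not a list
        temp_class = temp_classifier(temp_celsius)  # class number 0-3
        temp_classes.append(temp_class)
    return temp_classes


print(classify_all([19, 32, 50, 80]))  # [0, 1, 2, 3] with these thresholds
```

The key point is the same as in the answers above: the list is created before the loop, and only single values are created inside it.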

Reading formatted array from file in Python

I have a file which contains some strings and then two formatted arrays. It looks something like this
megabuck
Hello world
[58, 50, 42, 34, 26, 18, 10, 2,
61, 53, 45, 37, 29, 21, 13, 5,
63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9,
1, 58, 50, 42, 34, 26, 18,
14, 6, 61, 53, 45, 37, 29,
21, 13, 5, 28, 20, 12, 4]
I don't know the size of the arrays beforehand. The only thing I know is that each array is delimited by []. What would be an elegant way to read the arrays?
I am a newbie in Python.
Use a regular expression with re.findall. For example:
import re
import ast
with open(filename) as infile:
    data = infile.read()
for i in re.findall(r"(\[.*?\])", data, flags=re.S):
    print(ast.literal_eval(i))
Output:
[58, 50, 42, 34, 26, 18, 10, 2, 61, 53, 45, 37, 29, 21, 13, 5, 63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9, 1, 58, 50, 42, 34, 26, 18, 14, 6, 61, 53, 45, 37, 29, 21, 13, 5, 28, 20, 12, 4]
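The same idea can be wrapped in a small function that works on text already in memory, which makes it easier to try out. This is only a sketch: parse_arrays is a hypothetical name, and it assumes the arrays are flat (no nested brackets) and contain only Python literals.

```python
import ast
import re


def parse_arrays(text):
    """Return every bracketed list found in text as a list of Python values."""
    # .*? is non-greedy, so each match stops at the first closing bracket;
    # re.S lets "." cross the line breaks inside an array.
    chunks = re.findall(r"\[.*?\]", text, flags=re.S)
    return [ast.literal_eval(chunk) for chunk in chunks]


sample = "megabuck\nHello world\n[58, 50,\n 42]\n[57,\n 49]\n"
print(parse_arrays(sample))  # [[58, 50, 42], [57, 49]]
```

ast.literal_eval only evaluates literals, so unlike eval it does not execute arbitrary code found in the file.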
I wouldn't call it elegant, but it works:
ars = """
megabuck
Hello world
[58, 50, 42, 34, 26, 18, 10, 2,
61, 53, 45, 37, 29, 21, 13, 5,
63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9,
1, 58, 50, 42, 34, 26, 18,
14, 6, 61, 53, 45, 37, 29,
21, 13, 5, 28, 20, 12, 4]
"""
arrays = []
for a in ars.split("["):
    if ']' in a:
        # keep the text before the closing bracket and convert each item to int
        arrays.append([int(i) for i in a.split("]")[0].split(',')])
