I'm having trouble working with my two datasets. Here is the problem:
I have two different datasets:
training_df = pd.read_csv('.../train.csv')
test_df = pd.read_csv('.../test.csv')
I have to take the values of some columns from train.csv and of some columns from test.csv. I tried it like this:
num_attrib = pd.DataFrame(training_df, columns=[0, 2, 3, 15, 16, 17, 18, 24, 32, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 59, 60, 64, 65, 66, 67, 68, 69, 70, 71, 72])
cat_attrib = pd.DataFrame(training_df, columns=[1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 33, 37, 38, 39, 40, 51, 53, 55, 56, 58, 61, 62, 63, 73, 74])
num_attrib_test = pd.DataFrame(test_df, columns=[0, 2, 3, 15, 16, 17, 18, 24, 32, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 59, 60, 64, 65, 66, 67, 68, 69, 70, 71, 72])
cat_attrib_test = pd.DataFrame(test_df, columns=[1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 33, 37, 38, 39, 40, 51, 53, 55, 56, 58, 61, 62, 63, 73, 74])
Both datasets contain numerical and categorical data. I have to separate the categorical columns from the numerical ones in each dataset, but my approach doesn't work.
I need this because I have to apply ColumnTransformer() to training_df and test_df.
Any suggestions?
Thank you so much.
You are looking for iloc, which selects rows and columns by integer position. See the pandas documentation for DataFrame.iloc.
num_attrib = training_df.iloc[:,[0,2,3,...,15]]
You can also slice:
# columns at even positions (0, 2, 4, ...)
num_attrib = training_df.iloc[:, ::2]
# columns at odd positions (1, 3, 5, ...)
num_attrib = training_df.iloc[:, 1::2]
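To apply the same split to both datasets, one option is to keep the positional index lists in variables and reuse them with iloc. The following is a minimal sketch; the two small frames and their column names are made up for illustration and only stand in for train.csv and test.csv.

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and test.csv (column names are invented)
training_df = pd.DataFrame(
    {"age": [25, 32], "city": ["Rome", "Oslo"], "income": [40.0, 55.5], "sex": ["F", "M"]}
)
test_df = pd.DataFrame(
    {"age": [41], "city": ["Lima"], "income": [38.2], "sex": ["F"]}
)

# Positional indices of the numerical and categorical columns
num_idx = [0, 2]
cat_idx = [1, 3]

# iloc selects by position, so the same lists work for both frames
num_attrib = training_df.iloc[:, num_idx]
cat_attrib = training_df.iloc[:, cat_idx]
num_attrib_test = test_df.iloc[:, num_idx]
cat_attrib_test = test_df.iloc[:, cat_idx]

print(list(num_attrib.columns))
print(list(cat_attrib_test.columns))
```

This assumes both files have their columns in the same order; if they might not, selecting by column name is safer than selecting by position.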
I have a problem using Counter from Python's collections module: I get only one number.
Developers, please help me.
from collections import Counter
serie = [5, 6, 7, 8, 10, 12, 13, 25, 27, 29, 33, 37, 39, 41, 47, 56, 59, 66, 76, 78, 1, 7, 15, 16, 21, 25, 26, 28, 30, 38, 41, 48, 51, 59, 60, 65, 68, 70, 75, 79, 3, 6, 14, 15, 17, 23,
25, 27, 33, 34, 35, 38, 46, 51, 53, 58, 63, 68, 74, 77, 7, 9, 11, 21, 26, 27, 32, 35, 38, 43, 44, 52, 53, 56, 59, 65, 66, 74, 76, 80, 3, 9, 19, 27, 28, 34, 35, 39, 47, 49, 50, 51, 53, 57, 61, 66, 67, 72, 74, 80, 2, 3, 24, 25, 28, 30, 35, 36, 51, 54, 55, 57, 61, 67, 68, 69, 70, 71, 74, 79, 3, 11, 14, 16, 19, 25, 27, 33, 35, 38, 44, 46, 48, 58, 63, 64, 65, 68, 69, 73, 7, 12, 18, 23, 24, 25, 27, 28, 47, 52, 53, 59, 65, 66, 67, 68, 69, 70, 72, 75, 1, 2, 5, 8, 9, 10, 13, 20, 25, 28, 29, 33, 39, 41, 43, 48, 49, 53, 66, 74, 1, 6, 7, 9, 15, 18, 19, 23, 25, 26, 33, 34, 42, 45, 46, 62, 65, 71, 79, 80, 2, 4, 6, 7, 11, 12, 15,
21, 23, 24, 26, 33, 34, 38, 51, 53, 67, 68, 73, 79, 1, 8, 9, 19, 20, 24, 30, 32, 35, 40,
42, 44, 47, 54, 55, 56, 60, 61, 78, 80]
# Count the number of occurrences of each element in the series
occurrences = Counter(serie)
# Sort the elements by decreasing number of occurrences
sorted_occurrences = occurrences.most_common()
# Retrieve the most frequent elements
most_common_count = sorted_occurrences[0][1]
most_common = [x[0] for x in sorted_occurrences if x[1] == most_common_count][:5]
print(most_common)
I want this code to return the five most frequent numbers, but it returns only one.
You are almost there. Take the first five elements of most_common() instead of keeping only the elements that share the highest count:
from collections import Counter
serie = [5, 6, 7, 8, 10, 12, 13, 25, 27, 29, 33, 37, 39, 41, 47, 56, 59, 66, 76, 78, 1, 7, 15, 16, 21, 25, 26, 28, 30, 38, 41, 48, 51, 59, 60, 65, 68, 70, 75, 79, 3, 6, 14, 15, 17, 23,
25, 27, 33, 34, 35, 38, 46, 51, 53, 58, 63, 68, 74, 77, 7, 9, 11, 21, 26, 27, 32, 35, 38, 43, 44, 52, 53, 56, 59, 65, 66, 74, 76, 80, 3, 9, 19, 27, 28, 34, 35, 39, 47, 49, 50, 51, 53, 57, 61, 66, 67, 72, 74, 80, 2, 3, 24, 25, 28, 30, 35, 36, 51, 54, 55, 57, 61, 67, 68, 69, 70, 71, 74, 79, 3, 11, 14, 16, 19, 25, 27, 33, 35, 38, 44, 46, 48, 58, 63, 64, 65, 68, 69, 73, 7, 12, 18, 23, 24, 25, 27, 28, 47, 52, 53, 59, 65, 66, 67, 68, 69, 70, 72, 75, 1, 2, 5, 8, 9, 10, 13, 20, 25, 28, 29, 33, 39, 41, 43, 48, 49, 53, 66, 74, 1, 6, 7, 9, 15, 18, 19, 23, 25, 26, 33, 34, 42, 45, 46, 62, 65, 71, 79, 80, 2, 4, 6, 7, 11, 12, 15,
21, 23, 24, 26, 33, 34, 38, 51, 53, 67, 68, 73, 79, 1, 8, 9, 19, 20, 24, 30, 32, 35, 40,
42, 44, 47, 54, 55, 56, 60, 61, 78, 80]
# Count the number of occurrences of each element in the series
occurrences = Counter(serie)
# Sort the elements by decreasing number of occurrences
sorted_occurrences = occurrences.most_common()
print([x[0] for x in sorted_occurrences][:5])
# Output
[25, 7, 27, 33, 68]
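As a side note, most_common accepts an optional count, so the slicing can be left to Counter itself. A small sketch with a made-up list (not the series from the question):

```python
from collections import Counter

# Made-up data: 4 occurs four times, 3 three times, 2 twice, 1 once
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# most_common(n) returns the n most frequent (value, count) pairs
top_two = Counter(data).most_common(2)
print(top_two)  # [(4, 4), (3, 3)]

# Keep only the values, dropping the counts
top_values = [value for value, _count in top_two]
print(top_values)  # [4, 3]
```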
The question is as follows:
x = np.arange(100)
Write Python code to split the following array at these intervals: 10, 25, 45, 75, 95
I have used the split function but could not split at these specific intervals. Can anyone point me to another method, or am I doing it wrong?
Here's both the manual way and the numpy way with split.
import numpy as np

# Manual method: pair each start index with the next split index
x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]
split_arrays = []
for i, j in zip([0] + split_indices[:-1], split_indices):
    split_arrays.append(x[i:j])
print(split_arrays)
# NumPy method: np.split cuts at every index and also returns the tail x[95:]
split_arrays_np = np.split(x, split_indices)
print(split_arrays_np)
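The manual method gives the five arrays below. np.split gives the same five arrays plus one more at the end that holds the remaining elements, 95 to 99.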
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]),
array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]),
array([45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]),
array([75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94])
]
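To see how the two approaches line up, here is a small sketch on a shorter array. Splitting at k indices produces k + 1 pieces with np.split; the manual loop matches it once the array length is appended as a final stop index. The array and indices are chosen only for illustration.

```python
import numpy as np

x = np.arange(10)
split_indices = [3, 7]

# np.split returns len(split_indices) + 1 pieces, including the tail
parts_np = np.split(x, split_indices)

# Manual equivalent: start indices are [0, 3, 7], stop indices are [3, 7, 10]
starts = [0] + split_indices
stops = split_indices + [len(x)]
parts_manual = [x[i:j] for i, j in zip(starts, stops)]

print([p.tolist() for p in parts_np])
print([p.tolist() for p in parts_manual])
```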
I have a data frame that contains points marking positions in another dataset.
For each pair of start and stop marks I create a range, and I want to combine all of these ranges into a single list or NumPy array.
I have the following:
list(map(lambda limits : np.arange(limits[1] - limits[0]-1, -1, -1),
zip(df_cycles['Start_point'], df_cycles['Stop_point']))
)
This is returning a list of arrays:
[array([1155, 1154, 1153, ..., 2, 1, 0]),
array([71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55,
54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38,
37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21,
20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,
3, 2, 1, 0]),
...]
How can I modify or transform the output to have a single list or NumPy array like this:
array([1155, 1154, 1153, ..., 2, 1, 0, 71, 70, 69, 68, 67, 66, 65,
64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48,
47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31,
30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14,
13, 12, 11, 10, 9, 8, 7, 6, 5, 4,3, 2, 1, 0,...])
Just do:
flatarray = np.concatenate(list_of_arrays)
concatenate joins two or more arrays into a single new array. You don't want to call it one array at a time (that creates a Schlemiel the Painter's algorithm), but once you have all the arrays, it's an efficient way to combine them.
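A minimal end-to-end sketch, assuming two made-up (start, stop) pairs in place of the Start_point and Stop_point columns from the question:

```python
import numpy as np

# Hypothetical start/stop marks standing in for the data frame columns
start_points = [0, 10]
stop_points = [3, 12]

# One descending range per (start, stop) pair, as in the question
list_of_arrays = [
    np.arange(stop - start - 1, -1, -1)
    for start, stop in zip(start_points, stop_points)
]

# Join all ranges in a single call
flatarray = np.concatenate(list_of_arrays)
print(flatarray)  # [2 1 0 1 0]
```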
What exactly does increasing the number of bins in np.histogram(data, bins=100) do? I know that it divides the data into the number of bins you specify, but what exactly does that entail? For example, I plotted a best-fit line to a histogram using scipy.optimize.curve_fit, and when I increased the number of bins, the accuracy of my best-fit line also increased.
The following function illustrates the difference using matplotlib. The same data is plotted using 5 bins and 10 bins:
import matplotlib.pyplot as plt
def plot_histogram(num_bins):
    x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
         10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
         18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
         25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
         29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
         36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
         43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
         51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
         61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
         75, 77, 81, 83, 84, 87, 89, 90, 90, 91
         ]
    plt.hist(x, bins=num_bins)
    plt.title(f'{num_bins} bins')
    plt.show()
plot_histogram(5)
plot_histogram(10)
With 5 bins, each bar is wide: the second bar, which covers roughly the values from 20 to 40, holds 30 data points.
With 10 bins, you get more detail: that range is split into two bars, with 19 data points between roughly 20 and 30 and 11 data points between roughly 30 and 40.
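The same counts can be read off without plotting by calling np.histogram directly, which returns the per-bin counts together with the bin edges. This is a sketch using the same data as the plotting function; the variable names are my own.

```python
import numpy as np

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91]

# Equal-width bins between min(x) = 1 and max(x) = 91
counts5, edges5 = np.histogram(x, bins=5)
counts10, edges10 = np.histogram(x, bins=10)

print("5 bins: ", counts5.tolist(), edges5.tolist())
print("10 bins:", counts10.tolist(), edges10.tolist())

# The second of the 5 bins spans [19, 37); with 10 bins it becomes
# the two narrower bins [19, 28) and [28, 37)
print(counts5[1], counts10[2], counts10[3])  # 30 19 11
```

More bins means each count describes a narrower range of values, so a fitted curve has more points to follow. Each bin also holds fewer samples, so with very many bins the counts become noisier.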
I have a file which contains some strings and then two formatted arrays. It looks something like this
megabuck
Hello world
[58, 50, 42, 34, 26, 18, 10, 2,
61, 53, 45, 37, 29, 21, 13, 5,
63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9,
1, 58, 50, 42, 34, 26, 18,
14, 6, 61, 53, 45, 37, 29,
21, 13, 5, 28, 20, 12, 4]
I don't know the size of the arrays beforehand. The only thing I know is that each array is delimited by []. What would be an elegant way to read the arrays?
I am a newbie in Python.
Use a regular expression with re.findall.
For example:
import re
import ast
with open(filename) as infile:
    data = infile.read()
for i in re.findall(r"(\[.*?\])", data, flags=re.S):
    print(ast.literal_eval(i))
Output:
[58, 50, 42, 34, 26, 18, 10, 2, 61, 53, 45, 37, 29, 21, 13, 5, 63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9, 1, 58, 50, 42, 34, 26, 18, 14, 6, 61, 53, 45, 37, 29, 21, 13, 5, 28, 20, 12, 4]
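The same idea can be tried without a file by running the pattern over an in-memory string. In this sketch the text is a shortened, made-up version of the file from the question. The re.S flag lets `.` match newlines, so a bracketed list that spans several lines is captured as one match, and ast.literal_eval turns each match into a list of ints.

```python
import ast
import re

# Shortened, made-up stand-in for the file contents
text = """megabuck
Hello world
[58, 50, 42,
 61, 53, 45]
[57, 49,
 1, 58]
"""

# Non-greedy match from each "[" to the nearest "]", across line breaks
matches = re.findall(r"\[.*?\]", text, flags=re.S)

# Safely evaluate each bracketed chunk as a Python list literal
arrays = [ast.literal_eval(m) for m in matches]
print(arrays)  # [[58, 50, 42, 61, 53, 45], [57, 49, 1, 58]]
```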
I wouldn't call it elegant, but it works:
ars = """
megabuck
Hello world
[58, 50, 42, 34, 26, 18, 10, 2,
61, 53, 45, 37, 29, 21, 13, 5,
63, 55, 47, 39, 31, 23, 15, 7]
[57, 49, 41, 33, 25, 17, 9,
1, 58, 50, 42, 34, 26, 18,
14, 6, 61, 53, 45, 37, 29,
21, 13, 5, 28, 20, 12, 4]
"""
arrays = []
for a in ars.split("["):
    if ']' in a:
        # Drop the closing bracket, split on commas, and convert each item to int
        arrays.append([int(i.strip()) for i in a.replace("]", '').split(',')])