I get the following output from a pandas cut operation:
0 (0, 20]
1 (0, 20]
2 (0, 20]
3 (0, 20]
4 (0, 20]
5 (0, 20]
6 (0, 20]
7 (0, 20]
8 (0, 20]
9 (0, 20]
How can I convert the (0, 20] to 0 - 20?
I am doing this:
.str.replace('(', '').str.replace(']', '').str.replace(',', ' -')
Any better approach?
Use the labels parameter of pd.cut:
pd.cut(df['some_col'], bins=[0,20,40,60], labels=['0-20', '20-40', '40-60'])
I don't know what your exact pd.cut command looks like, but the code above should give you a good idea of what to do.
Example usage:
df = pd.DataFrame({'some_col': range(5, 56, 5)})
df['cut'] = pd.cut(df['some_col'], bins=[0,20,40,60], labels=['0-20','20-40','40-60'])
Example output:
some_col cut
0 5 0-20
1 10 0-20
2 15 0-20
3 20 0-20
4 25 20-40
5 30 20-40
6 35 20-40
7 40 20-40
8 45 40-60
9 50 40-60
10 55 40-60
Assuming the output was assigned to a variable cut, convert it to strings:
cut.astype(str)
To remove the brackets:
cut.astype(str).str.strip('()[]')
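If you would rather keep the default intervals and only reformat them afterwards, here is a minimal sketch that chains the two ideas above (assuming cut holds the result of pd.cut on your column):
cut = pd.cut(df['some_col'], bins=[0, 20, 40, 60])
# strip the brackets, then turn the comma into a dash to get e.g. "0 - 20"
labels = cut.astype(str).str.strip('()[]').str.replace(',', ' -')
print(labels.head())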
I have a dataframe which I want to cut up according to the elements in a list. For example, I have a range_list [16, 14, 2, ...].
I then want to cut up the dataframe so that the first chunk is 16 rows long, the second 14 rows, the third 2 rows, and so on. It would also be useful to collect the chunks in a list.
Use numpy.split. It takes a sequence of indices to split on, so you will need to take the cumulative sum of your range list.
indices = np.cumsum(range_list, dtype=np.int32)
np.split(df, indices)
Example
range_list = [16, 14, 2]
np.random.seed(0)
df = pd.DataFrame(np.random.randn(sum(range_list), 2))
indices = np.cumsum(range_list, dtype=np.int32)
np.split(df, indices)
This returns a list of DataFrames; in this example three of shapes (16, 2), (14, 2) and (2, 2), plus a trailing empty DataFrame because the last cumulative index equals the length of df:
[ 0 1
0 1.060679 1.092185
1 -0.043971 -1.394001
2 1.106233 -0.711420
3 -0.585148 0.179987
4 -0.871562 0.730840
5 0.810119 -0.130510
6 -0.957646 -0.324547
7 0.235788 -0.460025
8 -0.262714 -0.496833
9 0.454519 -1.244402
10 0.084796 1.587114
11 -0.353880 1.110543
12 -0.570345 0.774158
13 1.772536 1.283950
14 -1.682226 -0.376789
15 0.956894 0.081805, 0 1
16 0.014841 0.110091
17 -0.408881 0.260970
18 0.004939 0.940186
19 -2.056951 0.353928
20 0.618294 -2.201036
21 1.375224 0.526367
22 -0.424886 -1.253565
23 1.785862 0.774936
24 -0.341340 -1.056191
25 -0.274463 -1.637185
26 1.596336 2.311630
27 -0.479840 1.021640
28 -1.307765 -0.232664
29 0.243427 0.339242, 0 1
30 0.345476 0.331306
31 0.895437 -1.163441, Empty DataFrame
Columns: [0, 1]
Index: []]
I'm not sure if I understand you correctly.
If you just want to split the list you can do something like this:
def split_list(l, range_list):
    i = 0                   # start index of the current chunk
    for x in range_list:
        start = i
        end = start + x
        print(l[start:end])
        i = end             # advance to the next chunk
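For example, a quick usage sketch with a plain list (assuming range_list covers the whole list):
range_list = [16, 14, 2]
l = list(range(sum(range_list)))
split_list(l, range_list)   # prints three chunks of 16, 14 and 2 elements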
You can create an array with the cumulative sums of the list elements, prepend a zero and append the length of the dataframe, then iterate over consecutive pairs of boundaries to slice the initial dataframe:
ls = [16, 14, 2]  # ... rest of the range list
chunks = np.cumsum(ls)
c = np.zeros(len(chunks) + 2, dtype=int)  # boundaries must be integers to slice with
c[1:-1] = chunks
c[-1] = len(df)
all_dfs = []
for i in range(len(c) - 1):
    all_dfs.append(df[c[i]:c[i + 1]])
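A quick check of the slices this produces (assuming df is the 32-row example dataframe from above, so ls covers it completely):
for part in all_dfs:
    print(part.shape)
# (16, 2), (14, 2), (2, 2) and a final empty (0, 2) slice, since ls already covers the whole dataframe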
I have a dataframe:
start end
1 10
26 50
6 15
1 5
11 25
I expect the following dataframe:
start end
1 10
11 25
26 50
1 5
6 15
The sort order is nothing but this: the start of the (n+1)th row must be the end of the nth row plus 1. If no such row is found, search for another row whose start is 1.
Can anyone suggest what combination of sort and groupby I can use to convert the above dataframe into the required format?
You could transform the df to a list and then do:
l=[1,10,26,50,6,15,1,5,11,25]
result=[]
for x in range(len(l) // 2):
    pair = sorted([l[2*x], l[2*x+1]])
    result.append(pair[1])  # larger value (the end) first
    result.append(pair[0])  # smaller value (the start) second
This will give you result:
[10, 1, 50, 26, 15, 6, 5, 1, 25, 11]
To transform the original df to list you can do:
startcollist=df['start'].values.tolist()
endcollist=df['end'].values.tolist()
l=[]
for index, each in enumerate(startcollist):
    l.append(each)
    l.append(endcollist[index])
You can then transform result back to a dataframe:
df=pd.DataFrame({'start':result[1::2], 'end':result[0::2]})
Giving the result:
end start
0 10 1
1 50 26
2 15 6
3 5 1
4 25 11
The expression result[1::2] gives the elements of result at odd indices, and result[0::2] gives the elements at even indices. For an explanation of this slicing, see here: https://stackoverflow.com/a/12433705/8565438
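A quick illustration of that slicing, using the result list from above:
result = [10, 1, 50, 26, 15, 6, 5, 1, 25, 11]
print(result[1::2])   # odd indices:  [1, 26, 6, 1, 11]  (the start values)
print(result[0::2])   # even indices: [10, 50, 15, 5, 25] (the end values)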
I am trying to split my array of 100 elements into small arrays of 10 elements each and calculate their averages (the average of each small array). Each time I want to shift by two elements; is what I am doing in the code below correct?
Avg_Arr=[sum(Signal[k:k+10])/10 for k in range(0,N,2)]
More precisely, if my Array is the following
Array=[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 .....]
My first small array is
My_Array1=[0 1 2 3 4 5 6 7 8 9]
==> average is (0+1+2+3+4+5+6+7+8+9)/10
while my second one must be
My_Array2=[2 3 4 5 6 7 8 9 10 11]
This should work:
Signal=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
N = len(Signal)
Avg_Arr = [sum(Signal[k:k+10]) / 10 for k in range(0, N - 10 + 1, 2)]
print(Avg_Arr)
Beware that the last window must start 10 elements before the end (hence the N - 10 + 1 upper bound); otherwise you are not averaging over 10 elements.
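As a quick sanity check of the window boundaries, a small sketch reusing Signal and N from above:
windows = [Signal[k:k + 10] for k in range(0, N - 10 + 1, 2)]
print(all(len(w) == 10 for w in windows))   # True: every window has exactly 10 elements
print(windows[1])                           # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]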
Currently I am trying to read my SQL data into an array in Python.
I am new to Python, so please be kind ;)
My CSV export can be read easily:
data = pd.read_csv('esoc_data.csv', header = None)
x = data[[1,2,3,4,5,6,7,8,9,10,11,12]]
This picks columns 1 through 12 (counting from 0, so the first column is skipped) of my dataset. I need the data in exactly this format!
Now I want to do the same with the data I get from my SQL-fetch.
names = [i for i in cursor.fetchall()]
This gives me my data with all columns (0-12), separated by commas.
Result:
[('name#mail.com', 13, 13, 0, 24, 2, 0, 20, 3, 0, 31, 12, 2), (...)]
Now, how do I get this into the "specific" format I mentioned before?
I just need the numbers like this:
1 2 3 4 5 6 7 8 9 10 11 12
0 13 13 0 24 2 0 20 3 0 31 12 2
1 21 0 0 24 0 0 32 0 0 30 0 0
2 9 7 0 26 31 0 19 27 0 30 32 2
I'm sorry if this is peanuts for you.
You can run a nested loop for this, something like:
def our_method():
    parent_list = list()
    for name in names:
        child_list = list()
        # skip the first element (the e-mail address) and keep the numeric columns
        for index, item in enumerate(name):
            if index != 0:
                child_list.append(item)
        parent_list.append(child_list)
    return parent_list
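If you want the same shape as your CSV version, a shorter pandas-based sketch may also work (assuming names is the list of tuples returned by cursor.fetchall()):
import pandas as pd

x = pd.DataFrame(names).iloc[:, 1:13]   # drop column 0 (the e-mail address), keep columns 1-12
print(x)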
I would like to bin a dataframe in pandas based on the sum of another column.
I have the following dataframe:
time variable frequency
2 7 7
3 12 2
4 13 3
6 15 4
6 18 4
6 3 1
10 21 2
11 4 5
13 6 5
15 17 6
17 5 4
I would like to bin the data so that each group contains a minimum total frequency of 10 and output the average time and the total variable and total frequency.
avg time total variable total frequency
3 32 12
7 57 11
12 10 10
16 22 10
Any help would be greatly appreciated
A little brute force would get you a long way.
data = ((2, 7, 7),
(3, 12, 2),
(4, 13, 3),
(6, 15, 4),
(6, 18, 4),
(6, 3, 1),
(10, 21, 2),
(11, 4, 5),
(13, 6, 5),
(15, 17, 6),
(17, 5, 4))
freq = [data[i][2] for i in range(len(data))]
variable = [data[i][1] for i in range(len(data))]
time = [data[i][0] for i in range(len(data))]
freqcounter = 0
timecounter = 0
variablecounter = 0
counter = 0
freqlist = []
timelist = []
variablelist = []
for k in range(len(data)):
    freqcounter += freq[k]
    timecounter += time[k]
    variablecounter += variable[k]
    counter += 1
    # once the running frequency reaches 10, close the group and reset the counters
    if freqcounter >= 10:
        freqlist.append(freqcounter)
        timelist.append(timecounter / counter)
        variablelist.append(variablecounter)
        freqcounter = 0
        timecounter = 0
        variablecounter = 0
        counter = 0
print(timelist)
print(variablelist)
print(freqlist)
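If you want the output as a dataframe like the one in the question, a small follow-up sketch:
import pandas as pd

result = pd.DataFrame({'avg time': timelist,
                       'total variable': variablelist,
                       'total frequency': freqlist})
print(result)
Note that any trailing rows whose running frequency never reaches 10 are dropped by this approach; with the example data the last group reaches exactly 10, so every row is used.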