Sort on 2 columns which are inter - python

I have a dataframe :
start end
1 10
26 50
6 15
1 5
11 25
I expect following dataframe :
start end
1 10
11 25
26 50
1 5
6 15
The required order is simply a chain: the start of row n+1 must equal the end of row n plus 1. If no such row is found, continue with another row whose start is 1.
Can anyone suggest what combination of sort and group by I can use to convert the above dataframe into the required format?

You could transform the df to a list and then do:
l=[1,10,26,50,6,15,1,5,11,25]
result=[]
# walk the flat list pair by pair, storing the larger value first, then the smaller
for x in range(len(l)//2):
    pair=sorted([l[2*x],l[2*x+1]])
    result.append(pair[1])
    result.append(pair[0])
This will give you result:
[10, 1, 50, 26, 15, 6, 5, 1, 25, 11]
To transform the original df to list you can do:
startcollist=df['start'].values.tolist()
endcollist=df['end'].values.tolist()
l=[]
# interleave the two columns: start, end, start, end, ...
for index, each in enumerate(startcollist):
    l.append(each)
    l.append(endcollist[index])
You can then transform result back to a dataframe:
df=pd.DataFrame({'start':result[1::2], 'end':result[0::2]})
Giving the result:
   end  start
0   10      1
1   50     26
2   15      6
3    5      1
4   25     11
The expression result[1::2] gives the elements of result at odd indices, result[0::2] gives the elements at even indices. For explanation, see here: https://stackoverflow.com/a/12433705/8565438
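Note that the list round-trip above leaves the rows in their original order. For the chained order the question asks for (each start equal to the previous end plus 1, restarting at a row whose start is 1), a greedy walk over the rows is one option. The following is a minimal sketch, not a pandas built-in: chain_sort is a hypothetical helper name, it works on plain (start, end) tuples, and when several rows match it simply takes the first one.

```python
def chain_sort(rows):
    """Order (start, end) pairs so each start follows the previous end + 1."""
    remaining = list(rows)
    ordered = []
    while remaining:
        # Begin a new chain at the first remaining row whose start is 1;
        # fall back to the first remaining row if there is none (assumption).
        pos = next((i for i, (start, _) in enumerate(remaining) if start == 1), 0)
        current = remaining.pop(pos)
        ordered.append(current)
        while True:
            wanted = current[1] + 1
            pos = next(
                (i for i, (start, _) in enumerate(remaining) if start == wanted),
                None,
            )
            if pos is None:
                break  # chain ended, go look for the next chain
            current = remaining.pop(pos)
            ordered.append(current)
    return ordered


rows = [(1, 10), (26, 50), (6, 15), (1, 5), (11, 25)]
print(chain_sort(rows))
# [(1, 10), (11, 25), (26, 50), (1, 5), (6, 15)]
```

With pandas, the input could be built with list(zip(df['start'], df['end'])) and the result turned back into a frame with pd.DataFrame(chain_sort(rows), columns=['start', 'end']). The repeated linear search is quadratic in the number of rows, which is fine for small frames.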

Related

Pandas DataFrame group-by indexes matching list - indexes respectively smaller than list[i+1] and greater than list[i]

I have a DataFrame Times_df with times in a single column and a second DataFrame End_df with specific end times for each group indexed by group name.
Times_df = pd.DataFrame({'time':np.unique(np.cumsum(np.random.randint(5, size=(100,))), axis=0)})
End_df = pd.DataFrame({'end time':np.unique(random.sample(range(Times_df.index.values[0], Times_df.index.values[-1]), 10))})
End_df.index.name = 'group'
I want to add a group index for all times in Times_df that are smaller than or equal to each consecutive end time in End_df but greater than the previous one.
For now I can only do it with a loop, which takes forever ;(
lis = []
i = 1
for row in Times_df['time'].values:
    while i <= row:
        lis.append((End_df['end time']==row).index)
        i += 1
Then I add the list lis as a new column to Times_df
Times_df['group']=lis
Another solution that sadly still uses a loop is this:
test_df = pd.DataFrame()
for group, index in End_df.iterrows():
    test = count.loc[count.index<=index['end time']][:]
    test['group']=group
    test_df = pd.concat([test_df,test], axis=0, ignore_index=True)
I think what you are looking for is pd.cut to bin your values into the groups.
bins = [0, 3, 10, 20, 53, 59, 63, 65, 68, 74, np.inf]
groups = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Times_df["group"] = pd.cut(Times_df["time"], bins, labels=groups)
print(Times_df)
   time  group
0     2      0
1     3      0
2     7      1
3    11      2
4    15      2
5    16      2
6    18      2
7    22      3
8    25      3
9    28      3
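The same right-closed binning can be checked without pandas. The sketch below uses the standard-library bisect module rather than pd.cut; assign_groups is a hypothetical helper name, and the end times are just the first three finite bin edges from the answer above. A time larger than every end time gets the index len(end_times).

```python
from bisect import bisect_left


def assign_groups(times, end_times):
    """Group i holds the times t with end_times[i-1] < t <= end_times[i]."""
    ends = sorted(end_times)
    # bisect_left returns the position of the first end time >= t
    return [bisect_left(ends, t) for t in times]


times = [2, 3, 7, 11, 15, 16, 18]
end_times = [3, 10, 20]
print(assign_groups(times, end_times))
# [0, 0, 1, 2, 2, 2, 2]
```

One difference worth keeping in mind: with pd.cut and a first bin edge of 0, a time of exactly 0 falls outside the first right-closed interval unless include_lowest=True is passed, whereas the bisect version puts it in group 0.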

How do I split a dataframe by a list of numbers of rows for each chunk?

I have a dataframe which I want to cut up according to the elements in a list. For example, I have range_list = [16, 14, 2, ...]
I then want to cut up the dataframe so that the first chunk will be 16 rows long, the second 14, the third 2, and so on. It could be beneficial to put the chunks in a list as well.
Use numpy.split. It takes a list of indices at which to split, so you need the cumulative sum of your range list.
indices = np.cumsum(range_list, dtype=np.int32)
np.split(df, indices)
Example
range_list = [16, 14, 2]
np.random.seed(0)
df = pd.DataFrame(np.random.randn(sum(range_list), 2))
indices = np.cumsum(range_list, dtype=np.int32)
np.split(df, indices)
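This returns a list of DataFrames of shapes (16, 2), (14, 2) and (2, 2), plus a trailing empty DataFrame: the last cumulative index equals the number of rows, so nothing is left after it. Pass indices[:-1] to np.split to get exactly three chunks.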
[ 0 1
0 1.060679 1.092185
1 -0.043971 -1.394001
2 1.106233 -0.711420
3 -0.585148 0.179987
4 -0.871562 0.730840
5 0.810119 -0.130510
6 -0.957646 -0.324547
7 0.235788 -0.460025
8 -0.262714 -0.496833
9 0.454519 -1.244402
10 0.084796 1.587114
11 -0.353880 1.110543
12 -0.570345 0.774158
13 1.772536 1.283950
14 -1.682226 -0.376789
15 0.956894 0.081805, 0 1
16 0.014841 0.110091
17 -0.408881 0.260970
18 0.004939 0.940186
19 -2.056951 0.353928
20 0.618294 -2.201036
21 1.375224 0.526367
22 -0.424886 -1.253565
23 1.785862 0.774936
24 -0.341340 -1.056191
25 -0.274463 -1.637185
26 1.596336 2.311630
27 -0.479840 1.021640
28 -1.307765 -0.232664
29 0.243427 0.339242, 0 1
30 0.345476 0.331306
31 0.895437 -1.163441, Empty DataFrame
Columns: [0, 1]
Index: []]
I'm not sure if I understand you correctly.
If you just want to split the list you can do something like this:
def split_list(l, range_list):
    i = 0
    for x in range_list:
        start = i
        end = start + x
        print(l[start:end])
        i = end  # the next chunk starts where this one ended
You can create an array with the cumulative sum of the list elements, prepend a zero, then iterate over consecutive pairs of boundaries to slice the initial dataframe:
ls = [16,14,2, ..]
chunks = np.cumsum(ls)
c = np.zeros(len(chunks)+1, dtype=int)  # integer boundaries, starting at 0
c[1:] = chunks
all_dfs = []
for i in range(len(c)-1):
    all_dfs.append(df[c[i]:c[i+1]])

Python SQL Data to Array (like csv)

Currently I am trying to read my SQL data into an array in Python.
I am new to python so please be kind ;)
My csv-export can be read easily:
data = pd.read_csv('esoc_data.csv', header = None)
x = data[[1,2,3,4,5,6,7,8,9,10,11,12]]
This selects the columns labelled 1 to 12, i.e. the second through the thirteenth column (labels start at 0). I need the data in exactly this format!
Now I want to do the same with the data I get from my SQL-fetch.
names = [i for i in cursor.fetchall()]
This one gives me my data with all (0-12) columns and separated by ","
Result:
[('name#mail.com', 13, 13, 0, 24, 2, 0, 20, 3, 0, 31, 12, 2), (...)]
Now .. how do I get this into my "specific" format I mentioned before?
I just need the numbers like this:
    1   2  3   4   5  6   7   8  9  10  11  12
0  13  13  0  24   2  0  20   3  0  31  12   2
1  21   0  0  24   0  0  32   0  0  30   0   0
2   9   7  0  26  31  0  19  27  0  30  32   2
I'm sorry if this is peanuts for you.
You can run a nested loop for this, something like
def our_method():
    parent_list = list()
    for name in names:
        child_list = list()
        for index, item in enumerate(name):
            # skip the first field (the e-mail address)
            if index != 0:
                child_list.append(item)
        parent_list.append(child_list)
    return parent_list
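The nested loop above can also be written as a slice inside a list comprehension. This is a minimal sketch under assumptions: rows stands in for the result of cursor.fetchall(), the first tuple is the one shown in the question, and the second e-mail address is a made-up placeholder.

```python
# Stand-in for cursor.fetchall(); the second address is a placeholder.
rows = [
    ("name#mail.com", 13, 13, 0, 24, 2, 0, 20, 3, 0, 31, 12, 2),
    ("other#mail.com", 21, 0, 0, 24, 0, 0, 32, 0, 0, 30, 0, 0),
]

# row[1:] drops the first field of each tuple and keeps the 12 numbers
numbers = [list(row[1:]) for row in rows]
print(numbers[0])
# [13, 13, 0, 24, 2, 0, 20, 3, 0, 31, 12, 2]
```

Passing numbers to pd.DataFrame(numbers, columns=range(1, 13)) should then give a frame with the column labels 1 to 12, like the one read from the CSV file.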

input a none-regular matrix in python

link: https://cw.felk.cvut.cz/courses/a4b33alg/task.php?task=pary_py&idu=2341
I want to input the matrix split by space by using:
def neighbour_pair(l):
    matrix = [[int(row) for row in input().split()] for i in range(l)]
but the program told me
TypeError: 'str' object cannot be interpreted as an integer
It seems the .split() didn't work but I don't know why.
here is an example of the input matrix:
13 5
7 50 0 0 1
2 70 10 11 0
4 30 9 0 0
6 70 0 0 0
1 90 8 12 0
9 90 0 2 1
13 90 0 6 0
5 30 4 3 0
12 80 0 0 1
10 50 0 0 1
11 50 0 0 0
3 80 1 13 0
8 70 7 0 1
The input is a binary tree with N nodes, the nodes are labeled by numbers 1 to N in random order, each label is unique. Each node contains an integer key in the range from 0 to (2^31)−1.
The first line of input contains two integers N and R separated by space. N is the number of nodes in the tree, R is the label of the tree root.
Next, there are N lines. Each line describes one node and the order of the nodes is arbitrary. A node is specified by five integer values. The first value is the node label, the second value is the node key, the third and the fourth values represent the labels of the left and right child respectively, and the fifth value represents the node color, white is 0, black is 1. If any of the children does not exist there is value 0 instead of the child label at the corresponding place. The values on the line are separated by a space.
This is the range() complaining that your l variable is a string:
>>> range('1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an integer
I suspect you are reading l from standard input as well, so convert it to an integer:
l = int(input())
matrix = [[int(row) for row in input().split()] for i in range(l)]
I agree with @alecxe. The error refers to the string being passed as l to range(l). If I put an int in range() it works: entering 3 followed by three rows of input gives the output below.
>>> l = input() # define the number of rows expected the input matrix
>>> [[int(row) for row in input().split()] for i in range(int(l))]
13 5
7 50 0 0 1
2 70 10 11 0
output
[[13, 5], [7, 50, 0, 0, 1], [2, 70, 10, 11, 0]]
Implemented as a method, per the OP request in the comments below:
def neighbour_pair():
    l = input()
    return [[int(row) for row in input().split()] for i in range(int(l))]
print( neighbour_pair() )
# input
# 3
# 13 5
# 7 50 0 0 1
# 2 70 10 11 0
# output
[[13, 5], [7, 50, 0, 0, 1], [2, 70, 10, 11, 0]]
Still nothing wrong with this implementation...
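According to the task description quoted in the question, the first input line already holds N and R, so the row count can be taken from that line instead of being entered separately. The following is a minimal sketch under that reading; parse_tree is a hypothetical helper name, and the three-node sample is made up for illustration.

```python
def parse_tree(lines):
    """Return (n, root, nodes) from an iterable of text lines."""
    it = iter(lines)
    # first line: number of nodes N and label of the root R
    n, root = (int(tok) for tok in next(it).split())
    # next N lines: label, key, left child, right child, colour
    nodes = [[int(tok) for tok in next(it).split()] for _ in range(n)]
    return n, root, nodes


sample = ["3 2", "2 70 1 3 0", "1 90 0 0 1", "3 80 0 0 0"]
n, root, nodes = parse_tree(sample)
print(n, root, nodes)
# 3 2 [[2, 70, 1, 3, 0], [1, 90, 0, 0, 1], [3, 80, 0, 0, 0]]
```

To read from the console instead of a list, sys.stdin can be passed as lines, since iterating over it yields one line at a time.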

formatting output with for loop or different method in python

I wrote a program to calculate inventory in Python; however, I have a problem formatting the output layout. What I have done so far is:
def summary(a,b,c,row,col,tot):
    d={0:"Small", 1:"Medium", 2:"Large", 3:"Xlarge"}
    for i in range(row):
        for j in range(col):
            print "%6d" %(a[i][j]),
        print "%s%6d\n" %(d[i],(b[i])),
    print "\n" ,
    for j in range(col):
        print "%6d" %(c[j]),
    print "%6d\n" %tot
The output is the matrix (4 rows, 7 columns) with the row totals on the right-hand side and the column totals underneath. However, I want to put names on the left-hand side to label each row (size small, etc.), so I used a dictionary, but the names end up on the right-hand side just before the row total. I can't figure out how to put them on the left, in the same row as the numbers. I want two columns to the left of the matrix: the far-left one holding the word "size", vertically centred, and the second one holding the names from the dictionary, followed by the numbers in the same row.
Thanks a lot for any help or suggestions.
I want it to look like this
      small     1  1  1  1  1  1  1    7
      medium    1  1  1  1  1  1  1    7
size  large     1  1  1  1  1  1  1    7
      xlarge    1  1  1  1  1  1  1    7
                4  4  4  4  4  4  4   28
and i get
1  1  1  1  1  1  1  small     7
1  1  1  1  1  1  1  medium    7
1  1  1  1  1  1  1  large     7
1  1  1  1  1  1  1  xlarge    7
4  4  4  4  4  4  4           28
sorry for not being specific enough previously.
Just print it before the row:
def summary(a,b,c,row,col,tot):
    d={0:"Small", 1:"Medium", 2:"Large", 3:"Xlarge"}
    for i in range(row):
        print d[i].ljust(6),
        for j in range(col):
            print "%6d" %(a[i][j]),
        print "%6d\n" %(b[i]),
    print "\n" ,
    for j in range(col):
        print "%6d" %(c[j]),
    print "%6d\n" %tot
This assumes you want the first column left justified. Right justification (rjust()) and centering (center()) are also available.
Also, since you're just using contiguous numeric indices, you can just use a list instead of a dictionary.
As a side note, more descriptive variable names are never a bad thing. Also, the format() method is generally preferred over % formatting in new programs.
You just have to move the "%s" and the appropriate variable to the correct position:
def summary(a,b,c,row,col,tot):
    d={0:"Small", 1:"Medium", 2:"Large", 3:"Xlarge"}
    for i in range(row):
        print "%8s" % d[i],
        for j in range(col):
            print "%6d" %(a[i][j]),
        print "%6d\n" % ((b[i])),
    print "\n" ,
    print "%8s" % " ",
    for j in range(col):
        print "%6d" %(c[j]),
    print "%6d\n" %tot
When calling this with (note that these are just test numbers; replace them with the real ones):
summary([[1, 2, 3, 4, 5, 6, 7],
[1, 2, 3, 4, 5, 6, 7],
[1, 2, 3, 4, 5, 6, 7],
[1, 2, 3, 4, 5, 6, 7]], [12, 13, 14, 15],
[22, 23, 24, 25, 26, 27, 28], 4, 7, 7777)
you get something like:
   Small      1      2      3      4      5      6      7     12
  Medium      1      2      3      4      5      6      7     13
   Large      1      2      3      4      5      6      7     14
  Xlarge      1      2      3      4      5      6      7     15
             22     23     24     25     26     27     28   7777
If you want the names left adjusted, you have to add a '-' before the format description like:
print "%-8s" % d[i],
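The answers above use Python 2 print statements. For readers on Python 3, where print is a function, the same layout can be sketched with str.format. This is only an illustration under assumptions: summary_lines and its parameter names are hypothetical and do not come from the answers, and the function returns the lines instead of printing them so the result is easy to inspect.

```python
def summary_lines(matrix, row_totals, col_totals, grand_total, labels):
    """Build the report: label, row values, row total; then the column totals."""
    lines = []
    for label, row, total in zip(labels, matrix, row_totals):
        cells = "".join("{:6d}".format(value) for value in row)
        # {:<8} left-justifies the label in an 8-character column
        lines.append("{:<8}{}{:6d}".format(label, cells, total))
    footer = "".join("{:6d}".format(value) for value in col_totals)
    # an empty label keeps the totals aligned under the matrix columns
    lines.append("{:<8}{}{:6d}".format("", footer, grand_total))
    return lines


out = summary_lines(
    [[1, 2], [3, 4]], [3, 7], [4, 6], 10, ["Small", "Medium"]
)
print("\n".join(out))
```

Using "{:>8}" instead of "{:<8}" would right-justify the labels, which corresponds to the "%8s" versus "%-8s" choice discussed above.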
