Split Dataframe from back to front - python

Does somebody know how to split from back to front?
When I do a split like
dfgeo['geo'].str.split(',', expand=True)
I get:
1,2,3,4,nan,nan,nan
but I want:
nan,nan,nan,4,3,2,1
Thanks, people! :)

If you're looking to reverse the column order, you can do this:
new_df = dfgeo['geo'].str.split(',', expand=True)
new_df[new_df.columns[::-1]]
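A quick runnable check of that idea (the two-row dfgeo here is sample data, not the asker's frame). Because str.split pads the short rows with None at the end, reversing the column order pushes those gaps to the front:

```python
import pandas as pd

# sample frame with rows of different lengths, as in the question
dfgeo = pd.DataFrame({'geo': ['1,2,3,4', '1,2,3,4,5,6,7']})

new_df = dfgeo['geo'].str.split(',', expand=True)
reversed_df = new_df[new_df.columns[::-1]]
print(reversed_df)
```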

Try this:
list(reversed(dfgeo['geo'].str.split(',',expand=True)))
Assuming your code returns a list!

Use iloc with ::-1 to swap the order of the columns:
dfgeo = pd.DataFrame({'geo': ['1,2,3,4', '1,2,3,4,5,6,7']})
print (dfgeo)
             geo
0        1,2,3,4
1  1,2,3,4,5,6,7
df = dfgeo['geo'].str.split(',',expand=True).iloc[:, ::-1]
#if necessary, set default column names (requires import numpy as np)
df.columns = np.arange(len(df.columns))
print (df)
      0     1     2  3  4  5  6
0  None  None  None  4  3  2  1
1     7     6     5  4  3  2  1


How to perform groupby and remove duplicate based on first occurrence of a column condition?

This problem is a bit hard for me to wrap my head around so I hope I can explain it properly below.
I have a data frame with a lot of rows but only 3 columns like below:
data = {'line_group': [1, 1, 8, 8, 4, 4, 5, 5],
        'route_order': [1, 2, 1, 2, 1, 2, 1, 2],
        'StartEnd': ['20888->20850', '20888->20850', '20888->20850', '20888->20850',
                     '20961->20960', '20961->20960', '20961->20960', '20961->20960']}
df = pd.DataFrame(data)
In the end, I want to use this data to plot routes between points, for instance 20888 to 20850. The problem is that a lot of trips/line_groups also go through these two points, so the plot would be overlapping and very slow, which is not what I want.
So I only want the first line_group that has each unique StartEnd, like in the data frame below.
I believe it has something to do with groupby, as in the code below that I have tried, but it doesn't produce the result I want. Also, in the full dataset, route orders aren't usually just from one point to another and can be much longer (e.g. 1, 2, 3, 4, ...).
df.drop_duplicates(subset='StartEnd', keep="first")
Group by StartEnd and keep only the first line_group value.
Then filter to the rows that contain those line groups:
unique_groups = df.groupby('StartEnd')['line_group'].agg(lambda x: list(x)[0]).reset_index()
       StartEnd  line_group
0  20888->20850           1
1  20961->20960           4
unique_line_groups = unique_groups['line_group']
filtered_df = df[df['line_group'].isin(unique_line_groups)]
Final output:
line_group  route_order      StartEnd
         1            1  20888->20850
         1            2  20888->20850
         4            1  20961->20960
         4            2  20961->20960
You can add route_order to the subset argument to get the output you want.
In [8]: df.drop_duplicates(subset=['StartEnd', 'route_order'], keep='first')
Out[8]:
   line_group  route_order      StartEnd
0           1            1  20888->20850
1           1            2  20888->20850
4           4            1  20961->20960
5           4            2  20961->20960
You can use groupby.first():
df.groupby(["route_order", "StartEnd"], as_index=False).first()
Output:
   route_order      StartEnd  line_group
0            1  20888->20850           1
1            1  20961->20960           4
2            2  20888->20850           1
3            2  20961->20960           4
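For a runnable version, here is the sample data from the question with the drop_duplicates call applied (a sketch; only the four rows from the first line_group per StartEnd survive):

```python
import pandas as pd

data = {'line_group': [1, 1, 8, 8, 4, 4, 5, 5],
        'route_order': [1, 2, 1, 2, 1, 2, 1, 2],
        'StartEnd': ['20888->20850'] * 4 + ['20961->20960'] * 4}
df = pd.DataFrame(data)

# keep only the first row seen for each (StartEnd, route_order) pair
deduped = df.drop_duplicates(subset=['StartEnd', 'route_order'], keep='first')
print(deduped)
```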

changing index of 1 row in pandas

I have the below df built from a pivot of a larger df. In this table, 'week' is the index (dtype = object) and I need to show week 53 as the first row instead of the last.
Can someone advise, please? I tried reindex and custom sorting but can't find the way.
Thanks!
Here is the table:
Since you can't insert the row and push the others back directly, a clever trick you can use is to create a new order:
# adds a new column, "new" with the original order
df['new'] = range(1, len(df) + 1)
# sets value that has index 53 with 0 on the new column
# note that this comparison requires you to match index type
# so if weeks are object, you should compare df.index == '53'
df.loc[df.index == 53, 'new'] = 0
# sorts values by the new column and drops it
df = df.sort_values("new").drop('new', axis=1)
Before:
        numbers
weeks
1     181519.23
2      18507.58
3      11342.63
4       6064.06
53      4597.90
After:
        numbers
weeks
53      4597.90
1     181519.23
2      18507.58
3     11342.63
4      6064.06
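Run end-to-end on a table like the one above (a sketch; here the week index is integer, so the comparison uses 53 rather than '53'):

```python
import pandas as pd

df = pd.DataFrame({'numbers': [181519.23, 18507.58, 11342.63, 6064.06, 4597.90]},
                  index=pd.Index([1, 2, 3, 4, 53], name='weeks'))

df['new'] = range(1, len(df) + 1)   # original order
df.loc[df.index == 53, 'new'] = 0   # week 53 now sorts first
df = df.sort_values('new').drop('new', axis=1)
print(df)
```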
One way of doing this would be:
import pandas as pd
df = pd.DataFrame(range(10))
new_df = df.loc[[df.index[-1]]+list(df.index[:-1])].reset_index(drop=True)
Output:
   0
9  9
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
Alternate method (assuming the week numbers are in a column such as "Year week" rather than the index):
new_df = pd.concat([df[df["Year week"]==53], df[~(df["Year week"]==53)]])
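Another option, assuming the weeks really are the index: list the desired order explicitly and reindex (a sketch):

```python
import pandas as pd

df = pd.DataFrame({'numbers': [181519.23, 18507.58, 11342.63, 6064.06, 4597.90]},
                  index=pd.Index([1, 2, 3, 4, 53], name='weeks'))

# put 53 first, then keep the remaining weeks in their existing order
new_order = [53] + [w for w in df.index if w != 53]
df = df.reindex(new_order)
print(df)
```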

python panda new column with order of values

I would like to make a new column with the order of the numbers in a list. I get 3,1,0,4,2,5 (the indices of the lowest numbers), but I would like a new column with 2,1,4,0,3,5, so that if I look at a row I see what position its number takes in the whole list. What am I doing wrong?
df = pd.DataFrame({'list': [4,3,6,1,5,9]})
df['order'] = df.sort_values(by='list').index
print(df)
What you're looking for is the rank:
import pandas as pd
df = pd.DataFrame({'list': [4,3,6,1,5,9]})
df['order'] = df['list'].rank().sub(1).astype(int)
Result:
   list  order
0     4      2
1     3      1
2     6      4
3     1      0
4     5      3
5     9      5
You can use the method parameter to control how to resolve ties.
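For example, method='first' breaks ties by order of appearance (a sketch with a duplicated value; 'min' or 'dense' would rank the two 4s equally instead):

```python
import pandas as pd

df = pd.DataFrame({'list': [4, 3, 4, 1]})
# with method='first', the two 4s get distinct ranks in row order
df['order'] = df['list'].rank(method='first').sub(1).astype(int)
print(df)
```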

How can I rename NaN columns in python pandas?

Good day, everyone! I had trouble putting a nested dictionary into separate columns. I fixed that using concat and json_normalize. But for some reason the code I used removed all the column names and returned NaN as values for the columns...
Does someone know how to fix this?
Code I used:
import pandas as pd
from pandas import json_normalize
c = ['photo.photo_replace', 'photo.photo_remove', 'photo.photo_add', 'photo.photo_effect', 'photo.photo_brightness',
'photo.background_color', 'photo.photo_resize', 'photo.photo_rotate', 'photo.photo_mirror', 'photo.photo_layer_rearrange',
'photo.photo_move', 'text.text_remove', 'text.text_add', 'text.text_edit', 'text.font_select', 'text.text_color', 'text.text_style',
'text.background_color', 'text.text_align', 'text.text_resize', 'text.text_rotate', 'text.text_move', 'text.text_layer_rearrange']
df_edit = pd.concat([json_normalize(x)[c] for x in df['editables']], ignore_index=True)
df.columns = df.columns.str.split('.').str[1]
Current problem: (screenshot omitted)
Result I want: (screenshot omitted)
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [3, 3, 3]
})
print(df)
   A  B
0  1  3
1  2  3
2  3  3
c = ['new_name1', 'new_name2']
df.columns = c
print(df)
   new_name1  new_name2
0          1          3
1          2          3
2          3          3
Remember, the length of the column-name list (c) must equal the number of columns.
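An equivalent that returns a renamed copy instead of mutating the frame in place is set_axis (a sketch):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 3, 3]})
# axis=1 renames the columns; the original df is left untouched
renamed = df.set_axis(['new_name1', 'new_name2'], axis=1)
print(renamed)
```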

Obtaining the first few rows of a dataframe

Is there a way to get the first n rows of a dataframe without using the indices? For example, I know that if I have a dataframe called df I could get the first 5 rows via df.ix[:5]. But what if my indices are not ordered and I don't want to order them? This does not seem to work. Hence, I was wondering if there is another way to select the first couple of rows. I apologize if there is already an answer to this; I wasn't able to find one.
Use head(5) or iloc[:5]
In [7]:
df = pd.DataFrame(np.random.randn(10,3))
df
Out[7]:
          0         1         2
0 -1.230919  1.482451  0.221723
1 -0.302693 -1.650244  0.957594
2 -0.656565  0.548343  1.383227
3  0.348090 -0.721904 -1.396192
4  0.849480 -0.431355  0.501644
5  0.030110  0.951908 -0.788161
6  2.104805 -0.302218 -0.660225
7 -0.657953  0.423303  1.408165
8 -1.940009  0.476254 -0.014590
9 -0.753064 -1.083119 -0.901708
In [8]:
df.head(5)
Out[8]:
          0         1         2
0 -1.230919  1.482451  0.221723
1 -0.302693 -1.650244  0.957594
2 -0.656565  0.548343  1.383227
3  0.348090 -0.721904 -1.396192
4  0.849480 -0.431355  0.501644
In [11]:
df.iloc[:5]
Out[11]:
          0         1         2
0 -1.230919  1.482451  0.221723
1 -0.302693 -1.650244  0.957594
2 -0.656565  0.548343  1.383227
3  0.348090 -0.721904 -1.396192
4  0.849480 -0.431355  0.501644
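Both select by position, so an unordered index is no problem (a sketch with a deliberately scrambled index):

```python
import pandas as pd

# deliberately scrambled index; values are still in insertion order
df = pd.DataFrame({'x': range(10)}, index=[5, 3, 9, 0, 7, 1, 8, 2, 6, 4])

first_five = df.head(5)
print(first_five)
```

head(n) and iloc[:n] return the same rows here regardless of the index labels.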
