I have a dataframe called plus with the following data:
id  segment    segment_code
1   CNCE_PLUS  None
2   CNCE_PLUS  None
3   CNCE_PLUS  None
4   CNCE_PLUS  None
5   CNCE_PLUS  None
6   CNCE_PLUS  None
7   CNCE_PLUS  None
8   CNCE_PLUS  None
And this other dataframe called segments:
segment    segment_code  ammount
CNCE_PLUS  CNCE_PLUS_1   4
CNCE_PLUS  CNCE_PLUS_2   1
So the question is: how can I evenly distribute the rows from one dataframe using the other one? Keep in mind that it must prioritize the rows from segments that have the smallest value in the ammount column. If there are rows with equal values in ammount, it must distribute 1-to-1 until there are no more rows from plus.
So in the end, ammount should be CNCE_PLUS_1: 6 and CNCE_PLUS_2: 7.
This is my expected output for the final plus dataframe:
id  segment    segment_code
1   CNCE_PLUS  CNCE_PLUS_2
2   CNCE_PLUS  CNCE_PLUS_2
3   CNCE_PLUS  CNCE_PLUS_2
4   CNCE_PLUS  CNCE_PLUS_2
5   CNCE_PLUS  CNCE_PLUS_1
6   CNCE_PLUS  CNCE_PLUS_1
7   CNCE_PLUS  CNCE_PLUS_2
8   CNCE_PLUS  CNCE_PLUS_2
Hope you can help me. I've been trying to solve this for a long time, so I'm getting kind of desperate.
Regards.
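A minimal greedy sketch of one way to do this: keep a running total per segment_code, seeded from ammount, and always assign the next row to the code with the smallest total. It assumes ties on ammount may be broken by the order the codes appear in segments, so the exact assignment at tie points differs from the sample output (the totals come out 7 and 6 instead of 6 and 7, but stay evenly balanced):
import pandas as pd

plus = pd.DataFrame({'id': range(1, 9),
                     'segment': 'CNCE_PLUS',
                     'segment_code': None})
segments = pd.DataFrame({'segment': ['CNCE_PLUS', 'CNCE_PLUS'],
                         'segment_code': ['CNCE_PLUS_1', 'CNCE_PLUS_2'],
                         'ammount': [4, 1]})

# running total per segment_code, seeded from the ammount column
counts = segments.set_index('segment_code')['ammount'].to_dict()
codes = []
for _ in range(len(plus)):
    # always feed the code that currently has the smallest total;
    # on a tie, min() picks the code that appears first in segments
    code = min(counts, key=counts.get)
    counts[code] += 1
    codes.append(code)
plus['segment_code'] = codes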
Related
I am trying to merge two different tables into one table.
The first table is a Pandas data frame that contains information about the period from 2000 until 2005, i.e. six observations:
time_horizon=pd.DataFrame(range(2000,2005+1))
Now I want to concatenate the text 'WT' with the previous time_horizon:
time_horizon+str('WT')
After this, the next step should be to add specific values for these observations:
values=pd.DataFrame(range(1,7))
In the end, I need a data frame like the one shown in the picture below.
The second step, the concatenation, does not work for me, so I can't implement the third step and build this table.
So can anybody help me make this table?
Here is a solution to the second step that failed for you:
str('WT')+(time_horizon).astype(str)
0
0 WT2000
1 WT2001
2 WT2002
3 WT2003
4 WT2004
5 WT2005
One way to solve it is:
import pandas as pd

# create a df with six rows and year columns prefixed with 'WT'
df = pd.DataFrame(index=range(6), columns=range(2000, 2005 + 1)).add_prefix('WT')
# fill the first column with the range of values
df.iloc[:, 0] = list(range(1, 7))
# forward fill across the rows (the last expression displays the result)
df.ffill(axis=1)
WT2000 WT2001 WT2002 WT2003 WT2004 WT2005
0 1 1 1 1 1 1
1 2 2 2 2 2 2
2 3 3 3 3 3 3
3 4 4 4 4 4 4
4 5 5 5 5 5 5
5 6 6 6 6 6 6
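For what it's worth, with pandas imported as pd, an equivalent one-liner builds each year's column directly (every column holds the same 1-6 range, matching the table above):
pd.DataFrame({f'WT{year}': range(1, 7) for year in range(2000, 2005 + 1)})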
I'm using this dataframe:
Callsign Distance AirportSize
0 HVN19 3.727263 2
1 HVN19 3.727263 1
2 HVN19 11.485452 2
3 CCA839 2.094717 2
4 CCA839 2.094717 1
5 CCA839 6.622537 2
6 CES219 1.751279 1
7 CES219 5.436940 4
8 CES219 6.950773 4
9 ETH704 2.976954 4
10 ETH704 3.844980 4
11 ETH704 5.452634 4
For each Callsign, I'm trying to pick the row with the smallest AirportSize, but only among rows whose Distance is within 1 of the closest airport for that Callsign. I would also like to keep only those rows and drop the other rows within the same Callsign.
I will give an example using the first three lines within the same Callsign (HVN19) to make this clearer: we can immediately drop the 3rd row because of the Distance difference between the third row and the first two lines (11.485452 - 3.727263 > 1). Of the remaining two rows, we choose the second row because its AirportSize is smaller than the first row's (1 < 2).
The result should look like this:
Callsign Distance AirportSize
1 HVN19 3.727263 1
2 CCA839 2.094717 1
3 CES219 1.751279 1
4 ETH704 2.976954 4
With the requirements clarified in the comments, here is a working solution that can be run and returns the expected result.
It should be quite efficient as well.
import pandas as pd
import io
df_txt = """
Id Calls Distance AirportSize
0 HVN19 3.727263 2
1 HVN19 3.727263 1
2 HVN19 11.485452 2
3 CCA839 2.094717 2
4 CCA839 2.094717 1
5 CCA839 6.622537 2
6 CES219 1.751279 1
7 CES219 5.436940 4
8 CES219 6.950773 4
9 ETH704 2.976954 4
10 ETH704 3.844980 4
11 ETH704 5.452634 4
"""
df1 = pd.read_fwf(io.StringIO(df_txt)).set_index('Id').rename(columns = {'Calls':'Callsign'})
# need to sort to ensure the following always works
df1 = df1.sort_values(['Callsign', 'Distance'])
# gap to the previous airport within each callsign; fillna(0) for the closest airport
df1['dist_diff'] = df1.groupby('Callsign')['Distance'].diff().fillna(0)
# cumulative sum turns the gaps into each row's distance from the closest airport
df1['dist_diff'] = df1.groupby('Callsign')['dist_diff'].cumsum()
# keep only the smallest qualifying airport for each callsign, tiebreak with Distance
df1[df1['dist_diff']<1].sort_values(['Callsign','AirportSize','Distance']).drop_duplicates('Callsign').drop('dist_diff',axis=1)
Output:
Callsign Distance AirportSize
Id
4 CCA839 2.094717 1
6 CES219 1.751279 1
9 ETH704 2.976954 4
1 HVN19 3.727263 1
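For what it's worth, the dist_diff construction above (diff then cumsum) is equivalent to comparing each Distance with its group's minimum directly, which may read more plainly:
df1['dist_diff'] = df1['Distance'] - df1.groupby('Callsign')['Distance'].transform('min')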
How do I compare the values of two columns from a data frame and skip the rows where there is no match between the two columns because the values are not at the same index position? I have tried several methods, but none has worked so far.
I want to match my second data frame to the first data frame where the values are the same, i.e. the values of the Text and real-text columns; when they are not the same, it should ignore the unmatched rows, as in the last data frame I show below.
I have a data frame that looks like this:
Text occurrence
0 my 4
1 name 6
2 is 7
3 very 3
4 popular 1
5 last 6
6 in 4
7 the 2
8 country 2
and another dataframe that looks like this:
real-text
0 my
1 name
2 is
3 very
4 popular
5 in
6 the
7 country
And now I want to merge the two where they actually match up and ignore any rows where there is no match.
This is what I have gotten so far, but it is not the result I wanted:
Text real-text occurrence
0 my my 4
1 name name 6
2 is is 7
3 very very 3
4 popular popular 1
5 last in 6
6 in the 4
7 the country 2
8 country NaN 1
This is the result I'm expecting:
Text real-text occurrence
0 my my 4
1 name name 6
2 is is 7
3 very very 3
4 popular popular 1
6 in in 4
7 the the 2
8 country country 1
If you look at the expected data frame, it doesn't have index position 5, where there is no match between the two data frames.
Thanks in advance, as I'm still new to Python.
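A minimal sketch of one way to do this with an inner merge (the column names follow the frames shown above; note that merge renumbers the result, so if you want to preserve the original index you can filter with isin instead):
import pandas as pd

df1 = pd.DataFrame({'Text': ['my', 'name', 'is', 'very', 'popular',
                             'last', 'in', 'the', 'country'],
                    'occurrence': [4, 6, 7, 3, 1, 6, 4, 2, 2]})
df2 = pd.DataFrame({'real-text': ['my', 'name', 'is', 'very', 'popular',
                                  'in', 'the', 'country']})

# inner join keeps only rows whose Text value also appears in real-text
merged = df1.merge(df2, left_on='Text', right_on='real-text')

# alternative that preserves df1's original index labels
kept = df1[df1['Text'].isin(df2['real-text'])]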
I have an undirected network of connections in a dataframe.
Source_ID Target_ID
0 1 5
1 7 2
2 12 6
3 3 9
4 16 11
5 2 7 <------The same as row 1
6 4 8
7 5 1 <------The same as row 0
8 99 81
But since this is an undirected network, row 0 and row 7 are technically the same, as are row 1 and row 5. df.drop_duplicates() isn't smart enough to eliminate these as duplicates, since it sees them as two distinct rows, at least as far as my attempts have shown.
I also tried what I thought should work, which is using the values of Source_ID and Target_ID and requiring Source_ID to be lower than Target_ID. But that didn't seem to produce the results I needed either.
df.drop(df.loc[df['Target_ID'] < df['Source_ID']]
        .index.tolist(), inplace=True)
Therefore, I need to figure out a way to drop the duplicate connections (while keeping the first) such that my fixed dataframe looks like this (after an index reset):
Source_ID Target_ID
0 1 5
1 7 2
2 12 6
3 3 9
4 16 11
5 4 8
6 99 81
One way that does the job efficiently is to canonicalize each pair so the smaller ID comes first, then drop duplicates, which keeps the first occurrence:
import numpy as np

# sort each (Source_ID, Target_ID) pair so the direction no longer matters
key = pd.DataFrame(np.sort(df[['Source_ID', 'Target_ID']].values, axis=1), index=df.index)
df = df[~key.duplicated()].reset_index(drop=True)
I have been struggling to merge data frames. I need the rows arranged by time, with both data sets' columns merged into a new data frame. I'm sorry if this is clearly documented somewhere, but it has been hard for me to find an appropriate method. I tried append and merge, but I am struggling to find an appropriate solution.
dataframe1:
# Date Time, GMT-07:00 Crossflow (Cold) (Volts) \
0 1 8:51:00 AM 1.13431
1 2 8:51:01 AM 1.12821
2 3 8:51:02 AM 1.12943
3 4 8:51:03 AM 1.12759
4 5 8:51:04 AM 1.13065
5 6 8:51:05 AM 1.12821
6 7 8:51:06 AM 1.12943
7 8 8:51:07 AM 1.13065
8 9 8:51:08 AM 1.13126
9 10 8:51:09 AM 1.13126
10 11 8:51:10 AM 1.12821
dataframe2:
# Date Time, GMT-07:00 \
0 1 9:06:39 AM
1 2 9:06:40 AM
2 3 9:06:41 AM
3 4 9:06:42 AM
4 5 9:06:43 AM
5 6 9:06:44 AM
6 7 9:06:45 AM
7 8 9:06:46 AM
8 9 9:06:47 AM
9 10 9:06:48 AM
10 11 9:06:49 AM
K-Type, °C (LGR S/N: 10118625, SEN S/N: 10118625)
0 43.96
1 47.25
2 48.90
3 50.21
4 43.63
5 43.63
6 42.97
7 42.97
8 42.30
9 41.64
10 40.98
It appears that you want to append the dataframes to each other. Make sure that your date column has the same name in both dataframes; otherwise pandas will treat them as two totally separate columns. DataFrame.append is deprecated in recent pandas, so use pd.concat:
df = pd.concat([dataframe1, dataframe2], ignore_index=True)
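Since you also need the rows ordered by time, you can sort afterwards. A small sketch, assuming the shared timestamp column is the one printed as 'Date Time, GMT-07:00' above:
# parse the time strings so they sort chronologically rather than alphabetically
df['Date Time, GMT-07:00'] = pd.to_datetime(df['Date Time, GMT-07:00'], format='%I:%M:%S %p')
df = df.sort_values('Date Time, GMT-07:00', ignore_index=True)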