Merging tables and concatenating strings in a specific table - python

I am trying to merge two different tables into one table.
The first table is a Pandas DataFrame that contains the years 2000 through 2005, i.e. six observations:
time_horizon = pd.DataFrame(range(2000, 2005 + 1))
Now I want to concatenate the text 'WT' with the previous time_horizon:
time_horizon + str('WT')
After this, the next step should be to add specific values for these observations:
values = pd.DataFrame(range(1, 7))
In the end, I need a data frame like the one shown in the picture below.
The second step, the concatenation, does not work for me, so I can't implement the third step and build this table.
Can anybody help me make this table?

Here is a solution to the second step that failed for you:
'WT' + time_horizon.astype(str)
        0
0  WT2000
1  WT2001
2  WT2002
3  WT2003
4  WT2004
5  WT2005
One way to solve it is:
# create a df with the WT-prefixed year columns and six empty rows
df = pd.DataFrame(index=range(6), columns=range(2000, 2005 + 1)).add_prefix('WT')
# fill the first column with the range of values
df.iloc[:, 0] = range(1, 7)
# forward fill across the rows so every year column gets the same value
df.ffill(axis=1)
  WT2000 WT2001 WT2002 WT2003 WT2004 WT2005
0      1      1      1      1      1      1
1      2      2      2      2      2      2
2      3      3      3      3      3      3
3      4      4      4      4      4      4
4      5      5      5      5      5      5
5      6      6      6      6      6      6
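Alternatively, since every year column should hold the same 1-6 values, a minimal sketch that builds the whole frame in one constructor call (assuming the same 2000-2005 range as above):
import pandas as pd

# broadcast the same list of values into each WT-prefixed year column
df = pd.DataFrame({f'WT{year}': list(range(1, 7)) for year in range(2000, 2005 + 1)})
print(df)
This produces the same table as the ffill approach above.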

Related

Sliding minimum value in a pandas column

I am working with a pandas dataframe where I have the following two columns: "personID" and "points". I would like to create a third variable ("localMin") which will store the minimum value of the column "points" at each point in the dataframe as compared with all previous values in the "points" column for each personID (see image below).
Does anyone have an idea how to achieve this most efficiently? I have approached this problem using shift() with different period sizes, but of course, shift is sensitive to variations in the sequence and doesn't always produce the output I would expect.
Thank you in advance!
Use groupby.cummin:
df['localMin'] = df.groupby('personID')['points'].cummin()
Example:
df = pd.DataFrame({'personID': list('AAAAAABBBBBB'),
                   'points': [3, 4, 2, 6, 1, 2, 4, 3, 1, 2, 6, 1]})
df['localMin'] = df.groupby('personID')['points'].cummin()
output:
   personID  points  localMin
0         A       3         3
1         A       4         3
2         A       2         2
3         A       6         2
4         A       1         1
5         A       2         1
6         B       4         4
7         B       3         3
8         B       1         1
9         B       2         1
10        B       6         1
11        B       1         1
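For reference, a sketch of an equivalent but more explicit formulation, using an expanding-window minimum within each group (cummin remains the simpler and faster choice):
# expanding().min() recomputes the running minimum per personID;
# reset_index drops the group level so the result aligns with df
df['localMin'] = (df.groupby('personID')['points']
                    .expanding().min()
                    .reset_index(level=0, drop=True))
Note this returns floats, whereas cummin preserves the integer dtype.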

How do I transpose columns into rows of a Pandas DataFrame?

My current data frame comprises 10 rows and thousands of columns. The setup currently looks similar to this:
A  B  A  B
1  2  3  4
5  6  7  8
But I desire something more like below, where essentially I would transpose the columns into rows once the headers start repeating themselves.
A  B
1  2
5  6
3  4
7  8
I've been trying df.reshape but I can't seem to get the syntax right. Any suggestions on how best to transpose the data like this?
I'd probably go for stacking, grouping and then building a new DataFrame from scratch, e.g.:
pd.DataFrame({col: vals for col, vals in df.stack().groupby(level=1).agg(list).items()})
That'll also give you:
   A  B
0  1  2
1  3  4
2  5  6
3  7  8
Try with stack, groupby and pivot:
stacked = df.T.stack().to_frame().assign(idx=df.T.stack().groupby(level=0).cumcount()).reset_index()
output = stacked.pivot(index="idx", columns="level_0", values=0).rename_axis(None, axis=1).rename_axis(None, axis=0)
>>> output
   A  B
0  1  2
1  5  6
2  3  4
3  7  8
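Since the question mentions df.reshape: DataFrames have no reshape method, but the same idea works on the underlying NumPy array. A sketch, assuming the headers repeat in fixed-width blocks of 2 columns:
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=['A', 'B', 'A', 'B'])
block = 2  # number of unique headers before they start repeating
# reshape row-wise so each repeated block becomes its own set of rows
reshaped = pd.DataFrame(df.to_numpy().reshape(-1, block),
                        columns=df.columns[:block])
print(reshaped)
This gives the row order of the first answer (1 2, 3 4, 5 6, 7 8).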

Is there an efficient way to categorise rows of sequentially increasing data into groups in a pandas data frame

I have a dataset that looks roughly like this (the first column being the index):
   measurement     value
0            1  0.617350
1            2  0.394176
2            3  0.775822
3            1  0.811693
4            2  0.621867
5            3  0.743718
6            4  0.183111
7            1  0.118586
8            2  0.274038
9            3  0.871772
My values in the second column are sequentially increasing measurement parameters; the test cycles through these parameters, taking a reading at each step, before resetting and starting again from the beginning.
The challenge I face is that I need to label each cycle as a group in a fourth column.
   measurement     value  group
0            1  0.617350      1
1            2  0.394176      1
2            3  0.775822      1
3            1  0.811693      2
4            2  0.621867      2
5            3  0.743718      2
6            4  0.183111      2
7            1  0.118586      3
8            2  0.274038      3
9            3  0.871772      3
The only solution I can think of is two nested for loops: the first finding the start of each measurement condition, the second counting to the end of that condition, then labelling the group. This doesn't seem very efficient, though; I wondered if there was a better way?
If each measurement cycle starts with 1, compare the values to 1 and take the cumulative sum:
df['group'] = df['measurement'].eq(1).cumsum()
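A quick check on the sample data: eq(1) flags the start of each cycle, and cumsum turns those flags into running group labels.
import pandas as pd

df = pd.DataFrame({'measurement': [1, 2, 3, 1, 2, 3, 4, 1, 2, 3],
                   'value': [0.617350, 0.394176, 0.775822, 0.811693,
                             0.621867, 0.743718, 0.183111, 0.118586,
                             0.274038, 0.871772]})
# True at every 1, i.e. at each reset; cumsum counts resets seen so far
df['group'] = df['measurement'].eq(1).cumsum()
print(df)
If a cycle could start with a value other than 1 (an assumption beyond the question), a drop in measurement could mark the reset instead: df['measurement'].diff().lt(0).cumsum() + 1.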

How to compare the values of two data frames and ignore the irrelevant rows

How do I compare the values of two columns from two data frames and skip the rows where there is no match because the values are not in the same index position or row? I have tried several methods, but none has worked so far.
I want to match my second data frame to the first data frame where the values are the same, i.e. the values of the Text and real-text columns; when they are not the same, it should ignore the unmatched row, as in the last data frame below.
I have a data frame that looks like this:
      Text  occurrence
0       my           4
1     name           6
2       is           7
3     very           3
4  popular           1
5     last           6
6       in           4
7      the           2
8  country           2
and another dataframe that looks like this:
  real-text
0        my
1      name
2        is
3      very
4   popular
5        in
6       the
7   country
Now I want to merge the two where they actually match up and ignore any rows where there is no match.
This is what I have gotten so far, but it is not the result I wanted:
      Text real-text  occurrence
0       my        my           4
1     name      name           6
2       is        is           7
3     very      very           3
4  popular   popular           1
5     last        in           6
6       in       the           4
7      the   country           2
8  country       NaN           1
This is the result I'm expecting:
      Text real-text  occurrence
0       my        my           4
1     name      name           6
2       is        is           7
3     very      very           3
4  popular   popular           1
6       in        in           4
7      the       the           2
8  country   country           1
If you look at the expected data frame, it doesn't have index position 5, where there is no match between the two data frames.
Thanks in advance, as I'm still new to Python.
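As a minimal sketch, an inner merge on the two text columns drops the rows with no match (data copied from the question):
import pandas as pd

df1 = pd.DataFrame({'Text': ['my', 'name', 'is', 'very', 'popular',
                             'last', 'in', 'the', 'country'],
                    'occurrence': [4, 6, 7, 3, 1, 6, 4, 2, 2]})
df2 = pd.DataFrame({'real-text': ['my', 'name', 'is', 'very', 'popular',
                                  'in', 'the', 'country']})

# how='inner' keeps only rows whose text appears in both frames
merged = df1.merge(df2, left_on='Text', right_on='real-text', how='inner')
print(merged)
Note that merge resets the index; if the original index positions must survive, as in the expected output, df1[df1['Text'].isin(df2['real-text'])] keeps them.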

Summarize dataframe by extracting and grouping column with pandas

I would like to summarize a column from a CSV file: pretty much extract the column data and match it up with the relevant ratings and counts.
Also, any idea how I should match the expected dataframe with the website image?
  website  rate
1     two     5
2     two     3
3     two     5
4     one     2
5     one     4
6     one     4
7     one     2
8     one     2
9     two     2
website  rate(over 5)  count  appeal(rate over 5 / count >= 0.5)
    one             0      5                                    0
    two             2      4                                    1
You can use a groupby operation:
res = df.assign(rate_over_5=df['rate'].ge(5)) \
        .groupby('website').agg({'rate_over_5': ['sum', 'size']}) \
        .xs('rate_over_5', axis=1).reset_index()
res['appeal'] = ((res['sum'] / res['size']) >= 0.5).astype(int)
print(res)
  website  sum  size  appeal
0     one  0.0     5       0
1     two  2.0     4       1
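An alternative sketch using named aggregation, which avoids the MultiIndex columns and the xs step (the column names rate_over_5 and count are my own choice):
import pandas as pd

df = pd.DataFrame({'website': ['two', 'two', 'two', 'one', 'one',
                               'one', 'one', 'one', 'two'],
                   'rate': [5, 3, 5, 2, 4, 4, 2, 2, 2]})

res = (df.assign(rate_over_5=df['rate'].ge(5))
         .groupby('website')
         .agg(rate_over_5=('rate_over_5', 'sum'),  # ratings of 5 or more
              count=('rate', 'size'))              # rows per website
         .reset_index())
res['appeal'] = (res['rate_over_5'] / res['count'] >= 0.5).astype(int)
print(res)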
