Convert data returned by a function into a data frame - python

I have run a function in Python to get output elevations, using the code given below, and I would like to convert the results into a data frame.
depth()  # function
Results:
0      49    2
1      50    2.5
2      52    3
3      53    3.5
4      54    4
...
100    102   9
I am having trouble turning these results into a data frame. I used the following code, but it didn't work.
df = pd.DataFrame(columns = ['id', 'Z', 'water level'])
df = df.apply(water_depth())
print(df)

IIUC, you can try:
import pandas as pd
from io import StringIO
data = StringIO("""0 49 2
1 50 2.5
2 52 3
3 53 3.5
4 54 4
100 102 9
""")
df = pd.read_csv(data, sep=' ', header=None)
df.columns = ['id', 'Z', 'water level']
print(df)
Output:
    id    Z  water level
0    0   49          2.0
1    1   50          2.5
2    2   52          3.0
3    3   53          3.5
4    4   54          4.0
5  100  102          9.0
When you have the data saved in a file (e.g. file.csv), you can replace data with 'file.csv'.
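For example (a sketch, assuming the file uses the same space-separated layout and the hypothetical name file.csv):
import pandas as pd

# read the space-separated file directly into named columns
df = pd.read_csv('file.csv', sep=' ', header=None,
                 names=['id', 'Z', 'water level'])
print(df)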

The way I'd do it:
dictionary = {
    'water_level': [x/10 for x in range(20, 91, 5)],  # example
    'Z': [],   # generate values of Z
    'id': []   # generate values of id
}
If you have some special function to generate the values, just create three lists and then create the dictionary.
After that:
pd.DataFrame(dictionary)
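For instance, if water_depth() returns rows of (id, Z, water level) values - this is an assumption, since the question doesn't show the function body - a minimal sketch would be:
import pandas as pd

ids, zs, levels = [], [], []
for row in water_depth():   # assumed: each row is (id, Z, water level)
    ids.append(row[0])
    zs.append(row[1])
    levels.append(row[2])

dictionary = {'id': ids, 'Z': zs, 'water level': levels}
df = pd.DataFrame(dictionary)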

Related

Generating a new variable based on the values of other variables

I have the following data set
import pandas as pd
df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
"TP1": [1,2,3,4,5,9,8,7,6,5],
"TP2": [11,22,32,43,53,94,85,76,66,58],
"TP10": [114,222,324,443,535,94,385,76,266,548],
"count": [1,2,3,4,10,1,2,3,4,10]})
print (df)
I want a "Final" variable in the df that is based on the ID, TP and count variables.
The final result will look like the following.
import pandas as pd
import numpy as np
df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2], "TP1": [1,2,3,4,5,9,8,7,6,5],
"TP2": [11,22,32,43,53,94,85,76,66,58], "TP10": [114,222,324,443,535,94,385,76,266,548],
"count": [1,2,3,4,10,1,2,3,4,10],
"final" : [1,22,np.nan,np.nan,535,9,85,np.nan,np.nan,548]})
print (df)
So, for example, the loop/if logic should do the following:
It will look at the ID.
Then, for the 1st ID, it should look at the value of count; if the value of count is 1,
then it should look at the variable TP1, and its 1st value should be placed in the "final" variable.
The loop will then look at count 2 for ID 1, and the value of TP2 should go into the "final" variable, and so on.
I hope my question is clear. I am looking for a loop because there are 1000 TP variables in the original dataset.
I tried to write code something like the following, but it is utterly rubbish.
for col in df.columns:
    if col.startswith('TP') and count == int(col[2:]):
        df["Final"] = count
Thanks
If my understanding is correct, if count=1 then pick TP1, if count=2 then pick TP2 etc.
This can be done with numpy.select(). Note that I have added the condition if f"TP{x}" in df.columns because not all columns TP1, TP2, TP3, ... TP10 are available in the dataframe. If all are available in your actual dataframe then this if statement is not required.
import numpy as np
conds = [df["count"] == x for x in range(1,11) if f"TP{x}" in df.columns]
output = [df[f"TP{x}"] for x in range(1,11) if f"TP{x}" in df.columns]
df["final"] = np.select(conds, output, np.nan)
print(df)
Output:
   ID  TP1  TP2  TP10  count  final
0   1    1   11   114      1    1.0
1   1    2   22   222      2   22.0
2   1    3   32   324      3    NaN
3   1    4   43   443      4    NaN
4   1    5   53   535     10  535.0
5   2    9   94    94      1    9.0
6   2    8   85   385      2   85.0
7   2    7   76    76      3    NaN
8   2    6   66   266      4    NaN
9   2    5   58   548     10  548.0
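Since the real dataset has around 1000 TP variables, you can also build the conditions from whatever TP columns actually exist instead of hard-coding range(1, 11) - a sketch of the same np.select idea:
import numpy as np

# collect the TP columns that are present and pair each with its count value
tp_cols = [c for c in df.columns if c.startswith("TP")]
conds = [df["count"] == int(c[2:]) for c in tp_cols]
choices = [df[c] for c in tp_cols]
df["final"] = np.select(conds, choices, np.nan)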

How to stack two columns of a pandas dataframe in python

I want to stack two columns on top of each other
So I have Left and Right values in one column each, and want to combine them into a single one. How do I do this in Python?
I'm working with Pandas Dataframes.
Basically from this
Left Right
0 20 25
1 15 18
2 10 35
3 0 5
To this:
New Name
0 20
1 15
2 10
3 0
4 25
5 18
6 35
7 5
It doesn't matter how they are combined as I will plot it anyway, and the new column name also doesn't matter because I can rename it.
You can create a list of the columns and call squeeze to anonymise the data so it doesn't try to align on columns, then call concat on this list. Passing ignore_index=True creates a new index; otherwise you'll get the column names repeated as index values:
cols = [df[col].squeeze() for col in df]
pd.concat(cols, ignore_index=True)
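If you want the result back as a one-column DataFrame (the column name here is just taken from the question), you can wrap it up, for example:
stacked = pd.concat(cols, ignore_index=True).to_frame('New Name')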
Many options, stack, melt, concat, ...
Here's one:
>>> df.melt(value_name='New Name').drop(columns='variable')
New Name
0 20
1 15
2 10
3 0
4 25
5 18
6 35
7 5
You can also use np.ravel:
import numpy as np
out = pd.DataFrame(np.ravel(df.values.T), columns=['New name'])
print(out)
# Output
New name
0 20
1 15
2 10
3 0
4 25
5 18
6 35
7 5
Update
If you have only 2 cols:
out = pd.concat([df['Left'], df['Right']], ignore_index=True).to_frame('New name')
print(out)
# Output
New name
0 20
1 15
2 10
3 0
4 25
5 18
6 35
7 5
Solution with unstack
import numpy as np

df2 = df.unstack()
# recreate a plain 0..n-1 index
df2.index = np.arange(len(df2))
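The same idea as a one-liner, as a sketch (reset_index(drop=True) replaces the MultiIndex that unstack produces, and to_frame just names the column):
out = df.unstack().reset_index(drop=True).to_frame('New Name')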
A solution with masking.
# Your data
import numpy as np
import pandas as pd
df = pd.DataFrame({"Left":[20,15,10,0], "Right":[25,18,35,5]})
# Masking columns to ravel
df2 = pd.DataFrame({"New Name":np.ravel(df[["Left","Right"]])})
df2
New Name
0 20
1 25
2 15
3 18
4 10
5 35
6 0
7 5
I ended up using this solution, seems to work fine
df1 = dfTest[['Left']].copy()
df2 = dfTest[['Right']].copy()
df2.columns=['Left']
df3 = pd.concat([df1, df2],ignore_index=True)

How to Calculate Percentage of Total in Dataframe Column Stacked on Top of Total of Column (Python)

so let's say I have a dataframe:
data = [['1', 10,], ['2', 15], ['3', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['id', '# of Wagons'])
The output looks like:
id # of Wagons
0 1 10
1 2 15
2 3 14
How do I create percentages of the total while also keeping the total? If I use the .apply() function, I apply percentages to every value in the column, including the total, which I want to avoid doing.
My preferred output is:
id # of Wagons new_column
0 1 10 25.64%
1 2 15 38.46%
2 3 14 35.89%
Total 39
We can do
df['New'] = df['# of Wagons'] / df['# of Wagons'].sum()
df = df.append(pd.Series(['Total', df['# of Wagons'].sum(), 1], index=df.columns),
               ignore_index=True)
df
Out[158]:
id # of Wagons New
0 1 10 0.256410
1 2 15 0.384615
2 3 14 0.358974
3 Total 39 1.000000
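Note that DataFrame.append was removed in pandas 2.0, so on newer versions the same idea can be written with pd.concat - a sketch starting from the question's df:
import pandas as pd

df['New'] = df['# of Wagons'] / df['# of Wagons'].sum()
total = pd.DataFrame([['Total', df['# of Wagons'].sum(), 1.0]], columns=df.columns)
df = pd.concat([df, total], ignore_index=True)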
You can use pd.Series.div, then use '{:.precision%}'.format to render the values as percentage strings.
df.assign(new_col = df['# of Wagons'].div(df['# of Wagons'].sum()).map('{:.2%}'.format))
id # of Wagons new_col
0 1 10 25.64%
1 2 15 38.46%
2 3 14 35.90%
Note:
'{:.precision%}' is part of Python's format specification mini-language.
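For example, the % format type multiplies the value by 100, rounds it to the given precision and appends a percent sign:
print('{:.2%}'.format(0.358974))   # 35.90%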
You can do something like this:
total = sum(df['# of Wagons'].values)
df["percentage"] = df['# of Wagons'].apply(lambda x: "{:.2f}%".format((x/total)*100))
print(df)
# id # of Wagons percentage
#0 1 10 25.64%
#1 2 15 38.46%
#2 3 14 35.90%
You can add the percentage based on the '# of Wagons' like so:
import numpy as np
import pandas as pd
from pandas import DataFrame
total = np.sum(df.loc[:,'# of Wagons':].values)
df['percent'] = df.loc[:,'# of Wagons':].sum(axis=1)/total * 100
df
And if you want to add a 'Total' row you can use this:
df.append(df.sum(numeric_only=True), ignore_index=True)

Extend and fill a Pandas DataFrame to match another

I have two Pandas DataFrames A and B.
They have an identical index (weekly dates) up to a point: the series for A ends at the beginning of the year,
while B goes on for a number of additional observations. I need to set data frame A to have the same index as frame B, and fill each column with its own last value.
Thank you in advance.
Tikhon
EDIT: thank you for the advice on the question. What I need is for dfA_before to look at dfB and become dfA_after:
print(dfA_before)
a b
0 10 100
1 20 200
2 30 300
print(dfB)
a b
0 11 111
1 22 222
2 33 333
3 44 444
4 55 555
print(dfA_after)
a b
0 10 100
1 20 200
2 30 300
3 30 300
4 30 300
This should work
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'a':[10,20,30],'b':[100,200,300]})
df2 = pd.DataFrame({'a':[11,22,33,44,55],'c':[111,222,333,444,555]})
# solution
last = df1.iloc[-1].to_numpy()
df3 = pd.DataFrame(np.tile(last, (2, 1)),
                   columns=df1.columns)
df4 = df1.append(df3, ignore_index=True)
# method 2
for _ in range(len(df2) - len(df1)):
    df1.loc[len(df1)] = df1.loc[len(df1) - 1]
# method 3
for _ in range(df2.shape[0] - df1.shape[0]):
    df1 = df1.append(df1.loc[len(df1) - 1], ignore_index=True)
# result
a b
0 10 100
1 20 200
2 30 300
3 30 300
4 30 300
Probably very inefficient - I am a beginner:
dfA_New = dfB.copy()
dfA_New.loc[:] = 0
dfA_New.loc[:] = dfA.loc[:]
dfA_New.fillna(method='ffill', inplace = True)
dfA = dfA_New
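A more direct way to express the same thing, as a sketch (assuming dfA_before's index is a leading subset of dfB's, as in the example above), is to reindex A onto B's index and forward-fill each column:
dfA_after = dfA_before.reindex(dfB.index).ffill()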

Take a cell's value to indicate a column name in pandas

Input
      DBN Grade   3   4   5
0  01M015     3  30  44  15
1  01M015     4  30  44  15
2  01M015     5  30  44  15
Desired Output
      DBN Grade   3   4   5 Enrollment
0  01M015     3  30  44  15         30
1  01M015     4  30  44  15         44
2  01M015     5  30  44  15         15
How would you create the Enrollment column?
Note that the column we seek for each record depends on the value at df['Grade'].
I've tried variations of df[df['Grade']] so that I could find the column df['3'], but I haven't been successful.
Is there a way to do this simply?
import pandas as pd
import numpy as np
data={'DBN':['01M015','01M015','01M015'],
'Grade':['3','4','5'],
'3':['30','30','30'],
'4':['44','44','44'],
'5':['15','15','15']}
df = pd.DataFrame(data)
# This line below doesn't work: raises ValueError: Length of values does not match length of index
df['Enrollment'] = [df[c] if (df.loc[i, 'Grade'] == c) else None
                    for i in df.index for c in df.columns]
Set your index, and then use lookup:
df.set_index('Grade').lookup(df['Grade'], df['Grade'])
array(['30', '44', '15'], dtype=object)
You might run into some issues if your data is numeric (in your sample data it is all strings), requiring a cast to make the lookup succeed.
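To write the result into the new column - and as a caveat, DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0 - here is a sketch of the assignment together with a NumPy-based equivalent:
import numpy as np

# older pandas, where lookup is still available
df['Enrollment'] = df.set_index('Grade').lookup(df['Grade'], df['Grade'])

# equivalent without lookup: for each row, take the column whose name equals Grade
col_idx = df.columns.get_indexer(df['Grade'])
df['Enrollment'] = df.to_numpy()[np.arange(len(df)), col_idx]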
import pandas as pd
import numpy as np
data={'DBN':['01M015','01M015','01M015'],
'Grade':['3','4','5'],
'3':['30','30','30'],
'4':['44','44','44'],
'5':['15','15','15']}
df = pd.DataFrame(data)
enrollmentList = []
for index, row in df.iterrows():
    enrollmentList.append(row[row["Grade"]])
df['Enrollment'] = enrollmentList
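The same row-wise selection can also be written without the explicit loop, as a sketch using apply with axis=1 (each row is passed in, and row[row['Grade']] picks the column named by that row's Grade):
df['Enrollment'] = df.apply(lambda row: row[row['Grade']], axis=1)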
