I have a dictionary with set of Keys & Values and also have CSV file with columns have the same header name of the dictionary Keys.
i need to apply common function which is if the value of the row of the header that has the name of the key is less than the value defined in the dictionary to write true in a new column in the data set.
the problem is i need to make a loop for all columns to match the header name with the Key name and apply the function according to the value of the key on all column rows.
Below code if i did it with out loop , but if the dictionary increased it will be not efficient way.
Code:
dict = {'RRC_SR%' : 99, 'RAB_SR%' : 97, 'UL_UE_Throughput' : 2, 'PS_CDR%' : 0.15, 'DL_UE_Throughput' : 17,
'INTRA HO SR %' : 99}
kpi = pd.read_csv('KPIs for Triggering.csv')
kpi['RRC_SR Result'] = kpi.loc[:, 'RRC_SR%'].apply(lambda x:x>dict['RRC_SR%'])
kpi['RAB_SR Result'] = kpi.loc[:, 'RAB_SR%'].apply(lambda x:x>dict['RAB_SR%'])
kpi['UL_UE_Throughput Result'] = kpi.loc[:, 'UL_UE_Throughput'].apply(lambda x:x>dict['UL_UE_Throughput'])
kpi['PS_CDR Result'] = kpi.loc[:, 'PS_CDR%'].apply(lambda x:x>dict['PS_CDR%'])
kpi['DL_UE_Throughput Result'] = kpi.loc[:, 'DL_UE_Throughput'].apply(lambda x:x>dict['DL_UE_Throughput'])
kpi['INTRA HO SR Result'] = kpi.loc[:, 'INTRA HO SR %'].apply(lambda x:x>dict['INTRA HO SR %'])
Time Object DL_UE_Throughput UL_UE_Throughput PS_CDR% RAB_SR% RRC_SR% INTRA HO SR %
0:00 A_1 4.76 1.04 0.17 99.88 99.98 98.22
0:00 B_2 8.1 1.04 0.15 99.92 99.99 97.8
0:00 C_3 5.63 0.72 0.11 99.94 99.98 96.17
0:00 D_4 8.65 1.06 0.25 99.75 99.95 99.51
0:00 E_5 10.21 0.59 0.37 99.67 99.97 99.18
0:00 F_6 7.46 0.34 0.38 99.3 99.95 99.56
0:00 G_7 10.08 2.31 0.6 99.38 99.73 93.63
0:00 H_8 10.29 1.29 0.84 99.5 99.84 99.22
0:00 I_9 10.19 4.76 0.92 99.26 99.75 97.66
0:00 J_10 7.45 8.6 0.85 99.29 99.72 98.55
Related
I have the data like this:
df:
A-A A-B A-C A-D A-E
Tg 0.37 10.24 5.02 0.63 20.30
USL 0.39 10.26 5.04 0.65 20.32
LSL 0.35 10.22 5.00 0.63 20.28
1 0.35 10.23 5.05 0.65 20.45
2 0.36 10.19 5.07 0.67 20.25
3 0.34 10.25 5.03 0.66 20.33
4 0.35 10.20 5.08 0.69 20.22
5 0.33 10.17 5.05 0.62 20.40
Max 0.36 10.25 5.08 0.69 20.45
Min 0.33 10.17 5.03 0.62 20.22
I would like to color-highlight the data (index 1-5 in this df) by comparing Max and Min of the data (last two rows) to USL and LSL respectively. if Max > USL or Min < LSL, I would like to highlight the corresponding data points as red. if Max == USL or Min == LSL, corresponding data point as yellow and otherwise everything green.
I tried this :
highlight = np.where(df.loc['Max']>df.loc['USL'], 'background-color: red', '')
df.style.apply(lambda _: highlight)
but i get the error:
ValueError: Function <function <lambda> at 0x7fb681b601f0> created invalid index labels.
Usually, this is the result of the function returning a Series which contains invalid labels, or returning an incorrectly shaped, list-like object which cannot be mapped to labels, possibly due to applying the function along the wrong axis.
Result index has shape: (5,)
Expected index shape: (10,)
Out[58]:
<pandas.io.formats.style.Styler at 0x7fb681b52e20>
Use custom function for create DataFrame of styles by conditions:
#changed data for test
print (df)
A-A A-B A-C A-D
Tg 0.37 10.24 5.02 0.63
USL 0.39 10.26 5.04 0.65
LSL 0.33 0.22 5.00 10.63
1 0.35 10.23 5.05 0.65
2 0.36 10.19 5.07 0.67
3 0.34 10.25 5.03 0.66
4 0.35 10.20 5.08 0.69
5 0.33 10.17 5.05 0.62
Max 0.36 10.25 5.08 0.69
Min 0.33 10.17 5.03 0.62
def hightlight(x):
c1 = 'background-color:red'
c2 = 'background-color:yellow'
c3 = 'background-color:green'
#if values of index are strings
r = list('12345')
#if values of index are integers
r = [1,2,3,4,5]
m1 = (x.loc['Max']>x.loc['USL']) | (x.loc['Min']<x.loc['LSL'])
print (m1)
m2 = (x.loc['Max']==x.loc['USL']) | (x.loc['Min']==x.loc['LSL'])
print (m2)
#DataFrame with same index and columns names as original filled empty strings
df1 = pd.DataFrame('', index=x.index, columns=x.columns)
#modify values of df1 columns by boolean mask
df1.loc[r, :] = np.select([m1, m2], [c1, c2], default=c3)
return df1
df.style.apply(hightlight, axis=None)
EDIT: For compare 1-5 and Min/Max use:
def hightlight(x):
c1 = 'background-color:red'
c2 = 'background-color:yellow'
c3 = 'background-color:green'
#if values of index are strings
r = list('12345')
#if values of index are integers
# r = [1,2,3,4,5]
r += ['Max','Min']
m1 = (x.loc[r]>x.loc['USL']) | (x.loc[r]<x.loc['LSL'])
m2 = (x.loc[r]==x.loc['USL']) | (x.loc[r]==x.loc['LSL'])
#DataFrame with same index and columns names as original filled empty strings
df1 = pd.DataFrame('', index=x.index, columns=x.columns)
#modify values of df1 columns by boolean mask
df1.loc[r, :] = np.select([m1, m2], [c1, c2], default=c3)
return df1
df.style.apply(hightlight, axis=None)
I have a large dataframe with two columns and a datetime index. When plotting a section of it, it looks like this:
Basically, data can go up (charging) or down (discharging) (sometimes remaining constant through these cycles) according to the SOC column.
The dataframe looks like the following:
SoC Power
2021-09-25 16:40:00 0.76 2.18
2021-09-25 16:40:10 0.76 2.14
2021-09-25 16:40:20 0.77 2.18
2021-09-25 16:40:30 0.76 1.14
2021-09-25 16:40:30 0.75 1.14
2021-09-25 16:40:30 0.75 1.14
I want to extract the first charging and discharging cycles. In this example, the expected output would be new dataframes as:
"Charging":
SoC Power
2021-09-25 16:40:00 0.76 2.18
2021-09-25 16:40:10 0.76 2.14
2021-09-25 16:40:20 0.77 2.18
"Discharging"
SoC Power
2021-09-25 16:40:30 0.76 1.14
2021-09-25 16:40:30 0.75 1.14
2021-09-25 16:40:30 0.75 1.14
My closest approach for extracting a charging session was the following:
max = df_3['SoC'].diff() < 0
idx = max.idxmax()
df = df.loc[df.index[0]:idx]
However, it only works when the data starts with a charging session (as all it does is stop whenever the values begin to decrease). I want a solution that works despite the initial data point and gives me the first charging cycle data points.
The exact expected output is unclear, but here is an approach to split each phase (charging, discharging) in a dictionary (2 statuses: charging/discharging, with a list of all phases per status):
s = np.sign(df['SoC'].diff())
s2 = s.mask(s.eq(0)).ffill().bfill().map({1: 'charging', -1: 'discharging'})
from collections import defaultdict
out = defaultdict(list)
for (status,_), d in s2.groupby([s2, s2.ne(s2.shift()).cumsum()]):
out[status].append(d)
dict(out)
output:
{'charging': [2021-09-25 16:40:00 charging
2021-09-25 16:40:10 charging
2021-09-25 16:40:20 charging
Name: SoC, dtype: object],
'discharging': [2021-09-25 16:40:30 discharging
2021-09-25 16:40:30 discharging
2021-09-25 16:40:30 discharging
Name: SoC, dtype: object]}
For a single item:
out['charging'][0]
output:
2021-09-25 16:40:00 charging
2021-09-25 16:40:10 charging
2021-09-25 16:40:20 charging
Name: SoC, dtype: object
as DataFrame:
s = np.sign(df['SoC'].diff())
df['status'] = (s.mask(s.eq(0)).ffill().bfill()
.map({1: 'charging', -1: 'discharging'})
)
df['phase'] = s2.ne(s2.shift()).cumsum()
output:
SoC Power status phase
2021-09-25 16:40:00 0.76 2.18 charging 1
2021-09-25 16:40:10 0.76 2.14 charging 1
2021-09-25 16:40:20 0.77 2.18 charging 1
2021-09-25 16:40:30 0.76 1.14 discharging 2
2021-09-25 16:40:30 0.75 1.14 discharging 2
2021-09-25 16:40:30 0.75 1.14 discharging 2
I don't how the data is, but I think that creating a window walking through the data is a good idea and which pandas has a functionality for it.
I explain:
The window starts small, it moves through the data and can potentially increase in size by checking if the pattern of increase or decrease is still there. In case of a change in pattern the window will be transformed to its original size and repeat the process until the end.
I have a df that looks like this:
a b c
124 -3.09 -0.38 2.34
2359 4.81 0.51 -1.53
56555 -4.34 -0.64 2.31
96786 -3.33 -3.34 -7.62
I want to calculate the absolute max value of each row in a new column that keeps negatives as negatives. The closest I've gotten is with the following:
df['new_column'] = df.abs().max(axis = 1)
new_column
3.09
4.81
4.34
7.62
But I need the new column to keep the negative signs—i.e. to look like this:
new_column
-3.09
4.81
-4.34
-7.62
I've attempted a few things using abs().idxmax(), and am wondering if I need to find the location of the absolute max value, and then return the value in that location in the new column—just not sure how to do this. Thoughts?
Here's one way using two steps: First, find the absolute max. Then see if absolute max equals any values in the DataFrame using eq and use the output as the power of -1 to get the signs:
row_max = df.abs().max(axis=1)
df['new_column'] = row_max * (-1) ** df.ne(row_max, axis=0).all(axis=1)
Another option is to use mask to choose values:
df['columns'] = df.max(axis=1).mask(lambda x: x < row_max, -row_max)
Output:
a b c new_column
124 -3.09 -0.38 2.34 -3.09
2359 4.81 0.51 -1.53 4.81
56555 -4.34 -0.64 2.31 -4.34
96786 -3.33 -3.34 -7.62 -7.62
I like the original idea you thought of, keeping with the theme:
# setup
data = {'a': [-3.09, 4.81, -4.34, -3.33],
'b': [-.38, .51, -.64, -3.34],
'c': [2.34, -1.53, 2.31, -7.62]}
df = pd.DataFrame(data, index= [124, 2359,56555,96786])
instead of:
df['new_column'] = df.abs().max(axis = 1)
let's change it to return the column instead of actual value:
max_col = df.abs().idxmax(axis = 1)
from there we can just iterate over it with enumerate for the row number and set it as the new column:
df['new_column'] = [df.loc[row,col] for row, col in zip(df.index, max_col)]
results:
a b c new_column
124 -3.09 -0.38 2.34 -3.09
2359 4.81 0.51 -1.53 4.81
56555 -4.34 -0.64 2.31 -4.34
96786 -3.33 -3.34 -7.62 -7.62
I'm trying to fit a set of data taken by an external simulation, and stored in a vector, with the Lmfit library.
Below there's my code:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from lmfit import Parameters
def DGauss3Par(x,I1,sigma1,sigma2):
I2 = 2.63 - I1
return (I1/np.sqrt(2*np.pi*sigma1))*np.exp(-(x*x)/(2*sigma1*sigma1)) + (I2/np.sqrt(2*np.pi*sigma2))*np.exp(-(x*x)/(2*sigma2*sigma2))
#TAKE DATA
xFull = []
yFull = []
fileTypex = np.dtype([('xFull', np.float)])
fileTypey = np.dtype([('yFull', np.float)])
fDatax = "xValue.dat"
fDatay = "yValue.dat"
xFull = np.loadtxt(fDatax, dtype=fileTypex)
yFull = np.loadtxt(fDatay, dtype=fileTypey)
xGauss = xFull[:]["xFull"]
yGauss = yFull[:]["yFull"]
#MODEL'S DEFINITION
gmodel = Model(DGauss3Par)
params = Parameters()
params.add('I1', value=1.66)
params.add('sigma1', value=1.04)
params.add('sigma2', value=1.2)
result3 = gmodel.fit(yGauss, x=xGauss, params=params)
#PLOTS
plt.plot(xGauss, result3.best_fit, 'y-')
plt.show()
When I run it, I get this error:
File "Overlap.py", line 133, in <module>
result3 = gmodel.fit(yGauss, x=xGauss, params=params)
ValueError: The input contains nan values
These are the values of the data contained in the vector xGauss (related to the x axis):
[-3.88 -3.28 -3.13 -3.08 -3.03 -2.98 -2.93 -2.88 -2.83 -2.78 -2.73 -2.68
-2.63 -2.58 -2.53 -2.48 -2.43 -2.38 -2.33 -2.28 -2.23 -2.18 -2.13 -2.08
-2.03 -1.98 -1.93 -1.88 -1.83 -1.78 -1.73 -1.68 -1.63 -1.58 -1.53 -1.48
-1.43 -1.38 -1.33 -1.28 -1.23 -1.18 -1.13 -1.08 -1.03 -0.98 -0.93 -0.88
-0.83 -0.78 -0.73 -0.68 -0.63 -0.58 -0.53 -0.48 -0.43 -0.38 -0.33 -0.28
-0.23 -0.18 -0.13 -0.08 -0.03 0.03 0.08 0.13 0.18 0.23 0.28 0.33
0.38 0.43 0.48 0.53 0.58 0.63 0.68 0.73 0.78 0.83 0.88 0.93
0.98 1.03 1.08 1.13 1.18 1.23 1.28 1.33 1.38 1.43 1.48 1.53
1.58 1.63 1.68 1.73 1.78 1.83 1.88 1.93 1.98 2.03 2.08 2.13
2.18 2.23 2.28 2.33 2.38 2.43 2.48 2.53 2.58 2.63 2.68 2.73
2.78 2.83 2.88 2.93 2.98 3.03 3.08 3.13 3.28 3.88]
And these ones the ones in the vector yGauss (related to y axis):
[0.00173977 0.00986279 0.01529543 0.0242624 0.0287456 0.03238484
0.03285927 0.03945234 0.04615091 0.05701618 0.0637672 0.07194268
0.07763934 0.08565687 0.09615262 0.1043281 0.11350606 0.1199406
0.1260062 0.14093328 0.15079665 0.16651464 0.18065023 0.1938894
0.2047541 0.21794024 0.22806706 0.23793043 0.25164404 0.2635118
0.28075974 0.29568682 0.30871501 0.3311846 0.34648062 0.36984661
0.38540666 0.40618835 0.4283945 0.45002014 0.48303911 0.50746062
0.53167057 0.5548792 0.57835128 0.60256181 0.62566436 0.65704847
0.68289386 0.71332794 0.73258027 0.769608 0.78769989 0.81407275
0.83358852 0.85210239 0.87109068 0.89456217 0.91618782 0.93760247
0.95680234 0.96919757 0.9783219 0.98486193 0.9931429 0.9931429
0.98486193 0.9783219 0.96919757 0.95680234 0.93760247 0.91618782
0.89456217 0.87109068 0.85210239 0.83358852 0.81407275 0.78769989
0.769608 0.73258027 0.71332794 0.68289386 0.65704847 0.62566436
0.60256181 0.57835128 0.5548792 0.53167057 0.50746062 0.48303911
0.45002014 0.4283945 0.40618835 0.38540666 0.36984661 0.34648062
0.3311846 0.30871501 0.29568682 0.28075974 0.2635118 0.25164404
0.23793043 0.22806706 0.21794024 0.2047541 0.1938894 0.18065023
0.16651464 0.15079665 0.14093328 0.1260062 0.1199406 0.11350606
0.1043281 0.09615262 0.08565687 0.07763934 0.07194268 0.0637672
0.05701618 0.04615091 0.03945234 0.03285927 0.03238484 0.0287456
0.0242624 0.01529543 0.00986279 0.00173977]
I've also tried to print the values returned by my function, to see if there really were some NaN values:
params = Parameters()
params.add('I1', value=1.66)
params.add('sigma1', value=1.04)
params.add('sigma2', value=1.2)
func = DGauss3Par(xGauss,I1,sigma1,sigma2)
print func
but what I obtained is:
[0.04835225 0.06938855 0.07735839 0.08040181 0.08366964 0.08718237
0.09096169 0.09503048 0.0994128 0.10413374 0.10921938 0.11469669
0.12059333 0.12693754 0.13375795 0.14108333 0.14894236 0.15736337
0.16637406 0.17600115 0.18627003 0.19720444 0.20882607 0.22115413
0.23420498 0.24799173 0.26252377 0.27780639 0.29384037 0.3106216
0.32814069 0.34638266 0.3653266 0.38494543 0.40520569 0.42606735
0.44748374 0.46940149 0.49176057 0.51449442 0.5375301 0.56078857
0.58418507 0.60762948 0.63102687 0.65427809 0.6772804 0.69992818
0.72211377 0.74372824 0.76466232 0.78480729 0.80405595 0.82230355
0.83944875 0.85539458 0.87004937 0.88332762 0.89515085 0.90544838
0.91415806 0.92122688 0.92661155 0.93027889 0.93220625 0.93220625
0.93027889 0.92661155 0.92122688 0.91415806 0.90544838 0.89515085
0.88332762 0.87004937 0.85539458 0.83944875 0.82230355 0.80405595
0.78480729 0.76466232 0.74372824 0.72211377 0.69992818 0.6772804
0.65427809 0.63102687 0.60762948 0.58418507 0.56078857 0.5375301
0.51449442 0.49176057 0.46940149 0.44748374 0.42606735 0.40520569
0.38494543 0.3653266 0.34638266 0.32814069 0.3106216 0.29384037
0.27780639 0.26252377 0.24799173 0.23420498 0.22115413 0.20882607
0.19720444 0.18627003 0.17600115 0.16637406 0.15736337 0.14894236
0.14108333 0.13375795 0.12693754 0.12059333 0.11469669 0.10921938
0.10413374 0.0994128 0.09503048 0.09096169 0.08718237 0.08366964
0.08040181 0.07735839 0.06938855 0.04835225]
So it doesn't seems that there are NaN values, I'm not understanding for which reason it returns me that error.
Could anyone help me, please? Thanks!
If you add a print function to your fit function, printing out sigma1 and sigma2, you'll find that
DGauss3Par is evaluated already a few times before the error occurs.
Both sigma variables have a negative value at the time the error occurs.
Taking the square root of a negative value causes, of course, a NaN.
You should add a min bound or similar to your sigma1 and sigma2 parameters to prevent this. Using min=0.0 as an additional argument to params.add(...) will result in a good fit.
Be aware that for some analyses, setting explicit bounds to your fitting parameters may make these analyses invalid. For most cases, you'll be fine, but for some cases, you'll need to check whether the fitting parameters should be allowed to vary from negative infinity to positive infinity, or are allowed to be bounded.
I want to extract the name of comets from my table held in a text file. However some comets are 1-worded, others are 2-worded, and some are 3-worded. My table looks like this:
9P/Tempel 1 1.525 0.514 10.5 5.3 2.969
27P/Crommelin 0.748 0.919 29.0 27.9 1.484
126P/IRAS 1.713 0.697 45.8 13.4 1.963
177P/Barnard 1.107 0.954 31.2 119.6 1.317
P/2008 A3 (SOHO) 0.049 0.984 22.4 5.4 1.948
P/2008 Y11 (SOHO) 0.046 0.985 24.4 5.3 1.949
C/1991 L3 Levy 0.983 0.929 19.2 51.3 1.516
However, I know that the name of the comets is from character 5 till character 37. How can I write a code to tell python that the first column is from character 5 till character 37?
data = """9P/Tempel 1 1.525 0.514 10.5 5.3 2.969
27P/Crommelin 0.748 0.919 29.0 27.9 1.484
126P/IRAS 1.713 0.697 45.8 13.4 1.963
177P/Barnard 1.107 0.954 31.2 119.6 1.317
P/2008 A3 (SOHO) 0.049 0.984 22.4 5.4 1.948
P/2008 Y11 (SOHO) 0.046 0.985 24.4 5.3 1.949
C/1991 L3 Levy 0.983 0.929 19.2 51.3 1.516""".split('\n')
To read the whole file you can use
f = open('data.txt', 'r').readlines()
It seems that you have columns that you can use.
If you're only interested in the first column then :
len("9P/Tempel 1 ")
It gives 33.
So,
Extract the first column :
for line in data:
print line[:33].strip()
Here what's printed :
9P/Tempel 1
27P/Crommelin
126P/IRAS
177P/Barnard
P/2008 A3 (SOHO)
P/2008 Y11 (SOHO)
C/1991 L3 Levy
If what you want is :
Tempel 1
Crommelin
IRAS
...
You have to use a regular expression.
Example :
reg = '.*?/[\d\s]*(.*)'
print re.match(reg, '27P/Crommelin').group(1)
print re.match(reg, 'C/1991 L3 Levy').group(1)
Here's the output :
Crommelin
L3 Levy
You can also take a glance to the read_fwf of the python pandas library.
It allows to parse your file specifying the number of characters per columns.