I have two DataFrames with readings taken at two different times:
DF1
Sensor ID Reference Pressure Sensor Pressure
0 013677 100.15 93.18
1 013688 101.10 95.23
2 013699 100.87 93.77
... ... ... ...
And
DF2
Sensor ID Reference Pressure Sensor Pressure
0 013688 120.01 119.43
1 013677 118.93 118.88
2 013699 120.05 118.85
... ... ... ...
What would be the optimal way of creating a third DataFrame that contains the difference between those readings, given that the order of the "Sensor ID" values does not match between the two DataFrames?
Pandas has this beautiful feature where it automatically aligns on indices. So we can use that to solve your problem:
df1.set_index("Sensor ID").sub(df2.set_index("Sensor ID"))
Reference Pressure Sensor Pressure
Sensor ID
13677 -18.78 -25.70
13688 -18.91 -24.20
13699 -19.18 -25.08
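For a self-contained illustration, the whole flow might look like the snippet below; the values are copied from the question. (In the output above the leading zeros are gone, which suggests the IDs were read as integers; with strings as below they are kept.)
import pandas as pd

df1 = pd.DataFrame({
    "Sensor ID": ["013677", "013688", "013699"],
    "Reference Pressure": [100.15, 101.10, 100.87],
    "Sensor Pressure": [93.18, 95.23, 93.77],
})
df2 = pd.DataFrame({
    "Sensor ID": ["013688", "013677", "013699"],
    "Reference Pressure": [120.01, 118.93, 120.05],
    "Sensor Pressure": [119.43, 118.88, 118.85],
})

# Index alignment matches rows by "Sensor ID", so the mismatched row order is handled automatically.
diff = df1.set_index("Sensor ID").sub(df2.set_index("Sensor ID"))
print(diff)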
I have a dataframe with a number of serial numbers; each serial number has 'START' and 'STOP' events. I want to calculate the stoppage time for each serial number for each day. There can be multiple start/stop events in a day, but I need to consider the cumulative stoppage time (STOP - START). How can I do this with Python? I have added an image with a glimpse of the data.
This is what I have for one serial number, but how do I write it for all serial numbers?
import pandas as pd
import numpy as np

dfout = pd.DataFrame()
dfout['EventType'] = UptimeS['EventType']
dfout['EventStartTime'] = UptimeS['EventDate']
dfout['SerialNumber'] = UptimeS['SerialNumber']
# Flag rows where the event type changes compared to the previous row.
dfout['change'] = np.where(UptimeS['EventType'] != UptimeS['EventType'].shift(), 1, 0)
dfout = dfout.loc[dfout['change'] != 0, :]
# The end of each event is the start of the next one.
dfout['EventEndTime'] = dfout['EventStartTime'].shift(-1)
dfout = dfout.loc[dfout['EventType'] == 'STOP_Op']
dfout['CommDownTime'] = pd.to_datetime(dfout['EventEndTime']) - pd.to_datetime(dfout['EventStartTime'])
dfout = dfout.groupby(['SerialNumber'])['CommDownTime'].sum()
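A minimal sketch for doing this for every serial number and every day: the column names SerialNumber, EventType, EventDate and the value 'STOP_Op' come from the snippet above; everything else, including the assumption that each STOP is followed by the matching START, is hypothetical.
import pandas as pd

def daily_downtime(events):
    # events is assumed to have columns: SerialNumber, EventType, EventDate.
    events = events.copy()
    events['EventDate'] = pd.to_datetime(events['EventDate'])
    events = events.sort_values(['SerialNumber', 'EventDate'])
    # Within each serial number, pair every event with the timestamp of the next event.
    events['NextEvent'] = events.groupby('SerialNumber')['EventDate'].shift(-1)
    stops = events[events['EventType'] == 'STOP_Op'].copy()
    stops['DownTime'] = stops['NextEvent'] - stops['EventDate']
    # Cumulative stoppage per serial number per calendar day (stoppages spanning
    # midnight are credited to the day they started).
    return stops.groupby(['SerialNumber', stops['EventDate'].dt.date])['DownTime'].sum()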
I am working on a dataframe; the data is shown in the image.
I want the number of shows released per year, but when I apply the count() function it gives me 6 instead of 3. Could anyone suggest how to get the correct count?
To get the count of unique values for a single year, you can use
count = len(df.loc[df['release_year'] == 1945, 'show_id'].unique())
# or
count = df.loc[df['release_year'] == 1945, 'show_id'].nunique()
To summarize unique values for the whole dataframe by year, you can drop_duplicates() on the show_id column first.
df.drop_duplicates(subset=['show_id']).groupby('release_year').count()
Or use value_counts() on the column after dropping duplicates.
df.drop_duplicates(subset=['show_id'])['release_year'].value_counts()
df.groupby('release_year')['show_id'].nunique()
should do the job.
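For a quick sanity check, here is a toy frame (the data is made up; only the column names show_id and release_year come from the question) showing the difference between count() and nunique():
import pandas as pd

df = pd.DataFrame({
    'show_id': ['s1', 's1', 's2', 's2', 's3', 's3'],
    'release_year': [1945, 1945, 1945, 1945, 1945, 1945],
})

print(df.loc[df['release_year'] == 1945, 'show_id'].count())    # 6 (counts every row)
print(df.loc[df['release_year'] == 1945, 'show_id'].nunique())  # 3 (counts unique shows)
print(df.groupby('release_year')['show_id'].nunique())          # 3 per year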
The data is in a dataframe and looks like this:
PatientId Payor
0 PAT10000 [Cash, Britam]
1 PAT10001 [Madison, Cash]
2 PAT10002 [Cash]
3 PAT10003 [Cash, Madison, Resolution]
4 PAT10004 [CIC Corporate, Cash]
I want to remove the square brackets, filter all patients who used at least a certain mode of payment (e.g. Madison), and then obtain their IDs. Please help.
This will generate a list of tuples (id, payor), assuming the Payor column holds strings like "[Cash, Britam]" (df is the dataframe):
payment = 'Madison'
ids = [(pid, df.Payor[i][1:-1]) for i, pid in enumerate(df.PatientId) if payment in df.Payor[i]]
Let's say your dataframe variable is named "df" and, after removing the square brackets, you want to filter all rows whose "Payor" column contains "Madison":
df = df.replace({r'\[': '', r'\]': ''}, regex=True)
filteredDf = df.loc[df['Payor'].str.contains("Madison")]
print(filteredDf)
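If the Payor column actually holds Python lists rather than strings (the bracketed display in the question is consistent with either), a plain membership check works without any string clean-up; a sketch under that assumption:
import pandas as pd

df = pd.DataFrame({
    'PatientId': ['PAT10000', 'PAT10001', 'PAT10002', 'PAT10003'],
    'Payor': [['Cash', 'Britam'], ['Madison', 'Cash'], ['Cash'], ['Cash', 'Madison', 'Resolution']],
})

# Keep the IDs of patients whose payor list contains 'Madison'.
madison_ids = df.loc[df['Payor'].apply(lambda payors: 'Madison' in payors), 'PatientId']
print(madison_ids.tolist())  # ['PAT10001', 'PAT10003']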
I have a dataframe in which each row shows one transaction, i.e. the items purchased together. Here is what my dataframe looks like:
items
['A','B','C']
['A','C']
['C','F']
...
I need to create a dictionary that shows how many times items have been purchased together, something like the one below:
{'A': [('B', 1), ('C', 5)], 'B': [('A', 1), ('C', 6)], ...}
Right now, I have defined a variable freq and I loop through my dataframe to calculate/update the dictionary (freq), but it's taking very long.
What's the efficient way of calculating this without looping through the dataframe?
You can speed this up with sklearn's MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
Transform your data using:
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(df['items']),
                  columns=mlb.classes_,
                  index=df.index)
to get it in the following format:
A B C F
0 1 1 1 0
1 1 0 1 0
2 0 0 1 1
And then you can define a trivial function like:
get_num_coexisting = lambda x, y: (df[x] & df[y]).sum()
And use as so:
get_num_coexisting('A', 'C')
>>> 2
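To build the whole dictionary at once instead of querying pair by pair, the binarized frame above can be turned into a co-occurrence matrix with a single matrix product (a sketch; the output shape mirrors the dictionary in the question):
import numpy as np

# Entry (i, j) counts the transactions containing both item i and item j.
co = df.T.dot(df)
np.fill_diagonal(co.values, 0)  # drop each item's co-occurrence with itself

freq = {item: [(other, int(n)) for other, n in row.items() if n > 0]
        for item, row in co.iterrows()}
print(freq)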
I have a dataframe df with 3 columns:
df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Values': ['x', 'y', 'z', 'p', 'q', 'r', 's'],
    'Date': [14, 11, 14, 12, 13, 10, 14]
})
I want to create a new dataframe that contains the rows corresponding to the highest value in the 'Date' column for each user. For the above dataframe, the desired output is shown below (it's a jpeg image):
Can anyone help me with this problem?
This answer assumes that there are different maximum values per user in the Values column:
In [10]: def get_max(group):
...: return group[group.Date == group.Date.max()]
...:
In [12]: df.groupby('User').apply(get_max).reset_index(drop=True)
Out[12]:
Date User Values
0 14 A x
1 14 B z
2 14 C s
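An equivalent one-liner without a custom function, in case it is useful (like the groupby/apply version above, it keeps every row that ties the per-user maximum):
# Keep rows where Date equals that user's maximum Date.
out = df[df['Date'] == df.groupby('User')['Date'].transform('max')].reset_index(drop=True)
print(out)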