Pandas equivalent of SQL Group By while concatenating columns - python

Suppose I have a table with the columns firstname, surname, tel, something like this:

firstname  surname  tel
alex       topol    1234
jim        jimix    2312
alex       topol    2344
Now I want to find the number of tels per person and sort, so in SQL I write:
select concat(firstname,' ',surname), count(*) from wp_eqra_requests group by concat(firstname,' ',surname) order by count(*) desc
But how do I write this in Python Pandas? I tried using groupby but had no success concatenating the two columns:
numUsers = df.groupby(by=["firstname", "surname"])["tel"].count()

Similar to the SQL, you can use the + operator to concatenate the two columns, then group by the result and count the values:
df.groupby(df['firstname'] + ' ' + df['surname'])['tel'].count()
alex topol    2
jim jimix     1
Name: tel, dtype: int64
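To also mirror the SQL's order by count(*) desc, you can sort the grouped counts; a minimal sketch with the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "firstname": ["alex", "jim", "alex"],
    "surname": ["topol", "jimix", "topol"],
    "tel": [1234, 2312, 2344],
})

# Group on the concatenated name, count tels, then sort descending
counts = (
    df.groupby(df["firstname"] + " " + df["surname"])["tel"]
      .count()
      .sort_values(ascending=False)
)
print(counts)
```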

Related

SQLite removing a row, and reordering the rest of the rows

I'm struggling with something relatively simple. Let's say I have a table whose first column is an autoincrement primary key:

id  name   age
1   tom    22
2   harry  33
3   greg   44
4   sally  55
I want to remove row 2 and have the rest of the rows automatically renumber, so it looks like this:

id  name   age
1   tom    22
2   greg   44
3   sally  55
I have tried every bit of advice I can find online; it all involves deleting the table name from the sqlite_sequence table, which doesn't work.
Is there a simple way to achieve what I am after?
I don't see the point of, or need for, resequencing the id column, as all values there will continue to be unique even after deleting one or more records. If you really want to view your data this way, you can delete the record mentioned and then use ROW_NUMBER:
DELETE
FROM yourTable
WHERE id = 2;
SELECT ROW_NUMBER() OVER (ORDER BY id) id, name, age
FROM yourTable
ORDER BY id;
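The same DELETE plus ROW_NUMBER approach can be sketched end to end with Python's sqlite3 module (the table and column names are taken from the question; ROW_NUMBER requires SQLite 3.25 or newer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable (name, age) VALUES (?, ?)",
    [("tom", 22), ("harry", 33), ("greg", 44), ("sally", 55)],
)

# Delete the row, then view the rest with a resequenced id via ROW_NUMBER
conn.execute("DELETE FROM yourTable WHERE id = 2")
rows = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY id) AS id, name, age "
    "FROM yourTable ORDER BY id"
).fetchall()
print(rows)  # [(1, 'tom', 22), (2, 'greg', 44), (3, 'sally', 55)]
```

Note that the stored ids are unchanged; only the query output is renumbered.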

Update a value based on another dataframe pairing

I have a problem where I need to update a value if two people were at the same table.
import pandas as pd
data = {"p1": ['Jen', 'Mark', 'Carrie'],
        "p2": ['John', 'Jason', 'Rob'],
        "value": [10, 20, 40]}
df = pd.DataFrame(data, columns=["p1", 'p2', 'value'])

meeting = {'person': ['Jen', 'Mark', 'Carrie', 'John', 'Jason', 'Rob'],
           'table': [1, 2, 3, 1, 2, 3]}
meeting = pd.DataFrame(meeting, columns=['person', 'table'])
df is a relationship table, and value is the field I need to update. If two people were at the same table in the meeting dataframe, then the corresponding df row should be updated.
For example: Jen and John were both at table 1, so I need to update the row in df that has Jen and John and set their value to value + 100, i.e. 110.
I thought about doing a self join on meeting to get the format to match that of df, but I'm not sure that's the easiest or fastest approach.
IIUC you could set the person as index in the meeting dataframe, and use its table values to replace the names in df. Then if both mappings have the same value (table), replace with df.value+100:
m = df[['p1','p2']].replace(meeting.set_index('person').table).eval('p1==p2')
df['value'] = df.value.mask(m, df.value+100)
print(df)
       p1     p2  value
0     Jen   John    110
1    Mark  Jason    120
2  Carrie    Rob    140
Another approach, using df.to_records():
groups=meeting.groupby('table').agg(set)['person'].to_list()
df['value']=[row[-1]+100 if set(list(row)[1:3]) in groups else row[-1] for row in df.to_records()]
Output:
df
       p1     p2  value
0     Jen   John    110
1    Mark  Jason    120
2  Carrie    Rob    140
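A third possible variant (a sketch, not from the answers above) maps each person column to its table number with Series.map and adds 100 where the two match:

```python
import pandas as pd

df = pd.DataFrame({"p1": ["Jen", "Mark", "Carrie"],
                   "p2": ["John", "Jason", "Rob"],
                   "value": [10, 20, 40]})
meeting = pd.DataFrame({"person": ["Jen", "Mark", "Carrie", "John", "Jason", "Rob"],
                        "table": [1, 2, 3, 1, 2, 3]})

# Map each person column to its table number, then add 100 where they match
tables = meeting.set_index("person")["table"]
same_table = df["p1"].map(tables) == df["p2"].map(tables)
df.loc[same_table, "value"] += 100
print(df)
```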

Modify series from other series objects

So I have data like this:

Id  Title                 Fname   lname  email
1   meeting with Jay, Aj  Jay     kay    jk@something.com
1   meeting with Jay, Aj  Aj      xyz    aj@something.com
2   call with Steve       Steve   Jack   st@something.com
2   call with Steve       Harvey  Ray    h@something.com
3   lunch                 Mike    Mil    m@something.com
I want to remove the first name and last name for each unique Id from Title.
I tried grouping by Id, which gives Series objects for Title, Fname, lname, etc.:
df.groupby('Id')
I've concatenated Fname with .agg(lambda x: x.sum() if x.dtype == 'float64' else ','.join(x)) and kept it in a concated dataframe; likewise all the other columns get aggregated. The question is how to replace values in Title based on this aggregated series.
concated['newTitle'] = [ concated.Title.str.replace(e[0]).replace(e[1]).replace(e[1])
for e in
zip(concated.FName.str.split(','), concated.LName.str.split(','))
]
I want something like this, or some other way by which, for each Id, I could get newTitle with the values replaced.
output be like:
Id  Title
1   meeting with ,
2   call with
3   lunch
Create a mapper series by joining Fname and lname, and replace:
s = df.groupby('Id')[['Fname', 'lname']].apply(lambda x: '|'.join(x.stack()))
df.set_index('Id')['Title'].replace(s, '', regex = True).drop_duplicates()
Id
1 meeting with ,
2 call with
3 lunch
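As a spelled-out sketch of the same idea (assuming the sample data above; the mixed-case column names Fname/lname come from the question), you can build one regex alternation per Id and strip it from that Id's Title row by row:

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3],
    "Title": ["meeting with Jay, Aj", "meeting with Jay, Aj",
              "call with Steve", "call with Steve", "lunch"],
    "Fname": ["Jay", "Aj", "Steve", "Harvey", "Mike"],
    "lname": ["kay", "xyz", "Jack", "Ray", "Mil"],
})

# One alternation pattern per Id, built from that Id's first and last names
patterns = df.groupby("Id")[["Fname", "lname"]].apply(
    lambda g: "|".join(map(re.escape, pd.concat([g["Fname"], g["lname"]])))
)

# Strip that Id's names from its Title, row by row
new_title = df.apply(lambda r: re.sub(patterns[r["Id"]], "", r["Title"]), axis=1)
out = df.assign(Title=new_title)[["Id", "Title"]].drop_duplicates()
print(out)
```

Note this removes the names anywhere they occur in the title, including as substrings of longer words, so it is only a sketch.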

Dropping selected rows in Pandas with duplicated columns

Suppose I have a dataframe like this:
fname  lname  email
Joe    Aaron
Joe    Aaron  some@some.com
Bill   Smith
Bill   Smith
Bill   Smith  some2@some.com
Is there a terse and convenient way to drop rows where {fname, lname} is duplicated and email is blank?
You should first check whether your "empty" data is NaN or empty strings. If they are a mixture, you may need to modify the below logic.
If empty rows are NaN
Using pd.DataFrame.sort_values and pd.DataFrame.drop_duplicates:
df = df.sort_values('email')\
       .drop_duplicates(['fname', 'lname'])
If empty rows are strings
If your empty rows are strings, you need to specify ascending=False when sorting:
df = df.sort_values('email', ascending=False)\
       .drop_duplicates(['fname', 'lname'])
Result
print(df)
  fname  lname           email
4  Bill  Smith  some2@some.com
1   Joe  Aaron   some@some.com
You can use first with groupby (notice we replace empty strings with np.nan, since first returns the first non-null value for each column):
df.replace('',np.nan).groupby(['fname','lname']).first().reset_index()
Out[20]:
  fname  lname           email
0  Bill  Smith  some2@some.com
1   Joe  Aaron   some@some.com
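A runnable sketch of the NaN case (assuming the sample data above, with the blanks stored as NaN): NaNs sort last by default, so the row kept per name pair is the one with an email whenever one exists.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "fname": ["Joe", "Joe", "Bill", "Bill", "Bill"],
    "lname": ["Aaron", "Aaron", "Smith", "Smith", "Smith"],
    "email": [np.nan, "some@some.com", np.nan, np.nan, "some2@some.com"],
})

# NaNs sort to the bottom, so drop_duplicates (keep='first') keeps the
# row with a real email for each (fname, lname) pair
deduped = df.sort_values("email").drop_duplicates(["fname", "lname"])
print(deduped)
```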

Pandas Python writing to existing file and matching column values

I have two Excel sheets that I have loaded. I need to add information from one to the other. See the example below.
table 1:

cust_id  fname  lname  date_registered
1        bob    holly  1/1/80
2        terri  jones  2/3/90

table 2:

fname     lname   date_registered  cust_id  zip
lawrence  fisher  2/3/12           3        12345
So I need to add cust_id 3 from table 2 to table 1, along with all the other information: fname, lname, and date_registered. I don't need all the columns, though, such as zip.
I am thinking I can use pandas merge, but I am new to all this and not sure how it works. I need to populate the next row in table 1 with the corresponding row information from table 2. Any information would be helpful. Thanks!
With concat:
In [1]: import pandas as pd
In [2]: table_1 = pd.DataFrame({'cust_id':[1,2], 'fname':['bob', 'teri'], 'lname':['holly', 'jones'], 'date_registered':['1/1/80', '2/3/90']})
In [3]: table_2 = pd.DataFrame({'cust_id':[3], 'fname':['lawrence'], 'lname':['fisher'], 'date_registered':['2/3/12'], 'zip':[12345]})
In [4]: final_table = pd.concat([table_1, table_2])
In [5]: final_table
Out[5]:
   cust_id date_registered     fname   lname      zip
0        1          1/1/80       bob   holly      NaN
1        2          2/3/90      teri   jones      NaN
0        3          2/3/12  lawrence  fisher  12345.0
Use append:
appended = table1.append(table2[table1.columns])
or concat:
concated = pd.concat([table1, table2], join='inner')
Both result in:

  cust_id     fname   lname date_registered
0       1       bob   holly          1/1/80
1       2     terri   jones          2/3/90
0       3  lawrence  fisher          2/3/12
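Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current pandas only the concat form works; a sketch with the question's data (ignore_index=True added to avoid the duplicate 0 index):

```python
import pandas as pd

table1 = pd.DataFrame({"cust_id": [1, 2], "fname": ["bob", "terri"],
                       "lname": ["holly", "jones"],
                       "date_registered": ["1/1/80", "2/3/90"]})
table2 = pd.DataFrame({"fname": ["lawrence"], "lname": ["fisher"],
                       "date_registered": ["2/3/12"], "cust_id": [3],
                       "zip": [12345]})

# join='inner' keeps only the columns common to both tables (drops zip)
concated = pd.concat([table1, table2], join="inner", ignore_index=True)
print(concated)
```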
