How to insert column values based on another column - python

Suppose I have a postgresql table like this:
CustomerID | Pincode
and another table like this
Pincode | City
I want to create a new column named City in the first table, populated with the city mapped from the second table.
Final result
CustomerID | Pincode | City
Please note that the second table contains a unique pincode-to-city mapping, while the first table can have many rows with the same pincode under different customer ids (customer id is unique).
How can I do it?
If it can't be done in the database, I am OK with a Python solution as well.

Let's call those tables A and B. SQL query:
SELECT A.CustomerID, A.Pincode, B.City
FROM A, B
WHERE A.Pincode = B.Pincode;
If the data in A (CustomerID, Pincode) is:
1 1
2 2
3 1
5 2
and in B (Pincode, City):
1 1
2 2
the result will be:
1 1 1
2 2 2
3 1 1
5 2 2

Add the column first, then fill it with an UPDATE ... FROM join (an INSERT here would append duplicate rows instead of filling the new column):
alter table first_table add column City text;

update first_table as ft
set City = at.City
from another_table as at
where ft.Pincode = at.Pincode;
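If you'd rather do the mapping in Python, here is a minimal pandas sketch, assuming the two tables have already been read into DataFrames (the frame names and city values below are made up for illustration):

import pandas as pd

# Stand-ins for the two database tables.
customers = pd.DataFrame({'CustomerID': [1, 2, 3, 5],
                          'Pincode': [1, 2, 1, 2]})
pincodes = pd.DataFrame({'Pincode': [1, 2],
                         'City': ['CityA', 'CityB']})

# A left join on Pincode adds the City column to every customer row,
# keeping customers whose pincode has no mapping (City becomes NaN).
result = customers.merge(pincodes, on='Pincode', how='left')
print(result)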

Related

How to get distinct values from many columns of table using sqlAlchemy ORM?

I have created a database with these tables in it
Employee(id, name, deptid, salary)
Department(id, name)
with these values in it
Employee
id | name   | deptid | salary
---+--------+--------+-------
1  | Alex   | 1      | 600000
2  | Larry  | 1      | 700000
3  | Jesse  | 3      | 400000
4  | Alex   | 2      | 500000
5  | Marcus | 3      | 400000
Department
id | name
---+------------
1  | Engineering
2  | Finance
3  | Sales
I want to fetch the distinct values from every column of the database
I want the output to look something like this
name
------
Alex
Larry
Jesse
Marcus
deptid
-------
1
3
2
I have written an ORM query like this:
session.query(Employee.name, Employee.deptid).distinct().all()
This returns:
[('Alex',1,), ('Larry',1), ('Jesse', 3),('Alex',2),('Marcus',3)]
This returns the distinct values of both columns combined, which is not what I am looking for; I want the distinct values of each column separately, in one query if possible.
SELECT DISTINCT is applied to the full rows that are being selected.
You will therefore need two SELECT DISTINCT queries to find the distinct values in two different columns.
session.query(Employee.name).distinct().all()
session.query(Employee.deptid).distinct().all()
You should also use Department.id directly to find its distinct values, rather than using the foreign key Employee.deptid, unless you only want to find the referenced ones.
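If you want to collect the per-column results in one pass, here is a minimal sketch, assuming a configured session and the Employee model from the question:

# Run one SELECT DISTINCT per column and collect the results.
distinct_values = {}
for column in (Employee.name, Employee.deptid):
    rows = session.query(column).distinct().all()
    # Each row comes back as a one-element tuple; unpack it.
    distinct_values[column.key] = [value for (value,) in rows]

# e.g. {'name': ['Alex', 'Larry', 'Jesse', 'Marcus'], 'deptid': [1, 3, 2]}
print(distinct_values)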

How can I store rows that repeat on a certain value in pandas

I have a dataset with about 3m rows, and I want to know which ids have more than one unique value for a column, let's call it company_id. (I don't want to remove them, I need to identify these rows for analysis.)
Table
id | company_id
---+-----------
1  | 10
2  | 11
1  | 13
2  | 11
3  | 31
3  | 31
3  | 33
In this example it would store the ids 1 and 3, because they have two different unique values for company_id. It wouldn't store the id 2, because it has only one unique value for company_id (11).
I don't want to know how many rows each company_id has, I just need the ids or their index. Thanks in advance.
Group the dataframe by id, then calculate the nunique aggregate of company_id for each group:
>>> df.groupby('id')['company_id'].agg('nunique')
id
1 2
2 1
3 2
Name: company_id, dtype: int64
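To then keep only the ids with more than one unique company_id (1 and 3 in the example), filter that series; a short follow-up sketch:

counts = df.groupby('id')['company_id'].agg('nunique')
# Keep only the ids whose group has more than one unique company_id.
ids = counts[counts > 1].index.tolist()
print(ids)  # [1, 3]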

How can I pivot one column of unique IDs to show all matching secondary IDs in adjacent columns?

I've got one column with Primary ID numbers, and each of these Primary ID numbers can have up to 3 Secondary ID numbers associated with it. I want to pivot the secondary IDs so they all appear in up to 3 columns to the right of just one instance of each Primary ID.
Currently it looks like this:
Primary ID | Secondary ID
-----------+-------------
1          | 234234
1          | 435234
1          | 22233
2          | 334342
2          | 543236
2          | 134623
3          | 8475623
3          | 3928484
4          | 3723429
5          | 3945857
5          | 11112233
5          | 9878976
I want it to look like this:
Primary ID | Secondary 1 | Secondary 2 | Secondary 3
-----------+-------------+-------------+------------
1          | 234234      | 435234      | 22233
2          | 334342      | 543236      | 134623
3          | 8475623     | 3928484     | -
4          | 3723429     | -           | -
5          | 3945857     | 11112233    | 9878976
I'm not sure how to get the column headers there, which is probably where my issues come from when I try to use pivot or pivot_table with pandas.
Here's one way:
df = (
    df.pivot_table(
        index='Primary ID',
        columns=df.groupby('Primary ID').cumcount().add(1),
        values='Secondary ID'
    ).add_prefix('Secondary').reset_index()
)
Alternative:
df = (
    df.assign(t=df.groupby('Primary ID').cumcount().add(1))
      .set_index(['Primary ID', 't'])
      .unstack(-1)
)
OUTPUT:
   Primary ID  Secondary1   Secondary2  Secondary3
0           1    234234.0     435234.0     22233.0
1           2    334342.0     543236.0    134623.0
2           3   8475623.0    3928484.0         NaN
3           4   3723429.0          NaN         NaN
4           5   3945857.0  11112233.0   9878976.0
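To match the desired layout exactly (whole numbers, with '-' for the gaps), one extra step; a minimal sketch assuming the pivoted df from the first variant:

out = df.copy()
secondary_cols = [c for c in out.columns if c != 'Primary ID']
# NaN forces the secondary columns to float; go through nullable Int64 to
# drop the '.0', then to plain objects so the gaps can be shown as '-'.
out[secondary_cols] = out[secondary_cols].astype('Int64').astype(object)
print(out.fillna('-'))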

Replacing the missing values while comparing 2 tables

I have 2 tables in my database and would like to compare them and fill in the missing values of one table from the other. For example:
TABLE 1
column 1 | column 2
---------+---------
ab       | 3
ab       | -
a        | 1
a        | -
b        | -
b        | 2
ab       | 3
ab       | -
a        | 1
a        | -
b        | 2
TABLE 2
column 1 | column 2
---------+---------
ab       | 3
a        | 1
b        | 2
I want to compare both the tables on column 1 and replace only the missing values on column 2 and not touch the values that are already there.
Is this possible in SQL or using pandas in Python? Any solution would be helpful.
SQL query:
This query (SQL Server-style UPDATE ... FROM) fills the NULL column2 values of Table1 with the matching column2 values from Table2.
UPDATE table1
SET table1.Col2 = table2.Col2
FROM table1
JOIN table2
  ON table1.col1 = table2.col1
WHERE table1.col2 IS NULL;
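For the pandas side of the question, a minimal sketch, assuming the two tables have been loaded into DataFrames t1 and t2 (hypothetical names) with the missing entries stored as NaN:

import numpy as np
import pandas as pd

# Frames mirroring the example tables.
t1 = pd.DataFrame({'column 1': ['ab', 'ab', 'a', 'a', 'b', 'b'],
                   'column 2': [3, np.nan, 1, np.nan, np.nan, 2]})
t2 = pd.DataFrame({'column 1': ['ab', 'a', 'b'],
                   'column 2': [3, 1, 2]})

# Build a column 1 -> column 2 lookup from TABLE 2, then fill only the
# missing values in TABLE 1; existing values are left untouched.
lookup = t2.set_index('column 1')['column 2']
t1['column 2'] = t1['column 2'].fillna(t1['column 1'].map(lookup))
print(t1)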

Adding a new row to a dataframe with correct mapping in pandas

I have a dataframe something like below:
carrier_plan_identifier ... hios_issuer_identifier
1 AUSK ... 99806.0
2 AUSM ... 99806.0
3 AUSN ... 99806.0
4 AUSS ... 99806.0
5 AUST ... 99806.0
I need to pick a particular column, let's say wellthie_issuer_identifier.
I need to query the database based on this column's values. My select query will look something like:
select id, wellthie_issuer_identifier from issuers where wellthie_issuer_identifier in(....)
I need to add the id column back to my existing dataframe, matched on wellthie_issuer_identifier.
I have searched a lot but am not clear on how this can be done.
Try this:
1.) pick a particular column, let's say wellthie_issuer_identifier
t = tuple(df.wellthie_issuer_identifier)
This will give you a tuple like (1,0,1,1)
2.) query the database based on this column value
You need to substitute the above tuple in your query:
query = """select id, wellthie_issuer_identifier from issuers
           where wellthie_issuer_identifier in {}"""
Create a cursor to the database, execute this query, and create a DataFrame from the result:
cur.execute(query.format(t))
df_new = pd.DataFrame(cur.fetchall())
df_new.columns = ['id','wellthie_issuer_identifier']
Now your df_new will have the columns id and wellthie_issuer_identifier. You need to add this id column back to the existing df.
Do this:
df = pd.merge(df,df_new, on='wellthie_issuer_identifier',how='left')
It will add an id column to df which will have values if a match is found on wellthie_issuer_identifier, otherwise it will put NaN.
Let me know if this helps.
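A side note on the .format substitution above: interpolating values into SQL strings is vulnerable to SQL injection, and most DB-API drivers can do the substitution safely instead. A hedged sketch, assuming a psycopg2 cursor (placeholder styles differ between drivers):

# psycopg2 adapts a Python tuple to a parenthesized SQL list, so the
# IN clause can be parameterized directly.
query = """select id, wellthie_issuer_identifier from issuers
           where wellthie_issuer_identifier in %s"""
cur.execute(query, (t,))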
You can also add a column to a dataframe by assigning a list of values directly, as long as the list has one value per row. For example:
import pandas as pd
df = pd.read_csv('just.csv')
df
id user_id name
0 1 1 tolu
1 2 5 jb
2 3 6 jbu
3 4 7 jab
4 5 9 jbb
# to add a new column to the data above
df['new_column'] = ['jdb', 'biwe', 'iuwfb', 'ibeu', 'igu']  # new values
df
id user_id name new_column
0 1 1 tolu jdb
1 2 5 jb biwe
2 3 6 jbu iuwfb
3 4 7 jab ibeu
4 5 9 jbb igu
# this should help if the dataset is not too large
Then you can go on querying your database.
This will not fetch the ids from the database, but since, as you said, the dataframe already has all the wellthie_issuer_identifier values, generating a categorical code per unique value should work for you:
df1 = df.assign(id=df['wellthie_issuer_identifier'].astype('category').cat.codes)
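For illustration, a small hedged example of what cat.codes produces (the identifier values are made up; codes follow the sorted order of the unique values):

import pandas as pd

df = pd.DataFrame({'wellthie_issuer_identifier': [99806.0, 99806.0, 12345.0]})
df1 = df.assign(id=df['wellthie_issuer_identifier'].astype('category').cat.codes)
print(df1)
#    wellthie_issuer_identifier  id
# 0                     99806.0   1
# 1                     99806.0   1
# 2                     12345.0   0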
