I have two tables in my database and would like to compare them and fill in the missing values in one table from the other. For example:
TABLE 1
column 1 column 2
ab 3
ab -
a 1
a -
b -
b 2
ab 3
ab -
a 1
a -
b 2
TABLE 2
column 1 column 2
ab 3
a 1
b 2
I want to compare both tables on column 1 and fill in only the missing values in column 2, without touching the values that are already there.
Is this possible in SQL or using pandas in Python? Any solution would be helpful.
## SQL query ##
This query fills the NULL values of column 2 in Table1 with the matching column 2 values from Table2 (the UPDATE ... FROM form shown here is SQL Server syntax; adjust the join syntax for your database):
UPDATE table1
SET table1.col2 = table2.col2
FROM table1
JOIN table2
  ON table1.col1 = table2.col1
WHERE table1.col2 IS NULL
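If pandas is an option, the same fill can be done without SQL by mapping column 1 to table 2's values and filling only the NULLs. A minimal sketch; the frame and column names are illustrative:

```python
import pandas as pd

# table1 has gaps in col2; table2 holds the complete lookup values
t1 = pd.DataFrame({"col1": ["ab", "ab", "a", "a", "b", "b"],
                   "col2": [3, None, 1, None, None, 2]})
t2 = pd.DataFrame({"col1": ["ab", "a", "b"], "col2": [3, 1, 2]})

# map col1 -> col2 from table2, then fill only the missing entries
lookup = t2.set_index("col1")["col2"]
t1["col2"] = t1["col2"].fillna(t1["col1"].map(lookup))
```

Because fillna only touches the missing entries, values that were already present are left exactly as they were.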
Context: I'd like to "bump" the index level of a dataframe up. In other words, I'd like to put the index name at the same level as the columns of a multi-indexed dataframe.
Let's say we have this dataframe:
tt = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]})
tt.index.name = 'Index Column'
And we perform this change to add a multi-index level (like a table label):
tt = pd.concat([tt],keys=['Multi-Index Table Label'], axis=1)
Which results in this:
Multi-Index Table Label
A B C
Index Column
0 1 4 7
1 2 5 8
2 3 6 9
Desired Output: How can I make it so that the dataframe looks like this instead (notice the removal of the empty level on the dataframe/table):
Multi-Index Table Label
Index Column A B C
0 1 4 7
1 2 5 8
2 3 6 9
Attempts: I was testing something out and you can essentially remove the index level by doing this:
tt.index.name = None
Which results in:
Multi-Index Table Label
A B C
0 1 4 7
1 2 5 8
2 3 6 9
That removes the extra level/empty line, but I do want to keep Index Column, since it describes the kind of data in the index (just 0, 1, 2 in this example, but it could be years, dates, etc.).
How could I do that?
Thank you all in advance :)
How about this:
tt = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]})
tt.insert(loc=0, column='Index Column', value=tt.index)
tt = pd.concat([tt],keys=['Multi-Index Table Label'], axis=1)
tt = tt.style.hide_index()  # deprecated in pandas 1.4+; use tt.style.hide(axis="index")
Note that .style returns a Styler for display purposes, not a regular DataFrame.
I've got one column with Primary ID numbers, and each Primary ID can have up to 3 Secondary ID numbers associated with it. I want to pivot the Secondary IDs so they appear in up to 3 columns to the right of a single instance of each Primary ID.
Currently it looks like this:
Primary ID   Secondary ID
1            234234
1            435234
1            22233
2            334342
2            543236
2            134623
3            8475623
3            3928484
4            3723429
5            3945857
5            11112233
5            9878976
I want it to look like this:
Primary ID   Secondary 1   Secondary 2   Secondary 3
1            234234        435234        22233
2            334342        543236        134623
3            8475623       3928484       -
4            3723429       -             -
5            3945857       11112233      9878976
I'm not sure how to get the column headers there, which is probably where my issues come from when I try to use pivot or pivot_table with pandas.
Here's one way:
df = (
    df.pivot_table(
        index='Primary ID',
        columns=df.groupby('Primary ID').cumcount().add(1),
        values='Secondary ID'
    ).add_prefix('Secondary').reset_index()
)
Alternative:
df = (df.assign(t=df.groupby('Primary ID').cumcount().add(1))
        .set_index(['Primary ID', 't'])
        .unstack(-1))
OUTPUT:
Primary ID Secondary1 Secondary2 Secondary3
0 1 234234.0 435234.0 22233.0
1 2 334342.0 543236.0 134623.0
2 3 8475623.0 3928484.0 NaN
3 4 3723429.0 NaN NaN
4 5 3945857.0 11112233.0 9878976.0
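Since the missing cells introduce NaN, the pivoted IDs come back as floats (as in the output above). If integer display matters, pandas' nullable Int64 dtype restores it; a small sketch on a subset of the data:

```python
import pandas as pd

df = pd.DataFrame({"Primary ID": [1, 1, 3],
                   "Secondary ID": [234234, 435234, 8475623]})

# same pivot as above: number the secondaries within each Primary ID
out = (df.pivot_table(index="Primary ID",
                      columns=df.groupby("Primary ID").cumcount().add(1),
                      values="Secondary ID")
         .add_prefix("Secondary").reset_index())

# cast the pivoted columns to nullable Int64 so the gaps survive as <NA>
secondary_cols = [c for c in out.columns if c != "Primary ID"]
out[secondary_cols] = out[secondary_cols].astype("Int64")
```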
I'm trying to filter a pandas pivot table, but I'm not sure of the correct syntax for filtering on the "group by" columns. I've been trying the standard df['column_name'], but I receive a KeyError.
Here's the code for the table
pivot = pd.pivot_table(q5, values='ENTRIES', index=['DATE', 'STATION', 'ID'], aggfunc='sum')
Here is what my pivot table looks like:
ENTRIES
DATE STATION ID
1/1/13 1 AVE 1 12
2 60
3 0
4 111
5 123
...
The desired result is to return Dates and Stations where at least one ID had < 10 Entries, but not all IDs had < 10 Entries.
Thanks
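The thread shows no answer for this one; one way to express that condition (a sketch, with made-up sample data, not from the original post) is groupby plus filter on the raw rows rather than the pivot:

```python
import pandas as pd

q5 = pd.DataFrame({
    "DATE": ["1/1/13"] * 5 + ["1/2/13"] * 2,
    "STATION": ["1 AVE"] * 5 + ["2 AVE"] * 2,
    "ID": [1, 2, 3, 4, 5, 1, 2],
    "ENTRIES": [12, 60, 0, 111, 123, 3, 4],
})

# keep (DATE, STATION) groups where some, but not all, IDs are under 10
wanted = (q5.groupby(["DATE", "STATION"])
            .filter(lambda g: (g["ENTRIES"] < 10).any()
                              and not (g["ENTRIES"] < 10).all()))
```

In this sample, (1/1/13, 1 AVE) survives because only ID 3 is under 10, while (1/2/13, 2 AVE) is dropped because every ID there is under 10.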
Suppose I have a postgresql table like this:
CustomerID | Pincode
and another table like this
Pincode | City
I want to add a new column named City to the first table, mapping each Pincode to its City from the second table.
Final result
CustomerID | Pincode | City
Please note that the second table contains a unique pincode-to-city mapping, while the first table can repeat pincodes across different customer IDs (CustomerID is unique).
How to do it?
If can't be done in the database, then I am ok with python solution as well.
Let's call those tables A and B. SQL query:
SELECT A.CustomerID, A.Pincode, B.City
FROM A
JOIN B ON A.Pincode = B.Pincode;
If the data in A is:
CustomerID  Pincode
1           1
2           2
3           1
5           2
and in B:
Pincode  City
1        1
2        2
the result will be:
CustomerID  Pincode  City
1           1        1
2           2        2
3           1        1
5           2        2
alter table first_table add column City text;

update first_table as ft
set City = at.City
from another_table as at
where ft.Pincode = at.Pincode;
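Since the asker is also open to a Python solution, the same mapping is a single left merge in pandas. A sketch; the city values here are made up for illustration:

```python
import pandas as pd

customers = pd.DataFrame({"CustomerID": [1, 2, 3, 5],
                          "Pincode": [1, 2, 1, 2]})
cities = pd.DataFrame({"Pincode": [1, 2],
                       "City": ["Oslo", "Bergen"]})

# left merge keeps every customer row and attaches the matching City
result = customers.merge(cities, on="Pincode", how="left")
```

how='left' guarantees no customer rows are dropped; a pincode with no match in the city table simply gets NaN in City.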
I have a dataframe something like below:
carrier_plan_identifier ... hios_issuer_identifier
1 AUSK ... 99806.0
2 AUSM ... 99806.0
3 AUSN ... 99806.0
4 AUSS ... 99806.0
5 AUST ... 99806.0
I need to pick a particular column, let's say wellthie_issuer_identifier.
I need to query the database based on this column's values. My select query will look something like:
select id, wellthie_issuer_identifier from issuers where wellthie_issuer_identifier in(....)
I need to add the id column back to my existing dataframe, matched on wellthie_issuer_identifier.
I have searched a lot but am not clear on how this can be done.
Try this:
1.) Pick a particular column, let's say wellthie_issuer_identifier:
t = tuple(df.wellthie_issuer_identifier)
This will give you a tuple like ('AUSK', 'AUSM', ...).
2.) Query the database based on this column's values.
You need to substitute the above tuple into your query:
query = """select id, wellthie_issuer_identifier from issuers
where wellthie_issuer_identifier in{} """
Create a cursor for the database, execute the query, and build a dataframe from the result:
cur.execute(query.format(t))
df_new = pd.DataFrame(cur.fetchall())
df_new.columns = ['id','wellthie_issuer_identifier']
Now your df_new will have the columns id and wellthie_issuer_identifier. You need to add this id column back to the existing df.
Do this:
df = pd.merge(df,df_new, on='wellthie_issuer_identifier',how='left')
It will add an id column to df with values where a match is found on wellthie_issuer_identifier, and NaN otherwise.
Let me know if this helps.
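One caveat with query.format(t): a one-element tuple renders as ('AUSK',) with a trailing comma, which breaks the SQL, and string formatting invites injection. A placeholder-based variant is safer; sketched here with sqlite3 and illustrative data, since the placeholder style depends on your driver:

```python
import sqlite3
import pandas as pd

# illustrative in-memory table standing in for the real issuers table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issuers (id INTEGER, wellthie_issuer_identifier TEXT)")
conn.executemany("INSERT INTO issuers VALUES (?, ?)",
                 [(10, "AUSK"), (11, "AUSM")])

wanted = ["AUSK", "AUSM", "AUSN"]
# build one "?" placeholder per value instead of formatting the tuple in
placeholders = ",".join("?" * len(wanted))
query = ("SELECT id, wellthie_issuer_identifier FROM issuers "
         f"WHERE wellthie_issuer_identifier IN ({placeholders})")
df_new = pd.read_sql_query(query, conn, params=wanted)
```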
You can add another column to a dataframe using pandas if the column is not too long. For example:
import pandas as pd
df = pd.read_csv('just.csv')
df
id user_id name
0 1 1 tolu
1 2 5 jb
2 3 6 jbu
3 4 7 jab
4 5 9 jbb
#to add new column to the data above
df['new_column'] = ['jdb', 'biwe', 'iuwfb', 'ibeu', 'igu']  # new values
df
id user_id name new_column
0 1 1 tolu jdb
1 2 5 jb biwe
2 3 6 jbu iuwfb
3 4 7 jab ibeu
4 5 9 jbb igu
# this should help if the dataset is not too large
Then you can go on querying your database.
This won't fetch the real ids from the database, but if, as you said, any consistent value per wellthie_issuer_identifier will do, the below should work for you:
df1 = df.assign(id=(df['wellthie_issuer_identifier']).astype('category').cat.codes)
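Note that cat.codes yields arbitrary local integer codes (assigned per sorted category starting at 0), not the database ids. A quick sketch of what it produces:

```python
import pandas as pd

df = pd.DataFrame({"wellthie_issuer_identifier": ["AUSK", "AUSM", "AUSK"]})

# identical identifiers get identical codes; AUSK sorts first, so it gets 0
df1 = df.assign(id=df["wellthie_issuer_identifier"].astype("category").cat.codes)
```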