I have a database which contains two tables. I'd like to get all the data from both tables into a single dataframe. Both tables have a time column, on which I'd like to sort the data after combining.
df1 = pd.read_sql("SELECT * FROM table1", conn)
df2 = pd.read_sql("SELECT * FROM table2", conn)
What is the best way to combine df1 and df2 to a single dataframe ordered by time column?
Do you mean concat and sort_values?
print(pd.concat([df1, df2]).sort_values('time'))
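A fuller sketch, assuming the frames were read as above and that both really contain a column named time; ignore_index=True is an optional extra that gives the stacked frame a fresh index before sorting:
# stack both frames, then order the result by the shared time column
combined = pd.concat([df1, df2], ignore_index=True).sort_values('time')
print(combined)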
I have two pandas dataframes, call them A and B. In A is most of my data (it's 110 million rows long), while B contains information I'd like to add (it's a dataframe that lists all the identifiers and counties). In dataframe A, I have a column called identifier. In dataframe B, I have two columns, identifier and county. I want to be able to merge the dataframes such that a new dataframe is created where I preserve all of the information in A, while also adding a new column county where I use the information provided in B to do so.
You need to use pd.merge
import pandas as pd

data_A = {'incident_date': ['ert', 'szd', 'vfd', 'dvb', 'sfa', 'iop'],
          'incident': ['A', 'B', 'A', 'C', 'B', 'F']}
data_B = {'incident': ['A', 'B', 'C', 'D', 'E'],
          'number': [1, 1, 3, 23, 23]}

df_a = pd.DataFrame(data_A)
df_b = pd.DataFrame(data_B)
In order to preserve your df_a, which has millions of rows, use a left merge:
df_ans = df_a.merge(df_b[['number','incident']], on='incident',how='left')
The output:
print(df_ans)
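With the sample data above, it should look roughly like this (number becomes float because of the introduced NaN):
  incident_date incident  number
0           ert        A     1.0
1           szd        B     1.0
2           vfd        A     1.0
3           dvb        C     3.0
4           sfa        B     1.0
5           iop        F     NaN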
Note: the number for incident 'F' is NaN because that incident is not present in the second dataframe.
I have two pandas dataframes, df1 and df2, each of which is 5 columns and 100 rows. I concatenated both dataframes, and now it is 10x100. I insert this dataframe (df3) into a sqlite3 table.
df3.to_sql(name='table', con=conn)
What I want is to update the df1 part, keeping the values of the df2 part unchanged. Is there an easy way to do so?
For such a small table you can amend df1, re-concatenate it with df2, and re-save df3 to SQL using if_exists='replace':
df3.to_sql(name='table', if_exists='replace', con=conn)
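A minimal sketch of that roundtrip, assuming a sqlite3 connection, that df3 was built by concatenating along the columns (axis=1), and a hypothetical column 'a' in df1 standing in for whatever you change:
import sqlite3
import pandas as pd

conn = sqlite3.connect("mydata.db")  # assumed connection for illustration

# amend the df1 part however you need, e.g. update a column in place
df1['a'] = df1['a'] * 2

# rebuild the combined frame and overwrite the existing table
df3 = pd.concat([df1, df2], axis=1)
df3.to_sql(name='table', con=conn, if_exists='replace')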
I have a pivot table with a multi-index in the column names, like this:
I want to keep the same data (it is correct), but give each column a single name that summarizes all the index levels, to have something like this:
You can flatten a multi-index by converting it to a dataframe with text columns and joining them:
df.columns = df.columns.to_frame().astype(str).apply(''.join, axis=1)
The result should not be far from what you want. But as you have not given any reproducible example, I could not test against your data...
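For example, on a small made-up pivot table built here just to show the effect:
import pandas as pd

data = pd.DataFrame({'city': ['a', 'a', 'b', 'b'],
                     'year': [2020, 2021, 2020, 2021],
                     'sales': [1, 2, 3, 4],
                     'units': [10, 20, 30, 40]})
df = data.pivot_table(index='city', columns='year', values=['sales', 'units'])
print(df.columns)
# MultiIndex([('sales', 2020), ('sales', 2021), ('units', 2020), ('units', 2021)], ...)

# flatten: one text column per level, joined row-wise into a single name
df.columns = df.columns.to_frame().astype(str).apply(''.join, axis=1)
print(df.columns)
# Index(['sales2020', 'sales2021', 'units2020', 'units2021'], dtype='object')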
Having a pandas dataframe df with at least columns C1, C2, C3, how would you get all the unique C1, C2, C3 combinations as a new DataFrame?
In other words, similar to:
SELECT C1,C2,C3
FROM T
GROUP BY C1,C2,C3
I tried this:
print(df.groupby(by=['C1','C2','C3']))
but I'm getting:
<pandas.core.groupby.DataFrameGroupBy object at 0x000000000769A9E8>
I believe you need drop_duplicates if you want all unique triples:
df = df.drop_duplicates(subset=['C1','C2','C3'])
If you want to use groupby, add first:
df = df.groupby(by=['C1','C2','C3'], as_index=False).first()
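A quick check on a made-up frame with the column names from the question; both approaches return the same unique triples (row order may differ):
import pandas as pd

df = pd.DataFrame({'C1': [1, 1, 2, 1],
                   'C2': ['a', 'a', 'b', 'a'],
                   'C3': [True, True, False, True]})

# drop_duplicates keeps the first occurrence of each (C1, C2, C3) triple
print(df.drop_duplicates(subset=['C1', 'C2', 'C3']))

# groupby + first yields the same combinations, with the keys kept as columns
print(df.groupby(by=['C1', 'C2', 'C3'], as_index=False).first())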
I have two dataframes. Both have the same structure, with the same columns/column names.
A -> dataframe with (v, w, x, y, z) columns (some values)
B -> dataframe with (v, w, x, y, z) columns (all values)
I want to take values from dataframe A and insert them into dataframe B.
Suppose, when v=1, I need to fetch the rows from dataframe A where v==1 and insert them into dataframe B. Also, I want to insert them before the first row of dataframe B.
I tried the following:
b.insert(loc=1,values=A[A.v==1])
but I'm getting errors.
Can anybody help in doing this?
Thanks
Just concatenate?
import pandas as pd
b = pd.concat([A[A.v==1],b])
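If you also want a clean 0..n index after prepending the rows (not part of the original answer), ignore_index does that:
b = pd.concat([A[A.v == 1], b], ignore_index=True)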