I have two pandas data frames, df1 and df2, each with 5 columns and 100 rows. I concatenated both data frames along the columns, so the result is 100 rows x 10 columns. I then inserted this data frame (df3) into a sqlite3 table:
df3.to_sql(name='table', con=conn)
What I want is to update the df1 part of the data while keeping the values of the df2 part unchanged. Is there an easy way to do so?
For such a small table you can amend df1, re-concatenate it with df2, and resave df3 to SQL using if_exists='replace':
df3.to_sql(name='table', if_exists='replace', con=conn)
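A minimal sketch of that approach, using toy data in an in-memory sqlite3 database (the column names, shapes, and the edit to df1 here are all assumptions for illustration):

```python
import sqlite3

import numpy as np
import pandas as pd

conn = sqlite3.connect(":memory:")

# Two toy 100-row x 5-column frames standing in for df1 and df2.
df1 = pd.DataFrame(np.arange(500).reshape(100, 5), columns=list("abcde"))
df2 = pd.DataFrame(np.arange(500, 1000).reshape(100, 5), columns=list("fghij"))

# Concatenate side by side: 100 rows x 10 columns.
df3 = pd.concat([df1, df2], axis=1)
df3.to_sql(name="table", con=conn, index=False)

# Amend the df1 part, rebuild df3, and overwrite the table.
df1["a"] = df1["a"] * 2
df3 = pd.concat([df1, df2], axis=1)
df3.to_sql(name="table", con=conn, index=False, if_exists="replace")
```

Note that if_exists='replace' drops and recreates the whole table, which is fine at this size; for large tables a targeted SQL UPDATE would be cheaper.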
There is a problem with my pandas dataframe. df is my original dataframe, from which I select specific columns:
df1=df[['cod_far','geo_lat','geo_lon']]
Then I set new names for those columns:
df1.columns = ['cod_far', 'lat', 'lon']
And finally I group df1 by specific columns and convert the result to a new dataframe called "occur":
occur = df1.groupby(['cod_far','lat','lon' ]).size()
occur=pd.DataFrame(occur)
The problem is that I am getting a dataframe with only ONE column (labelled 0). The rows are fine, but there should be 3 columns! Is there any way to drop that "0" and convert my dataframe "occur" into a dataframe with the 3 grouping columns?
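For reference, the single column appears because groupby().size() returns a Series whose MultiIndex holds the grouping keys. A common fix is reset_index, which turns those keys back into ordinary columns. A small sketch with made-up data (the values are assumptions, the column names come from the question):

```python
import pandas as pd

# Toy data standing in for df1.
df1 = pd.DataFrame({
    "cod_far": [1, 1, 2],
    "lat": [10.0, 10.0, 20.0],
    "lon": [30.0, 30.0, 40.0],
})

# size() yields a Series indexed by the grouping keys; reset_index
# converts the keys into columns and names the counts column.
occur = df1.groupby(["cod_far", "lat", "lon"]).size().reset_index(name="count")
```

The result has the three grouping columns plus a "count" column instead of the anonymous 0.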
I have a database which contains 2 tables, and I'd like to get all the data from both tables into a single dataframe. Both tables have a time column on which I'd like to sort the data after combining.
df1 = pd.read_sql("SELECT * FROM table1", conn)
df2 = pd.read_sql("SELECT * FROM table2", conn)
What is the best way to combine df1 and df2 to a single dataframe ordered by time column?
Do you mean concat and sort_values?
print(pd.concat([df1, df2]).sort_values('time'))
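A self-contained sketch of that one-liner with made-up data (the schema is an assumption); passing ignore_index=True additionally gives the combined frame a clean 0..n-1 index:

```python
import pandas as pd

# Toy frames standing in for the two tables, each with a 'time' column.
df1 = pd.DataFrame({"time": [3, 1], "val": ["a", "b"]})
df2 = pd.DataFrame({"time": [2, 4], "val": ["c", "d"]})

# Stack the rows, then sort by time; ignore_index rebuilds the index.
combined = pd.concat([df1, df2]).sort_values("time", ignore_index=True)
```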
Importing a SQL table as a pandas dataframe and dropping all completely empty columns:
equip = %sql select * from [coswin].[dbo].[Work Order]
df = equip.DataFrame()
# drop columns where every value is null
df.dropna(axis=1, how="all", inplace=True)
The problem is that I am still seeing the null columns in the output, without any errors.
Are you sure the columns you want to remove are full of null values? You might check with df.isna().sum() if you haven't.
Also, you could use pd.read_sql() to read your data directly into a DataFrame.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
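A minimal sketch of the read_sql route, using an in-memory sqlite3 table as a stand-in for the SQL Server [Work Order] table in the question (table and column names here are placeholders):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_order (id INTEGER, note TEXT, empty_col TEXT)")
conn.execute("INSERT INTO work_order VALUES (1, 'x', NULL), (2, NULL, NULL)")

# Read the table straight into a DataFrame.
df = pd.read_sql("SELECT * FROM work_order", conn)

# Drop only the columns where every value is null.
df = df.dropna(axis=1, how="all")
```

Here empty_col is dropped (all NULL) while note survives because it has one non-null value; if your "empty" columns survive dropna, they likely contain empty strings rather than real nulls.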
I have two data frames loaded into Pandas. Each data frame holds property information indexed by a 'pin' unique to a particular parcel of land.
The first data frame (df1) represents historic sales data. Because properties can be sold multiple times, index values (the 'pin') repeat: each time a property was sold there is a row with the parcel's 'pin' as the index. If the property was sold once in the data set, its index/'pin' is unique; if it was sold 5 times, the index/'pin' occurs 5 times in the data set.
The second data frame (df2) is a property record. Again they are indexed by the unique parcel pin, but because this data frame is a record of each property, the value_counts() for each index value is 1 (i.e. index values do not repeat).
I would like to add data to df1 or create a new data frame which keeps all data from df1 intact, but adds values from df2 based upon matching index values.
For Example: df1 has columns ['SALE_YEAR', 'SALE_VALUE'] - where there can be multiple rows with the same index value. df2 has columns ['Address', 'SQFT'], where the index values are all unique within the data frame. I want to add 'Address' & 'SQFT' data points to df1 by matching the index values.
merge() and concat() do not seem to work. I believe this is because they are having a hard time matching df2 values to multiple df1 rows.
Thank you for the help.
Try this:
import pandas as pd
merged_df = pd.merge(left=df1, right=df2, on='PIN', how='left')
If that still isn't working, the PIN columns' datatypes may not match; you can cast them first:
df1['PIN'] = df1['PIN'].astype(int)
df2['PIN'] = df2['PIN'].astype(int)
merged_df = pd.merge(left=df1, right=df2, on='PIN', how='left')
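A self-contained sketch of that left merge with toy data mirroring the question: repeated PINs in the sales frame, unique PINs in the property record (values are made up):

```python
import pandas as pd

# Sales history: the same PIN can appear several times.
df1 = pd.DataFrame({
    "PIN": [100, 100, 200],
    "SALE_YEAR": [2001, 2005, 2003],
    "SALE_VALUE": [150000, 180000, 90000],
})

# Property record: one row per PIN.
df2 = pd.DataFrame({
    "PIN": [100, 200],
    "Address": ["1 Main St", "2 Oak Ave"],
    "SQFT": [1200, 900],
})

# A left merge keeps every sale row and copies the matching property
# data onto each one; one-to-many matching is exactly what merge does.
merged_df = pd.merge(left=df1, right=df2, on="PIN", how="left")
```

If the pins live in the index rather than a column in your real data, pass left_index=True / right_index=True instead of on='PIN'.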
I have a problem.
I have made 3 queries to 3 different tables in a database where the data is similar, and stored the results in 3 different dataframes.
My question is: is there any way to make a new data frame where each column is one of the dataframes?
Like this image
https://imgur.com/pATNi80
Thank you!
I do not know exactly what you need, but you can try this:
pd.DataFrame([d["col_name"] for d in df])
Here df is the dataframe shown in the image and col_name is the name of the column you want as a separate dataframe.
Thank you to jezrael for the answer.
df = pd.concat([df1, df2, df3], axis=1, keys=('df1','df2','df3'))
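A runnable sketch of that concat call with three toy frames (contents are assumptions); keys= turns the column index into a MultiIndex, so each source frame becomes its own labelled group of columns:

```python
import pandas as pd

# Three small frames standing in for the three query results.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})
df3 = pd.DataFrame({"a": [5, 6]})

# axis=1 places the frames side by side; keys labels each group,
# giving columns like ('df1', 'a'), ('df2', 'a'), ('df3', 'a').
df = pd.concat([df1, df2, df3], axis=1, keys=("df1", "df2", "df3"))
```

An individual source frame can then be recovered with df["df2"], for example.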