Can I define multiple dataframes using pd.DataFrame()? - python

Is it possible to create multiple dataframes at once with the pandas library?
The following code is what I have tried, but it doesn't work:
df1, df2 = pd.DataFrame()

You could do something like this:
df1, df2 = (pd.DataFrame(),) * 2
Or, more explicitly:
df1, df2 = pd.DataFrame(), pd.DataFrame()
Or even:
df1 = df2 = pd.DataFrame()
Be aware that the first and third forms bind both names to the same DataFrame object, so a change made through one name is visible through the other; only the second form creates two independent dataframes. See this answer for a great explanation.
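A quick sketch of the aliasing pitfall, so you can see which forms share one object:

```python
import pandas as pd

# The tuple-multiplication form puts the SAME object into both names:
df1, df2 = (pd.DataFrame(),) * 2
print(df1 is df2)   # True
df1['a'] = [1, 2]
print('a' in df2.columns)  # True -- df2 changed too, they are one object

# Two constructor calls give two independent dataframes:
df1, df2 = pd.DataFrame(), pd.DataFrame()
print(df1 is df2)   # False
```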

Related

How can I add a column that has the same value

I was trying to add a new column to my dataset, but when I did, the column only had one index.
Is there a way to make one value appear at all indexes of a column?
import pandas as pd
df = pd.read_json('file_1.json', lines=True)
df2 = pd.read_json('file_2.json', lines=True)
df3 = pd.concat([df,df2])
df3 = df.loc[:, ['renderedContent']]
görüş_column = ['Milet İttifakı']
df3['Siyasi Yönelim'] = görüş_column
As per my understanding, this could be a possible solution.
You have these lines of code:
df3 = pd.concat([df,df2])
df3 = df.loc[:, ['renderedContent']]
You can modify the first one to
df3 = pd.concat([df, df2], axis=1)  # axis=1 appends the second dataframe as columns; the default axis=0 appends rows
Second, you probably meant to write
df3 = df3.loc[:, ['renderedContent']]
instead of df3 = df.loc[:, ['renderedContent']] (note df vs. df3).
Hope it solves your problem.
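The symptom itself (a single-element list only fills one row) can also be avoided by assigning a scalar, which pandas broadcasts to every row. A minimal sketch with made-up data:

```python
import pandas as pd

df3 = pd.DataFrame({'renderedContent': ['text a', 'text b', 'text c']})

# Assigning a scalar broadcasts the value to all rows,
# regardless of the dataframe's length:
df3['Siyasi Yönelim'] = 'Milet İttifakı'
print(df3['Siyasi Yönelim'].tolist())
# ['Milet İttifakı', 'Milet İttifakı', 'Milet İttifakı']
```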

What's the most efficient way to export multiple pandas dataframes to csv files?

I have multiple pandas dataframes:
df1
df2
df3
And I want to export them all to csv files.
df1.to_csv('df1.csv', index = False)
df2.to_csv('df2.csv', index = False)
df3.to_csv('df3.csv', index = False)
What's the most efficient way to do this?
def printToCSV(number):
    num = str(number)
    csvTemp = "df99.csv"
    csv = csvTemp.replace('99', num)
    dfTemp = "df99"
    dfString = dfTemp.replace('99', num)
    # I know I can't use the .to_csv function on a string,
    # but I don't know how to iterate through different dataframes
    dfString.to_csv(csv, index=False)

for i in range(1, 4):
    printToCSV(i)
How can I call different dataframes to export?
You can add them all to a list with:
list_of_dataframes = [df1, df2, df3]
Then you can iterate through the list using enumerate to keep a running count of which dataframe you are writing, using f-strings for the filenames:
for count, dataframe in enumerate(list_of_dataframes):
    dataframe.to_csv(f"dataframe_{count}.csv", index=False)
When you create the dataframes, you can store them in a suitable data structure such as a list; when you want to write the CSVs, you can iterate over that list:
dfs = []
for i in range(10):
    dfs.append(pd.DataFrame())

for i in range(10):
    dfs[i].to_csv(f'df{i}.csv', index=False)
If you want to stick with your function approach, use:
df_dict = {"df1": df1, "df2": df2, "df3": df3}

def printToCSV(num):
    df_dict[f"df{num}"].to_csv(f"df{num}.csv", index=False)

printToCSV(1)  # will create "df1.csv"
However, if you want to increase efficiency, I'd propose using a list of dataframes (as @vtasca proposed as well):
df_list = [df1, df2, df3]
for num, df in enumerate(df_list):  # enumerate yields a count (starting from 0) and the df on each iteration
    df.to_csv(f"df{num}.csv", index=False)
Or working with a dict:
df_dict = {"df1": df1, "df2": df2, "df3": df3}
for name, df in df_dict.items():
    df.to_csv(f"{name}.csv", index=False)

Joining/merging multiple dataframes

I have 4 dataframe objects with 1 row and X columns that I would want to join, here's a screenshot of them:
I would want them to become one big row.
Thanks to anybody who helps!
You could use pd.concat as below:
df1 = pd.DataFrame(columns=list('ABC'))
df1.loc[0] = [1,1.23,'Hello']
df2 = pd.DataFrame(columns=list('DEF'))
df2.loc[0] = [2,2.23,'Hello1']
df3 = pd.DataFrame(columns=list('GHI'))
df3.loc[0] = [3,3.23,'Hello3']
df4 = pd.DataFrame(columns=list('JKL'))
df4.loc[0] = [4,4.23,'Hello4']
pd.concat([df1,df2,df3,df4],axis=1)

Efficient method comparing 2 different tables columns

Hi all,
I have two dataframes and I need to check whether the values from the first match those in the second, comparing one specific column in each, and save the matching values in a new list. This is what I did, but it takes quite a lot of time, and I was wondering if there's a more efficient way. The lists are like in the image above, from two different tables.
for x in df_bd_names['Building_Name']:
    for y in df_sup['Source_String']:
        if x == y:
            matching_words_sup.append(x)
Thanks
Let's create both dataframes:
df1 = pd.DataFrame({
    'Building_Name': ['Exces', 'Excs', 'Exec', 'Executer', 'Executor']
})
df2 = pd.DataFrame({
    'Source_String': ['Executer', 'Executor', 'Executor Of', 'Executor For', 'Exeutor']
})
Perform an inner merge between the dataframes and convert the first column to a list:
pd.merge(df1, df2, left_on='Building_Name', right_on='Source_String', how='inner')['Building_Name'].tolist()
Output:
['Executer', 'Executor']
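The nested loop from the question can also be replaced with a vectorised `isin` test, which builds a boolean mask in one pass instead of O(n*m) Python iterations. A sketch using the same column names:

```python
import pandas as pd

df1 = pd.DataFrame({'Building_Name': ['Exces', 'Excs', 'Exec', 'Executer', 'Executor']})
df2 = pd.DataFrame({'Source_String': ['Executer', 'Executor', 'Executor Of', 'Executor For', 'Exeutor']})

# isin marks each Building_Name that appears anywhere in Source_String
mask = df1['Building_Name'].isin(df2['Source_String'])
matching_words_sup = df1.loc[mask, 'Building_Name'].tolist()
print(matching_words_sup)  # ['Executer', 'Executor']
```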
import numpy as np
import pandas as pd

class DFComparer:  # the original snippet omitted the class definition; this name is made up
    def __init__(self, df1, df2):
        self.df1 = df1
        self.df2 = df2

    def compareDFsEffectively(self):
        np1 = self.df1.to_numpy()
        np2 = self.df2.to_numpy()
        np_new = np.intersect1d(np1, np2)  # flattens both arrays; returns the sorted unique intersection
        print(np_new)
        df_new = pd.DataFrame(np_new)
        print(df_new)
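The same numpy idea also works as a standalone sketch without a class (hypothetical data; note that `np.intersect1d` flattens both arrays and returns sorted unique values, so it compares values across all columns, not just one):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Building_Name': ['Exces', 'Executer', 'Executor']})
df2 = pd.DataFrame({'Source_String': ['Executer', 'Executor', 'Exeutor']})

# intersect1d flattens both inputs and returns the sorted unique intersection
np_new = np.intersect1d(df1.to_numpy(), df2.to_numpy())
print(np_new)  # ['Executer' 'Executor']
```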

calculating length of several files in pandas using for loop

I have five data frames (df1, df2, df3, df4, df5), and I want to calculate their lengths using the following code:
df1 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_1.xlsx")
df2 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_2.xlsx")
df3 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_3.xlsx")
df4 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_4.xlsx")
df5 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_5.xlsx")
for i in [1,2,3,4,5]:
    print(len(dfi.index))
But it throws the following error:
"name 'dfi' is not defined"
I also tried this:
for i in [1,2,3,4,5]:
    print(len(df[i].index))
But that did not work.
This code works:
print(len(df1.index))
But I have to change name of the file each time.
What is problem and how can I solve it?
There are no dynamic variable names in Python - so dfi refers to a variable explicitly called dfi. It doesn't change to df1 just because i is 1 (or something else).
In your case you could simply iterate over a sequence of the dataframes:
df1 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_1.xlsx")
df2 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_2.xlsx")
df3 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_3.xlsx")
df4 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_4.xlsx")
df5 = pd.read_excel("/Users/us/Desktop/cymbalta_rated_5.xlsx")
for dfi in (df1, df2, df3, df4, df5):  # explicitly defines the variable "dfi"!
    print(len(dfi.index))
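Since the five files only differ by a number in the path, the reading step itself can also go in the loop, avoiding the five separate variables entirely (a sketch; the paths are the ones from the question, so the read calls are commented out here):

```python
# Build the five paths with an f-string instead of five variables
paths = [f"/Users/us/Desktop/cymbalta_rated_{i}.xlsx" for i in range(1, 6)]

# With the real files present, read and measure each in one loop:
# import pandas as pd
# for path in paths:
#     df = pd.read_excel(path)
#     print(path, len(df.index))
print(paths[0])  # /Users/us/Desktop/cymbalta_rated_1.xlsx
```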
