I have hundreds, possibly thousands, of small Excel files that I want to read into one pandas DataFrame.
Before I merge them, I need to flag each one with the category it comes from.
Here's my reference table, df:
    Dataframe_name    Path                                  Sheet
45  finance_auditing  Finance - Accounting/TopSites-Fin...  Aggregated_Data_for_Time_Period
46  finance_lending   Finance - Banking/TopSites-...        Aggregated_Data_for_Time_Period
What I did: I filled in the Dataframe_name column manually for each file, but I would like to take it from the reference table instead.
finance_auditing = pd.read_excel('Finance - Accounting/TopSites-Fin... ','Aggregated_Data_for_Time_Period')
finance_lending = pd.read_excel('Finance - Banking/TopSites-... ','Aggregated_Data_for_Time_Period')
finance_auditing['Dataframe_name'] = 'finance_auditing'
finance_lending['Dataframe_name'] = 'finance_lending'
dF_all = pd.concat([pd.read_excel(path, sheet_name=sheet)
                    for path, sheet in zip(df.Path, df.Sheet)])
The problem is that I have hundreds of files to read and need to append them all.
This is fairly simple: you can assign the flag dynamically in each iteration:
pd.concat([pd.read_excel(path, sheet_name=sheet).assign(df_name=name)
for name, path, sheet in df.to_numpy()])
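A minimal sketch of the same idea that picks the three reference columns by name, so it keeps working if df has extra columns or a different column order. The function name concat_with_flag and the reader parameter are hypothetical, added only so the file reader can be swapped out; the flag column is called Dataframe_name here to match the question.

```python
import pandas as pd


def concat_with_flag(ref, reader=pd.read_excel):
    """Read every file listed in `ref` and tag its rows with Dataframe_name.

    `ref` is assumed to have the columns Dataframe_name, Path and Sheet,
    as in the reference table from the question.
    """
    frames = [
        # assign() adds the flag column to each frame before concatenation
        reader(path, sheet_name=sheet).assign(Dataframe_name=name)
        for name, path, sheet in zip(ref["Dataframe_name"], ref["Path"], ref["Sheet"])
    ]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Calling concat_with_flag(df) on the reference table should return one DataFrame in which every row carries the name of the file it came from.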
I have a CSV file with over 5,000,000 rows of data that looks like this (except that it is in Farsi):
Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766140,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,1050000,7266.44,5,Concrete,13890108,5166884645
766146,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,700000,4844.29,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645
770822,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,50,500000,1730.1,5,Concrete,13890114,5166884645
I would like to write code that treats the first row as the header, extracts the rows for two specific cities (Kish and Qeshm), and saves them to a new CSV file. Something like this:
Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645
It's worth mentioning that I'm very new to Python.
I've written the following block to define the headers, but this is the furthest I've gotten so far.
import pandas as pd
path = '/Users/Desktop/sample.csv'
df = pd.read_csv(path , header=[0])
df.head()
You don't need to use header=... because the default is to treat the first row as the header, so
df = pd.read_csv(path)
Then, to keep only the rows where City is Kish or Qeshm:
df2 = df[df['City'].isin(['Kish', 'Qeshm'])]
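And you can save it with (index=False keeps the row index out of the file, so the output matches your example):
df2.to_csv(another_path, index=False)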
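With over 5,000,000 rows the file may not fit comfortably in memory. A minimal sketch, assuming that is a concern, reads the CSV in chunks and appends the matching rows to the output as it goes. The function name filter_cities and the chunksize value are assumptions for illustration, not part of the original answer.

```python
import pandas as pd


def filter_cities(src, dst, cities=("Kish", "Qeshm"), chunksize=500_000):
    """Copy only the rows whose City is in `cities` from src to dst."""
    first = True
    # chunksize makes read_csv yield DataFrames of at most `chunksize` rows
    for chunk in pd.read_csv(src, chunksize=chunksize):
        subset = chunk[chunk["City"].isin(list(cities))]
        # Write the header once, then append the later chunks below it
        subset.to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False
```

For example, filter_cities('/Users/Desktop/sample.csv', another_path) would produce the filtered file without holding the whole input in memory at once.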
In Python, how can I turn the column names (task, asset, name, owner) into rows and store them in a new .csv file?
Data Set (sample_change.csv) :
task asset name owner
JJJ01 61869 assetdev hoskot,john (100000)
JJJ02 87390 assetprod hope, ricky (100235)
JJJ10 28403 assetprod shaw, adam (199345)
Below is the code I started to write, but I couldn't think of an approach.
import pandas as pd
import csv
#reading csv file and making the data frame
dataframe = pd.read_csv(r"C:\AWSGEEKS\dataset\sample_change.csv")
columns = list(dataframe.head(0))
print(columns)
Output :
columns
task
asset
name
owner
To write as a single row (index=False avoids an empty leading index column):
pd.DataFrame(columns=dataframe.columns).to_csv('header.csv', index=False)
To write as a single column:
pd.DataFrame(dataframe.columns).to_csv('header.csv', index=False, header=['Name'])
df = pd.DataFrame(dataframe.columns, columns=['column names'])
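To see what the two variants produce without touching the disk, here is a small sketch that writes both into in-memory buffers. The empty stand-in dataframe and the buffer names are assumptions for illustration; the column header "columns" is chosen to match the output shown in the question.

```python
import io

import pandas as pd

# Stand-in for the frame read from sample_change.csv (only the header matters here)
dataframe = pd.DataFrame(columns=["task", "asset", "name", "owner"])

# Variant 1: the column names as a single header row
row_buf = io.StringIO()
pd.DataFrame(columns=dataframe.columns).to_csv(row_buf, index=False)

# Variant 2: the column names as one column titled "columns"
col_buf = io.StringIO()
pd.DataFrame({"columns": dataframe.columns}).to_csv(col_buf, index=False)

print(row_buf.getvalue().splitlines())
print(col_buf.getvalue().splitlines())
```

Replacing a buffer with a file path such as 'header.csv' writes the same text to disk.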
I need to perform some data analysis on some Excel files, each named after the account number of the respective customer. I also have a master sheet with all the account numbers in a column. I need to iterate over the "Account Number" column in MasterSheet.xlsx and read the respective account's Excel file (e.g. for account number 123 there is a "123.xlsx" located in the same directory as the master sheet). Then I need to assign the corresponding account number as the variable name of each DataFrame.
For a rough understanding of what I want to do, please refer to the code below. I'd prefer to use pandas or openpyxl.
master = pd.read_excel("MasterSheet.xlsx")
for account in master.iterrows():
    filename= str(account)+'.xlsx'
    account= pd.read_excel(filename)
You can see, I have tried to create a filename from each account number read via for loop. And then assign the account number as the variable name for each account's dataframe.
I know this is a very badly framed question, but I tried and couldn't frame it any better. I have just started using Python. If you need any further information, please ask.
I also have a master sheet with all the account numbers in a column. I
need python to iterate over "Account Number" column in
"MasterSheet.xlsx" and read the respective account's excel file (for
eg: for account number 123, i have a "123.xlsx", that is in the same
location as Master Sheet) and then assign that account number as their
variable name.
Since your account_number is saved in the df['Account Number'] column and the files are named as account_number.xlsx, you can just do the following:
import pandas as pd
master = pd.read_excel("MasterSheet.xlsx")
for account in master["Account Number"]:
    filename = str(account) + ".xlsx"
    account = pd.read_excel(filename)
#Import
import pandas as pd
#Read Master file
master = pd.read_excel("MasterSheet.xlsx")
#Make a dictionary -
# - Keys will be each account number
# - Values will be the dataframes read from the matching account-number xlsx files
dictionary1 = {}
for index, row in master.iterrows():
    dictionary1[row['Account Number']] = pd.read_excel(str(row['Account Number']) + '.xlsx')
#Get the first dataframe; loop over dictionary1.items() to visit every account
next(iter(dictionary1.values()))
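A minimal sketch of the same dictionary approach that iterates over the column directly and skips account numbers that have no matching file. The function name load_accounts and the folder and reader parameters are hypothetical, introduced only for illustration and so the reader can be swapped out.

```python
from pathlib import Path

import pandas as pd


def load_accounts(master, folder=".", reader=pd.read_excel):
    """Return {account number: DataFrame} for every account file that exists."""
    frames = {}
    for account in master["Account Number"]:
        path = Path(folder) / f"{account}.xlsx"
        # Skip accounts without a file instead of raising FileNotFoundError
        if path.exists():
            frames[account] = reader(path)
    return frames
```

With master = pd.read_excel("MasterSheet.xlsx"), calling load_accounts(master) gives a dictionary where frames[123] is the data from 123.xlsx, which takes the place of creating one variable per account.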
One way to do this:
import pandas as pd
master = pd.read_excel("MasterSheet.xlsx")
master['Account File'] = master['Account Number'].apply(lambda x: pd.read_excel(str(x)+'.xlsx'))
Now all your account numbers, account files and other data are in the same data-structure for easy re-use.
Let me know if this helps!
For excel file test.xlsx:
  account
0   test1
1   test2
Loop over account column and load new file into new df2:
import pandas as pd
df = pd.read_excel("test.xlsx")
for index, row in df.iterrows():
    df2 = pd.read_excel(row['account'] + '.xlsx')
    print(df2)
Output:
   data
0     1
1     2
   data
0     3
1     4
I have a .csv called cleaned_data.csv formatted like so:
Date,State,Median Listing Price
1/31/2010,Alabama,169900
2/28/2010,Alabama,169900
3/31/2010,Alabama,169500
1/31/2010,Alaska,239900
2/28/2010,Alaska,241250
3/31/2010,Alaska,248000
I would like to create a new .csv file for each state, named {state}.csv, that has the Date and Median Listing Price.
So far I have this:
import pandas
csv = pandas.read_csv('cleaned_data.csv', sep='\s*,\s*', header=0, encoding='utf-8-sig')
state_list = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', ...]
for state in state_list:
    csv = csv[csv['State'] == f'{state}']
    csv.to_csv(f'state_csvs/{state}.csv', index=False, sep=',')
This successfully creates 51 .csv files named after each state, but only the Alabama.csv has Date, State, and Median Listing Price data for Alabama. Every other .csv only has the following headers with no data:
Date,State,Median Listing Price
Can someone explain to me why this is happening and how to fix it or a better way to do it?
Bonus points: I don't actually need the "State" column in the new .csv files but I'm unsure how to only add Date and Median Listing Price.
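Your loop overwrites csv with the Alabama-only subset on the first pass, so every later filter runs against data that no longer contains any other state. Filter without reassigning the full DataFrame, and select only the two columns you need (df here is the DataFrame read from cleaned_data.csv):
for i in df['State'].unique():
    df.loc[df['State'] == i, ['Date', 'Median Listing Price']].to_csv(f'state_csvs/{i}.csv', index=False)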
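An alternative sketch using groupby, which splits the frame by state in one pass and never reassigns the original DataFrame. The function name split_by_state and the out_dir parameter are assumptions for illustration; the column names come from the question.

```python
from pathlib import Path

import pandas as pd


def split_by_state(df, out_dir="state_csvs"):
    """Write one {state}.csv per state with only Date and Median Listing Price."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # groupby yields (state name, rows for that state) pairs
    for state, group in df.groupby("State"):
        group[["Date", "Median Listing Price"]].to_csv(out / f"{state}.csv", index=False)
```

Because each group is a separate slice, the full data stays intact for every state, and the State column is dropped simply by not selecting it.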