Let's suppose I have a CSV file which looks like this:
Date,High,Low,Open,Close,Volume,Adj Close
1980-12-12,0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907
1980-12-15,0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382
1980-12-16,0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533
I also have a pandas DataFrame that contains exactly the same values plus some new entries. My goal is to append only the new entries to the CSV file.
I tried the following, but unfortunately it appends the old entries as well as the new ones:
df.to_csv('{}/{}'.format(FOLDER, 'AAPL.CSV'), mode='a', header=False)
You can re-read your CSV file and drop any duplicates before appending the newly fetched data.
The following code worked for me:
import pandas as pd
# Creating original csv
columns = ['Date','High','Low','Open','Close','Volume','Adj Close']
original_rows = [
    ["1980-12-12",0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907],
    ["1980-12-15",0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382],
]
df_original = pd.DataFrame(columns=columns, data=original_rows)
df_original.to_csv('AAPL.CSV', mode='w', index=False)
# Fetching the new data
rows_updated = [
    ["1980-12-12",0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907],
    ["1980-12-15",0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382],
    ["1980-12-16",0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533],
]
df_updated = pd.DataFrame(columns=columns, data=rows_updated)
# Read in current csv values
current_csv_data = pd.read_csv('AAPL.CSV')
# Drop every Date that appears in both frames (keep=False) and append only the new data
new_entries = pd.concat([current_csv_data, df_updated]).drop_duplicates(subset='Date', keep=False)
new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
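One caveat with keep=False: a row that exists only in the CSV also survives the filter and would be appended a second time. A minimal sketch of an alternative that filters the fetched data directly, assuming Date uniquely identifies a row and has the same type (for example, strings) in both frames. select_new_rows is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def select_new_rows(existing, updated, key="Date"):
    """Return the rows of `updated` whose key value is not already in `existing`."""
    return updated[~updated[key].isin(existing[key])]


# Small in-memory stand-ins for the CSV contents and the freshly fetched data
existing = pd.DataFrame({"Date": ["1980-12-12", "1980-12-15"],
                         "Close": [0.5134, 0.4866]})
updated = pd.DataFrame({"Date": ["1980-12-12", "1980-12-15", "1980-12-16"],
                        "Close": [0.5134, 0.4866, 0.4509]})

new_entries = select_new_rows(existing, updated)
print(new_entries)

# The result could then be appended as before:
# new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
```

If the DataFrame holds real datetimes while the CSV is read back as text, the two Date columns would need to be converted to a common type first.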
problem
I first concatenate all the data from the available Excel files into a single DataFrame and then write that DataFrame to a new Excel file. However, I would like to do two simple things:
a) Leave two blank columns before each new DataFrame that is appended.
b) Keep the headers and the bold formatting, which disappear after the DataFrames are appended. See a picture of how one Excel file initially looked: Original formatting
attempt
This is my attempt (Two Separate DataFrames):
data = []
for excel_file in excel_files:
    print(excel_file)  # the name for the dataframe
    data.append(pd.read_excel(excel_file, engine="openpyxl"))
df1 = pd.DataFrame(columns=['DVT', 'Col2', 'Col3']) #blank df maybe?!this line is not imp!
#df1.style.set_properties(subset=['DVT'], {'font-weight:bold'}) !this line is not imp!
# concatenate dataframes horizontally
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)
I don't have Excel available right now, so I can't test, but something like this might be a good approach.
import numpy as np
import pandas as pd
# Open the Excel document using a context manager in 'append' mode.
# Note: without startrow/startcol, each to_excel call starts writing at the top-left cell of its sheet.
with pd.ExcelWriter(excelAutoNamed, mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
    for excel_file in excel_files:
        print(excel_file)
        # Append DataFrame to Excel file.
        pd.read_excel(excel_file, engine="openpyxl").to_excel(writer, index=False)
        # Append DataFrame with two blank columns to file.
        pd.DataFrame([np.nan, np.nan]).T.to_excel(writer, index=False, header=False)
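One way to place the frames side by side with a gap is to give each to_excel call an explicit startcol. The following is a minimal sketch that has not been checked in Excel itself. column_offsets and write_side_by_side are hypothetical helper names, and writing everything to the default sheet of a fresh workbook is an assumption:

```python
import pandas as pd


def column_offsets(frames, gap=2):
    """Return the starting column of each frame, leaving `gap` blank columns between frames."""
    offsets, col = [], 0
    for frame in frames:
        offsets.append(col)
        col += len(frame.columns) + gap
    return offsets


def write_side_by_side(frames, path, gap=2):
    """Write all frames to one sheet of a new workbook, separated by blank columns."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for frame, col in zip(frames, column_offsets(frames, gap)):
            # startcol shifts this frame (header included) to the right
            frame.to_excel(writer, index=False, startcol=col)


# Two small stand-in frames: 3 columns, then 2 columns
frames = [pd.DataFrame(columns=["DVT", "Col2", "Col3"]),
          pd.DataFrame(columns=["A", "B"])]
print(column_offsets(frames))
```

With real data, something like write_side_by_side([pd.read_excel(f, engine="openpyxl") for f in excel_files], excelAutoNamed) would be the intended call.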
I currently have a csv file which has four columns
The next time I write to the file, I want to start writing from cell E1. I've searched for solutions, but none seem to work.
with open(file_location, "w") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerows(list_of_parameters)
where list_of_parameters is a zip of all four columns:
list_of_parameters = zip(timestamp_list,request_count_list,error_rate_list,response_time_list)
Does anyone have an idea how to implement this? I appreciate your help.
The Python library Pandas is very good for these sorts of things. Here is how you could do this in Pandas:
import pandas as pd
# Read file as a DataFrame
df = pd.read_csv(file_location)
# Create a second DataFrame with your new data,
# you want the dict keys to be your column names
new_data = pd.DataFrame({
    'Timestamp': timestamp_list,
    'Key Request': request_count_list,
    'Failure Rate': error_rate_list,
    'Response': response_time_list
})
# Concatenate the existing and new data
# along the column axis (adding to E1)
df = pd.concat([df, new_data], axis=1)
# Save the combined data
df.to_csv(file_location, index=False)
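If you would rather stay with the csv module, keep in mind that a CSV file cannot be widened in place: the rows have to be read, extended, and written back. A minimal sketch under that assumption. append_columns is a hypothetical helper, and the file is assumed to start with a header row and to be small enough to hold in memory:

```python
import csv
from itertools import zip_longest


def append_columns(path, new_header, new_rows):
    """Rewrite the CSV at `path` with extra columns added to the right of the existing ones."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header = rows[0] if rows else []
    body = rows[1:]
    blank_old = [""] * len(header)
    blank_new = [""] * len(new_header)

    merged = [header + list(new_header)]
    # zip_longest pads whichever side is shorter, so no row is silently dropped
    for old, new in zip_longest(body, new_rows):
        old = old if old is not None else blank_old
        new = list(new) if new is not None else blank_new
        merged.append(old + new)

    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(merged)
```

For the question's data, the call might look like append_columns(file_location, ['Timestamp', 'Key Request', 'Failure Rate', 'Response'], list_of_parameters), where the header names are taken from the pandas answer above.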
I am trying to add new columns to an existing csv that already has rows and columns that looks like this:
I would like to append all the new column names as new columns after column 4.
The code I currently have is adding all the new columns to the bottom of the csv:
def extract_data_from_report3():
    with open('OMtest.csv', 'a', newline='') as f_out:
        writer = csv.writer(f_out)
        writer.writerow(
            ['OMGroup:OMRegister', 'OMGroup', 'OMRegister', 'RegisterType', 'Measures', 'Description', 'GeneratedOn'])
Is there any way to do this effectively?
You can use the pandas library, without iterating through the values. Here is an example:
new_header = ['OMGroup:OMRegister', 'OMGroup', 'OMRegister', 'RegisterType', 'Measures', 'Description', 'GeneratedOn']
# Import pandas package
import pandas as pd
my_df = pd.read_csv(path_to_csv)
for column_name in new_header:
    new_column = [ ... your values ...]  # should be a list with one value per row of the DataFrame
    my_df[column_name] = new_column
Keep in mind that each new column must have as many values as the table has rows for this to work.
If you only need to add the new columns without values, you can do this:
for column_name in new_header:
    new_column = ["" for i in range(len(my_df.index))]  # one empty string per row
    my_df[column_name] = new_column
Then you can write the CSV back like this (index=False keeps pandas from adding an extra index column):
my_df.to_csv(path_to_csv, index=False)
See the pandas documentation for details on the read_csv and to_csv methods.
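For the empty-columns case, pandas also broadcasts a single scalar to every row, so the list comprehension is optional. A minimal sketch with a small stand-in DataFrame. add_empty_columns is a hypothetical helper name, and the sample columns are made up for illustration:

```python
import pandas as pd

new_header = ['OMGroup:OMRegister', 'OMGroup', 'OMRegister', 'RegisterType',
              'Measures', 'Description', 'GeneratedOn']


def add_empty_columns(df, names, fill=""):
    """Return a copy of `df` with each name in `names` added as a column filled with `fill`."""
    out = df.copy()
    for name in names:
        out[name] = fill  # a scalar is repeated for every existing row
    return out


# Stand-in for pd.read_csv(path_to_csv): four existing columns, two rows
my_df = pd.DataFrame({"c1": [1, 2], "c2": [3, 4], "c3": [5, 6], "c4": [7, 8]})
widened = add_empty_columns(my_df, new_header)
print(list(widened.columns))
```

The new columns land to the right of the existing four, in the order given by new_header.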
Assume this is my csv file: (df)
id,name,version,ct_id
1,testing,version1,245
2,testing1,version2,246
3,testing2,version3,247
4,testing3,version4,248
5,testing1,version5,249
Now I've performed some operations on the data and written the result to another CSV file.
df = pd.read_csv('op.csv')
df1 = df.groupby('name').agg({'version': ', '.join, 'ct_id': 'first'}).reset_index()
df1.to_csv('test.csv', index=False)
Now I've another csv file. (df_1)
id,name,version,ct_id
36,testing17,version16,338
37,testing18,version17,339
I want to write this to my existing test.csv file which I created earlier but I want to insert these two rows at the beginning of the file rather than at the end.
I tried something like this.
df_1.iloc[:, 1:].to_csv('test.csv', mode='a', index=False)
# This does append but at the end.
I would appreciate it if someone could help.
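Prepending A to B is the same as appending B to A.
The code below should work for the case above.
import pandas as pd
test_df = pd.read_csv('test.csv')
# Drop the id column, as in the question, so the columns match test.csv
df_1 = pd.read_csv('df_1.csv').iloc[:, 1:]
# Put the new rows first, followed by the existing rows
df_1 = pd.concat([df_1, test_df], ignore_index=True, sort=False)
df_1.to_csv('test.csv', index=False)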
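To see the resulting order without touching any files, here is a small in-memory sketch. The CSV text mirrors the question's data after the groupby step, and io.StringIO stands in for the real files (an assumption made only for illustration):

```python
import io

import pandas as pd

# Stand-in for test.csv (the grouped output, which has no id column)
existing_csv = io.StringIO(
    'name,version,ct_id\n'
    'testing,version1,245\n'
    'testing1,"version2, version5",246\n'
    'testing2,version3,247\n'
    'testing3,version4,248\n'
)
# Stand-in for the second file (df_1), which still has an id column
new_csv = io.StringIO(
    'id,name,version,ct_id\n'
    '36,testing17,version16,338\n'
    '37,testing18,version17,339\n'
)

existing = pd.read_csv(existing_csv)
new_rows = pd.read_csv(new_csv).drop(columns="id")

# New rows first, existing rows after them
combined = pd.concat([new_rows, existing], ignore_index=True)
print(combined)
```

Writing combined back with to_csv('test.csv', index=False) would then replace the file with the prepended version.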
I have a set of data output in my program that I want to write to a .csv file. I am able to make a new file with the old input data, followed by the new data in the last column to the right. How can I manipulate which column my output data goes to? Also, how can I choose to not include the old input data in my new file? I'm new to pandas.
Thanks!
Loading from file:
import pandas as pd
df = pd.read_csv('D:\\Apps\\Coursera\\Kaggle-Titanic\\Data\\train.csv', header = 0)
Some manipulation:
df['Gender'] = df.Sex.map(lambda x: 0 if x=='female' else 1)
df['FamilySize'] = df.SibSp + df.Parch
Copy some fields to a new DataFrame:
result = df[['Sex', 'Survived', 'Age']]
Delete the fields you don't need:
del result['Sex']
Save to the file:
result.to_csv('D:\\Apps\\Coursera\\Kaggle-Titanic\\Swm\\result.csv', index=False)
Or if you want to save only some fields or in some specific order:
df[['Sex', 'Survived', 'Age']].to_csv('D:\\Apps\\Coursera\\Kaggle-Titanic\\Swm\\result.csv', index=False)
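to_csv also accepts a columns argument, which selects and orders the written columns in one step without building an intermediate DataFrame. A minimal sketch with made-up rows (the values below are illustrative and not taken from train.csv):

```python
import pandas as pd

# Illustrative stand-in for the loaded training data
df = pd.DataFrame({
    "Sex": ["male", "female"],
    "Survived": [0, 1],
    "Age": [22, 38],
})

# With no path given, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(columns=["Age", "Survived"], index=False)
print(csv_text)
```

Passing a file path as the first argument writes the same content to disk, with Age placed before Survived and Sex left out.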