exporting data frame to csv file in python with pandas - python

I want to export my dataframe to a CSV file. Normally my dataframe has 2 columns, but when I export it, the CSV file contains only one column and the data is separated with commas.
m is one column and s is another.
df = pd.DataFrame({'MSE':[m], 'SSIM': [s]})
To append new dataframes and save the data to the CSV file, I used the code below:
with open('test.csv', 'a+') as f:
    df.to_csv(f, header=False)
print(df)
When I print the dataframe, the console output looks like:
MSE SSIM
0 0.743373 0.843658
but in the CSV file a row looks like the line below: the first value is the index, the second is m, and the last is s. I want them in 3 separate columns.
0,1.1264238582283046,0.8178900901529639
How can I solve this?

Your Excel list-separator setting is most likely ; (semicolon), which is why everything lands in one column. Use:
df.to_csv(f, header=False, sep=';')
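For reference, a full round trip with the semicolon separator (the m and s values here are the sample numbers from the console output above):

```python
import pandas as pd

# Sample values standing in for m and s (taken from the console output
# in the question).
m, s = 0.743373, 0.843658
df = pd.DataFrame({'MSE': [m], 'SSIM': [s]})

# Write with a semicolon separator so Excel (configured with ';' as
# its list separator) splits the values into three columns.
with open('test.csv', 'a+') as f:
    df.to_csv(f, header=False, sep=';')
```

The resulting line in test.csv is 0;0.743373;0.843658, which Excel then displays as three cells.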

Related

How to remove double quotes in value reading from csv file

My csv file:
Mp4,Mp3,"1234554"
My code:
csv = b''.join(csv).split(b'\n')
for index, row in enumerate(csv):
    row = re.split(b''',(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''', row)
    for records in row:
        print(records)
When it prints the records, the 3rd element prints with double quotes (""). I need to ignore these double quotes.
I think this should do it (the rows were split from bytes, so use bytes literals):
records = records.replace(b'"', b'')
Edit
Using pandas.read_csv is a better way to work with CSV files:
import pandas as pd

csv = pd.read_csv('data.csv', delimiter=',', names=['x', 'y', 'z'])
# iterate over the dataframe
for index, row in csv.iterrows():
    print(row['x'], row['y'], row['z'])
Assuming content of data.csv looks like
Mp4,Mp3,"1234554"
The output would look like this:
Mp4 Mp3 1234554
If your csv file includes column names e.g.
file_type1,file_type2,size
mp4,mp3,"1234554"
just remove the names parameter when reading in the CSV file:
csv = pd.read_csv('data.csv', delimiter=',')
print(csv)
Then the output would look like this:
file_type1 file_type2 size
0 mp4 mp3 1234554
Read more about pandas or pandas.read_csv
You could easily replace it with
print(records.replace(b'"', b''))
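Alternatively, the stdlib csv module handles quoted fields for you, so the quotes never show up in the first place. A minimal sketch using the sample line from the question, held in memory:

```python
import csv
import io

# csv.reader parses the quoting, so "1234554" comes back as the plain
# string 1234554 with no surrounding quotes.
data = io.StringIO('Mp4,Mp3,"1234554"')
rows = [row for row in csv.reader(data)]
for row in rows:
    for record in row:
        print(record)
```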

Skip Columns with pandas

problem
I am first concatenating all data from the available Excel files into a single dataframe and then writing that dataframe into a new Excel file. However, I would like to do 2 simple things:
a) leave 2 columns blank for each new dataframe that is appended
b) the headers and the bold formatting have disappeared after appending the dataframes. See a picture of how one Excel file initially looked: Original formatting
attempt: This is my attempt (Two Separate DataFrames):
data = []
for excel_file in excel_files:
    print(excel_file)  # the name for the dataframe
    data.append(pd.read_excel(excel_file, engine="openpyxl"))

df1 = pd.DataFrame(columns=['DVT', 'Col2', 'Col3'])  # blank df maybe?! this line is not imp!
# df1.style.set_properties(subset=['DVT'], {'font-weight:bold'}) !this line is not imp!

# concatenate dataframes horizontally
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)
I don't have Excel available right now, so I can't test, but something like this might be a good approach.
import numpy as np
import pandas as pd

# Open the excel document using a context manager in 'append' mode.
with pd.ExcelWriter(excelAutoNamed, mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
    for excel_file in excel_files:
        print(excel_file)
        # Append Dataframe to Excel File.
        pd.read_excel(excel_file, engine="openpyxl").to_excel(writer, index=False)
        # Append Dataframe with two blank columns to File.
        pd.DataFrame([np.nan, np.nan]).T.to_excel(writer, index=False, header=False)
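If the goal is to lay the dataframes out side by side with a 2-column gap, another way is to track a startcol offset per frame. A sketch, where the file name and the example frames are invented for illustration:

```python
import pandas as pd

# Two stand-in dataframes playing the role of the loaded Excel files.
frames = [
    pd.DataFrame({'DVT': [1, 2], 'Val': [3, 4]}),
    pd.DataFrame({'DVT': [5, 6], 'Val': [7, 8]}),
]

with pd.ExcelWriter('combined.xlsx', engine='openpyxl') as writer:
    col = 0
    for df in frames:
        # Write this frame starting at the current column offset.
        df.to_excel(writer, index=False, startcol=col)
        # Advance past this frame's columns plus 2 blank columns.
        col += len(df.columns) + 2
```

Note this only solves the blank-column layout; bold header formatting would still need to be reapplied with openpyxl styling.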

NaN with py from excel to csv

I'd like to export 1 column from excel to txt. I tried this:
import pandas as pd
pd.read_excel('C:/Events.xlsx', sheet_name='Data')
xlsx = pd.read_excel('C:/Events.xlsx', sheet_name='Data')
xlsx = pd.read_excel('C:/Events.xlsx','Data', usecols='F:F')
with open('C:/filename.txt', 'w') as outfile:
    xlsx.to_string(outfile, index=False)
output:
20220,333333333333333
NaN
The problems are:
- I found a blank space first.
- in the second row I found NaN.
Do you have any ideas?
Thank you for your support
Angelo
In the test file it looks like the cell in the second row (in col F) is blank. Pandas will automatically read in blank cells as NaN. Is this row blank in the main file too?
If you have blank cells and you don't want to read them in as datatype NaN, you can convert them to empty string instead by using the keep_default_na parameter when importing the file:
pd.read_excel('your_file_name.xlsx', keep_default_na=False)
Does this help?
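The same keep_default_na flag exists on pd.read_csv, which makes the difference easy to demonstrate with an in-memory file (the tiny CSV here is invented for illustration):

```python
import io
import pandas as pd

csv_text = 'x,y\n1,\n'  # the y field of the row is blank

# Default behaviour: the blank cell becomes NaN.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default['y'][0])

# With keep_default_na=False the blank cell stays an empty string.
df_str = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(repr(df_str['y'][0]))
```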

Python script that efficiently drops columns from a CSV file

I have a CSV file where the columns are separated by a tab delimiter, but the number of columns is not constant. I need to read the file only up to the 5th column. (I don't want to read the whole file and then extract the columns; I would like to read line by line, for example, and skip the remaining columns.)
You can use the usecols argument in pd.read_csv to limit the number of columns to be read.
# test data
s = '''a,b,c
1,2,3'''
with open('a.txt', 'w') as f:
    print(s, file=f)

df1 = pd.read_csv("a.txt", usecols=range(1))
df2 = pd.read_csv("a.txt", usecols=range(2))
print(df1)
print()
print(df2)

# output
#    a
# 0  1
#
#    a  b
# 0  1  2
You can use pandas usecols to read only the first five columns of each line (the file in the question is tab-delimited, hence sep='\t'):
import pandas as pd

df = pd.read_csv('out122.txt', sep='\t', usecols=[0, 1, 2, 3, 4])
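If you'd rather honor the "line by line" requirement literally, the stdlib csv module can do it without pandas. A sketch with an invented tab-delimited sample file; only the first 5 fields of each row are kept, so a varying column count is not a problem:

```python
import csv

# Create a small tab-delimited sample file with a ragged column count
# (invented for illustration).
with open('data.tsv', 'w', newline='') as f:
    f.write('a\tb\tc\td\te\tf\tg\n')
    f.write('1\t2\t3\t4\t5\n')

# Read line by line, keeping only the first 5 fields of each row.
rows = []
with open('data.tsv', newline='') as f:
    for row in csv.reader(f, delimiter='\t'):
        rows.append(row[:5])

print(rows)
```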

Append only new values to CSV from DataFrame in Python

Let's suppose I have a CSV file which looks like this:
Date,High,Low,Open,Close,Volume,Adj Close
1980-12-12,0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907
1980-12-15,0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382
1980-12-16,0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533
I also have a Pandas DataFrame which has exactly the same values plus new entries. My goal is to append only the new values to the CSV file.
I tried this, but unfortunately it appends not only the new entries but also the old ones:
df.to_csv('{}/{}'.format(FOLDER, 'AAPL.CSV'), mode='a', header=False)
You can just re-read your csv file after writing it and drop any duplicates before appending the newly fetched data.
The following code was working for me:
import pandas as pd

# Creating original csv
columns = ['Date', 'High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close']
original_rows = [
    ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
    ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
]
df_original = pd.DataFrame(columns=columns, data=original_rows)
df_original.to_csv('AAPL.CSV', mode='w', index=False)

# Fetching the new data
rows_updated = [
    ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
    ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
    ["1980-12-16", 0.453125, 0.4508928656578064, 0.453125, 0.4508928656578064, 26432000.0, 0.02020619809627533],
]
df_updated = pd.DataFrame(columns=columns, data=rows_updated)

# Read in current csv values
current_csv_data = pd.read_csv('AAPL.CSV')

# Drop duplicates and append only new data
new_entries = pd.concat([current_csv_data, df_updated]).drop_duplicates(subset='Date', keep=False)
new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
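For date-ordered data like this, another sketch (with an invented file name and simplified columns) is to append only the rows whose Date is newer than the last date already in the file:

```python
import pandas as pd

columns = ['Date', 'Close']

# Simplified stand-ins for the CSV on disk and the updated dataframe.
pd.DataFrame([['1980-12-12', 0.51], ['1980-12-15', 0.49]],
             columns=columns).to_csv('AAPL.CSV', index=False)
df_updated = pd.DataFrame(
    [['1980-12-12', 0.51], ['1980-12-15', 0.49], ['1980-12-16', 0.45]],
    columns=columns)

# Append only rows newer than the last date already in the file
# (string comparison works because the dates are ISO-formatted).
last_date = pd.read_csv('AAPL.CSV')['Date'].max()
new_rows = df_updated[df_updated['Date'] > last_date]
new_rows.to_csv('AAPL.CSV', mode='a', header=False, index=False)
```

This avoids re-reading and de-duplicating the whole history, but assumes rows for a given date never change.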
