Python pandas xlsx/ csv - python

I want to convert an xlsx file to csv. The conversion works, but afterwards Python adds ".0" to the values...
Sample xlsx :
Name, Age
Mark, 20
CSV after conversion :
Name, Age
Mark, 20.0 <- add ".0"
What could the problem be?
#importing pandas as pd
import pandas as pd
# Read and store content
# of an excel file
read_file = pd.read_excel ("EXPORT.xlsx")
# Write the dataframe object
# into csv file
read_file.to_csv ("data.csv",
index = True,
header=True,
encoding='utf-8-sig')
# read csv file and convert
# into a dataframe object
df = pd.DataFrame(pd.read_csv("data.csv"))
# show the dataframe
df

I tried to reproduce this behavior, but in my case pd.read_excel() automatically assigned the int64 dtype to the Age column for the Excel sheet shown.
However, this case can easily be solved with the df.astype() function, which converts data types, in your case from float to integer.
#importing pandas as pd
import pandas as pd
# Read and store content
# of an excel file
read_file = pd.read_excel ("EXPORT.xlsx")
# transform data type of column "Age" to int64
read_file = read_file.astype({'Age': 'int64'})
# Write the dataframe object
# into csv file
read_file.to_csv("data.csv",
                 index=True,
                 header=True,
                 encoding='utf-8-sig')
# read csv file and convert
# into a dataframe object
df = pd.DataFrame(pd.read_csv("data.csv"))
# show the dataframe
print(df)
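One common reason for a float column is an empty cell: pandas stores missing values in a numeric column as NaN, which forces the float64 dtype, and a plain astype('int64') then raises an error. The sketch below assumes that situation and uses the nullable 'Int64' dtype instead. The DataFrame stands in for the result of pd.read_excel(), and the second row ("Anna") is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for pd.read_excel("EXPORT.xlsx"):
# one Age cell is empty, so pandas stores the column as float64.
df = pd.DataFrame({"Name": ["Mark", "Anna"], "Age": [20, float("nan")]})
print(df["Age"].dtype)

# The nullable integer dtype keeps whole numbers and tolerates missing values.
df["Age"] = df["Age"].astype("Int64")

# Write to an in-memory buffer instead of a file to keep the example self-contained.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Note that the dtype name is case sensitive: 'Int64' is the nullable extension type, while 'int64' is the NumPy type that cannot hold missing values.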

I added the float_format option and it seems to work:
read_file.to_csv ("basf.csv",
index = None,
header=True,
encoding='utf-8-sig',
decimal=',',
float_format='%d'
)
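A caveat worth checking before relying on this: float_format is applied to every float column, and '%d' cuts off the fractional part rather than only removing a trailing ".0". The following minimal sketch illustrates this with made-up data (the second row is hypothetical).

```python
import io

import pandas as pd

# Hypothetical data: one whole-number float and one genuine decimal.
df = pd.DataFrame({"Name": ["Mark", "Anna"], "Age": [20.0, 31.7]})

# Write to an in-memory buffer; '%d' formats each float as an integer.
buf = io.StringIO()
df.to_csv(buf, index=False, float_format="%d")
csv_text = buf.getvalue()
print(csv_text)
```

If some columns hold real decimals, converting only the affected column with astype() is the safer choice.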

Related

csv file to excel, resulting in messy table

I want to convert my csv file to excel, but the first line of the csv gets read as the header.
I first created a csv from the lists below, then I used pandas to convert it to excel.
import csv
import pandas as pd
id=["id",1,2,3,4,5]
name=["name","Salma","Ahmad","Manar","Mustapha","Zainab"]
age=["age",14,12,15,13,10]
#this is how i created the csv file
Csv='path/csvfile.csv'
open_csv=open(Csv, 'w')
outfile=csv.writer(open_csv)
outfile.writerows([id]+[name]+[age])
open_csv.close()
#Excel file
Excel='path/Excelfile.xlsx'
Excel_open=open(Excel, 'w')
csv_file=pd.read_csv(Csv)
csv_file.to_excel(Excel)
This is what I get from this code
"Results"
I want the Id title to be in the same column as name and age
I would suggest this instead:
import pandas as pd
df = pd.DataFrame({
"id": [1,2,3,4,5],
"name":["Salma","Ahmad","Manar","Mustapha","Zainab"],
"age":[14,12,15,13,10]
})
# to_excel writes the file and returns None, so there is nothing to assign
df.to_excel("excel_file.xlsx", index=False)
This way the dataframe is easier to create and easier to understand.
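If the csv step has to stay, the cause of the messy table is that writerows() writes each list as a row, so "id,1,2,3,4,5" becomes the header line. A minimal sketch of one way around this, assuming the same three lists, is to transpose them with zip() before writing. The in-memory buffer replaces the file path only to keep the example self-contained.

```python
import csv
import io

import pandas as pd

ids = ["id", 1, 2, 3, 4, 5]
names = ["name", "Salma", "Ahmad", "Manar", "Mustapha", "Zainab"]
ages = ["age", 14, 12, 15, 13, 10]

# zip() turns the three lists into rows: ("id", "name", "age"), (1, "Salma", 14), ...
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(zip(ids, names, ages))

# Reading it back now gives one column per list, with the titles as header.
buf.seek(0)
df = pd.read_csv(buf)
print(df)
```

From here, df.to_excel(..., index=False) writes the table without the extra index column.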

AttributeError: 'ExcelFile' object has no attribute 'dropna'

I was trying to remove the empty columns in an Excel file with pandas using the dropna() method, but I ended up with the above error message. Please find my code below:
import pandas as pd
df = pd.ExcelFile("1.xlsx")
print(df.sheet_names)
#df.dropna(how='all', axis=1)
newdf = df.dropna()
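pd.ExcelFile returns an object that represents the workbook, not a DataFrame, which is why it has no dropna() method. Read the sheet into a DataFrame first, for example with pd.read_excel(). Please provide more code and context, but this might help: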
import pandas as pd
excel_file_name = 'insert excel file path'
excel_sheet_name = 'insert sheet name'
# create dataframe from desired excel file
df = pd.read_excel(
excel_file_name,
engine='openpyxl',
sheet_name=excel_sheet_name
)
# drop columns that contain any NaN value and write the result back into df
# (use how='all' to drop only columns that are entirely empty)
# without the inplace option it would have to be
# df = df.dropna(axis=1)
df.dropna(axis=1, inplace=True)
# write that dataframe to the excel file
with pd.ExcelWriter(
    excel_file_name,  # file to write to
    engine='openpyxl',  # which engine to use
    mode='a',  # append mode (required for if_sheet_exists to work)
    if_sheet_exists='replace'  # if that sheet exists, replace it
) as writer:
    df.to_excel(writer, sheet_name=excel_sheet_name)
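Since the question asks about empty columns, the difference between dropna(axis=1) and dropna(how='all', axis=1) matters. The sketch below shows both on a small made-up DataFrame, which stands in for a sheet read with pd.read_excel(). The column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sheet: one full column, one entirely empty, one partly filled.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "empty": [np.nan, np.nan, np.nan],
    "partial": [1.0, np.nan, 3.0],
})

# how='all' removes only columns in which every value is missing.
only_empty_dropped = df.dropna(how="all", axis=1)

# The default (how='any') removes every column that has at least one missing value.
any_nan_dropped = df.dropna(axis=1)

print(only_empty_dropped.columns.tolist())
print(any_nan_dropped.columns.tolist())
```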

How to store integers as strings in CSV file Python

When I save a number as a string in a CSV file and then read the file back, pandas returns the value as numpy.int64 instead of a string. How can I make it read the value as a string, not an int?
Here is a Python script that describes this case
import pandas as pd
df = pd.DataFrame(columns=['ID'])
ID = '1'
df = pd.concat([df, pd.DataFrame([[ID]], columns=['ID'])])  # DataFrame.append was removed in pandas 2.0
df.to_csv('test.csv', index=False)
"""
now the csv file looks like this:
ID
1
"""
df = pd.read_csv('test.csv')
print(df['ID'].iloc[0] == ID) # this will print False
print(type(df['ID'].iloc[0])) # this will print <class 'numpy.int64'>
The CSV file format doesn't distinguish between different data types. You have to specify the data type when reading the CSV with pandas.
df = pd.read_csv("test.csv", dtype=str)
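When only some columns should stay text, dtype also accepts a mapping from column name to type. A minimal sketch, assuming a made-up file with an ID column that has leading zeros and a numeric Count column (both names are hypothetical):

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for the file on disk.
csv_text = "ID,Count\n001,7\n"

# Default type inference turns "001" into the integer 1.
default_df = pd.read_csv(io.StringIO(csv_text))

# Requesting str for the ID column keeps the text exactly as written,
# while Count is still inferred as an integer.
str_df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

print(default_df["ID"].iloc[0])
print(str_df["ID"].iloc[0])
```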

Using Pandas to convert CSV data to numbers when concatenating into excel

I am writing a small program to concatenate a load of measurements from multiple csv files into one excel file. I have pretty much all of the program written and working. The only thing I'm struggling with is getting the data from the csv files to automatically turn into numbers when the dataframe places them into the excel file.
The code I have looks like this:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import os
import csv
import glob
os.chdir(r"directoryname")
retval = os.getcwd()
print ("Directory changed to %s" % retval)
files = glob.glob(r"directoryname\datafiles*csv")
print(files)
files.sort(key=lambda x: os.path.getmtime(x))
writer = pd.ExcelWriter('test.xlsx')
df = pd.read_csv("datafile.csv", index_col=False)
df = df.iloc[0:41, 1]
df.to_excel(writer, sheet_name='sheetname', startrow=0, startcol=1, index=False)
i = 0  # column offset for each additional file
for f in files:
    i += 1
    df = pd.read_csv(f, index_col=False)
    df = df.iloc[0:41, 2]
    df.to_excel(writer, sheet_name='sheetname', startrow=0, startcol=1 + i, index=False)
writer.close()  # write the workbook to disk
Thanks in advance
Do you mean:
df.loc[:,'measurements'] = df.loc[:,'measurements'].astype(float)
So after you read the dataframe you can cast your columns like that, for example.
A different solution is to cast the columns while reading your csv by using the dtype parameter (see Documentation)
EXAMPLE
df = pd.read_csv(os.path.join(savepath, 'test.csv'), sep=";", dtype={
    'ID': 'Int64', 'STATUS': 'object'}, encoding='utf-8')
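If a column arrives as text because some cells are not valid numbers, astype(float) raises an error. A possible alternative, sketched here under that assumption, is pd.to_numeric() with errors="coerce", which turns unparseable cells into NaN. The CSV content and the 'measurements' column are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated file in which one measurement is not a number.
csv_text = "ID;measurements\n1;0.5\n2;1.25\n3;bad\n"

# Read everything as text first, mimicking a column that pandas could not infer as numeric.
raw = pd.read_csv(io.StringIO(csv_text), sep=";", dtype=str)

# Convert to numbers; values that cannot be parsed become NaN instead of raising.
raw["measurements"] = pd.to_numeric(raw["measurements"], errors="coerce")

print(raw["measurements"].dtype)
print(raw)
```

Columns converted this way are written to Excel as numeric cells rather than as text.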

Append only new values to CSV from DataFrame in Python

Let's suppose I have a CSV file which looks like this:
Date,High,Low,Open,Close,Volume,Adj Close
1980-12-12,0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907
1980-12-15,0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382
1980-12-16,0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533
I also have a Pandas DataFrame which has exactly the same values plus the new entries. My goal is to append only the new values to the CSV file.
I tried it like this, but unfortunately this appends not only the new entries but also the old ones:
df.to_csv('{}/{}'.format(FOLDER, 'AAPL.CSV'), mode='a', header=False)
You can just re-read your csv file after writing it and drop any duplicates before appending the newly fetched data.
The following code was working for me:
import pandas as pd
# Creating original csv
columns = ['Date','High','Low','Open','Close','Volume','Adj Close']
original_rows = [
    ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
    ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
]
df_original = pd.DataFrame(columns=columns, data=original_rows)
df_original.to_csv('AAPL.CSV', mode='w', index=False)
# Fetching the new data
rows_updated = [
    ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
    ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
    ["1980-12-16", 0.453125, 0.4508928656578064, 0.453125, 0.4508928656578064, 26432000.0, 0.02020619809627533],
]
df_updated = pd.DataFrame(columns=columns, data=rows_updated)
# Read in current csv values
current_csv_data = pd.read_csv('AAPL.CSV')
# Drop duplicates and append only new data
new_entries = pd.concat([current_csv_data, df_updated]).drop_duplicates(subset='Date', keep=False)
new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
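An alternative sketch, not taken from the answer above, selects the new rows by checking which dates are not yet in the file with isin(). It assumes that Date uniquely identifies a row and that both sides store dates as strings in the same format. The shortened two-column data is made up to keep the example small.

```python
import io

import pandas as pd

# Hypothetical existing file content; io.StringIO stands in for AAPL.CSV.
existing_csv = "Date,Close\n1980-12-12,0.51\n1980-12-15,0.48\n"
current = pd.read_csv(io.StringIO(existing_csv))

# Freshly fetched data: the old rows plus one new row.
updated = pd.DataFrame({
    "Date": ["1980-12-12", "1980-12-15", "1980-12-16"],
    "Close": [0.51, 0.48, 0.45],
})

# Keep only rows whose Date does not appear in the file yet.
new_entries = updated[~updated["Date"].isin(current["Date"])]

# With a real file this would be:
# new_entries.to_csv("AAPL.CSV", mode="a", header=False, index=False)
appended = existing_csv + new_entries.to_csv(header=False, index=False)
print(appended)
```

Unlike drop_duplicates(keep=False), this never drops a row from the new data just because the same date appears twice in the file.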
