Python: convert DAT files to XLS

I have a bunch of DAT files that I need to convert to XLS files using Python. Should I use the CSV library to do this or is there a better way?

I'd use pandas.
import pandas as pd
df = pd.read_table('DATA.DAT')
df.to_excel('DATA.xlsx')
and of course you can set up a loop to run through all your files. Something along these lines, maybe:
import glob
import os

import pandas as pd

os.chdir("C:\\FILEPATH\\")
for file in glob.glob("*.DAT"):
    # show which file is being converted
    print(file)
    df = pd.read_table(file)
    file1 = file.replace('DAT', 'xlsx')
    df.to_excel(file1)

writer = pd.ExcelWriter('pandas_example.xlsx',
                        engine='xlsxwriter',
                        options={'strings_to_urls': False})
or you can use:
df.to_excel('example.xlsx')
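For completeness, a minimal sketch of how a writer object like the one above is typically used together with to_excel (the output name and sheet name are just placeholders, and the xlsxwriter options are omitted for brevity):
import pandas as pd

df = pd.read_table('DATA.DAT')

# using ExcelWriter as a context manager closes the file automatically
with pd.ExcelWriter('pandas_example.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)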

Related

How to use Python to extract Excel Sheet data

I have many folders, and each folder contains 1 Excel file, named like 1Aug2022, 2Aug2022...
I want Python to read through all the folders and only open the Excel file named like 19AUG2022. The Excel file has many sheets inside, like IP-1*****, IP-2*****, IP-3*****. Then go to the sheets matching (IP-2*****) to extract 2 columns of data.
How can I do this in Python?
You can use the pandas package: https://pandas.pydata.org/
An example is:
import pandas as pd

your_excel_path = "your/path/to/the/excel/file"
# if you want to read a specific sheet's data
data = pd.read_excel(your_excel_path, sheet_name="19AUG2022")
# if you want to read all sheets' data, it returns a dict mapping sheet names to DataFrames
data = pd.read_excel(your_excel_path, sheet_name=None)
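If you only need the IP-2***** sheets and two columns from each, you can filter the dict returned by sheet_name=None. A rough sketch, where "ColA" and "ColB" stand in for the real column names (they are not given in the question):
import pandas as pd

your_excel_path = "your/path/to/the/excel/file"
all_sheets = pd.read_excel(your_excel_path, sheet_name=None)  # dict: sheet name -> DataFrame

# keep only sheets whose name starts with "IP-2", then take two columns from each
ip2_data = {
    name: df[["ColA", "ColB"]]  # replace with the actual column names
    for name, df in all_sheets.items()
    if name.startswith("IP-2")
}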
As Fergus said, use pandas.
The code to search all directories may look like this:
import os

import pandas as pd

directory_to_search = "./"
sheet_name = "IP-2*****"
for root, dirs, files in os.walk(directory_to_search):
    for file in files:
        if file == "19AUG2022":
            df = pd.read_excel(io=os.path.join(root, file), sheet_name=sheet_name)
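One caveat with the snippet above: file == "19AUG2022" only matches a file with no extension, and read_excel expects a literal sheet name rather than a wildcard like "IP-2*****". A sketch of one way around both, using fnmatch (the patterns are assumptions based on the question):
import fnmatch
import os

import pandas as pd

directory_to_search = "./"
for root, dirs, files in os.walk(directory_to_search):
    for file in files:
        if fnmatch.fnmatch(file, "19AUG2022*"):  # e.g. 19AUG2022.xlsx
            path = os.path.join(root, file)
            sheets = pd.read_excel(path, sheet_name=None)  # read every sheet
            ip2_sheets = {name: df for name, df in sheets.items()
                          if fnmatch.fnmatch(name, "IP-2*")}  # keep only the IP-2... sheets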

How to add filename as column to every file in a directory python

Hi there Stack Overflow community,
I have several CSV files in a folder and I need to append a column containing the first 8 characters of each filename as an additional column of the CSV. After this step I want to save the DataFrame, including the new column, to the same file.
I get the right output, but it doesn't save the changes in the CSV file :/
Maybe someone has some inspiration for me. Thanks a lot!
from tkinter.messagebox import YES
import pandas as pd
import glob, os
import fnmatch
import os

files = glob.glob(r'path\*.csv')
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    # for i in df('date'):
    #     Decoder problem
    print(df)
Use:
df.to_csv
like this:
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    df.to_csv(fp, index=False)  # index=False if you don't want to save the index as a new column in the csv
By the way, I think this may also work and is more readable:
for fp in files:
    df = pd.read_csv(fp)
    df['date'] = os.path.basename(fp).split('.')[0][:8]
    df.to_csv(fp, index=False)
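If it helps readability, pathlib's Path.stem gives the filename without its extension, so the first 8 characters can be taken from that directly. A small variant of the same idea (the 'path' folder is a placeholder, as above):
from pathlib import Path

import pandas as pd

for fp in Path(r'path').glob('*.csv'):
    df = pd.read_csv(fp)
    df['date'] = fp.stem[:8]  # first 8 characters of the filename, extension stripped
    df.to_csv(fp, index=False)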

How to use the gzip module to open a csv file

I am looking to read in a .csv.gz file that is in the same directory as my python script using the gzip and pandas module only.
So far I have,
import gzip
import pandas as pd
data = gzip.open('test_data.csv.gz', mode='rb')
How do I proceed in converting / reading this file in as a dataframe without using the csv module as seen in similarly answered questions?
You can use pandas.read_csv directly:
import pandas as pd

df = pd.read_csv('test_data.csv.gz', compression='gzip')
If you must use gzip:
import gzip

with gzip.open('test_data.csv.gz', mode='rb') as csv:
    df = pd.read_csv(csv)
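As a side note, read_csv defaults to compression='infer', so the .gz extension alone is usually enough:
df = pd.read_csv('test_data.csv.gz')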

Concatenating Excel and CSV files

I've been asked to compile data files into one Excel spreadsheet using Python, but they are all either Excel files or CSVs. I'm trying to use the following code:
import glob, os
import shutil
import pandas as pd
par = set(glob.glob("*Light*")) - set(glob.glob("*all*")) - set(glob.glob("*Untitled"))

df = pd.DataFrame()
for file in par:
    print(file)
    df = pd.concat([df, pd.read(file)])
Is there a way I can use the pd.concat function to read the files in more than one format (i.e. both xlsx and csv), instead of one or the other?
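There is no single pandas reader for both formats, but one straightforward approach (a sketch, reusing the "par" file set built above) is to pick the reader based on the file extension and let pd.concat combine the results:
import glob
import os

import pandas as pd

par = set(glob.glob("*Light*")) - set(glob.glob("*all*")) - set(glob.glob("*Untitled"))

frames = []
for file in par:
    ext = os.path.splitext(file)[1].lower()
    if ext in ('.xlsx', '.xls'):
        frames.append(pd.read_excel(file))
    elif ext == '.csv':
        frames.append(pd.read_csv(file))

df = pd.concat(frames, ignore_index=True)
df.to_excel('combined.xlsx', index=False)  # 'combined.xlsx' is just a placeholder name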

Read specific csv file from zip using pandas

Here is the data I am interested in:
http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip
It consists of 3 files.
I want to download the zip with pandas and create a DataFrame from one file, called Production_Crops_E_All_Data.csv.
import pandas as pd
url="http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip"
df = pd.read_csv(url)
Pandas can download files, it can work with zips, and of course it can work with csv files. But how can I work with one specific file in an archive with many files?
Right now I get this error:
ValueError: ('Multiple files found in compressed zip file %s)
This post doesn't answer my question because I have multiple files in one zip:
Read a zipped file as a pandas DataFrame (from this link)
Try this:
from zipfile import ZipFile
import io
from urllib.request import urlopen
import pandas as pd
r = urlopen("http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip").read()
file = ZipFile(io.BytesIO(r))
data_df = pd.read_csv(file.open("Production_Crops_E_All_Data.csv"), encoding='latin1')
data_df_noflags = pd.read_csv(file.open("Production_Crops_E_All_Data_NOFLAG.csv"), encoding='latin1')
data_df_flags = pd.read_csv(file.open("Production_Crops_E_Flags.csv"), encoding='latin1')
Hope this helps!
EDIT: updated for python3 StringIO to io.StringIO
EDIT: updated the import of urllib, changed usage of StringIO to BytesIO. Also, your CSV files are not UTF-8 encoded; I tried latin1 and that worked.
You could use Python's datatable, which is a reimplementation of R's data.table in Python.
Read in the data:
from datatable import fread

# the exact file to be extracted is known, so simply append it to the zip name:
url = "Production_Crops_E_All_Data.zip/Production_Crops_E_All_Data.csv"
df = fread(url)

# convert to pandas
df_pandas = df.to_pandas()
You can work within datatable as well; note, however, that it is not as feature-rich as pandas, but it is a powerful and very fast tool.
Update: You can use the zipfile module as well:
from io import BytesIO
from zipfile import ZipFile

import pandas as pd

# here 'url' needs to be the path to the downloaded zip file on disk,
# e.g. "Production_Crops_E_All_Data.zip"
with ZipFile(url) as myzip:
    with myzip.open("Production_Crops_E_All_Data.csv") as myfile:
        data = myfile.read()

# read data into pandas
# had to toy a bit with the encoding,
# thankfully it is a known issue on SO
# https://stackoverflow.com/a/51843284/7175713
df = pd.read_csv(BytesIO(data), encoding="iso-8859-1", low_memory=False)
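If the zip only exists at the remote URL, one option (a sketch, not part of the original answer) is to download it to disk first and then point ZipFile at the local copy:
from urllib.request import urlretrieve
from zipfile import ZipFile

import pandas as pd

zip_url = "http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip"
local_zip, _ = urlretrieve(zip_url, "Production_Crops_E_All_Data.zip")  # saved in the working directory

with ZipFile(local_zip) as myzip:
    with myzip.open("Production_Crops_E_All_Data.csv") as myfile:
        # read_csv accepts the file-like object returned by ZipFile.open
        df = pd.read_csv(myfile, encoding="iso-8859-1", low_memory=False)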
