Read CSV files and Report creation in XLSX - python

Overview:
This program/function reads all individual metrics files and creates an Excel report with all metrics.
import glob, os, sys
import csv
import xlsxwriter
from pathlib import Path
import pandas as pd
from openpyxl import Workbook
#Output file name and location
#format for header object.
# Write the column headers with the defined format.
for col_number, value in enumerate(f3.columns.values):
    worksheet_object.write(0, col_number + 1, value, header_format_object)
writer_object.save()
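For context, here is a minimal sketch of how the objects used in the fragment above (f3, writer_object, worksheet_object, header_format_object) are typically wired together with pandas and xlsxwriter. The folder is the output path shown further below; the report file name and the header colours are assumptions, not the original code:

import glob
import pandas as pd

# Assumed location of the individual metrics CSVs; adjust to the real folder.
metrics_files = glob.glob(r"C:\Users\Desktop\Cobol\Outputs\*.csv")

# Read every individual metrics CSV and stack them into one DataFrame.
f3 = pd.concat((pd.read_csv(f) for f in metrics_files), ignore_index=True)

# Write the combined metrics to a single XLSX report via the xlsxwriter engine.
writer_object = pd.ExcelWriter("metrics_report.xlsx", engine="xlsxwriter")
f3.to_excel(writer_object, sheet_name="Metrics", startrow=1, header=False)

workbook_object = writer_object.book
worksheet_object = writer_object.sheets["Metrics"]

# Format for the header object (bold with a fill colour, as an example).
header_format_object = workbook_object.add_format(
    {"bold": True, "bg_color": "#D7E4BC", "border": 1})

# Write the column headers with the defined format.
for col_number, value in enumerate(f3.columns.values):
    worksheet_object.write(0, col_number + 1, value, header_format_object)

writer_object.close()  # .save() on older pandas versions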
Output in Terminal (Success)
PS C:\Users\Python-1> &
Actual output of file in Folder:
C:\Users\Desktop\Cobol\Outputs
Actual Output in XLSX file
Problem: The results are good; however, the S.No column in the XLSX file (the count of programs) starts with zero instead of 1.
S.No
0
1

Have you tried a reindex?
Set the index before writing out the file.
For example:
import numpy as np
f3.index = np.arange(1, len(f3) + 1)
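Applied to the report above, a minimal sketch could look like this; the output file and sheet names are assumptions:

import numpy as np
import pandas as pd

# Assumed: f3 is the combined metrics DataFrame built earlier.
f3.index = np.arange(1, len(f3) + 1)   # 1-based index instead of 0-based
f3.index.name = "S.No"                 # label the index column in the report

# Writing now produces an S.No column that starts at 1.
with pd.ExcelWriter("metrics_report.xlsx", engine="xlsxwriter") as writer_object:
    f3.to_excel(writer_object, sheet_name="Metrics")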

Related

Pandas PDF to CSV with Auto Column Adjuster

Someone helped me with a program so that I can convert .PDF files to .CSV, but I would like to add an auto column adjuster to this program.
Mass PDF to CSV Code:
import os
import glob
import tabula
path="/Users/username/Downloads/"
for filepath in glob.glob(path+'*.pdf'):
name=os.path.basename(filepath)
tabula.convert_into(input_path=filepath,
output_path=path+name+".csv",
pages="all")
Auto Column Adjuster Code:
import pandas as pd
from UliPlot.XLSX import auto_adjust_xlsx_column_width
# Load example dataset
file_encoding = "cp1252"
df = pd.read_csv("/Users/username/Downloads/", encoding=file_encoding)
# df.set_index("Timestamp", inplace=True)
# Export dataset to XLSX
with pd.ExcelWriter("example.xlsx") as writer:
    df.to_excel(writer, sheet_name="MySheet")
    auto_adjust_xlsx_column_width(df, writer, sheet_name="MySheet", margin=0)
If these two programs could be merged, it would spare me having to manually adjust every file. Note that the PDF to CSV code takes a folder as input, whereas the Auto Column Adjuster code takes a single file.
Link to an example of my datasets:
https://drive.google.com/drive/folders/1nkLgo5tSFsxOTCa5EMWZlezDFi8AyaDq?usp=sharing
Thanks for helping
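One hedged way to merge the two snippets is to keep the folder loop from the PDF-to-CSV code and run the column adjuster on each CSV it produces. The encoding, sheet name and output naming below are carried over from the snippets above and may need adjusting:

import glob
import os

import pandas as pd
import tabula
from UliPlot.XLSX import auto_adjust_xlsx_column_width

path = "/Users/username/Downloads/"
file_encoding = "cp1252"  # assumption carried over from the adjuster snippet

for filepath in glob.glob(path + "*.pdf"):
    name = os.path.basename(filepath)

    # Step 1: convert the PDF to CSV (as in the first snippet).
    csv_path = path + name + ".csv"
    tabula.convert_into(input_path=filepath, output_path=csv_path, pages="all")

    # Step 2: load that CSV and export an XLSX with auto-adjusted column widths.
    df = pd.read_csv(csv_path, encoding=file_encoding)
    xlsx_path = path + name + ".xlsx"
    with pd.ExcelWriter(xlsx_path) as writer:
        df.to_excel(writer, sheet_name="MySheet")
        auto_adjust_xlsx_column_width(df, writer, sheet_name="MySheet", margin=0)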

Concatenating Excel and CSV files

I've been asked to compile data files into one Excel spreadsheet using Python, but they are all either Excel files or CSVs. I'm trying to use the following code:
import glob, os
import shutil
import pandas as pd
par_csv = set(glob.glob("*Light*")) - set(glob.glob("*all*")) - set(glob.glob("*Untitled"))
df = pd.DataFrame()
for file in par_csv:
    print(file)
    df = pd.concat([df, pd.read(file)])
Is there a way I can use the pd.concat function to read the files in more than one format (i.e. both xlsx and csv), instead of one or the other?
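pd.concat itself does not read files; it only joins DataFrames that are already in memory. A minimal sketch, assuming the same glob patterns as above, is to pick the reader per file extension and concatenate once at the end:

import glob
import os

import pandas as pd

# Files selected as in the snippet above (minus the excluded patterns).
par_csv = set(glob.glob("*Light*")) - set(glob.glob("*all*")) - set(glob.glob("*Untitled"))

frames = []
for file in par_csv:
    ext = os.path.splitext(file)[1].lower()
    if ext == ".csv":
        frames.append(pd.read_csv(file))      # CSV reader
    elif ext in (".xlsx", ".xls"):
        frames.append(pd.read_excel(file))    # Excel reader
    # Other extensions are skipped.

# A single concat at the end is cheaper than concatenating inside the loop.
df = pd.concat(frames, ignore_index=True)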

Pandas Create Excel with Table formatted as a Table

I have a .csv file that I am converting into a table format using the following Python script. In order to make this useful, I need to create a table within the Excel file that holds the data (actually formatted as a table: Insert > Table). Is this possible within Python? I feel like it should be relatively easy, but I can't find anything on the internet.
The idea here is that the python takes the csv file, converts it to xlsx with a table embedded on sheet1, and then moves it to the correct folder.
import os
import shutil
import pandas as pd
src = r"C:\Users\xxxx\Python\filename.csv"
src2 = r"C:\Users\xxxx\Python\filename.xlsx"
read_file = pd.read_csv (src) - convert to Excel
read_file.to_excel (src2, index = None, header=True)
dest = path = r"C:\Users\xxxx\Python\repository"
destination = shutil.copy2(src2, dest)
Edit: I got sidetracked by the original MWE.
This should work, using xlsxwriter:
import pandas as pd
import xlsxwriter
# Dummy data
my_data = {"list1": [1, 2, 3, 4], "list2": "a b c d".split()}
df1 = pd.DataFrame(my_data)
df1.to_csv("myfile.csv", index=False)
df2 = pd.read_csv("myfile.csv")
# List of column name dictionaries
headers = [{"header": i} for i in list(df2.columns)]
# Create and populate the workbook
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet1 = workbook.add_worksheet()
worksheet1.add_table(0, 0, len(df2), len(df2.columns) - 1,
                     {"columns": headers, "data": df2.values.tolist()})
workbook.close()
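To complete the workflow from the question (convert, embed the table, then move the result to the repository folder), the generated file can then be copied with shutil, reusing the destination path from the original snippet (adjust as needed):

import shutil

dest = r"C:\Users\xxxx\Python\repository"  # destination from the original snippet
shutil.copy2("output.xlsx", dest)          # copy the table-formatted workbook over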

How to handle Excel files (xlsx, xls) with Excel formulas (macros) in Python

I need to pass inputs from Input_data.xls, in iteration, to an existing xls file which has special functions (formulas) at various cells, using Python 3.6. These functions change the primary data in the existing xls according to the inputs. But when xlrd opens the file it doesn't import the cell formulas, and when the file is saved with modifications it writes the object name instead of its value.
Python code:
import xlrd
import xlwt
import xlutils
from xlrd import open_workbook
from xlutils.copy import copy
import os.path
book = xlrd.open_workbook('input_data.xlsx')
sheet0 = book.sheet_by_index(0)
for i in range(sheet0.nrows):
    st1 = sheet0.row_values(i + 1)
    TIP = [st1[0]]
    OOIPAN_IP = [st1[1]]
    NM = [st1[2]]
    book1 = xlrd.open_workbook('primary_data.xls')
    wb = copy(book1)
    w_sheet = wb.get_sheet(0)
    w_sheet.write(1, 0, 'TIP')
    w_sheet.write(1, 1, 'OIP')
    w_sheet.write(1, 2, 'NM')
    wb.save('ipsectemp.xls')
It writes the object name in the cells instead of the object's value:
input 1    input 2    input 3
st1[0]     st1[1]     st1[2]
Which module can help to open/read/write a workbook with its Excel functions (macros) in Python?
Luckily, I found the code below, which can fetch the Excel cell contents; the openpyxl module does a good job using cell values:
from openpyxl import load_workbook

book = load_workbook('primary_data.xlsx')  # open ipsec file with desired inputs
sheet0 = book.get_sheet_by_name('Sheet1')
for row in range(2, sheet0.max_row + 1):
    for column in "A":  # here add or reduce the columns
        cell_name = "{}{}".format(column, row)
        textlt = sheet0[cell_name].value
        print(textlt)
Information extracted from this answer: "openpyxl - read only one column from excel file in python?" (I used the information the other way around).
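A hedged note on the formulas themselves: openpyxl normally returns the formula strings. Loading the workbook with data_only=True instead returns the values Excel last calculated for those formulas (the file must have been saved by Excel at least once so the cached results exist). A minimal sketch, reusing the file and sheet names from the snippets above:

from openpyxl import load_workbook

# data_only=True returns the cached results of formulas instead of the
# formula strings themselves.
book = load_workbook('primary_data.xlsx', data_only=True)
sheet0 = book['Sheet1']

for row in range(2, sheet0.max_row + 1):
    for column in "A":  # add or reduce the columns here
        cell_name = "{}{}".format(column, row)
        print(sheet0[cell_name].value)  # computed value, not "=SUM(...)"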

Extracting and manipulating data from excel worksheet with python

Scenario: I am trying to come up with a python code that reads all the workbooks in a given folder, gets the data of each and puts it to a single data frame (each workbook becomes a dataframe, so I can manipulate them individually).
Issue1: With this code, even though I am using the proper path and file types, I keep getting the error:
File "<ipython-input-3-2a450c707fbe>", line 14, in <module>
f = open(file,'r')
FileNotFoundError: [Errno 2] No such file or directory: '(1)Copy of Preisanfrage_17112016.xlsx'
Issue2: The reason for me to create different data frames is that each workbook has an individual format (rows are my identifiers and columns are dates). My problem is that some of these workbooks have data on a sheet named "Closing", or "Opening" or the name is not specified. So I will try to configure each data frame individually and them join them afterwards.
Issue3: Considering the final output once the data frame data is already unified, my objective is to output them in a format like:
date 1 identifier 1 value
date 1 identifier 2 value
date 1 identifier 3 value
date 1 identifier 4 value
date 2 identifier 1 value
date 2 identifier 4 value
date 2 identifier 5 value
Obs1: For the output, not all dates have the same array of identifiers.
Question 1: Any ideas why the code is yielding this error? Is there a better way to extract data from excel?
Question 2: Is it possible to create a unique dataframe for each worksheet? Is this a good practice?
Question 3: Can I do this type of output using a loop? Is this a good practice?
Obs2: I don't know how relevant this is, but I am using Python 3.6 with Anaconda.
Code so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import glob, os
import datetime as dt
from datetime import datetime
import matplotlib as mpl
directory = os.path.join("C:\\", "Users\\Dgms\\Desktop\\final 2")
for root, dirs, files in os.walk(directory):
    for file in files:
        print(file)
        f = open(file, 'r')
        df1 = pd.read_excel(file)
I think you do not need your open() call. And I would store the DataFrames in a list; you can then either use pd.concat(list_of_dfs) or make some manual changes.
list_of_dfs = []
for root, dirs, files in os.walk(directory):
    for file in files:
        f = os.path.join(root, file)
        print(f)
        list_of_dfs.append(pd.read_excel(f))
or using glob:
import glob
import os

list_of_dfs = []
for file in glob.iglob(os.path.join(directory, '*.xlsx')):
    print(file)
    list_of_dfs.append(pd.read_excel(file))
Or, as jackie suggests, you can read specific sheets: list_of_dfs.append(pd.concat([pd.read_excel(file, 'Opening'), pd.read_excel(file, 'Closing')])). If only one of them is available, you could even change to:
try:
    list_of_dfs.append(pd.read_excel(file, 'Opening'))
except:
    pass
try:
    list_of_dfs.append(pd.read_excel(file, 'Closing'))
except:
    pass
(Of course, you should specify the exact error, but I can't test that at the moment.)
Issue 1: If you are using an IDE or Jupyter, use the absolute path to the file, or add the project folder to the system path (a workaround, not recommended).
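For the long output format described in Issue 3, a hedged sketch: once the per-workbook DataFrames are combined, melt reshapes the date columns into (date, identifier, value) rows, and dropna removes the dates for which an identifier has no value. The "identifier" column name is an assumption, since the real layouts differ per workbook:

import pandas as pd

# Assumed shape: identifiers in a column called "identifier",
# the remaining columns are dates (as described in the question).
combined = pd.concat(list_of_dfs, ignore_index=True)

long_format = (combined
               .melt(id_vars="identifier", var_name="date", value_name="value")
               .dropna(subset=["value"])   # not all dates have all identifiers
               .sort_values(["date", "identifier"]))

print(long_format[["date", "identifier", "value"]])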
