Someone helped me with a program that converts .PDF files to .CSV, but I would like to add automatic column-width adjustment to it.
Mass PDF to CSV Code:
import os
import glob
import tabula
path="/Users/username/Downloads/"
for filepath in glob.glob(path+'*.pdf'):
    name=os.path.basename(filepath)
    tabula.convert_into(input_path=filepath,
                        output_path=path+name+".csv",
                        pages="all")
Auto Column Adjuster Code:
import pandas as pd
from UliPlot.XLSX import auto_adjust_xlsx_column_width
# Load example dataset
file_encoding = "cp1252"
df = pd.read_csv("/Users/username/Downloads/", encoding=file_encoding)
# df.set_index("Timestamp", inplace=True)
# Export dataset to XLSX
with pd.ExcelWriter("example.xlsx") as writer:
    df.to_excel(writer, sheet_name="MySheet")
    auto_adjust_xlsx_column_width(df, writer, sheet_name="MySheet", margin=0)
If these two programs can be merged, it would save me from manually adjusting every file. Note that the PDF to CSV code takes a folder as input, whereas the Auto Column Adjuster code takes a single file.
Link to an example of my datasets:
https://drive.google.com/drive/folders/1nkLgo5tSFsxOTCa5EMWZlezDFi8AyaDq?usp=sharing
Thanks for helping
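One way the two scripts could be merged is sketched below: loop over the folder once, convert each PDF to CSV, then load that CSV and write a width-adjusted XLSX next to it. The function names `output_paths` and `convert_folder` are made up for illustration. The `cp1252` encoding, the sheet name and the `auto_adjust_xlsx_column_width` call are taken as-is from the question. This assumes tabula writes a CSV that `pd.read_csv` can parse, which may not hold when a PDF contains several tables with different column counts.

```python
from pathlib import Path


def output_paths(pdf_path):
    """Return the (csv, xlsx) paths that sit next to a given PDF."""
    pdf_path = Path(pdf_path)
    return pdf_path.with_suffix(".csv"), pdf_path.with_suffix(".xlsx")


def convert_folder(folder, encoding="cp1252", sheet_name="MySheet"):
    """Convert every PDF in `folder` to CSV, then to a width-adjusted XLSX."""
    # Third-party imports live here so output_paths() works without them.
    import pandas as pd
    import tabula
    from UliPlot.XLSX import auto_adjust_xlsx_column_width

    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        csv_path, xlsx_path = output_paths(pdf_path)

        # Step 1: PDF -> CSV (same call as in the question).
        tabula.convert_into(input_path=str(pdf_path),
                            output_path=str(csv_path),
                            output_format="csv",
                            pages="all")

        # Step 2: CSV -> XLSX with adjusted column widths.
        df = pd.read_csv(csv_path, encoding=encoding)
        with pd.ExcelWriter(xlsx_path) as writer:
            df.to_excel(writer, sheet_name=sheet_name)
            auto_adjust_xlsx_column_width(df, writer,
                                          sheet_name=sheet_name, margin=0)
```

Calling `convert_folder("/Users/username/Downloads/")` would then handle the whole folder in one run. Using `with_suffix` also gives `report.csv` rather than `report.pdf.csv` as the output name.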
Related
I have looped through two HTML files (the two files have some similar columns/headers, one file has additional columns, screenshots included below). I wanted to load everything into a single dataframe and then export it to excel or csv. But, when I export, I only see records from one HTML file. In my case, I am only seeing the records from the collection_item_shorterned.html file.
Screenshot of HTML files:
Screenshot of printout when running program:
Screenshot of the excel output (there should be 7 records total):
Code:
import os
import pandas as pd
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 5000)
yelp_path = os.path.join('..','data_files','yelp\\',)
print(f"printing path: {yelp_path}")
dir = os.listdir(yelp_path)
print(f"print dir: {dir}")
for i in range(len(dir)):
    data = pd.read_html(yelp_path + dir[i])
    dataframe = pd.concat(data)
    print(dataframe)
dataframe.to_excel("combined_yelp_data.xlsx", index=False)
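A likely cause is that `dataframe` is reassigned on every pass through the loop, so only the tables from the last file survive to the export. A minimal sketch of one fix is to collect the tables from all files first and concatenate once. The helper names `read_tables` and `combine` are hypothetical, and the sketch assumes every file in the folder that should be read ends in `.html`.

```python
import os

import pandas as pd


def read_tables(folder):
    """Collect every table from every .html file in `folder`."""
    frames = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".html"):
            # read_html returns a list of DataFrames, one per <table>.
            frames.extend(pd.read_html(os.path.join(folder, name)))
    return frames


def combine(frames):
    """Stack the frames; columns missing from a file are filled with NaN."""
    return pd.concat(frames, ignore_index=True, sort=False)
```

With these, `combine(read_tables(yelp_path)).to_excel("combined_yelp_data.xlsx", index=False)` would write the rows from both files into one sheet.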
Overview:
THIS PROGRAM/FUNCTION READS ALL INDIVIDUAL METRICS FILES AND CREATES AN EXCEL REPORT WITH ALL METRICS.
import glob, os, sys
import csv
import xlsxwriter
from pathlib import Path
import pandas as pd
from openpyxl import Workbook
#Output file name and location
#format for header object.
# Write the column headers with the defined format.
for col_number, value in enumerate(f3.columns.values):
    worksheet_object.write(0, col_number + 1, value,
                           header_format_object)
writer_object.save()
Output in Terminal (Success)
PS C:\Users\Python-1> &
Actual output of file in Folder:
C:\Users\Desktop\Cobol\Outputs
Actual Output in XLSX file
Problem: The results are good; however, the S.No column in the XLSX file (the running count of programs) starts at 0 instead of 1.
S.No
0
1
Have you tried reindexing?
Set the index before writing the file.
For example (with NumPy imported as np):
f3.index = np.arange(1, len(f3) + 1)
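As a self-contained illustration of that line, here is a small sketch on stand-in data. The `Program` column and its values are invented for the example; only `f3` and the `S.No` label come from the question.

```python
import numpy as np
import pandas as pd

# Stand-in for the real metrics frame (hypothetical data).
f3 = pd.DataFrame({"Program": ["PROG1", "PROG2", "PROG3"]})

# Replace the default 0-based index with a 1-based one.
f3.index = np.arange(1, len(f3) + 1)

# Optional: name the index so it appears as the S.No header on export.
f3.index.name = "S.No"
```

Because `to_excel` writes the index as the first column by default, the exported S.No column should then start at 1.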
I am splitting an .xlsm file with multiple sheets into CSV files, one per sheet. I want to save only the sheets whose names contain the keyword "Robot" or "Auto". How can I do that? Currently it saves every sheet as a CSV file. Here is the code I am using:
import pandas as pd
xl = pd.ExcelFile('Sample_File.xlsm')
for sheet in xl.sheet_names:
    df = pd.read_excel(xl,sheet_name=sheet)
    df.to_csv(f"{sheet}.csv",index=False)
Can you try this?
import pandas as pd
import re
xl = pd.ExcelFile('Sample_File.xlsm')
for sheet in xl.sheet_names:
    if re.search('Robot|Auto', sheet):
        df = pd.read_excel(xl,sheet_name=sheet)
        df.to_csv(f"{sheet}.csv",index=False)
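If the sheet names are not consistently capitalised, a plain substring check that ignores case may be enough. The sketch below is only an illustration; the helper name `sheets_to_export` and the sample sheet names are hypothetical.

```python
def sheets_to_export(sheet_names, keywords=("Robot", "Auto")):
    """Return the sheet names that contain any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [s for s in sheet_names if any(k in s.lower() for k in lowered)]


# Example with made-up sheet names:
selected = sheets_to_export(["Robot_1", "Summary", "auto-run", "Data"])
```

The loop above could then iterate over `sheets_to_export(xl.sheet_names)` instead of `xl.sheet_names`. Note that this is a substring match, so a sheet called "Automatic" would also be selected.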
I've been asked to compile data files into one Excel spreadsheet using Python, but they are a mix of Excel files and CSVs. I'm trying to use the following code:
import glob, os
import shutil
import pandas as pd
par = set(glob.glob("*Light*")) - set(glob.glob("*all*")) - set(glob.glob("*Untitled"))
par
df = pd.DataFrame()
for file in par:
    print(file)
    df = pd.concat([df, pd.read(file)])
Is there a way I can use the pd.concat function to read the files in more than one format (i.e. both xlsx and csv), instead of one or the other?
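pandas has no generic `pd.read`, and `pd.concat` only joins frames that are already loaded, so one possible approach is to pick the reader from the file extension before concatenating. The sketch below assumes the files end in `.csv`, `.xlsx` or `.xls`; `READERS`, `read_any` and `combine_files` are names invented for the example.

```python
from pathlib import Path

import pandas as pd

# Map each supported extension to the pandas reader that handles it.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
}


def read_any(path):
    """Load one file with the reader that matches its extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path}") from None
    return reader(path)


def combine_files(paths):
    """Read every file and stack the results into one DataFrame."""
    return pd.concat([read_any(p) for p in paths], ignore_index=True)
```

With the set from the question, `combine_files(sorted(par))` would replace the loop. Building the list first and concatenating once also avoids growing `df` inside the loop.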
I have a .csv file that I am converting to .xlsx using the following Python script. To make this useful, I need the data to sit in an actual Excel table (the kind created by Insert > Table). Is this possible from Python? I feel like it should be relatively easy, but I can't find anything on the internet.
The idea here is that the Python script takes the CSV file, converts it to XLSX with a table embedded on Sheet1, and then moves it to the correct folder.
import os
import shutil
import pandas as pd
src = r"C:\Users\xxxx\Python\filename.csv"
src2 = r"C:\Users\xxxx\Python\filename.xlsx"
read_file = pd.read_csv(src)  # convert to Excel
read_file.to_excel (src2, index = None, header=True)
dest = path = r"C:\Users\xxxx\Python\repository"
destination = shutil.copy2(src2, dest)
Edit: I got sidetracked by the original MWE.
This should work, using xlsxwriter:
import pandas as pd
import xlsxwriter
#Dummy data
my_data={"list1":[1,2,3,4], "list2":"a b c d".split()}
df1=pd.DataFrame(my_data)
df1.to_csv("myfile.csv", index=False)
df2=pd.read_csv("myfile.csv")
#List of column name dictionaries
headers=[{"header" : i} for i in list(df2.columns)]
#Create and populate workbook
workbook=xlsxwriter.Workbook('output.xlsx')
worksheet1=workbook.add_worksheet()
worksheet1.add_table(0, 0, len(df2), len(df2.columns)-1, {"columns":headers, "data":df2.values.tolist()})
workbook.close()
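Applied to the workflow in the question, the same `add_table` call could be wrapped in a function that goes straight from the CSV to the XLSX. This is a sketch only: `table_bounds` and `csv_to_xlsx_table` are hypothetical names, and it assumes the CSV has at least one data row and no empty cells.

```python
def table_bounds(n_rows, n_cols):
    """Zero-based (first_row, first_col, last_row, last_col) for a table
    with one header row followed by n_rows data rows."""
    return 0, 0, n_rows, n_cols - 1


def csv_to_xlsx_table(src_csv, dest_xlsx):
    """Write the contents of a CSV into an Excel table on the first sheet."""
    # Third-party imports live here so table_bounds() works without them.
    import pandas as pd
    import xlsxwriter

    df = pd.read_csv(src_csv)
    headers = [{"header": str(name)} for name in df.columns]

    workbook = xlsxwriter.Workbook(dest_xlsx)
    worksheet = workbook.add_worksheet()
    worksheet.add_table(*table_bounds(len(df), len(df.columns)),
                        {"columns": headers, "data": df.values.tolist()})
    workbook.close()
    return dest_xlsx
```

The returned path can then be handed to `shutil.copy2(csv_to_xlsx_table(src, src2), dest)`, using the `src`, `src2` and `dest` variables from the question.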