Converting a tab-separated .txt file to .xlsx via Python 3 - python

Level: super-noob
I have been trying to convert a .txt file to .xlsx using a combination of csv & openpyxl & xlsxwriter modules.
My first column is an identifier that should be saved as a string.
Columns 2-21 are all numbers.
How can I load my .txt file, identify the proper columns as numbers, and then save the file as an .xlsx?
So far I'm at:
import csv
import openpyxl
input_file = "C:/1.txt"
output_file = "C:/1.xlsx"
new_wb = openpyxl.Workbook()
ws = new_wb.worksheets[0]
read_file = csv.reader(input_file, delimitter="\t")
I have read about people using enumerate to run through an Excel file, but I'm not sure exactly how this function works. If someone can help me here, it would be appreciated!

You need to iterate over each row in the CSV file and append that row to the Excel worksheet.
This could be helpful:
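import csv
import openpyxl
input_file = 'path/to/inputfile.txt'
output_file = 'path/to/outputfile.xlsx'  # openpyxl writes the .xlsx format
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
# Python 3: open in text mode; newline='' lets the csv module handle line endings
with open(input_file, 'r', newline='') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)  # every value is still a string at this point
wb.save(output_file)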
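The loop above stores every cell as text, while the question asks for columns 2-21 to be saved as numbers. A minimal sketch of one way to do that follows, assuming the numeric columns parse with float() and that the file has no header row; typed_rows is a hypothetical helper name, not part of csv or openpyxl. It also shows enumerate, which the question mentions: it simply pairs each row with a running index, used here to report the line number of a bad value.

```python
import csv
import io


def typed_rows(lines, delimiter="\t"):
    """Yield rows with column 1 kept as a string and the rest converted to float.

    `lines` is any iterable of text lines, e.g. a file opened with
    open(path, "r", newline="").
    """
    reader = csv.reader(lines, delimiter=delimiter)
    # enumerate yields (1, first_row), (2, second_row), ...
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        try:
            numbers = [float(value) for value in row[1:]]
        except ValueError as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
        yield [row[0]] + numbers


# Small in-memory demonstration instead of a real file.
sample = io.StringIO("id01\t1\t2.5\nid02\t3\t4\n")
rows = list(typed_rows(sample))
print(rows)
```

In the answer's loop, iterating over typed_rows(data) instead of reader and appending each row with ws.append(row) should make openpyxl write columns 2-21 as numeric cells, since it stores Python floats as numbers.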

Related

How to copy data from txt file and paste to XLSX as value with Python?

Text file: simple.txt, which contains date, name, qty, order id.
I need to copy the data from the .txt file and paste it into an .xlsx file as values.
How is this possible? Which package could handle this process in Python?
openpyxl? pandas? Could you please give some example code?
My code, which does not paste and save the data as values:
import csv
import openpyxl
# raw strings, so that backslashes in Windows paths are not read as escapes
input_file = r'C:\Users\mike\Documents\rep\LX02.txt'
output_file = r'C:\Users\mike\Documents\rep\LX02.xlsx'
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_file, 'r') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)
wb.save(output_file)
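With pandas, combining pandas.read_csv and pandas.DataFrame.to_excel lets you store the content of a comma-delimited .txt file in an .xlsx spreadsheet by running the code below (for a tab-delimited file, pass sep='\t' to read_csv):
# pip install pandas openpyxl  (openpyxl is the engine pandas uses to write .xlsx)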
import pandas as pd
input_file = r'C:\Users\mbalog\Documents\FGI\LX02.txt'
output_file = r'C:\Users\mbalog\Documents\FGI\LX02.xlsx'
pd.read_csv(input_file).to_excel(output_file, index=False)
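Whether read_csv needs a comma or a tab as separator depends on the file, and the question's code and the answer above assume different ones. As a hedged sketch, the standard library's csv.Sniffer can guess the delimiter from a sample of the text; detect_delimiter is a hypothetical helper name and the sample data below is invented for illustration.

```python
import csv


def detect_delimiter(sample, candidates=",\t"):
    """Guess which of the candidate delimiters a text sample uses.

    csv.Sniffer raises csv.Error when it cannot decide.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Invented samples with the columns named in the question.
comma_sample = "date,name,qty,order id\n2023-01-05,widget,3,1001\n"
tab_sample = "date\tname\tqty\torder id\n2023-01-05\twidget\t3\t1001\n"

print(repr(detect_delimiter(comma_sample)))
print(repr(detect_delimiter(tab_sample)))
```

The detected value can be passed straight on as the sep argument of pd.read_csv, with the sample taken from the first few lines of the input file.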

Moving data from multiple csv files to xlsx files

I have a folder that contains 2 more folders. Inside each folder is a csv and xlsx file.
Ex:
test (folder 1)
    test.csv
    test.xlsx
test2 (folder 2)
    test2.csv
    test2.xlsx
I have a working script that moves data from a csv file to a xlsx file.
Say ‘test.csv’ contains the following data:
A            B
test.com     yes
test.com/dl  no
1.1.1.1      yes
The code below will move that data into test.xlsx:
from openpyxl import load_workbook
import csv
wb = load_workbook("D:\\local\\test\\test\\test.xlsx")
ws = wb.active
with open("D:\\local\\test\\test\\test.csv", 'r') as f:
    for row in csv.reader(f):
        ws.append(row)
wb.save("D:\\local\\test\\test\\test.xlsx")
Is there an easy way to move all data from ‘test.csv’ to ‘test.xlsx’ and ‘test2.csv’ to ‘test2.xlsx’ at once? The names of the csv and xlsx files will not always be the same but the location will.
I have tried the following but it returns a traceback error:
from openpyxl import load_workbook
import csv
wb = load_workbook("D:\\local\\test\\{}\\{}.xlsx")
ws = wb.active
with open("D:\\local\\test\\{}\\{}.csv", 'r') as f:
    for row in csv.reader(f):
        ws.append(row)
wb.save("D:\\local\\test\\{}\\{}.xlsx")
Thanks!
Assuming that the .xlsx files already exist and are empty, you can use the code below to copy the content of multiple .csv files to those .xlsx files (that have the same stem/filename).
import os
from pathlib import Path
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
directory = r'D:\local\test'
# visit every .csv that sits one folder below the top-level directory
for file in Path(directory).glob('*/*.csv'):
    df = pd.read_csv(file, encoding='utf-8-sig')
    # same folder and stem as the .csv, but with the .xlsx extension
    excel_path = os.path.splitext(file)[0]+'.xlsx'
    wb = load_workbook(excel_path)
    ws = wb.active
    for r in dataframe_to_rows(df, index=False, header=True):
        ws.append(r)
    wb.save(excel_path)
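The answer above relies on each .csv sharing its stem with the .xlsx next to it. If the two names can differ within a folder, one option is to pair them by location instead. The sketch below assumes every subfolder holds exactly one .csv and one .xlsx; pair_csv_with_xlsx is a hypothetical helper, and the demonstration builds a throwaway folder tree with invented file names rather than touching D:\local\test.

```python
import tempfile
from pathlib import Path


def pair_csv_with_xlsx(root):
    """Return (csv_path, xlsx_path) pairs, one per subfolder of `root`.

    Subfolders that do not contain exactly one .csv and one .xlsx are skipped.
    """
    pairs = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        csv_files = sorted(folder.glob("*.csv"))
        xlsx_files = sorted(folder.glob("*.xlsx"))
        if len(csv_files) == 1 and len(xlsx_files) == 1:
            pairs.append((csv_files[0], xlsx_files[0]))
    return pairs


# Demonstration on a temporary tree with differently named files.
with tempfile.TemporaryDirectory() as tmp:
    for folder, csv_name, xlsx_name in [
        ("test", "data.csv", "report.xlsx"),
        ("test2", "test2.csv", "test2.xlsx"),
    ]:
        sub = Path(tmp, folder)
        sub.mkdir()
        (sub / csv_name).touch()
        (sub / xlsx_name).touch()
    found = [(c.name, x.name) for c, x in pair_csv_with_xlsx(tmp)]

print(found)
```

Each pair can then be fed to the loop from the answer: read csv_path and append its rows to the workbook loaded from xlsx_path.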

Python - Convert .txt file to .xls or .xlsx

I have data that came as a .data file, which I converted to .txt because Microsoft Excel did not fully load it when I opened it there. There are over 2 million rows.
For this reason, I decided to try converting the .txt to .xls or .xlsx using Python with this script:
import csv
import openpyxl
input_file = 'path/to/inputfile.txt'
output_file = 'path/to/outputfile.xls'
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_file, 'rb') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)
wb.save(output_file)
but I am getting this error at the line for row in reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
You have to set the correct mode in the second parameter when opening the file.
With rb you are opening it in binary mode, but here you should write r to use text mode.
So your code should be:
import csv
import openpyxl
input_file = 'path/to/inputfile.txt'
output_file = 'path/to/outputfile.xlsx'  # openpyxl writes the .xlsx format
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_file, 'r', newline='') as data: # read in text mode
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)
wb.save(output_file)
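The difference between the two modes can be reproduced without a file. In this small sketch, io.BytesIO stands in for a file opened with 'rb' and io.StringIO for one opened with 'r'; the sample data is invented.

```python
import csv
import io

binary_file = io.BytesIO(b"a\tb\n1\t2\n")   # behaves like open(path, 'rb')
text_file = io.StringIO("a\tb\n1\t2\n")     # behaves like open(path, 'r')

# Iterating the binary stream hands bytes to csv.reader, which it rejects.
try:
    list(csv.reader(binary_file, delimiter="\t"))
    binary_error = None
except csv.Error as exc:
    binary_error = str(exc)

# The text stream yields str lines, so parsing works.
text_rows = list(csv.reader(text_file, delimiter="\t"))

print(binary_error)
print(text_rows)
```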
As already mentioned in a comment, Excel is not suitable for this amount of data: a worksheet is limited to 1,048,576 rows, and Excel gets quite slow even below that. You should really try to import the data as CSV or directly as TSV.
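If the data has to end up in Excel regardless, the row limit means it must be spread over several worksheets or files. Below is a minimal sketch of the splitting step, assuming the rows come from any iterable such as csv.reader; chunk_rows is a hypothetical helper, and the demonstration uses a tiny chunk size so the result is easy to inspect.

```python
from itertools import islice

EXCEL_MAX_ROWS = 1_048_576  # per-worksheet row limit in .xlsx


def chunk_rows(rows, max_rows=EXCEL_MAX_ROWS):
    """Yield successive lists of at most `max_rows` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, max_rows))
        if not chunk:
            return
        yield chunk


# Tiny demonstration: five rows split into chunks of two.
chunks = list(chunk_rows([[1], [2], [3], [4], [5]], max_rows=2))
print(chunks)
```

Each chunk could go to its own worksheet (for example one created with wb.create_sheet() in openpyxl) or to its own output file; with over 2 million rows that means at least two.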

How to convert from CSV to an existing Excel file?

I am trying to convert a CSV file into an existing Excel file named 'bench_configuration'. This Excel file contains a few sheets. I want to write the CSV file into a sheet named 'Setup_Loss' (which already exists in the Excel file).
I tried to use:
read_file = pd.read_csv('new_names.csv',sep='\t')
read_file.to_excel('bench_configuration.xlsx', index=None, header=True)
But it's creating a new Excel file.
You need an instance of ExcelWriter opened in append mode (mode='a') and use it to write the content; leaving the with block saves the file:
import pandas as pd
read_file = pd.read_csv('new_names.csv', sep='\t')
# mode='a' keeps the other sheets; if_sheet_exists='replace' (pandas 1.3+)
# overwrites the existing 'Setup_Loss' sheet instead of raising an error
with pd.ExcelWriter('bench_configuration.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    read_file.to_excel(writer, sheet_name='Setup_Loss', index=False)

Text contents of PDF to CSV file conversion - how to?

I want to take a PDF file as input and produce a CSV file as output, so that all the textual data in the PDF ends up in the CSV file. But I don't understand how to do this. I need your help as soon as possible, as I've tried but couldn't get it to work.
What I've done is use a library called tabula-py, which converts PDF to CSV. It does create a CSV file, but no contents are copied into it from the PDF.
Here's the code:
from tabula import convert_into,read_pdf
import tabula
df = tabula.read_pdf("crimestory.pdf", spreadsheet=True,
pages='all',output_format="csv")
df.to_csv('crimestoryy.csv', index=False)
The output should be a CSV file containing the data.
What I am getting is a blank CSV file.
I found the answer to this question on my own.
To tackle this issue, I converted the PDF file into a text file and then converted that text file to a CSV file. Here's my code.
conversion.py
import os.path
import csv
import pdftotext
# Load your PDF
with open("crimestory.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)
# Save all text to a txt file.
with open('crimestory.txt', 'w') as f:
    f.write("\n\n".join(pdf))
save_path = "/home/mayureshk/PycharmProjects/NLP/"
completeName_in = os.path.join(save_path, 'crimestory' + '.txt')
completeName_out = os.path.join(save_path, 'crimestoryycsv' + '.csv')
# Re-read the text as comma-separated rows and write them out as CSV
with open(completeName_in, newline='') as file1, open(completeName_out, 'w', newline='') as file2:
    In_text = csv.reader(file1, delimiter=',')
    out_csv = csv.writer(file2)
    out_csv.writerows(In_text)
Try this; hopefully it works:
import tabula
# convert PDF into CSV
tabula.convert_into("crimestory.pdf", "crimestory.csv", output_format="csv", pages='all')
or
df = tabula.read_pdf("crimestory.pdf", encoding='utf-8', spreadsheet=True, pages='all')
df.to_csv('crimestory.csv', encoding='utf-8')
or
from tabula import read_pdf
df = read_pdf("crimestory.pdf")
df
#make sure df displays your pdf contents in the output
from tabula import convert_into
convert_into("crimestory.pdf", "crimestory.csv", output_format="csv")
!cat crimestory.csv
