Python - Convert .txt file to .xls or .xlsx - python

I have data that came as .data files, which I converted to .txt files because Microsoft Excel would not fully load them. There are over 2 million rows.
For this reason, I decided to try converting the .txt file to .xls or .xlsx using Python with this script:
import csv
import openpyxl
input_file = 'path/to/inputfile.txt'
output_file = 'path/to/outputfile.xls'
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_file, 'rb') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)
wb.save(output_file)
but I am getting this error at the line for row in reader: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

You have to set the correct mode in the second parameter when opening the file.
With rb you are opening it in binary mode, but csv.reader needs strings, so you should use r to open it in text mode.
So your code should be:
import csv
import openpyxl
input_file = 'path/to/inputfile.txt'
output_file = 'path/to/outputfile.xlsx'  # openpyxl writes the .xlsx format
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_file, 'r', newline='') as data: # read in text mode
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)
wb.save(output_file)
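To see the difference between the two modes in isolation, here is a minimal sketch that writes a small tab-separated file and reads it both ways. The temporary file and its contents are made up for illustration and are not from the question.

```python
import csv
import os
import tempfile

# Write a tiny tab-separated sample file (contents are illustrative only)
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', newline='') as sample:
    sample.write('id\tvalue\n1\t10\n')

# Binary mode: the file yields bytes, which csv.reader rejects
binary_error = None
with open(sample_path, 'rb') as data:
    try:
        next(csv.reader(data, delimiter='\t'))
    except csv.Error as exc:
        binary_error = str(exc)

# Text mode: the file yields str, so parsing works
with open(sample_path, 'r', newline='') as data:
    rows = list(csv.reader(data, delimiter='\t'))

os.remove(sample_path)
print(binary_error)
print(rows)
```

The first print shows the same kind of message as in the question, and the second shows the parsed rows as lists of strings.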
As already mentioned in a comment, Excel is not suitable for this amount of data: a worksheet is limited to 1,048,576 rows, and Excel gets quite slow well below that limit. You should really try to work with the data directly as CSV or TSV instead.
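If the data really has to end up in Excel, one possible workaround is to split the file into parts that each fit within the row limit. The sketch below does this with the csv module only; the function name split_tsv and the output_prefix naming scheme are hypothetical choices for illustration, and it assumes the file has no header row that needs to be repeated in each part.

```python
import csv
import itertools

EXCEL_MAX_ROWS = 1_048_576  # row limit of one Excel worksheet


def split_tsv(input_path, output_prefix, max_rows=EXCEL_MAX_ROWS):
    """Split a tab-separated file into parts of at most max_rows rows each.

    Returns the list of part file names that were written.
    """
    parts = []
    with open(input_path, 'r', newline='') as src:
        reader = csv.reader(src, delimiter='\t')
        for part_number in itertools.count(1):
            # Take the next block of rows; only one block is held in memory
            chunk = list(itertools.islice(reader, max_rows))
            if not chunk:
                break
            part_name = f'{output_prefix}_{part_number}.tsv'
            with open(part_name, 'w', newline='') as dst:
                csv.writer(dst, delimiter='\t').writerows(chunk)
            parts.append(part_name)
    return parts
```

Each part can then be opened or converted on its own. A call such as split_tsv('path/to/inputfile.txt', 'path/to/part') would write part_1.tsv, part_2.tsv and so on next to the input.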

Related

How to copy data from txt file and paste to XLSX as value with Python?

Text file: simple.txt, which contains date, name, qty, order id.
I need to copy the data from the txt file and paste it into an xlsx file as values.
How is this possible? Which Python package can handle this?
openpyxl? pandas? Could you please give some example code?
My code, which does not paste and save the data as values:
import csv
import openpyxl
input_file = r'C:\Users\mike\Documents\rep\LX02.txt'  # raw string so backslashes are not treated as escapes
output_file = r'C:\Users\mike\Documents\rep\LX02.xlsx'
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_file, 'r') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)
wb.save(output_file)
In pandas, by combining pandas.read_csv and pandas.DataFrame.to_excel, you can store the content of a comma-delimited .txt file in an .xlsx spreadsheet by running the code below:
# pip install pandas openpyxl  (to_excel needs an Excel engine such as openpyxl for .xlsx)
import pandas as pd
input_file = r'C:\Users\mbalog\Documents\FGI\LX02.txt'
output_file = r'C:\Users\mbalog\Documents\FGI\LX02.xlsx'
pd.read_csv(input_file).to_excel(output_file, index=False)

Read and write CSV file in Python

I'm trying to read sentences from a csv file, convert them to lowercase, and save them in another csv file.
import csv
import pprint
with open('dataset_elec_4000.csv') as f:
    with open('output.csv', 'w') as ff:
        data = f.read()
        data = data.lower
        writer = csv.writer(ff)
        writer.writerow(data)
but I got the error "_csv.Error: sequence expected". What should I do?
*I'm a beginner. Please be nice to me:)
You need to read over your input CSV row-by-row, and for each row, transform it, then write it out:
import csv
with open('output.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    with open('dataset_elec_4000.csv', newline='') as f_in:
        reader = csv.reader(f_in)
        # comment these two lines if no input header
        header = next(reader)
        writer.writerow(header)
        for row in reader:
            # row is sequence/list of cells, so...
            # select the cell with your sentence, I'm presuming it's the first cell (row[0])
            data = row[0]
            data = data.lower()
            # need to put data back into a "row"
            out_row = [data]
            writer.writerow(out_row)
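If the sentence is not always in the first cell, a variation is to lowercase every cell of every data row. This is only a sketch under that assumption; the function name lowercase_csv and the has_header flag are hypothetical and not part of the csv module.

```python
import csv


def lowercase_csv(input_path, output_path, has_header=True):
    """Copy a CSV file, lowercasing every cell of every data row."""
    with open(input_path, newline='') as f_in, \
         open(output_path, 'w', newline='') as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        if has_header:
            header = next(reader, None)  # None if the file is empty
            if header is not None:
                writer.writerow(header)  # header is copied unchanged
        for row in reader:
            writer.writerow([cell.lower() for cell in row])
```

For the files in the question this would be called as lowercase_csv('dataset_elec_4000.csv', 'output.csv'), passing has_header=False if the input has no header line.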
Python includes a module called csv for handling CSV files. Its reader function is used to read data from a CSV file. First, the CSV file is opened with the built-in open() function in 'r' mode (read mode), which returns a file object. That object is then passed to csv.reader(), which returns a reader object that iterates over the lines of the CSV file.
import csv
# opening the CSV file
with open('Giants.csv', mode='r') as file:
    # reading the CSV file
    csvFile = csv.reader(file)
    # displaying the contents of the CSV file
    for lines in csvFile:
        print(lines)

python: use CSV reader with single file extracted from tarfile

I am trying to use the Python CSV reader to read a CSV file that I extract from a .tar.gz file using Python's tarfile library.
I have this:
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    tarredCSV = tarFile.extractfile(file)
    reader = csv.reader(tarredCSV)
    next(reader) # skip header
    for row in reader:
        if row[3] not in CSVRows.values():
            CSVRows[row[3]] = row
All the files in the tar file are CSVs.
I am getting an exception on the first file, at the first next(reader) line:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
How do I open said file (without extracting the file then opening it)?
tarfile.extractfile returns an io.BufferedReader object, a bytes stream, and yet csv.reader expects a text stream. You can use io.TextIOWrapper to convert the bytes stream to a text stream instead:
import io
...
reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding='utf-8'))
You need to provide a file-like object to csv.reader.
Probably the best solution, since it does not have to consume a complete file at once, is this approach (thanks to blhsing and damon for suggesting it):
import csv
import io
import tarfile
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    csv_file = io.TextIOWrapper(tarFile.extractfile(file), encoding="utf-8")
    reader = csv.reader(csv_file)
    next(reader) # skip header
    for row in reader:
        print(row)
Alternatively, a possible solution from the question "Python3 working with csv files in tar files" would be:
import csv
import io
import tarfile
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    csv_file = io.StringIO(tarFile.extractfile(file).read().decode('utf-8'))
    reader = csv.reader(csv_file)
    next(reader) # skip header
    for row in reader:
        print(row)
Here an io.StringIO object is used to make csv.reader happy. However, this might not scale well for larger files contained in the tar, as each file is read into memory in one single step.
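One detail the snippets above do not handle: extractfile returns None for members that are not regular files, such as directories, and wrapping None would fail. A rough sketch of the streaming approach packaged as a generator, with that case skipped, could look like this. The name iter_tar_csv_rows is hypothetical, and UTF-8 is assumed as the encoding.

```python
import csv
import io
import tarfile


def iter_tar_csv_rows(tar_path, encoding='utf-8'):
    """Yield (member_name, row) for every data row of every CSV in the archive."""
    with tarfile.open(tar_path, mode='r') as archive:
        for member in archive:
            extracted = archive.extractfile(member)
            if extracted is None:
                # Directories and other non-file members have no data stream
                continue
            with io.TextIOWrapper(extracted, encoding=encoding, newline='') as text:
                reader = csv.reader(text)
                next(reader, None)  # skip header; tolerate empty files
                for row in reader:
                    yield member.name, row
```

The loop from the question could then iterate over iter_tar_csv_rows(tarFileName) and apply its own row[3] check to each row.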

error during writing data into csv file, ValueError: I/O operation on closed file

This is the code:
import pandas as pd
import csv
with open('reviews.csv') as myFile:
    reader = csv.reader(myFile)
with open('bow.csv','a',newline="") as file:
    handler= csv.writer(file)
    for rowdata in reader:
        handler.writerow({rowdata,'asd'})
Error is ValueError: I/O operation on closed file.
csv.reader() can only read from an open file. When you exit the first with block, myFile is automatically closed, so reader can't read from it any more.
You need to keep the input file open while you read from it.
import pandas as pd
import csv
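with open('reviews.csv') as myFile:
    reader = csv.reader(myFile)
    with open('bow.csv','a',newline="") as file:
        handler= csv.writer(file)
        for rowdata in reader:
            # {rowdata,'asd'} would raise TypeError (a list cannot be put in a set);
            # assuming the goal is to add 'asd' as an extra column
            handler.writerow(rowdata + ['asd'])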
You can also open multiple files in a single with statement, so you don't need to nest them.
with open('reviews.csv') as myFile, open('bow.csv','a',newline="") as file:
    reader = csv.reader(myFile)
    handler= csv.writer(file)
    for rowdata in reader:
        # assuming the goal is to add 'asd' as an extra column
        handler.writerow(rowdata + ['asd'])
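The same idea can be wrapped in a small helper so that both files stay open for exactly as long as rows are being copied. This is a sketch only: the function name append_column is hypothetical, and it assumes the intent is to add one extra cell to every row.

```python
import csv


def append_column(input_path, output_path, value):
    """Append rows from input_path to output_path with one extra cell each.

    Returns the number of rows written.
    """
    count = 0
    # Both files are opened in one with statement, so the reader's
    # file is still open while the loop runs
    with open(input_path, newline='') as src, \
         open(output_path, 'a', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(row + [value])
            count += 1
    return count
```

With the file names from the question, the call would be append_column('reviews.csv', 'bow.csv', 'asd').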

Converting .txt file with tab separation to xlsx via python3

Level: super-noob
I have been trying to convert a .txt file to .xlsx using a combination of csv & openpyxl & xlsxwriter modules.
My first column is an identity that should be saved as a string
Columns 2-21 are then all numbers.
How can I load my .txt file, identify the proper columns as numbers, and then save the file as an xlsx?
So far I'm at:
import csv
import openpyxl
input_file = "C:/1.txt"
output_file = "C:/1.xlsx"
new_wb = openpyxl.Workbook()
ws = new_wb.worksheets[0]
read_file = csv.reader(input_file, delimitter="\t")
I have read online about people using enumerate to run through an Excel file, but I'm not sure exactly how this function works. If someone can help me here, it would be appreciated!
You need to iterate over each row in the csv file and append that row to the Excel worksheet.
This could be helpful:
import csv
import openpyxl
input_file = 'path/to/inputfile.txt'
output_file = 'path/to/outputfile.xlsx'  # openpyxl writes the .xlsx format
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_file, 'r', newline='') as data:  # text mode; 'rb' fails on Python 3
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        ws.append(row)
wb.save(output_file)
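The code above appends every cell as text, because csv.reader always yields strings. For the numeric columns asked about in the question, one option is to convert each row before appending it. The helper below is a sketch under the assumption that the first cell should stay a string and all remaining cells should become floats where possible; the name convert_row is hypothetical.

```python
def convert_row(row):
    """Keep the first cell as text and convert the remaining cells to float.

    Cells that are not valid numbers (for example a header row) stay as text.
    """
    converted = list(row[:1])
    for cell in row[1:]:
        try:
            converted.append(float(cell))
        except ValueError:
            converted.append(cell)
    return converted
```

In the loop above, ws.append(convert_row(row)) would then take the place of ws.append(row), since openpyxl stores Python floats as numeric cells and strings as text cells.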
