Data disappearing from file when changing encoding in Python

I've got this script that I'm pretty satisfied with, although it has one flaw: when changing the encoding, it suddenly removes all the data from the file. I have no idea why. The comments in the code explain what each line does.
The workflow is: rename file --> move file --> change encoding --> execute SQL stored procedure --> move the file back under its original name plus a timestamp.
import os
import shutil
import glob
import pyodbc
import os.path
import datetime
import codecs
#Defining function for SP
def SP():
    cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=serv400;DATABASE=db;Trusted_Connection=yes')
    cursor = cnxn.cursor()
    query = "exec [PD_ABC_SP]"
    cursor.execute(query)
    cnxn.commit()
#Changing name, moving, importing and changing encoding for files in loop
destdir = '\\\\serv400\\f$\\BulkInsert\\Steve\\'
srcdir = '\\\\sesrv414\\Applications\\Prod\\IMP\\Phone\\'
inldir = '\\\\sesrv414\\Applications\\Prod\\IMP\\Phone\\Inlasta\\'
newfilename = 'Phone_Import_ABC.csv'
now = datetime.datetime.now() #Adding datetime for timestamp
for oldfilename in os.listdir(srcdir): #Looping through files in directory
    if oldfilename.endswith(".csv"): #Changes filenames on files where name ends with csv
        os.rename(srcdir + oldfilename, destdir + newfilename) #Changing old path + filename
        codecs.open(destdir + newfilename, "w", encoding="utf-16") #switch encoding
        SP() #Executing the function for the stored procedure
        os.rename(destdir + newfilename, inldir + oldfilename + now.strftime("%Y%m%d"))
        #Moving back the files including the timestamp

codecs.open(.., "w", ..) opens a file for writing and truncates any previous content. It does not convert the file for you. For that, you'll need to open the file using its current encoding, read its contents and then re-open it in write mode using the target encoding and write the contents back. Something like
contents = codecs.open(old_filename, "r", encoding="utf-8").read()
codecs.open(new_filename, "w", encoding="utf-16").write(contents)
should work.
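The read-then-rewrite idea from this answer can be sketched as a small helper. This is only an illustration: the function name convert_encoding is hypothetical, the source encoding is assumed to be UTF-8 (as in the answer), and the built-in open is used in place of codecs.open. The whole file is read into memory before it is reopened for writing, so truncation on the second open does no harm.

```python
def convert_encoding(path, source_encoding="utf-8", target_encoding="utf-16"):
    """Rewrite the file at `path` in place using a different text encoding.

    Hypothetical helper; the default encodings are assumptions.
    """
    # Read everything first. newline="" keeps the original line endings as-is.
    with open(path, "r", encoding=source_encoding, newline="") as source:
        contents = source.read()

    # Opening with "w" truncates the file, which is safe now that the
    # contents are held in memory.
    with open(path, "w", encoding=target_encoding, newline="") as target:
        target.write(contents)
```

In the question's loop, a call such as convert_encoding(destdir + newfilename) would take the place of the bare codecs.open(..., "w", ...) line, provided the incoming files really are UTF-8.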

Related

Avoid date changes in Zipfile.write

Looking at the zipfile module, I'm trying to figure out why the content of a zip archive changes when I recreate a file with the same content.
Here's the sample code I'm working on:
import os
import hashlib
import zipfile
from io import BytesIO
FILE_PATH = './'
SAMPLE_FILE = "zip_test123.txt"
# create an empty file
new_file = FILE_PATH+"/"+SAMPLE_FILE
try:
    open(new_file, 'x')
except FileExistsError:
    os.remove(new_file)
    open(new_file, 'x')
full_path = os.path.expanduser(FILE_PATH)
# zip it
data = BytesIO()
with zipfile.ZipFile(data, mode='w') as zf:
    zf.write(os.path.join(full_path, SAMPLE_FILE), SAMPLE_FILE)
zip_cntn = data.getvalue()
data.close()
print(zip_cntn)
print(hashlib.md5(zip_cntn).hexdigest())
This first creates an empty file, then zips it and prints the hash of the zipped data.
Running this multiple times results in different contents and hashes, which I think is caused by the modification date (my assumption is based on this, which shows the modified date as well).
I'm only interested in zipping the actual contents, and not anything else (e.g. hash should stay the same if I recreate the same content for a given file)
Any suggestion how to achieve this goal/ignore extra info while archiving a file?
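One possible approach, not taken from this thread, is to skip ZipFile.write (which copies the file's modification time from disk) and instead pass a ZipInfo with a fixed date_time to ZipFile.writestr. The sketch below assumes the content is already available as bytes; the function name and the choice of 1980-01-01 as the fixed timestamp are illustrative.

```python
import zipfile
from io import BytesIO

# Assumption: any constant timestamp works; 1980-01-01 is the earliest
# date the ZIP format can represent.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def zip_bytes_deterministic(arcname, content):
    """Return zip archive bytes for one member, with a constant timestamp."""
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as zf:
        # ZipInfo carries the metadata explicitly, so nothing is read
        # from the file system.
        info = zipfile.ZipInfo(arcname, date_time=FIXED_DATE_TIME)
        zf.writestr(info, content)
    # Read the buffer only after the archive has been closed and finalized.
    return buffer.getvalue()
```

With this, hashing zip_bytes_deterministic("zip_test123.txt", b"") twice on the same machine should give the same digest. Other header fields, such as the creating system, can still differ between platforms.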

Need some assistance on a DBF "File not found" error in Python when looping through a directory?

I would like to ask for help with a Python script that is supposed to loop through a directory on a drive. Basically, what I want to do is convert over 10,000 DBF files to CSV. So far, I can do this for an individual DBF file by using the dbfread and pandas packages. Running this script over 10,000 individual times is obviously not feasible, which is why I want to automate the task by writing a script that loops through each DBF file in the directory.
Here is what I would like to do.
Define the directory
Write a for loop that will loop through each file in the directory
Only open a file with the extension '.dbf'
Convert to Pandas DataFrame
Define the name for the output file
Write to CSV and place file in a new directory
Here is the code that I was using to test whether I could convert an individual '.dbf' file to a CSV.
from dbfread import DBF
import pandas as pd
table = DBF('Name_of_File.dbf')
#I originally kept receiving a unicode decoding error
#So I manually adjusted the attributes below
table.encoding = 'utf-8' # Set encoding to utf-8 instead of 'ascii'
table.char_decode_errors = 'ignore' #ignore any decode errors while reading in the file
frame = pd.DataFrame(iter(table)) #Convert to DataFrame
print(frame) #Check to make sure DataFrame is structured properly
frame.to_csv('Name_of_New_File')
The above code worked exactly as it was intended.
Here is my code to loop through the directory.
import os
from dbfread import DBF
import pandas as pd
directory = 'Path_to_diretory'
dest_directory = 'Directory_to_place_new_file'
for file in os.listdir(directory):
    if file.endswith('.DBF'):
        print(f'Reading in {file}...')
        dbf = DBF(file)
        dbf.encoding = 'utf-8'
        dbf.char_decode_errors = 'ignore'
        print('\nConverting to DataFrame...')
        frame = pd.DataFrame(iter(dbf))
        print(frame)
        outfile = frame.os.path.join(frame + '_CSV' + '.csv')
        print('\nWriting to CSV...')
        outfile.to_csv(dest_directory, index = False)
        print('\nConverted to CSV. Moving to next file...')
    else:
        print('File not found.')
When I run this code, I receive a DBFNotFound error that says it couldn't find the first file in the directory. As I am looking at my code, I am not sure why this is happening when it worked in the first script.
This is the code from the dbfread package from where the exception is being raised.
class DBF(object):
    """DBF table."""
    def __init__(self, filename, encoding=None, ignorecase=True,
                 lowernames=False,
                 parserclass=FieldParser,
                 recfactory=collections.OrderedDict,
                 load=False,
                 raw=False,
                 ignore_missing_memofile=False,
                 char_decode_errors='strict'):
        self.encoding = encoding
        self.ignorecase = ignorecase
        self.lowernames = lowernames
        self.parserclass = parserclass
        self.raw = raw
        self.ignore_missing_memofile = ignore_missing_memofile
        self.char_decode_errors = char_decode_errors
        if recfactory is None:
            self.recfactory = lambda items: items
        else:
            self.recfactory = recfactory
        # Name part before .dbf is the table name
        self.name = os.path.basename(filename)
        self.name = os.path.splitext(self.name)[0].lower()
        self._records = None
        self._deleted = None
        if ignorecase:
            self.filename = ifind(filename)
            if not self.filename:
                raise DBFNotFound('could not find file {!r}'.format(filename)) #ERROR IS HERE
        else:
            self.filename = filename
Thank you for any help provided.
os.listdir returns only the file names inside the directory, so you have to join them to the base path to get the full path:
for file_name in os.listdir(directory):
    if file_name.endswith('.DBF'):
        file_path = os.path.join(directory, file_name)
        print(f'Reading in {file_name}...')
        dbf = DBF(file_path)
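The same joining step also applies to the output name, which the question builds incorrectly. As a rough sketch using only the standard library, the source and destination paths can be worked out before any DBF is opened. The helper name plan_conversions and the "_CSV" suffix handling are illustrative assumptions based on the question's code.

```python
import os


def plan_conversions(directory, dest_directory, extension=".dbf"):
    """Return (source_path, output_path) pairs for every matching file."""
    plan = []
    for file_name in sorted(os.listdir(directory)):
        # os.listdir gives bare names; compare case-insensitively so that
        # both ".DBF" and ".dbf" are picked up.
        if not file_name.lower().endswith(extension):
            continue
        source_path = os.path.join(directory, file_name)
        # Drop the extension and build the CSV name in the destination folder.
        stem = os.path.splitext(file_name)[0]
        output_path = os.path.join(dest_directory, stem + "_CSV.csv")
        plan.append((source_path, output_path))
    return plan
```

Each source_path could then be passed to DBF(...) and each output_path to frame.to_csv(output_path, index=False), as in the question's single-file script.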

How to create and write into a file correctly in Python

I am trying to create a file in a certain directory and name it after today's date.
The file is created, but the title line that I want to write into it does not appear.
from datetime import datetime
today = datetime.now().date().strftime('%Y-%m-%d')
g = open(path_prefix+today+'.csv', 'w+')
if os.stat(path_prefix+today+'.csv').st_size == 0: # this checks if file is empty
    g = open(path_prefix+today+'.csv', 'w+')
    g.write('Title\r\n')
path_prefix is just a path to the directory I am saving in /Users/name/Documents/folder/subfolder/
I am expecting a file 2019-08-22.csv to be saved in the directory given by path_prefix with a title as specified in the last line of the code above.
What I am getting is an empty file, and if I run the code again then the title is appended into the file.
As mentioned by @sampie777, I was not closing the file after writing to it, which is why the changes had not been saved when I opened the file. Adding close on an extra line solves the issue that I was having:
from datetime import datetime
today = datetime.now().date().strftime('%Y-%m-%d')
g = open(path_prefix+today+'.csv', 'w+')
if os.stat(path_prefix+today+'.csv').st_size == 0: #this checks if file is empty
    g = open(path_prefix+today+'.csv', 'w+')
    g.write('Title\r\n')
    g.close()
I am sure there are plenty of other ways to do this
Writes are buffered, so the content is only guaranteed to reach the file once it is flushed or closed. So call
g.close().
I suggest using a with block instead:
with open(path_prefix+today+'.csv', 'w+') as g:
    g.write('...')
This will automatically handle closing the file for you.
Also, why are you opening the file twice?
Tip: I see you are using path_prefix+today+'.csv' a lot. Create a variable for it, so your code will be a lot easier to maintain.
Suggested refactor of the last lines:
output_file_name = path_prefix + today + '.csv' # I prefer "{}{}.csv".format(path_prefix, today) or "%s%s.csv" % (path_prefix, today)
is_output_file_empty = os.stat(output_file_name).st_size == 0
with open(output_file_name, 'a') as output_file:
    if is_output_file_empty:
        output_file.write('Title\r\n')
For more information, see this question: Correct way to write line to file?
and maybe also How to check whether a file is empty or not?
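A variation on the "write the title only when the file is empty" idea, sketched under a few assumptions: the helper name append_with_header is hypothetical, a file that does not exist yet is treated as empty (os.stat would raise for a missing file), and newline="" is passed so the explicit '\r\n' endings are written unchanged on every platform.

```python
import os


def append_with_header(path, header, lines):
    """Append lines to a file, writing the header first if the file is new or empty."""
    # A missing file counts as empty; check existence before asking for its size.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0

    # Mode "a" creates the file if needed and never truncates it. Leaving the
    # with block closes the file, which flushes buffered writes to disk.
    with open(path, "a", newline="") as output_file:
        if needs_header:
            output_file.write(header + "\r\n")
        for line in lines:
            output_file.write(line + "\r\n")
```

Calling append_with_header(path_prefix + today + '.csv', 'Title', []) on each run would create the file with its title once and leave it untouched afterwards.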
I haven't used Python in a while, but after a quick bit of research, this seems like it could work:
# - Load imports
import os
import os.path
from datetime import datetime
# - Get the date
dateToday = datetime.now().date()
# - Set the savePath / path_prefix
savePath = 'C:/Users/name/Documents/folder/subfolder/'
fileName = dateToday.strftime("%Y-%m-%d") # - Convert 'dateToday' to string
# - Join path and file name
completeName = os.path.join(savePath, fileName + ".csv")
# - Check for file
if not os.path.exists(completeName):
    # - If it doesn't exist, write to it; the with block closes it afterwards
    with open(completeName, 'w+') as file:
        file.write('Title\r\n')
else:
    print("File already exists")

Copy Contents of File Into a New File

I am trying to write a Python script that takes the contents of a text file, and copies it into a new file that the program creates itself.
This is the code I am testing at the moment:
from datetime import datetime
errorLogPath = datetime.strftime(datetime.now(), '%Y%m%d_%H:%M') + ".log"
with open("Report.log") as logFile:
    with open(errorLogPath, 'w') as errorLog:
        for line in logFile:
            errorLog.write(line)
Currently the new file is created, but it is completely blank and has the wrong filename. The filename should be YYYYMMDD_HH:MM.log; instead, I am getting a filename that does not show the minutes, and the file is empty.
EDIT: Removed an unnecessary if statement, but the code is still not functioning :\
The simplest way to copy the file in Python without using the shutil module is:
with open("Report.log") as logFile, open(errorLogPath, 'w') as errorLog:
    errorLog.writelines(logFile)
To use the shutil module:
import shutil
shutil.copy("Report.log", errorLogPath)
The problem is in your path name: the colon (:) is a reserved character in Windows file names. Here is the whole list:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
The colon is reserved because it is used in:
A disk designator with a backslash, for example "C:\" or "d:\".
Therefore, the correct solution is to change your errorLogPath to remove the : character.
Then, the best way to copy a file is to use copy:
from datetime import datetime
from shutil import copy
error_log_path = datetime.strftime(datetime.now(), '%Y%m%d_%H_%M') + ".log"
log_file_path = "Report.log"
copy(log_file_path, error_log_path)
Note:
You can open several files with a single with statement.
It is better to use lower_case rather than camelCase for variable names in Python.
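If the timestamp format cannot be changed at its source, another option is to replace the reserved characters after formatting. The following is only a sketch: the helper name safe_log_name and the choice of an underscore as the replacement are assumptions, and the character set simply mirrors the list above.

```python
from datetime import datetime

# The reserved characters from the list above.
WINDOWS_RESERVED = '<>:"/\\|?*'


def safe_log_name(moment, suffix=".log"):
    """Build a timestamped file name with reserved characters replaced by '_'."""
    name = moment.strftime("%Y%m%d_%H:%M") + suffix
    # Swap every reserved character for an underscore.
    return "".join("_" if ch in WINDOWS_RESERVED else ch for ch in name)
```

For example, safe_log_name(datetime(2019, 8, 22, 14, 5)) gives "20190822_14_05.log", which matches the '%Y%m%d_%H_%M' format used in the answer.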
Try this; it has worked for me:
from datetime import datetime
import csv
errorLogPath = datetime.strftime(datetime.now(), '%Y%m%d_%H:%M') + ".log"
ff = open(errorLogPath, 'w')
csvwriter = csv.writer(ff)
with open("Report.log","r") as logFile:
    reader = csv.reader(logFile)
    for line in reader:
        if "ROW" in line:
            csvwriter.writerow(line)
        else:
            continue
ff.close()

DBF - encoding cp1250

I have a DBF database encoded in cp1250, and I am reading it using the following code:
import csv
from dbfpy import dbf
import os
import sys
filename = sys.argv[1]
if filename.endswith('.dbf'):
    print "Converting %s to csv" % filename
    csv_fn = filename[:-4]+ ".csv"
    with open(csv_fn,'wb') as csvfile:
        in_db = dbf.Dbf(filename)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:
            names.append(field.name)
        #out_csv.writerow(names)
        for rec in in_db:
            out_csv.writerow(rec.fieldData)
        in_db.close()
        print "Done..."
else:
    print "Filename does not end with .dbf"
The problem is that the final CSV file is wrong: its encoding is ANSI and some characters are corrupted. Can you help me read the DBF file correctly?
EDIT 1
I tried different code from https://pypi.python.org/pypi/simpledbf/0.2.4, but it raises an error.
Source 2:
from simpledbf import Dbf5
import os
import sys
dbf = Dbf5('test.dbf', codec='cp1250');
dbf.to_csv('junk.csv');
Output:
python program2.py
Traceback (most recent call last):
  File "program2.py", line 5, in <module>
    dbf = Dbf5('test.dbf', codec='cp1250');
  File "D:\ProgramFiles\Anaconda\lib\site-packages\simpledbf\simpledbf.py", line 557, in __init__
    assert terminator == b'\r'
AssertionError
I really don't know how to solve this problem.
Try using my dbf library:
import dbf
with dbf.Table('test.dbf') as table:
    dbf.export(table, 'junk.csv')
I wrote simpledbf. The line that is causing you problems was from some testing I was doing when developing the module. First of all, you might want to update your installation, as 0.2.6 is the most recent. Then you can try removing that particular line (#557) from the file "D:\ProgramFiles\Anaconda\lib\site-packages\simpledbf\simpledbf.py". If that doesn't work, you can ping me at the GitHub repo for simpledbf, or you could try Ethan's suggestion for the dbf module.
You can decode and encode as necessary. dbfpy assumes strings are UTF-8 encoded, so since your data isn't in that encoding, you can decode each string and then encode it again with the right encoding.
import csv
from dbfpy import dbf
import os
import sys
filename = sys.argv[1]
if filename.endswith('.dbf'):
    print "Converting %s to csv" % filename
    csv_fn = filename[:-4]+ ".csv"
    with open(csv_fn,'wb') as csvfile:
        in_db = dbf.Dbf(filename)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:
            names.append(field.name)
        #out_csv.writerow(names)
        for rec in in_db:
            # Re-encode string fields; leave numbers, dates, etc. untouched
            row = [i.decode('utf8').encode('cp1250') if isinstance(i, str) else i for i in rec.fieldData]
            out_csv.writerow(row)
        in_db.close()
        print "Done..."
else:
    print "Filename does not end with .dbf"
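The code in this thread is Python 2. As a rough Python 3 illustration of the same decode-then-encode idea, the sketch below assumes the records are already available as lists whose text fields are raw cp1250 bytes. That input shape and the helper name write_rows_as_utf8_csv are assumptions for the example, not the API of dbfpy or any other DBF library.

```python
import csv


def write_rows_as_utf8_csv(rows, csv_path, source_encoding="cp1250"):
    """Write rows to a UTF-8 CSV, decoding any bytes fields from source_encoding."""
    # In Python 3 the csv module works on text, so the target encoding is
    # chosen when opening the file; newline="" lets csv control line endings.
    with open(csv_path, "w", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for row in rows:
            writer.writerow(
                [
                    # Decode raw bytes; pass numbers and str values through.
                    value.decode(source_encoding) if isinstance(value, bytes) else value
                    for value in row
                ]
            )
```

Decoding with the encoding the data was actually written in, and picking the output encoding explicitly, avoids the corrupted characters described in the question.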
