Python - Rename multiple files based on list

I am trying to rename a set of 15,000 txt files. In a separate txt file I have a list of the old names of these files and the new names I want them to have. But not all of the 15,000 txt files are listed in the name list, and that is where the problem starts. How do I change my code so that it ignores/skips files that are not listed in the name list, or those that are duplicates? The code works fine until it reaches one of these files. Any suggestions? Thanks! This is the code I have so far:
import os
with open("rename3.txt") as fd:
    for line in fd:
        line = line.strip()
        if len(line) == 0: continue
        old, new = line.strip().split(",", 1)
        os.rename(old.strip() + ".txt", new.strip() + ".txt")

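import os
with open("rename3.txt") as fd:
    for line in fd:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        old, new = line.split(",", 1)
        src, dst = old.strip() + ".txt", new.strip() + ".txt"
        # rename only if the source exists and the target name is still free
        if os.path.exists(src) and not os.path.exists(dst):
            os.rename(src, dst)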
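If it helps to see which entries were skipped, the same idea can be wrapped in a function that reports them. This is only a sketch: the function name `rename_from_list`, its parameters, and the demo file names are made up for illustration, and it assumes the list holds one `old,new` pair per line without the `.txt` extension.

```python
import os
import tempfile

def rename_from_list(list_path, directory=".", ext=".txt"):
    """Rename files per 'old,new' lines; return (renamed, skipped) name lists."""
    renamed, skipped = [], []
    with open(list_path) as fd:
        for line in fd:
            line = line.strip()
            if not line or "," not in line:
                continue  # blank or malformed line
            old, new = (part.strip() for part in line.split(",", 1))
            src = os.path.join(directory, old + ext)
            dst = os.path.join(directory, new + ext)
            # Skip when the source is missing (never existed, or already
            # renamed by an earlier duplicate entry) or the target is taken.
            if not os.path.exists(src) or os.path.exists(dst):
                skipped.append(old)
                continue
            os.rename(src, dst)
            renamed.append(old)
    return renamed, skipped

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a", "b"):
        open(os.path.join(tmp, name + ".txt"), "w").close()
    list_path = os.path.join(tmp, "rename3.txt")
    with open(list_path, "w") as f:
        f.write("a,x\nmissing,y\na,z\n\nb,x\n")
    renamed, skipped = rename_from_list(list_path, tmp)
    remaining = sorted(os.listdir(tmp))
    print(renamed, skipped, remaining)
```

Returning the skipped names makes it easy to review afterwards which list entries had no matching file or collided with an existing name.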

Related

Create hash table from 2 elements of a file in Python

I'm trying to combine every 2 elements of a txt file, and hash it to create a hash table, using Python. My code is as below:
import hashlib
def SHA1_hash(string):
    hash_obj = hashlib.sha1(string.encode())
    return(hash_obj.hexdigest())
with open("/Users/admin/Downloads/Project_files/dictionary.txt") as f:
    text_file = open("/Users/admin/Downloads/Project_files/text_combined.txt", "w",encoding = 'utf-8')
    for i in f.readlines():
        for j in f.readlines():
            text_c = i.strip() + j.strip()
            n = text_file.write(SHA1_hash(text_c) + "\n")
    text_file.close()
The file is 64KB (more than 5700 lines). I tried to run the code but it is not working nor showing any errors. The destination file (text_combined.txt) did not have anything either. Can I ask if I am doing it right or wrong?
I am new to Python as well as programming so please excuse me if I ask any bad questions. Thank you so much.
The second f.readlines() has nothing to read, because you've already read the entire file.
Read the file into a list variable, then iterate through the list.
with open("/Users/admin/Downloads/Project_files/dictionary.txt") as f, open("/Users/admin/Downloads/Project_files/text_combined.txt", "w", encoding='utf-8') as text_file:
    lines = f.readlines()  # read the file once and keep the lines in a list
    for i in lines:
        for j in lines:
            text_c = i.strip() + j.strip()
            text_file.write(SHA1_hash(text_c) + "\n")
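The exhausted-file behaviour is easy to reproduce without touching the real files. The sketch below uses `io.StringIO` as a stand-in for the dictionary file (the two sample words are invented) and, as an optional variation, `itertools.product` in place of the nested loops.

```python
import hashlib
import io
from itertools import product

def sha1_hash(text):
    """Return the hexadecimal SHA-1 digest of a string."""
    return hashlib.sha1(text.encode()).hexdigest()

# io.StringIO stands in for the real dictionary file.
f = io.StringIO("ab\ncd\n")
first = f.readlines()   # consumes the whole stream
second = f.readlines()  # nothing left to read, so this is an empty list
print(len(first), len(second))

words = [line.strip() for line in first]
# product(words, repeat=2) yields every ordered pair, like the nested loops.
hashes = [sha1_hash(a + b) for a, b in product(words, repeat=2)]
print(len(hashes))
```

With roughly 5,700 lines the full cross product is about 32 million hashes, so expect the real run to take a while and the output file to be large.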

Removing New Line from CSV Files using Python

I obtain multiple CSV files from an API, and I need to remove the newlines present inside the records and join each broken record back together. Consider the data provided below.
My code to remove the newlines:
## Loading necessary libraries
import glob
import os
import shutil
import csv
## Assigning necessary path
source_path = "/home/Desktop/Space/"
dest_path = "/home/Desktop/Output/"
# Assigning file_read path to modify the copied CSV files
file_read_path = "/home/Desktop/Output/*.csv"
## Code to copy .csv files from one folder to another
for csv_file in glob.iglob(os.path.join(source_path, "*.csv"), recursive = True):
    shutil.copy(csv_file, dest_path)
## Code to delete the second row in all .CSV files
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding = 'ISO-8859-1') as file:
        reader = list(csv.reader(file , delimiter = ","))
        for i in range(0,len(reader)):
            reader[i] = [row_space.replace("\n", "") for row_space in reader[i]]
    with open(filename, "w") as output:
        writer = csv.writer(output, delimiter = ",", dialect = 'unix')
        for row in reader:
            writer.writerow(row)
I actually copy the CSV files into a new folder and then use the above code to remove any new line present in the file.
You are fixing the CSV files because they contain stray \n characters. The problem here is how
to know whether a line is part of the previous line or not. If all records start
with a specific word, like SV_a5d15EwfI8Zk1Zr in your example, or just SV_, you can do something like this:
import glob
# this is the FIX PART
# I have file ./data.csv(contains your example) Fixed version is in data.csv.FIXED
file_read_path = "./*.csv"
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding='ISO-8859-1') as file, open(filename + '.FIXED', "w", encoding='ISO-8859-1') as target:
        previous_line = ''
        for line in file:
            # check if it's a new line or a part of the previous line
            if line.startswith('SV_'):
                if previous_line:
                    target.write( previous_line + '\n')
                previous_line = line[:-1]  # remove \n
            else:
                # concatenate the broken part with previous_line
                previous_line += line[:-1]  # remove \n
        # add last line
        target.write(previous_line + '\n')
Output:
SV_a5d15EwfI8Zk1Zr;QID4;"<span style=""font-size:16px;""><strong>HOUR</strong> Interview completed at:</span>";HOUR;TE;SL;;;true;ValidNumber;0;23.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID6;"<span style=""font-size:16px;""><strong>MINUTE</strong> Interview completed:</span>";MIN;TE;SL;;;true;ValidNumber;0;59.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID8;Number of Refusals - no language<br />For <strong>Zero Refusals - no language</strong> use 0;REFUSAL1;TE;SL;;;true;ValidNumber;0;99.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID10;<strong>DAY OF WEEK:</strong>;WEEKDAY;MC;SACOL;TX;;true;;0;;;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID45;"<span style=""font-size:16px;"">Using points from 0 to 10, how likely would you be recommend Gatwick Airport to a friend or colleague?</span><div> </div>";NPSCORE;MC;NPS;;;true;;0;;;882;-873;
EDIT:
It can be simpler using split too; this version fixes the file itself, in place:
import glob
# this is the FIX PART
# I have file ./data.csv; the fixed version is written to the same file
file_read_path = "./*.csv"
# assuming that all lines start with SV_
STARTING_KEYWORD = 'SV_'
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding='ISO-8859-1') as file:
        lines = file.read().split(STARTING_KEYWORD)
    with open(filename, 'w', encoding='ISO-8859-1') as file:
        file.write('\n'.join(STARTING_KEYWORD + l.replace('\n', '') for l in lines if l))
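One caveat with splitting on the bare keyword is that a record is also cut wherever SV_ happens to appear in the middle of a field. A possible variation, sketched here on an in-memory string, splits only where a line begins with the keyword. The helper name `join_broken_records` and the sample text are invented for illustration, and splitting on a zero-width match needs Python 3.7 or later.

```python
import re

def join_broken_records(text, keyword="SV_"):
    """Merge continuation lines into the record that precedes them."""
    # Zero-width split point: the start of any line that begins with keyword.
    pattern = r"^(?=" + re.escape(keyword) + ")"
    records = re.split(pattern, text, flags=re.M)
    # Drop the newlines inside each record and skip empty pieces.
    return [record.replace("\n", "") for record in records if record.strip()]

sample = 'SV_1;a;"first\nhalf";x\nSV_1;b;mentions SV_ inline;y\n'
fixed = join_broken_records(sample)
print(fixed)
```

Any text before the first keyword line, such as a header row, comes back as its own record, so it is not lost.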
Well, I'm not sure what restrictions you have, but if you can use the pandas library, this is simple.
import pandas as pd
data_set = pd.read_csv(data_file,skip_blank_lines=True)
data_set.to_csv(target_file,index=False)
This will create a CSV file with all blank lines removed. You can save a lot of time with available libraries.

Python regex from txt file

I have a text file, that has data.
PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER=
I want to extract data from that file, where nothing is after the equal sign.
So in my new text file, I want to get
CMD_REINIT
CMD_OLIVIER
How do I do this?
My code looks like this right now.
import os, os.path
DIR_DAT = "dat"
DIR_OUTPUT = "output"
print("Psst go check in the ouptut folder ;)")
for roots, dir, files in os.walk(DIR_DAT):
    for filename in files:
        filename_output = "/" + os.path.splitext(filename)[0]
        with open(DIR_DAT + "/" + filename) as infile, open(DIR_OUTPUT + "/bonjour.txt", "w") as outfile:
            for line in infile:
                if not line.strip().split("=")[-1]:
                    outfile.write(line)
I want to collect all the data in a single file, but it doesn't work. Can anyone help me?
The third step is to crawl that new file and keep only single values. As four files are appended into a single one, some data might appear four, three, or two times.
I need to keep, in a new file that I will call output.txt, only the lines that are common to all the files.
You can use regex:
import re
data = """PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER="""
found = re.findall(r"^\s*(.*)=\s*$",data,re.M)
print( found )
Output:
['CMD_REINIT', 'CMD_OLIVIER']
The expression looks for
^\s* line start + optional whitespace
(.*)= anything before a =, which is captured as a group
\s*$ followed by optional whitespace and line end
using the re.M (multiline) flag.
Read your files text like so:
with open("yourfile.txt","r") as f:
data = f.read()
Write your new file like so:
with open("newfile.txt","w") as f:
    f.write("\n".join(found))
You can use http://www.regex101.com to test text against regex patterns; make sure to switch to its Python mode.
I suggest the following short solution using a generator expression:
with open('file.txt', 'r') as f, open('newfile.txt', 'w') as newf:
    for x in (line.strip()[:-1] for line in f if line.strip().endswith("=")):
        newf.write(f'{x}\n')
Try this pattern: \w+(?==$).
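A minimal sketch of how that pattern could be applied, reusing the sample data from the question. It relies on the `re.M` flag so that `$` matches at the end of every line, and it assumes there is no trailing whitespace after the `=`.

```python
import re

data = """PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER="""

# \w+ matches the key; the lookahead (?==$) requires an '=' that is
# immediately followed by the end of the line, without consuming it.
keys = re.findall(r"\w+(?==$)", data, re.M)
print(keys)
```

Because the `=` sits inside a lookahead, it is not part of the match, so no extra stripping is needed.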
Using a simple iteration.
Ex:
with open(filename) as infile, open(filename2, "w") as outfile:
    for line in infile:                       # iterate over each line
        if not line.strip().split("=")[-1]:   # nothing after the last "="
            key = line.strip().strip("=")
            print(key)
            outfile.write(key + "\n")         # write the key to the new file
Output:
CMD_REINIT
CMD_OLIVIER
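None of the answers above covers the asker's last step, keeping only the entries that are common to all the files. One way to sketch it is to build a set of empty-valued keys per file and intersect the sets. The helper `empty_keys` and the three sample "files" (given here as lists of lines) are hypothetical.

```python
def empty_keys(lines):
    """Return the set of keys whose value after '=' is empty."""
    keys = set()
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep and not value.strip():
            keys.add(key)
    return keys

# Hypothetical contents of three input files, as lists of lines.
files = [
    ["PAS_BEGIN_3600000\n", "CMD_VERS=2\n", "CMD_REINIT=\n", "CMD_OLIVIER=\n"],
    ["CMD_REINIT=\n", "CMD_TRNS=\n"],
    ["CMD_REINIT=\n", "CMD_OLIVIER=\n", "CMD_TRNS=O\n"],
]

# Keep only the keys that appear, with an empty value, in every file.
common = set.intersection(*(empty_keys(lines) for lines in files))
print(sorted(common))
```

With real files, each inner list would be replaced by an open file object, since `empty_keys` only needs something it can iterate over line by line.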

Writing comparison to .txt file when comparing sql files

So I'm trying to compare multiple tables using a Python script. The actual comparison is working, tested with print statements, but the write to a .txt file is not. I believe I might have an error in my syntax, though being relatively new to Python, I can't find it.
for num in range(0, 4): #runs through the database array and compares the files in each folder
    comp_var = directory + server_number[size] + databases[num]
    for file in os.listdir(comp_var):
        for num1 in os.listdir(master + databases[num]):
            var = master + databases[num] + "\\" + os.listdir(master + databases[num])[size]
            for line in open(var, 'r'):
                for line2 in open(comp_var + "\\" + file, 'r'):
                    same = set(line).intersection(line2)
                    print(same)
            same.discard('\n')
            with open('results.txt', 'w') as file_out:
                for line1 in same:
                    file_out.write(line1)
        size = size + 1
        comp_var = directory + server_number[size] + databases[num]
    size = 0
Your problem is that you create a new file every time you call open. You should use 'a' to append to a file, which is probably what you want.
You are overwriting the results.txt.
with open('results.txt', 'w') as file_out:
change it to:
with open('results.txt', 'a') as file_out:
From the Python documentation:
'w' for only writing (an existing file with the same name will be erased), and 'a' opens the file for appending; any data written to the file is automatically added to the end.
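The difference between the two modes can be checked with a small experiment. This sketch writes two chunks to a temporary file, reopening it for each chunk as the loop in the question does, once with 'w' and once with 'a'. The file name and chunk contents are arbitrary.

```python
import os
import tempfile

results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.txt")
    for mode in ("w", "a"):
        if os.path.exists(path):
            os.remove(path)  # start each experiment from a clean slate
        for chunk in ("first\n", "second\n"):
            # The file is reopened for every chunk, as in the question.
            with open(path, mode) as file_out:
                file_out.write(chunk)
        with open(path) as f:
            results[mode] = f.read()

print(results)
```

With 'w' only the last chunk survives, while 'a' keeps both.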

Why is my loop overwriting my file instead of writing after the text?

i = 1 # keep track of file number
directory = '/some/directory/'
for i in range(1, 5170): #number of files in directory
    filename = directory + 'D' + str(i) + '.txt'
    input = open(filename)
    output = open('output.txt', 'w')
    input.readline() #ignore first line
    for g in range(0, 7): #write next seven lines to output.txt
        output.write(input.readline())
    output.write('\n') #add newline to avoid mess
    output.close()
    input.close()
    i = i + 1
I have this code, and I am trying to take each file and write part of it to output.txt, but when I want to append the next file, my code overwrites the content that was already written. As a result, when the code completes I have something like this:
dataA[5169]=26
dataB[5169]=0
dataC[5169]=y
dataD[5169]='something'
dataE[5169]=x
data_date[5169]=2012.06.02
instead of data ranging over files 0 to 5169. Any tips on how to fix it?
You probably want to open output.txt before your for loop (and close it after). As it is written, you overwrite the file output.txt every time you open it. An alternative would be to open it for appending, output = open('output.txt','a'), but that's definitely not the best way to do it here.
Of course, these days it's better to use a context manager (with statement):
i = 1 # keep track of file number <-- This line is useless in the code you posted
directory = '/some/directory/' #<-- os.path.join is better for this stuff.
with open('output.txt','w') as output:
    for i in range(1, 5170): #number of files in directory
        filename = directory + 'D' + str(i) + '.txt'
        with open(filename) as input:
            input.readline() #ignore first line
            for g in range(0, 7): #write next seven lines to output.txt
                output.write(input.readline())
            output.write('\n') #add newline to avoid mess
        i = i + 1 #<---also useless line in the code you posted
Your issue is that you open the file in write mode ('w'), which truncates it each time. To add to the end of the file instead, open it in append mode ('a').
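Putting the suggestions from the answers together (open the output once, use a with statement, build paths with os.path.join), a compact version might look like the sketch below. The function name `collect_lines`, its parameters, and the demo files are assumptions made for illustration. `itertools.islice` replaces the manual `readline()` calls.

```python
import os
import tempfile
from itertools import islice

def collect_lines(directory, count, output_path, skip=1, take=7):
    """Copy `take` lines (after skipping `skip`) from D1.txt .. D<count>.txt."""
    with open(output_path, "w") as output:  # opened once, before the loop
        for i in range(1, count + 1):
            filename = os.path.join(directory, f"D{i}.txt")
            with open(filename) as source:
                output.writelines(islice(source, skip, skip + take))
            output.write("\n")  # blank line between files

# Demonstration with two small made-up files.
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2):
        with open(os.path.join(tmp, f"D{i}.txt"), "w") as f:
            f.write("header\n" + "".join(f"data{n}[{i}]\n" for n in range(9)))
    out_path = os.path.join(tmp, "output.txt")
    collect_lines(tmp, 2, out_path)
    with open(out_path) as f:
        combined = f.read()

print(combined)
```

Because the output file is opened a single time, every input file's lines end up in it, and 'w' mode is still fine.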
