Files comparison optimization - python

I have to do a file comparison for a huge (10–20 million) set of records.
Requirement explanation:
For the files comparison, there will be two files to do the comparison
and find the different records.
The files type are : .txt , .csv , .xlsx , .mdb or .accdb
The File 1 can be any type as mentioned in the first point.
The File 2 can be any type as mentioned in the first point.
The delimiter for File 1 or File 2 are unknown, it may be any from ~^;|.
Each file is having more than 70 columns in each.
File 1 is older than File 2 in terms of records. File 1 may have 10 million and File 2 may have 10.2 millions of records.
Need to create File 3, which consists of different records(for example 0.2 million of records from point 6) from File 1 to File 2 with the column header.
My Try: I have used SET for collecting data from both the files(File1 and File2) and done the comparison
using for and if condition.
import pyodbc
import os.path
import string
import re
import sys
import time
from datetime import datetime
# Function for Do you want to continue
def fun_continue():
# If you want to continue
yesno = raw_input('\nDo you want to continue(Y/N)?')
if yesno == 'Y':
fun_comparison()
else:
sys.exit()
def fun_comparison():
# Getting Input Value's
file1 = raw_input('Enter the file1 name with path:')
file_extension_old = os.path.splitext(file1)[1]
#Condition check for the File extension, if it's ACCESS DB then ask for the table name
if (file_extension_old == ".accdb") or (file_extension_old == ".mdb"):
table_name_old = raw_input('Enter table name:')
file2 = raw_input('Enter the latest file name:')
file_extension_latest = os.path.splitext(file2)[1]
#Condition check for the File extension, if it's ACCESS DB then ask for the table name
if (file_extension_latest == ".accdb") or (file_extension_latest == ".mdb"):
table_name_latest = raw_input('Enter table name:')
file3 = raw_input('Give the file name to store the comparison result:')
print('Files comparison is running! Please wait...')
# Duration Calculation START TIME
start_time = datetime.now()
# Code for file Comparison
try:
#Condition check for the ACCESS FILE -- FILE 1
if (file_extension_old == ".accdb") or (file_extension_old == ".mdb"):
conn_string_old = r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+file1+';'
con_old = pyodbc.connect(conn_string_old)
cur_old = con_old.cursor()
#Getting Column List
res_old = cur_old.execute('SELECT * FROM '+table_name_old+' WHERE 1=0')
column_list = [tuple(map(str, record_new))[0] for record_new in res_old.description]
column_list = ';'.join(column_list)
#For Getting Data
SQLQuery_old = 'SELECT * FROM '+table_name_old+';'
rows_old = cur_old.execute(SQLQuery_old).fetchall()
records_old = [tuple(map(str,record_old)) for record_old in rows_old]
records_old = [";".join(t) + "\n" for t in records_old]
records_old = set(records_old)
records_old = map(str.strip, records_old)
#print records_old
else:
with open(file1) as a:
column_list = a.readline()
column_list = re.sub(r"[;,|^~]", ";", column_list)
a = set(a)
sete = map(str.strip, a)
setf = [re.sub(r"[;,|^~]", ";", s) for s in sete]
records_old = [";".join(map(str.strip, i.split(";"))) for i in setf]
#Condition check for the ACCESS FILE -- FILE 2
if (file_extension_latest == ".accdb") or (file_extension_latest == ".mdb"):
conn_string_new = r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+file2+';'
con_new = pyodbc.connect(conn_string_new)
cur_new = con_new.cursor()
#Getting Column List
res_new = cur_new.execute('SELECT * FROM '+table_name_latest+' WHERE 1=0')
column_list = [tuple(map(str, record_new))[0] for record_new in res_new.description]
column_list = ';'.join(column_list)
SQLQuery_new = 'SELECT * FROM '+table_name_latest+';'
rows_new = cur_new.execute(SQLQuery_new).fetchall()
records_new = [tuple(map(str,record_new)) for record_new in rows_new]
records_new = [";".join(t) + "\n" for t in records_new]
records_new = set(records_new)
records_new = map(str.strip, records_new)
#print records_new
else:
with open(file2) as b:
column_list = b.readline()
column_list = re.sub(r"[;,|^~]", ";", column_list)
b = set(b)
sete = map(str.strip, b)
setf = [re.sub(r"[;,|^~]", ";", s) for s in sete]
records_new = [";".join(map(str.strip, i.split(";"))) for i in setf]
column_list = column_list.strip()
column_list = column_list.replace('; ', ';').strip(' ')
with open(file3, 'w') as result:
result.write(column_list + '\n')
for line in records_new:
if line not in records_old:
result.write(line + '\n')
except Exception as e:
print('\n\nError! Files Comparison completed unsuccessfully.')
print('\nError Details:')
print(e)
# Duration calculation END TIME
end_time = datetime.now()
print('Duration: {}'.format(end_time - start_time))
# Calling Continue function
fun_continue()
# Calling Comparison function
fun_comparison()
input()
Problem:
The code is working fine for small records which i did for testing but its not optimal for the huge records.
System is getting hang.
Consuming more memory as shown below in the screenshot:

Related

I Want to Compare two XML Files Using Python and Print Common attribute Values and Uncommon attribute Values in both the files

As I am new to Python, I need some help comparing two XML files.
These are the Following Conditions:
To print Common fullPath Name and Name (fullPath and Name are the attributes present in the XML file) between the two XML files.
To print the values which is present in only first file and not in second file.
To print the values which is present in only second file and not in first file.
Later, Have to print this output in excel file having different sheets.
for example (1st condition in sheet 1, 2nd condition in sheet2, 3rd condition in sheer3 of the same excel file.)
Can please anyone help me with the code that satisfies the above condition which I have mentioned.
This is the code which I have tried.
from lxml import etree
Base = etree.parse('Base.xml')
Target = etree.parse('Target.xml')
Base_fullPath = Base.xpath("//Member/#fullPath")
Target_fullPath = Target.xpath("//Member/#fullPath")
Base_name = Base.xpath("//Member/#name")
Target_name = Target.xpath("//Member/#name")
def match(Base_fullPath, Target_fullPath, Base_name,Target_name):
Base_fullPath_set = set(Base_fullPath)
Target_fullPath_set = set(Target_fullPath)
Base_name_set = set(Base_name)
Target_name_set = set(Target_name)
if (Base_fullPath_set & Target_fullPath_set, Base_name_set & Target_name_set):
x = open('C:\\Users\\pvl\\Desktop\\New folder\\Common_FullPath.csv', 'w')
y=(Base_fullPath_set & Target_fullPath_set)
z=(Base_name_set & Target_name_set)
print("common details Full Path: \n", *y, sep='\n', file = x)
print("\n")
x = open('C:\\Users\\pvl\\Desktop\\New folder\\Common_name.csv', 'w')
print("\n common details Name: \n", *z, sep='\n', file=x)
else:
print("No Matches Found")
match(Base_fullPath, Target_fullPath, Base_name,Target_name)
def non_match_elements(list_base, list_target):
non_match_base = []
non_match_target = []
for i in list_base:
if i not in list_target:
non_match_base.append(i)
for i in list_target:
if i not in list_base:
non_match_target.append(i)
return non_match_base
return non_match_target
list_base = Base.xpath("//Member/#*")
list_target = Target.xpath("//Member/#*")
non_match_base = non_match_elements(list_base, list_target)
x = open('C:\\Users\\pvl\\Desktop\\New folder\\Present_in_base.csv', 'w')
print("\n Base Details: \n", *non_match_base, sep='\n', file = x)
non_match_target = non_match_elements(list_target, list_base)
x = open('C:\\Users\\pvl\\Desktop\\New folder\\Present_in_target.csv', 'w')
print("\n Target Details: \n", *non_match_target, sep='\n', file = x)
import pandas as pd
df = pd.read_csv('C:\\Users\\pvl\\Desktop\\New folder\\Common_FullPath.csv')
df1 = pd.read_csv('C:\\Users\\pvl\\Desktop\\New folder\\Common_name.csv')
df2 = pd.read_csv('C:\\Users\\pvl\\Desktop\\New folder\\Present_in_base.csv', delimiter=';;', on_bad_lines = 'skip', engine = 'python' )
df3 = pd.read_csv('C:\\Users\\pvl\\Desktop\\New folder\\Present_in_target.csv', delimiter=';', on_bad_lines = 'skip', engine = 'python')
with pd.ExcelWriter("C:\\Users\\pvl\\Desktop\\New folder\\combined.xlsx") as writer:
df1.to_excel(writer, sheet_name="Common_name", index=False)
df2.to_excel(writer, sheet_name="base_Details", index=False)
df3.to_excel(writer, sheet_name = "target_Details", index=Fal

using python to extract data from excel

Hi I have a python program that will read data value from an excel file and extract specific data value and create a new excel with the extracted data value.
the excel file that python reads contains 3 columns 1- time, 2-Elapsed Time, and 3-Duct temp.
the duct temperature is stored in this file as
['duct1 temperature', [25.000882991454244, 25.002648974362724, 25.00387452337855, 25.004724896765367, 25.00531481876751, 25.005723932326624, . . . . .]]
This is all in 1 cell (column 3)
each number is a node, so node 1 is extracted as 25.000882991454244
and node 2 is extracted as 25.002648974362724
so the program works fine but I have some issues.
1- is that if I say I want 512 values it gives me an error
Traceback (most recent call last):
File "ColumnExtractor.py", line 126, in <module>
main(0, None)
File "ColumnExtractor.py", line 114, in main
timeData, data = readFile(directory, fileName)
TypeError: cannot unpack non-iterable NoneType object```
2- is that I can only extract each value at a time. so if I wanted to extract the 512 values I will have to do it manually and that will take a long time.
so my question is is there any way to adjust this to extract every data value in the original file and store them in their each own column in the new excel file?
What is the file directory? /Users/jack/Downloads/1copy/untitledfolder
What would you like the new file to be called? results
How many values would you like to take? 4
Duct_temp_values.csv
Which node number to read? 3
Which column is the data in? 3
What would you like the first column to be called? temp
What would you like the final column to be called? time
Which node number to read? 1
Which column is the data in? 3
What would you like the first column to be called? temp2
What would you like the final column to be called? time
Which node number to read? 4
Which column is the data in? 3
What would you like the first column to be called? temp3
What would you like the final column to be called? time
Which node number to read? 5
Which column is the data in? 3
What would you like the first column to be called? temp4
What would you like the final column to be called? time ```
orignal file
results
the code it self is
import os
import csv
import xlwt
def getInputs():
directoryFound = False
csvFileNames = []
convertedDir = ""
while not directoryFound:
convertedDir = ""
folderDir = input("What is the file directory? ")
for char in folderDir:
if char == "\\":
convertedDir += "/"
else:
convertedDir += char
try:
openFolder = os.listdir(convertedDir)
for file in openFolder:
if file[-3:] == "csv":
directoryFound = True
csvFileNames.append(file)
except:
print("That directory doesn't exist.\n")
long = False
while not long:
output = input("What would you like the new file to be called? ")
if len(output) > 0:
long = True
return output, csvFileNames, convertedDir
def readFile(directory, fileName):
found = False
while not found:
try:
file = open(f"{directory}/{fileName}", "r")
found = True
reader = csv.reader(file)
data = []
timeData = []
lineNumber = 0
for line in reader:
if lineNumber % 2 == 1:
data.append(line)
lineNumber += 1
file.close()
column = int(input("Which node number to read? "))
columnWithData = int(input("Which column is the data in? "))
exportArray = []
for i, dataRow in enumerate(data):
timeData.append(dataRow[0])
mainData = dataRow[-1].split("[")[-1]
dataArray = mainData.split(" ")
for i, point in enumerate(dataArray):
newPoint = ""
for char in point:
if char != "," and char != "'" and char != "[" and char != "]":
newPoint += char
dataArray[i] = newPoint
exportArray.append(dataArray[column - 1])
firstName = input("What would you like the first column to be called? ")
secondName = input("What would you like the final column to be called? ")
return [secondName] + timeData, [firstName] + exportArray
except:
print("Couldn't find a file with that name.\n")
def main(incr, openedBook):
book = openedBook
if not openedBook:
book = xlwt.Workbook()
output, csvFile, directory = getInputs()
for fileName in csvFile:
if "junction" not in fileName.lower():
number = int(input("\nHow many values would you like to take? "))
print("\n" + fileName, "\n")
sheet = book.add_sheet(fileName)
for j in range(number):
timeData, data = readFile(directory, fileName)
for i, value in enumerate(data):
if j == 0:
sheet.write(i, number, timeData[i])
sheet.write(i, j, value)
book.save(f"{output}.xls")
if __name__ == "__main__":
main(0, None)

Matching Regex in Python from Excelfile

I'm using regex to match rows of the following Excel file, and I'm struggling with how I can
seperate each row by
Timestamp [0:00:48],
ID 20052A
and the content content (more content)
This is the excel row (one of many, so the ID can vary from row to row and the timestamp as well as the content too)
[0:00:48] 20052A: content (more content)
I get an Error code
AttributeError: 'NoneType' object has no attribute 'group'
for matching my ID where I have
(r"^(.+:)(.+)|(r(\w+)?\s*\[(.*)\]\s*(\w+))", c)
Keep in mind that from time to time the ID looks something like this
[0:00:33] 30091aA: (content) 
My whole skript is (cancel out the connection to database)
import os
import re
import pymysql
pymysql.install_as_MySQLdb()
import pandas as pd
import sqlalchemy
def insert_or_update(engine, pd_table, table_name):
inserts = 0
updates = 0
for i in range(len(pd_table)):
vals_with_quotes = ["'" + str(x) + "'" for x in pd_table.loc[i, :].values]
# print(vals_with_quotes)
update_pairs = [str(c) + " = '" + str(v) + "'" for c, v in zip(pd_table.columns, pd_table.loc[i, :])]
query = f"INSERT INTO {table_name} ({', '.join(list(pd_table.columns.values))}) " \
f"VALUES ({', '.join(vals_with_quotes)}) " \
f"ON DUPLICATE KEY UPDATE {', '.join(update_pairs)}"
print(query)
result = engine.execute(query)
if result.lastrowid == 0:
updates += 1
else:
inserts += 1
print(f"Inserted {inserts} rows and updated {updates} rows.")
schema = '---'
alchemy_connect = "---"
engine = sqlalchemy.create_engine(alchemy_connect) # connect to server
engine.execute(f"USE {schema}") # select new db
# engine.execute("SET NAMES UTF8MB4;")
query = "SELECT * FROM .... where ...=..."
pm = pd.read_sql(query, engine)
rootpath = "path/"
for root, dirs, files in os.walk(rootpath):
for file in files:
print(root, dirs, files, file)
d = pd.read_excel(root + file, header=None)
d.drop(columns=[0], inplace=True)
d.rename(columns={1: "content"}, inplace=True)
participants = []
for ix, row in d.iterrows():
c = row["content"]
match = re.search(r"^(.+:)(.+)|(r(\w+)?\s*\[(.*)\]\s*(\w+))", c)
prefix = match.group(1)
only_content = match.group(2)
try:
timestamp = re.search(r"\[(\d{1,2}:\d{1,2}:\d{1,2})\]", prefix).group(1)
except:
timestamp = "-99"
# print(timestamp)
if re.search(r"\s(Versuchsleiter|ersuchsleiter|Versuchsleit|Versuch):", prefix):
id_code = "Versuchsleiter"
else:
starting_digits = re.search(r"^(\d+)", prefix)
id_code = re.search(r"(\d{2,4}.{1,3}):", prefix).group(1)
if hasattr(starting_digits, 'group'):
id_code = starting_digits.group(1) + id_code #
# get pid
participant = pm.loc[pm["id_code"] == id_code, "pid"]
try:
pid = participant.values[0]
except:
pid = "Versuchsleiter"
# print(ix, pid, id_code, only_content, timestamp)
if pid and pid not in participants and pid != "Versuchsleiter":
participants.append(pid)
d.loc[ix, "pid"] = pid
d.loc[ix, "timestamp"] = timestamp
d.loc[ix, "content"] = only_content.strip()
d.loc[ix, "is_participant"] = 0 if pid == "Versuchsleiter" else 1
d = d[["pid", "is_participant", "content", "timestamp"]]
d.loc[(d['pid'] == "Versuchsleiter"), "pid"] = participants[0]
d.loc[(d['pid'] == None), "pid"] = participants[0]
insert_or_update(engine, d, "table of sql")```
I need "Versuchsleiter" since some of the ID's are "Versuchsleiter"
Thank you!
You should take advantage from using capturing groups.
All the initial regex matching (after c = row["content"] and before # get pid) can be done with
match = re.search(r"^\[(\d{1,2}:\d{1,2}:\d{1,2})]\s+(\w+):\s*(.*)", c)
if match:
timestamp = match.group(1)
id_code = match.group(2)
only_content = match.group(3)
if re.search(r"(?:Versuch(?:sleit(?:er)?)?|ersuchsleiter)", id_code):
id_code = "Versuchsleiter"
Your timestamp will be 0:00:33, only_content will hold (content) and id_code will contain 30091aA.
See the regex demo
Thank you for your help but this gives me the following error
Traceback (most recent call last):
File "C:/Users/.../PycharmProjects/.../.../....py", line 80, in <module>
insert_or_update(engine, d, "sql table")
TypeError: not enough arguments for format string

Returning a row that matches specified condition, and edit particular columns in row. Then write to csv file with changed row

I'm writing a python script that works with two csv files. Lets call them csv1.csv (original file to read) and csv2.csv (exact copy of csv1). The goal is to find the row and column in the csv file that corresponds to the the modified user-defined input.
csv format:(continues for about 2-3 thousand lines)
record LNLIM, ID_CO,OD_DV,ID_LN, ST_LN, ZST_LN, ID_LNLIM,LIMIT1_LNLIM, LIMIT2_LNLIM, LIMIT3_LNLIM
LNLIM, 'FPL', 'SOUT', '137TH_LEVEE_B', 'B', '137TH_AV', 'LEVEE', 'A', 1000, 1100, 1200
LNLIM, 'FPL', 'SOUT', '137TH_DAVIS_B', 'A', '137TH_AV', 'NEWTON', 'A', 1000, 1100, 1200
...
Let's say that the user is looking for 137TH_AV and NEWTON. I want to be able to go row by row and compare the two columns/row indices ST_LN and ZST_LN. If both columns match what the user inputted then I want to capture which row in the csv file that happened on, and use that information to edit the remaining columns LIMIT1_LNLIM LIMIT2_LNLIM LIMIT3_LNLIM on that row with new analog values.
I want to get the 3 new values provided by the user and edit a specific row, and a specific row element. Once I've found the place to replace the number values I want to overwrite csv2.csv with this edit.
Determining where the line segment is located in the array
import sys
import csv
import os
import shutil
LineSectionNames = []
ScadaNames = []
with open('Vulcan_Imp_Summary.csv', 'r') as file:
reader = csv.reader(file)
for row in reader:
LineSectionName = row[1]
ScadaName = row[29]
LineSectionNames.append(LineSectionName)
ScadaNames.append(ScadaName)
#Reformatting arrays for accurate references
LineSectionNames = [character.replace('\xa0', ' ') for character in LineSectionNames]
LineSectionNames = [character.replace('?', '-') for character in LineSectionNames]
ScadaNames = [character.replace('\xa0', ' ') for character in ScadaNames]
#Setting Line Section name as key and Scada name as value
ScadaDict = {}
for i in range(len(LineSectionNames)):
ScadaDict[LineSectionNames[i]] = ScadaNames[i]
#Prompt user for grammatical name of Line Section
print ('Enter the Line Section Name: (Example = Goulds-Princeton) \n')
user_input = input()
#Reference user input to dictionary value to convert input into SCADA format
def reformat():
print ('Searching for Line Section...' + user_input)
if user_input in ScadaDict:
value = ScadaDict[user_input]
print ('\n\t Match!\n')
else:
print ('The Line Section name you have entered was incorrect. Try again. \n Example = Goulds-Princeton')
reformat()
# Copying the exported file from Genesys
path = 'I://PSCO//DBGROUP//PatrickL//'
shutil.copyfile(path + 'lnlim_import.csv', path + 'lnlim_import_c.csv')
#Using the SCADA format to search through csv file
print ('Searching csv file for...' + user_input)
# Reading the copied file
record_lnlims = []
id_cos = []
id_dvs = []
id_lines = []
id_lns = []
st_lns = []
zst_lns = []
id_lnlims = []
limit1_lnlims = []
limit2_lnlims = []
limit3_lnlims = []
with open('lnlim_import_c.csv', 'r') as copy:
reader = csv.reader(copy)
for row in reader:
record_lnlim = row[0]
id_co = row[1]
id_dv = row[2]
id_line = row[3]
id_ln = row[4]
st_ln = row[5]
zst_ln = row[6]
id_lnlim = row[7]
limit1_lnlim = row[8]
limit2_lnlim = row[9]
limit3_lnlim = row[10]
record_lnlims.append(record_lnlim)
id_cos.append(id_co)
id_dvs.append(id_dv)
id_lines.append(id_line)
id_lns.append(id_ln)
st_lns.append(st_ln)
zst_lns.append(zst_ln)
id_lnlims.append(id_lnlim)
limit1_lnlims.append(limit1_lnlim)
limit2_lnlims.append(limit2_lnlim)
limit3_lnlims.append(limit3_lnlim)
#Reformatting the user input from GOULDS-PRINCETON to 'GOULDS' and 'PRINCETON'
input_split = user_input.split('-', 1)
st_ln1 = input_split[0]
zst_ln1 = input_split[1]
st_ln2 = st_ln1.upper()
zst_ln2 = zst_ln1.upper()
st_ln3 = "'" + str(st_ln2) + "'"
zst_ln3 = "'" + str(zst_ln2) + "'"
#Receiving analog values from user
print ('\n\t Found! \n')
print ('Enter the Specified Emergency Rating (A) for 110% for 7 minutes: ')
limit1_input = input()
print ('Enter the Specified Emergency Rating (A) for 120% for 7 minutes: ')
limit2_input = input()
print ('Enter the Specified Emergency Rating (A) for 130% for 5 minutes: ')
limit3_input = input()
Whenever I print the row_index it prints the initialized value of 0.
i = 0
row_index = 0
for i in range(len(st_lns)):
if st_ln3 == st_lns[i] and zst_ln3 == zst_lns[i]:
row_index = i
print(row_index)
limit1_input = limit1_lnlims[row_index]
limit2_input = limit2_lnlims[row_index]
limit3_input = limit3_lnlims[row_index]
csv_list = []
csv_list.append(record_lnlims)
csv_list.append(id_cos)
csv_list.append(id_dvs)
csv_list.append(id_lines)
csv_list.append(st_lns)
csv_list.append(zst_lns)
csv_list.append(id_lnlims)
csv_list.append(limit1_lnlims)
csv_list.append(limit2_lnlims)
csv_list.append(limit3_lnlims)
#Editing the csv file copy to implement new analog values
with open('lnlim_import_c.csv', 'w') as edit:
for x in zip(csv_list):
edit.write("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}\t{7}\t{8}\t{9}\t{10}\n".format(x))

Python CSV module, add column to the side, not the bottom

I am new in python, and I need some help. I made a python script that takes two columns from a file and copies them into a "new file". However, every now and then I need to add columns to the "new file". I need to add the columns on the side, not the bottom. My script adds them to the bottom. Someone suggested using CSV, and I read about it, but I can't make it in a way that it adds the new column to the side of the previous columns. Any help is highly appreciated.
Here is the code that I wrote:
import sys
import re
filetoread = sys.argv[1]
filetowrite = sys.argv[2]
newfile = str(filetowrite) + ".txt"
openold = open(filetoread,"r")
opennew = open(newfile,"a")
rline = openold.readlines()
number = int(len(rline))
start = 0
for i in range (len(rline)) :
if "2theta" in rline[i] :
start = i
for line in rline[start + 1 : number] :
words = line.split()
word1 = words[1]
word2 = words[2]
opennew.write (word1 + " " + word2 + "\n")
openold.close()
opennew.close()
Here is the second code I wrote, using CSV:
import sys
import re
import csv
filetoread = sys.argv[1]
filetowrite = sys.argv[2]
newfile = str(filetowrite) + ".txt"
openold = open(filetoread,"r")
rline = openold.readlines()
number = int(len(rline))
start = 0
for i in range (len(rline)) :
if "2theta" in rline[i] :
start = i
words1 = []
words2 = []
for line in rline[start + 1 : number] :
words = line.split()
word1 = words[1]
word2 = words[2]
words1.append([word1])
words2.append([word2])
with open(newfile, 'wb') as file:
writer = csv.writer(file, delimiter= "\n")
writer.writerow(words1)
writer.writerow(words2)
These are some samples of input files:
https://dl.dropbox.com/u/63216126/file5.txt
https://dl.dropbox.com/u/63216126/file6.txt
My first script works "almost" great, except that it writes the new columns at the bottom and I need them at side of the previous columns.
The proper way to use writerow is to give it a single list that contains the data for all the columns.
words.append(word1)
words.append(word2)
writer.writerow(words)

Categories