Multi-threading list iterating for loop


This function reads from a text file, reformats the contents, and then writes them to a CSV. I'm trying to use threading to parallelize the for i in lines loop. It is the longest part of a larger script and takes up most of the run time, because the list lines contains thousands of elements. Can someone help me straighten this out? Running it sequentially instead of in parallel takes a lot of time. I have seen many answers to similar questions, but so far I haven't understood them well enough to implement them correctly.
def sheets(i):
    # time format for spreadsheet
    dt_time = datetime.now().strftime('%m/%d|%H:%M')
    # for league name (NFL,NBA,NHL ETC.) in list containing league names
    for league_name in leagues2:
        league_name = league_name.split('|')[0]
        with open(final_stats_path, 'r+') as lines:
            lines = lines.readlines()
        # i = one long string containg details about the event in the loop, eg. sport, game day, game id, home team name
        for i in lines:
            i = i.split(',')
            minprice = i[6]
            totaltix = i[5]
            event_date = i[2]
            try:
                dayofweek = datetime.strptime(event_date, '%Y-%m-%d').strftime('%A')
            except:
                continue
            event_date = i[2][2:]
            event_date = str(event_date).split('-')
            event_date = event_date[1]+'/'+event_date[2]
            sport = i[4]
            event = i[1].replace('Basketball','').replace('\n','')
            away = i[8].replace('Basketball', '').replace('\n','')
            eventid = i[0]
            event_home = i[9].replace('Basketball', '').replace('\n','')
            event = event.split(' at ')[0]
            tixdata = str(totaltix)
            eventid = 'https://pro.stubhub.com/simweb/sim/services/priceanalysis?eventId='+str(eventid)+'&sectionId=0'
            directory = root+'\data'+'\\'+sport+'\\'
            report = directory+'report.xlsx'
            fname = directory+'teams.txt'
            eventleague = sport
            f = open(directory+'acronym.txt', 'r+')
            lines_2 = f.readlines()
            for qt in lines_2:
                qt = qt.split('-')
                compare = qt[1]
                if event_home in compare:
                    event_home = qt[0]
                else:
                    pass
            troop = []
            d = {
                'ID' : eventid,
                'Date' : event_date,
                'Day' : dayofweek,
                'Away' : away,
            }
            s = {
                'time' : tixdata
            }
            numbers = event_home+'.txt'
            numbers_new = 'bk\\bk_'+numbers
            with open(directory+numbers_new, 'a+') as y:
                pass
            with open(directory+numbers, 'a+') as o:
                pass
            with open(directory+numbers, 'r+') as g:
                for row in g:
                    if str(eventid) in row:
                        #print('the event is in the list')
                        row_update = row.replace('}', ", '"+dt_time+"': '"+tixdata+"'}")
                        with open(directory+numbers_new, 'a+') as y:
                            y.write(row_update)
                        break
                else:
                    with open(directory+numbers, 'a+') as p:
                        #print('the event is not in the list')
                        p.write(str(d)+'\n')
                    with open(directory+numbers_new, 'a+') as n:
                        n.write(str(d)+'\n')
            sizefile = os.path.getsize(directory+numbers_new)
            if sizefile > 0:
                shutil.copy(directory+numbers_new, directory+numbers)
                open(directory+numbers_new, 'w').close()
            else:
                pass
            df = []
            with open(directory+numbers, 'r+') as t:
                for row in t:
                    b = eval(row)
                    dfs = df.append(b)
            df = pd.DataFrame(df)
            yark = list(df.columns)[:-5]
            zed = ['ID', 'Date', 'Day', 'Away']
            columns = zed+yark
            try:
                df = df[columns]
            except:
                pass
            df.index = range(1, 2*len(df)+1, 2)
            df = df.reindex(index=range(2*len(df)))
            writer = pd.ExcelWriter(directory+event_home+'.xlsx', engine='xlsxwriter')
            try:
                df.to_excel(writer, sheet_name=event_home)
            except:
                continue
            workbook = writer.book
            worksheet = writer.sheets[event_home]
            format1 = workbook.add_format({'num_format': '#,##0.00'})
            worksheet.set_column('A:ZZ', 18, format1)
            writer.save()

if __name__ == "__main__":
    pool = ThreadPool(8) # Make the Pool of workers
    results = pool.map(sheets) #Open the urls in their own threads
    pool.close() #close the pool and wait for the work to finish
    pool.join()
    ##get_numbers()
    ##stats_to_csv()
    ##stats_to_html()
    #sheets()

Try changing the following line:
results = pool.map(sheets)
to:
results = pool.map(sheets,range(8))
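Note that Pool.map(func, iterable) calls func once for each element of the iterable, so pool.map(sheets, range(8)) runs the whole sheets() function eight times (with i = 0 to 7, a value the function immediately overwrites) rather than splitting the for i in lines loop between threads. A minimal sketch of splitting the loop itself, assuming the per-line work can be moved into its own function, is shown below. process_line and parse_all are hypothetical names, and process_line only stands in for the parsing part of the loop body, using the field positions from the question.

```python
from multiprocessing.pool import ThreadPool


def process_line(line: str):
    """Hypothetical stand-in for the body of the original `for i in lines` loop.

    It only parses one comma-separated line and returns the fields it needs;
    file writes are left to the caller so threads never append to the same file.
    """
    fields = line.rstrip("\n").split(",")
    if len(fields) < 10:
        return None  # malformed line, skipped like the original `continue`
    event_id, event_date, sport, total_tix = fields[0], fields[2], fields[4], fields[5]
    return {"ID": event_id, "Date": event_date, "Sport": sport, "Tickets": total_tix}


def parse_all(lines, workers=8):
    # pool.map calls process_line once per element of `lines` and
    # returns the results in the same order as the input.
    with ThreadPool(workers) as pool:
        results = pool.map(process_line, lines)
    return [r for r in results if r is not None]
```

After parse_all returns, the writes to the per-team .txt and .xlsx files can be done in a single thread, which avoids several threads appending to the same file at once. Because of CPython's global interpreter lock, threads mainly help when the per-line work waits on I/O; for CPU-bound parsing, multiprocessing.Pool offers the same map interface using separate processes.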

Related

How to speed up this search script?

Hello,
I have created a Python script that fills in an Excel file (wb) based on the first column of this file, which contains many references (about 4000). To do this, my script searches for each reference (a for loop over the list of references read from wb) in two other Excel files loaded as DataFrames (df_mbom and df_ebom), and fills specific cells of wb depending on whether the reference is present in df_mbom and df_ebom. If the reference is found, it compares the level of the reference with that of the following row and fills wb accordingly. The script works and does the job well.
The only problem is that it takes more than 6 hours to search and fill wb for 1000 references, so processing all 4000 references would take almost 24 hours! Do you have any suggestions to speed up this program?
Here is the code used:
from multiprocessing.dummy import Pool

def finding_complete(elt):
    elt = str(elt)
    pos = mylist_ref.index(elt)
    print(pos)
    item = r'^' + elt + '$'
    df_findings = df_mbom[df_mbom['Article'].str.contains(item, case=True, regex=True)]
    if df_findings.shape[0] == 0 :
        active_sheet.cell(row = 4+pos, column = 19).value = "NOK"
        active_sheet.cell(row = 4+pos, column = 18).value = "NOK"
    else :
        active_sheet.cell(row = 4+pos, column = 19).value = "OK"
        boolean_f = df_findings.drop_duplicates(subset = ['Article'],keep = 'first')
        ind = boolean_f.index.to_list()
        idx = ind[0]
        item1 = df_mbom['Niveau'][idx]
        item2 = df_mbom['Niveau'][idx + 1]
        if item2 > item1 :
            active_sheet.cell(row = 4+pos, column = 18).value = "OK"
        else :
            active_sheet.cell(row = 4+pos, column = 18).value = "NOK"
    df_findings2 = df_ebom[df_ebom['Article'].str.contains(item, case=True, regex=True)]
    pos = mylist_ref.index(elt)
    if df_findings2.shape[0] == 0 :
        active_sheet.cell(row = 4+pos, column = 17).value = "NOK"
    else :
        boolean_f = df_findings2.drop_duplicates(subset = ['Article'],keep = 'first')
        ind = boolean_f.index.to_list()
        idx = ind[0]
        item1 = df_ebom['Niveau'][idx]
        item2 = df_ebom['Niveau'][idx + 1]
        if item2 > item1 :
            active_sheet.cell(row = 4+pos, column = 17).value = "OK"
        else :
            active_sheet.cell(row = 4+pos, column = 17).value = "NOK"

if __name__ == '__main__':
    start = time.time()
    path = '100446099_mbom.xlsx'
    df_mbom = pd.read_excel(path, sheet_name=0, header=0)
    path = '100446099_ebom.xlsx'
    df_ebom = pd.read_excel(path, sheet_name=0, header=0)
    location = 'DOC#6TERNORrev0.xlsx'
    wb = openpyxl.load_workbook(filename=location) #, data_only=True"
    active_sheet = wb["DOC#6 toutes regions"]
    #Get cell value and put it in a list
    mylist_ref = []
    for row in active_sheet.iter_rows(min_row=4, max_row=active_sheet.max_row, min_col=2, max_col=2):
        for cell in row:
            if cell.value == None :
                pass
            else:
                mylist_ref.append(cell.value)
    print("Number of references :")
    print(len(mylist_ref))
    print(" ")
    with Pool() as pool: #os.cpu_count())
        pool.map(finding_complete,mylist_ref) # equivalent to: for elt in mylist_ref: finding_complete(elt)
    wb.save(location)
    wb.close()
    final = time.time()
    timer = final - start
    print(round(timer, 1))
Thanks in advance for your time.

Convert the Excel file to JSON, process the JSON, then write it back to Excel.
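Independently of the JSON route, much of the time in the posted script probably goes into running a regular-expression str.contains over an entire DataFrame for every one of the ~4000 references, plus a linear mylist_ref.index(elt) search each time. A different technique, sketched below, computes the OK/NOK level status for every article in one vectorized pass and then looks each reference up in a dict. It assumes the columns are named Article and Niveau as in the question, and that the DataFrames have the default 0..n-1 index, so that idx + 1 in the original means "the next row". level_status and lookup are hypothetical helper names.

```python
import pandas as pd


def level_status(df: pd.DataFrame) -> dict:
    """Return {article: 'OK' | 'NOK'} computed in one vectorized pass.

    'OK' means the row right after the article's first occurrence has a
    higher Niveau, mirroring the item2 > item1 test in the question.
    Assumes a default 0..n-1 index, so shift(-1) matches df['Niveau'][idx + 1].
    """
    articles = df["Article"].astype(str)
    next_level = df["Niveau"].shift(-1)            # Niveau of the following row (NaN after the last row)
    is_first = ~articles.duplicated(keep="first")  # same rows that drop_duplicates(keep='first') keeps
    higher = (next_level > df["Niveau"])[is_first]  # NaN > x is False, so a last-row article gets 'NOK'
    return dict(zip(articles[is_first], higher.map({True: "OK", False: "NOK"})))


def lookup(status: dict, ref) -> str:
    # References absent from the BOM are "NOK", as in the original script.
    return status.get(str(ref), "NOK")
```

With mbom_status = level_status(df_mbom) and ebom_status = level_status(df_ebom) built once, the main loop can become a plain for pos, elt in enumerate(mylist_ref) that writes lookup(mbom_status, elt) and lookup(ebom_status, elt) into the sheet, while presence in df_mbom (column 19) is simply str(elt) in mbom_status. This also removes the need for the thread pool, whose workers were all writing to the same openpyxl sheet.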

looping through a list of dataframes, writing each element of that list to a new .csv file on disk

I have a list of dataframes and am attempting to export each using the pandas.df.to_csv method to a folder on disk. However, only the last item in the list of dataframes is being written to disk as a .csv
Please see code below:
import pandas as pd
import os
import datetime
from pathlib import Path

CSV_Folder = Path('C:\PA_Boundaries\Tests')
Output = r'C:/PA_Boundaries/test_output'
today = datetime.date.today()
date = today.strftime('%Y%m%d')
try:
    dfs = []
    for file in os.listdir(CSV_Folder):
        df = pd.read_csv(CSV_Folder / file)
        dfs.append(df)
    new_dfs = []
    for df in dfs:
        new_df = pd.DataFrame()
        new_df['Original Addr string'] = df['StreetConc']
        new_df['Addr #'] = df['AddNum']
        new_df['Prefix'] = df['StPreDir']
        new_df['Street Name'] = df['StName']
        new_df['StreetType'] = df['StType']
        new_df['Suffix'] = df['StDir']
        new_df['Multi-Unit'] = ''
        new_df['City'] = df['City']
        new_df['Zip Code'] = df['PostCode']
        new_df['4'] = df['PostalExt']
        new_df['County'] = df['CountyID']
        new_df['Addr Type'] = ''
        new_df['Precint Part Name'] = ''
        new_df['Lat'] = df['X']
        new_df['Long'] = df['Y']
        replaced_address_names = []
        for index, row in new_df.iterrows():
            new_row = row['Original Addr string'].replace(',', ' ')
            replaced_address_names.append(new_row)
        new_df['Original Addr string'] = replaced_address_names
        county_id = df.iloc[0, 37]
        new_dfs.append(new_df)
    for i in range(len(new_dfs)):
        new_dfs[i].to_csv(f'{Output}\ADDR_{county_id}_{date}.csv', index=False)
except FileNotFoundError:
    print(f'{file} not found in {CSV_Folder}')
except PermissionError:
    print('Check syntax of paths')
else:
    print('Process Complete')
new_dfs contains the correct number of dataframes. However, when looping through the new list of dataframes and calling .to_csv on each item in the list, only the last item in the list is written to the disk.

The problem lies in the way you name your exported files.
After the loop has run, county_id equals the last county_id, i.e. the county_id of the last iterated df.
Since the name of your exported file is {Output}\ADDR_{county_id}_{date}.csv, all the exported files get the same county_id and date; in other words, each one overwrites the previous one.
To avoid this, you can collect the IDs in a new list called county_ids and then use it in the last loop to give each saved file its own name. This would be your resulting code:
import pandas as pd
import os
import datetime
from pathlib import Path

# raw strings so the backslashes are not read as escape sequences
CSV_Folder = Path(r'C:\PA_Boundaries\Tests')
Output = r'C:/PA_Boundaries/test_output'
today = datetime.date.today()
date = today.strftime('%Y%m%d')
try:
    dfs = []
    for file in os.listdir(CSV_Folder):
        df = pd.read_csv(CSV_Folder / file)
        dfs.append(df)
    new_dfs, county_ids = [], []
    for df in dfs:
        new_df = pd.DataFrame()
        new_df['Original Addr string'] = df['StreetConc']
        new_df['Addr #'] = df['AddNum']
        new_df['Prefix'] = df['StPreDir']
        new_df['Street Name'] = df['StName']
        new_df['StreetType'] = df['StType']
        new_df['Suffix'] = df['StDir']
        new_df['Multi-Unit'] = ''
        new_df['City'] = df['City']
        new_df['Zip Code'] = df['PostCode']
        new_df['4'] = df['PostalExt']
        new_df['County'] = df['CountyID']
        new_df['Addr Type'] = ''
        new_df['Precint Part Name'] = ''
        new_df['Lat'] = df['X']
        new_df['Long'] = df['Y']
        replaced_address_names = []
        for index, row in new_df.iterrows():
            new_row = row['Original Addr string'].replace(',', ' ')
            replaced_address_names.append(new_row)
        new_df['Original Addr string'] = replaced_address_names
        # remember this frame's county ID so its file gets its own name
        county_ids.append(df.iloc[0, 37])
        new_dfs.append(new_df)
    for i in range(len(new_dfs)):
        new_dfs[i].to_csv(f'{Output}/ADDR_{county_ids[i]}_{date}.csv', index=False)
except FileNotFoundError:
    print(f'{file} not found in {CSV_Folder}')
except PermissionError:
    print('Check syntax of paths')
else:
    print('Process Complete')
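The overwriting is easy to reproduce with a small, self-contained sketch; the frames, IDs, date string and temporary directory here are made up for illustration. Writing several DataFrames under one shared name leaves a single file, while giving each its own county ID keeps them all. Pairing each frame with its ID through zip also avoids indexing two parallel lists by hand.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export(frames, county_ids, out_dir: Path, date: str) -> list:
    """Write each frame to ADDR_<county>_<date>.csv and return the CSV names left on disk."""
    for frame, county_id in zip(frames, county_ids):
        frame.to_csv(out_dir / f"ADDR_{county_id}_{date}.csv", index=False)
    return sorted(p.name for p in out_dir.glob("*.csv"))


frames = [pd.DataFrame({"City": ["A"]}), pd.DataFrame({"City": ["B"]})]


def demo(ids):
    # fresh temporary folder per run, so the two runs don't see each other's files
    with tempfile.TemporaryDirectory() as tmp:
        return export(frames, ids, Path(tmp), "20200101")


print(demo(["7", "7"]))  # ['ADDR_7_20200101.csv']  (the second write replaced the first)
print(demo(["7", "9"]))  # ['ADDR_7_20200101.csv', 'ADDR_9_20200101.csv']
```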

Obviously I cannot test this, so if you do run it there may be lines that need tweaking. However, I'd write the code something like the example below. Basically, I'd call a function to do the replacement as each file is opened, and write the result out immediately.
If you can get it working, it will probably be faster, and it reads slightly better because there are fewer lines.
Example:
import pandas as pd
import os
import datetime
from pathlib import Path

CSV_Folder = Path(r'C:/PA_Boundaries/Tests')
Output = r'C:/PA_Boundaries/test_output/'
today = datetime.date.today()
date = today.strftime('%Y%m%d')

def updateFrame(f):
    new_df = pd.DataFrame()
    new_df['Original Addr string'] = f['StreetConc']
    new_df['Addr #'] = f['AddNum']
    new_df['Prefix'] = f['StPreDir']
    new_df['Street Name'] = f['StName']
    new_df['StreetType'] = f['StType']
    new_df['Suffix'] = f['StDir']
    new_df['Multi-Unit'] = ''
    new_df['City'] = f['City']
    new_df['Zip Code'] = f['PostCode']
    new_df['4'] = f['PostalExt']
    new_df['County'] = f['CountyID']
    new_df['Addr Type'] = ''
    new_df['Precint Part Name'] = ''
    new_df['Lat'] = f['X']
    new_df['Long'] = f['Y']
    # better way to replace without looping the rows...
    new_df['Original Addr string'] = new_df['Original Addr string'].str.replace(',', ' ')
    return new_df

for file in os.listdir(CSV_Folder):
    working_file = str(CSV_Folder) + '/' + file
    if working_file.endswith('.csv'):
        try:
            df = pd.read_csv(working_file)
            county_id = str(df.iloc[0, 37])
            # the function returns a frame so you can treat it as such...
            updateFrame(df).to_csv(f'{Output}ADDR_{county_id}_{date}.csv', index=False)
        except FileNotFoundError:
            print(f'{file} not found in {CSV_Folder}')
        except PermissionError:
            print('Check syntax of paths')
        else:
            print('Process Complete')
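The vectorized .str.replace inside updateFrame does the same job as the iterrows loop from the question, without building a Python list row by row. A quick check on made-up address strings; regex=False is added here only to make the literal, non-regex intent explicit.

```python
import pandas as pd

df = pd.DataFrame({"Original Addr string": ["1 Main St, Apt 2", "5 Oak Ave"]})

# Row-by-row version from the question
looped = [row["Original Addr string"].replace(",", " ") for _, row in df.iterrows()]

# Vectorized version from the answer
vectorized = df["Original Addr string"].str.replace(",", " ", regex=False)

print(vectorized.tolist() == looped)  # True
```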

how to import state of pandas dataframe to second .py file

So, toward the end of my first file, which we'll call /file.py:
def get_excel_data(self):
    """Places excel data into pandas dataframe"""
    # excel_data = pandas.read_excel(self.find_file())
    for extracted_archive in self.find_file():
        excel_data = pandas.read_excel(extracted_archive)
        # print(excel_data)
        columns = pandas.DataFrame(columns=excel_data.columns.tolist())
        excel_data = pandas.concat([excel_data, columns])
        excel_data.columns = excel_data.columns.str.strip()
        excel_data.columns = excel_data.columns.str.replace("/", "_")
        excel_data.columns = excel_data.columns.str.replace(" ", "_")
        total_records = 0
        num_valid_records = 0
        num_invalid_records = 0
        for row in excel_data.itertuples():
            mrn = row.MRN
            total_records += 1
            if mrn in ("", " ", "N/A", "NaT", "NaN", None) or math.isnan(mrn):
                # print(f"Invalid record: {row}")
                num_invalid_records += 1
                # total_invalid = num_invalid_records + dup_count
                excel_data = excel_data.drop(excel_data.index[row.Index])
                # continue
            else:
                # print(mrn) # outputs all MRN ids
                for row in excel_data.itertuples():
                    num_valid_records += 1
                continue
        with open("./logs/metrics.csv", "a", newline="\n") as f:
            csv_writer = DictWriter(f, ['date', 'total_records', 'processed', 'skipped', 'success_rate'])
            # csv_writer.writeheader()
            currentDT = datetime.datetime.now()
            success_rate = num_valid_records / total_records * 100
            csv_writer.writerow(dict(date=currentDT,
                                     total_records=total_records,
                                     processed=num_valid_records,
                                     skipped=num_invalid_records,
                                     success_rate=num_valid_records / total_records * 100))
    return self.clean_data_frame(excel_data)

def clean_data_frame(self, data_frame):
    """Cleans up dataframes"""
    for col in data_frame.columns:
        if "date" in col.lower():
            data_frame[col] = pandas.to_datetime(data_frame[col],
                                                 errors='coerce', infer_datetime_format=True)
            data_frame[col] = data_frame[col].dt.date
    data_frame['MRN'] = data_frame['MRN'].astype(int).astype(str)
    return data_frame

def get_mapping_data(self):
    map_data = pandas.read_excel(config.MAPPING_DOC, sheet_name='main')
    columns = pandas.DataFrame(columns=map_data.columns.tolist())
    return pandas.concat([map_data, columns])
In my second file, second_file.py, I would like to keep that end state and, for instance, do another iteration:
def process_records(self, records, map_data, completed=None, errors=None):
    """Code to execute after webdriver initialization."""
    series_not_null = False
    try:
        num_attempt = 0
        for record in data_frame.itertuples(): # not working
            print(record)
            series_not_null = True
            mrn = record.MRN
            self.navigate_to_search(num_attempt)
            self.navigate_to_member(mrn)
            self.navigate_to_assessment()
            self.add_assessment(record, map_data)
            self.driver.switch_to.parent_frame() # not working
            sleep(.5)
            error_flag = self.close_member_tab(self.driver, mrn, error_flag)
    except Exception as exc:
        if series_not_null:
            errors = self.process_series_error(exc)
    return completed, error
Both files import pandas.

You can save your DataFrame in a pickle file like this. It is also worth noting that you can store almost anything in a pickle file. Here is a link to some info: pickle info
import pandas as pd
import pickle

x = pd.DataFrame({'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]})
#this will create a file called pickledata.p that will store the data frame
with open('pickledata.p', 'wb') as fh: #notice that you need the 'wb' for the dump
    pickle.dump(x, fh)
#to load the file do this
with open('pickledata.p', 'rb') as fh: #you need to use 'rb' to read
    df = pickle.load(fh)
#you can now use df like a normal dataframe
print(df)
You don't actually need the .p extension for a pickle file; I just like it.
So you save your DataFrame at the end of script one, and then load it at the start of script two.

Use DataFrame.to_pickle and pandas.read_pickle:
To persist:
df.to_pickle('./dataframe.pkl')
To load:
df = pd.read_pickle('./dataframe.pkl')
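Put together, the hand-off between the two scripts could look like the sketch below. It assumes the first script ends with its cleaned DataFrame in a variable (here a small made-up stand-in named clean) and the second starts by reading it back; the file name and temp-directory location are arbitrary. Unlike a CSV round trip, the pickle keeps dtypes such as the datetime column and the zero-padded string MRNs. Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

# --- end of the first script (file.py): persist the final state ---
clean = pd.DataFrame({
    "MRN": ["001", "002"],
    "Visit_Date": pd.to_datetime(["2020-01-01", "2020-02-01"]),
})
path = os.path.join(tempfile.gettempdir(), "dataframe.pkl")
clean.to_pickle(path)

# --- start of the second script (second_file.py): restore it and iterate ---
restored = pd.read_pickle(path)
for record in restored.itertuples():
    print(record.MRN, record.Visit_Date.date())
```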

Files comparison optimization

I have to compare files containing a huge number of records (10-20 million).
Requirement explanation:
For the comparison there will be two files, and I need to find the records that differ between them.
1. The file types are: .txt, .csv, .xlsx, .mdb or .accdb.
2. File 1 can be any of the types mentioned in point 1.
3. File 2 can be any of the types mentioned in point 1.
4. The delimiter for File 1 or File 2 is unknown; it may be any of ~ ^ ; |.
5. Each file has more than 70 columns.
6. File 1 is older than File 2 in terms of records: File 1 may have 10 million and File 2 may have 10.2 million records.
7. I need to create File 3, which consists of the records that differ from File 1 to File 2 (for example, the 0.2 million records from point 6), with the column header.
My try: I used a set to collect the data from both files (File 1 and File 2) and did the comparison
using a for loop and an if condition.
import pyodbc
import os.path
import string
import re
import sys
import time
from datetime import datetime

# Function for Do you want to continue
def fun_continue():
    # If you want to continue
    yesno = raw_input('\nDo you want to continue(Y/N)?')
    if yesno == 'Y':
        fun_comparison()
    else:
        sys.exit()

def fun_comparison():
    # Getting Input Value's
    file1 = raw_input('Enter the file1 name with path:')
    file_extension_old = os.path.splitext(file1)[1]
    #Condition check for the File extension, if it's ACCESS DB then ask for the table name
    if (file_extension_old == ".accdb") or (file_extension_old == ".mdb"):
        table_name_old = raw_input('Enter table name:')
    file2 = raw_input('Enter the latest file name:')
    file_extension_latest = os.path.splitext(file2)[1]
    #Condition check for the File extension, if it's ACCESS DB then ask for the table name
    if (file_extension_latest == ".accdb") or (file_extension_latest == ".mdb"):
        table_name_latest = raw_input('Enter table name:')
    file3 = raw_input('Give the file name to store the comparison result:')
    print('Files comparison is running! Please wait...')
    # Duration Calculation START TIME
    start_time = datetime.now()
    # Code for file Comparison
    try:
        #Condition check for the ACCESS FILE -- FILE 1
        if (file_extension_old == ".accdb") or (file_extension_old == ".mdb"):
            conn_string_old = r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+file1+';'
            con_old = pyodbc.connect(conn_string_old)
            cur_old = con_old.cursor()
            #Getting Column List
            res_old = cur_old.execute('SELECT * FROM '+table_name_old+' WHERE 1=0')
            column_list = [tuple(map(str, record_new))[0] for record_new in res_old.description]
            column_list = ';'.join(column_list)
            #For Getting Data
            SQLQuery_old = 'SELECT * FROM '+table_name_old+';'
            rows_old = cur_old.execute(SQLQuery_old).fetchall()
            records_old = [tuple(map(str,record_old)) for record_old in rows_old]
            records_old = [";".join(t) + "\n" for t in records_old]
            records_old = set(records_old)
            records_old = map(str.strip, records_old)
            #print records_old
        else:
            with open(file1) as a:
                column_list = a.readline()
                column_list = re.sub(r"[;,|^~]", ";", column_list)
                a = set(a)
                sete = map(str.strip, a)
                setf = [re.sub(r"[;,|^~]", ";", s) for s in sete]
                records_old = [";".join(map(str.strip, i.split(";"))) for i in setf]
        #Condition check for the ACCESS FILE -- FILE 2
        if (file_extension_latest == ".accdb") or (file_extension_latest == ".mdb"):
            conn_string_new = r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+file2+';'
            con_new = pyodbc.connect(conn_string_new)
            cur_new = con_new.cursor()
            #Getting Column List
            res_new = cur_new.execute('SELECT * FROM '+table_name_latest+' WHERE 1=0')
            column_list = [tuple(map(str, record_new))[0] for record_new in res_new.description]
            column_list = ';'.join(column_list)
            SQLQuery_new = 'SELECT * FROM '+table_name_latest+';'
            rows_new = cur_new.execute(SQLQuery_new).fetchall()
            records_new = [tuple(map(str,record_new)) for record_new in rows_new]
            records_new = [";".join(t) + "\n" for t in records_new]
            records_new = set(records_new)
            records_new = map(str.strip, records_new)
            #print records_new
        else:
            with open(file2) as b:
                column_list = b.readline()
                column_list = re.sub(r"[;,|^~]", ";", column_list)
                b = set(b)
                sete = map(str.strip, b)
                setf = [re.sub(r"[;,|^~]", ";", s) for s in sete]
                records_new = [";".join(map(str.strip, i.split(";"))) for i in setf]
        column_list = column_list.strip()
        column_list = column_list.replace('; ', ';').strip(' ')
        with open(file3, 'w') as result:
            result.write(column_list + '\n')
            for line in records_new:
                if line not in records_old:
                    result.write(line + '\n')
    except Exception as e:
        print('\n\nError! Files Comparison completed unsuccessfully.')
        print('\nError Details:')
        print(e)
    # Duration calculation END TIME
    end_time = datetime.now()
    print('Duration: {}'.format(end_time - start_time))
    # Calling Continue function
    fun_continue()

# Calling Comparison function
fun_comparison()
input()
Problem:
The code works fine for the small record sets I used for testing, but it is not efficient for the huge ones.
The system hangs.
It consumes a lot of memory, as shown in the screenshot below:
[Screenshot: memory usage during the comparison]
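One likely cause of the hang is visible in the code itself: in the text-file branch records_old ends up as a list (and under Python 2, which raw_input suggests, so does the result of map in the Access branch), so if line not in records_old scans the whole list for every new record. With about 10 million rows on each side, that is on the order of 10^14 string comparisons. The sketch below is a different, memory-conscious technique, not the poster's method: it streams both files line by line, keeps only a 16-byte BLAKE2 hash per old row in a set (constant-time average lookups, with a negligible chance of collision), and reuses the question's delimiter normalization. It assumes text/CSV input; for Access files, the rows could be fed from the pyodbc cursor in the same way instead of calling fetchall(). normalize, digest and diff_files are hypothetical names, and the sample data is made up.

```python
import hashlib
import os
import re
import tempfile

DELIMS = re.compile(r"[;,|^~]")


def normalize(line: str) -> str:
    """Same normalization as the question: any of ; , | ^ ~ becomes ';' and fields are stripped."""
    return ";".join(field.strip() for field in DELIMS.sub(";", line.strip()).split(";"))


def digest(record: str) -> bytes:
    # A 16-byte hash per row is far smaller than keeping 70+ column strings in memory.
    return hashlib.blake2b(record.encode("utf-8"), digest_size=16).digest()


def diff_files(old_path: str, new_path: str, out_path: str) -> int:
    """Write the header plus every row of new_path not present in old_path; return rows written."""
    seen = set()
    with open(old_path, encoding="utf-8") as old:
        next(old, None)  # skip the header row
        for line in old:
            seen.add(digest(normalize(line)))

    written = 0
    with open(new_path, encoding="utf-8") as new, open(out_path, "w", encoding="utf-8") as out:
        out.write(normalize(next(new, "")) + "\n")  # header of the newer file
        for line in new:
            record = normalize(line)
            key = digest(record)
            if key not in seen:  # set lookup: O(1) on average
                out.write(record + "\n")
                seen.add(key)  # also skips repeated rows within the new file, like set() did
                written += 1
    return written


# Small made-up example with different delimiters in the two files
with tempfile.TemporaryDirectory() as tmp:
    old_p, new_p, out_p = (os.path.join(tmp, n) for n in ("old.txt", "new.csv", "diff.txt"))
    with open(old_p, "w", encoding="utf-8") as fh:
        fh.write("id;name\n1;a\n2;b\n")
    with open(new_p, "w", encoding="utf-8") as fh:
        fh.write("id|name\n1|a\n2|c\n3|d\n3|d\n")
    count = diff_files(old_p, new_p, out_p)
    with open(out_p, encoding="utf-8") as fh:
        result = fh.read()

print(count, repr(result))  # 2 'id;name\n2;c\n3;d\n'
```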

List index out of range error when breaking a while loop in Python

Hi, I am new to Python and struggling my way through. Currently I am working on a task that appends Excel files, and here's my sample code. I'm getting a list index out of range error because, as far as I can tell, the while loop is not breaking at the end of each Excel file. Any help would be appreciated. Thanks:
import xlrd
import glob
import os
import openpyxl
import csv
from xlrd import open_workbook
from os import listdir

row = {}
basedir = '../files/'
files = listdir('../files')
sheets = [filename for filename in files if filename.endswith("xlsx")]
header_is_written = False
for filename in sheets:
    print('Parsing {0}{1}\r'.format(basedir,filename))
    worksheet = open_workbook(basedir+filename).sheet_by_index(0)
    print (worksheet.cell_value(5,6))
    counter = 0
    while True:
        row['plan name'] = worksheet.cell_value(1+counter,1).strip()
        row_values = worksheet.row_slice(counter+1,start_colx=0, end_colx=30)
        row['Dealer'] = int(row_values[0].value)
        row['Name'] = str(row_values[1].value)
        row['City'] = str(row_values[2].value)
        row['State'] = str(row_values[3].value)
        row['Zip Code'] = int(row_values[4].value)
        row['Region'] = str(row_values[5].value)
        row['AOM'] = str(row_values[6].value)
        row['FTS Short Name'] = str(row_values[7].value)
        row['Overall Score'] = float(row_values[8].value)
        row['Overall Rank'] = int(row_values[9].value)
        row['Count of Ros'] = int(row_values[10].value)
        row['Count of PTSS Cases'] = int(row_values[11].value)
        row['% of PTSS cases'] = float(row_values[12].value)
        row['Rank of Cases'] = int(row_values[13].value)
        row['% of Not Prepared'] = float(row_values[14].value)
        row['Rank of Not Prepared'] = int(row_values[15].value)
        row['FFVt Pre Qrt'] = float(row_values[16].value)
        row['Rank of FFVt'] = int(row_values[17].value)
        row['CSI Pre Qrt'] = int(row_values[18].value)
        row['Rank of CSI'] = int(row_values[19].value)
        row['FFVC Pre Qrt'] = float(row_values[20].value)
        row['Rank of FFVc'] = int(row_values[21].value)
        row['OnSite'] = str(row_values[22].value)
        row['% of Onsite'] = str(row_values[23].value)
        row['Not Prepared'] = int(row_values[24].value)
        row['Open'] = str(row_values[25].value)
        row['Cost per Vin Pre Qrt'] = float(row_values[26].value)
        row['Damages per Visit Pre Qrt'] = float(row_values[27].value)
        row['Claim Sub time pre Qrt'] = str(row_values[28].value)
        row['Warranty Index Pre Qrt'] = str(row_values[29].value)
        counter += 1
        if row['plan name'] is None:
            break
        with open('table.csv', 'a',newline='') as f:
            w=csv.DictWriter(f, row.keys())
            if header_is_written is False:
                w.writeheader()
                header_is_written = True
            w.writerow(row)

Use a for loop in place of while True:
row['plan name'] = worksheet.cell_value(1 + counter, 1).strip()
row_values = worksheet.row_slice(counter + 1, start_colx=0, end_colx=30)
for values in row_values:
    row['Dealer'] = int(values.value)
    row['Name'] = str(values.value)
    ....
because while True runs the loop forever, or until it reaches a break statement inside the loop.
Read more about while loops.

A while True loop basically means: execute the following code block forever, unless a break or sys.exit statement gets you out.
So in your case, you need to stop once the rows to append from the Excel file are exhausted. You have two options. The first is to check whether there are more rows to append and break if there are not.
The second, more suitable approach when writing a file is a for loop, which terminates by itself once it is exhausted.
Also, consider reading the content of the Excel file in one operation and saving it to a variable. Then, once you have it, iterate over it and append it to the CSV.
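Part of the reason the break never fires is that the loop tests row['plan name'] is None, but that value comes from .strip() on a string, so it can be an empty string but never None. Meanwhile counter keeps growing until cell_value asks for a row past the end of the sheet, which raises the IndexError. A for loop over the available rows cannot run past the end. The sketch below shows the "collect first, then write once" structure, with plain tuples standing in for sheet rows; rows_to_csv is a hypothetical name, and the two-column layout (Dealer in column 0, plan name in column 1, as in the question) is a simplification of the real 30-column sheet.

```python
import csv
import io


def rows_to_csv(sheet_rows, out) -> int:
    """Collect rows until the first blank 'plan name', then write them in one go.

    sheet_rows: sequence of row tuples, header row first (a stand-in for the sheet's rows).
    """
    records = []
    for values in sheet_rows[1:]:          # a for loop stops by itself when the rows run out
        plan_name = str(values[1]).strip()
        if not plan_name:                  # an empty cell gives '' (not None), so test for falsiness
            break
        records.append({"plan name": plan_name, "Dealer": int(values[0])})

    writer = csv.DictWriter(out, fieldnames=["plan name", "Dealer"])
    writer.writeheader()
    writer.writerows(records)
    return len(records)


buf = io.StringIO()
print(rows_to_csv([("Dealer", "Plan"), (101, " Gold "), (102, "Silver"), ("", "")], buf))  # 2
```

With xlrd, sheet_rows could be built as [worksheet.row_values(r) for r in range(worksheet.nrows)]. Note that xlrd 2.0 and later only read .xls files; for .xlsx, openpyxl's iter_rows(values_only=True) yields equivalent tuples.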
