Why is this program losing track of the count? - python

I have two list variables: descriptions and amounts. These lists are built by appending certain parts of each row of a nested list (transaction_info). Every item in transaction_info is a list that follows this template: (Date, Description, more info, more info, amount, balance). Here is my code:
dates = []
descriptions = []
amounts = []
# transaction_info is a nested list of each row of my excel sheet
transaction_info = select_all(bank_statements)
i = 0
# skips the first row because of headers
for row in transaction_info:
    if i < 1:
        i += 1
    else:
        # breaks down row cell by cell
        for cell in row:
            if cell == row[0]:
                dates.append(cell)
            if cell == row[1]:
                full_desc = str(cell) + str(row[2]) + str(row[3])
                if 'None' in full_desc:
                    none_strip = full_desc.strip('None')
                    descriptions.append(none_strip)
                    i += 1
                elif full_desc in descriptions:
                    descriptions.append(f'{full_desc}{i}')
                    i += 1
                else:
                    descriptions.append(full_desc)
                    i += 1
            if cell == row[4]:
                strcell = str(cell)
                amounts.append(strcell)
                i += 1
For some reason when I run:
for desc in enumerate(descriptions):
    print(desc)
for amnt in enumerate(amounts):
    print(amnt)
they are different lengths. What am I doing wrong? How is it losing count?
For reference, it works fine up until row 61.
I am using openpyxl.
I am expecting the Amounts and Descriptions lists to line up so I can put them back together later.

I figured it out. For whatever reason, if there was a blank cell the code counted the description multiple times and added it to the list unnecessarily. I filled in the empty cells in the Excel sheet I was pulling data from and that fixed the bug. I am going to add some conditionals to account for this.
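A sketch of one such conditional, assuming blank cells come back from openpyxl as None (select_all and the row layout are the ones from the question):

for row in transaction_info[1:]:  # skip the header row
    date, desc, info1, info2, amount = row[0], row[1], row[2], row[3], row[4]
    # guard against blank cells, which openpyxl reads back as None
    if desc is None or amount is None:
        continue  # or log/repair the row instead of skipping it
    dates.append(date)
    full_desc = ''.join(str(part) for part in (desc, info1, info2) if part is not None)
    descriptions.append(full_desc)
    amounts.append(str(amount))

The duplicate-numbering from the original code is left out for brevity; the point is that each pass through the loop appends to all three lists together (or to none of them), so they can no longer drift out of sync.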

Related

How can I write my table of data row after row to Excel with xlsxwriter

I am having an issue iterating rows in xlsxwriter once again. I have tried many combinations but failed. Your help will be much appreciated!
My code:
import xlsxwriter
# browser is an already-initialised selenium webdriver instance

file = open("namelist.txt", "r")
data = []
for d in file:
    url = "file:///C:/Users/k/Desktop/HTML/"
    t = url + d
    data.append(t.strip())
file.close()
# data variable becomes --> ['file:///C:/Users/k/Desktop/HTML/new01.html', 'file:///C:/Users/k/Desktop/HTML/new02.html',
# 'file:///C:/Users/k/Desktop/HTML/new03.html', 'file:///C:/Users/k/Desktop/HTML/new04.html']
workbook = xlsxwriter.Workbook("namelist.xlsx")
worksheet = workbook.add_worksheet("Name list")
# this piece of code writes the header
for headers in data:
    b = data[0]
    browser.get(b)
    header = browser.find_elements_by_xpath("/html/body/table/tbody/tr[1]/th")
    head = []
    for h in header:
        head.append(h.text)
    worksheet.write_row("A1", head)
print("Headers written to namelist.xlsx successfully!")
print("Given name list being written to namelist.xlsx...")
# this part of the code overwrites the tables in the excel file
for i in data:
    browser.get(i)
    trs = browser.find_elements_by_xpath("/html/body/table/tbody/tr")
    for n, tr in enumerate(trs):
        row = [td.text for td in tr.find_elements_by_tag_name("td")]
        print(row)
        worksheet.write_row("A{}".format(1 + n), row)
print("namelist.xlsx is ready!")
workbook.close()
Each link in data[] contains a table of different data in the same format, like the one shown below.
The for loop in my code overwrites the rows, so I can only see one of those tables in Excel;
however, I need it to iterate over all the links and write the rows one after another, like this:
Many thanks in advance!
Apparently you get the values 1, 2, 3... in n in the last loop; make an int variable and increment it in the loop, like:
line = 1
for tr in trs:
    row = [td.text for td in tr.find_elements_by_tag_name("td")]
    print(row)
    worksheet.write_row("A{}".format(line + 1), row)
    line += 1
Thanks very much for the help, it worked! However, it adds a gap on each loop, so there are empty rows in Excel now. I tried changing it to worksheet.write_row("A{}".format(line), row), but that didn't work.
Any idea how to avoid the empty rows on each loop?
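One guess, since the HTML isn't shown: the header <tr> has no <td> children, so row comes back as an empty list, but the counter is still incremented, which leaves a blank spreadsheet row. A sketch (reusing data, browser, and worksheet from the code above) that only advances the counter when a row actually has cells:

line = 2  # row 1 already holds the header
for link in data:
    browser.get(link)
    for tr in browser.find_elements_by_xpath("/html/body/table/tbody/tr"):
        row = [td.text for td in tr.find_elements_by_tag_name("td")]
        if not row:
            # skip <tr> elements with no <td> cells, e.g. the header row
            continue
        worksheet.write_row("A{}".format(line), row)
        line += 1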

I wrote a python script to dedupe a csv and I think it's 90% working. Could really use some help troubleshooting one issue

The code is supposed to find duplicates by comparing FirstName, LastName, and Email. All duplicates should be written to the Dupes.csv file, and all uniques should be written to Deduplicated.csv, but this is currently not happening.
Example:
If row A shows up in Original.csv 10 times, the code writes A1 to deduplicated.csv, and it writes A2 - A10 to dupes.csv.
This is incorrect. A1-A10 should ALL be written to the dupes.csv file, leaving only unique rows in deduplicated.csv.
Another strange behavior is that A2-A10 are all getting written to dupes.csv TWICE!
I would really appreciate any and all feedback as this is my first professional python script and I'm feeling pretty disheartened.
Here is my code:
import csv

def read_csv(filename):
    the_file = open(filename, 'r', encoding='latin1')
    the_reader = csv.reader(the_file, dialect='excel')
    table = []
    # as long as the table row has values we will add it to the table
    for row in the_reader:
        if len(row) > 0:
            table.append(tuple(row))
    the_file.close()
    return table

def create_file(table, filename):
    join_file = open(filename, 'w+', encoding='latin1')
    for row in table:
        line = ""
        # build up the new row - no comma on the last item, so add it separately
        for i in range(len(row) - 1):
            line += row[i] + ","
        line += row[-1]
        # adds the string to the new file
        join_file.write(line + '\n')
    join_file.close()

def main():
    original = read_csv('Contact.csv')
    print('finished read')
    # holds duplicate values
    dupes = []
    # holds all of the values without duplicates
    dedup = set()
    # pairs to know if we have seen a match before
    pairs = set()
    for row in original:
        #if row in dupes:
        #    dupes.append(row)
        if (row[4], row[5], row[19]) in pairs:
            dupes.append(row)
        else:
            pairs.add((row[4], row[5], row[19]))
            dedup.add(row)
    print('finished first parse')
    # go through and add in one more of each duplicate
    seen = set()
    for row in dupes:
        if row in seen:
            continue
        else:
            dupes.append(row)
            seen.add(row)
    print('writing files')
    create_file(dupes, 'duplicate_leads.csv')
    create_file(dedup, 'deduplicated_leads.csv')

if __name__ == '__main__':
    main()
You should look into the pandas module for this; it will be extremely fast and much easier than rolling your own.
import pandas as pd

x = pd.read_csv('Contact.csv')
# use the names of the columns you want to check
duplicates = x.duplicated(['row4', 'row5', 'row19'], keep=False)
x[duplicates].to_csv('duplicates.csv')  # write duplicates
x[~duplicates].to_csv('uniques.csv')  # write uniques
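Two follow-up notes on this: keep=False marks every occurrence of a duplicated row as True, which is exactly the "A1-A10 all go to dupes" behavior the question asks for; and to_csv writes the DataFrame index as an extra first column by default, so you probably want to pass index=False to both calls, e.g. x[duplicates].to_csv('duplicates.csv', index=False).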

Creating individual rows based on a cell value in a column

I am looking to take a CSV file and, using Python 2.7, expand it into individual rows based on the values of two columns, block and lot. My data currently looks like this:
Beginning
Based on the lot value, I want to create extra rows in a new CSV so that the values are drawn out like this:
End Result
So I know I need to read each row and check the cell value of the lot column: if it contains a ",", the row should be copied into the next rows of the other CSV, with all the values before the lot column copied as-is and the first, second, third, etc. lot values placed one per row.
After the commas are separated out, the ranges will be handled in a similar way in a third CSV. If there is a single value, the whole row is copied as-is.
Thank you for the help in advance.
This should work.
On Windows, open the files in binary mode or else you get doubled newlines.
I assumed fields are separated by ';' because the lot cells contain ','.
First split by ',', then check for ranges.
print line is for debugging.
Error checking is left as an exercise for the reader.
Code:
import csv

file_in = csv.reader(open('input.csv', 'rb'), delimiter=';')
file_out = csv.writer(open('output.csv', 'wb'), delimiter=';')
for i, line in enumerate(file_in):
    if i == 0:
        # write header
        file_out.writerow(line)
        print line
        continue
    for j in line[1].split(','):
        if len(j.split('-')) > 1:
            # entries with a range, e.g. 5-7
            start = int(j.split('-')[0])
            end = int(j.split('-')[1])
            for k in xrange(start, end + 1):
                line[1] = k
                file_out.writerow(line)
                print line
        else:
            # plain single values
            line[1] = j
            file_out.writerow(line)
            print line
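To illustrate with hypothetical data: given the assumed ';' delimiter, a data line 12;1,3,5-7 (block 12, lots 1, 3, and 5 through 7) would come out of this script as five rows in output.csv: 12;1, 12;3, 12;5, 12;6, and 12;7, each keeping the rest of the original line's cells unchanged.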

Finding Row with No Empty Strings

I am trying to determine the type of data contained in each column of a .csv file so that I can make CREATE TABLE statements for MySQL. The program makes a list of all the column headers, then grabs the first row of data, determines each value's data type, and appends the type to the column header for proper syntax. For example:
ID  Number  Decimal  Word
0   17      4.8      Joe
That would produce something like CREATE TABLE table_name (ID int, Number int, Decimal float, Word varchar());.
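The type-detection step itself isn't shown in the question; a minimal sketch of how it might look (infer_sql_type is a hypothetical helper name, and varchar(255) is a placeholder size, neither from the original program):

def infer_sql_type(value):
    # try int first, then float, otherwise fall back to varchar
    try:
        int(value)
        return 'int'
    except ValueError:
        pass
    try:
        float(value)
        return 'float'
    except ValueError:
        return 'varchar(255)'

headers = ['ID', 'Number', 'Decimal', 'Word']
sample_row = ['0', '17', '4.8', 'Joe']
columns = ', '.join('{} {}'.format(h, infer_sql_type(v))
                    for h, v in zip(headers, sample_row))
print('CREATE TABLE table_name ({});'.format(columns))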
The problem is that in some of the .csv files the first row contains a NULL value that is read as an empty string, which messes up this process. My goal is to search each row until one is found that contains no NULL values and use that one when forming the statement. This is what I have done so far, except it sometimes still returns rows that contain empty strings:
def notNull(p):  # where p is a .csv file that has been read in another function
    tempCol = next(p)
    tempRow = next(p)
    col = tempCol[:-1]
    row = tempRow[:-1]
    if any('' in row for row in p):
        tempRow = next(p)
        row = tempRow[:-1]
    else:
        rowNN = row
    return rowNN
Note: the .csv file reading is done in a different function; this function simply takes the already-read .csv file as input p. Also, each row ends with a ',' that is treated as an extra empty string, so I slice the last value off each row before checking it for empty strings.
Question: what is wrong with the function I created that causes it to not always return a row without empty strings? I suspect the loop is not repeating as it should, but I am not quite sure how to fix this.
I cannot really decipher your code. This is what I would do to get only the rows without empty strings.
import csv

def g(name):
    with open(name, 'r') as f:
        r = csv.reader(f)
        # skip the header row
        next(r)
        for row in r:
            if '' not in row:
                yield row
for row in g('file.csv'):
    print('row without empty values: {}'.format(row))
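Since g is a generator, next(g('file.csv')) gives you just the first fully populated row, which matches what notNull was meant to return. Also note that if, as described in the question, every row ends with a trailing ',' that reads back as an extra empty string, the check should be '' not in row[:-1] instead, or no row will ever pass.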

Write last three entries per name in a file

I have the following data in a file:
Sarah,10
John,5
Sarah,7
Sarah,8
John,4
Sarah,2
I would like to keep the last three rows for each person. The output would be:
John,5
Sarah,7
Sarah,8
John,4
Sarah,2
In the example, the first row for Sarah was removed since there were three later rows. The rows in the output also maintain the same order as the rows in the input. How can I do this?
Additional Information
You are all amazing - thank you so much. The final code, which seems to have been deleted from this post, is:
import collections

with open("Class2.txt", mode="r", encoding="utf-8") as fp:
    count = collections.defaultdict(int)
    rev = reversed(fp.readlines())
    rev_out = []
    for line in rev:
        name, value = line.split(',')
        if count[name] >= 3:
            continue
        count[name] += 1
        rev_out.append((name, value))
out = list(reversed(rev_out))
print(out)
Since this looks like csv data, use the csv module to read and write it. As you read each line, store the rows grouped by the first column. Store the line number along with each row so the output can be written in the same order as the input. Use a bounded deque to keep only the last three rows for each name. Finally, sort the rows and write them out.
import csv
from collections import defaultdict, deque

# a deque with maxlen=3 keeps only the last three rows appended per name
by_name = defaultdict(lambda: deque(maxlen=3))

with open('my_data.csv') as f_in:
    for i, row in enumerate(csv.reader(f_in)):
        by_name[row[0]].append((i, row))

# sort the surviving rows by line number, then discard the number
rows = [row for _, row in sorted(r for value in by_name.values() for r in value)]

with open('out_data.csv', 'w', newline='') as f_out:
    csv.writer(f_out).writerows(rows)
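The deque(maxlen=3) does the trimming automatically: appending a fourth row for a name silently discards that name's oldest entry, so no explicit "keep only the last three" logic is needed, and the stored line numbers restore the original input order when the rows are sorted back together.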
