python xlrd string match

I couldn't find anything in the API. Is there a way to return the row number or coordinates of a cell based on a string match? For instance: you give the script a string, it scans through the .xls file, and when it finds a cell with the matching string, it returns the coordinates or row number.

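def find_cell(sheet, search_value):
    # Return the (row, column) of the first matching cell, or None.
    for i in range(sheet.nrows):
        row = sheet.row_values(i)
        for j in range(len(row)):
            if row[j] == search_value:
                return i, j
    return None
Something like that... just a basic search.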

You could try the following function (thank you, Joran), which collects every match instead of stopping at the first one:
def look4_xlrd(search_value, sheet):
    lines = []    # row indices of the matches
    columns = []  # column indices of the matches
    for i in range(sheet.nrows):
        row = sheet.row_values(i)
        for j in range(len(row)):
            if row[j] == search_value:
                lines.append(i)
                columns.append(j)
    return lines, columns
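
As a usage sketch, the search only needs `sheet.nrows` and `sheet.row_values(i)`, so it can be tried without a real workbook. `FakeSheet` and `find_all` below are hypothetical names made up for illustration and are not part of xlrd; with a real file you would pass a sheet object obtained from xlrd instead.

```python
class FakeSheet:
    """Hypothetical stand-in exposing the two xlrd sheet members used here."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, i):
        return self._rows[i]


def find_all(sheet, search_value):
    """Return a list of (row, column) pairs whose cell equals search_value."""
    return [
        (i, j)
        for i in range(sheet.nrows)
        for j, value in enumerate(sheet.row_values(i))
        if value == search_value
    ]


sheet = FakeSheet([["id", "name"], [1.0, "spam"], [2.0, "eggs"], [3.0, "spam"]])
matches = find_all(sheet, "spam")
print(matches)  # [(1, 1), (3, 1)]
```

Returning coordinate pairs keeps each row index next to its column index, which may be easier to consume than two parallel lists.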

Related

Comparison of two elements in different array

My problem:
I am trying to compare two elements from two different arrays but the operator is not working.
Code Snippet in question:
for i in range(row_length):
    print(f"ss_record: {ss_record[i]}")
    print(f"row: {row[i + 1]}")
    #THIS IF STATEMENT IS NOT WORKING
    if ss_record[i] == row[i + 1]:
        count += 1
#print()
#print(f"row length: {row_length}")
#print(f"count: {count}")
if count == row_length:
    print(row[0])
    exit(0)
What I have done: I printed the values of ss_record and row just before the if statement, but even when they match, count doesn't increase. I also tried storing the values of row in a new array, but that misbehaves: it stores only the array length and the first 2 values of row, and repeats those values on every later iteration.
What I think the issue is: row is read from a CSV file and its values are not converted to integers. As a result the two values look the same when printed, but one is an integer while the other is a string.
Entire Code:
import csv
import sys
import re
from cs50 import get_string
from sys import argv

def main():
    line_count = 0
    if len(argv) != 3:
        print("missing command-line argument")
        exit(1)
    with open(sys.argv[1], 'r') as database:
        sequence = open(sys.argv[2], 'r')
        string = sequence.read()
        reader = csv.reader(database, delimiter = ',')
        for row in reader:
            if line_count == 0:
                row_length = len(row) - 1
                ss_record = [row_length]
                for i in range(row_length):
                    ss_record.append(ss_count(string, row[i + 1], len(row[i + 1])))
                ss_record.pop(0)
                line_count = 1
            else:
                count = 0
                for i in range(row_length):
                    print(f"ss_record: {ss_record[i]}")
                    print(f"row: {row[i + 1]}")
                    #THIS IF STATEMENT IS NOT WORKING
                    if ss_record[i] == row[i + 1]:
                        count += 1
                if count == row_length:
                    print(row[0])
                    exit(0)

#ss_count means the number of times the substring appears in a row in the string
def ss_count(string, substring, length):
    count = 1
    record = 0
    pos_array = []
    for m in re.finditer(substring, string):
        pos_array.append(m.start())
    for i in range(len(pos_array) - 1):
        if pos_array[i + 1] - pos_array[i] == length:
            count += 1
        else:
            if count > record:
                record = count
            count = 1
    if count > record:
        record = count
    return record

main()
Values to use to reproduce issue:
sequence (this is a text file) = AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG
substring (this is a csv file) =
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
Gist of the CSV file:
The numbers beside each name give how many times each substring (STR, Short Tandem Repeat) appears consecutively in the string (the DNA sequence). In this string, AGATC appears 4 times in a row, AATG appears 1 time, and TATC appears 5 times in a row. This DNA sequence therefore matches Bob, so his name is output as the answer.
You were right: in the comparison ss_record[i] == row[i + 1] there is a type problem. The numbers in ss_record are integers, while the numbers in row are strings. You can confirm this by printing both ss_record and row:
print("ss_record: {}".format(ss_record)) -> ss_record: [4, 1, 5]
print("row: {}".format(row)) -> row: ['Alice', '2', '8', '3']
For the snippet to work, you just need to change the comparison to
ss_record[i] == int(row[i + 1])
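
To see why the original comparison never succeeds, here is a small sketch using values from the question. `counts` and `row` are sample values standing in for `ss_record` and one parsed CSV row.

```python
counts = [4, 1, 5]              # integers, like ss_record
row = ["Bob", "4", "1", "5"]    # csv.reader always yields strings

# An int never equals a str in Python, even if both print the same way.
print(counts[0] == row[1])        # False
print(counts[0] == int(row[1]))   # True

# Converting the whole row once avoids per-element index arithmetic.
print(counts == [int(value) for value in row[1:]])  # True
```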
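That said, I feel the code is quite complex for the task. The string class implements a count method that returns the number of non-overlapping occurrences of a given substring. Also, since the code works on an item-by-item basis and relies heavily on index manipulation, the iteration logic is hard to follow (IMO). Note that count returns the total number of occurrences, not the longest consecutive run, so it matches the numbers in the CSV only when every occurrence of a substring belongs to one contiguous run. Here's my approach to the problem: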
import csv

def match_user(dna_file, user_csv):
    with open(dna_file, 'r') as r:
        dna_seq = r.readline()
    with open(user_csv, 'r') as r:
        reader = csv.reader(r)
        rows = list(reader)
    target_substrings = rows[0][1:]
    users = rows[1:]
    num_matches = [dna_seq.count(target) for target in target_substrings]
    for user in users:
        user_matches = [int(x) for x in user[1:]]
        if user_matches == num_matches:
            return user[0]
    return "Not found"
Happy Coding!
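
Since the question asks for repeats "in a row", a helper that measures the longest back-to-back run may be closer to the task than a total count. The sketch below is one possible way to do it; `longest_run` is a hypothetical name and the short sequence is made-up sample data, not the sequence from the question.

```python
def longest_run(sequence, substring):
    """Return the largest number of back-to-back copies of substring."""
    if not substring:
        return 0  # an empty pattern would otherwise match forever
    longest = 0
    step = len(substring)
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Count how many copies follow each other starting at this position.
        while sequence.startswith(substring, pos):
            run += 1
            pos += step
        longest = max(longest, run)
    return longest


sequence = "AGATCAGATCTTAGATC"
print(sequence.count("AGATC"))         # 3: every occurrence
print(longest_run(sequence, "AGATC"))  # 2: longest consecutive run
```

The two numbers differ whenever a substring also occurs outside its longest run, which is the case where a plain count would pick the wrong person.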

I don't understand this code. I want to split it up

I don't quite understand how this line is written.
The source code is as follows.
line = [cell.value for cell in col if cell.value != None]
I want to understand how this code is written.
I tried to use a loop, but the result was different.
for cell in col:
    if cell.value != None:
        line = cell.value
You are quite close. FYI, the one-line syntax is called a list comprehension. Here is the equivalent.
line = list()
for cell in col:
    if cell.value != None:
        line.append(cell.value)
You keep overwriting the line variable, when it should be a list:
line = []
for cell in col:
    if cell.value != None:
        line.append(cell.value)
As you can see, the one-liner has square brackets around it, so it produces a list.
You are going in the right direction, but here line should be a list, with each value appended to it.
So the code will look like the following:
line = []
for cell in col:
    if cell.value != None:
        line.append(cell.value)
line = [cell.value for cell in col if cell.value != None]
print(line)

line = []
for cell in col:
    if cell.value != None:
        line.append(cell.value)
print(line)

line = list()
for cell in col:
    if cell.value != None:
        line.append(cell.value)
print(line)
Start from an empty list, write the loop as you did, and add each value to the list with append. The print(line) calls are only there so I could check the result; you can ignore them.
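
A minimal runnable sketch of the equivalence, using `SimpleNamespace` objects as hypothetical stand-ins for the spreadsheet cells in `col` (the real cell type is not shown in the question). It also uses `is not None`, which is the idiomatic way to test for `None`.

```python
from types import SimpleNamespace

# Hypothetical column: any objects with a .value attribute will do.
col = [SimpleNamespace(value=v) for v in ("a", None, 3, None, 0)]

# List comprehension: build the list in one expression.
line_comprehension = [cell.value for cell in col if cell.value is not None]

# Equivalent explicit loop: start with an empty list and append.
line_loop = []
for cell in col:
    if cell.value is not None:
        line_loop.append(cell.value)

print(line_comprehension)               # ['a', 3, 0]
print(line_loop == line_comprehension)  # True
```

Note that `0` is kept, because the filter only drops values that are exactly `None`.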

How to find the sum of a certain column in a .txt file in Python?

I have a .txt file with 3 rows and 3 columns of data shown below:
1.5 3.1425 blank
10 12 14
8.2 blank 9.5
I am looking to create a function that lets a user input 1, 2, or 3 and get the sum of the specified column.
The error I receive is as follows:
Traceback (most recent call last):
  File "<pyshell#41>", line 1, in <module>
    summarizer(2)
  File "/Users/"practice.py", line 403, in summarizer
    print(sum(float(col2)))
ValueError: could not convert string to float: '.'
I'm just practicing my indexing and am running into trouble when trying to pick a specific column or row to analyze. I have the following code, but get errors pertaining to my index being out of range, or a float object not being iterable
def summarizer(searchNum):
    infile = open('nums.txt','r')
    fileContents = infile.readlines()
    infile.close
    newList = []
    for numbers in fileContents:
        numVals = numbers.split('\t')
        for i in range(len(numVals)):
            for j in range(0, len(numVals[i])):
                newList+=numVals[i][j]
        col1 = numVals[i][0]
        col2 = numVals[i][1]
        col3 = numVals[i][2]
    if searchNum == 1:
        print (sum(float(col1)))
    elif searchNum == 2:
        print(sum(float(col2)))
    else:
        print(sum(float(col3)))
If a user inputs summarizer(3), I would like the output to be 23.5 since 14+9.5+0= 23.5
I put comments in the script. You can create three lists to collect the values of the corresponding columns, then sum the requested one at the end.
def summarizer(searchNum):
    infile = open('nums.txt','r')
    fileContents = infile.readlines()
    infile.close() #close needs the parentheses to actually be called
    col1, col2, col3 = [], [], [] #initialize the columns
    for numbers in fileContents:
        numVals = numbers.replace('\n','').split('\t') #also remove newline at the end (\n)
        #this assumes a blank cell is an empty string between two tabs
        col1.append(float(numVals[0]) if numVals[0] else 0) #convert to float if not blank else 0 then add to col1
        col2.append(float(numVals[1]) if numVals[1] else 0)
        col3.append(float(numVals[2]) if numVals[2] else 0)
    if searchNum == 1:
        print(sum(col1))
    elif searchNum == 2:
        print(sum(col2))
    else:
        print(sum(col3)) #print the sum of col3
    return
Result:
summarizer(3)
23.5
You need to make sure that the text file is consistently tab-separated. Then append each row to a list, splitting the values on tabs.
Then get rid of 'blank', '\n' and any other non-numeric entries.
Then sum them.
This is how I would do it:
def summarizer(searchNum):
    infile = open('nums.txt','r')
    fileContents = infile.readlines()
    infile.close()
    newList = [] # List of lists. Each inner list is a row
    for line in fileContents:
        newList.append(line.split('\t'))
    # - Blank must be 0. Let's get rid of \n as well
    for i in range(len(newList)):
        for j in range(len(newList[i])):
            if '\n' in newList[i][j]:
                newList[i][j] = newList[i][j].replace('\n', '')
            try:
                newList[i][j] = float(newList[i][j]) # get rid of string entries
            except ValueError:
                newList[i][j] = 0
    total = 0 # named total so the built-in sum is not shadowed
    if searchNum == 1:
        for i in range(len(newList)):
            total += newList[i][0]
    if searchNum == 2:
        for i in range(len(newList)):
            total += newList[i][1]
    if searchNum == 3:
        for i in range(len(newList)):
            total += newList[i][2]
    print(total)
Explanation of the "could not convert string to float: '.'" error:
The col2 variable holds a string that is not a valid number. Here it is '.', because numVals[i][1] takes the second character of a value such as '9.5' rather than the second column; the text "blank" would fail in the same way.
When you apply float to such a string (in our case float(col2)), it raises the error you mentioned.
What your code actually does:
1. It splits each line into values and adds their individual characters to newList.
2. It assigns the first three characters of the last value read to col1, col2 and col3.
3. It applies sum to a single converted value instead of a whole column.
What you were trying to do:
1. Create a 2D array (a list of lists) holding all the elements from the text file.
2. Apply sum to the elements of the chosen column and display the result.
So your code is not actually doing what you wanted it to do.
I have written the code below, which does what you actually intended.
Solution Code
def summarizer(searchNum):
    infile = open('nums.txt','r')
    fileContents = infile.readlines()
    infile.close()
    newList = [[], [], []] # one list per column
    for numbers in fileContents:
        # - replaces the "blank" string with 0 so that every entry
        # - can be converted to float (assumes the file contains the literal text "blank")
        numbers = numbers.replace("blank","0").replace('\n','').split('\t')
        # - fills the per-column lists with the items from your text file
        for i in range(len(numbers)):
            newList[i].append(float(numbers[i]))
    # - prints the sum for the column number you wanted (1, 2 or 3)
    print(sum(newList[searchNum - 1]))
You can do this more easily with the csv library:
https://docs.python.org/2/library/csv.html
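
A minimal sketch of the csv approach, assuming the file is tab-separated and that empty or non-numeric cells (such as the text "blank") should count as 0. `sum_column` is a hypothetical helper name. The sample lines reproduce the data from the question; since `csv.reader` accepts any iterable of lines, an open file object could be passed in the same way.

```python
import csv


def sum_column(lines, column, delimiter="\t"):
    """Sum a 1-based column, counting missing or non-numeric cells as 0."""
    if column < 1:
        raise ValueError("column numbers start at 1")
    total = 0.0
    for row in csv.reader(lines, delimiter=delimiter):
        try:
            total += float(row[column - 1])
        except (IndexError, ValueError):
            # Short row, empty cell, or text such as "blank": treat as 0.
            pass
    return total


# Sample data from the question, one string per line of the file.
sample = ["1.5\t3.1425\tblank\n", "10\t12\t14\n", "8.2\tblank\t9.5\n"]
print(sum_column(sample, 3))  # 23.5
```

Catching the conversion error per cell keeps the function working whether a blank is an empty field or a placeholder word.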

Efficiently update columns based on one of the columns split value

So here is my code, which updates several columns based on the split values of the 'location' column. The code works, but because it iterates row by row it is not efficient enough. Can anyone help me make it faster, please?
for index, row in df.iterrows():
    print(index)
    location_split = row['location'].split(':')
    after_county = False
    after_province = False
    for l in location_split:
        # df.at[row, column] sets a single cell; df[index, 'name'] would create a new column instead
        if l.strip().endswith('ED'):
            df.at[index, 'electoral_district'] = l
        elif l.strip().startswith('County'):
            df.at[index, 'county'] = l
            after_county = True
        elif after_province == True:
            if l.strip() != 'Ireland':
                df.at[index, 'dublin_postal_district'] = l
        elif after_county == True:
            df.at[index, 'province'] = l.strip()
            after_province = True
'map' was what I needed :)
def fill_county(column):
    res = ''
    location_split = column.split(':')
    for l in location_split:
        if l.strip().startswith('County'):
            res = l.strip()
            break
    return res
# Series.map applies the function to every value; the built-in map returns a lazy iterator in Python 3
df['county'] = df['location'].map(fill_county)
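
The same idea can be extended to all four columns with one function that returns a dict per location. The sketch below is plain Python so it can be tried without pandas; `parse_location` is a hypothetical name, the example string is made up (the real data format is not shown in the question), and unlike the original loop it strips whitespace from every stored value.

```python
def parse_location(location):
    """Split a colon-separated location into the four fields from the question."""
    fields = {
        "electoral_district": "",
        "county": "",
        "province": "",
        "dublin_postal_district": "",
    }
    after_county = False
    after_province = False
    for part in location.split(":"):
        part = part.strip()
        if part.endswith("ED"):
            fields["electoral_district"] = part
        elif part.startswith("County"):
            fields["county"] = part
            after_county = True
        elif after_province:
            # Anything after the province is a postal district, except the country.
            if part != "Ireland":
                fields["dublin_postal_district"] = part
        elif after_county:
            fields["province"] = part
            after_province = True
    return fields


# Made-up location string for illustration only.
example = "Rathmines ED: County Dublin: Leinster: Dublin 6: Ireland"
print(parse_location(example))
```

With pandas, one way to apply it would be `pd.DataFrame(df['location'].map(parse_location).tolist(), index=df.index)`, which should give a frame with one column per field that can then be joined back onto `df`.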

How to find row number of a particular value with known column number in csv file through python

I am working on a problem to create a function find_row with three input parameters - file name, col_number and value. I want output like in given example:
For example, if we have a file a.csv:
1, 1.1, 1.2
2, 2.1, 2.2
3
4, 4.1, 4.2
then
print(find_row('a.csv', 0, 4)) would print 3,
print(find_row('a.csv', 2, 2.2)) would print 1, and
print(find_row('a.csv', 0, 100)) would print None.
The code I tried is :
import csv
def find_row(filename,col_number,value):
    var = str(value)
    coln = str(col_number)
    o = open(filename, 'r')
    myData = csv.reader(o)
    index = 0
    for row in myData:
        if row[col_number] == var:
            return index
        else :
            index+=1

print(find_row('a.csv',2,2.2))
It throws this error:
  File "C:/Users/ROHIT SHARMA/Desktop/1.py", line 17, in find_row
    if row[col_number] == var:
IndexError: list index out of range
I understand the error now, but I am not able to fix the code. Any help here guys??!
Thanks.
In your CSV file, the 3rd row has only one column, so 2 is not a valid index.
As an aside, it's cleaner to do
for index, row in enumerate(myData):
    if row[col_number] == var:
        return index
Edit: Also, that CSV is going to give you problems. It can't find '2.2' because the reader actually returns ' 2.2', with a leading space. Strip the spaces when you read, or make sure the CSV is saved the "correct" way (no spaces between comma and content).
Edit2: If you MUST have a CSV with unequal rows, this will do the trick:
for index, row in enumerate(myData):
    try:
        if row[col_number] == var:
            return index
    except IndexError:
        pass
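
Putting the three points together (enumerate, stripping spaces, tolerating short rows), a complete version might look like the sketch below. It is one possible arrangement, not the answerer's exact code: it checks the row length instead of catching IndexError, and it relies on the csv module's `skipinitialspace` option to drop the blank after each comma. The sample file from the question is recreated in a temporary directory so the example runs on its own.

```python
import csv
import os
import tempfile


def find_row(filename, col_number, value):
    """Return the 0-based index of the first row whose column equals value."""
    target = str(value)
    with open(filename, newline="") as handle:
        # skipinitialspace drops the blank that follows each comma.
        reader = csv.reader(handle, skipinitialspace=True)
        for index, row in enumerate(reader):
            # Short rows simply have no such column, so skip them.
            if col_number < len(row) and row[col_number].strip() == target:
                return index
    return None


# Recreate the a.csv sample from the question.
sample_path = os.path.join(tempfile.mkdtemp(), "a.csv")
with open(sample_path, "w", newline="") as handle:
    handle.write("1, 1.1, 1.2\n2, 2.1, 2.2\n3\n4, 4.1, 4.2\n")

print(find_row(sample_path, 0, 4))    # 3
print(find_row(sample_path, 2, 2.2))  # 1
print(find_row(sample_path, 0, 100))  # None
```

The comparison is done on strings, so a cell written as `2.20` would not match the value `2.2`; converting both sides to float would be needed if that matters.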
