xlrd cell value returns error - python

I have a spreadsheet with the below structure (Data starts from Column B. Col A is empty)
A   B            C           D
    Name         city        salary
    Jennifer     Boston      100
    Andrew       Pittsburgh  1000
    Sarah        LA          100
    Grand Total              1200
I need to filter out the row with the grand total before loading it into the database.
For this, I'm reading the Grand Total as:
import xlrd
import pymssql
#open workbook
book = xlrd.open_workbook(r"C:\_Workspace\Test\MM.xls")
print( "The number of worksheets is", book.nsheets)
#for each row in xls file loop
#skip last row
last_row = curr_sheet.nrows
print(last_row)
print(curr_sheet.ncols)
skip_val = curr_sheet.cell(last_row,1).value
print( skip_val)
if skip_val == "Grand Total":
    last_row = last_row - 1
else:
    last_row = last_row
for rx in range(last_row):
    print( curr_sheet.row(rx))
However, I'm getting the below error:
Traceback (most recent call last):
  File "C:\_Workspace\Test\xldb.py", line 26, in <module>
    skip_val = curr_sheet.cell(last_row,1).value
  File "c:\Python34\lib\site-packages\xlrd-0.9.3-py3.4.egg\xlrd\sheet.py", line 399, in cell
    self._cell_types[rowx][colx],
IndexError: list index out of range
I'm not able to figure out what is wrong with the code above. Hoping someone here can spot why it's throwing the error.
Thanks much in advance,
Bee

I think the problem is that you are not accounting for zero-based indexing. curr_sheet.nrows returns the number of rows in the worksheet, so the last row sits at index nrows - 1, and accessing it requires:
skip_val = curr_sheet.cell_value(last_row-1, 1)
Python indexes from 0, so the first element of a list mylist is mylist[0]. The last element is not mylist[len(mylist)] but mylist[len(mylist)-1], which is more idiomatically written as mylist[-1]. You can therefore write the following:
skip_val = curr_sheet.cell_value(-1, 1)
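To see the off-by-one in isolation, here is a minimal sketch that works on a plain list of rows rather than a real workbook. The helper name rows_to_load and the sample data are made up for illustration; with xlrd you could probably build the same list with [sheet.row_values(i) for i in range(sheet.nrows)].

```python
def rows_to_load(rows, label_col=1, total_label="Grand Total"):
    """Return the rows to load, dropping a trailing total row if present."""
    if not rows:
        return []
    # len(rows) is a count, so the last valid index is len(rows) - 1.
    # Indexing with -1 reaches the same row without the arithmetic.
    if rows[-1][label_col] == total_label:
        return rows[:-1]
    return list(rows)


# Hypothetical data mirroring the spreadsheet in the question
# (column A is empty, so the label lives in column index 1).
sheet_rows = [
    ["", "Name", "city", "salary"],
    ["", "Jennifer", "Boston", 100],
    ["", "Andrew", "Pittsburgh", 1000],
    ["", "Sarah", "LA", 100],
    ["", "Grand Total", "", 1200],
]

for row in rows_to_load(sheet_rows):
    print(row)
```

Slicing with rows[:-1] avoids tracking a separate last_row counter, which is where the original index error crept in.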

Related

How to iterate through number of rows in csv file and append the values according to that

I need a little help.
Csv file sample :
ID, name, age, city1,city2,city3
1, Andy, 25, "Ann Arbor,NA","CA,NA","MS,NA"
2, Bella, 40, "Los Angeles,NA"
3, Cathy, 13, "Eureka,NA","NV,NA"
My current code :
from csv import DictReader

name=[]
age=[]
with open ('File_name','r') as f:
    reader=DictReader(f)
    for row in reader:
        name.append(row['name'])#append all names
        age.append(row['age'])
Now I need to print the cities. There is no certainty that a person will have lived in only 3 cities; when the data source is updated, there might be more city columns. So my idea is to build the column names in a loop.
Method I tried:
The first 3 columns (ID, name, age) are fixed and will not change.
ID=2 #user requirement
name=[]
age=[]
cities=[]
with open ('File_name','r') as f:
    reader=DictReader(f)
    for row in reader:
        if ID == row['ID']:
            name.append(row['name'])#append all names
            age.append(row['age'])
            Def_Fil=len(row)-3
            for i in range(Def_Fil):
                city=city.append(row['city+str(i)']) # I don't know how to build the column name here. I can print name and age, but the number of city columns varies, so I can't list them by hand.
print(name,age,city)
But I am getting "SyntaxError: cannot assign to operator"
My expected output:
when i print city of ID 3 : ["Eureka,NA","NV,NA"]
ID 2 : ["Los Angeles,NA"]
ID 1 : ["Ann Arbor,NA","CA,NA","MS,NA"]
If you are not forced to use DictReader you can use pandas instead:
import pandas as pd
csv = pd.read_csv('data.csv', delimiter=',', skipinitialspace=True)

def getCities(ID):
    # get the row with the given ID
    row = csv[csv['ID'] == ID]
    # get the cities columns (all columns but the first 3 'ID, name, age')
    cities = row.iloc[:, 3:].values.tolist()[0]
    # convert to a list of strings and remove nan values
    re = [str(x) for x in cities if str(x) != 'nan']
    return re

print(getCities(3))
print(getCities(2))
print(getCities(1))
This gives you:
['Eureka,NA', 'NV,NA']
['Los Angeles,NA']
['Ann Arbor,NA', 'CA,NA', 'MS,NA']
Your dataframe looks like this:
print(csv)
   ID   name  age           city1  city2  city3
0   1   Andy   25    Ann Arbor,NA  CA,NA  MS,NA
1   2  Bella   40  Los Angeles,NA    NaN    NaN
2   3  Cathy   13       Eureka,NA  NV,NA    NaN
If you want to access all ages or names:
print(csv['age'].values.tolist())
print(csv['name'].values.tolist())
[25, 40, 13]
['Andy', 'Bella', 'Cathy']
If you want to get the age of a person with a specific ID or Name
print(csv[csv['ID'] == 1]['age'].values.tolist()[0])
print(csv[csv['name'] == 'Bella']['age'].values.tolist()[0])
25
40
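If you do have to stay with DictReader, here is a minimal sketch of the same lookup. It assumes every city column name starts with the prefix city, and it reads the sample with skipinitialspace=True so that the quoted fields after ", " parse as single values. The names get_cities and SAMPLE are illustrative and not part of the original code.

```python
import csv
import io

# The sample from the question, kept in memory so the sketch is self-contained.
SAMPLE = '''ID, name, age, city1,city2,city3
1, Andy, 25, "Ann Arbor,NA","CA,NA","MS,NA"
2, Bella, 40, "Los Angeles,NA"
3, Cathy, 13, "Eureka,NA","NV,NA"
'''


def get_cities(csv_file, wanted_id):
    """Return the non-empty city values for the row whose ID matches."""
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    for row in reader:
        # DictReader yields strings, so compare against str(wanted_id).
        if row["ID"] == str(wanted_id):
            return [
                value
                for key, value in row.items()
                # Short rows are padded with None; surplus fields land under
                # the key None. Both cases are filtered out here.
                if key is not None and key.startswith("city") and value
            ]
    return []


for wanted in (3, 2, 1):
    print(get_cities(io.StringIO(SAMPLE), wanted))
```

Because the columns are picked by prefix instead of by count, adding city4 or city5 to the file later should not need a code change.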

ValueError error in Python code when reading from CSV file

Hello, I am supposed to do the steps below. I have finished, but I am getting this error:
  File "C:/Users/User/Desktop/question2.py", line 37, in <module>
    jobtype_salary[li['job']] = int(li['salary'])
ValueError: invalid literal for int() with base 10: 'SECRETARY'
a. Read the file into a list of lists (14 rows, 5 columns)
b. Transform each row of the list into a dictionary. The keys are : ename, job, salary, comm, dno. Call the resulting list of dictionaries dict_of_emp
c. Display the table dict_of_emp, one row per line
d. Perform the following computations on dict_of_emp:
D1. Compute and print the incomes of Richard and Mary (add salary and comm)
D2. Compute and display the sum of salaries paid for each type of job (e.g. the salary paid to analysts is 3500 + 3500 = 7000)
D3. Add 5000 to the salaries of employees in department 30. Display the new table
import csv
#Open the file in read mode
f = open("employeeData.csv",'r')
reader = csv.reader(f)
#To read the file into list of lists we use list() method
emps = list(reader)
#print(emps)
#Transform each row into a dictionary.
dict_of_emp = [] #list of dictionaries
for row in emps:
    d={}
    d['ename'] = row[0]
    d['job'] = row[1]
    d['salary']=row[2]
    d['comm']=row[3]
    d['dno']=row[4]
    dict_of_emp.append(d)
print("*************************************************")
#display the table dict_of_emp, one row per line.
for li in dict_of_emp:
    print(li)
print("*************************************************")
#Incomes of Richard and Mary, to add salary and commission, first we need to cast them to integers.
d1 = ['RICHARD','MARY']
for li in dict_of_emp:
    if li['ename'] in d1:
        print('income of ', li['ename']," is ",int(li['salary']+li['comm']))
print("*************************************************")
#Sum of salaries based on type of job, dictionary is used so the job type is key
#and sum of salary is value
jobtype_salary = {}
for li in dict_of_emp:
    if li['job'] in jobtype_salary.keys():
        jobtype_salary[li['job']] += int(li['salary'])
    else:
        jobtype_salary[li['job']] = int(li['salary'])
print(jobtype_salary)
print("*************************************************")
#Add 5000 to salaries of employees in department 30.
for li in dict_of_emp:
    if li['dno']=='30':
        li['salary']=int(li['salary'])+5000
for li in dict_of_emp:
    print(li)
Here is the csv as an image:
I think the indexing of your columns is slightly off. You do d['salary'] = row[2], which, according to the CSV, corresponds to the third column, i.e. the job title of the person (SECRETARY, SALESPERSON). If you then try to convert this string to an integer, you get the error.
Does it run with this instead?
for row in emps:
    d={}
    d['ename'] = row[1]
    d['job'] = row[2]
    d['salary']=row[3]
    d['comm']=row[4]
    d['dno']=row[5]
    dict_of_emp.append(d)
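Once the columns line up, it may also help to convert salary and comm to integers one at a time while loading, because adding the two strings first would concatenate them instead of summing. The sketch below runs on made-up sample rows, since the real file is only shown as an image. It assumes a leading extra column before ename and no header row; load_employees, KEYS and SAMPLE are illustrative names.

```python
import csv
import io

# Hypothetical rows: a leading extra column, then ename, job, salary, comm, dno.
SAMPLE = """1,RICHARD,ANALYST,3500,0,20
2,MARY,SECRETARY,1200,300,30
3,JOHN,ANALYST,3500,0,20
"""

KEYS = ["ename", "job", "salary", "comm", "dno"]


def load_employees(csv_file):
    """Build one dict per row, converting the numeric fields up front."""
    employees = []
    for row in csv.reader(csv_file):
        emp = dict(zip(KEYS, row[1:6]))       # skip the leading column
        emp["salary"] = int(emp["salary"])
        emp["comm"] = int(emp["comm"] or 0)   # treat an empty comm as 0
        employees.append(emp)
    return employees


employees = load_employees(io.StringIO(SAMPLE))

# Sum of salaries per job type; dict.get supplies 0 for a job not seen yet.
salary_by_job = {}
for emp in employees:
    salary_by_job[emp["job"]] = salary_by_job.get(emp["job"], 0) + emp["salary"]

print(salary_by_job)
```

If your file does have a header row, you would need to skip it (for example with next(reader)) before the loop, otherwise int() fails on the column titles.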

How to identify string repetition throughout rows of a column in a Pandas DataFrame?

I'm trying to think of a way to best handle this. If I have a data frame like this:
Module   | Line Item   | Formula                                         | repetition? | What repeated                                | Where repeated
Module 1 | Line Item 1 | hello[SUM: hello2]                              | yes         | hello[SUM: hello2]                           | Module 1 Line item 2
Module 1 | Line Item 2 | goodbye[LOOKUP: blue123] + hello[SUM: hello2]   | yes         | hello[SUM: hello2], goodbye[LOOKUP: blue123] | Module 1 Line item 1, Module 2 Line Item 1
Module 2 | Line Item 1 | goodbye[LOOKUP: blue123] + some other line item | yes         | goodbye[LOOKUP: blue123]                     | Module 1 Line item 2
How would I go about setting up a search and find to locate and identify repetition in the middle or on edges or complete strings?
Sorry the formatting looks bad
Basically I have the module, line item, and formula columns filled in, but I need to figure out some sort of search function that I can apply to each of the last 3 columns. I'm not sure where to start with this.
I want to match any repetition of 3 or more words. For example, if a formula was 1 + 2 + 3 + 4 and that occurred 4 times in the Formula column, I'd want to put yes in the boolean "repetition?" column, return 1 + 2 + 3 + 4 in the "What repeated" column, and list every module/line item combination where it occurred in the last column. I'm sure I can tweak it more to fit my needs once I get started.
This one is a bit messy, and there is surely a more straightforward way to do some of the steps, but it works for your data.
Step 1: I just reset_index() (assuming index uses row numbers) to get row numbers into a column.
df.reset_index(inplace=True)
I then wrote a for loop that checks, for each given value, whether that value appears anywhere else in the Formula column (using the .str.contains() function) and, if so, where. It stores that information in a dictionary. Note that I split each formula on + to get the values to search for, as that looked to be a valid separator in your dataset, but you can adjust this accordingly.
#the dictionary will have a key containing row number and the value we searched for
#the value will contain the module and line item values
result = {}
#create a rownumber variable so we know where in the dataset we are
rownumber = -1
#now we just iterate over every row of the Formula series
for row in df['Formula']:
    rownumber +=1
    #and also every relevant value within that cell
    for value in row.split('+'):
        #we clean the value from trailing/preceding whitespace
        value = value.strip()
        #and then we return our key and value and update our dictionary
        key = 'row:|:'+str(rownumber)+':|:'+value
        value = (df.loc[((df.Formula.str.contains(value,regex=False))) & (df.index!=rownumber),['Module','Line Item']])
        result.update({key:value})
We can now unpack the dictionary into lists, keeping only the entries where we had a match:
where_raw = []
what_raw = []
rows_raw = []
for key,value in result.items():
    #skip searches that matched no other row
    if value.empty:
        continue
    where_raw.append(list(value['Module']+' '+value['Line Item']))
    what_raw.append(key.split(':|:')[2])
    rows_raw.append(int(key.split(':|:')[1]))
tempdf = pd.DataFrame({'row':rows_raw,'where':where_raw,'what':what_raw})
tempdf now contains one row per match. However, we want one row per original row in the df, so we combine all matches for each main row into one:
where = []
what = []
rows = []
for row in tempdf.row.unique():
    where.append(list(tempdf.loc[tempdf.row==row,'where']))
    what.append(list(tempdf.loc[tempdf.row==row,'what']))
    rows.append(row)
Now we can get the result by merging these lists back into our original dataframe:
result = df.merge(pd.DataFrame({'index':rows,'where':where,'what':what}),how='left',on='index').drop('index',axis=1)
And lastly we can add the repeated column. Rows without any match get NaN in the what column after the left merge, so we test for that:
result['repeated'] = result['what'].notna()
print(result)
Module    Line Item    Formula                                          what                                                where
Module 1  Line Item 1  hello[SUM: hello2]                               ['hello[SUM: hello2]']                              [['Module 1 Line Item 2']]
Module 1  Line Item 2  goodbye[LOOKUP: blue123] + hello[SUM: hello2]    ['goodbye[LOOKUP: blue123]', 'hello[SUM: hello2]']  [['Module 2 Line Item 1'], ['Module 1 Line Item 1']]
Module 2  Line Item 1  goodbye[LOOKUP: blue123] + some other line item  ['goodbye[LOOKUP: blue123]']                        [['Module 1 Line Item 2']]
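For comparison, here is a shorter sketch of the same idea that first builds a lookup from each term to the rows using it. Note that it is a swapped-in technique: it matches whole terms after splitting on +, not substrings as .str.contains() does above, so it would miss a term that only appears inside a longer one. The dataframe is rebuilt from the question's sample, and split_terms and rows_by_term are illustrative names.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "Module": ["Module 1", "Module 1", "Module 2"],
    "Line Item": ["Line Item 1", "Line Item 2", "Line Item 1"],
    "Formula": [
        "hello[SUM: hello2]",
        "goodbye[LOOKUP: blue123] + hello[SUM: hello2]",
        "goodbye[LOOKUP: blue123] + some other line item",
    ],
})


def split_terms(formula):
    """Split a formula on '+' and strip whitespace from each term."""
    return [term.strip() for term in formula.split("+")]


locations = df["Module"] + " " + df["Line Item"]

# Map every term to the list of rows (by location label) that use it.
rows_by_term = defaultdict(list)
for location, formula in zip(locations, df["Formula"]):
    for term in split_terms(formula):
        rows_by_term[term].append(location)

# For each row, keep the terms that also occur in some other row.
what_col, where_col = [], []
for location, formula in zip(locations, df["Formula"]):
    what, where = [], []
    for term in split_terms(formula):
        others = [loc for loc in rows_by_term[term] if loc != location]
        if others:
            what.append(term)
            where.append(others)
    what_col.append(what)
    where_col.append(where)

result = df.copy()
result["what"] = pd.Series(what_col, index=df.index)
result["where"] = pd.Series(where_col, index=df.index)
result["repeated"] = [bool(what) for what in what_col]
print(result)
```

The lookup is built in a single pass over the column, so each term is compared against the dictionary instead of triggering a fresh scan of every formula.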

Searching next item in list if object isn't in the list

I'm attempting to learn how to search csv files. In this example, I've worked out how to search a specific column (date of birth) and how to search indexes within that column to get the year of birth.
I can search for greater than a specific year - e.g. typing in 45 will give me everyone born in or after 1945, but the bit I'm stuck on is if I type in a year not specifically in the csv/list I will get an error saying the year isn't in the list (which it isn't).
What I'd like to do is iterate through the years in the column until the next year that is in the list is found and print anything greater than that.
I've tried a few bits with iteration, but my brain has finally ground to a halt. Here is my code so far...
import csv

data=[]
with open("users.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
print(data)
lookup = input("Please enter a year of birth to start at (eg 67): ")
#lookupint = int(lookup)
#searching column 3 eg [3]
#but also searching index 6-8 in column 3
#eg [6:8] being the year of birth within the DOB field
col3 = [x[3][6:8] for x in data]
#just to check if col3 is showing the right data
print(col3)
print ("test3")
#looks in column 3 for 'lookup' which is a string
#in the table
if lookup in col3: #can get rid of this
    output = col3.index(lookup)
    print (col3.index(lookup))
    print("test2")
for k in range (0, len(col3)):
    #looks for data that is equal or greater than YOB
    if col3[k] >= lookup:
        print(data[k])
Thanks in advance!
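One possible direction, sketched under assumptions: if the goal is everyone born in or after the entered year, the year does not have to be present in the list at all, because a numeric comparison per row already skips ahead to the next year that exists. The sketch assumes the date of birth is in column 3 in a DD/MM/YY layout (which is what the [6:8] slice suggests) and that there is no header row; the sample rows and the function name born_in_or_after are made up.

```python
def born_in_or_after(rows, lookup, dob_col=3):
    """Return the rows whose two-digit birth year is >= the year entered."""
    threshold = int(lookup)
    # Comparing integers means the entered year need not appear in the data.
    return [row for row in rows if int(row[dob_col][6:8]) >= threshold]


# Hypothetical rows standing in for users.csv.
rows = [
    ["1", "Ann", "Smith", "01/02/45"],
    ["2", "Bob", "Jones", "15/07/67"],
    ["3", "Cy", "Brown", "30/11/72"],
]

# 50 is not a birth year in the data; the filter still returns 67 and 72.
for row in born_in_or_after(rows, "50"):
    print(row)
```

This drops the col3.index(lookup) step completely, which is the call that fails when the year is missing.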

Python 2.7 - xlrd - Matching A String To a Cell Value

Using Python 2.7 on Mac OSX Lion with xlrd
My problem is relatively simple and straightforward. I'm trying to match a string to an Excel cell value, in order to ensure that the other data in the matched row is the correct value.
So, say for instance that player = 'Andrea Bargnani' and I want to match a row that looks like this:
Draft Player Team
1 Andrea Bargnani - Toronto Raptors
I do:
num_rows = draftSheet.nrows - 1
cur_row = -1
while cur_row < num_rows:
    cur_row += 1
    row = draftSheet.row(cur_row)
    if row[1] == player:
        ranking == row[0]
The problem is that the value of row[1] is text:u'Andrea Bargnani', as opposed to just Andrea Bargnani.
I know that Excel, after Excel 97, is all Unicode. But even if I do player = u'Andrea Bargnani' there is still the preceding text:. So I tried player = 'text:'u'Andrea Bargnani', but when the variable is called it ends up looking like u'text: Andrea Bargnani' and still does not produce a match.
I would like to then just strip the text:u' off of the returned row[1] value in order to get an appropriate match.
You need to get a value from the cell.
I've created a sample Excel file with the text "Andrea Bargnani" in the A1 cell. Here is code showing the difference between printing the cell and printing its value:
import xlrd
book = xlrd.open_workbook("input.xls")
sheet = book.sheet_by_index(0)
print sheet.cell(0, 0) # prints text:u'Andrea Bargnani'
print sheet.cell(0, 0).value # prints Andrea Bargnani
Hope that helps.
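Applied to the loop in the question, the same idea can be sketched on plain row values. This is only an illustration: find_ranking and the sample rows are made up, and with xlrd the rows could probably come from (draftSheet.row_values(i) for i in range(draftSheet.nrows)), which yields values rather than Cell objects. It also assigns the ranking rather than comparing it with ==.

```python
def find_ranking(rows, player, name_col=1, rank_col=0):
    """Return the rank_col value of the first row whose name_col equals player."""
    for row in rows:
        # Plain values are compared here, so no "text:" prefix is involved.
        if row[name_col] == player:
            return row[rank_col]
    return None


# Hypothetical values as a spreadsheet reader might return them
# (numeric cells usually come back as floats).
rows = [
    [u"Draft", u"Player", u"Team"],
    [1.0, u"Andrea Bargnani", u"Toronto Raptors"],
]

ranking = find_ranking(rows, u"Andrea Bargnani")
print(ranking)
```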
