Check OrderedDict based on index in csv? - python

I'm trying to do different checks based on the index number of a dictionary. So, if index == 0, do something, otherwise if index>0, do something else.
I was trying to use OrderedDict and index it based on items(). But od.items()[0] just gives me the name of the first element; it does not let me write an if condition based on whether the first element has already been checked.
Also I would prefer not to check my conditional based on the actual value in the example.csv file, since it will change daily.
Here is my code and example data in the csv file.
Example.csv
Key_abc, Value894
Key_xyz, Value256
Key_hju, Value_567
Code:
with open('example.csv','rb') as f:
    r = csv.reader(f)
    od = collections.OrderedDict(r)
for row in od:
    if od.items() == 0:
        print 'do some checks and run code'
        print row, od[row]
    elif od.items() > 0:
        print 'go through code without checks'
        print row, od[row]

Maybe you could do something like this, using enumerate to get the position of each key. (The example below is written in Python 3 syntax.)
#ExampleCode
import collections
import csv
with open('example.csv', newline='') as f:
    r = csv.reader(f)
    od = collections.OrderedDict(r)
# enumerate yields (0, first_key), (1, second_key), ...
for index, row in enumerate(od):
    if index == 0:
        print('do some checks and run code')
        print(row, od[row])
    elif index > 0:
        print('go through code without checks')
        print(row, od[row])
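For a version that runs without a file on disk, here is a minimal Python 3 sketch. The in-memory sample data and the function name label_rows are assumptions made for illustration. The point is only that enumerate supplies the position, so the check never depends on the actual keys or values in the file.

```python
import collections
import csv
import io

# Sample data standing in for example.csv (assumed for illustration).
SAMPLE = "Key_abc,Value894\nKey_xyz,Value256\nKey_hju,Value_567\n"

def label_rows(text):
    """Return (index, key, value, action) for each row, in file order."""
    reader = csv.reader(io.StringIO(text))
    od = collections.OrderedDict(reader)
    results = []
    for index, (key, value) in enumerate(od.items()):
        # Only the first entry gets the extra checks.
        action = 'checks' if index == 0 else 'no checks'
        results.append((index, key, value, action))
    return results

for entry in label_rows(SAMPLE):
    print(entry)
```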

Related

Dataframe Is No Longer Accessible

I am trying to make my code look better and create functions that do all the work from running just one line, but it is not working as intended. I am currently pulling data from a table in a pdf into a pandas dataframe. From there I have 4 functions, each calling the next, with the last one returning the updated dataframe. I can see that it is fully updated when I print it in the last function. However, I am unable to access and use that updated dataframe, even after I return it.
My code is as follows
def data_cleaner(dataFrame):
    #removing random rows
    removed = dataFrame.drop(columns=['Unnamed: 1','Unnamed: 2','Unnamed: 4','Unnamed: 5','Unnamed: 7','Unnamed: 9','Unnamed: 11','Unnamed: 13','Unnamed: 15','Unnamed: 17','Unnamed: 19'])
    #call next method
    col_combiner(removed)
def col_combiner(dataFrame):
    #Grabbing first and second row of table to combine
    first_row = dataFrame.iloc[0]
    second_row = dataFrame.iloc[1]
    #List to combine columns
    newColNames = []
    #Run through each row and combine them into one name
    for i,j in zip(first_row,second_row):
        #Check to see if they are not strings, if they are not convert it
        if not isinstance(i,str):
            i = str(i)
        if not isinstance(j,str):
            j = str(j)
        newString = ''
        #Check for double NAN case and change it to Expenses
        if i == 'nan' and j == 'nan':
            i = 'Expenses'
            newString = newString + i
        #Check for leading NAN and remove it
        elif i == 'nan':
            newString = newString + j
        else:
            newString = newString + i + ' ' + j
        newColNames.append(newString)
    #Now update the dataframes column names
    dataFrame.columns = newColNames
    #Remove the name rows since they are now the column names
    dataFrame = dataFrame.iloc[2:,:]
    #Going to clean the values in the DF
    clean_numbers(dataFrame)
def clean_numbers(dataFrame):
    #Fill NAN values with 0
    noNan = dataFrame.fillna(0)
    #Pull each column, clean the values, then put it back
    for i in range(noNan.shape[1]):
        colList = noNan.iloc[:,i].tolist()
        #calling to clean the column so that it is all ints
        col_checker(colList)
        noNan.iloc[:,i] = colList
    return noNan
def col_checker(col):
    #Going through, checking and cleaning
    for i in range(len(col)):
        #print(type(colList[i]))
        if isinstance(col[i],str):
            col[i] = col[i].replace(',','')
            if col[i].isdigit():
                #print('not here')
                col[i] = int(col[i])
            #If it is not a number then make it 0
            else:
                col[i] = 0
Then when I run this:
doesThisWork = data_cleaner(cleaner)
type(doesThisWork)
I get NoneType. I might be doing this the long way as I am new to this, so any advice is much appreciated!
The reason you are getting NoneType is that your function data_cleaner does not have a return statement, which means that when it finishes executing it automatically returns None. It is the return value of a function that gets assigned to the variable var in a statement like this:
var = fun(x)
Now, a different thing entirely is whether or not your dataframe cleaner will be changed by the function data_cleaner, which can happen because dataframes are mutable objects in Python.
In other words, your function can read your dataframe and change it, so that after the function call cleaner is different than before. At the same time, your function can return a value (which it currently doesn't), and this value will be assigned to doesThisWork.
Usually, you should prefer that a function does only one thing, so having a function both change its argument and return a value is usually bad practice.
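The effect of a missing return in a chain of calls can be reproduced without pandas. The following is a minimal sketch with hypothetical function names (add_one, outer_without_return, outer_with_return) that are not part of the question's code:

```python
def add_one(value):
    # Innermost helper: this one does return its result.
    return value + 1

def outer_without_return(value):
    # The result of add_one is computed and then discarded,
    # so the caller receives None.
    add_one(value)

def outer_with_return(value):
    # Pass the inner result back up the chain.
    return add_one(value)

print(outer_without_return(1))  # None
print(outer_with_return(1))     # 2
```

Applied to the question, this would presumably mean writing return col_combiner(removed) in data_cleaner and return clean_numbers(dataFrame) in col_combiner, so that the dataframe returned by clean_numbers travels all the way back to the caller.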

Can't get python function to return correct values

This is my first post here, but I will try to be short and clear on what I'm trying to solve. This is part of a homework assignment but not the actual assignment.
I'm having problems getting the function below to return the correct answers. I keep getting 0
The csv file is located in the same directory as the python file.
import csv
def count_matches(rows, field, value):
    count = 0
    for row in rows:
        if row[field] == value:
            count += 1
        return count
with open('hospitals.csv') as f:
    reader = csv.DictReader(f)
    hospitals_table = list(reader)
print(count_matches(hospitals_table, 'State', 'NY'))
I tried hard-coding it to see if I could get it to work outside the function (see below), and it works, returning the correct answer of 194 (hospitals in NY from the csv file). What am I doing wrong in the function? Thank you
import csv
with open('hospitals.csv') as f:
    reader = csv.DictReader(f)
    hospitals_table = list(reader)
count = 0
field = input('Enter field: ')
value_entered = input('Enter state: ')
for row in hospitals_table:
    if row[field] == value_entered:
        count += 1
print(count)
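Your return statement is inside the for loop, so the function returns count during the first iteration. Move the return outside the loop (dedent it to the level of the for statement); otherwise the function returns before the remaining rows have been counted.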
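A corrected version might look like the sketch below. The sample rows are made up to stand in for what csv.DictReader would produce from hospitals.csv, so the count shown here is not the 194 from the real file.

```python
def count_matches(rows, field, value):
    count = 0
    for row in rows:
        if row[field] == value:
            count += 1
    # Return only after every row has been examined.
    return count

# Made-up rows standing in for the output of csv.DictReader.
sample_rows = [
    {'Name': 'A', 'State': 'NY'},
    {'Name': 'B', 'State': 'CA'},
    {'Name': 'C', 'State': 'NY'},
]
print(count_matches(sample_rows, 'State', 'NY'))  # 2
```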

Detecting a change in a CSV row

I am trying to find a way to detect when string elements in a csv file change values. When the value changes, I want the operation of the program to change. I want to read the value in the for loop one step ahead and compare it to the current value. Unfortunately, my research has only turned up results that step the for loop ahead by one rather than simply reading the value.
Any help would be appreciated.
import csv
with open("bleh.csv", "r") as bleh:
    blehFileReader = csv.reader(bleh, delimiter=',')
    next(blehFileReader, None)
    for row in blehFileReader:
        name = row
        nextname = next(blehFileReader)
        print(name)
        if name != nextname:
            print ("name has changed")
Instead of looking at the next name, look at the previous one:
previous_name = None
for row in blehFileReader:
    if row != previous_name:
        print ("name has changed")
    ....
    previous_name = row
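The same idea as a self-contained sketch: the in-memory sample data and the function name find_changes are assumptions for illustration. Unlike the snippet above, this variant skips the comparison for the first data row, so that the initial None does not count as a change.

```python
import csv
import io

# Made-up data standing in for bleh.csv, including a header line.
SAMPLE = "name\nalice\nalice\nbob\nbob\ncarol\n"

def find_changes(text):
    """Return (row_number, previous_row, row) for every row that differs from the one before it."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header, as in the question
    changes = []
    previous_row = None
    for row_number, row in enumerate(reader, start=1):
        # The first data row has no predecessor, so it is never a change.
        if previous_row is not None and row != previous_row:
            changes.append((row_number, previous_row, row))
        previous_row = row
    return changes

for change in find_changes(SAMPLE):
    print(change)
```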

My script run only through 1 "if" on 2

Here is a description of what I want to do:
I have 2 csv files.
I want to search for the same thing (let's call it "hugo") in both files at the same time.
The problem is that it only prints one match and not the other.
Here is my code:
try:
    while True:
        if pm10pl[j][2] == '"Victor Hugo"':
            victor1 = pm10pl[j]
            print victor1
        if pm25pl[t][2] == '"Victor Hugo"':
            victor2= pm25pl[t]
            print victor2
        j=j+1
        t=t+1
except IndexError:
    pass
I've tried different things, such as using elif instead of if, replacing t with j, and splitting the code into 2 functions. Each if works perfectly when the other is not there, and when I swap the two, the same one still prints, namely pm25pl.
Nothing I try works.
(This is only the relevant part of my code; opening the files etc. works fine. The '""' is intentional: hugo appears in my file as "hugo", with the double quotes.)
Also, I can't use victor1 and victor2 outside of the if.
Do you have any idea what's going on?
You can iterate through 2 lists simultaneously using the izip_longest function from itertools (named zip_longest in Python 3).
import itertools
l = []
for victor1, victor2 in itertools.izip_longest(pm10pl, pm25pl):
    if victor1 and victor1[2] == '"Victor Hugo"':
        #print victor1
        if victor2 and victor2[2] == '"Victor Hugo"':
            #print victor2
            l.append((victor1, victor2)) # add the pair to list.
for i in l: # prints all pairs.
    print i
Do one list comprehension for each csv file:
[row for row in pm10pl if 'Victor Hugo' in row[2]]
[row for row in pm25pl if 'Victor Hugo' in row[2]]
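Both approaches can be tried in Python 3 with the sketch below. The row contents are made up for illustration, and the pairing rule (keep a pair when either side matches) is an assumption rather than what either answer prescribes. Because the results are collected in lists, they remain available after the loop, which also addresses the problem of victor1 and victor2 not being usable outside the if.

```python
from itertools import zip_longest

# Made-up rows standing in for the two parsed CSV files.
pm10pl = [['1', 'a', '"Victor Hugo"'], ['2', 'b', '"Other"']]
pm25pl = [['1', 'c', '"Other"'], ['2', 'd', '"Victor Hugo"'], ['3', 'e', '"Victor Hugo"']]

TARGET = '"Victor Hugo"'

# Approach 1: filter each list independently.
victor1 = [row for row in pm10pl if row[2] == TARGET]
victor2 = [row for row in pm25pl if row[2] == TARGET]

# Approach 2: walk both lists in step; the shorter one is padded with None.
pairs = []
for row10, row25 in zip_longest(pm10pl, pm25pl):
    match10 = row10 if row10 is not None and row10[2] == TARGET else None
    match25 = row25 if row25 is not None and row25[2] == TARGET else None
    if match10 or match25:
        pairs.append((match10, match25))

print(victor1)
print(victor2)
print(pairs)
```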

Parsing a column using openpyxl

I have the following algorithm to parse a column for integer values:
def getddr(ws):
    address = []
    col_name = 'C'
    start_row = 4
    end_row = ws.get_highest_row()+1
    range_expr = "{col}{start_row}:{col}{end_row}".format(col=col_name, start_row=start_row, end_row=end_row)
    for row in ws.iter_rows(range_string=range_expr):
        print row
        raw_input("enter to continue")
        cell = row[0]
        if str(cell.value).isdigit:
            address.append(cell.value)
        else:
            continue
    return address
This crashes at cell = row[0] saying "IndexError: tuple index out of range", and I don't know what this means. I tried printing out row to see what it contained, but all it gives me is an empty set of parentheses. Anyone know what I'm missing?
It is hard to say exactly what the problem is, because you have not shown the input data you are trying to process.
But I can explain the reason for the error you get and the direction to go in. The tuple row contains 0 elements (row = ()), so you cannot use row[0]: there is no row[0]. The first thing to change is to check how long row is, and only continue when it is long enough:
for row in ws.iter_rows(range_string=range_expr):
    print row
    raw_input("enter to continue")
    if len(row) > 0:
        cell = row[0]
        # isdigit must be called; the bare method object is always truthy
        if str(cell.value).isdigit():
            address.append(cell.value)
        else:
            continue
That is the first step that you must do anyway.
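The two checks involved (skipping empty rows, and calling isdigit() with parentheses) can be tried without a spreadsheet. In this sketch, plain Python values stand in for openpyxl cells, and the function name collect_digit_cells is hypothetical:

```python
def collect_digit_cells(rows):
    """Collect the first value of each non-empty row when it consists only of digits."""
    address = []
    for row in rows:
        if not row:
            # Empty tuple: there is no row[0] to read.
            continue
        value = row[0]
        # Note the parentheses: str.isdigit without them is a method object,
        # which is always truthy.
        if str(value).isdigit():
            address.append(value)
    return address

# Made-up rows: plain values stand in for openpyxl cells.
sample_rows = [(4096,), (), ('abc',), ('12',), (None,)]
print(collect_digit_cells(sample_rows))  # [4096, '12']
```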
