Trying to write python CSV extractor - python

I am a complete newbie to programming, and this is the first real program I am trying to write.
I have a huge CSV file (hundreds of columns and thousands of rows) from which I want to extract only a few columns, based on the value in one field. It works fine and I get nice output, but the problem arises when I try to encapsulate the same logic in a function.
The function returns only the first extracted row, although print works fine.
I have been playing with this for hours and have read other examples here, and now my mind is mush.
import csv
import sys
newlogfile = csv.reader(open(sys.argv[1], 'rb'))
outLog = csv.writer(open('extracted.csv', 'w'))
def rowExtractor(logfile):
    for row in logfile:
        if row[32] == 'No':
            a = []
            a.append(row[44])
            a.append(row[58])
            a.append(row[83])
            a.append(row[32])
            return a
outLog.writerow(rowExtractor(newlogfile))

You are exiting prematurely. Because return a is inside the for loop, the function returns the first time that line is reached, which means only the first matching row is ever extracted.
A simple way to do this would be to do:
def rowExtractor(logfile):
    # output holds all of the rows
    output = []
    for row in logfile:
        if row[32] == 'No':
            a = []
            a.append(row[44])
            a.append(row[58])
            a.append(row[83])
            a.append(row[32])
            output.append(a)
    # notice that the return statement is outside of the for-loop
    return output
outLog.writerows(rowExtractor(newlogfile))
You could also consider using yield

You've got a return statement in your function...when it hits that line, it will return (thus terminating your loop). You'd need yield instead.
See What does the "yield" keyword do in Python?
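For illustration, here is a minimal sketch of the yield approach both answers mention. The function name row_extractor and the parameters flag_col and keep_cols are hypothetical, and the sample data uses small column indices in place of the question's 32, 44, 58 and 83.

```python
import csv
import io

def row_extractor(rows, flag_col=32, keep_cols=(44, 58, 83, 32)):
    """Yield the selected columns of every row whose flag column is 'No'."""
    for row in rows:
        if row[flag_col] == 'No':
            # yield hands back one row and then resumes the loop,
            # whereas return would end the function here
            yield [row[i] for i in keep_cols]

# Made-up three-column sample standing in for the real log file
sample = io.StringIO("a,No,x\nb,Yes,y\nc,No,z\n")
extracted = list(row_extractor(csv.reader(sample), flag_col=1, keep_cols=(0, 2, 1)))
print(extracted)
```

Because the generator produces rows lazily, it can be passed straight to writerows without building the whole list first. On Python 3, the csv documentation recommends opening the input file in text mode with newline='' rather than 'rb'.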

Related

Python continue for loop from file

I have code that generates strings from 000000000000 to ffffffffffff, which are written to a file.
I'm trying to implement a check for whether the program was closed, so that I can read the last value from the file, say 00000000781B, and continue the for loop from there.
The variable "attempt" in (for attempt in to_attempt:) is a tuple and always starts from zero.
Is it possible to continue the for loop from a specified value?
import itertools
f = open("G:/empty/last.txt", "r")
lines = f.readlines()
rand_string = str(lines[0])
f.close()
letters = '0123456789ABCDEF'
print(rand_string)
for length in range(1, 20):
    to_attempt = itertools.product(letters, repeat=length)
    for attempt in to_attempt:
        gen_string = rand_string[length:] + ''.join(attempt)
        print(gen_string)
You have to store the value in a file to keep track of the last value that was processed. I'm assuming the main for loop running from 000000000000 to ffffffffffff is the to_attempt one. All you need to do is store the loop position in a file; you can use a new variable to keep track of it.
try:
    with open('save.txt', 'r') as reader:
        save = int(reader.read())
except FileNotFoundError:
    save = 0
# rest of the code
# to_attempt is an iterator and has no len(), so skip the first `save` items with islice
for i, attempt in enumerate(itertools.islice(to_attempt, save, None), start=save):
    # open in write mode, and convert the counter to a string before writing
    with open('save.txt', 'w') as writer:
        writer.write(str(i))
    # rest of the code
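As an alternative sketch, and not what the question's itertools.product loop does, the strings can be treated as a fixed-width hexadecimal counter, so resuming becomes integer arithmetic. This assumes every string is exactly 12 uppercase hex digits; the function name hex_strings_from and the width parameter are hypothetical.

```python
def hex_strings_from(last_seen, width=12):
    """Yield fixed-width uppercase hex strings, starting right after last_seen."""
    start = int(last_seen, 16) + 1          # parse the saved string as base 16
    for n in range(start, 16 ** width):     # range is lazy, so this is cheap
        yield format(n, '0{}X'.format(width))

# Resume from the value mentioned in the question
gen = hex_strings_from('00000000781B')
first_three = [next(gen) for _ in range(3)]
print(first_three)
```

With this approach the save file only needs to hold the last generated string, and no items have to be skipped on restart.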

How can we use variables that are defined inside a loop outside the loop in Python

Below is my code. I am trying to read the variable gobs(x) from an input file and then use it for other calculations, e.g. computing error(x). I can read it from the input file properly within the loop, but when I try to use it outside the loop, only a single value carries over: for all 100 values that I read as gobs(x) inside the loop, it shows the value of the last one only.
The code starts below:
constant = 99
x0=50
z0=5
def gsyn (x):
    return (constant*z0)/(z0**2+(x-x0)**2)
with open ('Grav_H_Cyln_v3_output.txt') as finp:
    lines=finp.readlines()
    for line in lines:
        g=float(line)
        x=line
        def gobs (x):
            return g
        print (gobs(x)) # here, gobs(x) is printing properly
def error(x):
    return (gsyn(x)-gobs(x))
for i in range (1, 100, 1):
    x=i
    print (error(x)) # here, only the first value of gobs(x) is coming
print ('stop')
This seems like a very odd solution to what is fundamentally a very simple problem. Make gobs a dictionary so you can set or retrieve gobs[x] at will.
gobs = dict()
with open ('Grav_H_Cyln_v3_output.txt') as finp:
    lines=finp.readlines()
    for line in lines:
        g=float(line)
        gobs[line] = g
        print (gobs[line])
You could try creating a vector gobs[] outside the loop, and filling it up within the loop over lines.
That should do.
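A minimal sketch of the list idea, under one loud assumption: the k-th line of the file holds the observation for x = k (counting from 1). The question does not say how x maps to lines, and the three sample values below are made up to stand in for Grav_H_Cyln_v3_output.txt.

```python
import io

constant = 99
x0 = 50
z0 = 5

def gsyn(x):
    return (constant * z0) / (z0 ** 2 + (x - x0) ** 2)

# Stand-in for open('Grav_H_Cyln_v3_output.txt'); the values are invented
finp = io.StringIO("1.5\n2.5\n3.5\n")

# Fill the list once; every observation stays available after the loop
gobs = [float(line) for line in finp if line.strip()]

# Assumption: line k corresponds to x = k, hence start=1
errors = [gsyn(x) - g for x, g in enumerate(gobs, start=1)]
print(errors)
```

Storing the values in a list, rather than redefining a function on every pass, is what keeps all of them reachable once the loop has finished.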
Instead of reassigning the value of x on each iteration of your loop, append i to a list that is declared outside the loop.
x = []
for i in range (1, 100, 1):
    x.append(i)
print(x)

Python - program for searching for relevant cells in excel does not work correctly

I've written a code to search for relevant cells in an excel file. However, it does not work as well as I had hoped.
In pseudocode, this is what it should do:
Ask for input excel file
Ask for input textfile containing keywords to search for
Convert input textfile to list containing keywords
For each keyword in list, scan the excelfile
If the keyword is found within a cell, write it into a new excelfile
Repeat with next word
The code works, but some keywords are not found while they are present within the input excelfile. I think it might have something to do with the way I iterate over the list, since when I provide a single keyword to search for, it works correctly. This is my whole code: https://pastebin.com/euZzN3T3
This is the part I suspect is not working correctly. Splitting the textfile into a list works fine (I think).
#IF TEXTFILE
elif btext == True:
    #Split each line of textfile into a list
    file = open(txtfile, 'r')
    #Keywords in list
    for line in file:
        keywordlist = file.read().splitlines()
    nkeywords = len(keywordlist)
    print(keywordlist)
    print(nkeywords)
    #Iterate over each string in list, look for match in .xlsx file
    for i in range(1, nkeywords):
        nfound = 0
        ws_matches.cell(row = 1, column = i).value = str.lower(keywordlist[i-1])
        for j in range(1, worksheet.max_row + 1):
            cursor = worksheet.cell(row = j, column = c)
            cellcontent = str.lower(cursor.value)
            if match(keywordlist[i-1], cellcontent) == True:
                ws_matches.cell(row = 2 + nfound, column = i).value = cellcontent
                nfound = nfound + 1
and my match() function:
def match(keyword, content):
    """Check if the keyword is present within the cell content, return True if found, else False"""
    if content.find(keyword) == -1:
        return False
    else:
        return True
I'm new to Python so my apologies if the way I code looks like a warzone. Can someone help me see what I'm doing wrong (or could be doing better?)? Thank you for taking the time!
Splitting the textfile into a list works fine (I think).
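This is something you should actually test (hint: in Python 3, for line in file consumes the first line before file.read() runs, so the first keyword never reaches the list; range(1, nkeywords) then also stops one short of the last keyword). The best way to make easily testable code is to isolate functional units into separate functions, i.e. you could make a function that takes the name of a text file and returns a list of keywords. Then you can easily check whether that bit of code works on its own. A more pythonic way to read lines from a file (which is what you do, assuming one word per line) is as follows:
with open(filename) as f:
    # splitlines() drops the trailing newlines, which readlines() would keep
    keywords = f.read().splitlines()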
The rest of your code may actually work better than you expect. I'm not able to test it right now (and don't have your spreadsheet to try it on anyway), but if you're relying on nfound to give you an accurate count for all keywords, you've made a small but significant mistake: it's set to zero inside the loop, and thus you only get a count for the last keyword. Move nfound = 0 outside the loop.
In Python, the way to iterate over lists - or just about anything - is not to increment an integer and then use that integer to index the value in the list. Rather loop over the list (or other iterable) itself:
for keyword in keywordlist:
    ...
As a hint, you shouldn't need nkeywords at all.
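To make the advice concrete, here is a hedged sketch that isolates the two steps into testable functions. The names load_keywords and find_matches are hypothetical, and a plain list of cell values stands in for the openpyxl worksheet so the example runs without a spreadsheet.

```python
import os
import tempfile

def load_keywords(path):
    """Return one lowercase keyword per non-empty line of the file."""
    with open(path) as f:
        return [line.strip().lower() for line in f if line.strip()]

def find_matches(keywords, cells):
    """Map each keyword to the cell texts that contain it (case-insensitive)."""
    matches = {}
    for keyword in keywords:
        # str() guards against empty cells, whose value is None
        matches[keyword] = [c for c in cells if keyword in str(c).lower()]
    return matches

# Made-up keyword file and cell values for demonstration
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Apple\nbanana\ncherry\n")
keywords = load_keywords(tmp.name)
os.remove(tmp.name)

cells = ["Apple pie", "Banana split", None, "cherry tart", "green apple"]
result = find_matches(keywords, cells)
print(result)
```

When a column number is still needed for writing results, enumerate(keywords, start=1) supplies it without manual indexing.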
I hope this gets you on the right track. When asking questions in future, it'd be a great help to provide more information about what goes wrong, and preferably enough to be able to reproduce the error.

My function doesn't return a value, can't figure out why

The function is supposed to open a csv file that has data in this format
"polling company,date range,how many polled,margin of error,cruz,kasich,rubio,trump"
When I run this function, read_data_file, there is no output, which I don't understand since I am returning poll_data. I don't believe there is an issue with the rest of the code, because if I replace 'return poll_data' with 'print(poll_data)' I get the desired output.
I am a noob at this and don't have a full grasp of return.
def read_data_file(filename):
    file = open(filename, 'r')
    poll_data = []
    for data in file:
        data = data.strip('\n')
        data = data.split(',')
        poll_data.append(data)
    return poll_data
read_data_file('florida-gop.csv')
You changed that last line in the function from print to return. So, when you call your function like this:
read_data_file('florida-gop.csv')
it does return that data. It's sitting right there! But then your script ends, doing nothing with that data. So, instead, do something like this:
data = read_data_file('florida-gop.csv')
print(data)
A short addendum: political data is an excellent way to learn data manipulation with Python and, if you are so inclined, Python itself. I'd recommend the O'Reilly books on data and Python, but that's outside the scope of this question.
You have two options here:
Replace return poll_data with print(poll_data).
Instead of read_data_file('florida-gop.csv'), call print(read_data_file('florida-gop.csv')).
Why do you need to do this?
Print vs Return
print actually shows you the result, while return only gives the result to the computer, if that makes sense. The computer knows it, but it doesn't print it, which is why the second solution works: the computer has the data you want, and it is able to print it if you command it to. However, in your case, the first solution is probably easier.
Hope this helps!
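A tiny sketch of the difference, using two hypothetical functions (give_back and show_only) that are not part of the question's code:

```python
def give_back():
    # return hands the list to the caller and displays nothing
    return [1, 2, 3]

def show_only():
    # print displays the list; with no return statement the result is None
    print([1, 2, 3])

kept = give_back()   # nothing appears on screen, but kept holds the list
lost = show_only()   # the list appears on screen, but lost is None
print(kept, lost)
```

This is why a function that returns its data is usually more reusable: the caller decides whether to print, store, or process the result.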
Continuing from the above(or not so above anymore xD) answer...
the full code would now be,
def read_data_file(filename):
    file = open(filename, 'r')
    poll_data = []
    for data in file:
        data = data.strip('\n')
        data = data.split(',')
        poll_data.append(data)
    return poll_data
print(read_data_file('florida-gop.csv')) # Before you forgot to print it.
or exactly like the above answer,
def read_data_file(filename):
    file = open(filename, 'r')
    poll_data = []
    for data in file:
        data = data.strip('\n')
        data = data.split(',')
        poll_data.append(data)
    return poll_data
data = read_data_file('florida-gop.csv')
print(data)
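As a possible variation, the manual strip and split can be left to the standard csv module. This is only a sketch: read_data is a hypothetical name, and the two sample lines are invented to match the column layout described in the question rather than taken from florida-gop.csv.

```python
import csv
import io

def read_data(lines):
    """Return each CSV record as a list of strings."""
    return list(csv.reader(lines))

# Invented sample in the question's column order
sample = io.StringIO(
    "polling company,date range,how many polled,margin of error,cruz,kasich,rubio,trump\n"
    "Example Poll,3/1-3/3,500,4.4,10,20,30,40\n"
)
poll_data = read_data(sample)
print(poll_data[1][-1])
```

With a real file, the same function can be given an object opened via with open(filename, newline='') as f, which also closes the file automatically.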

Array visibility in python

Simple question. I want to fetch each row with DictReader from the csv module, cast every entry to float, and put it in the data array. At the end of the scan I want to print the first 10 elements of the array, but it gives me a visibility error on the array data. I've got this code:
with open(train, "r") as traincsv:
    trainreader = csv.DictReader(traincsv)
    for row in trainreader:
        data = [float(row['Sales'])]
print(data[:10])
If i put the print inside the for like this
with open(train, "r") as traincsv:
    trainreader = csv.DictReader(traincsv)
    for row in trainreader:
        data = [float(row['Sales'])]
        print(data[:10])
It prints all the entries not just 10.
You are overwriting data every time in the for loop. This is the source of your problem.
Please upload an example input for me to try and I will, but I believe what is below will fix your problem, by appending to data instead of overwriting it.
Also, it is good practice to leave the with block as soon as possible; just make sure the rows are read before the file is closed.
# Open the with block, read the rows, and leave immediately
with open(train, "r") as traincsv:
    # list() reads every row now, while the file is still open
    trainreader = list(csv.DictReader(traincsv))
# Declare data as a blank list before iterations
data = []
# Iterate through all of trainreader
for row in trainreader:
    data.append(float(row['Sales']))
# Now it should print first 10 results
print(data[:10])
Ways of appending to a list:
data = data + [float(row['Sales'])]
data += [float(row['Sales'])]
data.append(float(row['Sales']))
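Here is a self-contained sketch of the appending approach, with an in-memory CSV standing in for the train file. The Store column and the three rows are made up for illustration; only the Sales column name comes from the question.

```python
import csv
import io

# Invented sample; the real file is whatever `train` points to
traincsv = io.StringIO("Store,Sales\n1,5263\n2,6064\n3,8314\n")

data = []                                  # created once, before the loop
for row in csv.DictReader(traincsv):
    data.append(float(row['Sales']))       # grow the list instead of replacing it

first_ten = data[:10]                      # slicing is safe even with fewer than 10 items
print(first_ten)
```

Creating data before the loop also means the name exists even when the file has no rows, which avoids the visibility error on the final print.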
