Reading input file and summarizing data - Python

I am working on an assignment for my intro-level computing class and I've run into an error that I can't make sense of.
My goal (at the moment) is to extract information from the input file and store it so that I end up with three values per record: animal ID, time visited, and station.
Here is the input file:
#Comments
a01:01-24-2011:s1
a03:01-24-2011:s2
a03:09-24-2011:s1
a03:10-23-2011:s1
a04:11-01-2011:s1
a04:11-02-2011:s2
a04:11-03-2011:s1
a04:01-01-2011:s1
a02:01-24-2011:s2
a03:02-02-2011:s2
a03:03-02-2011:s1
a02:04-19-2011:s2
a04:01-23-2011:s1
a04:02-17-2011:s1
#comments
a01:05-14-2011:s2
a02:06-11-2011:s2
a03:07-12-2011:s1
a01:08-19-2011:s1
a03:09-19-2011:s1
a03:10-19-2011:s2
a03:11-19-2011:s1
a03:12-19-2011:s2
a04:12-20-2011:s2
a04:12-21-2011:s2
a05:12-22-2011:s1
a04:12-23-2011:s2
a04:12-24-2011:s2
And here is my code so far:
import os.path

def main():
    station1 = {}
    station2 = {}
    record = ()
    items = []
    animal = []
    endofprogram = False
    try:
        filename1 = input("Enter name of input file >")
        infile = open(filename1, "r")
        filename2 = input('Enter name of output file > ')
        while os.path.isfile(filename2):
            filename2 = input("File Exists!Enter name again>")
        outfile = open(filename2.strip(), "w")
    except IOError:
        print("File does not exist")
        endofprogram = True
    if endofprogram == False:
        print('Continuing program')
        records = reading(infile)
        print('records are > ', records)

def reading(usrinput):
    for line in usrinput:
        if (len(line) != 0) or (line[0] != '#'):
            AnimalID, Timestamp, StationID = line.split()
            record = (AnimalID, Timestamp, StationID)
            data = data.append(record)
    return data

main()
What I am trying to do is open the file and read the three fields separated by a ':'.
The error I keep receiving is:
Continuing program
Traceback (most recent call last):
File "C:\Program Files (x86)\Wing IDE 101 5.0\src\debug\tserver\_sandbox.py", line 39, in <module>
File "C:\Program Files (x86)\Wing IDE 101 5.0\src\debug\tserver\_sandbox.py", line 25, in main
File "C:\Program Files (x86)\Wing IDE 101 5.0\src\debug\tserver\_sandbox.py", line 34, in reading
builtins.ValueError: need more than 1 value to unpack
I have tried switching the line in my reading function to:
AnimalID, Timestamp, StationID = line.split(':')
But still nothing.

The issue is that len(line) != 0 is always True (every line read from the file still contains at least its newline), so the or lets every line through, including the # comment lines. To select non-blank lines that do not start with #, you could:
line = line.strip()            # remove leading/trailing whitespace
if line and line[0] != '#':
    fields = line.split(':')   # NOTE: use ':' delimiter
    if len(fields) == 3:
        data.append(fields)
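Dropped into your reading function, a minimal sketch might look like the following (it also assumes two extra fixes visible in your code: data is initialised before the loop, and the result of append is not assigned back, since list.append returns None):
def reading(usrinput):
    data = []                                # initialise the list before the loop
    for line in usrinput:
        line = line.strip()                  # remove leading/trailing whitespace
        if line and line[0] != '#':          # skip blank lines and comment lines
            fields = line.split(':')         # AnimalID, Timestamp, StationID
            if len(fields) == 3:
                data.append(tuple(fields))   # append mutates data in place
    return data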

I assume the #Comments line is the first line of your file, so that is the first line your reading function tries to parse. This will not work, because splitting that line on whitespace does not produce a three-element sequence:
AnimalID, Timestamp, StationID = line.split()  # won't work with headings
Given the format of your input file, the three-way unpacking will work (with ':' as the delimiter) if you filter which lines you try to split, i.e. you make sure the line you split always contains two colons, which yields three elements. The following illustrates one alternative approach to get you thinking:
for line in lines:               # from the open file
    if ':' in line.strip():      # for example; you still need to distinguish station visits from headings somehow
        print(line.split(':'))   # you don't really want to print here; figure out how to store this data
As I say in the comment, you won't really want to print the last line; you want to store it in some data structure. Further, you might find some better way to distinguish between lines with station visits and lines without. I'll leave these items for you to figure out, since I don't want to ruin the rest of the assignment for you.

Related

Outputting the contents of a file in Python

I'm trying to do something simple but I'm having issues.
I want to read in a file and export each word to a different column in an Excel spreadsheet. I have the spreadsheet portion working; I'm just having a hard time with what should be the simple part.
What's happening is that each character is placed on a new line.
I have a file called server_list. That file has contents as shown below.
Linux RHEL64 35
Linux RHEL78 24
Linux RHEL76 40
I want to read each line in the file and assign each word a variable so I can output it to the spreadsheet.
File = open("server_list", "r")
FileContent = File.readline()
for Ser, Ver, Up in FileContent:
    worksheet.write(row, col, Ser)
    worksheet.write(row, col + 1, Ver)
    worksheet.write(row, col + 2, Up)
    row += 1
I'm getting the following error for this example
Traceback (most recent call last):
File "excel.py", line 47, in <module>
for Files, Ver, Uptime in FileContent:
ValueError: not enough values to unpack (expected 3, got 1)
FileContent is a string object that contains a single line of your file:
Out[4]: 'Linux RHEL64 35\n'
What you want to do with this string is strip the trailing newline \n and then split it into single words. Only at that point can you do the unpacking that currently raises the ValueError in your for statement.
In python this means:
ser, ver, up = line.strip().split() # line is what you called FileContent, I'm allergic to caps in variable names
Now note that this is just one single line we are talking about. Probably you want to do this for all lines in the file, right?
So best is to iterate over the lines:
myfile = "server_list"
with open(myfile, 'r') as fobj:
    for row, line in enumerate(fobj):
        ser, ver, up = line.strip().split()
        # do stuff with row, ser, ver, up
Note that you do not need to track the row yourself; the enumerate iterator does that for you.
Also note, and this is crucial: the with statement I used here makes sure that you do not leave the file open. Using the with-clause whenever you are working with files is a good habit!
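Putting the pieces together, here is a minimal sketch of the whole flow. It assumes the worksheet comes from xlsxwriter, since the question does not say which spreadsheet library is actually in use:
import xlsxwriter  # assumption: the question's worksheet object comes from xlsxwriter

workbook = xlsxwriter.Workbook('servers.xlsx')
worksheet = workbook.add_worksheet()

with open('server_list', 'r') as fobj:
    for row, line in enumerate(fobj):
        if not line.strip():
            continue                       # skip blank lines
        ser, ver, up = line.strip().split()
        worksheet.write(row, 0, ser)       # column A: OS
        worksheet.write(row, 1, ver)       # column B: version
        worksheet.write(row, 2, up)        # column C: uptime

workbook.close()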

Comparing two text documents and skipping certain lines based off of one text document - Python

I'm working on a Python project. I have a semicolon-plus-newline delimited text file being read in that contains all 50 states (including DC). Thus, each state has its own line terminating in a semicolon (;). An example is below. I also have another file being read in with a LOT of information. The text document can be found here.
I want to skip any line that starts with the state name by testing it against a text file with all fifty states, along with the line below any such line. I do not need this information. Is there a way to test, line by line, if it starts with the state name and, if it matches with one of the fifty states in the other text file, skip that line plus the line below it?
For example, in the hyperlinked text file, line 43 starts with Alaska. I want to skip that line and the line below it. I want to store the rest of the information in an array. When I hit line 244, the information for the next state (Alabama) starts. I want to skip line 244 and the line below that, and do the same thing - store all the information in the array, compiling one large array at the end.
Here are the first four lines of the fifty states file:
Alabama;
Alaska;
Arizona;
Arkansas;
For clarification, the only information I am interested in is the ICAO data, which is the 3rd column in the hyperlinked text file.
Also, would it be an issue if there is no ICAO information for a specific line? For example, line 63 in the hyperlinked text document does not have a value.
This is the code I have so far:
import numpy as np
#This program reads in the ICAO data file found at: http://weather.rap.ucar.edu/surface/stations.txt
with open('ICAOlist.txt', 'r') as dataICAO:
    icaoData = np.loadtxt(dataICAO, dtype=str, delimiter=' ', skiprows=41)
with open('listOfAllStates.txt', 'r') as dataStates:
    statesData = np.loadtxt(dataStates, dtype=str, delimiter=';')
I'm pretty sure this is just a matter of breaking down your concerns. First, you want to load your 'states name file' only once:
# Get all the states as an array
def load_states(statesFile):
    with open(statesFile, 'r') as states:
        return np.loadtxt(states, dtype=str, delimiter=';')
Now, we need to go through every line in the ICAO data:
def load_icao_data(state_filename, icao_filename):
    states = load_states(state_filename)
    with open(icao_filename, 'r') as input:
        previous_line = None
        for line in input:
            if valid_line(line, states) and valid_line(previous_line, states):
                process_line(line)
            previous_line = line
The two functions you would have to write are valid_line (which should return a bool) and process_line (which should do whatever you need done with the data).
valid_line should take a list of states along with the current line. It would look something like this:
def valid_line(line, states):
    if not line or len(line) == 0:
        return True  # if the line is empty or None
    for state in states:
        if line.startswith(state):
            return False
    return True
process_line is left for you to determine. Make sense?
Addendum:
I note in your actual data that state isn't really the thing that determines if a line is 'bad'. You could rewrite valid_line to:
def valid_line(line):
    return (len(line) > 3        # Eliminates short/empty lines
            and line[0] != '!'   # Eliminates 'comment' lines
            and line[2] == ' '   # Eliminates 'state title' lines
            and line[3] != ' ')  # Eliminates 'header column' line
Then your load_icao_data becomes:
def load_icao_data(icao_filename):
    with open(icao_filename, 'r') as input:
        for line in input:
            if valid_line(line):
                process_line(line)
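As a rough sketch of how these pieces might be wired together, process_line is stubbed out here to simply collect the surviving lines into a list; pulling the ICAO column out of each one is still up to you:
def load_icao_data(icao_filename):
    kept = []
    with open(icao_filename, 'r') as input_file:
        for line in input_file:
            if valid_line(line):
                kept.append(line.rstrip('\n'))   # stand-in for process_line
    return kept

station_lines = load_icao_data('ICAOlist.txt')   # filename taken from your snippet
print(len(station_lines), 'candidate station lines kept')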

Splitting a line from a file into different lists

My program is supposed to take input from the user and read a file with that name as input. The read-in file gets saved into a dictionary called portfolio, and from there all I have to do is sort each line of the portfolio into keys and values.
Here's my code.
portfolio = {}
portfolio = file_read()  # Reads the file through a function
if file_empty(portfolio) == True or None:  # nevermind this, it works
    print "The file was not found."
else:
    print "The file has successfully been loaded"
    for line in portfolio:
        elements = line.strip().split(",")  # separate lists by comma
        print elements[0]  # using this to check
        print elements[1]  # if it works at all
All this does is print the first letter of the first line, which is S. Apparently elements[1] is supposed to be the second letter, but the index is out of range. Please enlighten me as to what might be wrong.
Thank you.
It looks like file_read() is reading the file into a string.
Then for line in portfolio: is iterating through each character in that string.
Then elements = line.strip().split(",") will give you a list containing one character, so trying to get elements[1] is past the bounds of the list.
If you want to read the whole contents of the file into a string called portfolio, you can iterate through each line in the string using
for line in portfolio.split('\n'):
    ...
But the more usual way of iterating through lines in a file would be
with open(filename, 'r') as inputfile:
    for line in inputfile:
        ....
Got it to work with this code:
for line in minfil:
    line = line.strip()
    elements = line.split(",")
    portfolio[str(elements[0])] = [(int(elements[1]), float(elements[2]), str(elements[3]))]
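Combining that with the with-statement from the answer above, a minimal sketch might look like this (it assumes the file really is comma-separated with four fields per line, and that filename holds whatever path you are reading from):
portfolio = {}
with open(filename, 'r') as inputfile:          # filename: your input path (assumed)
    for line in inputfile:
        elements = line.strip().split(",")
        if len(elements) == 4:                  # skip blank or malformed lines
            portfolio[elements[0]] = [(int(elements[1]),
                                       float(elements[2]),
                                       elements[3])]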

How to use Python to find a line number in a document and insert data underneath it

Hi I already have the search function sorted out:
def searchconfig():
    config1 = open("config.php", "r")
    b = '//cats'
    for num, line in enumerate(config1, 0):
        if b in line:
            connum = num + 1
            return connum
    config1.close()
This will return the line number of //cats. I then need to take the data underneath it, put it in a temporary document, append new data under the //cats line, and then append the data from the temporary document back to the original. How would I do this? I know that I would have to use 'a' instead of 'r' when opening the document, but I do not know how to utilise the line number.
I think, the easiest way would be to read the whole file into a list of strings, work on that list and write it back afterwards.
# Read all lines of the file into a list of strings (newlines stripped)
with open("config.php", "r") as file:
    lines = [line.rstrip('\n') for line in file]

# This gets the line number of the first line containing '//cats'
# Note that it will raise a StopIteration exception if no such line exists...
linenum = next(num for (num, line) in enumerate(lines) if '//cats' in line)

# insert a line after the line containing '//cats'
lines.insert(linenum + 1, 'This is a new line...')

# You could also replace the line following '//cats' like this:
lines[linenum + 1] = 'New line content...'

# Write back the file (in fact this creates a new file with new content)
# Note that you need to append the line delimiter '\n' to every line explicitly
with open("config.php", "w") as file:
    file.writelines(line + '\n' for line in lines)
Using "a" as the mode for open would only let you append at the end of the file.
You could use "r+" for a combined read/write mode, but then you could only overwrite parts of the file in place; there is no simple way to insert new lines in the middle of a file using this mode.
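As a quick throwaway illustration of the difference (scratch.txt here is just a hypothetical scratch file):
with open("scratch.txt", "w") as f:
    f.write("line one\nline two\n")

with open("scratch.txt", "a") as f:
    f.write("appended\n")        # append mode: writes always land at the end of the file

with open("scratch.txt", "r+") as f:
    f.write("LINE")              # r+ starts at position 0 and overwrites in place -> "LINE one\n..."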
You could do it like this. I am creating a new file in this example as it is usually safer.
with open('my_file.php') as my_php_file:
    add_new_content = ['%sNEWCONTENT' % line if '//cat' in line
                       else line.strip('\n')
                       for line in my_php_file.readlines()]

with open('my_new_file.php', 'w+') as my_new_php_file:
    for line in add_new_content:
        print>>my_new_php_file, line

object not iterable with inputfile

I am basically trying to update a line in a saved file with a new, updated number, but it leaves only one line in the file. It feels like it's overwriting the entire file rather than updating it. I looked at other questions here, and although they pointed me to the right module to use, I can't seem to figure out the problem I am having.
unique = 1
for line in fileinput.input('tweet log.txt', inplace=1):
    if tweet_id in line:  # checks if ID is unique; if it is not, needs to update it
        tweet_fields = line.split(';')
        old_count = tweet_fields[-2]
        new_count = 'retweet=%d' % (int(tweet_retweet))
        line = line.replace(old_count, new_count)
        print line
        unique = 0
if unique == 1:  # if previous if didn't find uniqueness, appends the file
    save_file = open('tweet log.txt', 'a')
    save_file.write('id=' + tweet_id + ';' +
                    'timestamp=' + tweet_timestamp + ';' +
                    'source=' + tweet_source + ';' +
                    'retweet=' + tweet_retweet + ';' + '\n')
    save_file.close()
I feel like this has a very easy solution but I am clearly missing it.
Thanks in advance!
I think the issue you're having is due to your conditional in the loop over the input. When you use fileinput.input with the inplace=1 argument, it renames the original file adding a "backup" extension (by default ".bak") and redirects standard output to a new file with the original name.
Your loop is only printing the line that you're editing. Because of this, all the non-matching lines are getting filtered out of the file. You can fix this by printing each line you iterate over, even if it doesn't match. Here's an altered version of your loop:
for line in fileinput.input('tweet log.txt', inplace=1):
    if tweet_id in line:
        tweet_fields = line.split(';')
        old_count = tweet_fields[-2]
        new_count = 'retweet=%d' % (int(tweet_retweet))
        line = line.replace(old_count, new_count)
        unique = 0
    print line,   # trailing comma: line already ends with its own newline
The main change is moving the print line statement out of the if block; the trailing comma on print keeps it from adding a second newline, since line already carries one.
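Wrapped up as one function, a rough sketch of the whole update-or-append flow might look like this (the variable names and record format are taken from your snippet; treat it as an outline rather than drop-in code):
import fileinput

def record_tweet(tweet_id, tweet_timestamp, tweet_source, tweet_retweet,
                 filename='tweet log.txt'):
    found = False
    # Pass 1: rewrite the file in place, updating the retweet count if the ID already exists
    for line in fileinput.input(filename, inplace=1):
        if tweet_id in line:
            old_count = line.split(';')[-2]
            line = line.replace(old_count, 'retweet=%d' % int(tweet_retweet))
            found = True
        print line,          # re-emit every line; the comma avoids doubling the newline
    # Pass 2: if the ID was never seen, append a fresh record
    if not found:
        with open(filename, 'a') as save_file:
            save_file.write('id=' + tweet_id + ';' +
                            'timestamp=' + tweet_timestamp + ';' +
                            'source=' + tweet_source + ';' +
                            'retweet=' + tweet_retweet + ';' + '\n')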
