I'm trying to make a function that, given input from the user, maps that input to a list of strings in a text file and returns an integer corresponding to the matching string. Essentially, I check whether the user's input is in the file and return the index of the matching string. I have a working function, but it seems slow and error-prone.
def parseInput(input):
    Gates = []
    textfile = open("words.txt")
    try:
        nextLine = textfile.readline()
        while nextLine:
            Gates.append(nextLine)
            nextLine = textfile.readline()
    finally:
        textfile.close()
    n = 0
    while n < len(Gates):
        nextString = Gates[n]
        if input in nextString:
            break  # exit loop
        n += 1
    with open("wordsToInts.txt") as textfile:
        # Same procedure as the try loop (why isn't that one a with loop?)
        if correct:
            return number
This seems rather... bad. I just can't think of a better way to do it, though. I have full control over words.txt and wordsToInts.txt (should I combine them?), so I can format them however I like. I'm mainly looking for suggestions about the function itself, but if a change to the text files would help, I'd like to know that too. My goal is to reduce the opportunities for error; I'll add error checking later. Please suggest a better way to write this function. If you write code, please use Python, though pseudocode is fine too.
I would combine the files. Put each word and its corresponding value on one line, separated by a delimiter such as |:
words.txt
string1|something here
string2|something here
Then you can store each line as an entry in a dictionary and look up the value for the user's input:
def parse_input(input):
    word_dict = {}
    with open('words.txt') as f:
        for line in f:
            line_key, line_value = line.split('|', 1)
            word_dict[line_key] = line_value.rstrip('\n')
    try:
        return word_dict[input]
    except KeyError:
        return None
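The function above reopens and re-parses words.txt on every call, and it returns the value as a string. If the file doesn't change while the program runs, a sketch that builds the dictionary once and stores the values as integers (as the question asks) might look like this. It assumes every non-blank line has the form word|number; parse_word_map and WORD_MAP are made-up names, the parameter is renamed to word so it doesn't shadow the built-in input(), and io.StringIO stands in for the real file.

```python
import io

def parse_word_map(lines):
    """Build a {word: int} dict from 'word|number' lines."""
    word_map = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        key, value = line.split("|", 1)
        word_map[key] = int(value)  # store the number as an int, not a str
    return word_map

# In real code: with open("words.txt") as f: WORD_MAP = parse_word_map(f)
# A StringIO stands in for the file so this sketch runs on its own.
WORD_MAP = parse_word_map(io.StringIO("string1|1\nstring2|2\n"))

def parse_input(word):
    """Return the integer for word, or None if it isn't known."""
    return WORD_MAP.get(word)

print(parse_input("string2"))  # 2
print(parse_input("nope"))     # None
```

dict.get returns None for a missing key, which replaces the try/except KeyError in the original.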
I'm trying to make a function that, given input from the User, can map input to a list of strings in a text file, and return some integer corresponding to the string in the file. Essentially, I check if what the user input is in the file and return the index of the matching string in the file
def get_line_number(input):
    """Finds the number of the first line in which `input` occurs.
    If input isn't found, returns -1.
    """
    with open('words.txt') as f:
        for i, line in enumerate(f):
            if input in line:
                return i
    return -1
This function meets the specification in your description, with the additional assumption that the strings you care about are on separate lines. Notable things:
File objects in Python act as iterators over the lines of their contents. You don't have to read the lines into a list if all you need to do is check each individual line.
The enumerate function takes any iterable and returns an iterator that yields tuples of the form (index, element), where element is an item from the iterable and index is its position, counting from 0.
An iterable is any object you can loop over in a for loop; an iterator is the object that actually hands out those elements one at a time.
A generator is one kind of iterator: it produces elements "on the fly" instead of storing them all up front. The iterator returned by enumerate, and the file object itself, work in the same lazy way, which means you can process the file one line at a time without loading all of it into memory.
This function is written in the standard Pythonic style, with a docstring, appropriate casing on variable names, and a descriptive name.
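Note that input in line is a substring test, so searching for "cat" would also match a line containing "concatenate". If each line holds exactly one word and you want whole-line matches, a variant could compare the stripped line instead. This is a sketch; find_word_index is a made-up name and the sample lines are invented.

```python
import io

def find_word_index(word, lines):
    """Return the 0-based index of the first line equal to word, or -1."""
    for i, line in enumerate(lines):
        if line.strip() == word:  # whole-line match, ignoring surrounding whitespace
            return i
    return -1

sample = io.StringIO("concatenate\ncat\ndog\n")
print(find_word_index("cat", sample))  # 1
```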
Related
I have found a fix for the following problem, but I'd like to understand why the code below creates a list of strings inside another list, i.e. the list of strings ends up as the only element of an outer list.
I'm reading in a .txt file that contains about 25 sentences. It is all one long paragraph, so I wanted to split it into sentences, using the full stop as the delimiter.
I initially used this code to perform this step:
file = open("love_life.txt", "r")
list_of_sentences = []
for line in file:
    new = line.split('.')
    list_of_sentences.append(new)
file.close()
print(list_of_sentences)
I expected this to create a list of strings, each string being one sentence delimited by a full stop. Instead, although it did create a list of strings/sentences, that list was enclosed in another list. So when I tried to iterate over the list, the loop ran only once, over the nested list. The output looked like this:
[["lifeguards save lives", "time is of the essence", "the wind blows where it wants"]]
Can anyone tell me why this is happening with this code?
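It's because line.split('.') returns a list, and list_of_sentences.append(new) adds that whole list to list_of_sentences as a single element. Since the file is one long paragraph (a single line), the loop runs only once, so the outer list ends up with exactly one element. You probably meant to use list_of_sentences.extend(new) instead, which adds each element of new to list_of_sentences individually.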
You should use extend() if you don't want to end up with a list of lists:
file = open("love_life.txt", "r")
list_of_sentences = []
for line in file:
    new = line.split('.')
    list_of_sentences.extend(new)  # add each sentence, not the whole list
file.close()
print(list_of_sentences)
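To see the difference in isolation, here is a small comparison. The paragraph text is made up, and io.StringIO stands in for the file. The second loop also strips whitespace and drops the empty piece that split('.') leaves after the final full stop, which you will likely want as well.

```python
import io

text = "lifeguards save lives. time is of the essence. the wind blows where it wants.\n"

# append() adds the whole list returned by split() as ONE element
nested = []
for line in io.StringIO(text):
    nested.append(line.split("."))

# extend() adds each element of that list individually
flat = []
for line in io.StringIO(text):
    # strip whitespace around each piece and skip empty pieces
    flat.extend(s.strip() for s in line.split(".") if s.strip())

print(len(nested))  # 1
print(flat)
```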
I am trying to extract certain strings from a text file.
The code I use is below. I want to find each of the particular strings (all_actions) in the text file, store each one in an array or list if it is found, and then display them in the same order.
import string
solution_path = "/homer/my_dir/solution_detail.txt"
solution = open(solution_path).read()
all_actions = ['company_name','email_address','full_name']
n = 0
sequence_array = []
for line in solution:
    for action in all_actions:
        if action in line:
            sequence_array[n] = action
            n = n+1
for x in range(len(sequence_array)):
    print (sequence_array[x])
But this code doesn't print anything, even though it runs without any error.
There are multiple problems with the code.
.read() on a file produces a single string. As a result, for line in solution: iterates over each character of the file's text, not over each line. (The name line is not special, in case you thought it was. The iteration depends only on what is being iterated over.) The natural way to get lines from the file is to loop over the file itself, while it is open. To keep the file open and make sure it closes properly, we use a with block.
You may not simply assign to sequence_array[n] unless the list is already at least n+1 elements long. (The only reason you don't get an IndexError is that if action in line: is never true, because of the first point.) Fortunately, we can simply .append to the end of the list instead.
If a line contains more than one of the strings in all_actions, that line would be stored multiple times. This is probably not what you want. The built-in any function makes it easier to deal with this problem; we can supply it with a generator expression for an elegant solution. But if your exact needs are different, then of course there are different approaches.
While the last loop is okay in theory, it is better to loop directly, the same way you attempt to loop over solution. But instead of building up a list, we could instead just print the results as they are found.
So, for example:
with open(solution_path) as solution:
    for line in solution:
        if any(action in line for action in all_actions):
            print(line.rstrip('\n'))  # the line already ends with a newline
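The first point is easy to check for yourself. In this small demonstration the sample text is made up and io.StringIO stands in for the open file:

```python
import io

text = "full_name: Ada\nemail_address: ada@example.com\n"

# Looping over a string yields single characters
chars = list(text)
print(chars[:4])  # ['f', 'u', 'l', 'l']

# Looping over a file-like object yields whole lines
lines = list(io.StringIO(text))
print(lines)

# So a multi-character pattern can never be "in" a single character
print(any("full_name" in c for c in text))  # False
```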
What is happening is that solution contains all the text inside the file. Therefore when you are iterating for line in solution you are actually iterating over each and every character separately, which is why you never get any hits.
Try the following code (I can't test it, since I don't have your file):
solution_path = "/homer/my_dir/solution_detail.txt"
all_actions = ['company_name','email_address','full_name']
sequence_array = []
with open(solution_path, 'r') as f:
    for line in f:
        for action in all_actions:
            if action in line:
                sequence_array.append(action)
This will collect all the actions in the document. If you want to print all of them:
for action in sequence_array:
    print(action)
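If you want the action names in the order they occur in the file, including their order within a single line, one option is to sort each line's hits by position. This is a sketch: find_actions is a made-up name, the sample text is invented, and each action is recorded at most once per line, at its first occurrence.

```python
import io

def find_actions(lines, all_actions):
    """Return the actions found, in the order they appear in the text."""
    found = []
    for line in lines:
        # (position, action) pairs for every action present on this line
        hits = sorted((line.find(action), action) for action in all_actions if action in line)
        found.extend(action for _, action in hits)
    return found

all_actions = ['company_name', 'email_address', 'full_name']
sample = io.StringIO("full_name: x\nemail_address: y company_name\nnothing here\n")
sequence_array = find_actions(sample, all_actions)
for action in sequence_array:
    print(action)
```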
I have searched and cannot find the answer to this, even though I am sure it is already out there. I am very new to Python, but I have done this kind of thing before in other languages. I am reading in lines from a data file, and I want to store each line of data in its own tuple so that it can be accessed outside the for loop:
tup(i) = inLine
where inLine is the line from the file and tup(i) is the tuple it's stored in. i increases as the loop goes round. I can then print any line using something similar to
print tup(100)
Creating a tuple as you describe in a loop isn't a great choice, as they are immutable.
This means that every time you add a new line to your tuple, you're really creating a new tuple with the same values as the old one plus your new line. This is inefficient and should generally be avoided.
If all you need is to refer to the lines by index, you could use a list:
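A quick way to see this, using plain strings in place of file lines. Because each concatenation copies every existing element, building an n-line tuple this way does O(n²) work in total, whereas list.append is amortized O(1) per line.

```python
# Tuples are immutable: "adding" to one builds a brand-new tuple
t = ("line 1\n",)
old_t = t
t = t + ("line 2\n",)   # copies the old elements into a new object
print(t is old_t)       # False: t now refers to a different tuple

# Lists are mutable: append() changes the same object in place
lst = ["line 1\n"]
old_lst = lst
lst.append("line 2\n")
print(lst is old_lst)   # True: still the same list
```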
lines = []
for line in inFile:
    lines.append(line)
print(lines[3])
If you REALLY need a tuple, you can convert the list once you're done:
lines = tuple(lines)
Python file objects support a readlines() method, which reads the rest of the file and returns its lines as a list.
Alternatively, you can pass the file object to tuple() to create a tuple of lines and then index it however you want:
# This creates a tuple of the file's lines
with open("yourfile") as fin:
    tup = tuple(fin)
# This is a straightforward way to create a list of the file's lines
with open("yourfile") as fin:
    lines = fin.readlines()
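With either version you index with square brackets, not parentheses, and indices start at 0, so the question's tup(100) becomes tup[100] and refers to the 101st line. Each element still ends with its newline. A sketch with a made-up three-line file, using io.StringIO in place of open():

```python
import io

fake_file = io.StringIO("first\nsecond\nthird\n")
tup = tuple(fake_file)        # one element per line, newlines included
print(repr(tup[1]))           # 'second\n'
print(tup[-1].rstrip("\n"))   # third
```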
I have a file with about 25000 lines in S19 format.
Each line looks like: S214 780010 00802000000010000000000A508CC78C 7A
There are no spaces in the actual file. The part 780010 is the address of the line, and I want it to be a dict key; I want the data part 00802000000010000000000A508CC78C to be the value for that key. I wrote my code like this:
def __init__(self,filename):
    infile = file(filename,'r')
    self.all_lines = infile.readlines()
    self.dict_by_address = {}
    for i in range(0, self.get_line_number()):
        self.dict_by_address[self.get_address_of_line(i)] = self.get_data_of_line(i)
    infile.close()
get_address_of_line() and get_data_of_line() are both simple string-slicing functions. get_line_number() iterates over self.all_lines and returns an int.
The problem is that the init process takes over a minute. Is the way I construct the dict wrong, or does Python just need that long to do this?
By the way, I'm new to Python :) Maybe the code looks more like C/C++; any advice on how to write more Pythonic code is appreciated.
How about something like this? (I made a test file with just one line, S21478001000802000000010000000000A508CC78C7A, so you might have to adjust the slicing.)
>>> with open('test.test') as f:
...     dict_by_address = {line[4:10]:line[10:-3] for line in f}
...
>>> dict_by_address
{'780010': '00802000000010000000000A508CC78C'}
This code should be tremendously faster than what you have now.
EDIT: As @sth pointed out, the split()-based version below doesn't work because there are no spaces in the actual file. I'll add a corrected version at the end.
def __init__(self,filename):
    self.dict_by_address = {}
    with open(filename, 'r') as infile:
        for line in infile:
            _, key, value, _ = line.split()
            self.dict_by_address[key] = value
Some comments:
Best practice in Python is to use a with statement, unless you are using an old Python that doesn't have it.
Best practice is to use open() rather than file(); Python 3 no longer has a file() built-in at all.
You can use the open file object as an iterator, and each step of the iteration gives you one line of input. This is better than calling the .readlines() method, which slurps all the data into a list that you then use once and throw away. For a large input file that wastes memory and time (and, in the worst case, can push the machine into swapping to virtual memory, which is always slow). This version avoids building and deleting the giant list.
Then, having created a giant list of input lines, you use range() to make a big list of integers (in Python 2, range() returns a real list). Again, it wastes time and memory to build a list, use it once, then delete it. You can avoid this overhead by using xrange(), but even better is to build the dictionary as you go, in the same loop that reads lines from the file.
It might be better to use your special slicing functions to pull out the "address" and "data" fields, but if the input is regular (always follows the pattern of your example) you can just do what I showed here. line.split() splits the line on whitespace, giving a list of four strings. Then we unpack it into four variables using "destructuring assignment" (Python calls it tuple or iterable unpacking). Since we only want to keep two of the values, I used the variable name _ (a single underscore) for the other two. That's not really a language feature, but it is an idiom in the Python community: when you have data you don't care about, you assign it to _. This line will raise a ValueError if a line ever splits into anything other than exactly four values, so if blank lines, comment lines, or similar can occur, you should add checks and handle the error (at least wrap that line in a try:/except).
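The error handling suggested in the last point could look like the following sketch. It assumes a whitespace-separated variant of the format (which, as the question notes, the actual file does not use) and simply skips lines that don't split into exactly four fields; parse_fields is a made-up name.

```python
import io

def parse_fields(lines):
    """Build {address: data} from lines of four whitespace-separated fields."""
    table = {}
    for line in lines:
        try:
            _, key, value, _ = line.split()
        except ValueError:
            # wrong number of fields (blank line, comment, ...): skip it
            continue
        table[key] = value
    return table

sample = io.StringIO("S214 780010 00802000000010000000000A508CC78C 7A\n\n# comment\n")
print(parse_fields(sample))
```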
EDIT: corrected version:
def __init__(self,filename):
    self.dict_by_address = {}
    with open(filename, 'r') as infile:
        for line in infile:
            # extract_address/extract_data: your own string-slicing helpers (not defined here)
            key = extract_address(line)
            value = extract_data(line)
            self.dict_by_address[key] = value
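The corrected version leaves extract_address and extract_data to you. As a hedged sketch, assuming the standard Motorola S-record layout (a two-character record type, a two-hex-digit byte count covering address, data and checksum, then the address, the data and a one-byte checksum), the fields can be located from the byte count instead of fixed offsets. parse_srecord and load_srecords are hypothetical names, and the checksum is not verified.

```python
import io

# Address length in bytes for the data-record types (Motorola S-record format).
ADDRESS_BYTES = {"S1": 2, "S2": 3, "S3": 4}

def parse_srecord(line):
    """Return (address, data) hex strings for an S1/S2/S3 record, else None."""
    line = line.strip()
    record_type = line[:2]
    if record_type not in ADDRESS_BYTES:
        return None  # non-data record (e.g. S0 header, S5/S6 count, S7 to S9 termination)
    byte_count = int(line[2:4], 16)  # bytes of address + data + checksum
    addr_bytes = ADDRESS_BYTES[record_type]
    addr_end = 4 + 2 * addr_bytes  # each byte is two hex characters
    data_end = addr_end + 2 * (byte_count - addr_bytes - 1)  # minus 1 checksum byte
    return line[4:addr_end], line[addr_end:data_end]

def load_srecords(lines):
    """Build {address: data} from an iterable of S-record lines (checksum not verified)."""
    dict_by_address = {}
    for line in lines:
        parsed = parse_srecord(line)
        if parsed is not None:
            address, data = parsed
            dict_by_address[address] = data
    return dict_by_address

sample = io.StringIO("S00600004844521B\nS21478001000802000000010000000000A508CC78C7A\nS804780010FF\n")
print(load_srecords(sample))
```

In the question's class, __init__ could then do `with open(filename) as infile: self.dict_by_address = load_srecords(infile)`.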
I'm currently working on a project in Python. I am storing lines from a file as a list. I want to split each line on spaces and store the individual words from the line in an object.
Each line contains three space-separated "words". I want to store each word as a field of a class object. Since I don't know how many lines the user's input file will have, the number of objects to create isn't known in advance.
When I loop through the list of lines and "split" them, I get a list within a list. I don't know what to do with the data in that form (without using a for loop inside a for loop), so I'm stuck.
I already have the class created, with three fields and methods to access them. However, I don't know how to access the "list within a list" (in linear time), split out the words, and quickly create a new object with the words as the parameters.
If anyone could give me advice on where to go from here, I would appreciate it. Thank you.
class MyClass(object):
    def __init__(self, word1, word2, word3):
        # your initialisation code here
        self.word1 = word1
        self.word2 = word2
        self.word3 = word3

for line in list_of_lines:
    words = line.split()
    assert len(words) == 3
    an_object = MyClass(*words)
    do_something_with(an_object)
Update in response to comment:
To get a list of MyClass objects, one for each line in your input file:
with open("my_input_file.txt") as f:
    the_list = [MyClass(*line.split()) for line in f]
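A self-contained version of the above, with a made-up class body (three attributes named after the parameters, plus a __repr__ for readable output) and io.StringIO standing in for the input file. Blank lines are skipped; a line with a different number of words would make the constructor raise a TypeError. Each line is split exactly once, so the work is linear in the number of lines.

```python
import io

class MyClass(object):
    def __init__(self, word1, word2, word3):
        self.word1 = word1
        self.word2 = word2
        self.word3 = word3

    def __repr__(self):
        return "MyClass(%r, %r, %r)" % (self.word1, self.word2, self.word3)

fake_file = io.StringIO("alpha beta gamma\n\none two three\n")
# One object per non-blank line; split() with no argument splits on any whitespace
the_list = [MyClass(*line.split()) for line in fake_file if line.strip()]
print(the_list)
```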
Why split them before use? Why not split the line when you need it?
Some pseudocode:
for each line in list
    items = line split
    object(items)
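Following that idea, a generator can split each line only when the next object is requested, so you never hold a list of split lines (or of objects) in memory. This is a sketch; Record and records_from are made-up names, and a namedtuple stands in for the question's three-field class.

```python
import io
from collections import namedtuple

# A lightweight stand-in for the question's three-field class
Record = namedtuple("Record", ["word1", "word2", "word3"])

def records_from(lines):
    """Yield one Record per non-blank line, splitting lazily."""
    for line in lines:
        items = line.split()
        if items:  # skip blank lines
            yield Record(*items)

for record in records_from(io.StringIO("a b c\nd e f\n")):
    print(record.word1, record.word3)
```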