I have two text files with numbers that I want to do some very easy calculations on (for now). I though I would go with Python. I have two file readers for the two text files:
with open('one.txt', 'r') as one:
one_txt = one.readline()
print(one_txt)
with open('two.txt', 'r') as two:
two_txt = two.readline()
print(two_txt)
Now to the fun (and for me hard) part. I would like to loop trough all the numbers in the second text file and then subtract it with the second number in the first text file.
I have done this (extended the coded above):
with open('two.txt') as two_txt:
for line in two_txt:
print line;
I don't know how to proceed now, because I think that the second text file would need to be converted to string in order do make some parsing so I get the numbers I want. The text file (two.txt) looks like this:
Start,End
2432009028,2432009184,
2432065385,2432066027,
2432115011,2432115211,
2432165329,2432165433,
2432216134,2432216289,
2432266528,2432266667,
I want to loop trough this, ignore the Start,End (first line) and then once it loops only pick the first values before each comma, the result would be:
2432009028
2432065385
2432115011
2432165329
2432216134
2432266528
Which I would then subtract with the second value in one.txt (contains numbers only and no Strings what so ever) and print the result.
There are many ways to do string operations and I feel lost, for instance I don't know if the methods to read everything to memory are good or not.
Any examples on how to solve this problem would be very appreciated (I am open to different solutions)!
Edit: Forgot to point out, one.txt has values without any comma, like this:
102582
205335
350365
133565
Something like this
with open('one.txt', 'r') as one, open('two.txt', 'r') as two:
next(two) # skip first line in two.txt
for line_one, line_two in zip(one, two):
one_a = int(split(line_one, ",")[0])
two_b = int(split(line_two, " ")[1])
print(one_a - two_b)
Try this:
onearray = []
file = open("one.txt", "r")
for line in file:
onearray.append(int(line.replace("\n", "")))
file.close()
twoarray = []
file = open("two.txt", "r")
for line in file:
if line != "Start,End\n":
twoarray.append(int(line.split(",")[0]))
file.close()
for i in range(0, len(onearray)):
print(twoarray[i] - onearray[i])
It should do the job!
Related
I have a function gen_rand_index that generates a random group of numbers in list format, such as [3,1] or [3,2,1]
I also have a textfile that that reads something like this:
red $1
green $5
blue $6
How do I write a function so that once python generates this list of numbers, it automatically reads that # line in the text file? So if it generated [2,1], instead of printing [2,1] I would get "green $5, red $1" aka the second line in the text file and the first line in the text file?
I know that you can do print(line[2]) and commands like that, but this won't work in my case because each time I am getting a different random number of a line that I want to read, it is not a set line I want to read each time.
row = str(result[gen_rand_index]) #result[gen_rand_index] gives me the random list of numbers
file = open("Foodinventory.txt", 'r')
for line in file:
print(line[row])
file.close()
I have this so far, but I am getting this
error: invalid literal for int() with base 10: '[4, 1]'
I also have gotten
TypeError: string indices must be integers
butI have tried replacing str with int and many things like that but I'm thinking the way I'm just approaching this is wrong. Can anyone help me? (I have only been coding for a couple days now so I apologize in advance if this question is really basic)
Okay, let us first get some stuff out of the way
Whenever you access something from a list the thing you put inside the box brackets [] should be an integer, eg: [5]. This tells Python that you want the 5th element. It cannot ["5"] because 5 in this case would be treated as a string
Therefore the line row = str(result[gen_rand_index]) should actually just be row = ... without the call to str. This is why you got the TypeError about list indices
Secondly, as per your description gen_rand_index would return a list of numbers.
So going by that, why don;t you try this
indices_to_pull = gen_rand_index()
file_handle = open("Foodinventory.txt", 'r')
file_contents = file_handle.readlines() # If the file is small and simle this would work fine
answer = []
for index in indices_to_pull:
answer.append(file_contents[index-1])
Explanation
We get the indices of the file lines from gen_rand_index
we read the entire file into memory using readlines()
Then we get the lines we want, Rememebr to subtract 1 as the list is indexed from 0
The error you are getting is because you're trying to index a string variable (line) with a string index (row). Presumably row will contain something like '[2,3,1]'.
However, even if row was a numerical index, you're not indexing what you think you're indexing. The variable line is a string, and it contains (on any given iteration) one line of the file. Indexing this variable will give you a single character. For example, if line contains green $5, then line[2] will yield 'e'.
It looks like your intent is to index into a list of strings, which represent all the lines of the file.
If your file is not overly large, you can read the entire file into a list of lines, and then just index that array:
with open('file.txt') as fp:
lines = fp.readlines()
print(lines[2]).
In this case, lines[2] will yield the string 'blue $6\n'.
To discard the trailing newline, use lines[2].strip() instead.
I'll go line by line and raise some issues.
row = str(result[gen_rand_index]) #result[gen_rand_index] gives me the random list of numbers
Are you sure it is gen_rand_index and not gen_rand_index()? If gen_rand_index is a function, you should call the function. In the code you have, you are not calling the function, instead you are using the function directly as an index.
file = open("Foodinventory.txt", 'r')
for line in file:
print(line[row])
file.close()
The correct python idiom for opening a file and reading line by line is
with open("Foodinventory.txt.", "r") as f:
for line in f:
...
This way you do not have to close the file; the with clause does this for you automatically.
Now, what you want to do is to print the lines of the file that correspond to the elements in your variable row. So what you need is an if statement that checks if the line number you just read from the file corresponds to the line number in your array row.
with open("Foodinventory.txt", "r") as f:
for i, line in enumerate(f):
if i == row[i]:
print(line)
But this is wrong: it would work only if your list's elements are ordered. That is not the case in your question. So let's think a little bit. You could iterate over your file multiple times, and each time you iterate over it, print out one line. But this will be inefficient: it will take time O(nm) where n==len(row) and m == number of lines in your file.
A better solution is to read all the lines of the file and save them to an array, then print the corresponding indices from this array:
arr = []
with open("Foodinventory.txt", "r") as f:
arr = list(f)
for i in row:
print(arr[i - 1]) # arrays are zero-indiced
I'm a new coder and am currently trying to write a piece of code that, from an opened txt document, will print out the line number that each piece of information is on.
I've opened the file and striped it of all it's commas. I found online that you can use a function called enumerate() to get the line number. However when I run the code instead of getting numbers like 1, 2, 3 I get information like: 0x113a2cff0. Any idea of how to fix this problem/what the actual problem is? The code for how I used enumerate is below.
my_document = open("data.txt")
readDocument = my_document.readlines()
invalidData = []
for data in readDocument:
stripDocument = data.strip()
if stripDocument.isnumeric() == False:
data = (enumerate(stripDocument))
invalidData.append(data)
First of all, start by opening the document and already reading its content, and it's a good practice to use with, as it closes the document after the use. The readlines function gathers all the lines (this assumes the data.txt file is in the same folder as your .py one:
with open("data.txt") as f:
lines = f.readlines()
After, use enumerate to add index to the lines, so you can read them, use them, or even save the indexes:
for index, line in enumerate(lines):
print(index, line)
As last point, if you have breaklines on your data.txt, the lines will contain a \n, and you can remove them with the line.strip(), if you need.
The full code would be:
with open("data.txt") as f:
lines = f.readlines()
for index, line in enumerate(lines):
print(index, line.strip())
Taking your problem statement:
trying to write a piece of code that, from an opened txt document, will print out the line number that each piece of information is on
You're using enumerate incorrectly as #roganjosh was trying to explain:
with open("data.txt") as my_document:
for i, data in enumerate(my_document):
print(i, data)
The way you're doing it now, you're not removing the commas. The strip() method without arguments only deletes whitespaces leading and trailing the line. If you only want the data, this would work:
invalidData = []
for row_number, data in enumerate(readDocument):
stripped_line = ''.join(data.split(','))
if not stripped_line.isnumeric():
invalidData.append((row_number, data))
You can use the enumerate() function to enumerate a list. This will return a list of tuples containing the index first, then the line string. Like this:
(0, 'first line')
Your readDocument is a list of the lines, so it might be a good idea to name it accordingly.
lines = my_document.readlines()
for i, line in enumerate(lines):
print i, line
I have a large file with the following syntax:
Object 1:
[Useless Data]
com_pos = number number number
[Useless Data]
Object 2:
[Useless Data]
com_pos = number, number, number
[Useless Data]
...
and so on (there's a very large number of objects.).
What I want to do is pick the numbers and put them in another txt file with a specific format (basically a row for each object and a column for each number).
The problem is I have the same com_pos = for every object.
How should I do that? Should I use Regular Expressions?
You have to write some kind of parser for this. You don't need to use regular expressions if you don't understand them. For example, given your two examples, this would work just as well:
with open(path) as f:
for line in f:
columns = line.split()
if columns[0] == 'com_pos' and columns[1] == '=':
numbers = [float(column.rstrip(',')) for column in columns[2:]]
# do something with numbers
Using regular expressions can make things more compact, more efficient, or more robust. For example, consider this:
r = re.compile(r'com_pos\s*=\s*(\d+),?\s*(\d+),?\s*(\d+)')
with open(path) as f:
for line in f:
m = r.search(line)
if m:
numbers = [float(group) for group in m.groups]
# do something with numbers
That will probably run faster, and it's more robust in the face of variable input (a data format that sometimes has commas and sometimes doesn't looks a lot like a human-written fileā¦), and it's simpler if you understand the regexp. But if you don't, it'll be harder to maintain.
com_pos\s*=\s*(\d+),?\s*(\d+),?\s*(\d+)
Debuggex Demo
You can use the following :
with open ('first_file' ,'r') as f1 and open('second_file' ,'w') as f2 :
for line in f1.readlines() :
if 'com_pos' in line :
f2.write(line.split('=')[1])
first you need to find the line that com_pos is in it , then you can split that line with = and write the second splited element that is the numbers in second file .
I'm somewhat new to python. I'm trying to sort through a list of strings and integers. The lists contains some symbols that need to be filtered out (i.e. ro!ad should end up road). Also, they are all on one line separated by a space. So I need to use 2 arguments; one for the input file and then the output file. It should be sorted with numbers first and then the words without the special characters each on a different line. I've been looking at loads of list functions but am having some trouble putting this together as I've never had to do anything like this. Any takers?
So far I have the basic stuff
#!/usr/bin/python
import sys
try:
infilename = sys.argv[1] #outfilename = sys.argv[2]
except:
print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file but exactly as the file is written and not sorted at all. The goal is to take a file (arg1.txt) and sort it and make a new file (arg2.txt) which will be cmd line variables. I used print in this case to speed up the editing but need to have it write to a file. That's why the output file areas are commented but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
You may not even need the key function any more since numbers are placed before letters by default. I don't see anything in your code to remove special characters though.
since they are on the same line you dont really need readlines
with open('some.txt') as f:
data = f.read() #now data = "item 1 item2 etc..."
you can use re to filter out unwanted characters
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition maybe overkill
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print sorted(my_list)
however you will need to do more to force numbers to sort maybe something like
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers,key=lambda x:int(re.sub("[^0-9]","",x))) + sorted(strings(
Also, they are all on one line separated by a space.
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
There's a text file that I'm reading line by line. It looks something like this:
3
3
67
46
67
3
46
Each time the program encounters a new number, it writes it to a text file. The way I'm thinking of doing this is writing the first number to the file, then looking at the second number and checking if it's already in the output file. If it isn't, it writes THAT number to the file. If it is, it skips that line to avoid repetitions and goes on to the next line. How do I do this?
Rather than searching your output file, keep a set of the numbers you've written, and only write numbers that are not in the set.
Instead of checking output file for the number if it was already written it is better to keep this information in a variable (a set or list). It will save you on disk reads.
To search a file for numbers you need to loop through each line of that file, you can do that with for line in open('input'): loop, where input is the name of your file. On each iteration line would contain one line of input file ended with end of line character '\n'.
In each iteration you should try to convert the value on that line to a number, int() function may be used. You may want to protect yourself against empty lines or non-number values with try statement.
In each iteration having the number you should check if the value you found wasn't already written to the output file by checking a set of already written numbers. If value is not in the set yet, add it and write to the output file.
#!/usr/bin/env python
numbers = set() # create a set for storing numbers that were already written
out = open('output', 'w') # open 'output' file for writing
for line in open('input'): # loop through each line of 'input' file
try:
i = int(line) # try to convert line to integer
except ValueError: # if conversion to integer fails display a warning
print "Warning: cannot convert to number string '%s'" % line.strip()
continue # skip to next line on error
if i not in numbers: # check if the number wasn't already added to the set
out.write('%d\n' % i) # write the number to the 'output' file followed by EOL
numbers.add(i) # add number to the set to mark it as already added
This example assumes that your input file contains single numbers on each line. In case of empty on incorrect line a warning will be displayed to stdout.
You could also use list in the above example, but it may be less efficient.
Instead of numbers = set() use numbers = [] and instead of numbers.add(i): numbers.append(i). The if condition stays the same.
Don't do that. Use a set() to keep track of all the numbers you have seen. It will only have one of each.
numbers = set()
for line in open("numberfile"):
numbers.add(int(line.strip()))
open("outputfile", "w").write("\n".join(str(n) for n in numbers))
Note this reads them all, then writes them all out at once. This will put them in a different order than in the original file (assuming they're integers, they will come out in ascending numeric order). If you don't want that, you can also write them as you read them, but only if they are not already in the set:
numbers = set()
with open("outfile", "w") as outfile:
for line in open("numberfile"):
number = int(line.strip())
if number not in numbers:
outfile.write(str(number) + "\n")
numbers.add(number)
Are you working with exceptionally large files? You probably don't want to try to "search" the file you're writing to for a value you just wrote. You (probably) want something more like this:
encountered = set([])
with open('file1') as fhi, open('file2', 'w') as fho:
for line in fhi:
if line not in encountered:
encountered.add(line)
fho.write(line)
If you want to scan through a file to see if it contains a number on any line, you could do something like this:
def file_contains(f, n):
with f:
for line in f:
if int(line.strip()) == n:
return True
return False
However as Ned points out in his answer, this isn't a very efficient solution; if you have to search through the file again for each line, the running time of your program will increase proportional to the square of the number of numbers.
It the number of values is not incredibly large, it would be more efficient to use a set (documentation). Sets are designed to very efficiently keep track of unordered values. For example:
with open("input_file.txt", "rt") as in_file:
with open("output_file.txt", "wt") as out_file:
encountered_numbers = set()
for line in in_file:
n = int(line.strip())
if n not in encountered_numbers:
encountered_numbers.add(n)
out_file.write(line)