Calculating with data from a txt file in python

Detailed:
I have a set of 200 or so values in a txt file. I want to select the first value, b[0], and then go through the list from [1] to [199], adding each one to it.
So, [0]+[1].
If that's not equal to a certain number, it goes on to the next term, i.e. [0]+[2], and so on until it has gone through every term. Once it has done that, it moves from b[0] to b[1] and goes through all the values again.
Step by step:
Select first number in list.
Add that number to the next number
Check if that equals a number
If it doesn't, go to next term and add to first term
Iterate through these until you've gone through all terms or found a value which adds up to the target value
If you've gone through all values, go to the next term for the starting value and continue
I couldn't get it to work, if anyone can maybe provide a solution or give some advice? Much appreciated. I've tried looking at videos and other stack overflow problems but I still didn't get anywhere. Maybe I missed something, let me know! Thank you! :)
I've attempted it but gotten stuck. This is my code so far:
b = open("data.txt", "r")
data_file = open("data.txt", "r")
for i, line in enumerate(data_file):
    if (i+b)>2020 or (i+b)<2020:
        b=b+1
    else:
        print(i+b)
        print(i*b)
Error:
Traceback (most recent call last):
File "c:\Users\███\Desktop\ch1.py", line 11, in <module>
if (i+b)>2020 or (i+b)<2020:
TypeError: unsupported operand type(s) for +: 'int' and '_io.TextIOWrapper'
PS C:\Users\███\Desktop>

I would read the file into a list and convert the values to ints before actually dealing with the problem. Files are messy, and the less we have to deal with them the better.
with open("data.txt", "r") as data_file:
    lines = data_file.readlines() # reads the file into a list of strings
# the with block closes the file automatically, no close() call needed
j = 0 # a terser solution exists, but this is easy to understand
for i in lines:
    lines[j] = int(i.strip())
    j += 1
for i in lines: # for every value of i we go through every value of j
    # so it would do x = [0] + [0] , [0] + [1] ... [1] + [0] .....
    for j in lines:
        x = j + i
        if x == 2020:
            print(i * j)

Here are some things that you can fix.
You can't add the file object b to the integer i. You have to convert the lines to int by using something like:
integer_in_line = int(line.strip())
Also you have opened the same file twice in read mode with:
b = open("data.txt", "r")
data_file = open("data.txt", "r")
Opening it once is enough.
Make sure that you close the file after you used it:
data_file.close()
To compare each number in the list with each other number in the list you'll need to use a double for loop. Maybe this works for you:
certain_number = 2020
data_file = open("data.txt", "r")
ints = [int(line.strip()) for line in data_file] # make a list of all integers in the file
for i, number_at_i in enumerate(ints): # loop over every integer in the list
    for j, number_at_j in enumerate(ints): # loop over every integer in the list
        if number_at_i + number_at_j == certain_number: # compare the integers to your certain number
            print(f"{number_at_i} + {number_at_j} = {certain_number}")
data_file.close()
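If each pair should be tried only once, and a number should never be added to itself, a minimal sketch along these lines might help. It assumes the numbers are already in a list; find_pair and the sample values are hypothetical names and data used only for illustration, not part of the original question.

```python
from itertools import combinations

def find_pair(numbers, target):
    """Return the first pair from two different positions that sums to target, or None."""
    # combinations() yields (numbers[0], numbers[1]), (numbers[0], numbers[2]), ...
    # and only then moves on to numbers[1], so no index is paired with itself.
    for a, b in combinations(numbers, 2):
        if a + b == target:
            return a, b
    return None

sample = [1721, 979, 366, 299, 675, 1456]  # illustrative values, not from data.txt
pair = find_pair(sample, 2020)
if pair is not None:
    print(pair[0] * pair[1])
```

This follows the order described in the question: the first number against every later number, then the second number against every later number, and so on.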

Your problem is the following: The variables b and data_file are not actually the text that you are hoping they are. You should read something about reading text files in python, there are many tutorials on that.
When you call open("path.txt", "r"), the open function returns a file object, not your text. If you want the text from the file, you should either call read or readlines. Also it is important to close your file after reading the content.
data_file = open("data.txt", "r") # data_file is a file object
text = data_file.read() # text is the actual text in the file in a single string
data_file.close()
Alternatively, you could also read the text into a list of strings, where each string represents one line in the file: lines = data_file.readlines().
I assume that your "data.txt" file contains one number per line, is that correct? In that case, your lines variable will be a list of the numbers, but they will be strings, not integers or floats. Therefore, you can't simply use them to perform calculations. You would need to call int() on them.
Here is an example of how to do it. I assumed that your text file looks like this (with arbitrary numbers):
1
2
3
4
...
file = open("data.txt", "r")
lines = file.readlines()
file.close()
# This creates a new list where the numbers are actual numbers and not strings
numbers = []
for line in lines:
    numbers.append(int(line))
target_number = 2020
starting_index = 0
found = False
for i in range(starting_index, len(numbers)):
    temp = numbers[i]
    for j in range(i + 1, len(numbers)):
        temp += numbers[j] # running total of numbers[i] through numbers[j]
        if temp == target_number:
            print(f'Target number reached by adding numbers from {i} to {j}')
            found = True
            break #This stops the inner loop.
    if found:
        break #This stops the outer loop

Related

Concatenate number into a list of text in Python

I'm trying to concatenate some numbers into a text list using Python. This is my code
import hashlib
def SHA1_hash(string):
    hash_obj = hashlib.sha1(string.encode())
    return(hash_obj.hexdigest())
with open("/Users/admin/Downloads/Project_files/dictionary.txt") as f:
    n = 5
    numtext_list = []
    for i in range(0,n+1):
        for j in f:
            numtext = j.strip() + str(i)
            numtext_list.append(numtext)
print(numtext_list)
However, it only concatenates the first number (which is 0) to the file elements, and the output list is like this:
'yellow0', 'four0', 'woods0', 'hanging0', 'marching0', 'looking0', 'rouse0', 'lord0', 'sagde0', 'meadows0', 'sinking0', 'foul0', 'bringing0', 'disturb0', 'uttering0', 'scholar0', 'wooden0'
While I want it to also have
'yellow1', 'yellow2', 'yellow3', 'yellow4', 'yellow5','four0',...
as well as other combinations of text and numbers to the list.
Please help me with this, I'm totally new to Python so please excuse me if this is not a good question or I am wrong in writing keywords, thank you so much.

The first time you do for j in f: you read the entire file. So when you get to the next iteration of the for i loop, there's nothing left to read, so the inner loop ends immediately and nothing is appended.
Swap the order of your loops so you only have to read the file once.
for j in f:
    word = j.strip()
    for i in range(0, n+1):
        numtext = f'{word}{i}'
        numtext_list.append(numtext)
Other possible solutions:
Read the file into a list with f.readlines() and then loop over that.
Put f.seek(0) before each for j in f: loop.
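The first of these alternatives could look roughly like the sketch below. io.StringIO stands in for the real dictionary file so the example runs on its own, and the three words are sample data only.

```python
import io

# Stand-in for open(".../dictionary.txt"); the words are sample data only.
f = io.StringIO("yellow\nfour\nwoods\n")
n = 5

words = [line.strip() for line in f]  # read the file exactly once
# A list can be looped over as often as needed, unlike a file object,
# which is exhausted after the first full pass.
numtext_list = [f"{word}{i}" for word in words for i in range(n + 1)]
print(numtext_list[:7])
```

Because the words are held in a list, the order of the two loops no longer matters for correctness, only for the order of the output.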

How to find whether a integer is between first two columns of a file without using any for loop

I have a file which has integers in its first two columns.
File Name : file.txt
col_a,col_b
1001021,1010045
2001021,2010045
3001021,3010045
4001021,4010045 and so on
In Python, I have a variable var_a = 2002000.
How can I find the range in "file.txt" within which var_a lies?
Expected Output : 2001021,2010045
I have tried with below,
with open("file.txt","r") as a:
    a_line = a.readlines()
for line in a_line[1:]: # skip the col_a,col_b header row
    line_sp = line.strip().split(',')
    if int(line_sp[0]) < var_a < int(line_sp[1]):
        print('%s, %s' % (line_sp[0], line_sp[1]))
Since the file has more than a million records, this is time consuming. Is there a better way to do the same without a for loop?

Since the file has more than a million records, this is time
consuming. Is there a better way to do the same without a for loop?
Unfortunately you have to iterate over all records in the file, and the only way you can achieve that is some kind of for loop. So the complexity of this task will always be at least O(n).

It is better to read your file line by line (not all into memory) and store its content as ranges, so you can look up multiple numbers. Ranges are stored quite efficiently, and you only have to read your file once to check more than one number.
Since Python 3.7, dictionaries preserve insertion order. If your file is sorted, you only iterate the dictionary until the first range that contains the number; for numbers that are not in any range, you iterate the whole dictionary.
Create file:
fn = "n.txt"
with open(fn, "w") as f:
    f.write("""1001021,1010045
2001021,2010045
3001021,3010045
garbage
4001021,4010045""")
Process file:
fn = "n.txt"
# read in
data = {}
with open(fn) as f:
    for nr,line in enumerate(f):
        line = line.strip()
        if line:
            try:
                start,stop = map(int, line.split(","))
                data[nr] = range(start,stop+1)
            except ValueError as e:
                pass # print(f"Bad data ({e}) in line {nr}")
look_for_nums = [800, 1001021, 3001039, 4010043, 9999999]
for look_for in look_for_nums:
    items_checked = 0
    for nr,rng in data.items():
        items_checked += 1
        if look_for in rng:
            print(f"Found {look_for} it in line {nr} in range: {rng.start},{rng.stop-1}", end=" ")
            break
    else: # runs only if the inner loop finished without break
        print(f"{look_for} not found", end=" ")
    print(f"after {items_checked } checks")
Output:
800 not found after 4 checks
Found 1001021 it in line 0 in range: 1001021,1010045 after 1 checks
Found 3001039 it in line 2 in range: 3001021,3010045 after 3 checks
Found 4010043 it in line 4 in range: 4001021,4010045 after 4 checks
9999999 not found after 4 checks
There are better ways to store such a file of ranges, e.g. in a tree-like data structure. Look into k-d trees to get even faster results if you need them. They partition the ranges in a smarter way, so you do not need a linear search to find the right bucket.
The answer to "Data Structure to store Integer Range, Query the ranges and modify the ranges" provides more things to research.
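As a simpler alternative to a tree, if the ranges are sorted by their start value and do not overlap, a binary search finds the candidate range in O(log n) per lookup once the file has been loaded. The sketch below uses the standard bisect module. find_range and the in-memory ranges list are hypothetical names, and the sorted, non-overlapping layout is an assumption that the question does not guarantee.

```python
from bisect import bisect_right

def find_range(ranges, starts, value):
    """Return the (start, stop) pair containing value, or None.

    Assumes ranges is sorted by start and the ranges do not overlap;
    starts holds the start values in the same order. Both ends are inclusive.
    """
    pos = bisect_right(starts, value) - 1  # last range whose start <= value
    if pos >= 0 and value <= ranges[pos][1]:
        return ranges[pos]
    return None

# Sample ranges taken from the question's file.txt excerpt.
ranges = [(1001021, 1010045), (2001021, 2010045),
          (3001021, 3010045), (4001021, 4010045)]
starts = [start for start, _ in ranges]
print(find_range(ranges, starts, 2002000))
```

Loading the file is still a single O(n) pass; the gain only shows up when many numbers are looked up against the same set of ranges.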

Assuming each line in the file has the correct format, you can do something like following.
var_a = 2002000
with open("file.txt") as file:
    for l in file:
        a,b = map(int, l.split(',', 1)) # each line must have only two comma separated numbers
        if a < var_a < b:
            print(l) # use the line as you want
            break # if you need only the first occurrence, break the loop now
Note that you'll have to do additional verifications/workarounds if the file format is not guaranteed.
Obviously you have to iterate through all the lines (in the worst case). But this does not load all the lines into memory at once, so as soon as the answer is found, the rest of the file is skipped without being read (assuming you are looking only for the first match).

Reading a numbers off a list from a txt file, but only upto a comma

This is data from a lab experiment (around 717 lines of data). Rather than trying to work on it in Excel, I want to import and graph it in either Python or MATLAB. I'm new here btw... and am a student!
""
"Test Methdo","exp-l Tensile with Extensometer.msm"
"Sample I.D.","Sample108.mss"
"Speciment Number","1"
"Load (lbf)","Time (s)","Crosshead (in)","Extensometer (in)"
62.638,0.900,0.000,0.00008
122.998,1.700,0.001,0.00012
more numbers : see Screenshot of more data from my file
I just can't figure out how to read the line up until a comma. Specifically, I need the Load numbers for one of my arrays/list, so for example on the first line I only need 62.638 (which would be the first number on my first index on my list/array).
How can I get an array/list of this, something that iterates/reads the list and ignores strings?
Thanks!
NOTE: I use Anaconda + Jupyter Notebooks for Python & Matlab (school provided software).
EDIT: Okay, so I came home today and worked on it again. I hadn't dealt with CSV files before, but after some searching I was able to learn how to read my file, somewhat.
import csv
from itertools import islice
with open('Blue_bar_GroupD.txt','r') as BB:
    BB_csv = csv.reader(BB)
    x = 0
    BB_lb = []
    while x < 7: #to skip the string data
        next(BB_csv)
        x+=1
    for row in islice(BB_csv,0,758):
        print(row[0]) #testing if I can read row data
Okay, here is where I am stuck. I want to make an array/list that has the 0th index value of each row. Sorry if I'm a freaking noob!
Thanks again!

You can skip all lines up to the first data row and then parse the data into a list for later use. 700+ lines can easily be processed in memory.
Therefore you need to:
read the file line by line
remember the last non-empty line before number/comma/dot ( == header )
see if the line is only number/comma/dot, else increase a skip-counter (== data )
seek to 0
skip enough lines to get to header or data
read the rest into a data structure
Create test file:
text = """
""
"Test Methdo","exp-l Tensile with Extensometer.msm"
"Sample I.D.","Sample108.mss"
"Speciment Number","1"
"Load (lbf)","Time (s)","Crosshead (in)","Extensometer (in)"
62.638,0.900,0.000,0.00008
122.998,1.700,0.001,0.00012
"""
with open ("t.txt","w") as w:
    w.write(text)
Some helpers and the skipping/reading logic:
import re
import csv
def convert_row(row):
    """Convert one row of data into a list of mixed floats and others.
    Float is the preferred data type, else string is used - no other tried."""
    d = []
    for v in row:
        try:
            # convert to float && add
            d.append(float(v))
        except ValueError:
            # not a number, append as is
            d.append(v)
    return d
def count_to_first_data(fh):
    """Count lines in fh not consisting of numbers, dots and commas.
    Side effect: will reset position in fh to 0."""
    skiplines = 0
    header_line = 0
    fh.seek(0)
    for line in fh:
        if re.match(r"^[\d.,]+$",line):
            fh.seek(0)
            return skiplines, header_line
        else:
            if line.strip():
                header_line = skiplines
            skiplines += 1
    raise ValueError("File does not contain pure number rows!")
Usage of helpers / data conversion:
data = []
skiplines = 0
with open("t.txt","r") as csvfile:
    skip_to_data, skip_to_header = count_to_first_data(csvfile)
    for _ in range(skip_to_header): # skip_to_data if you do not want the headers
        next(csvfile)
    reader = csv.reader(csvfile, delimiter=',',quotechar='"')
    for row in reader:
        row_data = convert_row(row)
        if row_data:
            data.append(row_data)
print(data)
Output (reformatted):
[['Load (lbf)', 'Time (s)', 'Crosshead (in)', 'Extensometer (in)'],
[62.638, 0.9, 0.0, 8e-05],
[122.998, 1.7, 0.001, 0.00012]]
Documentation:
re.match
csv.reader
Methods of file objects (e.g. seek())
With this you now have "clean" data that you can use for further processing - including your headers.
For visualization you can have a look at matplotlib
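To get just the Load column that the question asks for, something like the following might work on the data list built above. The literal list here simply repeats the output shown earlier so that the sketch runs on its own; the names header, rows, col and load are illustrative.

```python
# Same structure as the printed output: one header row, then numeric rows.
data = [['Load (lbf)', 'Time (s)', 'Crosshead (in)', 'Extensometer (in)'],
        [62.638, 0.9, 0.0, 8e-05],
        [122.998, 1.7, 0.001, 0.00012]]

header, rows = data[0], data[1:]
col = header.index('Load (lbf)')    # position of the wanted column
load = [row[col] for row in rows]   # that column's value from every data row
print(load)
```

Looking the column up by its header name rather than a fixed index keeps the code working if the column order in the export ever changes.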

I would recommend reading your file with python
data = []
with open('my_txt.txt', 'r') as fd:
    # Skip header lines
    for i in range(6):
        fd.readline()
    # Read data lines up to the first column
    for line in fd:
        index = line.find(',')
        if index >= 0:
            data.append(float(line[0:index]))
leads to a list containing your data of the first column
>>> data
[62.638, 122.998]
The MATLAB solution is less nice, since you have to know the number of data lines in your file (which you do not need to know in the python solution)
n_header = 6
n_lines = 2 % Insert here 717 (as you mentioned)
M = csvread('my_txt.txt', n_header, 0, [n_header 0 n_header+n_lines-1 0])
leads to:
>> M
M =
62.6380
122.9980
For the sake of clarity: you can also use MATLAB's textscan function to achieve what you want without knowing the number of lines, but the Python code would still be the better choice in my opinion.

Based on your format, you will need three steps: first, read all lines; second, determine which lines to use; last, convert the values to floats and add them to a list.
Assuming your file name is name.txt, try:
f = open("name.txt", "r")
all_lines = f.readlines()
grid = []
for line in all_lines:
    if ('"' not in line) and (line != '\n'):
        grid.append(list(map(float, line.strip('\n').split(','))))
f.close()
The grid will then contain a series of lists containing your group of floats.
Explanation for fun:
In the "for" loop, I check for a double quote to exclude any string, since all strings are enclosed in quotes. The other condition skips empty lines.
Based on your needs, you can use the list grid as you please. For example, to fetch the first line's first number, do
grid[0][0]
as python's list counts from 0 to n-1 for n elements.

This is super simple in Matlab, just 2 lines:
data = dlmread('data.csv', ',', 6,0);
column1 = data(:,1);
Where 6 and 0 should be replaced by the row and column offset you want. So in this case, the data starts at row 7 and you want all the columns, then just copy over the data in column 1 into another vector.
As another note, try typing doc dlmread in matlab - it brings up the help page for dlmread. This is really useful when you're looking for matlab functions, as it has other suggestions for similar functions down the bottom.

IndexError: List index out of range, only when performing operations 2 list items - Python

I have the following code:
infile = open("sitin.txt", "r")
r = int(infile.readline().split()[0])
s = int(infile.readline().split()[1])
totalseats = r * s
print(totalseats)
my input file, "sitin.txt" is a text file with nothing but 10 10. Printing either r or s by itself returns the correct value, and printing either r or s multiplied by 10 returns the correct value, but this code, attempting to multiply r and s returns "IndexError: list index out of range". What is happening here?

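You read from your input file too many times: the first readline() call returns the first line, the second call returns the second line (and so on). In your file there is no second line, so the second call returns an empty string and [1] is out of range. Fix: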
infile = open("sitin.txt", "r")
split = infile.readline().split() # read once only!
r = int(split[0])
s = int(split[1])
totalseats = r * s
print(totalseats)
You may need to check that split has the correct form for your input.
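Such a check could be sketched as follows. parse_two_ints is a hypothetical helper name, and the rule that a line must hold exactly two whitespace-separated integers is an assumption based on the "10 10" example.

```python
def parse_two_ints(line):
    """Parse a line such as '10 10' into two ints, raising ValueError otherwise."""
    parts = line.split()
    if len(parts) != 2:
        raise ValueError(f"expected two values, got {len(parts)}: {line!r}")
    # int() itself raises ValueError if a part is not a valid integer.
    return int(parts[0]), int(parts[1])

r, s = parse_two_ints("10 10\n")
totalseats = r * s
print(totalseats)
```

Raising a ValueError with the offending line in the message makes a malformed input file much easier to diagnose than a bare IndexError.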

Every time you call readline(), the "cursor" moves to the next line. Thus you should read that line only once, or seek back to the start before reading again. Try doing this instead:
infile = open("sitin.txt", "r")
t = infile.readline().split()
r = int(t[0])
s = int(t[1])
totalseats = r * s
print(totalseats)

You appear to be doing readline() twice, which is going to read two lines. Do it once, grab the results, split that. Also, would help to put your file access inside a with, so that the file is automatically closed for you after reading, not left open as a dangling resource. E.g.:
with open("sitin.txt", "r") as infile:
    line = infile.readline()
    r, s = map(int, line.split())
totalseats = r * s
print(totalseats)

Two simple questions about python

I have 2 simple questions about python:
1. How to get the number of lines of a file in python?
2. How to easily move the position in a file object to the last line?

Lines are just data delimited by the newline char '\n'.
1) Since lines are variable length, you have to read the entire file to know where the newline chars are, so you can count how many lines:
count = 0
for line in open('myfile'):
    count += 1
print(count, line) # line will be the last line
2) Reading a chunk from the end of the file is the fastest method to find the last newline char.
import os
def seek_newline_backwards(file_obj, eol_char=b'\n', buffer_size=200):
    # file_obj must be opened in binary mode: Python 3 does not allow
    # nonzero relative seeks on text-mode files
    if not file_obj.tell(): return # already in beginning of file
    # All lines end with \n, including the last one, so assuming we are just
    # after one end of line char
    file_obj.seek(-1, os.SEEK_CUR)
    while file_obj.tell():
        amount = min(buffer_size, file_obj.tell())
        file_obj.seek(-amount, os.SEEK_CUR)
        data = file_obj.read(amount)
        eol_pos = data.rfind(eol_char)
        if eol_pos != -1:
            file_obj.seek(eol_pos - len(data) + 1, os.SEEK_CUR)
            break
        file_obj.seek(-len(data), os.SEEK_CUR)
You can use that like this:
f = open('some_file.txt', 'rb')
f.seek(0, os.SEEK_END)
seek_newline_backwards(f)
print(f.tell(), repr(f.readline()))

Let's not forget
f = open("myfile.txt")
lines = f.readlines()
numlines = len(lines)
lastline = lines[-1]
NOTE: this reads the whole file in memory as a list. Keep that in mind in the case that the file is very large.

The easiest way is simply to read the file into memory. eg:
f = open('filename.txt')
lines = f.readlines()
num_lines = len(lines)
last_line = lines[-1]
However for big files, this may use up a lot of memory, as the whole file is loaded into RAM. An alternative is to iterate through the file line by line. eg:
f = open('filename.txt')
num_lines = sum(1 for line in f)
This is more efficient, since it won't load the entire file into memory, but only look at a line at a time. If you want the last line as well, you can keep track of the lines as you iterate and get both answers by:
f = open('filename.txt')
num_lines = 0
last_line = None
for line in f:
    num_lines += 1
    last_line = line
print("There were %d lines. The last was: %s" % (num_lines, last_line))
One final possible improvement, if you need only the last line, is to start at the end of the file and seek backwards until you find a newline character. Here's a question which has some code doing this. If you need the line count as well, though, there's no alternative except to iterate through all lines in the file.

For small files that fit memory,
how about using str.count() for getting the number of lines of a file:
line_count = open("myfile.txt").read().count('\n')

I'd like to add to the other solutions that some of them (those that look for \n) will not work on files with OS 9-style line endings (\r only), and that the file may contain an extra blank line at the end, because lots of text editors append one, so you might or might not want to add a check for it.
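One way to sidestep both issues for text that is already in memory is str.splitlines(), sketched below. count_lines is a hypothetical helper name; note that splitlines() also splits on a few rarer separators (such as form feed), which may or may not be what you want.

```python
def count_lines(text):
    """Count the lines in a string, whatever the line-ending style."""
    # str.splitlines() splits on \n, \r\n and \r (plus a few rarer separators)
    # and does not produce an extra empty entry for a trailing line break.
    return len(text.splitlines())

print(count_lines("a\nb\nc\n"))  # Unix endings with a trailing newline
print(count_lines("a\rb\rc"))    # \r-only endings without a trailing newline
```

A genuinely empty line in the middle or at the end of the text (for example "a\n\n") is still counted as a line.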

The only way to count lines [that I know of] is to read all lines, like this:
count = 0
for line in open("file.txt"): count = count + 1
After the loop, count will have the number of lines read.

For the first question there are already a few good answers. I'd suggest @Brian's as the best (most Pythonic, independent of the line-ending character, and memory efficient):
f = open('filename.txt')
num_lines = sum(1 for line in f)
For the second one, I like @nosklo's, but modified to be more general it should be:
import os
f = open('myfile')
to = f.seek(0, os.SEEK_END)
found = -1
while found == -1 and to > 0:
    fro = max(0, to-1024)
    f.seek(fro)
    chunk = f.read(to-fro)
    found = chunk.rfind("\n")
    to -= 1024
if found != -1:
    found += fro
It searches in chunks of 1 KB from the end of the file until it finds a newline character or reaches the start of the file. At the end of the code, found is the index of the last newline character.

Answer to the first question (beware of poor performance on large files when using this method):
f = open("myfile.txt").readlines()
print(len(f) - 1)
Answer to the second question:
f = open("myfile.txt").read()
print(f.rfind("\n"))
P.S. Yes I do understand that this only suits for small files and simple programs. I think I will not delete this answer however useless for real use-cases it may seem.

Answer 1:
x = open("file.txt")
opens the file, so x is now associated with file.txt
y = x.readlines()
returns all lines as a list
length = len(y)
stores the length of the list in length
Or in one line:
length = len(open("file.txt").readlines())
Answer 2:
last = y[-1]
returns the last element of the list

Approach:
Open the file in read mode and assign the file object to a variable named "file".
Assign 0 to the counter variable.
Read the content of the file using the read function and assign it to a variable named "Content".
Create a list of the content where the elements are split wherever they encounter a "\n".
Traverse the list using a for loop and increment the counter variable for each non-empty element.
Finally, display the value now present in the variable Counter, which is the required result of this program.
Python program to count the number of lines in a text file:
# Opening a file
file = open("filename","file mode")#file mode like r,w,a...
Counter = 0
# Reading from file
Content = file.read()
CoList = Content.split("\n")
for i in CoList:
    if i:
        Counter += 1
print("This is the number of lines in the file")
print(Counter)
The above code will print the number of lines present in a file. Replace filename with the file with extension and file mode with read - 'r'.
