How to read data like this from a text file? - python

The text file is like
101 # an integer
abcd # a string
2 # a number that indicates how many 3-line structures will there be below
1.4 # some float number
2 # a number indicating how many numbers will there be in the next line
1 5 # 2 numbers
2.7 # another float number
3 # another number
4 2 7 # three numbers
and the output should be like
[101,'abcd',[1.4,[1,5]],[2.7,[4,2,7]]]
I can do it line by line, with readlines(), strip(), int(), and for loop, but I'm not sure how to do it like a pro.
P.S. there can be spaces and tabs and maybe empty lines randomly inserted in the text file. The input was originally intended for C program where it doesn't matter :(
My code:
with open('data','r') as f:
    lines = [line.strip('\n') for line in f.readlines()]
i=0
while(i<len(lines)):
    course_id = int(lines[i])
    i+=1
    course_name = lines[i]
    i+=1
    class_no = int(lines[i])
    i+=1
    for j in range(class_no):
        fav = float(lines[i])
        i+=2
        class_sched = lines[i].split(" ")
        # the variables read from the file will be handled afterwards
All those i+='s look absolutely hideous! And it seems to be a long Python program for this sort of task
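One way to avoid the index bookkeeping is to split the whole file into whitespace-separated tokens and pull them from an iterator. This is only a sketch, and it rests on two assumptions: the trailing `# ...` annotations in the sample are explanations for this question and are not present in the real file, and the string field contains no whitespace. The name `parse_course` is hypothetical.

```python
def parse_course(text):
    """Parse the token stream described in the question into a nested list."""
    # str.split() with no argument splits on any run of spaces, tabs and
    # newlines, so stray whitespace and empty lines do not matter.
    tokens = iter(text.split())

    course_id = int(next(tokens))
    course_name = next(tokens)
    class_count = int(next(tokens))

    result = [course_id, course_name]
    for _ in range(class_count):
        fav = float(next(tokens))
        how_many = int(next(tokens))
        sched = [int(next(tokens)) for _ in range(how_many)]
        result.append([fav, sched])
    return result


sample = "101\nabcd\n2\n1.4\n2\n1 5\n\n2.7\n  3\n4\t2 7\n"
print(parse_course(sample))
```

For a real file, `parse_course(open('data').read())` would apply the same logic, provided the file is small enough to read in one go.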

Related

Proper way of inputting values from text file

I am having a hard time understanding how to properly insert values into my variables from a txt file. The first line is the number of test cases, followed by the number of houses and then the house binary string. Here are my input values:
2 (Number of tests [INT])
3 (Number of houses [INT])
111 (Binary string [String])
6 (Number of houses [INT])
100100 (Binary string [String])
I know we can do like this:
test_cases = int(input())
for i in range(test_cases):
    house_number = int(input())
    house_string = input()
    some_function(house_number, house_string)
But I want to create a txt file so that I do not have to type these values every time. I know how to open and read a txt file. However, I cannot work out how to pass the values to my variables.
with open('test.txt') as file:
    lines = file.readlines()
    for line in lines:
        ...
As long as your text file is consistent with formatting you can loop over every two elements and turn them into a list of tuples. Note that this code excludes the first element assuming there are complete pairs:
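with open('test.txt') as file:
    output_lst = []
    lines = file.readlines()
    # skip the first line (the test count), then pair up the remaining lines
    for i,k in zip(lines[1::2], lines[2::2]):
        # strip() removes the trailing newline from the binary string
        output_lst.append((int(i), k.strip()))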
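An alternative that stays close to the `input()` version is to treat the open file as an iterator and call `next()` wherever `input()` was used. The sketch below assumes the parenthesised annotations in the sample are not part of the real file; `read_cases` is a hypothetical helper name, and `io.StringIO` stands in for the file so the example is self-contained.

```python
import io


def read_cases(stream):
    """Return a list of (house_number, house_string) pairs from a file-like object."""
    # Strip each line and skip blank ones so stray empty lines are harmless.
    lines = (line.strip() for line in stream)
    lines = (line for line in lines if line)

    test_cases = int(next(lines))
    cases = []
    for _ in range(test_cases):
        house_number = int(next(lines))
        house_string = next(lines)
        cases.append((house_number, house_string))
    return cases


sample = io.StringIO("2\n3\n111\n6\n100100\n")
print(read_cases(sample))  # [(3, '111'), (6, '100100')]
```

With a real file, `with open('test.txt') as file: cases = read_cases(file)` would do the same, and each pair can then be passed to `some_function`.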

MemoryError Python, in file 99999999 string

Windows 10 Pro 64-bit, with the 64-bit version of Python installed.
The file is 1.80 GB.
How can I fix this error and print all the strings?
def count():
    reg = open('link_genrator.txt', 'r')
    s = reg.readline().split()
    print(s)
reg.read().split('\n') will give a list of all lines.
Why don't you just do s = reg.read(65536).splitlines()? This will give you a hint on the structure of the content and you can then play with the size you read in a chunk.
Once you know a bit more, you can try to run that line in a loop and sum up the number of lines.
After looking at the answers and trying to understand what the initial question could be, I have come to a more complete answer than my previous one.
Looking at the question and the code in the sample function, I now assume the following:
it seems he wants to separate the contents of a file into words and print them
from the function name I suppose he would like to count all these words
the whole file is quite big and thus Python stops with a memory error
Handling such large files obviously calls for a different treatment than the usual one. For example, I do not see any use in printing all the separated words of such a file on the console. Of course it might make sense to count these words or search for patterns in them.
To show how one might treat such big files, I wrote the following example. It is meant as a starting point for further refinements and changes according to your own requirements.
MAXSTR = 65536
MAXLP = 999999999
WORDSEP = ';'
lineCnt = 0
wordCnt = 0
lpCnt = 0
fn = 'link_genrator.txt'
fin = open(fn, 'r')
try:
    while lpCnt < MAXLP:
        pos = fin.tell()
        s = fin.read(MAXSTR)
        lines = s.splitlines(True)
        if len(lines) == 0:
            break
        # count words of line
        k = 0
        for l in lines:
            lineWords = l.split(WORDSEP)  # semi-colon separates each word
            k += len(lineWords)  # sum up words of each line
        wordCnt += k - 1  # last word most probably not complete: subtract one
        # count lines
        lineCnt += len(lines)-1
        # correction when line ends with \n
        if lines[len(lines)-1][-1] == '\n':
            lineCnt += 1
            wordCnt += 1
        lpCnt += 1
        print('{0} {4} - {5} act Pos: {1}, act lines: {2}, act words: {3}'.format(lpCnt, pos, lineCnt, wordCnt, lines[0][0:10], lines[len(lines)-1][-10:]))
finally:
    fin.close()
lineCnt += 1
print('Total line count: {}'.format(lineCnt))
That code works for files up to 2GB (tested with 2.1GB). The two constants at the beginning let you play with the size of the read in chunks and limit the amount of text processed. During testing you can then just process a subset of the whole data which goes much faster.
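A more compact variation on the same chunked-reading idea is a generator that carries the incomplete last word of each chunk over to the next one, so no correction terms are needed. This is a sketch, not the code from the answer above: `iter_words` is a hypothetical name, the `;` separator is taken from `WORDSEP`, and newlines are deliberately not treated as separators.

```python
import io


def iter_words(stream, sep=';', chunk_size=65536):
    """Yield sep-separated words from a file-like object, one chunk at a time."""
    tail = ''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # an empty string signals end of file
            break
        parts = (tail + chunk).split(sep)
        tail = parts.pop()     # the last piece may be cut off mid-word
        for word in parts:
            yield word
    if tail:
        yield tail             # whatever remains after the final separator


# Small in-memory demonstration with a tiny chunk size.
demo = io.StringIO("ab;cd;efg;h")
print(sum(1 for _ in iter_words(demo, chunk_size=3)))  # 4
```

With the real file, `sum(1 for _ in iter_words(open('link_genrator.txt')))` would count the words while holding only one chunk in memory at a time.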

What qualifies collection of strings to become a line?

The following code visits every character and runs the loop once per character. But when I save the same line in a text file and perform the same operation, the loop runs only once for the one line. It is a bit confusing. A possible reason I can think of is that the first method runs the loop by treating "a" as a list. Kindly correct me if I am wrong. Also, let me know how to create a line in code itself rather than first saving it to a file and then using it.
>>> a="In this world\n"
>>> i=0
>>> for lines in a:
...     i=i+1
...     print(i)
...
1
2
3
4
5
6
7
8
9
10
11
12
13
You're trying to loop over a, which is a string. Regardless of how many newlines you have in a string, when you loop over it, you're going to go character by character.
If you want to loop through a bunch of lines, you have to use a list:
lines = ["this is line 1", "this is another line", "etc"]
for line in lines:
    print(line)
If you have a string containing a bunch of newlines and want to convert it to a list of lines, use the split method:
text = "This is line 1\nThis is another line\netc"
lines = text.split("\n")
for line in lines:
    print(line)
The reason why you go line by line when reading from a file is because the people who implemented Python decided that it would be more useful if iterating over a file yielded a collection of lines instead of a collection of characters.
However, a file and a string are different things, and you should not necessarily expect that they work in the same way.
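To make the difference concrete, and to show one way of creating "lines" in code without saving a file first, the sketch below iterates over the same text three ways. `io.StringIO` is a standard-library class that wraps a string in a file-like object; the sample text is made up for illustration.

```python
import io

text = "In this world\nwe read lines\n"

# Iterating over a str yields single characters.
first_chars = [ch for ch in text][:2]

# splitlines() turns the same str into a list of lines (newlines removed).
lines = text.splitlines()

# io.StringIO wraps the str in a file-like object, so iteration yields
# whole lines with their trailing newline, just as an open text file does.
file_lines = [line for line in io.StringIO(text)]

print(first_chars)
print(lines)
print(file_lines)
```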
Just change the name of the variable when looping on the line:
i = 0
worldLine ="In this world\n"
for character in worldLine:
    i=i+1
    print(i)
count = 0
readFile = open('myFile','r')
for line in readFile:
    count += 1
Now it should be clear what's going on.
Keeping meaningful names will save you a lot of debugging time.
Consider doing the following:
i = 0
worldLine =["In this world\n"]
for line in worldLine:
    i=i+1
    print(i)
if you want to loop over a list of lines consisting of worldLine only.

making lists from data in a file in python

I'm really new at python and needed help in making a list from data in a file. The list contains numbers on separate lines (by use of "\n" and this is something I don't want to change to CSV). The amount of numbers saved can be changed at any time because the way the data is saved to the file is as follows:
Program 1:
# creates a new file for writing
numbersFile = open('numbers.txt', 'w')
# determines how many times the loop will iterate
totalNumbers = input("How many numbers would you like to save in the file? ")
# loop to get numbers
count = 0
while count < totalNumbers:
    number = input("Enter a number: ")
    # writes number to file
    numbersFile.write(str(number) + "\n")
    count = count + 1
This is the second program that uses that data. This is the part that is messy and that I'm unsure of:
Program 2:
maxNumbers = input("How many numbers are in the file? ")
numFile = open('numbers.txt', 'r')
total = 0
count = 0
while count < maxNumbers:
    total = total + numbers[count]
    count = count + 1
I want to use the data gathered from program 1 to get a total in program 2. I wanted to put it in a list because the amount of numbers can vary. This is for an introduction to computer programming class, so I need a SIMPLE fix. Thank you to all who help.
Your first program is fine, although you should use raw_input() instead of input() (which also makes it unnecessary to call str() on the result).
Your second program has a small problem: You're not actually reading anything from the file. Fortunately, that's easy in Python. You can iterate over the lines in a file using
for line in numFile:
    # line now contains the current line, including a trailing \n, if present
so you don't need to ask for the total of numbers in your file at all.
If you want to add the numbers, don't forget to convert the string line to an int first:
total += int(line) # shorthand for total = total + int(line)
There remains one problem (thanks @tobias_k!): if the file contains an empty line, for example at the end, int() raises an error on it, so you could check for that first:
for line in numFile:
    if line.strip():
        total += int(line)
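Putting these pieces together, a minimal sketch of the summing step might look like the following. It is written for Python 3, and `sum_numbers` is a hypothetical helper name. It accepts any iterable of lines, so an open file or a plain list both work.

```python
def sum_numbers(lines):
    """Add up one integer per line, ignoring blank lines."""
    total = 0
    for line in lines:
        line = line.strip()      # drop the trailing newline and stray spaces
        if line:                 # skip empty lines instead of crashing on int("")
            total += int(line)
    return total


# A list of strings stands in for the lines of numbers.txt here.
print(sum_numbers(["3\n", "4\n", "\n", "5\n"]))  # 12
```

With the real file, `with open('numbers.txt') as numFile: total = sum_numbers(numFile)` gives the total and also closes the file afterwards.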

Reading one integer at a time using python

How can I read integers from a file? I have a large (512 MB) txt file, which contains integer data like this:
0 0 0 10 5 0 0 140
0 20 6 0 9 5 0 0
Now if I use c = file.read(1), I get only one character at a time, but I need one integer at a time. Like:
c = 0
c = 10
c = 5
c = 140 and so on...
Any great heart please help. Thanks in advance.
Here's one way:
with open('in.txt', 'r') as f:
    for line in f:
        for s in line.split(' '):
            num = int(s)
            print(num)
By doing for line in f you read the file one line at a time (using neither read(), which loads everything, nor readlines()). This is important because your file is large.
Then you split each line on spaces, and read each number as you go.
You can do more error checking than that simple example, which will barf if the file contains corrupted data.
As the comments say, this should be enough for you - otherwise if it is possible your file can have extremely long lines you can do something trickier like reading blocks at a time.
512 MB is really not that large. If you're going to create a list of the data anyway, I don't see a problem with doing the reading step in one go:
my_int_list = [int(v) for v in open('myfile.txt').read().split()]
If you can structure your code so you don't need the entire list in memory, it would be better to use a generator:
def my_ints(fname):
    for line in open(fname):
        for val in line.split():
            yield int(val)
and then use it:
for c in my_ints('myfile.txt'):
    # do something with c (which is the next int)
    print(c)
I would do it this way:
buffer = file.read(8192)
contents += buffer
split contents on whitespace
remove the last element from the resulting list (it might not be a complete number)
replace contents with that last element
repeat until buffer is empty
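The steps above can be sketched as a generator. This is one possible reading of them, not the answerer's own code: `iter_ints` is a hypothetical name, and the sketch adds one refinement, namely that the last token is only held back when the text read so far does not end in whitespace.

```python
import io


def iter_ints(stream, chunk_size=8192):
    """Yield whitespace-separated integers from a file-like object in chunks."""
    contents = ''
    while True:
        buffer = stream.read(chunk_size)
        if not buffer:                 # read() returns '' at end of file
            break
        contents += buffer
        parts = contents.split()
        if contents[-1].isspace():
            contents = ''              # every token in parts is complete
        else:
            contents = parts.pop()     # last token may be cut off; keep it
        for part in parts:
            yield int(part)
    if contents:
        yield int(contents)            # final number with no trailing whitespace


demo = io.StringIO("0 0 0 10 5 0 0 140\n0 20 6 0 9 5 0 0\n")
print(list(iter_ints(demo, chunk_size=4))[:8])  # [0, 0, 0, 10, 5, 0, 0, 140]
```

The tiny `chunk_size` in the demonstration is only there to force numbers to straddle chunk boundaries; the default of 8192 matches the steps above.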
