Print random line from txt file? - python

I'm using random.randint to generate a random number, and then assigning that number to a variable. Then I want to print the line with the number I assigned to the variable, but I keep getting the error:
list index out of range
Here's what I tried:
f = open(filename. txt)
lines = f.readlines()
rand_line = random. randint(1,10)
print lines[rand_line]

You want to use random.choice:
import random
with open(filename) as f:
    lines = f.readlines()
    print(random.choice(lines))

To get a random line without loading the whole file into memory, you can use reservoir sampling (with a sample size of 1):
from random import randrange
def get_random_line(afile, default=None):
    """Return a random line from the file (or default)."""
    line = default
    for i, aline in enumerate(afile, start=1):
        if randrange(i) == 0: # random int [0..i)
            line = aline
    return line
with open('filename.txt') as f:
    print(get_random_line(f))
This algorithm runs in O(n) time using O(1) additional space.

Your code is correct apart from the index range, assuming that you meant to pass a string to the open function and that there is no space after the dot.
However, be careful with indexing in Python: it starts at 0, not 1, and ends at len(your_list)-1.
Using random.choice is better, but if you want to follow your original idea, it would look like this:
import random
with open('name.txt') as f:
    lines = f.readlines()
random_int = random.randint(0, len(lines)-1)
print(lines[random_int])
Since randint includes both endpoints, the upper bound must be len(lines)-1.
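The off-by-one can be seen on a small in-memory list. This is only a minimal sketch: the `lines` list stands in for the result of f.readlines(), and the helper names `pick_with_randint` and `pick_with_randrange` are made up for illustration.

```python
import random

# Hypothetical stand-in for the result of f.readlines().
lines = ["first\n", "second\n", "third\n"]

def pick_with_randint(items):
    # randint(a, b) includes both a and b, so the upper bound is len - 1.
    return items[random.randint(0, len(items) - 1)]

def pick_with_randrange(items):
    # randrange(n) excludes n, so len(items) can be passed directly.
    return items[random.randrange(len(items))]

print(pick_with_randint(lines).rstrip())
print(pick_with_randrange(lines).rstrip())
```

Both helpers only ever produce indices from 0 to len(items)-1, whereas randint(1,10) on a file with fewer than 11 lines can produce an index past the end.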

import random
f = open('filename.txt')
lines = f.readlines()
f.close()
rand_line = random.randint(0, len(lines) - 1) # https://docs.python.org/2/library/random.html#random.randint
print(lines[rand_line])

You can edit your code to achieve this without an error:
import random
f = open('filename.txt')
lines = f.readlines()
rand_line = random.randint(0, len(lines)-1) # valid indices run from 0 to len(lines)-1
print(lines[rand_line])
This way the index stays within range (as long as the file is not empty).

Related

sorting lines of file python

I want to bubble sort a file by numbers, and I probably have two mistakes in my code.
Each line of the file contains: string-space-number
The result is either sorted incorrectly, or sometimes I get an IndexError because x.append(row[1]) is out of range.
I hope someone can help me.
Code:
#!/usr/bin/python
filename = "Numberfile.txt"
fo = open(filename, "r")
x, y, z, b = [], [], [], []
for line in fo: # read
    row = line.split(" ") # split items by space
    x.append(row[1]) # number
liste = fo.readlines()
lines = len(liste)
fo.close()
for passesLeft in range(lines-1, 0, -1):
    for i in range(passesLeft):
        if x[i] > x[i+1]:
            temp = liste[i]
            liste[i] = liste[i+1]
            liste[i+1] = temp
fo = open(filename, "w")
for i in liste:
    fo.writelines("%s" % i)
fo.close()
It seems that you have empty lines in the file.
Change:
for line in fo: # read
    row = line.split(" ") # split items by space
    x.append(row[1]) # number
to:
for line in fo: # read
    if line.strip():
        row = line.split(" ") # split items by space
        x.append(row[1]) # number
By the way, you're better off using re.split with the regex \s+:
re.split(r'\s+', line)
which makes your code more resilient: it can handle multiple spaces as well.
As for the second issue, Anand beat me to it: you're comparing strings. If you want to compare numbers, wrap each value in a call to int().
First issue: if you are sorting by the numbers and they can have multiple digits, your logic will not work, because x is a list of strings, not integers. Strings are compared lexicographically, so '12' is less than '2', and so on. You should convert each number to int before appending it to x.
Also, if you are getting an IndexError, you may have empty lines or lines without two elements. You should validate your input, and you can add a condition to ignore the empty lines.
Code:
for line in fo:
    if line.strip():
        row = line.split(" ")
        x.append(int(row[1]))
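Putting the two fixes together, here is a rough sketch that swaps the hand-written bubble sort for Python's built-in sorted with a numeric key. It assumes every non-blank line has the string-space-number shape from the question; the function name `sort_lines_by_number` and the sample data are made up for illustration.

```python
def sort_lines_by_number(lines):
    """Return the non-blank lines sorted by the integer in their second field."""
    # Skip blank lines so that split()[1] cannot raise IndexError on them.
    records = [line for line in lines if line.strip()]
    # split() with no argument copes with repeated spaces;
    # int() makes the comparison numeric instead of lexicographic.
    return sorted(records, key=lambda line: int(line.split()[1]))

sample = ["pear 12\n", "\n", "apple 2\n", "fig 100\n"]
for line in sort_lines_by_number(sample):
    print(line.rstrip())
```

Sorting the lines themselves by a key avoids the problem in the original code where the numbers and the lines live in two separate lists that have to be swapped in step.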

Writing all outputs to a file (Python)

I have this code that should generate all possible combinations of digits and store them in a text file called Passwords4.txt. The issue here is that when I go to the text file it just shows 9999 instead of showing the numbers from 0000 to 9999.
import itertools
lst = itertools.product('0123456789', repeat=4) #Last part is equal to the password length
for i in lst:
    print ''.join(i)
f = open('Passwords4.txt', 'w')
f.write(str(''.join(i)) +'\n')
f.close()
Can someone explain what I should do?
Your f.write is not inside the loop, so it only happens once.
You probably want the open() before the loop, and your f.write in the loop (indented, same as print).
A more Pythonic way to do this:
import itertools
lst = itertools.product('0123456789', repeat=4) # the last argument is the password length
with open('Passwords4.txt', 'w') as f:
    for i in lst:
        print(''.join(i))
        f.write(''.join(i) + '\n')
The with statement closes the file automatically, even if an error occurs inside the block.
Re:
for i in lst:
    print ''.join(i)
f = open('Passwords4.txt', 'w')
f.write(str(''.join(i)) +'\n')
By the time you open the file and write to it (after the loop has finished), i has already been set to the last result of the loop, which is why you're only getting 9999.
A fix is to do the writes within the loop, with something like:
import itertools
lst = itertools.product('0123456789', repeat=4)
f = open('Passwords4.txt', 'w')
for i in lst:
    f.write(''.join(i) + '\n')
f.close()

How to find sum of numbers from a file using a while loop?

I am trying to write a function that takes a file name as an argument, and returns the sum of the numbers in the file. Here is what I have done so far:
def sum_digits (filename):
    """
    >>> sum_digits("digits.txt")
    434
    """
    myfile = open(filename, "r")
    newfile = myfile.read()
    sum = 0
    while newfile.isdigit():
        sum += newfile%10
        newfile = newfile/10
    return sum
if __name__=="__main__":
    import doctest
    doctest.testmod(verbose=True)
But this code is not working, and I don't know how to fix it. Any ideas?
You need to split your text to get a list of numbers, then iterate over that list, adding them up:
nums = newfile.split()
for num in nums:
    sum += int(num)
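For completeness, a sketch of the whole function with the split applied, keeping the while loop that the question asks for. It assumes the file holds whitespace-separated integers, which the question does not actually show, and it uses `total` instead of `sum` to avoid shadowing the built-in.

```python
def sum_digits(filename):
    """Return the sum of the whitespace-separated integers in the file."""
    with open(filename) as myfile:
        tokens = myfile.read().split()
    total = 0
    index = 0
    # A while loop, as in the question; a for loop over tokens works equally well.
    while index < len(tokens):
        total += int(tokens[index])
        index += 1
    return total
```

If the file can contain tokens that are not integers, int() raises ValueError, so those would need to be filtered out first.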

finding sum of values in a nested dictionary in python

I have around 20000 text files, numbered 5.txt, 10.txt and so on.
I am storing the file paths of these files in a list "list2" that I have created.
I also have a text file "temp.txt" with a list of 500 words:
vs
mln
money
and so on.
I am storing these words in another list "list" that I have created.
Now I create a nested dictionary d2[file][word] = frequency count of "word" in "file".
I need to iterate through these words for each text file, as I am trying to get the following output:
filename.txt - sum(d[filename][word]*log(prob))
Here, filename.txt is of the form 5.txt, 10.txt and so on, and "prob" is a value that I have already obtained.
I basically need to sum, for every outer key (file), the values of the inner keys (words), which are the word frequencies.
Say:
d['5.txt']['the']=6
Here "the" is my word and "5.txt" is the file, and 6 is the number of times "the" occurs in "5.txt".
Similarly:
d['5.txt']['as']=2
I need to find the sum of the dictionary values.
So for 5.txt I need my answer to be:
6*log(prob('the'))+2*log(prob('as'))+... (for all the words in list)
I need this to be done for all the files.
My problem lies in the part where I am supposed to iterate through the nested dictionary:
import collections, sys, os, re
sys.stdout=open('4.txt','w')
from collections import Counter
from glob import glob
folderpath='d:/individual-articles'
folderpaths='d:/individual-articles/'
counter=Counter()
filepaths = glob(os.path.join(folderpath,'*.txt'))
#test contains: d:/individual-articles/5.txt,d:/individual,articles/10.txt,d:/individual-articles/15.txt and so on...
with open('test.txt', 'r') as fi:
    list2= [line.strip() for line in fi]
#temp contains the list of words
with open('temp.txt', 'r') as fi:
    list= [line.strip() for line in fi]
#the dictionary that contains d2[file][word]
d2 =defaultdict(dict)
for fil in list2:
    with open(fil) as f:
        path, name = os.path.split(fil)
        words_c = Counter([word for line in f for word in line.split()])
        for word in list:
            d2[name][word] = words_c[word]
#this portion is also for the generation of dictionary "prob",that is generated from file 2.txt can be overlooked!
with open('2.txt', 'r+') as istream:
    for line in istream.readlines():
        try:
            k,r = line.strip().split(':')
            answer_ca[k.strip()].append(r.strip())
        except ValueError:
            print('Ignoring: malformed line: "{}"'.format(line))
#my problem lies here
items = d2.items()
small_d2 = dict(next(items) for _ in range(10))
for fil in list2:
    total=0
    for k,v in small_d2[fil].items():
        total=total+(v*answer_ca[k])
    print("Total of {} is {}".format(fil,total))
for fil in list2: #list2 contains the filenames
    total = 0
    for k,v in d[fil].iteritems():
        total += v*log(prob[k]) #where prob is a dict
    print "Total of {} is {}".format(fil,total)
with open(f) as fil assigns fil to the open file object for f, not to the file name. When you later access the entries in your dictionary as
total=sum(math.log(prob)*d2[fil][word].values())
I believe you mean
total = sum(math.log(prob)*d2[f][word])
though this doesn't quite match the order you were expecting, so I would instead suggest something more like this:
word_list = [...] # list of words
file_list = [...] # list of files
dictionary = {...} # your dictionary
summation = lambda file_name, prob: sum([(math.log(prob)*dictionary[word][file_name]) for word in word_list])
return_value = []
for file_name in file_list:
    prob = ... # something
    return_value.append(summation(file_name, prob))
The summation line defines an anonymous function in Python. These are called lambda functions. Essentially, what that line means is:
summation = lambda file_name, prob:
is almost the same as:
def summation(file_name, prob):
and then
sum([(math.log(prob)*dictionary[word][file_name]) for word in word_list])
is almost the same as:
result = []
for word in word_list:
    result.append(math.log(prob)*dictionary[word][file_name])
return sum(result)
so in total you have:
summation = lambda file_name, prob: sum([(math.log(prob)*dictionary[word][file_name]) for word in word_list])
instead of:
def summation(file_name, prob):
    result = []
    for word in word_list:
        result.append(math.log(prob)*dictionary[word][file_name])
    return sum(result)
The list comprehension is usually somewhat faster than the explicit loop with append, and it is more compact. The lambda itself adds no speed; it is only a shorter way to write the function. When the goal is to build a list, a list comprehension is usually the better choice in Python, although there are cases where a plain for loop is clearer.
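To make the target sum from the question concrete, here is a small sketch on made-up data. It assumes the shapes d[file][word] = count and prob[word] = probability (a dict, as in the question's shorter snippet); the sample values and the helper name `log_score` are hypothetical.

```python
import math

# Hypothetical sample data: d[file][word] = count, prob[word] = probability.
d = {
    "5.txt": {"the": 6, "as": 2},
    "10.txt": {"the": 1, "as": 0},
}
prob = {"the": 0.5, "as": 0.25}

def log_score(word_counts, prob):
    """Sum count * log(prob[word]) over the words of one file."""
    return sum(count * math.log(prob[word]) for word, count in word_counts.items())

# One total per outer key (file name).
totals = {name: log_score(counts, prob) for name, counts in d.items()}
for name, total in totals.items():
    print("Total of {} is {}".format(name, total))
```

Iterating with .items() on the inner dictionary gives each word together with its count, so no separate word list is needed as long as every word in the inner dictionary has an entry in prob.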

Go to a specific line and read the next few in Python

I have this huge (61GB) FASTQ file of which I want to create a random subset, but which I cannot load into memory. The problem with FASTQs is that every four lines belong together, otherwise I would just create a list of random integers and only write the lines at these integers to my subset file.
So far, I have this:
import random
num = []
while len(num) < 50000000:
    ran = random.randint(0,27000000)
    if (ran%4 == 0) and (ran not in num):
        num.append(ran)
num = sorted(num)
fastq = open("all.fastq", "r", 4)
subset = open("sub.fastq", "w")
for i,line in enumerate(fastq):
    for ran in num:
        if ran == i:
            subset.append(line)
I have no idea how to reach the next three lines in the file before going to the next random integer. Can someone help me?
Iterate over the file in chunks of four lines.
Take a random sample from that iterator.
The idea is that you can sample from a generator without random access, by iterating through it and choosing (or not) each element in turn.
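One way to read the two steps above is a generator that yields four-line records, combined with reservoir sampling over those records. This is only a sketch under that reading: the helper names `records` and `sample_records` are made up, and the in-memory file stands in for the real FASTQ.

```python
import io
import random
from itertools import islice

def records(handle, size=4):
    """Yield consecutive groups of `size` lines (one FASTQ record each)."""
    while True:
        chunk = list(islice(handle, size))
        if len(chunk) < size:
            return  # end of file; an incomplete trailing record is dropped
        yield chunk

def sample_records(handle, k):
    """Reservoir-sample k records in one pass over the file."""
    reservoir = []
    for seen, record in enumerate(records(handle), start=1):
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            slot = random.randrange(seen)  # random int in [0, seen)
            if slot < k:
                reservoir[slot] = record
    return reservoir

# Tiny in-memory stand-in for a FASTQ file with five records.
fake_fastq = io.StringIO("".join(
    "@read{0}\nACGT\n+\nIIII\n".format(n) for n in range(5)
))
for record in sample_records(fake_fastq, 2):
    print("".join(record), end="")
```

Only the k sampled records are held in memory at once, not the whole file. The sampled records do not come out in file order, so they would need sorting by position if the original order matters.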
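You could try this:
import random
# Start lines of the chosen records (multiples of 4). set() drops duplicate picks,
# and descending order lets the next target be popped from the end cheaply.
num = sorted(set(random.randint(0, 27000000//4)*4 for i in range(50000000//4)), reverse=True)
lines_to_write = 0
with open("all.fastq", "r") as fastq:
    with open("sub.fastq", "w") as subset:
        for i, line in enumerate(fastq):
            if num and i == num[-1]:
                num.pop()
                lines_to_write = 4
            if lines_to_write > 0:
                lines_to_write -= 1
                subset.write(line)
            elif not num:
                break # no targets left and nothing left to copy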
