Read n test cases from file in Python - python

I've been trying to use Python for a few sample programming competition questions, but I've been stumped on file reading.
I'm reading from stdin. The first line is the number of test cases that follow, and each subsequent line contains two integers that I need to process. E.g.
4 -10
0 5
6 20
0 -1
20 10
I've found a C++ solution that looks like this:
int main()
{
    int runs, a, b;
    cin >> runs;
    while (runs--)
    {
        cin >> a >> b;
        long long ret = solve(a, b);
        cout << ret << endl;
    }
    return 0;
}
The closest I've come up with in Python is:
t = int(raw_input())
answer = 0
while t:
    n, m = map(int, raw_input().split())
    answer = solve(n, m)
    print answer
    t -= 1  # without this the loop never ends
I've seen similar questions on Stack Overflow but I'm still having a tricky time wrapping my head around the Python way to do this.

4 -10
0 5
6 20
0 -1
20 10
You would do it like this.
num_of_testcases = int(raw_input())  # this corresponds to 3 and 2
for each in range(num_of_testcases):
    x, y = map(int, raw_input().split())  # this gives the pair of numbers
In contests, you are usually given the total number of test cases up front. You have not mentioned it here. It is read first:
total_test_cases = int(raw_input())
and then you repeat the input-gathering routine above total_test_cases times. If the total number of test cases is not given, you can loop with while True and stop at EOF.
for tc in range(total_test_cases):
    num_of_testcases = int(raw_input())  # this corresponds to 3 and 2
    for each in range(num_of_testcases):
        x, y = map(int, raw_input().split())  # this gives the pair of numbers
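When no count is given, the "loop until EOF" idea can be sketched as follows. This is a minimal illustration in Python 3 syntax, not code from the answer above; the helper name read_pairs is hypothetical, and skipping lines that do not hold exactly two fields is an assumption made for the demo.

```python
import io

def read_pairs(stream):
    """Yield (x, y) integer pairs, one per line, until the stream hits EOF."""
    for line in stream:  # iteration simply stops at EOF
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: ignore blank lines and lone counts
        yield int(fields[0]), int(fields[1])

# Demo on an in-memory stream; pass sys.stdin in a real submission.
sample = io.StringIO("4 -10\n0 5\n6 20\n")
pairs = list(read_pairs(sample))
print(pairs)
```

Iterating over the stream avoids the explicit while True / EOFError handling that raw_input() or input() would need.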

Try this:
import sys
for l in sys.stdin.readlines()[1:]:
    a, b = map(int, l.split())
    # now process your test cases
Also, according to your input file description, there should only be one set of test cases. Like so:
4 -10
0 5
4 20

If you don't want to use raw_input you can use fileinput instead:
import fileinput
input = fileinput.input()
for line in input:
    # each count line is followed by that many lines of integers
    for j in range(int(line)):
        solve(*[int(i) for i in next(input).split()])
or with sys.stdin
import sys
for line in sys.stdin:
    # each count line is followed by that many lines of integers
    for j in range(int(line)):
        solve(*[int(i) for i in next(sys.stdin).split()])


Is there a way of processing list of number integers entered via terminal without saving them into list?

I am trying to write a similar code in Python, but I am new to it.
int counts[] = { 0, 0, 0, 0, 0 };
for (int i = 0; i < groups; i++) {
    int groups_size;
    scanf(" %d", &groups_size);
    counts[groups_size] += 1;
}
Please note that it does not save all the numbers into memory.
I tried to do this in Python as:
for group in range(groups):
    num = int(input().strip())
    counts[num] += 1
This does not work. When I enter 1 2 3 4 5 into terminal, I get ValueError: invalid literal for int() with base 10: '1 2 3 4 5'.
Is there a way of doing this in Python the same as I did in C?
In Python, input() does not read one number at a time and loop for the rest; it reads the whole line at once. So what you can do is read the whole line into a string and then split it into a list, as follows:
line = input()  # avoid naming this str, which would shadow the built-in type
num = list(map(int, line.split()))
Now all the input given by the user is stored in the num variable. You can iterate over it and complete your processing as follows:
counts = [0]*5  # assuming you want it to be of size 5 as in your question
for inp in num:
    counts[inp] = counts[inp] + 1
Hope this helps!
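If the goal is to avoid keeping the converted numbers around, the counts can be updated while the tokens are consumed. The following is only a sketch under assumptions: the function name count_group_sizes and the max_size parameter are made up for illustration, and each line is still split into a temporary list of strings, so this is not identical to scanf reading one token at a time.

```python
import io

def count_group_sizes(stream, groups, max_size=4):
    """Tally up to `groups` integers read from `stream` without storing them."""
    counts = [0] * (max_size + 1)
    seen = 0
    for line in stream:
        for token in line.split():  # tokens may sit on one line or on several
            if seen == groups:
                return counts
            counts[int(token)] += 1
            seen += 1
    return counts

# Demo on an in-memory stream; pass sys.stdin when reading from the terminal.
print(count_group_sizes(io.StringIO("1 2 3 4 4\n"), 5))
```

Because it walks tokens rather than lines, the same function accepts "1 2 3 4 4" on one line or one number per line.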

How to print mean , median and mode in python 3

I am new to python programming and I'm getting runtime error with my code. Any help is appreciated.
import statistics
tc = int(input())
while tc > 0:
    n = int(input())
    arr = input()
    l = list(map(int, arr.split(' ')))
    print("{} {} {}".format(statistics.mean(l), statistics.median(l), statistics.mode(l)))
    tc = tc - 1
StatisticsError: no unique mode; found 2 equally common values
Input Format
First line consists of a single integer T denoting the number of test cases.
First line of each test case consists of a single integer N denoting the size of the array.
Following line consists of N space-separated integers Ai denoting the elements in the array.
Output Format
For each test case, output a single line containing three space-separated integers denoting the Mean, Median and Mode of the array.
Sample Input
1 1 2 3 3
Sample Output
2 2 1
You could compute a variable mode inside a try...except block, and if statistics raises an error, get the mode a different way.
print("{} {} {}".format(statistics.mean(l), statistics.median(l), mode))
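One way the try...except fallback might look is sketched below. The helper names safe_mode and smallest_mode are hypothetical, and picking the smallest of the tied values is an assumption, since the problem statement shown here does not say how ties should be resolved. Note also that on Python 3.8 and later statistics.mode no longer raises for ties (it returns the first mode encountered), so there the fallback only runs for empty input.

```python
import statistics
from collections import Counter

def smallest_mode(values):
    """Return the smallest of the most frequent values (tie-break is an assumption)."""
    counts = Counter(values)
    highest = max(counts.values())
    return min(v for v, c in counts.items() if c == highest)

def safe_mode(values):
    """Use statistics.mode, falling back to smallest_mode if it raises."""
    try:
        return statistics.mode(values)
    except statistics.StatisticsError:
        return smallest_mode(values)

print(safe_mode([1, 2, 2, 3]))
print(smallest_mode([1, 1, 2, 3, 3]))
```

If a specific tie-break is required on every Python version, calling smallest_mode directly is the more predictable choice.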
Hello Kartik Madaan,
Try this code (Python 3.x):
import statistics
tc = int(input())
while tc > 0:
    n = int(input())
    arr = input()
    l = list(map(int, arr.split()))
    # most frequent value; ties are resolved by set iteration order
    mod = max(set(l), key=l.count)
    print("{} {} {}".format(statistics.mean(l), statistics.median(l), mod))
    tc = tc - 1
I hope my answer is helpful.
If you have any questions, please comment.

Memory efficient way to read an array of integers from single line of input in python2.7

I want to read a single line of input containing integers separated by spaces.
Currently I use the following.
A = map(int, raw_input().split())
But now the N is around 10^5 and I don't need the whole array of integers, I just need to read them 1 at a time, in the same sequence as the input.
Can you suggest an efficient way to do this in Python 2.7?
Use generators:
numbers = '1 2 5 18 10 12 16 17 22 50'
gen = (int(x) for x in numbers.split())
for g in gen:
    print g
The generator converts one item at a time, so it never builds the full list of integers. Note that numbers.split() still creates the complete list of substrings up front.
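To avoid the intermediate list of substrings as well, the tokens can be located lazily. The sketch below swaps in re.finditer, which is not what the answer above uses; the function name iter_ints is hypothetical and the pattern assumes plain decimal integers with an optional minus sign.

```python
import re

def iter_ints(line):
    """Lazily yield the integers found in `line`, one match at a time."""
    for match in re.finditer(r"-?\d+", line):
        yield int(match.group())

for value in iter_ints("1 2 5 18 10 12 16 17 22 50"):
    print(value)
```

The input line itself is still held in memory as one string; only the per-number objects are created on demand.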
You could parse the data a character at a time, this would reduce memory usage:
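data = "1 50 30 1000 20 4 1 2"
number = []   # digits of the number currently being read
numbers = []
for c in data:
    if c == ' ':
        if number:
            numbers.append(int(''.join(number)))
            number = []
    elif c.isdigit():
        number.append(c)
if number:
    # flush the last number, which has no trailing space
    numbers.append(int(''.join(number)))
print numbers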
Giving you:
[1, 50, 30, 1000, 20, 4, 1, 2]
Probably quite a bit slower though.
Alternatively, you could use itertools.groupby() to read groups of digits as follows:
from itertools import groupby
data = "1 50 30 1000 20 4 1 2"
numbers = []
for k, g in groupby(data, lambda c: c.isdigit()):
    if k:
        # g iterates over the consecutive digits of one number
        numbers.append(int(''.join(g)))
print numbers
If you're able to destroy the original string, split accepts a parameter for the maximum number of breaks.
See docs for more details and examples.
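A rough sketch of the maxsplit idea: split off one token at a time and keep only the remainder. The function name iter_ints_maxsplit is hypothetical. Each split copies the remaining text, so the total work grows quadratically with the line length; this illustrates the parameter rather than recommending it for N around 10^5.

```python
def iter_ints_maxsplit(line):
    """Yield integers from `line`, peeling off one token per split call."""
    rest = line
    while True:
        parts = rest.split(None, 1)  # at most one split: [first_token, remainder]
        if not parts:
            return  # nothing but whitespace is left
        yield int(parts[0])
        rest = parts[1] if len(parts) == 2 else ""

for value in iter_ints_maxsplit("1 50 30 1000 20 4 1 2"):
    print(value)
```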

How to randomly delete a number of lines from a big file?

I have a big text file of 13 GB with 158,609,739 lines and I want to randomly select 155,000,000 lines.
I have tried to shuffle the file and then take the first 155000000 lines, but it seems that my RAM (16 GB) isn't big enough to do this. The pipelines I have tried are:
shuf file | head -n 155000000
sort -R file | head -n 155000000
Now, instead of selecting lines, I think it is more memory efficient to delete 3,609,739 random lines from the file to get a final file of 155000000 lines.
As you copy each line of the file to the output, assess the probability that it should be deleted. The first line should have a 3,609,739/158,609,739 chance of being deleted. If you generate a random number between 0 and 1 and that number is less than that ratio, don't copy the line to the output. If the first line was deleted, the odds for the second line are 3,609,738/158,609,738; if the second line is then not deleted, the odds for the third line are 3,609,738/158,609,737. Repeat until done.
Because the odds change with each line processed, this algorithm guarantees the exact line count. Once you've deleted 3,609,739 the odds go to zero; if at any time you would need to delete every remaining line in the file, the odds go to one.
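The procedure described above can be sketched as a generator. This is a minimal illustration in Python 3 syntax, assuming the total line count is known in advance; the name delete_random_lines and its parameters are hypothetical.

```python
import random

def delete_random_lines(lines, total, to_delete, rng=random):
    """Yield the lines that survive, dropping exactly `to_delete` of `total`."""
    remaining = total
    for line in lines:
        if remaining <= 0:
            raise ValueError("input has more lines than `total`")
        # chance of deletion = deletions still owed / lines still unseen
        if rng.random() < to_delete / remaining:
            to_delete -= 1
        else:
            yield line
        remaining -= 1

kept = list(delete_random_lines(range(100), total=100, to_delete=7))
print(len(kept))
```

For a file, `lines` would be the open input file and each yielded line would be written to the output file, so only one line is held in memory at a time.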
You could always pre-generate which line numbers (a list of 3,609,739 random numbers selected without replacement) you plan on deleting, then just iterate through the file and copy to another, skipping lines as necessary. As long as you have space for a new file this would work.
You could select the random numbers with random.sample
random.sample(xrange(158609739), 3609739)
Proof of Mark Ransom's Answer
Let's use numbers easier to think about (at least for me!):
10 items
delete 3 of them
First time through the loop we will assume that the first three items get deleted -- here's what the probabilities look like:
first item: 3 / 10 = 30%
second item: 2 / 9 = 22%
third item: 1 / 8 = 12%
fourth item: 0 / 7 = 0 %
fifth item: 0 / 6 = 0 %
sixth item: 0 / 5 = 0 %
seventh item: 0 / 4 = 0 %
eighth item: 0 / 3 = 0 %
ninth item: 0 / 2 = 0 %
tenth item: 0 / 1 = 0 %
As you can see, once it hits zero, it stays at zero. But what if nothing is getting deleted?
first item: 3 / 10 = 30%
second item: 3 / 9 = 33%
third item: 3 / 8 = 38%
fourth item: 3 / 7 = 43%
fifth item: 3 / 6 = 50%
sixth item: 3 / 5 = 60%
seventh item: 3 / 4 = 75%
eighth item: 3 / 3 = 100%
ninth item: 2 / 2 = 100%
tenth item: 1 / 1 = 100%
So even though the probability varies per line, overall you get the results you are looking for. I went a step further and coded a test in Python for one million iterations as a final proof to myself -- remove seven items from a list of 100:
# python 3.2
from __future__ import division
from stats import mean  # on Python 3.4+, "from statistics import mean" works instead
import random
counts = dict()
for i in range(100):
    counts[i] = 0
removed_failed = 0
for _ in range(1000000):
    to_remove = 7
    from_list = list(range(100))
    removed = 0
    while from_list:
        current = from_list.pop()
        probability = to_remove / (len(from_list) + 1)
        if random.random() < probability:
            removed += 1
            to_remove -= 1
            counts[current] += 1
    if removed != 7:
        removed_failed += 1
print(counts[0], counts[1], counts[2], '...',
      counts[49], counts[50], counts[51], '...',
      counts[97], counts[98], counts[99])
print("remove failed: ", removed_failed)
print("min: ", min(counts.values()))
print("max: ", max(counts.values()))
print("mean: ", mean(counts.values()))
and here are the results from one of the several times I ran it (they were all similar):
70125 69667 70081 ... 70038 70085 70121 ... 70047 70040 70170
remove failed: 0
min: 69332
max: 70599
mean: 70000.0
A final note: Python's random.random() is [0.0, 1.0) (doesn't include 1.0 as a possibility).
I believe you're looking for "Algorithm S" from section 3.4.2 of Knuth (D. E. Knuth, The Art of Computer Programming. Volume 2: Seminumerical Algorithms, second edition. Addison-Wesley, 1981).
You can see several implementations at
The Perlmonks list has some Perl implementations of Algorithm S and Algorithm R that might also prove useful.
These algorithms rely on there being a meaningful interpretation of floating point numbers like 3609739/158609739, 3609738/158609738, etc. which might not have sufficient resolution with a standard Float datatype, unless the Float datatype is implemented using numbers of double precision or larger.
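If floating-point resolution is a concern, the same decision can be made with integer arithmetic only. The sketch below is an assumption-labelled alternative, not part of the algorithms cited above; should_delete is a hypothetical helper built on random.randrange.

```python
import random

def should_delete(to_delete, remaining, rng=random):
    """Return True with probability exactly to_delete / remaining, using no floats."""
    # randrange(remaining) is uniform over 0 .. remaining - 1
    return rng.randrange(remaining) < to_delete

# Small demo: delete exactly 3 of 10 items.
remaining, to_delete, kept = 10, 3, []
for item in range(10):
    if should_delete(to_delete, remaining):
        to_delete -= 1
    else:
        kept.append(item)
    remaining -= 1
print(kept)
```

In Python specifically, float is double precision, so ratios such as 3609739/158609739 are not a practical problem; the integer form mainly matters in languages with single-precision floats.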
Here's a possible solution using Python:
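import random
# a set makes the membership test in the loop O(1)
skipping = set(random.sample(range(158609739), 3609739))
infile = open(input_path)         # input_path: placeholder for the source file name
outfile = open(output_path, 'w')  # output_path: placeholder for the result file name
for i, line in enumerate(infile):
    if i in skipping:
        continue
    outfile.write(line)
infile.close()
outfile.close()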
Here's another using Mark's method:
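import random
lines_in_file = 158609739
lines_left_in_file = lines_in_file
lines_to_delete = lines_in_file - 155000000
infile = open(input_path)         # input_path: placeholder for the source file name
outfile = open(output_path, 'w')  # output_path: placeholder for the result file name
try:
    for line in infile:
        # float() keeps this a true division on Python 2 as well
        current_probability = float(lines_to_delete) / lines_left_in_file
        lines_left_in_file -= 1
        if random.random() < current_probability:
            lines_to_delete -= 1
            continue
        outfile.write(line)
except ZeroDivisionError:
    print("More than %d lines in the file" % lines_in_file)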
I wrote this code before seeing that Darren Yin had described its principle.
I've modified my code to use the name skipping (I didn't dare choose kangaroo ...) and the keyword continue, both from Ethan Furman, whose code works on the same principle.
I defined default arguments for the function's parameters so that the function can be called several times without re-assigning them at each call.
import random
import os.path

def spurt(ff,skipping):
    skipping = set(skipping)  # O(1) membership tests
    for i,line in enumerate(ff):
        if i in skipping:
            print 'line %d excluded : %r' % (i,line)
            continue
        yield line

def randomly_reduce_file(filepath,nk = None,
                         d = {0:'st',1:'nd',2:'rd',3:'th'},spurt = spurt,
                         sample = random.sample,splitext = os.path.splitext):
    # count of the lines of the original file
    with open(filepath) as f:
        nl = sum(1 for _ in f)
    # asking for the number of lines to keep, if not given as argument
    if nk is None:
        nk = int(raw_input(' The file has %d lines.'
                           ' How many of them do you '
                           'want to randomly keep ? : ' % nl))
    # transfer of the lines to keep,
    # from one file to another file with different name
    if nk<nl:
        with open(filepath,'rb') as f,\
             open('COPY'.join(splitext(filepath)),'wb') as g:
            g.writelines( spurt(f,sample(xrange(0,nl),nl-nk) ) )
            # sample(xrange(0,nl),nl-nk) is the list
            # of the counting numbers of the lines to be excluded
    else:
        print ' %d is %s than the number of lines (%d) in the file\n'\
              ' no operation has been performed'\
              % (nk,'the same' if nk==nl else 'greater',nl)
With the $RANDOM variable you can get a random number between 0 and 32,767.
With this, you could read in each line, and see if $RANDOM is less than 155,000,000 / 158,609,739 * 32,767 (which is 32,021), and if so, let the line through.
Of course, this wouldn't give you exactly 155,000,000 lines, but pretty close to it, depending on the uniformity of the random number generator.
EDIT: Here is some code to get you started:
while IFS= read -r line; do
    if (( RANDOM < 32021 )); then
        printf '%s\n' "$line"
    fi
done
Call it like so: <inFile.txt >outFile.txt

Printing in a loop

I have the following file I'm trying to manipulate.
1 2 -3 5 10 8.2
5 8 5 4 0 6
4 3 2 3 -2 15
-3 4 0 2 4 2.33
2 1 1 1 2.5 0
0 2 6 0 8 5
The file just contains numbers.
I'm trying to write a program to subtract the rows from each other and print the results to a file. My program is below and, dtest.txt is the name of the input file. The name of the program is
from math import *
posnfile = open("dtest.txt","r")
posn = posnfile.readlines()
for i in range (len(posn)-1):
    for j in range (0,1):
        if (j == 0):
            Xp = float(posn[i].split()[0])
            Yp = float(posn[i].split()[1])
            Zp = float(posn[i].split()[2])
            Xc = float(posn[i+1].split()[0])
            Yc = float(posn[i+1].split()[1])
            Zc = float(posn[i+1].split()[2])
        else:
            Xp = float(posn[i].split()[3*j+1])
            Yp = float(posn[i].split()[3*j+2])
            Zp = float(posn[i].split()[3*j+3])
            Xc = float(posn[i+1].split()[3*j+1])
            Yc = float(posn[i+1].split()[3*j+2])
            Zc = float(posn[i+1].split()[3*j+3])
        Px = fabs(Xc-Xp)
        Py = fabs(Yc-Yp)
        Pz = fabs(Zc-Zp)
        print Px,Py,Pz
The program is calculating the values correctly but, when I try to call the program to write the output file,
mpipython > distance.dat
The output file (distance.dat) only contains 3 columns when it should contain 6. How do I tell the program to shift what columns to print to for each step j=0,1,....
For j = 0, the program should output to the first 3 columns, for j = 1 the program should output to the second 3 columns (3,4,5) and so on and so forth.
Finally the len function gives the number of rows in the input file but, what function gives the number of columns in the file?
Append a , to the end of your print statement and it will not print a newline, and then when you exit the for loop add an additional print to move to the next row:
for j in range (0,1):
    ...
    print Px,Py,Pz,
print
Assuming all rows have the same number of columns, you can get the number of columns by using len(row.split()).
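A small sketch of that column count, with a check for the "same number of columns" assumption. The function name column_count is hypothetical, and the sample rows are taken from the question's file.

```python
def column_count(lines):
    """Return the number of whitespace-separated columns shared by all rows."""
    widths = {len(line.split()) for line in lines if line.strip()}
    if len(widths) > 1:
        raise ValueError("rows have differing column counts: %s" % sorted(widths))
    return widths.pop() if widths else 0

rows = ["1 2 -3 5 10 8.2\n", "5 8 5 4 0 6\n", "4 3 2 3 -2 15\n"]
print(column_count(rows))
```

With that value, the inner loop could run over range(column_count(posn) // 3) instead of the fixed range(0, 1), which is an assumption about what the question's j is meant to do.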
Also, you can definitely shorten your code quite a bit, I'm not sure what the purpose of j is, but the following should be equivalent to what you're doing now:
for j in range (0,1):
    Xp, Yp, Zp = map(float, posn[i].split()[3*j:3*j+3])
    Xc, Yc, Zc = map(float, posn[i+1].split()[3*j:3*j+3])
You don't need to:
use numpy
read the whole file in at once
know how many columns
use awkward comma at end of print statement
use list subscripting
use math.fabs()
explicitly close your file
Try this (untested):
with open("dtest.txt", "r") as posnfile:
    previous = None
    for line in posnfile:
        current = [float(x) for x in line.split()]
        if previous:
            delta = [abs(c - p) for c, p in zip(current, previous)]
            print ' '.join(str(d) for d in delta)
        previous = current
Just in case your dtest.txt grows larger and you would rather write to distance.dat directly than redirect the output, here is a version using numpy. Thanks to @John for pointing out the mistake in my old code ;-)
import numpy as np
pos = np.genfromtxt("dtest.txt")
dis = np.array([np.abs(pos[j+1] - pos[j]) for j in xrange(len(pos)-1)])
np.savetxt("distance.dat", dis)  # write the differences straight to the output file
