I've started doing programming contests/challenges, and often the questions involve reading from standard input. I've been doing this:
import fileinput

inputLines = []
for line in fileinput.input():
    inputLines.append(line)
I can then do whatever calculations I need with inputLines. Is there a more Pythonic (i.e., better) way of doing this?
If you just want to read from stdin, not from any files named in the command line, then you should not use fileinput.
If you want a list containing the lines from stdin, then:
import sys
inputLines = list(sys.stdin)
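If the goal is contest input specifically, another common pattern is to slurp everything and split on whitespace; a minimal sketch (the int() conversion is just a guess about your data):
import sys

# read all of stdin at once and parse whitespace-separated tokens
tokens = sys.stdin.read().split()
numbers = [int(tok) for tok in tokens]  # assumes the tokens are integers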
Or, if you just want to process the lines one at a time:
import sys
for line in sys.stdin:
    print("The line was", line)
I think fileinput is the flexible way to do it in Python, given that they made a module just for this.
If you know more about what type of input you will be reading, there might be libraries better suited to your needs. For example, I do a lot of numerical work, so pandas works great for me because it has read_csv. Take a look at the docs (and try to tell us more about your specific reading needs, if you can narrow them down).
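For instance, a minimal sketch of reading CSV from stdin with pandas (the describe() call is just a stand-in for whatever you do with the data):
import sys
import pandas as pd

# read_csv accepts any file-like object, so sys.stdin works directly
df = pd.read_csv(sys.stdin)
print(df.describe())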
The documentation for the Python library Murmur is a bit sparse.
I have been trying to adapt the code from this answer:
import hashlib
from functools import partial

def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        for buf in iter(partial(f.read, 128), b''):
            d.update(buf)
    return d.hexdigest()

print(md5sum('utils.py'))
From what I read in that answer, MD5 can't operate on the whole file at once, so it needs this looping. I'm not sure exactly what happens on the line d.update(buf), however.
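For what it's worth, update() feeds more data into the running hash state, so hashing in chunks yields the same digest as hashing everything at once; a small sanity check:
import hashlib

d1 = hashlib.md5()
d1.update(b'hello ')
d1.update(b'world')  # update() appends to the running hash state

d2 = hashlib.md5(b'hello world')  # same data in one shot
assert d1.hexdigest() == d2.hexdigest()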
The public methods in hashlib.md5() are:
'block_size',
'copy',
'digest',
'digest_size',
'hexdigest',
'name',
'update'
whereas mmh3 has
'hash',
'hash64',
'hash_bytes'
No update or hexdigest methods.
Does anyone know how to achieve a similar result?
The motivation is testing for uniqueness as fast as possible; the results here suggest murmur is a good candidate.
Update:
Following the comment from @Bakuriu, I had a look at mmh3, which seems to be better documented.
The public methods inside it are:
import mmh3
print([x for x in dir(mmh3) if x[0]!='_'])
Output:
['hash', 'hash128', 'hash64', 'hash_bytes', 'hash_from_buffer']
...so there is no "update" method. I had a look at the source code for mmh3.hash_from_buffer, but it does not look like it contains a loop, and since it is not written in Python I can't really follow it. Here is a link to the line
So for now I will just use CRC-32, which is supposed to be almost as good for this purpose, and it is well documented how to do it. If anyone posts a solution, I will test it out.
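For reference, this is the kind of CRC-32 code I mean, a minimal sketch using zlib.crc32's running-value argument (the chunk size is an arbitrary choice):
import zlib

def crc32sum(filename, chunk_size=65536):
    # zlib.crc32 accepts the previous value, so it can hash incrementally
    crc = 0
    with open(filename, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF  # normalise to an unsigned 32-bit result

print(crc32sum('utils.py'))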
To hash a file using murmur, one has to load it completely into memory and hash it in one go.
import mmh3

with open('main.py', 'rb') as file:  # binary mode, so we hash the raw bytes
    data = file.read()

digest = mmh3.hash_bytes(data, 0xBEFFE)  # renamed to avoid shadowing the built-in hash()
print(digest.hex())
If your file is too large to fit into memory, you could use incremental/progressive hashing: add your data in multiple chunks and hash them on the fly (like your example above).
Is there a Python library for progressive hashing with murmur?
I tried to find one, but it seems there is none.
Is progressive hashing even possible with murmur?
There is a working implementation in C:
https://github.com/rurban/smhasher/blob/master/PMurHash.h
https://github.com/rurban/smhasher/blob/master/PMurHash.c
I am trying to run a rolling horizon optimisation where I have multiple optimisation scripts, each generating its own results. Instead of printing results to screen at every interval, I want to write each result using model.write("results.sol"), and then read the files back into a results-processing script (a separate Python script).
I have tried read("results.sol") in Python, but the file format is not recognised. Is there any way to read/process the .sol file format that Gurobi outputs? It would seem bizarre if you could not read the .sol file at some later point and generate plots etc.
Maybe I have missed something blindingly obvious.
Hard to answer without seeing your code, as we have to guess what you are doing.
But well...
When you use
model.write("out.sol")
Gurobi will use its own format to write it (the format is inferred from the file suffix).
This can easily be read by:
model.read("out.sol")
If you used
x = read("out.sol")
you are using Python's basic IO tools, and of course Python won't interpret that file with respect to the format. Furthermore, reading like that is text mode (and maybe binary is required; not sure).
General rule: if you wrote the solution using a method of the model class, then read it back using a method of the model class too.
The usage above is normally used to reinstate some state of your model (e.g. a MIP start). If you want to plot the solution, you will have to do further work. In this case, using Python's IO tools might be a good idea, and you should respect the format described here. This could be read as CSV or manually (and, contrary to my earlier remark: it is text mode, not binary).
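As an aside, if the solved model object is still in memory in the same process, the values can be pulled straight from the gurobipy API instead of round-tripping through a file; a sketch, with a made-up model file name:
import gurobipy as gp

model = gp.read('model.lp')  # hypothetical model file
model.optimize()
for v in model.getVars():
    print(v.VarName, v.X)  # each variable's name and its solution value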
So assuming the example from the link is in file gur.sol:
import csv

with open('gur.sol', newline='\n') as csvfile:
    reader = csv.reader((line.replace('  ', ' ') for line in csvfile), delimiter=' ')
    next(reader)  # skip header
    sol = {}
    for var, value in reader:
        sol[var] = float(value)
print(sol)
Output:
{'z': 0.2, 'x': 1.0, 'y': 0.5}
Remarks:
The code is ugly because Python's csv module has some limitations
The delimiter is two spaces in this format, and we have to hack the code to read it (as only one character is allowed for the delimiter)
The code might be tailored to Python 3 (which I'm using; the next() call will probably differ in Python 2)
pandas would be much, much better for this purpose (a huge tool with a very good read_csv); see the sketch below
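A sketch of that pandas variant (the column names are made up; any run of whitespace is treated as the delimiter, and the header line is skipped):
import pandas as pd

df = pd.read_csv('gur.sol', sep=r'\s+', skiprows=1, names=['var', 'value'])
sol = dict(zip(df['var'], df['value']))
print(sol)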
I am doing a small project in which I have to read a file from stdin. I am not sure what that means; when I asked the professor, he told me there is no need to open and close the file like we generally do:
sFile = open("file.txt", 'r')
I don't have to pass the file as an argument. I am kind of confused about what he wants.
stdin takes input from different sources, depending on what input it is given.
Given a very simple bit of code for illustration (let's call it script.py):
import sys
text = sys.stdin.read()
print(text)
You can pipe the input file into your script like so:
$ more file.txt | python script.py
In this case, the output of the first part of the pipeline, which is the content of the file, is assigned to our variable (here text, which eventually gets printed out).
When invoked without any piped input, like so:
$ python script.py
it lets you type the input, similar to the input() function, and assigns what you type to the variable (note that stdin stays open until you explicitly close it, which is usually done with Ctrl+D).
Import sys; then sys.stdin will be the 'file' you want, which you can use like any other file (e.g. sys.stdin.read()), and you don't have to close it. stdin means "standard input".
It might be helpful to read through this post, which seems similar to yours.
'stdin' in this case is the stream your script reads when a file is redirected or piped into it, e.g. python script.py < input_file. That input_file would be the file containing whatever data you are working on. (With the fileinput approach below, you can also pass the file name as a plain argument: python script.py input_file.)
So, you're probably wondering how to read stdin. There are a couple of options. The one suggested in the thread linked above goes as follows:
import fileinput

for line in fileinput.input():
    # process each line of the input here
    print(line.rstrip())
There are other ways, of course, but I think I'll leave you to it. Check the linked post for more information.
Depending on the context of your assignment, stdin may be automatically sent into the script, or you may have to do it manually as detailed above.
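To make the distinction concrete (input_file stands for whatever data file you have):
$ python script.py < input_file
$ python script.py input_file
The first form feeds the file's contents to the script's stdin; in the second, fileinput.input() opens the named file for you.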
I want to pass an input fasta file, stored in a variable say inp_a, from Python to bowtie, and write the output into another variable, out_a. I want to use:
os.system('bowtie [options] inp_a out_a')
Can you help me out?
Your question asks for two things, as far as I can tell: writing data to disk, and calling an external program from within Python. Without more detailed requirements, here's what I would write:
import subprocess

data_for_bowtie = b"some genome data, lol"  # bytes, since the file is opened in binary mode

with open("input.fasta", "wb") as input_file:
    input_file.write(data_for_bowtie)

subprocess.call(["bowtie", "input.fasta", "output.something"])
There are some fine details here which I have assumed. I'm assuming that you mean bowtie, the read aligner. I'm assuming that your file is a binary, non-human-readable one (which is why there's that b in the second argument to open) and I'm making baseless assumptions about how to call bowtie on the command line because I'm not motivated enough to spend the time learning it.
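If you are on Python 3.5 or newer, a slightly more modern variant (same made-up file names) raises an error when bowtie exits with a non-zero status:
import subprocess

# check=True turns a non-zero exit status into a CalledProcessError
subprocess.run(["bowtie", "input.fasta", "output.something"], check=True)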
Hopefully, that provides a starting point. Good luck!
I'm trying to read a huge number of lines from standard input with Python.
more hugefile.txt | python readstdin.py
The problem is that the program freezes as soon as I've read just a single line.
import sys
print sys.stdin.read(8)
exit(1)
This prints the first 8 bytes, but then I expect it to terminate, which it never does. I think it's not really reading just the first bytes but trying to read the whole file into memory.
Same problem with sys.stdin.readline().
What I really want to do, of course, is read all the lines, but with a buffer so I don't run out of memory.
I'm using Python 2.6.
This should work efficiently in a modern Python:
import sys
for line in sys.stdin:
    # do something...
    print line,
You can then run the script like this:
python readstdin.py < hugefile.txt
Back in the day, you had to use xreadlines to get efficient huge-file line-at-a-time IO; the docs now ask that you use for line in file instead.
Of course, this helps only if you're actually working on the lines one at a time. If you're just reading big binary blobs to pass on to something else, then another mechanism might be just as efficient.
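For that blob case, a sketch of bounded-memory reading (the chunk size is arbitrary):
import sys

while True:
    chunk = sys.stdin.read(65536)  # read at most 64 KiB at a time
    if not chunk:  # an empty result means EOF
        break
    sys.stdout.write(chunk)  # stand-in for whatever consumes the chunk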