Python doesn't deal with char string as expected


I am dealing with a long char string (elements with values 0..255), read directly from a file. I need to divide the string into chunks of 8 bytes. I'd expect this to work:
rawindex = file.read()
for chunk in rawindex[::8]:
    print >>sys.stderr, len(chunk)
...but the len() always returns 1. What am I doing wrong?
More info:
* this is not homework
* I could play with range(,,8), but I would really like to know why the above example doesn't work

The 'step' parameter in a slice just selects every 8th element; it doesn't take 8 elements at once. Your code should look like:
step = 8
rawindex = file.read()
for index in range(0, len(rawindex), step):
    print >>sys.stderr, len(rawindex[index:index+step])
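For Python 3, where reading a file opened with "rb" returns bytes, the same range-and-slice idea can be wrapped in a small helper. A minimal sketch; `chunks` is a hypothetical name and the 20-byte input stands in for the real file contents:

```python
def chunks(data, size=8):
    """Return consecutive slices of `data`, each `size` items long (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

raw = bytes(range(20))        # stand-in for open(path, "rb").read()
for chunk in chunks(raw):
    print(len(chunk))         # prints 8, 8, 4
```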

You could just read chunks of the correct size yourself.
read_chunk = lambda: my_file.read(8)
for chunk in iter(read_chunk, ''):
    print len(chunk)
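The two-argument form of iter() keeps calling the function until it returns the sentinel value. In Python 3, a file opened in binary mode returns b'' (not '') at end of file, so the sentinel has to be b'' or the loop never stops. A sketch using functools.partial instead of a lambda; `read_chunks` is a hypothetical helper and io.BytesIO stands in for a real file:

```python
import io
from functools import partial

def read_chunks(f, size=8):
    """Return an iterator over successive `size`-byte chunks of the binary file object `f`."""
    # iter(callable, sentinel) calls f.read(size) repeatedly until it returns b"" (EOF)
    return iter(partial(f.read, size), b"")

my_file = io.BytesIO(bytes(range(20)))   # replace with open(path, "rb")
for chunk in read_chunks(my_file):
    print(len(chunk))                    # prints 8, 8, 4
```

This reads only 8 bytes at a time, so the whole file never has to be in memory.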

rawindex = file.read()  # returns the file contents as a string, for example rawindex = 'abcdefghijk'
rawindex[::8]  # returns every eighth character of that string, so the result is 'ai'
So effectively the for loop becomes: for chunk in 'ai':
In the first iteration chunk is 'a', and len('a') is 1. So len(chunk) always returns 1.
I think you should use range instead of rawindex[::8].
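A quick check with a short string shows what a stepped slice actually produces: one character per step, so every element the loop sees has length 1.

```python
s = "abcdefghijk"
picked = s[::8]                   # characters at index 0 and 8
print(picked)                     # ai
print([len(c) for c in picked])   # [1, 1]
```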

Related

How to find the largest numbers in each line in a file

Write a program that opens the following file (numbers1.txt), which contains 4 lines of numbers as follows:
numbers1.txt
100 900
-3.2 25.9
30 11
(empty line)
200 500
The program should read the two numbers in every line and print the maximum using the max() function.
The program should print a message when there isn't a number.
My code doesn't work:
f=open('numbers1.txt','r')
list1=f.readlines()
for i in list1:
    print(max(i))

What is the error?
Try this:
for i in list1:
    print(max(i.strip()))
This assumes there are no empty lines.

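The argument of max() should be a sequence of the values to compare. A string is a sequence of characters, so use split() to turn the line into a list of number strings.
for i in list1:
    if len(i) != 1:  # not an empty line (an empty line is just '\n')
        print(max(i.split()))
Don't forget to close the file, or use with open('numbers1.txt') as f:, which closes the file automatically.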
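To see why the original max(i) misbehaves: a string is itself a sequence of characters, so max() returns the largest character. Splitting still leaves strings, which compare lexicographically; converting to float gives a numeric comparison.

```python
line = "100 900\n"
print(max(line))                             # '9': the largest character
print(max(line.split()))                     # '900': largest string, compared lexicographically
print(max(float(v) for v in line.split()))   # 900.0: numeric comparison
print(max("9 10".split()))                   # '9': string comparison gets this one wrong
```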

Your code has some issues:
* max() is called with a single string argument (for example '100 900\n'), so it iterates over the characters and returns the largest character instead of the larger number. Use line.split() to get the two values.
* The string values should be converted to float before calling max(); otherwise they are compared as strings.
* You should handle empty lines.
* You should close the file after reading.
This code should fix the issues:
with open('numbers1.txt', 'r') as f:
    lines = f.readlines()
for line in lines:
    if line.strip() == '':
        continue
    n1, n2 = line.split()
    max_value = max(float(n1), float(n2))
    print(max_value)
The snippet could be shorter; I added the extra lines for teaching purposes.
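The snippet above skips empty lines, while the assignment asks to print a message when there isn't a number. A sketch that covers that requirement; the helper name print_line_maxima and the message wording are my own, and io.StringIO stands in for numbers1.txt. A line containing non-numeric text would still raise ValueError.

```python
import io

def print_line_maxima(lines):
    """Print the largest number on each line, or a message for lines with no numbers.

    `lines` can be an open file or any iterable of strings.
    Returns the list of maxima that were printed (handy for testing).
    """
    results = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            print("Line %d: there is no number" % line_no)
            continue
        value = max(float(v) for v in fields)   # numeric, not string, comparison
        print(value)
        results.append(value)
    return results

# Replace with: with open("numbers1.txt") as f: print_line_maxima(f)
sample = io.StringIO("100 900\n-3.2 25.9\n30 11\n\n200 500\n")
print_line_maxima(sample)
```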

Another solution could be:
n = len(list1)
for i in range(0, n):
    x = list1[i].split()
    if not x:  # skip if line is empty
        continue
    print(max(x))

What does .readline() return in Python?

EDIT (entire post rewritten to make the problem clearer):
s = "GATATATGCATATACTT"
t = "ATAT"
for i in range(len(s)):
    if t == s[i:i+len(t)]:
        print i+1,
So the purpose of the program above is to scan the long DNA string (s) for the short DNA string (t), to find the positions in s where t matches. The output of the above code is:
2 4 10
These are the positions in string s where string t matches; as the code shows, it prints i+1 to give 1-based numbering.
The problem I'm having is that when I change the code to read the values for s and t from a file, readline() doesn't work for me. The motif.txt file contains two strings of DNA, one per line.
with open('txt/motif.txt', 'r') as f:
    s = f.readline()
    t = f.readline()
for i in range(len(s)):
    if t == s[i:i+len(t)]:
        print i+1,
This code, on the other hand, outputs nothing at all. But when I change t to:
t = f.readline().strip()
Then the program outputs the same result as the first example did.
So I hope this has made things clearer. My question is: if readline() returns a string, why doesn't my program in example 2 work the same way as the very first example?

Your problem statement seems wrong: there's no way s or t has more content (len(s) > 0 or len(t) > 0) in the first example than in the second.
Basically, with:
s = f.readline()
s will contain a string like "foobar \n", and thus len(s) will be 8.
Then with:
s = f.readline().strip()
on the same line, len(s) will be 6 because the stripped string is "foobar".
So if your line contains only whitespace, like s = " \n", s.strip() will be the empty string "", with len(s) == 0.
In that case your loop won't start and will never print anything.
In almost all the other cases I can think of, you should get an exception raised, not a silent exit.
But to be honest, your code is bad because nobody can understand what you want to do from reading it (including you in six months).
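Another way to see the difference: readline() keeps the trailing newline, so in example 2 t is 'ATAT\n', and that string never appears as a slice of s. Stripping t (or calling rstrip('\n')) restores the first example's behavior. A sketch with io.StringIO standing in for txt/motif.txt; `positions` is a hypothetical helper that mirrors the original loop:

```python
import io

# Stand-in for open('txt/motif.txt'): two lines, each ending in "\n"
f = io.StringIO("GATATATGCATATACTT\nATAT\n")
s = f.readline()
t = f.readline()
print(repr(t))          # 'ATAT\n': readline() keeps the line terminator

def positions(s, t):
    """1-based positions where t occurs in s (overlapping matches included)."""
    return [i + 1 for i in range(len(s)) if s[i:i + len(t)] == t]

print(positions(s, t))                  # []: 'ATAT\n' never occurs in s
print(positions(s.strip(), t.strip()))  # [2, 4, 10]
```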

How to loop over every 2 characters in a file in python

I'm trying to loop over every 2 characters in a file, do some tasks on them, and write the resulting characters into another file.
So I tried to open the file and read the first two characters. Then I set the pointer on the 3rd character in the file, but it gives me the following error:
'bytes' object has no attribute 'seek'
This is my code:
the_file = open('E:\\test.txt',"rb").read()
result = open('E:\\result.txt',"w+")
n = 0
s = 2
m = len(the_file)
while n < m :
    chars = the_file.seek(n)
    chars.read(s)
    #do something with chars
    result.write(chars)
    n =+ 1
    m =+ 2
I should mention that test.txt contains only integers (numbers).
The content of test.txt is a series of binary data (0's and 1's) like this:
01001010101000001000100010001100010110100110001001011100011010000001010001001
Although it's not the point here, I just want to replace every 2 characters with something else and write the result into result.txt.

* Call seek() on the file object, not on its contents.
* Use an if statement to break out of the loop, since you don't know the length in advance.
* Use n += ..., not n =+ ... (n =+ 1 just assigns +1 to n).
* Finally, seek 2 bytes further each time and read 2 bytes.
Hopefully this will get you close to what you want.
Note: I changed the file names for the example
the_file = open('test.txt', "rb")
result = open('result.txt', "w+")
n = 0
s = 2
while True:
    the_file.seek(n)
    chars = the_file.read(s)
    if not chars:  # empty read means end of file
        break
    #do something with chars
    print chars
    result.write(chars)
    n += s
the_file.close()
result.close()
Note that because in this case you are reading the file sequentially in chunks, i.e. read(2) rather than read(), the seek() is superfluous.
seek() is only required if you want to move the position pointer within the file, for example to start reading at the 100th byte (seek(99)).
The above could be written as:
the_file = open('test.txt', "rb")
result = open('result.txt', "w+")
while True:
    chars = the_file.read(2)
    if not chars:  # empty read means end of file
        break
    #do something with chars
    print chars
    result.write(chars)
the_file.close()
result.close()
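In Python 3 (where the 'bytes' error comes from), the same sequential loop can use with blocks so both files are closed automatically. A sketch under assumptions: `process_pairs` and the 2-bit lookup table are hypothetical placeholders for "do something with chars", and io.BytesIO stands in for the real input and output files:

```python
import io

def process_pairs(src, dst, transform):
    """Read `src` two bytes at a time and write transform(pair) to `dst`.

    `src` and `dst` are binary file objects; the last pair may be a single byte.
    """
    while True:
        chars = src.read(2)      # sequential reads, so no seek() is needed
        if not chars:            # b"" means end of file
            break
        dst.write(transform(chars))

# Hypothetical transform: map each 2-digit pair to a letter; unknown pairs pass through
table = {b"00": b"A", b"01": b"C", b"10": b"G", b"11": b"T"}

# With real files: open("test.txt", "rb") and open("result.txt", "wb")
with io.BytesIO(b"0100101") as src, io.BytesIO() as dst:
    process_pairs(src, dst, lambda pair: table.get(pair, pair))
    print(dst.getvalue())   # b'CAG1'
```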

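You were trying to call .seek() on a string because you thought it was a file object, but the .read() method of a file returns the file's contents as a string (bytes in Python 3 when the file is opened in "rb" mode); it doesn't give you back a file.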
Here's a general approach I might take to what you were going for:
# open the file and load its contents as a string file_contents
with open('E:\\test.txt', "r") as f:
    file_contents = f.read()
# do the stuff you were doing
m = len(file_contents)
# initialize a result string
result = ""
# iterate over file_contents in steps of 2, keeping the first character of each pair
for i in xrange(0, m, 2):
    result += file_contents[i]
# write to result.txt
with open('E:\\result.txt', 'wb') as f:
    f.write(result)
Edit: It seems like there was a change to the question. If you want to change every second character, you'll need to make some adjustments.
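For the edited question (replace every 2 characters with something else), one sketch is to slice the contents in steps of 2 and look each pair up in a dictionary. The mapping and the helper name `replace_pairs` are placeholders; the question doesn't say what the replacements should be.

```python
def replace_pairs(text, mapping):
    """Split `text` into 2-character pieces and replace each piece via `mapping`.

    Pieces missing from `mapping` (including a trailing single character) are kept as-is.
    """
    pairs = (text[i:i + 2] for i in range(0, len(text), 2))
    return "".join(mapping.get(p, p) for p in pairs)

# Hypothetical mapping; substitute whatever replacement you need
mapping = {"00": "a", "01": "b", "10": "c", "11": "d"}
print(replace_pairs("0100101", mapping))   # bac1
```

Building the result with "".join() avoids repeated string concatenation, which matters for large files.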

Fetching address and values of opened file

I need to read a particular byte from a big binary file using Python. Using f.seek() takes a long time. Is there any way to get the address of the first byte of the file and then add an offset to reach a particular byte in Python?
For example, given a text file containing
asddfrgd
Get the address of a, add 5, and then fetch the value there (which is 'r', assuming one byte per letter).

Your description is not very clear. I assume that you want to fetch all values that are 5 bytes after an "a" in your example, such that "aardvark" gets "a" and "r" and the last "a" is skipped, because adding 5 goes beyond the end of the string.
Here's a solution that returns a list of such values by scanning the file linearly without jumping, byte by byte:
def find_past(fn, which, step):
    """ Read file 'fn' and return all bytes that lie 'step' bytes after
    each occurrence of the byte 'which' (in Python 3, pass bytes, e.g. b"a").
    """
    res = []      # list of result bytes
    pending = []  # byte addresses still to collect, in increasing order
    n = 0         # current byte address
    with open(fn, "rb") as f:
        while True:
            c = f.read(1)
            if not c:  # empty read means end of file
                break
            if pending and pending[0] == n:
                res.append(c)
                pending.pop(0)
            if c == which:
                pending.append(n + step)
            n += 1
    return res
Keeping track of the lists and byte offsets should be cheaper than f.seek(), but I haven't tried that on large data.
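A different option, not from the answer above, is the standard mmap module: it maps the file into memory, so a byte can be fetched by offset with plain indexing or slicing, and the OS pages in only the parts you touch. Whether this beats seek() plus read(1) for your access pattern is something to measure. A sketch; `byte_at` is a hypothetical helper, and a temporary file holds the example text:

```python
import mmap
import os
import tempfile

def byte_at(path, offset):
    """Return the byte at `offset` in the file at `path`, via a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file (mmap raises ValueError for an empty file)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + 1]   # slicing an mmap returns bytes

# Demo with a temporary file holding the example text
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"asddfrgd")
try:
    first_a = 0                             # offset of "a"
    print(byte_at(tmp.name, first_a + 5))   # b'r'
finally:
    os.remove(tmp.name)
```

For many lookups, keep the file and map open instead of reopening them on every call.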

Reading one integer at a time using python

How can I read ints from a file? I have a large (512 MB) text file, which contains integer data like this:
0 0 0 10 5 0 0 140
0 20 6 0 9 5 0 0
Now if I use c = file.read(1), I get only one character at a time, but I need one integer at a time. Like:
c = 0
c = 10
c = 5
c = 140 and so on...
Any great heart please help. Thanks in advance.

Here's one way:
with open('in.txt', 'r') as f:
    for line in f:
        for s in line.split():  # split() with no argument handles repeated spaces and the newline
            num = int(s)
            print num
By doing for line in f you read the file line by line (using neither read() nor readlines()). This matters because your file is large.
Then you split each line on whitespace and convert each number as you go.
You can add more error checking than this simple example does; as written, it raises ValueError if the file contains corrupted data.
As the comments say, this should be enough for you. If your file might have extremely long lines, you can do something trickier, such as reading blocks at a time.

512 MB is really not that large. If you're going to create a list of the data anyway, I don't see a problem with doing the reading step in one go:
my_int_list = [int(v) for v in open('myfile.txt').read().split()]
If you can structure your code so you don't need the entire list in memory, it is better to use a generator:
def my_ints(fname):
    for line in open(fname):
        for val in line.split():
            yield int(val)
and then use it:
for c in my_ints('myfile.txt'):
    print(c)  # do something with c (which is the next int)

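I would do it this way:
buffer = file.read(8192)
contents += buffer
Split contents on whitespace.
Remove the last element from the list (it might not be a complete number) and keep it, unless the buffer ended in whitespace.
Replace contents with that last element string.
Repeat until buffer is empty (read() returns an empty string at end of file, not None), then parse whatever is left in contents.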
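The block-reading steps above can be sketched as a generator. This version also handles the two edge cases the outline has to deal with: a block that ends exactly on whitespace (the last token is then complete) and the final number left over after the last read. `read_ints` and `block_size` are illustrative names, and io.StringIO stands in for the real file.

```python
import io

def read_ints(f, block_size=8192):
    """Yield integers from the whitespace-separated text file `f`, reading `block_size` characters at a time."""
    leftover = ""
    while True:
        buffer = f.read(block_size)
        if not buffer:                  # read() returns "" at end of file, not None
            break
        parts = (leftover + buffer).split()
        # If the block doesn't end in whitespace, the last token may be cut off: keep it for the next round
        if parts and not buffer[-1].isspace():
            leftover = parts.pop()
        else:
            leftover = ""
        for p in parts:
            yield int(p)
    if leftover:                        # final number when the file has no trailing whitespace
        yield int(leftover)

data = io.StringIO("0 0 0 10 5 0 0 140\n0 20 6 0 9 5 0 0\n")
for c in read_ints(data, block_size=5):
    print(c)
```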
