Recoding of binary file - python

I have a file with the following contents:
1 5 9 14 15
00000
10000
00010
11010
00010
I want to parse the file so that the following is output:
UUUUUUUUUUUUUU
YUUUUUUUUUUUUU
UUUUUUUUUUUUYY
YUUUYUUUUUUUYU
UUUUUUUUUUUUYU
The first row provides the column positions. In the rows below, a 0 becomes U and a 1 becomes Y. The columns between the first two listed positions are unmapped, which means that for these columns every row is U, as for a 0.
I tried the following in Python:
#!/usr/bin/env python2
import sys
with open(sys.argv[1]) as f:
    f.readline()
    for line in f:
        new = ''
        for char in line.rstrip():
            if char == '0':
                new += 'UU'
            elif char == '1':
                new += 'YU'
        print new.rstrip()[:-1]
The problem is that this script only works if the positions are 2 apart, but the gaps can also be larger. How can I extend the script?
There is a problem when I run the code from Delimity; I get an error with the real data (dropbox.com/s/cf8rbv20bgyvssq/conv_inp?dl=0):
Traceback (most recent call last):
  File "./con.py", line 8, in <module>
    for v in xrange(max(positions) + 1):
OverflowError: long int too large to convert to int

Just a guess.
Implement the converter:
def convert(s):
    return "UUU".join({"0": "U", "1": "Y"}[c] for c in s[:-1]) + "U"
And test it:
assert convert("00000") == "UUUUUUUUUUUUUU"
assert convert("10000") == "YUUUUUUUUUUUUU"
assert convert("00010") == "UUUUUUUUUUUUYU"
assert convert("11010") == "YUUUYUUUUUUUYU"
assert convert("00010") == "UUUUUUUUUUUUYU"

Check this code:
#!/usr/bin/env python2
import sys
def myxrange(to):
    x = 0
    while x < to:
        yield x
        x += 1
with open(sys.argv[1]) as f:
    positions = map(lambda x: long(x) - 1, f.readline().split())
    max_pos = max(positions)
    for line in f:
        new = ''
        for i in myxrange(max_pos + 1):
            if i in positions and line[positions.index(i)] == '1':
                new += 'Y'
            else:
                new += 'U'
        print new.rstrip()
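For reference, the same idea can be sketched in Python 3 as a small function that takes the header line and the data rows as strings. This is only a sketch under the assumption, shared with the answer above, that the header holds 1-based column positions. The name `recode` and its parameters are hypothetical and do not come from the question. It builds every output row in full, so very large positions such as those in the real data would probably still need a different representation.

```python
def recode(header, rows):
    """Expand each 0/1 row to a U/Y string, one character per column.

    Assumption: `header` lists 1-based column positions, one per flag.
    """
    positions = [int(p) for p in header.split()]
    width = max(positions)
    result = []
    for row in rows:
        chars = ["U"] * width          # unmapped columns stay 'U'
        for pos, flag in zip(positions, row.strip()):
            if flag == "1":
                chars[pos - 1] = "Y"   # convert 1-based position to index
        result.append("".join(chars))
    return result


for text in recode("1 5 9 14 15", ["00000", "10000", "11010"]):
    print(text)
```

Looking each position up by index avoids the repeated `i in positions` list scan of the loop-based version.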


How can I change this function so that it returns a list of the number of even digits in a file?

def evens(number_file: TextIO) -> List[int]:
    lst = []
    line = number_file.readline().strip()
    while line != '':
        evens = 0
        line = number_file.readline().strip()
        while line.isdigit():
            evens = evens + int(line)
            line = number_file.readline().strip()
        lst.append(evens)
    return last
In this example, the file 'numbers.txt' looks like this:
START
1
2
END
START
3
END
START
4
5
6
END
Each line is either an int, 'START', or 'END'.
I want to write a function that returns the number of even numbers in each section, so when the code runs on this file it should return the list [1, 0, 2]. Could someone please help me?
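import numpy as np
def main():
    line = [1, 4, 5, 6, 8, 8, 9, 10]
    # np.array() needs an initial sequence; start from an empty integer array
    evens = np.array([], dtype=int)
    for l in line:
        if (l % 2) == 0:
            # np.append returns a new array rather than modifying in place
            evens = np.append(evens, l)
    return evens
if __name__ == '__main__':
    print(main())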
Without your txt file and the code to read it in, this is the best I can do.
When writing new code, it's a good idea to start small and then add capabilities bit by bit. For example, write code that works without worrying about sections first, then update it to work with sections.
But anyway, here's some revised code that should work:
from typing import List, TextIO
def evens(number_file: TextIO) -> List[int]:
    lst = []
    line = number_file.readline().strip()
    while line != '':
        evens = 0
        # ignore all text lines, stopping at end of file
        while line != '' and not line.isdigit():
            line = number_file.readline().strip()
        if line == '':
            break  # no numbers left, so there is no further section to count
        # process number lines
        while line.isdigit():
            if int(line) % 2 == 0:
                evens = evens + 1
            line = number_file.readline().strip()
        lst.append(evens)
    return lst
And here's a version that may be simpler (for loops in Python can often make your life easier, e.g. when processing every row of a file):
def evens(number_file: TextIO) -> List[int]:
    lst = []
    for row in number_file:
        line = row.strip()
        if line == 'START':
            # new section, start new count
            lst.append(0)
        elif line.isdigit() and int(line) % 2 == 0:
            # update current count (last item in lst)
            lst[-1] += 1
    return lst
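To try the for-loop approach without a file on disk, the sample data can be fed in through `io.StringIO`. This is a minimal sketch: the name `count_evens_per_section` is hypothetical, and the extra `counts` check (ignoring numbers that appear before any 'START') is an assumption added here, not part of the answer above.

```python
import io
from typing import List, TextIO


def count_evens_per_section(number_file: TextIO) -> List[int]:
    counts: List[int] = []
    for row in number_file:
        line = row.strip()
        if line == "START":
            counts.append(0)           # open a new section
        elif line.isdigit() and counts and int(line) % 2 == 0:
            counts[-1] += 1            # even number in the current section
    return counts


# Same layout as numbers.txt in the question.
sample = io.StringIO("START\n1\n2\nEND\nSTART\n3\nEND\nSTART\n4\n5\n6\nEND\n")
print(count_evens_per_section(sample))  # [1, 0, 2]
```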

String index out of range, solution is working fine

I'm trying to change all the characters to the symbol "#" except the last 4. The error I get:
Traceback (most recent call last):
  File "tests.py", line 16, in <module>
    r = maskify(cc)
  File "/workspace/default/solution.py", line 19, in maskify
    n[i] = c[i]
IndexError: string index out of range
The code:
c = "3656013700"
cc = list(c)
a = (len(cc)-1) - 4
b = []
def maskify(cc):
    n = list(len(cc) * "#")
    while len(cc) <= 4:
        return str(cc)
        break
    else:
        for i in range(len(cc)):
            if i <= a:
                n[i] = n[i]
            else:
                n[i] = c[i]
        b = "".join([str(i) for i in n])
        return b
maskify(cc)
I don't know why you're complicating it. As @blorgon has already pointed out, you can return the result directly in one line. To simplify further, you don't even need to convert the string into a list; just pass the string directly as the argument.
c = "3656013700"
def maskify(c):
    return (len(c) - 4) * "#" + c[-4:]
print(maskify(c))
If this is not what you're trying to achieve, your question is unclear.
You are taking a complicated route. Here is another approach:
c = "3656013700"
def maskify(cc):
    # count=1 so that only the leading occurrence of the prefix is replaced
    return cc.replace(cc[:-4], '#'*(len(cc)-4), 1)
print(maskify(c))
Output:
######3700
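One thing to watch with a `str.replace`-based approach: without a count argument, `replace` substitutes every occurrence of the prefix, so an input whose last four characters repeat the prefix gets masked completely. The sketch below compares that behaviour with plain slicing; the function names `maskify_replace` and `maskify_slice` are hypothetical and exist only for this comparison.

```python
def maskify_replace(cc):
    # Replaces ALL occurrences of the prefix, not only the leading one.
    return cc.replace(cc[:-4], "#" * (len(cc) - 4))


def maskify_slice(cc):
    # Builds the mask by length, then appends the last four characters.
    return "#" * (len(cc) - 4) + cc[-4:]


print(maskify_replace("12121212"))  # ########
print(maskify_slice("12121212"))    # ####1212
print(maskify_slice("123"))         # 123
```

For inputs shorter than four characters the slicing version returns the input unchanged, because multiplying a string by a negative number yields an empty string.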
If you want to go with the iterative approach you can simplify the code substantially:
def maskify(c):
    # Check if length of string is 4 or less
    if len(c) <= 4:
        return c
    else:
        # string into a list
        cc = list(c)
        # Iterate over the list, except the last 4 items
        for i in range(len(c)-4):
            cc[i] = "#"
        return "".join(cc)
s = "3656013700"
s = maskify(s)
print(s)

Python Index error using generator with yield

I'm running this code and I get the values I want from it, but there is also an IndexError: tuple index out of range for lines 12 and 18
import statistics as st
def squares(*args):
    i = 0
    val = []
    fin = []
    val = args
    while True:
        avg = (st.mean(val))
        fin = (avg - val[i]) ** 2  # line 12
        yield fin
        i += 1
mylist = squares(3, 4, 5)
for x in mylist:  # line 18
    print(x)
result:
1
0
1
Traceback (most recent call last):
  File line 18, in <module>
    for x in mylist:
  File line 12, in squares
    fin = (avg - val[i]) ** 2
IndexError: tuple index out of range
Based on your code, there are a few variables and constructs that I think you could change, as in the example below. I commented out your old code so you can see the differences.
import statistics as st
def squares(*args):
    #i = 0
    #val = []
    fin = []
    val = args
    for n in val:
        avg = (st.mean(val))
        fin = (avg - n) ** 2  # line 12, #val[i]
        #i += 1
        yield fin
mylist = squares(3, 4, 5)
for x in mylist:  # line 18
    print(x)
I can see that you are trying to access every value of val with fin = (avg - val[i]) ** 2. You can use a for loop instead, which removes the need for the index variable i. Also, what @schwobaseggl said is correct: you get IndexError: tuple index out of range because you keep incrementing i until it points past the end of val.
You can simplify the generator function:
import statistics as st
def squares(*args):
    avg = st.mean(args)
    for arg in args:
        yield (avg - arg) ** 2
Note that in your original, you have an infinite loop (while True) that you never break and that keeps incrementing index i while the i-accessed sequence val does not grow. That was always an IndexError waiting to happen.
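As a quick check, the simplified generator can be consumed with `list()` or with `next()`. The sketch below repeats the function so that it runs on its own; the `None` default passed to `next()` is just one way to show that the generator now finishes cleanly instead of raising an IndexError.

```python
import statistics as st


def squares(*args):
    avg = st.mean(args)            # computed once, not on every iteration
    for arg in args:
        yield (avg - arg) ** 2


print(list(squares(3, 4, 5)))      # [1, 0, 1]

gen = squares(3, 4, 5)
for _ in range(3):
    next(gen)                      # consume the three values
print(next(gen, None))             # None: the generator is exhausted
```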

How to sum line by line in python [closed]

I am new at this and couldn't search my exact problem. Though it may be simple to many of you, it is certainly a bowl full of aggravation to me. So, here it is...
I have a text file with columns and rows. The columns are string and the rows are numeric.
EXAMPLE: text file.
Line 1: a.1 2.g 2.2 b.3
Line 2: --------------------
Line 3: 1 2 4 1
Line 4: 3 3 1 1
Line 5: 2 1 5 8
I need to read the text file and do some simple math.
1. sum rows 3,4,5.
The final result should look like the following:
FINAL
Line 1:
Line 2:
Line 3: 8
Line 4: 8
Line 5: 16
Here is what I have so far:
files = open("exam-grades.txt", "r")
line_number = 1
for line in files:
    if (line_number == 1 or line_number == 2):
        continue
    else:
        sum = 0
        numbers = line.split("\t")
        for n in numbers:
            sum = sum + float(n)
        print "Line #: %d / Sum is %d ."%(line_number,sum)
    line_number = line_number + 1
line_number is 1 initially, so it hits the first if statement and continues without incrementing line_number, and thus does the same thing for every line, and doesn't process any of them.
You could correct for this by replacing the top of your loop like so:
for line in files:
    if line_number>=3:
        sum = 0
        ...
with open("exam-grades.txt", "r") as f:
    for i, line in enumerate(f, start=1):
        if i >= 3:
            l = line.split()
            print l[0], l[1], sum((int(i) for i in l[2:] if i.isdigit()))
The logic here is simple: enumerate the lines of the file starting from 1. For each line numbered 3 or greater, split the line into a list, print the first two elements, and sum the remaining elements that can be converted to int.
I would try to make it bit more flexible - i.e. we won't sum up numbers for lines where at least one value can't be converted to float/number:
def to_number(s):
    try:
        return float(s)
    except ValueError:
        return None
def sum_in_line(s):
    sm = 0
    cols = s.split()
    try:
        return sum(to_number(x) for x in cols)
    except TypeError:
        return ''
with open("exam-grades.txt", "r") as f:
    i = 1
    for line in f:
        print('%4d:\t%s' % (i, sum_in_line(line)))
        i += 1
Input data:
a.1 2.g 2.2 b.3
--------------------
1 2 4 1
3 3 1 1
2 1 5 8
1 a Z 12
Output:
1:
2:
3: 8.0
4: 8.0
5: 16.0
6:
1) Use pass instead of continue. If you use continue, line_number will not be updated.
2) To get the numbers, use first_part, numbers = line.split(':'); numbers = numbers.split(),
because if you try to get the numbers the way you did, it will fail:
>>> sum = 0
>>> line="Line 3: 1 2 4 1"
>>> numbers = line.split("\t")
>>> numbers
['Line 3:', '1', '2', '4', '1']
>>> for n in numbers:
...     sum = sum + float(n)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: could not convert string to float: Line 3:
You should skip the first element 'Line 3:'.
Try this:
files = open("exam-grades.txt", "r")
line_number = 1
for line in files:
    if (line_number == 1 or line_number == 2 or line_number == 3):
        print "Line %d:" % line_number
        pass
    else:
        sum = 0
        first_part, numbers = line.split(":")
        for n in numbers.split():
            sum = sum + float(n)
        print "Line %d: Sum is %d" % (line_number,sum)
    line_number = line_number + 1
This seems like a natural fit for regular expressions to me, depending on the complexity of your numbers. In this example, data is a generic iterator over your input lines:
import re
valid = re.compile(r'^(?:\s*(\d+\s*))+\s*$')
numbers = re.compile(r'(\d+)')
for line,row in enumerate(data):
    if valid.match(row):
        print "Line #: %d / Sum is %d ."%(line+1,sum(map(int, numbers.findall(row))))
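Pulling the answers together, here is a Python 3 sketch that uses `enumerate` for the line counter, which sidesteps the `continue` problem in the question because the counter no longer has to be incremented by hand. It assumes the first two lines are a header and a separator and that every later line holds whitespace-separated numbers. The name `sum_rows` and the `header_lines` parameter are hypothetical.

```python
def sum_rows(lines, header_lines=2):
    """Return (line_number, total) pairs; total is None for header lines."""
    totals = []
    for number, line in enumerate(lines, start=1):
        if number <= header_lines:
            totals.append((number, None))
        else:
            # split() with no argument handles both tabs and spaces
            totals.append((number, sum(float(x) for x in line.split())))
    return totals


sample = [
    "a.1 2.g 2.2 b.3",
    "--------------------",
    "1 2 4 1",
    "3 3 1 1",
    "2 1 5 8",
]
for number, total in sum_rows(sample):
    print("Line %d: %s" % (number, "" if total is None else total))
```

With a real file, the `sample` list could be replaced by the open file object, since iterating over a file also yields one line at a time.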

get nth line of string in python

How can you get the nth line of a string in Python 3?
For example
getline("line1\nline2\nline3",3)
Is there any way to do this using stdlib/builtin functions?
I prefer a solution in Python 3, but Python 2 is also fine.
Try the following:
s = "line1\nline2\nline3"
print(s.splitlines()[2])
A functional approach:
>>> import StringIO
>>> from itertools import islice
>>> s = "line1\nline2\nline3"
>>> gen = StringIO.StringIO(s)
>>> print next(islice(gen, 2, 3))
line3
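The session above relies on the Python 2 `StringIO` module. Since the question asks about Python 3, the same `islice` idea can be sketched with `io.StringIO`. The function name `get_line_islice` and its choice to return `None` for a missing line are assumptions made for this example.

```python
import io
from itertools import islice


def get_line_islice(text, n):
    """Return the n-th line (1-based) of text without its newline, or None."""
    # islice skips the first n - 1 lines and stops after the n-th one.
    line = next(islice(io.StringIO(text), n - 1, n), None)
    return None if line is None else line.rstrip("\n")


s = "line1\nline2\nline3"
print(get_line_islice(s, 3))  # line3
print(get_line_islice(s, 4))  # None
```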
`my_string.strip().split("\n")[-1]`
Use a string buffer:
import io
def getLine(data, line_no):
    buffer = io.StringIO(data)
    for i in range(line_no - 1):
        try:
            next(buffer)
        except StopIteration:
            return ''  # Reached EOF
    try:
        return next(buffer)
    except StopIteration:
        return ''  # Reached EOF
A more efficient solution than splitting the string would be to iterate over its characters, finding the positions of the Nth and the (N - 1)th occurrence of '\n' (taking into account the edge case at the start of the string). The Nth line is the substring between those positions.
Here's a messy piece of code to demonstrate it (the line number is 1-indexed):
def getLine(data, line_no):
    n = 0
    lastPos = -1
    # scan every character, including the last one, so that a trailing
    # newline is not returned as part of the final line
    for i in range(len(data)):
        if data[i] == "\n":
            n = n + 1
            if n == line_no:
                return data[lastPos + 1:i]
            else:
                lastPos = i
    if n == line_no - 1:
        return data[lastPos + 1:]
    return ""  # end of string
This is also more efficient than the solution which builds up the string one character at a time.
From the comments it seems as if this string is very large.
If there is too much data to comfortably fit into memory one approach is to process the data from the file line-by-line with this:
N = ...
with open('data.txt') as inf:
    for count, line in enumerate(inf, 1):
        if count == N:  # search for the N'th line
            print(line)
            break  # stop reading once the line has been found
Using enumerate() gives you the index and the value of the object you are iterating over, and you can specify a starting value, so I used 1 (instead of the default value of 0).
The advantage of using with is that it automatically closes the file for you when you are done or if you encounter an exception.
Since you brought up the point of memory efficiency, is this any better:
s = "line1\nline2\nline3"
# number of the line you want
line_number = 2
i = 0
line = ''
for c in s:
    if i > line_number:
        break
    else:
        if i == line_number-1 and c != '\n':
            line += c
        elif c == '\n':
            i += 1
Written as two functions for readability:
string = "foo\nbar\nbaz\nfubar\nsnafu\n"
def iterlines(string):
    word = ""
    for letter in string:
        if letter == '\n':
            yield word
            word = ""
            continue
        word += letter
    if word:
        # yield the final line when the string has no trailing newline
        yield word
def getline(string, line_number):
    for index, word in enumerate(iterlines(string),1):
        if index == line_number:
            #print(word)
            return word
print(getline(string, 4))
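My solution (efficient and compact):
def getLine(data, line_no):
    # line_no is zero-based; str.index raises ValueError if the line does
    # not exist or if the last line has no trailing '\n'
    index = -1
    for _ in range(line_no):
        index = data.index('\n', index + 1)
    return data[index + 1:data.index('\n', index + 1)]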
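A variant of the same scanning idea uses `str.find`, which returns -1 instead of raising when no further newline exists. The sketch below is 1-based and returns an empty string for a line that does not exist; the name `get_line_find` and both of those conventions are assumptions chosen for this example.

```python
def get_line_find(data, line_no):
    """Return line number line_no (1-based) of data, or '' if it is missing."""
    start = 0
    for _ in range(line_no - 1):
        newline = data.find("\n", start)
        if newline == -1:
            return ""              # fewer lines than requested
        start = newline + 1
    end = data.find("\n", start)
    # The last line may lack a trailing newline, so slice to the end then.
    return data[start:] if end == -1 else data[start:end]


s = "line1\nline2\nline3"
print(get_line_find(s, 1))  # line1
print(get_line_find(s, 3))  # line3
```

Like the version above, it only scans as far as the requested line and never builds a list of all lines.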
