get nth line of string in python

How can you get the nth line of a string in Python 3?
For example
getline("line1\nline2\nline3",3)
Is there any way to do this using stdlib/builtin functions?
I prefer a solution in Python 3, but Python 2 is also fine.

Try the following:
s = "line1\nline2\nline3"
print(s.splitlines()[2])
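Wrapped up as the getline function from the question, this could look roughly like the sketch below. It assumes 1-indexed line numbers, and returning None for an out-of-range line is my own choice rather than anything the question specifies.

```python
def getline(text, n):
    """Return the n-th line of text (1-indexed), or None if there is no such line."""
    lines = text.splitlines()
    # The None fallback is an assumption; raising IndexError would be equally reasonable.
    return lines[n - 1] if 1 <= n <= len(lines) else None


print(getline("line1\nline2\nline3", 3))  # line3
```

Note that splitlines() builds a list of every line, so this is simple rather than memory-frugal.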

A functional approach:
>>> import io
>>> from itertools import islice
>>> s = "line1\nline2\nline3"
>>> gen = io.StringIO(s)
>>> print(next(islice(gen, 2, 3)))
line3

`my_string.strip().split("\n")[-1]`

Use a string buffer:
import io
def getLine(data, line_no):
    buffer = io.StringIO(data)
    for i in range(line_no - 1):
        try:
            next(buffer)
        except StopIteration:
            return '' #Reached EOF
    try:
        return next(buffer)
    except StopIteration:
        return '' #Reached EOF

A more efficient solution than splitting the string would be to iterate over its characters, finding the positions of the Nth and the (N - 1)th occurrence of '\n' (taking into account the edge case at the start of the string). The Nth line is the substring between those positions.
Here's a messy piece of code to demonstrate it (the line number is 1-indexed):
def getLine(data, line_no):
    n = 0
    lastPos = -1
    # scan every character, including the last one
    for i in range(len(data)):
        if data[i] == "\n":
            n = n + 1
            if n == line_no:
                return data[lastPos + 1:i]
            else:
                lastPos = i
    if n == line_no - 1:
        return data[lastPos + 1:]
    return "" # end of string
This is also more efficient than the solution which builds up the string one character at a time.

From the comments it seems as if this string is very large.
If there is too much data to fit comfortably into memory, one approach is to process the data from the file line by line:
N = ...
with open('data.txt') as inf:
    for count, line in enumerate(inf, 1):
        if count == N: # search for the N'th line
            print(line)
            break
Using enumerate() gives you both the index and the value of the object you are iterating over, and you can specify a starting value, so I used 1 (instead of the default of 0).
The advantage of using with is that it automatically closes the file for you when you are done or if you encounter an exception.
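The same line-by-line idea can also be written with itertools.islice, which skips the unwanted lines lazily. This is only a sketch: the function name nth_line_of_file and the None fallback for short files are assumptions of mine.

```python
from itertools import islice


def nth_line_of_file(path, n):
    """Return line n (1-indexed) of the file at path, or None if the file is shorter."""
    with open(path) as inf:
        # islice consumes the first n - 1 lines without storing them;
        # next() then takes the following line, or the default None at EOF.
        return next(islice(inf, n - 1, n), None)
```

The returned line keeps its trailing newline, just as it does when iterating over the file directly.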

Since you brought up the point of memory efficiency, is this any better:
s = "line1\nline2\nline3"
# number of the line you want
line_number = 2
i = 0
line = ''
for c in s:
    if i > line_number:
        break
    else:
        if i == line_number-1 and c != '\n':
            line += c
        elif c == '\n':
            i += 1

Split into two functions for readability:
string = "foo\nbar\nbaz\nfubar\nsnafu\n"
def iterlines(string):
    word = ""
    for letter in string:
        if letter == '\n':
            yield word
            word = ""
            continue
        word += letter
    if word:
        # yield a final line that has no trailing newline
        yield word
def getline(string, line_number):
    for index, word in enumerate(iterlines(string),1):
        if index == line_number:
            #print(word)
            return word
print(getline(string, 4))

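My solution (efficient and compact; the line number is 1-indexed):
def getLine(data, line_no):
    index = -1
    # skip past the first line_no - 1 newlines
    for _ in range(line_no - 1):
        index = data.index('\n', index + 1)
    # the last line may not end with a newline
    end = data.find('\n', index + 1)
    return data[index + 1:] if end == -1 else data[index + 1:end]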

Related

How can I change this function so that it returns a list of the number of even digits in a file?

def evens(number_file: TextIO) -> List[int]:
    lst = []
    line = number_file.readline().strip()
    while line != '':
        evens = 0
        line = number_file.readline().strip()
        while line.isdigit():
            evens = evens + int(line)
            line = number_file.readline().strip()
        lst.append(evens)
    return last
In this example, the file 'numbers.txt' looks like this:
START
1
2
END
START
3
END
START
4
5
6
END
Each line is either an int, 'START', or 'END'.
I want to write a function that returns the number of even numbers in each section, so when the code runs on this file, it should return the list [1, 0, 2]. Could someone please help me?

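import numpy as np
def main():
    line = [1,4,5,6,8,8,9,10]
    # np.array() needs an argument; start from an empty integer array
    evens = np.array([], dtype=int)
    for l in line:
        if (l % 2) == 0:
            # np.append returns a new array instead of modifying evens in place
            evens = np.append(evens, l)
    return evens
if __name__ == '__main__':
    print(main())
Without your txt file and the code to read it in, this is the best I can do.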

When writing new code, it's a good idea to start small and then add capabilities bit by bit. e.g., write code that works without worrying about sections first, then update to work with sections.
But anyway, here's some revised code that should work:
from typing import List, TextIO
def evens(number_file: TextIO) -> List[int]:
    lst = []
    line = number_file.readline().strip()
    while line != '':
        # ignore all text lines, stopping at the end of the file
        while line != '' and not line.isdigit():
            line = number_file.readline().strip()
        if line == '':
            break
        evens = 0
        # process number lines
        while line.isdigit():
            if int(line) % 2 == 0:
                evens = evens + 1
            line = number_file.readline().strip()
        lst.append(evens)
    return lst
And here's a version that may be simpler (for loops in Python can often make your life easier, e.g. when processing every row of a file):
def evens(number_file: TextIO) -> List[int]:
    lst = []
    for row in number_file:
        line = row.strip()
        if line == 'START':
            # new section, start new count
            lst.append(0)
        elif line.isdigit() and int(line) % 2 == 0:
            # update current count (last item in lst)
            lst[-1] += 1
    return lst
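To try the for-loop version without creating numbers.txt, the sample data from the question can be fed in through io.StringIO, which behaves like an open text file. This is just a sketch for checking the logic; the variable names counts and sample are mine.

```python
import io
from typing import List, TextIO


def evens(number_file: TextIO) -> List[int]:
    counts = []
    for row in number_file:
        line = row.strip()
        if line == 'START':
            # a new section begins, so start a new count
            counts.append(0)
        elif line.isdigit() and int(line) % 2 == 0:
            # even number: bump the count of the current (last) section
            counts[-1] += 1
    return counts


# In-memory stand-in for the numbers.txt contents shown in the question.
sample = io.StringIO("START\n1\n2\nEND\nSTART\n3\nEND\nSTART\n4\n5\n6\nEND\n")
print(evens(sample))  # [1, 0, 2]
```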

Recursive Decompression of Strings

I'm trying to decompress strings using recursion. For example, the input:
3[b3[a]]
should output:
baaabaaabaaa
but I get:
baaaabaaaabaaaabbaaaabaaaabaaaaa
I have the following code but it is clearly off. The first find_end function works as intended. I am absolutely new to using recursion and any help understanding / tracking where the extra letters come from or any general tips to help me understand this really cool methodology would be greatly appreciated.
def find_end(original, start, level):
    if original[start] != "[":
        message = "ERROR in find_error, must start with [:", original[start:]
        raise ValueError(message)
    indent = level * " "
    index = start + 1
    count = 1
    while count != 0 and index < len(original):
        if original[index] == "[":
            count += 1
        elif original[index] == "]":
            count -= 1
        index += 1
    if count != 0:
        message = "ERROR in find_error, mismatched brackets:", original[start:]
        raise ValueError(message)
    return index - 1
def decompress(original, level):
    # set the result to an empty string
    result = ""
    # for any character in the string we have not looked at yet
    for i in range(len(original)):
        # if the character at the current index is a digit
        if original[i].isnumeric():
            # the character of the current index is the number of repetitions needed
            repititions = int(original[i])
            # start = the next index containing the '[' character
            x = 0
            while x < (len(original)):
                if original[x].isnumeric():
                    start = x + 1
                    x = len(original)
                else:
                    x += 1
            # last = the index of the matching ']'
            last = find_end(original, start, level)
            # calculate a substring using `original[start + 1:last]
            sub_original = original[start + 1 : last]
            # RECURSIVELY call decompress with the substring
            # sub = decompress(original, level + 1)
            # concatenate the result of the recursive call times the number of repetitions needed to the result
            result += decompress(sub_original, level + 1) * repititions
            # set the current index to the index of the matching ']'
            i = last
        # else
        else:
            # concatenate the letter at the current index to the result
            if original[i] != "[" and original[i] != "]":
                result += original[i]
    # return the result
    return result
def main():
    passed = True
    ORIGINAL = 0
    EXPECTED = 1
    # The test cases
    provided = [
        ("3[b]", "bbb"),
        ("3[b3[a]]", "baaabaaabaaa"),
        ("3[b2[ca]]", "bcacabcacabcaca"),
        ("5[a3[b]1[ab]]", "abbbababbbababbbababbbababbbab"),
    ]
    # Run the provided tests cases
    for t in provided:
        actual = decompress(t[ORIGINAL], 0)
        if actual != t[EXPECTED]:
            print("Error decompressing:", t[ORIGINAL])
            print(" Expected:", t[EXPECTED])
            print(" Actual: ", actual)
            print()
            passed = False
    # print that all the tests passed
    if passed:
        print("All tests passed")
if __name__ == '__main__':
    main()

From what I gathered from your code, it probably gives the wrong result because of the approach you've taken to find the last matching closing brace at a given level (I'm not 100% sure, the code was a lot). However, I can suggest a cleaner approach using stacks (almost similar to DFS, without the complications):
def decomp(s):
    stack = []
    for i in s:
        if i.isalnum():
            stack.append(i)
        elif i == "]":
            temp = stack.pop()
            count = stack.pop()
            if count.isnumeric():
                stack.append(int(count)*temp)
            else:
                stack.append(count+temp)
    for i in range(len(stack)-2, -1, -1):
        if stack[i].isnumeric():
            stack[i] = int(stack[i])*stack[i+1]
        else:
            stack[i] += stack[i+1]
    return stack[0]
print(decomp("3[b]")) # bbb
print(decomp("3[b3[a]]")) # baaabaaabaaa
print(decomp("3[b2[ca]]")) # bcacabcacabcaca
print(decomp("5[a3[b]1[ab]]")) # abbbababbbababbbababbbababbbab
This works on a simple observation: rather than evaluating a substring as soon as you read a [, evaluate it after encountering the matching ]. That lets you build the result after the inner pieces have been evaluated individually. (This is similar to postfix expression evaluation.)
(You can add error checking to this as well, if you wish. It would be easier to check if the string is semantically correct in one pass and evaluate it in another pass, rather than doing both in one go)

Here is a solution based on a similar idea to the one above:
We go through the string, pushing everything onto a stack until we find ']'. Then we pop items until we reach '[', pop the number before it, repeat the collected word that many times, and push the result back onto the stack.
It is cheaper because we collect the pieces in lists and join them, instead of repeatedly concatenating strings.
Note: the repeat count can't be more than 9, because each digit is parsed as a one-character string.
def decompress(string):
    stack = []
    letters = []
    for i in string:
        if i != ']':
            stack.append(i)
        elif i == ']':
            letter = stack.pop()
            while letter != '[':
                letters.append(letter)
                letter = stack.pop()
            word = ''.join(letters[::-1])
            letters = []
            stack.append(''.join([word for j in range(int(stack.pop()))]))
    return ''.join(stack)
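As for the recursive version in the question, tracing it on 3[b3[a]] suggests the extra letters come from two places. Assigning i = last inside a for i in range(...) loop does not skip ahead, because the loop overwrites i on the next iteration, so the bracketed text is processed a second time. The inner while loop also always finds the first digit in the string, not the one at the current index. Below is a minimal sketch of a corrected recursive version, assuming single-digit counts and well-formed brackets; the while-loop structure and the name repetitions are my own choices, not the asker's.

```python
def decompress(original):
    """Expand strings such as '3[b3[a]]'; assumes single-digit counts and balanced brackets."""
    result = ""
    i = 0
    while i < len(original):
        ch = original[i]
        if ch.isdigit():
            repetitions = int(ch)
            start = i + 1  # index of the '[' right after the digit
            # walk forward to the matching ']' by tracking nesting depth
            depth = 1
            last = start
            while depth != 0:
                last += 1
                if original[last] == "[":
                    depth += 1
                elif original[last] == "]":
                    depth -= 1
            # recurse on the text between the brackets, then repeat it
            result += decompress(original[start + 1:last]) * repetitions
            i = last + 1  # continue after the matching ']'
        else:
            result += ch
            i += 1
    return result


print(decompress("3[b3[a]]"))  # baaabaaabaaa
```

Using a while loop with an explicit index is what makes the skip past the matching ']' actually take effect.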

How to rearrange a string's characters such that none of its adjacent characters are the same, using Python

In my attempt to solve the above question, I've written the following code:
Logic: Create a frequency dict for each character in the string (key = character, value = frequency of the character). If any character's frequency is greater than ceil(n/2), there is no solution. Otherwise, print the most frequent character and then reduce its frequency in the dict.
import math, operator
def rearrangeString(s):
    # Fill this in.
    n = len(s)
    freqDict = {}
    for i in s:
        if i not in freqDict.keys():
            freqDict[i] = 1
        else:
            freqDict[i] += 1
    for j in list(freqDict.values()):
        if j > math.ceil(n / 2):
            return None
    return maxArrange(freqDict)[:-4]
temp = ""
def maxArrange(inp):
    global temp
    n = len(inp)
    if list(inp.values()) != [0] * n:
        resCh = max(inp.items(), key=operator.itemgetter(1))[0]
        if resCh is not None and resCh != temp:
            inp[resCh] -= 1
            # Terminates with None
            temp = resCh
            return resCh + str(maxArrange(inp))
# Driver
print(rearrangeString("abbccc"))
# cbcabc
print(rearrangeString("abbcccc"))
With the input abbccc it gives the right answer, cbcabc, but it fails for the input abbcccc: without the temp check it returns ccbcabc, and with the temp check it returns cbcabc, skipping one c altogether.
How should I modify the logic, or is there a better approach?
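One commonly used alternative to the "always take the most frequent character" logic above is to keep the counts in a heap and hold the character just used out of the heap for one round, so it cannot be picked twice in a row. The sketch below is only an illustration using the standard heapq and collections modules; the names rearrange_string and held are hypothetical, and returning None when no arrangement exists mirrors the question's code.

```python
import heapq
from collections import Counter


def rearrange_string(s):
    """Return a rearrangement of s with no equal neighbours, or None if impossible."""
    # heapq is a min-heap, so store negated counts to pop the most frequent first
    heap = [(-count, ch) for ch, count in Counter(s).items()]
    heapq.heapify(heap)
    result = []
    held = None  # (negated remaining count, char) used in the previous step
    while heap:
        count, ch = heapq.heappop(heap)
        result.append(ch)
        # the previously used character becomes available again
        if held is not None:
            heapq.heappush(heap, held)
        # keep ch out of the heap for one round if any copies remain
        held = (count + 1, ch) if count + 1 < 0 else None
    # leftover copies in held mean the last character could not be placed
    return "".join(result) if len(result) == len(s) else None


print(rearrange_string("abbcccc"))
print(rearrange_string("aaab"))  # None
```

Because the held character is never in the heap when the next one is chosen, two equal characters cannot end up next to each other.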

Apply Regular expression on output file

I have written a Python script that dumps all versions into a text file. The versions are separated by the '|' symbol.
I need to replace all versions starting with 3 under the following condition, e.g.
1) 3.7.0E should be replaced with 03.07.00E
2) 3.17.1E should be replaced with 03.17.01E
All single-digit numbers should be padded with a leading 0.
My output file looks like this:
3.7.0E|3.7.1E|3.7.2E|3.7.3E|3.7.4E|3.7.5E|16.2.1|16.2.2|3.8.0E|16.3.1|16.3.2|16.3.3|16.3.1a|16.4.1|16.4.2|3.17.1E|3.7.11E

This isn't pretty, but it will do what you want:
import re
s = '3.7.0E|3.7.1E|3.7.2E|3.7.3E|3.7.4E|3.7.5E|16.2.1|16.2.2|3.8.0E|16.3.1|16.3.2|16.3.3|16.3.1a|16.4.1|16.4.2|3.17.1E|3.7.11E'
l = []
# split up based on pipe
for chunk in s.split('|'):
    # '3.' rather than '3' so that versions such as 31.x are left alone
    if chunk.startswith('3.'):
        new_chunk = ''
        # split up based on period
        for piece in chunk.split('.'):
            try:
                # if there's a letter, int() raises ValueError
                x = int(piece)
                new_chunk += '0{}.'.format(x) if x < 10 else '{}.'.format(x)
            except ValueError:
                n = int(re.search(r'\d+', piece).group(0))
                # match the letter suffix; '\w' would also match the digits
                letter = re.search(r'[A-Za-z]+', piece).group(0)
                new_chunk += '0{}{}'.format(n, letter) if n < 10 else piece
        l.append(new_chunk)
    else:
        l.append(chunk)
new_s = '|'.join(l)
print(new_s)
The value of new_s will be: '03.07.00E|03.07.01E|03.07.02E|03.07.03E|03.07.04E|03.07.05E|16.2.1|16.2.2|03.08.00E|16.3.1|16.3.2|16.3.3|16.3.1a|16.4.1|16.4.2|03.17.01E|03.07.11E'.
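Since the question asks about regular expressions, the padding step can also be sketched with re.sub and a replacement function. The helper name pad_version is hypothetical, and treating "starts with 3" as "starts with 3." is an assumption on my part.

```python
import re


def pad_version(version):
    """Zero-pad every numeric field of a version that starts with '3.'."""
    if not version.startswith("3."):
        return version
    # \d+ matches each run of digits; zfill(2) left-pads it to two characters
    return re.sub(r"\d+", lambda m: m.group(0).zfill(2), version)


s = "3.7.0E|16.2.1|3.17.1E|3.7.11E"
print("|".join(pad_version(v) for v in s.split("|")))
# 03.07.00E|16.2.1|03.17.01E|03.07.11E
```

Fields that already have two or more digits are left unchanged by zfill(2), so 17 and 11 stay as they are.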

How to process character by character in a line

I have a file that has a sequence on line 2 and a variable called tokenizer, which gives me an old position value. I am trying to find the new position. For example, the tokenizer for this line gives me position 12, which is E when counting letters only. So I need to figure out the new position by also counting the dashes:
---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------
This is what I have so far; it still doesn't work.
with open(filename) as f:
    countletter = 0
    countdash = 0
    for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
        tokenizer=line.split()[4]
        print tokenizer
        for i,character in enumerate(line2):
            for countletter <= tokenizer:
                if character != '-':
                    countletter += 1
                if character == '-':
                    countdash +=1
My new position should be 32 for this example.

First answer, edited by Chad D to make it 1-indexed (but incorrect):
def get_new_index(string, char_index):
    chars = 0
    for i, char in enumerate(string):
        if char != '-':
            chars += 1
            if char_index == chars:
                return i+1
Rewritten version:
import re
def get(st, char_index):
    chars = -1
    for i, char in enumerate(st):
        if char != '-':
            chars += 1
            if char_index == chars:
                return i
def test():
    st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
    initial = re.sub('-', '', st)
    for i, char in enumerate(initial):
        # 0-indexed check, so call get() directly
        print(i, char, st[get(st, i)])
def get_1_indexed(st, char_index):
    return 1 + get(st, char_index - 1)
def test_1_indexed():
    st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
    initial = re.sub('-', '', st)
    for i, char in enumerate(initial):
        print(i+1, char, st[get_1_indexed(st, i + 1) - 1])

my original text looks like this and the position i was interested in was 12 which is 'E'
Actually, it's K, assuming you're using zero indexed strings. Python uses zero indexing so unless you're jumping through hoops to 1-index things (and you're not) it will give you K. If you were running into issues, try addressing this.
Here's some code that does what you need (albeit with 0-indexing, not 1-indexing):
def get_new_index(oldindex, str):
    newindex = 0
    for c in str:
        if c != '-':
            if oldindex == 0:
                return newindex
            oldindex -= 1
        newindex += 1
    # raise an error if we don't find the index
    raise IndexError('index not found')

This is a silly way to get the second line; it would be clearer to use islice or next(f):
for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
Here countletter seems to be an int while tokenizer is a str, which is probably not what you expect:
for countletter <= tokenizer:
It's also a syntax error, so I think this isn't the code you are running.
Perhaps you should have
tokenizer = int(line.split()[4])
to make tokenizer into an int.
print tokenizer can be misleading because an int and a str look identical when printed, so you see what you expect to see. Try print repr(tokenizer) instead when you are debugging.
Once you have made sure tokenizer is an int, you can change this line to:
for i,character in enumerate(line2[:tokenizer]):
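Putting the pieces of this thread together, the mapping from a letter position to a position in the gapped sequence can be sketched in Python 3 as follows. It assumes 1-indexed positions on both sides, as in the question; the function name gapped_position and the None fallback are hypothetical.

```python
def gapped_position(sequence, letter_position):
    """Map a 1-indexed position among the letters to a 1-indexed position in the gapped sequence."""
    letters_seen = 0
    for position, character in enumerate(sequence, 1):
        if character != '-':
            letters_seen += 1
            if letters_seen == letter_position:
                return position
    return None  # fewer letters than requested (assumed behaviour)


seq = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
new_position = gapped_position(seq, 12)
print(new_position, seq[new_position - 1])
```

For the sequence and position 12 from the question, the asker's expected result is 32, with E as the character at that position.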
