Getting length of each line in a list - python

I have a block of text and I'd like to add a new line character at the end of any line that is fewer than 50 characters.
This is where I'm at:
text = open('file.txt','r+')
line_length = []
lines = list(enumerate(text))
for i in lines:
    line_length.append(len(i))
print lines
print line_length
I just end up with a large list of the value 2 over and over. I know that the length of each line is not 2.
Edit: Here's the solution I went with:
text = open('text.txt','r+')
new = open('new.txt','r+')
new.truncate(0)
l = []
for i in text.readlines():
    if len(i) < 50:
        l.append(i+'\n')
    else:
        l.append(i)
new.write(' '.join(l))
text.close()
new.close()

Well, like this:
text = open('file.txt','r+')
l = []
for i in text.readlines():
    if len(i) < 50:
        l.append(i)
    else:
        l.append(i.rstrip())
No need for enumerate.
Or as a one-liner (I recommend this):
l = [i if len(i) < 50 else i.rstrip() for i in text.readlines()]
So the real reason your code doesn't work is enumerate.
In both cases:
print(l)
gives the desired output.

lines is a list of pairs (each with a length of two). You need to check the length of the line string, not the pair it's in:
for i, seq in lines:
    line_length.append(len(seq))
Although, as you can see, you don't use i, so there's no point in using enumerate.
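For completeness, here's a minimal sketch of my own (not part of the original answer) that collects the lengths without enumerate at all, assuming the same file.txt as above:
with open('file.txt') as text:
    # read the lines directly and take the length of each one
    line_length = [len(line) for line in text]
print(line_length)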

Assuming you are trying to write to a new file, you will want something like this:
with open("file.txt", "r+") as input_file, open("output.txt", "w") as output_file:
for line in input_file:
if len(line) < 50:
line += '\n'
output_file.write(line)
The lines in your existing file will often have a newline character at the end of them already, so the result will be two newline characters for lines of length under 50. Use rstrip if you need to avoid this.
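As a sketch of that suggestion (mine, not the answer author's, and it assumes the same file names as above): strip the newline each line already carries, then add newlines back explicitly so short lines end up followed by a blank line:
with open("file.txt", "r+") as input_file, open("output.txt", "w") as output_file:
    for line in input_file:
        line = line.rstrip('\n')        # drop the newline the line already ends with
        if len(line) < 50:
            line += '\n'                # short lines get one extra newline
        output_file.write(line + '\n')  # restore the line's own newline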

Related

Count number of lines that are not comment or are not blank in a file

I am trying to count the number of SLOC using the following code, but it is not working; it just prints 0. Can someone please help?
f = open("/content/drive/MyDrive/Rental.java", "r")
#print(f.read())
for l in f:
count=0
if (l.strip() and l.startswith('/')):
count += 1
print(count)
You reset count in every iteration, so you'll only ever get an answer of 0 or 1. Instead, set it before the loop.
l.strip() doesn't modify l. Instead, it returns a new string! You should assign that to l.
Additionally, you want to count how many lines aren't a comment, so you need to check not l.startswith('/'). It might even make more sense to check .startswith('//'), because a single forward slash doesn't make a comment in Java. In fact, something like this would be wrongly identified as having a commented line if you just check .startswith('/'):
double a = 1.0
    / 5.0;
Here's your fixed code:
count = 0
f = open("/content/drive/MyDrive/Rental.java", "r")
for l in f:
    l = l.strip()
    if l and not l.startswith('//'):
        count += 1
print(count)
I am ignoring the case of multiline comments using /* ... */ since you haven't addressed it in your original question.
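If you did want to handle /* ... */ block comments too, here is a rough sketch of my own (it assumes the block markers sit at line boundaries and ignores code sharing a line with a marker) that tracks a flag:
count = 0
in_block_comment = False
with open("/content/drive/MyDrive/Rental.java", "r") as f:
    for l in f:
        l = l.strip()
        if not l:
            continue                      # blank line
        if in_block_comment:
            if '*/' in l:
                in_block_comment = False  # block comment ends here
            continue
        if l.startswith('/*'):
            if '*/' not in l:
                in_block_comment = True   # block comment spans multiple lines
            continue
        if not l.startswith('//'):
            count += 1                    # a real line of code
print(count)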
You could use the sum() function to get the count directly:
with open("/content/drive/MyDrive/Rental.java", "r") as f:
SLOC = sum(not line.startswith('//') for line in map(str.strip,f) if line)
print(SLOC)
Using map(str.strip, ...) on the line iterator allows you to easily exclude blank lines and detect indented comments.
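To make that concrete, a tiny illustrative snippet of my own with made-up lines:
lines = ["    // indented comment\n", "int x = 1;\n", "\n"]
stripped = list(map(str.strip, lines))
print(stripped)  # ['// indented comment', 'int x = 1;', '']
print(sum(not line.startswith('//') for line in stripped if line))  # 1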
Try to define count before the loop.
f = open("/content/drive/MyDrive/Rental.java", "r")
count = 0
#print(f.read())
for l in f:
if (l.strip() and l.startswith('/')):
count += 1
print(count)

Anagram from large file

I have a file with 10,000 words in it. I wrote a program to find anagram words in that file, but it's taking too much time to produce output. For a small file the program works well. How can I optimize the code?
count = 0
i = 0
j = 0
with open('file.txt') as file:
    lines = [i.strip() for i in file]
for i in range(len(lines)):
    for j in range(i):
        if sorted(lines[i]) == sorted(lines[j]):
            #print(lines[i])
            count = count + 1
        j = j + 1
    i = i + 1
print('There are ', count, 'anagram words')
I don't fully understand your code (for example, why do you increment i and j inside the loop?). But the main problem is that you have a nested loop, which makes the runtime of the algorithm O(n^2), i.e. if the file becomes 10 times as large, the execution time will become (approximately) 100 times as long.
So you need a way to avoid that. One possible way is to store the lines in a smarter way, so that you don't have to walk through all the lines every time; then the runtime becomes O(n). In this case you can use the fact that anagrams consist of the same characters (only in a different order), so you can use the "sorted" variant of a word as a key in a dictionary and store all lines that can be made from the same letters in a list under that key. There are other possibilities of course, but in this case I think it works out quite nicely :-)
So, fully working example code:
#!/usr/bin/env python3
from collections import defaultdict

d = defaultdict(list)
with open('file.txt') as file:
    lines = [line.strip() for line in file]
for line in lines:
    sorted_line = ''.join(sorted(line))
    d[sorted_line].append(line)
anagrams = [d[k] for k in d if len(d[k]) > 1]
# anagrams is a list of lists of lines that are anagrams
# I would say the number of anagrams is:
count = sum(map(len, anagrams))
# ... but in your example you're not counting the first words, only the "duplicates", so:
count -= len(anagrams)
print('There are', count, 'anagram words')
UPDATE
Without duplicates, and without using collections (although I strongly recommend using it):
#!/usr/bin/env python3
d = {}
with open('file.txt') as file:
    lines = [line.strip() for line in file]
lines = set(lines)  # remove duplicates
for line in lines:
    sorted_line = ''.join(sorted(line))
    if sorted_line in d:
        d[sorted_line].append(line)
    else:
        d[sorted_line] = [line]
anagrams = [d[k] for k in d if len(d[k]) > 1]
# anagrams is a list of lists of lines that are anagrams
# I would say the number of anagrams is:
count = sum(map(len, anagrams))
# ... but in your example you're not counting the first words, only the "duplicates", so:
count -= len(anagrams)
print('There are', count, 'anagram words')
Well, it is unclear whether you account for duplicates or not; however, if you don't, you can remove the duplicates from your list of words, which in my opinion will spare you a huge amount of runtime. You can check for anagrams and then use sum() to get their total number. This should do it:
def get_unique_words(lines):
    unique = []
    for word in " ".join(lines).split(" "):
        if word not in unique:
            unique.append(word)
    return unique

def check_for_anagrams(test_word, words):
    return sum([1 for word in words if (sorted(test_word) == sorted(word) and word != test_word)])

with open('file.txt') as file:
    lines = [line.strip() for line in file]

unique = get_unique_words(lines)
count = sum([check_for_anagrams(word, unique) for word in unique])
print('There are ', count, 'unique anagram words aka', int(count/2), 'unique anagram couples')

list out of range: when a line is appended to a list by searching a word in a file

file1 = open('manu.txt', 'r')
charlist = []
lines = file1.readlines()
for i in range(0, len(str(lines))-1):
    prevline = lines[i]
    nextline = lines[i+1]
    if 'a' in nextline:
        charlist.append(nextline)
print charlist
I am trying to find a word and keep it in a list by reading each line of a file, but it is giving a list index out of range error.
I'd guess your mistake is here:
for i in range(0,len(str(lines))-1)
The variable i iterates over the length of str(lines) (which is the string representation of the list), not over lines itself. Try:
for i in range(0, len(lines) - 1)
instead?
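To see the difference concretely (a small illustrative snippet of my own):
lines = ['abc\n', 'def\n']
print(len(lines))       # 2  -> the number of lines in the list
print(len(str(lines)))  # 18 -> the length of the string "['abc\\n', 'def\\n']"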

sorting lines of file python

I want to Bubblesort a file by numbers and I probably have 2 mistakes in my code.
The lines of the file contain: string-space-number
The result is a wrong sorting, or sometimes I also get an IndexError because x.append(row[1]) is out of range.
I hope someone can help me.
Code:
#!/usr/bin/python
filename = "Numberfile.txt"
fo = open(filename, "r")
x, y, z, b = [], [], [], []
for line in fo:               # read
    row = line.split(" ")     # split items by space
    x.append(row[1])          # number
liste = fo.readlines()
lines = len(liste)
fo.close()
for passesLeft in range(lines-1, 0, -1):
    for i in range(passesLeft):
        if x[i] > x[i+1]:
            temp = liste[i]
            liste[i] = liste[i+1]
            liste[i+1] = temp
fo = open(filename, "w")
for i in liste:
    fo.writelines("%s" % i)
fo.close()
Seems that you have empty lines in the file.
Change:
for line in fo:               # read
    row = line.split(" ")     # split items by space
    x.append(row[1])          # number
to:
for line in fo:                   # read
    if line.strip():
        row = line.split(" ")     # split items by space
        x.append(row[1])          # number
By the way, you're better off using re.split with the regex \s+:
re.split(r'\s+', line)
which will make your code more resilient; it will be able to handle multiple spaces as well.
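For illustration (my own snippet, not from the answer), compare a plain split(" ") with re.split on a line that has extra spaces:
import re

line = "apple   42\n"                  # three spaces between the word and the number
print(line.split(" "))                 # ['apple', '', '', '42\n'] - empty items from the extra spaces
print(re.split(r'\s+', line.strip()))  # ['apple', '42']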
For the second issue, Anand preceded me: you're comparing strings; if you want to compare numbers you'll have to wrap each value in a call to int().
First issue: if you are sorting based on the numbers and the numbers can have multiple digits, your logic will not work, because x is a list of strings, not integers, and strings are compared lexicographically, i.e. '12' is less than '2', etc. You should convert the number to int before appending it to the x list.
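For illustration (my own snippet, not part of the original answer):
print('12' < '2')            # True  -> lexicographic string comparison
print(int('12') < int('2'))  # False -> numeric comparison after int() conversion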
Also, if you are getting an IndexError, you may have empty lines or lines without 2 elements; you should validate your input, and you can add a condition to ignore the empty lines.
Code -
for line in fo:
    if line.strip():
        row = line.split(" ")
        x.append(int(row[1]))
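If the file might also contain lines that have text but no number (an assumption on my part, not something stated in the question), the same loop could guard on the number of fields instead:
for line in fo:
    row = line.split()      # split on any whitespace, dropping empty fields
    if len(row) >= 2:       # keep only lines that have both a string and a number
        x.append(int(row[1]))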

Replacing enumerate to output a string

I have written code that reads a text file containing several paragraphs. I have used enumerate() but want to replace it with a simple loop.
file=open("file1.txt","r")
text="target"
for i, line in enumerate(file, 1):
if text in line:
print (i, line)
No idea why you would want to do this; however, this is an equivalent:
file=open("file1.txt","r")
text="target"
count=0
for line in file:
count += 1
if text in line:
print (count, line)
enumerate can be replaced easily with a simple generator function:
def enumerate(iterable, start=0):
    for item in iterable:
        yield start, item
        start += 1
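A quick usage check (my own illustrative snippet) showing that this generator behaves like the built-in:
for i, item in enumerate(['a', 'b', 'c'], 1):
    print(i, item)
# prints: 1 a, 2 b, 3 c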
