Sorting lines of a file in Python

I want to bubble sort a file by numbers, and I probably have 2 mistakes in my code.
Each line of the file contains: string-space-number
The result is either sorted wrongly, or sometimes I get an IndexError because x.append(row[1]) is out of range.
I hope someone can help me.
Code:
#!/usr/bin/python
filename = "Numberfile.txt"
fo = open(filename, "r")
x, y, z, b = [], [], [], []
for line in fo: # read
    row = line.split(" ") # split items by space
    x.append(row[1]) # number
liste = fo.readlines()
lines = len(liste)
fo.close()
for passesLeft in range(lines-1, 0, -1):
    for i in range(passesLeft):
        if x[i] > x[i+1]:
            temp = liste[i]
            liste[i] = liste[i+1]
            liste[i+1] = temp
fo = open(filename, "w")
for i in liste:
    fo.writelines("%s" % i)
fo.close()

Seems that you have empty lines in the file.
Change:
for line in fo: # read
    row = line.split(" ") # split items by space
    x.append(row[1]) # number
with:
for line in fo: # read
    if line.strip():
        row = line.split(" ") # split items by space
        x.append(row[1]) # number
By the way, you're better off using re.split with the regex \s+:
re.split(r'\s+', line)
which will make your code more resilient - it will be able to handle multiple spaces as well.
For the second issue, Anand beat me to it: you're comparing strings. If you want to compare numbers, you'll have to wrap each value in a call to int().
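To make the string-versus-number point concrete, here is a small sketch. The sample values are made up for illustration and are not from the asker's file.

```python
# Strings compare character by character, so "12" sorts before "2".
values = ["12", "2", "100"]
as_strings = sorted(values)           # lexicographic order
as_numbers = sorted(values, key=int)  # numeric order

# str.split() with no argument splits on any run of whitespace
# and never produces empty strings, so the trailing newline is harmless.
row = "apple   12\n".split()

print(as_strings)
print(as_numbers)
print(row)
```

Calling split() without arguments is a simpler alternative to a regex when all you need is to tolerate repeated spaces.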

First issue: if you are sorting by the numbers and the numbers can have multiple digits, your logic will not work, because x is a list of strings, not integers. Strings are compared lexicographically, so '12' is less than '2'. You should convert the number to int before appending it to x.
Also, if you are getting an IndexError, you may have empty lines or lines with fewer than 2 elements. You should validate your input; for example, you can add a condition to skip empty lines.
Code -
for line in fo:
    if line.strip():
        row = line.split(" ")
        x.append(int(row[1]))
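Putting both fixes together, a minimal sketch might look like the following. One further thing to watch in the question's code: fo.readlines() is called after the for loop has already consumed the file, so it returns an empty list. The sketch avoids that by working on one list of lines. The function name bubble_sort_lines and the sample data are hypothetical, and it assumes every non-empty line has an integer in its second field.

```python
def bubble_sort_lines(lines):
    """Return the non-empty lines, bubble sorted by the integer in field 2."""
    # Keep each line next to its numeric key so the two cannot drift apart.
    rows = [(int(line.split()[1]), line) for line in lines if line.strip()]
    for passes_left in range(len(rows) - 1, 0, -1):
        for i in range(passes_left):
            if rows[i][0] > rows[i + 1][0]:
                rows[i], rows[i + 1] = rows[i + 1], rows[i]
    return [line for _, line in rows]


# Made-up sample standing in for the contents of the file.
sample = ["b 12\n", "\n", "a 2\n", "c 100\n"]
result = bubble_sort_lines(sample)
print(result)
```

With a real file you would read the lines once (for example with readlines()), pass them to the function, and write the returned list back.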

Related

How to read and create a new list without duplicate words in Python?

I am new in Python and I have the following problem to solve:
"Open the file sample.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order."
I have written the following code, with some good results, but I can't understand why my result is printed as multiple lists. I just need to have the words in one list.
thanks in advance!
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
lst=fh.read().split()
final_list=list()
for line in lst:
    if line in lst not in final_list:
        final_list.append(line)
        final_list.sort()
        print(final_list)
Your code is largely correct; the major problem is the conditional on your if statement:
if line in lst not in final_list:
Both in and not in are comparison operators, and Python chains comparisons, so this is evaluated as:
if (line in lst) and (lst not in final_list):
That is always true: line comes from lst, and final_list holds strings, never the list lst itself. Every word is therefore appended, duplicates included. What you want is simply:
if line not in final_list:
Right now, you're sorting and printing your list inside the loop, but it would be better to do that once at the end, making your code look like this:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
lst=fh.read().split()
final_list=list()
for line in lst:
    if line not in final_list:
        final_list.append(line)
final_list.sort()
print(final_list)
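As a quick check of how the original condition behaves, here is a small sketch with a made-up word list. In Python, in and not in are comparison operators and chain the same way a < b < c does.

```python
lst = ["b", "a", "b"]
final_list = []
line = "b"

# The chained form and its expanded equivalent give the same result.
chained = line in lst not in final_list
expanded = (line in lst) and (lst not in final_list)

# De-duplication with the corrected membership test.
for word in lst:
    if word not in final_list:
        final_list.append(word)
final_list.sort()

print(chained, expanded)
print(final_list)
```

Because the chained condition is true for every word, the original loop appended duplicates as well.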
I have a few additional comments on your code:
You don't need to explicitly initialize a variable (as in lst = list()) if you're going to immediately assign something to it. You can just write:
fh = open(fname)
lst=fh.read().split()
On the other hand, you do need to initialize final_list because
you're going to try to call the .append method on it, although it
would be more common to write:
final_list = []
In practice, it would be more common to use a set to
collect the words, since a set will de-duplicate things
automatically:
final_list = set()
for line in lst:
    final_list.add(line)
print(sorted(final_list))
Lastly, if I were to write this code, it might look like this:
fname = input("Enter file name: ")
with open(fname) as fh:
    lst = fh.read().split()
final_list = set(word.lower() for word in lst)
print(sorted(final_list))
Your code has the following problems as it stands:
if line in lst not in final_list - Not sure what you are trying to do here. I think you expect this to go over all the words in the line and check each one against final_list
Your code also has some indentation issues
The call to the close() method is missing
You need to read the whole file into a list of words, iterate over that list, and add each word that is not yet in final_list:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
lst = fh.read().split()
final_list=list()
for word in lst:
    if word not in final_list:
        final_list.append(word)
final_list.sort()
print(final_list)
fh.close()

Getting length of each line in a list

I have a block of text and I'd like to add a new line character at the end of any line that is fewer than 50 characters.
This is where I'm at
text = open('file.txt','r+')
line_length = []
lines = list(enumerate(text))
for i in lines:
    line_length.append(len(i))
print lines
print line_length
I just end up with a large list of the value 2 over and over. I know that the length of each line is not 2.
Edit: Here's the solution I went with
text = open('text.txt','r+')
new = open('new.txt','r+')
new.truncate(0)
l=[]
for i in text.readlines():
    if len(i) < 50:
        l.append(i+'\n')
    else:
        l.append(i)
new.write(' '.join(l))
text.close()
new.close()
Something like this:
text = open('file.txt','r+')
l=[]
for i in text.readlines():
    if len(i)<50:
        l.append(i)
    else:
        l.append(i.rstrip())
There is no need for enumerate.
Or as a one-liner (I recommend this):
l=[i if len(i)<50 else i.rstrip() for i in text.readlines()]
Your code doesn't work because of enumerate.
In both cases:
print(l)
gives the desired output.
lines is a list of (index, line) pairs, and each pair has a length of two. You need to check the length of the line inside each pair, not of the pair itself:
for i, seq in lines:
    line_length.append(len(seq))
Although, as you can see, you don't use i, so there's no point in using enumerate.
Assuming you are trying to write to a new file, you will want something like this:
with open("file.txt", "r+") as input_file, open("output.txt", "w") as output_file:
    for line in input_file:
        if len(line) < 50:
            line += '\n'
        output_file.write(line)
The lines in your existing file will often have a newline character at the end of them already, so the result will be two newline characters for lines of length under 50. Use rstrip if you need to avoid this.
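One way to keep the newline handling explicit is sketched below. It assumes that "fewer than 50 characters" refers to the text without its trailing newline, and that a short line should end up followed by one extra newline. The function name pad_short_lines and the sample lines are hypothetical.

```python
def pad_short_lines(lines, limit=50):
    """Yield every line with one newline, plus an extra one if it is short."""
    for line in lines:
        text = line.rstrip("\n")  # measure the text, not the line terminator
        if len(text) < limit:
            yield text + "\n\n"   # the line's own newline plus one extra
        else:
            yield text + "\n"


sample = ["short\n", "x" * 60 + "\n", "last"]
padded = list(pad_short_lines(sample))
print(padded)
```

Since a file object iterates over its lines, the generator can be fed directly, for example output_file.writelines(pad_short_lines(input_file)).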

list out of range: when a line is appended to a list by searching a word in a file

file1 = open('manu.txt', 'r')
charlist = []
lines=file1.readlines()
for i in range(0,len(str(lines))-1):
    prevline=lines[i]
    nextline=lines[i+1]
    if 'a' in nextline:
        charlist.append(nextline)
print charlist
I am trying to find a word by reading each line of a file and to keep the matching lines in a list, but it gives a "list index out of range" error.
I'd guess your mistake is here:
for i in range(0,len(str(lines))-1):
The variable i iterates over the length of str(lines) (which is the string representation of the list), not of lines itself. Try:
for i in range(0, len(lines) - 1):
instead?
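If the index is not needed, iterating over the lines directly avoids the range arithmetic altogether. This is a sketch with a made-up list standing in for file1.readlines(). Note that the loop in the question only ever tests nextline, so it never checks the first line, whereas this version checks every line.

```python
def lines_containing(lines, needle):
    """Return the lines that contain the substring `needle`."""
    return [line for line in lines if needle in line]


sample = ["banana\n", "kiwi\n", "grape\n"]
matches = lines_containing(sample, "a")

# If the previous line really is needed, zip pairs each line with its successor
# without any manual index handling.
pairs = list(zip(sample, sample[1:]))

print(matches)
print(pairs)
```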

Print random line from txt file?

I'm using random.randint to generate a random number, and then assigning that number to a variable. Then I want to print the line with the number I assigned to the variable, but I keep getting the error:
list index out of range
Here's what I tried:
f = open(filename. txt)
lines = f.readlines()
rand_line = random. randint(1,10)
print lines[rand_line]
You want to use random.choice
import random
with open(filename) as f:
    lines = f.readlines()
print(random.choice(lines))
To get a random line without loading the whole file in memory you can use Reservoir sampling (with sample size of 1):
from random import randrange
def get_random_line(afile, default=None):
    """Return a random line from the file (or default)."""
    line = default
    for i, aline in enumerate(afile, start=1):
        if randrange(i) == 0: # random int [0..i)
            line = aline
    return line
with open('filename.txt') as f:
    print(get_random_line(f))
This algorithm runs in O(n) time using O(1) additional space.
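The reason every line is equally likely: line i replaces the current pick with probability 1/i and then survives each later line j with probability (j-1)/j, which multiplies out to 1/n for a file of n lines. The sketch below checks this empirically, using io.StringIO as a stand-in for a real file; the three-line text and the fixed seed are made up for the demo.

```python
import io
import random
from collections import Counter


def get_random_line(afile, default=None):
    """Return a random line from the file (or default), reading it once."""
    line = default
    for i, aline in enumerate(afile, start=1):
        if random.randrange(i) == 0:  # true with probability 1/i
            line = aline
    return line


random.seed(0)  # fixed seed only so the demo is repeatable
text = "a\nb\nc\n"
counts = Counter(get_random_line(io.StringIO(text)) for _ in range(3000))
empty = get_random_line(io.StringIO(""), default="nothing")

print(counts)  # each line should appear roughly 1000 times
print(empty)
```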
Your code is essentially fine, assuming that you meant to pass a string to the open function and that there is no space after the dot.
However, be careful with indexing in Python: it starts at 0, not 1, and ends at len(your_list)-1.
Using random.choice is better, but if you want to follow your own idea, it would rather be:
import random
with open('name.txt') as f:
    lines = f.readlines()
    random_int = random.randint(0,len(lines)-1)
    print(lines[random_int])
Since randint includes both boundaries, the upper bound must be len(lines)-1.
import random
f = open('filename.txt')
lines = f.readlines()
rand_line = random.randint(0, len(lines) - 1) # https://docs.python.org/2/library/random.html#random.randint
print(lines[rand_line])
You can edit your code to achieve this without an error.
import random
f = open('filename.txt')
lines = f.readlines()
rand_line = random.randint(0,len(lines)-1) # this should make it work
print(lines[rand_line])
This way the index is never out of range.

reading second float in a line in python [duplicate]

I wrote a simple program to read some floats from a file:
line2 = f1.readline()
if "Totals" in line2:
    cline = line2.strip()
    csline= cline.split(" ")
    zforcet = float(csline[7])
    torquet = float(csline[8])
The line2 in question is:
Totals 7.911647E+03 -1.191758E+03 7.532665E+03 4.137034E+00
My code works, but my question is: is there a more obvious way to code this?
I mean, it is not at all obvious to me that the second real number on the line is csline[7], and I could only find this out by trial and error and by printing the contents of csline. I am looking for a way to "read the second float" on a line directly.
Just use split() with no arguments. It splits on any run of whitespace, and you'll get a list like this:
["Totals", "7.911647E+03", "-1.191758E+03", "7.532665E+03", "4.137034E+00"]
So the first number is at index 1 ("7.911647E+03") and the second at index 2.
Also note that each element is a string; you'll have to convert it with the float function (e.g. float("7.911647E+03")).
EDIT: As highlighted in the comments, if you are really looking for a way to "read the second float" on a line directly, then I would iterate over the split line, try to convert each element to float, and grab the second one that succeeds.
splitted_line = ["Totals", "7.911647E+03", "-1.191758E+03", "7.532665E+03", "4.137034E+00"]
counter = 0
for i in splitted_line:
    try:
        float(i)
        counter += 1
        if counter == 2:
            print(i)
    except ValueError:
        pass
This will print out -1.191758E+03
I would go for automated trial and error, in case we cannot be sure that the second number in a line (the one you want) is always the third word (space-separated group of characters):
line2 = f1.readline()
if "Totals" in line2:
    numbers = []
    for word in line2.split():
        try:
            numbers.append(float(word))
        except ValueError:
            pass
    zforcet = numbers[1] if len(numbers) > 1 else 0
    torquet = numbers[2] if len(numbers) > 2 else 0
We split the line into words and try to convert each word to a float. If that succeeds, we append the number to our list; if not, we ignore the word.
Then, after having parsed the line, we can simply pick the n-th number from our numbers list, or a default value (0) if we could not parse enough numbers.
You want to:
Make a list with all the float values out of line2
Do something with the second element of that list
For that you'll have to:
Make a helper function to check if something is a float
Split line2 and pass each element to your helper function
Keep only the actual floats
Once you have a list of actual floats in line2, it's just a matter of knowing which item you want to pick (in this case, the second one, so floats[1]).
--
def is_float(e):
    try:
        float(e)
    except ValueError:
        return False
    return True
def get_floats(line):
    return [float(item) for item in line.rstrip().split() if is_float(item)]
After which your code becomes:
line2 = f1.readline()
if "Totals" in line2:
    floats = get_floats(line2)
    zforcet = floats[1] # -1191.758
    torquet = floats[2] # 7532.665
Which is not so much shorter but somewhat clearer and easier to debug.
If you plan to reuse the above code, you could also abstract the indexes of items you want to pick:
ZFORCET = 1
TORQUET = 2
And then:
line2 = f1.readline()
if "Totals" in line2:
    floats = get_floats(line2)
    zforcet = floats[ZFORCET]
    torquet = floats[TORQUET]
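If only one value is needed, the same idea can be folded into a single helper that stops as soon as it has found the requested float. This is a sketch: the name nth_float is hypothetical, and it counts floats starting from 1. Be aware that float() also accepts tokens such as "nan" and "inf", so those would be counted as numbers too.

```python
def nth_float(line, n, default=None):
    """Return the n-th (1-based) token of `line` that parses as a float."""
    seen = 0
    for token in line.split():
        try:
            value = float(token)
        except ValueError:
            continue  # not a number, e.g. the "Totals" label
        seen += 1
        if seen == n:
            return value
    return default  # fewer than n floats on the line


line2 = "Totals 7.911647E+03 -1.191758E+03 7.532665E+03 4.137034E+00"
zforcet = nth_float(line2, 2)
print(zforcet)
```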
