Split string within list into words in Python - python

I'm a newbie in Python, and I need to write code that reads a text file, splits each line into words, sorts the words, and prints them out.
Here is the code I wrote:
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
words = list()
for line in fh:
    line = line.strip()
    line.split()
    lst.append(line)
lst.sort()
print lst
That's my output -
['Arise fair sun and kill the envious moon', 'But soft what light through yonder window breaks', 'It is the east and Juliet is the sun', 'Who is already sick and pale with grief',
'with', 'yonder']
However, when I try to call lst.split(), it says:
AttributeError: 'list' object has no attribute 'split'
Please help!

You should extend the new list with the words of the split line, rather than trying to split the strings after appending them:
for line in fh:
    line = line.strip()
    lst.extend(line.split())
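As a minimal sketch of the difference between append and extend (written for Python 3; the in-memory sample_lines list is an assumption standing in for the opened file):

```python
# Two sample lines stand in for the opened file (an assumption for this sketch).
sample_lines = ["But soft what light\n", "It is the east\n"]

appended = []
extended = []
for line in sample_lines:
    line = line.strip()
    appended.append(line.split())  # adds one list per line, giving nested lists
    extended.extend(line.split())  # adds each word individually, giving a flat list

extended.sort()
print(appended)
print(extended)
```

Note that the sort is case-sensitive, so capitalized words come before lowercase ones.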

The issue is that split() does not magically turn the string into a list in place. It returns a new list, and you have to do something with that return value.
for line in fh:
    # line.split()  # expression has no effect
    line = line.split()  # statement does
    # lst += line  # shortcut for the loop underneath
    for token in line:
        lst = lst + [token]
        # lst += [token]  # in-place equivalent; use one or the other, not both
The above is a solution that uses a nested loop and avoids append and extend. The whole line by line splitting and sorting can be done very concisely, however, with a nested generator expression:
print sorted(word for line in fh for word in line.strip().split())
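A Python 3 sketch of the same generator-expression idea, assuming an io.StringIO object in place of the real file handle so the example is self-contained. Calling split() with no arguments already discards surrounding whitespace, so the strip() call is optional here:

```python
import io

# io.StringIO stands in for the opened file handle (an assumption for this sketch).
fh = io.StringIO("with yonder\nArise fair sun\n")

# The generator yields every word of every line; sorted() collects them into a list.
words = sorted(word for line in fh for word in line.split())
print(words)
```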

You can do:
fname = raw_input("Enter file name: ")
fh = open(fname, "r")
lines = list()
words = list()
for line in fh:
    # get an array of words for this line
    words = line.split()
    for w in words:
        lines.append(w)
lines.sort()
print lines
To avoid duplicates:
no_dups_list = list()
for w in lines:
    if w not in no_dups_list:
        no_dups_list.append(w)
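The membership test above rescans the result list for every word, which can get slow on large inputs. Two common alternatives, sketched here for Python 3.7+ with a made-up sample list:

```python
# Sample word list (an assumption for this sketch).
lines = ["the", "sun", "and", "the", "moon", "and"]

# dict.fromkeys keeps the first occurrence of each word, in original order.
no_dups_list = list(dict.fromkeys(lines))

# If only the sorted result matters, a set drops duplicates before sorting.
sorted_unique = sorted(set(lines))

print(no_dups_list)
print(sorted_unique)
```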

Related

Returning a line of txt.-file that has a word with more than 6 characters and starts with "A" in Python

I have a task to accomplish in Python using only a single statement:
I need to return lines of my txt-file that include words which have more than 6 characters and start with the letter "A".
My code is the following:
[line for line in open('test.txt') if line.split().count('A') > 6]
I am not sure how to implement another command in order to say that my word starts with "A" and has to have more than 6 characters. That is the furthest I could do. I thank you for your time.
Greetings
I would split up your for loop so that it's not a list comprehension, to make it easier to understand what's going on. Once you do that, it should be clearer what you're missing so you can assemble it back into a list comprehension.
lines = []
with open('test.txt', 'r') as f:
    for line in f:  # this line reads each line in the file
        add_line = False
        for word in line.split():
            if (word.startswith('A') and len(word) > 6):
                add_line = True
                break
        if (add_line):
            lines.append(line)
This roughly translates to
[line for line in open('test.txt', 'r') if any(len(word) > 6 and word.startswith('A') for word in line.split())]
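You should break each line into words and check each word separately. Use split() without an argument so that consecutive spaces do not produce empty strings (word[0] would raise an IndexError on those):
[line for line in open('test.txt') if len([word for word in line.split() if word[0].lower() == 'a' and len(word) > 6]) > 0]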
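To try the any() based filter without a file on disk, here is a small sketch in which a list of strings (sample_lines, an assumption) plays the role of the file:

```python
# Sample lines stand in for the contents of test.txt (an assumption for this sketch).
sample_lines = [
    "Alexander went home\n",
    "An apple a day\n",
    "See the Amazing show\n",
]

# Keep a line if at least one of its words starts with "A" and has more than 6 characters.
matching = [
    line for line in sample_lines
    if any(word.startswith("A") and len(word) > 6 for word in line.split())
]
print(matching)
```

any() stops at the first matching word, so it plays the same role as the break in the explicit loop.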

Counting word frequency by python list

Today I was trying to write code that returns the number of times each word is repeated in a text (the text contained in a txt file). Before using a dictionary, I wanted to check that the list works and that the words are appended to it, so I wrote this code:
def word_frequency(file) :
    """Returns the frequency of all the words in a txt file"""
    with open(file) as f :
        arg = f.readlines()
        l = []
        for line in arg :
            l = line.split(' ')
            return l
After I gave it the file path and pressed F5, this happened:
In[18]: word_frequency("C:/Users/ASUS/Desktop/Workspace/New folder/tesst.txt")
Out[18]: ['Hello', 'Hello', 'Hello\n']
At first you may think that there is no problem with this output, but the text in the txt file is:
As you can see, it only appends the words of the first line to the list, but I want all the words in the txt file to be appended to the list.
Does anyone know what I have to do? What is the problem here?
You should save the words in the main list before returning the list.
def word_frequency(file):
    with open(file) as f:
        lines = f.readlines()
    words = []
    for line in lines:
        line_words = line.split()
        words += line_words
    return words
In your code, you are saving and returning only the first line. The return statement terminates the execution of the function and returns a value, which in your case is just the words of the first line of the file.
One answer is from https://www.pythonforbeginners.com/lists/count-the-frequency-of-elements-in-a-list
import collections

# file is the path to the text file, as in the question
with open(file) as f:
    lines = f.readlines()
words = []
for line in lines:
    # extend (not append) keeps words a flat list of strings that Counter can hash
    words.extend(line.split())
frequencyDict = collections.Counter(words)
print("Input list is:", words)
print("Frequency of elements is:")
print(frequencyDict)
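A note on why the word list has to be flat before it is counted: collections.Counter uses its items as dictionary keys, so it needs hashable values such as strings. The sketch below uses an in-memory list of lines (an assumption) and shows both the working flat list and the TypeError that a list of lists triggers:

```python
import collections

# Sample lines stand in for f.readlines() (an assumption for this sketch).
lines = ["Hello Hello Hello\n", "World Hello\n"]

flat = []
nested = []
for line in lines:
    flat.extend(line.split())    # flat list of strings
    nested.append(line.split())  # list of lists, one per line

frequency = collections.Counter(flat)
print(frequency)

try:
    collections.Counter(nested)
except TypeError as error:
    # Lists are unhashable, so they cannot be used as Counter keys.
    nested_error = str(error)
    print(nested_error)
```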

how do I output a list of strings that fall alphabetically between two input values

I'm given a text file called input1.txt. The file contains the following:
aspiration
classified
federation
graduation
millennium
philosophy
quadratics
transcript
wilderness
zoologists
Write a program that first reads in the name of an input file, followed by two strings representing the lower and upper bounds of a search range. The file should be read using the file.readlines() method. The input file contains a list of alphabetical, ten-letter strings, each on a separate line. Your program should output all strings from the list that are within that range (inclusive of the bounds).
EX:
Enter the path and name of the input file: input1.txt
Enter the first word: ammunition
Enter the second word (it must come alphabetically after the first word): millennium
The words between ammunition and millennium are:
aspiration
classified
federation
graduation
millennium
file_to_open = input()
bound1 = input()
bound2 = input()
with open(file_to_open) as file_handle:
    list1 = [line.strip() for line in file_handle]
out = [x for x in list1 if x >= bound1 and x <= bound2]
out.sort()
print('\n'.join(map(str, out)))
Use a list comprehension with inequalities to check the string range:
out = [x for x in your_list if x >= 'ammunition' and x <= 'millennium']
This assumes that your range is inclusive on both ends, that is, you want to include ammunition and millennium on both ends of the range.
To further sort the out list and then write to a file, use:
out.sort()
f = open('output.txt', 'w')
text = '\n'.join(out)
f.write(text)
f.close()
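The inclusive range check can also be written as a chained comparison. A minimal sketch, using the word list from the question held in memory instead of read from input1.txt (the names words, lower, upper and in_range are illustrative):

```python
# The ten words from the question, held in memory for this sketch.
words = [
    "aspiration", "classified", "federation", "graduation", "millennium",
    "philosophy", "quadratics", "transcript", "wilderness", "zoologists",
]
lower, upper = "ammunition", "millennium"

# lower <= w <= upper is equivalent to (w >= lower and w <= upper).
in_range = [w for w in words if lower <= w <= upper]
print("\n".join(in_range))
```

String comparison in Python is lexicographic and case-sensitive, so this relies on all words being lowercase, as they are here.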
If you have to use readline(), try this:
filepath = 'Iliad.txt'
start = 'sometxtstart'
end = 'sometxtend'
appending = False
out = ""
with open(filepath) as fp:
    line = fp.readline()
    while line:
        txt = line.strip()
        if txt == end:
            appending = False
        if appending:
            out += txt + '\n'
        if txt == start:
            appending = True
        line = fp.readline()
print(out)
This worked for me:
file = input()
first = input()
second = input()
with open(file) as f:
    lines = f.readlines()
    for line in lines:
        l = line.strip('\n')
        if (l >= first) and (l <= second):
            print(line.strip())
        else:
            pass

How to loop through two list and append key,val pair?

I'm trying to loop through a text file and create a dict which holds dict[line_index]=word_index_position, meaning the key is the line number and the value is all the words in that line. The goal is to create a "matrix" so that the user can later specify x,y coordinates (line, word_index_position) and retrieve the word at those coordinates, if there is one (I'm not sure how this will work with a dict, since it's not ordered). Below is the loop that creates the dict.
try:
    f = open("file.txt", "r")
except Exception as e:
    print("Enter a valid file name")
uppslag = dict()
num_lines = 0
for line in f.readlines():
    num_lines += 1
    print(line)
    for word in line.split():
        print(num_lines)
        print(word)
        uppslag[num_lines] = word
f.close()
uppslag
Loop works as it's supposed to, but uppslag[num_lines] = word seems to only store the last word in each line. Any guidance would be highly appreciated.
Many thanks,
Instead of overwriting the word:
for word in line.split():
    print(num_lines)
    print(word)
    uppslag[num_lines] = word
you may be better off saving the whole line:
uppslag[num_lines] = line.split()
This way you'll be able to find the 3rd word in the 4th line as follows (list indices start at 0, so the 3rd word has index 2):
uppslag[4][2]
uppslag[num_lines] = word is overwriting the dictionary entry for key num_lines every time it is called. You can use a list to hold all the words:
for line in f:
    num_lines += 1
    print(line)
    uppslag[num_lines] = []  # initialize dictionary entry with empty list
    for word in line.split():
        print(num_lines, word)
        uppslag[num_lines].append(word)  # add new word to list
You can write the same code in a more compact form, since line.split() already returns a list:
for line_number, line in enumerate(f, start=1):  # start=1 keeps the 1-based line numbers used above
    uppslag[line_number] = line.split()
If there is a word on every line (i.e. the line index will be continuous) you can use a list instead of a dictionary, and reduce your code to a one-line list comprehension:
uppslag = [line.split() for line in f]
There is no need for a dictionary, or .readlines().
with open("file.txt") as words_file:
    words = [line.split() for line in words_file]
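To illustrate the coordinate lookup the question asks about, here is a hedged sketch built on the list-of-lists layout. The helper word_at is hypothetical (it does not come from any answer in this thread), coordinates are zero-based, and io.StringIO stands in for the real file:

```python
import io

# io.StringIO stands in for file.txt (an assumption for this sketch).
words_file = io.StringIO("one two three\n\nfour five\n")
words = [line.split() for line in words_file]

def word_at(grid, line_index, word_index):
    """Return the word at the given zero-based coordinates, or None if absent."""
    if 0 <= line_index < len(grid) and 0 <= word_index < len(grid[line_index]):
        return grid[line_index][word_index]
    return None

print(word_at(words, 0, 2))  # third word on the first line
print(word_at(words, 1, 0))  # the second line is empty
```

An empty line simply becomes an empty list, so line indices stay aligned with the file even when some lines contain no words.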

How to search words from txt file to python

How can I show the words that are 20 letters long in a text file?
I know I can list all the words with the following code:
# Program for finding the 20-letter words in the words.txt file
def main():
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    for line in lines:
        print (line)
    return
main()
But I'm not sure how to pick out and show only the words with 20 letters.
Big thanks
If your lines have lines of text and not just a single word per line, you would first have to split them, which returns a list of the words:
words = line.split(' ')
Then you can iterate over each word in this list and check whether its length is 20.
for word in words:
    if len(word) == 20:
        # Do what you want to do here
If each line has a single word, you can operate on line directly and skip the for loop, although you may need to strip the trailing end-of-line character first: word = line.strip('\n'). If you just want to collect all the matching words, you can do this:
words_with_20_letters = []
for word in words:
    if len(word) == 20:
        words_with_20_letters.append(word)
If your file has only one word per line, and you want only the words with 20 letters, you can simply use:
with open("words.txt", "r") as f:
    words = f.read().splitlines()
found = [x for x in words if len(x) == 20]
You can then print the list or print each word separately.
You can try this:
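f = open('file.txt')
new_file = f.read().splitlines()
# iterate over new_file: f has already been read to the end, so looping over f yields nothing
words = [i for i in new_file if len(i) == 20]
f.close()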
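A combined sketch that handles both cases (one word per line, or several words per line) by splitting every line. It assumes Python 3 and uses io.StringIO with made-up contents in place of the real file:

```python
import io

# io.StringIO stands in for the opened file; the contents are made up for this sketch.
f = io.StringIO("short\n" + "a" * 20 + "\ntwo words here " + "b" * 20 + "\n")

# split() drops the newline and any extra spaces, so len() counts letters only.
found = [word for line in f for word in line.split() if len(word) == 20]
print(found)
```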
