Increase index of split in Python - python

I have a problem in a Python script: I need the index used with the split function to increase automatically with every iteration of the loop. This is what I do:
tag = "\'"
while loop<=302:
    for line in f1.readlines():
        if tag in line:
            word = line.split(tag)[num] #num is the index I need to increase
    text = "Word: "+word+"."
    f.write(text)
    num = num + 1
    loop = loop + 1
But the "num" variable used as the index doesn't change; it simply stays the same. The num index indicates the word I need to take, which is why "num = num + 1" has to increase it.
What is the problem in the loop?
Thanks!

Your question is confusing, but I think you want to move num = num + 1 into the for loop, inside the if statement.
tag = "\'"
while loop<=302:
    for line in f1.readlines():
        if tag in line:
            word = line.split(tag)[num] #num is the index I need to increase
            num = num + 1
    text = "Word: "+word+"."
    f.write(text)
    loop = loop + 1

Based on Benyi's comment in the question - do you just want this for the individual sentences? You might not need to index.
>>> mystring = 'hello i am a string'
>>> for word in mystring.split():
...     print('Word:', word)
...
Word: hello
Word: i
Word: am
Word: a
Word: string

There seems to be a lot of things wrong with this.
First
while loop <= 302:
    for line in f1.readlines():
f1.readlines() is going to be [] for every iteration past the first, because the first call already consumed the whole file.
Second
for line in f1.readlines():
    word = line.split(tag)[num]
    ...
text = "Word: "+word+"."
Even if you made the for loop work, text is built only after the loop finishes, so it will always use the word from the last iteration. Maybe this is the desired behavior, but it seems strange.
Third
while loop<=302:
    ...
    loop = loop + 1
Seems like it would be better written as
for _ in xrange(302):  # range(302) in Python 3
since loop isn't used at all inside that scope. This assumes loop starts at 1; if it starts at 0, the while loop runs 303 times, so adjust 302 to however many iterations you want.
Lastly
num = num + 1
This is outside your inner loop, so num will always be the same during the first iteration, and it won't matter later because f1.readlines() is empty, as stated before.
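To see the first point in isolation, here is a minimal sketch. It uses io.StringIO as a stand-in for the real file f1 (that substitution and the sample text are assumptions, not the asker's data) and shows that a second readlines() call returns an empty list unless the stream is rewound.

```python
import io

# io.StringIO stands in for the real file f1 (an assumption for this sketch).
f1 = io.StringIO("it's a test\nno tag here\n")

first_pass = f1.readlines()   # consumes the whole stream
second_pass = f1.readlines()  # nothing left to read, so this is []

f1.seek(0)                    # rewinding makes the lines available again
third_pass = f1.readlines()

print(len(first_pass), len(second_pass), len(third_pass))
```

So if the outer loop really has to go over the file more than once, reading the lines into a list once before the loop (or calling seek(0) each time) is probably what is needed.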

I have a different approach to your problem, as you described it in the comment. Suppose input.txt contains the following line:
this is a an input file.
Then the following code will give you the desired output:
lines = []
with open (r'C:\temp\input.txt' , 'r') as fh:
    lines = fh.read()
with open (r'C:\temp\outputfile.txt' , 'w') as fh1:
    for words in lines.split():
        fh1.write("Words:"+ words+ "\n" )

Related

Working with printing a certain amount of line in a file

I am trying to achieve:
The user inputs a word, and the program outputs how many lines contain that word and also shows up to the first ten such lines. If no line contains the word, the program must output Not found.
My code so far:
sentences = []
with open("txt.txt") as file:
    for line in file:
        words = line.split()
        words_count += len(words)
        if len(words) > len(maxlines.split()):
            maxlines = line
        sentences.append(line)
word = input("Enter word: ")
count = 0
for line in sentences:
    if word in line:
        print(line)
        count += 1
print(count, "lines contain", word)
if count == 0:
    print("Not found.")
How would I print only the first 10 lines, regardless of the number of matching lines?
Thank you!
If you want to iterate 10 times (old style, not Pythonic at all):
index = 0
for line in file:
    if index >= 10:
        break
    # do stuff 10 times
    index += 1
Without using break, just put the stuff inside the condition. Notice that the loop will keep iterating over the rest of the file, so this is not really a smart solution.
index = 0
for line in file:
    if index < 10:
        # do stuff 10 times
        index += 1
However, this is not Pythonic at all. The way to repeat something a fixed number of times in Python is range.
for _ in range(10):
    # do stuff 10 times
_ means you don't care about the index and just want to repeat 10 times.
If you want to iterate over the file while keeping the index (line number), you can use enumerate.
for lineNumber, line in enumerate(file):
    if lineNumber >= 10:
        break
    # do stuff 10 times
Finally, as @jrd1 suggested, you can read the whole file and then slice only the part that you want; in your case:
sentences[:10] # will give you the first 10 sentences (lines)
Just change your code like this; it should help:
for line in sentences:
    if word in line:
        if count < 10: print(line) # <--------
        count += 1
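Putting the counting and the slicing together, a minimal sketch could look like the following. The helper name find_lines and the sample list are made up for illustration; in the real program the lines would come from the file.

```python
def find_lines(lines, word, limit=10):
    """Return (number of matching lines, the first `limit` matching lines)."""
    matches = [line.rstrip("\n") for line in lines if word in line]
    return len(matches), matches[:limit]


# Hypothetical sample data: 12 lines containing "spam" and one that does not.
sample = ["spam %d\n" % i for i in range(12)] + ["eggs\n"]

count, shown = find_lines(sample, "spam")
for line in shown:          # at most 10 lines are printed
    print(line)
if count:
    print(count, "lines contain", "spam")
else:
    print("Not found.")
```

The count still covers every matching line; only the printing is capped at ten.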

Program not writing to dictionary?

I'm trying to create a simple program that opens a file, splits it into single word lines (for ease of use) and creates a dictionary with the words, the key being the word and the value being the number of times the word is repeated. This is what I have so far:
infile = open('paragraph.txt', 'r')
word_dictionary = {}
string_split = infile.read().split()
for word in string_split:
    if word not in word_dictionary:
        word_dictionary[word] = 1
    else:
        word_dictionary[word] =+1
infile.close()
word_dictionary
The line word_dictionary prints nothing, meaning that the lines are not being put into a dictionary. Any help?
The paragraph.txt file contains this:
This is a sample text file to be used for a program. It should have nothing important in here or be used for anything else because it is useless. Use at your own will, or don't because there's no point in using it.
I want the dictionary to do something like this, but I don't care too much about the formatting.
Two things. First of all, the shorter version of
num = num + 1
is
num += 1
not
num =+ 1
which assigns +1 to num every time. The corrected code:
infile = open('paragraph.txt', 'r')
word_dictionary = {}
string_split = infile.read().split()
for word in string_split:
    if word not in word_dictionary:
        word_dictionary[word] = 1
    else:
        word_dictionary[word] +=1
infile.close()
print(word_dictionary)
Secondly, you need to print word_dictionary explicitly; a bare expression only echoes its value in the interactive interpreter.
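A small sketch of the difference between =+ and +=, followed by the same word count done with collections.Counter from the standard library. The sample sentence is made up; it is not the asker's paragraph.txt.

```python
from collections import Counter

n = 5
n =+ 1      # parsed as n = (+1), so n is now 1, not 6
m = 5
m += 1      # augmented assignment, so m is now 6

# Counter builds the word -> count mapping in one step.
text = "a rose is a rose"
word_counts = Counter(text.split())

print(n, m, word_counts["rose"], word_counts["a"])
```

Counter is a dict subclass, so it can be printed and indexed like the hand-built dictionary.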

Python - Dictionary function only one entry for dictionary

Sorry if this is a silly question, but I am new to Python. I have a piece of code that opens a text file, reads it, creates a list of words, and then builds from that list a dictionary of each word with a count of how many times it appears. This code was working fine and printed the dictionary correctly. However, when I put it in a function and called the function, it returned the dictionary with only one entry. Any ideas why? Any help is much appreciated.
def createDict():
    wordlist = []
    with open('superman.txt','r', encoding="utf8") as superman:
        for line in superman:
            for word in line.split():
                wordlist.append(word)
                #print(word)
    table = str.maketrans("!#$%&()*+, ./:;<=>?#[\]^_`{|}~0123456789'“”-''—", 47*' ' )
    lenght = len(wordlist)
    i = 0
    while i < lenght:
        wordlist[i] = wordlist[i].translate(table)
        wordlist[i] = wordlist[i].lower()
        wordlist[i] = wordlist[i].strip()
        i += 1
    wordlist = list(filter(str.strip, wordlist))
    word_dict = {}
    for item in wordlist:
        if item in word_dict.keys():
            word_dict[item] += 1
    else:
        word_dict[item] = 1
    return(word_dict)
Try initializing the dictionary outside of the function and then using global inside the function. Is that one item in the dictionary the last iteration?
Fix the indentation where you iterate over the wordlist. It should read:
for item in wordlist:
    if item in word_dict.keys():
        word_dict[item] += 1
    else:
        word_dict[item] = 1
This seems to be an indentation and whitespace issue. Make sure the if and else statements near the end of your function are at the same level.
Below is the code I got working, with the indentation at the correct level, plus comments that explain the thought process.
def createDict():
    wordlist = []
    with open('superman.txt','r', encoding="utf8") as superman:
        for line in superman:
            for word in line.split():
                wordlist.append(word)
                #print(word)
    table = str.maketrans("!#$%&()*+, ./:;<=>?#[\]^_`{|}~0123456789'“”-''—", 47*' ' )
    lenght = len(wordlist)
    i = 0
    while i < lenght:
        wordlist[i] = wordlist[i].translate(table)
        wordlist[i] = wordlist[i].lower()
        wordlist[i] = wordlist[i].strip()
        i += 1
    wordlist = list(filter(str.strip, wordlist))
    # print(len(wordlist)) # check to see if wordlist is fine. Indeed it is
    word_dict = {}
    for item in wordlist:
        # for dictionaries don't worry about using dict.keys()
        # method. You can use a shorter if [value] in [dict] conditional
        # The issue in your code was the whitespace and indentation
        # of the else statement.
        # Please make sure that if and else are at the same indentation levels
        # Python reads by indentation and whitespace because it
        # doesn't use curly brackets like other languages like javascript
        if item in word_dict:
            word_dict[item] += 1
        else:
            word_dict[item] = 1
    return word_dict # print here too
Please let me know if you have any questions. Cheers!
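For what it's worth, here is a sketch of how a misplaced else can produce exactly one entry. It assumes the else ended up aligned with the for (Python then treats it as a for...else clause); the original indentation is not visible here, so that is a guess. The function names and the word list are made up.

```python
def count_buggy(words):
    counts = {}
    for item in words:
        if item in counts:
            counts[item] += 1
    else:  # belongs to the for loop: runs once, after the loop finishes
        counts[item] = 1
    return counts


def count_fixed(words):
    counts = {}
    for item in words:
        if item in counts:
            counts[item] += 1
        else:  # belongs to the if: runs for every word not seen yet
            counts[item] = 1
    return counts


words = ["up", "up", "away"]
print(count_buggy(words))
print(count_fixed(words))
```

In the buggy version nothing is ever added inside the loop, and the for...else clause stores only the last word, which would match the "is that one item the last iteration?" question above.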

Counting prefixes from a csv file using python

Is there a way to make this Python program count the words in the array that start with one of the prefixes? I keep getting a prefixcount result of 0.
Any help would be appreciated :)
The code I have is below
file = input("What is the csv file's name?")+".csv"
openfile = open(file)
text = sorted(openfile)
print("Sorted list")
print(text)
dictfile = open("DICT.txt")
prefix = ['de', 'dys', 'fore', 'wh']
prefixcount = 0
for word in text:
    for i in range(0, len(prefix)):
        if word>prefix[i]:
            break
        if word[0:len(prefix[i])] == prefix[i]:
            prefixcount+=1
            break
print(prefixcount)
Firstly, when you meet a condition where you want to skip an iteration, use continue - break ends the loop entirely.
Secondly, word>prefix[i] does not do what you want; it compares the strings lexicographically (see e.g. the Python docs), and you really want to know len(word) < len(prefix[i]).
I think what you want is:
prefixcount = 0
for word in text:
    for pref in prefix:
        if word.startswith(pref):
            prefixcount += 1
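As a compact variant, str.startswith also accepts a tuple of prefixes, so the inner loop can go away. This is only a sketch: the helper name count_prefixed and the sample words are made up, and the strip() call assumes the words still carry the newline left over from iterating a file. It also shows why the original test always broke out early: a word that starts with a prefix compares greater than that prefix.

```python
prefixes = ("de", "dys", "fore", "wh")


def count_prefixed(words, prefixes):
    # strip() removes the trailing newline kept when iterating over a file
    return sum(1 for word in words if word.strip().startswith(prefixes))


# Hypothetical sample standing in for the sorted CSV lines.
sample = ["decode\n", "forecast\n", "apple\n", "what\n", "dye\n"]

# Lexicographic comparison: a longer string with the same start is "greater",
# so `if word > prefix[i]: break` fires before the prefix check is reached.
longer_is_greater = "decode" > "de"

print(count_prefixed(sample, prefixes), longer_is_greater)
```

Each word is counted at most once here, even if the prefix list later contains overlapping prefixes.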

Why is this not correct? (codeeval challenge)PYTHON

This is what I have to do https://www.codeeval.com/open_challenges/140/
I've been on this challenge for three days, please help. It is 85-90 partially solved, but not 100% solved... why?
This is my code:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
    saver=[]
    text=""
    textList=[]
    positionList=[]
    num=0
    exists=int()
    counter=0
    for l in test.strip().split(";"):
        saver.append(l)
    for i in saver[0].split(" "):
        textList.append(i)
    for j in saver[1].split(" "):
        positionList.append(j)
    for i in range(0,len(positionList)):
        positionList[i]=int(positionList[i])
    accomodator=[None]*len(textList)
    for n in range(1,len(textList)):
        if n not in positionList:
            accomodator[n]=textList[len(textList)-1]
            exists=n
    for item in positionList:
        accomodator[item-1]=textList[counter]
        counter+=1
        if counter>item:
            accomodator[exists-1]=textList[counter]
    for word in accomodator:
        text+=str(word) + " "
    print text
test_cases.close()
This code works for me:
import sys
def main(name_file):
    _file = open(name_file, 'r')
    text = ""
    while True:
        try:
            line = _file.next()
            disordered_line, numbers_string = line.split(';')
            numbers_list = map(int, numbers_string.strip().split(' '))
            missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
            if missing_number == 0:
                missing_number = len(disordered_line)
            numbers_list.append(missing_number)
            disordered_list = disordered_line.split(' ')
            string_position = zip(disordered_list, numbers_list)
            ordered = sorted(string_position, key = lambda x: x[1])
            text += " ".join([x[0] for x in ordered])
            text += "\n"
        except StopIteration:
            break
    _file.close()
    print text.strip()
if __name__ == '__main__':
    main(sys.argv[1])
I'll try to explain my code step by step, so maybe you can see the difference between your code and mine:
while True
A loop that breaks when there are no more lines.
try:
I put the code inside a try and catch the StopIteration exception, because it is raised when there are no more items in an iterator.
line = _file.next()
Read the file lazily, one line at a time, so that you do not put all the lines in memory at once.
disordered_line, numbers_string = line.split(';')
Get the unordered phrase and the numbers of every string's position.
numbers_list = map(int, numbers_string.strip().split(' '))
Convert every number from string to int.
missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
Get the missing number from the series of numbers; that missing number is the position of the last string in the phrase.
if missing_number == 0:
    missing_number = len(disordered_line)
Check whether the missing number is equal to 0; if so, the real missing number is taken to be the number of strings that make up the phrase.
numbers_list.append(missing_number)
Append the missing number to the list of numbers.
disordered_list = disordered_line.split(' ')
Convert the disordered phrase into a list.
string_position = zip(disordered_list, numbers_list)
Combine every string with its respective position.
ordered = sorted(string_position, key = lambda x: x[1])
Order the combined list by the position of the string.
text += " ".join([x[0] for x in ordered])
Concatenate the ordered phrase. The remaining code is easy to understand.
UPDATE
Looking at your code, here is my opinion on what might solve your problem.
split already returns a list, so you do not have to loop over the split content to add it to another list.
So these six lines:
for l in test.strip().split(";"):
    saver.append(l)
for i in saver[0].split(" "):
    textList.append(i)
for j in saver[1].split(" "):
    positionList.append(j)
can be converted into three:
splitted_test = test.strip().split(';')
textList = splitted_test[0].split(" ")
positionList = map(int, splitted_test[1].split(" "))
The line positionList = map(int, splitted_test[1].split(" ")) already converts the numbers into int, so you save these two lines:
for i in range(0,len(positionList)):
    positionList[i]=int(positionList[i])
The next lines:
accomodator=[None]*len(textList)
for n in range(1,len(textList)):
    if n not in positionList:
        accomodator[n]=textList[len(textList)-1]
        exists=n
can be converted into the next four:
missing_number = sum(xrange(sorted(positionList)[0],sorted(positionList)[-1]+1)) - sum(positionList)
if missing_number == 0:
    missing_number = len(textList)
positionList.append(missing_number)
Basically, these lines calculate the missing number in the series of numbers, so that the series has the same length as textList.
The next lines:
for item in positionList:
    accomodator[item-1]=textList[counter]
    counter+=1
    if counter>item:
        accomodator[exists-1]=textList[counter]
for word in accomodator:
    text+=str(word) + " "
can be replaced by these:
string_position = zip(textList, positionList)
ordered = sorted(string_position, key = lambda x: x[1])
text += " ".join([x[0] for x in ordered])
text += "\n"
This way you save lines and memory. Also use xrange instead of range.
The factors that make your code pass only partially could be:
The number of lines in the script.
The time your script takes.
The amount of memory your script uses.
What you could do is:
Use generators, which saves memory.
Reduce the number of for loops; this way you save lines of code and time.
If you think something could be made easier, do it.
Do not reinvent the wheel; if something has already been made, use it.
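For reference, a Python 3 sketch of the same idea. It assumes, based only on the code in this thread, that each input line looks like "words;positions", that positions are 1-based and distinct, and that exactly one position is left out and belongs to the last word. The function name reorder and the sample lines are made up. The missing position is found with a set difference, which also covers the case where the missing position is 1.

```python
def reorder(line):
    """Rebuild a sentence from 'words;positions' where one position is omitted."""
    words_part, positions_part = line.strip().split(";")
    words = words_part.split()
    positions = [int(p) for p in positions_part.split()]

    # The one position in 1..len(words) that is not listed belongs to the last word.
    missing = (set(range(1, len(words) + 1)) - set(positions)).pop()
    positions.append(missing)

    # Put every word into its slot (positions are 1-based).
    ordered = [None] * len(words)
    for word, position in zip(words, positions):
        ordered[position - 1] = word
    return " ".join(ordered)


print(reorder("is This a test;2 1 3"))
print(reorder("world hello;2"))
```

To process a whole file lazily, this could be called once per line inside "for line in open(sys.argv[1])".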
