I have the program below, in which I am trying to convert text files to character unigrams (feature vectors) and write the output to a text file.
I print the output to the console and write it to a text file at the same time. However, the console shows all the records, while the file contains only the record for the last file name in allarticles.
Should I be using an array for rawcu?
My Code:
for fileName in allarticles:
    rawcu = [0.0]*95
    out=open("CASIS-25fvs_rawcu.txt","w")
    fileOpen = open(fileName)
    charFrequency = {}
    for line in fileOpen:
        for letter in line:
            if((ord(letter) > 31) and ord(letter) < 127):
                rawcu[ord(letter)-32] += 1.0
    print rawcu
    print >> out, rawcu
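You opened the file for overwriting ("w"), not for appending, so each iteration of the loop truncates it and only the last record survives. Open it in append mode instead: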
open("CASIS-25fvs_rawcu.txt", "a")
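An alternative to append mode is to open the output file once, before the loop, so that "w" truncates it only a single time. Below is a minimal sketch of that idea, written for Python 3 (the code in the question is Python 2). The helper names char_unigram and write_vectors are illustrative and do not come from the original post.

```python
def char_unigram(text):
    """Count printable ASCII characters (codes 32..126) in text."""
    counts = [0.0] * 95
    for ch in text:
        code = ord(ch)
        if 31 < code < 127:
            counts[code - 32] += 1.0
    return counts


def write_vectors(file_names, out_path):
    """Write one feature vector per input file to out_path."""
    # Opened once, outside the loop: "w" truncates the file a single time.
    with open(out_path, "w") as out:
        for name in file_names:
            with open(name) as src:
                vector = char_unigram(src.read())
            print(vector, file=out)  # one line per article
```

Calling write_vectors(allarticles, "CASIS-25fvs_rawcu.txt") would then produce one line per article. Unlike append mode, rerunning the script does not keep adding to the output of earlier runs.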
Related
I want to print my output to a text file, but the result differs from what I see when I print to the terminal. My code:
...
words = ["makan","Rina"]
sentences = text.split(".")
for itemIndex in range(len(sentences)):
    for word in words:
        if word in sentences[itemIndex]:
            print('"' + sentences[itemIndex] + '."')
            break
The output looks like this:
"Semalam saya makan nasi padang."
" Saya makan bersama Rina."
" Rina pesan ayam goreng."
If I write to a text file instead:
words = ["makan","Rina"]
sentences = text.split(".")
for itemIndex in range(len(sentences)):
    for word in words:
        if word in sentences[itemIndex]:
            with open("corpus.txt",'w+') as f:
                f.write(sentences[itemIndex])
                f.close()
The output is just:
Rina pesan ayam goreng
Why? How can I write the same output to the text file that I get when I print to the terminal?
You are reopening the file in 'w+' mode on each iteration of the loop, so every write overwrites what is already there and you end up with only the last line in the file. Either open the file once, outside all the loops, or open it in append mode, denoted by 'a'.
Remember to close the file with f.close() when you are done with it, unless you opened it in a with block, which closes it for you.
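The difference is easy to reproduce. The sketch below (Python 3, using a temporary directory and made-up sentences rather than the data from the question) writes the same three lines twice: once reopening the file with "w" on every iteration, and once opening it a single time outside the loop.

```python
import os
import tempfile

sentences = ["first", "second", "third"]
path = os.path.join(tempfile.mkdtemp(), "corpus.txt")

# Reopening with "w" on every iteration truncates the file each time.
for s in sentences:
    with open(path, "w") as f:
        f.write(s + "\n")
with open(path) as f:
    overwritten = f.read()

# Opening once, outside the loop, keeps every line.
with open(path, "w") as f:
    for s in sentences:
        f.write(s + "\n")
with open(path) as f:
    kept = f.read()

print(repr(overwritten))  # 'third\n'
print(repr(kept))         # 'first\nsecond\nthird\n'
```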
You have to reorder the lines of your code, by moving opening/closing the file outside of the loop:
with open("corpus.txt",'w+') as f:
    words = ["makan","Rina"]
    sentences = text.split(".")
    for itemIndex in range(len(sentences)):
        for word in words:
            if word in sentences[itemIndex]:
                f.write(sentences[itemIndex])
Also, print adds a newline character after its output by default. If you want your sentences written on separate lines in the file, add f.write('\n') after every sentence.
Because you are calling with open inside the loop, and you're using 'w+' mode, your program overwrites the file each time, so you end up with only the last line written to the file. Try it with 'a' instead, or move with open outside of the loop.
You don't need to call close on a file handle that you have opened using the with syntax. The closing of the file is handled for you.
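A quick way to check this is to look at the file object's closed attribute inside and after the with block. This is a small Python 3 sketch that writes to a temporary path chosen only for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "corpus.txt")

with open(path, "w") as f:
    f.write("Rina pesan ayam goreng\n")
    inside = f.closed   # still open inside the block

after = f.closed        # closed automatically when the block ends
print(inside, after)    # False True
```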
I would open the file just once before for loops (the for loops should be within the with statement) instead of opening it multiple times. You are overwriting the file each time you are opening it to write a new line.
Your code should be:
words = ["makan","Rina"]
sentences = text.split(".")
with open("corpus.txt",'w+') as f:
    for itemIndex in range(len(sentences)):
        for word in words:
            if word in sentences[itemIndex]:
                f.write(sentences[itemIndex] + '\n')
I am currently trying to fetch the raw data for each of the 10 URLs listed in a .txt file and save the raw data for each line (URL) to its own file. I then want to repeat the process for the processed data (the same raw data stripped of its HTML), all in Python.
import commands
import os
import json
# RAW DATA
input = open('uri.txt', 'r')
t_1 = open('command', 'w')
counter_1 = 0
for line in input:
    counter_1 += 1
    if counter_1 < 11:
        filename = str(counter_1)
        print str(line)
        filename= str(count)
        command ='curl ' + '"' + str(line).rstrip('\n') + '"'+ '> ./rawData/' + filename
        output_1 = commands.getoutput(command)
input.close()
# PROCESSED DATA
counter_2 = 0
input = open('uri.txt','r')
t_2 = open('command','w')
for line in input:
    counter_2 += 1
    if counter_2 <11:
        filename = str(counter_2) + '-processed'
        command = 'lynx -dump -force_html ' + '"'+ str(line).rstrip('\n') + '"'+'> ./processedData/' + filename
        print command
        output_2 = commands.getoutput(command)
input.close()
I am attempting to do all of this with one script. Can anyone help me refine my code so I can run it? It should run through the whole process once for each line in the .txt file. For example, I should end up with one raw and one processed file for every URL line in my .txt file.
Break your code up into functions. Currently the code is hard to read and debug. Make a function called get_raw() and a function called get_processed(). Then for your main loop, you can do
for line in file:
    get_raw(line)
    get_processed(line)
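Or something similar. Also, you should avoid 'magic numbers' like counter < 11. Why is it 11? Is it meant to cover the number of lines in the file? If so, you don't need the counter at all: looping over the file stops at the last line, and if you do need the count you can read the lines into a list with readlines() and call len() on that list.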
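Here is one way the split into functions could look. This is only a sketch under several assumptions: it targets Python 3 (where the commands module no longer exists) and uses subprocess instead of a shell redirect, it expects curl and lynx to be installed and the rawData and processedData directories to exist, and the function names raw_command, processed_command, run_to_file and process_urls are made up for illustration.

```python
import subprocess


def raw_command(url, index):
    """Build the curl command and output path for the raw download."""
    return ["curl", "--silent", url], "./rawData/{}".format(index)


def processed_command(url, index):
    """Build the lynx command and output path for the HTML-stripped dump."""
    argv = ["lynx", "-dump", "-force_html", url]
    return argv, "./processedData/{}-processed".format(index)


def run_to_file(argv, out_path):
    """Run a command and send its standard output to out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(argv, stdout=out, check=False)


def process_urls(lines, runner=run_to_file):
    """Produce one raw and one processed file per non-empty URL line."""
    urls = [line.strip() for line in lines if line.strip()]
    for index, url in enumerate(urls, start=1):
        for build in (raw_command, processed_command):
            argv, out_path = build(url, index)
            runner(argv, out_path)
```

With this in place the main script reduces to opening uri.txt in a with block and passing the file object to process_urls. No counter is needed, because enumerate numbers the URLs and the loop ends with the file. The runner parameter exists only so the loop can be exercised without touching the network.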
I'm a total noob to Python and need some help with my code.
The code is meant to take Input.txt [http://pastebin.com/bMdjrqFE], split it into separate Pokemon (in a list), and then split each of those into separate values, which I use to reformat the data and write it to Output.txt.
However, when I run the program, only the last Pokemon is output, 386 times. [http://pastebin.com/wkHzvvgE]
Here's my code:
f = open("Input.txt", "r")#opens the file (input.txt)
nf = open("Output.txt", "w")#opens the file (output.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
num = 0
tab = """ """
newl = """NEWL
"""
slash = "/"
while num != 386:
    current = pokeData
    current.append(line)
    print current[num]
    for tab in current:
        words = tab.split()
        print words
    for newl in words:
        nf.write('%s:{num:%s,species:"%s",types:["%s","%s"],baseStats:{hp:%s,atk:%s,def:%s,spa:%s,spd:%s,spe:%s},abilities:{0:"%s"},{1:"%s"},heightm:%s,weightkg:%s,color:"Who cares",eggGroups:["%s"],["%s"]},\n' % (str(words[2]).lower(),str(words[1]),str(words[2]),str(words[3]),str(words[4]),str(words[5]),str(words[6]),str(words[7]),str(words[8]),str(words[9]),str(words[10]),str(words[12]).replace("_"," "),str(words[12]),str(words[14]),str(words[15]),str(words[16]),str(words[16])))
    num = num + 1
nf.close()
f.close()
There are quite a few problems with your program, starting with the file reading.
To read the lines of a file into a list you can use file.readlines().
So instead of
f = open("Input.txt", "r")#opens the file (input.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
You can just do this
pokeData = open("Input.txt", "r").readlines() # Returns a list with one string per line.
Next, you are misunderstanding the uses of for and while.
A for loop in Python iterates over a list (or any other iterable), as shown below. I don't know what you were trying to do with for newl in words: a for loop creates a new variable and assigns each element of the list to it in turn, so it overwrites the newl you defined earlier. Refer below.
array = ["one", "two", "three"]
for i in array: # i is created
    print (i)
The output will be:
one
two
three
So to fix a lot of this code, you can replace the whole while loop with something like this.
(The code below assumes your input file is formatted so that all the words are separated by tabs.)
for line in pokeData:
    words = line.split (tab) # Split the line by tabs
    nf.write ('your very long and complicated string')
Other helpers
The formatted string that you write to the output file looks very similar to JSON. There is a built-in Python module called json that can convert a native Python dict to a JSON string. This will probably make things a lot easier for you, but either way works.
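As a rough illustration of the json approach, here is a Python 3 sketch. The field positions follow the indices used in the question (words[1] for the number, words[2] for the species, and so on), but the sample line is hypothetical because the real layout of Input.txt is not shown here, and pokemon_entry is an invented helper name. Only a few of the fields are included to keep it short.

```python
import json


def pokemon_entry(line):
    """Turn one whitespace-separated line into a (key, dict) pair."""
    words = line.split()
    stat_names = ["hp", "atk", "def", "spa", "spd", "spe"]
    entry = {
        "num": int(words[1]),
        "species": words[2],
        "types": [words[3], words[4]],
        "baseStats": {n: int(v) for n, v in zip(stat_names, words[5:11])},
    }
    return words[2].lower(), entry


# Hypothetical sample line; the real Input.txt layout may differ.
sample = "x 1 Bulbasaur Grass Poison 45 49 49 65 65 45"
key, entry = pokemon_entry(sample)
print(json.dumps({key: entry}))
```

Note that json.dumps quotes every key, so the result is valid JSON rather than the unquoted-key object literal built by the format string in the question.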
Hope this helps
I'm working through an introductory Python programming course on MIT OCW. On this problem set I've been given some code to work on and a text file. The code and the text file are in the same folder. The code looks like this:
import random
import string
def load_words( ):
    print "Loading word list from file..."
    inFile = open (WORDLIST_FILENAME, 'r', 0)
    line = inFile.readline( )
    wordlist = string.split (line)
    print " ", len(wordlist), "words loaded."
    return wordlist
def choose_word (wordlist):
    return random.choice (wordlist)
wordlist = load_words ( )
When I run the code as it is, the problem set instructions say I should get this:
Loading word list from file...
55900 words loaded.
For some reason though, when I run the code I get:
Loading word list from file...
1 words loaded
I've tried omitting the 2nd and 3rd parameters from the input to the open function but to no avail. What could the problem be?
Moreover, when I try to print the value of wordlist I get
['AA']
When I print the value of line within the context of the relevant function I get:
AA
The text file does begin with 'AA', but what about all of the letters that follow?
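line = inFile.readline( ) reads only a single line, which is why only one word is loaded.
readlines(), plural, returns a list with one string per line of the input file, each still ending in its newline character.
Note that string.split expects a string, not a list, so after switching to readlines() you have to split each line in turn (or call read() and split the whole content once).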
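The three calls behave as follows. This is a small Python 3 sketch that uses io.StringIO as a stand-in for the word list file and assumes, for the sake of the example, that the file holds one word per line.

```python
import io

# Stand-in for the word list file: one word per line (an assumption).
fake_file = io.StringIO("AA\nAB\nAC\n")

first_line = fake_file.readline()      # reads only up to the first newline
print(first_line.split())              # ['AA']

fake_file.seek(0)
all_lines = fake_file.readlines()      # list of lines, newlines included
print(all_lines)                       # ['AA\n', 'AB\n', 'AC\n']

fake_file.seek(0)
wordlist = fake_file.read().split()    # split the whole content on whitespace
print(len(wordlist), "words loaded.")  # 3 words loaded.
```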
Given a raw file like this:
cat wordlist.txt
aa
bb
cc
dd
ee
And a Python file like this:
import random
def load_words(WORDLIST_FILENAME):
    print "Loading word list from file..."
    wordlist = list()
    # 'with' opens the file and closes it automatically when the block ends
    with open(WORDLIST_FILENAME) as f:
        # iterate one line at a time; each line includes its trailing '\n'
        for line in f:
            # strip the '\n', then append the word to wordlist
            wordlist.append(line.rstrip('\n'))
    print " ", len(wordlist), "words loaded."
    print '\n'.join(wordlist)
    return wordlist
def choose_word (wordlist):
    return random.choice (wordlist)
wordlist = load_words('wordlist.txt')
The result is:
python load_words.py
Loading word list from file...
5 words loaded.
aa
bb
cc
dd
ee
The function you have written only reads words from a single line. It assumes all the words in the text file are on one line, so it reads that line and creates a list by splitting it. However, it appears your text file also contains newlines. You can therefore replace the following:
line = inFile.readline( )
wordlist = string.split (line)
with:
wordlist =[]
for line in inFile:
    line = line.split()
    wordlist.extend(line)
print " ", len(wordlist), "words loaded."
So far I have this. I opened the data file, built a list from the data, and printed the data I needed from the list in 2 columns correctly. It shows up in Python just fine. But when I try to write it to a .txt file, it all ends up on 1 line. I'm not sure what to do to get 2 columns in the new text file.
# open file
data = open("BigCoCompanyData.dat", "r")
data.readline()
# skip header and print number of employees
n = eval(data.readline())
print(n)
# read in employee information
longest = 0
# save phone list in text file
phoneFile = open("PhoneList.txt", "w")
for i in range(n):
    lineI = data.readline().split(",")
    nameLength = len(lineI[1])+len(lineI[2])
    if nameLength > longest:
        longest = nameLength
        longest = longest + 5
    print((lineI[2].title()+", "+lineI[1].title()).ljust(longest) + ("("+lineI[-2][0:3]+")"+lineI[-2][3:6]+"-"+lineI[-2][6:10]).rjust(14))
    phoneFile.write((lineI[2].title()+", "+lineI[1].title()).ljust(longest) + ("("+lineI[-2][0:3]+")"+lineI[-2][3:6]+"-"+lineI[-2][6:10]).rjust(14))
data.close()
# close the file
phoneFile.close()
phoneFile.write(...) writes exactly the string you give it. Unlike print, it does not add a newline, so each call continues on the same line as the previous one unless you end the string with \n.
phoneFile.write((lineI[2].title()+", "+lineI[1].title()).ljust(longest) +
("("+lineI[-2][0:3]+")"+lineI[-2][3:6]+"-"+lineI[-2][6:10]).rjust(14)+'\n')
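To see the effect in isolation, here is a minimal Python 3 sketch with two made-up rows and io.StringIO standing in for the open PhoneList.txt file. It also computes the column width once, before the loop, which is a design choice of this sketch rather than something taken from the question.

```python
import io

# Made-up rows: (formatted name, formatted phone number).
rows = [("Smith, Anna", "(555)123-4567"), ("Lee, Bo", "(555)987-6543")]

# Width of the name column: longest name plus some padding.
width = max(len(name) for name, _ in rows) + 5

out = io.StringIO()   # stand-in for the open PhoneList.txt file
for name, phone in rows:
    # The trailing "\n" is what puts each record on its own line.
    out.write(name.ljust(width) + phone.rjust(14) + "\n")

print(out.getvalue(), end="")
```

Because the width is fixed before any row is written, every line has the same length and the phone numbers line up in a second column.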