Saving my output code to a separate file? - python

The task is to:
identify the individual words in a sentence and store them in a list
create a list of positions for words in that list
save these lists as a single file or as separate files.
How would I save my output to a separate file?
This is my code

file = open("filename.txt", "w")
print(*map(position.__getitem__, words), file=file)  # your last print line, redirected to the file
file.close()
By the way, don't share your code as an image; use markdown code formatting, like the block above, instead.

I recognise the question you are trying to answer from GCSE Computing (UK).
Aside from your question about how to store the words in a file, you save the position of the words in a confusing way.
Using position[word] = len(position) + 1 stores the position of the word in the list of words, not its position in the original sentence. In fact, the number stored against each word in position is never used, since you map the word against the words list to get the position anyway.
Writing to a file is not difficult; a quick search of Stack Overflow will turn up plenty of examples.
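For completeness, here is a minimal sketch of the whole task as described at the top of this question. The example sentence, the output filename and the 1-based numbering are assumptions, not part of the original code:

sentence = "ask not what your country can do for you ask what you can do for your country"   # example input
words = []                                              # unique words, in order of first appearance
for word in sentence.split():
    if word not in words:
        words.append(word)

positions = [words.index(word) + 1 for word in sentence.split()]   # 1-based index into the unique-word list

with open("output.txt", "w") as f:                      # hypothetical output file
    f.write(" ".join(words) + "\n")
    f.write(" ".join(str(p) for p in positions) + "\n")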

Related

How to find if a word is in a text file and add the name of the file to a dictionary?

I've been stuck on this problem:
I am given a list of text files (e.g. 'ZchEuxJ9VJ.txt', 'QAih70niIq.txt') which each contain a randomly generated paragraph. I am to write a parser that will extract the individual words from each of the files listed in the folder. My end goal is to create a dictionary with keys that represent the individual words found in my files. The values are supposed to be a list of files that contain that word. For example, if I print ('happy:', search['happy']), the files that contain the word happy should be added as values to my dictionary. If the word is a "new" word I would have to set it up for the first time. If the word is not a "new" word I would have to update the list associated with that word. I also have to make sure that I don't add the same filename to the same word twice.
I've already created a new dictionary called search, visited each of the files and opened them for reading, then isolated each word using the split() method. The thing I am struggling with is how to "find" a word in a particular file and mark down which file a word can be found in. I've tried "searching" for a word in a file then adding the file to the dictionary, but that gets me an error message.
Instead of searching for words in files, you should be going about it the other way around. You know you need to index every word in every file eventually, so why not just go through every word in every file, in order? In pseudocode it might look something like this.
for each file:
    for each word in file:
        if word not in search:
            search[word] = []
        search[word].append(file)
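A rough Python translation of that pseudocode, with the no-duplicate check the question asks for added; the filenames list and the whitespace split are assumptions about your setup:

filenames = ['ZchEuxJ9VJ.txt', 'QAih70niIq.txt']    # example names from the question

search = {}
for fname in filenames:
    with open(fname) as f:
        for word in f.read().split():               # naive whitespace split
            if word not in search:
                search[word] = []
            if fname not in search[word]:           # don't record the same file twice
                search[word].append(fname)

print('happy:', search.get('happy', []))            # files that contain the word 'happy'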
This is homework, so I'm going to help you with the algorithm, not the code. You seem to have figured out most of the problem. The only thing you need help with is how to actually populate the dictionary.
Open the file (say fname) and read its contents.
Split the contents to separate each word.
Iterate over each word. Say we call it fword.
    Is fword in the dictionary?
        No? Create the key with an empty list as the value.
        Yes? Do nothing and move on.
    Now you know that fword is a key in the search dictionary, and its value is a list. (Say we call this list fwlist.)
    You also know that fword was found in the file fname.
    Check if fname is already in fwlist.
        No? Add fname to fwlist.
        Yes? Don't add it again. Do nothing.
Now, there are optimizations you can make, such as using a set instead of a list. This way you don't need to check if fname already exists in fwlist, because sets automatically discard duplicates, but this should be enough for what you need.
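For instance, a minimal sketch of the set variant, using the same fword and fname names as in the steps above:

if fword not in search:
    search[fword] = set()
search[fword].add(fname)       # a set silently ignores duplicate filenames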
Remember: before you start writing the program, it's helpful to sit down, think about the problem you're trying to solve, and plan out how you're going to attack the problem. Drawing a flowchart helps immensely when you're a novice programmer because it helps you organize your thoughts and figure out how your program is supposed to flow.
Debugging is also a crucial skill -- your code is useless if you can't fix errors. See "How to debug small programs" and "What is a debugger and how can it help me diagnose problems?"

highlighting and saving multiple words in a single string using python

I'm pretty new to programming and I've been having a problem: I have a txt file with a long (30,000+ character) single string made of 4 letters (a DNA sequence), and I need to search that file for certain repeats (for example 'TTAGGG'), highlight them, and save the result as a simple readable file. Obviously I can't save it as a txt file, because there is no highlight option.
I tried html as well as docx but every search I try removes the previous highlights.
Does anyone have any suggestions?
Open it in MS Word and do a Find-and-Replace: search for the repeat (e.g. TTAGGG) and set the replacement formatting to a highlight.
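If you would rather stick with the HTML route mentioned in the question, one sketch (the input/output filenames and the list of repeats are assumptions) is to wrap every occurrence in a single pass with re.sub, so earlier highlights are never overwritten:

import re

repeats = ["TTAGGG", "CCCTAA"]                      # example repeats to highlight
with open("sequence.txt") as f:                     # hypothetical input file
    dna = f.read().strip()

pattern = "|".join(repeats)                         # one combined pattern, one pass
marked = re.sub(pattern, lambda m: "<mark>" + m.group(0) + "</mark>", dna)

with open("sequence.html", "w") as out:             # open this file in any browser
    out.write("<html><body><p style='word-break:break-all'>" + marked + "</p></body></html>")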

How to search for a set of words in a text file?

I'm writing a project on extracting a semantic orientation from a review stored in a text file.
I have a 400*2 array; each row contains a word and its weight. I want to check which of these words appear in the text file, and calculate the weight of the whole content.
My question is -
what is the most efficient way to do it? Should I search for each word separately, for example with a for loop?
Do I get any benefit from storing the content of the text file in a string object?
https://docs.python.org/3.6/library/mmap.html
This may work for you; you can use mmap's find method.
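A tiny sketch of that suggestion (the filename and the search word are assumptions):

import mmap

with open("review.txt", "rb") as f:                            # hypothetical review file
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)     # map the file read-only
    offset = mm.find(b"excellent")                             # byte offset of the word, or -1 if absent
    mm.close()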
This may be out-of-the-box thinking, but if you don't care about the semantic/grammatic connection of the words:
sort all words from the text by length
sort your array by length
Then write a for-loop: call len() on each word from the text and only check it against those array words which have the same length.
With some tinkering this might give you a good performance boost over the "naive" search.
Also look into search algorithms if you want an additional boost: for example, find the first of the 400 words with 6 letters, then go "down" the list until the first word with 5 letters comes up, then stop.
Alternatively, you could build an index array with the indexes of the first and last of all 5-letter words (and analogously for the rest), assuming your words don't change.
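A rough sketch of that length-bucketing idea; the word/weight pairs and the filename are stand-ins for the real 400*2 array and review file:

weights = {"good": 1.0, "poor": -1.0, "excellent": 2.0}     # stand-in for the 400*2 array

# group the scored words by length so each text word is only compared
# against candidates of the same length
by_length = {}
for word, weight in weights.items():
    by_length.setdefault(len(word), set()).add(word)

total = 0.0
with open("review.txt") as f:                               # hypothetical review file
    for word in f.read().split():
        if word in by_length.get(len(word), ()):
            total += weights[word]

print(total)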

Python - Dividing a book in PDF form into individual text files that correspond with page numbers

I've converted my PDF file into a long string using PDFminer.
I'm wondering how I should go about dividing this string into smaller, individual strings/pages. Each page is delimited by a certain series of characters (CRLF, FF, page number, etc.), and the string should be split wherever these characters occur, with each piece written to its own new text file.
I have no experience with regex, but is using the re module the best way to go about this?
My vague idea for implementation is that I have to iterate through the file using the re.search function, creating text files with each new form feed found. The only code I have is PDF > text conversion. Can anyone point me in the right direction?
Edit: I think the expression I should use is something like ^.*(?=(\d\n\n\d\n\n\f\bFavela\b)) (capture everything before the 2 digits, the line breaks, and the book's title 'Favela', which appears at the top of each page).
Can I save these \d digits as variables? I want to use them as file names, as I iterate through the book and scoop up the portions of text divided by each appearance of \f\Favela.
I'm thinking the re.sub method would do it, looping through and replacing with an empty string as I go.
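A rough sketch of the re approach; the filenames, and the assumption that a form feed separates the pages, are mine rather than from the question:

import re

with open("book.txt") as f:                         # output of the PDFMiner conversion
    text = f.read()

pages = re.split(r"\f", text)                       # split on form-feed characters
for number, page in enumerate(pages, start=1):
    with open("page_%03d.txt" % number, "w") as out:
        out.write(page)

If you need the printed page numbers rather than a running counter, a capture group (for example re.search(r"(\d+)", page)) is the usual way to save those \d digits as variables.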

Python: joining multiple lines into a single line/string and appending it to a list

Basically I have a text file that I am reading line by line. I want to merge some lines (a part of the text) into a single string and add it as an element to a list.
These parts of the text that I want to combine start with the letters "gi" and end with ">". I can successfully isolate this part of the text, but I am having trouble manipulating it in any way; I would like each part to be a single variable, acting like an individual entity. So far I am only adding single lines to the list.
def lines(File):
    dataFile = open(File)
    list = []
    for letters in dataFile:
        start = letters.find("gi") + 2
        end = letters.find(">", start)
        unit = letters[start:end]
        list.append(unit)
    return list
This is an example:
https://www.dropbox.com/s/1cwv2spfcpp0q0s/pythonmafft.txt?dl=0
I would like to manipulate every entry in the file as a single string and be able to append it to a list. Every entry is separated by a few empty lines.
First off, don't use list as a variable name. list is a builtin and you override it each time you assign the same name elsewhere in your code. Try to use more descriptive names in general and you'll easily avoid this pitfall.
There is an easier way to do what you're asking, since in the example you gave each entry begins with '>gi'. You can simply split on that and it'll give you the units (without the '>gi' prefix).
def lines(File):
    with open(File) as dataFile:                      # close the file automatically
        wordlist = dataFile.read().split('>gi')
    return wordlist
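For example, assuming the linked example file has been saved locally as pythonmafft.txt:

entries = lines("pythonmafft.txt")
print(len(entries))         # number of chunks produced by the split
print(entries[1][:60])      # start of the first entry; entries[0] is whatever precedes the first '>gi'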
