How to read a text file line by line in Python

I want to randomly retrieve and print a whole line from a text file.
The text file is basically a list and so each item on that list needs to be searched.
import random
a= random.random
prefix = ["CYBER-", "up-", "down-", "joy-"]
suprafix = ["with", "in", "by", "who", "thus", "what"]
suffix = ["boy", "girl", "bread", "hippy", "box", "christ"]
print (random.choice(prefix), random.choice(suprafix), random.choice(prefix), random.choice(suffix))
This is my code for manually entering the words into lists, but I can't work out how to use an array or index to capture the text line by line and use that instead.

I'm not sure I completely understand what you're asking, but I'll try to help.
If you're trying to choose a random line from a file, you can use open(), then readlines(), then random.choice():
import random
with open("file") as f:
    line = random.choice(f.readlines())
If you're trying to choose a random element from each of three lists, you can use random.choice():
import random
choices = [random.choice(i) for i in lists]
Here, lists is a list of the lists to choose from.
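As a minimal sketch of the first suggestion, the snippet below reads a file into a list of lines and picks one at random. The file contents are made up for illustration (a temporary file stands in for your own list), and the trailing newline that readlines() would keep is stripped so the printed line is clean.

```python
import os
import random
import tempfile

# Hypothetical stand-in for your own text file: one item per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("boy\ngirl\nbread\nhippy\nbox\n")
    path = tmp.name

# Read every line, dropping the trailing newline and skipping blank lines.
with open(path) as f:
    items = [line.strip() for line in f if line.strip()]

os.remove(path)  # clean up the temporary file

# Pick one whole line at random.
random_line = random.choice(items)
print(random_line)
```

Iterating over the file object directly avoids a separate readlines() call, and stripping each line means the choice is not printed with an extra blank line after it.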

Use Python's file.readlines() method:
with open("file_name.txt") as f:
    prefix = f.readlines()
Now you should be able to iterate through the list prefix.

Those answers have helped me grab things from a list in a text file, as you can see in my code below. But I have three lists as text files, and I am trying to generate a 4-word message at random, choosing from the 'prefix' and 'suprafix' lists for the first 3 words and from the 'suffix' file for the fourth word. When printing them, I want to prevent random.choice from picking a word that was already chosen.
import random
a= random.random
prefix = open('prefix.txt','r').readlines()
suprafix = open('suprafix.txt','r').readlines()
suffix = open('suffix.txt','r').readlines()
print (random.choice(prefix + suprafix), random.choice(prefix + suprafix), random.choice(prefix + suprafix), random.choice(suffix))
As you can see, it chooses the first 3 words at random from those 2 lists.
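One way to avoid repeats, sketched here under the assumption that the three files hold one word per line, is to draw the first three words with random.sample, which selects without replacement. The in-memory lists below are stand-ins for what readlines() would return from prefix.txt, suprafix.txt and suffix.txt.

```python
import random

# Hypothetical word lists; in the question these come from
# prefix.txt, suprafix.txt and suffix.txt via readlines().
prefix = ["CYBER-\n", "up-\n", "down-\n", "joy-\n"]
suprafix = ["with\n", "in\n", "by\n", "who\n", "thus\n", "what\n"]
suffix = ["boy\n", "girl\n", "bread\n", "hippy\n", "box\n"]

# Strip newlines and drop duplicates so the pool holds each word once.
pool = list(dict.fromkeys(w.strip() for w in prefix + suprafix))

# random.sample draws without replacement: three distinct words.
first_three = random.sample(pool, 3)
last = random.choice([w.strip() for w in suffix])

message = first_three + [last]
print(*message)
```

Deduplicating the pool first matters because random.sample only guarantees distinct positions, not distinct values, so a word listed in both files could otherwise appear twice.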

Related

How to print a random item from a dictionary that is in a text file on Python?

So I have a dictionary with a bunch of words as keys and their definitions as values.
E.g., word_list.txt
words = {
happy: "feeling or showing pleasure or contentment.",
apple: "the round fruit which typically has thin green or red skin and crisp flesh.",
today: "on or in the course of this present day."
faeces: "waste matter remaining after food has been digested, discharged from the bowels; excrement."
}
How do I print a random word from the dictionary that is in the text file on Python?
You need to open that file in your code and load it with the json library; then you can do any random operation on it.
For the file to load, you first have to add the missing comma at the end of each element.
Also, since your file has 'words = ' before the keys, you need to split that off. You also need to replace single quotes with double quotes:
import json, random
with open('word_list.txt', 'r') as file:
    file_text = file.read()
words = json.loads(file_text.split(' = ')[1].replace("'", '"'))
random_word = random.choice(list(words))
print(random_word)
random.choice() picks a random element from a list, so you just need to pass your dict to it as a list: random.choice(list(your_dict)). Converting a dict to a list gives you its keys.
EDIT: the OP has edited the question, removing the single quotes from every key in the word_list.txt sample. This code only works if the keys are single- or double-quoted.
First, you will need to fix your txt file. This could also be a json file, but to make it one you will need to modify the code. Going forward, though, JSON is the proper format for this. You need to remove words =. You also need to put your keys (apple, today, and so on) in quotes. Here is the fixed file:
{
"happy": "feeling or showing pleasure or contentment.",
"apple": "the round fruit which typically has thin green or red skin and crisp flesh.",
"today": "on or in the course of this present day.",
"faeces": "waste matter remaining after food has been digested, discharged from the bowels; excrement."
}
Here is some code to do it.
# Necessary imports.
import json, random
# Open the txt file and read its contents into a string.
with open("words.txt", "r") as words_file:
    words_string = words_file.read()
# Convert the JSON string into a dictionary so we can use the data easily.
words_json = json.loads(words_string)
# Get the value (the definition) of each entry. This drops the key, such as "apple".
words_json_values = words_json.values()
# Turn it into a list that random.choice can index.
words_list = list(words_json_values)
# Pick a random entry from the list. These are the definitions; use list(words_json) to pick from the words themselves.
picked_word = random.choice(words_list)
# Print it so we can see it.
print(picked_word)
If you want it all on the same line here you go.
# Necessary imports.
import json, random
#The code to do it.
print(random.choice(list(json.loads(open("words.txt", "r").read()).values())))
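If you want the word together with its definition, one option is to choose from the dictionary's items rather than its keys or values. This is only a sketch: the JSON text is inlined as a stand-in for the corrected file, which you would normally read with open().

```python
import json
import random

# Stand-in for the corrected file contents (normally read with open(...).read()).
file_text = '''{
  "happy": "feeling or showing pleasure or contentment.",
  "apple": "the round fruit which typically has thin green or red skin and crisp flesh.",
  "today": "on or in the course of this present day."
}'''

words = json.loads(file_text)

# list(words.items()) is a list of (key, value) pairs, so a single
# choice yields both the word and its definition.
word, definition = random.choice(list(words.items()))
print(word + ": " + definition)
```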

Is there any way to obtain a random word from PyEnchant?

Is there a way to obtain a random word from PyEnchant's dictionaries?
I've tried doing the following:
enchant.Dict("<language>").keys() #Type 'Dict' has no attribute 'keys'
list(enchant.Dict("<language>")) #Type 'Dict' is not iterable
I've also tried looking into the module to see where it gets its wordlist from but haven't had any success.
Using the separate "Random-Words" module is a workaround, but as it doesn't follow the same wordlist as PyEnchant, not all words will match. It is also quite a slow method. It is, however, the best alternative I've found so far.
Your question really got me curious so I thought of some way to make a random word using enchant.
import enchant
import random
import string
# here I am getting hold of all the letters
letters = string.ascii_lowercase
# creating a string of random length from random letters
word = "".join([random.choice(letters) for _ in range(random.randint(3, 8))])
d = enchant.Dict("en_US")
# using the `enchant` to suggest a word based on the random string we provide
random_word = d.suggest(word)
Sometimes the suggest method will not return any suggestions, so you will need a loop that retries until random_word has a value. Note that suggest returns a list of words, so you still have to pick one element from it.
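The retry loop could look something like the sketch below. The function name random_suggested_word and its max_tries parameter are made up for illustration; it takes the suggest callable as an argument, so a stub is used here to let the example run without pyenchant installed.

```python
import random
import string

def random_suggested_word(suggest, max_tries=100):
    """Feed random letter strings to `suggest` until it returns something.

    `suggest` is any callable mapping a string to a list of words,
    for example enchant.Dict("en_US").suggest.
    """
    for _ in range(max_tries):
        length = random.randint(3, 8)
        seed = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
        suggestions = suggest(seed)
        if suggestions:  # an empty list means no suggestion; try again
            return random.choice(suggestions)
    return None  # gave up after max_tries attempts

# Stub standing in for pyenchant: it "suggests" nothing for seeds
# that start with a letter in a-m.
def fake_suggest(seed):
    return [] if seed[0] < "n" else ["bread", "box"]

print(random_suggested_word(fake_suggest))
```

With pyenchant available, the call would be random_suggested_word(enchant.Dict("en_US").suggest). Keep in mind that words reached this way are unlikely to be uniformly distributed over the dictionary.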
With the help of @furas this question has been resolved.
Using the dict-en text file in furas' PyWordle, I wrote a short script that filters out the words that are invalid according to pyenchant's wordlist.
import enchant
wordlist = enchant.Dict("en_US")
baseWordlist = open("dict-en.txt", "r")
lines = baseWordlist.readlines()
baseWordlist.close()
newWordlist = open("dict-en_NEW.txt", "w")  # write to new text file
for line in lines:
    word = line.strip("\n")
    if wordlist.check(word):  # if word exists in pyenchant's dictionary
        print(word + " is valid.")
        newWordlist.write(line)
    else:
        print(word + " is invalid.")
newWordlist.close()
Afterwards, reading the new text file lets you retrieve the word on any given line.
validWords = open("dict-en_NEW.txt", "r")
wordList = validWords.readlines()
validWords.close()
myWord = wordList[<line>].strip("\n")
# <line> can be any int from 0 up to the number of lines minus 1, either a chosen one or a random one.
# This returns the word located at line <line>.

Trying to read text file and count words within defined groups

I'm a novice Python user. I'm trying to create a program that reads a text file and searches that text for certain words that are grouped (that I predefine by reading from csv). For example, if I wanted to create my own definition for "positive" containing the words "excited", "happy", and "optimistic", the csv would contain those terms. I know the below is messy - the txt file I am reading from contains 7 occurrences of the three "positive" tester words I read from the csv, yet the results print out to be 25. I think it's returning character count, not word count. Code:
import csv
import string
import re
from collections import Counter
remove = dict.fromkeys(map(ord, '\n' + string.punctuation))
# Read the .txt file to analyze.
with open("test.txt", "r") as f:
    textanalysis = f.read()
    textresult = textanalysis.lower().translate(remove).split()
# Read the CSV list of terms.
with open("positivetest.csv", "r") as senti_file:
    reader = csv.reader(senti_file)
    positivelist = list(reader)
# Convert term list into flat chain.
from itertools import chain
newposlist = list(chain.from_iterable(positivelist))
# Convert chain list into string.
posstring = ' '.join(str(e) for e in newposlist)
posstring2 = posstring.split(' ')
posstring3 = ', '.join('"{}"'.format(word) for word in posstring2)
# Count number of words as defined in list category
def positive(str):
    counts = dict()
    for word in posstring3:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    total = sum (counts.values())
    return total
# Print result; will write to CSV eventually
print ("Positive: ", positive(textresult))
I'm a beginner as well but I stumbled upon a process that might help. After you read in the file, split the text at every space, tab, and newline. In your case, I would keep all the words lowercase and include punctuation in your split call. Save this as an array and then parse it with some sort of loop to get the number of instances of each 'positive,' or other, word.
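The approach described above can be sketched as follows. In the question's code, the loop runs over posstring3, which is a single string, so it visits one character at a time and never looks at the text passed in; that would explain a result that looks like a character count. The sample text and term set here are made up, standing in for test.txt and positivetest.csv.

```python
import string
from collections import Counter

# Hypothetical sample text and term list, standing in for
# test.txt and positivetest.csv.
text = "I am happy. So happy and excited! Not sad, just optimistic; very HAPPY."
positive_terms = {"excited", "happy", "optimistic"}

# Lowercase, strip punctuation, and split on whitespace.
remove = dict.fromkeys(map(ord, string.punctuation))
tokens = text.lower().translate(remove).split()

# Count every token once, then add up the counts of the positive terms.
counts = Counter(tokens)
positive_total = sum(counts[term] for term in positive_terms)

print("Positive:", positive_total)
```

Keeping the terms in a set (or any collection of separate strings) is the key point: each term is then compared against whole words from the text rather than against individual characters.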
Look at this, specifically the "train" function:
https://github.com/G3Kappa/Adjustable-Markov-Chains/blob/master/markovchain.py
Also, this link, ignore the JSON stuff at the beginning, the article talks about sentiment analysis:
https://dev.to/rodolfoferro/sentiment-analysis-on-trumpss-tweets-using-python-
Same applies with this link:
http://adilmoujahid.com/posts/2014/07/twitter-analytics/
Good luck!
I looked at your code and ran some of my own data through it as a sample.
I have two ideas for you, based on what I think you may want.
First assumption: you want a basic sentiment count?
Getting to 'textresult' is great. You then did the same with the positive lexicon to get [positivelist], which I thought was exactly the right step. But then you converted [positivelist] into what is essentially one big sentence.
Would you not just:
1. Pass a 'stop_words' list through [textresult]
2. merge the two dataframes [textresult (less stopwords) and positivelist] for common words - as in an 'inner join'
3. Then basically do your term frequency
4. It is much easier to aggregate the score then
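The four steps above can be sketched in plain Python without dataframes. Everything below is illustrative: the token list stands in for textresult, and the stop-word and positive-term sets are tiny made-up examples rather than real lexicons.

```python
from collections import Counter

# Hypothetical inputs: tokens as produced for textresult, plus small
# illustrative stop-word and positive-term sets.
textresult = ["i", "am", "happy", "and", "excited", "and", "happy", "but", "tired"]
stop_words = {"i", "am", "and", "but", "the", "a"}
positive_terms = {"excited", "happy", "optimistic"}

# Step 1: drop stop words.
content_words = [w for w in textresult if w not in stop_words]

# Step 2: keep only words that also appear in the lexicon (the "inner join").
matched = [w for w in content_words if w in positive_terms]

# Step 3: term frequency per positive word.
term_frequency = Counter(matched)

# Step 4: aggregate into a single score.
score = sum(term_frequency.values())

print(term_frequency, score)
```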
Second assumption: you are focusing on "excited", "happy", and "optimistic"
and you are trying to isolate text themes into those 3 categories?
1. again stop at [textresult]
2. download the 'nrc' and/or 'syuzhet' emotional valence dictionaries
They break down emotive words into 8 emotional groups
So if you only want 3 of the 8 emotive groups (subset)
3. Process it like you did to get [positivelist]
4. do another join
Sorry, this is a bit hashed up, but if I was anywhere near what you were thinking, let me know and we can make contact.
Second apology: I'm also a novice Python user, and I am adapting what I use in R to Python in the above (it's not subtle either :) )

Using a dictionary as regex in Python

I had a Python question I was hoping for some help on.
Let's start with the important part, here is my current code:
import re #for regex
import numpy as np #for matrix
f1 = open('file-to-analyze.txt','r') #file to analyze
#convert files of words into arrays.
#These words are used to be matched against in the "file-to-analyze"
math = open('sample_math.txt','r')
matharray = list(math.read().split())
math.close()
logic = open('sample_logic.txt','r')
logicarray = list(logic.read().split())
logic.close()
priv = open ('sample_priv.txt','r')
privarray = list(priv.read().split())
priv.close()
... Read in 5 more files and make associated arrays
#convert arrays into dictionaries
math_dict = dict()
math_dict.update(dict.fromkeys(matharray,0))
logic_dict = dict()
logic_dict.update(dict.fromkeys(logicarray,1))
...Make more dictionaries from the arrays (8 total dictionaries - the same number as there are arrays)
#create big dictionary of all keys
word_set = dict(math_dict.items() + logic_dict.items() + priv_dict.items() ... )
statelist = list()
for line in f1:
    for word in word_set:
        for m in re.finditer(word, line):
            print word.value()
The goal of the program is to take a large text file and perform analysis on it. Essentially, I want the program to loop through the text file and match words found in Python dictionaries and associate them with a category and keep track of it in a list.
So for example, let's say I was parsing through the file and I ran across the word "ADD". ADD is listed under the "math" or '0' category of words. The program should then add it to a list that it ran across a 0 category and then continue to parse the file. Essentially generating a large list that looks like [0,4,6,7,4,3,4,1,2,7,1,2,2,2,4...] with each of the numbers corresponding to a particular state or category of words as illustrated above. For the sake of understanding, we'll call this large list 'statelist'
As you can tell from my code, so far I can take as input the file to analyze, take and store the text files that contain the list of words into arrays and from there into dictionaries with their correct corresponding list value (a numerical value from 1 - 7). However, I'm having trouble with the analysis portion.
As you can tell from my code, I'm trying to go line by line through the text file and regex any of the found words with the dictionaries. This is done through a loop and regexing with an additional, 9th dictionary that is more or less a "super" dictionary to help simplify the parsing.
However, I'm having trouble matching all the words in the file and, when I find a word, mapping it to the dictionary value rather than the key. That is, when the program runs across an "ADD", it should add 0 to the list, because ADD is part of the 0 or "math" category.
Would someone be able to help me figure out how to write this script? I really appreciate it! Sorry for the long post, but the code requires a lot of explanation so you know what's going on. Thank you so much in advance for your help!
The simplest change to your existing code would be to keep track of both the word and the category in the loop:
for line in f1:
    for word, category in word_set.iteritems():
        for m in re.finditer(word, line):
            print word, category
            statelist.append(category)
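For reference, here is a Python 3 sketch of the same idea that looks each word up in the dictionary instead of running one regex per dictionary key. It assumes that whole-word, case-insensitive matching is what is wanted; the word lists and input lines are made up in place of the files from the question.

```python
import re

# Hypothetical category word lists; the question builds these from files.
math_words = ["add", "sub", "mul"]
logic_words = ["and", "or", "xor"]

# One dictionary mapping each word to its category number.
word_to_category = {}
word_to_category.update(dict.fromkeys(math_words, 0))
word_to_category.update(dict.fromkeys(logic_words, 1))

# Stand-in for the lines of the file to analyze.
lines = ["ADD r1, r2", "xor r3, r3", "ADDRESS and SUB"]

statelist = []
for line in lines:
    # Split into whole words so "ADDRESS" does not count as "ADD".
    for token in re.findall(r"\w+", line.lower()):
        category = word_to_category.get(token)
        if category is not None:
            statelist.append(category)

print(statelist)
```

Because the file is tokenized once and each token is a single dictionary lookup, the categories come out in the order the words appear in the text, and the work no longer grows with the size of the dictionary for every line.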

Scan through txt, append certain data to an empty list in Python

I have a text file that I am reading in Python. I'm trying to extract certain elements from the text file that follow keywords and append them to empty lists. The file looks like this:
So I want to make two empty lists:
The first list will hold the sequence names.
The second list will be a list of lists, each in the format [Bacteria, Phylum, Class, Order, Family, Genus, Species].
Most of the organisms will be Uncultured bacterium. I am trying to add the Uncultured bacterium together with the IDs that follow it, which are separated by ;
Is there any way to scan for a certain word and, when the word is found, take the word that comes after it [separated by a '\t']?
I need it to create a dictionary that translates each sequence name to its taxonomic data.
I know i will need an empty list to append the names to:
seq_names=[ ]
a second list to put the taxonomy lists into
taxonomy=[ ]
and a 3rd list that will be reset after every iteration
temp = [ ]
I'm sure it can be done in Biopython, but I'm working on my Python skills.
Yes there is a way.
You can split the string you get from reading the file into a list using the built-in split method. From this you can find the index of the word you are looking for, then use that index plus one to get the word after it. For example, take a text file called test.txt that looks like this (the formatting is a bit odd because SO doesn't seem to like hard tabs):
one two three four five six seven eight nine
The following code
f = open('test.txt','r')
string = f.read()
words = string.split('\t')
ind = words.index('seven')
desired = words[ind+1]
will return desired as 'eight'
Edit: To return every following word in the list
f = open('test.txt','r')
string = f.read()
words = string.split('\t')
desired = [words[ind+1] for ind, word in enumerate(words) if word == "seven"]
This is using list comprehensions. It enumerates the list of words and if the word is what you are looking for includes the word at the next index in the list.
Edit2: To split it on both new lines and tabs you can use regular expressions
import re
f = open('test.txt','r')
string = f.read()
words = re.split('\t|\n',string)
desired = [words[ind+1] for ind, word in enumerate(words) if word == "seven"]
It sounds like you might want a dictionary indexed by sequence name. For instance,
my_data = {
'some_sequence': [Bacteria,Phylum,Class,Order, Family, Genus, Species],
'some_other_sequence': [Bacteria,Phylum,Class,Order, Family, Genus, Species]
}
Then, you'd just access my_data['some_sequence'] to pull up the data about that sequence.
To populate your data structure, I would just loop over the lines of the files, .split('\t') to break them into "columns" and then do something like my_data[the_row[0]] = [the_row[10], the_row[11], the_row[13]...] to load the row into the dictionary.
So,
for row in inp_file.readlines():
    row = row.split('\t')
    my_data[row[0]] = [row[10], row[11], row[13], ...]
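A runnable version of that loop might look like the sketch below. Since the actual file layout is not shown, the sample rows and column positions are hypothetical: the first column is taken as the sequence name and the remaining columns as the taxonomy.

```python
import csv
import io

# Hypothetical tab-separated data; the real file and its column
# positions are not shown in the question.
sample = (
    "seq1\tBacteria\tFirmicutes\tBacilli\n"
    "seq2\tBacteria\tProteobacteria\tGammaproteobacteria\n"
)

my_data = {}
# csv.reader with delimiter="\t" splits each line into columns.
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    if not row:
        continue  # skip blank lines
    seq_name, taxonomy = row[0], row[1:]
    my_data[seq_name] = taxonomy

print(my_data["seq1"])
```

With a real file, io.StringIO(sample) would be replaced by the open file object, and row[1:] by whichever column indices hold the taxonomic ranks.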
