Searching for a string in a file and saving the results

I have a few fairly large text files of data. I need to find a string that repeats throughout the data; it is always followed by an ID number, and I need to save that number.
I've done some simple scripting with Python, but I am unsure where to start with this, or whether Python is even a good fit for the problem. Any help is appreciated.

I will post more information next time (my bad), but I managed to get something to work that should do it for me.
import re
with open("test.txt", "r") as opened:
    text = opened.read()
output = re.findall(r"\bdata........", text)
out_str = ",".join(output)
print(out_str)
#with open("output.txt", "w") as outp:
#    outp.write(out_str)
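
A minimal sketch of the same idea that captures only the number instead of a fixed run of eight characters. It assumes the repeating string is the literal word `data` and that the ID is a run of digits separated from it by optional spaces, `:` or `=`; the post does not show the file format, so both are guesses. It reads line by line so a large file does not have to fit in memory. The names `extract_ids` and `save_ids` are made up for this example.

```python
import re

def extract_ids(path, keyword="data"):
    """Return every number that directly follows `keyword` in the file at `path`."""
    # \b anchors the keyword at a word boundary; (\d+) captures only the digits.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"[\s:=]*(\d+)")
    ids = []
    with open(path, "r") as handle:
        for line in handle:  # iterate lazily instead of reading the whole file
            ids.extend(pattern.findall(line))
    return ids

def save_ids(ids, out_path):
    """Write the IDs as one comma-separated line."""
    with open(out_path, "w") as out:
        out.write(",".join(ids))
```

Something like `save_ids(extract_ids("test.txt"), "output.txt")` would then replace the commented-out lines above.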

Related

How can I find every occurrence of a substring in a file and then manipulate it? - Python

I have a problem. I want to read this file:
https://cdn.discordapp.com/attachments/852226751832653864/870341903923695626/map.txt
I want to find every "targetname" line and get the value that follows targetname. For example, given
"targetname" "rope01"
I want to get "rope01". The .txt file contains many of these entries, and I then want to manipulate each value (such as rope01). How can I do it?
import re
value = '"targetname" "test"'
text = value[13:]
print(text)
#map_itself = open('map.txt', 'r')
with open('map.txt', 'rb') as map_itself:
    for line in map_itself:
        if '"targetname"'.encode() in map_itself:
            print("I found it!")
This code is pretty stupid and it can't even find a single "targetname", even though they do exist in the .txt file. (Use CTRL + F to find "targetname", since it is not near the top.)
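
One likely reason nothing is found: the `if` inside the loop tests `map_itself` (the file object) rather than `line`, so the search string is compared against whole lines instead of being looked up inside the current line. A minimal sketch that checks each line and pulls out the quoted value follows. It assumes every entry has the form `"targetname" "value"` on a single line and that the file can be read as text; `find_targetnames` is a made-up name.

```python
import re

# Matches "targetname" followed by a second quoted string and captures its content.
TARGETNAME_RE = re.compile(r'"targetname"\s+"([^"]*)"')

def find_targetnames(path):
    """Collect the value that follows every "targetname" key in the file."""
    names = []
    # errors="ignore" is an assumption: it drops bytes that are not valid UTF-8.
    with open(path, "r", encoding="utf-8", errors="ignore") as map_file:
        for line in map_file:
            names.extend(TARGETNAME_RE.findall(line))
    return names
```

The returned list can then be manipulated like any other list of strings, for example `[name.upper() for name in find_targetnames("map.txt")]`.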

Fast text use (getting it up to compare word vectors)

I am a little ashamed to ask this question because I feel like I should know the answer. I haven't been programming long, but I am trying to apply what I learn to a project I'm working on, and that is how I got to this question. fastText has a library of words and their associated vectors: https://fasttext.cc/docs/en/english-vectors.html . It is used to find the vector of a word. I just want to look a word or two up and see the result, in order to decide whether it is useful for my project. They provide a list of vectors and a small chunk of code, and I cannot make heads or tails of it. Some of it I get, but I do not see a print call. Is it returning the data to a different part of your own code? I am also not sure where the chunk of code opens the data file. Usually fname is a handle, right? Or are they expecting you to type your file's path there? I am also not familiar with io; I googled the word but didn't find anything useful. Is this something I need to download, or is it already part of Python? I know I might be a little out of my league, but I learn best by doing, so please don't hate on me.
import io
def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = map(float, tokens[1:])
    return data

Try the following:
my_file_name = 'C:/path/to/file.txt' # Use the path to your downloaded vector file
my_data = load_vectors(my_file_name) # The function returns a dict that maps each word to its vector
print(my_data) # To see the output
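
To address the other parts of the question: `io` ships with Python, so there is nothing to download, and `fname` is just the path to the file as a string. The function has no print because it returns the dictionary to the caller. Below is a small sketch for looking up a single word. It assumes Python 3, where `map()` is lazy, so each vector is wrapped in `list()` to make it printable. The two-word sample file is made up for illustration and is not real fastText data.

```python
import io
import os
import tempfile

def load_vectors(fname):
    """Same logic as the snippet above, with each vector stored as a list."""
    with io.open(fname, "r", encoding="utf-8", newline="\n", errors="ignore") as fin:
        n, d = map(int, fin.readline().split())  # header: word count, dimensions
        data = {}
        for line in fin:
            tokens = line.rstrip().split(" ")
            data[tokens[0]] = list(map(float, tokens[1:]))
    return data

# Made-up two-word file in the same text layout, only for illustration.
sample = "2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n"
sample_path = os.path.join(tempfile.mkdtemp(), "sample.vec")
with io.open(sample_path, "w", encoding="utf-8", newline="\n") as out:
    out.write(sample)

vectors = load_vectors(sample_path)
print(vectors["king"])  # look up a single word instead of printing everything
```

With a real vector file, printing one entry is much more practical than printing the whole dictionary, which can hold a very large number of words.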

How to keep the unicode character codes in my csv file?

I am handling a large number of incoming emails and many of them have various emoticons in them. I am planning to apply an NLP analysis on the user comments and train a classifier to provide relevant answers, instead of having to manually reply to hundreds of these messages. For this as a first step, I parsed all emails and saved their content in a list called userMessages that I wrote in a csv file. I plan to add further columns to the csv for analytic purposes, such as user name, address, date, and time but this is not relevant for this question now.
Here is the code I use to write the userMessages list into a csv file called user-messages.csv:
with open('user-messages.csv', 'wb') as myfile:
    wr = csv.writer(myfile, dialect='excel', encoding='utf-8', quoting=csv.QUOTE_ALL)
    for _msg in userMessages:
        wr.writerow([_msg])
This doesn't run into an error due to the encoding='utf-8' parameter, however, it removes/recodes the emoticons in such a way that it is no longer retraceable, for instance in the following format: ðŸ˜. Ideally, I would like to have the original unicode codes in the csv file, such as '\U0001f604' (smiling face with open mouth and smiling eyes) and later substitute these codes with their (approximate) meaning for the NLP to better understand the context of the messages, for instance in the case of this character ('\U0001f604'), remove the code and add the words 'smile' or 'happy'.
Can this be achieved? Or am I overcomplicating things? Any advice would be greatly appreciated. Thank you!
Edit: I am using Windows and I open the csv files in Microsoft Excel 2016.
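
For reference, a sequence like ðŸ˜ is what UTF-8 bytes tend to look like when they are displayed as Windows-1252, so the characters may be intact in the file and only shown wrongly by Excel. The sketch below shows two options under that assumption, using Python 3 and the standard `csv` module (the code above appears to use a different writer that accepts `encoding=`). The first writes a byte order mark via the `utf-8-sig` codec, which Excel generally uses to recognise UTF-8. The second stores the escape codes themselves. `to_escaped` and `write_messages` are made-up names.

```python
import csv

def to_escaped(text):
    """Turn non-ASCII characters into escape codes, e.g. an emoji into \\U0001f604."""
    # unicode_escape also escapes newlines and backslashes, which is
    # usually acceptable for a single CSV cell.
    return text.encode("unicode_escape").decode("ascii")

def write_messages(path, messages, escape=False):
    """Write one message per row; 'utf-8-sig' puts a BOM at the start of the file."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle, dialect="excel", quoting=csv.QUOTE_ALL)
        for msg in messages:
            writer.writerow([to_escaped(msg) if escape else msg])
```

Calling `write_messages('user-messages.csv', userMessages, escape=True)` would store codes such as `\U0001f604` as plain ASCII text, while `escape=False` keeps the original characters.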

I would really encourage you to replace these Unicode characters with their meaning now, rather than keeping the Unicode codes as strings (which can be done simply by escaping them with the \ character) and converting them later.
Replacing the Unicode characters with their meaning can be done easily using the unicodedata.name() function, like so:
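import unicodedata
def normalize_unicode(text):
    output = []
    for word in text.split(' '):
        # Leave ordinary ASCII words alone, including one-letter words such as "a"
        if all(ord(c) < 128 for c in word):
            output.append(word)
            continue
        try:
            # name() accepts exactly one character, so only symbols that stand alone are converted
            meaning = unicodedata.name(word).lower()
            output.append(meaning)
        except (TypeError, ValueError):
            # TypeError: more than one character; ValueError: the character has no name
            output.append(word)
    return " ".join(output)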
Let's test out this function:
>>> x = "I'm happy \U0001f604"
>>> print(normalize_unicode(x))
I'm happy smiling face with open mouth and smiling eyes
Now, let's see how you can use this function in your code:
with open('user-messages.csv', 'wb') as myfile:
    wr = csv.writer(myfile, dialect='excel', encoding='utf-8', quoting=csv.QUOTE_ALL)
    for _msg in userMessages:
        wr.writerow([ normalize_unicode(_msg) ]) #<-- can be added here
print(normalize_unicode(x))
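
Because normalize_unicode() splits on spaces, an emoticon that is attached to a word (for example "happy😄") is not converted. A possible per-character variant is sketched below, assuming Python 3. It only touches characters in the Unicode category "So" (Symbol, other), which covers most emoji but not every modifier or joiner, so treat it as a starting point. `replace_symbols` is a made-up name.

```python
import unicodedata

def replace_symbols(text):
    """Replace each "So" character that has a Unicode name with that name."""
    pieces = []
    for char in text:
        # "So" (Symbol, other) is the category of most emoji; everything else is kept.
        if unicodedata.category(char) == "So":
            # The second argument is returned when the character has no name.
            name = unicodedata.name(char, "")
            pieces.append((" " + name.lower() + " ") if name else char)
        else:
            pieces.append(char)
    # Collapse the extra spaces introduced around replaced characters
    # (this also turns newlines and tabs into single spaces).
    return " ".join("".join(pieces).split())
```

Accented letters such as é are left unchanged because they are not in the "So" category.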

File to string conversion in python

dictionary = file . read()
I'm currently creating a cipher solver for the 2017 cipher challenge.
I have a word document of fifty-eight thousand words, but I cannot get the file as a string in Python 2.7.9.
I have tried many things I have read online, such as the code above, but to no avail.
I also need this to be easy to understand, as I am new to Python.
Thanks! Don't be negative, be constructive!
The words are from:
http://www.mieliestronk.com/corncob_lowercase.txt

You should probably consult some code examples on the web for reading a file. You need something like:
fp = open(fname, "r")
lines = fp.readlines()
for line in lines:
    print(line)  # replace this with whatever you want to do with each line
fp.close()

All you have to do is:
with open("dictionary.txt") as f: # Open the file and save it as "f"
    dictionary = f.read() # Read the content of the file and save it to "dictionary"
If you want to read it from a website, try this:
import urllib2
dictionary = urllib2.urlopen("http://www.mieliestronk.com/corncob_lowercase.txt").read() # Open the website from the url and save its contents to "dictionary"
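
The snippets above target Python 2, which matches the question. For anyone on Python 3, a rough equivalent is sketched below: `urllib2` no longer exists there and `urllib.request` takes its place. The helper names are made up, and UTF-8 is an assumed encoding for the word list.

```python
from urllib.request import urlopen

def read_text(path, encoding="utf-8"):
    """Return the whole file as one string."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()

def read_words(path):
    """Return the file's words as a list, one entry per whitespace-separated token."""
    return read_text(path).split()

def fetch_text(url, encoding="utf-8"):
    """Download a text file and decode it; urlopen() returns bytes in Python 3."""
    with urlopen(url) as response:
        return response.read().decode(encoding)
```

For a cipher solver, `set(read_words("dictionary.txt"))` may be handy, since checking whether a word is in a set is fast.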

I think you should check this out for what you're trying to do (http://www.pythonforbeginners.com/files/reading-and-writing-files-in-python)
This should be helpful

Read file / match string / replace part of the string with user input / write file

Reading the million resources around the web, I'm getting more confused than helped, as I believe that there are many ways to do what I need to do with py.
So I hope some of you python gurus can lend me a hand.
What I need to do is the following:
Prompt user for input [INPUT]
Open an html file (simple, nothing too big)
Search for <a target="_top" href="http://website">Local website</a>
Replace http://website (which is never the same string) with [INPUT]
Write the file (as the same file opened)
Now, if I understand correctly I should use regex within python, is this correct?
My pseudo code (sorry, I know it looks terrible) would be:
var = raw_input("Enter input: ")
print var, "will be the new site"
import re
o = open("test.html","w")
data = open("test.html").read()
o.write( re.sub("<a target="_top" href="(*)">Local website</a>",var,data) )
o.close()
The above is probably not even the best way to do this, but it works without the regex part, doing a simple match-replace (where the match is always the same).
Any hint from you folks?

Your code looks pretty good. I just changed a little bit. I wasn't super clear on what your question was since your code seems to be functional. Hope it helps:
import re
INFILE = 'test.html'
OUTFILE = 'replaced.html'
new_site_name = raw_input('Enter input: ')
print new_site_name, 'will be the new site.'
# [^>]* and [^"]* keep the match inside one tag, so other links are left alone
pattern = r'<a [^>]*href="[^"]*">Local website</a>'
# Quote the href value so the output stays valid HTML
replacement = '<a target="_top" href="%s">Local website</a>' % new_site_name
with open(INFILE, 'r') as f:
    html_text = f.read()
with open(OUTFILE, 'w') as f:
    f.write(re.sub(pattern, replacement, html_text))
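
A Python 3 sketch of the same approach that writes back to the same file, as the question asks. It reads the whole file before reopening it for writing, since opening with "w" first would empty the file before it is read. It assumes the link text is always "Local website" and that the href is double-quoted. The function names are made up, and for anything beyond a simple page like this an HTML parser would likely be more robust than a regex.

```python
import re

# Assumption: the link text is always "Local website" and href is double-quoted.
LINK_RE = re.compile(r'(<a\b[^>]*\bhref=")[^"]*("[^>]*>Local website</a>)')

def replace_link(html_text, new_url):
    """Swap the href of the "Local website" link for new_url."""
    # A function replacement avoids re.sub treating backslashes in new_url specially.
    return LINK_RE.sub(lambda m: m.group(1) + new_url + m.group(2), html_text)

def update_file(path, new_url):
    """Read the file fully, then reopen it for writing so it is not truncated early."""
    with open(path, "r", encoding="utf-8") as handle:
        html_text = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(replace_link(html_text, new_url))
```

Usage could look like `update_file("test.html", input("Enter input: "))`.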
