Python - Delete Conditional Lines of Chat Log File - python

I am trying to delete my own messages from a chat log file and analyse only the other person's data. When I load the file into Python like this:
with open(chatFile) as f:
    chatLog = f.read().splitlines()
The data is loaded like this (much longer than the example):
'My Name',
'08:39 Chat data....!',
"Other person's name",
'08:39 Chat Data....',
'08:40 Chat data...',
'08:40 Chat data...?',
I would like it to look like this:
"Other person's name",
'08:39 Chat Data....',
'08:40 Chat data...',
'08:40 Chat data...?',
I was thinking of using an if statement with regular expressions:
name = 'My Name'
for x in chatLog:
    if x == name:
        "delete all data below until you reach the other person's name"
I could not get this code to work properly. Any ideas?

I think you misunderstand what "regular expressions" means... It doesn't mean you can just write English-language instructions and the Python interpreter will understand them. Either that or you were using pseudocode, which makes it impossible to debug.
If you don't have the other person's name, we can probably assume it doesn't begin with a number. Assuming all of the non-name lines do begin with a number, as in your example:
name = 'My Name'
skipLines = False
results = []
for x in chatLog:
    if x == name:
        skipLines = True   # my name: skip this line and the messages below it
    elif x and not x[0].isdigit():
        skipLines = False  # any other non-empty line without a leading digit is someone else's name
    if not skipLines:
        results.append(x)

others = []
on = True
for line in chatLog:
    # a non-empty line that does not start with a digit is a name line
    if line and not line[0].isdigit():
        on = line != name
    if on:
        others.append(line)
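The flag-based loop above can be wrapped in a function and tried on an in-memory list. This is only a minimal sketch under the same assumptions (message lines always start with a digit, name lines never do); the function name drop_speaker and the sample data are made up for illustration.

```python
def drop_speaker(lines, name):
    """Return the lines with every `name` header and the messages under it removed."""
    kept = []
    keep = True  # True while we are inside a block that should be kept
    for line in lines:
        # A non-empty line that does not start with a digit is treated as a name line.
        # Blank lines leave the current state unchanged.
        if line and not line[0].isdigit():
            keep = line != name
        if keep:
            kept.append(line)
    return kept


# Hypothetical sample data modelled on the question's example.
chat_log = [
    "My Name",
    "08:39 Chat data....!",
    "Other person's name",
    "08:39 Chat Data....",
    "08:40 Chat data...",
    "My Name",
    "08:41 More of my chat",
    "Other person's name",
    "08:42 Chat data...?",
]

others = drop_speaker(chat_log, "My Name")
for line in others:
    print(line)
```

Returning a new list instead of deleting from chat_log while iterating avoids the skipped-element problems that in-place deletion tends to cause.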

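You can delete all of your messages using re.sub with an empty string as the second argument, which is the replacement string.
Assuming each chat message starts on a new line beginning with a time stamp, that nobody's name begins with a digit, and that each name sits alone on its own line, the pattern '^' + re.escape(yourname) + r'\n(?:\d.*\n)*' used with re.MULTILINE should match all of your messages, and those matches can then be replaced with the empty string. The commas in the question's listing appear to come from printing the Python list rather than from the file itself, so the pattern does not look for them.
import re
with open(chatfile) as f:
    chatlog = f.read()
yourname = 'My Name'
# '^' (with re.MULTILINE) anchors the name at the start of a line;
# the group then consumes every following line that starts with a digit
pattern = '^' + re.escape(yourname) + r'\n(?:\d.*\n)*'
others_messages = re.sub(pattern, '', chatlog, flags=re.MULTILINE)
print(others_messages)
Under those assumptions this removes the messages of any one user, however many users are chatting in the log. A final message that is not followed by a newline is left in place.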
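To try the regular-expression approach without a file, the same idea can be run on an in-memory string. This is a sketch that assumes each name sits alone on its own line with no trailing punctuation; the helper name remove_speaker and the sample text are illustrative only. The inner alternation also lets the last message match when the text does not end with a newline.

```python
import re


def remove_speaker(chat_text, name):
    """Remove every block that starts with `name` on its own line."""
    # ^name\n, then any number of lines that start with a digit.
    # (?:\n|$) accepts either a newline or the end of the text after a message.
    pattern = r'^' + re.escape(name) + r'\n(?:\d.*(?:\n|$))*'
    return re.sub(pattern, '', chat_text, flags=re.MULTILINE)


# Hypothetical sample modelled on the question's example.
sample = (
    "My Name\n"
    "08:39 Chat data....!\n"
    "Other person's name\n"
    "08:39 Chat Data....\n"
    "My Name\n"
    "08:41 My reply"
)

cleaned = remove_speaker(sample, "My Name")
print(cleaned)
```

re.escape matters when a name contains characters such as "." or "(" that would otherwise be read as regex syntax.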

Related

Add text in word based on content

I have a batch of .doc documents. The first line of each document contains a person's name. I would like to add that person's email address to each document, based on a list I have. How can I use Python or VBA to write something that does the job for me?
I tried the VBA code below, which finds the name of the person and then writes the email; I was thinking of looping it over the files. However, even this minimal working example does not actually work. What am I doing wrong?
Sub email()
    Selection.find.ClearFormatting
    Selection.find.Replacement.ClearFormatting
    If Selection.find.Text = "Chiara Gatta" Then
        With Selection.find
            .Text = "E-mail:"
            .Replacement.Text = "E-mail: chiara.gatta#gmail.com"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchByte = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.find.Execute replace:=wdReplaceAll
    End If
End Sub
The question lacks the minimum details and code needed to help. Still, here is some code that picks up person names and email addresses from a table in the document that contains the macro. The table should have 3 columns: the 1st column contains the person's name, the 2nd the email address, and the 3rd is left blank for remarks written by the code.
When you run the code, you are prompted to select the Word files in which the email address will be inserted. For a trial run, work only on copies of the files, and try only a handful at a time if the files are large. The code assumes each file contains the name and the text "E-mail:" (if "E-mail:" is not in the file, modify the code as described in its comments).
Code:
Sub test2()
    Dim Fldg As FileDialog, Fl As Variant
    Dim Thdoc As Document, Edoc As Document
    Dim Tbl As Table, Rw As Long, Fnd As Boolean
    Dim xName As String, xEmail As String
    Set Thdoc = ThisDocument
    Set Tbl = Thdoc.Tables(1)
    Set Fldg = Application.FileDialog(msoFileDialogFilePicker)
    With Fldg
        .Filters.Clear
        .Filters.Add "Word Documents ", "*.doc; *.dot; *.docx; *.docm; *.dotm", 1
        .AllowMultiSelect = True
        .InitialFileName = "C:\users\user\desktop\folder1\*.doc*" 'use your choice of folder
        If .Show <> -1 Then Exit Sub
    End With
    'Search for each Name in Table 1 column 1
    For Rw = 1 To Tbl.Rows.Count
        xName = Tbl.Cell(Rw, 1).Range.Text
        xEmail = Tbl.Cell(Rw, 2).Range.Text
        If Len(xName) > 2 And Len(xEmail) > 2 Then
            xName = Left(xName, Len(xName) - 2) 'Clean special characters in word cell text
            xEmail = Left(xEmail, Len(xEmail) - 2) 'Clean special characters in word cell text
            'open each Document selected & search for names
            For Each Fl In Fldg.SelectedItems
                Set Edoc = Documents.Open(Fl)
                Fnd = False
                With Edoc.Content.Find
                    .ClearFormatting
                    .Text = xName
                    .Replacement.Text = xName & vbCrLf & "E-mail: " & xEmail
                    .Wrap = wdFindContinue
                    .Execute Replace:=wdReplaceNone
                    '.Execute Replace:=wdReplaceOne
                    Fnd = .Found
                End With
                'If the word "E-mail:" is not already in the file, delete the next "If Fnd" branch
                'and use .Execute Replace:=wdReplaceOne instead of .Execute Replace:=wdReplaceNone
                If Fnd Then ' If Name is found then Search for "E-Mail:"
                    Fnd = False
                    With Edoc.Content.Find
                        .ClearFormatting
                        .Text = "E-mail:"
                        .Replacement.Text = "E-mail: " & xEmail
                        .Wrap = wdFindContinue
                        .Execute Replace:=wdReplaceOne
                        Fnd = .Found
                    End With
                End If
                If Fnd Then
                    Edoc.Save
                    Tbl.Cell(Rw, 3).Range.Text = "Found & Replaced in " & Fl
                    Edoc.Close False 'close the saved document before leaving the loop
                    Exit For
                Else
                    Tbl.Cell(Rw, 3).Range.Text = "Not found in any selected document"
                End If
                Edoc.Close False
            Next Fl
        End If
    Next Rw
End Sub
Try to understand each action in the code and modify it to your requirements.

How to make a Python program automatically print what matched after iterating through lists

I have this Python code:
with open('save.data') as fp:
    save_data = dict([line.split(' = ') for line in fp.read().splitlines()])
with open('brute.txt') as fp:
    brute = fp.read().splitlines()
for username, password in save_data.items():
    if username in brute:
        break
else:
    print("didn't find the username")
Here is a quick explanation: save.data is a file that contains the variables of a Batch-file game (such as username, hp, etc.), and brute.txt is a file that contains "random" strings (like those in the wordlists used for brute-forcing).
save.data:
username1 = PlayerName
password1 = PlayerPass
hp = 100
As I said before, it's a Batch-file game, so there is no need to quote strings.
brute.txt:
username
usrnm
username1
password
password1
health
hp
So, let's assume that the Python file is a "game hacker" that "brute" a Batch-file's game save file in hope of finding matches and when it does find, it retrieves them and display them to the user.
## We did all the previous code
...
>>> print(save_data["username1"])
PlayerName
Success, we retrieved the variable! But I only printed "username1" because I already knew it was the match. I want the program itself to print whichever variables matched. For example, if save.data contained "usrnm" instead of "username1", it would still be recognized during the "bruting" process because "usrnm" is in brute.txt. So how can I make the program print what matched, given that I don't know in advance whether it is "username", "username1", etc. (without opening save.data myself)? This should not be limited to the username: it's a game, so there are other variables such as gold/coins and hp. If anything is unclear, please comment and I will clear it up. Thanks for your time!
Use a dict such as this:
with open('brute.txt', 'r') as f:
    # First get all the brute file stuff
    lookup_dic = {word.strip(): None for word in f.readlines()}
with open('save.data', 'r') as f:
    # Update that dict with the stuff from the save.data
    lines = (line.strip().split(' = ') for line in f.readlines())
    for lookup, val in lines:
        if lookup in lookup_dic:
            print(f"{lookup} matched and its value is {val}")
            lookup_dic[lookup] = val
# Now you have a complete lookup table.
print(lookup_dic)
print(lookup_dic['hp'])
Output:
username1 matched and its value is PlayerName
password1 matched and its value is PlayerPass
hp matched and its value is 100
{'username': None, 'usrnm': None, 'username1': 'PlayerName', 'password': None, 'password1': 'PlayerPass', 'health': None, 'hp': '100'}
100
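If only the matched entries are of interest, a set lookup plus a dict comprehension is enough. The sketch below uses in-memory strings in place of save.data and brute.txt so it runs on its own; the variable names are illustrative, and str.partition is used here (instead of split) so that a value that itself contains " = " would not break the parsing.

```python
# Stand-ins for the contents of save.data and brute.txt from the question.
save_text = """username1 = PlayerName
password1 = PlayerPass
hp = 100"""

brute_text = """username
usrnm
username1
password
password1
health
hp"""

# Parse "key = value" lines; partition splits on the first ' = ' only.
save_data = {}
for line in save_text.splitlines():
    key, sep, value = line.partition(' = ')
    if sep:  # skip lines that do not contain ' = '
        save_data[key] = value

# A set makes each membership test cheap compared with scanning a list.
brute = set(brute_text.splitlines())

# Keep only the save-file variables whose names appear in the wordlist.
matches = {key: value for key, value in save_data.items() if key in brute}

for key, value in matches.items():
    print(f"{key} = {value}")
```

Unlike the lookup table above, matches contains no None placeholders, so it can be iterated directly to display everything that was found.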

Read email body text and put each line in some different variable

import imaplib
import re
mail = imaplib.IMAP4_SSL("imap.gmail.com", 993)
mail.login("****iot#gmail.com","*****iot")
while True:
    mail.select("inbox")
    status, response = mail.search(None,'(SUBJECT "Example")')
    unread_msg_nums = response[0].split()
    data = []
    for e_id in unread_msg_nums:
        _, response = mail.fetch(e_id, '(UID BODY[TEXT])')
        data.append(response[0][1].decode("utf-8"))
    str1 = ''.join(map(str,data))
    #a = int(re.search(r"\d+",str1).group())
    print(str1)
    #for e_id in unread_msg_nums:
        #mail.store(e_id, '+FLAGS', '\\Seen')
When I print str1 I get this:
Temperature:time,5
Lux:time,6
Distance:time,3
This is the text from the email message, and it's OK. It is a configuration message telling a Raspberry Pi to do certain things.
For Temperature, Lux and Distance I can set a number from 1 to 10 (minutes) each; the number is the interval at which something will happen in a loop. All of this is set in the email message. How can I put each line in a different variable and check them later?
For example:
string1= first line of message #Temperature:time,5
string2= second line of message #Lux:time,6
string3= third line of message #Distance:time,3
The order is not fixed: the first line may be Lux, or Distance, etc.
A job for regular expressions, really (this approach uses a dict comprehension):
import re
string = """
Temperature:time,5
Lux:time,6
Distance:time,3
"""
rx = re.compile(r'''^(?P<key>\w+):\s*(?P<value>.+)$''', re.MULTILINE)
cmds = {m.group('key'): m.group('value') for m in rx.finditer(string)}
print(cmds)
# {'Temperature': 'time,5', 'Lux': 'time,6', 'Distance': 'time,3'}  (insertion order on Python 3.7+)
The order in which your commands occur does not matter, but the keys need to be unique (otherwise a value is overwritten by the next match with the same key). Afterwards, you can get your values with e.g. cmds['Lux'].
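Since the values are meant to be checked later as minutes, it may help to convert them to integers while parsing. The following is a sketch that assumes every line has exactly the form Name:time,N as in the question; the names message and intervals are illustrative. The trailing \s* is there because email bodies normally use CRLF line endings, which would otherwise leave a stray "\r" at the end of each captured value.

```python
import re

# Stand-in for the decoded email body (str1 in the question).
message = """
Temperature:time,5
Lux:time,6
Distance:time,3
"""

# One named group for the sensor name, one for the number of minutes.
rx = re.compile(r'^(?P<key>\w+):time,(?P<minutes>\d+)\s*$', re.MULTILINE)

# Map each sensor name to its interval as an int, whatever order the lines arrive in.
intervals = {m.group('key'): int(m.group('minutes')) for m in rx.finditer(message)}
print(intervals)

# Values can now be compared numerically; .get() avoids a KeyError for a missing sensor.
if intervals.get('Lux', 0) > 5:
    print("Lux interval is longer than 5 minutes")
```

Lines that do not fit the assumed format are silently skipped, so a check such as "'Lux' in intervals" is worth doing before relying on a particular key.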

Searching and sorting in text files

I am fairly new to coding and I have a problem reading a text file.
My program needs to ask the user to type in a specific name code before it proceeds. However, there are several valid name codes, and I don't know how to let the user proceed when any one of them is typed in.
For example, the text file looks like this:
john123,x,x,x
susan233,x,x,x
conor,x,x,x
What I need to do is accept the name tag, whichever one it is, and be able to print it afterwards. All the name tags are in the first column.
file = open("paintingjobs.txt","r")
details = file.readlines()
for line in details:
    estimatenum = input ("Please enter the estimate number.")
    if estimatenum = line.split
This is my code so far, but I do not know how to check whether the name tag is valid so that the user can proceed.
Here is another solution, without pickle. I'm assuming that your credentials are stored one per line. If not, you need to tell me how they are separated.
name = 'John'
code = '1234'
with open('file.txt', 'r') as file:
    # keep the lines that mention the name, with the name itself removed
    possible_match = [line.replace(name, '') for line in file if name in line]
authenticated = False
for item in possible_match:
    if code in item: # Or, e.g. int(code) == int(item)
        authenticated = True
        break
You can use a module called pickle. It is part of the Python standard library; Python 2 also ships a faster C implementation called cPickle with the same interface.
Be warned that the way you're doing this is not a secure approach!
from pickle import dump
credentials = {
    'John': 1234,
    'James': 4321,
    'Julie': 6789
}
with open("credentials.p", "wb") as f:
    dump(credentials, f)
This saves a file entitled credentials.p. You can then load this as follows:
from pickle import load
with open("credentials.p", "rb") as f:
    credentials = load(f)
print(credentials)
which displays: {'John': 1234, 'James': 4321, 'Julie': 6789}
Here are a couple of tests:
test_name = 'John'
test_code = 1234
print('Test:', credentials[test_name] == test_code)
Displays: Test: True
test_code = 2343
print('Test:', credentials[test_name] == test_code)
Displays: Test: False
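For the comma-separated layout shown in the question (name tag in the first column), the standard csv module can collect the valid tags into a set, after which the check is a single membership test. This is a sketch only: io.StringIO stands in for open("paintingjobs.txt"), and the helper names load_codes and is_valid are made up for illustration.

```python
import csv
import io

# Stand-in for open("paintingjobs.txt"); the rows mirror the question's layout.
sample_file = io.StringIO("john123,x,x,x\nsusan233,x,x,x\nconor,x,x,x\n")


def load_codes(fileobj):
    """Collect the first column of every non-empty row into a set."""
    return {row[0].strip() for row in csv.reader(fileobj) if row}


def is_valid(code, codes):
    """True if the typed code (ignoring surrounding spaces) is a known name tag."""
    return code.strip() in codes


codes = load_codes(sample_file)
print(is_valid("susan233", codes))
print(is_valid("nobody", codes))
```

In an interactive program the value returned by input() would be passed to is_valid, and the prompt repeated until it returns True.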

Python: How to loop through blocks of lines and copy specific text within lines

Input file:
DATE: 07/01/15 # 0800 HYRULE HOSPITAL PAGE 1
USER: LINK Antibiotic Resistance Report
--------------------------------------------------------------------------------------------
Activity Date Range: 01/01/15 - 02/01/15
--------------------------------------------------------------------------------------------
HH0000000001 LINK,DARK 30/M <DIS IN 01/05> (UJ00000001) A001-01 0A ZELDA,PRINCESS MD
15:M0000001R COMP, Coll: 01/02/15-0800 Recd: 01/02/15-0850 (R#00000001) ZELDA,PRINCESS MD
Source: SPUTUM
PSEUDOMONAS FLUORESCENS LEVOFLOXACIN >=8 R
--------------------------------------------------------------------------------------------
HH0000000002 FAIRY,GREAT 25/F <DIS IN 01/06> (UJ00000002) A002-01 0A ZELDA,PRINCESS MD
15:M0000002R COMP, Coll: 01/03/15-2025 Recd: 01/03/15-2035 (R#00000002) ZELDA,PRINCESS MD
Source: URINE- STRAIGHT CATH
PROTEUS MIRABILIS CEFTRIAXONE-other R
--------------------------------------------------------------------------------------------
HH0000000003 MAN,OLD 85/M <DIS IN 01/07> (UJ00000003) A003-01 0A ZELDA,PRINCESS MD
15:M0000003R COMP, Coll: 01/04/15-1800 Recd: 01/04/15-1800 (R#00000003) ZELDA,PRINCESS MD
Source: URINE-CLEAN VOIDED SPEC
ESCHERICHIA COLI LEVOFLOXACIN >=8 R
--------------------------------------------------------------------------------------------
Completely new to programming/scripting and Python. How do you recommend looping through this sample input to grab specific text in the fields?
Each patient has a unique identifier (e.g. HH0000000001). I want to grab specific text from each line.
Output should look like:
Date|Time|Name|Account|Specimen|Source|Antibiotic
01/02/15|0800|LINK, DARK|HH0000000001|PSEUDOMONAS FLUORESCENS|SPUTUM|LEVOFLOXACIN
01/03/15|2025|FAIRY, GREAT|HH0000000002|PROTEUS MIRABILIS|URINE- STRAIGHT CATH|CEFTRIAXONE-other
Edit: My current code looks like this:
(Disclaimer: I am fumbling around in the dark, so the code is not going to be pretty at all.)
input = open('report.txt')
output = open('abx.txt', 'w')
date = '' # Defining global variables outside of the loop
time = ''
name = ''
name_last = ''
name_first = ''
account = ''
specimen = ''
source = ''
output.write('Date|Time|Name|Account|Specimen|Source\n')
lines = input.readlines()
for index, line in enumerate(lines):
    print index, line
    if last_line_location:
        new_patient = True
        if not first_time_through:
            output.write("{}|{}|{}, {}|{}|{}|{}\n".format(
                'Date', # temporary placeholder
                'Time', # temporary placeholder
                name_last.capitalize(),
                name_first.capitalize(),
                account,
                'Specimen', # temporary placeholder
                'Source' # temporary placeholder
            ) )
        last_line_location = False
        first_time_through = False
    for each in lines:
        if line.startswith('HH'): # Extract account and name
            account = line.split()[0]
            name = line.split()[1]
            name_last = name.split(',')[0]
            name_first = name.split(',')[1]
            last_line_location = True
input.close()
output.close()
Currently, the output will skip the first patient and will only display information for the 2nd and 3rd patient. Output looks like this:
Date|Time|Name|Account|Specimen|Source
Date|Time|Fairy, Great|HH0000000002|Specimen|Source
Date|Time|Man, Old|HH0000000003|Specimen|Source
Please feel free to make suggestions on how to improve any aspect of this, including output style or overall strategy.
Your code actually works if you add...
last_line_location = True
first_time_through = True
...before your for loop.
You asked for pointers as well though...
As has been suggested in the comments, you could look at the re module.
I've knocked something together that shows this. It may not be suitable for all data because three records is a very small sample, and I've made some assumptions.
The last item is also quite contrived because there's nothing definite to search for (such as Coll, Source). It will fail if there are no spaces at the start of the final line, for example.
This code is merely a suggestion of another way of doing things:
import re
startflag = False
with open('report.txt','r') as infile:
    with open('abx.txt','w') as outfile:
        outfile.write('Date|Time|Name|Account|Specimen|Source|Antibiotic\n')
        for line in infile:
            # a line of dashes ends the current record (once past the report header)
            if '---------------' in line:
                if startflag:
                    outfile.write('|'.join((date, time, name, account, spec, source, anti))+'\n')
                else:
                    startflag = True
                continue
            if 'Activity' in line:
                startflag = False
            acc_name = re.findall(r'HH\d+ \w+,\w+', line)
            if acc_name:
                account, name = acc_name[0].split(' ')
            date_time = re.findall(r'(?<=Coll: ).+(?= Recd:)', line)
            if date_time:
                date, time = date_time[0].split('-')
            source_re = re.findall(r'(?<=Source: ).+',line)
            if source_re:
                source = source_re[0].strip()
            # specimen/antibiotic line: relies on the leading spaces present in the real report
            anti_spec = re.findall(r'^ +(?!Source)\w+ *\w+ + \S+', line)
            if anti_spec:
                stripped_list = anti_spec[0].strip().split()
                anti = stripped_list[-1]
                spec = ' '.join(stripped_list[:-1])
Output
Date|Time|Name|Account|Specimen|Source|Antibiotic
01/02/15|0800|LINK,DARK|HH0000000001|PSEUDOMONAS FLUORESCENS|SPUTUM|LEVOFLOXACIN
01/03/15|2025|FAIRY,GREAT|HH0000000002|PROTEUS MIRABILIS|URINE- STRAIGHT CATH|CEFTRIAXONE-other
01/04/15|1800|MAN,OLD|HH0000000003|ESCHERICHIA COLI|URINE-CLEAN VOIDED SPEC|LEVOFLOXACIN
Edit:
Obviously, the variables should be reset to some dummy value between writes in case of a corrupt record. Also, if there is no line of dashes after the last record, that record won't get written as the code stands.
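One way to address both caveats is to split the report into blocks first and parse each block on its own, so nothing carries over between records and a missing final separator does not lose the last one. The sketch below is not the code above rewritten: it assumes that in the real report the specimen and antibiotic columns on the last line of each record are separated by two or more spaces, and the helper names split_records and parse_record, as well as the shortened sample text, are made up for illustration.

```python
import re

SEPARATOR = re.compile(r'^-{10,}\s*$')  # a line consisting only of dashes


def split_records(lines):
    """Yield the non-empty groups of lines found between separator rows."""
    block = []
    for line in lines:
        if SEPARATOR.match(line):
            if block:
                yield block
            block = []
        else:
            block.append(line.rstrip('\n'))
    if block:  # last record, even without a trailing separator
        yield block


def parse_record(block):
    """Return one output row as a tuple, or None if the block is not a patient record."""
    text = '\n'.join(block)
    patient = re.search(r'^(HH\d+)\s+(\w+),(\w+)', text, re.MULTILINE)
    coll = re.search(r'Coll:\s*(\S+?)-(\d{4})', text)
    source = re.search(r'Source:\s*(.+)', text)
    # Assumption: columns on the last line are separated by 2+ spaces.
    cols = re.split(r'\s{2,}', block[-1].strip())
    if not (patient and coll and source) or len(cols) < 2:
        return None
    account, last, first = patient.groups()
    date, time = coll.groups()
    return (date, time, f'{last}, {first}', account,
            cols[0], source.group(1).strip(), cols[1])


# Shortened, hypothetical sample with the column spacing assumed above.
report = """\
DATE: 07/01/15 # 0800 HYRULE HOSPITAL PAGE 1
--------------------------------------------------
Activity Date Range: 01/01/15 - 02/01/15
--------------------------------------------------
HH0000000001 LINK,DARK 30/M <DIS IN 01/05> (UJ00000001) A001-01 0A ZELDA,PRINCESS MD
15:M0000001R COMP, Coll: 01/02/15-0800 Recd: 01/02/15-0850 (R#00000001) ZELDA,PRINCESS MD
   Source: SPUTUM
   PSEUDOMONAS FLUORESCENS    LEVOFLOXACIN    >=8  R
--------------------------------------------------
HH0000000002 FAIRY,GREAT 25/F <DIS IN 01/06> (UJ00000002) A002-01 0A ZELDA,PRINCESS MD
15:M0000002R COMP, Coll: 01/03/15-2025 Recd: 01/03/15-2035 (R#00000002) ZELDA,PRINCESS MD
   Source: URINE- STRAIGHT CATH
   PROTEUS MIRABILIS    CEFTRIAXONE-other    R
"""

rows = [row for row in map(parse_record, split_records(report.splitlines())) if row]
print('Date|Time|Name|Account|Specimen|Source|Antibiotic')
for row in rows:
    print('|'.join(row))
```

Header blocks such as the page banner and the "Activity Date Range" line simply fail the patient match and are dropped, so no start flag is needed.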
