Regex from txt file in Python - python

I'm building a task manager and I need to find the tasks in a txt file that are assigned to the logged-in user. I read the output from my txt file into a variable counts, and then I use regex to find the text I need. My problem comes in when I try to match the output with re.findall().
this is the text file I read from
admin, Register Users with taskManager.py, Use taskManager.py to add the usernames and passwords for all team members that will be using this program., 10 Oct 2019, 20 Oct 2019, No
admin, Assign initial tasks, Use taskManager.py to assign each team member with appropriate tasks, 10 Oct 2019, 25 Oct 2019, No
chris, build task manager, finish manager, 29 June 2022, 22 May 2025, No
tony, build new suit, builing a new iron man suit, 5 July 2022, 25 October 2023, no
this is my code
elif menu == "vm":
    #find_tasks = input("Please enter your username: ")
    with open(r"C:\Users\27711\Desktop\PROGRAMMING\Bootcamp lvl 1\task 20 Capstone\tasks.txt", 'r') as file3:
        for line in file3:
            iteration = line.split(", ")
            counts = f'''\nAssigned to:\t {iteration[0]}
\nTask:\t {iteration[1]}
\nDate assigned:\t {iteration[3]}
\nDue Date:\t {iteration[4]}
\nTask Complete:\t {iteration[5]}
\nTask Description: \n{iteration[2]}\n'''
            found = re.findall('Task:\s*(.*)\s*Assigned to:.\s*(admin)\s*Date assigned:\s*(.*)\s*Due Date:\s*(.*)\s*Task Complete:\s*(.*)\s*Task Description:\s*(.*)', counts)
            for item in found:
                print(f'''\nTask:\t {item[1]}
\nAssigned to:\t {item[0]}
\nDate assigned:\t {item[2]}
\nDue Date:\t {item[0]}
\nTask Complete:\t {item[0]}
\nTask Description: \n{item[0]}\n''')
So my code turned the txt file contents into this:
Task: Register Users with taskManager.py
Assigned to: admin
Date assigned: 10 Oct 2019
Due Date: 20 Oct 2019
Task Complete: No
Task Description:
Use taskManager.py to add the usernames and passwords for all team members that will be using this program.
Why won't my regex work on this? I need to print only the tasks that are assigned to the user that's logged in.
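One likely reason nothing matches: counts writes "Assigned to:" before "Task:", but the pattern looks for "Task:" first, so re.findall comes back empty. A regex-free sketch of the same goal is to compare the first CSV field against the logged-in username before formatting at all (the sample data and the helper name below are illustrative, not from the original program):

```python
# Sketch: filter tasks by the first CSV field instead of regex-matching
# the already-formatted string. tasks_for_user is a hypothetical helper.

def tasks_for_user(lines, username):
    """Return formatted task reports for tasks assigned to `username`."""
    reports = []
    for line in lines:
        fields = line.strip().split(", ")
        if fields[0] != username:
            continue  # skip tasks assigned to other users
        reports.append(
            f"Assigned to:\t{fields[0]}\n"
            f"Task:\t{fields[1]}\n"
            f"Date assigned:\t{fields[3]}\n"
            f"Due Date:\t{fields[4]}\n"
            f"Task Complete:\t{fields[5]}\n"
            f"Task Description:\n{fields[2]}\n"
        )
    return reports

# Inlined sample lines in place of reading tasks.txt:
sample = [
    "admin, Assign initial tasks, Use taskManager.py to assign tasks, 10 Oct 2019, 25 Oct 2019, No",
    "chris, build task manager, finish manager, 29 June 2022, 22 May 2025, No",
]
for report in tasks_for_user(sample, "admin"):
    print(report)
```

In the real program the lines would come from iterating over file3, and "admin" would be the logged-in username.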

Related

How to Download Files based on a pattern in Python when using FTP_TLS [duplicate]

This question already has answers here:
Download files from an FTP server containing given string using Python
(3 answers)
Closed 2 years ago.
I'm trying to use FTP and get all the files matching a specific file pattern in Python.
I was able to connect over FTP and download a single file from the FTP server as below; can anyone suggest how I download all files on that FTP server that match the pattern 'sample_'?
ftps.login(user='abc@xyz.com', passwd = 'PassWord')
'230 User logged in, proceed.'
>>> ftps.retrlines('LIST') # List Directories
drwx------ 1 owner group 0 Apr 29 11:27 Get Started
drwx------ 1 owner group 0 Jul 7 02:29 Folder_1
'226 Closing data connection.'
>>> ftps.cwd('Folder_1') # Change Directory
'250 Directory changed to /Folder_1'
>>> ftps.retrlines('LIST') # List items in the directory
drwx------ 1 owner group 0 Jul 7 02:29 .
drwx------ 1 owner group 0 Jan 30 1970 ..
-rw------- 1 owner group 491 Jul 2 14:04 smaple_test1.csv
-rw------- 1 owner group 365 Jul 7 02:22 smaple_test2.csv
-rw------- 1 owner group 9948 Jun 30 14:34 smaple_test3.csv
-rw------- 1 owner group 9948 Jun 30 14:34 note1.csv
-rw------- 1 owner group 9948 Jun 30 14:34 note2.csv
'226 Closing data connection.'
>>> ftps.retrbinary('RETR smaple_test1.csv', gFile.write) # Download a single file
'226 Transfer complete.'
ftps = FTP_TLS(host)
ftps.login(user = "ftp.url.com", passwd = "password")
ftps.cwd('sdir')  # Change directory on the FTP server
files = ftps.nlst(sfrmt)  # sfrmt holds the file pattern to list
for f in files:
    with open(f, 'wb') as fh:
        ftps.retrbinary('RETR ' + f, fh.write)
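Passing a pattern to NLST is not supported by every FTP server, so a safer sketch is to fetch the full listing and filter it client-side with fnmatch (the file names below are illustrative; match_files is a hypothetical helper):

```python
# Client-side glob filtering of an FTP listing (sketch).
from fnmatch import fnmatch

def match_files(names, pattern):
    """Return only the names that match a glob-style pattern."""
    return [name for name in names if fnmatch(name, pattern)]

# In real use, names would come from ftps.nlst() with no argument.
names = ["smaple_test1.csv", "note1.csv", "sample_a.csv"]
print(match_files(names, "sample_*"))  # the misspelled names drop out
```

Each matched name would then be fetched with ftps.retrbinary('RETR ' + f, fh.write) as in the snippet above.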

There is redundant "last login info" in the stdout of fabric.operations.sudo

When I run fabric.operations.sudo to get info from a remote VM (its kernel is 4.14.35 EL7.6), such as "date +%s", the expected result should be "1549853543", but in my test it's "Last login: Mon Feb 11 02:53:18 UTC 2019 on pts/0\r\n1549853543".
I have run the command "ssh user@vm 'date +%s'" and the result is normal (only the number).
Does anyone know the reason? I have also set "PrintLastLog" to "no" in /etc/ssh/sshd_config.
result = sudo('date +%s').stdout.strip()
run_time = int(result)  # => exception occurs
Expected: 1549853543
Actual: invalid literal for int() with base 10: 'Last login: Mon Feb 11 02:53:18 UTC 2019 on pts/0\r\n1549853543'
Fix these 2 places and the last login info disappears:
1: /etc/pam.d/system-auth:
session required pam_lastlog.so silent showfailed
2: /etc/ssh/sshd_config:
# Per CCE-80225-6: Set PrintLastLog yes in /etc/ssh/sshd_config
PrintLastLog no
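As a defensive fallback when the server config cannot be changed, the banner can also be stripped from the captured output before converting it (a sketch; strip_login_banner is a hypothetical helper, the raw string is the output from the question):

```python
# Sketch: drop the "Last login:" banner that sshd/PAM may prepend
# to command output captured over a pty.
def strip_login_banner(output):
    """Remove 'Last login:' lines from captured remote output."""
    lines = output.replace("\r\n", "\n").split("\n")
    kept = [line for line in lines if not line.startswith("Last login:")]
    return "\n".join(kept).strip()

raw = "Last login: Mon Feb 11 02:53:18 UTC 2019 on pts/0\r\n1549853543"
print(int(strip_login_banner(raw)))
```

In the fabric code this would wrap sudo('date +%s').stdout before the int() call.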

Get the latest FTP folder name in Python

I am trying to write a script to get the latest file from the latest sub-directory of an FTP server in Python. My problem is that I am unable to figure out which sub-directory is the latest. There are two options available: the sub-directories have a ctime, and the date a directory was created is also mentioned in its name. But I do not know how to get the name of the latest directory. I have done it the following way, which only works if the first entry happens to be the latest directory (hoping for the server side to be sorted by latest ctime).
import ftplib
import os
import time
ftp = ftplib.FTP('test.rebex.net','demo', 'password')
ftp.cwd(str((ftp.nlst())[0])) #if directory is sorted in descending order by date.
But is there any way to find the exact directory by ctime or by the date in the directory name?
Thanks a lot guys.
If your FTP server supports MLSD command, a solution is easy:
If you want to base the decision on a modification timestamp:
entries = list(ftp.mlsd())
# Only interested in directories
entries = [entry for entry in entries if entry[1]["type"] == "dir"]
# Sort by timestamp
entries.sort(key = lambda entry: entry[1]['modify'], reverse = True)
# Pick the first one
latest_name = entries[0][0]
print(latest_name)
If you want to use a file name:
# Sort by filename
entries.sort(key = lambda entry: entry[0], reverse = True)
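The MLSD-based sorting above can be demonstrated without a live server by using mock entries in the same (name, facts) shape that ftplib.FTP.mlsd() yields (the names and timestamps below are made up):

```python
# Sketch: the MLSD sort logic on mock (name, facts) entries.
entries = [
    ("old_dir", {"type": "dir", "modify": "20180326000000"}),
    ("new_dir", {"type": "dir", "modify": "20180618112100"}),
    ("file.zip", {"type": "file", "modify": "20180618153100"}),
]
# Only interested in directories
dirs = [entry for entry in entries if entry[1]["type"] == "dir"]
# Sort by the 'modify' fact, newest first
dirs.sort(key=lambda entry: entry[1]["modify"], reverse=True)
print(dirs[0][0])
```

The 'modify' fact is a YYYYMMDDHHMMSS string, so plain string comparison sorts it chronologically.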
If you need to rely on an obsolete LIST command, you have to parse a proprietary listing it returns.
A common *nix listing is like:
drw-r--r-- 1 user group 4096 Mar 26 2018 folder1-20180326
drw-r--r-- 1 user group 4096 Jun 18 11:21 folder2-20180618
-rw-r--r-- 1 user group 4467 Mar 27 2018 file-20180327.zip
-rw-r--r-- 1 user group 124529 Jun 18 15:31 file-20180618.zip
With a listing like this, this code will do:
If you want to base the decision on a modification timestamp:
from dateutil import parser

lines = []
ftp.dir("", lines.append)

latest_time = None
latest_name = None
for line in lines:
    tokens = line.split(maxsplit = 9)
    # Only interested in directories
    if tokens[0][0] == "d":
        time_str = tokens[5] + " " + tokens[6] + " " + tokens[7]
        time = parser.parse(time_str)
        if (latest_time is None) or (time > latest_time):
            latest_name = tokens[8]
            latest_time = time

print(latest_name)
If you want to use a file name:
lines = []
ftp.dir("", lines.append)

latest_name = None
for line in lines:
    tokens = line.split(maxsplit = 9)
    # Only interested in directories
    if tokens[0][0] == "d":
        name = tokens[8]
        if (latest_name is None) or (name > latest_name):
            latest_name = name

print(latest_name)
Some FTP servers may return . and .. entries in LIST results. You may need to filter those.
Partially based on: Python FTP get the most recent file by date.
If the folder does not contain any files, only subfolders, there are other easier options.
If you want to base the decision on a modification timestamp and the server supports non-standard -t switch, you can use:
lines = ftp.nlst("-t")
latest_name = lines[-1]
See How to get files in FTP folder sorted by modification time
If you want to use a file name:
lines = ftp.nlst()
latest_name = max(lines)
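Since the asker also mentioned that the creation date appears in the directory name itself, another sketch is to parse a trailing YYYYMMDD from each name and pick the newest (the 'folderN-YYYYMMDD' naming is an assumption based on the listing shown above; latest_by_name_date is a hypothetical helper):

```python
# Sketch: choose the latest directory by the date embedded in its name.
import re

def latest_by_name_date(names):
    """Return the name with the most recent trailing YYYYMMDD date."""
    dated = []
    for name in names:
        m = re.search(r"(\d{8})$", name)
        if m:
            dated.append((m.group(1), name))
    # Tuples sort by the date string first, which is chronological
    return max(dated)[1] if dated else None

print(latest_by_name_date(["folder1-20180326", "folder2-20180618"]))
```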

Splitting a list of twitter data

I have a file full of hundreds of un-separated tweets all formatted like so:
{"text": "Just posted a photo @ Navarre Conference Center", "created_at": "Sun Nov 13 01:52:03 +0000 2016", "coordinates": [-86.8586, 30.40299]}
I am trying to split them up so I can assign each part to a variable.
The text
The timestamp
The location coordinates
I was able to split the tweets up using .split('{}') but I don't really know how to split the rest into the three things that I want.
My basic idea that didn't work:
file = open('tweets_with_time.json', 'r')
line = file.readline()
for line in file:
    line = line.split(',')
    message = line[0]
    timestamp = line[1]
    position = line[2]
    # just to test if it's working
    print(position)
Thanks!
I just downloaded your file, it's not as bad as you said. Each tweet is on a separate line. It would be nicer if the file were a JSON list, but we can still parse it fairly easily, line by line. Here's an example that extracts the first 10 tweets.
import json

fname = 'tweets_with_time.json'
with open(fname) as f:
    for i, line in enumerate(f, 1):
        # Convert this JSON line into a Python dict
        data = json.loads(line)
        # Extract the data
        message = data['text']
        timestamp = data['created_at']
        position = data['coordinates']
        # Print it
        print(i)
        print('Message:', message)
        print('Timestamp:', timestamp)
        print('Position:', position)
        print()
        # Only print the first 10 tweets
        if i == 10:
            break
Unfortunately, I can't show the output of this script: Stack Exchange won't allow me to put those shortened URLs into a post.
Here's a modified version that cuts off each message at the URL.
import json

fname = 'tweets_with_time.json'
with open(fname) as f:
    for i, line in enumerate(f, 1):
        # Convert this JSON line to a Python dict
        data = json.loads(line)
        # Extract the data
        message = data['text']
        timestamp = data['created_at']
        position = data['coordinates']
        # Remove the URL from the message
        idx = message.find('https://')
        if idx != -1:
            message = message[:idx]
        # Print it
        print(i)
        print('Message:', message)
        print('Timestamp:', timestamp)
        print('Position:', position)
        print()
        # Only print the first 10 tweets
        if i == 10:
            break
output
1
Message: Just posted a photo @ Navarre Conference Center
Timestamp: Sun Nov 13 01:52:03 +0000 2016
Position: [-86.8586, 30.40299]
2
Message: I don't usually drink #coffee, but I do love a good #Vietnamese drip coffee with condense milk…
Timestamp: Sun Nov 13 01:52:04 +0000 2016
Position: [-123.04437109, 49.26211779]
3
Message: #bestcurry [emoji] #johanvanaarde #kauai #rugby #surfing…
Timestamp: Sun Nov 13 01:52:04 +0000 2016
Position: [-159.4958861, 22.20321232]
4
Message: #thatonePerezwedding [emoji] @ Scenic Springs
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-98.68685568, 29.62182898]
5
Message: Miami trends now: Heat, Wade, VeteransDay, OneLetterOffBands and TheyMightBeACatfishIf.
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-80.19240081, 25.78111669]
6
Message: Thank you family for supporting my efforts. I love you all!…
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-117.83012, 33.65558157]
7
Message: If you're looking for work in #HONOLULU, HI, check out this #job:
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-157.7973653, 21.2868901]
8
Message: Drinking a L'Brett d'Apricot by @CrookedStave @ FOBAB
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-87.6455, 41.8671]
9
Message: Can you recommend anyone for this #job? Barista (US) -
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-121.9766823, 38.350109]
10
Message: He makes me happy @ Frank and Bank
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-75.69360487, 45.41268776]
If the whole file were a single well-formed JSON document (for example a JSON list of tweets), you could parse it in one call. Note that json.load will fail on the one-object-per-line file described above, though; in that case parse each line separately as in the other answer. Try the following:
import json
from pprint import pprint
file_ptr = open('tweets_with_time.json' , 'r')
data = json.load(file_ptr)
pprint(data)
It should parse your data into a nice Python dictionary. You can access the elements by their names like:
# Return the first 'coordinates' data point as a list of floats
data[0]["coordinates"]
# Return the 5th 'text' data point as a string
data[4]["text"]

Parsing email headers with regular expressions in python

I'm a python beginner trying to extract data from email headers. I have thousands of email messages in a single text file, and from each message I want to extract the sender's address, recipient(s) address, and the date, and write them to a single, semicolon-delimited line in a new file.
this is ugly, but it's what I've come up with:
import re

emails = open("demo_text.txt", "r")   # opens the file to analyze
results = open("results.txt", "w")    # creates new file for search results
resultsList = []
for line in emails:
    if "From - " in line:  # recognizes the beginning of an email message and adds a linebreak
        newMessage = re.findall(r'\w\w\w\s\w\w\w.*', line)
        if newMessage:
            resultsList.append("\n")
    if "From: " in line:
        address = re.findall(r'[\w.-]+@[\w.-]+', line)
        if address:
            resultsList.append(address)
            resultsList.append(";")
    if "To: " in line:
        if "Delivered-To:" not in line:  # avoids confusion with the 'Delivered-To:' tag
            address = re.findall(r'[\w.-]+@[\w.-]+', line)
            if address:
                for person in address:
                    resultsList.append(person)
                    resultsList.append(";")
    if "Date: " in line:
        date = re.findall(r'\w\w\w\,.*', line)
        resultsList.append(date)
        resultsList.append(";")
for result in resultsList:
    results.writelines(result)
emails.close()
results.close()
and here's my 'demo_text.txt':
From - Sun Jan 06 19:08:49 2013
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Delivered-To: somebody_1@hotmail.com
Received: by 10.48.48.3 with SMTP id v3cs417003nfv;
Mon, 15 Jan 2007 10:14:19 -0800 (PST)
Received: by 10.65.211.13 with SMTP id n13mr5741660qbq.1168884841872;
Mon, 15 Jan 2007 10:14:01 -0800 (PST)
Return-Path: <nobody@hotmail.com>
Received: from bay0-omc3-s21.bay0.hotmail.com (bay0-omc3-s21.bay0.hotmail.com [65.54.246.221])
by mx.google.com with ESMTP id e13si6347910qbe.2007.01.15.10.13.58;
Mon, 15 Jan 2007 10:14:01 -0800 (PST)
Received-SPF: pass (google.com: domain of nobody@hotmail.com designates 65.54.246.221 as permitted sender)
Received: from hotmail.com ([65.54.250.22]) by bay0-omc3-s21.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.2668);
Mon, 15 Jan 2007 10:13:48 -0800
Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC;
Mon, 15 Jan 2007 10:13:47 -0800
Message-ID: <BAY115-F12E4E575FF2272CF577605A1B50#phx.gbl>
Received: from 65.54.250.200 by by115fd.bay115.hotmail.msn.com with HTTP;
Mon, 15 Jan 2007 18:13:43 GMT
X-Originating-IP: [200.122.47.165]
X-Originating-Email: [nobody@hotmail.com]
X-Sender: nobody@hotmail.com
From: =?iso-8859-1?B?UGF1bGEgTWFy7WEgTGlkaWEgRmxvcmVuemE=?=
<nobody@hotmail.com>
To: somebody_1@hotmail.com, somebody_2@gmail.com, 3_nobodies@yahoo.com.ar
Bcc:
Subject: fotos
Date: Mon, 15 Jan 2007 18:13:43 +0000
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_d98_1c4f_3aa9"
X-OriginalArrivalTime: 15 Jan 2007 18:13:47.0572 (UTC) FILETIME=[E68D4740:01C738D0]
Return-Path: nobody@hotmail.com
The output is:
somebody_1@hotmail.com;somebody_2@gmail.com;3_nobodies@yahoo.com.ar;Mon, 15 Jan 2007 18:13:43 +0000;
This output would be fine except there's a line break in the 'From:' field in my demo_text.txt (line 24), and so I miss 'nobody@hotmail.com'.
I'm not sure how to tell my code to skip the line break and still find the email address in the From: tag.
More generally, I'm sure there are many more sensible ways to go about this task. If anyone could point me in the right direction, I'd sure appreciate it.
Your demo text is practically the mbox format, which can be perfectly processed with the appropriate object in the mailbox module:
from mailbox import mbox
import re
PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+@[0-9A-Za-z._-]+")
mymbox = mbox("demo.txt")
for email in mymbox.values():
    from_address = PAT_EMAIL.findall(email["from"])
    to_address = PAT_EMAIL.findall(email["to"])
    date = [email["date"]]
    print(";".join(from_address + to_address + date))
In order to skip newlines, you can't read the file line by line. You can try loading in the whole file and using your keywords (From, To, etc.) as boundaries, so that when you search for 'From -', the rest of your keywords delimit the portion you extract and aren't included in it.
Also, mentioning this because you said you're a beginner:
the "Pythonic" way of naming your non-class variables is with underscores, so resultsList should be results_list.
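On the folded "From:" header specifically, the stdlib email parser unfolds multi-line headers automatically, which is exactly why the mailbox-based answer above doesn't miss the address. A minimal sketch (the raw message below is a trimmed, illustrative version of the demo text):

```python
# Sketch: stdlib email parsing handles header folding (a header
# continued on the next line with leading whitespace).
from email import policy
from email.parser import Parser

raw = (
    "From: Paula Maria\n"
    " <nobody@hotmail.com>\n"          # folded continuation of From:
    "To: somebody_1@hotmail.com\n"
    "Date: Mon, 15 Jan 2007 18:13:43 +0000\n"
    "\n"
    "body\n"
)
msg = Parser(policy=policy.default).parsestr(raw)
print(msg["From"])  # both physical lines come back as one header value
```

With policy.default the continuation line is joined onto the header value, so a simple findall on msg["From"] catches the address.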
