How to use grep on files in python

I am trying to write a script that shows the users in a group. The problem is that I don't know how to properly use grep in Python. My code goes something like this:
with open("/etc/group", "r") as f2:
    for line in f2:
        grouplist = line.split(":")
        print grouplist[0]
group_choose = raw_input("Choose a group > ")
glist = "1)show users in group \n2)Show group ID \n3)Add user to the group"
print
print glist
print
I want the "Show users in group" option to work like the code above: open the file and do a grep-style filter so that only the users in the group named in "group_choose" are shown. I would also love an explanation of how you did it, since I don't really know how to use grep in Python.

My take on this would be to read the content of "/etc/group" and build a key/value list keyed on the group name.
Very crude example (with hardcoded values, since I don't have access to a "/etc/group" file):
line = "G1:X:T2:u1,u2,u3"
groups = []
users = line.split(":")[3].split(",")
groupname = line.split(":")[0]
groups.append([groupname, users])
for group in groups:
    if group[0] == "G1":
        print group[1]
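The same idea can be sketched in Python 3 with a dictionary instead of a list of pairs, which makes the lookup by group name a single step. This is only a sketch under assumptions: the function name parse_group_lines and the sample lines are made up for illustration, and each line is assumed to follow the name:password:gid:members layout used in the crude example above.

```python
def parse_group_lines(lines):
    """Map each group name to the list of users named in its last field."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a name:password:gid:members line
        name, members = fields[0], fields[3]
        # An empty member field yields an empty list rather than [""]
        groups[name] = [user for user in members.split(",") if user]
    return groups


# Hypothetical sample data; for the real file, pass open("/etc/group") instead.
sample = ["G1:x:1001:u1,u2,u3", "G2:x:1002:"]
groups = parse_group_lines(sample)
print(groups.get("G1", []))
```

With the real file you would call parse_group_lines on the open file object and look up the name the user typed at the "Choose a group" prompt.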

Related

delete only 1 instance of a string from a file

I have a file that looks like this:
1234:AnneShirly:anneshirley@seneca.ca:4:5[SRT111,OPS105,OPS110,SPR100,ENG100]
3217:Illyas:illay@seneca.ca:2:4[SRT211,OPS225,SPR200,ENG200]
1127:john Marcus:johnmarcus@seneca.ca:1:4[SRT111,OPS105,SPR100,ENG100]
0001:Amin Malik:amin_malik@seneca.ca:1:3[OPS105,SPR100,ENG100]
I want to be able to ask the user for an input (the student number at the beginning of each line) and then ask which course they want to delete (the course codes are in the list). The program should delete the course from the list of that student only, without deleting other instances of the course, because other students have the same courses.
studentid = input("enter studentid")
course = input("enter the course to delete")
with open("studentDatabase.dat") as file:
    lines = file.readlines()
with open("studentDatabase.dat", "w") as file:
    for line in lines:
        if line.find(course) == -1:
            file.write(line)
This just deletes the whole line, but I only want to delete the course.

Welcome to the site. You have a little way to go to make this work. It would be good if you put some additional effort into this before asking somebody to code it up. Let me suggest a structure that you can work on and augment; if you get stuck, you can re-post by editing your question above and/or commenting back on this answer. Here is the framework that I suggest:
Make a section of code to read your whole .dat file into memory. I would suggest putting the data into a dictionary that looks like this:
data = {1001: (name, email, <whatever the digits stand for>, [SRT111, OPS333, ...]),
        1044: ( ... )}
Basically, a dictionary with the ID as the key and the rest in a tuple or list. Test that, and make sure it works OK by inspecting a few values.
Make a little "control loop" that uses your input statements, and see if you can locate the "record" from your dictionary. Add some "if" logic to do "something" if the ID is not found or if the user enters something like "quit" to exit/break the loop. Test it to make sure it can find the ID's and then test it again to see that it can find the course in the list inside the tuple/list with the data. You probably need another "if" statement in there to "do something" if the course is not in the data element. Test it.
Make a little "helper function" that can re-write a data element with the course removed. A suggested signature would be:
def remove_course(data_element, course):
    # make the new data element (name, ... , [reduced course list])
    return new_data_element
Test it, make sure it works.
Put those pieces together and you should have the ingredients to change the dictionary by using the loop and function to put the new data element into the dictionary, over-writing the old one.
Write a widget to write the new .dat file from the dictionary in its entirety.
EDIT:
You can make the dictionary from a data file with something like this:
filename = 'student_data.dat'
data = {}   # an empty dictionary to stuff the results in
# use a context manager to handle opening/closing the file...
with open(filename, 'r') as src:
    # loop through the lines
    for line in src:
        # strip any whitespace from the end and tokenize the line by ":"
        tokens = line.strip().split(':')
        # check it...  (remove later)
        print(tokens)
        # gather the pieces, make conversions as necessary...
        stu_id = int(tokens[0])
        name = tokens[1]
        email = tokens[2]
        some_number = int(tokens[3])
        # splitting the number from the list of courses is a little complicated
        # you *could* do this more elegantly with regex, but for your level,
        # here is a simple way to find the "chop points" and split this up...
        last_blobs = tokens[4].split('[')
        course_count = int(last_blobs[0])
        course_list = last_blobs[1][:-1]   # everything except the last bracket
        # split up the courses by comma
        courses = course_list.split(',')
        # now stuff that into the dictionary...
        # a little sanity check:
        if data.get(stu_id):
            print(f'duplicate ID found: {stu_id}.  OVERWRITING')
        data[stu_id] = (name,
                        email,
                        some_number,
                        course_count,
                        courses)
for key, value in data.items():
    print(key, value)
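The remove_course helper and the "write it back out" step suggested above could look roughly like the sketch below. It assumes the tuple layout (name, email, some_number, course_count, courses) built by the parsing code, and it assumes that the number in front of the bracket is simply the length of the course list, which matches the sample data but is not confirmed by the question. format_record is a hypothetical helper name, and the four-digit zero padding of the ID is also a guess based on the sample lines.

```python
def remove_course(data_element, course):
    """Return a new record with `course` removed; the input is not modified."""
    name, email, some_number, course_count, courses = data_element
    if course not in courses:
        return data_element  # nothing to remove
    reduced = [c for c in courses if c != course]
    # Assumption: course_count always equals the number of listed courses.
    return (name, email, some_number, len(reduced), reduced)


def format_record(stu_id, data_element):
    """Turn one dictionary entry back into a line of the .dat file."""
    name, email, some_number, course_count, courses = data_element
    # Assumption: IDs are written as four digits, e.g. 1 -> "0001".
    return f"{stu_id:04d}:{name}:{email}:{some_number}:{course_count}[{','.join(courses)}]"


record = ("Amin Malik", "amin_malik@seneca.ca", 1, 3, ["OPS105", "SPR100", "ENG100"])
print(format_record(1, remove_course(record, "SPR100")))
```

In the control loop you would then do something like data[stu_id] = remove_course(data[stu_id], course) and, at the end, write format_record(key, value) plus a newline for every item in data.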

I've got something for you. What you want to do is find the student first and then delete the course, like this:
studentid = input("enter studentid")
course = input("enter the course to delete")
with open("studentDatabase.dat") as file:
    lines = file.readlines()
with open("studentDatabase.dat", "w") as file:
    for line in lines:
        if studentid in line:  # Check if it's the right student
            line = line.replace(course, "")  # replace course with nothing
        file.write(line)  # write every line, changed or not
You want to check if we are looking at the correct student, then replace the line but without the course code. Hope you can find it useful.
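One thing to watch with a plain replace is that it leaves a stray comma behind and can match the student number anywhere in the line. A slightly more careful per-line version might look like the sketch below. drop_course is a hypothetical name, the line layout is taken from the sample in the question, and updating the number in front of the bracket assumes it is the course count.

```python
def drop_course(line, studentid, course):
    """Remove `course` from `line` only if the line starts with `studentid`."""
    if not line.startswith(studentid + ":") or "[" not in line:
        return line  # some other student, or not a record line
    head, _, rest = line.rstrip("\n").rpartition("[")
    courses = [c for c in rest.rstrip("]").split(",") if c and c != course]
    # Assumption: the field before "[" is the course count, so recompute it.
    prefix, _, _old_count = head.rpartition(":")
    return f"{prefix}:{len(courses)}[{','.join(courses)}]\n"


lines = [
    "1234:AnneShirly:anneshirley@seneca.ca:4:5[SRT111,OPS105,OPS110,SPR100,ENG100]\n",
    "0001:Amin Malik:amin_malik@seneca.ca:1:3[OPS105,SPR100,ENG100]\n",
]
for line in lines:
    print(drop_course(line, "1234", "OPS105"), end="")
```

Only the record for 1234 changes; the other student keeps OPS105.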

Getting the data from multiple lists within a file

I am trying to create an Instagram bot using python.
So my problem is that I have created a text file that will contain all the usernames of the people my bot follows and the text appears as follows.
These are the lines of code that I have used to append the file.
followers_list contains the list of all the users.
with open("file.txt", 'a') as file:
    file.write(str(followers_list))
This is how the usernames are entered into the file.
["user1"]["user2"]["user3"]
Now I want to make a function that unfollows all the users present in the list. So I am going to need the username from these lists and I have been trying to find information on how to do that but I have not found anything useful. So I need suggestions on how to do that.

First of all, I would suggest changing file.write(str(followers_list)) to file.write(",".join(followers_list) + ","), so that the names are stored as plain comma-separated text and separate appends do not run together. Once that is done, you can read the file with with open("file.txt", 'r') as f: lines = f.read() and then write the loop you need: for username in lines.split(","): (skipping any empty strings).
This is quick, untested code and may need some debugging. If you can edit the question and add some examples, we will be able to help you more. An example of the variable followers_list should be enough; feel free to use fake data.
Note: instead of commas, using a JSON format would be nice too.
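To illustrate the JSON idea, here is a small sketch that appends each batch of usernames as one JSON array per line and reads them all back into a single list. The function names and the file name followers.jsonl are made up for this example, and it assumes followers_list is a list of plain strings.

```python
import json


def append_followers(path, followers_list):
    """Append one batch of usernames as a JSON array on its own line."""
    with open(path, "a") as fh:
        fh.write(json.dumps(followers_list) + "\n")


def read_followers(path):
    """Read every batch back and return one flat list of usernames."""
    users = []
    with open(path) as fh:
        for line in fh:
            if line.strip():  # ignore blank lines
                users.extend(json.loads(line))
    return users


# Example usage (hypothetical file name):
# append_followers("followers.jsonl", ["user1", "user2"])
# for username in read_followers("followers.jsonl"):
#     ...unfollow username...
```

Because every line is valid JSON on its own, appending in several runs never produces the ["user1"]["user2"] run-together problem.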

It is not entirely clear what the problem is here: whether it is your write function or your read function that you are trying to fix. Assuming the problem is the write function, something like this should get the results you want:
with open("file.txt", 'a') as file:
    for follower in followers_list:
        # assuming follower is a string, so it doesn't need to be converted;
        # the newline keeps one username per line
        file.write(follower + "\n")
Otherwise, if you need to pick the username out of each list, just use indexing when you are reading your follower_lists, e.g.
for follower_list in follower_lists:
    follower = follower_list[0]

Right now you don't have a completely valid data structure in python, so modules like json and ast are going to be tricky. If you are regex-inclined, you could try the following:
import re
userstr = '["user1"]["user2"]["user3"]'
# capture everything except " in the group
re.findall(r'\["([^"]+)"\]', userstr)
['user1', 'user2', 'user3']
This will also work if there is a newline between user entries:
userstr = '''["user1"]["user2"]
["user3"]
'''
re.findall(r'\["([^"]+)"\]', userstr)
['user1', 'user2', 'user3']
Otherwise, I'd agree with @MarkMeyer and try to get these users into some sort of JSON file format, or something that is a bit more compatible with built-in Python data structures. One suggestion to make life easy would be just to format users.txt like so:
user1
user2
user3
...
Then you can just do:
with open('users.txt') as fh:
    # this will create a list of users, and strip()
    # removes leading/trailing whitespace
    users = [user.strip() for user in fh]
And adding users is as simple as
with open('users.txt', 'a') as fh:
    fh.write('userN\n')

Filtering for multiple text patterns and storing them and their respective occurrences

I'm new to Python and to Stack Overflow itself; this is my first post here.
I'm working with a log file that looks like this:
Feb 1 00:00:02 bridge kernel: INBOUND TCP: IN=br0 PHYSIN=eth0 OUT=br0 PHYSOUT=eth1 SRC=XXX.XXX.XXX.XXX DST=XXX.XXX.XXX.XXX LEN=40 TOS=0x00 PREC=0x00 TTL=110 ID=12973 PROTO=TCP SPT=220 DPT=6129 WINDOW=16384 RES=0x00 SYN URGP=0
I need to search for everything between the colons. In this line the pattern matched would be INBOUND TCP, but there are other types of patterns.
I have to search that field and store each unique type along with how many times it occurred in the file.
I already know how to open the file and use re.compile to parse it, and I managed to save the unique results in another text file.
From reading the documentation, I imagine that I need to use a dictionary with some sort of loop in order to store the different patterns and their number of occurrences.
Can someone help me?
Thank you if you read this far.
#!/usr/bin/python3
import sys
import os
import re
p= re.compile ('bridge kernel:.*:')
with open (sys.argv[1], "r") as f:
    with open ('tipos.txt',"w" ) as f2:
        for line in f:
            if p.search(line):
                f2.write(line.split(":")[3] + '\n')
os.system('sort tipos.txt|uniq > tipos2.txt')
dict={}
with open (sys.argv[1],"r") as log:
    with open ('tipos2.txt','r') as f:
        for l in f:
            if f in log:
                dict={"(f.line)", "(len(log))"}
                print (dict)

First of all, you shouldn't call your dictionary dict, because that name shadows Python's built-in dict type (the dict() constructor builds dictionaries directly from sequences of key-value pairs).
This line dict={"(f.line)", "(len(log))"} is incorrect, the curly brackets used like this mean you are actually defining a new set containing two strings, and not the variables you want - they are in quotes.
The declaration of the empty dictionary itself is fine.
To add values to an existing dictionary use dictName[key] = value. To declare a dictionary with value pairs use dictName = {key1 : value1, key2 : value2} etc.
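Putting that together, counting the message types could be sketched as follows. This uses collections.Counter, which is a dictionary subclass from the standard library and a swap-in for a hand-written dict[key] = dict.get(key, 0) + 1 loop. The regular expression, the name count_types and the "OUTBOUND UDP" sample line are assumptions for illustration; only the "INBOUND TCP" line shape comes from the question.

```python
import re
from collections import Counter

# Text between "bridge kernel:" and the next colon, e.g. "INBOUND TCP".
TYPE_RE = re.compile(r"bridge kernel:\s*([^:]+):")


def count_types(lines):
    """Return a Counter mapping each message type to its number of lines."""
    counts = Counter()
    for line in lines:
        match = TYPE_RE.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


# Hypothetical sample; for a real log, pass the open file object instead.
sample = [
    "Feb 1 00:00:02 bridge kernel: INBOUND TCP: IN=br0 PHYSIN=eth0 OUT=br0",
    "Feb 1 00:00:03 bridge kernel: INBOUND TCP: IN=br0 PHYSIN=eth0 OUT=br0",
    "Feb 1 00:00:04 bridge kernel: OUTBOUND UDP: IN=br0 PHYSIN=eth1 OUT=br0",
]
for kind, total in count_types(sample).most_common():
    print(kind, total)
```

This reads the log once and needs neither the temporary tipos.txt file nor the external sort and uniq commands.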

How can I extract specific data from e-prime output (.txt file)

I've been learning Python for the last couple of days in order to complete a data extraction task. I'm not getting anywhere and hope one of you lovely people can advise.
I need to extract the data that follows RESP, CRESP, RTTime and RT.
Here's a snippet as an example of the mess I have to deal with.
Thoughts?
Level: 4
*** LogFrame Start ***
Procedure: ActProcScenarios
No: 1
Line1: It is almost time for your town's spring festival. A friend of yours is
Line2: on the committee and asks if you would be prepared to help out with the
Line3: barbecue in the park. There is a large barn for use if it rains.
Line4: You hope that on that day it will be
pfrag: s-n-y
pword: sunny
pletter: u
Quest: Does the town have an autumn festival?
Correct: {LEFTARROW}
ScenarioListPract: 1
Topic: practice
Subtheme: practice
ActPracScenarios: 1
Running: ActPracScenarios
ActPracScenarios.Cycle: 1
ActPracScenarios.Sample: 1
DisplayFragInstr.OnsetDelay: 17
DisplayFragInstr.OnsetTime: 98031
DisplayFragInstr.DurationError: -999999
DisplayFragInstr.RTTime: 103886
DisplayFragInstr.ACC: 0
DisplayFragInstr.RT: 5855
DisplayFragInstr.RESP: {DOWNARROW}
DisplayFragInstr.CRESP:
FragInput.OnsetDelay: 13
FragInput.OnsetTime: 103899
FragInput.DurationError: -999999
FragInput.RTTime: 104998

I think regular expressions would be the right tool here because the \b word boundary anchors allow you to make sure that RESP only matches a whole word RESP and not just part of a longer word (like CRESP).
Something like this should get you started:
>>> import re
>>> for line in myfile:
...     match = re.search(r"\b(RT|RTTime|RESP|CRESP): (.*)", line)
...     if match:
...         print("Matched {0} with value {1}".format(match.group(1),
...                                                   match.group(2)))
Output:
Matched RTTime with value 103886
Matched RT with value 5855
Matched RESP with value {DOWNARROW}
Matched CRESP with value
Matched RTTime with value 104998

Transform it to a dict first, then just get items from the dict as you wish:
# s holds the whole file as one string; split(':', 1) splits only at the
# first colon, so values that contain ':' stay intact
d = {k.strip(): v.strip() for (k, v) in
     [line.split(':', 1) for line in s.split('\n') if ':' in line]}
print (d['DisplayFragInstr.RESP'], d['DisplayFragInstr.CRESP'],
       d['DisplayFragInstr.RTTime'], d['DisplayFragInstr.RT'])
>>> ('{DOWNARROW}', '', '103886', '5855')
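If the file contains more than one trial, a single dict keeps only the last value seen for each key. A possible variation, sketched below, keeps one dict per trial. It assumes every trial begins with a "*** LogFrame Start ***" line as in the posted sample; parse_frames is a made-up name, and lines before the first marker (such as "Level: 4") are simply ignored.

```python
def parse_frames(text):
    """Return a list of dicts, one per '*** LogFrame Start ***' block."""
    frames = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "*** LogFrame Start ***":
            current = {}
            frames.append(current)
        elif current is not None and ":" in line:
            # Split at the first colon only, so values may contain ':'
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return frames


sample = """Level: 4
*** LogFrame Start ***
DisplayFragInstr.RTTime: 103886
DisplayFragInstr.RT: 5855
DisplayFragInstr.RESP: {DOWNARROW}
DisplayFragInstr.CRESP:
"""
for frame in parse_frames(sample):
    print(frame.get("DisplayFragInstr.RESP"), frame.get("DisplayFragInstr.RT"))
```

Using frame.get(...) instead of frame[...] avoids a KeyError for trials that lack one of the fields.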

I think you may be making things harder for yourself than needed. E-Prime has a file format called .edat that is designed for the purpose you are describing. An .edat file contains the same information as the .txt file, but in a way that makes extracting variables easier. I personally only use the type of text file you have posted here as a form of data storage redundancy.
If you are doing things this way because you do not have a software key, it might help to know that the E-Merge and E-DataAid programs for E-Prime don't require a key. You only need the key for editing build files. Whoever provided you with the .txt files should probably have an install disk for these programs. If not, it is available on the PST website (I believe you need a serial code to create an account, but I'm not certain).
E-Prime generally creates an .edat file that matches the content of the text file you have posted an example of. Sometimes, though, if E-Prime crashes you don't get the .edat file and only have the .txt. Luckily, you can generate the .edat file from the .txt file.
Here's how I would approach this issue: if you do not have the .edat files available, first use E-DataAid to recover the files.
Then, presuming you have multiple participants, you can use E-Merge to merge the .edat files of all participants who completed this task.
Open the merged file. It might look a little chaotic depending on how much you have in the file. You can go to Tools->Arrange Columns, which will show a list of all your variables. Adjust it so that only the desired variables are in the right-hand box. Hit OK.
Looking at the file you posted, it says level 4 at the top, so I'm guessing there are a lot of procedures in this experiment. If you have many procedures in the program, you might at this point have lines that just have startup info and NULL in the locations where your variables of interest are. You can fix this by going to Tools->Filter and creating a filter to eliminate those lines. Sometimes, depending on the file structure, you might also end up with duplicate lines of the same data. You can fix this with filtering as well.
You can then export this file as a CSV.

import re
import pprint

def parse_logs(file_name):
    with open(file_name, "r") as f:
        lines = [line.strip() for line in f.readlines()]
    # \b keeps "RESP" from also matching inside "CRESP"
    base_regex = r'^.*\b{0}: (.*)$'
    match_terms = ["RESP", "CRESP", "RTTime", "RT"]
    regexes = {term: base_regex.format(term) for term in match_terms}
    output_list = []
    for line in lines:
        for key, regex in regexes.items():
            match = re.match(regex, line)
            if match:
                match_tuple = (key, match.groups()[0])
                output_list.append(match_tuple)
    return output_list

pprint.pprint(parse_logs("respregex"))
Edit: Tim and Guy's answers are both better. I was in a hurry to write something and missed two much more elegant solutions.

Trying to create a list of users in AD

So, I've created a script that searches AD for a list of users in a specific OU, and outputs this to a text file. I need to format this text file. The top OU I'm searching contains within it an OU for each location of this company, containing the user accounts for that location.
Here's my script:
import active_directory
import sys
sys.stdout = open('output.txt', 'w')
users = active_directory.AD_object ("LDAP://ou=%company%,dc=%domain%,dc=%name%")
for user in users.search (objectCategory='Person'):
    print user
sys.stdout.close()
Here's what my output looks like, and there's just 20-something lines of this for each different user:
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
So, what I want to do is just to put this in plain English, make it easier to read, just by showing the username and the subset OU. So this:
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
Becomes THIS:
%username%, %location%
If there's any way to export this to .csv or a .xls to put into columns that can be sorted by location or just alphabetical order, that would be GREAT. I had one hell of a time just figuring out the text file.

If you have a string like this
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
Then manipulating it is quite easy. If the format is standard and doesn't change, the fastest way to manipulate it would just be to use string.split()
>>> splitted = "LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%".split('=')
yields a list
>>> splitted
["LDAP://CN",
 "%username%,OU",
 "%location%,OU",
 "%company%,dc",
 "%domain%,dc",
 "%name%"]
Now we can access the items of the list
>>> splitted[1]
"%username%,OU"
To get rid of the ",OU", we'll need to do another split.
>>> username = splitted[1].split(",OU")[0]
>>> username
"%username%"
CSV is just a text file, so all you have to do is change your file ending. Here's a full example.
import active_directory

output = open("output.csv", "w")
users = active_directory.AD_object("LDAP://ou=%company%,dc=%domain%,dc=%name%")
for user in users.search(objectCategory='Person'):
    # Because AD_object.search() returns another AD_object
    # we cannot split it. We need the string representation
    # of this AD object, and thus have to wrap the user in str()
    splitteduser = str(user).split('=')
    username = splitteduser[1].split(",OU")[0]
    location = splitteduser[2].split(",OU")[0]
    # \n is a line ending
    # The line below uses the old way to format strings, but it looks simpler.
    # The newer way would be:
    # output.write("{0}, {1}\n".format(username, location))
    output.write("%s, %s\n" % (username, location))
output.close()
It's not the prettiest solution around, but it should be easy enough to understand.
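If you want proper columns that can be sorted by location, the standard csv module can do the quoting for you. The sketch below works on the string form of each result, so it does not need the active_directory module to run. It assumes the strings look like the output shown in the question; dn_to_row and write_rows are hypothetical names, the sample values are made up, and names containing escaped commas are not handled.

```python
import csv
import io
import re

# First CN= value and the OU= value that directly follows it.
DN_RE = re.compile(r"CN=([^,]+),OU=([^,]+)", re.IGNORECASE)


def dn_to_row(dn):
    """Return (username, location) for one LDAP path, or None if no match."""
    match = DN_RE.search(dn)
    return (match.group(1), match.group(2)) if match else None


def write_rows(dns, out):
    """Write a header plus one row per user, sorted by location, then name."""
    rows = [row for row in map(dn_to_row, dns) if row is not None]
    rows.sort(key=lambda row: (row[1], row[0]))
    writer = csv.writer(out)
    writer.writerow(["username", "location"])
    writer.writerows(rows)


# Made-up sample paths in the same shape as the output in the question.
sample = [
    "LDAP://CN=jdoe,OU=Boston,OU=Acme,dc=example,dc=com",
    "LDAP://CN=asmith,OU=Austin,OU=Acme,dc=example,dc=com",
]
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

To write a real file, pass open("output.csv", "w", newline="") instead of the StringIO buffer and feed it str(user) for each search result.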
