So, I've created a script that searches AD for a list of users in a specific OU, and outputs this to a text file. I need to format this text file. The top OU I'm searching contains within it an OU for each location of this company, containing the user accounts for that location.
Here's my script:
import active_directory
import sys
sys.stdout = open('output.txt', 'w')
users = active_directory.AD_object("LDAP://ou=%company%,dc=%domain%,dc=%name%")
for user in users.search(objectCategory='Person'):
    print user
sys.stdout.close()
Here's what my output looks like; there are just 20-something lines like this, one for each different user:
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
So, to put it in plain English, what I want to do is make this easier to read by showing just the username and the location OU. So this:
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
Becomes THIS:
%username%, %location%
If there's any way to export this to a .csv or .xls with columns that can be sorted by location or alphabetically, that would be GREAT. I had one hell of a time just figuring out the text file.
If you have a string like this
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
Then manipulating it is quite easy. If the format is standard and doesn't change, the fastest way to manipulate it would just be to use str.split():
>>> splitted = "LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%".split('=')
yields a list
>>> splitted
['LDAP://CN',
 '%username%,OU',
 '%location%,OU',
 '%company%,dc',
 '%domain%,dc',
 '%name%']
Now we can access the items of the list
>>> splitted[1]
'%username%,OU'
To get rid of the ",OU", we'll need to do another split.
>>> username = splitted[1].split(",OU")[0]
>>> username
'%username%'
CSV is just a text file, so all you have to do is change the file extension. Here's a full example.
output = open("output.csv","w")
users = active_directory.AD_object ("LDAP://ou=%company%,dc=%domain%,dc=%name%
for user in users.search (objectCategory='Person'):
# Because the AD_object.search() returns another AD_object
# we cannot split it. We need the string representation
# of this AD object, and thus have to wrap the user in str()
splitteduser = str(user).split('=')
username = splitteduser[1].split(", OU")[0]
location = splitteduser[2].split(", OU")[0]
output.write("%s, %s\n"%(username,location))
% \n is a line ending
% The above is the old way to format strings, but it looks simpler.
% Correct way would be:
% output.write("{0}, {1}\n".format(username,location))
output.close()
It's not the prettiest solution around, but it should be easy enough to understand.
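If you would rather have something Excel can open and sort by column right away, here's a minimal sketch using the standard csv module instead of hand-rolled writes. It assumes the same path format as above, and the header names are my own choice:

import csv
import active_directory

users = active_directory.AD_object("LDAP://ou=%company%,dc=%domain%,dc=%name%")
# 'wb' mode because this targets Python 2, like the script above
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerow(["username", "location"])  # header row, so the columns are labelled
    for user in users.search(objectCategory='Person'):
        parts = str(user).split('=')
        writer.writerow([parts[1].split(",OU")[0], parts[2].split(",OU")[0]])

Excel (or LibreOffice) can then sort the rows by either column, or you can sort in Python with sorted() before writing.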
Related
I have a file that looks like this:
1234:AnneShirly:anneshirley#seneca.ca:4:5[SRT111,OPS105,OPS110,SPR100,ENG100]
3217:Illyas:illay#seneca.ca:2:4[SRT211,OPS225,SPR200,ENG200]
1127:john Marcus:johnmarcus#seneca.ca:1:4[SRT111,OPS105,SPR100,ENG100]
0001:Amin Malik:amin_malik#seneca.ca:1:3[OPS105,SPR100,ENG100]
I want to be able to ask the user for an input (the student number at the beginning of each line) and then ask which course they want to delete (the course codes are in the list). The program should delete the course from that student's list without deleting other instances of the course, because other students have the same courses.
studentid = input("enter studentid")
course = input("enter the course to delete")
with open("studentDatabase.dat") as file:
f = file.readlines()
with open("studentDatabase.dat","w") as file:
for line in lines:
if line.find(course) == -1:
file.write(line)
This just deletes the whole line but I only want to delete the course
Welcome to the site. You have a little way to go to make this work. It would be good if you put some additional effort into this before asking somebody to code it up. Let me suggest a structure for you that perhaps you can work on/augment, and then you can re-post if you get stuck by editing your question above and/or commenting back on this answer. Here is a framework that I suggest:
make a section of code to read in your whole .dat file into memory. I would suggest putting the data into a dictionary that looks like this:
data = {1001: (name, email, <whatever the digits stand for>, [SRT111, OPS333, ...]),
        1044: ( ... )}
basically a dictionary with the ID as the key and the rest in a tuple or list. Test that, make sure it works OK by inspecting a few values.
Make a little "control loop" that uses your input statements, and see if you can locate the "record" from your dictionary. Add some "if" logic to do "something" if the ID is not found or if the user enters something like "quit" to exit/break the loop. Test it to make sure it can find the ID's and then test it again to see that it can find the course in the list inside the tuple/list with the data. You probably need another "if" statement in there to "do something" if the course is not in the data element. Test it.
Make a little "helper function" that can re-write a data element with the course removed. A suggested signature would be:
def remove_course(data_element, course):
    # make the new data element (name, ..., [reduced course list])
    return new_data_element
Test it, make sure it works.
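For example, a minimal sketch of that helper, assuming the element ends with a course count and the course list (the layout used in the code under EDIT below):

def remove_course(data_element, course):
    # unpack the old element, drop the course, rebuild with an updated count
    name, email, some_number, course_count, courses = data_element
    new_courses = [c for c in courses if c != course]
    return (name, email, some_number, len(new_courses), new_courses)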
Put those pieces together and you should have the ingredients to change the dictionary by using the loop and function to put the new data element into the dictionary, over-writing the old one.
Write a widget to write the new .dat file from the dictionary in its entirety.
EDIT:
You can make the dictionary from a data file with something like this:
filename = 'student_data.dat'

data = {}  # an empty dictionary to stuff the results in

# use a context manager to handle opening/closing the file...
with open(filename, 'r') as src:
    # loop through the lines
    for line in src:
        # strip any whitespace from the end and tokenize the line by ":"
        tokens = line.strip().split(':')
        # check it... (remove later)
        print(tokens)
        # gather the pieces, make conversions as necessary...
        stu_id = int(tokens[0])
        name = tokens[1]
        email = tokens[2]
        some_number = int(tokens[3])
        # splitting the number from the list of courses is a little complicated
        # you *could* do this more elegantly with regex, but for your level,
        # here is a simple way to find the "chop points" and split this up...
        last_blobs = tokens[4].split('[')
        course_count = int(last_blobs[0])
        course_list = last_blobs[1][:-1]  # everything except the last bracket
        # split up the courses by comma
        courses = course_list.split(',')
        # now stuff that into the dictionary...
        # a little sanity check:
        if data.get(stu_id):
            print(f'duplicate ID found: {stu_id}. OVERWRITING')
        data[stu_id] = (name,
                        email,
                        some_number,
                        course_count,
                        courses)

for key, value in data.items():
    print(key, value)
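And to round out the framework, a sketch of the last step, writing the dictionary back out in the original format (assuming IDs should be zero-padded back to four digits, as in the sample data):

with open(filename, 'w') as dst:
    for stu_id, (name, email, some_number, course_count, courses) in data.items():
        # rebuild the original colon-separated layout with the courses in brackets
        dst.write(f'{stu_id:04d}:{name}:{email}:{some_number}:{course_count}[{",".join(courses)}]\n')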
I got something for you. What you want to do is find the student first and then delete the course, like this:
studentid = input("enter studentid")
course = input("enter the course to delete")
with open("studentDatabase.dat") as file:
f = file.readlines()
with open("studentDatabase.dat","w") as file:
for line in lines:
if studentid in line: # Check if it's the right sudent
line = line.replace(course, "") # replace course with nothing
file.write(line)
We check whether we are looking at the correct student, and if so write the line back with the course code removed. Hope you can find it useful.
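Note that a plain replace() can leave a stray comma behind (e.g. SRT111,,OPS105, or a trailing comma before the bracket). A slightly safer sketch that rebuilds the course list instead of editing the raw text, reusing studentid and course from above (the course count field is left untouched):

with open("studentDatabase.dat") as file:
    lines = file.readlines()

with open("studentDatabase.dat", "w") as file:
    for line in lines:
        if studentid in line:  # the right student's line
            head, _, tail = line.rstrip().partition('[')
            courses = [c for c in tail.rstrip(']').split(',') if c != course]
            line = head + '[' + ','.join(courses) + ']\n'
        file.write(line)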
As the title mentions, my issue is that I don't understand quite how to extract the data I need for my table (The columns for the table I need are Date, Time, Courtroom, File Number, Defendant Name, Attorney, Bond, Charge, etc.)
I think regex is what I need, but my class did not go over this, so I am confused about how to parse the file in order to extract and output the correct data as an organized table...
I am supposed to turn my text file from this
https://pastebin.com/ZM8EPu0p
and export it into a more readable format like this- example output is below
Here is what I have so far.
def readFile(court):
    csv_rows = []
    # read and split txt file into pages & chunks of data by paragraph
    with open(court, "r") as file:
        data_chunks = file.read().split("\n\n")
    entry = None
    for chunk in data_chunks:
        chunk = chunk.strip()  # .strip() removes useless spaces
        if chunk[:4].isnumeric():  # if first 4 characters are digits
            entry = None  # initialize an empty dictionary
        elif (
            chunk.isspace() and entry
        ):  # if we're on an empty line and the entry dict is not empty
            csv_rows.DictWriter(dialect="excel")  # turn csv_rows into needed output
            entry = {}
        else:
            # parse here?
            print(chunk)
    return csv_rows

readFile("/Users/mia/Desktop/School/programming/court.txt")
It is quite a lot of work to achieve that, but it is possible if you split it into a couple of sub-tasks.
First, your input looks like a text file, so you could parse it line by line. -- using https://www.w3schools.com/python/ref_file_readlines.asp
Then, I noticed that your data can be split into pages. You would need to prepare a lot of regular expressions, but you can start with one for identifying where each page starts. -- you may want to read this, as your expression might get quite complicated: https://www.w3schools.com/python/python_regex.asp
The goal of this step is to collect all lines from a page in some container (might be a list, dict, whatever you find suitable).
Afterwards, write some code that parses the information page by page. For simplicity I suggest starting with something easy, like the columns for "no, file number and defendant".
And once you can collect some data reliably, you can address the export part using pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
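Purely to show the shape of that pipeline, here is a rough sketch of the page-splitting step plus the pandas export. The page-header pattern, the file name, and the column names are made-up placeholders that you would adapt to your actual file:

import re
import pandas as pd

# placeholder pattern; replace it with whatever really marks a page start
page_start = re.compile(r'^\s*PAGE\s+\d+')

pages, current = [], []
with open('court.txt') as src:
    for line in src:
        if page_start.match(line) and current:
            pages.append(current)  # close the previous page
            current = []
        current.append(line.rstrip('\n'))
if current:
    pages.append(current)

# once you parse real rows out of each page, collect them as dicts...
rows = []  # e.g. {'File Number': ..., 'Defendant Name': ..., 'Attorney': ...}
# ...and export the lot:
pd.DataFrame(rows).to_excel('court.xlsx', index=False)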
I am trying to create an Instagram bot using python.
So my problem is that I have created a text file that will contain all the usernames of the people my bot follows and the text appears as follows.
These are the lines of code that I have used to append the file.
followers_list contains the list of all the users.
with open("file.txt", 'a') as file:
file.write(str(followers_list))
This is how the usernames are entered into the file.
["user1"]["user2"]["user3"]
Now I want to make a function that unfollows all the users present in the list, so I need the usernames from these lists. I have been trying to find information on how to do that but have not found anything useful, so I need suggestions.
First of all I would suggest you change file.write(str(followers_list)) to file.write(",".join(followers_list)) (assuming the followers are strings). Once that is done, you can simply read the file back via with open("file.txt", 'r') as f: lines = f.read() and then make the for loop that you need: for username in lines.split(",").
This is fast code and may need some debugging; if you can edit the question and add some examples we will be able to help you more. An example of the followers_list variable should be enough, feel free to add fake data.
Note: also, instead of commas, using a json format would be nice too.
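For instance, a small sketch of the json variant, with fake data and assuming followers_list is a plain list of username strings:

import json

followers_list = ["user1", "user2", "user3"]  # fake data for illustration

# write the whole list out in one go
with open("followers.json", "w") as f:
    json.dump(followers_list, f)

# read it back later as a real python list
with open("followers.json") as f:
    for username in json.load(f):
        print(username)  # your unfollow call would go here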
It's not entirely clear what the problem is here, or whether it's your write function or your read function you're trying to fix. Assuming the problem is the write function, something like this should get the results that you want, I guess.
with open("file.txt", 'a') as file:
for follower in followers_list:
# assuming follower is a string therefore doesn't need to be converted
file.write(follower)
Otherwise, if you need to pick the username from each list, just use indexing when you are reading your follower_lists,
e.g.
for follower_list in follower_lists:
    follower = follower_list[0]
Right now you don't have a completely valid data structure in python, so modules like json and ast are going to be tricky. If you are regex-inclined, you could try the following:
import re
userstr = '["user1"]["user2"]["user3"]'
# capture everything except " in the group
re.findall(r'\["([^"]+)"\]', userstr)
['user1', 'user2', 'user3']
Where this will also work if there is a newline between user entries:
userstr = '''["user1"]["user2"]
["user3"]
'''
re.findall('\[\"([^\"]+)\"\]', userstr)
['user1', 'user2', 'user3']
Otherwise, I'd agree with @MarkMeyer and try to get these users into some sort of json file format, or something that is a bit more compatible with built-in python data structures. One suggestion to make life easy would be just to format users.txt like so:
user1
user2
user3
...
Then you can just do:
with open('users.txt') as fh:
    # this will create a list of users, and strip()
    # removes leading/trailing whitespace
    users = [user.strip() for user in fh]
And adding users is as simple as
with open('users.txt', 'a') as fh:
    fh.write('userN\n')  # the newline keeps one user per line
I am using Python to make a filter to search through thousands of text files for specific queries. These text files consist of several sections, and they do not all have consistent formatting. I want each of these sections to be checked for specific criteria, so in the section of the text file called "DESCRIPTION OF RECORD", I was doing something like this to store the string to a variable:
with open(some_file, 'r') as r:
    for line in r:
        if "DESCRIPTION OF RECORD" in line:
            record = line
Now this works pretty well for most files, but some files have a line break in the section, so it does not store the whole section to the variable. I was wondering how I could use a delimiter to control how many lines are stored to the variable. I would probably use the title of the next section, "CORRELATION", as the delimiter. Any ideas?
An example structure of the file could look like:
CLINICAL HISTORY: Some information.
MEDICATIONS: Other information
INTRODUCTION: Some more information.
DESCRIPTION OF THE RECORD: Some information here....
another line of information
IMPRESSION: More info
CLINICAL CORRELATION: The last bit of information
You could use the built-in re module, like this:
import re

# I assume you have a list of all possible sections
sections = [
    'CLINICAL HISTORY',
    'MEDICATIONS',
    'INTRODUCTION',
    'DESCRIPTION OF THE RECORD',
    'IMPRESSION',
    'CLINICAL CORRELATION'
]

# Build a regexp that will match any of the section names
exp = '|'.join(sections)

with open(some_file, 'r') as r:
    contents_of_file = r.read()

infos = list(re.split(exp, contents_of_file))  # infos is a list of what's between the section names
infos = [info.strip('\n :') for info in infos]  # let's get rid of colons and whitespace in our infos
print(infos)  # you don't have to print it :)
If I use your example text instead of a file, it prints something like this:
['', 'Some information.', 'Other information', 'Some more information.', 'Some information here....\nanother line of information', 'More info', 'The last bit of information']
The first element is empty, but you can get rid of it simply by doing so:
infos = infos[1:]
By the way, if we merge the lines in which we deal with infos into one, it would probably be cleaner and surely more efficient (but maybe a little bit less understandable):
infos = [info.strip('\n :') for info in re.split(exp, contents_of_file)][1:]
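As a side note (my own addition, not part of the answer above): if you also want to know which info belongs to which section, wrap the pattern in a capturing group so that re.split() keeps the section names, then pair them up. This reuses exp and contents_of_file from the snippet above:

parts = re.split('(' + exp + ')', contents_of_file)[1:]  # drop the chunk before the first section
# parts now alternates: section name, section text, section name, ...
record = {parts[i]: parts[i + 1].strip('\n :') for i in range(0, len(parts), 2)}
print(record['DESCRIPTION OF THE RECORD'])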
If you do not know the sections you'll find, here's a version which seems to work, as long as the text is formatted as in your example:
import itertools

text = """
CLINICAL HISTORY: Some information.
MEDICATIONS: Other information
INTRODUCTION: Some more information.
DESCRIPTION OF THE RECORD: Some information here....
another line of information
IMPRESSION: More info
CLINICAL CORRELATION: The last bit of information
"""

def method_tuple(s):
    # sp holds strings which finish with the section names.
    sp = s.split(":")
    # This removes spurious "\n" at both ends of the strings in sp.
    # It then splits them once at "\n" starting from their end, effectively
    # separating the sections and the descriptions.
    # It builds a list of strings alternating section names and information.
    fragments = list(itertools.chain.from_iterable(p.strip("\n").rsplit("\n", 1) for p in sp))
    # You can now build a list of 2-tuples.
    pairs = [(fragments[i*2], fragments[i*2+1]) for i in range(len(fragments)//2)]
    # Or you could build a dict:
    # pairs = {fragments[i*2]: fragments[i*2+1] for i in range(len(fragments)//2)}
    return pairs

print(method_tuple(text))
The timings compared to the regex version from Ilya are roughly equivalent, although building a dictionary seems to start winning over building a list of tuples or using regexps on the sample text at a billion loops...
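If you want to run that kind of comparison yourself, a quick sketch with the standard timeit module, reusing text and method_tuple from above (with a far smaller loop count than a billion):

import timeit

# time 100000 calls of method_tuple on the sample text
print(timeit.timeit('method_tuple(text)', globals=globals(), number=100000))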
I found another possible solution for this using the indexes of the lines. I first opened the check file and stored its f.read() contents in a variable called info. I then did this:
with open(check_file, 'r') as r:
    for line in r:
        if "DESCRIPTION" in line:
            record_Index = info.index(line)
            record = info[record_Index:]
            if "IMPRESSION" in record:
                impression_Index = info.index("IMPRESSION")
                record = info[record_Index:impression_Index]
This method worked as well, although I don't know how efficient it is memory and speed wise. Instead of using with open(...) multiple times, it might be better just to store it all in the variable called info and then do everything with that.
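Along the same lines, here is a sketch that works on info alone with str.find(), so the file is only read once (assuming the section titles from the example above):

with open(check_file, 'r') as r:
    info = r.read()

start = info.find("DESCRIPTION OF THE RECORD")
if start != -1:
    end = info.find("IMPRESSION", start)  # only search after the start point
    record = info[start:end] if end != -1 else info[start:]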
I am trying to use python (v3.4) to act as a 'sandwich' between Neo4j and a text output file. This code gives me a py2neo RecordList:
from py2neo import Graph
from py2neo.packages.httpstream import http
http.socket_timeout = 9999
graph = Graph('http://localhost:7474/db/data/')
sCypher = 'MATCH (a) RETURN count(a)'
results = graph.cypher.execute(sCypher)
I also have some really simple text file interaction:
f = open('Output.txt', 'a')  # open for append access
f.write('\n Hello world')
f.close()
What I really want to do is f.write (str(results)) but it really didn't like that at all. Can someone help me to convert my RecordList into a string please? I'm assuming I'll need to loop through the columns to get each column name, then loop through each record and write it individually, but I don't know how to go about this. Where I'm eventually planning to go with this is to change the Cypher every time.
Closest related question I could find is this one: How to convert Neo4j return types to python types. I'm sure there's someone out there who'll read this and say that using the REST API directly is a much better way of spitting out text, but I'm not quite at that level...
Thanks in advance,
Andy
Here is how you can iterate over a RecordList and print the columns of the individual Records to a file (e.g. comma separated). If the properties you return are lists, you would need some more formatting to get strings for your output file.
# use with to open files; this makes sure it's properly closed after an exception
with open('output.txt', 'a') as f:
    # iterate over individual Records in the RecordList
    for record in results:
        # concatenate all columns of the Record into a string, comma separated
        # (list comprehension with str() to handle ints and other types)
        output_string = ','.join([str(x) for x in record])
        # actually write to file
        print(output_string, file=f)
The format of the output file depends on what you want to do with it of course.
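If you also want a header row with the column names, a sketch along the same lines (assuming your py2neo version exposes results.columns, as py2neo 2.x should):

with open('output.txt', 'a') as f:
    # column names first, then one comma-separated line per Record
    print(','.join(results.columns), file=f)
    for record in results:
        print(','.join(str(x) for x in record), file=f)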