Output py2neo recordlist to text file - python

I am trying to use python (v3.4) to act as a 'sandwich' between Neo4j and a text output file. This code gives me a py2neo RecordList:
from py2neo import Graph
from py2neo.packages.httpstream import http
http.socket_timeout = 9999
graph = Graph('http://localhost:7474/db/data/')
sCypher = 'MATCH (a) RETURN count(a)'
results = graph.cypher.execute(sCypher)
I also have some really simple text file interaction:
f = open('Output.txt', 'a')  # open for append access
f.write('\n Hello world')
f.close()
What I really want to do is f.write(str(results)), but it really didn't like that at all. Can someone help me convert my RecordList into a string, please? I'm assuming I'll need to loop through the columns to get each column name, then loop through each record and write it individually, but I don't know how to go about this. Where I'm eventually planning to go with this is to change the Cypher every time.
Closest related question I could find is this one: How to convert Neo4j return types to python types. I'm sure there's someone out there who'll read this and say that using the REST API directly is a much better way of spitting out text, but I'm not quite at that level...
Thanks in advance,
Andy

Here is how you can iterate over a RecordList and print the columns of the individual Records to a file (e.g. comma separated). If the properties you return are lists, you will need some more formatting to get strings for your output file.
# use with to open files; this makes sure the file is properly closed even after an exception
with open('output.txt', 'a') as f:
    # iterate over the individual Records in the RecordList
    for record in results:
        # concatenate all columns of the Record into a string, comma separated;
        # the list comprehension applies str() to handle ints and other types
        output_string = ','.join([str(x) for x in record])
        # actually write to the file
        print(output_string, file=f)
The format of the output file depends on what you want to do with it of course.
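If you also want a header line with the column names, the RecordList carries them too -- in py2neo 2.x this should be the columns attribute, but treat that name as an assumption and check it against your version:
with open('output.txt', 'a') as f:
    # header row; `columns` is assumed to be the py2neo 2.x attribute
    # holding the column names of the result
    print(','.join(results.columns), file=f)
    for record in results:
        print(','.join(str(x) for x in record), file=f)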

Related

How to parse a complex text file using Python string methods or regex and export into tabular form

As the title mentions, my issue is that I don't quite understand how to extract the data I need for my table (the columns I need are Date, Time, Courtroom, File Number, Defendant Name, Attorney, Bond, Charge, etc.).
I think regex is what I need, but my class did not cover it, so I am confused about how to parse the file in order to extract and output the correct data as an organized table...
I am supposed to turn my text file from this
https://pastebin.com/ZM8EPu0p
into a more readable format; example output is below.
Here is what I have so far.
def readFile(court):
    csv_rows = []
    # read and split txt file into pages & chunks of data by paragraph
    with open(court, "r") as file:
        data_chunks = file.read().split("\n\n")
        for chunk in data_chunks:
            chunk = chunk.strip()  # .strip removes useless spaces
            if str(data_chunks[:4]).isnumeric():  # if first 4 characters are digits
                entry = None  # initialize an empty dictionary
            elif (
                str(data_chunks).isspace() and entry
            ):  # if we're on an empty line and the entry dict is not empty
                csv_rows.DictWriter(dialect="excel")  # turn csv_rows into needed output
                entry = {}
            else:
                # parse here?
                print(data_chunks)
    return csv_rows

readFile("/Users/mia/Desktop/School/programming/court.txt")
It is quite a lot of work to achieve that, but it is possible if you split it into a couple of sub-tasks.
First, your input looks like a plain text file, so you can parse it line by line -- see https://www.w3schools.com/python/ref_file_readlines.asp
Then, I noticed that your data can be split into pages. You will need to prepare quite a few regular expressions, but you can start with one that identifies where each page starts -- you may want to read this, as your expressions might get quite complicated: https://www.w3schools.com/python/python_regex.asp
The goal of this step is to collect all lines from a page in some container (a list, a dict, whatever you find suitable).
Afterwards, write some code that parses the information page by page. For simplicity I suggest starting with something easy, like the columns for "no, file number and defendant".
And once you can extract the data reliably, you can address the export part using pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
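Putting those steps together, here is a minimal sketch. The regular expressions are hypothetical placeholders (I can't see your exact layout), so tune them against the pastebin data; the last line does the export with pandas (which needs openpyxl installed for .xlsx files):
import re
import pandas as pd

# step 1: read the file line by line (path assumed)
with open('court.txt') as f:
    lines = f.readlines()

# step 2: group the lines into pages; the page-start pattern is a guess
pages, current = [], []
for line in lines:
    if re.match(r'\s*Page\s+\d+', line) and current:
        pages.append(current)
        current = []
    current.append(line.rstrip('\n'))
if current:
    pages.append(current)

# step 3: pull "no, file number, defendant" out of each page; also a guess
rows = []
for page in pages:
    for line in page:
        m = re.match(r'\s*(\d+)\s+(\S+)\s+(.+)', line)
        if m:
            rows.append({'No': m.group(1),
                         'File Number': m.group(2),
                         'Defendant': m.group(3)})

# step 4: export
pd.DataFrame(rows).to_excel('court.xlsx', index=False)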

Python: json -> text, how to only write unique values?

I have a json file, from which I'm extracting quotes. It's the file from Kaggle (formatted exactly the same way).
My goal is to extract all the quotes (just the quotes, not the authors or other metadata) into a simple text document. The first 5 lines would be:
# Don't cry because it's over, smile because it happened.
# I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.
# Be yourself; everyone else is already taken.
# Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.
# Be who you are and say what you feel, because those who mind don't matter, and those who matter don't mind.
The challenge is that some quotes repeat and I only want to write each quote once. What's a good way to only write down unique values into a text doc?
The best I came up with was this:
import json
with open('quotes.json', 'r') as json_f:
    data = json.load(json_f)

quote_list = []
with open('quotes.txt', 'w') as text_f:
    for quote_object in data:
        quote = quote_object['Quote']
        if quote not in quote_list:
            text_f.write(f'{quote}\n')
            quote_list.append(quote)
But it feels grossly inefficient to have to create and maintain a separate list with 40,000 values.
I tried reading the file on each iteration of the write function, but somehow read always comes back empty:
with open('quotes.json', 'r') as json_f:
    data = json.load(json_f)

with open('quotes.txt', 'w+') as text_f:
    for quote_object in data:
        quote = quote_object['Quote']
        print(text_f.read())  # prints nothing?
        # if it can't read the doc, I can't check if quote already there
        text_f.write(f'{quote}\n')
Would love to understand why text_f.read() comes back empty, and what's a more elegant solution.
You can use a set:
import json

with open('quotes.json', 'r') as json_f:
    data = json.load(json_f)

quotes = set()
with open('quotes.txt', 'w') as text_f:
    for quote_object in data:
        quote = quote_object['Quote']
        if quote not in quotes:  # membership test on a set is O(1)
            text_f.write(f'{quote}\n')
            quotes.add(quote)
Adding the same quote to the set multiple times has no effect: only a single copy is kept, and unlike your list, the membership test takes constant time no matter how many quotes you have seen. As for why text_f.read() came back empty: opening with 'w+' truncates the file, and each write leaves the file position at the end, so a subsequent read() finds nothing after that position.
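You can see the read() behaviour for yourself with a small demo:
with open('demo.txt', 'w+') as f:  # 'w+' truncates, then allows read and write
    f.write('hello\n')
    print(repr(f.read()))  # '' -- the position is at the end of the file
    f.seek(0)              # move the position back to the start
    print(repr(f.read()))  # 'hello\n'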

Python: Using str.split and getting list index out of range

I just started using Python and am trying to convert some of my R code into Python. The task is relatively simple: I have many csv files with variable names (in this case cell lines) and values (IC50s). I need to pull out all variables and their values shared in common among all files. Some of these files share the same variables but are formatted differently; for example, in some files a variable is just "Cell_line" and in others it is "MEL:Cell_line". So first things first: to make a direct string comparison I need to format them the same, and hence am trying to use str.split() to do so. There is probably a much better way to do this, but for now I am using the following code:
import csv
import os

# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name = "NCI60_Bleomycin.csv"
with open(file_name) as csvfile:
    NCI_data = csv.reader(csvfile, delimiter=',')
    alldata = {}
    for row in NCI_data:
        name_str = row[0]
        splt = name_str.split(':')
        n_name = splt[1]
        alldata[n_name] = row
name_str.split(':') returns a list of length 2. Since the portion I want is after the ":", I want the second element, which should be indexed as splt[1], as splt[0] is the first in Python. However, when I run the code I get the error message "IndexError: list index out of range".
I'm taking the second element out of a list of length 2, so I have no idea why it is out of range. Any help or suggestions would be appreciated.
I am pretty sure that there are some rows where name_str does not contain a ":". From your own example: if name_str is Cell_line, the split produces a single-element list, and splt[1] fails.
If you are sure there will be at most one ":" in name_str, or if, when there are several, you want the last piece, you should use splt[-1] instead of splt[1]. The -1 index takes the last element of the list (unless it's empty).
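For example, the -1 index gives the same result whether or not the name contains a colon:
>>> 'MEL:Cell_line'.split(':')[-1]
'Cell_line'
>>> 'Cell_line'.split(':')[-1]
'Cell_line'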
The simple answer is that sometimes the data doesn't follow the specification you assumed when writing the code (i.e. that there is a colon and two fields).
The easiest way to deal with this is to add an if block, if len(splt) == 2:, and do the subsequent lines within that block.
Optionally, add an else: and print the lines that don't match the spec, or save them somewhere so you can diagnose them.
Like this:
import csv
import os

# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name = "NCI60_Bleomycin.csv"
with open(file_name) as csvfile:
    NCI_data = csv.reader(csvfile, delimiter=',')
    alldata = {}
    for row in NCI_data:
        name_str = row[0]
        splt = name_str.split(':')
        if len(splt) == 2:
            n_name = splt[1]
            alldata[n_name] = row
        else:
            print("invalid name: " + name_str)
Alternatively, you can use try/except, which in this case is a bit more robust: one exception handler catches an IndexError anywhere, whether in row[0] or in splt[1], and we don't have to specify that the split result should have length 2.
In addition, we can explicitly check that there actually is a ":" before splitting, and assign the name appropriately.
import csv
import os

# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name = "NCI60_Bleomycin.csv"
with open(file_name) as csvfile:
    NCI_data = csv.reader(csvfile, delimiter=',')
    alldata = {}
    for row in NCI_data:
        try:
            name_str = row[0]
            if ':' in name_str:
                splt = name_str.split(':')
                n_name = splt[1]
            else:
                n_name = name_str
            alldata[n_name] = row
        except IndexError:
            print("bad row: " + str(row))

Python, make a list for a for loop when parsing a CSV

I'm working on parsing a CSV from an export of my company's database. This slimmed-down version has around 15 columns; the actual CSV has over 400 columns of data (all necessary). The below works perfectly:
import csv

inv = csv.reader(open('inventory_report.txt', 'rU'), dialect='excel', delimiter="\t")
for PART_CODE, MODEL_NUMBER, PRODUCT_NAME, COLOR, TOTAL_ONHAND, TOTAL_ON_ORDER, TOTAL_SALES, \
        SALES_YEAR_TO_DATE, SALES_LASTYEAR_TO_DATE, TOTAL_NUMBER_OF_QTYsSOLD, TOTAL_PURCHASES, \
        PURCHASES_YEAR_TO_DATE, PURCHASES_LASTYEAR_TO_DATE, TOTAL_NUMBER_OF_QTYpurchased, \
        DATE_LAST_SOLD, DATE_FIRST_SOLD in inv:
    print('%-20s %-90s OnHand: %-10s OnOrder: %-10s' % (MODEL_NUMBER, PRODUCT_NAME,
                                                        TOTAL_ONHAND, TOTAL_ON_ORDER))
As you can already tell, it will be very painful to read when the 'for' loop has 400+ names attached to it, one for each item of the row in the CSV. However annoying, it is very handy: I can access the output I'm after by name, easily grab specific items, and perform calculations using the field names we're already familiar with from our point-of-sale database.
I've been attempting to make this more readable: I'm trying to figure out a way to define a list of all these names for the for loop, but still be able to refer to them by name when it's time to do calculations and print the output.
Any thoughts?
You can use csv.DictReader; elements are read as dicts. This assumes the first line of the file holds the column names.
import csv

inv = csv.DictReader(open('file.csv'))
for i in inv:
    print('%-20s %-90s OnHand: %-10s OnOrder: %-10s' % (i['MODEL_NUMBER'], i['PRODUCT_NAME'],
                                                        i['TOTAL_ONHAND'], i['TOTAL_ON_ORDER']))
And if you want the i['MODEL_NUMBER'] to come from a list, define a list with all the column names, e.g. l = ['MODEL_NUMBER', 'PRODUCT_NAME', 'TOTAL_ONHAND', 'TOTAL_ON_ORDER']. Then the print statement in the above code becomes:
print('%-20s %-90s OnHand: %-10s OnOrder: %-10s' % (i[l[0]], i[l[1]], i[l[2]], i[l[3]]))
Code not checked.. :)
To make your code more readable and easier to reuse, you should read the names of the columns dynamically. CSV files usually have a header with this information at the top, so you can read the first line and store it in a tuple or a list.
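For instance, a minimal sketch along those lines, assuming the first row of the tab-delimited file from the question holds the column names:
import csv

with open('inventory_report.txt', 'rU') as f:
    inv = csv.reader(f, dialect='excel', delimiter='\t')
    header = next(inv)                   # first line: the column names
    for row in inv:
        record = dict(zip(header, row))  # the same mapping DictReader builds
        print('%-20s %-90s OnHand: %-10s OnOrder: %-10s' % (
            record['MODEL_NUMBER'], record['PRODUCT_NAME'],
            record['TOTAL_ONHAND'], record['TOTAL_ON_ORDER']))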

Trying to create a list of users in AD

So, I've created a script that searches AD for a list of users in a specific OU and outputs this to a text file, and I need to format that text file. The top OU I'm searching contains an OU for each location of this company, each of which contains the user accounts for that location.
Here's my script:
import active_directory
import sys

sys.stdout = open('output.txt', 'w')
users = active_directory.AD_object("LDAP://ou=%company%,dc=%domain%,dc=%name%")
for user in users.search(objectCategory='Person'):
    print(user)
sys.stdout.close()
Here's what my output looks like; there are 20-something lines of this, one for each different user:
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
So, what I want to do, to put this in plain English, is make it easier to read by showing just the username and the subset OU. So this:
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
Becomes THIS:
%username%, %location%
If there's any way to export this to a .csv or .xls with columns that can be sorted by location or alphabetical order, that would be GREAT. I had one hell of a time just figuring out the text file.
If you have a string like this
LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%
Then manipulating it is quite easy. If the format is standard and doesn't change, the fastest way is just to use str.split():
>>> splitted = "LDAP://CN=%username%,OU=%location%,OU=%company%,dc=%domain%,dc=%name%".split('=')
yields a list
>>> splitted
['LDAP://CN',
 '%username%,OU',
 '%location%,OU',
 '%company%,dc',
 '%domain%,dc',
 '%name%']
Now we can access the items of the list
>>> splitted[1]
'%username%,OU'
To get rid of the ",OU", we'll need to do another split.
>>> username = splitted[1].split(',OU')[0]
>>> username
'%username%'
CSV is just a text file, so all you have to do is change your file ending. Here's a full example.
output = open("output.csv", "w")
users = active_directory.AD_object("LDAP://ou=%company%,dc=%domain%,dc=%name%")
for user in users.search(objectCategory='Person'):
    # Because AD_object.search() returns other AD_objects,
    # we cannot split a user directly. We need the string representation
    # of the AD object, and thus have to wrap the user in str()
    splitteduser = str(user).split('=')
    username = splitteduser[1].split(',OU')[0]
    location = splitteduser[2].split(',OU')[0]
    # \n is a line ending.
    # %-formatting is the old way to format strings, but it looks simpler here.
    # The more modern way would be:
    # output.write("{0}, {1}\n".format(username, location))
    output.write("%s, %s\n" % (username, location))
output.close()
It's not the prettiest solution around, but it should be easy enough to understand.
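If you'd rather not assemble the commas yourself, the standard csv module handles separators and quoting for you; here is a sketch under the same assumptions as above:
import csv

with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['username', 'location'])  # header row, sortable in Excel
    for user in users.search(objectCategory='Person'):
        parts = str(user).split('=')
        writer.writerow([parts[1].split(',OU')[0],
                         parts[2].split(',OU')[0]])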
