Great CSV module for Python?

I'm automating a long task that involves vulnerabilities within a spreadsheet. However, I'm noticing that the "recommendation" field for these vulnerabilities is sometimes pretty long.
The CSV module for Python seems to be truncating some of this text when writing new rows. Is there any way to prevent this from happening? I simply see "NOTE: THIS FIELD WAS TRUNCATED" in places where the recommendation (which is a lot of text) would be.

The whole objective is to do this:
1. Import a master spreadsheet which has confirmation statuses and everything up-to-date.
2. Take a new spreadsheet containing vulnerabilities which doesn't have confirmation status/severity up-to-date.
3. Compare the second spreadsheet to the first. It'll update the severity levels from the second spreadsheet, and then write to a new file.
4. The newly created csv file can be copied and pasted into the master spreadsheet. All vulnerabilities which match the first spreadsheet now have the same severity level/confirmation status.
What I'm noticing, though (even in Ruby, for some reason), is that some of these vulnerabilities have very long recommendation text, and it gets truncated when the CSV file is created. Here's a sample piece of code that I've quickly written for demonstration:
#!/usr/bin/python
from sys import argv
import getopt, csv

master_vulns = {}
criticality = {}

############################ Extracting unique vulnerabilities from master file
contents = csv.reader(open(argv[1], 'rb'), delimiter=',')
for row in contents:
    if "Confirmation_Status" in row:  # skip the header row
        continue
    try:
        if row[7] in master_vulns:
            continue
        master_vulns[row[7]] = row[3]
        criticality[row[7]] = row[2]
    except Exception:
        pass

############################ Updating confirmation status of newly created file
new_contents = csv.reader(open(argv[2], 'rb'), delimiter=',')  # the new spreadsheet to update
new_data = []
results = open('results.csv', 'wb')
writer = csv.writer(results, delimiter=',')
for nrow in new_contents:
    if "Confirmation_Status" in nrow:  # skip the header row
        continue
    try:
        if nrow[1] == "DELETE":
            continue
        vuln_name = nrow[7]
        vuln_criticality = criticality[vuln_name]
        vuln_status = master_vulns[vuln_name]
        nrow[3] = vuln_status
        nrow[2] = vuln_criticality
        writer.writerow(nrow)
    except Exception:
        # unknown vulnerability: write the row through unchanged
        writer.writerow(nrow)

results.close()
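One hedged aside on the truncation itself: Python's csv.writer never shortens field values, and csv.reader raises an error rather than silently truncating when a field exceeds its size limit, so the "NOTE: THIS FIELD WAS TRUNCATED" text most likely comes from whatever tool exported the source spreadsheet. If the reader-side cap is a concern for very long recommendation fields, it can be raised; a minimal sketch (the file name is made up):

import csv

# Assumption: the concern is the reader's per-field cap (default 131072 bytes).
# csv.reader errors out, rather than truncating, when a field exceeds it.
csv.field_size_limit(10 * 1024 * 1024)  # raise the cap to ~10 MB per field

with open('master.csv', 'rb') as fh:    # hypothetical file name
    for row in csv.reader(fh, delimiter=','):
        pass                            # long recommendation fields read through intact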

Related

How do I download an xlsm file and read every sheet in python?

Right now I am doing the following.
import xlrd
import requests

resp = requests.get(url, auth=auth).content
output = open(r'temp.xlsx', 'wb')
output.write(resp)
output.close()

xl = xlrd.open_workbook(r'temp.xlsx')
sh = 1
xls = []  # collected sheet names
try:
    for sheet in xl.sheets():
        xls.append(sheet.name)
except:
    xls = ['']
It's extracting the sheets but I don't know how to read the file or if saving the file as an .xlsx is actually working for macros. All I know is that the code is not working right now and I need to be able to catch the data that is being generated in a macro. Please help! Thanks.
I highly recommend using xlwings if you want to open, modify, and save .xlsm files without corrupting them. I have tried a ton of different methods (using other modules like openpyxl) and the macros always end up being corrupted.
import xlwings as xw
app = xw.App(visible=False) # IF YOU WANT EXCEL TO RUN IN BACKGROUND
xlwb = xw.Book('PATH\\TO\\FILE.xlsm')
xlws = {}
xlws['ws1'] = xlwb.sheets['Your Worksheet']
print(xlws['ws1'].range('B1').value) # get value
xlws['ws1'].range('B1').value = 'New Value' # change value
yourMacro = xlwb.macro('YourExcelMacro')
yourMacro()
xlwb.save()
xlwb.close()
Edit - I added an option to keep Excel invisible at a user's request
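For the "how do I read the file" part of the question, the same xlwings handle can also pull a whole sheet into plain Python lists; a small sketch along the lines of the answer above (the sheet name is a placeholder):

import xlwings as xw

app = xw.App(visible=False)            # keep Excel in the background
xlwb = xw.Book('PATH\\TO\\FILE.xlsm')

# Read everything contiguous with A1 on the sheet into a list of lists.
data = xlwb.sheets['Your Worksheet'].range('A1').expand().value

for row in data:
    print(row)

xlwb.close()
app.quit()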

How to delete a specific cell in a csv file when the contents are given in an input in python

elif addordelete == "delete":
    whichdelete = input("What thing do you want to delete? ")
GameCharacter.csv
I want to know how to delete a specific cell in a csv file through a Python input.
For example, if the user of the python program says that they want to delete MP40 from the file, then it should be deleted. Can someone explain how to do this in as simple terms as possible? (I'm kind of a Python noob.) Code is appreciated.
You'll have to import the whole CSV into Python, process it, and save it back to a file.
Here's a simple snippet which opens the CSV, processes it by asking what you want to delete, then saves it back again.
You can start from this code to get what you need.
import csv

try:
    csvfile = open('testcsv.csv', 'rb')
    table = [row for row in csv.reader(csvfile)]  # further parameters such as delimiters on doc
    whichdelete = input("What thing do you want to delete? ")
    header = table.pop(0)  # remove header and save it to header variable
    res = [row for row in table if whichdelete not in row]  # save result without the row you want to delete
    csvfile.close()
except IOError:
    print "File not found"
    quit()

try:
    resultFile = open('resultCSV.csv', 'wb+')
    writer = csv.writer(resultFile)
    writer.writerow(header)
    for row in res:
        writer.writerow(row)
    resultFile.close()
except IOError:
    print "Error creating result file"
    quit()
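If the goal is to blank just the matching cell rather than drop the whole row, a small variation on the same read-modify-write pattern could look like the sketch below (file names follow the snippet above; this is an illustration, not the only way to do it):

import csv

# Read the whole table into memory (same approach as above).
with open('testcsv.csv', 'rb') as csvfile:
    table = [row for row in csv.reader(csvfile)]

whichdelete = raw_input("What thing do you want to delete? ")  # use input() on Python 3

# Blank the matching cell(s) instead of removing the whole row.
for row in table:
    for i, cell in enumerate(row):
        if cell == whichdelete:
            row[i] = ''

with open('resultCSV.csv', 'wb') as resultFile:
    csv.writer(resultFile).writerows(table)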

Many-record upload to postgres

I have a series of .csv files with some data, and I want a Python script to open them all, do some preprocessing, and upload the processed data to my postgres database.
I have it mostly complete, but my upload step isn't working. I'm sure it's something simple that I'm missing, but I just can't find it. I'd appreciate any help you can provide.
Here's the code:
import psycopg2
import sys
from os import listdir
from os.path import isfile, join
import csv
import re
import io

try:
    con = db_connect("dbname = '[redacted]' user = '[redacted]' password = '[redacted]' host = '[redacted]'")
except:
    print("Can't connect to database.")
    sys.exit(1)

cur = con.cursor()
upload_file = io.StringIO()
file_list = [f for f in listdir(mypath) if isfile(join(mypath, f))]

for file in file_list:
    id_match = re.search(r'.*-(\d+)\.csv', file)
    if id_match:
        id = id_match.group(1)
        file_name = format(id_match.group())
        with open(mypath + file_name) as fh:
            id_reader = csv.reader(fh)
            next(id_reader, None)  # Skip the header row
            for row in id_reader:
                [stuff goes here to get desired values from file]
                if upload_file.getvalue() != '': upload_file.write('\n')
                upload_file.write('{0}\t{1}\t{2}'.format(id, [val1], [val2]))

print(upload_file.getvalue())  # prints output that looks like I expect it to
                               # with thousands of rows that seem to have the right values in the right fields
cur.copy_from(upload_file, '[my_table]', sep='\t', columns=('id', 'col_1', 'col_2'))
con.commit()

if con:
    con.close()
This runs without error, but a select query in psql still shows no records in the table. What am I missing?
Edit: I ended up giving up and writing it to a temporary file, and then uploading the file. This worked without any trouble...I'd obviously rather not have the temporary file though, so I'm happy to have suggestions if someone sees the problem.
When you write to an io.StringIO (or any other file) object, the file pointer remains at the position of the last character written. So, when you do
f = io.StringIO()
f.write('1\t2\t3\n')
s = f.readline()
the file pointer stays at the end of the file and s contains an empty string.
To read (not getvalue) the contents, you must reposition the file pointer to the beginning, e.g. use seek(0)
upload_file.seek(0)
cur.copy_from(upload_file, '[my_table]', columns = ('id', 'col_1', 'col_2'))
This allows copy_from to read from the beginning and import all the lines in your upload_file.
Don't forget that you read and keep all the files in memory, which might work for a single small import, but may become a problem when doing large imports or multiple imports in parallel.
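If memory does become an issue, the temporary-file route the asker mentions in the edit works with copy_from as well, and the same rewind rule applies. A minimal sketch, assuming the cur/con objects, the tab-separated three-column layout, and the table name from the question (the written rows are placeholders):

import tempfile

# Build the rows in a scratch file on disk instead of an in-memory buffer.
with tempfile.TemporaryFile(mode='w+') as tmp:
    tmp.write('1\tval1\tval2\n')   # placeholder rows; real values come from the CSV parsing loop
    tmp.write('2\tval1\tval2\n')

    tmp.seek(0)                    # rewind before copy_from reads it, just like with StringIO
    cur.copy_from(tmp, '[my_table]', sep='\t', columns=('id', 'col_1', 'col_2'))
    con.commit()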

Read data from api and populate .csv bug

I am trying to write a script (Python 2.7.11, Windows 10) to collect data from an API and append it to a csv file.
The API I want to use returns data in json.
It limits the number of displayed records per query, though, and pages them.
So there is a maximum number of records you can get with a single query, and then you have to run another query, changing the page number.
The API informs you about the number of pages a dataset is divided into.
Let's assume that the maximum number of records per page is 100 and the number of pages is 2.
My script:
import json
import urllib2
import csv

url = "https://some_api_address?page="
limit = "&limit=100"
myfile = open('C:\Python27\myscripts\somefile.csv', 'ab')

def api_iterate():
    for i in xrange(1, 2, 1):
        parse_url = url,(i),limit
        json_page = urllib2.urlopen(parse_url)
        data = json.load(json_page)
        for item in data['someobject']:
            print item ['some_item1'], ['some_item2'], ['some_item3']
        f = csv.writer(myfile)
        for row in data:
            f.writerow([str(row)])
This does not seem to work, i.e. it creates a csv file, but the file is not populated. There is obviously something wrong with either the part of the script which builds the address for the query OR the part dealing with reading json OR the part dealing with writing query to csv. Or all of them.
I have tried using other resources and tutorials, but at some point I got stuck and I would appreciate your assistance.
The url you have given provides a link to the next page as one of the objects. You can use this to iterate automatically over all of the pages.
The script below gets each page, extracts two of the entries from the Dataobject array and writes them to an output.csv file:
import json
import urllib2
import csv

def api_iterate(myfile):
    url = "https://api-v3.mojepanstwo.pl/dane/krs_osoby"
    csv_myfile = csv.writer(myfile)
    cols = ['id', 'url']
    csv_myfile.writerow(cols)  # Write a header
    while True:
        print url
        json_page = urllib2.urlopen(url)
        data = json.load(json_page)
        json_page.close()
        for data_object in data['Dataobject']:
            csv_myfile.writerow([data_object[col] for col in cols])
        try:
            url = data['Links']['next']  # Get the next url
        except KeyError as e:
            break

with open(r'e:\python temp\output.csv', 'wb') as myfile:
    api_iterate(myfile)
This will give you an output file looking something like:
id,url
1347854,https://api-v3.mojepanstwo.pl/dane/krs_osoby/1347854
1296239,https://api-v3.mojepanstwo.pl/dane/krs_osoby/1296239
705217,https://api-v3.mojepanstwo.pl/dane/krs_osoby/705217
802970,https://api-v3.mojepanstwo.pl/dane/krs_osoby/802970

DictReader get a value when a column # and row # are known

I am given a csv file that looks something like this
ID, name, age, city
1, Andy, 25, Ann Arbor
2, Bella, 40, Los Angeles
3, Cathy, 13, Eureka
...
...
Say I want to get the city of ID=3, which would be Eureka in this example. Is there a way to do this efficiently instead of iterating over each row? My PHP code will be executing this Python script each time to get the value, and it feels very inefficient to loop through the csv file every time.
iterate over the file once and save the data into a dictionary:
import csv

data = {}
with open('input.csv') as fin:
    reader = csv.DictReader(fin)
    for record in reader:
        data[record['ID']] = {k: v for k, v in record.items() if k != 'ID'}
then just access the required key in the dictionary:
print data['3']['city'] # Eureka (DictReader keys and values are strings)
in case you want to persist the data in the key:value format you can save it as a json file:
import json
import csv

j = {}
with open('input.csv') as fin:
    reader = csv.DictReader(fin)
    for record in reader:
        j[record['ID']] = {k: v for k, v in record.items() if k != 'ID'}

with open('output.json', 'w') as fout:
    json.dump(j, fout)
In a word: no.
As yurib mentioned, one method is to convert your files to JSON and go from there, or just dump them to a dict. This gives you the ability to do things like pickle if you need to serialize your dataset, or shelve if you want to stash it someplace for later use.
Another option is to dump your CSV into a queryable database using something like Python's built-in sqlite3 support. It depends on where you want your overhead to lie: pre-processing your data in this manner saves you from having to parse a large file every time your script runs.
Check out this answer for a quick rundown.
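A minimal sketch of the sqlite3 route, assuming the four-column file from the question and a one-off load step (the database file and table name are made up for illustration):

import csv
import sqlite3

# One-off preprocessing step: load the CSV into a small queryable database.
conn = sqlite3.connect('people.db')                     # hypothetical database file
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS people '
            '(id TEXT PRIMARY KEY, name TEXT, age TEXT, city TEXT)')

with open('input.csv') as fin:
    reader = csv.reader(fin)
    next(reader)                                        # skip the header row
    cur.executemany('INSERT OR REPLACE INTO people VALUES (?, ?, ?, ?)',
                    ([c.strip() for c in row] for row in reader))
conn.commit()

# Each later lookup is a single indexed query instead of a full file scan.
cur.execute('SELECT city FROM people WHERE id = ?', ('3',))
print(cur.fetchone()[0])                                # Eureka
conn.close()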
Say I want to get the city of ID=3, which would be Eureka in this example. Is there a way to do this efficiently instead of iterating over each row? My PHP code will be executing this Python script each time to get the value, and it feels very inefficient to loop through the csv file every time.
Your ideal solution is to wrap this Python code into an API that you can call from your PHP code.
On startup, the Python code would load the file into a data structure, and then wait for your request.
If the file is very big, your Python script would load it into a database and read from there.
You can then choose to return either a string, or a json object.
Here is a sample, using Flask:
import csv
from flask import Flask, request, abort
with open('somefile.txt') as f:
reader = csv.DictReader(f, delimiter=',')
rows = list(reader)
keys = row[0].keys()
app = Flask(__name__)
#app.route('/<id>')
#app.route('/')
def get_item():
if request.args.get('key') not in keys:
abort(400) # this is an invalid request
key = request.args.get('key')
try:
result = next(i for i in rows if i['id'] == id)
except StopIteration:
# ID passed doesn't exist
abort(400)
return result[key]
if __name__ == '__main__':
app.run()
You would call it like this:
http://localhost:5000/3?key=city
