API endpoint for data from Microsoft Exchange Online Protection?

API endpoint for data from Microsoft Exchange Online Protection? - python

I am working on a project where I have been using Python to make API calls to our organization's various technologies to get data, which I then push to Power BI to track metrics over time relating to IT Security.
My boss wants to see info added from Exchange Online Protection such as malware detected in emails, spam blocks etc., essentially replicating some of the email and collaboration reports you'd see in M365 defender > reports > email and collaboration (security.microsoft.com/emailandcollabreport).
I have tried the Defender API and MS Graph API, read through a ton of documentation, and can't seem to find anywhere to pull this info from. Has anyone done something similar, or know where this data can be pulled from?
Thanks in advance.

You can try using the Microsoft Graph Security API using which you can get the alerts, information protection, secure score using that. Also you can refer the alerts section in the documentation which talks about the list of supported providers at this point using the Microsoft Graph security api.

In case anyone else runs into this, this is the solution I ended up using (hacky as it may be);
The only way to extract the pertinent info seems to be through PowerShell, you need the modules ExchangeOnlineManagement and PSWSMan so those will need to be installed.
You need to add an app to your Azure instance with global reader role minimum (or something custom) and generate and upload self-signed certificates to the app.
I then ran the following lines as a ps1 script:
Connect-ExchangeOnline -CertificateFilePath "<PATH>" -AppID "<APPID>" -Organization "<ORG>.onmicrosoft.com" -CertificatePassword (ConvertTo-SecureString -String '<PASSWORD>' -AsPlainText -Force)
$dte = (Get-Date).AddDays(-30)
Get-MailflowStatusReport -StartDate $dte -EndDate (Get-Date)
Disconnect-ExchangeOnline
I used python to call the powershell script, then extract the info I needed from the output and push it to PowerBI.
I'm sure there is a more secure and efficient way to do this but I was able to accomplish the task this way.

Related

How to integrate BIRT with Python Django Project by using Py4j

Hi is there anyone who is help me to Integrate BIRT report with Django Projects? or any suggestion for connect third party reporting tools with Django like Crystal or Crystal Clear Report.

Some of the 3rd-party Crystal Reports viewers listed here provide a full command line API, so your python code can preview/export/print reports via subprocess.call()
The resulting process can span anything between an interactive Crystal Report viewer session (user can login, set/change parameters, print, export) and an automated (no user interaction) report printing/exporting.
While this would simplify your code, it would restrict deployment to Windows.

For prototyping, or if you don't mind performance, you can call from BIRT from the command line.
For example, download the POJO runtime and use the script genReport.bat (IIRC) to generate a report to a file (eg. PDF format). You can specify the output options and the report parameters on the command line.
However, the BIRT startup is heavy overhead (several seconds).
For achieving reasonable performance, it is much better to perform this only once.
To achieve this goal, there are at least two possible ways:
You can use the BIRT viewer servlet (which is included as a WAR file with the POJO runtime). So you start the servlet with a web server, then you use HTTP requests to generate reports.
This looks technically old-fashioned (eg. no JSON Requests), but it should work. However, I never used this approach.
The other option is to write your own BIRT server.
In our product, we followed this approach.
You can take the viewer servlet as a template for seeing how this could work.
The basic idea is:
You start one (or possibly more than one) Java process.
The Java process initializes the BIRT runtime (this is what takes some seconds).
After that, the Java process listens for requests somehow (we used a plain socket listener, but of course you could use HTTP or some REST server framework as well).
A request would contain the following information:
which module to run
which output format
report parameters (specific to the module)
possibly other data/metadata, e.g. for authentication
This would create a RunAndRenderTask or separate RunTask and RenderTasks.
Depending on your reports, you might consider returning the resulting output (e.g. PDF) directly as a response, or using an asynchronous approach.
Note that BIRT will happily create several reports at the same time - multi-threading is no problem (except for the initialization), given enough RAM.
Be warned, however, that you will need at least a few days to build a POC for this "create your own server" approach, and probably some weeks for prodction quality.
So if you just want to build something fast to see if the right tool for you, you should start with the command line approach, then the servlet approach and only then, and only if you find that the servlet approach is not quite good enough, you should go the "create your own server" way.
It's a pity that currently there doesn't seem to exist an open-source, production-quality, modern BIRT REST service.
That would make a really good contribution to the BIRT open-source project... (https://github.com/eclipse/birt)

Use of ELK with Python

The project that I am working on is a bit confidential, but I will try to explain my issues and be as clear as possible because I need your opinion.
Project:
They asked me to set up a local ELK environment , and to use Python scripts to communicate with this stack (ELK), to store data, retrieve it, analyse it and visualise it thanks to Kibana, and finally there is a decision making based on that data(AI). So as you can see, it is a Data Engineering project with some AI for the decision making process. The issues that I am facing are:
I don't know how to use Python to communicate with the stack, I didn't find resources about it
Since the data is confidential, how can I assure a high security?
How many instances to use?
I am lost because I am new to ELK and my team is not Dev oriented
I am new to ELK, so please any advice would be really helpful!

I don't know how to use Python to communicate with the stack, I didn't
find resources about it
For learning how to interact with your stack use the python library:
You can install using pip3 install elasticsearch and the following links contain a wealth of tutorials on almost anything you would need to be doing.
https://kb.objectrocket.com/category/elasticsearch?filter=python
Suggest you start with these two:
https://kb.objectrocket.com/elasticsearch/how-to-parse-lines-in-a-text-file-and-index-as-elasticsearch-documents-using-python-641
https://kb.objectrocket.com/elasticsearch/how-to-query-elasticsearch-documents-in-python-268
Since the data is confidential, how can I assure a high security?
You can mask the data or restrict index access.
https://www.elastic.co/guide/en/elasticsearch/reference/current/authorization.html
https://nl.devoteam.com/expert-view/field-level-security-and-data-masking-in-elasticsearch/
How many instances to use?
I am lost because I am new to ELK and my team is not Dev oriented
I suggest you start with 1 Elasticsearch node, if you're on AWS use a t3a.large or equivalent and run Elasticsearch, Kibana and Logstash all on the same machine.
For setting it up: https://www.elastic.co/guide/en/elastic-stack-get-started/current/get-started-stack-docker.html#run-docker-secure

If you want to use phyton as your integration tools to Elasticsearch you can use elasticsearch phyton client.
The other options you can use python to create the result and save it in log file or insert to database than Logstash will get your data.
For the security ELK have good security from API authorization user authentication to cluster security. you can see in here Secure the Elastic Stack
I just use 1 instance, but feel free if you think you will need to separate between Kibana and Elasticsearch and Logstash (if you use it) or you can use docker to separate it.
Based on my experience, if you are going to load a lot of data in a short time it will be wise If you separate it so the processes don't interfere with each other.

Schedule web scraping jobs on Azure and store results on ADLS

I have a python job which uses beautiful soup to scrape data from the web.I have tried executing the script using U-SQL, however I keep receiving a generic error message :
An unhandled exception from user code has been reported
I haven't explored the error too much as I am not sure if it is possible to scrape the web through U-SQL.
Is this possible using U-SQL, and if not which Azure resource can i use to schedule this script and store the results on Azure data lake store?

Also, it normally would be helpful if you provided the complete error code and exactly how you want to scrape the web.
I make the random assumption right now that you wrote some code that accessed web pages and tried to run it from within U-SQL. If that is correct, you will get blocked by that the U-SQL container blocks all external network access. For more details why that is done, see the previous answer here.

Hi I'm a PM from the Azure Data Lake team and I'd love to help out with this. I just need some clarification first about what you're trying to do. Could you reach out to me at mabasile(at)microsoft.com with the job ID of the failed job? (Any sensitive information can of course be scrubbed out). That'll be the best way to figure out exactly what you're trying to do and if it's possible on ADL.
Thanks, and I hope to hear from you soon!
Matt Basile
Azure Data Lake Analytics
Update: Confirming Michael Rys's answer - you cannot call external services through U-SQL, because if ADLA scales out to hundreds of vertices and each vertex makes a separate call, you could end up DDOSing the service, so ADLA blocks external calls.

Workaround for Python & Selenium: authenticate against Active Directory

I am using Python (2.7) and Selenium (3.4.3) to drive Firefox (52.2.0 ESR) via geckodriver (0.19.0) to automate a process on a CentOS 7 machine.
I need totally unattended operation of this automation with user credentials passed through; no storage allowed and no breaking in.
One piece of drama is being caused by the fact that the internal website required for the process is within an Active Directory domain while the machine running my automation is not. I have no need to validate the user, only pass the credentials to the website in such a way as to not require human interaction or for the person to be a local user on the machine.
I have tried various permutations of:
[protocol]://[user,pass]#[url]
driver.switch_to_alert() + send_keys
It seems some of those only work on IE, something I have no access to.
I have checked for libraries to handle this and all to no avail.
I can add libraries to python and I have sudo access to the machine - can't touch authentication, so AD integration is not possible.
How can I give this AD website the credentials of an arbitrary user such that no local storage of their credentials happens an no user interaction is required?
Thank you
EDIT
I think something like a proxy which could authenticate the user then retain that authentication for selenium to do its thing ...
Is there a simple LDAP/AD proxy available?
EDIT 2
Perhaps a very simple way of stating this is that I want to pass user credentials and prevent the authentication popup from happening.

Solution Found:
I needed to use a browser extension.
My solution has been built for chromium but it should port almost-unchanged for Firefox and maybe edge.
First up, you need 2 APIs to be available for your browser:
webRequest.onAuthRequired - Chrome & Firefox
runtime.nativeMessaging - Chrome & Firefox
While both browser APIs are very similar, they do have some significant differences - such as Chrome's implementation lacking Promises.
If you setup your Native Messaging Host to send a properly-formed JSON string, you need only poll it once. This means you can use a single call to runtime.sendNativeMessage() and be assured that your credentials are paresable. Pun intended.
Next, we need to look at how we're supposed to handle the webRequest.onAuthRequired event.
Since I'm working in Chromium, I need to use the promise-less Chrome API.
chrome.webRequest.onAuthRequired.addListener(
callbackFunctionHere,
{urls:[targetUrls]},
['asyncBlocking'] // --> this line is important, too. Very.
The Change:
I'll be calling my function provideCredentials because I'm a big stealy-stealer and used an example from this source. Look for the asynchronous version.
The example code fetches the credentials from storage.local ...
chrome.storage.local.get(null, gotCredentials);
We don't want that. Nope.
We want to get the credentials from a single call to sendNativeMessage so we'll change that one line.
chrome.runtime.sendNativeMessage(hostName, { text: "Ready" }, gotCredentials);
That's all it takes. Seriously. As long as your Host plays nice, this is the big secret. I won't even tell you how long it took me to find it!
Links:
My questions with helpful links:
Here - Workaround for Authenticating against Active Directory
Here - Also has some working code for a functional NM Host
Here - Some enlightening material on promises
So this turns out to be a non-trivial problem.
I haven't implemented the solution, yet, but I know how to get there...
Passing values to an extension is the first step - this can be done in both Chrome and Firefox. Watch the version to make sure the API required, nativeMessaging, actually exists in your version. I have had to switch to chromium for this reason.
Alternatively, one can use the storage API to put values in browser storage first. [edit: I did not go this way for security concerns]
Next is to use the onAuthRequired event from the webRequest API . Setup a listener on the event and pass in the values you need.
Caveats: I have built everything right up to the extension itself for the nativeMessaging API solution and there's still a problem with getting the script to recognise the data. This is almost certainly my JavaScript skills clashing with the arcane knowledge required to make these APIs make much sense ...
I have yet to attempt the storage method as it's less secure (in my mind) but it does seem to be simpler.

How do I access Production Datastore from my local development server?

I have a existing Website deployed in Google App Engine for Python. Now I have setup the local development server in my System. But I don't know how to get the updated DataBase from live server. There is no Export option in Google's developer console.
And, I don't want to read the data for each request from Production Datastore, I want to set it up locally for once. The google manual says that it stores the local datastore in sqlite file.
Any hint would be appreciated.

First, make sure your app.yaml enables the "remote" built-in, with a stanza such as:
builtins:
- remote_api: on
This app.yaml of course must be the one deployed to your appspot.com (or whatever) "production" GAE app.
Then, it's a job for /usr/local/google_appengine/bulkloader.py or wherever you may have installed the bulkloader component. Run it with -h to get a list of the many, many options you can pass.
You may need to generate an application-specific password for this use on your google accounts page. Then, the general use will be something like:
/usr/local/google_appengine/bulkloader.py --dump --url=http://your_app.appspot.com/_ah/remote_api --filename=allkinds.sq3
You may not (yet) be able to use this "all kinds" query -- the server only generates the needed statistics for the all-kinds query "periodically", so you may get an error message including info such as:
[ERROR ] Unable to download kind stats for all-kinds download.
[ERROR ] Kind stats are generated periodically by the appserver
[ERROR ] Kind stats are not available on dev_appserver.
If that's the case, then you can still get things "one kind at a time" by adding the option --kind=EntityKind and running the bulkloader repeatedly (with separate sqlite3 result files) for each kind of entity.
Once you've dumped (kind by kind if you have to, all at once if you can) the production datastore, you can use the bulkloader again, this time with --restore and addressing your localhost dev_appserver instance, to rebuild the latter's datastore.
It should be possible to explicitly list kinds in the --kind flag (by separating them with commas and putting them all in parentheses) but unfortunately I think I've found a bug stopping that from working -- I'll try to get it fixed but don't hold your breath. In any case, this feature is not documented (I just found it by studying the open-source release of bulkloader.py) so it may be best not to rely on it!-)
More info about the then-new bulkloader can be found in a blog post by Nick Johnson at http://blog.notdot.net/2010/04/Using-the-new-bulkloader (though it doesn't cover newer functionalities such as the sqlite3 format of results in the "zero configuration" approach I outlined above). There's also a demo, with plenty of links, at http://bulkloadersample.appspot.com/ (also a bit outdated, alas).

Check out the remote API. This will tunnel your database calls over HTTP to the production database.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.