Access Azure blob storage from within an Azure ML experiment - python

Azure ML Experiments provide ways to read and write CSV files to Azure blob storage through the Reader and Writer modules. However, I need to write a JSON file to blob storage. Since there is no module to do so, I'm trying to do so from within an Execute Python Script module.
# Import the necessary items
from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key = 'mykeyhere=='
    json_string = '{jsonstring here}'
    blob_service = BlobService(account_name, account_key)
    blob_service.put_block_blob_from_text("upload", "out.json", json_string)
    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,
However, this results in an error: ImportError: No module named azure.storage.blob
This implies that the azure-storage Python package is not installed on Azure ML.
How can I write to Azure blob storage from inside an Azure ML Experiment?
Here's the full error message:
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,Caught exception while executing function: Traceback (most recent call last):
  File "C:\server\invokepy.py", line 162, in batch
    mod = import_module(moduleName)
  File "C:\pyhome\lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
  File "C:\temp\azuremod.py", line 19, in <module>
    from azure.storage.blob import BlobService
ImportError: No module named azure.storage.blob
---------- End of error message from Python interpreter ----------
Start time: UTC 02/06/2016 17:59:47
End time: UTC 02/06/2016 18:00:00
Thanks, everyone!
UPDATE: Thanks to Dan and Peter for the ideas below. This is the progress I've made using those recommendations. I created a clean Python 2.7 virtual environment (in VS 2005) and did a pip install azure-storage to get the dependencies into my site-packages directory. I then zipped the site-packages folder and uploaded it as the Zip file, as per Dan's note below. I then included the reference to the site-packages directory and successfully imported the required items. However, this resulted in a timeout error when writing to blob storage.
Here is my code:
# Get access to the uploaded Python packages
import sys
packages = ".\Script Bundle\site-packages"
sys.path.append(packages)

# Import the necessary items from packages referenced above
from azure.storage.blob import BlobService
from azure.storage.queue import QueueService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key = 'p8kSy3F...elided...3plQ=='
    blob_service = BlobService(account_name, account_key)
    blob_service.put_block_blob_from_text("upload", "out.txt", "Test to write")
    # All of the following also fail
    #blob_service.create_container('images')
    #blob_service.put_blob("upload", "testme.txt", "foo", "BlockBlob")
    #queue_service = QueueService(account_name, account_key)
    #queue_service.create_queue('taskqueue')
    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,
And here is the new error log:
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,C:\pyhome\lib\site-packages\requests\packages\urllib3\util\ssl_.py:79: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Caught exception while executing function: Traceback (most recent call last):
  File "C:\server\invokepy.py", line 169, in batch
    odfs = mod.azureml_main(*idfs)
  File "C:\temp\azuremod.py", line 44, in azureml_main
    blob_service.put_blob("upload","testme.txt","foo","BlockBlob")
  File ".\Script Bundle\site-packages\azure\storage\blob\blobservice.py", line 883, in put_blob
    self._perform_request(request)
  File ".\Script Bundle\site-packages\azure\storage\storageclient.py", line 171, in _perform_request
    resp = self._filter(request)
  File ".\Script Bundle\site-packages\azure\storage\storageclient.py", line 160, in _perform_request_worker
    return self._httpclient.perform_request(request)
  File ".\Script Bundle\site-packages\azure\storage\_http\httpclient.py", line 181, in perform_request
    self.send_request_body(connection, request.body)
  File ".\Script Bundle\site-packages\azure\storage\_http\httpclient.py", line 143, in send_request_body
    connection.send(request_body)
  File ".\Script Bundle\site-packages\azure\storage\_http\requestsclient.py", line 81, in send
    self.response = self.session.request(self.method, self.uri, data=request_body, headers=self.headers, timeout=self.timeout)
  File "C:\pyhome\lib\site-packages\requests\sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\pyhome\lib\site-packages\requests\sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "C:\pyhome\lib\site-packages\requests\adapters.py", line 431, in send
    raise SSLError(e, request=request)
SSLError: The write operation timed out
---------- End of error message from Python interpreter ----------
Start time: UTC 02/10/2016 15:33:00
End time: UTC 02/10/2016 15:34:18
My current exploration suggests that azure-storage depends on the requests Python package, and requests has a known bug in Python 2.7 when calling newer SSL protocols. I'm not sure yet, but I'm digging around in that area now.
UPDATE 2: This code runs perfectly fine inside of a Python 3 Jupyter notebook. Additionally, if I make the Blob Container open to public access, I can directly READ from the Container through a URL. For instance: df = pd.read_csv("https://mystorageaccount.blob.core.windows.net/upload/test.csv") easily loads the file from blob storage. However, I cannot use the azure.storage.blob.BlobService to read from the same file.
UPDATE 3: Dan, in a comment below, suggested I try the Jupyter notebooks hosted on Azure ML. I had been running it from a local Jupyter notebook (see update 2 above). However, it fails when run from an Azure ML Notebook, and the errors point to the requests package again. I'll need to find the known issues with that package, but from my reading, the known issue is with urllib3 and only impacts Python 2.7, NOT any Python 3.x versions. And this was run in a Python 3.x notebook. Grrr.
UPDATE 4: As Dan notes below, this may be an issue with Azure ML networking, as Execute Python Script is relatively new and just got networking support. However, I have also tested this on an Azure App Service webjob, which is on an entirely different Azure platform. (It is also on an entirely different Python distribution and supports both Python 2.7 and 3.4/5, but only at 32 bit - even on 64 bit machines.) The code there also fails, with an InsecurePlatformWarning message.
[02/08/2016 15:53:54 > b40783: SYS INFO] Run script 'ListenToQueue.py' with script host - 'PythonScriptHost'
[02/08/2016 15:53:54 > b40783: SYS INFO] Status changed to Running
[02/08/2016 15:54:09 > b40783: INFO] test.csv
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
[02/08/2016 15:54:09 > b40783: ERR ] SNIMissingWarning
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
[02/08/2016 15:54:09 > b40783: ERR ] InsecurePlatformWarning
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
[02/08/2016 15:54:09 > b40783: ERR ] InsecurePlatformWarning

Bottom Line Up Front: Use HTTP instead of HTTPS for accessing Azure storage.
When declaring BlobService pass in protocol='http' to force the service to communicate over HTTP. Note that you must have your container configured to allow requests over HTTP (which it does by default).
client = BlobService(STORAGE_ACCOUNT, STORAGE_KEY, protocol="http")
History and credit:
I posted a query on this topic to #AzureHelps and they opened a ticket on the MSDN forums: https://social.msdn.microsoft.com/Forums/azure/en-US/46166b22-47ae-4808-ab87-402388dd7a5c/trouble-writing-blob-storage-file-in-azure-ml-experiment?forum=MachineLearning&prof=required
Sudarshan Raghunathan replied with the magic. Here are the steps to make it easy for everyone to duplicate my fix:
Download azure.zip which provides the required libraries: https://azuremlpackagesupport.blob.core.windows.net/python/azure.zip
Upload them as a DataSet to the Azure ML Studio
Connect them to the Zip input on an Execute Python Script module
Write your script as you would normally, being sure to create your BlobService object with protocol='http'
Run the Experiment - you should now be able to write to blob storage.
Some example code can be found here: https://gist.github.com/drdarshan/92fff2a12ad9946892df
The code I used was the following, which doesn't first write the JSON to the file system, but sends it as a text stream.
from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key = 'p8kSy3FACx...redacted...ebz3plQ=='
    container_name = "upload"
    json_output_file_name = 'testfromml.json'
    json_orient = 'records'  # Can be index, records, split, columns, values
    json_force_ascii = False
    blob_service = BlobService(account_name, account_key, protocol='http')
    blob_service.put_block_blob_from_text(
        container_name,
        json_output_file_name,
        dataframe1.to_json(orient=json_orient, force_ascii=json_force_ascii))
    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,
Some thoughts:
I would prefer if the azure Python libraries were imported by default. Microsoft imports hundreds of 3rd party libraries into Azure ML as part of the Anaconda distribution. They should also include those necessary to work with Azure. We're in Azure, we've committed to Azure. Embrace it.
I don't like that I have to use HTTP, instead of HTTPS. Granted, this is internal Azure communication, so it's likely no big deal. However, most of the documentation suggests the use of SSL / HTTPS when working with blob storage, so I'd prefer to be able to do that.
I still get random timeout errors in the Experiment. Sometimes the Python code will execute in milliseconds; other times it runs for 60 or more seconds and then times out. This makes running it in an experiment very frustrating at times. However, when published as a Web Service I do not seem to have this problem.
I would prefer that the experience from my local code matched more closely Azure ML. Locally, I can use HTTPS and never time out. It's blazing fast, and easy to write. But moving to an Azure ML experiment means some debugging, nearly every time.
Huge props to Dan, Peter and Sudarshan, all from Microsoft, for their help in resolving this. I very much appreciate it!

You are going down the correct path. The Execute Python Script module is meant for custom needs just like this. Your real issue is how to import existing Python script modules. The complete directions can be found here, but I will summarize for SO.
You will want to take the Azure Python SDK and zip it up, upload, then import into your module. I can look into why this is not there by default...
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-execute-python-scripts/
Importing existing Python script modules
A common use-case for many data scientists is to incorporate existing Python scripts into Azure Machine Learning experiments. Instead of concatenating and pasting all the code into a single script box, the Execute Python Script module accepts a third input port to which a zip file that contains the Python modules can be connected. The file is then unzipped by the execution framework at runtime and the contents are added to the library path of the Python interpreter. The azureml_main entry point function can then import these modules directly.
As an example, consider the file Hello.py containing a simple “Hello, World” function.
Figure 4. User-defined function.
Next, we can create a file Hello.zip containing Hello.py:
Figure 5. Zip file containing user-defined Python code.
Then, upload this as a dataset into Azure Machine Learning Studio. If we then create and run a simple experiment that uses the module:
Figure 6. Sample experiment with user-defined Python code uploaded as a zip file.
The module output shows that the zip file has been unpackaged and the function print_hello has indeed been run.  
Figure 7. User-defined function in use inside the Execute Python Script module.
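The flow in Figures 4-7 can be simulated locally; the sketch below stands in a temp directory for the ".\Script Bundle" folder, and the content of Hello.py is an assumption based on the description above (the figures themselves show the real code):

```python
# Simulate what Azure ML does with the zip bundle: the archive is extracted
# under ".\Script Bundle", that folder is added to sys.path, and azureml_main
# can then import the bundled modules directly.
import os
import sys
import tempfile

bundle = tempfile.mkdtemp()  # stands in for ".\Script Bundle"
with open(os.path.join(bundle, "Hello.py"), "w") as f:
    f.write('def print_hello(name):\n    return "Hello, %s!" % name\n')

sys.path.append(bundle)      # what the execution framework does at runtime
import Hello                 # now resolvable from the bundle

def azureml_main(dataframe1=None, dataframe2=None):
    print(Hello.print_hello("World"))
    return dataframe1,
```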

As I know, you can use other packages via a zip file which you provide to the third input. The comments in the Python template script in Azure ML say:
If a zip file is connected to the third input port, it is unzipped under ".\Script Bundle". This directory is added to sys.path. Therefore, if your zip file contains a Python file mymodule.py, you can import it using:
import mymodule
So you can package azure-storage-python as a zip file: click New, click Dataset, and then select From local file and the Zip file option to upload the ZIP file to your workspace.
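One way to build that zip without leaving Python (folder and file names here are illustrative, e.g. a directory populated by pip install azure-storage --target):

```python
# Sketch: zip a site-packages-style folder so its top-level entries sit at
# the root of the archive; Azure ML unzips the file into ".\Script Bundle"
# and adds that directory to sys.path.
import os
import zipfile

def zip_site_packages(src_dir, zip_path):
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to src_dir, so "azure/..." is at the root
                zf.write(full, os.path.relpath(full, src_dir))

# zip_site_packages("site-packages", "azure.zip")  # then upload azure.zip as a dataset
```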
As reference, you can see more information at the section How to Use Execute Python Script of the doc Execute Python Script.

Related

Getting error while running the configuring MSTICPY on Azure sentinel notebook

I am using an Azure Sentinel notebook for threat intelligence. While trying to configure msticpy to connect to Azure Sentinel, I am getting a 'ValueError'. Following is the code that I am using:
import os
from pathlib import Path

from msticpy.config import MpConfigEdit

mp_conf = "msticpyconfig.yaml"

# check if MSTICPYCONFIG is already an env variable
mp_env = os.environ.get("MSTICPYCONFIG")
mp_conf = mp_env if mp_env and Path(mp_env).is_file() else mp_conf

if not Path(mp_conf).is_file():
    print(
        "No msticpyconfig.yaml was found!",
        "Please check that there is a config.json file in your workspace folder.",
        "If this is not there, go back to the Microsoft Sentinel portal and launch",
        "this notebook from there.",
        sep="\n"
    )
else:
    mpedit = MpConfigEdit(mp_conf)
    mpedit.set_tab("AzureSentinel")
    display(mpedit)
ValueError: File not found: 'None'.
In the Azure ML terminal, create the nbuser_settings.py file in the root of your user folder, which is the folder with your username.
In the nbuser_settings.py file, add the following lines:
import os
os.environ["MSTICPYCONFIG"] = "~/msticpyconfig.yaml"
https://learn.microsoft.com/en-us/Azure/sentinel/notebooks-msticpy-advanced?msclkid=e7cd84dfd05c11ecb0df15e0892300fc&tabs=azure-ml
Reference
Some elements of MSTICPy require configuration parameters. An example is the Threat Intelligence providers. Values for these and other parameters can be set in the msticpyconfig.yaml file.
The package has a default configuration file, which is stored in the package directory. You should not need to edit this file directly. Instead you can create a custom file with your own parameters - these settings will combine with or override the settings in the default file.
By default, the custom msticpyconfig.yaml is read from the current directory. You can specify an explicit location using an environment variable MSTICPYCONFIG.
You should also read the MSTICPy Settings Editor document to see how to configure settings using an interactive User Interface from a Jupyter notebook.
!!! NOTE !!! For the Linux and Windows options, you'll need to restart your Jupyter server for it to pick up the environment variable that you defined.
https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html?msclkid=96fde57dd04d11ec9e5406de243d7c67
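One pitfall with the nbuser_settings.py approach above: environment-variable values are used verbatim, so a literal "~" in MSTICPYCONFIG needs expanding before the file check. A hedged sketch of the lookup logic from the question with that expansion added (the function name is mine, not part of msticpy):

```python
import os
from pathlib import Path

def resolve_msticpy_config(default="msticpyconfig.yaml"):
    """Return the config path to use: the MSTICPYCONFIG value if it points
    at an existing file, otherwise the default. Note the explicit
    expanduser() -- "~" in an environment variable is not expanded for you."""
    mp_env = os.environ.get("MSTICPYCONFIG")
    if mp_env:
        candidate = Path(mp_env).expanduser()
        if candidate.is_file():
            return str(candidate)
    return default
```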
The author of msticpy has posted the issue on GitHub, and we have to wait for the next release. Please follow the thread for more details:
https://github.com/microsoft/msticpy/issues/393

jpype._jclass.NoClassDefFoundError: edu/stanford/nlp/python/SUTimeWrapper

I'm trying to use the sutime Python wrapper to make a date normalizer that converts any temporal information in strings into dates in the format YYYY-MM-DD. I've created a class with rules over the sutime outputs to convert them into the standard format mentioned above. The program works properly on my local machine, but when I try to run it on a server I get jpype._jclass.NoClassDefFoundError. The server runs Ubuntu with Python 2, while my local machine runs Windows with Python 3.
I've tried to implement the solutions to a similar problem on this forum thread: https://sourceforge.net/p/jpype/discussion/379372/thread/689d7a9b/ but I'm not sure whether I implemented those solutions correctly. I've also checked that sutime supports both Python 3 and Python 2.
I think the issue is with jpype or with the sutime library.
This is the traceback that I got:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "date_normalizer.py", line 38, in __init__
    self.sutime = SUTime(jars=self.jar_files, mark_time_ranges=mark_time_ranges)
  File "/home/bridgei2i/.local/lib/python2.7/site-packages/sutime/sutime.py", line 57, in __init__
    'edu.stanford.nlp.python.SUTimeWrapper')
  File "/home/bridgei2i/.local/lib/python2.7/site-packages/jpype/_jclass.py", line 130, in __new__
    return _JClassNew(args[0], **kwargs)
  File "/home/bridgei2i/.local/lib/python2.7/site-packages/jpype/_jclass.py", line 213, in _JClassNew
    javaClass = _jpype.PyJPClass(arg)
jpype._jclass.NoClassDefFoundError: edu/stanford/nlp/python/SUTimeWrapper
Seems likely that the jar file holding edu/stanford/nlp/python/SUTimeWrapper was not found on the server. The specific code that failed was a call to JClass('edu.stanford.nlp.python.SUTimeWrapper') which is a request to load a class from a jar. I would recommend checking the classpath and configuration on the server.
Likely causes are (in order of likelihood)
The jar file is not located in the classpath on the server.
The jar file was compiled with a JDK newer than the runtime environment (though this should generate a different exception).
Some jar file that the class depends on is missing or has the wrong version. (This should produce a different classname in the exception, so it is unlikely.)
A DLL for a native portion of the jar file is missing or has an incorrect architecture. (Rare.)
Assuming the jar file is on the server, I would recommend checking the initialization in which the JPype startJVM call is made to see if the path to the jar was correct. It is also possible to examine the loaded classpath using print(jpype.java.lang.System.getProperty('java.class.path')) to see if there is a difference between your local and server machine.
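A small helper for that local-vs-server comparison (my own illustrative code, not part of jpype); feed it the value of jpype.java.lang.System.getProperty('java.class.path') from each machine:

```python
import os

def jar_on_classpath(classpath, jar_name):
    """Check whether a jar with the given file name appears anywhere in a
    Java classpath string (entries separated by os.pathsep)."""
    return any(os.path.basename(entry) == jar_name
               for entry in classpath.split(os.pathsep) if entry)

# e.g. jar_on_classpath(jpype.java.lang.System.getProperty('java.class.path'),
#                       'sutime-wrapper.jar')   # jar name is illustrative
```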
Thank you. As you said: "Some jar file that the class depends on is missing or has the wrong version" (this should produce a different classname in the exception, so it is unlikely).

suds Python: Permission denied

I have some code that's been untouched since last November; it has worked fine this whole time, until now. As far as I know, nothing else has changed on this host.
The error:
Traceback (most recent call last):
  File "/scm/pvcs/scripts/pyscripts/update_scr_20.py", line 115, in <module>
    updateSCR(SCR, myDeployer, myDeployerID, myEnv, myEnvID, deployTime)
  File "/scm/pvcs/scripts/pyscripts/update_scr_20.py", line 33, in updateSCR
    client = Client(url=SBM_WSDL, location=SBM_ENDPOINT, timeout=180)
  File "build/bdist.linux-x86_64/egg/suds/client.py", line 109, in __init__
  File "build/bdist.linux-x86_64/egg/suds/cache.py", line 145, in __init__
  File "build/bdist.linux-x86_64/egg/suds/cache.py", line 277, in checkversion
  File "build/bdist.linux-x86_64/egg/suds/cache.py", line 251, in clear
OSError: [Errno 13] Permission denied: '/tmp/suds/suds-7962357479995671267-document.px'
I've changed the file permissions to 777, still get the same 'permission denied' error.
This error is raised when suds runs in a multiuser environment. The user you are running the script as likely does not own that directory. Try turning the cache off or changing the cache directory.
Can you share the part of your code which is causing the error? You should catch the exception and see the full error log.
This is essentially a less than perfect design decision on the part of the python soap client. It by default creates a file in a global space (/tmp/suds) that is owned by a single user, and that locks out other users from using the python soap client. If you chmod /tmp/suds/* to allow the world access it will work (what IBM recommends in their OpenStack product) ... or clean up after the use of the client by deleting the garbage it leaves behind.
The soap client ought to have created the suds directory in the user's space (under /home/username) so each user would have their own, or if it really ought to be a global resource it should have used open access to the file. By doing neither, it probably has caused a lot of time lost by many a user. I'd call it a bug. Something that costs users time and is easily fixed.

"Windows Error: provider DLL failed to initialize correctly" on import of cgi module in frozen wxpython app

I have a user of a frozen wxpython app that gets the appended screenshot.
The error message is "Windows Error: provider DLL failed to initialize correctly"
A screenshot taken from a paused video is the only way I could get this error message from them, because the whole thing disappears instantly (including the DOS window created to capture stderr, where this message appears). I.e., Python is dying before it even really gets going.
The traceback points to my code at controller.py line 14.
This line is
import cgi
For some reason, it seems that cgi is calling random during import (why would that be?) and for some reason this is failing for some DLL reason.
Any clues?
Note 1: this app works fine for hundreds of other Windows and Mac users. So it's as if I'm relying on something that is missing only on this user's machine for some reason.
Note 2: the executable is created using bbfreeze, with the following config:
f = Freezer(distdir = distdir,
            includes = ['wx.lib.pubsub.core.kwargs.*',
                        'wx.lib.pubsub.core.*',
                        'dbhash',
                        'platform'])
I'm not sure what else I'd put in here. 'cgi'? 'random'?
For me, the exact error message was:
WindowsError: [Error -2146893795] Provider DLL failed to initialize correctly
with a trace such as:
  File "C:\Dev\Python\python-2.7.11\lib\tempfile.py", line 35, in <module>
    from random import Random as _Random
  File "C:\Dev\Python\python-2.7.11\lib\random.py", line 885, in <module>
    _inst = Random()
  File "C:\Dev\Python\python-2.7.11\lib\random.py", line 97, in __init__
    self.seed(x)
  File "C:\Dev\Python\python-2.7.11\lib\random.py", line 113, in seed
    a = long(_hexlify(_urandom(2500)), 16)
WindowsError: [Error -2146893795] Provider DLL failed to initialize correctly
And what solved it for me was a comment from http://bugs.python.org/issue1384175 (http://bugs.python.org/msg248947), saying the following:
This happened at a call to `os.urandom` for me.
This was in a subprocess.
The bug for me was that I called `_subprocess.CreateProcess`
with an `env_mapper = {'foo': 'bar'}`. The fix:
env_mapper = os.environ.copy()
env_mapper.update({'foo': 'bar'})
I think the minimal solution is to include the SYSTEMROOT environment variable in the Python subprocess.
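A minimal sketch of that fix (the "foo" variable is illustrative): copy the parent environment instead of building a new mapping from scratch, so SYSTEMROOT and friends survive into the child process:

```python
import os
import subprocess
import sys

# Wrong: env={"foo": "bar"} would drop SYSTEMROOT, and os.urandom in the
# child can then fail with "Provider DLL failed to initialize correctly".
# Right: start from the parent environment and add to it.
env_mapper = os.environ.copy()
env_mapper.update({"foo": "bar"})

out = subprocess.check_output(
    [sys.executable, "-c", "import os; os.urandom(16); print('ok')"],
    env=env_mapper,
)
```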
I have seen the problem when trying to load os.urandom:
self._authkey = AuthenticationString(os.urandom(32))
WindowsError: [Error -2146893795] Provider DLL failed to initialize correctly
It turns out that _PyOS_URandom on Windows relies on the SYSTEMROOT environment variable being set. See: http://bugs.python.org/issue1384175#msg248951 for a detailed explanation.
This seems to occur somewhere inside os.urandom and is probably caused by some missing or incorrect environment variables. In particular it happens if the environment is too long.
If you are starting Python from a shell, open a new shell and try again. If the problem persists, check whether there are unusually many environment variables.
If you are starting Python from another process, check whether the process environment is set up correctly. I found that this is often not the case for processes run by Apache's CGI module.
If you are starting Python as a CGI process, then you may want to consider better alternatives, such as mod_wsgi.

Plone 3.1.2 - TypeError in ATDocument.getText() method

My task is to unload content from a Plone 3.1.2 website and load information about the content to an SQL database + file system
I've recreated the website, got access to ZODB and recreated object and folder structure. I am also able to read properties of folders, files and documents. I can't get the .getText() method of ATDocument to work. The Traceback looks like this:
Traceback (most recent call last):
  File "C:\Users\jan\Eclipse_workspace\Plone\start.py", line 133, in ?
    main()
  File "C:\Users\jan\Eclipse_workspace\Plone\start.py", line 118, in main
    print dokument.getText()
  File "e:\Program Files\Plone 3\Data\Products\Archetypes\ClassGen.py", line 54, in generatedAccessor
  File "e:\Program Files\Plone 3\Data\Products\Archetypes\BaseObject.py", line 828, in Schema
TypeError: ('Could not adapt', <ATDocument at /*object_path*>, <InterfaceClass Products.Archetypes.interfaces._schema.ISchema>)
I suspect that there is a problem connecting the object to the ISchema interface, but I've never worked with Plone before and don't know its object model.
Any suggestions about what might be wrong or missing, how I can fix it, and/or what to do next? I suspect that I have to connect the ISchema interface class with this object somehow, but have no idea where to start.
I'll be grateful for any help, since I've been stuck for 2 days now and am not moving forward.
I know nothing about ZCML format or how to edit it.
Because after >>> print dokument.getText() in debug mode the script jumps to the makeMethod() method in the Generator class, I assume that the script doesn't execute .getText() but tries to create this method instead.
Since inspect.getmembers(dokument) returns a getText() method I'm really confused.
Do you know which ZCML file might be related to the ATDocument class? Or where I can look for any information on this subject?
My start.py file doesn't do much else than the following imports:
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
from OFS.Application import Application
from BTrees import OOBTree
from Products.CMFPlone.Portal import PloneSite
then it gets access to dokument object and tries to execute .getText()
Edit 2013-03-26 15:27 (GMT):
About the .zcml files
The site I've received consists of 3 folders: Products (extracted to \Plone 3\Data), lib and package-includes.
Inside lib there is a python folder containing 3 subfolders: 'common', 'abc' and 'def' (names changed so as not to release client information). Each of these subfolders contains a configure.zcml file; one of them also includes an overrides.zcml file.
In the folder package-includes there are 4 files, each of them 1 line long. They contain the following lines:
<include package="abc" file="configure.zcml" />
<include package="def" file="overrides.zcml" />
<include package="common" file="configure.zcml" />
<include package="def" file="configure.zcml" />
These zcml files are not copied at the moment. Where should I copy them to have them imported?
You are missing component registrations, usually registered when loading the ZCML files in a site.
You want to end up with the possibility to run bin/instance run yourscript.py instead, which leaves all the tedious site and ZCML loading to Zope.
Once you have that running reliably, you can then access the site in a script that sets up the local component manager and a security manager:
from zope.app.component.hooks import setSite
from Testing.makerequest import makerequest
from AccessControl.SecurityManagement import newSecurityManager
site_id = 'Plone' # adjust to match your Plone site object id.
admin_user = 'admin' # usually 'admin', probably won't need adjusting
app = makerequest(app)
site = app[site_id]
setSite(site)
user = app.acl_users.getUser(admin_user).__of__(site.acl_users)
newSecurityManager(None, user)
# `site` is your Plone site, now correctly set up
Save this script somewhere, and run it with:
bin/instance run path/to/yourscript.py
The way you are approaching your task is not a good one.
You are trying to use the API without the framework setup. It's possible, but you have to know the framework very well (loading the persistent site manager, etc.).
You should add a 'browser view' and call it to export your content.
You can do that by:
create your own addon and install it
modify an installed addon (hey it's temporary work after all)
You will find documentation about browserview and plone at http://developer.plone.org
Sorry but if you need to develop for Plone you will need to read a bit about all this.
