Apache Beam in Python, error with beam.io.TextFileSource - python

I'm trying to run the code in the Data Science on GCP repo and keep hitting an error in the Beam code.
This is the line that gives an error:
beam.Read(beam.io.TextFileSource('airports.csv.gz'))
Here's the error I'm getting:
AttributeError: 'module' object has no attribute 'TextFileSource'
Here's the complete file:
https://github.com/GoogleCloudPlatform/data-science-on-gcp/blob/master/04_streaming/simulate/df01.py
Does anyone know how to get this working, or what I'm missing?

Google Cloud Dataflow is migrating to the Apache Beam standard, which means you should use apache_beam.io.textio.ReadFromText instead of beam.io.TextFileSource. The standard is still evolving, so it's best to consult the release notes whenever you upgrade the package.
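A minimal sketch of the replacement, assuming a current apache-beam install. ReadFromText is itself a transform, so it is applied directly rather than wrapped in beam.Read(...), and its default compression setting (CompressionTypes.AUTO) picks gzip from the .gz extension. The CSV-parsing step and the names read_airports and parse_csv_line are illustrative additions, not taken from df01.py.

```python
import csv

try:
    import apache_beam as beam
    from apache_beam.io.textio import ReadFromText
except ImportError:  # apache-beam is not installed in this environment
    beam = None
    ReadFromText = None


def parse_csv_line(line):
    """Split one CSV line into fields, respecting quoted commas."""
    return next(csv.reader([line]))


def read_airports(pipeline, path="airports.csv.gz"):
    """Hypothetical replacement for beam.Read(beam.io.TextFileSource(path))."""
    if beam is None:
        raise ImportError("pip install apache-beam to use read_airports")
    return (
        pipeline
        # Compression is inferred from the .gz extension by default.
        | "ReadAirports" >> ReadFromText(path)
        | "ParseCSV" >> beam.Map(parse_csv_line)
    )
```

Inside a pipeline you would call it as airports = read_airports(p) within a with beam.Pipeline() as p: block.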

It appears that you are using an older version of apache-beam/cloud-dataflow.
Run:
pip freeze | grep dataflow
When I do this, I get:
google-cloud-dataflow==0.4.3
If the version you get is older, try:
pip install google-cloud-dataflow
and repeat the pip freeze command. If you keep getting an older version, then you are in Python library hell, and I suggest using virtualenv to ensure that you are using the latest version of all packages ...
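If you would rather check from Python than from the shell, here is a small sketch using the standard library's importlib.metadata. Note the assumption: it needs Python 3.8+, which is newer than the Python 2 setup this answer was written for. It reports the same version that pip freeze | grep shows; installed_version is a hypothetical helper name.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(dist_name):
    """Return the installed version string of a pip distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("google-cloud-dataflow", "apache-beam"):
        print(name, installed_version(name) or "not installed")
```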


Apache Beam Error: Unable to get file system for GCS

I'm trying to write to GCS bucket via Beam (and TF Transform). But I keep getting the following error:
ValueError: Unable to get the Filesystem for path [...]
The answer here and some other sources suggest that I need to pip install apache-beam[gcp] to get a different variant of Apache Beam that works with GCP.
So, I tried changing the setup.py of my training package as:
REQUIRED_PACKAGES = ['apache_beam[gcp]==2.14.0', 'tensorflow-ranking', 'tensorflow_transform==0.14.0']
which didn't help. I also tried adding the following to the beginning of my code:
subprocess.check_call('pip uninstall apache-beam'.split())
subprocess.check_call('pip install apache-beam[gcp]'.split())
which didn't work either.
The logs of the failed GCP job are here. The traceback and the error message appear on row 276.
I should mention that running the same code using Beam's DirectRunner and writing the outputs to local disk runs fine. But I'm now trying to switch to DataflowRunner.
Thanks.
It turns out that you need to uninstall google-cloud-dataflow in addition to installing apache-beam with the gcp option. I guess this happens because google-cloud-dataflow is installed on GCP instances by default. Not sure if the same would be true on other platforms like AWS. But anyway, here are the commands I used:
pip uninstall -y google-cloud-dataflow
pip install apache-beam[gcp]
I noticed this in the very first cell of this notebook (https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/10_recommend/wals_tft.ipynb).
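A rough preflight check for the two conditions described in this answer, as a sketch only. It assumes Beam's GCS filesystem lives in apache_beam.io.gcp.gcsfilesystem (true for the 2.x releases I know of) and that a failed import of that module is what leaves gs:// paths with no registered filesystem. It also needs Python 3.8+ for importlib.metadata. Run it in the environment that actually executes the pipeline; is_installed, gcs_filesystem_available and diagnose are hypothetical helper names.

```python
import importlib
from importlib import metadata  # Python 3.8+


def is_installed(dist_name):
    """True if a distribution with this pip name is installed."""
    try:
        metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return True


def gcs_filesystem_available():
    """True if Beam's GCS filesystem module imports cleanly.

    Assumption: without the [gcp] extras this import raises ImportError,
    which is what leaves gs:// paths with no registered filesystem.
    """
    try:
        importlib.import_module("apache_beam.io.gcp.gcsfilesystem")
    except ImportError:
        return False
    return True


def diagnose():
    """Return the commands this answer suggests for the problems found."""
    fixes = []
    if is_installed("google-cloud-dataflow"):
        fixes.append("pip uninstall -y google-cloud-dataflow")
    if not gcs_filesystem_available():
        fixes.append("pip install apache-beam[gcp]")
    return fixes


if __name__ == "__main__":
    for fix in diagnose() or ["nothing to fix"]:
        print(fix)
```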

How can I make this script run

I found this script (tutorial) on GitHub (https://github.com/amyoshino/Dash_Tutorial_Series/blob/master/ex4.py) and I am trying to run it on my local machine.
Unfortunately, I am getting an error.
I would really appreciate it if anyone could help me run this script.
Perhaps this is something easy, but I am new to coding.
Thank you!
You probably just need to pip install the dash-core-components library!
Take a look at the Dash Installation documentation. It currently recommends running these commands:
pip install dash==0.38.0 # The core dash backend
pip install dash-html-components==0.13.5 # HTML components
pip install dash-core-components==0.43.1 # Supercharged components
pip install dash-table==3.5.0 # Interactive DataTable component (new!)
pip install dash-daq==0.1.0 # DAQ components (newly open-sourced!)
For more info on using pip to install Python packages, see: Installing Packages.
If you have run those commands, and Flask still throws that error, you may be having a path/environment issue, and should provide more info in your question about your Python setup.
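To see which of those packages the interpreter actually running your script can find (useful for the path/environment issue mentioned above), a small sketch like this works. The module names are the import names of the pip packages listed (dashes become underscores); missing_modules is a hypothetical helper.

```python
import importlib.util
import sys

# Import names of the pip packages listed above.
DASH_MODULES = [
    "dash",
    "dash_html_components",
    "dash_core_components",
    "dash_table",
    "dash_daq",
]


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # If this is not the Python that pip installed into, that explains the error.
    print("Interpreter:", sys.executable)
    print("Missing:", missing_modules(DASH_MODULES) or "none")
```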
Also, just to give you a sense of how to interpret this error message:
It's often easiest to start at the bottom and work your way up.
Here, the bottommost message is a FileNotFoundError.
The program is looking for the file in your Python37/lib/site-packages folder. That tells you it's looking for a Python package. That is the directory to which Python packages get installed when you use a tool like pip.
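Both halves of that advice can be done from Python itself: traceback.format_exception_only gives you just the bottom line of a traceback, and sysconfig reports the site-packages directory where pip installs packages for the current interpreter. A sketch; bottom_line and site_packages_dir are hypothetical names.

```python
import sysconfig
import traceback


def bottom_line(exc):
    """Return the last line of a traceback: 'ExceptionType: message'."""
    return traceback.format_exception_only(type(exc), exc)[-1].strip()


def site_packages_dir():
    """Directory where pip installs pure-Python packages for this interpreter."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    try:
        open("no-such-file.txt")
    except FileNotFoundError as exc:
        print(bottom_line(exc))
    print("Packages are installed under:", site_packages_dir())
```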

Google AdWords API v201802 via Python3 GoogleAds library "ReportDownloadError.INVALID_REPORT_DEFINITION_XML"

Problem Description:
Using a Google API development key for AdWords. Approved for Standard Access.
Using the latest version of AdWords API: v201802.
Using the googleads Python module to create all my report definitions and subsequent downloads.
Works on Local, Staging, Demo, and Production environments with the same library code.
Does not work on the Development server.
All servers are running Ubuntu 17.10, except Production, which is running 16.10 LTS.
The requirements.txt file specifies googleads>=4.7.0 (this might be the core issue, and maybe the version needs to be updated; however, this doesn't explain why the other servers work and Development doesn't).
Doing a pip freeze | grep googleads results in googleads==10.1.0, which should be the latest version.
The error I'm getting:
ReportDownloadError.INVALID_REPORT_DEFINITION_XML
Trigger: Invalid ReportDefinition Xml: cvc-enumeration-valid:
Value '('CUSTOM_DATE',)' is not facet-valid with respect to enumeration
'[TODAY, YESTERDAY, LAST_7_DAYS, LAST_WEEK, LAST_BUSINESS_WEEK,
THIS_MONTH, LAST_MONTH, ALL_TIME, CUSTOM_DATE, LAST_14_DAYS,
LAST_30_DAYS, THIS_WEEK_SUN_TODAY, THIS_WEEK_MON_TODAY,
LAST_WEEK_SUN_SAT]'. It must be a value from the enumeration.
Any ideas or suggestions will help greatly!
UPDATE: I upgraded the packages with pip install -r requirements.txt --upgrade, and now pip freeze | grep googleads shows googleads==11.0.1, and my local branch no longer works either. So the issue seems to be in the googleads version. I'm reverting to 10.1.0 as the last known stable version.
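One detail of the error message worth checking: the rejected value is ('CUSTOM_DATE',), which is the text form of a one-element Python tuple, not the string CUSTOM_DATE. A common way that happens is a stray trailing comma where the date range type is assigned; that is an assumption about code not shown in the question. The sketch below reproduces the effect and adds a hypothetical require_str guard that fails loudly before the definition is sent. The dateRangeType key follows the AdWords report-definition fields; the rest is illustrative.

```python
# Hypothetical illustration: a trailing comma silently turns a string into a tuple.
date_range_type = "CUSTOM_DATE",  # note the comma: this is ('CUSTOM_DATE',)
print(str(date_range_type))       # ('CUSTOM_DATE',)


def require_str(field, value):
    """Raise early if a report-definition field is not a plain string."""
    if not isinstance(value, str):
        raise TypeError("%s should be a str, got %r" % (field, value))
    return value


try:
    require_str("dateRangeType", date_range_type)
except TypeError as exc:
    print(exc)  # dateRangeType should be a str, got ('CUSTOM_DATE',)
```

Failing loudly is preferable to unwrapping the tuple silently, since the real fix is removing the comma.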

Using Flask and Jython under Windows 7

Hello everyone,
So I have been trying to use Jython to connect to a REST API and retrieve some information. Now I want to use the Flask framework with it. I have been trying to install Flask with Jython, but it does not seem to work at all. I am working on a Windows 7 machine, and another problem is that I cannot download directly from the internet. For all other frameworks I used Python wheels and installed them with Jython, which worked fine.
I already tried the following commands and got these errors:
The first error I got was that it could not find the '__init__.py' file in the flask folder, so I changed the path in the file to the full path. But it just continued to give me more errors.
jython -m pip install *.whl
(Screenshot of the command-line output of the error)
pip install *.whl (same error as above)
I am a little stuck here and I hope that someone has an idea on how to solve this problem.
Big thanks already!!
This appears to be a bug with Jython 2.7.0. See this error report in pip and this one in Jython.
The second of those indicates that it is fixed in the 2.7.1 release candidate.
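Since the fix depends on the Jython release, here is a quick way to confirm which interpreter you are actually running. It assumes Jython 2.7.x reports its own release number in sys.version_info (2.7.0, 2.7.1, ...), which is how I understand Jython versions itself; needs_jython_upgrade is a hypothetical helper, and the code avoids Python 3-only syntax so it also runs under Jython.

```python
from __future__ import print_function

import platform
import sys


def needs_jython_upgrade(implementation=None, version=None):
    """True when running Jython older than 2.7.1, where the fix is reported."""
    if implementation is None:
        implementation = platform.python_implementation()
    if version is None:
        version = tuple(sys.version_info[:3])
    return implementation == "Jython" and tuple(version) < (2, 7, 1)


if __name__ == "__main__":
    print(platform.python_implementation(), sys.version)
    print("Upgrade Jython first:", needs_jython_upgrade())
```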

Python 2.4 - Installing Modules Syntax Error

I'm currently teaching myself Python and am working my way through automatetheboringstuff.com. I'm having trouble installing third-party modules in Python 2.4 (the newest version available to me at work). For instance, when I try to install the requests module so I can work with web pages, I get an invalid syntax error. Here's the procedure I'm using:
Open command line and cd to the folder where the setup.py file for the module is
Type into the command line: setup.py install
Then I get the following error:
File "C:\Users\Username\Desktop\PyRequestsModule\setup.py", line 52
    with open('requests/__init__.py', 'r') as fd:
         ^
SyntaxError: invalid syntax
I get a similar error every time I try to install a module. Is the issue that I'm running Python 2.4 or am I doing something wrong?
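Please do not learn Python 2 unless you have to use that version for legacy reasons. Its support will end in 2020, and most libraries are available for Python 3 too; some are even Python 3-only. The syntax error itself comes from the with statement in setup.py: with was added in Python 2.5 and became standard in 2.6, so Python 2.4 cannot even parse the file, and current versions of requests do not support Python 2.4 anyway.
If you can't install a newer version, you can try a portable build of a newer Python that runs without installation.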
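To make the cause concrete, the sketch below shows the Python 2.4-compatible spelling of the failing line (try/finally instead of with) and a simple version check. Rewriting setup.py this way would most likely not be enough, because the requests package itself targets much newer Pythons; that is why upgrading or using a portable Python is the real fix. read_text and supports_with_statement are hypothetical helper names.

```python
import sys


def read_text(path):
    """Read a whole file the Python 2.4 way: try/finally instead of `with`."""
    fd = open(path, "r")
    try:
        return fd.read()
    finally:
        fd.close()


def supports_with_statement(version_info=None):
    """`with` needs Python 2.5 (via __future__) and is standard from 2.6."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (2, 6)
```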
