Airflow using GCP. Unable to ping external IP Address within Airflow DAG - python

Background
I have created an Airflow webserver using a Composer environment within Google Cloud Platform: 3 nodes, image version composer-1.10.0-airflow-1.10.6, machine type n1-standard-1.
I have not yet configured any networks for this environment.
Airflow works fine for simple test DAGs.
The problem
I wrote a ping_ip DAG for determining whether a physical machine (i.e. my laptop) is connected to the internet. (Code: https://pastebin.com/FSBPNnkP)
I tested the Python used to ping the machine locally (via response = os.system("ping -c 1 " + ip_address)) and it returned 0, i.e. an active network.
When I moved this code into an Airflow DAG, the code ran fine, but this time returned 256 for the same IP address.
Here's the DAG code in a pastebin:
https://pastebin.com/FSBPNnkP
Here are the Airflow Logs for the triggered DAG pasted above:
[2020-04-28 07:59:35,671] {base_task_runner.py:115} INFO - Job 2514: Subtask ping_ip 1 packets transmitted, 0 received, 100% packet loss, time 0ms
[2020-04-28 07:59:35,673] {base_task_runner.py:115} INFO - Job 2514: Subtask ping_ip [2020-04-28 07:59:35,672] {logging_mixin.py:112} INFO - Network Error.
[2020-04-28 07:59:35,674] {base_task_runner.py:115} INFO - Job 2514: Subtask ping_ip [2020-04-28 07:59:35,672] {python_operator.py:114} INFO - Done. Returned value was: ('Network Error.', 256)
I guess I have networking issues for external IPs on my server.
Does anybody know how to ping an external IP from within an Airflow Service managed by GCP?
The end goal is to create a DAG that prompts a physical machine to run a Python script. I thought this process should start with a simple sub-DAG that checks whether the machine is connected to the internet. If I'm going about this the wrong way, please let me know.

What worked for me is removing the response part. Here's the code:
import os

def ping_ip():
    ip_address = "8.8.8.8"  # My laptop IP
    response = os.system("ping -c 1 " + ip_address)
    if response == 0:
        pingstatus = "Network Active."
    else:
        pingstatus = "Network Error."
    print("\n *** Network status for IP Address=%s is : ***" % ip_address)
    print(pingstatus)
    return pingstatus

print(ping_ip())

Let me give my opinion.
Composer by default uses the default network, which contains a firewall rule that allows the ICMP protocol (ping). So I think any public IP should be reachable; for example, when PyPI packages are installed you usually don't configure anything special, yet the PyPI repositories are accessible.
A machine that has Internet access does not necessarily have a public IP; e.g. it can be behind NAT or some other network configuration (networking is not my expertise). To make sure you are specifying the public address of your Internet connection, you can use a tool like https://www.myip.com/, which shows both your public IP (e.g. 189.226.116.31) and your host IP (e.g. 10.0.0.30); if you get something similar, you need to use the public one.
If you are using the host IP, it may work locally because that IP is reachable from within the same private network you are in; the traffic never leaves the network. But in Composer, where your DAG was uploaded, the nodes are completely outside your local network.
I didn't find documentation for what a ping code of 256 means, but note that os.system returns the raw wait status, so 256 corresponds to ping exiting with code 1. If you are using the correct public IP, you can also try increasing the response timeout with -W; it may simply be taking longer to reach the IP.
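For reference, a quick sketch of decoding that raw status (the -W timeout value here is just illustrative):
import os

# os.system returns the raw wait status on POSIX; the actual ping exit
# code is in the high byte, so a status of 256 means ping exited with 1.
status = os.system("ping -c 1 -W 5 8.8.8.8")
exit_code = status >> 8  # or os.waitstatus_to_exitcode(status) on Python 3.9+
print("ping exit code:", exit_code)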

The VMs created by Composer are unlikely to have "ping" installed. These are standard images. I think you are basically invoking the Linux "ping" command and it fails because it is not installed on the node/VM, so you need to change your implementation to "ping" the server another way.
You can SSH into the Composer node VMs, install "ping", and then rerun the DAG. But even if that works, I would not consider it a clean solution and it will not scale; it is okay for a pilot, though.
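As a sketch of what "pinging another way" could look like without the ping binary, a plain TCP connect from the standard library is usually enough (the host and port below are placeholders):
import socket

def tcp_reachable(host, port=443, timeout=3.0):
    # Try a TCP connection instead of an ICMP ping; this needs no extra
    # binaries, only outbound TCP access to the chosen port.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(tcp_reachable("8.8.8.8", 53))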
Lastly, if your goal is to execute a Python script, have you thought of using a PythonOperator from within a DAG? If you want to somehow decouple the execution of the Python script from the DAG itself, an alternative is a Pub/Sub + Cloud Function combination; a minimal sketch of the first option follows.
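This is only a sketch, assuming the composer-1.10.0-airflow-1.10.6 image (hence the Airflow 1.10 import path); the DAG id, schedule, and callable are placeholders:
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x path


def run_my_script():
    # Placeholder for the work you actually want to trigger.
    print("running script")


with DAG(dag_id="run_python_script",
         start_date=datetime(2020, 4, 1),
         schedule_interval=None,
         catchup=False) as dag:
    run_script = PythonOperator(task_id="run_script", python_callable=run_my_script)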
Other probable causes for being unable to reach external IPs are misconfigured firewall rules. To fix this you must:
Define an egress firewall rule to enable ping to your destination IP and attach the firewall rule to a "tag".
Make sure you attach the same "tag" to the VMs/nodes created for Composer.

Related

Cloud function 2gen failed to start and listen on the port defined provided by the PORT=8080 environment variable

When running my code I receive the following error:
Ready condition status changed to False for Service function with message: Revision is not ready and cannot serve traffic. The user-provided container failed to start and listen on the port defined provided by the PORT=8080 environment variable. Logs for this revision might contain more information.
The function makes 1300 API requests to mercadolibre. The logs show the responses are OK, but it throws this message before finishing all the requests.
I set the CF up with 1 GB of memory and the maximum timeout.
Is it possible to use a CF to execute this kind of code, or should I use another Google product? In that case, which one do you recommend?

Accessing AWS ElastiCache (Redis CLUSTER mode) from different AWS accounts via AWS PrivateLink

I have a business case where I want to access a clustered Redis cache in one account (let's say account A) from another account (account B).
I have used the solution mentioned in the link below, and for the most part it works: Base Solution
The base solution works fine if I am trying to access the clustered Redis via redis-py; however, if I try to use it with redis-py-cluster, it fails.
I am testing all this in a staging environment where the Redis cluster has only one node, but in the production environment it has two nodes, so the redis-py approach will not work for me.
Below is my sample code
# Library versions: redis = "3.5.3", redis-py-cluster = "2.1.3"

from redis import Redis
from rediscluster import RedisCluster


def check_redis():
    respCluster = 'error'
    respRegular = 'error'
    host = "vpce-XXX.us-east-1.vpce.amazonaws.com"
    port = "6379"
    try:
        ru = RedisCluster(startup_nodes=[{"host": host, "port": port}],
                          decode_responses=True, skip_full_coverage_check=True)
        respCluster = ru.get('ABC')
    except Exception as e:
        print(e)
    try:
        ru = Redis(host=host, port=port, decode_responses=True)
        respRegular = ru.get('ABC')
    except Exception as e:
        print(e)
    return {"respCluster": respCluster, "respRegular": respRegular}
The above code works perfectly in account A, but in account B the output I got was:
{'respCluster': 'error', 'respRegular': '123456789'}
And the error that I am getting is
rediscluster.exceptions.ClusterError: TTL exhausted
In account A we are using AWS ECS + EC2 + docker to run this and
In account B we are running the code in an AWS EKS Kubernetes pod.
What should I do to make the redis-py-cluster work in this case? or is there an alternative to redis-py-cluster in python to access a multinode Redis cluster?
I know this is a highly specific case, any help is appreciated.
EDIT 1: Upon further research, it seems that TTL exhausted is a general error; in the logs, the initial error is:
redis.exceptions.ConnectionError: Error 101 connecting to XX.XXX.XX.XXX:6379. Network is unreachable
Here XX.XXX.XX.XXX is the IP of the Redis cluster in Account A.
This is strange, since redis-py also connects to the same IP and port, so this error should not exist.
So it turns out the issue was due to how redis-py-cluster manages hosts and ports.
When a new redis-py-cluster object is created, it gets a list of host IPs from the Redis server (i.e. the Redis cluster host IPs from account A), after which the client tries to connect to those hosts and ports.
In normal cases this works, because the initial host and the IPs in the response are one and the same (i.e. the host and port added at the time of object creation).
In our case, the object-creation host and port are obtained from the DNS name of the endpoint service of Account B.
This leads to the code trying to access the actual IPs from account A instead of the DNS name from account B.
The issue was resolved using host/port remapping, where we bound the IPs returned by the Redis server in Account A to Account B's endpoint service DNS name.
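For illustration only: my understanding is that redis-py-cluster 2.1.x exposes this as a host_port_remap argument; the addresses below are placeholders and the exact key names should be checked against the library's documentation:
from rediscluster import RedisCluster

# Map the node IP that the Account A cluster advertises back to the
# endpoint-service DNS name that is reachable from Account B.
remap = [
    {"from_host": "10.0.1.10", "from_port": 6379,
     "to_host": "vpce-XXX.us-east-1.vpce.amazonaws.com", "to_port": 6379},
]
ru = RedisCluster(
    startup_nodes=[{"host": "vpce-XXX.us-east-1.vpce.amazonaws.com", "port": "6379"}],
    decode_responses=True,
    skip_full_coverage_check=True,
    host_port_remap=remap,
)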
Based on your comment:
this was not possible because the VPCs in Account-A and Account-B had the same CIDR range. Peered VPCs can't have the same CIDR range.
I think what you are looking for is impossible. Routing within a VPC always happens first; it happens before any route tables are considered at all. Said another way, if the destination of a packet lies within the sending VPC, it will never leave that VPC, because AWS will try to route it within its own VPC, even if the IP isn't in use in that VPC at the time.
So, if you are trying to communicate with another VPC that has the same IP range as yours, even if you explicitly add a route to egress traffic to a different IP (but in the same range), the rule will be silently ignored and AWS will try to deliver the packet in the originating VPC, which seems to be not what you are trying to accomplish.

Access dashboard on AWS ec2 local cluster

I am running a Dask script on an EC2 instance on AWS. I would like to connect to the dashboard provided by Dask, but I can't figure out how.
I am creating a LocalCluster on my EC2 instance; the script runs fine and I connect to the AWS instance via PuTTY. However, I would like to see the dashboard: on my PC it is enough to browse to the provided IP and port, but I am not able to do that on the AWS machine.
Once the script is running, this is my output for the "parameters" of the local cluster:
<Client: 'inproc://172.31.29.4/7475/1' processes=1 threads=8, memory=27.94 GiB>
LocalCluster(b8be08dd, 'inproc://172.31.29.4/7475/1', workers=1, threads=8, memory=27.94 GiB)
dashboard address: {'dashboard': 8787}
For example, I tried to open 172.32.29.4:8787/status in my browser, but I wasn't able to connect to the dashboard.
I already checked this question: How to view Dask dashboard when running on a virtual machine? However, I am using a LocalCluster and I would like to connect to its dashboard remotely. Is that possible? If so, how?
The answer is in the comments, but I will type it out here, so that the original question looks "answered".
You need two things to connect to a port on an EC2 machine: the external IP, and access. The former is most easily found from the AWS console. For the latter, you typically need to edit the security group to add an inbound TCP rule for the port (either open to the world, or just your IP). There are other ways to do this part, depending on whether your machine is inside a VPC, has any custom gateways or routers... but if you don't know what that means, find the security group first. Both the public IP and the security group will be linked from the machine's row in the EC2 "running instances" list.
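If the dashboard address the cluster reports is not reachable externally, it can also help to bind it explicitly when creating the cluster; a minimal sketch (port 8787 still needs to be open in the security group, or tunnelled over SSH instead):
from dask.distributed import Client, LocalCluster

# Expose the dashboard on all interfaces so it is reachable at
# http://<external-ip>:8787/status from outside the instance.
cluster = LocalCluster(dashboard_address="0.0.0.0:8787")
client = Client(cluster)
print(client.dashboard_link)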
I've set up dask-labextension visualizations to provide this type of UI.
Create the client object:
from dask.distributed import Client
client = Client()
Then click the magnifying glass provided by the extension to automatically connect with the cluster.
The detailed instructions are in this post.

Trouble connecting with Connman using dbus, but only the first time

I've been trying to use various Python libraries for working with Connman and the dbus, particularly this sample code:
https://github.com/liamw9534/pyconnman/blob/master/demo/demo.py
The problem I have is that when connecting to a WPA2 access point for the very first time, I will always receive a timeout message. For example:
CONN> list-services
CONN> agent-start /test/agent ssid=myNetwork passphrase=myPassphrase
CONN> service-connect /net/connman/service/wifi_xxxxx__managed_psk
Eventually this is the message I receive back from the interface:
Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken
I can confirm at this point that Connman has not connected to a wifi network or obtained an IP address. The only way I can manage to get this to work is by using the Connman application itself from a Linux terminal:
connmanctl
connmanctl> agent on
connmanctl> connect wifi_xxxxx__managed_psk
Agent RequestInput wifi_xxxxx__managed_psk
Passphrase = [ Type=psk, Requirement=mandatory ]
Passphrase? myPassword
connmanctl> Connected wifi_xxxxx__managed_psk
This creates a settings folder under /var/lib/connman for the wifi network. I can now use the demo.py script mentioned above to either disconnect or reconnect.
Connman is still a bit of a mystery to me in many ways, and I'm not sure why I have to use the interactive shell to connect to a network for the first time. Any ideas?
In case you're still looking for the answer:
Connman needs an agent to answer the security questions (for WPA2, that's the passphrase). You can either run an agent and reply to Connman's questions, or you can create a file in /var/lib/connman with the right keys. See here. Once a file is created or deleted, Connman will automagically act accordingly (try to connect or disconnect).
A basic file would look like:
[service_mywificonfig]
Type = wifi
Security = wpa2
Name = myssid
Passphrase = yourpass

Difference between finding an IP address with Python and with cmd

I wrote this code for finding Google's IP in Python:
import socket
print socket.gethostbyname('google.com')
The output is:
173.194.39.0
But if I use the command prompt and the ping command to find Google's IP, the result is 216.58.208.36.
Why is there a difference between the two results?
Both of those IP addresses resolve to Google.com. We can verify this from the command line with the unix whois command.
$ whois 216.58.208.36
NetRange: 216.58.192.0 - 216.58.223.255
CIDR: 216.58.192.0/19
NetName: GOOGLE
$ whois 173.194.39.0
NetRange: 173.194.0.0 - 173.194.255.255
CIDR: 173.194.0.0/16
NetName: GOOGLE
I ran into this same issue, and the cause was that the first command requiring an IP address used a cached DNS entry (because the entry's time-to-live (TTL) hadn't expired yet). By the time the second command was issued, the TTL on the cached entry had expired, so a new DNS request was made for the domain, and it returned a different IP address from the DNS server because the domain has a lot of IP addresses, just like Google.com.
Python just relies on the operating system's DNS resolver (or whatever daemon is running), and as far as I know the socket module doesn't give you the ability to clear the DNS cache before it tries to resolve an address. If you want more control over this, you can use DNSPython or something similar. If you are using a daemon for DNS on your operating system (on Linux, for example), then restarting the daemon will usually force a flush of the DNS cache, and you will find both addresses to be the same (unless you run into the timing issue described above with the TTLs expiring).
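A hedged sketch of the DNSPython approach, querying a chosen resolver directly instead of whatever the OS has cached (8.8.8.8 is just an example nameserver, and the dnspython 2.x API is assumed):
import dns.resolver  # pip install dnspython

resolver = dns.resolver.Resolver()
resolver.nameservers = ["8.8.8.8"]  # ask this nameserver directly
for record in resolver.resolve("google.com", "A"):
    print(record.address)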
Hostnames are translated to IP addresses through something called a DNS server. When you type a name into a web browser or use a program such as ping, the hostname that you provide (google.com) eventually reaches an authoritative DNS server for that domain, separate from the server that you correspond with for the actual content.
google.com has multiple different servers that can respond to requests. Depending on the implementation of the programs you use to generate the request, and on other factors such as network traffic at the time you make it, multiple requests from the same host may be directed to different servers by the authoritative DNS server. This is accomplished by returning different IP addresses to your machine.
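To see that a single lookup can already return several addresses, here is a quick standard-library sketch:
import socket

# Print every address the resolver currently returns for the name;
# repeated runs may show different sets or orderings.
for family, _, _, _, sockaddr in socket.getaddrinfo("google.com", 80, proto=socket.IPPROTO_TCP):
    print(sockaddr[0])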
FWIW, both ping and socket.gethostbyname() for google.com resolve to 216.58.217.14 on my machine, running OS X Yosemite.
