gremlin in AWS Glue with PySpark - python

I am trying to access Gremlin via AWS Glue with PySpark as the runtime. As gremlinpython is an external library, I downloaded its .whl file and placed it in AWS S3. Then the job asked for "aenum", so I did the same; then isodate was required. So I just wanted to know if there is a single package I can use instead of supplying each module separately.
Below is the sample script I am testing initially, with all the modules, to keep it simple.
import boto3
import os
import sys
import site
import json
import pandas as pd
#from setuptools.command import easy_install
from importlib import reload
from io import StringIO
s3 = boto3.client('s3')
#dir_path = os.path.dirname(os.path.realpath(__file__))
#os.path.dirname(sys.modules['__main__'].__file__)
#install_path = os.environ['GLUE_INSTALLATION']
#easy_install.main( ["--install-dir", install_path, "gremlinpython"] )
#(site)
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.process.traversal import T, Column
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
The required libraries are below; after adding them there are no more module-related errors.
tornado-6.0.4-cp35-cp35m-win32.whl
isodate-0.6.0-py2.py3-none-any.whl
aenum-2.2.4-py3-none-any.whl
gremlinpython-3.4.8-py2.py3-none-any.whl
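If you are on Glue 2.0 or later, a simpler route than staging each wheel by hand (assuming the job can reach PyPI) is the --additional-python-modules job parameter, which hands installation to pip so that aenum, isodate, and tornado are pulled in as transitive dependencies of gremlinpython:
--additional-python-modules gremlinpython==3.4.8
Once the imports succeed, a minimal connectivity sketch with the classes imported above looks like this; the endpoint is a placeholder, not something from the question:
# Open a remote traversal source and run a trivial query as a sanity check.
graph = Graph()
remote = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')
g = graph.traversal().withRemote(remote)
print(g.V().limit(1).toList())
remote.close()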

Related

How to resolve PyDrive authentication permanently

I have one more problem with my app on Streamlit. I am using the PyDrive library to connect to Google Drive, where I am uploading photos for educational purposes. The problem is that every week the app throws an authentication error. The app is on Streamlit Cloud. In my GitHub directory I have a clients_secrets.json file, a creds.json file, and also a settings.yaml file.
Here is my code:
import streamlit as st
import pandas as pd
import requests
import sqlalchemy as db
#import matplotlib.pyplot as plt
import leafmap.foliumap as leafmap
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import text
from plotly import graph_objects as go
from plotly.subplots import make_subplots
import branca
#import os
from pathlib import Path
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
#import logging
import numpy as np
from PIL import Image
#logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)
gauth = GoogleAuth()
gauth.CommandLineAuth()
drive = GoogleDrive(gauth)
What I am doing at the moment is changing the creds.json file whenever my credentials expire, but I wish I didn't have to do it all the time. Any idea how I can make the authentication process automatic, without the need to change the creds.json file every week?
I have tried changing the creds.json file every week.
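Two things are worth checking here, offered as guesses rather than a confirmed fix. A weekly expiry matches the documented behaviour of Google OAuth refresh tokens for apps whose publishing status is still "Testing" (they expire after 7 days), so switching the app to "Production" in the Google Cloud console may be the actual fix. Independently, PyDrive can cache and refresh credentials itself; here is a minimal sketch using its documented hooks, keeping the creds.json name from the question:
gauth = GoogleAuth()
gauth.LoadCredentialsFile("creds.json")  # reuse cached credentials if present
if gauth.credentials is None:
    gauth.CommandLineAuth()  # first run: interactive auth
elif gauth.access_token_expired:
    gauth.Refresh()  # silently refresh the access token
else:
    gauth.Authorize()
gauth.SaveCredentialsFile("creds.json")  # persist for the next run
drive = GoogleDrive(gauth)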

Can't create S3 session

I am trying to download some data from our S3 server and I'm not able to create the session.
I am running the following code:
session = boto3.Session(
    aws_access_key_id = "###########",
    aws_secret_access_key = "###########",
)
s3 = session.resource('s3')
bucket = s3.Bucket('########')
file_names = []
but it spits out the following error:
DataNotFoundError: Unable to load data for: sdk-default-configuration
These are my imports:
import pandas as pd
import mysql.connector
import boto3
import s3fs
import botocore
import os
My installed versions of boto3 and botocore are boto3-1.20.44 and botocore-1.23.44.
I have tried downloading different versions of boto3 and botocore with no success...
The problem appears to be in your session constructor:
boto3.Session(aws_access_key_id=a, aws_secret_access_key=b)
It should instead read as follows, per the documentation:
boto3.session.Session(aws_access_key_id=a, aws_secret_access_key=b)
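If the constructor change alone does not clear it, note that DataNotFoundError is raised when botocore cannot find one of its bundled data files (here sdk-default-configuration), which usually means the boto3 and botocore installs have drifted out of sync. A low-risk next step is a forced reinstall of both, pinned to the matching versions from the question:
pip install --force-reinstall boto3==1.20.44 botocore==1.23.44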

What is the JSON file I need to read?

I need to download satellite images using Python. I found some code on GitHub, but I did not understand what is happening at the lines below. Please help me understand what exactly they do.
Visit https://github.com/kscottz/PythonFromSpace/blob/master/TheBasics.ipynb
import sys
import os
import json
import scipy
import urllib
import datetime
import urllib3
import rasterio
import subprocess
import numpy as np
import pandas as pd
import seaborn as sns
from osgeo import gdal
from planet import api
from planet.api import filters
from traitlets import link
import rasterio.tools.mask as rio_mask
from shapely.geometry import mapping, shape
from IPython.display import display, Image, HTML
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
urllib3.disable_warnings()
from ipyleaflet import (
    Map,
    Marker,
    TileLayer, ImageOverlay,
    Polyline, Polygon, Rectangle, Circle, CircleMarker,
    GeoJSON,
    DrawControl
)
%matplotlib inline
# will pick up api_key via environment variable PL_API_KEY
# but can be specified using `api_key` named argument
api_keys = json.load(open("apikeys.json",'r'))
client = api.ClientV1(api_key=api_keys["PLANET_API_KEY"])
# Make a slippy map to get GeoJSON
api_keys = json.load(open("apikeys.json",'r'))
client = api.ClientV1(api_key=api_keys["PLANET_API_KEY"])
What is the meaning of these two lines? What file should I upload as apikeys.json?
You should follow this link to get an API Key.
https://support.planet.com/hc/en-us/articles/212318178-What-is-my-API-key-
apikeys.json is a JSON file with the following content:
{"PLANET_API_KEY":"<Some API Key here>"}
json.load(...) parses this JSON file into a dictionary, so api_keys["PLANET_API_KEY"] returns the API key string.
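Putting the two pieces together, a minimal sketch of the loading step (same file name and key as in the notebook, with a context manager so the file handle is closed):
import json
from planet import api
# apikeys.json -> {"PLANET_API_KEY": "<Some API Key here>"}
with open("apikeys.json") as f:
    api_keys = json.load(f)  # returns a plain dict
client = api.ClientV1(api_key=api_keys["PLANET_API_KEY"])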

Issue importing another class in the same directory

I'm trying to import a class Parser from a file parser.py in the same directory. However, I keep getting the following error whenever I try to instantiate a Parser object:
File "bot.py", line 4, in <module>
from parser import Parser
ImportError: cannot import name Parser
I'm instantiating the object in the following way:
parser = Parser()
The same script works on my friend's setup, so I'm not sure if it's a problem with the code. My version of Python is 2.7.13. I'm including the imports below:
bot.py:
from parser import Parser
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters
import logging
import csv
import os, sys, types
from random import randint
import urllib2
import json
import string
from bs4 import BeautifulSoup
import requests
parser.py:
import re
class Parser:
    .....
    ## parser stuff
Did you try a relative import, like
from .parser import Parser
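One more thing worth ruling out, though it is only a guess: CPython 2 ships a built-in parser module, and built-in modules are found before anything on sys.path, so from parser import Parser can silently pick up the interpreter's module, which has no Parser class. A quick diagnostic:
import parser
# A built-in module has no __file__; your own parser.py would print its path.
print(getattr(parser, '__file__', 'built-in module'))
If it prints 'built-in module', renaming your file (say, to my_parser.py) and importing from my_parser import Parser sidesteps the clash.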

Engine object has no attribute cursor

I am running this code to import multiple files into a table in MySQL, and it returns "Engine object has no attribute cursor". I found many similar topics; the answers are about the pandas version, but I use pandas 0.19, so that might not be the reason. Could anyone help? Or is there any other way to import multiple text files into MySQL?
import MySQLdb
import os
import glob
import pandas
import datetime
import MySQLdb as mdb
import requests
from sqlalchemy import create_engine
engine =create_engine("mysql://vn_user:thientai3004#127.0.0.1/vietnam_stock?charset=utf8")
indir='E:\DataExport'
os.chdir(indir)
fileList=glob.glob('*.txt')
dfList = []
colnames= ['Ticker','Date','Open','High','Low','Close','Volume']
for filename in fileList:
    print(filename)
    df = pandas.read_csv(filename, header=0)
    df.to_sql('daily_price', engine, if_exists='append', index=False)
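Two observations, both guesses rather than a confirmed fix. First, the connection URL uses # between the password and the host, while SQLAlchemy URLs need the user:password@host form; everything after # is read as a URL fragment, so the engine may not be pointing where you think. Second, since you already have a working engine and loop, you can cut per-file overhead by concatenating once and writing once. A sketch with both changes, reusing the credentials and path from the question:
import glob
import pandas
from sqlalchemy import create_engine
# '@' separates credentials from the host; '#' would start a URL fragment.
engine = create_engine("mysql://vn_user:thientai3004@127.0.0.1/vietnam_stock?charset=utf8")
frames = [pandas.read_csv(f, header=0) for f in glob.glob(r'E:\DataExport\*.txt')]
# One concat and one to_sql instead of one INSERT batch per file.
pandas.concat(frames, ignore_index=True).to_sql('daily_price', engine, if_exists='append', index=False)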
