Using the Benthic Golden6 "ImpExp6" tool, I can successfully import 122K+ rows of data from a CSV file.
When I try to automate the same load via a .py script, as I have done with other, smaller data sets, I hit an "exceeded tablespace" error. For test purposes I dropped everything owned by the user to maximize the available space, but I still receive the error; yet the import tool loads the same 122K rows with no problem.
If I can import the file manually with no issues, shouldn't I also be able to do so from a Python script? Below is the script I am using.
Note: if I read the file with lines = [] / for line in reader: lines.append(line), it appends 5556 rows of data, versus the nothing I am getting with the script below. I am using Python 2.7.
import cx_Oracle
import csv

connection = cx_Oracle.connect('myinfo')
cursor = connection.cursor()

# Read every row of the CSV into a list of lists
L = []
reader = csv.reader(open("myfile.csv", "r"))
for row in reader:
    L.append(row)

cursor.execute("ALTER SESSION SET NLS_DATE_FORMAT = 'MM/DD/YYYY'")

# One INSERT is executed for every row in L
cursor.executemany("INSERT INTO BI_VANTAGE_TEST VALUES(:25,:24,:23,:22,:21,:20,:19,:18,:17,:16,:15,:14,:13,:12,:11,:10,:9,:8,:7,:6,:5,:4,:3,:2,:1)", L)

connection.commit()  # note the parentheses; connection.commit alone never actually commits
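One thing that might be worth trying before falling back to the UI automation below is committing in batches, in case the failure comes from undo or temp space rather than from the data tablespace itself. This is only a sketch, reusing the same table and statement as above with a made-up batch size:

import csv
import cx_Oracle

connection = cx_Oracle.connect('myinfo')
cursor = connection.cursor()
cursor.execute("ALTER SESSION SET NLS_DATE_FORMAT = 'MM/DD/YYYY'")

insert_sql = ("INSERT INTO BI_VANTAGE_TEST VALUES("
              ":25,:24,:23,:22,:21,:20,:19,:18,:17,:16,:15,:14,:13,"
              ":12,:11,:10,:9,:8,:7,:6,:5,:4,:3,:2,:1)")

batch = []
reader = csv.reader(open("myfile.csv", "r"))
for row in reader:
    batch.append(row)
    if len(batch) == 10000:          # made-up batch size; tune as needed
        cursor.executemany(insert_sql, batch)
        connection.commit()          # commit each batch so no single transaction grows too large
        batch = []
if batch:                            # flush the final partial batch
    cursor.executemany(insert_sql, batch)
    connection.commit()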
I was able to automate this import using an alternate method (note: the keystroke commands are specific to the steps I needed to complete within the tool I was using).
from pywinauto.application import Application
import pyautogui
import time  # needed for the sleep below

# Launch the import tool (raw string so the backslash is not treated as an escape)
app = Application().start(r"C:\myprogram.exe")

# Drive the tool's import dialog with keystrokes
pyautogui.typewrite(['enter', 'right', 'tab'])
pyautogui.typewrite('myfile.txt')        # source file
pyautogui.typewrite(['tab'])
pyautogui.typewrite('myoracletbl')       # target Oracle table
pyautogui.typewrite(['tab', 'tab', 'tab'])
pyautogui.typewrite(['enter'])
pyautogui.typewrite(['enter'])

time.sleep(30)  # placeholder: wait long enough for the import to finish
app.kill()      # Kill_() in older pywinauto versions
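A fixed sleep works, but if the import time varies you could instead wait on the tool's window state. A rough sketch; pywinauto's window() and wait_not() calls exist, but the window title used here is purely hypothetical:

from pywinauto.application import Application

app = Application().start(r"C:\myprogram.exe")
# ... drive the import dialog with pyautogui as above ...

# Block until the (hypothetical) import progress window disappears, up to 10 minutes,
# instead of guessing a sleep duration.
app.window(title_re="Import.*").wait_not('visible', timeout=600)
app.kill()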
I have a Python script to insert a CSV file into a MongoDB collection:
import pymongo
import pandas as pd

client = pymongo.MongoClient("mongodb://localhost:27017")

# Read the CSV and convert each row to a dict
df = pd.read_csv("iris.csv")
data = df.to_dict(orient="records")

db = client["Database name"]
db.CollectionName.insert_many(data)
Here all the columns of the CSV file get inserted into the Mongo collection. How can I insert only specific columns of the CSV file into the collection? What changes do I need to make to the existing code?
Also, let's say the database already exists in my Mongo instance. Will db = client["Database name"] still work if the database is already present?
Have you checked out pymongoarrow? The latest release has write support, so you can import a CSV file into MongoDB; see the release notes and documentation. You can also use mongoimport to import a CSV file (documentation is here), but I can't see any way to exclude fields the way you can with pymongoarrow.
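If you want to stay with the pandas approach from the question, one simple option is to select the columns before converting to dicts. A minimal sketch; the column names here are placeholders for whatever columns you actually want to keep:

import pymongo
import pandas as pd

client = pymongo.MongoClient("mongodb://localhost:27017")
db = client["Database name"]

df = pd.read_csv("iris.csv")

# Keep only the wanted columns (placeholder names) before converting to dicts
wanted = ["sepal_length", "species"]
data = df[wanted].to_dict(orient="records")

db.CollectionName.insert_many(data)

As for the second part of the question: client["Database name"] just returns a handle and behaves the same whether or not the database already exists; MongoDB creates databases and collections lazily on the first write.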
I am trying to set up a simple playground environment for the Flink Python Table API. The jobs I ultimately want to write will feed off a Kafka or Kinesis queue, but that makes playing around with ideas (and tests) very difficult.
I can happily load from a CSV and process it in batch mode, but I cannot get it to work in streaming mode. How would I do something similar in a StreamExecutionEnvironment (primarily so I can play around with windows)?
I understand that I need to get the system to use event time (because processing time would all arrive at once), but I cannot find any way to set this up. In principle I should be able to mark one of the CSV columns as the event time, but it is not clear from the docs how to do this (or whether it is possible).
To get the batch execution tests running I used the code below, which reads from an input.csv and outputs to an output.csv.
from pyflink.dataset import ExecutionEnvironment
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import (
    TableConfig,
    DataTypes,
    BatchTableEnvironment,
    StreamTableEnvironment,
)
from pyflink.table.descriptors import Schema, Csv, OldCsv, FileSystem
from pyflink.table.window import Tumble
from pathlib import Path

# Batch table environment
exec_env = ExecutionEnvironment.get_execution_environment()
exec_env.set_parallelism(1)
t_config = TableConfig()
t_env = BatchTableEnvironment.create(exec_env, t_config)

# Remove any previous output file so the sink can recreate it
root = Path(__file__).parent.resolve()
out_path = root / "output.csv"
try:
    out_path.unlink()
except FileNotFoundError:
    pass

# Source table backed by input.csv
(
    t_env.connect(FileSystem().path(str(root / "input.csv")))
    .with_format(Csv())
    .with_schema(
        Schema().field("time", DataTypes.TIMESTAMP(3)).field("word", DataTypes.STRING())
    )
    .create_temporary_table("mySource")
)

# Sink table backed by output.csv
(
    t_env.connect(FileSystem().path(str(out_path)))
    .with_format(Csv())
    .with_schema(
        Schema().field("word", DataTypes.STRING()).field("count", DataTypes.BIGINT())
    )
    .create_temporary_table("mySink")
)

# Count words and keep only those appearing more than once
(
    t_env.from_path("mySource")
    .group_by("word")
    .select("word, count(1) as count")
    .filter("count > 1")
    .insert_into("mySink")
)
t_env.execute("tutorial_job")
and input.csv is
2000-01-01 00:00:00.000000000,james
2000-01-01 00:00:00.000000000,james
2002-01-01 00:00:00.000000000,steve
So my question is: how can I set this up so that it reads from the same CSV, but uses the first column as the event time and allows me to write code like:
(
    t_env.from_path("mySource")
    .window(Tumble.over("10.minutes").on("time").alias("w"))
    .group_by("w, word")
    .select("w, word, count(1) as count")
    .filter("count > 1")
    .insert_into("mySink")
)
Any help would be appreciated; I can't work this out from the docs. I am using Python 3.7 and Flink 1.11.1.
If you use the descriptor API, you can mark a field as the event-time field through the schema (Rowtime comes from pyflink.table.descriptors):
.with_schema(  # declare the schema of the table
    Schema()
    .field("rowtime", DataTypes.TIMESTAMP())
    .rowtime(
        Rowtime()
        .timestamps_from_field("time")
        .watermarks_periodic_bounded(60000)
    )
    .field("a", DataTypes.STRING())
    .field("b", DataTypes.STRING())
    .field("c", DataTypes.STRING())
)
But I would still recommend using DDL instead: on the one hand it is easier to use, and on the other hand there are some bugs in the existing descriptor API; the community is discussing refactoring it.
Have you tried using watermark strategies? As mentioned here, you need a watermark strategy in order to use event time. For the PyFlink case, I personally think it is easier to declare it in DDL, along the lines of the example below.
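A minimal sketch of what that could look like for the CSV source in the question; the connector options assume Flink 1.11's filesystem connector, the path is a placeholder, and the one-minute watermark delay is an arbitrary choice:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode()
        .use_blink_planner()
        .build(),
)

# Event-time source: the first CSV column becomes the rowtime attribute,
# with a 1-minute bounded-out-of-orderness watermark (arbitrary delay).
t_env.execute_sql("""
    CREATE TABLE mySource (
        `time` TIMESTAMP(3),
        word STRING,
        WATERMARK FOR `time` AS `time` - INTERVAL '1' MINUTE
    ) WITH (
        'connector' = 'filesystem',
        'path' = 'input.csv',
        'format' = 'csv'
    )
""")

With the table declared this way, the Tumble.over("10.minutes").on("time") window from the question should see time as a proper event-time attribute.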
I want to connect to an Oracle DB from Python, run queries, and create Excel or CSV reports from the results. I have never tried this before and have not seen anyone around me do something like it; do you have any recommendations or ideas for this case?
Regards
You can connect to an Oracle DB from Python with the cx_Oracle library, using the syntax below for the connection string. Note that, to start with, your connection_oracle_textfile.txt file and the .py file containing your Python code must be in the same folder.
connection_oracle_textfile.txt -> username/password@HOST:PORT/SERVICE_NAME (you can find all of these except the username and password in the tnsnames.ora file)
import cx_Oracle
import pandas as pd

def get_oracle_table_from_dbm(sql_text):
    if 'connection_oracle' not in globals():
        print('Connection does not exist, trying to connect...')
        # Read the connection string (username/password@HOST:PORT/SERVICE_NAME) from the text file
        with open('connection_oracle_textfile.txt', 'r') as f:
            fx = f.read().strip()
        global connection_oracle
        connection_oracle = cx_Oracle.connect(fx)
        print('Connection established!')
    else:
        print('Already have a connection, just fetching data.')
    return pd.read_sql(sql_text, con=connection_oracle)

df = get_oracle_table_from_dbm('select * from dual')
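From there, producing the actual report is a one-liner with pandas; the file names are just examples, and to_excel additionally needs an Excel writer package such as openpyxl or xlsxwriter installed:

df.to_csv('report.csv', index=False)
df.to_excel('report.xlsx', index=False)   # requires openpyxl or xlsxwriter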
There are other Stack Overflow answers on this, e.g. How to export a table to csv or excel format. Remember to tune cursor.arraysize.
You don't strictly need the pandas library to create CSV files, though you may want it for future data analysis.
The cx_Oracle documentation discusses installation, connection, and querying, amongst other topics.
If you want to read from a CSV file, see Loading CSV Files into Oracle Database.
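As a rough sketch of the non-pandas route, with a placeholder connect string and table name (cursor.arraysize just controls how many rows are fetched per round trip):

import csv
import cx_Oracle

connection = cx_Oracle.connect("username/password@HOST:PORT/SERVICE_NAME")  # placeholder credentials
cursor = connection.cursor()
cursor.arraysize = 1000                      # fetch 1000 rows per round trip
cursor.execute("SELECT * FROM my_table")     # placeholder table name

with open("report.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])  # header row from the column names
    writer.writerows(cursor)                 # the cursor is iterable, so rows stream straight to the file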
Hi everyone,
I'm trying to wrap my head around Microsoft SQL Server 2017 and Python scripts.
In general, I'm trying to scrape a table from a website (using bs4), store it in a pandas DataFrame, and then simply put the results into a temp SQL table.
I entered the following code (I'm skipping parts of the Python script because it works fine when run from Python itself; keep in mind I'm calling it from Microsoft SQL Server 2017):
CREATE PROC OTC
AS
BEGIN
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import bs4 as bs
import pandas as pd
import requests
....
r = requests.get(url, verify=False)
html = r.text
soup = bs.BeautifulSoup(html, "html.parser")
data_date = str(soup.find(id="ctl00_SPWebPartManager1_g_4be2cf24_5a47_472d_a6ab_4248c8eb10eb_ctl00_lDate").contents)
t_tab1 = soup.find(id="ctl00_SPWebPartManager1_g_4be2cf24_5a47_472d_a6ab_4248c8eb10eb_ctl00_NiaROGrid1_DataGrid1")
df = parse_html_table(1, t_tab1)
print(df)
OutputDataSet = df
'
END
I tried the Microsoft tutorials and simply couldn't understand how to handle the inputs/outputs to get the result as a SQL table.
Furthermore, I get the error:
import bs4 as bs
ImportError: No module named 'bs4'
I'm obviously missing a lot here.
What do I need to add to the SQL code?
Does SQL Server even support bs4, or only pandas?
If not, do I need to find another solution, like writing the data out as a CSV?
Thanks for any help or advice you can offer.
To use pip to install a Python package on SQL Server 2017:
On the server, open a command prompt as administrator.
Then cd to {instance directory}\PYTHON_SERVICES\Scripts
(for example: C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\PYTHON_SERVICES\Scripts).
Then execute pip install {package name} (for the error in the question, that would be pip install beautifulsoup4).
Once you have the necessary package(s) installed and the script executes successfully, simply setting the variable OutputDataSet to a pandas data frame will cause the contents of that data frame to be returned as a result set from the stored procedure.
If you want to capture that result set in a table (perhaps a temporary table), you can use INSERT...EXEC (e.g. INSERT MyTable(Col1, Col2) EXEC sp_execute_external_script ...).
I am trying to retrieve data from the National Stock Exchange for a given scrip name.
I have already created a database named "NSE" in MySQL, but did not create any table.
I am using the following script to retrieve per-minute data from the NSE website (let's say I want to retrieve data for the scrip (stock) 'CYIENT').
from alpha_vantage.timeseries import TimeSeries
import matplotlib.pyplot as plt
import sys
import pymysql

# database connection
conn = pymysql.connect(host="localhost", user="root", passwd="pwd123", database="NSE")
c = conn.cursor()

your_key = "WLLS3TVOG22C6P9J"

def stockchart(symbol):
    ts = TimeSeries(key=your_key, output_format='pandas')
    data, meta_data = ts.get_intraday(symbol=symbol, interval='1min', outputsize='full')
    sql.write_frame(data, con=conn, name='NSE', if_exists='replace', flavor='mysql')
    print(data.head())
    data['close'].plot()
    plt.title('Stock chart')
    plt.show()

symbol = input("Enter symbol name:")
stockchart(symbol)

# committing the connection then closing it
conn.commit()
conn.close()
On running the above script I get the following error:
'sql' is not defined
Also, I am not sure whether the above script will create a table in NSE for the (user-input) stock 'CYIENT'.
Before answering: I hope the code is a mock, not the real thing; otherwise I'd suggest changing your credentials.
Now, I believe you are trying to use pandas.io.sql.write_frame (available in pandas <= 0.13.1). However, you forgot to import the module, so the interpreter doesn't recognize the name sql. To fix it, just add
from pandas.io import sql
to the beginning of the script.
Note the parameters you use in the function call: with if_exists='replace', the table NSE will be dropped and recreated every time you run the function, and it will contain whatever data contains at that point.
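For what it's worth, write_frame only exists in very old pandas versions. On a modern pandas the equivalent is DataFrame.to_sql with an SQLAlchemy engine; a rough sketch, reusing the credentials from the question and assuming you want one table per scrip (which also answers the table-creation question, since to_sql creates the table if it does not exist):

from sqlalchemy import create_engine

# Build an engine that uses the pymysql driver already imported in the question
engine = create_engine("mysql+pymysql://root:pwd123@localhost/NSE")

# Inside stockchart(): write each scrip to its own table, replacing it on re-run
data.to_sql(name=symbol, con=engine, if_exists="replace", index=True)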