How to specify length prefix in bcp command? - python

I'm using the bcp tool to import a CSV into a SQL Server table, executing the bcp command through Python's subprocess. My sample bcp command is like below:
bcp someDatabase.dbo.sometable IN myData.csv -n -t , -r \n -S mysqlserver.com -U myusername -P 'mypassword'
The command executes and says
0 rows copied.
Even if I remove the -t or -n option, the message is still the same. I read in the SQL Server docs that there is something called a length prefix (used when bcp runs in -n (native) mode).
How can I specify that length prefix with the bcp command?
My goal is to import the CSV into a SQL Server table using the bcp tool. I first create my table according to the data in the CSV file, and I don't create a format file for bcp. I want all my data to be inserted correctly (according to the data types I have specified in my table).

If it is a CSV file then do not use the -n, -t or -r options. Use -e errorFileName to catch the error(s) you may be encountering. You can then take the appropriate steps.

It is a very common practice with ETL tasks to first load text files into a "load" table that has all varchar/char data types. This avoids any possible implied data conversion errors that are more difficult/time-consuming to troubleshoot via BCP. Just pass the character data in the text file into character datatype columns in SQL Server. Then you can move data from the "load" table into your final destination table. This will allow you to use the MUCH more functional T-SQL commands to handle transformation of data types. Do not force BCP/SQL Server to transform your data-types for you by going from text file directly into your final table via BCP.
I would also suggest visually inspecting your incoming data file to confirm it is formatted as specified. I often see mix-ups between \n and \r\n for the line terminator.
Last, when loading the data, you should also use the -e option as Neeraj has stated. This will capture "data" errors (it does not report command/syntax errors, just data/formatting errors). Since your incoming file is an ASCII text file, you DO want to use the -c option for loading into the all-varchar "load" table.
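Putting the advice above together, a minimal Python sketch of the subprocess call might look like this. The table, server, and credential values are the placeholders from the question, and the error file name is an assumption:

```python
import subprocess

def build_bcp_import_cmd(table, csv_path, server, user, password, error_file):
    """Build an argument list for importing a text CSV with bcp.

    -c uses character mode (appropriate for text files, instead of native -n);
    -e writes rejected rows and data errors to error_file for inspection.
    """
    return [
        "bcp", table, "in", csv_path,
        "-c", "-t", ",", "-r", "\n",
        "-S", server, "-U", user, "-P", password,
        "-e", error_file,
    ]

cmd = build_bcp_import_cmd(
    "someDatabase.dbo.sometable", "myData.csv",
    "mysqlserver.com", "myusername", "mypassword", "bcp_errors.txt",
)
# subprocess.call(cmd) would run it; inspect bcp_errors.txt on failure
```

Passing the command as a list (rather than one shell string) also avoids quoting problems with the password.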


Getting all pods for a container, storing them in text files and then using those files as args in single command

The screenshot (not reproduced here) shows the list of all Kubernetes pods I need to save to a text file (or multiple text files).
I need a command which:
stores multiple pod logs into text files (or one single text file) - so far I have this command, which stores one pod's logs into one text file, but this is not enough since I would have to spell out each pod name individually:
$ kubectl logs ipt-prodcat-db-kp-kkng2 -n ho-it-sst4-i-ie-enf > latest.txt
I then need the command to send these files to a Python script that checks them for various strings - so far this works, but it would be extremely useful if it could be combined with the command above:
python CheckLogs.py latest.txt latest2.txt
Is it possible to do either (1) or both (1) and (2) in a single command?
The simplest solution is to create a shell script that does exactly what you are looking for:
#!/bin/sh
FILE="text1.txt"
for p in $(kubectl get pods -o jsonpath="{.items[*].metadata.name}"); do
    kubectl logs $p >> $FILE
done
With this script you will get the logs of all the pods in your namespace appended to the file named in FILE.
You can even add python CheckLogs.py "$FILE" as a final line of the script.
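If you would rather drive the whole thing from Python instead of a shell script, a sketch along these lines should work. The namespace and checker script names are the ones from the question; it assumes kubectl is on PATH:

```python
import subprocess

def get_pod_names(namespace):
    """List pod names in a namespace via kubectl."""
    out = subprocess.check_output(
        ["kubectl", "get", "pods", "-n", namespace,
         "-o", "jsonpath={.items[*].metadata.name}"],
        text=True)
    return out.split()

def log_file_for(pod):
    """Text file each pod's logs get written to."""
    return pod + ".log"

def dump_logs(namespace):
    """Write each pod's logs to its own file and return the file names."""
    files = []
    for pod in get_pod_names(namespace):
        fname = log_file_for(pod)
        with open(fname, "w") as f:
            subprocess.run(["kubectl", "logs", pod, "-n", namespace],
                           stdout=f, check=True)
        files.append(fname)
    return files

# Hand every log file to the checker in one go:
# subprocess.run(["python", "CheckLogs.py", *dump_logs("ho-it-sst4-i-ie-enf")])
```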
There are various tools that could help here. Some are commonly available, and some are shortcuts for which I have written my own scripts.
xargs: This is used to run multiple command lines in various combinations, based on the input. For instance, if you piped text output containing three lines, you could execute three commands using the content of those three lines. There are many possible variations.
arg1: This is a shortcut that I wrote that simply takes stdin and produces the first argument. The simplest form of this would just be "awk '{print $1}'", but I designed mine to take optional parameters, for instance, to override the argument number, separator, and to take a filename instead. I often use "-i{}" to specify a substitution marker for the value.
skipfirstline: Another shortcut I wrote, that simply takes some multiline text input and omits the first line. It is just "sed -n '1!p'".
head/tail: These print some of the first or last lines of stdin. Interesting forms of this take negative numbers. Read the man page and experiment.
sed: Often a part of my pipelines, for making inline replacements of text.

Python-sqlite-how to save a database output to a text file

I have a Python program which uses SQLite. The program lists first name, last name, and type of pet, and the database is saved as a DB file called pets.db. I want to be able to convert this database into text. To do this, I tried to use a dump statement at the sqlite3 command prompt. Here is my output:
sqlite> .output file location of pets.db
Usage: .output FILE
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
COMMIT;
sqlite>.exit
However, pets.txt does not exist when I type
dir pets.txt /s /p
in command prompt.
Any suggestions? I used http://www.sqlitetutorial.net/sqlite-dump/ as a guide.
Your .output command is slightly off. Based on your comment, it sounds like you're literally typing .output file <location of pets.db>. That isn't how the command works.
First off, open the database you want to dump with the command sqlite3 pets.db. This will open your database. You can check that you have the right database by using the .tables command. If you see tables listed, you know you've opened it correctly; if not, the command won't display anything.
Once you've opened the file, .output <filename>.txt will set the output to the specified file. Now you can use the .dump command. It'll take a moment for the driver to write the whole database if it's somewhat large.
Once the file is finished writing, you can exit with .exit.
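If the shell keeps tripping you up, the same dump can be produced from Python itself with sqlite3's iterdump(). This sketch builds a throwaway in-memory database as a stand-in for pets.db; swap ":memory:" for the real file path:

```python
import sqlite3

# Stand-in for pets.db; use sqlite3.connect("pets.db") for the real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (first TEXT, last TEXT, pet TEXT)")
conn.execute("INSERT INTO pets VALUES ('Ada', 'Lovelace', 'cat')")
conn.commit()

# iterdump() yields the same SQL text that the shell's .dump command prints.
with open("pets.txt", "w") as f:
    for line in conn.iterdump():
        f.write(line + "\n")
```

Afterwards pets.txt contains the schema and INSERT statements as plain text.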

Using Postgres's COPY FROM file query in Python without writing to a temporary file

I need to load data from some source data sources to a Postgres database.
To do this task, I first write the data to a temporary CSV file and then load it into the Postgres database with a COPY FROM query. I do all of this in Python.
The code looks like this:
table_name = 'products'
temp_file = "'C:\\Users\\username\\tempfile.csv'"
db_conn = psycopg2.connect(host, port, user, password, database)
cursor = db_conn.cursor()
query = """COPY """ + table_name + """ FROM """ + temp_file + " WITH NULL AS ''; """
cursor.execute(query)
I want to avoid writing to the intermediate file. Instead, I would like to write to a Python object and then load the data into the Postgres database using the COPY FROM method.
I am aware of the technique of using psycopg2's copy_from method, which copies data from a StringIO object to the Postgres database. However, I cannot use psycopg2 for a reason, and hence I don't want my COPY FROM task to depend on that library. I want it to be a Postgres query which can be run by any other Postgres driver as well.
Please advise a better way of doing this without writing to an intermediate file.
You could call the psql command-line tool from your script (e.g. using subprocess.call) and leverage its \copy command, piping the output of one instance to the input of another, thereby avoiding a temp file, i.e.:
psql -X -h from_host -U user -c "\copy from_table to stdout" | psql -X -h to_host -U user -c "\copy to_table from stdin"
This assumes the table exists in the destination database. If not, a separate command would first need to create it.
Also, note that one caveat of this method is that errors from the first psql call can get swallowed by the piping process.
psycopg2 has integrated support for the COPY wire-protocol, allowing you to use COPY ... FROM STDIN / COPY ... TO STDOUT.
See Using COPY TO and COPY FROM in the psycopg2 docs.
Since you say you can't use psycopg2, you're out of luck. Drivers must understand COPY TO STDOUT / COPY FROM STDIN in order to use them, or must provide a way to write raw data to the socket so you can hijack the driver's network socket and implement the COPY protocol yourself. Driver specific code is absolutely required for this, it is not possible to simply use the DB-API.
So khampson's suggestion, while usually a really bad idea, seems to be your only alternative.
(I'm posting this mostly to make sure that other people who find this answer who don't have restrictions against using psycopg2 do the sane thing.)
If you must use psql, please:
Use the subprocess module with the Popen constructor
Pass -qAtX and -v ON_ERROR_STOP=1 to psql to get sane behaviour for batching.
Use the array form command, e.g. ['psql', '-v', 'ON_ERROR_STOP=1', '-qAtX', '-c', '\copy mytable from stdin'], rather than using a shell.
Write to psql's stdin, then close it, and wait for psql to finish.
Remember to trap exceptions thrown on command failure. Let subprocess capture stderr and wrap it in the exception object.
It's safer, cleaner, and easier to get right than the old-style os.popen2 etc.
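Those steps can be sketched in Python roughly as follows. The host, user, and table names are placeholders, and the CSV format options are one reasonable choice, not the only one:

```python
import subprocess

def psql_copy_cmd(table, host, user):
    """psql invocation that reads CSV rows for `table` from stdin."""
    return ["psql", "-v", "ON_ERROR_STOP=1", "-qAtX",
            "-h", host, "-U", user,
            "-c", "\\copy {} from stdin with (format csv)".format(table)]

def copy_via_psql(table, csv_text, host, user):
    """Stream csv_text into the table through psql, with no temp file."""
    proc = subprocess.Popen(psql_copy_cmd(table, host, user),
                            stdin=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    _, err = proc.communicate(csv_text)          # write stdin, wait for exit
    if proc.returncode != 0:
        raise RuntimeError("psql failed: " + err)

# copy_via_psql("products", "1,widget\n2,gadget\n", "dbhost", "dbuser")
```

communicate() handles closing stdin and waiting, and any stderr text is wrapped into the raised exception as suggested above.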

Export & Map CSV output to MySQL table using Python

I have a multiple-clients-to-single-server bidirectional iperf set-up for network monitoring. The iperf server runs well and displays output in CSV format, driven by cron jobs on the client end.
I wish to write a Python script to automate the process of mapping these CSV outputs to a MySQL database, which in turn would be updated and saved at regular intervals without any human intervention.
I am using a Ubuntu 13.10 machine as the iperf server. Following is a sample CSV output that I get. This is not being stored to a file, just being displayed on screen.
s1:~$ iperf -s -y C
20140422105054,172.16.10.76,41065,172.16.10.65,5001,6,0.0-20.0,73138176,29215083
20140422105054,172.16.10.76,5001,172.16.10.65,56254,4,0.0-20.0,46350336,18502933
20140422105100,172.16.10.76,54550,172.16.10.50,5001,8,0.0-20.0,67895296,27129408
20140422105100,172.16.10.76,5001,172.16.10.50,58447,5,0.0-20.1,50937856,20292796
20140422105553,172.16.10.76,5001,172.16.10.65,47382,7,0.0-20.1,51118080,20358083
20140422105553,172.16.10.76,41067,172.16.10.65,5001,5,0.0-20.1,76677120,30524007
20140422105600,172.16.10.76,5001,172.16.10.50,40734,4,0.0-20.0,57606144,23001066
20140422105600,172.16.10.76,54552,172.16.10.50,5001,8,0.0-20.0,70123520,28019115
20140422110053,172.16.10.76,41070,172.16.10.65,5001,5,0.0-20.1,63438848,25284066
20140422110053,172.16.10.76,5001,172.16.10.65,46462,6,0.0-20.1,11321344,4497094
The fields I want to map them to are: timestamp, server_ip, server_port, client_ip, client_port, tag_id, interval, transferred, bandwidth
I want to map this CSV output periodically to a MySQL database, for which I understand I would have to write a Python script (run from a cron job) that queries and stores into the MySQL database. I am a beginner at Python scripting and database queries.
I went through another discussion on Server Fault at [https://serverfault.com/questions/566737/iperf-csv-output-format]; and would like to build my query based on this.
Generate SQL script, then run it
If you do not want to use a complex solution like sqlalchemy, the following approach is possible:
convert your csv data into an SQL script
use the mysql command-line tool to run this script
Before you do it the first time, be sure you create needed database structure in the database (this I leave to you).
My sample below uses (just for my convenience) the package docopt, so you need to install it:
$ pip install docopt
CSV to SQL script conversion utility
csv2sql.py:
"""
Usage:
csv2sql.py [--table <tablename>] <csvfile>
Options:
--table <tablename> Name of table in database to import into [default: mytable]
Convert csv file with iperf data into sql script for importing
those data into MySQL database.
"""
from csv import DictReader
from docopt import docopt
if __name__ == "__main__":
args = docopt(__doc__)
fname = args["<csvfile>"]
tablename = args["--table"]
headers = ["timestamp",
"server_ip",
"server_port",
"client_ip",
"client_port",
"tag_id",
"interval",
"transferred",
"bandwidth"
]
sql = """insert into {tablename}
values ({timestamp},"{server_ip}",{server_port},"{client_ip}",{client_port},{tag_id},"{interval}",{transferred},{bandwidth});"""
with open(fname) as f:
reader = DictReader(f, headers, delimiter=",")
for rec in reader:
print(sql.format(tablename=tablename, **rec)) # python <= 2.6 will fail here
Convert CSV to SQL script
First, let the conversion utility introduce itself:
$ python csv2sql.py -h
Usage:
    csv2sql.py [--table <tablename>] <csvfile>

Options:
    --table <tablename>  Name of table in database to import into [default: mytable]

Convert csv file with iperf data into sql script for importing
those data into MySQL database.
Having your data in file data.csv:
$ python csv2sql.py data.csv
insert into mytable
values (20140422105054,"172.16.10.76",41065,"172.16.10.65",5001,6,"0.0-20.0",73138176,29215083);
insert into mytable
values (20140422105054,"172.16.10.76",5001,"172.16.10.65",56254,4,"0.0-20.0",46350336,18502933);
insert into mytable
values (20140422105100,"172.16.10.76",54550,"172.16.10.50",5001,8,"0.0-20.0",67895296,27129408);
insert into mytable
values (20140422105100,"172.16.10.76",5001,"172.16.10.50",58447,5,"0.0-20.1",50937856,20292796);
insert into mytable
values (20140422105553,"172.16.10.76",5001,"172.16.10.65",47382,7,"0.0-20.1",51118080,20358083);
insert into mytable
values (20140422105553,"172.16.10.76",41067,"172.16.10.65",5001,5,"0.0-20.1",76677120,30524007);
insert into mytable
values (20140422105600,"172.16.10.76",5001,"172.16.10.50",40734,4,"0.0-20.0",57606144,23001066);
insert into mytable
values (20140422105600,"172.16.10.76",54552,"172.16.10.50",5001,8,"0.0-20.0",70123520,28019115);
insert into mytable
values (20140422110053,"172.16.10.76",41070,"172.16.10.65",5001,5,"0.0-20.1",63438848,25284066);
insert into mytable
values (20140422110053,"172.16.10.76",5001,"172.16.10.65",46462,6,"0.0-20.1",11321344,4497094);
Put it all into file data.sql:
$ python csv2sql.py data.csv > data.sql
Apply data.sql to your MySQL database
And finally use the mysql command-line client (provided by MySQL) to do the import into the database:
$ mysql --user=username --password=password db_name < data.sql
If you plan using Python, then I would recommend using sqlalchemy
General approach is:
define class, which has all the attributes, you want to store
map all the properties of the class to database columns and types
read your data from csv (using e.g. the csv module), for each row create a corresponding object of the class prepared before, and let it be stored.
The sqlalchemy documentation will give you more details and instructions; your requirement seems rather easy.
Other option is to find out an existing csv import tool, some are already available with MySQL, there are plenty of others too.
This probably is not the kind of answer you are looking for, but if you learn a little sqlite3 (a native Python module - "import sqlite3") by doing a basic tutorial online, you will realize that your problem is not at all difficult to solve. Then just use a standard timer, such as time.sleep() to repeat the procedure.
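A minimal sketch of that sqlite3 route, using the column names from the question. The table name and the parameterized INSERT are my choices; parameter placeholders also avoid the string-formatting risks of building SQL by hand:

```python
import csv
import io
import sqlite3

FIELDS = ("timestamp", "server_ip", "server_port", "client_ip",
          "client_port", "tag_id", "interval", "transferred", "bandwidth")

def load_iperf_csv(conn, csv_text):
    """Insert iperf CSV lines using parameterized queries."""
    conn.execute("CREATE TABLE IF NOT EXISTS iperf (%s)" % ", ".join(FIELDS))
    rows = csv.reader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO iperf VALUES (%s)" % ", ".join("?" * len(FIELDS)), rows)
    conn.commit()

sample = "20140422105054,172.16.10.76,41065,172.16.10.65,5001,6,0.0-20.0,73138176,29215083\n"
conn = sqlite3.connect(":memory:")  # use a file path for a persistent db
load_iperf_csv(conn, sample)
# For periodic collection, wrap the load in a loop with time.sleep(interval).
```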

Python subprocess calling bcp on .csv: 'unexpected eof'

I'm having an EOF issue when trying to bcp a .csv file I generated with Python's csv.writer. I've done lots of googling with no luck, so I turn to you helpful folks on SO.
Here's the error message (which is triggered on the subprocess.call() line):
Starting copy...
Unexpected EOF encountered in BCP data-file.
bcp copy in failed
Here's the code:
sel_str = 'select blahblahblah...'
result = engine.execute(sel_str) #engine is a SQLAlchemy engine instance
# write to disk temporarily to be able to bcp the results to the db temp table
with open('tempscratch.csv','wb') as temp_bcp_file:
    csvw = csv.writer(temp_bcp_file)
    for r in result:
        csvw.writerow(r)
    temp_bcp_file.flush()
# upload the temp scratch file
bcp_string = 'bcp tempdb..collection in #INFILE -c -U username -P password -S DSN'
bcp_string = string.replace(bcp_string,'#INFILE','tempscratch.csv')
result_code = subprocess.call(bcp_string, shell=True)
I looked at the tempscratch.csv file in a text editor and didn't see any weird EOF or other control characters. Moreover, I looked at other .csv files for comparison, and there doesn't seem to be a standardized EOF that bcp is looking for.
Also, yes this is hacky, pulling down a result set, writing it to disk and then reuploading it to the db with bcp. I have to do this because SQLAlchemy does not support multi-line statements (aka DDL and DML) in the same execute() command. Further, this connection is with a Sybase db, which does not support SQLAlchemy's wonderful ORM :( (which is why I'm using execute() in the first place)
From what I can tell, bcp's default field delimiter is the tab character '\t', while Python's csv writer defaults to the comma. Try this:
# write to disk temporarily to be able to bcp the results to the db temp table
with open('tempscratch.csv','wb') as temp_bcp_file:
    csvw = csv.writer(temp_bcp_file, delimiter='\t')
    for r in result:
        csvw.writerow(r)
    temp_bcp_file.flush()
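The delimiter difference is easy to verify in isolation; here the rows are made-up stand-ins for the SQLAlchemy result set:

```python
import csv

rows = [("1", "alpha"), ("2", "beta")]  # stand-in for the query result

# Tab-delimited output matches bcp's default field terminator in -c mode.
with open("tempscratch.csv", "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)
```

Opening the file in a text editor that shows whitespace confirms each field is separated by a literal tab, which is what bcp expects by default.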
