How to add a header to an existing csv file? - python

I know this is a very basic question.
I have a CSV file that already contains data. The file is generated automatically, not by my own code through DictReader or an open file object.
Goal
I want to open an existing file
Insert the header as the first row (shifting the existing first row down)
Save the file
Return the file
Any clues?
cursor.execute(sql, params + (csv_path,))
This command generates file, without header.
Code
sql, params = queryset.query.sql_with_params()
sql += ''' INTO OUTFILE %s
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n' '''
csv_path = os.path.join(settings.MEDIA_ROOT + '\\tmp', csv_filename)
cursor = connection.cursor()
cursor.execute(sql, params + (csv_path,))
columns = [column[0] for column in cursor.description] #error
Tried
SELECT `website` UNION SELECT `request_system_potentialcustomers`.`website` FROM `request_system_potentialcustomers` ORDER BY `request_system_potentialcustomers`.`revenue` DESC
INTO OUTFILE "D:\\out.csv"
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

Wait a minute. If you have not yet called
cursor.execute(sql, params + (csv_path,))
then you have the opportunity to write the CSV file correctly from the get-go. You should not need to write a new file with the header line, then copy all that CSV into the new file and so forth. That is slow and inefficient -- and your only choice -- if you really have to prepend a line to an existing file.
If instead you have not yet written the CSV file, and if you know the header, then you can add it to the SQL using SELECT ... UNION ... SELECT:
header = ['foo', 'bar', 'baz', ]
query = ['SELECT {} UNION'.format(','.join([repr(h) for h in header]))]
sql, params = queryset.query.sql_with_params()
query.append(sql)
sql = '''INTO OUTFILE %s
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n' '''
query.append(sql)
sql = ' '.join(query)
csv_path = os.path.join(settings.MEDIA_ROOT + '\\tmp', csv_filename)
cursor = connection.cursor()
cursor.execute(sql, params + (csv_path,))
Demo:
mysql> SELECT "foo", "bar" UNION SELECT "baz", "quux" INTO OUTFILE "/tmp/out";
Produces the file /tmp/out containing
foo bar
baz quux

The Cursor.description attribute gives you information about the result columns.
cursor.execute(sql, params + (csv_path,))
columns = [column[0] for column in cursor.description]
Write above information to the new file.
Append old csv contents to the new file.
Rename the new file with the old file name.
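The three steps above can be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: the helper name `prepend_header`, the demo file and the sample column names are hypothetical, and the column names are assumed to be known already, so they are passed in explicitly.

```python
import csv
import os
import shutil
import tempfile


def prepend_header(csv_path, columns):
    """Rewrite csv_path so that its first row is the given column names."""
    directory = os.path.dirname(os.path.abspath(csv_path))
    # Create the new file next to the old one so the final rename stays on
    # the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as new_file, \
            open(csv_path, "r", newline="") as old_file:
        # Step 1: write the header line to the new file.
        csv.writer(new_file, lineterminator="\n").writerow(columns)
        # Step 2: append the old CSV contents unchanged.
        shutil.copyfileobj(old_file, new_file)
    # Step 3: give the new file the old file's name.
    os.replace(tmp_path, csv_path)


# Hypothetical demo data, written to a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "out.csv")
    with open(demo_path, "w", newline="") as f:
        f.write("example.com,100\nexample.org,50\n")
    prepend_header(demo_path, ["website", "revenue"])
    with open(demo_path, newline="") as f:
        demo_result = f.read()
```

The whole file is copied once, which is the cost the earlier answer warns about; it is unavoidable when a line has to be prepended to a file that already exists.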

It is not quite clear whether you are trying to read an existing CSV file, but to read a CSV off disk that has no column names:
Use DictReader/DictWriter and specify the column names yourself.
Python 3:
import csv
ordered_filenames = ['animal','height','weight']
with open('stuff.csv') as csvfile, open("result.csv","w",newline='') as result:
    rdr = csv.DictReader(csvfile, fieldnames=ordered_filenames)
    wtr = csv.DictWriter(result, ordered_filenames)
    wtr.writeheader()
    for line in rdr:
        wtr.writerow(line)
With stuff.csv in the same directory:
elephant,1,200
cat,0.1,1
dog,0.2,2
and the output result file:
animal,height,weight
elephant,1,200
cat,0.1,1
dog,0.2,2

Related

Export MS SQL table with `null` values to CSV

I am trying to figure out how to create a CSV file that contains the null values I have in my MS SQL database table. Right now the script I am using replaces the null values with '' (empty strings). How am I supposed to instruct the csv writer to keep the null values?
example of source table
ID,Date,Entitled Key
10000002,NULL,805
10000003,2020-11-22 00:00:00,805
export_sql_to_csv.py
import csv
import os
import pyodbc
filePath = os.getcwd() + '/'
fileName = 'rigs_latest.csv'
server = 'ip-address'
database = 'db-name'
username = 'admin'
password = 'password'
# Database connection variable.
connect = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' +
server+';DATABASE='+database+';UID='+username+';PWD=' + password)
cursor = connect.cursor()
sqlSelect = "SELECT * FROM my_table"
cursor.execute(sqlSelect)
results = cursor.fetchall()
# Extract the table headers.
headers = [i[0] for i in cursor.description]
# Open CSV file for writing.
csvFile = csv.writer(open(filePath + fileName, 'w', newline=''),
delimiter=',', lineterminator='\r\n',
quoting=csv.QUOTE_NONE, escapechar='\\')
# Add the headers and data to the CSV file.
csvFile.writerow(headers)
csvFile.writerows(results)
Example of the result after running the above script:
ID,Date,Entitled Key
10000002,,805
10000003,2020-11-22 00:00:00,805
The main reason I would like to keep the null values is that I want to convert that CSV file into a series of INSERT statements and execute them against an Aurora Serverless PostgreSQL database. The database doesn't accept empty strings for the type date, which results in this error: ERROR: invalid input syntax for type date: ""
As described in the docs for the csv module, the None value is written to CSV as '' (empty string) by design. All other non-string values are converted with str() before being written.
So if you want your CSV to contain the string null instead of '', you have to modify the values before they reach the CSV writer. Perhaps:
results = [
['null' if val is None else val for val in row] for row in results
]
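To show the round trip end to end, here is a minimal sketch under stated assumptions: the marker string `NULL`, the helper names `with_null_marker` and `read_with_nulls`, and the in-memory buffer are all illustrative choices, not part of the original script. It writes the marker for None and, when the CSV is read back, turns the marker into None again so that a parameterized INSERT can send a real NULL.

```python
import csv
import io

# Sample rows shaped like the question's table; None stands for SQL NULL.
rows = [(10000002, None, 805), (10000003, "2020-11-22 00:00:00", 805)]


def with_null_marker(rows, marker="NULL"):
    """Replace None with a visible marker before the rows reach csv.writer."""
    return [[marker if value is None else value for value in row] for row in rows]


def read_with_nulls(text, marker="NULL"):
    """Parse CSV text and turn the marker back into None."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    data = [[None if value == marker else value for value in row] for row in reader]
    return header, data


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ID", "Date", "Entitled Key"])
writer.writerows(with_null_marker(rows))
output = buffer.getvalue()

header, parsed = read_with_nulls(output)
```

A marker only works if it can never appear as a genuine value in the data, so the choice of `NULL` here is an assumption to check against the real table.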

how to automatically create table based on CSV into postgres using python

I am a new Python programmer and am trying to import a sample CSV file into my Postgres database using a Python script.
I have a CSV file named abstable1 with 3 headers:
absid, name, number
I have many such files in a folder.
For each of them, I want to create a table in PostgreSQL with the same name as the CSV file.
Here is the code which I tried to just create a table for one file to test:
import psycopg2
import csv
import os
#filePath = 'c:\\Python27\\Scripts\\abstable1.csv'
conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")
cur = conn.cursor()
#Uncomment to execute the code below to create a table
cur.execute("""CREATE TABLE abs.abstable1(
absid varchar(10) PRIMARY KEY,
name integer,
number integer
)
""")
#to copy the csv data into created table
with open('abstable1.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abs.abstable1', sep=',')
conn.commit()
conn.close()
This is the error that I am getting:
File "c:\Python27\Scripts\testabs.py", line 26, in <module>
cur.copy_from(f, 'abs.abstable1', sep=',')
psycopg2.errors.QueryCanceled: COPY from stdin failed: error in .read() call: exceptions.ValueError Mixing iteration and read methods would lose data
CONTEXT: COPY abstable1, line 1
Any recommendation or alternate solution to resolve this issue is highly appreciated.
Here's what worked for me, using glob.
This code automatically reads all CSV files in a folder and creates a table with the same name as each file.
I'm still trying to figure out how to derive specific data types from the data in the CSV.
But as far as table creation is concerned, this works like a charm for all CSV files in a folder.
import csv
import psycopg2
import os
import glob
conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")
csvPath = "./TestDataLGA/"
# Loop through each CSV
for filename in glob.glob(csvPath+"*.csv"):
    # Create a table name
    tablename = filename.replace("./TestDataLGA\\", "").replace(".csv", "")
    print(tablename)
    # Open file
    fileInput = open(filename, "r")
    # Extract first line of file
    firstLine = fileInput.readline().strip()
    fileInput.close()
    # Split columns into an array [...]
    columns = firstLine.split(",")
    # Build SQL code to drop table if exists and create table
    sqlQueryCreate = 'DROP TABLE IF EXISTS '+ tablename + ";\n"
    sqlQueryCreate += 'CREATE TABLE '+ tablename + "("
    # some loop or function according to your requirement
    # Define columns for table
    for column in columns:
        sqlQueryCreate += column + " VARCHAR(64),\n"
    # Drop the trailing ",\n" left by the last column
    sqlQueryCreate = sqlQueryCreate[:-2]
    sqlQueryCreate += ");"
    cur = conn.cursor()
    cur.execute(sqlQueryCreate)
    conn.commit()
    cur.close()
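The open point about data types could be approached with a rough heuristic like the one below. This is a sketch under assumptions, not a tested loader: `infer_sql_type` and `build_create_table` are hypothetical helpers, the type names are a small illustrative subset, identifiers are wrapped in double quotes without escaping embedded quotes, and every data row is assumed to have as many fields as the header.

```python
import csv
import io


def infer_sql_type(values):
    """Pick the narrowest of INTEGER, NUMERIC, VARCHAR that fits all values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"
    for sql_type, caster in (("INTEGER", int), ("NUMERIC", float)):
        try:
            for v in non_empty:
                caster(v)
        except ValueError:
            continue  # at least one value does not fit; try the next type
        return sql_type
    return "VARCHAR"


def build_create_table(table_name, csv_text):
    """Build a CREATE TABLE statement from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for index, name in enumerate(header):
        column_values = [row[index] for row in data]
        columns.append('"{}" {}'.format(name.strip(), infer_sql_type(column_values)))
    return 'CREATE TABLE "{}" (\n    {}\n);'.format(table_name, ",\n    ".join(columns))


# Hypothetical sample matching the question's three headers.
sample = "absid,name,number\na1,10,1.5\na2,20,2\n"
statement = build_create_table("abstable1", sample)
```

Python's `int` and `float` accept some strings a database may reject (for example `nan`), so the inferred types are best treated as a first guess to review.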
I tried your code and it works fine:
import psycopg2
conn = psycopg2.connect("host= 127.0.0.1 dbname=testdb user=postgres password=postgres")
print("Connecting to Database")
cur = conn.cursor()
'''cur.execute("""CREATE TABLE abstable1(
absid varchar(10) PRIMARY KEY,
name integer,
number integer
)
""")'''
with open('lolo.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abstable1', sep=',', columns=('absid', 'name', 'number'))
conn.commit()
conn.close()
However, I had to make some changes for it to work:
I had to name the table abstable1, because with abs.abstable1 Postgres assumes that I am using the schema abs. Maybe you created that schema in your database; if not, check that. Also, I'm using Python 3.7.
I noticed that you are using Python 2.7 (which I think is no longer supported), and this may cause issues. Since you say you are learning, I would recommend Python 3: it is more widely used now, most code you encounter will be written for it, and otherwise you would have to keep adapting that code to fit Python 2.7.
I post my solution here, based on @Rose's answer.
I used sqlalchemy, a JSON file as config, and glob.
import json
import glob
from sqlalchemy import create_engine, text
def create_tables_from_files(files_folder, engine, config):
    try:
        for filename in glob.glob(files_folder+"\\*csv"):
            tablename = filename.replace(files_folder, "").replace('\\', "").replace(".csv", "")
            input_file = open(filename, "r")
            columns = input_file.readline().strip().split(",")
            create_query = 'DROP TABLE IF EXISTS ' + config["staging_schema"] + "." + tablename + "; \n"
            create_query +='CREATE TABLE ' + config["staging_schema"] + "." + tablename + " ( "
            for column in columns:
                create_query += column + " VARCHAR, \n "
            create_query = create_query[:-4]
            create_query += ");"
            engine.execute(text(create_query).execution_options(autocommit=True))
            print(tablename + " table created")
    except:
        print("Error at uploading tables")

Skip first row and delete certain char before importing csv to db

I'm new to Python, and my task is to import a CSV into a MySQL database. I have these sample values in my CSV file:
SHA1,VSDT,TRX
005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 (EXE 7-2)),Ransom.Win32.TRX.XXPE50FFF027,
006ea7ce2768fa208ec7dfbf948bffda9da09e4e,WIN32 EXE 2-2,TROJ.Win32.TRX.XXPE50FFF027,
My problem is: how can I remove "(" and ")" only at the start and end of the string in the second column before importing into the database?
I have this code to import the CSV:
import csv
import mysql.connector
file = open(fullPath, 'rb')
csv_data = csv.reader(file)
mycursor = mydb.cursor()
cursor = mydb.cursor()
for row in csv_data:
    cursor.execute('INSERT INTO jeremy_table_test(sha1,vsdt,trendx)'
                   'VALUES(%s, %s, %s)', [row[0], row[1], row[2]])
mydb.commit()
cursor.close()
print("Done")
Skip the header row when you read the file, rather than when you write to the database.
# On Python 3, csv.reader needs text mode: use open(fullPath, newline='') instead of 'rb'.
with open(fullPath, 'rb') as file:
    csv_data = csv.reader(file)
    next(csv_data)  # skip the header row
    cursor = mydb.cursor()
    for row in csv_data:
        cursor.execute('INSERT INTO jeremy_table_test(sha1,vsdt,trendx) '
                       'VALUES(%s, %s, %s)', (row[0], row[1], row[2]))
    mydb.commit()
    cursor.close()
print("Done")
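The second half of the question, removing the parentheses, can also be handled in Python before the INSERT. The following is a minimal sketch, assuming that exactly one outer pair should be dropped and only when the value both starts with "(" and ends with ")"; `strip_outer_parens` is a hypothetical helper, and the sample rows are the ones from the question, read from an in-memory string rather than from fullPath.

```python
import csv
import io


def strip_outer_parens(value):
    """Drop one leading '(' and one trailing ')' if both are present."""
    value = value.strip()
    if value.startswith("(") and value.endswith(")"):
        return value[1:-1]
    return value


sample = (
    "SHA1,VSDT,TRX\n"
    "005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 (EXE 7-2)),Ransom.Win32.TRX.XXPE50FFF027,\n"
    "006ea7ce2768fa208ec7dfbf948bffda9da09e4e,WIN32 EXE 2-2,TROJ.Win32.TRX.XXPE50FFF027,\n"
)

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
# Each tuple is ready to be passed as the parameters of the INSERT statement.
cleaned = [(row[0], strip_outer_parens(row[1]), row[2]) for row in reader]
```

With this rule the inner parentheses of a value such as `(WIN32 (EXE 7-2))` are left alone, which matches the "only at the start and end" requirement.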
MySQL's LOAD DATA statement can probably do what you want here. Here is what the LOAD DATA call might look like:
LOAD DATA INFILE 'path/to/rb'
INTO TABLE jeremy_table_test
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n' -- or '\n'
IGNORE 1 LINES
(sha1, @var1, trendx)
SET vsdt = TRIM(TRAILING ')' FROM TRIM(LEADING '(' FROM @var1));
To make this call from your Python code, you may try something like this:
query = "LOAD DATA INFILE 'path/to/rb' INTO TABLE jeremy_table_test FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (sha1, @var1, trendx) SET vsdt = TRIM(TRAILING ')' FROM TRIM(LEADING '(' FROM @var1))"
cursor.execute(query)
connection.commit()

How do I create new JSON data after every script run

I have JSON data stored in the variable data.
I want to write it to a text file every time the script runs, so that I know which JSON data is new, instead of rewriting the same JSON.
Currently, I am trying this:
Saving = firstname + ' ' + lastname+ ' - ' + email
with open('data.json', 'a') as f:
    json.dump(Saving, f)
    f.write("\n")
which just appends to the JSON file. At the beginning of the script, where the first code starts, I clear it with
Infotext = "First name : Last name : Email"
with open('data.json', 'w') as f:
    json.dump(Infotext, f)
    f.write("\n")
How can I, instead of rewriting the same JSON file, create a new file with the Infotext information and then append Saving to it?
Output in Json:
"First name : Last name : Email"
Hello World - helloworld@test.com
Hello2 World - helloworld2@test.com
Hello3 World - helloworld3@test.com
Hello4 World - helloworld4@test.com
That's the output I want. So basically it needs to start with
"First name : Last name : Email"
And then the first name, last name and email are added below that until there are no more names.
In short: instead of clearing and appending to the same JSON file, data.json, I want the script to create a new file called data1.json. If I rerun the program tomorrow, it should create data2.json, and so on.
Just use a datetime in the file name, to create a unique file each time the code is run. In this case, granularity goes down to per-second so, if the code is run more than once per second, you will overwrite the existing contents of a file. In that case, step down to file names with microseconds in their name.
import datetime as dt
import json
time_script_run = dt.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
with open('{}_data.json'.format(time_script_run), 'w') as outfile:
    json.dump(Infotext, outfile)
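If the numbered names from the question (data1.json, data2.json, ...) are preferred over timestamps, one possible approach is to scan the directory for the highest existing number. This is a minimal sketch under assumptions: `next_numbered_path` is a hypothetical helper, the demo runs in a throwaway directory, and nothing here protects against two runs picking the same number at the same moment.

```python
import os
import re
import tempfile


def next_numbered_path(directory=".", prefix="data", suffix=".json"):
    """Return directory/prefixN+suffix, where N is one above the highest N present."""
    pattern = re.compile(r"^{}(\d+){}$".format(re.escape(prefix), re.escape(suffix)))
    numbers = [
        int(match.group(1))
        for match in map(pattern.match, os.listdir(directory))
        if match
    ]
    next_number = max(numbers, default=0) + 1
    return os.path.join(directory, "{}{}{}".format(prefix, next_number, suffix))


# Demo: two consecutive "runs" in an empty directory.
with tempfile.TemporaryDirectory() as demo_dir:
    first = next_numbered_path(demo_dir)
    open(first, "w").close()  # simulate the first run writing its file
    second = next_numbered_path(demo_dir)
```

A plain data.json without a number is ignored by the pattern, so an existing unnumbered file would not be overwritten or counted.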
This has multiple drawbacks:
You'll have an ever-growing number of files
Even if you load the file with the latest datetime in its name (and finding that file grows in run time), you can only see data as it was in the single time before the last run; the full history is very difficult to look up.
I think you're better off using a lightweight database such as sqlite3:
import sqlite3
import random
import time
import datetime as dt
# Create DB
with sqlite3.connect('some_database.db') as conn:
    c = conn.cursor()
    # Just for this example, we'll clear the whole table to make it repeatable
    try:
        c.execute("DROP TABLE user_emails")
    except sqlite3.OperationalError: # First time you run this code
        pass
    c.execute("""CREATE TABLE IF NOT EXISTS user_emails(
                     datetime TEXT,
                     first_name TEXT,
                     last_name TEXT,
                     email TEXT)
              """)
    # Now let's create some fake user behaviour
    for x in range(5):
        now = dt.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        c.execute("INSERT INTO user_emails VALUES (?, ?, ?, ?)",
                  (now, 'John', 'Smith', random.randint(0, 1000)))
        time.sleep(1) # so we get new timestamps
# Later on, doing some work
with sqlite3.connect('some_database.db') as conn:
    c = conn.cursor()
    # Get whole user history
    c.execute("""SELECT * FROM user_emails
                 WHERE first_name = ? AND last_name = ?
              """, ('John', 'Smith'))
    print("All data")
    for row in c.fetchall():
        print(row)
    print('...............................................................')
    # Or, let's get the last email address
    print("Latest data")
    c.execute("""
              SELECT * FROM user_emails
              WHERE first_name = ? AND last_name = ?
              ORDER BY datetime DESC
              LIMIT 1;
              """, ('John', 'Smith'))
    print(c.fetchall())
Note: the data retrieval itself runs very quickly. The whole script takes about 5 seconds only because time.sleep(1) is called while generating the fake user data.
The JSON file should contain a list of strings. You should read the current contents of the file into a variable, append to the variable, then rewrite the file.
with open("data.json", "r") as f:
    data = json.load(f)
data.append(firstname + ' ' + lastname+ ' - ' + email)
with open("data.json", "w") as f:
    json.dump(data, f)
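The read, append, rewrite cycle above fails on the very first run, when data.json does not exist yet. A minimal sketch that starts from an empty list in that case is shown below; `append_entry`, the demo path and the example.com addresses are illustrative assumptions.

```python
import json
import os
import tempfile


def append_entry(path, entry):
    """Load the JSON list at path (or start a new one), append entry, rewrite."""
    try:
        with open(path, "r") as f:
            data = json.load(f)
    except FileNotFoundError:
        data = []  # first run: nothing has been saved yet
    data.append(entry)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return data


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "data.json")
    append_entry(demo_path, "Hello World - hello@example.com")
    result = append_entry(demo_path, "Hello2 World - hello2@example.com")
```

Because the file always holds one valid JSON list, it can be loaded back with a single json.load call, which is not true of a file built by appending one JSON string per line.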
I think you could use seek() on the file and write at the relevant position in the JSON file. For example, to update firstname, you would seek to the position just after the ':' that follows firstname and overwrite the text there.
There are examples here:
https://www.tutorialspoint.com/python/file_seek.htm

Python csv from database query adding a custom column to csv file

Here is what I am trying to achieve. My current code works fine and the query runs on my SQL server, but I will need to gather information from several servers. How would I add a column that lists the dbserver for each row?
import pyodbc
import csv
f = open("dblist.ini")
dbserver,UID,PWD = [ variable[variable.find("=")+1 :] for variable in f.readline().split("~")]
connectstring = "DRIVER={SQL server};SERVER=" + dbserver + ";DATABASE=master;UID="+UID+";PWD="+PWD
cnxn = pyodbc.connect(connectstring)
cursor = cnxn.cursor()
fd = open('mssql1.txt', 'r')
sqlFile = fd.read()
fd.close()
cursor.execute(sqlFile)
with open("out.csv", "wb") as csv_file:
    csv_writer = csv.writer(csv_file, delimiter = '!')
    csv_writer.writerow([i[0] for i in cursor.description]) # write headers
    csv_writer.writerows(cursor)
You could add the extra information in your SQL query, as a string literal in single quotes. For example:
select 'dbServerName' as dbserver, * from table;
Your cursor will return an extra column in front of your real data that holds the db server name. The downside of this method is that you transfer a little more data.
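An alternative that leaves the query untouched is to prepend the server name on the Python side while writing each row. This is a sketch under assumptions: `write_rows_with_server` is a hypothetical helper, the column name `dbserver` is made up, and the rows are hard-coded stand-ins for what the cursor would yield.

```python
import csv
import io


def write_rows_with_server(csv_file, server_name, column_names, rows):
    """Write a header and rows, each prefixed with the server they came from."""
    writer = csv.writer(csv_file, delimiter="!", lineterminator="\n")
    writer.writerow(["dbserver"] + list(column_names))
    for row in rows:
        writer.writerow([server_name] + list(row))


# Stand-in data; in the real script these would come from
# cursor.description and the cursor itself.
buffer = io.StringIO()
write_rows_with_server(
    buffer, "server01", ["name", "size_mb"], [("master", 6), ("tempdb", 8)]
)
output = buffer.getvalue()
```

When looping over several servers into one file, the header would only need to be written once, before the first server's rows.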
